[ 
https://issues.apache.org/jira/browse/HBASE-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730129#comment-14730129
 ] 

Nick Dimiduk commented on HBASE-12911:
--------------------------------------

Thanks a lot for taking a peek [~stack].

bq. Looking at the PNG, what is the "tag." prefix in client-side metrics.

I'm curious about that myself. As far as I can tell, it comes from some default 
in the MBean translation; those entries were there even when I implemented a 
bean that didn't report anything at all. I guess it's helpful to know which host 
produced the metric in question?

bq. What is batchPool*? If I hover over the metric, there is a description.

See ConnectionImplementation#getCurrentBatchPool(). Far as I can tell, it's the 
thread pool used by the connection to service DML work on behalf of all 
Connection consumers. For instance, it's the thread pool that's passed down 
into AsyncProcess unless the user specifies their own pool at Table creation.
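For illustration, here is a minimal sketch of the kind of point-in-time gauges a pool like this can yield. It uses only java.util.concurrent; the class name and gauge names are hypothetical and are not taken from the patch, and it assumes the batch pool is a ThreadPoolExecutor.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;

/** Hypothetical helper for illustration; not part of the HBase client API. */
final class BatchPoolSnapshot {
  private BatchPoolSnapshot() {}

  /**
   * Reads point-in-time gauges from a pool. Returns an empty map when the
   * pool is not a ThreadPoolExecutor, because the ExecutorService interface
   * alone exposes no counters.
   */
  static Map<String, Long> snapshot(ExecutorService pool) {
    Map<String, Long> gauges = new LinkedHashMap<>();
    if (pool instanceof ThreadPoolExecutor) {
      ThreadPoolExecutor tpe = (ThreadPoolExecutor) pool;
      // Gauge names below are made up for this sketch.
      gauges.put("batchPoolActiveThreads", (long) tpe.getActiveCount());
      gauges.put("batchPoolSize", (long) tpe.getPoolSize());
      gauges.put("batchPoolLargestSize", (long) tpe.getLargestPoolSize());
      gauges.put("batchPoolMaxSize", (long) tpe.getMaximumPoolSize());
      gauges.put("batchPoolQueuedTasks", (long) tpe.getQueue().size());
      gauges.put("batchPoolCompletedTasks", tpe.getCompletedTaskCount());
    }
    return gauges;
  }
}
```

The active and completed counts are approximations by the executor's own contract, so they are better suited to gauges than to exact accounting.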

bq. Can we count threads in client metrics?

We can inspect the batch pool, as I've started here. Pools passed into a Table 
instance (as I mention above) wouldn't be a part of that. Maybe we can query 
the JVM for all threads that call themselves "HBase" and expose that? I'm not 
sure what you have in mind with this one.
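If the "ask the JVM" route is what you mean, a rough sketch could look like the following. It relies only on the standard ThreadMXBean; the class and method names are hypothetical, and matching on a name fragment is an assumption about how client threads happen to be named, not something the client guarantees.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the HBase client API. */
final class ThreadCensus {
  private ThreadCensus() {}

  /** Counts live JVM threads whose name contains the fragment, ignoring case. */
  static long countThreadsNamed(String fragment) {
    String needle = fragment.toLowerCase(Locale.ROOT);
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    // getThreadInfo(long[]) collects no stack traces, so it stays cheap.
    ThreadInfo[] infos = bean.getThreadInfo(bean.getAllThreadIds());
    long count = 0;
    for (ThreadInfo info : infos) {
      // An entry is null when the thread died between the two calls.
      if (info != null
          && info.getThreadName().toLowerCase(Locale.ROOT).contains(needle)) {
        count++;
      }
    }
    return count;
  }
}
```

A gauge could then report something like ThreadCensus.countThreadsNamed("hbase"), with the caveat that it would also pick up any unrelated thread whose name happens to match.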

bq. So, having trouble following the client side jmx bean hierarchy. When do 
the rpc metrics show up? Clients will list each remote server they connect to?

Yeah, I'm not exactly sure how this plays out on a real cluster. This is from a 
simple standalone run. The goal is to have a bean for each host the client is 
sending an RPC to. From the ltt snap, "192.168.1.10-60917" is the single RS 
endpoint (that would be <hostname>-<RPCPort> if I had real DNS) and 
"192.168.1.10-60915" is the master RPC endpoint.

My open questions are around expiring old hosts when they go away, and about 
aggregating this host-level information at the connection level (or if that's 
even useful, given the drastic difference between our various RPC call 
durations and sizes). We could also explore other aggregations, like per region 
or per table, but that requires a bit more unpacking of the IPC layer than I've 
tackled just yet.
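To make the expiry question concrete, here is one possible shape for a per-host registry keyed by "<hostname>-<RPCPort>" with idle-based eviction. This is a sketch under assumptions, not what the patch does: every class, method, and field name is hypothetical, and a real version would also have to unregister the corresponding MBean when a host is evicted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-host RPC stats registry; names are illustrative only. */
final class PerHostRpcMetrics {
  static final class HostStats {
    final LongAdder calls = new LongAdder();
    final LongAdder totalMillis = new LongAdder();
    final AtomicLong lastSeenMillis = new AtomicLong();
  }

  private final Map<String, HostStats> hosts = new ConcurrentHashMap<>();

  /** Builds the "<hostname>-<RPCPort>" key used for bean names. */
  static String key(String hostname, int rpcPort) {
    return hostname + "-" + rpcPort;
  }

  /** Records one RPC against the host, creating its entry on first use. */
  void record(String hostname, int rpcPort, long durationMillis, long nowMillis) {
    HostStats s = hosts.computeIfAbsent(key(hostname, rpcPort), k -> new HostStats());
    s.calls.increment();
    s.totalMillis.add(durationMillis);
    s.lastSeenMillis.set(nowMillis);
  }

  /**
   * Drops hosts idle for longer than maxIdleMillis and returns how many were
   * removed. The returned count is only approximate under concurrent use.
   */
  int expireIdle(long nowMillis, long maxIdleMillis) {
    int before = hosts.size();
    hosts.values().removeIf(s -> nowMillis - s.lastSeenMillis.get() > maxIdleMillis);
    return before - hosts.size();
  }

  /** Returns the number of calls recorded for the host, or 0 if unknown. */
  long callCount(String hostname, int rpcPort) {
    HostStats s = hosts.get(key(hostname, rpcPort));
    return s == null ? 0L : s.calls.sum();
  }

  int hostCount() {
    return hosts.size();
  }
}
```

Passing the clock value in explicitly keeps the sketch easy to test. Connection-level aggregation could sum over the map's values, though as noted above it is unclear how meaningful that sum is across RPC types.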

> Client-side metrics
> -------------------
>
>                 Key: HBASE-12911
>                 URL: https://issues.apache.org/jira/browse/HBASE-12911
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Client, Performance, Usability
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: 0001-HBASE-12911-Client-side-metrics.patch, 
> 0001-HBASE-12911-Client-side-metrics.patch, client metrics RS-Master.jpg, 
> client metrics client.jpg, connection attributes.jpg, ltt.jpg, standalone.jpg
>
>
> There's very little visibility into the hbase client. Folks who care to add 
> some kind of metrics collection end up wrapping Table method invocations with 
> {{System.currentTimeMillis()}}. For a crude example of this, have a look at 
> what I did in {{PerformanceEvaluation}} for exposing request latencies up to 
> {{IntegrationTestRegionReplicaPerf}}. The client is quite complex; there's a 
> lot going on under the hood that is impossible to see right now without a 
> profiler. Because the client is a crucial part of the performance of this 
> distributed system, we should have deeper visibility into its function.
> I'm not sure that wiring into the hadoop metrics system is the right choice 
> because the client is often embedded as a library in a user's application. We 
> should have integration with our metrics tools so that, e.g., a client 
> embedded in a coprocessor can report metrics through the usual RS channels, 
> or a client used in a MR job can do the same.
> I would propose an interface-based system with pluggable implementations. Out 
> of the box we'd include a hadoop-metrics implementation and one other, 
> possibly [dropwizard/metrics|https://github.com/dropwizard/metrics].
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
