[ 
https://issues.apache.org/jira/browse/HBASE-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735178#comment-14735178
 ] 

Nick Dimiduk commented on HBASE-12911:
--------------------------------------

bq. Hard to tell what connection is hosted where and where it is connected. Is 
that fixable?

I was considering logging the connection creation stack trace as a tag on the 
bean, similar to [~chenheng]'s investigation on HBASE-14361. Dunno if that's 
helpful to commit though.
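
Roughly what I have in mind, as a sketch only (the class, interface, and 
ObjectName naming below are made up for illustration, not what the patch 
actually registers):

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ConnectionCreationTag {

  /** Hypothetical management interface: a single read-only attribute. */
  public interface CreationMXBean {
    String getCreationStackTrace();
  }

  /** Captures the stack at construction time, i.e. wherever the Connection was made. */
  public static class Creation implements CreationMXBean {
    private final String stack;

    public Creation() {
      StringBuilder sb = new StringBuilder();
      for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
        sb.append("  at ").append(e).append('\n');
      }
      this.stack = sb.toString();
    }

    @Override
    public String getCreationStackTrace() {
      return stack;
    }
  }

  /** Register the tag under a per-connection ObjectName keyed by identity hash. */
  public static void register(Object connection) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    ObjectName name = new ObjectName(
        "org.apache.hbase:type=Connection,id=" + System.identityHashCode(connection));
    server.registerMBean(new Creation(), name);
  }
}
{code}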

Okay, we'll keep connection-level aggregation.

bq. If slow query though, how you find it? 

I was imagining one could set alerting based on the aggregate 95pct latency. 
I.e., if the 95pct latency of RPCs to any individual server trends drastically 
higher than the aggregate, I'd want to know about it. [~phobos182], [~toffer], 
[~clayb], [~eclark] is this crazy thinking?
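
To make that concrete, a minimal sketch of the kind of divergence check an 
alerting script could run over scraped values (the 2x threshold and method 
names are arbitrary, purely illustrative):

{code:java}
import java.util.Map;

/** Sketch of a divergence check an alerting tool might run on scraped metrics. */
public class LatencyDivergenceCheck {

  /**
   * Flags any server whose 95th-percentile RPC latency is drastically higher
   * than the connection-wide aggregate. The 2x factor here is arbitrary.
   */
  public static void check(double aggregateP95Millis, Map<String, Double> perServerP95Millis) {
    for (Map.Entry<String, Double> e : perServerP95Millis.entrySet()) {
      if (e.getValue() > 2.0 * aggregateP95Millis) {
        System.out.printf("ALERT: p95 to %s is %.1fms vs aggregate %.1fms%n",
            e.getKey(), e.getValue(), aggregateP95Millis);
      }
    }
  }
}
{code}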

Do we like the individual connection objects exposed separately? I was thinking 
of applications like the (I think it was [~malaskat]'s) multi-cluster 
client-side failover patch, where you'd be embedding multiple connection 
instances in an application and want to see their behaviors separately. Hence 
Connection objects are reported by their objectId (I assume this is stable in 
Java?). Maybe this is an uncommon case and supporting it makes this feature 
harder to consume for everyone else? Those objectIds change all the time, so 
parsing them, for instance for an OpenTSDB tcollector, may be annoying.
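
For what it's worth, a collector could avoid hard-coding the changing 
objectIds by querying with an ObjectName wildcard. A sketch, assuming a 
hypothetical {{org.apache.hbase:type=Connection,id=<objectId>}} naming scheme 
rather than whatever the patch actually uses:

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Sketch: find per-connection beans without hard-coding the changing objectIds. */
public class ConnectionBeanScan {
  public static void main(String[] args) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    // Wildcard on the id key, so the collector needn't know each objectId up front.
    ObjectName pattern = new ObjectName("org.apache.hbase:type=Connection,id=*");
    for (ObjectName name : server.queryNames(pattern, null)) {
      System.out.println("found connection bean: " + name);
    }
  }
}
{code}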

Then again, we don't provide a /jmx over HTTP from the clients, so there's not 
an easy way for tcollector to grab these as they are, unless it supports raw 
JMX too.
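
A sketch of what scraping over raw JMX would look like, assuming the client 
JVM is started with remote JMX enabled (the port and bean domain below are 
placeholders):

{code:java}
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Sketch: scrape client-side beans over raw JMX, absent a /jmx HTTP endpoint. */
public class RawJmxScrape {
  public static void main(String[] args) throws Exception {
    // Assumes the client JVM exposes remote JMX, e.g. com.sun.management.jmxremote.port=10102.
    JMXServiceURL url =
        new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:10102/jmxrmi");
    JMXConnector connector = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection conn = connector.getMBeanServerConnection();
      Set<ObjectName> names = conn.queryNames(new ObjectName("org.apache.hbase:*"), null);
      for (ObjectName name : names) {
        System.out.println(name);
      }
    } finally {
      connector.close();
    }
  }
}
{code}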

> Client-side metrics
> -------------------
>
>                 Key: HBASE-12911
>                 URL: https://issues.apache.org/jira/browse/HBASE-12911
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, Operability, Performance
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: 0001-HBASE-12911-Client-side-metrics.patch, 
> 0001-HBASE-12911-Client-side-metrics.patch, 
> 0001-HBASE-12911-Client-side-metrics.patch, 
> 0001-HBASE-12911-Client-side-metrics.patch, am.jpg, client metrics 
> RS-Master.jpg, client metrics client.jpg, conn_agg.jpg, connection 
> attributes.jpg, ltt.jpg, standalone.jpg
>
>
> There's very little visibility into the hbase client. Folks who care to add 
> some kind of metrics collection end up wrapping Table method invocations with 
> {{System.currentTimeMillis()}}. For a crude example of this, have a look at 
> what I did in {{PerformanceEvaluation}} for exposing request latencies up to 
> {{IntegrationTestRegionReplicaPerf}}. The client is quite complex; there's a 
> lot going on under the hood that is impossible to see right now without a 
> profiler. Being a crucial part of the performance of this distributed system, 
> the client deserves deeper visibility into its function.
>
> I'm not sure that wiring into the hadoop metrics system is the right choice 
> because the client is often embedded as a library in a user's application. We 
> should have integration with our metrics tools so that, e.g., a client 
> embedded in a coprocessor can report metrics through the usual RS channels, 
> or a client used in an MR job can do the same.
>
> I would propose an interface-based system with pluggable implementations. Out 
> of the box we'd include a hadoop-metrics implementation and one other, 
> possibly [dropwizard/metrics|https://github.com/dropwizard/metrics].
>
> Thoughts?
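
For a concrete picture of the "crude" wrapping the description mentions, 
hand-rolled timing around a {{Table}} call looks roughly like this 
(illustrative only; not taken from {{PerformanceEvaluation}}):

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Illustration only: hand-rolled timing around a single Table call. */
public class CrudeTimedGet {
  public static Result timedGet(Table table, byte[] row) throws IOException {
    long start = System.currentTimeMillis();
    try {
      return table.get(new Get(row));
    } finally {
      long elapsed = System.currentTimeMillis() - start;
      // Every caller ends up reinventing this bookkeeping and its aggregation.
      System.out.println("get(" + Bytes.toStringBinary(row) + ") took " + elapsed + "ms");
    }
  }
}
{code}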



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
