[ 
https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511727#comment-14511727
 ] 

Colin Patrick McCabe commented on HDFS-8213:
--------------------------------------------

Thanks for that perspective, [~ndimiduk].  

I actually don't see any conflict between allowing the client to trace itself, 
and allowing the application to trace itself.  We should be able to support 
both use-cases.  The people who don't want to have the client initiate tracing 
can simply not set {{hdfs.client.htrace.spanreceiver.classes}} and 
{{hdfs.client.htrace.sampler}}.
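
To illustrate the opt-in behaviour described above, here is a minimal, self-contained sketch. It is not the actual DFSClient code: the class name, the helper methods, and the rule "the client initiates tracing only when a span receiver is configured under its own prefix" are assumptions made for illustration. Only the {{hdfs.client.htrace}} prefix and the {{spanreceiver.classes}} suffix come from this issue, and a plain {{Map}} stands in for the Hadoop configuration object.

```java
import java.util.HashMap;
import java.util.Map;

public class ClientTraceConfigSketch {
    // Prefix proposed in the issue title; everything else here is hypothetical.
    static final String CLIENT_PREFIX = "hdfs.client.htrace.";

    /** Returns only the entries under the client prefix, with the prefix stripped. */
    static Map<String, String> clientTraceConf(Map<String, String> conf) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(CLIENT_PREFIX)) {
                out.put(e.getKey().substring(CLIENT_PREFIX.length()), e.getValue());
            }
        }
        return out;
    }

    /** Assumed rule: the client starts its own tracing only if a receiver is set under its prefix. */
    static boolean clientInitiatesTracing(Map<String, String> conf) {
        String classes = clientTraceConf(conf).get("spanreceiver.classes");
        return classes != null && !classes.trim().isEmpty();
    }

    public static void main(String[] args) {
        // An application (e.g. Accumulo) that manages its own receivers sets nothing
        // under the client prefix, so the client stays passive.
        Map<String, String> appManaged = new HashMap<>();
        appManaged.put("hadoop.htrace.spanreceiver.classes", "ExampleReceiver");
        System.out.println(clientInitiatesTracing(appManaged));

        // An operator debugging a closed-source app opts in through the client prefix.
        Map<String, String> clientOptIn = new HashMap<>(appManaged);
        clientOptIn.put(CLIENT_PREFIX + "spanreceiver.classes", "ExampleReceiver");
        System.out.println(clientInitiatesTracing(clientOptIn));
    }
}
```

Because the client reads only keys under its own prefix, an application's existing tracing settings would not cause the client to register receivers a second time.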

One very important use-case for HTrace is "how can HBase figure out what HDFS 
is doing."  For this use-case, of course, we don't need the client to initiate 
tracing... HBase can simply change its code to have the relevant calls to 
HTrace, and then that will get picked up by DFSClient, DataNode, NN, etc.  I 
think this is the use-case you guys have been focusing on, and understandably 
so.  But this is only one use-case of many.  Another very important use-case of 
tracing is "I have proprietary app X that talks to HDFS, and it's slow.  Why?"  
For that use-case, we need the DFSClient to be able to initiate the tracing, 
since we don't have the source code for the proprietary app (or, if we do, 
modifying and redeploying it may require a lengthy admin process).

bq. Should HBase and Accumulo clients be providing the same?

I believe they should.  It would be nice to be able to figure out why HBase is 
slow for some arbitrary workload, without hacking the client.  I would like to 
be able to give a talk about profiling HBase that doesn't start with "first, 
modify your source code in ways X, Y, and Z"... it's much nicer to tell people 
to set a config option.  Otherwise I feel like I'm telling people to write a 
MapReduce job in Erlang... and you know what that really means I'm telling them 
:)  This is especially true for non-devs.

I think we could also improve our API to make it less likely (or maybe even 
impossible) for client and server tracing configs to conflict like this.  I 
have some ideas for how to do that, which I'll look at in a follow-on JIRA.

> DFSClient should use hdfs.client.htrace HTrace configuration prefix rather 
> than hadoop.htrace
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8213
>                 URL: https://issues.apache.org/jira/browse/HDFS-8213
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Billie Rinaldi
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-8213.001.patch
>
>
> DFSClient initializing SpanReceivers is a problem for Accumulo, which manages 
> SpanReceivers through its own configuration.  This results in the same 
> receivers being registered multiple times and spans being delivered more than 
> once.  The documentation says SpanReceiverHost.getInstance should be issued 
> once per process, so there is no expectation that DFSClient should do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
