[ 
https://issues.apache.org/jira/browse/HDFS-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153706#comment-14153706
 ] 

Colin Patrick McCabe commented on HDFS-7055:
--------------------------------------------

bq. I think SpanReceiverHost#getUniqueLocalTraceFileName is useful but it 
should belong to htrace. Can I port it to htrace later and remove it from hadoop 
at the next htrace version bump?

Yeah, absolutely.

bq. I attached a screenshot of the spans for reference. It shows the trace of getting 1MB 
of a file by FsShell on a pseudo-distributed cluster with the .004 patch. The trace 
consists of over 500 spans in this case.... Setting 
hadoop.trace.sampler=ProbabilitySampler did not reduce the number of spans 
above because Trace#startSpan always starts a span without regard to the sampler 
when there is an ongoing trace.

Well, I guess it depends on what you mean by "granular." :)  I certainly don't 
want all trace spans to be activated randomly.  We need to see the parent/child 
relationships between the spans.  I think the granularity of individual reads 
is just about right-- less than that, and we start not being able to see the 
big picture.  More than that, and we can't effectively do random sampling.

But you are right that we have too many trace spans here.  I thought about this 
a little more, and I don't think we have to create a trace span for each 
BlockReader operation.  We can just create trace spans for the operations that 
actually perform I/O to the datanode.

Concretely, we can cut the number of spans by not creating a trace span for 
every read done via a BlockReader-- only for the reads which actually result in 
data being fetched from the DN.  Similarly for BlockReaderLocal, we can trace 
the times we fill up the buffer, but not every call into BlockReaderLocal.

> Add tracing to DFSInputStream
> -----------------------------
>
>                 Key: HDFS-7055
>                 URL: https://issues.apache.org/jira/browse/HDFS-7055
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: 2.6.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7055.002.patch, HDFS-7055.003.patch, 
> HDFS-7055.004.patch, screenshot-get-1mb.png
>
>
> Add tracing to DFSInputStream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
