[ 
https://issues.apache.org/jira/browse/HDFS-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486310#comment-14486310
 ] 

Josh Elser commented on HDFS-8069:
----------------------------------

bq. Thanks for confirming this. Just to double-check, can you confirm that you 
have hadoop.htrace.sampler set to nothing (the default).

Sorry I took so long. Yes, I explicitly set {{hadoop.htrace.sampler}} to 
NeverSampler and re-ran the test, with the same end result.

bq. I am going to open an issue in HDFS to only trace the cases where we 
actually fill the buffer of the HDFS BlockReader. I think that it's a 
reasonable tradeoff to make, given that filling the HDFS BlockReader buffer 
tends to be the main thing that delays readers from HDFS. Just reading a byte 
from the in-memory buffer that already exists very seldom causes any delay, if 
ever.

Agreed. Thanks for doing this.
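
To make the tradeoff concrete, here is a rough sketch of the two strategies. It is not the actual DFSInputStream or HTrace code: the class {{TracedBufferedReader}}, its counter, and the sizes used are all made up for illustration, and a plain counter stands in for opening and closing a span.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/**
 * Hypothetical sketch only. A counter stands in for the tracing library,
 * so no real HTrace or HDFS API is used here.
 */
class TracedBufferedReader {
    private final InputStream source;
    private final byte[] buffer;
    private final boolean traceEveryRead;
    private int pos = 0;
    private int limit = 0;
    private long spansOpened = 0;

    TracedBufferedReader(InputStream source, int bufferSize, boolean traceEveryRead) {
        this.source = source;
        this.buffer = new byte[bufferSize];
        this.traceEveryRead = traceEveryRead;
    }

    /** Reads one byte, or returns -1 at end of stream. */
    int read() throws IOException {
        if (traceEveryRead) {
            spansOpened++; // stand-in for a span around every read call
        }
        if (pos == limit) {
            if (!traceEveryRead) {
                spansOpened++; // stand-in for a span around the buffer fill only
            }
            limit = source.read(buffer, 0, buffer.length);
            pos = 0;
            if (limit <= 0) {
                limit = 0;
                return -1;
            }
        }
        return buffer[pos++] & 0xFF;
    }

    long spansOpened() {
        return spansOpened;
    }

    /** Reads an in-memory "file" one byte at a time and reports the span count. */
    static long countSpans(int fileSize, int bufferSize, boolean traceEveryRead) {
        TracedBufferedReader reader = new TracedBufferedReader(
                new ByteArrayInputStream(new byte[fileSize]), bufferSize, traceEveryRead);
        try {
            while (reader.read() != -1) {
                // discard the byte; only the span count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return reader.spansOpened();
    }
}
```

With a 1 MiB input and a 64 KiB buffer, {{countSpans(1 << 20, 1 << 16, true)}} counts 1,048,577 spans (one per byte plus the final end-of-stream call), while {{countSpans(1 << 20, 1 << 16, false)}} counts 17 (16 fills plus the final empty one). The real numbers depend on the actual buffer size and read pattern, but the gap is the point.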

bq. If the Accumulo operation is big enough, it may be necessary to split it 
into multiple HTrace spans. For example, I think tracing an entire compaction 
would be too big. We may have to experiment with this somewhat.

Agreed on experimentation. Personally, I'd love to be able to answer questions 
like "is a compaction taking long because I'm waiting on HDFS?" or "is there an 
inefficiency in how we read/write the bytes in Accumulo?". I think we just need 
to find a happy medium.

Thanks again for your time with this.

> Tracing implementation on DFSInputStream seriously degrades performance
> -----------------------------------------------------------------------
>
>                 Key: HDFS-8069
>                 URL: https://issues.apache.org/jira/browse/HDFS-8069
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.0
>            Reporter: Josh Elser
>            Priority: Critical
>
> I've been doing some testing of Accumulo with HDFS 2.7.0 and have noticed a 
> serious performance impact when Accumulo registers itself as a SpanReceiver.
> The test in which I noticed the impact has an Accumulo process read a series 
> of updates from a write-ahead log, which just means reading a series of 
> Writable objects from a file in HDFS. With tracing 
> enabled, I waited for at least 10 minutes and the server still hadn't read a 
> ~300MB file.
> Doing a poor man's inspection via repeated thread dumps, I always see 
> something like the following:
> {noformat}
> "replication task 2" daemon prio=10 tid=0x0000000002842800 nid=0x794d runnable [0x00007f6c7b1ec000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:959)
>         at org.apache.htrace.Tracer.deliver(Tracer.java:80)
>         at org.apache.htrace.impl.MilliSpan.stop(MilliSpan.java:177)
>         - locked <0x000000077a770730> (a org.apache.htrace.impl.MilliSpan)
>         at org.apache.htrace.TraceScope.close(TraceScope.java:78)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:898)
>         - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
>         - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at java.io.DataInputStream.readByte(DataInputStream.java:265)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
>         at org.apache.accumulo.core.data.Mutation.readFields(Mutation.java:951)
>        ... more accumulo code omitted...
> {noformat}
> What I'm seeing here is that reading a single byte (in 
> WritableUtils.readVLong) causes a new Span to be created and closed (which 
> includes a flush to the SpanReceiver). This results in an extreme number of 
> spans for {{DFSInputStream.byteArrayRead}} just for reading a file from HDFS 
> -- over 700k spans to read a file of a few hundred MB.
> Perhaps there's something different we need to do for the SpanReceiver in 
> Accumulo? I'm not entirely sure, but this was rather unexpected.
> cc/ [~cmccabe]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
