Josh Elser created HDFS-8069:
--------------------------------

             Summary: Tracing implementation on DFSInputStream seriously degrades performance
                 Key: HDFS-8069
                 URL: https://issues.apache.org/jira/browse/HDFS-8069
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.7.0
            Reporter: Josh Elser
            Priority: Critical


I've been doing some testing of Accumulo with HDFS 2.7.0 and have noticed a 
serious performance impact when Accumulo registers itself as a SpanReceiver.

The test in which I noticed the impact has an Accumulo process reading a series 
of updates from a write-ahead log, which is just reading a series of Writable 
objects from a file in HDFS. With tracing enabled, I waited at least 10 minutes 
and the server still hadn't finished reading a ~300MB file.
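
For concreteness, the read pattern looks roughly like the sketch below. This is illustrative only, not the actual Accumulo replication code: the path comes from the command line and the skip stands in for deserializing each record.

{code:java}
// Illustrative sketch of the read pattern: many vint/byte-sized reads against
// a file in HDFS, as Mutation.readFields() does. Not the actual Accumulo code.
import java.io.EOFException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.WritableUtils;

public class WalReadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataInputStream in = fs.open(new Path(args[0]))) {
      while (true) {
        // readVInt/readVLong issue single-byte reads, each of which goes
        // through DFSInputStream.read() (and, with tracing on, a new span).
        int recordLen = WritableUtils.readVInt(in);
        in.skipBytes(recordLen);   // stand-in for deserializing the record body
      }
    } catch (EOFException endOfLog) {
      // reached the end of the file
    }
  }
}
{code}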

Doing a poor-man's inspection via repeated thread dumps, I always see something 
like the following:

{noformat}
"replication task 2" daemon prio=10 tid=0x0000000002842800 nid=0x794d runnable 
[0x00007f6c7b1ec000]
   java.lang.Thread.State: RUNNABLE
        at java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:959)
        at org.apache.htrace.Tracer.deliver(Tracer.java:80)
        at org.apache.htrace.impl.MilliSpan.stop(MilliSpan.java:177)
        - locked <0x000000077a770730> (a org.apache.htrace.impl.MilliSpan)
        at org.apache.htrace.TraceScope.close(TraceScope.java:78)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:898)
        - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
        - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
        at java.io.DataInputStream.readByte(DataInputStream.java:265)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
        at org.apache.accumulo.core.data.Mutation.readFields(Mutation.java:951)
       ... more accumulo code omitted...
{noformat}

What I'm seeing here is that reading a single byte (in WritableUtils.readVLong) 
causes a new Span to be created and closed, and the close includes a delivery to 
the SpanReceiver. This results in an extreme number of 
{{DFSInputStream.byteArrayRead}} spans just for reading a file from HDFS: over 
700k spans for a file of only a few hundred MB.
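
To make that pattern concrete, the shape implied by the stack trace is roughly the following. This is an illustration, not the actual DFSInputStream source; the wrapper class, span name, and ALWAYS sampler are stand-ins.

{code:java}
// Illustration of the per-call tracing pattern the stack trace shows (not the
// actual DFSInputStream source): each read() gets its own TraceScope, and
// closing the scope stops the span and delivers it to every SpanReceiver.
import java.io.IOException;
import java.io.InputStream;
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

public class TracedRead {
  private final InputStream in;

  public TracedRead(InputStream in) {
    this.in = in;
  }

  public int read(byte[] buf, int off, int len) throws IOException {
    // One span per read() call, even when the caller only asks for one byte.
    TraceScope scope = Trace.startSpan("DFSInputStream.byteArrayRead", Sampler.ALWAYS);
    try {
      return in.read(buf, off, len);
    } finally {
      // close() -> MilliSpan.stop() -> Tracer.deliver() -> iterate SpanReceivers
      scope.close();
    }
  }
}
{code}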

Perhaps there's something different we need to do for the SpanReceiver in 
Accumulo? I'm not entirely sure, but this was rather unexpected.
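
For reference, the kind of receiver I mean is sketched below: a trivial counting SpanReceiver registered directly with HTrace. This is not Accumulo's actual receiver or registration path, just a minimal way to watch the span volume; even a receiver this cheap still pays the per-read span create/stop/deliver cost in the client.

{code:java}
// Illustrative only: a trivial SpanReceiver that counts spans. Not Accumulo's
// receiver, and Accumulo's registration path may differ from Trace.addReceiver().
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.htrace.Span;
import org.apache.htrace.SpanReceiver;
import org.apache.htrace.Trace;

public class CountingSpanReceiver implements SpanReceiver {
  private final AtomicLong count = new AtomicLong();

  @Override
  public void receiveSpan(Span span) {
    // With the WAL read described above, this counter climbs into the hundreds
    // of thousands for a single few-hundred-MB file.
    count.incrementAndGet();
  }

  @Override
  public void close() throws IOException {
    System.out.println("spans received: " + count.get());
  }

  public static void register() {
    Trace.addReceiver(new CountingSpanReceiver());
  }
}
{code}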

cc/ [~cmccabe]


