[ https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494760#comment-14494760 ]

Colin Patrick McCabe commented on HDFS-8088:
--------------------------------------------

bq. I re-ran my test on Hadoop-2.7.1-SNAP with your patch applied, Colin, and 
things are much happier. The performance is much closer to what I previously 
saw with 2.6.0 (without any quantitative measurements). +1 (non-binding, ofc)

Thanks, Josh.  I discovered that we are reading non-trivial amounts of remote 
data inside the {{DFSInputStream#blockSeekTo}} method, so I think we'll also 
need to create a trace span for that one.  Also, the {{BlockReader}} trace 
scopes will need to use the {{DFSClient#traceSampler}} (currently they don't) 
or else we will never get any trace spans from reads.  I think that is what we 
would need to get the patch on this JIRA committed.
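
To make that concrete, here is a rough sketch of the kind of change I have in 
mind for {{blockSeekTo}}, assuming the HTrace 3.x API that Hadoop currently 
bundles.  This is illustrative only, not the actual patch; {{doBlockSeekTo}} is 
a hypothetical stand-in for the existing method body:

{code:java}
// Sketch only, inside DFSInputStream -- not the actual patch.
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

private synchronized DatanodeInfo blockSeekTo(long target) throws IOException {
  // Gate the span on the client-wide sampler so reads actually get sampled,
  // instead of never creating spans at all.
  TraceScope scope =
      Trace.startSpan("DFSInputStream#blockSeekTo", dfsClient.traceSampler);
  try {
    // ... existing logic: locate the target block, pick a datanode, and
    // construct a BlockReader, which can read non-trivial remote data ...
    return doBlockSeekTo(target);  // hypothetical stand-in for the real body
  } finally {
    scope.close();
  }
}
{code}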

bq. Giving a very quick look at the code (and making what's possible a bad 
guess), perhaps all of the 0ms length spans (denoted by zeroCount in the above, 
as opposed to the nonzeroCount) are when DFSOutputStream#writeChunk is only 
appending data into the current packet and not actually submitting that packet 
for the data streamer to process? With some more investigation into the 
hierarchy, I bet I could definitively determine that.

Keep in mind that doing a write in HDFS just hands the data off to a background 
thread called {{DataStreamer}}, which writes it out asynchronously.  The only 
reason {{writeChunk}} would ever take much longer than 0ms is lock contention 
(the {{DataStreamer#waitAndQueuePacket}} method couldn't get the 
{{DataStreamer#dataQueue}} lock immediately) or more than 
{{dfs.client.write.max-packets-in-flight}} unacked messages already being in 
flight.  (HDFS calls these messages "packets," even though each one typically 
spans multiple Ethernet packets.)
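
If it helps, here is a simplified model of that handoff.  This is made-up code, 
not the actual {{DFSOutputStream}} internals, and the limit of 80 is just the 
default for {{dfs.client.write.max-packets-in-flight}}; it shows why the writer 
thread normally returns immediately and only blocks when the unacked-packet 
limit is hit or the lock is contended:

{code:java}
import java.io.InterruptedIOException;
import java.util.LinkedList;
import java.util.Queue;

// Simplified model (not the actual HDFS code) of the writeChunk handoff:
// the writer thread only blocks when the queue of unacked packets is full.
class HandoffSketch {
  private final Queue<byte[]> dataQueue = new LinkedList<byte[]>();
  private final Queue<byte[]> ackQueue = new LinkedList<byte[]>();
  private final int maxPackets = 80;  // dfs.client.write.max-packets-in-flight

  void waitAndQueuePacket(byte[] packet) throws InterruptedIOException {
    synchronized (dataQueue) {
      try {
        // The only places writeChunk can spend real time: waiting here for
        // acks, or contending for the dataQueue lock itself.
        while (dataQueue.size() + ackQueue.size() > maxPackets) {
          dataQueue.wait();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new InterruptedIOException("interrupted while queueing packet");
      }
      dataQueue.add(packet);
      dataQueue.notifyAll();  // wake the DataStreamer thread
    }
  }
}
{code}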

I guess we have to step back and ask what the end goal is for HTrace.  If the 
end goal is figuring out why some requests had high latency, it makes sense to 
only trace the parts of the program that we think will take a non-trivial 
amount of time.  In that case, we should probably only trace the handoff of the 
full packet to the {{DataStreamer}}.  If the end goal is understanding the 
downstream consequences of all operations, then we have to connect the dots for 
all operations.  That's why I originally had all calls to write() and read() 
create trace spans.

I'm inclined to lean more towards goal #1 (figure out why specific requests had 
high latency) than goal #2.  I think that looking at the high-latency outliers 
will naturally lead us to fix the biggest performance issues (such as lock 
contention, disk issues, network issues, etc.).  Also, if all calls to write() 
and read() create trace spans, then this will have a "multiplicative" effect on 
our top-level sampling rate, which I think is undesirable.
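
To put some made-up numbers on that multiplicative effect: with a top-level 
sampling rate of 1%, an operation like an HBase scan that issues 1,000 read() 
calls would emit about 10 spans per operation on average (0.01 x 1,000) if 
every read() is traced, versus a small fraction of a span if we only trace 
buffer refills.  The span volume ends up scaling with the number of read() 
calls rather than with the sampling rate we actually configured.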

bq. That being said, I hope I'm not being too much of a bother with all this. I 
was just really excited to see this functionality in HDFS and want to make sure 
we're getting good data coming back out. Thanks for bearing with me and for the 
patches you've already made!

We definitely appreciate all the input.  I think it's very helpful.  I do think 
maybe we should target 2.7.1 for some of these changes, since I need to think 
through everything.  I know that's frustrating, but hopefully if we maintain a 
reasonable Hadoop release cadence it won't be too bad.  I'd also like to run 
some patches by you guys to see if they improve the usefulness of HTrace for 
you.  And I am doing a bunch of testing internally which I think will turn up a 
lot more potential improvements to HTrace and to its integration into HDFS.  
Real use-cases like yours are very helpful in motivating us here.

> Reduce the number of HTrace spans generated by HDFS reads
> ---------------------------------------------------------
>
>                 Key: HDFS-8088
>                 URL: https://issues.apache.org/jira/browse/HDFS-8088
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-8088.001.patch
>
>
> HDFS generates too many trace spans on read right now.  Every call to read() 
> we make generates its own span, which is not very practical for things like 
> HBase or Accumulo that do many such reads as part of a single operation.  
> Instead of tracing every call to read(), we should only trace the cases where 
> we refill the buffer inside a BlockReader.


