[ https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494760#comment-14494760 ]
Colin Patrick McCabe commented on HDFS-8088:
--------------------------------------------

bq. I re-ran my test on Hadoop-2.7.1-SNAP with your patch applied, Colin, and things are much happier. The performance is much closer to what I previously saw with 2.6.0 (without any quantitative measurements). +1 (non-binding, ofc)

Thanks, Josh. I discovered that we are reading non-trivial amounts of remote data inside the {{DFSInputStream#blockSeekTo}} method, so I think we'll also need to create a trace span for that one. Also, the {{BlockReader}} trace scopes will need to use the {{DFSClient#traceSampler}} (currently they don't), or else we will never get any trace spans from reads. I think that is what we would need to get the patch on this JIRA committed.

bq. Giving a very quick look at the code (and making what's possibly a bad guess), perhaps all of the 0ms length spans (denoted by zeroCount in the above, as opposed to the nonzeroCount) are when DFSOutputStream#writeChunk is only appending data into the current packet and not actually submitting that packet for the data streamer to process? With some more investigation into the hierarchy, I bet I could definitively determine that.

Keep in mind that doing a write in HDFS just hands the data off to a background thread called {{DataStreamer}}, which writes it out asynchronously. The only reason {{writeChunk}} would ever take much longer than 0 ms is that there was lock contention (the {{DataStreamer#waitAndQueuePacket}} method couldn't get the {{DataStreamer#dataQueue}} lock immediately), or that there were already more than {{dfs.client.write.max-packets-in-flight}} unacked messages in flight. (HDFS calls these messages "packets" even though each one typically spans multiple Ethernet packets.)

I guess we have to step back and ask what the end goal is for HTrace.
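As an aside, the asynchronous hand-off described above can be sketched with a bounded queue. This is a minimal illustration, not the actual {{DataStreamer}} code; the capacity constant and class names are hypothetical stand-ins for {{dfs.client.write.max-packets-in-flight}} and the real queue:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class HandoffSketch {
    // Hypothetical stand-in for dfs.client.write.max-packets-in-flight.
    static final int MAX_PACKETS_IN_FLIGHT = 80;

    // Bounded queue: the writer blocks only when the background
    // streamer has fallen behind, loosely mirroring waitAndQueuePacket.
    static final BlockingQueue<byte[]> dataQueue =
        new ArrayBlockingQueue<>(MAX_PACKETS_IN_FLIGHT);

    public static void main(String[] args) throws InterruptedException {
        // Background "streamer" thread drains the queue asynchronously.
        Thread streamer = new Thread(() -> {
            try {
                while (true) {
                    byte[] packet = dataQueue.take();
                    if (packet.length == 0) return; // poison pill: stop
                    // real code would write the packet to the pipeline here
                }
            } catch (InterruptedException ignored) { }
        });
        streamer.start();

        // The "writeChunk" side: put() returns almost immediately unless
        // the queue is full, which is why most spans would measure ~0 ms.
        for (int i = 0; i < 100; i++) {
            dataQueue.put(new byte[]{1});
        }
        dataQueue.put(new byte[0]); // signal shutdown
        streamer.join();
        System.out.println("all packets handed off");
    }
}
```

Under this model, a slow hand-off only happens when the bounded queue is full, i.e., when the streamer cannot keep up with the writer.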
If the end goal is figuring out why some requests had high latency, it makes sense to trace only the parts of the program that we think will take a non-trivial amount of time. In that case, we should probably trace only the handoff of the full packet to the {{DataStreamer}}. If the end goal is understanding the downstream consequences of all operations, then we have to connect the dots for all operations. That's why I originally had all calls to write() and read() create trace spans.

I'm inclined to lean more towards goal #1 (figure out why specific requests had high latency) than goal #2. I think that looking at the high-latency outliers will naturally lead us to fix the biggest performance issues (such as lock contention, disk issues, network issues, etc.). Also, if all calls to write() and read() create trace spans, this has a "multiplicative" effect on our top-level sampling rate, which I think is undesirable.

bq. That being said, I hope I'm not being too much of a bother with all this. I was just really excited to see this functionality in HDFS and want to make sure we're getting good data coming back out. Thanks for bearing with me and for the patches you've already made!

We definitely appreciate all the input; I think it's very helpful. I do think maybe we should target 2.7.1 for some of these changes, since I need to think through everything. I know that's frustrating, but hopefully if we maintain a reasonable Hadoop release cadence it won't be too bad. I'd also like to run some patches by you guys to see if they improve the usefulness of HTrace to you. And I am doing a bunch of testing internally which I think will turn up a lot more potential improvements to HTrace and to its integration into HDFS. Use cases really should be very helpful in motivating us here.
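To illustrate the span-volume difference between the two goals, here is a toy sketch (hypothetical names and buffer size; not HDFS or HTrace code) of opening a span only when the buffer is refilled, rather than on every read():

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefillTraceSketch {
    // Hypothetical buffer size; real BlockReader buffering differs.
    static final int BUFFER_SIZE = 4096;
    static final AtomicInteger spansCreated = new AtomicInteger();

    static int bufferedBytes = 0;

    // Hypothetical read(): only the refill path would open a trace span.
    static void read(int len) {
        while (len > 0) {
            if (bufferedBytes == 0) {
                spansCreated.incrementAndGet(); // a span would open here
                bufferedBytes = BUFFER_SIZE;    // simulate a remote refill
            }
            int n = Math.min(len, bufferedBytes);
            bufferedBytes -= n;
            len -= n;
        }
    }

    public static void main(String[] args) {
        // 1000 small 32-byte reads consume 32000 bytes, i.e. 8 buffers.
        for (int i = 0; i < 1000; i++) {
            read(32);
        }
        // Tracing every read() would emit 1000 spans; tracing only
        // refills emits 8.
        System.out.println("spans: " + spansCreated.get());
    }
}
```

The many 0 ms spans disappear, and the spans that remain correspond to operations that actually did remote work.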
> Reduce the number of HTrace spans generated by HDFS reads
> ---------------------------------------------------------
>
>                 Key: HDFS-8088
>                 URL: https://issues.apache.org/jira/browse/HDFS-8088
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-8088.001.patch
>
>
> HDFS generates too many trace spans on read right now. Every call to read()
> we make generates its own span, which is not very practical for things like
> HBase or Accumulo that do many such reads as part of a single operation.
> Instead of tracing every call to read(), we should only trace the cases where
> we refill the buffer inside a BlockReader.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)