[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883685#comment-13883685 ]
Arpit Agarwal commented on HDFS-5776:
-------------------------------------

{quote}
Yes, that would be perfect sometimes, but it does not work for the HBase scenario (Stack's consideration above is a good one), since we made the pool "static". From a per-client view it is more flexible to provide instance-level disable/enable APIs, so that the hbase shell can control the switch per DFS client instance.
{quote}

Okay.

{quote}
In actualGetFromOneDatanode(), refetchToken/refetchEncryptionKey are initialized outside the while (true) loop (see lines 993-996). When we hit an InvalidEncryptionKeyException/InvalidBlockTokenException, refetchToken and refetchEncryptionKey are decremented by 1 (see the refetchEncryptionKey-- and refetchToken-- statements). If the exception happens again, the checks will definitely fail (see "e instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0" and "refetchToken > 0"), so we fall into the else clause, which executes:
{quote}

Isn't the call to {{actualGetFromOneDataNode}} wrapped in a loop itself? I am talking about the while loop in {{fetchBlockByteRange}}. Will that not change the behavior? Maybe it is harmless, I am not sure; I just want us to be clear either way.

Thanks for adding the thread count limit. If we need more than 128 threads per client process just for backup reads, we (HDFS) need to think about proper async RPC. Suggesting a lack of limits ignores the point that hedging can double the DataNode load on an already loaded cluster. Also, a 1 ms lower bound for the delay is as good as zero, but as long as we have a thread count limit I am okay.

Minor points that don't need to hold up the checkin:
# The test looks like a stress test, i.e. we are hoping that some of the hedged requests will complete before the primary requests. We can create a separate Jira to write a deterministic unit test, and it's fine if someone else picks that up later.
# A couple of points from my initial feedback (#10, #12) were missed, but again not worth holding up the checkin.

Other than clarifying the loop behavior, the v9 patch looks fine to me. Thanks again for working through the feedback, Liang; this is a nice capability to have in HDFS.

> Support 'hedged' reads in DFSClient
> -----------------------------------
>
>                 Key: HDFS-5776
>                 URL: https://issues.apache.org/jira/browse/HDFS-5776
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
> This is a placeholder for the hdfs-related work backported from https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be especially helpful for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" and "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics we can export the metric values of interest into the client system (e.g. HBase's regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
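(Editor's note.) The loop-nesting question in the comment can be illustrated with a small standalone sketch. All names here are hypothetical stand-ins, not the actual DFSInputStream code: readOnce() plays the role of actualGetFromOneDataNode() with its refetch budgets initialized outside the inner while (true), and readWithOuterLoop() plays the role of the while loop in fetchBlockByteRange() that wraps the call. The point is that each outer pass re-enters with fresh budgets, so a persistently failing read can refetch more times than the inner counters alone suggest.

```java
// Hypothetical sketch of the retry-budget interaction discussed above;
// not the real HDFS code.
public class RetryBudgetSketch {
    static int refetchAttempts = 0; // counts how often the "token" is refetched

    // Stand-in for actualGetFromOneDataNode(): the one-shot budgets live
    // outside the inner while (true), mirroring lines 993-996 of the patch.
    static boolean readOnce(boolean alwaysFail) {
        int refetchToken = 1;
        int refetchEncryptionKey = 1; // unused here; kept for symmetry
        while (true) {
            if (!alwaysFail) {
                return true;          // successful read
            }
            if (refetchToken > 0) {   // InvalidBlockTokenException path
                refetchToken--;
                refetchAttempts++;
                continue;             // refetch the token and retry
            }
            return false;             // budget exhausted: give up
        }
    }

    // Stand-in for the wrapping loop in fetchBlockByteRange(): every outer
    // iteration calls readOnce() again, replenishing the inner budgets.
    static boolean readWithOuterLoop(boolean alwaysFail, int outerRetries) {
        for (int i = 0; i < outerRetries; i++) {
            if (readOnce(alwaysFail)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        readWithOuterLoop(true, 3);
        // The inner budget of 1 is replenished on every outer pass,
        // so the token is refetched 3 times, not once.
        System.out.println(refetchAttempts); // prints 3
    }
}
```

Whether that multiplication is harmless depends on whether the outer loop is itself bounded, which is exactly the clarification the comment asks for.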
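(Editor's note.) The two config items named in the issue description could be set in a client-side hdfs-site.xml along these lines. This is only a sketch: the property names come from the description, but the values are illustrative, not defaults.

```xml
<!-- Illustrative client-side configuration; values are examples only. -->
<property>
  <name>dfs.dfsclient.quorum.read.threshold.millis</name>
  <value>500</value> <!-- wait this long before issuing a hedged read -->
</property>
<property>
  <name>dfs.dfsclient.quorum.read.threadpool.size</name>
  <value>10</value> <!-- per the description, the pool enables/disables the feature; the cap bounds DN load -->
</property>
```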