[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883823#comment-13883823 ]
Liang Xie commented on HDFS-5776:
---------------------------------

bq. Isn't the call to actualGetFromOneDataNode wrapped in a loop itself? I am talking about the while loop in fetchBlockByteRange. Will that not change the behavior? Maybe it is harmless, I am not sure. I just want us to be clear either way.

Yes, it doesn't change the overall behavior and it is harmless; indeed, it's safer than before. In the old implementation, the refetchToken/refetchEncryptionKey quotas are shared by all nodes returned from chooseDataNode once a key/token exception happens. That means if the first node consumes this retry quota and the second or third node then hits a key/token exception, the clearDataEncryptionKey/fetchBlockAt operations will not be called for it, which is a little unfair :) In the new implementation (the patch), the second and later nodes get the same retry quota as the first node, which seems fairer to me. Either way, it doesn't change the normal path; it only makes the security-enabled scenario safer and fairer.

bq. The test looks like a stress test, i.e. we are hoping that some of the hedged requests will complete before the primary requests. We can create a separate Jira to write a deterministic unit test and it’s fine if someone else picks that up later.

OK, I can track it later.

As for patch v9 vs. v10, both are OK with me (though our internal branch uses the style without a limit), since my original goal is to reduce HBase's P99 and P99.9 latency, and neither style makes a difference there. V9 is safer, but hitting its hardcoded limit would probably mean modifying the HDFS source code again, which is difficult for a normal end user. IMHO, the committer who finally commits this JIRA can pick one. It would be a pity if people keep arguing about this style and hold up progress; that doesn't help the downstream HBase project at all.
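The retry-quota difference described in the comment can be illustrated with a deliberately simplified model. This is not the DFSInputStream code: it assumes every chosen datanode hits one key/token exception and that the read moves on to the next node regardless, and it models the quota as a single counter starting at 1 standing in for refetchToken/refetchEncryptionKey. `RetryQuotaSketch` and `nodesGivenRefetch` are hypothetical names.

```java
class RetryQuotaSketch {
    /**
     * Hypothetical model: counts how many of `nodesWithTokenError` datanodes
     * would get a token/key refresh (the point where fetchBlockAt or
     * clearDataEncryptionKey would run) when each node hits one key/token error.
     */
    static int nodesGivenRefetch(int nodesWithTokenError, boolean perNodeQuota) {
        int refetchToken = 1;  // old behavior: one quota shared by all chosen nodes
        int refreshed = 0;
        for (int node = 0; node < nodesWithTokenError; node++) {
            if (perNodeQuota) {
                refetchToken = 1;  // new behavior: every node starts with its own quota
            }
            if (refetchToken > 0) {
                refetchToken--;
                refreshed++;  // a refresh would happen here before retrying this node
            }
            // otherwise: no refresh, so this node is given up on and the next one is chosen
        }
        return refreshed;
    }

    public static void main(String[] args) {
        System.out.println(nodesGivenRefetch(3, false));  // shared quota: prints 1
        System.out.println(nodesGivenRefetch(3, true));   // per-node quota: prints 3
    }
}
```

With three failing nodes, only the first gets a refresh under the shared quota, while every node gets one under the per-node quota; the normal (no-exception) path is identical in both variants.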
> Support 'hedged' reads in DFSClient
> -----------------------------------
>
>                 Key: HDFS-5776
>                 URL: https://issues.apache.org/jira/browse/HDFS-5776
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt,
> HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt,
> HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
>
> This is a placeholder for the HDFS-related changes backported from
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be especially helpful for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" and
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we
> could export the relevant metric values into the client system (e.g. HBase's
> regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original
> fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative based on
> the above config items.
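The hedged-read decision described in the issue can be pictured with a small, self-contained model. It is a sketch only, not the patch's fetchBlockByteRangeSpeculative: replicas are plain `Callable` tasks with a simulated latency, `thresholdMillis` stands in for the threshold setting, the executor's size stands in for the thread-pool setting, and `HedgedReadSketch`, `hedgedRead`, `replica` and `demo` are hypothetical names. `ExecutorCompletionService` is used here as one reasonable way to take whichever request finishes first. Because each replica's latency is set explicitly, the outcome does not depend on machine load, which is the kind of deterministic setup the review asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class HedgedReadSketch {
    /**
     * Hypothetical hedged read: ask replicas.get(0) first; whenever no request
     * has answered within thresholdMillis, also ask the next replica. The first
     * successful answer wins and all other requests are cancelled.
     */
    static String hedgedRead(List<Callable<String>> replicas, long thresholdMillis,
                             ExecutorService pool) throws InterruptedException, ExecutionException {
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        List<Future<String>> inFlight = new ArrayList<>();
        int next = 0;      // index of the next replica we could hedge to
        int failures = 0;  // submitted requests that completed with an error
        try {
            inFlight.add(cs.submit(replicas.get(next++)));  // primary request
            while (true) {
                boolean canHedge = next < replicas.size();
                // Wait up to the threshold if another replica is available, else block.
                Future<String> done = canHedge
                        ? cs.poll(thresholdMillis, TimeUnit.MILLISECONDS)
                        : cs.take();
                if (done == null) {
                    // Threshold elapsed with no answer: send a hedged request.
                    inFlight.add(cs.submit(replicas.get(next++)));
                    continue;
                }
                try {
                    return done.get();  // first successful response wins
                } catch (ExecutionException e) {
                    failures++;
                    if (failures == inFlight.size()) {  // nothing left in flight
                        if (!canHedge) {
                            throw e;  // every replica failed
                        }
                        inFlight.add(cs.submit(replicas.get(next++)));  // try the next replica now
                    }
                }
            }
        } finally {
            for (Future<String> f : inFlight) {
                f.cancel(true);  // interrupt slower requests; no-op for finished ones
            }
        }
    }

    /** Hypothetical replica: answers with its name after a simulated latency. */
    static Callable<String> replica(String name, long latencyMillis) {
        return () -> {
            Thread.sleep(latencyMillis);
            return name;
        };
    }

    /** Usage example: primary "dn-0" with the given latency, hedge "dn-1" answers at once. */
    static String demo(long primaryLatencyMillis, long thresholdMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2);  // stands in for the pool-size setting
        try {
            return hedgedRead(List.of(replica("dn-0", primaryLatencyMillis), replica("dn-1", 0)),
                    thresholdMillis, pool);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(0, 500));    // fast primary: no hedged request is sent
        System.out.println(demo(2000, 50));  // slow primary: the hedged request should win
    }
}
```

With a fast primary the result comes from dn-0 and no second request is issued; with a 2000 ms primary and a 50 ms threshold, the hedged request to dn-1 answers first and the primary is cancelled. A pool of size 1 would queue the hedged request behind the slow primary, which is why the thread-pool size matters as much as the threshold.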