[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883823#comment-13883823
 ] 

Liang Xie commented on HDFS-5776:
---------------------------------

bq. Isn't the call to actualGetFromOneDataNode wrapped in a loop itself? I am 
talking about the while loop in fetchBlockByteRange. Will that not change the 
behavior? Maybe it is harmless, I am not sure. I just want us to be clear 
either way.
Yes. It doesn't change the overall behavior and it is harmless; indeed, it's 
safer than before.
In the old impl, the refetchToken/refetchEncryptionKey counters are shared by 
all nodes returned from chooseDataNode once a key/token exception has happened. 
That means if the first node consumes the retry quota and the second or third 
node then hits a key/token exception, the 
clearDataEncryptionKey/fetchBlockAt operations will not be called. It's a 
little unfair :)
In the new impl/patch, the second and later nodes get the same retry quota as 
the first node, which seems fairer to me.
Anyway, it doesn't change the normal path; it is just safer/fairer for the 
security-enabled scenario.
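
To make the quota difference concrete, here is a rough, self-contained model. 
It is not the DFSInputStream code: the class name, the method and the 
"failures per node" input are all made up for illustration, and a "refetch" 
is assumed to clear exactly one key/token failure.

```java
// Hypothetical model of the retry quota, NOT the real DFSInputStream logic.
// Each node rejects its first N attempts with a key/token exception; one
// refetch (think clearDataEncryptionKey/fetchBlockAt) clears one failure.
final class RetryQuotaSketch {

    /** Returns how many nodes end up being read successfully. */
    static int successfulReads(int[] tokenFailuresPerNode, boolean sharedQuota) {
        int shared = 1;                      // old style: one counter for all nodes
        int ok = 0;
        for (int failures : tokenFailuresPerNode) {
            int perNode = 1;                 // new style: fresh counter per node
            int remaining = failures;
            while (true) {
                if (remaining == 0) {        // no key/token exception: read succeeds
                    ok++;
                    break;
                }
                int quota = sharedQuota ? shared : perNode;
                if (quota <= 0) {
                    break;                   // quota used up: give up on this node
                }
                if (sharedQuota) {
                    shared--;
                } else {
                    perNode--;
                }
                remaining--;                 // the refetch clears one failure
            }
        }
        return ok;
    }
}
```

With the input {1, 1, 0}, the shared counter lets the first and third nodes 
through but gives up on the second one, while the per-node counter lets all 
three succeed.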

bq. The test looks like a stress test, i.e. we are hoping that some of the 
hedged requests will complete before the primary requests. We can create a 
separate Jira to write a deterministic unit test and it’s fine if someone else 
picks that up later.
Ok, I can track it later.

For patch v9 or v10, both are OK with me (though our internal branch uses the 
style without a limit), since my original wish is to reduce HBase's P99 and 
P99.9 latency, and the two patches make no difference on that point. V9 is 
safer, but the HDFS source code would probably need to be modified again if 
someone hits the hardcoded limit (which is difficult for a normal end user). 
IMHO, the committer who finally commits this JIRA can pick one. It would be a 
pity if people keep arguing about this style and hold up progress; that 
doesn't help the downstream HBase project at all.

> Support 'hedged' reads in DFSClient
> -----------------------------------
>
>                 Key: HDFS-5776
>                 URL: https://issues.apache.org/jira/browse/HDFS-5776
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, 
> HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, 
> HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
>
> This is a placeholder for the HDFS-related parts backported from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be helpful, especially for optimizing read 
> outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics 
> we can export the metric values of interest into the client system (e.g. 
> HBase's regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the 
> original fetchBlockByteRange or the newly introduced 
> fetchBlockByteRangeSpeculative based on the above config items.
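
For readers not familiar with the idea, the hedged read described above can be 
sketched roughly as follows. This is only an illustration under assumptions, 
not the patch itself: HedgedReadSketch, hedgedRead and the Callable-per-replica 
shape are hypothetical names, and a null pool stands in for "hedged reads 
disabled by config".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a hedged read; not the DFSClient implementation.
final class HedgedReadSketch {

    /**
     * Reads from replicas.get(0); if no result arrives within thresholdMillis,
     * also reads from replicas.get(1) and returns whichever finishes first.
     */
    static <T> T hedgedRead(List<Callable<T>> replicas, long thresholdMillis,
                            ExecutorService pool) throws Exception {
        if (pool == null) {
            // Hedging disabled: plain read on the calling thread.
            return replicas.get(0).call();
        }
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        List<Future<T>> futures = new ArrayList<>();
        futures.add(cs.submit(replicas.get(0)));
        // Wait up to the threshold for the primary request.
        Future<T> done = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
        if (done == null) {
            if (replicas.size() > 1) {
                // Primary is slow: start a second request on another replica.
                futures.add(cs.submit(replicas.get(1)));
            }
            done = cs.take();  // first request to complete wins
        }
        try {
            // Throws ExecutionException if the winning request failed;
            // a real client would fall back to the other request instead.
            return done.get();
        } finally {
            for (Future<T> f : futures) {
                f.cancel(true);  // stop the loser; no-op for finished tasks
            }
        }
    }
}
```

In this sketch the threshold and the pool play the roles of the two config 
items: the pool size bounds how many extra reads can be in flight, and the 
threshold decides how long the primary read may run before a second one is 
started.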



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
