[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877996#comment-13877996
 ] 

Enis Soztutar commented on HDFS-5776:
-------------------------------------

Nice work, Liang. You beat us to implementing this! 
A couple of higher level comments:
 
 - The numbers look very promising. 
http://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf
 slides 50+ give some numbers for the increase in RPCs caused by this. It would 
be great to get numbers on the extra RPC load for this patch as well. 
 - Regarding naming, the FB branch calls this "quorum reads" (which is 
misleading), and Google calls it "backup requests". We preferred the names 
"parallel" and "parallel with delay" in the design doc for HBASE-10070 (a 
similar feature in HBase), and in the code we ended up calling it "RPC with 
fallback". It would be good to use consistent naming across HDFS and HBase, 
but I am not sure which name is better. 
 - In getFirstToComplete(), sleeping is not the best practice. It adds an 
arbitrary delay before returning, and configuring the sleep timeout is 
non-trivial. Can we do something like ExecutorService.invokeAny(), 
wait/notify, or a CountDownLatch design? See 
http://stackoverflow.com/questions/117690/wait-until-any-of-futuret-is-done
 - Again in Jeff Dean's slides, they note that sending a 3rd request with a 
larger timeout does not buy much. I wonder whether we should limit this to 
only 2 requests. Without real-world usage it will be hard to choose one 
way or the other. 
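
To make the getFirstToComplete() suggestion concrete, here is a minimal sketch 
of waiting on "whichever request finishes first" without a sleep loop, using 
java.util.concurrent.ExecutorCompletionService. It is not taken from the 
patch: the class name HedgedReadSketch, the method hedgedCall, and its 
parameters are hypothetical, and the two Callables stand in for reads against 
two different replicas. It is limited to two requests, in line with the last 
point above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from HDFS-5776.
public class HedgedReadSketch {

    /**
     * Starts the primary request. If it has not completed within
     * thresholdMillis, also starts the backup request and returns the
     * result of whichever request completes first.
     */
    static <T> T hedgedCall(ExecutorService pool,
                            Callable<T> primary,
                            Callable<T> backup,
                            long thresholdMillis)
            throws InterruptedException, ExecutionException {
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        List<Future<T>> inFlight = new ArrayList<>();
        try {
            inFlight.add(cs.submit(primary));
            // Blocks until the primary completes or the threshold elapses.
            // There is no polling interval to tune.
            Future<T> done = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
            if (done == null) {
                // Primary is slow: issue the backup request and wait for
                // the first of the two to complete.
                inFlight.add(cs.submit(backup));
                done = cs.take();
            }
            // Simplification: if the first completed request failed, the
            // failure is propagated instead of waiting for the other one.
            return done.get();
        } finally {
            // Cancel whatever is still running; a no-op for finished tasks.
            for (Future<T> f : inFlight) {
                f.cancel(true);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Slow primary, fast backup, 50 ms threshold.
            String result = hedgedCall(
                pool,
                () -> { Thread.sleep(2000); return "primary"; },
                () -> "backup",
                50);
            System.out.println("served by: " + result);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The threshold here plays the role of the "delay" in "parallel with delay": a 
threshold of 0 would approximate fully parallel requests, at the cost of 
roughly doubling the RPC count.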


> Support 'hedged' reads in DFSClient
> -----------------------------------
>
>                 Key: HDFS-5776
>                 URL: https://issues.apache.org/jira/browse/HDFS-5776
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
> HDFS-5776.txt
>
>
> This is a placeholder for the HDFS-related parts backported from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be helpful, especially for optimizing read 
> outliers. We can use "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics 
> we can export the metric values of interest into the client system (e.g. 
> HBase's regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the 
> original fetchBlockByteRange or the newly introduced 
> fetchBlockByteRangeSpeculative based on the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
