[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

Jing Zhao (JIRA) Thu, 23 Jan 2014 11:51:33 -0800

    [ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880279#comment-13880279 ]


Jing Zhao commented on HDFS-5776:
---------------------------------

# In DFSClient, I agree with Arpit that we should remove the allowHedgedReads 
field and the enable/disable methods. In the current code, whether hedged reads 
are enabled is determined by the initial setting of hedgedReadThreadPool. If 
we provide these extra enable/disable methods, what happens if a user of 
DFSClient sets the thread pool size to 0 and later calls enableHedgedReads? 
Unless we have a clear use case for the enable/disable methods, I don't think 
we need to provide this flexibility here.
An alternative is to have an "Allow-Hedged-Reads" configuration: if it is set 
to true, we load the thread pool size and the threshold time. We would provide 
an isHedgedReadsEnabled method but no enable/disable methods. I think this may 
be easier for users to understand.
# Is the following scenario possible? In hedgedFetchBlockByteRange, if we hit 
the timeout for the first DN, we add that DN to the ignore list and call 
chooseDataNode again. If the first DN is the only DN we can read from, we get an 
IOException from bestNode. We then run into a loop where we keep trying to get 
another DN multiple times (some NN RPC calls may even be fired), and during this 
process the first DN may well return the data. In this scenario, performance 
could end up worse than with no hedging at all. So I think we should not trigger 
a hedged read if we cannot (easily) find a second DN to read from.
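The guard proposed in the second point could look roughly like the following Java sketch. It is not the patch under review: the class HedgedReadSketch, the ReplicaFetcher interface, and the assumption that the candidate DNs are already known before the read starts are all hypothetical. It waits up to the threshold for the first DN, starts a second read only if another DN is already available, and otherwise keeps waiting on the first DN instead of looping through DN selection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the HDFS-5776 patch: replicas are plain DN names
// and ReplicaFetcher stands in for the real block-reading code.
public class HedgedReadSketch {
    /** Hypothetical stand-in for reading a byte range from one DataNode. */
    public interface ReplicaFetcher {
        byte[] fetch(String dataNode) throws Exception;
    }

    private final ExecutorService pool;
    private final long thresholdMillis;

    public HedgedReadSketch(ExecutorService pool, long thresholdMillis) {
        this.pool = pool;
        this.thresholdMillis = thresholdMillis;
    }

    public byte[] read(List<String> replicas, ReplicaFetcher fetcher)
            throws InterruptedException, ExecutionException {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("no replicas to read from");
        }
        CompletionService<byte[]> cs = new ExecutorCompletionService<>(pool);
        List<Future<byte[]>> outstanding = new ArrayList<>();
        String first = replicas.get(0);
        outstanding.add(cs.submit(() -> fetcher.fetch(first)));
        try {
            // Fast path: the first DN answers within the threshold.
            Future<byte[]> done = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
            if (done != null) {
                return done.get();
            }
            // Only hedge when a second DN is already known; otherwise do not
            // re-run DN selection, just keep waiting on the first request.
            if (replicas.size() > 1) {
                String second = replicas.get(1);
                outstanding.add(cs.submit(() -> fetcher.fetch(second)));
            }
            // The first request to complete decides the result (failover on
            // error is omitted to keep the sketch short).
            return cs.take().get();
        } finally {
            // Cancel whatever is still running; a no-op for completed futures.
            for (Future<byte[]> f : outstanding) {
                f.cancel(true);
            }
        }
    }
}
```

In this sketch replica selection happens before hedging, so a timed-out first read never causes extra NN RPC calls; a fuller version would also fail over to the other DN when the first request errors out.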

> Support 'hedged' reads in DFSClient
> -----------------------------------
>
>                 Key: HDFS-5776
>                 URL: https://issues.apache.org/jira/browse/HDFS-5776
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
> HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776.txt
>
>
> This is a placeholder for the HDFS-related pieces to be backported from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be especially helpful for optimizing read 
> outliers. We can use "dfs.dfsclient.quorum.read.threshold.millis" and 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, 
> we could export the metric values of interest into the client system (e.g. 
> HBase's regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the 
> original fetchBlockByteRange or the newly introduced 
> fetchBlockByteRangeSpeculative based on the above config items.
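The config-driven dispatch described above can be sketched in Java as follows, in line with the first point of the comment: enablement is fixed once at construction, and there is only an isHedgedReadsEnabled query with no enable/disable methods. The two config keys and the method names fetchBlockByteRange and fetchBlockByteRangeSpeculative come from the description; the class HedgedReadConfigSketch, the Map standing in for Hadoop's Configuration, and the default values are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a plain Map stands in for Hadoop's Configuration, and
// the two fetch methods are placeholders for the real pread code paths.
public class HedgedReadConfigSketch {
    static final String THRESHOLD_KEY = "dfs.dfsclient.quorum.read.threshold.millis";
    static final String POOL_SIZE_KEY = "dfs.dfsclient.quorum.read.threadpool.size";

    private final ExecutorService hedgedReadPool; // null means hedged reads are off
    private final long thresholdMillis;

    public HedgedReadConfigSketch(Map<String, String> conf) {
        // Assumed defaults: pool size 0 (disabled); the 500 ms threshold is arbitrary.
        int poolSize = Integer.parseInt(conf.getOrDefault(POOL_SIZE_KEY, "0"));
        this.thresholdMillis = Long.parseLong(conf.getOrDefault(THRESHOLD_KEY, "500"));
        // Decided once here; there are deliberately no enable/disable setters.
        this.hedgedReadPool = poolSize > 0 ? Executors.newFixedThreadPool(poolSize) : null;
    }

    public boolean isHedgedReadsEnabled() {
        return hedgedReadPool != null;
    }

    public long getThresholdMillis() {
        return thresholdMillis;
    }

    /** pread entry point: choose the code path from the fixed setting. */
    public String pread(long position, int length) {
        return isHedgedReadsEnabled()
                ? fetchBlockByteRangeSpeculative(position, length)
                : fetchBlockByteRange(position, length);
    }

    // Placeholders that only report which path was taken.
    private String fetchBlockByteRange(long position, int length) {
        return "fetchBlockByteRange";
    }

    private String fetchBlockByteRangeSpeculative(long position, int length) {
        return "fetchBlockByteRangeSpeculative";
    }

    public void close() {
        if (hedgedReadPool != null) {
            hedgedReadPool.shutdown();
        }
    }
}
```

Because the setting cannot change after construction, the inconsistent state raised in the first point (pool size 0, then a later enableHedgedReads call) cannot arise in this sketch.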



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
