[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883685#comment-13883685 ]

Arpit Agarwal commented on HDFS-5776:
-------------------------------------

{quote}
Yes, that would be perfect sometimes, but it does not work for the HBase 
scenario (Stack's consideration above is a good one), since we made the pool 
"static". From the per-client view, it is more flexible if we provide 
instance-level disable/enable APIs, so that we can use the hbase shell script 
to control the switch per DFS client instance; that would be cooler.
{quote}
Okay.

{quote}
In actualGetFromOneDataNode(), refetchToken/refetchEncryptionKey are 
initialized outside the while (true) loop (see lines 993-996). When we hit an 
InvalidEncryptionKeyException/InvalidBlockTokenException, refetchToken and 
refetchEncryptionKey are decremented by 1 (see the refetchEncryptionKey-- and 
refetchToken-- statements). If the exception happens again, the checks will 
definitely fail (see "e instanceof InvalidEncryptionKeyException && 
refetchEncryptionKey > 0" and "refetchToken > 0"), so we go to the else 
clause, which executes:
{quote}
Isn't the call to {{actualGetFromOneDataNode}} wrapped in a loop itself? I am 
talking about the while loop in {{fetchBlockByteRange}}. Will that not change 
the behavior? Maybe it is harmless, I am not sure. I just want us to be clear 
either way.
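
To illustrate the question, here is a minimal, self-contained sketch of the 
pattern (names mirror {{DFSInputStream}} but the bodies are placeholders, not 
the actual code): the refetch counters live inside 
{{actualGetFromOneDataNode}}, so an outer retry loop that re-enters the method 
resets them on every iteration.

{code:java}
// Minimal sketch of the retry-counter pattern under discussion.
// Names mirror DFSInputStream, but the bodies are placeholders, not the patch.
class RetryCounterSketch {
  static class InvalidTokenException extends Exception {}   // stand-in exception

  // Analogous to the while loop in fetchBlockByteRange (the outer loop).
  void fetchBlockByteRange() throws InvalidTokenException {
    int outerAttempts = 3;                 // arbitrary bound for the sketch
    while (true) {
      try {
        actualGetFromOneDataNode();
        return;
      } catch (InvalidTokenException e) {
        if (--outerAttempts <= 0) {
          throw e;
        }
        // real code would refetch a token / pick another datanode here, then
        // re-enter actualGetFromOneDataNode(), resetting its counters below
      }
    }
  }

  // Analogous to actualGetFromOneDataNode (the inner loop).
  void actualGetFromOneDataNode() throws InvalidTokenException {
    int refetchToken = 1;                  // initialized outside while (true)
    while (true) {
      try {
        readFromDataNode();
        return;
      } catch (InvalidTokenException e) {
        if (refetchToken > 0) {
          refetchToken--;                  // retry once within this call
        } else {
          throw e;                         // second failure: give up here,
        }                                  // but the outer loop may retry again
      }
    }
  }

  void readFromDataNode() throws InvalidTokenException {
    // placeholder for the actual block read
  }
}
{code}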

Thanks for adding the thread count limit. If we need more than 128 threads per 
client process just for backup reads, we (HDFS) need to think about proper async 
RPC. Leaving the pool unbounded ignores the point that hedged reads can double 
the DN load on an already loaded cluster. Also, a 1ms lower bound for the delay 
is as good as zero, but as long as we have a thread count limit I am okay.
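
As a concrete (illustrative) example of the knobs being discussed, this is how 
a client might set a modest pool size and a meaningful delay. The key names 
come from the issue description and may differ in the committed patch; the 
values are arbitrary.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HedgedReadConfigExample {
  public static void main(String[] args) throws Exception {
    // Illustrative only: key names are taken from the issue description and
    // may differ in the committed patch; the values are arbitrary examples.
    Configuration conf = new Configuration();
    // Presumably a pool size of 0 leaves hedged reads disabled; keep it well
    // below the 128-threads-per-process ceiling discussed above.
    conf.setInt("dfs.dfsclient.quorum.read.threadpool.size", 16);
    // How long to wait for the primary read before firing a hedged request.
    // A value near 1ms is effectively "always hedge", so pick something
    // meaningful for the workload.
    conf.setLong("dfs.dfsclient.quorum.read.threshold.millis", 50);
    FileSystem fs = FileSystem.get(conf);
    // reads through fs now pick up the hedged-read settings above
    fs.close();
  }
}
{code}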

Minor points that don't need to hold up the checkin:
# The test looks like a stress test, i.e. we are hoping that some of the hedged 
requests will complete before the primary requests. We can create a separate 
Jira to write a deterministic unit test and it’s fine if someone else picks 
that up later.
# A couple of points from my initial feedback (#10, #12) were missed, but 
again, not worth holding up the checkin.

Other than clarifying the loop behavior, the v9 patch looks fine to me.

Thanks again for working with the feedback Liang, this is a nice capability to 
have in HDFS.

> Support 'hedged' reads in DFSClient
> -----------------------------------
>
>                 Key: HDFS-5776
>                 URL: https://issues.apache.org/jira/browse/HDFS-5776
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, 
> HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, 
> HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
>
> This is a placeholder for backporting the HDFS-related parts of 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be helpful, especially for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
> can export the metrics of interest into the client system (e.g. HBase's 
> regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the 
> original fetchBlockByteRange or the newly introduced 
> fetchBlockByteRangeSpeculative based on the above config items.
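
As a rough illustration of the dispatch described in the last paragraph above, 
a self-contained sketch (method names mirror the description; the bodies are 
placeholders, not the patch's actual code):

{code:java}
// Sketch of the pread dispatch described above; not the patch's actual code.
class PreadDispatchSketch {
  // From dfs.dfsclient.quorum.read.threadpool.size (assumed: 0 means disabled).
  private final int hedgedReadPoolSize;

  PreadDispatchSketch(int hedgedReadPoolSize) {
    this.hedgedReadPoolSize = hedgedReadPoolSize;
  }

  // pread path: use the hedged implementation only when a pool is configured.
  void pread(long start, long len, byte[] buf) {
    if (hedgedReadPoolSize > 0) {
      fetchBlockByteRangeSpeculative(start, len, buf);
    } else {
      fetchBlockByteRange(start, len, buf);
    }
  }

  void fetchBlockByteRange(long start, long len, byte[] buf) {
    // placeholder: original single-datanode positional read
  }

  void fetchBlockByteRangeSpeculative(long start, long len, byte[] buf) {
    // placeholder: also fires a hedged request after the configured delay
  }
}
{code}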


