[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879326#comment-13879326
 ] 

Arpit Agarwal commented on HDFS-5776:
-------------------------------------

Hi Liang, thanks for this contribution to HDFS!

I am still reviewing {{DFSInputStream#hedgedFetchBlockByteRange}} and the tests 
but here is some initial feedback.
 
# Does it make sense for {{DFSClient#hedgedReadsThreadPool}} to be a static 
field? The concern is too many thread pools created by multiple clients on the 
same node.
# Related to the previous - what do you think of not exposing the 
{{DFS_DFSCLIENT_HEDGED_READ_THREADPOOL_SIZE}} setting at all? Maybe we can just 
expose a boolean setting to enable it. The reason I prefer not to surface such 
settings is because it invites abuse (the concern is not with trusted apps like 
HBase). If we do expose this setting we should at least have an internal upper 
bound.
# {{DFSClient#allowHedgedReads}} seems unnecessary since you can just use 
({{hedgedReadsThreadPool == null}}). Also you can remove {{#enableHedgedReads}} 
and {{#disableHedgedReads}}.
# For {{DEFAULT_DFSCLIENT_HEDGED_READ_THRESHOLD_MILLIS}} - can we add an 
inbuilt minimum delay to defeat applications that set it too low or even zero?
# {{DFSInputStream#chooseDataNode}} - can the call to getBestNodeErrorString go 
inside the "{{if (failures >=...}}" clause?
# {{#fetchBlockByteRange}} - can we rename {{retVal}} to something like 
{{addressPair}}?
# Do we still need the {{while}} loop still there in 
{{actualGetFromOneDataNode}}? There is already a while loop in 
{{fetchBlockByteRange}} enclosing the call to {{actualGetFromOneDataNode}}. Now 
we have a nested loop.
# Maybe I misunderstood the code flow but it looks like the way the while loops 
are nested it defeats the usage of {{refetchToken}} and 
{{refetchEncryptionKey}}. It looks like the intention was to limit the refetch 
to 1 across all retries, now we can refetch multiple times.
# Related to the previous, {{#actualGetFromOneDataNode}}, line 1026, - sorry I 
did not understand why the try-catch was added around the call to 
{{fetchBlockAt}}.
# {{#actualGetFromOneDataNode}}, line 1010 - we are using an exception to 
signal retry to the caller. It might be better to return a {{boolean}} instead.
# {{#actualGetFromOneDataNode}}, line 1033 - the call to {{DFSClient.LOG.warn}} 
is deleted. Assume that was unintentional?
# Nitpick - some lines have whitespace-only changes.


> Support 'hedged' reads in DFSClient
> -----------------------------------
>
>                 Key: HDFS-5776
>                 URL: https://issues.apache.org/jira/browse/HDFS-5776
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
> HDFS-5776-v5.txt, HDFS-5776.txt
>
>
> This is a placeholder of hdfs related stuff backport from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be helpful especially to optimize read outliers
> we can utilize "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read 
> ability from client side(e.g. HBase), and by using DFSQuorumReadMetrics, we 
> could export the interested metric valus into client system(e.g. HBase's 
> regionserver metric).
> The core logic is in pread code path, we decide to goto the original 
> fetchBlockByteRange or the new introduced fetchBlockByteRangeSpeculative per 
> the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to