[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879326#comment-13879326 ]
Arpit Agarwal commented on HDFS-5776: ------------------------------------- Hi Liang, thanks for this contribution to HDFS! I am still reviewing {{DFSInputStream#hedgedFetchBlockByteRange}} and the tests but here is some initial feedback. # Does it make sense for {{DFSClient#hedgedReadsThreadPool}} to be a static field? The concern is too many thread pools created by multiple clients on the same node. # Related to the previous - what do you think of not exposing the {{DFS_DFSCLIENT_HEDGED_READ_THREADPOOL_SIZE}} setting at all? Maybe we can just expose a boolean setting to enable it. The reason I prefer not to surface such settings is because it invites abuse (the concern is not with trusted apps like HBase). If we do expose this setting we should at least have an internal upper bound. # {{DFSClient#allowHedgedReads}} seems unnecessary since you can just use ({{hedgedReadsThreadPool == null}}). Also you can remove {{#enableHedgedReads}} and {{#disableHedgedReads}}. # For {{DEFAULT_DFSCLIENT_HEDGED_READ_THRESHOLD_MILLIS}} - can we add an inbuilt minimum delay to defeat applications that set it too low or even zero? # {{DFSInputStream#chooseDataNode}} - can the call to getBestNodeErrorString go inside the "{{if (failures >=...}}" clause? # {{#fetchBlockByteRange}} - can we rename {{retVal}} to something like {{addressPair}}? # Do we still need the {{while}} loop still there in {{actualGetFromOneDataNode}}? There is already a while loop in {{fetchBlockByteRange}} enclosing the call to {{actualGetFromOneDataNode}}. Now we have a nested loop. # Maybe I misunderstood the code flow but it looks like the way the while loops are nested it defeats the usage of {{refetchToken}} and {{refetchEncryptionKey}}. It looks like the intention was to limit the refetch to 1 across all retries, now we can refetch multiple times. # Related to the previous, {{#actualGetFromOneDataNode}}, line 1026, - sorry I did not understand why the try-catch was added around the call to {{fetchBlockAt}}. # {{#actualGetFromOneDataNode}}, line 1010 - we are using an exception to signal retry to the caller. It might be better to return a {{boolean}} instead. # {{#actualGetFromOneDataNode}}, line 1033 - the call to {{DFSClient.LOG.warn}} is deleted. Assume that was unintentional? # Nitpick - some lines have whitespace-only changes. > Support 'hedged' reads in DFSClient > ----------------------------------- > > Key: HDFS-5776 > URL: https://issues.apache.org/jira/browse/HDFS-5776 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client > Affects Versions: 3.0.0 > Reporter: Liang Xie > Assignee: Liang Xie > Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, > HDFS-5776-v5.txt, HDFS-5776.txt > > > This is a placeholder of hdfs related stuff backport from > https://issues.apache.org/jira/browse/HBASE-7509 > The quorum read ability should be helpful especially to optimize read outliers > we can utilize "dfs.dfsclient.quorum.read.threshold.millis" & > "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read > ability from client side(e.g. HBase), and by using DFSQuorumReadMetrics, we > could export the interested metric valus into client system(e.g. HBase's > regionserver metric). > The core logic is in pread code path, we decide to goto the original > fetchBlockByteRange or the new introduced fetchBlockByteRangeSpeculative per > the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)