[ https://issues.apache.org/jira/browse/HDFS-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127273#comment-16127273 ]
Wei-Chiu Chuang commented on HDFS-11738:
----------------------------------------

Hello [~vinayrpet], thanks for the patch!

I reviewed the patch and I think I grasp the gist of it. IIUC, the client would get stuck in {{chooseDataNode()}} in such a scenario?

The method {{chooseDataNode}} should be annotated with {{@Nullable}} to indicate that a null return value is valid.

It seems the following code
{code:title=DFSInputStream#hedgedFetchBlockByteRange}
chosenNode = getBestNodeDNAddrPair(block, ignored);
if (chosenNode == null) {
  chosenNode = chooseDataNode(block, ignored, false);
}
{code}
can be simplified to
{code}
chosenNode = chooseDataNode(block, ignored, false);
{code}
I ran the patch with the simplified code and the test passed as well.

The timeout of 30 seconds seems a little short. On my laptop this test takes approximately 20 seconds, so on a busy host the unit test might run past the timeout. Alternatively, would it be reasonable to reduce some of the wait time, e.g. reduce dfs.client.retry.window.base from 3000 to 1000?
{code}
conf.setInt(HdfsClientConfigKeys.Retry.WINDOW_BASE_KEY, 1000);
{code}

> Hedged pread takes more time when block moved from initial locations
> --------------------------------------------------------------------
>
>                 Key: HDFS-11738
>                 URL: https://issues.apache.org/jira/browse/HDFS-11738
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>         Attachments: HDFS-11738-01.patch, HDFS-11738-02.patch, HDFS-11738-03.patch
>
>
> Scenario:
> Same as HDFS-11708.
> During a hedged read,
> 1. The first two locations fail to read the data in hedged mode.
> 2. chooseData refetches locations and adds a future to read from DN3.
> 3. After adding the future for DN3, the main thread goes back to refetching locations in a loop and stays stuck there until all 3 retries to fetch locations are exhausted, which consumes ~20 seconds with exponential retry time.
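
To put rough numbers on the ~20 seconds in the scenario and on the suggested change of dfs.client.retry.window.base from 3000 to 1000, here is a back-of-the-envelope sketch. It assumes the wait before each location refetch is modeled as {{windowBase * failures + windowBase * (failures + 1) * r}} with {{r}} uniform in [0, 1); that formula is an assumption about the client's backoff and should be checked against the actual {{DFSInputStream}} code. The class and method names are hypothetical and exist only for this illustration.

```java
/**
 * Rough estimate of the total time spent waiting between block-location refetches.
 * ASSUMPTION: the wait before the refetch that follows `failures` earlier failures is
 *   windowBase * failures + windowBase * (failures + 1) * r,  r uniform in [0, 1).
 * Class and method names are hypothetical, for illustration only.
 */
final class RefetchBackoffEstimate {

    /** Expected total wait in ms, using the mean value 0.5 for the random factor. */
    static double expectedTotalMillis(int windowBaseMillis, int retries) {
        double total = 0;
        for (int failures = 0; failures < retries; failures++) {
            total += windowBaseMillis * (double) failures
                   + windowBaseMillis * (failures + 1) * 0.5;
        }
        return total;
    }

    /** Upper bound on the total wait in ms, taking the random factor as 1. */
    static long worstCaseTotalMillis(int windowBaseMillis, int retries) {
        long total = 0;
        for (int failures = 0; failures < retries; failures++) {
            total += (long) windowBaseMillis * failures
                   + (long) windowBaseMillis * (failures + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        int retries = 3; // "all 3 retries" from the issue description
        for (int windowBase : new int[] {3000, 1000}) {
            System.out.printf("window base %d ms: expected ~%.0f ms, worst case %d ms%n",
                windowBase,
                expectedTotalMillis(windowBase, retries),
                worstCaseTotalMillis(windowBase, retries));
        }
    }
}
```

Under these assumptions, a window base of 3000 ms gives an expected total wait of about 18 seconds (27 seconds at worst), which is in line with the ~20 seconds reported, while 1000 ms gives about 6 seconds (9 seconds at worst) and leaves more headroom under a 30-second test timeout.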