[ 
https://issues.apache.org/jira/browse/HDFS-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127273#comment-16127273
 ] 

Wei-Chiu Chuang commented on HDFS-11738:
----------------------------------------

Hello [~vinayrpet], thanks for the patch!
I reviewed it and I think I grasp the gist. IIUC, the 
client would get stuck in {{chooseDataNode()}} in such a scenario? 

The method {{chooseDataNode}} should be annotated with {{@Nullable}} to indicate 
that a null return value is valid.

It seems the following code
{code:title=DFSInputStream#hedgedFetchBlockByteRange}
          chosenNode = getBestNodeDNAddrPair(block, ignored);
          if (chosenNode == null) {
            chosenNode = chooseDataNode(block, ignored, false);
          }
{code}
can be simplified to
{code}
chosenNode = chooseDataNode(block, ignored, false);
{code}
I ran the patch with the simplified code and it passed as well.
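
To illustrate why the two forms should behave the same, here is a minimal sketch. It assumes that {{chooseDataNode}} with its last argument set to false first tries the best known node and returns null instead of refetching locations. All names and types below ({{ChooseNodeSketch}}, the String node names, {{refetchLocations}}, the local {{@Nullable}}) are hypothetical stand-ins, not the actual {{DFSInputStream}} code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical, simplified stand-in for the node selection logic. */
final class ChooseNodeSketch {
  /** Local stand-in for a real nullability annotation. */
  @interface Nullable {}

  private final List<String> locations; // known replica locations, best first
  int refetchCount = 0;                 // number of (simulated) location refetches

  ChooseNodeSketch(List<String> locations) {
    this.locations = new ArrayList<>(locations);
  }

  /** Returns the first location that is not ignored, or null if none is left. */
  @Nullable
  String getBestNode(Collection<String> ignored) {
    for (String dn : locations) {
      if (!ignored.contains(dn)) {
        return dn;
      }
    }
    return null;
  }

  /** Simulated refetch: gives up after three attempts, like a bounded retry. */
  void refetchLocations() {
    if (++refetchCount > 3) {
      throw new IllegalStateException("could not obtain block locations");
    }
  }

  /** Assumed behavior: refetch only when refetchIfRequired is true. */
  @Nullable
  String chooseDataNode(Collection<String> ignored, boolean refetchIfRequired) {
    while (true) {
      String best = getBestNode(ignored);
      if (best != null) {
        return best;
      }
      if (!refetchIfRequired) {
        return null; // caller must handle null
      }
      refetchLocations();
    }
  }

  /** The form currently in the patch. */
  @Nullable
  String original(Collection<String> ignored) {
    String chosen = getBestNode(ignored);
    if (chosen == null) {
      chosen = chooseDataNode(ignored, false);
    }
    return chosen;
  }

  /** The simplified form suggested above. */
  @Nullable
  String simplified(Collection<String> ignored) {
    return chooseDataNode(ignored, false);
  }

  public static void main(String[] args) {
    ChooseNodeSketch s = new ChooseNodeSketch(Arrays.asList("dn1", "dn2", "dn3"));
    System.out.println(s.original(Arrays.asList("dn1", "dn2")));
    System.out.println(s.simplified(Arrays.asList("dn1", "dn2")));
  }
}
```

Under that assumption the explicit {{getBestNodeDNAddrPair}} call is redundant, because the non-refetching {{chooseDataNode}} already makes the same attempt first.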

The timeout of 30 seconds seems a little short. On my laptop this test takes 
approximately 20 seconds, so on a busy host the unit test might run over the 
timeout. Alternatively, would it be reasonable to reduce some of the wait time, 
e.g. reduce dfs.client.retry.window.base from 3000 to 1000?
{code}
conf.setInt(HdfsClientConfigKeys.Retry.WINDOW_BASE_KEY, 1000);
{code}
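
For a rough feel of what the smaller window would buy, here is a back-of-the-envelope sketch. It assumes the wait before each refetch is {{base * failures + base * (failures + 1) * r}} with r uniform in [0, 1), which is my recollection of the client's retry logic and should be checked against the source, and it assumes 3 retries as in the issue description. The class and method names are hypothetical.

```java
/** Rough estimate of the time spent waiting between location refetches. */
final class RetryWindowEstimate {

  // Assumed per-retry wait (milliseconds):
  //   base * failures + base * (failures + 1) * r, with r uniform in [0, 1)

  /** Expected total wait over all retries, using r = 0.5 on average. */
  static double expectedTotalMillis(int base, int retries) {
    double total = 0;
    for (int failures = 0; failures < retries; failures++) {
      total += base * failures + base * (failures + 1) * 0.5;
    }
    return total;
  }

  /** Upper bound on the total wait over all retries (r approaching 1). */
  static double maxTotalMillis(int base, int retries) {
    double total = 0;
    for (int failures = 0; failures < retries; failures++) {
      total += base * failures + base * (failures + 1);
    }
    return total;
  }

  public static void main(String[] args) {
    for (int base : new int[] {3000, 1000}) {
      System.out.printf("base=%d ms: expected %.0f ms, at most %.0f ms%n",
          base, expectedTotalMillis(base, 3), maxTotalMillis(base, 3));
    }
  }
}
```

Under these assumptions a base of 3000 gives an expected wait of about 18 seconds and a worst case of 27 seconds, which leaves little headroom under a 30 second timeout. A base of 1000 gives about 6 seconds expected and at most 9 seconds.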

> Hedged pread takes more time when block moved from initial locations
> --------------------------------------------------------------------
>
>                 Key: HDFS-11738
>                 URL: https://issues.apache.org/jira/browse/HDFS-11738
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>         Attachments: HDFS-11738-01.patch, HDFS-11738-02.patch, 
> HDFS-11738-03.patch
>
>
> Scenario: 
> Same as HDFS-11708.
> During a hedged read, 
> 1. The first two locations fail to read the data in hedged mode.
> 2. chooseData refetches locations and adds a future to read from DN3.
> 3. After adding the future for DN3, the main thread goes on refetching locations in a 
> loop and is stuck there until all 3 retries to fetch locations are exhausted, which 
> consumes ~20 seconds with exponential retry time.



