[ 
https://issues.apache.org/jira/browse/HDFS-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065630#comment-14065630
 ] 

Ashwin Shankar commented on HDFS-6268:
--------------------------------------

Hi [~andrew.wang],
After applying your patch in our cluster, we see that all read requests for a 
block still go to the same rack-local replica when there is no node-local 
replica. This resulted in some containers getting stuck in the LOCALIZING phase 
and eventually failing.
Looking at the patch, I see you are seeding the RNG with what is basically the 
block ID, which gives the same pseudo-random order for a given block. Hence the 
same rack-local replica gets bombarded for a block (when there is no node-local 
replica).
Do you see any problem if we drop the seed and randomize the rack-local nodes 
for a block?
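
To illustrate the concern, here is a minimal sketch of the difference between a 
block-ID-seeded shuffle and an unseeded one. It is not the patch's code: the 
class name, helper methods, datanode names, and block ID are all made up for 
illustration, and java.util.Collections.shuffle stands in for whatever 
ordering the patch actually applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Illustrative only: none of these names come from the HDFS patch.
class ReplicaShuffleSketch {

    // Hypothetical helper: order rack-local replicas with an RNG seeded by the
    // block ID. The same block ID always produces the same order.
    static List<String> seededOrder(List<String> rackLocal, long blockId) {
        List<String> copy = new ArrayList<>(rackLocal);
        Collections.shuffle(copy, new Random(blockId));
        return copy;
    }

    // Hypothetical alternative: no fixed seed, so the order can differ per call.
    static List<String> unseededOrder(List<String> rackLocal) {
        List<String> copy = new ArrayList<>(rackLocal);
        Collections.shuffle(copy, new Random());
        return copy;
    }

    // Counts how many distinct replicas end up first in the list across
    // repeated reads of a single block.
    static int distinctFirstPicks(List<String> rackLocal, long blockId,
                                  int reads, boolean seeded) {
        Set<String> firsts = new HashSet<>();
        for (int i = 0; i < reads; i++) {
            List<String> order = seeded
                    ? seededOrder(rackLocal, blockId)
                    : unseededOrder(rackLocal);
            firsts.add(order.get(0));
        }
        return firsts.size();
    }

    public static void main(String[] args) {
        List<String> rackLocal = Arrays.asList("dn1", "dn2", "dn3"); // made-up nodes
        long blockId = 1073741825L;                                  // made-up block ID

        // With the block ID as seed, every read picks the same first replica,
        // so this count is always 1.
        System.out.println("seeded:   "
                + distinctFirstPicks(rackLocal, blockId, 1000, true));

        // Without a fixed seed, reads are spread across the replicas.
        System.out.println("unseeded: "
                + distinctFirstPicks(rackLocal, blockId, 1000, false));
    }
}
```

In this sketch the seeded variant keeps the order stable per block, which 
matches the behavior described above, at the cost of sending every 
non-node-local read of that block to one replica.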

> Better sorting in NetworkTopology#pseudoSortByDistance when no local node is 
> found
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-6268
>                 URL: https://issues.apache.org/jira/browse/HDFS-6268
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.4.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: hdfs-6268-1.patch, hdfs-6268-2.patch, hdfs-6268-3.patch, 
> hdfs-6268-4.patch, hdfs-6268-5.patch, hdfs-6268-branch-2.001.patch
>
>
> In NetworkTopology#pseudoSortByDistance, if no local node is found, it will 
> always place the first rack local node in the list in front.
> This became an issue when a dataset was loaded from a single datanode. This 
> datanode ended up being the first replica for all the blocks in the dataset. 
> When running an Impala query, the non-local reads when reading past a block 
> boundary were all hitting this node, meaning massive load skew.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
