[ https://issues.apache.org/jira/browse/HDFS-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987207#comment-13987207 ]

Aaron T. Myers commented on HDFS-6268:
--------------------------------------

bq. ATM, I read this code and had the same thought. It would be cleaner and 
less corner-casey if we first binned by network distance, then randomized each 
bin. I didn't make this change since this is a hot code path and it'd be a bit 
slower, but since we're typically dealing with 3 replicas, I can't imagine it 
making a big difference. We could also potentially fold in the decom/stale 
state too, and get better locality for these edge cases. If you agree with this 
assessment, I'll redo this patch as per above.

This is precisely my thinking as well. It's not really obvious to me that this 
would actually be measurably slower at all. A few extra small, short-lived 
objects shouldn't make a noticeable difference, I wouldn't think.
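A minimal sketch of the "bin by network distance, then randomize each bin" idea discussed above, written in Java since that is Hadoop's language. Everything here is hypothetical: the class `DistanceBinSort`, the helpers `sortByDistance`, `replicaKey` and `rackOf`, the `/rack/node` path strings, the assumed distances (0 for the same node, 2 for the same rack, 4 for another rack) and the `STALE_PENALTY` used to fold stale or decommissioning state into the same sort key. It is not the NetworkTopology code or either attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

// Hypothetical sketch: not Hadoop's NetworkTopology API.
final class DistanceBinSort {
    // Assumed penalty that pushes stale/decommissioning nodes behind all
    // healthy ones (must exceed the largest distance in use).
    static final int STALE_PENALTY = 100;

    // Returns a new list ordered by ascending sort key; replicas that share
    // a key (same bin) end up in random order relative to each other.
    static <T> List<T> sortByDistance(List<T> replicas, ToIntFunction<T> sortKey, Random rand) {
        // TreeMap iterates its bins in ascending key order.
        TreeMap<Integer, List<T>> bins = new TreeMap<>();
        for (T replica : replicas) {
            bins.computeIfAbsent(sortKey.applyAsInt(replica), k -> new ArrayList<>())
                .add(replica);
        }
        List<T> sorted = new ArrayList<>(replicas.size());
        for (List<T> bin : bins.values()) {
            Collections.shuffle(bin, rand); // break ties randomly within a bin
            sorted.addAll(bin);
        }
        return sorted;
    }

    // Assumed distances: 0 = same node, 2 = same rack, 4 = other rack,
    // plus STALE_PENALTY when the node is marked unhealthy.
    static int replicaKey(String reader, String node, boolean unhealthy) {
        int distance;
        if (node.equals(reader)) {
            distance = 0;
        } else if (rackOf(node).equals(rackOf(reader))) {
            distance = 2;
        } else {
            distance = 4;
        }
        return unhealthy ? distance + STALE_PENALTY : distance;
    }

    // "/rackA/dn1" -> "/rackA"
    static String rackOf(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    public static void main(String[] args) {
        String reader = "/rackA/client"; // reader holds no replica itself
        List<String> unhealthy = Arrays.asList("/rackA/dn3");
        List<String> replicas =
            Arrays.asList("/rackA/dn1", "/rackB/dn2", "/rackA/dn3", "/rackA/dn4");
        List<String> sorted = sortByDistance(replicas,
            n -> replicaKey(reader, n, unhealthy.contains(n)), new Random());
        // dn1 and dn4 (same rack) come first in either order,
        // then dn2 (off-rack), then the stale dn3.
        System.out.println(sorted);
    }
}
```

The extra allocations are one small map and a few small lists per call; with the usual three replicas each bin holds at most a handful of entries, which is the overhead being weighed in this comment.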

> Better sorting in NetworkTopology#pseudoSortByDistance when no local node is 
> found
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-6268
>                 URL: https://issues.apache.org/jira/browse/HDFS-6268
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.4.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Minor
>         Attachments: hdfs-6268-1.patch, hdfs-6268-2.patch
>
>
> In NetworkTopology#pseudoSortByDistance, if no local node is found, it will 
> always place the first rack-local node in the list at the front.
> This became an issue when a dataset was loaded from a single datanode. This 
> datanode ended up being the first replica for all the blocks in the dataset. 
> When running an Impala query, the non-local reads when reading past a block 
> boundary were all hitting this node, meaning massive load skew.
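The load skew described in the quoted issue can be reproduced with a toy simulation. This sketch assumes a single-rack cluster, so every replica is rack-local to every reader (the issue does not say how racks were configured), and it assumes the reader holds no replica of the block. Under that assumption "first rack-local node in list order" is simply the first replica, which is always the loading datanode. `SkewDemo`, `LOADER`, `firstChoiceOld`, `firstChoiceBinned` and `simulate` are hypothetical names, and the "binned" rule is the shuffle-within-a-distance-bin idea from the comment, not code from the patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical simulation; assumes one rack, so all replicas share a bin.
final class SkewDemo {
    static final String LOADER = "dn0"; // node the dataset was loaded from

    // Old behavior as described in the issue: with no local replica, the
    // first rack-local replica in list order goes to the front. On a single
    // rack that is just the first replica.
    static String firstChoiceOld(List<String> replicas) {
        return replicas.get(0);
    }

    // Binned behavior: all replicas fall in one distance bin, which is shuffled.
    static String firstChoiceBinned(List<String> replicas, Random rand) {
        List<String> bin = new ArrayList<>(replicas);
        Collections.shuffle(bin, rand);
        return bin.get(0);
    }

    // Returns {oldHits, binnedHits}: how many reads go to LOADER first.
    // Requires nodes >= 3 so each block gets two distinct non-loader replicas.
    static int[] simulate(int blocks, int nodes, long seed) {
        Random rand = new Random(seed);
        int oldHits = 0;
        int binnedHits = 0;
        for (int b = 0; b < blocks; b++) {
            List<String> others = new ArrayList<>();
            for (int n = 1; n < nodes; n++) {
                others.add("dn" + n);
            }
            Collections.shuffle(others, rand);
            // The loader is always the first replica of every block.
            List<String> replicas = Arrays.asList(LOADER, others.get(0), others.get(1));
            if (firstChoiceOld(replicas).equals(LOADER)) oldHits++;
            if (firstChoiceBinned(replicas, rand).equals(LOADER)) binnedHits++;
        }
        return new int[] {oldHits, binnedHits};
    }

    public static void main(String[] args) {
        int[] hits = simulate(1000, 10, 1L);
        System.out.println("old rule: " + hits[0] + "/1000 reads start on " + LOADER);
        System.out.println("binned:   " + hits[1] + "/1000 reads start on " + LOADER);
    }
}
```

With the old rule every read starts on the loader; with shuffling, each of the three replicas is chosen first with roughly equal probability, so the loader takes only about a third of the reads.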



--
This message was sent by Atlassian JIRA
(v6.2#6252)
