[ https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093613#comment-14093613 ]

Jason Lowe commented on HDFS-6840:
----------------------------------

I think the previous behavior was non-deterministic because of this code, 
which the HDFS-6268 patch removed:

{code}
    // put a random node at position 0 if it is not a local/local-rack node
    if(tempIndex == 0 && localRackNode == -1 && nodes.length != 0) {
      swap(nodes, 0, r.nextInt(nodes.length));
    }
{code}

The list used to be mostly deterministic, but the first node in the list (i.e.: 
the only one clients are likely to use) was random.
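
To make the effect concrete, here's a toy sketch of the old ordering for an 
off-rack reader (simplified stand-ins, not the real NetworkTopology API): the 
distance sort itself stays deterministic, but position 0 is randomized, so 
concurrent clients fan out across the replicas.

{code}
import java.util.Random;

public class OldSortSketch {
  static void swap(String[] nodes, int i, int j) {
    String tmp = nodes[i];
    nodes[i] = nodes[j];
    nodes[j] = tmp;
  }

  // Simplified stand-in for the tail of the old pseudoSortByDistance:
  // assume nodes[] is already in deterministic distance order, and
  // localRackNode == -1 means no local/rack-local replica was found.
  static void pseudoSort(String[] nodes, int localRackNode, Random r) {
    if (localRackNode == -1 && nodes.length != 0) {
      // the removed snippet: an off-rack reader gets a random replica
      // at position 0, spreading load across the replicas
      swap(nodes, 0, r.nextInt(nodes.length));
    }
  }

  public static void main(String[] args) {
    Random r = new Random();
    for (int client = 0; client < 5; client++) {
      String[] replicas = {"dn1", "dn2", "dn3"};
      pseudoSort(replicas, -1, r); // -1: reader is off-rack for this block
      System.out.println("client " + client + " reads from " + replicas[0]);
    }
  }
}
{code}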

I have not done the bisect to prove beyond doubt that it was HDFS-6268, but 
we've run builds based on 2.4.1 (plus patches) and on 2.5, and this behavior is 
brand-new with 2.5.  There weren't many changes in the topology-sorting area 
between 2.4.1 and 2.5.0 besides this one, and the code and JIRA for HDFS-6268 
state that it intentionally does not randomize the datanode list between 
clients.  Besides the bisect approach, I can probably try replacing the network 
topology class with the one from before HDFS-6268 and see if the behavior 
reverts to what it used to be.
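
For that class-swap experiment, something along these lines should work, 
assuming NetworkTopology.getInstance() still honors the net.topology.impl key 
(org.example.OldNetworkTopology is a hypothetical copy of the pre-HDFS-6268 
class that would have to be on the NameNode classpath):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetworkTopology;

public class TopologySwapSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Point the topology factory at the old class; the NameNode's
    // DatanodeManager builds its topology via NetworkTopology.getInstance,
    // so it should pick this up when sorting block locations.
    conf.set("net.topology.impl", "org.example.OldNetworkTopology");
    NetworkTopology topology = NetworkTopology.getInstance(conf);
    System.out.println("topology class: " + topology.getClass().getName());
  }
}
{code}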

> Clients are always sent to the same datanode when read is off rack
> ------------------------------------------------------------------
>
>                 Key: HDFS-6840
>                 URL: https://issues.apache.org/jira/browse/HDFS-6840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Priority: Critical
>
> After HDFS-6268 the sorting order of block locations is deterministic for a 
> given block and locality level (e.g.: local, rack-local, off-rack), so 
> off-rack 
> clients all see the same datanode for the same block.  This leads to very 
> poor behavior in distributed cache localization and other scenarios where 
> many clients all want the same block data at approximately the same time.  
> The one datanode is crushed by the load while the other replicas only handle 
> local and rack-local requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
