[ https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093613#comment-14093613 ]
Jason Lowe commented on HDFS-6840:
----------------------------------

I think the previous behavior was not deterministic because of this code, which was removed in the HDFS-6268 patch:

{code}
// put a random node at position 0 if it is not a local/local-rack node
if(tempIndex == 0 && localRackNode == -1 && nodes.length != 0) {
  swap(nodes, 0, r.nextInt(nodes.length));
}
{code}

The list used to be mostly deterministic, but the first node in the list (i.e., the one clients are likely to use, and often the only one they use) was random.

I have not done the bisect to prove beyond doubt that it was HDFS-6268, but we've run builds based on 2.4.1+ and on 2.5, and this behavior is brand new with 2.5. There weren't many changes in the topology sorting area besides this one between 2.4.1 and 2.5.0, and both the code and the JIRA for HDFS-6268 state that it intentionally does not randomize the datanode list between clients.

Besides the bisect approach, I can probably try replacing the network topology class with the one from before HDFS-6268 and see whether the behavior reverts to what it used to be.

> Clients are always sent to the same datanode when read is off rack
> ------------------------------------------------------------------
>
>                 Key: HDFS-6840
>                 URL: https://issues.apache.org/jira/browse/HDFS-6840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Priority: Critical
>
> After HDFS-6268 the sorting order of block locations is deterministic for a
> given block and locality level (e.g., local, rack, off-rack), so off-rack
> clients all see the same datanode for the same block. This leads to very
> poor behavior in distributed cache localization and other scenarios where
> many clients all want the same block data at approximately the same time.
> The one datanode is crushed by the load while the other replicas only handle
> local and rack-local requests.
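To illustrate the effect described above, here is a minimal standalone sketch. It is not the actual NetworkTopology code: the class name FirstReplicaSketch, the helper methods, and the datanode names dn1..dn3 are all hypothetical, and it models only the off-rack case, where the removed snippet would swap a random node into position 0. It counts which replica ends up first across many simulated clients, with and without that swap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only; not the real NetworkTopology implementation.
class FirstReplicaSketch {

    // Swap two entries of the replica array in place.
    static void swap(String[] nodes, int i, int j) {
        String tmp = nodes[i];
        nodes[i] = nodes[j];
        nodes[j] = tmp;
    }

    // Mimics the removed snippet for an off-rack reader: no local or
    // rack-local node exists, so a random node is moved to position 0.
    static String[] withRandomFirst(String[] sorted, Random r) {
        String[] nodes = Arrays.copyOf(sorted, sorted.length);
        if (nodes.length != 0) {
            swap(nodes, 0, r.nextInt(nodes.length));
        }
        return nodes;
    }

    // Counts how often each node is first in the list handed to a client.
    static Map<String, Integer> firstNodeCounts(String[] sorted, int clients,
                                                Random r, boolean randomizeFirst) {
        Map<String, Integer> counts = new HashMap<>();
        if (sorted.length == 0) {
            return counts;
        }
        for (int c = 0; c < clients; c++) {
            String[] view = randomizeFirst ? withRandomFirst(sorted, r) : sorted;
            counts.merge(view[0], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] replicas = {"dn1", "dn2", "dn3"}; // assumed sorted off-rack replicas
        Random r = new Random();
        System.out.println("deterministic order: "
                + firstNodeCounts(replicas, 3000, r, false));
        System.out.println("random first node:   "
                + firstNodeCounts(replicas, 3000, r, true));
    }
}
```

With the deterministic order every simulated client gets the same first replica, which matches the hot-spot behavior reported in this issue. With the random swap the first position is spread across all replicas.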