[jira] [Commented] (HDFS-5946) Webhdfs DN choosing code is flawed

Ming Ma (JIRA) Thu, 11 Sep 2014 14:23:06 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130734#comment-14130734
 ]


Ming Ma commented on HDFS-5946:
-------------------------------

We ran into an issue similar to the one Daryn mentioned. It seems the first 
node can be a decommissioned node even though 
ClientProtocol.getBlockLocations puts decommissioned nodes at the end. In 
JspHelper.java,

{noformat}
    HashMap<DatanodeInfo, NodeRecord> map =
      new HashMap<DatanodeInfo, NodeRecord>();
...
    NodeRecord[] nodes = map.values().toArray(new NodeRecord[map.size()]);
...
{noformat}

Jing, Haohui, it appears the order of the entries in nodes can differ from 
the original order, since HashMap iteration order is not insertion order.
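
To illustrate the ordering point, here is a minimal standalone sketch. It is 
not the JspHelper code: the class name, method name, and datanode names are 
made up for the example, and plain strings stand in for DatanodeInfo. It only 
shows that a LinkedHashMap keeps the order in which keys were first inserted, 
while a HashMap gives no such guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the Hadoop source.
class MapOrderSketch {
    // Inserts the nodes into the given map in list order, counting how often
    // each one appears, then returns the keys in the map's iteration order.
    static List<String> iterationOrder(Map<String, Integer> map, List<String> nodes) {
        for (String node : nodes) {
            map.merge(node, 1, Integer::sum);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Assumed input: live nodes first, the decommissioned node last.
        List<String> locations =
            Arrays.asList("dn-live-1", "dn-live-2", "dn-decommissioned");

        // HashMap: iteration order depends on key hashes, not insertion order.
        System.out.println("HashMap:       "
            + iterationOrder(new HashMap<>(), locations));

        // LinkedHashMap: iteration order is first-insertion order.
        System.out.println("LinkedHashMap: "
            + iterationOrder(new LinkedHashMap<>(), locations));
    }
}
```

Whether switching the map type is the right fix here is a separate question; 
the sketch only shows why the position of the decommissioned node is not 
preserved.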

> Webhdfs DN choosing code is flawed
> ----------------------------------
>
>                 Key: HDFS-5946
>                 URL: https://issues.apache.org/jira/browse/HDFS-5946
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, webhdfs
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> HDFS-5891 improved the performance of redirecting webhdfs clients to a DN.  
> Instead of attempting a connection with a 1-minute timeout, the NN skips 
> decommissioned nodes.
> The logic appears flawed.  It finds the index of the first decommissioned 
> node, if any, then:
> * Throws an exception if index = 0, even if other nodes later in the list are 
> not decommissioned.
> * Else picks a random node prior to the index.  Let's say there are 10 
> replicas, 2nd location is decommissioned.  All clients will be redirected to 
> the first location even though there are 8 other valid locations.
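
The selection behaviour described above can be sketched as follows. This 
goes only by the issue description, not by the actual NameNode code; the 
class and method names are hypothetical, and a boolean array stands in for 
the per-replica decommissioned state. An empty result from the first method 
corresponds to the case where the description says an exception is thrown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the Hadoop source.
class DnChoiceSketch {
    // Described behaviour: only the locations before the first
    // decommissioned one are candidates for the redirect.
    static List<Integer> candidatesBeforeFirstDecommissioned(boolean[] decommissioned) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < decommissioned.length; i++) {
            if (decommissioned[i]) {
                break; // everything from here on is ignored
            }
            candidates.add(i);
        }
        return candidates;
    }

    // One possible alternative: every location that is not decommissioned
    // is a candidate, wherever it sits in the list.
    static List<Integer> allLiveCandidates(boolean[] decommissioned) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < decommissioned.length; i++) {
            if (!decommissioned[i]) {
                candidates.add(i);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // 10 replicas, 2nd location decommissioned, as in the description.
        boolean[] decommissioned = new boolean[10];
        decommissioned[1] = true;
        System.out.println("before first decommissioned: "
            + candidatesBeforeFirstDecommissioned(decommissioned).size());
        System.out.println("all live locations:          "
            + allLiveCandidates(decommissioned).size());
    }
}
```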



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
