[ 
https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-11860:
------------------------------
    Description: 
This was caught intermittently in Jenkins runs. Debugging showed that the cause 
is the logic below: when the two random indexes happen to be the same, the node 
is returned without being removed from the healthy list before the next round 
of selection. As a result, duplicate datanodes can be chosen for the pipeline, 
and the machine list can end up smaller than expected. I will post a fix soon. 

{code}
// SCMContainerPlacementCapacity#chooseNode
// There is a possibility that both numbers will be same.
// if that is so, we just return the node.
if (firstNodeNdx == secondNodeNdx) {
  return healthyNodes.get(firstNodeNdx);
}
{code}
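
For illustration only, here is a minimal, self-contained sketch of one way the 
equal-index branch could remove the node before returning it. It is not the 
attached patch. The class name PlacementSketch, the chooseNodes helper, the use 
of String node ids, and the Math.min stand-in for the real comparison between 
the two candidates are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not the actual SCMContainerPlacementCapacity code.
final class PlacementSketch {

  // Picks one node from healthyNodes and removes it from the list in every
  // branch, so the same node cannot be picked again in a later round.
  static String chooseNode(List<String> healthyNodes, Random rand) {
    int firstNodeNdx = rand.nextInt(healthyNodes.size());
    int secondNodeNdx = rand.nextInt(healthyNodes.size());

    if (firstNodeNdx == secondNodeNdx) {
      // Both indexes are the same: List.remove(int) returns the element
      // and takes it out of the healthy list in one step.
      return healthyNodes.remove(firstNodeNdx);
    }

    // Stand-in for the real comparison between the two candidates, which
    // the report does not show. Here we simply take the lower index.
    int chosenNdx = Math.min(firstNodeNdx, secondNodeNdx);
    return healthyNodes.remove(chosenNdx);
  }

  // Chooses `count` distinct nodes, working on a copy of the input list.
  static List<String> chooseNodes(List<String> healthyNodes, int count,
      Random rand) {
    if (count > healthyNodes.size()) {
      throw new IllegalArgumentException("not enough healthy nodes");
    }
    List<String> pool = new ArrayList<>(healthyNodes);
    List<String> chosen = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      chosen.add(chooseNode(pool, rand));
    }
    return chosen;
  }

  public static void main(String[] args) {
    List<String> nodes = Arrays.asList("dn1", "dn2", "dn3", "dn4", "dn5");
    System.out.println(chooseNodes(nodes, 3, new Random()));
  }
}
```

Because every branch removes the returned node from the pool, repeated calls 
cannot return the same datanode twice, whatever the random indexes are.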

  was:
This was caught in Jenkins run. After debugging, found the cause is the 
logic below where the node was returned without being removed from the healthy 
list for next round. As a result, there could be duplicated datanodes chosen 
with pipeline size smaller than expected. I will post a fix soon. 

{code}
SCMContainerPlacementCapacity#chooseNode
     // There is a possibility that both numbers will be same.
     // if that is so, we just return the node.
     if (firstNodeNdx == secondNodeNdx) {
      return healthyNodes.get(firstNodeNdx);
     }

{code}


> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not 
> remove chosen node from healthy list.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11860
>                 URL: https://issues.apache.org/jira/browse/HDFS-11860
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>         Attachments: HDFS-11860-HDFS-7240.001.patch
>
>



