[ https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaoyu Yao updated HDFS-11860:
------------------------------
    Description:
This was caught intermittently in a Jenkins run. After debugging, I found the cause is the logic below: when the two random indexes happen to be the same, the node ID is returned without being removed from the healthy list for the next round of selection. As a result, duplicate datanodes can be chosen for the pipeline, and the machine list ends up smaller than expected. I will post a fix soon.

{code}
SCMContainerPlacementCapacity#chooseNode
  // There is a possibility that both numbers will be same.
  // if that is so, we just return the node.
  if (firstNodeNdx == secondNodeNdx) {
    return healthyNodes.get(firstNodeNdx);
  }
{code}

  was:
This was caught in a Jenkins run. After debugging, found the cause is the logic below, where the node was returned without being removed from the healthy list for the next round. As a result, duplicate datanodes could be chosen, with a pipeline size smaller than expected. I will post a fix soon.

{code}
SCMContainerPlacementCapacity#chooseNode
  // There is a possibility that both numbers will be same.
  // if that is so, we just return the node.
  if (firstNodeNdx == secondNodeNdx) {
    return healthyNodes.get(firstNodeNdx);
  }
{code}

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11860
>                 URL: https://issues.apache.org/jira/browse/HDFS-11860
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>         Attachments: HDFS-11860-HDFS-7240.001.patch
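
For illustration only, here is a minimal sketch of one way to close the gap described in this issue; it is not the attached patch. The idea is to route both the equal-index and the distinct-index cases through a single exit that removes the chosen node from the healthy list. The class name `ChooseNodeSketch`, the generic node type, and the `freeSpace` function are hypothetical, and the "prefer the node with more free space" comparison is an assumption, since the issue only shows the equal-index branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.ToLongFunction;

// Hypothetical stand-in for SCMContainerPlacementCapacity#chooseNode.
class ChooseNodeSketch {

  // Picks two random candidates and returns one of them, always removing
  // the returned node from healthyNodes so it cannot be chosen again.
  static <T> T chooseNode(List<T> healthyNodes, Random rand,
                          ToLongFunction<T> freeSpace) {
    int firstNodeNdx = rand.nextInt(healthyNodes.size());
    int secondNodeNdx = rand.nextInt(healthyNodes.size());

    int chosenNdx = firstNodeNdx;
    if (firstNodeNdx != secondNodeNdx) {
      // Assumption: prefer the candidate reporting more free space.
      long first = freeSpace.applyAsLong(healthyNodes.get(firstNodeNdx));
      long second = freeSpace.applyAsLong(healthyNodes.get(secondNodeNdx));
      chosenNdx = first >= second ? firstNodeNdx : secondNodeNdx;
    }

    // Single exit path: List.remove(int) returns the removed element, so
    // the equal-index case no longer leaves the node in the list.
    return healthyNodes.remove(chosenNdx);
  }

  public static void main(String[] args) {
    List<String> healthy =
        new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3", "dn4"));
    Random rand = new Random();
    List<String> pipeline = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      pipeline.add(chooseNode(healthy, rand, node -> (long) node.length()));
    }
    System.out.println("pipeline=" + pipeline + " remaining=" + healthy);
  }
}
```

With this shape, three rounds of selection over four healthy nodes yield three distinct nodes regardless of whether the two random indexes collide.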