[jira] [Updated] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.
[ https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDFS-11860:
------------------------------
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Fix Version/s: HDFS-7240
    Target Version/s: HDFS-7240
    Status: Resolved (was: Patch Available)

Thanks [~vagarychen] for the review. I've committed the fix to the feature branch.

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.
> --------------------------------------------------------------------
>
> Key: HDFS-11860
> URL: https://issues.apache.org/jira/browse/HDFS-11860
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Affects Versions: HDFS-7240
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Fix For: HDFS-7240
>
> Attachments: HDFS-11860-HDFS-7240.001.patch, HDFS-11860-HDFS-7240.002.patch
>
>
> This was caught intermittently in a Jenkins run. After debugging, I found the
> cause is the logic below: when the two random indexes happen to be the same,
> the node is returned without being removed from the healthy list for the next
> round of selection. As a result, duplicate datanodes could be chosen for the
> pipeline, and the machine list could be smaller than expected. I will post a
> fix soon.
> {code}
> SCMContainerPlacementCapacity#chooseNode
> // There is a possibility that both numbers will be same.
> // if that is so, we just return the node.
> if (firstNodeNdx == secondNodeNdx) {
>   return healthyNodes.get(firstNodeNdx);
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.
[ https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDFS-11860:
------------------------------
    Attachment: HDFS-11860-HDFS-7240.002.patch

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.
> --------------------------------------------------------------------
>
> Key: HDFS-11860
> URL: https://issues.apache.org/jira/browse/HDFS-11860
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Affects Versions: HDFS-7240
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Attachments: HDFS-11860-HDFS-7240.001.patch, HDFS-11860-HDFS-7240.002.patch
>
>
> This was caught intermittently in a Jenkins run. After debugging, I found the
> cause is the logic below: when the two random indexes happen to be the same,
> the node is returned without being removed from the healthy list for the next
> round of selection. As a result, duplicate datanodes could be chosen for the
> pipeline, and the machine list could be smaller than expected. I will post a
> fix soon.
> {code}
> SCMContainerPlacementCapacity#chooseNode
> // There is a possibility that both numbers will be same.
> // if that is so, we just return the node.
> if (firstNodeNdx == secondNodeNdx) {
>   return healthyNodes.get(firstNodeNdx);
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.
[ https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDFS-11860:
------------------------------
    Description:
This was caught intermittently in a Jenkins run. After debugging, I found the cause is the logic below: when the two random indexes happen to be the same, the node is returned without being removed from the healthy list for the next round of selection. As a result, duplicate datanodes could be chosen for the pipeline, and the machine list could be smaller than expected. I will post a fix soon.
{code}
SCMContainerPlacementCapacity#chooseNode
// There is a possibility that both numbers will be same.
// if that is so, we just return the node.
if (firstNodeNdx == secondNodeNdx) {
  return healthyNodes.get(firstNodeNdx);
}
{code}

  was:
This was caught in a Jenkins run. After debugging, I found the cause is the logic below, where the node is returned without being removed from the healthy list for the next round. As a result, duplicate datanodes could be chosen, leaving the pipeline smaller than expected. I will post a fix soon.
{code}
SCMContainerPlacementCapacity#chooseNode
// There is a possibility that both numbers will be same.
// if that is so, we just return the node.
if (firstNodeNdx == secondNodeNdx) {
  return healthyNodes.get(firstNodeNdx);
}
{code}

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.
> --------------------------------------------------------------------
>
> Key: HDFS-11860
> URL: https://issues.apache.org/jira/browse/HDFS-11860
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Affects Versions: HDFS-7240
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Attachments: HDFS-11860-HDFS-7240.001.patch
>
>
> This was caught intermittently in a Jenkins run. After debugging, I found the
> cause is the logic below: when the two random indexes happen to be the same,
> the node is returned without being removed from the healthy list for the next
> round of selection. As a result, duplicate datanodes could be chosen for the
> pipeline, and the machine list could be smaller than expected. I will post a
> fix soon.
> {code}
> SCMContainerPlacementCapacity#chooseNode
> // There is a possibility that both numbers will be same.
> // if that is so, we just return the node.
> if (firstNodeNdx == secondNodeNdx) {
>   return healthyNodes.get(firstNodeNdx);
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
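The description above says the equal-index branch returns a node without removing it from the healthy list. The following is a minimal sketch of one way to close that gap, not the committed patch: the `Node` class, the `remaining` field, the "keep the candidate with more remaining space" rule, and the `chooseNodes` helper are all assumptions made for illustration. Only the names `chooseNode`, `healthyNodes`, `firstNodeNdx`, and `secondNodeNdx` come from the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for the placement policy; not the Ozone source. */
class ChooseNodeSketch {

  /** Assumed node model: an id plus remaining capacity. */
  static final class Node {
    public final String id;
    public final long remaining;

    public Node(String id, long remaining) {
      this.id = id;
      this.remaining = remaining;
    }
  }

  /**
   * Draws two random indexes, keeps the candidate with more remaining
   * space (an assumed rule), and removes the winner from healthyNodes
   * on every path before returning it.
   */
  static Node chooseNode(List<Node> healthyNodes, Random rand) {
    int firstNodeNdx = rand.nextInt(healthyNodes.size());
    int secondNodeNdx = rand.nextInt(healthyNodes.size());

    Node chosen;
    if (firstNodeNdx == secondNodeNdx) {
      // Both draws landed on the same node, so it is the only candidate.
      chosen = healthyNodes.get(firstNodeNdx);
    } else {
      Node first = healthyNodes.get(firstNodeNdx);
      Node second = healthyNodes.get(secondNodeNdx);
      chosen = first.remaining >= second.remaining ? first : second;
    }
    // Removal is shared by both branches, including the equal-index case
    // that the issue reports as returning early.
    healthyNodes.remove(chosen);
    return chosen;
  }

  /** Chooses count distinct nodes from a private copy of the healthy list. */
  static List<Node> chooseNodes(List<Node> healthy, int count, Random rand) {
    if (count > healthy.size()) {
      throw new IllegalArgumentException("not enough healthy nodes");
    }
    List<Node> pool = new ArrayList<>(healthy);
    List<Node> result = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      result.add(chooseNode(pool, rand));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Node> healthy = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
      healthy.add(new Node("dn" + i, 100L * (i + 1)));
    }
    // Prints three distinct datanode ids; which ones depends on the draws.
    for (Node n : chooseNodes(healthy, 3, new Random())) {
      System.out.println(n.id + " remaining=" + n.remaining);
    }
  }
}
```

Placing the removal after the branch, rather than inside each branch, means a future early return cannot skip it. If the two draws are independent and uniform over n healthy nodes (an assumption), they coincide with probability 1/n per call, which would be consistent with the failure showing up only occasionally in Jenkins.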
[jira] [Updated] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.
[ https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDFS-11860:
------------------------------
    Summary: Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list. (was: Ozone: SCM: SCMContainerPlacementCapacity#chooseNode chosen node is not removed from healthy list.)

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.
> --------------------------------------------------------------------
>
> Key: HDFS-11860
> URL: https://issues.apache.org/jira/browse/HDFS-11860
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Affects Versions: HDFS-7240
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Attachments: HDFS-11860-HDFS-7240.001.patch
>
>
> This was caught in a Jenkins run. After debugging, I found the cause is the
> logic below, where the node is returned without being removed from the
> healthy list for the next round. As a result, duplicate datanodes could be
> chosen, leaving the pipeline smaller than expected. I will post a fix soon.
> {code}
> SCMContainerPlacementCapacity#chooseNode
> // There is a possibility that both numbers will be same.
> // if that is so, we just return the node.
> if (firstNodeNdx == secondNodeNdx) {
>   return healthyNodes.get(firstNodeNdx);
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)