[ https://issues.apache.org/jira/browse/HDFS-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975592#comment-14975592 ]
Walter Su commented on HDFS-9313:
---------------------------------

I'm OK with adding a {{null}} check. However, I don't think it's enough to address the scenario here. In the test case, you added 1 SSD + 3 DISKs. As you said in the patch,
{code}
// In this case,
// no replica can't be chosen as the excessive replica as
// chooseReplicasToDelete only considers storages[4] and storages[5] that
// are the same rack. But neither's storage type is SSD.
{code}
If we choose nothing, the replica on SSD won't be deleted. And as far as I remember, {{Mover}} won't delete it either, since the existing storage types already contain the expected ones.
Instead of choosing nothing, we should choose the SSD, since the remaining 3 DISKs are already on enough racks.

> Possible NullPointerException in BlockManager if no excess replica can be chosen
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-9313
>                 URL: https://issues.apache.org/jira/browse/HDFS-9313
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-9313.patch
>
>
> HDFS-8647 makes it easier to reason about various block placement scenarios.
> Here is one possible case where BlockManager won't be able to find the excess
> replica to delete: when the storage policy changes around the same time the
> balancer moves the block. When this happens, it causes a NullPointerException.
> {noformat}
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978)
> {noformat}
> Note that it hasn't been seen in any production cluster; it was found by new
> unit tests. In addition, the issue existed before HDFS-8647.
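
The suggestion in the comment (fall back to the replica with the excess storage type instead of choosing nothing) could be sketched roughly as follows. This is only an illustration, not the HDFS implementation: {{ExcessReplicaSketch}}, {{Replica}}, {{chooseReplicaToDelete}} and the {{minRacks}} parameter are hypothetical names, and the rack layout in {{main}} is an assumed reading of the test case (1 SSD + 3 DISKs, two of the DISKs sharing a rack).

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; none of these types are the real HDFS classes.
final class ExcessReplicaSketch {
    enum StorageType { SSD, DISK }

    static final class Replica {
        final String node;
        final String rack;
        final StorageType type;

        Replica(String node, String rack, StorageType type) {
            this.node = node;
            this.rack = rack;
            this.type = type;
        }
    }

    /**
     * Picks one replica to delete. Returns Optional.empty() rather than null
     * when nothing qualifies, so callers cannot dereference a missing choice.
     */
    static Optional<Replica> chooseReplicaToDelete(
            List<Replica> replicas, Set<StorageType> excessTypes, int minRacks) {
        // Count how many replicas live on each rack.
        Map<String, Integer> perRack = new HashMap<>();
        for (Replica r : replicas) {
            perRack.merge(r.rack, 1, Integer::sum);
        }

        // First pass: an excess-type replica on a rack holding more than one
        // replica, so that deleting it cannot reduce the number of racks.
        for (Replica r : replicas) {
            if (perRack.get(r.rack) > 1 && excessTypes.contains(r.type)) {
                return Optional.of(r);
            }
        }

        // Fallback (the suggestion): any excess-type replica whose removal
        // still leaves the block on at least minRacks racks.
        for (Replica r : replicas) {
            if (!excessTypes.contains(r.type)) {
                continue;
            }
            int racksAfter = perRack.size() - (perRack.get(r.rack) == 1 ? 1 : 0);
            if (racksAfter >= minRacks) {
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed layout: SSD alone on /r1, one DISK on /r2, two DISKs on /r3.
        List<Replica> replicas = Arrays.asList(
                new Replica("dn1", "/r1", StorageType.SSD),
                new Replica("dn2", "/r2", StorageType.DISK),
                new Replica("dn3", "/r3", StorageType.DISK),
                new Replica("dn4", "/r3", StorageType.DISK));

        Optional<Replica> chosen =
                chooseReplicaToDelete(replicas, EnumSet.of(StorageType.SSD), 2);
        System.out.println(chosen.map(r -> r.node + " (" + r.type + ")")
                .orElse("nothing chosen"));
    }
}
```

With this layout the first pass finds nothing, because the only rack with two replicas holds DISKs, and the fallback then selects the SSD replica because the three DISKs still span two racks. Whether two racks is the right threshold, and where such a fallback belongs in {{chooseReplicasToDelete}}, is left open here.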