Yiqun Lin created HDFS-14503:
--------------------------------

             Summary: ThrottledAsyncChecker throws NPE during block pool initialization
                 Key: HDFS-14503
                 URL: https://issues.apache.org/jira/browse/HDFS-14503
             Project: Hadoop HDFS
          Issue Type: Improvement
    Affects Versions: 3.3.0
            Reporter: Yiqun Lin

ThrottledAsyncChecker throws an NPE during block pool initialization, which causes block pool registration to fail. The exception:
{noformat}
2019-05-20 01:02:36,003 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected exception in block pool Block pool <registering> (Datanode Uuid xxxxx) service to xx.xx.xx.xx/xx.xx.xx.xx
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$LastCheckResult.access$000(ThrottledAsyncChecker.java:211)
        at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker.schedule(ThrottledAsyncChecker.java:129)
        at org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker.checkAllVolumes(DatasetVolumeChecker.java:209)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3387)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1508)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

It looks like this error occurs because the {{WeakHashMap}}-typed map {{completedChecks}} has already removed the target entry by the time we read it. Although we check for the entry before getting it, there is still a chance that the lookup returns null.

We hit a corner case for this: in federation mode, with two block pools on the DN, {{ThrottledAsyncChecker}} schedules two identical health checks for the same volume.
{noformat}
2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
{noformat}
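
One possible way to avoid the null dereference, sketched under assumptions rather than taken from the actual Hadoop code: look the entry up once and null-check the result, instead of checking for the key and then calling get separately. A {{WeakHashMap}} may drop an entry once its stored key is no longer strongly reachable, so a presence check followed by a separate get is not guaranteed to see the same state. The class {{SingleLookupSketch}}, its methods, and the explicit {{now}} parameter below are hypothetical stand-ins for illustration only; just the names {{completedChecks}} and {{LastCheckResult}} come from the description above.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical, simplified stand-in for the throttling logic; not the Hadoop class.
final class SingleLookupSketch {

    // Hypothetical stand-in for ThrottledAsyncChecker.LastCheckResult.
    static final class LastCheckResult {
        final long completedAt;

        LastCheckResult(long completedAt) {
            this.completedAt = completedAt;
        }
    }

    // Weakly keyed map: an entry can vanish once its key is garbage collected.
    private final Map<Object, LastCheckResult> completedChecks = new WeakHashMap<>();
    private final long minMsBetweenChecks;

    SingleLookupSketch(long minMsBetweenChecks) {
        this.minMsBetweenChecks = minMsBetweenChecks;
    }

    // Returns true if a check for this target completed too recently to repeat.
    synchronized boolean shouldSkip(Object target, long now) {
        // Single lookup: no window between "is it there?" and "read it".
        LastCheckResult result = completedChecks.get(target);
        if (result == null) {
            // Never checked, or the entry was already cleared: schedule a check.
            return false;
        }
        return now - result.completedAt < minMsBetweenChecks;
    }

    // Records that a check for this target finished at the given time.
    synchronized void recordCompletion(Object target, long now) {
        completedChecks.put(target, new LastCheckResult(now));
    }

    public static void main(String[] args) {
        SingleLookupSketch checker = new SingleLookupSketch(1000L);
        Object volume = new Object();
        System.out.println(checker.shouldSkip(volume, 0L));
        checker.recordCompletion(volume, 100L);
        System.out.println(checker.shouldSkip(volume, 500L));
    }
}
```

With this shape, a missing or cleared entry is treated the same as "never checked", so the worst case is an extra health check rather than a failed block pool registration.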