[ https://issues.apache.org/jira/browse/HDFS-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hung updated HDFS-14503:
---------------------------------
    Target Version/s: 3.3.0, 2.10.1  (was: 2.10.0, 3.3.0)

> ThrottledAsyncChecker throws NPE during block pool initialization
> ------------------------------------------------------------------
>
>                 Key: HDFS-14503
>                 URL: https://issues.apache.org/jira/browse/HDFS-14503
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Yiqun Lin
>            Priority: Major
>
> ThrottledAsyncChecker throws an NPE during block pool initialization. The error
> causes the block pool registration to fail.
> The exception:
> {noformat}
> 2019-05-20 01:02:36,003 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected exception in block pool Block pool <registering> (Datanode Uuid xxxxx) service to xx.xx.xx.xx/xx.xx.xx.xx
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$LastCheckResult.access$000(ThrottledAsyncChecker.java:211)
>         at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker.schedule(ThrottledAsyncChecker.java:129)
>         at org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker.checkAllVolumes(DatasetVolumeChecker.java:209)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3387)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1508)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It looks like this error occurs because the {{WeakHashMap}}-typed map
> {{completedChecks}} has already removed the target entry by the time we try to
> get it. Although we check for the entry before getting it, there is still a
> chance that the lookup returns null.
> We hit a corner case for this: in federation mode, with two block pools in the
> DN, {{ThrottledAsyncChecker}} schedules two identical health checks for the
> same volume.
> {noformat}
> 2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
> 2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
> {noformat}
> {{completedChecks}} cleans up the entry for one successful check after
> {{completedChecks#get}} has been called. After that, the other check gets null
> for the same entry.
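
The check-then-get hazard described above can be sketched in isolation. This is a minimal illustration only, not the actual ThrottledAsyncChecker code or the committed patch: the class CheckThenGetSketch, the record LastResult, and both helper methods are hypothetical names, and the disappearing entry is simulated with an explicit remove() because garbage-collection timing cannot be forced reliably.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch of the containsKey-then-get pattern on a WeakHashMap.
final class CheckThenGetSketch {

    /** Hypothetical stand-in for a per-volume "last check" record. */
    static final class LastResult {
        final long completedAt;
        LastResult(long completedAt) { this.completedAt = completedAt; }
    }

    /**
     * Racy shape: two separate map calls. The entry can vanish between them,
     * in which case get() returns null and the field access throws an NPE.
     * betweenCalls simulates whatever removes the entry in that window.
     */
    static boolean recentlyCheckedRacy(Map<Object, LastResult> completed,
                                       Object target, long now, long minGap,
                                       Runnable betweenCalls) {
        if (completed.containsKey(target)) {
            betweenCalls.run();
            LastResult r = completed.get(target); // may be null by now
            return now - r.completedAt < minGap;  // NPE when r is null
        }
        return false;
    }

    /** Safer shape: a single lookup, with the null case handled explicitly. */
    static boolean recentlyCheckedSafe(Map<Object, LastResult> completed,
                                       Object target, long now, long minGap) {
        LastResult r = completed.get(target);
        return r != null && now - r.completedAt < minGap;
    }

    public static void main(String[] args) {
        Map<Object, LastResult> completed = new WeakHashMap<>();
        Object volume = new Object(); // strongly held here, so GC will not clear it
        completed.put(volume, new LastResult(100L));

        try {
            // remove() stands in for the entry being cleaned up mid-lookup.
            recentlyCheckedRacy(completed, volume, 150L, 100L,
                    () -> completed.remove(volume));
        } catch (NullPointerException e) {
            System.out.println("racy lookup hit an NPE");
        }
        System.out.println("safe lookup: "
                + recentlyCheckedSafe(completed, volume, 150L, 100L));
    }
}
```

A single lookup only removes the window between the two map calls. It does not make WeakHashMap thread-safe, so concurrent callers would still need external synchronization.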