[ https://issues.apache.org/jira/browse/HDFS-16402?focusedWorklogId=705862&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-705862 ]
ASF GitHub Bot logged work on HDFS-16402:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Jan/22 01:52
            Start Date: 10/Jan/22 01:52
    Worklog Time Spent: 10m
      Work Description: tomscut commented on pull request #3839:
URL: https://github.com/apache/hadoop/pull/3839#issuecomment-1008483869

   > It's good catch here. Would you mind to add new test to cover this case? BTW, what is the root cause about NPE here? Thanks.

   Thanks @Hexiaoqiao for your review and comments.

   The root cause is that `DatanodeDescriptor#storageMap` does not contain the storage that the DN reports through `lifeline`. As a result, an `NPE` is thrown at line 460 of the code when that storage is operated on.

   ![image](https://user-images.githubusercontent.com/55134131/148710190-0009fa17-09f4-404c-980d-bdea6bf10dd1.png)

   Here is the scenario we encountered:
   1. We added a disk to a datanode and reconfigured `dfs.datanode.data.dir`.
   2. After a while, many values on the NN's Web UI became negative.
   3. We found the NPE in the NN log and discovered the issue [HDFS-14042](https://issues.apache.org/jira/browse/HDFS-14042).
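To illustrate the root cause described in the comment, here is a minimal sketch of a map lookup that fails for a storage ID the NameNode has not registered yet. It is not the Hadoop code: `StorageLookupSketch`, `StorageInfo`, `register`, `applyReportUnguarded`, and `applyReportGuarded` are hypothetical names, and the plain `HashMap` only stands in for `DatanodeDescriptor#storageMap`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the NameNode-side view of one datanode.
final class StorageLookupSketch {

    // Hypothetical per-storage record; the real class carries far more state.
    static final class StorageInfo {
        long capacity;

        void update(long reportedCapacity) {
            this.capacity = reportedCapacity;
        }
    }

    // Plays the role of DatanodeDescriptor#storageMap in this sketch.
    private final Map<String, StorageInfo> storageMap = new HashMap<>();

    void register(String storageId) {
        storageMap.put(storageId, new StorageInfo());
    }

    // Unguarded variant: get() returns null for an unknown storage ID,
    // so calling update() on the result throws a NullPointerException.
    void applyReportUnguarded(String storageId, long reportedCapacity) {
        storageMap.get(storageId).update(reportedCapacity);
    }

    // Guarded variant: skips storages that are not registered yet and
    // tells the caller whether the report was applied.
    boolean applyReportGuarded(String storageId, long reportedCapacity) {
        StorageInfo storage = storageMap.get(storageId);
        if (storage == null) {
            return false;
        }
        storage.update(reportedCapacity);
        return true;
    }

    long capacityOf(String storageId) {
        StorageInfo storage = storageMap.get(storageId);
        return storage == null ? -1L : storage.capacity;
    }
}
```

In this sketch a report for a newly added disk (an ID that was never passed to `register`) makes the unguarded method throw, which mirrors the scenario of adding a disk and reconfiguring `dfs.datanode.data.dir`.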
Issue Time Tracking
-------------------

    Worklog Id:     (was: 705862)
    Time Spent: 40m  (was: 0.5h)

> HeartbeatManager may cause incorrect stats
> ------------------------------------------
>
>                 Key: HDFS-16402
>                 URL: https://issues.apache.org/jira/browse/HDFS-16402
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: tomscut
>            Assignee: tomscut
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2021-12-29-08-25-44-303.png,
> image-2021-12-29-08-25-54-441.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> After reconfig {*}dfs.datanode.data.dir{*}, we found that the stats of the
> Namenode Web became *negative* and there were many NPE in namenode logs. This
> problem has been solved by HDFS-14042.
> !image-2021-12-29-08-25-54-441.png|width=681,height=293!
> !image-2021-12-29-08-25-44-303.png|width=677,height=180!
> However, if *HeartbeatManager#updateHeartbeat* and
> *HeartbeatManager#updateLifeline* throw other exceptions, stats errors can
> also occur. We should ensure that *stats.subtract()* and *stats.add()* are
> transactional.
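One way to read "transactional" in the description is that every `stats.subtract()` must be paired with a `stats.add()` even when the update in between throws. The sketch below shows that pairing with `try`/`finally`. It is an assumption about the general shape of such a fix, not the code from the pull request: `StatsSketch`, `ClusterStats`, `NodeUsage`, and the `failMidway` flag are hypothetical and exist only to make the example self-contained.

```java
// Hypothetical sketch; not the actual HeartbeatManager implementation.
final class StatsSketch {

    // Hypothetical per-datanode usage figure.
    static final class NodeUsage {
        long capacityUsed;
    }

    // Hypothetical cluster-wide aggregate, playing the role of "stats".
    static final class ClusterStats {
        private long totalUsed;

        synchronized void add(NodeUsage node) {
            totalUsed += node.capacityUsed;
        }

        synchronized void subtract(NodeUsage node) {
            totalUsed -= node.capacityUsed;
        }

        synchronized long totalUsed() {
            return totalUsed;
        }
    }

    // Removes the node's old contribution, applies the update, and always
    // adds the node back. If the update throws, the node's previous value
    // is re-added, so the aggregate is left unchanged instead of too low.
    static void updateHeartbeat(ClusterStats stats, NodeUsage node,
                                long reportedUsed, boolean failMidway) {
        stats.subtract(node);
        try {
            if (failMidway) {
                // Simulates any exception raised while processing the report.
                throw new IllegalStateException("simulated failure during update");
            }
            node.capacityUsed = reportedUsed;
        } finally {
            stats.add(node);
        }
    }
}
```

Without the `finally` block, an exception between the two calls would leave the node's usage subtracted but never re-added, and repeated failures would push the aggregate below zero, which matches the negative values reported on the Namenode Web page.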