[jira] [Work logged] (HDFS-16402) HeartbeatManager may cause incorrect stats

ASF GitHub Bot (Jira) Sun, 09 Jan 2022 17:54:05 -0800


     [ 
https://issues.apache.org/jira/browse/HDFS-16402?focusedWorklogId=705863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-705863
 ]


ASF GitHub Bot logged work on HDFS-16402:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Jan/22 01:53
            Start Date: 10/Jan/22 01:53
    Worklog Time Spent: 10m 
      Work Description: tomscut edited a comment on pull request #3839:
URL: https://github.com/apache/hadoop/pull/3839#issuecomment-1008483869


   > It's a good catch. Would you mind adding a new test to cover this case? 
BTW, what is the root cause of the NPE here? Thanks.
   
   Thanks @Hexiaoqiao for your review and comments. 
   
   The root cause is that `DatanodeDescriptor#storageMap` does not contain the 
storage that the DN reports through `lifeline`. The lookup returns null, so an 
`NPE` is thrown at line 460 of the code when that storage is used. This is 
similar to the situation in 
[HDFS-14042](https://issues.apache.org/jira/browse/HDFS-14042).
   
   
![image](https://user-images.githubusercontent.com/55134131/148710190-0009fa17-09f4-404c-980d-bdea6bf10dd1.png)
   
   Here is the scenario we encountered:
   1. We added a disk to a datanode and reconfigured `dfs.datanode.data.dir`.
   2. After a while, many values on the NN web UI became negative.
   3. We found the NPE in the NN log and then found this related issue: 
[HDFS-14042](https://issues.apache.org/jira/browse/HDFS-14042).
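
To illustrate the failure mode described above, here is a minimal sketch of a lookup in a storage map that returns null for a storage the NameNode has not registered yet. `StorageInfo` and `NodeDescriptor` are hypothetical, simplified stand-ins and not the real HDFS classes, and the guard shown is only one possible approach, not necessarily what HDFS-14042 did.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-storage record; not the real HDFS class. */
class StorageInfo {
    long capacity;
    StorageInfo(long capacity) { this.capacity = capacity; }
}

/** Hypothetical stand-in for a datanode descriptor holding a storage map. */
class NodeDescriptor {
    // Storages currently known for this node, keyed by storage ID.
    private final Map<String, StorageInfo> storageMap = new HashMap<>();

    void register(String storageId, long capacity) {
        storageMap.put(storageId, new StorageInfo(capacity));
    }

    /** Unguarded: throws NullPointerException for an unknown storage ID. */
    void updateUnguarded(String storageId, long capacity) {
        StorageInfo s = storageMap.get(storageId); // null if not registered
        s.capacity = capacity;
    }

    /** Guarded: skips unknown storages and reports whether the update applied. */
    boolean updateGuarded(String storageId, long capacity) {
        StorageInfo s = storageMap.get(storageId);
        if (s == null) {
            return false;
        }
        s.capacity = capacity;
        return true;
    }

    long capacityOf(String storageId) {
        return storageMap.get(storageId).capacity;
    }
}
```

In this sketch, a report for a newly added disk whose storage ID is not yet in the map makes `updateUnguarded` throw, while `updateGuarded` simply returns false.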


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 705863)
    Time Spent: 50m  (was: 40m)

> HeartbeatManager may cause incorrect stats
> ------------------------------------------
>
>                 Key: HDFS-16402
>                 URL: https://issues.apache.org/jira/browse/HDFS-16402
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: tomscut
>            Assignee: tomscut
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2021-12-29-08-25-44-303.png, 
> image-2021-12-29-08-25-54-441.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> After reconfiguring {*}dfs.datanode.data.dir{*}, we found that the stats on the 
> Namenode web UI became *negative* and there were many NPEs in the namenode logs. 
> This problem has been solved by HDFS-14042.
> !image-2021-12-29-08-25-54-441.png|width=681,height=293!
> !image-2021-12-29-08-25-44-303.png|width=677,height=180!
> However, if *HeartbeatManager#updateHeartbeat* or 
> *HeartbeatManager#updateLifeline* throws any other exception, the stats can 
> still become incorrect. We should ensure that *stats.subtract()* and 
> *stats.add()* are transactional, so that every subtract is always matched by 
> an add.
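
One way to keep the subtract/add pair balanced is a try/finally block, sketched below. `Stats`, `Node`, and `HeartbeatSketch` are hypothetical, simplified stand-ins rather than the actual HeartbeatManager code, and this is an illustration of the idea, not necessarily the change made in the pull request.

```java
/** Hypothetical stand-in for a node with one tracked metric. */
class Node {
    long capacity;
    Node(long capacity) { this.capacity = capacity; }
}

/** Hypothetical stand-in for cluster-wide aggregate stats. */
class Stats {
    long capacityTotal;
    void add(Node n) { capacityTotal += n.capacity; }
    void subtract(Node n) { capacityTotal -= n.capacity; }
}

class HeartbeatSketch {
    final Stats stats = new Stats();

    /** Unsafe: if update throws, the subtract is never balanced by an add. */
    void updateUnsafe(Node node, Runnable update) {
        stats.subtract(node);
        update.run();
        stats.add(node);
    }

    /** Safer: finally re-adds the node whether or not update throws. */
    void updateSafe(Node node, Runnable update) {
        stats.subtract(node);
        try {
            update.run();
        } finally {
            stats.add(node);
        }
    }
}
```

With `updateUnsafe`, each failed update permanently removes the node's contribution from the total, and repeated failures can drive the aggregate negative. With `updateSafe`, the node's values are added back even when the update fails.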



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

