[ 
https://issues.apache.org/jira/browse/HDFS-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh Shah updated HDFS-17188:
--------------------------------
    Description: 
Recently we saw missing blocks in our production clusters, which run in dynamic 
environments like AWS. We are running a version of the hadoop-2.10 code line.

Events that led to data loss:
 # We have a pool of available IP addresses, and whenever a datanode restarts it 
picks up any available address from that pool.
 # During the lifetime of a single namenode process, we have seen multiple 
datanodes restart, with the same datanode using different IP addresses over time.
 # One case that I was debugging was very interesting:
The DN with datanode UUID DN-UUID-1 moved from ip-address-1 --> ip-address-2 --> 
ip-address-3.
The DN with datanode UUID DN-UUID-2 moved from ip-address-4 --> ip-address-5 --> 
ip-address-1.
Note the last IP address change for DN-UUID-2: it ended up on ip-address-1, which 
was the first IP address of DN-UUID-1.
 # A bug in our operational scripts caused all datanodes to restart at the same 
time.

Just after the restart, we saw the following log lines:
{noformat}
2023-08-26 04:04:41,964 INFO [on default port 9000] namenode.NameNode - BLOCK* 
registerDatanode: 10.x.x.1:50010
2023-08-26 04:04:45,720 INFO [on default port 9000] namenode.NameNode - BLOCK* 
registerDatanode: 10.x.x.2:50010
2023-08-26 04:04:45,720 INFO [on default port 9000] namenode.NameNode - BLOCK* 
registerDatanode: 10.x.x.2:50010
2023-08-26 04:04:51,680 INFO [on default port 9000] namenode.NameNode - BLOCK* 
registerDatanode: 10.x.x.3:50010
2023-08-26 04:04:55,328 INFO [on default port 9000] namenode.NameNode - BLOCK* 
registerDatanode: 10.x.x.4:50010
{noformat}
This line is logged 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1184].

Snippet below:
{code:java}
      DatanodeDescriptor nodeS = getDatanode(nodeReg.getDatanodeUuid());
      DatanodeDescriptor nodeN = host2DatanodeMap.getDatanodeByXferAddr(
          nodeReg.getIpAddr(), nodeReg.getXferPort());
        
      if (nodeN != null && nodeN != nodeS) {
        NameNode.LOG.info("BLOCK* registerDatanode: " + nodeN);
        // nodeN previously served a different data storage, 
        // which is not served by anybody anymore.
        removeDatanode(nodeN);
        // physically remove node from datanodeMap
        wipeDatanode(nodeN);
        nodeN = null;
      } {code}
 

This branch is taken when the DatanodeDescriptor found in datanodeMap (looked up 
by datanode UUID, nodeS above) is not the same object as the one found in 
host2DatanodeMap (looked up by transfer address, nodeN above). In our case the 
host2DatanodeMap entry for ip-address-1 was stale and still pointed at DN-UUID-1, 
which was alive and well on ip-address-3, so registerDatanode removed a live 
datanode. HDFS-16540 fixed this inconsistency, but that jira was reported as lost 
data locality, not data loss. :)
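
To make the failure mode concrete, below is a minimal, self-contained sketch of the 
scenario. It does not use Hadoop's real classes: FakeDescriptor, buggyRegister and 
the two HashMaps are simplified stand-ins for DatanodeDescriptor, registerDatanode, 
datanodeMap and host2DatanodeMap, and the "bug" is modelled as forgetting to drop 
the address-keyed entry when a node re-registers with a new IP (the staleness that 
HDFS-16540 fixed). Replaying our exact registration sequence then removes a live 
datanode:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class IpReuseSketch {

  /** Simplified stand-in for DatanodeDescriptor: a UUID plus its current transfer address. */
  static class FakeDescriptor {
    final String uuid;
    String xferAddr; // "ip:port"

    FakeDescriptor(String uuid, String xferAddr) {
      this.uuid = uuid;
      this.xferAddr = xferAddr;
    }

    @Override
    public String toString() {
      return uuid + "@" + xferAddr;
    }
  }

  // datanodeMap analogue: keyed by datanode UUID.
  static final Map<String, FakeDescriptor> byUuid = new HashMap<>();
  // host2DatanodeMap analogue: keyed by transfer address.
  static final Map<String, FakeDescriptor> byAddr = new HashMap<>();

  /**
   * A deliberately buggy re-registration: it updates the UUID-keyed map and adds the
   * new address, but never removes the entry keyed by the node's old address, so the
   * address-keyed map goes stale.
   */
  static void buggyRegister(String uuid, String addr) {
    FakeDescriptor nodeS = byUuid.get(uuid);
    FakeDescriptor nodeN = byAddr.get(addr);
    if (nodeN != null && nodeN != nodeS) {
      // Mirrors the branch in DatanodeManager#registerDatanode: the descriptor found
      // through the address is treated as no longer served and removed -- here it is
      // a perfectly healthy datanode that simply moved to a new IP.
      System.out.println("Removing " + nodeN + " while registering " + uuid + " on " + addr);
      byUuid.remove(nodeN.uuid);
      byAddr.values().removeIf(d -> d == nodeN);
    }
    if (nodeS == null) {
      nodeS = new FakeDescriptor(uuid, addr);
      byUuid.put(uuid, nodeS);
    } else {
      nodeS.xferAddr = addr; // BUG (for illustration): the old byAddr entry is left behind.
    }
    byAddr.put(addr, nodeS);
  }

  public static void main(String[] args) {
    // Replay the incident: DN-UUID-1 moves ip-address-1 -> 2 -> 3,
    // DN-UUID-2 moves ip-address-4 -> 5 -> 1 (reusing DN-UUID-1's first address).
    buggyRegister("DN-UUID-1", "ip-address-1:50010");
    buggyRegister("DN-UUID-1", "ip-address-2:50010");
    buggyRegister("DN-UUID-1", "ip-address-3:50010");
    buggyRegister("DN-UUID-2", "ip-address-4:50010");
    buggyRegister("DN-UUID-2", "ip-address-5:50010");
    // Prints: Removing DN-UUID-1@ip-address-3:50010 ... -- a live node gets wiped.
    buggyRegister("DN-UUID-2", "ip-address-1:50010");
  }
}
{code}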

 

By filing this jira, I want to discuss the following:
 # Do we really want the namenode to call removeDatanode whenever such a 
discrepancy between the two maps is spotted? Could we instead rely on the first 
full block report, or the periodic full block reports, from the datanode to repair 
the metadata?
 # Improve logging in the blockmanagement code so that issues like this are faster 
to debug (a rough sketch follows below).
 # Add a test case that replays the exact sequence of events that occurred in our 
environment and verifies that datanodeMap and host2DatanodeMap stay consistent.
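
On item 2, here is a rough sketch (not a patch) of the kind of detail the removal 
branch above could log before wiping a node. The getters used (getDatanodeUuid, 
getIpAddr, getXferPort) come from DatanodeID, which both DatanodeDescriptor and 
DatanodeRegistration extend:
{code:java}
      if (nodeN != null && nodeN != nodeS) {
        // Sketch only: say explicitly why the node is being removed, including both
        // UUIDs and the colliding transfer address, so IP reuse is obvious in the log.
        NameNode.LOG.info("BLOCK* registerDatanode: removing " + nodeN
            + " (uuid=" + nodeN.getDatanodeUuid() + ") because transfer address "
            + nodeReg.getIpAddr() + ":" + nodeReg.getXferPort()
            + " re-registered with uuid=" + nodeReg.getDatanodeUuid()
            + (nodeS == null ? " (previously unknown)" : ", already known as " + nodeS));
        removeDatanode(nodeN);
        // physically remove node from datanodeMap
        wipeDatanode(nodeN);
        nodeN = null;
      }
{code}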



> Data loss in our production clusters due to missing HDFS-16540 
> ---------------------------------------------------------------
>
>                 Key: HDFS-17188
>                 URL: https://issues.apache.org/jira/browse/HDFS-17188
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.10.1
>            Reporter: Rushabh Shah
>            Assignee: Rushabh Shah
>            Priority: Major
>


