kuper updated HDFS-17798:
-------------------------
Description:
* In a 3-datanode cluster with a 3-replica block, if one replica becomes corrupted on disk (i.e. the corruption did not happen during the write pipeline), the result is:
** The corrupted replica cannot be removed from the damaged node.
** Because a valid replica is missing, the replication monitor keeps scheduling reconstruction work for the block.
** During reconstruction, however, every node that already hosts a replica of the block is excluded as a target, so all 3 datanodes are excluded.
** No suitable target node can be selected, so the block ends up in a {*}vicious cycle{*}: permanently under-replicated but never reconstructable (see the sketch below this list).
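For illustration, here is a minimal standalone sketch (hypothetical names, not the actual BlockManager code) of the dead end: target selection skips every node that already contains a replica, and in a 3-node cluster all 3 nodes contain one, including the node holding the corrupt copy.
{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical standalone sketch, not Hadoop code: reconstruction target
// selection skips every node that already contains a replica of the block.
public class ReconstructionDeadEnd {

  // Return the first live node that is not excluded, or null if none is left.
  static String chooseTarget(List<String> liveNodes, Set<String> excluded) {
    for (String node : liveNodes) {
      if (!excluded.contains(node)) {
        return node; // a node without a replica could host the new copy
      }
    }
    return null; // every live node is excluded: the work can never be placed
  }

  public static void main(String[] args) {
    List<String> cluster = Arrays.asList("dn0", "dn1", "dn2");
    // dn0 holds the corrupt replica, but it still counts as a containing
    // node, so all 3 datanodes end up in the excluded set.
    Set<String> containingNodes = new HashSet<>(cluster);
    System.out.println(chooseTarget(cluster, containingNodes)); // prints null
  }
} {code}
Running the sketch prints {{null}}: with every live node excluded, no candidate target remains, which matches the assertion at the end of the test below.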
*Reproduction*
* In a 3-datanode cluster, execute the following steps in order:
** Pick a healthy block with three replicas and corrupt the replica file on one of its datanodes.
** After the next datanode directory-scan cycle, the replica is reported as corrupt, yet it is never reconstructed from the remaining healthy replicas.
* Alternatively, add the following test, testMiniClusterCannotReconstructionWhileReplicaAnomaly, to TestBlockManager:
{code:java}
// DN_DIRECTORYSCAN_INTERVAL is assumed to be a small constant (in seconds)
// defined in TestBlockManager so the directory scanner runs quickly.
@Test(timeout = 60000)
public void testMiniClusterCannotReconstructionWhileReplicaAnomaly()
    throws IOException, InterruptedException, TimeoutException {
  Configuration conf = new HdfsConfiguration();
  conf.setInt("dfs.datanode.directoryscan.interval", DN_DIRECTORYSCAN_INTERVAL);
  conf.setInt("dfs.namenode.replication.interval", 1);
  conf.setInt("dfs.heartbeat.interval", 1);
  String src = "/test-reconstruction";
  Path file = new Path(src);
  MiniDFSCluster cluster =
      new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
  try {
    cluster.waitActive();
    FSNamesystem fsn = cluster.getNamesystem();
    BlockManager bm = fsn.getBlockManager();

    // Write a small single-block file with 3 replicas.
    FSDataOutputStream out = null;
    FileSystem fs = cluster.getFileSystem();
    try {
      out = fs.create(file);
      for (int i = 0; i < 1024; i++) {
        out.write(i);
      }
      out.hflush();
    } finally {
      IOUtils.closeStream(out);
    }

    FSDataInputStream in = null;
    ExtendedBlock oldBlock = null;
    try {
      in = fs.open(file);
      oldBlock = DFSTestUtil.getAllBlocks(in).get(0).getBlock();
    } finally {
      IOUtils.closeStream(in);
    }

    // Corrupt the replica on the first datanode by truncating both the
    // block file and its meta file on disk.
    DataNode dn = cluster.getDataNodes().get(0);
    String blockPath =
        dn.getFSDataset().getBlockLocalPathInfo(oldBlock).getBlockPath();
    String metaBlockPath =
        dn.getFSDataset().getBlockLocalPathInfo(oldBlock).getMetaPath();
    Files.write(Paths.get(blockPath), Collections.emptyList());
    Files.write(Paths.get(metaBlockPath), Collections.emptyList());

    // Restart the datanode, then wait for a directory scan and block
    // reports so the namenode notices the corrupt replica.
    cluster.restartDataNode(0, true);
    cluster.waitDatanodeConnectedToActive(dn, 60000);
    while (!dn.isDatanodeFullyStarted()) {
      Thread.sleep(1000);
    }
    Thread.sleep(DN_DIRECTORYSCAN_INTERVAL * 1000);
    cluster.triggerBlockReports();

    // The block is flagged as needing reconstruction...
    BlockInfo bi = bm.getStoredBlock(oldBlock.getLocalBlock());
    assertTrue(bm.isNeededReconstruction(bi,
        bm.countNodes(bi, cluster.getNamesystem().isInStartupSafeMode())));

    // ...but all 3 datanodes already contain a replica, so every one of
    // them is excluded as a target and reconstruction can never proceed.
    BlockReconstructionWork reconstructionWork = null;
    fsn.readLock();
    try {
      reconstructionWork = bm.scheduleReconstruction(bi, 3);
    } finally {
      fsn.readUnlock();
    }
    assertNotNull(reconstructionWork);
    assertEquals(3, reconstructionWork.getContainingNodes().size());
  } finally {
    if (cluster != null) {
      cluster.shutdown();
    }
  }
} {code}
> Bad replicas in a mini cluster cannot be automatically re-replicated
> --------------------------------------------------------------------
>
> Key: HDFS-17798
> URL: https://issues.apache.org/jira/browse/HDFS-17798
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: block placement
> Affects Versions: 3.3.6
> Reporter: kuper
> Assignee: kuper
> Priority: Major