[ https://issues.apache.org/jira/browse/HDFS-16333?focusedWorklogId=689122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689122 ]
ASF GitHub Bot logged work on HDFS-16333: ----------------------------------------- Author: ASF GitHub Bot Created on: 02/Dec/21 09:46 Start Date: 02/Dec/21 09:46 Worklog Time Spent: 10m Work Description: tasanuma commented on a change in pull request #3679: URL: https://github.com/apache/hadoop/pull/3679#discussion_r760910640 ########## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java ########## @@ -848,16 +867,30 @@ private long getBlockList() throws IOException { synchronized (block) { block.clearLocations(); + if (blkLocs instanceof StripedBlockWithLocations) { + // indices may have been adjusted before + ((DBlockStriped) block).setIndices( + ((StripedBlockWithLocations) blkLocs).getIndices()); + } + // update locations + List<Integer> adjustList = new ArrayList<>(); final String[] datanodeUuids = blkLocs.getDatanodeUuids(); final StorageType[] storageTypes = blkLocs.getStorageTypes(); for (int i = 0; i < datanodeUuids.length; i++) { final StorageGroup g = storageGroupMap.get( datanodeUuids[i], storageTypes[i]); if (g != null) { // not unknown block.addLocation(g); + } else if (blkLocs instanceof StripedBlockWithLocations) { + adjustList.add(i); } } + + if (blkLocs instanceof StripedBlockWithLocations) { + // adjust indices if locations has been updated Review comment: Could you please provide more detailed comments on when the locations could be updated and why we need to adjust indices? 
########## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java ########## @@ -1615,6 +1619,92 @@ public void testBalancerWithStripedFile() throws Exception { NameNodeConnector.setWrite2IdFile(false); } + @Test + public void testBalancerWithExcludeListWithStripedFile() throws Exception { + Configuration conf = new Configuration(); + initConfWithStripe(conf); + NameNodeConnector.setWrite2IdFile(true); + doTestBalancerWithExcludeListWithStripedFile(conf); + NameNodeConnector.setWrite2IdFile(false); + } + + private void doTestBalancerWithExcludeListWithStripedFile(Configuration conf) throws Exception { + int numOfDatanodes = dataBlocks + parityBlocks + 3; + int numOfRacks = dataBlocks; + long capacity = 20 * defaultBlockSize; + long[] capacities = new long[numOfDatanodes]; + Arrays.fill(capacities, capacity); + String[] racks = new String[numOfDatanodes]; + for (int i = 0; i < numOfDatanodes; i++) { + racks[i] = "/rack" + (i % numOfRacks); + } + cluster = new MiniDFSCluster.Builder(conf) + .numDataNodes(numOfDatanodes) + .racks(racks) + .simulatedCapacities(capacities) + .build(); + + try { + cluster.waitActive(); + client = NameNodeProxies.createProxy(conf, cluster.getFileSystem(0).getUri(), + ClientProtocol.class).getProxy(); + client.enableErasureCodingPolicy( + StripedFileTestUtil.getDefaultECPolicy().getName()); + client.setErasureCodingPolicy("/", + StripedFileTestUtil.getDefaultECPolicy().getName()); + + long totalCapacity = sum(capacities); + + // fill up the cluster with 30% data. It'll be 45% full plus parity. 
+ long fileLen = totalCapacity * 3 / 10; + long totalUsedSpace = fileLen * (dataBlocks + parityBlocks) / dataBlocks; + FileSystem fs = cluster.getFileSystem(0); + DFSTestUtil.createFile(fs, filePath, fileLen, (short) 3, r.nextLong()); + + // verify locations of striped blocks + LocatedBlocks locatedBlocks = client.getBlockLocations(fileName, 0, fileLen); + StripedFileTestUtil.verifyLocatedStripedBlocks(locatedBlocks, groupSize); + + // get datanode report + DatanodeInfo[] datanodeReport = client.getDatanodeReport(DatanodeReportType.ALL); + + // add datanode in new rack + String newRack = "/rack" + (++numOfRacks); + cluster.startDataNodes(conf, 2, true, null, + new String[]{newRack, newRack}, null, + new long[]{capacity, capacity}); + totalCapacity += capacity*2; + cluster.triggerHeartbeats(); + + // add datanode to exclude list + Set<String> dnList = new HashSet<>(); + dnList.add(datanodeReport[0].getHostName()); + BalancerParameters.Builder pBuilder = new BalancerParameters.Builder(); + pBuilder.setExcludedNodes(dnList); + waitForHeartBeat(totalUsedSpace, totalCapacity, client, cluster); + + // start balancer and check the failed num of moving task + Collection<URI> namenodes = DFSUtil.getInternalNsRpcUris(conf); + final int run = runBalancer(namenodes, pBuilder.build(), conf, true); + if (conf.getInt( + DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_KEY, + DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_DEFAULT) + == 0) { + assertEquals(ExitStatus.NO_MOVE_PROGRESS.getExitCode(), run); + } else { + assertEquals(ExitStatus.SUCCESS.getExitCode(), run); + } + waitForHeartBeat(totalUsedSpace, totalCapacity, client, cluster); Review comment: How about adding another `void runBalancer` method ```diff private void runBalancer(Configuration conf, long totalUsedSpace, long totalCapacity, BalancerParameters p, int excludedNodes) throws Exception { + runBalancer(conf, totalUsedSpace, totalCapacity, p, excludedNodes, false); + } + + private void 
runBalancer(Configuration conf, long totalUsedSpace, + long totalCapacity, BalancerParameters p, int excludedNodes, boolean checkFailedNum) + throws Exception { waitForHeartBeat(totalUsedSpace, totalCapacity, client, cluster); ``` and just calling it from the unit test? ```suggestion runBalancer(namenodes, pBuilder.build(), conf, true); ``` ########## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java ########## @@ -532,6 +532,25 @@ public long getNumBytes(StorageGroup storage) { } return block.getNumBytes(); } + + public void setIndices(byte[] indices) { + this.indices = indices; + } + + public void adjustIndices(List<Integer> list) { Review comment: Please provide comments on what this method does. ########## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java ########## @@ -1615,6 +1619,92 @@ public void testBalancerWithStripedFile() throws Exception { NameNodeConnector.setWrite2IdFile(false); } + @Test + public void testBalancerWithExcludeListWithStripedFile() throws Exception { + Configuration conf = new Configuration(); + initConfWithStripe(conf); + NameNodeConnector.setWrite2IdFile(true); + doTestBalancerWithExcludeListWithStripedFile(conf); + NameNodeConnector.setWrite2IdFile(false); + } + + private void doTestBalancerWithExcludeListWithStripedFile(Configuration conf) throws Exception { + int numOfDatanodes = dataBlocks + parityBlocks + 3; + int numOfRacks = dataBlocks; + long capacity = 20 * defaultBlockSize; + long[] capacities = new long[numOfDatanodes]; + Arrays.fill(capacities, capacity); + String[] racks = new String[numOfDatanodes]; + for (int i = 0; i < numOfDatanodes; i++) { + racks[i] = "/rack" + (i % numOfRacks); + } + cluster = new MiniDFSCluster.Builder(conf) + .numDataNodes(numOfDatanodes) + .racks(racks) + .simulatedCapacities(capacities) + .build(); + + try { + cluster.waitActive(); + client = 
NameNodeProxies.createProxy(conf, cluster.getFileSystem(0).getUri(), + ClientProtocol.class).getProxy(); + client.enableErasureCodingPolicy( + StripedFileTestUtil.getDefaultECPolicy().getName()); + client.setErasureCodingPolicy("/", + StripedFileTestUtil.getDefaultECPolicy().getName()); + + long totalCapacity = sum(capacities); + + // fill up the cluster with 30% data. It'll be 45% full plus parity. + long fileLen = totalCapacity * 3 / 10; + long totalUsedSpace = fileLen * (dataBlocks + parityBlocks) / dataBlocks; + FileSystem fs = cluster.getFileSystem(0); + DFSTestUtil.createFile(fs, filePath, fileLen, (short) 3, r.nextLong()); + + // verify locations of striped blocks + LocatedBlocks locatedBlocks = client.getBlockLocations(fileName, 0, fileLen); + StripedFileTestUtil.verifyLocatedStripedBlocks(locatedBlocks, groupSize); + + // get datanode report + DatanodeInfo[] datanodeReport = client.getDatanodeReport(DatanodeReportType.ALL); + + // add datanode in new rack + String newRack = "/rack" + (++numOfRacks); + cluster.startDataNodes(conf, 2, true, null, + new String[]{newRack, newRack}, null, + new long[]{capacity, capacity}); + totalCapacity += capacity*2; + cluster.triggerHeartbeats(); + + // add datanode to exclude list + Set<String> dnList = new HashSet<>(); + dnList.add(datanodeReport[0].getHostName()); + BalancerParameters.Builder pBuilder = new BalancerParameters.Builder(); + pBuilder.setExcludedNodes(dnList); + waitForHeartBeat(totalUsedSpace, totalCapacity, client, cluster); + + // start balancer and check the failed num of moving task + Collection<URI> namenodes = DFSUtil.getInternalNsRpcUris(conf); + final int run = runBalancer(namenodes, pBuilder.build(), conf, true); + if (conf.getInt( + DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_KEY, + DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_DEFAULT) + == 0) { + assertEquals(ExitStatus.NO_MOVE_PROGRESS.getExitCode(), run); + } else { + assertEquals(ExitStatus.SUCCESS.getExitCode(), 
run); + } + waitForHeartBeat(totalUsedSpace, totalCapacity, client, cluster); + + // verify locations of striped blocks + locatedBlocks = client.getBlockLocations(fileName, 0, fileLen); + StripedFileTestUtil.verifyLocatedStripedBlocks(locatedBlocks, groupSize); Review comment: I did the unit test multiple times, and sometimes the length of the `locatedBlocks` is larger than `groupSize`, and the verification failed. Could you check it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 689122) Time Spent: 1.5h (was: 1h 20m) > fix balancer bug when transfer an EC block > ------------------------------------------ > > Key: HDFS-16333 > URL: https://issues.apache.org/jira/browse/HDFS-16333 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover > Reporter: qinyuren > Assignee: qinyuren > Priority: Major > Labels: pull-request-available > Attachments: image-2021-11-18-17-25-13-089.png, > image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png > > Time Spent: 1.5h > Remaining Estimate: 0h > > We set the EC policy to (6+3) and we also have nodes that were > decommissioning when we executed the balancer. > With the balancer running, we find many error logs as follows. > !image-2021-11-18-17-25-13-089.png|width=858,height=135! > Node A wants to transfer an EC block to node B, but we found that the block > is not on node A. The FSCK command shows the block status as follows: > !image-2021-11-18-17-25-50-556.png|width=607,height=189! > In the Dispatcher.getBlockList function: > !image-2021-11-18-17-28-03-155.png!
> > Assume that the location of an EC block in storageGroupMap looks like this > indices:[0, 1, 2, 3, 4, 5, 6, 7, 8] > node:[a, b, c, d, e, f, g, h, i] > after a decommission operation, the internal block at indices[1] was > decommissioned to another node. > indices:[0, 1, 2, 3, 4, 5, 6, 7, 8] > node:[a, {color:#FF0000}j{color}, c, d, e, f, g, h, i] > the location of indices[1] changes from node {color:#FF0000}b{color} to node > {color:#FF0000}j{color}. > > When the balancer gets the block location, it checks it against the location in > storageGroupMap. > If a node is not found in storageGroupMap, it will not be added to the block > locations. > In this case, node {color:#FF0000}j {color}will not be added to the block > locations, while the indices are not updated. > Finally, the block location may look like this, > indices:[0, 1, 2, 3, 4, 5, 6, 7, 8] > {color:#FF0000}block.location:[a, c, d, e, f, g, h, i]{color} > the locations of the nodes do not match their indices > > Solution: > we should update the indices to match the nodes > {color:#FF0000}indices:[0, 2, 3, 4, 5, 6, 7, 8]{color} > {color:#FF0000}block.location:[a, c, d, e, f, g, h, i]{color} -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org