shahrs87 commented on code in PR #6513:
URL: https://github.com/apache/hadoop/pull/6513#discussion_r1479041865


##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java:
##########
@@ -1607,8 +1607,11 @@ private void transfer(final DatanodeInfo src, final DatanodeInfo[] targets,
    * it can be written to.
    * This happens when a file is appended or data streaming fails
    * It keeps on trying until a pipeline is setup
+   *
+   * Returns whether the pipeline was set up successfully or not.
+   * This boolean is used upstream to decide whether to continue creating the pipeline or throw an exception.
    */
-  private void setupPipelineForAppendOrRecovery() throws IOException {
+  private boolean setupPipelineForAppendOrRecovery() throws IOException {

Review Comment:
   We are changing the return type of the `setupPipelineForAppendOrRecovery` and `setupPipelineInternal` methods.
   IIRC this is the reason: `handleBadDatanode` can silently fail to handle a bad datanode [here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1700-L1706), and `setupPipelineInternal` will then silently return [here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1637-L1638) without bubbling up the exception.
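   To illustrate the silent-failure path, here is a minimal, self-contained sketch; the class and method names are simplified stand-ins, not the actual `DataStreamer` code:

```java
import java.io.IOException;

class PipelineSetupSketch {
  // Stand-in for DataStreamer#handleBadDatanode, which can fail without
  // throwing, e.g. when no replacement datanode is available.
  private boolean handleBadDatanode() {
    return false;
  }

  // Before the patch: a void method returns silently, so the caller
  // cannot tell that recovery failed and keeps streaming.
  void setupPipelineVoid() throws IOException {
    if (!handleBadDatanode()) {
      return; // silent failure, no exception bubbles up
    }
    // ... set up the new pipeline ...
  }

  // After the patch: the boolean result lets the caller decide whether
  // to keep going or surface an exception.
  boolean setupPipelineBoolean() throws IOException {
    if (!handleBadDatanode()) {
      return false; // caller can now throw instead of continuing
    }
    // ... set up the new pipeline ...
    return true;
  }
}
```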
   



##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java:
##########
@@ -1618,24 +1621,33 @@ private void setupPipelineForAppendOrRecovery() throws IOException {
       LOG.warn(msg);
       lastException.set(new IOException(msg));
       streamerClosed = true;
-      return;
+      return false;
     }
-    setupPipelineInternal(nodes, storageTypes, storageIDs);
+    return setupPipelineInternal(nodes, storageTypes, storageIDs);
   }
 
-  protected void setupPipelineInternal(DatanodeInfo[] datanodes,
+  protected boolean setupPipelineInternal(DatanodeInfo[] datanodes,
       StorageType[] nodeStorageTypes, String[] nodeStorageIDs)
       throws IOException {
     boolean success = false;
     long newGS = 0L;
+    boolean isCreateStage = BlockConstructionStage.PIPELINE_SETUP_CREATE == stage;
     while (!success && !streamerClosed && dfsClient.clientRunning) {
       if (!handleRestartingDatanode()) {
-        return;
+        return false;
+      }
+
+      final boolean isRecovery = errorState.hasInternalError() && !isCreateStage;
+
+      // During create stage, if we remove a node (nodes.length - 1)
+      //  min replication should still be satisfied.
+      if (isCreateStage && !(dfsClient.dtpReplaceDatanodeOnFailureReplication > 0 &&

Review Comment:
   Reason behind adding this check here:
   We already do this check in the catch block of the `addDatanode2ExistingPipeline` method [here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1528-L1539).
   But when the `isAppend` flag is `false` and we are in the `PIPELINE_SETUP_CREATE` phase, we exit early from the `addDatanode2ExistingPipeline` method [here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1489-L1492), skipping that check. A sketch of this early exit follows.
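   Roughly, the early exit looks like this; this is a simplified, self-contained sketch of the linked trunk code, and `findReplacementAndTransfer` is a hypothetical placeholder:

```java
import java.io.IOException;

class AddDatanodeSketch {
  enum Stage { PIPELINE_SETUP_CREATE, PIPELINE_SETUP_APPEND, DATA_STREAMING }

  // Simplified stand-ins for DataStreamer state.
  private boolean isAppend = false;
  private Stage stage = Stage.PIPELINE_SETUP_CREATE;

  // Hypothetical placeholder for the namenode RPC plus block transfer,
  // which can fail with an IOException.
  private void findReplacementAndTransfer() throws IOException {
    throw new IOException("failed to replace a bad datanode");
  }

  void addDatanode2ExistingPipeline() throws IOException {
    if (!isAppend && stage == Stage.PIPELINE_SETUP_CREATE) {
      // Early return: no replacement is attempted here, so the
      // min-replication check in the catch block below is never reached.
      return;
    }
    try {
      findReplacementAndTransfer();
    } catch (IOException ioe) {
      // This is where the catch block checks
      // dfs.client.block.write.replace-datanode-on-failure.min-replication
      // before deciding whether to rethrow.
      throw ioe;
    }
  }
}
```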
   
   Let's say the replication factor is 3, the config property `dfs.client.block.write.replace-datanode-on-failure.min-replication` is set to 3, and there is one bad node in the pipeline. Even if we have set the replacement policy to `ReplaceDatanodeOnFailure.CONDITION_TRUE`, the code will exit the `addDatanode2ExistingPipeline` method early since `isAppend` is `false` and the stage is `PIPELINE_SETUP_CREATE`. Assuming there are NO available nodes in the rack, the pipeline will succeed with 2 nodes, which violates the config property `dfs.client.block.write.replace-datanode-on-failure.min-replication`.
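   A quick worked example of that scenario (hypothetical values, not taken from the patch):

```java
class MinReplicationScenario {
  public static void main(String[] args) {
    // dfs.client.block.write.replace-datanode-on-failure.min-replication
    int minReplication = 3;
    int pipelineLength = 3; // replication factor
    int badNodes = 1;       // one failed datanode, no replacement found

    int remaining = pipelineLength - badNodes; // 2 nodes left
    // Without the new check, the create-stage pipeline proceeds with 2
    // nodes even though the configured minimum is 3.
    boolean violates = minReplication > 0 && remaining < minReplication;
    System.out.println("min-replication violated: " + violates); // true
  }
}
```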
   
   Having written all of this, I realize that even if there are some good nodes available in the rack, we will exit early after this patch. Should we move this check after the `handleDatanodeReplacement` method? @ritegarg


