[ 
https://issues.apache.org/jira/browse/HDFS-16322?focusedWorklogId=690449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690449
 ]

ASF GitHub Bot logged work on HDFS-16322:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Dec/21 06:28
            Start Date: 04/Dec/21 06:28
    Worklog Time Spent: 10m 
      Work Description: ayushtkn commented on a change in pull request #3705:
URL: https://github.com/apache/hadoop/pull/3705#discussion_r762392267



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
##########
@@ -1100,20 +1100,29 @@ public void rename2(String src, String dst, Options.Rename... options)
   }
 
   @Override // ClientProtocol
-  public boolean truncate(String src, long newLength, String clientName)
-      throws IOException {
+  public boolean truncate(String src, long newLength, String clientName) throws IOException {

Review comment:
       nit: this is only a formatting change and is unrelated, please avoid it.

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
##########
@@ -1100,20 +1100,29 @@ public void rename2(String src, String dst, Options.Rename... options)
   }
 
   @Override // ClientProtocol
-  public boolean truncate(String src, long newLength, String clientName)
-      throws IOException {
+  public boolean truncate(String src, long newLength, String clientName) throws IOException {
     checkNNStartup();
-    stateChangeLog
-        .debug("*DIR* NameNode.truncate: " + src + " to " + newLength);
+    if(stateChangeLog.isDebugEnabled()) {
+      stateChangeLog.debug("*DIR* NameNode.truncate: " + src + " to " +
+          newLength);
+    }
+    CacheEntryWithPayload cacheEntry = RetryCache.waitForCompletion(retryCache, null);
+    if (cacheEntry != null && cacheEntry.isSuccess()) {
+      return (boolean)cacheEntry.getPayload();
+    }
+
     String clientMachine = getClientMachine();
+    boolean ret = false;
     try {
-      return namesystem.truncate(
+      ret = namesystem.truncate(
           src, newLength, clientName, clientMachine, now());
     } finally {
+      RetryCache.setState(cacheEntry, true, ret);

Review comment:
       The finally block will be executed when an exception is thrown as well, so we cannot hard-code `true` here.
   You can check other handlers, such as `append`, for reference.
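
   For illustration, a minimal sketch of the success-flag pattern used by handlers such as `append` in NameNodeRpcServer (variable names and the exact placement of the metrics call are assumptions, not the final patch):

```java
    boolean ret = false;
    boolean success = false;
    try {
      ret = namesystem.truncate(src, newLength, clientName, clientMachine,
          now());
      // only reached if namesystem.truncate() did not throw
      success = true;
    } finally {
      // on an exception, success stays false, so a later retry is allowed
      // to re-run instead of replaying a bogus "successful" result
      RetryCache.setState(cacheEntry, success, ret);
    }
    metrics.incrFilesTruncated();
    return ret;
```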

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
##########
@@ -1100,20 +1100,29 @@ public void rename2(String src, String dst, Options.Rename... options)
   }
 
   @Override // ClientProtocol
-  public boolean truncate(String src, long newLength, String clientName)
-      throws IOException {
+  public boolean truncate(String src, long newLength, String clientName) throws IOException {
     checkNNStartup();
-    stateChangeLog
-        .debug("*DIR* NameNode.truncate: " + src + " to " + newLength);
+    if(stateChangeLog.isDebugEnabled()) {
+      stateChangeLog.debug("*DIR* NameNode.truncate: " + src + " to " +
+          newLength);
+    }
+    CacheEntryWithPayload cacheEntry = RetryCache.waitForCompletion(retryCache, null);

Review comment:
       I think we should have
   ```java
   namesystem.checkOperation(OperationCategory.WRITE);
   ```
   above this, like the other calls, and remove the corresponding check before the lock in FSNamesystem.
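
   As a rough sketch of the suggested ordering at the top of the handler (context abbreviated from the diff above; this assumes the usual imports of OperationCategory and CacheEntryWithPayload in NameNodeRpcServer):

```java
    checkNNStartup();
    // reject the call on a standby NameNode before consulting the retry
    // cache, matching the other write RPC handlers in this class
    namesystem.checkOperation(OperationCategory.WRITE);
    CacheEntryWithPayload cacheEntry =
        RetryCache.waitForCompletion(retryCache, null);
    if (cacheEntry != null && cacheEntry.isSuccess()) {
      return (boolean) cacheEntry.getPayload();
    }
```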

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
##########
@@ -1100,20 +1100,29 @@ public void rename2(String src, String dst, Options.Rename... options)
   }
 
   @Override // ClientProtocol
-  public boolean truncate(String src, long newLength, String clientName)
-      throws IOException {
+  public boolean truncate(String src, long newLength, String clientName) throws IOException {
     checkNNStartup();
-    stateChangeLog
-        .debug("*DIR* NameNode.truncate: " + src + " to " + newLength);
+    if(stateChangeLog.isDebugEnabled()) {
+      stateChangeLog.debug("*DIR* NameNode.truncate: " + src + " to " +
+          newLength);
+    }
+    CacheEntryWithPayload cacheEntry = RetryCache.waitForCompletion(retryCache, null);
+    if (cacheEntry != null && cacheEntry.isSuccess()) {
+      return (boolean)cacheEntry.getPayload();
+    }
+
     String clientMachine = getClientMachine();
+    boolean ret = false;
     try {
-      return namesystem.truncate(
+      ret = namesystem.truncate(
           src, newLength, clientName, clientMachine, now());
     } finally {
+      RetryCache.setState(cacheEntry, true, ret);
       metrics.incrFilesTruncated();
     }
+    return ret;
   }
-
+  

Review comment:
       nit: unrelated change, please revert.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 690449)
    Time Spent: 0.5h  (was: 20m)

> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss.
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-16322
>                 URL: https://issues.apache.org/jira/browse/HDFS-16322
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>         Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 
> and Apache Maven 3.6.0. 
> The bug can be reproduced with the testMultipleTruncate() test in the 
> attachment. First, replace the file TestFileTruncate.java under the directory 
> "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/"
>  with the attachment. Then run "mvn test 
> -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate"
>  to execute the test case. Finally, the "assertFileLength(p, n+newLength)" at 
> line 199 of TestFileTruncate.java will fail, because the retry of truncate() 
> changes the file size and causes data loss.
>            Reporter: nhaorand
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: TestFileTruncate.java, h16322_20211116.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss. If the DFSClient drops the first response of a truncate RPC call, the 
> retried call will truncate the file again and cause data loss.
> HDFS-7926 avoids repeated execution of truncate(...) by checking whether the 
> file is already being truncated to the same length. However, under 
> concurrency, after the first execution of truncate(...), concurrent requests 
> from other clients may append new data and change the file length. When 
> truncate(...) is retried after that, it finds that the file has not been 
> truncated to the same length and truncates it again, which causes data loss.
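
For illustration only, a simplified client-side sketch of the scenario described above. The path and byte counts are made up, and a dropped RPC response cannot be simulated through the public client API, so this only shows the ordering of operations that leads to the loss:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TruncateRetryScenario {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/truncate-retry-demo"); // hypothetical test file

    // 1. Client A truncates the file to 64 bytes; assume the RPC response
    //    back to the client is lost, so its RPC layer will retry later.
    fs.truncate(p, 64);

    // 2. Before the retry arrives, client B appends 32 bytes,
    //    so the file length is now 96.
    try (FSDataOutputStream out = fs.append(p)) {
      out.write(new byte[32]);
    }

    // 3. The retried truncate(p, 64) is not recognized as a repeat by the
    //    HDFS-7926 length check (96 != 64), so it executes again and
    //    silently discards client B's 32 bytes: the data loss described above.
    fs.truncate(p, 64);
  }
}
```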


