[jira] [Comment Edited] (HDFS-16322) The NameNode implementation of ClientProtocol.truncate(...) can cause data loss.

nhaorand (Jira) Mon, 15 Nov 2021 04:46:41 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-16322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443803#comment-17443803
 ]


nhaorand edited comment on HDFS-16322 at 11/15/21, 12:45 PM:
-------------------------------------------------------------

Thanks [~hexiaoqiao] and [~szetszwo] for the discussion. Consider the 
following case:
 # The initial length of the file is 100.
 # Client A sends a truncate() request for the file.
 # The NameNode executes the truncate request, changes the file length to 50, 
and responds to Client A.
 # Client B finds that the length of the file is incorrect and requests an 
append to the file. The file length then becomes 100 again.
 # Client A retries the truncate because it did not receive the response. The 
NameNode re-executes it, because there is no RetryCache for truncate requests, 
and then responds to Client A.

Only one client sends a truncate. From the point of view of Client B, however, 
there are two truncates: the newly appended data is lost and the final length 
of the file is not 100. As a fault-tolerance mechanism, a retry should never 
be observable by any individual client. In this case, Client B can observe 
that Client A issued a retry. Therefore, truncate() should be fixed.
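
The five steps above can be replayed with a toy model. This is only a sketch 
under simplifying assumptions: ToyNameNode and TruncateRetryDemo are 
hypothetical names, the "NameNode" tracks nothing but one file length, and 
none of this is HDFS code.

```java
/** Toy replay of the scenario; all names are hypothetical, not HDFS APIs. */
public class TruncateRetryDemo {

    /** Minimal stand-in for a NameNode that tracks only one file's length. */
    static final class ToyNameNode {
        private long length;

        ToyNameNode(long initialLength) {
            this.length = initialLength;
        }

        /** Executes every truncate it receives; it keeps no record of served calls. */
        void truncate(long newLength) {
            if (newLength < length) {
                length = newLength;
            }
        }

        void append(long bytes) {
            length += bytes;
        }

        long getLength() {
            return length;
        }
    }

    /** Replays steps 1 to 5 and returns the final file length. */
    static long runScenario() {
        ToyNameNode nn = new ToyNameNode(100); // step 1: initial length is 100
        nn.truncate(50);  // steps 2-3: Client A's truncate runs; its response is lost
        nn.append(50);    // step 4: Client B appends, length is 100 again
        nn.truncate(50);  // step 5: Client A's retry is executed a second time
        return nn.getLength();
    }

    public static void main(String[] args) {
        System.out.println("final length = " + runScenario()); // prints "final length = 50"
    }
}
```

The final length is 50 rather than 100, so the 50 bytes appended by Client B 
are gone even though Client A only ever intended one truncate.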



> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss.
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-16322
>                 URL: https://issues.apache.org/jira/browse/HDFS-16322
>             Project: Hadoop HDFS
>          Issue Type: Bug
>         Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 
> and Apache Maven 3.6.0. 
> The bug can be reproduced by the testMultipleTruncate() test in the 
> attachment. First, replace the file TestFileTruncate.java under the directory 
> "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/"
>  with the attachment. Then run "mvn test 
> -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate"
>  to run the test case. Finally, the "assertFileLength(p, n+newLength)" 
> assertion at line 199 of TestFileTruncate.java fails, because the retry of 
> truncate() changes the file size and causes data loss.
>            Reporter: nhaorand
>            Priority: Major
>         Attachments: TestFileTruncate.java
>
>
> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss. If dfsclient drops the first response of a truncate RPC call, the retry 
> by retry cache will truncate the file again and cause data loss.
> HDFS-7926 avoids repeated execution of truncate(...) by checking if the file 
> is already being truncated with the same length. However, under concurrency, 
> after the first execution of truncate(...), concurrent requests from other 
> clients may append new data and change the file length. When truncate(...) is 
> retried after that, it will find the file has not been truncated with the 
> same length and truncate it again, which causes data loss.
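
The difference between detecting a duplicate from file state and detecting it 
from the identity of the call can be sketched as follows. This is a toy 
comparison, not HDFS code: truncateByLengthCheck is only a loose 
approximation of the same-length check described above, and CallKey, 
ToyNameNode and the assumption that a retry reuses the same client and call 
identifiers are hypothetical stand-ins rather than the real RetryCache API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy comparison of two duplicate-detection strategies; not HDFS code. */
public class TruncateDedupSketch {

    /** Hypothetical identity of one logical RPC; a retry reuses the same key. */
    static final class CallKey {
        private final String clientId;
        private final int callId;

        CallKey(String clientId, int callId) {
            this.clientId = clientId;
            this.callId = callId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CallKey)) return false;
            CallKey other = (CallKey) o;
            return callId == other.callId && clientId.equals(other.clientId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(clientId, callId);
        }
    }

    /** Stand-in for a NameNode that tracks a single file's length. */
    static final class ToyNameNode {
        private long length;
        private final Map<CallKey, Boolean> served = new HashMap<>();

        ToyNameNode(long initialLength) {
            this.length = initialLength;
        }

        /** State-based check: skip only if the file already has the target length. */
        boolean truncateByLengthCheck(long newLength) {
            if (length == newLength) {
                return false; // looks like a duplicate of an earlier truncate
            }
            if (newLength < length) {
                length = newLength;
                return true;
            }
            return false;
        }

        /** Identity-based check: a call seen before replays its recorded result. */
        boolean truncateByCallKey(CallKey key, long newLength) {
            Boolean cached = served.get(key);
            if (cached != null) {
                return cached; // retry: do not execute again
            }
            boolean changed = newLength < length;
            if (changed) {
                length = newLength;
            }
            served.put(key, changed);
            return changed;
        }

        void append(long bytes) {
            length += bytes;
        }

        long getLength() {
            return length;
        }
    }

    /** Truncate, concurrent append, retried truncate, using the length check. */
    static long lengthCheckScenario() {
        ToyNameNode nn = new ToyNameNode(100);
        nn.truncateByLengthCheck(50);
        nn.append(50);                 // another client restores the length to 100
        nn.truncateByLengthCheck(50);  // retry no longer looks like a duplicate
        return nn.getLength();
    }

    /** Same sequence, but the retry carries the same CallKey as the first attempt. */
    static long callKeyScenario() {
        ToyNameNode nn = new ToyNameNode(100);
        CallKey call = new CallKey("clientA", 1);
        nn.truncateByCallKey(call, 50);
        nn.append(50);
        nn.truncateByCallKey(call, 50); // recognised as a retry and skipped
        return nn.getLength();
    }

    public static void main(String[] args) {
        System.out.println("length check: " + lengthCheckScenario()); // prints "length check: 50"
        System.out.println("call key:     " + callKeyScenario());     // prints "call key:     100"
    }
}
```

In this model the length check fails precisely because the append changes 
the state it inspects, while the call key is unaffected by what other clients 
do in between.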



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
