[ https://issues.apache.org/jira/browse/HDFS-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804667#comment-14804667 ]
Brandon Li edited comment on HDFS-9092 at 9/17/15 10:51 PM:
------------------------------------------------------------

Thank you, [~yzhangal], for the patch. Could you roughly describe the idea of the fix, possibly by copying and pasting the comment from the code here?

was (Author: brandonli):
Thank you, [~yzhangal] for the patch. Could you roughly describe the idea of the fix?

> Nfs silently drops overlapping write requests, thus data copying can't complete
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-9092
>                 URL: https://issues.apache.org/jira/browse/HDFS-9092
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nfs
>   Affects Versions: 2.7.1
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-9092.001.patch
>
>
> When NOT using the 'sync' option, NFS writes may issue the following warning:
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got an overlapping write (1248751616, 1249677312), nextOffset=1248752400. Silently drop it now
> and the size of the data copied via NFS will stay at 1248752400.
> What happens is:
> 1. The write requests from the client are sent asynchronously.
> 2. The NFS gateway has a handler that processes incoming requests by creating an internal write request structure and putting it into a cache.
> 3. In parallel, a separate thread in the NFS gateway takes requests out of the cache and writes the data to HDFS.
> The current offset is how much data has been written by the write thread in 3.
> The detection of overlapping write requests happens in 2, but it only checks the write request against the current offset, and trims the request if necessary.
> Because the write requests are sent asynchronously, if two requests are beyond the current offset and they overlap, the overlap is not detected and both are put into the cache.
> This causes the symptom reported in this case at step 3.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
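
The gap described in the issue (an incoming request is compared only against the current offset, never against other requests already in the cache) can be illustrated with a small sketch. This is not the code from HDFS-9092.001.patch or from OpenFileCtx. The class `PendingWrites`, its methods, and the offsets used in `main` are hypothetical names and values chosen for illustration, and the sketch assumes byte ranges of the form [start, end) and a cache that never holds overlapping ranges.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the actual NFS gateway code.
class PendingWrites {
    // Cached write ranges keyed by start offset; the value is the exclusive end offset.
    private final TreeMap<Long, Long> pending = new TreeMap<>();
    // How much data the writer thread has already written.
    private final long nextOffset;

    PendingWrites(long nextOffset) {
        this.nextOffset = nextOffset;
    }

    // The check described in the report: compares the request only with nextOffset.
    boolean overlapsWrittenData(long start, long end) {
        return start < nextOffset;
    }

    // The missing check: does [start, end) overlap a range already in the cache?
    // Because cached ranges never overlap each other, only the range with the
    // largest start offset below `end` can overlap the new request.
    boolean overlapsPending(long start, long end) {
        Map.Entry<Long, Long> candidate = pending.lowerEntry(end);
        return candidate != null && candidate.getValue() > start;
    }

    // Adds the range to the cache unless it overlaps a cached range.
    boolean add(long start, long end) {
        if (end <= start) {
            throw new IllegalArgumentException("empty or inverted range");
        }
        if (overlapsPending(start, end)) {
            return false;
        }
        pending.put(start, end);
        return true;
    }

    public static void main(String[] args) {
        PendingWrites cache = new PendingWrites(100L);
        cache.add(200L, 300L);
        // Both requests lie beyond nextOffset, so the first check sees nothing.
        System.out.println("vs. current offset: " + cache.overlapsWrittenData(250L, 350L));
        System.out.println("vs. cached requests: " + cache.overlapsPending(250L, 350L));
    }
}
```

With a sorted map keyed by start offset, the extra check is a single `lowerEntry` lookup per request. What the gateway should then do with an overlapping request (trim it, reject it, or merge it) is a separate design decision that this sketch does not address.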