[ https://issues.apache.org/jira/browse/HDFS-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934577#comment-14934577 ]

Hudson commented on HDFS-9092:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #1195 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1195/])
HDFS-9092. Nfs silently drops overlapping write requests and causes data copying to fail. Contributed by Yongjun Zhang. (yzhang: rev 151fca5032719e561226ef278e002739073c23ec)
* hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java
* hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OffsetRange.java


> Nfs silently drops overlapping write requests and causes data copying to fail
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-9092
>                 URL: https://issues.apache.org/jira/browse/HDFS-9092
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nfs
>    Affects Versions: 2.7.1
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>             Fix For: 2.8.0
>
>         Attachments: HDFS-9092.001.patch, HDFS-9092.002.patch
>
>
> When NOT using the 'sync' option, NFS writes may issue the following warning:
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got an overlapping write
> (1248751616, 1249677312), nextOffset=1248752400. Silently drop it now
> and the size of data copied via NFS will stay at 1248752400.
> What we found happening is:
> 1. The write requests from the client are sent asynchronously.
> 2. The NFS gateway has a handler that handles the incoming requests by
> creating an internal write request structure and putting it into a cache.
> 3. In parallel, a separate thread in the NFS gateway takes requests out of
> the cache and writes the data to HDFS.
> The current offset is how much data has been written by the write thread in
> 3. The detection of overlapping write requests happens in 2, but it only
> checks the write request against the current offset, and trims the request if
> necessary. Because the write requests are sent asynchronously, if two
> requests are beyond the current offset and they overlap, the overlap is not
> detected and both are put into the cache. This causes the symptom reported in
> this case at step 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
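
The gap described in the issue (a new request is compared only with the current offset, never with requests already waiting in the cache) can be illustrated with a small sketch. This is not the code from the attached patches or from OpenFileCtx, and it rejects overlaps rather than trimming them. The class PendingWriteRanges, its offer method and the Result enum are hypothetical names made up for this example, ranges are assumed to be half-open [start, end), and nextOffset is held fixed because the writer thread is not modelled.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not Hadoop code: tracks byte ranges of cached writes. */
final class PendingWriteRanges {
  enum Result { ACCEPTED, OVERLAPS_WRITTEN, OVERLAPS_PENDING }

  // Start offset (inclusive) -> end offset (exclusive). Accepted ranges never
  // overlap each other, so their end offsets are ordered like their keys.
  private final TreeMap<Long, Long> pending = new TreeMap<>();

  // Assumption: how much data has already been written to HDFS.
  private final long nextOffset;

  PendingWriteRanges(long nextOffset) {
    this.nextOffset = nextOffset;
  }

  /** Classifies the range [start, end) and caches it only if nothing overlaps. */
  Result offer(long start, long end) {
    if (start >= end) {
      throw new IllegalArgumentException("empty or inverted range");
    }
    // Check against the current offset (the check the issue says already exists).
    if (start < nextOffset) {
      return Result.OVERLAPS_WRITTEN;
    }
    // Cached range starting at or before 'start': overlaps if it ends after 'start'.
    Map.Entry<Long, Long> before = pending.floorEntry(start);
    if (before != null && before.getValue() > start) {
      return Result.OVERLAPS_PENDING;
    }
    // Cached range starting at or after 'start': overlaps if it begins before 'end'.
    Map.Entry<Long, Long> after = pending.ceilingEntry(start);
    if (after != null && after.getKey() < end) {
      return Result.OVERLAPS_PENDING;
    }
    pending.put(start, end);
    return Result.ACCEPTED;
  }
}
```

With the numbers from the warning, new PendingWriteRanges(1248752400L).offer(1248751616L, 1249677312L) returns OVERLAPS_WRITTEN, which is the case the existing check covers. Two requests that both start beyond nextOffset and overlap each other would pass that check alone; in this sketch the second one returns OVERLAPS_PENDING instead of being cached silently.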