[ https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846807#comment-13846807 ]
Jing Zhao commented on HDFS-5657:
---------------------------------

The solution looks good to me. Some comments:
# The writeAsync method can be deleted instead of commented out.
{code}
+/*
   void writeAsync(OpenFileCtx openFileCtx) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Scheduling write back task for fileId: "
           + openFileCtx.getLatestAttr().getFileId());
@@ -103,7 +103,7 @@ void writeAsync(OpenFileCtx openFileCtx) {
     WriteBackTask wbTask = new WriteBackTask(openFileCtx);
     execute(wbTask);
   }
-
+*/
{code}
# The following section is very similar to the checkAndStartWrite method. You may want to modify checkAndStartWrite and reuse it here.
{code}
+      if (!pendingWrites.isEmpty()
+          && pendingWrites.firstKey().getMin() == nextOffset.get()) {
+        LOG.info("Race happened: a sequential write was just added."
+            + " Start another async task.");
+        asyncStatus = true;
+        try {
+          ads.execute(new AsyncDataService.WriteBackTask(this, ads));
+        } catch (Throwable t) {
+          activeState = false;
+          // Can't set asyncStatus to false since task might be in queue
+          LOG.error("Can't reinvoke async task, fileId=" + fileId, t);
+        }
+      }
{code}

> race condition causes writeback state error in NFS gateway
> ----------------------------------------------------------
>
>                 Key: HDFS-5657
>                 URL: https://issues.apache.org/jira/browse/HDFS-5657
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nfs
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>         Attachments: HDFS-5657.001.patch, HDFS-5657.002.patch
>
>
> A race condition between NFS gateway writeback executor thread and new write handler thread can cause writeback state check failure, e.g.,
> {noformat}
> 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END______957880843
> 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending writes, fileId: 30938
> 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService (AsyncDataService.java:run(136)) - Asyn data service got error:java.lang.IllegalStateException: The openFileCtx has false async status
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
>         at org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current filesize=917504
> 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 917504 length:65536 stableHow:0
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
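
For readers following the second review comment, here is a minimal sketch of what routing both the write handler and the exiting write-back task through a single check could look like. It is not the HDFS code: WriteBackSketch, addWrite, drain, exitTask and tasksStarted are hypothetical names, a TreeMap keyed by start offset stands in for pendingWrites, and a counter stands in for submitting a WriteBackTask to the executor. Only the name checkAndStartWrite is taken from the comment above, and its real signature may differ.

```java
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for OpenFileCtx; not the actual HDFS class. */
class WriteBackSketch {
  // Pending writes keyed by start offset; the value is the write length.
  private final TreeMap<Long, Integer> pendingWrites = new TreeMap<>();
  private long nextOffset = 0;          // next offset that can be written sequentially
  private boolean asyncStatus = false;  // true while a write-back task is considered active
  private int tasksStarted = 0;         // stands in for submitting a task to an executor

  /** Handler path: queue the write, then try to start a write-back task. */
  synchronized boolean addWrite(long offset, int length) {
    pendingWrites.put(offset, length);
    return checkAndStartWrite();
  }

  /** Single shared check, used by both the handler and the exiting task. */
  synchronized boolean checkAndStartWrite() {
    if (asyncStatus) {
      return false; // a task is already active
    }
    if (pendingWrites.isEmpty() || pendingWrites.firstKey() != nextOffset) {
      return false; // nothing sequential to write yet
    }
    asyncStatus = true;
    tasksStarted++; // real code would hand a task to the executor here
    return true;
  }

  /** Task body: consume every write that is sequential with nextOffset. */
  synchronized void drain() {
    while (!pendingWrites.isEmpty() && pendingWrites.firstKey() == nextOffset) {
      nextOffset += pendingWrites.pollFirstEntry().getValue();
    }
  }

  /** Task exit: clear the flag and re-check under the same lock, closing the race window. */
  synchronized boolean exitTask() {
    asyncStatus = false;
    return checkAndStartWrite();
  }

  synchronized int tasksStarted() {
    return tasksStarted;
  }

  public static void main(String[] args) {
    WriteBackSketch ctx = new WriteBackSketch();
    System.out.println(ctx.addWrite(0, 10)); // first sequential write starts a task
    ctx.drain();                             // the task finds no more pending writes
    System.out.println(ctx.addWrite(10, 5)); // arrives while the task is still marked active
    System.out.println(ctx.exitTask());      // the exit path notices the late write
  }
}
```

In this sketch every method holds the same lock, so the flag reset and the re-check cannot interleave with a new write. Real code would perform the I/O outside the lock and would need to handle a failed executor submission, as the patch excerpt does.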