[ 
https://issues.apache.org/jira/browse/HDFS-11486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893474#comment-15893474
 ] 

Andrew Wang commented on HDFS-11486:
------------------------------------

Thanks for investigating this, Wei-Chiu.

I think this is an issue with the NN's completeFile handling. The 
{{hasMinStorage}} check counts only LIVE nodes and excludes 
DECOMMISSION_IN_PROGRESS (D_I_P) nodes. This case doesn't come up often since 
min replication is normally set to 1, and operators typically decommission a 
rack at a time, so there's always at least one replica that's not on a 
D_I_P node.
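
To illustrate the distinction with a self-contained sketch (the names below are 
made up for illustration, not the actual BlockManager code): counting only LIVE 
replicas toward min replication rejects a block whose remaining replicas all sit 
on D_I_P nodes, even though those replicas are still perfectly readable.

{code:java}
// Illustrative sketch only -- these names are invented, not BlockManager's API.
import java.util.List;

enum ReplicaState { LIVE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

class MinStorageSketch {
  // Roughly today's behavior: only LIVE replicas count toward min replication,
  // so completing the block fails when every replica is on a D_I_P node.
  static boolean hasMinLiveStorage(List<ReplicaState> replicas, int minRep) {
    return replicas.stream().filter(r -> r == ReplicaState.LIVE).count() >= minRep;
  }

  // The relaxation being discussed: a D_I_P replica is still readable, so let
  // it satisfy the check when committing/completing the block.
  static boolean hasMinReadableStorage(List<ReplicaState> replicas, int minRep) {
    return replicas.stream()
        .filter(r -> r == ReplicaState.LIVE
                  || r == ReplicaState.DECOMMISSION_IN_PROGRESS)
        .count() >= minRep;
  }
}
{code}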

IMO it's still safe to commit/complete a block even when its replicas are on 
D_I_P nodes. The DecommissionManager does a final block scan before 
transitioning a node from D_I_P to Decommissioned, which will catch 
under-replicated blocks like this one. It'd be even better, though, to add these 
under-replicated blocks to the replication queues at close time.
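
Concretely, the close-time behavior I have in mind is something like the sketch 
below (again just an illustration with invented names, not the real NameNode 
code path): complete the block when live + decommissioning replicas meet min 
replication, but immediately queue re-replication if the live count alone falls 
short.

{code:java}
// Self-contained sketch of the suggested close-time decision; every name here
// is invented for illustration and is not the real NameNode API.
class CloseTimeDecisionSketch {
  enum Action { FAIL_CLOSE, COMPLETE, COMPLETE_AND_QUEUE_REPLICATION }

  static Action onCompleteLastBlock(int liveReplicas, int decommissioningReplicas,
                                    int minReplication) {
    if (liveReplicas + decommissioningReplicas < minReplication) {
      // Data genuinely at risk: keep failing the close as today.
      return Action.FAIL_CLOSE;
    }
    if (liveReplicas < minReplication) {
      // Safe to complete (the replicas are still readable), but the block will
      // be under-replicated once decommissioning finishes, so schedule
      // re-replication now instead of waiting for the final block scan.
      return Action.COMPLETE_AND_QUEUE_REPLICATION;
    }
    return Action.COMPLETE;
  }
}
{code}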

This might also apply to maintenance mode nodes, though I haven't caught up on 
that work.
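
As a stopgap for clients hitting this today, it should also be possible to raise 
the client-side retry budget so close() waits longer than the default ~12 
seconds for the block to become completable. A sketch, assuming the usual client 
key (worth double-checking the exact key name and default against 
hdfs-default.xml for your release; the path below is just an example):

{code:java}
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SlowCloseWorkaround {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed key: the completeFile retry count (default 5, with exponential
    // backoff, ~12s total). Raising it gives the NN more time while replicas
    // sit on decommissioning nodes. Verify the key name for your Hadoop version.
    conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 10);

    try (FileSystem fs = FileSystem.get(conf);
         // Hypothetical path, only to show where the longer retry budget applies.
         OutputStream out = fs.create(new Path("/tmp/decomm-close-example"))) {
      out.write("hello".getBytes(StandardCharsets.UTF_8));
    } // close() here is the call that retries completeFile()
  }
}
{code}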

> Client close() should not fail fast if the last block is being decommissioned
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-11486
>                 URL: https://issues.apache.org/jira/browse/HDFS-11486
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDF-11486.test.patch
>
>
> If a DFS client closes a file while the last block is being decommissioned, 
> the close() may fail if the decommission does not complete within a few 
> seconds.
> When a DataNode is being decommissioned, the NameNode marks the DN's state as 
> DECOMMISSION_INPROGRESS, and blocks with replicas on these DataNodes become 
> under-replicated immediately. A close() call which attempts to complete the 
> last open block will fail if the number of live replicas is below the minimal 
> replication factor, because too many of the block's replicas reside on 
> decommissioning DataNodes.
> The client internally retries completing the last open block up to 5 times 
> by default, which takes roughly 12 seconds in total. After that, close() 
> throws an exception like the following, which is typically not handled 
> properly.
> {noformat}
> java.io.IOException: Unable to close file because the last block 
> BP-33575088-10.0.0.200-1488410554081:blk_1073741827_1003 does not have 
> enough number of replicas.
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:864)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:827)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:793)
>       at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>       at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>       at 
> org.apache.hadoop.hdfs.TestDecommission.testCloseWhileDecommission(TestDecommission.java:708)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
> Once the exception is thrown, the client usually does not attempt to close 
> again, so the file remains open and the last block remains under-replicated.
> Subsequently, an administrator runs the recoverLease tool to salvage the file, 
> but the attempt fails because the block remains under-replicated. It is not 
> clear why the block is never re-replicated. Administrators then assume the 
> file is corrupt, because it still shows as open in fsck -openforwrite and its 
> modification time is hours old.
> In summary, I do not think close() should fail because the last block is 
> being decommissioned. The block has a sufficient number of replicas; it's 
> just that some of them are being decommissioned. Decommissioning should be 
> transparent to clients.
> This issue seems to be more prominent on very large clusters with the minimum 
> replication factor set to 2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
