[ https://issues.apache.org/jira/browse/HDFS-12142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087565#comment-16087565 ]

Kihwal Lee commented on HDFS-12142:
-----------------------------------

The following appears after the file is successfully closed. It seems the 
DataStreamer is sometimes left running, and the regular pipeline shutdown is 
somehow treated as a failure.

{noformat}
2017-07-10 20:19:11,870 [IPC Server handler 72 on 8020] INFO ipc.Server: IPC Server handler 72 on 8020, call Call#99 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.updateBlockForPipeline from x.x.x.x:50972
java.io.IOException: Unexpected BlockUCState: BP-yyy:blk_12300000_10000 is COMPLETE but not UNDER_CONSTRUCTION
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:5509)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:5576)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:918)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:971)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:999)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:881)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:810)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2523)
{noformat}
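
The exception comes from the NN-side state check in checkUCBlock, which updateBlockForPipeline calls before setting up pipeline recovery. Roughly, and this is only a paraphrase with a simplified signature rather than the literal FSNamesystem code (which also does other validation, e.g. on the client's lease), any block that is no longer under construction is rejected:

{code:java}
// Rough paraphrase of the state check behind the logged exception; the
// signature is simplified and is not the actual FSNamesystem.checkUCBlock.
static void checkUCBlock(String blockId, String ucState) throws java.io.IOException {
  if (!"UNDER_CONSTRUCTION".equals(ucState)) {
    // The block is already COMPLETE, so pipeline recovery must not be started.
    throw new java.io.IOException("Unexpected BlockUCState: " + blockId
        + " is " + ucState + " but not UNDER_CONSTRUCTION");
  }
}
{code}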

The blocks were all finalized normally and there was no data loss, but until we 
know the actual cause of this, I can't be sure whether it can cause data loss.

> Files may be closed before streamer is done
> -------------------------------------------
>
>                 Key: HDFS-12142
>                 URL: https://issues.apache.org/jira/browse/HDFS-12142
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>
> We're encountering multiple cases of clients calling updateBlockForPipeline 
> on completed blocks.  Initial analysis suggests the client closes a file, 
> completeFile succeeds, and then the client immediately attempts pipeline 
> recovery.  The exception is swallowed on the client and only logged on the 
> NN by checkUCBlock.
> The problem "appears" to be benign (no data loss), but it's unproven whether 
> the issue always occurs for successfully closed files.  There appears to be 
> very poor coordination between the dfs output stream's threads, leading to 
> races that confuse the streamer thread, which probably should have been 
> joined before returning from close.
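
For reference, the coordination the description suggests would look roughly like the sketch below: close() signals the streamer thread and joins it before completing the file, so recovery can no longer be attempted once the block is COMPLETE. The class and method names here are illustrative only, not the actual DFSOutputStream/DataStreamer code.

{code:java}
// Illustrative sketch only; names and structure are simplified and are not
// the real DFSOutputStream/DataStreamer implementation.
class SketchOutputStream {
  private final Thread streamer;              // background thread draining the data queue
  private volatile boolean shouldStop = false;

  SketchOutputStream() {
    streamer = new Thread(() -> {
      while (!shouldStop) {
        // send packets to the pipeline; an error observed here is what would
        // normally trigger pipeline recovery (updateBlockForPipeline)
      }
    }, "DataStreamer");
    streamer.start();
  }

  // Racy close resembling the reported behavior: the file is completed while
  // the streamer may still be running, so a late teardown can look like a
  // pipeline failure and trigger recovery on an already COMPLETE block.
  void closeWithoutJoin() {
    shouldStop = true;
    // completeFile(...) would be called here without waiting for the streamer
  }

  // Close as the description suggests: join the streamer first, so recovery
  // cannot be attempted after completeFile(...) marks the block COMPLETE.
  void closeWithJoin() throws InterruptedException {
    shouldStop = true;
    streamer.join();
    // completeFile(...) only after the streamer thread has fully exited
  }
}
{code}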


