jack created HDFS-6592:
--------------------------

             Summary: Appending data collected by Fluent to HDFS throws 
AlreadyBeingCreatedException
                 Key: HDFS-6592
                 URL: https://issues.apache.org/jira/browse/HDFS-6592
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.3.0
            Reporter: jack


We use Fluent to collect log data; the log data are appended to files in HDFS.
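
For reference, here is a minimal Java sketch of the append sequence the NameNode sees on every flush (the path and record are hypothetical; Fluent itself may go through a plugin such as WebHDFS, but the lease semantics on the NameNode side are the same):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendLogChunk {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/logs/file_nameXXXXX"); // hypothetical path
            // append() acquires the file's lease from the NameNode
            try (FSDataOutputStream out = fs.append(file)) {
                out.write("one log record\n".getBytes("UTF-8"));
                out.hflush(); // make the record visible to readers
            } // close() must complete the last block, otherwise the lease stays held
        }
    }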

The cluster configuration:

NameNode: namenode1 (hostname)
Secondary NameNode: namenode2
3 DataNodes: datanode1, datanode2, datanode3
Replication factor: 3

Every few days, we run into the following exceptions:

Exception on namenode1:

2014-06-22 09:54:41,892 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: Failed to create file [file_nameXXXXX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1]
2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_nameXXXXX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1]

Exception on datanode1:

2014-06-22 09:54:45,771 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
2014-06-22 09:54:45,813 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [file_nameXXXXX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2441)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2277)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2505)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2468)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:516)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:340)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

According to the logs, we infer the following sequence of events:

1. The NameNode updates the pipeline down to a single datanode.

        namenode1 log: 2014-06-22 09:54:16,604 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline (block=BP-1611177164-datanode1-1399894698024:blk_1074496235_1935947, newGenerationStamp=1935951, newLength=98839816, newNodes=[datanode1:50010], clientName=DFSClient_NONMAPREDUCE_349196146_2027206)

2. The DFSClient on datanode1 cannot complete the file during close and keeps retrying (see the retry-limit sketch after the log excerpt below).

        datanode1 log: 2014-06-22 09:54:26,569 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file_name retrying...
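
If we read DFSOutputStream.completeFile() correctly, this retry loop is bounded by dfs.client.block.write.locateFollowingBlock.retries (default 5), after which the client gives up with "Unable to close file because the last block does not have enough number of replicas." Raising it on the client side might ride out a slow re-replication (a mitigation sketch only, not yet tested on our cluster; reuses the imports from the first sketch):

    Configuration conf = new Configuration();
    // give completeFile() more attempts before close() fails
    conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 10);
    FileSystem fs = FileSystem.newInstance(conf); // fresh instance so the setting applies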

3. Data subsequently collected by Fluent triggers another DFSClient to append to the same file (a repro sketch follows the log excerpt below).

        namenode1 log: 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_name] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [datanode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [datanode1]
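
This race is easy to reproduce by hand with two independent DFSClients (a minimal repro sketch reusing the imports above; FileSystem.newInstance() bypasses the FileSystem cache, so each call gets its own DFSClient and lease-holder name):

    Configuration conf = new Configuration();
    FileSystem fs1 = FileSystem.newInstance(conf);
    FileSystem fs2 = FileSystem.newInstance(conf);
    Path file = new Path("/logs/file_nameXXXXX"); // hypothetical path
    FSDataOutputStream out1 = fs1.append(file);   // first client takes the lease
    // second append while the lease is live -> AlreadyBeingCreatedException
    FSDataOutputStream out2 = fs2.append(file);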

4. The subsequent DFSClient triggers lease recovery after each LEASE_SOFTLIMIT_PERIOD (one minute).
        
        namenode1 log: 2014-06-22 09:58:34,722 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover [Lease.  Holder: DFSClient_NONMAPREDUCE_349196146_2027206, pendingcreates: 1], src=file_name client DFSClient_NONMAPREDUCE_349196146_2027206

5. Lease recovery fails because the committed block is still waiting to be minimally replicated.

        namenode1 log:

        2014-06-22 09:58:34,722 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file file_name. Committed blocks are waiting to be minimally replicated. Try again later.

        datanode1 log:

        2014-06-22 09:56:15,149 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): DIR* NameSystem.internalReleaseLease: Failed to release lease for file file_name. Committed blocks are waiting to be minimally replicated. Try again later.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3722)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2429)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2277)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2505)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2468)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:516)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:340)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
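
While the file is stuck in this state, the block-location API can confirm that the last block lives on only one datanode, matching the updatePipeline in step 1 (a diagnostic sketch; add org.apache.hadoop.fs.FileStatus and org.apache.hadoop.fs.BlockLocation to the imports):

    FileStatus st = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
    BlockLocation last = blocks[blocks.length - 1];
    // we would expect a single host here, e.g. only datanode1
    System.out.println("last block hosts: " + java.util.Arrays.toString(last.getHosts()));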

6. The LEASE_HARDLIMIT_PERIOD (one hour) expires.

        namenode1 log: WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: failed to create file file_name for DFSClient on client datanode1 because pendingCreates is non-null but no leases found.
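
Until the root cause is clear, a possible client-side workaround is to explicitly trigger lease recovery before re-opening the file for append, instead of waiting out the soft/hard limits (a sketch only, not verified on our cluster; recovery can still be held up while committed blocks await minimal replication, as in step 5; add org.apache.hadoop.hdfs.DistributedFileSystem to the imports):

    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    Path file = new Path("/logs/file_nameXXXXX");  // hypothetical path
    boolean closed = dfs.recoverLease(file);       // ask the NameNode to start recovery
    while (!closed) {
        Thread.sleep(1000);                        // recovery completes asynchronously
        closed = dfs.isFileClosed(file);
    }
    FSDataOutputStream out = dfs.append(file);     // safe to re-open for append now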

We found that HDFS-4504 may be related to our issue; however, this still needs to be verified.


