[ 
https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243669#comment-13243669
 ] 

VinayaKumar B commented on HDFS-2994:
-------------------------------------

I was able to reproduce the sequence up to the point where recoverLease 
releases the lease because all blocks are COMPLETE, but I was not able to 
reproduce the *replaceNode* failure.

The scenario may be as follows.

1. The client finished writing the last packet to the pipeline and received 
the ack.
2. Before the DNs reported the finalized block, the client's first 
*completeFile* call reached the NN and marked the block as COMPLETE, but the 
lease was not removed because minReplication was not yet satisfied. Suppose 
the client dies at this point.
3. The DNs then report the block, and the BlockMap is updated accordingly.
4. recoverLease is then called on the same file. As part of this, the file is 
finalized and the lease is removed because all blocks are COMPLETE.
5. append is then called on the same file.

In the reported issue, the append fails because of the *replaceNode* failure.
When I tried to reproduce it, however, append successfully reopened the stream.
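
To make the ordering of the five steps easier to follow, here is a minimal toy 
model of the state transitions as described above. It is not HDFS code: 
ToyNameNode, its fields, and all of its methods are hypothetical names chosen 
for illustration, and the model tracks only a single file with a single last 
block. It only restates the sequence in this comment and says nothing about 
why the *replaceNode* failure occurred.

```java
/** Toy model of the five steps above. NOT HDFS code: every name here is hypothetical. */
final class LeaseScenarioSketch {

    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }

    /** Tracks one file with a single last block. */
    static final class ToyNameNode {
        private final int minReplication;
        private BlockState lastBlock = BlockState.UNDER_CONSTRUCTION;
        private int reportedReplicas = 0;
        private String leaseHolder;      // null means no lease is held
        private boolean closed = false;

        ToyNameNode(int minReplication, String writer) {
            this.minReplication = minReplication;
            this.leaseHolder = writer;
        }

        boolean hasLease() { return leaseHolder != null; }
        boolean isClosed() { return closed; }
        BlockState lastBlockState() { return lastBlock; }

        /** Step 2: the block is marked COMPLETE, but the file closes only with enough replicas. */
        boolean completeFile() {
            lastBlock = BlockState.COMPLETE;
            if (reportedReplicas < minReplication) {
                return false;            // lease stays with the (possibly dead) writer
            }
            closeAndReleaseLease();
            return true;
        }

        /** Step 3: a DN reports its finalized replica. */
        void blockReceived() { reportedReplicas++; }

        /** Step 4: recovery closes the file right away when the last block is COMPLETE. */
        boolean recoverLease() {
            if (closed) {
                return true;
            }
            if (lastBlock != BlockState.COMPLETE) {
                return false;
            }
            closeAndReleaseLease();
            return true;
        }

        /** Step 5: append reopens a closed file and hands the lease to the new writer. */
        boolean append(String writer) {
            if (!closed) {
                return false;
            }
            closed = false;
            leaseHolder = writer;
            lastBlock = BlockState.UNDER_CONSTRUCTION;
            return true;
        }

        private void closeAndReleaseLease() {
            closed = true;
            leaseHolder = null;
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(1, "client-1");
        System.out.println("completeFile closed the file: " + nn.completeFile());
        nn.blockReceived();
        System.out.println("recoverLease succeeded: " + nn.recoverLease());
        System.out.println("append reopened the file: " + nn.append("client-2"));
    }
}
```

In this toy model the append in step 5 succeeds, which matches what I observed 
when trying to reproduce the issue.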
                
> If lease is recovered successfully inline with create, create can fail
> ----------------------------------------------------------------------
>
>                 Key: HDFS-2994
>                 URL: https://issues.apache.org/jira/browse/HDFS-2994
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>
> I saw the following logs on my test cluster:
> {code}
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease 
> [Lease.  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client 
> DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* 
> internalReleaseLease: All existing blocks are COMPLETE, lease removed, file 
> closed.
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.startFile: FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> {code}
> It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, 
> then the INode will be replaced with a new one, meaning the later 
> {{replaceNode}} call can fail.
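
The hypothesis in the quoted description can be illustrated with a small 
self-contained sketch. This is a toy model and not the FSDirectory code: 
ToyDirectory, ToyINode, and both startFile variants are hypothetical, and the 
identity check inside replaceNode is an assumption about why a stale reference 
would be rejected. The second variant only shows that re-reading the node 
after recovery avoids the failure in this toy; it is not a claim about how the 
issue was or should be fixed.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration of a stale INode reference. NOT HDFS code: all names are hypothetical. */
final class ReplaceNodeSketch {

    static final class ToyINode {
        final boolean underConstruction;
        ToyINode(boolean underConstruction) { this.underConstruction = underConstruction; }
    }

    static final class ToyDirectory {
        private final Map<String, ToyINode> nodes = new HashMap<>();

        void put(String path, ToyINode node) { nodes.put(path, node); }
        ToyINode get(String path) { return nodes.get(path); }

        /** Assumption: the swap succeeds only if oldNode is still the object stored at path. */
        boolean replaceNode(String path, ToyINode oldNode, ToyINode newNode) {
            if (nodes.get(path) != oldNode) {
                return false;            // caller held a stale reference
            }
            nodes.put(path, newNode);
            return true;
        }

        /** Lease recovery closes the file by swapping in a new, finalized node. */
        void recoverLease(String path) {
            ToyINode current = nodes.get(path);
            if (current != null && current.underConstruction) {
                nodes.put(path, new ToyINode(false));
            }
        }
    }

    /** Looks the node up first, then recovers the lease, then replaces using the old reference. */
    static boolean startFileWithStaleReference(ToyDirectory dir, String path) {
        ToyINode seenBeforeRecovery = dir.get(path);
        dir.recoverLease(path);          // replaces the node behind our back
        return dir.replaceNode(path, seenBeforeRecovery, new ToyINode(true));
    }

    /** Same flow, but the node is looked up again after recovery. */
    static boolean startFileWithFreshLookup(ToyDirectory dir, String path) {
        dir.recoverLease(path);
        ToyINode current = dir.get(path);
        return dir.replaceNode(path, current, new ToyINode(true));
    }

    public static void main(String[] args) {
        String path = "/benchmarks/TestDFSIO/io_data/test_io_6";

        ToyDirectory stale = new ToyDirectory();
        stale.put(path, new ToyINode(true));
        System.out.println("stale reference: " + startFileWithStaleReference(stale, path));

        ToyDirectory fresh = new ToyDirectory();
        fresh.put(path, new ToyINode(true));
        System.out.println("fresh lookup: " + startFileWithFreshLookup(fresh, path));
    }
}
```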


        

Reply via email to