[ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671560#action_12671560 ]
dhruba borthakur commented on HADOOP-5134:
------------------------------------------
At first I thought that the fix was simple: commitBlockSync should not update
the blocksMap. But this could cause a subtle problem: the above fix could cause
a call to "append" to fail when it should not. Let me try to explain.
Suppose a writer is writing data to a file and dies before closing the file. A
new writer starts and invokes "append" to try to append to this file. This
"append" call triggers lease recovery, and thereby causes the Primary Datanode
to invoke commitBlockSync. Meanwhile, the append call fails (as expected) with
"AlreadyBeingCreatedException". The commitBlockSync call removes the lease but
does not update the block locations of the last block. The datanode(s) that
have the last block send blockReceived to the NN, but before these reach the
NN, the NN starts processing another call to "append". This append will now
find that the file is no longer under construction, but that the last block of
the file has no block locations associated with it. This means that the
"append" call will fail, which is not the expected behaviour.
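The interleaving above can be replayed in a small, self-contained Java model.
This is a hypothetical sketch, not HDFS code: `ToyNameNode`, its fields, and
the status strings returned by `append()` are invented for illustration, and
the model assumes the candidate fix in which commitBlockSync releases the lease
without touching the blocksMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy model (not HDFS code) of the race: commitBlockSync releases
 * the lease without recording locations for the last block, and a second
 * append() may arrive before the datanodes' blockReceived does.
 */
public class AppendRaceSketch {

    /** Hypothetical stand-in for the NameNode's view of a single file. */
    static final class ToyNameNode {
        boolean underConstruction = true;   // file still has an open lease
        final long lastBlockId = 42L;       // id of the file's last block (arbitrary)
        final Map<Long, Set<String>> blocksMap = new HashMap<>();

        /** Models the fix first considered: release the lease, do NOT touch blocksMap. */
        void commitBlockSync(List<String> recoveredDatanodes) {
            underConstruction = false;      // recoveredDatanodes deliberately ignored
        }

        /** Datanode report that eventually adds a location to blocksMap. */
        void blockReceived(String datanode, long blockId) {
            blocksMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
        }

        /** Returns a status string instead of throwing, to keep the sketch small. */
        String append() {
            if (underConstruction) {
                // In the real system this path triggers lease recovery and the
                // caller gets AlreadyBeingCreatedException.
                return "AlreadyBeingCreatedException";
            }
            Set<String> locs = blocksMap.getOrDefault(lastBlockId, Set.of());
            return locs.isEmpty() ? "FAIL: last block has no locations" : "OK";
        }
    }

    /** Replays the interleaving; blockReceivedFirst controls the message ordering. */
    static List<String> replay(boolean blockReceivedFirst) {
        ToyNameNode nn = new ToyNameNode();
        List<String> results = new ArrayList<>();
        results.add(nn.append());                    // 1st append: lease still held
        nn.commitBlockSync(List.of("dn1", "dn2"));   // primary datanode finishes recovery
        if (blockReceivedFirst) {
            nn.blockReceived("dn1", nn.lastBlockId); // locations arrive in time
        }
        results.add(nn.append());                    // 2nd append
        return results;
    }

    public static void main(String[] args) {
        System.out.println("blockReceived late:  " + replay(false));
        System.out.println("blockReceived first: " + replay(true));
    }
}
```

In this model, whether the second append succeeds depends only on whether
blockReceived is processed before it, which is exactly the ordering dependence
described in the comment.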
> FSNamesystem#commitBlockSynchronization adds under-construction block
> locations to blocksMap
> --------------------------------------------------------------------------------------------
>
> Key: HADOOP-5134
> URL: https://issues.apache.org/jira/browse/HADOOP-5134
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.2
> Reporter: Hairong Kuang
> Assignee: dhruba borthakur
> Priority: Blocker
> Fix For: 0.18.4
>
>
> From my understanding of the sync/append design, an under-construction block
> should not have any block locations associated with it in the blocksMap, so
> an under-construction block will not be managed by the ReplicationMonitor.
> However, if there is an error in the write pipeline, lease recovery will
> trigger a call, commitBlockSynchronization, to the NN. This call adds the
> successfully recovered datanodes to the blocksMap, which seems to violate the
> design. It should instead update the targets of the last block in the INode.
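A minimal sketch of the behaviour the issue asks for, assuming a toy model in
which a replication scan only looks at the blocksMap. All names here
(`ToyINodeUnderConstruction`, `targets`, `replicationCandidates`, the
replication factor of 3) are hypothetical and do not correspond to actual
Hadoop signatures. With the reported behaviour, a partially recovered last
block shows up as under-replicated; if the recovered datanodes are recorded on
the INode instead, the scan never sees the block.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Hadoop code): locations of an under-construction
 * last block are kept on the file's INode ("targets") rather than in
 * blocksMap, so a replication scan over blocksMap ignores that block.
 */
public class UnderConstructionTargetsSketch {

    /** Hypothetical stand-in for an INode of a file under construction. */
    static final class ToyINodeUnderConstruction {
        final long lastBlockId;
        List<String> targets = new ArrayList<>();   // pipeline datanodes for the last block

        ToyINodeUnderConstruction(long lastBlockId) {
            this.lastBlockId = lastBlockId;
        }
    }

    final Map<Long, List<String>> blocksMap = new HashMap<>();

    /** Reported behaviour: recovered datanodes go straight into blocksMap. */
    void commitBlockSyncReported(ToyINodeUnderConstruction file, List<String> recovered) {
        blocksMap.put(file.lastBlockId, new ArrayList<>(recovered));
    }

    /** Behaviour the issue asks for: record recovered datanodes on the INode only. */
    void commitBlockSyncProposed(ToyINodeUnderConstruction file, List<String> recovered) {
        file.targets = new ArrayList<>(recovered);
    }

    /** Toy replication scan: blocks in blocksMap with fewer than `replication` locations. */
    List<Long> replicationCandidates(int replication) {
        List<Long> out = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : blocksMap.entrySet()) {
            if (e.getValue().size() < replication) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> recovered = List.of("dn1", "dn2");   // one of three replicas lost

        UnderConstructionTargetsSketch reported = new UnderConstructionTargetsSketch();
        reported.commitBlockSyncReported(new ToyINodeUnderConstruction(7L), recovered);
        System.out.println("reported: " + reported.replicationCandidates(3));

        UnderConstructionTargetsSketch proposed = new UnderConstructionTargetsSketch();
        ToyINodeUnderConstruction file = new ToyINodeUnderConstruction(7L);
        proposed.commitBlockSyncProposed(file, recovered);
        System.out.println("proposed: " + proposed.replicationCandidates(3)
                + ", targets=" + file.targets);
    }
}
```

The sketch deliberately leaves out what happens when the file is later closed
(when locations would normally move into the blocksMap); that is the step the
comment above shows to be racy with a concurrent "append".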
--
This message is automatically generated by JIRA.