[
https://issues.apache.org/jira/browse/HADOOP-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669096#action_12669096
]
Sanjay Radia commented on HADOOP-4663:
--------------------------------------
I really like Hairong's suggestion
(https://issues.apache.org/jira/browse/HADOOP-4663?focusedCommentId=12668127#action_12668127)
to *keep* the DN tmp blocks as ongoingCreates and send the special BR to the
NN. This mirrors the NN's inodes-under-construction and lease recovery.
The main issue I had with the older approaches was the inconsistency:
* we added a notion of tmp because we didn't want to send these blocks as
part of a BR to the NN,
* but if the DN restarts, all blocks in tmp are moved to the main directory
and included in the BR anyway.
Hairong's suggestion keeps the semantics of tmp the same across DN reboots.
This is very clean, even though it adds code and a new "blocks under
construction" BR.
Furthermore it allows us to verify that these blocks match those under
construction on the NN side.
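A minimal sketch of the DN side of this proposal, assuming blocks can be modeled by their IDs alone: on restart the DN leaves tmp blocks in tmp, re-registers them as ongoing creates, and reports them in a separate under-construction BR instead of folding them into the regular BR. The class `DatanodeRestartSketch`, the method `onRestart`, and the `Set<Long>` stand-in for ongoingCreates are hypothetical illustrations, not Hadoop's actual FSDataset API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model of a datanode restart under Hairong's proposal.
// Blocks are represented only by their IDs; names are illustrative and
// are NOT Hadoop's actual API.
final class DatanodeRestartSketch {

    // The two reports the datanode would send to the namenode after restarting.
    static final class Reports {
        final List<Long> regularReport;            // finalized blocks (normal BR)
        final List<Long> underConstructionReport;  // special "blocks under construction" BR

        Reports(List<Long> regularReport, List<Long> underConstructionReport) {
            this.regularReport = regularReport;
            this.underConstructionReport = underConstructionReport;
        }
    }

    // Blocks in the main directory go into the normal BR. Blocks in tmp are
    // NOT promoted to the main directory: they are re-registered as ongoing
    // creates and reported separately, so tmp keeps the same meaning before
    // and after the restart.
    static Reports onRestart(List<Long> mainDirBlocks,
                             List<Long> tmpDirBlocks,
                             Set<Long> ongoingCreates) {
        List<Long> regular = new ArrayList<>(mainDirBlocks);
        List<Long> underConstruction = new ArrayList<>();
        for (Long blockId : tmpDirBlocks) {
            ongoingCreates.add(blockId);     // keep tracking it as a write in progress
            underConstruction.add(blockId);  // report it, but not as a finalized block
        }
        Collections.sort(regular);
        Collections.sort(underConstruction);
        return new Reports(regular, underConstruction);
    }
}
```

Because the tmp blocks never enter the regular report, the NN can compare the second list against its own inodes under construction rather than mistaking those blocks for finalized replicas.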
I think our past attempts at sync/append, and at fixing this bug, have been
unsuccessful because we were trying to be too clever.
The problem I have with Dhruba's suggestion is that it retains the
inconsistency I mention above and seems to be trying to avoid the special BR.
If, on a reboot, the DN moves some blocks from tmp to main (after doing the
validations Dhruba suggested), why have them in tmp in the first place?
One could consider never sending this special BR at all. This does not work
because the blocks in tmp may never get cleaned up in some circumstances. For
example, if the NN is restarted from an older fsimage, the tmp files on the
DNs will never be removed.
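To illustrate why the special BR is needed for cleanup, here is a minimal NN-side sketch, assuming block IDs stand in for full block records: any tmp block a DN reports that the NN has no under-construction record for (for example, after the NN restarts from an older fsimage) is stale and can be scheduled for deletion. The class `UnderConstructionReportCheck` and the method `staleTmpBlocks` are hypothetical names, not Hadoop's API.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical namenode-side handling of the special "blocks under
// construction" report. Block IDs stand in for full block records; the
// names are illustrative, not Hadoop's API.
final class UnderConstructionReportCheck {

    // Returns the tmp blocks the datanode reported that the namenode has no
    // under-construction record for (e.g. the namenode was restarted from an
    // older fsimage). The namenode could tell the datanode to delete these;
    // without the special report, nothing would ever reveal that they are stale.
    static Set<Long> staleTmpBlocks(Set<Long> reportedByDatanode,
                                    Set<Long> underConstructionOnNamenode) {
        Set<Long> stale = new TreeSet<>(reportedByDatanode);
        stale.removeAll(underConstructionOnNamenode);
        return stale;
    }
}
```

For example, if a DN reports tmp blocks {5, 9} and the NN only has block 5 under construction, block 9 is returned as stale.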
So +1 for Hairong's suggestion.
> Datanode should delete files under tmp when upgraded from 0.17
> --------------------------------------------------------------
>
> Key: HADOOP-4663
> URL: https://issues.apache.org/jira/browse/HADOOP-4663
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.0
> Reporter: Raghu Angadi
> Assignee: dhruba borthakur
> Priority: Blocker
> Fix For: 0.19.1
>
> Attachments: deleteTmp.patch, deleteTmp2.patch, deleteTmp_0.18.patch,
> handleTmp1.patch
>
>
> Before 0.18, when a Datanode restarts, it deletes files under the data-dir/tmp
> directory since these files are no longer valid. But in 0.18 it moves these
> files to the normal directory, incorrectly making them valid blocks. Either of
> the following would work:
> - remove the tmp files during upgrade, or
> - if the files under tmp are in the pre-0.18 format (i.e. no generation
> stamp), delete them.
> Currently the effect of this bug is that these files end up failing block
> verification and eventually get deleted, but before that they cause incorrect
> over-replication at the namenode.
> Also, it looks like our policy regarding files under tmp needs to be defined
> better. Right now there are probably one or two more bugs with it.
> Dhruba, please file them if you remember.
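The second option in the issue description can be sketched as a file-name check run over tmp during upgrade. This assumes the metadata-file naming convention in which a 0.18 meta file carries a generation stamp (`blk_<id>_<genstamp>.meta`) and a pre-0.18 one does not (`blk_<id>.meta`), with block IDs possibly negative. The class `TmpUpgradeCheck` and the method `isPre18Meta` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical upgrade-time check: a tmp meta file with no generation stamp
// predates 0.18 and should be deleted rather than promoted to a valid block.
// Assumes pre-0.18 meta files are named blk_<id>.meta and 0.18 ones are
// named blk_<id>_<genstamp>.meta.
final class TmpUpgradeCheck {
    // Block IDs may be negative, hence the optional minus sign.
    private static final Pattern PRE_18_META = Pattern.compile("blk_-?\\d+\\.meta");

    // True if the meta file name has no generation stamp, i.e. it is pre-0.18.
    static boolean isPre18Meta(String metaFileName) {
        return PRE_18_META.matcher(metaFileName).matches();
    }
}
```

A real upgrade path would then delete both the block file and its meta file for every tmp entry where this returns true.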