[ https://issues.apache.org/jira/browse/HADOOP-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669096#action_12669096 ]

Sanjay Radia commented on HADOOP-4663:
--------------------------------------

I really like Hairong's suggestion (
https://issues.apache.org/jira/browse/HADOOP-4663?focusedCommentId=12668127#action_12668127)
to *keep* the DN tmp blocks as ongoingCreates and send the special BR to the
NN. This is symmetric to the inodes-under-construction and lease recovery on
the NN side.
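
A minimal sketch of what this could look like on the DN side, assuming
hypothetical names (ongoingCreates as a plain map, blocksUnderConstructionReport
as the special BR); this is illustrative, not the actual FSDataset API:

{code:java}
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Sketch: on restart the DN re-populates ongoingCreates from data-dir/tmp
// instead of promoting those files into the main block directory.
class DataNodeSketch {
  // block id -> tmp file holding the partially written block
  private final Map<Long, File> ongoingCreates = new HashMap<Long, File>();

  void scanTmpOnStartup(File tmpDir) {
    File[] files = tmpDir.listFiles();
    if (files == null) return;
    for (File f : files) {
      String name = f.getName();
      // block files are named "blk_<id>"; skip meta/checksum files here
      if (!name.startsWith("blk_") || name.endsWith(".meta")) continue;
      long blockId = Long.parseLong(name.substring("blk_".length()));
      ongoingCreates.put(blockId, f); // stays under construction, not valid
    }
  }

  // The special "blocks under construction" BR: sent separately from the
  // regular BR so the NN can match these blocks against its
  // inodes-under-construction and drive lease recovery or deletion.
  long[] blocksUnderConstructionReport() {
    long[] ids = new long[ongoingCreates.size()];
    int i = 0;
    for (Long id : ongoingCreates.keySet()) ids[i++] = id;
    return ids;
  }
}
{code}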


The main issue I had with the older approaches was the inconsistency:
* we added the notion of tmp because we didn't want to send these blocks as
part of a BR to the NN,
* but if the DN restarted, all blocks in tmp were moved to the main directory
and included in the BR anyway.

Hairong's suggestion keeps the semantics of tmp the same across reboots of the
DN. This is very clean even though it adds additional code and a new "blocks
under construction" BR.
Furthermore, it allows us to verify that these blocks match those under
construction on the NN side.
Our past attempts at sync/append and at fixing this bug have been unsuccessful
because, I think, we were trying to be too clever.

The problem I have with Dhruba's suggestion is that it retains the
inconsistency I mentioned above and somehow appears to be trying to avoid the
special BR. If, on a reboot, the DN moves some blocks from tmp to the main
directory (after doing the validations Dhruba suggested), why have them in tmp
in the first place?

One could consider never sending this special BR at all. This does not work
because the blocks in tmp may not ever get cleaned in some circumstances. For
example, if an NN is restarted from an older fsimage, the tmp files in the DNs
will never be removed.
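
On the NN side, a minimal sketch of how the special BR could both verify the
match against inodes-under-construction and clean up such orphaned tmp blocks;
the names here are hypothetical, not actual FSNamesystem code:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: the NN checks each block reported as under construction against
// the blocks belonging to its inodes-under-construction.
class UnderConstructionReportHandler {
  private final Set<Long> underConstruction; // block ids from inodes under construction

  UnderConstructionReportHandler(Set<Long> underConstruction) {
    this.underConstruction = underConstruction;
  }

  // Returns the block ids the DN should delete: anything the DN reports as
  // under construction that the NN has no matching inode for (e.g. because
  // the NN was restarted from an older fsimage) can never be recovered and
  // would otherwise sit in tmp forever.
  List<Long> process(long[] reported) {
    List<Long> toDelete = new ArrayList<Long>();
    for (long id : reported) {
      if (!underConstruction.contains(id)) {
        toDelete.add(id);
      }
    }
    return toDelete;
  }
}
{code}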

So +1 for Hairong's suggestion.


> Datanode should delete files under tmp when upgraded from 0.17
> --------------------------------------------------------------
>
>                 Key: HADOOP-4663
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4663
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Raghu Angadi
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.19.1
>
>         Attachments: deleteTmp.patch, deleteTmp2.patch, deleteTmp_0.18.patch, handleTmp1.patch
>
>
> Before 0.18, when the Datanode restarted, it deleted files under the
> data-dir/tmp directory since these files were not valid anymore. But in 0.18
> it moves these files to the normal directory, incorrectly making them valid
> blocks. One of the following would work:
> - remove the tmp files during upgrade, or
> - if the files under /tmp are in the pre-18 format (i.e., no generation
> stamp), delete them.
> Currently the effect of this bug is that these files end up failing block
> verification and eventually get deleted, but they cause incorrect
> over-replication at the namenode before that.
> Also, it looks like our policy regarding the treatment of files under tmp
> needs to be defined better. Right now there are probably one or two more
> bugs with it. Dhruba, please file them if you remember.
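
A minimal sketch of the second option from the description above, under the
assumption (illustrative, not verified against the actual on-disk layout) that
0.18 meta files carry a generation stamp, i.e. blk_<id>_<genstamp>.meta, while
pre-0.18 meta files were blk_<id>.meta:

{code:java}
import java.io.File;
import java.util.regex.Pattern;

// Sketch: during upgrade, delete tmp files that are in the pre-0.18 on-disk
// format (meta file name without a generation stamp).
class TmpUpgradeCleaner {
  // assumed pre-0.18 meta files: "blk_<id>.meta", no "_<genstamp>" component
  private static final Pattern PRE_018_META = Pattern.compile("blk_-?\\d+\\.meta");

  static void deletePre018TmpFiles(File tmpDir) {
    File[] files = tmpDir.listFiles();
    if (files == null) return;
    for (File f : files) {
      String name = f.getName();
      if (PRE_018_META.matcher(name).matches()) {
        // delete the companion block file ("blk_<id>") and the meta file
        String blockName = name.substring(0, name.length() - ".meta".length());
        new File(tmpDir, blockName).delete();
        f.delete();
      }
    }
  }
}
{code}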
