[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414580#comment-13414580 ]

Uma Maheswara Rao G commented on HDFS-3605:
-------------------------------------------

 Thanks a lot, Todd, for the patch.

I have taken a quick look at the patch. Yes, this approach should work as well. 
Blocks will get processed for all the ops, so the ones matching the current 
genStamp will get processed in the current iteration and the future ones will 
get postponed again.

A few comments on the patch. I did not check for javadoc issues, since you 
already mentioned you will still work on the javadocs.

{quote}
+      out.writeBytes("/data");
+      
+      // TODO: why do we need an hflush for this test case to fail?
{quote}
If I remember correctly, this was added just to ensure that the current packet 
gets enqueued and the block gets allocated.
Otherwise, content smaller than 64K may not be flushed at that point.
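
Just to make that concrete, here is a rough sketch of the pattern (assuming the usual MiniDFSCluster test setup; the path and replication count are illustrative, not from the patch):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

// Sketch only: a small write stays inside the client's current 64K packet,
// so nothing reaches the pipeline until hflush() enqueues that packet and
// the block gets allocated on the NameNode.
Configuration conf = new Configuration();
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
try {
  FileSystem fs = cluster.getFileSystem();
  FSDataOutputStream out = fs.create(new Path("/test-file")); // illustrative path
  out.writeBytes("/data");  // well under 64K, so it stays buffered client-side
  out.hflush();             // forces the current packet out; the block exists now
  // ... the rest of the test can proceed knowing the block has been allocated
} finally {
  cluster.shutdown();
}
{code}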

{quote}
DFSTestUtil.appendFile(fs, fileToAppend, "data");
{quote}
Having multiple append calls would give regression coverage for the case where 
we have many genstamps: the ops get processed in order and the future ones get 
postponed.
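
Something along these lines is what I had in mind (rough sketch only; the file name and loop count are arbitrary, and each append bumps the block's generation stamp):

{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;

// Sketch only: several appends before the next edit log roll leave the
// standby with a mix of genstamps, so the ops matching the current genstamp
// get processed while the later ones are postponed.
Path fileToAppend = new Path("/file-to-append");         // illustrative name
DFSTestUtil.createFile(fs, fileToAppend, 1024, (short) 3, 0L);
for (int i = 0; i < 3; i++) {
  DFSTestUtil.appendFile(fs, fileToAppend, "data");      // new genstamp each time
}
{code}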

{quote}
// Wait till DN reports blocks
+      cluster.triggerBlockReports();
{quote}
Does the comment need to be updated?

{quote}
shouldPostponeInvalidBlocks  
{quote}
Do we need to change the variable name, since the blocks are not declared 
invalid yet?

I will take a deeper look at the patch again tomorrow. (Not able to concentrate 
much, as I am traveling today.)


                
> Block mistakenly marked corrupt during edit log catchup phase of failover
> -------------------------------------------------------------------------
>
>                 Key: HDFS-3605
>                 URL: https://issues.apache.org/jira/browse/HDFS-3605
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, name-node
>    Affects Versions: 2.0.0-alpha, 2.0.1-alpha
>            Reporter: Brahma Reddy Battula
>            Assignee: Todd Lipcon
>         Attachments: HDFS-3605.patch, TestAppendBlockMiss.java, hdfs-3605.txt
>
>
> Open a file for append.
> Write data and sync.
> After the next log roll and editlog tailing on the standby NN, close the append stream.
> Call append multiple times on the same file before the next editlog roll.
> Now abruptly kill the current active NameNode.
> Here the block is missed.
> This may be because all the latest blocks were queued in the standby NameNode. 
> During failover, the first OP_CLOSE processed the pending queue and added 
> the block to the corrupt blocks. 


        
