[ https://issues.apache.org/jira/browse/HDFS-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677472#comment-16677472 ]

Tsz Wo Nicholas Sze commented on HDFS-13999:
--------------------------------------------

+1 the 001 patch looks good.

> Bogus missing block warning if the file is under construction when NN starts
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-13999
>                 URL: https://issues.apache.org/jira/browse/HDFS-13999
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: HDFS-13999.branch-2.7.001.patch, webui missing blocks.png
>
>
> We found an interesting case where the web UI displays a few missing blocks
> but doesn't state which files are corrupt. In addition, fsck reports that the
> file system is healthy. This bug is similar to HDFS-10827 and HDFS-8533.
> (See the attachment for an example.)
> Using Dynamometer, I was able to reproduce the bug and realized that the
> "missing" blocks are actually healthy, but neededReplications somehow doesn't
> get updated when the NN receives block reports. What's more interesting is
> that the files associated with the "missing" blocks are under construction
> when the NN starts, so after a while the NN prints file recovery messages.
> Given that, I determined that the following code is the source of the bug:
> {code:java|title=BlockManager#addStoredBlock}
> ....
>     // if file is under construction, then done for now
>     if (bc.isUnderConstruction()) {
>       return storedBlock;
>     }
> {code}
> This is wrong because a file under construction may have multiple blocks, and
> the first block may already be complete. In that case, the neededReplications
> structure doesn't get updated for the first block, hence the missing block
> warning on the web UI. The check should look at the state of the block
> itself, not the file.
> Fortunately, this was unintentionally fixed by HDFS-9754:
> {code:java}
>     // if block is still under construction, then done for now
>     if (!storedBlock.isCompleteOrCommitted()) {
>       return storedBlock;
>     }
> {code}
> We should bring this fix into branch-2.7 too. That said, this is a harmless
> warning, and it should go away once the under-construction files are
> recovered and the NN restarts (or full block reports are forced).
> Kudos to Dynamometer! It would be impossible to reproduce this bug without 
> the tool. And thanks [~smeng] for helping with the reproduction.
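To make the difference between the two early-return checks quoted above concrete, here is a minimal Java sketch (Java 16+, since it uses a record). It is not the HDFS code: `SimpleFile`, `BlockState`, and the two `updatesNeededReplications*` methods are hypothetical stand-ins, and a boolean result stands in for "this block would reach the neededReplications update". It models a file that was open for write when the NN started, whose first block is already complete and whose second block is still being written.

```java
import java.util.List;

// Hypothetical, simplified model (NOT the real HDFS classes) contrasting the two
// early-return checks in BlockManager#addStoredBlock described in this issue.
public class AddStoredBlockSketch {

  // Simplified block states, loosely modeled on HDFS's block UC states.
  enum BlockState { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }

  // A file with an "open for write" flag and the states of its blocks, in order.
  record SimpleFile(boolean underConstruction, List<BlockState> blocks) {}

  // Old branch-2.7 logic: returns early based on the FILE's state, so no block
  // of an under-construction file reaches the neededReplications update.
  static boolean updatesNeededReplicationsOld(SimpleFile file, int blockIndex) {
    // blockIndex is ignored: the decision depends only on the file.
    return !file.underConstruction();
  }

  // HDFS-9754 logic: returns early based on the BLOCK's state (mirroring
  // isCompleteOrCommitted), so complete blocks are still accounted for.
  static boolean updatesNeededReplicationsNew(SimpleFile file, int blockIndex) {
    BlockState state = file.blocks().get(blockIndex);
    return state == BlockState.COMPLETE || state == BlockState.COMMITTED;
  }

  public static void main(String[] args) {
    // File open for write at NN startup: block 0 complete, block 1 still being written.
    SimpleFile file = new SimpleFile(true,
        List.of(BlockState.COMPLETE, BlockState.UNDER_CONSTRUCTION));

    System.out.println("old check, block 0: " + updatesNeededReplicationsOld(file, 0)); // false
    System.out.println("new check, block 0: " + updatesNeededReplicationsNew(file, 0)); // true
    System.out.println("new check, block 1: " + updatesNeededReplicationsNew(file, 1)); // false
  }
}
```

With the file-level check, the complete block 0 is skipped just like the block still being written, which matches the bogus missing block warning described above; with the block-level check, only block 1 is skipped.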



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
