[ https://issues.apache.org/jira/browse/HDFS-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677472#comment-16677472 ]
Tsz Wo Nicholas Sze commented on HDFS-13999:
--------------------------------------------

+1 the 001 patch looks good.

> Bogus missing block warning if the file is under construction when NN starts
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-13999
>                 URL: https://issues.apache.org/jira/browse/HDFS-13999
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: HDFS-13999.branch-2.7.001.patch, webui missing blocks.png
>
>
> We found an interesting case where the web UI displays a few missing blocks,
> but doesn't state which files are corrupt. At the same time, fsck reports
> that the file system is healthy. This bug is similar to HDFS-10827 and
> HDFS-8533.
> (See the attachment for an example)
> Using Dynamometer, I was able to reproduce the bug, and realized that the
> "missing" blocks are actually healthy, but somehow neededReplications doesn't
> get updated when the NN receives block reports. What's more interesting is
> that the files associated with the "missing" blocks are under construction
> when the NN starts, and so after a while the NN prints a file recovery log.
> Given that, I determined the following code is the source of the bug:
> {code:java|title=BlockManager#addStoredBlock}
> ....
>     // if file is under construction, then done for now
>     if (bc.isUnderConstruction()) {
>       return storedBlock;
>     }
> {code}
> which is wrong, because a file may have multiple blocks, and the first block
> may already be complete. In that case, the neededReplications structure
> doesn't get updated for the first block, hence the missing block warning on
> the web UI. More appropriately, the code should check the state of the block
> itself, not the file.
> Fortunately, it was unintentionally fixed via HDFS-9754:
> {code:java}
>     // if block is still under construction, then done for now
>     if (!storedBlock.isCompleteOrCommitted()) {
>       return storedBlock;
>     }
> {code}
> We should bring this fix into branch-2.7 too. That said, this is a harmless
> warning, and should go away after the under-construction files are recovered
> and the NN restarts (or full block reports are forced).
> Kudos to Dynamometer! It would be impossible to reproduce this bug without
> the tool. And thanks [~smeng] for helping with the reproduction.
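
For readers following the description above, here is a minimal, self-contained sketch of why the file-level check misbehaves for a multi-block file. It is a toy model, not the HDFS code: ToyBlock, ToyFile, BlockState, and both skippedBy... helpers are hypothetical names invented for illustration, and only the two conditions (bc.isUnderConstruction() versus !storedBlock.isCompleteOrCommitted()) come from the snippets quoted in the issue.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy types for illustration only; these are not the HDFS classes.
enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

final class ToyBlock {
    final long id;
    final BlockState state;

    ToyBlock(long id, BlockState state) {
        this.id = id;
        this.state = state;
    }

    // Mirrors the name of the condition used in the HDFS-9754 snippet.
    boolean isCompleteOrCommitted() {
        return state == BlockState.COMPLETE || state == BlockState.COMMITTED;
    }
}

final class ToyFile {
    final List<ToyBlock> blocks;
    final boolean underConstruction; // assumption: modelled as a plain flag

    ToyFile(List<ToyBlock> blocks, boolean underConstruction) {
        this.blocks = blocks;
        this.underConstruction = underConstruction;
    }

    boolean isUnderConstruction() {
        return underConstruction;
    }
}

final class AddStoredBlockSketch {
    // File-level check: every block of an open file returns early, so the
    // replication bookkeeping is skipped even for blocks that are complete.
    static boolean skippedByFileLevelCheck(ToyFile file, ToyBlock block) {
        return file.isUnderConstruction();
    }

    // Block-level check: only a block that is itself unfinished returns early.
    static boolean skippedByBlockLevelCheck(ToyBlock block) {
        return !block.isCompleteOrCommitted();
    }

    public static void main(String[] args) {
        ToyBlock first = new ToyBlock(1L, BlockState.COMPLETE);
        ToyBlock last = new ToyBlock(2L, BlockState.UNDER_CONSTRUCTION);
        ToyFile open = new ToyFile(Arrays.asList(first, last), true);

        for (ToyBlock b : open.blocks) {
            System.out.println("block " + b.id + " (" + b.state + ")"
                + " skipped by file-level check: " + skippedByFileLevelCheck(open, b)
                + ", skipped by block-level check: " + skippedByBlockLevelCheck(b));
        }
    }
}
```

In this model the complete first block of the open file is skipped by the file-level check but not by the block-level one, which matches the symptom described in the issue: a healthy block whose replication state is never refreshed.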