[ https://issues.apache.org/jira/browse/HADOOP-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated HADOOP-2540:
---------------------------------

    Fix Version/s: 0.15.3

I'm guessing this also means that if the client fails while writing, it has to delete the stale file before retrying (sketched after the quoted issue below). If that's the case, I'd like to ask for this to be in 0.15.3.

> Empty blocks make fsck report corrupt, even when it isn't
> ---------------------------------------------------------
>
>                 Key: HADOOP-2540
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2540
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.1
>            Reporter: Allen Wittenauer
>            Assignee: dhruba borthakur
>             Fix For: 0.15.3
>
>         Attachments: recoverLastBlock.patch
>
>
> If the name node crashes after blocks have been allocated but before the content has been uploaded, fsck will report the zero-sized files as corrupt upon restart:
>
>   /user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808: MISSING 1 blocks of total size 0 B
>
> ... even though all blocks are accounted for:
>
>   Status: CORRUPT
>    Total size:    2932802658847 B
>    Total blocks:  26603 (avg. block size 110243305 B)
>    Total dirs:    419
>    Total files:   5031
>    Over-replicated blocks:    197 (0.740518 %)
>    Under-replicated blocks:   0 (0.0 %)
>    Target replication factor: 3
>    Real replication factor:   3.0074053
>   The filesystem under path '/' is CORRUPT
>
> In UFS and related filesystems, such files would be put into lost+found after an fsck, and the filesystem would return to normal. It would be super if HDFS could do a similar thing. Perhaps once all of the nodes listed in the name node's 'includes' file have reported in, HDFS could automatically run an fsck and store these not-necessarily-broken files in something like lost+found.
> Files that are actually missing blocks, however, should not be touched.
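A minimal sketch of the client-side workaround Koji describes: delete the stale zero-size file left by a failed writer before retrying the write. It uses the standard org.apache.hadoop.fs.FileSystem API; the class name, path, and file contents are illustrative and not taken from the issue or the attached patch.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RetryAfterFailedWrite {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path part = new Path("/user/example/part-00000"); // illustrative path

            // A writer that died mid-write can leave a zero-size file whose
            // allocated blocks were never written; delete it before retrying
            // so the namenode does not keep an entry with missing blocks.
            if (fs.exists(part)) {
                fs.delete(part, false); // non-recursive: part is a single file
            }

            // Retry the write from scratch.
            FSDataOutputStream out = fs.create(part);
            try {
                out.writeBytes("retried output\n"); // stand-in for the real data
            } finally {
                out.close();
            }
        }
    }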