[
https://issues.apache.org/jira/browse/HADOOP-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
dhruba borthakur updated HADOOP-2540:
-------------------------------------
Status: Patch Available (was: Open)
> Empty blocks make fsck report corrupt, even when it isn't
> ---------------------------------------------------------
>
> Key: HADOOP-2540
> URL: https://issues.apache.org/jira/browse/HADOOP-2540
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.15.1
> Reporter: Allen Wittenauer
> Assignee: dhruba borthakur
> Priority: Blocker
> Fix For: 0.15.3
>
> Attachments: recoverLastBlock.patch, recoverLastBlock2.patch
>
>
> If the name node crashes after blocks have been allocated but before the
> content has been uploaded, fsck will report the zero-sized files as corrupt
> upon restart:
> /user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808: MISSING 1
> blocks of total size 0 B
> ... even though all blocks are accounted for:
> Status: CORRUPT
> Total size: 2932802658847 B
> Total blocks: 26603 (avg. block size 110243305 B)
> Total dirs: 419
> Total files: 5031
> Over-replicated blocks: 197 (0.740518 %)
> Under-replicated blocks: 0 (0.0 %)
> Target replication factor: 3
> Real replication factor: 3.0074053
> The filesystem under path '/' is CORRUPT
> In UFS and related filesystems, such files would get put into lost+found
> after an fsck and the filesystem would return to normal. It would be
> super if HDFS could do a similar thing. Perhaps if all of the nodes listed
> in the name node's 'includes' file have reported in, HDFS could automatically
> run an fsck and store these not-necessarily-broken files in something like
> lost+found.
> Files that are actually missing blocks, however, should not be touched.
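As a rough sketch of the proposal above, an operator can approximate this by hand today with fsck's existing -move option, which relocates files with corrupt blocks into /lost+found. The ordering below (waiting for all datanodes, then moving, then re-checking) is an assumption about the desired workflow, not the behavior of the attached patches:

```shell
# Assumed manual workflow, not the attached patch's behavior:
# once every datanode in the name node's 'includes' file has
# reported in, sweep the files fsck flags as corrupt aside,
# then re-run fsck to confirm the filesystem is healthy again.
hadoop fsck / -move   # relocates files with corrupt blocks to /lost+found
hadoop fsck /         # re-check the filesystem status
```

Note that -move is a blunt instrument: it would also relocate files that are genuinely missing blocks, which is exactly what the last paragraph says should not happen. An automatic version would need to distinguish zero-length allocated-but-never-written files from files with real data loss.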
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.