Empty blocks make fsck report the filesystem as corrupt, even when it isn't
---------------------------------------------------------------------------
Key: HADOOP-2540
URL: https://issues.apache.org/jira/browse/HADOOP-2540
Project: Hadoop
Issue Type: Bug
Components: dfs
Affects Versions: 0.15.1
Reporter: Allen Wittenauer
If the name node crashes after blocks have been allocated but before their
content has been written, fsck will report the resulting zero-sized files as
corrupt upon restart:
/user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808: MISSING 1
blocks of total size 0 B
... even though all blocks are accounted for:
Status: CORRUPT
Total size: 2932802658847 B
Total blocks: 26603 (avg. block size 110243305 B)
Total dirs: 419
Total files: 5031
Over-replicated blocks: 197 (0.740518 %)
Under-replicated blocks: 0 (0.0 %)
Target replication factor: 3
Real replication factor: 3.0074053
The filesystem under path '/' is CORRUPT
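The key observation is that a block which was allocated but never written
holds no data, so nothing has actually been lost. A minimal sketch of that
triage, with hypothetical types and names (this is not the real fsck code,
just the distinction the report is drawing):

    /**
     * Illustrative triage for files that fsck flags as having missing
     * blocks. Types and names are hypothetical, not HDFS internals.
     */
    public class FsckTriage {

        /** Minimal stand-in for a per-file fsck result. */
        static class FileCheckResult {
            final String path;
            final long expectedSize;  // size recorded by the name node, in bytes
            final int missingBlocks;  // blocks with no live replicas

            FileCheckResult(String path, long expectedSize, int missingBlocks) {
                this.path = path;
                this.expectedSize = expectedSize;
                this.missingBlocks = missingBlocks;
            }
        }

        enum Verdict { HEALTHY, RECOVERABLE_EMPTY, CORRUPT }

        /**
         * A block that was allocated but never written carried no data, so a
         * zero-size file with missing blocks has lost nothing; a non-zero-size
         * file with missing blocks has genuinely lost data.
         */
        static Verdict classify(FileCheckResult r) {
            if (r.missingBlocks == 0) return Verdict.HEALTHY;
            if (r.expectedSize == 0)  return Verdict.RECOVERABLE_EMPTY;
            return Verdict.CORRUPT;
        }

        public static void main(String[] args) {
            // The case from this report: 1 missing block, 0 B total size.
            FileCheckResult r = new FileCheckResult(
                "/user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808",
                0L, 1);
            System.out.println(r.path + " -> " + classify(r));  // RECOVERABLE_EMPTY
        }
    }
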
In UFS and related filesystems, such files would get put into lost+found after
an fsck and the filesystem would return to normal. It would be super if HDFS
could do a similar thing. Perhaps once all of the nodes listed in the name
node's 'includes' file have reported in, HDFS could automatically run an fsck
and move these not-necessarily-broken files into something like lost+found.
Files that are actually missing blocks, however, should not be touched.
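From the outside, such a sweep might look roughly like the sketch below. It
assumes the classify() step above has already identified a file as safely
empty; the /lost+found path and the "after all includes-file nodes have
reported in" timing follow the suggestion above, but the real fix would live
inside the name node rather than go through the public FileSystem API:

    // Hypothetical sketch of the proposed sweep, via the client-side
    // FileSystem API purely for illustration.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LostFoundSweep {

        private static final Path LOST_FOUND = new Path("/lost+found");

        /**
         * Move a zero-size, missing-block file into /lost+found so fsck
         * stops reporting the filesystem as CORRUPT. Files that are
         * actually missing data must never be passed to this method.
         */
        static void sweep(FileSystem fs, Path file) throws java.io.IOException {
            fs.mkdirs(LOST_FOUND);
            // Flatten the full path into a single name to avoid collisions.
            String name = file.toUri().getPath().replace('/', '_');
            if (!fs.rename(file, new Path(LOST_FOUND, name))) {
                throw new java.io.IOException("could not move " + file);
            }
        }

        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Per the proposal, this would run only after every node in the
            // name node's 'includes' file has reported in.
            sweep(fs, new Path(args[0]));
        }
    }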