Empty blocks make fsck report corrupt, even when it isn't
---------------------------------------------------------
                 Key: HADOOP-2540
                 URL: https://issues.apache.org/jira/browse/HADOOP-2540
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.15.1
            Reporter: Allen Wittenauer


If the name node crashes after blocks have been allocated but before their content has been uploaded, fsck will report the zero-sized files as corrupt upon restart:

/user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808: MISSING 1 blocks of total size 0 B

... even though all blocks are accounted for:

Status: CORRUPT
 Total size:                 2932802658847 B
 Total blocks:               26603 (avg. block size 110243305 B)
 Total dirs:                 419
 Total files:                5031
 Over-replicated blocks:     197 (0.740518 %)
 Under-replicated blocks:    0 (0.0 %)
 Target replication factor:  3
 Real replication factor:    3.0074053

The filesystem under path '/' is CORRUPT

In UFS and related filesystems, fsck would move such files into lost+found and the filesystem would return to normal. It would be super if HDFS could do something similar. Perhaps once all of the nodes listed in the name node's 'includes' file have reported in, HDFS could automatically run a fsck and move these not-necessarily-broken files into something like lost+found. Files that are actually missing blocks, however, should not be touched.
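To make the proposal concrete, here is a minimal, self-contained Java sketch of the suggested check. This is not HDFS's actual NameNode or fsck code; the types and names (BlockInfo, FileEntry, isEmptyAndUnreported, classify) are hypothetical stand-ins for illustration only.

    import java.util.Arrays;
    import java.util.List;

    /**
     * Hypothetical sketch of the proposed fsck behavior for files whose
     * blocks were allocated but never written. Illustrative only; these
     * are not the real HDFS internals.
     */
    public class LostAndFoundSketch {

        /** Minimal stand-in for a block record as fsck might see it. */
        static class BlockInfo {
            final long numBytes;        // declared length of the block
            final int reportedReplicas; // replicas reported by datanodes

            BlockInfo(long numBytes, int reportedReplicas) {
                this.numBytes = numBytes;
                this.reportedReplicas = reportedReplicas;
            }
        }

        /** Minimal stand-in for a file entry in the namespace. */
        static class FileEntry {
            final String path;
            final List<BlockInfo> blocks;

            FileEntry(String path, List<BlockInfo> blocks) {
                this.path = path;
                this.blocks = blocks;
            }
        }

        /**
         * True when every block of the file is zero-length and has no
         * reported replicas: the allocate-then-crash case from this
         * report, where no data was ever written.
         */
        static boolean isEmptyAndUnreported(FileEntry file) {
            for (BlockInfo b : file.blocks) {
                if (b.numBytes > 0 || b.reportedReplicas > 0) {
                    return false; // real data exists (or should exist) somewhere
                }
            }
            return true;
        }

        /**
         * Decide what to do with a file. Only acts once all datanodes in
         * the name node's 'includes' file have reported, so a block with
         * zero replicas is genuinely absent rather than merely late.
         */
        static String classify(FileEntry file, boolean allIncludedNodesReported) {
            if (!allIncludedNodesReported) {
                return "WAIT";                        // too early to judge
            }
            if (isEmptyAndUnreported(file)) {
                return "/lost+found" + file.path;     // park it; filesystem stays healthy
            }
            return "KEEP";                            // files actually missing data are untouched
        }

        public static void main(String[] args) {
            // The crashed-job case from this report: one block, 0 B, never reported.
            FileEntry crashed = new FileEntry(
                "/user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808",
                Arrays.asList(new BlockInfo(0, 0)));
            System.out.println(classify(crashed, true)); // prints the lost+found target
        }
    }

The point of gating on the 'includes' file is to distinguish "replica not yet reported" from "replica genuinely gone"; and because isEmptyAndUnreported rejects any block with a nonzero declared size, files that are actually missing data would still be flagged CORRUPT rather than moved, matching the last sentence above.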