Mike, you might want to look at the -move option in fsck:

bash-3.00$ hadoop fsck
Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
        <path>          start checking from this path
        -move           move corrupted files to /lost+found
        -delete         delete corrupted files
        -files          print out files being checked
        -openforwrite   print out files opened for write
        -blocks         print out block report
        -locations      print out locations for every block
        -racks          print out network topology for data-node locations

I never use it myself, since I would rather have users' jobs fail than succeed with incomplete inputs.

Koji
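[For reference, a minimal sketch of that workflow using only the flags from the usage above; the path / is just an example, point it at whatever tree you care about:]

    # scan the namespace and report which files have corrupt or missing blocks
    hadoop fsck / -files -blocks -locations

    # quarantine files containing corrupt blocks by moving them to /lost+found
    hadoop fsck / -move

    # or, if incomplete inputs are worse than missing ones, remove them outright
    hadoop fsck / -delete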
-----Original Message-----
From: Aaron Kimball [mailto:aa...@cloudera.com]
Sent: Thursday, March 26, 2009 9:41 AM
To: core-user@hadoop.apache.org
Subject: Re: corrupt unreplicated block in dfs (0.18.3)

Just because a block is corrupt doesn't mean the entire file is corrupt.
Furthermore, the presence or absence of a file in the namespace is a
completely separate issue from the data in the file. I think it would be a
surprising interface change if files suddenly disappeared just because one
out of potentially many blocks was corrupt.

- Aaron

On Thu, Mar 26, 2009 at 1:21 PM, Mike Andrews <m...@xoba.com> wrote:

> i noticed that when a file with no replication (i.e., replication=1)
> develops a corrupt block, hadoop takes no action aside from the datanode
> throwing an exception to the client trying to read the file. i manually
> corrupted a block in order to observe this.
>
> obviously, with replication=1 it's impossible to fix the block, but i
> thought perhaps hadoop would take some other action, such as deleting the
> file outright, moving it to a "corrupt" directory, or marking it somehow
> to note that there is un-fixable corruption in the filesystem. thus, the
> current behaviour seems to sweep the corruption under the rug and let it
> persist, aside from notifying the specific client doing the read with an
> exception.
>
> if anyone has any information about this issue or how to work around it,
> please let me know.
>
> on the other hand, i tested that corrupting a block in a replication=3
> file causes hadoop to re-replicate the block from another existing copy,
> which is good and is what i expected.
>
> best,
> mike
>
>
> --
> permanent contact information at http://mikerandrews.com
>
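[For anyone who wants to reproduce Mike's experiment, a rough sketch follows. The file name is arbitrary, and the on-disk block path under dfs.data.dir is an assumption that varies per install:]

    # write a file, then drop it to a single replica
    hadoop fs -put local.dat /tmp/norep.dat
    hadoop fs -setrep -w 1 /tmp/norep.dat

    # find the block id and which datanode holds it
    hadoop fsck /tmp/norep.dat -files -blocks -locations

    # on that datanode, flip some bytes in the block file
    # (the path under dfs.data.dir is an assumption; blk_<id> is the id from fsck)
    dd if=/dev/urandom of=/data/dfs/data/current/blk_<id> \
        bs=1 count=64 conv=notrunc

    # reading the file back now fails with a checksum error on the client,
    # and with replication=1 there is no other replica to repair from
    hadoop fs -cat /tmp/norep.dat > /dev/null

[Repeating the same steps with -setrep -w 3 shows the behaviour Mike describes at the end: the corrupt replica is detected on read and re-replicated from one of the surviving copies.]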