Re: corrupt unreplicated block in dfs (0.18.3)

2009-03-26 Thread Aaron Kimball
Just because a block is corrupt doesn't mean the entire file is corrupt.
Furthermore, the presence or absence of a file in the namespace is a completely
separate issue from the data in the file. I think it would be a surprising
interface change if files suddenly disappeared just because 1 out of
potentially many blocks was corrupt.

- Aaron

On Thu, Mar 26, 2009 at 1:21 PM, Mike Andrews m...@xoba.com wrote:

 I noticed that when a file with no replication (i.e., replication=1)
 develops a corrupt block, Hadoop takes no action aside from the
 datanode throwing an exception to the client trying to read the file.
 I manually corrupted a block in order to observe this.
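 Roughly, one way to do this kind of test by hand (the paths, file name,
 and block id below are only examples; dfs.data.dir will differ per
 cluster):

   # write a small file and force its replication down to 1
   hadoop fs -put test.txt /tmp/test.txt
   hadoop fs -setrep -w 1 /tmp/test.txt

   # find which block backs the file and which datanode holds it
   hadoop fsck /tmp/test.txt -files -blocks -locations

   # on that datanode, overwrite part of the block file under dfs.data.dir
   # (blk_1234567890 is a placeholder; use the id printed by fsck)
   dd if=/dev/zero of=/data/dfs/current/blk_1234567890 bs=1 count=64 conv=notrunc

   # a subsequent read now fails with a checksum exception
   hadoop fs -cat /tmp/test.txt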

 Obviously, with replication=1 it's impossible to fix the block, but I
 thought perhaps Hadoop would take some other action, such as deleting
 the file outright, moving it to a corrupt directory, or marking
 it or keeping track of it somehow to note that there's un-fixable
 corruption in the filesystem. As it stands, the current behaviour seems to
 sweep the corruption under the rug and allow it to persist,
 aside from notifying the specific client doing the read with an
 exception.

 If anyone has any information about this issue or how to work around
 it, please let me know.

 On the other hand, I tested that corrupting a block in a replication=3
 file causes Hadoop to re-replicate the block from another existing
 copy, which is good and is what I expected.
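 A sketch of how that check might go (the file name is again just an
 example; re-replication happens once the bad replica has been reported
 to the namenode):

   # file kept at replication 3
   hadoop fs -setrep -w 3 /tmp/test3.txt

   # corrupt one replica on one datanode as above, then read the file;
   # the client fails over to a healthy replica and the read succeeds
   hadoop fs -cat /tmp/test3.txt

   # once the bad replica has been noticed and replaced, fsck reports
   # three healthy locations for the block again
   hadoop fsck /tmp/test3.txt -blocks -locations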

 best,
 mike


 --
 permanent contact information at http://mikerandrews.com



RE: corrupt unreplicated block in dfs (0.18.3)

2009-03-26 Thread Koji Noguchi
Mike, you might want to look at the -move option in fsck.

bash-3.00$ hadoop fsck
Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks
[-locations | -racks]]]
<path>  start checking from this path
-move   move corrupted files to /lost+found
-delete delete corrupted files
-files  print out files being checked
-openforwrite   print out files opened for write
-blocks print out block report
-locations  print out locations for every block
-racks  print out network topology for data-node locations
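
For example, a typical invocation might look like this (the path below is
just an example):

bash-3.00$ hadoop fsck /user/mike           # report health of that tree
bash-3.00$ hadoop fsck /user/mike -move     # move corrupt files to /lost+found
bash-3.00$ hadoop fsck /user/mike -delete   # or delete them outright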



I never use it myself, since I would rather have users' jobs fail than have
them succeed with incomplete inputs.

Koji


-Original Message-
From: Aaron Kimball [mailto:aa...@cloudera.com] 
Sent: Thursday, March 26, 2009 9:41 AM
To: core-user@hadoop.apache.org
Subject: Re: corrupt unreplicated block in dfs (0.18.3)
