[ Moving discussion to hadoop-dev.  -drc ]

Raghu Angadi wrote:
This is good validation of how important ECC memory is. Currently the HDFS client deletes a block when it notices a checksum error. Once we move to block-level CRCs, we should make the Datanode re-validate the block before deciding to delete it.

It also emphasizes how important end-to-end checksums are. Data should be checksummed as soon as possible after it is generated, before it has a chance to be corrupted.

Ideally, the initial buffer that stores the data should be small, and data should be checksummed as this initial buffer is flushed. In the current implementation, the small checksum buffer is the second buffer; the initial buffer is the larger io.buffer.size buffer. To provide maximum protection against memory errors, this order should be reversed.
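For illustration only, here is a minimal sketch of that ordering. It is not Hadoop's actual FSOutputSummer code, and the class name and buffer sizes are hypothetical; it just shows data being checksummed as a small buffer is flushed into a larger, longer-lived one, so that corruption while data sits in the big buffer is detectable.

import java.util.zip.CRC32;

/**
 * Sketch: data lands first in a small buffer and is checksummed as that
 * buffer is flushed into the larger (io.buffer.size-style) buffer, so
 * the checksum is computed before data enters the long-lived buffer.
 */
public class EarlyChecksumBuffer {
  private static final int SMALL_BUF = 512;        // checksum chunk size (assumed)
  private static final int LARGE_BUF = 64 * 1024;  // stand-in for io.buffer.size

  private final byte[] small = new byte[SMALL_BUF];
  private int smallLen = 0;

  private final byte[] large = new byte[LARGE_BUF];
  private int largeLen = 0;

  private final CRC32 crc = new CRC32();

  /** Write one byte; the small buffer is checksummed and flushed when full. */
  public void write(int b) {
    small[smallLen++] = (byte) b;
    if (smallLen == SMALL_BUF) {
      flushSmall();
    }
  }

  /** Checksum the small buffer, then copy it into the large buffer. */
  private void flushSmall() {
    crc.update(small, 0, smallLen);                // checksum computed early
    if (largeLen + smallLen > large.length) {
      flushLarge();                                // make room (sketch only)
    }
    System.arraycopy(small, 0, large, largeLen, smallLen);
    largeLen += smallLen;
    smallLen = 0;
  }

  private void flushLarge() {
    // In a real client the large buffer, plus the accumulated checksums,
    // would be sent to a datanode here.
    largeLen = 0;
  }

  public long checksumSoFar() {
    return crc.getValue();
  }
}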

This is discussed in https://issues.apache.org/jira/browse/HADOOP-928. Perhaps a new issue should be filed to reverse the order of these buffers, so that data is checksummed before entering the larger, longer-lived buffer?

Doug
