Hairong Kuang wrote:
Another option is to create a checksum file per block at the data node where the block is placed.
Yes, but then we'd need a separate checksum implementation for intermediate data, and for other distributed filesystems that don't already guarantee end-to-end data integrity. Also, a single checksum per block could not validate randomly accessed data without reading and re-checksumming the entire block. Finally, the checksum wouldn't be end-to-end: it would be computed at the data node, after the data has already left its source. We really want to checksum data as close to its source as possible, then validate that checksum as close to its use as possible.
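To illustrate the random-access point, here is a minimal sketch of checksumming in small fixed-size chunks rather than once per block. The class name ChunkedChecksum, the 512-byte chunk size, and the choice of CRC32 are all assumptions made for illustration, not a description of any existing implementation. The writer computes the sums where the data originates, and the reader verifies only the chunks that overlap the range it is about to use.

```java
import java.util.zip.CRC32;

// Hypothetical sketch: one CRC32 per fixed-size chunk of the data.
class ChunkedChecksum {
    // Assumed chunk size; any positive value would work.
    static final int BYTES_PER_SUM = 512;

    // Compute one CRC32 per chunk, as a writer would at the data's source.
    static long[] sums(byte[] data) {
        int n = (data.length + BYTES_PER_SUM - 1) / BYTES_PER_SUM;
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            int off = i * BYTES_PER_SUM;
            int len = Math.min(BYTES_PER_SUM, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            out[i] = crc.getValue();
        }
        return out;
    }

    // Verify only the chunks overlapping [pos, pos + len),
    // as a reader would just before using that range.
    static boolean verifyRange(byte[] data, long[] sums, int pos, int len) {
        if (len <= 0) return true;
        int first = pos / BYTES_PER_SUM;
        int last = (pos + len - 1) / BYTES_PER_SUM;
        for (int i = first; i <= last; i++) {
            int off = i * BYTES_PER_SUM;
            int n = Math.min(BYTES_PER_SUM, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, n);
            if (crc.getValue() != sums[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = new byte[4096];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        long[] sums = sums(data);

        data[1000] ^= 1; // corrupt one byte in chunk 1 (bytes 512..1023)

        // Touches chunk 4 only, so the corruption elsewhere is not seen.
        System.out.println(verifyRange(data, sums, 2048, 100)); // prints true
        // Touches chunks 1 and 2, so the corrupted chunk is detected.
        System.out.println(verifyRange(data, sums, 900, 200));  // prints false
    }
}
```

With a single checksum over the whole block, both reads above would have to re-checksum all 4096 bytes. With per-chunk sums, each read checks at most two 512-byte chunks.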
Doug
