Hairong Kuang wrote:
Another option is to create a checksum file per block at the data node where
the block is placed.

Yes, but then we'd need a separate checksum implementation for intermediate data, and for other distributed filesystems that don't already guarantee end-to-end data integrity. Also, a checksum per block would not permit checksums on randomly accessed data without re-checksumming the entire block. Finally, the checksum wouldn't be end-to-end. We really want to checksum data as close to its source as possible, then validate that checksum as close to its use as possible.

Doug

Reply via email to