Doug Cutting wrote:
Hairong Kuang wrote:
Another option is to create a checksum file per block at the data node where
the block is placed.

Yes, but then we'd need a separate checksum implementation for intermediate data, and for other distributed filesystems that don't already guarantee end-to-end data integrity. Also, a checksum per block would not permit checksums on randomly accessed data without re-checksumming the entire block. Finally, the checksum wouldn't be end-to-end. We really want to checksum data as close to its source as possible, then validate that checksum as close to its use as possible.
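(Illustrative sketch only, not HDFS code, just to make the random-access point concrete: with a single checksum per block, validating even a small read means re-reading and re-checksumming the whole block, whereas a checksum per fixed-size chunk, size assumed here, only touches the chunks covering the requested range.)

    // Hypothetical example: which chunk checksums a random read would need.
    public class ChunkRange {
      static final int CHUNK_SIZE = 64 * 1024;          // assumed chunk granularity

      public static void main(String[] args) {
        long readOffset = 10L * 1024 * 1024 + 123;      // arbitrary read position in the block
        int readLength = 4096;                          // arbitrary read length
        long firstChunk = readOffset / CHUNK_SIZE;
        long lastChunk = (readOffset + readLength - 1) / CHUNK_SIZE;
        // Only chunks firstChunk..lastChunk need their checksums recomputed and compared;
        // a per-block checksum would instead force re-checksumming the entire block.
        System.out.println("chunks to verify: " + firstChunk + " .. " + lastChunk);
      }
    }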

The DFS checksum need not cover the entire block. It could be maintained by clients for every 64k (as in the GFS paper), which avoids reading the whole block. This ensures that there is no data corruption on disk, inside DFS, etc. The client protocol can be extended so that the client and datanode exchange and verify checksums, and clients like map/reduce can further verify checksums received from DFSClient. This could still be end-to-end, verified hop-to-hop.
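(A minimal sketch of the per-64k idea, assuming CRC32 as the checksum; this is not DFSClient or datanode code, and the names here are made up for illustration. Each chunk can be re-verified at any hop, by the datanode before sending, by DFSClient on receipt, or by map/reduce, without re-checksumming the rest of the block.)

    import java.util.ArrayList;
    import java.util.List;
    import java.util.zip.CRC32;

    public class ChunkedChecksums {
      static final int CHUNK_SIZE = 64 * 1024;                  // 64k, as in the GFS paper

      /** Compute one CRC32 per 64k chunk as the block is written. */
      public static List<Long> checksumBlock(byte[] block) {
        List<Long> sums = new ArrayList<Long>();
        for (int off = 0; off < block.length; off += CHUNK_SIZE) {
          int len = Math.min(CHUNK_SIZE, block.length - off);
          CRC32 crc = new CRC32();
          crc.update(block, off, len);
          sums.add(crc.getValue());
        }
        return sums;
      }

      /** Re-verify a single chunk, e.g. when it is handed to the client or to map/reduce. */
      public static boolean verifyChunk(byte[] block, int chunk, List<Long> sums) {
        int off = chunk * CHUNK_SIZE;
        int len = Math.min(CHUNK_SIZE, block.length - off);
        CRC32 crc = new CRC32();
        crc.update(block, off, len);
        return crc.getValue() == sums.get(chunk).longValue();
      }
    }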

How is the above different from your original proposal of inline checksums? The only difference I see is that inline checksums move checksum management to the client, while block checksums move it to the datanode. With block checksums, every file gets the advantage of checksums.

Raghu.

Doug
