The current checksum implementation writes CRC32 values to a parallel
file. Unfortunately, these parallel files pollute the namespace; in
particular, they place a heavier burden on the HDFS namenode.
Perhaps we should consider placing checksums inline in file data. For
example, we might write the data as a sequence of fixed-size
<checksum><payload> entries. This could be implemented as a FileSystem
wrapper, ChecksummedFileSystem. The create() method would return a
stream that buffers data in small chunks, checksums each chunk as it
fills, and writes the checksum in front of the data when the buffer is
flushed. The open() method would similarly verify each checksum as the
corresponding buffer is read. The
seek() and length() methods would adjust for the interpolated checksums.
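
To make the byte layout concrete, here is a rough sketch of what the
create() path might look like. The class name, the 512-byte chunk size,
and the big-endian CRC encoding are assumptions for illustration, not a
spec.

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.CRC32;

    /**
     * Sketch of the create() path: buffer up to BYTES_PER_CHECKSUM bytes,
     * then write a 4-byte CRC32 followed by the payload, producing a file
     * of <checksum><payload> entries.
     */
    public class ChecksummedOutputStream extends OutputStream {
      static final int BYTES_PER_CHECKSUM = 512;      // assumed chunk size

      private final OutputStream out;                 // underlying raw stream
      private final byte[] buf = new byte[BYTES_PER_CHECKSUM];
      private int count = 0;                          // bytes buffered so far
      private final CRC32 crc = new CRC32();

      public ChecksummedOutputStream(OutputStream out) {
        this.out = out;
      }

      public void write(int b) throws IOException {
        buf[count++] = (byte) b;
        if (count == buf.length) {
          flushChunk();                               // emit a full entry
        }
      }

      /** Write one <checksum><payload> entry for the buffered bytes. */
      private void flushChunk() throws IOException {
        if (count == 0) return;
        crc.reset();
        crc.update(buf, 0, count);
        int sum = (int) crc.getValue();
        out.write((sum >>> 24) & 0xff);               // 4-byte big-endian CRC32
        out.write((sum >>> 16) & 0xff);               // precedes the data it covers
        out.write((sum >>> 8) & 0xff);
        out.write(sum & 0xff);
        out.write(buf, 0, count);
        count = 0;
      }

      public void close() throws IOException {
        flushChunk();                                 // final, possibly short, chunk
        out.close();
      }
    }

With these assumptions, seek() would map a logical offset p to raw
offset p + 4 * (p / 512 + 1) (integer division), and length() would
subtract 4 bytes for each (possibly partial) 512-byte chunk.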
Checksummed files could have their names suffixed internally with
something like ".hcs0". Checksum processing would be skipped for files
without this suffix, for back-compatibility and interoperability.
Directory listings would be modified to remove this suffix.
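
As a sketch of the naming convention (the suffix and the helper names
here are only illustrative):

    /** Hypothetical helpers for the ".hcs0" naming convention. */
    public class ChecksumNames {
      static final String SUFFIX = ".hcs0";   // assumed: "0" is a format version

      /** Map a user-visible path to the internally stored, checksummed name. */
      public static String toInternal(String path) {
        return path + SUFFIX;
      }

      /** Pre-existing files lack the suffix and skip checksum processing. */
      public static boolean isChecksummed(String name) {
        return name.endsWith(SUFFIX);
      }

      /** Strip the suffix so directory listings show the user-visible name. */
      public static String toVisible(String name) {
        return isChecksummed(name)
          ? name.substring(0, name.length() - SUFFIX.length())
          : name;
      }
    }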
Existing checksum code in FileSystem.java could be removed, including
all 'raw' methods.
HDFS would use ChecksummedFileSystem. If block names were modified to
encode the checksum version, then datanodes could validate checksums.
(We could ensure that checksum boundaries are aligned with block
boundaries.)
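
For instance, continuing with the 4+512 byte entries assumed above, one
way to keep checksum boundaries on block boundaries is to force the
block size to a whole number of entries (the numbers and the method
name are only illustrative):

    /** Hypothetical alignment rule: a block holds a whole number of entries. */
    public class BlockAlignment {
      static final int ENTRY_SIZE = 4 + 512;   // checksum + payload, as assumed above

      /** Round a requested block size down to a multiple of the entry size. */
      public static long alignBlockSize(long requested) {
        return (requested / ENTRY_SIZE) * ENTRY_SIZE;
      }

      public static void main(String[] args) {
        // a nominal 64 MB block rounds down to 130,055 entries (67,108,380 bytes)
        System.out.println(alignBlockSize(64L * 1024 * 1024));
      }
    }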
We could have two versions of the local filesystem: one with checksums
and one without. The DFS shell could use the checksumless version for
exporting files, while MapReduce could use the checksummed version for
intermediate data.
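
A minimal sketch of that split, using plain java.io streams instead of
real FileSystem classes just to keep it self-contained (CheckedOutputStream
here only tracks a running CRC32; a real checksummed local filesystem
would use the inline framing sketched earlier):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.CRC32;
    import java.util.zip.CheckedOutputStream;

    public class LocalStreams {
      /** Checksumless variant, e.g. for the DFS shell exporting files. */
      public static OutputStream rawLocal(String path) throws IOException {
        return new FileOutputStream(path);
      }

      /** Checksummed variant, e.g. for MapReduce intermediate data. */
      public static OutputStream checkedLocal(String path) throws IOException {
        return new CheckedOutputStream(new FileOutputStream(path), new CRC32());
      }
    }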
S3 might or might not use this, depending on whether we think Amazon
already provides sufficient data integrity.
Thoughts?
Doug