Raghu Angadi wrote:
> But this will not fix the same problem with block-level checksums. Pretty soon, HDFS will not use ChecksumFileSystem at all.

I'd hope that block-level checksums do not replicate logic from ChecksumFileSystem. Rather they should probably share substantial portions of their checksumming input and output stream implementations, no? So it could fix the same problem for block-level checksums, and should if possible.

> Ideally we should let the implementations decide how to buffer.

I'm not sure what you mean by this. The buffer size is a parameter to FileSystem's open() and create() methods. Whether checksums require another level of buffering is a separate issue. Is it efficient to invoke the CRC32 code as each byte is written, or is it faster to run it in 512-byte or larger batches?
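
One rough way to answer that last question is to time both styles against java.util.zip.CRC32 directly. The sketch below is only an illustration, not code from ChecksumFileSystem or HDFS: the class name Crc32Batching, the 16 MB input, and the 512-byte chunk size are all made up for the example, and a single un-warmed timing run like this is only indicative, not a proper benchmark.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical comparison harness; not part of Hadoop.
public class Crc32Batching {

    // Feed the checksum one byte at a time, as a naive write(int) path would.
    public static long perByte(byte[] data) {
        CRC32 crc = new CRC32();
        for (byte b : data) {
            crc.update(b); // update(int) uses only the low eight bits
        }
        return crc.getValue();
    }

    // Feed the checksum in fixed-size chunks (the last chunk may be shorter).
    public static long batched(byte[] data, int chunk) {
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += chunk) {
            int len = Math.min(chunk, data.length - off);
            crc.update(data, off, len);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = new byte[16 * 1024 * 1024]; // assumed test size
        new Random(42).nextBytes(data);

        long t0 = System.nanoTime();
        long a = perByte(data);
        long t1 = System.nanoTime();
        long b = batched(data, 512);
        long t2 = System.nanoTime();

        // Both paths must yield the same checksum; only the cost differs.
        System.out.println("checksums match: " + (a == b));
        System.out.println("per-byte ms:     " + (t1 - t0) / 1_000_000);
        System.out.println("512-byte ms:     " + (t2 - t1) / 1_000_000);
    }
}
```

Both methods compute the same CRC over the same bytes, so any difference in the timings comes from per-call overhead rather than from the checksum itself. Numbers will vary by JVM and hardware, so this should be run on the target platform before drawing conclusions about whether an extra buffering layer pays for itself.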
Doug
