Todd Lipcon has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile ......................................................................
Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/6630/3//COMMIT_MSG Commit Message: PS3, Line 19: immeditaly typo PS3, Line 25: writen typo PS3, Line 27: enabled typo http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: PS3, Line 772: FLAGS_cfile_write_checksums = true; : FLAGS_cfile_verify_checksums = true; : TestReadWriteRawBlocks(NO_COMPRESSION, 1000); : TestReadWriteRawBlocks(SNAPPY, 1000); : : FLAGS_cfile_write_checksums = true; : FLAGS_cfile_verify_checksums = false; : TestReadWriteRawBlocks(NO_COMPRESSION, 1000); : TestReadWriteRawBlocks(SNAPPY, 1000); : : FLAGS_cfile_write_checksums = false; : FLAGS_cfile_verify_checksums = true; : TestReadWriteRawBlocks(NO_COMPRESSION, 1000); : TestReadWriteRawBlocks(SNAPPY, 1000); : : FLAGS_cfile_write_checksums = false; : FLAGS_cfile_verify_checksums = false; : TestReadWriteRawBlocks(NO_COMPRESSION, 1000); : TestReadWriteRawBlocks(SNAPPY, 1000); maybe this could be collapsed into a for loop? for (FLAGS_cfile_write_checksum : {false, true}) { for (FLAGS_cfile_verify_checksum : {false, true}) { ... } } PS3, Line 819: CorruptReadableBlock would it be simpler to just corrupt the data on disk instead of this code-injection-based approach? A hook in the fs manager code to corrupt a block on disk is probably more generally reusable in higher-level integration tests as well (where injecting this CorruptReadableBlock class may be a bit tricky). I think that would also enable making this code quite a bit simpler by reusing existing ReadFile/WriteFile functions http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS3, Line 157: > IncompatibleFeatures::ALL this strikes me as odd... don't you mean & (~IncompatibleFeatures::ALL)? PS3, Line 363: uint32_t best to use a signed int here Line 366: data_size = ptr.size() - sizeof(uint32_t); we should verify that ptr.size() >= sizeof(uint32_t) first and return a Corruption otherwise. Line 392: RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), &checksum, checksum_scratch)); I think perf wise it may be better to just allocate the extra 4 bytes in the block cache so that we can read the checksum in the same IO vs issuing the extra syscall here. Plus the code is likely to be a little simpler? -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke <granthe...@gmail.com> Gerrit-Reviewer: Grant Henke <granthe...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes