Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile ......................................................................
Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/6630/3//COMMIT_MSG Commit Message: Line 19: immeditaly result in performance degredation and incompatible data on > typo Done Line 25: are not writen the incompatible_features "checksum" bit is not set. > typo Done Line 27: cfile_verify_checksums is used in the CFileReader to enabled validating the > typo Done http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 790: TestReadWriteRawBlocks(SNAPPY, 1000); > maybe this could be collapsed into a for loop? Done Line 819: unique_ptr<CorruptReadableBlock> corrupt_source( > would it be simpler to just corrupt the data on disk instead of this code-i I will look into this. I think ideally I would like to corrupt each byte in a sample file and show it results in a corrupt/io error. I was planning to use this to do that. However I ran into some issues with the caching of index blocks. So regardless a slightly different approach than what is here is needed. http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 157: if (PREDICT_FALSE(footer_->incompatible_features() > IncompatibleFeatures::ALL)) { > this strikes me as odd... don't you mean & (~IncompatibleFeatures::ALL)? Your check is technically more correct since if there are bits in the middle that are not valid but are set, it will check that. I assumed incompatible feature bits would be used sequentially. Especially since we output "supported" as an integer in the log message below. Line 363: uint32_t data_size = ptr.size(); > best to use a signed int here ptr.size() returns uint32_t, so I kept it the same for consistency. Line 366: data_size = ptr.size() - sizeof(uint32_t); > we should verify that ptr.size() >= sizeof(uint32_t) first and return a Cor Done Line 392: RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), &checksum, checksum_scratch)); > I think perf wise it may be better to just allocate the extra 4 bytes in th I didn't want to cache the checksum since that seamed to complicate cache reads. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke <granthe...@gmail.com> Gerrit-Reviewer: Grant Henke <granthe...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes