Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
......................................................................


Patch Set 3:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6630/3//COMMIT_MSG
Commit Message:

Line 19: immeditaly result in performance degredation and incompatible data on
> typo
Done


Line 25: are not writen the incompatible_features "checksum" bit is not set.
> typo
Done


Line 27: cfile_verify_checksums is used in the CFileReader to enabled 
validating the
> typo
Done


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 790:   TestReadWriteRawBlocks(SNAPPY, 1000);
> maybe this could be collapsed into a for loop?
Done


Line 819:   unique_ptr<CorruptReadableBlock> corrupt_source(
> would it be simpler to just corrupt the data on disk instead of this code-i
I will look into this. I think ideally I would like to corrupt each byte in a 
sample file and show it results in a corrupt/io error. I was planning to use 
this to do that. However I ran into some issues with the caching of index 
blocks. So regardless a slightly different approach than what is here is needed.


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 157:   if (PREDICT_FALSE(footer_->incompatible_features() > 
IncompatibleFeatures::ALL)) {
> this strikes me as odd... don't you mean & (~IncompatibleFeatures::ALL)?
Your check is technically more correct since if there are bits in the middle 
that are not valid but are set, it will check that. I assumed incompatible 
feature bits would be used sequentially. Especially since we output "supported" 
as an integer in the log message below.


Line 363:   uint32_t data_size = ptr.size();
> best to use a signed int here
ptr.size() returns uint32_t, so I kept it the same for consistency.


Line 366:     data_size = ptr.size() - sizeof(uint32_t);
> we should verify that ptr.size() >= sizeof(uint32_t) first and return a Cor
Done


Line 392:     RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), 
&checksum, checksum_scratch));
> I think perf wise it may be better to just allocate the extra 4 bytes in th
I didn't want to cache the checksum since that seamed to complicate cache reads.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <granthe...@gmail.com>
Gerrit-Reviewer: Grant Henke <granthe...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to