[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 15: (14 comments) http://gerrit.cloudera.org:8080/#/c/6630/15/docs/design-docs/cfile.md File docs/design-docs/cfile.md: Line 31: : 32-bit unsigned integer length delimiter > specify whether this includes the checksum or just the following PB Done Line 33: : An optional Crc32 checksum of the entire header > does this include the magic? Done Line 43: : An optional Crc32 checksum of the entire footer > same - including the footer, magic, and pb len? Done Line 211: When checksums for the header, data, and footer are written in the CFile > nit: add comma at end of line Done http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 223: opts.storage_attributes.compression = compression; > oh, that's interesting, we forgot to actually use the compression here? I think so. I noticed it while doing my testing. Line 336: uint8_t orig = data.mutable_data()[corrupt_offset]; > Nit: could do data.data()[corrupt_offset] here I think (though it may mean Done http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 141: // Parse Footer first to find unsupported features. > does this mean that an old server will have problems with checksummed heade It shouldn't because the length doesn't include the checksum. Line 146: if (PREDICT_FALSE(footer_->incompatible_features() > IncompatibleFeatures::ALL)) { > '>' strikes me as strange here vs doing & ~IncompatibleFeatures::ALL I put a comment on a similar older review and wasn't sure if I should change it. Here was the response: > Your check is technically more correct since if there are bits in the middle > that are not valid but are set, it will check that. I assumed incompatible > feature bits would be used sequentially. Especially since we output > "supported" as an integer in the log message below. Line 150: IncompatibleFeatures::ALL)); > instead of ALL, maybe a better name would be 'SUPPORTED'? Done Line 239: if (footer_size >= file_size_ - kMagicAndLengthSize) { > do we want to have some kind of max size here? am afraid a flipped bit in a ParseMagicAndLength has a max size check: if (len > kMaxHeaderFooterPBSize) { return Status::Corruption("invalid data size"); } Line 416: uint32_t checksum_size = sizeof(uint32_t); > you have this in a few places. Maybe better to just have a global constant Done http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 230: incompatible_features = IncompatibleFeatures::CHECKSUM; > maybe |= here so it doesn't need to change when we add some other feature Done Line 500: // Calculate and append a checksum data checksum. > typo? Done http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/util/crc-test.cc File src/kudu/util/crc-test.cc: Line 50: const uint64_t expected_crc = 0xa9421b7; // Known value from crcutil usage test program. > nit: kExpectedCrc Done -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Todd Lipcon has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 15: (13 comments) http://gerrit.cloudera.org:8080/#/c/6630/15/docs/design-docs/cfile.md File docs/design-docs/cfile.md: Line 31: : 32-bit unsigned integer length delimiter specify whether this includes the checksum or just the following PB Line 33: : An optional Crc32 checksum of the entire header does this include the magic? Line 43: : An optional Crc32 checksum of the entire footer same - including the footer, magic, and pb len? Line 211: When checksums for the header, data, and footer are written in the CFile nit: add comma at end of line http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 223: opts.storage_attributes.compression = compression; oh, that's interesting, we forgot to actually use the compression here? http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 141: // Parse Footer first to find unsupported features. does this mean that an old server will have problems with checksummed headers, because it will fail while reading the header rather than seeing the incompatible feature flag in the footer first? Line 146: if (PREDICT_FALSE(footer_->incompatible_features() > IncompatibleFeatures::ALL)) { '>' strikes me as strange here vs doing & ~IncompatibleFeatures::ALL PS15, Line 150: ALL instead of ALL, maybe a better name would be 'SUPPORTED'? Line 239: if (footer_size >= file_size_ - kMagicAndLengthSize) { do we want to have some kind of max size here? am afraid a flipped bit in a large block would cause a huge stack alloc below. alernatively we could use faststring for the buffer below, which is still stack-allocated for small sizes but avoids blowing out the stack in the case of a corruption PS15, Line 416: uint32_t checksum_size = sizeof(uint32_t); you have this in a few places. Maybe better to just have a global constant kChecksumSize? http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS15, Line 230: = maybe |= here so it doesn't need to change when we add some other feature PS15, Line 500: checksum data checksum typo? http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/util/crc-test.cc File src/kudu/util/crc-test.cc: Line 50: const uint64_t expected_crc = 0xa9421b7; // Known value from crcutil usage test program. nit: kExpectedCrc -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/6630/15//COMMIT_MSG Commit Message: Line 7: WIP: KUDU-463. Add checksumming to cfile Also, this is no longer a WIP, right? -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 13: (2 comments) http://gerrit.cloudera.org:8080/#/c/6630/14/docs/design-docs/cfile.md File docs/design-docs/cfile.md: Line 27: Header > I will add that they are optional. I am not sure if the cflags should go in That's fair. http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 336: data.mutable_data()[corrupt_offset] = corrupt_value; Nit: could do data.data()[corrupt_offset] here I think (though it may mean making 'orig' const uint8_t), since the data isn't yet beint modified. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#15). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/util/crc-test.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h 9 files changed, 328 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/15 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/6630/14/docs/design-docs/cfile.md File docs/design-docs/cfile.md: Line 27: Header > Can you update this to reflect that the checksums are optional (and maybe d I will add that they are optional. I am not sure if the cflags should go in this document since it describes the format and not the details of the reader/writer implementations. http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 847: Status s = CorruptAndReadBlock(id, i, 254); > I had something similar to this at one point but I worried I was getting to Done http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 438: } > I understand, but what exactly does it mean for a ReadBlock() call to provi Since this is a public method I want to validate the input parameters. I don't think there is ever a valid scenario where a block would ever have 3 bytes or less. However, when hasChecksums is true, this is always the case given the checksum is 4 bytes and is included in the size. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 14: (1 comment) Looks good, just two other things: 1. What was your conclusion re: bit flipping in the test? 2. Let's continue the comment thread about ReadBlock() when ptr.size() < 4. http://gerrit.cloudera.org:8080/#/c/6630/14/docs/design-docs/cfile.md File docs/design-docs/cfile.md: Line 27: Header Can you update this to reflect that the checksums are optional (and maybe describe the cflags that control them a little)? -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#14). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/util/crc-test.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h 9 files changed, 324 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/14 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 13: (4 comments) http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 67: TAG_FLAG(cfile_write_checksums, experimental); > No, my reading skills aren't as good as they should be. Sorry about that. Done Line 186: RETURN_NOT_OK_PREPEND(WriteRawData(Slice(checksum_buf, sizeof(checksum_buf))), > Rather than two separate calls to WriteRawData(), could we append the check Done Line 494: RETURN_NOT_OK(WriteRawData(data)); > As I was reading through the comments I started getting a similar feeling t Done Line 504: RETURN_NOT_OK(WriteRawData(Slice(crc_buf, sizeof(crc_buf; > Let's also combine the writes here. It's a little trickier since we need to Done -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Jean-Daniel Cryans has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 56: TAG_FLAG(cfile_verify_checksums, experimental); > I think the main difference is that experimental is for things we don't nec Adar is right, also here's the reference: https://github.com/apache/kudu/blob/master/src/kudu/util/flag_tags.h#L21 The main thing about experimental is that you must unlock them first to be able to use them. So if you enable something by default... but then mark it experimental, it means that users have to go out of their way to disable the experimental feature. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 13: (5 comments) http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 56: TAG_FLAG(cfile_verify_checksums, experimental); > Done. I am not sure I am clear on the difference between experimental and e I think the main difference is that experimental is for things we don't necessarily want to draw attention to, while evolving is for something that we want people to use, but retain the right to adjust how it's configured in the future. Line 137: Status CFileReader::ReadMagicAndLength(uint64_t offset, Slice* slice) { > I only kept it for the sake of the TRACE_EVENT1. Can remove it. I think every caller already has a TRACE_EVENT() of some kind in it, so we don't need to retain this purely for that. Line 438: } > This is mainly to prevent data_size -= checksum_size from becoming negative I understand, but what exactly does it mean for a ReadBlock() call to provide ptr.size() < 4? Is that impossible when hasChecksums() is true? http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 67: TAG_FLAG(cfile_write_checksums, experimental); > I describe the defaults a bit in the patch summary. Think I should add a mo No, my reading skills aren't as good as they should be. Sorry about that. But, since we want people to try this feature out, I think evolving is the better tag choice. You could check with Todd/Mike for a second opinion. Line 263: // Prepend a footer checksum. > I saw the same thing but don't have a great answer. I can change it if you If you can't prove to yourself that NOT using WriteRawData() is more correct, then change it. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 13: (25 comments) http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 244: if (FLAGS_cfile_write_checksums) { > Can you make TestCFile a friend of CFileReader (see the FRIEND_TEST macro) Done Line 330: source->Size(_size); > RETURN_NOT_OK Done Line 333: source->Read(0, ); > RETURN_NOT_OK Done Line 843: source->Size(_size); > Should ASSERT_OK on this too. Done Line 847: Status s = CorruptAndReadBlock(id, i, 254); > Just in case one of the file's bytes actually has the value 254, maybe you I had something similar to this at one point but I worried I was getting to clever in a test. I figured a simple static value that was known to corrupt all bytes in the given test would be a clear way to fail fast if something in the test changed. I will take a look again now that my corrupt code is more straightforward. http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 56: TAG_FLAG(cfile_verify_checksums, experimental); > Should probably default to evolving. Done. I am not sure I am clear on the difference between experimental and evolving. Line 137: Status CFileReader::ReadMagicAndLength(uint64_t offset, Slice* slice) { > Since this function is now just a block read, I think it's net more readabl I only kept it for the sake of the TRACE_EVENT1. Can remove it. Line 210: if (header.size() != header_size || checksum.size() != checksum_size) { > Do we need this check now that there are no short reads? Done Line 215: if (FLAGS_cfile_verify_checksums) { > If this is false, why bother reading the checksum out of the block at all? Done Line 226: RETURN_NOT_OK(block_->Read(off, )); > Would be cleaner to call ReadV() regardless of whether checksums exist or n Done Line 227: if (header.size() != header_size) { > Do we need this check now that there are no short reads? Done Line 273: // This is done to avoid the need for a follow up read call. > Just to be clear, if there are no checksums, the four extra bytes we're rea That is exactly the choice being made. An earlier version of the patch did this in 2 reads. However because we know the header alone has enough bytes to support the over read, it should be safe to do this. The draw back here is that cfiles without checksums read 4 bytes more than required. Line 277: if (footer.size() != footer_size || checksum.size() != checksum_size) { > No short reads; do we still need this? Done Line 283: // incompatible_features flag tells us if a checksum exists at all. > What's the point of the footer checksum then? It seems extraordinarily rare Though rare, I am viewing checksumming as an all or nothing feature. It's possible the contents of the protobuf could be parsable but incorrect. In the case parsing fails, we know its corrupt. In the case parsing succeeds the checksum ensures we parsed the right thing. Given the importance of the fields in the footer I think it's worth being sure. Line 438: } > This effectively forbids zero-length block reads, right? Is that an invaria This is mainly to prevent data_size -= checksum_size from becoming negative and resulting in a cryptic error. Line 461: if (block.size() != data_size || checksum.size() != checksum_size) { > No short reads... Done Line 474: } > Maybe it would even be possible to incorporate the ReadV() too? Basically a Good idea. The one place that is different is the footer since we "preemptively" read checksums data and the checksum comes before the data. This could be reused for the header and data blocks though. Line 476: RETURN_NOT_OK(block_->Read(ptr.offset(), )); > Let's use ReadV() for both cases, and vary whether we pass a one-slice vect Done Line 477: if (block.size() != data_size) { > No short reads... Done http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: Line 188: // Returns true if the file has checksums on the header, footer, and data blocks > Nit: terminate with a period. Done Line 189: bool hasChecksums() const; > Nit: should be either has_checksums() or HasChecksums() Done http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 67: TAG_FLAG(cfile_write_checksums, experimental); > 1. Why default to false? I describe the defaults a bit in the patch summary. Think I should add a more detailed comment here? Here is a quote from the patch summary: cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS13, Line 466: if (FLAGS_cfile_verify_checksums) { : uint32_t expected_checksum = DecodeFixed32(checksum.data()); : uint32_t checksum_value = crc::Crc32c(block.data(), block.size()); : if (PREDICT_FALSE(checksum_value != expected_checksum)) { : return Status::Corruption( : Substitute("Checksum does not match at offset $0 size $1: $2 vs $3", :ptr.offset(), block.size(), checksum_value, expected_checksum)); : } : } > I think I've seen this block of code three times; can you factor it out int Maybe it would even be possible to incorporate the ReadV() too? Basically a function that takes as input the data Slice and: 1. Creates a slice vector to pass to ReadV(). 2. Checks if checksums are to be verified. 3. If so, creates a checksum buffer+slice and modifies the slice vector. 4. Calls ReadV(). 5. If checksums are to be verified, performs the verification. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 13: (28 comments) http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 244: if (FLAGS_cfile_write_checksums) { Can you make TestCFile a friend of CFileReader (see the FRIEND_TEST macro) and call HasChecksum() instead? Or make HasChecksum() a public method? Line 330: source->Size(_size); RETURN_NOT_OK Line 333: source->Read(0, ); RETURN_NOT_OK Line 843: source->Size(_size); Should ASSERT_OK on this too. PS13, Line 847: 254 Just in case one of the file's bytes actually has the value 254, maybe you should do something more drastic like flip one of the bits? That way there's always a "change" to the byte's value. Bonus points: if it's not too slow, add another loop which selects which of 8 bits in the byte to flip. http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS13, Line 56: TAG_FLAG(cfile_verify_checksums, experimental); Should probably default to evolving. Line 137: Status CFileReader::ReadMagicAndLength(uint64_t offset, Slice* slice) { Since this function is now just a block read, I think it's net more readable to remove it and do the block read inline wherever it was needed. Line 210: if (header.size() != header_size || checksum.size() != checksum_size) { Do we need this check now that there are no short reads? Line 215: if (FLAGS_cfile_verify_checksums) { If this is false, why bother reading the checksum out of the block at all? Line 226: RETURN_NOT_OK(block_->Read(off, )); Would be cleaner to call ReadV() regardless of whether checksums exist or not, and just pass either a single Slice or two Slices as needed. Line 227: if (header.size() != header_size) { Do we need this check now that there are no short reads? PS13, Line 271: // Read both the header and checksum in one call. : // We read the checksum position in case one exists. : // This is done to avoid the need for a follow up read call. Just to be clear, if there are no checksums, the four extra bytes we're reading: 1. Are definitely not checksum data, but, 2. Are guaranteed to exist. Is that right? And AFAICT, since the footer itself tells us whether the cfile has checksums, the only alternative is to do two separate reads, and we think reading an extra 4 bytes is the better trade-off, right? Line 277: if (footer.size() != footer_size || checksum.size() != checksum_size) { No short reads; do we still need this? PS13, Line 282: // This needs to be done before validating the checksum since the : // incompatible_features flag tells us if a checksum exists at all. What's the point of the footer checksum then? It seems extraordinarily rare that we'll find a footer than can serialize properly but fails its checksum test. PS13, Line 436: if (checksum_size > data_size) { : return Status::Corruption("invalid data size"); : } This effectively forbids zero-length block reads, right? Is that an invariant that we already enforce? Line 461: if (block.size() != data_size || checksum.size() != checksum_size) { No short reads... PS13, Line 466: if (FLAGS_cfile_verify_checksums) { : uint32_t expected_checksum = DecodeFixed32(checksum.data()); : uint32_t checksum_value = crc::Crc32c(block.data(), block.size()); : if (PREDICT_FALSE(checksum_value != expected_checksum)) { : return Status::Corruption( : Substitute("Checksum does not match at offset $0 size $1: $2 vs $3", :ptr.offset(), block.size(), checksum_value, expected_checksum)); : } : } I think I've seen this block of code three times; can you factor it out into a helper function? Something like: Status VerifyChecksum(...): if !FLAGS_cfile_verify_checksums: return Status::OK(); Line 476: RETURN_NOT_OK(block_->Read(ptr.offset(), )); Let's use ReadV() for both cases, and vary whether we pass a one-slice vector or a two-slice vector. Line 477: if (block.size() != data_size) { No short reads... http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: Line 188: // Returns true if the file has checksums on the header, footer, and data blocks Nit: terminate with a period. Line 189: bool hasChecksums() const; Nit: should be either has_checksums() or HasChecksums() http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS13, Line 65: DEFINE_bool(cfile_write_checksums, false, :
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#13). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/util/crc-test.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h 9 files changed, 344 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/13 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#12). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/fs-test-util.h M src/kudu/fs/log_block_manager.cc M src/kudu/util/crc-test.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/file_cache.cc 18 files changed, 724 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/12 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/6630/11/src/kudu/util/env.h File src/kudu/util/env.h: Line 375: Slice* result; // Set to the data and length that was read (OUT) Since there are no short reads, can we omit 'result' altogether? Without short reads, ReadV() either fails (in which case 'entries' is totally invalid) or it succeeds fully (in which case the results of the read is described fully by each 'length' and 'scratch' pair). If the caller needs slices, it can make its own. Alternatively, perhaps we can replace vector with vector? Each slice describes both 'length' and 'scratch', and would be used as input when constructing the iov internally. In either case, ReadV() won't modify the vector and so it should be passed by const ref to avoid needless copies. An alternative would to be pass it by value and move it, but I think the caller still needs access after ReadV(), and it would lose access if move were used. Given the amount of discussion around this interface, I really think this part of the patch should be split out into its own gerrit change, partly to ease code review and partly so the new git commit can more fully document the design trade-offs taken. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#11). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/fs-test-util.h M src/kudu/fs/log_block_manager.cc M src/kudu/util/crc-test.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/file_cache.cc 18 files changed, 724 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/11 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc File src/kudu/util/env_posix.cc: Line 174: // Simulates Linux's preadv API on OS X. > DoWriteV actually changes it entire implementation instead of simulating pw Sure, works for me. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc File src/kudu/util/env_posix.cc: Line 174: // Simulates Linux's preadv API on OS X. > BTW, I just noticed that we simulate pwritev() inline, in DoWriteV(). If yo DoWriteV actually changes it entire implementation instead of simulating pwritev. I can definitely move this, but it might be best in a follow up patch. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 10: (3 comments) http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: Line 262: gscoped_ptrscratch1(new uint8_t[size1]); > I mainly just followed the "pattern" from a few lines above. Happy to chang Yeah, we should be using unique_ptr in new sites. Not sure why I used heap allocation above, but that can change too. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc File src/kudu/util/env_posix.cc: Line 174: // Simulates Linux's preadv API on OS X. BTW, I just noticed that we simulate pwritev() inline, in DoWriteV(). If you're feeling generous perhaps you can pull that out into this area too, so the two simulated functions can be defined side by side? Line 334: virtual Status ReadV(uint64_t offset, vector results) const OVERRIDE { > It doesn't look like any read or write calls have this now. Since these are Hmm you're probably right; I didn't realize there weren't any calls in Write either. Nevermind then. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 10: (23 comments) http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 301: RETURN_NOT_OK(block_->ReadV(off, results)); > I think it would be a little less obtrusive to have ReadResult construct an My goal was to match the pattern and variables used in the Read function. Especially since this is essentially a version of Read for multiple Slices. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: Line 210: bool has_checksums_; > This will be padded to 8 bytes, which is actually quite a bit given how man Done http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: Line 262: gscoped_ptrscratch1(new uint8_t[size1]); > Nit: use unique_ptr. I mainly just followed the "pattern" from a few lines above. Happy to change it. Line 267: ReadResult { > Surprised that this style of struct initialization works; isn't it C99 but Will change this. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 162: // Does not modify 'result' on error (but may modify 'scratch'). > You mean results? I mean the result field in the results. I will make this more clear. Line 163: virtual Status ReadV(uint64_t offset, vector results) const = 0; > nit: Consider making this vector* perhaps, based on my comments Looking into this. Adar mentioned it too. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/fs-test-util.h File src/kudu/fs/fs-test-util.h: Line 72: virtual Status ReadV(uint64_t offset, vector results) const OVERRIDE { > This is a header so we retain namespace qualifications (i.e. this should be Done http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.cc File src/kudu/util/crc.cc: Line 50: uint64_t crc_tmp = static_cast(crc32); > Nit: extra space here. Done http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.h File src/kudu/util/crc.h: Line 38: uint32_t Crc32c(const void* data, size_t length, uint32_t crc32); > Maybe previous_crc32 to be more clear? Done Line 38: uint32_t Crc32c(const void* data, size_t length, uint32_t crc32); > Maybe name it RollingCrc32()? Also, needs a unit test. I though it was nice to have the same name as the original function since it works in tandem with it. ie: uint32_t crc = Crc32c(data, length); crc = Crc32c(data, length, crc); http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env.h File src/kudu/util/env.h: Line 371: struct ReadResult { > Should provide a short high-level struct document too. Done Line 403: virtual Status ReadV(uint64_t offset, std::vector results) const = 0; > I'm finding this to be somewhat confusing. I agree. I had a hard time naming the struct too. I am not sure ReadEntry is perfect either, but it's definitely better. Will change and re-organize ReadResult. Line 536: // TODO (GH): Document > Yep. :) I was waiting until it was decided if I should allow short reads here or not. I will move the readfully logic here and document accordingly. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc File src/kudu/util/env_posix.cc: Line 175: ssize_t preadv(int fd, const struct iovec *iovec, int count, off_t offset) { > Should be const struct iovec* Done Line 180: return read_bytes; > Nit: if you're going to elide the braces, merge this line into the conditio Done Line 182: break; > Don't we need to update total_read_bytes with our partial read results befo Done Line 334: virtual Status ReadV(uint64_t offset, vector results) const OVERRIDE { > Could you add TRACE_EVENT1() calls to ReadV() and Read()? Not sure why they It doesn't look like any read or write calls have this now. Since these are called frequently is there some overhead to worry about? Line 341: return iovec { > Again, surprised this kind of initialization works in C++11. Done Line 353: // Adjust slice sizes based on bytes read for short reads > As we discussed, let's make short reads an error, for Read() too. Do it as Will do. Line 355: for (ReadResult& readResult : results) { > Nit: auto is fine for short-scoped types like this one. Done http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.cc File src/kudu/util/env_util.cc: Line 196: size_t n = accumulate(results.begin(), results.end(), static_cast(0), > nit: name this req_len ? I can rename this. Often when in a new codebase I try to match the style that exists. Given this is a "mirror" of ReadFully above I used similar names. Line 207: // Calculate the read amount of data > Nit: Use punctuation like periods at the end of comment sentences per https
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Mike Percy has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS10, Line 277: uint8_t footer_scratch[footer_size]; : Slice footer; : : uint32_t checksum_size = sizeof(uint32_t); : uint8_t checksum_scratch[checksum_size]; : Slice checksum; : : // Read both the header and checksum in one call. : // We read the checksum position in case one exists. : // This is done to avoid the need for a follow up read call. : // Read both the data and checksum in one call : vector results = { : ReadResult { : .result = , : .scratch = checksum_scratch, : .length = checksum_size, : }, : ReadResult { : .result = , : .scratch = footer_scratch, : .length = footer_size, : } : }; : uint64_t off = file_size_ - kMagicAndLengthSize - footer_size - checksum_size; : RETURN_NOT_OK(block_->ReadV(off, results)); I think it would be a little less obtrusive to have ReadResult construct and own the Slice, the scratch buffer (you could use faststring) and the length and use a constructor like explicit ReadResult(buffer_size) So then instead of all this you could just say uint32_t checksum_size = sizeof(uint32_t); vector results = { ReadResult(checksum_size), ReadResult(footer_size) }; uint64_t off = file_size_ - kMagicAndLengthSize - footer_size - checksum_size; RETURN_NOT_OK(block_->ReadV(off, )); And avoid copying the vector. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS10, Line 163: vector nit: Consider making this vector* perhaps, based on my comments elsewhere http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.h File src/kudu/util/crc.h: PS10, Line 38: Crc32c Maybe name it RollingCrc32()? Also, needs a unit test. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.cc File src/kudu/util/env_util.cc: PS10, Line 196: n nit: name this req_len ? Line 207: // Calculate the read amount of data Nit: Use punctuation like periods at the end of comment sentences per https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 10: (21 comments) I didn't review the checksumming stuff yet, just looked at the rest and the plumbing. Since the ReadV() stuff turned out to be pretty substantial, could you move it into its own commit? Would be nice to finish reviewing it separately. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: Line 210: bool has_checksums_; This will be padded to 8 bytes, which is actually quite a bit given how many CFileReaders we instantiate. Perhaps we can calculate this on the fly instead, using a mask on footer_'s features? http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS10, Line 262: gscoped_ptr Nit: use unique_ptr. Actually, these scratch spaces are tiny; can you just allocate them on the stack? Line 267: ReadResult { Surprised that this style of struct initialization works; isn't it C99 but not C++11 compatible? http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 38: class Env; No need for this anymore. Though if you decide to pass the vector by pointer as I suggested in env.h, you can omit the env.h include, forward declare ReadResult, and continue to forward declare Env. PS10, Line 162: result You mean results? http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/fs-test-util.h File src/kudu/fs/fs-test-util.h: PS10, Line 72: vector This is a header so we retain namespace qualifications (i.e. this should be std::vector). http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.cc File src/kudu/util/crc.cc: PS10, Line 50: Nit: extra space here. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.h File src/kudu/util/crc.h: PS10, Line 38: crc32 Maybe previous_crc32 to be more clear? http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env.h File src/kudu/util/env.h: Line 371: struct ReadResult { Should provide a short high-level struct document too. PS10, Line 403: results I'm finding this to be somewhat confusing. Part of the issue is with the name ReadResult: the majority of stuff in it aren't "results"; they're actually input to the function. Maybe ReadEntry or ReadVEntry would be better? Part of the issue is that the Slice* means an extra level of indirection in the setup that callers have to do. Perhaps they can be regular Slices, that are default-initialized by the caller and set by the callee? That means 'results' should be passed by pointer, but I think that's a clearer model anyway, since we pass by pointer things that the function is expected to modify. Also, could you reorganize ReadResult so that the pure IN parameters precede IN/OUT and OUT? Meaning, the order should probably be length, scratch, then result. Line 536: // TODO (GH): Document Yep. :) http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc File src/kudu/util/env_posix.cc: PS10, Line 175: const struct iovec * Should be const struct iovec* Line 180: return read_bytes; Nit: if you're going to elide the braces, merge this line into the conditional line. This makes it harder to accidentally mess up the conditional if another line is added. Line 182: break; Don't we need to update total_read_bytes with our partial read results before breaking? Line 334: virtual Status ReadV(uint64_t offset, vector results) const OVERRIDE { Could you add TRACE_EVENT1() calls to ReadV() and Read()? Not sure why they were missing in the first place. PS10, Line 340: ReadResult Can be cref? Line 341: return iovec { Again, surprised this kind of initialization works in C++11. PS10, Line 347: results.size() arraylength(iov) is more idiomatic. Line 353: // Adjust slice sizes based on bytes read for short reads As we discussed, let's make short reads an error, for Read() too. Do it as a separate patch and layer this one on top of it. PS10, Line 355: ReadResult Nit: auto is fine for short-scoped types like this one. http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.h File src/kudu/util/env_util.h: Line 66: Status ReadVFully(RandomAccessFile* file, uint64_t offset, std::vector results); If you end up solving the partial read stuff directly in RandomAccessFile, this won't be necessary. But if it is, please also doc it. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#10). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/fs-test-util.h M src/kudu/fs/log_block_manager.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/env_util.cc M src/kudu/util/env_util.h M src/kudu/util/file_cache.cc 19 files changed, 792 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/10 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#9). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. TODO: - Finish testing - Method documentation - Change cfile block to page to avoid term overload? Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/fs-test-util.h M src/kudu/fs/log_block_manager.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/env_util.cc M src/kudu/util/env_util.h M src/kudu/util/file_cache.cc 19 files changed, 828 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/9 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#8). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. TODO: - Finish testing - Method documentation - Change cfile block to page to avoid term overload? Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/fs-test-util.h M src/kudu/fs/log_block_manager.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/env_util.cc M src/kudu/util/env_util.h M src/kudu/util/file_cache.cc 18 files changed, 804 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/8 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#7). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. TODO: - Finish testing - Method documentation - Change cfile block to page to avoid term overload? Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/fs-test-util.h M src/kudu/fs/log_block_manager.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/env_util.cc M src/kudu/util/env_util.h M src/kudu/util/file_cache.cc 18 files changed, 804 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/7 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 446: data_size -= checksum_size; > careful of underflow Done Line 472: if (block.size() != data_size && checksum.size() != checksum_size) { > || Done http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/util/env.h File src/kudu/util/env.h: Line 21: #include > Would prefer not to leak this sort of platform-specific thing outside of en I agree. This was just a rough patch to see if the approach and use of preadv made sense before polishing. See my comment below as well. Line 390: virtual Status ReadV(uint64_t offset, Slice** results, > Would std::vector* be more intuitive for 'results'? That is, ReadV() I agree that a vector with all the combined parameters would be better in the end. Something that has slice, scratch, and length so we don't need to expose iovec. I planned to do this, I just posted this super rough version to see if the approach and use of preadv made sense before spending the time polishing, testing, etc. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/util/env.h File src/kudu/util/env.h: Line 21: #include Would prefer not to leak this sort of platform-specific thing outside of env_posix.cc. An iovec is a pretty simple structure; perhaps you can define a Kudu version and adapt it to/from an iovec inside env_posix.cc? Line 390: virtual Status ReadV(uint64_t offset, Slice** results, Would std::vector* be more intuitive for 'results'? That is, ReadV() takes a caller-provided vector of Slices and replaces its contents with a couple Slices of its own creation? Might be interesting to combine 'scratches' and 'results' into the same structure, since I imagine the length of the two must be the same. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Todd Lipcon has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 6: (2 comments) Just did a quick look at the approach using readv, it seems like a good direction to me. I didn't review the new header/footer checksum stuff yet. It may be worth checking for Adar's opinion on the API in Env, he tends to have opinions on that stuff. The current one doesn't seem super idiomatic to me but also don't really have a better suggestion :) http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 446: data_size -= checksum_size; careful of underflow PS6, Line 472: && || -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#6). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. TODO: - Finish testing - Change cfile block to page to avoid term overload? Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/fs-test-util.h M src/kudu/fs/log_block_manager.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/env_util.h M src/kudu/util/file_cache.cc 17 files changed, 539 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/6 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Todd Lipcon has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 819: unique_ptr corrupt_source( > I will look into this. I think ideally I would like to corrupt each byte in yea, you probably need to re-open the file and clear the block cache in between http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 363: uint32_t data_size = ptr.size(); > ptr.size() returns uint32_t, so I kept it the same for consistency. yea, though the style guide says to use signed quantities anywhere that you aren't doing bitwise ops (that way overflow will be more obvious since it will go to -1 instead of a very large number) Line 392: RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), , checksum_scratch)); > I didn't want to cache the checksum since that seamed to complicate cache r hm, possibly, but am worried that doubling the syscalls is expensive -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#5). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. TODO: - Finish testing - Change cfile block to page to avoid term overload? Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/fs-test-util.h M src/kudu/util/crc.cc M src/kudu/util/crc.h 9 files changed, 346 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/5 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/6630/3//COMMIT_MSG Commit Message: Line 19: immeditaly result in performance degredation and incompatible data on > typo Done Line 25: are not writen the incompatible_features "checksum" bit is not set. > typo Done Line 27: cfile_verify_checksums is used in the CFileReader to enabled validating the > typo Done http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 790: TestReadWriteRawBlocks(SNAPPY, 1000); > maybe this could be collapsed into a for loop? Done Line 819: unique_ptr corrupt_source( > would it be simpler to just corrupt the data on disk instead of this code-i I will look into this. I think ideally I would like to corrupt each byte in a sample file and show it results in a corrupt/io error. I was planning to use this to do that. However I ran into some issues with the caching of index blocks. So regardless a slightly different approach than what is here is needed. http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 157: if (PREDICT_FALSE(footer_->incompatible_features() > IncompatibleFeatures::ALL)) { > this strikes me as odd... don't you mean & (~IncompatibleFeatures::ALL)? Your check is technically more correct since if there are bits in the middle that are not valid but are set, it will check that. I assumed incompatible feature bits would be used sequentially. Especially since we output "supported" as an integer in the log message below. Line 363: uint32_t data_size = ptr.size(); > best to use a signed int here ptr.size() returns uint32_t, so I kept it the same for consistency. Line 366: data_size = ptr.size() - sizeof(uint32_t); > we should verify that ptr.size() >= sizeof(uint32_t) first and return a Cor Done Line 392: RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), , checksum_scratch)); > I think perf wise it may be better to just allocate the extra 4 bytes in th I didn't want to cache the checksum since that seamed to complicate cache reads. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#4). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. TODO: - Update cfile docs - Finish testing - Change cfile block to page to avoid term overload? Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/fs-test-util.h M src/kudu/util/crc.cc M src/kudu/util/crc.h 8 files changed, 321 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/4 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Todd Lipcon has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/6630/3//COMMIT_MSG Commit Message: PS3, Line 19: immeditaly typo PS3, Line 25: writen typo PS3, Line 27: enabled typo http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: PS3, Line 772: FLAGS_cfile_write_checksums = true; : FLAGS_cfile_verify_checksums = true; : TestReadWriteRawBlocks(NO_COMPRESSION, 1000); : TestReadWriteRawBlocks(SNAPPY, 1000); : : FLAGS_cfile_write_checksums = true; : FLAGS_cfile_verify_checksums = false; : TestReadWriteRawBlocks(NO_COMPRESSION, 1000); : TestReadWriteRawBlocks(SNAPPY, 1000); : : FLAGS_cfile_write_checksums = false; : FLAGS_cfile_verify_checksums = true; : TestReadWriteRawBlocks(NO_COMPRESSION, 1000); : TestReadWriteRawBlocks(SNAPPY, 1000); : : FLAGS_cfile_write_checksums = false; : FLAGS_cfile_verify_checksums = false; : TestReadWriteRawBlocks(NO_COMPRESSION, 1000); : TestReadWriteRawBlocks(SNAPPY, 1000); maybe this could be collapsed into a for loop? for (FLAGS_cfile_write_checksum : {false, true}) { for (FLAGS_cfile_verify_checksum : {false, true}) { ... } } PS3, Line 819: CorruptReadableBlock would it be simpler to just corrupt the data on disk instead of this code-injection-based approach? A hook in the fs manager code to corrupt a block on disk is probably more generally reusable in higher-level integration tests as well (where injecting this CorruptReadableBlock class may be a bit tricky). I think that would also enable making this code quite a bit simpler by reusing existing ReadFile/WriteFile functions http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS3, Line 157: > IncompatibleFeatures::ALL this strikes me as odd... don't you mean & (~IncompatibleFeatures::ALL)? PS3, Line 363: uint32_t best to use a signed int here Line 366: data_size = ptr.size() - sizeof(uint32_t); we should verify that ptr.size() >= sizeof(uint32_t) first and return a Corruption otherwise. Line 392: RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), , checksum_scratch)); I think perf wise it may be better to just allocate the extra 4 bytes in the block cache so that we can read the checksum in the same IO vs issuing the extra syscall here. Plus the code is likely to be a little simpler? -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 3: I need to handle the header and footer yet, which may require a change in how to handle backwards compatibility. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#3). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block. cfile_write_checksums is defaulted to false to ensure upgrades don't immeditaly result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not writen the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enabled validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. TODO: - More unit and perf testing - Change cfile block to page to avoid term overload? Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/fs-test-util.h 5 files changed, 195 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/3 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#2). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block. cfile_write_checksums is defaulted to false to ensure upgrades don't immeditaly result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not writen the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enabled validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. TODO: - More unit and perf testing - Change cfile block to page to avoid term overload? Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc 4 files changed, 102 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/2 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 1: This is just a quick push to sanity check my approach. -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has uploaded a new change for review. http://gerrit.cloudera.org:8080/6630 Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block. cfile_write_checksums is defaulted to false to ensure upgrades don't immeditaly result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not writen the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enabled validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. TODO: - More unit and perf testing - Change cfile block to page to avoid term overload? Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc 4 files changed, 103 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/1 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke