[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix (No changes: resubmitting to trigger Jenkins build) This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Reviewed-on: http://gerrit.cloudera.org:8080/3304 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 23 files changed, 234 insertions(+), 71 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 11: Build Started http://104.196.14.100/job/kudu-gerrit/1973/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Will Berkeley has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 9: > Looks like a small compile error here :( whoops...was too hasty... -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 10: Build Started http://104.196.14.100/job/kudu-gerrit/1948/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3304 to look at the new patch set (#10). Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix (No changes: resubmitting to trigger Jenkins build) This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 23 files changed, 234 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/10 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 9: Looks like a small compile error here -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Will Berkeley has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/binary_dict_block.cc File src/kudu/cfile/binary_dict_block.cc: Line 176: data_builder_->GetLastKey(reinterpret_cast(&last_codeword)); > RETURN_NOT_OK here Done http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/binary_plain_block.h File src/kudu/cfile/binary_plain_block.h: Line 62: // key should be a Slice * > 'key_void' should be a Slice* Done PS8, Line 63: void *k > style: void* (no space) Done PS8, Line 63: size_t > better to use 'int' (we try to avoid unsigned types these days) Done Line 71: Status GetLastKey(void *key) const OVERRIDE; > same Done http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/cfile_util.h File src/kudu/cfile/cfile_util.h: Line 109: void GetSeparatingKey(const Slice& left, Slice& right); > should be Slice* (we avoid non-const refs) Done http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 141: last_key_.clear(); > the default is empty, so no need to clear Done http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/rle_block.h File src/kudu/cfile/rle_block.h: Line 269: return Status::NotFound("No keys in the block"); > usually we put the error case first, eg: Done http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/tablet/deltafile.cc File src/kudu/tablet/deltafile.cc: Line 76: opts.optimize_index_keys = false; > would be good to include a comment here explaining why (it's in the commit Done -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3304 to look at the new patch set (#9). Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix (No changes: resubmitting to trigger Jenkins build) This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 23 files changed, 234 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/9 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 9: Build Started http://104.196.14.100/job/kudu-gerrit/1927/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/binary_dict_block.cc File src/kudu/cfile/binary_dict_block.cc: Line 176: data_builder_->GetLastKey(reinterpret_cast(&last_codeword)); RETURN_NOT_OK here http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/binary_plain_block.h File src/kudu/cfile/binary_plain_block.h: Line 62: // key should be a Slice * 'key_void' should be a Slice* PS8, Line 63: void *k style: void* (no space) PS8, Line 63: size_t better to use 'int' (we try to avoid unsigned types these days) Line 71: Status GetLastKey(void *key) const OVERRIDE; same http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/cfile_util.h File src/kudu/cfile/cfile_util.h: Line 109: void GetSeparatingKey(const Slice& left, Slice& right); should be Slice* (we avoid non-const refs) http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 141: last_key_.clear(); the default is empty, so no need to clear http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/cfile/rle_block.h File src/kudu/cfile/rle_block.h: Line 269: return Status::NotFound("No keys in the block"); usually we put the error case first, eg: if (PREDICT_FALSE(count_ == 0)) { return Status::NotFound(...); } (the philosophy is to keep the 'straight line' or 'common' code path with as little indentation as possible) http://gerrit.cloudera.org:8080/#/c/3304/8/src/kudu/tablet/deltafile.cc File src/kudu/tablet/deltafile.cc: Line 76: opts.optimize_index_keys = false; would be good to include a comment here explaining why (it's in the commit message, but people won't see that when reading the code) -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 8: Build Started http://104.196.14.100/job/kudu-gerrit/1870/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 7: Build Started http://104.196.14.100/job/kudu-gerrit/1869/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3304 to look at the new patch set (#7). Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix (No changes: resubmitting to trigger Jenkins build) This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 23 files changed, 226 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/7 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Will Berkeley has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 6: > Sorry for the delay on this, Will. I was on PTO the last couple of > weeks. Hope to get back to review this week. I noticed the last > build failed but the build results have been purged by this point. > Is the current rev passing tests, etc? No worries Mr Lipcon. I hope you enjoyed your vacation. IIRC it passes everything, but on the last build Jenkins choked somehow and failed the build through no fault on my own. I'll verify it passes the basic stuff locally and repush so Jenkins can (hopefully) verify. Unrelatedly, I have a patch in the works for KUDU-1227 (update-merge). It works and passes the tests + the tests I wrote for it, but definitely needs some review love. Is there interest in that for 1.0? I mostly did it for personal enrichment but obv happy to contribute it if it's wanted. -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 4: Sorry for the delay on this, Will. I was on PTO the last couple of weeks. Hope to get back to review this week. I noticed the last build failed but the build results have been purged by this point. Is the current rev passing tests, etc? -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 6: Build Started http://104.196.14.100/job/kudu-gerrit/1787/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3304 to look at the new patch set (#6). Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 23 files changed, 226 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/6 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Will Berkeley has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 4: (15 comments) http://gerrit.cloudera.org:8080/#/c/3304/4//COMMIT_MSG Commit Message: PS4, Line 17: compatibility with bloomfile > what's this mean? There was a bug making bloom files not work. The trick is that bloom files call AppendRawBlock directly instead of FinishCurDataBlock. I altered AppendRawBlock to accept the last key as well as the first key, so now a BloomFileWriter maintains its own last key, just like it maintains its own first key. I feel like the change is a little awkward though, so I'd appreciate any ideas on how to improve it. http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_dict_block.cc File src/kudu/cfile/binary_dict_block.cc: Line 131: last_key_.assign_copy(last->data(), last->size()); > instead of copying this every time, could we use GetLast on the code word b Doesn't the dictionary in the block builder map from string to codeword? The last key we want is the last string in the block, not the codeword stored for it, so we can't look it up in the dictionary. I did improve this, though, by looking up in the dict block via GetKeyAtIdx, a method which was added to plain encoding per one of your comments. This means I had to drop CHECK(finished_) from it (but kept it in GetFirstKey and GetLastKey). Does that solution seem reasonable to you? http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_plain_block.cc File src/kudu/cfile/binary_plain_block.cc: Line 132: Status BinaryPlainBlockBuilder::GetLastKey(void *key_void) const { > can we refactor this to share code with GetFirstKey? eg a GetKeyAtIndex(int Done http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.cc File src/kudu/cfile/cfile_util.cc: Line 126: void SeparatingKey(const faststring& left, const faststring& right, faststring *target) { > can you add some unit tests for this function? (eg the examples from the co Done Line 127: DCHECK(memcmp(left.data(), right.data(), min(left.size(), right.size())) <= > I think DCHECK_LE(Slice(left).compare(Slice(right), 0); is probably more re Done Line 130: if (left.size() == 0) { > .empty() Done Line 131: // special case for first block: use the whole key > not sure why this is the case Bloomfiles work with this now; no more special case needed. Line 135: target->assign_copy(right.data(), cpl == right.size() ? cpl : cpl + 1); > hm, this is a slightly different algorithm than the one used in gutil/strin I didn't realize that function existed! Still, it finds a separator in the range left <= * < right, and we require it in the range left < * <= right. I thought about it a bit and I don't see any advantage. http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.h File src/kudu/cfile/cfile_util.h: Line 63: bool deltafile; > I think I'd prefer this be called something like 'optimize_index_keys' or s I'm going to keep it simple for now and just rename, but I'll consider implementing your suggestion for delta keys a little later. Line 110: // Computes the shortest key satisfying left < key < right and places it into target. > should clarify whether 'target' may be the same faststring as 'left' or 'ri Superseded by another change. PS4, Line 111: const faststring& left, const faststring& right > may make sense to use StringPiece or Slice here rather than specifically fa Agreed. Converted to use Slice. Line 111: void SeparatingKey(const faststring& left, const faststring& right, faststring *target); > rename to GetSeparatingKey Done http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 141: last_key_.clear(); > no need (it's initialized to clear) Done Line 437: SeparatingKey(last_key_, tmp_buf_, &sep_buf); > maybe make SeparatingKey actually just shorten tmp_buf_ in place to avoid t Done http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/rle_block.h File src/kudu/cfile/rle_block.h: Line 265: if (count_ > 0) { > usually we put the error case first I'm being consistent with GetFirstKey's (preexisting) implementation. If you like, I'll adjust both of them to handle error case first. -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3304 to look at the new patch set (#5). Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 20 files changed, 206 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/5 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 5: Build Started http://104.196.14.100/job/kudu-gerrit/1786/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 4: (15 comments) http://gerrit.cloudera.org:8080/#/c/3304/4//COMMIT_MSG Commit Message: PS4, Line 17: compatibility with bloomfile what's this mean? http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_dict_block.cc File src/kudu/cfile/binary_dict_block.cc: Line 131: last_key_.assign_copy(last->data(), last->size()); instead of copying this every time, could we use GetLast on the code word builder, then look it up in our dictionary? What you have here will probably hurt write performance http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_plain_block.cc File src/kudu/cfile/binary_plain_block.cc: Line 132: Status BinaryPlainBlockBuilder::GetLastKey(void *key_void) const { can we refactor this to share code with GetFirstKey? eg a GetKeyAtIndex(int idx)? http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.cc File src/kudu/cfile/cfile_util.cc: Line 126: void SeparatingKey(const faststring& left, const faststring& right, faststring *target) { can you add some unit tests for this function? (eg the examples from the commit message) Line 127: DCHECK(memcmp(left.data(), right.data(), min(left.size(), right.size())) <= I think DCHECK_LE(Slice(left).compare(Slice(right), 0); is probably more readable. Especially if you change this function to take slices as arguments Line 130: if (left.size() == 0) { .empty() Line 131: // special case for first block: use the whole key not sure why this is the case Line 135: target->assign_copy(right.data(), cpl == right.size() ? cpl : cpl + 1); hm, this is a slightly different algorithm than the one used in gutil/strings/util.h (FindShortestSeparator). eg their example: // FindShortestSeparator("foobar", "foxhunt", &sep) => sep == "fop" you would return 'fox', right? Either one is the same length, but I guess theirs ends up tighter to the left bound whereas yours ends up somewhere in the middle. Can you think of any advantages/disadvantages between the two? As far as I can think, either is correct, but maybe I'm missing something. http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.h File src/kudu/cfile/cfile_util.h: Line 63: bool deltafile; I think I'd prefer this be called something like 'optimize_index_keys' or something, rather than specifically referring to whether it's a deltafile. Another option is to have an int 'preserve_index_key_prefix_length' or somesuch, where for delta keys we pass the maximum possible encoded length of a delta key. This ensures that we can still decode it, but we might still be able to truncate the value (since the delta entries themselves can be arbitrarily long with user data) Line 110: // Computes the shortest key satisfying left < key < right and places it into target. should clarify whether 'target' may be the same faststring as 'left' or 'right' or not Line 111: void SeparatingKey(const faststring& left, const faststring& right, faststring *target); rename to GetSeparatingKey PS4, Line 111: const faststring& left, const faststring& right may make sense to use StringPiece or Slice here rather than specifically faststring http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 141: last_key_.clear(); no need (it's initialized to clear) Line 437: SeparatingKey(last_key_, tmp_buf_, &sep_buf); maybe make SeparatingKey actually just shorten tmp_buf_ in place to avoid the extra allocation here http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/rle_block.h File src/kudu/cfile/rle_block.h: Line 265: if (count_ > 0) { usually we put the error case first -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 4: Build Started http://104.196.14.100/job/kudu-gerrit/1749/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3304 to look at the new patch set (#4). Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. Two exceptions: 1. This change does not apply to deltafiles. Deltafiles expect to be able to decode an index key into a DeltaKey. 2. The first key in a file, which could always be "", is left the same as before for compatibility with bloomfiles. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['apple' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 19 files changed, 204 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/4 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1739/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Will Berkeley has uploaded a new change for review. http://gerrit.cloudera.org:8080/3304 Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. This change does not apply to deltafiles, as they expect to be able to decode an index key into a DeltaKey. Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 18 files changed, 196 insertions(+), 45 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/1 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley