Will Berkeley has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating 
prefix
......................................................................


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/3304/4//COMMIT_MSG
Commit Message:

PS4, Line 17: compatibility with bloomfile
> what's this mean?
There was a bug making bloom files not work. The trick is that bloom files call 
AppendRawBlock directly instead of FinishCurDataBlock. I altered AppendRawBlock 
to accept the last key as well as the first key, so now a BloomFileWriter 
maintains its own last key, just like it maintains its own first key. I feel 
like the change is a little awkward though, so I'd appreciate any ideas on how 
to improve it.


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_dict_block.cc
File src/kudu/cfile/binary_dict_block.cc:

Line 131:     last_key_.assign_copy(last->data(), last->size());
> instead of copying this every time, could we use GetLast on the code word b
Doesn't the dictionary in the block builder map from string to codeword? The 
last key we want is the last string in the block, not the codeword stored for 
it, so we can't look it up in the dictionary. I did improve this, though, by 
looking up in the dict block via GetKeyAtIdx, a method which was added to plain 
encoding per one of your comments. This means I had to drop CHECK(finished_) 
from it (but kept it in GetFirstKey and GetLastKey). Does that solution seem 
reasonable to you?


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_plain_block.cc
File src/kudu/cfile/binary_plain_block.cc:

Line 132: Status BinaryPlainBlockBuilder::GetLastKey(void *key_void) const {
> can we refactor this to share code with GetFirstKey? eg a GetKeyAtIndex(int
Done


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.cc
File src/kudu/cfile/cfile_util.cc:

Line 126: void SeparatingKey(const faststring& left, const faststring& right, 
faststring *target) {
> can you add some unit tests for this function? (eg the examples from the co
Done


Line 127:   DCHECK(memcmp(left.data(), right.data(), min(left.size(), 
right.size())) <=
> I think DCHECK_LE(Slice(left).compare(Slice(right), 0); is probably more re
Done


Line 130:   if (left.size() == 0) {
> .empty()
Done


Line 131:     // special case for first block: use the whole key
> not sure why this is the case
Bloomfiles work with this now; no more special case needed.


Line 135:     target->assign_copy(right.data(), cpl == right.size() ? cpl : cpl 
+ 1);
> hm, this is a slightly different algorithm than the one used in gutil/strin
I didn't realize that function existed! Still, it finds a separator in the 
range left <= * < right, and we require it in the range left < * <= right. I 
thought about it a bit and I don't see any advantage.


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.h
File src/kudu/cfile/cfile_util.h:

Line 63:   bool deltafile;
> I think I'd prefer this be called something like 'optimize_index_keys' or s
I'm going to keep it simple for now and just rename, but I'll consider 
implementing your suggestion for delta keys a little later.


Line 110: // Computes the shortest key satisfying left < key < right and places 
it into target.
> should clarify whether 'target' may be the same faststring as 'left' or 'ri
Superseded by another change.


PS4, Line 111: const faststring& left, const faststring& right
> may make sense to use StringPiece or Slice here rather than specifically fa
Agreed. Converted to use Slice.


Line 111: void SeparatingKey(const faststring& left, const faststring& right, 
faststring *target);
> rename to GetSeparatingKey
Done


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 141:     last_key_.clear();
> no need (it's initialized to clear)
Done


Line 437:       SeparatingKey(last_key_, tmp_buf_, &sep_buf);
> maybe make SeparatingKey actually just shorten tmp_buf_ in place to avoid t
Done


http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/rle_block.h
File src/kudu/cfile/rle_block.h:

Line 265:     if (count_ > 0) {
> usually we put the error case first
I'm being consistent with GetFirstKey's (preexisting) implementation. If you 
like, I'll adjust both of them to handle error case first.


-- 
To view, visit http://gerrit.cloudera.org:8080/3304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-HasComments: Yes

Reply via email to