Alexey Serbin has posted comments on this change.

Change subject: Predicate evaluation pushdown
......................................................................


Patch Set 3:

(22 comments)

First pass -- I just saw a new version has appeared.  Will continue with the 
new version.

http://gerrit.cloudera.org:8080/#/c/3990/3//COMMIT_MSG
Commit Message:

Line 16: 
https://github.com/anjuwong/kudu/blob/pred-pushdown/docs/decoder-eval-perf.md
Nit: it would be nice to have the document in the Kudu source tree when it's 
time to submit the changes.  I.e., please add that .md file into the 
changelist.  Probably, $REPO_ROOT/docs/design-docs is the right place to add 
the file.  Don't forget to update the comment once it's time to push the change 
into the central repo.  Better is to put the correct, relative link here, and 
add the temporary location of the documents into some sort of gerrit-specific 
comment.


Line 20: 
https://github.com/anjuwong/kudu/blob/sorted-dict-block/docs/design-docs/predicate-eval-pushdown.md
Ditto.


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/benchmarks/tpch/tpch_real_world.cc
File src/kudu/benchmarks/tpch/tpch_real_world.cc:

Line 87: DEFINE_string(tpch_path_to_dbgen_dir, "/Users/andrew.wong/tpch-dbgen",
This does not look as a good default.  I think ".", "./tpch" or "./tpch-dbgen" 
would be better.


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_dict_block.cc
File src/kudu/cfile/binary_dict_block.cc:

Line 252: // store their min/max values. CopyNextAndEval in these blocks could
Ditto for the asterisk (star) as for the corresponding header file.


Line 289:       // Row is included in predicate, copy data to block.
nit: if I'm not mistaken, the style guide mentions if/else should be formatted 
like

if (...) {
  ...
} else {
  ...
}


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_dict_block.cc
File src/kudu/cfile/binary_dict_block.cc:

Line 191:   dict_decoder_ = iter->GetDictDecoder();
Why not to move those into the initializer list?


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_dict_block.h
File src/kudu/cfile/binary_dict_block.h:

Line 131:   Status CopyNextAndEval(size_t *n,
Nit: it seems in the majority of the code in this file the asterisk tends to be 
with parameter type, not parameter name. It would be nice to have the same for 
the newly added members/methods/functions.


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_dict_block.h
File src/kudu/cfile/binary_dict_block.h:

Line 131:   Status CopyNextAndEval(size_t *n,
Style nit: the rest of this file uses the "asterisk is closer to the type" 
convention.  Consider following it here as well, if it makes sense.


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_plain_block.cc
File src/kudu/cfile/binary_plain_block.cc:

Line 290:   size_t i;
If the 'i' variable is not needed outside of the cycle, why not to move it 
inside the cycle's visibility, like

for (size_t i = 0; ...)


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_plain_block.h
File src/kudu/cfile/binary_plain_block.h:

Line 110: 
What if SeekForward returned OK only if it moved forward at the specified 
number of bytes and returned Status::Incomplete() otherwise?


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_plain_block.h
File src/kudu/cfile/binary_plain_block.h:

Line 111:   virtual Status SeekForward(size_t* n) OVERRIDE {
Does it make sense to implement this method in the base class instead?


Line 112:     DCHECK(HasNext());
Why not to return non-OK status in that case instead?  I.e., always assign the 
actual number of positions it was forwarded into the output parameter and 
return non-OK status when it's not possible to seek forward to the specified 
number of positions.


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/block_encodings.h
File src/kudu/cfile/block_encodings.h:

Line 162:     CopyNextValues(n, dst);
What if CopyNextValues() returned non-OK code?  Would it make sense to check 
for that and return right away in that case?


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/bshuf_block.h
File src/kudu/cfile/bshuf_block.h:

Line 278:   Status SeekForward(size_t* n) OVERRIDE {
Consider returning non-OK status code (like Status::Incomplete()) instead of 
DCHECK()) if it's not possible to advance current index as far as desired.


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/plain_block.h
File src/kudu/cfile/plain_block.h:

Line 211:   virtual Status SeekForward(size_t* n) OVERRIDE {
Does it make sense to implement this method in the base class instead?


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/common/rowblock.h
File src/kudu/common/rowblock.h:

Line 123:   explicit SelectionVectorView(SelectionVector *sel_vec, size_t 
first_row_idx = 0)
It's not clear why it would be necessary to have this wrapper for nullptr 
SelectionVector.  Could you add a comment to clarify this?


Line 125:     if (sel_vec) {
Nit: the check for sel_vec_ is already performed by the Advance() method.


Line 133:     row_offset_ += skip;
Why to advance the row_offset_ if there is no underlying data?  Please add a 
comment about the reason behind this.


Line 146:   if (sel_vec_) {
Nit: tab/shift is missing


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/common/types.h
File src/kudu/common/types.h:

PS3, Line 59: std::function<int(const void*, const void*)>
Consider introducing typedef for this functor.


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

Line 453:     LOG(WARNING) << "Unable to prepare column " << ctx->col_idx() << 
": " << s.ToString();
nit: why WARNING, not ERROR?


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/tablet/memrowset.cc
File src/kudu/tablet/memrowset.cc:

Line 474: 
Why is it necessary to have this alias?


-- 
To view, visit http://gerrit.cloudera.org:8080/3990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <andrew.w...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

Reply via email to