Alexey Serbin has posted comments on this change. Change subject: Predicate evaluation pushdown ......................................................................
Patch Set 3: (22 comments) First pass -- I just saw a new version has appeared. Will continue with the new version. http://gerrit.cloudera.org:8080/#/c/3990/3//COMMIT_MSG Commit Message: Line 16: https://github.com/anjuwong/kudu/blob/pred-pushdown/docs/decoder-eval-perf.md Nit: it would be nice to have the document in the Kudu source tree when it's time to submit the changes. I.e., please add that .md file into the changelist. Probably, $REPO_ROOT/docs/design-docs is the right place to add the file. Don't forget to update the comment once it's time to push the change into the central repo. Better is to put the correct, relative link here, and add the temporary location of the documents into some sort of gerrit-specific comment. Line 20: https://github.com/anjuwong/kudu/blob/sorted-dict-block/docs/design-docs/predicate-eval-pushdown.md Ditto. http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/benchmarks/tpch/tpch_real_world.cc File src/kudu/benchmarks/tpch/tpch_real_world.cc: Line 87: DEFINE_string(tpch_path_to_dbgen_dir, "/Users/andrew.wong/tpch-dbgen", This does not look as a good default. I think ".", "./tpch" or "./tpch-dbgen" would be better. http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_dict_block.cc File src/kudu/cfile/binary_dict_block.cc: Line 252: // store their min/max values. CopyNextAndEval in these blocks could Ditto for the asterisk (star) as for the corresponding header file. Line 289: // Row is included in predicate, copy data to block. nit: if I'm not mistaken, the style guide mentions if/else should be formatted like if (...) { ... } else { ... } http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_dict_block.cc File src/kudu/cfile/binary_dict_block.cc: Line 191: dict_decoder_ = iter->GetDictDecoder(); Why not to move those into the initializer list? http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_dict_block.h File src/kudu/cfile/binary_dict_block.h: Line 131: Status CopyNextAndEval(size_t *n, Nit: it seems in the majority of the code in this file the asterisk tends to be with parameter type, not parameter name. It would be nice to have the same for the newly added members/methods/functions. http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_dict_block.h File src/kudu/cfile/binary_dict_block.h: Line 131: Status CopyNextAndEval(size_t *n, Style nit: the rest of this file uses the "asterisk is closer to the type" convention. Consider following it here as well, if it makes sense. http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_plain_block.cc File src/kudu/cfile/binary_plain_block.cc: Line 290: size_t i; If the 'i' variable is not needed outside of the cycle, why not to move it inside the cycle's visibility, like for (size_t i = 0; ...) http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_plain_block.h File src/kudu/cfile/binary_plain_block.h: Line 110: What if SeekForward returned OK only if it moved forward at the specified number of bytes and returned Status::Incomplete() otherwise? http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_plain_block.h File src/kudu/cfile/binary_plain_block.h: Line 111: virtual Status SeekForward(size_t* n) OVERRIDE { Does it make sense to implement this method in the base class instead? Line 112: DCHECK(HasNext()); Why not to return non-OK status in that case instead? I.e., always assign the actual number of positions it was forwarded into the output parameter and return non-OK status when it's not possible to seek forward to the specified number of positions. http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/block_encodings.h File src/kudu/cfile/block_encodings.h: Line 162: CopyNextValues(n, dst); What if CopyNextValues() returned non-OK code? Would it make sense to check for that and return right away in that case? http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/bshuf_block.h File src/kudu/cfile/bshuf_block.h: Line 278: Status SeekForward(size_t* n) OVERRIDE { Consider returning non-OK status code (like Status::Incomplete()) instead of DCHECK()) if it's not possible to advance current index as far as desired. http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/plain_block.h File src/kudu/cfile/plain_block.h: Line 211: virtual Status SeekForward(size_t* n) OVERRIDE { Does it make sense to implement this method in the base class instead? http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/common/rowblock.h File src/kudu/common/rowblock.h: Line 123: explicit SelectionVectorView(SelectionVector *sel_vec, size_t first_row_idx = 0) It's not clear why it would be necessary to have this wrapper for nullptr SelectionVector. Could you add a comment to clarify this? Line 125: if (sel_vec) { Nit: the check for sel_vec_ is already performed by the Advance() method. Line 133: row_offset_ += skip; Why to advance the row_offset_ if there is no underlying data? Please add a comment about the reason behind this. Line 146: if (sel_vec_) { Nit: tab/shift is missing http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/common/types.h File src/kudu/common/types.h: PS3, Line 59: std::function<int(const void*, const void*)> Consider introducing typedef for this functor. http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: Line 453: LOG(WARNING) << "Unable to prepare column " << ctx->col_idx() << ": " << s.ToString(); nit: why WARNING, not ERROR? http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/tablet/memrowset.cc File src/kudu/tablet/memrowset.cc: Line 474: Why is it necessary to have this alias? -- To view, visit http://gerrit.cloudera.org:8080/3990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <andrew.w...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Dan Burkert <d...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes