Andrew Wong has posted comments on this change. Change subject: Predicate evaluation pushdown ......................................................................
Patch Set 7: (97 comments) http://gerrit.cloudera.org:8080/#/c/3990/3//COMMIT_MSG Commit Message: Line 16: https://github.com/anjuwong/kudu/blob/pred-pushdown/docs/decoder-eval-perf.md > Nit: it would be nice to have the document in the Kudu source tree when it' Will do. Line 20: https://github.com/anjuwong/kudu/blob/sorted-dict-block/docs/design-docs/predicate-eval-pushdown.md > Ditto. Done http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/benchmarks/tpch/tpch_real_world.cc File src/kudu/benchmarks/tpch/tpch_real_world.cc: Line 87: DEFINE_string(tpch_path_to_dbgen_dir, ".", > This does not look as a good default. I think ".", "./tpch" or "./tpch-dbg Done, this was not meant to be in the patch. http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_dict_block.cc File src/kudu/cfile/binary_dict_block.cc: Line 252: // store their min/max values. CopyNextAndEval in these blocks could > Ditto for the asterisk (star) as for the corresponding header file. Done Line 259: ctx->SetDecoderEvalSupported(); > Is this still relevant? I think you have implemented CopyNextAndEval for p This could be extended to other blocks. If, for instance, in integer blocks, we begin storing the the min and max of the block, we could either short-circuit or evaluate slice-by-slice. Right now, anything that doesn't support decoder-level evaluation will just set eval_complete to false and go down the copy&evaluate path. Line 264: > terminate all comments with a period. Done Line 289: // been cleared, in which case we can skip evaluation. > nit: if I'm not mistaken, the style guide mentions if/else should be format Done http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_dict_block.cc File src/kudu/cfile/binary_dict_block.cc: Line 191: dict_decoder_(iter->GetDictDecoder()), > Why not to move those into the initializer list? Done http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_dict_block.h File src/kudu/cfile/binary_dict_block.h: Line 131: ColumnEvalContext *ctx, > Nit: it seems in the majority of the code in this file the asterisk tends t Done Line 140: data_decoder_->SeekForward(n); > drop 'virtual' and replace OVERRIDE with override, here and above. This didn't get picked up by the last patch, but it will be fixed. http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_dict_block.h File src/kudu/cfile/binary_dict_block.h: Line 131: ColumnEvalContext *ctx, > Style nit: the rest of this file uses the "asterisk is closer to the type" Done http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_plain_block.cc File src/kudu/cfile/binary_plain_block.cc: Line 290: Slice *out = reinterpret_cast<Slice *>(dst->data()); > If the 'i' variable is not needed outside of the cycle, why not to move it Done http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/binary_plain_block.cc File src/kudu/cfile/binary_plain_block.cc: Line 306: Status BinaryPlainBlockDecoder::CopyNextValues(size_t *n, ColumnDataView *dst) { > I wonder whether this function could share code with the new one using some Done http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_plain_block.h File src/kudu/cfile/binary_plain_block.h: Line 110: > What if SeekForward returned OK only if it moved forward at the specified n Fair point, may be confusing. An "incomplete" scan isn't an error, but rather it means we've reached the end of the decoder. The status is not used, so it makes more sense for this to be void. Alternatively, the if HasNext() is false, it should return an error status and OK() otherwise. http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_plain_block.h File src/kudu/cfile/binary_plain_block.h: Line 111: void SeekForward(size_t *n) override { > Does it make sense to implement this method in the base class instead? I agree it'd be nice, but I'm hesitant since they are dependent on private members that may not be defined for all block encodings (e.g. dict blocks don't have a cur_idx_ member, but it is used in SeekForward). We could just define these members in the base class and just leave them as unused when they aren't used, something to consider, definitely. Edit: yes this can be done by using Count() and GetCurrentIdx(), but the issue is still that there needs to be functions that can modify cur_idx_, which is not a member of all blockdecoders. Line 112: DCHECK(HasNext()); > Why not to return non-OK status in that case instead? I.e., always assign It isn't an error to not seek forward the specified number of positions. This case occurs when the batch is started near the end of a decoder. The batch may request 1000 rows, but the decoder itself may only have 100 left, in which case *n will be set to 100. This is the contract for CopyNextValues() as well. http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/block_encodings.h File src/kudu/cfile/block_encodings.h: PS2, Line 150: Fetch the next values from the block a > I think this should be up to n, right? That's at least the contract of Cop You're right. *n should be set to the largest value <= to itself without going past the end of the block. The contract should be the same as CopyNextValues. http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/block_encodings.h File src/kudu/cfile/block_encodings.h: Line 162: ColumnDataView *dst) { > What if CopyNextValues() returned non-OK code? Would it make sense to chec Done. Should be wrapped with RETURN_NOT_OK() http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/cfile/block_encodings.h File src/kudu/cfile/block_encodings.h: Line 138: virtual void SeekForward(size_t *n) = 0; > I think it'd be better to use 'int' here, so you can DCHECK_GE(*n, 0) at th I see, just to clarify, this would be used to detect cases where *n is too large and overflows. Line 157: // function is that the context is marked as supporting decoder eval. This > I think there's a subtle requirement here that, given a particular encoding Right, the current contract, although not explicitly enforced right now, is that all blocks of a block encoding will always set eval_complete to true or false. Enum state should work. Line 162: ColumnDataView *dst) { > RETURN_NOT_OK Done http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/bshuf_block.h File src/kudu/cfile/bshuf_block.h: Line 278: void SeekForward(size_t *n) override { > Consider returning non-OK status code (like Status::Incomplete()) instead o See comments in other implementations of SeekForward. http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 728: if (ctx_->pred()->predicate_type() == PredicateType::None) { > wrap comments at 80 columns if possible (100 max). Done http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 670: } > typo Done Line 728: if (ctx_->pred()->predicate_type() == PredicateType::None) { > I think a getter like 'ctx_->try_decoder_predicate_evaluation()' would be a Done Line 733: if (ctx_->pred()->EvaluateCell(static_cast<const void *>(&cur_string))) { > would rather add a wrapper in the test code instead of adding a compatibili Done Line 747: } > should remove this Done Line 977: > again a getter would centralize these checks and make the code more readabl Done http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 488: ctx->SetDecoderEvalNotSupported(); > shouldn't this be done in PrepareEvalContext? Done. This was put here to prevent Scan from being called before SetDecoderEvalNotSupported was called. This only happened in cfile-test, but can definitely be avoided in test rather than in code. PS7, Line 489: ctx->block() > given the number of times you use ctx->block() in this function I think it' Done http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: PS2, Line 385: dicate o > predicate Done Line 387: // is shared among the multiple BinaryDictBlockDecoders in a single cfile, > This seems like a strange place to put this method; why isn't it done by th A single vocabulary is shared among the entire cfile, which may contain multiple dictionary blocks. This means the predicate set itself is owned by the cfile, not the decoders, and this provides the decoders a way to access it. http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: Line 24: #include "kudu/common/columnblock.h" > forward decl is sufficient There seem to be a fair amount of dependencies that get pretty hairy when I forward declare this. Line 222: // Get the ordinal index that the iterator is currently pointed to. > would be nice to avoid this "duplicate API" thing where one set of things g Done Line 261: virtual Status FinishBatch() = 0; > see above, would be good to collapse the two APIs Done Line 394: > this isn't returning a set of predicates, though -- it's returning a set of Done Line 395: struct PreparedBlock { > should rename to something more specifically about dictionary... maybe GetC Right, my thinking was it is a set of everything that satisfies the predicate. I see your point, will change. Line 468: > rename (per above) Done http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: PS7, Line 390: ; > nit: space after ; Done PS7, Line 463: or>cod > nit: space after > Done http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/plain_block.h File src/kudu/cfile/plain_block.h: Line 211: virtual void SeekForward(size_t *n) OVERRIDE { > Does it make sense to implement this method in the base class instead? See replies in other implementations of SeekForward. The base abstract class does not have the right private members to implement this. Am considering exposing a public API for these members. http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/common/column_eval_context.h File src/kudu/common/column_eval_context.h: PS2, Line 17: > The style guide forbids non-const reference args and fields. This is a lit eval_complete is a flag that gets set at the decoder level to indicate that decoder-level evaluation is supported by the block. It tells the materializing iterator that the decoder is only smart enough to copy values and that it still needs to evaluate each cell. It could take a pointer though, which would have the same functionality and be clearer. PS2, Line 24: > ampersand should be with the type, same with the pointer * below. I think Done http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/common/column_eval_context.h File src/kudu/common/column_eval_context.h: Line 36: : col_idx_(col_idx), > indent colon 4 spaces, and wrap list at commas Done http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/column_eval_context.h File src/kudu/common/column_eval_context.h: Line 43: // Column index in the parent CFileSet. > I think this is incorrect and it's actually the index within the projection Done Line 57: // Must be greater than size 0 and must be large enough to cover the number > isn't "greater than size 0" redundant with "large enough to cover the numbe Done Line 79: void SetDecoderEvalNotSupported() { > particularly seems like here it should be a CHECK that it's kNotSet. Otherw Done http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/common/column_predicate.cc File src/kudu/common/column_predicate.cc: Line 270: > move this above the anonymous namespace above, the function defined in it i Done Line 277: // Going a step further we could do runtime codegen to inline the > The IsNotNull and None branches shouldn't be necessary given the prior chec Done http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/common/column_predicate.h File src/kudu/common/column_predicate.h: Line 130: // selection vector to 0. > I think this is going to be really slow, because it forces a bunch of branc Done http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/common/column_predicate.h File src/kudu/common/column_predicate.h: Line 128: // This is evaluated as an 'AND' with the current contents of *sel: > this comment can be more general -- i.e "when the predicate is based on a c Done Line 129: // - If the predicate evaluates to false, sets the appropriate bit in the > nit: "should not" is vague. "Do not"? What happens if they do? It seems fun Right, I had written this comment and then changed the contract. This comment is deprecated. Line 130: // selection vector to 0. > should clarify that this is a physical_type (same below) Done http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/generic_iterators.cc File src/kudu/common/generic_iterators.cc: PS7, Line 522: = ColumnEvalContext > can just be: Done Line 526: if (ctx.pred()->predicate_type() == PredicateType::None) { > does a None predicate ever make it this far into the code? I would think th Put this here since my unit tests cover None operations, but yes, in a real scan, this won't be reached. Line 549: ColumnEvalContext ctx = ColumnEvalContext(col_idx, > same Done Line 553: ctx.SetDecoderEvalNotSupported(); > why's this necessary? shouldn't the nullptr predicate be sufficient for the Done http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/generic_iterators.h File src/kudu/common/generic_iterators.h: Line 28: #include "kudu/common/column_predicate.h" > this new include isn't used in this header file (include it in the .cc if i Done http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/common/iterator.h File src/kudu/common/iterator.h: Line 24: #include "kudu/common/rowblock.h" > I think you can get away with forward decls here Done Line 102: > is this still needed? It seems like the code is now calling the other overl Done Line 103: // Finish the current batch. > need doc. Done http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/common/iterator.h File src/kudu/common/iterator.h: Line 35: class ScanSpec; > keep this list sorted. Done http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/common/rowblock.h File src/kudu/common/rowblock.h: Line 123: // and the view will move forward by the appropriate amount. In this way, the > It's not clear why it would be necessary to have this wrapper for nullptr S Done Line 125: class SelectionVectorView { > Nit: the check for sel_vec_ is already performed by the Advance() method. Done Line 133: row_offset_ += skip; > Why to advance the row_offset_ if there is no underlying data? Please add There are some underlying structural issues that are being addressed. These all stem from an issue where the scan path diverges depending on the decoder-evaluation-support. Working on merging these paths, so should be able to avoid these. Line 146: } > Nit: tab/shift is missing Done http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/common/rowblock.h File src/kudu/common/rowblock.h: Line 121: // The SelectionVectorView keeps track of where in the selection vector a given > doc Done PS4, Line 137: + r > nit: space around ' + ' (here and below) Done Line 141: BitmapClear(sel_vec_->mutable_bitmap(), row_offset_ + row_idx); > all these if()s seem expensive as these are hot-path functions. Maybe we ca The null selection vector is a result of there being two paths down which a scan can go. Collapsing them should solve this. Line 168: // of the latter doesn't mean you _should_. > SelectionVector * const sel_vec_; Done http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/common/rowblock.h File src/kudu/common/rowblock.h: Line 132: DCHECK_LE(skip, sel_vec_->nrows() - row_offset_); > seems to me that these checks should take into account row_offset_ Right, should be comparing row_offset_+row_idx and nrows. PS6, Line 154: > put * with type here and below Done Line 155: > This seems a little strange, I would have expected the method to return the I didn't want to return a pointer to an off-word address, but I agree that the whole point of this class is to keep an moving window. Regardless, I think we can get by by just having the clear/set functions. I don't see a case where we would want the actual view into the bitmap for any other reason than to set or clear. Even now, SetBit is unused, but I've kept it in since I think it will be useful in future features (e.g. OR evaluation). http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/common/types.h File src/kudu/common/types.h: PS3, Line 59: bool AreConsecutive(const void* a, const voi > Consider introducing typedef for this functor. This should not have been in this patch. http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/tablet/CMakeLists.txt File src/kudu/tablet/CMakeLists.txt: Line 99: ADD_KUDU_TEST(tablet-decoder-eval-test) Not included in the patch. http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/tablet/cfile_set-test.cc File src/kudu/tablet/cfile_set-test.cc: Line 105: } > Insert a space above. Done PS6, Line 106: > I think this names a little confusing, since the wrapping and context are a Done PS6, Line 107: ator *iter, : size_t col_idx, > wrapping Done http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/tablet/cfile_set-test.cc File src/kudu/tablet/cfile_set-test.cc: Line 112: ctx.SetDecoderEvalNotSupported(); > same comment here as elsewhere -- if there's no predicate we shouldn't have Done Line 115: private: > nit: add blank line before 'private:' Done http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: Line 453: Status s = col_iters_[i]->FinishBatch(); > nit: why WARNING, not ERROR? I don't think not preparing the column would be a fatal error (it wouldn't cause a crash), so WARNING feels right. Other areas in this code use LOG(WARNING) as well. http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/tablet/delta_store.h File src/kudu/tablet/delta_store.h: Line 176: // Returns true if there might exist deltas to be applied. It is safe to > typo: "there there" Done http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/tablet/deltafile.cc File src/kudu/tablet/deltafile.cc: Line 821: // TODO: change the API to take in the col_to_apply and check for deltas on > I think this could be better done by: By the time this is called on the scan path, PrepareBatch() should indeed have already been called, although it would be good to do this check anyway. DCHECK(prepared_) should suffice for the third bullet, since prepared_=true implies that PrepareBatch() has just been called, so the blocks should be popped to the point where we're within the current prepared batch. As I understand it, if all of the delta blocks are past the end of the decoder, or if there are no blocks, ApplyUpdates should be a no-op. http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/tablet/deltamemstore.cc File src/kudu/tablet/deltamemstore.cc: Line 368: return true; > Consider using empty() method instead. Done Line 372: return true; > Consider using empty() method instead of size(). Done http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/tablet/memrowset.cc File src/kudu/tablet/memrowset.cc: Line 474: > Why is it necessary to have this alias? It was there because there is a divergence in the scan-path whether or not you want the decoder-level evaluation or not. I am working on merging these paths so only one is needed (and PredPushedNextBlock will be taken out in favor of NextBlock with some flag) http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/tablet/tablet-decoder-eval-test.cc File src/kudu/tablet/tablet-decoder-eval-test.cc: Line 1: // Licensed to the Apache Software Foundation (ASF) under one > nit: license header Done PS4, Line 92: > Is this passed by a reference and then modified by the function? If yes, t Done PS4, Line 100: : void FillTestTablet(size_t cardinality, size_t strlen) { : if (GetParam() == LARGE && !AllowSlowTests()) { : return; : } : size_t nrows = static_cast<size_t>(GetParam()); : RowBuilder rb(client_schema_); : : LocalTabletWriter writer(tablet().get(), &client_schema_); : KuduPartialRow row(&client_schema_); : // Fill tablet with repeated pattern of strings. : // e.g. Cardinality: 3, strlen: 2: "00", "01", "02", "00", "01", "02", ... : for (int64_t i = 0; i < nrows; i++) { : CHECK_OK(row.SetInt32(0, i)); : CHECK_OK(row.SetInt32(1, i * 10)); : CHECK_OK(row.SetStringCopy(2, StringPrintf( : Substitute("%0$0$1", strlen, PRId64).c_str(), : i%cardinality))); : : > Is it possible to re-factor this separating the common code into a class me Done http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/tablet/tablet-decoder-eval-test.cc File src/kudu/tablet/tablet-decoder-eval-test.cc: Line 35: #else Can be shortened. Line 44: Not used. PS6, Line 89: per) { > here and below, prefer to use stings::Substitute instead of StringPrintf. Looks like Substitute can't specify the a string length in formatting, but at least it can still get away from doing concat and to_string. Line 96: last_chunk_count = upper - lower; > remove? Done PS6, Line 128: > const probably isn't necessary? Done http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/tablet/tablet-test-util.h File src/kudu/tablet/tablet-test-util.h: PS4, Line 144: inline > Why is it necessary to have this function in-lined? Just because there isn This is used to benchmark the times, so to get the behavior of just the code in this function, it makes sense for this to be inlined. PS4, Line 145: int* fetched > Should be passed by pointer, if following google's style guide: https://goo Done http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/tablet/tablet-test-util.h File src/kudu/tablet/tablet-test-util.h: Line 146: int limit = INT_MAX) { > 'limit' is never used, right? No, removed. -- To view, visit http://gerrit.cloudera.org:8080/3990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <andrew.w...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert <d...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes