Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components ...................................................................... Patch Set 14: (25 comments) Please review the changes. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@345 PS12, Line 345: bool cache_seeked_value > nit: Even though this is calling StoreCurrentValue(), I think calling this Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@348 PS12, Line 348: const std::string& GetCurrentValue() const; > Add 'const' modifier for this method. Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@472 PS12, Line 472: // Value currently pointed to by validx_iter_. : std::string cur_val_; : > +1 Andrew, in case we have a single public function GetCurrentValue(), it would actually lead to different results because it uses CopyNextValues(size_t *n, ColumnDataView *dst) . CopyNextValues(..) fetches the next 'n' values from the block into 'dst' . Hence, the need to store this fetched value in 'cur_val_' . http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@473 PS10, Line 473: std::string cur_val_; : > This breaks encapsulation. We should document something along these lines i Listed this caveat in CFileSet::Iterator . Please let me know if it fine. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.cc@792 PS12, Line 792: DCHECK(!prepared_blocks_.empty()); > What happens if this does not hold in release build? Makes sense. Added return non-OK status (Status::NotFound) in this case. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h@213 PS12, Line 213: // prefix key refers to the first "num_prefix_cols" columns of the current key. : // current key is > nit: Reword as "If `cache_seeked_value` is true, the predicate column itera Rephrased as: "If 'cache_seeked_value' is true, the validx_iter_ will store the value seeked to." , since the iterator is a validx_iter_ . http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h@220 PS12, Line 220: Status SeekToRowWithCurPrefixMatchingPred(const gscoped_ptr<EncodedKey>& enc_key); : > nit: reword as "Build the key with the same prefix as `cur_enc_key`, that h Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@19 PS12, Line 19: #include <algorit > nit: please use '#include <cmath>' instead and place that among other C++-s Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@76 PS12, Line 76: > Per our discussion with Mike today, I thought the idea was to have this dis You are right. I assumed this will be disabled right before merging (in the final patch). Set this to 'false' now, as now we are testing both the scenarios in index_skipscan_test.cc http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@435 PS12, Line 435: > nit: you could use 'auto' here. Removed predicates var as per comment on L436. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@436 PS12, Line 436: spec.predi > nit: this can just be `spec.predicates()`, then there would be no need for Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@440 PS12, Line 440: // Get the column id from the predicate > nit: move this out of the loop, maybe up by L413 and then use it in definin Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@444 PS12, Line 444: col_id == > Could find_column() return -1 here? Yes, if the column is not found in the schema. Not quite sure whether to continue the loop in this case or just return with skip scan disabled. Any suggestion ? http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462 PS12, Line 462: : // Store the predicate column id. : skip_scan_predicate_column_id_ = min_col_id; : : // Store the predicate value. : skip_scan_predicate_value_ = pred_value; : : // Store the cutoff on the number of skip scan seeks. : > It was originally written as described, but I asked her to make the change, Alexey, please let me know if you feel that this change should be reverted. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@542 PS12, Line 542: > nit: stick the asterisk to the type of the parameter. Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@551 PS12, Line 551: atus CFileSet::Iterator::SeekToNextPrefixKey(size_t num_prefix_cols, bool cache_seeked_value) { : gscoped_ptr<EncodedKey> enc_key; : Arena arena(kEncodedCompositeKeyMaxSize); > Hmm.. This is doing a lot of heap-allocating and gets called _very often_. Noted. I will have to look more into this. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@561 PS12, Line 561: : if (cache_seeked_value) { : // Set the predicate column to the predicate value in case we can find a : // predicate match in one search. As a side effect, GetKeyWithPredicateVal() : // sets minimum values on the columns after the predicate value, which is : // required for correctness here. : KuduPartialRow partial_row(&(base_data_->tablet_schema())); : enc_key = GetKeyWithPredicateVal(&partial_row, enc_key); : } : return key_iter_->SeekAtOrAfter(*enc_key, : /* cache_seeked_value= */ cache_seeked_value, : /* exact_match= */ nullptr); : } : : Status CFileSet::Iterator::SeekToR > I think this would be clearer as Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@583 PS12, Line 583: // If we got this far, the current key doesn't match the predicate, so search : // for the next key that matches the current prefix and predicate. : KuduPartialRow partial_row(&(base_data_->tablet_schema())); : gscoped_ptr<EncodedKey> key_with_pred_value = : > Hmm.. IMO SeekToNext* should always move forward at least one row, unless w Yes, you are right. Renamed this function to SeekToRowWithCurPrefixMatchingPred(). http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@662 PS12, Line 662: / If this matches, this is the lower bound of our desired scan. : // 3. If we found our desired lower bound, find an upper bound for the scan : // by searching for the next row key matching one value higher than the : // highest value that will match our predicate. : : skip_scan_upper_bound_idx_ = upper_bound_idx_; : size_t skip_scan_lower_bound_idx = cur_idx_; : : // Whether we found our lower bound key. : bool lower_bound_key_found = false; > Can you also label down below where 1., 2., and 3. are? Added these labels "Step 1..." , "Step 2..", "Step 3.." in the comments below. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@704 PS12, Line 704: > I don't think upper_bound_idx_ changes for the duration of a scan, so maybe Done. Added this as a member variable of CFileSet::Iterator and initialized it in CFileSet::Iterator::TryEnableSkipScan. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@721 PS12, Line 721: s = SeekToRowWithCurPrefixMatchingPred(next_prefix_key); > This decodes the current key, and then immediately after we call SeekToNext Makes sense. I have made this change now. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@92 PS12, Line 92: ( > what it was any other error but IsAlreadyPresent? Added CHECK_OK() in else block to handle other cases. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@270 PS12, Line 270: void ScanTablet(ScanSpec* spec, vector<string> > Does it make sense to add additional dimension to the test: whether the ski Thanks for the reference, Alexey. Done. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@280 PS12, Line 280: following set of tests > nit: you could use NO_FATALS() shortcut instead for better readability Thanks for this, Alexey. Done. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@321 PS12, Line 321: auto pred_p1 = ColumnPredicate::Equality(schema_.column(0), &value_p1); : auto pred_p2 = ColumnPredicate::Equality(schema_.column(1), &value_p2); : spec.AddPredicate(pred_p1); : spec.AddPredicate(pred_p2); : vector<string> results; : NO_FATALS(ScanTablet(&spec, &results, "Exact match on P1 and P2")); : EXPECT_EQ(1, results.size()); : } : break; : : c > nit: use the same layout of scopes and braces as in other cases (that's for Changed to a more uniform layout. Please let me know if this looks good. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 14 Gerrit-Owner: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Sat, 11 Aug 2018 08:14:07 +0000 Gerrit-HasComments: Yes