Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components ...................................................................... Patch Set 7: (18 comments) Thanks for the comments. Also, added the condition for disabling skip scan on the fly. Please review the changes. http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h@44 PS7, Line 44: class EncodedKey; > move this down to line 55 Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@75 PS7, Line 75: return_from_skip_scan > instead of whether to enable this, this flag should specify how many loops Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@514 PS7, Line 514: seek_to_upper_bound_key > How about: Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@516 PS7, Line 516: 1024 > according to https://kudu.apache.org/docs/known_issues.html#_primary_keys I Added this in cfile_reader.h (this being a common file for both cfile_set.cc and cfile_reader.cc where this variable will be accessed). http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@518 PS7, Line 518: // Convert current key slice to encoded key. : RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString( : base_data_->tablet_schema(), &arena, : Slice(key_iter_->GetCurrentValue()), &enc_key), "Invalid scan prefix key"); > This seems to be repeated in a couple of places. How about we factor this o Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@533 PS7, Line 533: // Set the predicate column with the predicate value. : // This is done to avoid overshooting the lower bound on the predicate value : // in the current scan range. : KuduPartialRow partial_row(&(base_data_->tablet_schema())); : gscoped_ptr<EncodedKey> key_with_pred_value = : GetKeyWithPredicateVal(&partial_row, enc_key.Pass()); : : return key_iter_->SeekAtOrAfter(*key_with_pred_value, &exact_match, /* set_current_value= */ true); > Like I noted above, I think this part should be factored into its own funct Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@579 PS7, Line 579: bool CFileSet::Iterator::CheckKeyMatch(const std::vector<const void *> &raw_keys, int start_col_id, int end_col_id) { : // Encode the key currently pointed to by validx_iter_. : Arena arena2(1024); : gscoped_ptr<EncodedKey> cur_enc_key; : EncodedKey::DecodeEncodedString( : base_data_->tablet_schema(), &arena2, : Slice(key_iter_->GetCurrentValue()), &cur_enc_key); : : for (int col_id = start_col_id; col_id <= end_col_id; col_id++) { : if (base_data_->tablet_schema().column(col_id).Stringify(raw_keys[col_id]) != : base_data_->tablet_schema().column(col_id).Stringify(cur_enc_key->raw_keys()[col_id])) { : return false; : } : } : return true; : } > Instead of checking that the key matches, why not just check that the predi Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@596 PS7, Line 596: // This is a three seek approach for index skip scan implementation: : // 1. Place the validx_iter_ at the entry containing the next : // unique "prefix_key" value. : // 2. Place the iterator at or after the entry formed by the "prefix_key" : // found in 1. and the predicate value. : // 3. If the seek in 2. results in exact match with the predicate value, : // store the row id that represents the last relevant entry (upper bound) wrt the : // current "prefix_key"(found in 1.) > move this inside the function to line 605, since it describes what we are d Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@609 PS7, Line 609: bool lower_bound_key_found; : bool predicate_match = false; > can we merge these two variables into the single variable lower_bound_key_f Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@614 PS7, Line 614: // Whether to seek for the next unique prefix, this flag is false : // when the prefix key gets incremented while seeking for the lower bound : // entry in the loop. : bool continue_seeking_next_prefix = true; > I think it would be more descriptive of the reasoning behind the program lo Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@621 PS7, Line 621: while (!predicate_match) { > how about: Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@678 PS7, Line 678: > add: CHECK_OK(s) to ensure we didn't get something unexpected in 's' Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@681 PS7, Line 681: CheckKeyMatch > Why do we care if we have an exact key match? Don't we only care if we have Replaced with CheckPredicateMatch() http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@684 PS7, Line 684: // We didn't find an exact match for our lower bound, so re-seek. : if (!predicate_match) { : // If the prefix key has changed already, do not seek for the next prefix : // in the next iteration. : if ( !CheckKeyMatch(key_with_pred_value->raw_keys(), : 0, skip_scan_predicate_column_id_ - 1)) { : continue_seeking_next_prefix = false; : } : continue; : } > I think this could be equivalently written as: Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@703 PS7, Line 703: // We add 1 here because we want to increment the value of a prefix that : // includes even our predicate column. : // Note: After this step, the current value pointed by validx_iter_ is an : // exclusive upper bound on our scan range, therefore, key_iter->cur_val_ is not updated. : // to avoid skipping over this current "prefix_key" in the next iteration. : s = SeekToNextPrefixKey(skip_scan_predicate_column_id_+1, /* seek_to_upper_bound_key */ true); > I think it would be easier to follow the logic if this said something more Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@723 PS7, Line 723: continue > remove (doesn't do anything at the end of the loop) Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@730 PS7, Line 730: TODO > change this to TODO(anupama) Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@749 PS7, Line 749: (static_cast<int>(cur_idx_) >= skip_scan_upper_bound_idx_) > Move the check for this boundary condition inside SkipToNextScan() to encap Done -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 7 Gerrit-Owner: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 06 Aug 2018 05:54:28 +0000 Gerrit-HasComments: Yes