Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
......................................................................


Patch Set 7:

(18 comments)

Thanks for the comments. Also, added the condition for disabling skip scan on 
the fly. Please review the changes.

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h@44
PS7, Line 44: class EncodedKey;
> move this down to line 55
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@75
PS7, Line 75: return_from_skip_scan
> instead of whether to enable this, this flag should specify how many loops
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@514
PS7, Line 514: seek_to_upper_bound_key
> How about:
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@516
PS7, Line 516: 1024
> according to https://kudu.apache.org/docs/known_issues.html#_primary_keys I
Added this in cfile_reader.h (this being a common file for both cfile_set.cc 
and cfile_reader.cc where this variable will be accessed).


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@518
PS7, Line 518:   // Convert current key slice to encoded key.
             :   RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString(
             :       base_data_->tablet_schema(), &arena,
             :       Slice(key_iter_->GetCurrentValue()), &enc_key), "Invalid 
scan prefix key");
> This seems to be repeated in a couple of places. How about we factor this o
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@533
PS7, Line 533:   // Set the predicate column with the predicate value.
             :   // This is done to avoid overshooting the lower bound on the 
predicate value
             :   // in the current scan range.
             :   KuduPartialRow partial_row(&(base_data_->tablet_schema()));
             :   gscoped_ptr<EncodedKey> key_with_pred_value =
             :       GetKeyWithPredicateVal(&partial_row, enc_key.Pass());
             :
             :   return key_iter_->SeekAtOrAfter(*key_with_pred_value, 
&exact_match, /* set_current_value= */ true);
> Like I noted above, I think this part should be factored into its own funct
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@579
PS7, Line 579: bool CFileSet::Iterator::CheckKeyMatch(const std::vector<const 
void *> &raw_keys, int start_col_id, int end_col_id) {
             :   // Encode the key currently pointed to by validx_iter_.
             :   Arena arena2(1024);
             :   gscoped_ptr<EncodedKey> cur_enc_key;
             :   EncodedKey::DecodeEncodedString(
             :       base_data_->tablet_schema(), &arena2,
             :       Slice(key_iter_->GetCurrentValue()), &cur_enc_key);
             :
             :   for (int col_id = start_col_id; col_id <= end_col_id; 
col_id++) {
             :       if 
(base_data_->tablet_schema().column(col_id).Stringify(raw_keys[col_id]) !=
             :          
base_data_->tablet_schema().column(col_id).Stringify(cur_enc_key->raw_keys()[col_id]))
 {
             :         return false;
             :       }
             :   }
             :   return true;
             : }
> Instead of checking that the key matches, why not just check that the predi
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@596
PS7, Line 596: // This is a three seek approach for index skip scan 
implementation:
             : // 1. Place the validx_iter_ at the entry containing the next
             : //    unique "prefix_key" value.
             : // 2. Place the iterator at or after the entry formed by the 
"prefix_key"
             : //    found in 1. and the predicate value.
             : // 3. If the seek in 2. results in exact match with the 
predicate value,
             : //    store the row id that represents the last relevant entry 
(upper bound) wrt the
             : //    current "prefix_key"(found in 1.)
> move this inside the function to line 605, since it describes what we are d
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@609
PS7, Line 609:   bool lower_bound_key_found;
             :   bool predicate_match = false;
> can we merge these two variables into the single variable lower_bound_key_f
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@614
PS7, Line 614:   // Whether to seek for the next unique prefix, this flag is 
false
             :   // when the prefix key gets incremented while seeking for the 
lower bound
             :   // entry in the loop.
             :   bool continue_seeking_next_prefix = true;
> I think it would be more descriptive of the reasoning behind the program lo
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@621
PS7, Line 621:   while (!predicate_match) {
> how about:
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@678
PS7, Line 678:
> add: CHECK_OK(s) to ensure we didn't get something unexpected in 's'
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@681
PS7, Line 681: CheckKeyMatch
> Why do we care if we have an exact key match? Don't we only care if we have
Replaced with CheckPredicateMatch()


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@684
PS7, Line 684:     // We didn't find an exact match for our lower bound, so 
re-seek.
             :     if (!predicate_match)  {
             :       // If the prefix key has changed already, do not seek for 
the next prefix
             :       // in the next iteration.
             :       if (  !CheckKeyMatch(key_with_pred_value->raw_keys(),
             :                          0, skip_scan_predicate_column_id_ - 1)) 
{
             :         continue_seeking_next_prefix = false;
             :       }
             :       continue;
             :     }
> I think this could be equivalently written as:
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@703
PS7, Line 703:     // We add 1 here because we want to increment the value of a 
prefix that
             :     // includes even our predicate column.
             :     // Note: After this step, the current value pointed by 
validx_iter_ is an
             :     // exclusive upper bound on our scan range, therefore, 
key_iter->cur_val_ is not updated.
             :     // to avoid skipping over this current "prefix_key" in the 
next iteration.
             :     s = SeekToNextPrefixKey(skip_scan_predicate_column_id_+1, /* 
seek_to_upper_bound_key */ true);
> I think it would be easier to follow the logic if this said something more
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@723
PS7, Line 723: continue
> remove (doesn't do anything at the end of the loop)
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@730
PS7, Line 730: TODO
> change this to TODO(anupama)
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@749
PS7, Line 749: (static_cast<int>(cur_idx_) >= skip_scan_upper_bound_idx_)
> Move the check for this boundary condition inside SkipToNextScan() to encap
Done



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 7
Gerrit-Owner: Anupama Gupta <anupama.gu...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <anupama.gu...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 06 Aug 2018 05:54:28 +0000
Gerrit-HasComments: Yes

Reply via email to