Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
......................................................................


Patch Set 17:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG@19
PS16, Line 19: till
nit: until


http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h@270
PS16, Line 270: prefix key refers to the first "num_prefix_cols" columns of the 
current key.
              :   // current key is the key currently pointed to by 
validx_iter_ (prior to calling
              :   // this function).
nit: these are no longer needed now that we have the definitions up front


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@675
PS17, Line 675: the primary key (PK) value index (ad-hoc index) search pointer, 
which gives us
              :   // a forward index of value to position, and
              :   // the scan pointer, which is cur_idx_, representing an 
offset one more than
              :   // the last row that we actually scanned (in the previous 
batch).
nit: use " - " to bullet these two points


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@803
PS17, Line 803: // Note: To handle a similar situation (as illustrated with 
Tables above) when finding the
              :     // upper bound key offset, we follow a different approach. 
We simply do not cache
              :     // the seeked value for the upper bound key, hence 
cache_seeked_value = false below.
Hmm, not only are we not caching the seeked value, but we're searching for a 
different value than if `cache_seeked_value` were true, right? I know it's 
documented in SeekToNextPrefixKey(), but I think it's worth explaining here, 
since AFAICT this is the only place we're using `cache_seeked_value=false`. The 
comment in SeekToNextPrefixKey() explains this is "required for correctness," 
but here is an appropriate place to explain why.


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc@589
PS17, Line 589: const int num_matching = GenerateData(random, num_rows, 
predicate_col_id, predicate_val);
              :
Since this is generating random data, and running a scan for equality on a 
single value, wouldn't it be pretty unlikely that `num_matching` is non-zero? 
Maybe we could inflate the probability that `predicate_val` shows up by adding 
a `inject_predicate_val_probability` parameter to GenerateData() that will 
inject `predicate_val` to the `predicate_col_id` column with some probability.

Looking at GenerateData(), it seems like generated data is capped between [0, 
1000), so I don't think this will ever return rows? Same 
kMiddleColumnMinKeyValue.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 17
Gerrit-Owner: Anupama Gupta <anupama.gu...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <anupama.gu...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 17 Aug 2018 22:13:33 +0000
Gerrit-HasComments: Yes

Reply via email to