Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components ...................................................................... Patch Set 17: (5 comments) http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG@19 PS16, Line 19: till nit: until http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h@270 PS16, Line 270: prefix key refers to the first "num_prefix_cols" columns of the current key. : // current key is the key currently pointed to by validx_iter_ (prior to calling : // this function). nit: these are no longer needed now that we have the definitions up front http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@675 PS17, Line 675: the primary key (PK) value index (ad-hoc index) search pointer, which gives us : // a forward index of value to position, and : // the scan pointer, which is cur_idx_, representing an offset one more than : // the last row that we actually scanned (in the previous batch). nit: use " - " to bullet these two points http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@803 PS17, Line 803: // Note: To handle a similar situation (as illustrated with Tables above) when finding the : // upper bound key offset, we follow a different approach. We simply do not cache : // the seeked value for the upper bound key, hence cache_seeked_value = false below. Hmm, not only are we not caching the seeked value, but we're searching for a different value than if `cache_seeked_value` were true, right? I know it's documented in SeekToNextPrefixKey(), but I think it's worth explaining here, since AFAICT this is the only place we're using `cache_seeked_value=false`. The comment in SeekToNextPrefixKey() explains this is "required for correctness," but here is an appropriate place to explain why. http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc@589 PS17, Line 589: const int num_matching = GenerateData(random, num_rows, predicate_col_id, predicate_val); : Since this is generating random data, and running a scan for equality on a single value, wouldn't it be pretty unlikely that `num_matching` is non-zero? Maybe we could inflate the probability that `predicate_val` shows up by adding a `inject_predicate_val_probability` parameter to GenerateData() that will inject `predicate_val` to the `predicate_col_id` column with some probability. Looking at GenerateData(), it seems like generated data is capped between [0, 1000), so I don't think this will ever return rows? Same kMiddleColumnMinKeyValue. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 17 Gerrit-Owner: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Fri, 17 Aug 2018 22:13:33 +0000 Gerrit-HasComments: Yes