Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
......................................................................


Patch Set 18:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG@19
PS16, Line 19: unti
> nit: until
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@238
PS17, Line 238:
> nit: prefer using // instead of /**/
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@244
PS17, Line 244: k to the appropriate range of rows and
> nit: how about "leftmost, non-leading primary key column with a predicate"
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@245
PS17, Line 245: . The only exception is when a scan is pr
> nit: how about something like "Value that the predicate column is predicate
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@246
PS17, Line 246: columns of a composite N-column primary key, where K < N,
> nit: how about "The collective columns to the left of the predicate column"
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@247
PS17, Line 247: r al
> nit: extra space, also maybe use : to separate these?
I switched to new lines for the 'definition' parts.


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@270
PS17, Line 270:
              :   //   "prefix key"
              :   //     Any value i
> nit: now that we have some definitions above, we don't need to redefine the
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@276
PS17, Line 276: n order to keep track of the number of rows that satisfy the
              :   //   "predicate value" wrt a distinct prefix key we first 
seek to the first
              :   //   such row in the index column and this corresponding 
value is called
              :   //   "lower bound" key. Next, we seek to the row that is 
right after the last
              :   //   row that satisfies the "predicate value" wrt to the 
distinct prefix key,
              :   //   and this corresponding value is calle
> I think this is getting a bit too deep into the implementation details (in
Yep, I think that's a very good suggestion.

I think I'll just tuck that into the .cc file.  We can update the signature of 
those functions and re-structure those to have better semantics is a separate 
patch.


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@282
PS17, Line 282:   //
> nit: extra line
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@285
PS17, Line 285:   // By the definition (see above), the skip scan optimization i
> nit: "Seek to the next row that matches the predicate and has the same pref
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@333
PS17, Line 333: 'remaining' will always be at least 1, although it is a TODO to 
allow it
              :   // to be 0 (0 violates CHECK conditions elsewhere in the scan 
code).
              :   //
              :   // Currently, skip scan will be dynamically disabled when the 
number of seeks
              :   // for distinct prefix keys exceeds sqrt(#total rows). We use 
sqrt(#total rows)
              :   // as a cutoff because based on performance tests on upto 10M 
rows per tablet,
> nit: standardize the bullet types
Done


http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h@270
PS16, Line 270:
              :   //   "prefix key"
              :   //     Any value i
> nit: these are no longer needed now that we have the definitions up front
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@675
PS17, Line 675: 2. Read that distinct prefix from the ad-hoc index, append the 
predicate value
              :   //    and minimum possible value for all other columns, and 
seek to that.
              :   //    If this matches, this is the lower bound of our desired 
scan.
              :   // 3. If we found our desired lower bound, find an upper 
bound fo
> nit: use " - " to bullet these two points
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@803
PS17, Line 803:   continue;
              :     }
              :
> Hmm, not only are we not caching the seeked value, but we're searching for
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc@589
PS17, Line 589: int num_matching = GenerateData(random, num_rows, 
predicate_col_id, predicate_val);
              :
> Since this is generating random data, and running a scan for equality on a
Right, the original idea was to generate almost the same order of matching rows 
as total rows (i.e. all or every N-th row would match).

Fixed.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 18
Gerrit-Owner: Anupama Gupta <anupama.gu...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <anupama.gu...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 18 Aug 2018 04:24:55 +0000
Gerrit-HasComments: Yes

Reply via email to