Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: KUDU-1291: efficiently support predicates on non-prefix key components ...................................................................... Patch Set 18: (15 comments) http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG@19 PS16, Line 19: unti > nit: until Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@238 PS17, Line 238: > nit: prefer using // instead of /**/ Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@244 PS17, Line 244: k to the appropriate range of rows and > nit: how about "leftmost, non-leading primary key column with a predicate" Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@245 PS17, Line 245: . The only exception is when a scan is pr > nit: how about something like "Value that the predicate column is predicate Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@246 PS17, Line 246: columns of a composite N-column primary key, where K < N, > nit: how about "The collective columns to the left of the predicate column" Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@247 PS17, Line 247: r al > nit: extra space, also maybe use : to separate these? I switched to new lines for the 'definition' parts. http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@270 PS17, Line 270: : // "prefix key" : // Any value i > nit: now that we have some definitions above, we don't need to redefine the Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@276 PS17, Line 276: n order to keep track of the number of rows that satisfy the : // "predicate value" wrt a distinct prefix key we first seek to the first : // such row in the index column and this corresponding value is called : // "lower bound" key. Next, we seek to the row that is right after the last : // row that satisfies the "predicate value" wrt to the distinct prefix key, : // and this corresponding value is calle > I think this is getting a bit too deep into the implementation details (in Yep, I think that's a very good suggestion. I think I'll just tuck that into the .cc file. We can update the signature of those functions and re-structure those to have better semantics is a separate patch. http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@282 PS17, Line 282: // > nit: extra line Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@285 PS17, Line 285: // By the definition (see above), the skip scan optimization i > nit: "Seek to the next row that matches the predicate and has the same pref Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@333 PS17, Line 333: 'remaining' will always be at least 1, although it is a TODO to allow it : // to be 0 (0 violates CHECK conditions elsewhere in the scan code). : // : // Currently, skip scan will be dynamically disabled when the number of seeks : // for distinct prefix keys exceeds sqrt(#total rows). We use sqrt(#total rows) : // as a cutoff because based on performance tests on upto 10M rows per tablet, > nit: standardize the bullet types Done http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h@270 PS16, Line 270: : // "prefix key" : // Any value i > nit: these are no longer needed now that we have the definitions up front Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@675 PS17, Line 675: 2. Read that distinct prefix from the ad-hoc index, append the predicate value : // and minimum possible value for all other columns, and seek to that. : // If this matches, this is the lower bound of our desired scan. : // 3. If we found our desired lower bound, find an upper bound fo > nit: use " - " to bullet these two points Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@803 PS17, Line 803: continue; : } : > Hmm, not only are we not caching the seeked value, but we're searching for Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc@589 PS17, Line 589: int num_matching = GenerateData(random, num_rows, predicate_col_id, predicate_val); : > Since this is generating random data, and running a scan for equality on a Right, the original idea was to generate almost the same order of matching rows as total rows (i.e. all or every N-th row would match). Fixed. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 18 Gerrit-Owner: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Sat, 18 Aug 2018 04:24:55 +0000 Gerrit-HasComments: Yes