[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Yingchun Lai has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18434 ) Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK bounds .. KUDU-1644: Simplify InList predicate values based on rowset PK bounds Previous we only optimize InList predicates based on tablet PK bounds, we can also optimize it at the DRS level. By adding the implicit PK bounds, InList predicate can be simplified. Also, the DRS bounds info can be used to skip rows effectively when we have a predicate on a non-prefix of the primary key and the leading column(s) have cardinality=1 (as described in KUDU-1291). Benchmark tests result(in slow mode): before Selected 1 rows cost 2.519996 seconds. # PredicateOnFirstColumn Selected 100 rows cost 2.040003 seconds. # PredicateOnSecondColumn after Selected 1 rows cost 1.771755 seconds. # PredicateOnFirstColumn Selected 100 rows cost 0.131996 seconds. # PredicateOnSecondColumn Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Reviewed-on: http://gerrit.cloudera.org:8080/18434 Tested-by: Kudu Jenkins Reviewed-by: Yingchun Lai --- M src/kudu/common/generic_iterators.cc M src/kudu/common/scan_spec.cc M src/kudu/common/scan_spec.h M src/kudu/tablet/cfile_set-test.cc M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h 6 files changed, 247 insertions(+), 39 deletions(-) Approvals: Kudu Jenkins: Verified Yingchun Lai: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/18434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Gerrit-Change-Number: 18434 Gerrit-PatchSet: 6 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai
[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18434 ) Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK bounds .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Gerrit-Change-Number: 18434 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Thu, 30 Jun 2022 12:09:37 + Gerrit-HasComments: No
[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18434 ) Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK bounds .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Gerrit-Change-Number: 18434 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Fri, 24 Jun 2022 10:58:15 + Gerrit-HasComments: No
[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18434 ) Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK bounds .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG@11 PS2, Line 11: Also, the DRS bounds info can be used to skip rows : effectively when we have a predicate on a non-prefix of the primary key and the : leading column(s) have cardinality=1 (as described in KUDU-1291). > I see, so it's more that for rowsets with leading column cardinality = 1, t It can be used to range predicates too. But this optimization may not improve the scan performance for range predicates because it can neither simplify the predicate evaluation nor skip rows. -- To view, visit http://gerrit.cloudera.org:8080/18434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Gerrit-Change-Number: 18434 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Sat, 07 May 2022 05:57:25 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/18434 ) Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK bounds .. Patch Set 2: Code-Review+1 (2 comments) Would be great if you could rebase; I'm not sure what test failures there were here. http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG@11 PS2, Line 11: Also, the DRS bounds info can be used to skip rows : effectively when we have a predicate on a non-prefix of the primary key and the : leading column(s) have cardinality=1 (as described in KUDU-1291). > Yes, we can use the bound information to seek where we start to scan in a r I see, so it's more that for rowsets with leading column cardinality = 1, the the per-rowset ScanSpec optimization reduces to a skip scan. Sounds good. Doesn't this optimization also extend beyond InList predicates? Since it's using lower and uppser bounds, wouldn't it apply to range predicates as well? http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc@439 PS2, Line 439: spec->SetLowerBoundKey(implicit_lb_key); > Because we create a ScanSpec copy for each CFileSet Iterator: https://githu Ah, makes sense. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/18434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Gerrit-Change-Number: 18434 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Sat, 07 May 2022 01:19:39 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18434 ) Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK bounds .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG@11 PS2, Line 11: Also, the DRS bounds info can be used to skip rows : effectively when we have a predicate on a non-prefix of the primary key and the : leading column(s) have cardinality=1 (as described in KUDU-1291). > Was this implemented in this patch? I'm not sure I saw it. Yes, we can use the bound information to seek where we start to scan in a rowset. That makes us skipping rows or even a whole rowset. If we have two columns (a, b) in a rowset, the rowset has bound (1, 1) and (1,10), and we have a scan request with a predicate b >= 2. After optimizing ScanSpec, we will get the bound (1, 2) and (1, 10), and then we can use these bounds to get lower_bound_idx and upper_bound_idx, so we can scan at the row (1, 2). http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc@439 PS2, Line 439: spec->SetLowerBoundKey(implicit_lb_key); > I'm confused about why we're able to modify the ScanSpec for every CFileSet Because we create a ScanSpec copy for each CFileSet Iterator: https://github.com/apache/kudu/blob/master/src/kudu/common/generic_iterators.cc#L1026 -- To view, visit http://gerrit.cloudera.org:8080/18434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Gerrit-Change-Number: 18434 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Wed, 04 May 2022 11:24:10 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Yifan Zhang has removed Tidy Bot from this change. ( http://gerrit.cloudera.org:8080/18434 ) Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK bounds .. Removed reviewer Tidy Bot. -- To view, visit http://gerrit.cloudera.org:8080/18434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Gerrit-Change-Number: 18434 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yifan Zhang
[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/18434 ) Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK bounds .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG@11 PS2, Line 11: Also, the DRS bounds info can be used to skip rows : effectively when we have a predicate on a non-prefix of the primary key and the : leading column(s) have cardinality=1 (as described in KUDU-1291). Was this implemented in this patch? I'm not sure I saw it. http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc@439 PS2, Line 439: spec->SetLowerBoundKey(implicit_lb_key); I'm confused about why we're able to modify the ScanSpec for every CFileSet. The ScanSpec is shared for a single tablet scan, isn't it? Won't modifying it for one CFileSet interfere with the iteration of another CFileSet? -- To view, visit http://gerrit.cloudera.org:8080/18434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Gerrit-Change-Number: 18434 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 03 May 2022 19:07:46 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18434 to look at the new patch set (#2). Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK bounds .. KUDU-1644: Simplify InList predicate values based on rowset PK bounds Previous we only optimize InList predicates based on tablet PK bounds, we can also optimize it at the DRS level. By adding the implicit PK bounds, InList predicate can be simplified. Also, the DRS bounds info can be used to skip rows effectively when we have a predicate on a non-prefix of the primary key and the leading column(s) have cardinality=1 (as described in KUDU-1291). Benchmark tests result(in slow mode): before Selected 1 rows cost 2.519996 seconds. # PredicateOnFirstColumn Selected 100 rows cost 2.040003 seconds. # PredicateOnSecondColumn after Selected 1 rows cost 1.771755 seconds. # PredicateOnFirstColumn Selected 100 rows cost 0.131996 seconds. # PredicateOnSecondColumn Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 --- M src/kudu/common/generic_iterators.cc M src/kudu/common/scan_spec.cc M src/kudu/common/scan_spec.h M src/kudu/tablet/cfile_set-test.cc M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h 6 files changed, 247 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/18434/2 -- To view, visit http://gerrit.cloudera.org:8080/18434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Gerrit-Change-Number: 18434 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18434 Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK bounds .. KUDU-1644: Simplify InList predicate values based on rowset PK bounds Previous we only optimize InList predicates based on tablet PK bounds, we can also optimize it at the DRS level. By adding the implicit PK bounds, InList predicate can be simplified. Also, the DRS bounds info can be used to skip rows effectively when we have a predicate on a non-prefix of the primary key and the leading column(s) have cardinality=1 (as described in KUDU-1291). Benchmark tests result(in slow mode): before Selected 1 rows cost 2.41999 seconds. # PredicateOnFirstColumn Selected 100 rows cost 1.847989 seconds. # PredicateOnSecondColumn after Selected 1 rows cost 1.807751 seconds. # PredicateOnFirstColumn Selected 100 rows cost 0.108024 seconds. # PredicateOnSecondColumn Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 --- M src/kudu/common/generic_iterators.cc M src/kudu/common/scan_spec.cc M src/kudu/common/scan_spec.h M src/kudu/tablet/cfile_set-test.cc M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h 6 files changed, 236 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/18434/1 -- To view, visit http://gerrit.cloudera.org:8080/18434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80 Gerrit-Change-Number: 18434 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang