[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds

2022-06-30 Thread Yingchun Lai (Code Review)
Yingchun Lai has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18434 )

Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK 
bounds
..

KUDU-1644: Simplify InList predicate values based on rowset PK bounds

Previous we only optimize InList predicates based on tablet PK bounds, we can
also optimize it at the DRS level. By adding the implicit PK bounds, InList
predicate can be simplified. Also, the DRS bounds info can be used to skip rows
effectively when we have a predicate on a non-prefix of the primary key and the
leading column(s) have cardinality=1 (as described in KUDU-1291).

Benchmark tests result(in slow mode):
before
Selected 1 rows cost 2.519996 seconds. # PredicateOnFirstColumn
Selected 100 rows cost 2.040003 seconds. # PredicateOnSecondColumn
after
Selected 1 rows cost 1.771755 seconds. # PredicateOnFirstColumn
Selected 100 rows cost 0.131996 seconds. # PredicateOnSecondColumn

Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Reviewed-on: http://gerrit.cloudera.org:8080/18434
Tested-by: Kudu Jenkins
Reviewed-by: Yingchun Lai 
---
M src/kudu/common/generic_iterators.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
6 files changed, 247 insertions(+), 39 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Yingchun Lai: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/18434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Gerrit-Change-Number: 18434
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Yingchun Lai 


[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds

2022-06-30 Thread Yingchun Lai (Code Review)
Yingchun Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18434 )

Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK 
bounds
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Gerrit-Change-Number: 18434
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Yingchun Lai 
Gerrit-Comment-Date: Thu, 30 Jun 2022 12:09:37 +
Gerrit-HasComments: No


[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds

2022-06-24 Thread Yingchun Lai (Code Review)
Yingchun Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18434 )

Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK 
bounds
..


Patch Set 5: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Gerrit-Change-Number: 18434
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Yingchun Lai 
Gerrit-Comment-Date: Fri, 24 Jun 2022 10:58:15 +
Gerrit-HasComments: No


[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds

2022-05-06 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18434 )

Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK 
bounds
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG@11
PS2, Line 11: Also, the DRS bounds info can be used to skip rows
: effectively when we have a predicate on a non-prefix of the 
primary key and the
: leading column(s) have cardinality=1 (as described in KUDU-1291).
> I see, so it's more that for rowsets with leading column cardinality = 1, t
It can be used to range predicates too. But this optimization may not improve 
the scan performance for range predicates because it can neither simplify the 
predicate evaluation nor skip rows.



--
To view, visit http://gerrit.cloudera.org:8080/18434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Gerrit-Change-Number: 18434
Gerrit-PatchSet: 3
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Sat, 07 May 2022 05:57:25 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds

2022-05-06 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18434 )

Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK 
bounds
..


Patch Set 2: Code-Review+1

(2 comments)

Would be great if you could rebase; I'm not sure what test failures there were 
here.

http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG@11
PS2, Line 11: Also, the DRS bounds info can be used to skip rows
: effectively when we have a predicate on a non-prefix of the 
primary key and the
: leading column(s) have cardinality=1 (as described in KUDU-1291).
> Yes, we can use the bound information to seek where we start to scan in a r
I see, so it's more that for rowsets with leading column cardinality = 1, the 
the per-rowset ScanSpec optimization reduces to a skip scan.

Sounds good. Doesn't this optimization also extend beyond InList predicates? 
Since it's using lower and uppser bounds, wouldn't it apply to range predicates 
as well?


http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc@439
PS2, Line 439: spec->SetLowerBoundKey(implicit_lb_key);
> Because we create a ScanSpec copy for each CFileSet Iterator: https://githu
Ah, makes sense. Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/18434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Gerrit-Change-Number: 18434
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Sat, 07 May 2022 01:19:39 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds

2022-05-04 Thread Yifan Zhang (Code Review)
Yifan Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18434 )

Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK 
bounds
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG@11
PS2, Line 11: Also, the DRS bounds info can be used to skip rows
: effectively when we have a predicate on a non-prefix of the 
primary key and the
: leading column(s) have cardinality=1 (as described in KUDU-1291).
> Was this implemented in this patch? I'm not sure I saw it.
Yes, we can use the bound information to seek where we start to scan in a 
rowset. That makes us skipping rows or even a whole rowset.

If we have two columns (a, b) in a rowset, the rowset has bound (1, 1) and 
(1,10), and we have a scan request with a predicate b >= 2. After optimizing 
ScanSpec, we will get the bound (1, 2) and (1, 10), and then we can use these 
bounds to get lower_bound_idx and upper_bound_idx, so we can scan at the row 
(1, 2).


http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc@439
PS2, Line 439: spec->SetLowerBoundKey(implicit_lb_key);
> I'm confused about why we're able to modify the ScanSpec for every CFileSet
Because we create a ScanSpec copy for each CFileSet Iterator: 
https://github.com/apache/kudu/blob/master/src/kudu/common/generic_iterators.cc#L1026



--
To view, visit http://gerrit.cloudera.org:8080/18434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Gerrit-Change-Number: 18434
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Wed, 04 May 2022 11:24:10 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds

2022-05-04 Thread Yifan Zhang (Code Review)
Yifan Zhang has removed Tidy Bot from this change.  ( 
http://gerrit.cloudera.org:8080/18434 )

Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK 
bounds
..


Removed reviewer Tidy Bot.
--
To view, visit http://gerrit.cloudera.org:8080/18434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Gerrit-Change-Number: 18434
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang 


[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds

2022-05-03 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18434 )

Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK 
bounds
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18434/2//COMMIT_MSG@11
PS2, Line 11: Also, the DRS bounds info can be used to skip rows
: effectively when we have a predicate on a non-prefix of the 
primary key and the
: leading column(s) have cardinality=1 (as described in KUDU-1291).
Was this implemented in this patch? I'm not sure I saw it.


http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/18434/2/src/kudu/tablet/cfile_set.cc@439
PS2, Line 439: spec->SetLowerBoundKey(implicit_lb_key);
I'm confused about why we're able to modify the ScanSpec for every CFileSet. 
The ScanSpec is shared for a single tablet scan, isn't it? Won't modifying it 
for one CFileSet interfere with the iteration of another CFileSet?



--
To view, visit http://gerrit.cloudera.org:8080/18434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Gerrit-Change-Number: 18434
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 03 May 2022 19:07:46 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds

2022-04-20 Thread Yifan Zhang (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18434

to look at the new patch set (#2).

Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK 
bounds
..

KUDU-1644: Simplify InList predicate values based on rowset PK bounds

Previous we only optimize InList predicates based on tablet PK bounds, we can
also optimize it at the DRS level. By adding the implicit PK bounds, InList
predicate can be simplified. Also, the DRS bounds info can be used to skip rows
effectively when we have a predicate on a non-prefix of the primary key and the
leading column(s) have cardinality=1 (as described in KUDU-1291).

Benchmark tests result(in slow mode):
before
Selected 1 rows cost 2.519996 seconds. # PredicateOnFirstColumn
Selected 100 rows cost 2.040003 seconds. # PredicateOnSecondColumn
after
Selected 1 rows cost 1.771755 seconds. # PredicateOnFirstColumn
Selected 100 rows cost 0.131996 seconds. # PredicateOnSecondColumn

Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
---
M src/kudu/common/generic_iterators.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
6 files changed, 247 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/18434/2
--
To view, visit http://gerrit.cloudera.org:8080/18434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Gerrit-Change-Number: 18434
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-1644: Simplify InList predicate values based on rowset PK bounds

2022-04-20 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18434


Change subject: KUDU-1644: Simplify InList predicate values based on rowset PK 
bounds
..

KUDU-1644: Simplify InList predicate values based on rowset PK bounds

Previous we only optimize InList predicates based on tablet PK bounds, we can
also optimize it at the DRS level. By adding the implicit PK bounds, InList
predicate can be simplified. Also, the DRS bounds info can be used to skip rows
effectively when we have a predicate on a non-prefix of the primary key and the
leading column(s) have cardinality=1 (as described in KUDU-1291).

Benchmark tests result(in slow mode):
before
Selected 1 rows cost 2.41999 seconds. # PredicateOnFirstColumn
Selected 100 rows cost 1.847989 seconds. # PredicateOnSecondColumn
after
Selected 1 rows cost 1.807751 seconds. # PredicateOnFirstColumn
Selected 100 rows cost 0.108024 seconds. # PredicateOnSecondColumn

Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
---
M src/kudu/common/generic_iterators.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
6 files changed, 236 insertions(+), 36 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/18434/1
--
To view, visit http://gerrit.cloudera.org:8080/18434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Gerrit-Change-Number: 18434
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang