[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-21 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 13:

> Patch Set 13:
>
> > > Patch Set 12: Verified+1 Code-Review+2
>  > >
>  > > Looked at the failed build, the errors seem unrelated flaky tests
>  > (e.g. IMPALA-10754)
>  >
>  > According to 
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/15800/testReport/junit/org.apache.impala.planner/CardinalityTest/testAggregationNodeGroupByCardinalityCapping/,
>  > it seems this patch introduced some non-determinism.
>
> This shouldn't be related, as the failed test is COUNT DISTINCT related, so 
> it shouldn't be affected by changes in sort. + the test is about FE 
> estimations, and shouldn't be affected by BE only changes.
>
> One possibility I see is that the changed order of elements could have 
> affected the tables written during dataload - but this shouldn't be a new 
> thing, as the equal elements were never deterministic during sorting.

Thanks for the analysis Csaba!

Yeah I think you are right.

According to 
https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java#L42,
 we have {{CARDINALITY_TOLERANCE}} that is used to tune how tolerant we are 
with respect to a given cardinality estimate. The value is currently set to 
0.05. Maybe we could consider increaseing this a bit, e.g., to 0.07.


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 13
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 21 Feb 2022 22:41:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-21 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 13:

> > Patch Set 12: Verified+1 Code-Review+2
 > >
 > > Looked at the failed build, the errors seem unrelated flaky tests
 > (e.g. IMPALA-10754)
 >
 > According to 
 > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/15800/testReport/junit/org.apache.impala.planner/CardinalityTest/testAggregationNodeGroupByCardinalityCapping/,
 > it seems this patch introduced some non-determinism.

This shouldn't be related, as the failed test is COUNT DISTINCT related, so it 
shouldn't be affected by changes in sort. + the test is about FE estimations, 
and shouldn't be affected by BE only changes.

One possibility I see is that the changed order of elements could have affected 
the tables written during dataload - but this shouldn't be a new thing, as the 
equal elements were never deterministic during sorting.


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 13
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 21 Feb 2022 19:16:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-21 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 13:

> Patch Set 12: Verified+1 Code-Review+2
>
> Looked at the failed build, the errors seem unrelated flaky tests (e.g. 
> IMPALA-10754)

According to 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/15800/testReport/junit/org.apache.impala.planner/CardinalityTest/testAggregationNodeGroupByCardinalityCapping/,
 it seems this patch introduced some non-determinism.


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 13
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 21 Feb 2022 17:42:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-19 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..

IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

Based on a 3-way partitioning implementation by Kurt Deschler.
3-way quicksort performs much better on data with large number of
duplicates, but has a small regression in case of large NDV.
This adaptive implementation keeps the advantages of both 2-way
and 3-way quicksort. If duplicates are found during pivot selection
(among the 3 randomly selected candidates),the 3-way partitioning
function is called in SortHelper, otherwise partitioning goes 2-way.

Some benchmark results:
On a view created from 4 tpch_parquet lineitem tables
Full sort, 1 node, 1 run - no spills (only in-memory sort is changed)
Time of sorting adaptively during query execution compared to
the original implementation (sort node profile):

 
+--+++
 | Test | Original 2-way | Adaptive 
Quicksort |
 
+--+++
 | select * order by l_linestatus, NDV=2:   |  1 |  
 0.67 |
 | select l_shipmode order by l_shipmode, NDV=7 |  1 |  
 0.42 |
 | select * order by l_shipmode, NDV=7  |  1 |  
 0.57 |
 | large NDV, unique data   |  1 |  
1 | (no difference)
 
+--+++

Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Reviewed-on: http://gerrit.cloudera.org:8080/18184
Reviewed-by: Impala Public Jenkins 
Reviewed-by: Csaba Ringhofer 
Tested-by: Csaba Ringhofer 
---
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter-ir.cc
M be/src/runtime/sorter.cc
M be/src/util/tuple-row-compare.h
4 files changed, 189 insertions(+), 50 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved
  Csaba Ringhofer: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 13
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-19 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 12: Verified+1 Code-Review+2

Looked at the failed build, the errors seem unrelated flaky tests (e.g. 
IMPALA-10754)


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 12
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sat, 19 Feb 2022 18:57:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-19 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has removed a vote on this change.

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Removed Verified-1 by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 12
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 12: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7855/


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 12
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 18 Feb 2022 14:44:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10183/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 11
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 18 Feb 2022 08:35:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10182/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 10
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 18 Feb 2022 08:16:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7855/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 12
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 18 Feb 2022 08:11:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 12
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 18 Feb 2022 08:11:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 11
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 18 Feb 2022 08:11:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-18 Thread Noemi Pap-Takacs (Code Review)
Hello Kurt Deschler, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18184

to look at the new patch set (#11).

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..

IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

Based on a 3-way partitioning implementation by Kurt Deschler.
3-way quicksort performs much better on data with large number of
duplicates, but has a small regression in case of large NDV.
This adaptive implementation keeps the advantages of both 2-way
and 3-way quicksort. If duplicates are found during pivot selection
(among the 3 randomly selected candidates),the 3-way partitioning
function is called in SortHelper, otherwise partitioning goes 2-way.

Some benchmark results:
On a view created from 4 tpch_parquet lineitem tables
Full sort, 1 node, 1 run - no spills (only in-memory sort is changed)
Time of sorting adaptively during query execution compared to
the original implementation (sort node profile):

 
+--+++
 | Test | Original 2-way | Adaptive 
Quicksort |
 
+--+++
 | select * order by l_linestatus, NDV=2:   |  1 |  
 0.67 |
 | select l_shipmode order by l_shipmode, NDV=7 |  1 |  
 0.42 |
 | select * order by l_shipmode, NDV=7  |  1 |  
 0.57 |
 | large NDV, unique data   |  1 |  
1 | (no difference)
 
+--+++

Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
---
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter-ir.cc
M be/src/runtime/sorter.cc
M be/src/util/tuple-row-compare.h
4 files changed, 189 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/18184/11
--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 11
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18184/10/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

http://gerrit.cloudera.org:8080/#/c/18184/10/be/src/util/tuple-row-compare.h@138
PS10, Line 138: ordering_expr_evals_lhs_.data(), 
ordering_expr_evals_rhs_.data(), lhs, rhs);
nit: weird indentation



--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 10
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 18 Feb 2022 07:55:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-17 Thread Noemi Pap-Takacs (Code Review)
Hello Kurt Deschler, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18184

to look at the new patch set (#10).

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..

IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

Based on a 3-way partitioning implementation by Kurt Deschler.
3-way quicksort performs much better on data with large number of
duplicates, but has a small regression in case of large NDV.
This adaptive implementation keeps the advantages of both 2-way
and 3-way quicksort. If duplicates are found during pivot selection
(among the 3 randomly selected candidates),the 3-way partitioning
function is called in SortHelper, otherwise partitioning goes 2-way.

Some benchmark results:
On a view created from 4 tpch_parquet lineitem tables
Full sort, 1 node, 1 run - no spills (only in-memory sort is changed)
Time of sorting adaptively during query execution compared to
the original implementation (sort node profile):

 
+--+++
 | Test | Original 2-way | Adaptive 
Quicksort |
 
+--+++
 | select * order by l_linestatus, NDV=2:   |  1 |  
 0.67 |
 | select l_shipmode order by l_shipmode, NDV=7 |  1 |  
 0.42 |
 | select * order by l_shipmode, NDV=7  |  1 |  
 0.57 |
 | large NDV, unique data   |  1 |  
1 | (no difference)
 
+--+++

Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
---
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter-ir.cc
M be/src/runtime/sorter.cc
M be/src/util/tuple-row-compare.h
4 files changed, 184 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/18184/10
--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 10
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 9:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7842/


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 15 Feb 2022 14:20:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-15 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 9: -Code-Review

I couldn't deduce the cause of the failures - there was a crash in llvm code, 
not sure if it is related to this change.


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 15 Feb 2022 08:15:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7842/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 15 Feb 2022 07:52:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7840/


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 15 Feb 2022 00:12:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-14 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 19:06:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7840/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 17:38:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-14 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 8: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 15:34:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-11 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 8: Code-Review+1

(1 comment)

The change looks great!

http://gerrit.cloudera.org:8080/#/c/18184/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18184/7//COMMIT_MSG@23
PS7, Line 23:  
+--+++
:  | Test | Original 2-way 
| Adaptive Quicksort |
:  
+--+++
:  | select * order by l_linestatus, NDV=2:   |  1 
|   0.67 |
:  | select l_shipmode order by l_shipmode, NDV=7 |  1 
|   0.42 |
:  | select * order by l_shipmode, NDV=7  |  1 
|   0.57 |
:  | large NDV, unique data   |
> Thank you for the suggestion. Should I write the actual (average) execution
I'm OK with the ratio.



--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 11 Feb 2022 16:17:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 8: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 11 Feb 2022 15:22:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10142/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 11 Feb 2022 15:04:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-11 Thread Noemi Pap-Takacs (Code Review)
Hello Kurt Deschler, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18184

to look at the new patch set (#8).

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..

IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

Based on a 3-way partitioning implementation by Kurt Deschler.
3-way quicksort performs much better on data with large number of
duplicates, but has a small regression in case of large NDV.
This adaptive implementation keeps the advantages of both 2-way
and 3-way quicksort. If duplicates are found during pivot selection
(among the 3 randomly selected candidates),the 3-way partitioning
function is called in SortHelper, otherwise partitioning goes 2-way.

Some benchmark results:
On a view created from 4 tpch_parquet lineitem tables
Full sort, 1 node, 1 run - no spills (only in-memory sort is changed)
Time of sorting adaptively during query execution compared to
the original implementation (sort node profile):

 
+--+++
 | Test | Original 2-way | Adaptive 
Quicksort |
 
+--+++
 | select * order by l_linestatus, NDV=2:   |  1 |  
 0.67 |
 | select l_shipmode order by l_shipmode, NDV=7 |  1 |  
 0.42 |
 | select * order by l_shipmode, NDV=7  |  1 |  
 0.57 |
 | large NDV, unique data   |  1 |  
1 | (no difference)
 
+--+++

Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
---
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter-ir.cc
M be/src/runtime/sorter.cc
3 files changed, 182 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/18184/8
-- 
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-11 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/18184/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18184/7//COMMIT_MSG@9
PS7, Line 9: 3way
> nit: 3-way
Done


http://gerrit.cloudera.org:8080/#/c/18184/7//COMMIT_MSG@23
PS7, Line 23: select * order by l_linestatus, NDV=2:
: original 2-way vs adaptive quicksort - 1 : 0.67
: select l_shipmode order by l_shipmode, NDV=7:
: original 2-way vs adaptive quicksort - 1 : 0.42
: select * order by l_shipmode, NDV=7:
: original 2-way vs adaptive quicksort - 1 : 0.57
: large NDV, unique data: no significant difference
> Using https://ozh.github.io/ascii-tables/
Thank you for the suggestion. Should I write the actual (average) execution 
time in ms or is the ratio better to compare the numbers?


http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-internal.h
File be/src/runtime/sorter-internal.h:

http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-internal.h@533
PS7, Line 533: hasEquals
> nit: we name variables with underscores instead of camel case in the backen
Done


http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-ir.cc
File be/src/runtime/sorter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-ir.cc@157
PS7, Line 157: reinterpret_cast(_tuple)
> nit: Since it's being used at multiple places, maybe create a variable 'tem
Good idea, makes the code more readable. I just followed the original naming 
convention.  I think it was named temp_tuple instead of pivot to avoid 
confusion that it is not the actual pivot's tuple.
I renamed it in Partition2way() as well.


http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-ir.cc@260
PS7, Line 260: hasEquals
> nit: should be 'has_equals'
Done


http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-ir.cc@320
PS7, Line 320: lt
> Maybe rename it to 't1_cmp_t2'?
makes sense to me, done



--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 11 Feb 2022 14:25:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-08 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 7:

(6 comments)

I'm really excited about this change as it will greatly improve partitioned 
INSERTs.

The change looks really good, I've only found a few naming issues.

http://gerrit.cloudera.org:8080/#/c/18184/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18184/7//COMMIT_MSG@9
PS7, Line 9: 3way
nit: 3-way


http://gerrit.cloudera.org:8080/#/c/18184/7//COMMIT_MSG@23
PS7, Line 23: select * order by l_linestatus, NDV=2:
: original 2-way vs adaptive quicksort - 1 : 0.67
: select l_shipmode order by l_shipmode, NDV=7:
: original 2-way vs adaptive quicksort - 1 : 0.42
: select * order by l_shipmode, NDV=7:
: original 2-way vs adaptive quicksort - 1 : 0.57
: large NDV, unique data: no significant difference
Using https://ozh.github.io/ascii-tables/

(for these benchmark tables it doesn't matter if the line widths are longer)

 
+--+++
 | Test | Original 2-way | Adaptive 
Quicksort |
 
+--+++
 | select * order by l_linestatus, NDV=2:   |  1 |  
 0.67 |
 | select l_shipmode order by l_shipmode, NDV=7 |  1 |  
 0.42 |
 | select * order by l_shipmode, NDV=7  |  1 |  
 0.57 |
 | large NDV, unique data   |  1 |  
1 | (no difference)
 
+--+++


http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-internal.h
File be/src/runtime/sorter-internal.h:

http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-internal.h@533
PS7, Line 533: hasEquals
nit: we name variables with underscores instead of camel case in the backend. 
I.e. it should be 'has_equals'


http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-ir.cc
File be/src/runtime/sorter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-ir.cc@157
PS7, Line 157: reinterpret_cast(_tuple)
nit: Since it's being used at multiple places, maybe create a variable 
'temp_tuple_row' for it.

And how about calling 'temp_tuple' as 'pivot_tuple' and 'temp_tuple_row' as 
'pivot_tuple_row'?


http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-ir.cc@260
PS7, Line 260: hasEquals
nit: should be 'has_equals'


http://gerrit.cloudera.org:8080/#/c/18184/7/be/src/runtime/sorter-ir.cc@320
PS7, Line 320: lt
Maybe rename it to 't1_cmp_t2'?



--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 08 Feb 2022 19:34:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10119/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Tue, 08 Feb 2022 11:27:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-08 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 7: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Tue, 08 Feb 2022 11:11:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-08 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18184


Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..

IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

Based on a 3way partitioning implementation by Kurt Deschler.
3-way quicksort performs much better on data with large number of
duplicates, but has a small regression in case of large NDV.
This adaptive implementation keeps the advantages of both 2-way
and 3-way quicksort. If duplicates are found during pivot selection
(among the 3 randomly selected candidates),the 3-way partitioning
function is called in SortHelper, otherwise partitioning goes 2-way.

Some benchmark results:
On a view created from 4 tpch_parquet lineitem tables
Full sort, 1 node, 1 run - no spills (only in-memory sort is changed)
Time of sorting adaptively during query execution compared to
the original implementation (sort node profile):

select * order by l_linestatus, NDV=2:
original 2-way vs adaptive quicksort - 1 : 0.67
select l_shipmode order by l_shipmode, NDV=7:
original 2-way vs adaptive quicksort - 1 : 0.42
select * order by l_shipmode, NDV=7:
original 2-way vs adaptive quicksort - 1 : 0.57
large NDV, unique data: no significant difference

Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
---
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter-ir.cc
M be/src/runtime/sorter.cc
3 files changed, 174 insertions(+), 43 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/18184/7
--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Noemi Pap-Takacs