Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3354: bad sorter pivot selection on some inputs
......................................................................


IMPALA-3354: bad sorter pivot selection on some inputs

Switch to using the median of three randomly chosen tuples as the pivot,
which should be robust across a wide range of inputs. It may be slightly
worse than the existing pivot selection on inputs where the original
algorithm is close to optimal (e.g. already-sorted inputs), but should
typically be better overall.
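As a rough illustration only, a median-of-three random pivot could look like the C++ sketch below. It is not the code in be/src/runtime/sorter.cc. It works on a plain std::vector rather than on tuples, and the helper name MedianOfThreeRandom is hypothetical. It samples three positions uniformly at random, sorts the three values with at most three comparisons and returns the middle one.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not Impala's Sorter API): returns the median of three
// elements sampled uniformly at random from v[lo..hi] (inclusive).
template <typename T>
T MedianOfThreeRandom(const std::vector<T>& v, std::size_t lo, std::size_t hi,
                      std::mt19937& rng) {
  assert(lo <= hi && hi < v.size());
  std::uniform_int_distribution<std::size_t> dist(lo, hi);
  T a = v[dist(rng)];
  T b = v[dist(rng)];
  T c = v[dist(rng)];
  // Establish a <= b, then fold in c so that b ends up as the median.
  if (b < a) std::swap(a, b);
  if (c < b) {
    std::swap(b, c);  // c now holds the maximum of the three
    if (b < a) std::swap(a, b);
  }
  return b;
}
```

Sampling at random avoids the fixed positions that adversarial or already-ordered inputs can exploit. Taking the median of three also makes it unlikely that the pivot is an extreme value of the range.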

Always recurse on the smaller partition: this bounds the recursion depth
to O(log n) and prevents stack overflow even with bad pivot selection.
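The following is a minimal sketch of the "recurse on the smaller partition" idea. It assumes a plain std::vector<int> and a Hoare partition around the middle element. That pivot choice is a simplification and is not the Sorter's pivot selection. Partition, QuickSort and SortInts are hypothetical names. The smaller side is sorted by a recursive call and the larger side by looping. Each recursive call therefore covers at most half of the current range, and the stack depth stays within about log2(n) frames even when every pivot is poor.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hoare partition of v[lo..hi] (inclusive) around the middle element.
// Returns p in [lo, hi - 1] such that every element of v[lo..p] is <= every
// element of v[p+1..hi].
std::size_t Partition(std::vector<int>& v, std::size_t lo, std::size_t hi) {
  assert(lo < hi);
  int pivot = v[lo + (hi - lo) / 2];
  std::size_t i = lo, j = hi;
  while (true) {
    while (v[i] < pivot) ++i;
    while (pivot < v[j]) --j;
    if (i >= j) return j;
    std::swap(v[i], v[j]);
    ++i;
    --j;
  }
}

// Sorts v[lo..hi]: recurse into the smaller side, loop on the larger side.
void QuickSort(std::vector<int>& v, std::size_t lo, std::size_t hi) {
  while (lo < hi) {
    std::size_t p = Partition(v, lo, hi);
    // Left side is v[lo..p] (p - lo + 1 elements), right is v[p+1..hi].
    if (p - lo < hi - p - 1) {
      QuickSort(v, lo, p);  // smaller side: recursive call
      lo = p + 1;           // larger side: next loop iteration
    } else {
      QuickSort(v, p + 1, hi);
      hi = p;
    }
  }
}

void SortInts(std::vector<int>& v) {
  if (v.size() > 1) QuickSort(v, 0, v.size() - 1);
}
```

Looping on the larger side takes the place of a tail call. Only the smaller partition consumes a stack frame, which is what keeps a run of bad pivots from overflowing the stack.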

The overhead is minimal: in profiles of small sorts I'm seeing pivot
selection take at most 0.5% of CPU time.

The improved pivot selection gives modest improvements of 2-5% on the
targeted-perf ORDER BY benchmarks in a single-node run at TPC-H
scale factor 20.

Change-Id: Iae50112b6deca3d6268e18b6f4daae1af279b452
Reviewed-on: http://gerrit.cloudera.org:8080/2824
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Internal Jenkins
---
M be/src/runtime/sorter.cc
M tests/query_test/test_sort.py
2 files changed, 127 insertions(+), 8 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/2824

Gerrit-MessageType: merged
Gerrit-Change-Id: Iae50112b6deca3d6268e18b6f4daae1af279b452
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong <[email protected]>
