[
https://issues.apache.org/jira/browse/IMPALA-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637456#comment-17637456
]
Riza Suminto commented on IMPALA-10001:
---------------------------------------
Setting SORT_RUN_BYTES_LIMIT risks spilling unnecessarily when the query
could have fit all of its data in memory.
We have been using 512MB in our impala-tpcds-kit script for some time now:
[https://github.com/cloudera/impala-tpcds-kit/blob/d829fc392a70df8300a8d9fd265977fa078a2dab/scripts/impala-insert.sql#L8]
I got to chat with [~noemi], who has been experimenting with the sort
implementation a lot.
Generally, we don't want to set SORT_RUN_BYTES_LIMIT too low, because that
causes overly frequent spilling. But we also don't want to set it so high that
sorting an oversized run in memory and then spilling it can block for minutes.
SORT_RUN_BYTES_LIMIT=2G might be a good balance between in-memory sort time
and spill time.
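
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in
Python. It is a toy model, not Impala's sorter: the function name
count_spilled_runs, its parameters, and the rule that a limit of 0 or less
means "disabled" are all assumptions made for illustration. It only counts how
many sorted runs would be written out for a given input size, run limit, and
amount of memory available to the sort.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def count_spilled_runs(input_bytes, run_bytes_limit, sort_mem_bytes):
    """Toy estimate of how many sorted runs get spilled to disk.

    Assumptions (hypothetical, for illustration only):
    - run_bytes_limit <= 0 models the option being disabled, so a run is
      only cut when the sort runs out of memory.
    - If the whole input fits in a single run, nothing is spilled.
    """
    if run_bytes_limit <= 0:
        max_run_bytes = sort_mem_bytes
    else:
        max_run_bytes = min(run_bytes_limit, sort_mem_bytes)

    if input_bytes <= max_run_bytes:
        return 0  # everything is sorted in memory
    return math.ceil(input_bytes / max_run_bytes)


if __name__ == "__main__":
    # Compare a few candidate limits for a small and a large sort input.
    for input_bytes in (1 * GB, 10 * GB):
        for limit in (0, 512 * MB, 2 * GB):
            runs = count_spilled_runs(input_bytes, limit, 4 * GB)
            print(f"input={input_bytes // MB}MB limit={limit // MB}MB "
                  f"spilled_runs={runs}")
```

In this model, a 1GB input with 4GB of sort memory spills nothing when the
limit is disabled or set to 2GB, but spills with a 512MB limit even though the
data would have fit in memory. A larger limit produces fewer but bigger runs,
each of which takes longer to sort and write out.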
> Find good default value for SORT_RUN_BYTES_LIMIT
> ------------------------------------------------
>
> Key: IMPALA-10001
> URL: https://issues.apache.org/jira/browse/IMPALA-10001
> Project: IMPALA
> Issue Type: Improvement
> Components: Perf Investigation
> Reporter: Riza Suminto
> Priority: Minor
>
> IMPALA-6692 added the query option SORT_RUN_BYTES_LIMIT to trigger an early
> sort before the query hits its memory limit.
> Currently, it is disabled by default. We need to find a good default value
> for this query option.