[ 
https://issues.apache.org/jira/browse/IMPALA-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637456#comment-17637456
 ] 

Riza Suminto commented on IMPALA-10001:
---------------------------------------

Setting SORT_RUN_BYTES_LIMIT carries a risk of spilling unnecessarily when 
the query could actually fit all of its data in memory.
We have been using 512MB in our impala-tpcds-kit script for some time now:
[https://github.com/cloudera/impala-tpcds-kit/blob/d829fc392a70df8300a8d9fd265977fa078a2dab/scripts/impala-insert.sql#L8]
 

I got to chat with [~noemi], who has been experimenting with the sort 
implementation a lot.
Generally, we don't want to set SORT_RUN_BYTES_LIMIT too low, as that can cause 
overly frequent spilling. But we also don't want to set it so high that the 
in-memory sort plus the spill of an oversized sort run blocks for minutes. 
SORT_RUN_BYTES_LIMIT=2G might be a good balance between in-memory sort time 
and spill time.
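
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not Impala code: `estimate_sort_runs` is a hypothetical helper that assumes each sort run is cut off exactly at the limit and ignores row sizes, memory reservations, and the merge phase. It only shows how the number of spilled runs grows as the limit shrinks.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def estimate_sort_runs(input_bytes: int, run_bytes_limit: int) -> int:
    """Hypothetical model: number of sort runs for a given input size.

    Assumes every run is closed as soon as it reaches run_bytes_limit.
    A limit of 0 or less is treated as "disabled", i.e. a single run
    (which further assumes the data fits in memory).
    """
    if run_bytes_limit <= 0:
        return 1
    return max(1, math.ceil(input_bytes / run_bytes_limit))


if __name__ == "__main__":
    input_bytes = 10 * GB  # illustrative input size, not a measurement
    for label, limit in [("512MB", 512 * MB), ("2GB", 2 * GB), ("disabled", 0)]:
        runs = estimate_sort_runs(input_bytes, limit)
        print(f"limit={label}: about {runs} sort run(s)")
```

In this toy model a 10GB sort produces 20 runs at 512MB and 5 runs at 2GB. Fewer, larger runs mean less frequent spilling, but each run takes longer to sort and write out, which is the blocking cost described above.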

> Find good default value for SORT_RUN_BYTES_LIMIT
> ------------------------------------------------
>
>                 Key: IMPALA-10001
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10001
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Perf Investigation
>            Reporter: Riza Suminto
>            Priority: Minor
>
> IMPALA-6692 added the query option SORT_RUN_BYTES_LIMIT to trigger an early 
> sort before the query hits its memory limit.
> Currently, it is disabled by default. We need to find a good default value 
> for this query option.



