[
https://issues.apache.org/jira/browse/IMPALA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong updated IMPALA-988:
Description:
The amount of available memory changes the trade-off between broadcast and
shuffle join strategies: if switching to shuffle join can avoid spilling to
disk, it may be worth paying the cost of the additional network transfer.
There are two issues:
1. Join strategy decision only takes query mem-limit into account but ignores
process mem-limit.
2. Join strategy decision does not take other joins of the same query into
account. When multiple joins are present, memory consumption can be very high.
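To make issue #2 concrete, here is a rough sketch of how per-node build-side
memory can add up when each join is checked in isolation. It is an illustration
only, not Impala's planner code: the function names, the sizes, and the
assumption that all hash tables are resident at the same time are hypothetical.

```python
def per_node_build_bytes(right_bytes, num_nodes, strategy):
    """Approximate hash-table memory one node needs for a single join."""
    if strategy == "broadcast":
        # Every node holds a full copy of the right (build) side.
        return right_bytes
    if strategy == "shuffle":
        # Each node holds roughly an even share of the right side.
        return right_bytes / num_nodes
    raise ValueError(f"unknown strategy: {strategy}")


def total_per_node_bytes(joins, num_nodes):
    """Sum per-node memory over all joins, assuming their hash tables coexist.

    `joins` is a list of (right_bytes, strategy) pairs.
    """
    return sum(per_node_build_bytes(r, num_nodes, s) for r, s in joins)


GIB = 1 << 30
mem_limit = 8 * GIB  # hypothetical per-node limit
# Three joins whose 3 GiB right sides each fit under the limit on their own.
joins = [(3 * GIB, "broadcast")] * 3
fits_individually = all(
    per_node_build_bytes(r, 10, s) <= mem_limit for r, s in joins
)
fits_together = total_per_node_bytes(joins, 10) <= mem_limit
```

Under these made-up numbers each join passes a per-join check, but the three
broadcast hash tables together need 9 GiB per node, which is over the limit.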
I ([~tarmstr...@cloudera.com]) don't think we should attempt to fix #1 because
there's a phase-ordering problem: we currently choose the best-performing
plan, then decide how much memory to allocate in admission control based on that
plan. We can't preserve that ordering while also changing the plan to fit the
mem_limit. That said, I think the current heuristic is a little too aggressive
about picking broadcast when the right side is very large; it should probably
bias more towards shuffle as the right side gets larger.
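One way such a bias could look, as a minimal sketch: the cost model (broadcast
ships the right side to every node, shuffle ships both sides once), the penalty
formula, and the bias_scale_bytes parameter are all assumptions made for
illustration, not the heuristic Impala actually uses.

```python
def choose_join_strategy(left_bytes, right_bytes, num_nodes,
                         bias_scale_bytes=1 << 30):
    """Return "broadcast" or "shuffle" for a single join.

    Hypothetical model: compare bytes sent over the network, then inflate
    the broadcast cost as the right side grows.
    """
    # Broadcast sends a full copy of the right side to every node.
    broadcast_cost = right_bytes * num_nodes
    # Shuffle repartitions both inputs once.
    shuffle_cost = left_bytes + right_bytes
    # Assumed bias term: grows linearly with the right-side size.
    penalty = 1.0 + right_bytes / bias_scale_bytes
    if broadcast_cost * penalty <= shuffle_cost:
        return "broadcast"
    return "shuffle"


GIB = 1 << 30
MIB = 1 << 20
small_right = choose_join_strategy(100 * GIB, 10 * MIB, 10)
large_right = choose_join_strategy(100 * GIB, 4 * GIB, 10)
# Passing an infinite scale disables the bias (penalty stays at 1.0).
large_right_unbiased = choose_join_strategy(
    100 * GIB, 4 * GIB, 10, bias_scale_bytes=float("inf")
)
```

With these numbers a small right side still picks broadcast, while the 4 GiB
right side flips from broadcast to shuffle once the bias is applied.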
Note that when IMPALA-3200 is completed, this shouldn't prevent the query
from running to completion, but it will still affect performance.
was:
The amount of available memory changes the trade-off between broadcast and
shuffle join strategies: if switching to shuffle join can avoid spilling to
disk, it may be worth paying the cost of the additional network transfer.
There are two issues:
1. Join strategy decision only takes query mem-limit into account but ignores
process mem-limit.
2. Join strategy decision does not take other joins of the same query into
account. When multiple joins are present, it'll go over the mem-limit.
Note that when IMPALA-3200 is completed, this shouldn't prevent the query
from running to completion, but it will still affect performance.
> Join strategy (broadcast vs shuffle) decision does not take memory
> consumption and other joins into account
> ---
>
> Key: IMPALA-988
> URL: https://issues.apache.org/jira/browse/IMPALA-988
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 1.2.1
> Reporter: Alan Choi
> Priority: Minor
> Labels: resource-management