[jira] [Updated] (IMPALA-988) Join strategy (broadcast vs shuffle) decision does not take memory consumption and other joins into account

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-988:
-
Description: 
The amount of available memory changes the trade-off between partitioned and 
shuffle join strategies: if switching to shuffle join can avoid spilling to 
disk, it may be worth paying the cost of the additional network transfer.

There are two issues:
1. Join strategy decision only takes query mem-limit into account but ignore 
process mem-limit.
2. Join strategy decision does not take other joins of the same query into 
account. When multiple joins are present, memory consumption can be very high.

I ([~tarmstr...@cloudera.com]) don't think we should attempt to fix #1 - 
there's a phase ordering problem here - we currently choose the best-performing 
plan then decide how much memory to allocate in admission control based on that 
plan. We can't preserve that while attempting to change the plan to fit  the 
mem_limit. That said, I think the current heuristic is a little too aggressive 
about picking broadcast when the right side is very large - it should probably 
bias more towards shuffle as the right side gets larger.

Note that when IMPALA-3200 is completed, this shouldn't prevent the query 
running to completion, but still affects performance.

  was:
The amount of available memory changes the trade-off between partitioned and 
shuffle join strategies: if switching to shuffle join can avoid spilling to 
disk, it may be worth paying the cost of the additional network transfer.

There are two issues:
1. Join strategy decision only takes query mem-limit into account but ignore 
process mem-limit.
2. Join strategy decision does not take other joins of the same query into 
account. When multiple joins are present, it'll go over the mem-limit.

Note that when IMPALA-3200 is completed, this shouldn't prevent the query 
running to completion, but still affects performance.


> Join strategy (broadcast vs shuffle) decision does not take memory 
> consumption and other joins into account
> ---
>
> Key: IMPALA-988
> URL: https://issues.apache.org/jira/browse/IMPALA-988
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.2.1
>Reporter: Alan Choi
>Priority: Minor
>  Labels: resource-management
>
> The amount of available memory changes the trade-off between partitioned and 
> shuffle join strategies: if switching to shuffle join can avoid spilling to 
> disk, it may be worth paying the cost of the additional network transfer.
> There are two issues:
> 1. Join strategy decision only takes query mem-limit into account but ignore 
> process mem-limit.
> 2. Join strategy decision does not take other joins of the same query into 
> account. When multiple joins are present, memory consumption can be very high.
> I ([~tarmstr...@cloudera.com]) don't think we should attempt to fix #1 - 
> there's a phase ordering problem here - we currently choose the 
> best-performing plan then decide how much memory to allocate in admission 
> control based on that plan. We can't preserve that while attempting to change 
> the plan to fit  the mem_limit. That said, I think the current heuristic is a 
> little too aggressive about picking broadcast when the right side is very 
> large - it should probably bias more towards shuffle as the right side gets 
> larger.
> Note that when IMPALA-3200 is completed, this shouldn't prevent the query 
> running to completion, but still affects performance.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-988) Join strategy (broadcast vs shuffle) decision does not take memory consumption and other joins into account

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-988:
-
Summary: Join strategy (broadcast vs shuffle) decision does not take memory 
consumption and other joins into account  (was: Join strategy (broadcast vs 
shuffle) decision does not take mem limit and other joins into account)

> Join strategy (broadcast vs shuffle) decision does not take memory 
> consumption and other joins into account
> ---
>
> Key: IMPALA-988
> URL: https://issues.apache.org/jira/browse/IMPALA-988
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.2.1
>Reporter: Alan Choi
>Priority: Minor
>  Labels: resource-management
>
> The amount of available memory changes the trade-off between partitioned and 
> shuffle join strategies: if switching to shuffle join can avoid spilling to 
> disk, it may be worth paying the cost of the additional network transfer.
> There are two issues:
> 1. Join strategy decision only takes query mem-limit into account but ignore 
> process mem-limit.
> 2. Join strategy decision does not take other joins of the same query into 
> account. When multiple joins are present, it'll go over the mem-limit.
> Note that when IMPALA-3200 is completed, this shouldn't prevent the query 
> running to completion, but still affects performance.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org