[ 
https://issues.apache.org/jira/browse/IMPALA-14263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto reassigned IMPALA-14263:
-------------------------------------

    Assignee: Riza Suminto

> Broadcast cost in planner is skewed by the number of nodes comparing to 
> partition cost
> --------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14263
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14263
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Wenzhe Zhou
>            Assignee: Riza Suminto
>            Priority: Major
>
> broadCast Cost = dataPayload + hashTblBuildCost = 2 x (rhsDataSize * 
> leftChildNodes)
> partition Cost = Math.round(lhsNetworkCost + rhsNetworkCost + rhsDataSize)
> The number of nodes skews broadcast cost on bigger clusters, which makes 
> broadcast cost much bigger than partitioned join cost, e.g. planner favor 
> partition strategy for big cluster.
> We probably need to introduce new heuristics to join strategy decision, like 
> including number of nodes in partitioned join cost model. We also need a way 
> to check for the degree of skew on the join key during the planning phase. If 
> the skew is on the higher side, we would want to bias the cost model towards 
> broadcast.
> Adding join hints in the query is the recommended workaround to force 
> broadcast join in the cases where join keys are skewed, especially for larger 
> clusters. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to