[ https://issues.apache.org/jira/browse/IMPALA-14263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Riza Suminto reassigned IMPALA-14263:
-------------------------------------

    Assignee: Riza Suminto

> Broadcast cost in planner is skewed by the number of nodes comparing to partition cost
> --------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14263
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14263
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Wenzhe Zhou
>            Assignee: Riza Suminto
>            Priority: Major
>
> Broadcast cost = dataPayload + hashTblBuildCost = 2 * (rhsDataSize * leftChildNodes)
> Partition cost = Math.round(lhsNetworkCost + rhsNetworkCost + rhsDataSize)
>
> The number of nodes skews the broadcast cost on bigger clusters, making it much
> larger than the partitioned join cost; i.e., the planner favors the partitioned
> strategy on big clusters.
> We probably need to introduce new heuristics into the join strategy decision, such
> as including the number of nodes in the partitioned join cost model. We also need a
> way to check the degree of skew on the join key during the planning phase. If the
> skew is on the higher side, we would want to bias the cost model towards broadcast.
> Adding join hints to the query is the recommended workaround to force a broadcast
> join when join keys are skewed, especially on larger clusters.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
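The two cost formulas quoted in the issue description can be compared with a small Java sketch. This is not Impala's planner code: the class and method names are hypothetical, the data sizes are made-up inputs, and it assumes that each side's network cost in a partitioned join is roughly that side's data size (every row is shuffled once). It also assumes ties are broken toward broadcast.

```java
// Hypothetical sketch of the two cost formulas quoted in IMPALA-14263.
// Not Impala's planner code; sizes are in bytes and are made-up inputs.
public class JoinCostSketch {

    // Broadcast: the build (rhs) side is sent to every lhs node, and every
    // node builds its own hash table, so both terms scale with node count.
    static long broadcastCost(double rhsDataSize, int leftChildNodes) {
        double dataPayload = rhsDataSize * leftChildNodes;
        double hashTblBuildCost = rhsDataSize * leftChildNodes;
        return Math.round(dataPayload + hashTblBuildCost);
    }

    // Partitioned: each side is shuffled once and the rhs hash table is built
    // once in total. Assumption: a side's network cost ~= its data size.
    static long partitionCost(double lhsNetworkCost, double rhsNetworkCost,
                              double rhsDataSize) {
        return Math.round(lhsNetworkCost + rhsNetworkCost + rhsDataSize);
    }

    // Picks the cheaper strategy; ties go to broadcast (an assumption).
    static String choose(double lhsDataSize, double rhsDataSize, int nodes) {
        long b = broadcastCost(rhsDataSize, nodes);
        long p = partitionCost(lhsDataSize, rhsDataSize, rhsDataSize);
        return b <= p ? "BROADCAST" : "PARTITIONED";
    }

    public static void main(String[] args) {
        double lhs = 100e9; // 100 GB probe side
        double rhs = 1e9;   // 1 GB build side
        for (int nodes : new int[] {10, 40, 50, 100, 400}) {
            System.out.printf("nodes=%d broadcast=%d partition=%d -> %s%n",
                nodes, broadcastCost(rhs, nodes),
                partitionCost(lhs, rhs, rhs), choose(lhs, rhs, nodes));
        }
    }
}
```

With a 100 GB probe side and a 1 GB build side, the partition cost stays at 102 GB while the broadcast cost grows by 2 GB per node, so the sketch picks broadcast up to 51 nodes and partitioned from 52 nodes on, for identical data. Neither formula accounts for join-key skew, which is why the hint workaround is needed when the keys are skewed.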