cloud-fan commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-872439396

I think using the `CostEvaluator` to accept the extra shuffles introduced by skew-join handling is a good idea. However, the current framework is too simple: we give up the entire re-optimization if the cost becomes higher, whereas we should interact with the re-optimization process to find the plan with the lowest cost. For example, if skew-join handling adds extra shuffles, we should give up only the skew-join handling, not the entire re-optimization.

My idea is that re-optimization can produce two result plans: one with skew-join handling and one without. Then we compare their costs: if the plan with skew-join handling has a higher cost and force-apply is not enabled, we pick the other plan; otherwise, we pick the plan with skew-join handling. Then we don't need to change the `CostEvaluator`.
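The selection rule proposed above can be sketched as follows. This is a minimal, language-neutral illustration (in Python, though Spark itself is Scala); the `Plan` type, `cost` field, and `choose_plan` helper are hypothetical stand-ins, not the actual Spark AQE API.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical stand-in for a re-optimized physical plan and its cost."""
    description: str
    cost: int  # stand-in for a comparable Cost produced by a CostEvaluator


def choose_plan(with_skew_handling: Plan, without_skew_handling: Plan,
                force_apply: bool) -> Plan:
    # If skew-join handling made the plan more expensive and force-apply
    # is not enabled, fall back to the plan without skew-join handling.
    # Otherwise, keep the plan with skew-join handling.
    if with_skew_handling.cost > without_skew_handling.cost and not force_apply:
        return without_skew_handling
    return with_skew_handling
```

Note that this keeps the cost comparison outside the `CostEvaluator`: the evaluator only needs to cost each candidate plan, and the skew-join decision is a simple post-hoc choice between the two results.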
I think using the `CostEvaluator` to accept extra shuffles introduced by skew join handling is a good idea. However, the current framework is too simple: we just give up the entire re-optimization if the cost becomes higher, while we should try to interact with the re-optimization process to get the plan with the lowest cost. For example, if skew join handling adds extra shuffles, we should only give up skew join handling, instead of the entire re-optimization. My idea is, the re-optimization can produce 2 result plans: one with skew join handling and one without. Then we compare the cost: if the plan with skew join handling has a higher cost and force-apply is not enabled, we pick the other plan. Otherwise, we pick the plan with skew join handling. Then we don't need to change the `CostEvaluator`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org