[GitHub] [spark] bmarcott commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

GitBox Wed, 08 Jan 2020 22:19:01 -0800

bmarcott commented on issue #27096: [SPARK-28148][SQL] Repartition after join 
is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-572408308
 
 
   Thanks for taking a look!
   Yes, the reason it is here is that the shuffle/sort is introduced by 
EnsureRequirements itself, which makes the user-added sorts/shuffles 
unnecessary. Yeah, it felt a little hacky for optimization code to live in a 
rule called EnsureRequirements.
   
   I'd like someone more familiar with the overall planner design to suggest 
whether I should go with the 1st or the 2nd option.
   For the 2nd option, won't I need to create a new physical node for both the 
repartition and the sort, each of which is kind of a dummy physical node that 
relies on EnsureRequirements to add the necessary sorts/partitioning based on 
`requiredChildDistribution` and `requiredChildOrdering`?
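
   To make the 2nd option concrete, here is a rough toy model of the idea, not
Spark code. `Plan`, `ensure_requirements`, `repartition_placeholder`, and
`sort_placeholder` are hypothetical names invented for this sketch, and
"satisfies" is reduced to plain tuple comparison, which is far simpler than
Spark's real partitioning/ordering checks. The placeholder nodes only declare
what they need from their child, and the rule adds a shuffle or sort only when
the child does not already provide it.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

Keys = Tuple[str, ...]


# Toy stand-in for a physical plan node (hypothetical, not Spark's SparkPlan).
@dataclass(frozen=True)
class Plan:
    name: str
    child: Optional["Plan"] = None
    partitioning: Keys = ()                 # keys the output is partitioned by
    ordering: Keys = ()                     # keys the output is sorted by
    required_child_distribution: Keys = ()  # () means "no requirement"
    required_child_ordering: Keys = ()      # () means "no requirement"


def repartition_placeholder(child: Plan, keys: Keys) -> Plan:
    # Dummy node: does no work itself, only states the distribution it needs.
    return Plan("RepartitionPlaceholder", child, partitioning=keys,
                required_child_distribution=keys)


def sort_placeholder(child: Plan, keys: Keys) -> Plan:
    # Dummy node: does no work itself, only states the ordering it needs.
    return Plan("SortPlaceholder", child, partitioning=child.partitioning,
                ordering=keys, required_child_ordering=keys)


def ensure_requirements(plan: Plan) -> Plan:
    """Insert Shuffle/Sort below a node only when its child falls short."""
    if plan.child is None:
        return plan
    child = ensure_requirements(plan.child)
    dist = plan.required_child_distribution
    if dist and child.partitioning != dist:
        # A shuffle changes partitioning and loses any existing ordering.
        child = Plan("Shuffle", child, partitioning=dist)
    order = plan.required_child_ordering
    if order and child.ordering[:len(order)] != order:
        child = Plan("Sort", child, partitioning=child.partitioning,
                     ordering=order)
    return replace(plan, child=child)


def names(plan: Optional[Plan]) -> list:
    # Flatten the single-child chain into node names, top to bottom.
    out = []
    while plan is not None:
        out.append(plan.name)
        plan = plan.child
    return out


# A join whose output is already partitioned and sorted by "id".
join = Plan("Join", partitioning=("id",), ordering=("id",))
print(names(ensure_requirements(repartition_placeholder(join, ("id",)))))
print(names(ensure_requirements(repartition_placeholder(join, ("name",)))))
print(names(ensure_requirements(sort_placeholder(join, ("name",)))))
```

   In this sketch the repartition on `id` adds no extra shuffle, while the
other two cases do get a `Shuffle` or `Sort` inserted. The placeholder node
itself stays in the plan, so a real version would still need it to be a cheap
pass-through or to be removed afterwards.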


