huaxingao opened a new pull request, #34785:
URL: https://github.com/apache/spark/pull/34785

   
   ### What changes were proposed in this pull request?
   Support optimizing skewed partitions in distribution and ordering when 
numPartitions is not specified.
   
   ### Why are the changes needed?
   When repartitioning for distribution and sort, we use the Rebalance 
operator instead of RepartitionByExpression to optimize skewed partitions when 
both of the following hold:
   1. numPartitions is not specified by the data source, and
   2. sortOrder is specified. This is because the requested distribution needs 
to be guaranteed, which can only be achieved by using RangePartitioning, not 
HashPartitioning.
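
The two conditions above can be read as a small decision rule. The following is only an illustrative sketch in plain Python, not the code in this PR: `WriteRequirements` and `choose_repartition_operator` are hypothetical names, and the returned strings simply echo the operator names used in the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WriteRequirements:
    """Hypothetical stand-in for what a data source requests for a write."""
    num_partitions: Optional[int] = None  # None models "not specified"
    sort_order: Tuple[str, ...] = ()      # empty models "no sortOrder"


def choose_repartition_operator(req: WriteRequirements) -> str:
    """Return the operator name according to the two conditions in the description."""
    partitions_unspecified = req.num_partitions is None
    has_sort_order = len(req.sort_order) > 0
    if partitions_unspecified and has_sort_order:
        # Both conditions hold, so skewed partitions may be optimized.
        return "Rebalance"
    # Otherwise keep the existing repartitioning behavior.
    return "RepartitionByExpression"
```

Under these assumptions, a request with a sort order and no partition count selects `Rebalance`, while any request that fixes the partition count or omits the sort order stays on `RepartitionByExpression`.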
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing and new tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

