Github user yucai commented on the issue:

https://github.com/apache/spark/pull/21156

@cloud-fan For a bucketed table, users would bucket on the primary key, so in this case they will not hit the parallelism or data skew issues, and we can see a good benefit from avoiding the shuffle. Do you mean there is a performance regression in some more general cases?