Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/21156
  
    @cloud-fan For a bucketed table, the user will bucket on the primary 
key, so in this case they will not have the parallelism and data skew issues, 
and we can see a good benefit from avoiding the shuffle.
    Do you mean a performance regression in some more general cases?


---

