GitHub user marymwu opened a pull request: https://github.com/apache/spark/pull/21783
[SPARK-24799] A solution for dealing with data skew in left, right, inner join

## What changes were proposed in this pull request?

For left, right, and inner join statement execution, this solution mainly divides the partitions where data skew has occurred into several partitions with a smaller data scale, so that more tasks can be executed in parallel to increase efficiency.

## How was this patch tested?

Unit tests in DatasetSuite.scala

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marymwu/spark branch-2.3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21783.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21783

----
commit 2a01c813b6ef7223a489a4bcda3c9e5feb899060
Author: wangsm9 <wangsm9@...>
Date:   2018-07-16T09:48:44Z

    data skew code for spark2.3

----
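
The idea described under "What changes were proposed" (cutting a skewed partition into several smaller ones so that more tasks can run in parallel) can be illustrated with a small sketch. This is not the code from the pull request: it is plain Python rather than Spark, it covers only the inner-join case, and every name in it (`hash_partition`, `split_skewed`, `skew_aware_inner_join`, `max_rows`) is hypothetical. It assumes the skew is on the left side and that each chunk of a split left partition is joined against the whole matching right partition.

```python
from collections import defaultdict


def hash_partition(rows, num_partitions):
    """Group (key, value) rows into partitions by hashing the join key."""
    parts = defaultdict(list)
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def split_skewed(partition, max_rows):
    """Cut one partition into chunks of at most max_rows rows (hypothetical threshold)."""
    return [partition[i:i + max_rows] for i in range(0, len(partition), max_rows)]


def inner_join_partition(left_rows, right_rows):
    """Hash join of one left chunk against one right partition."""
    lookup = defaultdict(list)
    for key, value in right_rows:
        lookup[key].append(value)
    return [(key, lv, rv) for key, lv in left_rows for rv in lookup.get(key, [])]


def skew_aware_inner_join(left, right, num_partitions, max_rows):
    """Plan one task per left chunk, then run the tasks and collect the rows."""
    left_parts = hash_partition(left, num_partitions)
    right_parts = hash_partition(right, num_partitions)

    # A skewed left partition yields several tasks instead of one;
    # each task reuses the full matching right partition.
    tasks = []
    for pid, left_rows in left_parts.items():
        right_rows = right_parts.get(pid, [])
        for chunk in split_skewed(left_rows, max_rows):
            tasks.append((chunk, right_rows))

    # Tasks are independent, so a real engine could run them in parallel.
    results = []
    for chunk, right_rows in tasks:
        results.extend(inner_join_partition(chunk, right_rows))
    return len(tasks), results
```

Because every left row lands in exactly one chunk, splitting the left side this way preserves inner-join results, and the same argument holds for a left outer join. A right outer join would instead need the right side to be the one that is split, since splitting the left side would emit unmatched right rows once per chunk.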