GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/15541
[SPARK-17637][Scheduler] Packed scheduling for Spark tasks across executors

## What changes were proposed in this pull request?

Restructure the code and implement two new task assigners:

- PackedAssigner: allocates tasks to the executors with the fewest available cores, so that Spark can release reserved executors when dynamic allocation is enabled.
- BalancedAssigner: allocates tasks to the executors with the most available cores, in order to balance the workload across all executors.

By default, the original round-robin assigner is used. We tested a production pipeline, and with dynamic allocation enabled the new PackedAssigner reduced reserved CPU and memory by around 45%.

## How was this patch tested?

Unit tests in TaskSchedulerImplSuite and manual tests in a production pipeline.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhzhan/spark TaskAssigner

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15541.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15541

----

commit 75cdd1a77a227fa492a09e93794d4ea7be8a020f
Author: Zhan Zhang <zhanzh...@fb.com>
Date: 2016-10-19T01:20:48Z

    TaskAssigner to support different scheduling algorithms

----
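For readers skimming the description above, here is a minimal Scala sketch of the ordering idea behind the two assigners. This is an illustration only, not the PR's actual code: `ExecutorSlot`, `packedOrder`, and `balancedOrder` are hypothetical names, and the real assigners plug into the scheduler's resource-offer handling rather than sorting a standalone list.

```scala
// Hypothetical sketch of the packed vs. balanced ordering idea (not the PR's
// actual classes). Each executor is represented by how many free cores it has.
case class ExecutorSlot(executorId: String, freeCores: Int)

object AssignerSketch {
  // Packed: prefer executors with the FEWEST free cores, filling busy
  // executors first so idle ones can be released by dynamic allocation.
  def packedOrder(slots: Seq[ExecutorSlot]): Seq[ExecutorSlot] =
    slots.filter(_.freeCores > 0).sortBy(_.freeCores)

  // Balanced: prefer executors with the MOST free cores, spreading the
  // workload evenly across all executors.
  def balancedOrder(slots: Seq[ExecutorSlot]): Seq[ExecutorSlot] =
    slots.filter(_.freeCores > 0).sortBy(-_.freeCores)

  def main(args: Array[String]): Unit = {
    val slots = Seq(
      ExecutorSlot("exec-1", freeCores = 1),
      ExecutorSlot("exec-2", freeCores = 4),
      ExecutorSlot("exec-3", freeCores = 2))

    println(packedOrder(slots).map(_.executorId))   // List(exec-1, exec-3, exec-2)
    println(balancedOrder(slots).map(_.executorId)) // List(exec-2, exec-3, exec-1)
  }
}
```

When every executor has the same number of free cores the two orderings coincide; the trade-off only shows up under partial utilization, which is exactly the situation dynamic allocation exploits.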