Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/21589

Thank you, @HyukjinKwon

There are a significant number of Spark users who use the Job Scheduler model with a SparkContext shared across many users and many Jobs. Promoting tools and patterns based upon the number of cores or executors that a SparkContext has access to, encouraging users to create Jobs that try to use all of the available cores, very much leads those users in the wrong direction.

As much as possible, the public API should target policy that addresses real user problems (all users, not just a subset), and avoid targeting the particulars of Spark's internal implementation.

A `repartition` that is extended to support policy or goal declarations (things along the lines of `repartition(availableCores)`, `repartition(availableDataNodes)`, `repartition(availableExecutors)`, `repartition(unreservedCores)`, etc.), relying upon Spark's internals (with its complete knowledge of the total number of cores and executors, scheduling pool shares, number of reserved Task nodes sought in barrier scheduling, number of active Jobs, Stages, Tasks and Sessions, etc.) may be something that I can get behind. Exposing a couple of current Spark scheduler implementation details in the expectation that some subset of users in some subset of use cases will be able to make correct use of them is not.
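To make the shape of that idea concrete, here is a minimal, purely hypothetical sketch of a goal-declaring `repartition`. None of these names (`RepartitionGoal`, `AvailableCores`, `repartitionBy`, etc.) exist in Spark or in this PR; the resolution logic shown (using `defaultParallelism` and `statusTracker.getExecutorInfos`) is only a stand-in so the sketch compiles, whereas the real point is that Spark's internals, not user code, would resolve the goal to a partition count.

```scala
import org.apache.spark.sql.Dataset

// Hypothetical goal declarations a user could pass instead of a raw number.
sealed trait RepartitionGoal
case object AvailableCores     extends RepartitionGoal // roughly one partition per core
case object AvailableExecutors extends RepartitionGoal // roughly one partition per executor
case object UnreservedCores    extends RepartitionGoal // cores not claimed by other pools/jobs

object GoalBasedRepartition {
  // Illustrative enrichment so a user could write ds.repartitionBy(AvailableCores).
  // A real implementation would resolve the goal inside the scheduler; the values
  // used below are placeholders, not a proposal for how the resolution should work.
  implicit class RichDataset[T](val ds: Dataset[T]) extends AnyVal {
    def repartitionBy(goal: RepartitionGoal): Dataset[T] = {
      val sc = ds.sparkSession.sparkContext
      val target = goal match {
        case AvailableCores     => sc.defaultParallelism
        case AvailableExecutors => sc.statusTracker.getExecutorInfos.length
        case UnreservedCores    => sc.defaultParallelism // placeholder only
      }
      ds.repartition(math.max(target, 1))
    }
  }
}

// Usage (assuming the hypothetical API above):
//   import GoalBasedRepartition._
//   val rebalanced = someDataset.repartitionBy(AvailableExecutors)
```

The design point of the sketch is that the caller states intent and the implementation, which actually knows the cluster and scheduler state, chooses the number, rather than the caller reading implementation details and guessing.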