Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/21589

No, defaultParallelism isn't more useful in that case, but that just starts getting to my overall assessment of this JIRA and PR: It smells of defining the problem to align with a preconception of the solution. Exposing the driver's current accounting of the number of cores active in the cluster is not something that we couldn't do or didn't know how to do a long time ago. Rather, it is something that those of us working on the scheduler chose not to do, because of the expectation that putting this in the public API (and thereby implicitly encouraging its use) was likely to produce as many problems as it solves. That was primarily because of two factors: 1) the number of cores and executors is not static; 2) closely tailoring a Job to some expectation of the number of available cores or executors is not obviously a correct thing to encourage in general.

Whether from node failures, dynamic executor allocation, backend scheduler elasticity/preemption, or just other Jobs running under the same SparkContext, the number of cores and executors available to any particular Job when it is created can easily differ from what is available when any of its Stages actually runs. Even if you could get reliable numbers for the cores and executors that will be available through the lifecycle of a Job, tailoring a Job to use all of those cores and executors is the right thing to do in only a subset of Spark use cases. For example, using many more executors than there are DFS partitions holding the data, trying to use all of the cores when there are other Jobs pending, trying to use all of the cores when another Job needs to acquire a minimum number for barrier-scheduled execution, or trying to use more cores than a scheduling pool permits would all be anti-patterns that easy, context-free access to a low-level numCores would only encourage.

There definitely are use cases where users need to be able to set policy for whether particular Jobs should be encouraged to use more or less of the cluster's resources, but I believe that needs to be done at a much higher level of abstraction, in a declarative form, and that such policy likely needs to be enforced dynamically/adaptively at Stage boundaries. The under-developed and under-used dynamic shuffle partitioning code in Spark SQL starts to go in that direction.
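To make the contrast concrete, here is a minimal sketch (not from the PR; the object name and input/output paths are hypothetical) of the anti-pattern being described versus the more declarative, adaptive direction the comment points toward. The first part snapshots the driver's view of parallelism at job-creation time and bakes it into the partitioning; the second part simply enables Spark SQL's adaptive execution flag and lets the engine adjust at stage boundaries.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical illustration only: object name and paths are made up.
object CoreCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("core-count-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Anti-pattern: read the driver's current notion of parallelism once and
    // tailor the job to it. With dynamic allocation, preemption, node loss,
    // or other Jobs sharing the SparkContext, this number can be stale by the
    // time any Stage of this Job actually runs.
    val coresNow = sc.defaultParallelism
    sc.textFile("hdfs:///data/input")        // hypothetical path
      .repartition(coresNow * 2)             // tailored to a moving target
      .saveAsTextFile("hdfs:///data/out-rdd") // hypothetical path

    // Declarative alternative: state the intent and let the engine adapt.
    // spark.sql.adaptive.enabled toggles the dynamic shuffle partitioning
    // in Spark SQL that the comment refers to.
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.read.text("hdfs:///data/input")    // hypothetical path
      .groupBy("value").count()
      .write.parquet("hdfs:///data/out-sql") // hypothetical path

    spark.stop()
  }
}
```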