Github user markhamstra commented on the issue:

    https://github.com/apache/spark/pull/21589
  
    No, defaultParallelism isn't more useful in that case, but that just starts 
getting to my overall assessment of this JIRA and PR: It smells of defining the 
problem to align with a preconception of the solution.
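
    As an aside, this piece is already public (a minimal sketch; `sc` is 
assumed to be an active SparkContext):

```scala
// Minimal sketch: what the public API already exposes. `sc` is assumed to be
// an active SparkContext.
val hint: Int = sc.defaultParallelism // a backend-dependent hint, not a live core count
```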
    
    Exposing the driver's current accounting of the number of cores active in 
the cluster is not something that we couldn't do, or didn't know how to do, a 
long time ago. Rather, it is something that those of us working on the 
scheduler chose not to do, because of the expectation that putting this in the 
public API (and thereby implicitly encouraging its use) was likely to produce 
as many problems as it solved. That expectation rests primarily on two factors: 
1) the number of cores and executors is not static; 2) closely tailoring a Job 
to some expectation of the number of available cores or executors is not 
obviously a correct thing to encourage in general.
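
    To make the first factor concrete, a hedged sketch using the existing 
status API (`sc` is again an active SparkContext; `getExecutorInfos` may also 
count the driver entry):

```scala
// Sketch only: any executor count taken here is a snapshot, not a stable number.
val before = sc.statusTracker.getExecutorInfos.length
Thread.sleep(60 * 1000L) // dynamic allocation, preemption, or node loss may act here
val after = sc.statusTracker.getExecutorInfos.length
// Nothing guarantees before == after, so a Job sized to `before` can already be
// wrong by the time its Stages run.
```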
    
    Whether from node failures, dynamic executor allocation, backend scheduler 
elasticity/preemption, or just other Jobs running under the same SparkContext, 
the number of cores and executors available to any particular Job when it is 
created can easily be different from what is available when any of its Stages 
actually runs.
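
    For example, with a configuration like the following (values illustrative 
only), the executor count is explicitly a moving target that tracks the task 
backlog:

```scala
import org.apache.spark.SparkConf

// Illustrative sketch of a dynamic-allocation setup; the min/max values are
// arbitrary. Under these settings the cluster manager grows and shrinks the
// executor set while Jobs are running.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.maxExecutors", "100")
  .set("spark.shuffle.service.enabled", "true") // required by dynamic allocation
```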
    
    Even if you could get reliable numbers for the cores and executors that 
will be available throughout the lifecycle of a Job, tailoring a Job to use all 
of those cores and executors is only the right thing to do in a subset of Spark 
use cases. Using many more executors than there are DFS partitions holding the 
data, trying to use all of the cores while other Jobs are pending, trying to 
use all of the cores when another Job needs to acquire a minimum number for 
barrier-scheduled execution, or trying to use more cores than a scheduling pool 
permits: these are all anti-patterns that easy, context-free access to a 
low-level numCores would encourage.
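
    A sketch of the sort of anti-pattern in question (`numCores` is 
hypothetical, standing in for the value this PR would expose; the path is 
illustrative):

```scala
// Hypothetical anti-pattern: hard-wiring a Job to a point-in-time core count.
// `numCores` stands in for the low-level value this PR would make easy to read.
val counts = sc.textFile("hdfs:///some/data")   // partitioning already follows DFS blocks
  .repartition(numCores)                        // sized to a number that may be stale,
  .map(line => (line, 1L))                      //   and that ignores pools, pending Jobs,
  .reduceByKey(_ + _)                           //   and barrier-scheduling needs
```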
    
    There definitely are use cases where users need to be able to set policy 
for whether particular Jobs should be encouraged to use more or less of the 
cluster's resources, but I believe that needs to be done at a much higher level 
of abstraction, in a declarative form, and that the policy likely needs to be 
enforced dynamically/adaptively at Stage boundaries. The under-developed and 
under-used dynamic shuffle partitioning code in Spark SQL starts to go in that 
direction.
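
    For concreteness, the existing knobs in that direction (a hedged sketch; 
`spark` is assumed to be an active SparkSession, and the value is illustrative, 
not a recommendation):

```scala
// Declarative intent, enforced adaptively at Stage boundaries, rather than a
// raw core count baked into the Job at creation time.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "64m")
```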

