Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19194#discussion_r140296863
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -619,6 +625,47 @@ private[spark] class ExecutorAllocationManager(
         // place the executors.
         private val stageIdToExecutorPlacementHints = new mutable.HashMap[Int, (Int, Map[String, Int])]
     
    +    override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    +      jobStart.stageInfos.foreach(stageInfo => stageIdToJobId(stageInfo.stageId) = jobStart.jobId)
    +
    +      var jobGroupId = if (jobStart.properties != null) {
    +        jobStart.properties.getProperty(SparkContext.SPARK_JOB_GROUP_ID)
    +      } else {
    +        null
    +      }
    +
    +      val maxConTasks = if (jobGroupId != null &&
    +        conf.contains(s"spark.job.$jobGroupId.maxConcurrentTasks")) {
    +        conf.get(s"spark.job.$jobGroupId.maxConcurrentTasks").toInt
    +      } else {
    +        Int.MaxValue
    +      }
    +
    +      if (maxConTasks <= 0) {
    +        throw new IllegalArgumentException(
    +          "Maximum Concurrent Tasks should be set greater than 0 for the 
job to progress.")
    +      }
    +
    +      if (jobGroupId == null || !conf.contains(s"spark.job.$jobGroupId.maxConcurrentTasks")) {
    +        jobGroupId = DEFAULT_JOB_GROUP
    +      }
    +
    +      jobIdToJobGroup(jobStart.jobId) = jobGroupId
    +      if (!jobGroupToMaxConTasks.contains(jobGroupId)) {
    --- End diff --
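    
    For context, a minimal sketch of how a caller might exercise the config read above. The per-group key is the one this diff proposes; the group name "etl", the app name, and the master are illustrative:
    
    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    
    val conf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("max-concurrent-tasks-demo")
      // key proposed in this diff: spark.job.<groupId>.maxConcurrentTasks
      .set("spark.job.etl.maxConcurrentTasks", "8")
    val sc = new SparkContext(conf)
    
    // setJobGroup is thread-local: jobs submitted afterwards on this thread
    // carry the group id, so onJobStart above resolves the 8-task cap for them.
    sc.setJobGroup("etl", "nightly ETL jobs")
    sc.parallelize(1 to 1000, 20).count()
    
    // A job with no group, or whose group has no matching key, falls
    // through to Int.MaxValue, i.e. effectively uncapped.
    ```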
    
    You could submit jobs concurrently from two different threads.  I didn't describe it very well -- say both threads are in the same job group.  Each thread can only run one job at a time, but between the two threads there may always be some active job for that job group, the entire time.
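    
    Concretely, a minimal sketch of that overlap, assuming an existing SparkContext `sc` (the pool size, group name, and job bodies are illustrative):
    
    ```scala
    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration
    
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutor(Executors.newFixedThreadPool(2))
    
    val workers = (1 to 2).map { i =>
      Future {
        // Job-group properties are thread-local, so each thread sets its own.
        sc.setJobGroup("shared-group", s"worker thread $i")
        // One job at a time per thread, but between the two threads the
        // group may never be idle.
        (1 to 10).foreach { _ =>
          sc.parallelize(1 to 10000, 10).count()
        }
      }
    }
    Await.result(Future.sequence(workers), Duration.Inf)
    ```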
    
    There are real use cases like this -- in "job-server" style deployments, there is a long-running SparkContext (most likely with cached RDDs) accepting requests from multiple users (perhaps sitting behind an HTTP server).  Maybe some group of users is always put into one job group (e.g., there is a low-priority group and a high-priority group of users).  You might process more than one job for each group at a time, and there is a never-ending stream of jobs.
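    
    A rough sketch of that routing, assuming the long-running context was created with the per-group keys from this diff already set (e.g. `spark.job.low-priority.maxConcurrentTasks=4`, `spark.job.high-priority.maxConcurrentTasks=64`); `isPremium` is a hypothetical stand-in for a real policy check:
    
    ```scala
    import org.apache.spark.SparkContext
    
    // Hypothetical policy check -- stands in for real authentication.
    def isPremium(user: String): Boolean = user.startsWith("vip-")
    
    def handleRequest(sc: SparkContext, user: String)(work: => Long): Long = {
      val group = if (isPremium(user)) "high-priority" else "low-priority"
      // Tag this thread's jobs with the caller's priority group; the cap
      // configured for that group then applies to whatever `work` submits.
      sc.setJobGroup(group, s"request from $user")
      work  // e.g. an action over a cached RDD
    }
    ```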
    
    (Again, it's not the most common scenario, but we might as well have it behave correctly.)

