Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22001#discussion_r209460397
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -929,11 +963,38 @@ class DAGScheduler(
               // HadoopRDD whose underlying HDFS files have been deleted.
               finalStage = createResultStage(finalRDD, func, partitions, jobId, callSite)
             } catch {
    +      case e: Exception if e.getMessage.contains(
    +          DAGScheduler.ERROR_MESSAGE_BARRIER_REQUIRE_MORE_SLOTS_THAN_CURRENT_TOTAL_NUMBER) =>
    +        logWarning(s"The job $jobId requires to run a barrier stage that requires more slots " +
    +          "than the total number of slots in the cluster currently.")
    +        jobIdToNumTasksCheckFailures.compute(jobId, new BiFunction[Int, Int, Int] {
    +          override def apply(key: Int, value: Int): Int = value + 1
    +        })
    +        val numCheckFailures = jobIdToNumTasksCheckFailures.get(jobId)
    +        if (numCheckFailures <= maxFailureNumTasksCheck) {
    +          messageScheduler.schedule(
    +            new Runnable {
    +              override def run(): Unit = eventProcessLoop.post(JobSubmitted(jobId, finalRDD, func,
    +                partitions, callSite, listener, properties))
    +            },
    +            timeIntervalNumTasksCheck * 1000,
    --- End diff --
    
    minor: how about removing the `* 1000` and changing the time unit to `SECONDS`?
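
    For reference (not part of the PR), a minimal standalone sketch of the call shape this suggestion points at, using a plain `java.util.concurrent.ScheduledThreadPoolExecutor` as a stand-in for `messageScheduler`; the object name and the delay value below are illustrative only:

        import java.util.concurrent.{ScheduledThreadPoolExecutor, TimeUnit}

        object ScheduleInSecondsSketch {
          def main(args: Array[String]): Unit = {
            // Illustrative stand-in for the DAGScheduler's single-threaded messageScheduler.
            val scheduler = new ScheduledThreadPoolExecutor(1)
            val timeIntervalNumTasksCheck = 2L // assume the interval is already expressed in seconds

            // Passing TimeUnit.SECONDS directly makes the `* 1000` conversion unnecessary.
            val future = scheduler.schedule(
              new Runnable {
                // Stand-in for re-posting the JobSubmitted event.
                override def run(): Unit = println("re-submitting the job")
              },
              timeIntervalNumTasksCheck,
              TimeUnit.SECONDS)

            future.get() // block until the delayed task has run, then shut down
            scheduler.shutdown()
          }
        }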

