[ https://issues.apache.org/jira/browse/SPARK-12009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029427#comment-15029427 ]

SuYan commented on SPARK-12009:
-------------------------------

We ran into this on Spark 1.4.0, and after checking the current 1.5.2 branch, the problem still exists.

Assume the user calls sc.stop() in main(); at that point the user thread is still running inside the ApplicationMaster, right? sc.stop() then drives DAGScheduler.stop -> TaskSchedulerImpl.stop -> schedulerBackend.stop -> stopExecutors:
{code}
override def stop() {
  stopExecutors()
  try {
    if (driverEndpoint != null) {
      driverEndpoint.askWithRetry[Boolean](StopDriver)
    }
  } catch {
    case e: Exception =>
      throw new SparkException("Error stopping standalone scheduler's driver endpoint", e)
  }
}
{code}
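For reference, a minimal sketch of the user-code pattern that can open this window (the app below is hypothetical, assumed to be submitted in yarn-cluster mode):
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical application; class and app name are made up for illustration.
object StopEarlyApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stop-early"))
    sc.parallelize(1 to 100).count()
    // sc.stop() shuts the executors down, but main() has not returned yet,
    // so the AM's reporter thread is still in its while (!finished) loop.
    sc.stop()
    // Any remaining work here widens the window in which YarnAllocator can
    // re-request containers for executors that were just told to exit.
    Thread.sleep(30000)
  }
}
{code}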

Meanwhile the ApplicationMaster has not yet marked the user application as finished, so its reporter thread keeps allocating:
{code} 
while (!finished) {
  try {
    if (allocator.getNumExecutorsFailed >= maxNumExecutorFailures) {
      finish(FinalApplicationStatus.FAILED,
        ApplicationMaster.EXIT_MAX_EXECUTOR_FAILURES,
        s"Max number of executor failures ($maxNumExecutorFailures) reached")
    } else {
      logDebug("Sending progress")
      allocator.allocateResources()
    }
    failureCount = 0
  } // catch and retry handling elided from this excerpt
{code}
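One possible direction for a fix, sketched below: once the driver has begun stopping executors, the reporter loop should stop calling allocateResources() so the exited executors are not replaced. All names here (ReporterLoop, markDriverStopping) are hypothetical, not Spark's actual code:
{code}
// Hedged sketch of a guarded reporter loop; not the real ApplicationMaster.
class ReporterLoop(allocate: () => Unit, intervalMs: Long = 3000) {
  @volatile private var finished = false
  @volatile private var driverRequestedStop = false

  // Called from the RPC handler when the driver asks executors to shut down.
  def markDriverStopping(): Unit = { driverRequestedStop = true }

  // Called when the user application is marked finished.
  def markFinished(): Unit = { finished = true }

  def run(): Unit = {
    while (!finished) {
      // Once shutdown has started, stop asking YARN for containers so the
      // executors the driver just stopped are not re-allocated.
      if (!driverRequestedStop) {
        allocate()
      }
      Thread.sleep(intervalMs)
    }
  }
}
{code}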


By the way, I find that in branch 1.5.2 the AM still logs "Shutting down" even in cluster mode, where it does not actually shut down:
{code}
override def onDisconnected(remoteAddress: RpcAddress): Unit = {
  logInfo(s"Driver terminated or disconnected! Shutting down. $remoteAddress")
  // In cluster mode, do not rely on the disassociated event to exit.
  // This avoids potentially reporting incorrect exit codes if the driver fails.
  if (!isClusterMode) {
    finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
  }
}
{code}
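If skipping the exit in cluster mode is intentional, the "Shutting down" wording could at least move onto the branch that really shuts down. A hedged sketch of such a rewording (my suggestion, not the actual 1.5.2 code):
{code}
override def onDisconnected(remoteAddress: RpcAddress): Unit = {
  if (!isClusterMode) {
    // Client mode: the disconnect really does trigger shutdown, so say so.
    logInfo(s"Driver terminated or disconnected! Shutting down. $remoteAddress")
    finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
  } else {
    // Cluster mode: do not rely on the disassociated event to exit, and do
    // not claim to be shutting down when we are not.
    logInfo(s"Driver terminated or disconnected: $remoteAddress")
  }
}
{code}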

> Avoid re-allocating YARN containers while the driver is stopping all executors
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-12009
>                 URL: https://issues.apache.org/jira/browse/SPARK-12009
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.5.2
>            Reporter: SuYan
>            Priority: Minor
>
> Log based on 1.4.0:
> 2015-11-26,03:05:16,176 WARN org.spark-project.jetty.util.thread.QueuedThreadPool: 8 threads could not be stopped
> 2015-11-26,03:05:16,177 INFO org.apache.spark.ui.SparkUI: Stopped Spark web UI at http://
> 2015-11-26,03:05:16,401 INFO org.apache.spark.scheduler.DAGScheduler: Stopping DAGScheduler
> 2015-11-26,03:05:16,450 INFO org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend: Shutting down all executors
> 2015-11-26,03:05:16,525 INFO org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend: Asking each executor to shut down
> 2015-11-26,03:05:16,791 INFO org.apache.spark.deploy.yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. XX.XX.XX.XX:38734
> 2015-11-26,03:05:16,847 ERROR org.apache.spark.scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(164,WrappedArray())
> 2015-11-26,03:05:27,242 INFO org.apache.spark.deploy.yarn.YarnAllocator: Will request 13 executor containers, each with 1 cores and 4608 MB memory including 1024 MB overhead


