[jira] [Updated] (SPARK-13642) Inconsistent finishing state between driver and AM

Saisai Shao (JIRA) Thu, 03 Mar 2016 01:11:20 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Saisai Shao updated SPARK-13642:
--------------------------------
    Description: 
Currently when running Spark on Yarn with yarn cluster mode, the default 
application final state is "SUCCEED", if any exception is occurred, this final 
state will be changed to "FAILED" and trigger the reattempt if possible. 

This is OK in normal case, but if there's a race condition when AM received a 
signal (SIGTERM) and no any exception is occurred. In this situation, shutdown 
hook will be invoked and marked this application as finished with success, and 
there's no another attempt.

In such situation, actually from Spark's aspect this application is failed and 
need another attempt, but from Yarn's aspect the application is finished with 
success.

This could happened in NM failure situation, the failure of NM will send 
SIGTERM to AM, AM should mark this attempt as failure and rerun again, not 
invoke unregister.

So to increase the chance of this race condition, here is the reproduced code:

{code}
val sc = ...
Thread.sleep(30000L)
sc.parallelize(1 to 100).collect()
{code}

If the AM is failed in sleeping, there's no exception been thrown, so from 
Yarn's point this application is finished successfully, but from Spark's point, 
this application should be reattempted.

The log normally like this:

{noformat}
16/03/03 12:44:19 INFO ContainerManagementProtocolProxy: Opening proxy : 
192.168.0.105:45454
16/03/03 12:44:21 INFO YarnClusterSchedulerBackend: Registered executor 
NettyRpcEndpointRef(null) (192.168.0.105:57461) with ID 2
16/03/03 12:44:21 INFO BlockManagerMasterEndpoint: Registering block manager 
192.168.0.105:57462 with 511.1 MB RAM, BlockManagerId(2, 192.168.0.105, 57462)
16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: Registered executor 
NettyRpcEndpointRef(null) (192.168.0.105:57467) with ID 1
16/03/03 12:44:23 INFO BlockManagerMasterEndpoint: Registering block manager 
192.168.0.105:57468 with 511.1 MB RAM, BlockManagerId(1, 192.168.0.105, 57468)
16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/03/03 12:44:23 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook 
done
16/03/03 12:44:39 ERROR ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/03/03 12:44:39 INFO SparkContext: Invoking stop() from shutdown hook
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/metrics/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/api,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/static,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/executors/threadDump,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/executors/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/executors,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/environment/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/environment,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage/rdd,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/pool/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/pool,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/stage/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/stage,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/jobs/job/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/jobs/job,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/jobs/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/jobs,null}
16/03/03 12:44:39 INFO SparkUI: Stopped Spark web UI at 
http://192.168.0.105:57452
16/03/03 12:44:39 INFO YarnAllocator: Driver requested a total number of 0 
executor(s).
16/03/03 12:44:39 INFO YarnAllocator: Canceling requests for 0 executor 
containers
16/03/03 12:44:39 INFO YarnClusterSchedulerBackend: Shutting down all executors
16/03/03 12:44:39 WARN YarnAllocator: Expected to find pending requests, but 
found none.
16/03/03 12:44:39 INFO YarnClusterSchedulerBackend: Asking each executor to 
shut down
16/03/03 12:44:39 INFO SchedulerExtensionServices: Stopping 
SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
16/03/03 12:44:39 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
16/03/03 12:44:39 INFO MemoryStore: MemoryStore cleared
16/03/03 12:44:39 INFO BlockManager: BlockManager stopped
16/03/03 12:44:39 INFO BlockManagerMaster: BlockManagerMaster stopped
16/03/03 12:44:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
16/03/03 12:44:39 INFO SparkContext: Successfully stopped SparkContext
16/03/03 12:44:39 INFO ApplicationMaster: Final app status: SUCCEEDED, 
exitCode: 0, (reason: Shutdown hook called before final status was reported.)
16/03/03 12:44:39 INFO ApplicationMaster: Unregistering ApplicationMaster with 
SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
16/03/03 12:44:39 INFO AMRMClientImpl: Waiting for application to be 
successfully unregistered.
16/03/03 12:44:39 INFO ApplicationMaster: Deleting staging directory 
.sparkStaging/application_1456975940304_0004
16/03/03 12:44:40 INFO ShutdownHookManager: Shutdown hook called
{noformat}

So basically, I think only after the finish of user class, we could mark this 
application as "SUCCESS", otherwise, especially in the signal stopped scenario, 
it would be better to mark as failed and try again (except explicitly KILL 
command by yarn).


  was:
Currently when running Spark on Yarn with yarn cluster mode, the default 
application final state is "SUCCEED", if any exception is occurred, this final 
state will be changed to "FAILED" and trigger the reattempt if possible. 

This is OK in normal case, but if there's a race condition when AM received a 
signal (SIGTERM) and no any exception is occurred. In this situation, shutdown 
hook will be invoked and marked this application as finished with success, and 
there's no another attempt.

In such situation, actually from Spark's aspect this application is failed and 
need another attempt, but from Yarn's aspect the application is finished with 
success.

This could happened in NM failure situation, the failure of NM will send 
SIGTERM to AM, AM should make this attempt as failure and rerun again, not 
invoke unregister.

So to increase the chance of this race condition, here is the reproduced code:

{code}
val sc = ...
Thread.sleep(30000L)
sc.parallelize(1 to 100).collect()
{code}

If the AM is failed in sleeping, there's no exception been thrown, so from 
Yarn's point this application is finished successfully, but from Spark's point, 
this application should be reattempted.

The log normally like this:

{noformat}
16/03/03 12:44:19 INFO ContainerManagementProtocolProxy: Opening proxy : 
192.168.0.105:45454
16/03/03 12:44:21 INFO YarnClusterSchedulerBackend: Registered executor 
NettyRpcEndpointRef(null) (192.168.0.105:57461) with ID 2
16/03/03 12:44:21 INFO BlockManagerMasterEndpoint: Registering block manager 
192.168.0.105:57462 with 511.1 MB RAM, BlockManagerId(2, 192.168.0.105, 57462)
16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: Registered executor 
NettyRpcEndpointRef(null) (192.168.0.105:57467) with ID 1
16/03/03 12:44:23 INFO BlockManagerMasterEndpoint: Registering block manager 
192.168.0.105:57468 with 511.1 MB RAM, BlockManagerId(1, 192.168.0.105, 57468)
16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/03/03 12:44:23 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook 
done
16/03/03 12:44:39 ERROR ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/03/03 12:44:39 INFO SparkContext: Invoking stop() from shutdown hook
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/metrics/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/api,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/static,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/executors/threadDump,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/executors/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/executors,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/environment/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/environment,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage/rdd,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/pool/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/pool,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/stage/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/stage,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/jobs/job/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/jobs/job,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/jobs/json,null}
16/03/03 12:44:39 INFO ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/jobs,null}
16/03/03 12:44:39 INFO SparkUI: Stopped Spark web UI at 
http://192.168.0.105:57452
16/03/03 12:44:39 INFO YarnAllocator: Driver requested a total number of 0 
executor(s).
16/03/03 12:44:39 INFO YarnAllocator: Canceling requests for 0 executor 
containers
16/03/03 12:44:39 INFO YarnClusterSchedulerBackend: Shutting down all executors
16/03/03 12:44:39 WARN YarnAllocator: Expected to find pending requests, but 
found none.
16/03/03 12:44:39 INFO YarnClusterSchedulerBackend: Asking each executor to 
shut down
16/03/03 12:44:39 INFO SchedulerExtensionServices: Stopping 
SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
16/03/03 12:44:39 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
16/03/03 12:44:39 INFO MemoryStore: MemoryStore cleared
16/03/03 12:44:39 INFO BlockManager: BlockManager stopped
16/03/03 12:44:39 INFO BlockManagerMaster: BlockManagerMaster stopped
16/03/03 12:44:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
16/03/03 12:44:39 INFO SparkContext: Successfully stopped SparkContext
16/03/03 12:44:39 INFO ApplicationMaster: Final app status: SUCCEEDED, 
exitCode: 0, (reason: Shutdown hook called before final status was reported.)
16/03/03 12:44:39 INFO ApplicationMaster: Unregistering ApplicationMaster with 
SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
16/03/03 12:44:39 INFO AMRMClientImpl: Waiting for application to be 
successfully unregistered.
16/03/03 12:44:39 INFO ApplicationMaster: Deleting staging directory 
.sparkStaging/application_1456975940304_0004
16/03/03 12:44:40 INFO ShutdownHookManager: Shutdown hook called
{noformat}

So basically, I think only after the finish of user class, we could mark this 
application as "SUCCESS", otherwise, especially in the signal stopped scenario, 
it would be better to mark as failed and try again (except explicitly KILL 
command by yarn).



> Inconsistent finishing state between driver and AM 
> ---------------------------------------------------
>
>                 Key: SPARK-13642
>                 URL: https://issues.apache.org/jira/browse/SPARK-13642
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.6.0
>            Reporter: Saisai Shao
>
> Currently when running Spark on Yarn with yarn cluster mode, the default 
> application final state is "SUCCEED", if any exception is occurred, this 
> final state will be changed to "FAILED" and trigger the reattempt if 
> possible. 
> This is OK in normal case, but if there's a race condition when AM received a 
> signal (SIGTERM) and no any exception is occurred. In this situation, 
> shutdown hook will be invoked and marked this application as finished with 
> success, and there's no another attempt.
> In such situation, actually from Spark's aspect this application is failed 
> and need another attempt, but from Yarn's aspect the application is finished 
> with success.
> This could happened in NM failure situation, the failure of NM will send 
> SIGTERM to AM, AM should mark this attempt as failure and rerun again, not 
> invoke unregister.
> So to increase the chance of this race condition, here is the reproduced code:
> {code}
> val sc = ...
> Thread.sleep(30000L)
> sc.parallelize(1 to 100).collect()
> {code}
> If the AM is failed in sleeping, there's no exception been thrown, so from 
> Yarn's point this application is finished successfully, but from Spark's 
> point, this application should be reattempted.
> The log normally like this:
> {noformat}
> 16/03/03 12:44:19 INFO ContainerManagementProtocolProxy: Opening proxy : 
> 192.168.0.105:45454
> 16/03/03 12:44:21 INFO YarnClusterSchedulerBackend: Registered executor 
> NettyRpcEndpointRef(null) (192.168.0.105:57461) with ID 2
> 16/03/03 12:44:21 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.0.105:57462 with 511.1 MB RAM, BlockManagerId(2, 192.168.0.105, 57462)
> 16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: Registered executor 
> NettyRpcEndpointRef(null) (192.168.0.105:57467) with ID 1
> 16/03/03 12:44:23 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.0.105:57468 with 511.1 MB RAM, BlockManagerId(1, 192.168.0.105, 57468)
> 16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
> 16/03/03 12:44:23 INFO YarnClusterScheduler: 
> YarnClusterScheduler.postStartHook done
> 16/03/03 12:44:39 ERROR ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/03/03 12:44:39 INFO SparkContext: Invoking stop() from shutdown hook
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/metrics/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/api,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/static,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/environment/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/environment,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/storage/rdd,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/storage/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/storage,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/pool,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/stage,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/jobs/job/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/jobs/job,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/jobs/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/jobs,null}
> 16/03/03 12:44:39 INFO SparkUI: Stopped Spark web UI at 
> http://192.168.0.105:57452
> 16/03/03 12:44:39 INFO YarnAllocator: Driver requested a total number of 0 
> executor(s).
> 16/03/03 12:44:39 INFO YarnAllocator: Canceling requests for 0 executor 
> containers
> 16/03/03 12:44:39 INFO YarnClusterSchedulerBackend: Shutting down all 
> executors
> 16/03/03 12:44:39 WARN YarnAllocator: Expected to find pending requests, but 
> found none.
> 16/03/03 12:44:39 INFO YarnClusterSchedulerBackend: Asking each executor to 
> shut down
> 16/03/03 12:44:39 INFO SchedulerExtensionServices: Stopping 
> SchedulerExtensionServices
> (serviceOption=None,
>  services=List(),
>  started=false)
> 16/03/03 12:44:39 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 16/03/03 12:44:39 INFO MemoryStore: MemoryStore cleared
> 16/03/03 12:44:39 INFO BlockManager: BlockManager stopped
> 16/03/03 12:44:39 INFO BlockManagerMaster: BlockManagerMaster stopped
> 16/03/03 12:44:39 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 16/03/03 12:44:39 INFO SparkContext: Successfully stopped SparkContext
> 16/03/03 12:44:39 INFO ApplicationMaster: Final app status: SUCCEEDED, 
> exitCode: 0, (reason: Shutdown hook called before final status was reported.)
> 16/03/03 12:44:39 INFO ApplicationMaster: Unregistering ApplicationMaster 
> with SUCCEEDED (diag message: Shutdown hook called before final status was 
> reported.)
> 16/03/03 12:44:39 INFO AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 16/03/03 12:44:39 INFO ApplicationMaster: Deleting staging directory 
> .sparkStaging/application_1456975940304_0004
> 16/03/03 12:44:40 INFO ShutdownHookManager: Shutdown hook called
> {noformat}
> So basically, I think only after the finish of user class, we could mark this 
> application as "SUCCESS", otherwise, especially in the signal stopped 
> scenario, it would be better to mark as failed and try again (except 
> explicitly KILL command by yarn).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-13642) Inconsistent finishing state between driver and AM

Reply via email to