[ https://issues.apache.org/jira/browse/SPARK-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-2971:
-----------------------------
    Component/s: YARN

> Orphaned YARN ApplicationMaster lingers forever
> -----------------------------------------------
>
>                 Key: SPARK-2971
>                 URL: https://issues.apache.org/jira/browse/SPARK-2971
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.2
>         Environment: Python yarn client mode, Cloudera 5.1.0 on Ubuntu precise
>            Reporter: Shay Rojansky
>
> We have seen cases where, if CTRL-C is hit while a Spark job is starting up, a 
> YARN ApplicationMaster (AM) is created but cannot connect to the driver 
> (presumably because the driver has already terminated). Once an AM enters this 
> state it never exits and has to be killed manually in YARN.
> Here's an excerpt from the AM logs:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/yarn/nm/usercache/roji/filecache/40/spark-assembly-1.0.2-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 14/08/11 16:29:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/08/11 16:29:39 INFO SecurityManager: Changing view acls to: roji
> 14/08/11 16:29:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(roji)
> 14/08/11 16:29:40 INFO Slf4jLogger: Slf4jLogger started
> 14/08/11 16:29:40 INFO Remoting: Starting remoting
> 14/08/11 16:29:40 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkyar...@g024.grid.eaglerd.local:34075]
> 14/08/11 16:29:40 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkyar...@g024.grid.eaglerd.local:34075]
> 14/08/11 16:29:40 INFO RMProxy: Connecting to ResourceManager at master.grid.eaglerd.local/192.168.41.100:8030
> 14/08/11 16:29:40 INFO ExecutorLauncher: ApplicationAttemptId: appattempt_1407759736957_0014_000001
> 14/08/11 16:29:40 INFO ExecutorLauncher: Registering the ApplicationMaster
> 14/08/11 16:29:40 INFO ExecutorLauncher: Waiting for Spark driver to be reachable.
> 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ...
> 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ...
> 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ...
> 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ...
> 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ...
> {noformat}
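
The log excerpt above shows the AM retrying the driver connection with no visible upper bound. As an illustration only, and not Spark's actual ExecutorLauncher code, here is a minimal Python sketch of a bounded wait: it keeps retrying until a deadline passes and then gives up, so the caller can exit instead of lingering. The function name `wait_for_driver` and the parameters `max_wait_s` and `retry_interval_s` are hypothetical and do not correspond to any Spark configuration.

```python
import socket
import time


def wait_for_driver(host, port, max_wait_s=60.0, retry_interval_s=1.0,
                    connect=socket.create_connection, sleep=time.sleep,
                    clock=time.monotonic):
    """Try to reach the driver over TCP until max_wait_s has elapsed.

    Returns True as soon as a connection succeeds, or False once the
    deadline has passed. The connect/sleep/clock hooks are injectable
    (hypothetical, for testing only).
    """
    deadline = clock() + max_wait_s
    attempts = 0
    while True:
        attempts += 1
        try:
            conn = connect((host, port), timeout=retry_interval_s)
            conn.close()
            return True
        except OSError as exc:
            # Mirrors the "Failed to connect to driver" lines in the log.
            print(f"Failed to connect to driver at {host}:{port} "
                  f"(attempt {attempts}): {exc}")
        if clock() >= deadline:
            # Give up instead of retrying forever.
            return False
        sleep(retry_interval_s)
```

With a check like this, a caller that receives False could clean up and terminate, which would avoid the need to kill the application manually in YARN.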



