[ https://issues.apache.org/jira/browse/SPARK-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-2971:
-----------------------------
    Component/s: YARN

> Orphaned YARN ApplicationMaster lingers forever
> -----------------------------------------------
>
>                 Key: SPARK-2971
>                 URL: https://issues.apache.org/jira/browse/SPARK-2971
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.2
>         Environment: Python yarn client mode, Cloudera 5.1.0 on Ubuntu precise
>            Reporter: Shay Rojansky
>
> We have cases where if CTRL-C is hit during a Spark job startup, a YARN
> ApplicationMaster is created but cannot connect to the driver (presumably
> because the driver has terminated). Once an AM enters this state it never
> exits it, and has to be manually killed in YARN.
> Here's an excerpt from the AM logs:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/yarn/nm/usercache/roji/filecache/40/spark-assembly-1.0.2-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 14/08/11 16:29:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/08/11 16:29:39 INFO SecurityManager: Changing view acls to: roji
> 14/08/11 16:29:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(roji)
> 14/08/11 16:29:40 INFO Slf4jLogger: Slf4jLogger started
> 14/08/11 16:29:40 INFO Remoting: Starting remoting
> 14/08/11 16:29:40 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkyar...@g024.grid.eaglerd.local:34075]
> 14/08/11 16:29:40 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkyar...@g024.grid.eaglerd.local:34075]
> 14/08/11 16:29:40 INFO RMProxy: Connecting to ResourceManager at master.grid.eaglerd.local/192.168.41.100:8030
> 14/08/11 16:29:40 INFO ExecutorLauncher: ApplicationAttemptId: appattempt_1407759736957_0014_000001
> 14/08/11 16:29:40 INFO ExecutorLauncher: Registering the ApplicationMaster
> 14/08/11 16:29:40 INFO ExecutorLauncher: Waiting for Spark driver to be reachable.
> 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ...
> 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ...
> 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ...
> 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ...
> 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ...
> {noformat}
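
The report describes an ApplicationMaster that keeps retrying the driver connection and never gives up. As an illustration only, here is a minimal Python sketch of a wait loop that stops after a deadline instead of retrying forever. It is not Spark's ExecutorLauncher code; the names wait_for_driver, max_wait_s, interval_s and the injectable connect callable are all hypothetical, chosen for this example.

```python
import socket
import time


def _tcp_connect(host, port, timeout_s=1.0):
    # Open and immediately close a TCP connection; raises OSError on failure.
    with socket.create_connection((host, port), timeout=timeout_s):
        pass


def wait_for_driver(host, port, max_wait_s=30.0, interval_s=0.1, connect=None):
    """Return True once the driver accepts a connection.

    Return False if it is still unreachable when max_wait_s has elapsed,
    so the caller can unregister and exit rather than linger.
    All parameter names here are hypothetical, not Spark settings.
    """
    connect = connect or _tcp_connect
    deadline = time.monotonic() + max_wait_s
    while True:
        try:
            connect(host, port)
            return True
        except OSError:
            # Covers ConnectionRefusedError, timeouts and unreachable hosts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A caller would treat a False result as a signal to shut down cleanly, for example by reporting a failed final status to the ResourceManager, rather than continuing to wait for a driver that has already terminated.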