[ https://issues.apache.org/jira/browse/SPARK-9711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guangyang Li updated SPARK-9711:
--------------------------------
    Description:
With Spark 1.4.1 in YARN client mode, my application works the first time the cluster is built. However, if I stop and start the cluster using spark-ec2, the same command fails. The end of the Spark logs shows that it keeps trying to connect to the master node repeatedly:
INFO Client: Retrying connect to server: ec2-54-174-232-129.compute-1.amazonaws.com/172.31.36.29:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
I restarted YARN and DFS manually after restarting the cluster. However, I was unable to restart Tachyon, and ./bin/tachyon runTests fails, which might be the cause.

  was:
With Spark 1.4.1 in YARN client mode, my application works the first time the cluster is built. However, if I stop and start the cluster using spark-ec2, the same command fails. The end of the Spark logs shows that it keeps trying to connect to the master node repeatedly:
INFO Client: Retrying connect to server: ec2-54-174-232-129.compute-1.amazonaws.com/172.31.36.29:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)


> Unable to run spark after restarting cluster with spark-ec2
> -----------------------------------------------------------
>
>                 Key: SPARK-9711
>                 URL: https://issues.apache.org/jira/browse/SPARK-9711
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 1.4.1
>            Reporter: Guangyang Li
>
> With Spark 1.4.1 in YARN client mode, my application works the first time
> the cluster is built. However, if I stop and start the cluster using
> spark-ec2, the same command fails.
> The end of the Spark logs shows that it keeps trying to connect to the
> master node repeatedly:
> INFO Client: Retrying connect to server:
> ec2-54-174-232-129.compute-1.amazonaws.com/172.31.36.29:8032. Already tried 0
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> I restarted YARN and DFS manually after restarting the cluster. However, I
> was unable to restart Tachyon, and ./bin/tachyon runTests fails, which
> might be the cause.
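
One way to narrow this down might be to pull the host, IP, and port out of the retry line and check whether anything is listening there. The sketch below is only a diagnostic aid, not part of Spark or Hadoop: `parse_retry_line` and `is_reachable` are hypothetical helper names, and the regular expression assumes the log line has exactly the shape quoted in the description. Port 8032 is the default port of the YARN ResourceManager address, so a failed connection would suggest the ResourceManager is not running on, or not bound to, the address the client has configured.

```python
import re
import socket
from typing import NamedTuple, Optional


class RetryTarget(NamedTuple):
    """Fields extracted from a 'Retrying connect to server' log line."""
    host: str
    ip: str
    port: int
    attempts: int


# Assumes the line format quoted in the issue description.
_RETRY_RE = re.compile(
    r"Retrying connect to server: (?P<host>[^/\s]+)/(?P<ip>[\d.]+):(?P<port>\d+)\. "
    r"Already tried (?P<attempts>\d+) time\(s\)"
)


def parse_retry_line(line: str) -> Optional[RetryTarget]:
    """Return the retry target named in the log line, or None if it does not match."""
    m = _RETRY_RE.search(line)
    if m is None:
        return None
    return RetryTarget(m["host"], m["ip"], int(m["port"]), int(m["attempts"]))


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from the machine that submits the job, `parse_retry_line(line)` gives the target, and `is_reachable(target.ip, target.port)` and `is_reachable(target.host, target.port)` can then be compared. If the two disagree, the hostname may resolve to a different address than the one in the log after the restart; if both fail, the ResourceManager is probably not listening at all.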