[ https://issues.apache.org/jira/browse/SPARK-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212062#comment-14212062 ]
Andrew Ash commented on SPARK-625:
----------------------------------

Spark is very sensitive to hostnames in Spark URLs, and that sensitivity comes from Akka. I've personally been bitten by hostnames vs. FQDNs vs. external IP addresses vs. loopback IP addresses, and it's really a pain.

On the current master branch (1.2), with the Spark standalone master listening on {{spark://aash-mbp.local:7077}} (as confirmed by the master web UI) and the spark shell attempting to connect to {{spark://127.0.0.1:7077}} via the {{--master}} parameter, the driver makes 3 attempts and then fails with this message:

{noformat}
14/11/14 01:37:56 INFO AppClient$ClientActor: Connecting to master spark://127.0.0.1:7077...
14/11/14 01:37:56 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@127.0.0.1:7077
14/11/14 01:37:56 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:7077
14/11/14 01:38:16 INFO AppClient$ClientActor: Connecting to master spark://127.0.0.1:7077...
14/11/14 01:38:16 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:7077
14/11/14 01:38:16 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@127.0.0.1:7077
14/11/14 01:38:36 INFO AppClient$ClientActor: Connecting to master spark://127.0.0.1:7077...
14/11/14 01:38:36 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:7077
14/11/14 01:38:36 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@127.0.0.1:7077
14/11/14 01:38:56 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/11/14 01:38:56 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
14/11/14 01:38:56 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
{noformat}

So the hang seems to be gone, replaced by a reasonable 3 attempts followed by a clean failure.

[~joshrosen], short of changing Akka ourselves to make it less strict about exact URL matches, is there anything else we can do for this ticket? I think we can reasonably close it as fixed.

> Client hangs when connecting to standalone cluster using wrong address
> ----------------------------------------------------------------------
>
>                 Key: SPARK-625
>                 URL: https://issues.apache.org/jira/browse/SPARK-625
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.7.1, 0.8.0
>            Reporter: Josh Rosen
>            Priority: Minor
>
> I launched a standalone cluster on my laptop, connecting the workers to the
> master using my machine's public IP address (128.32.*.*:7077). If I try to
> connect spark-shell to the master using "spark://0.0.0.0:7077", it
> successfully brings up a Scala prompt but hangs when I try to run a job.
>
> From the standalone master's log, it looks like the client's messages are
> being dropped without the client discovering that the connection has failed:
> {code}
> 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message
> RegisterJob(JobDescription(Spark shell)) for non-local recipient
> akka://spark@0.0.0.0:7077/user/Master at akka://spark@128.32.*.*:7077 local
> is akka://spark@128.32.*.*:7077
> 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message
> DaemonMsgWatch(Actor[akka://spark@128.32.*.*:57518/user/$a],Actor[akka://spark@0.0.0.0:7077/user/Master])
> for non-local recipient akka://spark@0.0.0.0:7077/remote at
> akka://spark@128.32.*.*:7077 local is akka://spark@128.32.*.*:7077
> {code}
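The master-side log above shows a message addressed to akka://spark@0.0.0.0:7077 being dropped as "non-local" by a system whose local address is akka://spark@128.32.*.*:7077, even though both reach the same machine. One way to picture this is as a literal comparison of address components: protocol, actor-system name, host string, and port must all match exactly, with no DNS resolution. The sketch below is a simplified Python model of that behavior, not Akka's actual implementation; `parse_akka_address` and `is_local_recipient` are hypothetical helpers, and the sample addresses reuse the hostnames from the 1.2 reproduction in the comment.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_akka_address(addr: str) -> Tuple[str, str, str, int]:
    """Split 'proto://system@host:port/path' into (proto, system, host, port).

    Hypothetical helper: the host is kept exactly as written, with no DNS
    lookup, to model the strict matching seen in the logs.
    """
    parts = urlsplit(addr)  # netloc is 'system@host:port'
    system, _, hostport = parts.netloc.partition("@")
    host, _, port = hostport.rpartition(":")
    return parts.scheme, system, host, int(port)


def is_local_recipient(recipient: str, local: str) -> bool:
    """Hypothetical model: treat the recipient as local only if every
    address component matches the local address exactly."""
    return parse_akka_address(recipient) == parse_akka_address(local)


local = "akka.tcp://sparkMaster@aash-mbp.local:7077"

# Same machine, different spelling of the host: treated as non-local.
print(is_local_recipient("akka.tcp://sparkMaster@127.0.0.1:7077/user/Master", local))  # False

# Exactly the host string the master advertises: treated as local.
print(is_local_recipient("akka.tcp://sparkMaster@aash-mbp.local:7077/user/Master", local))  # True
```

Under this model, the practical workaround is simply to pass the exact URL shown in the master web UI to {{--master}}, rather than any other name or IP that happens to resolve to the same host.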