[ https://issues.apache.org/jira/browse/SPARK-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212062#comment-14212062 ]

Andrew Ash commented on SPARK-625:
----------------------------------

Spark is very sensitive to the exact hostname in Spark URLs, and that sensitivity
comes from Akka, which is very strict about exact address matches. I've personally
been bitten by hostnames vs. FQDNs vs. external IP addresses vs. loopback IP
addresses, and it's a real pain.
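To make that strictness concrete, here is a toy Python model (not Akka's or Spark's actual code) of an exact-match rule. A {{spark://host:port}} URL is mapped to the {{akka.tcp://sparkMaster@host:port}} form that appears in the logs, and the receiving side only handles a message if that address string is identical to its own bound address. The helper names {{to_akka_address}} and {{accepts}} are hypothetical. No DNS resolution is modeled, which is the point: two names for the same machine are still different strings. The "dropped" outcome mirrors the older "dropping message ... for non-local recipient" log in the quoted description.

```python
from urllib.parse import urlparse


def to_akka_address(spark_url: str, system: str = "sparkMaster") -> str:
    """Map spark://host:port to the akka.tcp address form seen in the logs."""
    parsed = urlparse(spark_url)
    if parsed.scheme != "spark" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"not a spark://host:port URL: {spark_url!r}")
    return f"akka.tcp://{system}@{parsed.hostname}:{parsed.port}"


def accepts(bound_address: str, recipient_address: str) -> bool:
    """Toy model: only messages addressed to exactly the bound address are
    handled; anything else is treated as a non-local recipient."""
    return bound_address == recipient_address


# The master is bound under the name shown in its web UI.
master = to_akka_address("spark://aash-mbp.local:7077")

for url in ["spark://aash-mbp.local:7077",
            "spark://127.0.0.1:7077",
            "spark://localhost:7077"]:
    verdict = "accepted" if accepts(master, to_akka_address(url)) else "dropped"
    print(url, "->", verdict)
```

Only the first URL is accepted, even though all three may refer to the same laptop.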

On the current master branch (1.2), with the Spark standalone master listening on
{{spark://aash-mbp.local:7077}} (as confirmed by the master web UI) and the
Spark shell attempting to connect to {{spark://127.0.0.1:7077}} via the
{{--master}} parameter, the driver makes three attempts and then fails with this
message:

{noformat}
14/11/14 01:37:56 INFO AppClient$ClientActor: Connecting to master spark://127.0.0.1:7077...
14/11/14 01:37:56 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@127.0.0.1:7077
14/11/14 01:37:56 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:7077
14/11/14 01:38:16 INFO AppClient$ClientActor: Connecting to master spark://127.0.0.1:7077...
14/11/14 01:38:16 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:7077
14/11/14 01:38:16 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@127.0.0.1:7077
14/11/14 01:38:36 INFO AppClient$ClientActor: Connecting to master spark://127.0.0.1:7077...
14/11/14 01:38:36 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:7077
14/11/14 01:38:36 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@127.0.0.1:7077
14/11/14 01:38:56 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/11/14 01:38:56 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
14/11/14 01:38:56 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
{noformat}

So the hang appears to be gone, replaced by a reasonable three connection
attempts followed by a clean failure.
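The new behavior in the log (attempts at 01:37:56, 01:38:16 and 01:38:36, then giving up at 01:38:56) amounts to a bounded retry loop. The sketch below is a simplified Python model of that pattern, not Spark's {{AppClient}} implementation. The constants {{REGISTRATION_RETRIES = 3}} and {{REGISTRATION_TIMEOUT_S = 20}} are only inferred from the timestamps, and {{try_register}} is a hypothetical stand-in for one registration attempt. A simulated clock is used so nothing actually sleeps.

```python
from typing import Callable, List, Tuple

REGISTRATION_RETRIES = 3       # inferred: three "Connecting to master" lines
REGISTRATION_TIMEOUT_S = 20.0  # inferred: 20-second gaps between attempts


class MastersUnresponsive(Exception):
    """Raised once every registration attempt has failed."""


def register_with_master(try_register: Callable[[], bool],
                         log: Callable[[float, str], None]) -> float:
    """Try to register up to REGISTRATION_RETRIES times, waiting
    REGISTRATION_TIMEOUT_S after each failure, then give up.
    Returns the simulated time (seconds) at which registration succeeded."""
    clock = 0.0  # simulated seconds since the first attempt
    for attempt in range(1, REGISTRATION_RETRIES + 1):
        log(clock, f"Connecting to master (attempt {attempt})...")
        if try_register():
            return clock
        clock += REGISTRATION_TIMEOUT_S  # wait out the timeout before retrying
    log(clock, "All masters are unresponsive! Giving up.")
    raise MastersUnresponsive("All masters are unresponsive! Giving up.")


# Usage: a master that never answers, as with the mismatched address above.
events: List[Tuple[float, str]] = []
try:
    register_with_master(lambda: False, lambda t, msg: events.append((t, msg)))
except MastersUnresponsive as err:
    print("ERROR:", err)
for t, msg in events:
    print(f"t={t:.0f}s  {msg}")
```

The important difference from the 0.7-era hang is that the client itself detects the failure and surfaces an error, instead of its messages silently disappearing on the master side.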

[~joshrosen], short of changing Akka ourselves to make it less strict about exact
URL matches, is there anything else we can do for this ticket? I think we can
reasonably close it as fixed.

> Client hangs when connecting to standalone cluster using wrong address
> ----------------------------------------------------------------------
>
>                 Key: SPARK-625
>                 URL: https://issues.apache.org/jira/browse/SPARK-625
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.7.1, 0.8.0
>            Reporter: Josh Rosen
>            Priority: Minor
>
> I launched a standalone cluster on my laptop, connecting the workers to the 
> master using my machine's public IP address (128.32.*.*:7077).  If I try to 
> connect spark-shell to the master using "spark://0.0.0.0:7077", it 
> successfully brings up a Scala prompt but hangs when I try to run a job.
> From the standalone master's log, it looks like the client's messages are 
> being dropped without the client discovering that the connection has failed:
> {code}
> 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message RegisterJob(JobDescription(Spark shell)) for non-local recipient akka://spark@0.0.0.0:7077/user/Master at akka://spark@128.32.*.*:7077 local is akka://spark@128.32.*.*:7077
> 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message DaemonMsgWatch(Actor[akka://spark@128.32.*.*:57518/user/$a],Actor[akka://spark@0.0.0.0:7077/user/Master]) for non-local recipient akka://spark@0.0.0.0:7077/remote at akka://spark@128.32.*.*:7077 local is akka://spark@128.32.*.*:7077
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
