[ https://issues.apache.org/jira/browse/SPARK-17607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15559882#comment-15559882 ]
Sean Owen commented on SPARK-17607:
-----------------------------------

Same as https://issues.apache.org/jira/browse/SPARK-4563 ?
It's not worth pinging issues please; it doesn't cause people to work on it. You can research and follow this and other related JIRAs instead.

> --driver-url doesn't point to my master_ip.
> -------------------------------------------
>
>                 Key: SPARK-17607
>                 URL: https://issues.apache.org/jira/browse/SPARK-17607
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.2
>            Reporter: Sasi
>            Priority: Critical
>
> Hi,
> I have a master machine and a slave machine.
> My master machine has 2 interfaces.
> The first interface has the IP 10.5.5.2, and the other interface has
> the IP 10.0.42.230.
> I configured MASTER_IP to be 10.5.5.2, so once the master and its
> worker come up I see the following INFO lines:
> {code}
> 16/09/20 12:32:32 INFO Worker: Successfully registered with master
> spark://10.5.5.2:7077
> 16/09/20 12:39:15 INFO Worker: Asked to launch executor
> app-20160920123915-0000/0 for Spark-DataAccessor-JBoss
> {code}
> I set SPARK_LOCAL_IP on each worker to its own IP, e.g. 10.5.5.5.
> Both variables were configured in spark-env.sh.
> The problem started when I tried to get data from my workers.
> I got the following INFO line in each worker log.
> {code}
> "--driver-url"
> "akka.tcp://sparkDriver@10.0.42.230:43683/user/CoarseGrainedScheduler" "
> {code}
> As you can see, the master IP is different from the driver-url IP.
> The master IP is 10.5.5.2 but the driver-url is 10.0.42.230, therefore I'm getting
> the following errors:
> {code}
> 16/09/20 12:17:57 INFO Slf4jLogger: Slf4jLogger started
> 16/09/20 12:17:57 INFO Remoting: Starting remoting
> 16/09/20 12:17:57 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://driverPropsFetcher@10.5.5.5:34961]
> 16/09/20 12:17:57 INFO Utils: Successfully started service
> 'driverPropsFetcher' on port 34961.
> 16/09/20 12:19:00 WARN ReliableDeliverySupervisor: Association with remote
> system [akka.tcp://sparkDriver@10.0.42.230:36711] has failed, address is now
> gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://sparkDriver@10.0.42.230:36711]] Caused by: [Connection timed out:
> /10.0.42.230:36711]
> Exception in thread "main" akka.actor.ActorNotFound: Actor not found for:
> ActorSelection[Anchor(akka.tcp://sparkDriver@10.0.42.230:36711/),
> Path(/user/CoarseGrainedScheduler)]
>       at
> {code}
> {code}
> "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url"
> "akka.tcp://sparkDriver@10.0.42.230:43683/user/CoarseGrainedScheduler"
> {code}
> The master is listening and open for communication via 10.5.5.2 and not
> 10.0.42.230.
> It looks like the driver-url ignores the real MASTER_IP.
> Thanks,
> Sasi
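
For anyone following this issue: the address mismatch described in the report can be checked mechanically from the two URLs that appear in the worker log. The snippet below is only a minimal sketch, not part of Spark; the helper names `url_host` and `same_host` are made up for illustration, and the two URLs are copied from the log excerpts quoted above.

```python
from urllib.parse import urlparse


def url_host(url: str) -> str:
    """Return the host part of a URL such as spark://host:port or
    akka.tcp://name@host:port/path (hypothetical helper)."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host found in {url!r}")
    return host


def same_host(master_url: str, driver_url: str) -> bool:
    """True when the master URL and the driver URL name the same host."""
    return url_host(master_url) == url_host(driver_url)


if __name__ == "__main__":
    # Values taken from the log lines in the report.
    master = "spark://10.5.5.2:7077"
    driver = "akka.tcp://sparkDriver@10.0.42.230:43683/user/CoarseGrainedScheduler"
    print("master host:", url_host(master))
    print("driver host:", url_host(driver))
    print("same host:", same_host(master, driver))
```

Note that the `--driver-url` value is the address advertised by the driver process, not by the master, so the two hosts can differ even when the master is configured correctly. Spark does have a `spark.driver.host` setting for the address the driver advertises; whether setting it resolves this particular report is an assumption, not something confirmed in this thread.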