[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146689#comment-15146689 ]
Christopher Bourez edited comment on SPARK-13317 at 2/14/16 6:58 PM:
---------------------------------------------------------------------

I launch a cluster:

```
./ec2/spark-ec2 -k sparkclusterkey -i ~/sparkclusterkey.pem --region=eu-west-1 --copy-aws-credentials --instance-type=m1.large -s 4 --hadoop-major-version=2 launch spark-cluster
```

which gives me a master at ec2-54-229-16-73.eu-west-1.compute.amazonaws.com and slaves at ec2-54-194-99-236.eu-west-1.compute.amazonaws.com, etc.

If I launch a job in client mode from another network, for example in a Zeppelin notebook on my MacBook, with a configuration equivalent to:

```
spark-shell --master=spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077
```

I see in the logs:

```
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-0000/0 on worker-20160214185030-172.31.4.179-34425 (172.31.4.179:34425) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-0000/0 on hostPort 172.31.4.179:34425 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-0000/1 on worker-20160214185030-172.31.4.176-47657 (172.31.4.176:47657) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-0000/1 on hostPort 172.31.4.176:47657 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-0000/2 on worker-20160214185031-172.31.4.177-41379 (172.31.4.177:41379) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-0000/2 on hostPort 172.31.4.177:41379 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-0000/3 on worker-20160214185032-172.31.4.178-34353 (172.31.4.178:34353) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-0000/3 on hostPort 172.31.4.178:34353 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:64058 with 511.5 MB RAM, BlockManagerId(driver, 192.168.1.11, 64058)
16/02/14 19:55:04 INFO BlockManagerMaster: Registered BlockManager
```

These are private IPs that my MacBook cannot access, and when launching a job, an error follows:

```
16/02/14 19:57:19 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
```

I tried to connect to a slave, set SPARK_LOCAL_IP in the slave's spark-env.sh, and stop and restart all slaves from the master; the Spark master still returns the private IP.

Thanks,

> SPARK_LOCAL_IP does not bind on Slaves
> --------------------------------------
>
>                 Key: SPARK-13317
>                 URL: https://issues.apache.org/jira/browse/SPARK-13317
>             Project: Spark
>          Issue Type: Bug
>         Environment: Linux EC2, different VPC
>            Reporter: Christopher Bourez
>
> SPARK_LOCAL_IP does not bind to the provided IP on slaves.
> When launching a job or a spark-shell from a second network, the returned IP for the slave is still the first IP of the slave.
> So the job fails with the message:
> Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> It is not a question of resources but of the driver, which cannot connect to the slave given the wrong IP.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
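For readers reproducing the reporter's attempted workaround: the comment describes setting SPARK_LOCAL_IP in each slave's spark-env.sh and restarting the workers. A minimal sketch of that configuration follows; SPARK_LOCAL_IP and SPARK_PUBLIC_DNS are documented Spark standalone-mode environment variables, but the specific addresses and the 0.0.0.0 bind choice below are placeholders, not values from this report:

```shell
# Sketch of $SPARK_HOME/conf/spark-env.sh on a slave (addresses are placeholders).
# SPARK_LOCAL_IP sets the address the daemon binds to on this host;
# SPARK_PUBLIC_DNS sets the hostname it advertises to other machines.
export SPARK_LOCAL_IP=0.0.0.0
export SPARK_PUBLIC_DNS=ec2-XX-XX-XX-XX.eu-west-1.compute.amazonaws.com

# Then, from the master, restart all workers so the settings take effect:
#   $SPARK_HOME/sbin/stop-slaves.sh
#   $SPARK_HOME/sbin/start-slaves.sh
```

As the report states, restarting the slaves after setting SPARK_LOCAL_IP did not change the address the master returned, which is the bug being filed here.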