[jira] [Comment Edited] (SPARK-13317) SPARK_LOCAL_IP does not bind to public IP on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148705#comment-15148705 ] Christopher Bourez edited comment on SPARK-13317 at 2/16/16 2:54 PM: - I'm trying my best, second time, but when I specify the public IP with {code}SPARK_PUBLIC_IP{code} in spark-env.sh and restart, I get an error during Spark context initialization in spark-shell: {code} 16/02/16 14:40:51 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1. [... the same warning repeated 16 times ...]
16/02/16 14:40:51 ERROR SparkContext: Error initializing SparkContext. java.net.BindException: {code} Do you have any clue? was (Author: christopher5106): I'm trying my best, second time, but when I specify the public IP with {code}SPARK_PUBLIC_ID{code} I get an error during Spark context initialization in spark-shell: {code} 16/02/16 14:40:51 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1. [... the same warning repeated 16 times ...] 16/02/16 14:40:51 ERROR SparkContext: Error initializing SparkContext. java.net.BindException: {code} Do you have any clue? > SPARK_LOCAL_IP does not bind to public IP on Slaves > --- > > Key: SPARK-13317 > URL: https://issues.apache.org/jira/browse/SPARK-13317 > Project: Spark > Issue Type: Bug > Components: Deploy, EC2 > Environment: Linux EC2, different VPC >Reporter: Christopher Bourez >Priority: Minor > > SPARK_LOCAL_IP does not bind to the provided IP on slaves. > When launching a job or a spark-shell from a second network, the returned IP > for the slave is still the first IP of the slave. > So the job fails with the message : > Initial job has not accepted any resources; check your cluster UI to ensure > that workers are registered and have sufficient resources > It is not a question of resources but the driver which cannot connect to the > slave given the wrong IP.
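The BindException is consistent with how these two variables differ: SPARK_LOCAL_IP is a bind address and must belong to a local network interface, while on EC2 the public IP is NAT-ed and is not on any interface; SPARK_PUBLIC_DNS, by contrast, is only an advertised address. A minimal conf/spark-env.sh sketch (the addresses below are placeholders, not taken from this cluster):

```shell
# conf/spark-env.sh -- placeholder addresses, for illustration only

# Bind address: must be assigned to a local interface, so on EC2 it has to be
# the private IP; binding to the NAT-ed public IP fails with a BindException.
export SPARK_LOCAL_IP=172.31.4.179

# Advertised address: used in links handed to other hosts (e.g. the web UI);
# it may be the public hostname even though no local interface carries it.
export SPARK_PUBLIC_DNS=ec2-54-194-99-236.eu-west-1.compute.amazonaws.com
```

This would explain why the web UI (which uses the advertised address) is reachable while binding to the public IP is not.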
[jira] [Commented] (SPARK-13317) SPARK_LOCAL_IP does not bind to public IP on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148705#comment-15148705 ] Christopher Bourez commented on SPARK-13317: I'm trying my best, second time, but when I specify the public IP with {code}SPARK_PUBLIC_ID{code} I get an error during Spark context initialization in spark-shell: {code} 16/02/16 14:40:51 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1. [... the same warning repeated 16 times ...]
16/02/16 14:40:51 ERROR SparkContext: Error initializing SparkContext. java.net.BindException: {code} Do you have any clue? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13317) SPARK_LOCAL_IP does not bind on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147157#comment-15147157 ] Christopher Bourez commented on SPARK-13317: Let me give it a check again, but as far as I remember, {{SPARK_PUBLIC_DNS}} is already set by default to the public DNS in the conf of the slave; that's why the web UI works well.
[jira] [Commented] (SPARK-13317) SPARK_LOCAL_IP does not bind on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147138#comment-15147138 ] Christopher Bourez commented on SPARK-13317: To confirm: I stop everything with stop-all.sh, then I set SPARK_LOCAL_IP to the public IP on all instances, and then I run start-all.sh again. Am I missing something?
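The restart sequence described above can be sketched as follows for a standalone cluster (paths assume the layout created by spark-ec2; the IP is a placeholder, and note that on EC2 the bind address must be an address of a local interface, i.e. the private IP, not the public one):

```shell
# On the master (assumes password-less SSH to the slaves, as set up by spark-ec2):
/root/spark/sbin/stop-all.sh

# On each instance, set the bind address in conf/spark-env.sh.
# 172.31.4.179 is a placeholder private IP; binding to the NAT-ed public IP
# is expected to fail with a BindException.
echo 'export SPARK_LOCAL_IP=172.31.4.179' >> /root/spark/conf/spark-env.sh

# Then restart the whole cluster:
/root/spark/sbin/start-all.sh
```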
[jira] [Commented] (SPARK-13317) SPARK_LOCAL_IP does not bind on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146704#comment-15146704 ] Christopher Bourez commented on SPARK-13317: Because installing the notebooks Zeppelin or IScala on the cluster does not make a lot of sense.
[jira] [Comment Edited] (SPARK-13317) SPARK_LOCAL_IP does not bind on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146689#comment-15146689 ] Christopher Bourez edited comment on SPARK-13317 at 2/14/16 7:02 PM: - I launch a cluster {code} ./ec2/spark-ec2 -k sparkclusterkey -i ~/sparkclusterkey.pem --region=eu-west-1 --copy-aws-credentials --instance-type=m1.large -s 4 --hadoop-major-version=2 launch spark-cluster {code} which gives me a master at ec2-54-229-16-73.eu-west-1.compute.amazonaws.com and slaves at ec2-54-194-99-236.eu-west-1.compute.amazonaws.com etc If I launch a job in client mode from another network, for example in a Zeppelin notebook on my macbook, which configuration is equivalent to {code} spark-shell --master=spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077 {code} I see in the logs : {code} 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/0 on worker-20160214185030-172.31.4.179-34425 (172.31.4.179:34425) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/0 on hostPort 172.31.4.179:34425 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/1 on worker-20160214185030-172.31.4.176-47657 (172.31.4.176:47657) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/1 on hostPort 172.31.4.176:47657 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/2 on worker-20160214185031-172.31.4.177-41379 (172.31.4.177:41379) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/2 on hostPort 172.31.4.177:41379 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/3 on worker-20160214185032-172.31.4.178-34353 (172.31.4.178:34353) with 2 cores 16/02/14 19:55:04 
INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/3 on hostPort 172.31.4.178:34353 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:64058 with 511.5 MB RAM, BlockManagerId(driver, 192.168.1.11, 64058) 16/02/14 19:55:04 INFO BlockManagerMaster: Registered BlockManager {code} These are private IPs that my MacBook cannot access, and when launching a job an error follows: {code} 16/02/14 19:57:19 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources {code} I tried connecting to the slaves, setting SPARK_LOCAL_IP in the slaves' spark-env.sh, and stopping and restarting all slaves from the master; the Spark master still returns the private IPs of the slaves when I execute a job in client mode (spark-shell or Zeppelin on my MacBook). I think we should be able to work from different networks. Only the UI interfaces seem to be bound to the correct IP.
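For running in client mode from another network, one possible workaround sketch is to set the driver's advertised address explicitly so the executors can reach back to the driver (spark.driver.host is a standard Spark property; the IP below is a placeholder for a publicly reachable address of the laptop). This only addresses the executor-to-driver direction; the driver still needs routable worker addresses for the reverse direction reported in this issue:

```shell
# Run on the laptop; 203.0.113.7 stands in for a public IP of the laptop
# that the EC2 security group allows the workers to connect back to.
spark-shell \
  --master spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077 \
  --conf spark.driver.host=203.0.113.7
```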
[jira] [Comment Edited] (SPARK-13317) SPARK_LOCAL_IP does not bind on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146689#comment-15146689 ] Christopher Bourez edited comment on SPARK-13317 at 2/14/16 7:01 PM: - I launch a cluster {code} ./ec2/spark-ec2 -k sparkclusterkey -i ~/sparkclusterkey.pem --region=eu-west-1 --copy-aws-credentials --instance-type=m1.large -s 4 --hadoop-major-version=2 launch spark-cluster {code} which gives me a master at ec2-54-229-16-73.eu-west-1.compute.amazonaws.com and slaves at ec2-54-194-99-236.eu-west-1.compute.amazonaws.com etc If I launch a job in client mode from another network, for example in a Zeppelin notebook on my macbook, which configuration is equivalent to {code} spark-shell --master=spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077 {code} I see in the logs : {code} 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/0 on worker-20160214185030-172.31.4.179-34425 (172.31.4.179:34425) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/0 on hostPort 172.31.4.179:34425 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/1 on worker-20160214185030-172.31.4.176-47657 (172.31.4.176:47657) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/1 on hostPort 172.31.4.176:47657 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/2 on worker-20160214185031-172.31.4.177-41379 (172.31.4.177:41379) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/2 on hostPort 172.31.4.177:41379 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/3 on worker-20160214185032-172.31.4.178-34353 (172.31.4.178:34353) with 2 cores 16/02/14 19:55:04 
INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/3 on hostPort 172.31.4.178:34353 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:64058 with 511.5 MB RAM, BlockManagerId(driver, 192.168.1.11, 64058) 16/02/14 19:55:04 INFO BlockManagerMaster: Registered BlockManager {code} which are private IP that my macbook cannot access and when launching a job, an error follow : {code} 16/02/14 19:57:19 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources {code} I tried to connect to the slaves, to set SPARK_LOCAL_IP in the slaves' spark-env.sh, stop and restart all slaves from the master, spark master still returns the private IP of the slaves. was (Author: christopher5106): I launch a cluster {code} ./ec2/spark-ec2 -k sparkclusterkey -i ~/sparkclusterkey.pem --region=eu-west-1 --copy-aws-credentials --instance-type=m1.large -s 4 --hadoop-major-version=2 launch spark-cluster {code} which gives me a master at ec2-54-229-16-73.eu-west-1.compute.amazonaws.com and slaves at ec2-54-194-99-236.eu-west-1.compute.amazonaws.com etc If I launch a job in client mode from another network, for example in a Zeppelin notebook on my macbook, which configuration is equivalent to {code} spark-shell --master=spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077 {code} I see in the logs : {code} 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/0 on worker-20160214185030-172.31.4.179-34425 (172.31.4.179:34425) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/0 on hostPort 172.31.4.179:34425 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/1 on worker-20160214185030-172.31.4.176-47657 (172.31.4.176:47657) with 2 cores 16/02/14 19:55:04 INFO 
SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/1 on hostPort 172.31.4.176:47657 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/2 on worker-20160214185031-172.31.4.177-41379 (172.31.4.177:41379) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/2 on hostPort 172.31.4.177:41379 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/3 on worker-20160214185032-172.31.4.178-34353 (172.31.4.178:34353) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/3 on hostPort 172.31.4.178:34353 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO BlockManagerM
[jira] [Comment Edited] (SPARK-13317) SPARK_LOCAL_IP does not bind on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146689#comment-15146689 ] Christopher Bourez edited comment on SPARK-13317 at 2/14/16 7:00 PM: - I launch a cluster {code} ./ec2/spark-ec2 -k sparkclusterkey -i ~/sparkclusterkey.pem --region=eu-west-1 --copy-aws-credentials --instance-type=m1.large -s 4 --hadoop-major-version=2 launch spark-cluster {code} which gives me a master at ec2-54-229-16-73.eu-west-1.compute.amazonaws.com and slaves at ec2-54-194-99-236.eu-west-1.compute.amazonaws.com etc If I launch a job in client mode from another network, for example in a Zeppelin notebook on my macbook, which configuration is equivalent to {code} spark-shell --master=spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077 {code} I see in the logs : {code} 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/0 on worker-20160214185030-172.31.4.179-34425 (172.31.4.179:34425) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/0 on hostPort 172.31.4.179:34425 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/1 on worker-20160214185030-172.31.4.176-47657 (172.31.4.176:47657) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/1 on hostPort 172.31.4.176:47657 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/2 on worker-20160214185031-172.31.4.177-41379 (172.31.4.177:41379) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/2 on hostPort 172.31.4.177:41379 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/3 on worker-20160214185032-172.31.4.178-34353 (172.31.4.178:34353) with 2 cores 16/02/14 19:55:04 
INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/3 on hostPort 172.31.4.178:34353 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:64058 with 511.5 MB RAM, BlockManagerId(driver, 192.168.1.11, 64058) 16/02/14 19:55:04 INFO BlockManagerMaster: Registered BlockManager {code} which are private IP that my macbook cannot access and when launching a job, an error follow : {code} 16/02/14 19:57:19 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources {code} I tried to connect to the slave, to set SPARK_LOCAL_IP in the slave's spark-env.sh, stop and restart all slaves from the master, spark master still returns the private IP. was (Author: christopher5106): I launch a cluster {code} ./ec2/spark-ec2 -k sparkclusterkey -i ~/sparkclusterkey.pem --region=eu-west-1 --copy-aws-credentials --instance-type=m1.large -s 4 --hadoop-major-version=2 launch spark-cluster {code} which gives me a master at ec2-54-229-16-73.eu-west-1.compute.amazonaws.com and slaves at ec2-54-194-99-236.eu-west-1.compute.amazonaws.com etc If I launch a job in client mode from another network, for example in a Zeppelin notebook on my macbook, which configuration is equivalent to {code} spark-shell --master=spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077 {code} I see in the logs : {code} 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/0 on worker-20160214185030-172.31.4.179-34425 (172.31.4.179:34425) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/0 on hostPort 172.31.4.179:34425 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/1 on worker-20160214185030-172.31.4.176-47657 (172.31.4.176:47657) with 2 cores 16/02/14 19:55:04 INFO 
SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/1 on hostPort 172.31.4.176:47657 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/2 on worker-20160214185031-172.31.4.177-41379 (172.31.4.177:41379) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/2 on hostPort 172.31.4.177:41379 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/3 on worker-20160214185032-172.31.4.178-34353 (172.31.4.178:34353) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/3 on hostPort 172.31.4.178:34353 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO BlockManagerMasterEndpoint:
[jira] [Comment Edited] (SPARK-13317) SPARK_LOCAL_IP does not bind on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146689#comment-15146689 ] Christopher Bourez edited comment on SPARK-13317 at 2/14/16 6:59 PM: - I launch a cluster ./ec2/spark-ec2 -k sparkclusterkey -i ~/sparkclusterkey.pem --region=eu-west-1 --copy-aws-credentials --instance-type=m1.large -s 4 --hadoop-major-version=2 launch spark-cluster which gives me a master at ec2-54-229-16-73.eu-west-1.compute.amazonaws.com and slaves at ec2-54-194-99-236.eu-west-1.compute.amazonaws.com etc If I launch a job in client mode from another network, for example in a Zeppelin notebook on my macbook, which configuration is equivalent to spark-shell --master=spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077 I see in the logs : {code} 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/0 on worker-20160214185030-172.31.4.179-34425 (172.31.4.179:34425) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/0 on hostPort 172.31.4.179:34425 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/1 on worker-20160214185030-172.31.4.176-47657 (172.31.4.176:47657) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/1 on hostPort 172.31.4.176:47657 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/2 on worker-20160214185031-172.31.4.177-41379 (172.31.4.177:41379) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/2 on hostPort 172.31.4.177:41379 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/3 on worker-20160214185032-172.31.4.178-34353 (172.31.4.178:34353) with 2 cores 16/02/14 19:55:04 INFO 
SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/3 on hostPort 172.31.4.178:34353 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:64058 with 511.5 MB RAM, BlockManagerId(driver, 192.168.1.11, 64058) 16/02/14 19:55:04 INFO BlockManagerMaster: Registered BlockManager {code} which are private IP that my macbook cannot access and when launching a job, an error follow : {code} 16/02/14 19:57:19 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources {code} I tryied to connect to the slave, to set SPARK_LOCAL_IP in the slave's spark-env.sh, stop and restart all slaves from the master, spark master still returns the private IP. Thanks, was (Author: christopher5106): I launch a cluster ./ec2/spark-ec2 -k sparkclusterkey -i ~/sparkclusterkey.pem --region=eu-west-1 --copy-aws-credentials --instance-type=m1.large -s 4 --hadoop-major-version=2 launch spark-cluster which gives me a master at ec2-54-229-16-73.eu-west-1.compute.amazonaws.com and slaves at ec2-54-194-99-236.eu-west-1.compute.amazonaws.com etc If I launch a job in client mode from another network, for example in a Zeppelin notebook on my macbook, which configuration is equivalent to spark-shell --master=spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077 I see in the logs : {code} 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/0 on worker-20160214185030-172.31.4.179-34425 (172.31.4.179:34425) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/0 on hostPort 172.31.4.179:34425 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/1 on worker-20160214185030-172.31.4.176-47657 (172.31.4.176:47657) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted 
executor ID app-20160214185504-/1 on hostPort 172.31.4.176:47657 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/2 on worker-20160214185031-172.31.4.177-41379 (172.31.4.177:41379) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/2 on hostPort 172.31.4.177:41379 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/3 on worker-20160214185032-172.31.4.178-34353 (172.31.4.178:34353) with 2 cores 16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/3 on hostPort 172.31.4.178:34353 with 2 cores, 1024.0 MB RAM 16/02/14 19:55:04 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:64058 w
[jira] [Comment Edited] (SPARK-13317) SPARK_LOCAL_IP does not bind on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146689#comment-15146689 ] Christopher Bourez edited comment on SPARK-13317 at 2/14/16 6:59 PM:
-
I launch a cluster
{code}
./ec2/spark-ec2 -k sparkclusterkey -i ~/sparkclusterkey.pem --region=eu-west-1 --copy-aws-credentials --instance-type=m1.large -s 4 --hadoop-major-version=2 launch spark-cluster
{code}
which gives me a master at ec2-54-229-16-73.eu-west-1.compute.amazonaws.com and slaves at ec2-54-194-99-236.eu-west-1.compute.amazonaws.com etc.
If I launch a job in client mode from another network, for example from a Zeppelin notebook on my MacBook, whose configuration is equivalent to
{code}
spark-shell --master=spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077
{code}
I see in the logs:
{code}
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/0 on worker-20160214185030-172.31.4.179-34425 (172.31.4.179:34425) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/0 on hostPort 172.31.4.179:34425 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/1 on worker-20160214185030-172.31.4.176-47657 (172.31.4.176:47657) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/1 on hostPort 172.31.4.176:47657 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/2 on worker-20160214185031-172.31.4.177-41379 (172.31.4.177:41379) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/2 on hostPort 172.31.4.177:41379 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/3 on worker-20160214185032-172.31.4.178-34353 (172.31.4.178:34353) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/3 on hostPort 172.31.4.178:34353 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:64058 with 511.5 MB RAM, BlockManagerId(driver, 192.168.1.11, 64058)
16/02/14 19:55:04 INFO BlockManagerMaster: Registered BlockManager
{code}
These are private IPs that my MacBook cannot access, and when launching a job, an error follows:
{code}
16/02/14 19:57:19 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
{code}
I tried to connect to the slave, to set SPARK_LOCAL_IP in the slave's spark-env.sh, and to stop and restart all slaves from the master; the Spark master still returns the private IP. Thanks,
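For context, the per-slave override being attempted above would be a spark-env.sh fragment of roughly this shape (a sketch only — the addresses are placeholders taken from the report; in Spark's standalone scripts, SPARK_LOCAL_IP controls the interface a daemon binds, while SPARK_PUBLIC_DNS controls the hostname it advertises to other machines):

```shell
# conf/spark-env.sh on a slave -- a sketch with placeholder addresses.
# SPARK_LOCAL_IP: interface the worker binds on this machine.
export SPARK_LOCAL_IP=172.31.4.179
# SPARK_PUBLIC_DNS: hostname the worker advertises to drivers on other networks.
export SPARK_PUBLIC_DNS=ec2-54-194-99-236.eu-west-1.compute.amazonaws.com
```

The report suggests the advertised address (not the bind address) is what the remote driver needs, which is why setting SPARK_LOCAL_IP alone did not change what the master returned.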
[jira] [Commented] (SPARK-13317) SPARK_LOCAL_IP does not bind on Slaves
[ https://issues.apache.org/jira/browse/SPARK-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146689#comment-15146689 ] Christopher Bourez commented on SPARK-13317:
I launch a cluster
./ec2/spark-ec2 -k sparkclusterkey -i ~/sparkclusterkey.pem --region=eu-west-1 --copy-aws-credentials --instance-type=m1.large -s 4 --hadoop-major-version=2 launch spark-cluster
which gives me a master at ec2-54-229-16-73.eu-west-1.compute.amazonaws.com and slaves at ec2-54-194-99-236.eu-west-1.compute.amazonaws.com etc.
If I launch a job in client mode from another network, for example from a Zeppelin notebook on my MacBook, whose configuration is equivalent to
spark-shell --master=spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077
I see in the logs:
```
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/0 on worker-20160214185030-172.31.4.179-34425 (172.31.4.179:34425) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/0 on hostPort 172.31.4.179:34425 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/1 on worker-20160214185030-172.31.4.176-47657 (172.31.4.176:47657) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/1 on hostPort 172.31.4.176:47657 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/2 on worker-20160214185031-172.31.4.177-41379 (172.31.4.177:41379) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/2 on hostPort 172.31.4.177:41379 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO AppClient$ClientEndpoint: Executor added: app-20160214185504-/3 on worker-20160214185032-172.31.4.178-34353 (172.31.4.178:34353) with 2 cores
16/02/14 19:55:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160214185504-/3 on hostPort 172.31.4.178:34353 with 2 cores, 1024.0 MB RAM
16/02/14 19:55:04 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:64058 with 511.5 MB RAM, BlockManagerId(driver, 192.168.1.11, 64058)
16/02/14 19:55:04 INFO BlockManagerMaster: Registered BlockManager
```
These are private IPs that my MacBook cannot access, and when launching a job, an error follows:
16/02/14 19:57:19 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I tried to connect to the slave, to set SPARK_LOCAL_IP in the slave's spark-env.sh, and to stop and restart all slaves from the master; the Spark master still returns the private IP. Thanks,
> SPARK_LOCAL_IP does not bind on Slaves
> --
>
> Key: SPARK-13317
> URL: https://issues.apache.org/jira/browse/SPARK-13317
> Project: Spark
> Issue Type: Bug
> Environment: Linux EC2, different VPC
> Reporter: Christopher Bourez
>
> SPARK_LOCAL_IP does not bind to the provided IP on slaves.
> When launching a job or a spark-shell from a second network, the returned IP for the slave is still the first IP of the slave.
> So the job fails with the message:
> Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> It is not a question of resources: the driver cannot connect to the slave, given the wrong IP.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
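On the driver side there is a symmetric concern: in client mode the driver also advertises an address that the executors must be able to reach back. A hedged sketch of a launch that makes this explicit — `spark.driver.host` and `spark.driver.port` are standard Spark properties, but the host value below is a placeholder, not something from the report:

```shell
# Sketch: client-mode launch from outside the cluster's network.
# spark.driver.host is the address executors connect back to; it must be
# reachable from the workers' VPC. <laptop-public-ip> is a placeholder.
spark-shell \
  --master spark://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:7077 \
  --conf spark.driver.host=<laptop-public-ip> \
  --conf spark.driver.port=51000
```

Pinning the driver port also makes it possible to open just that port in the security group rather than a wide range.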
[jira] [Created] (SPARK-13317) SPARK_LOCAL_IP does not bind on Slaves
Christopher Bourez created SPARK-13317:
--
Summary: SPARK_LOCAL_IP does not bind on Slaves
Key: SPARK-13317
URL: https://issues.apache.org/jira/browse/SPARK-13317
Project: Spark
Issue Type: Bug
Environment: Linux EC2, different VPC
Reporter: Christopher Bourez

SPARK_LOCAL_IP does not bind to the provided IP on slaves.
When launching a job or a spark-shell from a second network, the returned IP for the slave is still the first IP of the slave.
So the job fails with the message:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
It is not a question of resources: the driver cannot connect to the slave, given the wrong IP.
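One quick way to check which address each worker actually registered with is the standalone master's JSON view of its web UI (a sketch — the hostname is the master from the report, and the endpoint path is the master UI's JSON rendering in Spark 1.x):

```shell
# Ask the standalone master which addresses its workers registered with.
# The "workers" array in the response carries each worker's "host" and "port";
# in the report above these come back as private 172.31.x.x addresses.
curl http://ec2-54-229-16-73.eu-west-1.compute.amazonaws.com:8080/json
```

If the hosts listed there are private IPs, a remote driver cannot schedule on them regardless of how many resources are free, which matches the misleading "Initial job has not accepted any resources" warning.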
[jira] [Commented] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144901#comment-15144901 ] Christopher Bourez commented on SPARK-12261:
Sean, how can I get the executor log in local mode? Thanks
> pyspark crash for large dataset
> ---
>
> Key: SPARK-12261
> URL: https://issues.apache.org/jira/browse/SPARK-12261
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.5.2
> Environment: windows
> Reporter: zihao
>
> I tried to import a local text (over 100 MB) file via textFile in pyspark; when
> I ran data.take(), it failed and gave error messages including:
> 15/12/10 17:17:43 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
> Traceback (most recent call last):
> File "E:/spark_python/test3.py", line 9, in
> lines.take(5)
> File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\rdd.py", line 1299, in take
> res = self.context.runJob(self, takeUpToNumLeft, p)
> File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\context.py", line 916, in runJob
> port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
> File "C:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 813, in __call__
> answer, self.gateway_client, self.target_id, self.name)
> File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\sql\utils.py", line 36, in deco
> return f(*a, **kw)
> File "C:\Anaconda2\lib\site-packages\py4j\protocol.py", line 308, in get_return_value
> format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: Connection reset by peer: socket write error
> Then I ran the same code for a small text file, and this time .take() worked fine.
> How can I solve this problem?
[jira] [Commented] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144873#comment-15144873 ] Christopher Bourez commented on SPARK-12261:
Here is what I see when I activate the logs:
16/02/12 18:09:22 ERROR TaskSetManager: Task 0 in stage 5.0 failed 1 times; aborting job
16/02/12 18:09:22 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
16/02/12 18:09:22 INFO TaskSchedulerImpl: Cancelling stage 5
16/02/12 18:09:22 INFO DAGScheduler: ResultStage 5 (runJob at PythonRDD.scala:393) failed in 0,280 s
16/02/12 18:09:22 INFO DAGScheduler: Job 5 failed: runJob at PythonRDD.scala:393, took 0,308529 s
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Documents\c.bourez\Documents\spark-1.5.2-bin-hadoop2.6\spark-1.5.2-bin-hadoop2.6\python\pyspark\rdd.py", line 1299, in take
    res = self.context.runJob(self, takeUpToNumLeft, p)
  File "C:\Documents\c.bourez\Documents\spark-1.5.2-bin-hadoop2.6\spark-1.5.2-bin-hadoop2.6\python\pyspark\context.py", line 916, in runJob
    port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
  File "C:\Documents\c.bourez\Documents\spark-1.5.2-bin-hadoop2.6\spark-1.5.2-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py", line 538, in __call__
  File "C:\Documents\c.bourez\Documents\spark-1.5.2-bin-hadoop2.6\spark-1.5.2-bin-hadoop2.6\python\pyspark\sql\utils.py", line 36, in deco
    return f(*a, **kw)
  File "C:\Documents\c.bourez\Documents\spark-1.5.2-bin-hadoop2.6\spark-1.5.2-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost): java.net.SocketException: Connection reset by peer: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
    at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:622)
    at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:442)
    at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
    at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
    at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
[jira] [Commented] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144844#comment-15144844 ] Christopher Bourez commented on SPARK-12261:
Sean Owen, do you reconsider the status as a Spark issue?
[jira] [Commented] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144793#comment-15144793 ] Christopher Bourez commented on SPARK-12261:
Dear Niall, your solution works very well :) Thank you a lot
[jira] [Commented] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136229#comment-15136229 ] Christopher Bourez commented on SPARK-12261:
I'm still here if you need any more info about how to reproduce the case
[jira] [Commented] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121182#comment-15121182 ] Christopher Bourez commented on SPARK-12261: There is a strange "remove broadcast variable" operation at the end of the 3 third sc.textFile().take(1) method execution; and then the next executions fail. Can there be a link with this problem : https://spark-project.atlassian.net/browse/SPARK-1065 ? > pyspark crash for large dataset > --- > > Key: SPARK-12261 > URL: https://issues.apache.org/jira/browse/SPARK-12261 > Project: Spark > Issue Type: Bug >Affects Versions: 1.5.2 > Environment: windows >Reporter: zihao > > I tried to import a local text(over 100mb) file via textFile in pyspark, when > i ran data.take(), it failed and gave error messages including: > 15/12/10 17:17:43 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; > aborting job > Traceback (most recent call last): > File "E:/spark_python/test3.py", line 9, in > lines.take(5) > File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\rdd.py", line 1299, > in take > res = self.context.runJob(self, takeUpToNumLeft, p) > File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\context.py", line > 916, in runJob > port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, > partitions) > File "C:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 813, in > __call__ > answer, self.gateway_client, self.target_id, self.name) > File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\sql\utils.py", line > 36, in deco > return f(*a, **kw) > File "C:\Anaconda2\lib\site-packages\py4j\protocol.py", line 308, in > get_return_value > format(target_id, ".", name), value) > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.runJob. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0
> (TID 0, localhost): java.net.SocketException: Connection reset by peer:
> socket write error
> Then I ran the same code for a small text file; this time .take() worked fine.
> How can I solve this problem?
[jira] [Commented] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121114#comment-15121114 ]

Christopher Bourez commented on SPARK-12261:

I recompiled Spark on Windows but the problem remains. The first three times I launch the textFile command followed by take(1), it works, but after that it no longer does. The memory (between Python and the JVM) does not seem to be released. I tried to re-initialise the context (sc.stop(); del sc; sc = SparkContext('local', 'test')) and to force garbage collection (import gc; gc.collect()), but nothing changes: the memory is not released. It seems OOM errors are quite common on Windows/PySpark.
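The re-initialisation sequence described above can be sketched as follows. This is only a sketch of what the comment reports trying (the `restart_context` helper and its `make_context` factory argument are illustrative, not part of PySpark), and per the report this sequence did not actually release the memory on Windows:

```python
import gc

def restart_context(make_context, sc=None):
    """Stop an existing SparkContext, drop the Python reference, force a
    garbage-collection pass, then build a fresh context via make_context().
    This mirrors the sc.stop(); del sc; gc.collect() sequence from the comment."""
    if sc is not None:
        sc.stop()    # shut down the JVM-side context
        del sc       # drop the Python-side reference
    gc.collect()     # collect any remaining Python-side garbage
    return make_context()

# Usage with PySpark (requires a local Spark install):
# from pyspark import SparkContext
# sc = restart_context(lambda: SparkContext('local', 'test'), sc)
```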
[jira] [Commented] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117345#comment-15117345 ]

Christopher Bourez commented on SPARK-12261:

The suggested solution "Increase driver memory" does not work.
[jira] [Comment Edited] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116907#comment-15116907 ]

Christopher Bourez edited comment on SPARK-12261 at 1/26/16 8:11 AM:

To reproduce, you can follow these steps:
- create an AWS WorkSpace with Windows 7 (which I can share with you if you'd like), Standard instance, 2 GiB RAM
On this instance:
- download Spark (1.5 or 1.6, same problem) with Hadoop 2.6
- install the Java 8 JDK
- download Python 2.7.8
- download the sample file https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv
- launch PySpark: bin\pyspark --master local[1]
- run: sc.textFile("test.csv").take(1) => fails (worked only a very few times)
- run: sc.textFile("test.csv", 2000).take(1) => works

The sample file is 13 MB and was created randomly:

for i in {0..30}; do
  VALUE="$RANDOM"
  for j in {0..6}; do VALUE="$VALUE;$RANDOM"; done
  echo $VALUE >> test.csv
done

Running PySpark with more memory, bin\pyspark --master local[1] --conf spark.driver.memory=3g, displays more memory in http://localhost:4040/executors but does not change the problem.

Full video of the problem: https://s3-eu-west-1.amazonaws.com/christopherbourez/public/video.mov
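For Windows users without bash, the $RANDOM loop above can be approximated in Python. This is a sketch: the `make_csv` helper and its default row/column counts mirror the bash loop (31 lines of 8 values), and `rows` would need to be scaled up until the file reaches the roughly 13 MB size mentioned in the report:

```python
import random

def make_csv(path, rows=31, cols=8):
    """Write `rows` lines of `cols` semicolon-separated random integers in
    [0, 32767], mimicking the bash $RANDOM loop from the comment."""
    with open(path, "w") as f:
        for _ in range(rows):
            f.write(";".join(str(random.randint(0, 32767)) for _ in range(cols)) + "\n")

make_csv("test.csv")
```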
[jira] [Closed] (SPARK-12980) pyspark crash for large dataset - clone
[ https://issues.apache.org/jira/browse/SPARK-12980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Bourez closed SPARK-12980.
--

> pyspark crash for large dataset - clone
> ---
>
> Key: SPARK-12980
> URL: https://issues.apache.org/jira/browse/SPARK-12980
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.5.2
> Environment: windows
> Reporter: Christopher Bourez
>
> I installed Spark 1.6 on many different computers.
> On Windows, the PySpark textFile method, followed by take(1), does not work on a file of 13 MB.
> If I set numPartitions to 2000 or take a smaller file, the method works well.
> PySpark is given all the RAM of the machine with --conf spark.driver.memory=5g in local mode.
> On Mac OS, I'm able to launch the exact same program with PySpark with 16 GB RAM on a much bigger file, of 5 GB. Memory is correctly allocated, released, etc.
> On Ubuntu, no trouble; I can also launch a cluster:
> http://christopher5106.github.io/big/data/2016/01/19/computation-power-as-you-need-with-EMR-auto-termination-cluster-example-random-forest-python.html
> The error message on Windows is: java.net.SocketException: Connection reset by peer: socket write error
> Configuration is: Java 8 64-bit, Python 2.7.11, on Windows 7 Enterprise SP1 v2.42.01
> What could be the reason for the Windows Spark textFile method to fail?
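The numPartitions workaround described in this issue can be sketched as below. This is a sketch, not a fix: the `read_first_line` helper is illustrative, it assumes a local Spark install, and the import is guarded so the snippet stays loadable where pyspark is absent:

```python
try:
    from pyspark import SparkConf, SparkContext
except ImportError:      # pyspark not installed; the sketch stays importable
    SparkContext = None

def read_first_line(path, partitions=2000):
    """take(1) with an explicit minPartitions, the workaround reported to
    avoid the 'Connection reset by peer' socket write error on Windows."""
    if SparkContext is None:
        return None
    sc = SparkContext(conf=SparkConf().setMaster("local[1]"))
    try:
        # Many small partitions keep each task's result small. The reporter
        # set driver memory on the pyspark command line instead
        # (bin\pyspark --master local[1] --conf spark.driver.memory=3g).
        return sc.textFile(path, partitions).take(1)
    finally:
        sc.stop()
```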
[jira] [Commented] (SPARK-12261) pyspark crash for large dataset
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115433#comment-15115433 ]

Christopher Bourez commented on SPARK-12261:

I think the issue is not resolved.

I installed Spark 1.6 on many different computers. On Windows, the PySpark textFile method, followed by take(1), does not work on a file of 13 MB. If I set numPartitions to 2000 or take a smaller file, the method works well. PySpark is given all the RAM of the machine with --conf spark.driver.memory=5g in local mode.

On Mac OS, I'm able to launch the exact same program with PySpark with 16 GB RAM on a much bigger file, of 5 GB. Memory is correctly allocated, released, etc.

On Ubuntu, no trouble; I can also launch a cluster: http://christopher5106.github.io/big/data/2016/01/19/computation-power-as-you-need-with-EMR-auto-termination-cluster-example-random-forest-python.html

The error message on Windows is: java.net.SocketException: Connection reset by peer: socket write error

What could be the reason for the Windows Spark textFile method to fail?
[jira] [Updated] (SPARK-12980) pyspark crash for large dataset - clone
[ https://issues.apache.org/jira/browse/SPARK-12980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Bourez updated SPARK-12980:
---
Description:

I installed Spark 1.6 on many different computers.
On Windows, the PySpark textFile method, followed by take(1), does not work on a file of 13 MB.
If I set numPartitions to 2000 or take a smaller file, the method works well.
PySpark is given all the RAM of the machine with --conf spark.driver.memory=5g in local mode.
On Mac OS, I'm able to launch the exact same program with PySpark with 16 GB RAM on a much bigger file, of 5 GB. Memory is correctly allocated, released, etc.
On Ubuntu, no trouble; I can also launch a cluster:
http://christopher5106.github.io/big/data/2016/01/19/computation-power-as-you-need-with-EMR-auto-termination-cluster-example-random-forest-python.html
The error message on Windows is: java.net.SocketException: Connection reset by peer: socket write error
Configuration is: Java 8 64-bit, Python 2.7.11, on Windows 7 Enterprise SP1 v2.42.01
What could be the reason for the Windows Spark textFile method to fail?
[jira] [Created] (SPARK-12980) pyspark crash for large dataset - clone
Christopher Bourez created SPARK-12980:
--

Summary: pyspark crash for large dataset - clone
Key: SPARK-12980
URL: https://issues.apache.org/jira/browse/SPARK-12980
Project: Spark
Issue Type: Bug
Affects Versions: 1.5.2
Environment: windows
Reporter: Christopher Bourez