Distributed running in Spark Interactive shell

2014-03-26 Thread Sai Prasanna
Is it possible to run across a cluster using the Spark interactive shell? To be more explicit, is the procedure similar to running standalone master-slave Spark? I want to execute my code in the interactive shell on the master node and have it run across the cluster [say, 5 nodes]. Is the procedure …

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
What do you mean by "run across the cluster"? Do you want to start spark-shell across the cluster, or do you want to distribute tasks to multiple machines? If the former, yes, as long as you indicate the right master URL. If the latter, also yes; you can observe the distributed tasks in the …
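[A minimal sketch of the latter case, assuming a Spark 0.9-era shell and a hypothetical standalone master at spark://master-node:7077; in that era the MASTER environment variable pointed the shell at the cluster:]

    // Started with:  MASTER=spark://master-node:7077 ./bin/spark-shell
    // ("master-node" is a hypothetical host; the shell then builds its
    // SparkContext against that URL instead of running in local mode)
    val nums = sc.parallelize(1 to 1000000, 20) // 20 partitions -> 20 tasks for the workers
    nums.map(_ * 2).reduce(_ + _)               // the tasks show up in the application web UI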

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Sai Prasanna
Nan Zhu, it's the latter: I want to distribute the tasks to the cluster [the machines available]. If I set SPARK_MASTER_IP on the other machines and list the slave IPs in conf/slaves on the master node, will interactive-shell code run at the master be distributed across multiple machines …

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
All you need to do is ensure your Spark cluster is running well (you can check by accessing the Spark UI to see whether all workers are displayed); then you have to set the correct SPARK_MASTER_IP on the machine where you run spark-shell. In more detail: when you run bin/spark-shell, it will …
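[A hedged sketch of the checks Nan describes, assuming the standard SparkContext API of the time; the small job at the end forces tasks onto the workers so you can see which machines actually ran them:]

    sc.master             // should print spark://<master-host>:7077, not "local"
    sc.defaultParallelism // grows with the cores of the registered workers

    // Run a trivial job and collect the host name of every machine that executed a task:
    sc.parallelize(1 to 100, 10)
      .map(_ => java.net.InetAddress.getLocalHost.getHostName)
      .distinct
      .collect()          // on a healthy 5-node cluster this lists the worker hosts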

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Yana Kadiyska
Nan (or anyone who feels they understand the cluster architecture well), can you clarify something for me? From reading this user group and your explanation above, it appears that the cluster master is only involved during application startup -- to allocate executors (from what you wrote …

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
The master does more work than that, actually; I just explained why he should set MASTER_IP correctly. A simplified list: 1. maintain worker status; 2. maintain in-cluster driver status; 3. maintain executor status (the workers tell the master what happened on their executors) …
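[One way to watch the state the master maintains is its web UI; the standalone master also serves the same summary as JSON. A sketch, assuming the default UI port 8080 and a hypothetical host name:]

    import scala.io.Source
    // The standalone master's UI (http://master-node:8080) also exposes its state as JSON:
    val state = Source.fromURL("http://master-node:8080/json").mkString
    println(state)  // each worker's state/cores/memory, plus active and completed apps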

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
And yes, I think that picture is a bit misleading, though the following paragraph does mention: "Because the driver schedules tasks on the cluster, it should be run close to the worker nodes, preferably on the same local area network. If you'd like to send requests to the cluster remotely, it's better to open an RPC to the driver and have it submit operations from nearby than to run a driver far away from the worker nodes."
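[Since the spark-shell process is itself the driver, running it on or near the master node keeps it on the same LAN as the workers. A sketch of how to see where the driver ended up; treating spark.driver.host as the era's property name is an assumption:]

    // The shell JVM is the driver; these show where it is listening
    // (spark.driver.host / spark.driver.port are assumed property names):
    sc.getConf.get("spark.driver.host") // should be a machine close to the workers
    sc.getConf.get("spark.driver.port")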