Distributed running in Spark Interactive shell

2014-03-26 Thread Sai Prasanna
Is it possible to run across a cluster using the Spark interactive shell?

To be more explicit, is the procedure similar to running a standalone
master-slave Spark setup?

I want to execute my code in the interactive shell on the master node, and
it should run across the cluster [say, 5 nodes]. Is the procedure similar?





-- 
*Sai Prasanna. AN*
*II M.Tech (CS), SSSIHL*


*Entire water in the ocean can never sink a ship, Unless it gets inside.
All the pressures of life can never hurt you, Unless you let them in.*


Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
What do you mean by run across the cluster?

Do you want to start the spark-shell across the cluster, or do you want to
distribute tasks to multiple machines?

In the former case, yes, as long as you indicate the right master URL.

In the latter case, also yes; you can observe the distributed tasks in the
Spark UI.
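
For example, a quick sanity check from the shell (a sketch; the master host
and port below are placeholders for your own cluster, and sc is predefined
in spark-shell):

    // launched with: MASTER=spark://master-host:7077 bin/spark-shell
    sc.master                                               // should print your master URL
    val doubled = sc.parallelize(1 to 1000, 10).map(_ * 2)  // 10 partitions => 10 tasks
    println(doubled.reduce(_ + _))                          // the tasks appear in the Spark UI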

-- 
Nan Zhu





Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Sai Prasanna
Nan Zhu, it's the latter: I want to distribute the tasks to the cluster
[the machines available].

If I set SPARK_MASTER_IP on the other machines and list the slave IPs in
conf/slaves on the master node, will the interactive-shell code run at
the master get distributed across multiple machines?







Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
All you need to do is ensure your Spark cluster is running well (you can
check by accessing the Spark UI to see whether all workers are displayed).

Then you have to set the correct SPARK_MASTER_IP on the machine where you
run spark-shell.

In more detail:

When you run bin/spark-shell, it starts the driver program on that machine,
which interacts with the Master to start the application (in this case,
spark-shell).

The Master tells the Workers to start executors for your application, and
the executors register with your driver.

Then your driver can distribute tasks to the executors, i.e. run in a
distributed fashion.
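
A minimal end-to-end sketch of that flow (the host name, port, and input
path below are placeholder assumptions, not details from this thread; sc is
predefined in spark-shell):

    // 1. the driver starts on this machine:
    //      MASTER=spark://master-host:7077 bin/spark-shell
    // 2. the Master asks Workers for executors; they register with the driver
    val lines  = sc.textFile("hdfs://master-host:9000/tmp/input.txt")  // hypothetical path
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.take(5).foreach(println)  // 3. the driver ships these tasks to the executors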


Best, 

-- 
Nan Zhu





Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Yana Kadiyska
Nan (or anyone who feels they understand the cluster architecture well),
can you clarify something for me?

From reading this user group and your explanation above, it appears that the
cluster master is only involved during application startup, to allocate
executors (from what you wrote, it sounds like the driver itself passes the
jobs/tasks to the executors). From there onwards, all computation is done on
the executors, which communicate results directly to the driver when certain
actions (say, collect) are performed. Is that right? The only description of
the cluster I've seen came from
https://spark.apache.org/docs/0.9.0/cluster-overview.html but that picture
suggests there is no direct communication between the driver and the
executors, which I believe is wrong (unless I am misreading the picture -- I
believe Master and Cluster Manager refer to the same thing?).

The very short form of my question is: does the master do anything other
than executor allocation?







Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
The master does more work than that, actually; I just explained why he
should set SPARK_MASTER_IP correctly.

A simplified list:

1. maintain worker status

2. maintain in-cluster driver status

3. maintain executor status (the worker tells the master what happened on
the executor)
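
As a rough way to see some of this from the shell (a sketch, assuming
SparkContext.getExecutorStorageStatus is available in your Spark version;
the exact API varies across releases):

    // each registered executor (plus the driver) runs a block manager;
    // counting them gives a crude view of what the workers have reported
    println(sc.getExecutorStorageStatus.length)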



-- 
Nan Zhu



 



Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
And yes, I think that picture is a bit misleading, though the following
paragraph does mention:

“Because the driver schedules tasks on the cluster, it should be run close to
the worker nodes, preferably on the same local area network. If you’d like to
send requests to the cluster remotely, it’s better to open an RPC to the driver
and have it submit operations from nearby than to run a driver far away from
the worker nodes.”


--  
Nan Zhu

