Is it possible to run across a cluster using the Spark interactive shell?
To be more explicit, is the procedure similar to running a standalone
master-slave Spark setup?
I want to execute my code in the interactive shell on the master node, and
it should run across the cluster [say, 5 nodes].
What do you mean by "run across the cluster"?
Do you want to start spark-shell across the cluster, or do you want to
distribute tasks to multiple machines?
If the former, yes, as long as you indicate the right master URL.
If the latter, also yes; you can observe the distributed tasks in the Spark UI.
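For the first case, a minimal sketch of pointing the shell at a standalone master (the hostname is an assumption; 7077 is the default standalone master port):

```shell
# Launch the interactive shell against a standalone cluster master.
# "master-node" is a placeholder hostname; replace with your master's host.
MASTER=spark://master-node:7077 ./bin/spark-shell
```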
Nan Zhu, it's the latter: I want to distribute the tasks to the cluster
[the machines available].
If I set SPARK_MASTER_IP on the other machines and list the slaves' IPs in
conf/slaves on the master node, will the code I run in the interactive shell
on the master be distributed across the machines?
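A sketch of that setup, assuming a standalone cluster with hypothetical hostnames (conf/slaves and conf/spark-env.sh are the standard standalone-mode config files; the start script lives in sbin/ in newer releases, bin/ in older ones):

```shell
# On the master node: list each worker, one hostname/IP per line.
# (worker-1 ... worker-5 are placeholder hostnames.)
cat > conf/slaves <<'EOF'
worker-1
worker-2
worker-3
worker-4
worker-5
EOF

# On every node: point the standalone scripts at the master host.
echo 'export SPARK_MASTER_IP=master-node' >> conf/spark-env.sh

# Then start the master and all listed workers from the master node.
./sbin/start-all.sh
```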
All you need to do is ensure your Spark cluster is running well (you can
check by opening the Spark UI and seeing whether all workers are displayed).
Then set the correct SPARK_MASTER_IP on the machine where you run
spark-shell.
In more detail:
when you run bin/spark-shell, it connects to that master and registers the
shell as an application, and the tasks it generates are distributed to the
workers.
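A quick way to see this for yourself is to run a trivial parallel job from the shell and watch the tasks land on the workers in the UI (a sketch; the master hostname is a placeholder):

```shell
# Pipe a small Scala job into the shell; with 10 partitions the
# job becomes 10 tasks, visible on the workers in the Spark UI.
MASTER=spark://master-node:7077 ./bin/spark-shell <<'EOF'
val rdd = sc.parallelize(1 to 100000, 10)
println(rdd.map(_ * 2).reduce(_ + _))
EOF
```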
Nan (or anyone who feels they understand the cluster architecture well),
can you clarify something for me?
From reading this user group and your explanation above, it appears that the
cluster master is only involved during application startup, to allocate
executors (from what you wrote).
The master does more work than that, actually; I just explained why he should
set SPARK_MASTER_IP correctly.
A simplified list:
1. maintain the worker status
2. maintain the in-cluster driver status
3. maintain the executor status (the worker tells the master what happened on
the executor)
--
Nan Zhu
And yes, I think that picture is a bit misleading, though the following
paragraph does mention that:
"Because the driver schedules tasks on the cluster, it should be run close to
the worker nodes, preferably on the same local area network. If you'd like to
send requests to the cluster remotely, it's better to open an RPC to the
driver and have it submit operations from nearby than to run a driver far
away from the worker nodes."