Hi, I was able to build and run my Spark application via spark-submit. I have understood some of the concepts by going through the resources at https://spark.apache.org, but a few doubts still remain. I have a few specific questions and would be glad if someone could shed some light on them.
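For context, this is roughly how I submitted it (the class and jar names below are placeholders, not my actual ones):

    spark-submit \
      --master "local[*]" \
      --class com.example.MyApp \
      target/my-app-1.0.jar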
So, as shown above, I submitted the application with spark.master set to local[*], and I have an 8-core PC.

- What I understand is that the application is called a job. Since mine had two stages, it got divided into 2 stages, and each stage had a number of tasks which ran in parallel. Is this understanding correct?
- I notice that each stage is further divided into 262 tasks. Where did the number 262 come from? Is it configurable? Would increasing it improve performance? (See the snippet in the P.S. below for how I am checking this.)
- I also see that the tasks run in parallel in sets of 8. Is this because I have an 8-core PC?
- What is the difference or relation between a slave and a worker? When I did spark-submit, did it start 8 slave or worker threads?
- I see all worker threads running in one single JVM. Is this because I did not start slaves separately and connect them to a single master/cluster manager? If I had done that, would each worker have run in its own JVM?
- What is the relationship between a worker and an executor? Can a worker have more than one executor? If yes, how do we configure that? Do all executors run in the worker JVM as independent threads?

I suppose that is all for now. I would appreciate any response, and will add follow-up questions if any.

Thanks,
Sachin
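P.S. Regarding the 262-tasks question, this is roughly how I am checking the partition count; the input path and the repartition number here are made up, not from my actual job:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    val rdd = spark.sparkContext.textFile("data.csv")  // placeholder path
    println(rdd.getNumPartitions)        // shows 262 for my actual input

    // repartition() changes how many tasks the following stages run:
    val repartitioned = rdd.repartition(16)
    println(repartitioned.getNumPartitions)            // now prints 16

Is tuning it this way (or via spark.default.parallelism) the right approach?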