Hi,
I was able to build and run my Spark application via spark-submit.

I have understood some of the concepts by going through the resources at
https://spark.apache.org, but a few doubts still remain. I have a few
specific questions and would be glad if someone could shed some light on
them.

So I submitted the application with spark.master set to local[*], and I
have an 8-core PC.
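
For context, my setup is roughly equivalent to building the session as
below (a minimal sketch, not my actual code; the app name is just a
placeholder, and I gather local[*] asks for one worker thread per logical
core):

    import org.apache.spark.sql.SparkSession

    // Sketch of my setup; "MyApp" is only a placeholder name.
    // local[*] should start as many worker threads as there are logical cores.
    val spark = SparkSession.builder()
      .appName("MyApp")
      .master("local[*]")   // same effect as setting spark.master = local[*]
      .getOrCreate()
    val sc = spark.sparkContext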

- What I understand is that the application runs as a job. Mine got
divided into 2 stages, and each stage had a number of tasks which ran in
parallel.
Is this understanding correct?
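
For example, if I understand correctly, even a simple word count like the
sketch below (not my actual application, just an illustration, reusing the
sc from the snippet above) should show up as one job with two stages,
because reduceByKey introduces a shuffle boundary:

    // Hypothetical word count, only to check my understanding of stages.
    // Stage 1: textFile -> flatMap -> map (narrow transformations)
    // Stage 2: reduceByKey forces a shuffle, so the rest runs in a new stage.
    val counts = sc.textFile("input.txt")        // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("output")              // the action that triggers the job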

- I notice that each stage is further divided into 262 tasks. Where did
this number 262 come from? Is it configurable? Would increasing this
number improve performance?
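
From what I have read, the task count per stage seems to follow the number
of partitions of the RDD, so I am guessing something like the sketch below
would change it (please correct me if these are not the right knobs; the
values are just examples):

    // My guess at how to influence the number of tasks per stage.
    // 1. Ask for a minimum number of partitions when reading the file.
    val lines = sc.textFile("input.txt", 16)   // aim for 16 partitions -> 16 tasks in that stage
    // 2. Or explicitly repartition an existing RDD.
    val wider = lines.repartition(32)          // shuffles the data into 32 partitions
    // 3. Or set a default parallelism for shuffles at submit time, e.g.
    //    --conf spark.default.parallelism=32 on spark-submit.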

- Also, I see that the tasks run in parallel in sets of 8. Is this because
I have an 8-core PC?

- What is the difference, or the relation, between a slave and a worker?
When I ran spark-submit, did it start 8 slaves or worker threads?

- I see all worker threads running in one single JVM. Is this because I
did not start slaves separately and connect them to a single master
(cluster manager)? If I had done that, would each worker have run in its
own JVM?
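
For instance, if I had started a standalone master and workers on separate
machines and pointed the application at them as below, would each worker
then run its part of the work in its own JVM? (a sketch only; the host
name is a placeholder):

    import org.apache.spark.sql.SparkSession

    // Hypothetical: connecting to a standalone cluster manager instead of local[*].
    // "master-host" stands in for wherever the standalone master is running.
    val spark = SparkSession.builder()
      .appName("MyApp")
      .master("spark://master-host:7077")   // standalone master URL, 7077 is the default port
      .getOrCreate()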

- What is the relationship between a worker and an executor? Can a worker
have more than one executor? If yes, how do we configure that? Do all
executors run in the worker's JVM as independent threads?
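
I came across the executor-related properties below; is this roughly how
one would control the executors (a sketch with example values only; I am
not sure which of these apply in standalone mode)?

    import org.apache.spark.sql.SparkSession

    // My guess at the knobs that control executors, based on the configuration docs.
    // The values are only examples, not recommendations.
    val spark = SparkSession.builder()
      .appName("MyApp")
      .master("spark://master-host:7077")       // placeholder standalone master
      .config("spark.executor.instances", "2")  // how many executors to request
      .config("spark.executor.cores", "4")      // cores per executor
      .config("spark.executor.memory", "2g")    // heap per executor
      .getOrCreate()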

I suppose that is all for now. I would appreciate any response and will
add follow-up questions if any.

Thanks
Sachin
