Hi, I was able to build and run my Spark application via spark-submit. I have understood some of the concepts by going through the resources at https://spark.apache.org, but a few doubts still remain. I have a few specific questions and would be glad if someone could shed some light on them.
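For context, this is roughly how I submitted it (the class and jar names below are placeholders, not my actual ones):

    spark-submit \
      --master "local[*]" \
      --class com.example.MyApp \
      target/my-app-1.0.jar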
So, as shown above, I submitted the application with spark.master set to local[*], and I have an 8-core PC.

- What I understand is that the application is called a job. Since mine had two stages, it got divided into 2 stages, and each stage had a number of tasks which ran in parallel. Is this understanding correct?
- I notice that each stage is further divided into 262 tasks. Where did the number 262 come from? Is it configurable? Would increasing it improve performance? (See the snippet in the P.S. below for how I am checking this.)
- I also see that the tasks run in parallel in sets of 8. Is this because I have an 8-core PC?
- What is the difference or relation between a slave and a worker? When I did spark-submit, did it start 8 slave or worker threads?
- I see all worker threads running in one single JVM. Is this because I did not start slaves separately and connect them to a single master/cluster manager? If I had done that, would each worker have run in its own JVM?
- What is the relationship between a worker and an executor? Can a worker have more than one executor? If yes, how do we configure that? Do all executors run in the worker JVM as independent threads?

I suppose that is all for now. I would appreciate any response, and will add follow-up questions if any.

Thanks,
Sachin
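P.S. Regarding the 262-tasks question, this is roughly how I am checking the partition count; the input path and the repartition number here are made up, not from my actual job:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    val rdd = spark.sparkContext.textFile("data.csv")  // placeholder path
    println(rdd.getNumPartitions)        // shows 262 for my actual input

    // repartition() changes how many tasks the following stages run:
    val repartitioned = rdd.repartition(16)
    println(repartitioned.getNumPartitions)            // now prints 16

Is tuning it this way (or via spark.default.parallelism) the right approach?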