Re: Spark driver locality

2015-08-28 Thread Rishitesh Mishra
Hi Swapnil, 1. All task scheduling and retries are driven by the driver, so you are right that a lot of communication flows from the driver to the cluster. It all depends on how you want to go about your Spark application, whether your application has direct access to the Spark cluster or it's routed through a

Re: Spark driver locality

2015-08-28 Thread Swapnil Shinde
Thanks. On Aug 28, 2015 4:55 AM, Rishitesh Mishra rishi80.mis...@gmail.com wrote: Hi Swapnil, 1. All task scheduling and retries are driven by the driver, so you are right that a lot of communication flows from the driver to the cluster. It all depends on how you want to go about your Spark

Spark driver locality

2015-08-27 Thread Swapnil Shinde
Hello, I am new to the Spark world and recently started exploring it in standalone mode. It would be great if I could get clarification on the questions below: 1. Driver locality - The documentation mentions that client deploy mode is not a good fit if the machine running spark-submit is not co-located with worker
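The deploy-mode trade-off raised above can be sketched as a small Python helper that picks `--deploy-mode` for `spark-submit` depending on whether the submitting machine sits near the workers. In client mode the driver runs inside the `spark-submit` process, so all driver-to-executor traffic crosses the link from the submitting machine; in cluster mode the driver is launched on a machine inside the cluster. The helper name `build_submit_args`, the master URL, jar path, and class name are hypothetical; `--master`, `--deploy-mode`, and `--class` are standard `spark-submit` options. The standalone cluster manager does not support cluster mode for Python applications, so this sketch assumes a JVM application jar.

```python
from typing import List, Optional


def build_submit_args(master_url: str, app_jar: str, main_class: str,
                      submitter_colocated: bool,
                      app_args: Optional[List[str]] = None) -> List[str]:
    """Assemble a spark-submit argument list (hypothetical helper).

    If the submitting machine is co-located with the workers (e.g. a gateway
    host), client mode is fine. Otherwise cluster mode is chosen so that the
    driver runs close to the executors it schedules tasks on.
    """
    deploy_mode = "client" if submitter_colocated else "cluster"
    args = [
        "spark-submit",
        "--master", master_url,        # e.g. a standalone master: spark://host:7077
        "--deploy-mode", deploy_mode,  # where the driver process runs
        "--class", main_class,         # entry point inside the application jar
        app_jar,
    ]
    # Application arguments must come after the jar path.
    args.extend(app_args or [])
    return args


if __name__ == "__main__":
    # Submitting from a laptop far from the cluster: prefer cluster mode.
    print(build_submit_args("spark://master-host:7077", "app.jar",
                            "com.example.Main", submitter_colocated=False))
```

The returned list could be passed to `subprocess.run` on a machine that has `spark-submit` on its `PATH`.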

Re: Spark driver locality

2015-08-27 Thread Rishitesh Mishra
Hi Swapnil, Let me try to answer some of the questions. Answers inline. Hope it helps. On Thursday, August 27, 2015, Swapnil Shinde swapnilushi...@gmail.com wrote: Hello, I am new to the Spark world and recently started exploring it in standalone mode. It would be great if I could get clarification on

Re: Spark driver locality

2015-08-27 Thread Swapnil Shinde
Thanks Rishitesh! 1. I get that the driver doesn't need to be on the master, but there is a lot of communication between the driver and the cluster. That's why a co-located gateway was recommended. How large is the impact of the driver not being co-located with the cluster? 4. How does an HDFS split get assigned to a worker node