Hi Spark devs/users,

One of the things we are investigating here at Netflix is whether Spark would suit our ETL needs, and one of the requirements is multi-tenancy. I have read the official doc <http://spark.apache.org/docs/latest/job-scheduling.html> and the book, but I'm still not clear on certain things.
Here are my questions:

1. *Sharing a Spark context*: How exactly can multiple users share the cluster using the same Spark context? Say UserA wants to run AppA and UserB wants to run AppB. How do they talk to the same context? How exactly are each of their jobs scheduled and run within that context? Is preemption supported in this scenario? How are user names passed on to the Spark context?

2. *Different Spark contexts on YARN*: Assuming I have a YARN cluster with queues and preemption configured, are there problems if executors/containers of a Spark app are preempted to allow a higher-priority Spark app to execute? Would the preempted app get stuck, or would it continue to make progress? How are user names passed from Spark to YARN (say I'm using the nested user queues feature in the fair scheduler)?

3. *Sharing RDDs*: Can RDDs be shared across users in scenarios 1 and 2 above?

4. Anything else I should know about user/job isolation?

I know I'm asking a lot of questions. Thanks in advance :)!

--
Thanks,
Ashwin
Netflix
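To make question 1 concrete, here is the kind of setup I'm imagining, based on my reading of the job-scheduling doc: a single long-running SparkContext with `spark.scheduler.mode=FAIR` and a per-user pool file. The pool names `userA`/`userB` below are just examples I made up; please correct me if this isn't the intended multi-user pattern.

```xml
<?xml version="1.0"?>
<!-- fairscheduler.xml: one pool per user (pool names are hypothetical).
     Referenced via spark.scheduler.allocation.file on the shared context. -->
<allocations>
  <pool name="userA">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="userB">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```

My understanding is that each user's thread would then call `sc.setLocalProperty("spark.scheduler.pool", "userA")` before submitting jobs, so jobs from different threads land in different pools. But it's unclear to me how actual user identity gets attached beyond that per-thread property, and whether one pool's jobs can preempt another's rather than just fair-share with them; hence the question.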