Hi Spark devs/users,
One of the things we are investigating here at Netflix is whether Spark would
suit our ETL needs, and one of our requirements is multi-tenancy.
I have read the official doc
<http://spark.apache.org/docs/latest/job-scheduling.html> and the book, but
I'm still not clear on a few things.

Here are my questions:
1. *Sharing a SparkContext*: How exactly can multiple users share the
cluster through the same SparkContext? UserA wants to run AppA, UserB wants
to run AppB. How do they talk to the same context? How exactly is each of
their jobs scheduled and run within that context? Is preemption supported in
this scenario? And how are user names passed on to the SparkContext?
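
To make (1) concrete, here's a minimal sketch of what I imagine a shared
context would look like, based on the fair scheduler pools described in the
scheduling doc. The pool names, the allocation file path, and the jobs
themselves are just placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    object SharedContextSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("shared-context-sketch")
          .set("spark.scheduler.mode", "FAIR") // schedule concurrent jobs fairly
          // pool weights/minShares would live here; path is a placeholder
          .set("spark.scheduler.allocation.file", "/etc/spark/fairscheduler.xml")
        val sc = new SparkContext(conf)

        // One thread per "user": the pool is a thread-local property, so jobs
        // submitted from each thread land in that user's pool.
        def runAs(pool: String)(job: => Unit): Thread = {
          val t = new Thread(new Runnable {
            def run(): Unit = {
              sc.setLocalProperty("spark.scheduler.pool", pool)
              job
            }
          })
          t.start()
          t
        }

        val a = runAs("userA") { sc.parallelize(1 to 1000000).map(_ * 2).count() }
        val b = runAs("userB") { sc.parallelize(1 to 1000000).filter(_ % 2 == 0).count() }
        a.join(); b.join()
        sc.stop()
      }
    }

I imagine some long-running server process would own the context on behalf of
everyone, but I don't see how user identity flows into it beyond the pool
name, hence the question.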

2. *Different SparkContexts on YARN*: Assuming I have a YARN cluster with
queues and preemption configured, are there problems if the
executors/containers of a Spark app are preempted to let a higher-priority
Spark app run? Would the preempted app get stuck, or would it continue to
make progress? How are user names passed on from Spark to YARN (say I'm
using the nested user queues feature of the fair scheduler)?
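
For (2), this is roughly how I'd expect each app to target a queue; the queue
name is a placeholder for whatever the fair scheduler allocations define
(equivalently I'd pass --queue to spark-submit):

    import org.apache.spark.{SparkConf, SparkContext}

    // Each app is its own SparkContext, submitted into a YARN queue.
    object YarnQueueSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("high-priority-etl")
          .set("spark.yarn.queue", "prod.high") // YARN queue for this app
        val sc = new SparkContext(conf)
        sc.parallelize(1 to 1000).count() // placeholder work
        sc.stop()
      }
    }

What I can't tell from the docs is what happens to this app's in-flight
stages when YARN preempts its containers to feed a sibling queue.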

3. *Sharing RDDs*: Can RDDs be shared between users in scenarios 1 and 2
above, and if so, how?
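
For (3), within a single context (scenario 1) I'd expect something like the
snippet below, reusing the sc from the sketch under (1), since a cached RDD
is just an object reference inside the shared driver. Across separate
contexts (scenario 2) my understanding is that RDDs can't be shared directly
and would have to round-trip through storage; the HDFS path is a placeholder:

    // Cached once, then reused by both users' threads in the same driver.
    val events = sc.textFile("hdfs:///data/events").cache()

    // userA's thread:
    val plays = events.filter(_.contains("play")).count()
    // userB's thread:
    val pauses = events.filter(_.contains("pause")).count()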

4. Anything else I should know about user/job isolation?

I know I'm asking a lot of questions. Thanks in advance! :)

-- 
Thanks,
Ashwin
Netflix
