Re: Multitenancy in Spark - within/across spark context

2014-10-25 Thread RJ Nowling
Ashwin, What is your motivation for needing to share RDDs between jobs? Optimizing for reusing data across jobs? If so, you may want to look into Tachyon. My understanding is that Tachyon acts like a caching layer and you can designate when data will be reused in multiple jobs so it know to keep

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Jianshi Huang
Upvote for the multitanency requirement. I'm also building a data analytic platform and there'll be multiple users running queries and computations simultaneously. One of the paint point is control of resource size. Users don't really know how much nodes they need, they always use as much as

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Marcelo Vanzin
You may want to take a look at https://issues.apache.org/jira/browse/SPARK-3174. On Thu, Oct 23, 2014 at 2:56 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: Upvote for the multitanency requirement. I'm also building a data analytic platform and there'll be multiple users running queries and

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Evan Chan
Ashwin, I would say the strategies in general are: 1) Have each user submit separate Spark app (each its own Spark Context), with its own resource settings, and share data through HDFS or something like Tachyon for speed. 2) Share a single spark context amongst multiple users, using fair

Multitenancy in Spark - within/across spark context

2014-10-22 Thread Ashwin Shankar
Hi Spark devs/users, One of the things we are investigating here at Netflix is if Spark would suit us for our ETL needs, and one of requirements is multi tenancy. I did read the official doc http://spark.apache.org/docs/latest/job-scheduling.html and the book, but I'm still not clear on certain

Re: Multitenancy in Spark - within/across spark context

2014-10-22 Thread Marcelo Vanzin
Hi Ashwin, Let me try to answer to the best of my knowledge. On Wed, Oct 22, 2014 at 11:47 AM, Ashwin Shankar ashwinshanka...@gmail.com wrote: Here are my questions : 1. Sharing spark context : How exactly multiple users can share the cluster using same spark context ? That's not

Re: Multitenancy in Spark - within/across spark context

2014-10-22 Thread Ashwin Shankar
Thanks Marcelo, that was helpful ! I had some follow up questions : That's not something you might want to do usually. In general, a SparkContext maps to a user application My question was basically this. In this http://spark.apache.org/docs/latest/job-scheduling.html page in the official doc,

Re: Multitenancy in Spark - within/across spark context

2014-10-22 Thread Marcelo Vanzin
On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar ashwinshanka...@gmail.com wrote: That's not something you might want to do usually. In general, a SparkContext maps to a user application My question was basically this. In this page in the official doc, under Scheduling within an application