Ashwin,
What is your motivation for needing to share RDDs between jobs? Optimizing
for reusing data across jobs?
If so, you may want to look into Tachyon. My understanding is that Tachyon
acts like a caching layer, and you can designate when data will be reused in
multiple jobs so it knows to keep that data in memory.
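In code terms, a minimal sketch of both flavors of reuse, assuming Spark 1.x
with spark.tachyonStore.url pointing at a Tachyon master; the host name,
port, and paths below are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext(new SparkConf().setAppName("producer"))
    val events = sc.textFile("hdfs:///data/events")

    // Within one application: cache off-heap in Tachyon so later jobs on
    // this SparkContext reuse the data without recomputing it.
    events.persist(StorageLevel.OFF_HEAP)

    // Across applications: write to Tachyon's filesystem so a separate
    // Spark app can read the same path back at memory speed.
    events.saveAsTextFile("tachyon://tachyon-master:19998/shared/events")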
Upvote for the multi-tenancy requirement.
I'm also building a data analytics platform, and there'll be multiple users
running queries and computations simultaneously. One of the pain points is
control of resource size. Users don't really know how many nodes they need,
so they always use as much as they can.
You may want to take a look at https://issues.apache.org/jira/browse/SPARK-3174.
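That JIRA tracks elastic scaling within a Spark application (dynamic
executor allocation). As a rough sketch of the knobs it introduces — the
property names follow that work, but the values here are purely
illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("adhoc-analytics")
      // Executors are requested and released based on load, so users
      // don't have to size their app up front.
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "50")
      // External shuffle service keeps shuffle files available after an
      // executor is released.
      .set("spark.shuffle.service.enabled", "true")
    val sc = new SparkContext(conf)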
On Thu, Oct 23, 2014 at 2:56 AM, Jianshi Huang jianshi.hu...@gmail.com wrote:
Upvote for the multi-tenancy requirement.
I'm also building a data analytics platform and there'll be multiple users
running queries and computations simultaneously.
Ashwin,
I would say the strategies in general are:
1) Have each user submit a separate Spark app (each with its own
SparkContext), with its own resource settings, and share data through HDFS
or something like Tachyon for speed.
2) Share a single SparkContext amongst multiple users, using fair
scheduling so that jobs from different users get a fair share of the
cluster (a sketch of this option follows below).
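Here's a minimal sketch of option 2, assuming the FAIR scheduler described
in the job-scheduling doc; the pool name and path are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // One long-lived SparkContext shared by all users, in FAIR mode.
    val conf = new SparkConf()
      .setAppName("shared-context")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Each user's thread tags its jobs with a scheduler pool, so jobs
    // from different users get a fair share instead of running FIFO.
    sc.setLocalProperty("spark.scheduler.pool", "userA")
    sc.textFile("hdfs:///data/userA").count()
    // Clear the pool when done so later jobs fall back to the default.
    sc.setLocalProperty("spark.scheduler.pool", null)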
Hi Spark devs/users,
One of the things we are investigating here at Netflix is whether Spark
would suit us for our ETL needs, and one of the requirements is
multi-tenancy.
I did read the official doc
(http://spark.apache.org/docs/latest/job-scheduling.html) and the book, but
I'm still not clear on certain aspects.
Hi Ashwin,
Let me try to answer to the best of my knowledge.
On Wed, Oct 22, 2014 at 11:47 AM, Ashwin Shankar
ashwinshanka...@gmail.com wrote:
Here are my questions:
1. Sharing a Spark context: how exactly can multiple users share the
cluster using the same Spark context?
That's not something you might want to do usually. In general, a
SparkContext maps to a user application.
Thanks Marcelo, that was helpful! I had some follow-up questions:
That's not something you might want to do usually. In general, a
SparkContext maps to a user application.
My question was basically about this page in the official doc
(http://spark.apache.org/docs/latest/job-scheduling.html), under
"Scheduling within an application".
On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
ashwinshanka...@gmail.com wrote:
That's not something you might want to do usually. In general, a
SparkContext maps to a user application.
My question was basically about this page in the official doc, under
"Scheduling within an application".
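To make that section of the doc concrete, here's a hedged sketch of what
"scheduling within an application" covers: several threads submitting jobs
against one SparkContext, which the FAIR scheduler then interleaves. The
user names and paths are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("multi-user").set("spark.scheduler.mode", "FAIR"))

    // One thread per user; with FAIR mode their jobs' stages are
    // interleaved rather than run strictly FIFO.
    val workers = Seq("alice", "bob").map { user =>
      new Thread(new Runnable {
        override def run(): Unit = {
          sc.setLocalProperty("spark.scheduler.pool", user)
          // Each count() is an independent job scheduled in this pool.
          sc.textFile(s"hdfs:///data/$user").count()
        }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())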