I believe the Spark Job Server by Ooyala can help you share data across
multiple jobs; take a look at
http://engineering.ooyala.com/blog/open-sourcing-our-spark-job-server. It
seems to fit closely with what you need.
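For reference, jobs submitted to the Job Server implement its SparkJob
trait and are handed a SparkContext that the server creates and reuses,
which is what makes sharing cached data across jobs possible. Here is a
rough sketch based on my reading of the project's README (the word-count
logic and names below are illustrative, so double-check against the docs):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

import scala.util.Try

object WordCountJob extends SparkJob {
  // The server calls validate() first so bad requests fail fast
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    Try(config.getString("input.string"))
      .map(_ => SparkJobValid)
      .getOrElse(SparkJobInvalid("missing input.string parameter"))

  // The SparkContext is owned by the server, not by the job, so RDDs
  // cached in a long-lived context remain available to later jobs
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(config.getString("input.string").split(" ").toSeq)
      .countByValue()
}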
Best Regards,
Sonal
Founder, Nube Technologies http://www.nubetech.co
For sharing RDDs across multiple jobs, you could also have a look at
Tachyon. It provides an HDFS-compatible in-memory storage layer that keeps
data in memory across multiple jobs/frameworks:
http://tachyon-project.org/.
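To make that concrete, here is a minimal sketch of sharing an RDD through
Tachyon (the host and paths are placeholders, sc is an already-created
SparkContext, and the Tachyon client has to be on Spark's classpath for
the tachyon:// scheme to resolve; 19998 is Tachyon's default port):

// Job A: write the RDD out through the HDFS-compatible interface
val events = sc.textFile("hdfs:///input/events")
events.saveAsTextFile("tachyon://tachyon-master:19998/shared/events")

// Job B: a separate SparkContext (or another framework entirely)
// reads the same data straight from Tachyon's in-memory store
val shared = sc.textFile("tachyon://tachyon-master:19998/shared/events")
println(shared.count())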
On Tue, Nov 11, 2014 at 8:11 AM, Sonal Goyal sonalgoy...@gmail.com wrote:
David,
Here is what I would suggest:
1 - Does a new SparkContext get created in the web tier for each new request
for processing?
Create a single SparkContext that gets shared across multiple web requests.
Depending on the framework you are using for the web tier, it should not be
difficult to set this up.
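A minimal sketch of what I mean (the class names, master URL, and paths
are hypothetical; the point is that the context is created once per JVM,
lazily, and reused by every request handler):

import org.apache.spark.{SparkConf, SparkContext}

// One SparkContext for the whole web tier. Spark only allows one active
// context per JVM anyway, so make the sharing explicit with a singleton.
object SharedSpark {
  lazy val sc: SparkContext = new SparkContext(
    new SparkConf()
      .setAppName("web-tier")
      .setMaster("spark://master:7077")) // placeholder cluster URL
}

object WebTier {
  // Every request reuses the same context instead of creating its own
  def handleRequest(path: String): Long =
    SharedSpark.sc.textFile("hdfs:///data/" + path).count()
}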
Hi,
there is also Spindle (https://github.com/adobe-research/spindle), which was
introduced on this list some time ago. I haven't looked into it deeply, but
you might gain some valuable insights from their architecture; they are
also using Spark to serve requests coming from the web.
Tobias