At the moment your best bet for sharing SparkContexts across jobs is the
Ooyala job server: https://github.com/ooyala/spark-jobserver
It doesn't yet support Spark 1.0, though I did manage to amend it so that it
builds and runs on 1.0.
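
For reference, a job for the job server is just a Scala object implementing
its SparkJob trait, and its NamedRddSupport trait lets one job cache an RDD
that later jobs in the same long-running context can pick up. The sketch
below is from my recollection of the ooyala/spark-jobserver README, so treat
the exact trait names and namedRdds calls as assumptions to verify against
the version you build; SharedFileJob, "shared-file" and the input.path config
key are just placeholders.

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

// "SharedFileJob", "shared-file" and "input.path" are made-up names for illustration.
object SharedFileJob extends SparkJob with NamedRddSupport {

  // Reject the job up front if it was posted without the expected config key.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.path")) SparkJobValid
    else SparkJobInvalid("missing input.path")

  override def runJob(sc: SparkContext, config: Config): Any = {
    // Reuse the RDD cached by an earlier job in this shared context, or load
    // and cache the file if this is the first job to ask for it.
    // (namedRdds.get/update are as I remember them from the job server docs;
    // double-check against the API of the version you are running.)
    val data = namedRdds.get[String]("shared-file").getOrElse {
      val rdd = sc.textFile(config.getString("input.path")).cache()
      namedRdds.update("shared-file", rdd)
      rdd
    }
    data.count()
  }
}

Jobs are then posted to a long-running context over the job server's REST
API, so the 1TB file is loaded and cached once per context and every client
after that hits the in-memory copy rather than reloading it.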
On Wed, Jul 23, 2014 at 1:21 AM, Asaf Lahav wrote:
> Hi Folks,
> I have been trying to dig up information on the options for deploying
> more than one client process that consumes Spark.
> Let's say I have a Spark cluster of 10 servers and would like to set up 2
> additional servers that send requests to it through a Spark context,
> referencing one specific file of 1 TB of data.
> Each client process has its own SparkContext instance.
> Currently, the result is that the same file is loaded into memory twice,
> because SparkContext resources are not shared between processes/JVMs.
> I would rather not have that same file loaded over and over again with
> every new client that is introduced.
> What would be the best practice here? Am I missing something?
> Thank you,
> Asaf