Re: Spark clustered client

2014-07-22 Thread Nick Pentreath
At the moment, your best bet for sharing SparkContexts across jobs is the 
Ooyala job server: https://github.com/ooyala/spark-jobserver


It doesn't yet support Spark 1.0, though I did manage to amend it to build 
and run on 1.0.
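
For reference, a job under the job server looks roughly like the sketch below 
(assuming the spark.jobserver.SparkJob trait; the object name, input path, and 
config key are placeholders, not taken from the project). Every job submitted 
to the same long-lived context is handed the server's SparkContext in runJob(), 
so an RDD cached by one request can be reused by the next:

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

    // Hypothetical example job; "input.path" is a placeholder config key.
    object CountLinesJob extends SparkJob {
      // The job server calls validate() before runJob(); reject bad requests here.
      override def validate(sc: SparkContext, config: Config): SparkJobValidation =
        if (config.hasPath("input.path")) SparkJobValid
        else SparkJobInvalid("missing input.path")

      // runJob() receives the SparkContext owned by the job server, so jobs
      // submitted to the same named context can share cached RDDs.
      override def runJob(sc: SparkContext, config: Config): Any =
        sc.textFile(config.getString("input.path")).cache().count()
    }

The usual flow is to create a long-lived context once via the REST API and then 
submit each job against that context name, so the 1 TB file only needs to be 
cached once per context rather than once per client JVM.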

On Wed, Jul 23, 2014 at 1:21 AM, Asaf Lahav  wrote:

> Hi Folks,
> I have been trying to dig up information about the options for deploying
> more than one client process that consumes Spark.
> Let's say I have a Spark cluster of 10 servers and would like to set up 2
> additional servers that send requests to it through a SparkContext,
> referencing one specific file of 1 TB of data.
> Each client process has its own SparkContext instance.
> Currently, the result is that the same file is loaded into memory twice,
> because SparkContext resources are not shared between processes/JVMs.
> I wouldn't like to have that same file loaded over and over again as new
> clients are introduced.
> What would be the best practice here? Am I missing something?
> Thank you,
> Asaf

Spark clustered client

2014-07-22 Thread Asaf Lahav
Hi Folks,

I have been trying to dig up information about the options for deploying
more than one client process that consumes Spark.

Let's say I have a Spark cluster of 10 servers and would like to set up 2
additional servers that send requests to it through a SparkContext,
referencing one specific file of 1 TB of data.

Each client process has its own SparkContext instance.
Currently, the result is that the same file is loaded into memory twice,
because SparkContext resources are not shared between processes/JVMs.
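
In other words, each client JVM does roughly the following (the master URL and
file path are just placeholders), and each one ends up caching its own copy of
the file:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of one client process; a second JVM running the same code builds
    // a second SparkContext and caches the file a second time, because cached
    // RDDs are scoped to a single SparkContext.
    object ClientProcess {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("client-" + args(0))
          .setMaster("spark://master:7077") // placeholder master URL
        val sc = new SparkContext(conf)
        val data = sc.textFile("hdfs:///data/big-file").cache() // placeholder path
        println(data.count())
        sc.stop()
      }
    }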


I wouldn't like to have that same file loaded over and over again as new
clients are introduced.
What would be the best practice here? Am I missing something?

Thank you,
Asaf