Re: Sharing spark executor pool across multiple long running spark applications

2018-02-10 Thread Nirav Patel
I did take a look at SJC earlier. It does look like it fits our use case, and
it seems to be integrated in DataStax too. Apache Livy looks promising as
well. I will look into these further.
I think for a real-time app that needs sub-second latency, Spark dynamic
allocation won't work.

Thanks!

On Wed, Feb 7, 2018 at 6:37 AM, Vadim Semenov  wrote:

> The other way might be to launch a single SparkContext and then run jobs
> inside of it.
>
> You can take a look at these projects:
> - https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
> - http://livy.incubator.apache.org
>
> Problems with this way:
> - Can't update the code of your job.
> - A single job can break the SparkContext.
>
>
> We evaluated this way and decided to go with dynamic allocation,
> but we also had to rethink the way we write our jobs:
> - Can't use caching, since it locks executors in place; we have to use
> checkpointing instead, which adds to computation time.
> - Use some unconventional methods, like reusing the same DF to write out
> multiple separate outputs in one go.
> - Sometimes remove executors from within the job, e.g. once we know how
> many we actually need, so the freed executors can join other jobs.
>
> On Tue, Feb 6, 2018 at 3:00 PM, Nirav Patel  wrote:
>
>> Currently a SparkContext and its executor pool are not shareable. Each
>> SparkContext gets its own executor pool for the entire life of an
>> application. So what are the best ways to share cluster resources across
>> multiple long running spark applications?
>>
>> The only one I see is Spark dynamic allocation, but it has high latency
>> when it comes to real-time applications.



Re: Sharing spark executor pool across multiple long running spark applications

2018-02-07 Thread Vadim Semenov
The other way might be to launch a single SparkContext and then run jobs
inside of it.
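
As a minimal sketch of that approach with plain Spark (no job server), here is
what a single long-lived SparkContext serving concurrently submitted jobs could
look like. The FAIR scheduler setting, the pool names, and the toy jobs are
assumptions for illustration, not something from this thread:

import org.apache.spark.{SparkConf, SparkContext}

object SharedContextSketch {
  def main(args: Array[String]): Unit = {
    // One application, one long-lived SparkContext, one executor pool.
    val conf = new SparkConf()
      .setAppName("shared-context-sketch")
      .set("spark.scheduler.mode", "FAIR") // lets concurrent jobs share the executors

    val sc = new SparkContext(conf)

    // Each piece of work runs on its own thread but inside the same context,
    // so it reuses the already-allocated executors with no startup latency.
    val lowLatency = new Thread(() => {
      sc.setLocalProperty("spark.scheduler.pool", "realtime") // assumed pool name
      println(sc.parallelize(1 to 100).sum())
    })
    val batch = new Thread(() => {
      sc.setLocalProperty("spark.scheduler.pool", "batch") // assumed pool name
      println(sc.parallelize(1 to 1000000).map(_ * 2).count())
    })

    lowLatency.start(); batch.start()
    lowLatency.join(); batch.join()
    sc.stop()
  }
}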

You can take a look at these projects:
- https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
- http://livy.incubator.apache.org
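
For the Livy route specifically, the shared context lives in a Livy session and
work is posted to it over REST. A rough sketch follows; the host/port, the
session options, the submitted snippet, and the hard-coded session id are all
assumptions for brevity:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object LivySketch {
  private val livyUrl = "http://livy-host:8998" // assumption: Livy server location
  private val client = HttpClient.newHttpClient()

  private def post(path: String, json: String): String = {
    val request = HttpRequest.newBuilder(URI.create(livyUrl + path))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()
    client.send(request, HttpResponse.BodyHandlers.ofString()).body()
  }

  def main(args: Array[String]): Unit = {
    // 1. Create one interactive session; it holds the long-lived SparkContext.
    println(post("/sessions", """{"kind": "spark", "numExecutors": 4, "executorCores": 2}"""))

    // 2. Later, and repeatedly, run work inside that same context by posting
    //    statements against the session id returned above (0 used here).
    println(post("/sessions/0/statements", """{"code": "sc.parallelize(1 to 1000).sum()"}"""))
  }
}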

Problems with this way:
- Can't update the code of your job.
- A single job can break the SparkContext.


We evaluated this way and decided to go with dynamic allocation,
but we also had to rethink the way we write our jobs (a rough sketch follows
this list):
- Can't use caching, since it locks executors in place; we have to use
checkpointing instead, which adds to computation time.
- Use some unconventional methods, like reusing the same DF to write out
multiple separate outputs in one go.
- Sometimes remove executors from within the job, e.g. once we know how
many we actually need, so the freed executors can join other jobs.
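
A sketch tying those adjustments together, assuming dynamic allocation is
enabled; the checkpoint directory, output paths, and executor ids are made up:

import org.apache.spark.sql.SparkSession

object DynamicAllocationFriendlyJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("checkpoint-not-cache").getOrCreate()
    val sc = spark.sparkContext
    sc.setCheckpointDir("hdfs:///tmp/checkpoints") // assumption: a shared, durable location

    val df = spark.range(0, 1000000).toDF("id")

    // Instead of df.cache() (which keeps blocks pinned on executors, so they
    // can't be released), materialize the lineage once with a reliable checkpoint.
    val checkpointed = df.checkpoint()

    // Reuse the same checkpointed DF for several outputs without recomputing
    // the upstream lineage.
    checkpointed.write.mode("overwrite").parquet("hdfs:///tmp/out/full")
    checkpointed.filter("id % 2 = 0").write.mode("overwrite").parquet("hdfs:///tmp/out/even")

    // Once we know the remaining work needs fewer executors, hand some back so
    // other applications can use them. killExecutors is a developer API and
    // requires dynamic allocation; the ids here are illustrative.
    sc.killExecutors(Seq("1", "2"))

    spark.stop()
  }
}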

On Tue, Feb 6, 2018 at 3:00 PM, Nirav Patel  wrote:

> Currently a SparkContext and its executor pool are not shareable. Each
> SparkContext gets its own executor pool for the entire life of an
> application. So what are the best ways to share cluster resources across
> multiple long running spark applications?
>
> The only one I see is Spark dynamic allocation, but it has high latency
> when it comes to real-time applications.


Sharing spark executor pool across multiple long running spark applications

2018-02-06 Thread Nirav Patel
Currently a SparkContext and its executor pool are not shareable. Each
SparkContext gets its own executor pool for the entire life of an application.
So what are the best ways to share cluster resources across multiple long
running spark applications?

The only one I see is Spark dynamic allocation, but it has high latency when
it comes to real-time applications.
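
For context on where that latency comes from: with dynamic allocation,
executors are only requested once a task backlog has persisted for a while and
are released after sitting idle, both governed by configuration. A sketch of
the relevant knobs (standard Spark property names, purely illustrative values;
keeping a warm minExecutors floor is one way to blunt the ramp-up delay):

import org.apache.spark.{SparkConf, SparkContext}

object DynAllocConfigSketch {
  def main(args: Array[String]): Unit = {
    // Standard dynamic-allocation properties; the values are illustrative only.
    val conf = new SparkConf()
      .setAppName("dyn-alloc-sketch")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")                 // external shuffle service is required
      .set("spark.dynamicAllocation.minExecutors", "2")             // warm floor of executors kept around
      .set("spark.dynamicAllocation.maxExecutors", "50")
      .set("spark.dynamicAllocation.executorIdleTimeout", "120s")   // how long idle executors are retained
      .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s") // delay before requesting more executors

    val sc = new SparkContext(conf)
    // ... application work ...
    sc.stop()
  }
}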
