Re: Sharing spark executor pool across multiple long running spark applications

2018-02-10 Thread Nirav Patel
I did take a look at SJC earlier. It does look like it fits our use case. It
seems to be integrated in DataStax too. Apache Livy looks promising as well. I
will look into these further.
I think for a real-time app that needs subsecond latency, Spark dynamic
allocation won't work.
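
Here is a minimal sketch of the settings involved, assuming YARN with the
external shuffle service; the app name and values are just illustrative:

  import org.apache.spark.sql.SparkSession

  // Illustrative values only; tune for your cluster.
  val spark = SparkSession.builder()
    .appName("dynamic-allocation-example")               // hypothetical name
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")     // required for dynamic allocation
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "50")
    // Extra executors are requested only after tasks have been backlogged for
    // this long (default 1s), and each new executor still pays container and
    // JVM startup time; that ramp-up is where the latency goes.
    .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
    // Idle executors are released after this timeout (default 60s), so a
    // bursty app keeps paying the ramp-up cost over and over.
    .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
    .getOrCreate()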

Thanks!

On Wed, Feb 7, 2018 at 6:37 AM, Vadim Semenov  wrote:

> The other way might be to launch a single SparkContext and then run jobs
> inside of it.
>
> You can take a look at these projects:
> - https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
> - http://livy.incubator.apache.org
>
> Problems with this way:
> - Can't update the code of your job.
> - A single job can break the SparkContext.
>
>
> We evaluated this way and decided to go with the dynamic allocation,
> but we also had to rethink the way we write our jobs:
> - Can't use caching since it locks executors; have to use checkpointing,
> which adds to computation time.
> - Use some unconventional methods like reusing the same DF to write out
> multiple separate things in one go.
> - Sometimes remove executors from within the job, e.g. when we know how
> many we will need, so the freed executors can join other jobs.
>
> On Tue, Feb 6, 2018 at 3:00 PM, Nirav Patel  wrote:
>
>> Currently a SparkContext and its executor pool are not shareable. Each
>> SparkContext gets its own executor pool for the entire life of an application.
>> So what are the best ways to share cluster resources across multiple
>> long-running Spark applications?
>>
>> The only one I see is Spark dynamic allocation, but it has high latency when
>> it comes to real-time applications.
>>


Re: Sharing spark executor pool across multiple long running spark applications

2018-02-07 Thread Vadim Semenov
The other way might be to launch a single SparkContext and then run jobs
inside of it.
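
For illustration, a rough Scala sketch of that pattern using the FAIR
scheduler and one thread per request; the app name, pool names, and input
paths below are made up, and this is not spark-jobserver's or Livy's actual
API:

  import org.apache.spark.sql.SparkSession
  import scala.concurrent.{Await, Future}
  import scala.concurrent.duration._
  import scala.concurrent.ExecutionContext.Implicits.global

  // One long-lived context shared by all incoming work; FAIR scheduling lets
  // concurrent jobs share the executor pool instead of running strictly FIFO.
  val spark = SparkSession.builder()
    .appName("shared-context")                  // hypothetical app name
    .config("spark.scheduler.mode", "FAIR")
    .getOrCreate()
  val sc = spark.sparkContext

  // Each "request" runs as a normal Spark job inside the same context,
  // tagged with its own scheduler pool so one heavy job can't starve the rest.
  def runJob(requestId: String, path: String): Long = {
    sc.setLocalProperty("spark.scheduler.pool", s"pool-$requestId")  // thread-local
    spark.read.textFile(path).count()
  }

  // Two concurrent requests reuse the already-running executors, so neither
  // pays a per-application startup cost.
  val jobs = Seq("req-1" -> "/data/a", "req-2" -> "/data/b").map {
    case (id, path) => Future(runJob(id, path))
  }
  Await.result(Future.sequence(jobs), 10.minutes).foreach(println)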

You can take a look at these projects:
- https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
- http://livy.incubator.apache.org

Problems with this way:
- Can't update the code of your job.
- A single job can break the SparkContext.


We evaluated this way and decided to go with the dynamic allocation,
but we also had to rethink the way we write our jobs:
- Can't use caching since it locks executors; have to use checkpointing,
which adds to computation time.
- Use some unconventional methods like reusing the same DF to write out
multiple separate things in one go.
- Sometimes remove executors from within the job, e.g. when we know how
many we will need, so the freed executors can join other jobs.
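
A rough sketch of two of those points (checkpointing instead of caching, and
releasing executors from inside the job); the checkpoint dir, input path, and
executor IDs below are placeholders, and killExecutors is a @DeveloperApi
that only has an effect when dynamic allocation is enabled:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().getOrCreate()
  val sc = spark.sparkContext

  // Checkpoint instead of cache: materializes the data to reliable storage
  // (extra IO/recomputation cost) but doesn't pin executors the way caching does.
  sc.setCheckpointDir("hdfs:///tmp/app-checkpoints")      // placeholder path
  val events = spark.read.parquet("hdfs:///data/events")  // placeholder input
  val materialized = events.checkpoint()                  // eager by default

  // Hand executors back once we know how much parallelism the rest of the
  // job needs, so other applications can pick them up.
  val executorsToRelease = Seq("7", "8")                  // hypothetical executor IDs
  sc.killExecutors(executorsToRelease)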

On Tue, Feb 6, 2018 at 3:00 PM, Nirav Patel  wrote:

> Currently a SparkContext and its executor pool are not shareable. Each
> SparkContext gets its own executor pool for the entire life of an application.
> So what are the best ways to share cluster resources across multiple
> long-running Spark applications?
>
> The only one I see is Spark dynamic allocation, but it has high latency when it
> comes to real-time applications.
>