Hi Praveen, have you checked out this presentation? It might have the details you need:
https://spark-summit.org/2014/wp-content/uploads/2014/07/Spark-Job-Server-Easy-Spark-Job-Management-Chan-Chu.pdf

Best Regards,
Jia


On Jan 19, 2016, at 7:28 AM, praveen S <mylogi...@gmail.com> wrote:

> Can you give me more details on Spark's jobserver?
> 
> Regards, 
> Praveen
> 
> On 18 Jan 2016 03:30, "Jia" <jacqueline...@gmail.com> wrote:
> I guess all jobs submitted through JobServer are executed in the same JVM, so 
> RDDs cached by one job can be visible to all other jobs executed later.
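> 
> For example (just a sketch based on the spark-jobserver project's NamedRddSupport 
> API as I remember it -- trait and method names may differ in your version), one 
> job could cache and name an RDD, and a later job submitted to the same context 
> could pick it up:
> 
> import com.typesafe.config.Config
> import org.apache.spark.SparkContext
> import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}
> 
> // Job 1: builds an RDD, caches it, and registers it under a name.
> object BuildSharedRdd extends SparkJob with NamedRddSupport {
>   override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid
>   override def runJob(sc: SparkContext, config: Config): Any = {
>     val rdd = sc.textFile("hdfs:///data/input").cache()   // hypothetical path
>     namedRdds.update("shared-rdd", rdd)
>     rdd.count()
>   }
> }
> 
> // Job 2: submitted later, runs against the same context and reuses the cached RDD.
> object UseSharedRdd extends SparkJob with NamedRddSupport {
>   override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid
>   override def runJob(sc: SparkContext, config: Config): Any = {
>     val shared = namedRdds.get[String]("shared-rdd").get
>     shared.filter(_.contains("ERROR")).count()
>   }
> }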
> On Jan 17, 2016, at 3:56 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> 
>> Yes, that is one of the basic reasons to use a 
>> jobserver/shared-SparkContext.  Otherwise, in order to share the data in an RDD 
>> you have to use an external storage system, such as a distributed filesystem 
>> or Tachyon.
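>> 
>> For instance, without a shared SparkContext you would have to do something 
>> roughly like this (a minimal sketch; the paths and the two contexts scA/scB 
>> are hypothetical):
>> 
>> // Application A: write the RDD out before its SparkContext shuts down.
>> val rddA = scA.textFile("hdfs:///data/input")
>> rddA.saveAsTextFile("hdfs:///tmp/shared-data")
>> 
>> // Application B: a separate SparkContext (separate JVMs) reads it back in.
>> val rddB = scB.textFile("hdfs:///tmp/shared-data")
>> rddB.count()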
>> 
>> On Sun, Jan 17, 2016 at 1:52 PM, Jia <jacqueline...@gmail.com> wrote:
>> Thanks, Mark. Then, I guess JobServer can fundamentally solve my problem, so 
>> that jobs can be submitted at different times and still share RDDs.
>> 
>> Best Regards,
>> Jia
>> 
>> 
>> On Jan 17, 2016, at 3:44 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>> 
>>> There is a 1-to-1 relationship between Spark Applications and SparkContexts 
>>> -- fundamentally, a Spark Application is a program that creates and uses a 
>>> SparkContext, and that SparkContext is destroyed when the Application 
>>> ends.  A jobserver generically, and the Spark JobServer specifically, is an 
>>> Application that keeps a SparkContext open for a long time and allows many 
>>> Jobs to be submitted and run using that shared SparkContext.
>>> 
>>> More than one Application/SparkContext unavoidably implies more than one 
>>> JVM process per Worker -- Applications/SparkContexts cannot share JVM 
>>> processes.  
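>>> 
>>> In code terms, a jobserver-style Application is just a long-running driver, 
>>> roughly like this (a minimal sketch using only the core Spark API; the path 
>>> and app name are made up):
>>> 
>>> import org.apache.spark.{SparkConf, SparkContext}
>>> 
>>> object LongRunningApp {
>>>   def main(args: Array[String]): Unit = {
>>>     // One Application == one SparkContext, created once and kept open.
>>>     val sc = new SparkContext(new SparkConf().setAppName("shared-context"))
>>> 
>>>     // Data cached once stays available to every Job run through this context.
>>>     val data = sc.textFile("hdfs:///data/input").cache()
>>> 
>>>     // Each action below is a separate Spark Job sharing the same executors.
>>>     val total  = data.count()
>>>     val errors = data.filter(_.contains("ERROR")).count()
>>> 
>>>     // The executor JVMs (and the cached data) live until the context is stopped.
>>>     sc.stop()
>>>   }
>>> }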
>>> 
>>> On Sun, Jan 17, 2016 at 1:15 PM, Jia <jacqueline...@gmail.com> wrote:
>>> Hi, Mark, sorry for the confusion.
>>> 
>>> Let me clarify: when an application is submitted, the master will tell each 
>>> Spark worker to spawn an executor JVM process. All of the application's task 
>>> sets will be executed by that executor, and after the application runs to 
>>> completion, the executor process is killed.
>>> But I hope that all submitted applications can run in the same executor -- 
>>> can JobServer do that? If so, it’s really good news!
>>> 
>>> Best Regards,
>>> Jia
>>> 
>>> On Jan 17, 2016, at 3:09 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>> 
>>>> You've still got me confused.  The SparkContext exists at the Driver, not 
>>>> on an Executor.
>>>> 
>>>> Many Jobs can be run by a SparkContext -- it is a common pattern to use 
>>>> something like the Spark Jobserver where all Jobs are run through a shared 
>>>> SparkContext.
>>>> 
>>>> On Sun, Jan 17, 2016 at 12:57 PM, Jia Zou <jacqueline...@gmail.com> wrote:
>>>> Hi, Mark, sorry, I mean SparkContext.
>>>> I want to change Spark so that all submitted jobs (SparkContexts) run in 
>>>> one executor JVM.
>>>> 
>>>> Best Regards,
>>>> Jia
>>>> 
>>>> On Sun, Jan 17, 2016 at 2:21 PM, Mark Hamstra <m...@clearstorydata.com> 
>>>> wrote:
>>>> -dev
>>>> 
>>>> What do you mean by JobContext?  That is a Hadoop MapReduce concept, not a 
>>>> Spark one.
>>>> 
>>>> On Sun, Jan 17, 2016 at 7:29 AM, Jia Zou <jacqueline...@gmail.com> wrote:
>>>> Dear all,
>>>> 
>>>> Is there a way to reuse an executor JVM across different JobContexts? Thanks.
>>>> 
>>>> Best Regards,
>>>> Jia
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
