I guess all jobs submitted through JobServer are executed in the same JVM, so 
RDDs cached by one job can be visible to all other jobs executed later.
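
For example, with spark-jobserver's named-RDD support (a rough sketch only; the
exact trait and method names depend on the jobserver version), one job can cache
and register an RDD, and a later job submitted to the same context can pick it up:

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

    object CacheJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid
      override def runJob(sc: SparkContext, config: Config): Any = {
        val data = sc.parallelize(1 to 1000000).cache()
        namedRdds.update("shared-data", data)   // cache and register under a name
        data.count()
      }
    }

    object ReuseJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid
      override def runJob(sc: SparkContext, config: Config): Any = {
        // Same JVM and same SparkContext, so the cached blocks are still there.
        val data = namedRdds.get[Int]("shared-data").get
        data.sum()
      }
    }
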
On Jan 17, 2016, at 3:56 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> Yes, that is one of the basic reasons to use a jobserver/shared-SparkContext. 
>  Otherwise, in order to share the data in an RDD you have to use an external 
> storage system, such as a distributed filesystem or Tachyon.
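> 
> A rough sketch of that alternative, with an illustrative HDFS path (any shared
> storage reachable from both applications would do):
> 
>     import org.apache.spark.{SparkConf, SparkContext}
> 
>     // Producing application: write the RDD's data out before its context stops.
>     val producer = new SparkContext(new SparkConf().setAppName("producer"))
>     producer.parallelize(1 to 1000).saveAsTextFile("hdfs:///tmp/shared-rdd")
>     producer.stop()
> 
>     // Consuming application (a different JVM and SparkContext): reload from the same path.
>     val consumer = new SparkContext(new SparkConf().setAppName("consumer"))
>     val reloaded = consumer.textFile("hdfs:///tmp/shared-rdd").map(_.toInt)
>     println(reloaded.sum())
>     consumer.stop()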
> 
> On Sun, Jan 17, 2016 at 1:52 PM, Jia <jacqueline...@gmail.com> wrote:
> Thanks, Mark. Then, I guess JobServer can fundamentally solve my problem, so 
> that jobs can be submitted at different times and still share RDDs.
> 
> Best Regards,
> Jia
> 
> 
> On Jan 17, 2016, at 3:44 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> 
>> There is a 1-to-1 relationship between Spark Applications and SparkContexts 
>> -- fundamentally, a Spark Application is a program that creates and uses a 
>> SparkContext, and that SparkContext is destroyed when the Application ends. 
>>  A jobserver generically, and the Spark JobServer specifically, is an 
>> Application that keeps a SparkContext open for a long time and allows many 
>> Jobs to be submitted and run using that shared SparkContext.
>> 
>> More than one Application/SparkContext unavoidably implies more than one JVM 
>> process per Worker -- Applications/SparkContexts cannot share JVM processes. 
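>> 
>> A minimal sketch of that lifecycle (the application name is only illustrative):
>> 
>>     import org.apache.spark.{SparkConf, SparkContext}
>> 
>>     object MyApp {
>>       def main(args: Array[String]): Unit = {
>>         // The Application creates its one SparkContext...
>>         val sc = new SparkContext(new SparkConf().setAppName("my-app"))
>>         try {
>>           println(sc.parallelize(1 to 100).sum())
>>         } finally {
>>           // ...and stopping it ends the Application, along with its executor processes.
>>           sc.stop()
>>         }
>>       }
>>     }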
>>  
>> 
>> On Sun, Jan 17, 2016 at 1:15 PM, Jia <jacqueline...@gmail.com> wrote:
>> Hi, Mark, sorry for the confusion.
>> 
>> Let me clarify: when an application is submitted, the master will tell each 
>> Spark worker to spawn an executor JVM process. All the task sets of the 
>> application will be executed by that executor, and after the application runs 
>> to completion, the executor process will be killed.
>> But I hope that all submitted applications can run in the same executor; can 
>> JobServer do that? If so, it's really good news!
>> 
>> Best Regards,
>> Jia
>> 
>> On Jan 17, 2016, at 3:09 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>> 
>>> You've still got me confused.  The SparkContext exists at the Driver, not 
>>> on an Executor.
>>> 
>>> Many Jobs can be run by a SparkContext -- it is a common pattern to use 
>>> something like the Spark Jobserver where all Jobs are run through a shared 
>>> SparkContext.
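>>> 
>>> A quick sketch of that pattern: each action below is a separate Job, but they
>>> all run through the single SparkContext held at the driver and reuse its
>>> executors (the app name is only illustrative):
>>> 
>>>     import org.apache.spark.{SparkConf, SparkContext}
>>> 
>>>     val sc = new SparkContext(new SparkConf().setAppName("shared-context"))
>>>     val data = sc.parallelize(1 to 1000000).cache()
>>> 
>>>     val job1 = data.count()                      // Job 1
>>>     val job2 = data.filter(_ % 2 == 0).count()   // Job 2
>>>     val job3 = data.map(_ * 2).sum()             // Job 3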
>>> 
>>> On Sun, Jan 17, 2016 at 12:57 PM, Jia Zou <jacqueline...@gmail.com> wrote:
>>> Hi, Mark, sorry, I mean SparkContext.
>>> I want to change Spark so that all submitted jobs (SparkContexts) run in 
>>> one executor JVM.
>>> 
>>> Best Regards,
>>> Jia
>>> 
>>> On Sun, Jan 17, 2016 at 2:21 PM, Mark Hamstra <m...@clearstorydata.com> 
>>> wrote:
>>> -dev
>>> 
>>> What do you mean by JobContext?  That is a Hadoop MapReduce concept, not a 
>>> Spark one.
>>> 
>>> On Sun, Jan 17, 2016 at 7:29 AM, Jia Zou <jacqueline...@gmail.com> wrote:
>>> Dear all,
>>> 
>>> Is there a way to reuse executor JVM across different JobContexts? Thanks.
>>> 
>>> Best Regards,
>>> Jia
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
