Hi Praveen, have you checked out these slides? They might have the details you need: https://spark-summit.org/2014/wp-content/uploads/2014/07/Spark-Job-Server-Easy-Spark-Job-Management-Chan-Chu.pdf
Best Regards,
Jia

On Jan 19, 2016, at 7:28 AM, praveen S <mylogi...@gmail.com> wrote:

> Can you give me more details on Spark's jobserver?
>
> Regards,
> Praveen
>
> On 18 Jan 2016 03:30, "Jia" <jacqueline...@gmail.com> wrote:
>
> I guess all jobs submitted through JobServer are executed in the same JVM, so RDDs cached by one job are visible to all other jobs executed later.
>
> On Jan 17, 2016, at 3:56 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>
>> Yes, that is one of the basic reasons to use a jobserver/shared SparkContext. Otherwise, in order to share the data in an RDD, you have to use an external storage system, such as a distributed filesystem or Tachyon.
>>
>> On Sun, Jan 17, 2016 at 1:52 PM, Jia <jacqueline...@gmail.com> wrote:
>>
>> Thanks, Mark. Then I guess JobServer can fundamentally solve my problem, so that jobs can be submitted at different times and still share RDDs.
>>
>> Best Regards,
>> Jia
>>
>> On Jan 17, 2016, at 3:44 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>
>>> There is a 1-to-1 relationship between Spark Applications and SparkContexts -- fundamentally, a Spark Application is a program that creates and uses a SparkContext, and that SparkContext is destroyed when the Application ends. A jobserver generically, and the Spark JobServer specifically, is an Application that keeps a SparkContext open for a long time and allows many Jobs to be submitted and run using that shared SparkContext.
>>>
>>> More than one Application/SparkContext unavoidably implies more than one JVM process per Worker -- Applications/SparkContexts cannot share JVM processes.
>>>
>>> On Sun, Jan 17, 2016 at 1:15 PM, Jia <jacqueline...@gmail.com> wrote:
>>>
>>> Hi, Mark, sorry for the confusion.
>>>
>>> Let me clarify: when an application is submitted, the master tells each Spark worker to spawn an executor JVM process, all the task sets of the application are executed by that executor, and after the application runs to completion, the executor process is killed.
>>> But I hope that all submitted applications can run in the same executor. Can JobServer do that? If so, it's really good news!
>>>
>>> Best Regards,
>>> Jia
>>>
>>> On Jan 17, 2016, at 3:09 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>
>>>> You've still got me confused. The SparkContext exists at the Driver, not on an Executor.
>>>>
>>>> Many Jobs can be run by one SparkContext -- it is a common pattern to use something like the Spark JobServer, where all Jobs are run through a shared SparkContext.
>>>>
>>>> On Sun, Jan 17, 2016 at 12:57 PM, Jia Zou <jacqueline...@gmail.com> wrote:
>>>>
>>>> Hi, Mark, sorry, I meant SparkContext.
>>>> I want to change Spark so that all submitted jobs (SparkContexts) run in one executor JVM.
>>>>
>>>> Best Regards,
>>>> Jia
>>>>
>>>> On Sun, Jan 17, 2016 at 2:21 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>
>>>> -dev
>>>>
>>>> What do you mean by JobContext? That is a Hadoop MapReduce concept, not Spark.
>>>>
>>>> On Sun, Jan 17, 2016 at 7:29 AM, Jia Zou <jacqueline...@gmail.com> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> Is there a way to reuse an executor JVM across different JobContexts? Thanks.
>>>>
>>>> Best Regards,
>>>> Jia
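
P.S. To make the shared-context pattern above concrete, here is a rough sketch of two spark-jobserver jobs that share a cached RDD via the server's named-RDD facility. This assumes the spark-jobserver Scala API of this era (the SparkJob trait plus the NamedRddSupport mixin); the object and RDD names are illustrative only, not a definitive implementation.

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

// Job 1: builds an RDD, caches it, and registers it under a name.
// Every job submitted to the same jobserver context runs in the same
// SparkContext (and hence the same executor JVMs), so the cached
// partitions stay resident for jobs submitted later.
object BuildRddJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    val data: RDD[Int] = sc.parallelize(1 to 1000000).cache()
    namedRdds.update("shared-data", data) // register for other jobs
    data.count()
  }
}

// Job 2: submitted at any later time, it looks up the cached RDD by
// name instead of recomputing or reloading it.
object UseRddJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    val data = namedRdds.get[Int]("shared-data")
      .getOrElse(sys.error("shared-data not found; run BuildRddJob first"))
    data.filter(_ % 2 == 0).count()
  }
}

Note that both jobs must be submitted against the same pre-created, long-lived context (via the server's context=<name> job parameter); a transient per-job context would spin up its own executor JVMs and see none of the cached data.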
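
For contrast, the external-storage alternative Mark mentions would look roughly like this: two independent applications, each with its own SparkContext and executor JVMs, can only hand an RDD's contents to one another through storage such as a distributed filesystem or Tachyon. A minimal sketch (the HDFS path is hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

// Application A: its SparkContext (and executor JVMs) are gone once
// main() returns, so the only way to pass the RDD on is to persist it.
object ProducerApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("producer"))
    sc.parallelize(1 to 1000000)
      .saveAsObjectFile("hdfs:///tmp/shared-data") // hypothetical path
    sc.stop()
  }
}

// Application B: a separate application with separate executor JVMs;
// it reloads the data from storage rather than from an executor cache.
object ConsumerApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("consumer"))
    val data = sc.objectFile[Int]("hdfs:///tmp/shared-data")
    println(data.filter(_ % 2 == 0).count())
    sc.stop()
  }
}

This works for applications submitted at any time, but every consumer pays the cost of rereading and deserializing the data from storage, which is exactly the overhead a shared SparkContext avoids.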