On Tue, Aug 18, 2015 at 1:16 PM, Dawid Wysakowicz <wysakowicz.da...@gmail.com> wrote:
> No, the data is not stored between two jobs. But it is stored for the
> lifetime of a job. A job can have multiple actions run.

I too thought so but wanted to confirm. Thanks.

> For the matter of sharing an RDD between jobs, you can have a look at
> Spark Job Server (spark-jobserver <https://github.com/ooyala/spark-jobserver>)
> or some in-memory storage: Tachyon (http://tachyon-project.org/) or
> Ignite (https://ignite.incubator.apache.org/).
>
> 2015-08-18 9:37 GMT+02:00 Hemant Bhanawat <hemant9...@gmail.com>:
>
>> It is still in memory for future rdd transformations and actions.
>>
>> This is interesting. You mean Spark holds the data in memory between two
>> job executions? How does the second job get the handle of the data in
>> memory? I am interested in knowing more about it. Can you forward me a
>> Spark article or JIRA that talks about it?
>>
>> On Tue, Aug 18, 2015 at 12:49 PM, Sabarish Sasidharan
>> <sabarish.sasidha...@manthan.com> wrote:
>>
>>> It is still in memory for future rdd transformations and actions. What
>>> you get in the driver is a copy of the data.
>>>
>>> Regards
>>> Sab
>>>
>>> On Tue, Aug 18, 2015 at 12:02 PM, praveen S <mylogi...@gmail.com> wrote:
>>>
>>>> When I do an rdd.collect(), does the data move back to the driver, or
>>>> is it still held in memory across the executors?
>>>
>>> --
>>> Architect - Big Data
>>> Ph: +91 99805 99458
>>> Manthan Systems | *Company of the year - Analytics (2014 Frost and
>>> Sullivan India ICT)*
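The distinction discussed above can be illustrated with a toy model in plain Python (no Spark required). `ToyRDD`, its partition list, and the `cached` flag are all hypothetical stand-ins, not Spark's API: the point is simply that `collect()` ships a copy of the data back to the driver, while the cached partitions stay in "executor memory" for later transformations and actions within the same application.

```python
import copy

class ToyRDD:
    """Hypothetical stand-in for an RDD: data lives in per-executor partitions."""

    def __init__(self, partitions):
        self.partitions = partitions  # models data held in executor memory
        self.cached = False

    def cache(self):
        # Marks the data to be kept in memory for reuse by later actions.
        self.cached = True
        return self

    def collect(self):
        # collect() returns a *copy* of every element to the driver;
        # the partitions themselves stay where they are.
        return [copy.deepcopy(x) for part in self.partitions for x in part]

rdd = ToyRDD([[1, 2], [3, 4]]).cache()
driver_copy = rdd.collect()
driver_copy[0] = 99                        # mutating the driver-side copy...
assert rdd.partitions == [[1, 2], [3, 4]]  # ...leaves the executor data intact
```

In real Spark the same pattern would be `rdd.cache()` followed by actions such as `collect()` or `count()`; the cached blocks live for the lifetime of the application (SparkContext), which is why a separate layer such as spark-jobserver, Tachyon, or Ignite is needed to share an RDD's data across applications.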