No, the data is not stored between two jobs, but it is stored for the
lifetime of a job. A job can have multiple actions run.
For sharing an RDD between jobs, you can have a look at the Spark
Job Server (spark-jobserver <https://github.com/ooyala/spark-jobserver>) or
some in-memory storage such as Tachyon (http://tachyon-project.org/) or Ignite (
https://ignite.incubator.apache.org/).
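To illustrate the point above, here is a minimal PySpark sketch (it assumes a local Spark installation; the app name and data are made up for the example). It shows that collect() gives the driver a copy, while the cached partitions stay on the executors and are reused by later actions within the same application:

```python
from pyspark import SparkContext

sc = SparkContext(appName="collect-vs-cache")  # hypothetical app name

# Mark the RDD for in-memory persistence on the executors.
rdd = sc.parallelize(range(100)).cache()

# First action: materializes the RDD and caches its partitions on the executors.
total = rdd.sum()

# collect() ships a *copy* of the data to the driver; the cached partitions
# remain on the executors.
local_copy = rdd.collect()

# A later action reads from the executor cache, not from the driver's copy.
count = rdd.count()

sc.stop()
```

Once sc.stop() is called (or the application exits), the cached data is gone, which is why sharing an RDD across applications needs something like the Spark Job Server or an external in-memory store.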

2015-08-18 9:37 GMT+02:00 Hemant Bhanawat <hemant9...@gmail.com>:

> It is still in memory for future rdd transformations and actions.
>
> This is interesting. You mean Spark holds the data in memory between two
> job executions. How does the second job get a handle on the data in
> memory? I am interested in knowing more about it. Can you forward me a
> Spark article or JIRA that talks about it?
>
> On Tue, Aug 18, 2015 at 12:49 PM, Sabarish Sasidharan <
> sabarish.sasidha...@manthan.com> wrote:
>
>> It is still in memory for future RDD transformations and actions. What
>> you get in the driver is a copy of the data.
>>
>> Regards
>> Sab
>>
>> On Tue, Aug 18, 2015 at 12:02 PM, praveen S <mylogi...@gmail.com> wrote:
>>
>>> When I do an rdd.collect(), does the data move back to the driver, or is
>>> it still held in memory across the executors?
>>>
>>
>>
>>
>> --
>>
>> Architect - Big Data
>> Ph: +91 99805 99458
>>
>> Manthan Systems | *Company of the year - Analytics (2014 Frost and
>> Sullivan India ICT)*
>> +++
>>
>
>
