On Tue, Aug 18, 2015 at 1:16 PM, Dawid Wysakowicz <wysakowicz.da...@gmail.com> wrote:
> No, the data is not stored between two jobs. But it is stored for the
> lifetime of a job. A job can have multiple actions run.

I too thought so but wanted to confirm. Thanks.

> For the matter of sharing an RDD between jobs, you can have a look at
> Spark Job Server (spark-jobserver <https://github.com/ooyala/spark-jobserver>)
> or some in-memory storage: Tachyon (http://tachyon-project.org/) or
> Ignite (https://ignite.incubator.apache.org/).
>
> 2015-08-18 9:37 GMT+02:00 Hemant Bhanawat <hemant9...@gmail.com>:
>
>> It is still in memory for future rdd transformations and actions.
>>
>> This is interesting. You mean Spark holds the data in memory between two
>> job executions? How does the second job get the handle of the data in
>> memory? I am interested in knowing more about it. Can you forward me a
>> Spark article or JIRA that talks about it?
>>
>> On Tue, Aug 18, 2015 at 12:49 PM, Sabarish Sasidharan
>> <sabarish.sasidha...@manthan.com> wrote:
>>
>>> It is still in memory for future rdd transformations and actions. What
>>> you get in the driver is a copy of the data.
>>>
>>> Regards
>>> Sab
>>>
>>> On Tue, Aug 18, 2015 at 12:02 PM, praveen S <mylogi...@gmail.com> wrote:
>>>
>>>> When I do an rdd.collect(), does the data move back to the driver, or
>>>> is it still held in memory across the executors?
>>>
>>> --
>>> Architect - Big Data
>>> Ph: +91 99805 99458
>>> Manthan Systems | *Company of the year - Analytics (2014 Frost and
>>> Sullivan India ICT)*
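The distinction discussed above can be illustrated with a toy model in plain Python (no Spark required). `ToyRDD`, its partition list, and the `cached` flag are all hypothetical stand-ins, not Spark's API: the point is simply that `collect()` ships a copy of the data back to the driver, while the cached partitions stay in "executor memory" for later transformations and actions within the same application.

```python
import copy

class ToyRDD:
    """Hypothetical stand-in for an RDD: data lives in per-executor partitions."""

    def __init__(self, partitions):
        self.partitions = partitions  # models data held in executor memory
        self.cached = False

    def cache(self):
        # Marks the data to be kept in memory for reuse by later actions.
        self.cached = True
        return self

    def collect(self):
        # collect() returns a *copy* of every element to the driver;
        # the partitions themselves stay where they are.
        return [copy.deepcopy(x) for part in self.partitions for x in part]

rdd = ToyRDD([[1, 2], [3, 4]]).cache()
driver_copy = rdd.collect()
driver_copy[0] = 99                        # mutating the driver-side copy...
assert rdd.partitions == [[1, 2], [3, 4]]  # ...leaves the executor data intact
```

In real Spark the same pattern would be `rdd.cache()` followed by actions such as `collect()` or `count()`; the cached blocks live for the lifetime of the application (SparkContext), which is why a separate layer such as spark-jobserver, Tachyon, or Ignite is needed to share an RDD's data across applications.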