One Spark application can have many jobs, e.g., first call rdd.count() and then call
rdd.collect().
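As a minimal sketch of the point above (assuming a SparkContext `sc` as provided by spark-shell; the input path is hypothetical):

```scala
// Each action below triggers a separate job within the same application.
val rdd = sc.textFile("data.txt")   // hypothetical input path; no job yet (lazy)
val n    = rdd.count()              // job 1: full pass over the partitions
val rows = rdd.collect()            // job 2: a second, independent job
```

The Spark UI would show these as two jobs under the one running application.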






At 2015-08-18 15:37:14, "Hemant Bhanawat" <hemant9...@gmail.com> wrote:

It is still in memory for future rdd transformations and actions.


This is interesting. You mean Spark holds the data in memory between two job 
executions. How does the second job get a handle on the data in memory? I am 
interested in knowing more about it. Can you forward me a Spark article or JIRA 
that talks about it? 


On Tue, Aug 18, 2015 at 12:49 PM, Sabarish Sasidharan 
<sabarish.sasidha...@manthan.com> wrote:

It is still in memory for future rdd transformations and actions. What you get 
in driver is a copy of the data.
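A sketch of this distinction (assuming a SparkContext `sc` from spark-shell; keeping the RDD in executor memory across jobs is done explicitly with cache()/persist()):

```scala
val rdd = sc.parallelize(1 to 1000000)
rdd.cache()                 // mark the RDD for storage in executor memory

val total = rdd.count()     // job 1: materializes and caches the partitions
val local = rdd.collect()   // job 2: ships a *copy* of the elements to the
                            // driver; the cached partitions stay on the
                            // executors for later transformations/actions
```

Mutating `local` on the driver does not affect the cached partitions; without cache()/persist(), each action recomputes the RDD from its lineage.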


Regards
Sab


On Tue, Aug 18, 2015 at 12:02 PM, praveen S <mylogi...@gmail.com> wrote:


When I do an rdd.collect(), does the data move back to the driver, or is it still 
held in memory across the executors?






--



Architect - Big Data

Ph: +91 99805 99458


Manthan Systems | Company of the year - Analytics (2014 Frost and Sullivan 
India ICT)
