unpersist RDD from another thread
Hi,

What is the behavior when calling rdd.unpersist() from a different thread while another thread is using that rdd? Below is a simple case for this:

1) create rdd and load data
2) call rdd.cache() to bring data into memory
3) create another thread and pass rdd for a long computation
4) call rdd.unpersist() while 3) is still running

Questions:

* Will the computation in 3) finish properly even if unpersist was called on the rdd while running?
* What happens if a part of the computation fails and the rdd needs to reconstruct based on DAG lineage? Will this still work even though unpersist has been called?

thanks,
-paul
Re: unpersist RDD from another thread
So in order to not incur any performance issues I should really wait for all usage of the rdd to complete before calling unpersist, correct?

On Wed, Sep 16, 2015 at 4:08 PM, Tathagata Das wrote:

> unpredictable. I think it will be safe (as in nothing should fail), but
> the performance will be unpredictable (some partition may use cache, some
> may not be able to use the cache).
Re: unpersist RDD from another thread
Yes.

On Wed, Sep 16, 2015 at 1:12 PM, Paul Weiss wrote:

> So in order to not incur any performance issues I should really wait for
> all usage of the rdd to complete before calling unpersist, correct?
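
For anyone finding this thread later, here is a minimal sketch of the "wait for all usage, then unpersist" pattern agreed on above. It is not from the original posters. The helper name run_then_unpersist and its parameters are made up for illustration, and it only assumes that the object passed in has an unpersist() method (as a Spark RDD does) and that each job is a callable taking that object.

```python
from concurrent.futures import ThreadPoolExecutor


def run_then_unpersist(rdd, jobs, max_workers=4):
    """Run every job against the cached rdd, then unpersist it.

    Hypothetical helper: `jobs` is a list of callables that each take
    the rdd and return a result. unpersist() is called only after all
    submitted jobs have finished or failed, so no running job loses
    cached partitions midway.
    """
    futures = []
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(job, rdd) for job in jobs]
        # Leaving the with-block waits for all submitted jobs to complete.
    finally:
        # Nothing submitted above is still running at this point.
        rdd.unpersist()
    # result() re-raises the exception of a failed job, if any.
    return [f.result() for f in futures]
```

With PySpark this might be called as run_then_unpersist(rdd, [lambda r: r.count(), lambda r: r.sum()]) after rdd.cache(). Calling unpersist earlier should not break the jobs, per the reply above, but some partitions may then be recomputed instead of read from the cache.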