unpersist RDD from another thread

2015-09-16 Thread Paul Weiss
Hi,

What is the behavior when calling rdd.unpersist() from a different thread
while another thread is using that rdd? Below is a simple case:

1) create rdd and load data
2) call rdd.cache() to bring data into memory
3) create another thread and pass rdd for a long computation
4) call rdd.unpersist() while step 3 is still running

Questions:

* Will the computation in 3) finish properly even if unpersist is called
on the rdd while it is running?
* If part of the computation fails and the rdd has to be recomputed from
its DAG lineage, will that still work even though unpersist has been
called?

thanks,
-paul


Re: unpersist RDD from another thread

2015-09-16 Thread Paul Weiss
So, to avoid any performance issues, I should wait for all usage of the
rdd to complete before calling unpersist, correct?

On Wed, Sep 16, 2015 at 4:08 PM, Tathagata Das wrote:

> Unpredictable. I think it will be safe (as in nothing should fail), but
> the performance will be unpredictable (some partitions may use the cache,
> some may not be able to).


Re: unpersist RDD from another thread

2015-09-16 Thread Tathagata Das
Yes.
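
The ordering confirmed above (finish every use of the rdd, then unpersist) can be sketched as follows. This is only a minimal illustration, not Spark's own mechanism: the helper name run_jobs_then_unpersist and its parameters are hypothetical, and the only thing it assumes about the rdd argument is that it has an unpersist() method, as a PySpark RDD does.

```python
from concurrent.futures import ThreadPoolExecutor


def run_jobs_then_unpersist(rdd, jobs, max_workers=4):
    """Run each job(rdd) on a worker thread; unpersist only after all finish.

    `jobs` is a list of callables that each take the rdd (hypothetical
    helper for illustration; not part of Spark).
    """
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(job, rdd) for job in jobs]
            # result() blocks until that job is done and re-raises its error.
            return [f.result() for f in futures]
    finally:
        # Leaving the `with` block waits for every submitted job, so by the
        # time we get here no worker thread is still reading the cached rdd.
        rdd.unpersist()
```

With a real cached RDD this might be called as run_jobs_then_unpersist(rdd, [lambda r: r.count()]) after rdd.cache(). The unpersist call sits in a finally clause so the cache is released even when one of the jobs fails.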

On Wed, Sep 16, 2015 at 1:12 PM, Paul Weiss wrote:

> So in order to not incur any performance issues I should really wait for
> all usage of the rdd to complete before calling unpersist, correct?