Spark shuffle files

2017-03-27 Thread Ashwin Sai Shankar
Hi!

In spark on yarn, when are shuffle files on local disk removed? (Is it when
the app completes or
once all the shuffle files are fetched or end of the stage?)

Thanks,
Ashwin


Re: Spark shuffle files

2017-03-27 Thread Mark Hamstra
Shuffle files are cleaned when they are no longer referenced. See
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala

On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar <
ashan...@netflix.com.invalid> wrote:

> Hi!
>
> In spark on yarn, when are shuffle files on local disk removed? (Is it
> when the app completes or
> once all the shuffle files are fetched or end of the stage?)
>
> Thanks,
> Ashwin
>


Re: Spark shuffle files

2017-03-27 Thread Ashwin Sai Shankar
Thanks Mark! follow up question, do you know when shuffle files are usually
un-referenced?

On Mon, Mar 27, 2017 at 2:35 PM, Mark Hamstra 
wrote:

> Shuffle files are cleaned when they are no longer referenced. See
> https://github.com/apache/spark/blob/master/core/src/
> main/scala/org/apache/spark/ContextCleaner.scala
>
> On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar <
> ashan...@netflix.com.invalid> wrote:
>
>> Hi!
>>
>> In spark on yarn, when are shuffle files on local disk removed? (Is it
>> when the app completes or
>> once all the shuffle files are fetched or end of the stage?)
>>
>> Thanks,
>> Ashwin
>>
>
>


Re: Spark shuffle files

2017-03-27 Thread Mark Hamstra
When the RDD using them goes out of scope.

On Mon, Mar 27, 2017 at 3:13 PM, Ashwin Sai Shankar 
wrote:

> Thanks Mark! follow up question, do you know when shuffle files are
> usually un-referenced?
>
> On Mon, Mar 27, 2017 at 2:35 PM, Mark Hamstra 
> wrote:
>
>> Shuffle files are cleaned when they are no longer referenced. See
>> https://github.com/apache/spark/blob/master/core/src/mai
>> n/scala/org/apache/spark/ContextCleaner.scala
>>
>> On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar <
>> ashan...@netflix.com.invalid> wrote:
>>
>>> Hi!
>>>
>>> In spark on yarn, when are shuffle files on local disk removed? (Is it
>>> when the app completes or
>>> once all the shuffle files are fetched or end of the stage?)
>>>
>>> Thanks,
>>> Ashwin
>>>
>>
>>
>