Re: Spark does not delete temporary directories

2015-05-07 Thread Sean Owen
You're referring to a comment in the generic utility method, not the
specific calls to it. The comment just says that the generic method
doesn't mark the directory for deletion. Individual uses of it might
need to.

One or more of these might be deletable on exit, but in any event
it's just a directory. I think the 'spark files' directory might
intentionally stay around, since it outlives one JVM and might be shared
across executors.
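
For a call site that does want cleanup, the usual pattern is to pair the
new directory with a JVM shutdown hook. A minimal sketch of that pattern
(illustrative only, not Spark's actual code):

    import java.io.File
    import java.nio.file.Files

    // Create a scratch directory; nothing deletes it automatically.
    val dir: File = Files.createTempDirectory("spark-").toFile

    // A caller that wants cleanup must register it explicitly, e.g. with
    // a shutdown hook that removes the tree on normal JVM exit.
    sys.addShutdownHook {
      def delete(f: File): Unit = {
        Option(f.listFiles).foreach(_.foreach(delete))
        f.delete()
      }
      delete(dir)
    }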




RE: Spark does not delete temporary directories

2015-05-07 Thread Taeyun Kim
It seems that they are always empty.

 

I've traced the Spark source code.

The methods that create the three 'temp' directories are as follows:

 

- DiskBlockManager.createLocalDirs

- HttpFileServer.initialize

- SparkEnv.sparkFilesDir

 

They (eventually) call Utils.getOrCreateLocalRootDirs and then
Utils.createDirectory, which intentionally does NOT mark the directory for
automatic deletion.

The comment on the createDirectory method says: "The directory is guaranteed to
be newly created, and is not marked for automatic deletion."

I don't know why they are not marked. Is this really intentional?
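
In the meantime, a possible workaround is to route the scratch space to a
disposable per-run root and delete that root after the job. A rough sketch
(spark.local.dir is a real setting, but note that YARN substitutes its own
local dirs for it, so this applies to local/standalone runs; the layout and
the delete helper here are only illustrative):

    import java.io.File
    import org.apache.spark.{SparkConf, SparkContext}

    // Per-run scratch root under the system temp directory.
    val scratch = new File(System.getProperty("java.io.tmpdir"),
      s"spark-scratch-${System.nanoTime}")
    scratch.mkdirs()

    val sc = new SparkContext(new SparkConf()
      .setAppName("temp-dir-cleanup-demo")
      .setMaster("local[*]") // for the example only
      .set("spark.local.dir", scratch.getAbsolutePath))
    try {
      // ... run the job ...
    } finally {
      sc.stop()
      // Delete the whole scratch root once the context is stopped.
      def delete(f: File): Unit = {
        Option(f.listFiles).foreach(_.foreach(delete))
        f.delete()
      }
      delete(scratch)
    }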

 




RE: Spark does not delete temporary directories

2015-05-07 Thread Haopu Wang
I think the temporary folders are used to store blocks and shuffle data.
That doesn't depend on the cluster manager.

Ideally they should be removed after the application has terminated.

Can you check if there are contents under those folders?

 






RE: Spark does not delete temporary directories

2015-05-07 Thread Taeyun Kim
Thanks, but it seems that the option is for Spark standalone mode only.

I've (lightly) tested the options with local mode and yarn-client mode; the
'temp' directories were not deleted.

 




Re: Spark does not delete temporary directories

2015-05-07 Thread Ted Yu
Default value for spark.worker.cleanup.enabled is false:

private val CLEANUP_ENABLED =
  conf.getBoolean("spark.worker.cleanup.enabled", false)

I wonder if the default should be set to true.

Cheers



Re: Spark does not delete temporary directories

2015-05-07 Thread Todd Nist
Have you tried to set the following?

spark.worker.cleanup.enabled=true
spark.worker.cleanup.appDataTtl=”
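
Note that these are settings for the standalone Worker daemon rather than
for the application, so they belong in the worker's environment, not the
job's SparkConf. A sketch for conf/spark-env.sh on each worker (example
values; appDataTtl is in seconds, 604800 = 7 days):

    # conf/spark-env.sh on each standalone worker (example values)
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=604800"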



On Thu, May 7, 2015 at 2:39 AM, Taeyun Kim 
wrote:

> Hi,
>
>
>
> After a Spark program completes, three temporary directories remain
> in the temp directory.
>
> The file names are like this: spark-2e389487-40cc-4a82-a5c7-353c0feefbb7
>
>
>
> And when the Spark program runs on Windows, a snappy DLL file also remains
> in the temp directory.
>
> The file name is like this:
> snappy-1.0.4.1-6e117df4-97b6-4d69-bf9d-71c4a627940c-snappyjava
>
>
>
> They are created every time the Spark program runs. So the number of files
> and directories keeps growing.
>
>
>
> How can I get them deleted?
>
>
>
> Spark version is 1.3.1 with Hadoop 2.6.
>
>
>
> Thanks.
>
>
>
>
>
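
On the snappy file specifically: snappy-java extracts its native library
into java.io.tmpdir, and its org.xerial.snappy.tempdir system property can
redirect that to a directory you clean up yourself (an assumption worth
verifying against the snappy-java version bundled with Spark 1.3), e.g.:

    # illustrative; verify the property name for your snappy-java version
    spark-submit --driver-java-options "-Dorg.xerial.snappy.tempdir=C:\Temp\snappy" ...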