[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421530#comment-16421530 ]

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM:
-----------------------------------------------------------------

Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 
2.2.1, Hadoop 2.7

My tests support the contention of [~IgorBabalich] ... it seems that 
classloaders instantiated by the code are never closed. On *nix this is not a 
problem, since the files are not locked; on Windows, however, the files are 
locked.

In addition to the resources mentioned by Igor, this Java 7 change documented 
by Oracle seems relevant:

[https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html]

A new close() method was introduced to address the problem, which shows up on 
Windows because of the differing treatment of file locks between the Windows 
and *nix file systems.
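
For illustration, here is a minimal sketch of the pattern that the Java 7 change enables. The directory and class name below are placeholders, not anything Spark actually uses:

{code:scala}
import java.net.{URL, URLClassLoader}

// Placeholder directory standing in for a REPL class-output dir.
val classDir: URL = new java.io.File("C:/tmp/repl-classes").toURI.toURL
val loader = new URLClassLoader(Array(classDir), getClass.getClassLoader)
try {
  // load and use generated classes here, e.g.
  // val clazz = loader.loadClass("GeneratedClass")
} finally {
  // Since Java 7, URLClassLoader implements Closeable; close() releases the
  // underlying file handles so Windows can delete the class files afterwards.
  loader.close()
}
{code}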

I would point out that this is a generic Java issue which breaks the 
cross-platform intention of the platform as a whole.

The Oracle blog also contains a post:

[https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader]

I have searched the Apache Spark code-base for classloader instances, looking 
for any ".close()" call. I could not find any, so I believe [~IgorBabalich] is 
correct - the issue has to do with classloaders not being closed.

I would fix it myself, but thus far it is not clear to me *when* the classloader 
needs to be closed. That is just ignorance on my part. The question is whether 
the classloader should be closed while it is still available as a variable at 
the point where it was instantiated, or later during the ShutdownHookManager 
cleanup. If the latter, it is not clear to me how to actually get a list of 
open classloaders.
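
To make the question concrete, here is a hypothetical sketch (not Spark's actual API; all names are illustrative) of the second option: keep a registry of the classloaders the application creates and close them from a shutdown hook, in the same spirit as the ShutdownHookManager temp-directory cleanup:

{code:scala}
import java.io.Closeable
import scala.collection.mutable

// Hypothetical registry - structure and names are illustrative only.
object ClassLoaderRegistry {
  private val loaders = mutable.Set.empty[Closeable]

  def register(loader: Closeable): Unit = synchronized { loaders += loader }

  // Close every registered loader at JVM shutdown so the temp class files
  // are no longer locked when the directory deletion hook runs.
  sys.addShutdownHook {
    synchronized {
      loaders.foreach { l =>
        try l.close()
        catch { case e: Exception => System.err.println(s"Failed to close classloader: $e") }
      }
    }
  }
}
{code}

Whether a hook like this could be guaranteed to run before the temp-directory deletion hook is part of the open question above.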

That is where I am at so far. I am prepared to put some work into this, but I 
need some help from those who know the codebase to answer the above question - 
maybe with a well-isolated test.

MY TESTS...

This issue has been around in one form or another for at least four years and 
shows up on many threads.

The standard answer is that it is a "permissions issue" to do with Windows.

That assertion is objectively false.

There is a simple test to prove it.

At a Windows prompt, start spark-shell:

C:\spark\spark-shell

then get the temp file directory:

scala> sc.getConf.get("spark.repl.class.outputDir")

it will be under the user's AppData\Local\Temp tree, e.g.

C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f

where the last directory name contains a GUID that changes with each session.

With the Spark session still open, go to the Temp directory and try to delete 
the given directory.

You won't be able to... there is a lock on it.
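
You can also reproduce the check programmatically from a second, non-Spark JVM; the path below is a placeholder for the outputDir value printed above:

{code:scala}
import java.io.File

// Placeholder: substitute the directory printed by spark.repl.class.outputDir.
val outputDir = new File("C:/Users/you/AppData/Local/Temp/spark-xxxx/repl-xxxx")

def deleteRecursively(f: File): Boolean = {
  if (f.isDirectory) Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  f.delete() // on Windows this returns false while the Spark JVM holds the files open
}

println(s"deleted: ${deleteRecursively(outputDir)}")
{code}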

Now issue

scala> :quit

to quit the session.

The stack trace will show that ShutdownHookManager tried to delete the 
directory above but could not.

If you now try to delete it through the file system, you can.

This is because the file locks are released when the JVM process exits.

So, it is not a permissions issue, but a consequence of the Windows treatment 
of file locks.

This is the *known issue* that was addressed in the Java fix: URLClassLoader 
now implements Closeable, and its close() method releases the file handles. It 
was fixed there because many enterprise systems run on Windows.

Now... to further test the cause, I used the Windows Subsystem for Linux.

To access this (post-install) you run

C:> bash

from a command prompt.

In order to get this to work, I used the same Spark install, but had to install 
a fresh copy of the JDK on Ubuntu within the Windows bash subsystem. This is 
standard Ubuntu stuff, but note that the path to your Windows C: drive is /mnt/c.

If I rerun the same test, the new output of 

scala> sc.getConf.get("spark.repl.class.outputDir")

will be a different folder location, under the Linux /tmp directory, but with 
the same setup otherwise.

With the Spark session still active, it is possible to delete the Spark folders 
under /tmp. This is the difference between Windows and Linux: bash running 
Ubuntu on Windows has the *nix file-locking behaviour, so the Spark temp 
folders can be deleted while a session is running.

If you run through a new session with spark-shell at the Linux prompt and issue 
:quit, it will shut down without any stack-trace error from ShutdownHookManager.
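
The underlying file-locking difference can be seen without Spark at all. Here is a small plain-Scala sketch that opens a temp file and tries to delete it while the handle is still open:

{code:scala}
import java.io.{File, FileInputStream}

// Open a handle, then attempt to delete the file while the handle is open.
// On Linux (including WSL) the delete succeeds, because the inode lives on
// until the handle is closed; on Windows File.delete() typically returns
// false, because Java opens the file without share-delete permission.
val f = File.createTempFile("lock-demo", ".tmp")
val in = new FileInputStream(f)
val deletedWhileOpen = f.delete()
println(s"deleted while open: $deletedWhileOpen")
in.close()
if (!deletedWhileOpen) f.delete() // clean up once the handle is released
{code}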

So, my conclusions are as follows:

1) this is not a permissions issue, despite the common assertion

2) it is a Windows-specific problem for *known* reasons - namely the difference 
in file locking compared with Linux

3) it was considered a *bug* in the Java ecosystem and was fixed as such from 
Java 1.7 with the .close() method

Further...

People who need to run Spark on Windows infrastructure (like me) can either run 
a Docker container or use the Windows Subsystem for Linux to launch processes, 
so we do have a workaround.

However, it does concern me that this bug has been hanging around for four 
years or more when it seems to come from a lax coding practice in the use of 
classloaders. That kind of breaks the cross-platform promise of Java and Scala, 
which is why they were popular in the first place :)

Linux is good.

Windows is good.

The addressable pool of Apache Spark developers *will* expand very 
significantly if Windows developers are not shut out of the ecosystem by 
(apparently) fixable issues.


> Spark failed to delete temp directory 
> --------------------------------------
>
>                 Key: SPARK-12216
>                 URL: https://issues.apache.org/jira/browse/SPARK-12216
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>         Environment: windows 7 64 bit
> Spark 1.5.2
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>            Reporter: stefan
>            Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
>         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
>         at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
>         at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at scala.util.Try$.apply(Try.scala:161)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
>         at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)


