[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940987#comment-15940987
 ] 

Jouni H edited comment on SPARK-12216 at 3/24/17 9:58 PM:
----------------------------------------------------------

I was able to reproduce this bug on Windows with the latest Spark version: 
spark-2.1.0-bin-hadoop2.7

The bug occurs for me when I pass --jars to spark-submit AND call 
saveAsTextFile in the script.

Example scenarios:

* ERROR when --jars is included AND saveAsTextFile is used 
* Works when saveAsTextFile is used but --jars is not passed on the command line 
* Works when --jars is passed on the command line but the saveAsTextFile call 
is commented out

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script doesn't actually need the jar passed via --jars, but including it 
on the command line is enough to trigger the shutdown bug.
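
To compare the failing and working invocations side by side, the two command lines can be generated from one helper. This is only a sketch: the helper name {{build_submit_command}} and the second output path {{./output/test2/}} are my own placeholders, not part of the original report. It only prints the commands and does not launch Spark.

```python
def build_submit_command(script, input_path, output_dir, jars=None):
    """Return the spark-submit argument list; --jars is added only if jars are given."""
    cmd = ["spark-submit"]
    if jars:
        # spark-submit expects a comma-separated list after --jars
        cmd += ["--jars", ",".join(jars)]
    cmd += [script, input_path, output_dir]
    return cmd


# Scenario 1: --jars AND saveAsTextFile (the combination reported to fail)
with_jars = build_submit_command(
    "sparkbugtest.py", "bugtest.txt", "./output/test1/",
    jars=["aws-java-sdk-1.7.4.jar"],
)

# Scenario 2: saveAsTextFile without --jars (reported to work)
# "./output/test2/" is a hypothetical second output path.
without_jars = build_submit_command(
    "sparkbugtest.py", "bugtest.txt", "./output/test2/",
)

print(" ".join(with_jars))
print(" ".join(without_jars))
```

Each run gets its own output directory because saveAsTextFile refuses to write into a directory that already exists. The third scenario (--jars with the saveAsTextFile call commented out) needs an edit to the script itself, so it is not covered here.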

aws-java-sdk-1.7.4.jar can be downloaded from here: 
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

The contents of bugtest.txt don't matter.

Example script:

{noformat}
import sys

from pyspark.sql import SparkSession

def main():

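    # Create (or reuse) the Spark session.
    spark = SparkSession\
        .builder\
        .appName("SparkParseLogTest")\
        .getOrCreate()

    # Read the input file (argv[1]) as an RDD of plain strings, one per line.
    lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
    # Writing the RDD to the output directory (argv[2]) is the step that,
    # combined with --jars, triggers the failure.
    lines.saveAsTextFile(sys.argv[2])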

if __name__ == "__main__":
    main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

What happens in the userFiles tmp folder is interesting:

* At first it contains {{sparkbugtest.py}}
* At the end (I think during saveAsTextFile, or after it), the 
{{aws-java-sdk-1.7.4.jar}} file is copied there and {{sparkbugtest.py}} 
gets deleted
* After spark-submit has finished, {{aws-java-sdk-1.7.4.jar}} is still in 
the temporary folder, which could not be deleted 

The temp folder in this example looked like this: 
C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\
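
To check what is left behind after a run, the leftover {{spark-*}} directories can be listed with a few lines of Python. This is a rough sketch, not something from the original report: the function name {{leftover_spark_files}} is hypothetical, and it assumes Spark used the default system temp directory (the one Python's {{tempfile.gettempdir()}} returns), as in the path above.

```python
import tempfile
from pathlib import Path


def leftover_spark_files(temp_root=None):
    """Map each spark-* directory under temp_root to the files still inside it."""
    root = Path(temp_root) if temp_root else Path(tempfile.gettempdir())
    leftovers = {}
    for spark_dir in sorted(root.glob("spark-*")):
        if not spark_dir.is_dir():
            continue
        # Paths are reported relative to the spark-* directory,
        # e.g. userFiles-<uuid>/aws-java-sdk-1.7.4.jar
        leftovers[spark_dir.name] = sorted(
            str(p.relative_to(spark_dir))
            for p in spark_dir.rglob("*")
            if p.is_file()
        )
    return leftovers


if __name__ == "__main__":
    for name, files in leftover_spark_files().items():
        print(name)
        for f in files:
            print("    " + f)
```

Running it before and after spark-submit should show whether the jar is the only file remaining in the {{userFiles-*}} folder.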





> Spark failed to delete temp directory 
> --------------------------------------
>
>                 Key: SPARK-12216
>                 URL: https://issues.apache.org/jira/browse/SPARK-12216
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>         Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>            Reporter: stefan
>            Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
>         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
>         at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
>         at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)


