[ https://issues.apache.org/jira/browse/SPARK-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961395#comment-13961395 ]

OuyangJin commented on SPARK-1304:
----------------------------------

This ShutdownHook exception was thrown by HDFS while creating a FileSystem object; it was 
simply propagated back to Spark and logged by the TaskSetManager. It may have been caused 
by HDFS not being in normal mode, or by some of the file blocks read by Spark being 
corrupted. You could check your NameNode log at that specific time to see what happened to 
HDFS, and possibly the DataNode logs holding the relevant file blocks.
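
As a side note, the JVM behaviour behind this message can be reproduced outside of Spark 
and HDFS. The following is a minimal standalone Scala sketch (an illustration only, not 
Spark or HDFS code, and the object name is made up): once JVM shutdown has begun, any 
further Runtime.addShutdownHook call, like the one FileSystem$Cache.get makes in the stack 
trace below when the first FileSystem instance is created, is rejected with 
"IllegalStateException: Shutdown in progress".

    // Standalone sketch (hypothetical example, not Spark/HDFS code): registering a
    // shutdown hook after JVM shutdown has started throws
    // java.lang.IllegalStateException: Shutdown in progress
    object ShutdownInProgressRepro {
      def main(args: Array[String]): Unit = {
        Runtime.getRuntime.addShutdownHook(new Thread {
          override def run(): Unit = {
            try {
              // By the time this hook runs, the JVM is already shutting down,
              // so adding another hook is rejected.
              Runtime.getRuntime.addShutdownHook(new Thread())
            } catch {
              case e: IllegalStateException =>
                System.err.println("Caught: " + e.getMessage)
            }
          }
        })
        // main returns here; shutdown begins and the hook above runs.
      }
    }

Running it prints "Caught: Shutdown in progress" on stderr.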

> Job fails with spot instances (due to IllegalStateException: Shutdown in progress)
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-1304
>                 URL: https://issues.apache.org/jira/browse/SPARK-1304
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Alex Boisvert
>
> We had a job running smoothly with spot instances until one of the spot 
> instances got terminated ... which led to a series of "IllegalStateException: 
> Shutdown in progress" and the job failed afterwards.
> 14/03/24 06:07:52 WARN scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException
> java.lang.IllegalStateException: Shutdown in progress
>       at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66)
>       at java.lang.Runtime.addShutdownHook(Runtime.java:211)
>       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1441)
>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256)
>       at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
>       at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:77)
>       at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:51)
>       at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
>       at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
>       at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>       at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>       at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:90)
>       at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:89)
>       at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>       at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>       at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:57)
>       at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>       at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:94)
>       at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
>       at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
>       at org.apache.spark.scheduler.Task.run(Task.scala:53)
>       at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>       at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
