[ https://issues.apache.org/jira/browse/SPARK-22714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309366#comment-16309366 ]

Sujith Jay Nair commented on SPARK-22714:
-----------------------------------------

Hi [~todesking], is this reproducible outside of the Spark REPL? I am trying to 
understand whether this issue is specific to the Spark shell.

> Spark API Not responding when Fatal exception occurred in event loop
> --------------------------------------------------------------------
>
>                 Key: SPARK-22714
>                 URL: https://issues.apache.org/jira/browse/SPARK-22714
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: todesking
>            Priority: Critical
>
> To reproduce, make Spark throw an OutOfMemoryError in the event loop:
> {noformat}
> scala> spark.sparkContext.getConf.get("spark.driver.memory")
> res0: String = 1g
> scala> val a = new Array[Int](4 * 1000 * 1000)
> scala> val ds = spark.createDataset(a)
> scala> ds.rdd.zipWithIndex
> [Stage 0:>                                                          (0 + 0) / 3]Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space
> [Stage 0:>                                                          (0 + 0) / 3]
> // Spark is not responding
> {noformat}
> While unresponsive, Spark is waiting on a Promise that is never completed. Completing 
> the promise depends on work done by the event-loop thread, but that thread is dead 
> because a fatal exception was thrown in it (see the standalone sketch after the thread 
> dump below).
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x00007ffc9300b000 nid=0x1703 waiting on 
> condition [0x0000700000216000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000007ad978eb8> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at 
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
>         at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
>         at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
>         at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:619)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
>         at 
> org.apache.spark.rdd.ZippedWithIndexRDD.<init>(ZippedWithIndexRDD.scala:50)
>         at 
> org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1293)
>         at 
> org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1293)
>         at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>         at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>         at org.apache.spark.rdd.RDD.zipWithIndex(RDD.scala:1292)
> {noformat}
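> For illustration, the hang reduces to the pattern below: the calling thread blocks in Await.ready on a promise that only the (now dead) event-loop thread could ever complete. This is a minimal standalone Scala sketch of that pattern, not Spark code; the finite timeout is only there so the sketch terminates, whereas the real job waits with Duration.Inf and never returns.
> {noformat}
> import scala.concurrent.{Await, Promise, TimeoutException}
> import scala.concurrent.duration._
>
> object PromiseHangDemo {
>   def main(args: Array[String]): Unit = {
>     // Stand-in for the job-completion promise: the thread that would have
>     // completed it (the event loop) is dead, so success()/failure() is
>     // never called on it.
>     val jobPromise = Promise[Unit]()
>
>     // The scheduler effectively waits with Duration.Inf, parking the caller
>     // forever. A finite timeout is used here so the sketch returns.
>     try {
>       Await.ready(jobPromise.future, 5.seconds)
>     } catch {
>       case _: TimeoutException =>
>         println("Promise was never completed; with Duration.Inf the caller would wait forever")
>     }
>   }
> }
> {noformat}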
> I don't know how to fix it properly, but it seems we need to add fatal-error 
> handling to EventLoop.run() in core/EventLoop.scala; a rough sketch of what that 
> might look like follows.
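> For example (untested sketch; this is a simplified, hypothetical event loop whose onReceive/onError names mirror EventLoop.scala, not the actual Spark code):
> {noformat}
> import java.util.concurrent.LinkedBlockingDeque
> import java.util.concurrent.atomic.AtomicBoolean
> import scala.util.control.NonFatal
>
> abstract class SafeEventLoop[E](name: String) {
>   private val queue = new LinkedBlockingDeque[E]()
>   private val stopped = new AtomicBoolean(false)
>
>   protected def onReceive(event: E): Unit
>   protected def onError(e: Throwable): Unit
>
>   private val eventThread = new Thread(name) {
>     setDaemon(true)
>     override def run(): Unit = {
>       try {
>         while (!stopped.get) {
>           val event = queue.take()
>           try {
>             onReceive(event)
>           } catch {
>             case NonFatal(e) => onError(e)  // non-fatal errors are already handled today
>             case fatal: Throwable =>
>               // Suggested addition: report fatal errors to onError so callers
>               // blocked on the scheduler can be failed, then rethrow and let
>               // the thread die loudly instead of silently.
>               try onError(fatal) finally throw fatal
>           }
>         }
>       } catch {
>         case _: InterruptedException => // normal shutdown via stop()
>       }
>     }
>   }
>
>   def start(): Unit = eventThread.start()
>   def post(event: E): Unit = queue.put(event)
>   def stop(): Unit = { stopped.set(true); eventThread.interrupt() }
> }
> {noformat}
> With handling along these lines, an onError implementation would get a chance to fail pending jobs, so the caller in the stack trace above could receive an error instead of parking forever.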


