[ 
https://issues.apache.org/jira/browse/SPARK-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail updated SPARK-17430:
----------------------------
    Attachment: abort-task-on-oom-in-dag-scheduler.patch

The attached patch fixes the hang.

> Spark task Hangs after OOM while DAG scheduler tries to serialize a task
> ------------------------------------------------------------------------
>
>                 Key: SPARK-17430
>                 URL: https://issues.apache.org/jira/browse/SPARK-17430
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.6.2
>            Reporter: Mikhail
>         Attachments: abort-task-on-oom-in-dag-scheduler.patch
>
>
> Hi there,
> We're running Spark on Hadoop 2.7.1 YARN and have run into a problem.
> Sometimes an OutOfMemoryError is thrown inside JavaSerializer (see the 
> stack trace below). The error itself isn't the problem, but after it 
> occurs the task hangs: it is shown as "running" in the Hadoop task list, 
> but no worker is executing it, and no new records appear in the Spark job 
> log until somebody kills it.
> We have worked around the issue by patching the Spark code (catching the 
> OOM in submitMissingTasks()), but it looks like OOM errors are deliberately 
> ignored, so there is probably a better solution.
> {noformat}
> Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
>       at java.util.Arrays.copyOf(Arrays.java:3332)
>       at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
>       at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
>       at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
>       at java.lang.StringBuilder.append(StringBuilder.java:136)
>       at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>       at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>       at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>       at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>       at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>       at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>       at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>       at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>       at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>       at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>       at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>       at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>       at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>       at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>       at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>       at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>       at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>       at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>       at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>       at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
>       at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
>       at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1003)
>       at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:921)
>       at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:861)
>       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1607)
>       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
> {noformat}
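
The hang described above, and the kind of workaround mentioned in the description, can be illustrated with a small standalone sketch. It is not the attached patch and it does not use any Spark classes. It assumes, as the description suggests, that the scheduler's event loop handles only ordinary exceptions, so an OutOfMemoryError escapes and ends the loop thread. All names here (EventLoopOomSketch, serializeTask, submitUnguarded, submitGuarded, unprocessedEvents) are hypothetical, and the OOM is simulated by throwing the error directly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class EventLoopOomSketch {

    /** Messages recorded by the sketch, in the order they happen. */
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    /** Simulated serialization failure; no memory is actually exhausted. */
    static void serializeTask() {
        throw new OutOfMemoryError("Java heap space (simulated)");
    }

    /** Unguarded submit: the Error propagates out of the handler. */
    static void submitUnguarded() {
        serializeTask();
        LOG.add("tasks submitted");
    }

    /** Guarded submit: the Error is caught and turned into an abort. */
    static void submitGuarded() {
        try {
            serializeTask();
            LOG.add("tasks submitted");
        } catch (OutOfMemoryError e) {
            // Assumption: a real fix would abort the stage or job here.
            LOG.add("stage aborted: " + e.getMessage());
        }
    }

    /**
     * Runs a single-threaded event loop over two events and returns how many
     * events are still queued once the loop thread has exited.
     */
    static int unprocessedEvents(Runnable firstEvent) throws InterruptedException {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        queue.add(firstEvent);
        queue.add(() -> LOG.add("next event handled"));

        Thread loop = new Thread(() -> {
            Runnable event;
            while ((event = queue.poll()) != null) {
                try {
                    event.run();
                } catch (Exception e) {
                    // Only ordinary exceptions are handled; Errors escape.
                    LOG.add("handler failed: " + e);
                }
            }
        }, "event-loop-sketch");
        // Record the death of the loop thread instead of printing a stack trace.
        loop.setUncaughtExceptionHandler((t, e) -> LOG.add("loop thread died: " + e));
        loop.start();
        loop.join();
        return queue.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unguarded, events left: "
                + unprocessedEvents(EventLoopOomSketch::submitUnguarded));
        System.out.println("guarded, events left: "
                + unprocessedEvents(EventLoopOomSketch::submitGuarded));
        LOG.forEach(System.out::println);
    }
}
```

In the unguarded run the loop thread dies on the first event and the second event is never handled, which mirrors a job that stays "running" with nothing executing. In the guarded run the failure is converted into an abort and the loop keeps processing events. Whether catching an OutOfMemoryError in the scheduler is acceptable is a separate question, as the description already notes.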



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
