[ https://issues.apache.org/jira/browse/SPARK-22371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445999#comment-16445999 ]
Artem Rudoy edited comment on SPARK-22371 at 4/20/18 4:37 PM:
--
Do we really need to throw an exception from AccumulatorContext.get() when an
accumulator has been garbage collected? There is a window during which an
accumulator has been garbage collected but has not yet been removed from
AccumulatorContext.originals by the ContextCleaner. When an update arrives for
such an accumulator, get() throws and kills the whole job. This can happen
when a stage has completed but tasks from other attempts, speculative
execution, etc. are still running. Since AccumulatorContext.get() returns an
Option, we could simply return None in that case. Before SPARK-20940 this
method threw IllegalAccessError, which is not matched by NonFatal; it was
caught at a lower level and didn't cause a job failure.
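For illustration, here is a minimal sketch of the suggested behaviour. The shape of originals (a ConcurrentHashMap of weak references) follows AccumulatorV2.scala, but the object name and the details below are illustrative, not the exact project code:
{code:scala}
import java.lang.ref.WeakReference
import java.util.concurrent.ConcurrentHashMap

import org.apache.spark.util.AccumulatorV2

object AccumulatorContextSketch {
  // Assumed to mirror AccumulatorContext.originals: accumulator ids mapped
  // to weak references so that registered accumulators can still be GC'd.
  private val originals =
    new ConcurrentHashMap[Long, WeakReference[AccumulatorV2[_, _]]]

  // Suggested behaviour: if the weak reference has been cleared (the
  // accumulator was GC'd but not yet removed by the ContextCleaner),
  // treat the accumulator as "not found" and return None instead of
  // throwing and failing the whole job.
  def get(id: Long): Option[AccumulatorV2[_, _]] =
    Option(originals.get(id)).flatMap(ref => Option(ref.get))
}
{code}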
> dag-scheduler-event-loop thread stopped with error Attempted to access garbage collected accumulator 5605982
> --
>
> Key: SPARK-22371
> URL: https://issues.apache.org/jira/browse/SPARK-22371
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Mayank Agarwal
> Priority: Major
> Attachments: Helper.scala, ShuffleIssue.java, driver-thread-dump-spark2.1.txt, sampledata
>
>
> Our Spark jobs are getting stuck in DAGScheduler.runJob because the
> dag-scheduler thread has stopped with *Attempted to access garbage
> collected accumulator 5605982*.
> From our investigation it looks like the accumulator is cleaned up by GC
> first, and then the same accumulator is used for merging results from an
> executor on a task completion event.
> Since java.lang.IllegalAccessError is a LinkageError, which is treated as
> fatal (see the short check after the stack trace), the dag-scheduler event
> loop terminated with the exception below.
> --- ERROR stack trace ---
> Exception in thread "dag-scheduler-event-loop" java.lang.IllegalAccessError: Attempted to access garbage collected accumulator 5605982
>     at org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:253)
>     at org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:249)
>     at scala.Option.map(Option.scala:146)
>     at org.apache.spark.util.AccumulatorContext$.get(AccumulatorV2.scala:249)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1083)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1080)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>     at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1080)
>     at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1183)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1647)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> I am attaching the thread dump of the driver as well.
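For reference, a quick standalone check (the object name is made up for this snippet) confirming that scala.util.control.NonFatal does not match LinkageError subclasses such as IllegalAccessError, which is why a case NonFatal(e) handler cannot stop this error from killing the event loop thread:
{code:scala}
import scala.util.control.NonFatal

object FatalErrorCheck extends App {
  val err = new IllegalAccessError(
    "Attempted to access garbage collected accumulator 5605982")
  // NonFatal deliberately excludes LinkageError and its subclasses,
  // so the extractor reports the error as fatal and a `case NonFatal(e)`
  // handler would let it propagate up and kill the thread.
  println(NonFatal(err)) // prints: false
}
{code}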