[ https://issues.apache.org/jira/browse/SPARK-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245902#comment-15245902 ]

Simon Scott commented on SPARK-10722:
-------------------------------------

We too have experienced this exact exception and the resulting "Lost executor" 
error as described by Vincent Primault.

We are using Spark 1.5.1 and the KryoSerializer.

So the good news is that I believe I have identified a probable cause of the 
exception. I have rebuilt the spark-core jar with a fix and the issue appears 
to be resolved. I say "appears" because I still need guidance on how to build 
a reproducible test-case that provokes the issue and demonstrates that any 
fix works. Suffice it to say that our nightly integration test, which had 
been failing due to this issue, has now run successfully for several days. So 
I thought it was time to share my findings.

Examining the exception stack trace leads us to the "Executor.reportHeartBeat" 
method, which is run periodically by a ScheduledThreadPoolExecutor. Given the 
essentially random occurrences of this exception, it seems reasonable to 
assume that it happens whenever the pool thread that runs reportHeartBeat is 
not set up with the right class loader. Looking at the stack trace again, the 
"deserialize" call at line 440 of Executor.scala is failing to load the 
RDDBlockId class, which suggests the failing thread's class loader cannot see 
that class. A small illustration of how pool threads pick up their class 
loader follows below.
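
To illustrate the suspicion (this is plain JDK behaviour, not Spark code): 
worker threads in a ScheduledThreadPoolExecutor are created lazily, and each 
inherits the context class loader of the thread that creates it, so a 
heartbeat thread can easily end up holding a loader that cannot see the 
classes being deserialized.

{code}
import java.util.concurrent.{Executors, TimeUnit}

object PoolLoaderDemo {
  def main(args: Array[String]): Unit = {
    // Pool worker threads are created on demand, so each one inherits the
    // context class loader of the thread that triggers its creation. If that
    // loader cannot see the classes being deserialized (e.g. RDDBlockId in
    // the stack trace above), readObject fails with ClassNotFoundException.
    val pool = Executors.newScheduledThreadPool(1)
    pool.schedule(new Runnable {
      def run(): Unit =
        println("pool thread context loader: " +
          Thread.currentThread().getContextClassLoader)
    }, 0L, TimeUnit.MILLISECONDS)
    pool.shutdown()
    pool.awaitTermination(5L, TimeUnit.SECONDS)
  }
}
{code}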

So the fix I have applied is to change the Utils.deserialize call to the 
overload that takes a class loader as its second argument. Helpfully, Utils 
also provides "getContextOrSparkClassLoader", which supplies a loader good 
enough to resolve the issue. A sketch of what that overload does is below.
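
In Executor.reportHeartBeat the change is just to pass 
Utils.getContextOrSparkClassLoader as the second argument to the existing 
Utils.deserialize call. For reference, here is a minimal re-implementation of 
what the two-argument overload does (my sketch for illustration, not the 
actual Utils source), together with a getContextOrSparkClassLoader-style 
fallback:

{code}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
  ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

object ExplicitLoaderDeserialize {
  // Resolve every class in the stream against the supplied loader, rather
  // than whatever loader ObjectInputStream would pick for the current thread.
  def deserialize[T](bytes: Array[Byte], loader: ClassLoader): T = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes)) {
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, loader)
    }
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }

  def serialize(o: AnyRef): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(o)
    oos.close()
    bos.toByteArray
  }

  def main(args: Array[String]): Unit = {
    // getContextOrSparkClassLoader-style fallback: the thread's context
    // loader if one is set, otherwise the loader that loaded this class.
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    val back = deserialize[List[String]](serialize(List("a", "b")), loader)
    println(back) // List(a, b)
  }
}
{code}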

So I hope that helps. I would like to put forward a patch with my fix; the 
only thing holding me back is the lack of a reproducible test-case. As I 
said, any guidance on how to generate such a test would be warmly received. 
One idea I have sketched below is to force deserialization to resolve classes 
against a loader that cannot see the target class.
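
That repro idea in full (a sketch only, not yet turned into a proper Spark 
test): serialize an instance of a class that only the application class 
loader can see, then deserialize it while resolving classes against a parent 
loader, which exercises the same ClassNotFoundException path as the heartbeat 
failure.

{code}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
  ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

object HeartbeatCnfeRepro {
  // A serializable class visible only to the application class loader.
  case class Marker(id: Int)

  def main(args: Array[String]): Unit = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(Marker(1))
    oos.close()

    // Resolve classes against the *parent* of the application loader, as a
    // stand-in for a thread holding the wrong loader. Marker is not visible
    // there, so readObject throws ClassNotFoundException, the same failure
    // mode as in the stack trace above.
    val wrongLoader = getClass.getClassLoader.getParent
    val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray)) {
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, wrongLoader)
    }
    try {
      ois.readObject()
      println("unexpected: class resolved")
    } catch {
      case e: ClassNotFoundException => println("reproduced: " + e)
    } finally {
      ois.close()
    }
  }
}
{code}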

> Uncaught exception: RDDBlockId not found in driver-heartbeater
> --------------------------------------------------------------
>
>                 Key: SPARK-10722
>                 URL: https://issues.apache.org/jira/browse/SPARK-10722
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 1.3.1, 1.4.1, 1.5.0
>            Reporter: Simeon Simeonov
>
> Some operations involving cached RDDs generate an uncaught exception in 
> driver-heartbeater. If the {{.cache()}} call is removed, processing happens 
> without the exception. However, not all RDDs trigger the problem, i.e., some 
> {{.cache()}} operations are fine. 
> I can see the problem with 1.4.1 and 1.5.0 but I have not been able to create 
> a reproducible test case. The same exception is 
> [reported on SO|http://stackoverflow.com/questions/31280355/spark-test-on-local-machine] 
> for v1.3.1 but the behavior is related to large broadcast variables.
> The full stack trace is:
> {code}
> 15/09/20 22:10:08 ERROR Utils: Uncaught exception in thread driver-heartbeater
> java.io.IOException: java.lang.ClassNotFoundException: org.apache.spark.storage.RDDBlockId
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
>   at org.apache.spark.executor.TaskMetrics.readObject(TaskMetrics.scala:219)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at org.apache.spark.util.Utils$.deserialize(Utils.scala:91)
>   at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$reportHeartBeat$1$$anonfun$apply$6.apply(Executor.scala:440)
>   at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$reportHeartBeat$1$$anonfun$apply$6.apply(Executor.scala:430)
>   at scala.Option.foreach(Option.scala:236)
>   at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$reportHeartBeat$1.apply(Executor.scala:430)
>   at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$reportHeartBeat$1.apply(Executor.scala:428)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:428)
>   at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:472)
>   at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:472)
>   at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:472)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>   at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:472)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.storage.RDDBlockId
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:270)
>   at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:625)
>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>   at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
>   at org.apache.spark.executor.TaskMetrics$$anonfun$readObject$1.apply$mcV$sp(TaskMetrics.scala:220)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1160)
>   ... 33 more
> {code}


