Re: java serialization errors with spark.files.userClassPathFirst=true

2014-05-16 Thread Koert Kuipers
after removing all class parameters of class Path from my code, i tried
again. different but related error when i set
spark.files.userClassPathFirst=true

now i don't even use FileInputFormat directly. HadoopRDD does...

14/05/16 12:17:17 ERROR Executor: Exception in task ID 45
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/FileInputFormat
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:51)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:57)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1481)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1331)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
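For what it's worth, the failure mode in the trace above can be reproduced with plain JDK classloaders: a child-first loader that never falls back to its parent for a class it cannot supply makes Class.forName fail, just like the resolveClass frame in the trace. This is a self-contained sketch, not Spark code; HidingLoader and tryLoad are made-up names standing in for the child-first behavior of ChildExecutorURLClassLoader:

```java
// Minimal sketch of the child-first failure mode. HidingLoader mimics a
// loader that consults the "user" classpath first and, for one class name,
// never delegates to its parent.
public class ChildFirstDemo {
    static class HidingLoader extends ClassLoader {
        private final String hidden;

        HidingLoader(String hidden, ClassLoader parent) {
            super(parent);
            this.hidden = hidden;
        }

        @Override
        public Class<?> loadClass(String name) throws ClassNotFoundException {
            if (name.equals(hidden)) {
                // the "user first" path has no copy of this class, and we
                // refuse to fall back to the parent, so resolution dies here
                throw new ClassNotFoundException(name + " (hidden from child)");
            }
            return super.loadClass(name);
        }
    }

    // try to resolve `requested` through a loader that hides `hidden`
    static String tryLoad(String hidden, String requested) {
        ClassLoader loader =
                new HidingLoader(hidden, ChildFirstDemo.class.getClassLoader());
        try {
            Class.forName(requested, true, loader);
            return "resolved";
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException";
        }
    }

    public static void main(String[] args) {
        // a class the loader can still see resolves fine
        System.out.println(tryLoad("com.example.Absent", "java.util.ArrayList"));
        // hiding the requested class reproduces the failure
        System.out.println(tryLoad("java.util.ArrayList", "java.util.ArrayList"));
    }
}
```

Running it prints "resolved" for the visible class and "ClassNotFoundException" for the hidden one; in Spark's case the same ClassNotFoundException surfaces as a NoClassDefFoundError because it happens while linking an already-loaded class.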



 On Thu, May 15, 2014 at 3:03 PM, Koert Kuipers ko...@tresata.com wrote:

 when i set spark.files.userClassPathFirst=true, i get java serialization
 errors in my tasks, see below. when i set userClassPathFirst back to its
 default of false, the serialization errors are gone. my spark.serializer is
 KryoSerializer.

 the class org.apache.hadoop.fs.Path is in the spark assembly jar, but not
 in my task jars (the ones i added to the SparkConf). so it looks like the
 ClosureSerializer is having trouble with this class once the
 ChildExecutorURLClassLoader is used? that's just me guessing.

 Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:5 failed 4 times, most recent failure: Exception failure in TID 31 on host node05.tresata.com: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/Path
 java.lang.Class.getDeclaredConstructors0(Native Method)
 java.lang.Class.privateGetDeclaredConstructors(Class.java:2398)
 java.lang.Class.getDeclaredConstructors(Class.java:1838)
 java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1697)
 java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:50)
 java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:203)
 java.security.AccessController.doPrivileged(Native Method)
 java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:200)
 java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:556)
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1580)
 java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1493)
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1729)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
 scala.collection.immutable.$colon$colon.readObject(List.scala:362)
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 java.lang.reflect.Method.invoke(Method.java:597)
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852)
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
 org.apache.spark.scheduler.ShuffleMapTask$.deserializeInfo(ShuffleMapTask.scala:66)
 org.apache.spark.scheduler.ShuffleMapTask.readExternal(ShuffleMapTask.scala:139)
 java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795)
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
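The JavaDeserializationStream.readObject frame above is where java serialization asks a classloader for each incoming class: ObjectInputStream.resolveClass is overridable, and pinning it to one loader is how a deserializer ends up at the mercy of that loader. Here is a self-contained sketch of the mechanism; LoaderObjectInputStream and roundTrip are illustrative names, not Spark's actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.Arrays;

// Sketch: an ObjectInputStream pinned to one classloader via resolveClass().
// If the supplied loader cannot see a class in the stream (say, hadoop's
// Path living only in the assembly jar), readObject() fails right there.
public class LoaderAwareDeserialize {
    static class LoaderObjectInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // every class descriptor in the stream is resolved through `loader`
            return Class.forName(desc.getName(), false, loader);
        }
    }

    // serialize `value`, then deserialize it through `loader`
    static Object roundTrip(Object value, ClassLoader loader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new LoaderObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), loader)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Object back = roundTrip(Arrays.asList(1, 2, 3),
                LoaderAwareDeserialize.class.getClassLoader());
        System.out.println(back); // prints [1, 2, 3]
    }
}
```

With a loader that can see every class this round-trips cleanly; swap in a child-first loader missing one of the stream's classes and the forName call in resolveClass is exactly where the traces above blow up.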
