Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-29 Thread Prem Spark
You need to make sure this class is accessible to all servers, since in
cluster mode the driver can be on any of the worker nodes.
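
For example, you can publish the application jar at a location that every
node can read, such as HDFS, and point spark-submit at it (the HDFS path
and namenode host below are just an illustration, not your actual setup):

~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark \
  --master spark://machine1:6066 --deploy-mode cluster \
  hdfs://namenode:8020/user/bench/benchmark-app_2.10-0.1-SNAPSHOT.jar \
  1 machine2 1000

Alternatively, copy the jar to the same absolute path on every worker
before submitting.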


On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa  wrote:

> [original message trimmed; it is quoted in full at the bottom of this
> thread]


Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-29 Thread Saiph Kappa
I found out that by commenting out this line in the application code:

sparkConf.set("spark.executor.extraJavaOptions", " -XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:FreqInlineSize=300
-XX:MaxInlineSize=300 ")

the exception no longer occurs. I'm not entirely sure why, but everything
works fine without that line.
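
If anyone needs to keep those JVM flags, a possible workaround (I haven't
verified that it avoids this particular problem) is to pass them through
spark-submit's --conf flag instead of hard-coding them in the application,
reusing the same jar path and arguments as before:

~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark \
  --master spark://machine1:6066 --deploy-mode cluster \
  --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC" \
  /home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar \
  1 machine2 1000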

Thanks!

On Tue, Dec 29, 2015 at 1:39 PM, Prem Spark  wrote:

> You need to make sure this class is accessible to all servers, since in
> cluster mode the driver can be on any of the worker nodes.
>
> [earlier messages trimmed]


ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-25 Thread Saiph Kappa
Hi,

I'm submitting a spark job like this:

~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark \
  --master spark://machine1:6066 --deploy-mode cluster \
  --jars target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar \
  /home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar \
  1 machine2 1000

and in the driver stderr, I get the following exception:

WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 74, XXX.XXX.XX.XXX):
java.lang.ClassNotFoundException: Benchmark$$anonfun$main$1
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Note that everything works fine when using deploy-mode 'client'.
This is the application that I'm trying to run:
https://github.com/tdas/spark-streaming-benchmark (this problem also
happens for non-streaming applications)

What can I do to sort this out?

Thanks.