org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory ClassNotFoundException

2023-11-07 Thread Yi Zheng
Hi,


The problem I've encountered is this: after starting spark-shell, the first time I run

spark.sql("select * from test.test_3 ").show(false)

it throws:

ERROR session.SessionState: Error setting up authorization: 
java.lang.ClassNotFoundException: 
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory

org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassNotFoundException: 
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory

When I run the same command a second time, no error is thrown and the correct 
result is returned. In summary, whenever a new Spark session is established, the 
first Spark SQL command always throws this error, while subsequent Spark SQL 
commands do not. I suspect some configuration is not set correctly, but I 
haven't been able to figure out the cause.
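
(For reference, this is the kind of setting I suspect may be missing. The path 
below is only a placeholder for wherever the Ranger Hive plugin jars are 
actually installed on this cluster, so treat it as a sketch rather than a known 
fix:

spark-shell \
  --conf "spark.driver.extraClassPath=/opt/ranger-hive-plugin/lib/*" \
  --conf "spark.executor.extraClassPath=/opt/ranger-hive-plugin/lib/*"

i.e. making org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory 
resolvable on the driver classpath.)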


Below is some background information. Please let me know if additional 
information is needed. Thank you.

Modules and version:

  *   CDH:6.3.2
  *   Zookeeper:
  *   HDFS:
  *   Spark:2.4.0
  *   Yarn:
  *   Hive:2.1.1
  *   Ranger:2.1.0

Complete error message:

[root@poc6-node1 conf]# spark-shell

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).

23/11/07 11:16:41 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: 
Attempted to request executors before the AM has registered!

23/11/07 11:16:41 WARN lineage.LineageWriter: Lineage directory 
/var/log/spark/lineage doesn't exist or is not writable. Lineage for this 
application will be disabled.

Spark context Web UI available at

Spark context available as 'sc' (master = yarn, app id =).

Spark session available as 'spark'.

Welcome to

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.3.2
      /_/



Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)

Type in expressions to have them evaluated.

Type :help for more information.



scala> spark.sql("select * from test.test_3 ").show(false)

23/11/07 11:17:30 WARN lineage.LineageWriter: Lineage directory 
/var/log/spark/lineage doesn't exist or is not writable. Lineage for this 
application will be disabled.

23/11/07 11:17:35 ERROR session.SessionState: Error setting up authorization: 
java.lang.ClassNotFoundException: 
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory

org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassNotFoundException: 
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory

at 
org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:385)

at 
org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:873)

at 
org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1559)

at 
org.apache.hadoop.hive.ql.session.SessionState.getUserFromAuthenticator(SessionState.java:1239)

at 
org.apache.hadoop.hive.ql.metadata.Table.getEmptyTable(Table.java:181)

at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:123)

at 
org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:927)

at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitions$1.apply(HiveClientImpl.scala:670)

at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitions$1.apply(HiveClientImpl.scala:669)

at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)

at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)

at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)

at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)

at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitions(HiveClientImpl.scala:669)

at 
org.apache.spark.sql.hive.client.HiveClient$class.getPartitions(HiveClient.scala:210)

at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitions(HiveClientImpl.scala:84)

at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitions$1.apply(HiveExternalCatalog.scala:1232)

at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitions$1.apply(HiveExternalCatalog.scala:1230)

at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)

at 
org.apache.spark.sql.hive.HiveExternalCatalog.listPartitions(HiveExternalCatalog.scala:1230)

at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitions(ExternalCatalogWithListener.scala:254)

at 

Help with ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol

2022-12-30 Thread Meharji Arumilli
Dear community members,
I am using Apache PySpark for the first time and have done all the 
configurations. However, I am not able to write files to local storage.
I have described the issue here:
https://stackoverflow.com/questions/74962675/how-to-fix-java-lang-classnotfoundexception-org-apache-spark-internal-io-cloud
Could you kindly help solve this?
Regards
Mehar

Java Generic T makes ClassNotFoundException

2019-06-27 Thread big data
Dear,

I use Spark to deserialize some files and restore them into my own class objects.

The Spark code and the class deserialization code (using Apache Commons Lang) look like this:

val fis = spark.sparkContext.binaryFiles("/folder/abc*.file")
val RDD = fis.map(x => {
  val content = x._2.toArray()
  val b = Block.deserializeFrom(content)
  ...
})




public static Block deserializeFrom(byte[] bytes) {
    try {
        Block b = SerializationUtils.deserialize(bytes);
        System.out.println("b=" + b);
        return b;
    } catch (ClassCastException e) {
        System.out.println("ClassCastException");
        e.printStackTrace();
    } catch (IllegalArgumentException e) {
        System.out.println("IllegalArgumentException");
        e.printStackTrace();
    } catch (SerializationException e) {
        System.out.println("SerializationException");
        e.printStackTrace();
    }
    return null;
}

Below is the Commons Lang source code for deserialize:

public static <T> T deserialize(final byte[] objectData) {
    Validate.isTrue(objectData != null, "The byte[] must not be null");
    return deserialize(new ByteArrayInputStream(objectData));
}


public static <T> T deserialize(final InputStream inputStream) {
    Validate.isTrue(inputStream != null, "The InputStream must not be null");
    try (ObjectInputStream in = new ObjectInputStream(inputStream)) {
        @SuppressWarnings("unchecked")
        final T obj = (T) in.readObject();
        return obj;
    } catch (final ClassNotFoundException | IOException ex) {
        throw new SerializationException(ex);
    }
}

In Spark local mode the code runs fine, but in cluster-on-YARN mode it fails 
with an error like this:

org.apache.commons.lang3.SerializationException: 
java.lang.ClassNotFoundException: com.Block
at 
org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:227)
at 
org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:265)
at com.com...deserializeFrom(XXX.java:81)
at com.XXX.$$anonfun$3.apply(B.scala:157)
at com.XXX.$$anonfun$3.apply(B.scala:153)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at 
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
at 
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.Block
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:686)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1868)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at 
org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:223)


From the error, we can see that the failure happens inside the Apache Commons Lang package at

final T obj = (T) in.readObject();

T is the Block class, and when it wants to 

RE: ClassNotFoundException while unmarshalling a remote RDD on Spark 1.5.1

2017-09-12 Thread PICARD Damien
OK, it just seems to be an issue with the syntax of the spark-submit command. 
It should be:

spark-submit --queue default \
--class com.my.Launcher \
--deploy-mode cluster \
--master yarn-cluster \
--driver-java-options "-Dfile.encoding=UTF-8" \
--jars /home/user/hibernate-validator-5.2.2.Final.jar \
--driver-class-path hibernate-validator-5.2.2.Final.jar \
--conf "spark.executor.extraClassPath=hibernate -validator-5.2.2.Final.jar" \
/home/user/uberjar-job.jar

I also had to add some other jars, such as jboss-logging, to meet the needs of 
hibernate-validator.
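
For example, the full command ends up looking something like this (a sketch 
only: the jboss-logging version below is indicative and should match whatever 
hibernate-validator 5.2.2 actually requires):

spark-submit --queue default \
--class com.my.Launcher \
--deploy-mode cluster \
--master yarn-cluster \
--driver-java-options "-Dfile.encoding=UTF-8" \
--jars /home/user/hibernate-validator-5.2.2.Final.jar,/home/user/jboss-logging-3.2.1.Final.jar \
--driver-class-path hibernate-validator-5.2.2.Final.jar:jboss-logging-3.2.1.Final.jar \
--conf "spark.executor.extraClassPath=hibernate-validator-5.2.2.Final.jar:jboss-logging-3.2.1.Final.jar" \
/home/user/uberjar-job.jar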

From: PICARD Damien (EXT) AssuResPriSmsAts
Sent: Monday, 11 September 2017 08:53
To: 'user@spark.apache.org'
Subject: ClassNotFoundException while unmarshalling a remote RDD on Spark 1.5.1

Hi !

I'm facing a classloader problem using Spark 1.5.1.

I use javax.validation and Hibernate Validator annotations on some of my beans:

  @NotBlank
  @Valid
  private String attribute1 ;

  @Valid
  private String attribute2 ;

When Spark tries to unmarshall these beans (after fetching a remote RDD block), 
I get the ClassNotFoundException:
17/09/07 09:19:25 INFO storage.BlockManager: Found block rdd_8_1 remotely
17/09/07 09:19:25 ERROR executor.Executor: Exception in task 3.0 in stage 2.0 
(TID 6)
java.lang.ClassNotFoundException: org.hibernate.validator.constraints.NotBlank
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:700)
at java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1566)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
   ...

Indeed, it means that the annotation class is not found because it is not on 
the classpath. Why? I don't know, because I build an uber JAR that contains this 
class. I suppose that at the time the job tries to unmarshall the RDD, the uber 
jar is not yet loaded.

So I tried to add the Hibernate JAR to the classloader manually, using this 
spark-submit command:

spark-submit --queue default \
--class com.my.Launcher \
--deploy-mode cluster \
--master yarn-cluster \
--driver-java-options "-Dfile.encoding=UTF-8" \
--jars /home/user/hibernate-validator-5.2.2.Final.jar \
--driver-class-path /home/user/hibernate-validator-5.2.2.Final.jar \
--conf 
"spark.executor.extraClassPath=/home/user/hibernate-validator-5.2.2.Final.jar" \
/home/user/uberjar-job.jar

Without effect. So, is there a way to add this class to the classloader?

Thank you in advance.

Damien




ClassNotFoundException while unmarshalling a remote RDD on Spark 1.5.1

2017-09-11 Thread PICARD Damien
Hi !

I'm facing a classloader problem using Spark 1.5.1.

I use javax.validation and Hibernate Validator annotations on some of my beans:

  @NotBlank
  @Valid
  private String attribute1 ;

  @Valid
  private String attribute2 ;

When Spark tries to unmarshall these beans (after fetching a remote RDD block), 
I get the ClassNotFoundException:
17/09/07 09:19:25 INFO storage.BlockManager: Found block rdd_8_1 remotely
17/09/07 09:19:25 ERROR executor.Executor: Exception in task 3.0 in stage 2.0 
(TID 6)
java.lang.ClassNotFoundException: org.hibernate.validator.constraints.NotBlank
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:700)
at java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1566)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
   ...

Indeed, it means that the annotation class is not found because it is not on 
the classpath. Why? I don't know, because I build an uber JAR that contains this 
class. I suppose that at the time the job tries to unmarshall the RDD, the uber 
jar is not yet loaded.

So I tried to add the Hibernate JAR to the classloader manually, using this 
spark-submit command:

spark-submit --queue default \
--class com.my.Launcher \
--deploy-mode cluster \
--master yarn-cluster \
--driver-java-options "-Dfile.encoding=UTF-8" \
--jars /home/user/hibernate-validator-5.2.2.Final.jar \
--driver-class-path /home/user/hibernate-validator-5.2.2.Final.jar \
--conf 
"spark.executor.extraClassPath=/home/user/hibernate-validator-5.2.2.Final.jar" \
/home/user/uberjar-job.jar

Without effect. So, is there a way to add this class to the classloader?

Thank you in advance.

Damien




Re: --jars from spark-submit on master on YARN don't get added properly to the executors - ClassNotFoundException

2017-08-09 Thread Mikhailau, Alex
Thanks, Marcelo. Will give it a shot tomorrow.

-Alex

On 8/9/17, 5:59 PM, "Marcelo Vanzin"  wrote:

Jars distributed using --jars are not added to the system classpath,
so log4j cannot see them.

To work around that, you need to manually add the jar *name* to the
driver and executor classpaths:

spark.driver.extraClassPath=some.jar
spark.executor.extraClassPath=some.jar

In client mode you should use spark.yarn.dist.jars instead of --jars,
and change the driver classpath above to point to the local copy of
the jar.


On Wed, Aug 9, 2017 at 2:52 PM, Mikhailau, Alex  
wrote:
> I have log4j json layout jars added via spark-submit on EMR
>
>
>
> /usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn --jars
> 
/home/hadoop/lib/jsonevent-layout-1.7.jar,/home/hadoop/lib/json-smart-1.1.1.jar
> --driver-java-options "-XX:+AlwaysPreTouch -XX:MaxPermSize=6G" --class
> com.mlbam.emr.XXX  s3://xxx/aa/jars/ spark-job-assembly-1.4.1-SNAPSHOT.jar
> ActionOnFailure=CONTINUE
>
>
>
>
>
> this is the process running on the executor:
>
>
>
> /usr/lib/jvm/java-1.8.0/bin/java -server -Xmx8192m -XX:+AlwaysPreTouch
> -XX:MaxPermSize=6G
> 
-Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/tmp
> -Dspark.driver.port=32869 -Dspark.history.ui.port=18080 -Dspark.ui.port=0
> 
-Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1502310393755_0003/container_1502310393755_0003_01_05
> -XX:OnOutOfMemoryError=kill %p
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
> spark://CoarseGrainedScheduler@10.202.138.158:32869 --executor-id 3
> --hostname ip-10-202-138-98.mlbam.qa.us-east-1.bamgrid.net --cores 8
> --app-id application_1502310393755_0003 --user-class-path
> 
file:/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/__app__.jar
> --user-class-path
> 
file:/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/jsonevent-layout-1.7.jar
> --user-class-path
> 
file:/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/json-smart-1.1.1.jar
>
>
>
> I see that jsonevent-layout-1.7.jar is passed as --user-class-path to the job
> (see the above process), yet, I see the following log exception in my
> stderr:
>
>
>
> log4j:ERROR Could not instantiate class
> [net.logstash.log4j.JSONEventLayoutV1].
>
> java.lang.ClassNotFoundException: net.logstash.log4j.JSONEventLayoutV1
>
>
>
>
>
> Am I doing something wrong?
>
>
>
> Thank you,
>
>
>
> Alex



-- 
Marcelo






Re: --jars from spark-submit on master on YARN don't get added properly to the executors - ClassNotFoundException

2017-08-09 Thread Marcelo Vanzin
Jars distributed using --jars are not added to the system classpath,
so log4j cannot see them.

To work around that, you need to manually add the jar *name* to the
driver and executor classpaths:

spark.driver.extraClassPath=some.jar
spark.executor.extraClassPath=some.jar

In client mode you should use spark.yarn.dist.jars instead of --jars,
and change the driver classpath above to point to the local copy of
the jar.
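
Applied to the command from the original mail, that would look roughly like 
this (an untested sketch; in cluster mode the bare jar names resolve against 
the container working directory that --jars copies them into):

/usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn \
  --jars /home/hadoop/lib/jsonevent-layout-1.7.jar,/home/hadoop/lib/json-smart-1.1.1.jar \
  --conf spark.driver.extraClassPath=jsonevent-layout-1.7.jar:json-smart-1.1.1.jar \
  --conf spark.executor.extraClassPath=jsonevent-layout-1.7.jar:json-smart-1.1.1.jar \
  --driver-java-options "-XX:+AlwaysPreTouch -XX:MaxPermSize=6G" \
  --class com.mlbam.emr.XXX ... (application jar and arguments as before)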


On Wed, Aug 9, 2017 at 2:52 PM, Mikhailau, Alex  wrote:
> I have log4j json layout jars added via spark-submit on EMR
>
>
>
> /usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn --jars
> /home/hadoop/lib/jsonevent-layout-1.7.jar,/home/hadoop/lib/json-smart-1.1.1.jar
> --driver-java-options "-XX:+AlwaysPreTouch -XX:MaxPermSize=6G" --class
> com.mlbam.emr.XXX  s3://xxx/aa/jars/ spark-job-assembly-1.4.1-SNAPSHOT.jar
> ActionOnFailure=CONTINUE
>
>
>
>
>
> this is the process running on the executor:
>
>
>
> /usr/lib/jvm/java-1.8.0/bin/java -server -Xmx8192m -XX:+AlwaysPreTouch
> -XX:MaxPermSize=6G
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/tmp
> -Dspark.driver.port=32869 -Dspark.history.ui.port=18080 -Dspark.ui.port=0
> -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1502310393755_0003/container_1502310393755_0003_01_05
> -XX:OnOutOfMemoryError=kill %p
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
> spark://CoarseGrainedScheduler@10.202.138.158:32869 --executor-id 3
> --hostname ip-10-202-138-98.mlbam.qa.us-east-1.bamgrid.net --cores 8
> --app-id application_1502310393755_0003 --user-class-path
> file:/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/__app__.jar
> --user-class-path
> file:/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/jsonevent-layout-1.7.jar
> --user-class-path
> file:/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/json-smart-1.1.1.jar
>
>
>
> I see that jsonevent-layout-1.7.jar is passed as --user-class-path to the job
> (see the above process), yet, I see the following log exception in my
> stderr:
>
>
>
> log4j:ERROR Could not instantiate class
> [net.logstash.log4j.JSONEventLayoutV1].
>
> java.lang.ClassNotFoundException: net.logstash.log4j.JSONEventLayoutV1
>
>
>
>
>
> Am I doing something wrong?
>
>
>
> Thank you,
>
>
>
> Alex



-- 
Marcelo




--jars from spark-submit on master on YARN don't get added properly to the executors - ClassNotFoundException

2017-08-09 Thread Mikhailau, Alex
I have log4j json layout jars added via spark-submit on EMR

/usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn --jars 
/home/hadoop/lib/jsonevent-layout-1.7.jar,/home/hadoop/lib/json-smart-1.1.1.jar 
--driver-java-options "-XX:+AlwaysPreTouch -XX:MaxPermSize=6G" --class 
com.mlbam.emr.XXX  s3://xxx/aa/jars/ spark-job-assembly-1.4.1-SNAPSHOT.jar 
ActionOnFailure=CONTINUE


this is the process running on the executor:

/usr/lib/jvm/java-1.8.0/bin/java -server -Xmx8192m -XX:+AlwaysPreTouch 
-XX:MaxPermSize=6G 
-Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/tmp
 -Dspark.driver.port=32869 -Dspark.history.ui.port=18080 -Dspark.ui.port=0 
-Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1502310393755_0003/container_1502310393755_0003_01_05
 -XX:OnOutOfMemoryError=kill %p 
org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
spark://CoarseGrainedScheduler@10.202.138.158:32869 --executor-id 3 --hostname 
ip-10-202-138-98.mlbam.qa.us-east-1.bamgrid.net --cores 8 --app-id 
application_1502310393755_0003 --user-class-path 
file:/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/__app__.jar
 --user-class-path 
file:/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/jsonevent-layout-1.7.jar
 --user-class-path 
file:/mnt/yarn/usercache/hadoop/appcache/application_1502310393755_0003/container_1502310393755_0003_01_05/json-smart-1.1.1.jar

I see that jsonevent-layout-1.7.jar is passed as --user-class-path to the job 
(see the above process), yet, I see the following log exception in my stderr:

log4j:ERROR Could not instantiate class [net.logstash.log4j.JSONEventLayoutV1].
java.lang.ClassNotFoundException: net.logstash.log4j.JSONEventLayoutV1


Am I doing something wrong?

Thank you,

Alex


Re: ClassNotFoundException for Workers

2017-07-31 Thread Noppanit Charassinvichai
I've included that in my build file for the fat jar already.


libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155"

Not sure if I need special configuration?

On Tue, 25 Jul 2017 at 04:17 周康  wrote:

> Ensure com.amazonaws.services.s3.AmazonS3ClientBuilder is on your classpath,
> which includes your application jar and the jars attached to the executors.
>
> 2017-07-20 6:12 GMT+08:00 Noppanit Charassinvichai :
>
>> I have a Spark job which uses an S3 client inside mapPartitions, and I get
>> this error:
>>
>> Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times,
>> most recent failure: Lost task 0.3 in stage 3.0 (TID 74,
>> ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
>> Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
>> +details
>> Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times,
>> most recent failure: Lost task 0.3 in stage 3.0 (TID 74,
>> ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
>> Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
>> at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:49)
>> at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:46)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>> at org.apache.spark.scheduler.Task.run(Task.scala:99)
>> at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> This is my code
>> val jsonRows = sqs.mapPartitions(partitions => {
>>   val s3Client = AmazonS3ClientBuilder.standard().withCredentials(new
>> DefaultCredentialsProvider).build()
>>
>>   val txfm = new LogLine2Json
>>   val log = Logger.getLogger("parseLog")
>>
>>   partitions.flatMap(messages => {
>> val sqsMsg = Json.parse(messages)
>> val bucketName =
>> Json.stringify(sqsMsg("Records")(0)("s3")("bucket")("name")).replace("\"",
>> "")
>> val key =
>> Json.stringify(sqsMsg("Records")(0)("s3")("object")("key")).replace("\"",
>> "")
>> val obj = s3Client.getObject(new GetObjectRequest(bucketName,
>> key))
>> val stream = obj.getObjectContent()
>>
>> scala.io.Source.fromInputStream(stream).getLines().map(line => {
>>   try {
>> txfm.parseLine(line)
>>   }
>>   catch {
>> case e: Throwable => {
>>   log.info(line); "{}";
>> }
>>   }
>> }).filter(line => line != "{}")
>>   })
>> })
>>
>> This is my build.sbt
>>
>> name := "sparrow-to-orc"
>>
>> version := "0.1"
>>
>> scalaVersion := "2.11.8"
>>
>> libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" %
>> "provided"
>> libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" %
>> "provided"
>> libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0" %
>> "provided"
>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>> % "provided"
>>
>> libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3" %
>> "provided"
>> libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3" %
>> "provided"
>> libraryDependencies += "com.cn" %% "sparrow-clf-parser" % "1.1-SNAPSHOT"
>>
>> libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155"
>> libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155"
>> libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155"
>>
>> libraryDependencies += "com.github.seratch" %% "awscala" % "0.6.+"
>> libraryDependencies += "com.typesafe.play" %% "play-json" % "2.6.0"
>> dependencyOverrides ++= Set("com.fasterxml.jackson.core" %
>> "jackson-databind" % "2.6.0")
>>
>>
>>
>> 

Re: ClassNotFoundException for Workers

2017-07-25 Thread 周康
Ensure com.amazonaws.services.s3.AmazonS3ClientBuilder is on your classpath,
which includes your application jar and the jars attached to the executors.
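
One quick way to check is to list the contents of the assembly jar that gets 
submitted (a sketch; the exact jar path depends on your sbt-assembly settings):

jar tf target/scala-2.11/sparrow-to-orc-assembly-0.1.jar | grep AmazonS3ClientBuilder

If the class is missing there, the aws-java-sdk dependencies are not making it 
into the fat jar.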

2017-07-20 6:12 GMT+08:00 Noppanit Charassinvichai :

> I have a Spark job which uses an S3 client inside mapPartitions, and I get
> this error:
>
> Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 3.0 (TID 74,
> ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
> Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
> +details
> Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 3.0 (TID 74,
> ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
> Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
> at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:49)
> at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:46)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$
> anonfun$apply$23.apply(RDD.scala:796)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$
> anonfun$apply$23.apply(RDD.scala:796)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(
> MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(
> MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(
> MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:282)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> This is my code
> val jsonRows = sqs.mapPartitions(partitions => {
>   val s3Client = AmazonS3ClientBuilder.standard().withCredentials(new
> DefaultCredentialsProvider).build()
>
>   val txfm = new LogLine2Json
>   val log = Logger.getLogger("parseLog")
>
>   partitions.flatMap(messages => {
> val sqsMsg = Json.parse(messages)
> val bucketName = Json.stringify(sqsMsg("
> Records")(0)("s3")("bucket")("name")).replace("\"", "")
> val key = 
> Json.stringify(sqsMsg("Records")(0)("s3")("object")("key")).replace("\"",
> "")
> val obj = s3Client.getObject(new GetObjectRequest(bucketName, key))
> val stream = obj.getObjectContent()
>
> scala.io.Source.fromInputStream(stream).getLines().map(line => {
>   try {
> txfm.parseLine(line)
>   }
>   catch {
> case e: Throwable => {
>   log.info(line); "{}";
> }
>   }
> }).filter(line => line != "{}")
>   })
> })
>
> This is my build.sbt
>
> name := "sparrow-to-orc"
>
> version := "0.1"
>
> scalaVersion := "2.11.8"
>
> libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" %
> "provided"
>
> libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3" %
> "provided"
> libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3" %
> "provided"
> libraryDependencies += "com.cn" %% "sparrow-clf-parser" % "1.1-SNAPSHOT"
>
> libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155"
> libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155"
> libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155"
>
> libraryDependencies += "com.github.seratch" %% "awscala" % "0.6.+"
> libraryDependencies += "com.typesafe.play" %% "play-json" % "2.6.0"
> dependencyOverrides ++= Set("com.fasterxml.jackson.core" %
> "jackson-databind" % "2.6.0")
>
>
>
> assemblyMergeStrategy in assembly := {
>   case PathList("org","aopalliance", xs @ _*) => MergeStrategy.last
>   case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
>   case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
>   case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
>   case PathList("org", "apache", xs @ _*) => MergeStrategy.last
>   case PathList("com", "google", xs @ _*) => MergeStrategy.last
>   case PathList("com", "esotericsoftware", xs @ _*) => 

ClassNotFoundException for Workers

2017-07-19 Thread Noppanit Charassinvichai
I have a Spark job which uses an S3 client inside mapPartitions, and I get
this error:

Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most
recent failure: Lost task 0.3 in stage 3.0 (TID 74,
ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
+details
Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most
recent failure: Lost task 0.3 in stage 3.0 (TID 74,
ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:49)
at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:46)
at
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
at
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

This is my code
val jsonRows = sqs.mapPartitions(partitions => {
  val s3Client = AmazonS3ClientBuilder.standard().withCredentials(new
DefaultCredentialsProvider).build()

  val txfm = new LogLine2Json
  val log = Logger.getLogger("parseLog")

  partitions.flatMap(messages => {
val sqsMsg = Json.parse(messages)
val bucketName =
Json.stringify(sqsMsg("Records")(0)("s3")("bucket")("name")).replace("\"",
"")
val key =
Json.stringify(sqsMsg("Records")(0)("s3")("object")("key")).replace("\"",
"")
val obj = s3Client.getObject(new GetObjectRequest(bucketName, key))
val stream = obj.getObjectContent()

scala.io.Source.fromInputStream(stream).getLines().map(line => {
  try {
txfm.parseLine(line)
  }
  catch {
case e: Throwable => {
  log.info(line); "{}";
}
  }
}).filter(line => line != "{}")
  })
})

This is my build.sbt

name := "sparrow-to-orc"

version := "0.1"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" %
"provided"

libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3" %
"provided"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3" %
"provided"
libraryDependencies += "com.cn" %% "sparrow-clf-parser" % "1.1-SNAPSHOT"

libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155"

libraryDependencies += "com.github.seratch" %% "awscala" % "0.6.+"
libraryDependencies += "com.typesafe.play" %% "play-json" % "2.6.0"
dependencyOverrides ++= Set("com.fasterxml.jackson.core" %
"jackson-databind" % "2.6.0")



assemblyMergeStrategy in assembly := {
  case PathList("org","aopalliance", xs @ _*) => MergeStrategy.last
  case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("com", "google", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
  case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
  case PathList("com", "amazonaws", xs @ _*) => MergeStrategy.last
  case PathList("com", "typesafe", xs @ _*) => MergeStrategy.last
  case "about.html" => MergeStrategy.rename
  case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
  case "META-INF/mailcap" => MergeStrategy.last
  case 

Re: Sporadic ClassNotFoundException with Kryo

2017-01-12 Thread Nirmal Fernando
I faced a similar issue and had to do two things (a rough sketch follows below):

1. Submit the Kryo jar with spark-submit
2. Set spark.executor.userClassPathFirst=true in the Spark conf
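
Something along these lines (a sketch; the jar paths are placeholders for 
wherever the Kryo jar and the jar containing your registered classes live):

spark-submit \
  --jars /path/to/kryo-shaded.jar,/path/to/jar-with-registered-classes.jar \
  --conf spark.executor.userClassPathFirst=true \
  ... (rest of the submit options as before)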

On Fri, Nov 18, 2016 at 7:39 PM, chrism <christopher.martens...@cics.se>
wrote:

> Regardless of the different ways we have tried deploying a jar together
> with
> Spark, when running a Spark Streaming job with Kryo as serializer on top of
> Mesos, we sporadically get the following error (I have truncated a bit):
>
> /16/11/18 08:39:10 ERROR OneForOneBlockFetcher: Failed while starting block
> fetches
> java.lang.RuntimeException: org.apache.spark.SparkException: Failed to
> register classes with Kryo
>   at
> org.apache.spark.serializer.KryoSerializer.newKryo(KryoSeria
> lizer.scala:129)
>   at
> org.apache.spark.serializer.KryoSerializerInstance.borrowKry
> o(KryoSerializer.scala:274)
> ...
>   at
> org.apache.spark.serializer.SerializerManager.dataSerializeS
> tream(SerializerManager.scala:125)
>   at
> org.apache.spark.storage.BlockManager$$anonfun$dropFromMemor
> y$3.apply(BlockManager.scala:1265)
>   at
> org.apache.spark.storage.BlockManager$$anonfun$dropFromMemor
> y$3.apply(BlockManager.scala:1261)
> ...
> Caused by: java.lang.ClassNotFoundException: cics.udr.compound_ran_udr
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)/
>
> where "cics.udr.compound_ran_udr" is a class provided by us in a jar.
>
> We know that the jar containing "cics.udr.compound_ran_udr" is being
> deployed and works because it is listed in the "Environment" tab in the
> GUI,
> and calculations using this class succeed.
>
> We have tried the following methods of deploying the jar containing the
> class
>  * Through --jars in spark-submit
>  * Through SparkConf.setJar
>  * Through spark.driver.extraClassPath and spark.executor.extraClassPath
>  * By having it as the main jar used by spark-submit
> with no luck. The logs (see attached) recognize that the jar is being added
> to the classloader.
>
> We have tried registering the class using
>  * SparkConf.registerKryoClasses.
>  * spark.kryo.classesToRegister
> with no luck.
>
> We are running on Mesos and the jar has been deployed on every machine on
> the local file system in the same location.
>
> I would be very grateful for any help or ideas :)
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Sporadic-ClassNotFoundException-with-K
> ryo-tp28104.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


-- 

Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733 <+94%2071%20577%209733>
Blog: http://nirmalfdo.blogspot.com/


Sporadic ClassNotFoundException with Kryo

2016-11-18 Thread chrism
Regardless of the different ways we have tried deploying a jar together with
Spark, when running a Spark Streaming job with Kryo as serializer on top of
Mesos, we sporadically get the following error (I have truncated a bit):

/16/11/18 08:39:10 ERROR OneForOneBlockFetcher: Failed while starting block
fetches
java.lang.RuntimeException: org.apache.spark.SparkException: Failed to
register classes with Kryo
  at
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:129)
  at
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:274)
...
  at
org.apache.spark.serializer.SerializerManager.dataSerializeStream(SerializerManager.scala:125)
  at
org.apache.spark.storage.BlockManager$$anonfun$dropFromMemory$3.apply(BlockManager.scala:1265)
  at
org.apache.spark.storage.BlockManager$$anonfun$dropFromMemory$3.apply(BlockManager.scala:1261)
...
Caused by: java.lang.ClassNotFoundException: cics.udr.compound_ran_udr
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)/

where "cics.udr.compound_ran_udr" is a class provided by us in a jar.

We know that the jar containing "cics.udr.compound_ran_udr" is being
deployed and works because it is listed in the "Environment" tab in the GUI,
and calculations using this class succeed.

We have tried the following methods of deploying the jar containing the
class
 * Through --jars in spark-submit
 * Through SparkConf.setJar
 * Through spark.driver.extraClassPath and spark.executor.extraClassPath
 * By having it as the main jar used by spark-submit
with no luck. The logs (see attached) recognize that the jar is being added
to the classloader.

We have tried registering the class using
 * SparkConf.registerKryoClasses.
 * spark.kryo.classesToRegister
with no luck.

We are running on Mesos and the jar has been deployed on every machine on
the local file system in the same location.

I would be very grateful for any help or ideas :)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Sporadic-ClassNotFoundException-with-Kryo-tp28104.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Spark streaming 2, giving error ClassNotFoundException: scala.collection.GenTraversableOnce$class

2016-08-19 Thread Mich Talebzadeh
Thanks

--jars /home/hduser/jars/spark-streaming-kafka-assembly_*2.11*-1.6.1.jar

sorted it out

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 19 August 2016 at 20:19, Tathagata Das 
wrote:

> You seem to be combining Scala 2.10 and 2.11 libraries - your sbt project is
> 2.11, whereas you are trying to pull in spark-streaming-kafka-assembly_
> *2.10*-1.6.1.jar.
>
> On Fri, Aug 19, 2016 at 11:24 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> My spark streaming app with 1.6.1 used to work.
>>
>> Now with
>>
>> scala> sc version
>> res0: String = 2.0.0
>>
>> Compiling with sbt assembly as before, with the following:
>>
>> version := "1.0",
>> scalaVersion := "2.11.8",
>> mainClass in Compile := Some("myPackage.${APPLICATION}")
>>   )
>> libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" %
>> "provided"
>> libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" %
>> "provided"
>> libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" %
>> "provided"
>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0"
>> % "provided"
>> libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" %
>> "1.6.1" % "provided"
>>
>>
>> I downgraded scalaVersion to 2.10.4; it did not change.
>>
>> It compiles OK but at run time it fails
>>
>> This jar is added to spark-submit:
>>
>> --jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
>>
>> And this is the error
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> scala/collection/GenTraversableOnce$class
>> at kafka.utils.Pool.<init>(Pool.scala:28)
>> at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(
>> FetchRequestAndResponseStats.scala:60)
>> at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<
>> clinit>(FetchRequestAndResponseStats.scala)
>> at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
>> at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaC
>> luster.scala:52)
>> at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$
>> apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.
>> apply(KafkaCluster.scala:345)
>> at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$
>> apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.
>> apply(KafkaCluster.scala:342)
>> at scala.collection.IndexedSeqOptimized$class.foreach(
>> IndexedSeqOptimized.scala:33)
>> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.
>> scala:35)
>> at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$spa
>> rk$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:342)
>> at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMe
>> tadata(KafkaCluster.scala:125)
>> at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(
>> KafkaCluster.scala:112)
>> at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(
>> KafkaUtils.scala:211)
>> at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStr
>> eam(KafkaUtils.scala:484)
>> at CEP_streaming$.main(CEP_streaming.scala:123)
>> at CEP_streaming.main(CEP_streaming.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>> ssorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>> thodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>> $SparkSubmit$$runMain(SparkSubmit.scala:729)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>> .scala:185)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.
>> scala:210)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
>> 124)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.ClassNotFoundException:
>> scala.collection.GenTraversableOnce$class
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>
>>
>> Any ideas appreciated
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> 

Re: Spark streaming 2, giving error ClassNotFoundException: scala.collection.GenTraversableOnce$class

2016-08-19 Thread Tathagata Das
You seem to be combining Scala 2.10 and 2.11 libraries - your sbt project is
2.11, whereas you are trying to pull in spark-streaming-kafka-assembly_
*2.10*-1.6.1.jar.
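
That is, the assembly passed to --jars needs to match the project's Scala 
version (2.11 here), i.e. something like:

--jars /home/hduser/jars/spark-streaming-kafka-assembly_2.11-1.6.1.jar \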

On Fri, Aug 19, 2016 at 11:24 AM, Mich Talebzadeh  wrote:

> Hi,
>
> My spark streaming app with 1.6.1 used to work.
>
> Now with
>
> scala> sc version
> res0: String = 2.0.0
>
> Compiling with sbt assembly as before, with the following:
>
> version := "1.0",
> scalaVersion := "2.11.8",
> mainClass in Compile := Some("myPackage.${APPLICATION}")
>   )
> libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" %
> "1.6.1" % "provided"
>
>
> I downgraded scalaVersion to 2.10.4; it did not change.
>
> It compiles OK but at run time it fails
>
> This jar is added to spark-submit:
>
> --jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
>
> And this is the error
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> scala/collection/GenTraversableOnce$class
> at kafka.utils.Pool.<init>(Pool.scala:28)
> at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(
> FetchRequestAndResponseStats.scala:60)
> at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(
> FetchRequestAndResponseStats.scala)
> at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
> at org.apache.spark.streaming.kafka.KafkaCluster.connect(
> KafkaCluster.scala:52)
> at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$
> org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(
> KafkaCluster.scala:345)
> at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$
> org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(
> KafkaCluster.scala:342)
> at scala.collection.IndexedSeqOptimized$class.
> foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.WrappedArray.foreach(
> WrappedArray.scala:35)
> at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$
> spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:342)
> at org.apache.spark.streaming.kafka.KafkaCluster.
> getPartitionMetadata(KafkaCluster.scala:125)
> at org.apache.spark.streaming.kafka.KafkaCluster.
> getPartitions(KafkaCluster.scala:112)
> at org.apache.spark.streaming.kafka.KafkaUtils$.
> getFromOffsets(KafkaUtils.scala:211)
> at org.apache.spark.streaming.kafka.KafkaUtils$.
> createDirectStream(KafkaUtils.scala:484)
> at CEP_streaming$.main(CEP_streaming.scala:123)
> at CEP_streaming.main(CEP_streaming.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:185)
> at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:210)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:124)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: scala.collection.
> GenTraversableOnce$class
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>
>
> Any ideas appreciated
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


Spark streaming 2, giving error ClassNotFoundException: scala.collection.GenTraversableOnce$class

2016-08-19 Thread Mich Talebzadeh
Hi,

My spark streaming app with 1.6.1 used to work.

Now with

scala> sc version
res0: String = 2.0.0

Compiling with sbt assembly as before, with the following:

version := "1.0",
scalaVersion := "2.11.8",
mainClass in Compile := Some("myPackage.${APPLICATION}")
  )
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" %
"1.6.1" % "provided"


I downgraded scalaVersion to 2.10.4; it did not change.

It compiles OK but at run time it fails

This jar is added to spark-submit:

--jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \

And this is the error

Exception in thread "main" java.lang.NoClassDefFoundError:
scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.<init>(Pool.scala:28)
at
kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:60)
at
kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
at
org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:52)
at
org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:345)
at
org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:342)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at
scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.streaming.kafka.KafkaCluster.org
$apache$spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:342)
at
org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:125)
at
org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
at
org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
at
org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at CEP_streaming$.main(CEP_streaming.scala:123)
at CEP_streaming.main(CEP_streaming.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain
(SparkSubmit.scala:729)
at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException:
scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)


Any ideas appreciated

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: [Spark 2.0] ClassNotFoundException is thrown when using Hive

2016-08-18 Thread Aditya

Try using --files /path/of/hive-site.xml  in spark-submit and run.
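
For reference, a rough programmatic equivalent of that suggestion when submitting to YARN (a sketch only; the path is the placeholder from the suggestion above):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: distribute hive-site.xml to the YARN containers, roughly what
// `spark-submit --files /path/of/hive-site.xml` does in YARN mode.
val conf = new SparkConf()
  .setAppName("hive-site-distribution-check")
  .set("spark.yarn.dist.files", "/path/of/hive-site.xml")

val sc = new SparkContext(conf)
```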

On Thursday 18 August 2016 05:26 PM, Diwakar Dhanuskodi wrote:

Hi

Can you cross-check by providing the same library path in --jars of 
spark-submit and run?



Sent from Samsung Mobile.


 Original message 
From: "颜发才(Yan Facai)" <yaf...@gmail.com>
Date:18/08/2016 15:17 (GMT+05:30)
To: "user.spark" <user@spark.apache.org>
Cc:
Subject: [Spark 2.0] ClassNotFoundException is thrown when using Hive

Hi, all.

I copied hdfs-site.xml, core-site.xml and hive-site.xml to 
$SPARK_HOME/conf.
spark-submit is used to submit the task to YARN, running in **client** 
mode.

However, a ClassNotFoundException is thrown.

Some details of the logs are listed below:
```
16/08/12 17:07:32 INFO hive.HiveUtils: Initializing 
HiveMetastoreConnection version 0.13.1 using 
file:/data0/facai/lib/hive-0.13.1/lib:file:/data0/facai/lib/hadoop-2.4.1/share/hadoop
16/08/12 17:07:32 ERROR yarn.ApplicationMaster: User class threw 
exception: java.lang.ClassNotFoundException: 
java.lang.NoClassDefFoundError: 
org/apache/hadoop/hive/ql/session/SessionState when creating Hive 
client using classpath: file:/data0/facai/lib/hive-0.13.1/lib, 
file:/data0/facai/lib/hadoop-2.4.1/share/hadoop

```

In fact, all the jars needed by Hive are in the directory:
```Bash
[hadoop@h107713699 spark_test]$ ls /data0/facai/lib/hive-0.13.1/lib/ | 
grep hive

hive-ant-0.13.1.jar
hive-beeline-0.13.1.jar
hive-cli-0.13.1.jar
hive-common-0.13.1.jar
...
```

So, my question is:
why can't Spark find the jars it needs?

Any help would be appreciated, thanks.







RE: [Spark 2.0] ClassNotFoundException is thrown when using Hive

2016-08-18 Thread Diwakar Dhanuskodi
Hi

Can you cross-check by providing the same library path in --jars of spark-submit 
and run?


Sent from Samsung Mobile.

 Original message From: "颜发才(Yan Facai)" 
<yaf...@gmail.com> Date:18/08/2016  15:17  (GMT+05:30) 
To: "user.spark" <user@spark.apache.org> Cc:  
Subject: [Spark 2.0] ClassNotFoundException is thrown when using 
Hive 
Hi, all.

I copied hdfs-site.xml, core-site.xml and hive-site.xml to $SPARK_HOME/conf. 
spark-submit is used to submit the task to YARN, running in **client** mode. 
However, a ClassNotFoundException is thrown.

Some details of the logs are listed below:
```
16/08/12 17:07:32 INFO hive.HiveUtils: Initializing HiveMetastoreConnection 
version 0.13.1 using 
file:/data0/facai/lib/hive-0.13.1/lib:file:/data0/facai/lib/hadoop-2.4.1/share/hadoop
16/08/12 17:07:32 ERROR yarn.ApplicationMaster: User class threw exception: 
java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: 
org/apache/hadoop/hive/ql/session/SessionState when creating Hive client using 
classpath: file:/data0/facai/lib/hive-0.13.1/lib, 
file:/data0/facai/lib/hadoop-2.4.1/share/hadoop
```

In fact, all the jars needed by Hive are in the directory:
```Bash
[hadoop@h107713699 spark_test]$ ls /data0/facai/lib/hive-0.13.1/lib/ | grep hive
hive-ant-0.13.1.jar
hive-beeline-0.13.1.jar
hive-cli-0.13.1.jar
hive-common-0.13.1.jar
...
```

So, my question is:
why can't Spark find the jars it needs?

Any help would be appreciated, thanks.



Re: [Spark 2.0] ClassNotFoundException is thrown when using Hive

2016-08-18 Thread Mich Talebzadeh
When you start spark-shell, does it work, or does this issue only occur with
spark-submit?

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 18 August 2016 at 10:47, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:

> Hi, all.
>
> I copied hdfs-site.xml, core-site.xml and hive-site.xml to
> $SPARK_HOME/conf.
> And spark-submit is used to submit task to yarn, and run as **client**
> mode.
> However, ClassNotFoundException is thrown.
>
> some details of logs are list below:
> ```
> 16/08/12 17:07:32 INFO hive.HiveUtils: Initializing
> HiveMetastoreConnection version 0.13.1 using file:/data0/facai/lib/hive-0.1
> 3.1/lib:file:/data0/facai/lib/hadoop-2.4.1/share/hadoop
> 16/08/12 17:07:32 ERROR yarn.ApplicationMaster: User class threw
> exception: java.lang.ClassNotFoundException:
> java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/session/SessionState
> when creating Hive client using classpath: 
> file:/data0/facai/lib/hive-0.13.1/lib,
> file:/data0/facai/lib/hadoop-2.4.1/share/hadoop
> ```
>
> In fact, all the jars needed by hive is  in the directory:
> ```Bash
> [hadoop@h107713699 spark_test]$ ls /data0/facai/lib/hive-0.13.1/lib/ |
> grep hive
> hive-ant-0.13.1.jar
> hive-beeline-0.13.1.jar
> hive-cli-0.13.1.jar
> hive-common-0.13.1.jar
> ...
> ```
>
> So, my question is:
> why spark cannot find the jars needed?
>
> Any help will be appreciate, thanks.
>
>


[Spark 2.0] ClassNotFoundException is thrown when using Hive

2016-08-18 Thread Yan Facai
Hi, all.

I copied hdfs-site.xml, core-site.xml and hive-site.xml to
$SPARK_HOME/conf.
spark-submit is used to submit the task to YARN, running in **client** mode.
However, a ClassNotFoundException is thrown.

Some details of the logs are listed below:
```
16/08/12 17:07:32 INFO hive.HiveUtils: Initializing HiveMetastoreConnection
version 0.13.1 using
file:/data0/facai/lib/hive-0.13.1/lib:file:/data0/facai/lib/hadoop-2.4.1/share/hadoop
16/08/12 17:07:32 ERROR yarn.ApplicationMaster: User class threw exception:
java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError:
org/apache/hadoop/hive/ql/session/SessionState when creating Hive client
using classpath: file:/data0/facai/lib/hive-0.13.1/lib,
file:/data0/facai/lib/hadoop-2.4.1/share/hadoop
```

In fact, all the jars needed by Hive are in the directory:
```Bash
[hadoop@h107713699 spark_test]$ ls /data0/facai/lib/hive-0.13.1/lib/ | grep
hive
hive-ant-0.13.1.jar
hive-beeline-0.13.1.jar
hive-cli-0.13.1.jar
hive-common-0.13.1.jar
...
```

So, my question is:
why can't Spark find the jars it needs?

Any help would be appreciated, thanks.
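
Judging from the log above, the Hive metastore classpath is being built from bare directories (file:/data0/facai/lib/hive-0.13.1/lib) rather than from the jars inside them; a URLClassLoader pointed at a directory only picks up loose .class files, not jars. If those paths come from spark.sql.hive.metastore.jars, a sketch of the wildcard form (paths copied from the log; treat the exact values as an assumption to verify):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: point Spark 2.0 at an external Hive 0.13.1 metastore client.
// spark.sql.hive.metastore.jars takes a standard classpath, so a directory
// needs the trailing /* for the jars inside it to be picked up.
val spark = SparkSession.builder()
  .appName("hive-metastore-check")
  .config("spark.sql.hive.metastore.version", "0.13.1")
  .config("spark.sql.hive.metastore.jars",
    "/data0/facai/lib/hive-0.13.1/lib/*:/data0/facai/lib/hadoop-2.4.1/share/hadoop/*")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show databases").show()
```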


Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Thanks Marcelo.
Problem solved.

Best,
Carlo


Hi Marcelo,

Thank you for your help.
Problem solved as you suggested.

Best Regards,
Carlo

> On 5 Aug 2016, at 18:34, Marcelo Vanzin  wrote:
>
> On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca  
> wrote:
>>
>>org.apache.spark
>>spark-core_2.10
>>2.0.0
>>jar
>>
>>
>>org.apache.spark
>>spark-sql_2.10
>>2.0.0
>>jar
>>
>>
>>org.apache.spark
>>spark-mllib_2.10
>>1.3.0
>>jar
>>
>>
>>
>
> One of these is not like the others...
>
> --
> Marcelo

-- The Open University is incorporated by Royal Charter (RC 000391), an exempt 
charity in England & Wales and a charity registered in Scotland (SC 038302). 
The Open University is authorised and regulated by the Financial Conduct 
Authority.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
I have also executed:

mvn dependency:tree |grep log
[INFO] |  | +- com.esotericsoftware:minlog:jar:1.3.0:compile
[INFO] +- log4j:log4j:jar:1.2.17:compile
[INFO] +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
[INFO] |  |  +- commons-logging:commons-logging:jar:1.1.3:compile


and the POM reports the above libraries.

Many Thanks for your help.

Carlo


On 5 Aug 2016, at 18:17, Carlo.Allocca 
> wrote:

Please Sean, could you detail the version mismatch?

Many thanks,
Carlo
On 5 Aug 2016, at 18:11, Sean Owen 
> wrote:

You also seem to have a
version mismatch here.


-- The Open University is incorporated by Royal Charter (RC 000391), an exempt 
charity in England & Wales and a charity registered in Scotland (SC 038302). 
The Open University is authorised and regulated by the Financial Conduct 
Authority.


Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Marcelo Vanzin
On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca  wrote:
> 
> org.apache.spark
> spark-core_2.10
> 2.0.0
> jar
> 
> 
> org.apache.spark
> spark-sql_2.10
> 2.0.0
> jar
> 
> 
> org.apache.spark
> spark-mllib_2.10
> 1.3.0
> jar
> 
>
> 

One of these is not like the others...

-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Please Sean, could you detail the version mismatch?

Many thanks,
Carlo
On 5 Aug 2016, at 18:11, Sean Owen 
> wrote:

You also seem to have a
version mismatch here.

-- The Open University is incorporated by Royal Charter (RC 000391), an exempt 
charity in England & Wales and a charity registered in Scotland (SC 038302). 
The Open University is authorised and regulated by the Financial Conduct 
Authority.


Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Ted Yu
One option is to clone the class in your own project.

Experts may have a better solution.

Cheers
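
Since org.apache.spark.Logging became private[spark] in 2.0, "cloning the class" usually amounts to writing a small SLF4J-backed trait of your own. A minimal sketch of one possible shape (the trait and method names here are mine, not Spark's):

```scala
import org.slf4j.{Logger, LoggerFactory}

// Minimal stand-in for the old org.apache.spark.Logging trait, backed by SLF4J.
trait Logging {
  @transient private lazy val log: Logger =
    LoggerFactory.getLogger(getClass.getName.stripSuffix("$"))

  protected def logInfo(msg: => String): Unit =
    if (log.isInfoEnabled) log.info(msg)

  protected def logWarning(msg: => String): Unit =
    if (log.isWarnEnabled) log.warn(msg)

  protected def logError(msg: => String, t: Throwable = null): Unit =
    if (t == null) log.error(msg) else log.error(msg, t)
}
```

In this particular thread, the version mismatch flagged further down (spark-mllib at 1.3.0 next to core/sql at 2.0.0) is the more direct fix; a trait like the one above only helps when it is your own code that mixed in the old Logging trait.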

On Fri, Aug 5, 2016 at 10:10 AM, Carlo.Allocca 
wrote:

> Hi Ted,
>
> Thanks for the promptly answer.
> It is not yet clear to me what I should do.
>
> How to fix it?
>
> Many thanks,
> Carlo
>
> On 5 Aug 2016, at 17:58, Ted Yu  wrote:
>
> private[spark] trait Logging {
>
>
> -- The Open University is incorporated by Royal Charter (RC 000391), an
> exempt charity in England & Wales and a charity registered in Scotland (SC
> 038302). The Open University is authorised and regulated by the Financial
> Conduct Authority.
>


Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Hi Ted,

Thanks for the promptly answer.
It is not yet clear to me what I should do.

How to fix it?

Many thanks,
Carlo

On 5 Aug 2016, at 17:58, Ted Yu 
> wrote:

private[spark] trait Logging {

-- The Open University is incorporated by Royal Charter (RC 000391), an exempt 
charity in England & Wales and a charity registered in Scotland (SC 038302). 
The Open University is authorised and regulated by the Financial Conduct 
Authority.


Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Ted Yu
In 2.0, Logging became private:

private[spark] trait Logging {

FYI

On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca 
wrote:

> Dear All,
>
> I would like to ask for your help about the following issue: 
> java.lang.ClassNotFoundException:
> org.apache.spark.Logging
>
> I checked and the class Logging is not present.
> Moreover, the line of code where the exception is thrown
>
> final org.apache.spark.mllib.regression.LinearRegressionModel lrModel
> = LinearRegressionWithSGD.train(a, numIterations,
> stepSize);
>
>
> My POM is as reported below.
>
>
> What am I doing wrong or missing? How I can fix it?
>
> Many Thanks in advice for your support.
>
> Best,
> Carlo
>
>
>
>  POM
>
> 
>
> 
> org.apache.spark
> spark-core_2.10
> 2.0.0
> jar
> 
>
>
> 
> org.apache.spark
> spark-sql_2.10
> 2.0.0
> jar
> 
>
> 
> log4j
> log4j
> 1.2.17
> test
> 
>
>
> 
> org.slf4j
> slf4j-log4j12
> 1.7.16
> test
> 
>
>
> 
> org.apache.hadoop
> hadoop-client
> 2.7.2
> 
>
> 
> junit
> junit
> 4.12
> 
>
> 
> org.hamcrest
> hamcrest-core
> 1.3
> 
> 
> org.apache.spark
> spark-mllib_2.10
> 1.3.0
> jar
> 
>
> 
>
> -- The Open University is incorporated by Royal Charter (RC 000391), an
> exempt charity in England & Wales and a charity registered in Scotland (SC
> 038302). The Open University is authorised and regulated by the Financial
> Conduct Authority.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Dear All,

I would like to ask for your help about the following issue: 
java.lang.ClassNotFoundException: org.apache.spark.Logging

I checked and the class Logging is not present.
Moreover, this is the line of code where the exception is thrown:

final org.apache.spark.mllib.regression.LinearRegressionModel lrModel
= LinearRegressionWithSGD.train(a, numIterations, stepSize);


My POM is as reported below.


What am I doing wrong or missing? How can I fix it?

Many thanks in advance for your support.

Best,
Carlo



 POM




<dependencies>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>2.0.0</version>
        <type>jar</type>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>2.0.0</version>
        <type>jar</type>
    </dependency>

    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.16</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>

    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>

    <dependency>
        <groupId>org.hamcrest</groupId>
        <artifactId>hamcrest-core</artifactId>
        <version>1.3</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.10</artifactId>
        <version>1.3.0</version>
        <type>jar</type>
    </dependency>

</dependencies>

-- The Open University is incorporated by Royal Charter (RC 000391), an exempt 
charity in England & Wales and a charity registered in Scotland (SC 038302). 
The Open University is authorised and regulated by the Financial Conduct 
Authority.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: ClassNotFoundException: org.apache.parquet.hadoop.ParquetOutputCommitter

2016-07-07 Thread Bryan Cutler
Can you try running the example like this

./bin/run-example sql.RDDRelation 

I know there are some jars in the example folders, and running them this
way adds them to the classpath
On Jul 7, 2016 3:47 AM, "kevin"  wrote:

> hi,all:
> I build spark use:
>
> ./make-distribution.sh --name "hadoop2.7.1" --tgz
> "-Pyarn,hadoop-2.6,parquet-provided,hive,hive-thriftserver" -DskipTests
> -Dhadoop.version=2.7.1
>
> I can run example :
> ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
> --master spark://master1:7077 \
> --driver-memory 1g \
> --executor-memory 512m \
> --executor-cores 1 \
> lib/spark-examples*.jar \
> 10
>
> but can't run example :
> org.apache.spark.examples.sql.RDDRelation
>
> *I got error:*
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/2 is now RUNNING
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/4 is now RUNNING
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/3 is now RUNNING
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/0 is now RUNNING
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/1 is now RUNNING
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/5 is now RUNNING
> 16/07/07 18:28:46 INFO cluster.SparkDeploySchedulerBackend:
> SchedulerBackend is ready for scheduling beginning after reached
> minRegisteredResourcesRatio: 0.0
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/parquet/hadoop/ParquetOutputCommitter
> at org.apache.spark.sql.SQLConf$.(SQLConf.scala:319)
> at org.apache.spark.sql.SQLConf$.(SQLConf.scala)
> at org.apache.spark.sql.SQLContext.(SQLContext.scala:85)
> at org.apache.spark.sql.SQLContext.(SQLContext.scala:77)
> at main.RDDRelation$.main(RDDRelation.scala:13)
> at main.RDDRelation.main(RDDRelation.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.parquet.hadoop.ParquetOutputCommitter
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 15 more
>
>


ClassNotFoundException: org.apache.parquet.hadoop.ParquetOutputCommitter

2016-07-07 Thread kevin
Hi all,
I built Spark using:

./make-distribution.sh --name "hadoop2.7.1" --tgz
"-Pyarn,hadoop-2.6,parquet-provided,hive,hive-thriftserver" -DskipTests
-Dhadoop.version=2.7.1

I can run this example:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master spark://master1:7077 \
--driver-memory 1g \
--executor-memory 512m \
--executor-cores 1 \
lib/spark-examples*.jar \
10

but I can't run this example:
org.apache.spark.examples.sql.RDDRelation

I got this error:
16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
app-20160707182845-0003/2 is now RUNNING
16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
app-20160707182845-0003/4 is now RUNNING
16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
app-20160707182845-0003/3 is now RUNNING
16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
app-20160707182845-0003/0 is now RUNNING
16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
app-20160707182845-0003/1 is now RUNNING
16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
app-20160707182845-0003/5 is now RUNNING
16/07/07 18:28:46 INFO cluster.SparkDeploySchedulerBackend:
SchedulerBackend is ready for scheduling beginning after reached
minRegisteredResourcesRatio: 0.0
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/parquet/hadoop/ParquetOutputCommitter
at org.apache.spark.sql.SQLConf$.<init>(SQLConf.scala:319)
at org.apache.spark.sql.SQLConf$.<clinit>(SQLConf.scala)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:85)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:77)
at main.RDDRelation$.main(RDDRelation.scala:13)
at main.RDDRelation.main(RDDRelation.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException:
org.apache.parquet.hadoop.ParquetOutputCommitter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 15 more


Re: Custom Log4j layout on YARN = ClassNotFoundException

2016-04-22 Thread andrew.rowson
Apologies, Outlook for Mac is ridiculous. A copy and paste of the original is below:

-

I’m running into a strange issue with trying to use a custom Log4j layout for 
Spark (1.6.1) on YARN (CDH). The layout is: 
https://github.com/michaeltandy/log4j-json

If I use a log4j.properties file (supplied with --files) with:

log4j.appender.consolejson=org.apache.log4j.ConsoleAppender
log4j.appender.consolejson.target=System.err
log4j.appender.consolejson.layout=uk.me.mjt.log4jjson.SimpleJsonLayout


And supply the log4j-json.1.0.jar with ‘--jars’ to spark-submit, the driver and 
executors throw an exception right at the start of the log file:

log4j:ERROR Could not instantiate class [uk.me.mjt.log4jjson.SimpleJsonLayout].
java.lang.ClassNotFoundException: uk.me.mjt.log4jjson.SimpleJsonLayout

However, a simple spark job that does something like:

sc.parallelize(List(1,2,3)).foreach(i => 
{Class.forName("uk.me.mjt.log4jjson.SimpleJsonLayout")})

Doesn’t throw an error. So the class is being loaded, but just not in time for 
Log4j to use it.

I've tried a few different options trying to get it to work (including it in 
the YARN application classpath, spark executor classpaths etc) and they all 
produce the same results. The only thing that seems to work is building a 
custom spark-assembly with the maven dependency included in core/pom.xml. This 
way, the layout is included in the spark assembly jar, and I get the JSON log 
output desired.

Is there a classloading issue on Log4j when using --jars? I can't imagine why 
it works with bundling in spark-assembly, but doesn't work with --jars.


From:  Ted Yu <yuzhih...@gmail.com>
Date:  Friday, 22 April 2016 at 14:55
To:  Andrew Rowson <andrew.row...@thomsonreuters.com>
Cc:  "user@spark.apache.org" <user@spark.apache.org>
Subject:  Re: Custom Log4j layout on YARN = ClassNotFoundException

There is not much in the body of email. 

Can you elaborate what issue you encountered ?

Thanks

On Fri, Apr 22, 2016 at 2:27 AM, Rowson, Andrew G. (TR Technology & Ops) 
<andrew.row...@thomsonreuters.com> wrote:



This e-mail is for the sole use of the intended recipient and contains 
information that may be privileged and/or confidential. If you are not an 
intended recipient, please notify the sender by return e-mail and delete this 
e-mail and any attachments. Certain required legal entity disclosures can be 
accessed on our website.<http://site.thomsonreuters.com/site/disclosures/>


-- Forwarded message --
From: "Rowson, Andrew G. (TR Technology & Ops)" 
<andrew.row...@thomsonreuters.com>
To: "user@spark.apache.org" <user@spark.apache.org>
Cc: 
Date: Fri, 22 Apr 2016 10:27:53 +0100
Subject: Custom Log4j layout on YARN = ClassNotFoundException


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org






Re: Custom Log4j layout on YARN = ClassNotFoundException

2016-04-22 Thread Ted Yu
There is not much in the body of email.

Can you elaborate what issue you encountered ?

Thanks

On Fri, Apr 22, 2016 at 2:27 AM, Rowson, Andrew G. (TR Technology & Ops) <
andrew.row...@thomsonreuters.com> wrote:

>
> 
>
> This e-mail is for the sole use of the intended recipient and contains
> information that may be privileged and/or confidential. If you are not an
> intended recipient, please notify the sender by return e-mail and delete
> this e-mail and any attachments. Certain required legal entity disclosures
> can be accessed on our website.<
> http://site.thomsonreuters.com/site/disclosures/>
>
>
> -- Forwarded message --
> From: "Rowson, Andrew G. (TR Technology & Ops)" <
> andrew.row...@thomsonreuters.com>
> To: "user@spark.apache.org" <user@spark.apache.org>
> Cc:
> Date: Fri, 22 Apr 2016 10:27:53 +0100
> Subject: Custom Log4j layout on YARN = ClassNotFoundException
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>


Custom Log4j layout on YARN = ClassNotFoundException

2016-04-22 Thread Rowson, Andrew G. (TR Technology & Ops)



This e-mail is for the sole use of the intended recipient and contains 
information that may be privileged and/or confidential. If you are not an 
intended recipient, please notify the sender by return e-mail and delete this 
e-mail and any attachments. Certain required legal entity disclosures can be 
accessed on our website.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: ClassNotFoundException in RDD.map

2016-03-23 Thread Dirceu Semighini Filho
Thanks Jacob,
I've looked into the source code and found that I was missing this property:
spark.repl.class.uri

Setting it solved the problem.

Cheers
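
For anyone hitting the same thing when embedding the REPL: in Spark 1.x the executors fetch interpreter-generated classes (the $iwC... wrappers seen in the stack trace) from a class server advertised through spark.repl.class.uri, so that property has to be on the SparkConf before the SparkContext is created. A rough sketch, where classServerUri is a placeholder for whatever URI your embedded interpreter serves its compiled classes from (not a real API call):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: tell the executors where to fetch REPL-compiled classes from.
// `classServerUri` is a placeholder supplied by your embedding code.
def buildContext(classServerUri: String): SparkContext = {
  val conf = new SparkConf()
    .setAppName("embedded-repl")
    .setMaster("yarn-client")
    .set("spark.repl.class.uri", classServerUri)
  new SparkContext(conf)
}
```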

2016-03-17 18:14 GMT-03:00 Jakob Odersky <ja...@odersky.com>:

> The error is very strange indeed, however without code that reproduces
> it, we can't really provide much help beyond speculation.
>
> One thing that stood out to me immediately is that you say you have an
> RDD of Any where every Any should be a BigDecimal, so why not specify
> that type information?
> When using Any, a whole class of errors, that normally the typechecker
> could catch, can slip through.
>
> On Thu, Mar 17, 2016 at 10:25 AM, Dirceu Semighini Filho
> <dirceu.semigh...@gmail.com> wrote:
> > Hi Ted, thanks for answering.
> > The map is just that, whenever I try inside the map it throws this
> > ClassNotFoundException, even if I do map(f => f) it throws the exception.
> > What is bothering me is that when I do a take or a first it returns the
> > result, which make me conclude that the previous code isn't wrong.
> >
> > Kind Regards,
> > Dirceu
> >
> >
> > 2016-03-17 12:50 GMT-03:00 Ted Yu <yuzhih...@gmail.com>:
> >>
> >> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
> >>
> >> Do you mind showing more of your code involving the map() ?
> >>
> >> On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho
> >> <dirceu.semigh...@gmail.com> wrote:
> >>>
> >>> Hello,
> >>> I found a strange behavior after executing a prediction with MLIB.
> >>> My code return an RDD[(Any,Double)] where Any is the id of my dataset,
> >>> which is BigDecimal, and Double is the prediction for that line.
> >>> When I run
> >>> myRdd.take(10) it returns ok
> >>> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
> >>> Array((1921821857196754403.00,0.1690292052496703),
> >>> (454575632374427.00,0.16902820241892452),
> >>> (989198096568001939.00,0.16903432789699502),
> >>> (14284129652106187990.00,0.16903517653451386),
> >>> (17980228074225252497.00,0.16903151028332508),
> >>> (3861345958263692781.00,0.16903056986183976),
> >>> (17558198701997383205.00,0.1690295450319745),
> >>> (10651576092054552310.00,0.1690286445174418),
> >>> (4534494349035056215.00,0.16903303401862327),
> >>> (5551671513234217935.00,0.16902303368995966))
> >>> But when I try to run some map on it:
> >>> myRdd.map(_._1).take(10)
> >>> It throws a ClassCastException:
> >>> org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0
> >>> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in
> stage
> >>> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
> >>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
> >>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> >>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> >>> at java.security.AccessController.doPrivileged(Native Method)
> >>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> >>> at java.lang.Class.forName0(Native Method)
> >>> at java.lang.Class.forName(Class.java:278)
> >>> at
> >>>
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> >>> at
> >>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> >>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> >>> at
> >>>
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> >>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >>> at
> >>>
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> >>> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> >>> at
> >>>
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >>> at
> >>>
> java.io.ObjectInputStream.defau

ClassNotFoundException in RDD.map

2016-03-20 Thread Dirceu Semighini Filho
Hello,
I found a strange behavior after executing a prediction with MLlib.
My code returns an RDD[(Any, Double)], where Any is the id of my dataset
(which is a BigDecimal) and Double is the prediction for that line.
When I run
myRdd.take(10) it returns OK:
res16: Array[_ >: (Double, Double) <: (Any, Double)] =
Array((1921821857196754403.00,0.1690292052496703),
(454575632374427.00,0.16902820241892452),
(989198096568001939.00,0.16903432789699502),
(14284129652106187990.00,0.16903517653451386),
(17980228074225252497.00,0.16903151028332508),
(3861345958263692781.00,0.16903056986183976),
(17558198701997383205.00,0.1690295450319745),
(10651576092054552310.00,0.1690286445174418),
(4534494349035056215.00,0.16903303401862327),
(5551671513234217935.00,0.16902303368995966))
But when I try to run a map on it:
myRdd.map(_._1).take(10)
it throws a ClassNotFoundException (wrapped in a SparkException):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 

Re: ClassNotFoundException in RDD.map

2016-03-20 Thread Jakob Odersky
The error is very strange indeed; however, without code that reproduces
it, we can't really provide much help beyond speculation.

One thing that stood out to me immediately is that you say you have an
RDD of Any where every Any should be a BigDecimal, so why not specify
that type information?
When using Any, a whole class of errors that the typechecker could
normally catch can slip through.
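
As a concrete version of that suggestion, carrying the id as BigDecimal in the type instead of Any lets the compiler check the closure passed to map. A small sketch (names invented for illustration):

```scala
import org.apache.spark.rdd.RDD

// Keep the id's real type in the RDD rather than widening it to Any; the
// function given to map is then checked against (BigDecimal, Double).
def firstIds(predictions: RDD[(BigDecimal, Double)]): Array[BigDecimal] =
  predictions.map(_._1).take(10)
```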

On Thu, Mar 17, 2016 at 10:25 AM, Dirceu Semighini Filho
<dirceu.semigh...@gmail.com> wrote:
> Hi Ted, thanks for answering.
> The map is just that, whenever I try inside the map it throws this
> ClassNotFoundException, even if I do map(f => f) it throws the exception.
> What is bothering me is that when I do a take or a first it returns the
> result, which make me conclude that the previous code isn't wrong.
>
> Kind Regards,
> Dirceu
>
>
> 2016-03-17 12:50 GMT-03:00 Ted Yu <yuzhih...@gmail.com>:
>>
>> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>
>> Do you mind showing more of your code involving the map() ?
>>
>> On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho
>> <dirceu.semigh...@gmail.com> wrote:
>>>
>>> Hello,
>>> I found a strange behavior after executing a prediction with MLIB.
>>> My code return an RDD[(Any,Double)] where Any is the id of my dataset,
>>> which is BigDecimal, and Double is the prediction for that line.
>>> When I run
>>> myRdd.take(10) it returns ok
>>> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
>>> Array((1921821857196754403.00,0.1690292052496703),
>>> (454575632374427.00,0.16902820241892452),
>>> (989198096568001939.00,0.16903432789699502),
>>> (14284129652106187990.00,0.16903517653451386),
>>> (17980228074225252497.00,0.16903151028332508),
>>> (3861345958263692781.00,0.16903056986183976),
>>> (17558198701997383205.00,0.1690295450319745),
>>> (10651576092054552310.00,0.1690286445174418),
>>> (4534494349035056215.00,0.16903303401862327),
>>> (5551671513234217935.00,0.16902303368995966))
>>> But when I try to run some map on it:
>>> myRdd.map(_._1).take(10)
>>> It throws a ClassCastException:
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>>> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>>> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:278)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at
>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java

Re: ClassNotFoundException in RDD.map

2016-03-19 Thread Dirceu Semighini Filho
Hi Ted, thanks for answering.
The map is just that; whatever I try inside the map throws this
ClassNotFoundException - even if I do map(f => f) it throws the exception.
What is bothering me is that when I do a take or a first it returns the
result, which makes me conclude that the previous code isn't wrong.

Kind Regards,
Dirceu

2016-03-17 12:50 GMT-03:00 Ted Yu <yuzhih...@gmail.com>:

> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>
> Do you mind showing more of your code involving the map() ?
>
> On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho <
> dirceu.semigh...@gmail.com> wrote:
>
>> Hello,
>> I found a strange behavior after executing a prediction with MLIB.
>> My code return an RDD[(Any,Double)] where Any is the id of my dataset,
>> which is BigDecimal, and Double is the prediction for that line.
>> When I run
>> myRdd.take(10) it returns ok
>> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
>> Array((1921821857196754403.00,0.1690292052496703),
>> (454575632374427.00,0.16902820241892452),
>> (989198096568001939.00,0.16903432789699502),
>> (14284129652106187990.00,0.16903517653451386),
>> (17980228074225252497.00,0.16903151028332508),
>> (3861345958263692781.00,0.16903056986183976),
>> (17558198701997383205.00,0.1690295450319745),
>> (10651576092054552310.00,0.1690286445174418),
>> (4534494349035056215.00,0.16903303401862327),
>> (5551671513234217935.00,0.16902303368995966))
>> But when I try to run some map on it:
>> myRdd.map(_._1).take(10)
>> It throws a ClassCastException:
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:278)
>> at
>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> at
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>> at
>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java

Re: ClassNotFoundException in RDD.map

2016-03-19 Thread Ted Yu
bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1

Do you mind showing more of your code involving the map() ?

On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho <
dirceu.semigh...@gmail.com> wrote:

> Hello,
> I found a strange behavior after executing a prediction with MLIB.
> My code return an RDD[(Any,Double)] where Any is the id of my dataset,
> which is BigDecimal, and Double is the prediction for that line.
> When I run
> myRdd.take(10) it returns ok
> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
> Array((1921821857196754403.00,0.1690292052496703),
> (454575632374427.00,0.16902820241892452),
> (989198096568001939.00,0.16903432789699502),
> (14284129652106187990.00,0.16903517653451386),
> (17980228074225252497.00,0.16903151028332508),
> (3861345958263692781.00,0.16903056986183976),
> (17558198701997383205.00,0.1690295450319745),
> (10651576092054552310.00,0.1690286445174418),
> (4534494349035056215.00,0.16903303401862327),
> (5551671513234217935.00,0.16902303368995966))
> But when I try to run some map on it:
> myRdd.map(_._1).take(10)
> It throws a ClassCastException:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:278)
> at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
> at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at scala.Option.foreach(Option.scala:236)
> at
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
> at
> 

spark-submit with cluster deploy mode fails with ClassNotFoundException (jars are not passed around properley?)

2016-03-11 Thread Hiroyuki Yamada
Hi,

I am trying to use spark-submit with cluster deploy mode on a single node,
but I keep getting a ClassNotFoundException, as shown below
(in this case, snakeyaml.jar is not found by the Spark cluster).

===

16/03/12 14:19:12 INFO Remoting: Starting remoting
16/03/12 14:19:12 INFO Remoting: Remoting started; listening on
addresses :[akka.tcp://Driver@192.168.1.2:52993]
16/03/12 14:19:12 INFO util.Utils: Successfully started service
'Driver' on port 52993.
16/03/12 14:19:12 INFO worker.WorkerWatcher: Connecting to worker
akka.tcp://sparkWorker@192.168.1.2:52985/user/Worker
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at 
org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoClassDefFoundError: org/yaml/snakeyaml/Yaml
at 
com.analytics.config.YamlConfigLoader.loadConfig(YamlConfigLoader.java:30)
at 
com.analytics.api.DeclarativeAnalyticsFactory.create(DeclarativeAnalyticsFactory.java:21)
at com.analytics.program.QueryExecutor.main(QueryExecutor.java:12)
... 6 more
Caused by: java.lang.ClassNotFoundException: org.yaml.snakeyaml.Yaml
at java.lang.ClassLoader.findClass(ClassLoader.java:530)
at 
org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at 
org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
at 
org.apache.spark.util.ChildFirstURLClassLoader.liftedTree1$1(MutableURLClassLoader.scala:75)
at 
org.apache.spark.util.ChildFirstURLClassLoader.loadClass(MutableURLClassLoader.scala:71)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more
16/03/12 14:19:12 INFO util.Utils: Shutdown hook called



I can submit a job successfully in client mode, but I can't in cluster mode,
so it seems to be a matter of the jars (snakeyaml) not being passed to the
cluster properly.

The actual command I tried is:

$ spark-submit --master spark://192.168.1.2:6066 --deploy-mode cluster
--jars all-the-jars(with comma separated) --class
com.analytics.program.QueryExecutor analytics.jar
(of course, snakeyaml.jar is specified after --jars)

I tried spark.executor.extraClassPath and spark.driver.extraClassPath in
spark-defaults.conf to specify snakeyaml.jar,
but neither of those worked.


I also found a couple of similar issues posted on the mailing list and other
sites, but they either were not answered properly or the suggestions didn't
work for me.

<
https://mail-archives.apache.org/mod_mbox/spark-user/201505.mbox/%3CCAGSyEuApEkfO_2-iiiuyS2eeg=w_jkf83vcceguns4douod...@mail.gmail.com%3E
>
<
http://stackoverflow.com/questions/34272426/how-to-give-dependent-jars-to-spark-submit-in-cluster-mode
>
<
https://support.datastax.com/hc/en-us/articles/207442243-Spark-submit-fails-with-class-not-found-when-deploying-in-cluster-mode
>


Could anyone give me a hand?

Best regards,
Hiro


Re: Building Spark with a Custom Version of Hadoop: HDFS ClassNotFoundException

2016-02-11 Thread Ted Yu
I think SPARK_CLASSPATH is deprecated.

Can you show the command line launching your Spark job ?
Which Spark release do you use ?

Thanks



On Thu, Feb 11, 2016 at 5:38 PM, Charlie Wright <charliewri...@live.ca>
wrote:

> built and installed hadoop with:
> mvn package -Pdist -DskipTests -Dtar
> mvn install -DskipTests
>
> built spark with:
> mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.8.0-SNAPSHOT -DskipTests clean
> package
>
> Where would I check the classpath? Is it the environment variable
> SPARK_CLASSPATH?
>
> Charles
>
> --
> Date: Thu, 11 Feb 2016 17:29:00 -0800
> Subject: Re: Building Spark with a Custom Version of Hadoop: HDFS
> ClassNotFoundException
> From: yuzhih...@gmail.com
> To: charliewri...@live.ca
> CC: d...@spark.apache.org
>
> Hdfs class is in hadoop-hdfs-XX.jar
>
> Can you check the classpath to see if the above jar is there ?
>
> Please describe the command lines you used for building hadoop / Spark.
>
> Cheers
>
> On Thu, Feb 11, 2016 at 5:15 PM, Charlie Wright <charliewri...@live.ca>
> wrote:
>
> I am having issues trying to run a test job on a built version of Spark
> with a custom Hadoop JAR.
> My custom hadoop version runs without issues and I can run jobs from a
> precompiled version of Spark (with Hadoop) no problem.
>
> However, whenever I try to run the same Spark example on the Spark version
> with my custom hadoop JAR - I get this error:
> "Exception in thread "main" java.lang.RuntimeException:
> java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.Hdfs not found"
>
> Does anybody know why this is happening?
>
> Thanks,
> Charles.
>
>
>


Re: Building Spark with a Custom Version of Hadoop: HDFS ClassNotFoundException

2016-02-11 Thread Ted Yu
The Spark driver does not run on the YARN cluster in client mode, only the
Spark executors do.
Can you check YARN logs for the failed job to see if there was more clue ?

Does the YARN cluster run the customized hadoop or stock hadoop ?

Cheers

On Thu, Feb 11, 2016 at 5:44 PM, Charlie Wright <charliewri...@live.ca>
wrote:

> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
> yarn --deploy-mode client --driver-memory 4g --executor-memory
> 1664m --executor-cores 1 --queue default
> examples/target/spark-examples*.jar 10
>
> I am using the 1.6.0 release.
>
>
> Charles.
>
> --
> Date: Thu, 11 Feb 2016 17:41:54 -0800
> Subject: Re: Building Spark with a Custom Version of Hadoop: HDFS
> ClassNotFoundException
> From: yuzhih...@gmail.com
> To: charliewri...@live.ca; user@spark.apache.org
>
>
> I think SPARK_CLASSPATH is deprecated.
>
> Can you show the command line launching your Spark job ?
> Which Spark release do you use ?
>
> Thanks
>
>
>
> On Thu, Feb 11, 2016 at 5:38 PM, Charlie Wright <charliewri...@live.ca>
> wrote:
>
> built and installed hadoop with:
> mvn package -Pdist -DskipTests -Dtar
> mvn install -DskipTests
>
> built spark with:
> mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.8.0-SNAPSHOT -DskipTests clean
> package
>
> Where would I check the classpath? Is it the environment variable
> SPARK_CLASSPATH?
>
> Charles
>
> ------
> Date: Thu, 11 Feb 2016 17:29:00 -0800
> Subject: Re: Building Spark with a Custom Version of Hadoop: HDFS
> ClassNotFoundException
> From: yuzhih...@gmail.com
> To: charliewri...@live.ca
> CC: d...@spark.apache.org
>
> Hdfs class is in hadoop-hdfs-XX.jar
>
> Can you check the classpath to see if the above jar is there ?
>
> Please describe the command lines you used for building hadoop / Spark.
>
> Cheers
>
> On Thu, Feb 11, 2016 at 5:15 PM, Charlie Wright <charliewri...@live.ca>
> wrote:
>
> I am having issues trying to run a test job on a built version of Spark
> with a custom Hadoop JAR.
> My custom hadoop version runs without issues and I can run jobs from a
> precompiled version of Spark (with Hadoop) no problem.
>
> However, whenever I try to run the same Spark example on the Spark version
> with my custom hadoop JAR - I get this error:
> "Exception in thread "main" java.lang.RuntimeException:
> java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.Hdfs not found"
>
> Does anybody know why this is happening?
>
> Thanks,
> Charles.
>
>
>
>


ClassNotFoundException interpreting a Spark job

2016-01-16 Thread milad bourhani
Hi everyone,

I’m trying to use the Scala interpreter, IMain, to interpret some Scala code 
that executes a job with Spark:

@Test
public void countToFive() throws ScriptException {
    SparkConf conf = new SparkConf().setAppName("Spark interpreter").setMaster("local[2]");
    SparkContext sc = new SparkContext(conf);
    Settings settings = new Settings();
    ((MutableSettings.BooleanSetting) settings.usejavacp()).value_$eq(true);
    IMain interpreter = new IMain(settings);
    interpreter.setContextClassLoader();
    interpreter.put("sc: org.apache.spark.SparkContext", sc);
    assertEquals(5L, interpreter.eval("sc.parallelize(List(1,2,3,4,5)).map( _ + 1 ).count()"));
}

However the following error shows up:

java.lang.ClassNotFoundException: $line5.$read$$iw$$iw$$anonfun$1

If the SparkContext object is created after this line:
interpreter.setContextClassLoader();
then the execution succeeds. The fact is that I’d like to create the context 
once, and then create interpreters from multiple threads on demand later on. 
This also relates to the fact that there can only be one SparkContext object
in the JVM — https://issues.apache.org/jira/browse/SPARK-2243.

It looks like the SparkContext cannot serialize the anonymous function ("_ + 1").
I've fiddled a lot with this and cannot seem to get through it. Can anybody help?

Thank you in advance,
Milad
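
Based on the observation above (things work when setContextClassLoader() runs before the SparkContext is constructed), here is the same ordering as a Scala sketch; it is purely illustrative and only reorders the calls from the test above:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

// Sketch: install the interpreter's class loader as the context class loader
// *before* creating the (single, shared) SparkContext.
val settings = new Settings
settings.usejavacp.value = true

val interpreter = new IMain(settings)
interpreter.setContextClassLoader()

val conf = new SparkConf().setAppName("Spark interpreter").setMaster("local[2]")
val sc = new SparkContext(conf) // created only after the class loader swap

interpreter.put("sc: org.apache.spark.SparkContext", sc)
val count = interpreter.eval("sc.parallelize(List(1,2,3,4,5)).map(_ + 1).count()")
```

Whether the same ordering still helps when interpreters are created later from other threads is something to verify; the shared SparkContext itself can only be created once per JVM, as the linked JIRA notes.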

Re: 1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2016-01-14 Thread Egor Pahomov
My fault, I should have read the documentation more carefully -
http://spark.apache.org/docs/latest/sql-programming-guide.html says precisely
that I need to add these 3 jars to the classpath if I need them. We cannot
include them in the fat jar, because they are OSGi bundles and require a
plugin.xml and META-INF/MANIFEST.MF in the root of the jar. The problem is that
there are 3 of them and each one has its own plugin.xml. You could include all
of them in the fat jar if you were able to merge the plugin.xml files, but
currently there is no tool to do so: the maven-assembly-plugin simply has no
such merger, and the maven-shade-plugin has an XmlAppendingTransformer, but for
some reason it doesn't work. So you just have to live with the fact that you
have a fat jar with all dependencies except these 3. The good news is that in
yarn-client mode you only need to add them to the classpath of your driver; you
do not have to call addJar(). That is really good news, since it's hard to do
addJar() properly in an Oozie job.
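
For reference, a quick fail-fast check that can be run at driver startup,
before the first metastore call (a sketch; the class names are taken from the
stack traces in this thread, and the rdbms jar can be checked the same way):

Seq(
  "org.datanucleus.api.jdo.JDOPersistenceManagerFactory",  // datanucleus-api-jdo
  "org.datanucleus.AbstractNucleusContext"                 // datanucleus-core
).foreach { name =>
  Class.forName(name)  // throws ClassNotFoundException naming the missing class
}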

2016-01-12 17:01 GMT-08:00 Egor Pahomov :

> Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing
> serious issue. I successfully updated spark thrift server from 1.5.2 to
> 1.6.0. But I have standalone application, which worked fine with 1.5.2 but
> failing on 1.6.0 with:
>
> *NestedThrowables:*
> *java.lang.ClassNotFoundException:
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory*
> * at
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)*
> * at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)*
> * at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)*
>
> Inside this application I work with hive table, which have data in json
> format.
>
> When I add
>
> 
> org.datanucleus
> datanucleus-core
> 4.0.0-release
> 
>
> 
> org.datanucleus
> datanucleus-api-jdo
> 4.0.0-release
> 
>
> 
> org.datanucleus
> datanucleus-rdbms
> 3.2.9
> 
>
> I'm getting:
>
> *Caused by: org.datanucleus.exceptions.NucleusUserException: Persistence
> process has been specified to use a ClassLoaderResolver of name
> "datanucleus" yet this has not been found by the DataNucleus plugin
> mechanism. Please check your CLASSPATH and plugin specification.*
> * at
> org.datanucleus.AbstractNucleusContext.(AbstractNucleusContext.java:102)*
> * at
> org.datanucleus.PersistenceNucleusContextImpl.(PersistenceNucleusContextImpl.java:162)*
>
> I have CDH 5.5. I build spark with
>
> *./make-distribution.sh -Pyarn -Phadoop-2.6
> -Dhadoop.version=2.6.0-cdh5.5.0 -Phive -DskipTests*
>
> Than I publish fat jar locally:
>
> *mvn org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file
> -Dfile=./spark-assembly.jar -DgroupId=org.spark-project
> -DartifactId=my-spark-assembly -Dversion=1.6.0-SNAPSHOT -Dpackaging=jar*
>
> Than I include dependency on this fat jar:
>
> 
> org.spark-project
> my-spark-assembly
> 1.6.0-SNAPSHOT
> 
>
> Than I build my application with assembly plugin:
>
> 
> org.apache.maven.plugins
> maven-shade-plugin
> 
> 
> 
> *:*
> 
> 
> 
> 
> *:*
> 
> META-INF/*.SF
> META-INF/*.DSA
> META-INF/*.RSA
> 
> 
> 
> 
> 
> 
> package
> 
> shade
> 
> 
> 
>  
> implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
>  
> implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
> 
> META-INF/services/org.apache.hadoop.fs.FileSystem
> 
>  
> implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
> reference.conf
> 
>  
> implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
> log4j.properties
> 
>  
> implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer"/>
>  
> implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer"/>
> 
> 
> 
> 
> 
>
> Configuration of assembly plugin is copy-past from spark assembly pom.
>
> This workflow worked for 1.5.2 and broke for 1.6.0. If I have not good 
> approach of creating this standalone application, please recommend other 
> approach, but spark-submit does not work for me - it hard for me to connect 
> it to Oozie.
>
> Any suggestion would be 

1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2016-01-12 Thread Egor Pahomov
Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a
serious issue. I successfully updated the Spark Thrift server from 1.5.2 to
1.6.0. But I have a standalone application which worked fine with 1.5.2 and is
failing on 1.6.0 with:

*NestedThrowables:*
*java.lang.ClassNotFoundException:
org.datanucleus.api.jdo.JDOPersistenceManagerFactory*
* at
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)*
* at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)*
* at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)*

Inside this application I work with a Hive table, which has data in JSON
format.

When I add


<dependency>
    <groupId>org.datanucleus</groupId>
    <artifactId>datanucleus-core</artifactId>
    <version>4.0.0-release</version>
</dependency>

<dependency>
    <groupId>org.datanucleus</groupId>
    <artifactId>datanucleus-api-jdo</artifactId>
    <version>4.0.0-release</version>
</dependency>

<dependency>
    <groupId>org.datanucleus</groupId>
    <artifactId>datanucleus-rdbms</artifactId>
    <version>3.2.9</version>
</dependency>

I'm getting:

*Caused by: org.datanucleus.exceptions.NucleusUserException: Persistence
process has been specified to use a ClassLoaderResolver of name
"datanucleus" yet this has not been found by the DataNucleus plugin
mechanism. Please check your CLASSPATH and plugin specification.*
* at
org.datanucleus.AbstractNucleusContext.(AbstractNucleusContext.java:102)*
* at
org.datanucleus.PersistenceNucleusContextImpl.(PersistenceNucleusContextImpl.java:162)*

I have CDH 5.5. I build spark with

*./make-distribution.sh -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.5.0
-Phive -DskipTests*

Then I publish the fat jar locally:

*mvn org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file
-Dfile=./spark-assembly.jar -DgroupId=org.spark-project
-DartifactId=my-spark-assembly -Dversion=1.6.0-SNAPSHOT -Dpackaging=jar*

Then I include a dependency on this fat jar:


<dependency>
    <groupId>org.spark-project</groupId>
    <artifactId>my-spark-assembly</artifactId>
    <version>1.6.0-SNAPSHOT</version>
</dependency>

Then I build my application with the assembly plugin:


<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
        <artifactSet>
            <includes>
                <include>*:*</include>
            </includes>
        </artifactSet>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                </excludes>
            </filter>
        </filters>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
                        <resource>log4j.properties</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer"/>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

The configuration of the assembly plugin is copy-pasted from the Spark assembly pom.

This workflow worked for 1.5.2 and broke for 1.6.0. If my approach to
creating this standalone application is not a good one, please recommend
another approach, but spark-submit does not work for me - it is hard for me
to connect it to Oozie.

Any suggestion would be appreciated - I'm stuck.

My spark config:

lazy val sparkConf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName(appName)
  .set("spark.yarn.queue", "jenkins")
  .set("spark.executor.memory", "10g")
  .set("spark.yarn.executor.memoryOverhead", "2000")
  .set("spark.executor.cores", "3")
  .set("spark.driver.memory", "4g")
  .set("spark.shuffle.io.numConnectionsPerPeer", "5")
  .set("spark.sql.autoBroadcastJoinThreshold", "200483647")
  .set("spark.network.timeout", "1000s")
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=2g")
  .set("spark.driver.maxResultSize", "2g")
  .set("spark.rpc.lookupTimeout", "1000s")
  .set("spark.sql.hive.convertMetastoreParquet", "false")
  .set("spark.kryoserializer.buffer.max", "200m")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.yarn.driver.memoryOverhead", "1000")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "20")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.sql.tungsten.enabled", "false")
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "100s")
  .setJars(List(this.getClass.getProtectionDomain().getCodeSource().getLocation().toURI().getPath()))

-- 



*Sincerely yours, Egor Pakhomov*


Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-29 Thread Prem Spark
You need to make sure this class is accessible on all servers, since it is
cluster mode and the driver can be on any of the worker nodes.


On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa  wrote:

> Hi,
>
> I'm submitting a spark job like this:
>
> ~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master
>> spark://machine1:6066 --deploy-mode cluster --jars
>> target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
>> /home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar 1
>> machine2  1000
>>
>
> and in the driver stderr, I get the following exception:
>
>  WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 74, XXX.XXX.XX.XXX):
>> java.lang.ClassNotFoundException: Benchmark$$anonfun$main$1
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:270)
>> at
>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>> at
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>> at
>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> at
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>> at
>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>> at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>> at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>>
>
> Note that everything works fine when using deploy-mode as 'client'.
> This is the application that I'm trying to run:
> https://github.com/tdas/spark-streaming-benchmark (this problem also
> happens for non streaming applications)
>
> What can I do to sort this out?
>
> Thanks.
>


Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-29 Thread Saiph Kappa
I found out that by commenting out this line in the application code:
sparkConf.set("spark.executor.extraJavaOptions", " -XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:FreqInlineSize=300
-XX:MaxInlineSize=300 ")

the exception does not occur anymore. I'm not entirely sure why, but
everything works fine without that line.

Thanks!

On Tue, Dec 29, 2015 at 1:39 PM, Prem Spark  wrote:

> you need make sure this class is accessible to all servers since its a
> cluster mode and drive can be on any of the worker nodes.
>
>
> On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa 
> wrote:
>
>> Hi,
>>
>> I'm submitting a spark job like this:
>>
>> ~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master
>>> spark://machine1:6066 --deploy-mode cluster --jars
>>> target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
>>> /home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar 1
>>> machine2  1000
>>>
>>
>> and in the driver stderr, I get the following exception:
>>
>>  WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 74,
>>> XXX.XXX.XX.XXX): java.lang.ClassNotFoundException: Benchmark$$anonfun$main$1
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:270)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at
>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>> at
>>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>> at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>> at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>> at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>> at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>> at
>>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>> at
>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>
>> Note that everything works fine when using deploy-mode as 'client'.
>> This is the application that I'm trying to run:
>> https://github.com/tdas/spark-streaming-benchmark (this problem also
>> happens for non streaming applications)
>>
>> What can I do to sort this out?
>>
>> Thanks.
>>
>
>


ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-25 Thread Saiph Kappa
Hi,

I'm submitting a spark job like this:

~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master
> spark://machine1:6066 --deploy-mode cluster --jars
> target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
> /home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar 1
> machine2  1000
>

and in the driver stderr, I get the following exception:

 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 74, XXX.XXX.XX.XXX):
> java.lang.ClassNotFoundException: Benchmark$$anonfun$main$1
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> at
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> at
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
> at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
> at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>

Note that everything works fine when using deploy-mode as 'client'.
This is the application that I'm trying to run:
https://github.com/tdas/spark-streaming-benchmark (this problem also
happens for non streaming applications)

What can I do to sort this out?

Thanks.


jdbc error, ClassNotFoundException: org.apache.hadoop.hive.schshim.FairSchedulerShim

2015-12-03 Thread zhangjp
Hi all,
  I downloaded the prebuilt version 1.5.2 with Hadoop 2.6. When I use spark-sql
there is no problem, but when I start the Thrift server and then try to query a
Hive table using JDBC, I get the following errors.
  
 Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.hive.schshim.FairSchedulerShim
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:195)
at 
org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:146)

ClassNotFoundException with a uber jar.

2015-11-26 Thread Marc de Palol
Hi all, 

I have an uber jar made with Maven; the contents are:

my.org.my.classes.Class
...
lib/lib1.jar // 3rd party libs
lib/lib2.jar 

I'm using this kind of jar for Hadoop applications and everything works fine.

I added the Spark libs, Scala and everything needed by Spark, but when I submit
this jar to Spark I get ClassNotFoundExceptions:

spark-submit --class com.bla.TestJob --driver-memory 512m --master
yarn-client /home/ble/uberjar.jar

Then when the job is running I get this: 
java.lang.NoClassDefFoundError:
com/fasterxml/jackson/datatype/guava/GuavaModule
// usage of jackson's GuavaModule is expected, as the job is using jackson
to read json.


This class is contained in lib/jackson-datatype-guava-2.4.3.jar, which is in
the uber jar.

So I really don't know what I'm missing. I've tried to use --jars and
SparkContext.addJar (adding the uber jar) with no luck.

Is there any problem using uber jars with inner jars inside?

Thanks!









Re: ClassNotFoundException with a uber jar.

2015-11-26 Thread Ali Tajeldin EDU
I'm not 100% sure, but I don't think a jar within a jar will work without a
custom class loader. You can perhaps try to use the "maven-assembly-plugin" or
"maven-shade-plugin" to build your uber/fat jar. Both of these will build a
flattened single jar.
--
Ali

On Nov 26, 2015, at 2:49 AM, Marc de Palol <phleg...@gmail.com> wrote:

> Hi all, 
> 
> I have a uber jar made with maven, the contents are:
> 
> my.org.my.classes.Class
> ...
> lib/lib1.jar // 3rd party libs
> lib/lib2.jar 
> 
> I'm using this kind of jar for hadoop applications and all works fine. 
> 
> I added spark libs, scala and everything needed in spark, but when I submit
> this jar to spark I get ClassNotFoundExceptions: 
> 
> spark-submit --class com.bla.TestJob --driver-memory 512m --master
> yarn-client /home/ble/uberjar.jar
> 
> Then when the job is running I get this: 
> java.lang.NoClassDefFoundError:
> com/fasterxml/jackson/datatype/guava/GuavaModule
> // usage of jackson's GuavaModule is expected, as the job is using jackson
> to read json.
> 
> 
> this class is contained in: 
> lib/jackson-datatype-guava-2.4.3.jar, which is in the uberjar
> 
> So I really don't know what I'm missing. I've tried to use --jars and
> SparkContext.addJar (adding the uberjar) with no luck. 
> 
> Is there any problem using uberjars with inner jars inside ? 
> 
> Thanks!
> 
> 
> 
> 
> 
> 
> 





log4j custom appender ClassNotFoundException with spark 1.5.2

2015-11-25 Thread lev
Hi,

I'm using Spark 1.5.2, running on a YARN cluster, and trying to use a custom
log4j appender.

in my setup there are 3 jars:
the uber jar: spark.yarn.jar=uber-jar.jar
the jar that contains the main class: main.jar
additional jar with dependencies: dep.jar (passed with the --jars flag to
spark submit)

I've tried defining my appender in each one of the jars:
in uber-jar: the appender is found and created successfully
in main.jar or dep.jar: throws ClassNotFoundException 

I guess log4j tries to load the class before the assemblies are loaded.

It's related to this ticket:
https://issues.apache.org/jira/browse/SPARK-9826
but not the same, as the ticket talks about the appender being defined in the
uber-jar, and that case works for me.

Any ideas on how to solve this?

thanks






Re: Re: driver ClassNotFoundException when MySQL JDBC exceptions are thrown on executor

2015-11-19 Thread Zsolt Tóth
>> at $iwC$$iwC.(:34)
>>
>> at $iwC.(:36)
>>
>> at (:38)
>>
>> at .(:42)
>>
>> at .()
>>
>> at .(:7)
>>
>> at .()
>>
>> at $print()
>>
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>
>> at java.lang.reflect.Method.invoke(Method.java:606)
>>
>> at
>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>
>> at
>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
>>
>> at
>> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>
>> at
>> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>
>> at
>> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>
>> at
>> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>
>> at
>> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>
>> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>
>> at
>> org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>
>> at
>> org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>
>> at org.apache.spark.repl.SparkILoop.org
>> $apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>
>> at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>
>> at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>
>> at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>
>> at
>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>
>> at org.apache.spark.repl.SparkILoop.org
>> $apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>
>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>
>> at org.apache.spark.repl.Main$.main(Main.scala:31)
>>
>> at org.apache.spark.repl.Main.main(Main.scala)
>>
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>
>> at java.lang.reflect.Method.invoke(Method.java:606)
>>
>> at
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
>>
>> at
>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>>
>> at
>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>>
>> at
>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>>
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>>
>> ----
>>
>> Thanks!
>> Best regards!
>> San.Luo
>>
>> ----- Original Message -----
>> From: Xiao Li <gatorsm...@gmail.com>
>> To: Akhil Das <ak...@sigmoidanalytics.com>
>> Cc: Hurshal Patel <hpatel...@gmail.com>, "user@spark.apache.org" <
>> user@spark.apache.org>
>> Subject: Re: driver ClassNotFoundException when MySQL JDBC exceptions are
>> thrown on executor
>> Date: 2015-10-22 22:10
>>
>> A few months ago, I used the DB2 jdbc drivers. I hit a couple of issues
>> when using --driver-class-path. At the end, I used the following command to
>> bypass most of issues:
>>
>> ./bin/spark-submit --jars
>> /Users/smile/db2driver/db2jcc.jar,/Users/smile/db2driver/db2jcc_license_cisuz.jar
>> --master local[*] --class com.sparkEngine.
>> /Users/smile/spark-1.3.1-bin-hadoop2.3/projects/SparkApps-master/spark-load-from-db/target/-1.0.jar
>>
>> Hopefully, it works for you.
>>
>> Xiao Li
>>
>>
>> 2015-10-22 4:56 GMT-07:00 Akhil

Re: Re: driver ClassNotFoundException when MySQL JDBC exceptions are thrown on executor

2015-11-19 Thread Jeff Zhang
arkIMain.scala:1065)
>
> at
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
>
> at
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>
> at
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>
> at
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>
> at
> org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>
> at
> org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>
> at org.apache.spark.repl.SparkILoop.org
> $apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>
> at
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>
> at org.apache.spark.repl.SparkILoop.org
> $apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>
> at org.apache.spark.repl.Main$.main(Main.scala:31)
>
> at org.apache.spark.repl.Main.main(Main.scala)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
>
> at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>
> at
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
>
> 
>
> Thanks!
> Best regards!
> San.Luo
>
> ----- Original Message -----
> From: Xiao Li <gatorsm...@gmail.com>
> To: Akhil Das <ak...@sigmoidanalytics.com>
> Cc: Hurshal Patel <hpatel...@gmail.com>, "user@spark.apache.org" <
> user@spark.apache.org>
> Subject: Re: driver ClassNotFoundException when MySQL JDBC exceptions are thrown
> on executor
> Date: 2015-10-22 22:10
>
> A few months ago, I used the DB2 jdbc drivers. I hit a couple of issues
> when using --driver-class-path. At the end, I used the following command to
> bypass most of issues:
>
> ./bin/spark-submit --jars
> /Users/smile/db2driver/db2jcc.jar,/Users/smile/db2driver/db2jcc_license_cisuz.jar
> --master local[*] --class com.sparkEngine.
> /Users/smile/spark-1.3.1-bin-hadoop2.3/projects/SparkApps-master/spark-load-from-db/target/-1.0.jar
>
> Hopefully, it works for you.
>
> Xiao Li
>
>
> 2015-10-22 4:56 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>:
>
> Did you try passing the mysql connector jar through --driver-class-path
>
> Thanks
> Best Regards
>
> On Sat, Oct 17, 2015 at 6:33 AM, Hurshal Patel <hpatel...@gmail.com>
> wrote:
>
> Hi all,
>
> I've been struggling with a particularly puzzling issue after upgrading to
> Spark 1.5.1 from Spark 1.4.1.
>
> When I use the MySQL JDBC connector and an exception (e.g.
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException) is thrown on
> the executor, I get a ClassNotFoundException on the driver, which results
> in this error (logs are abbreviated):
>
> 15/10/16 17:20:59 INFO SparkContext: Starting job: collect at
> repro.scala:73
> ...
> 15/10/16 17:20:59 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
> 15/10/16 17:20:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID
> 3)
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
> at repro.Repro$$anonfun$main$3.apply$mcZI$sp(repro.scala:69)
> ...
> 15/10/16 17:20:59 WARN ThrowableSerializationWrapper: Task exception could
> not be deserialized
> java.lang.ClassNotFoundException:
> com.mysql.jdb

Re: kafka streaminf 1.5.2 - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2015-11-17 Thread Tathagata Das
t; 
> 
>
> 
> org.scalatest
> scalatest-maven-plugin
> 1.0
> 
>
>
> ${project.build.directory}/surefire-reports
> .
> WDF TestSuite.txt
> 
> 
> 
> test
> 
> test
> 
> 
> 
> 
>
> 
> org.scala-tools
> maven-scala-plugin
> 2.15.0
> 
> 
> compile-scala
> compile
> 
> compile
> 
> 
>
> 
> compile-tests-scala
> compile
> 
> testCompile
> 
> 
> 
> 
>
> 
> 
>
> 
> 
> org.slf4j
> slf4j-log4j12
> 1.7.5
> 
> 
> org.slf4j
> slf4j-api
> 1.7.5
> 
> 
> org.springframework
> spring-test
> test
> 
> ...
> ...
> 
> com.databricks
> spark-csv_2.10
> 1.2.0
> 
> 
> org.scala-lang
> scala-compiler
> ${scala.version}
> compile
> 
>
> 
> org.scala-lang
> scala-library
> ${scala.version}
> 
> 
> org.scalatest
> scalatest_${scala.binary.version}
> test
> 
>
> 
> org.apache.kafka
> kafka_${scala.binary.version}
> 0.8.2.1
> compile
> 
> 
> jmxri
> com.sun.jmx
> 
> 
> jms
> javax.jms
> 
> 
> jmxtools
> com.sun.jdmk
> 
> 
> 
>
> 
> org.apache.spark
>
> spark-streaming-kafka_${scala.binary.version}
> ${spark.version}
> 
> 
> jcl-over-slf4j
> org.slf4j
> 
> 
> javax.servlet-api
> javax.servlet
> 
> 
> javax.servlet
> servlet-api
> 
> 
> 
> 
> org.apache.spark
>
> spark-streaming_${scala.binary.version}
> ${spark.version}
> 
> 
> jcl-over-slf4j
> org.slf4j
> 
>     
> 
> 
> org.apache.spark
> spark-core_${scala.binary.version}
> ${spark.version}
> provided
> 
> 
> jcl-over-slf4j
> org.slf4j
> 
> 
> javax.servlet-api
> javax.servlet
> 
> 
> javax.servlet
> servlet-api
> 
> 
> 
>
> 
> org.apache.spark
> spark-sql_${scala.binary.version}
> ${spark.version}
> 
> 
> jcl-over-slf4j
> org.slf4j
> 
> 
> 
> 
> org.mockito
> mockito-all
> test
> 
> 
> junit
> junit
> test
> 
> 
> joda-time
> joda-time
> 
>
> 
>
> 
>
> 
>
>
> org.apache.spark.streaming.kafka.KafkaReceiver is inside the
> spark-streaming-kafka jar file, so I’m not sure why I get the
> ClassNotFoundException.
>
> Please help.
>
> Thanks,
> Tim
>
>
>
>
>
>
>
>
>
>


kafka streaminf 1.5.2 - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2015-11-17 Thread tim_b123
  

<dependency>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest_${scala.binary.version}</artifactId>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_${scala.binary.version}</artifactId>
    <version>0.8.2.1</version>
    <scope>compile</scope>
    <exclusions>
        <exclusion>
            <artifactId>jmxri</artifactId>
            <groupId>com.sun.jmx</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jms</artifactId>
            <groupId>javax.jms</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jmxtools</artifactId>
            <groupId>com.sun.jdmk</groupId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <exclusion>
            <artifactId>jcl-over-slf4j</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
        <exclusion>
            <artifactId>javax.servlet-api</artifactId>
            <groupId>javax.servlet</groupId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <exclusion>
            <artifactId>jcl-over-slf4j</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <artifactId>jcl-over-slf4j</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
        <exclusion>
            <artifactId>javax.servlet-api</artifactId>
            <groupId>javax.servlet</groupId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <exclusion>
            <artifactId>jcl-over-slf4j</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-all</artifactId>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>joda-time</groupId>
    <artifactId>joda-time</artifactId>
</dependency>

org.apache.spark.streaming.kafka.KafkaReceiver is inside the
spark-streaming-kafka jar file, so I’m not sure why I get the
ClassNotFoundException.
 
Please help.
 
Thanks,
Tim
 
 
 
 







Re: ClassNotFoundException even if class is present in Jarfile

2015-11-03 Thread hveiga
It turned out to be a problem with `SerializationUtils` from Apache Commons
Lang. There is an open issue where the class will throw a
`ClassNotFoundException` even if the class is in the classpath in a
multiple-classloader environment:
https://issues.apache.org/jira/browse/LANG-1049

We moved away from the library and our Spark job is working fine now. In the
end, the issue was not related to Spark.
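
For anyone hitting the same thing, a plain-JDK stand-in for
SerializationUtils.deserialize that resolves classes through the thread context
class loader looks roughly like this (a sketch, not necessarily what was used
here):

import java.io.{ByteArrayInputStream, ObjectInputStream, ObjectStreamClass}

def deserialize[T](bytes: Array[Byte]): T = {
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes)) {
    // resolve via the context class loader, falling back to the default lookup
    override def resolveClass(desc: ObjectStreamClass): Class[_] =
      try Class.forName(desc.getName, false, Thread.currentThread.getContextClassLoader)
      catch { case _: ClassNotFoundException => super.resolveClass(desc) }
  }
  try in.readObject().asInstanceOf[T] finally in.close()
}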






Re: ClassNotFoundException even if class is present in Jarfile

2015-11-03 Thread Iulian Dragoș
Where is the exception thrown (full stack trace)? How are you running your
application, via spark-submit or spark-shell?

On Tue, Nov 3, 2015 at 1:43 AM, hveiga <kec...@gmail.com> wrote:

> Hello,
>
> I am facing an issue where I cannot run my Spark job in a cluster
> environment (standalone or EMR) but it works successfully if I run it
> locally using local[*] as master.
>
> I am getting ClassNotFoundException: com.mycompany.folder.MyObject on the
> slave executors. I don't really understand why this is happening since I
> have uncompressed the Jarfile to make sure that the class is present inside
> (both .java and .class) and all the rest of the classes are being loaded
> fine.
>
> Also, I would like to mention something weird that might be related but not
> sure. There are two packages inside my jarfile that are called the same but
> with different casing:
>
> - com.mycompany.folder.MyObject
> - com.myCompany.something.Else
>
> Could that be the reason?
>
> Also, I have tried adding my jarfiles in all the ways I could find
> (sparkConf.setJars(...), sparkContext.addJar(...), spark-submit opt --jars,
> ...) but none of the actually worked.
>
> I am using Apache Spark 1.5.0, Java 7, sbt 0.13.7, scala 2.10.5.
>
> Thanks a lot,
>
>
>
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Fwd: Getting ClassNotFoundException: scala.Some on Spark 1.5.x

2015-11-02 Thread Babar Tareen
Resending, haven't found a workaround. Any help is highly appreciated.

-- Forwarded message --
From: Babar Tareen <babartar...@gmail.com>
Date: Thu, Oct 22, 2015 at 2:47 PM
Subject: Getting ClassNotFoundException: scala.Some on Spark 1.5.x
To: user@spark.apache.org


Hi,

I am getting the following exception when submitting a job to Spark 1.5.x from
Scala. The same code works with Spark 1.4.1. Any clues as to what might be
causing the exception?



*Code: App.scala*
import org.apache.spark.SparkContext

object App {
  def main(args: Array[String]) = {
val l = List(1,2,3,4,5,6,7,8,9,0)
val sc = new SparkContext("local[4]", "soark-test")
val rdd = sc.parallelize(l)
rdd.foreach(println)
println(rdd.collect())
  }
}

*build.sbt*
lazy val sparkjob = (project in file("."))
  .settings(
name := "SparkJob",
version := "1.0",
scalaVersion := "2.11.6",
libraryDependencies := libs
)

lazy val libs = Seq(
  "org.apache.spark" %% "spark-core" % "1.5.1"
)


*Exception:*
15/10/22 14:32:42 INFO DAGScheduler: Job 0 failed: foreach at
app.scala:9, took 0.689832 s
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure:
Lost task 2.0 in stage 0.0 (TID 2, localhost): java.io.IOException:
java.lang.ClassNotFoundException: scala.Some
[error] at
org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
[error] at
org.apache.spark.Accumulable.readObject(Accumulators.scala:151)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:497)
[error] at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
[error] at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
[error] at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] at
java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1959)
[error] at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
[error] at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] at
java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
[error] at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
[error] at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
[error] at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
[error] at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[error] at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[error] at java.lang.Thread.run(Thread.java:745)
[error] Caused by: java.lang.ClassNotFoundException: scala.Some
[error] at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
[error] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
[error] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
[error] at java.lang.Class.forName0(Native Method)
[error] at java.lang.Class.forName(Class.java:348)
[error] at
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
[error] at
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
[error] at
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
[error] at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
[error] at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] at
java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501)
[error] at
org.apache.spark.Accumulable$$anonfun$readObject$1.apply$mcV$sp(Accumulators.scala:152)
[error] at
org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1160)
[error] ... 24 more
[error]
[error] Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2
in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in

Re: Getting ClassNotFoundException: scala.Some on Spark 1.5.x

2015-11-02 Thread Jonathan Coveney
Caused by: java.lang.ClassNotFoundException: scala.Some

indicates that you don't have the Scala libs present. How are you executing
this? My guess is that the issue is a conflict between Scala 2.11.6 in your
build and 2.11.7? Not sure... try setting your Scala version to 2.11.7?

But really, first it'd be good to see what command you're using to invoke
this.

2015-11-02 14:48 GMT-05:00 Babar Tareen <babartar...@gmail.com>:

> Resending, haven't found a workaround. Any help is highly appreciated.
>
> -- Forwarded message --
> From: Babar Tareen <babartar...@gmail.com>
> Date: Thu, Oct 22, 2015 at 2:47 PM
> Subject: Getting ClassNotFoundException: scala.Some on Spark 1.5.x
> To: user@spark.apache.org
>
>
> Hi,
>
> I am getting following exception when submitting a job to Spark 1.5.x from
> Scala. The same code works with Spark 1.4.1. Any clues as to what might
> causing the exception.
>
>
>
> *Code:App.scala*import org.apache.spark.SparkContext
>
> object App {
>   def main(args: Array[String]) = {
> val l = List(1,2,3,4,5,6,7,8,9,0)
> val sc = new SparkContext("local[4]", "soark-test")
> val rdd = sc.parallelize(l)
> rdd.foreach(println)
> println(rdd.collect())
>   }
> }
>
> *build.sbt*
> lazy val sparkjob = (project in file("."))
>   .settings(
> name := "SparkJob",
> version := "1.0",
> scalaVersion := "2.11.6",
> libraryDependencies := libs
> )
>
> lazy val libs = Seq(
>   "org.apache.spark" %% "spark-core" % "1.5.1"
> )
>
>
> *Exception:*15/10/22 14:32:42 INFO DAGScheduler: Job 0 failed: foreach at
> app.scala:9, took 0.689832 s
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
> stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure:
> Lost task 2.0 in stage 0.0 (TID 2, localhost): java.io.IOException:
> java.lang.ClassNotFoundException: scala.Some
> [error] at
> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
> [error] at
> org.apache.spark.Accumulable.readObject(Accumulators.scala:151)
> [error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [error] at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [error] at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [error] at java.lang.reflect.Method.invoke(Method.java:497)
> [error] at
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> [error] at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
> [error] at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> [error] at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> [error] at
> java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1959)
> [error] at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> [error] at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> [error] at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> [error] at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
> [error] at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
> [error] at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> [error] at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> [error] at
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> [error] at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
> [error] at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
> [error] at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
> [error] at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [error] at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [error] at java.lang.Thread.run(Thread.java:745)
> [error] Caused by: java.lang.ClassNotFoundException: scala.Some
> [error] at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> [error] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> [error] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> [error] at java.lang.Class.forName0(Native Method)
> [error] at java.lang.Class.forName(Class.java:348)
> [error] at
> org.apache.spark.serializer.JavaDeserializationStrea

Re: Getting ClassNotFoundException: scala.Some on Spark 1.5.x

2015-11-02 Thread Babar Tareen
I am using *'sbt run'* to execute the code. Detailed sbt output is here (
https://drive.google.com/open?id=0B2dlA_DzEohVakpValRjRS1zVG8).

I had Scala 2.11.7 installed on my machine, but even after uninstalling it,
I am still getting the exception with 2.11.6.

Changing the scala version to 2.11.7 in build.sbt fixes the exception as
you suggested. I am unclear as to why it works with 2.11.7 and not 2.11.6.

Thanks,
Babar
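
For reference, the relevant build.sbt lines after the fix, with the project's
Scala version aligned with the toolchain that worked (a sketch based on the
settings quoted in this thread):

scalaVersion := "2.11.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"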

On Mon, Nov 2, 2015 at 2:10 PM Jonathan Coveney <jcove...@gmail.com> wrote:

> Caused by: java.lang.ClassNotFoundException: scala.Some
>
> indicates that you don't have the scala libs present. How are you
> executing this? My guess is the issue is a conflict between scala 2.11.6 in
> your build and 2.11.7? Not sure...try setting your scala to 2.11.7?
>
> But really, first it'd be good to see what command you're using to invoke
> this.
>
> 2015-11-02 14:48 GMT-05:00 Babar Tareen <babartar...@gmail.com>:
>
>> Resending, haven't found a workaround. Any help is highly appreciated.
>>
>> -- Forwarded message --
>> From: Babar Tareen <babartar...@gmail.com>
>> Date: Thu, Oct 22, 2015 at 2:47 PM
>> Subject: Getting ClassNotFoundException: scala.Some on Spark 1.5.x
>> To: user@spark.apache.org
>>
>>
>> Hi,
>>
>> I am getting following exception when submitting a job to Spark 1.5.x
>> from Scala. The same code works with Spark 1.4.1. Any clues as to what
>> might causing the exception.
>>
>>
>>
>> *Code:App.scala*import org.apache.spark.SparkContext
>>
>> object App {
>>   def main(args: Array[String]) = {
>> val l = List(1,2,3,4,5,6,7,8,9,0)
>> val sc = new SparkContext("local[4]", "soark-test")
>> val rdd = sc.parallelize(l)
>> rdd.foreach(println)
>> println(rdd.collect())
>>   }
>> }
>>
>> *build.sbt*
>> lazy val sparkjob = (project in file("."))
>>   .settings(
>> name := "SparkJob",
>> version := "1.0",
>> scalaVersion := "2.11.6",
>> libraryDependencies := libs
>> )
>>
>> lazy val libs = Seq(
>>   "org.apache.spark" %% "spark-core" % "1.5.1"
>> )
>>
>>
>> *Exception:*15/10/22 14:32:42 INFO DAGScheduler: Job 0 failed: foreach
>> at app.scala:9, took 0.689832 s
>> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
>> stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure:
>> Lost task 2.0 in stage 0.0 (TID 2, localhost): java.io.IOException:
>> java.lang.ClassNotFoundException: scala.Some
>> [error] at
>> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
>> [error] at
>> org.apache.spark.Accumulable.readObject(Accumulators.scala:151)
>> [error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> [error] at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> [error] at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> [error] at java.lang.reflect.Method.invoke(Method.java:497)
>> [error] at
>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>> [error] at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
>> [error] at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> [error] at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> [error] at
>> java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1959)
>> [error] at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>> [error] at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> [error] at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> [error] at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
>> [error] at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
>> [error] at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> [error] at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> [error] at
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>> [error] at
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>> [error] at
>> org.apache.spark.serializer.JavaSerializerInstan

Re: Getting ClassNotFoundException: scala.Some on Spark 1.5.x

2015-11-02 Thread Jonathan Coveney
My guess, and it's just a guess, is that there is some change between
versions which you got bit by, as it changed the classpath.

On Monday, November 2, 2015, Babar Tareen <babartar...@gmail.com>
wrote:

> I am using *'sbt run'* to execute the code. Detailed sbt output is here (
> https://drive.google.com/open?id=0B2dlA_DzEohVakpValRjRS1zVG8).
>
> I had scala 2.11.7 installed on my machine. But even after uninstalling
> it, I am still getting the exception with 2.11.6.
>
> Changing the scala version to 2.11.7 in build.sbt fixes the exception as
> you suggested. I am unclear as to why it works with 2.11.7 and not 2.11.6.
>
> Thanks,
> Babar
>
> On Mon, Nov 2, 2015 at 2:10 PM Jonathan Coveney <jcove...@gmail.com> wrote:
>
>> Caused by: java.lang.ClassNotFoundException: scala.Some
>>
>> indicates that you don't have the scala libs present. How are you
>> executing this? My guess is the issue is a conflict between scala 2.11.6 in
>> your build and 2.11.7? Not sure...try setting your scala to 2.11.7?
>>
>> But really, first it'd be good to see what command you're using to invoke
>> this.
>>
>> 2015-11-02 14:48 GMT-05:00 Babar Tareen <babartar...@gmail.com>:
>>
>>> Resending, haven't found a workaround. Any help is highly appreciated.
>>>
>>> -- Forwarded message --
>>> From: Babar Tareen <babartar...@gmail.com>
>>> Date: Thu, Oct 22, 2015 at 2:47 PM
>>> Subject: Getting ClassNotFoundException: scala.Some on Spark 1.5.x
>>> To: user@spark.apache.org
>>>
>>>
>>> Hi,
>>>
>>> I am getting following exception when submitting a job to Spark 1.5.x
>>> from Scala. The same code works with Spark 1.4.1. Any clues as to what
>>> might causing the exception.
>>>
>>>
>>>
>>> *Code:App.scala*import org.apache.spark.SparkContext
>>>
>>> object App {
>>>   def main(args: Array[String]) = {
>>> val l = List(1,2,3,4,5,6,7,8,9,0)
>>> val sc = new SparkContext("local[4]", "soark-test")
>>> val rdd = sc.parallelize(l)
>>> rdd.foreach(println)
>>> println(rdd.collect())
>>>   }
>>> }
>>>
>>> *build.sbt*
>>> lazy val sparkjob = (project in file("."))
>>>   .settings(
>>> name := "SparkJob",
>>> version := "1.0",
>>> scalaVersion := "2.11.6",
>>> libraryDependencies := libs
>>> )
>>>
>>> lazy val libs = Seq(
>>>   "org.apache.spark" %% "spark-core" % "1.5.1"
>>> )
>>>
>>>
>>> *Exception:*15/10/22 14:32:42 INFO DAGScheduler: Job 0 failed: foreach
>>> at app.scala:9, took 0.689832 s
>>> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
>>> stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure:
>>> Lost task 2.0 in stage 0.0 (TID 2, localhost): java.io.IOException:
>>> java.lang.ClassNotFoundException: scala.Some
>>> [error] at
>>> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
>>> [error] at
>>> org.apache.spark.Accumulable.readObject(Accumulators.scala:151)
>>> [error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>> Method)
>>> [error] at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> [error] at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> [error] at java.lang.reflect.Method.invoke(Method.java:497)
>>> [error] at
>>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>> [error] at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
>>> [error] at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> [error] at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> [error] at
>>> java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1959)
>>> [error] at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>

ClassNotFoundException even if class is present in Jarfile

2015-11-02 Thread hveiga
Hello,

I am facing an issue where I cannot run my Spark job in a cluster
environment (standalone or EMR) but it works successfully if I run it
locally using local[*] as master.

I am getting ClassNotFoundException: com.mycompany.folder.MyObject on the
slave executors. I don't really understand why this is happening since I
have uncompressed the Jarfile to make sure that the class is present inside
(both .java and .class) and all the rest of the classes are being loaded
fine.

Also, I would like to mention something weird that might be related, but I am
not sure. There are two packages inside my jar file that have the same name
but different casing:

- com.mycompany.folder.MyObject
- com.myCompany.something.Else

Could that be the reason?

Also, I have tried adding my jar files in all the ways I could find
(sparkConf.setJars(...), sparkContext.addJar(...), the spark-submit --jars
option, ...), but none of them actually worked.
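
For completeness, this is roughly what the first two approaches look like in
code (the paths and app name below are placeholders, not the real ones):

import org.apache.spark.{SparkConf, SparkContext}

object JarShippingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("jar-shipping-sketch")
      // Ship the application jar to the executors when the context is created.
      .setJars(Seq("/path/to/myapp-assembly.jar"))
    val sc = new SparkContext(conf)
    // Or distribute an additional jar after the context already exists.
    sc.addJar("/path/to/extra-dependency.jar")
    sc.stop()
  }
}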

I am using Apache Spark 1.5.0, Java 7, sbt 0.13.7, scala 2.10.5.

Thanks a lot,






Re: driver ClassNotFoundException when MySQL JDBC exceptions are thrown on executor

2015-10-22 Thread Akhil Das
Did you try passing the MySQL connector jar through --driver-class-path?

Thanks
Best Regards

On Sat, Oct 17, 2015 at 6:33 AM, Hurshal Patel <hpatel...@gmail.com> wrote:

> Hi all,
>
> I've been struggling with a particularly puzzling issue after upgrading to
> Spark 1.5.1 from Spark 1.4.1.
>
> When I use the MySQL JDBC connector and an exception (e.g.
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException) is thrown on
> the executor, I get a ClassNotFoundException on the driver, which results
> in this error (logs are abbreviated):
>
> 15/10/16 17:20:59 INFO SparkContext: Starting job: collect at
> repro.scala:73
> ...
> 15/10/16 17:20:59 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
> 15/10/16 17:20:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID
> 3)
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
> at repro.Repro$$anonfun$main$3.apply$mcZI$sp(repro.scala:69)
> ...
> 15/10/16 17:20:59 WARN ThrowableSerializationWrapper: Task exception could
> not be deserialized
> java.lang.ClassNotFoundException:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> ...
> 15/10/16 17:20:59 ERROR TaskResultGetter: Could not deserialize
> TaskEndReason: ClassNotFound with classloader
> org.apache.spark.util.MutableURLClassLoader@7f08a6b1
> 15/10/16 17:20:59 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3,
> localhost): UnknownReason
> 15/10/16 17:20:59 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1
> times; aborting job
> 15/10/16 17:20:59 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
> have all completed, from pool
> 15/10/16 17:20:59 INFO TaskSchedulerImpl: Cancelling stage 3
> 15/10/16 17:20:59 INFO DAGScheduler: ResultStage 3 (collect at
> repro.scala:73) failed in 0.012 s
> 15/10/16 17:20:59 INFO DAGScheduler: Job 3 failed: collect at
> repro.scala:73, took 0.018694 s
>
>  In Spark 1.4.1, I get the following (logs are abbreviated):
> 15/10/16 17:42:41 INFO SparkContext: Starting job: collect at
> repro.scala:53
> ...
> 15/10/16 17:42:41 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
> 15/10/16 17:42:41 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID
> 2)
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
> at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
> ...
> 15/10/16 17:42:41 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2,
> localhost): com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
> at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
> ...
>
> 15/10/16 17:42:41 ERROR TaskSetManager: Task 0 in stage 2.0 failed 1
> times; aborting job
> 15/10/16 17:42:41 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
> have all completed, from pool
> 15/10/16 17:42:41 INFO TaskSchedulerImpl: Cancelling stage 2
> 15/10/16 17:42:41 INFO DAGScheduler: ResultStage 2 (collect at
> repro.scala:53) failed in 0.016 s
> 15/10/16 17:42:41 INFO DAGScheduler: Job 2 failed: collect at
> repro.scala:53, took 0.024584 s
>
>
> I have seriously screwed up somewhere or this is a change in behavior that
> I have not been able to find in the documentation. For those that are
> interested, a full repro and logs follow.
>
> Hurshal
>
> ---
>
> I am running this on Spark 1.5.1+Hadoop 2.6. I have tried this in various
> combinations of
>  * local/standalone mode
>  * putting mysql on the classpath with --jars/building a fat jar with
> mysql in it/manually running sc.addJar on the mysql jar
>  * --deploy-mode client/--deploy-mode cluster
> but nothing seems to change.
>
>
>
> Here is an example invocation, and the accompanying source code:
>
> $ ./bin/spark-submit --master local --deploy-mode client --class
> repro.Repro /home/nix/repro/target/scala-2.10/repro-assembly-0.0.1.jar
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 15/10/16 17:40:53 INFO SparkContext: Running Spark version 1.5.1
> 15/10/16 17:40:53 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 15/10/16 17:40:53 WARN Utils: Your hostname, choochootrain resolves to a
> loopback address: 127.0.1.1; using 10.0.1.97 instead (on interface wlan0)
> 15/10/16 17:40:53 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
> another address
> 15/10/16 17:40:53 INFO SecurityManager: Changing view acls to: root
> 15/10/16 17:40:53 INFO SecurityManager: Changing modify acls to: root
> 15/10/16 17:40:53 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(root); users
> with modify permissions: Set(root)
> 15/10/16 17:4

Getting ClassNotFoundException: scala.Some on Spark 1.5.x

2015-10-22 Thread Babar Tareen
Hi,

I am getting the following exception when submitting a job to Spark 1.5.x from
Scala. The same code works with Spark 1.4.1. Any clues as to what might be
causing the exception?



*Code: App.scala*
import org.apache.spark.SparkContext

object App {
  def main(args: Array[String]) = {
    val l = List(1,2,3,4,5,6,7,8,9,0)
    val sc = new SparkContext("local[4]", "soark-test")
    val rdd = sc.parallelize(l)
    rdd.foreach(println)
    println(rdd.collect())
  }
}

*build.sbt*
lazy val sparkjob = (project in file("."))
  .settings(
name := "SparkJob",
version := "1.0",
scalaVersion := "2.11.6",
libraryDependencies := libs
)

lazy val libs = Seq(
  "org.apache.spark" %% "spark-core" % "1.5.1"
)


*Exception:*
15/10/22 14:32:42 INFO DAGScheduler: Job 0 failed: foreach at
app.scala:9, took 0.689832 s
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure:
Lost task 2.0 in stage 0.0 (TID 2, localhost): java.io.IOException:
java.lang.ClassNotFoundException: scala.Some
[error] at
org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
[error] at
org.apache.spark.Accumulable.readObject(Accumulators.scala:151)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:497)
[error] at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
[error] at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
[error] at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] at
java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1959)
[error] at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
[error] at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] at
java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
[error] at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
[error] at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
[error] at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
[error] at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[error] at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[error] at java.lang.Thread.run(Thread.java:745)
[error] Caused by: java.lang.ClassNotFoundException: scala.Some
[error] at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
[error] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
[error] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
[error] at java.lang.Class.forName0(Native Method)
[error] at java.lang.Class.forName(Class.java:348)
[error] at
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
[error] at
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
[error] at
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
[error] at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
[error] at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] at
java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501)
[error] at
org.apache.spark.Accumulable$$anonfun$readObject$1.apply$mcV$sp(Accumulators.scala:152)
[error] at
org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1160)
[error] ... 24 more
[error]
[error] Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2
in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage
0.0 (TID 2, localhost): java.io.IOException:
java.lang.ClassNotFoundException: scala.Some
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
at org.apache.spark.Accumulable.readObject(Accumulators.scala:151)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at

Re: driver ClassNotFoundException when MySQL JDBC exceptions are thrown on executor

2015-10-22 Thread Xiao Li
A few months ago, I used the DB2 JDBC drivers. I hit a couple of issues
when using --driver-class-path. In the end, I used the following command to
bypass most of the issues:

./bin/spark-submit --jars
/Users/smile/db2driver/db2jcc.jar,/Users/smile/db2driver/db2jcc_license_cisuz.jar
--master local[*] --class com.sparkEngine.
/Users/smile/spark-1.3.1-bin-hadoop2.3/projects/SparkApps-master/spark-load-from-db/target/-1.0.jar

Hopefully, it works for you.

Xiao Li


2015-10-22 4:56 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>:

> Did you try passing the mysql connector jar through --driver-class-path
>
> Thanks
> Best Regards
>
> On Sat, Oct 17, 2015 at 6:33 AM, Hurshal Patel <hpatel...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I've been struggling with a particularly puzzling issue after upgrading
>> to Spark 1.5.1 from Spark 1.4.1.
>>
>> When I use the MySQL JDBC connector and an exception (e.g.
>> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException) is thrown on
>> the executor, I get a ClassNotFoundException on the driver, which results
>> in this error (logs are abbreviated):
>>
>> 15/10/16 17:20:59 INFO SparkContext: Starting job: collect at
>> repro.scala:73
>> ...
>> 15/10/16 17:20:59 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
>> 15/10/16 17:20:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID
>> 3)
>> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
>> at repro.Repro$$anonfun$main$3.apply$mcZI$sp(repro.scala:69)
>> ...
>> 15/10/16 17:20:59 WARN ThrowableSerializationWrapper: Task exception
>> could not be deserialized
>> java.lang.ClassNotFoundException:
>> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> ...
>> 15/10/16 17:20:59 ERROR TaskResultGetter: Could not deserialize
>> TaskEndReason: ClassNotFound with classloader
>> org.apache.spark.util.MutableURLClassLoader@7f08a6b1
>> 15/10/16 17:20:59 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3,
>> localhost): UnknownReason
>> 15/10/16 17:20:59 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1
>> times; aborting job
>> 15/10/16 17:20:59 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose
>> tasks have all completed, from pool
>> 15/10/16 17:20:59 INFO TaskSchedulerImpl: Cancelling stage 3
>> 15/10/16 17:20:59 INFO DAGScheduler: ResultStage 3 (collect at
>> repro.scala:73) failed in 0.012 s
>> 15/10/16 17:20:59 INFO DAGScheduler: Job 3 failed: collect at
>> repro.scala:73, took 0.018694 s
>>
>>  In Spark 1.4.1, I get the following (logs are abbreviated):
>> 15/10/16 17:42:41 INFO SparkContext: Starting job: collect at
>> repro.scala:53
>> ...
>> 15/10/16 17:42:41 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
>> 15/10/16 17:42:41 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID
>> 2)
>> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
>> at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
>> ...
>> 15/10/16 17:42:41 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2,
>> localhost): com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
>> at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
>> ...
>>
>> 15/10/16 17:42:41 ERROR TaskSetManager: Task 0 in stage 2.0 failed 1
>> times; aborting job
>> 15/10/16 17:42:41 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose
>> tasks have all completed, from pool
>> 15/10/16 17:42:41 INFO TaskSchedulerImpl: Cancelling stage 2
>> 15/10/16 17:42:41 INFO DAGScheduler: ResultStage 2 (collect at
>> repro.scala:53) failed in 0.016 s
>> 15/10/16 17:42:41 INFO DAGScheduler: Job 2 failed: collect at
>> repro.scala:53, took 0.024584 s
>>
>>
>> I have seriously screwed up somewhere or this is a change in behavior
>> that I have not been able to find in the documentation. For those that are
>> interested, a full repro and logs follow.
>>
>> Hurshal
>>
>> ---
>>
>> I am running this on Spark 1.5.1+Hadoop 2.6. I have tried this in various
>> combinations of
>>  * local/standalone mode
>>  * putting mysql on the classpath with --jars/building a fat jar with
>> mysql in it/manually running sc.addJar on the mysql jar
>>  * --deploy-mode client/--deploy-mode cluster
>> but nothing seems to change.
>>
>>
>>
>> Here is an example invocation, and the accompanying source code:
>>
>> $ ./bin/spark-submit --master local --deploy-mode client --class
>> repro.Repro /home/nix/repro/target/scala-2.10/repro-assembly-0.0.1.j

driver ClassNotFoundException when MySQL JDBC exceptions are thrown on executor

2015-10-16 Thread Hurshal Patel
Hi all,

I've been struggling with a particularly puzzling issue after upgrading to
Spark 1.5.1 from Spark 1.4.1.

When I use the MySQL JDBC connector and an exception (e.g.
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException) is thrown on the
executor, I get a ClassNotFoundException on the driver, which results in
this error (logs are abbreviated):

15/10/16 17:20:59 INFO SparkContext: Starting job: collect at repro.scala:73
...
15/10/16 17:20:59 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
15/10/16 17:20:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at repro.Repro$$anonfun$main$3.apply$mcZI$sp(repro.scala:69)
...
15/10/16 17:20:59 WARN ThrowableSerializationWrapper: Task exception could
not be deserialized
java.lang.ClassNotFoundException:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
...
15/10/16 17:20:59 ERROR TaskResultGetter: Could not deserialize
TaskEndReason: ClassNotFound with classloader
org.apache.spark.util.MutableURLClassLoader@7f08a6b1
15/10/16 17:20:59 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3,
localhost): UnknownReason
15/10/16 17:20:59 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1 times;
aborting job
15/10/16 17:20:59 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
have all completed, from pool
15/10/16 17:20:59 INFO TaskSchedulerImpl: Cancelling stage 3
15/10/16 17:20:59 INFO DAGScheduler: ResultStage 3 (collect at
repro.scala:73) failed in 0.012 s
15/10/16 17:20:59 INFO DAGScheduler: Job 3 failed: collect at
repro.scala:73, took 0.018694 s

 In Spark 1.4.1, I get the following (logs are abbreviated):
15/10/16 17:42:41 INFO SparkContext: Starting job: collect at repro.scala:53
...
15/10/16 17:42:41 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
15/10/16 17:42:41 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
...
15/10/16 17:42:41 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2,
localhost): com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException
at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
...

15/10/16 17:42:41 ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times;
aborting job
15/10/16 17:42:41 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
have all completed, from pool
15/10/16 17:42:41 INFO TaskSchedulerImpl: Cancelling stage 2
15/10/16 17:42:41 INFO DAGScheduler: ResultStage 2 (collect at
repro.scala:53) failed in 0.016 s
15/10/16 17:42:41 INFO DAGScheduler: Job 2 failed: collect at
repro.scala:53, took 0.024584 s


Either I have seriously screwed up somewhere, or this is a change in behavior
that I have not been able to find in the documentation. For those who are
interested, a full repro and logs follow.

Hurshal

---

I am running this on Spark 1.5.1+Hadoop 2.6. I have tried this in various
combinations of
 * local/standalone mode
 * putting mysql on the classpath with --jars/building a fat jar with mysql
in it/manually running sc.addJar on the mysql jar
 * --deploy-mode client/--deploy-mode cluster
but nothing seems to change.



Here is an example invocation, and the accompanying source code:

$ ./bin/spark-submit --master local --deploy-mode client --class
repro.Repro /home/nix/repro/target/scala-2.10/repro-assembly-0.0.1.jar
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/10/16 17:40:53 INFO SparkContext: Running Spark version 1.5.1
15/10/16 17:40:53 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/10/16 17:40:53 WARN Utils: Your hostname, choochootrain resolves to a
loopback address: 127.0.1.1; using 10.0.1.97 instead (on interface wlan0)
15/10/16 17:40:53 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
15/10/16 17:40:53 INFO SecurityManager: Changing view acls to: root
15/10/16 17:40:53 INFO SecurityManager: Changing modify acls to: root
15/10/16 17:40:53 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(root); users
with modify permissions: Set(root)
15/10/16 17:40:54 INFO Slf4jLogger: Slf4jLogger started
15/10/16 17:40:54 INFO Remoting: Starting remoting
15/10/16 17:40:54 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkDriver@10.0.1.97:48116]
15/10/16 17:40:54 INFO Utils: Successfully started service 'sparkDriver' on
port 48116.
15/10/16 17:40:54 INFO SparkEnv: Registering MapOutputTracker
15/10/16 17:40:54 INFO SparkEnv: Registering BlockManagerMaster
15/10/16 17:40:54 INFO DiskBlockManager: Created local directory at
/tmp/blockmgr-7e7cf2b0-397e-4c44-97e9-508f5c6ec5ab
15/10/16 17:40:54 INFO MemoryStore: MemoryStore started with capacity 530.3
MB
15/10/16 17:40:54 INFO HttpFileServer: HTTP File server

Spark 1.5.1 ClassNotFoundException in cluster mode.

2015-10-14 Thread Renato Perini

Hello.
I have developed a Spark job using a jersey client (1.9 included with 
Spark) to make some service calls during data computations.

Data is read from and written to an Apache Cassandra 2.2.1 database.
When I run the job in local mode, everything works nicely. But when I
execute my job in cluster mode (Spark standalone) I receive the
following exception. I have no clue where this exception occurs. Any
idea / advice on what I can check?


[Stage 38:==> (8 + 2) / 200]
15/10/14 15:54:07 WARN ThrowableSerializationWrapper: Task exception could
not be deserialized
java.lang.ClassNotFoundException: 
com.datastax.spark.connector.types.TypeConversionException

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)

at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:167)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1897)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)

at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)
15/10/14 15:54:07 WARN ThrowableSerializationWrapper: Task exception 
could not be deserialized
java.lang.ClassNotFoundException: 
com.datastax.spark.connector.types.TypeConversionException

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)

at 

Re: Spark 1.5.1 ClassNotFoundException in cluster mode.

2015-10-14 Thread Dean Wampler
There is a Datastax Spark connector library jar file that you probably have
on your CLASSPATH locally, but not on the cluster. If you know where it is,
you could either install it on each node in some location on their
CLASSPATHs or, when you submit the job, pass the jar file using the "--jars"
option. Note that the latter may not be an ideal solution if it has other
dependencies that also need to be passed.
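
If it helps to confirm that, here is a small diagnostic sketch (not from this
thread) that reports where the connector class is actually visible once the
job has been submitted, on the driver and on the executors:

import org.apache.spark.{SparkConf, SparkContext}

object WhoCanSeeIt {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("who-can-see-it"))
    val name = "com.datastax.spark.connector.types.TypeConversionException"

    // Check on the driver.
    val onDriver =
      try { Class.forName(name); true } catch { case _: ClassNotFoundException => false }
    println(s"driver sees $name: $onDriver")

    // Check on the executors (only the class name string is shipped in the closure).
    val onExecutors = sc.parallelize(1 to 4, 4).map { _ =>
      try { Class.forName(name); true } catch { case _: ClassNotFoundException => false }
    }.collect()
    println(s"executors see $name: ${onExecutors.mkString(", ")}")

    sc.stop()
  }
}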

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly)
Typesafe
@deanwampler
http://polyglotprogramming.com

On Wed, Oct 14, 2015 at 5:05 PM, Renato Perini wrote:

> Hello.
> I have developed a Spark job using a jersey client (1.9 included with
> Spark) to make some service calls during data computations.
> Data is read and written on an Apache Cassandra 2.2.1 database.
> When I run the job in local mode, everything works nicely. But when I
> execute my job in cluster mode (spark standalone) I receive the following
> exception:
> I have no clue on where this exception occurs. Any idea / advice on what
> can I check?
>
> [Stage 38:==> (8 + 2)
> / 200]15/10/14 15:54:07 WARN ThrowableSerializationWrapper: Task exception
> could
> java.lang.ClassNotFoundException:
> com.datastax.spark.connector.types.TypeConversionException
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:278)
> at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> at
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at
> org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:167)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1897)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
> at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 15/10/14 15:54:07 WARN ThrowableSerializationWrapper: Task exception could
> 

Re: ThrowableSerializationWrapper: Task exception could not be deserialized / ClassNotFoundException: org.apache.solr.common.SolrException

2015-09-30 Thread Ted Yu
bq. have tried these settings with the hbase protocol jar, to no avail

In that case, HBaseZeroCopyByteString is contained in hbase-protocol.jar.
In HBaseZeroCopyByteString, you can see:

package com.google.protobuf;  // This is a lie.

If the protobuf jar is loaded ahead of hbase-protocol.jar, things start to get
interesting ...
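
One quick way to see which jar each of those classes is actually resolved from
at runtime (a sketch, not from this thread):

object WhichJarWon {
  def main(args: Array[String]): Unit = {
    // Prints the code source (jar) each class was loaded from, which shows
    // which protobuf-related jar is ahead on the classpath.
    for (name <- Seq("com.google.protobuf.HBaseZeroCopyByteString",
                     "com.google.protobuf.ByteString")) {
      val cls = Class.forName(name)
      val src = Option(cls.getProtectionDomain.getCodeSource)
        .map(_.getLocation.toString).getOrElse("(no code source)")
      println(s"$name -> $src")
    }
  }
}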

On Tue, Sep 29, 2015 at 6:12 PM, Dmitry Goldenberg  wrote:

> Ted, I think I have tried these settings with the hbase protocol jar, to
> no avail.
>
> I'm going to see if I can try and use these with this SolrException issue
> though it now may be harder to reproduce it. Thanks for the suggestion.
>
> On Tue, Sep 29, 2015 at 8:03 PM, Ted Yu  wrote:
>
>> Have you tried the following ?
>> --conf spark.driver.userClassPathFirst=true --conf spark.executor.
>> userClassPathFirst=true
>>
>> On Tue, Sep 29, 2015 at 4:38 PM, Dmitry Goldenberg <
>> dgoldenberg...@gmail.com> wrote:
>>
>>> Release of Spark: 1.5.0.
>>>
>>> Command line invokation:
>>>
>>> ACME_INGEST_HOME=/mnt/acme/acme-ingest
>>> ACME_INGEST_VERSION=0.0.1-SNAPSHOT
>>> ACME_BATCH_DURATION_MILLIS=5000
>>> SPARK_MASTER_URL=spark://data1:7077
>>> JAVA_OPTIONS="-Dspark.streaming.kafka.maxRatePerPartition=1000"
>>> JAVA_OPTIONS="$JAVA_OPTIONS -Dspark.executor.memory=2g"
>>>
>>> $SPARK_HOME/bin/spark-submit \
>>> --driver-class-path  $ACME_INGEST_HOME \
>>> --driver-java-options "$JAVA_OPTIONS" \
>>> --class
>>> "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver" \
>>> --master $SPARK_MASTER_URL  \
>>> --conf
>>> "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar"
>>> \
>>>
>>> $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
>>> -brokerlist $METADATA_BROKER_LIST \
>>> -topic acme.topic1 \
>>> -autooffsetreset largest \
>>> -batchdurationmillis $ACME_BATCH_DURATION_MILLIS \
>>> -appname Acme.App1 \
>>> -checkpointdir file://$SPARK_HOME/acme/checkpoint-acme-app1
>>> Note that SolrException is definitely in our consumer jar
>>> acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar which gets deployed to
>>> $ACME_INGEST_HOME.
>>>
>>> For the extraClassPath on the executors, we've got additionally
>>> hbase-protocol-0.98.9-hadoop2.jar: we're using Apache Phoenix from the
>>> Spark jobs to communicate with HBase.  The only way to force Phoenix to
>>> successfully communicate with HBase was to have that JAR explicitly added
>>> to the executor classpath regardless of the fact that the contents of the
>>> hbase-protocol hadoop jar get rolled up into the consumer jar at build time.
>>>
>>> I'm starting to wonder whether there's some class loading pattern here
>>> where some classes may not get loaded out of the consumer jar and therefore
>>> have to have their respective jars added to the executor extraClassPath?
>>>
>>> Or is this a serialization problem for SolrException as Divya
>>> Ravichandran suggested?
>>>
>>>
>>>
>>>
>>> On Tue, Sep 29, 2015 at 6:16 PM, Ted Yu  wrote:
>>>
 Mind providing a bit more information:

 release of Spark
 command line for running Spark job

 Cheers

 On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg <
 dgoldenberg...@gmail.com> wrote:

> We're seeing this occasionally. Granted, this was caused by a wrinkle
> in the Solr schema but this bubbled up all the way in Spark and caused job
> failures.
>
> I just checked and SolrException class is actually in the consumer job
> jar we use.  Is there any reason why Spark cannot find the SolrException
> class?
>
> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception
> could not be deserialized
> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> at
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> at
> org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> 

Re: ThrowableSerializationWrapper: Task exception could not be deserialized / ClassNotFoundException: org.apache.solr.common.SolrException

2015-09-30 Thread Dmitry Goldenberg
I believe I've had trouble with --conf spark.driver.userClassPathFirst=true
--conf spark.executor.userClassPathFirst=true before, so these might not
work...

I was thinking of trying to add the solr4j jar to
spark.executor.extraClassPath...
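
For reference, a sketch of what that might look like if set programmatically
(the SolrJ jar name below is a placeholder; the same value could just as well
be appended to the --conf "spark.executor.extraClassPath=..." entry already
passed to spark-submit):

import org.apache.spark.SparkConf

object ExecutorClasspathSketch {
  def main(args: Array[String]): Unit = {
    // Same key as the --conf flag, extended with the SolrJ jar; set it before
    // the SparkContext/StreamingContext is created so the executors pick it up.
    val conf = new SparkConf()
      .setAppName("executor-classpath-sketch")
      .set("spark.executor.extraClassPath",
        "/mnt/acme/acme-ingest/conf:" +
        "/mnt/acme/acme-ingest/lib/hbase-protocol-0.98.9-hadoop2.jar:" +
        "/mnt/acme/acme-ingest/lib/solr-solrj-4.10.4.jar")
    // ... build the streaming context and run the job as usual ...
  }
}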

On Wed, Sep 30, 2015 at 12:01 PM, Ted Yu  wrote:

> bq. have tried these settings with the hbase protocol jar, to no avail
>
> In that case, HBaseZeroCopyByteString is contained in hbase-protocol.jar.
> In HBaseZeroCopyByteString , you can see:
>
> package com.google.protobuf;  // This is a lie.
>
> If protobuf jar is loaded ahead of hbase-protocol.jar, things start to get
> interesting ...
>
> On Tue, Sep 29, 2015 at 6:12 PM, Dmitry Goldenberg <
> dgoldenberg...@gmail.com> wrote:
>
>> Ted, I think I have tried these settings with the hbase protocol jar, to
>> no avail.
>>
>> I'm going to see if I can try and use these with this SolrException issue
>> though it now may be harder to reproduce it. Thanks for the suggestion.
>>
>> On Tue, Sep 29, 2015 at 8:03 PM, Ted Yu  wrote:
>>
>>> Have you tried the following ?
>>> --conf spark.driver.userClassPathFirst=true --conf spark.executor.
>>> userClassPathFirst=true
>>>
>>> On Tue, Sep 29, 2015 at 4:38 PM, Dmitry Goldenberg <
>>> dgoldenberg...@gmail.com> wrote:
>>>
 Release of Spark: 1.5.0.

 Command line invokation:

 ACME_INGEST_HOME=/mnt/acme/acme-ingest
 ACME_INGEST_VERSION=0.0.1-SNAPSHOT
 ACME_BATCH_DURATION_MILLIS=5000
 SPARK_MASTER_URL=spark://data1:7077
 JAVA_OPTIONS="-Dspark.streaming.kafka.maxRatePerPartition=1000"
 JAVA_OPTIONS="$JAVA_OPTIONS -Dspark.executor.memory=2g"

 $SPARK_HOME/bin/spark-submit \
 --driver-class-path  $ACME_INGEST_HOME \
 --driver-java-options "$JAVA_OPTIONS" \
 --class
 "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver" \
 --master $SPARK_MASTER_URL  \
 --conf
 "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar"
 \

 $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
 -brokerlist $METADATA_BROKER_LIST \
 -topic acme.topic1 \
 -autooffsetreset largest \
 -batchdurationmillis $ACME_BATCH_DURATION_MILLIS \
 -appname Acme.App1 \
 -checkpointdir file://$SPARK_HOME/acme/checkpoint-acme-app1
 Note that SolrException is definitely in our consumer jar
 acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar which gets deployed to
 $ACME_INGEST_HOME.

 For the extraClassPath on the executors, we've got additionally
 hbase-protocol-0.98.9-hadoop2.jar: we're using Apache Phoenix from the
 Spark jobs to communicate with HBase.  The only way to force Phoenix to
 successfully communicate with HBase was to have that JAR explicitly added
 to the executor classpath regardless of the fact that the contents of the
 hbase-protocol hadoop jar get rolled up into the consumer jar at build 
 time.

 I'm starting to wonder whether there's some class loading pattern here
 where some classes may not get loaded out of the consumer jar and therefore
 have to have their respective jars added to the executor extraClassPath?

 Or is this a serialization problem for SolrException as Divya
 Ravichandran suggested?




 On Tue, Sep 29, 2015 at 6:16 PM, Ted Yu  wrote:

> Mind providing a bit more information:
>
> release of Spark
> command line for running Spark job
>
> Cheers
>
> On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg <
> dgoldenberg...@gmail.com> wrote:
>
>> We're seeing this occasionally. Granted, this was caused by a wrinkle
>> in the Solr schema but this bubbled up all the way in Spark and caused 
>> job
>> failures.
>>
>> I just checked and SolrException class is actually in the consumer
>> job jar we use.  Is there any reason why Spark cannot find the
>> SolrException class?
>>
>> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception
>> could not be deserialized
>> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:348)
>> at
>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>> at
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>> at
>> 

Re: ThrowableSerializationWrapper: Task exception could not be deserialized / ClassNotFoundException: org.apache.solr.common.SolrException

2015-09-29 Thread Dmitry Goldenberg
Release of Spark: 1.5.0.

Command line invocation:

ACME_INGEST_HOME=/mnt/acme/acme-ingest
ACME_INGEST_VERSION=0.0.1-SNAPSHOT
ACME_BATCH_DURATION_MILLIS=5000
SPARK_MASTER_URL=spark://data1:7077
JAVA_OPTIONS="-Dspark.streaming.kafka.maxRatePerPartition=1000"
JAVA_OPTIONS="$JAVA_OPTIONS -Dspark.executor.memory=2g"

$SPARK_HOME/bin/spark-submit \
--driver-class-path  $ACME_INGEST_HOME \
--driver-java-options "$JAVA_OPTIONS" \
--class "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver" \
--master $SPARK_MASTER_URL  \
--conf
"spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar"
\

$ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
-brokerlist $METADATA_BROKER_LIST \
-topic acme.topic1 \
-autooffsetreset largest \
-batchdurationmillis $ACME_BATCH_DURATION_MILLIS \
-appname Acme.App1 \
-checkpointdir file://$SPARK_HOME/acme/checkpoint-acme-app1
Note that SolrException is definitely in our consumer jar
acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar which gets deployed to
$ACME_INGEST_HOME.

For the extraClassPath on the executors, we've got additionally
hbase-protocol-0.98.9-hadoop2.jar: we're using Apache Phoenix from the
Spark jobs to communicate with HBase.  The only way to force Phoenix to
successfully communicate with HBase was to have that JAR explicitly added
to the executor classpath regardless of the fact that the contents of the
hbase-protocol hadoop jar get rolled up into the consumer jar at build time.

I'm starting to wonder whether there's some class loading pattern here
where some classes may not get loaded out of the consumer jar and therefore
have to have their respective jars added to the executor extraClassPath?

Or is this a serialization problem for SolrException as Divya Ravichandran
suggested?




On Tue, Sep 29, 2015 at 6:16 PM, Ted Yu  wrote:

> Mind providing a bit more information:
>
> release of Spark
> command line for running Spark job
>
> Cheers
>
> On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg <
> dgoldenberg...@gmail.com> wrote:
>
>> We're seeing this occasionally. Granted, this was caused by a wrinkle in
>> the Solr schema but this bubbled up all the way in Spark and caused job
>> failures.
>>
>> I just checked and SolrException class is actually in the consumer job
>> jar we use.  Is there any reason why Spark cannot find the SolrException
>> class?
>>
>> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception
>> could not be deserialized
>> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:348)
>> at
>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>> at
>> org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>> at
>> 

Re: ThrowableSerializationWrapper: Task exception could not be deserialized / ClassNotFoundException: org.apache.solr.common.SolrException

2015-09-29 Thread Ted Yu
Mind providing a bit more information:

release of Spark
command line for running Spark job

Cheers

On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg  wrote:

> We're seeing this occasionally. Granted, this was caused by a wrinkle in
> the Solr schema but this bubbled up all the way in Spark and caused job
> failures.
>
> I just checked and SolrException class is actually in the consumer job jar
> we use.  Is there any reason why Spark cannot find the SolrException class?
>
> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception could
> not be deserialized
> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> at
> org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
> at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>


Re: ThrowableSerializationWrapper: Task exception could not be deserialized / ClassNotFoundException: org.apache.solr.common.SolrException

2015-09-29 Thread Ted Yu
Have you tried the following?
--conf spark.driver.userClassPathFirst=true --conf
spark.executor.userClassPathFirst=true
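
If it is easier to wire those flags in from code, here is a sketch using the
SparkLauncher API (the app jar, main class and master URL are placeholders;
the two setConf calls are the settings above):

import org.apache.spark.launcher.SparkLauncher

object SubmitWithUserClassPathFirst {
  def main(args: Array[String]): Unit = {
    // Launches spark-submit with the two experimental userClassPathFirst flags.
    val process = new SparkLauncher()
      .setAppResource("/path/to/consumer-job.jar")
      .setMainClass("com.example.ConsumerJob")
      .setMaster("spark://master:7077")
      .setConf("spark.driver.userClassPathFirst", "true")
      .setConf("spark.executor.userClassPathFirst", "true")
      .launch()
    process.waitFor()
  }
}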

On Tue, Sep 29, 2015 at 4:38 PM, Dmitry Goldenberg  wrote:

> Release of Spark: 1.5.0.
>
> Command line invokation:
>
> ACME_INGEST_HOME=/mnt/acme/acme-ingest
> ACME_INGEST_VERSION=0.0.1-SNAPSHOT
> ACME_BATCH_DURATION_MILLIS=5000
> SPARK_MASTER_URL=spark://data1:7077
> JAVA_OPTIONS="-Dspark.streaming.kafka.maxRatePerPartition=1000"
> JAVA_OPTIONS="$JAVA_OPTIONS -Dspark.executor.memory=2g"
>
> $SPARK_HOME/bin/spark-submit \
> --driver-class-path  $ACME_INGEST_HOME \
> --driver-java-options "$JAVA_OPTIONS" \
> --class "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver" \
> --master $SPARK_MASTER_URL  \
> --conf
> "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar"
> \
>
> $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
> -brokerlist $METADATA_BROKER_LIST \
> -topic acme.topic1 \
> -autooffsetreset largest \
> -batchdurationmillis $ACME_BATCH_DURATION_MILLIS \
> -appname Acme.App1 \
> -checkpointdir file://$SPARK_HOME/acme/checkpoint-acme-app1
> Note that SolrException is definitely in our consumer jar
> acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar which gets deployed to
> $ACME_INGEST_HOME.
>
> For the extraClassPath on the executors, we've got additionally
> hbase-protocol-0.98.9-hadoop2.jar: we're using Apache Phoenix from the
> Spark jobs to communicate with HBase.  The only way to force Phoenix to
> successfully communicate with HBase was to have that JAR explicitly added
> to the executor classpath regardless of the fact that the contents of the
> hbase-protocol hadoop jar get rolled up into the consumer jar at build time.
>
> I'm starting to wonder whether there's some class loading pattern here
> where some classes may not get loaded out of the consumer jar and therefore
> have to have their respective jars added to the executor extraClassPath?
>
> Or is this a serialization problem for SolrException as Divya
> Ravichandran suggested?
>
>
>
>
> On Tue, Sep 29, 2015 at 6:16 PM, Ted Yu  wrote:
>
>> Mind providing a bit more information:
>>
>> release of Spark
>> command line for running Spark job
>>
>> Cheers
>>
>> On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg <
>> dgoldenberg...@gmail.com> wrote:
>>
>>> We're seeing this occasionally. Granted, this was caused by a wrinkle in
>>> the Solr schema but this bubbled up all the way in Spark and caused job
>>> failures.
>>>
>>> I just checked and SolrException class is actually in the consumer job
>>> jar we use.  Is there any reason why Spark cannot find the SolrException
>>> class?
>>>
>>> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception
>>> could not be deserialized
>>> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:348)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at
>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>> at
>>> org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:497)
>>> at
>>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>> 

ThrowableSerializationWrapper: Task exception could not be deserialized / ClassNotFoundException: org.apache.solr.common.SolrException

2015-09-29 Thread Dmitry Goldenberg
We're seeing this occasionally. Granted, this was caused by a wrinkle in
the Solr schema but this bubbled up all the way in Spark and caused job
failures.

I just checked and the SolrException class is actually in the consumer job
jar we use. Is there any reason why Spark cannot find the SolrException class?

15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception could
not be deserialized
java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at
org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Re: ThrowableSerializationWrapper: Task exception could not be deserialized / ClassNotFoundException: org.apache.solr.common.SolrException

2015-09-29 Thread Divya Ravichandran
This could be because org.apache.solr.common.SolrException doesn't
implement Serializable.

This error shows up when Spark is deserializing a class which doesn't
implement Serializable.

Thanks
Divya
On Sep 29, 2015 4:37 PM, "Dmitry Goldenberg" 
wrote:

> We're seeing this occasionally. Granted, this was caused by a wrinkle in
> the Solr schema but this bubbled up all the way in Spark and caused job
> failures.
>
> I just checked and SolrException class is actually in the consumer job jar
> we use.  Is there any reason why Spark cannot find the SolrException class?
>
> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception could
> not be deserialized
> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> at
> org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
> at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at
> org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>


Re: ThrowableSerializationWrapper: Task exception could not be deserialized / ClassNotFoundException: org.apache.solr.common.SolrException

2015-09-29 Thread Dmitry Goldenberg
I'm actually not sure how either one of these would cause Spark to
find SolrException. Whether the driver or executor classpath comes first
shouldn't matter if the class is in the consumer job jar, should it?




On Tue, Sep 29, 2015 at 9:12 PM, Dmitry Goldenberg  wrote:

> Ted, I think I have tried these settings with the hbase protocol jar, to
> no avail.
>
> I'm going to see if I can try and use these with this SolrException issue
> though it now may be harder to reproduce it. Thanks for the suggestion.
>
> On Tue, Sep 29, 2015 at 8:03 PM, Ted Yu  wrote:
>
>> Have you tried the following ?
>> --conf spark.driver.userClassPathFirst=true --conf spark.executor.
>> userClassPathFirst=true
>>
>> On Tue, Sep 29, 2015 at 4:38 PM, Dmitry Goldenberg <
>> dgoldenberg...@gmail.com> wrote:
>>
>>> Release of Spark: 1.5.0.
>>>
>>> Command line invocation:
>>>
>>> ACME_INGEST_HOME=/mnt/acme/acme-ingest
>>> ACME_INGEST_VERSION=0.0.1-SNAPSHOT
>>> ACME_BATCH_DURATION_MILLIS=5000
>>> SPARK_MASTER_URL=spark://data1:7077
>>> JAVA_OPTIONS="-Dspark.streaming.kafka.maxRatePerPartition=1000"
>>> JAVA_OPTIONS="$JAVA_OPTIONS -Dspark.executor.memory=2g"
>>>
>>> $SPARK_HOME/bin/spark-submit \
>>> --driver-class-path  $ACME_INGEST_HOME \
>>> --driver-java-options "$JAVA_OPTIONS" \
>>> --class
>>> "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver" \
>>> --master $SPARK_MASTER_URL  \
>>> --conf
>>> "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar"
>>> \
>>>
>>> $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
>>> -brokerlist $METADATA_BROKER_LIST \
>>> -topic acme.topic1 \
>>> -autooffsetreset largest \
>>> -batchdurationmillis $ACME_BATCH_DURATION_MILLIS \
>>> -appname Acme.App1 \
>>> -checkpointdir file://$SPARK_HOME/acme/checkpoint-acme-app1
>>> Note that SolrException is definitely in our consumer jar
>>> acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar which gets deployed to
>>> $ACME_INGEST_HOME.
>>>
>>> For the extraClassPath on the executors, we've got additionally
>>> hbase-protocol-0.98.9-hadoop2.jar: we're using Apache Phoenix from the
>>> Spark jobs to communicate with HBase.  The only way to force Phoenix to
>>> successfully communicate with HBase was to have that JAR explicitly added
>>> to the executor classpath regardless of the fact that the contents of the
>>> hbase-protocol hadoop jar get rolled up into the consumer jar at build time.
>>>
>>> I'm starting to wonder whether there's some class loading pattern here
>>> where some classes may not get loaded out of the consumer jar and therefore
>>> have to have their respective jars added to the executor extraClassPath?
>>>
>>> Or is this a serialization problem for SolrException as Divya
>>> Ravichandran suggested?
>>>
>>>
>>>
>>>
>>> On Tue, Sep 29, 2015 at 6:16 PM, Ted Yu  wrote:
>>>
 Mind providing a bit more information:

 release of Spark
 command line for running Spark job

 Cheers

 On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg <
 dgoldenberg...@gmail.com> wrote:

> We're seeing this occasionally. Granted, this was caused by a wrinkle
> in the Solr schema but this bubbled up all the way in Spark and caused job
> failures.
>
> I just checked and SolrException class is actually in the consumer job
> jar we use.  Is there any reason why Spark cannot find the SolrException
> class?
>
> 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception
> could not be deserialized
> java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> at
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> at
> org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> 

Re: ThrowableSerializationWrapper: Task exception could not be deserialized / ClassNotFoundException: org.apache.solr.common.SolrException

2015-09-29 Thread Dmitry Goldenberg
Ted, I think I have tried these settings with the hbase protocol jar, to no
avail.

I'm going to see if I can try these with this SolrException issue,
though it may now be harder to reproduce. Thanks for the suggestion.
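
For reference, a sketch of how the flags Ted suggested could be folded into the
spark-submit invocation quoted later in this message; the ACME variables are the
placeholders already used there, and nothing in the thread confirms whether this
resolves the issue:

# spark.{driver,executor}.userClassPathFirst (Spark 1.3+, experimental) ask Spark
# to prefer the user's jars over Spark's own classpath when loading classes.
$SPARK_HOME/bin/spark-submit \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    --driver-class-path $ACME_INGEST_HOME \
    --driver-java-options "$JAVA_OPTIONS" \
    --class "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver" \
    --master $SPARK_MASTER_URL \
    --conf "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar" \
    $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
    -brokerlist $METADATA_BROKER_LIST -topic acme.topic1 ...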

On Tue, Sep 29, 2015 at 8:03 PM, Ted Yu  wrote:

> Have you tried the following ?
> --conf spark.driver.userClassPathFirst=true --conf spark.executor.
> userClassPathFirst=true
>
> On Tue, Sep 29, 2015 at 4:38 PM, Dmitry Goldenberg <
> dgoldenberg...@gmail.com> wrote:
>
>> Release of Spark: 1.5.0.
>>
>> Command line invocation:
>>
>> ACME_INGEST_HOME=/mnt/acme/acme-ingest
>> ACME_INGEST_VERSION=0.0.1-SNAPSHOT
>> ACME_BATCH_DURATION_MILLIS=5000
>> SPARK_MASTER_URL=spark://data1:7077
>> JAVA_OPTIONS="-Dspark.streaming.kafka.maxRatePerPartition=1000"
>> JAVA_OPTIONS="$JAVA_OPTIONS -Dspark.executor.memory=2g"
>>
>> $SPARK_HOME/bin/spark-submit \
>> --driver-class-path  $ACME_INGEST_HOME \
>> --driver-java-options "$JAVA_OPTIONS" \
>> --class "com.acme.consumer.kafka.spark.KafkaSparkStreamingDriver"
>> \
>> --master $SPARK_MASTER_URL  \
>> --conf
>> "spark.executor.extraClassPath=$ACME_INGEST_HOME/conf:$ACME_INGEST_HOME/lib/hbase-protocol-0.98.9-hadoop2.jar"
>> \
>>
>> $ACME_INGEST_HOME/lib/acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar \
>> -brokerlist $METADATA_BROKER_LIST \
>> -topic acme.topic1 \
>> -autooffsetreset largest \
>> -batchdurationmillis $ACME_BATCH_DURATION_MILLIS \
>> -appname Acme.App1 \
>> -checkpointdir file://$SPARK_HOME/acme/checkpoint-acme-app1
>> Note that SolrException is definitely in our consumer jar
>> acme-ingest-kafka-spark-$ACME_INGEST_VERSION.jar which gets deployed to
>> $ACME_INGEST_HOME.
>>
>> For the extraClassPath on the executors, we've got additionally
>> hbase-protocol-0.98.9-hadoop2.jar: we're using Apache Phoenix from the
>> Spark jobs to communicate with HBase.  The only way to force Phoenix to
>> successfully communicate with HBase was to have that JAR explicitly added
>> to the executor classpath regardless of the fact that the contents of the
>> hbase-protocol hadoop jar get rolled up into the consumer jar at build time.
>>
>> I'm starting to wonder whether there's some class loading pattern here
>> where some classes may not get loaded out of the consumer jar and therefore
>> have to have their respective jars added to the executor extraClassPath?
>>
>> Or is this a serialization problem for SolrException as Divya
>> Ravichandran suggested?
>>
>>
>>
>>
>> On Tue, Sep 29, 2015 at 6:16 PM, Ted Yu  wrote:
>>
>>> Mind providing a bit more information:
>>>
>>> release of Spark
>>> command line for running Spark job
>>>
>>> Cheers
>>>
>>> On Tue, Sep 29, 2015 at 1:37 PM, Dmitry Goldenberg <
>>> dgoldenberg...@gmail.com> wrote:
>>>
 We're seeing this occasionally. Granted, this was caused by a wrinkle
 in the Solr schema but this bubbled up all the way in Spark and caused job
 failures.

 I just checked and SolrException class is actually in the consumer job
 jar we use.  Is there any reason why Spark cannot find the SolrException
 class?

 15/09/29 15:41:58 WARN ThrowableSerializationWrapper: Task exception
 could not be deserialized
 java.lang.ClassNotFoundException: org.apache.solr.common.SolrException
 at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:348)
 at
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
 at
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
 at
 org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 at 

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-18 Thread Vipul Rai
Hi Nick/Igor,

​​
Any solution for this?
I am having the same issue as well, and copying the jar to each executor is not
feasible if we use a lot of jars.

Thanks,
Vipul


Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Igor Berman
As a starting point, attach your stack trace...
PS: look for duplicates in your classpath; maybe you include another jar
with the same class.

On 8 September 2015 at 06:38, Nicholas R. Peterson <nrpeter...@gmail.com>
wrote:

> I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through Yarn.
> Serialization is set to use Kryo.
>
> I have a large object which I send to the executors as a Broadcast. The
> object seems to serialize just fine. When it attempts to deserialize,
> though, Kryo throws a ClassNotFoundException... for a class that I include
> in the fat jar that I spark-submit.
>
> What could be causing this classpath issue with Kryo on the executors?
> Where should I even start looking to try to diagnose the problem? I
> appreciate any help you can provide.
>
> Thank you!
>
> -- Nick
>


Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nicholas R. Peterson
Thanks, Igor; I've got it running again right now, and can attach the stack
trace when it finishes.

In the mean time, I've noticed something interesting: in the Spark UI, the
application jar that I submit is not being included on the classpath.  It
has been successfully uploaded to the nodes -- in the nodemanager directory
for the application, I see __app__.jar and __spark__.jar.  The directory
itself is on the classpath, and __spark__.jar and __hadoop_conf__ are as
well.  When I do everything the same but switch the master to local[*], the
jar I submit IS added to the classpath.

This seems like a likely culprit.  What could cause this, and how can I fix
it?

Best,
Nick
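
One workaround sometimes tried for this symptom is to make the executors prefer
the user jar; this is a sketch only, with placeholder names, and nothing in the
thread confirms it fixes this particular job:

# Sketch: class and jar names are placeholders, not from this thread.
spark-submit \
    --master yarn-cluster \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    --class com.example.MyJob \
    my-job-assembly.jar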

On Tue, Sep 8, 2015 at 1:14 AM Igor Berman <igor.ber...@gmail.com> wrote:

> as a starting point, attach your stacktrace...
> ps: look for duplicates in your classpath, maybe you include another jar
> with same class
>
> On 8 September 2015 at 06:38, Nicholas R. Peterson <nrpeter...@gmail.com>
> wrote:
>
>> I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through Yarn.
>> Serialization is set to use Kryo.
>>
>> I have a large object which I send to the executors as a Broadcast. The
>> object seems to serialize just fine. When it attempts to deserialize,
>> though, Kryo throws a ClassNotFoundException... for a class that I include
>> in the fat jar that I spark-submit.
>>
>> What could be causing this classpath issue with Kryo on the executors?
>> Where should I even start looking to try to diagnose the problem? I
>> appreciate any help you can provide.
>>
>> Thank you!
>>
>> -- Nick
>>
>
>


Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nick Peterson
nstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>>  at 
>> com.twitter.chill.Instantiators$$anonfun$normalJava$1.apply(KryoBase.scala:160)
>>  at 
>> com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:123)
>>  ... 32 more
>> Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: 
>> com.i2028.Document.Document
>>  at 
>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>>  at 
>> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>>  at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
>>  at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
>>  at 
>> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
>>  at 
>> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>>  at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
>>  at 
>> com.lumiata.patientanalysis.utils.CachedGraph.loadCacheFromSerializedData(CachedGraph.java:221)
>>  at 
>> com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:182)
>>  at 
>> com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:178)
>>  ... 38 more
>> Caused by: java.lang.ClassNotFoundException: com.i2028.Document.Document
>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>  at java.lang.Class.forName0(Native Method)
>>  at java.lang.Class.forName(Class.java:348)
>>  at 
>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
>>  ... 47 more
>>
>>
>>
>>> On Tue, Sep 8, 2015 at 6:01 AM Igor Berman <igor.ber...@gmail.com>
>>> wrote:
>>>
>>>> I wouldn't build on this. local mode & yarn are different so that jars
>>>> you use in spark submit are handled differently
>>>>
>>>> On 8 September 2015 at 15:43, Nicholas R. Peterson <
>>>> nrpeter...@gmail.com> wrote:
>>>>
>>>>> Thans, Igor; I've got it running again right now, and can attach the
>>>>> stack trace when it finishes.
>>>>>
>>>>> In the mean time, I've noticed something interesting: in the Spark UI,
>>>>> the application jar that I submit is not being included on the classpath.
>>>>> It has been successfully uploaded to the nodes -- in the nodemanager
>>>>> directory for the application, I see __app__.jar and __spark__.jar.  The
>>>>> directory itself is on the classpath, and __spark__.jar and 
>>>>> __hadoop_conf__
>>>>> are as well.  When I do everything the same but switch the master to
>>>>> local[*], the jar I submit IS added to the classpath.
>>>>>
>>>>> This seems like a likely culprit.  What could cause this, and how can
>>>>> I fix it?
>>>>>
>>>>> Best,
>>>>> Nick
>>>>>
>>>>> On Tue, Sep 8, 2015 at 1:14 AM Igor Berman <igor.ber...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> as a starting point, attach your stacktrace...
>>>>>> ps: look for duplicates in your classpath, maybe you include another
>>>>>> jar with same class
>>>>>>
>>>>>> On 8 September 2015 at 06:38, Nicholas R. Peterson <
>>>>>> nrpeter...@gmail.com> wrote:
>>>>>>
>>>>>>> I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through
>>>>>>> Yarn. Serialization is set to use Kryo.
>>>>>>>
>>>>>>> I have a large object which I send to the executors as a Broadcast.
>>>>>>> The object seems to serialize just fine. When it attempts to 
>>>>>>> deserialize,
>>>>>>> though, Kryo throws a ClassNotFoundException... for a class that I 
>>>>>>> include
>>>>>>> in the fat jar that I spark-submit.
>>>>>>>
>>>>>>> What could be causing this classpath issue with Kryo on the
>>>>>>> executors? Where should I even start looking to try to diagnose the
>>>>>>> problem? I appreciate any help you can provide.
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> -- Nick
>>>>>>>
>>>>>>
>>>>>>
>>>>
>


Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nicholas R. Peterson
)
at 
com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:178)
... 38 more
Caused by: java.lang.ClassNotFoundException: com.i2028.Document.Document
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
... 47 more



> On Tue, Sep 8, 2015 at 6:01 AM Igor Berman <igor.ber...@gmail.com> wrote:
>
>> I wouldn't build on this. local mode & yarn are different so that jars
>> you use in spark submit are handled differently
>>
>> On 8 September 2015 at 15:43, Nicholas R. Peterson <nrpeter...@gmail.com>
>> wrote:
>>
>>> Thans, Igor; I've got it running again right now, and can attach the
>>> stack trace when it finishes.
>>>
>>> In the mean time, I've noticed something interesting: in the Spark UI,
>>> the application jar that I submit is not being included on the classpath.
>>> It has been successfully uploaded to the nodes -- in the nodemanager
>>> directory for the application, I see __app__.jar and __spark__.jar.  The
>>> directory itself is on the classpath, and __spark__.jar and __hadoop_conf__
>>> are as well.  When I do everything the same but switch the master to
>>> local[*], the jar I submit IS added to the classpath.
>>>
>>> This seems like a likely culprit.  What could cause this, and how can I
>>> fix it?
>>>
>>> Best,
>>> Nick
>>>
>>> On Tue, Sep 8, 2015 at 1:14 AM Igor Berman <igor.ber...@gmail.com>
>>> wrote:
>>>
>>>> as a starting point, attach your stacktrace...
>>>> ps: look for duplicates in your classpath, maybe you include another
>>>> jar with same class
>>>>
>>>> On 8 September 2015 at 06:38, Nicholas R. Peterson <
>>>> nrpeter...@gmail.com> wrote:
>>>>
>>>>> I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through
>>>>> Yarn. Serialization is set to use Kryo.
>>>>>
>>>>> I have a large object which I send to the executors as a Broadcast.
>>>>> The object seems to serialize just fine. When it attempts to deserialize,
>>>>> though, Kryo throws a ClassNotFoundException... for a class that I include
>>>>> in the fat jar that I spark-submit.
>>>>>
>>>>> What could be causing this classpath issue with Kryo on the executors?
>>>>> Where should I even start looking to try to diagnose the problem? I
>>>>> appreciate any help you can provide.
>>>>>
>>>>> Thank you!
>>>>>
>>>>> -- Nick
>>>>>
>>>>
>>>>
>>


Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Igor Berman
olver.java:115)
>   at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
>   at 
> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
>   at 
> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
>   at 
> com.lumiata.patientanalysis.utils.CachedGraph.loadCacheFromSerializedData(CachedGraph.java:221)
>   at 
> com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:182)
>   at 
> com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:178)
>   ... 38 more
> Caused by: java.lang.ClassNotFoundException: com.i2028.Document.Document
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
>   ... 47 more
>
>
>
>> On Tue, Sep 8, 2015 at 6:01 AM Igor Berman <igor.ber...@gmail.com> wrote:
>>
>>> I wouldn't build on this. local mode & yarn are different so that jars
>>> you use in spark submit are handled differently
>>>
>>> On 8 September 2015 at 15:43, Nicholas R. Peterson <nrpeter...@gmail.com
>>> > wrote:
>>>
>>>> Thans, Igor; I've got it running again right now, and can attach the
>>>> stack trace when it finishes.
>>>>
>>>> In the mean time, I've noticed something interesting: in the Spark UI,
>>>> the application jar that I submit is not being included on the classpath.
>>>> It has been successfully uploaded to the nodes -- in the nodemanager
>>>> directory for the application, I see __app__.jar and __spark__.jar.  The
>>>> directory itself is on the classpath, and __spark__.jar and __hadoop_conf__
>>>> are as well.  When I do everything the same but switch the master to
>>>> local[*], the jar I submit IS added to the classpath.
>>>>
>>>> This seems like a likely culprit.  What could cause this, and how can I
>>>> fix it?
>>>>
>>>> Best,
>>>> Nick
>>>>
>>>> On Tue, Sep 8, 2015 at 1:14 AM Igor Berman <igor.ber...@gmail.com>
>>>> wrote:
>>>>
>>>>> as a starting point, attach your stacktrace...
>>>>> ps: look for duplicates in your classpath, maybe you include another
>>>>> jar with same class
>>>>>
>>>>> On 8 September 2015 at 06:38, Nicholas R. Peterson <
>>>>> nrpeter...@gmail.com> wrote:
>>>>>
>>>>>> I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through
>>>>>> Yarn. Serialization is set to use Kryo.
>>>>>>
>>>>>> I have a large object which I send to the executors as a Broadcast.
>>>>>> The object seems to serialize just fine. When it attempts to deserialize,
>>>>>> though, Kryo throws a ClassNotFoundException... for a class that I 
>>>>>> include
>>>>>> in the fat jar that I spark-submit.
>>>>>>
>>>>>> What could be causing this classpath issue with Kryo on the
>>>>>> executors? Where should I even start looking to try to diagnose the
>>>>>> problem? I appreciate any help you can provide.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> -- Nick
>>>>>>
>>>>>
>>>>>
>>>


Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Igor Berman
yoDeserializationStream.readObject(KryoSerializer.scala:182)
>>> at 
>>> org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:217)
>>> at 
>>> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
>>> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1254)
>>> ... 24 more
>>> Caused by: java.lang.reflect.InvocationTargetException
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at 
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>> at 
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>>> at 
>>> com.twitter.chill.Instantiators$$anonfun$normalJava$1.apply(KryoBase.scala:160)
>>> at 
>>> com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:123)
>>> ... 32 more
>>> Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: 
>>> com.i2028.Document.Document
>>> at 
>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>>> at 
>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>>> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
>>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
>>> at 
>>> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
>>> at 
>>> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>>> at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
>>> at 
>>> com.lumiata.patientanalysis.utils.CachedGraph.loadCacheFromSerializedData(CachedGraph.java:221)
>>> at 
>>> com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:182)
>>> at 
>>> com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:178)
>>> ... 38 more
>>> Caused by: java.lang.ClassNotFoundException: com.i2028.Document.Document
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:348)
>>> at 
>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
>>> ... 47 more
>>>
>>>
>>>
>>>> On Tue, Sep 8, 2015 at 6:01 AM Igor Berman <igor.ber...@gmail.com>
>>>> wrote:
>>>>
>>>>> I wouldn't build on this. local mode & yarn are different so that jars
>>>>> you use in spark submit are handled differently
>>>>>
>>>>> On 8 September 2015 at 15:43, Nicholas R. Peterson <
>>>>> nrpeter...@gmail.com> wrote:
>>>>>
>>>>>> Thans, Igor; I've got it running again right now, and can attach the
>>>>>> stack trace when it finishes.
>>>>>>
>>>>>> In the mean time, I've noticed something interesting: in the Spark
>>>>>> UI, the application jar that I submit is not being included on the
>>>>>> classpath.  It has been successfully uploaded to the nodes -- in the
>>>>>> nodemanager directory for the application, I see __app__.jar and
>>>>>> __spark__.jar.  The directory itself is on the classpath, and 
>>>>>> __spark__.jar
>>>>>> and __hadoop_conf__ are as well.  When I do everything the same but
>>>>>> switch the master to local[*], the jar I submit IS added to the 
>>>>>> classpath.
>>>>>>
>>>>>> This seems like a likely culprit.  What could cause this, and how can
>>>>>> I fix it?
>>>>>>
>>>>>> Best,
>>>>>> Nick
>>>>>>
>>>>>> On Tue, Sep 8, 2015 at 1:14 AM Igor Berman <igor.ber...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> as a starting point, attach your stacktrace...
>>>>>>> ps: look for duplicates in your classpath, maybe you include another
>>>>>>> jar with same class
>>>>>>>
>>>>>>> On 8 September 2015 at 06:38, Nicholas R. Peterson <
>>>>>>> nrpeter...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through
>>>>>>>> Yarn. Serialization is set to use Kryo.
>>>>>>>>
>>>>>>>> I have a large object which I send to the executors as a Broadcast.
>>>>>>>> The object seems to serialize just fine. When it attempts to 
>>>>>>>> deserialize,
>>>>>>>> though, Kryo throws a ClassNotFoundException... for a class that I 
>>>>>>>> include
>>>>>>>> in the fat jar that I spark-submit.
>>>>>>>>
>>>>>>>> What could be causing this classpath issue with Kryo on the
>>>>>>>> executors? Where should I even start looking to try to diagnose the
>>>>>>>> problem? I appreciate any help you can provide.
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> -- Nick
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>


Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Igor Berman
.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>>   at 
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>   at 
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>> Caused by: com.esotericsoftware.kryo.KryoException: Error constructing 
>>>>> instance of class: com.lumiata.patientanalysis.utils.CachedGraph
>>>>>   at 
>>>>> com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:126)
>>>>>   at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1065)
>>>>>   at 
>>>>> com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:228)
>>>>>   at 
>>>>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:217)
>>>>>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>>>>   at 
>>>>> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:182)
>>>>>   at 
>>>>> org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:217)
>>>>>   at 
>>>>> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
>>>>>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1254)
>>>>>   ... 24 more
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>   at 
>>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>>>   at 
>>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>>>>>   at 
>>>>> com.twitter.chill.Instantiators$$anonfun$normalJava$1.apply(KryoBase.scala:160)
>>>>>   at 
>>>>> com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:123)
>>>>>   ... 32 more
>>>>> Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: 
>>>>> com.i2028.Document.Document
>>>>>   at 
>>>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>>>>>   at 
>>>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>>>>>   at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
>>>>>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
>>>>>   at 
>>>>> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
>>>>>   at 
>>>>> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>>>>>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
>>>>>   at 
>>>>> com.lumiata.patientanalysis.utils.CachedGraph.loadCacheFromSerializedData(CachedGraph.java:221)
>>>>>   at 
>>>>> com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:182)
>>>>>   at 
>>>>> com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:178)
>>>>>   ... 38 more
>>>>> Caused by: java.lang.ClassNotFoundException: com.i2028.Document.Document
>>>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>   at java.lang.Class.forName0(Native Method)
>>>>>   at java.lang.Class.forName(Class.java:348)
>>>>>   at 
>>>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
>>>>>   ... 47 more
>>>>>
>>>>>
>>>>>
>>>>>> On Tue, Sep 8, 2015 at 6:01 AM Igor Berman <igor.ber...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nick Peterson
lass: com.lumiata.patientanalysis.utils.CachedGraph
>>>>at 
>>>> com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:126)
>>>>at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1065)
>>>>at 
>>>> com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:228)
>>>>at 
>>>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:217)
>>>>at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>>>at 
>>>> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:182)
>>>>at 
>>>> org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:217)
>>>>at 
>>>> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
>>>>at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1254)
>>>>... 24 more
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>at 
>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>>at 
>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>>>>at 
>>>> com.twitter.chill.Instantiators$$anonfun$normalJava$1.apply(KryoBase.scala:160)
>>>>at 
>>>> com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:123)
>>>>... 32 more
>>>> Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: 
>>>> com.i2028.Document.Document
>>>>at 
>>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>>>>at 
>>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>>>>at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
>>>>at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
>>>>at 
>>>> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
>>>>at 
>>>> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>>>>at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
>>>>at 
>>>> com.lumiata.patientanalysis.utils.CachedGraph.loadCacheFromSerializedData(CachedGraph.java:221)
>>>>at 
>>>> com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:182)
>>>>at 
>>>> com.lumiata.patientanalysis.utils.CachedGraph.(CachedGraph.java:178)
>>>>... 38 more
>>>> Caused by: java.lang.ClassNotFoundException: com.i2028.Document.Document
>>>>at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>at java.lang.Class.forName0(Native Method)
>>>>at java.lang.Class.forName(Class.java:348)
>>>>at 
>>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
>>>>... 47 more
>>>>
>>>>
>>>>
>>>>> On Tue, Sep 8, 2015 at 6:01 AM Igor Berman <igor.ber...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I wouldn't build on this. local mode & yarn are different so that
>>>>>> jars you use in spark submit are handled differently
>>>>>>
>>>>>> On 8 September 2015 at 15:43, Nicholas R. Peterson <
>>>>>> nrpeter...@gmail.com> wrote:
>>>>>>
>>>>>>> Thans, Igor; I've got it running again right now, and can attach the
>>>>>>> stack trace when it finishes.
>>>>>>>
>>>>>>> In the mean time, I've noticed something interesting: in the Spark
>>>>>>> UI, the application jar that I submit is not being included on the
>>>>>>> classpath.  It has been successfully uploaded to the nodes -- in the
>>>>>>> nodemanager directory for the application, I see __app__.jar and
>>>>>>> __spark__.jar.  The directory itself is on the classpath, and 
>>>>>>> __spark__.jar
>>>>>>> and __hadoop_conf__ are as well.  When I do everything the same but
>>>>>>> switch the master to local[*], the jar I submit IS added to the 
>>>>>>> classpath.
>>>>>>>
>>>>>>> This seems like a likely culprit.  What could cause this, and how
>>>>>>> can I fix it?
>>>>>>>
>>>>>>> Best,
>>>>>>> Nick
>>>>>>>
>>>>>>> On Tue, Sep 8, 2015 at 1:14 AM Igor Berman <igor.ber...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> as a starting point, attach your stacktrace...
>>>>>>>> ps: look for duplicates in your classpath, maybe you include
>>>>>>>> another jar with same class
>>>>>>>>
>>>>>>>> On 8 September 2015 at 06:38, Nicholas R. Peterson <
>>>>>>>> nrpeter...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through
>>>>>>>>> Yarn. Serialization is set to use Kryo.
>>>>>>>>>
>>>>>>>>> I have a large object which I send to the executors as a
>>>>>>>>> Broadcast. The object seems to serialize just fine. When it attempts 
>>>>>>>>> to
>>>>>>>>> deserialize, though, Kryo throws a ClassNotFoundException... for a 
>>>>>>>>> class
>>>>>>>>> that I include in the fat jar that I spark-submit.
>>>>>>>>>
>>>>>>>>> What could be causing this classpath issue with Kryo on the
>>>>>>>>> executors? Where should I even start looking to try to diagnose the
>>>>>>>>> problem? I appreciate any help you can provide.
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>>
>>>>>>>>> -- Nick
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>
>


Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nick Peterson
ocument.Document
>>>>>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>  at java.lang.Class.forName0(Native Method)
>>>>>>  at java.lang.Class.forName(Class.java:348)
>>>>>>  at 
>>>>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
>>>>>>  ... 47 more
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Tue, Sep 8, 2015 at 6:01 AM Igor Berman <igor.ber...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I wouldn't build on this. local mode & yarn are different so that
>>>>>>>> jars you use in spark submit are handled differently
>>>>>>>>
>>>>>>>> On 8 September 2015 at 15:43, Nicholas R. Peterson <
>>>>>>>> nrpeter...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thans, Igor; I've got it running again right now, and can attach
>>>>>>>>> the stack trace when it finishes.
>>>>>>>>>
>>>>>>>>> In the mean time, I've noticed something interesting: in the Spark
>>>>>>>>> UI, the application jar that I submit is not being included on the
>>>>>>>>> classpath.  It has been successfully uploaded to the nodes -- in the
>>>>>>>>> nodemanager directory for the application, I see __app__.jar and
>>>>>>>>> __spark__.jar.  The directory itself is on the classpath, and 
>>>>>>>>> __spark__.jar
>>>>>>>>> and __hadoop_conf__ are as well.  When I do everything the same
>>>>>>>>> but switch the master to local[*], the jar I submit IS added to the
>>>>>>>>> classpath.
>>>>>>>>>
>>>>>>>>> This seems like a likely culprit.  What could cause this, and how
>>>>>>>>> can I fix it?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>> On Tue, Sep 8, 2015 at 1:14 AM Igor Berman <igor.ber...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> as a starting point, attach your stacktrace...
>>>>>>>>>> ps: look for duplicates in your classpath, maybe you include
>>>>>>>>>> another jar with same class
>>>>>>>>>>
>>>>>>>>>> On 8 September 2015 at 06:38, Nicholas R. Peterson <
>>>>>>>>>> nrpeter...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster,
>>>>>>>>>>> through Yarn. Serialization is set to use Kryo.
>>>>>>>>>>>
>>>>>>>>>>> I have a large object which I send to the executors as a
>>>>>>>>>>> Broadcast. The object seems to serialize just fine. When it 
>>>>>>>>>>> attempts to
>>>>>>>>>>> deserialize, though, Kryo throws a ClassNotFoundException... for a 
>>>>>>>>>>> class
>>>>>>>>>>> that I include in the fat jar that I spark-submit.
>>>>>>>>>>>
>>>>>>>>>>> What could be causing this classpath issue with Kryo on the
>>>>>>>>>>> executors? Where should I even start looking to try to diagnose the
>>>>>>>>>>> problem? I appreciate any help you can provide.
>>>>>>>>>>>
>>>>>>>>>>> Thank you!
>>>>>>>>>>>
>>>>>>>>>>> -- Nick
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>
>>>
>


Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-07 Thread Nicholas R. Peterson
I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through Yarn.
Serialization is set to use Kryo.

I have a large object which I send to the executors as a Broadcast. The
object seems to serialize just fine. When it attempts to deserialize,
though, Kryo throws a ClassNotFoundException... for a class that I include
in the fat jar that I spark-submit.

What could be causing this classpath issue with Kryo on the executors?
Where should I even start looking to try to diagnose the problem? I
appreciate any help you can provide.

Thank you!

-- Nick


Strange ClassNotFoundException in spark-shell

2015-08-24 Thread Jan Algermissen
Hi,

I am using spark 1.4 M1 with the Cassandra Connector and ran into a strange 
error when using the spark shell.

This works:

sc.cassandraTable("events", "bid_events").select("bid", "type").take(10).foreach(println)


But as soon as I put a map() in there (or filter):

sc.cassandraTable("events", "bid_events").select("bid", "type").map(r => 
r).take(10).foreach(println)


I get the exception below.

The spark-shell call is:

/opt/spark/bin/spark-shell --master spark://x-1:7077 --conf 
spark.cassandra.connection.host=$(hostname -i) --driver-class-path $(echo 
/root/*.jar |sed 's/ /:/g') --jar 
spark-cassandra-connector-assembly-1.4.0-M1-SNAPSHOT.jar

Can anyone provide ideas how to approach debugging this?

Jan


15/08/24 23:54:43 INFO DAGScheduler: Job 0 failed: take at console:32, took 
1.999875 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 
3, 10.31.39.116): java.lang.ClassNotFoundException: $anonfun$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:66)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
  at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
  at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
  at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
  at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
  at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
  at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
  at scala.Option.foreach(Option.scala:257)
  at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

scala 
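
One detail worth ruling out (a sketch, not a confirmed diagnosis): --driver-class-path
only affects the driver JVM, while jars listed with --jars are also shipped to the
executors. Assuming the --jar in the command above was meant to be --jars, the
invocation would look like:

# Same command as above, with the connector assembly passed via --jars so the
# executors receive it rather than only the driver classpath.
/opt/spark/bin/spark-shell --master spark://x-1:7077 \
    --conf spark.cassandra.connection.host=$(hostname -i) \
    --driver-class-path "$(echo /root/*.jar | sed 's/ /:/g')" \
    --jars spark-cassandra-connector-assembly-1.4.0-M1-SNAPSHOT.jar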



Re: log4j custom appender ClassNotFoundException with spark 1.4.1

2015-08-07 Thread mlemay
One possible solution is to spark-submit with --driver-class-path and list
all recursive dependencies.  This is fragile and error prone.
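
As a sketch of what that workaround looks like (the jar names are hypothetical; the
point is that the custom appender and each of its transitive dependencies have to
be listed by hand):

# Hypothetical jars: a custom log4j appender and its dependencies, placed on the
# driver's system classpath so they are visible before log4j initializes.
spark-submit \
    --driver-class-path /opt/libs/my-appender.jar:/opt/libs/appender-dep-a.jar:/opt/libs/appender-dep-b.jar \
    --class com.example.Main \
    app.jar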

Non-working alternatives (used in SparkSubmit.scala AFTER arguments parser
is initialized):

spark-submit --packages ...
spark-submit --jars ...
spark-defaults.conf (spark.driver.extraJavaOptions, spark.jars,
spark.driver.extraClassPath,
...)

On Fri, Aug 7, 2015 at 8:57 AM, mlemay [via Apache Spark User List] 
ml-node+s1001560n24169...@n3.nabble.com wrote:

 That starts to smell...

 When analyzing SparkSubmit.scala, we can see that one of the first things
 it does is to parse arguments. This uses the Utils object and triggers
 initialization of member variables.  One such variable is
 ShutdownHookManager (which didn't exist in spark 1.3), which in turn triggers
 the log4j initialization.

 setContextClassLoader is set only a few steps after argument parsing, in
 submit > doRunMain > runMain.

 That pretty much sums it up:
 spark.util.Utils has a new static dependency on log4j that triggers its
 initialization before the call to
 setContextClassLoader(MutableURLClassLoader)

 Does anyone have a workaround to make this work in 1.4.1?








Re: log4j custom appender ClassNotFoundException with spark 1.4.1

2015-08-07 Thread mlemay
The offending commit is:

[SPARK-6014] [core] Revamp Spark shutdown hooks, fix shutdown races.
https://github.com/apache/spark/commit/e72c16e30d85cdc394d318b5551698885cfda9b8








Re: log4j custom appender ClassNotFoundException with spark 1.4.1

2015-08-07 Thread mlemay
Looking at the callstack and diffs between 1.3.1 and 1.4.1-rc4, I see
something that could be relevant to the issue.

1) The call stack shows that the log4j manager gets initialized using the default
Java context class loader. This context class loader should probably be Spark's
MutableURLClassLoader, but it's not.  We can assume that
currentThread.setContextClassLoader has not been called yet.

2) Still in the call stack, we can see that ShutdownHookManager is the class
responsible for triggering log4j initialization.

3) Looking at the diffs between 1.3 and 1.4, we can see that
ShutdownHookManager is a new class.

With this information, is it possible that ShutdownHookManager makes log4j
initialize too early?  By that, I mean before Spark gets the chance to set
its MutableURLClassLoader on the thread context?

Let me know if this does not make sense.

Mike







Re: log4j custom appender ClassNotFoundException with spark 1.4.1

2015-08-07 Thread mlemay
That starts to smell...

When analyzing SparkSubmit.scala, we can see that one of the first things it
does is to parse arguments. This uses the Utils object and triggers
initialization of member variables.  One such variable is
ShutdownHookManager (which didn't exist in spark 1.3), which in turn triggers
the log4j initialization.

setContextClassLoader is set only a few steps after argument parsing, in
submit > doRunMain > runMain.

That pretty much sums it up:
spark.util.Utils has a new static dependency on log4j that triggers its
initialization before the call to
setContextClassLoader(MutableURLClassLoader)

Does anyone have a workaround to make this work in 1.4.1?







How to use KryoSerializer : ClassNotFoundException

2015-06-24 Thread pth001

Hi,

I am using spark 1.4. I wanted to serialize with KryoSerializer, but got a 
ClassNotFoundException. The configuration and exception are below. When I 
submitted the job, I also provided --jars mylib.jar, which contains 
WRFVariableZ.


conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.registerKryoClasses(Array(classOf[WRFVariableZ]))

Exception in thread "main" org.apache.spark.SparkException: Failed to 
register classes with Kryo
at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:114)
Caused by: java.lang.ClassNotFoundException: 
no.uni.computing.io.WRFVariableZ


How can I configure it?

BR,
Patcharee
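
A sketch of one thing to try (not a confirmed fix): for this "Failed to register
classes with Kryo" error, a commonly suggested workaround is to put the jar on the
driver and executor extra classpath in addition to --jars. The main class and app
jar below are placeholders, and the extraClassPath paths must be valid on the
worker nodes:

# Sketch only: com.example.MyApp and myapp.jar are placeholders; mylib.jar is the
# jar mentioned above that contains WRFVariableZ.
spark-submit \
    --jars mylib.jar \
    --conf spark.driver.extraClassPath=mylib.jar \
    --conf spark.executor.extraClassPath=mylib.jar \
    --class com.example.MyApp \
    myapp.jar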




Running spark1.4 inside intellij idea HttpServletResponse - ClassNotFoundException

2015-06-15 Thread Wwh 吴
name := "SparkLeaning"

version := "1.0"

scalaVersion := "2.10.4"
//scalaVersion := "2.11.2"

libraryDependencies ++= Seq(
  //"org.apache.hive" % "hive-jdbc" % "0.13.0"
  //"io.spray" % "spray-can" % "1.3.1",
  //"io.spray" % "spray-routing" % "1.3.1",
  "io.spray" % "spray-testkit" % "1.3.1" % "test",
  "io.spray" %% "spray-json" % "1.2.6",
  "com.typesafe.akka" %% "akka-actor" % "2.3.2",
  "com.typesafe.akka" %% "akka-testkit" % "2.3.2" % "test",
  "org.scalatest" %% "scalatest" % "2.2.0",
  "org.apache.spark" %% "spark-core" % "1.4.0",
  "org.apache.spark" %% "spark-sql" % "1.4.0",
  "org.apache.spark" %% "spark-hive" % "1.4.0",
  "org.apache.spark" %% "spark-mllib" % "1.4.0",
  //"org.apache.hadoop" %% "hadoop-client" % "2.4.0"
  "javax.servlet" % "javax.servlet-api" % "3.0.1"//,
  //"org.eclipse.jetty" %% "jetty-servlet" % "8.1.14.v20131031",
  //"org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016"
  //"org.mortbay.jetty" %% "servlet-api" % "3.0.20100224"
)

object SparkPI {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark Pi")
    conf.setMaster("local")

    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 10 * slices
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}

When running this program, it fails with the error below. Can anyone help?

15/06/15 21:40:08 INFO HttpServer: Starting HTTP Server
Exception in thread "main" java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
  at org.apache.spark.HttpServer.org$apache$spark$HttpServer$$doStart(HttpServer.scala:75)
  at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62)
  at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62)
  at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
  at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
  at org.apache.spark.HttpServer.start(HttpServer.scala:62)
  at org.apache.spark.HttpFileServer.initialize(HttpFileServer.scala:46)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:350)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
  at org.apache.spark.SparkContext.init(SparkContext.scala:424)
  at org.learn.SparkPI$.main(SparkPI.scala:24)
  at org.learn.SparkPI.main(SparkPI.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.ClassNotFoundException: javax.servlet.http.HttpServletResponse
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
  ... 19 more
15/06/15 21:40:08 INFO DiskBlockManager: Shutdown hook called
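
(Not from the original message: a quick diagnostic sketch that checks whether
the servlet API class is loadable at runtime inside the same IntelliJ run
configuration, and if so from which JAR. The object name is hypothetical.)

object WhichServletApi {
  def main(args: Array[String]): Unit = {
    try {
      val cls = Class.forName("javax.servlet.http.HttpServletResponse")
      // Prints the JAR (code source) the class was actually loaded from.
      println(cls.getProtectionDomain.getCodeSource)
    } catch {
      case e: ClassNotFoundException =>
        println("Servlet API is not on the runtime classpath: " + e)
    }
  }
}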
  

Re: Running spark1.4 inside intellij idea HttpServletResponse - ClassNotFoundException

2015-06-15 Thread Tarek Auel
Hey,

I had some similar issues in the past when I used Java 8. Are you using
Java 7 or 8? (It's just an idea, because I had a similar issue.)
On Mon 15 Jun 2015 at 6:52 am Wwh 吴 wwyando...@hotmail.com wrote:

 [quoted original message trimmed; see the full message above]





Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-11 Thread Josh Mahonin
I've only tested this in local mode. To convert it to a full jobs JAR, I
suspect that keeping all of the spark and phoenix dependencies marked as
'provided', and including the Phoenix client JAR in the Spark classpath
would work as well.

Good luck,

Josh
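
(Not from the thread: a hedged sketch of what that setup might look like. The
paths, class name, and application JAR are hypothetical; the idea is to mark
Spark and Phoenix as 'provided' in the build and hand the Phoenix client JAR to
Spark via its extraClassPath settings instead of bundling it.)

$ spark-submit \
    --conf spark.driver.extraClassPath=/opt/phoenix/phoenix-4.4.0-HBase-0.98-client.jar \
    --conf spark.executor.extraClassPath=/opt/phoenix/phoenix-4.4.0-HBase-0.98-client.jar \
    --class com.example.StreamingJob streaming-job-assembly.jar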
   
On Tue, Jun 9, 2015 at 4:40 AM, Jeroen Vlek j.v...@work.nl wrote:
 Hi,

 I posted a question with regards to Phoenix and Spark Streaming on
 StackOverflow [1]. Please find a copy of the question to this email below
 the first stack trace. I also already contacted the Phoenix mailing list
 and tried the suggestion of setting spark.driver.userClassPathFirst.
 Unfortunately that only pushed me further into the dependency hell, which
 I tried to resolve until I hit a wall with an UnsatisfiedLinkError on
 Snappy.

 What I am trying to achieve: To save a stream from Kafka into Phoenix/HBase
 via Spark Streaming. I'm using MapR as a platform and the original
 exception happens both on a 3-node cluster and on the MapR Sandbox (a VM
 for experimentation), in YARN and stand-alone mode. Further experimentation
 (like the saveAsNewHadoopApiFile below) was done only on the sandbox in
 standalone mode.

 Phoenix only supports Spark from 4.4.0 onwards, but I thought I could use
 a naive implementation that creates a new connection for every RDD from
 the DStream in 4.3.1. This resulted in the ClassNotFoundException
 described in [1], so I switched to 4.4.0.

 Unfortunately the saveToPhoenix method is only available in Scala. So I
 did find the suggestion to try it via the saveAsNewHadoopApiFile method
 [2] and an example implementation [3], which I adapted to my own needs.

 However, 4.4.0 + saveAsNewHadoopApiFile raises the same
 ClassNotFoundException, just a slightly different stacktrace:
 java.lang.RuntimeException: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
   at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:58)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:995)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:64)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
   at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:386)
   at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:288)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:171)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1881)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1860)
   at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1860)
   at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
   at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:131)
   at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
   at java.sql.DriverManager.getConnection(DriverManager.java:571)
   at java.sql.DriverManager.getConnection(DriverManager.java:187)
   at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:92)
   at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(Conne

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-11 Thread Jeroen Vlek
 with an UnsatisfiedLinkError on Snappy.

What I am trying to achieve: To save a stream from Kafka into Phoenix/HBase
via Spark Streaming. I'm using MapR as a platform and the original exception
happens both on a 3-node cluster and on the MapR Sandbox (a VM for
experimentation), in YARN and stand-alone mode. Further experimentation (like
the saveAsNewHadoopApiFile below) was done only on the sandbox in standalone
mode.

Phoenix only supports Spark from 4.4.0 onwards, but I thought I could use a
naive implementation that creates a new connection for every RDD from the
DStream in 4.3.1. This resulted in the ClassNotFoundException described in
[1], so I switched to 4.4.0.

Unfortunately the saveToPhoenix method is only available in Scala. So I did
find the suggestion to try it via the saveAsNewHadoopApiFile method [2] and
an example implementation [3], which I adapted to my own needs.

However, 4.4.0 + saveAsNewHadoopApiFile raises the same
ClassNotFoundException, just a slightly different stacktrace:
java.lang.RuntimeException: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
  at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:58)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:995)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
  at org.apache.spark.scheduler.Task.run(Task.scala:64)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
  at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:386)
  at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:288)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:171)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1881)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1860)
  at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1860)
  at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
  at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:131)
  at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
  at java.sql.DriverManager.getConnection(DriverManager.java:571)
  at java.sql.DriverManager.getConnection(DriverManager.java:187)
  at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:92)
  at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:80)
  at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:68)
  at org.apache.phoenix.mapreduce.PhoenixRecordWriter.init(PhoenixRecordWriter.java:49)
  at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:55)
  ... 8 more
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
  at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:457)
  at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:350)
  at org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-10 Thread Josh Mahonin
 that creates a new connection for
 every RDD from the DStream in 4.3.1. This resulted in the
 ClassNotFoundException described in [1], so I switched to 4.4.0.

 Unfortunately the saveToPhoenix method is only available in Scala. So I did
 find the suggestion to try it via the saveAsNewHadoopApiFile method [2] and
 an example implementation [3], which I adapted to my own needs.

 However, 4.4.0 + saveAsNewHadoopApiFile raises the same
 ClassNotFoundException, just a slightly different stacktrace:
 java.lang.RuntimeException: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
   at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:58)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:995)
   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:64)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
   at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:386)
   at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:288)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:171)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1881)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1860)
   at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1860)
   at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
   at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:131)
   at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
   at java.sql.DriverManager.getConnection(DriverManager.java:571)
   at java.sql.DriverManager.getConnection(DriverManager.java:187)
   at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:92)
   at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:80)
   at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:68)
   at org.apache.phoenix.mapreduce.PhoenixRecordWriter.init(PhoenixRecordWriter.java:49)
   at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:55)
   ... 8 more
 Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
   at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:457)
   at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:350)
   at org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47)
   at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:286)
   ... 23 more
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:455)
   ... 26 more
 Caused by: java.lang.UnsupportedOperationException: Unable to find

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-10 Thread Jeroen Vlek
Hi Josh,

Thank you for your effort. Looking at your code, I feel that mine is 
semantically the same, except written in Java. The dependencies in the pom.xml 
all have the scope "provided". The job is submitted as follows:

$ rm spark.log && MASTER=spark://maprdemo:7077 \
  /opt/mapr/spark/spark-1.3.1/bin/spark-submit \
  --jars /home/mapr/projects/customer/lib/spark-streaming-kafka_2.10-1.3.1.jar,/home/mapr/projects/customer/lib/kafka_2.10-0.8.1.1.jar,/home/mapr/projects/customer/lib/zkclient-0.3.jar,/home/mapr/projects/customer/lib/metrics-core-3.1.0.jar,/home/mapr/projects/customer/lib/metrics-core-2.2.0.jar,lib/spark-sql_2.10-1.3.1.jar,/opt/mapr/phoenix/phoenix-4.4.0-HBase-0.98-bin/phoenix-4.4.0-HBase-0.98-client.jar \
  --class nl.work.kafkastreamconsumer.phoenix.KafkaPhoenixConnector \
  KafkaStreamConsumer.jar maprdemo:5181 0 topic jdbc:phoenix:maprdemo:5181 true

The spark-defaults.conf is reverted back to its defaults (i.e. no 
userClassPathFirst). In the catch-block of the Phoenix connection buildup, the 
class path is printed by recursively iterating over the class loaders. The 
first one already prints the phoenix-client jar [1]. It's also very unlikely to 
be a bug in Spark or Phoenix, if your proof-of-concept just works.
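
(Not the original Java code, just a hedged sketch of the kind of class-loader
walk described above: it prints the URLs each loader in the chain can see.)

var cl: ClassLoader = Thread.currentThread().getContextClassLoader
while (cl != null) {
  cl match {
    case u: java.net.URLClassLoader => println(u.getURLs.mkString("\n"))
    case other                      => println("(non-URL class loader) " + other)
  }
  cl = cl.getParent
}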

So if the JAR that contains the offending class is known by the class loader, 
then that might indicate that there's a second JAR providing the same class 
but with a different version, right? 
Yet, the only Phoenix JAR on the whole class path hierarchy is the 
aforementioned phoenix-client JAR. Furthermore, I googled the class in 
question, ClientRpcControllerFactory, and it really only exists in the Phoenix 
project. We're not talking about some low-level AOP Alliance stuff here ;)

Maybe I'm missing some fundamental class loading knowledge, in that case I'd 
be very happy to be enlightened. This all seems very strange.

Cheers,
Jeroen

[1] [file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./spark-streaming-kafka_2.10-1.3.1.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./kafka_2.10-0.8.1.1.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./zkclient-0.3.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./phoenix-4.4.0-HBase-0.98-client.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./spark-sql_2.10-1.3.1.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./metrics-core-3.1.0.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./KafkaStreamConsumer.jar,
 file:/opt/mapr/spark/spark-1.3.1/tmp/app-20150610010512-0001/0/./metrics-core-2.2.0.jar]


On Tuesday, June 09, 2015 11:18:08 AM Josh Mahonin wrote:
 This may or may not be helpful for your classpath issues, but I wanted to
 verify that basic functionality worked, so I made a sample app here:
 
 https://github.com/jmahonin/spark-streaming-phoenix
 
 This consumes events off a Kafka topic using spark streaming, and writes
 out event counts to Phoenix using the new phoenix-spark functionality:
 http://phoenix.apache.org/phoenix_spark.html
 
 It's definitely overkill, and would probably be more efficient to use the
 JDBC driver directly, but it serves as a proof-of-concept.
 
 I've only tested this in local mode. To convert it to a full jobs JAR, I
 suspect that keeping all of the spark and phoenix dependencies marked as
 'provided', and including the Phoenix client JAR in the Spark classpath
 would work as well.
 
 Good luck,
 
 Josh
 
 On Tue, Jun 9, 2015 at 4:40 AM, Jeroen Vlek j.v...@work.nl wrote:
  Hi,
  
  I posted a question with regards to Phoenix and Spark Streaming on
  StackOverflow [1]. Please find a copy of the question to this email below
  the
  first stack trace. I also already contacted the Phoenix mailing list and
  tried
  the suggestion of setting spark.driver.userClassPathFirst. Unfortunately
  that
  only pushed me further into the dependency hell, which I tried to resolve
  until I hit a wall with an UnsatisfiedLinkError on Snappy.
  
  What I am trying to achieve: To save a stream from Kafka into
  Phoenix/Hbase
  via Spark Streaming. I'm using MapR as a platform and the original
  exception
  happens both on a 3-node cluster, as on the MapR Sandbox (a VM for
  experimentation), in YARN and stand-alone mode. Further experimentation
  (like
  the saveAsNewHadoopApiFile below), was done only on the sandbox in
  standalone
  mode.
  
  Phoenix only supports Spark from 4.4.0 onwards, but I thought I could
  use a naive implementation that creates a new connection for
  every RDD from the DStream in 4.3.1.  This resulted in the
  ClassNotFoundException described in [1], so I switched to 4.4.0.
  
  Unfortunately the saveToPhoenix method is only available in Scala. So I
  did
  find the suggestion to try it via the saveAsNewHadoopApiFile method [2]
  and an
  example implementation [3], which I adapted to my own needs.
  
  However, 4.4.0 + saveAsNewHadoopApiFile  raises the same
