RE: ClassNotFoundException while unmarshalling a remote RDD on Spark 1.5.1
Ok, it just seems to be an issue with the syntax of the spark-submit command. It should be:

  spark-submit --queue default \
    --class com.my.Launcher \
    --deploy-mode cluster \
    --master yarn-cluster \
    --driver-java-options "-Dfile.encoding=UTF-8" \
    --jars /home/user/hibernate-validator-5.2.2.Final.jar \
    --driver-class-path hibernate-validator-5.2.2.Final.jar \
    --conf "spark.executor.extraClassPath=hibernate-validator-5.2.2.Final.jar" \
    /home/user/uberjar-job.jar

That is, --jars takes the local path, while --driver-class-path and spark.executor.extraClassPath take the bare file name, since --jars ships the jar into each container's working directory. I also had to add some other jars, such as jboss-logging, to meet the needs of hibernate-validator.

From: PICARD Damien (EXT) AssuResPriSmsAts
Sent: Monday, 11 September 2017 08:53
To: 'user@spark.apache.org'
Subject: ClassNotFoundException while unmarshalling a remote RDD on Spark 1.5.1

Hi!

I'm facing a classloader problem using Spark 1.5.1. I use javax.validation and Hibernate Validator annotations on some of my beans:

  @NotBlank
  @Valid
  private String attribute1;

  @Valid
  private String attribute2;

When Spark tries to unmarshal these beans (after fetching a remote RDD block), I get a ClassNotFoundException:

  17/09/07 09:19:25 INFO storage.BlockManager: Found block rdd_8_1 remotely
  17/09/07 09:19:25 ERROR executor.Executor: Exception in task 3.0 in stage 2.0 (TID 6)
  java.lang.ClassNotFoundException: org.hibernate.validator.constraints.NotBlank
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:700)
    at java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1566)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
    ...
Indeed, it means that the annotation class is not found because it is not on the classpath. Why? I don't know, because I build an uber JAR that contains this class. I suppose that by the time the job tries to unmarshal the RDD, the uber jar is not loaded. So I tried to add the hibernate JAR to the classloader manually, using this spark-submit command:

  spark-submit --queue default \
    --class com.my.Launcher \
    --deploy-mode cluster \
    --master yarn-cluster \
    --driver-java-options "-Dfile.encoding=UTF-8" \
    --jars /home/user/hibernate-validator-5.2.2.Final.jar \
    --driver-class-path /home/user/hibernate-validator-5.2.2.Final.jar \
    --conf "spark.executor.extraClassPath=/home/user/hibernate-validator-5.2.2.Final.jar" \
    /home/user/uberjar-job.jar

Without effect. So, is there a way to add this class to the classloader?

Thank you in advance.

Damien
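A quick way to check which classes a given JVM can actually see is a Class.forName probe — run locally to test the driver classpath, or wrapped in a trivial Spark action to test an executor. A minimal sketch (plain Java; the class name is the one from the thread, and whether the second probe succeeds depends entirely on whether the hibernate-validator jar is on that JVM's classpath):

```java
// Minimal sketch: probe whether a class is visible to this JVM's classloader --
// the same lookup the deserializer performs when it hits the @NotBlank
// annotation while unmarshalling a bean on an executor.
public class ClasspathCheck {
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class is always loadable; the Hibernate Validator annotation
        // only resolves if its jar is on this JVM's classpath.
        System.out.println(isLoadable("java.lang.String"));
        System.out.println(isLoadable("org.hibernate.validator.constraints.NotBlank"));
    }
}
```

Running the probe inside `rdd.map(...)` and collecting the result tells you whether the executors (as opposed to the driver) can resolve the class, which is exactly the distinction the extraClassPath fix addresses.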
Re: ClassNotFoundException for Workers
I've included that in my build file for the fat jar already:

  libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155"
  libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155"
  libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155"

Not sure if I need special configuration?

On Tue, 25 Jul 2017 at 04:17 周康 wrote:

> Ensure com.amazonaws.services.s3.AmazonS3ClientBuilder is in your classpath,
> which includes your application jar and attached executor jars.
>
> 2017-07-20 6:12 GMT+08:00 Noppanit Charassinvichai:
>
>> I have this Spark job which uses an S3 client in mapPartitions, and I get
>> this error:
>>
>>   Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times,
>>   most recent failure: Lost task 0.3 in stage 3.0 (TID 74,
>>   ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
>>   Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
>>     at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:49)
>>     at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:46)
>>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
>>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:99)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> This is my code:
>>
>>   val jsonRows = sqs.mapPartitions(partitions => {
>>     val s3Client = AmazonS3ClientBuilder.standard()
>>       .withCredentials(new DefaultCredentialsProvider).build()
>>
>>     val txfm = new LogLine2Json
>>     val log = Logger.getLogger("parseLog")
>>
>>     partitions.flatMap(messages => {
>>       val sqsMsg = Json.parse(messages)
>>       val bucketName = Json.stringify(sqsMsg("Records")(0)("s3")("bucket")("name")).replace("\"", "")
>>       val key = Json.stringify(sqsMsg("Records")(0)("s3")("object")("key")).replace("\"", "")
>>       val obj = s3Client.getObject(new GetObjectRequest(bucketName, key))
>>       val stream = obj.getObjectContent()
>>
>>       scala.io.Source.fromInputStream(stream).getLines().map(line => {
>>         try {
>>           txfm.parseLine(line)
>>         } catch {
>>           case e: Throwable =>
>>             log.info(line)
>>             "{}"
>>         }
>>       }).filter(line => line != "{}")
>>     })
>>   })
>>
>> This is my build.sbt:
>>
>>   name := "sparrow-to-orc"
>>   version := "0.1"
>>   scalaVersion := "2.11.8"
>>
>>   libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
>>   libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"
>>   libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0" % "provided"
>>   libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" % "provided"
>>
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3" % "provided"
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3" % "provided"
>>   libraryDependencies += "com.cn" %% "sparrow-clf-parser" % "1.1-SNAPSHOT"
>>
>>   libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155"
>>   libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155"
>>   libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155"
>>
>>   libraryDependencies += "com.github.seratch" %% "awscala" % "0.6.+"
>>   libraryDependencies += "com.typesafe.play" %% "play-json" % "2.6.0"
>>   dependencyOverrides ++= Set("com.fasterxml.jackson.core" % "jackson-databind" % "2.6.0")
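Worth noting: the error here is "Could not initialize class" (a NoClassDefFoundError), not ClassNotFoundException. That means the class *was* found, but its static initialization failed on an earlier touch — with the AWS SDK this is often a conflicting transitive dependency such as a mismatched Jackson version, which the dependencyOverrides line in the build already hints at. A small JVM-only sketch (no AWS classes involved) reproduces the mechanics with a deliberately failing static initializer:

```java
// Sketch: reproduce the JVM behavior behind "Could not initialize class".
// The class is on the classpath; its static initializer fails once, and
// every later use reports NoClassDefFoundError instead of the root cause.
public class InitFailureDemo {
    static class Broken {
        // Not a compile-time constant, so reading VALUE triggers class init.
        static final int VALUE = compute();
        static int compute() { throw new RuntimeException("static init failed"); }
    }

    static String access() {
        try {
            return String.valueOf(Broken.VALUE);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First touch surfaces the real failure; later touches only report
        // that the class could not be initialized.
        System.out.println(access()); // ExceptionInInitializerError
        System.out.println(access()); // NoClassDefFoundError
    }
}
```

So the productive debugging step is to find the *first* failure in the executor logs (the ExceptionInInitializerError), which names the real conflict, rather than adjusting which jars are on the classpath.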
Re: ClassNotFoundException for Workers
Ensure com.amazonaws.services.s3.AmazonS3ClientBuilder is in your classpath, which includes your application jar and attached executor jars.

2017-07-20 6:12 GMT+08:00 Noppanit Charassinvichai:

> I have this Spark job which uses an S3 client in mapPartitions, and I get
> this error:
>
>   Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times,
>   most recent failure: Lost task 0.3 in stage 3.0 (TID 74,
>   ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
>   Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
>     at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:49)
>     at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:46)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:99)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> This is my code:
>
>   val jsonRows = sqs.mapPartitions(partitions => {
>     val s3Client = AmazonS3ClientBuilder.standard()
>       .withCredentials(new DefaultCredentialsProvider).build()
>
>     val txfm = new LogLine2Json
>     val log = Logger.getLogger("parseLog")
>
>     partitions.flatMap(messages => {
>       val sqsMsg = Json.parse(messages)
>       val bucketName = Json.stringify(sqsMsg("Records")(0)("s3")("bucket")("name")).replace("\"", "")
>       val key = Json.stringify(sqsMsg("Records")(0)("s3")("object")("key")).replace("\"", "")
>       val obj = s3Client.getObject(new GetObjectRequest(bucketName, key))
>       val stream = obj.getObjectContent()
>
>       scala.io.Source.fromInputStream(stream).getLines().map(line => {
>         try {
>           txfm.parseLine(line)
>         } catch {
>           case e: Throwable =>
>             log.info(line)
>             "{}"
>         }
>       }).filter(line => line != "{}")
>     })
>   })
>
> This is my build.sbt:
>
>   name := "sparrow-to-orc"
>   version := "0.1"
>   scalaVersion := "2.11.8"
>
>   libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
>   libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"
>   libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0" % "provided"
>   libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" % "provided"
>
>   libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3" % "provided"
>   libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3" % "provided"
>   libraryDependencies += "com.cn" %% "sparrow-clf-parser" % "1.1-SNAPSHOT"
>
>   libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155"
>   libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155"
>   libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155"
>
>   libraryDependencies += "com.github.seratch" %% "awscala" % "0.6.+"
>   libraryDependencies += "com.typesafe.play" %% "play-json" % "2.6.0"
>   dependencyOverrides ++= Set("com.fasterxml.jackson.core" % "jackson-databind" % "2.6.0")
>
>   assemblyMergeStrategy in assembly := {
>     case PathList("org", "aopalliance", xs @ _*) => MergeStrategy.last
>     case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
>     case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
>     case PathList("org", "apache", xs @ _*) => MergeStrategy.last
>     case PathList("com", "google", xs @ _*) => MergeStrategy.last
>     case PathList("com", "esotericsoftware", xs @ _*) =>
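Before adjusting the cluster classpath, it is worth confirming that the assembly really bundles the class and its dependencies. A sketch (the jar path is only an assumption — the default sbt-assembly output name for this build):

```shell
# Sketch: list the fat jar's entries and confirm the AWS SDK class file is
# actually inside. The jar path below is illustrative.
unzip -l target/scala-2.11/sparrow-to-orc-assembly-0.1.jar \
  | grep 'com/amazonaws/services/s3/AmazonS3ClientBuilder.class'
```

If the class is present, the "Could not initialize class" error points at a failing static initializer (typically a dependency conflict) rather than a missing jar.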
Re: ClassNotFoundException org.apache.spark.Logging
Hi Marcelo,

Thank you for your help — problem solved as you suggested.

Best regards,
Carlo

> On 5 Aug 2016, at 18:34, Marcelo Vanzin wrote:
>
> On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca wrote:
>>
>> <dependency>
>>   <groupId>org.apache.spark</groupId>
>>   <artifactId>spark-core_2.10</artifactId>
>>   <version>2.0.0</version>
>>   <type>jar</type>
>> </dependency>
>> <dependency>
>>   <groupId>org.apache.spark</groupId>
>>   <artifactId>spark-sql_2.10</artifactId>
>>   <version>2.0.0</version>
>>   <type>jar</type>
>> </dependency>
>> <dependency>
>>   <groupId>org.apache.spark</groupId>
>>   <artifactId>spark-mllib_2.10</artifactId>
>>   <version>1.3.0</version>
>>   <type>jar</type>
>> </dependency>
>
> One of these is not like the others...
>
> --
> Marcelo

--
The Open University is incorporated by Royal Charter (RC 000391), an exempt charity in England & Wales and a charity registered in Scotland (SC 038302). The Open University is authorised and regulated by the Financial Conduct Authority.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: ClassNotFoundException org.apache.spark.Logging
I have also executed:

  mvn dependency:tree | grep log
  [INFO] |  |  +- com.esotericsoftware:minlog:jar:1.3.0:compile
  [INFO] +- log4j:log4j:jar:1.2.17:compile
  [INFO] +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
  [INFO] |  |  +- commons-logging:commons-logging:jar:1.1.3:compile

and the POM reports the above libraries.

Many thanks for your help,
Carlo

On 5 Aug 2016, at 18:17, Carlo.Allocca wrote:

> Please Sean, could you detail the version mismatch?
>
> Many thanks,
> Carlo
>
> On 5 Aug 2016, at 18:11, Sean Owen wrote:
>
>> You also seem to have a version mismatch here.
Re: ClassNotFoundException org.apache.spark.Logging
On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca wrote:
>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-core_2.10</artifactId>
>   <version>2.0.0</version>
>   <type>jar</type>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql_2.10</artifactId>
>   <version>2.0.0</version>
>   <type>jar</type>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-mllib_2.10</artifactId>
>   <version>1.3.0</version>
>   <type>jar</type>
> </dependency>

One of these is not like the others...

--
Marcelo
Re: ClassNotFoundException org.apache.spark.Logging
Please Sean, could you detail the version mismatch?

Many thanks,
Carlo

On 5 Aug 2016, at 18:11, Sean Owen wrote:

> You also seem to have a version mismatch here.
Re: ClassNotFoundException org.apache.spark.Logging
One option is to clone the class in your own project. Experts may have a better solution.

Cheers

On Fri, Aug 5, 2016 at 10:10 AM, Carlo.Allocca wrote:

> Hi Ted,
>
> Thanks for the prompt answer. It is not yet clear to me what I should do.
> How do I fix it?
>
> Many thanks,
> Carlo
>
> On 5 Aug 2016, at 17:58, Ted Yu wrote:
>
>> private[spark] trait Logging {
Re: ClassNotFoundException org.apache.spark.Logging
Hi Ted,

Thanks for the prompt answer. It is not yet clear to me what I should do. How do I fix it?

Many thanks,
Carlo

On 5 Aug 2016, at 17:58, Ted Yu wrote:

> private[spark] trait Logging {
Re: ClassNotFoundException org.apache.spark.Logging
In 2.0, Logging became private:

  private[spark] trait Logging {

FYI

On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca wrote:

> Dear All,
>
> I would like to ask for your help with the following issue:
>
>   java.lang.ClassNotFoundException: org.apache.spark.Logging
>
> I checked, and the class Logging is not present. This is the line of code
> where the exception is thrown:
>
>   final org.apache.spark.mllib.regression.LinearRegressionModel lrModel =
>       LinearRegressionWithSGD.train(a, numIterations, stepSize);
>
> My POM is reported below. What am I doing wrong or missing? How can I fix it?
>
> Many thanks in advance for your support.
>
> Best,
> Carlo
>
> POM:
>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-core_2.10</artifactId>
>   <version>2.0.0</version>
>   <type>jar</type>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql_2.10</artifactId>
>   <version>2.0.0</version>
>   <type>jar</type>
> </dependency>
> <dependency>
>   <groupId>log4j</groupId>
>   <artifactId>log4j</artifactId>
>   <version>1.2.17</version>
>   <scope>test</scope>
> </dependency>
> <dependency>
>   <groupId>org.slf4j</groupId>
>   <artifactId>slf4j-log4j12</artifactId>
>   <version>1.7.16</version>
>   <scope>test</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client</artifactId>
>   <version>2.7.2</version>
> </dependency>
> <dependency>
>   <groupId>junit</groupId>
>   <artifactId>junit</artifactId>
>   <version>4.12</version>
> </dependency>
> <dependency>
>   <groupId>org.hamcrest</groupId>
>   <artifactId>hamcrest-core</artifactId>
>   <version>1.3</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-mllib_2.10</artifactId>
>   <version>1.3.0</version>
>   <type>jar</type>
> </dependency>
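The mismatch Sean and Marcelo point at is the 1.3.0 spark-mllib entry: MLlib 1.3 classes still reference org.apache.spark.Logging, which 2.0 no longer ships as a public class. A minimal fix (sketch) is to align spark-mllib with the other Spark artifacts:

```xml
<!-- Align spark-mllib with spark-core/spark-sql (2.0.0 here); the 1.3.0
     artifact depends on org.apache.spark.Logging, removed from Spark 2.0's
     public API. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.10</artifactId>
  <version>2.0.0</version>
  <type>jar</type>
</dependency>
```

With all Spark artifacts on one version (and one Scala suffix), the Logging class is no longer looked up at all, which is why cloning the class is only a workaround.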
Re: ClassNotFoundException: org.apache.parquet.hadoop.ParquetOutputCommitter
Can you try running the example like this?

  ./bin/run-example sql.RDDRelation

I know there are some jars in the example folders, and running them this way adds them to the classpath.

On Jul 7, 2016 3:47 AM, "kevin" wrote:

> Hi all,
>
> I built Spark with:
>
>   ./make-distribution.sh --name "hadoop2.7.1" --tgz \
>     "-Pyarn,hadoop-2.6,parquet-provided,hive,hive-thriftserver" \
>     -DskipTests -Dhadoop.version=2.7.1
>
> I can run this example:
>
>   ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
>     --master spark://master1:7077 \
>     --driver-memory 1g \
>     --executor-memory 512m \
>     --executor-cores 1 \
>     lib/spark-examples*.jar \
>     10
>
> but I can't run the example org.apache.spark.examples.sql.RDDRelation.
> I get this error:
>
>   16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160707182845-0003/2 is now RUNNING
>   16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160707182845-0003/4 is now RUNNING
>   16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160707182845-0003/3 is now RUNNING
>   16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160707182845-0003/0 is now RUNNING
>   16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160707182845-0003/1 is now RUNNING
>   16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160707182845-0003/5 is now RUNNING
>   16/07/07 18:28:46 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
>   Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/hadoop/ParquetOutputCommitter
>     at org.apache.spark.sql.SQLConf$.<init>(SQLConf.scala:319)
>     at org.apache.spark.sql.SQLConf$.<clinit>(SQLConf.scala)
>     at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:85)
>     at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:77)
>     at main.RDDRelation$.main(RDDRelation.scala:13)
>     at main.RDDRelation.main(RDDRelation.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>   Caused by: java.lang.ClassNotFoundException: org.apache.parquet.hadoop.ParquetOutputCommitter
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 15 more
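Since the distribution was built with -Pparquet-provided, the Parquet classes are expected to come from outside the Spark build at runtime; if nothing on the cluster actually provides them, the SQL examples fail exactly like this. Two hedged options (commands adapted from the commands quoted above; whether your cluster already supplies Parquet jars is an assumption you should check):

```shell
# Option 1: run examples through the helper script, which assembles the
# example classpath for you.
./bin/run-example sql.RDDRelation

# Option 2: rebuild without -Pparquet-provided so the Parquet classes ship
# inside the distribution itself.
./make-distribution.sh --name "hadoop2.7.1" --tgz \
  "-Pyarn,hadoop-2.6,hive,hive-thriftserver" -DskipTests -Dhadoop.version=2.7.1
```

A third alternative is to pass the Parquet jars explicitly via --jars at spark-submit time.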
Re: ClassNotFoundException in RDD.map
Thanks Jakob,

I've looked into the source code and found that I was missing this property:

  spark.repl.class.uri

Setting it solved the problem.

Cheers

2016-03-17 18:14 GMT-03:00 Jakob Odersky:

> The error is very strange indeed; however, without code that reproduces it,
> we can't really provide much help beyond speculation.
>
> One thing that stood out to me immediately is that you say you have an RDD
> of Any where every Any should be a BigDecimal, so why not specify that type
> information? When using Any, a whole class of errors that the typechecker
> could normally catch can slip through.
>
> On Thu, Mar 17, 2016 at 10:25 AM, Dirceu Semighini Filho wrote:
>
>> Hi Ted, thanks for answering.
>> The map is just that; whenever I try anything inside the map it throws this
>> ClassNotFoundException, even if I do map(f => f). What is bothering me is
>> that when I do a take or a first it returns the result, which makes me
>> conclude that the previous code isn't wrong.
>>
>> Kind regards,
>> Dirceu
>>
>> 2016-03-17 12:50 GMT-03:00 Ted Yu:
>>
>>> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>>
>>> Do you mind showing more of your code involving the map()?
>>>
>>> On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho wrote:
>>>
>>>> Hello,
>>>> I found a strange behavior after executing a prediction with MLlib.
>>>> My code returns an RDD[(Any, Double)] where Any is the id of my dataset,
>>>> which is BigDecimal, and Double is the prediction for that line.
>>>>
>>>> When I run myRdd.take(10) it returns ok:
>>>>
>>>>   res16: Array[_ >: (Double, Double) <: (Any, Double)] =
>>>>   Array((1921821857196754403.00,0.1690292052496703),
>>>>   (454575632374427.00,0.16902820241892452),
>>>>   (989198096568001939.00,0.16903432789699502),
>>>>   (14284129652106187990.00,0.16903517653451386),
>>>>   (17980228074225252497.00,0.16903151028332508),
>>>>   (3861345958263692781.00,0.16903056986183976),
>>>>   (17558198701997383205.00,0.1690295450319745),
>>>>   (10651576092054552310.00,0.1690286445174418),
>>>>   (4534494349035056215.00,0.16903303401862327),
>>>>   (5551671513234217935.00,0.16902303368995966))
>>>>
>>>> But when I try to run a map on it:
>>>>
>>>>   myRdd.map(_._1).take(10)
>>>>
>>>> it throws a ClassNotFoundException:
>>>>
>>>>   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>>>>   in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>>>>   72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
>>>>   $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>     at java.lang.Class.forName0(Native Method)
>>>>     at java.lang.Class.forName(Class.java:278)
>>>>     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>>>     ...
Re: ClassNotFoundException in RDD.map
The error is very strange indeed; however, without code that reproduces it, we can't really provide much help beyond speculation.

One thing that stood out to me immediately is that you say you have an RDD of Any where every Any should be a BigDecimal, so why not specify that type information? When using Any, a whole class of errors that the typechecker could normally catch can slip through.

On Thu, Mar 17, 2016 at 10:25 AM, Dirceu Semighini Filho wrote:

> Hi Ted, thanks for answering.
> The map is just that; whenever I try anything inside the map it throws this
> ClassNotFoundException, even if I do map(f => f). What is bothering me is
> that when I do a take or a first it returns the result, which makes me
> conclude that the previous code isn't wrong.
>
> Kind regards,
> Dirceu
>
> 2016-03-17 12:50 GMT-03:00 Ted Yu:
>
>> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>
>> Do you mind showing more of your code involving the map()?
>>
>> On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho wrote:
>>
>>> Hello,
>>> I found a strange behavior after executing a prediction with MLlib.
>>> My code returns an RDD[(Any, Double)] where Any is the id of my dataset,
>>> which is BigDecimal, and Double is the prediction for that line.
>>>
>>> When I run myRdd.take(10) it returns ok:
>>>
>>>   res16: Array[_ >: (Double, Double) <: (Any, Double)] =
>>>   Array((1921821857196754403.00,0.1690292052496703),
>>>   (454575632374427.00,0.16902820241892452),
>>>   (989198096568001939.00,0.16903432789699502),
>>>   (14284129652106187990.00,0.16903517653451386),
>>>   (17980228074225252497.00,0.16903151028332508),
>>>   (3861345958263692781.00,0.16903056986183976),
>>>   (17558198701997383205.00,0.1690295450319745),
>>>   (10651576092054552310.00,0.1690286445174418),
>>>   (4534494349035056215.00,0.16903303401862327),
>>>   (5551671513234217935.00,0.16902303368995966))
>>>
>>> But when I try to run a map on it:
>>>
>>>   myRdd.map(_._1).take(10)
>>>
>>> it throws a ClassNotFoundException:
>>>
>>>   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>>>   in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>>>   72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
>>>   $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:278)
>>>     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     ...
Re: ClassNotFoundException in RDD.map
Hi Ted, thanks for answering. The map really is just that: anything I try inside the map throws this ClassNotFoundException, even map(f => f). What bothers me is that a take or a first returns the result, which makes me conclude that the preceding code isn't wrong.
Kind Regards,
Dirceu

2016-03-17 12:50 GMT-03:00 Ted Yu:
> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>
> Do you mind showing more of your code involving the map() ?
>
> [Dirceu's original message and stack trace are quoted in full in the next message of this thread.]
Re: ClassNotFoundException in RDD.map
bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1

Do you mind showing more of your code involving the map() ?

On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho <dirceu.semigh...@gmail.com> wrote:
> Hello,
> I found a strange behavior after executing a prediction with MLlib.
> My code returns an RDD[(Any, Double)] where Any is the id of my dataset,
> which is BigDecimal, and Double is the prediction for that line.
> When I run
> myRdd.take(10)
> it returns ok:
> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
> Array((1921821857196754403.00,0.1690292052496703),
> (454575632374427.00,0.16902820241892452),
> (989198096568001939.00,0.16903432789699502),
> (14284129652106187990.00,0.16903517653451386),
> (17980228074225252497.00,0.16903151028332508),
> (3861345958263692781.00,0.16903056986183976),
> (17558198701997383205.00,0.1690295450319745),
> (10651576092054552310.00,0.1690286445174418),
> (4534494349035056215.00,0.16903303401862327),
> (5551671513234217935.00,0.16902303368995966))
> But when I try to run some map on it:
> myRdd.map(_._1).take(10)
> it throws a ClassNotFoundException:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:278)
> at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
> at ...
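The `$iwC$$iwC...$$anonfun$1` name is the signature of a closure compiled on the fly by the spark-shell REPL: `take(10)` on the already-computed RDD succeeds because it ships no new closure, while `myRdd.map(_._1)` must serialize a freshly generated REPL class out to the executors, which then cannot load it. A common way around this is to compile the job and run it through spark-submit, so the closure classes travel inside the application jar. A sketch, with hypothetical project, class, and path names:

```shell
# Sketch -- requires a Spark 1.x installation; jar/class names are hypothetical.
# Package the job; closures defined in compiled code end up inside the jar:
sbt package

# Submit: the application jar is distributed to executors, so the anonymous
# functions defined in it resolve on the remote side:
~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit \
  --class com.example.PredictJob \
  --master spark://master:7077 \
  target/scala-2.10/predict-job_2.10-0.1.jar
```

If staying inside the shell, the executors must be able to reach the driver's REPL class server (in Spark 1.x the shell serves its generated classes over HTTP, advertised via `spark.repl.class.uri`); a firewalled or misrouted executor produces exactly this kind of ClassNotFoundException for `$iwC...` classes.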
Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2
You need to make sure this class is accessible to all servers, since it's cluster mode and the driver can be on any of the worker nodes.

On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa wrote:
> Hi,
>
> I'm submitting a spark job like this:
>
>> ~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master
>> spark://machine1:6066 --deploy-mode cluster --jars
>> target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
>> /home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar 1
>> machine2 1000
>
> and in the driver stderr, I get the following exception:
>
>> WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 74, XXX.XXX.XX.XXX):
>> java.lang.ClassNotFoundException: Benchmark$$anonfun$main$1
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:270)
>> at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>
> Note that everything works fine when using deploy-mode as 'client'.
> This is the application that I'm trying to run:
> https://github.com/tdas/spark-streaming-benchmark (this problem also
> happens for non streaming applications)
>
> What can I do to sort this out?
>
> Thanks.
Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2
I found out that by commenting out this line in the application code:

sparkConf.set("spark.executor.extraJavaOptions", " -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300 ")

the exception does not occur anymore. I'm not entirely sure why, but everything goes fine without that line.

Thanks!

On Tue, Dec 29, 2015 at 1:39 PM, Prem Spark wrote:
> You need to make sure this class is accessible to all servers, since it's
> cluster mode and the driver can be on any of the worker nodes.
>
> [The rest of the quoted exchange, including the original question and full stack trace, appears in the previous message of this thread.]
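Two practical takeaways from this thread. First, in `--deploy-mode cluster` the driver is launched on an arbitrary worker, so every path handed to spark-submit must resolve on that machine; placing the application jar on shared storage (or at an identical local path on every node) avoids this class of ClassNotFoundException. Second, executor JVM options are better passed at submit time than set in code after the context is being built. A sketch, reusing the thread's jar name but with hypothetical HDFS paths:

```shell
# Sketch -- requires a Spark standalone cluster and HDFS; paths are hypothetical.
# Make the application jar readable from every node:
hdfs dfs -put target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar /apps/

# In cluster mode the driver runs on a worker, so reference the jar through
# a location that any worker can resolve, and pass executor JVM options on
# the command line instead of via sparkConf.set(...) in the application:
~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit \
  --class Benchmark \
  --master spark://machine1:6066 \
  --deploy-mode cluster \
  --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC" \
  hdfs://machine1:8020/apps/benchmark-app_2.10-0.1-SNAPSHOT.jar 1 machine2 1000
```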
Re: ClassNotFoundException with a uber jar.
I'm not 100% sure, but I don't think a jar within a jar will work without a custom class loader. You can perhaps try "maven-assembly-plugin" or "maven-shade-plugin" to build your uber/fat jar. Both of these build a flattened single jar.
--
Ali

On Nov 26, 2015, at 2:49 AM, Marc de Palol wrote:
> Hi all,
>
> I have an uber jar made with maven; the contents are:
>
> my.org.my.classes.Class
> ...
> lib/lib1.jar // 3rd party libs
> lib/lib2.jar
>
> I'm using this kind of jar for hadoop applications and all works fine.
>
> I added spark libs, scala and everything needed in spark, but when I submit
> this jar to spark I get ClassNotFoundExceptions:
>
> spark-submit --class com.bla.TestJob --driver-memory 512m --master
> yarn-client /home/ble/uberjar.jar
>
> Then when the job is running I get this:
> java.lang.NoClassDefFoundError: com/fasterxml/jackson/datatype/guava/GuavaModule
> // usage of jackson's GuavaModule is expected, as the job is using jackson
> to read json.
>
> This class is contained in lib/jackson-datatype-guava-2.4.3.jar, which is
> in the uberjar.
>
> So I really don't know what I'm missing. I've tried to use --jars and
> SparkContext.addJar (adding the uberjar) with no luck.
>
> Is there any problem using uberjars with inner jars inside?
>
> Thanks!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-with-a-uber-jar-tp25493.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
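The underlying issue is that the JVM's standard URLClassLoader cannot read jar entries nested inside another jar, so classes under `lib/*.jar` are invisible even though they are "in" the uber jar (nested jars work for Hadoop because its launcher unpacks `lib/` itself). A sketch of building and checking a flattened jar; the artifact name and the shade-plugin setup in the pom are assumed, not shown:

```shell
# Sketch -- assumes maven-shade-plugin is bound to the package phase in pom.xml
# and the artifact is named uberjar; both are hypothetical here.
mvn clean package

# Verify the jar is actually flat: the class should appear as a top-level
# entry rather than inside a nested lib/*.jar the classloader cannot read:
jar tf target/uberjar.jar | grep 'com/fasterxml/jackson/datatype/guava/GuavaModule.class'

# Any hits here mean dependencies are still nested, not flattened:
jar tf target/uberjar.jar | grep 'lib/.*\.jar'
```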
Re: ClassNotFoundException even if class is present in Jarfile
It turned out to be a problem with `SerializationUtils` from Apache Commons Lang. There is an open issue where the class throws a `ClassNotFoundException` even when the class is on the classpath, in a multiple-classloader environment: https://issues.apache.org/jira/browse/LANG-1049

We moved away from the library and our Spark job is working fine now. In the end, the issue was not related to Spark.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-even-if-class-is-present-in-Jarfile-tp25254p25268.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: ClassNotFoundException even if class is present in Jarfile
Where is the exception thrown (full stack trace)? How are you running your application, via spark-submit or spark-shell?

On Tue, Nov 3, 2015 at 1:43 AM, hveiga wrote:
> Hello,
>
> I am facing an issue where I cannot run my Spark job in a cluster
> environment (standalone or EMR), but it works successfully if I run it
> locally using local[*] as master.
>
> I am getting ClassNotFoundException: com.mycompany.folder.MyObject on the
> slave executors. I don't really understand why this is happening, since I
> have uncompressed the jarfile to make sure that the class is present inside
> (both .java and .class) and all the rest of the classes are being loaded
> fine.
>
> Also, I would like to mention something weird that might be related, but
> I'm not sure. There are two packages inside my jarfile that are called the
> same but with different casing:
>
> - com.mycompany.folder.MyObject
> - com.myCompany.something.Else
>
> Could that be the reason?
>
> Also, I have tried adding my jarfiles in all the ways I could find
> (sparkConf.setJars(...), sparkContext.addJar(...), spark-submit opt --jars,
> ...) but none of them actually worked.
>
> I am using Apache Spark 1.5.0, Java 7, sbt 0.13.7, scala 2.10.5.
>
> Thanks a lot,
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-even-if-class-is-present-in-Jarfile-tp25254.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com
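One quick local check for the casing oddity raised above: list the jar's entries and look for package paths that collide when case is folded. Such collisions break extraction on case-insensitive filesystems (e.g. macOS, Windows) and are at least a packaging smell worth ruling out. A sketch using a fabricated listing; in practice the listing would come from `jar tf myapp.jar`:

```shell
# Fabricated jar listing standing in for the output of `jar tf myapp.jar`:
cat > listing.txt <<'EOF'
com/mycompany/folder/MyObject.class
com/myCompany/something/Else.class
com/mycompany/folder/Helper.class
EOF

# Take the top two path components of each entry and print every line that
# collides with another when case is ignored (GNU uniq -D prints all
# members of each duplicate group, -i compares case-insensitively):
cut -d/ -f1-2 listing.txt | sort -f | uniq -Di
```

On the fabricated listing this flags both `com/myCompany` and `com/mycompany`, confirming a case-folding collision between the two packages.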
Re: ClassNotFoundException for Kryo serialization
Now I am running up against some other problem while trying to schedule tasks:

15/05/01 22:32:03 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2419)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1380)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

I verified that the same configuration works without using Kryo serialization.

On Fri, May 1, 2015 at 9:44 AM, Akshat Aranya aara...@gmail.com wrote:
I cherry-picked the fix for SPARK-5470 and the problem has gone away.
[The earlier messages of this thread, including the original question and its full stack trace, are reproduced below.]
Re: ClassNotFoundException for Kryo serialization
bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow

So the above class is in the jar which was in the classpath? Can you tell us a bit more about Schema$MyRow?

On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:
Hi,
I'm getting a ClassNotFoundException at the executor when trying to register a class for Kryo serialization:

java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:243)
at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:254)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:257)
at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:182)
at org.apache.spark.executor.Executor.<init>(Executor.scala:87)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedExecutorBackend.scala:61)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.aroundReceive(CoarseGrainedExecutorBackend.scala:36)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.spark.SparkException: Failed to load class to register with Kryo
at org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:66)
at org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:61)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.serializer.KryoSerializer.<init>(KryoSerializer.scala:61)
... 28 more
Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:63)

I have verified that when the executor process is launched, my jar is in the classpath of the command line of the executor. I expect the class to be found by the default classloader being used at KryoSerializer.scala:63.
Any ideas?
Re: ClassNotFoundException for Kryo serialization
Yes, this class is present in the jar that was loaded in the classpath of the executor Java process -- it wasn't even lazily added as a part of the task execution. Schema$MyRow is a protobuf-generated class.

After doing some digging around, I think I might be hitting up against SPARK-5470, the fix for which hasn't been merged into 1.2, as far as I can tell.

On Fri, May 1, 2015 at 9:05 AM, Ted Yu yuzhih...@gmail.com wrote:
bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow
So the above class is in the jar which was in the classpath? Can you tell us a bit more about Schema$MyRow?
[The rest of the quoted exchange, including the full stack trace, appears in the previous message of this thread.]
Re: ClassNotFoundException for Kryo serialization
I cherry-picked the fix for SPARK-5470 and the problem has gone away.

On Fri, May 1, 2015 at 9:15 AM, Akshat Aranya aara...@gmail.com wrote:
Yes, this class is present in the jar that was loaded in the classpath of the executor Java process -- it wasn't even lazily added as a part of the task execution. Schema$MyRow is a protobuf-generated class. After doing some digging around, I think I might be hitting up against SPARK-5470, the fix for which hasn't been merged into 1.2, as far as I can tell.

On Fri, May 1, 2015 at 9:05 AM, Ted Yu yuzhih...@gmail.com wrote:
bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow
So the above class is in the jar which was on the classpath? Can you tell us a bit more about Schema$MyRow?

On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:
Hi, I'm getting a ClassNotFoundException at the executor when trying to register a class for Kryo serialization: [quoted stack trace trimmed]
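The failure path in the trace above (SparkEnv.create, then KryoSerializer construction, then Class.forName) corresponds to class registration done through SparkConf. Here is a minimal sketch of such a registration; MyRow stands in for the poster's protobuf-generated com.example.Schema$MyRow, and all names and paths are illustrative, not taken from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the protobuf-generated com.example.Schema$MyRow.
case class MyRow(id: Long, payload: String)

object KryoRegistrationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-registration-sketch")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // The registered class names are resolved with Class.forName when each
      // executor's SparkEnv builds its KryoSerializer -- exactly the point
      // where the ClassNotFoundException above is thrown if the class is not
      // visible to the classloader used there (the SPARK-5470 bug).
      .registerKryoClasses(Array(classOf[MyRow]))
    val sc = new SparkContext(conf)
    // ... job body ...
    sc.stop()
  }
}
```

Until a fix like the cherry-picked SPARK-5470 patch is in place, the practical alternatives reported in these threads are putting the jar on the executor classpath at JVM startup (spark.executor.extraClassPath) or skipping registration entirely.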
Re: ClassNotFoundException
Hi Kevin, yes I can test it. Does that mean I have to build Spark from the git repository? Ralph

On 17.03.15 at 02:59, Kevin (Sangwoo) Kim wrote: Hi Ralph, It seems like the https://issues.apache.org/jira/browse/SPARK-6299 issue, which I'm working on. I submitted a PR for it; would you test it? Regards, Kevin
Re: ClassNotFoundException
Hi Ralph, It seems like the https://issues.apache.org/jira/browse/SPARK-6299 issue, which I'm working on. I submitted a PR for it; would you test it? Regards, Kevin

On Tue, Mar 17, 2015 at 1:11 AM Ralph Bergmann ra...@dasralph.de wrote: Hi, I want to try the JavaSparkPi example[1] on a remote Spark server but I get a ClassNotFoundException. When I run it locally it works, but not remotely. I added the spark-core lib as a dependency. Do I need more? Any ideas? Thanks Ralph

[1] https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/JavaSparkPi.java
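For context on the dependency question: a minimal sbt setup for an example like JavaSparkPi typically needs only spark-core at compile time. The snippet below is an illustrative sketch; the version number and project name are assumptions, not from the thread:

```scala
// build.sbt -- illustrative sketch
name := "spark-pi-example"

scalaVersion := "2.10.4"

// Mark Spark "provided" if you launch with spark-submit (the cluster supplies
// Spark's own classes at runtime). Note that the dependency only fixes
// compilation: a ClassNotFoundException on a remote master is usually about
// the *application* jar never reaching the executors, which
// SparkConf.setJars or spark-submit's primary-jar handling addresses.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
```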
Re: ClassNotFoundException when registering classes with Kryo
Thanks for the notification! For now, I'll use the Kryo serializer without registering classes until the bug fix has been merged into the next version of Spark (I guess that will be 1.3, right?). arun

On Sun, Feb 1, 2015 at 10:58 PM, Shixiong Zhu zsxw...@gmail.com wrote: It's a bug that has been fixed in https://github.com/apache/spark/pull/4258 but not yet been merged. Best Regards, Shixiong Zhu

2015-02-02 10:08 GMT+08:00 Arun Lists lists.a...@gmail.com: Here is the relevant snippet of code in my main program:
===
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.kryo.registrationRequired", "true")
val summaryDataClass = classOf[SummaryData]
val summaryViewClass = classOf[SummaryView]
sparkConf.registerKryoClasses(Array(summaryDataClass, summaryViewClass))
===
I get the following error:
Exception in thread "main" java.lang.reflect.InvocationTargetException ...
Caused by: org.apache.spark.SparkException: Failed to load class to register with Kryo ...
Caused by: java.lang.ClassNotFoundException: com.dtex.analysis.transform.SummaryData
Note that the class in question, SummaryData, is in the same package as the main program and hence in the same jar. What do I need to do to make this work? Thanks, arun
Re: ClassNotFoundException when registering classes with Kryo
It's a bug that has been fixed in https://github.com/apache/spark/pull/4258 but not yet been merged. Best Regards, Shixiong Zhu

2015-02-02 10:08 GMT+08:00 Arun Lists lists.a...@gmail.com: Here is the relevant snippet of code in my main program:
===
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.kryo.registrationRequired", "true")
val summaryDataClass = classOf[SummaryData]
val summaryViewClass = classOf[SummaryView]
sparkConf.registerKryoClasses(Array(summaryDataClass, summaryViewClass))
===
I get the following error:
Exception in thread "main" java.lang.reflect.InvocationTargetException ...
Caused by: org.apache.spark.SparkException: Failed to load class to register with Kryo ...
Caused by: java.lang.ClassNotFoundException: com.dtex.analysis.transform.SummaryData
Note that the class in question, SummaryData, is in the same package as the main program and hence in the same jar. What do I need to do to make this work? Thanks, arun
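Arun's interim workaround (Kryo without mandatory registration) amounts to leaving spark.kryo.registrationRequired at its default. A sketch, with the class names taken from his snippet and the app name assumed:

```scala
import org.apache.spark.SparkConf

// Keep Kryo as the serializer but do not require registration, so the broken
// register-at-startup path is never taken. Unregistered classes still go
// through Kryo; they just pay the cost of writing out the full class name
// with each serialized object.
val sparkConf = new SparkConf()
  .setAppName("summary-job")  // hypothetical app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "false")  // the default
// No sparkConf.registerKryoClasses(...) call until the fix is merged.
```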
RE: ClassNotFoundException in standalone mode
I finally managed to get the example working; here are the details, which may help other users. I have 2 Windows nodes for the test system, PN01 and PN02. Both have the same shared drive S: (it is mapped to C:\source on PN02). If I run the worker and master from S:\spark-1.1.0-bin-hadoop2.4, then running the simple test fails with the ClassNotFoundException (even if there is only one node, which hosts both the master and the worker). If I run the workers and masters from the local drive (C:\source\spark-1.1.0-bin-hadoop2.4), then the simple test runs ok (with one or two nodes). I haven't found why the class fails to load from the shared drive (I checked the permissions and they look ok), but at least the cluster is working now. If anyone has experience running Spark from a Windows shared drive, any advice is welcome! Thanks, Benoit.

PS: Yes thanks Angel, I did check that:
s:\spark\simple> %JAVA_HOME%\bin\jar tvf s:\spark\simple\target\scala-2.10\simple-project_2.10-1.0.jar
  299 Thu Nov 20 17:29:40 GMT 2014 META-INF/MANIFEST.MF
 1070 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$2.class
 1350 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$main$1.class
 2581 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$.class
 1070 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$1.class
  710 Thu Nov 20 17:29:40 GMT 2014 SimpleApp.class

From: angel2014 [mailto:angel.alvarez.pas...@gmail.com] Sent: Friday, November 21, 2014 3:16 AM To: u...@spark.incubator.apache.org Subject: Re: ClassNotFoundException in standalone mode

Can you make sure the class SimpleApp$$anonfun$1 is included in your app jar?

2014-11-20 18:19 GMT+01:00 Benoit Pasquereau [via Apache Spark User List] [hidden email]: Hi Guys, I'm having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows Server 2008). A very simple program runs fine in local mode but fails in standalone mode.
Here is the error:
14/11/20 17:01:53 INFO DAGScheduler: Failed to run count at SimpleApp.scala:22
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, UK-RND-PN02.actixhost.eu): java.lang.ClassNotFoundException: SimpleApp$$anonfun$1
java.net.URLClassLoader$1.run(URLClassLoader.java:202)

I have added the jar to the SparkConf() to be on the safe side and it appears in standard output (copied after the code):

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import java.net.URLClassLoader

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "S:\\spark-1.1.0-bin-hadoop2.4\\README.md"
    val conf = new SparkConf()//.setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))
      .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")
      //.setMaster("local[4]")
      .setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is: " + url.getFile))
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}

Simple-project is in the executor classpath list:
14/11/20 17:01:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Executor classpath is:/S:/spark/simple/
Executor classpath is:/S:/spark/simple/target/scala-2.10/simple-project_2.10-1.0.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/conf/
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar
Executor classpath is:/S:/spark/simple/
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.1.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-core-3.2.2.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.1.jar
Executor classpath is:/S:/spark/simple/

Would you have any idea how I could investigate further? Thanks! Benoit.

PS: I could attach a debugger to the Worker where the ClassNotFoundException happens, but it is a bit painful.
Re: ClassNotFoundException in standalone mode
Can you make sure the class SimpleApp$$anonfun$1 is included in your app jar?

2014-11-20 18:19 GMT+01:00 Benoit Pasquereau [via Apache Spark User List] ml-node+s1001560n19391...@n3.nabble.com: Hi Guys, I'm having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows Server 2008). A very simple program runs fine in local mode but fails in standalone mode. [quoted code and logs trimmed]
Re: ClassNotFoundException in standalone mode
Looks like it cannot find the class or jar on your driver machine. Are you sure that the corresponding jar file exists on the driver machine rather than on your development machine?

2014-11-21 11:16 GMT+08:00 angel2014 angel.alvarez.pas...@gmail.com: Can you make sure the class SimpleApp$$anonfun$1 is included in your app jar?

2014-11-20 18:19 GMT+01:00 Benoit Pasquereau [via Apache Spark User List] [hidden email]: Hi Guys, I'm having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows Server 2008). A very simple program runs fine in local mode but fails in standalone mode. [quoted code and logs trimmed]
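The commented-out setJars line in Benoit's own code points at the usual fix for this standalone-mode failure: hand the application jar to SparkConf so the master ships it to the workers. A sketch, with a local-drive path assumed per his follow-up that the shared S: drive was the problem:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SimpleAppWithJar {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")
      .setAppName("Simple Application")
      // Ship the application jar so workers can load the anonymous-function
      // classes (SimpleApp$$anonfun$1 etc.). A local-drive path is assumed
      // here, matching the observation that the shared S: drive failed.
      .setJars(Seq("C:\\source\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))
    val sc = new SparkContext(conf)
    val logData = sc.textFile("C:\\source\\spark-1.1.0-bin-hadoop2.4\\README.md", 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    println("Lines with a: " + numAs)
    sc.stop()
  }
}
```

With spark-submit the primary jar is distributed automatically, so `spark-submit --master spark://UK-RND-PN02.actixhost.eu:7077 --class SimpleApp simple-project_2.10-1.0.jar` achieves the same thing without the setJars call.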
Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Hi, Yes, the error still occurs when we replace the lambdas with named functions (same error traces as in previous posts).
Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Note that running a simple map+reduce job on the same hdfs files with the same installation works fine: Did you call collect() on the totalLength? Otherwise nothing has actually executed.
Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Oh, I'm sorry... reduce is also an action, so the job did execute. On Wed, Jul 16, 2014 at 3:37 PM, Michael Armbrust mich...@databricks.com wrote: Note that running a simple map+reduce job on the same hdfs files with the same installation works fine: Did you call collect() on the totalLength? Otherwise nothing has actually executed.
Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Hi Michael, Thanks for your reply. Yes, the reduce triggered the actual execution; I got a total length (totalLength: 95068762, for the record).
Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Hmm, it could be some weirdness with classloaders / Mesos / Spark SQL? I'm curious whether you would hit an error if there were no lambda functions involved, perhaps if you load the data using jsonFile or parquetFile. Either way, I'd file a JIRA. Thanks!

On Jul 16, 2014 6:48 PM, Svend svend.vanderve...@gmail.com wrote: Hi Michael, Thanks for your reply. Yes, the reduce triggered the actual execution; I got a total length (totalLength: 95068762, for the record).
Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)
Hi Tobias, Regarding my comment on closure serialization: I was discussing it with my fellow Sparkers here and I totally overlooked the fact that you need the class files to de-serialize the closures (or whatever) on the workers, so you always need the jar file delivered to the workers in order for it to work. The Spark REPL works differently: it uses some dark magic to send the working session to the workers. -kr, Gerard.

On Wed, May 21, 2014 at 2:47 PM, Gerard Maas gerard.m...@gmail.com wrote: Hi Tobias, I was curious about this issue and tried to run your example on my local Mesos. I was able to reproduce your issue using your current config:

[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2)
org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

Creating a simple jar from the job and providing it through the configuration seems to solve it:

val conf = new SparkConf()
  .setMaster("mesos://my_ip:5050/")
  .setJars(Seq("/sparkexample/target/scala-2.10/sparkexample_2.10-0.1.jar"))
  .setAppName("SparkExamplesMinimal")

Resulting in:

14/05/21 12:03:45 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
14/05/21 12:03:45 INFO scheduler.DAGScheduler: Stage 1 (count at SparkExamplesMinimal.scala:50) finished in 1.120 s
14/05/21 12:03:45 INFO spark.SparkContext: Job finished: count at SparkExamplesMinimal.scala:50, took 1.177091435 s
count: 100

Why the closure serialization does not work with Mesos is beyond my current knowledge. Would be great to hear from the experts (cross-posting to dev for that). -kr, Gerard.
On Wed, May 21, 2014 at 11:51 AM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, I have set up a cluster with Mesos (backed by Zookeeper) with three master and three slave instances. I set up Spark (git HEAD) for use with Mesos according to this manual: http://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html Using the spark-shell, I can connect to this cluster and do simple RDD operations, but the same code in a Scala class and executed via sbt run-main works only partially. (That is, count() works, count() after flatMap() does not.) Here is my code: https://gist.github.com/tgpfeiffer/7d20a4d59ee6e0088f91 The file SparkExamplesScript.scala, when pasted into spark-shell, outputs the correct count() for the parallelized list comprehension, as well as for the flatMapped RDD. The file SparkExamplesMinimal.scala contains exactly the same code, and also the MASTER configuration and the Spark Executor are the same. However, while the count() for the parallelized list is displayed correctly, I receive the following error when asking for the count() of the flatMapped RDD:

14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting Stage 1 (FlatMappedRDD[1] at flatMap at SparkExamplesMinimal.scala:34), which has no missing parents
14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting 8 missing tasks from Stage 1 (FlatMappedRDD[1] at flatMap at SparkExamplesMinimal.scala:34)
14/05/21 09:47:49 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 8 tasks
14/05/21 09:47:49 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 8 on executor 20140520-102159-2154735808-5050-1108-1: mesos9-1 (PROCESS_LOCAL)
14/05/21 09:47:49 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 1779147 bytes in 37 ms
14/05/21 09:47:49 WARN scheduler.TaskSetManager: Lost TID 8 (task 1.0:0)
14/05/21 09:47:49 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
 at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at
Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)
Hi Tobias, On Wed, May 21, 2014 at 5:45 PM, Tobias Pfeiffer t...@preferred.jp wrote: first, thanks for your explanations regarding the jar files! No prob :-) On Thu, May 22, 2014 at 12:32 AM, Gerard Maas gerard.m...@gmail.com wrote: I was discussing it with my fellow Sparkers here and I totally overlooked the fact that you need the class files to de-serialize the closures (or whatever) on the workers, so you always need the jar file delivered to the workers in order for it to work. So the closure as a function is serialized, sent across the wire, deserialized there, and *still* you need the class files? (I am not sure I understand what is actually sent over the network then. Does that serialization only contain the values that I close over?) I also had that mental lapse. Serialization refers to converting object (not class) state (current values) into a byte stream, and de-serialization restores the bytes from the wire into a seemingly identical object at the receiving side (except for transient variables). For that, it requires the class definition of that object to know what it needs to instantiate, so yes, the compiled classes need to be given to the Spark driver, and it will take care of dispatching them to the workers (much better than in the old RMI days ;-) If I understand correctly what you are saying, then the documentation at https://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html (list item 8) needs to be extended quite a bit, right? The Mesos docs have been recently updated here: https://github.com/apache/spark/pull/756/files Don't know where the latest version from master is built/available. -kr, Gerard.
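Gerard's explanation (serialization carries object state plus a class name, while the class definition itself must already be loadable on the receiving side) can be illustrated with plain JVM serialization, independent of Spark's internals. A self-contained sketch; the Greeting class is made up for illustration:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Serialization captures only the *state* of an object (its field values)
// plus the class *name* -- not the class definition. To rebuild the object,
// the receiving JVM must be able to load the class itself, which is why
// executors need the application jar even though closures are "sent over
// the wire".
case class Greeting(who: String) extends Serializable

object SerializationDemo {
  def main(args: Array[String]): Unit = {
    val out = new ByteArrayOutputStream()
    new ObjectOutputStream(out).writeObject(Greeting("workers"))
    val bytes = out.toByteArray  // field values + recorded class name

    // readObject resolves the recorded class name against the local
    // classpath; if the class is missing there, this is exactly where a
    // ClassNotFoundException like the ones in this thread is thrown.
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    val restored = in.readObject().asInstanceOf[Greeting]
    println(restored.who)  // prints "workers"
  }
}
```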
Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)
Here's the 1.0.0rc9 version of the docs:
https://people.apache.org/~pwendell/spark-1.0.0-rc9-docs/running-on-mesos.html

I refreshed them with the goal of steering users more towards prebuilt packages than relying on compiling from source, plus improving overall formatting and clarity, but not otherwise modifying the content. I don't expect any changes for rc10.

It does seem like an issue, though, that classpath problems are preventing that from running. Just to check, have you given the exact same jar a shot when running against a standalone cluster? If it works in standalone, I think that's good evidence that there's an issue with the Mesos classloaders in master.

I'm running into a similar issue with classpaths failing on Mesos but working in standalone, but I haven't coherently written up my observations yet, so I haven't sent them to this list. I'd almost gotten to the point where I thought that my custom code needed to be included in the SPARK_EXECUTOR_URI, but that can't possibly be correct. The Spark workers that are launched on Mesos slaves should start with the Spark core jars and then transparently get classes from custom code over the network, or at least that's how I thought it should work. For those who have been using Mesos in previous releases, you've never had to do that before, have you?

On Wed, May 21, 2014 at 3:30 PM, Gerard Maas gerard.m...@gmail.com wrote:
> [...]
Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)
Hi Andrew,

Thanks for the current doc.

> I'd almost gotten to the point where I thought that my custom code needed
> to be included in the SPARK_EXECUTOR_URI, but that can't possibly be
> correct. The Spark workers that are launched on Mesos slaves should start
> with the Spark core jars and then transparently get classes from custom
> code over the network, or at least that's how I thought it should work.
> For those who have been using Mesos in previous releases, you've never had
> to do that before, have you?

Regarding the delivery of the custom job code to Mesos, we have been using 'ADD_JARS' (on the command line) or 'SparkConf.setJars(Seq[String])' with a fat jar packing all dependencies. That works as well on the Spark 'standalone' cluster, but we deploy mostly on Mesos, so I couldn't say about classloading differences between the two.

-greetz, Gerard.
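Gerard's `setJars` approach, sketched with Spark's Java API (a sketch only: the class name, master URL, and jar path are examples, and it needs spark-core on the compile classpath):

```java
// Sketch: driver-side configuration that ships a fat jar to the workers,
// equivalent to setting ADD_JARS on the command line.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SubmitWithJars {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("my-job")                           // example name
            .setMaster("mesos://host:5050")                 // or spark://... for standalone
            .setJars(new String[] { "/path/to/assembly-fat.jar" }); // example path
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job code: the workers fetch the jar from the driver before
        // deserializing any closures defined in it ...
        sc.stop();
    }
}
```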
Re: ClassNotFoundException
I just ran into the same problem. I will respond if I find out how to fix it.