RE: ClassNotFoundException while unmarshalling a remote RDD on Spark 1.5.1

2017-09-12 Thread PICARD Damien
OK, it just seems to be an issue with the syntax of the spark-submit command. 
It should be:

spark-submit --queue default \
--class com.my.Launcher \
--deploy-mode cluster \
--master yarn-cluster \
--driver-java-options "-Dfile.encoding=UTF-8" \
--jars /home/user/hibernate-validator-5.2.2.Final.jar \
--driver-class-path hibernate-validator-5.2.2.Final.jar \
--conf "spark.executor.extraClassPath=hibernate-validator-5.2.2.Final.jar" \
/home/user/uberjar-job.jar

I also have to add some other jars, like jboss-logging, to meet the needs of 
hibernate-validator.
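
A quick way to confirm that the jar really reached the executors is to probe the 
class from inside a task. This is only a sketch (the helper name is made up and it 
assumes an existing SparkContext named sc); the class name is the one from the stack 
trace in the original message below:

import org.apache.spark.SparkContext

def classVisibleOnExecutors(sc: SparkContext, className: String): Boolean = {
  // Run one small task per default partition and try to load the class there.
  sc.parallelize(1 to sc.defaultParallelism, sc.defaultParallelism)
    .map { _ =>
      try { Class.forName(className); true }
      catch { case _: ClassNotFoundException => false }
    }
    .collect()
    .forall(identity)
}

// classVisibleOnExecutors(sc, "org.hibernate.validator.constraints.NotBlank")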

From: PICARD Damien (EXT) AssuResPriSmsAts
Sent: Monday, September 11, 2017 08:53
To: 'user@spark.apache.org'
Subject: ClassNotFoundException while unmarshalling a remote RDD on Spark 1.5.1

Hi !

I'm facing a classloader problem using Spark 1.5.1.

I use javax.validation and Hibernate Validator annotations on some of my beans:

  @NotBlank
  @Valid
  private String attribute1;

  @Valid
  private String attribute2;

When Spark tries to unmarshal these beans (after fetching a remote RDD block), I get 
this ClassNotFoundException:
17/09/07 09:19:25 INFO storage.BlockManager: Found block rdd_8_1 remotely
17/09/07 09:19:25 ERROR executor.Executor: Exception in task 3.0 in stage 2.0 
(TID 6)
java.lang.ClassNotFoundException: org.hibernate.validator.constraints.NotBlank
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:700)
at java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1566)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
   ...

Indeed, it means that the annotation class is not found because it is not on 
the classpath. Why? I don't know, because I build an uber JAR that contains this 
class. I suppose that at the time the job tries to unmarshal the RDD, the uber 
jar is not loaded yet.

So I tried to add the hibernate-validator JAR to the classloader manually, using this 
spark-submit command:

spark-submit --queue default \
--class com.my.Launcher \
--deploy-mode cluster \
--master yarn-cluster \
--driver-java-options "-Dfile.encoding=UTF-8" \
--jars /home/user/hibernate-validator-5.2.2.Final.jar \
--driver-class-path /home/user/hibernate-validator-5.2.2.Final.jar \
--conf 
"spark.executor.extraClassPath=/home/user/hibernate-validator-5.2.2.Final.jar" \
/home/user/uberjar-job.jar

Without effect. So, is there a way to add this class to the classloader?

Thank you in advance.

Damien




Re: ClassNotFoundException for Workers

2017-07-31 Thread Noppanit Charassinvichai
I've included that in my build file for the fat jar already.


libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155"

Not sure if I need special configuration?

On Tue, 25 Jul 2017 at 04:17 周康  wrote:

> Ensure com.amazonaws.services.s3.AmazonS3ClientBuilder is in your classpath,
> which includes your application jar and the jars attached to the executors.
>
> 2017-07-20 6:12 GMT+08:00 Noppanit Charassinvichai :
>
>> I have this spark job which is using S3 client in mapPartition. And I get
>> this error
>>
>> Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times,
>> most recent failure: Lost task 0.3 in stage 3.0 (TID 74,
>> ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
>> Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
>> +details
>> Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times,
>> most recent failure: Lost task 0.3 in stage 3.0 (TID 74,
>> ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
>> Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
>> at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:49)
>> at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:46)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>> at org.apache.spark.scheduler.Task.run(Task.scala:99)
>> at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> This is my code
>> val jsonRows = sqs.mapPartitions(partitions => {
>>   val s3Client = AmazonS3ClientBuilder.standard().withCredentials(new
>> DefaultCredentialsProvider).build()
>>
>>   val txfm = new LogLine2Json
>>   val log = Logger.getLogger("parseLog")
>>
>>   partitions.flatMap(messages => {
>> val sqsMsg = Json.parse(messages)
>> val bucketName =
>> Json.stringify(sqsMsg("Records")(0)("s3")("bucket")("name")).replace("\"",
>> "")
>> val key =
>> Json.stringify(sqsMsg("Records")(0)("s3")("object")("key")).replace("\"",
>> "")
>> val obj = s3Client.getObject(new GetObjectRequest(bucketName,
>> key))
>> val stream = obj.getObjectContent()
>>
>> scala.io.Source.fromInputStream(stream).getLines().map(line => {
>>   try {
>> txfm.parseLine(line)
>>   }
>>   catch {
>> case e: Throwable => {
>>   log.info(line); "{}";
>> }
>>   }
>> }).filter(line => line != "{}")
>>   })
>> })
>>
>> This is my build.sbt
>>
>> name := "sparrow-to-orc"
>>
>> version := "0.1"
>>
>> scalaVersion := "2.11.8"
>>
>> libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" %
>> "provided"
>> libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" %
>> "provided"
>> libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0" %
>> "provided"
>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>> % "provided"
>>
>> libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3" %
>> "provided"
>> libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3" %
>> "provided"
>> libraryDependencies += "com.cn" %% "sparrow-clf-parser" % "1.1-SNAPSHOT"
>>
>> libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155"
>> libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155"
>> libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155"
>>
>> libraryDependencies += "com.github.seratch" %% "awscala" % "0.6.+"
>> libraryDependencies += "com.typesafe.play" %% "play-json" % "2.6.0"
>> dependencyOverrides ++= Set("com.fasterxml.jackson.core" %
>> "jackson-databind" % "2.6.0")
>>
>>
>>
>> 

Re: ClassNotFoundException for Workers

2017-07-25 Thread 周康
Ensure com.amazonaws.services.s3.AmazonS3ClientBuilder is in your classpath,
which includes your application jar and the jars attached to the executors.
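
One way to get those jars onto every executor explicitly is to list them on the 
SparkConf as well, instead of relying only on the assembly. This is a sketch, not 
the poster's actual setup: the paths are placeholders and only the app name is taken 
from the build.sbt quoted below.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("sparrow-to-orc")
  // Jars listed here are shipped to the cluster and added to every executor's
  // classpath. The paths below are placeholders.
  .setJars(Seq(
    "/path/to/aws-java-sdk-core-1.11.155.jar",
    "/path/to/aws-java-sdk-s3-1.11.155.jar"))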

2017-07-20 6:12 GMT+08:00 Noppanit Charassinvichai :

> I have this spark job which is using S3 client in mapPartition. And I get
> this error
>
> Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 3.0 (TID 74,
> ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
> Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
> +details
> Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 3.0 (TID 74,
> ip-10-90-78-177.ec2.internal, executor 11): java.lang.NoClassDefFoundError:
> Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
> at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:49)
> at SparrowOrc$$anonfun$1.apply(sparrowOrc.scala:46)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$
> anonfun$apply$23.apply(RDD.scala:796)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$
> anonfun$apply$23.apply(RDD.scala:796)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(
> MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(
> MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(
> MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:282)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> This is my code
> val jsonRows = sqs.mapPartitions(partitions => {
>   val s3Client = AmazonS3ClientBuilder.standard().withCredentials(new
> DefaultCredentialsProvider).build()
>
>   val txfm = new LogLine2Json
>   val log = Logger.getLogger("parseLog")
>
>   partitions.flatMap(messages => {
> val sqsMsg = Json.parse(messages)
> val bucketName = Json.stringify(sqsMsg("
> Records")(0)("s3")("bucket")("name")).replace("\"", "")
> val key = 
> Json.stringify(sqsMsg("Records")(0)("s3")("object")("key")).replace("\"",
> "")
> val obj = s3Client.getObject(new GetObjectRequest(bucketName, key))
> val stream = obj.getObjectContent()
>
> scala.io.Source.fromInputStream(stream).getLines().map(line => {
>   try {
> txfm.parseLine(line)
>   }
>   catch {
> case e: Throwable => {
>   log.info(line); "{}";
> }
>   }
> }).filter(line => line != "{}")
>   })
> })
>
> This is my build.sbt
>
> name := "sparrow-to-orc"
>
> version := "0.1"
>
> scalaVersion := "2.11.8"
>
> libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" %
> "provided"
>
> libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3" %
> "provided"
> libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3" %
> "provided"
> libraryDependencies += "com.cn" %% "sparrow-clf-parser" % "1.1-SNAPSHOT"
>
> libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.155"
> libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.155"
> libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.155"
>
> libraryDependencies += "com.github.seratch" %% "awscala" % "0.6.+"
> libraryDependencies += "com.typesafe.play" %% "play-json" % "2.6.0"
> dependencyOverrides ++= Set("com.fasterxml.jackson.core" %
> "jackson-databind" % "2.6.0")
>
>
>
> assemblyMergeStrategy in assembly := {
>   case PathList("org","aopalliance", xs @ _*) => MergeStrategy.last
>   case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
>   case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
>   case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
>   case PathList("org", "apache", xs @ _*) => MergeStrategy.last
>   case PathList("com", "google", xs @ _*) => MergeStrategy.last
>   case PathList("com", "esotericsoftware", xs @ _*) => 

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Thanks Marcelo.
Problem solved.

Best,
Carlo


Hi Marcelo,

Thank you for your help.
Problem solved as you suggested.

Best Regards,
Carlo

> On 5 Aug 2016, at 18:34, Marcelo Vanzin  wrote:
>
> On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca  
> wrote:
>> <dependency>
>>   <groupId>org.apache.spark</groupId>
>>   <artifactId>spark-core_2.10</artifactId>
>>   <version>2.0.0</version>
>>   <type>jar</type>
>> </dependency>
>>
>> <dependency>
>>   <groupId>org.apache.spark</groupId>
>>   <artifactId>spark-sql_2.10</artifactId>
>>   <version>2.0.0</version>
>>   <type>jar</type>
>> </dependency>
>>
>> <dependency>
>>   <groupId>org.apache.spark</groupId>
>>   <artifactId>spark-mllib_2.10</artifactId>
>>   <version>1.3.0</version>
>>   <type>jar</type>
>> </dependency>
>>
>
> One of these is not like the others...
>
> --
> Marcelo




Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
I have also executed:

mvn dependency:tree |grep log
[INFO] |  | +- com.esotericsoftware:minlog:jar:1.3.0:compile
[INFO] +- log4j:log4j:jar:1.2.17:compile
[INFO] +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
[INFO] |  |  +- commons-logging:commons-logging:jar:1.1.3:compile


and the POM reports the above libraries.

Many Thanks for your help.

Carlo


On 5 Aug 2016, at 18:17, Carlo.Allocca 
> wrote:

Please Sean, could you detail the version mismatch?

Many thanks,
Carlo
On 5 Aug 2016, at 18:11, Sean Owen 
> wrote:

You also seem to have a
version mismatch here.




Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Marcelo Vanzin
On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca  wrote:
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-core_2.10</artifactId>
>   <version>2.0.0</version>
>   <type>jar</type>
> </dependency>
>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql_2.10</artifactId>
>   <version>2.0.0</version>
>   <type>jar</type>
> </dependency>
>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-mllib_2.10</artifactId>
>   <version>1.3.0</version>
>   <type>jar</type>
> </dependency>

One of these is not like the others...

-- 
Marcelo




Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Please Sean, could you detail the version mismatch?

Many thanks,
Carlo
On 5 Aug 2016, at 18:11, Sean Owen 
> wrote:

You also seem to have a
version mismatch here.



Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Ted Yu
One option is to clone the class in your own project.

Experts may have a better solution.

Cheers
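
If someone wants to try that route, a rough stand-in (a sketch only, not the Spark 
original, and not a guaranteed drop-in replacement) would recreate the trait under 
the same package name, backed by slf4j, so that the missing symbol at least resolves. 
The actual resolution in this thread came from fixing the mismatched spark-mllib 
version that Marcelo points out elsewhere in the thread.

// Minimal sketch of a home-grown replacement for the now-private trait.
package org.apache.spark

import org.slf4j.{Logger, LoggerFactory}

trait Logging {
  @transient private lazy val log: Logger = LoggerFactory.getLogger(getClass)
  protected def logInfo(msg: => String): Unit = if (log.isInfoEnabled) log.info(msg)
  protected def logWarning(msg: => String): Unit = if (log.isWarnEnabled) log.warn(msg)
  protected def logError(msg: => String): Unit = if (log.isErrorEnabled) log.error(msg)
}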

On Fri, Aug 5, 2016 at 10:10 AM, Carlo.Allocca 
wrote:

> Hi Ted,
>
> Thanks for the prompt answer.
> It is not yet clear to me what I should do.
>
> How to fix it?
>
> Many thanks,
> Carlo
>
> On 5 Aug 2016, at 17:58, Ted Yu  wrote:
>
> private[spark] trait Logging {
>
>
>


Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Hi Ted,

Thanks for the prompt answer.
It is not yet clear to me what I should do.

How to fix it?

Many thanks,
Carlo

On 5 Aug 2016, at 17:58, Ted Yu 
> wrote:

private[spark] trait Logging {



Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Ted Yu
In 2.0, Logging became private:

private[spark] trait Logging {

FYI

On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca 
wrote:

> Dear All,
>
> I would like to ask for your help about the following issue: 
> java.lang.ClassNotFoundException:
> org.apache.spark.Logging
>
> I checked and the class Logging is not present.
> Moreover, this is the line of code where the exception is thrown:
>
> final org.apache.spark.mllib.regression.LinearRegressionModel lrModel
> = LinearRegressionWithSGD.train(a, numIterations,
> stepSize);
>
>
> My POM is as reported below.
>
>
> What am I doing wrong or missing? How can I fix it?
>
> Many thanks in advance for your support.
>
> Best,
> Carlo
>
>
>
>  POM
>
> <dependencies>
>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.10</artifactId>
>     <version>2.0.0</version>
>     <type>jar</type>
>   </dependency>
>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-sql_2.10</artifactId>
>     <version>2.0.0</version>
>     <type>jar</type>
>   </dependency>
>
>   <dependency>
>     <groupId>log4j</groupId>
>     <artifactId>log4j</artifactId>
>     <version>1.2.17</version>
>     <scope>test</scope>
>   </dependency>
>
>   <dependency>
>     <groupId>org.slf4j</groupId>
>     <artifactId>slf4j-log4j12</artifactId>
>     <version>1.7.16</version>
>     <scope>test</scope>
>   </dependency>
>
>   <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client</artifactId>
>     <version>2.7.2</version>
>   </dependency>
>
>   <dependency>
>     <groupId>junit</groupId>
>     <artifactId>junit</artifactId>
>     <version>4.12</version>
>   </dependency>
>
>   <dependency>
>     <groupId>org.hamcrest</groupId>
>     <artifactId>hamcrest-core</artifactId>
>     <version>1.3</version>
>   </dependency>
>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-mllib_2.10</artifactId>
>     <version>1.3.0</version>
>     <type>jar</type>
>   </dependency>
>
> </dependencies>
>
>
>


Re: ClassNotFoundException: org.apache.parquet.hadoop.ParquetOutputCommitter

2016-07-07 Thread Bryan Cutler
Can you try running the example like this

./bin/run-example sql.RDDRelation 

I know there are some jars in the example folders, and running them this
way adds them to the classpath
On Jul 7, 2016 3:47 AM, "kevin"  wrote:

> Hi all,
> I build Spark using:
>
> ./make-distribution.sh --name "hadoop2.7.1" --tgz
> "-Pyarn,hadoop-2.6,parquet-provided,hive,hive-thriftserver" -DskipTests
> -Dhadoop.version=2.7.1
>
> I can run this example:
> ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
> --master spark://master1:7077 \
> --driver-memory 1g \
> --executor-memory 512m \
> --executor-cores 1 \
> lib/spark-examples*.jar \
> 10
>
> but I can't run this example:
> org.apache.spark.examples.sql.RDDRelation
>
> I got this error:
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/2 is now RUNNING
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/4 is now RUNNING
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/3 is now RUNNING
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/0 is now RUNNING
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/1 is now RUNNING
> 16/07/07 18:28:45 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20160707182845-0003/5 is now RUNNING
> 16/07/07 18:28:46 INFO cluster.SparkDeploySchedulerBackend:
> SchedulerBackend is ready for scheduling beginning after reached
> minRegisteredResourcesRatio: 0.0
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/parquet/hadoop/ParquetOutputCommitter
> at org.apache.spark.sql.SQLConf$.(SQLConf.scala:319)
> at org.apache.spark.sql.SQLConf$.(SQLConf.scala)
> at org.apache.spark.sql.SQLContext.(SQLContext.scala:85)
> at org.apache.spark.sql.SQLContext.(SQLContext.scala:77)
> at main.RDDRelation$.main(RDDRelation.scala:13)
> at main.RDDRelation.main(RDDRelation.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.parquet.hadoop.ParquetOutputCommitter
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 15 more
>
>


Re: ClassNotFoundException in RDD.map

2016-03-23 Thread Dirceu Semighini Filho
Thanks Jakob,
I've looked into the source code and found that I was missing this property:
spark.repl.class.uri

Setting it solved the problem.

Cheers
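
For reference, "putting it" presumably amounts to something like the sketch below: 
forwarding the class-server URI of the embedded interpreter to the SparkConf so that 
executors can fetch classes compiled in the REPL. The URI value is a placeholder; how 
it is obtained depends on how the interpreter is embedded.

import org.apache.spark.SparkConf

// Placeholder value: in practice this is the class-server URI exposed by the
// REPL/interpreter that compiled the closures.
val replClassUri = "http://<driver-host>:<class-server-port>"

val conf = new SparkConf()
  .set("spark.repl.class.uri", replClassUri)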

2016-03-17 18:14 GMT-03:00 Jakob Odersky :

> The error is very strange indeed, however without code that reproduces
> it, we can't really provide much help beyond speculation.
>
> One thing that stood out to me immediately is that you say you have an
> RDD of Any where every Any should be a BigDecimal, so why not specify
> that type information?
> When using Any, a whole class of errors, that normally the typechecker
> could catch, can slip through.
>
> On Thu, Mar 17, 2016 at 10:25 AM, Dirceu Semighini Filho
>  wrote:
> > Hi Ted, thanks for answering.
> > The map is just that, whenever I try inside the map it throws this
> > ClassNotFoundException, even if I do map(f => f) it throws the exception.
> > What is bothering me is that when I do a take or a first it returns the
> > result, which make me conclude that the previous code isn't wrong.
> >
> > Kind Regards,
> > Dirceu
> >
> >
> > 2016-03-17 12:50 GMT-03:00 Ted Yu :
> >>
> >> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
> >>
> >> Do you mind showing more of your code involving the map() ?
> >>
> >> On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho
> >>  wrote:
> >>>
> >>> Hello,
> >>> I found a strange behavior after executing a prediction with MLIB.
> >>> My code return an RDD[(Any,Double)] where Any is the id of my dataset,
> >>> which is BigDecimal, and Double is the prediction for that line.
> >>> When I run
> >>> myRdd.take(10) it returns ok
> >>> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
> >>> Array((1921821857196754403.00,0.1690292052496703),
> >>> (454575632374427.00,0.16902820241892452),
> >>> (989198096568001939.00,0.16903432789699502),
> >>> (14284129652106187990.00,0.16903517653451386),
> >>> (17980228074225252497.00,0.16903151028332508),
> >>> (3861345958263692781.00,0.16903056986183976),
> >>> (17558198701997383205.00,0.1690295450319745),
> >>> (10651576092054552310.00,0.1690286445174418),
> >>> (4534494349035056215.00,0.16903303401862327),
> >>> (5551671513234217935.00,0.16902303368995966))
> >>> But when I try to run some map on it:
> >>> myRdd.map(_._1).take(10)
> >>> It throws a ClassCastException:
> >>> org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0
> >>> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in
> stage
> >>> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
> >>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
> >>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> >>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> >>> at java.security.AccessController.doPrivileged(Native Method)
> >>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> >>> at java.lang.Class.forName0(Native Method)
> >>> at java.lang.Class.forName(Class.java:278)
> >>> at
> >>>
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> >>> at
> >>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> >>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> >>> at
> >>>
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> >>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >>> at
> >>>
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> >>> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> >>> at
> >>>
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >>> at
> >>>
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> >>> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> >>> at
> >>>
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >>> at
> >>>
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> >>> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> >>> at
> >>>
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> >>> at
> >>>
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
> >>> at
> >>>
> 

Re: ClassNotFoundException in RDD.map

2016-03-20 Thread Jakob Odersky
The error is very strange indeed; however, without code that reproduces
it, we can't really provide much help beyond speculation.

One thing that stood out to me immediately is that you say you have an
RDD of Any where every Any should be a BigDecimal, so why not specify
that type information?
When using Any, a whole class of errors that the typechecker would
normally catch can slip through.
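
Concretely, the typed version is just a matter of declaring the id as BigDecimal end 
to end; a sketch with made-up names:

import org.apache.spark.rdd.RDD

// Keep the dataset id typed as BigDecimal instead of Any, so mismatches surface
// at compile time rather than inside a task.
def predictions(ids: RDD[BigDecimal], scores: RDD[Double]): RDD[(BigDecimal, Double)] =
  ids.zip(scores)  // assumes the two RDDs line up element-for-element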

On Thu, Mar 17, 2016 at 10:25 AM, Dirceu Semighini Filho
 wrote:
> Hi Ted, thanks for answering.
> The map is just that, whenever I try inside the map it throws this
> ClassNotFoundException, even if I do map(f => f) it throws the exception.
> What is bothering me is that when I do a take or a first it returns the
> result, which make me conclude that the previous code isn't wrong.
>
> Kind Regards,
> Dirceu
>
>
> 2016-03-17 12:50 GMT-03:00 Ted Yu :
>>
>> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>
>> Do you mind showing more of your code involving the map() ?
>>
>> On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho
>>  wrote:
>>>
>>> Hello,
>>> I found a strange behavior after executing a prediction with MLIB.
>>> My code return an RDD[(Any,Double)] where Any is the id of my dataset,
>>> which is BigDecimal, and Double is the prediction for that line.
>>> When I run
>>> myRdd.take(10) it returns ok
>>> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
>>> Array((1921821857196754403.00,0.1690292052496703),
>>> (454575632374427.00,0.16902820241892452),
>>> (989198096568001939.00,0.16903432789699502),
>>> (14284129652106187990.00,0.16903517653451386),
>>> (17980228074225252497.00,0.16903151028332508),
>>> (3861345958263692781.00,0.16903056986183976),
>>> (17558198701997383205.00,0.1690295450319745),
>>> (10651576092054552310.00,0.1690286445174418),
>>> (4534494349035056215.00,0.16903303401862327),
>>> (5551671513234217935.00,0.16902303368995966))
>>> But when I try to run some map on it:
>>> myRdd.map(_._1).take(10)
>>> It throws a ClassCastException:
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>>> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>>> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:278)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at
>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>> at
>>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> 

Re: ClassNotFoundException in RDD.map

2016-03-19 Thread Dirceu Semighini Filho
Hi Ted, thanks for answering.
The map is just that; whenever I try anything inside the map it throws this
ClassNotFoundException, even if I do map(f => f).
What is bothering me is that when I do a take or a first it returns the
result, which makes me conclude that the previous code isn't wrong.

Kind Regards,
Dirceu

2016-03-17 12:50 GMT-03:00 Ted Yu :

> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>
> Do you mind showing more of your code involving the map() ?
>
> On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho <
> dirceu.semigh...@gmail.com> wrote:
>
>> Hello,
>> I found a strange behavior after executing a prediction with MLIB.
>> My code return an RDD[(Any,Double)] where Any is the id of my dataset,
>> which is BigDecimal, and Double is the prediction for that line.
>> When I run
>> myRdd.take(10) it returns ok
>> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
>> Array((1921821857196754403.00,0.1690292052496703),
>> (454575632374427.00,0.16902820241892452),
>> (989198096568001939.00,0.16903432789699502),
>> (14284129652106187990.00,0.16903517653451386),
>> (17980228074225252497.00,0.16903151028332508),
>> (3861345958263692781.00,0.16903056986183976),
>> (17558198701997383205.00,0.1690295450319745),
>> (10651576092054552310.00,0.1690286445174418),
>> (4534494349035056215.00,0.16903303401862327),
>> (5551671513234217935.00,0.16902303368995966))
>> But when I try to run some map on it:
>> myRdd.map(_._1).take(10)
>> It throws a ClassCastException:
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:278)
>> at
>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> at
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>> at
>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> Driver stacktrace:
>> at org.apache.spark.scheduler.DAGScheduler.org
>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
>> at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
>> at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
>> at
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> at 

Re: ClassNotFoundException in RDD.map

2016-03-19 Thread Ted Yu
bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1

Do you mind showing more of your code involving the map() ?

On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho <
dirceu.semigh...@gmail.com> wrote:

> Hello,
> I found a strange behavior after executing a prediction with MLIB.
> My code return an RDD[(Any,Double)] where Any is the id of my dataset,
> which is BigDecimal, and Double is the prediction for that line.
> When I run
> myRdd.take(10) it returns ok
> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
> Array((1921821857196754403.00,0.1690292052496703),
> (454575632374427.00,0.16902820241892452),
> (989198096568001939.00,0.16903432789699502),
> (14284129652106187990.00,0.16903517653451386),
> (17980228074225252497.00,0.16903151028332508),
> (3861345958263692781.00,0.16903056986183976),
> (17558198701997383205.00,0.1690295450319745),
> (10651576092054552310.00,0.1690286445174418),
> (4534494349035056215.00,0.16903303401862327),
> (5551671513234217935.00,0.16902303368995966))
> But when I try to run some map on it:
> myRdd.map(_._1).take(10)
> It throws a ClassCastException:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:278)
> at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
> at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at scala.Option.foreach(Option.scala:236)
> at
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
> at
> 

Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-29 Thread Prem Spark
You need to make sure this class is accessible to all servers, since in cluster
mode the driver can be on any of the worker nodes.


On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa  wrote:

> Hi,
>
> I'm submitting a spark job like this:
>
> ~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master
>> spark://machine1:6066 --deploy-mode cluster --jars
>> target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
>> /home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar 1
>> machine2  1000
>>
>
> and in the driver stderr, I get the following exception:
>
>  WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 74, XXX.XXX.XX.XXX):
>> java.lang.ClassNotFoundException: Benchmark$$anonfun$main$1
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:270)
>> at
>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>> at
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>> at
>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> at
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> at
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>> at
>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>> at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>> at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>>
>
> Note that everything works fine when using deploy-mode as 'client'.
> This is the application that I'm trying to run:
> https://github.com/tdas/spark-streaming-benchmark (this problem also
> happens for non streaming applications)
>
> What can I do to sort this out?
>
> Thanks.
>


Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-29 Thread Saiph Kappa
I found out that by commenting out this line in the application code:
sparkConf.set("spark.executor.extraJavaOptions", " -XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:FreqInlineSize=300
-XX:MaxInlineSize=300 ")

the exception does not occur anymore.  Not entirely sure why, but
everything goes fine without that line.

Thanks!

On Tue, Dec 29, 2015 at 1:39 PM, Prem Spark  wrote:

> You need to make sure this class is accessible to all servers, since in cluster
> mode the driver can be on any of the worker nodes.
>
>
> On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa 
> wrote:
>
>> Hi,
>>
>> I'm submitting a spark job like this:
>>
>> ~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master
>>> spark://machine1:6066 --deploy-mode cluster --jars
>>> target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
>>> /home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar 1
>>> machine2  1000
>>>
>>
>> and in the driver stderr, I get the following exception:
>>
>>  WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 74,
>>> XXX.XXX.XX.XXX): java.lang.ClassNotFoundException: Benchmark$$anonfun$main$1
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:270)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at
>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>> at
>>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>> at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>> at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>> at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>> at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>> at
>>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>> at
>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>
>> Note that everything works fine when using deploy-mode as 'client'.
>> This is the application that I'm trying to run:
>> https://github.com/tdas/spark-streaming-benchmark (this problem also
>> happens for non streaming applications)
>>
>> What can I do to sort this out?
>>
>> Thanks.
>>
>
>


Re: ClassNotFoundException with a uber jar.

2015-11-26 Thread Ali Tajeldin EDU
I'm not 100% sure, but I don't think a jar within a jar will work without a 
custom class loader. You can perhaps try "maven-assembly-plugin" or 
"maven-shade-plugin" to build your uber/fat jar. Both of these build a 
flattened single jar.
--
Ali

On Nov 26, 2015, at 2:49 AM, Marc de Palol  wrote:

> Hi all, 
> 
> I have an uber jar made with Maven; the contents are:
> 
> my.org.my.classes.Class
> ...
> lib/lib1.jar // 3rd party libs
> lib/lib2.jar 
> 
> I'm using this kind of jar for Hadoop applications and all works fine. 
> 
> I added the Spark libs, Scala and everything else needed by Spark, but when I submit
> this jar to Spark I get ClassNotFoundExceptions: 
> 
> spark-submit --class com.bla.TestJob --driver-memory 512m --master
> yarn-client /home/ble/uberjar.jar
> 
> Then when the job is running I get this: 
> java.lang.NoClassDefFoundError:
> com/fasterxml/jackson/datatype/guava/GuavaModule
> // usage of jackson's GuavaModule is expected, as the job is using jackson
> to read json.
> 
> 
> this class is contained in: 
> lib/jackson-datatype-guava-2.4.3.jar, which is in the uberjar
> 
> So I really don't know what I'm missing. I've tried to use --jars and
> SparkContext.addJar (adding the uberjar) with no luck. 
> 
> Is there any problem with using uber jars with inner jars inside? 
> 
> Thanks!
> 
> 
> 
> 
> 
> 
> 





Re: ClassNotFoundException even if class is present in Jarfile

2015-11-03 Thread hveiga
It turned out to be a problem with `SerializationUtils` from Apache Commons
Lang. There is an open issue where the class will throw a
`ClassNotFoundException` even if the class is in the classpath in a
multiple-classloader environment:
https://issues.apache.org/jira/browse/LANG-1049

We moved away from the library and our Spark job is working fine now. The
issue was not related to Spark after all.






Re: ClassNotFoundException even if class is present in Jarfile

2015-11-03 Thread Iulian Dragoș
Where is the exception thrown (full stack trace)? How are you running your
application, via spark-submit or spark-shell?

On Tue, Nov 3, 2015 at 1:43 AM, hveiga  wrote:

> Hello,
>
> I am facing an issue where I cannot run my Spark job in a cluster
> environment (standalone or EMR) but it works successfully if I run it
> locally using local[*] as master.
>
> I am getting ClassNotFoundException: com.mycompany.folder.MyObject on the
> slave executors. I don't really understand why this is happening since I
> have uncompressed the Jarfile to make sure that the class is present inside
> (both .java and .class) and all the rest of the classes are being loaded
> fine.
>
> Also, I would like to mention something weird that might be related but not
> sure. There are two packages inside my jarfile that are called the same but
> with different casing:
>
> - com.mycompany.folder.MyObject
> - com.myCompany.something.Else
>
> Could that be the reason?
>
> Also, I have tried adding my jarfiles in all the ways I could find
> (sparkConf.setJars(...), sparkContext.addJar(...), spark-submit opt --jars,
> ...) but none of them actually worked.
>
> I am using Apache Spark 1.5.0, Java 7, sbt 0.13.7, scala 2.10.5.
>
> Thanks a lot,
>
>
>
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: ClassNotFoundException for Kryo serialization

2015-05-02 Thread Akshat Aranya
Now I am running up against some other problem while trying to schedule tasks:

15/05/01 22:32:03 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalStateException: unread block data
at 
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2419)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1380)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)


I verified that the same configuration works without using Kryo serialization.
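
For context, the Kryo wiring this thread is about usually looks like the sketch below 
(the class name is the one from the original report; everything else is illustrative). 
The executor tries to load every class listed here while SparkEnv instantiates the 
serializer, which is where the "Failed to load class to register with Kryo" in the 
quoted trace comes from.

import org.apache.spark.SparkConf

// Registering application classes with Kryo; the registered class must be
// loadable on the executors.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.classesToRegister", "com.example.Schema$MyRow")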


On Fri, May 1, 2015 at 9:44 AM, Akshat Aranya aara...@gmail.com wrote:
 I cherry-picked the fix for SPARK-5470 and the problem has gone away.

 On Fri, May 1, 2015 at 9:15 AM, Akshat Aranya aara...@gmail.com wrote:
 Yes, this class is present in the jar that was loaded in the classpath
 of the executor Java process -- it wasn't even lazily added as a part
 of the task execution.  Schema$MyRow is a protobuf-generated class.

 After doing some digging around, I think I might be hitting up against
 SPARK-5470, the fix for which hasn't been merged into 1.2, as far as I
 can tell.

 On Fri, May 1, 2015 at 9:05 AM, Ted Yu yuzhih...@gmail.com wrote:
 bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow

 So the above class is in the jar which was in the classpath ?
 Can you tell us a bit more about Schema$MyRow ?

 On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:

 Hi,

 I'm getting a ClassNotFoundException at the executor when trying to
 register a class for Kryo serialization:

 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
   at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:243)
   at
 org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:254)
   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:257)
   at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:182)
   at org.apache.spark.executor.Executor.init(Executor.scala:87)
   at
 org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedExecutorBackend.scala:61)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
   at
 scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at
 org.apache.spark.executor.CoarseGrainedExecutorBackend.aroundReceive(CoarseGrainedExecutorBackend.scala:36)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: org.apache.spark.SparkException: Failed to load class to
 register with Kryo
   at
 

Re: ClassNotFoundException for Kryo serialization

2015-05-01 Thread Ted Yu
bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow

So the above class is in the jar which was in the classpath ?
Can you tell us a bit more about Schema$MyRow ?

On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:

 Hi,

 I'm getting a ClassNotFoundException at the executor when trying to
 register a class for Kryo serialization:

 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
   at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:243)
   at
 org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:254)
   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:257)
   at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:182)
   at org.apache.spark.executor.Executor.init(Executor.scala:87)
   at
 org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedExecutorBackend.scala:61)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at
 org.apache.spark.executor.CoarseGrainedExecutorBackend.aroundReceive(CoarseGrainedExecutorBackend.scala:36)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: org.apache.spark.SparkException: Failed to load class to
 register with Kryo
   at
 org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:66)
   at
 org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:61)
   at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
   at
 scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
   at
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
   at
 org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:61)
   ... 28 more
 Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:190)
   at
 org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:63)

 I have verified that when the executor process is launched, my jar is in
 the classpath of the command line of the executor.  I expect the class to
 be found by the default classloader being used at KryoSerializer.scala:63

 Any ideas?



Re: ClassNotFoundException for Kryo serialization

2015-05-01 Thread Akshat Aranya
Yes, this class is present in the jar that was loaded in the classpath
of the executor Java process -- it wasn't even lazily added as a part
of the task execution.  Schema$MyRow is a protobuf-generated class.

After doing some digging around, I think I might be hitting up against
SPARK-5470, the fix for which hasn't been merged into 1.2, as far as I
can tell.

On Fri, May 1, 2015 at 9:05 AM, Ted Yu yuzhih...@gmail.com wrote:
 bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow

 So the above class is in the jar which was in the classpath ?
 Can you tell us a bit more about Schema$MyRow ?

 On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:

 Hi,

 I'm getting a ClassNotFoundException at the executor when trying to
 register a class for Kryo serialization:

 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
   at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:243)
   at
 org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:254)
   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:257)
   at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:182)
   at org.apache.spark.executor.Executor.init(Executor.scala:87)
   at
 org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedExecutorBackend.scala:61)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
   at
 scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at
 org.apache.spark.executor.CoarseGrainedExecutorBackend.aroundReceive(CoarseGrainedExecutorBackend.scala:36)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: org.apache.spark.SparkException: Failed to load class to
 register with Kryo
   at
 org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:66)
   at
 org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:61)
   at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
   at
 scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
   at
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
   at
 org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:61)
   ... 28 more
 Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:190)
   at
 org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:63)

 I have verified that when the executor process is launched, my jar is in
 the classpath of 

Re: ClassNotFoundException for Kryo serialization

2015-05-01 Thread Akshat Aranya
I cherry-picked the fix for SPARK-5470 and the problem has gone away.

On Fri, May 1, 2015 at 9:15 AM, Akshat Aranya aara...@gmail.com wrote:
 Yes, this class is present in the jar that was loaded in the classpath
 of the executor Java process -- it wasn't even lazily added as a part
 of the task execution.  Schema$MyRow is a protobuf-generated class.

 After doing some digging around, I think I might be hitting up against
 SPARK-5470, the fix for which hasn't been merged into 1.2, as far as I
 can tell.

 On Fri, May 1, 2015 at 9:05 AM, Ted Yu yuzhih...@gmail.com wrote:
 bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow

 So the above class is in the jar which was in the classpath ?
 Can you tell us a bit more about Schema$MyRow ?

 On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:

 Hi,

 I'm getting a ClassNotFoundException at the executor when trying to
 register a class for Kryo serialization:

 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
   at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:243)
   at
 org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:254)
   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:257)
   at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:182)
   at org.apache.spark.executor.Executor.init(Executor.scala:87)
   at
 org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedExecutorBackend.scala:61)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
   at
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
   at
 scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
   at
 org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at
 org.apache.spark.executor.CoarseGrainedExecutorBackend.aroundReceive(CoarseGrainedExecutorBackend.scala:36)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: org.apache.spark.SparkException: Failed to load class to
 register with Kryo
   at
 org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:66)
   at
 org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:61)
   at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
   at
 scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
   at
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
   at
 org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:61)
   ... 28 more
 Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:190)
   at
 

Re: ClassNotFoundException

2015-03-17 Thread Ralph Bergmann
Hi Kevin,


Yes, I can test it. Does that mean I have to build Spark from the git repository?


Ralph

On 17.03.15 at 02:59, Kevin (Sangwoo) Kim wrote:
 Hi Ralph,
 
 It seems like the https://issues.apache.org/jira/browse/SPARK-6299 issue,
 which I'm working on.
 I submitted a PR for it; would you test it?
 
 Regards,
 Kevin


-- 

Ralph Bergmann


www  http://www.dasralph.de | http://www.the4thFloor.eu
mail ra...@dasralph.de
skypedasralph

facebook https://www.facebook.com/dasralph
google+  https://plus.google.com/+RalphBergmann
xing https://www.xing.com/profile/Ralph_Bergmann3
linkedin https://www.linkedin.com/in/ralphbergmann
gulp https://www.gulp.de/Profil/RalphBergmann.html
github   https://github.com/the4thfloor


pgp key id   0x421F9B78
pgp fingerprint  CEE3 7AE9 07BE 98DF CD5A E69C F131 4A8E 421F 9B78




Re: ClassNotFoundException

2015-03-16 Thread Kevin (Sangwoo) Kim
Hi Ralph,

It seems like the https://issues.apache.org/jira/browse/SPARK-6299 issue, which
I'm working on.
I submitted a PR for it; would you test it?

Regards,
Kevin

On Tue, Mar 17, 2015 at 1:11 AM Ralph Bergmann ra...@dasralph.de wrote:

 Hi,


 I want to try the JavaSparkPi example[1] on a remote Spark server but I
 get a ClassNotFoundException.

 When I run it locally it works, but not remotely.

 I added the spark-core lib as a dependency. Do I need more?

 Any ideas?

 Thanks Ralph


 [1] ...
 https://github.com/apache/spark/blob/master/examples/
 src/main/java/org/apache/spark/examples/JavaSparkPi.java



Re: ClassNotFoundException when registering classes with Kryo

2015-02-01 Thread Arun Lists
Thanks for the notification!

For now, I'll use the Kryo serializer without registering classes until the
bug fix has been merged into the next version of Spark (I guess that will
be 1.3, right?).
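
(A minimal sketch of that interim setup, using only settings that appear in the
snippet quoted below; unregistered classes are still handled by Kryo, just
written with their full class names instead of compact registered ids.)

  import org.apache.spark.SparkConf

  val sparkConf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // spark.kryo.registrationRequired is left at its default (false) and the
  // registerKryoClasses call is skipped until the fix is released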

arun


On Sun, Feb 1, 2015 at 10:58 PM, Shixiong Zhu zsxw...@gmail.com wrote:

 It's a bug that has been fixed in
 https://github.com/apache/spark/pull/4258 but not yet been merged.

 Best Regards,
 Shixiong Zhu

 2015-02-02 10:08 GMT+08:00 Arun Lists lists.a...@gmail.com:

 Here is the relevant snippet of code in my main program:

 ===

 sparkConf.set("spark.serializer",
   "org.apache.spark.serializer.KryoSerializer")
 sparkConf.set("spark.kryo.registrationRequired", "true")
 val summaryDataClass = classOf[SummaryData]
 val summaryViewClass = classOf[SummaryView]
 sparkConf.registerKryoClasses(Array(
   summaryDataClass, summaryViewClass))

 ===

 I get the following error:

 Exception in thread main java.lang.reflect.InvocationTargetException
 ...

 Caused by: org.apache.spark.SparkException: Failed to load class to
 register with Kryo
 ...

 Caused by: java.lang.ClassNotFoundException:
 com.dtex.analysis.transform.SummaryData


 Note that the class in question SummaryData is in the same package as the
 main program and hence in the same jar.

 What do I need to do to make this work?

 Thanks,
 arun






Re: ClassNotFoundException when registering classes with Kryo

2015-02-01 Thread Shixiong Zhu
It's a bug that has been fixed in https://github.com/apache/spark/pull/4258
but has not yet been merged.

Best Regards,
Shixiong Zhu

2015-02-02 10:08 GMT+08:00 Arun Lists lists.a...@gmail.com:

 Here is the relevant snippet of code in my main program:

 ===

 sparkConf.set("spark.serializer",
   "org.apache.spark.serializer.KryoSerializer")
 sparkConf.set("spark.kryo.registrationRequired", "true")
 val summaryDataClass = classOf[SummaryData]
 val summaryViewClass = classOf[SummaryView]
 sparkConf.registerKryoClasses(Array(
   summaryDataClass, summaryViewClass))

 ===

 I get the following error:

 Exception in thread main java.lang.reflect.InvocationTargetException
 ...

 Caused by: org.apache.spark.SparkException: Failed to load class to
 register with Kryo
 ...

 Caused by: java.lang.ClassNotFoundException:
 com.dtex.analysis.transform.SummaryData


 Note that the class in question SummaryData is in the same package as the
 main program and hence in the same jar.

 What do I need to do to make this work?

 Thanks,
 arun





RE: ClassNotFoundException in standalone mode

2014-11-24 Thread Benoit Pasquereau
I finally managed to get the example working; here are the details, which may 
help other users.

I have 2 Windows nodes in the test system, PN01 and PN02. Both see the same 
shared drive S: (it is mapped to C:\source on PN02).

If I run the worker and master from S:\spark-1.1.0-bin-hadoop2.4, then running 
the simple test fails with the ClassNotFoundException (even if a single node 
hosts both the master and the worker).

If I run the workers and masters from the local drive 
(c:\source\spark-1.1.0-bin-hadoop2.4), then the simple test runs fine (with one 
or two nodes).

I haven't found out why the class fails to load from the shared drive (I checked 
the permissions and they look OK), but at least the cluster is working now.

If anyone has experience running Spark from a Windows shared drive, any advice 
is welcome!

Thanks,
Benoit.


PS: Yes, thanks Angel, I did check that:
s:\spark\simple> %JAVA_HOME%\bin\jar tvf 
s:\spark\simple\target\scala-2.10\simple-project_2.10-1.0.jar
   299 Thu Nov 20 17:29:40 GMT 2014 META-INF/MANIFEST.MF
  1070 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$2.class
  1350 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$main$1.class
  2581 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$.class
  1070 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$1.class
   710 Thu Nov 20 17:29:40 GMT 2014 SimpleApp.class


From: angel2014 [mailto:angel.alvarez.pas...@gmail.com]
Sent: Friday, November 21, 2014 3:16 AM
To: u...@spark.incubator.apache.org
Subject: Re: ClassNotFoundException in standalone mode

Can you make sure the class SimpleApp$$anonfun$1 is included in your app jar?

2014-11-20 18:19 GMT+01:00 Benoit Pasquereau [via Apache Spark User List]:
Hi Guys,

I’m having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows Server 
2008).

A very simple program runs fine in local mode but fails in standalone mode.

Here is the error:

14/11/20 17:01:53 INFO DAGScheduler: Failed to run count at SimpleApp.scala:22
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task
0.3 in stage 0.0 (TID 6, UK-RND-PN02.actixhost.eu): 
java.lang.ClassNotFoundException: SimpleApp$$anonfun$1
java.net.URLClassLoader$1.run(URLClassLoader.java:202)

I have added the jar to the SparkConf() to be on the safe side and it appears 
in standard output (copied after the code):

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import java.net.URLClassLoader

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "S:\\spark-1.1.0-bin-hadoop2.4\\README.md"
    val conf = new SparkConf()
      //.setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))
      .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")
      //.setMaster("local[4]")
      .setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}

Simple-project is in the executor classpath list:
14/11/20 17:01:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Executor classpath is:/S:/spark/simple/
Executor classpath 
is:/S:/spark/simple/target/scala-2.10/simple-project_2.10-1.0.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/conf/
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar
Executor classpath is:/S:/spark/simple/
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.1.jar
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-core-3.2.2.jar
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.1.jar
Executor classpath is:/S:/spark/simple/

Would you have any idea how I could investigate further ?

Thanks !
Benoit.


PS: I could attach a debugger to the Worker where the ClassNotFoundException 
happens but it is a bit painful

Re: ClassNotFoundException in standalone mode

2014-11-20 Thread angel2014
Can you make sure the class SimpleApp$$anonfun$1 is included in your app
jar?

2014-11-20 18:19 GMT+01:00 Benoit Pasquereau [via Apache Spark User List] 
ml-node+s1001560n19391...@n3.nabble.com:

  Hi Guys,



 I’m having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows
 Server 2008).



 A very simple program runs fine in local mode but fails in standalone
 mode.



 Here is the error:



 14/11/20 17:01:53 INFO DAGScheduler: Failed to run count at
 SimpleApp.scala:22

 Exception in thread main org.apache.spark.SparkException: Job aborted
 due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
 failure: Lost task

 0.3 in stage 0.0 (TID 6, UK-RND-PN02.actixhost.eu):
 java.lang.ClassNotFoundException: SimpleApp$$anonfun$1

 java.net.URLClassLoader$1.run(URLClassLoader.java:202)



 I have added the jar to the SparkConf() to be on the safe side and it
 appears in standard output (copied after the code):



 /* SimpleApp.scala */

 import org.apache.spark.SparkContext
 import org.apache.spark.SparkContext._
 import org.apache.spark.SparkConf

 import java.net.URLClassLoader

 object SimpleApp {
   def main(args: Array[String]) {
     val logFile = "S:\\spark-1.1.0-bin-hadoop2.4\\README.md"
     val conf = new SparkConf()
       //.setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))
       .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")
       //.setMaster("local[4]")
       .setAppName("Simple Application")
     val sc = new SparkContext(conf)

     val cl = ClassLoader.getSystemClassLoader
     val urls = cl.asInstanceOf[URLClassLoader].getURLs
     urls.foreach(url => println("Executor classpath is:" + url.getFile))

     val logData = sc.textFile(logFile, 2).cache()
     val numAs = logData.filter(line => line.contains("a")).count()
     val numBs = logData.filter(line => line.contains("b")).count()
     println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
     sc.stop()
   }
 }



 Simple-project is in the executor classpath list:

 14/11/20 17:01:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is
 ready for scheduling beginning after reached minRegisteredResourcesRatio:
 0.0

 Executor classpath is:/S:/spark/simple/

 Executor classpath is:
 */S:/spark/simple/target/scala-2.10/simple-project_2.10-1.0.jar*

 Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/conf/

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar

 Executor classpath is:/S:/spark/simple/

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.1.jar

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-core-3.2.2.jar

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.1.jar

 Executor classpath is:/S:/spark/simple/



 Would you have any idea how I could investigate further ?



 Thanks !

 Benoit.





 PS: I could attach a debugger to the Worker where the
 ClassNotFoundException happens but it is a bit painful






Re: ClassNotFoundException in standalone mode

2014-11-20 Thread Yanbo Liang
Looks like it cannot find the class or jar on your driver machine.
Are you sure that the corresponding jar file exists on the driver machine rather
than only on your development machine?

2014-11-21 11:16 GMT+08:00 angel2014 angel.alvarez.pas...@gmail.com:

 Can you make sure the class SimpleApp$$anonfun$1 is included in your
 app jar?

 2014-11-20 18:19 GMT+01:00 Benoit Pasquereau [via Apache Spark User List]:

  Hi Guys,



 I’m having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows
 Server 2008).



 A very simple program runs fine in local mode but fails in standalone
 mode.



 Here is the error:



 14/11/20 17:01:53 INFO DAGScheduler: Failed to run count at
 SimpleApp.scala:22

 Exception in thread main org.apache.spark.SparkException: Job aborted
 due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
 failure: Lost task

 0.3 in stage 0.0 (TID 6, UK-RND-PN02.actixhost.eu):
 java.lang.ClassNotFoundException: SimpleApp$$anonfun$1

 java.net.URLClassLoader$1.run(URLClassLoader.java:202)



 I have added the jar to the SparkConf() to be on the safe side and it
 appears in standard output (copied after the code):



  /* SimpleApp.scala */

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._
  import org.apache.spark.SparkConf

  import java.net.URLClassLoader

  object SimpleApp {
    def main(args: Array[String]) {
      val logFile = "S:\\spark-1.1.0-bin-hadoop2.4\\README.md"
      val conf = new SparkConf()
        //.setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))
        .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")
        //.setMaster("local[4]")
        .setAppName("Simple Application")
      val sc = new SparkContext(conf)

      val cl = ClassLoader.getSystemClassLoader
      val urls = cl.asInstanceOf[URLClassLoader].getURLs
      urls.foreach(url => println("Executor classpath is:" + url.getFile))

      val logData = sc.textFile(logFile, 2).cache()
      val numAs = logData.filter(line => line.contains("a")).count()
      val numBs = logData.filter(line => line.contains("b")).count()
      println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
      sc.stop()
    }
  }



 Simple-project is in the executor classpath list:

 14/11/20 17:01:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is
 ready for scheduling beginning after reached minRegisteredResourcesRatio:
 0.0

 Executor classpath is:/S:/spark/simple/

 Executor classpath is:
 */S:/spark/simple/target/scala-2.10/simple-project_2.10-1.0.jar*

 Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/conf/

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar

 Executor classpath is:/S:/spark/simple/

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.1.jar

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-core-3.2.2.jar

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.1.jar

 Executor classpath is:/S:/spark/simple/



 Would you have any idea how I could investigate further ?



 Thanks !

 Benoit.





 PS: I could attach a debugger to the Worker where the
 ClassNotFoundException happens but it is a bit painful






Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-18 Thread Svend
Hi, 

Yes, the error still occurs when we replace the lambdas with named
functions: 



(same error traces as in previous posts)






Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-16 Thread Michael Armbrust
 Note that running a simple map+reduce job on the same hdfs files with the
 same installation works fine:


Did you call collect() on the totalLength?  Otherwise nothing has actually
executed.


Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-16 Thread Michael Armbrust
Oh, I'm sorry... reduce is also an operation
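
(For anyone else reading: a small sketch of which calls are lazy and which
trigger execution, assuming the spark-shell's predefined sc; the HDFS path is
made up.)

  val lines = sc.textFile("hdfs:///some/path/data.txt")  // transformation: lazy, nothing runs yet
  val lengths = lines.map(_.length)                      // still lazy
  val totalLength = lengths.reduce(_ + _)                // reduce is an action, so the job runs here
  val allLengths = lengths.collect()                     // collect is another action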


On Wed, Jul 16, 2014 at 3:37 PM, Michael Armbrust mich...@databricks.com
wrote:


  Note that running a simple map+reduce job on the same hdfs files with the
 same installation works fine:


 Did you call collect() on the totalLength?  Otherwise nothing has
 actually executed.



Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-16 Thread Svend
Hi Michael, 

Thanks for your reply. Yes, the reduce triggered the actual execution, I got
a total length (totalLength: 95068762, for the record). 







Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-16 Thread Michael Armbrust
Hmm, it could be some weirdness with classloaders / Mesos / Spark SQL?

I'm curious if you would hit an error if there were no lambda functions
involved.  Perhaps if you load the data using jsonFile or parquetFile.

Either way, I'd file a JIRA.  Thanks!
On Jul 16, 2014 6:48 PM, Svend svend.vanderve...@gmail.com wrote:

 Hi Michael,

 Thanks for your reply. Yes, the reduce triggered the actual execution, I
 got
 a total length (totalLength: 95068762, for the record).








Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Gerard Maas
Hi Tobias,

Regarding my comment on closure serialization:

I was discussing it with my fellow Sparkers here and I totally overlooked
the fact that you need the class files to de-serialize the closures (or
whatever) on the workers, so you always need the jar file delivered to the
workers in order for it to work.

The Spark REPL works differently. It uses some dark magic to send the
working session to the workers.

-kr, Gerard.





On Wed, May 21, 2014 at 2:47 PM, Gerard Maas gerard.m...@gmail.com wrote:

 Hi Tobias,

 I was curious about this issue and tried to run your example on my local
 Mesos. I was able to reproduce your issue using your current config:

 [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
 1.0:4 failed 4 times (most recent failure: Exception failure:
 java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2)
 org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times
 (most recent failure: Exception failure: java.lang.ClassNotFoundException:
 spark.SparkExamplesMinimal$$anonfun$2)
  at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

 Creating a simple jar from the job and providing it through the
 configuration seems to solve it:

 val conf = new SparkConf()
   .setMaster("mesos://my_ip:5050/")
   .setJars(Seq("/sparkexample/target/scala-2.10/sparkexample_2.10-0.1.jar"))
   .setAppName("SparkExamplesMinimal")

 Resulting in:
  14/05/21 12:03:45 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
 14/05/21 12:03:45 INFO scheduler.DAGScheduler: Stage 1 (count at
 SparkExamplesMinimal.scala:50) finished in 1.120 s
 14/05/21 12:03:45 INFO spark.SparkContext: Job finished: count at
 SparkExamplesMinimal.scala:50, took 1.177091435 s
 count: 100

 Why the closure serialization does not work with Mesos is beyond my
 current knowledge.
 Would be great to hear from the experts (cross-posting to dev for that)

 -kr, Gerard.













 On Wed, May 21, 2014 at 11:51 AM, Tobias Pfeiffer t...@preferred.jp wrote:

 Hi,

 I have set up a cluster with Mesos (backed by Zookeeper) with three
 master and three slave instances. I set up Spark (git HEAD) for use
 with Mesos according to this manual:
 http://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html

 Using the spark-shell, I can connect to this cluster and do simple RDD
 operations, but the same code in a Scala class and executed via sbt
 run-main works only partially. (That is, count() works, count() after
 flatMap() does not.)

 Here is my code: https://gist.github.com/tgpfeiffer/7d20a4d59ee6e0088f91
 The file SparkExamplesScript.scala, when pasted into spark-shell,
 outputs the correct count() for the parallelized list comprehension,
 as well as for the flatMapped RDD.

 The file SparkExamplesMinimal.scala contains exactly the same code,
 and also the MASTER configuration and the Spark Executor are the same.
 However, while the count() for the parallelized list is displayed
 correctly, I receive the following error when asking for the count()
 of the flatMapped RDD:

 -

 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting Stage 1
 (FlatMappedRDD[1] at flatMap at SparkExamplesMinimal.scala:34), which
 has no missing parents
 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting 8 missing
 tasks from Stage 1 (FlatMappedRDD[1] at flatMap at
 SparkExamplesMinimal.scala:34)
 14/05/21 09:47:49 INFO scheduler.TaskSchedulerImpl: Adding task set
 1.0 with 8 tasks
 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Starting task 1.0:0
 as TID 8 on executor 20140520-102159-2154735808-5050-1108-1: mesos9-1
 (PROCESS_LOCAL)
 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Serialized task 1.0:0
 as 1779147 bytes in 37 ms
 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Lost TID 8 (task 1.0:0)
 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Loss was due to
 java.lang.ClassNotFoundException
 java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
 at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at
 

Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Gerard Maas
Hi Tobias,

On Wed, May 21, 2014 at 5:45 PM, Tobias Pfeiffer t...@preferred.jp wrote:
first, thanks for your explanations regarding the jar files!
No prob :-)


 On Thu, May 22, 2014 at 12:32 AM, Gerard Maas gerard.m...@gmail.com
 wrote:
  I was discussing it with my fellow Sparkers here and I totally overlooked
  the fact that you need the class files to de-serialize the closures (or
  whatever) on the workers, so you always need the jar file delivered to
 the
  workers in order for it to work.

 So the closure as a function is serialized, sent across the wire,
 deserialized there, and *still* you need the class files? (I am not
 sure I understand what is actually sent over the network then. Does
 that serialization only contain the values that I close over?)


I also had that mental lapse. Serialization refers to converting object
(not class) state, i.e. the current values, into a byte stream, and
de-serialization restores the bytes from the wire into a seemingly identical
object at the receiving side (except for transient variables). For that, the
receiver requires the class definition of that object to know what it needs
to instantiate. So yes, the compiled classes need to be given to the Spark
driver, and it will take care of dispatching them to the workers (much better
than in the old RMI days ;-)
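
(A tiny illustration of that point, with made-up names: the captured value is
the object state that travels with the serialized task, while the generated
closure class itself has to be loadable on the worker.)

  import org.apache.spark.{SparkConf, SparkContext}

  object ClosureDemo {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("closure-demo"))
      val threshold = 10  // captured state: serialized and shipped with each task
      val big = sc.parallelize(1 to 100).filter(x => x > threshold)
      // the lambda compiles to a class like ClosureDemo$$anonfun$1; the worker
      // needs that .class file (i.e. the application jar) to deserialize the task
      println(big.count())
      sc.stop()
    }
  }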


 If I understand correctly what you are saying, then the documentation
 at https://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html
 (list item 8) needs to be extended quite a bit, right?


The mesos docs have been recently updated here:
https://github.com/apache/spark/pull/756/files
Don't know where the latest version from master is built/available.

-kr, Gerard.


Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Andrew Ash
Here's the 1.0.0rc9 version of the docs:
https://people.apache.org/~pwendell/spark-1.0.0-rc9-docs/running-on-mesos.html
I refreshed them with the goal of steering users more towards prebuilt
packages than relying on compiling from source plus improving overall
formatting and clarity, but not otherwise modifying the content. I don't
expect any changes for rc10.

It does seem like an issue, though, that classpath problems are preventing that
from running.  Just to check, have you given the exact same jar a shot when
running against a standalone cluster?  If it works in standalone, I think
that's good evidence that there's an issue with the Mesos classloaders in
master.

I'm running into a similar issue with classpaths failing on Mesos but
working in standalone; I haven't coherently written up my observations
yet, so I haven't gotten them to this list.

I'd almost gotten to the point where I thought that my custom code needed
to be included in the SPARK_EXECUTOR_URI, but that can't possibly be
correct.  The Spark workers that are launched on Mesos slaves should start
with the Spark core jars and then transparently get classes from custom
code over the network, or at least that's how I thought it should work.
For those who have been using Mesos in previous releases, you've never had
to do that before, have you?




On Wed, May 21, 2014 at 3:30 PM, Gerard Maas gerard.m...@gmail.com wrote:

 Hi Tobias,

 On Wed, May 21, 2014 at 5:45 PM, Tobias Pfeiffer t...@preferred.jp wrote:
 first, thanks for your explanations regarding the jar files!
 No prob :-)


 On Thu, May 22, 2014 at 12:32 AM, Gerard Maas gerard.m...@gmail.com
 wrote:
  I was discussing it with my fellow Sparkers here and I totally
 overlooked
  the fact that you need the class files to de-serialize the closures (or
  whatever) on the workers, so you always need the jar file delivered to
 the
  workers in order for it to work.

 So the closure as a function is serialized, sent across the wire,
 deserialized there, and *still* you need the class files? (I am not
 sure I understand what is actually sent over the network then. Does
 that serialization only contain the values that I close over?)


 I also had that mental lapse. Serialization refers to converting object
 (not class) state (current values)  into a byte stream and de-serialization
 restores the bytes from the wire into an seemingly identical object at the
 receiving side (except for transient variables), for that, it requires the
 class definition of that object to know what it needs to instantiate, so
 yes, the compiled classes need to be given to the Spark driver and it will
 take care of dispatching them to the workers (much better than in the old
 RMI days ;-)


 If I understand correctly what you are saying, then the documentation
 at
 https://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html
 (list item 8) needs to be extended quite a bit, right?


 The mesos docs have been recently updated here:
 https://github.com/apache/spark/pull/756/files
 Don't know where the latest version from master is built/available.

 -kr, Gerard.



Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Gerard Maas
Hi Andrew,

Thanks for the current doc.

I'd almost gotten to the point where I thought that my custom code needed
 to be included in the SPARK_EXECUTOR_URI but that can't possibly be
 correct.  The Spark workers that are launched on Mesos slaves should start
 with the Spark core jars and then transparently get classes from custom
 code over the network, or at least that's who I thought it should work.
  For those who have been using Mesos in previous releases, you've never had
 to do that before have you?


Regarding the delivery of the custom job code to Mesos, we have been using
'ADD_JARS' (on the command line) or 'SparkConf.setJars(Seq[String])' with
a fat jar packing all dependencies.
That works on the Spark 'standalone' cluster as well, but we deploy mostly
on Mesos, so I couldn't say much about classloading differences between the two.
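
(A minimal sketch of the setJars variant, with made-up master URL and jar path;
ADD_JARS / --jars on the command line deliver the same jar from outside the
code.)

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setMaster("mesos://my_ip:5050")  // or spark://... for a standalone cluster
    .setAppName("MyJob")
    // ship the assembled fat jar so executors can load the application classes
    .setJars(Seq("/path/to/myjob-assembly-0.1.jar"))
  val sc = new SparkContext(conf)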

-greetz, Gerard.


Re: ClassNotFoundException

2014-05-04 Thread pedro
I just ran into the same problem. I will respond if I find how to fix.


