Well, it looks like Spark is just not loading my code into the driver/executors.... E.g.:
// bars is a JavaRDD<MyMessage>
List<String> foo = bars.map(new Function<MyMessage, String>() {
  {
    // instance initializer: log the classpath and where protobuf actually comes from
    System.err.println("classpath: " + System.getProperty("java.class.path"));
    CodeSource src =
        com.google.protobuf.GeneratedMessageLite.class.getProtectionDomain().getCodeSource();
    if (src != null) {
      URL jar = src.getLocation();
      System.err.println(
          "com.google.protobuf.GeneratedMessageLite from jar: " + jar.toString());
    }
  }

  @Override
  public String call(MyMessage v1) throws Exception {
    return v1.getString();
  }
}).collect();

prints:

classpath: ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar:/opt/spark/lib/datanucleus-api-jdo-3.2.1.jar:/opt/spark/lib/datanucleus-rdbms-3.2.1.jar:/opt/spark/lib/datanucleus-core-3.2.2.jar
com.google.protobuf.GeneratedMessageLite from jar: file:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar

I do see, after those lines:

14/09/18 23:28:09 INFO Executor: Adding file:/tmp/spark-cc147338-183f-46f6-b698-5b897e808a08/uber.jar to class loader

This is with:

spark-submit --master local --class MyClass --jars uber.jar uber.jar

My uber.jar has protobuf 2.5; I expected GeneratedMessageLite to come from there. I'm using Spark 1.1 and Hadoop 2.3; Hadoop 2.3 should use protobuf 2.5 [1] and even shade it properly. I've read claims on this list that Spark shades protobuf correctly since 0.9.?, and looking through the pom.xml on GitHub it looks like Spark includes protobuf 2.5 in the hadoop-2.3 profile.

So I guess I'm still at "What's the deal with getting Spark to distribute and load code from my jar correctly?" (A small class-loader probe I've been running is sketched below, after the quoted message.)

[1] http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.3.0/hadoop-project/pom.xml

On Thu, Sep 18, 2014 at 1:06 AM, Paul Wais <pw...@yelp.com> wrote:
> Dear List,
>
> I'm writing an application where I have RDDs of protobuf messages.
> When I run the app via bin/spark-submit with --master local
> --driver-class-path path/to/my/uber.jar, Spark is able to
> ser/deserialize the messages correctly.
>
> However, if I run WITHOUT --driver-class-path path/to/my/uber.jar, or I
> try --master spark://my.master:7077, then I run into errors that make
> it look like my protobuf message classes are not on the classpath:
>
> Exception in thread "main" org.apache.spark.SparkException: Job
> aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most
> recent failure: Lost task 0.0 in stage 1.0 (TID 0, localhost):
> java.lang.RuntimeException: Unable to find proto buffer class
>     com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:775)
>     sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     java.lang.reflect.Method.invoke(Method.java:606)
>     java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104)
>     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807)
>     ...
>
> Why do I need --driver-class-path in the local scenario? And how can
> I ensure my classes are on the classpath no matter how my app is
> submitted via bin/spark-submit (e.g. --master spark://my.master:7077)?
> I've tried poking through the shell scripts and SparkSubmit.scala,
> and unfortunately I haven't been able to grok exactly what Spark is
> doing with the remote/local JVMs.
>
> Cheers,
> -Paul
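
P.S. For anyone trying to reproduce this, here is roughly the probe I've been running inside the closure to compare class loaders on the executors. It's a minimal sketch rather than my actual app code: MyMessage and bars are from my app, while ClassLoaderProbe and whereLoaded() are throwaway names I made up just for this example.

import java.net.URL;
import java.security.CodeSource;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

public class ClassLoaderProbe {

  // Report which jar (if any) a class was loaded from, and by which loader.
  static String whereLoaded(Class<?> cls) {
    CodeSource src = cls.getProtectionDomain().getCodeSource();
    URL jar = (src == null) ? null : src.getLocation();
    return cls.getName() + " <- "
        + (jar == null ? "(no code source)" : jar.toString())
        + " via " + cls.getClassLoader();
  }

  // Compare a compile-time class literal (resolved by the loader that defined
  // the closure class) against a lookup through the thread context class
  // loader, which (as far as I can tell) is where the executor registers jars
  // shipped via --jars.
  static List<String> probe(JavaRDD<MyMessage> bars) {
    return bars.map(new Function<MyMessage, String>() {
      @Override
      public String call(MyMessage v1) throws Exception {
        Class<?> viaLiteral = com.google.protobuf.GeneratedMessageLite.class;
        Class<?> viaContext = Class.forName(
            "com.google.protobuf.GeneratedMessageLite",
            false,
            Thread.currentThread().getContextClassLoader());
        return whereLoaded(viaLiteral) + "\n" + whereLoaded(viaContext);
      }
    }).collect();
  }
}

If the two lines disagree, that would at least confirm that the copy baked into the spark-assembly jar is shadowing the one in uber.jar for anything resolved directly by the task code, even though the executor did add uber.jar to its class loader.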