hmmmmmm would using Kryo help me here? Roughly along the lines of the sketch below, maybe.
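(Totally untested sketch -- assumes Spark 1.1's stock KryoSerializer. MyMessage is my generated protobuf class; MyRegistrator is a class I'd have to write, name is a placeholder:)

    import com.esotericsoftware.kryo.Kryo;
    import org.apache.spark.serializer.KryoRegistrator;

    public class MyRegistrator implements KryoRegistrator {
      @Override
      public void registerClasses(Kryo kryo) {
        // Kryo's default FieldSerializer may not round-trip protobuf
        // messages cleanly; a protobuf-aware serializer (e.g. from
        // twitter/chill-protobuf) might be needed instead of a bare
        // register().
        kryo.register(MyMessage.class);
      }
    }

    // ...and in the driver:
    SparkConf conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator", "MyRegistrator");

Although if Kryo still needs MyMessage on the executor classpath to deserialize, it probably wouldn't dodge the class loader issue at all.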
On Thursday, September 18, 2014, Paul Wais <pw...@yelp.com> wrote:

> Ah, can one NOT create an RDD of any arbitrary Serializable type? It
> looks like I might be getting bitten by the same
> "java.io.ObjectInputStream uses root class loader only" bugs mentioned
> in:
>
>  * http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-td3259.html
>  * https://github.com/apache/spark/pull/181
>  * http://mail-archives.apache.org/mod_mbox/spark-user/201311.mbox/%3c7f6aa9e820f55d4a96946a87e086ef4a4bcdf...@eagh-erfpmbx41.erf.thomson.com%3E
>  * https://groups.google.com/forum/#!topic/spark-users/Q66UOeA2u-I
>
> On Thu, Sep 18, 2014 at 4:51 PM, Paul Wais <pw...@yelp.com> wrote:
> > Well, it looks like Spark is just not loading my code into the
> > driver/executors. E.g.:
> >
> >     JavaRDD<MyMessage> bars = ...;
> >     List<String> foo = bars.map(
> >       new Function<MyMessage, String>() {
> >
> >         // Instance initializer: log the classpath and the jar that
> >         // provides GeneratedMessageLite when this Function is built.
> >         {
> >           System.err.println("classpath: " +
> >               System.getProperty("java.class.path"));
> >
> >           CodeSource src =
> >               com.google.protobuf.GeneratedMessageLite.class
> >                   .getProtectionDomain().getCodeSource();
> >           if (src != null) {
> >             URL jar = src.getLocation();
> >             System.err.println(
> >                 "com.google.protobuf.GeneratedMessageLite from jar: " +
> >                 jar.toString());
> >           }
> >         }
> >
> >         @Override
> >         public String call(MyMessage v1) throws Exception {
> >           return v1.getString();
> >         }
> >       }).collect();
> >
> > prints:
> >
> >     classpath: ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar:/opt/spark/lib/datanucleus-api-jdo-3.2.1.jar:/opt/spark/lib/datanucleus-rdbms-3.2.1.jar:/opt/spark/lib/datanucleus-core-3.2.2.jar
> >     com.google.protobuf.GeneratedMessageLite from jar: file:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar
> >
> > I do see after those lines:
> >
> >     14/09/18 23:28:09 INFO Executor: Adding file:/tmp/spark-cc147338-183f-46f6-b698-5b897e808a08/uber.jar to class loader
> >
> > This is with:
> >
> >     spark-submit --master local --class MyClass --jars uber.jar uber.jar
> >
> > My uber.jar has protobuf 2.5; I expected GeneratedMessageLite would
> > come from there. I'm using Spark 1.1 and Hadoop 2.3; Hadoop 2.3
> > should use protobuf 2.5 [1] and even shade it properly. I've read
> > claims on this list that Spark has shaded protobuf correctly since
> > 0.9.?, and looking through the pom.xml on GitHub it looks like Spark
> > includes protobuf 2.5 in the hadoop-2.3 profile.
> >
> > I guess I'm still at: "What's the deal with getting Spark to
> > distribute and load code from my jar correctly?"
> >
> > [1] http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.3.0/hadoop-project/pom.xml
> >
> > On Thu, Sep 18, 2014 at 1:06 AM, Paul Wais <pw...@yelp.com> wrote:
> >> Dear List,
> >>
> >> I'm writing an application where I have RDDs of protobuf messages.
> >> When I run the app via bin/spark-submit with --master local
> >> --driver-class-path path/to/my/uber.jar, Spark is able to
> >> ser/deserialize the messages correctly.
> >>
> >> However, if I run WITHOUT --driver-class-path path/to/my/uber.jar, or I
> >> try --master spark://my.master:7077, then I run into errors that make
> >> it look like my protobuf message classes are not on the classpath:
> >>
> >>     Exception in thread "main" org.apache.spark.SparkException: Job
> >>     aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most
> >>     recent failure: Lost task 0.0 in stage 1.0 (TID 0, localhost):
> >>     java.lang.RuntimeException: Unable to find proto buffer class
> >>       com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:775)
> >>       sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>       sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >>       sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>       java.lang.reflect.Method.invoke(Method.java:606)
> >>       java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104)
> >>       java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807)
> >>       ...
> >>
> >> Why do I need --driver-class-path in the local scenario? And how can
> >> I ensure my classes are on the classpath no matter how my app is
> >> submitted via bin/spark-submit (e.g. --master spark://my.master:7077)?
> >> I've tried poking through the shell scripts and SparkSubmit.scala,
> >> and unfortunately I haven't been able to grok exactly what Spark is
> >> doing with the remote/local JVMs.
> >>
> >> Cheers,
> >> -Paul
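P.S. For anyone searching the archives later, here are the knobs I've been
poking at to try to force my uber jar onto every JVM's classpath. Untested
sketch; the path is a placeholder:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class MyClass {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("MyClass")
            // Ship the jar to executors (same effect as --jars):
            .setJars(new String[] { "/path/to/my/uber.jar" })
            // Prepend it to the executor classpath:
            .set("spark.executor.extraClassPath", "/path/to/my/uber.jar");
        // Note: the driver's own classpath can't be changed from here --
        // this JVM is already running by now, so --driver-class-path (or
        // spark.driver.extraClassPath in spark-defaults.conf) still seems
        // to be required.
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ...
      }
    }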