Well, it looks like Spark is just not loading my code into the
driver/executors. For example:

// requires: java.net.URL, java.security.CodeSource,
//           org.apache.spark.api.java.function.Function
// bars is a JavaRDD<MyMessage>
List<String> foo = bars.map(
    new Function<MyMessage, String>() {

    // Instance initializer: runs when this anonymous Function is
    // constructed, printing where the JVM found GeneratedMessageLite.
    {
        System.err.println("classpath: " +
            System.getProperty("java.class.path"));

        CodeSource src = com.google.protobuf.GeneratedMessageLite.class
            .getProtectionDomain().getCodeSource();
        if (src != null) {
            URL jar = src.getLocation();
            System.err.println(
                "com.google.protobuf.GeneratedMessageLite from jar: " + jar);
        }
    }

    @Override
    public String call(MyMessage v1) throws Exception {
        return v1.getString();
    }
}).collect();

prints:
classpath: 
::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar:/opt/spark/lib/datanucleus-api-jdo-3.2.1.jar:/opt/spark/lib/datanucleus-rdbms-3.2.1.jar:/opt/spark/lib/datanucleus-core-3.2.2.jar
com.google.protobuf.GeneratedMessageLite from jar:
file:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar

I do see after those lines:
14/09/18 23:28:09 INFO Executor: Adding
file:/tmp/spark-cc147338-183f-46f6-b698-5b897e808a08/uber.jar to class
loader
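
If I understand the executor internals right, that log line is the
tell: jars passed via --jars never land on java.class.path at all; the
executor feeds them to a URLClassLoader at runtime, and that loader
delegates parent-first, so any class already resolvable from the
assembly's system classpath wins. A minimal probe sketch, on the
assumption that Spark points the task thread's context classloader at
that runtime loader:

// Probe sketch (assumption: Spark sets the task thread's context
// classloader to the loader that received the --jars entries).
// getResource() delegates parent-first, so if this prints the
// spark-assembly jar, the assembly's protobuf is shadowing uber.jar's.
ClassLoader ctx = Thread.currentThread().getContextClassLoader();
java.net.URL res =
    ctx.getResource("com/google/protobuf/GeneratedMessageLite.class");
System.err.println("GeneratedMessageLite resolves from: " + res);

The 1.1 docs also list an experimental spark.files.userClassPathFirst
flag that is supposed to flip that delegation order, though I haven't
confirmed it helps here.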


This is with:

spark-submit --master local --class MyClass --jars uber.jar  uber.jar
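
Since --jars evidently only reaches that runtime class loader, a
variant worth trying (a sketch; paths are illustrative, and
spark.executor.extraClassPath expects the jar to already sit at that
path on each worker) is:

spark-submit --master local --class MyClass \
  --driver-class-path /path/to/uber.jar \
  --conf spark.executor.extraClassPath=/path/to/uber.jar \
  --jars uber.jar \
  uber.jar

That should put the jar on the JVM system classpath up front instead of
relying on the executor's runtime class loader.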


My uber.jar bundles protobuf 2.5, so I expected GeneratedMessageLite to
come from there.  I'm using Spark 1.1 and Hadoop 2.3; Hadoop 2.3 should
use protobuf 2.5 [1] and even shade it properly.  I've read claims on
this list that Spark has shaded protobuf correctly since 0.9.?, and
looking through the pom.xml on GitHub it appears Spark includes
protobuf 2.5 in the hadoop-2.3 profile.
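
If the assembly's copy is always going to win, one escape hatch (again
just a sketch; I haven't verified it against this exact failure) is to
relocate protobuf inside the uber jar with maven-shade-plugin, so my
generated classes reference a package the assembly can't shadow. The
shadedPattern below is an arbitrary example:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- rewrite bytecode references so the uber jar's protobuf
                 copy cannot collide with the copy in spark-assembly -->
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>shaded.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>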


I guess I'm still at "What's the deal with getting Spark to distribute
and load code from my jar correctly?"


[1] 
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.3.0/hadoop-project/pom.xml

On Thu, Sep 18, 2014 at 1:06 AM, Paul Wais <pw...@yelp.com> wrote:
> Dear List,
>
> I'm writing an application where I have RDDs of protobuf messages.
> When I run the app via bin/spark-submit with --master local
> --driver-class-path path/to/my/uber.jar, Spark is able to
> ser/deserialize the messages correctly.
>
> However, if I run WITHOUT --driver-class-path path/to/my/uber.jar or I
> try --master spark://my.master:7077 , then I run into errors that make
> it look like my protobuf message classes are not on the classpath:
>
> Exception in thread "main" org.apache.spark.SparkException: Job
> aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most
> recent failure: Lost task 0.0 in stage 1.0 (TID 0, localhost):
> java.lang.RuntimeException: Unable to find proto buffer class
>         
> com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:775)
>         sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         java.lang.reflect.Method.invoke(Method.java:606)
>         
> java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104)
>         
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807)
>         ...
>
> Why do I need --driver-class-path in the local scenario?  And how can
> I ensure my classes are on the classpath no matter how my app is
> submitted via bin/spark-submit (e.g. --master spark://my.master:7077 )
> ?  I've tried poking through the shell scripts and SparkSubmit.scala
> and unfortunately I haven't been able to grok exactly what Spark is
> doing with the remote/local JVMs.
>
> Cheers,
> -Paul
