If you use Standalone mode, just start spark-shell like this:

    spark-shell --jars your_uber_jar --conf spark.files.userClassPathFirst=true

Yong

Date: Tue, 15 Sep 2015 09:33:40 -0500
Subject: Re: Change protobuf version or any other third party library version in Spark application
From: ljia...@gmail.com
To: java8...@hotmail.com
CC: ste...@hortonworks.com; user@spark.apache.org

Steve,

Thanks for the input. You are absolutely right. When I used protobuf 2.6.1, I also ran into method not defined errors. You suggest using the Maven shading strategy, but I have already built the uber jar to package all my custom classes and their dependencies, including protobuf 3. The problem is how to configure spark-shell to use my uber jar first.

java8964 -- appreciate the link and I will try the configuration. Looks promising. However, the "user classpath first" attribute does not apply to spark-shell, am I correct?

Lan

On Tue, Sep 15, 2015 at 8:24 AM, java8964 <java8...@hotmail.com> wrote:

It is a bad idea to jump to a new major version of protobuf, as it most likely won't work. But if you really want to give it a try, set "user classpath first", so the protobuf 3 coming with your jar will be used.

The setting depends on your deployment mode; check this for the parameter:
https://issues.apache.org/jira/browse/SPARK-2996

Yong

Subject: Re: Change protobuf version or any other third party library version in Spark application
From: ste...@hortonworks.com
To: ljia...@gmail.com
CC: user@spark.apache.org
Date: Tue, 15 Sep 2015 09:19:28 +0000

On 15 Sep 2015, at 05:47, Lan Jiang <ljia...@gmail.com> wrote:

> Hi, there,
>
> I am using Spark 1.4.1. Protobuf 2.5 is included by Spark 1.4.1 by default. However, I would like to use Protobuf 3 in my Spark application so that I can use some new features such as Map support. Is there any way to do that?
>
> Right now, if I build an uber.jar with dependencies including the protobuf 3 classes and pass it to spark-shell through the --jars option, I get this error during execution: java.lang.NoSuchFieldError: unknownFields.

protobuf is an absolute nightmare version-wise, as protoc generates incompatible Java classes even across point versions.
Hadoop 2.2+ is and will always be protobuf 2.5 only; that applies transitively to downstream projects. (The great protobuf upgrade of 2013 was actually pushed by the HBase team, and required a co-ordinated change across multiple projects.)

> Is there any way to use a different version of Protobuf other than the default one included in the Spark distribution? I guess I can generalize and extend the question to any third party libraries. How to deal with version conflict for any third party libraries included in the Spark distribution?

Maven shading is the strategy. Generally it is less needed, though; the troublesome binaries, across the entire Apache big data stack, are:

- google protobuf
- google guava
- kryo
- jackson

You can generally bump up the other versions, at least by point releases.
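
For the spark-shell question raised in this thread, here is a minimal sketch of a launcher script, not a tested recipe. It assumes Spark 1.3 or later, where (as far as I know) spark.files.userClassPathFirst is a deprecated alias for spark.executor.userClassPathFirst, an experimental setting. The companion spark.driver.userClassPathFirst is documented as applying in cluster mode only, so it is left out here and the driver side of spark-shell may still see Spark's bundled protobuf. The UBER_JAR and RUN_SPARK_SHELL variables are hypothetical names introduced for this example; your_uber_jar is the placeholder from the thread.

```shell
#!/bin/sh
# Placeholder jar path from the thread; override with UBER_JAR=/path/to/jar.
UBER_JAR="${UBER_JAR:-your_uber_jar}"

# Build the argument list as positional parameters so a jar path
# containing spaces is passed through intact.
set -- --jars "$UBER_JAR" --conf spark.executor.userClassPathFirst=true

# Show the command that would be run.
CMD_PREVIEW="spark-shell $*"
echo "$CMD_PREVIEW"

# Launch only when explicitly requested (RUN_SPARK_SHELL=1, a hypothetical
# switch for this sketch) and spark-shell is actually on the PATH.
if [ "${RUN_SPARK_SHELL:-0}" = "1" ] && command -v spark-shell >/dev/null 2>&1; then
  exec spark-shell "$@"
fi
```

Run it as is to preview the command, or with RUN_SPARK_SHELL=1 to start the shell. If NoSuchFieldError still appears, shading (relocating) the protobuf 3 packages inside the uber jar, as suggested above, avoids depending on class loading order at all.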