Re: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread Lan Jiang
I am happy to report that after setting spark.driver.userClassPathFirst, I can use 
protobuf 3 with spark-shell. Looks like the classloading issue is in the driver, 
not the executor. 

Marcelo, thank you very much for the tip!
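
For anyone else hitting this, the invocation ends up looking roughly like the 
following (the jar name is just a placeholder for your own uber jar):

  spark-shell --jars my-app-uber.jar \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true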

Lan


> On Sep 15, 2015, at 1:40 PM, Marcelo Vanzin  wrote:
> 
> Hi,
> 
> Just "spark.executor.userClassPathFirst" is not enough. You should
> also set "spark.driver.userClassPathFirst". Also not that I don't
> think this was really tested with the shell, but that should work with
> regular apps started using spark-submit.
> 
> If that doesn't work, I'd recommend shading, as others already have.
> 
> On Tue, Sep 15, 2015 at 9:19 AM, Lan Jiang  wrote:
>> I used --conf spark.files.userClassPathFirst=true as a spark-shell
>> option, but it still gave me the error java.lang.NoSuchFieldError: unknownFields
>> if I use protobuf 3.
>> 
>> The output says spark.files.userClassPathFirst is deprecated and suggests
>> using spark.executor.userClassPathFirst. I tried that and it did not work
>> either.
> 
> -- 
> Marcelo


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread Marcelo Vanzin
Hi,

Just "spark.executor.userClassPathFirst" is not enough. You should
also set "spark.driver.userClassPathFirst". Also not that I don't
think this was really tested with the shell, but that should work with
regular apps started using spark-submit.

If that doesn't work, I'd recommend shading, as others already have.
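
For a regular app, that would be something along these lines (class and jar
names are placeholders):

  spark-submit --class com.example.MyApp \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    my-app-uber.jar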

On Tue, Sep 15, 2015 at 9:19 AM, Lan Jiang  wrote:
> I used --conf spark.files.userClassPathFirst=true as a spark-shell
> option, but it still gave me the error java.lang.NoSuchFieldError: unknownFields
> if I use protobuf 3.
>
> The output says spark.files.userClassPathFirst is deprecated and suggests
> using spark.executor.userClassPathFirst. I tried that and it did not work
> either.

-- 
Marcelo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread Steve Loughran


On 15 Sep 2015, at 05:47, Lan Jiang wrote:

Hi, there,

I am using Spark 1.4.1. Protobuf 2.5 is included with Spark 1.4.1 by default. 
However, I would like to use Protobuf 3 in my Spark application so that I can 
use some new features such as Map support. Is there any way to do that?

Right now, if I build an uber jar with dependencies including protobuf 3 classes 
and pass it to spark-shell through the --jars option, I get the error 
java.lang.NoSuchFieldError: unknownFields during execution.


protobuf is an absolute nightmare version-wise, as protoc generates 
incompatible java classes even across point versions. Hadoop 2.2+ is and will 
always be protobuf 2.5 only; that applies transitively to downstream projects  
(the great protobuf upgrade of 2013 was actually pushed by the HBase team, and 
required a co-ordinated change across multiple projects)


Is there any way to use a different version of Protobuf other than the default 
one included in the Spark distribution? I guess I can generalize and extend the 
question to any third party libraries. How do I deal with version conflicts for 
any third party libraries included in the Spark distribution?

maven shading is the strategy. Generally it is less needed, though the 
troublesome binaries, across the entire apache big data stack, are:

google protobuf
google guava
kryo
jackson

you can generally bump up the other versions, at least by point releases.
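
As a rough sketch (the shaded package prefix is just an example), a 
maven-shade-plugin relocation along these lines rewrites the protobuf 3 classes 
bundled in your uber jar into a private namespace, so they no longer clash with 
the protobuf 2.5 that Spark and Hadoop put on the classpath:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
      <execution>
        <phase>package</phase>
        <goals><goal>shade</goal></goals>
        <configuration>
          <relocations>
            <relocation>
              <!-- move com.google.protobuf.* inside the uber jar to a private package -->
              <pattern>com.google.protobuf</pattern>
              <shadedPattern>myapp.shaded.com.google.protobuf</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>
  </plugin>

bear in mind relocation only covers classes your own code references; anything 
Spark or Hadoop load themselves still sees protobuf 2.5.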


Re: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread Lan Jiang
I used --conf spark.files.userClassPathFirst=true as a spark-shell option, but it 
still gave me the error java.lang.NoSuchFieldError: unknownFields if 
I use protobuf 3. 

The output says spark.files.userClassPathFirst is deprecated and suggests using 
spark.executor.userClassPathFirst. I tried that and it did not work either. 

Lan



> On Sep 15, 2015, at 10:31 AM, java8964 <java8...@hotmail.com> wrote:
> 
> If you use Standalone mode, just start spark-shell like the following:
> 
> spark-shell --jars your_uber_jar --conf spark.files.userClassPathFirst=true 
> 
> Yong
> 
> Date: Tue, 15 Sep 2015 09:33:40 -0500
> Subject: Re: Change protobuf version or any other third party library version 
> in Spark application
> From: ljia...@gmail.com
> To: java8...@hotmail.com
> CC: ste...@hortonworks.com; user@spark.apache.org
> 
> Steve,
> 
> Thanks for the input. You are absolutely right. When I use protobuf 2.6.1, I 
> also ran into method not defined errors. You suggest using the Maven shading 
> strategy, but I have already built the uber jar to package all my custom 
> classes and its dependencies including protobuf 3. The problem is how to 
> configure spark-shell to use my uber jar first. 
> 
> java8964 -- appreciate the link and I will try the configuration. Looks 
> promising. However, the "user classpath first" attribute does not apply to 
> spark-shell, am I correct? 
> 
> Lan
> 
> On Tue, Sep 15, 2015 at 8:24 AM, java8964 <java8...@hotmail.com 
> <mailto:java8...@hotmail.com>> wrote:
> It is a bad idea to change the major version of protobuf, as it most 
> likely won't work.
> 
> But if you really want to give it a try, set the "user classpath first" option, so the 
> protobuf 3 coming with your jar will be used.
> 
> The setting depends on your deployment mode, check this for the parameter:
> 
> https://issues.apache.org/jira/browse/SPARK-2996 
> <https://issues.apache.org/jira/browse/SPARK-2996>
> 
> Yong
> 
> Subject: Re: Change protobuf version or any other third party library version 
> in Spark application
> From: ste...@hortonworks.com <mailto:ste...@hortonworks.com>
> To: ljia...@gmail.com <mailto:ljia...@gmail.com>
> CC: user@spark.apache.org <mailto:user@spark.apache.org>
> Date: Tue, 15 Sep 2015 09:19:28 +
> 
> 
> 
> 
> On 15 Sep 2015, at 05:47, Lan Jiang <ljia...@gmail.com 
> <mailto:ljia...@gmail.com>> wrote:
> 
> Hi, there,
> 
> I am using Spark 1.4.1. Protobuf 2.5 is included with Spark 1.4.1 by 
> default. However, I would like to use Protobuf 3 in my Spark application so 
> that I can use some new features such as Map support. Is there any way to do 
> that? 
> 
> Right now, if I build an uber jar with dependencies including protobuf 3 
> classes and pass it to spark-shell through the --jars option, I get the error 
> java.lang.NoSuchFieldError: unknownFields during execution. 
> 
> 
> protobuf is an absolute nightmare version-wise, as protoc generates 
> incompatible java classes even across point versions. Hadoop 2.2+ is and will 
> always be protobuf 2.5 only; that applies transitively to downstream projects 
>  (the great protobuf upgrade of 2013 was actually pushed by the HBase team, 
> and required a co-ordinated change across multiple projects)
> 
> 
> Is there any way to use a different version of Protobuf other than the default 
> one included in the Spark distribution? I guess I can generalize and extend 
> the question to any third party libraries. How do I deal with version conflicts 
> for any third party libraries included in the Spark distribution? 
> 
> maven shading is the strategy. Generally it is less needed, though the 
> troublesome binaries, across the entire apache big data stack, are:
> 
> google protobuf
> google guava
> kryo
> jackson
> 
> you can generally bump up the other versions, at least by point releases.



Re: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread Guru Medasani
Hi Lan,

Reading the pull request below, it looks like you should be able to apply the config 
to both drivers and executors. I would give it a try with spark-shell in YARN 
client mode.

https://github.com/apache/spark/pull/3233 
<https://github.com/apache/spark/pull/3233>

Yarn's config option spark.yarn.user.classpath.first does not work the same way 
as
spark.files.userClassPathFirst; Yarn's version is a lot more dangerous, in that 
it
modifies the system classpath, instead of restricting the changes to the user's 
class
loader. So this change implements the behavior of the latter for Yarn, and 
deprecates
the more dangerous choice.

To be able to achieve feature-parity, I also implemented the option for drivers 
(the existing
option only applies to executors). So now there are two options, each 
controlling whether
to apply userClassPathFirst to the driver or executors. The old option was 
deprecated, and
aliased to the new one (spark.executor.userClassPathFirst).

The existing "child-first" class loader also had to be fixed. It didn't handle 
resources, and it
was also doing some things that ended up causing JVM errors depending on how 
things
were being called.
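
Something along these lines should exercise both settings in YARN client mode 
(the jar name is a placeholder):

  spark-shell --master yarn-client \
    --jars your_uber_jar \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true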


Guru Medasani
gdm...@gmail.com



> On Sep 15, 2015, at 9:33 AM, Lan Jiang <ljia...@gmail.com> wrote:
> 
> Steve,
> 
> Thanks for the input. You are absolutely right. When I use protobuf 2.6.1, I 
> also ran into method not defined errors. You suggest using the Maven shading 
> strategy, but I have already built the uber jar to package all my custom 
> classes and its dependencies including protobuf 3. The problem is how to 
> configure spark-shell to use my uber jar first. 
> 
> java8964 -- appreciate the link and I will try the configuration. Looks 
> promising. However, the "user classpath first" attribute does not apply to 
> spark-shell, am I correct? 
> 
> Lan
> 
> On Tue, Sep 15, 2015 at 8:24 AM, java8964 <java8...@hotmail.com 
> <mailto:java8...@hotmail.com>> wrote:
> It is a bad idea to change the major version of protobuf, as it most 
> likely won't work.
> 
> But if you really want to give it a try, set the "user classpath first" option, so the 
> protobuf 3 coming with your jar will be used.
> 
> The setting depends on your deployment mode, check this for the parameter:
> 
> https://issues.apache.org/jira/browse/SPARK-2996 
> <https://issues.apache.org/jira/browse/SPARK-2996>
> 
> Yong
> 
> Subject: Re: Change protobuf version or any other third party library version 
> in Spark application
> From: ste...@hortonworks.com <mailto:ste...@hortonworks.com>
> To: ljia...@gmail.com <mailto:ljia...@gmail.com>
> CC: user@spark.apache.org <mailto:user@spark.apache.org>
> Date: Tue, 15 Sep 2015 09:19:28 +
> 
> 
> 
> 
> On 15 Sep 2015, at 05:47, Lan Jiang <ljia...@gmail.com 
> <mailto:ljia...@gmail.com>> wrote:
> 
> Hi, there,
> 
> I am using Spark 1.4.1. Protobuf 2.5 is included with Spark 1.4.1 by 
> default. However, I would like to use Protobuf 3 in my Spark application so 
> that I can use some new features such as Map support. Is there any way to do 
> that? 
> 
> Right now, if I build an uber jar with dependencies including protobuf 3 
> classes and pass it to spark-shell through the --jars option, I get the error 
> java.lang.NoSuchFieldError: unknownFields during execution. 
> 
> 
> protobuf is an absolute nightmare version-wise, as protoc generates 
> incompatible java classes even across point versions. Hadoop 2.2+ is and will 
> always be protobuf 2.5 only; that applies transitively to downstream projects 
>  (the great protobuf upgrade of 2013 was actually pushed by the HBase team, 
> and required a co-ordinated change across multiple projects)
> 
> 
> Is there any way to use a different version of Protobuf other than the default 
> one included in the Spark distribution? I guess I can generalize and extend 
> the question to any third party libraries. How do I deal with version conflicts 
> for any third party libraries included in the Spark distribution? 
> 
> maven shading is the strategy. Generally it is less needed, though the 
> troublesome binaries, across the entire apache big data stack, are:
> 
> google protobuf
> google guava
> kryo
> jackson
> 
> you can generally bump up the other versions, at least by point releases.
> 



RE: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread java8964
If you use Standalone mode, just start spark-shell like the following:
spark-shell --jars your_uber_jar --conf spark.files.userClassPathFirst=true 
Yong
Date: Tue, 15 Sep 2015 09:33:40 -0500
Subject: Re: Change protobuf version or any other third party library version 
in Spark application
From: ljia...@gmail.com
To: java8...@hotmail.com
CC: ste...@hortonworks.com; user@spark.apache.org

Steve,
Thanks for the input. You are absolutely right. When I use protobuf 2.6.1, I 
also ran into method not defined errors. You suggest using the Maven shading 
strategy, but I have already built the uber jar to package all my custom 
classes and its dependencies including protobuf 3. The problem is how to 
configure spark-shell to use my uber jar first. 
java8964 -- appreciate the link and I will try the configuration. Looks 
promising. However, the "user classpath first" attribute does not apply to 
spark-shell, am I correct? 

Lan
On Tue, Sep 15, 2015 at 8:24 AM, java8964 <java8...@hotmail.com> wrote:



It is a bad idea to change the major version of protobuf, as it most likely 
won't work.
But if you really want to give it a try, set the "user classpath first" option, so the 
protobuf 3 coming with your jar will be used.
The setting depends on your deployment mode, check this for the parameter:
https://issues.apache.org/jira/browse/SPARK-2996
Yong

Subject: Re: Change protobuf version or any other third party library version 
in Spark application
From: ste...@hortonworks.com
To: ljia...@gmail.com
CC: user@spark.apache.org
Date: Tue, 15 Sep 2015 09:19:28 +

On 15 Sep 2015, at 05:47, Lan Jiang <ljia...@gmail.com> wrote:

Hi, there,

I am using Spark 1.4.1. Protobuf 2.5 is included with Spark 1.4.1 by default. 
However, I would like to use Protobuf 3 in my Spark application so that I can 
use some new features such as Map support. Is there any way to do that? 

Right now, if I build an uber jar with dependencies including protobuf 3 classes 
and pass it to spark-shell through the --jars option, I get the error 
java.lang.NoSuchFieldError: unknownFields during execution. 

protobuf is an absolute nightmare version-wise, as protoc generates 
incompatible java classes even across point versions. Hadoop 2.2+ is and will 
always be protobuf 2.5 only; that applies transitively to downstream projects 
(the great protobuf upgrade of 2013 was actually pushed by the HBase team, and 
required a co-ordinated change across multiple projects)

Is there any way to use a different version of Protobuf other than the default 
one included in the Spark distribution? I guess I can generalize and extend the 
question to any third party libraries. How do I deal with version conflicts for 
any third party libraries included in the Spark distribution? 

maven shading is the strategy. Generally it is less needed, though the 
troublesome binaries, across the entire apache big data stack, are:

google protobuf
google guava
kryo
jackson

you can generally bump up the other versions, at least by point releases.

Re: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread Lan Jiang
Steve,

Thanks for the input. You are absolutely right. When I use protobuf 2.6.1,
I also ran into method not defined errors. You suggest using the Maven shading
strategy, but I have already built the uber jar to package all my custom
classes and its dependencies including protobuf 3. The problem is how to
configure spark-shell to use my uber jar first.

java8964 -- appreciate the link and I will try the configuration. Looks
promising. However, the "user classpath first" attribute does not apply to
spark-shell, am I correct?

Lan

On Tue, Sep 15, 2015 at 8:24 AM, java8964 <java8...@hotmail.com> wrote:

> It is a bad idea to change the major version of protobuf, as it most
> likely won't work.
>
> But if you really want to give it a try, set the "user classpath first" option, so
> the protobuf 3 coming with your jar will be used.
>
> The setting depends on your deployment mode, check this for the parameter:
>
> https://issues.apache.org/jira/browse/SPARK-2996
>
> Yong
>
> ----------
> Subject: Re: Change protobuf version or any other third party library
> version in Spark application
> From: ste...@hortonworks.com
> To: ljia...@gmail.com
> CC: user@spark.apache.org
> Date: Tue, 15 Sep 2015 09:19:28 +
>
>
>
>
> On 15 Sep 2015, at 05:47, Lan Jiang <ljia...@gmail.com> wrote:
>
> Hi, there,
>
> I am using Spark 1.4.1. Protobuf 2.5 is included with Spark 1.4.1 by
> default. However, I would like to use Protobuf 3 in my Spark application so
> that I can use some new features such as Map support. Is there any way to
> do that?
>
> Right now, if I build an uber jar with dependencies including protobuf 3
> classes and pass it to spark-shell through the --jars option, I get the
> error java.lang.NoSuchFieldError: unknownFields during execution.
>
>
>
> protobuf is an absolute nightmare version-wise, as protoc generates
> incompatible java classes even across point versions. Hadoop 2.2+ is and
> will always be protobuf 2.5 only; that applies transitively to downstream
> projects  (the great protobuf upgrade of 2013 was actually pushed by the
> HBase team, and required a co-ordinated change across multiple projects)
>
>
> Is there any way to use a different version of Protobuf other than the
> default one included in the Spark distribution? I guess I can generalize
> and extend the question to any third party libraries. How do I deal with
> version conflicts for any third party libraries included in the Spark
> distribution?
>
>
> maven shading is the strategy. Generally it is less needed, though the
> troublesome binaries, across the entire apache big data stack, are:
>
> google protobuf
> google guava
> kryo
> jackson
>
> you can generally bump up the other versions, at least by point releases.
>


RE: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread java8964
It is a bad idea to change the major version of protobuf, as it most likely 
won't work.
But if you really want to give it a try, set the "user classpath first" option, so the 
protobuf 3 coming with your jar will be used.
The setting depends on your deployment mode, check this for the parameter:
https://issues.apache.org/jira/browse/SPARK-2996
Yong

Subject: Re: Change protobuf version or any other third party library version 
in Spark application
From: ste...@hortonworks.com
To: ljia...@gmail.com
CC: user@spark.apache.org
Date: Tue, 15 Sep 2015 09:19:28 +

On 15 Sep 2015, at 05:47, Lan Jiang <ljia...@gmail.com> wrote:


Hi, there,



I am using Spark 1.4.1. Protobuf 2.5 is included with Spark 1.4.1 by default. 
However, I would like to use Protobuf 3 in my Spark application so that I can 
use some new features such as Map support. Is there any way to do that? 



Right now, if I build an uber jar with dependencies including protobuf 3 classes 
and pass it to spark-shell through the --jars option, I get the error 
java.lang.NoSuchFieldError: unknownFields during execution. 

protobuf is an absolute nightmare version-wise, as protoc generates 
incompatible java classes even across point versions. Hadoop 2.2+ is and will 
always be protobuf 2.5 only; that applies transitively to downstream projects 
(the great protobuf upgrade of 2013 was actually pushed by the HBase team, and 
required a co-ordinated change across multiple projects)

Is there any way to use a different version of Protobuf other than the default 
one included in the Spark distribution? I guess I can generalize and extend the 
question to any third party libraries. How do I deal with version conflicts for 
any third party libraries included in the Spark distribution? 

maven shading is the strategy. Generally it is less needed, though the 
troublesome binaries, across the entire apache big data stack, are:

google protobuf
google guava
kryo
jackson

you can generally bump up the other versions, at least by point releases.

Change protobuf version or any other third party library version in Spark application

2015-09-14 Thread Lan Jiang
Hi, there,

I am using Spark 1.4.1. Protobuf 2.5 is included with Spark 1.4.1 by
default. However, I would like to use Protobuf 3 in my Spark application so
that I can use some new features such as Map support. Is there any way to
do that?

Right now, if I build an uber jar with dependencies including protobuf 3
classes and pass it to spark-shell through the --jars option, I get the
error java.lang.NoSuchFieldError: unknownFields during execution.

Is there any way to use a different version of Protobuf other than the
default one included in the Spark distribution? I guess I can generalize
and extend the question to any third party libraries. How do I deal with
version conflicts for any third party libraries included in the Spark
distribution?

Thanks!

Lan