Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread bo yang
Hi Corey, I see. Thanks for making it clear. I may just be lucky enough not to hit the code paths of those Guava classes. But I did hit other jar conflicts when using Spark to connect to AWS S3, and had to manually try each version of org.apache.httpcomponents until I found a proper old one. Another
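A minimal sbt sketch of the kind of manual pin described above; the httpclient version shown is a placeholder, since the message does not say which old version finally worked:

    // build.sbt: force one httpclient version across the whole dependency tree.
    // 4.2.6 is a placeholder; substitute whatever version your S3 client expects.
    dependencyOverrides += "org.apache.httpcomponents" % "httpclient" % "4.2.6"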

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
Hi Corey, When you run on YARN, YARN's libraries are placed on the classpath, and they take precedence over your app's. So with Spark 1.2 you'll get Guava 11 on your classpath (with Spark 1.1 and earlier you'd get Guava 14 from Spark, so still a problem for you). Right now, the option Markus
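A quick way to see whose Guava copy actually won, using only standard JVM APIs (run in a Scala shell on the driver, or inside an executor task):

    // Prints the jar a Guava class was loaded from; getCodeSource can be
    // null for classes on the bootstrap classpath.
    val src = classOf[com.google.common.base.Optional]
      .getProtectionDomain.getCodeSource
    println(if (src == null) "bootstrap classpath" else src.getLocation)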

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Koert Kuipers
The whole spark.files.userClassPathFirst thing never really worked for me in standalone mode: jars were added dynamically, which meant they had different classloaders, leading to real classloader hell if you tried to add a newer version of a jar that Spark already used. See:
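The mechanism behind that classloader hell can be seen with plain JVM classloaders; in this sketch the jar path is a placeholder, and the point is that the same class file loaded by two loaders produces two incompatible classes:

    import java.net.{URL, URLClassLoader}

    // Placeholder path; any jar containing the class will do.
    val jar = new URL("file:/path/to/guava-15.0.jar")
    val a = new URLClassLoader(Array(jar), null)
    val b = new URLClassLoader(Array(jar), null)
    val ca = a.loadClass("com.google.common.base.Optional")
    val cb = b.loadClass("com.google.common.base.Optional")
    println(ca == cb) // false: same bytes, but two distinct classes to the JVM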

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Corey Nolet
My mistake, Marcelo; I was looking at the wrong message. That reply was meant for bo yang. On Feb 4, 2015 4:02 PM, Marcelo Vanzin van...@cloudera.com wrote: Hi Corey, On Wed, Feb 4, 2015 at 12:44 PM, Corey Nolet cjno...@gmail.com wrote: Another suggestion is to build Spark yourself.

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Koert Kuipers
Marcelo, I was not aware of those fixes. It's a full-time job to keep up with Spark... I will take another look. It would be great if that works on Spark standalone as well and resolves the issues I experienced before. About putting stuff on the classpath before Spark or YARN... yeah, you can shoot

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
Hi Koert, On Wed, Feb 4, 2015 at 11:35 AM, Koert Kuipers ko...@tresata.com wrote: Do I understand it correctly that on YARN the customer jars are truly placed before the YARN and Spark jars on the classpath? Meaning at container construction time, in the same classloader? That would be great

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
On Wed, Feb 4, 2015 at 1:12 PM, Koert Kuipers ko...@tresata.com wrote: About putting stuff on the classpath before Spark or YARN... yeah, you can shoot yourself in the foot with it, but since the container is isolated it should be OK, no? We have been using HADOOP_USER_CLASSPATH_FIRST forever with

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Koert Kuipers
I have never been a big fan of shading, since it leads to the same library being packaged many times. For example, it's not unusual to have ASM ten times in a jar because of the shading policy they promote, and all that because they broke one signature, without a really good reason. I am a big
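For context, "shading" here means relocating a library's packages inside your own artifact. A minimal sketch with the sbt-assembly plugin's ShadeRule API (assumes sbt-assembly is on the build; the target package name is illustrative):

    // build.sbt: rename Guava's packages inside the assembled fat jar so the
    // application's copy cannot collide with the cluster's.
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "myapp.shaded.guava.@1").inAll
    )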

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
Hi Corey, On Wed, Feb 4, 2015 at 12:44 PM, Corey Nolet cjno...@gmail.com wrote: Another suggestion is to build Spark yourself. I'm having trouble seeing what you mean here, Marcelo. Guava is already shaded to a different package in the 1.2.0 release; it shouldn't be causing conflicts.

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Koert Kuipers
Anyhow, I am ranting... sorry. On Wed, Feb 4, 2015 at 5:54 PM, Koert Kuipers ko...@tresata.com wrote: Yeah, I think we have been lucky so far, but I don't really see how I have a choice. It would be fine if, say, Hadoop exposed a very small set of libraries as part of the classpath. But if I look

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Corey Nolet
Bo Yang, I am using Spark 1.2.0, and undoubtedly there are older Guava classes being picked up and serialized with the closures sent from the driver to the executors; the class serialVersionUIDs don't match between the driver and the executors. Have you tried doing
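A simple way to test the mismatch Corey suspects, assuming the snippet can be run on both the driver and an executor; the Guava class chosen is just an example:

    import java.io.ObjectStreamClass

    // Run on the driver and on an executor; differing values confirm that
    // two Guava versions are in play.
    val uid = ObjectStreamClass
      .lookup(classOf[com.google.common.collect.ImmutableList[_]])
      .getSerialVersionUID
    println(s"ImmutableList serialVersionUID = $uid")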

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Koert Kuipers
Yeah, I think we have been lucky so far, but I don't really see how I have a choice. It would be fine if, say, Hadoop exposed a very small set of libraries as part of the classpath. But if I look at the jars on the Hadoop classpath, it's a ton! And why? Why does Parquet need to be included with Hadoop for

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-03 Thread M. Dale
Try spark.yarn.user.classpath.first (see https://issues.apache.org/jira/browse/SPARK-2996 - only works for YARN). See also the thread at http://apache-spark-user-list.1001560.n3.nabble.com/netty-on-classpath-when-using-spark-submit-td18030.html. HTH, Markus On 02/03/2015 11:20 PM, Corey Nolet wrote:
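A minimal sketch of setting that flag programmatically with Spark 1.2-era APIs; it can equally be set in spark-defaults.conf, and per SPARK-2996 it only takes effect on YARN:

    import org.apache.spark.{SparkConf, SparkContext}

    // Ask YARN containers to put the user's jars ahead of the cluster's.
    val conf = new SparkConf()
      .setAppName("classpath-first-example")
      .set("spark.yarn.user.classpath.first", "true")
    val sc = new SparkContext(conf)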

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-03 Thread bo yang
Corey, Which version of Spark are you using? I am on Spark 1.2.0 with Guava 15.0, and it seems fine. Best, Bo On Tue, Feb 3, 2015 at 8:56 PM, M. Dale medal...@yahoo.com.invalid wrote: Try spark.yarn.user.classpath.first (see https://issues.apache.org/jira/browse/SPARK-2996 - only works for

“mapreduce.job.user.classpath.first” for Spark

2015-02-03 Thread Corey Nolet
I'm having a really bad dependency conflict right now between the Guava version in my Spark application on YARN and (I believe) the one from Hadoop. The problem is, my driver has the Guava version my application expects (15.0), while it appears the Spark executors that are working on my