Forwarding conversation below that didn't make it to the list.

---------- Forwarded message ----------
From: Jim Kleckner <j...@cloudphysics.com>
Date: Wed, Feb 25, 2015 at 8:42 PM
Subject: Re: Spark excludes "fastutil" dependencies we need
To: Ted Yu <yuzhih...@gmail.com>
Cc: Sean Owen <so...@cloudera.com>, user <user@spark.apache.org>


Replies inline.

On Wed, Feb 25, 2015 at 1:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Interesting. Looking at SparkConf.scala :
>
>     val configs = Seq(
>       DeprecatedConfig("spark.files.userClassPathFirst", "spark.executor.userClassPathFirst",
>         "1.3"),
>       DeprecatedConfig("spark.yarn.user.classpath.first", null, "1.3",
>         "Use spark.{driver,executor}.userClassPathFirst instead."))
>
> It seems spark.files.userClassPathFirst and spark.yarn.user.classpath.first
> are deprecated.
>

Note that I did use the non-deprecated version,
spark.executor.userClassPathFirst=true.


>
> On Wed, Feb 25, 2015 at 12:39 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> No, we should not add fastutil back. It's up to the app to bring
>> dependencies it needs, and that's how I understand this issue. The
>> question is really, how to get the classloader visibility right. It
>> depends on where you need these classes. Have you looked into
>> spark.files.userClassPathFirst and spark.yarn.user.classpath.first ?
>>
>

I noted that I tried this in my original email.

The issue appears related to the fact that parquet is also creating a shaded
jar and that one leaves out the Long2LongOpenHashMap class.

FYI, I have subsequently tried removing the exclusion from the spark build,
and that does cause the fastutil classes to be included and the example works...

So, should the userClassPathFirst flag work here, meaning there is a bug?

Or is it reasonable to put in a pull request for the elimination of the
exclusion?
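
For anyone debugging a similar case, here is a minimal sketch of one way to check which jar (if any) supplies a given class at runtime. It is not taken from Spark; the class name WhichJar and the helper locate are hypothetical, and the two fully qualified names assume that Long2LongOpenHashMap lives in fastutil's "longs" package and that the parquet relocation shown in the pom below simply prefixes "parquet.". Run it with the same classpath as the failing job.

```java
import java.security.CodeSource;

public class WhichJar {
    // Hypothetical helper: reports where a class would be loaded from,
    // using the thread's context class loader and without initializing the class.
    public static String locate(String className) {
        try {
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK core classes typically report no code source.
            return src == null ? "(bootstrap or unknown source)" : src.getLocation().toString();
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return "(not found: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Assumed names: the original fastutil class and its parquet-relocated copy.
        String[] names = {
            "it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap",
            "parquet.it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the first name resolves to the application jar and the second is not found, that would be consistent with the shaded parquet jar having dropped the class, but it does not by itself show whether userClassPathFirst is being honored.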



>
>> On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> > bq. depend on missing fastutil classes like Long2LongOpenHashMap
>> >
>> > Looks like Long2LongOpenHashMap should be added to the shaded jar.
>> >
>> > Cheers
>>
>> >
>> > On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner <j...@cloudphysics.com> wrote:
>> >>
>> >> Spark includes the clearspring analytics package but intentionally excludes
>> >> the dependencies of the fastutil package (see below).
>> >>
>> >> Spark includes parquet-column which includes fastutil and relocates it under
>> >> parquet/ but creates a shaded jar file which is incomplete because it shades
>> >> out some of the fastutil classes, notably Long2LongOpenHashMap, which is
>> >> present in the fastutil jar file that parquet-column is referencing.
>> >>
>> >> We are using more of the clearspring classes (e.g. QDigest) and those do
>> >> depend on missing fastutil classes like Long2LongOpenHashMap.
>> >>
>> >> Even though I add them to our assembly jar file, the class loader finds the
>> >> spark assembly and we get runtime class loader errors when we try to use it.
>> >>
>> >> It is possible to put our jar file first, as described here:
>> >>   https://issues.apache.org/jira/browse/SPARK-939
>> >>   http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment
>> >>
>> >> which I tried with args to spark-submit:
>> >>   --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true
>> >> but we still get the class not found error.
>> >>
>> >> We have tried copying the source code for clearspring into our package and
>> >> renaming the package and that makes it appear to work...  Is this risky?  It
>> >> certainly is ugly.
>> >>
>> >> Can anyone recommend a way to deal with this "dependency **ll" ?
>> >>
>> >>
>> >> === The spark/pom.xml file contains the following lines:
>> >>
>> >>       <dependency>
>> >>         <groupId>com.clearspring.analytics</groupId>
>> >>         <artifactId>stream</artifactId>
>> >>         <version>2.7.0</version>
>> >>         <exclusions>
>> >>
>> >>           <exclusion>
>> >>             <groupId>it.unimi.dsi</groupId>
>> >>             <artifactId>fastutil</artifactId>
>> >>           </exclusion>
>> >>         </exclusions>
>> >>       </dependency>
>> >>
>> >> === The parquet-column/pom.xml file contains:
>> >>         <artifactId>maven-shade-plugin</artifactId>
>> >>         <executions>
>> >>           <execution>
>> >>             <phase>package</phase>
>> >>             <goals>
>> >>               <goal>shade</goal>
>> >>             </goals>
>> >>             <configuration>
>> >>               <minimizeJar>true</minimizeJar>
>> >>               <artifactSet>
>> >>                 <includes>
>> >>                   <include>it.unimi.dsi:fastutil</include>
>> >>                 </includes>
>> >>               </artifactSet>
>> >>               <relocations>
>> >>                 <relocation>
>> >>                   <pattern>it.unimi.dsi</pattern>
>> >>                   <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
>> >>                 </relocation>
>> >>               </relocations>
>> >>             </configuration>
>> >>           </execution>
>> >>         </executions>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> View this message in context:
>> >> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794.html
>> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> >> For additional commands, e-mail: user-h...@spark.apache.org
>> >>
>> >
>>
>
>




