bq. depend on missing fastutil classes like Long2LongOpenHashMap

Looks like Long2LongOpenHashMap should be added to the shaded jar.
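
A rough sketch of how that might look in the parquet-column shade configuration, offered as an assumption rather than the actual parquet fix: since maven-shade-plugin 1.6, minimizeJar keeps classes that a filter explicitly includes, so an include filter on the fastutil artifact should prevent Long2LongOpenHashMap from being stripped. The `**` pattern keeps every fastutil class; a narrower pattern would implicitly exclude everything else in that artifact.

```xml
<!-- Sketch only: fragment of the maven-shade-plugin <configuration>
     quoted below, with a <filters> section added. Not the actual
     parquet-column change. -->
<configuration>
  <minimizeJar>true</minimizeJar>
  <artifactSet>
    <includes>
      <include>it.unimi.dsi:fastutil</include>
    </includes>
  </artifactSet>
  <filters>
    <filter>
      <!-- minimizeJar respects classes explicitly included by a filter -->
      <artifact>it.unimi.dsi:fastutil</artifact>
      <includes>
        <include>**</include>
      </includes>
    </filter>
  </filters>
  <relocations>
    <relocation>
      <pattern>it.unimi.dsi</pattern>
      <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
    </relocation>
  </relocations>
</configuration>
```

Note that the classes would still be relocated under parquet.it.unimi.dsi, so code that references the original it.unimi.dsi package names would need its own copy of fastutil on the classpath.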

Cheers

On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner <j...@cloudphysics.com> wrote:

> Spark includes the clearspring analytics package but intentionally excludes
> its fastutil dependency (see below).
>
> Spark includes parquet-column, which includes fastutil and relocates it under
> parquet/, but the resulting shaded jar file is incomplete because it leaves out
> some of the fastutil classes, notably Long2LongOpenHashMap, which is present in
> the fastutil jar file that parquet-column references.
>
> We are using more of the clearspring classes (e.g. QDigest), and those do
> depend on missing fastutil classes like Long2LongOpenHashMap.
>
> Even though I add them to our assembly jar file, the class loader finds the
> Spark assembly first, and we get runtime class loader errors when we try to
> use them.
>
> It is possible to put our jar file first, as described here:
>   https://issues.apache.org/jira/browse/SPARK-939
>   http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment
>
> which I tried with args to spark-submit:
>   --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true
> but we still get the class not found error.
>
> We have tried copying the source code for clearspring into our package and
> renaming the package, and that makes it appear to work... Is this risky? It
> certainly is ugly.
>
> Can anyone recommend a way to deal with this "dependency **ll" ?
>
>
> === The spark/pom.xml file contains the following lines:
>
>       <dependency>
>         <groupId>com.clearspring.analytics</groupId>
>         <artifactId>stream</artifactId>
>         <version>2.7.0</version>
>         <exclusions>
>           <exclusion>
>             <groupId>it.unimi.dsi</groupId>
>             <artifactId>fastutil</artifactId>
>           </exclusion>
>         </exclusions>
>       </dependency>
>
> === The parquet-column/pom.xml file contains:
>         <artifactId>maven-shade-plugin</artifactId>
>         <executions>
>           <execution>
>             <phase>package</phase>
>             <goals>
>               <goal>shade</goal>
>             </goals>
>             <configuration>
>               <minimizeJar>true</minimizeJar>
>               <artifactSet>
>                 <includes>
>                   <include>it.unimi.dsi:fastutil</include>
>                 </includes>
>               </artifactSet>
>               <relocations>
>                 <relocation>
>                   <pattern>it.unimi.dsi</pattern>
>                   <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
>                 </relocation>
>               </relocations>
>             </configuration>
>           </execution>
>         </executions>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
