No, we should not add fastutil back. It's up to the app to bring the
dependencies it needs, and that's how I understand this issue. The real
question is how to get the classloader visibility right, which depends
on where you need these classes. Have you looked into
spark.files.userClassPathFirst and spark.yarn.user.classpath.first?
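
To see where these classes are actually being resolved from, a small
diagnostic can help. The following is only a sketch, not something that
ships with Spark: the class name WhichJar and its locate helper are made
up for illustration, and it assumes the fully qualified names
it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap and
com.clearspring.analytics.stream.quantile.QDigest. It reports the jar and
class loader each class comes from, or the error if it cannot be loaded.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration; not part of Spark or fastutil.
public class WhichJar {

    /** Describes where the named class was loaded from, or why it could not be loaded. */
    public static String locate(String className) {
        try {
            // Uses the class loader that loaded WhichJar itself.
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader may have no code source.
            String where = (src == null || src.getLocation() == null)
                    ? "bootstrap/unknown"
                    : src.getLocation().toString();
            return className + " <- " + where + " via " + cls.getClassLoader();
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is reported here too.
            return className + " not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumed fully qualified names of the classes discussed in this thread.
        System.out.println(locate("it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap"));
        System.out.println(locate("com.clearspring.analytics.stream.quantile.QDigest"));
    }
}
```

Running the same lookup in the driver and inside a task on an executor
should show whether the copy in your assembly or the one in the Spark
assembly wins in each place.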

On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> bq. depend on missing fastutil classes like Long2LongOpenHashMap
>
> Looks like Long2LongOpenHashMap should be added to the shaded jar.
>
> Cheers
>
> On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner <j...@cloudphysics.com> wrote:
>>
>> Spark includes the clearspring analytics package but intentionally excludes
>> the dependencies of the fastutil package (see below).
>>
>> Spark includes parquet-column which includes fastutil and relocates it
>> under parquet/ but creates a shaded jar file which is incomplete because
>> it shades out some of the fastutil classes, notably Long2LongOpenHashMap,
>> which is present in the fastutil jar file that parquet-column is
>> referencing.
>>
>> We are using more of the clearspring classes (e.g. QDigest) and those do
>> depend on missing fastutil classes like Long2LongOpenHashMap.
>>
>> Even though I add them to our assembly jar file, the class loader finds the
>> spark assembly and we get runtime class loader errors when we try to use it.
>>
>> It is possible to put our jar file first, as described here:
>>   https://issues.apache.org/jira/browse/SPARK-939
>>   http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment
>>
>> which I tried with args to spark-submit:
>>   --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true
>> but we still get the class not found error.
>>
>> We have tried copying the source code for clearspring into our package and
>> renaming the package and that makes it appear to work...  Is this risky?
>> It certainly is ugly.
>>
>> Can anyone recommend a way to deal with this "dependency **ll" ?
>>
>>
>> === The spark/pom.xml file contains the following lines:
>>
>>       <dependency>
>>         <groupId>com.clearspring.analytics</groupId>
>>         <artifactId>stream</artifactId>
>>         <version>2.7.0</version>
>>         <exclusions>
>>           <exclusion>
>>             <groupId>it.unimi.dsi</groupId>
>>             <artifactId>fastutil</artifactId>
>>           </exclusion>
>>         </exclusions>
>>       </dependency>
>>
>> === The parquet-column/pom.xml file contains:
>>         <artifactId>maven-shade-plugin</artifactId>
>>         <executions>
>>           <execution>
>>             <phase>package</phase>
>>             <goals>
>>               <goal>shade</goal>
>>             </goals>
>>             <configuration>
>>               <minimizeJar>true</minimizeJar>
>>               <artifactSet>
>>                 <includes>
>>                   <include>it.unimi.dsi:fastutil</include>
>>                 </includes>
>>               </artifactSet>
>>               <relocations>
>>                 <relocation>
>>                   <pattern>it.unimi.dsi</pattern>
>>                   <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
>>                 </relocation>
>>               </relocations>
>>             </configuration>
>>           </execution>
>>         </executions>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>
