Spark includes the clearspring analytics package but intentionally excludes
its dependency on the fastutil package (see below).

Spark includes parquet-column, which bundles fastutil and relocates it under
parquet/, but the shaded jar it creates is incomplete: it drops some of the
fastutil classes, notably Long2LongOpenHashMap, which are present in the
fastutil jar that parquet-column references.

We use more of the clearspring classes (e.g. QDigest), and those depend on
fastutil classes that are missing, such as Long2LongOpenHashMap.

Even though I add them to our assembly jar file, the class loader finds the
Spark assembly first, and we get runtime class-loading errors when we try to
use them.
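
One way to bring fastutil back into our own assembly is an explicit
dependency along these lines. This is a sketch only: the fastutil.version
property is a hypothetical placeholder, to be set to whatever fastutil version
clearspring stream 2.7.0 was built against.

```xml
<!-- Hypothetical snippet for our own pom.xml: declare fastutil directly so
     classes such as Long2LongOpenHashMap end up in our assembly jar. -->
<dependency>
  <groupId>it.unimi.dsi</groupId>
  <artifactId>fastutil</artifactId>
  <!-- Placeholder property (assumption): the fastutil version that
       clearspring stream 2.7.0 expects. -->
  <version>${fastutil.version}</version>
</dependency>
```

On its own this may not be enough. Java class loading delegates to the parent
loader first, and a class resolves its references through the loader that
defined it. So a clearspring class loaded from the Spark assembly cannot see
fastutil classes that live only in our jar, which would be consistent with
the errors above.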

It is possible to put our jar file first, as described here:
  https://issues.apache.org/jira/browse/SPARK-939
  http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment

which I tried with these spark-submit arguments:
  --conf spark.driver.userClassPathFirst=true
  --conf spark.executor.userClassPathFirst=true
but we still get the class-not-found error.

We have tried copying the clearspring source code into our package and
renaming the package, and that appears to work... Is this risky? It is
certainly ugly.
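
The maven-shade-plugin's relocation feature automates this kind of manual
copy-and-rename. Below is a minimal sketch for our own pom.xml. It assumes
that we build our assembly with Maven and that Spark is declared with
provided scope, so it is not bundled. The com.example.shaded prefix is a
hypothetical name.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- minimizeJar is not set, so every class from the bundled
             clearspring and fastutil jars is copied into our jar. -->
        <relocations>
          <!-- Rename clearspring so it cannot collide with the copy
               inside the Spark assembly (hypothetical target package). -->
          <relocation>
            <pattern>com.clearspring.analytics</pattern>
            <shadedPattern>com.example.shaded.clearspring.analytics</shadedPattern>
          </relocation>
          <!-- Rename fastutil too; references to both packages in our
               own classes are rewritten to the new names. -->
          <relocation>
            <pattern>it.unimi.dsi.fastutil</pattern>
            <shadedPattern>com.example.shaded.fastutil</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The relocated classes are a separate copy with different names. Objects built
from them therefore cannot be passed to Spark APIs that expect the original
clearspring types. For code that only uses QDigest and similar classes
internally, that should not matter.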

Can anyone recommend a way to deal with this "dependency hell"?


=== The spark/pom.xml file contains the following lines:

      <dependency>
        <groupId>com.clearspring.analytics</groupId>
        <artifactId>stream</artifactId>
        <version>2.7.0</version>
        <exclusions>
          <exclusion>
            <groupId>it.unimi.dsi</groupId>
            <artifactId>fastutil</artifactId>
          </exclusion>
        </exclusions>
      </dependency>

=== The parquet-column/pom.xml file contains:
      <plugin>
        <artifactId>maven-shade-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <minimizeJar>true</minimizeJar>
              <artifactSet>
                <includes>
                  <include>it.unimi.dsi:fastutil</include>
                </includes>
              </artifactSet>
              <relocations>
                <relocation>
                  <pattern>it.unimi.dsi</pattern>
                  <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
