Spark includes the clearspring analytics package (com.clearspring.analytics:stream) but intentionally excludes its fastutil dependency (see the pom.xml excerpt below).
Spark also includes parquet-column, which bundles fastutil and relocates it under parquet/. However, the resulting shaded jar is incomplete: it omits some of the fastutil classes, notably Long2LongOpenHashMap, which is present in the fastutil jar that parquet-column references.

We use more of the clearspring classes (e.g. QDigest), and those depend on the missing fastutil classes such as Long2LongOpenHashMap. Even though I add them to our assembly jar, the class loader finds the Spark assembly first and we get class-loading errors at runtime when we try to use them.

It is possible to put our jar first on the classpath, as described here:

https://issues.apache.org/jira/browse/SPARK-939
http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment

I tried this with the following arguments to spark-submit:

    --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true

but we still get the class-not-found error.

We have tried copying the clearspring source code into our own package and renaming the package, and that appears to work. Is this risky? It certainly is ugly.

Can anyone recommend a way to deal with this "dependency hell"?
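For reference, the manual copy-and-rename workaround is roughly what a maven-shade-plugin relocation does automatically at build time. The following is only a sketch, assuming our assembly is built with Maven and declares both clearspring stream and fastutil as dependencies. The `myapp.shaded` prefix is a made-up placeholder, and this has not been verified against Spark 1.2.

    <!-- Sketch only: goes under <build><plugins> in our own pom.xml. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <!-- minimizeJar is deliberately left out so that no fastutil
                 classes are dropped from the shaded jar. -->
            <relocations>
              <!-- "myapp.shaded" is a hypothetical prefix; pick any
                   package name that Spark does not ship. -->
              <relocation>
                <pattern>com.clearspring.analytics</pattern>
                <shadedPattern>myapp.shaded.com.clearspring.analytics</shadedPattern>
              </relocation>
              <relocation>
                <pattern>it.unimi.dsi</pattern>
                <shadedPattern>myapp.shaded.it.unimi.dsi</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>

Because the relocated classes live under a different package name, they should no longer collide with the copies in the Spark assembly, regardless of class loader ordering. The shade plugin also rewrites the references in our own classes, so the application source can keep importing the original package names.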
===
The spark/pom.xml file contains the following lines:

    <dependency>
      <groupId>com.clearspring.analytics</groupId>
      <artifactId>stream</artifactId>
      <version>2.7.0</version>
      <exclusions>
        <exclusion>
          <groupId>it.unimi.dsi</groupId>
          <artifactId>fastutil</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

===
The parquet-column/pom.xml file contains:

    <artifactId>maven-shade-plugin</artifactId>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <minimizeJar>true</minimizeJar>
          <artifactSet>
            <includes>
              <include>it.unimi.dsi:fastutil</include>
            </includes>
          </artifactSet>
          <relocations>
            <relocation>
              <pattern>it.unimi.dsi</pattern>
              <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.