No, we should not add fastutil back. It is up to the app to bring the dependencies it needs, and that is how I understand this issue. The real question is how to get the classloader visibility right, and that depends on where you need these classes. Have you looked into spark.files.userClassPathFirst and spark.yarn.user.classpath.first?
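If the classpath ordering turns out to be hard to get right, one alternative is to have the app's own build relocate the classes it brings, which is roughly the build-time equivalent of the manual package rename described in the quoted message. What follows is only a minimal sketch, assuming the app is built with Maven and declares both com.clearspring.analytics:stream and it.unimi.dsi:fastutil as ordinary compile-scope dependencies. The `myapp.shaded` prefix is a made-up name for illustration, and I have not tested this against the Spark assembly.

```xml
<!-- Sketch for the application's own pom.xml (inside build/plugins), not Spark's. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- minimizeJar is left at its default (false) so that no
             fastutil classes are pruned from the shaded jar. -->
        <relocations>
          <!-- "myapp.shaded" is a hypothetical prefix; pick any package
               name that cannot collide with the Spark assembly. -->
          <relocation>
            <pattern>com.clearspring.analytics</pattern>
            <shadedPattern>myapp.shaded.com.clearspring.analytics</shadedPattern>
          </relocation>
          <relocation>
            <pattern>it.unimi.dsi.fastutil</pattern>
            <shadedPattern>myapp.shaded.it.unimi.dsi.fastutil</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With both packages relocated, the shade plugin also rewrites the references in the app's own classes, so the app's copy of QDigest would resolve against the app's copy of fastutil regardless of which jar the classloader consults first.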
On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> bq. depend on missing fastutil classes like Long2LongOpenHashMap
>
> Looks like Long2LongOpenHashMap should be added to the shaded jar.
>
> Cheers
>
> On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner <j...@cloudphysics.com> wrote:
>>
>> Spark includes the clearspring analytics package but intentionally excludes
>> the dependencies of the fastutil package (see below).
>>
>> Spark includes parquet-column which includes fastutil and relocates it under
>> parquet/ but creates a shaded jar file which is incomplete because it shades
>> out some of the fastutil classes, notably Long2LongOpenHashMap, which is
>> present in the fastutil jar file that parquet-column is referencing.
>>
>> We are using more of the clearspring classes (e.g. QDigest) and those do
>> depend on missing fastutil classes like Long2LongOpenHashMap.
>>
>> Even though I add them to our assembly jar file, the class loader finds the
>> spark assembly and we get runtime class loader errors when we try to use it.
>>
>> It is possible to put our jar file first, as described here:
>> https://issues.apache.org/jira/browse/SPARK-939
>>
>> http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment
>>
>> which I tried with args to spark-submit:
>> --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true
>> but we still get the class not found error.
>>
>> We have tried copying the source code for clearspring into our package and
>> renaming the package and that makes it appear to work... Is this risky? It
>> certainly is ugly.
>>
>> Can anyone recommend a way to deal with this "dependency **ll" ?
>>
>>
>> === The spark/pom.xml file contains the following lines:
>>
>>   <dependency>
>>     <groupId>com.clearspring.analytics</groupId>
>>     <artifactId>stream</artifactId>
>>     <version>2.7.0</version>
>>     <exclusions>
>>
>>       <exclusion>
>>         <groupId>it.unimi.dsi</groupId>
>>         <artifactId>fastutil</artifactId>
>>       </exclusion>
>>     </exclusions>
>>   </dependency>
>>
>> === The parquet-column/pom.xml file contains:
>>   <artifactId>maven-shade-plugin</artifactId>
>>   <executions>
>>     <execution>
>>       <phase>package</phase>
>>       <goals>
>>         <goal>shade</goal>
>>       </goals>
>>       <configuration>
>>         <minimizeJar>true</minimizeJar>
>>         <artifactSet>
>>           <includes>
>>             <include>it.unimi.dsi:fastutil</include>
>>           </includes>
>>         </artifactSet>
>>         <relocations>
>>           <relocation>
>>             <pattern>it.unimi.dsi</pattern>
>>             <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
>>           </relocation>
>>         </relocations>
>>       </configuration>
>>     </execution>
>>   </executions>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>