Re: Spark excludes "fastutil" dependencies we need
I ran into a similar problem while using QDigest. Shading the clearspring and fastutil classes solves the problem. Snippet from pom.xml:

<executions>
  <execution>
    <phase>package</phase>
    <goals>
      <goal>shade</goal>
    </goals>
    <configuration>
      <artifactSet>
        <includes>
          <include>com.clearspring.analytics:stream</include>
          <include>it.unimi.dsi:fastutil</include>
        </includes>
      </artifactSet>
      <relocations>
        <relocation>
          <pattern>it.unimi.dsi</pattern>
          <shadedPattern>atom.it.unimi.dsi</shadedPattern>
        </relocation>
        <relocation>
          <pattern>com.clearspring.analytics.stream</pattern>
          <shadedPattern>atom.com.clearspring.analytics.stream</shadedPattern>
        </relocation>
      </relocations>
      <shadedArtifactAttached>true</shadedArtifactAttached>
      <outputDirectory>${project.build.directory}</outputDirectory>
    </configuration>
  </execution>
</executions>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794p27512.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
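After shading like this, a quick way to confirm that the relocated classes actually resolve at runtime is to look them up by name. A minimal sketch (ShadeCheck and classPresent are made-up names, and the atom.* class name assumes the relocation prefix above):

```java
public class ShadeCheck {
    // Returns true if the named class can be loaded by the current class loader.
    static boolean classPresent(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class is always visible.
        System.out.println(classPresent("java.lang.String")); // prints true
        // Inside the shaded jar, the relocated copy should resolve under atom.*
        // while the original it.unimi.dsi name should not.
        System.out.println(classPresent("atom.it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap"));
    }
}
```

Running this from within the shaded assembly (rather than a bare JVM) is what makes the second check meaningful.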
Re: Spark excludes fastutil dependencies we need
Yes, I used both. The discussion on this seems to be at GitHub now: https://github.com/apache/spark/pull/4780

I am using more classes from the same package from which Spark uses HyperLogLog. So we are both including the jar file, but Spark is excluding the dependent package that is required.

On Thu, Feb 26, 2015 at 9:54 AM, Marcelo Vanzin van...@cloudera.com wrote:

On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner j...@cloudphysics.com wrote:

So, should the userClassPathFirst flag work and there is a bug?

Sorry for jumping into the middle of the conversation (and probably missing some of it), but note that this option applies only to executors. If you're trying to use the class in your driver, there's a separate option for that.

Also note that if you're adding a class that doesn't exist inside the Spark jars, which seems to be the case, this option should be irrelevant, since the class loaders should all end up finding the one copy of the class that you're adding with your app.

-- Marcelo

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Re-Spark-excludes-fastutil-dependencies-we-need-tp21849.html
Re: Spark excludes fastutil dependencies we need
On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner j...@cloudphysics.com wrote:

So, should the userClassPathFirst flag work and there is a bug?

Sorry for jumping into the middle of the conversation (and probably missing some of it), but note that this option applies only to executors. If you're trying to use the class in your driver, there's a separate option for that.

Also note that if you're adding a class that doesn't exist inside the Spark jars, which seems to be the case, this option should be irrelevant, since the class loaders should all end up finding the one copy of the class that you're adding with your app.

-- Marcelo
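The userClassPathFirst options flip the JVM's usual parent-first delegation so the user's jars are searched before Spark's. Spark ships its own loader for this; the sketch below is only an illustrative reimplementation of the child-first idea, not Spark's actual code:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative child-first class loader: tries its own URLs before
// delegating to the parent, inverting the default parent-first model.
public class ChildFirstLoader extends URLClassLoader {
    public ChildFirstLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    c = findClass(name); // look in the user class path first
                } catch (ClassNotFoundException e) {
                    c = super.loadClass(name, resolve); // fall back to the parent (Spark's jars)
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no user jars, every lookup falls through to the parent as usual.
        ChildFirstLoader cl = new ChildFirstLoader(new URL[0], ChildFirstLoader.class.getClassLoader());
        System.out.println(cl.loadClass("java.lang.String") == String.class); // prints true
    }
}
```

With the user's assembly jar in `urls`, a class such as Long2LongOpenHashMap would be taken from the user jar even when a (possibly incomplete) copy exists on the parent's class path, which is exactly the behavior being asked for here.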
Re: Spark excludes fastutil dependencies we need
Interesting. Looking at SparkConf.scala:

  val configs = Seq(
    DeprecatedConfig("spark.files.userClassPathFirst",
      "spark.executor.userClassPathFirst", "1.3"),
    DeprecatedConfig("spark.yarn.user.classpath.first", null, "1.3",
      "Use spark.{driver,executor}.userClassPathFirst instead."))

It seems spark.files.userClassPathFirst and spark.yarn.user.classpath.first are deprecated.

On Wed, Feb 25, 2015 at 12:39 AM, Sean Owen so...@cloudera.com wrote:

No, we should not add fastutil back. It's up to the app to bring the dependencies it needs, and that's how I understand this issue. The question is really how to get the classloader visibility right. It depends on where you need these classes. Have you looked into spark.files.userClassPathFirst and spark.yarn.user.classpath.first?

On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu yuzhih...@gmail.com wrote:

bq. depend on missing fastutil classes like Long2LongOpenHashMap

Looks like Long2LongOpenHashMap should be added to the shaded jar.

Cheers

On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner j...@cloudphysics.com wrote:

Spark includes the clearspring analytics package but intentionally excludes the dependencies of the fastutil package (see below).

Spark includes parquet-column, which includes fastutil and relocates it under parquet/, but creates a shaded jar file that is incomplete because it shades out some of the fastutil classes, notably Long2LongOpenHashMap, which is present in the fastutil jar file that parquet-column references.

We are using more of the clearspring classes (e.g. QDigest), and those do depend on missing fastutil classes like Long2LongOpenHashMap. Even though I add them to our assembly jar file, the class loader finds the Spark assembly first and we get runtime class loader errors when we try to use it.
It is possible to put our jar file first, as described here:

https://issues.apache.org/jira/browse/SPARK-939
http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment

which I tried with args to spark-submit:

  --conf spark.driver.userClassPathFirst=true
  --conf spark.executor.userClassPathFirst=true

but we still get the class-not-found error.

We have tried copying the source code for clearspring into our own package and renaming the package, and that makes it appear to work... Is this risky? It certainly is ugly. Can anyone recommend a way to deal with this dependency hell?

===
The spark/pom.xml file contains the following lines:

<dependency>
  <groupId>com.clearspring.analytics</groupId>
  <artifactId>stream</artifactId>
  <version>2.7.0</version>
  <exclusions>
    <exclusion>
      <groupId>it.unimi.dsi</groupId>
      <artifactId>fastutil</artifactId>
    </exclusion>
  </exclusions>
</dependency>

===
The parquet-column/pom.xml file contains:

<artifactId>maven-shade-plugin</artifactId>
<executions>
  <execution>
    <phase>package</phase>
    <goals>
      <goal>shade</goal>
    </goals>
    <configuration>
      <minimizeJar>true</minimizeJar>
      <artifactSet>
        <includes>
          <include>it.unimi.dsi:fastutil</include>
        </includes>
      </artifactSet>
      <relocations>
        <relocation>
          <pattern>it.unimi.dsi</pattern>
          <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
        </relocation>
      </relocations>
    </configuration>
  </execution>
</executions>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794.html
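When the class loader "finds the Spark assembly first," it helps to ask the JVM which jar a class was actually loaded from. A small sketch (WhichJar and locate are hypothetical names; core java.lang classes report no CodeSource because they come from the bootstrap loader):

```java
import java.security.CodeSource;

public class WhichJar {
    // Reports where the named class was loaded from, or a marker string when
    // the class comes from the bootstrap/platform loader (null CodeSource)
    // or cannot be found at all.
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "<bootstrap/platform loader>" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not found: " + className + ">";
        }
    }

    public static void main(String[] args) {
        // For an application class this prints the jar (or directory) path,
        // which tells you whether the Spark assembly or your own jar won.
        System.out.println(locate("java.lang.String"));
    }
}
```

Calling locate("it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap") inside the failing job would show directly whether the name resolves and, if so, from which assembly.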
Re: Spark excludes fastutil dependencies we need
Inline.

On Wed, Feb 25, 2015 at 1:53 PM, Ted Yu yuzhih...@gmail.com wrote:

It seems spark.files.userClassPathFirst and spark.yarn.user.classpath.first are deprecated.

Note that I did use the non-deprecated version, spark.executor.userClassPathFirst=true.

On Wed, Feb 25, 2015 at 12:39 AM, Sean Owen so...@cloudera.com wrote:

No, we should not add fastutil back. It's up to the app to bring the dependencies it needs, and that's how I understand this issue. The question is really how to get the classloader visibility right. It depends on where you need these classes. Have you looked into spark.files.userClassPathFirst and spark.yarn.user.classpath.first?

I noted that I tried this in my original email. The issue appears related to the fact that parquet is also creating a shaded jar, and that one leaves out the Long2LongOpenHashMap class.

FYI, I have subsequently tried removing the exclusion from the Spark build, and that does cause the fastutil classes to be included, and the example works...

So, should the userClassPathFirst flag work and there is a bug? Or is it reasonable to put in a pull request to eliminate the exclusion?
Re: Spark excludes fastutil dependencies we need
Maybe drop the exclusion for the parquet-provided profile?

Cheers

On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner j...@cloudphysics.com wrote:

So, should the userClassPathFirst flag work and there is a bug? Or is it reasonable to put in a pull request to eliminate the exclusion?
Re: Spark excludes fastutil dependencies we need
No, we should not add fastutil back. It's up to the app to bring the dependencies it needs, and that's how I understand this issue. The question is really how to get the classloader visibility right. It depends on where you need these classes.

Have you looked into spark.files.userClassPathFirst and spark.yarn.user.classpath.first?

On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu yuzhih...@gmail.com wrote:

bq. depend on missing fastutil classes like Long2LongOpenHashMap

Looks like Long2LongOpenHashMap should be added to the shaded jar.
Re: Spark excludes fastutil dependencies we need
bq. depend on missing fastutil classes like Long2LongOpenHashMap

Looks like Long2LongOpenHashMap should be added to the shaded jar.

Cheers

On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner j...@cloudphysics.com wrote:

Spark includes the clearspring analytics package but intentionally excludes the dependencies of the fastutil package (see below).