Mailing list schizophrenia?
I notice that some people send messages directly to user@spark.apache.org and some via Nabble, either using email or the web client. There are two index sites, one directly at apache.org and one at Nabble. But messages sent directly to user@spark.apache.org only show up in the Apache list. Further, it appears that you can subscribe either directly to user@spark.apache.org, in which case you see all emails, or via Nabble, in which case you see only a subset. Is this correct, and is it intentional?

Apache site: http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/browser
Nabble site: http://apache-spark-user-list.1001560.n3.nabble.com/

An example of a message that only shows up in Apache:
http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAGK53LnsD59wwQrP3-9yHc38C4eevAfMbV2so%2B_wi8k0%2Btq5HQ%40mail.gmail.com%3E

This message was sent both to Nabble and user@spark.apache.org to see how that behaves.

Jim
Re: Mailing list schizophrenia?
Yes, it did get delivered to the Apache list, shown here:
http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAGK53LnsD59wwQrP3-9yHc38C4eevAfMbV2so%2B_wi8k0%2Btq5HQ%40mail.gmail.com%3E

But the web site for the Spark community directs people to Nabble for viewing messages, and it doesn't show up there.

Community page: http://spark.apache.org/community.html
Link in that page to the archive: http://apache-spark-user-list.1001560.n3.nabble.com/
The reliable archive: http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/browser

On Fri, Mar 20, 2015 at 12:34 PM, Ted Yu yuzhih...@gmail.com wrote:

Jim: I can find the example message here: http://search-hadoop.com/m/JW1q5zP54J1

On Fri, Mar 20, 2015 at 12:29 PM, Jim Kleckner j...@cloudphysics.com wrote:

I notice that some people send messages directly to user@spark.apache.org and some via Nabble, either using email or the web client. There are two index sites, one directly at apache.org and one at Nabble. But messages sent directly to user@spark.apache.org only show up in the Apache list. Further, it appears that you can subscribe either directly to user@spark.apache.org, in which case you see all emails, or via Nabble, in which case you see only a subset. Is this correct, and is it intentional?

Apache site: http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/browser
Nabble site: http://apache-spark-user-list.1001560.n3.nabble.com/

An example of a message that only shows up in Apache:
http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAGK53LnsD59wwQrP3-9yHc38C4eevAfMbV2so%2B_wi8k0%2Btq5HQ%40mail.gmail.com%3E

This message was sent both to Nabble and user@spark.apache.org to see how that behaves.

Jim
Reliable method/tips to solve dependency issues?
Do people have a reliable/repeatable method, or tips, for solving dependency issues? The current world of spark-hadoop-hbase-parquet-... is very challenging given the huge footprint of dependent packages, and we may be pushing against the limits of how many packages can be combined into one environment. The process of searching the web to pick at incompatibilities one at a time is at best tedious and at worst non-converging. It makes me wonder if there is (or ought to be) a page cataloging in one place the conflicts that Spark users have hit and what was done to solve them.

Eugene Yokota wrote an interesting blog post about current sbt dependency management in sbt 0.13.7 that includes nice improvements for working with dependencies:
https://typesafe.com/blog/improved-dependency-management-with-sbt-0137

After reading that, I refreshed on the sbt documentation and found "show update". It gives very extensive information.

For reference, there was an extensive discussion thread about sbt and maven last year that touches on a lot of topics:
http://mail-archives.apache.org/mod_mbox/incubator-spark-dev/201402.mbox/%3ccabpqxsukhd4qsf5dg9ruhn7wvonxfm+y5b1k5d8g7h6s9bh...@mail.gmail.com%3E
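As a concrete sketch of the sbt tooling mentioned above: "show update" and "evicted" are run from the sbt shell to diagnose conflicts, and a pinned override can resolve one once found. The fastutil coordinates and version below are purely illustrative, not a recommendation:

```scala
// build.sbt fragment (sbt 0.13.7 era). When two transitive dependencies
// disagree on a version, pin it explicitly rather than relying on
// Ivy's latest-wins eviction. The coordinates here are an example only.
dependencyOverrides += "it.unimi.dsi" % "fastutil" % "6.5.7"

// From the sbt shell, to diagnose conflicts:
//   > show update   // full dependency resolution report, per configuration
//   > evicted       // (sbt 0.13.6+) lists versions evicted due to conflicts
```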
Re: Spark excludes fastutil dependencies we need
Yes, I used both. The discussion on this seems to be at GitHub now:
https://github.com/apache/spark/pull/4780

I am using more classes from the same package from which Spark uses HyperLogLog. So we are both including the jar file, but Spark is excluding the dependent package that is required.

On Thu, Feb 26, 2015 at 9:54 AM, Marcelo Vanzin van...@cloudera.com wrote:

On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner j...@cloudphysics.com wrote: So, should the userClassPathFirst flag work and there is a bug?

Sorry for jumping into the middle of the conversation (and probably missing some of it), but note that this option applies only to executors. If you're trying to use the class in your driver, there's a separate option for that.

Also note that if you're adding a class that doesn't exist inside the Spark jars, which seems to be the case, this option should be irrelevant, since the class loaders should all end up finding the one copy of the class that you're adding with your app.

-- Marcelo

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Re-Spark-excludes-fastutil-dependencies-we-need-tp21849.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Fwd: Spark excludes fastutil dependencies we need
Forwarding the conversation below that didn't make it to the list.

---------- Forwarded message ----------
From: Jim Kleckner j...@cloudphysics.com
Date: Wed, Feb 25, 2015 at 8:42 PM
Subject: Re: Spark excludes fastutil dependencies we need
To: Ted Yu yuzhih...@gmail.com
Cc: Sean Owen so...@cloudera.com, user user@spark.apache.org

Inline

On Wed, Feb 25, 2015 at 1:53 PM, Ted Yu yuzhih...@gmail.com wrote:

Interesting. Looking at SparkConf.scala:

    val configs = Seq(
      DeprecatedConfig("spark.files.userClassPathFirst",
        "spark.executor.userClassPathFirst", "1.3"),
      DeprecatedConfig("spark.yarn.user.classpath.first", null, "1.3",
        "Use spark.{driver,executor}.userClassPathFirst instead."))

It seems spark.files.userClassPathFirst and spark.yarn.user.classpath.first are deprecated.

Note that I did use the non-deprecated version, spark.executor.userClassPathFirst=true.

On Wed, Feb 25, 2015 at 12:39 AM, Sean Owen so...@cloudera.com wrote:

No, we should not add fastutil back. It's up to the app to bring the dependencies it needs, and that's how I understand this issue. The question is really how to get the classloader visibility right. It depends on where you need these classes. Have you looked into spark.files.userClassPathFirst and spark.yarn.user.classpath.first?

I noted that I tried this in my original email. The issue appears related to the fact that parquet is also creating a shaded jar, and that one leaves out the Long2LongOpenHashMap class.

FYI, I have subsequently tried removing the exclusion from the Spark build, and that does cause the fastutil classes to be included, and the example works... So, should the userClassPathFirst flag work and there is a bug? Or is it reasonable to put in a pull request for the elimination of the exclusion?

On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu yuzhih...@gmail.com wrote:

bq. depend on missing fastutil classes like Long2LongOpenHashMap

Looks like Long2LongOpenHashMap should be added to the shaded jar.
Cheers

On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner j...@cloudphysics.com wrote:

Spark includes the clearspring analytics package but intentionally excludes the dependencies of the fastutil package (see below).

Spark includes parquet-column, which includes fastutil and relocates it under parquet/, but creates a shaded jar file which is incomplete because it shades out some of the fastutil classes, notably Long2LongOpenHashMap, which is present in the fastutil jar file that parquet-column references.

We are using more of the clearspring classes (e.g. QDigest), and those do depend on missing fastutil classes like Long2LongOpenHashMap. Even though I add them to our assembly jar file, the class loader finds the Spark assembly and we get runtime class loader errors when we try to use it.

It is possible to put our jar file first, as described here:
https://issues.apache.org/jira/browse/SPARK-939
http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment

which I tried with args to spark-submit:

    --conf spark.driver.userClassPathFirst=true
    --conf spark.executor.userClassPathFirst=true

but we still get the class not found error.

We have tried copying the source code for clearspring into our package and renaming the package, and that makes it appear to work... Is this risky? It certainly is ugly.

Can anyone recommend a way to deal with this dependency hell?
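For what it's worth, the copy-and-rename workaround described above is essentially hand-rolled shading; if the build uses the sbt-assembly plugin (0.14.x or later), a shade rule can automate it. A minimal sketch, assuming sbt-assembly is enabled (the "shaded." prefix is an arbitrary choice):

```scala
// build.sbt fragment, assuming the sbt-assembly plugin is available.
// Relocate the fastutil classes bundled into our fat jar under a private
// prefix so they cannot collide with classes in the Spark assembly.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("it.unimi.dsi.fastutil.**" -> "shaded.it.unimi.dsi.fastutil.@1").inAll
)
```

The idea is the same as parquet-column's relocation of fastutil under parquet/, but applied to our own assembly, so references in our code are rewritten to the renamed copy at build time.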
===
The spark/pom.xml file contains the following lines:

    <dependency>
      <groupId>com.clearspring.analytics</groupId>
      <artifactId>stream</artifactId>
      <version>2.7.0</version>
      <exclusions>
        <exclusion>
          <groupId>it.unimi.dsi</groupId>
          <artifactId>fastutil</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

===
The parquet-column/pom.xml file contains:

    <artifactId>maven-shade-plugin</artifactId>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <minimizeJar>true</minimizeJar>
          <artifactSet>
            <includes>
              <include>it.unimi.dsi:fastutil</include>
            </includes>
          </artifactSet>
          <relocations>
            <relocation>
              <pattern>it.unimi.dsi</pattern>
              <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
Re: Fwd: Spark excludes fastutil dependencies we need
I created an issue and pull request. Discussion can continue there:
https://issues.apache.org/jira/browse/SPARK-6029

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-Spark-excludes-fastutil-dependencies-we-need-tp21812p21814.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark excludes fastutil dependencies we need
Inline

On Wed, Feb 25, 2015 at 1:53 PM, Ted Yu yuzhih...@gmail.com wrote:

Interesting. Looking at SparkConf.scala:

    val configs = Seq(
      DeprecatedConfig("spark.files.userClassPathFirst",
        "spark.executor.userClassPathFirst", "1.3"),
      DeprecatedConfig("spark.yarn.user.classpath.first", null, "1.3",
        "Use spark.{driver,executor}.userClassPathFirst instead."))

It seems spark.files.userClassPathFirst and spark.yarn.user.classpath.first are deprecated.

Note that I did use the non-deprecated version, spark.executor.userClassPathFirst=true.

On Wed, Feb 25, 2015 at 12:39 AM, Sean Owen so...@cloudera.com wrote:

No, we should not add fastutil back. It's up to the app to bring the dependencies it needs, and that's how I understand this issue. The question is really how to get the classloader visibility right. It depends on where you need these classes. Have you looked into spark.files.userClassPathFirst and spark.yarn.user.classpath.first?

I noted that I tried this in my original email. The issue appears related to the fact that parquet is also creating a shaded jar, and that one leaves out the Long2LongOpenHashMap class.

FYI, I have subsequently tried removing the exclusion from the Spark build, and that does cause the fastutil classes to be included, and the example works... So, should the userClassPathFirst flag work and there is a bug? Or is it reasonable to put in a pull request for the elimination of the exclusion?

On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu yuzhih...@gmail.com wrote:

bq. depend on missing fastutil classes like Long2LongOpenHashMap

Looks like Long2LongOpenHashMap should be added to the shaded jar.

Cheers

On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner j...@cloudphysics.com wrote:

Spark includes the clearspring analytics package but intentionally excludes the dependencies of the fastutil package (see below).
Spark includes parquet-column, which includes fastutil and relocates it under parquet/, but creates a shaded jar file which is incomplete because it shades out some of the fastutil classes, notably Long2LongOpenHashMap, which is present in the fastutil jar file that parquet-column references.

We are using more of the clearspring classes (e.g. QDigest), and those do depend on missing fastutil classes like Long2LongOpenHashMap. Even though I add them to our assembly jar file, the class loader finds the Spark assembly and we get runtime class loader errors when we try to use it.

It is possible to put our jar file first, as described here:
https://issues.apache.org/jira/browse/SPARK-939
http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment

which I tried with args to spark-submit:

    --conf spark.driver.userClassPathFirst=true
    --conf spark.executor.userClassPathFirst=true

but we still get the class not found error.

We have tried copying the source code for clearspring into our package and renaming the package, and that makes it appear to work... Is this risky? It certainly is ugly.

Can anyone recommend a way to deal with this dependency hell?
===
The spark/pom.xml file contains the following lines:

    <dependency>
      <groupId>com.clearspring.analytics</groupId>
      <artifactId>stream</artifactId>
      <version>2.7.0</version>
      <exclusions>
        <exclusion>
          <groupId>it.unimi.dsi</groupId>
          <artifactId>fastutil</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

===
The parquet-column/pom.xml file contains:

    <artifactId>maven-shade-plugin</artifactId>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <minimizeJar>true</minimizeJar>
          <artifactSet>
            <includes>
              <include>it.unimi.dsi:fastutil</include>
            </includes>
          </artifactSet>
          <relocations>
            <relocation>
              <pattern>it.unimi.dsi</pattern>
              <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>
Spark excludes fastutil dependencies we need
Spark includes the clearspring analytics package but intentionally excludes the dependencies of the fastutil package (see below).

Spark includes parquet-column, which includes fastutil and relocates it under parquet/, but creates a shaded jar file which is incomplete because it shades out some of the fastutil classes, notably Long2LongOpenHashMap, which is present in the fastutil jar file that parquet-column references.

We are using more of the clearspring classes (e.g. QDigest), and those do depend on missing fastutil classes like Long2LongOpenHashMap. Even though I add them to our assembly jar file, the class loader finds the Spark assembly and we get runtime class loader errors when we try to use it.

It is possible to put our jar file first, as described here:
https://issues.apache.org/jira/browse/SPARK-939
http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment

which I tried with args to spark-submit:

    --conf spark.driver.userClassPathFirst=true
    --conf spark.executor.userClassPathFirst=true

but we still get the class not found error.

We have tried copying the source code for clearspring into our package and renaming the package, and that makes it appear to work... Is this risky? It certainly is ugly.

Can anyone recommend a way to deal with this dependency hell?
===
The spark/pom.xml file contains the following lines:

    <dependency>
      <groupId>com.clearspring.analytics</groupId>
      <artifactId>stream</artifactId>
      <version>2.7.0</version>
      <exclusions>
        <exclusion>
          <groupId>it.unimi.dsi</groupId>
          <artifactId>fastutil</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

===
The parquet-column/pom.xml file contains:

    <artifactId>maven-shade-plugin</artifactId>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <minimizeJar>true</minimizeJar>
          <artifactSet>
            <includes>
              <include>it.unimi.dsi:fastutil</include>
            </includes>
          </artifactSet>
          <relocations>
            <relocation>
              <pattern>it.unimi.dsi</pattern>
              <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>