Re: Spark excludes "fastutil" dependencies we need

2016-08-11 Thread cryptoe
I ran into a similar problem while using QDigest.
Shading the clearspring and fastutil classes solves the problem.
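Snippet from pom.xml:

  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <include>com.clearspring.analytics:stream</include>
            <include>it.unimi.dsi:fastutil</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <pattern>it.unimi.dsi</pattern>
            <shadedPattern>atom.it.unimi.dsi</shadedPattern>
          </relocation>
          <relocation>
            <pattern>com.clearspring.analytics.stream</pattern>
            <shadedPattern>atom.com.clearspring.analytics.stream</shadedPattern>
          </relocation>
        </relocations>
        <!-- Two more settings followed; their element names were lost in the
             archive. Their values were: true and ${project.build.directory} -->
      </configuration>
    </execution>
  </executions>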
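
To make the effect of the two relocation entries concrete, here is a minimal Java sketch that models only the renaming they describe. It is not how the shade plugin works internally (the plugin rewrites bytecode and resources at package time); the class name RelocationSketch and the relocate helper are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: models the renaming described by the <relocation> entries.
final class RelocationSketch {
    // pattern -> shadedPattern, copied from the pom.xml snippet
    private static final Map<String, String> RELOCATIONS = new LinkedHashMap<>();
    static {
        RELOCATIONS.put("it.unimi.dsi", "atom.it.unimi.dsi");
        RELOCATIONS.put("com.clearspring.analytics.stream",
                        "atom.com.clearspring.analytics.stream");
    }

    // Returns the name a class would have after relocation, or the name
    // unchanged if no pattern matches its package prefix.
    static String relocate(String className) {
        for (Map.Entry<String, String> r : RELOCATIONS.entrySet()) {
            String pattern = r.getKey();
            if (className.equals(pattern) || className.startsWith(pattern + ".")) {
                return r.getValue() + className.substring(pattern.length());
            }
        }
        return className;
    }

    public static void main(String[] args) {
        System.out.println(relocate("it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap"));
        System.out.println(relocate("com.clearspring.analytics.stream.quantile.QDigest"));
        System.out.println(relocate("org.apache.spark.SparkConf"));
    }
}
```

Since the application's own references are rewritten to the atom.-prefixed names as well, it should no longer matter which copy of the original packages the Spark assembly happens to provide.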







--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794p27512.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark excludes fastutil dependencies we need

2015-02-27 Thread Jim Kleckner
Yes, I used both.

The discussion on this seems to be on GitHub now:
  https://github.com/apache/spark/pull/4780

I am using more classes from the same package that Spark uses for
HyperLogLog. So we both include the jar file, but Spark excludes the
dependency that those additional classes require.


--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Re-Spark-excludes-fastutil-dependencies-we-need-tp21849.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark excludes fastutil dependencies we need

2015-02-26 Thread Marcelo Vanzin
On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner j...@cloudphysics.com wrote:
 So, should the userClassPathFirst flag work and there is a bug?

Sorry for jumping in the middle of conversation (and probably missing
some of it), but note that this option applies only to executors. If
you're trying to use the class in your driver, there's a separate
option for that.

Also to note is that if you're adding a class that doesn't exist
inside the Spark jars, which seems to be the case, this option should
be irrelevant, since the class loaders should all end up finding the
one copy of the class that you're adding with your app.

-- 
Marcelo
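
One way to check the point above in practice is to ask the JVM where a class was actually loaded from. The following is a minimal Java sketch, not part of Spark; the class name ClassOrigin and the locate helper are invented for illustration. It uses only standard JDK calls (Class.forName and ProtectionDomain.getCodeSource).

```java
import java.security.CodeSource;

// Illustration only: reports which jar or directory a class was loaded from.
final class ClassOrigin {
    static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try {
            // initialize=false: we only want to find the class, not run its static init
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "<JDK runtime>"; // classes from the JDK itself have no code source
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Calling locate("it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap") once in the driver and once inside a task would show whether each side sees the class at all and, if so, from which jar.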




Re: Spark excludes fastutil dependencies we need

2015-02-25 Thread Ted Yu
Interesting. Looking at SparkConf.scala:

val configs = Seq(
  DeprecatedConfig("spark.files.userClassPathFirst", "spark.executor.userClassPathFirst",
    "1.3"),
  DeprecatedConfig("spark.yarn.user.classpath.first", null, "1.3",
    "Use spark.{driver,executor}.userClassPathFirst instead."))

It seems spark.files.userClassPathFirst and spark.yarn.user.classpath.first
are deprecated.
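
As a small aid for checking one's own settings against the excerpt above, here is a Java sketch of the same key mapping. It only restates the two entries shown; the class DeprecatedKeys and its methods are invented for illustration and are not a Spark API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: restates the two deprecations listed in the excerpt.
final class DeprecatedKeys {
    // deprecated key -> direct replacement; null means there is no single
    // replacement key (the excerpt points to spark.{driver,executor}.userClassPathFirst)
    private static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("spark.files.userClassPathFirst", "spark.executor.userClassPathFirst");
        REPLACEMENTS.put("spark.yarn.user.classpath.first", null);
    }

    static boolean isDeprecated(String key) {
        return REPLACEMENTS.containsKey(key);
    }

    // Returns the direct replacement key, or null if there is none.
    static String replacementFor(String key) {
        return REPLACEMENTS.get(key);
    }

    public static void main(String[] args) {
        for (String key : args) {
            if (!isDeprecated(key)) {
                System.out.println(key + ": not listed as deprecated");
            } else if (replacementFor(key) != null) {
                System.out.println(key + ": use " + replacementFor(key));
            } else {
                System.out.println(key + ": use spark.{driver,executor}.userClassPathFirst");
            }
        }
    }
}
```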




Re: Spark excludes fastutil dependencies we need

2015-02-25 Thread Jim Kleckner
Inline

On Wed, Feb 25, 2015 at 1:53 PM, Ted Yu yuzhih...@gmail.com wrote:

 Interesting. Looking at SparkConf.scala:

 val configs = Seq(
   DeprecatedConfig("spark.files.userClassPathFirst", "spark.executor.userClassPathFirst",
     "1.3"),
   DeprecatedConfig("spark.yarn.user.classpath.first", null, "1.3",
     "Use spark.{driver,executor}.userClassPathFirst instead."))

 It seems spark.files.userClassPathFirst and spark.yarn.user.classpath.first
 are deprecated.


Note that I did use the non-deprecated version,
spark.executor.userClassPathFirst=true.



 On Wed, Feb 25, 2015 at 12:39 AM, Sean Owen so...@cloudera.com wrote:

 No, we should not add fastutil back. It's up to the app to bring
 dependencies it needs, and that's how I understand this issue. The
 question is really, how to get the classloader visibility right. It
 depends on where you need these classes. Have you looked into
 spark.files.userClassPathFirst and spark.yarn.user.classpath.first ?



I noted that I tried this in my original email.

The issue appears related to the fact that parquet is also creating a shaded
jar and that one leaves out the Long2LongOpenHashMap class.

FYI, I have subsequently tried removing the exclusion from the Spark build,
and that does cause the fastutil classes to be included and the example works...

So, should the userClassPathFirst flag work and there is a bug?

Or is it reasonable to put in a pull request for the elimination of the
exclusion?





Re: Spark excludes fastutil dependencies we need

2015-02-25 Thread Ted Yu
Maybe drop the exclusion for the parquet-provided profile?

Cheers


Re: Spark excludes fastutil dependencies we need

2015-02-25 Thread Sean Owen
No, we should not add fastutil back. It's up to the app to bring
dependencies it needs, and that's how I understand this issue. The
question is really, how to get the classloader visibility right. It
depends on where you need these classes. Have you looked into
spark.files.userClassPathFirst and spark.yarn.user.classpath.first?

On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu yuzhih...@gmail.com wrote:
 bq. depend on missing fastutil classes like Long2LongOpenHashMap

 Looks like Long2LongOpenHashMap should be added to the shaded jar.

 Cheers




Re: Spark excludes fastutil dependencies we need

2015-02-24 Thread Ted Yu
bq. depend on missing fastutil classes like Long2LongOpenHashMap

Looks like Long2LongOpenHashMap should be added to the shaded jar.

Cheers

On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner j...@cloudphysics.com wrote:

 Spark includes the clearspring analytics package but intentionally excludes
 its fastutil dependency (see below).

 Spark also includes parquet-column, which bundles fastutil and relocates it
 under parquet/. However, the shaded jar it creates is incomplete: it leaves
 out some of the fastutil classes, notably Long2LongOpenHashMap, which is
 present in the fastutil jar file that parquet-column references.

 We are using more of the clearspring classes (e.g. QDigest), and those do
 depend on the missing fastutil classes such as Long2LongOpenHashMap.

 Even though I add them to our assembly jar file, the class loader finds the
 Spark assembly first and we get runtime class loader errors when we try to
 use it.

 It is possible to put our jar file first, as described here:
   https://issues.apache.org/jira/browse/SPARK-939
   http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment

 which I tried with these args to spark-submit:
   --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true
 but we still get the class not found error.
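
For readers unfamiliar with what a "user classpath first" option is meant to achieve: a standard JVM class loader asks its parent before looking in its own jars, so a copy of a class in the parent (here, the Spark assembly) wins. A child-first loader reverses that order. The following is a minimal Java sketch of the idea only; it is not Spark's implementation, and the class name ChildFirstClassLoader is invented for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustration only: looks in the user's jars before delegating to the parent.
final class ChildFirstClassLoader extends URLClassLoader {
    ChildFirstClassLoader(URL[] userJars, ClassLoader parent) {
        super(userJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Reverse of the default order: search our own URLs first...
                    c = findClass(name);
                } catch (ClassNotFoundException notInUserJars) {
                    // ...and fall back to normal parent delegation only if that fails.
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With an empty list of user jars this behaves like an ordinary loader; with the application's assembly jar in the list, a class present in both places would be taken from the application's copy.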

 We have tried copying the source code for clearspring into our package and
 renaming the package, and that makes it appear to work... Is this risky? It
 certainly is ugly.

 Can anyone recommend a way to deal with this dependency hell?


 === The spark/pom.xml file contains the following lines:

   <dependency>
     <groupId>com.clearspring.analytics</groupId>
     <artifactId>stream</artifactId>
     <version>2.7.0</version>
     <exclusions>
       <exclusion>
         <groupId>it.unimi.dsi</groupId>
         <artifactId>fastutil</artifactId>
       </exclusion>
     </exclusions>
   </dependency>

 === The parquet-column/pom.xml file contains:

   <artifactId>maven-shade-plugin</artifactId>
   <executions>
     <execution>
       <phase>package</phase>
       <goals>
         <goal>shade</goal>
       </goals>
       <configuration>
         <minimizeJar>true</minimizeJar>
         <artifactSet>
           <includes>
             <include>it.unimi.dsi:fastutil</include>
           </includes>
         </artifactSet>
         <relocations>
           <relocation>
             <pattern>it.unimi.dsi</pattern>
             <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
           </relocation>
         </relocations>
       </configuration>
     </execution>
   </executions>
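
The minimizeJar setting in the parquet-column fragment is the likely reason some fastutil classes are absent: with it enabled, the shade plugin keeps only the classes the project actually reaches. The Java sketch below models that pruning as a plain reachability walk. It is an illustration under assumptions, not the plugin's code: the class MinimizeSketch is invented, and the reference graph in main (including parquet.SomeColumnWriter) is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: keeps the classes reachable from a set of root classes.
final class MinimizeSketch {
    static Set<String> keep(Set<String> roots, Map<String, List<String>> references) {
        Set<String> kept = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            String cls = todo.pop();
            if (kept.add(cls)) { // first visit: follow this class's references
                todo.addAll(references.getOrDefault(cls, Collections.<String>emptyList()));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical graph: the project's own class references one fastutil
        // class; nothing in the project references Long2LongOpenHashMap.
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("parquet.SomeColumnWriter",
                 Arrays.asList("it.unimi.dsi.fastutil.ints.IntArrayList"));
        refs.put("it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap",
                 Collections.<String>emptyList());

        Set<String> kept = keep(Collections.singleton("parquet.SomeColumnWriter"), refs);
        System.out.println("kept: " + kept);
        System.out.println("Long2LongOpenHashMap kept? "
            + kept.contains("it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap"));
    }
}
```

In this model a class that only a third party such as clearspring needs is never reached from the project's own roots, which matches the symptom described in the message.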




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-excludes-fastutil-dependencies-we-need-tp21794.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.
