Maybe drop the exclusion for the parquet-provided profile?
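
Roughly, the clearspring dependency in spark/pom.xml would then look like the
sketch below -- just the existing declaration with the fastutil exclusion
removed; exactly how that would be gated on a parquet-provided profile is an
open question:

      <dependency>
        <groupId>com.clearspring.analytics</groupId>
        <artifactId>stream</artifactId>
        <version>2.7.0</version>
        <!-- no fastutil exclusion here, so it.unimi.dsi:fastutil comes in
             transitively and classes like Long2LongOpenHashMap are available -->
      </dependency>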

Cheers

On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner <j...@cloudphysics.com> wrote:

> Inline
>
> On Wed, Feb 25, 2015 at 1:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Interesting. Looking at SparkConf.scala:
>>
>>     val configs = Seq(
>>       DeprecatedConfig("spark.files.userClassPathFirst",
>>         "spark.executor.userClassPathFirst", "1.3"),
>>       DeprecatedConfig("spark.yarn.user.classpath.first", null, "1.3",
>>         "Use spark.{driver,executor}.userClassPathFirst instead."))
>>
>> It seems spark.files.userClassPathFirst and spark.yarn.user.classpath.first
>> are deprecated.
>>
>
> Note that I did use the non-deprecated version,
> spark.executor.userClassPathFirst=true.
>
>
>>
>> On Wed, Feb 25, 2015 at 12:39 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> No, we should not add fastutil back. It's up to the app to bring
>>> dependencies it needs, and that's how I understand this issue. The
>>> question is really, how to get the classloader visibility right. It
>>> depends on where you need these classes. Have you looked into
>>> spark.files.userClassPathFirst and spark.yarn.user.classpath.first ?
>>>
>>
>
> I noted that I tried this in my original email.
>
> The issue appears to be related to the fact that parquet also creates a
> shaded jar, and that jar leaves out the Long2LongOpenHashMap class.
>
> FYI, I have subsequently tried removing the exclusion from the Spark build,
> and that does cause the fastutil classes to be included; the example then
> works...
>
> So, should the userClassPathFirst flag have worked here, meaning this is a bug?
>
> Or is it reasonable to put in a pull request for the elimination of the
> exclusion?
>
>
>
>>
>>> On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>> > bq. depend on missing fastutil classes like Long2LongOpenHashMap
>>> >
>>> > Looks like Long2LongOpenHashMap should be added to the shaded jar.
>>> >
>>> > Cheers
>>>
>>> >
>>> > On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner <j...@cloudphysics.com>
>>> wrote:
>>> >>
>>> >> Spark includes the clearspring analytics package but intentionally
>>> >> excludes the dependency on the fastutil package (see below).
>>> >>
>>> >> Spark includes parquet-column, which includes fastutil and relocates it
>>> >> under parquet/, but it creates a shaded jar file which is incomplete
>>> >> because it shades out some of the fastutil classes, notably
>>> >> Long2LongOpenHashMap, which is present in the fastutil jar file that
>>> >> parquet-column is referencing.
>>> >>
>>> >> We are using more of the clearspring classes (e.g. QDigest), and those
>>> >> do depend on missing fastutil classes like Long2LongOpenHashMap.
>>> >>
>>> >> Even though I add them to our assembly jar file, the class loader finds
>>> >> the Spark assembly first, and we get runtime class loader errors when we
>>> >> try to use it.
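>>> >>
>>> >> (For reference, we currently pull fastutil into our assembly by declaring
>>> >> it directly in our own pom, roughly as in the sketch below; the version
>>> >> shown is only a guess at what stream 2.7.0 expects -- check with
>>> >> mvn dependency:tree:)
>>> >>
>>> >>       <dependency>
>>> >>         <groupId>it.unimi.dsi</groupId>
>>> >>         <artifactId>fastutil</artifactId>
>>> >>         <!-- version is illustrative; use whatever version
>>> >>              com.clearspring.analytics:stream 2.7.0 actually pulls in -->
>>> >>         <version>6.5.7</version>
>>> >>       </dependency>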
>>> >>
>>> >> It is possible to put our jar file first, as described here:
>>> >>   https://issues.apache.org/jira/browse/SPARK-939
>>> >>   http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment
>>> >>
>>> >> which I tried with these args to spark-submit:
>>> >>   --conf spark.driver.userClassPathFirst=true
>>> >>   --conf spark.executor.userClassPathFirst=true
>>> >> but we still get the class-not-found error.
>>> >>
>>> >> We have tried copying the source code for clearspring into our package
>>> >> and renaming the package, and that makes it appear to work...  Is this
>>> >> risky?  It certainly is ugly.
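>>> >>
>>> >> (An alternative we are considering instead of copying sources: relocate
>>> >> the clearspring and fastutil packages inside our own assembly with the
>>> >> maven-shade-plugin, roughly like the untested sketch below; the "ourpkg"
>>> >> prefix is just a placeholder:)
>>> >>
>>> >>         <plugin>
>>> >>           <groupId>org.apache.maven.plugins</groupId>
>>> >>           <artifactId>maven-shade-plugin</artifactId>
>>> >>           <executions>
>>> >>             <execution>
>>> >>               <phase>package</phase>
>>> >>               <goals>
>>> >>                 <goal>shade</goal>
>>> >>               </goals>
>>> >>               <configuration>
>>> >>                 <relocations>
>>> >>                   <!-- keep our copies of clearspring and fastutil from
>>> >>                        colliding with the ones in the Spark assembly -->
>>> >>                   <relocation>
>>> >>                     <pattern>com.clearspring.analytics</pattern>
>>> >>                     <shadedPattern>ourpkg.com.clearspring.analytics</shadedPattern>
>>> >>                   </relocation>
>>> >>                   <relocation>
>>> >>                     <pattern>it.unimi.dsi</pattern>
>>> >>                     <shadedPattern>ourpkg.it.unimi.dsi</shadedPattern>
>>> >>                   </relocation>
>>> >>                 </relocations>
>>> >>               </configuration>
>>> >>             </execution>
>>> >>           </executions>
>>> >>         </plugin>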
>>> >>
>>> >> Can anyone recommend a way to deal with this "dependency hell"?
>>> >>
>>> >>
>>> >> === The spark/pom.xml file contains the following lines:
>>> >>
>>> >>       <dependency>
>>> >>         <groupId>com.clearspring.analytics</groupId>
>>> >>         <artifactId>stream</artifactId>
>>> >>         <version>2.7.0</version>
>>> >>         <exclusions>
>>> >>
>>> >>           <exclusion>
>>> >>             <groupId>it.unimi.dsi</groupId>
>>> >>             <artifactId>fastutil</artifactId>
>>> >>           </exclusion>
>>> >>         </exclusions>
>>> >>       </dependency>
>>> >>
>>> >> === The parquet-column/pom.xml file contains:
>>> >>         <artifactId>maven-shade-plugin</artifactId>
>>> >>         <executions>
>>> >>           <execution>
>>> >>             <phase>package</phase>
>>> >>             <goals>
>>> >>               <goal>shade</goal>
>>> >>             </goals>
>>> >>             <configuration>
>>> >>               <minimizeJar>true</minimizeJar>
>>> >>               <artifactSet>
>>> >>                 <includes>
>>> >>                   <include>it.unimi.dsi:fastutil</include>
>>> >>                 </includes>
>>> >>               </artifactSet>
>>> >>               <relocations>
>>> >>                 <relocation>
>>> >>                   <pattern>it.unimi.dsi</pattern>
>>> >>                   <shadedPattern>parquet.it.unimi.dsi</shadedPattern>
>>> >>                 </relocation>
>>> >>               </relocations>
>>> >>             </configuration>
>>> >>           </execution>
>>> >>         </executions>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >
>>>
>>
>>
>
