[ https://issues.apache.org/jira/browse/SPARK-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493200#comment-14493200 ]

Sean Owen commented on SPARK-6511:
----------------------------------

Yeah, that might be the fastest way to find all the jars at once. They occur in 
various places in the raw Hadoop distro, so a one-liner like that really isn't 
too bad. I don't know that it's such a great idea to then also start modifying 
the classpath based on HADOOP_HOME, though, as that might not be what the end 
user wants, or it could interfere with an explicitly configured classpath.
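
(For illustration only: the actual one-liner is from earlier in the thread, so 
the variable names and paths below are just an assumed sketch of that kind of 
approach, not the exact command under discussion.)

    # Hypothetical sketch: gather every jar under an unpacked Hadoop distro
    # and join them into a single classpath string. HADOOP_HOME and the
    # variable names are assumptions.
    HADOOP_JARS=$(find "$HADOOP_HOME" -name '*.jar' | tr '\n' ':')
    export SPARK_DIST_CLASSPATH="$HADOOP_JARS"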

In something like CDH they're all laid out in one directory per component, so 
they're easier to find, but that isn't much different. I don't see the distro 
setting SPARK_DIST_CLASSPATH; it sets things like SPARK_LIBRARY_PATH in 
spark-env.sh to ${SPARK_HOME}/lib. I actually don't see where the Hadoop deps 
come in, but it's going to be something similar. The effect is about the same: 
add all of the Hadoop client and YARN jars to the classpath too.
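
(As a rough sketch of how that could be wired up in spark-env.sh: `hadoop 
classpath` is a stock Hadoop command that prints the client classpath, but the 
exact lines any given distro generates are an assumption here.)

    # Sketch of a spark-env.sh fragment: let Hadoop's own tooling report the
    # client and YARN classpath and append it to Spark's, instead of hunting
    # for the jars by hand.
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)
    # Distro-provided env files also set things like:
    # export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib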

> Publish "hadoop provided" build with instructions for different distros
> -----------------------------------------------------------------------
>
>                 Key: SPARK-6511
>                 URL: https://issues.apache.org/jira/browse/SPARK-6511
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>            Reporter: Patrick Wendell
>
> Currently we publish a series of binaries with different Hadoop client jars. 
> This mostly works, but some users have reported compatibility issues with 
> different distributions.
> One improvement moving forward might be to publish a binary build that simply 
> asks you to set HADOOP_HOME to pick up the Hadoop client location. That way 
> it would work across multiple distributions, even if they have subtle 
> incompatibilities with upstream Hadoop.
> I think a first step for this would be to produce such a build for the 
> community and see how well it works. One potential issue is that our fancy 
> excludes and dependency re-writing won't work with the simpler "append 
> Hadoop's classpath to Spark" approach. Also, how we deal with the Hive 
> dependency is unclear, i.e. should we continue to bundle Spark's Hive (which 
> has some fixes for dependency conflicts) or allow linking against vanilla 
> Hive at runtime.
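
(If such a "hadoop provided" build were produced with the existing build 
tooling, it might be driven roughly like this; the profile and script names 
are assumptions on my part, not something specified in the ticket.)

    # Hypothetical sketch: build a Spark distribution that leaves the Hadoop
    # classes out of the assembly, then rely on the cluster's own Hadoop jars
    # at runtime (e.g. via HADOOP_HOME or `hadoop classpath`, as above).
    ./make-distribution.sh --name hadoop-provided --tgz -Phadoop-provided -Pyarn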


