Re: repositories for spark jars
The alternative is for Spark to not explicitly include hadoop-client, or to include it only as a "provided" dependency, and to offer a facility to insert the Hadoop client jars of your choice at packaging time. Unfortunately, hadoop-client pulls in a ton of other dependencies, so it's not as simple as copying one extra jar into dist/jars.

On Mon, Mar 17, 2014 at 10:58 AM, Patrick Wendell pwend...@gmail.com wrote:
> [quoted message trimmed; full text appears in the thread below]
--
Evan Chan
Staff Engineer
e...@ooyala.com
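The "provided" approach Evan describes might look something like the following in an sbt build definition. This is a minimal sketch, not a tested configuration; the artifact versions (and the CDH Hadoop version in particular) are illustrative assumptions, and you would substitute the ones matching your cluster:

```scala
// build.sbt sketch -- illustrative versions, not a tested configuration.
libraryDependencies ++= Seq(
  // Depend on Spark, but exclude the hadoop-client it pulls in
  // transitively, so the generic Hadoop build is not bundled.
  "org.apache.spark" %% "spark-core" % "0.9.0-incubating"
    exclude("org.apache.hadoop", "hadoop-client"),

  // Declare the Hadoop client matching your cluster as "provided":
  // it is available at compile time but is expected to be supplied
  // by the cluster at runtime rather than packaged with the app.
  "org.apache.hadoop" % "hadoop-client" % "2.0.0-cdh4.6.0" % "provided"
)
```

The point of the "provided" scope here is exactly the packaging-time flexibility Evan mentions: the same application jar can run against different Hadoop versions because the Hadoop jars come from the deployment environment, not the build.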
repositories for spark jars
After just spending a couple of days fighting with a new Spark installation, getting Spark and Hadoop version numbers to match everywhere, I have a suggestion I'd like to put out there: can we put the Hadoop version against which the Spark jars were built into the version number?

I noticed that the Cloudera Maven repo has started to do this (https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-core_2.10/) - sadly, though, only with the cdh5.x versions, not with the 4.x versions for which they also have Spark parcels. But I see no sign of it in the central Maven repo. Is this already done in some other repo I don't know about, perhaps?

I know it would save us a lot of time and grief simply to be able to point a project we build at the right version, and not have to rebuild and deploy Spark manually.

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600, Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: nkronenf...@oculusinfo.com
Re: repositories for spark jars
Hey Nathan,

I don't think this would be possible, because there are at least dozens of permutations of Hadoop versions (different vendor distros X different versions X YARN vs. not YARN, etc.), and maybe hundreds. So publishing new artifacts for each would be really difficult.

What is the exact problem you ran into? Maybe we need to improve the documentation to make it clearer how to correctly link against Spark/Hadoop for user applications. Basically, the model we have now is that users link against Spark and then link against the hadoop-client relevant to their version of Hadoop.

- Patrick

On Mon, Mar 17, 2014 at 9:50 AM, Nathan Kronenfeld nkronenf...@oculusinfo.com wrote:
> [quoted message trimmed; full text appears earlier in the thread]
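For readers unfamiliar with the model Patrick describes ("link against Spark, then link against the hadoop-client relevant to your Hadoop version"), it might look like the following in a Maven POM. This is an illustrative sketch only; the version numbers are example assumptions, and you would substitute the Spark release you use and the Hadoop version your cluster actually runs:

```xml
<!-- Illustrative pom.xml fragment; substitute your actual versions. -->
<dependencies>
  <!-- Link against Spark (a single published artifact, built against
       a generic Hadoop). -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.0-incubating</version>
  </dependency>

  <!-- Separately link against the hadoop-client that matches the
       Hadoop version of your cluster. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
  </dependency>
</dependencies>
```

This is why a single set of Spark artifacts in Maven Central can serve many Hadoop permutations: the Hadoop-specific choice is pushed into the user's own dependency declaration rather than baked into the Spark version number.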