Re: repositories for spark jars

2014-03-19 Thread Evan Chan
The alternative is for Spark not to include hadoop-client explicitly,
perhaps declaring it only as a provided dependency, and to offer a
facility for inserting the hadoop-client jars of your choice at
packaging time.  Unfortunately, hadoop-client pulls in a ton of other
dependencies, so it's not as simple as copying one extra jar into
dist/jars.
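
For illustration, the packaging-time swap might look something like
this in an sbt build (just a sketch; the version strings below are
assumptions, not a tested combination):

    // build.sbt sketch: depend on Spark but exclude its transitive
    // hadoop-client, then add the hadoop-client that matches the cluster.
    // The version strings below are illustrative assumptions.
    libraryDependencies ++= Seq(
      ("org.apache.spark" %% "spark-core" % "0.9.0-incubating")
        .exclude("org.apache.hadoop", "hadoop-client"),
      "org.apache.hadoop" % "hadoop-client" % "2.2.0"
    )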

On Mon, Mar 17, 2014 at 10:58 AM, Patrick Wendell <pwend...@gmail.com> wrote:
 Hey Nathan,

 I don't think this would be feasible, because there are at least
 dozens of permutations of Hadoop versions (different vendor distros ×
 different versions × YARN vs. not YARN, etc.), and maybe hundreds, so
 publishing new artifacts for each one would be really difficult.

 What is the exact problem you ran into?  Maybe we need to improve the
 documentation to make it clearer how to link user applications
 correctly against Spark and Hadoop.  Basically, the model we have now
 is that users link against Spark and then link against the
 hadoop-client relevant to their version of Hadoop.

 - Patrick

 On Mon, Mar 17, 2014 at 9:50 AM, Nathan Kronenfeld
 <nkronenf...@oculusinfo.com> wrote:
 After just spending a couple of days fighting with a new Spark
 installation, getting Spark and Hadoop version numbers to match
 everywhere, I have a suggestion I'd like to put out there.

 Can we put the Hadoop version against which the Spark jars were built
 into the version number?

 I noticed that the Cloudera Maven repo has started to do this (
 https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-core_2.10/)
 - sadly, though, only for the cdh5.x versions, not for the 4.x
 versions for which they also have Spark parcels.  And I see no sign
 of it in the central Maven repo.

 Is this perhaps already done in some other repo I don't know about?

 I know it would save us a lot of time and grief simply to be able to
 point a project we build at the right version, rather than having to
 rebuild and deploy Spark manually.

 --
 Nathan Kronenfeld
 Senior Visualization Developer
 Oculus Info Inc
 2 Berkeley Street, Suite 600,
 Toronto, Ontario M5A 4J5
 Phone:  +1-416-203-3003 x 238
 Email:  nkronenf...@oculusinfo.com



-- 
Evan Chan
Staff Engineer
e...@ooyala.com


repositories for spark jars

2014-03-17 Thread Nathan Kronenfeld
After just spending a couple of days fighting with a new Spark
installation, getting Spark and Hadoop version numbers to match
everywhere, I have a suggestion I'd like to put out there.

Can we put the Hadoop version against which the Spark jars were built
into the version number?

I noticed that the Cloudera Maven repo has started to do this (
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-core_2.10/)
- sadly, though, only for the cdh5.x versions, not for the 4.x
versions for which they also have Spark parcels.  And I see no sign
of it in the central Maven repo.

Is this perhaps already done in some other repo I don't know about?

I know it would save us a lot of time and grief simply to be able to
point a project we build at the right version, rather than having to
rebuild and deploy Spark manually.
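
For illustration, Hadoop-qualified coordinates might look roughly like
the Cloudera scheme below in an sbt build; the version string here is a
hypothetical example of the pattern, not a verified artifact:

    // Hypothetical sketch in sbt of Hadoop-qualified Spark coordinates,
    // following the Cloudera repository's versioning scheme; the version
    // string is an assumed example, not a verified artifact.
    resolvers += "cloudera-repos" at
      "https://repository.cloudera.com/artifactory/cloudera-repos/"
    libraryDependencies +=
      "org.apache.spark" %% "spark-core" % "0.9.0-cdh5.0.0"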

-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  nkronenf...@oculusinfo.com


Re: repositories for spark jars

2014-03-17 Thread Patrick Wendell
Hey Nathan,

I don't think this would be feasible, because there are at least
dozens of permutations of Hadoop versions (different vendor distros ×
different versions × YARN vs. not YARN, etc.), and maybe hundreds, so
publishing new artifacts for each one would be really difficult.

What is the exact problem you ran into?  Maybe we need to improve the
documentation to make it clearer how to link user applications
correctly against Spark and Hadoop.  Basically, the model we have now
is that users link against Spark and then link against the
hadoop-client relevant to their version of Hadoop.
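
As a sketch of that two-step model in sbt (both version strings below
are assumed examples; use whatever matches your cluster):

    // Link against Spark, then against the hadoop-client matching the
    // cluster's Hadoop version. Both versions are assumed examples; the
    // Cloudera resolver is needed here only for the CDH4 artifact.
    resolvers += "cloudera-repos" at
      "https://repository.cloudera.com/artifactory/cloudera-repos/"
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "0.9.0-incubating",
      "org.apache.hadoop" % "hadoop-client" % "2.0.0-cdh4.6.0"
    )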

- Patrick

On Mon, Mar 17, 2014 at 9:50 AM, Nathan Kronenfeld
<nkronenf...@oculusinfo.com> wrote:
 After just spending a couple of days fighting with a new Spark
 installation, getting Spark and Hadoop version numbers to match
 everywhere, I have a suggestion I'd like to put out there.

 Can we put the Hadoop version against which the Spark jars were built
 into the version number?

 I noticed that the Cloudera Maven repo has started to do this (
 https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-core_2.10/)
 - sadly, though, only for the cdh5.x versions, not for the 4.x
 versions for which they also have Spark parcels.  And I see no sign
 of it in the central Maven repo.

 Is this perhaps already done in some other repo I don't know about?

 I know it would save us a lot of time and grief simply to be able to
 point a project we build at the right version, rather than having to
 rebuild and deploy Spark manually.

 --
 Nathan Kronenfeld
 Senior Visualization Developer
 Oculus Info Inc
 2 Berkeley Street, Suite 600,
 Toronto, Ontario M5A 4J5
 Phone:  +1-416-203-3003 x 238
 Email:  nkronenf...@oculusinfo.com