Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-09 Mridul Muralidharan
In an ideal situation, +1 on removing all vendor-specific builds and making them just Hadoop-version-specific; that is what we should depend on anyway. Though I hope Sean is correct in assuming that vendor-specific builds for Hadoop 2.4 are just that, and not 2.4- or 2.4+ variants, which cause incompatibilities.

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-09 Andrew Ash
Does the Apache project team have any ability to measure download counts of the various releases? That data could be useful when it comes time to sunset vendor-specific releases, such as CDH4. On Mon, Mar 9, 2015 at 5:34 AM, Mridul Muralidharan mri...@gmail.com wrote: In ideal

Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Sean Owen
Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the Maven artifacts. Patrick I see you just commented on SPARK-5134 and will follow up there. Sounds like this may accidentally not be a problem. On binary

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Matei Zaharia
Yeah, my concern is that people should get Apache Spark from *Apache*, not from a vendor. It helps everyone use the latest features no matter where they are. In the Hadoop distro case, Hadoop made all this effort to have standard APIs (e.g. YARN), so it should be easy. But it is a problem if

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Patrick Wendell
I think it's important to separate the goals from the implementation. I agree with Matei on the goal: I think the goal needs to be to allow people to download Apache Spark and use it with CDH, HDP, MapR, whatever... This is the whole reason why HDFS and YARN have stable APIs, so that other

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Matei Zaharia
Our goal is to let people use the latest Apache release even if vendors fall behind or don't want to package everything, so that's why we put out releases for vendors' versions. It's fairly low overhead. Matei On Mar 8, 2015, at 5:56 PM, Sean Owen so...@cloudera.com wrote: Ah. I

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Sean Owen
Yeah, it's not much overhead, but here's an example of where it causes a little issue. I like that reasoning. However, the released builds don't track the later versions of Hadoop that vendors would be distributing; there's no Hadoop 2.6 build, for example. CDH4 is here, but not the far-more-used