[Let me know if I should be posting these comments in a different thread.] Should the default Spark version in spark-ec2 <https://github.com/apache/spark/blob/e1535ad3c6f7400f2b7915ea91da9c60510557ba/ec2/spark_ec2.py#L86> be updated for this release?
Nick

On Fri, Aug 29, 2014 at 12:55 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> Hey Nicholas,
>
> Thanks for this, we can merge in doc changes outside of the actual
> release timeline, so we'll make sure to loop those changes in before
> we publish the final 1.1 docs.
>
> - Patrick
>
> On Fri, Aug 29, 2014 at 9:24 AM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > There were several formatting and typographical errors in the SQL docs
> > that I've fixed in this PR. Dunno if we want to roll that into the release.
> >
> > On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell <pwend...@gmail.com>
> > wrote:
> >>
> >> Okay I'll plan to add cdh4 binary as well for the final release!
> >>
> >> ---
> >> sent from my phone
> >> On Aug 29, 2014 8:26 AM, "Ye Xianjin" <advance...@gmail.com> wrote:
> >>
> >> > We just used CDH 4.7 for our production cluster. And I believe we
> >> > won't use CDH 5 in the next year.
> >> >
> >> > Sent from my iPhone
> >> >
> >> > > On Aug 29, 2014, at 14:39, Matei Zaharia <matei.zaha...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > Personally I'd actually consider putting CDH4 back if there are
> >> > > still users on it. It's always better to be inclusive, and the
> >> > > convenience of a one-click download is high. Do we have a sense on
> >> > > what % of CDH users still use CDH4?
> >> > >
> >> > > Matei
> >> > >
> >> > > On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com)
> >> > > wrote:
> >> > >
> >> > > (Copying my reply since I don't know if it goes to the mailing list)
> >> > >
> >> > > Great, thanks for explaining the reasoning. You're saying these
> >> > > aren't going into the final release? I think that moots any issue
> >> > > surrounding distributing them then.
> >> > >
> >> > > This is all I know of from the ASF:
> >> > > https://community.apache.org/projectIndependence.html I don't read
> >> > > it as expressly forbidding this kind of thing although you can see
> >> > > how it bumps up against the spirit. There's not a bright line -- what
> >> > > about Tomcat providing binaries compiled for Windows for example?
> >> > > does that favor an OS vendor?
> >> > >
> >> > > From this technical ASF perspective only the releases matter -- do
> >> > > what you want with snapshots and RCs. The only issue there is maybe
> >> > > releasing something different than was in the RC; is that at all
> >> > > confusing? Just needs a note.
> >> > >
> >> > > I think this theoretical issue doesn't exist if these binaries
> >> > > aren't released, so I see no reason to not proceed.
> >> > >
> >> > > The rest is a different question about whether you want to spend
> >> > > time maintaining this profile and candidate. The vendor already
> >> > > manages their build I think and -- and I don't know -- may even
> >> > > prefer not to have a different special build floating around.
> >> > > There's also the theoretical argument that this turns off other
> >> > > vendors from adopting Spark if it's perceived to be too connected
> >> > > to other vendors. I'd like to maximize Spark's distribution and
> >> > > there's some argument you do this by not making vendor profiles.
> >> > > But as I say a different question to just think about over time...
> >> > >
> >> > > (oh and PS for my part I think it's a good thing that CDH4 binaries
> >> > > were removed. I wasn't arguing for resurrecting them)
> >> > >
> >> > >> On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell
> >> > >> <pwend...@gmail.com> wrote:
> >> > >> Hey Sean,
> >> > >>
> >> > >> The reason there are no longer CDH-specific builds is that all
> >> > >> newer versions of CDH and HDP work with builds for the upstream
> >> > >> Hadoop projects. I dropped CDH4 in favor of a newer Hadoop version
> >> > >> (2.4) and the Hadoop-without-Hive (also 2.4) build.
> >> > >>
> >> > >> For MapR - we can't officially post those artifacts on ASF web
> >> > >> space when we make the final release, we can only link to them as
> >> > >> being hosted by MapR specifically since they use non-compatible
> >> > >> licenses. However, I felt that providing these during a testing
> >> > >> period was alright, with the goal of increasing test coverage. I
> >> > >> couldn't find any policy against posting these on personal web
> >> > >> space during RC voting. However, we can remove them if there is one.
> >> > >>
> >> > >> Dropping CDH4 was more because it is now pretty old, but we can
> >> > >> add it back if people want. The binary packaging is a slightly
> >> > >> separate question from release votes, so I can always add more
> >> > >> binary packages whenever. And on this, my main concern is covering
> >> > >> the most popular Hadoop versions to lower the bar for users to
> >> > >> build and test Spark.
> >> > >>
> >> > >> - Patrick
> >> > >>
> >> > >>> On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen <so...@cloudera.com>
> >> > >>> wrote:
> >> > >>> +1 I tested the source and Hadoop 2.4 release. Checksums and
> >> > >>> signatures are OK. Compiles fine with Java 8 on OS X. Tests...
> >> > >>> don't fail any more than usual.
> >> > >>>
> >> > >>> FWIW I've also been using the 1.1.0-SNAPSHOT for some time in
> >> > >>> another project and have encountered no problems.
> >> > >>>
> >> > >>> I notice that the 1.1.0 release removes the CDH4-specific build,
> >> > >>> but adds two MapR-specific builds. Compare with
> >> > >>> https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I
> >> > >>> commented on the commit:
> >> > >>>
> >> > >>> https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc
> >> > >>>
> >> > >>> I'm in favor of removing all vendor-specific builds. This change
> >> > >>> *looks* a bit funny as there was no JIRA (?) and appears to swap
> >> > >>> one vendor for another. Of course there's nothing untoward going
> >> > >>> on, but what was the reasoning? It's best avoided, and MapR
> >> > >>> already distributes Spark just fine, no?
> >> > >>>
> >> > >>> This is a gray area with ASF projects. I mention it as well
> >> > >>> because it came up with Apache Flink recently
> >> > >>> (http://mail-archives.eu.apache.org/mod_mbox/incubator-flink-dev/201408.mbox/%3CCANC1h_u%3DN0YKFu3pDaEVYz5ZcQtjQnXEjQA2ReKmoS%2Bye7%3Do%3DA%40mail.gmail.com%3E)
> >> > >>> Another vendor rightly noted this could look like favoritism. They
> >> > >>> changed to remove vendor releases.
> >> > >>>
> >> > >>>> On Fri, Aug 29, 2014 at 3:14 AM, Patrick Wendell
> >> > >>>> <pwend...@gmail.com> wrote:
> >> > >>>> Please vote on releasing the following candidate as Apache Spark
> >> > >>>> version 1.1.0!
> >> > >>>>
> >> > >>>> The tag to be voted on is v1.1.0-rc2 (commit 711aebb3):
> >> > >>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=711aebb329ca28046396af1e34395a0df92b5327
> >> > >>>>
> >> > >>>> The release files, including signatures, digests, etc. can be
> >> > >>>> found at:
> >> > >>>> http://people.apache.org/~pwendell/spark-1.1.0-rc2/
> >> > >>>>
> >> > >>>> Release artifacts are signed with the following key:
> >> > >>>> https://people.apache.org/keys/committer/pwendell.asc
> >> > >>>>
> >> > >>>> The staging repository for this release can be found at:
> >> > >>>> https://repository.apache.org/content/repositories/orgapachespark-1029/
> >> > >>>>
> >> > >>>> The documentation corresponding to this release can be found at:
> >> > >>>> http://people.apache.org/~pwendell/spark-1.1.0-rc2-docs/
> >> > >>>>
> >> > >>>> Please vote on releasing this package as Apache Spark 1.1.0!
> >> > >>>>
> >> > >>>> The vote is open until Monday, September 01, at 03:11 UTC and
> >> > >>>> passes if a majority of at least 3 +1 PMC votes are cast.
> >> > >>>>
> >> > >>>> [ ] +1 Release this package as Apache Spark 1.1.0
> >> > >>>> [ ] -1 Do not release this package because ...
> >> > >>>>
> >> > >>>> To learn more about Apache Spark, please see
> >> > >>>> http://spark.apache.org/
> >> > >>>>
> >> > >>>> == Regressions fixed since RC1 ==
> >> > >>>> LZ4 compression issue:
> >> > >>>> https://issues.apache.org/jira/browse/SPARK-3277
> >> > >>>>
> >> > >>>> == What justifies a -1 vote for this release? ==
> >> > >>>> This vote is happening very late into the QA period compared with
> >> > >>>> previous votes, so -1 votes should only occur for significant
> >> > >>>> regressions from 1.0.2. Bugs already present in 1.0.X will not
> >> > >>>> block this release.
> >> > >>>>
> >> > >>>> == What default changes should I be aware of? ==
> >> > >>>> 1. The default value of "spark.io.compression.codec" is now "snappy"
> >> > >>>> --> Old behavior can be restored by switching to "lzf"
> >> > >>>>
> >> > >>>> 2. PySpark now performs external spilling during aggregations.
> >> > >>>> --> Old behavior can be restored by setting "spark.shuffle.spill"
> >> > >>>> to "false".
> >> > >>>>
> >> > >>>> ---------------------------------------------------------------------
> >> > >>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> > >>>> For additional commands, e-mail: dev-h...@spark.apache.org
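
For anyone testing the RC against an existing job, the two default changes listed in the quoted vote email can be reverted per run. The snippet below is only a rough sketch: the helper name `legacy_conf_flags` and the `LEGACY_DEFAULTS` dict are made up for this example and are not part of Spark or spark-ec2. Only the two property names and values come from the thread, and it assumes the `--conf key=value` option of spark-submit is available in the build being tested.

```python
# Hypothetical helper, not part of Spark: builds spark-submit flags that
# restore the pre-1.1 defaults named in the vote email.
LEGACY_DEFAULTS = {
    # 1.1 switches the default codec to "snappy"; "lzf" is the old behavior.
    "spark.io.compression.codec": "lzf",
    # 1.1 enables external spilling for PySpark aggregations; "false" turns it off.
    "spark.shuffle.spill": "false",
}


def legacy_conf_flags(overrides=LEGACY_DEFAULTS):
    """Return a flat list of ['--conf', 'key=value', ...] in sorted key order."""
    flags = []
    for key in sorted(overrides):
        flags.extend(["--conf", "{}={}".format(key, overrides[key])])
    return flags


if __name__ == "__main__":
    # "your_app.py" is a placeholder for the application being tested.
    print(" ".join(["spark-submit"] + legacy_conf_flags() + ["your_app.py"]))
```

The same two key/value pairs could instead go into the application's Spark configuration; the command-line form is used here only because it leaves the job code untouched while comparing old and new behavior.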