Yeah, we can't/won't post MapR binaries on the ASF web space for the
release. However, I have been linking to them (at their request) with
a clear identifier that they are a 3rd-party build under an
incompatible license.

The only "vendor specific" build property we provide is compatibility
with different Hadoop FileSystem clients, since unfortunately there is
not a universally adopted client/server protocol. I think our goal has
always been to provide a path for using "ASF Spark" with
vendor-specific filesystems. Some vendors perform backports or
enhancements... and this of course we would never want to manage in
the upstream project.
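
For concreteness, here's a minimal Scala sketch of what that
compatibility boils down to - the same FileSystem API call resolves to
whichever client implementation is on the classpath (the URI and path
below are made up):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // The concrete client (stock HDFS, a vendor's implementation, etc.)
    // is selected by the URI scheme and the jars on the classpath;
    // application code is identical either way.
    val fs = FileSystem.get(new URI("hdfs://namenode:8020/"), new Configuration())
    fs.listStatus(new Path("/user/spark")).foreach(s => println(s.getPath))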

In terms of vendor support for this approach: in the early days,
Cloudera asked us to add a CDH4 repository, and more recently Pivotal
and MapR asked us to allow linking against their hadoop-client
libraries. So we've added these based on direct requests from the
vendors.
Given the ubiquity of the Hadoop FileSystem API, it's hard for me to
imagine ruffling feathers by supporting this. But if we get feedback
in that direction over time we can of course consider a different
approach.

- Patrick



On Thu, Aug 28, 2014 at 11:30 PM, Sean Owen <so...@cloudera.com> wrote:
> (Copying my reply since I don't know if it goes to the mailing list)
>
> Great, thanks for explaining the reasoning. You're saying these aren't
> going into the final release? I think that moots any issue around
> distributing them, then.
>
> This is all I know of from the ASF:
> https://community.apache.org/projectIndependence.html I don't read it
> as expressly forbidding this kind of thing, although you can see how
> it bumps up against the spirit. There's not a bright line -- what
> about Tomcat providing binaries compiled for Windows, for example?
> Does that favor an OS vendor?
>
> From this technical ASF perspective, only the releases matter -- do
> what you want with snapshots and RCs. The only issue there is maybe
> releasing something different from what was in the RC; is that at all
> confusing? It just needs a note.
>
> I think this theoretical issue doesn't exist if these binaries aren't
> released, so I see no reason not to proceed.
>
> The rest is a different question about whether you want to spend time
> maintaining this profile and candidate. The vendor already manages
> their own build, I think, and -- I don't know -- may even prefer not
> to have a different special build floating around. There's also the
> theoretical argument that this turns off other vendors from adopting
> Spark if it's perceived as too connected to particular vendors. I'd
> like to maximize Spark's distribution, and there's some argument that
> you do this by not making vendor profiles. But as I say, that's a
> different question, just something to think about over time...
>
> (oh and PS for my part I think it's a good thing that CDH4 binaries
> were removed. I wasn't arguing for resurrecting them)
>
> On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>> Hey Sean,
>>
>> The reason there are no longer CDH-specific builds is that all newer
>> versions of CDH and HDP work with builds for the upstream Hadoop
>> projects. I dropped CDH4 in favor of a newer Hadoop version (2.4) and
>> the Hadoop-without-Hive (also 2.4) build.
>>
>> For MapR - we can't officially post those artifacts on ASF web space
>> when we make the final release; we can only link to them as being
>> hosted by MapR specifically, since they use incompatible licenses.
>> However, I felt that providing these during a testing period was
>> alright, with the goal of increasing test coverage. I couldn't find
>> any policy against posting these on personal web space during RC
>> voting; however, we can remove them if there is one.
>>
>> Dropping CDH4 was more because it is now pretty old, but we can add it
>> back if people want. The binary packaging is a slightly separate
>> question from release votes, so I can always add more binary packages
>> whenever. And on this, my main concern is covering the most popular
>> Hadoop versions to lower the bar for users to build and test Spark.
>>
>> - Patrick
>>
>> On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen <so...@cloudera.com> wrote:
>>> +1 I tested the source and Hadoop 2.4 release. Checksums and
>>> signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't
>>> fail any more than usual.
>>>
>>> FWIW I've also been using the 1.1.0-SNAPSHOT for some time in another
>>> project and have encountered no problems.
>>>
>>>
>>> I notice that the 1.1.0 release removes the CDH4-specific build, but
>>> adds two MapR-specific builds. Compare with
>>> https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I
>>> commented on the commit:
>>> https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc
>>>
>>> I'm in favor of removing all vendor-specific builds. This change
>>> *looks* a bit funny as there was no JIRA (?) and appears to swap one
>>> vendor for another. Of course there's nothing untoward going on, but
>>> what was the reasoning? It's best avoided, and MapR already
>>> distributes Spark just fine, no?
>>>
>>> This is a gray area with ASF projects. I mention it as well because it
>>> came up with Apache Flink recently
>>> (http://mail-archives.eu.apache.org/mod_mbox/incubator-flink-dev/201408.mbox/%3CCANC1h_u%3DN0YKFu3pDaEVYz5ZcQtjQnXEjQA2ReKmoS%2Bye7%3Do%3DA%40mail.gmail.com%3E)
>>> Another vendor rightly noted that this could look like favoritism,
>>> and the project changed course to remove the vendor releases.
>>>
>>> On Fri, Aug 29, 2014 at 3:14 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>> Please vote on releasing the following candidate as Apache Spark version 
>>>> 1.1.0!
>>>>
>>>> The tag to be voted on is v1.1.0-rc2 (commit 711aebb3):
>>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=711aebb329ca28046396af1e34395a0df92b5327
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> http://people.apache.org/~pwendell/spark-1.1.0-rc2/
>>>>
>>>> Release artifacts are signed with the following key:
>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1029/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> http://people.apache.org/~pwendell/spark-1.1.0-rc2-docs/
>>>>
>>>> Please vote on releasing this package as Apache Spark 1.1.0!
>>>>
>>>> The vote is open until Monday, September 01, at 03:11 UTC and passes if
>>>> a majority of at least 3 +1 PMC votes are cast.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 1.1.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see
>>>> http://spark.apache.org/
>>>>
>>>> == Regressions fixed since RC1 ==
>>>> LZ4 compression issue: https://issues.apache.org/jira/browse/SPARK-3277
>>>>
>>>> == What justifies a -1 vote for this release? ==
>>>> This vote is happening very late into the QA period compared with
>>>> previous votes, so -1 votes should only occur for significant
>>>> regressions from 1.0.2. Bugs already present in 1.0.X will not block
>>>> this release.
>>>>
>>>> == What default changes should I be aware of? ==
>>>> 1. The default value of "spark.io.compression.codec" is now "snappy"
>>>> --> Old behavior can be restored by switching to "lzf"
>>>>
>>>> 2. PySpark now performs external spilling during aggregations.
>>>> --> Old behavior can be restored by setting "spark.shuffle.spill" to 
>>>> "false".
>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
