I agree that minor releases should be binary-compatible for all public APIs, 
and I think that's a good goal for future ones. In fact, our releases have 
always provided full compatibility for "external" APIs, just not for internal 
ones that you might use for defining a new RDD, new transformations, etc. 
However, it seems that more people want to use those directly, so supporting 
them is worth aiming for.

In this case we pushed in more features than usual because this was the last 
branch on Scala 2.9, and there were some pretty key features (YARN 2.2 
compatibility, standalone mode HA) that we thought 2.9 users would want.

Something else we’ll probably do is mark more “internal”, yet useful-to-extend, 
APIs through an annotation. I’m talking about things like writing a custom RDD 
or SparkListener. These may change in major versions, but at least you’ll be 
able to expect that maintenance releases in the original branch don’t break 
them.
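One way such an annotation could look, as a minimal sketch (the names `UnstableApi` and `JobListener` are purely illustrative, not actual Spark API):

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation: the annotated type is public to extend,
// but only guaranteed stable within a maintenance branch, not across
// major versions.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UnstableApi {}

// An extension point a user might implement against such an API, in the
// spirit of a custom SparkListener:
@UnstableApi
interface JobListener {
    String onJobStart(int jobId);
}

public class Demo {
    public static void main(String[] args) {
        JobListener l = id -> "job " + id + " started";
        System.out.println(l.onJobStart(1)); // prints "job 1 started"
        // Tooling (or reviewers) can then flag uses of @UnstableApi types:
        System.out.println(JobListener.class.isAnnotationPresent(UnstableApi.class)); // true
    }
}
```

With RUNTIME retention the marker is visible to both javadoc and compatibility-checking tools, which is what makes the "stable within a branch, may change across majors" contract enforceable rather than just documented.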

Matei

On Dec 8, 2013, at 2:45 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> Yup, I'm already started on that process.
> 
> And it's not that I disagree with any particular change that was merged per
> se -- I haven't seen anything merged that most users won't want.  It's more
> that I object to the burden that our current development/versioning/release
> process puts on Spark users responsible for production code.  For them,
> adopting a new patch-level release should be a decision requiring almost no
> thinking since the new release should be essentially just bug-fixes that
> maintain full binary compatibility.  With our current process, those users
> have to suck in a bunch of new, less-tested, less-mature code that may
> comprise new features or functionality that the user doesn't want (at least
> not right away in production), but that they can't cleanly separate from
> the bug-fixes that they do want.  Our process simply has to change if we
> place users' desires ahead of Spark developers' desires.
> 
> 
> On Sun, Dec 8, 2013 at 2:12 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> 
>> Hey Mark,
>> 
>> One constructive action you and other people can take to help us
>> assess the quality and completeness of this release is to download the
>> release, run the tests, run the release in your dev environment, read
>> through the documentation, etc. This is one of the main points of
>> releasing an RC to the community... even if you disagree with some
>> patches that were merged in, this is still a way you can help validate
>> the release.
>> 
>> - Patrick
>> 
>> On Sun, Dec 8, 2013 at 1:30 PM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>>> I'm aware of the changes file, but it really doesn't address the issue that
>>> I am raising.  The changes file just tells me what has gone into the
>>> release candidate.  In general, it doesn't tell me why those changes went
>>> in or provide any rationale by which to judge whether that is the complete
>>> set of changes that should go in.
>>> 
>>> I talked some with Matei about related versioning and release issues last
>>> week, and I've raised them in other contexts previously, but I'm taking the
>>> liberty to annoy people again because I really am not happy with our
>>> current versioning and release process, and I really am of the opinion that
>>> we've got to start doing much better before I can vote in favor of a 1.0
>>> release.  I fully realize that this is not a 1.0 release, and that because
>>> we are pre-1.0 we still have a lot of flexibility with releases that break
>>> backward or forward compatibility and with version numbers that have
>>> nothing like the semantic meaning that they will eventually need to have;
>>> but it is not going to be easy to change our process and culture so that we
>>> produce the kind of stability and reliability that Spark users need to be
>>> able to depend upon and version numbers that clearly communicate what those
>>> users expect them to mean.  I think that we should start making those
>>> changes now.  Just because we have flexibility pre-1.0, that doesn't mean
>>> that we shouldn't start training ourselves now to work within the
>>> constraints of post-1.0 Spark.  If I'm to be happy voting for an eventual
>>> 1.0 release candidate, I'll need to have seen at least one full development
>>> cycle that already adheres to the post-1.0 constraints, demonstrating the
>>> maturity of our development process.
>>> 
>>> That demonstration cycle is clearly not this one -- and I understand that
>>> there were some compelling reasons (particularly with regard to getting a
>>> "full" release of Spark based on Scala 2.9.3 before we make the jump to
>>> 2.10).  This "patch-level" release breaks binary compatibility and contains
>>> a lot of code that isn't anywhere close to meeting the criterion for
>>> inclusion in a real, post-1.0 patch-level release: essentially "changes
>>> that every, or nearly every, existing Spark user needs (not just wants),
>>> and that work with all existing and future binaries built with the prior
>>> patch-level version of Spark as a dependency."  Like I said, we are clearly
>>> nowhere close to that with the move from 0.8.0 to 0.8.1; but I also haven't
>>> been able to recognize any alternative criterion by which to judge the
>>> quality and completeness of this release candidate.
>>> 
>>> Maybe there just isn't one, and I'm just going to have to swallow my
>>> concerns while watching 0.8.1 go out the door; but if we don't start doing
>>> better on this kind of thing in the future, you are going to start hearing
>>> more complaining from me. I just hope that it doesn't get to the point
>>> where I feel compelled to actively oppose an eventual 1.0 release
>>> candidate.
>>> 
>>> 
>>> On Sun, Dec 8, 2013 at 12:37 PM, Henry Saputra <henry.sapu...@gmail.com>
>>> wrote:
>>> 
>>>> Ah, sorry for the confusion Patrick; like you said, I was just trying to
>>>> make people aware of this file and its purpose.
>>>> 
>>>> On Sunday, December 8, 2013, Patrick Wendell wrote:
>>>> 
>>>>> Hey Henry,
>>>>> 
>>>>> Are you suggesting we need to change something about our changes file?
>>>>> Or are you just pointing people to the file?
>>>>> 
>>>>> - Patrick
>>>>> 
>>>>> On Sun, Dec 8, 2013 at 11:37 AM, Henry Saputra <henry.sapu...@gmail.com>
>>>>> wrote:
>>>>>> HI Spark devs,
>>>>>> 
>>>>>> I have modified the Subject to avoid polluting the VOTE thread, since
>>>>>> this relates more to how and which commits get merged back to the 0.8.*
>>>>>> branch.
>>>>>> Please respond to the previous question in this thread.
>>>>>> 
>>>>>> Technically the CHANGES.txt [1] file should describe the changes in a
>>>>>> particular release, and it is the main requirement needed to cut an ASF
>>>>>> release.
>>>>>> 
>>>>>> 
>>>>>> - Henry
>>>>>> 
>>>>>> [1] https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt
>>>>>> 
>>>>>> On Sun, Dec 8, 2013 at 12:03 AM, Josh Rosen <rosenvi...@gmail.com>
>>>>>> wrote:
>>>>>>> We can use git log to figure out which changes haven't made it into
>>>>>>> branch-0.8.  Here's a quick attempt, which only lists pull requests
>>>>>>> that were only merged into one of the branches.  For completeness, this
>>>>>>> could be extended to find commits that weren't part of a merge and are
>>>>>>> only present in one branch.
>>>>>>> 
>>>>>>> *Script:*
>>>>>>> 
>>>>>>> MASTER_BRANCH=origin/master
>>>>>>> RELEASE_BRANCH=origin/branch-0.8
>>>>>>> 
>>>>>>> git log --oneline --grep "Merge pull request" $MASTER_BRANCH \
>>>>>>>   | cut -f 2- -d ' ' | sort > master-prs
>>>>>>> git log --oneline --grep "Merge pull request" $RELEASE_BRANCH \
>>>>>>>   | cut -f 2- -d ' ' | sort > release-prs
>>>>>>> 
>>>>>>> comm -23 master-prs release-prs > master-only
>>>>>>> comm -23 release-prs master-prs > release-only
>>>>>>> 
>>>>>>> *Master Branch Only:*
>>>>>>> Merge pull request #1 from colorant/yarn-client-2.2
>>>>>>> Merge pull request #105 from pwendell/doc-fix
>>>>>>> Merge pull request #110 from pwendell/master
>>>>>>> Merge pull request #146 from JoshRosen/pyspark-custom-serializers
>>>>>>> Merge pull request #151 from russellcardullo/add-graphite-sink
>>>>>>> Merge pull request #154 from soulmachine/ClusterScheduler
>>>>>>> Merge pull request #156 from haoyuan/master
>>>>>>> Merge pull request #159 from liancheng/dagscheduler-actor-refine
>>>>>>> Merge pull request #16 from pwendell/master
>>>>>>> Merge pull request #185 from mkolod/random-number-generator
>>>>>>> Merge pull request #187 from aarondav/example-bcast-test
>>>>>>> Merge pull request #190 from markhamstra/Stages4Jobs
>>>>>>> Merge pull request #198 from ankurdave/zipPartitions-preservesPartitioning
>>>>>>> Merge pull request #2 from colorant/yarn-client-2.2
>>>>>>> Merge pull request #203 from witgo/master
>>>>>>> Merge pull request #204 from rxin/hash
>>>>>>> Merge pull request #205 from kayousterhout/logging
>>>>>>> Merge pull request #206 from ash211/patch-2
>>>>>>> Merge pull request #207 from henrydavidge/master
>>>>>>> Merge pull request #209 from pwendell/better-docs
>>>>>>> Merge pull request #210 from haitaoyao/http-timeout
>>>>>>> Merge pull request #212 from markhamstra/SPARK-963
>>>>>>> Merge pull request #216 from liancheng/fix-spark-966
>>>>>>> Merge pull request #217 from aarondav/mesos-urls
>>>>>>> Merge pull request #22 from GraceH/metrics-naming
>>>>>>> Merge pull request #220 from rxin/zippart
>>>>>>> Merge pull request #225 from ash211/patch-3
>>>>>>> Merge pull request #226 from ash211/patch-4
>>>>>>> Merge pull request #233 from hsaputra/changecontexttobackend
>>>>>>> Merge pull request #239 from aarondav/nit
>>>>>>> Merge pull request #242 from pwendell/master
>>>>>>> Merge pull request #3 from aarondav/pv-test
>>>>>>> Merge pull request #36 from pwendell/versions
>>>>>>> Merge pull request #37 from pwendell/merge-0.8
>>>>>>> Merge pull request #39 from pwendell/master
>>>>>>> Merge pull request #45 from pwendell/metrics_units
>>>>>>> Merge pull request #56 from jerryshao/kafka-0.8-dev
>>>>>>> Merge pull request #64 from prabeesh/master
>>>>>>> Merge pull request #66 from shivaram/sbt-assembly-deps
>>>>>>> Merge pull request #670 from jey/ec2-ssh-improvements
>>>>>>> Merge pull request #71 from aarondav/scdefaults
>>>>>>> Merge pull request #78 from mosharaf/master
>>>>>>> Merge pull request #8 from vchekan/checkpoint-ttl-restore
>>>>>>> Merge pull request #80 from rxin/build
>>>>>>> Merge pull request #82 from JoshRosen/map-output-t
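For the extension Josh mentions (finding commits that weren't part of a merge), one option is `git cherry`, which compares branches by patch content rather than by merge-commit message, so it also catches cherry-picked changes. A self-contained toy-repo sketch; in practice you would run the final command in a Spark checkout against origin/branch-0.8 and origin/master:

```shell
set -e
# Build a throwaway repo with a shared base commit, a release branch,
# and one commit that exists only on master.
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo && cd repo
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m base
git branch release                      # release branch forks at the base commit
echo fix > fix.txt
git add fix.txt
git -c user.name=t -c user.email=t@example.com commit -qm "master-only change"
# '+' marks commits on HEAD that have no patch-equivalent commit on release:
git cherry -v release HEAD
```

Commits whose patch was cherry-picked into the release branch show up with a `-` marker instead, so they are correctly excluded from the "master only" set even when their merge-commit messages differ.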
