That is a past issue that we don't need to reopen now.  The present
issue, and what I am asking, is this: which pending bug fixes does anyone
anticipate will require breaking the public API guaranteed in rc9?


On Sat, May 17, 2014 at 9:44 AM, Mridul Muralidharan <mri...@gmail.com> wrote:

> We made incompatible API changes whose impact we don't yet completely know,
> both from an implementation and a usage point of view.
>
> We had the option of getting real-world feedback from the user community if
> we had gone to 0.10, but the Spark developers seemed to be in a hurry to get
> to 1.0 - so I made my opinion known but left it to the wisdom of the larger
> group of committers to decide ... I did not think it was critical enough to
> do a binding -1 on.
>
> Regards
> Mridul
> On 17-May-2014 9:43 pm, "Mark Hamstra" <m...@clearstorydata.com> wrote:
>
> > Which of the unresolved bugs in spark-core do you think will require an
> > API-breaking change to fix?  If there are none of those, then we are still
> > essentially on track for a 1.0.0 release.
> >
> > The number of contributions and the pace of change are quite high now, but
> > I don't think that waiting for the pace to slow before releasing 1.0 is
> > viable.  If Spark's short history is any guide to its near future, the pace
> > will not slow by any significant amount for any noteworthy length of time,
> > but rather will continue to increase.  What we need to be aiming for, I
> > think, is to have the great majority of those new contributions being made
> > to MLlib, GraphX, Spark SQL, and other areas of the code that we have
> > clearly marked as not frozen in 1.x. I think we are already seeing that,
> > but if I am just not recognizing breakage of our semantic versioning
> > guarantee that will be forced on us by some pending changes, now would be a
> > good time to set me straight.
> >
> >
> > On Sat, May 17, 2014 at 4:26 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> >
> > > I had echoed similar sentiments a while back when there was a discussion
> > > around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the API
> > > changes, add missing functionality, and go through a hardening release
> > > before 1.0.
> > >
> > > But the community preferred a 1.0 :-)
> > >
> > > Regards,
> > > Mridul
> > >
> > > On 17-May-2014 3:19 pm, "Sean Owen" <so...@cloudera.com> wrote:
> > > >
> > > > On this note, non-binding commentary:
> > > >
> > > > Releases happen in local minima of change, usually created by an
> > > > internally enforced code freeze. Spark is incredibly busy now due to
> > > > external factors -- recently a TLP, recently discovered by a large new
> > > > audience, ease of contribution enabled by GitHub. It's getting like
> > > > the first year of mainstream battle-testing in a month. It's been very
> > > > hard to freeze anything! I see a number of non-trivial issues being
> > > > reported, and I don't think it has even been possible to triage all
> > > > of them.
> > > >
> > > > Given the high rate of change, my instinct would have been to release
> > > > 0.10.0 now. But won't it always be very busy? I do think the rate of
> > > > significant issues will slow down.
> > > >
> > > > Version ain't nothing but a number, but if it has any meaning it's the
> > > > semantic versioning meaning. 1.0 imposes extra handicaps around
> > > > striving to maintain backwards-compatibility. That may end up being
> > > > bent to fit in important changes that are going to be required in this
> > > > continuing period of change. Hadoop does this all the time
> > > > unfortunately and gets away with it, I suppose -- minor version
> > > > releases are really major. (On the other extreme, HBase is at 0.98 and
> > > > quite production-ready.)
> > > >
> > > > Just consider this a second vote for focus on fixes and 1.0.x rather
> > > > than new features and 1.x. I think there are a few steps that could
> > > > streamline triage of this flood of contributions, and make all of this
> > > > easier, but that's for another thread.
> > > >
> > > >
> > > > On Fri, May 16, 2014 at 8:50 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> > > > > +1, but just barely.  We've got quite a number of outstanding bugs
> > > > > identified, and many of them have fixes in progress.  I'd hate to see
> > > > > those efforts get lost in a post-1.0.0 flood of new features targeted
> > > > > at 1.1.0 -- in other words, I'd like to see 1.0.1 retain a high
> > > > > priority relative to 1.1.0.
> > > > >
> > > > > Looking through the unresolved JIRAs, it doesn't look like any of the
> > > > > identified bugs are show-stoppers or strictly regressions (although I
> > > > > will note that one that I have in progress, SPARK-1749, is a bug that
> > > > > we introduced with recent work -- it's not strictly a regression
> > > > > because we previously had equally bad but different behavior, when
> > > > > DAGScheduler exceptions weren't being handled at all, versus now
> > > > > being slightly mis-handled), so I'm not currently seeing a reason not
> > > > > to release.
> > >
> >
>
