On 18-May-2014 1:45 am, "Mark Hamstra" <m...@clearstorydata.com> wrote:
>
> I'm not trying to muzzle the discussion.  All I am saying is that we don't
> need to have the same discussion about 0.10 vs. 1.0 that we already had.

Agreed, no point in repeating the same discussion ... I am also trying to
understand what the concerns are.

Specifically though, the scope of 1.0 (in terms of changes) went up quite a bit - a lot of it is new changes and features, not just the initially envisioned API changes and stability fixes.

If this is raising concerns, particularly since a lot of users depend on the stability of Spark's interfaces (API, env, scripts, behavior), I want to understand better what they are - and if they are legitimately serious enough, we will need to revisit the decision to go to 1.0 instead of 0.10 ... I hope we don't need to, though, given how late we are in the dev cycle.

Regards
Mridul

> If you can tell me about specific changes in the current release candidate
> that occasion new arguments for why a 1.0 release is an unacceptable idea,
> then I'm listening.
>
>
> On Sat, May 17, 2014 at 11:59 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>
> > On 17-May-2014 11:40 pm, "Mark Hamstra" <m...@clearstorydata.com> wrote:
> > >
> > > That is a past issue that we don't need to be re-opening now.  The present
> >
> > Huh?  If we need to revisit based on changed circumstances, we must - the scope of changes introduced in this release was definitely not anticipated when the 1.0 vs 0.10 discussion happened.
> >
> > If folks are worried about the stability of core, it is a valid concern IMO.
> >
> > Having said that, I am still ok with going to 1.0; but if a conversation starts about the need for 1.0 vs going to 0.10, I want to hear more and possibly allay the concerns, not try to muzzle the discussion.
> >
> >
> > Regards
> > Mridul
> >
> > > issue, and what I am asking, is which pending bug fixes does anyone anticipate will require breaking the public API guaranteed in rc9?
> > >
> > >
> > > On Sat, May 17, 2014 at 9:44 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> > >
> > > > We made incompatible API changes whose impact we don't yet know completely, both from an implementation and a usage point of view.
> > > >
> > > > We had the option of getting real-world feedback from the user community if we had gone to 0.10, but the Spark developers seemed to be in a hurry to get to 1.0 - so I made my opinion known but left it to the wisdom of the larger group of committers to decide ... I did not think it was critical enough to do a binding -1 on.
> > > >
> > > > Regards
> > > > Mridul
> > > > On 17-May-2014 9:43 pm, "Mark Hamstra" <m...@clearstorydata.com> wrote:
> > > >
> > > > > Which of the unresolved bugs in spark-core do you think will require an API-breaking change to fix?  If there are none of those, then we are still essentially on track for a 1.0.0 release.
> > > > >
> > > > > The number of contributions and pace of change now is quite high, but I don't think that waiting for the pace to slow before releasing 1.0 is viable.  If Spark's short history is any guide to its near future, the pace will not slow by any significant amount for any noteworthy length of time, but rather will continue to increase.  What we need to be aiming for, I think, is to have the great majority of those new contributions being made to MLlib, GraphX, SparkSQL and other areas of the code that we have clearly marked as not frozen in 1.x.  I think we are already seeing that, but if I am just not recognizing breakage of our semantic versioning guarantee that will be forced on us by some pending changes, now would be a good time to set me straight.
> > > > >
> > > > >
> > > > > On Sat, May 17, 2014 at 4:26 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> > > > >
> > > > > > I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the API changes, add missing functionality, and go through a hardening release before 1.0.
> > > > > >
> > > > > > But the community preferred a 1.0 :-)
> > > > > >
> > > > > > Regards,
> > > > > > Mridul
> > > > > >
> > > > > > On 17-May-2014 3:19 pm, "Sean Owen" <so...@cloudera.com> wrote:
> > > > > > >
> > > > > > > On this note, non-binding commentary:
> > > > > > >
> > > > > > > Releases happen in local minima of change, usually created by internally enforced code freeze. Spark is incredibly busy now due to external factors -- recently a TLP, recently discovered by a large new audience, ease of contribution enabled by Github. It's getting like the first year of mainstream battle-testing in a month. It's been very hard to freeze anything! I see a number of non-trivial issues being reported, and I don't think it has been possible to triage all of them, even.
> > > > > > >
> > > > > > > Given the high rate of change, my instinct would have been to release 0.10.0 now. But won't it always be very busy? I do think the rate of significant issues will slow down.
> > > > > > >
> > > > > > > Version ain't nothing but a number, but if it has any meaning it's the semantic versioning meaning. 1.0 imposes extra handicaps around striving to maintain backwards-compatibility. That may end up being bent to fit in important changes that are going to be required in this continuing period of change. Hadoop does this all the time unfortunately and gets away with it, I suppose -- minor version releases are really major. (On the other extreme, HBase is at 0.98 and quite production-ready.)
> > > > > > >
> > > > > > > Just consider this a second vote for focus on fixes and 1.0.x rather than new features and 1.x. I think there are a few steps that could streamline triage of this flood of contributions, and make all of this easier, but that's for another thread.
> > > > > > >
> > > > > > >
> > > > > > > On Fri, May 16, 2014 at 8:50 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> > > > > > > > +1, but just barely.  We've got quite a number of outstanding bugs identified, and many of them have fixes in progress.  I'd hate to see those efforts get lost in a post-1.0.0 flood of new features targeted at 1.1.0 -- in other words, I'd like to see 1.0.1 retain a high priority relative to 1.1.0.
> > > > > > > >
> > > > > > > > Looking through the unresolved JIRAs, it doesn't look like any of the identified bugs are show-stoppers or strictly regressions (although I will note that one that I have in progress, SPARK-1749, is a bug that we introduced with recent work -- it's not strictly a regression because we had equally bad but different behavior when the DAGScheduler exceptions weren't previously being handled at all vs. being slightly mis-handled now), so I'm not currently seeing a reason not to release.
> > > > > >
> > > > >
> > > >
> >
