On this note, non-binding commentary:

Releases happen in local minima of change, usually created by
internally enforced code freeze. Spark is incredibly busy now due to
external factors -- recently a TLP, recently discovered by a large new
audience, ease of contribution enabled by Github. It's getting like
the first year of mainstream battle-testing in a month. It's been very
hard to freeze anything! I see a number of non-trivial issues being
reported, and I don't think it has been possible to triage all of
them, even.

Given the high rate of change, my instinct would have been to release
0.10.0 now. But won't it always be very busy? I do think the rate of
significant issues will slow down.

Version ain't nothing but a number, but if it has any meaning it's the
semantic versioning meaning. 1.0 imposes extra handicaps around
striving to maintain backwards-compatibility. That may end up being
bent to fit in important changes that are going to be required in this
continuing period of change. Hadoop does this all the time
unfortunately and gets away with it, I suppose -- minor version
releases are really major. (On the other extreme, HBase is at 0.98 and
quite production-ready.)

Just consider this a second vote for focus on fixes and 1.0.x rather
than new features and 1.x. I think there are a few steps that could
streamline triage of this flood of contributions, and make all of this
easier, but that's for another thread.


On Fri, May 16, 2014 at 8:50 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> +1, but just barely.  We've got quite a number of outstanding bugs
> identified, and many of them have fixes in progress.  I'd hate to see those
> efforts get lost in a post-1.0.0 flood of new features targeted at 1.1.0 --
> in other words, I'd like to see 1.0.1 retain a high priority relative to
> 1.1.0.
>
> Looking through the unresolved JIRAs, it doesn't look like any of the
> identified bugs are show-stoppers or strictly regressions (although I will
> note that one that I have in progress, SPARK-1749, is a bug that we
> introduced with recent work -- it's not strictly a regression because we
> had equally bad but different behavior when the DAGScheduler exceptions
> weren't previously being handled at all vs. being slightly mis-handled
> now), so I'm not currently seeing a reason not to release.

Reply via email to