Re: [DISCUSS] Branches and versions for Hadoop 3

Jason Lowe Mon, 28 Aug 2017 12:42:06 -0700

Allen Wittenauer wrote:

> > On Aug 25, 2017, at 1:23 PM, Jason Lowe <jl...@oath.com> wrote:
> >
> > Allen Wittenauer wrote:
> >
> > > Doesn't this place an undue burden on the contributor with the first
> incompatible patch to prove worthiness?  What happens if it is decided that
> it's not good enough?
> >
> > It is a burden for that first, "this can't go anywhere else but 4.x"
> change, but arguably that should not be a change done lightly anyway.  (Or
> any other backwards-incompatible change for that matter.)  If it's worth
> committing then I think it's perfectly reasonable to send out the dev
> announce that there's reason for trunk to diverge from 3.x, cut branch-3,
> and move on.  This is no different than Andrew's recent announcement that
> there's now a need for separating trunk and the 3.0 line based on what's
> about to go in.
>
>         So, by this definition as soon as a patch comes in to remove
> deprecated bits there will be no issue with a branch-3 getting created,
> correct?
>

I think this gets back to the "if it's worth committing" part.  I feel the
community should collectively decide when it's worth taking the hit to
maintain the separate code line.  IMHO removing deprecated bits alone is
not reason enough to diverge the code base and the additional maintenance
that comes along with the extra code line.  A new feature is traditionally
the reason to diverge because that's something users would actually care
enough about to take the compatibility hit when moving to the version that
has it.  That also helps drive a timely release of the new code line
because users want the feature that went into it.

> >  Otherwise if past trunk behavior is any indication, it ends up mostly
> enabling people to commit to just trunk, forgetting that the thing they are
> committing is perfectly valid for branch-3.
>
>         I'm not sure there was any "forgetting" involved.  We likely
> wouldn't be talking about 3.x at all if it wasn't for the code diverging
> enough.
>

I don't think it was the myriad of small patches that went only into trunk
over the last 6 years that drove this.  Instead I think it was simply that
an "important enough" feature went in, like erasure coding, that gathered
momentum behind this release.  Trunk sat ignored for basically 5+ years,
and plenty of patches went into just trunk that should have gone into at
least branch-2 as well.  I don't think we as a community did the
contributors any favors by putting their changes into a code line that
didn't see a release for a very long time.  Yes 3.x could have released
sooner to help solve that issue, but given the complete lack of excitement
around 3.x until just recently is there any reason this won't happen again
with 4.x?  Seems to me 4.x will need to have something "interesting enough"
to drive people to release it relative to 3.x, which to me indicates we
shouldn't commit things only to there until we have an interest to do so.

> > Given the number of committers that openly ignore discussions like
> this, who is going to verify that incompatible changes don't get in?
> >
> > The same entities who are verifying other bugs don't get in, i.e.: the
> committers and the Hadoop QA bot running the tests.
> >  Yes, I know that means it's inevitable that compatibility breakages
> will happen, and we can and should improve the automation around
> compatibility testing when possible.
>
>         The automation only goes so far.  At least while investigating
> Yetus bugs, I've seen more than enough blatant and purposeful ignored
> errors and warnings that I'm not convinced it will be effective. ("That
> javadoc compile failure didn't come from my patch!"  Um, yes, yes it did.)
> PR for features has greatly trumped code correctness for a few years now.
>

I totally agree here.  We can and should do better about this outside of
automation.  I brought up automation since I see it as a useful part of the
total solution along with better developer education, oversight, etc.  I'm
thinking specifically about tools that can report on public API signature
changes, but that's just one aspect of compatibility.  Semantic behavior is
not something a static analysis tool can automatically detect, and the only
way to automate some of that is something like end-to-end compatibility
testing.  Bigtop may cover some of this with testing of older versions of
downstream projects like HBase, Hive, Oozie, etc., and we could set up some
tests that stand up two different Hadoop clusters and run tests that verify
interop between them.  But the tests will never be exhaustive and we will
still need educated committers and oversight to fill in the gaps.
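
To make the "report on public API signature changes" idea concrete, here is a
minimal sketch of what such a check boils down to.  It is not an existing
Hadoop or Yetus tool: the ApiSnapshot type, the report_incompatible_changes
function, and the org.example names are all hypothetical, and a real tool
would extract the signatures from compiled jars rather than from hand-written
dictionaries.

```python
from typing import Dict, List

# Hypothetical snapshot format: fully qualified method name -> signature text.
ApiSnapshot = Dict[str, str]


def report_incompatible_changes(old: ApiSnapshot, new: ApiSnapshot) -> List[str]:
    """Return one line per public method that was removed or had its signature changed.

    Methods that exist only in `new` are additions and are treated as compatible.
    """
    problems = []
    for name in sorted(old):
        if name not in new:
            problems.append(f"REMOVED {name} {old[name]}")
        elif new[name] != old[name]:
            problems.append(f"CHANGED {name} {old[name]} => {new[name]}")
    return problems


if __name__ == "__main__":
    # Made-up signatures for illustration only.
    old_api = {
        "org.example.Client.open": "(String) : Stream",
        "org.example.Client.close": "() : void",
    }
    new_api = {
        "org.example.Client.open": "(String, int) : Stream",
        "org.example.Client.flush": "() : void",
    }
    for line in report_incompatible_changes(old_api, new_api):
        print(line)
```

A check like this only sees signatures.  A method whose signature is unchanged
but whose behavior differs passes silently, which is the semantic gap that
still needs end-to-end testing and committer oversight.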

>  But I don't think there's a magic bullet for preventing all
> compatibility bugs from being introduced, just like there isn't one for
> preventing general bugs.  Does having a trunk branch separate but
> essentially similar to branch-3 make this any better?
>
>         Yes: it's been the process for over a decade now.  Unless there is
> some outreach done, it is almost a guarantee that someone will commit
> something to trunk they shouldn't because they simply won't know (or care?)
> the process has changed.
>

As you mentioned, people are already breaking compatibility left and right
as it is, which is why I wondered if it was really any better in practice.
Personally I'd rather find out about a major breakage sooner rather than later,
since if trunk remains an active area of development at all times it's more
likely the community will sit up and take notice when something crazy goes
in.  In the past, trunk was not really an actively deployed area for over 5
years, and all sorts of stuff went in without people really being aware of
it.

I agree everyone needs to be aware of the new policy, which in large part is
what this discussion thread is for.  If we do ultimately decide to keep trunk a 3.x
line for now then we should also update the HowToCommit wiki and any other
documentation accordingly along with an announce thread.

> I would *love* for the PMC to define a policy or guidelines that says the
> community should strive for a major after x  incompatible changes, a minor
> after y changes, a micro after z fixes.  Even if it doesn't have any teeth,
> it would at least give people hope that their contributions won't be lost
> in the dustbin of history and may actually push others to work on getting a
> release out.

Agreed, we need to be better about getting releases out the door.  But it
takes significant effort to do this, something that hasn't always happened
in the past.  In my mind that's part of the motivation to keep trunk a 3.x
line for a while because it accomplishes three things:

1) Patches are more likely to see a release sooner than if they are
committed to a new, trunk-as-4.x line.

2) Patches are more likely to see testing in a "real world" deployment
sooner than if they are committed to a new, trunk-as-4.x line.

3) Choosing to break backward compatibility to create the 4.x line becomes
a more conscious, community-discussed event.  It's convenient as a
developer to just say, "it's trunk, we can break anything" but breaking
backwards compatibility has a cost.  IMHO the benefits of that breakage
need to more than pay for the costs users have to endure to migrate to the
new version; otherwise it will just be seen as another hostility towards our
users.  Removing deprecated code that many popular downstream projects are
still using is a key example of something that usually isn't worth the cost.

I think 1) and 2) are a direct benefit to people who contribute code to the
project.  They are more likely to see their contribution in a release they
can deploy sooner than a new, major Hadoop release that will require more
effort on the Apache Hadoop project to get released and more effort on
their side to migrate onto.

I believe 3) benefits the committers since we can collectively discuss
whether a new change is worth maintaining a new, separate code line and if
it is, hopefully kickstart discussions for a release manager to get it out
sooner rather than later.  Otherwise I fear trunk will be another "fire and
forget" patch dumping ground until that key feature eventually comes along
to drive interest in publishing it.

>         We need to do better.
>

I totally agree, and trying to keep trunk an actively watched and released
line should help.  It sounds like we agree on that part but disagree on the
specifics of how to help trunk remain active.  Given that historically
trunk has languished for years I was hoping this proposal would help reduce
the likelihood of it happening again.  If we eventually decide that cutting
branch-3 now makes more sense then I'll do what I can to make that work
well, but it would be good to see concrete proposals on how to avoid the
problems we had with it over the last 6 years.

Jason
