Re: [DISCUSS] Branches and versions for Hadoop 3

Vinod Kumar Vavilapalli Mon, 28 Aug 2017 13:04:27 -0700

+1 to Andrew’s proposal for 3.x releases.

We had fairly elaborate threads on this branching & compatibility topic before. 
One of them’s here: [1]


+1 to what Jason said.
 (a) Incompatible changes are not to be treated lightly.  We need to stop 
breaking stuff and ‘just dumping it on trunk’.
 (b) Major versions are expensive. We should hesitate before asking our users 
to move from 2.0 to 3.0 or 3.0 to 4.0 (with incompatible changes) *without* any 
other major value proposition.

Some of the incompatible changes can clearly wait, while others cannot and so may 
mandate a major release. What are some of the common types of incompatible 
changes?
 - Renaming APIs, removing deprecated APIs, renaming configuration properties, 
changing the default value of a configuration, changing shell output / logging 
etc:
    - Today, we do this on trunk even though the actual effort involved is 
minimal compared to the overhead it forces in maintaining an incompatible trunk.
 - Dependency library updates: updating guava, protobuf etc. in Hadoop breaks 
downstream applications. I am assuming Classpath Isolation [2] is still a 
blocker for 3.0 GA.
 - JDK upgrades: We tried two different approaches with JDK 7 and JDK 8; we need a 
formal policy on this.

If we can manage the above common breaking changes, we can cause less pain to 
our end users.

Here’s what we can do for 3.x / 4.x specifically.
 - Stay on trunk based 3.x releases
 - Avoid all incompatible changes as much as possible
 - If we run into a bunch of minor incompatible changes that have to be done, we 
either (a) make the incompatible behavior optional or (b) just park them, say 
with a parked-incompatible-change label, if making it optional is not possible
 - We create a 4.0 only when (a) we hit the first major incompatible change 
because a major next step for Hadoop needs it (e.g. Erasure Coding), and/or 
(b) the number of parked incompatible changes passes a certain threshold. 
Unlike Jason, I don’t see the threshold to be 1 for cases that don’t fit (a).

References
 [1] Looking to a Hadoop 3 release: 
http://markmail.org/thread/2daldggjaeewdmdf#query:+page:1+mid:m6x73t6srlchywsn+state:results
 [2] Classpath isolation for downstream client: 
https://issues.apache.org/jira/browse/HADOOP-11656

Thanks
+Vinod

> On Aug 25, 2017, at 1:23 PM, Jason Lowe <jl...@oath.com.INVALID> wrote:
> 
> Allen Wittenauer wrote:
> 
> 
>> Doesn't this place an undue burden on the contributor with the first
>> incompatible patch to prove worthiness?  What happens if it is decided that
>> it's not good enough?
> 
> 
> It is a burden for that first, "this can't go anywhere else but 4.x"
> change, but arguably that should not be a change done lightly anyway.  (Or
> any other backwards-incompatible change for that matter.)  If it's worth
> committing then I think it's perfectly reasonable to send out the dev
> announce that there's reason for trunk to diverge from 3.x, cut branch-3,
> and move on.  This is no different than Andrew's recent announcement that
> there's now a need for separating trunk and the 3.0 line based on what's
> about to go in.
> 
> I do not think it makes sense to pay for the maintenance overhead of two
> nearly-identical lines with no backwards-incompatible changes between them
> until we have the need.  Otherwise if past trunk behavior is any
> indication, it ends up mostly enabling people to commit to just trunk,
> forgetting that the thing they are committing is perfectly valid for
> branch-3.  If we can agree that trunk and branch-3 should be equivalent
> until an incompatible change goes into trunk, why pay for the commit
> overhead and potential for accidentally missed commits until it is really
> necessary?
> 
>> How many will it take before the dam will break?  Or is there a timeline
>> going to be given before trunk gets set to 4.x?
> 
> 
> I think the threshold count for the dam should be 1.  As soon as we have a
> JIRA that needs to be committed to move the project forward and we cannot
> ship it in a 3.x release then we create branch-3 and move trunk to 4.x.
> As for a timeline going to 4.x, again I don't see it so much as a "baking
> period" as a "when we need it" criteria.  If we need it in a week then we
> should cut it in a week.  Or a year then a year.  It all depends upon when
> that 4.x-only change is ready to go in.
> 
>> Given the number of committers that openly ignore discussions like this,
>> who is going to verify that incompatible changes don't get in?
>> 
> 
> The same entities who are verifying other bugs don't get in, i.e.: the
> committers and the Hadoop QA bot running the tests.  Yes, I know that means
> it's inevitable that compatibility breakages will happen, and we can and
> should improve the automation around compatibility testing when possible.
> But I don't think there's a magic bullet for preventing all compatibility
> bugs from being introduced, just like there isn't one for preventing
> general bugs.  Does having a trunk branch separate but essentially similar
> to branch-3 make this any better?
> 
>> Longer term:  what is the PMC doing to make sure we start doing major
>> releases in a timely fashion again?  In other words, is this really an
>> issue if we shoot for another major in (throws dart) 2 years?
>> 
> 
> If we're trying to do semantic versioning then we shouldn't have a regular
> cadence for major releases unless we have a regular cadence of changes that
> break compatibility.  I'd hope that's not something we would strive
> towards.  I do agree that we should try to be better about shipping
> releases, major or minor, in a more timely manner, but I don't agree that
> we should cut 4.0 simply based on a duration since the last major release.
> The release contents and community's desire for those contents should
> dictate the release numbering and schedule, respectively.
> 
> Jason
> 
> 
> On Fri, Aug 25, 2017 at 2:16 PM, Allen Wittenauer <a...@effectivemachines.com>
> wrote:
> 
>> 
>>> On Aug 25, 2017, at 10:36 AM, Andrew Wang <andrew.w...@cloudera.com>
>> wrote:
>> 
>>> Until we need to make incompatible changes, there's no need for
>>> a Hadoop 4.0 version.
>> 
>> Some questions:
>> 
>>        Doesn't this place an undue burden on the contributor with the
>> first incompatible patch to prove worthiness?  What happens if it is
>> decided that it's not good enough?
>> 
>>        How many will it take before the dam will break?  Or is there a
>> timeline going to be given before trunk gets set to 4.x?
>> 
>>        Given the number of committers that openly ignore discussions like
>> this, who is going to verify that incompatible changes don't get in?
>> 
>>        Longer term:  what is the PMC doing to make sure we start doing
>> major releases in a timely fashion again?  In other words, is this really
>> an issue if we shoot for another major in (throws dart) 2 years?
