We had an extended discussion around this in the last community sync.
Thanks to those who participated!

To sum up the discussion:

--> As Mesos devs, we should strive not to make incompatible changes to
APIs, flags, or environment variables.

--> In the rare case where an incompatible change is preferred (e.g., to
reduce code complexity), we should give users a clear 6-month heads-up that
a breaking change is going to take place.

--> Breaking changes do not necessitate a major version bump. This is
because we want to allow live upgrades between major versions (e.g., 1.10
to 2.0).

--> Compatibility guarantees do not apply to experimental features (incl.
APIs).

--> We need to have clear documentation about the procedure that devs can
follow when deprecating/removing stable features and adding experimental
features.

--> We need to improve upgrades.md to make it easy for operators to know
what features are deprecated/removed between versions X and Y.

--> We should decouple internal protos used by Mesos from the unversioned
protos used by driver based frameworks.

I will spend some time in the next few weeks to create/update the
documentation reflecting these points.

Anything else I missed?

Thanks,

On Sat, Oct 15, 2016 at 11:47 AM, haosdent <haosd...@gmail.com> wrote:

> Thanks for @yan's great input! I agree with almost all of it.
>
> > Also the API is not just what the machine reads but all the documentation
> associated with it, right? It depends on what the documentation says; what
> the user _should_ expect.
>
> I think different users may have different expectations, and the person who
> developed the APIs may understand them differently from some users as well.
> Our documentation should cover most cases.
>
> But if we didn't, or forgot to, write something explicitly in the
> documentation, should we give up on updating the API? For example, user Alice
> may say this is a bug while user Bob says it is a feature. I think we still
> need to raise it case by case to ensure most users are not affected by
> breaking API changes.
>
> On Sat, Oct 15, 2016 at 6:55 AM, Vinod Kone <vinodk...@apache.org> wrote:
>
> > We will chat about this in the upcoming community sync (Thursday 3 PM).
> > So, please make sure to attend if you are interested.
> >
> > On Fri, Oct 14, 2016 at 3:44 PM, Yan Xu <xuj...@apple.com> wrote:
> >
> >>
> >> On Fri, Oct 14, 2016 at 3:37 PM, Yan Xu <xuj...@apple.com> wrote:
> >>
> >>> Thanks Alex for starting this!
> >>>
> >>> In addition to the comments below, I think it'll be helpful to keep the
> >>> existing versioning doc concise and user-friendly while having a
> >>> dedicated doc for the "implementation details" where precise
> >>> requirements and procedures go. Maybe some duplication/cross-referencing
> >>> is needed but Mesos developers will find the latter much more helpful
> >>> while the users/framework developers will find the former easy to read.
> >>>
> >>> e.g., a similar split:
> >>> https://github.com/kubernetes/kubernetes/blob/master/docs/api.md
> >>> https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api_changes.md
> >>> (which has a lot of details on how the Kubernetes
> >>> community is thinking about similar issues, which we can learn from)
> >>>
> >>> Jiang Yan Xu 
> >>>
> >>> On Wed, Oct 12, 2016 at 9:34 AM, Alex Rukletsov <a...@mesosphere.com>
> >>> wrote:
> >>>
> >>>> Folks,
> >>>>
> >>>> There have been a bunch of online [1, 2] and offline discussions about
> >>>> our deprecation and versioning policy. I found that people, including
> >>>> myself, read the versioning doc [3] differently; moreover, some aspects
> >>>> are not captured there. I would like to start a discussion around this
> >>>> topic by sharing my confusions and suggestions. This will hopefully help
> >>>> us stay on the same page and have similar expectations. The second goal
> >>>> is to eliminate ambiguities from the versioning doc (thanks Vinod for
> >>>> volunteering to update it).
> >>>>
> >>>
> >>> +1 Let me know if there are things I can help with.
> >>>
> >>>
> >>>>
> >>>> 1. API vs. semantic changes.
> >>>> The current versioning guide treats features (e.g. flags, metrics,
> >>>> endpoints) and the API differently: incompatible changes for the former
> >>>> are allowed after a 6-month deprecation cycle, while for the latter they
> >>>> require bumping the major version. I suggest we consolidate these
> >>>> policies.
> >>>>
> >>>
> >>> I feel that the distinction is not API vs. semantic changes. A
> >>> backwards-compatible API guarantee should imply backwards-compatible
> >>> semantics (of the API), i.e., if a change in the API doesn't cause the
> >>> message to be dropped on the floor but leads to a behavior change that
> >>> causes problems in the system, it still breaks compatibility.
> >>>
> >>> IMO the distinction is more between:
> >>> - Compatibility between components that are impossible/very unpleasant
> >>> to upgrade in lockstep - high priority for compatibility guarantee.
> >>> - Compatibility between components that are generally bundled (modules)
> >>> or things that usually aren't built into automated tooling (e.g., the
> >>> /state endpoint) - more relaxed for now but we should explicitly
> >>> exclude them from the guarantee.
> >>>
> >>>
> >>>>
> >>>> We should also define and clearly explain what changes require bumping
> >>>> the major version. I have no strong opinion here and would love to hear
> >>>> what people think. The original motivation for maintaining backwards
> >>>> compatibility is to make sure vN schedulers can correctly work with the
> >>>> vN API without being updated. But what about semantic changes that do
> >>>> not touch the API? For example, what if we decide to send fewer task
> >>>> health updates to schedulers based on some health policy? It influences
> >>>> the flow of task status updates; should such a change be considered
> >>>> compatible? Taking it to an extreme, we may not even be able to fix some
> >>>> bugs because someone may already rely on this behaviour!
> >>>>
> >>>
> >>> API changes should warrant a major version bump. Also the API is not
> >>> just what the machine reads but all the documentation associated with
> >>> it, right? It depends on what the documentation says; what the user
> >>> _should_ expect.
> >>>
> >>> That said, I feel that these things are hard to talk about in the
> >>> abstract. Even with a guideline, we still need to make case-by-case
> >>> decisions. (e.g., has the documentation precisely defined this exact
> >>> behavior? If not, is it reasonable for the users to expect some
> >>> behavior because it's common sense? How bad is it if some behavior just
> >>> changes a tiny bit?) Therefore we need to make sure the process for API
> >>> changes is more rigorously defined.
> >>>
> >>> Whether something is a bug depends on whether the API does what it says
> >>> it'll do. The line may sometimes be blurry but in general I don't feel
> >>> it's a problem. If someone is relying on the behavior that is a bug, we
> >>> should still help them fix it but the bug shouldn't count as "our
> >>> guarantee".
> >>>
> >>>
> >>>>
> >>>> Another tightly related thing we should explicitly call out is
> >>>> upgradability and rollback capabilities inside a major release.
> >>>> Committing to this may significantly limit what we can change within a
> >>>> major release; on the other hand, it will give users more time and a
> >>>> better experience using and maintaining Mesos clusters.
> >>>>
> >>>
> >>> According to the versioning doc upgradability depends on whether you
> >>> depend on deprecated/removed features.
> >>>
> >>> That paragraph should be explained more precisely:
> >>> - "deprecated" means your system won't break but warnings are shown.
> >>> (Maybe we should use some standard deprecation warning keywords so the
> >>> operator can monitor the log for such warnings!)
> >>> - "removed" means it may break.
> >>>
> >>> If you deprecate a flag/env variable that interfaces with operator
> >>> tooling in the next minor release, the operator basically has 6 months
> >>> from the next minor release to change her tooling. I feel this is pretty
> >>> acceptable.
> >>> If you deprecate a flag/env variable that interfaces with the framework
> >>> (executor) in the next minor release, I feel it may not be enough and
> >>> it probably warrants a major version bump. So perhaps the API shouldn't
> >>> be just the protos.
> >>>
> >>>
> >>>> 2. Versioned vs. unversioned protobufs.
> >>>> Currently we have v1 and unnamed protobufs, which simultaneously mean
> >>>> v0, v2, and internal. I am sometimes confused about what is the right
> >>>> way to update or introduce a field or message there. Do people feel the
> >>>> same? How about splitting the unnamed version into explicit v0, v2, and
> >>>> internal?
> >>>>
> >>>
> >>> As haosdent mentioned, we have captured this in MESOS-6268. The benefit
> >>> is clear but I guess people will be more motivated when we find that
> >>> some v2 feature can't be made compatible with the v0 API (Anand's point
> >>> in MESOS-6016). On the other hand, if we cut v0 API access before that
> >>> happens (is the v0 API obsolete and should it be removed 6 months after
> >>> 1.0?) then we don't need to worry about v0 and can use unversioned
> >>> protos as "internal"?
> >>>
> >>>
> >>>> Food for thought. It would be great if we can only maintain "diffs" to
> >>>> the internal protobufs in the code, instead of duplicating them
> >>>> altogether.
> >>>>
> >>>> 3. API and feature labelling.
> >>>> I suggest introducing explicit labels for the API and features, to
> >>>> ensure users have the right assumptions about their lifetime while
> >>>> engineers have the ability to change a WIP feature in a non-compatible
> >>>> way. I propose the following:
> >>>> API: stable, non-stable, pure (not used by Mesos components)
> >>>> Feature: experimental, normal.
> >>>>
> >>>
> >>> +1 on formalizing the terminology.
> >>>
> >>> Historically the distinction is not clear for the following:
> >>>
> >>> 1. The API has no compatibility guarantee at all.
> >>> 2. The feature provided by this API is experimental
> >>>
> >>
> >> To add to this point: because 2) logically doesn't apply to the "pure
> >> (not used by Mesos components)" fields in the API, it could be more
> >> confusing and thus require a more precise definition.
> >>
> >>
> >>>
> >>> IMO it's OK to say that we don't distinguish the two (the API has
> >>> no compatibility guarantee until the feature is fully released) but we
> >>> have to make it clear.
> >>> If we don't make such a distinction, ALL API additions should be marked
> >>> as unstable first and changed to stable later (as a formal process).
> >>>
> >>>
> >>>>
> >>>> Looking forward to your thoughts and suggestions.
> >>>> AlexR
> >>>>
> >>>> [1] https://www.mail-archive.com/user@mesos.apache.org/msg08025.html
> >>>> [2] https://www.mail-archive.com/dev@mesos.apache.org/msg36621.html
> >>>> [3]
> >>>> https://github.com/apache/mesos/blob/b2beef37f6f85a8c75e968136caa7a1f292ba20e/docs/versioning.md
> >>>>
> >>>
> >>>
> >>
> >
>
>
> --
> Best Regards,
> Haosdent Huang
>
