I'm concerned that using this "unstable" version makes it impossible to
upgrade in-place.

If a client (cache config, Traffic Monitor, random ops scripts, etc.)
uses it and a breaking change is made, then upgrading Traffic Ops first
breaks all of those clients, and upgrading the clients first means
they'll talk to the old TO and get 200s, but the data will be malformed.
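
To make the "200s but malformed data" case concrete, here is a rough
sketch in Python. The field names and payloads are made up for
illustration (they are not real Traffic Ops fields); the point is only
that a renamed field parses cleanly and silently yields nothing.

```python
import json

# Hypothetical response bodies. "hostName" and "host" are invented names
# standing in for a field that was renamed in a breaking change.
OLD_SERVER_BODY = '{"response": {"hostName": "edge-01"}}'
NEW_SERVER_BODY = '{"response": {"host": "edge-01"}}'


def parse_host_old_client(body):
    """Client written against the old (hypothetical) schema."""
    return json.loads(body)["response"].get("hostName")


def parse_host_new_client(body):
    """Client written against the new (hypothetical) schema."""
    return json.loads(body)["response"].get("host")


# Matching versions work as expected.
print(parse_host_old_client(OLD_SERVER_BODY))
print(parse_host_new_client(NEW_SERVER_BODY))

# Mismatched versions: the JSON is valid and parsing succeeds, but the
# value the client wanted is simply missing, with no error raised.
print(parse_host_old_client(NEW_SERVER_BODY))
print(parse_host_new_client(OLD_SERVER_BODY))
```

Either upgrade order hits one of the two mismatched cases, and nothing
in the HTTP status tells the client that anything went wrong.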

You could theoretically downgrade all clients to the previous version,
starting from the most-downstream, and then upgrade. But if a production
CDN is using a new feature, the CDN will almost certainly have things
relying on it that will break, either CDN operations or clients using new
features.

Worse, this doesn't seem obvious, which makes it a pretty big footgun:
ATC operators could use the "beta" API in their production CDN without
realizing they just made it impossible to upgrade.

On the other hand, I'm not seeing the big development savings. What
features have we added to the API in the past, and then decided one
version later that we did it wrong and wanted to make a breaking change?
Since using the unstable version makes it impossible to upgrade, all
production CDNs will have to wait two major versions for new features.
Underlying data changes that genuinely require two major versions to add
(like Layered Profiles) are pretty rare. Most changes are small and
compatible, yet users would have to wait two major versions to use each
of them in production. That seems like a pretty high cost.


On Tue, Aug 31, 2021 at 10:27 AM Rawlin Peters <raw...@apache.org> wrote:

> For your 1st reason, that is all hinged on whether or not the software
> needs to use the unstable version of the API. That is why you also
> have the choice to stay on the stable version and not have to worry
> about coordinating upgrades. Mind you, upgrades would only need to be
> coordinated in the cases where a component actually uses one of the
> broken APIs in the unstable version. We can easily keep track of
> breaking changes in the changelog in order to call out certain
> upgrades that would need to be coordinated (for any components that
> use the unstable API). Just because that process might be more
> error-prone than keeping the latest API version stable doesn't mean we
> shouldn't do it. It's a small risk that has a huge reward in time
> saved by not having to deal with so many API upgrades.
>
> I think your 2nd reason is actually supporting this proposal:
>
> > The removal of the 1.x API is showing how expensive it truly is to
> > safely remove API versions, and that’s something to be weighed in
> > addition to maintenance cost to the project for those versions.
>
> The 1.x API removal was a prime example in just how much code was able
> to stay on the stable API version until we decided to remove it. With
> this proposal, all of that code would still be able to remain
> unchanged for a longer period of time than without this proposal,
> saving much unnecessary toil. It also reduces maintenance cost of
> prior versions because in creating less new major versions, we will
> have less of them to support over time.
>
> > I think the million-dollar question revolves more around how much/far
> > back we are willing to support. If it’s only one release at a time,
> > that’s going to drive those 3rd party code maintenance costs up
> > significantly higher as part of just doing business which will slow
> > down deployments even if releases are moving faster.
>
> I don't think so, because we'd be creating less major versions to
> remove in the first place, so we wouldn't have to worry about
> upgrading 3rd party code that stays on the stable API version. From
> the lessons learned with the API 1.x removal, the vast majority of 3rd
> party code stays on the stable API version until that version is
> getting removed. So we would be releasing faster *and* deploying
> faster.
>
> For your 3rd reason, developers working on the same route generally
> always have to coordinate changes in some way, and we are usually very
> good about that. That is how it's always been done and will continue
> to be done, unaffected by this proposal. It's not really the release
> manager's responsibility to figure out what has been broken and what
> upgrades need to be coordinated. That is a collective responsibility
> of all ATC developers when making breaking changes. Breaking changes
> should be called out in the changelog, along with any prescribed
> upgrade orders. If this proposal is accepted, I think we should give
> these types of changes their own specific section in the changelog.
>
> For your 4th reason, I don't think we've ever decided to merge
> something that was half-baked just to avoid API versioning issues. A
> PR is already a feature branch and can remain open until ready to
> merge. The problem this proposal solves is when a developer starts
> developing a feature towards e.g. API 4.0, but we just cut a release
> and are now on API 5.0, so that developer then needs to *rework* their
> PR to now target API 5.0. Unnecessary rework decreases productivity
> and makes the feature take longer to get to production and produce
> value for us. This proposal basically extends the runway, so that we
> don't have to make the decision to delay the release if the feature is
> nearly complete in order to avoid that unnecessary rework. We can
> simply cut the release on time and have the new feature land in the
> subsequent release (with no unnecessary rework for the developer).
> Additionally, it is always somewhat disappointing when we have to
> *wait* to start developing a new feature because a release is about to
> be cut in order to avoid unnecessary rework caused by API versioning.
> This proposal would allow that work to start at any point in time
> without adding any unnecessary rework.
>
> For your last point, I know you keep linking to Rob's
> https://github.com/rob05c/apiver library whenever conversations
> related to API versioning come up, but this proposal is mainly
> concerned with major version changes, for which that library was not
> made. Also, I'm not really sure how Elixir would help solve this
> problem.
>
> - Rawlin
>
