Re: [DISCUSS] Experimental flagging (fork from Re-evaluate compaction defaults in 5.1/trunk)

Aleksey Yeshchenko Thu, 12 Dec 2024 03:07:01 -0800

I don’t like ‘unstable’ either, albeit for a different reason, but I don’t 
think three is enough, or that it fits, as we already have some features that 
don’t fit into any of (preview, beta, GA): released but broken, released but 
dangerous, deprecated, removed.


For new features going forward, alpha (preview) -> beta -> GA works well enough.

But we also need an approved non-euphemism for features like MVs (I suggest 
‘broken’) and possibly a softer version of it (‘dangerous’) for our existing 
features that work fine in some narrow well-defined circumstances but will blow 
up in your face if you don’t know exactly what you are doing.

These classifications are largely orthogonal.

Alpha(preview)->Beta->GA communicates readiness of a feature under development, 
with GA being the default final state for most features.

From there a feature can transition into ‘broken’ or ‘dangerous’ territory. 
Serious issues get uncovered (very) late sometimes. It is what it is.
And we do deprecate and remove functionality when it’s superseded.
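
To make the orthogonality concrete, here is a minimal sketch of how the two 
axes could be tracked separately. All names in it (Readiness, Caveat, 
Lifecycle, FeatureStatus) are hypothetical and for illustration only; nothing 
like this exists in the codebase, and the labels themselves are still up for 
discussion.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Readiness(Enum):
    """Readiness of a feature under development (hypothetical names)."""
    PREVIEW = "preview"  # a.k.a. alpha
    BETA = "beta"
    GA = "GA"


class Caveat(Enum):
    """Problems discovered after release; independent of readiness."""
    BROKEN = "broken"
    DANGEROUS = "dangerous"


class Lifecycle(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    REMOVED = "removed"


@dataclass(frozen=True)
class FeatureStatus:
    readiness: Readiness = Readiness.PREVIEW
    caveat: Optional[Caveat] = None
    lifecycle: Lifecycle = Lifecycle.ACTIVE

    def promote(self) -> "FeatureStatus":
        """Move one step along preview -> beta -> GA."""
        order = list(Readiness)  # Enum iteration follows definition order
        idx = order.index(self.readiness)
        if idx == len(order) - 1:
            raise ValueError("already GA")
        return replace(self, readiness=order[idx + 1])

    def flag(self, caveat: Optional[Caveat]) -> "FeatureStatus":
        """Attach or clear a caveat without touching readiness."""
        return replace(self, caveat=caveat)

    def label(self) -> str:
        notes = []
        if self.caveat is not None:
            notes.append(self.caveat.value)
        if self.lifecycle is not Lifecycle.ACTIVE:
            notes.append(self.lifecycle.value)
        suffix = f" ({', '.join(notes)})" if notes else ""
        return self.readiness.value + suffix


# A feature that reached GA and was later found to be broken:
mv = FeatureStatus().promote().promote().flag(Caveat.BROKEN)
print(mv.label())  # GA (broken)
```

The point of the sketch is only that flagging a feature as ‘broken’ does not 
have to rewind its readiness label; the two can be reported side by side.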


> -1 on unstable. It's way too many words than are needed. Three is a
> magic number and fits:
> 
> Preview
> Beta
> GA

> On 11 Dec 2024, at 18:50, Josh McKenzie <[email protected]> wrote:
> 
> A structured, disciplined approach to graduating something from [Optional] -> 
> [Default] makes sense to me, similar to how we're talking about a structured 
> flow of [Preview] -> [Beta] -> [GA]. Having those clear stages gives us a 
> framework to define what requirements of stage transitions would be which'll 
> ideally lead to us producing higher quality, more predictable, more 
> consistent results for our end users.
> 
> For instance, requirements from [Optional] -> [Default] could be higher level 
> abstractions like:
> Confidence in stability
> Strong evidence to indicate superiority in majority of workloads (by count or 
> importance or size, etc)
> These are all things we kind of do implicitly and ad-hoc on the mailing list, 
> and I'm not looking to tie us down to any granular structure or specificity. 
> More thinking it could be useful for someone that's worked on something who 
> wonders "Huh. How do I take this from being optional to the default?" and 
> having an answer better than "reinvent the wheel every time and fling 
> spaghetti at the dev list and pray".
> 
> :)
> 
> 
> On Wed, Dec 11, 2024, at 1:04 PM, Paulo Motta wrote:
>> Thanks for bringing up this topic, Josh. 
>> 
>> Outside of the major features (ie. MV/SAI/TCM/Accord), one related 
>> discussion in this topic is: how can we "promote" small improvements in 
>> existing features from optional to default ?
>> 
>> It makes sense to have optimizations launched behind a feature flag 
>> initially (beta phase) while the improvement gets real world exposure, but I 
>> think we need a better way to promote these optimizations to default 
>> behavior on a regular cadence.
>> 
>> Take for example optimized repairs from CASSANDRA-16274. It was launched in 
>> 4.x as an optional feature gated behind a flag, ie. 
>> auto_optimise_full_repair_streams: false. 
>> 
>> I could be easily missing something, but is there a world where 
>> non-optimized repairs make sense once this optimization is proven to work ? 
>> I agree this is fine while the feature is maturing, but at some point we 
>> need to rip off the band-aid and make the optimization the default (and 
>> clearly communicate that). This would allow us to clean up the code toil of 
>> default behavior that is no longer being used, because everyone is enabling 
>> the improvement during deployment.
>> 
>> This is just one example to demonstrate the issue and I don't want this 
>> discussion to focus on this particular case, but I can think of other 
>> improvements launched as optional that are never made default.
>> 
>> I don't know if this should continue to be addressed on an 
>> improvement-by-improvement basis or if we could have a more streamlined 
>> process to review and communicate these changes more consciously at every 
>> major release.
>> 
>> In the same way we open a loop when adding an optimized behavior behind a 
>> feature flag, I think we should have a process to close these loops by 
>> promoting these optimizations to default when it makes sense.
>> 
>> On Tue, Dec 10, 2024 at 2:10 PM Josh McKenzie <[email protected]> wrote:
>> 
>> So some questions to test a world w/3 classifications (Preview, Beta, GA):
>> - What would we do with the current experimental features (MV's, JDK17, 
>> witnesses, etc)? Flag them as preview or beta as appropriate on a 
>> case-by-case basis and add runtime warnings / documentation where missing?
>> 
>> - What would we do in the future if a feature's GA and we discover a Very 
>> Big Problem with it that'll take some time to fix? Keep it GA but cut a 
>> hotfix release w/a bunch of warnings? Bounce it back to Preview? Leave it be 
>> and just feverishly try and fix it?
>> 
>>> for policy decisions like this (that don’t need to be agreed in advance) we 
>>> should try to legislate the minimum necessary policy to proceed today
>> Definitely agree; MV's being in limbo for years strains the "3-step 
>> classification" structure for me. If we want to avoid having a solution for 
>> the MV-shaped case on the grounds we won't allow ourselves to reach this 
>> state again in the future, that seems reasonable. With the caveat that we 
>> might be in a similar situation with vector search right now, etc.
>> 
>> 
>> On Tue, Dec 10, 2024, at 1:48 PM, Benedict Elliott Smith wrote:
>>> Yep, I agree with this - we can revisit if we ever absolutely feel the need 
>>> to add additional states for exceptional circumstances.
>>> 
>>> > On 10 Dec 2024, at 13:24, Patrick McFadin <[email protected]> wrote:
>>> > 
>>> > -1 on unstable. It's way too many words than are needed. Three is a
>>> > magic number and fits:
>>> > 
>>> > Preview
>>> > Beta
>>> > GA
>>> > 
>>> > As a matter of testing the process, any pending CEP should go through
>>> > this exercise so we can see how it will work.
>>> > 
>>> > PS
>>> > Got the actual numbers from Whimsy.
>>> > DEV - 1425 users
>>> > USER - 2650
>>> > 
>>> > This means that when features experience a state change, finding more
>>> > avenues to get the word out will be important.
>>> > 
>>> > On Tue, Dec 10, 2024 at 10:04 AM Benedict Elliott Smith
>>> > <[email protected]> wrote:
>>> >> 
>>> >> As an aside, it would be nice to admit we basically revisit everything 
>>> >> each time it becomes relevant again, and for policy decisions like this 
>>> >> (that don’t need to be agreed in advance) we should try to legislate the 
>>> >> minimum necessary policy to proceed today, and leave future refinements 
>>> >> for later when the relevant context arises.
>>> >> 
>>> >> On 10 Dec 2024, at 13:00, Benedict Elliott Smith <[email protected]> wrote:
>>> >> 
>>> >> I agree with Aleksey that if we think something is broken, we shouldn’t 
>>> >> use euphemisms, and for this reason I don’t like unstable (this could 
>>> >> for instance simply mean API unstable). If we intend to never need this 
>>> >> descriptor, we should avoid bike-shedding and insert a “placeholder” for 
>>> >> now to be refined as and when we need it when we have the necessary 
>>> >> future context.
>>> >> 
>>> >> i.e.
>>> >> 
>>> >> preview -> beta -> [“has problems that will take time to resolve 
>>> >> placeholder” -> beta] -> GA
>>> >> 
>>> >> 
>>> >> 
>>> >> On 10 Dec 2024, at 12:39, Josh McKenzie <[email protected]> wrote:
>>> >> 
>>> >> +1 to this classification with one addition. I think we need to augment 
>>> >> this with formalization on what we do with features we don't recommend 
>>> >> people use (i.e. MV in their current incarnation). For something 
>>> >> retroactively found to be unstable, we could add an "Unstable" 
>>> >> qualification for it, leaving us with:
>>> >> 
>>> >> Unstable: Warnings on use, clearly communicated as to why, either 
>>> >> on-track to be fixed or removed from the codebase. No lingering for 
>>> >> years in a fugue state. We should target never needing this 
>>> >> classification.
>>> >> Preview: Ready to be tried by end users but has caveats and most likely 
>>> >> is not api stable. Developer only documentation acceptable.
>>> >> Beta: Feature complete/API stable but has not had enough testing to be 
>>> >> considered rock solid. Developer and User documentation required.
>>> >> GA: Ready for use, no known issue, PMC is satisfied with the testing 
>>> >> that has been done
>>> >> 
>>> >> 
>>> >> To walk through how some of the flow might look to test the above:
>>> >> 
>>> >> Simple case:
>>> >> - Preview -> Beta -> GA
>>> >> 
>>> >> Late discovered defect case:
>>> >> - Preview -> Beta -> Unstable -> Beta -> GA
>>> >> 
>>> >> Pathological worst-case (i.e. MV):
>>> >> - Preview -> Beta -> GA -> Unstable -> [Preview|Removed]
>>> >> 
>>> >> On Tue, Dec 10, 2024, at 12:29 PM, Jeremiah Jordan wrote:
>>> >> 
>>> >> I agree with Aleksey and Patrick.  We should define terminology and then 
>>> >> stick to it.  My preferred list would be:
>>> >> 
>>> >> Preview - Ready to be tried by end users but has caveats and most likely 
>>> >> is not api stable.
>>> >> Beta - Feature complete/API stable but has not had enough testing to be 
>>> >> considered rock solid.
>>> >> GA - Ready for use, no known issue, PMC is satisfied with the testing 
>>> >> that has been done
>>> >> 
>>> >> 
>>> >> Whether or not something is enabled by default or the default 
>>> >> implementation is a separate axis from the readiness.  Though if we 
>>> >> are replacing an existing thing with a new default I would hope we apply 
>>> >> extra rigor to allowing that to happen.
>>> >> 
>>> >> -Jeremiah
>>> >> 
>>> >> On Dec 10, 2024 at 11:15:37 AM, Patrick McFadin <[email protected]> wrote:
>>> >> 
>>> >> I'm going to try to pull this back from the inevitable bikeshedding
>>> >> and airing of grievances that happen. Rewind all the way back to
>>> >> Josh's  original point, which is a defined process. Why I really love
>>> >> this being brought up is our maturing process of communicating to the
>>> >> larger user base. The dev list has very few participants. Less than
>>> >> 1000 last I looked. Most users I talk to just want to know what they
>>> >> are getting. Well-formed, clear communication is how the PMC can let
>>> >> end users know that a new feature is one of three states:
>>> >> 
>>> >> 1. Beta
>>> >> 2. Generally Available
>>> >> 3. Default (where appropriate)
>>> >> 
>>> >> Yes! The work is just sorting out what each level means and then
>>> >> codifying that in confluence. Then, we look at any features that are
>>> >> under question, assign a level, and determine what it takes to go from
>>> >> one state to another.
>>> >> 
>>> >> The CEPs need to reflect this change. What makes a Beta, GA, Default
>>> >> for new feature X. It makes it clear for implementers and end users,
>>> >> which is an important feature of project maturity.
>>> >> 
>>> >> Patrick
>>> >> 
>>> >> 
>>> >> 
>>> >> On Dec 10, 2024 at 5:46:38 AM, Aleksey Yeshchenko <[email protected]> wrote:
>>> >> 
>>> >> What we’ve done is we’ve overloaded the term ‘experimental’ to mean too 
>>> >> many related but different ideas. We need additional, more specific 
>>> >> terminology to disambiguate.
>>> >> 
>>> >> 1. Labelling released features that were known to be unstable at release 
>>> >> as ‘experimental’  retroactively shouldn’t happen and AFAIK only 
>>> >> happened once, with MVs, and ‘experimental’ there was just a euphemism 
>>> >> for ‘broken’. Our practices are mature enough now, I like to think, that 
>>> >> a situation like this would not arise in the future - the bar for 
>>> >> releasing a completed marketable feature is higher. So the label 
>>> >> ‘experimental’ should not be applied retroactively to anything.
>>> >> 
>>> >> 2. It’s possible that a released, once considered production-ready 
>>> >> feature, might be discovered to be deeply flawed after being released 
>>> >> already. We need to temporarily mark such a feature as ‘broken' or 
>>> >> ‘flawed'. Not experimental, and not even ‘unstable’. Make sure we emit a 
>>> >> warning on its use everywhere, and, if possible, make it opt-in in the 
>>> >> next major, at the very least, to prevent new uses of it. Announce on 
>>> >> dev, add a note in NEWS.txt, etc. If the flaws are later addressed, 
>>> >> remove the label. Removing the feature itself might not be possible, but 
>>> >> should be considered, with heavy advance telegraphing to the community.
>>> >> 
>>> >> 3. There is probably room for genuine use of ‘experimental’ as a feature 
>>> >> label. For opt-in features that we commit with an understanding that 
>>> >> they might not make it at all. Unstable API is implied here, but a 
>>> >> feature can also have an unstable API without being experimental - so 
>>> >> ‘experimental’ doesn’t equal ‘api-unstable’. These should not be 
>>> >> relied on by any production code, they would be heavily gated by 
>>> >> unambiguous configuration flags, disabled by default, allowed to be 
>>> >> removed or changed in any version including a minor one.
>>> >> 
>>> >> 4. New features without known flaws, intended to be production-ready and 
>>> >> marketable eventually, that we may want to gain some real-world 
>>> >> confidence with before we are happy to market or make default. UCS, for 
>>> >> example, which seems to be in heavy use in Astra and doesn’t have any 
>>> >> known open issues (AFAIK). It’s not experimental, it’s not unstable, 
>>> >> it’s not ‘alpha’ or ‘beta’, it just hasn't been widely enough used to 
>>> >> have gained a lot of confidence. It’s just new. I’m not sure what label 
>>> >> even applies here. It’s just a regular feature that happens to be new, 
>>> >> doesn’t need a label, just needs to see some widespread use before we 
>>> >> can make it a default. No other limitation on its use.
>>> >> 
>>> >> 5. Early-integrated, not-yet fully-completed features that are NOT 
>>> >> experimental in nature. Isolated, gated behind deep configuration flags. 
>>> >> Have a CEP behind them, we trust that they will be eventually completed, 
>>> >> but for pragmatic reasons it just made sense to commit them at an 
>>> >> earlier stage. ‘Preview’, ‘alpha’, ‘beta’ are labels that could apply 
>>> >> here depending on current feature readiness status. API-instability is 
>>> >> implied. Once finished they just become a regular new feature, no flag 
>>> >> needed, no heavy config gating needed.
>>> >> 
>>> >> I might be missing some scenarios here.
>>> >> 
>>> >> 
>>> >>
