On Sat, Jan 2, 2021 at 3:55 AM Sebastian Berg <sebast...@sipsolutions.net> wrote:
> On Wed, 2020-12-30 at 11:43 -0600, Sebastian Berg wrote:
> > On Wed, 2020-12-30 at 16:27 +0100, Ralf Gommers wrote:
> >
> > <snip>
> >
> > > That's very hard to describe, since it relies so much on previous
> > > experience and qualitative judgements. That's the main reason why I
> > > had more examples before, but they just led to more discussion about
> > > those examples - so that didn't quite have the intended effect.
> >
> > <snip>
> >
> > I only took a short course and used this very little. I am sure there
> > are many here with industry experience where the use of QA is everyday
> > work.

Thanks for thinking about this Sebastian. I used to use such a risk
management approach fairly regularly, and it can be useful. In general it's
something you do for a larger design change or new product, rather than for
an individual change. It helps get an overview of the main risks, and
prompts thinking about risks you may have missed.

> > One concept from there is to create a risk/danger and probability
> > assessment, which can be ad-hoc for your product. An example just to
> > make something up:
>
> I am not sure anyone finds this interesting or whether it fits the NEP
> specifically [1], but I truly think it can be useful (although maybe it
> doesn't need to be formalized). So I fleshed it out:
> https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw (also pasted below)

I'd be happy to try it. It does feel a bit too much to put all that content
into the NEP though. Maybe we can just add a brief "assess the severity and
likelihood of your proposed change, and include that assessment when
proposing a deprecation. See <here> for more details". And then we can link
to a wiki page or separate doc page, which we can then easily update
without it being a NEP revision.

Cheers,
Ralf

> My reasoning for suggesting it is that a process/formalism (no matter how
> ridiculous it may seem at first) for how to assess the impact of a
> backward incompatible change can be helpful by: conceptualizing, clearly
> separating backward incompatible impact assessment from benefits
> assessment, making it easier to follow a decision/thought process, and
> allowing some nuance [2].
>
> I actually believe that it can help with difficult decisions, even if
> only applied occasionally, and that it is not a burden because it
> provides fairly clear steps. Will it be useful often? Maybe not. But
> every time there is a proposal and we pause and hesitate because it is
> unclear whether it is worth the backcompat impact, I think this can
> provide a way to discuss it and come to a decision as objectively as
> possible. (And no, I do not think that any of the categories or
> mitigation strategies are an exact science.)
>
> Cheers,
>
> Sebastian
>
>
> [1] This is additional to the proposed promises, such as two releases of
> deprecation warnings and discussing most/all deprecations on the mailing
> list, which are unrelated. It is rather to provide a formalism where
> currently only the examples give points of reference.
> [2] There is a reason that the Python version is also short and
> intentionally fuzzy: https://www.python.org/dev/peps/pep-0387/ and
> https://discuss.python.org/t/pep-387-backwards-compatibilty-policy/4421
> There are just few definite rules that can be formalized, so a framework
> for diligent assessment seems the best we can do (if we want to).
>
>
> Assessing impact
>
> Here "impact" means how unmodified code may be negatively affected by a
> change, ignoring any deprecation period.
>
> To get an idea about how much impact a change has, try to list all
> potential impacts. This will often be just a single item (a user of
> function x has to replace it with y), but there could be multiple
> different ones. *After* listing all potential impacts, rank them on the
> following two scales (do not yet think about how to make the transition
> easier):
>
> 1. *Severity* (How bad is the impact for an affected user?)
>    - Minor: A performance regression or a change in (undocumented)
>      warning/error category falls here. This type of change would
>      normally not require a deprecation cycle or special consideration.
>    - Typical: Code must be updated to avoid an error, and the update is
>      simple to do in a way that works on both existing and future NumPy
>      versions.
>    - Severe: Code will error or crash, and there is no simple workaround
>      or fix.
>    - Critical: Code returns incorrect results. A change requiring
>      massive effort may also fall here. A hard crash (e.g. a segfault)
>      is by itself typically *not* critical.
> 2. *Likelihood* (How many users does the change affect?)
>    - Rare: The change has very few impacted users (or even no known
>      users after a code search). The normal assumption is that there is
>      always someone affected, but a rarely used keyword argument of an
>      already rarely used function falls here.
>    - Limited: The change is in a rarely used function or function
>      argument. Another possibility is that it affects only a small group
>      of very advanced users.
>    - Common: The change affects a bigger audience or multiple large
>      downstream libraries.
>    - Ubiquitous: The change affects a large fraction of NumPy users.
>
> The categories will not always be perfectly clear. That is OK. Rather
> than establishing precise guidelines, the purpose is a structured
> *process* that can be reviewed. When the impact is exceptionally
> difficult to assess, it is often feasible to try a change on the
> development branch while signalling willingness to revert it. Downstream
> libraries test against it (and the release candidate), which gives a
> chance to correct an originally optimistic assessment.
>
> After assessing each impact, it will fall somewhere in the following
> table, where "ok" means acceptable, "no" means clearly unacceptable, and
> entries with a "?" require careful consideration:
>
>     Severity\Likelihood | Rare | Limited | Common | Ubiquitous
>     --------------------+------+---------+--------+-----------
>     Minor               | ok   | ok      | ok     | ?
>     Typical             | ok   | ?       | ?      | no?
>     Severe              | ?    | ?       | no?    | no
>     Critical            | no?  | no      | no     | no
>
> Note that all changes should normally follow the two-release deprecation
> warning policy (except "minor" ones). The "no" fields mean a change is
> clearly unacceptable, although a NEP can always overrule it. This table
> only assesses the "impact". It does not assess how the impact compares
> to the benefits of the proposed change; that comparison must be
> favourable no matter how small the impact is. However, by assessing the
> impact, it will be easier to weigh it against the benefit. (Note that
> the table is not symmetric: an impact with "critical" severity is
> unlikely to be considered even when no known users are impacted.)
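>
> As a rough illustration of how the table is read (a minimal sketch with
> made-up names, not part of any NumPy API; the cell values mirror the
> reconstruction above):
>
>     # Hypothetical helper to look up the impact table.
>     SEVERITIES = ["minor", "typical", "severe", "critical"]
>     LIKELIHOODS = ["rare", "limited", "common", "ubiquitous"]
>
>     # Rows follow the table: "ok" is acceptable, "no" is clearly
>     # unacceptable, anything with a "?" needs careful consideration.
>     IMPACT_TABLE = [
>         ["ok",  "ok", "ok",  "?"],    # minor
>         ["ok",  "?",  "?",   "no?"],  # typical
>         ["?",   "?",  "no?", "no"],   # severe
>         ["no?", "no", "no",  "no"],   # critical
>     ]
>
>     def assess(severity, likelihood):
>         """Return the table verdict for one listed impact."""
>         row = SEVERITIES.index(severity)
>         col = LIKELIHOODS.index(likelihood)
>         return IMPACT_TABLE[row][col]
>
>     assess("typical", "limited")  # -> "?"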
>
> Mitigation and arguing of benefits
>
> Any change falling outside the "ok" fields requires careful
> consideration. When an impact is larger, you can try to mitigate it and
> "move" on the table. Some possible ways to do this are:
>
> - An avoidable warning for at least two releases (the policy for any
>   change that modifies behaviour) reduces a change by one category
>   (usually from "typical" to "minor" severity).
> - The severity category may be reduced by creating an easy workaround
>   (i.e. to move it from "severe" to "typical").
> - Sometimes a change may break working code but also fix *existing*
>   bugs; this can offset the severity. In extreme cases, this may warrant
>   classifying a change as a bug fix.
> - For particularly noisy changes (i.e. the "ubiquitous" category),
>   consider fixing downstream packages, delaying the warning, or using a
>   PendingDeprecationWarning. Simply prolonging the deprecation period is
>   also an option. This reduces how many users struggle with the change
>   and smooths the transition.
> - Exceptionally clear documentation and communication can be used to
>   make the impact more acceptable. This may not be enough to move a
>   "category" by itself, but it helps.
>
> After mitigation, the benefits can be assessed:
>
> - Any benefit of the change can be argued to "offset" the impact. If
>   this is necessary, a broad community discussion on the mailing list is
>   required. It should be clear that this does not actually "mitigate"
>   the impact but rather argues that the benefit outweighs it.
>
> These are not a fixed set of rules, but they provide a framework to
> assess and then try to mitigate the impact of a proposed change to an
> acceptable level. Arguing that a benefit can overcome multiple "impact"
> categories will require exceptionally large benefits, and most likely a
> NEP. For example, a change with an initial impact classification of
> "severe" and "ubiquitous" is unlikely to even be considered unless the
> severity can be reduced.
>
> Many deprecations will fall somewhere below or equal to a "typical and
> limited" impact (i.e. removal of an uncommon function argument). They
> receive a deprecation warning to make the impact acceptable, together
> with a brief discussion of why the change itself is worthwhile (i.e. the
> API is much cleaner afterwards). Any more disruptive change requires
> broad community discussion. This needs at least a discussion on the
> NumPy mailing list, and it is likely that the person proposing it will
> be asked to write a NEP.
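>
> To make the warning policy concrete, here is a minimal sketch of the
> usual pattern (the function and argument names are made up for
> illustration):
>
>     import warnings
>
>     def some_function(arr, old_arg=None):
>         if old_arg is not None:
>             # Warn for at least two releases before removal. For
>             # particularly ubiquitous changes, PendingDeprecationWarning
>             # can be used first to delay the noise.
>             warnings.warn(
>                 "`old_arg` is deprecated and will be removed; "
>                 "use `new_arg` instead.",
>                 DeprecationWarning,
>                 stacklevel=2,
>             )
>         ...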
>
> Summary and reasoning for this process
>
> The aim of this process and table is to provide a loose formalism with
> the goals of:
>
> - *Diligence:* Following this process ensures a detailed assessment of a
>   change's impact without being distracted by the benefits. This is
>   achieved by following well-defined steps:
>   1. Listing each potential impact (usually one).
>   2. Assessing the severity.
>   3. Assessing the likelihood.
>   4. Discussing what steps are/can be taken to lower the impact,
>      *ignoring any benefits*.
>   5. If the impact is not low at this point, this should prompt
>      considering and listing alternatives.
>   6. Arguing that the benefits outweigh the remaining impact. (This is a
>      distinct step: the original impact assessment stands as it was.)
> - *Transparency:* Using this process for difficult decisions makes it
>   easier for the reviewer and community to follow how a decision was
>   made and to criticize it.
> - *Nuance:* When it is clear that an impact is larger than typical, this
>   will prompt more care and thought. In some cases it may also clarify
>   that a change is lower impact than it appeared at first sight.
> - *Experience:* Using a similar formalism for many changes makes it
>   easier to learn from past decisions by providing an approach to
>   compare and conceptualize them.
>
> We aim to follow these steps in the future for difficult decisions. In
> general, any reviewer and community member may ask for this process to
> be followed for a proposed change. If the change is difficult, it will
> be worth the effort; if it is very low impact, it will be quick to
> clarify why.
>
> NOTE: At this time the process is new and is expected to require
> clarification.
>
> Examples
>
> It should be stressed again that the categories will rarely be clear,
> and they are intentionally categorized with some uncertainty below. Even
> unclear categories can help in forming a clearer idea of a change.
>
> Histogram
>
> The "histogram" example doesn't really add much with respect to this
> process. But noting the duplicate effort/impact would probably move it
> into a more severe category than most deprecations. That makes it a more
> difficult decision and indicates that careful thought should be spent on
> alternatives.
>
> Integer indexing requirement
>
> - Severity: Typical to Severe (although fairly easy, users often had to
>   make many changes)
> - Likelihood: Ubiquitous
>
> How ubiquitous it really was probably only became clear after the (rc?)
> release. The change would now probably go through a NEP, as it initially
> falls into the lower right part of the table. To get into the
> "acceptable" part of the table we note that:
>
> 1. Real bugs were caught in the process (argued to reduce the severity).
> 2. The deprecation was delayed and longer than normal (argued to
>    mitigate the number of affected users by giving much more time).
>
> Even with these considerations, it still has a larger impact and clearly
> requires careful thought and community discussion about the benefits.
>
> Removing financial functions
>
> - Severity: Severe (on the high end)
> - Likelihood: Limited (maybe Common)
>
> While not used by a large user base (limited), the removal is disruptive
> (severe). The change ultimately required a NEP, since it is not easy to
> weigh the maintenance advantage of removing the functions against the
> impact on their users.
>
> The NEP reduced the severity by providing a workaround: a pip-installable
> package as a drop-in replacement. For heavy users of these functions
> this is still more severe than most deprecations, but it lowered the
> impact assessment enough to consider the benefit of removal to outweigh
> the impact.
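>
> For reference, the replacement package keeps the removed functions as
> drop-in equivalents (a minimal sketch; assumes the numpy-financial
> package is installed):
>
>     # Previously: np.fv(0.05, 10, -100, -1000)
>     # After `pip install numpy-financial`:
>     import numpy_financial as npf
>
>     # Future value of $1000 invested now plus $100 per period,
>     # for 10 periods at 5% interest per period.
>     npf.fv(0.05, 10, -100, -1000)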
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion