On Sat, Jul 04, 2020 at 09:11:34AM -0700, Ben Rudiak-Gould wrote:

> > Quoting William Kahan, one of the people who designed IEEE-754 and
> > NANs:
> What he says there (on page 9) is
>
> > Some familiar functions have yet to be defined for NaN. For instance
> > max{x, y} should deliver the same result as max{y, x} but almost no
> > implementations do that when x is NaN. There are good reasons to
> > define max{NaN, 5} := max{5, NaN} := 5 though many would disagree.
>
> It's clear that he's not referring to standard behavior here and I'm
> not convinced that even he believes very strongly that min and max
> should behave that way.

Are you suggesting that Kahan *doesn't* believe that min() and max()
should be symmetric? This is what Python does now:

    py> max(float('nan'), 1)
    nan
    py> max(1, float('nan'))
    1

That's the sort of thing Kahan is describing, and it's clear to me that
he thinks it's a bad thing.

I will accept that treating NANs as missing values (as opposed to
NAN-poisoning behaviour that returns a NAN if one of the arguments is a
NAN) is open to debate. Personally, I don't think there are many, or
any, good use-cases for NAN-poisoning in this function. When we had
this debate four years ago, I recall there was one person who suggested
a use for it, but without going into details that I can recall.

> NaN means "there may be a correct answer but I don't know what it is."

That's one interpretation, but not the only one. Python makes it quite
hard to get a NAN from the builtins, but other languages do not. Here's
Julia:

    julia> 0/0
    NaN

So there's at least one NAN which means *there is no correct answer*.

In my younger days I was a NAN bigot who insisted that there was only
one possible interpretation for NANs, but as I've gotten older I've
accepted that treating them as *missing values* is acceptable.
(Besides, like it or not, that's what a ton of software does.) With
that interpretation, a NAN passed as the lower or upper bound can be
seen as another way of saying "no lower bound" (i.e. negative infinity)
or "no upper bound" (i.e. positive infinity), not "some unknown bound".
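The symmetric, NAN-as-missing behaviour Kahan describes is easy to
sketch. (`nanmax` is a made-up name for illustration, not a stdlib
function.)

```python
import math

def nanmax(x, y):
    # Hypothetical symmetric max: a NAN argument is treated as
    # missing data, so nanmax(NAN, 5) == nanmax(5, NAN) == 5.
    if isinstance(x, float) and math.isnan(x):
        return y
    if isinstance(y, float) and math.isnan(y):
        return x
    return max(x, y)
```

Unlike the builtin max(), the result no longer depends on the argument
order when exactly one argument is a NAN.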
> For example, evaluating (x**2+3*x+1)/(x+2) at x = -2 yields NaN.

*cough* Did you try it? In Python it raises an exception; in Julia it
returns -Inf. Substituting -2 gives -1/0 which under the rules of
IEEE-754 should give -Inf.

> The correct answer to the problem that yielded this formula is
> probably -1,

How do you get that conclusion? For (x**2+3*x+1)/(x+2) to equal -1, you
would have to substitute either x=-3 or x=-1, not x=-2.

    py> x = -1; (x**2+3*x+1)/(x+2)
    -1.0
    py> x = -3; (x**2+3*x+1)/(x+2)
    -1.0

[...]

> It's definitely true that if plugging in any finite or infinite number
> whatsoever in place of a NaN will yield the same result, then that
> should be the result when you plug in a NaN. For example, clamp(x,
> NaN, x) should be x for every x (even NaN), and clamp(y, NaN, x) where
> y > x should be a ValueError (or however invalid bounds are treated).

I think you are using the signature clamp(lower, value, upper) here. Is
that right? I dislike that signature, but for the sake of the argument
I will use it in the following examples.

I agree with you that `clamp(lower=x, value=NAN, upper=x)` should
return x.

I agree that we should raise if the bounds are in reverse order, e.g.

    clamp(lower=2, value=x, upper=1)

I trust we agree that if the value is a NAN, and the bounds are not
equal, we should return a NAN:

    clamp(1, NAN, 2)  # return a NAN

So I think we agree on all these cases, which leaves only one point of
contention: what to do if the bounds are NANs? There are two obvious,
simple and reasonable behaviours:

Option 1: Treat a NAN bound as *missing data*, which effectively means
"there is no limit", i.e. as if you had passed the infinity of the
appropriate sign for that bound.

Option 2: Treat a NAN bound as invalid, or unknown, in which case you
want to return a NAN (or raise an exception). This is called "NAN
poisoning".

I will happily accept that people might reasonably want either
behaviour.
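To make Option 1 concrete, here is a minimal sketch, using the
clamp(value, lower, upper) argument order and assuming that equal
bounds pin even a NAN value. (Nothing like this is in the stdlib; the
name and details are illustrative only.)

```python
def clamp(value, lower, upper):
    # Sketch of Option 1. NAN bounds compare false with everything,
    # so they naturally impose no limit -- no isnan() test needed.
    if lower > upper:       # False if either bound is a NAN
        raise ValueError("lower bound exceeds upper bound")
    if lower == upper:      # equal bounds pin even a NAN value
        return lower
    if value < lower:
        return lower
    if value > upper:
        return upper
    return value            # includes the case where value is a NAN
```

Note that clamp(float('nan'), x, x) returns x, while
clamp(float('nan'), 1, 2) returns the NAN, matching the cases agreed
on above.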
But unless we provide two implementations, we have to pick one or the
other. Which should we pick? In the absence of any clear winner, my
position is that NAN poisoning should be opt-in. We should pick the
option which least inconveniences people who want the other behaviour.

Let's say the stdlib uses Option 1. The function doesn't need to do
any explicit checks for NANs, so there's no problem with large
integers overflowing, or Decimals raising ValueError, or any need to
do a conversion to float. People who want NAN poisoning can opt in by
doing a check for NANs themselves, either in a wrapper function, or by
testing the bounds *once* ahead of time and then just calling the
stdlib `clamp` once they know they aren't NANs. If they use a wrapper
function, they end up testing the bounds for NANs on every call, but
that's what Option 2 would do anyway, so they are no worse off. So if
we choose Option 1, the inconvenience to people who want Option 2 is
very small.

Now consider if we pick Option 2. That means that every single call to
clamp checks the bounds to see if they are NANs, even though they have
probably been checked a thousand times before:

    for x in range(10000):
        clamp(value=x, lower=50, upper=100)
        # every single time, clamp will check that
        # neither 50 nor 100 is a NAN.

The implementation is more complex: it has to be prepared for overflow
errors at the very least:

    >>> math.isnan(10**600)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: int too large to convert to float

so that's even more expense that everyone has to pay, whether they
need it or not. Those who want to avoid NAN poisoning can write a
wrapper function:

    def myclamp(value, lower, upper):
        try:
            if math.isnan(lower):
                lower = float("-inf")
        except OverflowError:
            lower = float("-inf")
        # And similar for upper.
        return clamp(value, lower, upper)

but now they are paying the cost *twice*, not avoiding it. The
standard clamp still does the same NAN testing.
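For the record, the opt-in direction is just as easy to write down.
Here is a sketch of a NAN-poisoning wrapper over an Option 1 clamp;
all three function names are made up, and a minimal Option 1 clamp is
included so the sketch stands alone.

```python
import math

def clamp(value, lower, upper):
    # Minimal Option 1 clamp: NAN bounds compare false with
    # everything, so they impose no limit.
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return lower if value < lower else upper if value > upper else value

def safe_isnan(x):
    # math.isnan() raises OverflowError for ints too large to
    # convert to float, so guard the conversion.
    try:
        return math.isnan(x)
    except (OverflowError, TypeError):
        return False

def poison_clamp(value, lower, upper):
    # Opt-in NAN poisoning: the caller who wants Option 2 pays for
    # the bounds tests here, instead of every caller paying for them
    # inside the standard clamp.
    if safe_isnan(lower) or safe_isnan(upper):
        return float("nan")
    return clamp(value, lower, upper)
```

With Option 1 in the stdlib, this wrapper does exactly as many NAN
tests per call as a built-in Option 2 would; with Option 2 in the
stdlib, the equivalent myclamp wrapper does twice as many.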
They can't opt out of testing for NANs; instead they end up doing the
tests twice, once in their wrapper function and once in the standard
function.

Option 1 respects those who want to opt out of NAN testing, and those
who might choose to opt in to it. Option 2 forces NAN testing on
everyone whether they need it or not, and punishes those who try to
opt out by making them do twice as many NAN tests when they actually
want to do none at all.

-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/X2B5KDFFOB4PWX4253I4XGOPFQPA7N75/
Code of Conduct: http://python.org/psf/codeofconduct/