[Python-ideas] Re: Fix statistics.median()?

Andrew Barnert via Python-ideas Mon, 30 Dec 2019 16:03:34 -0800

On Dec 30, 2019, at 14:35, David Mertz <me...@gnosis.cx> wrote:
> 
> On Mon, Dec 30, 2019, 5:17 PM Andrew Barnert wrote:
>> The fact that all three of the alternate orders anyone’s asked for or 
>> suggested turned out to be spurious, and that nobody can think of a good use 
>> for a different one, is a pretty good argument for YAGNI.
> 
> 
> I think everyone agrees that the only actual use cases are 'ignore', 
> 'poison', 'raise', and I suppose 'fast/unsafe'.
> 
> I missed some details in trying to emulate IEEE total_order(), but my point 
> was never that it was desirable, just that it was easy (though it turns out 
> to be slightly less easy than I first thought). Any actual order involving 
> NaNs is a fool's errand.


OK. Sorry; I misinterpreted your point, and I think we agree here.

Well, I guess I still disagree about total_order looking easy (nothing with 
IEEE floats is ever as easy as you expect…) but that isn’t relevant here. :)
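
To illustrate why total_order isn't as easy as it looks: the usual trick for binary64 floats is to reinterpret the bits as a signed 64-bit integer and flip the low 63 bits of negative values. This is a minimal sketch, assuming CPython floats are IEEE 754 doubles; total_order_key is a hypothetical name, not a function in math or statistics. Even this small version has to get signed zeros and the sign bit of NaNs right.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Sort key emulating IEEE 754 totalOrder for binary64 floats (sketch)."""
    # Reinterpret the double's 64 bits as a signed integer.
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats map to negative ints, but backwards: a larger magnitude
    # gives a larger int. Flipping the low 63 bits reverses that, and it also
    # puts -0.0 (key -1) just below +0.0 (key 0).
    if bits < 0:
        bits ^= 0x7FFF_FFFF_FFFF_FFFF
    return bits


# A NaN with its sign bit set sorts below -inf; a positive NaN sorts above +inf.
neg_nan = math.copysign(math.nan, -1.0)
ordered = sorted([1.0, math.inf, 0.0, neg_nan, -0.0], key=total_order_key)
# ordered is: neg_nan, -0.0, 0.0, 1.0, inf
```

With this key a NaN's position depends only on its sign bit, which is one reason an ordering that involves NaNs isn't a useful policy for median().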

>> Wait, what’s wrong with the behavior of math.isnan for floats? If you want a 
>> NaN test that differs from the one defined by IEEE, I think we’re off into 
>> uncharted waters.
> 
> 
> I do not believe or accept that statistics is meant to "blow up on 
> non-floats". I might like math.isnan() to behave better with non-float 
> numbers, but that's a different issue. You seem to propose a perfectly good 
> version of a general is_nan() later in your comment.

OK, I thought you were saying math.isnan is the wrong semantics even for floats.

The key to me is that something “general enough for median and friends” is 
pretty easy, while something truly general that handles all possible nan-like 
values in all possible types is not.

>> Wait, are you arguing that we should just offer a generic is_nan function 
>> (as a builtin?), instead of adding an on_nan handler parameter to median and 
>> friends?
> 
> 
> I don't think it should be a built-in. Maybe in math, or statistics. I don't 
> care if it's spelled '_isnan()' for private use by statistics. I'm not 
> worried about public functionality, just about avoiding repetition in the 
> different paths in 'median_<whatever>' and xtile().

I think they all start with data = sorted(data), so they could all change to 
use data = _nan_smart_sorted(data, on_nan), and just put all the smarts 
(prefiltering and/or keying with isnan depending on the value of on_nan) inside 
that _nan_smart_sorted helper.

The only issue to worry about is that I think some of them check for a minimum 
of 1 or 2 elements before sorting, and, given IGNORE (and POISON?), they 
probably need to do that check after the filter-and-sort step instead. 
(Presumably asking for the xtile of [nan, nan, nan] with IGNORE is an error, 
not nan.)
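
The order matters because a length check done before filtering sees three items in [nan, nan, nan] and lets them through. A minimal sketch of an IGNORE-style wrapper around the real statistics.quantiles() (Python 3.8+); quantiles_ignoring_nan is a hypothetical name, and using math.isnan assumes float-compatible data.

```python
import math
import statistics


def quantiles_ignoring_nan(data, *, n=4, method="exclusive"):
    """Hypothetical IGNORE-mode wrapper around statistics.quantiles()."""
    # Filter first; quantiles() sorts its input itself, so no sort is needed here.
    values = [x for x in data if not math.isnan(x)]
    # Check the length *after* filtering: [nan, nan, nan] has three items,
    # but nothing usable. This sketch requires at least two points.
    if len(values) < 2:
        raise statistics.StatisticsError("need at least 2 non-NaN data points")
    return statistics.quantiles(values, n=n, method=method)
```

With this ordering, quantiles_ignoring_nan([nan, nan, nan]) raises StatisticsError instead of returning NaNs.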

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UBQGG7CD55JMT4LHDEG43GNQOG2A6VRB/
Code of Conduct: http://python.org/psf/codeofconduct/
