Re: [Numpy-discussion] sorting -inf, nan, inf

Tim Hochberg Tue, 19 Sep 2006 14:05:12 -0700

Charles R Harris wrote:
>
>
> On 9/19/06, *Tim Hochberg* <[EMAIL PROTECTED] 
> <mailto:[EMAIL PROTECTED]>> wrote:
>
>     A. M. Archibald wrote:
>     > On 19/09/06, Tim Hochberg <[EMAIL PROTECTED]
>     <mailto:[EMAIL PROTECTED]>> wrote:
>     >
>     >> Keith Goodman wrote:
>     >>
>     >>> In what order would you like argsort to sort the values -inf,
>     nan, inf?
>     >>>
>     >>>
>     >> Ideally, -inf should sort first, inf should sort last and nan
>     should
>     >> raise an exception if present.
>     >>
>     >> -tim
>     >>
>     >
>     > Mmm. Somebody who's working with NaNs has more or less already
>     decided
>     > they don't want to be pestered with exceptions for invalid data.
>     Do you really think so? In my experience NaNs are nearly always
>     just an
>     indication of a mistake somewhere that didn't get trapped for one
>     reason
>     or another.
>
>     >  I'd
>     > be happy if they wound up at either end, but I'm not sure it's worth
>     > hacking up the sort algorithm when a simple isnan() can pull
>     them out.
>     >
>     Moving them to the end seems to be the worst choice to me. Leaving
>     them
>     alone is fine with me. Or raising an exception would be fine. Or doing
>     one or the other depending on the error mode settings would be even
>     better if it is practical.
>
>     > What's happening now is that NaN<a, NaN==a, and NaN>a are all
>     false,
>     > which rather confuses the sorting algorithm. But as long as it
>     doesn't
>     > actually *break* (that is, leave some of the non-NaNs incorrectly
>     > sorted) I don't care.
>     >
>     Is that true? Are all of numpy's sorting algorithms robust against
>     nontransitive objects laying around? The answer to that appears to be
>     no. Try running this a couple of times to see what I mean:
>
>     n = 10
>     for i in range(10):
>         for kind in 'quicksort', 'heapsort', 'mergesort':
>                  a = rand(n)
>                  b = a.copy()
>                  a[n//2] = nan
>                  a.sort(kind=kind)
>                  b.sort(kind=kind)
>                  assert alltrue(a[:n//2] == b[:n//2]), (kind, a, b)
>
>
>     The values don't correctly cross the inserted NaN and the sort is
>     incorrect.
>
>
> Except for heapsort those are doing insertion sorts because n is so 
> small. Heapsort is trickier to understand because of the way the heap 
> is formed and values pulled off.
I'm not sure where the breakpoint is, but I was seeing failures for all 
three sort types with N as high as 10000. I suspect that they're all 
broken in the presence of  NaNs.  I further suspect you'd need some 
punishingly slow n**2 algorithm to be robust in the presence of NaNs.


> I'll look into that. I bet searchsorted gets messed up by nan's. Do 
> you think it worthwhile to worry about it?

No. But that's because it's my contention is that any sequence with NaNs 
in it is *not sorted* and thus isn't suitable input for searchsorted.

-tim


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Numpy-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/numpy-discussion

Re: [Numpy-discussion] sorting -inf, nan, inf

Reply via email to