Charles R Harris wrote: > > > On 9/19/06, *Tim Hochberg* <[EMAIL PROTECTED] > <mailto:[EMAIL PROTECTED]>> wrote: > > A. M. Archibald wrote: > > On 19/09/06, Tim Hochberg <[EMAIL PROTECTED] > <mailto:[EMAIL PROTECTED]>> wrote: > > > >> Keith Goodman wrote: > >> > >>> In what order would you like argsort to sort the values -inf, > nan, inf? > >>> > >>> > >> Ideally, -inf should sort first, inf should sort last and nan > should > >> raise an exception if present. > >> > >> -tim > >> > > > > Mmm. Somebody who's working with NaNs has more or less already > decided > > they don't want to be pestered with exceptions for invalid data. > Do you really think so? In my experience NaNs are nearly always > just an > indication of a mistake somewhere that didn't get trapped for one > reason > or another. > > > I'd > > be happy if they wound up at either end, but I'm not sure it's worth > > hacking up the sort algorithm when a simple isnan() can pull > them out. > > > Moving them to the end seems to be the worst choice to me. Leaving > them > alone is fine with me. Or raising an exception would be fine. Or doing > one or the other depending on the error mode settings would be even > better if it is practical. > > > What's happening now is that NaN<a, NaN==a, and NaN>a are all > false, > > which rather confuses the sorting algorithm. But as long as it > doesn't > > actually *break* (that is, leave some of the non-NaNs incorrectly > > sorted) I don't care. > > > Is that true? Are all of numpy's sorting algorithms robust against > nontransitive objects laying around? The answer to that appears to be > no. Try running this a couple of times to see what I mean: > > n = 10 > for i in range(10): > for kind in 'quicksort', 'heapsort', 'mergesort': > a = rand(n) > b = a.copy() > a[n//2] = nan > a.sort(kind=kind) > b.sort(kind=kind) > assert alltrue(a[:n//2] == b[:n//2]), (kind, a, b) > > > The values don't correctly cross the inserted NaN and the sort is > incorrect. > > > Except for heapsort those are doing insertion sorts because n is so > small. Heapsort is trickier to understand because of the way the heap > is formed and values pulled off. I'm not sure where the breakpoint is, but I was seeing failures for all three sort types with N as high as 10000. I suspect that they're all broken in the presence of NaNs. I further suspect you'd need some punishingly slow n**2 algorithm to be robust in the presence of NaNs.
> I'll look into that. I bet searchsorted gets messed up by nan's. Do > you think it worthwhile to worry about it? No. But that's because it's my contention is that any sequence with NaNs in it is *not sorted* and thus isn't suitable input for searchsorted. -tim ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion