Re: [Numpy-discussion] Recarray sort() slowness

Charles R Harris Sun, 04 Mar 2007 07:37:51 -0800

On 3/4/07, Francesc Altet <[EMAIL PROTECTED]> wrote:


Hi,

I've finally implemented Chuck's suggestion of sorting a recarray of
two fields (the first being the actual array to be sorted and the other
being the array to be reordered following the resulting order for the
first one). Indeed, this approach saves me an amount of memory
equivalent to the size of the second column, which is really nice.

However, I'm afraid that I won't be able to use this approach, as it
is 25x slower (see the attached benchmark; beware! it only runs on a
Linux 2.6 kernel!) than regular argsorting of the first field followed
by fancy indexing over the second. If the slowdown were 2x I might still
be able to use it, but 25x is a no-go.
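For readers without the attached benchmark, the two approaches being compared can be sketched roughly as follows. This is a minimal illustration, not the original benchmark: the array size, the field names 'key' and 'value', and the use of unique keys (so that both orderings agree exactly) are assumptions made for the example, and no timings are claimed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
keys = rng.permutation(n).astype(np.int64)  # unique keys (assumption)
values = rng.random(n)

# Approach A: argsort the first array, then fancy-index both arrays.
# This needs an index array plus reordered copies.
idx = np.argsort(keys)
keys_a = keys[idx]
values_a = values[idx]

# Approach B: pack both columns into one structured array and sort it
# in place on the key field, so no index array is needed.
rec = np.empty(n, dtype=[('key', np.int64), ('value', np.float64)])
rec['key'] = keys
rec['value'] = values
rec.sort(order='key')
```

Both approaches end with the second column reordered to follow the sorted first column; they differ in memory use and, as reported above, in speed.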

I'm curious why recarray.sort(order='fieldN') is so slow, and I'm
wondering if it can be sped up in some way or another.



I suspect there are several reasons.

1) It defines a new type with the comparison done on all fields.
2) Exchanges are done by copying the specified number of bytes.

I think Travis was clever to define a new type; it made things easy and very
general, but it wasn't aimed at speed. There might be some optimizations
possible in there that Travis could speak to.

It would be pretty easy to modify argsort itself to do what you want in a
type-specific way, using a key vector and a vector to be sorted by the keys.
I expect it would be about 1/2 as fast as the normal argsort. Hmmm,
something like keysort(...).
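To make the keysort idea concrete, here is a rough pure-Python sketch. It is hypothetical: no keysort function exists in NumPy, the name and signature are taken only from the suggestion above, and a real version would be written in C alongside the type-specific argsort loops. The sketch uses a plain quicksort that applies every exchange to both vectors, so it illustrates the semantics only and says nothing about the speed estimate.

```python
import numpy as np

def keysort(keys, values):
    """Hypothetical helper: sort 1-D `keys` in place and apply the same
    exchanges to `values`, so no index array is allocated."""
    if keys.ndim != 1 or keys.shape != values.shape:
        raise ValueError("keys and values must be 1-D arrays of equal length")

    def swap(i, j):
        # Every exchange on the key vector is mirrored on the value vector.
        keys[i], keys[j] = keys[j], keys[i]
        values[i], values[j] = values[j], values[i]

    stack = [(0, len(keys) - 1)]  # explicit stack instead of recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        swap((lo + hi) // 2, hi)  # move the middle element to the end as pivot
        pivot = keys[hi]
        store = lo
        for i in range(lo, hi):
            if keys[i] < pivot:
                swap(i, store)
                store += 1
        swap(store, hi)           # put the pivot in its final position
        stack.append((lo, store - 1))
        stack.append((store + 1, hi))

keys = np.array([3, 1, 2])
values = np.array([30.0, 10.0, 20.0])
keysort(keys, values)  # keys becomes [1, 2, 3]; values becomes [10., 20., 30.]
```

Like quicksort in general, this sketch is not stable, so the relative order of values that share a key is not preserved.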

Chuck

_______________________________________________
Numpy-discussion mailing list
[email protected]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
