On Thu, May 8, 2008 at 10:51 PM, Andrew Straw <[EMAIL PROTECTED]> wrote:
> I've got a big element array (25 million int64s) that searchsorted()
> takes a long time to grind through. After a bit of digging in the
> literature and the numpy source code, I believe that searchsorted() is
> implementing a classic binary search,
Yes.

> which is pretty bad in terms of cache misses. There are several modern
> implementations of binary search which arrange items in memory such
> that cache misses are much more rare. Clearly making such an indexing
> arrangement would take time, but in my particular case, I can spare
> the time to create an index if searching was faster, since I'd make
> the index once but do the searching many times.
>
> Is there an implementation of such an algorithm that works easily with
> numpy? Also, can you offer any advice, suggestions, and comments to me
> if I attempted to implement such an algorithm?

I'm no help. You seem to know more than I do. Sadly, the first few
Google hits I get for "binary search minimize cache misses" are
patents. I don't know what the substantive content of those patents
is; I have a strict policy of not reading patents.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
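[Editor's note: the thread does not name a specific layout, but one well-known memory arrangement of the kind Andrew describes is the Eytzinger (BFS / implicit-heap) order, where the sorted data is permuted once so that the elements visited at the top of every search sit close together in memory. The sketch below is a minimal pure-Python illustration under that assumption; the function names `eytzinger` and `search` are made up for this example, and a practical version for 25 million int64s would of course live in C or use numpy arrays rather than Python lists.]

```python
# Hypothetical sketch (not from the thread): binary search over an
# Eytzinger-ordered array. Slot 0 is unused so that the children of
# node k are at indices 2k and 2k+1, as in a binary heap.

def eytzinger(sorted_values):
    """Permute a sorted sequence into 1-indexed Eytzinger (BFS) order."""
    n = len(sorted_values)
    out = [None] * (n + 1)
    it = iter(sorted_values)
    def fill(k):                    # in-order walk of the implicit tree
        if k <= n:                  # recursion depth is only O(log n)
            fill(2 * k)
            out[k] = next(it)       # assign next sorted value in order
            fill(2 * k + 1)
    fill(1)
    return out

def search(eyt, x):
    """Return the smallest stored value >= x, or None (a lower bound)."""
    n = len(eyt) - 1
    k, best = 1, None
    while k <= n:                   # descend the implicit tree, O(log n) steps
        if eyt[k] >= x:
            best = eyt[k]           # remember candidate, then go left
            k = 2 * k
        else:
            k = 2 * k + 1           # too small: go right
    return best

b = eytzinger([1, 3, 5, 7, 9, 11, 13])
print(search(b, 6))   # -> 7, the smallest element >= 6
```

The one-time `eytzinger` permutation plays the role of the index Andrew says he can afford to build once, while each `search` touches a hot, densely packed prefix of the array near the root instead of jumping across the whole 25-million-element buffer as a textbook midpoint search does.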