On Thu, May 8, 2008 at 9:51 PM, Andrew Straw <[EMAIL PROTECTED]> wrote:
> I've got a big element array (25 million int64s) that searchsorted()
> takes a long time to grind through. After a bit of digging in the
> literature and the numpy source code, I believe that searchsorted() is
> implementing a classic binary search, which is pretty bad in terms of
> cache misses. There are several modern implementations of binary search
> which arrange items in memory such that cache misses are much more rare.
> Clearly making such an indexing arrangement would take time, but in my
> particular case, I can spare the time to create an index if searching
> were faster, since I'd make the index once but do the searching many times.
>
> Is there an implementation of such an algorithm that works easily with
> numpy? Also, can you offer any advice, suggestions, and comments to me
> if I attempted to implement such an algorithm?

What sort of algorithm is best also depends on the use. If you have a
25e6-sized table that you want to interpolate through with another set of
25e6 sorted indices, then binary search is the wrong approach. In that case
you really want to start from the last position found and search forward
with increasing steps to bracket the next value. Basically, binary search
is of order n*ln(m), where n is the size of the index list and m is the
size of the table. The sequential way is nearly n + m, which will be much
better when n and m are of comparable size.

Chuck
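The sequential approach Chuck describes can be sketched as an exponential
("galloping") search: start from the last insertion point found, double the
step size until the next query value is bracketed, then finish with a small
binary search on the bracketed slice. This is a sketch, not NumPy's own
implementation; it assumes both the table and the queries are sorted, and
the function name is illustrative.

```python
import numpy as np

def gallop_searchsorted(table, queries):
    """Insertion points of sorted `queries` into sorted `table`.

    Equivalent to np.searchsorted(table, queries) when `queries` is
    sorted, but each search resumes from the previous hit instead of
    restarting a full binary search, giving roughly O(n + m) work.
    """
    n = len(table)
    out = np.empty(len(queries), dtype=np.intp)
    lo = 0  # everything before `lo` is known to be < current query
    for i, q in enumerate(queries):
        # Gallop forward with doubling steps until table[hi] >= q.
        step = 1
        hi = lo
        while hi < n and table[hi] < q:
            lo = hi + 1          # table[hi] < q, so the answer is past hi
            hi += step
            step *= 2
        hi = min(hi, n)
        # Finish with a binary search inside the small bracket [lo, hi).
        lo += int(np.searchsorted(table[lo:hi], q, side='left'))
        out[i] = lo
    return out
```

Because the gallop resumes from the previous result, the total number of
comparisons across all queries grows with the distance travelled through
the table rather than with n*log(m), which is the n + m behaviour
mentioned above.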
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion