> I will post any new insights as I continue to work on this...
>
OK, I save isolated a sample of my data that illustrates the terrible
performance with the binarysearch. I have uploaded it as a pytables file
to http://astraw.com/framenumbers.h5 in case anyone wants to have a look
themselves. Here's an example of the type of benchmark I've been running:
import fastsearch.downsamp
import fastsearch.binarysearch
import tables
h5=tables.openFile('framenumbers.h5',mode='r')
framenumbers=h5.root.framenumbers.read()
keys=h5.root.keys.read()
h5.close()
def bench( implementation ):
for key in keys:
implementation.index( key )
downsamp = fastsearch.downsamp.DownSampledPreSearcher( framenumbers )
binary = fastsearch.binarysearch.BinarySearcher( framenumbers )
# The next two lines are IPython-specific, and the 2nd takes a looong time:
%timeit bench(downsamp)
%timeit bench(binary)
Running the above gives:
In [14]: %timeit bench(downsamp)
10 loops, best of 3: 64 ms per loop
In [15]: %timeit bench(binary)
10 loops, best of 3: 184 s per loop
Quite a difference (a factor of about 3000)! At this point, I haven't
delved into the dataset to see what makes it so pathological --
performance is nowhere near this bad for the binary search algorithm
with other sets of keys.
_______________________________________________
Numpy-discussion mailing list
[email protected]
http://projects.scipy.org/mailman/listinfo/numpy-discussion