Dmitry Chichkov <dchich...@gmail.com> writes: > Given: a large list (10,000,000) of floating point numbers; > Task: fastest python code that finds k (small, e.g. 10) smallest > items, preferably with item indexes; > Limitations: in python, using only standard libraries (numpy & scipy > is Ok); > > I've tried several methods. With N = 10,000,000, K = 10 The fastest so > far (without item indexes) was pure python implementation > nsmallest_slott_bisect (using bisect/insert). And with indexes > nargsmallest_numpy_argmin (argmin() in the numpy array k times). > > Anyone up to the challenge beating my code with some clever selection > algorithm? > > Current Table: > 1.66864395142 mins_heapq(items, n): > 0.946580886841 nsmallest_slott_bisect(items, n): > 1.38014793396 nargsmallest(items, n): > 10.0732769966 sorted(items)[:n]: > 3.17916202545 nargsmallest_numpy_argsort(items, n): > 1.31794500351 nargsmallest_numpy_argmin(items, n): > 2.37499308586 nargsmallest_numpy_array_argsort(items, n): > 0.524670124054 nargsmallest_numpy_array_argmin(items, n): > > 0.0525538921356 numpy argmin(items): 1892997 > 0.364673852921 min(items): 10.0000026786
I think without numpy, nsmallest_slott_bisect is almost optimal. There is a slight improvement: 1.33862709999 nsmallest_slott_bisect(items, n): [10.000011643188717, 10.000017791492528] 0.883894920349 nsmallest_slott_bisect2(items, n): [10.000011643188717, 10.000017791492528] ==== code ==== from bisect import insort from itertools import islice def nsmallest_slott_bisect(iterable, n, insort=insort): it = iter(iterable) mins = sorted(islice(it, n)) for el in it: if el <= mins[-1]: #NOTE: equal sign is to preserve duplicates insort(mins, el) mins.pop() return mins def nsmallest_slott_bisect2(iterable, n, insort=insort): it = iter(iterable) mins = sorted(islice(it, n)) maxmin = mins[-1] for el in it: if el <= maxmin: #NOTE: equal sign is to preserve duplicates insort(mins, el) mins.pop() maxmin = mins[-1] return mins import time from random import randint, random test_data = [randint(10, 50) + random() for i in range(10000000)] K = 10 init = time.time() mins = nsmallest_slott_bisect(test_data, K) print time.time() - init, 'nsmallest_slott_bisect(items, n):', mins[: 2] init = time.time() mins = nsmallest_slott_bisect2(test_data, K) print time.time() - init, 'nsmallest_slott_bisect2(items, n):', mins[: 2] -- Arnaud -- http://mail.python.org/mailman/listinfo/python-list