On Mon, Feb 6, 2012 at 11:44 AM, Naresh Pai <n...@uark.edu> wrote:

> I have two large matrices, say, ABC and DEF, each with a shape of 7000 by
> 4500. I have another list, say, elem, containing 850 values from ABC. I am
> interested in finding out the corresponding values in DEF where ABC has
> elem and store them *separately*. The code that I am using is:
>
> for i in range(len(elem)):
>     DEF_distr = DEF[ABC==elem[i]]
>
> DEF_distr gets used for further processing before it gets cleared from
> memory and the next round of the above loop begins. The loop above
> currently takes about 20 minutes! I think the bottle neck is where elem is
> getting searched repeatedly in ABC. So I am looking for a solution where
> all elem can get processed in a single call and the indices of ABC be
> stored in another variable (separately). I would appreciate if you suggest
> any faster method for getting DEF_distr.
>

You'll need to mention some details about the contents of ABC/DEF in order
to get the best answer (what range of values, do they have a certain
structure, etc).

I made the assumption that ABC and elem have integers (I'm not sure it makes
sense to search for ABC==elem[n] unless they are both integers), and then
used a sort followed by searchsorted. This has a side effect of reordering
the elements in DEF_distr. I don't know if that matters. You can skip the
.copy() calls if you don't care that ABC/DEF are sorted.

    import numpy as np

    # Flatten ABC and sort it, remembering the permutation that sorts it.
    ABC_1D = ABC.copy().ravel()
    ABC_1D_sorter = np.argsort(ABC_1D)
    ABC_1D = ABC_1D[ABC_1D_sorter]

    # Apply the same permutation to DEF so the two stay aligned.
    DEF_1D = DEF.copy().ravel()
    DEF_1D = DEF_1D[ABC_1D_sorter]

    # ind1[n]:ind2[n] brackets the run of entries equal to elem[n].
    ind1 = np.searchsorted(ABC_1D, elem, side='left')
    ind2 = np.searchsorted(ABC_1D, elem, side='right')

    DEF_distr = []
    for n in range(len(elem)):
        DEF_distr.append( DEF_1D[ind1[n]:ind2[n]] )

I tried this on the big memory workstation, and for the 7Kx4K size I get
about 100 seconds for the simple method and 10 seconds for this more
complicated sort-based method. If you are getting 20 minutes for that, maybe
there is a memory problem, or a different part of the code is the
bottleneck?

Hope that helps,
Aronne
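
The sort-and-searchsorted approach above can also be sketched as a small self-contained function and checked against the simple boolean-mask loop on toy data. This is only an illustration under assumptions: the function name group_by_elem, the toy array sizes, and the random contents are made up here, and integer values in ABC are assumed as in the message above. It uses a stable sort, which is a swapped-in variation rather than what the message does, so that each group keeps the same order that DEF[ABC==e] would give.

```python
import numpy as np

def group_by_elem(ABC, DEF, elem):
    """Return one array per entry of elem: the DEF values where ABC equals it."""
    abc_flat = ABC.ravel()
    def_flat = DEF.ravel()
    # A stable sort keeps equal ABC values in their original (row-major) order,
    # so each group comes out in the same order as DEF[ABC == e].
    order = np.argsort(abc_flat, kind="stable")
    abc_sorted = abc_flat[order]  # indexing with an index array returns new arrays,
    def_sorted = def_flat[order]  # so ABC and DEF themselves are left untouched
    # For each value, [lo, hi) brackets its run of equal entries in abc_sorted.
    lo = np.searchsorted(abc_sorted, elem, side="left")
    hi = np.searchsorted(abc_sorted, elem, side="right")
    return [def_sorted[a:b] for a, b in zip(lo, hi)]

# Toy data, made up for illustration (the real arrays are 7000 by 4500).
rng = np.random.default_rng(0)
ABC = rng.integers(0, 50, size=(60, 40))
DEF = rng.random((60, 40))
elem = np.unique(ABC)[:10]

groups = group_by_elem(ABC, DEF, elem)

# Compare with the simple boolean-mask loop from the original question.
for e, g in zip(elem, groups):
    assert np.array_equal(g, DEF[ABC == e])
```

Because indexing with ABC_1D_sorter already produces new arrays, this sketch leaves out the .copy() calls and the inputs still come back unmodified. A value in elem that does not occur in ABC simply yields an empty array, since both searchsorted calls return the same index.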