duncan smith wrote:
> Hello,
>       I'm trying to find a computationally efficient way of identifying
> unique subarrays, counting them and returning an array containing only
> the unique subarrays and a corresponding 1D array of counts. The
> following code works, but is a bit slow.
>
> ###############
>
> from collections import Counter
> import numpy
>
> def bag_data(data):
>     # data (a numpy array) is bagged along axis 0
>     # returns concatenated array and corresponding array of counts
>     vec_shape = data.shape[1:]
>     counts = Counter(tuple(arr.flatten()) for arr in data)
>     data_out = numpy.zeros((len(counts),) + vec_shape)
>     cnts = numpy.zeros((len(counts,)))
>     for i, (tup, cnt) in enumerate(counts.iteritems()):
>         data_out[i] = numpy.array(tup).reshape(vec_shape)
>         cnts[i] = cnt
>     return data_out, cnts
>
> ###############
>
> I've been looking through the numpy docs, but don't seem to be able to
> come up with a clean solution that avoids Python loops.
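
Me neither :(

> TIA for any
> useful pointers. Cheers.

Here's what I have so far:

import numpy

def bag_data(data):
    # counts[i] ends up as the number of occurrences of data[i] if row i
    # is the first occurrence of that subarray, and 0 otherwise
    counts = numpy.zeros(data.shape[0])
    seen = {}  # raw bytes of a subarray -> index of its first occurrence
    for i, arr in enumerate(data):
        sarr = arr.tobytes()  # tobytes() replaces the deprecated tostring()
        if sarr in seen:
            counts[seen[sarr]] += 1
        else:
            seen[sarr] = i
            counts[i] = 1
    # keep only the first occurrence of each subarray, in original order
    nz = counts != 0
    return numpy.compress(nz, data, axis=0), numpy.compress(nz, counts)

--
https://mail.python.org/mailman/listinfo/python-list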
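
For what it's worth, NumPy 1.13 and later give numpy.unique an axis argument, which may be enough to drop the Python loop altogether. The following is only a sketch under a few assumptions: the name bag_data_unique is made up for illustration, data is assumed to be a non-empty array, and each subarray is flattened to one row so that whole subarrays are compared. Unlike the loop versions above, the unique subarrays come back in sorted order rather than in order of first appearance, and the counts are integers.

```python
import numpy

def bag_data_unique(data):
    # Hypothetical helper, not part of the original thread.
    # Assumes data is a non-empty array bagged along axis 0.
    data = numpy.asarray(data)
    # Flatten each subarray to a single row so rows can be compared whole.
    flat = data.reshape(data.shape[0], -1)
    # axis=0 treats each row as one item; return_counts gives the tallies.
    uniq, counts = numpy.unique(flat, axis=0, return_counts=True)
    # Restore the original subarray shape.
    return uniq.reshape((-1,) + data.shape[1:]), counts

data = numpy.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]],
                    [[1, 2], [3, 4]]])
bagged, counts = bag_data_unique(data)
print(bagged.shape)  # (2, 2, 2)
print(counts)        # [2 1]
```

Whether this is actually faster than the dict-based loop probably depends on the number and size of the subarrays, since numpy.unique sorts the rows, so it is worth timing on real data.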