Hi (sorry for top-posting)

numpy.ravel is usually faster than ndarray.flatten (ravel returns a view when it can, flatten always copies)
numpy.empty is faster than numpy.zeros (it skips the zero-fill, which is safe here because every element gets overwritten)
numpy.fromiter might be useful to avoid the loop (just a hunch)

Albert-Jan

> From: duncan@invalid.invalid
> Subject: counting unique numpy subarrays
> Date: Fri, 4 Dec 2015 19:43:35 +0000
> To: python-list@python.org
>
> Hello,
> I'm trying to find a computationally efficient way of identifying
> unique subarrays, counting them and returning an array containing only
> the unique subarrays and a corresponding 1D array of counts. The
> following code works, but is a bit slow.
>
> ###############
>
> from collections import Counter
> import numpy
>
> def bag_data(data):
>     # data (a numpy array) is bagged along axis 0
>     # returns concatenated array and corresponding array of counts
>     vec_shape = data.shape[1:]
>     counts = Counter(tuple(arr.flatten()) for arr in data)
>     data_out = numpy.zeros((len(counts),) + vec_shape)
>     cnts = numpy.zeros((len(counts,)))
>     for i, (tup, cnt) in enumerate(counts.iteritems()):
>         data_out[i] = numpy.array(tup).reshape(vec_shape)
>         cnts[i] = cnt
>     return data_out, cnts
>
> ###############
>
> I've been looking through the numpy docs, but don't seem to be able to
> come up with a clean solution that avoids Python loops. TIA for any
> useful pointers. Cheers.
>
> Duncan
> --
> https://mail.python.org/mailman/listinfo/python-list
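
PS: a rough sketch of a loop-free variant, in case it helps. This is not the code from the original post. It assumes a NumPy recent enough for numpy.unique to accept the axis argument (1.13 or later, as far as I know), and the name bag_data_unique is made up for illustration. Note that the result comes back in sorted order rather than in Counter order, and the counts are integers rather than floats.

```python
import numpy as np

def bag_data_unique(data):
    """Return the unique subarrays along axis 0 and how often each occurs.

    Hypothetical helper: relies on numpy.unique's axis argument,
    which is assumed to be available (NumPy >= 1.13).
    """
    data = np.asarray(data)
    # axis=0 treats each data[i] as one item to compare as a whole;
    # return_counts=True adds a 1D array of occurrence counts.
    data_out, cnts = np.unique(data, axis=0, return_counts=True)
    return data_out, cnts

# Three 2x2 subarrays, the first and last are identical.
data = np.array([[[1, 2], [3, 4]],
                 [[5, 6], [7, 8]],
                 [[1, 2], [3, 4]]])
data_out, cnts = bag_data_unique(data)
print(data_out.shape)  # two unique 2x2 subarrays remain
print(cnts)            # one count per unique subarray
```

Whether this is actually faster than the Counter version will depend on the shape and dtype of the data, so it is worth timing both on real input.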