Re: counting unique numpy subarrays
duncan smith wrote:

> Hello,
>       I'm trying to find a computationally efficient way of identifying
> unique subarrays, counting them and returning an array containing only
> the unique subarrays and a corresponding 1D array of counts. The
> following code works, but is a bit slow.
>
> ###
>
> from collections import Counter
> import numpy
>
> def bag_data(data):
>     # data (a numpy array) is bagged along axis 0
>     # returns concatenated array and corresponding array of counts
>     vec_shape = data.shape[1:]
>     counts = Counter(tuple(arr.flatten()) for arr in data)
>     data_out = numpy.zeros((len(counts),) + vec_shape)
>     cnts = numpy.zeros((len(counts,)))
>     for i, (tup, cnt) in enumerate(counts.iteritems()):
>         data_out[i] = numpy.array(tup).reshape(vec_shape)
>         cnts[i] = cnt
>     return data_out, cnts
>
> ###
>
> I've been looking through the numpy docs, but don't seem to be able to
> come up with a clean solution that avoids Python loops.

Me neither :(

> TIA for any
> useful pointers. Cheers.

Here's what I have so far:

def bag_data(data):
    counts = numpy.zeros(data.shape[0])
    seen = {}
    for i, arr in enumerate(data):
        sarr = arr.tostring()
        if sarr in seen:
            counts[seen[sarr]] += 1
        else:
            seen[sarr] = i
            counts[i] = 1
    nz = counts != 0
    return numpy.compress(nz, data, axis=0), numpy.compress(nz, counts)

--
https://mail.python.org/mailman/listinfo/python-list
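For anyone running the dictionary-based approach above on Python 3 with a recent NumPy, here is a minimal sketch of the same idea. It assumes `tobytes()` as a stand-in for `tostring()` (which is deprecated in newer NumPy releases) and stores the counts as integers rather than floats; both are my substitutions, not part of the posted code. Keys are raw bytes, so two subarrays only match if they have identical memory contents.

```python
import numpy

def bag_data(data):
    # One slot per input subarray; slots for repeats of an earlier
    # subarray stay at zero and are filtered out at the end.
    counts = numpy.zeros(data.shape[0], dtype=int)  # assumption: integer counts
    seen = {}  # raw bytes of a subarray -> index of its first occurrence
    for i, arr in enumerate(data):
        key = arr.tobytes()  # assumption: replaces the deprecated tostring()
        first = seen.setdefault(key, i)
        counts[first] += 1
    keep = counts != 0
    return data[keep], counts[keep]

# Example input is made up for illustration.
data = numpy.array([[1, 2], [3, 4], [1, 2], [1, 2]])
unique_rows, row_counts = bag_data(data)
```

As in the posted version, the unique subarrays come back in order of first appearance.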
RE: counting unique numpy subarrays
Hi

(Sorry for top-posting)

numpy.ravel is faster than numpy.flatten (no copy)
numpy.empty is faster than numpy.zeros
numpy.fromiter might be useful to avoid the loop (just a hunch)

Albert-Jan

> From: duncan@invalid.invalid
> Subject: counting unique numpy subarrays
> Date: Fri, 4 Dec 2015 19:43:35 +
> To: python-list@python.org
>
> Hello,
>       I'm trying to find a computationally efficient way of identifying
> unique subarrays, counting them and returning an array containing only
> the unique subarrays and a corresponding 1D array of counts. The
> following code works, but is a bit slow.
>
> ###
>
> from collections import Counter
> import numpy
>
> def bag_data(data):
>     # data (a numpy array) is bagged along axis 0
>     # returns concatenated array and corresponding array of counts
>     vec_shape = data.shape[1:]
>     counts = Counter(tuple(arr.flatten()) for arr in data)
>     data_out = numpy.zeros((len(counts),) + vec_shape)
>     cnts = numpy.zeros((len(counts,)))
>     for i, (tup, cnt) in enumerate(counts.iteritems()):
>         data_out[i] = numpy.array(tup).reshape(vec_shape)
>         cnts[i] = cnt
>     return data_out, cnts
>
> ###
>
> I've been looking through the numpy docs, but don't seem to be able to
> come up with a clean solution that avoids Python loops. TIA for any
> useful pointers. Cheers.
>
> Duncan
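The ravel/flatten tip can be checked directly. The sketch below uses a made-up array and `numpy.shares_memory` to show that `flatten` always copies, while `ravel` returns a view only when the memory layout allows it, so the "no copy" advantage is not guaranteed for every input.

```python
import numpy

a = numpy.arange(6).reshape(2, 3)  # small C-contiguous example array

# flatten() always allocates a new array.
flatten_shares = numpy.shares_memory(a, a.flatten())

# ravel() can return a view of a contiguous array, avoiding the copy.
ravel_shares = numpy.shares_memory(a, a.ravel())

# For a layout that cannot be flattened in place (here, the transpose),
# ravel() has to fall back to copying as well.
t = a.T
ravel_transposed_shares = numpy.shares_memory(t, t.ravel())
```

On numpy.empty: it skips initialisation, so it is only safe where every element is overwritten afterwards, as with data_out and cnts in the original bag_data. Code that relies on the initial zeros needs numpy.zeros.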
Re: counting unique numpy subarrays
On 04/12/15 22:36, Albert-Jan Roskam wrote:
> Hi
>
> (Sorry for top-posting)
>
> numpy.ravel is faster than numpy.flatten (no copy)
> numpy.empty is faster than numpy.zeros
> numpy.fromiter might be useful to avoid the loop (just a hunch)
>
> Albert-Jan
>

Thanks, I'd forgotten the difference between numpy.flatten and
numpy.ravel. I wasn't even aware of numpy.empty.

Duncan
Re: counting unique numpy subarrays
On 04/12/15 23:06, Peter Otten wrote:
> duncan smith wrote:
>
>> Hello,
>>       I'm trying to find a computationally efficient way of identifying
>> unique subarrays, counting them and returning an array containing only
>> the unique subarrays and a corresponding 1D array of counts. The
>> following code works, but is a bit slow.
>>
>> ###
>>
>> from collections import Counter
>> import numpy
>>
>> def bag_data(data):
>>     # data (a numpy array) is bagged along axis 0
>>     # returns concatenated array and corresponding array of counts
>>     vec_shape = data.shape[1:]
>>     counts = Counter(tuple(arr.flatten()) for arr in data)
>>     data_out = numpy.zeros((len(counts),) + vec_shape)
>>     cnts = numpy.zeros((len(counts,)))
>>     for i, (tup, cnt) in enumerate(counts.iteritems()):
>>         data_out[i] = numpy.array(tup).reshape(vec_shape)
>>         cnts[i] = cnt
>>     return data_out, cnts
>>
>> ###
>>
>> I've been looking through the numpy docs, but don't seem to be able to
>> come up with a clean solution that avoids Python loops.
>
> Me neither :(
>
>> TIA for any
>> useful pointers. Cheers.
>
> Here's what I have so far:
>
> def bag_data(data):
>     counts = numpy.zeros(data.shape[0])
>     seen = {}
>     for i, arr in enumerate(data):
>         sarr = arr.tostring()
>         if sarr in seen:
>             counts[seen[sarr]] += 1
>         else:
>             seen[sarr] = i
>             counts[i] = 1
>     nz = counts != 0
>     return numpy.compress(nz, data, axis=0), numpy.compress(nz, counts)
>

Three times as fast as what I had, and a bit cleaner. Excellent. Cheers.

Duncan
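A note for later readers: nobody in the thread found a loop-free solution, but NumPy releases newer than this discussion (1.13 onwards) let numpy.unique take an axis argument, which seems to cover the original request. The sketch below is not from the thread and uses made-up example data. One difference from the versions above is that the unique subarrays come back in sorted order rather than in order of first appearance.

```python
import numpy

# Three 2x2 subarrays stacked along axis 0; the first and last are equal.
data = numpy.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]],
                    [[1, 2], [3, 4]]])

# axis=0 treats each data[i] as one element; return_counts adds the
# number of occurrences of each unique subarray.
unique_subarrays, counts = numpy.unique(data, axis=0, return_counts=True)
```

Whether this beats the dictionary-based loop will depend on the data, since numpy.unique sorts the subarrays internally; timing both on representative input is the only way to know.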