Re: counting unique numpy subarrays

2015-12-04 Thread Peter Otten
duncan smith wrote:

> Hello,
>   I'm trying to find a computationally efficient way of identifying
> unique subarrays, counting them and returning an array containing only
> the unique subarrays and a corresponding 1D array of counts. The
> following code works, but is a bit slow.
> 
> ###
> 
> from collections import Counter
> import numpy
> 
> def bag_data(data):
>     # data (a numpy array) is bagged along axis 0
>     # returns concatenated array and corresponding array of counts
>     vec_shape = data.shape[1:]
>     counts = Counter(tuple(arr.flatten()) for arr in data)
>     data_out = numpy.zeros((len(counts),) + vec_shape)
>     cnts = numpy.zeros(len(counts))
>     for i, (tup, cnt) in enumerate(counts.items()):
>         data_out[i] = numpy.array(tup).reshape(vec_shape)
>         cnts[i] = cnt
>     return data_out, cnts
> 
> ###
> 
> I've been looking through the numpy docs, but don't seem to be able to
> come up with a clean solution that avoids Python loops. 

Me neither :(

> TIA for any
> useful pointers. Cheers.

Here's what I have so far:

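import numpy

def bag_data(data):
    counts = numpy.zeros(data.shape[0])
    seen = {}
    for i, arr in enumerate(data):
        sarr = arr.tobytes()  # raw bytes of the row as a hashable key (tostring() in older NumPy)
        if sarr in seen:
            counts[seen[sarr]] += 1  # repeat: bump the count of its first occurrence
        else:
            seen[sarr] = i
            counts[i] = 1
    nz = counts != 0  # keep only rows that were first occurrences
    return numpy.compress(nz, data, axis=0), numpy.compress(nz, counts)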
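An alternative that avoids the Python loop entirely (not from this thread): NumPy 1.13, released after this discussion, added an axis argument to numpy.unique, so axis=0 with return_counts=True bags rows directly. A minimal sketch, with the function name bag_data_unique being hypothetical. numpy.unique returns the rows in sorted order, so return_index plus argsort is used here to restore first-occurrence order and match the dict-based version's output.

```python
import numpy


def bag_data_unique(data):
    """Sketch: count identical subarrays along axis 0 without a Python loop.

    Assumes NumPy >= 1.13 (axis argument of numpy.unique).
    """
    # Unique rows come back sorted; idx holds the first index of each one.
    _, idx, counts = numpy.unique(
        data, axis=0, return_index=True, return_counts=True
    )
    # Reorder by first occurrence, like the dict-based bag_data.
    order = numpy.argsort(idx)
    return data[idx[order]], counts[order]


data = numpy.array([[3, 4], [1, 2], [3, 4], [5, 6], [1, 2], [3, 4]])
rows, counts = bag_data_unique(data)
print(rows.tolist())    # [[3, 4], [1, 2], [5, 6]]
print(counts.tolist())  # [3, 2, 1]
```

If the order of the output rows does not matter, the argsort step can be dropped and the sorted unique rows returned directly.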

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: counting unique numpy subarrays

2015-12-04 Thread Albert-Jan Roskam
Hi

(Sorry for top-posting)

numpy.ravel is faster than ndarray.flatten (ravel returns a view when it can; flatten always copies)
numpy.empty can be faster than numpy.zeros (it skips zero-filling the memory)
numpy.fromiter might be useful to avoid the loop (just a hunch)
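A minimal sketch of how these suggestions might be applied to the original Counter-based function (the name bag_data_counter is hypothetical; assumes Python 3.7+, so the Counter keeps first-seen order and list(counts) and counts.values() iterate in the same order). It uses ravel instead of flatten and fromiter for the counts. The rows are still hashed in a Python-level generator; only the final fill-in loop is removed, and because the output is built in one numpy.array call, no preallocated buffer (empty or zeros) is needed.

```python
from collections import Counter

import numpy


def bag_data_counter(data):
    """Sketch: bag subarrays along axis 0 using ravel and fromiter."""
    vec_shape = data.shape[1:]
    # ravel() gives a view of a contiguous row, so no copy is made
    # before the row is turned into a hashable tuple.
    counts = Counter(tuple(arr.ravel()) for arr in data)
    n = len(counts)
    # Build all unique rows in one call, keeping the input dtype.
    data_out = numpy.array(list(counts), dtype=data.dtype).reshape((n,) + vec_shape)
    # fromiter fills a 1D array straight from the iterator of counts.
    cnts = numpy.fromiter(counts.values(), dtype=numpy.intp, count=n)
    return data_out, cnts


data = numpy.array([[1, 2], [3, 4], [1, 2]])
rows, cnts = bag_data_counter(data)
print(rows.tolist(), cnts.tolist())  # [[1, 2], [3, 4]] [2, 1]
```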

Albert-Jan

> From: duncan@invalid.invalid
> Subject: counting unique numpy subarrays
> Date: Fri, 4 Dec 2015 19:43:35 +
> To: python-list@python.org


Re: counting unique numpy subarrays

2015-12-04 Thread duncan smith
On 04/12/15 22:36, Albert-Jan Roskam wrote:
> Hi
> 
> (Sorry for top-posting)
> 
> numpy.ravel is faster than ndarray.flatten (ravel returns a view when it can; flatten always copies)
> numpy.empty can be faster than numpy.zeros (it skips zero-filling the memory)
> numpy.fromiter might be useful to avoid the loop (just a hunch)
> 
> Albert-Jan
> 

Thanks, I'd forgotten the difference between ndarray.flatten and
numpy.ravel. I wasn't even aware of numpy.empty.
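A small illustration of that difference, using an arbitrary example array: numpy.shares_memory reports whether a result still uses the original buffer (a view) or is a fresh copy.

```python
import numpy

a = numpy.arange(6).reshape(2, 3)  # C-contiguous example array

flat = a.flatten()  # flatten() always returns a new copy
rav = a.ravel()     # ravel() returns a view when the layout allows it

print(numpy.shares_memory(flat, a))  # False
print(numpy.shares_memory(rav, a))   # True

# A transposed view is not C-contiguous, so ravel() has to copy here.
print(numpy.shares_memory(a.T.ravel(), a))  # False

# empty() skips zero-filling, so its initial contents are arbitrary;
# it is only safe when every element is overwritten before being read.
buf = numpy.empty(3)
buf[:] = [1.0, 2.0, 3.0]
```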

Duncan



Re: counting unique numpy subarrays

2015-12-04 Thread duncan smith
On 04/12/15 23:06, Peter Otten wrote:
> Here's what I have so far:
> 
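> import numpy
>
> def bag_data(data):
>     counts = numpy.zeros(data.shape[0])
>     seen = {}
>     for i, arr in enumerate(data):
>         sarr = arr.tobytes()  # raw bytes of the row as a hashable key (tostring() in older NumPy)
>         if sarr in seen:
>             counts[seen[sarr]] += 1  # repeat: bump the count of its first occurrence
>         else:
>             seen[sarr] = i
>             counts[i] = 1
>     nz = counts != 0  # keep only rows that were first occurrences
>     return numpy.compress(nz, data, axis=0), numpy.compress(nz, counts)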
> 

Three times as fast as what I had, and a bit cleaner. Excellent. Cheers.
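A speedup like this depends on the array's shape, dtype and how many duplicate rows it contains, so no numbers are reproduced here. A sketch for measuring it with timeit, assuming Python 3 and NumPy >= 1.17 (for default_rng). Both functions are restated under hypothetical names with Python 3 spellings (items, tobytes), and the random test data is made up for illustration.

```python
import timeit
from collections import Counter

import numpy


def bag_counter(data):
    # The original approach: hash each row as a tuple of scalars.
    vec_shape = data.shape[1:]
    counts = Counter(tuple(arr.flatten()) for arr in data)
    data_out = numpy.zeros((len(counts),) + vec_shape)
    cnts = numpy.zeros(len(counts))
    for i, (tup, cnt) in enumerate(counts.items()):
        data_out[i] = numpy.array(tup).reshape(vec_shape)
        cnts[i] = cnt
    return data_out, cnts


def bag_bytes(data):
    # Peter's approach: hash each row's raw bytes.
    counts = numpy.zeros(data.shape[0])
    seen = {}
    for i, arr in enumerate(data):
        sarr = arr.tobytes()
        if sarr in seen:
            counts[seen[sarr]] += 1
        else:
            seen[sarr] = i
            counts[i] = 1
    nz = counts != 0
    return numpy.compress(nz, data, axis=0), numpy.compress(nz, counts)


# Hypothetical data: 10,000 rows of 5 small integers, so repeats are common.
rng = numpy.random.default_rng(0)
data = rng.integers(0, 3, size=(10_000, 5))

for func in (bag_counter, bag_bytes):
    seconds = timeit.timeit(lambda: func(data), number=10)
    print(f"{func.__name__}: {seconds / 10 * 1e3:.1f} ms per call")
```

Checking that both functions return the same rows and counts before timing them guards against comparing two functions that do different work.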

Duncan