Basically, what you want to do is a histogram. Numpy has that
functionality built in. However: the version built in to numpy is about
as suboptimal as yours. The problem is the unnecessary sorting of the data.
In principle, a histogram does not need any sorting, so it could be done
in strictly O(N). I don't see any simply way of doing so efficiently in
numpy without resorting to a costly loop. We had a discussion about this
a while ago but I cannot remember any resolution. The proper thing to do
would probably be a C routine.
If performance is crucial to you, I would suggest using weave. A minimal
example as starting point is given below.
Greetings,
Norbert
---
#!/usr/bin/env python
import numpy
import scipy.weave
import scipy.weave.converters
def histogram(data, nR):
res = numpy.zeros(nR,int)
dataflat = data.flatten()
Ndata = len(dataflat)
scipy.weave.inline(
for(int i=0;iNdata;i++) {
int d = dataflat(i);
if(d nR d = 0) {
res(d) += 1;
}
}
,
arg_names = [dataflat,res,Ndata,nR],
type_converters = scipy.weave.converters.blitz,
)
return res
a = numpy.array([0,3,1,2,3,5,2,3,2,4,5,1,3,4,2])
print a
print histogram(a,4)
---
Robin wrote:
Hello,
As a converting MATLAB user I am bringing over some of my code. A core
function I need is to sample the probability distribution of a given
set of data. Sometimes these sets are large so I would like to make it
as efficient as possible. (the data are integers representing members
of a discrete space)
In MATLAB the best way I found was the diff of find of diff trick
which resulted in the completely vectorised solution (below). Does it
make sense to translate this into numpy? I don't have a feel yet for
what is fast/slow - are the functions below built in and so quick (I
thought diff might not be).
Is there a better/more pythonic way to do it?
function Pr=prob(data, nR)
Pr= zeros(nR,1);
% diff of find of diff trick for counting number of elements
temp = sort(data(data0));% want vector excluding P(0)
dtemp= diff([temp;max(temp)+1]);
count = diff(find([1;dtemp]));
indx = temp(dtemp0);
Pr(indx)= count ./ numel(data);% probability
Thanks
Robin
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion