To hold myself over until I can take a hard look at the numpy histogramdd code:
Here is a quick thing I put together in Cython. It's a 40x speedup over histogramdd on Vista 32 using the MinGW32 compiler. For a (480, 640, 3) array, this executed in 0.005 seconds on my machine. This only works for arrays with uint8 data types having dimensions (x, y, 3) (a common image format). The return array is a (16, 16, 16) equal-width-bin histogram of the input. If anyone wants the Cython C output, let me know and I will email it to you. If there is interest, I will extend this for different bin sizes and aliases for different data types.

Chris

import numpy as np
cimport numpy as np

DTYPE = np.uint8
DTYPE32 = np.int

ctypedef np.uint8_t DTYPE_t
ctypedef np.int_t DTYPE_t32

def hist3d(np.ndarray[DTYPE_t, ndim=3] img):
    cdef int x = img.shape[0]
    cdef int y = img.shape[1]
    cdef int z = img.shape[2]
    cdef int addx, addy, addz
    cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16], dtype=DTYPE32)
    cdef int i, j, v0, v1, v2
    for i in range(x):
        for j in range(y):
            v0 = img[i, j, 0]
            v1 = img[i, j, 1]
            v2 = img[i, j, 2]
            # integer bin index: equivalent to v // 16 for a bin width of 16
            addx = (v0 - (v0 % 16)) / 16
            addy = (v1 - (v1 % 16)) / 16
            addz = (v2 - (v2 % 16)) / 16
            out[addx, addy, addz] += 1
    return out

On Tue, May 5, 2009 at 9:46 AM, David Huard <david.hu...@gmail.com> wrote:
>
> On Mon, May 4, 2009 at 4:18 PM, <josef.p...@gmail.com> wrote:
>
>> On Mon, May 4, 2009 at 4:00 PM, Chris Colbert <sccolb...@gmail.com> wrote:
>> > i'll take a look at them over the next few days and see what i can hack out.
>> >
>> > Chris
>> >
>> > On Mon, May 4, 2009 at 3:18 PM, David Huard <david.hu...@gmail.com> wrote:
>> >>
>> >> On Mon, May 4, 2009 at 7:00 AM, <josef.p...@gmail.com> wrote:
>> >>>
>> >>> On Mon, May 4, 2009 at 12:31 AM, Chris Colbert <sccolb...@gmail.com> wrote:
>> >>> > this actually sort of worked. Thanks for putting me on the right track.
>> >>> >
>> >>> > Here is what I ended up with.
>> >>> >
>> >>> > this is what I ended up with:
>> >>> >
>> >>> > def hist3d(imgarray):
>> >>> >     histarray = N.zeros((16, 16, 16))
>> >>> >     temp = imgarray.copy()
>> >>> >     bins = N.arange(0, 257, 16)
>> >>> >     histarray = N.histogramdd((temp[:,:,0].ravel(),
>> >>> >                                temp[:,:,1].ravel(),
>> >>> >                                temp[:,:,2].ravel()),
>> >>> >                               bins=(bins, bins, bins))[0]
>> >>> >     return histarray
>> >>> >
>> >>> > this creates a 3d histogram of rgb image values in the range 0,255
>> >>> > using 16 bins per component color.
>> >>> >
>> >>> > on a 640x480 image, it executes in 0.3 seconds vs 4.5 seconds for a
>> >>> > for loop.
>> >>> >
>> >>> > not quite framerate, but good enough for prototyping.
>> >>>
>> >>> I don't think your copy to temp is necessary, and use reshape(-1,3) as
>> >>> in the example of Stefan, which will avoid copying the array 3 times.
>> >>>
>> >>> If you need to gain some more speed, then rewriting histogramdd and
>> >>> removing some of the unnecessary checks and calculations looks possible.
>> >>
>> >> Indeed, the strategy used in the histogram function is faster than the one
>> >> used in the histogramdd case, so porting one to the other should speed
>> >> things up.
>> >>
>> >> David
>>
>> is searchsorted faster than digitize and bincount?
>
> That depends on the number of bins and whether or not the bin width is
> uniform. A 1D benchmark I did a while ago showed that if the bin width is
> uniform, then the best strategy is to create a counter initialized to 0,
> loop through the data, compute i = (x - bin0) / binwidth, and increment
> counter i by 1 (or by the weight of the data). If the bins are non-uniform,
> then for nbin > 30 you'd better use searchsorted, and digitize otherwise.
>
> For those interested in speeding up histogram code, I recommend reading a
> thread started by Cameron Walsh on 12/12/06 named "Histograms of extremely
> large data sets". Code and benchmarks were posted.
>
> Chris, if your bins all have the same width, then you can certainly write
> a histogramdd routine that is way faster by using the indexing trick
> instead of digitize or searchsorted.
>
> Cheers,
>
> David
>
>> Using the idea of histogramdd, I get a bit below a tenth of a second;
>> my best for this problem is below.
>> I tried for a while to find the fastest way to convert a two-dimensional
>> array into a one-dimensional index for bincount. I found that using the
>> return index of unique1d is very slow compared to numeric index
>> calculation.
>>
>> Josef
>>
>> example timed for:
>>     nobs = 307200
>>     nbins = 16
>>     factors = np.random.randint(256, size=(nobs,3)).copy()
>>     factors2 = factors.reshape(-1,480,3).copy()
>>
>> def hist3(factorsin, nbins):
>>     if factorsin.ndim != 2:
>>         factors = factorsin.reshape(-1, factorsin.shape[-1])
>>     else:
>>         factors = factorsin
>>     N, D = factors.shape
>>     darr = np.empty(factors.T.shape, dtype=int)
>>     nele = np.max(factors) + 1
>>     bins = np.arange(0, nele, nele/nbins)
>>     bins[-1] += 1
>>     for i in range(D):
>>         darr[i] = np.digitize(factors[:,i], bins) - 1
>>
>>     # add weighted rows
>>     darrind = darr[D-1]
>>     for i in range(D-1):
>>         darrind += darr[i]*nbins**(D-i-1)
>>     return np.bincount(darrind)  # return flat, not reshaped
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
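For readers skimming the thread: David's "indexing trick" for equal-width bins can be sketched in plain NumPy without Cython. The names below (hist3d_fast, width, flat) are my own, not from the thread; the idea is simply that a uint8 value v lands in bin v // 16, so the three per-channel bin indices can be fused into one flat index and counted with np.bincount, reproducing histogramdd on the same bins.

```python
import numpy as np

def hist3d_fast(img, nbins=16):
    """Equal-width (nbins, nbins, nbins) histogram of an (x, y, 3) uint8 image.

    Sketch of the indexing trick: each value v in 0..255 falls in bin
    v // (256 // nbins); fuse the three bin indices into one flat index
    and count with np.bincount instead of digitize/searchsorted.
    """
    width = 256 // nbins                                  # bin width (16 here)
    idx = img.reshape(-1, 3).astype(np.intp) // width     # per-channel bin index
    flat = (idx[:, 0] * nbins + idx[:, 1]) * nbins + idx[:, 2]  # fused index
    return np.bincount(flat, minlength=nbins**3).reshape(nbins, nbins, nbins)

# Sanity check against histogramdd on the same equal-width bins:
img = np.random.randint(0, 256, size=(480, 640, 3)).astype(np.uint8)
bins = np.arange(0, 257, 16)
ref = np.histogramdd((img[:, :, 0].ravel(),
                      img[:, :, 1].ravel(),
                      img[:, :, 2].ravel()),
                     bins=(bins, bins, bins))[0]
assert (hist3d_fast(img) == ref).all()
```

The cast to np.intp before fusing matters: multiplying uint8 bin indices by nbins**2 would overflow an 8-bit type.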