Martin Spacek wrote:
> Tim Hochberg wrote:
>> Here's an approach (mean_accumulate) that avoids making any copies of
>> the data. It runs almost 4x as fast as your approach (called baseline
>> here) on my box. Perhaps this will be useful:
>>
> --snip--
>> def mean_accumulate(data, indices):
>>     result = np.zeros([32, 32], float)
>>     for i in indices:
>>         result += data[i]
>>     result /= len(indices)
>>     return result
>
> Great! I got a roughly 9x speed improvement using take() in combination
> with this approach. Thanks Tim!
>
> Here's what my code looks like now:
>
>     def mean_accum(data):
>         result = np.zeros(data[0].shape, np.float64)
>         for dataslice in data:
>             result += dataslice
>         result /= len(data)
>         return result
>
>     # frameis are int64
>     frames = data.take(frameis.astype(np.int32), axis=0)
>     meanframe = mean_accum(frames)
>
> I'm surprised that using a python for loop is faster than the built-in
> mean method. I suppose mean() can't perform the same in-place operations
> because in certain cases doing so would fail?

I'm not sure why mean is slow here, although it may be a locality issue:
mean likely computes along axis zero each time, which means it's killing
the cache, whereas the accumulate version is cache friendly. One thing to
keep in mind about Python for loops is that they are slow if you are only
doing a simple computation inside (a single add, for instance) -- IIRC,
tens of times slower. Here, however, each pass through the inner loop does
on the order of a thousand operations, so the loop overhead is minimal.
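For reference, Martin's snippet above runs end to end once it's given some data; here is a self-contained version with toy inputs (the 1000-frame stack of 32x32 frames and the particular index subset are assumptions for illustration), with a sanity check against the built-in mean:

```python
import numpy as np

# Stand-in for the frame data discussed above: 1000 frames of 32x32.
data = np.arange(1000 * 32 * 32, dtype=np.float64).reshape(1000, 32, 32)
frameis = np.arange(0, 1000, 3)  # hypothetical subset of frame indices (int64)

def mean_accum(data):
    # Accumulate in place into one result frame, then divide once.
    result = np.zeros(data[0].shape, np.float64)
    for dataslice in data:
        result += dataslice
    result /= len(data)
    return result

# take() copies the selected frames into one contiguous block first.
frames = data.take(frameis.astype(np.int32), axis=0)
meanframe = mean_accum(frames)

# Sanity check: same answer as fancy indexing + the built-in mean.
assert np.allclose(meanframe, data[frameis].mean(axis=0))
```

Wrapping the three approaches (plain `.mean(axis=0)`, the accumulate loop, and take + accumulate) in `timeit` is the easiest way to reproduce the speedups reported above on your own box.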
(What would be perfect here is something just like take, but that returned
an iterator instead of a new array, since that could be done with no
copying -- unfortunately such a beast does not exist as far as I know.)

I'm actually surprised that the take version is faster than my original
version, since it makes a big ol' copy. I guess this is an indication that
indexing is more expensive than I realized. That's why nothing beats
measuring! Another experiment is to reshape your data so that it's
friendly to mean (assuming it really does operate best along axis zero)
and try that. However, this turns out to be a huge pessimization, mostly
because take + transpose is pretty slow.

-tim

> Martin
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
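The iterator-flavored take wished for above doesn't exist in NumPy, but a rough Python-level stand-in is easy to write: a generator that yields one view per index, so the selected block is never copied. This is only a sketch (the names take_iter and mean_accum_iter are made up here), and the per-item generator overhead may well eat the savings -- exactly the kind of thing to measure rather than guess:

```python
import numpy as np

def take_iter(data, indices):
    # Hypothetical lazy take along axis 0: yields views into data,
    # never copying the selected frames into a new array.
    for i in indices:
        yield data[i]

def mean_accum_iter(slices, n, shape):
    # Same accumulate-in-place trick as above, fed from an iterator.
    result = np.zeros(shape, np.float64)
    for s in slices:
        result += s
    result /= n
    return result

data = np.random.rand(1000, 32, 32)
frameis = np.arange(0, 1000, 3)
m = mean_accum_iter(take_iter(data, frameis), len(frameis), data[0].shape)
```

Whether this beats take + accumulate comes down to how much the big copy costs versus the extra Python-level indexing done per frame.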