Martin Spacek wrote:
> Hello,
>
> I'm a bit ignorant of optimization in numpy.
>
> I have a movie with 65535 32x32 frames stored in a 3D array of uint8
> with shape (65535, 32, 32). I load it from an open file f like this:
>
> >>> import numpy as np
> >>> data = np.fromfile(f, np.uint8, count=65535*32*32)
> >>> data = data.reshape(65535, 32, 32)
>
> I'm picking several thousand frames more or less randomly from
> throughout the movie and finding the mean frame over those frames:
>
> >>> meanframe = data[frameis].mean(axis=0)
>
> frameis is a 1D array of frame indices with no repeated values in it.
> If it has, say, 4000 indices in it, then the above line takes about
> 0.5 sec to complete on my system. I'm doing this for a large number
> of different frameis, some of which have many more indices in them.
> All this takes many minutes to complete, so I'm looking for ways to
> speed it up.
>
> If I divide it into 2 steps:
>
> >>> temp = data[frameis]
> >>> meanframe = temp.mean(axis=0)
>
> and time it, I find the first step takes about 0.2 sec and the second
> takes about 0.3 sec. So it's not just the mean() step, but also the
> indexing step that's taking some time.
>
> If I flatten with ravel:
>
> >>> temp = data[frameis].ravel()
> >>> meanframe = temp.mean(axis=0)
>
> then the first step still takes about 0.2 sec, but the mean() step
> drops to about 0.1 sec. But of course, this takes a flat average
> across all pixels in the movie, which isn't what I want.
>
> I have a feeling that the culprit is the non-contiguity of the movie
> frames being averaged, but I don't know how to proceed.
>
> Any ideas? Could reshaping the data somehow speed things up? Would
> weave.blitz or weave.inline or pyrex help?
>
> I'm running numpy 0.9.8.
>
> Thanks,
>
> Martin

Martin,
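First, a note on measuring this: your two-step timings are easy to make
repeatable with a small harness. Here's a minimal sketch (the zero-filled
array and the frameis below are just stand-ins for your movie and your
real index array):

import numpy as np
import time

# Stand-ins for the movie and index array described above.
data = np.zeros((65535, 32, 32), np.uint8)
frameis = np.random.permutation(65535)[:4000]  # unique random indices

t0 = time.clock()
temp = data[frameis]            # step 1: fancy indexing copies the frames
t1 = time.clock()
meanframe = temp.mean(axis=0)   # step 2: the reduction over the copy
t2 = time.clock()
print "indexing: %.3f s  mean: %.3f s" % (t1 - t0, t2 - t1)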
Here's an approach (mean_accumulate) that avoids making any copies of the
data. It runs almost 4x as fast as your approach (called mean_baseline
here) on my box. Perhaps this will be useful:

import numpy as np
import random

frames = 65535
samples = 4000

# Synthetic stand-in for the movie: random uint8 frames.
data = (256 * np.random.random((frames, 32, 32))).astype(np.uint8)
indices = np.arange(frames)
random.shuffle(indices)
indices = indices[:samples]

def mean_baseline(data, indices):
    # Your approach: fancy indexing copies the selected frames,
    # then mean() reduces over the copy.
    return data[indices].mean(axis=0)

def mean_accumulate(data, indices):
    # Accumulate each frame into a float buffer instead of
    # materializing a (samples, 32, 32) copy.
    result = np.zeros([32, 32], float)
    for i in indices:
        result += data[i]
    result /= len(indices)
    return result

if __name__ == "__main__":
    # Save this module as scratch.py so the setup strings below work.
    import timeit
    print mean_baseline(data, indices)[0, :8]
    print timeit.Timer("s.mean_baseline(s.data, s.indices)",
                       "import scratch as s").timeit(10)
    print mean_accumulate(data, indices)[0, :8]
    print timeit.Timer("s.mean_accumulate(s.data, s.indices)",
                       "import scratch as s").timeit(10)

This gives:

[ 126.947    127.39175  128.03725  129.83425  127.98925  126.866
  128.5352   127.6205 ]
3.95907664242
[ 126.947    127.39175  128.03725  129.83425  127.98925  126.866
  128.53525  127.6205 ]
1.06913644053

I also wondered if sorting the indices would help, since it should
improve locality of reference, but when I measured it, it appeared not
to help at all.

regards,

-tim
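P.S. The sorted-indices measurement amounts to something like this
sketch (assuming the script above was saved as scratch.py; the sort is
done in the setup string so only the accumulation itself is timed):

import timeit

# Time mean_accumulate with pre-sorted indices to test whether better
# locality of reference helps. The sort runs in setup, outside the
# measured statement.
print timeit.Timer(
    "s.mean_accumulate(s.data, sorted_indices)",
    "import numpy as np; import scratch as s; "
    "sorted_indices = np.sort(s.indices)").timeit(10)

The timing came out essentially the same as the unsorted run.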