<josef.pktd <at> gmail.com> writes: > > On Mon, Apr 19, 2010 at 2:36 AM, Paul Northug <pnorthug <at> gmail.com> wrote: <snip> > > > > # > > import numpy as np > > from scipy.signal import fftn, ifftn, correlate > > > > M, N, P, T = 2, 3, 4, 5 > > > > phi = np.random.randn(M, N, P) > > a = np.random.randn(N, P+T-1) > > x = np.zeros((M, T)) > > > > # 1. correlate > > > > x = np.array([correlate(phi[i], a, 'valid').squeeze() for i in range(M)]) > > > > # 2. dot products > > > > y = np.zeros_like(x) > > for b in range(P): y += np.dot(phi[:,:,b], a[:,b:b+T]) > > > > # 3. a series of ffts (code adapted from scipy convolve) > > > > s1 = np.array(phi.shape[1:]) > > s2 = np.array(a.shape) > > size = s1+s2-1 > > z = np.empty_like(y) > > fsize = (2**np.ceil(np.log2(size))).astype(np.int) # power of 2 pad > > af = fftn(a, fsize) > > fslice = tuple([slice(0, sz) for sz in size]) > > for i in range(M): > > phif = fftn(phi[i], fsize).conj() > > z[i] = np.real(ifftn(phif * af)[fslice])[0,:T] > > > > # 4. <BROKEN> single fft, ifft pair - some rows are correct > > > > s1 = np.array(phi.shape) > > s2 = np.array(a[np.newaxis].shape) > > size = s1+s2-1 > > fsize = (2**np.ceil(np.log2(size))).astype(np.int) > > af = fftn(a[np.newaxis], fsize) > > phif = fftn(phi, fsize).conj() > > fslice = tuple([slice(0, sz) for sz in size]) > > w = np.real(ifftn(phif * af)[fslice])[:,:,:T] > > I only checked this in your example > > >>> np.squeeze(signal.fftconvolve(phi[:,::-1,::-1], a[None,...],mode='valid')) > array([[ 0.42695078, 4.3433924 , -2.53084395, -4.22672281, -5.16705981], > [-2.35108765, -2.15060513, -1.23528122, 1.40386599, 1.38189896]]) > > >>> y > array([[ 0.42695078, 4.3433924 , -2.53084395, -4.22672281, -5.16705981], > [-2.35108765, -2.15060513, -1.23528122, 1.40386599, 1.38189896]]) > >>> x > array([[ 0.42695078, 4.3433924 , -2.53084395, -4.22672281, -5.16705981], > [-2.35108765, -2.15060513, -1.23528122, 1.40386599, 1.38189896]]) > > Josef >
Thanks Josef. You saved me a lot of work. I was missing the middle ::-1. The fft approach is orders of magnitude slower than method #2 in this case. Using CUDA fft3, it is about 15 times slower for the desired problem size. It is not a fair comparison as the convolution is degenerate along 2 dimensions but it is useful to know. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion