<josef.pktd <at> gmail.com> writes:

>
> On Mon, Apr 19, 2010 at 2:36 AM, Paul Northug <pnorthug <at> gmail.com> wrote:
<snip>
> >
> > #
> > import numpy as np
> > from scipy.signal import fftn, ifftn, correlate
> >
> > M, N, P, T = 2, 3, 4, 5
> >
> > phi = np.random.randn(M, N, P)
> > a = np.random.randn(N, P+T-1)
> > x = np.zeros((M, T))
> >
> > # 1. correlate
> >
> > x = np.array([correlate(phi[i], a, 'valid').squeeze() for i in range(M)])
> >
> > # 2. dot products
> >
> > y = np.zeros_like(x)
> > for b in range(P): y += np.dot(phi[:,:,b], a[:,b:b+T])
> >
> > # 3. a series of ffts (code adapted from scipy convolve)
> >
> > s1 = np.array(phi.shape[1:])
> > s2 = np.array(a.shape)
> > size = s1+s2-1
> > z = np.empty_like(y)
> > fsize = (2**np.ceil(np.log2(size))).astype(np.int) # power of 2 pad
> > af = fftn(a, fsize)
> > fslice = tuple([slice(0, sz) for sz in size])
> > for i in range(M):
> >    phif = fftn(phi[i], fsize).conj()
> >    z[i] = np.real(ifftn(phif * af)[fslice])[0,:T]
> >
> > # 4. <BROKEN> single fft, ifft pair - some rows are correct
> >
> > s1 = np.array(phi.shape)
> > s2 = np.array(a[np.newaxis].shape)
> > size = s1+s2-1
> > fsize = (2**np.ceil(np.log2(size))).astype(np.int)
> > af = fftn(a[np.newaxis], fsize)
> > phif = fftn(phi, fsize).conj()
> > fslice = tuple([slice(0, sz) for sz in size])
> > w = np.real(ifftn(phif * af)[fslice])[:,:,:T]
>
> I only checked this in your example
>
> >>> np.squeeze(signal.fftconvolve(phi[:,::-1,::-1], a[None,...],mode='valid'))
> array([[ 0.42695078,  4.3433924 , -2.53084395, -4.22672281, -5.16705981],
>        [-2.35108765, -2.15060513, -1.23528122,  1.40386599,  1.38189896]])
>
> >>> y
> array([[ 0.42695078,  4.3433924 , -2.53084395, -4.22672281, -5.16705981],
>        [-2.35108765, -2.15060513, -1.23528122,  1.40386599,  1.38189896]])
> >>> x
> array([[ 0.42695078,  4.3433924 , -2.53084395, -4.22672281, -5.16705981],
>        [-2.35108765, -2.15060513, -1.23528122,  1.40386599,  1.38189896]])
>
> Josef
>

Thanks Josef. You saved me a lot of work. I was missing the middle ::-1.

The fft approach is orders of magnitude slower than method #2 in this
case. Using CUDA fft3, it is about 15 times slower for the desired
problem size. It is not a fair comparison as the convolution is
degenerate along 2 dimensions but it is useful to know.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to