Hi Saigopal, What pyfft version do you use? Can you please post the full testing code which can be executed to reproduce the bug? Because the code I composed (basically, added comparison with CPU to your code) works normally on my desktop with Tesla C2050 (Ubuntu 10.04 x64, Cuda 3.2, PyCuda 0.94.2, pyfft 0.3.4), giving the relative error of ~ 1e-7:
# ---------- import numpy from pyfft.cuda import Plan import pycuda.autoinit import pycuda.gpuarray as gpuarray # w,h,k are the array dimensions in a power of 2 # im1, im2 are the input 3d arrays of dtype complex64 w = h = k = 512 im1 = numpy.random.rand(w,h,k).astype(numpy.complex64) im2 = numpy.random.rand(w,h,k).astype(numpy.complex64) plan = Plan((w,h,k), normalize=True) # forward transform on device im1_gpu = gpuarray.to_gpu(im1) plan.execute(im1_gpu) im1_ft = im1_gpu.get() del im1_gpu im2_gpu = gpuarray.to_gpu(im2) plan.execute(im2_gpu) im2_ft = im2_gpu.get() del im2_gpu # do multiplication on host - can be done on device. conv = im1_ft * im2_ft #inverse transform on device conv_gpu = gpuarray.to_gpu(conv) del conv plan.execute(conv_gpu, inverse=True) corr_gpu = conv_gpu.get() # Reference calculation on CPU: im1_ft = numpy.fft.fftn(im1) im2_ft = numpy.fft.fftn(im2) conv = im1_ft * im2_ft del im1 del im2 del im1_ft del im2_ft corr_cpu = numpy.fft.ifftn(conv) print numpy.linalg.norm(corr_cpu - corr_gpu) / numpy.linalg.norm(corr_gpu) # ---------- (I inserted all these deletions because otherwise numpy threw MemoryError, despite the fact that the desktop has 12GB RAM) Best regards, Bogdan P.S. There is a harmless typo in your code: string instead of boolean value in "inverse='True'" On Tue, Jan 18, 2011 at 1:21 PM, Saigopal Nelaturi <saigo...@gmail.com> wrote: > Hello all, > I am implementing a simple 3d convolution on the gpu using pyfft. The > basic idea is straightforward - obtain the 3d Fourier transform for each > array, multiply and take the inverse transform of the product. I am > using pyfft for the implementation. The code below works correctly when > my input array is 256^3 but fails (executes but gives garbage results) > for a 512^3 voxel grid. > > > # w,h,k are the array dimensions in a power of 2 > # im1, im2 are the input 3d arrays of dtype complex64 > > plan = Plan((w,h,k), normalize=True) > > # forward transform on device > im1_gpu = gpuarray.to_gpu(im1) > plan.execute(im1_gpu) > im1_ft = im1_gpu.get() > del im1_gpu > > im2_gpu = gpuarray.to_gpu(im2) > plan.execute(im2_gpu) > im2_ft = im2_gpu.get() > del im2_gpu > > > # do multiplication on host - can be done on device. > conv = im1_ft * im2_ft > > #inverse transform on device > conv_gpu = gpuarray.to_gpu(conv) > plan.execute(conv_gpu, inverse='True') > corr = conv_gpu.get() > > > I don't think there's anything wrong with the code (it works for smaller > array sizes) as such but I am perplexed as to why the failure occurs. I > am running the code on a Tesla C2050 (2.8GB available memory) and so > there's enough space to hold the 512^3 array with complex64 dtype. Does > anyone have an explanation? > > -Saigopal > > > _______________________________________________ > PyCUDA mailing list > PyCUDA@tiker.net > http://lists.tiker.net/listinfo/pycuda > _______________________________________________ PyCUDA mailing list PyCUDA@tiker.net http://lists.tiker.net/listinfo/pycuda