Hi Saigopal,

Which pyfft version are you using? Could you please post the full test
code that reproduces the bug? The code I put together (basically your
code, with a comparison against the CPU added) works fine on my desktop
with a Tesla C2050 (Ubuntu 10.04 x64, CUDA 3.2, PyCUDA 0.94.2, pyfft
0.3.4), giving a relative error of ~1e-7:

# ----------
import numpy
from pyfft.cuda import Plan
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

# w, h, k are the array dimensions, each a power of 2
# im1, im2 are the input 3d arrays of dtype complex64

w = h = k = 512
im1 = numpy.random.rand(w,h,k).astype(numpy.complex64)
im2 = numpy.random.rand(w,h,k).astype(numpy.complex64)

plan = Plan((w,h,k), normalize=True)

# forward transform on device
im1_gpu = gpuarray.to_gpu(im1)
plan.execute(im1_gpu)
im1_ft = im1_gpu.get()
del im1_gpu

im2_gpu = gpuarray.to_gpu(im2)
plan.execute(im2_gpu)
im2_ft = im2_gpu.get()
del im2_gpu

# do multiplication on host - can be done on device (see the sketch below the listing)
conv = im1_ft * im2_ft

#inverse transform on device
conv_gpu = gpuarray.to_gpu(conv)
del conv
plan.execute(conv_gpu, inverse=True)
corr_gpu = conv_gpu.get()

# Reference calculation on CPU:
im1_ft = numpy.fft.fftn(im1)
im2_ft = numpy.fft.fftn(im2)
conv = im1_ft * im2_ft
del im1
del im2
del im1_ft
del im2_ft
corr_cpu = numpy.fft.ifftn(conv)

print numpy.linalg.norm(corr_cpu - corr_gpu) / numpy.linalg.norm(corr_gpu)
# ----------
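
Regarding the "can be done on device" note in the code above: with the
complex number support in recent PyCUDA versions, the elementwise
product can stay on the card. A minimal sketch (note that GPUArray
multiplication allocates a third 1 GiB buffer for 512^3 arrays, so on
the C2050 you might want an in-place elementwise kernel instead - I
have not benchmarked either variant):

# ----------
# forward transforms, keeping both results on the device
im1_gpu = gpuarray.to_gpu(im1)
plan.execute(im1_gpu)
im2_gpu = gpuarray.to_gpu(im2)
plan.execute(im2_gpu)

# elementwise product on the device (GPUArray * GPUArray)
conv_gpu = im1_gpu * im2_gpu
del im1_gpu
del im2_gpu

# inverse transform of the product, then copy the result to the host
plan.execute(conv_gpu, inverse=True)
corr_gpu = conv_gpu.get()
# ----------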

(I inserted all these deletions because otherwise numpy threw a
MemoryError, even though the desktop has 12 GB of RAM.)
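
For reference, the numbers behind that MemoryError (numpy.fft computes
in double precision, so the reference path works with complex128
arrays):

# ----------
n = 512**3
print n * 8  / 2.0**30   # complex64 array  (im1, im2, corr_gpu): 1.0 GiB each
print n * 16 / 2.0**30   # complex128 array (im1_ft, im2_ft, conv, corr_cpu): 2.0 GiB each
# ----------

Keeping all seven of those alive at once would already be ~11 GiB,
before counting numpy's internal FFT workspace, hence the deletions.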

Best regards,
Bogdan

P.S. There is a harmless typo in your code: a string instead of a
boolean in "inverse='True'" (a non-empty string is truthy, so it still
behaves like inverse=True).

On Tue, Jan 18, 2011 at 1:21 PM, Saigopal Nelaturi <saigo...@gmail.com> wrote:
> Hello all,
>        I am implementing a simple 3d convolution on the gpu using pyfft. The
> basic idea is straightforward - obtain the 3d Fourier transform for each
> array, multiply and take the inverse transform of the product. I am
> using pyfft for the implementation. The code below works correctly when
> my input array is 256^3 but fails (executes but gives garbage results)
> for a 512^3 voxel grid.
>
>
> # w,h,k are the array dimensions in a power of 2
> # im1, im2 are the input 3d arrays of dtype complex64
>
> plan = Plan((w,h,k), normalize=True)
>
> # forward transform on device
> im1_gpu = gpuarray.to_gpu(im1)
> plan.execute(im1_gpu)
> im1_ft = im1_gpu.get()
> del im1_gpu
>
> im2_gpu = gpuarray.to_gpu(im2)
> plan.execute(im2_gpu)
> im2_ft = im2_gpu.get()
> del im2_gpu
>
>
> # do multiplication on host - can be done on device.
> conv = im1_ft * im2_ft
>
> #inverse transform on device
> conv_gpu = gpuarray.to_gpu(conv)
> plan.execute(conv_gpu, inverse='True')
> corr = conv_gpu.get()
>
>
> I don't think there's anything wrong with the code (it works for smaller
> array sizes) as such but I am perplexed as to why the failure occurs. I
> am running the code on a Tesla C2050 (2.8GB available memory) and so
> there's enough space to hold the 512^3 array with complex64 dtype. Does
> anyone have an explanation?
>
> -Saigopal
>
>

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
