Hi Bogdan,
Thanks for the quick response. My operating specs are exactly the same as
yours, and when I run your test I get an error of ~3e-7. But I think that
number is an artifact of dividing by the norm of the convolution in the
last line of your test:

print numpy.linalg.norm(corr_cpu - corr_gpu) / numpy.linalg.norm(corr_gpu)

If the norm of the GPU convolution is large and the difference between the
CPU and GPU results is comparatively small, the ratio will be small no
matter what. I think you want to show that norm(corr_cpu - corr_gpu) is
approximately zero, i.e. that the two matrices corr_cpu and corr_gpu have
nearly identical entries. If I replace your statement with

print numpy.linalg.norm(corr_cpu - corr_gpu)
print numpy.linalg.norm(corr_gpu)

I get
69734.4856715
1.94368e+11

The ratio of those two numbers is ~3e-7, but the norm of the difference
between the two convolutions (CPU vs. GPU) is itself large (~69734). Is
there something I am missing?
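For concreteness, here is a small CPU-only sketch of what I mean (my own
illustration, not your exact test: the array size is smaller, and I simulate
the single-precision path by rounding the intermediate spectrum to complex64,
which are my assumptions, not anything from your script):

```python
import numpy

# Hypothetical standalone illustration (not the original GPU test):
# simulate a single-precision FFT pipeline by rounding the spectrum to
# complex64, and compare it against a double-precision reference.
rng = numpy.random.RandomState(0)
x = rng.rand(32, 32, 32)

for scale in (1.0, 1e6):
    data = (x * scale).astype(numpy.complex64)

    # double-precision reference round-trip
    ref = numpy.fft.ifftn(numpy.fft.fftn(data.astype(numpy.complex128)))

    # "single-precision" path: round the intermediate spectrum to complex64
    spec = numpy.fft.fftn(data).astype(numpy.complex64)
    approx = numpy.fft.ifftn(spec)

    abs_err = numpy.linalg.norm(approx - ref)
    rel_err = abs_err / numpy.linalg.norm(ref)

    # abs_err grows with the magnitude of the data, while rel_err stays
    # near the float32 machine epsilon (~1.2e-7)
    print((scale, abs_err, rel_err))
```

In this sketch, too, the absolute norm of the difference becomes large
whenever the data itself is large, while the ratio stays near float32
precision - which is the same pattern as the 69734 vs. 1.94e+11 above.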

Thanks,
Saigopal

On Mon, Jan 17, 2011 at 10:36 PM, Bogdan Opanchuk <manti...@gmail.com> wrote:

> Hi Saigopal,
>
> What pyfft version do you use? Can you please post the full testing
> code which can be executed to reproduce the bug? Because the code I
> composed (basically, added comparison with CPU to your code) works
> normally on my desktop with Tesla C2050 (Ubuntu 10.04 x64, Cuda 3.2,
> PyCuda 0.94.2, pyfft 0.3.4), giving the relative error of ~ 1e-7:
>
> # ----------
> import numpy
> from pyfft.cuda import Plan
> import pycuda.autoinit
> import pycuda.gpuarray as gpuarray
>
> # w, h, k are the array dimensions, each a power of 2
> # im1, im2 are the input 3d arrays of dtype complex64
>
> w = h = k = 512
> im1 = numpy.random.rand(w,h,k).astype(numpy.complex64)
> im2 = numpy.random.rand(w,h,k).astype(numpy.complex64)
>
> plan = Plan((w,h,k), normalize=True)
>
> # forward transform on device
> im1_gpu = gpuarray.to_gpu(im1)
> plan.execute(im1_gpu)
> im1_ft = im1_gpu.get()
> del im1_gpu
>
> im2_gpu = gpuarray.to_gpu(im2)
> plan.execute(im2_gpu)
> im2_ft = im2_gpu.get()
> del im2_gpu
>
> # do multiplication on host - can be done on device.
> conv = im1_ft * im2_ft
>
> #inverse transform on device
> conv_gpu = gpuarray.to_gpu(conv)
> del conv
> plan.execute(conv_gpu, inverse=True)
> corr_gpu = conv_gpu.get()
>
> # Reference calculation on CPU:
> im1_ft = numpy.fft.fftn(im1)
> im2_ft = numpy.fft.fftn(im2)
> conv = im1_ft * im2_ft
> del im1
> del im2
> del im1_ft
> del im2_ft
> corr_cpu = numpy.fft.ifftn(conv)
>
> print numpy.linalg.norm(corr_cpu - corr_gpu) / numpy.linalg.norm(corr_gpu)
> # ----------
>
> (I inserted all these deletions because otherwise numpy threw
> MemoryError, despite the fact that the desktop has 12GB RAM)
>
> Best regards,
> Bogdan
>
> P.S. There is a harmless typo in your code: string instead of boolean
> value in "inverse='True'"
>
> On Tue, Jan 18, 2011 at 1:21 PM, Saigopal Nelaturi <saigo...@gmail.com>
> wrote:
> > Hello all,
> > I am implementing a simple 3d convolution on the GPU using pyfft. The
> > basic idea is straightforward - obtain the 3d Fourier transform for each
> > array, multiply and take the inverse transform of the product. I am
> > using pyfft for the implementation. The code below works correctly when
> > my input array is 256^3 but fails (executes but gives garbage results)
> > for a 512^3 voxel grid.
> >
> >
> > # w, h, k are the array dimensions, each a power of 2
> > # im1, im2 are the input 3d arrays of dtype complex64
> >
> > plan = Plan((w,h,k), normalize=True)
> >
> > # forward transform on device
> > im1_gpu = gpuarray.to_gpu(im1)
> > plan.execute(im1_gpu)
> > im1_ft = im1_gpu.get()
> > del im1_gpu
> >
> > im2_gpu = gpuarray.to_gpu(im2)
> > plan.execute(im2_gpu)
> > im2_ft = im2_gpu.get()
> > del im2_gpu
> >
> >
> > # do multiplication on host - can be done on device.
> > conv = im1_ft * im2_ft
> >
> > #inverse transform on device
> > conv_gpu = gpuarray.to_gpu(conv)
> > plan.execute(conv_gpu, inverse='True')
> > corr = conv_gpu.get()
> >
> >
> > I don't think there's anything wrong with the code (it works for smaller
> > array sizes) as such but I am perplexed as to why the failure occurs. I
> > am running the code on a Tesla C2050 (2.8GB available memory) and so
> > there's enough space to hold the 512^3 array with complex64 dtype. Does
> > anyone have an explanation?
> >
> > -Saigopal
> >
> >
> > _______________________________________________
> > PyCUDA mailing list
> > PyCUDA@tiker.net
> > http://lists.tiker.net/listinfo/pycuda
> >
>