Hi Saigopal,

On Tue, Jan 18, 2011 at 5:14 PM, Saigopal Nelaturi <saigo...@gmail.com> wrote:

> Thanks for the quick response. My operating specs are exactly the same as
> yours, and when I run your test I get an error of ~3e-7. But I think that
> number may have to do with dividing by the norm of the convolution in the
> expression in the last line of your test
Of course, it is the relative error. Absolute error depends on the size of the array and hence carries no information by itself. A relative error on the order of 1e-6 to 1e-7 is normal when working with single-precision numbers.

> If the GPU convolution norm is high and the difference between cpu and gpu
> values of convolution is relatively low, you would get a low value for the
> division.

Exactly; that is what I am checking in my code. You can try comparing numpy.fft.fftn() results for single- and double-precision numbers and you will get the same relative error.

> The ratio of those two numbers is ~3e-7. But the norm of the difference
> between the two convolutions (cpu vs gpu) is high (69734). Is there something
> I am missing?

It is high because the array is extremely large. The following last line would probably illustrate my point better:

print numpy.max(numpy.abs((corr_cpu - corr_gpu) / corr_cpu))

This gives 6.6e-7 on my machine. It means that the relative difference between every pair of elements with the same index in the CPU-produced and GPU-produced arrays is smaller than 6.6e-7. You cannot really ask for much more when you are using single-precision numbers: the attainable accuracy is set by the size of the mantissa.

So, if this small difference really is the reason you are getting "garbage data", the only solution is to switch to double-precision numbers (or perhaps to review your algorithms).

Best regards,
Bogdan

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
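P.S. The single- vs. double-precision comparison suggested above can be sketched as follows. This is only an illustration: the array shape, the seed, and the variable names are our own choice, and the single-precision effect is simulated by rounding the input to float32 before transforming, so it works the same on any NumPy version.

```python
import numpy as np

# Generate a double-precision array and the same data rounded to float32.
rng = np.random.RandomState(0)
a64 = rng.rand(32, 32, 32)                  # double-precision input
a32 = a64.astype(np.float32)                # same data, single precision

f_ref = np.fft.fftn(a64)                    # double-precision reference
f_sp = np.fft.fftn(a32.astype(np.float64))  # FFT of the rounded data

# Norm-based relative error: stays small no matter how large the array is,
# even though the absolute norm of the difference grows with the array size.
rel = np.linalg.norm(f_ref - f_sp) / np.linalg.norm(f_ref)

# Element-wise relative error, the same check as in the email.
max_rel = np.max(np.abs((f_ref - f_sp) / f_ref))

print(rel, max_rel)
```

The norm-based figure lands near float32 machine epsilon (~1e-7), which is exactly why a large absolute difference norm (like the 69734 you saw) is consistent with a tiny relative error.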