Dear Michiel,

On a Windows machine (CUDA 4.1) with a consumer-level GT 330, the timings were about the same without any fiddling.

On Linux (CUDA 4.0) with a Tesla card, I could reproduce the difference you report. It seemed to originate in an "-arch=sm_20" switch, which shows up in the PTX files mentioned previously.

If I add that switch (-arch sm_20) to the nvcc command line, the nvcc build slows down to match the PyCUDA performance.

To get PyCUDA to match the nvcc speed, I needed to pass extra arguments to SourceModule: arch='compute_10', code='sm_20'.
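To make the flag difference concrete, here is a small pure-Python sketch of how the `arch`/`code` keyword arguments end up as nvcc flags. It assumes PyCUDA behaves like its recent versions: when `arch` is omitted it targets the compute capability of the current device (2.0 on a Fermi Tesla, hence sm_20). The helper `nvcc_flags` and its `device_cc` parameter are hypothetical stand-ins for that logic, not part of PyCUDA's API.

```python
# Hypothetical helper: a simplified mirror (an assumption, not PyCUDA's code)
# of how SourceModule's `arch`/`code` arguments become nvcc flags.
def nvcc_flags(arch=None, code=None, device_cc=(2, 0)):
    """Return the -arch/-code flags nvcc would receive.

    device_cc stands in for the compute capability PyCUDA would query from
    the current device when `arch` is not given (Fermi Teslas report 2.0).
    """
    flags = []
    if arch is None:
        # Assumed default: target the detected device, e.g. sm_20 on a Fermi Tesla.
        arch = "sm_%d%d" % device_cc
    flags.extend(["-arch", arch])
    if code is not None:
        # Explicit real architecture to generate machine code for.
        flags.extend(["-code", code])
    return flags


default = nvcc_flags()
tuned = nvcc_flags(arch="compute_10", code="sm_20")
print("default:", " ".join(default))
print("tuned:  ", " ".join(tuned))
```

In the actual script the only change is passing `arch='compute_10', code='sm_20'` to `SourceModule`; the sketch just shows why the default build differs from a plain `nvcc` invocation without `-arch`.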

Looking at the PTX output, there seems to be something funky going on in the compiler with double versus single precision. Surprisingly, it gets even slower for me if I ask for erff instead of erf.
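One way to separate the precision effect from the flag effect is to build the same kernel in both precisions and time each variant under each set of flags. The sketch below only generates the CUDA source strings; the kernel name `apply_erf` and the template are hypothetical illustrations, and each string would then be handed to `SourceModule` and timed as in your example programs. A hedged side note: compute_10 has no native double support, so nvcc is likely demoting doubles to float under `arch='compute_10'`, which may account for part of the speedup (not verified in this thread).

```python
# Hypothetical kernel template; names and layout are illustrative only.
KERNEL_TEMPLATE = """
__global__ void apply_erf({real} *out, const {real} *in, int n)
{{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = {erf_fn}(in[i]);
}}
"""

# Single precision pairs float with erff; double pairs double with erf.
VARIANTS = {
    "single": {"real": "float", "erf_fn": "erff"},
    "double": {"real": "double", "erf_fn": "erf"},
}


def kernel_source(precision):
    """Return CUDA C source for 'single' or 'double' precision."""
    try:
        params = VARIANTS[precision]
    except KeyError:
        raise ValueError("precision must be 'single' or 'double'") from None
    return KERNEL_TEMPLATE.format(**params)


for p in ("single", "double"):
    print(kernel_source(p))
```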

Which NVIDIA SDK, driver and card are you using? The difference seems to come down to a slightly different default nvcc invocation. You may need to tune these flags according to the precision your problem requires.

Best,

Jon



On 10/04/2012 14:37, Michiel Bruinink wrote:
Hello,
When you use the erf function with pyCuda, execution takes almost 2x
longer than with nvcc. The exp function takes about 30% longer.
I use these functions in my pyCuda program a lot and in that case it
makes the program 3x slower than when using nvcc.
I attached two small example programs that can be run straight away.
Regards,
Michiel.


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
