Re: [PyCUDA] erf and exp much slower with pyCuda

Jonathan WRIGHT Tue, 10 Apr 2012 07:11:06 -0700

Dear Michiel,

On a Windows machine (CUDA 4.1) with a consumer-level GT 330, the timings were about the same without any fiddling.

Over on Linux (CUDA 4.0) with a Tesla card, the difference you report could be reproduced, and it seemed to originate in an "-arch=sm_20" switch from the PTX files previously mentioned.

If I add the switch (-arch sm_20) to the nvcc command line, performance degrades to match that of PyCUDA.
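A minimal sketch of how the two PTX files could be generated for comparison, with and without the switch. The helper names and the file name kernel.cu are hypothetical; only the nvcc options --ptx, -arch and -o are taken as given. Note that sm_20 is only accepted by toolkits of that era.

```python
import shutil
import subprocess


def nvcc_ptx_command(source, arch=None, output=None):
    """Return the nvcc argument list that emits PTX for `source`."""
    cmd = ["nvcc", "--ptx"]
    if arch is not None:
        # Same form as the switch discussed above: -arch sm_20
        cmd += ["-arch", arch]
    if output is not None:
        cmd += ["-o", output]
    cmd.append(source)
    return cmd


def emit_ptx(source, arch=None, output=None):
    """Run nvcc if it is on PATH; return True when the command was run."""
    if shutil.which("nvcc") is None:
        return False
    subprocess.run(nvcc_ptx_command(source, arch, output), check=True)
    return True
```

Calling emit_ptx("kernel.cu", output="default.ptx") and emit_ptx("kernel.cu", arch="sm_20", output="sm_20.ptx") and then diffing the two files should show where the generated code diverges.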

To get PyCUDA matching the nvcc speed, I needed to add extra arguments in SourceModule: arch='compute_10', code='sm_20'.
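A minimal sketch of passing those two arguments, assuming a stand-in kernel (the attached example programs are not reproduced here, so erf_kernel and build_module are hypothetical names). As far as I can tell, SourceModule otherwise derives -arch from the device's compute capability, which would account for the sm_20 on this card. Also, to my knowledge virtual architectures below compute_13 have no double-precision support, so doubles may be demoted to float with this setting.

```python
# Hypothetical stand-in for the kernel in the attached example programs.
KERNEL_SRC = """
__global__ void erf_kernel(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = erf(in[i]);
}
"""

# The arch/code pair reported above to bring PyCUDA in line with plain nvcc.
COMPILE_KWARGS = {"arch": "compute_10", "code": "sm_20"}


def build_module(source=KERNEL_SRC, **overrides):
    """Compile `source` with the arch/code pair above (needs a CUDA device)."""
    # Imported lazily so the snippet can be loaded on a machine without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    from pycuda.compiler import SourceModule

    kwargs = dict(COMPILE_KWARGS, **overrides)
    return SourceModule(source, **kwargs)
```

With a device available, build_module().get_function("erf_kernel") returns the compiled kernel; passing arch="sm_20", code=None as overrides would give the slower variant for a side-by-side timing.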

Looking at the PTX output, there seems to be something funky going on in the compiler with double versus single precision. Surprisingly, it gets even slower for me if I ask for erff instead of erf.

Which NVIDIA SDK, driver and card are you using? It seems the difference comes down to a slightly different default nvcc invocation. Perhaps you will need to tune according to the precision you need for your problem.
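A minimal sketch of collecting those details from within PyCUDA, so they can be pasted into a reply. The function names are hypothetical; the pycuda.driver calls used are get_version, get_driver_version, Device.name and Device.compute_capability.

```python
def format_report(cuda_version, driver_version, device_name, capability):
    """Format toolkit, driver and card details as plain-text lines."""
    return "\n".join([
        "CUDA (PyCUDA built against): %d.%d.%d" % tuple(cuda_version),
        "Driver version: %d" % driver_version,
        "Device: %s" % device_name,
        "Compute capability: %d.%d" % tuple(capability),
    ])


def collect_report(device_index=0):
    """Query the given device through PyCUDA (needs a CUDA driver)."""
    # Imported lazily so format_report can be used without a GPU.
    import pycuda.driver as drv

    drv.init()
    dev = drv.Device(device_index)
    return format_report(
        drv.get_version(),         # (major, minor, revision) tuple
        drv.get_driver_version(),  # single integer
        dev.name(),
        dev.compute_capability(),  # (major, minor) tuple
    )
```

Running print(collect_report()) on each machine gives a like-for-like summary to compare.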


Best,

Jon



On 10/04/2012 14:37, Michiel Bruinink wrote:

Hello,
When you use the erf function with pyCuda, execution takes almost 2x
longer than with nvcc. The exp function takes about 30% longer.
I use these functions in my pyCuda program a lot and in that case it
makes the program 3x slower than when using nvcc.
I attached two small example programs that can be run straight away.
Regards,
Michiel.


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


