Re: [PyCUDA] PyCUDA poor FP32 performance on Fermi ?

Andreas Kloeckner Mon, 02 Jul 2012 15:58:11 -0700

Hi Roberto,

"Roberto Colistete Jr." <[email protected]> writes:
>      This is my first post to this PyCUDA group. I am comparing PyCUDA
> vs. CUDA vs. Mathematica 8 CUDA performance on some problems in
> Physics.
>
>      Up to CC 1.3, PyCUDA's DP/SP (FP64/FP32) performance ratio was as
> expected (roughly 1/8 to 1/12), comparable to what I get when running
> CUDA or Mathematica 8 CUDA.
>
>      But with the same source code on any CC 2.0/2.1 (Fermi) GPU
> device, FP32 (SP) performance is poor:
> - the DP/SP ratio is approx. 1/3 to 1/2;
> - a better GPU device (Tesla C2050, CC 2.0) is slower in FP32 than an
> older GPU (Tesla C1060, CC 1.3): 0.77 s vs. 0.33 s, while in FP64 it is
> faster (0.89 s vs. 4.48 s).
>
>      The same behaviour occurs on other CC 2.x GPU devices (GTX 480,
> GT 540M, etc.) and on any Linux distribution (Ubuntu, Fedora, etc.).
>
>      Do you have an explanation for this issue? And any recommendation
> for solving it?


It's not really likely that PyCUDA has much to do with this issue. It
might be that the compiler flags that PyCUDA passes to nvcc are to
blame. You can find the nvcc command line by sticking a print statement 
on line 113 (or thereabouts) of pycuda/compiler.py. The resulting binary
should perform just as well as the corresponding CUDA C implementation
compiled with the same flags. If you'd like to pass different flags,
just pass an 'options' kwarg to SourceModule.
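A minimal sketch of passing flags through SourceModule's options argument (a list of strings) follows. The kernel, the helper names (fermi_fast_fp32_options, build_module) and the particular flags are hypothetical, chosen for illustration and not taken from this thread. They target one plausible candidate worth testing: for CC 2.x, nvcc defaults to IEEE-compliant single-precision division and square root with denormal support (-prec-div=true, -prec-sqrt=true, -ftz=false), whereas CC 1.x hardware always used faster approximate versions.

```python
# Sketch: compile a PyCUDA kernel with explicit nvcc flags.
# Assumptions (not from the thread): the kernel "norm_scale", the helper
# names and the flag choice are hypothetical and for illustration only.

KERNEL_SOURCE = """
__global__ void norm_scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = sqrtf(x[i]) / a;  /* FP32 sqrt and division */
}
"""


def fermi_fast_fp32_options():
    """nvcc flags that make CC 2.x single precision behave like CC 1.x:
    flush denormals to zero and use approximate division and sqrt."""
    return ["-ftz=true", "-prec-div=false", "-prec-sqrt=false"]


def build_module(options=None):
    """Compile KERNEL_SOURCE; options=None lets PyCUDA use its defaults."""
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    from pycuda.compiler import SourceModule

    return SourceModule(KERNEL_SOURCE, options=options)


if __name__ == "__main__":
    try:
        mod = build_module(fermi_fast_fp32_options())
        print("compiled:", mod.get_function("norm_scale"))
    except Exception as exc:  # no PyCUDA, no GPU, or nvcc missing
        print("could not compile:", exc)
```

Timing the same kernel with and without these flags (or with -use_fast_math, which implies all three) would show whether they account for the FP32 gap; they trade accuracy for speed, so check that results stay acceptable.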

HTH,
Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
