Hi Roberto,

"Roberto Colistete Jr." <[email protected]> writes:
> It is my first post here in this PyCUDA group. I am using PyCUDA x
> CUDA x Mathematica 8 CUDA to compare performance in some problems in
> Physics.
>
> Until CC 1.3, the performance ratio of PyCUDA between DP/SP
> (FP64/FP32) was as expected (near 1/8 or 1/12), comparable when running
> CUDA or Mathematica 8 CUDA.
>
> But using the same source code on any GPU device with CC 2.0/2.1
> (Fermi), the performance in FP32 (SP) is poor with :
> - DP/SP ratio of approx. 1/3 to 1/2;
> - better GPU device (Tesla C2050, CC2.0) being slower (0.77s x 0.33s) in
> FP32 than older GPU (Tesla C1060, CC1.3)), while in FP64 it is faster
> (0.89s x 4.48s).
>
> The same behaviour happens with other CC2.x GPU devices (GTX 480,
> GT 540M, etc) and any Linux (Ubuntu, Fedora, etc).
>
> Do you have some explanation about this issue ? And recomendation
> to solve it ?

It's unlikely that PyCUDA itself has much to do with this issue. The compiler flags that PyCUDA passes to nvcc might be to blame, though. You can find the nvcc command line by sticking a print statement on line 113 (or thereabouts) of pycuda/compiler.py. The resulting binary should perform just as well as the corresponding CUDA C implementation compiled with the same flags. If you'd like to pass different flags, just pass an 'options' kwarg to SourceModule.

HTH,
Andreas
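
A minimal sketch of the 'options' route described above, in case it helps. The kernel, the helper name compile_kernel, and the particular flag set are all assumptions made for illustration; none of them come from the original code. The three nvcc flags shown relax single-precision denormal handling and division/square-root accuracy. Whether they account for the FP32 slowdown reported here is untested, so treat them as something to experiment with rather than a diagnosis.

```python
# Hypothetical flag set, chosen only for illustration. These are real nvcc
# options, but whether they matter for this particular slowdown is an assumption.
RELAXED_SP_FLAGS = ["-ftz=true", "-prec-div=false", "-prec-sqrt=false"]

# Hypothetical single-precision kernel standing in for the real workload.
KERNEL_SOURCE = """
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = x[i] / a;
}
"""


def compile_kernel(source, extra_flags=None):
    """Compile `source` with PyCUDA, forwarding `extra_flags` to nvcc."""
    # Imported lazily so this file can be loaded on a machine without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    from pycuda.compiler import SourceModule

    # SourceModule takes a list of extra nvcc arguments via `options`.
    return SourceModule(source, options=list(extra_flags or []))
```

On a machine with a GPU, calling compile_kernel(KERNEL_SOURCE) and compile_kernel(KERNEL_SOURCE, RELAXED_SP_FLAGS) gives two modules built from the same source, and timing the same kernel from each (obtained with get_function("scale")) shows whether the flags make a difference.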
