Hi,

This is my first post to the PyCUDA list. I am comparing the performance of PyCUDA, CUDA and Mathematica 8 CUDA on some problems in physics.

Up to CC 1.3, the DP/SP (FP64/FP32) performance ratio with PyCUDA was as expected (near 1/8 or 1/12) and comparable to what I get with CUDA or Mathematica 8 CUDA.

But with the same source code on any GPU with CC 2.0/2.1 (Fermi), FP32 (SP) performance is poor:
- the DP/SP ratio is approx. 1/3 to 1/2;
- the newer GPU (Tesla C2050, CC 2.0) is slower in FP32 than the older one (Tesla C1060, CC 1.3), 0.77 s vs. 0.33 s, while in FP64 it is faster (0.89 s vs. 4.48 s).
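
To make the comparison explicit, here is a minimal sketch that derives the ratios from the two pairs of timings quoted above. It assumes the FP32 and FP64 runs do the same amount of work, so the DP/SP throughput ratio is simply the FP32 time divided by the FP64 time. The names `timings` and `dp_sp_ratio` are only for illustration and are not part of PyCUDA.

```python
# Wall-clock times in seconds, as quoted in this post.
timings = {
    "Tesla C1060 (CC 1.3)": {"fp32": 0.33, "fp64": 4.48},
    "Tesla C2050 (CC 2.0)": {"fp32": 0.77, "fp64": 0.89},
}


def dp_sp_ratio(t):
    """DP/SP throughput ratio, assuming both runs do the same work."""
    return t["fp32"] / t["fp64"]


# Per-device DP/SP ratio.
ratios = {name: dp_sp_ratio(t) for name, t in timings.items()}
for name, r in ratios.items():
    print(f"{name}: DP/SP = {r:.3f} (about 1/{1 / r:.1f})")

# Speedup of the newer card over the older one (values > 1 mean faster).
old = timings["Tesla C1060 (CC 1.3)"]
new = timings["Tesla C2050 (CC 2.0)"]
fp32_speedup = old["fp32"] / new["fp32"]
fp64_speedup = old["fp64"] / new["fp64"]
print(f"C2050 vs C1060 speedup: FP32 {fp32_speedup:.2f}x, FP64 {fp64_speedup:.2f}x")
```

A speedup below 1 in FP32 alongside a speedup well above 1 in FP64 is exactly the asymmetry I am asking about.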

The same behaviour occurs with other CC 2.x GPUs (GTX 480, GT 540M, etc.) and on any Linux distribution (Ubuntu, Fedora, etc.).

Do you have an explanation for this issue, and a recommendation on how to solve it?

        Regards,

        Roberto

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda