Hi,
This is my first post to the PyCUDA group. I am comparing the
performance of PyCUDA, CUDA, and Mathematica 8 CUDA on some physics
problems.
Up to compute capability (CC) 1.3, the DP/SP (FP64/FP32) performance
ratio with PyCUDA was as expected (near 1/8 or 1/12), and comparable to
what I get when running CUDA or Mathematica 8 CUDA.
However, when I run the same source code on any GPU with CC 2.0/2.1
(Fermi), the FP32 (SP) performance is poor:
- the DP/SP ratio is approx. 1/3 to 1/2;
- the newer GPU (Tesla C2050, CC 2.0) is slower in FP32 than the older
GPU (Tesla C1060, CC 1.3): 0.77 s vs. 0.33 s, while in FP64 it is
faster: 0.89 s vs. 4.48 s.
The same behaviour occurs with other CC 2.x GPUs (GTX 480, GT 540M,
etc.) and on any Linux distribution (Ubuntu, Fedora, etc.).
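For reference, the DP/SP ratios implied by the four timings above can be
checked with a few lines of Python. This is only a sketch of the
arithmetic: it assumes that "performance" means the inverse of run time,
so the DP/SP ratio is t_FP32 / t_FP64, and the names timings and
dp_sp_ratio are illustrative, not part of PyCUDA.

```python
# Run times in seconds, copied from the figures quoted above.
timings = {
    "Tesla C1060 (CC 1.3)": {"fp32": 0.33, "fp64": 4.48},
    "Tesla C2050 (CC 2.0)": {"fp32": 0.77, "fp64": 0.89},
}


def dp_sp_ratio(t_fp32, t_fp64):
    """DP/SP performance ratio, assuming performance is 1 / run time."""
    return t_fp32 / t_fp64


# One ratio per device, keyed by the same device label.
ratios = {
    name: dp_sp_ratio(t["fp32"], t["fp64"]) for name, t in timings.items()
}

for name, r in ratios.items():
    print(f"{name}: DP/SP = {r:.3f} (about 1/{1 / r:.1f})")
```

With these two pairs of timings, the C1060 comes out near 1/13.6 and the
C2050 near 1/1.2.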
Do you have an explanation for this issue, and a recommendation for
solving it?
Regards,
Roberto
_______________________________________________
PyCUDA mailing list
http://lists.tiker.net/listinfo/pycuda