Hi all, if you have been wondering why the matrix-multiply example shipped with PyOpenCL shows sub-standard performance on Nvidia hardware, wonder no longer. In anticipation of next week's SciPy conference, I've finally fixed that, and it turned out to be (d'oh!) bank conflicts. Which is odd, since the example was (at some point) derived from Nvidia's own SDK example. Anyway, for me, matmul performance on the same hardware is now comparable between CL and CUDA.
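For readers unfamiliar with bank conflicts, here is a minimal sketch of the effect in plain Python. It is only a toy model, not the change actually made to the PyOpenCL example: it assumes 16 banks of 4-byte words (as on older Nvidia hardware; newer devices use 32), and the function name `max_conflict_degree` is made up for illustration. It counts how many threads of a group land on the same bank when each thread reads one row of a tile column, first for a 16-wide tile and then for a tile padded to 17 words per row, a common way to avoid such conflicts.

```python
# Toy model of local/shared-memory bank conflicts.
# Assumption: 16 banks, each one 4-byte word wide (older Nvidia GPUs).
NUM_BANKS = 16


def max_conflict_degree(row_stride, num_threads=16):
    """Return the largest number of threads that hit a single bank when
    thread t reads element [t][0] of a float tile whose rows are
    row_stride words apart (i.e. a column-wise access)."""
    hits = [0] * NUM_BANKS
    for t in range(num_threads):
        word_index = t * row_stride          # one row per thread
        hits[word_index % NUM_BANKS] += 1    # bank that serves this word
    return max(hits)


unpadded = max_conflict_degree(16)  # 16x16 tile: every thread hits bank 0
padded = max_conflict_degree(17)    # 16x17 tile: one padding word per row
print(unpadded, padded)             # prints "16 1"
```

In this model the unpadded tile serializes all 16 accesses through one bank, while the padded tile spreads them across 16 distinct banks.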
Just thought I'd let you know.

Happy hacking,
Andreas
