[PyOpenCL] Fixed: curious slowness of PyOpenCL matrix-multiply example

Andreas Kloeckner Tue, 22 Jun 2010 18:31:31 -0700

Hi all,

if you have been wondering why the matrix-multiply example shipped with
PyOpenCL shows sub-standard performance on Nvidia hardware, wonder no
longer. In anticipation of next week's SciPy conference, I've finally
fixed that, and it turned out to be (d'oh!) bank conflicts, which is
odd, since the example was (at some point) derived from Nvidia's own SDK
example. Anyway, for me, matmul performance on the same hardware is now
comparable between CL and CUDA.
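The post doesn't say which access in the kernel conflicted, so take this only as a hedged illustration of the general effect. On pre-Fermi Nvidia GPUs, local (shared) memory is split into 16 banks of 32-bit words and serviced per half-warp. When several work-items in the same half-warp request different words from one bank, those requests are serialized. The pure-Python model below is not PyOpenCL code, and its helper names (`bank_of`, `conflict_degree`, `column_read`) are hypothetical. It models a column read from a row-major 16x16 tile and shows why padding each row to 17 words, a common remedy, removes the conflict. It is not necessarily the change that was made to the PyOpenCL example.

```python
from collections import Counter

# Assumption: pre-Fermi layout, 16 banks of 32-bit words, accessed per half-warp.
NUM_BANKS = 16
HALF_WARP = 16


def bank_of(word_index, num_banks=NUM_BANKS):
    # Consecutive 32-bit words live in consecutive banks, wrapping around.
    return word_index % num_banks


def conflict_degree(word_indices, num_banks=NUM_BANKS):
    # Largest number of distinct words requested from a single bank:
    # 1 means conflict-free, n means the access is serialized n ways.
    per_bank = Counter(bank_of(w, num_banks) for w in set(word_indices))
    return max(per_bank.values())


def column_read(row_pitch, col=0, n=HALF_WARP):
    # Work-item i reads tile[i][col] from a row-major tile whose rows
    # start row_pitch words apart.
    return [i * row_pitch + col for i in range(n)]


if __name__ == "__main__":
    print(conflict_degree(column_read(row_pitch=16)))  # unpadded 16-wide tile
    print(conflict_degree(column_read(row_pitch=17)))  # rows padded to 17 words
```

This prints 16 and then 1. With the unpadded pitch, every work-item lands in bank 0, which is a 16-way conflict. Padding by one word spreads the reads across all 16 banks. In an OpenCL kernel, the same trick amounts to declaring the local tile as something like `__local float tile[16][16 + 1];`, which costs a little extra local memory.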


Just thought I'd let you know.

Happy hacking,
Andreas

_______________________________________________
PyOpenCL mailing list
http://lists.tiker.net/listinfo/pyopencl
