On Tue, 20 Sep 2011 17:19:50 -0500, Robert L Cloud <[email protected]> wrote:
> However, even for small domains, where most of everything should fit into
> cache, my program is far slower than an OpenMP program.

Just one more suggestion from my side: try doing more work per work
item. It might be that the AMD implementation has a fairly high setup
cost for each work item, in which case having fewer (larger) ones is
going to be beneficial. In my experience, the AMD implementation
performs about as well as gcc-compiled code, while Intel can be
significantly better, depending on what you're trying to do.
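
To make "more per work item" concrete, here is a rough sketch. I'm
assuming a simple elementwise kernel; the names scale_chunked and CHUNK
and the value 64 are made up for illustration and are not taken from
your code. Each work item loops over a contiguous chunk instead of
touching a single element, so the global size shrinks by a factor of
CHUNK. The Python part only emulates the index arithmetic so you can
check that every element is covered exactly once; it does not call
PyOpenCL.

```python
# Hypothetical kernel source: one work item handles CHUNK consecutive
# elements. Kernel name, CHUNK and the "2*a" operation are illustrative.
KERNEL_SRC = """
#define CHUNK 64
__kernel void scale_chunked(__global const float *a,
                            __global float *out,
                            const int n)
{
    int start = get_global_id(0) * CHUNK;
    int stop = min(start + CHUNK, n);   /* last chunk may be short */
    for (int i = start; i < stop; ++i)
        out[i] = 2.0f * a[i];
}
"""

CHUNK = 64  # must match the #define in KERNEL_SRC


def global_size(n, chunk=CHUNK):
    """Number of work items needed when each covers `chunk` elements."""
    return (n + chunk - 1) // chunk  # ceiling division


def emulate(a, chunk=CHUNK):
    """Pure-Python stand-in for the kernel, to check the indexing."""
    n = len(a)
    out = [0.0] * n
    for gid in range(global_size(n, chunk)):
        start = gid * chunk
        stop = min(start + chunk, n)
        for i in range(start, stop):
            out[i] = 2.0 * a[i]
    return out
```

With this layout you would launch global_size(n) work items rather than
n. The best chunk size is something you would have to find by timing a
few values on your machine.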

Andreas



_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
