Thanks Andreas. No, I don't think so. I was using a kernel very similar
to the benchmark_all example. But as Craig just showed, his is incredibly
slow too, so obviously it isn't a good test of the card's capabilities.
I might try some CUDA examples and see where I am going wrong.
Andreas Klöckner wrote:
On Donnerstag 17 September 2009, Lyndon Whaite wrote:
The 8600 is a slow card, but at 70-100 theoretical GFLOPS, for trivially
parallel tasks I think I should be getting the same as, if not a bit better
than, a Core 2. I imagine for artificial test problems (like thousands of
parallel dot products) I should be able to attain 1/3 - 1/2 of the
theoretical rate. Is this correct? What are other people's experiences?
Maybe I am doing something wrong, or my card is crappier than I thought.
Are you loading or storing anything from global memory? If so, you're very
likely memory- and not compute-bound. But that's only guesswork--you'd have to
post your kernel.
Andreas
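Andreas's memory-bound point can be made concrete with a rough roofline-style estimate (plain Python, no OpenCL device needed). The peak figure below comes from the 70-100 GFLOPS quoted in the thread; the ~22.4 GB/s memory bandwidth is an assumption taken from the 8600 GT spec sheet, so adjust it for your actual card:

```python
# Roofline-style estimate: is a float32 dot product compute- or memory-bound?
# Hardware numbers are assumptions: peak GFLOPS from the thread (~100),
# memory bandwidth ~22.4 GB/s assumed from the 8600 GT spec sheet.

PEAK_GFLOPS = 100.0   # theoretical compute rate claimed in the thread
BANDWIDTH_GBS = 22.4  # assumed global-memory bandwidth of an 8600 GT

def dot_product_intensity(n):
    """FLOPs per byte of global-memory traffic for an n-element dot product."""
    flops = 2 * n            # one multiply + one add per element
    bytes_moved = 2 * n * 4  # read two float32 arrays from global memory once
    return flops / bytes_moved

# Arithmetic intensity is 0.25 flop/byte, independent of n.
intensity = dot_product_intensity(1_000_000)

# Machine balance: FLOPs you must perform per byte to keep the ALUs busy.
balance = PEAK_GFLOPS / BANDWIDTH_GBS

# Below the balance point, bandwidth (not compute) caps the achievable rate.
attainable_gflops = min(PEAK_GFLOPS, BANDWIDTH_GBS * intensity)

print(f"intensity = {intensity} flop/byte, balance = {balance:.1f} flop/byte")
print(f"attainable ~= {attainable_gflops:.1f} GFLOPS of {PEAK_GFLOPS} peak")
```

With these numbers a dot product tops out around 5-6 GFLOPS, nowhere near 1/3 - 1/2 of peak, which is consistent with what both cards are showing: the kernel streams data from global memory and the ALUs mostly sit idle.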
------------------------------------------------------------------------
_______________________________________________
PyOpenCL mailing list
[email protected]
http://tiker.net/mailman/listinfo/pyopencl_tiker.net