Re: [PyOpenCL] benchmark

Lyndon Whaite Fri, 18 Sep 2009 05:54:36 -0700

Thanks Andreas, you're very right, and you steered me in the right direction :-) . If I cache the data in local memory outside the inner loop in the benchmark_all example and increase the local work size, I manage 47 GFLOPS (out of 100 GFLOPS theoretical), much more like what I was expecting. Thanks for your help.



Execution time of test without OpenCL:  10.1647880077 s
===============================================================
Platform name: NVIDIA
Platform profile: FULL_PROFILE
Platform vendor: NVIDIA Corporation
Platform version: OpenCL 1.0
---------------------------------------------------------------
Device name: GeForce 8600 GT
Device type: GPU
Device memory:  255 MB
Device max clock speed: 1188 MHz
Device compute units: 4
Execution time of test: 9.9648e-05 s
Results OK
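For readers of the archive, here is a minimal sketch of the kind of change described above, written with PyOpenCL. It is not Lyndon's actual code. The kernels `naive` and `cached`, the toy recurrence they compute, `ITERS`, and the helpers `reference` and `run_on_device` are all hypothetical. The naive kernel reads and writes global memory on every pass of the inner loop. The cached one copies each element into `__local` memory once, before the loop, keeps the running value in a private variable, and is launched with an explicit local work size. Because each work-item only touches its own element here, a plain private variable would work just as well as `__local` memory.

```python
# Hypothetical kernels illustrating "cache the data outside the inner loop".
KERNEL_SRC = """
#define ITERS 1000

// Naive: every iteration goes to global memory for a, b and c.
__kernel void naive(__global const float *a, __global const float *b,
                    __global float *c)
{
    int gid = get_global_id(0);
    c[gid] = 0.0f;
    for (int i = 0; i < ITERS; i++)
        c[gid] = c[gid] * 0.5f + a[gid] * b[gid];
}

// Cached: load a and b into __local memory once, accumulate privately,
// and write the result back to global memory a single time.
__kernel void cached(__global const float *a, __global const float *b,
                     __global float *c,
                     __local float *a_loc, __local float *b_loc)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    a_loc[lid] = a[gid];
    b_loc[lid] = b[gid];
    barrier(CLK_LOCAL_MEM_FENCE);
    float acc = 0.0f;
    for (int i = 0; i < ITERS; i++)
        acc = acc * 0.5f + a_loc[lid] * b_loc[lid];
    c[gid] = acc;
}
"""


def reference(a, b, iters=1000):
    """Pure-Python version of the kernels' arithmetic, for checking results."""
    out = []
    for x, y in zip(a, b):
        acc = 0.0
        for _ in range(iters):
            acc = acc * 0.5 + x * y
        out.append(acc)
    return out


def run_on_device(n=1 << 16, local_size=128):
    # Imported here so reference() can be used without an OpenCL setup.
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(
        ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)
    mf = cl.mem_flags

    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    # READ_WRITE because the naive kernel reads c back inside its loop.
    c_buf = cl.Buffer(ctx, mf.READ_WRITE, a.nbytes)
    prg = cl.Program(ctx, KERNEL_SRC).build()

    # The local work size is passed explicitly rather than left to the driver;
    # n must be a multiple of local_size.
    evt_naive = prg.naive(queue, (n,), (local_size,), a_buf, b_buf, c_buf)
    local_bytes = a.itemsize * local_size
    evt_cached = prg.cached(queue, (n,), (local_size,), a_buf, b_buf, c_buf,
                            cl.LocalMemory(local_bytes),
                            cl.LocalMemory(local_bytes))
    evt_cached.wait()

    c = np.empty_like(a)
    cl.enqueue_copy(queue, c, c_buf)  # blocking copy of the cached result

    for name, evt in (("naive", evt_naive), ("cached", evt_cached)):
        print(name, (evt.profile.end - evt.profile.start) * 1e-9, "s")
    ok = np.allclose(c[:8], reference(a[:8].tolist(), b[:8].tolist()),
                     rtol=1e-4)
    print("Results OK" if ok else "Results differ")
    return c


if __name__ == "__main__":
    run_on_device()
```

Running the script needs PyOpenCL, NumPy and an OpenCL device; `reference` is plain Python, so the arithmetic can be checked without one. The timings you get will depend on the card and driver.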



Andreas Klöckner wrote:

On Thursday 17 September 2009, Lyndon Whaite wrote:

Thanks Andreas. No, I don't think so. I was using a kernel very similar
to the benchmark_all example.


That's a contradiction--benchmark-all is purely memory-bound. :)

Andreas

------------------------------------------------------------------------


_______________________________________________
PyOpenCL mailing list
[email protected]
http://tiker.net/mailman/listinfo/pyopencl_tiker.net