Hi Everyone,
First, excellent job with PyOpenCL, Andreas! On the subject of
benchmarks, what sort of maximum FLOPS (and on what hardware) have
people achieved with PyOpenCL? If people have some good benchmark
examples, maybe they could be included with PyOpenCL.
The reason I ask is that I can't seem to get the speed I probably
should. For the trivial massively parallel tests I have done, I get
speeds roughly 5 times slower on an 8600 GT than on a single core of my
3 GHz Core 2 Duo. This speed is calculated using the profiler, so it
doesn't include the time taken to copy the data; this is just the
kernel execution time.
The 8600 is a slow card, but at 70-100 theoretical GFLOPS, for trivial
parallel tasks I think I should be getting the same if not a bit better
than a Core 2. I imagine for artificial test problems (like thousands of
parallel dot products) I should be able to attain 1/3 to 1/2 of the
theoretical rate. Is this correct? What are other people's experiences?
Maybe I am doing something wrong, or my card is worse than I thought.
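For concreteness, this is the kind of arithmetic I am using to turn the profiler's kernel time into an achieved-GFLOPS figure (the element count, flops-per-element, and kernel time below are placeholder numbers for illustration, not my actual measurements):

```python
# Estimate achieved GFLOPS from a profiled kernel execution time.
# All concrete numbers here are illustrative placeholders.

def achieved_gflops(n_elements, flops_per_element, kernel_seconds):
    """Total floating-point operations divided by kernel time, in GFLOPS."""
    total_flops = n_elements * flops_per_element
    return total_flops / kernel_seconds / 1e9

# Example: 1M work-items, 10 flops each, kernel profiled at 2 ms.
print(achieved_gflops(1000000, 10, 2e-3))  # 5.0
```

Comparing that number against the card's theoretical peak gives the attained fraction I was asking about.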
Thanks for your time
Lyndon
David Garcia wrote:
Craig,
Even if you ran it on the GPU, you could get worse results than with
numpy. Performing computations on the GPU requires quite a bit of
orchestration, such as copying the data to video memory and reading it
back. You want to make your workload as close to this as possible:
1. Load data into GPU.
2. Perform _lengthy_ computation on GPU.
3. Take the output from step 2 and do some more heavy computation on
the GPU. Repeat as necessary.
4. Read back results.
Even if you are executing OpenCL on a CPU, it will still have some
overhead, so you want your kernels to be significantly expensive to
compute. Otherwise you won't see much benefit.
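A toy cost model makes the point (the overhead and throughput constants below are made-up illustrative numbers, not measurements of any real device):

```python
# Toy cost model: total time = fixed per-call overhead + work / throughput.
# The constants are illustrative assumptions, not measured on real hardware.

OVERHEAD_S = 1e-3        # assumed per-call launch/transfer overhead (s)
GPU_THROUGHPUT = 1e10    # assumed device throughput (ops/s)
CPU_THROUGHPUT = 1e9     # assumed host throughput (ops/s)

def gpu_time(ops):
    """Device time: pay the fixed overhead, then stream through the work."""
    return OVERHEAD_S + ops / GPU_THROUGHPUT

def cpu_time(ops):
    """Host time: no launch overhead, but lower throughput."""
    return ops / CPU_THROUGHPUT

# Small workloads lose to the CPU; large ones amortize the overhead.
for ops in (1e5, 1e7, 1e9):
    print(ops, "GPU wins" if gpu_time(ops) < cpu_time(ops) else "CPU wins")
```

In this toy model the crossover sits where the work is large enough that the fixed overhead becomes a small fraction of the total, which is exactly why steps 2 and 3 above should be as heavy as possible.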
Cheers,
David
2009/9/17 Craig Swank <[email protected]>
Oops, it looks like, upon further review, that those OpenCL
results below are just for my CPU. I don't have GPU results and
probably don't have a GPU. I'm going to try on another computer
and I'll update this post.
Craig
On Sep 17, 2009, at 12:59 PM, Craig Swank wrote:
Hello,
I am just looking at OpenCL for the first time today. Looks pretty
neat. I added the following lines to benchmark-all.py:
c_result2 = numpy.empty_like(a)
time1 = time()
c_result2 = a + b
c_result2 = c_result2 * (a + b)
c_result2 = c_result2 * (a / 2.0)
time2 = time()
print "Execution time of test without OpenCL, but with numpy:", time2 - time1, "s"
This does the same calculations the way numpy was designed to do
them. I got the following results (edited for readability):
Execution time of test without OpenCL: 23.8333249092 s
Execution time of test without OpenCL, but with numpy: 7.41481781006e-05 s
Execution time of test: 0.014881 s
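For what it's worth, a steadier way to time the numpy version is timeit, which averages over many runs; a sketch (the array size and random contents below are my assumption; benchmark-all.py may use different arrays):

```python
# Re-time the numpy expression with timeit for a steadier estimate.
# The array size and random contents are assumptions; the arrays in
# benchmark-all.py may differ.
import timeit
import numpy

a = numpy.random.rand(50000).astype(numpy.float32)
b = numpy.random.rand(50000).astype(numpy.float32)

def numpy_version():
    c = a + b
    c = c * (a + b)
    c = c * (a / 2.0)
    return c

# Average over 100 runs to smooth out timer resolution and warm-up effects.
elapsed = timeit.timeit(numpy_version, number=100) / 100
print("numpy per-run time: %g s" % elapsed)
```

A single wall-clock sample that small (7e-05 s) is close to timer resolution, so an averaged figure is more trustworthy for the comparison.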
The numpy way is quite a bit faster. My question is: is there a use
case where OpenCL would overtake numpy for these types of
calculations? Or maybe I just have a sucky GPU? I don't know.
Craig
_______________________________________________
PyOpenCL mailing list
[email protected]
http://tiker.net/mailman/listinfo/pyopencl_tiker.net