Thanks, David. I was kind of guessing as much. I'll bet it would take quite a bit more than just running over relatively small loops to really see pyopencl shine.
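For anyone following along, here is a minimal sketch of the comparison from my original post quoted below: one Python-level loop against the vectorized numpy form. It assumes the loop in benchmark-all.py computes the same expression as the numpy lines I added. The function names, array size, and seed are my own choices for illustration, not taken from the example, and it is written for Python 3 rather than the Python 2 of the original.

```python
import time

import numpy as np


def loop_version(a, b):
    """Compute (a + b) * (a + b) * (a / 2.0) with one Python-level loop."""
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        s = a[i] + b[i]
        out[i] = s * s * (a[i] / 2.0)
    return out


def numpy_version(a, b):
    """Same expression, evaluated on whole arrays at once."""
    s = a + b
    return s * s * (a / 2.0)


def compare(n=10000, seed=0):
    """Run both versions on the same random input and time each one."""
    rng = np.random.default_rng(seed)
    a = rng.random(n)  # float64 arrays; size n is an arbitrary choice
    b = rng.random(n)

    t0 = time.perf_counter()
    r_loop = loop_version(a, b)
    t1 = time.perf_counter()
    r_np = numpy_version(a, b)
    t2 = time.perf_counter()

    return r_loop, r_np, t1 - t0, t2 - t1


if __name__ == "__main__":
    r_loop, r_np, t_loop, t_np = compare()
    print("results match:", np.allclose(r_loop, r_np))
    print("loop: ", t_loop, "s")
    print("numpy:", t_np, "s")
```

The timings will vary from machine to machine, so treat them as a rough indication only. The point is that both versions produce the same array, and that any OpenCL kernel has to do enough work to beat the vectorized version, not the loop.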
By the way, I noticed the benchmark-all.py example has a couple of nested for loops in the non-pyopencl section. The exact same array is generated with just one of them. Is there some logic behind the nested loops? It makes the example 10 times slower and is a bit misleading, I think. When I removed one of them, the time for no opencl was about twice as fast as when the context was my 2-core processor, which makes sense.

Craig
________________________________________
From: David Garcia [[email protected]]
Sent: Thursday, September 17, 2009 3:27 PM
To: Swank, Craig
Cc: [email protected]
Subject: Re: [PyOpenCL] benchmark

Craig,

Even if you ran it on the GPU you could get worse results than with numpy. Performing computations on the GPU requires quite a bit of orchestration, such as copying the data to video memory and reading it back. You want to make your workload as close to this as possible:

1. Load data into GPU.
2. Perform _lengthy_ computation on GPU.
3. Take the output from 2 and do some more heavy computation on the GPU. Repeat as necessary.
4. Read back results.

Even if you are executing OpenCL on a CPU it will still have some overhead, so you want your kernels to be significantly expensive to compute. Otherwise you won't see much benefit.

Cheers,

David

2009/9/17 Craig Swank <[email protected]>

Oops,

It looks like, upon further review, those opencl results below are just for my cpu. I don't have gpu results and probably don't have a gpu. I'm going to try on another computer and I'll update this post.

Craig

On Sep 17, 2009, at 12:59 PM, Craig Swank wrote:

Hello,

I am just looking at opencl for the first time today. Looks pretty neat.
I added the following lines to benchmark-all.py:

    c_result2 = numpy.empty_like(a)
    time1 = time()
    c_result2 = a + b
    c_result2 = c_result2 * (a + b)
    c_result2 = c_result2 * (a / 2.0)
    time2 = time()
    print "Execution time of test without OpenCL, but with numpy: ", time2 - time1, "s"

to do the same calculations the way numpy was designed to do them, and got the following results (edited for readability):

    Execution time of test without OpenCL: 23.8333249092 s
    Execution time of test without OpenCL, but with numpy: 7.41481781006e-05 s
    Execution time of test: 0.014881 s

The numpy way is quite a bit faster. My question is, is there a use case where the use of opencl would overtake numpy for these types of calculations? Or maybe I just have a sucky GPU? I don't know.

Craig

_______________________________________________
PyOpenCL mailing list
[email protected]
http://tiker.net/mailman/listinfo/pyopencl_tiker.net
