Thanks, David.  I was kind of guessing as much.  I'll bet it would take quite a 
bit more than just running over relatively small loops to really see pyopencl 
shine.

By the way, I noticed the benchmark-all.py example has a couple of nested for 
loops in the non-pyopencl section.  The exact same array is generated with just 
one of them.  Is there some logic behind the nested loops?  It makes the 
example 10 times slower and is a bit misleading, I think.  When I removed one 
of them, the no-OpenCL time was about twice as fast as the OpenCL time with my 
2-core processor as the context, which makes sense.
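
For illustration, here is a minimal sketch of the pattern I mean (the loop 
body here is hypothetical, not the actual one from benchmark-all.py):

```python
import numpy as np

n = 1000

# Nested version, as in the non-pyopencl section: the outer loop just
# recomputes the same elements n times over.
a_nested = np.empty(n)
for _ in range(n):            # redundant outer loop
    for i in range(n):
        a_nested[i] = i * 2.0

# Single loop: produces the exact same array, roughly n times faster.
a_single = np.empty(n)
for i in range(n):
    a_single[i] = i * 2.0

assert (a_nested == a_single).all()
```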

Craig
________________________________________
From: David Garcia [[email protected]]
Sent: Thursday, September 17, 2009 3:27 PM
To: Swank, Craig
Cc: [email protected]
Subject: Re: [PyOpenCL] benchmark

Craig,

Even if you ran it on the GPU, you could get worse results than with numpy. 
Performing computations on the GPU requires quite a bit of orchestration, such 
as copying the data to video memory and reading it back. You want to make your 
workload as close to this as possible:

1. Load data into GPU.
2. Perform _lengthy_ computation on GPU.
3. Take the output from 2 and do some more heavy computation in the GPU. Repeat 
as necessary.
4. Read back results.

Even if you are executing OpenCL on a CPU it will still have some overhead, so 
you want your kernels to be expensive enough to amortize it. Otherwise you 
won't see much benefit.
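
As a rough sketch of that pattern with pyopencl's array module (untested on 
your machine; it assumes pyopencl is installed and an OpenCL platform is 
available, and falls back to plain numpy otherwise):

```python
import numpy as np

a = np.random.rand(1 << 20).astype(np.float32)
b = np.random.rand(1 << 20).astype(np.float32)
expected = (a + b) * (a + b) * (a / 2.0)  # numpy reference result

try:
    import pyopencl as cl
    import pyopencl.array as cl_array

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)

    # 1. Load data into the device (one transfer per input).
    a_dev = cl_array.to_device(queue, a)
    b_dev = cl_array.to_device(queue, b)

    # 2./3. Chain the computation on the device; intermediate results
    # never leave device memory.
    tmp = a_dev + b_dev
    tmp = tmp * (a_dev + b_dev)
    tmp = tmp * (a_dev / 2.0)

    # 4. Read the result back once, at the end.
    result = tmp.get()
except Exception:
    # No usable OpenCL platform; fall back to the numpy result.
    result = expected

assert np.allclose(result, expected)
```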

Cheers,

David


2009/9/17 Craig Swank <[email protected]>
Oops, It looks like, upon further review, that those opencl results below are 
just for my cpu.  I don't have gpu results and probably don't have a gpu.  I'm 
going to try on another computer and I'll update this post.

Craig




On Sep 17, 2009, at 12:59 PM, Craig Swank wrote:

Hello,
I am just looking at opencl for the first time today.  Looks pretty
neat.  I added the following lines to benchmark-all.py:

c_result2 = numpy.empty_like(a)
time1 = time()
c_result2 = a + b
c_result2 = c_result2 * (a + b)
c_result2 = c_result2 * (a / 2.0)
time2 = time()
print "Execution time of test without OpenCL, but with numpy: ", \
    time2 - time1, "s"

These do the same calculations the way numpy was designed to do them, and I 
got the following results (edited for readability):

Execution time of test without OpenCL:  23.8333249092 s
Execution time of test without OpenCL, but with numpy:  7.41481781006e-05 s
Execution time of test: 0.014881 s

The numpy way is quite a bit faster.  My question is: is there a use case 
where OpenCL would overtake numpy for these kinds of calculations?  Or maybe 
I just have a sucky GPU?  I don't know.

Craig


_______________________________________________
PyOpenCL mailing list
[email protected]
http://tiker.net/mailman/listinfo/pyopencl_tiker.net



