Hi Everyone,
First, excellent job with PyOpenCL, Andreas! On the subject of
benchmarks, what sort of maximum FLOPS (and on what hardware) have
people achieved with PyOpenCL? If people have some good benchmark
examples, maybe they could be included with PyOpenCL.
The reason I ask is that I can't seem to get the speed I probably
should. For the trivial massively parallel tests I have done, I get
speeds roughly 5 times slower on an 8600 GT than on a single core of my
3 GHz Core 2 Duo. This speed is calculated using the profiler, so it
doesn't include the time taken to copy the data; this is just the
kernel execution time.
The 8600 is a slow card, but at 70-100 theoretical GFLOPS, for trivial
parallel tasks I think I should be getting the same if not a bit better
than a Core 2. I imagine for artificial test problems (like thousands of
parallel dot products) I should be able to attain 1/3 to 1/2 of the
theoretical rate. Is this correct? What are other people's experiences?
Maybe I am doing something wrong, or my card is worse than I thought.
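For concreteness, this is the kind of arithmetic I am using to turn the profiler's kernel time into an achieved-GFLOPS figure (the element count, flops-per-element, and kernel time below are placeholder numbers for illustration, not my actual measurements):

```python
# Estimate achieved GFLOPS from a profiled kernel execution time.
# All concrete numbers here are illustrative placeholders.

def achieved_gflops(n_elements, flops_per_element, kernel_seconds):
    """Total floating-point operations divided by kernel time, in GFLOPS."""
    total_flops = n_elements * flops_per_element
    return total_flops / kernel_seconds / 1e9

# Example: 1M work-items, 10 flops each, kernel profiled at 2 ms.
print(achieved_gflops(1000000, 10, 2e-3))  # 5.0
```

Comparing that number against the card's theoretical peak gives the attained fraction I was asking about.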
Thanks for your time
Lyndon
David Garcia wrote:
Craig,
Even if you ran it on the GPU, you could get worse results than with
numpy. Performing computations on the GPU requires quite a bit of
orchestration, such as copying the data to video memory and reading it
back. You want to make your workload as close to this as possible:
1. Load data into GPU.
2. Perform _lengthy_ computation on GPU.
3. Take the output from step 2 and do some more heavy computation on
the GPU. Repeat as necessary.
4. Read back results.
Even if you are executing OpenCL on a CPU, it will still have some
overhead, so you want your kernels to be significantly expensive to
compute. Otherwise you won't see much benefit.
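A toy cost model makes the point (the overhead and throughput constants below are made-up illustrative numbers, not measurements of any real device):

```python
# Toy cost model: total time = fixed per-call overhead + work / throughput.
# The constants are illustrative assumptions, not measured on real hardware.

OVERHEAD_S = 1e-3        # assumed per-call launch/transfer overhead (s)
GPU_THROUGHPUT = 1e10    # assumed device throughput (ops/s)
CPU_THROUGHPUT = 1e9     # assumed host throughput (ops/s)

def gpu_time(ops):
    """Device time: pay the fixed overhead, then stream through the work."""
    return OVERHEAD_S + ops / GPU_THROUGHPUT

def cpu_time(ops):
    """Host time: no launch overhead, but lower throughput."""
    return ops / CPU_THROUGHPUT

# Small workloads lose to the CPU; large ones amortize the overhead.
for ops in (1e5, 1e7, 1e9):
    print(ops, "GPU wins" if gpu_time(ops) < cpu_time(ops) else "CPU wins")
```

In this toy model the crossover sits where the work is large enough that the fixed overhead becomes a small fraction of the total, which is exactly why steps 2 and 3 above should be as heavy as possible.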
Cheers,
David
2009/9/17 Craig Swank <[email protected]>
Oops, it looks like, upon further review, that those OpenCL
results below are just for my CPU. I don't have GPU results and
probably don't have a GPU. I'm going to try on another computer
and I'll update this post.
Craig
On Sep 17, 2009, at 12:59 PM, Craig Swank wrote:
Hello,
I am just looking at OpenCL for the first time today. Looks pretty
neat. I added the following lines to benchmark-all.py:
c_result2 = numpy.empty_like(a)
time1 = time()
c_result2 = a + b
c_result2 = c_result2 * (a + b)
c_result2 = c_result2 * (a / 2.0)
time2 = time()
print "Execution time of test without OpenCL, but with numpy:", time2 - time1, "s"
This does the same calculations the way numpy was designed to do
them. I got the following results (edited for readability):
Execution time of test without OpenCL: 23.8333249092 s
Execution time of test without OpenCL, but with numpy: 7.41481781006e-05 s
Execution time of test: 0.014881 s
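For what it's worth, a steadier way to time the numpy version is timeit, which averages over many runs; a sketch (the array size and random contents below are my assumption; benchmark-all.py may use different arrays):

```python
# Re-time the numpy expression with timeit for a steadier estimate.
# The array size and random contents are assumptions; the arrays in
# benchmark-all.py may differ.
import timeit
import numpy

a = numpy.random.rand(50000).astype(numpy.float32)
b = numpy.random.rand(50000).astype(numpy.float32)

def numpy_version():
    c = a + b
    c = c * (a + b)
    c = c * (a / 2.0)
    return c

# Average over 100 runs to smooth out timer resolution and warm-up effects.
elapsed = timeit.timeit(numpy_version, number=100) / 100
print("numpy per-run time: %g s" % elapsed)
```

A single wall-clock sample that small (7e-05 s) is close to timer resolution, so an averaged figure is more trustworthy for the comparison.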
The numpy way is quite a bit faster. My question is: is there a use
case where OpenCL would overtake numpy for these types of
calculations? Or maybe I just have a sucky GPU? I don't know.
Craig
_______________________________________________
PyOpenCL mailing list
[email protected]
http://tiker.net/mailman/listinfo/pyopencl_tiker.net