Hi Everyone,

First, excellent job with PyOpenCL, Andreas! On the subject of benchmarks, what sort of maximum FLOPS (and on what hardware) have people achieved with PyOpenCL? If people have good benchmark examples, maybe they could be included with PyOpenCL.

The reason I ask is that I can't seem to get the speed I probably should. For the trivial massively parallel tests I have done, I get speeds roughly 5 times slower on an 8600 GT than on a single core of my 3 GHz Core 2 Duo. This speed is calculated using the profiler, so it doesn't include the time taken to copy the data; it is just the kernel execution time.

The 8600 is a slow card, but at 70-100 theoretical GFLOPS, I think trivial parallel tasks should run at least as fast as on a Core 2, if not a bit faster. I imagine that for artificial test problems (like thousands of parallel dot products) I should be able to attain 1/3 to 1/2 of the theoretical rate. Is this correct? What are other people's experiences? Maybe I am doing something wrong, or my card is crappier than I thought.
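For what it's worth, here is the back-of-the-envelope arithmetic I use to turn a profiled kernel time into a sustained-GFLOPS figure. The problem size and kernel time below are made-up illustrative numbers, not measurements from any particular card:

```python
def achieved_gflops(flop_count, kernel_seconds):
    """Sustained GFLOPS given total floating-point ops and kernel time."""
    return flop_count / kernel_seconds / 1e9

# e.g. 10,000 parallel dot products of length 1,024:
# each dot product is 1,024 multiplies + 1,023 adds, roughly 2 * 1024 flops.
n_dots, length = 10000, 1024
flops = n_dots * 2 * length

# A hypothetical 1 ms kernel time (as reported by the profiler) would give:
print(achieved_gflops(flops, 1e-3))  # about 20.5 GFLOPS
```

Comparing a number like this against the card's theoretical peak is how I arrived at the 1/3 to 1/2 estimate above.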

Thanks for your time
Lyndon



David Garcia wrote:
Craig,

Even if you ran it on the GPU you could get worse results than with numpy. Performing computations on the GPU requires quite a bit of orchestration, such as copying the data to video memory and reading it back. You want to make your workload as close to this as possible:

1. Load data into GPU.
2. Perform _lengthy_ computation on GPU.
3. Take the output from step 2 and do some more heavy computation on the GPU. Repeat as necessary.
4. Read back results.

Even if you are executing OpenCL on a CPU there will still be some overhead, so you want your kernels to be expensive enough to compute that the overhead is amortized. Otherwise you won't see much benefit.
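The trade-off above can be sketched with a toy cost model: the GPU pays a fixed overhead (transfers, kernel launch) plus compute time, while the CPU pays no overhead but has lower throughput. All the numbers here (5 ms overhead, 50 GFLOPS GPU, 5 GFLOPS CPU) are illustrative assumptions, not measurements:

```python
def gpu_time(n_flops, overhead_s=5e-3, gpu_flops=5e10):
    # Fixed per-dispatch overhead plus time to do the actual work.
    return overhead_s + n_flops / gpu_flops

def cpu_time(n_flops, cpu_flops=5e9):
    # No dispatch overhead, but an order of magnitude less throughput.
    return n_flops / cpu_flops

# Find (by doubling) the smallest workload where the GPU wins.
n = 1
while gpu_time(n) >= cpu_time(n):
    n *= 2
print("GPU starts winning around %.0e flops" % n)
```

Below the break-even point the fixed overhead dominates and the CPU (or numpy) wins, which is why tiny kernels look so bad; steps 2 and 3 in the list above are about keeping each dispatch well past that point.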

Cheers,

David


2009/9/17 Craig Swank <[email protected]>

    Oops, It looks like, upon further review, that those opencl
    results below are just for my cpu.  I don't have gpu results and
    probably don't have a gpu.  I'm going to try on another computer
    and I'll update this post.

    Craig




    On Sep 17, 2009, at 12:59 PM, Craig Swank wrote:

        Hello,
        I am just looking at opencl for the first time today.  Looks
        pretty
        neat.  I added the following lines to benchmark-all.py:

        c_result2 = numpy.empty_like(a)
        time1 = time()
        c_result2 = a + b
        c_result2 = c_result2 * (a + b)
        c_result2 = c_result2 * (a / 2.0)
        time2 = time()
        print "Execution time of test without OpenCL, but with numpy:", \
            time2 - time1, "s"

        This does the same calculations the way numpy was designed
        to do them. I got the following results (edited for
        readability):

        Execution time of test without OpenCL:  23.8333249092 s
        Execution time of test without OpenCL, but with numpy:
        7.41481781006e-05 s
        Execution time of test: 0.014881 s

        The numpy way is quite a bit faster.  My question is: is
        there a use case where opencl would overtake numpy for these
        types of calculations?  Or maybe I just have a sucky GPU?  I
        don't know.

        Craig


        _______________________________________________
        PyOpenCL mailing list
        [email protected] <mailto:[email protected]>
        http://tiker.net/mailman/listinfo/pyopencl_tiker.net





