Hi,

I had the opportunity to benchmark the open-source OpenCL drivers (POCL on CPU, Beignet on GPU) against the proprietary ones, and they now perform very well!
Test computer: MacBook Pro 13" with an Iris 5100 GPU integrated into the Haswell processor (i5-4308U), running Debian Jessie (or Mac OS X).

The code used is described on pages 7-14 of this document:
http://pdebuyl.be/tmp/esp2014_draft.pdf

It consists of a map operation (cast and multiplications/divisions) followed by a sparse matrix / dense vector multiplication, implemented either as an array of structs (method called LUT, better suited to CPU) or as a struct of arrays (called CSR, better suited to GPU). CSR is implemented using a parallel reduction within a workgroup. All OpenCL methods use single-precision floating-point arithmetic with Kahan summation, while the OpenMP code uses double-precision arithmetic.

The benchmark measures the execution time, in milliseconds, of the complete treatment for input images of various sizes (from 1 to 16 Mpixel). Each value is the best timing out of 3, each averaged over 10 processings, measured with the timeit module from Python.

In the tables below, image sizes are in Mpixel and timings in ms.

Reference timings:

1D_CPU_LUT_OpenMP
Img size   Linux/gcc   Apple/clang
  1.02       12.12       13.451
  2.10       30.14       35.307
  4.19       63.79       87.110
  6.22       96.17       130.77
 11.90      222.15       265.94
 16.78      270.42       359.93

1D_CPU_CSR_OpenMP
Img size   Linux/gcc   Apple/clang
  1.02       12.31       12.256
  2.10       30.20       33.220
  4.19       64.34       76.948
  6.22       88.82       111.60
 11.90      206.82       218.81
 16.78      280.03       443.35

Execution on the CPU:

1D_CPU_LUT_OpenCL
Img size      AMD     Intel     Apple      POCL
  1.02      13.11      8.25    9.7813      8.47
  2.10      29.85     15.20    20.563     17.85
  4.19      58.08     32.77    47.877     47.19
  6.22      97.88     53.04    80.372     62.53
 11.90     184.29    125.52    149.33    135.89
 16.78     261.21    149.31    205.81    190.14

1D_CPU_CSR_OpenCL
Img size      AMD     Intel     Apple      POCL
  1.02      16.96     10.05    9.8027     10.02
  2.10      37.12     18.46    21.904     21.35
  4.19      82.78     42.24    46.961     59.89
  6.22     133.41     70.17    68.312     73.87
 11.90     271.61    182.41    143.57    178.77
 16.78     346.55    222.82    212.17    260.62

Execution on the integrated GPU:

1D_GPU_LUT_OpenCL
Img size   Beignet     Apple
  1.02       7.50     10.066
  2.10      14.44     16.345
  4.19      28.91     34.538
  6.22      -----     37.570
 11.90      -----     68.443
 16.78      -----     78.333
(-----: no data, MemoryError; only 256MB on GPU)
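For readers who want to reproduce the measurement procedure, here is a minimal sketch of "best of 3, each averaged over 10 runs" with timeit, using a pure-Python Kahan summation as a stand-in workload. This is not the benchmarked code: the real kernels are OpenCL in single precision, whereas this sketch runs in double precision on the CPU, and the names kahan_sum, naive_sum, best_of and the test data are hypothetical, chosen only for illustration.

```python
import timeit


def naive_sum(values):
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was really added; subtracting
        # `corrected` leaves the rounding error made in this step.
        compensation = (new_total - total) - corrected
        total = new_total
    return total


def best_of(func, repeat=3, number=10):
    """Best of `repeat` measurements, each averaged over `number` calls (ms)."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number * 1000.0


# Hypothetical workload: one large value followed by many tiny ones,
# which the naive loop drops entirely.
data = [1.0] + [1e-16] * 10000
naive = naive_sum(data)
compensated = kahan_sum(data)
elapsed_ms = best_of(lambda: kahan_sum(data))
print("naive=%r kahan=%r time=%.3f ms" % (naive, compensated, elapsed_ms))
```

Taking the minimum over the repeats limits the influence of other activity on the machine, while averaging over several calls smooths out timer resolution for short runs.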
1D_GPU_CSR_OpenCL
Img size   Beignet     Apple
  1.02       3.95     6.0475
  2.10       7.55     13.324
  4.19      15.62     23.255
  6.22      23.88     33.352
 11.90      45.63     55.099
 16.78      68.78     82.569

It is funny to notice that, using the same code, this laptop GPU outperforms an Intel Xeon Phi accelerator, which is much more expensive than the whole laptop.

Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
