Dustin Kleckner writes:
> I’ve been using pyopencl for awhile for various simulation/data processing
> tasks. I recently upgraded to a new computer, and noticed things were
> considerably slower.
>
> After some experimentation, I tracked this down to the version of pyopencl I
> was using. The updated version (2015.2.4; most recent on pypi) takes
> significantly longer to queue a function call (~1.5 ms) than the old version
> (2015.1, ~0.03 ms). Both times come from the same machine*. Profiling
> indicates that the newer version is making lots of function calls the old
> version did not. FYI, the code I used to test this is below (adapted from
> documentation).
>
> For my purposes, this is slightly alarming: my code makes lots of kernel
> calls, in which case the new version is 50x slower for small data sets!
>
> Is this something that has been/will be fixed in newer versions of pyopencl?
> Is there a workaround? Of course, for the time being I can use the old
> version, but I’d rather not be stuck with it.
>
> If needed, I can provide the profiler output.

tl;dr: Hang on to the kernel object, i.e. 'sum_knl = prg.sum', and call
'sum_knl' in your loop. The kernel object is where the cached state lives.

PyOpenCL 2015.2 generates custom Python code to make kernel invocation
*faster* (not slower). Generating this code (which gets attached to the
kernel object, prg.sum) takes time, and every time you access 'prg.sum', you
get a new kernel object. So you're likely mainly benchmarking the generation
(and compilation) of the invoker code.

HTH,
Andreas
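
To make the caching effect concrete, here is a toy model in plain Python. It is only a sketch: ToyProgram, ToyKernel and count_generations are hypothetical stand-ins invented for illustration, not PyOpenCL classes, and the "invoker" is a trivial lambda. It assumes nothing beyond what is described above: attribute access hands out a new kernel object each time, and the expensive setup is cached on that object.

```python
class ToyKernel:
    """Hypothetical stand-in for a kernel object (not PyOpenCL's class)."""

    generations = 0  # class-wide count of how often the costly setup ran

    def __init__(self, name):
        self.name = name
        self._invoker = None  # per-object cache; lost when the object is dropped

    def _generate_invoker(self):
        # Stand-in for the expensive step (generating/compiling call code).
        ToyKernel.generations += 1
        return lambda *args: sum(args)

    def __call__(self, *args):
        # Generate the invoker on first use, then reuse it.
        if self._invoker is None:
            self._invoker = self._generate_invoker()
        return self._invoker(*args)


class ToyProgram:
    """Hypothetical stand-in for a program object: every attribute
    access returns a brand-new kernel object."""

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return ToyKernel(name)


def count_generations(n_calls, keep_kernel):
    """Return how many times the costly setup ran over n_calls calls."""
    ToyKernel.generations = 0
    prg = ToyProgram()
    if keep_kernel:
        sum_knl = prg.sum          # hold on to one kernel object
        for _ in range(n_calls):
            sum_knl(1, 2)          # cache is reused on every call
    else:
        for _ in range(n_calls):
            prg.sum(1, 2)          # new object, so setup runs again
    return ToyKernel.generations


print(count_generations(1000, keep_kernel=False))  # 1000
print(count_generations(1000, keep_kernel=True))   # 1
```

In the toy model the setup runs once per call when the kernel is fetched inside the loop, and once in total when the object is kept. The same pattern applied to real code is the 'sum_knl = prg.sum' assignment from the tl;dr, done once outside the loop.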
