Dustin Kleckner <[email protected]> writes:
> I’ve been using pyopencl for a while for various simulation/data processing 
> tasks.  I recently upgraded to a new computer, and noticed things were 
> considerably slower.
>
> After some experimentation, I tracked this down to the version of pyopencl I 
> was using.  The updated version (2015.2.4; most recent on pypi) takes 
> significantly longer to queue a function call (~1.5 ms) than the old version 
> (2015.1, ~0.03 ms).  Both times come from the same machine*.  Profiling 
> indicates that the newer version is making lots of function calls the old 
> version did not.  FYI, the code I used to test this is below (adapted from 
> documentation).
>
> For my purposes, this is slightly alarming: my code makes lots of kernel 
> calls, in which case the new version is 50x slower for small data sets!
>
> Is this something that has been/will be fixed in newer versions of pyopencl?  
> Is there a workaround?  Of course, for the time being I can use the old 
> version, but I’d rather not be stuck with it.
>
> If needed, I can provide the profiler output.

tl;dr: Hang on to the kernel object, i.e. 'sum_knl = prg.sum', and call
'sum_knl' instead of 'prg.sum' each time. The kernel object is where the
generated invoker code gets cached.

PyOpenCL 2015.2 generates custom Python code to make kernel invocation
*faster* (not slower). Generating this code (which gets attached to the
kernel object, prg.sum) takes time, and every time you access 'prg.sum',
you get a new kernel object. So you're most likely benchmarking the
generation (and compilation) of the invoker code rather than the kernel
launch itself.
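
To make the effect concrete, here is a toy model in plain Python. It is
only a sketch: 'FakeProgram' and 'FakeKernel' are hypothetical stand-ins
invented for illustration, not PyOpenCL classes, and the real invoker
generation is considerably more involved. The model only assumes what is
described above, namely that each attribute access yields a new kernel
object and that the expensive invoker is cached on that object.

```python
# Toy model only: FakeProgram and FakeKernel are hypothetical stand-ins,
# not PyOpenCL classes.


class FakeKernel:
    """Stands in for a kernel object that lazily builds its invoker."""

    invokers_built = 0  # counts how often the expensive step ran

    def __init__(self, name):
        self.name = name
        self._invoker = None  # the cache lives on this object

    def __call__(self, *args):
        if self._invoker is None:
            # Stand-in for generating and compiling the invoker code.
            FakeKernel.invokers_built += 1
            self._invoker = lambda *a: (self.name, a)
        return self._invoker(*args)


class FakeProgram:
    """Stands in for a program whose attribute access creates kernels."""

    def __getattr__(self, name):
        # A fresh kernel object on every access, so no cache is reused.
        return FakeKernel(name)


prg = FakeProgram()

# Pattern from the benchmark: a new kernel object for every call.
for _ in range(100):
    prg.sum(1, 2)
slow_builds = FakeKernel.invokers_built

# Suggested pattern: look the kernel up once and reuse it.
FakeKernel.invokers_built = 0
sum_knl = prg.sum
for _ in range(100):
    sum_knl(1, 2)
fast_builds = FakeKernel.invokers_built

print(slow_builds, fast_builds)
```

In this model the first loop pays the build cost on every call, while the
second pays it once. In your benchmark the equivalent change is to do
'sum_knl = prg.sum' once outside the timing loop and call 'sum_knl' inside
it.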

HTH,
Andreas

_______________________________________________
PyOpenCL mailing list
[email protected]
https://lists.tiker.net/listinfo/pyopencl
