Thanks for the quick reply — that fixed it.

Just out of curiosity, is there a compelling reason not to cache the kernel 
code in the program object, so that repeated calls return quickly?  I 
generally wouldn’t expect calling a method repeatedly to be significantly 
slower than getting a copy and then calling it.  I guess what you’re saying 
is that I shouldn’t think of “prg.sum” as a method, but rather as an 
argument-less function that returns a method?  In that case, shouldn’t I 
expect the syntax to be “sum_knl = prg.sum()”?

Best,
Dustin

PS: Thanks for writing pyopencl — it has made my life much easier!


> On Feb 13, 2016, at 6:42 PM, Andreas Kloeckner <[email protected]> 
> wrote:
> 
> Dustin Kleckner <[email protected]> writes:
>> I’ve been using pyopencl for a while for various simulation/data processing 
>> tasks.  I recently upgraded to a new computer, and noticed things were 
>> considerably slower.
>> 
>> After some experimentation, I tracked this down to the version of pyopencl I 
>> was using.  The updated version (2015.2.4; most recent on pypi) takes 
>> significantly longer to queue a function call (~1.5 ms) than the old version 
>> (2015.1, ~0.03 ms).  Both times come from the same machine*.  Profiling 
>> indicates that the newer version is making lots of function calls the old 
>> version did not.  FYI, the code I used to test this is below (adapted from 
>> documentation).
>> 
>> For my purposes, this is slightly alarming: my code makes lots of kernel 
>> calls, in which case the new version is 50x slower for small data sets!
>> 
>> Is this something that has been/will be fixed in newer versions of pyopencl? 
>>  Is there a workaround?  Of course, for the time being I can use the old 
>> version, but I’d rather not be stuck with it.
>> 
>> If needed, I can provide the profiler output.
> 
> tl;dr: Hang on to the kernel object, i.e. 'sum_knl = prg.sum'. It's used
> for caching stuff.
> 
> PyOpenCL 2015.2 generates custom Python code to make kernel invocation
> *faster* (not slower). Generating this code (which gets attached to the
> kernel object, prg.sum) takes time, and every time you call 'prg.sum',
> you get a new kernel object. So you're likely mainly benchmarking the
> generation (and compilation) of the invoker code.
> 
> HTH,
> Andreas


_______________________________________________
PyOpenCL mailing list
[email protected]
https://lists.tiker.net/listinfo/pyopencl
