Thanks for the quick reply, that fixed it. Just out of curiosity, is there a compelling reason not to cache the kernel code in the program objects and then return quickly on repeated calls? I generally wouldn’t expect calling a method repeatedly to be significantly slower than getting a copy and then calling it. I guess what you’re saying is that I shouldn’t think of “prg.sum” as a method, but rather as an argument-less function that returns a method? In that case, shouldn’t I expect the syntax to be “sum_knl = prg.sum()”?

Best,
Dustin

PS: Thanks for writing pyopencl, it has made my life much easier!

> On Feb 13, 2016, at 6:42 PM, Andreas Kloeckner <[email protected]>
> wrote:
>
> Dustin Kleckner <[email protected]> writes:
>> I’ve been using pyopencl for awhile for various simulation/data processing
>> tasks. I recently upgraded to a new computer, and noticed things were
>> considerably slower.
>>
>> After some experimentation, I tracked this down to the version of pyopencl I
>> was using. The updated version (2015.2.4; most recent on pypi) takes
>> significantly longer to queue a function call (~1.5 ms) than the old version
>> (2015.1, ~0.03 ms). Both times come from the same machine*. Profiling
>> indicates that the newer version is making lots of function calls the old
>> version did not. FYI, the code I used to test this is below (adapted from
>> documentation).
>>
>> For my purposes, this is slightly alarming: my code makes lots of kernel
>> calls, in which case the new version is 50x slower for small data sets!
>>
>> Is this something that has been/will be fixed in newer versions of pyopencl?
>> Is there a workaround? Of course, for the time being I can use the old
>> version, but I’d rather not be stuck with it.
>>
>> If needed, I can provide the profiler output.
>
> tl;dr: Hang on to the kernel object, i.e. 'sum_knl = prg.sum'. It's used
> for caching stuff.
>
> PyOpenCL 2015.2 generates custom Python code to make kernel invocation
> *faster* (not slower). Generating this code (which gets attached to the
> kernel object, prg.sum) takes time, and every time you call 'prg.sum',
> you get a new kernel object. So you're likely mainly benchmarking the
> generation (and compilation) of the invoker code.
>
> HTH,
> Andreas

_______________________________________________
PyOpenCL mailing list
[email protected]
https://lists.tiker.net/listinfo/pyopencl
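
For readers following the thread, here is a toy model in plain Python of the behaviour Andreas describes: each attribute access hands back a new kernel object, and the expensive setup is cached on that object, so it is repeated unless the object is kept. The `Program` and `Kernel` classes, the `builds` counter, and the `count_builds` helper are all hypothetical stand-ins written for this sketch. They are not PyOpenCL's classes and say nothing about how PyOpenCL is actually implemented.

```python
class Kernel:
    """Toy stand-in for a kernel object (an assumption, not PyOpenCL's class)."""

    builds = 0  # how many times the pretend "invoker" has been generated

    def __init__(self, name):
        self.name = name
        self._invoker = None  # per-object cache, lost when the object is dropped

    def __call__(self, *args):
        if self._invoker is None:
            # Stands in for the costly one-time code generation step.
            Kernel.builds += 1
            self._invoker = lambda *a: sum(a)
        return self._invoker(*args)


class Program:
    """Toy program: every attribute access returns a fresh Kernel."""

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return Kernel(name)


def count_builds(get_kernel, n_calls=100):
    """Call the kernel n_calls times and report how many setups happened."""
    before = Kernel.builds
    for _ in range(n_calls):
        get_kernel()(1, 2, 3)
    return Kernel.builds - before


prg = Program()

# Pattern from the original benchmark: look up prg.sum on every call.
builds_without_reuse = count_builds(lambda: prg.sum)

# Pattern from the reply: hang on to the kernel object and reuse it.
sum_knl = prg.sum
builds_with_reuse = count_builds(lambda: sum_knl)

print(builds_without_reuse, builds_with_reuse)  # prints: 100 1
```

In this model the setup runs once per call when `prg.sum` is looked up each time, and only once in total when `sum_knl` is reused, which is the shape of the slowdown reported above. Note that `prg.sum` here is an attribute lookup with a side effect rather than a method call, so no parentheses are involved.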
