I've noticed that calling e.g. clmath._atan2(out, in1, in2, queue) with a
pre-allocated `out` array is nearly twice as fast as calling
clmath.atan2(in1, in2, queue), even when a memory pool is used to
allocate the result Array.
Consider the (simple) code here:
https://gist.github.com/hgomersall/d7a229df0f816388b63f
It defines the two test cases above as functions, which can be run
inside IPython as follows:
    In [1]: from clmath_test import *
    In [2]: timeit cl_test()
    1000 loops, best of 3: 639 µs per loop
    In [3]: timeit cl_test_preallocated()
    1000 loops, best of 3: 363 µs per loop
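For readers without the gist to hand, here is a minimal sketch of what the two cases might look like. It is not the gist's code. The array size, the float32 dtype, the ImmediateAllocator-backed pool, the queue.finish() calls and the best_of helper are all assumptions made for illustration. _atan2 is a private helper, so the calling convention used here (queue passed as a keyword) is also an assumption and may differ between PyOpenCL versions.

```python
import timeit


def best_of(fn, number=1000, repeat=3):
    """Best per-call time in seconds, in the style of IPython's timeit."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number


def build_cases(n=2**20):
    """Return (allocating, preallocated) callables.

    Needs numpy, pyopencl and a working OpenCL device. The size n is
    an assumed value, not taken from the gist.
    """
    import numpy as np
    import pyopencl as cl
    import pyopencl.array as cl_array
    import pyopencl.clmath as clmath
    import pyopencl.tools as cl_tools

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    # Memory pool, so repeated allocations can reuse freed buffers.
    pool = cl_tools.MemoryPool(cl_tools.ImmediateAllocator(queue))

    rng = np.random.default_rng(0)
    host1 = rng.standard_normal(n).astype(np.float32)
    host2 = rng.standard_normal(n).astype(np.float32)
    in1 = cl_array.to_device(queue, host1, allocator=pool)
    in2 = cl_array.to_device(queue, host2, allocator=pool)
    out = cl_array.empty_like(in1)

    def allocating():
        # Public API: a new result Array is created on every call.
        result = clmath.atan2(in1, in2, queue=queue)
        queue.finish()  # assumption: wait so the kernel time is included
        return result

    def preallocated():
        # ASSUMPTION: private helper, signature may change without notice.
        clmath._atan2(out, in1, in2, queue=queue)
        queue.finish()
        return out

    return allocating, preallocated


def main():
    allocating, preallocated = build_cases()
    print("atan2 : %.0f us per call" % (best_of(allocating) * 1e6))
    print("_atan2: %.0f us per call" % (best_of(preallocated) * 1e6))
```

Calling main() prints one timing per case. Without the queue.finish() calls the numbers would mostly reflect enqueue and Python-side overhead rather than kernel execution, so whether to keep them depends on what is being measured.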
Am I missing something here or is this expected behaviour?
Is _atan2 part of the stable API?
(This was on an NVIDIA machine. On my Intel laptop, I seem to run into
this bug:
https://bugs.launchpad.net/ubuntu/+source/pyopencl/+bug/1354086)
Cheers,
Henry