I've noticed that calling, e.g., clmath._atan2(out, in1, in2, queue) with a pre-allocated `out` array is nearly twice as fast as calling clmath.atan2(in1, in2, queue), even when a memory pool is used to allocate the result Array.

Consider the (simple) code here:
https://gist.github.com/hgomersall/d7a229df0f816388b63f

It defines each of the two cases above as its own function (cl_test and cl_test_preallocated), which can be run in IPython as follows:

In [1]: from clmath_test import *

In [2]: timeit cl_test()
1000 loops, best of 3: 639 µs per loop

In [3]: timeit cl_test_preallocated()
1000 loops, best of 3: 363 µs per loop
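
To make the pattern being compared concrete without needing an OpenCL device, here is a minimal host-side sketch using NumPy rather than PyOpenCL. It is only an analogy and not the code from the gist: the function names atan2_allocating and atan2_preallocated are made up for illustration, and np.arctan2 with its `out=` argument stands in for clmath.atan2 and clmath._atan2 respectively.

```python
import numpy as np


def atan2_allocating(in1, in2):
    # Hypothetical stand-in for clmath.atan2: a fresh result array
    # is allocated on every call.
    return np.arctan2(in1, in2)


def atan2_preallocated(out, in1, in2):
    # Hypothetical stand-in for clmath._atan2: the result is written
    # into a buffer the caller allocated once, up front.
    np.arctan2(in1, in2, out=out)
    return out


rng = np.random.default_rng(0)
in1 = rng.standard_normal(1 << 16).astype(np.float32)
in2 = rng.standard_normal(1 << 16).astype(np.float32)
out = np.empty_like(in1)  # allocated once, reused across calls

result_a = atan2_allocating(in1, in2)
result_b = atan2_preallocated(out, in1, in2)
```

Both paths compute the same values; the only difference is who owns the allocation of the result, which is the difference I am timing above on the device side.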

Am I missing something here or is this expected behaviour?

Is _atan2 part of the stable API?

(This was on an NVIDIA machine. On my Intel laptop, I seem to run into this bug:
https://bugs.launchpad.net/ubuntu/+source/pyopencl/+bug/1354086)

Cheers,

Henry

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
