Gregor Thalhammer <[email protected]> writes:
> On my MacBook Pro (running OS X 10.11) I have been plagued with nasty
> segfaults when using the NVIDIA 750M GPU and the high-level methods of
> pyopencl.array together with complex64 arrays. Ultimately it seems to
> be due to buggy NVIDIA OpenCL drivers on OS X, but I found a
> workaround:
>
> The typedef in pyopencl/cl/pyopencl-complex.h for cfloat_t (after macro
> expansion) as a
> "union {struct {float x,y;}; struct {float real, imag;}}"
> is too sophisticated. The same holds for a simpler "struct {float x,y;}".
> But the segfaults go away if I instead use "typedef float2 cfloat_t;" and
> then replace a.real by a.x (and likewise .imag -> .y). All tests pass
> (I could not test complex double). This is not beautiful, but it works
> for me.
>
> The docs state that the struct has been introduced to avoid silent
> bugs, e.g. complex + real not giving the expected result, so I
> understand that my workaround is not acceptable for a PR.

I would be happy to accept a pull request to this effect that, based on
some flag, replaces the struct-based definition of the complex number
with a vector-based one, to work around broken OpenCL implementations,
of which there appear to be many. The one requirement would be that the
struct-based implementation remains the default, and that all code
continues to work with it.
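
To make the shape of such a change concrete, here is a minimal sketch of
how a flag could select between the two definitions when the kernel
preamble is assembled. Everything named here is hypothetical: neither
build_complex_preamble, the use_vector flag, nor the CFLOAT_REAL and
CFLOAT_IMAG accessor macros exist in PyOpenCL, and the real header
differs. The only point is that shared code reaches the components
through accessors, so it compiles against either typedef while the
struct stays the default.

```python
# Hypothetical sketch, not PyOpenCL's actual API or header contents.

# Default: struct-based definition (keeps complex + real from compiling silently).
STRUCT_DEF = """
typedef struct { float real, imag; } cfloat_t;
#define CFLOAT_REAL(a) ((a).real)
#define CFLOAT_IMAG(a) ((a).imag)
"""

# Workaround: vector-based definition for implementations that choke on the struct.
VECTOR_DEF = """
typedef float2 cfloat_t;
#define CFLOAT_REAL(a) ((a).x)
#define CFLOAT_IMAG(a) ((a).y)
"""

# Shared code only touches components through the accessor macros,
# so it works unchanged with either definition above.
COMMON = """
inline cfloat_t cfloat_mul(cfloat_t a, cfloat_t b)
{
    cfloat_t r;
    CFLOAT_REAL(r) = CFLOAT_REAL(a) * CFLOAT_REAL(b) - CFLOAT_IMAG(a) * CFLOAT_IMAG(b);
    CFLOAT_IMAG(r) = CFLOAT_REAL(a) * CFLOAT_IMAG(b) + CFLOAT_IMAG(a) * CFLOAT_REAL(b);
    return r;
}
"""


def build_complex_preamble(use_vector=False):
    """Return OpenCL C source defining cfloat_t; struct-based unless use_vector is set."""
    definition = VECTOR_DEF if use_vector else STRUCT_DEF
    return definition + COMMON
```

Calling build_complex_preamble() yields the struct-based source, and
build_complex_preamble(use_vector=True) yields the float2-based one; the
returned string would be prepended to kernel source before compilation.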

Andreas


_______________________________________________
PyOpenCL mailing list
[email protected]
https://lists.tiker.net/listinfo/pyopencl
