Hello,

I have a question and hope you can help me. I am trying to find the bottleneck in my code, but I can't pin it down at the moment. For a while I thought it was the writes to global memory.

Right now I am using an early "return" statement in my code to skip parts of it, e.g. a for-loop. Now I am wondering whether this works at all. Could it be that the code effectively exits well before the "return" statement, because the compiler recognizes that the calculations done in a for-loop are never written to global memory or used anywhere else?

Kind regards,
Joe
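
P.S. To make the question concrete, here is a minimal sketch of the kind of setup I mean. It is not my actual kernel: the kernel name "probe", the EARLY_RETURN switch, the loop body and the two helper functions are all made up for illustration. The early return sits after the loop and before the only write to global memory, so in that variant the loop result is never used.

```python
# Hypothetical kernel, for illustration only; names and loop body are assumptions.
KERNEL_TEMPLATE = r"""
__global__ void probe(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = 0.0f;
    for (int k = 0; k < 1000; ++k)   // the loop I am trying to time
        acc += in[i] * k;

#if EARLY_RETURN
    return;          // acc is never stored in this variant
#endif
    out[i] = acc;    // the global-memory write I am trying to skip
}
"""


def kernel_source(early_return):
    """Return the CUDA source with the early return switched on or off."""
    return "#define EARLY_RETURN %d\n" % int(early_return) + KERNEL_TEMPLATE


def compile_variant(early_return):
    """Compile one variant with PyCUDA (needs pycuda and a CUDA device)."""
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    from pycuda.compiler import SourceModule

    return SourceModule(kernel_source(early_return)).get_function("probe")
```

The idea would be to time compile_variant(True) against compile_variant(False). My worry is that in the first variant the compiler may drop the loop as well, so the timing would tell me nothing about the loop itself.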
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda