On Mon, 21 Mar 2011 19:55:31 +0100, Magnus Paulsson <paulsso...@gmail.com> wrote:
> > Wild theory: Maybe the print statements introduce GPU synchronization?
> > Does your observation change with multiple loops through the code?
> >
> > Also note that the profiler won't help you debug overlap. If it is
> > active, all GPU activity is synchronous.
> >
> > Andreas
>
> No. None of the above. The "Working.py" code runs overlapping using
> the profiler including print statements.

CUDA 4.0 programming guide, 3.2.5.1:

"When an application is run via a CUDA debugger or profiler (cuda-gdb,
CUDA Visual Profiler, Parallel Nsight), all launches are synchronous."

(and that sentence has been around for a few versions)

Either you are or that sentence is wrong. :)

Andreas

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda