On Mon, 21 Mar 2011 19:02:21 +0100, Magnus Paulsson <paulsso...@gmail.com> wrote:
> > If by 'working' you mean 'actually overlapping', here's an additional
> > subtlety. If 'exec' includes any kind of memory allocations, those are
> > implicitly synchronization points--so you might be synchronizing without
> > even seeing it. A memory pool would be a good solution for that (but
> > would only help on the second run through).
> >
>
> pyFFT (and my toy code) only allocate memory at the start. Otherwise
> we would not see overlap in the "Working.py".
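To illustrate why a pool only pays off on the second pass, here is a toy model in plain Python. It is a sketch, not PyCUDA code: ToyMemoryPool, run_once and the block sizes are made up for illustration. It counts how often the "raw" allocator (the stand-in for the device allocation that acts as an implicit synchronization point) gets called. In real PyCUDA code the counterpart would be pycuda.tools.DeviceMemoryPool, handed to GPUArray constructors via allocator=pool.allocate.

```python
# Toy model (NOT PyCUDA code): a caching allocator that keeps freed blocks
# in per-size bins, so later requests of the same size are served from the
# cache instead of calling the (synchronizing) raw allocator again.
from collections import defaultdict


class ToyMemoryPool:
    def __init__(self):
        self._free = defaultdict(list)  # size in bytes -> cached blocks
        self._next_id = 0
        self.raw_allocations = 0  # how often the "real" allocator was hit

    def _raw_alloc(self, nbytes):
        # Stand-in for a real device allocation; on a GPU this is where
        # an implicit synchronization point would occur.
        self.raw_allocations += 1
        self._next_id += 1
        return (self._next_id, nbytes)

    def allocate(self, nbytes):
        cached = self._free[nbytes]
        if cached:
            return cached.pop()  # reuse a cached block, no raw allocation
        return self._raw_alloc(nbytes)

    def free(self, block):
        # Keep the block for reuse instead of releasing it.
        self._free[block[1]].append(block)


def run_once(pool, sizes):
    """Allocate one buffer per size (all live at once), then return them."""
    blocks = [pool.allocate(n) for n in sizes]
    for b in blocks:
        pool.free(b)


pool = ToyMemoryPool()
sizes = [1 << 20, 1 << 20, 4 << 20]  # hypothetical buffer sizes

run_once(pool, sizes)
first = pool.raw_allocations

run_once(pool, sizes)
second = pool.raw_allocations - first

print(first, second)  # 3 0
```

The first run hits the raw allocator three times (the two 1 MiB buffers are live at the same time, so they cannot share a block); the second run is served entirely from the cache, which is exactly the "only helps on the second run through" caveat.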
Wild theory: Maybe the print statements introduce GPU synchronization?

Does your observation change with multiple loops through the code?

Also note that the profiler won't help you debug overlap. If it is
active, all GPU activity is synchronous.

Andreas
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda