On Mon, 21 Mar 2011 19:02:21 +0100, Magnus Paulsson <paulsso...@gmail.com> 
wrote:
> > If by 'working' you mean 'actually overlapping', here's an additional
> > subtlety. If 'exec' includes any kind of memory allocations, those are
> > implicitly synchronization points--so you might be synchronizing without
> > even seeing it. A memory pool would be a good solution for that (but
> > would only help on the second run through).
> >
> 
> pyFFT (and my toy code) only allocate memory at the start. Otherwise
> we would not see overlap in the "Working.py".

Wild theory: Maybe the print statements introduce GPU synchronization?
Does your observation change with multiple loops through the code?

Also note that the profiler won't help you debug overlap. If it is
active, all GPU activity is synchronous.
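
A minimal sketch of one way to check both points: loop several times, and keep every print out of the timed region. The helper `time_iterations` and its `launch` and `synchronize` arguments are hypothetical names, not part of PyCUDA or pyFFT, and the stand-ins at the bottom are CPU-only so the sketch runs without a GPU.

```python
import time


def time_iterations(launch, synchronize, n_iter=5):
    """Run launch() n_iter times; return wall-clock seconds per iteration.

    launch      -- hypothetical callable that enqueues the asynchronous work
    synchronize -- hypothetical callable that blocks until that work is done
    """
    timings = []
    for _ in range(n_iter):
        start = time.perf_counter()
        launch()       # queue the work; ideally this returns immediately
        synchronize()  # wait once, at the end of the iteration
        timings.append(time.perf_counter() - start)
    # Nothing is printed inside the loop, so output cannot affect the timing.
    return timings


if __name__ == "__main__":
    # CPU-only stand-ins (assumptions) in place of real GPU work.
    results = time_iterations(lambda: time.sleep(0.01), lambda: None, n_iter=3)
    for i, seconds in enumerate(results):
        print("iteration %d: %.4f s" % (i, seconds))
```

With PyCUDA, `launch` would be whatever queues your kernels and async copies on their streams, and `synchronize` could be `pycuda.driver.Context.synchronize`. If the first iteration is slow and later ones are not, one-time setup costs are a likely suspect rather than the prints.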

Andreas


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
