On Mon, 21 Mar 2011 17:43:44 +0100, Magnus Paulsson <paulsso...@gmail.com> wrote:
> > Looking at your streams.py code, I'm wondering why you're expecting
> > things to run in parallel if you're synchronizing with both stream1 and
> > stream2 after you're done with each of them? Wouldn't that explicitly
> > prevent any parallelism between them?
> >
> > What am I missing?
> >
> > Andreas
>
> The synchronization was made to run the two streams as follows:
> [start new task 1 : sync 2 : start new task 2 : sync 1] repeat
>
> Notice that I did not sync with the newly started task but with the
> previous one.
>
> To make things clearer, I attach two short tests: one runs with
> overlapping mem-copy and exec, the other doesn't.
> This is easiest to see in the Compute Visual Profiler (turn off most
> of the data collection so it only runs once; otherwise it runs 12 times).
>
> notWorking:
> stream 1: put
> stream 1: exec
> stream 1: get
> stream 2: put
> stream 2: exec
> stream 2: get
>
> Working:
> stream 1: put
> stream 1: exec
> stream 2: put
> stream 1: get
> stream 2: exec
> stream 2: get
>
> So it seems that CUDA does not work the way I was expecting it to. Can
> someone explain why CUDA does not start the stream 2 put operation
> before the stream 1 get operation in notWorking.py?
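[Editor's sketch] One way to see why the notWorking issue order serializes is a toy timing model. The assumption (true of the single-DMA-engine GPUs of this era) is that there is one compute engine and one copy engine, each draining its hardware queue strictly in issue order, while ops in the same stream must also run in stream order. Stream 2's first put then sits in the copy queue behind stream 1's get, which itself must wait for stream 1's exec. The op lists and function name below are made up for the sketch, not from the attached scripts:

```python
def schedule(ops, duration=1.0):
    """ops: list of (stream, engine) pairs in issue order.

    Each engine is a FIFO: an op cannot start before earlier ops queued
    on the same engine finish, nor before earlier ops of its own stream.
    Returns the makespan, assuming every op takes `duration`.
    """
    engine_free = {}   # time at which each engine can start its next op
    stream_done = {}   # time at which the last op of each stream finishes
    for stream, engine in ops:
        start = max(engine_free.get(engine, 0.0), stream_done.get(stream, 0.0))
        finish = start + duration
        engine_free[engine] = finish
        stream_done[stream] = finish
    return max(stream_done.values())

# put/get both go to the single "copy" engine, exec to the "exec" engine
not_working = [(1, "copy"), (1, "exec"), (1, "copy"),
               (2, "copy"), (2, "exec"), (2, "copy")]
working     = [(1, "copy"), (1, "exec"), (2, "copy"),
               (1, "copy"), (2, "exec"), (2, "copy")]

print(schedule(not_working))  # 6.0 -- fully serialized
print(schedule(working))      # 4.0 -- copies overlap with exec
```

In the notWorking order, stream 2's put has no data dependency on stream 1, yet the model makes it wait until t=3 simply because it was enqueued on the copy engine after stream 1's get; reordering the issue sequence, as in the Working variant, is exactly what unblocks it.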
If by 'working' you mean 'actually overlapping', here's an additional subtlety: if 'exec' includes any kind of memory allocation, those allocations are implicit synchronization points -- so you might be synchronizing without even seeing it. A memory pool would be a good solution for that (but would only help from the second run through onward).

If, however, 'not working' means 'wrong results', then something's even more fishy.

Andreas
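[Editor's sketch] The memory-pool suggestion can be sketched with PyCUDA's `DeviceMemoryPool`: route device allocations through the pool so that, after the first iteration pays for a real cudaMalloc, later iterations reuse returned blocks and the loop body contains no allocation (and hence no hidden synchronization). This needs a CUDA device and pycuda installed; the array size and loop are illustrative only:

```python
import numpy as np
import pycuda.autoinit            # noqa: F401 -- creates a CUDA context
import pycuda.gpuarray as gpuarray
from pycuda.tools import DeviceMemoryPool

pool = DeviceMemoryPool()
a = np.random.randn(1024).astype(np.float32)

for _ in range(3):
    # first iteration does a real cudaMalloc; subsequent ones reuse the
    # pooled block, so no driver-level allocation happens in the loop
    a_gpu = gpuarray.to_gpu(a, allocator=pool.allocate)
    del a_gpu                     # block returns to the pool, not the driver
```

This matches the caveat above: the pool only helps from the second pass onward, once freed blocks are available for reuse.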
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda