On Mon, 21 Mar 2011 17:43:44 +0100, Magnus Paulsson <paulsso...@gmail.com> 
wrote:
> > Looking at your streams.py code, I'm wondering why you're expecting
> > things to run in parallel if you're synchronizing with both stream1 and
> > stream2 after you're done with each of them? Wouldn't that explicitly
> > prevent any parallelism between them?
> >
> > What am I missing?
> >
> > Andreas
> 
> The synchronization was done to run the two streams as follows:
> [start new task 1 : sync 2 : start new task 2 : sync 1] repeat
> 
> Notice that I did not sync with the newly started task but with the
> previous one.
> 
> 
> To make things clearer, I've attached two short tests: one overlaps the
> mem-copy and exec, the other doesn't. The difference is easiest to see
> in the Compute Visual Profiler (turn off most of the data collection so
> it only runs once; otherwise it runs 12 times).
> 
>   notWorking:
> stream 1: put
> stream 1: exec
> stream 1: get
> stream 2: put
> stream 2: exec
> stream 2: get
> 
> 
>   Working:
> stream 1: put
> stream 1: exec
> stream 2: put
> stream 1: get
> stream 2: exec
> stream 2: get
> 
> 
> So it seems that CUDA does not work the way I was expecting it to. Can
> someone explain why CUDA does not start the stream 2 put operation
> before the stream 1 get operation in the notWorking.py?

If by 'working' you mean 'actually overlapping', here's an additional
subtlety. If 'exec' includes any kind of memory allocations, those are
implicit synchronization points--so you might be synchronizing without
even seeing it. A memory pool would be a good solution for that (but
would only help on the second run through).

If however 'not working' means 'wrong results', then something's even
more fishy.
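
For what it's worth, the two schedules above can be mimicked with a toy
model (plain Python, no PyCUDA or GPU required). It assumes one copy
engine handling put/get and one compute engine handling exec, each
draining its queue strictly in issue order, with every operation also
waiting for the previous operation in its own stream. These are
simplifying assumptions for illustration, not exact CUDA semantics:

```python
# Toy model: one copy engine (put/get) and one compute engine (exec),
# each running its operations FIFO in issue order; each operation also
# waits for the previous operation in its own stream. Simplifying
# assumptions only -- not a faithful model of the CUDA hardware queues.

def simulate(issue_order, dur=1.0):
    """Return (stream, op, start, end) tuples for each issued operation."""
    engine_free = {"copy": 0.0, "exec": 0.0}  # when each engine goes idle
    stream_free = {}                          # when each stream's last op ends
    timeline = []
    for stream, op in issue_order:
        engine = "exec" if op == "exec" else "copy"
        start = max(engine_free[engine], stream_free.get(stream, 0.0))
        end = start + dur
        engine_free[engine] = end
        stream_free[stream] = end
        timeline.append((stream, op, start, end))
    return timeline

def makespan(timeline):
    return max(end for _, _, _, end in timeline)

not_working = [(1, "put"), (1, "exec"), (1, "get"),
               (2, "put"), (2, "exec"), (2, "get")]
working     = [(1, "put"), (1, "exec"), (2, "put"),
               (1, "get"), (2, "exec"), (2, "get")]

print(makespan(simulate(not_working)))  # 6.0 -- fully serialized
print(makespan(simulate(working)))      # 4.0 -- copies overlap compute
```

Under this model, stream 2's put in the notWorking order queues behind
stream 1's get on the copy engine, which is consistent with what the
profiler shows for the real runs.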

Andreas


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
