On Wed, Apr 04, 2012 at 04:35:21PM +0300, Tomi Pieviläinen wrote:
> Hi all,
>
> after profiling my script, pstats shows that a significant amount of
> time is spent in a function that only calls
> pycuda.autoinit.context.synchronize() and a few CUDA kernel calls
> depending on a setting (different calls in branches of an if).
>
> Is it really possible that most of the time is spent processing that
> if, or are syncing or GPU function calls somehow skipped in cProfile?
> The relevant line from pstats is
>
> 239528 1065.698 0.004 1089.941 0.005 translocation.py:308(forces)

Well, I ran it through line_profiler, and it seems that 99% of the
time is spent in the synchronization calls. Weird that the
__syncthreads() calls within the kernel code don't cause much delay
(I'm running only one block, so they should be equivalent, right?).

-- 
Tomi Pieviläinen, +358 400 487 504
A: Because it disrupts the natural way of thinking.
Q: Why is top posting frowned upon?
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda