Thanks; I'd seen the note about --cuda-trace in the old release notes a while ago, but had since forgotten about it entirely. Cool.
Here's what I'm seeing:

    cuParamSetv (mykernel)
    cuParamSetSize (mykernel)
    cuMemcpyHtoD
    cuMemcpyHtoD
    cuMemcpyHtoD
    cuMemcpyHtoD
    cuMemcpyHtoD
    cuMemcpyHtoD
    cuMemcpyHtoD
    cuParamSetTexRef (mykernel)
    cuParamSetTexRef (mykernel)
    cuLaunchGrid (mykernel)
    cuCtxSynchronize

And then the active threads are:

    a 1 <_MainThread(MainThread, started 140345423361824)>
    a 1 <CudaThread(Thread-25, started 140345099409152)>
    ...

Does this mean that the cuCtxSynchronize function was *called* or that it
*returned*? I'm assuming it was called and is now blocking on the kernel,
but I want to make sure before I dive in. (I've also pasted a sketch of
the per-thread setup and logging I'm using below the quoted thread, in
case it helps.)

Thanks for all of the help.

Eli

On Fri, Mar 9, 2012 at 12:37 PM, Andreas Kloeckner
<li...@informa.tiker.net> wrote:
> On Fri, 9 Mar 2012 10:42:50 -0800, "Eli Stevens (Gmail)"
> <wickedg...@gmail.com> wrote:
>> Thanks for all of the pointers. I'm going to hold off on MPI for now;
>> I'm hesitant to add additional dependencies unless they're really
>> needed.
>>
>> The threading solution as outlined here:
>>
>> http://stackoverflow.com/questions/5904872/python-multiprocessing-with-pycuda
>>
>> seems to be working well, except for the following oddity:
>>
>> - When I run my test suite (which now uses the threading approach) in
>>   sections, every test passes.
>> - When I run the entire set of tests for this feature, around thread
>>   20 or 30 the thread spun up to wrap the kernel call never finishes
>>   (i.e. Thread.isAlive() is always True; join never returns). Changing
>>   the order the tests run in changes which test this happens in; I
>>   haven't yet determined whether the failing test is arbitrary but
>>   deterministic (it seems like it is, but my sample of test runs is
>>   small so far) or random.
>>
>> The main thread seems to be fine, but the CUDA wrapper thread is a
>> mystery beyond the fact that it reaches the log message right before
>> the pycuda.driver.Function.__call__ and never reaches the one after
>> (I only started debugging last night, so I haven't dug in much yet).
>>
>> From a quick perusal of the Python source, it doesn't seem like
>> there's a Python logger for PyCUDA internals; what's the recommended
>> way to tell what's going on in PyCUDA outside of the kernel? I'd like
>> to know whether it's making it into and/or out of the kernel before
>> turning on the CUDA debugger (since there are maybe 45 kernel calls
>> before the hang, and it already takes maybe 5 minutes going full tilt
>> to get there).
>>
>> Would you be interested in a pull request that added logging at the
>> Python level?
>
> You can build PyCUDA with CUDA API tracing (CUDA_TRACE = True). That
> should do most of what you want. You might need to have it dump out the
> thread id along with the API call to make sense of it all, but in
> principle I think that might give you what you want.
>
> Andreas
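P.S. In case it's useful to anyone reading the archive later: the
per-thread context pattern I'm using is roughly the sketch below. The
kernel and the CudaThread internals here are placeholders rather than my
actual code (and they're not part of PyCUDA itself); the point is just
that each thread creates, uses, and pops its own context.

    import threading
    import numpy
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    class CudaThread(threading.Thread):
        # Placeholder version of my wrapper thread: one kernel call per
        # thread, with the context created and released inside run().
        def __init__(self, device_num, input_array):
            threading.Thread.__init__(self)
            self.device_num = device_num
            self.input_array = input_array
            self.result = None

        def run(self):
            dev = cuda.Device(self.device_num)
            ctx = dev.make_context()      # context belongs to this thread
            try:
                mod = SourceModule("""
                    __global__ void double_it(float *a)
                    { a[threadIdx.x] *= 2.0f; }
                    """)
                func = mod.get_function("double_it")
                a_gpu = cuda.to_device(self.input_array)
                func(a_gpu, block=(self.input_array.size, 1, 1))
                self.result = cuda.from_device_like(a_gpu, self.input_array)
            finally:
                ctx.pop()                 # must happen in the same thread

    cuda.init()
    t = CudaThread(0, numpy.ones(32, dtype=numpy.float32))
    t.start()
    t.join()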
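And here's the Python-level logging I've wrapped around the launch so the
thread name lands next to each message. This is just my own helper around
pycuda.driver.Function.__call__, not anything built into PyCUDA (the
CUDA_TRACE output itself doesn't go through it):

    import logging

    # %(threadName)s is what lets me line these messages up with the
    # CudaThread that produced the surrounding CUDA_TRACE output.
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(threadName)s %(message)s")
    log = logging.getLogger("cuda-wrapper")

    def call_kernel(func, *args, **kwargs):
        # func is a pycuda.driver.Function; this is the wrapper whose
        # "before" message I see and whose "after" message I never do.
        log.debug("about to launch %r", func)
        func(*args, **kwargs)
        log.debug("launch returned")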