Thanks; I'd seen the note about --cuda-trace in old release notes a
while ago, but had since forgotten about it entirely.  Cool.

Here's what I'm seeing:

cuParamSetv (mykernel)
cuParamSetSize (mykernel)
cuMemcpyHtoD
cuMemcpyHtoD
cuMemcpyHtoD
cuMemcpyHtoD
cuMemcpyHtoD
cuMemcpyHtoD
cuMemcpyHtoD
cuParamSetTexRef (mykernel)
cuParamSetTexRef (mykernel)
cuLaunchGrid (mykernel)
cuCtxSynchronize

And then the active threads are:

a 1 <_MainThread(MainThread, started 140345423361824)>
a 1 <CudaThread(Thread-25, started 140345099409152)>
...
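(For anyone following along: a listing like the one above can be produced
with the stdlib's threading.enumerate().  This is just a sketch of how I
dump mine; the "a 1" prefix is my own debug tag, not anything from PyCUDA.)

```python
import threading

def dump_threads():
    # Print every live thread; a hung CUDA wrapper thread keeps showing
    # up here long after join() should have returned.
    for t in threading.enumerate():
        print("a", 1 if t.is_alive() else 0, repr(t))

dump_threads()
```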

Does this mean that the cuCtxSynchronize function was *called* or that
it *returned*?  I'm assuming that it was called and is now blocking on
the kernel, but I want to make sure before I dive in.
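In the meantime I've been bracketing the suspect call with entry/exit
markers from the calling thread, so a trace line with no matching
"returned" line means the call is still blocking.  A minimal sketch
(time.sleep stands in for the real blocking call, e.g. a
ctx.synchronize() in PyCUDA; the traced() helper is my own, not part of
any library):

```python
import threading
import time

def traced(name, fn, *args, **kwargs):
    # Log on entry and on return, tagged with the caller's thread id,
    # so call-vs-return is unambiguous even with several threads.
    tid = threading.get_ident()
    print("[%d] %s called" % (tid, name))
    result = fn(*args, **kwargs)
    print("[%d] %s returned" % (tid, name))
    return result

# Placeholder for the real blocking call (e.g. ctx.synchronize()).
traced("cuCtxSynchronize", time.sleep, 0.01)
```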

Thanks for all of the help.

Eli

On Fri, Mar 9, 2012 at 12:37 PM, Andreas Kloeckner
<li...@informa.tiker.net> wrote:
> On Fri, 9 Mar 2012 10:42:50 -0800, "Eli Stevens (Gmail)" 
> <wickedg...@gmail.com> wrote:
>> Thanks for all of the pointers.  I'm going to hold off on MPI for now;
>> I'm hesitant to add additional dependencies unless they're really
>> needed.
>>
>> The threading solution as outlined here:
>>
>> http://stackoverflow.com/questions/5904872/python-multiprocessing-with-pycuda
>>
>> seems to be working well, except for the following oddity:
>>
>> - When I run my test suite (which is now using the threading approach)
>> in sections, every test passes.
>> - When I run the entire set of tests for this feature, around the
>> 20th or 30th thread, the thread spun up to wrap the kernel call never
>> finishes (i.e. Thread.is_alive() is always true; join() never
>> returns).  Changing the order that the
>> tests are run in changes what test this happens in; I haven't yet
>> determined if the failing test is arbitrary but deterministic (it
>> seems like it is, but the sample size of test runs is small so far) or
>> random.
>>
>> The main thread seems to be fine, but the cuda wrapper thread is a
>> mystery beyond that it's getting to the log message right before the
>> pycuda.driver.Function.__call__ and not to the one after (just started
>> debugging last night, so I haven't dug in a huge amount yet).
>>
>> From a quick perusal of the python source, it doesn't seem like
>> there's a python logger for pycuda internals; what's the recommended
>> way to tell what's going on in pycuda outside of the kernel?  I'd like
>> to know if it's making it into and/or out of the kernel before
>> turning on the cuda debugger (since there are maybe 45 kernel calls
>> before the hang, and it already takes maybe 5 minutes when going full
>> tilt to get there).
>>
>> Would you be interested in a pull request that added logging at the
>> python level?
>
> You can build PyCUDA with CUDA API tracing (CUDA_TRACE = True). That
> should do most of what you want. You might need to have it dump out the
> thread id along with the API call to make sense of it all, but in
> principle I think that might give you what you want.
>
> Andreas

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
