Thanks for all of the pointers.  I'm going to hold off on MPI for now;
I'm hesitant to add additional dependencies unless they're really
needed.

The threading solution as outlined here:

http://stackoverflow.com/questions/5904872/python-multiprocessing-with-pycuda

seems to be working well, except for the following oddity:

- When I run my test suite (which is now using the threading approach)
in sections, every test passes.
- When I run the entire set of tests for this feature, somewhere
around thread 20 or 30 the thread spun up to wrap the kernel call
never finishes (i.e. Thread.is_alive() stays true and join never
returns).  Changing the order the tests run in changes which test
this happens in; I haven't yet determined whether the failing test is
arbitrary but deterministic (it seems to be, though my sample of test
runs is small so far) or random.

The main thread seems to be fine, but the CUDA wrapper thread is a
mystery beyond the fact that it reaches the log message right before
the pycuda.driver.Function.__call__ and never reaches the one after
(I just started debugging last night, so I haven't dug in much yet).
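For concreteness, here's roughly the shape of what each test does (a
stripped-down sketch with made-up names and a toy kernel, not the real
test code): each worker thread makes its own context on its device and
runs one kernel call, and the main thread joins with a timeout and
checks is_alive() to spot the hang.

    import threading

    import numpy as np
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    cuda.init()  # initialize the driver once, in the main thread


    class KernelThread(threading.Thread):
        """Hypothetical worker: owns its own CUDA context, runs one kernel."""

        def __init__(self, device_id, host_data):
            super(KernelThread, self).__init__()
            self.device_id = device_id
            self.host_data = host_data
            self.result = None

        def run(self):
            # Each thread creates, uses, and pops its own context.
            ctx = cuda.Device(self.device_id).make_context()
            try:
                mod = SourceModule("""
                    __global__ void double_it(float *a)
                    { a[threadIdx.x] *= 2.0f; }
                    """)
                double_it = mod.get_function("double_it")

                gpu_data = cuda.mem_alloc(self.host_data.nbytes)
                cuda.memcpy_htod(gpu_data, self.host_data)
                double_it(gpu_data, block=(int(self.host_data.size), 1, 1))

                self.result = np.empty_like(self.host_data)
                # memcpy_dtoh is synchronous, so it waits for the kernel.
                cuda.memcpy_dtoh(self.result, gpu_data)
            finally:
                ctx.pop()


    data = np.arange(32, dtype=np.float32)
    worker = KernelThread(device_id=0, host_data=data)
    worker.start()
    worker.join(timeout=30.0)  # bounded wait instead of blocking forever
    if worker.is_alive():
        print("kernel thread appears hung")  # the symptom around thread 20-30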

From a quick perusal of the Python source, it doesn't seem like
there's a Python logger for pycuda internals; what's the recommended
way to tell what's going on in pycuda outside of the kernel?  I'd like
to know whether it's making it into and/or out of the kernel before
turning on the CUDA debugger (there are maybe 45 kernel calls before
the hang, and it already takes about 5 minutes at full tilt to get
there).
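In the meantime I've just been wrapping the call sites by hand,
roughly like this (a stopgap sketch using the standard logging module;
the logger name and helper are mine, not anything that exists in
pycuda):

    import logging

    log = logging.getLogger("pycuda.shim")  # made-up logger name


    def logged_kernel_call(func, *args, **kwargs):
        """Log entry and exit around a pycuda.driver.Function call, so a
        hang can be pinned down to 'never launched' vs. 'never returned'."""
        log.debug("launching %r  block=%r grid=%r",
                  func, kwargs.get("block"), kwargs.get("grid"))
        result = func(*args, **kwargs)
        log.debug("launch of %r returned", func)
        return result

Called as, e.g., logged_kernel_call(double_it, gpu_data, block=(32, 1, 1)).
Since the launch is normally asynchronous, following it with a logged
pycuda.driver.Context.synchronize() also separates "the launch
returned" from "the kernel actually finished".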

Would you be interested in a pull request that added logging at the
Python level?

Cheers,
Eli

On Thu, Mar 8, 2012 at 3:18 PM, Andreas Kloeckner
<li...@informa.tiker.net> wrote:
> On Thu, 8 Mar 2012 16:04:49 -0600, David Mertens <dcmertens.p...@gmail.com> 
> wrote:
>> I never had the pleasure of working with multiple GPUs at once, but I
>> nonetheless have given some thought to how I might handle this sort of
>> situation. My plan would have been to use MPI to launch one process for
>> each GPU and shuttle the data between processes using Numpy's MPI bindings.
>> On Ubuntu, at least, it's possible to run MPI with multiple processes on a
>> single processor.
>>
>> Hope that helps, if you can't solve your original problem.
>
> I've even done that with PyCUDA [1], and it works as well as one might expect.
>
> [1] https://github.com/inducer/hedge
>
> Andreas
>
