Hi,

Thank you for the response.
As a follow-up question: I was looking at the underlying code for `ipc_mem_handle`, and it seems that when a handle is deleted, it does a mem_free on the underlying device pointer. So could there not be a situation as follows:

1. Process A allocates gpuarray G and passes IPC handle H to Process B.
2. Process B unpacks H, does something with it, then lets it go out of scope, so H is closed and the underlying GPU memory is freed.
3. Process A then tries to do something with G, but finds that its memory has already been freed.

-Alex

On Fri, May 1, 2015 at 9:34 AM, Andreas Kloeckner <li...@informa.tiker.net> wrote:

> Alex Park <a...@nervanasys.com> writes:
> > Hi,
> >
> > I'm trying to use multiple GPUs with MPI and IPC handles, instead of the
> > built-in MPI primitives, for p2p communication.
> >
> > I think I'm not quite understanding how contexts should be managed. For
> > example, I have two versions of a toy example to try out accessing data
> > between nodes via IPC handle. Both seem to work, in the sense that process
> > 1 can 'see' the data from process 0, but the first version completes
> > without any error, while the second version generates the following error:
> >
> >     PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> >     cuMemFree failed: invalid value
> >
> > The two versions are attached below. Would appreciate any insight as to
> > what I'm doing wrong.
>
> Context management in CUDA is a bit of a mess. In particular, resources
> in a given context cannot be freed if that context isn't (or can't be
> made) the active context in the current thread. Your Context.pop atexit
> makes sure that PyCUDA can select contexts when it tries to do clean-up,
> which may (because of MPI) run in a different thread than the one that
> does the bulk of the work.
>
> That's my best guess as to what's going on. tl;dr: The Context.pop() is
> the main difference.
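To make the hazard in steps 1-3 concrete, here is a CPU-only toy model of the ownership bug being asked about. No GPU or PyCUDA is needed; `DeviceAllocation` and `IpcHandle` are made-up stand-ins, not PyCUDA classes, and the `close()` body deliberately models the *feared* behavior (freeing the owner's memory) rather than correct behavior:

```python
class DeviceAllocation:
    """Stand-in for a GPU allocation owned by process A."""
    def __init__(self):
        self.freed = False

    def free(self):
        self.freed = True


class IpcHandle:
    """Stand-in for an unpacked IPC handle held by process B.

    The bug being illustrated: deletion frees the *underlying*
    allocation, which process A still considers live.
    """
    def __init__(self, alloc):
        self.alloc = alloc

    def close(self):
        # Wrong: this should only unmap B's view of the memory,
        # not free A's allocation outright.
        self.alloc.free()

    def __del__(self):
        self.close()


# Step 1: process A allocates G and "passes" handle H to process B.
G = DeviceAllocation()
H = IpcHandle(G)

# Step 2: process B lets H go out of scope; H is closed.
del H

# Step 3: process A finds G's memory already freed out from under it.
assert G.freed  # use-after-free hazard
```

For what it's worth, at the CUDA driver level `cuIpcCloseMemHandle` only unmaps the importing process's mapping and does not free the exporting process's allocation, so the concern here is specifically whether PyCUDA's handle cleanup calls `mem_free` instead of the close call.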
>
> Andreas
> --

Alex Park, PhD
Engineer
Making machines smarter.
Nervana Systems | nervanasys.com | (617) 283-6951
6440 Lusk Blvd. #D211, San Diego, CA 92121
2483 Old Middlefield Way #203, Mountain View, CA 94043
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda