Hi,

Thank you for the response.

As a follow-up question: I was looking at the underlying code for
ipc_mem_handle, and it seems that when a handle is deleted, it does a
mem_free on the underlying device pointer.

So couldn't that lead to a situation like the following (sketched in code
after the list)?

1.  Process A allocates gpuarray G and passes IPC handle H to Process B
2.  Process B unpacks H, does something with it, then lets it go out of
scope, so H is closed and the underlying GPU memory is freed
3.  Process A then tries to do something with G, only to find that its
memory has already been freed
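
To make the concern concrete, here is a rough sketch of that sequence,
using mpi4py to ship the handle between ranks. The mem_get_ipc_handle /
IPCMemoryHandle names and argument forms are as I read them from the
wrapper and may not match the actual API exactly:

import atexit

import numpy as np
from mpi4py import MPI

import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

cuda.init()
ctx = cuda.Device(rank).make_context()
atexit.register(ctx.pop)  # per the earlier part of this thread

if rank == 0:
    # Step 1: Process A allocates G and exports IPC handle H for its pointer.
    G = gpuarray.to_gpu(np.arange(16, dtype=np.float32))
    H = cuda.mem_get_ipc_handle(G.gpudata)  # assumed to return the raw handle bytes
    comm.send(H, dest=1)

    comm.Barrier()  # B has already opened and dropped its handle by now
    # Step 3: if closing B's handle did a mem_free on the underlying pointer,
    # this would touch memory that has already been released.
    print(G.get())
else:
    # Step 2: Process B opens H, reads through it, then lets it go out of scope.
    H = cuda.IPCMemoryHandle(comm.recv(source=0))
    view = gpuarray.GPUArray((16,), np.float32, gpudata=H)
    print(view.get())
    del view, H  # handle deleted here -- is A's allocation freed as a side effect?
    comm.Barrier()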

-Alex


On Fri, May 1, 2015 at 9:34 AM, Andreas Kloeckner <li...@informa.tiker.net>
wrote:

> Alex Park <a...@nervanasys.com> writes:
>
> > Hi,
> >
> > I'm trying to use multiple GPUs with MPI and IPC handles instead of the
> > built-in MPI primitives for p2p communication.
> >
> > I think I'm not quite understanding how contexts should be managed.  For
> > example, I have two versions of a toy example to try out accessing data
> > between nodes via an IPC handle.  Both seem to work, in the sense that
> > process 1 can 'see' the data from process 0, but the first version
> > completes without any error, while the second version generates the
> > following error:
> >
> > PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> > cuMemFree failed: invalid value
> >
> > The two versions are attached below.  Would appreciate any insight as to
> > what I'm doing wrong.
>
> Context management in CUDA is a bit of a mess. In particular, resources
> in a given context cannot be freed if that context isn't (or can't be
> made) the active context in the current thread. Your Context.pop atexit
> makes sure that PyCUDA can select contexts when it tries to do clean-up,
> which may (because of MPI) run in a different thread than the one that
> does the bulk of the work.
>
> That's my best guess as to what's going on. tl;dr: The Context.pop() is
> the main difference.
>
> Andreas
>
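
For anyone reading along, the cleanup pattern being discussed above is
roughly the following sketch; as I understand Andreas's explanation, popping
the context at exit is what lets PyCUDA make the context current again from
whichever thread ends up running the deferred frees:

import atexit
import pycuda.driver as cuda

cuda.init()
ctx = cuda.Device(0).make_context()

# If the context stays current only on the main thread, clean-up running in
# another thread can fail with "cuMemFree failed: invalid value".  Popping it
# here lets PyCUDA select the context itself when it frees resources.
atexit.register(ctx.pop)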



-- 
*Alex Park, PhD *
*Engineer*

*Making machines smarter.*
*Nervana Systems* | nervanasys.com | (617) 283-6951
6440 Lusk Blvd. #D211, San Diego, CA 92121
2483 Old Middlefield Way #203, Mountain View, CA 94043
