Alex Park <a...@nervanasys.com> writes:

> Hi,
>
> I'm trying to use multiple GPUs with MPI and IPC handles instead of the
> built-in MPI primitives for p2p communication.
>
> I think I'm not quite understanding how contexts should be managed.  For
> example, I have two versions of a toy example to try out accessing data
> between nodes via ipc handle.  Both seem to work, in the sense that process
> 1 can 'see' the data from process 0, but the first version completes
> without any error, while the second version generates the following error:
>
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: invalid value
>
> The two versions are attached below.  Would appreciate any insight as to
> what I'm doing wrong.

Context management in CUDA is a bit of a mess. In particular, resources
in a given context cannot be freed if that context isn't (or can't be
made) the active context in the current thread. Your atexit-registered
Context.pop makes sure that PyCUDA can select contexts when it tries to
do clean-up, which may (because of MPI) run in a different thread than
the one that does the bulk of the work.

That's my best guess as to what's going on. tl;dr: The Context.pop() is
the main difference.
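
For reference, here is a minimal sketch of the "pop the context at exit"
pattern. The attachments aren't reproduced here, so this is an
assumption about what the working version looks like, not a copy of it.
The helper names (register_context_cleanup, make_context) and the
device_index parameter are made up for illustration; the PyCUDA calls
used are pycuda.driver.init(), Device(...).make_context() and
Context.pop().

```python
import atexit


def register_context_cleanup(ctx, register=atexit.register):
    """Arrange for ctx.pop() to be called at interpreter exit.

    `ctx` is expected to be a pycuda.driver.Context, but anything with a
    pop() method works. `register` is injectable only so the helper can
    be exercised without a GPU (hypothetical convenience).
    """
    def _finish_up():
        # Pop the context off this thread's context stack on shutdown.
        ctx.pop()

    register(_finish_up)
    return _finish_up


def make_context(device_index=0):
    """Create a context on one device and schedule its pop at exit.

    `device_index` is an assumption; with MPI it might be derived from
    the process rank.
    """
    import pycuda.driver as cuda  # imported lazily: needs a CUDA-capable machine

    cuda.init()
    # make_context() creates the context and makes it current in this thread.
    ctx = cuda.Device(device_index).make_context()
    register_context_cleanup(ctx)
    return ctx
```

Each MPI process would call make_context() once before allocating
memory or opening IPC handles.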

Andreas
