Each module can provide an ft_event function, which is supposed to be
called when a change in the module's behavior is necessary. Thus, it is
relatively easy to let the BTL know that a particular destination process
will migrate to a new location.
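
To make the idea concrete, here is a minimal, self-contained sketch of the callback pattern. It is not Open MPI code: the class, the state names, and the extra `moved` argument are all hypothetical, chosen only to illustrate how a framework could tell a transport module that a peer has a new location.

```python
from enum import Enum, auto


class FtState(Enum):
    """Hypothetical fault-tolerance states (not Open MPI's actual constants)."""
    CHECKPOINT = auto()
    CONTINUE = auto()
    RESTART = auto()


class ToyTransport:
    """Stand-in for a BTL-like module; every name here is illustrative."""

    def __init__(self, peers):
        # Map of peer rank -> (host, port); assumed layout for this sketch.
        self.peers = dict(peers)
        self.connected = set(self.peers)
        self.paused = False

    def ft_event(self, state, moved=None):
        """Callback the framework invokes when the module must change behavior.

        `moved` is an assumed extra argument: peer rank -> new (host, port).
        """
        if state is FtState.CHECKPOINT:
            # Quiesce: stop sending and drop all connections.
            self.paused = True
            self.connected.clear()
        elif state in (FtState.CONTINUE, FtState.RESTART):
            # Record new locations of migrated peers, then reconnect to everyone.
            for rank, addr in (moved or {}).items():
                self.peers[rank] = addr
            self.connected = set(self.peers)
            self.paused = False
        return 0  # success


def broadcast_ft_event(modules, state, moved=None):
    """Call ft_event on every module and collect the return codes."""
    return [m.ft_event(state, moved) for m in modules]
```

A caller would first broadcast `FtState.CHECKPOINT`, migrate the process, and then broadcast `FtState.RESTART` with the new address of the migrated peer in `moved`.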

  George.


On Fri, Oct 23, 2015 at 5:45 AM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Gianmario,
>
> Iirc, there is one pipe between orted and each child's stderr.
> stdout is a pty, and stdin is /dev/null, but it might be a pipe on task 0.
> This is how stdout/stderr from the tasks ends up being printed by mpirun:
> orted does i/o forwarding (aka IOF).
>
> Are you trying to migrate only one task (while the other tasks keep
> running), or are you trying to checkpoint and restart on a different set
> of nodes?
>
> Typically, a task uses shared memory for intra node communications, and
> infiniband or tcp for inter node communications.
> So if you migrate only one task, and I assume you have no virtual shared
> memory, then you need to notify its neighbors that they have to switch from
> shm to ib/tcp.
> At first glance, that is much harder than moving orted and its children:
> in that case you would "only" have to re-establish all connections and
> migrate the shm.
> Also, orted assumes/needs its children to be running on the same node (they
> use a session dir in /tmp, orted waits for SIGCHLD when a child dies, ...), so
> if you migrate everything, you do not have to worry about that part.
>
> You might also want to consider some virtualization:
> if a node is running in its own VM, or its own container with a virtual
> IP, you could reuse existing infrastructure, at least to migrate orted and
> its tcp/ip connections.
>
> Cheers,
>
> Gilles
>
> Federico Reghenzani <federico1.reghenz...@mail.polimi.it> wrote:
> Hi Adrian and Gilles,
>
> first of all thank you for your responses. I'm working with Gianmario on
> this ambitious project.
>
> 2015-10-22 13:17 GMT+02:00 Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com>:
>
>> Gianmario,
>>
>> there was c/r support in the v1.6 series but it has been removed.
>> the current trend is to do application level checkpointing
>> (much more efficient and much smaller checkpoint file size)
>>
>> iirc, ompi took care of closing/restoring all communication, and a
>> third-party checkpointer was required to checkpoint/restart *standalone*
>> processes.
>>
>> generally speaking, mpirun and orted communicate via tcp
>> orted and MPI (intra node comms) currently use tcp but we are moving to
>> unix sockets
>> MPI tasks communicate via btl (infiniband, tcp, shared memory, ...)
>>
>>
> We have also seen that orted opens 2 pipes to each child, is that correct?
> Does orted use them to communicate with its children?
>
>
>
>> imho, moving only one MPI task to another node is much harder, not to
>> say impossible, than moving orted and its children MPI tasks to another
>> node
>>
>>
> Mmm, may I ask why? I mean, if we migrate the entire orted we need to
> close/reopen the *mpirun-orted* and *task-task* (btl) sockets, and if we
> migrate a single task we need to close/reopen the *orte-task* and
> *task-task* sockets. In both cases we have to broadcast the information
> about the "changed location" of the task or orted.
>
>
>
>> Cheers,
>>
>> Gilles
>>
>>
>> On Thursday, October 22, 2015, Gianmario Pozzi <pozzigma...@gmail.com>
>> wrote:
>>
>>> Hi everyone!
>>>
>>> My team and I are working on the possibility of checkpointing a process
>>> and restarting it on another node. We are using the CRIU framework for the
>>> checkpoint/restart part, but we are facing some issues related to migration.
>>>
>>> First of all: we found out that some attempts to C/R an OMPI process
>>> have already been made in the past. Is anything related to that still
>>> supported/available/working?
>>>
>>> Then, we need to know which network communications are used at any time,
>>> in order to "pause" them during migrations (at least the ones involving the
>>> migrating node). Our code analysis makes us think that:
>>> -OpenMPI runtime (HNP<->orteds) uses orte/OOB
>>> -Running applications exchange data via ompi/BTL
>>>
>>> Is that correct? If not, can someone give us a hint?
>>>
>>> Questions on how to update topology info may be yet to come.
>>>
>>> Thank you guys!
>>>
>>> Gianmario
>>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/10/18242.php
>>
>
>
> Cheers,
> Federico
> __
> Federico Reghenzani
> M.Eng. Student @ Politecnico di Milano
> Computer Science and Engineering
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/10/18253.php
>
