Gianmario,

IIRC, there is one pipe between orted and each child's stderr.
stdout is a pty, and stdin is /dev/null, though it might be a pipe on task 0.
This is how stdout/stderr from the tasks end up being printed by mpirun: orted
does I/O forwarding (aka IOF).
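A quick way to check this wiring yourself is to look at /proc/<pid>/fd of a task. A minimal sketch (Linux only; it inspects the current process by default, since the PID of a real orted child depends on your run):

```python
import os

# Resolve where fds 0/1/2 of a process point via /proc (Linux only).
# For a real MPI task, pass the child's PID instead of "self"; the
# targets show whether each stream is a pipe, a pty (/dev/pts/N),
# or /dev/null, as described above.
def stdio_targets(pid="self"):
    targets = {}
    for fd in (0, 1, 2):
        try:
            targets[fd] = os.readlink(f"/proc/{pid}/fd/{fd}")
        except OSError:
            targets[fd] = "<closed>"
    return targets

if __name__ == "__main__":
    for fd, target in stdio_targets().items():
        print(fd, target)
```

Running this inside an mpirun-launched task (rather than standalone) is what would actually show the pipe/pty/devnull pattern.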

Are you trying to migrate only one task (while the other tasks keep running),
or are you trying to checkpoint and restart on a different set of nodes?

Typically, a task uses shared memory for intra-node communications, and
InfiniBand or TCP for inter-node communications.
So if you migrate only one task, and I assume you have no virtual shared
memory, then you need to notify its neighbors that they have to switch from
shm to ib/tcp.
At first glance, that is much harder than moving orted and its children:
there you would "only" have to re-establish all connections and migrate the shm.
Also, orted assumes/needs that its children are running on the same node (they
use a session dir in /tmp, orted waits for SIGCHLD when a child dies, ...), so
if you migrate everything, you do not have to worry about that part.
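The SIGCHLD dependency in particular only works for local children. A toy sketch of that pattern (plain fork/waitpid in Python, not actual orted code):

```python
import os
import signal
import time

# Toy version of the pattern the daemon relies on: the parent learns of a
# child's death via SIGCHLD + waitpid. The kernel only delivers this for
# children on the same node, so migrating a single child away from its
# parent daemon breaks the reaping logic.
reaped = {}

def on_sigchld(signum, frame):
    pid, status = os.waitpid(-1, os.WNOHANG)
    reaped[pid] = os.waitstatus_to_exitcode(status)

signal.signal(signal.SIGCHLD, on_sigchld)

child = os.fork()
if child == 0:
    os._exit(7)          # child dies immediately with status 7
while child not in reaped:
    time.sleep(0.01)     # parent spins until the SIGCHLD handler fires
print(f"child {child} exited with status {reaped[child]}")
```

This is only meant to illustrate why moving the whole orted + children group sidesteps the problem: the parent/child relationship (and /tmp session dir) stays intact on the destination node.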

You might also want to consider some virtualization:
if a node is running in its own VM, or in its own container with a virtual IP,
you could reuse existing infrastructure, at least to migrate orted and its
TCP/IP connections.

Cheers,

Gilles

Federico Reghenzani <federico1.reghenz...@mail.polimi.it> wrote:
>Hi Adrian and Gilles,
>
>
>first of all thank you for your responses. I'm working with Gianmario on this 
>ambitious project.
>
>
>2015-10-22 13:17 GMT+02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>
>Gianmario,
>
>
>There was C/R support in the v1.6 series, but it has been removed.
>
>The current trend is to do application-level checkpointing
>
>(much more efficient, and with much smaller checkpoint file sizes).
>
>
>IIRC, OMPI took care of closing/restoring all communications, and a third-party 
>checkpointer was required to checkpoint/restart *standalone* processes.
>
>
>Generally speaking, mpirun and orted communicate via TCP.
>
>orted and the MPI tasks (intra-node comms) currently use TCP, but we are moving 
>to Unix sockets.
>
>MPI tasks communicate via a BTL (InfiniBand, TCP, shared memory, ...).
>
>
>
>We have also seen that orted opens two pipes to each child; is that correct? 
>Does orted use them to communicate with its children?
>
>
> 
>
>IMHO, moving only one MPI task to another node is much harder, not to say 
>impossible, than moving orted and its child MPI tasks to another node.
>
>
>
>Mmm, may I ask why? I mean, if we migrate the entire orted we need to 
>close/reopen the mpirun-orted and task-task (BTL) sockets, and if we migrate a 
>single task we need to close/reopen the orted-task and task-task sockets. In 
>both cases we have to broadcast the new location of the task or orted.
>
>
> 
>
>Cheers,
>
>
>Gilles
>
>
>
>On Thursday, October 22, 2015, Gianmario Pozzi <pozzigma...@gmail.com> wrote:
>
>Hi everyone!
>
>
>My team and I are working on checkpointing a process and restarting it on 
>another node. We are using the CRIU framework for the checkpoint/restart part, 
>but we are facing some issues related to migration.
>
>
>First of all: we found out that some attempts to C/R an OMPI process have 
>already been made in the past. Is anything related to that still 
>supported/available/working?
>
>
>Then, we need to know which network connections are in use at any time, in 
>order to "pause" them during a migration (at least the ones involving the 
>migrating node). Our code analysis leads us to think that:
>
>-OpenMPI runtime (HNP<->orteds) uses orte/OOB
>
>-Running applications exchange data via ompi/BTL
>
>
>Is that correct? If not, can someone give us a hint?
>
>
>Questions on how to update topology info may be yet to come.
>
>
>Thank you guys!
>
>
>Gianmario
>
>
>_______________________________________________
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2015/10/18242.php
>
>
>
>Cheers,
>Federico
>
>__
>
>Federico Reghenzani
>
>M.Eng. Student @ Politecnico di Milano
>
>Computer Science and Engineering
>
>
