Gianmario,

IIRC, there is one pipe between orted and each child's stderr. stdout is a pty, and stdin is /dev/null, though it might be a pipe on task 0. This is how stdout/stderr from the tasks end up being printed by mpirun: orted does the I/O forwarding (aka IOF).
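To make that wiring concrete, here is a minimal sketch of the general mechanism (an illustration only, not Open MPI's actual IOF code): the parent creates a pipe for the child's stderr and a pty for its stdout, points stdin at /dev/null, and then forwards whatever the child writes.

/* Sketch of the stdio wiring described above -- NOT Open MPI code. */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int err_pipe[2];                    /* stderr path: plain pipe */
    if (pipe(err_pipe) < 0) { perror("pipe"); return 1; }

    /* stdout path: pseudo-terminal, the parent keeps the master side */
    int pty_master = posix_openpt(O_RDWR | O_NOCTTY);
    if (pty_master < 0 || grantpt(pty_master) < 0 || unlockpt(pty_master) < 0) {
        perror("pty"); return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {                     /* child: stands in for an MPI task */
        int pty_slave = open(ptsname(pty_master), O_RDWR);
        int devnull   = open("/dev/null", O_RDONLY);
        dup2(devnull,     STDIN_FILENO);   /* stdin  <- /dev/null */
        dup2(pty_slave,   STDOUT_FILENO);  /* stdout <- pty slave */
        dup2(err_pipe[1], STDERR_FILENO);  /* stderr <- pipe      */
        close(err_pipe[0]);
        execlp("hostname", "hostname", (char *)NULL);
        _exit(127);
    }

    /* parent: forward the child's output, much like orted's IOF feeds mpirun */
    close(err_pipe[1]);
    char buf[256];
    ssize_t n;
    while ((n = read(pty_master, buf, sizeof(buf))) > 0)
        write(STDOUT_FILENO, buf, (size_t)n);
    while ((n = read(err_pipe[0], buf, sizeof(buf))) > 0)
        write(STDERR_FILENO, buf, (size_t)n);
    return 0;
}

Presumably the task 0 exception exists so that input typed at mpirun can be forwarded to rank 0 through a pipe instead of /dev/null.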
Are you trying to migrate only one task (while the other tasks keep running), or are you trying to checkpoint and restart on a different set of nodes?

Typically, a task uses shared memory for intra-node communications, and InfiniBand or TCP for inter-node communications. So if you migrate only one task, and I assume you have no virtual shared memory, then you need to notify its neighbours that they have to switch from shm to ib/tcp. At first glance, that is much harder than moving orted and its children: there you would "only" have to re-establish all the connections and migrate the shm.

Also, orted assumes/needs its children to be running on the same node (they use a session dir in /tmp, orted waits for SIGCHLD when a child dies, ...), so if you migrate everything, you do not have to worry about that part (a minimal sketch of that local-child bookkeeping follows after the quoted mail below).

You might also want to consider some virtualization: if a node is running in its own VM, or in its own container with a virtual IP, you could reuse existing infrastructure at least to migrate orted and its TCP/IP connections.

Cheers,

Gilles

Federico Reghenzani <federico1.reghenz...@mail.polimi.it> wrote:
>Hi Adrian and Gilles,
>
>First of all, thank you for your responses. I'm working with Gianmario on this ambitious project.
>
>2015-10-22 13:17 GMT+02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>
>Gianmario,
>
>there was C/R support in the v1.6 series but it has been removed.
>The current trend is to do application-level checkpointing
>(much more efficient and much smaller checkpoint file size).
>
>IIRC, OMPI took care of closing/restoring all communication, and a third-party checkpointer was required to checkpoint/restart *standalone* processes.
>
>Generally speaking, mpirun and orted communicate via TCP.
>orted and the MPI tasks (intra-node comms) currently use TCP, but we are moving to unix sockets.
>MPI tasks communicate via BTL (infiniband, tcp, shared memory, ...).
>
>We have also seen that orted opens 2 pipes to each child, is that correct? Does orted use them to communicate with the children?
>
>imho, moving only one MPI task to another node is much harder, not to say impossible, than moving orted and its children MPI tasks to another node.
>
>Mmm, may I ask you why? I mean, if we migrate the entire orted we need to close/reopen the mpirun-orted and task-task (btl) sockets, and if we migrate a single task we need to close/reopen the orted-task and task-task sockets. In both cases we have to broadcast the information about the "changing location" of the task or orted.
>
>Cheers,
>
>Gilles
>
>On Thursday, October 22, 2015, Gianmario Pozzi <pozzigma...@gmail.com> wrote:
>
>Hi everyone!
>
>My team and I are working on the possibility of checkpointing a process and restarting it on another node. We are using the CRIU framework for the checkpoint/restart part, but we are facing some issues related to migration.
>
>First of all: we found out that some attempts to C/R an OMPI process have already been made in the past. Is anything related to that still supported/available/working?
>
>Then, we need to know which network communications are used at any time, in order to "pause" them during migrations (at least the ones involving the migrating node). Our code analysis makes us think that:
>- the OpenMPI runtime (HNP<->orteds) uses orte/OOB
>- running applications exchange data via ompi/BTL
>
>Is that correct? If not, can someone give us a hint?
>
>Questions on how to update topology info may be yet to come.
>
>Thank you guys!
>
>Gianmario
>
>Cheers,
>Federico
>
>__
>
>Federico Reghenzani
>M.Eng. Student @ Politecnico di Milano
>Computer Science and Engineering
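As referenced above, here is a minimal sketch of the local-child bookkeeping that ties orted to the node its tasks run on. It illustrates the general SIGCHLD/waitpid pattern under that same-node assumption; it is not orted's actual code. Once a task has been migrated to another node it is no longer a child of this process, so no SIGCHLD ever arrives for it and waitpid() never reports it; the per-node session directory under /tmp has the same locality problem.

/* Illustration only -- not orted's code. A launcher that assumes its tasks
 * are local children can track their exit via SIGCHLD + waitpid. */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t child_exited = 0;

static void sigchld_handler(int sig)
{
    (void)sig;
    child_exited = 1;                   /* defer the real work to the main loop */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);

    pid_t pid = fork();
    if (pid == 0) {                     /* child: stands in for a local MPI task */
        execlp("sleep", "sleep", "1", (char *)NULL);
        _exit(127);
    }
    printf("launched local task %d\n", (int)pid);

    /* launcher's progress loop; a real launcher would use a race-free
     * mechanism (self-pipe, signalfd, event loop) instead of pause() */
    while (!child_exited)
        pause();                        /* interrupted when SIGCHLD arrives */

    int status;
    pid_t dead;
    /* reap every exited child -- this only ever sees *local* children */
    while ((dead = waitpid(-1, &status, WNOHANG)) > 0)
        printf("local task %d exited with status %d\n",
               (int)dead, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
    return 0;
}

If the task were restored on another node (e.g. by CRIU), this loop would never learn about its exit, which is part of why moving orted together with its children looks simpler than migrating a single task.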