2015-10-26 8:04 GMT+01:00 Gilles Gouaillardet <gil...@rist.or.jp>:

> Federico,
>
> that looks good to me.
> the image does not show the channel between orted and its children.
> this is currently a TCP socket (v1.10) and we are moving to a Unix socket
> (already in master)
>
>
Which framework is involved in this communication? I'm not sure what
this channel is used for.


> Cheers,
>
> Gilles
>
>
> On 10/26/2015 3:28 PM, Federico Reghenzani wrote:
>
> Hi Gilles,
> thank you again for your great answer. Our idea is to migrate tasks
> between nodes, possibly individually, while the other tasks keep running
> (obviously, if they want to communicate with the "migrating" node, we
> should pause them).
>
>
> Just to be sure we have understood correctly: is the attached image
> accurate?
>
> Cheers,
> Federico
> __
> Federico Reghenzani
> M.Eng. Student @ Politecnico di Milano
> Computer Science and Engineering
>
>
>
> 2015-10-23 11:45 GMT+02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>
>> Gianmario,
>>
>> Iirc, there is one pipe between orted and each child's stderr.
>> stdout is a pty, and stdin is /dev/null, but it might be a pipe on task 0.
>> This is how stdout/stderr from tasks end up being printed by mpirun:
>> orted does i/o forwarding (aka IOF)
>>
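As an illustration of the forwarding described above, here is a minimal Python sketch of a parent that reads a child's stderr through a pipe and tags each line before passing it on. It is not Open MPI code: `forward_stderr` and the `[rank N]` tag are hypothetical names chosen for the example, and stdout is simply discarded here rather than attached to a pty.

```python
import subprocess
import sys


def forward_stderr(cmd, rank):
    """Run cmd as a child, read its stderr through a pipe, return tagged lines."""
    forwarded = []
    with subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,   # stdin is /dev/null, as in the description
        stdout=subprocess.DEVNULL,  # simplification: the thread says stdout is a pty
        stderr=subprocess.PIPE,     # one pipe between the parent and the child's stderr
        text=True,
    ) as child:
        # The parent is the only reader of the pipe; it relays each line upward.
        for line in child.stderr:
            forwarded.append(f"[rank {rank}] {line.rstrip()}")
    return forwarded


if __name__ == "__main__":
    task = [sys.executable, "-c",
            "import sys; print('hello from task', file=sys.stderr)"]
    for line in forward_stderr(task, rank=0):
        print(line)
```

The pipe is a kernel object local to the node, which is one reason the parent and child ends cannot simply be placed on different hosts.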
>> are you trying to migrate only one task (and other tasks still run) or
>> are you trying to checkpoint and restart on a different set of nodes ?
>>
>> Typically, a task uses shared memory for intra node communications, and
>> infiniband or tcp for inter node communications.
>> So if you migrate only one task, and I assume you have no virtual shared
>> memory, then you need to notify its neighbors that they have to switch
>> from shm to ib/tcp.
>> At first glance, that is much harder than moving orted and its children:
>> in that case you would "only" have to re-establish all connections and
>> migrate the shm.
>> Also, orted assumes/needs its children to be running on the same node (they
>> use a session dir in /tmp, orted waits for SIGCHLD when its child dies, ...),
>> so if you migrate everything, you do not have to worry about that part.
>>
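The SIGCHLD dependency mentioned above can be sketched as follows. This is a minimal, Unix-only Python example of a parent that learns about a child's exit through a SIGCHLD handler; it is not how orted is implemented, and `run_and_reap` is a hypothetical name used only here. It assumes it is called from the main thread, since Python only allows signal handlers to be installed there.

```python
import os
import signal
import time


def run_and_reap(exit_code, timeout=5.0):
    """Fork a child that exits with exit_code; reap it from a SIGCHLD handler."""
    reaped = {}

    def on_sigchld(signum, frame):
        # Reap every child that has exited; WNOHANG keeps the handler non-blocking.
        while True:
            try:
                child_pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                return  # no children left
            if child_pid == 0:
                return  # children exist, but none has exited yet
            reaped[child_pid] = os.WEXITSTATUS(status)

    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: stands in for an MPI task
        deadline = time.monotonic() + timeout
        while pid not in reaped and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore whatever handler was installed before the call.
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
    return pid, reaped.get(pid)


if __name__ == "__main__":
    pid, code = run_and_reap(3)
    print(f"child {pid} exited with code {code}")
```

Both SIGCHLD and `waitpid()` are delivered by the local kernel to the process's own parent, so a daemon relying on them cannot supervise a task that has moved to another host without some replacement mechanism.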
>> You might also want to consider some virtualization :
>> If a node is running in its own vm, or its own container with a virtual
>> ip, you could reuse existing infrastructure at least to migrate orted and
>> its tcp/ip connections
>>
>> Cheers,
>>
>> Gilles
>>
>> Federico Reghenzani <federico1.reghenz...@mail.polimi.it> wrote:
>> Hi Adrian and Gilles,
>>
>> first of all thank you for your responses. I'm working with Gianmario on
>> this ambitious project.
>>
>> 2015-10-22 13:17 GMT+02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>>
>>> Gianmario,
>>>
>>> there was c/r support in the v1.6 series but it has been removed.
>>> the current trend is to do application level checkpointing
>>> (much more efficient and much smaller checkpoint file size)
>>>
>>> iirc, ompi took care of closing/restoring all communication, and a
>>> third-party checkpointer was required to checkpoint/restart *standalone*
>>> processes.
>>>
>>> generally speaking, mpirun and orted communicate via tcp
>>> orted and MPI tasks (intra node comms) currently use tcp but we are
>>> moving to unix sockets
>>> MPI tasks communicate via btl (infiniband, tcp, shared memory, ...)
>>>
>>>
>> We have also seen that orted opens 2 pipes to each child; is that correct?
>> Does orted use them to communicate with its children?
>>
>>
>>
>>> imho, moving only one MPI task to another node is much harder, not to
>>> say impossible, than moving orted and its children MPI tasks to another
>>> node
>>>
>>>
>> Mmm, can I ask why? I mean, if we migrate the entire orted we need to
>> close/reopen the *mpirun-orted* and *task-task* (btl) sockets, and if we
>> migrate a single task we need to close/reopen the *orted-task* and
>> *task-task* sockets. In both cases we have to broadcast the information
>> about the "changed location" of the task or orted.
>>
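The counting argument above can be written down as a small Python sketch. It models only what that paragraph states: a link has to be closed and reopened when exactly one of its endpoints moves. The process names (`orted1`, `task1a`, ...) and both helper functions are hypothetical, and the model deliberately ignores shared memory and the node-local assumptions of orted, so it shows the socket bookkeeping only, not the full cost of a migration.

```python
from itertools import combinations

# Hypothetical layout: two daemons, each with its local MPI tasks.
ORTED_CHILDREN = {"orted1": ["task1a", "task1b"], "orted2": ["task2a"]}


def build_links(orted_children):
    """Return (kind, endpoint_a, endpoint_b) for every link in the job."""
    links, tasks = [], []
    for orted, children in orted_children.items():
        links.append(("mpirun-orted", "mpirun", orted))
        for task in children:
            links.append(("orted-task", orted, task))
            tasks.append(task)
    # Simplification: every pair of tasks may talk to each other via a BTL.
    for a, b in combinations(tasks, 2):
        links.append(("task-task", a, b))
    return links


def kinds_to_reopen(links, migrated):
    """Kinds of link with exactly one endpoint in the migrated set."""
    migrated = set(migrated)
    return sorted({kind for kind, a, b in links
                   if (a in migrated) != (b in migrated)})


if __name__ == "__main__":
    links = build_links(ORTED_CHILDREN)
    # Whole daemon plus its children move together.
    print(kinds_to_reopen(links, {"orted1", "task1a", "task1b"}))
    # A single task moves on its own.
    print(kinds_to_reopen(links, {"task1a"}))
```

Under these assumptions the first call yields the mpirun-orted and task-task kinds and the second yields orted-task and task-task, matching the two cases listed in the message.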
>>
>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Thursday, October 22, 2015, Gianmario Pozzi <pozzigma...@gmail.com>
>>> wrote:
>>>
>>>> Hi everyone!
>>>>
>>>> My team and I are working on checkpointing a process and restarting
>>>> it on another node. We are using the CRIU framework for the
>>>> checkpoint/restart part, but we are facing some issues related to
>>>> migration.
>>>>
>>>> First of all: we found out that some attempts to C/R an OMPI process
>>>> have been already made in the past. Is anything related to that still
>>>> supported/available/working?
>>>>
>>>> Then, we need to know which network communications are used at any
>>>> time, in order to "pause" them during migrations (at least the ones
>>>> involving the migrating node). Our code analysis makes us think that:
>>>> -OpenMPI runtime (HNP<->orteds) uses orte/OOB
>>>> -Running applications exchange data via ompi/BTL
>>>>
>>>> Is that correct? If not, can someone give us a hint?
>>>>
>>>> Questions on how to update topology info may be yet to come.
>>>>
>>>> Thank you guys!
>>>>
>>>> Gianmario
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/10/18242.php
>>>
>>
>>
>> Cheers,
>> Federico
>> __
>> Federico Reghenzani
>> M.Eng. Student @ Politecnico di Milano
>> Computer Science and Engineering
>>
>>
>>
>
>
>
>
>
>
>
