Thank you.

And, as a workaround, is it possible to temporarily suspend the processes on
a node and later resume them (at the RM's request)? I saw in the code that
orted can receive SIGTSTP and SIGCONT to suspend/resume processes.
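For reference, the plain POSIX semantics of those two signals can be checked with a small sketch like the one below. It does not show how orted itself handles SIGTSTP/SIGCONT (for example, whether it forwards them to its children); it only forks a hypothetical stand-in worker, stops it with SIGTSTP, resumes it with SIGCONT, and observes both transitions through waitpid(). Unix only.

```python
import os
import signal


def suspend_and_resume():
    """Stop a child with SIGTSTP, resume it with SIGCONT.

    Returns (stopped, continued) as reported by waitpid().
    The child is a hypothetical stand-in for an MPI process.
    """
    pid = os.fork()
    if pid == 0:
        try:
            # Own process group: job-control signals such as SIGTSTP are
            # discarded for members of an orphaned process group, so this
            # keeps the demo independent of how the parent was started.
            os.setpgid(0, 0)
            while True:
                signal.pause()  # idle until a signal arrives
        finally:
            os._exit(0)

    # Parent also sets the group, to avoid racing with the child.
    try:
        os.setpgid(pid, pid)
    except OSError:
        pass  # the child already did it

    # Suspend: SIGTSTP with its default action stops the process.
    os.kill(pid, signal.SIGTSTP)
    _, status = os.waitpid(pid, os.WUNTRACED)
    stopped = os.WIFSTOPPED(status) and os.WSTOPSIG(status) == signal.SIGTSTP

    # Resume: SIGCONT continues a stopped process.
    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, os.WCONTINUED)
    continued = os.WIFCONTINUED(status)

    # Clean up the worker.
    os.kill(pid, signal.SIGTERM)
    os.waitpid(pid, 0)
    return stopped, continued
```

Calling `suspend_and_resume()` should return `(True, True)`. Note that a suspended process still holds its node, so this alone would not let the RM hand the node to another job.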


Cheers,
Federico Reghenzani

2015-04-10 16:58 GMT+02:00 Ralph Castain <r...@open-mpi.org>:

> I’m afraid not. The MPI job would not be very happy to suddenly lose some
> nodes during execution, and relocating MPI processes during execution is
> something we don’t currently support.
>
> There is work underway to integrate the RM more fully into that procedure
> so it could tell the MPI job to checkpoint, wait until that completed,
> terminate the job, and then fast-restart it on the new nodes - but that
> isn’t here yet.
>
>
> On Apr 10, 2015, at 7:54 AM, Federico Reghenzani <
> federico1.reghenz...@mail.polimi.it> wrote:
>
> Can the RM ask for some nodes to be deallocated?
>
> For example, mpirun asks the RM which resources are available (say
> node1, node2, node3) and spawns orted on those nodes. Some time later,
> during the computation, can the RM ask to release node3, or to move the
> jobs on node3 to node4?
>
> Cheers,
> Federico Reghenzani
>
> 2015-03-26 18:09:22 GMT+06:00 Artem Polyakov <artpol84_at_[hidden]>:
>
> P.S. Also check ESS (orte/mca/ess) for the environment setup.
> 2015-03-26 18:06 GMT+06:00 Artem Polyakov <artpol84_at_[hidden]>:
> >
> > 2015-03-26 17:58 GMT+06:00 Gianmario Pozzi <pozzigmario_at_[hidden]>:
> >
> >> Hi everyone,
> >> I'm an Italian M.Sc. student in Computer Engineering at Politecnico di
> >> Milano.
> >>
> >> My team and I are trying to integrate Open MPI with a real-time resource
> >> manager named BBQ, written by a group of students
> >> (http://bosp.dei.polimi.it/). We are running into some trouble, though.
> >>
> >> Our main issue is understanding how ORTE interacts with the resource
> >> manager, and which parts of the code (if any) are executed on the "slave"
> >> nodes and which on the "master".
> >> We spent some time looking at the source code but we still have many
> >> doubts.
> >>
> >
> > Hello,
> > check orte/mca/plm and orte/mca/ras:
> > PLM - process lifecycle manager
> > RAS - resource allocation subsystem.
> >
> > In RAS, mpirun detects which RM it is running under and obtains the
> > allocation. In PLM, the remote processes are spawned.
> > mpirun spawns orted daemons on the slave nodes, and everything else is
> > done without RM intervention (IMHO).
> >
> >
> >>
> >> Thank you.
> >>
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2015/03/17157.php
> >>
> >
> >
> >
> > --
> > Best regards, Artem Y. Polyakov
> >
>
>
> --
> Best regards, Artem Y. Polyakov
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/04/17210.php
>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/04/17211.php
>
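Artem's two-step outline (RAS finds the RM and the allocation, PLM then starts one orted per node) can be pictured with the simplified sketch below. It is not ORTE's code: ORTE chooses among its RAS components through MCA component selection, whereas this sketch just checks environment variables that common RMs export inside a job. SLURM_JOBID, PBS_NODEFILE and PE_HOSTFILE are real variables of SLURM, Torque/PBS and Grid Engine; BBQ_NODELIST and all function names are hypothetical. Skipping the node where mpirun runs is also an assumption of the sketch.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Optional

# (RAS component name, variable the RM exports inside a job).
# BBQ_NODELIST is a hypothetical placeholder for a BBQ integration.
RM_MARKERS = [
    ("slurm", "SLURM_JOBID"),
    ("tm", "PBS_NODEFILE"),         # Torque/PBS
    ("gridengine", "PE_HOSTFILE"),  # Grid Engine
    ("bbq", "BBQ_NODELIST"),        # hypothetical
]


def detect_rm(env: Mapping[str, str]) -> Optional[str]:
    """RAS-like step: return the first RM whose marker is set, else None."""
    for name, var in RM_MARKERS:
        if var in env:
            return name
    return None


def slots_from_nodefile(lines: Iterable[str]) -> Dict[str, int]:
    """Parse a PBS-style nodefile (one hostname per slot) into node -> slots."""
    return dict(Counter(line.strip() for line in lines if line.strip()))


def daemons_to_launch(allocation: Mapping[str, int], head_node: str) -> List[str]:
    """PLM-like step: one orted per allocated node, except mpirun's own node."""
    return [node for node in allocation if node != head_node]
```

For example, `detect_rm(os.environ)` inside a Torque job would return `"tm"`; after the daemons are up, the RM is no longer consulted, which is why releasing nodes mid-run needs the extra integration Ralph describes.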
