Thank you. As a workaround, is it possible to temporarily suspend the processes on a node and resume them later (at the RM's request)? I saw in the code that orted can receive SIGTSTP and SIGCONT to suspend and resume processes.
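For illustration, here is a minimal sketch of the generic POSIX stop/continue mechanism the question refers to. It is written in Python and is not taken from the Open MPI sources. The helper names `suspend` and `resume` are hypothetical, and the sketch sends SIGSTOP (which cannot be caught or ignored) to a plain child process rather than SIGTSTP to orted, so it says nothing about how orted itself forwards signals to the MPI processes.

```python
import os
import signal
import subprocess
import sys


def suspend(pid: int) -> bool:
    """Stop a child process and confirm that it really stopped.

    Hypothetical helper: works only for direct children of the caller,
    because it relies on waitpid() to observe the state change.
    """
    os.kill(pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid() also report children that have stopped.
    _, status = os.waitpid(pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def resume(pid: int) -> bool:
    """Continue a previously stopped child and confirm that it resumed."""
    os.kill(pid, signal.SIGCONT)
    # WCONTINUED makes waitpid() report children resumed by SIGCONT.
    _, status = os.waitpid(pid, os.WCONTINUED)
    return os.WIFCONTINUED(status)


if __name__ == "__main__":
    # Stand-in for a long-running worker; any child process would do.
    worker = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        print("stopped:", suspend(worker.pid))
        print("resumed:", resume(worker.pid))
    finally:
        worker.kill()
        worker.wait()
```

A resource manager driving this from outside would not be the parent of the processes, so it could send the signals but would have to confirm the state change some other way.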
Cheers,
Federico Reghenzani

2015-04-10 16:58 GMT+02:00 Ralph Castain <r...@open-mpi.org>:

> I’m afraid not. The MPI job would not be very happy to suddenly lose some
> nodes during execution, and relocating MPI processes during execution is
> something we don’t currently support.
>
> There is work underway to integrate the RM more fully into that procedure
> so it could tell the MPI job to checkpoint, wait until that completed,
> terminate the job, and then fast-restart it on the new nodes - but that
> isn’t here yet.
>
>
> On Apr 10, 2015, at 7:54 AM, Federico Reghenzani <
> federico1.reghenz...@mail.polimi.it> wrote:
>
> The RM can ask for deallocation of some nodes?
>
> For example, mpirun asks to the RM which resources are available (let
> node1, node2, node3) and spawns orted in the nodes. After some time during
> the elaboration, can the RM ask to deassign node3 or reassign jobs on
> node3 to node4?
>
> Cheers,
> Federico Reghenzani
>
> 2015-03-26 18:09:22 GMT+06:00 Artem Polyakov <artpol84_at_[hidden]>:
>
> P.S. also check ESS (orte/mca/ess) for environment setup.
> 2015-03-26 18:06 GMT+06:00 Artem Polyakov <artpol84_at_[hidden]>:
> >
> > 2015-03-26 17:58 GMT+06:00 Gianmario Pozzi <pozzigmario_at_[hidden]>:
> >
> >> Hi everyone,
> >> I'm an italian M.Sc. student in Computer Engineering at Politecnico di
> >> Milano.
> >>
> >> My team and I are trying to integrate OpenMPI with a real time resource
> >> manager written by a group of students named BBQ (
> >> http://bosp.dei.polimi.it/ ). We are encountering some troubles, though.
> >>
> >> Our main issue is to understand how ORTE interacts with the resource
> >> manager, which parts of the code (if any) are executed on the "slave" nodes
> >> and which ones on the "master".
> >> We spent some time looking at the source code but we still have many
> >> doubts.
> >>
> >
> > Hello,
> > check orte/mca/plm and orte/mca/ras
> > PLM - process lifecycle manager
> > RAS - resource allocation subsystem.
> >
> > In RAS mpirun detects under which RM it works and gets the allocation.
> > in PLM spawn of remote processes is done.
> > mpirun spawns orted daemons on the slave nodes and all the rest is done
> > without RM intervention (IMHO).
> >
> >>
> >> Thank you.
> >>
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2015/03/17157.php
> >>
> >
> > --
> > С Уважением, Поляков Артем Юрьевич
> > Best regards, Artem Y. Polyakov
> >
>
> --
> С Уважением, Поляков Артем Юрьевич
> Best regards, Artem Y. Polyakov
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/04/17210.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/04/17211.php
>