Hello all,
It has been a long time since I asked for hints on implementing dynamic
BTL control. I have now managed to switch the MPI communication path
from one BTL module (e.g. openib) to another (e.g. tcp) at runtime of a
distributed application.
For this I've developed a so-called BTL Control Client (orte-btlctl)
that sends control messages to all processes through the ORTE RML. These
messages are received and processed in the OMPI BML. In the BML I've
implemented one function to stop the MPI communication and another to
change the BTL exclusivity and recalculate the btl_{send,eager,rdma}
lists. All of this is done at runtime, so a distributed application
running with Open MPI is not affected in its computation.
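A toy model of the exclusivity change and list recalculation may make
the idea easier to discuss. This is only a sketch and not the actual
patch: the toy_* types and functions are hypothetical stand-ins (they
are not mca_btl_base_module_t or any real Open MPI API), only a single
send list is modelled, and it assumes that the module(s) with the
highest non-negative exclusivity win.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_BTLS 8

/* Hypothetical stand-in for a BTL module (not mca_btl_base_module_t). */
struct toy_btl {
    const char *name;
    int exclusivity;               /* assumed >= 0; higher value wins */
};

/* Hypothetical per-peer endpoint holding the recalculated send list. */
struct toy_endpoint {
    const struct toy_btl *send[TOY_MAX_BTLS];
    size_t n_send;
    int paused;                    /* 1 while MPI traffic is stopped */
};

/* Rebuild the send list: keep only the BTLs with the highest exclusivity. */
void toy_recompute_send(struct toy_endpoint *ep,
                        const struct toy_btl *btls, size_t n)
{
    int best = -1;
    size_t i;

    assert(ep != NULL && n <= TOY_MAX_BTLS);

    for (i = 0; i < n; i++) {
        if (btls[i].exclusivity > best) {
            best = btls[i].exclusivity;
        }
    }

    ep->n_send = 0;
    for (i = 0; i < n; i++) {
        if (btls[i].exclusivity == best) {
            ep->send[ep->n_send] = &btls[i];
            ep->n_send++;
        }
    }
}

/* Handle one control message: pause, change exclusivity, recompute, resume.
 * Returns 0 on success, -1 if no BTL with that name is known. */
int toy_switch_btl(struct toy_endpoint *ep, struct toy_btl *btls, size_t n,
                   const char *name, int new_exclusivity)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(btls[i].name, name) == 0) {
            ep->paused = 1;                      /* stop MPI communication */
            btls[i].exclusivity = new_exclusivity;
            toy_recompute_send(ep, btls, n);
            ep->paused = 0;                      /* resume communication */
            return 0;
        }
    }
    return -1;
}
```

In this model, raising the exclusivity of "tcp" above that of "openib"
through toy_switch_btl() leaves only tcp in the send list. The real code
would have to do the same for the eager and rdma lists of every
reachable proc.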
I also managed to unload a module that is no longer used, e.g. openib
after switching the MPI communication to tcp, through the already
implemented function mca_bml_r2_del_btl(mca_btl_base_module_t* btl).
The Question:
The function to (re)initialise a BTL module
"mca_bml_r2_add_btl(mca_btl_base_module_t* btl)" is currently not
implemented. Why is it not implemented? And what has to be done if I
want to implement it?
As far as I understand the internals of the OMPI layer, adding a BTL
module requires the following steps:
1. Find the corresponding component in mca_btl_base_components_opened
2. Call component->btl_init to get an array of BTL modules
3. Add those modules to mca_btl_base_modules_initialized
4. Iterate through mca_btl_base_modules_initialized and add each BTL
module to mca_bml_r2.btl_modules in bml_r2
5. Add the BTL module to btl_{send,eager,rdma} (if applicable) for all
reachable procs
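To make the order of these steps concrete, here is a toy sketch of
steps 1 to 4 in C. Everything named toy_* is hypothetical and heavily
simplified: the arrays only mimic the roles of
mca_btl_base_components_opened, mca_btl_base_modules_initialized and
mca_bml_r2.btl_modules, and the init callback does not have the
signature of the real btl_init. Step 5 is left as a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8

/* Hypothetical module descriptor (not mca_btl_base_module_t). */
struct toy_module {
    const char *component;
    int exclusivity;
};

/* Hypothetical component: init() fills 'out' and returns the module count. */
struct toy_component {
    const char *name;
    size_t (*init)(struct toy_module *out, size_t cap);
};

struct toy_state {
    const struct toy_component *opened[TOY_MAX];   /* role: components_opened */
    size_t n_opened;
    struct toy_module initialized[TOY_MAX];        /* role: modules_initialized */
    size_t n_initialized;
    const struct toy_module *bml_modules[TOY_MAX]; /* role: bml_r2 btl_modules */
    size_t n_bml;
};

/* Example component init: pretends to find exactly one tcp module. */
size_t toy_tcp_init(struct toy_module *out, size_t cap)
{
    if (cap < 1) {
        return 0;
    }
    out[0].component = "tcp";
    out[0].exclusivity = 100;
    return 1;
}

/* Returns the number of modules added, or -1 on failure. */
int toy_add_btl(struct toy_state *st, const char *name)
{
    struct toy_module fresh[TOY_MAX];
    const struct toy_component *comp = NULL;
    size_t i, n;

    assert(st != NULL && name != NULL);

    /* Step 1: find the component among the opened ones. */
    for (i = 0; i < st->n_opened; i++) {
        if (strcmp(st->opened[i]->name, name) == 0) {
            comp = st->opened[i];
            break;
        }
    }
    if (comp == NULL) {
        return -1;
    }

    /* Step 2: (re)initialise the component to obtain its modules. */
    n = comp->init(fresh, TOY_MAX);
    if (n == 0 || st->n_initialized + n > TOY_MAX || st->n_bml + n > TOY_MAX) {
        return -1;
    }

    for (i = 0; i < n; i++) {
        /* Step 3: record the module as initialized. */
        st->initialized[st->n_initialized] = fresh[i];
        /* Step 4: make the new module known to the BML. */
        st->bml_modules[st->n_bml] = &st->initialized[st->n_initialized];
        st->n_initialized++;
        st->n_bml++;
    }

    /* Step 5 (omitted): rebuild btl_{send,eager,rdma} for reachable procs. */
    return (int)n;
}
```

Note that the sketch only registers the newly created modules in step 4
instead of iterating over the whole initialized list, and it does not
cover error cleanup or any connection setup with the peers.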
Am I missing something?
The Background:
I should give some background on why I'm implementing this. Changing the
MPI communication from a high-speed network to a network with flow
control (openib->tcp) is necessary for checkpointing distributed
applications in virtual machines. Granted, you can checkpoint through
the FT framework and BLCR in Open MPI, but virtual machines already
provide simple checkpointing functions of their own. As you cannot
checkpoint the hardware information of e.g. openib, you have to get rid
of it before a checkpoint and switch back again on resume/continue.
Would such a feature generally be of interest to you? The
implementation will be made publicly available on Bitbucket by the end
of March.
Thoughts? Suggestions? Or hints? :)
Thanks a lot,
Christoph Konersmann
--
Paderborn Center for Parallel Computing - PC2
University of Paderborn - Germany
http://www.pc2.de
Christoph Konersmann <c...@upb.de>