[OMPI devel] Changing BTLs at runtime

Christoph Konersmann Tue, 23 Mar 2010 04:02:56 -0400

Hello all,

Some time ago I asked for hints on implementing dynamic BTL control. I have now managed to switch the MPI communication path from one BTL module (e.g. openib) to another BTL module (e.g. tcp) at runtime of a distributed application.

For this I've developed a so-called BTL Control Client (orte-btlctl) to send control messages to all processes through the ORTE RML. These messages are received and processed in the OMPI BML. In the BML I've implemented one function to stop the MPI communication and another to change the BTL exclusivity and recalculate the btl_{send,eager,rdma} lists. All of this is done at runtime, so the computation of a distributed application running with Open MPI is not affected.
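To illustrate the exclusivity change and list recalculation described above, here is a minimal standalone sketch. It is a toy model, not Open MPI code: the toy_* types and functions are hypothetical, the exclusivity values used with it are arbitrary, and it assumes the simplified rule that only the reachable modules with the highest exclusivity end up in a peer's send list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy type; this is NOT Open MPI's mca_btl_base_module_t. */
typedef struct {
    const char *name;   /* e.g. "openib" or "tcp" */
    int exclusivity;    /* assumption: higher value wins */
    int reachable;      /* 1 if the peer can be reached through this module */
} toy_btl_t;

/* Change the exclusivity of the module called `name`.
 * Returns 0 on success, -1 if no such module exists. */
int toy_set_exclusivity(toy_btl_t *btls, size_t n, const char *name, int value)
{
    assert(btls != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(btls[i].name, name) == 0) {
            btls[i].exclusivity = value;
            return 0;
        }
    }
    return -1;
}

/* Rebuild one peer's send list: keep only the reachable modules that share
 * the highest exclusivity. `out` must have room for n entries.
 * Returns the number of entries written. */
size_t toy_rebuild_send_list(const toy_btl_t *btls, size_t n,
                             const toy_btl_t **out)
{
    size_t count = 0;
    int found = 0, best = 0;

    assert(btls != NULL && out != NULL);

    /* First pass: find the highest exclusivity among reachable modules. */
    for (size_t i = 0; i < n; i++) {
        if (btls[i].reachable && (!found || btls[i].exclusivity > best)) {
            best = btls[i].exclusivity;
            found = 1;
        }
    }
    /* Second pass: collect every reachable module with that exclusivity. */
    for (size_t i = 0; i < n; i++) {
        if (found && btls[i].reachable && btls[i].exclusivity == best) {
            out[count++] = &btls[i];
        }
    }
    return count;
}
```

In this model, raising the exclusivity of "tcp" above that of "openib" and rebuilding the list moves the peer's traffic to tcp. Stopping the MPI communication before the switch is not modelled here.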

I also managed to unload a module that is no longer used, e.g. openib after switching the MPI communication to tcp, through the already implemented function mca_bml_r2_del_btl(mca_btl_base_module_t* btl).


The Question:

The function to (re)initialise a BTL module, "mca_bml_r2_add_btl(mca_btl_base_module_t* btl)", is currently not implemented. Why is it not implemented? And what has to be done if I want to implement it?

As far as I understand the internals of the OMPI layer, adding a BTL module requires the following steps:

1. Find the corresponding component in mca_btl_base_components_opened
2. Call component->btl_init to get an array of BTL modules
3. Add those to mca_btl_base_modules_initialized

4. Iterate through mca_btl_base_modules_initialized and add each BTL module to mca_bml_r2.btl_modules in bml_r2
5. Add the BTL module to btl_{send,eager,rdma} (if applicable) for all reachable procs
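The five steps can be sketched as a small standalone toy model. None of this is Open MPI's API: the toy_* names, the fixed-size arrays and the init hook signature are hypothetical stand-ins, and the sketch assumes a single peer that every new module can reach.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8

/* Hypothetical stand-ins; these are NOT Open MPI's types. */
typedef struct { const char *name; } toy_module_t;

typedef struct {
    const char *name;
    /* Hypothetical init hook: writes up to `cap` modules, returns the count. */
    size_t (*init)(toy_module_t *out, size_t cap);
} toy_component_t;

typedef struct {
    toy_module_t  initialized[TOY_MAX]; /* models mca_btl_base_modules_initialized */
    size_t        n_initialized;
    toy_module_t *bml_modules[TOY_MAX]; /* models mca_bml_r2.btl_modules */
    size_t        n_bml;
    toy_module_t *send_list[TOY_MAX];   /* models one peer's btl_send list */
    size_t        n_send;
} toy_state_t;

/* Example init hook for a "tcp" component that yields a single module. */
size_t toy_tcp_init(toy_module_t *out, size_t cap)
{
    if (cap < 1) return 0;
    out[0].name = "tcp0";
    return 1;
}

/* Returns the number of modules added, or -1 if the component is unknown. */
int toy_add_btl(toy_state_t *st, const toy_component_t *comps,
                size_t n_comps, const char *name)
{
    const toy_component_t *comp = NULL;

    assert(st != NULL && comps != NULL && name != NULL);

    /* Step 1: find the component among the opened components. */
    for (size_t i = 0; i < n_comps; i++) {
        if (strcmp(comps[i].name, name) == 0) { comp = &comps[i]; break; }
    }
    if (comp == NULL) return -1;

    /* Steps 2 and 3: initialise it and append its modules to the
     * list of initialised modules. */
    size_t first = st->n_initialized;
    size_t got = comp->init(&st->initialized[first], TOY_MAX - first);
    st->n_initialized += got;

    /* Steps 4 and 5: register each new module with the BML and add it
     * to the peer's send list (the peer is assumed reachable). */
    for (size_t i = first; i < st->n_initialized; i++) {
        if (st->n_bml < TOY_MAX)  st->bml_modules[st->n_bml++] = &st->initialized[i];
        if (st->n_send < TOY_MAX) st->send_list[st->n_send++]  = &st->initialized[i];
    }
    return (int)got;
}
```

Calling toy_add_btl with the name "tcp" on a zeroed toy_state_t adds one module to all three lists, while an unknown component name returns -1. A real implementation would also have to handle the eager and rdma lists and reachability per proc, which this sketch leaves out.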


Am I missing something?

The Background:

I should give some background on why I'm implementing this. Changing the MPI communication from a high-speed network to a network with flow control (openib->tcp) is necessary for checkpointing distributed applications in virtual machines. OK, you are able to checkpoint through the FT framework and BLCR in Open MPI, but virtual machines already provide trivial functions for checkpointing. As you are not able to checkpoint the hardware information of e.g. openib, you have to get rid of it before a checkpoint and switch back again on resume/continue.

Would such a feature be of general interest to you? The implementation will be made publicly available on Bitbucket by the end of March.


Thoughts? Suggestions? Or hints? :)
Thanks a lot,

Christoph Konersmann

--
Paderborn Center for Parallel Computing - PC2
University of Paderborn - Germany
http://www.pc2.de
