You've got this a tad wrong, but that's okay - let me try to clarify a couple 
of things that may help.

First, you don't want to add this as a separate orted command. As you noted, 
ORTE has no direct way to tell the OMPI layer to do anything. Instead, you want 
to send each process a message that is received in the OMPI layer. That is 
easy to do.

1. add a message tag in ompi/mca/dpm/dpm.h - perhaps something like 
OMPI_RML_TAG_BTL_CTL

2. in the btl, add a call to orte_rml.recv_nb() that identifies the above tag 
and specifies a callback function to use when such a message arrives

3. in that callback function, toggle your "paused" flag - or you can unpack the 
buffer to get a flag telling you what value to set. Your choice.

Now, when you want to pause the BTL, you do an orte_grpcomm.xcast() to the 
above message tag. ORTE will deliver that message to every process, which will 
then have its callback function called.
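
To make the flow concrete, here is a toy, self-contained model of steps 2 and 3 
plus the broadcast. It is only a sketch: toy_recv_nb() and toy_xcast() are 
hypothetical stand-ins for orte_rml.recv_nb() and orte_grpcomm.xcast(), whose 
real signatures and buffer types are not reproduced here, and the tag value and 
the one-byte payload are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag value; the real one would be defined alongside the
 * other message tags, as in step 1. */
#define OMPI_RML_TAG_BTL_CTL 42

#define TOY_MAX_RECVS 8

/* Stand-in for the BTL module state (the btl_paused flag from this thread). */
typedef struct {
    bool btl_paused;
} toy_btl_module_t;

/* Callback invoked when a message with a matching tag arrives. */
typedef void (*toy_recv_cb_t)(int tag, const uint8_t *buf, size_t len,
                              void *cbdata);

typedef struct {
    int           tag;
    toy_recv_cb_t cb;
    void         *cbdata;
} toy_recv_t;

/* Table of posted receives; each entry models one process's receive. */
toy_recv_t toy_posted[TOY_MAX_RECVS];
size_t     toy_num_posted = 0;

/* Models step 2: post a non-blocking receive for a tag with a callback.
 * Returns 0 on success, -1 if the table is full. */
int toy_recv_nb(int tag, toy_recv_cb_t cb, void *cbdata)
{
    if (toy_num_posted >= TOY_MAX_RECVS) {
        return -1;
    }
    toy_posted[toy_num_posted].tag = tag;
    toy_posted[toy_num_posted].cb = cb;
    toy_posted[toy_num_posted].cbdata = cbdata;
    toy_num_posted++;
    return 0;
}

/* Models the broadcast: hand the payload to every receive posted for this
 * tag. Returns the number of callbacks that were invoked. */
int toy_xcast(int tag, const uint8_t *buf, size_t len)
{
    int delivered = 0;
    for (size_t i = 0; i < toy_num_posted; i++) {
        if (toy_posted[i].tag == tag) {
            toy_posted[i].cb(tag, buf, len, toy_posted[i].cbdata);
            delivered++;
        }
    }
    return delivered;
}

/* Models step 3: unpack a one-byte flag and set the paused state from it. */
void btl_ctl_cb(int tag, const uint8_t *buf, size_t len, void *cbdata)
{
    toy_btl_module_t *btl = (toy_btl_module_t *)cbdata;
    (void)tag; /* only one tag is handled by this callback */
    if (len >= 1) {
        btl->btl_paused = (buf[0] != 0);
    }
}
```

In this model each "process" calls toy_recv_nb(OMPI_RML_TAG_BTL_CTL, 
btl_ctl_cb, &its_module) once at startup, and whoever wants to pause or resume 
calls toy_xcast() with a payload byte of 1 or 0. I sent the value in the 
payload rather than toggling, so a repeated message cannot flip the flag the 
wrong way.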

HTH
Ralph

On Jan 8, 2010, at 9:03 AM, Christoph Konersmann wrote:

> Hi again,
> 
> Maybe I should give more specific information with some code snippets...
> 
> Currently I added
> #define ORTE_DAEMON_BTL_CTL_CMD (orte_daemon_cmd_flag_t) 26
> to odls_types.h to identify if I want to trigger the BTL pause.
> 
> In process_commands() of orted/orted_comm.c this flag is processed by first 
> broadcasting it to all orteds with xcast of the grpcomm framework. It is 
> then forwarded with orte_odls.deliver_message to the local procs.  So every 
> process should get the trigger. Or is there another, possibly easier, way of 
> delivering the trigger?
> 
> I extended mca_btl_base_module_t in btl/btl.h with a flag that indicates 
> whether pause is set.
> struct mca_btl_base_module_t {
> [...]
>    bool        btl_paused;
> [...]
> };
> 
> I then added a line to the initial values in every BTL component that 
> btl_paused should be false by default. E.g. in self/btl_self.c:
> mca_btl_base_module_t mca_btl_self = {
> [...]
>    false, /* btl_paused */
> [...]
> };
> Or did I forget something?
> 
> So my problem now is: when every process gets the trigger in the ORTE 
> project, how can I set btl->btl_paused to true in the OMPI project? ORTE 
> does not (and I know it should not) have access to the OMPI components. Is 
> there a way of implementing a libevent callback function in the BTL modules? 
> Or is there another way? I already read the documentation on your wiki site, 
> but for me it's not really trivial as I'm relatively new to this.
> 
> An idea to get the connection to the OMPI project would be to use the 
> ft_event framework. Therefore I added another opal_crs_state_type_t 
> OPAL_CRS_PAUSE in crs/crs.h and tried to trigger the event in orted_comm.c 
> with:
> if( NULL != orte_ess.ft_event ) {
>    if( ORTE_SUCCESS != (ret = orte_ess.ft_event(OPAL_CRS_PAUSE))) {
>        goto CLEANUP;
>    }
> }
> But the ft_event() is NULL and therefore isn't executed...
> 
> Any ideas? Any advice?
> 
> For me the performance impact of a solution is of no interest.
> 
> Thanks, and please excuse me if I bother you with this.
> 
> Christoph
> 
> Christoph Konersmann wrote:
>> Hi all,
>> I'm trying to implement a method to stop all BTLs from sending packets to 
>> their destinations.
>> So far I have added a state variable to orte_process_info, which is 
>> changed by an external program through process_commands() in 
>> orte/orted/orted_comm.c (I hope it's processed globally, not locally). While 
>> this state is set to something defined as PAUSE, I want the send_methods 
>> in the PML layer to be halted so that no network traffic is generated. So 
>> far it's not working, because the PML layer does not see the state change.
>> Another way would be to use a libevent thread on the bml/pml-level. I've 
>> read that this library is already supported/implemented, or am I wrong? How 
>> would I use libevent in this context? Does somebody have an example or hint? 
>> Or should I use the fault tolerance framework for this purpose?
>> Any help would be appreciated. thanks
> 
> -- 
> Paderborn Center for Parallel Computing - PC2
> University of Paderborn - Germany
> http://www.pc2.de
> 
> Christoph Konersmann <c...@upb.de>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

