I like the idea. I do have some questions, not necessarily related to your proposal itself, but to how we can use the information you propose to expose.
My first question is about extending this concept to multi-BTL runs. Granted, we will need a local indexing of BTLs (I'm not concerned about that). But how do we ensure the naming is globally consistent (in the sense that all processes in the job agree that usnic0 is index 0) even in a heterogeneous environment? As an example, some of our clusters have 1 NIC on some nodes and 2 on others. Of course we can say we don't guarantee consistent naming, but for tools trying to understand communication issues in distributed environments, having a global view is a clear plus.

Another question is about the level of detail. I wonder whether this level of detail is really needed, or whether providing an aggregate pvar would be enough in most cases. The problem I see here is the lack of topological knowledge at the upper level. Seeing a large number of messages on a particular BTL might suggest that something is wrong inside the implementation, when in fact that BTL is the only one connecting a subset of peers. Unless we expose this information, I'm afraid the tool might get the wrong picture.

Thanks,
  George.

On Tue, Nov 5, 2013 at 11:37 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> WHAT: suggestion for how to expose multiple MPI_T pvar values for a given variable.
>
> WHY: so that we have a common convention across OMPI (and possibly set a precedent for other MPI implementations...?).
>
> WHERE: ompi/mca/btl/usnic, but if everyone likes it, potentially elsewhere in OMPI
>
> TIMEOUT: before 1.7.4, so let's set a first timeout of next Tuesday teleconf (Nov 12)
>
> More detail:
> ------------
>
> Per my discussion on the call today, I'm sending the attached PPT of how we're exposing MPI_T performance variables in the usnic BTL in the multi-BTL case.
>
> Feedback is welcome, especially because we're the first MPI implementation to expose MPI_T pvars in this way (already committed on the trunk and targeted for 1.7.4). So this methodology may well become a useful precedent.
>
> ** Issue #1: we want to expose each usnic BTL pvar (e.g., btl_usnic_num_sends) on a per-usnic-BTL-*module* basis. How to do this?
>
> 1. Add a prefix/suffix on each pvar name (e.g., btl_usnic_num_sends_0, btl_usnic_num_sends_1, ...etc.).
> 2. Return an array of values under the single name (btl_usnic_num_sends) -- one value for each BTL module.
>
> We opted for the 2nd option. The MPI_T pvar interface provides a way to get the array length for a pvar, so this is all fine and good.
>
> Specifically: btl_usnic_num_sends returns an array of N values, where N is the number of usnic BTL modules being used by the MPI process. Each slot in the array corresponds to the value from one usnic BTL module.
>
> ** Issue #2: but how do you map a given value to an underlying Linux usnic interface?
>
> Our solution was twofold:
>
> 1. Guarantee that the ordering of values in all pvar arrays is the same (i.e., usnic BTL module 0 will always be in slot 0, usnic BTL module 1 will always be in slot 1, ...etc.).
>
> 2. Add another pvar that is an MPI_T state variable with an associated MPI_T "enumeration", which contains string names of the underlying Linux devices. This allows you to map a given value from a pvar to an underlying Linux device (e.g., from usnic BTL module 2 to /dev/usnic_3, or whatever).
>
> See the attached PPT.
>
> If people have no objection to this, we should use this convention across OMPI (e.g., for other BTLs that expose MPI_T pvars).
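[Editor's sketch] For concreteness, below is a minimal sketch of how a tool might consume the proposed layout through the standard MPI_T calls: read the array-valued counter (Issue #1) and then map each slot to a Linux device name via the enumeration attached to the state variable (Issue #2). The pvar name btl_usnic_num_sends comes from the proposal; the device-map variable name ("btl_usnic_devices") and the value datatype (MPI_UNSIGNED_LONG_LONG) are illustrative assumptions, not names taken from the actual usnic BTL.

    /* Sketch of an MPI_T consumer for array-valued pvars.
     * Assumptions (not from the proposal): the counter is continuous
     * (so no MPI_T_pvar_start is needed), its datatype is
     * MPI_UNSIGNED_LONG_LONG, and the device-map state variable is
     * named "btl_usnic_devices". */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <mpi.h>

    /* Find a pvar by name; optionally return its associated enumeration. */
    static int find_pvar(const char *wanted, MPI_T_enum *enumtype_out)
    {
        int i, num_pvar = 0;
        MPI_T_pvar_get_num(&num_pvar);
        for (i = 0; i < num_pvar; ++i) {
            char name[256], desc[256];
            int name_len = sizeof(name), desc_len = sizeof(desc);
            int verbosity, var_class, bind, readonly, continuous, atomic;
            MPI_Datatype dt;
            MPI_T_enum et = MPI_T_ENUM_NULL;
            MPI_T_pvar_get_info(i, name, &name_len, &verbosity, &var_class,
                                &dt, &et, desc, &desc_len, &bind,
                                &readonly, &continuous, &atomic);
            if (0 == strcmp(name, wanted)) {
                if (enumtype_out) *enumtype_out = et;
                return i;
            }
        }
        return -1;
    }

    int main(int argc, char **argv)
    {
        int provided, i, count = 0, idx;
        MPI_T_pvar_session session;
        MPI_T_pvar_handle handle;
        MPI_T_enum devices = MPI_T_ENUM_NULL;

        MPI_Init(&argc, &argv);
        MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
        MPI_T_pvar_session_create(&session);

        /* Issue #1: one pvar name, an array of N values (one per usnic
         * BTL module).  handle_alloc reports the array length in 'count'. */
        idx = find_pvar("btl_usnic_num_sends", NULL);
        if (idx >= 0) {
            MPI_T_pvar_handle_alloc(session, idx, NULL, &handle, &count);
            unsigned long long *vals = malloc(count * sizeof(*vals));
            MPI_T_pvar_read(session, handle, vals);
            for (i = 0; i < count; ++i)
                printf("usnic BTL module %d: %llu sends\n", i, vals[i]);
            free(vals);
            MPI_T_pvar_handle_free(session, &handle);
        }

        /* Issue #2: slot i in every pvar array corresponds to item i in the
         * enumeration attached to the (hypothetically named) device-map
         * state variable, giving the underlying Linux device name. */
        if (find_pvar("btl_usnic_devices", &devices) >= 0 &&
            MPI_T_ENUM_NULL != devices) {
            int num_items = 0;
            char ename[256];
            int ename_len = sizeof(ename);
            MPI_T_enum_get_info(devices, &num_items, ename, &ename_len);
            for (i = 0; i < num_items; ++i) {
                int value;
                char dev[256];
                int dev_len = sizeof(dev);
                MPI_T_enum_get_item(devices, i, &value, dev, &dev_len);
                printf("slot %d -> %s\n", value, dev);
            }
        }

        MPI_T_pvar_session_free(&session);
        MPI_T_finalize();
        MPI_Finalize();
        return 0;
    }

Because the proposal guarantees identical ordering across all pvar arrays, the same slot-to-device mapping can be reused for every usnic pvar, not just btl_usnic_num_sends.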