[OMPI devel] v1.7 is broken

2013-11-05 Thread Mike Dubman
Hi,
The latest merges from trunk to v1.7 broke the v1.7 openib build:

*08:08:36* btl_openib_xrc.c:80: warning: 'ibv_close_xrc_domain' is deprecated (declared at /usr/include/infiniband/ofa_verbs.h:102)
*08:08:36*   CC       btl_openib_fd.lo
*08:08:36*   CC       btl_openib_ip.lo
*08:08:36*   CC       connect/btl_openib_connect_base.lo
*08:08:36*   CC       connect/btl_openib_connect_oob.lo
*08:08:37*   CC       connect/btl_openib_connect_empty.lo
*08:08:37*   CC       connect/btl_openib_connect_xoob.lo
*08:08:37* connect/btl_openib_connect_xoob.c:359:35: error: macro "ompi_rte_send_buffer_nb" passed 6 arguments, but takes just 5
*08:08:37* connect/btl_openib_connect_xoob.c: In function 'xoob_send_connect_data':
*08:08:37* connect/btl_openib_connect_xoob.c:357: error: 'ompi_rte_send_buffer_nb' undeclared (first use in this function)
*08:08:37* connect/btl_openib_connect_xoob.c:357: error: (Each undeclared identifier is reported only once
*08:08:37* connect/btl_openib_connect_xoob.c:357: error: for each function it appears in.)
*08:08:37* connect/btl_openib_connect_xoob.c: In function 'xoob_recv_qp_create':
*08:08:37* connect/btl_openib_connect_xoob.c:560: warning: 'ibv_create_xrc_rcv_qp' is deprecated (declared at /usr/include/infiniband/ofa_verbs.h:126)
*08:08:37* connect/btl_openib_connect_xoob.c:572: warning: 'ibv_modify_xrc_rcv_qp' is deprecated (declared at /usr/include/infiniband/ofa_verbs.h:152)
*08:08:37* connect/btl_openib_connect_xoob.c:616: warning: 'ibv_modify_xrc_rcv_qp' is deprecated (declared at /usr/include/infiniband/ofa_verbs.h:152)
*08:08:37* connect/btl_openib_connect_xoob.c: In function 'xoob_recv_qp_connect':
*08:08:37* connect/btl_openib_connect_xoob.c:649: warning: 'ibv_reg_xrc_rcv_qp' is deprecated (declared at /usr/include/infiniband/ofa_verbs.h:185)
*08:08:37* connect/btl_openib_connect_xoob.c: In function 'xoob_component_query':
*08:08:37* connect/btl_openib_connect_xoob.c:1027: error: void value not ignored as it ought to be
*08:08:37* make[2]: *** [connect/btl_openib_connect_xoob.lo] Error 1
*08:08:37* make[2]: Leaving directory `/scrap/jenkins/scrap/workspace/hpc-ompi-shmem@3/label/hpc-test-node/ompi/mca/btl/openib'



M


Re: [OMPI devel] v1.7 is broken

2013-11-05 Thread Ralph Castain
I'll have to fix it when I return on Wed - trivial fix. Thanks!





Re: [OMPI devel] v1.7 is broken

2013-11-05 Thread Ralph Castain
One thing that might help with these problems: could you please complete the
move from openib/connect to common/ofacm? It is a little frustrating to
have to maintain two duplicate code bases that are literally copy/paste
versions of each other.

I'd be happy to approve the CMR when available.





Re: [OMPI devel] [OMPI bugs] [Open MPI] #3885: Move r29608 to v1.7 branch (Fix C++11 issue identified by)

2013-11-05 Thread George Bosilca
Excellent: we must be one of the most reactive communities out there. This
patch went all the way from trunk into the stable branch in a blazing
six-hour interval. It didn't even get a chance at one good-old nightly test.

Unfortunately, the patch might have some issues: our configure bails out with
gcc 4.8.2.

  George.


On Nov 5, 2013, at 21:32 , Open MPI  wrote:

> #3885: Move r29608 to v1.7 branch (Fix C++11 issue identified by)
> ---+-
> Reporter:  jsquyres|   Owner:  ompi-gk1.7
>Type:  changeset move request  |  Status:  assigned
> Priority:  major   |   Milestone:  Open MPI 1.7.4
> Version:  trunk   |  Resolution:
> Keywords:  |
> ---+-
> Changes (by brbarret):
> 
> * owner:  ompi-rm1.7 => ompi-gk1.7
> 
> 
> Comment:
> 
> RM approved.
> 
> -- 
> Ticket URL: 
> Open MPI 
> 
> ___
> bugs mailing list
> b...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/bugs



Re: [OMPI devel] [OMPI bugs] [Open MPI] #3885: Move r29608 to v1.7 branch (Fix C++11 issue identified by)

2013-11-05 Thread Nathan Hjelm
Is that related to this patch? I was about to bisect this error:

checking for compiler familyid... 2
checking for compiler familyname... INTEL
checking for compiler version... 1375113743
checking for compiler version_str... 
/usr/projects/hpctools/hjelmn/ompi-trunk-git/configure: line 27938: 20130607: 
No such file or directory

happening with every compiler now.
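One classic way a generated configure script dies with exactly that shape of message is an unquoted `<` in a version comparison, which the shell parses as input redirection from a file named after the right-hand operand. This is only a guess at the cause here, but the pitfall is easy to demonstrate (values borrowed from the log above):

```shell
ver=1375113743   # the "compiler version" value from the log above

# Buggy pattern (commented out): the shell treats "<" as a redirect and
# tries to open a file literally named "20130607", producing
#   "20130607: No such file or directory"
# if test $ver < 20130607; then echo older; fi

# Safe pattern: use test's numeric comparison operators instead of "<":
if test "$ver" -lt 20130607; then
  echo "older"
else
  echo "newer"
fi
```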

-Nathan Hjelm
HPC-5, LANL



[OMPI devel] RFC: usnic BTL MPI_T pvar scheme

2013-11-05 Thread Jeff Squyres (jsquyres)
WHAT: suggestion for how to expose multiple MPI_T pvar values for a given 
variable.

WHY: so that we have a common convention across OMPI (and possibly set a 
precedent for other MPI implementations...?).

WHERE: ompi/mca/btl/usnic, but if everyone likes it, potentially elsewhere in 
OMPI

TIMEOUT: before 1.7.4, so let's set a first timeout of next Tuesday's teleconf 
(Nov 12)

More detail:


Per my discussion on the call today, I'm sending the attached PPT of how we're 
exposing MPI_T performance variables in the usnic BTL in the multi-BTL case.

Feedback is welcome, especially because we're the first MPI implementation to 
expose MPI_T pvars in this way (already committed on the trunk and targeted for 
1.7.4).  So this methodology may well become a useful precedent.  

** Issue #1: we want to expose each usnic BTL pvar (e.g., btl_usnic_num_sends) 
on a per-usnic-BTL-*module* basis.  How to do this?

1. Add a prefix/suffix on each pvar name (e.g., btl_usnic_num_sends_0, 
btl_usnic_num_sends_1, ...etc.).
2. Return an array of values under the single name (btl_usnic_num_sends) -- one 
value for each BTL module.

We opted for the 2nd option.  The MPI_T pvar interface provides a way to get 
the array length for a pvar, so this is all fine and good.

Specifically: btl_usnic_num_sends returns an array of N values, where N is the 
number of usnic BTL modules being used by the MPI process.  Each slot in the 
array corresponds to the value from one usnic BTL module.

** Issue #2: but how do you map a given value to an underlying Linux usnic 
interface?

Our solution was twofold:

1. Guarantee that the ordering of values in all pvar arrays is the same (i.e., 
usnic BTL module 0 will always be in slot 0, usnic BTL module 1 will always be 
in slot 1, ...etc.).

2. Add another pvar that is an MPI_T state variable with an associated MPI_T 
"enumeration", which contains string names of the underlying Linux devices.  
This allows you to map a given value from a pvar to an underlying Linux device 
(e.g., from usnic BTL module 2 to /dev/usnic_3, or whatever).

See the attached PPT.

If people have no objection to this, we should use this convention across OMPI 
(e.g., for other BTLs that expose MPI_T pvars).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/


varForMultipleDevs-2.pptx
Description: varForMultipleDevs-2.pptx


Re: [OMPI devel] RFC: usnic BTL MPI_T pvar scheme

2013-11-05 Thread Paul Hargrove
Jeff,

If this approach is to be adopted by other components (and perhaps other
MPIs), then it would be important for the enumeration variable name to be
derived in a UNIFORM way:
__SOMETHING
Without a fixed value for "SOMETHING" somebody will need to read sources
(or documentation) to make the connection.

In the slides you used "btl_usnic_devices", which seems overly specific,
since a single NIC might have multiple PORTS, making the "_devices" term
inappropriate/misleading (yes, it matches "device" in the sense of
/dev/foo, but not in the sense of a device as a physical object).  For tcp
on a multi-homed host, "device" is again not necessarily the first word that
comes to mind for identifying the "interface" or listening address.
Perhaps something nice and generic like "_instances", which is at least
consistent with the definition of "module" given at
http://www.open-mpi.org/faq/?category=developers#ompi-terminology

-Paul





-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: usnic BTL MPI_T pvar scheme

2013-11-05 Thread George Bosilca
I like the idea. I do have some questions, not necessarily about your
proposal itself, but about how we can use the information you propose
to expose.

First, regarding the extension of this concept to multi-BTL runs:
granted, we will have to have a local indexing of BTLs (I'm not
concerned about that). But how do we ensure the naming is globally
consistent (in the sense that all processes in the job agree that
usnic0 is index 0) even in a heterogeneous environment? As an example,
some of our clusters have 1 NIC on some nodes and 2 on others. Of
course we can say we don't guarantee consistent naming, but for tools
trying to understand communication issues in distributed environments,
having a global view is a clear plus.

Another question is about the level of detail. I wonder whether this
level of detail is really needed, or whether providing an aggregate
pvar would be enough in most cases. The problem I see here is the lack
of topological knowledge at the upper level: seeing a large number of
messages on a particular BTL might suggest that something is wrong
inside the implementation, when in fact that BTL is simply the only one
connecting a subset of peers. Without us exposing this information,
I'm afraid a tool might get the wrong picture.

Thanks,
  George.





Re: [OMPI devel] RFC: usnic BTL MPI_T pvar scheme

2013-11-05 Thread Jeff Squyres (jsquyres)
On Nov 5, 2013, at 2:54 PM, Paul Hargrove  wrote:

> If this approach is to be adopted by other components (and perhaps other 
> MPIs), then it would be important for the enumeration variable name to be 
> derived in a UNIFORM way:
> __SOMETHING
> Without a fixed value for "SOMETHING" somebody will need to read sources (or 
> documentation) to make the connection.

This is a good point; we got a similar piece of feedback from the MPI tools 
group.

How about naming the state variable "_"?  And then that 
will apply to all "_*" pvars.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: usnic BTL MPI_T pvar scheme

2013-11-05 Thread Jeff Squyres (jsquyres)
On Nov 5, 2013, at 2:59 PM, George Bosilca  wrote:

> I have a question regarding the extension of this concept to multi-BTL
> runs. Granted we will have to have a local indexing of BTL (I'm not
> concerned about this). But how do we ensure the naming is globally
> consistent (in the sense that all processes in the job will agree that
> usnic0 is index 0) even when we have a heterogeneous environment?

The MPI_T pvars are local-only, so even if index 0 is usnic_0 in proc A while 
index 0 is usnic_3 in proc B, it shouldn't matter.  More specifically: these 
values only have meaning within the process from which they were gathered.

I guess I'm trying to say that there's no need to ensure a globally consistent 
ordering between processes.  ...unless I'm missing something?

> As
> an example some of our clusters have 1 NIC on some nodes, and 2 on
> others. Of course we can say we don't guarantee consistent naming, but
> for tools trying to understand communication issues on distributed
> environments having a global view is a clear plus.

A good point.  But even with globally consistent ordering, you don't know that 
usnic_0 in process A communicates with usnic_0 in process B (indeed, we run 
some QA cases here at Cisco where we deliberately ensure that usnic_X in 
process A is on the same subnet as usnic_Y in process B, where X!=Y, and 
everything still works properly).

> Another question is about the level of details. I wonder if this level
> of details is really needed, or providing the aggregate pvar will be
> enough in most cases. The problem I see here is the lack of
> topological knowledge at the upper level. Seeing a large number of
> messages on a particular BTL might suggest that something is wrong
> inside the implementation, when in fact the BTL is the only one
> connecting a subset of peers. Without us exposing this information,
> I'm afraid the tool might get the wrong picture ...

I think network-level information can only be used to infer indirect 
conclusions about upper-layer MPI semantics.  But these counters were never 
intended to convey MPI-application-level semantic information; they were 
intended to expose what is happening on your underlying network -- something 
that OS-bypass networks don't otherwise provide.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: usnic BTL MPI_T pvar scheme

2013-11-05 Thread Paul Hargrove
On Tue, Nov 5, 2013 at 6:00 PM, Jeff Squyres (jsquyres)
wrote:

> On Nov 5, 2013, at 2:54 PM, Paul Hargrove  wrote:
>
> > If this approach is to be adopted by other components (and perhaps other
> MPIs), then it would be important for the enumeration variable name to be
> derived in a UNIFORM way:
> > __SOMETHING
> > Without a fixed value for "SOMETHING" somebody will need to read sources
> (or documentation) to make the connection.
>
> This is a good point; we got a similar piece of feedback from the MPI
> tools group.
>
> How about naming the state variable "_"?  And then
> that will apply to all "_*" pvars.



Hmm...  not sure how that jibes with the "principle of least astonishment".
Other than that, "_SOMETHING" == "" seems like a solution that totally
avoids the problems associated with words like "device" (which might imply
something about h/w architecture) or "instance" (with potential
implications regarding s/w architecture).

So, on balance: +0.9  (my other 0.1 goes to "_enum", for the "principle of
least astonishment".)

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: usnic BTL MPI_T pvar scheme

2013-11-05 Thread Jeff Squyres (jsquyres)
Hmm. "_enum" has possibilities.

How about using a * in the name to represent where the match is?  E.g.,
btl_usnic_*_enum?

It's a string, so it's not just limited to letters and underscores.

Sent from my phone. No type good.



Re: [OMPI devel] RFC: usnic BTL MPI_T pvar scheme

2013-11-05 Thread Paul Hargrove
I stand by my previous "vote":

"btl_usnic" gets 90% of my vote.
"btl_usnic_enum" gets 10%.
"btl_usnic_*_enum" gets nada.

Rationale:
While Jeff is correct that the string can legally contain '*', I would
imagine that users would like to have the ability to use wildcards (or even
full regular expressions) when interacting with their tools.  For that
reason I'd suggest sticking to just letters, digits and underscore.

-Paul





-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900