Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in trunk/ompi/mca: btl/usnic rte

2014-02-27 Thread Ralph Castain
Just to clarify my point, since the 1.7 branch was mentioned in this thread. I 
didn't worry about USNIC calling abort because, as Jeff pointed out, we do so 
in other places. However, I do believe that we shouldn't be doing so (including 
in orte) because it isn't the role of a library to abort the process. We should 
report errors upward to the app and let it decide how to respond.

That said, I know we initially did it because we hit places where we couldn't 
propagate an error code (e.g., in a void routine called by the event lib). I've 
been working on resolving that in orte, but it still isn't complete.

I figure we should do the same in the MPI layer, recognizing that it will take 
time to complete.
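
To illustrate reporting an error upward from a void routine instead of aborting, here is a minimal C sketch. The callback records the first failure in a context struct and the caller inspects it afterwards. All names (sketch_ctx_t, sketch_on_event, sketch_progress, the SKETCH_* codes) are hypothetical and are not Open MPI or ORTE symbols; this is one possible shape, not what orte actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; not Open MPI's real constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_UNREACH = -12 };

/* Context handed to the callback; it carries the first error seen. */
typedef struct {
    int first_error;
} sketch_ctx_t;

/* A void callback, as an event library might invoke it. It cannot return
 * a status, so it records the failure instead of calling abort(). */
static void sketch_on_event(int event_ok, void *arg)
{
    sketch_ctx_t *ctx = (sketch_ctx_t *)arg;
    assert(ctx != NULL);
    if (!event_ok && ctx->first_error == SKETCH_SUCCESS) {
        ctx->first_error = SKETCH_ERR_UNREACH;
    }
}

/* Stand-in for one pass of an event loop: fires the callback once per
 * event, then hands the recorded status back to the caller. */
static int sketch_progress(const int *events, size_t n, sketch_ctx_t *ctx)
{
    for (size_t i = 0; i < n; ++i) {
        sketch_on_event(events[i], ctx);
    }
    return ctx->first_error;
}
```

The caller of sketch_progress() gets an ordinary return code and can pass it further up, so the decision about how to respond stays with the application.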


On Feb 27, 2014, at 1:48 PM, Rolf vandeVaart  wrote:

> It could.  I added that argument 4 years ago to support my failover work 
> with the BFO.  It was a way for a BTL to pass a string back to the PML 
> identifying itself, so that verbose output would show what was happening. 
> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
>> (jsquyres)
>> Sent: Thursday, February 27, 2014 4:22 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in
>> trunk/ompi/mca: btl/usnic rte
>> 
>> Speaking of which, shouldn't the OB1 error handler send the error message
>> string that it received as the 4th param to ompi_rte_abort() so that it can
>> be printed out?
>> 
>> 
>> Index: ompi/mca/pml/ob1/pml_ob1.c
>> ===
>> 
>> --- ompi/mca/pml/ob1/pml_ob1.c   (revision 30877)
>> +++ ompi/mca/pml/ob1/pml_ob1.c   (working copy)
>> @@ -780,7 +780,7 @@
>>return;
>>}
>> #endif /* OPAL_CUDA_SUPPORT */
>> -ompi_rte_abort(-1, NULL);
>> +ompi_rte_abort(-1, btlinfo);
>> }
>> 
>> #if OPAL_ENABLE_FT_CR== 0
>> 
>> 
>> 
>> On Feb 27, 2014, at 1:12 PM, Jeff Squyres (jsquyres) wrote:
>> 
>>> FWIW, the following BTLs all have calls to abort() or ompi_rte_abort()
>>> within them:
>>> 
>>> - usnic
>>> - openib
>>> - portals4
>>> - the btl base itself
>>> 
>>> 
>>> On Feb 27, 2014, at 7:16 AM, Ralph Castain  wrote:
>>> 
>>>>> The majority of places we call abort in this commit is actually down in a
>>>>> progress thread.  We didn't think it was safe to call the PML error function
>>>>> in a progress thread -- is that incorrect?
>>>> 
>>>> If not, then we probably should create some mechanism for doing so. I
>>>> agree with George that we shouldn't call abort inside a library
>>> 
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
> ---
> This email message is for the sole use of the intended recipient(s) and may 
> contain
> confidential information.  Any unauthorized review, use, disclosure or 
> distribution
> is prohibited.  If you are not the intended recipient, please contact the 
> sender by
> reply email and destroy all copies of the original message.
> ---



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in trunk/ompi/mca: btl/usnic rte

2014-02-27 Thread Rolf vandeVaart
It could.  I added that argument 4 years ago to support my failover work 
with the BFO.  It was a way for a BTL to pass a string back to the PML 
identifying itself, so that verbose output would show what was happening. 

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
>(jsquyres)
>Sent: Thursday, February 27, 2014 4:22 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in
>trunk/ompi/mca: btl/usnic rte
>
>Speaking of which, shouldn't the OB1 error handler send the error message
>string that it received as the 4th param to ompi_rte_abort() so that it can be
>printed out?
>
>
>Index: ompi/mca/pml/ob1/pml_ob1.c
>===
>
>--- ompi/mca/pml/ob1/pml_ob1.c (revision 30877)
>+++ ompi/mca/pml/ob1/pml_ob1.c (working copy)
>@@ -780,7 +780,7 @@
> return;
> }
> #endif /* OPAL_CUDA_SUPPORT */
>-ompi_rte_abort(-1, NULL);
>+ompi_rte_abort(-1, btlinfo);
> }
>
> #if OPAL_ENABLE_FT_CR== 0
>
>
>
>On Feb 27, 2014, at 1:12 PM, Jeff Squyres (jsquyres) wrote:
>
>> FWIW, the following BTLs all have calls to abort() or ompi_rte_abort()
>> within them:
>>
>> - usnic
>> - openib
>> - portals4
>> - the btl base itself
>>
>>
>> On Feb 27, 2014, at 7:16 AM, Ralph Castain  wrote:
>>
>>>> The majority of places we call abort in this commit is actually down in a
>>>> progress thread.  We didn't think it was safe to call the PML error function
>>>> in a progress thread -- is that incorrect?
>>>
>>> If not, then we probably should create some mechanism for doing so. I
>>> agree with George that we shouldn't call abort inside a library


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in trunk/ompi/mca: btl/usnic rte

2014-02-27 Thread Jeff Squyres (jsquyres)
Speaking of which, shouldn't the OB1 error handler send the error message 
string that it received as the 4th param to ompi_rte_abort() so that it can be 
printed out?


Index: ompi/mca/pml/ob1/pml_ob1.c
===
--- ompi/mca/pml/ob1/pml_ob1.c  (revision 30877)
+++ ompi/mca/pml/ob1/pml_ob1.c  (working copy)
@@ -780,7 +780,7 @@
 return;
 }
 #endif /* OPAL_CUDA_SUPPORT */
-ompi_rte_abort(-1, NULL);
+ompi_rte_abort(-1, btlinfo);
 }
 
 #if OPAL_ENABLE_FT_CR== 0



On Feb 27, 2014, at 1:12 PM, Jeff Squyres (jsquyres)  wrote:

> FWIW, the following BTLs all have calls to abort() or ompi_rte_abort() within 
> them:
> 
> - usnic
> - openib
> - portals4
> - the btl base itself
> 
> 
> On Feb 27, 2014, at 7:16 AM, Ralph Castain  wrote:
> 
>>> The majority of places we call abort in this commit is actually down in a 
>>> progress thread.  We didn't think it was safe to call the PML error 
>>> function in a progress thread -- is that incorrect?
>> 
>> If not, then we probably should create some mechanism for doing so. I agree 
>> with George that we shouldn't call abort inside a library
> 
> 





Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in trunk/ompi/mca: btl/usnic rte

2014-02-27 Thread Jeff Squyres (jsquyres)
FWIW, the following BTLs all have calls to abort() or ompi_rte_abort() within 
them:

- usnic
- openib
- portals4
- the btl base itself


On Feb 27, 2014, at 7:16 AM, Ralph Castain  wrote:

>> The majority of places we call abort in this commit is actually down in a 
>> progress thread.  We didn't think it was safe to call the PML error function 
>> in a progress thread -- is that incorrect?
> 
> If not, then we probably should create some mechanism for doing so. I agree 
> with George that we shouldn't call abort inside a library





Re: [OMPI devel] Open MPI's 'mpiexec' trash output of program being aborted?

2014-02-27 Thread Ralph Castain
Not intentional, but I suspect it's a race condition you are seeing. I'll have 
to look to see where it is getting lost.

On Feb 27, 2014, at 9:32 AM, Paul Kapinos  wrote:

> Dear Open MPI developer,
> 
> 
> Please take a look at the attached 'program'.
> 
> In this program, we try to catch signals sent from outside, and "handle" 
> them. Different signals should produce different output.
> 
> When you start this file directly, or using 'mpiexec' from Intel MPI, and 
> then abort it by Ctrl-C, the output "SIGINT  received" is written to file and 
> to StdOut.
> 
> When you start this file using Open MPI's 'mpiexec', the output is written to 
> file, but *not* to StdOut - 'mpiexec' seems to swallow it.
> 
> Is that behaviour intentional? (it is quite inconvenient, huh)
> 
> Best
> 
> Paul Kapinos
> 
> P.S. Tested versions: 1.6.5, 1.7.4
> 
> 
> 
> 
> 
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, IT Center
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915



[OMPI devel] Open MPI's 'mpiexec' trash output of program being aborted?

2014-02-27 Thread Paul Kapinos

Dear Open MPI developer,


Please take a look at the attached 'program'.

In this program, we try to catch signals sent from outside, and "handle" them. 
Different signals should produce different output.


When you start this file directly, or using 'mpiexec' from Intel MPI, and then 
abort it by Ctrl-C, the output "SIGINT  received" is written to file and to StdOut.


When you start this file using Open MPI's 'mpiexec', the output is written to 
file, but *not* to StdOut - 'mpiexec' seems to swallow it.


Is that behaviour intentional? (it is quite inconvenient, huh)

Best

Paul Kapinos

P.S. Tested versions: 1.6.5, 1.7.4





--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
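#!/usr/bin/perl
use strict;
use warnings;
use Sys::Hostname;

# Append to a log file so it can be compared with what reaches stdout.
open(my $log, '>>', 'testoutput.txt')
    or die "cannot open testoutput.txt: $!";

# Disable buffering on stdout.
$| = 1;
print "running on ", hostname(), "\n";
print {$log} "running on ", hostname(), "\n";

# Report the signal on stdout and in the log file, then exit cleanly.
$SIG{INT}  = sub { print "SIGINT  received\n"; print {$log} "SIGINT  received\n"; exit 0 };
$SIG{TERM} = sub { print "SIGTERM received\n"; print {$log} "SIGTERM received\n"; exit 0 };

# Wait long enough for a signal to be sent from outside.
sleep 10;

close($log);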




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in trunk/ompi/mca: btl/usnic rte

2014-02-27 Thread Ralph Castain

On Feb 27, 2014, at 6:58 AM, Jeff Squyres (jsquyres)  wrote:

> On Feb 27, 2014, at 3:33 AM, George Bosilca  wrote:
> 
>> I’m concerned about your usage of abort here. Looking at the code I noticed 
>> that you call RTE_ABORT deep inside the BTL stack. This is a significant 
>> divergence from our current behavior (except for USNIC apparently as the 
>> code is now in the 1.7). The BTLs are not deciders, but merely reporters. 
>> Any error should be reported upstream, and will be dealt with at that level.
> 
> The majority of places we call abort in this commit is actually down in a 
> progress thread.  We didn't think it was safe to call the PML error function 
> in a progress thread -- is that incorrect?

If not, then we probably should create some mechanism for doing so. I agree 
with George that we shouldn't call abort inside a library
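
One possible shape for such a mechanism, as a minimal C11 sketch: the progress thread only posts the error into an atomic slot, and the main thread later drains the slot and calls the upper-layer handler from a context where that is safe. Every name here (sketch_post_error, sketch_deliver_pending, sketch_upper_layer_handler) is hypothetical. This is not the PML error callback interface, and it keeps only the first error that was posted.

```c
#include <assert.h>
#include <stdatomic.h>

/* First error posted by the progress thread; 0 means nothing is pending. */
static atomic_int sketch_pending_error;

/* Hypothetical upper-layer handler. A real one would decide how to respond;
 * this one just remembers what it was told. */
static int sketch_last_reported;
static void sketch_upper_layer_handler(int err)
{
    sketch_last_reported = err;
}

/* Called from the progress thread: record the error, never abort. */
static void sketch_post_error(int err)
{
    int expected = 0;
    assert(err != 0);
    /* Only the first error is kept; later ones are dropped in this sketch. */
    (void)atomic_compare_exchange_strong(&sketch_pending_error, &expected, err);
}

/* Called from the main thread, e.g. on each pass through its progress loop.
 * Returns 1 if an error was handed to the handler, 0 otherwise. */
static int sketch_deliver_pending(void (*handler)(int))
{
    int err = atomic_exchange(&sketch_pending_error, 0);
    if (err == 0) {
        return 0;
    }
    handler(err);
    return 1;
}
```

The atomic exchange means the handler runs at most once per posted error, and always on the thread that calls sketch_deliver_pending().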




[OMPI devel] warning in openib BTL

2014-02-27 Thread Jeff Squyres (jsquyres)
I'm also seeing these warnings this morning:

connect/btl_openib_connect_rdmacm.c:369:5: warning: "BTL_OPENIB_RDMACM_IB_ADDR" is not defined




[OMPI devel] autoconf warnings: openib BTL

2014-02-27 Thread Jeff Squyres (jsquyres)
I'm seeing this warning this morning:

-
configure.ac:1139: warning: AC_RUN_IFELSE called without default to allow cross compiling
../../lib/autoconf/general.m4:2748: AC_RUN_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:639: AS_IF is expanded from...
ompi/mca/btl/openib/configure.m4:37: MCA_ompi_btl_openib_CONFIG is expanded from...
config/ompi_mca.m4:571: MCA_CONFIGURE_M4_CONFIG_COMPONENT is expanded from...
config/ompi_mca.m4:352: MCA_CONFIGURE_FRAMEWORK is expanded from...
config/ompi_mca.m4:252: MCA_CONFIGURE_PROJECT is expanded from...
config/ompi_mca.m4:39: OMPI_MCA is expanded from...
configure.ac:1139: the top level
-

Is it necessary to AC_RUN_IFELSE here?  Is AC_CHECK_DECLS not sufficient for 
some reason?

It strikes me that this test you currently have in configure.m4 really should 
be a run-time test, and that all you need in configure.m4 should be an 
AC_CHECK_DECLS to see if AF_IB exists.
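
A rough C sketch of that split, under stated assumptions: AC_CHECK_DECLS defines HAVE_DECL_AF_IB to 1 or 0 at configure time, and the "does it actually work on this host" question moves to a probe that runs when the component starts. The probe type and the sketch_* names are hypothetical. Checking AF_IB via <sys/socket.h> is also only an assumption for this example; the openib BTL may need a different header.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* In a real build, AC_CHECK_DECLS([AF_IB], ...) defines HAVE_DECL_AF_IB to
 * 1 or 0 in the generated config header. This fallback exists only so the
 * sketch compiles on its own. */
#ifndef HAVE_DECL_AF_IB
#  ifdef AF_IB
#    define HAVE_DECL_AF_IB 1
#  else
#    define HAVE_DECL_AF_IB 0
#  endif
#endif

/* Hypothetical run-time probe: returns nonzero if AF_IB works on this host. */
typedef int (*sketch_probe_fn)(void);

/* Compile-time knowledge gates the run-time probe: without the declaration
 * the answer is always "no"; with it, the probe decides. */
static int sketch_af_ib_usable(sketch_probe_fn probe)
{
    assert(probe != NULL);
#if HAVE_DECL_AF_IB
    return probe() != 0;
#else
    (void)probe;
    return 0;
#endif
}

/* Trivial probes used only to exercise the sketch. */
static int sketch_probe_always_fails(void)    { return 0; }
static int sketch_probe_always_succeeds(void) { return 1; }
```

Because nothing is executed at configure time, this arrangement would also avoid the cross-compilation warning quoted above.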




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in trunk/ompi/mca: btl/usnic rte

2014-02-27 Thread Jeff Squyres (jsquyres)
On Feb 27, 2014, at 3:33 AM, George Bosilca  wrote:

> I’m concerned about your usage of abort here. Looking at the code I noticed 
> that you call RTE_ABORT deep inside the BTL stack. This is a significant 
> divergence from our current behavior (except for USNIC apparently as the code 
> is now in the 1.7). The BTLs are not deciders, but merely reporters. Any 
> error should be reported upstream, and will be dealt with at that level.

The majority of places we call abort in this commit is actually down in a 
progress thread.  We didn't think it was safe to call the PML error function in 
a progress thread -- is that incorrect?




Re: [OMPI devel] compile error in v1.7

2014-02-27 Thread Mike Dubman
Yep, now it's fine.
thx


On Thu, Feb 27, 2014 at 4:43 PM, Ralph Castain  wrote:

> you need to update your repo
>
> On Feb 26, 2014, at 9:55 PM, Mike Dubman  wrote:
>
> 07:32:17 make[2]: Entering directory `/scrap/jenkins/workspace/ompi-vendor-gerrit/label/hpc/orte'
> 07:32:17   CC   runtime/orte_finalize.lo
> 07:32:17   CC   runtime/orte_init.lo
> 07:32:17   CC   runtime/orte_locks.lo
> 07:32:17   CC   runtime/orte_globals.lo
> 07:32:18   CC   runtime/orte_quit.lo
> 07:32:18   CC   runtime/data_type_support/orte_dt_compare_fns.lo
> 07:32:18   CC   runtime/data_type_support/orte_dt_copy_fns.lo
> 07:32:19   CC   runtime/data_type_support/orte_dt_print_fns.lo
> 07:32:19   CC   runtime/data_type_support/orte_dt_packing_fns.lo
> 07:32:20   CC   runtime/data_type_support/orte_dt_unpacking_fns.lo
> 07:32:20   CC   runtime/orte_mca_params.lo
> 07:32:20   CC   runtime/orte_wait.lo
> 07:32:21   CC   runtime/orte_cr.lo
> 07:32:21   CC   runtime/orte_data_server.lo
> 07:32:21   CC   runtime/orte_info_support.lo
> 07:32:21   CC   util/error_strings.lo
> 07:32:22   CC   util/name_fns.lo
> 07:32:22   CC   util/proc_info.lo
> 07:32:22   CC   util/session_dir.lo
> 07:32:23   CC   util/show_help.lo
> 07:32:23   CC   util/context_fns.lo
> 07:32:23   CC   util/parse_options.lo
> 07:32:23   CC   util/pre_condition_transports.lo
> 07:32:24   CC   util/hnp_contact.lo
> 07:32:24   LEX  util/hostfile/hostfile_lex.c
> 07:32:24   CC   util/hostfile/hostfile_lex.lo
> 07:32:24   CC   util/hostfile/hostfile.lo
> 07:32:25   CC   util/dash_host/dash_host.lo
> 07:32:26   CC   util/comm/comm.lo
> 07:32:26   CC   util/nidmap.lo
> 07:32:26 util/nidmap.c: In function 'orte_util_decode_pidmap':
> 07:32:26 util/nidmap.c:1033: error: 'OPAL_DB_LOCALRANK' undeclared (first use in this function)
> 07:32:26 util/nidmap.c:1033: error: (Each undeclared identifier is reported only once
> 07:32:26 util/nidmap.c:1033: error: for each function it appears in.)
> 07:32:26 make[2]: *** [util/nidmap.lo] Error 1
> 07:32:26 make[2]: Leaving directory `/scrap/jenkins/workspace/ompi-vendor-gerrit/label/hpc/orte'
> 07:32:26 make[1]: *** [install-recursive] Error 1
> 07:32:26 make[1]: Leaving directory `/scrap/jenkins/workspace/ompi-vendor-gerrit/label/hpc/orte'
> 07:32:26 make: *** [install-recursive] Error 1
>


Re: [OMPI devel] compile error in v1.7

2014-02-27 Thread Ralph Castain
you need to update your repo

On Feb 26, 2014, at 9:55 PM, Mike Dubman  wrote:

> 07:32:17 make[2]: Entering directory 
> `/scrap/jenkins/workspace/ompi-vendor-gerrit/label/hpc/orte'
> 07:32:17   CC   runtime/orte_finalize.lo
> 07:32:17   CC   runtime/orte_init.lo
> 07:32:17   CC   runtime/orte_locks.lo
> 07:32:17   CC   runtime/orte_globals.lo
> 07:32:18   CC   runtime/orte_quit.lo
> 07:32:18   CC   runtime/data_type_support/orte_dt_compare_fns.lo
> 07:32:18   CC   runtime/data_type_support/orte_dt_copy_fns.lo
> 07:32:19   CC   runtime/data_type_support/orte_dt_print_fns.lo
> 07:32:19   CC   runtime/data_type_support/orte_dt_packing_fns.lo
> 07:32:20   CC   runtime/data_type_support/orte_dt_unpacking_fns.lo
> 07:32:20   CC   runtime/orte_mca_params.lo
> 07:32:20   CC   runtime/orte_wait.lo
> 07:32:21   CC   runtime/orte_cr.lo
> 07:32:21   CC   runtime/orte_data_server.lo
> 07:32:21   CC   runtime/orte_info_support.lo
> 07:32:21   CC   util/error_strings.lo
> 07:32:22   CC   util/name_fns.lo
> 07:32:22   CC   util/proc_info.lo
> 07:32:22   CC   util/session_dir.lo
> 07:32:23   CC   util/show_help.lo
> 07:32:23   CC   util/context_fns.lo
> 07:32:23   CC   util/parse_options.lo
> 07:32:23   CC   util/pre_condition_transports.lo
> 07:32:24   CC   util/hnp_contact.lo
> 07:32:24   LEX  util/hostfile/hostfile_lex.c
> 07:32:24   CC   util/hostfile/hostfile_lex.lo
> 07:32:24   CC   util/hostfile/hostfile.lo
> 07:32:25   CC   util/dash_host/dash_host.lo
> 07:32:26   CC   util/comm/comm.lo
> 07:32:26   CC   util/nidmap.lo
> 07:32:26 util/nidmap.c: In function 'orte_util_decode_pidmap':
> 07:32:26 util/nidmap.c:1033: error: 'OPAL_DB_LOCALRANK' undeclared (first use 
> in this function)
> 07:32:26 util/nidmap.c:1033: error: (Each undeclared identifier is reported 
> only once
> 07:32:26 util/nidmap.c:1033: error: for each function it appears in.)
> 07:32:26 make[2]: *** [util/nidmap.lo] Error 1
> 07:32:26 make[2]: Leaving directory 
> `/scrap/jenkins/workspace/ompi-vendor-gerrit/label/hpc/orte'
> 07:32:26 make[1]: *** [install-recursive] Error 1
> 07:32:26 make[1]: Leaving directory 
> `/scrap/jenkins/workspace/ompi-vendor-gerrit/label/hpc/orte'
> 07:32:26 make: *** [install-recursive] Error 1



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in trunk/ompi/mca: btl/usnic rte

2014-02-27 Thread George Bosilca
Guys,

I’m concerned about your usage of abort here. Looking at the code I noticed 
that you call RTE_ABORT deep inside the BTL stack. This is a significant 
divergence from our current behavior (except for USNIC apparently as the code 
is now in the 1.7). The BTLs are not deciders, but merely reporters. Any error 
should be reported upstream, and will be dealt with at that level.

If you want to pursue such a drastic change in the behavior of Open MPI, you 
should definitely put it through an RFC.

  George.

On Feb 26, 2014, at 23:21 , svn-commit-mai...@open-mpi.org wrote:

> Author: jsquyres (Jeff Squyres)
> Date: 2014-02-26 17:21:25 EST (Wed, 26 Feb 2014)
> New Revision: 30860
> URL: https://svn.open-mpi.org/trac/ompi/changeset/30860
> 
> Log:
> Add usnic connectivity-checking agent service.
> 
> Basically: since usnic is a connectionless transport, we do not get
> OS-provided services "for free" that connection-oriented transports
> get, namely: "hey, I wasn't able to make a connection to peer X", and
> "hey, your connection to peer X has died."
> 
> This connectivity-checker runs in a separate progress thread in the
> usnic BTL in local rank 0 on each server.  Upon first send in any
> process, the connectivity-checker agent will send some UDP pings to the
> peer to ensure that we can reach it.  If we can't, we'll abort the job
> with a nice show_help message.
> 
> There's a lengthy comment in btl_usnic_connectivity.h that explains the
> scheme and how it works.
> 
> Reviewed by Dave Goodell.
> 
> cmr=v1.7.5:ticket=#4253



[OMPI devel] compile error in v1.7

2014-02-27 Thread Mike Dubman
07:32:17 make[2]: Entering directory `/scrap/jenkins/workspace/ompi-vendor-gerrit/label/hpc/orte'
07:32:17   CC   runtime/orte_finalize.lo
07:32:17   CC   runtime/orte_init.lo
07:32:17   CC   runtime/orte_locks.lo
07:32:17   CC   runtime/orte_globals.lo
07:32:18   CC   runtime/orte_quit.lo
07:32:18   CC   runtime/data_type_support/orte_dt_compare_fns.lo
07:32:18   CC   runtime/data_type_support/orte_dt_copy_fns.lo
07:32:19   CC   runtime/data_type_support/orte_dt_print_fns.lo
07:32:19   CC   runtime/data_type_support/orte_dt_packing_fns.lo
07:32:20   CC   runtime/data_type_support/orte_dt_unpacking_fns.lo
07:32:20   CC   runtime/orte_mca_params.lo
07:32:20   CC   runtime/orte_wait.lo
07:32:21   CC   runtime/orte_cr.lo
07:32:21   CC   runtime/orte_data_server.lo
07:32:21   CC   runtime/orte_info_support.lo
07:32:21   CC   util/error_strings.lo
07:32:22   CC   util/name_fns.lo
07:32:22   CC   util/proc_info.lo
07:32:22   CC   util/session_dir.lo
07:32:23   CC   util/show_help.lo
07:32:23   CC   util/context_fns.lo
07:32:23   CC   util/parse_options.lo
07:32:23   CC   util/pre_condition_transports.lo
07:32:24   CC   util/hnp_contact.lo
07:32:24   LEX  util/hostfile/hostfile_lex.c
07:32:24   CC   util/hostfile/hostfile_lex.lo
07:32:24   CC   util/hostfile/hostfile.lo
07:32:25   CC   util/dash_host/dash_host.lo
07:32:26   CC   util/comm/comm.lo
07:32:26   CC   util/nidmap.lo
07:32:26 util/nidmap.c: In function 'orte_util_decode_pidmap':
07:32:26 util/nidmap.c:1033: error: 'OPAL_DB_LOCALRANK' undeclared (first use in this function)
07:32:26 util/nidmap.c:1033: error: (Each undeclared identifier is reported only once
07:32:26 util/nidmap.c:1033: error: for each function it appears in.)
07:32:26 make[2]: *** [util/nidmap.lo] Error 1
07:32:26 make[2]: Leaving directory `/scrap/jenkins/workspace/ompi-vendor-gerrit/label/hpc/orte'
07:32:26 make[1]: *** [install-recursive] Error 1
07:32:26 make[1]: Leaving directory `/scrap/jenkins/workspace/ompi-vendor-gerrit/label/hpc/orte'
07:32:26 make: *** [install-recursive] Error 1