Paul,

While looking at the issue, we noticed that we were missing some code that 
deals with MTL priorities.

PR 409 (https://github.com/open-mpi/ompi-release/pull/409) is attempting to fix 
that.

Hopefully, this will also fix the error you encountered.

Thanks again,
Yohann

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Paul Hargrove
Sent: Wednesday, July 22, 2015 12:07 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] 1.10.0rc2

Yohann,

Things run fine with those additional flags.
In fact, adding just "--mca pml cm" is sufficient to eliminate the SEGV.

-Paul

On Wed, Jul 22, 2015 at 8:49 AM, Burette, Yohann <yohann.bure...@intel.com> wrote:
Hi Paul,

Thank you for doing all this testing!

About 1), it’s hard for me to see whether it’s a problem with mtl:ofi or with 
how OMPI selects the components to use.
Could you please run your test again with “--mca mtl ofi --mca mtl_ofi_provider 
sockets --mca pml cm”?
The idea is that if it still fails, then we have a problem with either mtl:ofi 
or the OFI/sockets provider. If it works, then there is an issue with how OMPI 
selects what component to use.
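
To illustrate why forcing the components separates the two failure modes, here is a minimal sketch of priority-based component selection in C. Everything in it (`demo_component`, `demo_select`, the `usable` flag) is hypothetical and is not Open MPI's actual MCA selection code; it only assumes that, absent an explicit request, the usable component with the highest priority wins.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical component record; NOT Open MPI's real MCA structures. */
struct demo_component {
    const char *name;
    int priority;   /* higher value wins when nothing is forced */
    int usable;     /* nonzero if the component initialized successfully */
};

/*
 * Return the component to use, or NULL if none qualifies.
 * If forced_name is non-NULL, priorities are ignored and only the
 * component with that name is considered (analogous in spirit to
 * naming a component explicitly on the command line).
 */
const struct demo_component *
demo_select(const struct demo_component *c, size_t n, const char *forced_name)
{
    const struct demo_component *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!c[i].usable) {
            continue;               /* skip components that failed to init */
        }
        if (forced_name != NULL) {
            if (strcmp(c[i].name, forced_name) == 0) {
                return &c[i];       /* explicit request bypasses priorities */
            }
            continue;
        }
        if (best == NULL || c[i].priority > best->priority) {
            best = &c[i];           /* keep the highest priority seen so far */
        }
    }
    return best;
}
```

In this sketch, a run that fails with `forced_name == NULL` but succeeds when a name is forced points at the priority logic rather than at the component itself, which is the reasoning behind the test requested above.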

I just tried 1.10.0rc2 with the latest libfabric (master) and it seems to work 
fine.

Yohann

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Paul Hargrove
Sent: Wednesday, July 22, 2015 1:05 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] 1.10.0rc2

1.10.0rc2 looks mostly good to me, but I still found some issues.


1) New to this round of testing, I have built mtl:ofi with gcc, pgi, icc, 
clang, open64 and studio compilers.
I have only the sockets provider in libfabric (v1.0.0 and 1.1.0rc2).
However, unless I pass "-mca mtl ^ofi" to mpirun I get a SEGV from a callback 
invoked in opal_progress().
Gdb did not give a function name for the callback, but the PC looks valid.


2) Of the several compilers I tried, only pgi-13.10 failed to compile mtl:ofi:

        /bin/sh ../../../../libtool  --tag=CC   --mode=compile pgcc 
-DHAVE_CONFIG_H -I. 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi
 -I../../../../opal/include -I../../../../orte/include 
-I../../../../ompi/include -I../../../../oshmem/include 
-I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen 
-I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen  
-I/usr/common/ftg/libfabric/1.1.0rc2p1/include 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2
 -I../../../.. 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/orte/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/oshmem/include
   
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/hwloc/hwloc191/hwloc/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/hwloc/hwloc191/hwloc/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/event/libevent2021/libevent/include
  -g  -c -o mtl_ofi_component.lo 
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c
libtool: compile:  pgcc -DHAVE_CONFIG_H -I. 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi
 -I../../../../opal/include -I../../../../orte/include 
-I../../../../ompi/include -I../../../../oshmem/include 
-I../../../../opal/mca/hwloc/hwloc191/hwloc/include/private/autogen 
-I../../../../opal/mca/hwloc/hwloc191/hwloc/include/hwloc/autogen 
-I/usr/common/ftg/libfabric/1.1.0rc2p1/include 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2
 -I../../../.. 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/orte/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/oshmem/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/hwloc/hwloc191/hwloc/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/hwloc/hwloc191/hwloc/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/opal/mca/event/libevent2021/libevent/include
 
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/BLD/opal/mca/event/libevent2021/libevent/include
 -g -c 
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c
  -fpic -DPIC -o .libs/mtl_ofi_component.o
PGC-S-0060-opal_convertor_clone is not a member of this struct or union (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.10.0rc2-linux-x86_64-pgi-13.10/openmpi-1.10.0rc2/ompi/mca/mtl/ofi/mtl_ofi_component.c: 51)
pgcc-Fatal-/global/scratch2/sd/hargrove/pgi-13.10/linux86-64/13.10/bin/pgc TERMINATED by signal 11

Since this ends with a SEGV in the compiler, I don't think this is an issue 
with the C code, just a plain compiler bug.
At least pgi-9.0-4 and pgi-10.9 compiled the code just fine.


3) As I noted in a separate email, there are some newly uncovered issues in the 
embedded hwloc w/ pgi and -m32.
However, I had not tested such configurations previously, and all indications 
are that these issues have existed for a while.
Brice is on vacation, so there will not be an official hwloc fix for this issue 
until next week at the earliest.
[The upside is that I now have coverage for eight additional x86 configurations 
(true x86 or x86-64 w/ -m32).]


4) I noticed a couple of warnings somebody might want to investigate:
  openmpi-1.10.0rc2/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:2323:59: warning: format specifies type 'int' but the argument has type 'struct ibv_qp *' [-Wformat]
  "openmpi-1.10.0rc2/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c", line 2471: warning: improper pointer/integer combination: arg #3
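
The first warning is the usual -Wformat complaint: a pointer is passed where the format string expects an int. Below is a minimal sketch of that class of bug and a common fix. It uses a hypothetical `demo_qp` struct and `demo_format_qp` helper, not the real `struct ibv_qp` or the actual code at line 2323, so whether the intended output there is the pointer or a field of the struct remains an open question.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue-pair handle; NOT the real struct ibv_qp. */
struct demo_qp {
    unsigned int qp_num;
};

/*
 * Format a debug line describing qp into buf.
 * Returns the value returned by snprintf().
 *
 * The buggy form would be:
 *     snprintf(buf, len, "qp %d", qp);
 * which passes a pointer for a %d conversion (undefined behavior, and
 * the source of a -Wformat warning).
 *
 * Two ways to fix it, both shown in the one call below:
 *   - print the pointer itself with %p, cast to void *
 *   - print an integer field of the struct with a matching conversion
 */
int demo_format_qp(char *buf, size_t len, struct demo_qp *qp)
{
    return snprintf(buf, len, "qp %p (num %u)", (void *)qp, qp->qp_num);
}
```

Which of the two is right depends on what the message was meant to report; the sketch only shows that the conversion specifier and the argument type have to agree.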



Also worth noting:

The ConnectX and ConnectIB XRC detection logic appears to be working as 
expected on multiple systems.

I also have learned that pgi-9.0-4 is not a conforming C99 compiler when passed 
-m32, which is not Open MPI's fault.


And as before...
+ I am currently without any SPARC platforms
+ Several qemu-emulated ARM and MIPS tests will complete by morning (though I 
have some ARM successes already)


-Paul

On Tue, Jul 21, 2015 at 12:29 PM, Ralph Castain <r...@open-mpi.org> wrote:
Hey folks

1.10.0rc2 is now out for review - excepting the library version numbers, this 
should be the final version. Please take a quick gander and let me know of any 
problems.

http://www.open-mpi.org/software/ompi/v1.10/

Ralph


_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2015/07/17670.php



--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2015/07/17681.php


