Re: [OMPI devel] Remaining MTT errors

2015-09-14 Thread Gilles Gouaillardet

Ralph,

the collective/op, collective/op_mpifh, collective/op_usempi, 
group/group, onesided/c_lock_illegal and random/attr-error-code tests fail 
because your contrib/platform/intel/bend/linux.conf contains the 
following line


mpi_param_check = 0

and this is not handled correctly by the ibm test suite.

for example, op.c handles the cases where
- mpi_param_check is disabled at configure time
- mpi_param_check is disabled at runtime, via the mca command line or an 
environment variable

*but*
mpi_param_check being disabled by the config file is not supported.

if you run
mpirun --mca mpi_param_check 0 ...
or
mpirun --mca mpi_param_check 1 ...
or
comment out the mpi_param_check = ... line in your config file

this test would run just fine (!)
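
To make the missing case concrete, here is a minimal sketch of how a test could recognize the setting in an MCA parameter file when no environment variable is set. Both helper names are hypothetical and are not taken from op.c or the ibm test suite; the "environment first, then file, then enabled" precedence and the default of 1 are assumptions made for illustration only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if `line` has the form "<name> = <integer>"
 * (MCA parameter file syntax, '#' starts a comment), store the integer
 * in *value and return 1; otherwise return 0. */
int parse_param_line(const char *line, const char *name, int *value)
{
    assert(line != NULL && name != NULL && value != NULL);
    size_t len = strlen(name);

    while (isspace((unsigned char)*line)) line++;   /* skip leading blanks */
    if (*line == '#' || strncmp(line, name, len) != 0) return 0;
    line += len;
    while (isspace((unsigned char)*line)) line++;
    if (*line != '=') return 0;                     /* also rejects longer names */
    line++;

    char *end;
    long v = strtol(line, &end, 10);
    if (end == line) return 0;                      /* no digits after '=' */
    *value = (int)v;
    return 1;
}

/* Assumed precedence: a non-empty environment value (for example the
 * content of OMPI_MCA_mpi_param_check) wins; otherwise use the value
 * found in the file, if any; otherwise assume checking is enabled. */
int effective_param_check(const char *env_value,
                          int file_has_value, int file_value)
{
    if (env_value != NULL && *env_value != '\0') return atoi(env_value) != 0;
    if (file_has_value) return file_value != 0;
    return 1;
}
```

A test would feed each line of the parameter file to parse_param_line() and pass the result, together with getenv("OMPI_MCA_mpi_param_check"), to effective_param_check().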

that leads to a few questions :
1) should we handle this scenario (e.g. check config file) in mtt test 
itself ? (and how ? via MPIT ? )

2) should we handle this scenario before running the test ?
(e.g. ompi_info ... --all | grep mpi_param_check, and force 
OMPI_MCA_mpi_param_check=0 environment variable if mpi_param_check is 
disabled)

3) should we handle this scenario in ompi itself ?
(e.g. if the param config file contains a definition and no related 
environment variable is set, then force the environment variable but do 
not propagate it)


random/attr-error-code only checks mpi_param_check at configure time, and 
I will fix that now


for now, I suggest you comment out the mpi_param_check = 0 line in your 
linux.conf file


Cheers,

Gilles

On 9/12/2015 9:51 AM, Ralph Castain wrote:

Hi folks

I’ve closed all the holes I can find in the PMIx integration, and 
things look pretty good overall. There are a handful of failures still 
being seen - most of them involving what appear to be unrelated code. 
I’m not entirely sure I understand the source of the errors, and could 
really use some help to determine (a) if these are in any way related 
to PMIx, and if so (b) how.


The errors from my MTT run are here: 
http://mtt.open-mpi.org/index.php?do_redir=2256


Any help diagnosing these problems would be greatly appreciated
Ralph



___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2015/09/18013.php




Re: [OMPI devel] oshmem examples cannot be built

2015-09-14 Thread Igor

Hello,
Checked on #6a8fad4 and it looks fine.
$ git show -1
commit 6a8fad49e57007cf8edce4ad8406f724d4e11f8f
Author: Ralph Castain 
Date:   Fri Sep 11 02:01:25 2015 -0700

Configure:
--disable-vt --enable-orterun-prefix-by-default --enable-oshmem 
--with-slurm --with-pmi --enable-debug

Build examples:
$ env PATH=$PWD/../install/bin:$PATH make
mpicc -g hello_c.c -o hello_c
mpicc -g ring_c.c -o ring_c
mpicc -g connectivity_c.c -o connectivity_c
make[1]: Entering directory 
`/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[2]: Entering directory 
`/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'

mpifort -g hello_mpifh.f -o hello_mpifh
mpifort -g ring_mpifh.f -o ring_mpifh
make[2]: Leaving directory 
`/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[1]: Leaving directory 
`/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[1]: Entering directory 
`/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[2]: Entering directory 
`/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'

shmemcc -g hello_oshmem_c.c -o hello_oshmem
make[2]: Leaving directory 
`/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[2]: Entering directory 
`/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'

shmemcc -g ring_oshmem_c.c -o ring_oshmem
make[2]: Leaving directory 
`/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[2]: Entering directory 
`/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'


Igor

On 14.09.2015 2:34, Jeff Squyres (jsquyres) wrote:

I actually tested manually before I replied -- I did a "make" in the examples 
dir, and it used shmemcc.  So something went wonky in Ralph's build for some reason...?



On Sep 12, 2015, at 11:52 AM, Paul Hargrove  wrote:

Another FYI.
In my Aug 26 build of master (openmpi-dev-2371-gea935df.tar.bz2) running "make" 
in the examples directory *did* use shmemcc:

make[2]: Entering directory 
`/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u2/BLD/examples'
shmemcc -g ring_oshmem_c.c -o ring_oshmem

So, something has changed if mpicc is being used today.

-Paul

On Sat, Sep 12, 2015 at 5:58 AM, Jeff Squyres (jsquyres)  
wrote:
On Sep 11, 2015, at 9:00 PM, Ralph Castain  wrote:

FWIW: shmemcc is just a symlink to mpicc, and I don’t see any -loshmem in that 
--showme output

$ shmemcc --showme
gcc -I/home/jsquyres/bogus/include -pthread -Wl,-rpath 
-Wl,/home/jsquyres/bogus/lib -Wl,--enable-new-dtags -L/home/jsquyres/bogus/lib 
-loshmem -lmpi

The actual argv[0] of the executable should be determining which data file is 
used to populate the underlying argv.
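
As a rough illustration of that argv[0] lookup, the sketch below derives a data file name from the name the wrapper was invoked as. The function name is hypothetical and the "<name>-wrapper-data.txt" pattern is an assumption for illustration; this is not the actual wrapper compiler source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: strip any directory part from argv0 and write
 * "<basename>-wrapper-data.txt" into out. Returns 0 on success and -1
 * if the result does not fit into out_len bytes. */
int wrapper_data_file(const char *argv0, char *out, size_t out_len)
{
    assert(argv0 != NULL && out != NULL);

    const char *slash = strrchr(argv0, '/');
    const char *base = (slash != NULL) ? slash + 1 : argv0;

    int n = snprintf(out, out_len, "%s-wrapper-data.txt", base);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

With this scheme the same binary, reached through a symlink named shmemcc, would select a different data file (and so different link flags) than when it is invoked as mpicc.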

Probably best to open a github issue on this and assign to the OSHMEM devs to 
figure out what is going on here...?



On Sep 11, 2015, at 5:43 PM, Ralph Castain  wrote:

I typed “make” - the Makefile determines what to call. I suspect it isn’t 
calling the right thing



On Sep 11, 2015, at 4:17 PM, Jeff Squyres (jsquyres)  wrote:

Shouldn't you be using shmemcc, not mpicc?



On Sep 11, 2015, at 7:01 PM, Ralph Castain  wrote:

On current master:

03:57:56  (topic/pmix) /home/common/openmpi/foobar/examples$ make ring_oshmem_c
mpicc -g ring_oshmem_c.c -o ring_oshmem_c
/tmp/ccfqcVje.o: In function `main':
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:20: undefined reference to 
`start_pes'
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:21: undefined reference to 
`_my_pe'
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:22: undefined reference to 
`_num_pes'
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:32: undefined reference to 
`shmem_int_put'
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:44: undefined reference to 
`shmem_int_wait_until'
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:49: undefined reference to 
`shmem_int_put'
collect2: error: ld returned 1 exit status
make: *** [ring_oshmem_c] Error 1
03:58:51  (topic/pmix) /home/common/openmpi/foobar/examples$ mpicc --showme
gcc -I/home/common/openmpi/build/foobar/include/openmpi 
-I/home/common/openmpi/build/foobar/include/openmpi/opal/mca/hwloc/hwloc1110/hwloc/include 
-I/home/common/openmpi/build/foobar/include/openmpi/opal/mca/event/libevent2022/libevent 
-I/home/common/openmpi/build/foobar/include/openmpi/opal/mca/event/libevent2022/libevent/include 
-I/home/common/openmpi/build/foobar/include -pthread -Wl,-rpath 
-Wl,/home/common/openmpi/build/foobar/lib -Wl,--enable-new-dtags 
-L/home/common/openmpi/build/foobar/lib -lmpi
03:59:12  (topic/pmix) /home/common/openmpi/foobar/examples$

None of the oshmem examples can be built - all fail with the same error. My 
configure:

enable_orterun_prefix_by_default=yes
enable_mpi_thread_multiple=no
enable_mem_debug=no
enable_mem_profile=no
enable_debug_symbols=yes
enable_binaries=yes
enable_heterogeneous=no
enable_picky=yes
enable_debug=yes
enable_shared=yes
enable_static=no
enable_memchecker=no
enable_ipv6=no
enable_mpi_fortran=yes
enable_mpi_cxx=no
enable_mpi_cxx_seek=no
enable_cxx_exceptions=no
enable

[OMPI devel] MPI-3.1 books now available

2015-09-14 Thread Jeff Squyres (jsquyres)
The MPI-3.1 standard is now available (at cost) in hardcover:

http://blogs.cisco.com/performance/mpi-3-1-books-now-available-in-hardcover

Enjoy.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] Remaining MTT errors

2015-09-14 Thread Jeff Squyres (jsquyres)
On Sep 14, 2015, at 12:40 AM, Gilles Gouaillardet  wrote:
> 
> that leads to a few questions :
> 1) should we handle this scenario (e.g. check config file) in mtt test itself 
> ? (and how ? via MPIT ? )

The tests (sorta) tried to handle this.

I got some inspiration from Gilles' initial commit, and added a bit to the test 
suite this morning: it checks MPI_T to see if mpi_param_check is 0 or not.  
This is a robust, reliable way to know.  I wrapped it all up in an ompitest 
utility function, and edited each of the above tests to use it.

Hopefully, that should fix the problem.

> 2) should we handle this scenario before running the test ?
> (e.g. ompi_info ... --all | grep mpi_param_check, and force 
> OMPI_MCA_mpi_param_check=0 environment variable if mpi_param_check is 
> disabled)
> 3) should we handle this scenario in ompi itself ?
> (e.g. if the param config file contains a definition, and no related, 
> environment variable is set, then force the environment variable but do not 
> propagate it)
> 
> random/attr-error-code only check mpi_param_check at configure time, and i 
> will fix that from now
> 
> for now, i suggest you comment the mpi_param_check = 0 line from your 
> linux.conf file
> 
> Cheers,
> 
> Gilles
> 
> On 9/12/2015 9:51 AM, Ralph Castain wrote:
>> Hi folks
>> 
>> I’ve closed all the holes I can find in the PMIx integration, and things 
>> look pretty good overall. There are a handful of failures still being seen - 
>> most of them involving what appear to be unrelated code. I’m not entirely 
>> sure I understand the source of the errors, and could really use some help 
>> to determine (a) if these are in any way related to PMIx, and if so (b) how.
>> 
>> The errors from my MTT run are here:  
>> http://mtt.open-mpi.org/index.php?do_redir=2256
>> 
>> Any help diagnosing these problems would be greatly appreciated
>> Ralph


-- 
Jeff Squyres



Re: [OMPI devel] HWLOC issue

2015-09-14 Thread George Bosilca
Brice,

I confirm your patch solves the issue I reported earlier for OMPI. I did
not try it on a standalone HWLOC, so I am not sure that it maintains the
coherency of the output. If you want I can give it a try.

Thanks,
  George.
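
For readers following the quoted explanation below (a bridge on bus 22 claiming to contain bus 00), here is a simplified sketch of the kind of sanity check that would flag such a bridge. The struct and function names are hypothetical, the rule assumes conventional bus numbering where the buses behind a bridge are numbered higher than the bus the bridge sits on, and this is not the actual hwloc patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a PCI bridge. */
struct pci_bridge {
    unsigned bus;          /* bus the bridge device itself sits on */
    unsigned secondary;    /* first bus number behind the bridge */
    unsigned subordinate;  /* last bus number behind the bridge */
};

/* True if `bus` lies in the range the bridge claims to contain. */
bool bridge_contains_bus(const struct pci_bridge *b, unsigned bus)
{
    assert(b != NULL);
    return bus >= b->secondary && bus <= b->subordinate;
}

/* Assumed sanity rule: the downstream range must be well formed and must
 * start above the bridge's own bus. A bridge that claims a lower-numbered
 * bus (such as the root bus) would become the parent of its own ancestor,
 * which is a cycle when building the tree. */
bool bridge_range_is_sane(const struct pci_bridge *b)
{
    assert(b != NULL);
    return b->secondary > b->bus && b->subordinate >= b->secondary;
}
```

A bridge on bus 0x22 that reports secondary bus 0x00 fails this check, so an insertion routine could skip it rather than hit the assertion.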


On Thu, Sep 10, 2015 at 6:08 PM, Brice Goglin  wrote:

> Try this patch (it applies to hwloc v1.9-v1.11, it should be OK against
> OMPI's tree).
> Your bridge 22:00.0 says it contains the master bus 00. It causes a cycle
> in hwloc's insert algorithm, caught by the assertion. The patch just
> removes this invalid bridge entirely.
>
> Brice
>
>
>
> Le 10/09/2015 21:23, George Bosilca a écrit :
>
> It used to work. Now I don't know exactly when I last updated the trunk
> version on the cluster, but not more than 10 days ago.
>
> lstopo complains with the same assert. Interestingly enough, the same
> binary succeeds on the other nodes of the same cluster ...
>
>   George.
>
>
> On Thu, Sep 10, 2015 at 3:20 PM, Brice Goglin 
> wrote:
>
>> Did it work on the same machine before? Or did OMPI enable hwloc's PCI
>> discovery recently?
>>
>> Does lstopo complain the same?
>>
>> Brice
>>
>>
>>
>> Le 10/09/2015 21:10, George Bosilca a écrit :
>>
>> With the current trunk version I keep getting an assert deep down in
>> orted.
>>
>> orted:
>> ../../../../../../../ompi/opal/mca/hwloc/hwloc1110/hwloc/src/pci-common.c:177:
>> hwloc_pci_try_insert_siblings_below_new_bridge: Assertion `comp !=
>> HWLOC_PCI_BUSID_SUPERSET' failed.
>>
>> The stack looks like this:
>>
>> [dancer18:21100] *** Process received signal ***
>> [dancer18:21100] Signal: Aborted (6)
>> [dancer18:21100] Signal code:  (-6)
>> [dancer18:21100] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7fc22ce61710]
>> [dancer18:21100] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x7fc22caf0625]
>> [dancer18:21100] [ 2] /lib64/libc.so.6(abort+0x175)[0x7fc22caf1e05]
>> [dancer18:21100] [ 3] /lib64/libc.so.6(+0x2b74e)[0x7fc22cae974e]
>> [dancer18:21100] [ 4]
>> /lib64/libc.so.6(__assert_perror_fail+0x0)[0x7fc22cae9810]
>> [dancer18:21100] [ 5]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xb0a62)[0x7fc22ddc6a62]
>> [dancer18:21100] [ 6]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xb0b60)[0x7fc22ddc6b60]
>> [dancer18:21100] [ 7]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc1110_hwloc_insert_pci_device_list+0x8f)[0x7fc22ddc724c]
>> [dancer18:21100] [ 8]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xbf2d6)[0x7fc22ddd52d6]
>> [dancer18:21100] [ 9]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xd22f7)[0x7fc22dde82f7]
>> [dancer18:21100] [10]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc1110_hwloc_topology_load+0x1a3)[0x7fc22dde8ee1]
>> [dancer18:21100] [11]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc_base_get_topology+0x80)[0x7fc22ddb6ece]
>> [dancer18:21100] [12]
>> /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_ess_base_orted_setup+0x127)[0x7fc22e0b3523]
>> [dancer18:21100] [13]
>> /home/bosilca/opt/trunk/debug/lib/openmpi/mca_ess_env.so(+0xe45)[0x7fc22c6bbe45]
>> [dancer18:21100] [14]
>> /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_init+0x2c6)[0x7fc22e06b55a]
>> [dancer18:21100] [15]
>> /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_daemon+0x5c1)[0x7fc22e09a895]
>> [dancer18:21100] [16] /home/bosilca/opt/trunk/debug/bin/orted[0x40082a]
>> [dancer18:21100] [17]
>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fc22cadcd5d]
>> [dancer18:21100] [18] /home/bosilca/opt/trunk/debug/bin/orted[0x4006e9]
>>
>> Any ideas?
>>
>>   George.