Re: [OMPI devel] Remaining MTT errors
Ralph,

the collective/op, collective/op_mpifh, collective/op_usempi, group/group, onesided/c_lock_illegal and random/attr-error-code tests fail because your contrib/platform/intel/bend/linux.conf contains the following line:

mpi_param_check = 0

and this is not handled correctly by the ibm test suite.

For example, in op.c, we handle
- mpi_param_check is disabled at configure time
- mpi_param_check is disabled at runtime, via the mca cli or an environment variable
*but* mpi_param_check being disabled by the config file is not supported.

If you run
mpirun --mca mpi_param_check 0 ...
or
mpirun --mca mpi_param_check 1 ...
or comment out the mpi_param_check = ... line in your config file, this test would run just fine (!)

That leads to a few questions:
1) should we handle this scenario (e.g. check the config file) in the mtt test itself? (and how? via MPIT?)
2) should we handle this scenario before running the test? (e.g. ompi_info ... --all | grep mpi_param_check, and force the OMPI_MCA_mpi_param_check=0 environment variable if mpi_param_check is disabled)
3) should we handle this scenario in ompi itself? (e.g. if the param config file contains a definition, and no related environment variable is set, then force the environment variable but do not propagate it)

random/attr-error-code only checks mpi_param_check at configure time, and I will fix that.

For now, I suggest you comment out the mpi_param_check = 0 line in your linux.conf file.

Cheers,

Gilles

On 9/12/2015 9:51 AM, Ralph Castain wrote:
Hi folks

I’ve closed all the holes I can find in the PMIx integration, and things look pretty good overall. There are a handful of failures still being seen - most of them involving what appear to be unrelated code. I’m not entirely sure I understand the source of the errors, and could really use some help to determine (a) if these are in any way related to PMIx, and if so (b) how.
The errors from my MTT run are here:
http://mtt.open-mpi.org/index.php?do_redir=2256

Any help diagnosing these problems would be greatly appreciated
Ralph

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18013.php
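
To make option 2) above a bit more concrete, here is a minimal Python sketch of a pre-run step that copies a value from a config file into the environment when no environment variable is set. It is only an illustration, not something from the test suite: the helper names (read_conf_value, env_for_test) are hypothetical, the file is assumed to hold simple "name = value" lines with "#" comments, and the sketch assumes an already-set environment variable should win over the file.

```python
def read_conf_value(conf_text, name):
    """Return the last value assigned to `name` in 'name = value' text, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            value = val.strip()
    return value


def env_for_test(conf_text, environ, name="mpi_param_check"):
    """Copy `environ`, setting OMPI_MCA_<name> from the config text if it is unset."""
    env = dict(environ)
    var = "OMPI_MCA_" + name
    if var not in env:  # assumption: an explicit environment setting takes precedence
        conf_value = read_conf_value(conf_text, name)
        if conf_value is not None:
            env[var] = conf_value
    return env


if __name__ == "__main__":
    conf = "# platform defaults\nmpi_param_check = 0\n"
    print(env_for_test(conf, {}))
```

The returned dictionary could then be passed as the environment of the test process, so that the existing runtime check in the tests sees the same value the config file would have applied.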
Re: [OMPI devel] oshmem examples cannot be built
Hello,

Checked on #6a8fad4 and it looks fine.

$ git show -1
commit 6a8fad49e57007cf8edce4ad8406f724d4e11f8f
Author: Ralph Castain
Date: Fri Sep 11 02:01:25 2015 -0700

Configure:
--disable-vt --enable-orterun-prefix-by-default --enable-oshmem --with-slurm --with-pmi --enable-debug

Build examples:
$ env PATH=$PWD/../install/bin:$PATH make
mpicc -g hello_c.c -o hello_c
mpicc -g ring_c.c -o ring_c
mpicc -g connectivity_c.c -o connectivity_c
make[1]: Entering directory `/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[2]: Entering directory `/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
mpifort -g hello_mpifh.f -o hello_mpifh
mpifort -g ring_mpifh.f -o ring_mpifh
make[2]: Leaving directory `/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[1]: Leaving directory `/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[1]: Entering directory `/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[2]: Entering directory `/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
shmemcc -g hello_oshmem_c.c -o hello_oshmem
make[2]: Leaving directory `/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[2]: Entering directory `/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
shmemcc -g ring_oshmem_c.c -o ring_oshmem
make[2]: Leaving directory `/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'
make[2]: Entering directory `/hpc/home/USERS/ivanovi/prj/ii-ompi-trunk/examples'

Igor

On 14.09.2015 2:34, Jeff Squyres (jsquyres) wrote:
I actually tested manually before I replied -- I did a "make" in the examples dir, and it used shmemcc. So something went wonky in Ralph's build for some reason...?

On Sep 12, 2015, at 11:52 AM, Paul Hargrove wrote:
Another FYI.
In my Aug 26 build of master (openmpi-dev-2371-gea935df.tar.bz2) running "make" in the examples directory *did* use shmemcc:

make[2]: Entering directory `/home/phargrov/OMPI/openmpi-master-linux-x86_64-ss12u2/BLD/examples'
shmemcc -g ring_oshmem_c.c -o ring_oshmem

So, something has changed if mpicc is being used today.

-Paul

On Sat, Sep 12, 2015 at 5:58 AM, Jeff Squyres (jsquyres) wrote:
On Sep 11, 2015, at 9:00 PM, Ralph Castain wrote:
FWIW: shmemcc is just a symlink to mpicc, and I don’t see any -loshmem in that --showme output

$ shmemcc --showme
gcc -I/home/jsquyres/bogus/include -pthread -Wl,-rpath -Wl,/home/jsquyres/bogus/lib -Wl,--enable-new-dtags -L/home/jsquyres/bogus/lib -loshmem -lmpi

The actual argv[0] of the executable should be determining which data file is used to populate the underlying argv.

Probably best to open a github issue on this and assign to the OSHMEM devs to figure out what is going on here...?

On Sep 11, 2015, at 5:43 PM, Ralph Castain wrote:
I typed “make” - the Makefile determines what to call. I suspect it isn’t calling the right thing

On Sep 11, 2015, at 4:17 PM, Jeff Squyres (jsquyres) wrote:
Shouldn't you be using shmemcc, not mpicc?
On Sep 11, 2015, at 7:01 PM, Ralph Castain wrote:
On current master:

03:57:56 (topic/pmix) /home/common/openmpi/foobar/examples$ make ring_oshmem_c
mpicc -g ring_oshmem_c.c -o ring_oshmem_c
/tmp/ccfqcVje.o: In function `main':
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:20: undefined reference to `start_pes'
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:21: undefined reference to `_my_pe'
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:22: undefined reference to `_num_pes'
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:32: undefined reference to `shmem_int_put'
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:44: undefined reference to `shmem_int_wait_until'
/home/common/openmpi/foobar/examples/ring_oshmem_c.c:49: undefined reference to `shmem_int_put'
collect2: error: ld returned 1 exit status
make: *** [ring_oshmem_c] Error 1

03:58:51 (topic/pmix) /home/common/openmpi/foobar/examples$ mpicc --showme
gcc -I/home/common/openmpi/build/foobar/include/openmpi -I/home/common/openmpi/build/foobar/include/openmpi/opal/mca/hwloc/hwloc1110/hwloc/include -I/home/common/openmpi/build/foobar/include/openmpi/opal/mca/event/libevent2022/libevent -I/home/common/openmpi/build/foobar/include/openmpi/opal/mca/event/libevent2022/libevent/include -I/home/common/openmpi/build/foobar/include -pthread -Wl,-rpath -Wl,/home/common/openmpi/build/foobar/lib -Wl,--enable-new-dtags -L/home/common/openmpi/build/foobar/lib -lmpi

03:59:12 (topic/pmix) /home/common/openmpi/foobar/examples$

None of the oshmem examples can be built - all fail with the same error.

My configure:
enable_orterun_prefix_by_default=yes
enable_mpi_thread_multiple=no
enable_mem_debug=no
enable_mem_profile=no
enable_debug_symbols=yes
enable_binaries=yes
enable_heterogeneous=no
enable_picky=yes
enable_debug=yes
enable_shared=yes
enable_static=no
enable_memchecker=no
enable_ipv6=no
enable_mpi_fortran=yes
enable_mpi_cxx=no
enable_mpi_cxx_seek=no
enable_cxx_exceptions=no
enable
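
The point made in this thread about argv[0] selecting the wrapper data can be illustrated with a small Python sketch. This is not the Open MPI wrapper implementation: the WRAPPER_DATA table and build_command helper are hypothetical, and the library lists simply mirror the two --showme outputs quoted in the thread (mpicc ending in -lmpi, shmemcc ending in -loshmem -lmpi).

```python
import os

# Hypothetical per-wrapper data, keyed by the name the program was invoked as.
WRAPPER_DATA = {
    "mpicc": {"compiler": "gcc", "libs": ["-lmpi"]},
    "shmemcc": {"compiler": "gcc", "libs": ["-loshmem", "-lmpi"]},
}


def build_command(argv):
    """Build the underlying compiler command line from the invoked name and user args."""
    name = os.path.basename(argv[0])  # a symlink keeps its own name in argv[0]
    data = WRAPPER_DATA.get(name)
    if data is None:
        raise KeyError(f"no wrapper data for {name!r}")
    return [data["compiler"], *argv[1:], *data["libs"]]


if __name__ == "__main__":
    print(" ".join(build_command(["shmemcc", "-g", "ring_oshmem_c.c", "-o", "ring_oshmem"])))
    print(" ".join(build_command(["mpicc", "-g", "ring_oshmem_c.c", "-o", "ring_oshmem_c"])))
```

In this sketch the same source file only gets -loshmem when the command is invoked as shmemcc, which matches the undefined-reference errors seen when the example was built with mpicc.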
[OMPI devel] MPI-3.1 books now available
The MPI-3.1 standard is now available (at cost) in hardcover: http://blogs.cisco.com/performance/mpi-3-1-books-now-available-in-hardcover Enjoy. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] Remaining MTT errors
On Sep 14, 2015, at 12:40 AM, Gilles Gouaillardet wrote:
>
> that leads to a few questions :
> 1) should we handle this scenario (e.g. check config file) in mtt test itself
> ? (and how ? via MPIT ? )

The tests (sorta) tried to handle this.

I got some inspiration from Gilles' initial commit, and added a bit to the test suite this morning: it checks MPI_T to see if mpi_param_check is 0 or not. This is a robust, reliable way to know. I wrapped it all up in an ompitest utility function, and edited each of the above tests to use it.

Hopefully, that should fix the problem.

> 2) should we handle this scenario before running the test ?
> (e.g. ompi_info ... --all | grep mpi_param_check, and force
> OMPI_MCA_mpi_param_check=0 environment variable if mpi_param_check is
> disabled)
> 3) should we handle this scenario in ompi itself ?
> (e.g. if the param config file contains a definition, and no related,
> environment variable is set, then force the environment variable but do not
> propagate it)
>
> random/attr-error-code only check mpi_param_check at configure time, and i
> will fix that from now
>
> for now, i suggest you comment the mpi_param_check = 0 line from your
> linux.conf file
>
> Cheers,
>
> Gilles
>
> On 9/12/2015 9:51 AM, Ralph Castain wrote:
>> Hi folks
>>
>> I’ve closed all the holes I can find in the PMIx integration, and things
>> look pretty good overall. There are a handful of failures still being seen -
>> most of them involving what appear to be unrelated code. I’m not entirely
>> sure I understand the source of the errors, and could really use some help
>> to determine (a) if these are in any way related to PMIx, and if so (b) how.
>>
>> The errors from my MTT run are here:
>> http://mtt.open-mpi.org/index.php?do_redir=2256
>>
>> Any help diagnosing these problems would be greatly appreciated
>> Ralph

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] HWLOC issue
Brice,

I confirm your patch solves the issue I reported earlier for OMPI. I did not try it on a standalone HWLOC, so I am not sure that it maintains the coherency of the output. If you want I can give it a try.

Thanks,
George.

On Thu, Sep 10, 2015 at 6:08 PM, Brice Goglin wrote:

> Try this patch (it applies to hwloc v1.9-v1.11, it should be OK against
> OMPI's tree).
> Your bridge 22:00.0 says it contains the master bus 00. It causes a cycle
> in hwloc's insert algorithm, caught by the assertion. The patch just
> removes this invalid bridge entirely.
>
> Brice
>
> On 10/09/2015 21:23, George Bosilca wrote:
>
> It used to work. Now I don't know exactly when I last updated the trunk
> version on the cluster, but not more than 10 days ago.
>
> lstopo complains with the same assert. Interestingly enough, the same
> binary succeeds on the other nodes of the same cluster ...
>
> George.
>
> On Thu, Sep 10, 2015 at 3:20 PM, Brice Goglin wrote:
>
>> Did it work on the same machine before? Or did OMPI enable hwloc's PCI
>> discovery recently?
>>
>> Does lstopo complain the same?
>>
>> Brice
>>
>> On 10/09/2015 21:10, George Bosilca wrote:
>>
>> With the current trunk version I keep getting an assert deep down in
>> orted.
>>
>> orted: ../../../../../../../ompi/opal/mca/hwloc/hwloc1110/hwloc/src/pci-common.c:177: hwloc_pci_try_insert_siblings_below_new_bridge: Assertion `comp != HWLOC_PCI_BUSID_SUPERSET' failed.
>>
>> The stack looks like this:
>>
>> [dancer18:21100] *** Process received signal ***
>> [dancer18:21100] Signal: Aborted (6)
>> [dancer18:21100] Signal code: (-6)
>> [dancer18:21100] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7fc22ce61710]
>> [dancer18:21100] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x7fc22caf0625]
>> [dancer18:21100] [ 2] /lib64/libc.so.6(abort+0x175)[0x7fc22caf1e05]
>> [dancer18:21100] [ 3] /lib64/libc.so.6(+0x2b74e)[0x7fc22cae974e]
>> [dancer18:21100] [ 4] /lib64/libc.so.6(__assert_perror_fail+0x0)[0x7fc22cae9810]
>> [dancer18:21100] [ 5] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xb0a62)[0x7fc22ddc6a62]
>> [dancer18:21100] [ 6] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xb0b60)[0x7fc22ddc6b60]
>> [dancer18:21100] [ 7] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc1110_hwloc_insert_pci_device_list+0x8f)[0x7fc22ddc724c]
>> [dancer18:21100] [ 8] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xbf2d6)[0x7fc22ddd52d6]
>> [dancer18:21100] [ 9] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xd22f7)[0x7fc22dde82f7]
>> [dancer18:21100] [10] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc1110_hwloc_topology_load+0x1a3)[0x7fc22dde8ee1]
>> [dancer18:21100] [11] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc_base_get_topology+0x80)[0x7fc22ddb6ece]
>> [dancer18:21100] [12] /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_ess_base_orted_setup+0x127)[0x7fc22e0b3523]
>> [dancer18:21100] [13] /home/bosilca/opt/trunk/debug/lib/openmpi/mca_ess_env.so(+0xe45)[0x7fc22c6bbe45]
>> [dancer18:21100] [14] /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_init+0x2c6)[0x7fc22e06b55a]
>> [dancer18:21100] [15] /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_daemon+0x5c1)[0x7fc22e09a895]
>> [dancer18:21100] [16] /home/bosilca/opt/trunk/debug/bin/orted[0x40082a]
>> [dancer18:21100] [17] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fc22cadcd5d]
>> [dancer18:21100] [18] /home/bosilca/opt/trunk/debug/bin/orted[0x4006e9]
>>
>> Any ideas?
>>
>> George.
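
As an illustration of the kind of sanity check described in this thread (a bridge that claims to contain the master bus 00 is dropped before insertion), here is a minimal Python sketch. It is not Brice's patch and not hwloc code: the Bridge fields, the is_valid rule, and the example bus numbers are assumptions made for the example only.

```python
from dataclasses import dataclass


@dataclass
class Bridge:
    bus: int          # bus the bridge device itself sits on
    secondary: int    # first bus number behind the bridge
    subordinate: int  # last bus number behind the bridge


def is_valid(bridge, root_bus=0):
    """Assumed rule: a bridge may not list the root bus or its own bus as being behind it."""
    behind = range(bridge.secondary, bridge.subordinate + 1)
    return root_bus not in behind and bridge.bus not in behind


def drop_invalid(bridges):
    """Keep only bridges that pass the sanity check, preserving their order."""
    return [b for b in bridges if is_valid(b)]


if __name__ == "__main__":
    bridges = [
        Bridge(bus=0x00, secondary=0x01, subordinate=0x05),
        Bridge(bus=0x22, secondary=0x00, subordinate=0x30),  # hypothetical values: claims bus 00
    ]
    for b in drop_invalid(bridges):
        print(f"keeping bridge on bus {b.bus:02x}")
```

Filtering before building the tree means the insertion code never has to place a parent under one of its own descendants, which is the cycle the assertion was reported to catch.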