I tried last night's v1.8 tarball (openmpi-v1.8.3-272-g4e4f997.tar.bz2) with
the Studio Compilers (v12.3) on a Solaris/x86-64 system.
Configure args (other than prefix) were:

--enable-debug --with-verbs \
CC=cc CXX=CC FC=f90 \
CFLAGS=-m64 --with-wrapper-cflags=-m64 \
FCFLAGS=-m64 --with-wrapper-fcflags=-m64 \
CXXFLAGS='-m64 -library=stlport4' \
--with-wrapper-cxxflags='-m64 -library=stlport4'


When running ring_c I see the following:

$ mpirun -mca btl sm,self,openib -np 2 -host pcp-j-19,pcp-j-20 examples/ring_c
[pcp-j-20:24250] mca_oob_tcp_accept: accept() failed: Error 0 (0).
[pcp-j-20:24250] *** Process received signal ***
[pcp-j-20:24250] Signal: Segmentation Fault (11)
[pcp-j-20:24250] Signal code: Address not mapped (1)
[pcp-j-20:24250] Failing at address: fffffd7fe45bf227
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/libopen-pal.so.6.2.1'opal_backtrace_print+0x2d
[0xfffffd7fe450a91d]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/libopen-pal.so.6.2.1'show_stackframe+0xafd
[0xfffffd7fe450066d]
/lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fff202cc6]
/lib/amd64/libc.so.1'call_user_handler+0x2aa [0xfffffd7fff1f648e]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/libopen-pal.so.6.2.1'opal_hwloc172_hwloc_get_obj_by_depth+0x1d7
[0xfffffd7fe45bf227] [Signal 11 (SEGV)]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/libopen-pal.so.6.2.1'opal_hwloc172_hwloc_get_root_obj+0x24
[0xfffffd7fe4560504]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/libopen-pal.so.6.2.1'opal_hwloc_base_get_nbobjs_by_type+0xec
[0xfffffd7fe45653ec]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/openmpi/mca_rmaps_round_robin.so'orte_rmaps_rr_byobj+0x252
[0xfffffd7fe1c9ddd2]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/openmpi/mca_rmaps_round_robin.so'orte_rmaps_rr_map+0x65e
[0xfffffd7fe1c912be]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/libopen-rte.so.7.0.5'orte_rmaps_base_map_job+0xdce
[0xfffffd7fe276aace]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/libopen-pal.so.6.2.1'event_process_active_single_queue+0x1dc
[0xfffffd7fe453afbc]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/libopen-pal.so.6.2.1'event_process_active+0xb1
[0xfffffd7fe453b361]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/lib/libopen-pal.so.6.2.1'opal_libevent2021_event_base_loop+0x339
[0xfffffd7fe453bc79]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/bin/orterun'orterun+0x1d0e
[0x4101fe]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/bin/orterun'main+0x20
[0x408ca0]
/shared/OMPI/openmpi-1.8-latest-solaris11-x64-ib-ss12u3-nightly/INST/bin/orterun'0x8b0b
[0x408b0b]
[pcp-j-20:24250] *** End of error message ***


dbx gives me:

t@1 (l@1) terminated by signal SEGV (no mapping at the fault address)
Current function is opal_hwloc172_hwloc_get_obj_by_depth
   74     return topology->levels[depth][idx];
(dbx) where
current thread: t@1
=>[1] opal_hwloc172_hwloc_get_obj_by_depth(topology = 0x4d49e0, depth = 0,
idx = 0), line 74 in "traversal.c"
  [2] opal_hwloc172_hwloc_get_root_obj(topology = 0x4d49e0), line 118 in
"helper.h"
  [3] opal_hwloc_base_get_nbobjs_by_type(topo = 0x4d49e0, target =
OPAL_HWLOC172_hwloc_OBJ_CORE, cache_level = 0, rtype = '\003'), line 833 in
"hwloc_base_util.c"
  [4] orte_rmaps_rr_byobj(jdata = 0x43c940, app = 0x483fe0, node_list =
0xfffffd7fffdff4b0, num_slots = 2, num_procs = 2U, target =
OPAL_HWLOC172_hwloc_OBJ_CORE, cache_level = 0), line 495 in
"rmaps_rr_mappers.c"
  [5] orte_rmaps_rr_map(jdata = 0x43c940), line 165 in "rmaps_rr.c"
  [6] orte_rmaps_base_map_job(fd = -1, args = 4, cbdata = 0x4a3300), line
277 in "rmaps_base_map_job.c"
  [7] event_process_active_single_queue(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at
0xfffffd7fe453afbc
  [8] event_process_active(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at
0xfffffd7fe453b361
  [9] opal_libevent2021_event_base_loop(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at
0xfffffd7fe453bc79
  [10] orterun(argc = 9, argv = 0xfffffd7fffdffa58), line 1081 in
"orterun.c"
  [11] main(argc = 9, argv = 0xfffffd7fffdffa58), line 13 in "main.c"
(dbx) print depth
depth = 0
(dbx) print index
index = 0xfffffd7fff19c174


Pretty sure that index value is bogus.
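
For illustration only, here is a minimal sketch of a guarded version of the
kind of lookup shown at line 74 of "traversal.c" in the dbx output. The struct
and the names toy_topology and toy_get_obj_by_depth are hypothetical stand-ins
and are not hwloc's real types or API; the sketch only shows returning NULL
for an unpopulated or out-of-range level instead of dereferencing it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a topology; NOT hwloc's real struct. */
struct toy_topology {
    unsigned nb_levels;         /* number of valid entries in levels[] */
    unsigned *level_nbobjects;  /* number of objects at each level */
    void ***levels;             /* levels[depth][idx] points to an object */
};

/* Return the object at (depth, idx), or NULL if the topology is not
 * populated or either index is out of range. */
void *toy_get_obj_by_depth(const struct toy_topology *t,
                           unsigned depth, unsigned idx)
{
    if (t == NULL || t->levels == NULL || t->level_nbobjects == NULL)
        return NULL;            /* topology never loaded */
    if (depth >= t->nb_levels)
        return NULL;            /* no such level */
    if (t->levels[depth] == NULL || idx >= t->level_nbobjects[depth])
        return NULL;            /* level empty or idx past its end */
    return t->levels[depth][idx];
}
```

With a check like this, a caller asking for depth 0, idx 0 on a topology whose
level array was never filled in would get NULL back rather than a SEGV.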

-Paul



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
