Following up as I promised... My results on NERSC's small Cray XE6 (the test/dev rack "Grace", rather than the full-sized "Hopper") match those I get on the Cray XC30 (Edison), and do not match what Ralph reports for LANL's XE6.
An attempt to build/link hello_c.c results in unresolved symbols from libnuma, libxpmem and libugni. A complete list is available if it matters.

This is still with last night's openmpi-1.9a1r27905 tarball, and the following 1-line mod to the platform file:
- enable_shared=yes
+ enable_shared=no

If it will help determine what is going on, I can probably get NERSC accounts for any of the DOE Lab folks easily. They will only get access to the full-sized XE6 (Hopper) for now.

In case any of these are helpful clues to the difference(s):

$ module list
Currently Loaded Modulefiles:
  1) modules/3.2.6.6                          18) dvs/1.8.6_0.9.0-1.0401.1401.1.120
  2) torque/4.1.4-snap.201211160904           19) csa/3.0.0-1_2.0401.37452.4.50.gem
  3) moab/6.0.4                               20) job/1.5.5-0.1_2.0401.35380.1.10.gem
  4) xtpe-network-gemini                      21) xpmem/0.1-2.0401.36790.4.3.gem
  5) cray-mpich2/5.6.0                        22) gni-headers/2.1-1.0401.5675.4.4.gem
  6) atp/1.6.0                                23) dmapp/3.2.1-1.0401.5983.4.5.gem
  7) xe-sysroot/4.1.40                        24) pmi/4.0.0-1.0000.9282.69.4.gem
  8) switch/1.0-1.0401.36779.2.72.gem         25) ugni/4.0-1.0401.5928.9.5.gem
  9) shared-root/1.0-1.0401.37253.3.50.gem    26) udreg/2.3.2-1.0401.5929.3.3.gem
 10) pdsh/2.26-1.0401.37449.1.1.gem           27) xt-libsci/12.0.00
 11) nodehealth/5.0-1.0401.38460.12.18.gem    28) gcc/4.7.2
 12) lbcd/2.1-1.0401.35360.1.2.gem            29) xt-asyncpe/5.16
 13) hosts/1.0-1.0401.35364.1.115.gem         30) eswrap/1.0.10
 14) configuration/1.0-1.0401.35391.1.2.gem   31) xtpe-mc12
 15) ccm/2.2.0-1.0401.37254.2.142             32) cray-shmem/5.6.0
 16) audit/1.0.0-1.0401.37969.2.32.gem        33) PrgEnv-gnu/4.1.40
 17) rca/1.0.0-2.0401.38656.2.2.gem

-Paul

On Fri, Jan 25, 2013 at 5:50 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Ralph,
>
> Again our results differ.
> I did NOT need the additional #include to link a simple test program.
> I am going to try on our XE6 shortly.
>
> I suspect you are right about something in the configury being different.
> I am willing to try a few more nightly tarballs if somebody thinks they
> have the proper fix.
>
> -Paul
>
>
> On Fri, Jan 25, 2013 at 5:45 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>>
>> On Jan 25, 2013, at 5:12 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> Ralph,
>>
>> Those are the result of the missing -lnuma that Nathan already identified
>> earlier as missing in BOTH 1.7 and trunk.
>> I see MORE missing symbols, which include ones from libxpmem and libugni.
>>
>>
>> Alright, let me try to be clearer. We are missing -lnuma as well as the
>> required include file - both are necessary to remove the issue.
>>
>> I find both the xpmem and ugni libraries *are* correctly included in both
>> 1.7 and trunk. It could be a case of finding them in the configury, but we
>> are finding them *and* correctly including them on the XE6.
>>
>> HTH
>> Ralph
>>
>>
>> -Paul
>>
>>
>> On Fri, Jan 25, 2013 at 4:59 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>>
>>> On Jan 25, 2013, at 4:53 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> > The repeated libs is something we obviously should fix, but all the
>>> > libs are there - including lustre. I guess those were dropped due to the
>>> > shared lib setting, so we probably should fix that in the platform file.
>>> >
>>> > Perhaps that is the cause of Nathan's issue? shrug...regardless, apps
>>> > build and run just fine using mpicc for me.
>>>
>>> Correction - turns out I misspoke.
>>> I find apps *don't* build correctly with this setup:
>>>
>>> mpicc -g hello_c.c -o hello_c
>>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_set_area_membind':
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1116: undefined reference to `mbind'
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1135: undefined reference to `mbind'
>>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_get_area_membind':
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1337: undefined reference to `get_mempolicy'
>>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_find_kernel_max_numnodes':
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1239: undefined reference to `get_mempolicy'
>>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_set_thisthread_membind':
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1183: undefined reference to `set_mempolicy'
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1194: undefined reference to `migrate_pages'
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1206: undefined reference to `set_mempolicy'
>>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_get_thisthread_membind':
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1284: undefined reference to `get_mempolicy'
>>> /usr/aprojects/hpctools/rhc/build/lib/libopen-pal.a(topology-linux.o): In function `hwloc_linux_find_kernel_max_numnodes':
>>> /lscratch1/rcastain/openmpi-1.9a1/opal/mca/hwloc/hwloc151/hwloc/src/topology-linux.c:1239: undefined reference to `get_mempolicy'
>>> collect2: ld returned 1 exit status
>>> make: *** [hello_c] Error 1
>>>
>>> So it looks like hwloc is borked when built static.
>>>
>>> Sigh
>>> Ralph
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
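
Since the thread keeps comparing lists of unresolved symbols across machines, a small helper to boil a link log down to its unique undefined symbols may make the comparison easier. The following is only a sketch: the function name `summarize_undefined` and the `KNOWN_LIBS` table are hypothetical, and the table covers only the four symbols that appear in the quoted log (all of which libnuma provides). Symbols from libxpmem or libugni are not listed in this thread, so they would land under "unknown" until someone extends the table.

```python
import re
from collections import defaultdict

# Hypothetical lookup table (an assumption for illustration, not part of
# Open MPI): the four symbols from the quoted log, all provided by libnuma.
KNOWN_LIBS = {
    "mbind": "libnuma",
    "get_mempolicy": "libnuma",
    "set_mempolicy": "libnuma",
    "migrate_pages": "libnuma",
}

# GNU ld reports: ...: undefined reference to `symbol'
UNDEF_RE = re.compile(r"undefined reference to [`']([^`']+)'")


def summarize_undefined(log_text):
    """Group the unique undefined symbols in a link log by providing library.

    Symbols missing from KNOWN_LIBS are collected under "unknown".
    """
    groups = defaultdict(set)
    for symbol in UNDEF_RE.findall(log_text):
        groups[KNOWN_LIBS.get(symbol, "unknown")].add(symbol)
    return {lib: sorted(symbols) for lib, symbols in groups.items()}


if __name__ == "__main__":
    # Abbreviated sample; paths shortened from the log quoted above.
    sample = (
        "topology-linux.c:1116: undefined reference to `mbind'\n"
        "topology-linux.c:1337: undefined reference to `get_mempolicy'\n"
        "topology-linux.c:1183: undefined reference to `set_mempolicy'\n"
        "topology-linux.c:1194: undefined reference to `migrate_pages'\n"
        "topology-linux.c:1239: undefined reference to `get_mempolicy'\n"
    )
    for lib, symbols in summarize_undefined(sample).items():
        print(f"{lib}: {', '.join(symbols)}")
```

Running it over the full output of a failed `mpicc -g hello_c.c -o hello_c` on each system would give one short per-library list per machine, which should be easier to diff than the raw linker output.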