[OMPI devel] [2.0.2rc2] FreeBSD-11 run failure

2017-01-04 Thread Paul Hargrove
With the 2.0.2rc2 tarball on FreeBSD-11 (i386 or amd64) I am configuring
with:
 --prefix=... CC=clang CXX=clang++ --disable-mpi-fortran

I get a failure running ring_c:

mpirun -mca btl sm,self -np 2 examples/ring_c
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--
+ exit 1

When I configure with either "--disable-dlopen" OR "--enable-static
--disable-shared" the problem vanishes.
So, I suspect a dlopen-related issue.

I will

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

[OMPI devel] hwloc missing NUMANode object

2017-01-04 Thread Gilles Gouaillardet

Ralph and Brice,


since 
https://github.com/open-mpi/ompi/commit/fe68f2309912ea2afdc3339ff9a3b697f69a2dd1 
we likely set the default binding policy to OPAL_BIND_TO_NUMA



unfortunately, that does not work on my VM (VirtualBox, single socket, 4
cores) since there is no HWLOC_OBJ_NODE there.


with hwloc v1.11


<object type="Machine" complete_cpuset="0x000f" online_cpuset="0x000f" allowed_cpuset="0x000f" local_memory="3975217152">
  <object type="Socket" complete_cpuset="0x000f" online_cpuset="0x000f" allowed_cpuset="0x000f">
    <object type="Cache" complete_cpuset="0x0001" online_cpuset="0x0001" allowed_cpuset="0x0001" cache_size="8388608" depth="3" cache_linesize="64" cache_associativity="16" cache_type="0">
      <object type="Cache" complete_cpuset="0x0001" online_cpuset="0x0001" allowed_cpuset="0x0001" cache_size="262144" depth="2" cache_linesize="64" cache_associativity="8" cache_type="0">
        <object type="Cache" complete_cpuset="0x0001" online_cpuset="0x0001" allowed_cpuset="0x0001" cache_size="32768" depth="1" cache_linesize="64" cache_associativity="8" cache_type="1">
          <object type="Cache" complete_cpuset="0x0001" online_cpuset="0x0001" allowed_cpuset="0x0001" cache_size="32768" depth="1" cache_linesize="64" cache_associativity="8" cache_type="2">
            <object type="Core" complete_cpuset="0x0001" online_cpuset="0x0001" allowed_cpuset="0x0001">
              <object type="PU" complete_cpuset="0x0001" online_cpuset="0x0001" allowed_cpuset="0x0001"/>

[...]




but with the latest hwloc (master branch, which does not yet work with
Open MPI):





<object type="Machine" complete_cpuset="0x000f" allowed_cpuset="0x000f" nodeset="0x0001" complete_nodeset="0x0001" allowed_nodeset="0x0001" gp_index="1">
  <object type="NUMANode" complete_cpuset="0x000f" allowed_cpuset="0x000f" nodeset="0x0001" complete_nodeset="0x0001" allowed_nodeset="0x0001" gp_index="2" local_memory="4294500352">
    <object type="Package" complete_cpuset="0x000f" allowed_cpuset="0x000f" nodeset="0x0001" complete_nodeset="0x0001" allowed_nodeset="0x0001" gp_index="3">
      <object type="L3Cache" complete_cpuset="0x0001" allowed_cpuset="0x0001" nodeset="0x0001" complete_nodeset="0x0001" allowed_nodeset="0x0001" gp_index="9" cache_size="8388608" depth="3" cache_linesize="64" cache_associativity="16" cache_type="0">
        <object type="L2Cache" complete_cpuset="0x0001" allowed_cpuset="0x0001" nodeset="0x0001" complete_nodeset="0x0001" allowed_nodeset="0x0001" gp_index="8" cache_size="262144" depth="2" cache_linesize="64" cache_associativity="8" cache_type="0">
          <object type="L1Cache" complete_cpuset="0x0001" allowed_cpuset="0x0001" nodeset="0x0001" complete_nodeset="0x0001" allowed_nodeset="0x0001" gp_index="6" cache_size="32768" depth="1" cache_linesize="64" cache_associativity="8" cache_type="1">
            <object type="L1iCache" complete_cpuset="0x0001" allowed_cpuset="0x0001" nodeset="0x0001" complete_nodeset="0x0001" allowed_nodeset="0x0001" gp_index="7" cache_size="32768" depth="1" cache_linesize="64" cache_associativity="8" cache_type="2">
              <object type="Core" complete_cpuset="0x0001" allowed_cpuset="0x0001" nodeset="0x0001" complete_nodeset="0x0001" allowed_nodeset="0x0001" gp_index="4">
                <object type="PU" complete_cpuset="0x0001" allowed_cpuset="0x0001" nodeset="0x0001" complete_nodeset="0x0001" allowed_nodeset="0x0001" gp_index="5"/>

[...]




an ugly workaround is to bind to HWLOC_OBJ_PACKAGE when there is no
HWLOC_OBJ_NODE:


diff --git a/orte/mca/rmaps/base/rmaps_base_binding.c b/orte/mca/rmaps/base/rmaps_base_binding.c
index 6786da7..ddc106c 100644
--- a/orte/mca/rmaps/base/rmaps_base_binding.c
+++ b/orte/mca/rmaps/base/rmaps_base_binding.c
@@ -916,6 +916,10 @@ int orte_rmaps_base_compute_bindings(orte_job_t *jdata)
                 bind_depth = hwloc_get_cache_type_depth(node->topology, clvl, (hwloc_obj_cache_type_t)-1);
             } else {
                 bind_depth = hwloc_get_type_depth(node->topology, hwb);
+                if (0 > bind_depth && HWLOC_OBJ_NODE == hwb) {
+                    hwb = HWLOC_OBJ_PACKAGE;
+                    bind_depth = hwloc_get_type_depth(node->topology, hwb);
+                }
             }
             if (0 > bind_depth) {
                 /* didn't find such an object */


or we could check for the existence of HWLOC_OBJ_PACKAGE before setting
the default policy in orte_rmaps_base_map_job().


not to mention it is also possible to generate an XML topology with the
latest hwloc and feed it to mpirun:


mpirun --mca hwloc_base_topo_file topo.xml ...



Brice,


things would be much easier if there were an HWLOC_OBJ_NODE object in
the topology.


could you please consider backporting the relevant changes from master
into the v1.11 branch?



Cheers,


Gilles



[OMPI devel] [2.0.2rc2] opal_fifo hang w/ --enable-osx-builtin-atomics

2017-01-04 Thread Paul Hargrove
On Macs running Yosemite (OS X 10.10 w/ Xcode 7.1) and El Capitan (OS X
10.11 w/ Xcode 8.1) I have configured with
CC=cc CXX=c++ FC=/sw/bin/gfortran --prefix=...
--enable-osx-builtin-atomics

Upon running "make check", the test "opal_fifo" hangs on both systems.
Without --enable-osx-builtin-atomics, things are fine.

I don't have data for Sierra (10.12).

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900