Re: [OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-05 Thread Gilles Gouaillardet
Charles,

are you saying that even if you

mpirun --mca pml ob1 ...

(i.e., force the ob1 component of the pml framework), the memory leak is
still present?

As a side note, we strongly recommend avoiding
configure --with-FOO=/usr
Instead,
configure --with-FOO
should be used (otherwise you will end up with -I/usr/include and
-L/usr/lib64 on the command line, which can silently hide third-party
libraries installed in a non-standard directory). If --with-FOO fails
for you, then that is a bug we would appreciate you reporting.
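
In other words (with ucx standing in for any FOO dependency; these are
illustrative configure fragments, not a complete command line):

```shell
# Preferred: let configure probe the default system locations itself
./configure --with-ucx ...

# Discouraged: explicitly injects -I/usr/include and -L/usr/lib64,
# which can shadow the same library installed under a non-standard prefix
./configure --with-ucx=/usr ...
```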

Cheers,

Gilles
On Fri, Oct 5, 2018 at 6:42 AM Charles A Taylor  wrote:
>
>
> We are seeing a gaping memory leak when running OpenMPI 3.1.x (or 2.1.2, for 
> that matter) built with UCX support.   The leak shows up
> whether the “ucx” PML is specified for the run or not.  The applications in 
> question are arepo and gizmo, but I have no reason to believe
> that others are not affected as well.
>
> Basically the MPI processes grow without bound until SLURM kills the job or 
> the host memory is exhausted.
> If I configure and build with “--without-ucx” the problem goes away.
>
> I didn’t see anything about this on the UCX github site so I thought I’d ask 
> here.  Anyone else seeing the same or similar?
>
> What version of UCX is OpenMPI 3.1.x tested against?
>
> Regards,
>
> Charlie Taylor
> UF Research Computing
>
> Details:
> —
> RHEL7.5
> OpenMPI 3.1.2 (and any other version I’ve tried).
> ucx 1.2.2-1.el7 (RH native)
> RH native IB stack
> Mellanox FDR/EDR IB fabric
> Intel Parallel Studio 2018.1.163
>
> Configuration Options:
> —
> CFG_OPTS=""
> CFG_OPTS="$CFG_OPTS CC=icc CXX=icpc FC=ifort FFLAGS=\"-O2 -g -warn -m64\" 
> LDFLAGS=\"\" "
> CFG_OPTS="$CFG_OPTS --enable-static"
> CFG_OPTS="$CFG_OPTS --enable-orterun-prefix-by-default"
> CFG_OPTS="$CFG_OPTS --with-slurm=/opt/slurm"
> CFG_OPTS="$CFG_OPTS --with-pmix=/opt/pmix/2.1.1"
> CFG_OPTS="$CFG_OPTS --with-pmi=/opt/slurm"
> CFG_OPTS="$CFG_OPTS --with-libevent=external"
> CFG_OPTS="$CFG_OPTS --with-hwloc=external"
> CFG_OPTS="$CFG_OPTS --with-verbs=/usr"
> CFG_OPTS="$CFG_OPTS --with-libfabric=/usr"
> CFG_OPTS="$CFG_OPTS --with-ucx=/usr"
> CFG_OPTS="$CFG_OPTS --with-verbs-libdir=/usr/lib64"
> CFG_OPTS="$CFG_OPTS --with-mxm=no"
> CFG_OPTS="$CFG_OPTS --with-cuda=${HPC_CUDA_DIR}"
> CFG_OPTS="$CFG_OPTS --enable-openib-udcm"
> CFG_OPTS="$CFG_OPTS --enable-openib-rdmacm"
> CFG_OPTS="$CFG_OPTS --disable-pmix-dstore"
>
> rpmbuild --ba \
>  --define '_name openmpi' \
>  --define "_version $OMPI_VER" \
>  --define "_release ${RELEASE}" \
>  --define "_prefix $PREFIX" \
>  --define '_mandir %{_prefix}/share/man' \
>  --define '_defaultdocdir %{_prefix}' \
>  --define 'mflags -j 8' \
>  --define 'use_default_rpm_opt_flags 1' \
>  --define 'use_check_files 0' \
>  --define 'install_shell_scripts 1' \
>  --define 'shell_scripts_basename mpivars' \
>  --define "configure_options $CFG_OPTS " \
>  openmpi-${OMPI_VER}.spec 2>&1 | tee rpmbuild.log
>
>
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
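
[Editorial note: per-rank growth like the one described above can be watched
by sampling VmRSS from /proc while the job runs; a rough sketch, where the
process name "arepo" is only an example:]

```shell
#!/bin/sh
# Log the resident set size (VmRSS, in kB) of every process whose
# command name matches $1, so unbounded growth shows up in the log.

rss_of() {
    # $1 = pid; prints VmRSS in kB (empty if the process has exited)
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

sample() {
    # $1 = exact command name to match
    for pid in $(pgrep -x "$1"); do
        echo "$(date +%s) pid=$pid rss_kb=$(rss_of "$pid")"
    done
}

sample arepo   # e.g. run periodically via: watch -n 60 ./rss_sample.sh
```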

Re: [OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-05 Thread Pavel Shamis
Posting this on UCX list.


Re: [OMPI users] ompio on Lustre

2018-10-05 Thread Dave Love
"Gabriel, Edgar"  writes:

> It was originally for performance reasons, but this should be fixed at
> this point. I am not aware of correctness problems.
>
> However, let me try to clarify your question about: What do you
> precisely mean by "MPI I/O on Lustre mounts without flock"? Was the
> Lustre filesystem mounted without flock?

No, it wasn't (and romio complains).

> If yes, that could lead to
> some problems, we had that on our Lustre installation for a while, but
> problems were even occurring without MPI I/O in that case (although I
> do not recall all details, just that we had to change the mount
> options).

Yes, without at least localflock you might expect problems with things
like bdb and sqlite, but I couldn't see any file-locking calls in the
Lustre component.  If it is a problem, shouldn't the component fail
without it, like romio does?

I have suggested ephemeral PVFS^WOrangeFS but I doubt that will be
thought useful.

> Maybe just take a testsuite (either ours or HDF5), make sure
> to run it in a multi-node configuration and see whether it works
> correctly.

For some reason I didn't think MTT, if that's what you mean, was
available, but I see it is; I'll see if I can drive it when I have a
chance.  Tests from HDF5 might be easiest, thanks for the suggestion.
I'd tried with ANL's "testmpio", which was the only thing I found
immediately, but it threw up errors even on a local filesystem, at which
stage I thought it was best to ask...  I'll report back if I get useful
results.
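
[Editorial note: whether a Lustre mount was made with flock or localflock
can be read from /proc/mounts; a small sketch, assuming only the standard
/proc/mounts line format:]

```shell
#!/bin/sh
# Report the flock-related mount option for each Lustre mount.
# /proc/mounts lines look like:
#   <device> <mountpoint> lustre <opt1>,<opt2>,... 0 0

lustre_flock_mode() {
    # $1 = comma-separated mount option string
    case ",$1," in
        *,flock,*)      echo flock ;;
        *,localflock,*) echo localflock ;;
        *)              echo noflock ;;
    esac
}

while read -r dev mnt fstype opts _; do
    [ "$fstype" = lustre ] || continue
    echo "$mnt: $(lustre_flock_mode "$opts")"
done < /proc/mounts
```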


Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-05 Thread Jeff Squyres (jsquyres) via users
Oops!  We had a typo in yesterday's fix -- fixed:

https://github.com/open-mpi/ompi/pull/5847

Ralph also put double extra super protection to make triple sure that this 
error can't happen again in:

https://github.com/open-mpi/ompi/pull/5846

Both of these should be in tonight's nightly snapshot.

Thank you!



Re: [OMPI users] ompio on Lustre

2018-10-05 Thread Gabriel, Edgar
It was originally for performance reasons, but this should be fixed at this 
point. I am not aware of correctness problems.

However, let me try to clarify your question: what precisely do you mean 
by "MPI I/O on Lustre mounts without flock"? Was the Lustre filesystem mounted 
without flock? If yes, that could lead to some problems; we had that on our 
Lustre installation for a while, and problems occurred even without MPI 
I/O in that case (although I do not recall all the details, just that we had to 
change the mount options). Maybe just take a test suite (either ours or HDF5's), 
make sure to run it in a multi-node configuration, and see whether it works 
correctly.

Thanks
Edgar

> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Dave
> Love
> Sent: Friday, October 5, 2018 5:15 AM
> To: users@lists.open-mpi.org
> Subject: [OMPI users] ompio on Lustre
> 
> Is romio preferred over ompio on Lustre for performance or correctness?
> If it's relevant, the context is MPI-IO on Lustre mounts without flock, which
> ompio doesn't seem to require.
> Thanks.


[OMPI users] ompio on Lustre

2018-10-05 Thread Dave Love
Is romio preferred over ompio on Lustre for performance or correctness?
If it's relevant, the context is MPI-IO on Lustre mounts without flock,
which ompio doesn't seem to require.
Thanks.
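
[Editorial note: for comparing the two on the same system, the I/O component
can be pinned either on the mpirun command line (--mca io ompio, or
--mca io romio314 in the 3.x series; confirm the component names on your
install with ompi_info) or in an MCA parameter file; a sketch:]

```
# $HOME/.openmpi/mca-params.conf
# Force the ROMIO-based component ("romio314" in Open MPI 3.x;
# use "ompio" here to force OMPIO instead)
io = romio314
```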


Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-05 Thread Ralph H Castain
Please send Jeff and me the opal/mca/pmix/pmix4x/pmix/config.log again - we’ll 
need to see why it isn’t building. The patch definitely is not in the v4.0 
branch, but it should have been in master.



Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-05 Thread Siegmar Gross

Hi Ralph, hi Jeff,


On 10/3/18 8:14 PM, Ralph H Castain wrote:

Jeff and I talked and believe the patch in 
https://github.com/open-mpi/ompi/pull/5836 should fix the problem.



Today I've installed openmpi-master-201810050304-5f1c940 and
openmpi-v4.0.x-201810050241-c079666. Unfortunately, I still get the
same error for all seven versions that I was able to build.

loki hello_1 114 mpicc --showme
gcc -I/usr/local/openmpi-master_64_gcc/include -fexceptions -pthread -std=c11 
-m64 -Wl,-rpath -Wl,/usr/local/openmpi-master_64_gcc/lib64 
-Wl,--enable-new-dtags -L/usr/local/openmpi-master_64_gcc/lib64 -lmpi


loki hello_1 115 ompi_info | grep "Open MPI repo revision"
  Open MPI repo revision: v2.x-dev-6262-g5f1c940

loki hello_1 116 mpicc hello_1_mpi.c

loki hello_1 117 mpiexec -np 2 a.out
[loki:25575] [[64603,0],0] ORTE_ERROR_LOG: Not found in file 
../../../../../openmpi-master-201810050304-5f1c940/orte/mca/ess/hnp/ess_hnp_module.c 
at line 320

--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--
loki hello_1 118


I don't know if you have already applied your suggested patch or if the
error message is still from a version without that patch. Do you need
anything else?


Best regards

Siegmar






On Oct 2, 2018, at 2:50 PM, Jeff Squyres (jsquyres) via users 
 wrote:

(Ralph sent me Siegmar's pmix config.log, which Siegmar sent to him off-list)

It looks like Siegmar passed --with-hwloc=internal.

Open MPI's configure understood this and did the appropriate things.
PMIX's configure didn't.

I think we need to add an adjustment into the PMIx configure.m4 in OMPI...



On Oct 2, 2018, at 5:25 PM, Ralph H Castain  wrote:

Hi Siegmar

I honestly have no idea - for some reason, the PMIx component isn’t seeing the 
internal hwloc code in your environment.

Jeff, Brice - any ideas?



On Oct 2, 2018, at 1:18 PM, Siegmar Gross 
 wrote:

Hi Ralph,

how can I confirm that HWLOC built? Some hwloc files are available
in the build directory.

loki openmpi-master-201809290304-73075b8-Linux.x86_64.64_gcc 111 find . -name 
'*hwloc*'
./opal/mca/btl/usnic/.deps/btl_usnic_hwloc.Plo
./opal/mca/hwloc
./opal/mca/hwloc/external/.deps/hwloc_external_component.Plo
./opal/mca/hwloc/base/hwloc_base_frame.lo
./opal/mca/hwloc/base/.deps/hwloc_base_dt.Plo
./opal/mca/hwloc/base/.deps/hwloc_base_maffinity.Plo
./opal/mca/hwloc/base/.deps/hwloc_base_frame.Plo
./opal/mca/hwloc/base/.deps/hwloc_base_util.Plo
./opal/mca/hwloc/base/hwloc_base_dt.lo
./opal/mca/hwloc/base/hwloc_base_util.lo
./opal/mca/hwloc/base/hwloc_base_maffinity.lo
./opal/mca/hwloc/base/.libs/hwloc_base_util.o
./opal/mca/hwloc/base/.libs/hwloc_base_dt.o
./opal/mca/hwloc/base/.libs/hwloc_base_maffinity.o
./opal/mca/hwloc/base/.libs/hwloc_base_frame.o
./opal/mca/hwloc/.libs/libmca_hwloc.la
./opal/mca/hwloc/.libs/libmca_hwloc.a
./opal/mca/hwloc/libmca_hwloc.la
./opal/mca/hwloc/hwloc201
./opal/mca/hwloc/hwloc201/.deps/hwloc201_component.Plo
./opal/mca/hwloc/hwloc201/hwloc201_component.lo
./opal/mca/hwloc/hwloc201/hwloc
./opal/mca/hwloc/hwloc201/hwloc/include/hwloc
./opal/mca/hwloc/hwloc201/hwloc/hwloc
./opal/mca/hwloc/hwloc201/hwloc/hwloc/libhwloc_embedded.la
./opal/mca/hwloc/hwloc201/hwloc/hwloc/.deps/hwloc_pci_la-topology-pci.Plo
./opal/mca/hwloc/hwloc201/hwloc/hwloc/.deps/hwloc_gl_la-topology-gl.Plo
./opal/mca/hwloc/hwloc201/hwloc/hwloc/.deps/hwloc_cuda_la-topology-cuda.Plo
./opal/mca/hwloc/hwloc201/hwloc/hwloc/.deps/hwloc_xml_libxml_la-topology-xml-libxml.Plo
./opal/mca/hwloc/hwloc201/hwloc/hwloc/.deps/hwloc_opencl_la-topology-opencl.Plo
./opal/mca/hwloc/hwloc201/hwloc/hwloc/.deps/hwloc_nvml_la-topology-nvml.Plo
./opal/mca/hwloc/hwloc201/hwloc/hwloc/.libs/libhwloc_embedded.la
./opal/mca/hwloc/hwloc201/hwloc/hwloc/.libs/libhwloc_embedded.a
./opal/mca/hwloc/hwloc201/.libs/hwloc201_component.o
./opal/mca/hwloc/hwloc201/.libs/libmca_hwloc_hwloc201.la
./opal/mca/hwloc/hwloc201/.libs/libmca_hwloc_hwloc201.a
./opal/mca/hwloc/hwloc201/libmca_hwloc_hwloc201.la
./orte/mca/rtc/hwloc
./orte/mca/rtc/hwloc/rtc_hwloc.lo
./orte/mca/rtc/hwloc/.deps/rtc_hwloc.Plo
./orte/mca/rtc/hwloc/.deps/rtc_hwloc_component.Plo
./orte/mca/rtc/hwloc/mca_rtc_hwloc.la
./orte/mca/rtc/hwloc/.libs/mca_rtc_hwloc.so
./orte/mca/rtc/hwloc/.libs/mca_rtc_hwloc.la
./orte/mca/rtc/hwloc/.libs/rtc_hwloc.o
./orte/mca/rtc/hwloc/.libs/rtc_hwloc_component.o
./orte/mca/rtc/hwloc/.libs/mca_rtc_hwloc.soT
./orte/mca/rtc/hwloc/.libs/mca_rtc_hwloc.lai

[OMPI users] error building openmpi-master-201810050304-5f1c940 on Linux with Sun C

2018-10-05 Thread Siegmar Gross

Hi,

I've tried to install openmpi-master-201810050304-5f1c940 on my "SUSE Linux
Enterprise Server 12.3 (x86_64)" with Sun C 5.15 (Oracle Developer Studio
12.6). Unfortunately, I get the following error.


loki openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc 128 head -7 
config.log | tail -1
  $ ../openmpi-master-201810050304-5f1c940/configure 
--prefix=/usr/local/openmpi-master_64_cc 
--libdir=/usr/local/openmpi-master_64_cc/lib64 
--with-jdk-bindir=/usr/local/jdk-10.0.1/bin 
--with-jdk-headers=/usr/local/jdk-10.0.1/include JAVA_HOME=/usr/local/jdk-10.0.1 
LDFLAGS=-m64 -mt -Wl,-z -Wl,noexecstack -L/usr/local/lib64 CC=cc CXX=CC FC=f95 
CFLAGS=-m64 -mt CXXFLAGS=-m64 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp 
--disable-mpi-fortran --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java 
--with-valgrind=/usr/local/valgrind --with-hwloc=internal --without-verbs 
--with-wrapper-cflags=-std=c11 -m64 -mt --with-wrapper-cxxflags=-m64 
--with-wrapper-fcflags=-m64 --with-wrapper-ldflags=-mt --enable-debug

loki openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc 129


loki openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc 131 tail -27 
log.make.Linux.x86_64.64_cc

Making all in tools/wrappers
make[2]: Entering directory 
'/export2/src/openmpi-master/openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc/opal/tools/wrappers'

  GENERATE opal_wrapper.1
  CC   opal_wrapper.o
  CCLD opal_wrapper
opal_wrapper.o: In function `opal_atomic_mb':
/export2/src/openmpi-master/openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc/opal/tools/wrappers//../../../../openmpi-master-201810050304-5f1c940/opal/include/opal/sys/atomic_stdc.h:64: 
undefined reference to `atomic_thread_fence'

opal_wrapper.o: In function `opal_atomic_wmb':
/export2/src/openmpi-master/openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc/opal/tools/wrappers//../../../../openmpi-master-201810050304-5f1c940/opal/include/opal/sys/atomic_stdc.h:69: 
undefined reference to `atomic_thread_fence'

opal_wrapper.o: In function `opal_atomic_rmb':
/export2/src/openmpi-master/openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc/opal/tools/wrappers//../../../../openmpi-master-201810050304-5f1c940/opal/include/opal/sys/atomic_stdc.h:74: 
undefined reference to `atomic_thread_fence'

opal_wrapper.o: In function `opal_atomic_lock_init':
/export2/src/openmpi-master/openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc/opal/tools/wrappers//../../../../openmpi-master-201810050304-5f1c940/opal/include/opal/sys/atomic_stdc.h:210: 
undefined reference to `atomic_flag_clear'

opal_wrapper.o: In function `opal_atomic_trylock':
/export2/src/openmpi-master/openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc/opal/tools/wrappers//../../../../openmpi-master-201810050304-5f1c940/opal/include/opal/sys/atomic_stdc.h:216: 
undefined reference to `atomic_flag_test_and_set'

opal_wrapper.o: In function `opal_atomic_unlock':
/export2/src/openmpi-master/openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc/opal/tools/wrappers//../../../../openmpi-master-201810050304-5f1c940/opal/include/opal/sys/atomic_stdc.h:229: 
undefined reference to `atomic_flag_clear'

postopt: error: ld failed to link the binary
cc: postopt failed for .libs/opal_wrapper
Makefile:1873: recipe for target 'opal_wrapper' failed
make[2]: *** [opal_wrapper] Error 2
make[2]: Leaving directory 
'/export2/src/openmpi-master/openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc/opal/tools/wrappers'

Makefile:2377: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory 
'/export2/src/openmpi-master/openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc/opal'

Makefile:1895: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1
loki openmpi-master-201810050304-5f1c940-Linux.x86_64.64_cc 132



I would be grateful if somebody could fix the problem. Do you need anything
else? Thank you very much in advance for any help.


Kind regards

Siegmar