Re: [easybuild] Build fails of OpenMPI/4.0.5-GCC-10.2.0 on AMD EPYC node

2021-03-12 Thread Ole Holm Nielsen

On 12-03-2021 10:35, Ole Holm Nielsen wrote:
Thanks a lot for pointing at the solution.  I have asked the sysadmin if 
he can install the libnl3-devel RPM.  Hopefully that will resolve the 
issue for us.


I can report that after installing the libnl3-devel RPM and rebuilding 
the libfabric module, OpenMPI builds without any problems.
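
To confirm that a rebuilt libfabric no longer pulls in both libnl generations, one option is to inspect the output of ldd. The following is a minimal sketch, not necessarily the check that Open MPI's configure performs. The helper names are hypothetical, and the sonames (libnl.so.N for v1, libnl-3.so.N and libnl-<part>-3.so.N for v3) are assumptions based on common Linux packaging.

```python
import re
import subprocess

# Assumed sonames: libnl v1 ships libnl.so.N, libnl3 ships libnl-3.so.N
# plus helper libraries such as libnl-route-3.so.N.
LIBNL_V1 = re.compile(r"\blibnl\.so\.\d+")
LIBNL_V3 = re.compile(r"\blibnl(?:-[a-z]+)?-3\.so\.\d+")


def libnl_versions(ldd_text):
    """Return the set of libnl generations ('v1', 'v3') named in ldd output."""
    versions = set()
    for line in ldd_text.splitlines():
        if LIBNL_V1.search(line):
            versions.add("v1")
        if LIBNL_V3.search(line):
            versions.add("v3")
    return versions


def ldd_output(path):
    """Run ldd on a shared library and return its text output."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def links_both_libnl(path):
    """True if the library at 'path' resolves to both libnl v1 and v3."""
    return libnl_versions(ldd_output(path)) == {"v1", "v3"}
```

Calling links_both_libnl() on the libfabric shared library under the installation prefix shown in the configure log (for example its lib/libfabric.so, assuming the usual file name) should return False once the module has been rebuilt correctly. Note that ldd also lists indirect dependencies, so a libnl reference may come from a library that libfabric itself links against.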


Can libfabric be updated to declare libnl3-devel as a prerequisite?
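
For illustration, EasyBuild easyconfigs (which use Python syntax) can declare required OS packages through the osdependencies parameter. The fragment below is only a sketch of what such a declaration might look like for libfabric. It is not taken from the actual easyconfig, and the thread does not say whether the eventual fix took this form. The Debian/Ubuntu package name is an assumption.

```python
# Fragment of a hypothetical libfabric easyconfig, not the real file.
name = 'libfabric'
version = '1.11.0'

# Each tuple lists alternative OS package names; any one of them
# satisfies the dependency. 'libnl3-devel' is the RPM named in this
# thread; 'libnl-3-dev' is assumed to be the Debian/Ubuntu equivalent.
osdependencies = [('libnl3-devel', 'libnl-3-dev')]
```

With a declaration like this, EasyBuild would stop with an error about the missing OS package before building libfabric, rather than producing a libfabric that later breaks the OpenMPI configure step.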

Thanks,
Ole


On 12-03-2021 09:30, Kenneth Hoste wrote:

Dear Ole,

Please check 
https://github.com/easybuilders/easybuild-easyconfigs/issues/11939, 
where this issue is also reported.


It seems to be related to (not) having specific OS packages installed 
when libfabric is being installed.


We probably need to make some changes (configure options, or 
registering required OS dependencies) for this, so additional feedback 
on this is welcome (in particular in the GitHub issue).



regards,

Kenneth

On 11/03/2021 16:11, Ole Holm Nielsen wrote:

Dear EasyBuilders,

I'm trying to get EasyBuild modules up and running on an external 
cluster with AMD EPYC 7351 CPUs running CentOS 7.6.  With EB 4.3.3 I 
can't get OpenMPI to build :-(  I'm trying to build this module:


$ eb GPAW-21.1.0-foss-2020b-ASE-3.21.1.eb -r
== Temporary log file in case of crash /tmp/eb-b3w2Qf/easybuild-WAbEyF.log
== found valid index for /groups/physics/modules/software/EasyBuild/4.3.3/easybuild/easyconfigs, so using it...
== found valid index for /groups/physics/modules/software/EasyBuild/4.3.3/easybuild/easyconfigs, so using it...
== resolving dependencies ...
== processing EasyBuild easyconfig /groups/physics/modules/software/EasyBuild/4.3.3/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.0.5-GCC-10.2.0.eb
== building and installing OpenMPI/4.0.5-GCC-10.2.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== FAILED: Installation ended unsuccessfully (build directory: /groups/physics/modules/build/OpenMPI/4.0.5/GCC-10.2.0): build failed (first 300 chars): cmd " ./configure --prefix=/groups/physics/modules/software/OpenMPI/4.0.5-GCC-10.2.0 --build=x86_64-pc-linux-gnu  --host=x86_64-pc-linux-gnu --enable-mpirun-prefix-by-default  --enable-shared --with-hwloc=/groups/physics/modules/software/hwloc/2.2.0-GCCcore-10.2.0 --with-libevent=/groups/physics (took 1 min 38 sec)
== Results of the build can be found in the log file(s) /tmp/eb-b3w2Qf/easybuild-OpenMPI-4.0.5-20210311.085550.clqtp.log



The logfile ends with these warnings and errors:

--- MCA component btl:usnic (m4 configuration macro)
checking for MCA component btl:usnic compile mode... dso
checking size of void *... (cached) 8
checking for 64 bit Linux... yes
checking --with-ofi value... sanity check ok (/groups/physics/modules/software/libfabric/1.11.0-GCCcore-10.2.0)
checking --with-ofi-libdir value... simple ok (unspecified value)
checking looking for OFI libfabric in... (/groups/physics/modules/software/libfabric/1.11.0-GCCcore-10.2.0)
checking rdma/fabric.h usability... yes
checking rdma/fabric.h presence... yes
checking for rdma/fabric.h... yes
looking for library in lib
checking for library containing fi_getinfo... -lfabric
checking if libfabric requires libnl v1 or v3... v1 v3
configure: WARNING: Unfortunately, libfabric links to both libnl and libnl-3.
configure: WARNING: This is a configuration that is *known* to cause run-time crashes.
configure: WARNING: This is an error in libfabric (not Open MPI).
configure: WARNING: Open MPI will therefore skip using libfabric.
configure: WARNING: OFI libfabric support requested (via --with-ofi or --with-libfabric), but not found.
configure: error: Cannot continue.
  (at easybuild/tools/run.py:533 in parse_cmd_output)
== 2021-03-11 08:57:29,620 filetools.py:1785 INFO Removing lock /groups/physics/modules/software/.locks/_groups_physics_modules_software_OpenMPI_4.0.5-GCC-10.2.0.lock...
== 2021-03-11 08:57:29,621 filetools.py:341 INFO Path /groups/physics/modules/software/.locks/_groups_physics_modules_software_OpenMPI_4.0.5-GCC-10.2.0.lock successfully removed.
== 2021-03-11 08:57:29,621 filetools.py:1789 INFO Lock removed: /groups/physics/modules/software/.locks/_groups_physics_modules_software_OpenMPI_4.0.5-GCC-10.2.0.lock
== 2021-03-11 08:57:29,621 easyblock.py:3389 WARNING build failed (first 300 chars): cmd " ./configure --prefix=/groups/physics/modules/software/OpenMPI/4.0.5-GCC-10.2.0 --build=x86_64-pc-linux-gnu  --host=x86_64-pc-linux-gnu --enable-mpirun-prefix-by-default  --enable-shared --with-hwloc=/groups/physics/modules/software/hwloc/2.2.0-GCCcore-10.2.0 --with-libevent=/groups/physics
== 2021-03-11 08:57:29,621 easyblock.py:298 INFO Closing log for application name OpenMPI version 4.0.5



Can someone tell me what's going on here?  We don't have this problem 
on our own cluster.


Re: [easybuild] Build fails of OpenMPI/4.0.5-GCC-10.2.0 on AMD EPYC node

2021-03-12 Thread Ole Holm Nielsen

Hi Kenneth,

Thanks a lot for pointing at the solution.  I have asked the sysadmin if 
he can install the libnl3-devel RPM.  Hopefully that will resolve the 
issue for us.


Best regards,
Ole



Re: [easybuild] Build fails of OpenMPI/4.0.5-GCC-10.2.0 on AMD EPYC node

2021-03-12 Thread Kenneth Hoste

Dear Ole,

Please check 
https://github.com/easybuilders/easybuild-easyconfigs/issues/11939, 
where this issue is also reported.


It seems to be related to (not) having specific OS packages installed 
when libfabric is being installed.


We probably need to make some changes (configure options, or registering 
required OS dependencies) for this, so additional feedback on this is 
welcome (in particular in the GitHub issue).



regards,

Kenneth
