Re: [easybuild] Build fails of OpenMPI/4.0.5-GCC-10.2.0 on AMD EPYC node
On 12-03-2021 10:35, Ole Holm Nielsen wrote:

Thanks a lot for pointing at the solution. I have asked the sysadmin if he can install the libnl3-devel RPM. Hopefully that will resolve the issue for us.

I can report that after installing the libnl3-devel RPM and rebuilding the libfabric module, OpenMPI builds without any problems.

Can the libfabric easyconfig be updated to list libnl3-devel as a prerequisite?

Thanks,
Ole

On 12-03-2021 09:30, Kenneth Hoste wrote:

Dear Ole,

Please check https://github.com/easybuilders/easybuild-easyconfigs/issues/11939, where this issue is also reported.

It seems to be related to (not) having specific OS packages installed when libfabric is being installed. We probably need to make some changes (configure options, or registering required OS dependencies) for this, so additional feedback on this is welcome (in particular in the GitHub issue).

regards,
Kenneth

On 11/03/2021 16:11, Ole Holm Nielsen wrote:

Dear EasyBuilders,

I'm trying to get EasyBuild modules up and running on an external cluster with AMD EPYC 7351 and running CentOS 7.6. With EB 4.3.3 I can't get OpenMPI to build :-(

I'm trying to build this module:

$ eb GPAW-21.1.0-foss-2020b-ASE-3.21.1.eb -r
== Temporary log file in case of crash /tmp/eb-b3w2Qf/easybuild-WAbEyF.log
== found valid index for /groups/physics/modules/software/EasyBuild/4.3.3/easybuild/easyconfigs, so using it...
== found valid index for /groups/physics/modules/software/EasyBuild/4.3.3/easybuild/easyconfigs, so using it...
== resolving dependencies ...
== processing EasyBuild easyconfig /groups/physics/modules/software/EasyBuild/4.3.3/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.0.5-GCC-10.2.0.eb
== building and installing OpenMPI/4.0.5-GCC-10.2.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== FAILED: Installation ended unsuccessfully (build directory: /groups/physics/modules/build/OpenMPI/4.0.5/GCC-10.2.0): build failed (first 300 chars): cmd " ./configure --prefix=/groups/physics/modules/software/OpenMPI/4.0.5-GCC-10.2.0 --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --enable-mpirun-prefix-by-default --enable-shared --with-hwloc=/groups/physics/modules/software/hwloc/2.2.0-GCCcore-10.2.0 --with-libevent=/groups/physics (took 1 min 38 sec)
== Results of the build can be found in the log file(s) /tmp/eb-b3w2Qf/easybuild-OpenMPI-4.0.5-20210311.085550.clqtp.log

The logfile ends with these warnings and errors:

--- MCA component btl:usnic (m4 configuration macro)
checking for MCA component btl:usnic compile mode... dso
checking size of void *... (cached) 8
checking for 64 bit Linux... yes
checking --with-ofi value... sanity check ok (/groups/physics/modules/software/libfabric/1.11.0-GCCcore-10.2.0)
checking --with-ofi-libdir value... simple ok (unspecified value)
checking looking for OFI libfabric in... (/groups/physics/modules/software/libfabric/1.11.0-GCCcore-10.2.0)
checking rdma/fabric.h usability... yes
checking rdma/fabric.h presence... yes
checking for rdma/fabric.h... yes
looking for library in lib
checking for library containing fi_getinfo... -lfabric
checking if libfabric requires libnl v1 or v3... v1 v3
configure: WARNING: Unfortunately, libfabric links to both libnl and libnl-3.
configure: WARNING: This is a configuration that is *known* to cause run-time crashes.
configure: WARNING: This is an error in libfabric (not Open MPI).
configure: WARNING: Open MPI will therefore skip using libfabric.
configure: WARNING: OFI libfabric support requested (via --with-ofi or --with-libfabric), but not found.
configure: error: Cannot continue.
(at easybuild/tools/run.py:533 in parse_cmd_output)
== 2021-03-11 08:57:29,620 filetools.py:1785 INFO Removing lock /groups/physics/modules/software/.locks/_groups_physics_modules_software_OpenMPI_4.0.5-GCC-10.2.0.lock...
== 2021-03-11 08:57:29,621 filetools.py:341 INFO Path /groups/physics/modules/software/.locks/_groups_physics_modules_software_OpenMPI_4.0.5-GCC-10.2.0.lock successfully removed.
== 2021-03-11 08:57:29,621 filetools.py:1789 INFO Lock removed: /groups/physics/modules/software/.locks/_groups_physics_modules_software_OpenMPI_4.0.5-GCC-10.2.0.lock
== 2021-03-11 08:57:29,621 easyblock.py:3389 WARNING build failed (first 300 chars): cmd " ./configure --prefix=/groups/physics/modules/software/OpenMPI/4.0.5-GCC-10.2.0 --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --enable-mpirun-prefix-by-default --enable-shared --with-hwloc=/groups/physics/modules/software/hwloc/2.2.0-GCCcore-10.2.0 --with-libevent=/groups/physics
== 2021-03-11 08:57:29,621 easyblock.py:298 INFO Closing log for application name OpenMPI version 4.0.5

Can someone tell me what's going on here? We don't have this problem on our own cluster.
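
The configure output in this thread reports "v1 v3" for the libnl check, meaning the installed libfabric pulls in both libnl generations. A rough way to confirm the same thing by hand, before and after rebuilding libfabric, is to look at what `ldd` lists for the shared library. The sketch below is only an approximation and is not how Open MPI's configure script performs its test. The function names are made up for illustration, and it assumes the usual sonames (`libnl.so.1` for v1, `libnl-3.so.*` and `libnl-<part>-3.so.*` for v3).

```python
import re
import subprocess
import sys


def libnl_versions(ldd_output):
    """Return the libnl generations ('v1', 'v3') named in ldd output."""
    versions = set()
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]  # first column of an ldd line is the soname
        if re.match(r"libnl(-[a-z]+)?-3\.so", name):
            versions.add("v3")  # e.g. libnl-3.so.200, libnl-route-3.so.200
        elif re.match(r"libnl\.so", name):
            versions.add("v1")  # e.g. libnl.so.1
    return versions


def check_library(path):
    """Run ldd on a shared library and report which libnl generations it links."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return libnl_versions(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    found = check_library(sys.argv[1])
    print("libnl generations linked:", sorted(found) or "none")
    if found == {"v1", "v3"}:
        print("both libnl and libnl-3 are linked; Open MPI's configure rejects this")
```

To use it, pass the path of the libfabric shared library as the only argument. In this thread that would presumably be `lib/libfabric.so` under the libfabric installation prefix shown in the log; the file name is an assumption. After installing libnl3-devel and rebuilding libfabric, the expectation is that only "v3" (or nothing) is reported.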
Re: [easybuild] Build fails of OpenMPI/4.0.5-GCC-10.2.0 on AMD EPYC node
Hi Kenneth,

Thanks a lot for pointing at the solution. I have asked the sysadmin if he can install the libnl3-devel RPM. Hopefully that will resolve the issue for us.

Best regards,
Ole
Re: [easybuild] Build fails of OpenMPI/4.0.5-GCC-10.2.0 on AMD EPYC node
Dear Ole,

Please check https://github.com/easybuilders/easybuild-easyconfigs/issues/11939, where this issue is also reported.

It seems to be related to (not) having specific OS packages installed when libfabric is being installed. We probably need to make some changes (configure options, or registering required OS dependencies) for this, so additional feedback on this is welcome (in particular in the GitHub issue).

regards,
Kenneth
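
Kenneth's reply names two possible changes to the libfabric easyconfig: configure options, or registering required OS dependencies. As a sketch of what each could look like (easyconfigs use Python syntax), the fragment below shows both side by side. It is not the change the EasyBuild maintainers actually made, and a real easyconfig would likely pick only one of the two. It assumes that the usNIC provider is the part of libfabric that pulls in libnl, which matches the log (the failing check runs under btl:usnic) but is not confirmed in this thread. The Debian-style package name `libnl-3-dev` is also an assumption; only `libnl3-devel` is mentioned here.

```python
# Hypothetical additions to a libfabric easyconfig such as
# libfabric-1.11.0-GCCcore-10.2.0.eb; illustration only, not the upstream fix.

# Option 1 (configure options): build libfabric without the usNIC provider,
# assuming that provider is what links against libnl.
configopts = '--disable-usnic '

# Option 2 (OS dependencies): make EasyBuild check for the libnl3 development
# package before building. A tuple lists alternative package names, of which
# one must be present. 'libnl-3-dev' is an assumed name for Debian-like systems.
osdependencies = [('libnl3-devel', 'libnl-3-dev')]
```

With option 2, a missing package is reported up front by EasyBuild, before libfabric is built, rather than surfacing later as a configure failure in OpenMPI. Option 1 avoids the OS package requirement, at the cost of dropping usNIC support.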