We are on RHEL 8 and use environment modules to load and unload different versions of packages and libraries. I have CUDA-aware OpenMPI 4.1.1 loaded, along with GDAL 3.3.0, GCC 10.2.0, and CMake 3.22.1.

make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug check fails with the below errors:

Running check examples to verify correct installation
Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process
See https://petsc.org/release/faq/
--------------------------------------------------------------------------
The library attempted to open the following supporting CUDA libraries,
but each of them failed. CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
libcuda.dylib: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with
--mca opal_warn_on_missing_libcuda 0 to suppress this message. If you are interested
in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
of libcuda.so.1 to get passed this issue.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: g117
Local device: mlx5_0
--------------------------------------------------------------------------
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
Number of SNES iterations = 2
Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
See https://petsc.org/release/faq/
The library attempted to open the following supporting CUDA libraries,
but each of them failed. CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
libcuda.dylib: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with
--mca opal_warn_on_missing_libcuda 0 to suppress this message. If you are interested
in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
of libcuda.so.1 to get passed this issue.
WARNING: There was an error initializing an OpenFabrics device.
Local host: xxx
Local device: mlx5_0
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
Number of SNES iterations = 2
[g117:4162783] 1 more process has sent help message help-mpi-common-cuda.txt / dlopen failed
[g117:4162783] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[g117:4162783] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
Completed test examples
Error while running make check
gmake[1]: *** [makefile:149: check] Error 1
make: *** [GNUmakefile:17: check] Error 2

Where is $MPI_RUN set? I'd like to be able to pass options such as

--mca orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'

which will help me troubleshoot and hide unneeded warnings.

Thanks,
Rob
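
P.S. For reference, Open MPI also reads MCA parameters from environment variables of the form OMPI_MCA_<parameter>, so the options listed above could probably be expressed without touching the launcher command at all. The following is only a sketch under that assumption; it does not answer where PETSc sets its MPI launcher, and whether these settings actually clear the warnings on this cluster is untested.

```shell
# Sketch: the same MCA options as above, written as OMPI_MCA_* environment
# variables, which Open MPI picks up when mpirun/mpiexec starts.
export OMPI_MCA_orte_base_help_aggregate=0        # show every help/error message
export OMPI_MCA_opal_warn_on_missing_libcuda=0    # silence the libcuda.so.1 warning
export OMPI_MCA_pml=ucx                           # select the UCX PML
export OMPI_MCA_btl='^openib'                     # exclude the openib BTL

# Then rerun the check in the same shell (left commented out here):
# make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug check
```

The single quotes around ^openib keep the shell from interpreting the caret.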