On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:
> Hi Junchao, on our system neither the slurm "scontrol show job_id -dd" command nor looking at
> CUDA_VISIBLE_DEVICES provides information about which MPI process is associated with which GPU
> in the node. I can see this with nvidia-smi, but if you have any other suggestion using slurm I
> would like to hear it.
>
> I've been trying to compile the code + PETSc on Summit, but have been having all sorts of issues
> related to spectrum-mpi and the different compilers they provide (I tried gcc, nvhpc, pgi, xl;
> some of them don't handle Fortran 2018, others give errors about repeated MPI definitions, etc.).

The PETSc configure examples are in the repository:
https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads

  Thanks,

    Matt

> I also wanted to ask you: do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite?
>
> Thanks!
>
> I configured the library --with-cuda, and when compiling I get a compilation error with CUDAC:
>
> CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>   THRUST_COMPILER_DEPRECATION(Clang 7.0);
>   ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
>   THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>   ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
> # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
>   ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
> # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>   ^
> <scratch space>:141:6: note: expanded from here
>   GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>   ^
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
> In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
> In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>   CUB_COMPILER_DEPRECATION(Clang 7.0);
>   ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'
>   CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>   ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'
> # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
>   ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'
> # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>   ^
> <scratch space>:198:6: note: expanded from here
>   GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here
>
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>   THRUST_COMPILER_DEPRECATION(Clang 7.0);
>   ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
>   THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>   ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
> # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
>   ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
> # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>   ^
> <scratch space>:149:6: note: expanded from here
>   GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>   ^
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
> In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
> In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>   CUB_COMPILER_DEPRECATION(Clang 7.0);
>   ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'
>   CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>   ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'
> # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
>   ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'
> # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>   ^
> <scratch space>:208:6: note: expanded from here
>   GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here
>
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(a);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(a);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(len);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(t);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(s);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(flg);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(n);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(s);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(n);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(t);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(a);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(b);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(a);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(b);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(tmp);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(haystack);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(needle);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(tmp);
>     ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: error: use of undeclared identifier '__builtin_assume'
>   ; __builtin_assume(t);
>     ^
> fatal error: too many errors emitted, stopping now [-ferror-limit=]
> 20 errors generated.
> Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp.
> gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1
> gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2
> **************************ERROR*************************************
>   Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log
>   Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to petsc-ma...@mcs.anl.gov
> ********************************************************************
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zh...@gmail.com>
> *Sent:* Monday, August 21, 2023 4:17 PM
> *To:* Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
> *Cc:* PETSc users list <petsc-users@mcs.anl.gov>; Guan, Collin X. (Fed) <collin.g...@nist.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>
> That is a good question. Looking at https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you can share the output of your job so we can search for CUDA_VISIBLE_DEVICES and see how the GPUs were allocated.
>
> --Junchao Zhang
>
> On Mon, Aug 21, 2023 at 2:38 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:
>
> Ok, thanks Junchao. So is GPU 0 actually allocating memory for the meshes of all 8 MPI processes, but only working on 2 of them? It says in the script it has allocated 2.4 GB.
> Best,
> Marcos
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zh...@gmail.com>
> *Sent:* Monday, August 21, 2023 3:29 PM
> *To:* Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
> *Cc:* PETSc users list <petsc-users@mcs.anl.gov>; Guan, Collin X. (Fed) <collin.g...@nist.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>
> Hi, Marcos,
>   If you look at the PIDs in the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. The duplicate PIDs are usually threads spawned by the MPI runtime (for example, progress threads in the MPI implementation). So your job script and output are all good.
>
> Thanks.
>
> On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:
>
> Hi Junchao, something I'm noticing when running the CUDA-enabled linear solvers (CG+HYPRE, CG+GAMG) in multi-CPU/multi-GPU calculations is that GPU 0 in the node seems to be taking the sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 V100 GPUs:
>
> Mon Aug 21 14:36:07 2023
> +---------------------------------------------------------------------------------------+
> | NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
> |-----------------------------------------+----------------------+----------------------+
> | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
> |                                         |                      |               MIG M. |
> |=========================================+======================+======================|
> |   0  Tesla V100-SXM2-16GB           On  | 00000004:04:00.0 Off |                    0 |
> | N/A   34C    P0              63W / 300W |   2488MiB / 16384MiB |      0%      Default |
> |                                         |                      |                  N/A |
> +-----------------------------------------+----------------------+----------------------+
> |   1  Tesla V100-SXM2-16GB           On  | 00000004:05:00.0 Off |                    0 |
> | N/A   38C    P0              56W / 300W |    638MiB / 16384MiB |      0%      Default |
> |                                         |                      |                  N/A |
> +-----------------------------------------+----------------------+----------------------+
> |   2  Tesla V100-SXM2-16GB           On  | 00000035:03:00.0 Off |                    0 |
> | N/A   35C    P0              52W / 300W |    638MiB / 16384MiB |      0%      Default |
> |                                         |                      |                  N/A |
> +-----------------------------------------+----------------------+----------------------+
> |   3  Tesla V100-SXM2-16GB           On  | 00000035:04:00.0 Off |                    0 |
> | N/A   38C    P0              53W / 300W |    638MiB / 16384MiB |      0%      Default |
> |                                         |                      |                  N/A |
> +-----------------------------------------+----------------------+----------------------+
>
> +---------------------------------------------------------------------------------------+
> | Processes:                                                                             |
> |  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
> |        ID   ID                                                              Usage      |
> |=======================================================================================|
> |    0   N/A  N/A     214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
> |    0   N/A  N/A     214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
> |    0   N/A  N/A     214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
> |    0   N/A  N/A     214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
> |    0   N/A  N/A     214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
> |    0   N/A  N/A     214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
> |    0   N/A  N/A     214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
> |    0   N/A  N/A     214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
> |    1   N/A  N/A     214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
> |    1   N/A  N/A     214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
> |    2   N/A  N/A     214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
> |    2   N/A  N/A     214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
> |    3   N/A  N/A     214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
> |    3   N/A  N/A     214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
> +---------------------------------------------------------------------------------------+
>
> You can see that GPU 0 is connected to all 8 MPI processes, each taking about 300 MiB on it, whereas GPUs 1, 2 and 3 are each working with 2 MPI processes. I'm wondering if this is expected, or whether there are changes I need to make to my submission script or runtime parameters.
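
A side note on that last question: one common way every rank ends up with a memory allocation on GPU 0 is that each process initializes a context on the CUDA default device (device 0) before an explicit device is chosen for it. Below is a minimal sketch of the usual pattern for avoiding that, selecting the device from the node-local MPI rank before any other CUDA call. It is illustrative only and not taken from this thread; PETSc and the MPI launcher may already perform an equivalent assignment, in which case nothing needs to change.

/* Illustrative sketch (not from the original thread): bind each MPI rank to a
 * GPU based on its node-local rank, before any other CUDA work is done.
 * Compile with something like: mpicc -o bind bind.c -lcudart (paths may vary). */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  /* Ranks sharing a node get consecutive local ranks 0..N-1. */
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
  int local_rank;
  MPI_Comm_rank(node_comm, &local_rank);

  int ngpus = 0;
  cudaGetDeviceCount(&ngpus);                      /* GPUs visible to this process */
  if (ngpus > 0) cudaSetDevice(local_rank % ngpus); /* pick a device before any context is created */

  int dev = -1;
  cudaGetDevice(&dev);
  printf("world rank %d, local rank %d -> using CUDA device %d of %d\n",
         world_rank, local_rank, dev, ngpus);

  MPI_Comm_free(&node_comm);
  MPI_Finalize();
  return 0;
}

The MPI_COMM_TYPE_SHARED split is what gives a per-node rank, so the same logic works regardless of how many nodes the job spans.
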
> This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPUs/node):
>
> #!/bin/bash
> # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds
> #SBATCH -J test
> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
> #SBATCH --partition=gpu
> #SBATCH --ntasks=16
> #SBATCH --ntasks-per-node=8
> #SBATCH --cpus-per-task=1
> #SBATCH --nodes=2
> #SBATCH --time=01:00:00
> #SBATCH --gres=gpu:4
>
> export OMP_NUM_THREADS=1
> # modules
> module load cuda/11.7
> module load gcc/11.2.1/toolset
> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
>
> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
>
> srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
>
> Thank you for the advice,
> Marcos

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
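
A closing note on the question at the top of the thread, about telling which MPI rank is associated with which GPU when scontrol and the job output are not enough: one simple check is to have every rank print its host name, its CUDA_VISIBLE_DEVICES value, and the device it actually ends up using, then compare that with nvidia-smi. The sketch below is only an illustration; the program name gpu_report and the srun line in the comment are assumptions, not something taken from the thread.

/* Hypothetical helper: report which GPU each MPI rank sees and uses.
 * Launch it the same way as the solver, e.g. "srun -N 2 -n 16 ./gpu_report",
 * and compare the printed bus ids against the nvidia-smi output above. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  char host[MPI_MAX_PROCESSOR_NAME];
  int  len;
  MPI_Get_processor_name(host, &len);

  const char *visible = getenv("CUDA_VISIBLE_DEVICES");

  int dev = -1, ndev = 0;
  cudaGetDeviceCount(&ndev);
  cudaGetDevice(&dev);                 /* current device (0 unless something selected another) */

  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, dev);

  printf("rank %d on %s: CUDA_VISIBLE_DEVICES=%s, %d device(s) visible, using device %d (%s, bus id %08x:%02x:%02x)\n",
         rank, host, visible ? visible : "(unset)", ndev, dev,
         prop.name, prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);

  MPI_Finalize();
  return 0;
}
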