It built and ran fine on a different system with an sm75 arch. Is there a documented minimum supported arch, if that is indeed the cause?
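A minimal sketch of one way to test the "arch is too old" theory, assuming the failure comes from the cupmDeviceGetMemPool (i.e. cudaDeviceGetMemPool) call quoted below: the CUDA runtime exposes a device attribute, cudaDevAttrMemoryPoolsSupported (CUDA 11.2 or newer), that reports whether memory pools are available on a given device. The names PoolSupport, classify, queryMemPoolSupport and describe are hypothetical helpers for illustration, not PETSc API.

```cpp
// Sketch only: ask the CUDA runtime whether a device supports memory pools.
// Assumes CUDA 11.2+ for cudaDevAttrMemoryPoolsSupported. When cuda_runtime.h
// is not on the include path, the query is compiled out and reports Unknown.
#include <cassert>
#include <string>

#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define HAVE_CUDA_RUNTIME 1
#  endif
#endif

enum class PoolSupport { Unknown, No, Yes };

// Hypothetical helper: turn the outcome of the attribute query into a verdict.
// queryOk is true when cudaDeviceGetAttribute returned cudaSuccess.
PoolSupport classify(bool queryOk, int attrValue) {
  if (!queryOk) return PoolSupport::Unknown;
  return attrValue != 0 ? PoolSupport::Yes : PoolSupport::No;
}

// Query the attribute for the given device ordinal (0 is the first GPU).
PoolSupport queryMemPoolSupport(int device) {
  assert(device >= 0);
#ifdef HAVE_CUDA_RUNTIME
  int value = 0;
  const cudaError_t err =
      cudaDeviceGetAttribute(&value, cudaDevAttrMemoryPoolsSupported, device);
  return classify(err == cudaSuccess, value);
#else
  return PoolSupport::Unknown;
#endif
}

// Human-readable form of the verdict.
std::string describe(PoolSupport s) {
  switch (s) {
    case PoolSupport::Yes: return "memory pools supported";
    case PoolSupport::No:  return "memory pools NOT supported";
    default:               return "unknown (query failed or CUDA not available)";
  }
}
```

Printing describe(queryMemPoolSupport(0)) on the failing machine and on the sm75 machine would show whether the two differ in this attribute. Build with nvcc, or with a host compiler linked against the CUDA runtime library.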
One minor hiccup, FYI: compilation of hypre fails with CUDA toolkit 12 because cuSPARSE removed csrsv2Info_t (although it's still referenced in their docs...) in favor of bsrsv2Info_t. Rolling back to CUDA toolkit 11.8 worked.

On Thu, Jan 5, 2023 at 6:37 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:

> Jacob, is it because the cuda arch is too old?
>
> --Junchao Zhang
>
>
> On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry <mlo...@gmail.com> wrote:
>
>> I'm seeing the same thing on latest main with a different machine and
>> -sm52 card, cuda 11.8. make check fails with the below, where the indicated
>> line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(&mempool,
>> static_cast<int>(device->deviceId))); in the initialize function.
>>
>>
>> Running check examples to verify correct installation
>> Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI
>> processes
>> 2,17c2,46
>> < 0 SNES Function norm 2.391552133017e-01
>> < 0 KSP Residual norm 2.928487269734e-01
>> < 1 KSP Residual norm 1.876489580142e-02
>> < 2 KSP Residual norm 3.291394847944e-03
>> < 3 KSP Residual norm 2.456493072124e-04
>> < 4 KSP Residual norm 1.161647147715e-05
>> < 5 KSP Residual norm 1.285648407621e-06
>> < 1 SNES Function norm 6.846805706142e-05
>> < 0 KSP Residual norm 2.292783790384e-05
>> < 1 KSP Residual norm 2.100673631699e-06
>> < 2 KSP Residual norm 2.121341386147e-07
>> < 3 KSP Residual norm 2.455932678957e-08
>> < 4 KSP Residual norm 1.753095730744e-09
>> < 5 KSP Residual norm 7.489214418904e-11
>> < 2 SNES Function norm 2.103908447865e-10
>> < Number of SNES iterations = 2
>> ---
>> > [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> > [0]PETSC ERROR: GPU error
>> > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
>> supported
>> > [0]PETSC ERROR: WARNING! There are option(s) set that were not used!
>> Could be the program crashed before they were used or a spelling mistake,
>> etc!
>> > [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3
>> source: command line
>> > [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
>> > [0]PETSC ERROR: Option left: name:-nox_warning (no value) source:
>> environment
>> > [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
>> source: command line
>> > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>> shooting.
>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb
>> GIT Date: 2023-01-05 17:22:48 +0000
>> > [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry
>> Thu Jan 5 17:25:17 2023
>> > [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
>> > [0]PETSC ERROR: #1 initialize() at
>> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
>> > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at
>> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/
>> cupmcontext.cu:10
>> > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at
>> /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
>> > [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal()
>> at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
>> > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at
>> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
>> > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at
>> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
>> > [0]PETSC ERROR: #7 GetHandleDispatch_() at
>> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
>> > [0]PETSC ERROR: #8 create() at
>> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
>> > [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at
>> /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
>> > [0]PETSC ERROR: #10 VecSetType() at
>> /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
>> > [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at
>> /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
>> > [0]PETSC ERROR: #12 DMCreateGlobalVector() at
>> /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
>> > [0]PETSC ERROR: #13 main() at ex19.c:149
>>
>>
>> On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry <mlo...@gmail.com> wrote:
>>
>>> I'm trying to compile the cuda example
>>>
>>> ./config/examples/arch-ci-linux-cuda-double-64idx.py
>>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>>>
>>> and running make test passes the test ok
>>> diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy
>>> but the eager variant fails, pasted below.
>>>
>>> I get a similar error running my client code, pasted after. There when
>>> running with -info, it seems that some lazy initialization happens first,
>>> and i also call VecCreateSeqCuda which seems to have no issue.
>>>
>>> Any idea? This happens to be with an -sm 3.5 device if it matters,
>>> otherwise it's a recent cuda compiler+driver.
>>>
>>>
>>> petsc test code output:
>>>
>>>
>>>
>>> not ok
>>> sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager #
>>> Error code: 97
>>> # [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> # [0]PETSC ERROR: GPU error
>>> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
>>> supported
>>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>>> shooting.
>>> # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>>> # [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5
>>> 15:22:33 2023
>>> # [0]PETSC ERROR: Configure options
>>> --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2
>>> --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g
>>> -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1
>>> --with-cuda=1 --with-precision=double --with-clanguage=c
>>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>>> PETSC_ARCH=arch-ci-linux-cuda-double-64idx
>>> # [0]PETSC ERROR: #1 CUPMAwareMPI_() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194
>>> # [0]PETSC ERROR: #2 initialize() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71
>>> # [0]PETSC ERROR: #3 init_device_id_() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290
>>> # [0]PETSC ERROR: #4 getDevice() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99
>>> # [0]PETSC ERROR: #5 PetscDeviceCreate() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104
>>> # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375
>>> # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499
>>> # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634
>>> # [0]PETSC ERROR: #9 PetscInitialize_Common() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001
>>> # [0]PETSC ERROR: #10 PetscInitialize() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267
>>> # [0]PETSC ERROR: #11 main() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12
>>> # [0]PETSC ERROR: PETSc Option Table entries:
>>> # [0]PETSC ERROR: -default_device_type host
>>> # [0]PETSC ERROR: -device_enable eager
>>> # [0]PETSC ERROR: ----------------End of Error Message -------send
>>> entire error message to petsc-ma...@mcs.anl.gov----------
>>>
>>>
>>> solver code output:
>>>
>>>
>>> [0] <sys> PetscDetermineInitialFPTrap(): Floating point trapping is off
>>> by default 0
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private():
>>> PetscDeviceType host available, initializing
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice
>>> host initialized, default device id 0, view FALSE, init type lazy
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private():
>>> PetscDeviceType cuda available, initializing
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice
>>> cuda initialized, default device id 0, view FALSE, init type lazy
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private():
>>> PetscDeviceType hip not available
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private():
>>> PetscDeviceType sycl not available
>>> [0] <sys> PetscInitialize_Common(): PETSc successfully started: number
>>> of processors = 1
>>> [0] <sys> PetscGetHostName(): Rejecting domainname, likely is NIS
>>> lancer.(none)
>>> [0] <sys> PetscInitialize_Common(): Running on machine: lancer
>>> # [Info] Petsc initialization complete.
>>> # [Trace] Timing: Starting solver...
>>> # [Info] RNG initial conditions have mean 0.000004, renormalizing.
>>> # [Trace] Timing: PetscTimeIntegrator initialization...
>>> # [Trace] Timing: Allocating Petsc CUDA arrays...
>>> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 2 3 max tags
>>> = 100000000
>>> [0] <sys> configure(): Configured device 0
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
>>> # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439
>>> seconds.
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
>>> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 1 4 max tags
>>> = 100000000
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>> [0] <dm> DMGetDMTS(): Creating new DMTS
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>> [0] <dm> DMGetDMSNES(): Creating new DMSNES
>>> [0] <dm> DMGetDMSNESWrite(): Copying DMSNES due to write
>>> # [Info] Initializing petsc with ode23 integrator
>>> # [Trace] Timing: PetscTimeIntegrator initialization finished in
>>> 0.016754 seconds.
>>>
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>> [0] <device> PetscDeviceContextSetupGlobalContext_Private():
>>> Initializing global PetscDeviceContext with device type cuda
>>> [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [0]PETSC ERROR: GPU error
>>> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
>>> supported
>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>>> [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu
>>> Jan 5 15:39:14 2023
>>> [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc
>>> PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++
>>> --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS
>>> COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0
>>> --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>>> --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/
>>> --download-hwloc=1
>>> [0]PETSC ERROR: #1 initialize() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255
>>> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/
>>> cupmcontext.cu:10
>>> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244
>>> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal()
>>> at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259
>>> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
>>> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
>>> [0]PETSC ERROR: #7
>>> PetscDeviceContextGetCurrentContextAssertType_Internal() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371
>>> [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/
>>> cupmcontext.cu:23
>>> [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/
>>> veccuda2.cu:261
>>> [0]PETSC ERROR: #10 VecMAXPY() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221
>>> [0]PETSC ERROR: #11 TSStep_RK() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814
>>> [0]PETSC ERROR: #12 TSStep() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424
>>> [0]PETSC ERROR: #13 TSSolve() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814
>>>