I configured xolotl with the feature-petsc-kokkos branch and ran `make` under ~/xolotl-build/. Although there were errors, many of the *Tester executables were built:
[ 62%] Built target xolotlViz
[ 63%] Linking CXX executable TemperatureProfileHandlerTester
[ 64%] Linking CXX executable TemperatureGradientHandlerTester
[ 64%] Built target TemperatureProfileHandlerTester
[ 64%] Built target TemperatureConstantHandlerTester
[ 64%] Built target TemperatureGradientHandlerTester
[ 65%] Linking CXX executable HeatEquationHandlerTester
[ 65%] Built target HeatEquationHandlerTester
[ 66%] Linking CXX executable FeFitFluxHandlerTester
[ 66%] Linking CXX executable W111FitFluxHandlerTester
[ 67%] Linking CXX executable FuelFitFluxHandlerTester
[ 67%] Linking CXX executable W211FitFluxHandlerTester

Which Tester should I use to run with the parameter file benchmarks/params_system_PSI_2.txt? And how many ranks should I use? Could you give an example command line? Thanks.

--Junchao Zhang


On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:

> Hello, Philip,
>   Do I still need to use the feature-petsc-kokkos branch?
> --Junchao Zhang
>
>
> On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <fackle...@ornl.gov> wrote:
>
>> Junchao,
>>
>> Thank you for working on this. If you open the parameter file for, say, the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type aijkokkos -dm_vec_type kokkos to the "petscArgs=" field (or the corresponding cusparse/cuda option).
>>
>> Thanks,
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
>> ------------------------------
>> From: Junchao Zhang <junchao.zh...@gmail.com>
>> Sent: Thursday, December 1, 2022 17:05
>> To: Fackler, Philip <fackle...@ornl.gov>
>> Cc: xolotl-psi-developm...@lists.sourceforge.net <xolotl-psi-developm...@lists.sourceforge.net>; petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Blondel, Sophie <sblon...@utk.edu>; Roth, Philip <rot...@ornl.gov>
>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.
>>
>> Hi, Philip,
>>   Sorry for the long delay. I could not get anything useful from the -log_view output. Since I have already built xolotl, could you give me instructions on how to run a xolotl test that reproduces the divergence with the petsc GPU backends (but runs fine on CPU)?
>> Thank you.
>> --Junchao Zhang
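[For illustration, a minimal sketch of the "petscArgs=" edit Philip describes above. The placeholder stands for whatever options the field in benchmarks/params_system_PSI_2.txt already carries, and the CUDA variant is an assumption based on his "corresponding cusparse/cuda option" remark:

  petscArgs=<existing PETSc options> -dm_mat_type aijkokkos -dm_vec_type kokkos

  # or, assuming the CUDA-backend variant:
  petscArgs=<existing PETSc options> -dm_mat_type aijcusparse -dm_vec_type cuda
]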
>>
>> On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fackle...@ornl.gov> wrote:
>>
>> ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
>>
>> Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022
>> Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
>>
>>                          Max       Max/Min     Avg       Total
>> Time (sec):           6.023e+00     1.000   6.023e+00
>> Objects:              1.020e+02     1.000   1.020e+02
>> Flops:                1.080e+09     1.000   1.080e+09  1.080e+09
>> Flops/sec:            1.793e+08     1.000   1.793e+08  1.793e+08
>> MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
>> MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
>> MPI Reductions:       0.000e+00     0.000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>>                           and VecAXPY() for complex vectors of length N --> 8N flops
>>
>> Summary of Stages:  ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
>>                       Avg    %Total      Avg    %Total    Count   %Total     Avg        %Total    Count   %Total
>>  0:     Main Stage: 6.0226e+00 100.0%  1.0799e+09 100.0%  0.000e+00   0.0%  0.000e+00       0.0%  0.000e+00   0.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>> Phase summary info:
>>   Count: number of times phase was executed
>>   Time and Flop: Max - maximum over all processors
>>                  Ratio - ratio of maximum to minimum over all processors
>>   Mess: number of messages sent
>>   AvgLen: average message length (bytes)
>>   Reduct: number of global reductions
>>   Global: entire computation
>>   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>>     %T - percent time in this phase         %F - percent flop in this phase
>>     %M - percent messages in this phase     %L - percent message lengths in this phase
>>     %R - percent reductions in this phase
>>   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
>>   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
>>   CpuToGpu Count: total number of CPU to GPU copies per processor
>>   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
>>   GpuToCpu Count: total number of GPU to CPU copies per processor
>>   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
>>   GPU %F: percent flops on GPU in this event
>> ------------------------------------------------------------------------------------------------------------------------
>> Event                Count      Time (sec)     Flop                            --- Global ---   --- Stage ----   Total    GPU    - CpuToGpu -  - GpuToCpu -  GPU
>>                     Max Ratio  Max     Ratio  Max      Ratio  Mess AvgLen Reduct %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size  Count   Size   %F
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> BuildTwoSided          3 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> DMCreateMat            1 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> SFSetGraph             3 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> SFSetUp                3 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> SFPack              4647 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> SFUnpack            4647 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> VecDot               190 1.0   nan     nan  2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> VecMDot              775 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> VecNorm             1728 1.0   nan     nan  1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> VecScale            1983 1.0   nan     nan  6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> VecCopy              780 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> VecSet              4955 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> VecAXPY              190 1.0   nan     nan  2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> VecAYPX              597 1.0   nan     nan  6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> VecAXPBYCZ           643 1.0   nan     nan  1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> VecWAXPY             502 1.0   nan     nan  5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> VecMAXPY            1159 1.0   nan     nan  3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> VecScatterBegin     4647 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   -nan    -nan    2 5.14e-03    0 0.00e+00   0
>> VecScatterEnd       4647 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> VecReduceArith       380 1.0   nan     nan  4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> VecReduceComm        190 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> VecNormalize         965 1.0   nan     nan  1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> TSStep                20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97 100 0  0  0  97 100 0  0  0    184    -nan    2 5.14e-03    0 0.00e+00  54
>> TSFunctionEval       597 1.0   nan     nan  6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63  1  0  0  0  63  1  0  0  0   -nan    -nan    1 3.36e-04    0 0.00e+00 100
>> TSJacobianEval       190 1.0   nan     nan  3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00  97
>> MatMult             1930 1.0   nan     nan  4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 41  0  0  0   1 41  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> MatMultTranspose       1 1.0   nan     nan  3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> MatSolve             965 1.0   nan     nan  5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatSOR               965 1.0   nan     nan  3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatLUFactorSym         1 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatLUFactorNum       190 1.0   nan     nan  1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatScale             190 1.0   nan     nan  3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> MatAssemblyBegin     761 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatAssemblyEnd       761 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatGetRowIJ            1 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatCreateSubMats     380 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatGetOrdering         1 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatZeroEntries       379 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatSetPreallCOO        1 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> MatSetValuesCOO      190 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> KSPSetUp             760 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> KSPSolve             190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86  0  0  0  10 86  0  0  0   1602    -nan    1 4.80e-03    0 0.00e+00  46
>> KSPGMRESOrthog       775 1.0   nan     nan  2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> SNESSolve             71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99  0  0  0  95 99  0  0  0    188    -nan    1 4.80e-03    0 0.00e+00  53
>> SNESSetUp              1 1.0   nan     nan  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> SNESFunctionEval     573 1.0   nan     nan  2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60  2  0  0  0  60  2  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> SNESJacobianEval     190 1.0   nan     nan  3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00  97
>> SNESLineSearch       190 1.0   nan     nan  1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10  0  0  0  53 10  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00 100
>> PCSetUp              570 1.0   nan     nan  1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 11  0  0  0   2 11  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> PCApply              965 1.0   nan     nan  6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00  8 57  0  0  0   8 57  0  0  0   -nan    -nan    1 4.80e-03    0 0.00e+00  19
>> KSPSolve_FS_0        965 1.0   nan     nan  3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>> KSPSolve_FS_1        965 1.0   nan     nan  1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 15  0  0  0   2 15  0  0  0   -nan    -nan    0 0.00e+00    0 0.00e+00   0
>>
>> --- Event Stage 1: Unknown
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Object Type          Creations   Destructions. Reports information only for process 0.
>>
>> --- Event Stage 0: Main Stage
>>
>>            Container     5              5
>>     Distributed Mesh     2              2
>>            Index Set    11             11
>>    IS L to G Mapping     1              1
>>    Star Forest Graph     7              7
>>      Discrete System     2              2
>>            Weak Form     2              2
>>               Vector    49             49
>>              TSAdapt     1              1
>>                   TS     1              1
>>                 DMTS     1              1
>>                 SNES     1              1
>>               DMSNES     3              3
>>       SNESLineSearch     1              1
>>        Krylov Solver     4              4
>>      DMKSP interface     1              1
>>               Matrix     4              4
>>       Preconditioner     4              4
>>               Viewer     2              1
>>
>> --- Event Stage 1: Unknown
>>
>> ========================================================================================================================
>> Average time to get PetscTime(): 3.14e-08
>> #PETSc Option Table entries:
>> -log_view
>> -log_view_gpu_times
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with 64 bit PetscInt
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
>> Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install
>> -----------------------------------------
>> Libraries compiled on 2022-11-01 21:01:08 on PC0115427
>> Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
>> Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install
>> Using PETSc arch:
>> -----------------------------------------
>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3
>> -----------------------------------------
>> Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include
>> -----------------------------------------
>> Using C linker: mpicc
>> Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl
>> -----------------------------------------
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
>> ------------------------------
>> From: Junchao Zhang <junchao.zh...@gmail.com>
>> Sent: Tuesday, November 15, 2022 13:03
>> To: Fackler, Philip <fackle...@ornl.gov>
>> Cc: xolotl-psi-developm...@lists.sourceforge.net <xolotl-psi-developm...@lists.sourceforge.net>; petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Blondel, Sophie <sblon...@utk.edu>; Roth, Philip <rot...@ornl.gov>
>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.
>>
>> Can you paste the -log_view result so I can see what functions are used?
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fackle...@ornl.gov> wrote:
>>
>> Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend.
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
>> ------------------------------
>> From: Junchao Zhang <junchao.zh...@gmail.com>
>> Sent: Monday, November 14, 2022 19:34
>> To: Fackler, Philip <fackle...@ornl.gov>
>> Cc: xolotl-psi-developm...@lists.sourceforge.net <xolotl-psi-developm...@lists.sourceforge.net>; petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Blondel, Sophie <sblon...@utk.edu>; Zhang, Junchao <jczh...@mcs.anl.gov>; Roth, Philip <rot...@ornl.gov>
>> Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.
>>
>> Hi, Philip,
>> Sorry to hear that. It seems you can run the same code on CPUs but not on GPUs (with either the petsc/Kokkos backend or the petsc/CUDA backend); is that right?
>>
>> --Junchao Zhang
>>
>>
>> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>
>> This is an issue I've brought up before (and discussed in person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring it out.
>>
>> The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch.
>> See there the files that begin with "params_system_".
>>
>> Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem.
>>
>> Any help would be appreciated.
>>
>> Thanks,
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
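[For illustration, a minimal shell sketch of applying that edit across all the system-test parameter files at once. The "petscArgs=" field name and the benchmarks/params_system_*.txt naming come from this thread; the sed one-liner itself, and the assumption that the field sits on a single line, are hypothetical:

  # Append the Kokkos backend options to the petscArgs= field of every
  # system-test parameter file (assumes sed -i.bak support; keeps .bak backups).
  for f in benchmarks/params_system_*.txt; do
    sed -i.bak 's/^petscArgs=\(.*\)$/petscArgs=\1 -dm_mat_type aijkokkos -dm_vec_type kokkos/' "$f"
  done
]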