Hello, Philip,
  Do I still need to use the feature-petsc-kokkos branch?

--Junchao Zhang

On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <fackle...@ornl.gov> wrote:

> Junchao,
>
> Thank you for working on this. If you open the parameter file for, say,
> the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add
> `-dm_mat_type aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or
> the corresponding cusparse/cuda options).
>
> Thanks,
>
> *Philip Fackler*
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zh...@gmail.com>
> *Sent:* Thursday, December 1, 2022 17:05
> *To:* Fackler, Philip <fackle...@ornl.gov>
> *Cc:* xolotl-psi-developm...@lists.sourceforge.net <xolotl-psi-developm...@lists.sourceforge.net>;
> petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Blondel, Sophie <sblon...@utk.edu>;
> Roth, Philip <rot...@ornl.gov>
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
> Vec diverging when running on CUDA device.
>
> Hi, Philip,
>   Sorry for the long delay. I could not get anything useful from the
> -log_view output. Since I have already built xolotl, could you give me
> instructions on how to run a xolotl test that reproduces the divergence
> with the petsc GPU backends (but runs fine on CPU)?
>   Thank you.
> --Junchao Zhang
>
> On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fackle...@ornl.gov> wrote:
>
> ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
>
> Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022
> Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
>
>                          Max       Max/Min     Avg       Total
> Time (sec):           6.023e+00     1.000   6.023e+00
> Objects:              1.020e+02     1.000   1.020e+02
> Flops:                1.080e+09     1.000   1.080e+09  1.080e+09
> Flops/sec:            1.793e+08     1.000   1.793e+08  1.793e+08
> MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
> MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
> MPI Reductions:       0.000e+00     0.000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flop ------   --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
>  0:      Main Stage: 6.0226e+00 100.0%  1.0799e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flop: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    AvgLen: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>    %T - percent time in this phase         %F - percent flop in this phase
>    %M - percent messages in this phase     %L - percent message lengths in this phase
>    %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
>    GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
>    CpuToGpu Count: total number of CPU to GPU copies per processor
>    CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
>    GpuToCpu Count: total number of GPU to CPU copies per processor
>    GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
>    GPU %F: percent flops on GPU in this event
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                            --- Global ---   --- Stage ----   Total    GPU    - CpuToGpu -  - GpuToCpu -  GPU
>                    Max Ratio  Max      Ratio  Max      Ratio  Mess    AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s  Mflop/s  Count  Size   Count  Size   %F
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> BuildTwoSided        3 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> DMCreateMat          1 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> SFSetGraph           3 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> SFSetUp              3 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> SFPack            4647 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> SFUnpack          4647 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecDot             190 1.0   nan       nan   2.11e+06 1.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecMDot            775 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecNorm           1728 1.0   nan       nan   1.92e+07 1.0  0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecScale          1983 1.0   nan       nan   6.24e+06 1.0  0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecCopy            780 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecSet            4955 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecAXPY            190 1.0   nan       nan   2.11e+06 1.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecAYPX            597 1.0   nan       nan   6.64e+06 1.0  0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecAXPBYCZ         643 1.0   nan       nan   1.79e+07 1.0  0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecWAXPY           502 1.0   nan       nan   5.58e+06 1.0  0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecMAXPY          1159 1.0   nan       nan   3.68e+07 1.0  0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecScatterBegin   4647 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   -nan     -nan    2 5.14e-03    0 0.00e+00   0
> VecScatterEnd     4647 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecReduceArith     380 1.0   nan       nan   4.23e+06 1.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecReduceComm      190 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecNormalize       965 1.0   nan       nan   1.61e+07 1.0  0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> TSStep              20 1.0   5.8699e+00 1.0   1.08e+09 1.0  0.0e+00 0.0e+00 0.0e+00 97 100 0  0  0  97 100 0  0  0    184     -nan    2 5.14e-03    0 0.00e+00  54
> TSFunctionEval     597 1.0   nan       nan   6.64e+06 1.0  0.0e+00 0.0e+00 0.0e+00 63  1  0  0  0  63  1  0  0  0   -nan     -nan    1 3.36e-04    0 0.00e+00 100
> TSJacobianEval     190 1.0   nan       nan   3.37e+07 1.0  0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00  97
> MatMult           1930 1.0   nan       nan   4.46e+08 1.0  0.0e+00 0.0e+00 0.0e+00  1 41  0  0  0   1 41  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> MatMultTranspose     1 1.0   nan       nan   3.44e+05 1.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> MatSolve           965 1.0   nan       nan   5.04e+07 1.0  0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatSOR             965 1.0   nan       nan   3.33e+08 1.0  0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatLUFactorSym       1 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatLUFactorNum     190 1.0   nan       nan   1.16e+08 1.0  0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatScale           190 1.0   nan       nan   3.26e+07 1.0  0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> MatAssemblyBegin   761 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatAssemblyEnd     761 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatGetRowIJ          1 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatCreateSubMats   380 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatGetOrdering       1 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatZeroEntries     379 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatSetPreallCOO      1 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatSetValuesCOO    190 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> KSPSetUp           760 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> KSPSolve           190 1.0   5.8052e-01 1.0   9.30e+08 1.0  0.0e+00 0.0e+00 0.0e+00 10 86  0  0  0  10 86  0  0  0   1602     -nan    1 4.80e-03    0 0.00e+00  46
> KSPGMRESOrthog     775 1.0   nan       nan   2.27e+07 1.0  0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> SNESSolve           71 1.0   5.7117e+00 1.0   1.07e+09 1.0  0.0e+00 0.0e+00 0.0e+00 95 99  0  0  0  95 99  0  0  0    188     -nan    1 4.80e-03    0 0.00e+00  53
> SNESSetUp            1 1.0   nan       nan   0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> SNESFunctionEval   573 1.0   nan       nan   2.23e+07 1.0  0.0e+00 0.0e+00 0.0e+00 60  2  0  0  0  60  2  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> SNESJacobianEval   190 1.0   nan       nan   3.37e+07 1.0  0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00  97
> SNESLineSearch     190 1.0   nan       nan   1.05e+08 1.0  0.0e+00 0.0e+00 0.0e+00 53 10  0  0  0  53 10  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00 100
> PCSetUp            570 1.0   nan       nan   1.16e+08 1.0  0.0e+00 0.0e+00 0.0e+00  2 11  0  0  0   2 11  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> PCApply            965 1.0   nan       nan   6.14e+08 1.0  0.0e+00 0.0e+00 0.0e+00  8 57  0  0  0   8 57  0  0  0   -nan     -nan    1 4.80e-03    0 0.00e+00  19
> KSPSolve_FS_0      965 1.0   nan       nan   3.33e+08 1.0  0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
> KSPSolve_FS_1      965 1.0   nan       nan   1.66e+08 1.0  0.0e+00 0.0e+00 0.0e+00  2 15  0  0  0   2 15  0  0  0   -nan     -nan    0 0.00e+00    0 0.00e+00   0
>
> --- Event Stage 1: Unknown
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Object Type          Creations   Destructions. Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>            Container     5              5
>     Distributed Mesh     2              2
>            Index Set    11             11
>    IS L to G Mapping     1              1
>    Star Forest Graph     7              7
>      Discrete System     2              2
>            Weak Form     2              2
>               Vector    49             49
>              TSAdapt     1              1
>                   TS     1              1
>                 DMTS     1              1
>                 SNES     1              1
>               DMSNES     3              3
>       SNESLineSearch     1              1
>        Krylov Solver     4              4
>      DMKSP interface     1              1
>               Matrix     4              4
>       Preconditioner     4              4
>               Viewer     2              1
>
> --- Event Stage 1: Unknown
>
> ========================================================================================================================
> Average time to get PetscTime(): 3.14e-08
> #PETSc Option Table entries:
> -log_view
> -log_view_gpu_times
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with 64 bit PetscInt
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
> Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install
> -----------------------------------------
> Libraries compiled on 2022-11-01 21:01:08 on PC0115427
> Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
> Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install
> Using PETSc arch:
> -----------------------------------------
>
> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3
> -----------------------------------------
>
> Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include
> -----------------------------------------
>
> Using C linker: mpicc
> Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl
> -----------------------------------------
>
> *Philip Fackler*
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zh...@gmail.com>
> *Sent:* Tuesday, November 15, 2022 13:03
> *To:* Fackler, Philip <fackle...@ornl.gov>
> *Cc:* xolotl-psi-developm...@lists.sourceforge.net <xolotl-psi-developm...@lists.sourceforge.net>;
> petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Blondel, Sophie <sblon...@utk.edu>;
> Roth, Philip <rot...@ornl.gov>
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
> Vec diverging when running on CUDA device.
>
> Can you paste the -log_view result so I can see which functions are used?
>
> --Junchao Zhang
>
> On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fackle...@ornl.gov> wrote:
>
> Yes, most (but not all) of our system test cases fail with the kokkos/cuda
> or cuda backends. All of them pass with the CPU-only kokkos backend.
>
> *Philip Fackler*
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zh...@gmail.com>
> *Sent:* Monday, November 14, 2022 19:34
> *To:* Fackler, Philip <fackle...@ornl.gov>
> *Cc:* xolotl-psi-developm...@lists.sourceforge.net <xolotl-psi-developm...@lists.sourceforge.net>;
> petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Blondel, Sophie <sblon...@utk.edu>;
> Zhang, Junchao <jczh...@mcs.anl.gov>; Roth, Philip <rot...@ornl.gov>
> *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec
> diverging when running on CUDA device.
>
> Hi, Philip,
>   Sorry to hear that. It seems you could run the same code on CPUs but not
> on GPUs (with either the petsc/Kokkos backend or the petsc/cuda backend).
> Is that right?
>
> --Junchao Zhang
>
> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> This is an issue I've brought up before (and discussed in person with
> Richard). I wanted to bring it up again because I'm hitting the limits of
> what I know to do, and I need help figuring this out.
>
> The problem can be reproduced using Xolotl's "develop" branch built
> against a petsc build with kokkos and kokkos-kernels enabled. Then, either
> add the relevant kokkos options to the "petscArgs=" line in the system test
> parameter file(s), or just replace the system test parameter files with the
> ones from the "feature-petsc-kokkos" branch. See here the files that begin
> with "params_system_".
>
> Note that those files use the "kokkos" options, but the problem is similar
> with the corresponding cuda/cusparse options. I've already tried building
> kokkos-kernels with no TPLs and got slightly different results, but the
> same problem.
>
> Any help would be appreciated.
>
> Thanks,
>
> *Philip Fackler*
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
>
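
A minimal sketch of the parameter-file change described near the top of this thread, assuming the "petscArgs=" line simply keeps whatever options it already carries (the "<existing options>" placeholder below stands in for those; it is not taken from the actual benchmarks/params_system_PSI_2.txt file):

With the Kokkos backend, the edited line might look like:

    petscArgs=<existing options> -dm_mat_type aijkokkos -dm_vec_type kokkos

Or, with the corresponding CUDA/cuSPARSE backend mentioned in the thread:

    petscArgs=<existing options> -dm_mat_type aijcusparse -dm_vec_type cuda

Both variants only change which Mat/Vec implementations the DM hands out; the rest of the solver options are left untouched.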