Junchao,

Thank you for working on this. If you open the parameter file for, say, the 
PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add 
"-dm_mat_type aijkokkos -dm_vec_type kokkos" to the "petscArgs=" field (or the 
corresponding cusparse/cuda options).
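
For example, assuming the file already has a petscArgs= line, the edit just 
appends the two options to it (a sketch only; "<existing options>" stands for 
whatever arguments are already in your copy of the file):

    petscArgs=<existing options> -dm_mat_type aijkokkos -dm_vec_type kokkos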

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Thursday, December 1, 2022 17:05
To: Fackler, Philip <fackle...@ornl.gov>
Cc: xolotl-psi-developm...@lists.sourceforge.net 
<xolotl-psi-developm...@lists.sourceforge.net>; petsc-users@mcs.anl.gov 
<petsc-users@mcs.anl.gov>; Blondel, Sophie <sblon...@utk.edu>; Roth, Philip 
<rot...@ornl.gov>
Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec 
diverging when running on CUDA device.

Hi, Philip,
  Sorry for the long delay.  I could not get anything useful from the 
-log_view output.  Since I have already built xolotl, could you give me 
instructions on how to run a xolotl test that reproduces the divergence with the 
petsc GPU backends (while running fine on CPU)?
  Thank you.
--Junchao Zhang


On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fackle...@ornl.gov> wrote:
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

Unknown Name on a  named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022
Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           6.023e+00     1.000   6.023e+00
Objects:              1.020e+02     1.000   1.020e+02
Flops:                1.080e+09     1.000   1.080e+09  1.080e+09
Flops/sec:            1.793e+08     1.000   1.793e+08  1.793e+08
MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 6.0226e+00 100.0%  1.0799e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
SFSetGraph             3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
SFSetUp                3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
SFPack              4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
SFUnpack            4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
VecDot               190 1.0   nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecMDot              775 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
VecNorm             1728 1.0   nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecScale            1983 1.0   nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecCopy              780 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
VecSet              4955 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
VecAXPY              190 1.0   nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecAYPX              597 1.0   nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecAXPBYCZ           643 1.0   nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecWAXPY             502 1.0   nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecMAXPY            1159 1.0   nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecScatterBegin     4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      2 5.14e-03    0 0.00e+00  0
VecScatterEnd       4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
VecReduceArith       380 1.0   nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecReduceComm        190 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
VecNormalize         965 1.0   nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
TSStep                20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0   184    -nan      2 5.14e-03    0 0.00e+00 54
TSFunctionEval       597 1.0   nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63  1  0  0  0  63  1  0  0  0  -nan    -nan      1 3.36e-04    0 0.00e+00 100
TSJacobianEval       190 1.0   nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 97
MatMult             1930 1.0   nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 41  0  0  0   1 41  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
MatMultTranspose       1 1.0   nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
MatSolve             965 1.0   nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatSOR               965 1.0   nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatLUFactorSym         1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatLUFactorNum       190 1.0   nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatScale             190 1.0   nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
MatAssemblyBegin     761 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd       761 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatGetRowIJ            1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatCreateSubMats     380 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatGetOrdering         1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatZeroEntries       379 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatSetPreallCOO        1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatSetValuesCOO      190 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
KSPSetUp             760 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
KSPSolve             190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86  0  0  0  10 86  0  0  0  1602    -nan      1 4.80e-03    0 0.00e+00 46
KSPGMRESOrthog       775 1.0   nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
SNESSolve             71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99  0  0  0  95 99  0  0  0   188    -nan      1 4.80e-03    0 0.00e+00 53
SNESSetUp              1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval     573 1.0   nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60  2  0  0  0  60  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
SNESJacobianEval     190 1.0   nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 97
SNESLineSearch       190 1.0   nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10  0  0  0  53 10  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
PCSetUp              570 1.0   nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 11  0  0  0   2 11  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
PCApply              965 1.0   nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00  8 57  0  0  0   8 57  0  0  0  -nan    -nan      1 4.80e-03    0 0.00e+00 19
KSPSolve_FS_0        965 1.0   nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
KSPSolve_FS_1        965 1.0   nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 15  0  0  0   2 15  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0


--- Event Stage 1: Unknown

---------------------------------------------------------------------------------------------------------------------------------------------------------------


Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container     5              5
    Distributed Mesh     2              2
           Index Set    11             11
   IS L to G Mapping     1              1
   Star Forest Graph     7              7
     Discrete System     2              2
           Weak Form     2              2
              Vector    49             49
             TSAdapt     1              1
                  TS     1              1
                DMTS     1              1
                SNES     1              1
              DMSNES     3              3
      SNESLineSearch     1              1
       Krylov Solver     4              4
     DMKSP interface     1              1
              Matrix     4              4
      Preconditioner     4              4
              Viewer     2              1

--- Event Stage 1: Unknown

========================================================================================================================
Average time to get PetscTime(): 3.14e-08
#PETSc Option Table entries:
-log_view
-log_view_gpu_times
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64 bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: PETSC_DIR=/home/4pf/repos/petsc 
PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx 
--with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries 
--prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices 
--COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 
--with-kokkos-dir=/home/4pf/build/kokkos/cuda/install 
--with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install

-----------------------------------------
Libraries compiled on 2022-11-01 21:01:08 on PC0115427
Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install
Using PETSc arch:
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas 
-Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector 
-fvisibility=hidden -O3
-----------------------------------------

Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include 
-I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include 
-I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include
-----------------------------------------

Using C linker: mpicc
Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib 
-L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc 
-Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib 
-L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib 
-Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib 
-L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 
-L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels 
-lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt 
-lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl
-----------------------------------------


Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Tuesday, November 15, 2022 13:03
To: Fackler, Philip <fackle...@ornl.gov>
Cc: xolotl-psi-developm...@lists.sourceforge.net 
<xolotl-psi-developm...@lists.sourceforge.net>; petsc-users@mcs.anl.gov 
<petsc-users@mcs.anl.gov>; Blondel, Sophie <sblon...@utk.edu>; Roth, Philip 
<rot...@ornl.gov>
Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec 
diverging when running on CUDA device.

Can you paste -log_view result so I can see what functions are used?

--Junchao Zhang


On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fackle...@ornl.gov> wrote:
Yes, most (but not all) of our system test cases fail with the kokkos/cuda or 
cuda backends. All of them pass with the CPU-only kokkos backend.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Monday, November 14, 2022 19:34
To: Fackler, Philip <fackle...@ornl.gov>
Cc: xolotl-psi-developm...@lists.sourceforge.net 
<xolotl-psi-developm...@lists.sourceforge.net>; petsc-users@mcs.anl.gov 
<petsc-users@mcs.anl.gov>; Blondel, Sophie <sblon...@utk.edu>; Zhang, Junchao 
<jczh...@mcs.anl.gov>; Roth, Philip <rot...@ornl.gov>
Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging 
when running on CUDA device.

Hi, Philip,
  Sorry to hear that.  It seems you can run the same code on CPUs but not on 
GPUs (with either the petsc/Kokkos backend or the petsc/cuda backend); is that right?

--Junchao Zhang


On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
This is an issue I've brought up before (and discussed in person with Richard). 
I wanted to bring it up again because I'm hitting the limits of what I know to 
do, and I need help figuring this out.

The problem can be reproduced using Xolotl's "develop" branch built against a 
petsc build with kokkos and kokkos-kernels enabled. Then, either add the 
relevant kokkos options to the "petscArgs=" line in the system test parameter 
file(s), or just replace the system test parameter files with the ones from the 
"feature-petsc-kokkos" branch; see the files there that begin with 
"params_system_".

Note that those files use the "kokkos" options, but the problem is similar with 
the corresponding cuda/cusparse options (see the sketch below). I've already 
tried building kokkos-kernels with no TPLs; that gave slightly different 
results, but the same problem.
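
For the cuda case, the analogous pair would be PETSc's standard cusparse/cuda 
type options, i.e. something like the following (a sketch using PETSc's generic 
type names, not a line copied from the branch; "<existing options>" stands for 
the arguments already in the file):

    petscArgs=<existing options> -dm_mat_type aijcusparse -dm_vec_type cuda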

Any help would be appreciated.

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
