> -mg_coarse_sub_mat_solver_type value: cusparse
It is a PC factor option, so it is -mg_coarse_sub_pc_factor_mat_solver_type cusparse. -help | grep mg_coarse_sub should have found it.

   Barry

> On Jul 21, 2019, at 8:12 PM, Mark Adams <mfad...@lbl.gov> wrote:
>
> Barry,
>
> Option left: name:-mg_coarse_mat_solver_type value: cusparse
>
> I tried this too:
>
> Option left: name:-mg_coarse_sub_mat_solver_type value: cusparse
>
> Here is the view. cuda did not get into the factor type.
>
> PC Object: 24 MPI processes
>   type: gamg
>     type is MULTIPLICATIVE, levels=5 cycles=v
>       Cycles per PCApply=1
>       Using externally compute Galerkin coarse grid matrices
>       GAMG specific options
>         Threshold for dropping small values in graph on each level = 0.05 0.025 0.0125
>         Threshold scaling factor for each level not specified = 0.5
>         AGG specific options
>           Symmetric graph false
>           Number of levels to square graph 10
>           Number smoothing steps 1
>         Complexity:    grid = 1.14213
>   Coarse grid solver -- level -------------------------------
>     KSP Object: (mg_coarse_) 24 MPI processes
>       type: preonly
>       maximum iterations=10000, initial guess is zero
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object: (mg_coarse_) 24 MPI processes
>       type: bjacobi
>         number of blocks = 24
>         Local solve is same for all blocks, in the following KSP and PC objects:
>       KSP Object: (mg_coarse_sub_) 1 MPI processes
>         type: preonly
>         maximum iterations=1, initial guess is zero
>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>         left preconditioning
>         using NONE norm type for convergence test
>       PC Object: (mg_coarse_sub_) 1 MPI processes
>         type: lu
>           out-of-place factorization
>           tolerance for zero pivot 2.22045e-14
>           using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>           matrix ordering: nd
>           factor fill ratio given 5., needed 1.
>             Factored matrix follows:
>               Mat Object: 1 MPI processes
>                 type: seqaij
>                 rows=6, cols=6
>                 package used to perform factorization: petsc
>                 total: nonzeros=36, allocated nonzeros=36
>                 total number of mallocs used during MatSetValues calls =0
>                   using I-node routines: found 2 nodes, limit used is 5
>         linear system matrix = precond matrix:
>         Mat Object: 1 MPI processes
>           type: seqaijcusparse
>           rows=6, cols=6
>           total: nonzeros=36, allocated nonzeros=36
>           total number of mallocs used during MatSetValues calls =0
>             using I-node routines: found 2 nodes, limit used is 5
>   linear system matrix = precond matrix:
>   Mat Object: 24 MPI processes
>     type: mpiaijcusparse
>     rows=6, cols=6, bs=6
>     total: nonzeros=36, allocated nonzeros=36
>     total number of mallocs used during MatSetValues calls =0
>       using scalable MatPtAP() implementation
>       using I-node (on process 0) routines: found 2 nodes, limit used is 5
>   Down solver (pre-smoother) on level 1 -------------------------------
>
>
> On Sun, Jul 21, 2019 at 3:58 PM Mark Adams <mfad...@lbl.gov> wrote:
> Barry, I do NOT see communication. This is what made me think it was not running on the GPU. I added print statements and found that MatSolverTypeRegister_CUSPARSE IS called but (what it registers) MatGetFactor_seqaijcusparse_cusparse does NOT get called.
>
> I have a job waiting in the queue. I'll send ksp_view when it runs. I will try -mg_coarse_mat_solver_type cusparse. That is probably the problem. Maybe I should set the coarse grid solver in a more robust way in GAMG, like use the matrix somehow? I currently use PCSetType(pc, PCLU).
>
> I can't get an interactive shell now to run DDT, but I can try stepping through from MatGetFactor to see what it's doing.
>
> Thanks,
> Mark
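A minimal sketch of the programmatic route Mark is asking about, assuming a sequential coarse-grid block PC held in a variable named pc2 (mirroring the GAMG snippet quoted later in this thread); this is not the actual GAMG source. PCFactorSetMatSolverType() selects the factorization package, which is the in-code equivalent of -mg_coarse_sub_pc_factor_mat_solver_type cusparse on the command line:

#include <petscksp.h>

/* Sketch: ask for an LU factorization performed by the cusparse package
   instead of the default petsc package.  The PC (e.g. the mg_coarse_sub_
   block PC) is assumed to already exist. */
static PetscErrorCode SetCoarseLUCusparse(PC pc2)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PCSetType(pc2, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc2, MATSOLVERCUSPARSE);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}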
> On Sun, Jul 21, 2019 at 11:14 AM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>
> > On Jul 21, 2019, at 8:55 AM, Mark Adams via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
> >
> > I am running ex56 with -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse and I see no GPU communication in MatSolve (the serial LU coarse grid solver).
>
>   Do you mean to say, you DO see communication?
>
>   What does -ksp_view show you? It should show the factor type in the information about the coarse grid solve.
>
>   You might need something like -mg_coarse_mat_solver_type cusparse (because it may default to the PETSc one; it may be possible to have it default to cusparse if it exists and the matrix is of type MATSEQAIJCUSPARSE).
>
>   The determination in MatGetFactor() is a bit involved, including pasting together strings and string compares, and could be finding a CPU factorization.
>
>   I would run on one MPI_Rank() in the debugger and put a break point in MatGetFactor() and track along to see what it picks and why. You could do this debugging without GAMG first, just -pc_type lu.
>
> > GAMG does set the coarse grid solver to LU manually like this: ierr = PCSetType(pc2, PCLU);CHKERRQ(ierr);
>
>   For parallel runs this won't work using the GPU code and only sequential direct solvers, so it must be using BJACOBI in that case?
>
>   Barry
>
> > I am thinking the dispatch of the CUDA version of this got dropped somehow.
> >
> > I see that this is getting called:
> >
> > PETSC_EXTERN PetscErrorCode MatSolverTypeRegister_CUSPARSE(void)
> > {
> >   PetscErrorCode ierr;
> >
> >   PetscFunctionBegin;
> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_LU,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_CHOLESKY,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_ILU,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_ICC,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
> >   PetscFunctionReturn(0);
> > }
> >
> > but MatGetFactor_seqaijcusparse_cusparse is not getting called.
> >
> > GAMG does set the coarse grid solver to LU manually like this: ierr = PCSetType(pc2, PCLU);CHKERRQ(ierr);
> >
> > Any ideas?
> >
> > Thanks,
> > Mark
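Barry's single-rank debugging suggestion can also be exercised outside GAMG with a small standalone check. The following is only a sketch (it is not from the thread): it builds a tiny MATSEQAIJCUSPARSE matrix and calls MatGetFactor() directly, so a breakpoint in MatGetFactor() or MatGetFactor_seqaijcusparse_cusparse shows which package gets picked. It assumes a CUDA-enabled PETSc build and a single MPI rank:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A, F;
  MatSolverType  stype;
  PetscInt       i;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* Small sequential matrix of the cusparse AIJ type */
  ierr = MatCreate(PETSC_COMM_SELF, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, 6, 6, 6, 6);CHKERRQ(ierr);
  ierr = MatSetType(A, MATSEQAIJCUSPARSE);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  for (i = 0; i < 6; i++) {
    ierr = MatSetValue(A, i, i, 1.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Break here: this should reach MatGetFactor_seqaijcusparse_cusparse
     if the MatSolverTypeRegister_CUSPARSE registration is being found */
  ierr = MatGetFactor(A, MATSOLVERCUSPARSE, MAT_FACTOR_LU, &F);CHKERRQ(ierr);
  ierr = MatFactorGetSolverType(F, &stype);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF, "factor solver type: %s\n", stype);CHKERRQ(ierr);

  ierr = MatDestroy(&F);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}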