Re: [petsc-dev] Error running on Titan with GPUs & GNU
I did not configure hypre manually, so I guess it is not using GPUs.

On Fri, Nov 2, 2018 at 2:40 PM Smith, Barry F. wrote:
>
> > On Nov 2, 2018, at 1:25 PM, Mark Adams wrote:
> >
> > And I just tested it with GAMG and it seems fine. And hypre ran, but it
> > is not clear that it used GPUs
>
>    Presumably hypre must be configured to use GPUs. Currently the PETSc
> hypre download installer hypre.py doesn't have any options for getting
> hypre built for GPUs.
>
>    Barry
>
> > 14:13 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19
> > -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type hypre -ksp_type fgmres
> > -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> > [...]
> >   2 SNES Function norm 4.093e-11
> > Number of SNES iterations = 2
> >
> > On Fri, Nov 2, 2018 at 2:10 PM Smith, Barry F. wrote:
> > >
> > > > On Nov 2, 2018, at 1:03 PM, Mark Adams wrote:
> > > >
> > > > FYI, I seem to have the new GPU machine at ORNL (summitdev) working
> > > > with GPUs. That is good enough for now.
> > > > Thanks,
> > >
> > >    Excellent!
> > >
> > > > 14:00 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19
> > > > -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres
> > > > -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> > > > [...]
Re: [petsc-dev] Error running on Titan with GPUs & GNU
> On Nov 2, 2018, at 1:25 PM, Mark Adams wrote:
>
> And I just tested it with GAMG and it seems fine. And hypre ran, but it is
> not clear that it used GPUs

   Presumably hypre must be configured to use GPUs. Currently the PETSc
hypre download installer hypre.py doesn't have any options for getting
hypre built for GPUs.

   Barry

> 14:13 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19
> -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type hypre -ksp_type fgmres
> -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> [...]
>   2 SNES Function norm 4.093e-11
> Number of SNES iterations = 2
>
> On Fri, Nov 2, 2018 at 2:10 PM Smith, Barry F. wrote:
> >
> > > On Nov 2, 2018, at 1:03 PM, Mark Adams wrote:
> > >
> > > FYI, I seem to have the new GPU machine at ORNL (summitdev) working with
> > > GPUs. That is good enough for now.
> > > Thanks,
> >
> >    Excellent!
> >
> > > 14:00 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19
> > > -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres
> > > -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> > > [...]
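For anyone who does want GPU-enabled hypre here, the build has to be done by hand for now. A minimal sketch of what that might look like, assuming hypre's own --with-cuda configure flag and PETSc's --with-hypre-dir option; the compiler wrappers and install prefix below are placeholders, not anything hypre.py produces:

    # Sketch only: manual hypre build with CUDA enabled, outside hypre.py.
    # --with-cuda is hypre's own configure flag; the prefix is a placeholder.
    cd hypre/src
    ./configure CC=mpicc CXX=mpicxx --with-cuda --prefix=$HOME/hypre-gpu
    make install
    # Then point PETSc at the manual build instead of using --download-hypre:
    #   ./configure --with-hypre-dir=$HOME/hypre-gpu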
Re: [petsc-dev] Error running on Titan with GPUs & GNU
And I just tested it with GAMG and it seems fine. And hypre ran, but it is
not clear that it used GPUs

14:13 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19
-dm_vec_type cuda -dm_mat_type aijcusparse -pc_type hypre -ksp_type fgmres
-snes_monitor_short -snes_rtol 1.e-5 -ksp_view
lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
  0 SNES Function norm 0.239155
KSP Object: 1 MPI processes
  type: fgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.25
      Interpolation truncation factor 0.
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 0
      Number of paths for aggressive coarsening 1
      Maximum row sums 0.9
      Sweeps down          1
      Sweeps up            1
      Sweeps on coarse     1
      Relax down           symmetric-SOR/Jacobi
      Relax up             symmetric-SOR/Jacobi
      Relax on coarse      Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Using CF-relaxation
      Not using more complex smoothers.
      Measure type         local
      Coarsen type         Falgout
      Interpolation type   classical
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijcusparse
    rows=64, cols=64, bs=4
    total: nonzeros=1024, allocated nonzeros=1024
    total number of mallocs used during MatSetValues calls =0
      using I-node routines: found 16 nodes, limit used is 5
  1 SNES Function norm 6.80716e-05
KSP Object: 1 MPI processes
  type: fgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.25
      Interpolation truncation factor 0.
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 0
      Number of paths for aggressive coarsening 1
      Maximum row sums 0.9
      Sweeps down          1
      Sweeps up            1
      Sweeps on coarse     1
      Relax down           symmetric-SOR/Jacobi
      Relax up             symmetric-SOR/Jacobi
      Relax on coarse      Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Using CF-relaxation
      Not using more complex smoothers.
      Measure type         local
      Coarsen type         Falgout
      Interpolation type   classical
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijcusparse
    rows=64, cols=64, bs=4
    total: nonzeros=1024, allocated nonzeros=1024
    total number of mallocs used during MatSetValues calls =0
      using I-node routines: found 16 nodes, limit used is 5
  2 SNES Function norm 4.093e-11
Number of SNES iterations = 2

On Fri, Nov 2, 2018 at 2:10 PM Smith, Barry F. wrote:
>
> > On Nov 2, 2018, at 1:03 PM, Mark Adams wrote:
> >
> > FYI, I seem to have the new GPU machine at ORNL (summitdev) working with
> > GPUs. That is good enough for now.
> > Thanks,
>
>    Excellent!
>
> > 14:00 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19
> > -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres
> > -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> > [...]
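One way to settle whether a run like the one above actually touched the GPU is to profile it rather than read the -ksp_view output, which looks the same either way. A sketch, assuming nvprof is usable under jsrun on this machine:

    # Any cuSPARSE/cuBLAS kernel activity in the GPU summary means the
    # device was actually used.
    jsrun -n 1 nvprof --print-gpu-summary ./ex19 \
      -dm_vec_type cuda -dm_mat_type aijcusparse \
      -pc_type hypre -ksp_type fgmres -snes_rtol 1.e-5
    # Alternatively, add -log_view and look for CUDA copy/kernel events.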
Re: [petsc-dev] Error running on Titan with GPUs & GNU
FYI, I seem to have the new GPU machine at ORNL (summitdev) working with
GPUs. That is good enough for now.
Thanks,

14:00 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19
-dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres
-snes_monitor_short -snes_rtol 1.e-5 -ksp_view
lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
  0 SNES Function norm 0.239155
KSP Object: 1 MPI processes
  type: fgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijcusparse
    rows=64, cols=64, bs=4
    total: nonzeros=1024, allocated nonzeros=1024
    total number of mallocs used during MatSetValues calls =0
      using I-node routines: found 16 nodes, limit used is 5
  1 SNES Function norm 6.82338e-05
KSP Object: 1 MPI processes
  type: fgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijcusparse
    rows=64, cols=64, bs=4
    total: nonzeros=1024, allocated nonzeros=1024
    total number of mallocs used during MatSetValues calls =0
      using I-node routines: found 16 nodes, limit used is 5
  2 SNES Function norm 3.346e-10
Number of SNES iterations = 2
14:01 master= ~/petsc/src/snes/examples/tutorials$

On Thu, Nov 1, 2018 at 9:33 AM Mark Adams wrote:
>
> On Wed, Oct 31, 2018 at 12:30 PM Mark Adams wrote:
> >
> > On Wed, Oct 31, 2018 at 6:59 AM Karl Rupp wrote:
> > >
> > > Hi Mark,
> > >
> > > ah, I was confused by the Python information at the beginning of
> > > configure.log. So it is picking up the correct compiler.
> > >
> > > Have you tried uncommenting the check for GNU?
>
> Yes, but I am getting an error that the cuda files do not find mpi.h.
>
> > I'm getting a make error.
> >
> > Thanks,
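The mpi.h failure in the nvcc-compiled files usually means nvcc is not seeing the include path that the MPI compiler wrappers add automatically. A sketch of one possible workaround at configure time; MPI_ROOT is a placeholder for whatever the module environment provides, and the real path can be read off 'mpicc -show' (or 'mpicc -showme'):

    # Hand nvcc the MPI include directory explicitly (other configure
    # options elided); MPI_ROOT here is an assumed placeholder.
    ./configure --with-cuda=1 CUDAFLAGS="-I${MPI_ROOT}/include"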