> On Nov 2, 2018, at 1:25 PM, Mark Adams <mfad...@lbl.gov> wrote:
> 
> And I just tested it with GAMG and it seems fine.  And hypre ran, but it is 
> not clear that it used GPUs....

    Presumably hypre must be configured to use GPUs. Currently the PETSc hypre 
download installer, hypre.py, doesn't have any options for getting hypre built 
for GPUs.

    Barry

> 
> 14:13 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19 
> -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type hypre -ksp_type fgmres 
> -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
>   0 SNES Function norm 0.239155 
> KSP Object: 1 MPI processes
>   type: fgmres
>     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization 
> with no iterative refinement
>     happy breakdown tolerance 1e-30
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>   right preconditioning
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: hypre
>     HYPRE BoomerAMG preconditioning
>       Cycle type V
>       Maximum number of levels 25
>       Maximum number of iterations PER hypre call 1
>       Convergence tolerance PER hypre call 0.
>       Threshold for strong coupling 0.25
>       Interpolation truncation factor 0.
>       Interpolation: max elements per row 0
>       Number of levels of aggressive coarsening 0
>       Number of paths for aggressive coarsening 1
>       Maximum row sums 0.9
>       Sweeps down         1
>       Sweeps up           1
>       Sweeps on coarse    1
>       Relax down          symmetric-SOR/Jacobi
>       Relax up            symmetric-SOR/Jacobi
>       Relax on coarse     Gaussian-elimination
>       Relax weight  (all)      1.
>       Outer relax weight (all) 1.
>       Using CF-relaxation
>       Not using more complex smoothers.
>       Measure type        local
>       Coarsen type        Falgout
>       Interpolation type  classical
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI processes
>     type: seqaijcusparse
>     rows=64, cols=64, bs=4
>     total: nonzeros=1024, allocated nonzeros=1024
>     total number of mallocs used during MatSetValues calls =0
>       using I-node routines: found 16 nodes, limit used is 5
>   1 SNES Function norm 6.80716e-05 
> KSP Object: 1 MPI processes
>   type: fgmres
>     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization 
> with no iterative refinement
>     happy breakdown tolerance 1e-30
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>   right preconditioning
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: hypre
>     HYPRE BoomerAMG preconditioning
>       Cycle type V
>       Maximum number of levels 25
>       Maximum number of iterations PER hypre call 1
>       Convergence tolerance PER hypre call 0.
>       Threshold for strong coupling 0.25
>       Interpolation truncation factor 0.
>       Interpolation: max elements per row 0
>       Number of levels of aggressive coarsening 0
>       Number of paths for aggressive coarsening 1
>       Maximum row sums 0.9
>       Sweeps down         1
>       Sweeps up           1
>       Sweeps on coarse    1
>       Relax down          symmetric-SOR/Jacobi
>       Relax up            symmetric-SOR/Jacobi
>       Relax on coarse     Gaussian-elimination
>       Relax weight  (all)      1.
>       Outer relax weight (all) 1.
>       Using CF-relaxation
>       Not using more complex smoothers.
>       Measure type        local
>       Coarsen type        Falgout
>       Interpolation type  classical
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI processes
>     type: seqaijcusparse
>     rows=64, cols=64, bs=4
>     total: nonzeros=1024, allocated nonzeros=1024
>     total number of mallocs used during MatSetValues calls =0
>       using I-node routines: found 16 nodes, limit used is 5
>   2 SNES Function norm 4.093e-11 
> Number of SNES iterations = 2
> 
> 
> On Fri, Nov 2, 2018 at 2:10 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> 
> 
> > On Nov 2, 2018, at 1:03 PM, Mark Adams <mfad...@lbl.gov> wrote:
> > 
> > FYI, I seem to have the new GPU machine at ORNL (summitdev) working with 
> > GPUs. That is good enough for now.
> > Thanks,
> 
>    Excellent!
> 
> > 
> > 14:00 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19 
> > -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres 
> > -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> > lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
> >   0 SNES Function norm 0.239155 
> > KSP Object: 1 MPI processes
> >   type: fgmres
> >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization 
> > with no iterative refinement
> >     happy breakdown tolerance 1e-30
> >   maximum iterations=10000, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 1 MPI processes
> >   type: none
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaijcusparse
> >     rows=64, cols=64, bs=4
> >     total: nonzeros=1024, allocated nonzeros=1024
> >     total number of mallocs used during MatSetValues calls =0
> >       using I-node routines: found 16 nodes, limit used is 5
> >   1 SNES Function norm 6.82338e-05 
> > KSP Object: 1 MPI processes
> >   type: fgmres
> >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization 
> > with no iterative refinement
> >     happy breakdown tolerance 1e-30
> >   maximum iterations=10000, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 1 MPI processes
> >   type: none
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaijcusparse
> >     rows=64, cols=64, bs=4
> >     total: nonzeros=1024, allocated nonzeros=1024
> >     total number of mallocs used during MatSetValues calls =0
> >       using I-node routines: found 16 nodes, limit used is 5
> >   2 SNES Function norm 3.346e-10 
> > Number of SNES iterations = 2
> > 14:01 master= ~/petsc/src/snes/examples/tutorials$ 
> > 
> > 
> > 
> > On Thu, Nov 1, 2018 at 9:33 AM Mark Adams <mfad...@lbl.gov> wrote:
> > 
> > 
> > On Wed, Oct 31, 2018 at 12:30 PM Mark Adams <mfad...@lbl.gov> wrote:
> > 
> > 
> > On Wed, Oct 31, 2018 at 6:59 AM Karl Rupp <r...@iue.tuwien.ac.at> wrote:
> > Hi Mark,
> > 
> > ah, I was confused by the Python information at the beginning of 
> > configure.log. So it is picking up the correct compiler.
> > 
> > Have you tried uncommenting the check for GNU?
> > 
> > Yes, but I am getting an error that the cuda files do not find mpi.h.
> >  
> > 
> > I'm getting a make error.
> > 
> > Thanks, 
> 
