Re: [petsc-dev] Error running on Titan with GPUs & GNU

2018-11-02 Thread Mark Adams via petsc-dev
I did not configure hypre manually, so I guess it is not using GPUs.
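One way to double-check (assuming the usual log layout; I haven't verified
the exact wording) is to grep PETSc's configure.log for the options that were
passed down to hypre's configure:

    # assumption: the configure command PETSc ran for hypre is recorded in
    # configure.log; if no CUDA option appears, hypre was built CPU-only
    grep -i hypre configure.log | grep -i cuda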

On Fri, Nov 2, 2018 at 2:40 PM Smith, Barry F.  wrote:

>
>
> > On Nov 2, 2018, at 1:25 PM, Mark Adams  wrote:
> >
> > And I just tested it with GAMG and it seems fine.  And hypre ran, but it
> is not clear that it used GPUs
>
> Presumably hypre must be configured to use GPUs. Currently the PETSc
> hypre download installer hypre.py doesn't have any options for getting
> hypre built for GPUs.
>
> Barry
>

Re: [petsc-dev] Error running on Titan with GPUs & GNU

2018-11-02 Thread Smith, Barry F. via petsc-dev



> On Nov 2, 2018, at 1:25 PM, Mark Adams  wrote:
> 
> And I just tested it with GAMG and it seems fine.  And hypre ran, but it is 
> not clear that it used GPUs

Presumably hypre must be configured to use GPUs. Currently the PETSc hypre
download installer hypre.py doesn't have any options for getting hypre built
for GPUs.
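
Someone wanting GPUs would presumably have to build hypre by hand and point
PETSc at the install; a rough, unverified sketch (the flags are
version-dependent, check the hypre documentation for the release in use):

    # illustrative only: enable CUDA in hypre's own configure
    cd hypre/src
    ./configure --with-cuda
    make install
    # then configure PETSc with --with-hypre-dir=/path/to/hypre/install
    # instead of --download-hypre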

Barry


Re: [petsc-dev] Error running on Titan with GPUs & GNU

2018-11-02 Thread Mark Adams via petsc-dev
And I just tested it with GAMG and it seems fine. And hypre ran, but it is
not clear that it used GPUs.

14:13 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19
-dm_vec_type cuda -dm_mat_type aijcusparse -pc_type hypre -ksp_type fgmres
-snes_monitor_short -snes_rtol 1.e-5 -ksp_view
lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
  0 SNES Function norm 0.239155
KSP Object: 1 MPI processes
  type: fgmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization
with no iterative refinement
happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: hypre
HYPRE BoomerAMG preconditioning
  Cycle type V
  Maximum number of levels 25
  Maximum number of iterations PER hypre call 1
  Convergence tolerance PER hypre call 0.
  Threshold for strong coupling 0.25
  Interpolation truncation factor 0.
  Interpolation: max elements per row 0
  Number of levels of aggressive coarsening 0
  Number of paths for aggressive coarsening 1
  Maximum row sums 0.9
  Sweeps down         1
  Sweeps up           1
  Sweeps on coarse    1
  Relax down          symmetric-SOR/Jacobi
  Relax up            symmetric-SOR/Jacobi
  Relax on coarse     Gaussian-elimination
  Relax weight  (all)      1.
  Outer relax weight (all) 1.
  Using CF-relaxation
  Not using more complex smoothers.
  Measure type        local
  Coarsen type        Falgout
  Interpolation type  classical
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
type: seqaijcusparse
rows=64, cols=64, bs=4
total: nonzeros=1024, allocated nonzeros=1024
total number of mallocs used during MatSetValues calls =0
  using I-node routines: found 16 nodes, limit used is 5
  1 SNES Function norm 6.80716e-05
KSP Object: 1 MPI processes
  type: fgmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization
with no iterative refinement
happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: hypre
HYPRE BoomerAMG preconditioning
  Cycle type V
  Maximum number of levels 25
  Maximum number of iterations PER hypre call 1
  Convergence tolerance PER hypre call 0.
  Threshold for strong coupling 0.25
  Interpolation truncation factor 0.
  Interpolation: max elements per row 0
  Number of levels of aggressive coarsening 0
  Number of paths for aggressive coarsening 1
  Maximum row sums 0.9
  Sweeps down         1
  Sweeps up           1
  Sweeps on coarse    1
  Relax down          symmetric-SOR/Jacobi
  Relax up            symmetric-SOR/Jacobi
  Relax on coarse     Gaussian-elimination
  Relax weight  (all)      1.
  Outer relax weight (all) 1.
  Using CF-relaxation
  Not using more complex smoothers.
  Measure type        local
  Coarsen type        Falgout
  Interpolation type  classical
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
type: seqaijcusparse
rows=64, cols=64, bs=4
total: nonzeros=1024, allocated nonzeros=1024
total number of mallocs used during MatSetValues calls =0
  using I-node routines: found 16 nodes, limit used is 5
  2 SNES Function norm 4.093e-11
Number of SNES iterations = 2
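
One way to confirm whether hypre actually ran on the GPU (I haven't tried
this here; it assumes nvprof is available on the node) would be to profile
the same run and look for hypre kernels in the kernel summary:

    # if hypre is CPU-only, only the cuSPARSE/cuBLAS kernels from PETSc's
    # own Vec/Mat operations should show up in the profile
    jsrun -n 1 nvprof ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse \
      -pc_type hypre -ksp_type fgmres -snes_monitor_short -snes_rtol 1.e-5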


On Fri, Nov 2, 2018 at 2:10 PM Smith, Barry F.  wrote:

>
>
> > On Nov 2, 2018, at 1:03 PM, Mark Adams  wrote:
> >
> > FYI, I seem to have the new GPU machine at ORNL (summitdev) working with
> GPUs. That is good enough for now.
> > Thanks,
>
>    Excellent!
>

Re: [petsc-dev] Error running on Titan with GPUs & GNU

2018-11-02 Thread Mark Adams via petsc-dev
FYI, I seem to have the new GPU machine at ORNL (summitdev) working with
GPUs. That is good enough for now.
Thanks,

14:00 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19
-dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres
-snes_monitor_short -snes_rtol 1.e-5 -ksp_view
lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
  0 SNES Function norm 0.239155
KSP Object: 1 MPI processes
  type: fgmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization
with no iterative refinement
happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
type: seqaijcusparse
rows=64, cols=64, bs=4
total: nonzeros=1024, allocated nonzeros=1024
total number of mallocs used during MatSetValues calls =0
  using I-node routines: found 16 nodes, limit used is 5
  1 SNES Function norm 6.82338e-05
KSP Object: 1 MPI processes
  type: fgmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization
with no iterative refinement
happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: none
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
type: seqaijcusparse
rows=64, cols=64, bs=4
total: nonzeros=1024, allocated nonzeros=1024
total number of mallocs used during MatSetValues calls =0
  using I-node routines: found 16 nodes, limit used is 5
  2 SNES Function norm 3.346e-10
Number of SNES iterations = 2
14:01 master= ~/petsc/src/snes/examples/tutorials$



On Thu, Nov 1, 2018 at 9:33 AM Mark Adams  wrote:

>
>
> On Wed, Oct 31, 2018 at 12:30 PM Mark Adams  wrote:
>
>>
>>
>> On Wed, Oct 31, 2018 at 6:59 AM Karl Rupp  wrote:
>>
>>> Hi Mark,
>>>
>>> ah, I was confused by the Python information at the beginning of
>>> configure.log. So it is picking up the correct compiler.
>>>
>>> Have you tried uncommenting the check for GNU?
>>>
>>
> Yes, but I am getting an error that the CUDA files cannot find mpi.h.
>
>
>>
>> I'm getting a make error.
>>
>> Thanks,
>>
>
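
(For the mpi.h failure above: one unverified workaround would be to hand nvcc
the MPI include directory at configure time, e.g.

    # MPICH_DIR is set by the Cray programming environment; treat both the
    # variable and the flag as illustrative, not a verified fix
    ./configure --with-cuda CUDAFLAGS="-I${MPICH_DIR}/include"

so that the CUDA compile lines can see the MPI headers.)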