Re: [petsc-users] ISGlobalToLocalMappingApplyBlock

2017-11-15 Thread Dave May
On Wed, 15 Nov 2017 at 05:55, Adrian Croucher 
wrote:

> hi
>
> I'm trying to use ISGlobalToLocalMappingApplyBlock() and am a bit
> puzzled about the results it's giving.
>
> I've attached a small test to illustrate. It just sets up a
> local-to-global mapping with 10 elements. Running on two processes the
> first has global indices 0 - 4 and the second has 5 - 9. I then try
> to find the local index corresponding to global index 8.
>
> If I set the blocksize parameter to 1, it correctly gives the results -1
> on rank 0 and 3 on rank 1.
>
> But if I set the blocksize to 2 (or more), the results are -253701943 on
> rank 0 and -1 on rank 1. Neither of these is what I expected - I thought
> they should be the same as in the blocksize 1 case.


The man page says to use "block global numbering".


>
> I'm presuming the global indices I pass in to
> ISGlobalToLocalMappingApplyBlock() should be global block indices (i.e.
> not scaled up by blocksize).


Yes, the indices should relate to the blocks

> If I do scale them up it doesn't give the
> answers I expect either.
>
> Or am I wrong to expect this to give the same results regardless of
> blocksize?


Yep.

However, the large negative number being printed looks like an uninitialized
variable. This seems odd, as with mode = MASK nout should equal N and any
requested block indices not in the IS should result in -1 being inserted in
your local_indices array.

What's the value of nout?
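
For reference, a minimal sketch of the block call (assuming your mapping is
named ltog and was created with bs = 2, so global index 8 corresponds to
block index 4):

  PetscInt blk[1],loc[1],nout;
  blk[0] = 4;  /* block index: global index 8 with bs = 2 */
  ISGlobalToLocalMappingApplyBlock(ltog,IS_GTOLM_MASK,1,blk,&nout,loc);
  PetscPrintf(PETSC_COMM_SELF,"nout = %D, local block index = %D\n",nout,loc[0]);

With mode IS_GTOLM_MASK, nout should come back equal to the number of input
indices (1 here), and loc[0] should be -1 on the rank which doesn't own that block.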

Thanks,
  Dave


>
> Cheers, Adrian
>
> --
> Dr Adrian Croucher
> Senior Research Fellow
> Department of Engineering Science
> University of Auckland, New Zealand
> email: a.crouc...@auckland.ac.nz
> tel: +64 (0)9 923 4611
>
>


Re: [petsc-users] Understanding preallocation for MPI

2017-07-07 Thread Dave May
On Fri, 7 Jul 2017 at 11:31, Florian Lindner  wrote:

> Hello,
>
> I'm having some struggle understanding the preallocation for MPIAIJ
> matrices, especially when a value is in off-diagonal
> vs. diagonal block.
>
> The small example program is at https://pastebin.com/67dXnGm3
>
> In general it should be parallel, but right now I just run it in serial.


When you run this code in serial, the mat type will be MATSEQAIJ. Hence,
the call to MatMPIAIJSetPreallocation() will have no effect because the mat
type does not match MPIAIJ. As a result, your code doesn't perform any
preallocation for SEQAIJ matrices.

In addition to calling MatMPIAIJSetPreallocation(), add a call to
MatSEQAIJSetPreallocation.
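
For illustration, a minimal sketch of the two calls (assuming d_nnz and o_nnz
are the per-row counts you already compute; only the call matching the actual
Mat type has any effect, the other is a no-op):

  MatSeqAIJSetPreallocation(A,0,d_nnz);
  MatMPIAIJSetPreallocation(A,0,d_nnz,0,o_nnz);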

Thanks,
  Dave


>
> According to my understanding of
>
>
> http://www.mcs.anl.gov/petsc/petsc-3.7/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html
>
> an entry is in the diagonal submatrix if its row is in the OwnershipRange
> and its column is in OwnershipRangeColumn.
> That also means that in a serial run, there is only a diagonal submatrix.
>
> However, having MAT_NEW_NONZERO_ALLOCATION_ERR set, I get an error when
>
> Inserting 6 elements in row 2, though I have exactly
>
> 2 o_nnz = 0, d_nnz = 6 (means 6 elements allocated in the diagonal
> submatrix of row 2)
>
> Error is:
>
> [0]PETSC ERROR: Argument out of range
> [0]PETSC ERROR: New nonzero at (2,5) caused a malloc
> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn
> off this check
>
>
> What is wrong with my understanding?
>
> Thanks,
> Florian
>


Re: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem

2017-06-21 Thread Dave May
Please send your modified version of ex34. It will be faster to examine the
source and experiment with option choices locally rather than sending
emails back and forth.

Thanks,
  Dave

On Thu, 22 Jun 2017 at 03:13, Jason Lefley 
wrote:

> Hello,
>
> We are attempting to use the PETSc KSP solver framework in a fluid
> dynamics simulation we developed. The solution is part of a pressure
> projection and solves a Poisson problem. We use a cell-centered layout with
> a regular grid in 3d. We started with ex34.c from the KSP tutorials since
> it has the correct calls for the 3d DMDA, uses a cell-centered layout, and
> states that it works with multi-grid. We modified the operator construction
> function to match the coefficients and Dirichlet boundary conditions used
> in our problem (we’d also like to support Neumann but left those out for
> now to keep things simple). As a result of the modified boundary
> conditions, our version does not perform a null space removal on the right
> hand side or operator as the original did. We also modified the right hand
> side to contain a sinusoidal pattern for testing. Other than these changes,
> our code is the same as the original ex34.c
>
> With the default KSP options and using CG with the default pre-conditioner
> and without a pre-conditioner, we see good convergence. However, we’d like
> to accelerate the time to solution further and target larger problem sizes
> (>= 1024^3) if possible. Given these objectives, multi-grid as a
> pre-conditioner interests us. To understand the improvement that multi-grid
> provides, we ran ex45 from the KSP tutorials. ex34 with CG and no
> pre-conditioner appears to converge in a single iteration and we wanted to
> compare against a problem that has similar convergence patterns to our
> problem. Here’s the tests we ran with ex45:
>
> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129
> time in KSPSolve(): 7.0178e+00
> solver iterations: 157
> KSP final norm of residual: 3.16874e-05
>
> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type
> cg -pc_type none
> time in KSPSolve(): 4.1072e+00
> solver iterations: 213
> KSP final norm of residual: 0.000138866
>
> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type
> cg
> time in KSPSolve(): 3.3962e+00
> solver iterations: 88
> KSP final norm of residual: 6.46242e-05
>
> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type
> mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1
> -mg_levels_pc_type bjacobi
> time in KSPSolve(): 1.3201e+00
> solver iterations: 4
> KSP final norm of residual: 8.13339e-05
>
> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type
> mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1
> -mg_levels_pc_type bjacobi -ksp_type cg
> time in KSPSolve(): 1.3008e+00
> solver iterations: 4
> KSP final norm of residual: 2.21474e-05
>
> We found the multi-grid pre-conditioner options in the KSP tutorials
> makefile. These results make sense; both the default GMRES and CG solvers
> converge and CG without a pre-conditioner takes more iterations. The
> multi-grid pre-conditioned runs are pretty dramatically accelerated and
> require only a handful of iterations.
>
> We ran our code (modified ex34.c as described above) with the same
> parameters:
>
> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128
> time in KSPSolve(): 5.3729e+00
> solver iterations: 123
> KSP final norm of residual: 0.00595066
>
> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128
> -ksp_type cg -pc_type none
> time in KSPSolve(): 3.6154e+00
> solver iterations: 188
> KSP final norm of residual: 0.00505943
>
> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128
> -ksp_type cg
> time in KSPSolve(): 3.5661e+00
> solver iterations: 98
> KSP final norm of residual: 0.00967462
>
> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128
> -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson
> -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi
> time in KSPSolve(): 4.5606e+00
> solver iterations: 44
> KSP final norm of residual: 949.553
>
> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128
> -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson
> -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg
> time in KSPSolve(): 1.5481e+01
> solver iterations: 198
> KSP final norm of residual: 0.916558
>
> We performed all tests with petsc-3.7.6.
>
> The trends with CG and GMRES seem consistent with the results from ex45.
> However, with multi-grid, something doesn’t seem right. Convergence seems

Re: [petsc-users] How to compute RARt with A and R as distributed (MPI) matrices ?

2017-06-21 Thread Dave May
You can assemble R^T and then use MatPtAP, which supports MPIAIJ.
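
A sketch of that approach (matrix names are assumptions):

  Mat Rt,C;
  MatTranspose(R,MAT_INITIAL_MATRIX,&Rt);   /* Rt = R^T */
  MatPtAP(A,Rt,MAT_INITIAL_MATRIX,1.0,&C);  /* C = Rt^T A Rt = R A R^T */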


On Wed, 21 Jun 2017 at 15:00, Franck Houssen 
wrote:

> How to compute RARt with A and R as distributed (MPI) matrices ?
>
> This works with sequential matrices.
> The doc says "currently only implemented for pairs of AIJ matrices and
> classes which inherit from AIJ": I supposed that MPIAIJ was somehow
> inheriting from AIJ; it seems that it doesn't.
>
> Is this kind of matrix product possible with distributed matrices in PETSc
> ? Or is this a known limitation ?
> Do I go the wrong way to do that (= should use another method) ? If yes,
> what is the correct one ?
>
> Franck
>
> PS: running debian/testing + gcc-6.3 + bitbucket petsc.
>
> >> mpirun -n 2 matRARt.exe seq
> Mat Object: 1 MPI processes
>   type: seqaij
> row 0: (0, 1.)  (1, 0.)
> row 1: (0, 0.)  (1, 1.)
>
> >> mpirun -n 2 matRARt.exe mpi
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: No support for this operation for this object type
> [0]PETSC ERROR: Matrix of type  does not support RARt
>
>


Re: [petsc-users] Advice on improving Stokes Schur preconditioners

2017-06-14 Thread Dave May
On Wed, 14 Jun 2017 at 19:42, David Nolte <dno...@dim.uchile.cl> wrote:

> Dave, thanks a lot for your great answer and for sharing your experience.
> I have a much clearer picture now. :)
>
> The experiments 3/ give the desired results for examples of cavity flow.
> The (1/mu scaled) mass matrix seems OK.
>
> I followed your and Matt's recommendations, used a FULL Schur
> factorization, LU in the 0th split, and gradually relaxed the tolerance of
> GMRES/Jacobi in split 1 (observed the gradual increase in outer
> iterations). Then I replaced the split_0 LU with AMG (further increase of
> outer iterations and iterations on the Schur complement).
> Doing so I converged to using hypre boomeramg (smooth_type Euclid,
> strong_threshold 0.75) and 3 iterations of GMRES/Jacobi on the Schur block,
> which gave the best time-to-solution in my particular setup and convergence
> to rtol=1e-8 within 60 outer iterations.
> In my cases, using GMRES in the 0th split (with rtol 1e-1 or 1e-2) instead
> of "preonly" did not help convergence (on the contrary).
>
> I also repeated the experiments with "-pc_fieldsplit_schur_precondition
> selfp", with hypre(ilu) in split 0 and hypre in split 1, just to check, and
> somewhat disappointingly ( ;-) ) the wall time is less than half than when
> using gmres/Jac and Sp = mass matrix.
> I am aware that this says nothing about scaling and robustness with
> respect to h-refinement...
>

- selfp defines the Schur PC as A10 inv(diag(A00)) A01. This operator is
not spectrally equivalent to S.

- For split 0 did you use preonly-hypre(ilu)?

- For split 1 did you also use hypre(ilu) (you just wrote hypre)?

- What was the iteration count for the saddle point problem with hypre and
selfp? Iterates will increase if you refine the mesh; a cross-over will
occur at some (unknown) resolution, beyond which the mass matrix variant will
be faster.


>
> Would you agree that these configurations "make sense"?
>

If you want to weak scale, the configuration with the mass matrix makes the
most sense.

If you are only interested in solving many problems on one mesh, then do
what ever you can to make the solve time as fast as possible - including
using preconditioners defined with non-spectrally equivalent operators :D

Thanks,
  Dave


> Furthermore, maybe anyone has a hint where to start tuning multigrid? So
> far hypre worked better than ML, but I have not experimented much with the
> parameters.
>

>
>
> Thanks again for your help!
>
> Best wishes,
> David
>
>
>
>
> On 06/12/2017 04:52 PM, Dave May wrote:
>
> I've been following the discussion and have a couple of comments:
>
> 1/ For the preconditioners that you are using (Schur factorisation LDU, or
> upper block triangular DU), the convergence properties (e.g. 1 iterate for
> LDU and 2 iterates for DU) come from analysis involving exact inverses of
> A_00 and S
>
> Once you switch from using exact inverses of A_00 and S, you have to rely
> on spectral equivalence of operators. That is fine, but the spectral
> equivalence does not tell you how many iterates LDU or DU will require to
> converge. What it does inform you about is that if you have a spectrally
> equivalent operator for A_00 and S (Schur complement), then under mesh
> refinement, your iteration count (whatever it was prior to refinement) will
> not increase.
>
> 2/ Looking at your first set of options, I see you have opted to use
> -fieldsplit_ksp_type preonly (for both split 0 and 1). That is nice as it
> creates a linear operator thus you don't need something like FGMRES or GCR
> applied to the saddle point problem.
>
> Your choice for Schur is fine in the sense that the diagonal of M is
> spectrally equivalent to M, and M is spectrally equivalent to S. Whether it
> is "fine" in terms of the iteration count for Schur systems, we cannot say
> apriori (since the spectral equivalence doesn't give us direct info about
> the iterations we should expect).
>
> Your preconditioner for A_00 relies on AMG producing a spectrally
> equivalent operator with bounds which are tight enough to ensure
> convergence of the saddle point problem. I'll try explain this.
>
> In my experience, for many problems (unstructured FE with variable
> coefficients, structured FE meshes with variable coefficients) AMG and
> preonly is not a robust choice. To control the approximation (the spectral
> equiv bounds), I typically run a stationary or Krylov method on split 0
> (e.g. -fieldsplit_0_ksp_type xxx -fieldsplit_0_ksp_rtol yyy). Since the AMG
> preconditioner generated is spectrally equivalent (usually!), these solves
> will converge to a chosen rtol in a constant number of iterates under
> h-refinement. In practice, if I don't e

Re: [petsc-users] Advice on improving Stokes Schur preconditioners

2017-06-12 Thread Dave May
I've been following the discussion and have a couple of comments:

1/ For the preconditioners that you are using (Schur factorisation LDU, or
upper block triangular DU), the convergence properties (e.g. 1 iterate for
LDU and 2 iterates for DU) come from analysis involving exact inverses of
A_00 and S

Once you switch from using exact inverses of A_00 and S, you have to rely
on spectral equivalence of operators. That is fine, but the spectral
equivalence does not tell you how many iterates LDU or DU will require to
converge. What it does inform you about is that if you have a spectrally
equivalent operator for A_00 and S (Schur complement), then under mesh
refinement, your iteration count (whatever it was prior to refinement) will
not increase.

2/ Looking at your first set of options, I see you have opted to use
-fieldsplit_ksp_type preonly (for both split 0 and 1). That is nice as it
creates a linear operator thus you don't need something like FGMRES or GCR
applied to the saddle point problem.

Your choice for Schur is fine in the sense that the diagonal of M is
spectrally equivalent to M, and M is spectrally equivalent to S. Whether it
is "fine" in terms of the iteration count for Schur systems, we cannot say
apriori (since the spectral equivalence doesn't give us direct info about
the iterations we should expect).

Your preconditioner for A_00 relies on AMG producing a spectrally
equivalent operator with bounds which are tight enough to ensure
convergence of the saddle point problem. I'll try explain this.

In my experience, for many problems (unstructured FE with variable
coefficients, structured FE meshes with variable coefficients) AMG and
preonly is not a robust choice. To control the approximation (the spectral
equiv bounds), I typically run a stationary or Krylov method on split 0
(e.g. -fieldsplit_0_ksp_type xxx -fieldsplit_0_ksp_rtol yyy). Since the AMG
preconditioner generated is spectrally equivalent (usually!), these solves
will converge to a chosen rtol in a constant number of iterates under
h-refinement. In practice, if I don't enforce that I hit something like
rtol=1.0e-1 (or 1.0e-2) on the 0th split, saddle point iterates will
typically increase for "hard" problems under mesh refinement (1e4-1e7
coefficient variation), and may not even converge at all when just using
-fieldsplit_0_ksp_type preonly. Failure ultimately depends on how "strong"
the preconditioner for A_00 block is (consider re-discretized geometric
multigrid versus AMG). Running an iterative solve on the 0th split lets you
control and recover from weak/poor, but spectrally equivalent
preconditioners for A_00. Note that people hate this approach as it
invariably nests Krylov methods, and subsequently adds more global
reductions. However, it is scalable, optimal, tuneable and converges faster
than the case which didn't converge at all :D
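
To make this concrete, an illustrative option set along these lines might look
like the following (the solver and tolerance choices here are placeholders,
not a recommendation for any particular problem):

  -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type full
  -fieldsplit_0_ksp_type gmres -fieldsplit_0_ksp_rtol 1.0e-2 -fieldsplit_0_pc_type gamg
  -fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_rtol 1.0e-2 -fieldsplit_1_pc_type jacobi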

3/ I agree with Matt's comments, but I'd do a couple of other things first.

* I'd first check the discretization is implemented correctly. Your P2/P1
element is inf-sup stable - thus the condition number of S
(unpreconditioned) should be independent of the mesh resolution (h). An
easy way to verify this is to run either LDU (schur_fact_type full) or DU
(schur_fact_type upper) and monitor the iterations required for those S
solves. Use -fieldsplit_1_pc_type none -fieldsplit_1_ksp_rtol 1.0e-8
-fieldsplit_1_ksp_monitor_true_residual -fieldsplit_1_ksp_pc_right
-fieldsplit_1_ksp_type gmres -fieldsplit_0_pc_type lu

Then refine the mesh (ideally via sub-division) and repeat the experiment.
If the S iterates don't asymptote, but instead grow with each refinement -
you likely have a problem with the discretisation.

* Do the same experiment, but this time use your mass matrix as the
preconditioner for S and use -fieldsplit_1_pc_type lu. If the iterates,
compared with the previous experiments (without a Schur PC) have gone up
your mass matrix is not defined correctly. If in the previous experiment
(without a Schur PC) iterates on the S solves were bounded, but now when
preconditioned with the mass matrix the iterates go up, then your mass
matrix is definitely not correct.

4/ Lastly, to finally get to your question regarding does  +400 iterates
for the solving the Schur seem "reasonable" and what is "normal behaviour"?

It seems "high" to me. However the specifics of your discretisation, mesh
topology, element quality, boundary conditions render it almost impossible
to say what should be expected. When I use a Q2-P2* discretisation on a
structured mesh with a non-constant viscosity I'd expect something like
50-60 for 1.0e-10 with a mass matrix scaled by the inverse (local)
viscosity. For constant viscosity maybe 30 iterates. I think this kind of
statement is not particularly useful or helpful though.

Given you use an unstructured tet mesh, it is possible that some elements
have very bad quality (high aspect ratio (AR), highly skewed). I am certain
that P2/P1 has an inf-sup constant which is 

Re: [petsc-users] How to VecScatter from global to local vector, and then, VecGather back ?

2017-06-06 Thread Dave May
On 6 June 2017 at 17:45, Franck Houssen  wrote:

> How to VecScatter from global to local vector, and then, VecGather back ?
>
> This is a very simple use case: I need to split a global vector in local
> (possibly overlapping) pieces, then I need to modify each local piece (x2),
> and finally I need to assemble (+=) back local parts into a global vector.
> Read the doc and went through examples... But still can't make this work:
> can I get some help on this ?
>
>
Your usage of VecScatter in the code is fine.

The reason you don't get the expected result of (-2,-4,-2) is because your
vector (globVec) contains a bunch of -1's prior to the gather operation.

Just call

  VecZeroEntries(globVec);
before the call to

  VecScatterBegin(scatCtx, locVec, globVec, ADD_VALUES, SCATTER_REVERSE);

  VecScatterEnd  (scatCtx, locVec, globVec, ADD_VALUES, SCATTER_REVERSE);
and you'll get the correct result.

Thanks,
  Dave




> Note: running petsc-3.7.6 on debian with gcc-6.3
>
> Thanks,
>
> Franck
>
> ~> head -n 12 vecScatterGather.cpp
> // How to VecScatter from global to local vector, and then, VecGather back
> ?
> //
> //  global vector: 3x1
> //     --scatter-->     2 overlapping local vectors: 2x1
> //     --x2-->          each local piece is scaled by 2
> //     --gather(+=)-->  global vector: 3x1
> //
> // ~> g++ -o vecScatterGather.exe vecScatterGather.cpp -lpetsc -lm; mpirun
> -n 2 vecScatterGather.exe
>
>
>


Re: [petsc-users] a question about PetscSectionCreate

2017-05-29 Thread Dave May
On Mon, 29 May 2017 at 08:39, leejearl  wrote:

> Hi, all:
> I have create a IS for every cell in dmplex by the following steps:
> 1. Creating a integer array which size is matched to the number of cells.
> 2. Use the routine "ISCreateGeneral" to create a corresponding IS.
>
> Is there any routine which can create a IS for every cell in the dmplex
> directly?,
>

I don't think so as Plex would have to somehow know what geom quantity to
use to define the size of IS (e.g. vertex, cell, face, edge)

and what is the difference between ISCopy() and ISDuplicate()?
>

ISDuplicate allocates memory for a new IS with the same comm and layout as the
original IS AND copies values from the original IS into the new one. (Note
that this is slightly different from other duplicate functions like
VecDuplicate which only allocate memory and does not copy values from the
orig vec.)

ISCopy does not allocate memory for the IS (passed as the second arg), it
only performs the copy of values.
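
A tiny sketch of the difference (names are only for illustration):

  IS is,isnew;
  PetscInt idx[] = {0,2,4};
  ISCreateGeneral(PETSC_COMM_WORLD,3,idx,PETSC_COPY_VALUES,&is);
  ISDuplicate(is,&isnew);  /* allocates isnew and copies the entries of is into it */
  ISCopy(is,isnew);        /* isnew must already exist; only the values are copied */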

Thanks
  Dave


>
> Thanks,
> leejearl
>
>
> On 28 May 2017, at 19:35, Matthew Knepley wrote:
>
> On Sun, May 28, 2017 at 6:02 AM, Lawrence Mitchell <
> lawrence.mitch...@imperial.ac.uk> wrote:
>
>>
>>
>> > On 28 May 2017, at 09:16, leejearl  wrote:
>> >
>> > Hi, Dave: I want to store a PetscInt tag for every cell of the dmplex
>> with the struct. Thanks,
>>
>> You probably want to use a DMLabel to store these ids. Unless you have a
>> different id for every cell.
>
>
> Several things to think about:
>
> 1) If you want to store a tag for EVERY cell, then just use an IS. Cell
> numberings are guaranteed to be
> contiguous and start from 0.
>
> 2) If you want to tag only SOME cells, then use a DMLabel as Lawrence
> suggests. This uses hash tables
> for fast construction, and sorted lists for fast search and retrieval.
>
> 3) If you want to store a VARIABLE number of data items per cell, then use
> a Section and an array that you allocate.
>
>Matt
>
>
>>
>> Lawrence
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> http://www.caam.rice.edu/~mk51/
>
>
> --
> Li Ji
> Department of Fluid Mechanics, School of Aeronautics, Northwestern Polytechnical University
> Phone: 17792092487
> QQ: 188524324
>
>


Re: [petsc-users] a question about PetscSectionCreate

2017-05-28 Thread Dave May
On Sun, 28 May 2017 at 09:30, leejearl  wrote:

> Hi, Dave:
>  Thank you for your kind reply. If I want to store a mixture of
> PetscReal and PetscInt, how can I do it?


What operations do you need to perform with your struct?


>
>  Thanks,
> leejearl
>
>


Re: [petsc-users] a question about PetscSectionCreate

2017-05-28 Thread Dave May
On Sun, 28 May 2017 at 08:31, leejearl  wrote:

> Hi, PETSc developer:
>
>  I need to create a PetscSection with a struct. The struct is
> defined as follow,
>
> typedef struct
> {
>PetscReal x;
>PetscInt id;
> } testStruct;
>
> When I run the program, I got a wrong output as follow,
>
> Vec Object: 1 MPI processes
>type: seq
> 2.
> 4.94066e-324
> 2.
> 4.94066e-324
> 2.
> 4.94066e-324
> 2.
> 4.94066e-324
> 2.
> 4.94066e-324
> 2.
> 4.94066e-324
> 2.
> 4.94066e-324
> 2.
> 4.94066e-324
>
> But when I defined the struct as
>
> typedef struct
> {
>PetscReal x;
>PetscReal id;
> } testStruct;
>
> The output is ok. It seems that there is something wrong with the memory
> when I define the "id" as a PetscInt type.


Yep.


>
> I can not find out the reasons, and any one can help me with it?


The Vec object can only store quantities of type PetscScalar. It cannot
store PetscInt's and it definitely cannot represent a mixture of
PetscReal's and PetscInt's.
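
One possible workaround (only a sketch of one approach; the names below are
assumptions): keep the PetscScalar field in the Vec/Section and carry the
integer tags separately, e.g. in an IS:

  PetscInt *cellid;
  IS       isid;
  PetscMalloc1(ncells,&cellid);  /* ncells assumed known, e.g. from DMPlexGetHeightStratum() */
  /* ... fill cellid[c] for each cell c ... */
  ISCreateGeneral(PETSC_COMM_WORLD,ncells,cellid,PETSC_OWN_POINTER,&isid);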


Thanks,
 Dave

The
> source file "test.c" is attached.
>
>
> Thanks,
>
> leejearl
>
>


Re: [petsc-users] Some general questions

2017-05-12 Thread Dave May
On 12 May 2017 at 07:50, Matt Baker  wrote:

> Hello,
>
>
> I have a few questions on how to improve performance of my program. I'm
> solving Poisson's equation on a (large) 3D FD grid with Dirichlet boundary
> conditions and multiple right hand sides. I set up the matrix and
> everything's working fine so far, but I'm sure the solving process could go
> faster. I know multigrid is generally the best preconditioner in such a
> case and algebraic multigrid currently works best.
>

If you use a DMDA for your FD problem, consider using PCMG with Galerkin.
It will set up a geometric multigrid hierarchy. Depending on the specifics
of your Poisson problem (constant coefficient versus highly heterogeneous),
geometric MG is likely superior (faster time to solution) to AMG.
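
For example, assuming the DMDA is attached to the KSP (e.g. via KSPSetDM()),
a sketch of the options might look like this (level count and smoother
choices are purely illustrative):

  -pc_type mg -pc_mg_levels 4 -pc_mg_galerkin
  -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi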


>
> So generally speaking:
>
>
> Should I make the effort of symmetrizising the system matrix? I know how
> to do it, but it would probably take some time. CG does currently work, but
> is not competitive against other methods, so I guess the matrix might not
> be "symmetric enough"?
>

In its basic form, CG is only guaranteed to converge with an SPD
operator.
If you want to use CG, definitely do the work and make the operator
symmetric.
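
As a quick sanity check (a sketch; A is assumed to be your assembled operator):

  PetscBool symm;
  MatIsSymmetric(A,0.0,&symm);  /* explicit check on the assembled matrix */
  if (symm) MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE);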

>
> For the various multigrid preconditioners: I always read that the problem
> should be solved exactly on the coarsest grid, but wouldn't an iterative
> solver do the same job if its provided accuracy is high enough, since the
> coarse discretization and the subsequent interpolation process introduce
> errors themselves?
>

Yes, an iterative coarse solve can work well. If your Poisson problem has a constant
coefficient, rtol 1.0e-1 is likely a sufficient tolerance to use for the
coarse grid solve (e.g. overall convergence of solve won't be affected). If
the Poisson problem has a highly variable coefficient (jumps of O(1e3) or
more), or it has very large gradients say 1e3 variation over a few cells,
then you will have to perform a more accurate iterative coarse level solve
(say rtol 1e-4 to 1e-6). Note that the numbers for rtol I quote are purely
empirical.
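
For illustration, with PCMG the coarse level solve can be controlled with
options of the form (the values shown are only an example):

  -mg_coarse_ksp_type cg -mg_coarse_ksp_rtol 1.0e-1 -mg_coarse_pc_type jacobi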


>
> I submit my program to a batch system, but PETSc was compiled on the login
> node with different hardware. Is this affecting performance? What parts of
> the configuration process should I perform on a compute node then?
>
If the login and compute nodes are fundamentally different, you should
configure petsc with the option
 --with-batch
and follow the instructions.

Thanks,
  Dave

>
> Thanks.
>


Re: [petsc-users] strange convergence

2017-05-03 Thread Dave May
On Wed, 3 May 2017 at 09:29, Hoang Giang Bui  wrote:

> Dear Jed
>
> If I understood you correctly you suggest to avoid penalty by using the
> Lagrange multiplier for the mortar constraint? In this case it leads to the
> use of discrete Lagrange multiplier space. Do you or anyone already have
> experience using discrete Lagrange multiplier space with Petsc?
>

Yes - this is similar to solving incompressible Stokes in which the
pressure is a Lagrange multiplier enforcing the div(v)=0 constraint.

Robust preconditioners for this problem are constructed using PCFIELDSPLIT.
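
As an illustrative starting point (the splits can also be defined explicitly
with PCFieldSplitSetIS(); these options are an example, not a prescription):

  -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type full
  -pc_fieldsplit_detect_saddle_point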

Thanks,
  Dave



> There is also similar question on stackexchange
>
> https://scicomp.stackexchange.com/questions/25113/preconditioners-and-discrete-lagrange-multipliers
>
> Giang
>
> On Sat, Apr 29, 2017 at 3:34 PM, Jed Brown  wrote:
>
>> Hoang Giang Bui  writes:
>>
>> > Hi Barry
>> >
>> > The first block is from a standard solid mechanics discretization based
>> on
>> > balance of momentum equation. There is some material involved but in
>> > principal it's well-posed elasticity equation with positive definite
>> > tangent operator. The "gluing business" uses the mortar method to keep
>> the
>> > continuity of displacement. Instead of using Lagrange multiplier to
>> treat
>> > the constraint I used penalty method to penalize the energy. The
>> > discretization form of mortar is quite simple
>> >
>> > \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA }
>> >
>> > rho is penalty parameter. In the simulation I initially set it low (~E)
>> to
>> > preserve the conditioning of the system.
>>
>> There are two things that can go wrong here with AMG:
>>
>> * The penalty term can mess up the strength of connection heuristics
>>   such that you get poor choice of C-points (classical AMG like
>>   BoomerAMG) or poor choice of aggregates (smoothed aggregation).
>>
>> * The penalty term can prevent Jacobi smoothing from being effective; in
>>   this case, it can lead to poor coarse basis functions (higher energy
>>   than they should be) and poor smoothing in an MG cycle.  You can fix
>>   the poor smoothing in the MG cycle by using a stronger smoother, like
>>   ASM with some overlap.
>>
>> I'm generally not a fan of penalty methods due to the irritating
>> tradeoffs and often poor solver performance.
>>
>> > In the figure below, the colorful blocks are u_1 and the base is u_2.
>> Both
>> > u_1 and u_2 use isoparametric quadratic approximation.
>> >
>> > ​
>> >  Snapshot.png
>> > <
>> https://drive.google.com/file/d/0Bw8Hmu0-YGQXc2hKQ1BhQ1I4OEU/view?usp=drive_web
>> >
>> > ​​​
>> >
>> > Giang
>> >
>> > On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith 
>> wrote:
>> >
>> >>
>> >>   Ok, so boomerAMG algebraic multigrid is not good for the first block.
>> >> You mentioned the first block has two things glued together? AMG is
>> >> fantastic for certain problems but doesn't work for everything.
>> >>
>> >>Tell us more about the first block, what PDE it comes from, what
>> >> discretization, and what the "gluing business" is and maybe we'll have
>> >> suggestions for how to precondition it.
>> >>
>> >>Barry
>> >>
>> >> > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui 
>> wrote:
>> >> >
>> >> > It's in fact quite good
>> >> >
>> >> > Residual norms for fieldsplit_u_ solve.
>> >> > 0 KSP Residual norm 4.014715925568e+00
>> >> > 1 KSP Residual norm 2.160497019264e-10
>> >> > Residual norms for fieldsplit_wp_ solve.
>> >> > 0 KSP Residual norm 0.e+00
>> >> >   0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm
>> >> 9.006493082896e+06 ||r(i)||/||b|| 1.e+00
>> >> > Residual norms for fieldsplit_u_ solve.
>> >> > 0 KSP Residual norm 9.9416e-01
>> >> > 1 KSP Residual norm 7.118380416383e-11
>> >> > Residual norms for fieldsplit_wp_ solve.
>> >> > 0 KSP Residual norm 0.e+00
>> >> >   1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm
>> >> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11
>> >> > Linear solve converged due to CONVERGED_ATOL iterations 1
>> >> >
>> >> > Giang
>> >> >
>> >> > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith 
>> wrote:
>> >> >
>> >> >   Run again using LU on both blocks to see what happens.
>> >> >
>> >> >
>> >> > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui 
>> >> wrote:
>> >> > >
>> >> > > I have changed the way to tie the nonconforming mesh. It seems the
>> >> matrix now is better
>> >> > >
>> >> > > with -pc_type lu  the output is
>> >> > >   0 KSP preconditioned resid norm 3.308678584240e-01 true resid
>> norm
>> >> 9.006493082896e+06 ||r(i)||/||b|| 1.e+00
>> >> > >   1 KSP preconditioned resid norm 2.004313395301e-12 true resid
>> norm
>> >> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12
>> >> > > Linear solve converged due to 

Re: [petsc-users] Solving NON-Diagonally dominant sparse system

2017-04-11 Thread Dave May
Nope - welcome to finite precision arithmetic. What's the condition number?

On Tue, 11 Apr 2017 at 14:07, Kaushik Kulkarni <kaushik...@gmail.com> wrote:

> But anyway since I am starting off with the exact solution itself,
> shouldn't the norm be zero independent of the conditioning?
>
> On Tue, Apr 11, 2017 at 11:57 AM, Dave May <dave.mayhe...@gmail.com>
> wrote:
>
>
>
> On Tue, 11 Apr 2017 at 07:28, Kaushik Kulkarni <kaushik...@gmail.com>
> wrote:
>
> A strange behavior I am observing is:
> Problem: I have to solve A*x=rhs, and I am currently trying to
> solve for a system where I know the exact solution. I have initialized the
> exact solution in the Vec x_exact.
>
> MatMult(A, x_exact, dummy);// Storing the value of A*x_exact in dummy
> VecAXPY(dummy, -1.0, rhs); // dummy = dummy -rhs
> VecNorm(dummy, NORM_INFINITY, &norm_val); // norm_val = ||dummy||, which
> gives us the residual norm
> PetscPrintf(PETSC_COMM_SELF, "Norm = %f\n", norm_val); // Printing the
> norm.
>
> // Starting with the linear solver
> KSPCreate(PETSC_COMM_SELF, &ksp);
> KSPSetOperators(ksp, A, A);
> KSPSetFromOptions(ksp);
> KSPSolve(ksp,rhs,x_exact); // Solving the system A*x= rhs, with the given
> initial input x_exact. So the result will also be stored in x_exact
>
> On running with -pc_type lu -pc_factor_mat_solver_package superlu
> -ksp_monitor I get the following output:
> Norm = 0.00
>   0 KSP Residual norm 4.371606462669e+04
>   1 KSP Residual norm 5.850058113796e+02
>   2 KSP Residual norm 5.832677911508e+02
>   3 KSP Residual norm 1.987386549571e+02
>   4 KSP Residual norm 1.220006530614e+02
>   .
>   .
>   .
>
>
> The default KSP is left preconditioned GMRES. Hence the above iterates
> report the preconditioned residual. If your operator is singular, and LU
> generated garbage, the preconditioned residual can be very different to the
> true residual.
>
> To see the true residual, use
> -ksp_monitor_true_residual
>
> Alternatively, use a right preconditioned KSP method, e.g.
> -ksp_type fgmres
> (or -ksp_type gcr)
> With these methods, you will see the true residual with just -ksp_monitor
>
>
> Thanks
>   Dave
>
>
>
>
>
> Since the initial guess is the exact solution, shouldn't the first residual
> itself be zero and converge in one iteration.
>
> Thanks,
> Kaushik
>
>
> On Tue, Apr 11, 2017 at 10:08 AM, Kaushik Kulkarni <kaushik...@gmail.com>
> wrote:
>
> Thank you for the inputs.
> I tried Barry' s suggestion to use SuperLU, but the solution does not
> converge and on doing -ksp_monitor -ksp_converged_reason. I get the
> following error:-
> 240 KSP Residual norm 1.722571678777e+07
> Linear solve did not converge due to DIVERGED_DTOL iterations 240
> For some reason it is diverging, although I am sure that for the given
> system a unique solution exists.
>
> Thanks,
> Kaushik
>
> On Tue, Apr 11, 2017 at 1:04 AM, Xiaoye S. Li <x...@lbl.gov> wrote:
>
> If you need to use SuperLU_DIST, the pivoting is done statically, using
> maximum weighted matching, so the small diagonals are usually taken care of as
> well. It is not as good as partial pivoting, but works most of the time.
>
> Sherry
>
> On Mon, Apr 10, 2017 at 12:07 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>
>I would suggest using ./configure --download-superlu and then when
> running the program -pc_type lu -pc_factor_mat_solver_package superlu
>
>Note that this is SuperLU, it is not SuperLU_DIST.  Superlu uses
> partial pivoting for numerical stability so should be able to handle the
> small or zero diagonal entries.
>
> Barry
>
> > On Apr 10, 2017, at 1:17 PM, Kaushik Kulkarni <kaushik...@gmail.com>
> wrote:
> >
> > Hello,
> > I am trying to solve a 2500x2500 sparse matrix. To get an idea about the
> matrix structure I have added a file matrix.log which contains the output
> of MatView() and also the output of Matview_draw in the image file.
> >
> > From the matrix structure it can be seen that Jacobi iteration won't
> work and some of the diagonal entries being very low(of the order of 1E-16)
> LU factorization would also fail.
> >
> > C​an someone please suggest what all could I try next, in order to make
> the solution converge?
> >
> > Thanks,
> > Kaushik
> > ​
> > --
> > Kaushik Kulkarni
> > Fourth Year Undergraduate
> > Department of Mechanical Engineering
> > Indian Institute of Technology Bombay
> > Mumbai, India
> > https://kaushikcfd.github.io/About/
> > +91-9967687150
> > 
>
>
>
>
>
> --
> Kaushik 

Re: [petsc-users] Solving NON-Diagonally dominant sparse system

2017-04-11 Thread Dave May
On Tue, 11 Apr 2017 at 07:28, Kaushik Kulkarni  wrote:

> A strange behavior I am observing is:
> Problem: I have to solve A*x=rhs, and I am currently trying to
> solve for a system where I know the exact solution. I have initialized the
> exact solution in the Vec x_exact.
>
> MatMult(A, x_exact, dummy);// Storing the value of A*x_exact in dummy
> VecAXPY(dummy, -1.0, rhs); // dummy = dummy -rhs
> VecNorm(dummy, NORM_INFINITY, &norm_val); // norm_val = ||dummy||, which
> gives us the residual norm
> PetscPrintf(PETSC_COMM_SELF, "Norm = %f\n", norm_val); // Printing the
> norm.
>
> // Starting with the linear solver
> KSPCreate(PETSC_COMM_SELF, &ksp);
> KSPSetOperators(ksp, A, A);
> KSPSetFromOptions(ksp);
> KSPSolve(ksp,rhs,x_exact); // Solving the system A*x= rhs, with the given
> initial input x_exact. So the result will also be stored in x_exact
>
> On running with -pc_type lu -pc_factor_mat_solver_package superlu
> -ksp_monitor I get the following output:
> Norm = 0.00
>   0 KSP Residual norm 4.371606462669e+04
>   1 KSP Residual norm 5.850058113796e+02
>   2 KSP Residual norm 5.832677911508e+02
>   3 KSP Residual norm 1.987386549571e+02
>   4 KSP Residual norm 1.220006530614e+02
>   .
>   .
>   .
>

The default KSP is left preconditioned GMRES. Hence the above iterates
report the preconditioned residual. If your operator is singular, and LU
generated garbage, the preconditioned residual can be very different to the
true residual.

To see the true residual, use
-ksp_monitor_true_residual

Alternatively, use a right preconditioned KSP method, e.g.
-ksp_type fgmres
(or -ksp_type gcr)
With these methods, you will see the true residual with just -ksp_monitor


Thanks
  Dave




>
> Since the initial guess is the exact solution, shouldn't the first residual
> itself be zero and converge in one iteration.
>
> Thanks,
> Kaushik
>
>
> On Tue, Apr 11, 2017 at 10:08 AM, Kaushik Kulkarni 
> wrote:
>
> Thank you for the inputs.
> I tried Barry' s suggestion to use SuperLU, but the solution does not
> converge and on doing -ksp_monitor -ksp_converged_reason. I get the
> following error:-
> 240 KSP Residual norm 1.722571678777e+07
> Linear solve did not converge due to DIVERGED_DTOL iterations 240
> For some reason it is diverging, although I am sure that for the given
> system a unique solution exists.
>
> Thanks,
> Kaushik
>
> On Tue, Apr 11, 2017 at 1:04 AM, Xiaoye S. Li  wrote:
>
> If you need to use SuperLU_DIST, the pivoting is done statically, using
> maximum weighted matching, so the small diagonals are usually taken care of as
> well. It is not as good as partial pivoting, but works most of the time.
>
> Sherry
>
> On Mon, Apr 10, 2017 at 12:07 PM, Barry Smith  wrote:
>
>
>I would suggest using ./configure --download-superlu and then when
> running the program -pc_type lu -pc_factor_mat_solver_package superlu
>
>Note that this is SuperLU, it is not SuperLU_DIST.  Superlu uses
> partial pivoting for numerical stability so should be able to handle the
> small or zero diagonal entries.
>
> Barry
>
> > On Apr 10, 2017, at 1:17 PM, Kaushik Kulkarni 
> wrote:
> >
> > Hello,
> > I am trying to solve a 2500x2500 sparse matrix. To get an idea about the
> matrix structure I have added a file matrix.log which contains the output
> of MatView() and also the output of Matview_draw in the image file.
> >
> > From the matrix structure it can be seen that Jacobi iteration won't
> work and some of the diagonal entries being very low(of the order of 1E-16)
> LU factorization would also fail.
> >
> > C​an someone please suggest what all could I try next, in order to make
> the solution converge?
> >
> > Thanks,
> > Kaushik
> > ​
> > --
> > Kaushik Kulkarni
> > Fourth Year Undergraduate
> > Department of Mechanical Engineering
> > Indian Institute of Technology Bombay
> > Mumbai, India
> > https://kaushikcfd.github.io/About/
> > +91-9967687150
> > 
>
>
>
>
>
> --
> Kaushik Kulkarni
> Fourth Year Undergraduate
> Department of Mechanical Engineering
> Indian Institute of Technology Bombay
> Mumbai, India
> https://kaushikcfd.github.io/About/
> +91-9967687150
>
>
>
>
> --
> Kaushik Kulkarni
> Fourth Year Undergraduate
> Department of Mechanical Engineering
> Indian Institute of Technology Bombay
> Mumbai, India
> https://kaushikcfd.github.io/About/
> +91-9967687150
>


Re: [petsc-users] Using MatShell without MatMult

2017-04-07 Thread Dave May
You should also not call PetscInitialize() from within your user MatMult
function.




On Fri, 7 Apr 2017 at 13:24, Matthew Knepley  wrote:

> On Fri, Apr 7, 2017 at 5:11 AM, Francesco Migliorini <
> francescomigliorin...@gmail.com> wrote:
>
> Hello,
>
> I need to solve a linear system using GMRES without creating explicitly
> the matrix because very large. So, I am trying to use the MatShell strategy
> but I am stucked. The problem is that it seems to me that inside the
> user-defined MyMatMult it is required to use MatMult and this would
> honestly vanish all the gain from using this strategy. Indeed, I would need
> to access directly to the entries of the input vector, multiply them by
> some parameters imported in MyMatMult with *common* and finally compose
> the output vector without creating any matrix. First of all, is it
> possible?
>
>
> Yes.
>
>
> Secondly, if so, where is my mistake? Here's an example of my code with a
> very simple 10x10 system with the identity matrix:
>
> [...]
> call PetscInitialize(PETSC_NULL_CHARACTER,perr)
> ind(1) = 10
> call VecCreate(PETSC_COMM_WORLD,feP,perr)
> call VecSetSizes(feP,PETSC_DECIDE,ind,perr)
> call VecSetFromOptions(feP,perr)
> call VecDuplicate(feP,u1P,perr)
> do jt = 1,10
>  ind(1) = jt-1
>  fval(1) = jt
>   call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr)
> enddo
> call VecAssemblyBegin(feP,perr)
> call VecAssemblyEnd(feP,perr)
> ind(1) = 10
> call MatCreateShell(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, ind,
> ind, PETSC_NULL_INTEGER, TheShellMatrix, perr)
> call MatShellSetOperation(TheShellMatrix, MATOP_MULT, MyMatMult, perr)
>
>
> Here I would probably use
>
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatShellSetContext.html
>
> instead of a common block, but that works too.
>
>
> call KSPCreate(PETSC_COMM_WORLD, ksp, perr)
> call KSPSetType(ksp,KSPGMRES,perr)
> call KSPSetOperators(ksp,TheShellMatrix,TheShellMatrix,perr)
> call KSPSolve(ksp,feP,u1P,perr)
> call PetscFinalize(PETSC_NULL_CHARACTER,perr)
> [...]
>
> subroutine MyMatMult(TheShellMatrix,T,AT,ierr)
> [...]
> Vec T, AT
> Mat TheShellMatrix
> PetscReal   fval(1), u0(1)
> [...]
> call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
> ind(1) = 10
> call VecCreate(PETSC_COMM_WORLD,AT,ierr)
> call VecSetSizes(AT,PETSC_DECIDE,ind,ierr)
> call VecSetFromOptions(AT,ierr)
>
>
> It's not your job to create AT. We are passing it in, so just use it.
>
>
> do i =0,9
> ind(1) = i
> call VecGetValues(T,1,ind,u0(1),ierr)
> fval(1) = u0(1)
> call VecSetValues(AT,1,ind,fval(1),INSERT_VALUES,ierr)
>
>
> You can do it this way, but its easier to use
>
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetArray.html
>
> outside the loop for both vectors.
>
>Matt
>
>
> enddo
> call VecAssemblyBegin(AT,ierr)
> call VecAssemblyEnd(AT,ierr)
> return
> end subroutine MyMatMult
>
> The output of this code is something completely invented but in some way
> related to the actual solution:
> 5.0964719143762542E-002
> 0.10192943828752508
> 0.15289415743128765
> 0.20385887657505017
> 0.25482359571881275
> 0.30578831486257529
> 0.35675303400633784
> 0.40771775315010034
> 0.45868247229386289
> 0.50964719143762549
>
> Instead, if I use MatMult in MyMatMult I get the right solution. Here's
> the code
>
> subroutine MyMatMult(TheShellMatrix,T,AT,ierr)
> [...]
> Vec T, AT
> Mat TheShellMatrix, IDEN
> PetscReal   fval(1)
> [...]
> call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
> ind(1) = 10
> call MatCreate(PETSC_COMM_WORLD,IDEN,ierr)
> call MatSetSizes(IDEN,PETSC_DECIDE,PETSC_DECIDE,ind,ind,ierr)
> call MatSetUp(IDEN,ierr)
> do i =0,9
> ind(1) = i
> fval(1) = 1
> call MatSetValues(IDEN,1,ind,1,ind,fval(1),INSERT_VALUES,ierr)
> enddo
> call MatAssemblyBegin(IDEN,MAT_FINAL_ASSEMBLY,ierr)
> call MatAssemblyEnd(IDEN,MAT_FINAL_ASSEMBLY,ierr)
> call MatMult(IDEN,T,AT,ierr)
> return
> end subroutine MyMatMult
>
> Thanks in advance for any answer!
> Francesco Migliorini
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] Question about PETSC

2017-03-16 Thread Dave May
On Thu, 16 Mar 2017 at 07:16, Matthew Knepley  wrote:

> Hi Valentin,
>
> Have you seen this example:
> https://bitbucket.org/petsc/petsc/src/1830d94e4628b31f970259df1d58bc250c9af32a/src/ksp/ksp/examples/tutorials/ex2f.F?at=master=file-view-default
>
> Would that be enough to get started?
>

Matt is correct. The best way to get into PETSc is by studying the example
codes provided in the source tree.

As a precursor to studying fortran examples, you should take a look at this
page and decide which PETSc-Fortran approach you wish to use:

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html

Thanks,
  Dave


>   Thanks,
>
>  Matt
>
> On Thu, Mar 16, 2017 at 12:37 AM, Валентин Егоров <
> egorow.walen...@gmail.com> wrote:
>
> Hello!
> My name is Valentin Egorov. I am from Russia. And I have a question for
> you about PETSC. I would like to make a programm on Fortran with PETSC, but
> I can't. I have a matrix 400*400. I have also vector B with 400 elements. I
> need to solve linear equations. Could you help me to do it. I can't
> understand how use PETSC in Fortran? In fortran programm? And where to put
> matrix elements?
>
> Sincerely, Valentin Egorov!
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] ksp solve error with nested matrix

2017-02-08 Thread Dave May
Any time you modify one of the submats, you need to call assembly begin/end
on that sub matrix AND on the outer matnest object.
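
A sketch of the required sequence (matrix names are placeholders):

  /* ... MatSetValues() on the sub matrix Asub ... */
  MatAssemblyBegin(Asub,MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(Asub,MAT_FINAL_ASSEMBLY);
  MatAssemblyBegin(Anest,MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(Anest,MAT_FINAL_ASSEMBLY);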

Thanks,
  Dave


On Wed, 8 Feb 2017 at 22:51, Manav Bhatia  wrote:

> aha.. that might be it.
>
> Does that need to be called for the global matrix after each assembly of
> the Jacobian blocks, or just once for the whole matrix?
>
> -Manav
>
> > On Feb 8, 2017, at 3:47 PM, Barry Smith  wrote:
> >
> >
> >> On Feb 8, 2017, at 3:40 PM, Manav Bhatia  wrote:
> >>
> >> Hi,
> >>
> >>   I have a nested matrix with 2x2 blocks. The blocks (1,1) and (2,2)
> are AIJ matrices and blocks (1,2) and (2,1) are shell matrices. I am
> calling the code with the following arguments: -pc_type fieldsplit, and
> get the error shown below.
> >>
> >>   I see that the error is complaining about the matrix being in
> unassembled state, but I think I am initializing and calling assembly end
> routines on both the diagonal blocks. Still, there is something obvious I
> am missing.
> >
> >  It is complaining about the outer most matrix, not the blocks. Perhaps
> you haven't finished with setting up your nest matrix?
> >
> >>
> >>   I would appreciate any guidance on this.
> >>
> >> Regards,
> >> Manav
> >>
> >>
> >>
> >> [0]PETSC ERROR: - Error Message
> --
> >> [0]PETSC ERROR: Object is in wrong state
> >> [0]PETSC ERROR: Not for unassembled matrix
> >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> >> [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
> >> [0]PETSC ERROR:
> /Users/manav/Library/Developer/Xcode/DerivedData/MAST-crggwcqrouiyeucduvscdahjauvx/Build/Products/Debug/examples
> on a arch-darwin-cxx-opt named Dhcp-90-250.HPC.MsState.Edu by manav Wed
> Feb  8 15:28:04 2017
> >> [0]PETSC ERROR: Configure options
> --prefix=/Users/manav/Documents/codes/numerical_lib/petsc/petsc-3.7.4/../
> --CC=mpicc-openmpi-clang38 --CXX=mpicxx-openmpi-clang38
> --FC=mpif90-openmpi-clang38 --with-clanguage=c++ --with-fortran=0
> --with-mpiexec=/opt/local/bin/mpiexec-openmpi-clang38
> --with-shared-libraries=1 --with-x=1 --with-x-dir=/opt/X11
> --with-debugging=0 --with-lapack-lib=/usr/lib/liblapack.dylib
> --with-blas-lib=/usr/lib/libblas.dylib --download-superlu=yes
> --download-superlu_dist=yes --download-suitesparse=yes --download-mumps=yes
> --download-scalapack=yes --download-parmetis=yes --download-metis=yes
> --download-hypre=yes --download-ml=yes
> >> [0]PETSC ERROR: #1 MatMult() line 2248 in
> /Users/manav/Documents/codes/numerical_lib/petsc/petsc-3.7.4/src/mat/interface/matrix.c
> >> [0]PETSC ERROR: #2 PCApplyBAorAB() line 717 in
> /Users/manav/Documents/codes/numerical_lib/petsc/petsc-3.7.4/src/ksp/pc/interface/precon.c
> >> [0]PETSC ERROR: #3 KSP_PCApplyBAorAB() line 274 in
> /Users/manav/Documents/codes/numerical_lib/petsc/petsc-3.7.4/include/petsc/private/kspimpl.h
> >> [0]PETSC ERROR: #4 KSPGMRESCycle() line 156 in
> /Users/manav/Documents/codes/numerical_lib/petsc/petsc-3.7.4/src/ksp/ksp/impls/gmres/gmres.c
> >> [0]PETSC ERROR: #5 KSPSolve_GMRES() line 240 in
> /Users/manav/Documents/codes/numerical_lib/petsc/petsc-3.7.4/src/ksp/ksp/impls/gmres/gmres.c
> >> [0]PETSC ERROR: #6 KSPSolve() line 656 in
> /Users/manav/Documents/codes/numerical_lib/petsc/petsc-3.7.4/src/ksp/ksp/interface/itfunc.c
> >> [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 230 in
> /Users/manav/Documents/codes/numerical_lib/petsc/petsc-3.7.4/src/snes/impls/ls/ls.c
> >> [0]PETSC ERROR: #8 SNESSolve() line 4005 in
> /Users/manav/Documents/codes/numerical_lib/petsc/petsc-3.7.4/src/snes/interface/snes.c
> >>
> >
>
>


Re: [petsc-users] Using PCFIELDSPLIT with -pc_fieldsplit_type schur

2017-01-11 Thread Dave May
It looks like the Schur solve is requiring a huge number of iterates to
converge (based on the instances of MatMult).
This is killing the performance.

Are you sure that A11 is a good approximation to S? You might consider
trying the selfp option

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurPre.html#PCFieldSplitSetSchurPre
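
From the command line that corresponds to (as a first experiment):

  -pc_fieldsplit_schur_precondition selfp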

Note that the best approx to S is likely both problem and discretisation
dependent so if selfp is also terrible, you might want to consider coding
up your own approx to S for your specific system.


Thanks,
  Dave


On Wed, 11 Jan 2017 at 22:34, David Knezevic 
wrote:

I have a definite block 2x2 system and I figured it'd be good to apply the
PCFIELDSPLIT functionality with Schur complement, as described in Section
4.5 of the manual.

The A00 block of my matrix is very small so I figured I'd specify a direct
solver (i.e. MUMPS) for that block.

So I did the following:
- PCFieldSplitSetIS to specify the indices of the two splits
- PCFieldSplitGetSubKSP to get the two KSP objects, and to set the solver
and PC types for each (MUMPS for A00, ILU+CG for A11)
- I set -pc_fieldsplit_schur_fact_type full

Below I have pasted the output of "-ksp_view -ksp_monitor -log_view" for a
test case. It seems to converge well, but I'm concerned about the speed
(about 90 seconds, vs. about 1 second if I use a direct solver for the
entire system). I just wanted to check if I'm setting this up in a good way?

Many thanks,
David

---

  0 KSP Residual norm 5.405774214400e+04
  1 KSP Residual norm 1.849649014371e+02
  2 KSP Residual norm 7.462775074989e-02
  3 KSP Residual norm 2.680497175260e-04
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=1000
  tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: fieldsplit
FieldSplit with Schur preconditioner, factorization FULL
Preconditioner for the Schur complement formed from A11
Split info:
Split number 0 Defined by IS
Split number 1 Defined by IS
KSP solver for A00 block
  KSP Object:  (fieldsplit_RB_split_)   1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
left preconditioning
using NONE norm type for convergence test
  PC Object:  (fieldsplit_RB_split_)   1 MPI processes
type: cholesky
  Cholesky: out-of-place factorization
  tolerance for zero pivot 2.22045e-14
  matrix ordering: natural
  factor fill ratio given 0., needed 0.
Factored matrix follows:
  Mat Object:   1 MPI processes
type: seqaij
rows=324, cols=324
package used to perform factorization: mumps
total: nonzeros=3042, allocated nonzeros=3042
total number of mallocs used during MatSetValues calls =0
  MUMPS run parameters:
SYM (matrix type):   2
PAR (host participation):1
ICNTL(1) (output for error): 6
ICNTL(2) (output of diagnostic msg): 0
ICNTL(3) (output for global info):   0
ICNTL(4) (level of printing):0
ICNTL(5) (input mat struct): 0
ICNTL(6) (matrix prescaling):7
ICNTL(7) (sequentia matrix ordering):7
ICNTL(8) (scalling strategy):77
ICNTL(10) (max num of refinements):  0
ICNTL(11) (error analysis):  0
ICNTL(12) (efficiency control):
0
ICNTL(13) (efficiency control):
0
ICNTL(14) (percentage of estimated workspace increase):
20
ICNTL(18) (input mat struct):
0
ICNTL(19) (Shur complement info):
0
ICNTL(20) (rhs sparse pattern):
0
ICNTL(21) (solution struct):
 0
ICNTL(22) (in-core/out-of-core facility):
0
ICNTL(23) (max size of memory can be allocated
locally):0
ICNTL(24) (detection of null pivot rows):
0
ICNTL(25) (computation of a null space basis):
 0
ICNTL(26) (Schur options for rhs or solution):
 0
ICNTL(27) (experimental parameter):
-24
ICNTL(28) (use parallel or sequential ordering):
 1
ICNTL(29) (parallel ordering):
 0
ICNTL(30) (user-specified set of entries in inv(A)):
 0
ICNTL(31) (factors is 

Re: [petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve

2017-01-08 Thread Dave May
I suggest you check the code is valgrind clean.

See the petsc FAQ page for details of how to do this.
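
As a rough sketch of the kind of command described there (the executable name
and options are placeholders; check the FAQ for the exact recommended flags):

  mpiexec -n 2 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./your_app -malloc off [your options]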

Thanks,
  Dave

On Sun, 8 Jan 2017 at 04:57, Mark Adams  wrote:

> This error seems to be coming from the computation of the extreme
> eigenvalues of the matrix for smoothing in smoothed aggregation.
>
> Are you getting good solutions with hypre? This error looks like it might
> just be the first place where a messed up matrix fails in GAMG.
>
> On Sat, Jan 7, 2017 at 10:23 PM, Matthew Knepley 
> wrote:
>
> On Sat, Jan 7, 2017 at 7:38 PM, Manuel Valera 
> wrote:
>
> Ok great, i tried those command line args and this is the result:
>
> when i use -pc_type gamg:
>
> [1]PETSC ERROR: - Error Message
> --
>
>
> [1]PETSC ERROR: Petsc has generated inconsistent data
>
>
> [1]PETSC ERROR: Have un-symmetric graph (apparently). Use
> '-pc_gamg_sym_graph true' to symetrize the graph or '-pc_gamg_threshold
> -1.0' if the matrix is structurally symmetric.
>
>
> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
>
>
> [1]PETSC ERROR: Petsc Release Version 3.7.4, unknown
>
>
> [1]PETSC ERROR: ./ucmsMR on a arch-linux2-c-debug named ocean by valera
> Sat Jan  7 17:35:05 2017
>
>
> [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++
> --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5
> --download-netcdf --download-hypre --download-metis --download-parmetis
> --download-trillinos
>
>
> [1]PETSC ERROR: #1 smoothAggs() line 462 in
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
>
>
> [1]PETSC ERROR: #2 PCGAMGCoarsen_AGG() line 998 in
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
>
>
> [1]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
>
>
> [1]PETSC ERROR: #4 PCSetUp() line 968 in
> /usr/dataC/home/valera/petsc/src/ksp/pc/interface/precon.c
>
>
> [1]PETSC ERROR: #5 KSPSetUp() line 390 in
> /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
>
>
> application called MPI_Abort(comm=0x8402, 77) - process 1
>
>
> when i use -pc_type gamg and -pc_gamg_sym_graph true:
>
>
> Do everything Barry said.
>
> However, I would like to track down this error. It could be a bug in our
> code. However, it appears to happen in the call
> to LAPACK, so it could also be a problem with that library on your
> machine. Could you run this case in the debugger
> and give the stack trace?
>
>   Thanks,
>
>  Matt
>
>
>  
>
>
> [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point
> Exception,probably divide by zero
>
>
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>
>
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>
>
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
>
>
> [1]PETSC ERROR:
> 
>
> [1]PETSC ERROR: -  Stack Frames
> 
>
>
> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
>
>
> [1]PETSC ERROR:   INSTEAD the line number of the start of the function
>
>
> [1]PETSC ERROR:   is given.
>
>
> [1]PETSC ERROR: [1] LAPACKgesvd line 42
> /usr/dataC/home/valera/petsc/src/ksp/ksp/impls/gmres/gmreig.c
>
>
> [1]PETSC ERROR: [1] KSPComputeExtremeSingularValues_GMRES line 24
> /usr/dataC/home/valera/petsc/src/ksp/ksp/impls/gmres/gmreig.c
>
>
> [1]PETSC ERROR: [1] KSPComputeExtremeSingularValues line 51
> /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
>
>
> [1]PETSC ERROR: [1] PCGAMGOptProlongator_AGG line 1187
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
>
>
> [1]PETSC ERROR: [1] PCSetUp_GAMG line 472
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
>
>
> [1]PETSC ERROR: [1] PCSetUp line 930
> /usr/dataC/home/valera/petsc/src/ksp/pc/interface/precon.c
>
>
> [1]PETSC ERROR: [1] KSPSetUp line 305
> /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
>
>
> [0] PCGAMGOptProlongator_AGG line 1187
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
>
>
> [0]PETSC ERROR: [0] PCSetUp_GAMG line 472
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
>
>
> [0]PETSC ERROR: [0] PCSetUp line 930
> /usr/dataC/home/valera/petsc/src/ksp/pc/interface/precon.c
>
>
> [0]PETSC ERROR: [0] KSPSetUp line 305
> /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
>
>
> [0]PETSC ERROR: - Error Message
> --
>
>
> when i use -pc_type hypre it actually shows something different on
> -ksp_view :
>
>
> KSP Object: 2 MPI processes
>
>   type: gcr
>
> GCR: 

Re: [petsc-users] Suspicious long call to VecAXPY

2017-01-06 Thread Dave May
On 6 January 2017 at 22:31, Łukasz Kasza  wrote:

>
>
> Dear PETSc Users,
>
> Please consider the following 2 snippets which do exactly the same
> (calculate a sum of two vectors):
> 1.
> VecAXPY(amg_level_x[level],1.0,amg_level_residuals[level]);
>
> 2.
> VecGetArray(amg_level_residuals[level], &values);
> VecSetValues(amg_level_x[level],size,indices,values,ADD_VALUES);
> VecRestoreArray(amg_level_residuals[level], &values);
> VecAssemblyBegin(amg_level_x[level]);
> VecAssemblyEnd(amg_level_x[level]);
>
> In my program I have both of the snippets executed in a loop. The problem
> with the first one is that the longer the program goes the longer it takes
> to execute it. At the same time the execution time of the second snippet is
> more or less constant. I don't know why but after a few hundreds of
> iterations VecAXPY takes more than MatMult on the matrix and vector of the
> same size and after that it still grows!


How did you profile this?


> Always returning a correct value though. I am using 4.5.3 version,


Which version of PETSc are you using?? Current release is 3.7.5


> the vectors are
> sequential. VecAXPY in such case is just a wrapper for blas, do you have
> any idea why the execution time of this function constantly grows?
>

Maybe your code is leaking memory and ultimately your OS starts swapping?

Please send the code.
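
In the meantime, one way to time just those two snippets is PETSc's own event logging (a minimal sketch; "MyAXPY" is a made-up event name):

  PetscLogEvent MY_AXPY;
  PetscLogEventRegister("MyAXPY", VEC_CLASSID, &MY_AXPY);
  PetscLogEventBegin(MY_AXPY, 0, 0, 0, 0);
  VecAXPY(amg_level_x[level], 1.0, amg_level_residuals[level]);
  PetscLogEventEnd(MY_AXPY, 0, 0, 0, 0);

Run with -log_view and compare the time per call of this event against MatMult over many iterations.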

Thanks,
  Dave



>
> Best regards.
>
>
>


Re: [petsc-users] Best way to scatter a Seq vector ?

2017-01-06 Thread Dave May
an one
>> processor, what would be a better approach ?
>> >
>> >
>> > Thanks once again,
>> >
>> > Manuel
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <mval...@mail.sdsu.edu>
>> wrote:
>> > Thanks i had no idea how to debug and read those logs, that solved this
>> issue at least (i was sending a message from root to everyone else, but
>> trying to catch from everyone else including root)
>> >
>> > Until next time, many thanks,
>> >
>> > Manuel
>> >
>> > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <knep...@gmail.com>
>> wrote:
>> > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <mval...@mail.sdsu.edu>
>> wrote:
>> > I did a PetscBarrier just before calling the vicariate routine and im
>> pretty sure im calling it from every processor, code looks like this:
>> >
>> > From the gdb trace.
>> >
>> >   Proc 0: Is in some MPI routine you call yourself, line 113
>> >
>> >   Proc 1: Is in VecCreate(), line 130
>> >
>> > You need to fix your communication code.
>> >
>> >Matt
>> >
>> > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>> >
>> > print*,'entering POInit from',rank
>> > !call exit()
>> >
>> > call PetscObjsInit()
>> >
>> >
>> > And output gives:
>> >
>> >  entering POInit from   0
>> >  entering POInit from   1
>> >  entering POInit from   2
>> >  entering POInit from   3
>> >
>> >
>> > Still hangs in the same way,
>> >
>> > Thanks,
>> >
>> > Manuel
>> >
>> >
>> >
>> > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <mval...@mail.sdsu.edu>
>> wrote:
>> > Thanks for the answers !
>> >
>> > heres the screenshot of what i got from bt in gdb (great hint in how to
>> debug in petsc, didn't know that)
>> >
>> > I don't really know what to look at here,
>> >
>> > Thanks,
>> >
>> > Manuel
>> >
>> > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <dave.mayhe...@gmail.com>
>> wrote:
>> > Are you certain ALL ranks in PETSC_COMM_WORLD call these function(s).
>> These functions cannot be inside if statements like
>> > if (rank == 0){
>> >   VecCreateMPI(...)
>> > }
>> >
>> >
>> > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <mval...@mail.sdsu.edu>
>> wrote:
>> > Thanks Dave for the quick answer, appreciate it,
>> >
>> > I just tried that and it didn't make a difference, any other
>> suggestions ?
>> >
>> > Thanks,
>> > Manuel
>> >
>> > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <dave.mayhe...@gmail.com>
>> wrote:
>> > You need to swap the order of your function calls.
>> > Call VecSetSizes() before VecSetType()
>> >
>> > Thanks,
>> >   Dave
>> >
>> >
>> > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <mval...@mail.sdsu.edu>
>> wrote:
>> > Hello all, happy new year,
>> >
>> > I'm working on parallelizing my code, it worked and provided some
>> results when i just called more than one processor, but created artifacts
>> because i didn't need one image of the whole program in each processor,
>> conflicting with each other.
>> >
>> > Since the pressure solver is the main part i need in parallel im
>> chosing mpi to run everything in root processor until its time to solve for
>> pressure, at this point im trying to create a distributed vector using
>> either
>> >
>> >  call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr)
>> > or
>> >  call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
>> >  call VecSetType(xp,VECMPI,ierr)
>> >  call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>> >
>> >
>> > In both cases program hangs at this point, something it never happened
>> on the naive way i described before. I've made sure the global size, nbdp,
>> is the same in every processor. What can be wrong?
>> >
>> > Thanks for your kind help,
>> >
>> > Manuel.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > -- Norbert Wiener
>> >
>> >
>>
>>
>


Re: [petsc-users] Fieldsplit with sub pc MUMPS in parallel

2017-01-05 Thread Dave May
Do you now see identical residual histories for a job using 1 rank and 4
ranks?

If not, I am inclined to believe that the IS's you are defining for the
splits in the parallel case are incorrect. The operator created to
approximate the Schur complement with selfp should not depend on  the
number of ranks.

Or possibly your problem is horribly ill-conditioned. If it is, then this
could result in slightly different residual histories when using different
numbers of ranks - even if the operators are in fact identical.
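
A quick way to compare the two runs is to record the true residual histories and convergence reasons for the outer solve and each split (a sketch of options only - adjust the prefixes to match your splits):

  -ksp_monitor_true_residual -ksp_converged_reason
  -fieldsplit_0_ksp_converged_reason -fieldsplit_1_ksp_converged_reason

Then diff the output of the 1-rank and 4-rank jobs.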


Thanks,
  Dave




On Thu, 5 Jan 2017 at 12:14, Karin  wrote:

> Dear Barry, dear Dave,
>
> THANK YOU!
> You two pointed out the right problem.By using the options you provided
> (-fieldsplit_0_ksp_type gmres -fieldsplit_0_ksp_pc_side right
> -fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_pc_side right), the solver
> converges in 3 iterations whatever the size of the communicator.
> All the trick is in the precise resolution of the Schur complement, by
> using a Krylov method (and not only preonly) *and* applying the
> preconditioner on the right (so evaluating the convergence on the
> unpreconditioned residual).
>
> @Barry : the difference you see on the nonzero allocations for the
> different runs is just an artefact : when using more than one proc, we
> slighly over-estimate the number of non-zero terms. If I run the same
> problem with the -info option, I get extra information :
> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 110 X 110; storage space: 0
> unneeded,5048 used
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 271 X 271; storage space: 4249
> unneeded,26167 used
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 307 X 307; storage space: 7988
> unneeded,31093 used
> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 110 X 244; storage space: 0
> unneeded,6194 used
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 271 X 233; storage space: 823
> unneeded,9975 used
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 307 X 197; storage space: 823
> unneeded,8263 used
> And 5048+26167+31093+6194+9975+8263=86740 which is the number of exactly
> estimated nonzero terms for 1 proc.
>
>
> Thank you again!
>
> Best regards,
> Nicolas
>
>
> 2017-01-05 1:36 GMT+01:00 Barry Smith :
>
>
>
>
>There is something wrong with your set up.
>
>
>
>
>
> 1 process
>
>
>
>
>
>total: nonzeros=140616, allocated nonzeros=140616
>
>
>   total: nonzeros=68940, allocated nonzeros=68940
>
>
> total: nonzeros=3584, allocated nonzeros=3584
>
>
> total: nonzeros=1000, allocated nonzeros=1000
>
>
> total: nonzeros=8400, allocated nonzeros=8400
>
>
>
>
>
> 2 processes
>
>
> total: nonzeros=146498, allocated nonzeros=146498
>
>
>   total: nonzeros=73470, allocated nonzeros=73470
>
>
> total: nonzeros=3038, allocated nonzeros=3038
>
>
> total: nonzeros=1110, allocated nonzeros=1110
>
>
> total: nonzeros=6080, allocated nonzeros=6080
>
>
> total: nonzeros=146498, allocated nonzeros=146498
>
>
>   total: nonzeros=73470, allocated nonzeros=73470
>
>
> total: nonzeros=6080, allocated nonzeros=6080
>
>
>   total: nonzeros=2846, allocated nonzeros=2846
>
>
> total: nonzeros=86740, allocated nonzeros=94187
>
>
>
>
>
>   It looks like you are setting up the problem differently in parallel and
> seq. If it is suppose to be an identical problem then the number nonzeros
> should be the same in at least the first two matrices.
>
>
>
>
>
>
>
>
>
>
>
> > On Jan 4, 2017, at 3:39 PM, Karin  wrote:
>
>
> >
>
>
> > Dear Petsc team,
>
>
> >
>
>
> > I am (still) trying to solve Biot's poroelasticity problem :
>
>
> >  
>
>
> >
>
>
> > I am using a mixed P2-P1 finite element discretization. The matrix of
> the discretized system in binary format is attached to this email.
>
>
> >
>
>
> > I am using the fieldsplit framework to solve the linear system. Since I
> am facing some troubles, I have decided to go back to simple things. Here
> are the options I am using :
>
>
> >
>
>
> > -ksp_rtol 1.0e-5
>
>
> > -ksp_type fgmres
>
>
> > -pc_type fieldsplit
>
>
> > -pc_fieldsplit_schur_factorization_type full
>
>
> > -pc_fieldsplit_type schur
>
>
> > -pc_fieldsplit_schur_precondition selfp
>
>
> > -fieldsplit_0_pc_type lu
>
>
> > -fieldsplit_0_pc_factor_mat_solver_package mumps
>
>
> > -fieldsplit_0_ksp_type preonly
>
>
> > -fieldsplit_0_ksp_converged_reason
>
>
> > -fieldsplit_1_pc_type lu
>
>
> > -fieldsplit_1_pc_factor_mat_solver_package mumps
>
>
> > -fieldsplit_1_ksp_type preonly
>
>
> > -fieldsplit_1_ksp_converged_reason
>
>
> >
>
>
> > On a single proc, everything runs fine : the solver converges in 3
> iterations, according to the theory (see Run-1-proc.txt [contains
> -log_view]).
>
>
> >
>
>
> > On 2 procs, the solver converges in 28 iterations (see Run-2-proc.txt).
>
>
> >
>
>
> > On 3 procs, the 

Re: [petsc-users] VecSetSizes hangs in MPI

2017-01-04 Thread Dave May
Are you certain ALL ranks in PETSC_COMM_WORLD call these function(s). These
functions cannot be inside if statements like
if (rank == 0){
  VecCreateMPI(...)
}


On Wed, 4 Jan 2017 at 23:34, Manuel Valera <mval...@mail.sdsu.edu> wrote:

> Thanks Dave for the quick answer, appreciate it,
>
> I just tried that and it didn't make a difference, any other suggestions ?
>
> Thanks,
> Manuel
>
> On Wed, Jan 4, 2017 at 2:29 PM, Dave May <dave.mayhe...@gmail.com> wrote:
>
> You need to swap the order of your function calls.
> Call VecSetSizes() before VecSetType()
>
> Thanks,
>   Dave
>
>
> On Wed, 4 Jan 2017 at 23:21, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>
> Hello all, happy new year,
>
> I'm working on parallelizing my code, it worked and provided some results
> when i just called more than one processor, but created artifacts because i
> didn't need one image of the whole program in each processor, conflicting
> with each other.
>
> Since the pressure solver is the main part i need in parallel im chosing
> mpi to run everything in root processor until its time to solve for
> pressure, at this point im trying to create a distributed vector using
> either
>
>  call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr)
> or
>
>  call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
>
>  call VecSetType(xp,VECMPI,ierr)
>
>  call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>
>
>
> In both cases program hangs at this point, something it never happened on
> the naive way i described before. I've made sure the global size, nbdp, is
> the same in every processor. What can be wrong?
>
>
> Thanks for your kind help,
>
>
> Manuel.
>
>
>
>
>
>
>
>


Re: [petsc-users] VecSetSizes hangs in MPI

2017-01-04 Thread Dave May
You need to swap the order of your function calls.
Call VecSetSizes() before VecSetType()

Thanks,
  Dave


On Wed, 4 Jan 2017 at 23:21, Manuel Valera  wrote:

Hello all, happy new year,

I'm working on parallelizing my code, it worked and provided some results
when i just called more than one processor, but created artifacts because i
didn't need one image of the whole program in each processor, conflicting
with each other.

Since the pressure solver is the main part i need in parallel im chosing
mpi to run everything in root processor until its time to solve for
pressure, at this point im trying to create a distributed vector using
either

 call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr)
or

 call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)

 call VecSetType(xp,VECMPI,ierr)

 call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)



In both cases program hangs at this point, something it never happened on
the naive way i described before. I've made sure the global size, nbdp, is
the same in every processor. What can be wrong?


Thanks for your kind help,


Manuel.


Re: [petsc-users] Fwd: Fieldsplit with sub pc MUMPS in parallel

2017-01-04 Thread Dave May
The issue is your fieldsplit_1 solve. You are applying mumps to an
approximate Schur complement - not the true Schur complement. Seemingly the
approximation is dependent on the communicator size.

If you want to see iteration counts of 2, independent of mesh size and
communicator size you need to solve the true Schur complement system
(fieldsplit_1) to a specified tolerance (e.g. 1e-10) - don't use preonly.

In practice you probably don't want to iterate on the Schur complement
either as it is likely too expensive. If you provided fieldsplit with a
spectrally equivalent approximation to S, iteration counts would be larger
than two, but they would be independent of the number of elements and comm
size

Thanks,
  Dave




On Wed, 4 Jan 2017 at 22:39, Karin  wrote:

> Dear Petsc team,
>
> I am (still) trying to solve Biot's poroelasticity problem :
>  [image: Images intégrées 1]
>
> I am using a mixed P2-P1 finite element discretization. The matrix of the
> discretized system in binary format is attached to this email.
>
> I am using the fieldsplit framework to solve the linear system. Since I am
> facing some troubles, I have decided to go back to simple things. Here are
> the options I am using :
>
> -ksp_rtol 1.0e-5
> -ksp_type fgmres
> -pc_type fieldsplit
> -pc_fieldsplit_schur_factorization_type full
> -pc_fieldsplit_type schur
> -pc_fieldsplit_schur_precondition selfp
> -fieldsplit_0_pc_type lu
> -fieldsplit_0_pc_factor_mat_solver_package mumps
> -fieldsplit_0_ksp_type preonly
> -fieldsplit_0_ksp_converged_reason
> -fieldsplit_1_pc_type lu
> -fieldsplit_1_pc_factor_mat_solver_package mumps
> -fieldsplit_1_ksp_type preonly
> -fieldsplit_1_ksp_converged_reason
>
> On a single proc, everything runs fine : the solver converges in 3
> iterations, according to the theory (see Run-1-proc.txt [contains
> -log_view]).
>
> On 2 procs, the solver converges in 28 iterations (see Run-2-proc.txt).
>
> On 3 procs, the solver converges in 91 iterations (see Run-3-proc.txt).
>
> I do not understand this behavior : since MUMPS is a parallel direct
> solver, shouldn't the solver converge in max 3 iterations whatever the
> number of procs?
>
>
> Thanks for your precious help,
> Nicolas
>
>
>
>
>
>


Re: [petsc-users] MatSetValues in runtime

2016-12-05 Thread Dave May
On 5 December 2016 at 16:49, Massoud Rezavand  wrote:

> Dear Petsc team,
>
> In order to create a parallel matrix and solve by KSP, is it possible to
> directly use MatSetValues() in runtime when each matrix entry is just
> created  without MatMPIAIJSetPreallocation()?
>

Yes, but performance will be terrible without specifying any preallocation
info.
See this note
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html

For a code example of how to do the assembly without preallocation (not
recommended), you can refer to this (its the same pattern for MPIAIJ)

http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex1.c.html
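
For completeness, the preallocated path looks roughly like this (a sketch; N, d_nnz and o_nnz are placeholders you would recompute from the neighbour lists whenever the structure changes):

  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
  MatSetType(A, MATAIJ);
  MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);  /* add MatSeqAIJSetPreallocation() too if the code must also run on 1 rank */
  /* ... MatSetValues() loop, then MatAssemblyBegin()/MatAssemblyEnd() ... */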


>
> I mean, when you only know the global size of Mat, and the number of
> nonzeros per row is not constant neither for all rows nor during time, is
> it possible to set the singular entries into Mat one by one after creating
> each one?
>

Why don't you just destroy the matrix and create a new one every time the
non-zero structure changes?
That's what I recommended last time.


Thanks,
  Dave


> Thanks
> Massoud
>
>
>
>
>
>
>


Re: [petsc-users] How do I Ensure that Two DMDA Objects Are Coupled Correctly?

2016-11-29 Thread Dave May
For collocated variables, I recommend you use the function
DMDAGetReducedDMDA()

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMDAGetReducedDMDA.html

That's the simplest option.


In general, if the 2 dmdas have the same number of points in each
direction, and you let petsc determine the partitions when you called
DMDACreate3d(), they will have the same layout in parallel.

You can confirm they overlap using the returned values from
DMDAGetCorners() and DMDAGetGhostCorners().

Alternatively you can specify the layout yourself when you create the DMDAs
using the lx[] ly[] lz[] arrays. See

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMDACreate3d.html
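
Putting the first option together, a minimal sketch (PETSc 3.7-style calls; nx, ny, nz are your global grid sizes and error checking is omitted):

  DM dav, dap;
  DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
               DMDA_STENCIL_BOX, nx, ny, nz, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
               3, 1, NULL, NULL, NULL, &dav);   /* u,v,w: 3 dof per node */
  DMDAGetReducedDMDA(dav, 1, &dap);             /* same parallel layout, 1 dof for the pressure-like term */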


Thanks
Dave

On Tue, 29 Nov 2016 at 07:58, Clark C Pederson  wrote:

> Dear all,
>
> I'm writing a simple solver for the time-dependent incompressible
> Navier-Stokes equations, using a structured grid and a fractional step
> approach.  Specifically, I'm following the approach of Kim and Moin, from
> their 1985 paper. To those who are unfamiliar with fractional step methods,
> you separate out the equation solving process into multiple steps.  In the
> first step, you advance your velocity fields in time to obtain an
> "uncorrected" velocity.  In the second step, you compute a pressure-like
> term using the uncorrected velocities.  In the third step, you compute the
> correct velocities using the pressure-like term.
>
> This lead me to my question: There's two types of data I need to work with
> in different ways.  The first is the velocity field (u,v,w), which needs to
> be updated using time-stepping routines.  The second is the pressure-like
> term, which is solved for in a Poisson equation.  What is the best way to
> couple these two fields, using distributed arrays?
>
> The suggestion here: (
> http://lists.mcs.anl.gov/pipermail/petsc-users/2013-October/019022.html)
> is to use two DMDAs, one for the velocities with 3 DoF and one for the
> pressures with 1 DoF.  This seems like the simplest way to work with the
> problem, aside from one problem: when the two arrays need to interface, how
> do I ensure that the local processes align?  In other words, how do I make
> sure that my local array p[k][j][i] can pull the correct information from
> the local vel[k][j][i].u etc. array locations? The points would be
> identical or neighboring points on the structured grid, but they would be
> part of two different DMDA objects.  How do I make sure that each processor
> has the data it needs?
>
> I've looked in the manual, examples, and the mailing list, but I couldn't
> find anything that answered this question.  The answer may be very simple,
> so any help is appreciated.
>
> Thanks,
> Clark Pederson
>


Re: [petsc-users] PETSc for ISPH

2016-11-28 Thread Dave May
Massoud,

On 28 November 2016 at 20:18, Massoud Rezavand 
wrote:

> Hi,
>
> Thanks.
>
> As you know, in SPH method, the calculations are done over the neighboring
> particles (j) that fall inside a support domain defined by a circle over
> the particle of interest (i). Since the Lagrangian nature the method, the
> number of neighboring particles are varying slightly over time, e.g. in a
> 2D domain this number is varying between 43 to 51 (in my experience).
>
> The number of nonzeros per row (A_ij) is equal to the number of
> neighboring particles and normally is not fixed over time, therefore, we
> put the elements dynamically at each time step and we have to calculate
> d_nz and o_nz at each time iteration.
>
> In order to preallocate the matrix, another way would be to calculate the
> number of neighboring particles and set that as the number of nonzeros per
> row.  Doing so, do you recommend to use :
>
> MatMPIAIJSetPreallocation()
>
> to preallocate A to achieve the best performance?
>

The other thing to bear in mind is that PETSc objects like Mat and Vec are
not really dynamic with respect to the partitioning.

In your SPH simulation, at each time step not only does the number of
non-zeros (e.g. nearest neighbours) change, but likely so too will the
number of particles per sub-domain (depending on how you define a
sub-domain - see footnote below). Once you create a Mat and Vec object, the
partition is defined once and for all and cannot be altered. Hence, when
your particles cross sub-domain boundaries, you will have to destroy the
matrix and re-create the non-zero structure and re-do the preallocation.
The good news is the setup time for a new mat and vec in petsc is fast so I
doubt you'll notice much overhead of the create/destroy being performed at
each time step.
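
In code the per-step cycle might look roughly like this (a sketch; A, ksp, b, x, nlocal, d_nnz and o_nnz are placeholders):

  MatDestroy(&A);
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);
  MatSetType(A, MATMPIAIJ);
  MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);
  /* fill with MatSetValues(), MatAssemblyBegin()/MatAssemblyEnd(), then */
  KSPSetOperators(ksp, A, A);
  KSPSolve(ksp, b, x);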

Thanks,
  Dave

(*) My comment kind of assumes that since you are modelling incompressible
fluids, you have a constant smoothing length and will partition the domain
via boxes of size 2h, and a sub-domain wrt the particles will be defined
via all the points that live in a set of boxes mapped to a given MPI-rank. PPM
probably has some clever load balancing strategy, but nevertheless I think
you'll run into this issue with Mat and Vec.



>
>
> Regards,
>
> Massoud
>
>
>
> On 11/28/2016 06:36 PM, Barry Smith wrote:
>
>> On Nov 28, 2016, at 10:30 AM, Massoud Rezavand <
>>> massoud.rezav...@uibk.ac.at> wrote:
>>>
>>> Dear Barry,
>>>
>>> You recommended me to directly use MatSetValues() and not to put the
>>> matrix in a parallel CSR matrix.
>>>
>>> In order to count the d_nz and o_nz I have to put the entries into a
>>> sequential CSR matrix
>>>
>> If you don't know the number of nonzeros per row how are you going to
>> put the values into a sequential CSR format?
>> On the other hand if you can figure out the number of nonzeros per row
>> without creating the matrix how come you cannot figure out the d_nz and
>> o_nz?
>>
>>
>> and then do the MatMPIAIJSetPreallocation() and then do the MatSet
>>> Values().
>>>
>>  If you do put the values into a sequential CSR format, which it is
>> not clear to me is needed, then you can just call
>> MatCreateMPIAIJWithArrays() and skip the "MatMPIAIJSetPreallocation() and
>> then do the MatSet Values()"
>>
>> Barry
>>
>>
>>
>> Does it effect the performance ?
>>>
>>>
>>> Regards,
>>>
>>> Massoud
>>>
>>> On 11/21/2016 08:10 PM, Barry Smith wrote:
>>>
 On Nov 21, 2016, at 12:20 PM, Massoud Rezavand <
> massoud.rezav...@uibk.ac.at> wrote:
>
> Thank you very much.
>
> Yes I am developing the new 3D version in Parallel with the PPM (the
> new generation OpenFPM, not released yet) library which generates the
> particles and decomposes the domain.
>
> I don't have the parallel matrix generation yet. In the old version I
> had CSR format and a vector of knowns (b).
> So, should I use MatSetValuesStencil() ?
>
  MatSetValuesStencil is for finite differences on a structured
 grid. I don't think it makes sense for your application.

  You need to use MatMPIAIJSetPreallocation() and then
 MatSetValues() to put the entries in.

 What do you recommend for creating the vector of knowns (b)?
>
 Just use VecCreateMPI()

> On the other hand, due to the convergence issues for millions of
> particles in ISPH, I have to use a preconditioner. In a paper I saw they
> have used BoomerAMG from HYPRE. Do you have any recommendation?
>
 We have many to try, it is not clear that any would be particularly
 good for SPH. Certainly try BoomerAMG

 I saw an example ( ex19.c) using BoomerAMG. Should I follow that?
>
>
> PS: regarding the unbalance sparsity in SPH, yes in contrast to the
> mesh-based methods, the A matrix in ISPH is changing over the time but the
> number of non-zeros is defined by the number of neighboring particles 
> 

Re: [petsc-users] How to get a matrix and vector for a coupled system of equations?

2016-11-28 Thread Dave May
On 28 November 2016 at 10:43, Rolf Kuiper <kui...@mpia.de> wrote:

> Hi Dave,
>
> Thanks a lot for your prompt reply! This is even easier than I thought
> (and that is most likely the reason, why I could not think into this
> direction) :)
> And yes, I might/should upgrade our PETSc version used (3.1), but it will
> take me some days to check the full code package.
>
> Now, I will create my DA (now in 3.6 called DMDA) for the coupled system
> via
> DACreate3d(PETSC_COMM_WORLD, Periodicity, DA_STENCIL_BOX, Nx, Ny, Nz, Px,
> Py, Pz, 4, 1, lx, ly, lz, );
> instead of
> DACreate3d(PETSC_COMM_WORLD, Periodicity, DA_STENCIL_BOX, Nx, Ny, Nz, Px,
> Py, Pz, 1, 1, lx, ly, lz, );
> which I have used for the single equation.
>
> I would like to ask one follow-up question:
> Currently, I loop over my matrix (and vectors) via three for-loops over
> the 3 spatial directions, and set the columns indices in the following way:
> for(k){
> row.k= k;
> col[0].k = k;
> col[1].k = k;
> col[2].k = k;
> col[3].k = k;
> col[4].k = k+1;
> col[5].k = k-1;
> for(j){
> row.j= j;
> col[0].j = j;
> col[1].j = j;
> col[2].j = j+1;
> col[3].j = j-1;
> col[4].j = j;
> col[5].j = j;
> for(i){
> row.i= i;
> col[0].i = i+1;
> col[1].i = i-1;
> col[2].i = i;
> col[3].i = i;
> col[4].i = i;
> col[5].i = i;
> Now for the coupled equations, I should overall additionally loop over the
> number of DOFs (just 2 in my previous email example).
>

So presumably you are using MatSetValuesStencil() to insert entries into
the matrix.
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetValuesStencil.html

You can keep using MatSetValuesStencil(), for you multi-component example,
however you will additionally need to enter a value for the member "c"
within the MatStencil struct.
See
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatStencil.html

"c" relates to the component or DOF index. So the code above will need to
be modified slightly to define values for row.c and col[0].c , col[1].c ...
etc


Alternatively you can use MatSetValuesBlockedStencil()
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetValuesBlockedStencil.html#MatSetValuesBlockedStencil




> Could you give me an easy example or pseudo-code for the associated
> assignment of columns (I have the same stencil in each of the submatrices,
> so for the 2 DOFs in 3D, I would get 7+6=13 column entries per row)? Or can
> you link me to an existing example within the PETSc help?
>

Note that the manual pages I've sent links to contain links to example
codes (see bottom of the webpage) where you can see how to use these
functions.
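
As a rough pseudo-code sketch for your 2-component case (i, j, k come from your loops; values[] is a placeholder for the stencil coefficients; only the same-component star plus the local cross-coupling entry is shown):

  MatStencil row, col[8];
  PetscInt   c, n;
  for (c = 0; c < 2; c++) {
    row.i = i; row.j = j; row.k = k; row.c = c;
    n = 0;
    col[n].i = i+1; col[n].j = j;   col[n].k = k;   col[n].c = c; n++;
    col[n].i = i-1; col[n].j = j;   col[n].k = k;   col[n].c = c; n++;
    col[n].i = i;   col[n].j = j+1; col[n].k = k;   col[n].c = c; n++;
    col[n].i = i;   col[n].j = j-1; col[n].k = k;   col[n].c = c; n++;
    col[n].i = i;   col[n].j = j;   col[n].k = k+1; col[n].c = c; n++;
    col[n].i = i;   col[n].j = j;   col[n].k = k-1; col[n].c = c; n++;
    col[n].i = i;   col[n].j = j;   col[n].k = k;   col[n].c = c; n++;     /* diagonal */
    col[n].i = i;   col[n].j = j;   col[n].k = k;   col[n].c = 1 - c; n++; /* coupling to the other field */
    MatSetValuesStencil(A, 1, &row, n, col, values, INSERT_VALUES);
  }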

Thanks,
  Dave


>
> Again, Thanks a lot!
> Rolf
>
>
>
>
> Am 24.11.2016 um 22:30 schrieb Dave May <dave.mayhe...@gmail.com>:
>
> When you create the DMDA, set the number of DOFs (degrees of freedom) per
> point to be 2 instead of 1.
>
> You must be using an ancient version of petsc given the function names
> you quoted. Consider upgrading to 3.7
>
> Thanks,
> Dave
>
> On Thu, 24 Nov 2016 at 20:24, Rolf Kuiper <kui...@mpia.de> wrote:
>
>> Dear PETSc users,
>>
>> maybe this is an easy question, but I can’t find the information right
>> away in the user’s guide nor online.
>>
>> What I am currently doing and which works fine:
>> To solve a partial differential equation for the quantity q on a parallel
>> distributed grid, which is represented by the Distributed Array MyDA, I am
>> currently creating the associated sparse matrix for the KSP solver via
>> DAGetMatrix(MyDA, MATMPIAIJ, );
>>
>> The solution vector and right hand side vector are created via
>> DACreateGlobalVector(MyDA, );
>> VecDuplicate(MyRightHandSideVector, );
>>
>> The DA is constructed using DACreate3d() with the corresponding regular
>> structured grid information.
>>
>> And here is my problem:
>> Now, I would like to solve a coupled system of equations for the
>> quantities q1 and q2 on the same grid. I.e., the matrix should just get the
>> double number of rows and columns, the vectors contain twice the number of
>> entries (e.g. first all q1s and then all q2s). And I would like to be sure
>> that the entries of q1 and q2, which are associated with the same grid cell
>> are located on the same processor.
>> Is there already a pre-defined structures available (such as MATMPIAIJ)
>> within PETSc to enlarge such a single equation to store the entries of
>> coupled equations? Such as
>> DACreateTwiceTheGlobalVector()?
>>
>> The equation is (simplified) of the form
>> d/dt q1 + grad q2 = 0
>> d/dt q2 + f(q1) = 0
>> with an arbitrary function f depending on q1.
>>
>> Thanks a lot for your help in advance,
>> Rolf
>>
>
>


Re: [petsc-users] How to get a matrix and vector for a coupled system of equations?

2016-11-24 Thread Dave May
When you create the DMDA, set the number of DOFs (degrees of freedom) per
point to be 2 instead of 1.

You must be using an ancient version of petsc given the function names you
quoted. Consider upgrading to 3.7

Thanks,
Dave

On Thu, 24 Nov 2016 at 20:24, Rolf Kuiper  wrote:

> Dear PETSc users,
>
> maybe this is an easy question, but I can’t find the information right
> away in the user’s guide nor online.
>
> What I am currently doing and which works fine:
> To solve a partial differential equation for the quantity q on a parallel
> distributed grid, which is represented by the Distributed Array MyDA, I am
> currently creating the associated sparse matrix for the KSP solver via
> DAGetMatrix(MyDA, MATMPIAIJ, );
>
> The solution vector and right hand side vector are created via
> DACreateGlobalVector(MyDA, );
> VecDuplicate(MyRightHandSideVector, );
>
> The DA is constructed using DACreate3d() with the corresponding regular
> structured grid information.
>
> And here is my problem:
> Now, I would like to solve a coupled system of equations for the
> quantities q1 and q2 on the same grid. I.e., the matrix should just get the
> double number of rows and columns, the vectors contain twice the number of
> entries (e.g. first all q1s and then all q2s). And I would like to be sure
> that the entries of q1 and q2, which are associated with the same grid cell
> are located on the same processor.
> Is there already a pre-defined structures available (such as MATMPIAIJ)
> within PETSc to enlarge such a single equation to store the entries of
> coupled equations? Such as
> DACreateTwiceTheGlobalVector()?
>
> The equation is (simplified) of the form
> d/dt q1 + grad q2 = 0
> d/dt q2 + f(q1) = 0
> with an arbitrary function f depending on q1.
>
> Thanks a lot for your help in advance,
> Rolf
>


Re: [petsc-users] Shell preconditioner within a fieldsplit

2016-11-13 Thread Dave May
Damn - the last part of my email is wrong. You want to set the PCType to
"mat". KSPType preonly is fine

On Mon, 14 Nov 2016 at 07:04, Dave May <dave.mayhe...@gmail.com> wrote:

> Looks like you want the contents of your mat shell, specifically the op
> Ax, to define the action of the preconditioner.
>
> You need to either create a PCShell (rather than a MatShell), and define
> the operation called by PCApply(), or keep your current shell but change
> "preonly" to "mat".
>
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCMAT.html#PCMAT
>
> Thanks
> Dave
>
> On Mon, 14 Nov 2016 at 06:36, Safin, Artur <aks084...@utdallas.edu> wrote:
>
> Hello,
>
>
> What is the proper way to set up a shell preconditioner within a
> fielsplit? I have tried it on my own, but do not get the proper
> behavior. The relevant portion looks like this:
>
>
> __
>
> // Global System
> KSPSetOperators(ksp, A, A);
>
> // Skipped code..
>
>
>
> // Shell Preconditioner for the pressure sub-block
>
> KSP *subksp;
>
> PCFieldSplitGetSubKSP(pc, NULL, );
>
> Mat pressureA;
> KSPSetType(subksp[0], "preonly");
> MatCreateShell(MPI_COMM_WORLD, n_local_P_dofs, n_local_P_dofs, ,
> PETSC_DETERMINE, PETSC_DETERMINE, );
> MatShellSetOperation(pressureA, MATOP_MULT, (void(*)(void))
> PressureBlock);
> KSPSetOperators(subksp[0], pressureA, pressureA);
>
> // Skipped code..
>
>
> KSPSetUp(ksp);
>
> KSPSolve(ksp, b, x);
>
> __
>
>
> The fieldsplit component works fine; the solver however does not go into
> the custom function PressureBlock(), so I am curious as to what the correct
> approach is.
>
>
> Best,
>
>
> Artur
>
>


Re: [petsc-users] Shell preconditioner within a fieldsplit

2016-11-13 Thread Dave May
Looks like you want the contents of your mat shell, specifically the op Ax,
to define the action of the preconditioner.

You need to either create a PCShell (rather than a MatShell), and define
the operation called by PCApply(), or keep your current shell but change
"preonly" to "mat".

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCMAT.html#PCMAT

Thanks
Dave

On Mon, 14 Nov 2016 at 06:36, Safin, Artur  wrote:

> Hello,
>
>
> What is the proper way to set up a shell preconditioner within a
> fielsplit? I have tried it on my own, but do not get the proper
> behavior. The relevant portion looks like this:
>
>
> __
>
> // Global System
> KSPSetOperators(ksp, A, A);
>
> // Skipped code..
>
>
>
> // Shell Preconditioner for the pressure sub-block
>
> KSP *subksp;
>
> PCFieldSplitGetSubKSP(pc, NULL, );
>
> Mat pressureA;
> KSPSetType(subksp[0], "preonly");
> MatCreateShell(MPI_COMM_WORLD, n_local_P_dofs, n_local_P_dofs, ,
> PETSC_DETERMINE, PETSC_DETERMINE, );
> MatShellSetOperation(pressureA, MATOP_MULT, (void(*)(void))
> PressureBlock);
> KSPSetOperators(subksp[0], pressureA, pressureA);
>
> // Skipped code..
>
>
> KSPSetUp(ksp);
>
> KSPSolve(ksp, b, x);
>
> __
>
>
> The fieldsplit component works fine; the solver however does not go into
> the custom function PressureBlock(), so I am curious as to what the correct
> approach is.
>
>
> Best,
>
>
> Artur
>


Re: [petsc-users] Column #j is wrong in parallel from message "Inserting a new nonzero (i, j) into matrix"

2016-10-21 Thread Dave May
On 21 October 2016 at 18:55, Eric Chamberland <
eric.chamberl...@giref.ulaval.ca> wrote:

> Hi,
>
> I am on a new issue with a message:
> [1]PETSC ERROR: - Error Message
> --
> [1]PETSC ERROR: Argument out of range
> [1]PETSC ERROR: New nonzero at (374328,1227) caused a malloc
> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn
> off this check
> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016
> [1]PETSC ERROR: /pmi/ericc/projetm4/depots_prepush/BIB/bin/BIBMEF.opt on
> a arch-linux2-c-debug named lorien by eric Fri Oct 21 13:46:51 2016
> [1]PETSC ERROR: Configure options 
> --prefix=/opt/petsc-3.7.2_debug_matmatmult_mpi
> --with-mpi-compilers=1 --with-make-np=12 --with-shared-libraries=1
> --with-mpi-dir=/opt/openmpi-1.10.2 --with-debugging=yes
> --with-mkl_pardiso=1 --with-mkl_pardiso-dir=/opt/intel/composerxe/mkl
> --download-ml=yes --download-mumps=yes --download-superlu=yes
> --download-superlu_dist=yes --download-parmetis=yes --download-ptscotch=yes
> --download-metis=yes --download-suitesparse=yes --download-hypre=yes
> --with-scalapack=1 --with-scalapack-include=/opt/intel/composerxe/mkl/include
> --with-scalapack-lib="-L/opt/intel/composerxe/mkl/lib/intel64
> -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64"
> --with-blas-lapack-dir=/opt/intel/composerxe/mkl/lib/intel64
> [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in
> /groshd/ericc/petsc-3.7.2-debug/src/mat/impls/aij/mpi/mpiaij.c
> [1]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in
> /groshd/ericc/petsc-3.7.2-debug/src/mat/impls/aij/mpi/mpiaij.c
> [1]PETSC ERROR: #3 MatAssemblyEnd() line 5194 in
> /groshd/ericc/petsc-3.7.2-debug/src/mat/interface/matrix.c
>
> I am starting to debug, but I just want to be sure that the indices 374328
> and 1227 are both global indices...
>

They are.


>
> re-reading the thread makes me think yes... but I am not 100% sure...
>
> Thanks,
>
> Eric
>
>
>
> On 26/03/15 09:52 PM, Barry Smith wrote:
>
>>
>>   Eric,
>>
>>   I have now updated all the standard MPI matrix types AIJ, BAIJ, SBAIJ
>> to print the correct global indices in the error messages when a new
>> nonzero location is generated thus making debugging this issue easier. In
>> the branches barry/fix-inserting-new-nonzero-column-location, next and
>> the next release.
>>
>>Thanks for pushing on this. The previous code was too "developer
>> centric" and not enough "user centric" enough.
>>
>>Barry
>>
>> On Mar 25, 2015, at 1:03 PM, Eric Chamberland <
>>> eric.chamberl...@giref.ulaval.ca> wrote:
>>>
>>> Hi,
>>>
>>> while looking for where in the world do I insert the (135,9) entry in my
>>> matrix, I have discovered that the column # shown is wrong in parallel!
>>>
>>> I am using PETsc 3.5.3.
>>>
>>> The full error message is:
>>>
>>> [0]PETSC ERROR: MatSetValues_MPIAIJ() line 564 in
>>> /home/mefpp_ericc/petsc-3.5.3/src/mat/impls/aij/mpi/mpiaij.c Inserting
>>> a new nonzero (135, 9) into matrix
>>>
>>> This line code is a call to a #defined macro:
>>>
>>> MatSetValues_SeqAIJ_B_Private(row,col,value,addv);
>>>
>>> where the "col" parameter is not equal to "in[j]"!!!
>>>
>>> in gdb, printing "in[j]" gave me:
>>>
>>> print in[j]
>>> $6 = 537
>>>
>>> while "col" is:
>>>
>>> print col
>>> $7 = 9
>>>
>>> So, I expected to have a message telling me that (135,537) and not
>>> (135,9) is a new entry matrix!!!
>>>
>>> Would it be a big work to fix this so that the col # displayed is
>>> correct?
>>>
>>> Thanks!
>>>
>>> Eric
>>>
>>


Re: [petsc-users] cannot convert ‘int*’ to ‘PetscInt*

2016-10-16 Thread Dave May
On Sunday, 16 October 2016, 丁老师  wrote:

> Dear professor:
>I met the following error for Petsc 3.7.3.
>I delcare LocalSize as  int, but it doesn't work anymore. it works for
> 3.6.3.
>


This error has nothing to do with the version of petsc. Whether it "worked"
is dependent on the size of PetscInt which is configure/architecture
dependent


>
>
>error: cannot convert ‘int*’ to ‘PetscInt* {aka long int*}’ for
> argument ‘2’ to ‘PetscErrorCode VecGetLocalSize(Vec, PetscInt*)’
>VecGetLocalSize (Petsc_b, );
>
> Regards
>


So just fix your code and declare LocalSize as a PetscInt.

If you insist on representing it as an int (which in general is unsafe as
PetscInt might be a 32-bit or 64-bit int), define a new variable and cast
LocalSize to int.
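
For example (a sketch; Petsc_b is the Vec from your snippet):

  PetscInt nlocal;
  VecGetLocalSize(Petsc_b, &nlocal);
  int LocalSize = (int)nlocal;   /* beware: this may truncate if PETSc was built with 64-bit indices */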


>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


Re: [petsc-users] partition of DM Vec entries

2016-10-14 Thread Dave May
On 15 October 2016 at 06:17, Dave May <dave.mayhe...@gmail.com> wrote:

>
>
> On Saturday, 15 October 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>>
>>   Unless the particles are more or less equally distributed over the the
>> entire domain any kind of "domain decomposition" approach is questionably
>> for managing the particles. Otherwise certain processes that have domains
>> that contain most of the particles will have a great deal of work, for all
>> of its particles, while domains with few particles will have little work. I
>> can see two approaches to alleviate this problem.
>>
>> 1) constantly adjust the sizes/locations of the domains to load balance
>> the particles per domain or
>>
>> 2)  parallelize the particles (some how) instead of just the geometry.
>>
>> Anyways, there is a preliminary DMSWARM class in the development version
>> of PETSc for helping to work with particles provided by Dave May. You might
>> look at it. I don't know if it would useful for you or not. IMHO software
>> library support for particle methods is still very primitive compared to
>> finite difference/element support, in other words we still have a lot to do.
>
>
> If you are using an SPH formulation with a constant smoothing length (such
> as for incompressible media), then DMSWARM will be extremely useful. It
> manages the assignment of fields on point clouds and managed data exchanges
> required for particle advection and gather operations from neighbor
> cells required for evaluating the SPH basis functions.
>
> DMSWARM is in the master branch. We would be happy if you want to be beta
> tester. The API is in its infancy and thus having a user play with what's
> there would be the best way to refine the design as required.
>
> Take a look at the examples and let us know if you need help.
>


Specifically look at these examples (in the order I've listed)

* src/dm/examples/tutorials/swarm_ex2.c
Demonstrates how to create the swarm, register fields within the swarm and
how to represent these fields as PETSc Vec objects.

* src/dm/examples/tutorials/swarm_ex3.c
This demonstrates how you push particles from one sub-domain to another.

* src/dm/examples/tutorials/swarm_ex1.c
This demonstrates how to define a collection operation to gather particles
from neighbour cells (cells being defined via DMDA)

There isn't a single complete example using a DMSWARM and DMDA for
everything required by SPH, but all the plumbing is in place.
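
The registration pattern used in those examples looks roughly like this (a sketch only - the API lives in master and may still change; "velocity" and npoints_local are placeholders):

  DM swarm;
  Vec v;
  DMCreate(PETSC_COMM_WORLD, &swarm);
  DMSetType(swarm, DMSWARM);
  DMSwarmInitializeFieldRegister(swarm);
  DMSwarmRegisterPetscDatatypeField(swarm, "velocity", 3, PETSC_REAL);
  DMSwarmFinalizeFieldRegister(swarm);
  DMSwarmSetLocalSizes(swarm, npoints_local, 4);   /* the 4 is an extra buffer for points that migrate */
  DMSwarmCreateGlobalVectorFromField(swarm, "velocity", &v);
  /* ... use v ... */
  DMSwarmDestroyGlobalVectorFromField(swarm, "velocity", &v);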

Thanks,
  Dave


>
> Thanks,
>   Dave
>
>
>>
>>
>>   Barry
>>
>>
>>
>>
>>
>> > On Oct 14, 2016, at 9:54 PM, Sang pham van <pvsang...@gmail.com> wrote:
>> >
>> > Hi Barry,
>> >
>> > Thank your for your answer. I am writing a parallel code for
>> smoothed-particle hydrodynamic, in this code I used a DMDA background mesh
>> for management of particles. Each DMDA cell manages a number of particles,
>> the number can change in both time and cell. In each time step, I need to
>> update position and velocity of particles in border cells to neighbor
>> partition. I think I can not use DMDA Vec to do this be cause the number of
>> particles is not the same in all ghost cells.
>> >
>> > I think I am able to write a routine do this work, but the code may be
>> quite complicated and not so "formal", I would be very appreciated if you
>> can suggest a method to solve my problem.
>> >
>> > Many thanks.
>> >
>> >
>> >
>> >
>> > On Sat, Oct 15, 2016 at 9:40 AM, Barry Smith <bsm...@mcs.anl.gov>
>> wrote:
>> >
>> >   Thanks, the question is very clear now.
>> >
>> >   For DMDA you can use DMDAGetNeighborsRank() to get the list of the
>> (up to) 9 neighbors of a processor. (Sadly this routine does not have a
>> manual page but the arguments are obvious). For other DM I don't think
>> there is any simple way to get this information. For none of the DM is
>> there a way to get information about what process is providing a specific
>> ghost cell.
>> >
>> >   It is the "hope" of PETSc (and I would think most parallel computing
>> models) that the details of exactly what process is computing neighbor
>> values should not matter for your own computation. Maybe if you provide
>> more details on how you wish to use this information we may have
>> suggestions on how to proceed.
>> >
>> >   Barry
>> >
>> >
>> >
>> > > On Oct 14, 2016, at 9:23 PM, Sang pham van <pvsang...@gmail.com>
>> wrote:
&g

Re: [petsc-users] partition of DM Vec entries

2016-10-14 Thread Dave May
On Saturday, 15 October 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
>   Unless the particles are more or less equally distributed over the the
> entire domain any kind of "domain decomposition" approach is questionably
> for managing the particles. Otherwise certain processes that have domains
> that contain most of the particles will have a great deal of work, for all
> of its particles, while domains with few particles will have little work. I
> can see two approaches to alleviate this problem.
>
> 1) constantly adjust the sizes/locations of the domains to load balance
> the particles per domain or
>
> 2)  parallelize the particles (some how) instead of just the geometry.
>
> Anyways, there is a preliminary DMSWARM class in the development version
> of PETSc for helping to work with particles provided by Dave May. You might
> look at it. I don't know if it would useful for you or not. IMHO software
> library support for particle methods is still very primitive compared to
> finite difference/element support, in other words we still have a lot to do.


If you are using an SPH formulation with a constant smoothing length (such
as for incompressible media), then DMSWARM will be extremely useful. It
manages the assignment of fields on point clouds and manages the data exchanges
required for particle advection and gather operations from neighbor
cells required for evaluating the SPH basis functions.

DMSWARM is in the master branch. We would be happy if you want to be a beta
tester. The API is in its infancy and thus having a user play with what's
there would be the best way to refine the design as required.

Take a look at the examples and let us know if you need help.

Thanks,
  Dave


>
>
>   Barry
>
>
>
>
>
> > On Oct 14, 2016, at 9:54 PM, Sang pham van <pvsang...@gmail.com
> <javascript:;>> wrote:
> >
> > Hi Barry,
> >
> > Thank your for your answer. I am writing a parallel code for
> smoothed-particle hydrodynamic, in this code I used a DMDA background mesh
> for management of particles. Each DMDA cell manages a number of particles,
> the number can change in both time and cell. In each time step, I need to
> update position and velocity of particles in border cells to neighbor
> partition. I think I can not use DMDA Vec to do this be cause the number of
> particles is not the same in all ghost cells.
> >
> > I think I am able to write a routine do this work, but the code may be
> quite complicated and not so "formal", I would be very appreciated if you
> can suggest a method to solve my problem.
> >
> > Many thanks.
> >
> >
> >
> >
> > On Sat, Oct 15, 2016 at 9:40 AM, Barry Smith <bsm...@mcs.anl.gov
> <javascript:;>> wrote:
> >
> >   Thanks, the question is very clear now.
> >
> >   For DMDA you can use DMDAGetNeighborsRank() to get the list of the (up
> to) 9 neighbors of a processor. (Sadly this routine does not have a manual
> page but the arguments are obvious). For other DM I don't think there is
> any simple way to get this information. For none of the DM is there a way
> to get information about what process is providing a specific ghost cell.
> >
> >   It is the "hope" of PETSc (and I would think most parallel computing
> models) that the details of exactly what process is computing neighbor
> values should not matter for your own computation. Maybe if you provide
> more details on how you wish to use this information we may have
> suggestions on how to proceed.
> >
> >   Barry
> >
> >
> >
> > > On Oct 14, 2016, at 9:23 PM, Sang pham van <pvsang...@gmail.com
> <javascript:;>> wrote:
> > >
> > > Hi Barry,
> > >
> > > In 2 processes case, the problem is simple, as I know all ghost cells
> of partition 0 are updated from partition 1. However, in the case of many
> processes, how do I know from which partitions ghost cells of partition 0
> are updated? In other words, How can I know neighboring partitions of the
> partition 0? and can I get a list of ghost cells managing by a neighboring
> partition?
> > > Please let me know if my question is still not clear.
> > >
> > > Many thanks.
> > >
> > >
> > > On Sat, Oct 15, 2016 at 8:59 AM, Barry Smith <bsm...@mcs.anl.gov
> <javascript:;>> wrote:
> > >
> > > > On Oct 14, 2016, at 8:50 PM, Sang pham van <pvsang...@gmail.com
> <javascript:;>> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I am using DM Vec for a FV code, for some reasons, I want to know
> partition of all ghost cells of a specific partition. is there a way do
> that?
> > >
> > >   Could you please explain in more detail what you want, I don't
> understand? Perhaps give a specific example with 2 processes?
> > >
> > >  Barry
> > >
> > >
> > >
> > > >
> > > > Many thanks.
> > > >
> > > > Best,
> > > >
> > >
> > >
> >
> >
>
>


Re: [petsc-users] Performance of the Telescope Multigrid Preconditioner

2016-10-07 Thread Dave May
On Friday, 7 October 2016, frank <hengj...@uci.edu> wrote:

> Dear all,
>
> Thank you so much for the advice.
>
> All setup is done in the first solve.
>
>
>> ** The time for 1st solve does not scale.
>> In practice, I am solving a variable coefficient  Poisson equation. I
>> need to build the matrix every time step. Therefore, each step is similar
>> to the 1st solve which does not scale. Is there a way I can improve the
>> performance?
>>
>
>> You could use rediscretization instead of Galerkin to produce the coarse
>> operators.
>>
>
> Yes I can think of one option for improved performance, but I cannot tell
> whether it will be beneficial because the logging isn't sufficiently fine
> grained (and there is no easy way to get the info out of petsc).
>
> I use PtAP to repartition the matrix, this could be consuming most of the
> setup time in Telescope with your run. Such a repartitioning could be avoid
> if you provided a method to create the operator on the coarse levels (what
> Matt is suggesting). However, this requires you to be able to define your
> coefficients on the coarse grid. This will most likely reduce setup time,
> but your coarse grid operators (now re-discretized) are likely to be less
> effective than those generated via Galerkin coarsening.
>
>
> Please correct me if I understand this incorrectly:   I can define my own
> restriction function and pass it to petsc instead of using PtAP.
> If so,what's the interface to do that?
>

You need to provide a method via KSPSetComputeOperators to your
outer KSP.

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetComputeOperators.html

This method will get propagated through telescope to the KSP running in the
sub-comm.
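
A minimal sketch of what that callback looks like (C only, per the note below; user_ctx and the assembly itself are placeholders):

  PetscErrorCode ComputeOperators(KSP ksp, Mat A, Mat P, void *ctx)
  {
    DM dm;
    KSPGetDM(ksp, &dm);
    /* rediscretize the variable-coefficient operator on this level's DMDA,
       filling P (and A, if different) with MatSetValuesStencil() */
    return 0;
  }

  KSPSetComputeOperators(ksp, ComputeOperators, &user_ctx);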

Note that this functionality is currently not supported for Fortran. I need
to make a small modification to telescope to enable fortran support.

Thanks
  Dave


>
>
>
> Also, you use CG/MG when FMG by itself would probably be faster. Your
>> smoother is likely not strong enough, and you
>> should use something like V(2,2). There is a lot of tuning that is
>> possible, but difficult to automate.
>>
>
> Matt's completely correct.
> If we could automate this in a meaningful manner, we would have done so.
>
>
> I am not as familiar with multigrid as you guys. It would be very kind if
> you could be more specific.
> What does V(2,2) stand for? Is there some strong smoother build in petsc
> that I can try?
>
>
> Another thing, the vector assemble and scatter take more time as I
> increased the cores#:
>
>  cores#   4096
> 8192  16384 32768  65536
> VecAssemblyBegin   2982.91E+002.87E+008.59E+00
> 2.75E+012.21E+03
> VecAssemblyEnd  2983.37E-031.78E-031.78E-03
> 5.13E-031.99E-03
> VecScatterBegin   763033.82E+003.01E+002.54E+00
> 4.40E+001.32E+00
> VecScatterEnd  763033.09E+011.47E+012.23E+01
> 2.96E+012.10E+01
>
> The above data is produced by solving a constant coefficients Possoin
> equation with different rhs for 100 steps.
> As you can see, the time of VecAssemblyBegin increase dramatically from
> 32K cores to 65K.
> With 65K cores, it took more time to assemble the rhs than solving the
> equation.   Is there a way to improve this?
>
>
> Thank you.
>
> Regards,
> Frank
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>>>
>>>
>>>
>>>
>>> On 10/04/2016 12:56 PM, Dave May wrote:
>>>
>>>
>>>
>>> On Tuesday, 4 October 2016, frank <hengj...@uci.edu
>>> <javascript:_e(%7B%7D,'cvml','hengj...@uci.edu');>> wrote:
>>>
>>>> Hi,
>>>> This question is follow-up of the thread "Question about memory usage
>>>> in Multigrid preconditioner".
>>>> I used to have the "Out of Memory(OOM)" problem when using the
>>>> CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0;
>>>> -matptap_scalable" option did solve that problem.
>>>>
>>>> Then I test the scalability by solving a 3d poisson eqn for 1 step. I
>>>> used one sub-communicator in all the tests. The difference between the
>>>> petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2
>>>> the number of multigrid levels in the up/down solver. The function
>>>> "ksp_solve" is timed. It is kind of slow and doesn't scale at all.
>>>>
>>>> Test1: 512^

Re: [petsc-users] Performance of the Telescope Multigrid Preconditioner

2016-10-07 Thread Dave May
he communicator size is reduced by cannot be determined
apriori. Experimentation is the only way.


>
>
> Also, you use CG/MG when FMG by itself would probably be faster. Your
> smoother is likely not strong enough, and you
> should use something like V(2,2). There is a lot of tuning that is
> possible, but difficult to automate.
>

Matt's completely correct.
If we could automate this in a meaningful manner, we would have done so.


Thanks,
  Dave


>
>   Thanks,
>
>  Matt
>
>
>> Thank you.
>>
>> Regards,
>> Frank
>>
>>
>>
>>
>>
>> On 10/04/2016 12:56 PM, Dave May wrote:
>>
>>
>>
>> On Tuesday, 4 October 2016, frank <hengj...@uci.edu> wrote:
>>
>>> Hi,
>>> This question is follow-up of the thread "Question about memory usage in
>>> Multigrid preconditioner".
>>> I used to have the "Out of Memory(OOM)" problem when using the
>>> CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0;
>>> -matptap_scalable" option did solve that problem.
>>>
>>> Then I test the scalability by solving a 3d poisson eqn for 1 step. I
>>> used one sub-communicator in all the tests. The difference between the
>>> petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2
>>> the number of multigrid levels in the up/down solver. The function
>>> "ksp_solve" is timed. It is kind of slow and doesn't scale at all.
>>>
>>> Test1: 512^3 grid points
>>> Core#   telescope_reduction_factor   MG levels# for up/down solver   Time for KSPSolve (s)
>>> 512     8                            4 / 3                           6.2466
>>> 4096    64                           5 / 3                           0.9361
>>> 32768   64                           4 / 3                           4.8914
>>>
>>> Test2: 1024^3 grid points
>>> Core#   telescope_reduction_factor   MG levels# for up/down solver   Time for KSPSolve (s)
>>> 4096    64                           5 / 4                           3.4139
>>> 8192    128                          5 / 4                           2.4196
>>> 16384   32                           5 / 3                           5.4150
>>> 32768   64                           5 / 3                           5.6067
>>> 65536   128                          5 / 3                           6.5219
>>>
>>
>> You have to be very careful how you interpret these numbers. Your solver
>> contains nested calls to KSPSolve, and unfortunately as a result the
>> numbers you report include setup time. This will remain true even if you
>> call KSPSetUp on the outermost KSP.
>>
>> Your email concerns scalability of the silver application, so let's focus
>> on that issue.
>>
>> The only way to clearly separate setup from solve time is to perform two
>> identical solves. The second solve will not require any setup. You should
>> monitor the second solve via a new PetscStage.
>>
>> This was what I did in the telescope paper. It was the only way to
>> understand the setup cost (and scaling) cf the solve time (and scaling).
>>
>> Thanks
>>   Dave
>>
>>
>>
>>> I guess I didn't set the MG levels properly. What would be the efficient
>>> way to arrange the MG levels?
>>> Also which preconditionr at the coarse mesh of the 2nd communicator
>>> should I use to improve the performance?
>>>
>>> I attached the test code and the petsc options file for the 1024^3 cube
>>> with 32768 cores.
>>>
>>> Thank you.
>>>
>>> Regards,
>>> Frank
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 09/15/2016 03:35 AM, Dave May wrote:
>>>
>>> HI all,
>>>
>>> The only unexpected memory usage I can see is associated with the call
>>> to MatPtAP().
>>> Here is something you can try immediately.
>>> Run your code with the additional options
>>>   -matrap 0 -matptap_scalable
>>>

Re: [petsc-users] using DMDA with python

2016-10-05 Thread Dave May
On 5 October 2016 at 18:49, Matthew Knepley  wrote:

> On Wed, Oct 5, 2016 at 11:19 AM, E. Tadeu  wrote:
>
>> Matt,
>>
>>   Do you know if there is any example of solving Navier Stokes using a
>> staggered approach by using a different DM object such as DMPlex?
>>
>
> SNES ex62 can do P2/P1 Stokes, which is similar. Is that what you want to
> see?
>
> For real structured grid, staggered mesh stuff like MAC, I would just do
> this on a single DMDA, but think of it as being staggered, and expand my
> stencil as necessary.
>

Following that up, for a DMDA example using a staggered grid, take a look
at snes/ex30.c

http://www.mcs.anl.gov/petsc/petsc-current/src/snes/examples/tutorials/ex30.c.html

Thanks,
  Dave


>
>   Thanks,
>
>  Matt
>
>
>>
>>   Thanks,
>> Edson
>>
>>
>> On Tue, Oct 4, 2016 at 11:12 PM, Matthew Knepley 
>> wrote:
>>
>>> On Tue, Oct 4, 2016 at 9:02 PM, Somdeb Bandopadhyay 
>>> wrote:
>>>
 Dear all,
 I want to write a solver for incompressible navier stokes
 using python and I want to use PETsc (particularly dmda & ksp) for this.
 May I know if this type of work is feasible/already done?

>>>
>>> How do you plan to discretize your system? DMDA supports only
>>> collocation discretizations, so some sort of penalty for pressure would
>>> have to be employed.
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>>
 I intend to run my solver in a cluster and so am slightly
 concerned about the performance if I use python with petsc.
 My deepest apologies if this mail of mine caused you any
 inconvenience.

 Somdeb

>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] Performance of the Telescope Multigrid Preconditioner

2016-10-04 Thread Dave May
On Tuesday, 4 October 2016, frank <hengj...@uci.edu> wrote:

> Hi,
> This question is follow-up of the thread "Question about memory usage in
> Multigrid preconditioner".
> I used to have the "Out of Memory(OOM)" problem when using the
> CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0;
> -matptap_scalable" option did solve that problem.
>
> Then I test the scalability by solving a 3d poisson eqn for 1 step. I used
> one sub-communicator in all the tests. The difference between the petsc
> options in those tests are: 1 the pc_telescope_reduction_factor; 2 the
> number of multigrid levels in the up/down solver. The function "ksp_solve"
> is timed. It is kind of slow and doesn't scale at all.
>
> Test1: 512^3 grid points
> Core#   telescope_reduction_factor   MG levels # for up/down solver   Time for KSPSolve (s)
> 512     8                            4 / 3                            6.2466
> 4096    64                           5 / 3                            0.9361
> 32768   64                           4 / 3                            4.8914
>
> Test2: 1024^3 grid points
> Core#   telescope_reduction_factor   MG levels # for up/down solver   Time for KSPSolve (s)
> 4096    64                           5 / 4                            3.4139
> 8192    128                          5 / 4                            2.4196
> 16384   32                           5 / 3                            5.4150
> 32768   64                           5 / 3                            5.6067
> 65536   128                          5 / 3                            6.5219
>

You have to be very careful how you interpret these numbers. Your solver
contains nested calls to KSPSolve, and unfortunately as a result the
numbers you report include setup time. This will remain true even if you
call KSPSetUp on the outermost KSP.

Your email concerns scalability of the solver application, so let's focus
on that issue.

The only way to clearly separate setup from solve time is to perform two
identical solves. The second solve will not require any setup. You should
monitor the second solve via a new PetscStage.

This was what I did in the telescope paper. It was the only way to
understand the setup cost (and scaling) cf the solve time (and scaling).
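
A minimal sketch of what I mean (ksp, b and x are placeholders for your
already configured solver, rhs and solution):

  PetscErrorCode ierr;
  PetscLogStage  stage;
  ierr = PetscLogStageRegister("KSPSolve_2",&stage);CHKERRQ(ierr);

  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);   /* first solve: timing includes all setup */

  ierr = VecSet(x,0.0);CHKERRQ(ierr);       /* reset the initial guess */
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);   /* second solve: setup-free, this is the one to report */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

-log_summary (-log_view) will then report the second solve in its own stage.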

Thanks
  Dave



> I guess I didn't set the MG levels properly. What would be the efficient
> way to arrange the MG levels?
> Also which preconditionr at the coarse mesh of the 2nd communicator should
> I use to improve the performance?
>
> I attached the test code and the petsc options file for the 1024^3 cube
> with 32768 cores.
>
> Thank you.
>
> Regards,
> Frank
>
>
>
>
>
>
> On 09/15/2016 03:35 AM, Dave May wrote:
>
> HI all,
>
> The only unexpected memory usage I can see is associated with the call
> to MatPtAP().
> Here is something you can try immediately.
> Run your code with the additional options
>   -matrap 0 -matptap_scalable
>
> I didn't realize this before, but the default behaviour of MatPtAP in
> parallel is actually to explicitly form the transpose of P (e.g.
> assemble R = P^T) and then compute R.A.P.
> You don't want to do this. The option -matrap 0 resolves this issue.
>
> The implementation of P^T.A.P has two variants.
> The scalable implementation (with respect to memory usage) is selected via
> the second option -matptap_scalable.
>
> Try it out - I see a significant memory reduction using these options for
> particular mesh sizes / partitions.
>
> I've attached a cleaned up version of the code you sent me.
> There were a number of memory leaks and other issues.
> The main points being
>   * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
>   * You should call PetscFinalize(), otherwise the option -log_summary
> (-log_view) will not display anything once the program has completed.
>
>
> Thanks,
>   Dave
>
>
> On 15 September 2016 at 08:03, Hengjie Wang <hengj...@uci.edu> wrote:
>
>> Hi Dave,
>>
>> Sorry, I should have put more comment to explain the code.
>> The number of process in each dimension is the same: Px = Py=Pz=P. So is
>> the domain size.
>> So if the you want to run the code for a  512^3 grid point

[petsc-users] Write binary to matrix

2016-09-22 Thread Dave May
On Thursday, 22 September 2016, Florian Lindner wrote:

> Hello,
>
> I want to write a MATSBAIJ to a file in binary, so that I can load it
> later using MatLoad.
>
> However, I keep getting the error:
>
> [5]PETSC ERROR: No support for this operation for this object type!
> [5]PETSC ERROR: Cannot get subcomm viewer for binary files or sockets
> unless SubViewer contains the rank 0 process
> [6]PETSC ERROR: PetscViewerGetSubViewer_Binary() line 46 in
> /data/scratch/lindnefn/software/petsc/src/sys/classes/
> viewer/impls/binary/binv.c
>
> The rank 0 is included, as you can see below, I use PETSC_COMM_WORLD and
> the matrix is also created like that.
>
> The code looks like:
>
> PetscErrorCode ierr = 0;
> PetscViewer viewer;
> PetscViewerBinaryOpen(PETSC_COMM_WORLD, filename.c_str(),
> FILE_MODE_WRITE, &viewer); CHKERRV(ierr);
> MatView(matrix, viewer); CHKERRV(ierr);
> PetscViewerDestroy(&viewer);


The code snippet looks weird.

The error could be related to your usage of the error checking macros, e.g.
the fact that you set ierr to zero rather than assigning it the return value
of your petsc function calls.

You should do
ierr = petscfunc();CHKERRQ(ierr);

And why do you use CHKERRV and not CHKERRQ?
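
For reference, here is a sketch of the same write path with the return
values checked (assuming the enclosing function returns a PetscErrorCode;
if it returns void, keep CHKERRV instead):

  PetscErrorCode ierr;
  PetscViewer viewer;
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, filename.c_str(),
                               FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = MatView(matrix, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);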

Thanks
  Dave



>
> Thanks,
> Florian
>


Re: [petsc-users] Issue updating MUMPS ictnl after failed solve

2016-09-19 Thread Dave May
On 19 September 2016 at 21:05, David Knezevic 
wrote:

> When I use MUMPS via PETSc, one issue is that it can sometimes fail with
> MUMPS error -9, which means that MUMPS didn't allocate a big enough
> workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g.
> via the command line option -mat_mumps_icntl_14.
>
> However, instead of having to run several times with different command
> line options, I'd like to be able to automatically increment icntl 14 value
> in a loop until the solve succeeds.
>
> I have a saved matrix which fails when I use it for a solve with MUMPS
> with 4 MPI processes and the default ictnl values, so I'm using this to
> check that I can achieve the automatic icntl 14 update, as described above.
> (The matrix is 14MB so I haven't attached it here, but I'd be happy to send
> it to anyone else who wants to try this test case out.)
>
> I've pasted some test code below which provides a simple test of this idea
> using two solves. The first solve uses the default value of icntl 14, which
> fails, and then we update icntl 14 to 30 and solve again. The second solve
> should succeed since icntl 14 of 30 is sufficient for MUMPS to succeed in
> this case, but for some reason the second solve still fails.
>
> Below I've also pasted the output from -ksp_view, and you can see that
> ictnl 14 is being updated correctly (see the ICNTL(14) lines in the
> output), so it's not clear to me why the second solve fails. It seems like
> MUMPS is ignoring the update to the ictnl value?
>

I believe this parameter is utilized during the numerical factorization
phase.
In your code, the operator hasn't changed, however you haven't signalled to
the KSP that you want to re-perform the numerical factorization.
You can do this by calling KSPSetOperators() before your second solve.
I think if you do this (please try it), the factorization will be performed
again and the new value of icntl will have an effect.
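
That is, something like this (untested, using the variable names from your
test code below):

  Mat F;
  PCFactorGetMatrix(pc,&F);
  MatMumpsSetIcntl(F,14,30);
  KSPSetOperators(ksp,A,A);  /* signal that a new numerical factorization is required */
  KSPSolve(ksp,b,x);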

Note this is a wild stab in the dark - I haven't dug through the
petsc-mumps code in detail...

Thanks,
  Dave


>
>
> Thanks,
> David
>
> 
> -
> Test code:
>
>   Mat A;
>   MatCreate(PETSC_COMM_WORLD,&A);
>   MatSetType(A,MATMPIAIJ);
>
>   PetscViewer petsc_viewer;
>   PetscViewerBinaryOpen( PETSC_COMM_WORLD,
>                          "matrix.dat",
>                          FILE_MODE_READ,
>                          &petsc_viewer);
>   MatLoad(A, petsc_viewer);
>   PetscViewerDestroy(&petsc_viewer);
>
>   PetscInt m, n;
>   MatGetSize(A, &m, &n);
>
>   Vec x;
>   VecCreate(PETSC_COMM_WORLD,&x);
>   VecSetSizes(x,PETSC_DECIDE,m);
>   VecSetFromOptions(x);
>   VecSet(x,1.0);
>
>   Vec b;
>   VecDuplicate(x,&b);
>
>   KSP ksp;
>   PC pc;
>
>   KSPCreate(PETSC_COMM_WORLD,&ksp);
>   KSPSetOperators(ksp,A,A);
>
>   KSPSetType(ksp,KSPPREONLY);
>   KSPGetPC(ksp,&pc);
>
>   PCSetType(pc,PCCHOLESKY);
>
>   PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
>   PCFactorSetUpMatSolverPackage(pc);
>
>   KSPSetFromOptions(ksp);
>   KSPSetUp(ksp);
>
>   KSPSolve(ksp,b,x);
>
>   {
>     KSPConvergedReason reason;
>     KSPGetConvergedReason(ksp, &reason);
>     std::cout << "converged reason: " << reason << std::endl;
>   }
>
>   Mat F;
>   PCFactorGetMatrix(pc,&F);
>   MatMumpsSetIcntl(F,14,30);
>
>   KSPSolve(ksp,b,x);
>
>   {
>     KSPConvergedReason reason;
>     KSPGetConvergedReason(ksp, &reason);
>     std::cout << "converged reason: " << reason << std::endl;
>   }
>
> 
> -
> -ksp_view output (ICNTL(14) changes from 20 to 30, but we get "converged
> reason: -11" for both solves)
>
> KSP Object: 4 MPI processes
>   type: preonly
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using NONE norm type for convergence test
> PC Object: 4 MPI processes
>   type: cholesky
> Cholesky: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 0., needed 0.
>   Factored matrix follows:
> Mat Object: 4 MPI processes
>   type: mpiaij
>   rows=22878, cols=22878
>   package used to perform factorization: mumps
>   total: nonzeros=3361617, allocated nonzeros=3361617
>   total number of mallocs used during MatSetValues calls =0
> MUMPS run parameters:
>   SYM (matrix type):   2
>   PAR (host participation):1
>   ICNTL(1) (output for error): 6
>   ICNTL(2) (output of diagnostic msg): 0
>   ICNTL(3) (output for global info):   0
>   ICNTL(4) (level of printing):0
>   ICNTL(5) (input mat struct): 0
>   ICNTL(6) (matrix prescaling):7
>   

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-15 Thread Dave May
On Thursday, 15 September 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
>Should we have some simple selection of default algorithms based on
> problem size/number of processes? For example if using more than 1000
> processes then use scalable version etc?  How would we decide on the
> parameter values?


I don't like the idea of having "smart" selection by default as it's
terribly annoying for the user when they try and understand the performance
characteristics of a given method when they do a strong/weak scaling test.
If such a smart selection strategy was adopted, the details of it should be
made abundantly clear to the user.

These algs are dependent on many factors, thus making a smart
selection for all use cases hard / impossible.

I would be happy with unifying the three implementations with three different
options AND having these implementation options documented in the man page.
Maybe even the man page should advise users which to use in particular
circumstances (I think there is something similar on the VecScatter page).

I have these as suggestions for unifying the options names using bools

-matptap_explicit_transpose
-matptap_symbolic_transpose_dense
-matptap_symbolic_transpose

Or maybe an enum is clearer
-matptap_impl {explicit_pt,symbolic_pt_dense,symbolic_pt}

which are equivalent to these options
1) the current default
2) -matrap 0
3) -matrap 0 -matptap_scalable

Maybe there could be a fourth option
-matptap_dynamic_selection
which chooses the most appropriate alg given machine info, problem size,
partition size, etc. At least if the user explicitly chooses the
dynamic_selection mode, they wouldn't be surprised if there were any
bumps appearing in any scaling study they conducted.

Cheers
  Dave



>
>Barry
>
> > On Sep 15, 2016, at 5:35 AM, Dave May <dave.mayhe...@gmail.com> wrote:
> >
> > HI all,
> >
> > The only unexpected memory usage I can see is associated with the call
> to MatPtAP().
> > Here is something you can try immediately.
> > Run your code with the additional options
> >   -matrap 0 -matptap_scalable
> >
> > I didn't realize this before, but the default behaviour of MatPtAP in
> parallel is actually to explicitly form the transpose of P (e.g.
> assemble R = P^T) and then compute R.A.P.
> > You don't want to do this. The option -matrap 0 resolves this issue.
> >
> > The implementation of P^T.A.P has two variants.
> > The scalable implementation (with respect to memory usage) is selected
> via the second option -matptap_scalable.
> >
> > Try it out - I see a significant memory reduction using these options
> for particular mesh sizes / partitions.
> >
> > I've attached a cleaned up version of the code you sent me.
> > There were a number of memory leaks and other issues.
> > The main points being
> >   * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
> >   * You should call PetscFinalize(), otherwise the option -log_summary
> (-log_view) will not display anything once the program has completed.
> >
> >
> > Thanks,
> >   Dave
> >
> >
> > On 15 September 2016 at 08:03, Hengjie Wang <hengj...@uci.edu> wrote:
> > Hi Dave,
> >
> > Sorry, I should have put more comment to explain the code.
> > The number of process in each dimension is the same: Px = Py=Pz=P. So is
> the domain size.
> > So if the you want to run the code for a  512^3 grid points on 16^3
> cores, you need to set "-N 512 -P 16" in the command line.
> > I add more comments and also fix an error in the attached code. ( The
> error only effects the accuracy of solution but not the memory usage. )
> >
> > Thank you.
> > Frank
> >
> >
> > On 9/14/2016 9:05 PM, Dave May wrote:
> >>
> >>
> >> On Thursday, 15 September 2016, Dave May <dave.mayhe...@gmail.com> wrote:
> >>
> >>
> >> On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:
> >> Hi,
> >>
> >> I write a simple code to re-produce the error. I hope this can help to
> diagnose the problem.
> >> The code just solves a 3d poisson equation.
> >>
> >> Why is the stencil width a runtime parameter?? And why is the default
> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
> >>
> >> Was this choice made to mimic something in the real application code?
> >>
> >> Please ignore - I misunderstood your usage of the param set by -P
> >>
> >>
> >>
> >> I run the code on a 1024^3 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-15 Thread Dave May
HI all,

The only unexpected memory usage I can see is associated with the call to
MatPtAP().
Here is something you can try immediately.
Run your code with the additional options
  -matrap 0 -matptap_scalable

I didn't realize this before, but the default behaviour of MatPtAP in
parallel is actually to explicitly form the transpose of P (e.g.
assemble R = P^T) and then compute R.A.P.
You don't want to do this. The option -matrap 0 resolves this issue.

The implementation of P^T.A.P has two variants.
The scalable implementation (with respect to memory usage) is selected via
the second option -matptap_scalable.

Try it out - I see a significant memory reduction using these options for
particular mesh sizes / partitions.

I've attached a cleaned up version of the code you sent me.
There were a number of memory leaks and other issues.
The main points being
  * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
  * You should call PetscFinalize(), otherwise the option -log_summary
(-log_view) will not display anything once the program has completed.
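
(To be concrete about the options above: just append them to whatever run
command or options file you already use, e.g. something like
"mpiexec -n 4096 ./your_app -matrap 0 -matptap_scalable -log_summary",
where the executable name and rank count are placeholders.)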


Thanks,
  Dave


On 15 September 2016 at 08:03, Hengjie Wang <hengj...@uci.edu> wrote:

> Hi Dave,
>
> Sorry, I should have put more comment to explain the code.
> The number of process in each dimension is the same: Px = Py=Pz=P. So is
> the domain size.
> So if the you want to run the code for a  512^3 grid points on 16^3 cores,
> you need to set "-N 512 -P 16" in the command line.
> I add more comments and also fix an error in the attached code. ( The
> error only effects the accuracy of solution but not the memory usage. )
>
> Thank you.
> Frank
>
>
> On 9/14/2016 9:05 PM, Dave May wrote:
>
>
>
> On Thursday, 15 September 2016, Dave May <dave.mayhe...@gmail.com> wrote:
>
>>
>>
>> On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:
>>
>>> Hi,
>>>
>>> I write a simple code to re-produce the error. I hope this can help to
>>> diagnose the problem.
>>> The code just solves a 3d poisson equation.
>>>
>>
>> Why is the stencil width a runtime parameter?? And why is the default
>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
>>
>> Was this choice made to mimic something in the real application code?
>>
>
> Please ignore - I misunderstood your usage of the param set by -P
>
>
>>
>>
>>>
>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
>>> That's when I re-produce the OOM error. Each core has about 2G memory.
>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
>>> solver works fine.
>>> I attached the code, ksp_view_pre's output and my petsc option file.
>>>
>>> Thank you.
>>> Frank
>>>
>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>>
>>> Hi Barry,
>>>
>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it
>>> is not in file I sent you. I am sorry for the confusion.
>>>
>>> Regards,
>>> Frank
>>>
>>> On Friday, September 9, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>
>>>>
>>>> > On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote:
>>>> >
>>>> > Hi Barry,
>>>> >
>>>> > I think the first KSP view output is from -ksp_view_pre. Before I
>>>> submitted the test, I was not sure whether there would be OOM error or not.
>>>> So I added both -ksp_view_pre and -ksp_view.
>>>>
>>>>   But the options file you sent specifically does NOT list the
>>>> -ksp_view_pre so how could it be from that?
>>>>
>>>>Sorry to be pedantic but I've spent too much time in the past trying
>>>> to debug from incorrect information and want to make sure that the
>>>> information I have is correct before thinking. Please recheck exactly what
>>>> happened. Rerun with the exact input file you emailed if that is needed.
>>>>
>>>>Barry
>>>>
>>>> >
>>>> > Frank
>>>> >
>>>> >
>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>>>> >>   Why does ksp_view2.txt have two KSP views in it while
>>>> ksp_view1.txt has only one KSPView in it? Did you run two different solves
>>>> in the 2 case but not the one?
>>>> >>
>>>> >>   Barry
>>>> >>
>>>> >>
>>>> >>
>>>> >>> On

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-15 Thread Dave May
On Thursday, 15 September 2016, Hengjie Wang <hengj...@uci.edu> wrote:

> Hi Dave,
>
> Sorry, I should have put more comment to explain the code.
>

No problem. I was looking at the code after only 3 hrs of sleep


>
> The number of process in each dimension is the same: Px = Py=Pz=P. So is
> the domain size.
> So if the you want to run the code for a  512^3 grid points on 16^3 cores,
> you need to set "-N 512 -P 16" in the command line.
> I add more comments and also fix an error in the attached code. ( The
> error only effects the accuracy of solution but not the memory usage. )
>

Yep thanks, I see that now.

I know this is only a test, but this is kinda clunky. The dmda can
automatically choose the partition, and if the user wants control over it,
they can use the command line options -da_processors_{x,y,z} (as in your
options file).

For my testing purposes I'll have to tweak your code as I don't want to
always have to change two options when changing the partition size or mesh
size (as I'll certainly get it wrong every second time, leading to a loss of
my time due to queue wait times)

Thanks,
  Dave



>
>
> Thank you.
> Frank
>
> On 9/14/2016 9:05 PM, Dave May wrote:
>
>
>
> On Thursday, 15 September 2016, Dave May <dave.mayhe...@gmail.com> wrote:
>
>>
>>
>> On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:
>>
>>> Hi,
>>>
>>> I write a simple code to re-produce the error. I hope this can help to
>>> diagnose the problem.
>>> The code just solves a 3d poisson equation.
>>>
>>
>> Why is the stencil width a runtime parameter?? And why is the default
>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
>>
>> Was this choice made to mimic something in the real application code?
>>
>
> Please ignore - I misunderstood your usage of the param set by -P
>
>
>>
>>
>>>
>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
>>> That's when I re-produce the OOM error. Each core has about 2G memory.
>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
>>> solver works fine.
>>> I attached the code, ksp_view_pre's output and my petsc option file.
>>>
>>> Thank you.
>>> Frank
>>>
>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>>
>>> Hi Barry,
>>>
>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it
>>> is not in file I sent you. I am sorry for the confusion.
>>>
>>> Regards,
>>> Frank
>>>
>>> On Friday, September 9, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>
>>>>
>>>> > On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote:
>>>> >
>>>> > Hi Barry,
>>>> >
>>>> > I think the first KSP view output is from -ksp_view_pre. Before I
>>>> submitted the test, I was not sure whether there would be OOM error or not.
>>>> So I added both -ksp_view_pre and -ksp_view.
>>>>
>>>>   But the options file you sent specifically does NOT list the
>>>> -ksp_view_pre so how could it be from that?
>>>>
>>>>Sorry to be pedantic but I've spent too much time in the past trying
>>>> to debug from incorrect information and want to make sure that the
>>>> information I have is correct before thinking. Please recheck exactly what
>>>> happened. Rerun with the exact input file you emailed if that is needed.
>>>>
>>>>Barry
>>>>
>>>> >
>>>> > Frank
>>>> >
>>>> >
>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>>>> >>   Why does ksp_view2.txt have two KSP views in it while
>>>> ksp_view1.txt has only one KSPView in it? Did you run two different solves
>>>> in the 2 case but not the one?
>>>> >>
>>>> >>   Barry
>>>> >>
>>>> >>
>>>> >>
>>>> >>> On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote:
>>>> >>>
>>>> >>> Hi,
>>>> >>>
>>>> >>> I want to continue digging into the memory problem here.
>>>> >>> I did find a work around in the past, which is to use less cores
>>>> per node so that each core has 8G memory.

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-14 Thread Dave May
On Thursday, 15 September 2016, Dave May <dave.mayhe...@gmail.com> wrote:

>
>
> On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:
>
>> Hi,
>>
>> I write a simple code to re-produce the error. I hope this can help to
>> diagnose the problem.
>> The code just solves a 3d poisson equation.
>>
>
> Why is the stencil width a runtime parameter?? And why is the default
> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
>
> Was this choice made to mimic something in the real application code?
>

Please ignore - I misunderstood your usage of the param set by -P


>
>
>>
>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
>> That's when I re-produce the OOM error. Each core has about 2G memory.
>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
>> solver works fine.
>> I attached the code, ksp_view_pre's output and my petsc option file.
>>
>> Thank you.
>> Frank
>>
>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>
>> Hi Barry,
>>
>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it
>> is not in file I sent you. I am sorry for the confusion.
>>
>> Regards,
>> Frank
>>
>> On Friday, September 9, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>
>>>
>>> > On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote:
>>> >
>>> > Hi Barry,
>>> >
>>> > I think the first KSP view output is from -ksp_view_pre. Before I
>>> submitted the test, I was not sure whether there would be OOM error or not.
>>> So I added both -ksp_view_pre and -ksp_view.
>>>
>>>   But the options file you sent specifically does NOT list the
>>> -ksp_view_pre so how could it be from that?
>>>
>>>Sorry to be pedantic but I've spent too much time in the past trying
>>> to debug from incorrect information and want to make sure that the
>>> information I have is correct before thinking. Please recheck exactly what
>>> happened. Rerun with the exact input file you emailed if that is needed.
>>>
>>>Barry
>>>
>>> >
>>> > Frank
>>> >
>>> >
>>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>>> >>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt
>>> has only one KSPView in it? Did you run two different solves in the 2 case
>>> but not the one?
>>> >>
>>> >>   Barry
>>> >>
>>> >>
>>> >>
>>> >>> On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I want to continue digging into the memory problem here.
>>> >>> I did find a work around in the past, which is to use less cores per
>>> node so that each core has 8G memory. However this is deficient and
>>> expensive. I hope to locate the place that uses the most memory.
>>> >>>
>>> >>> Here is a brief summary of the tests I did in past:
>>> >>>> Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
>>> >>> Maximum (over computational time) process memory:   total
>>> 7.0727e+08
>>> >>> Current process memory:
>>>total 7.0727e+08
>>> >>> Maximum (over computational time) space PetscMalloc()ed:  total
>>> 6.3908e+11
>>> >>> Current space PetscMalloc()ed:
>>>   total 1.8275e+09
>>> >>>
>>> >>>> Test2:Mesh 1536*128*384  |  Process Mesh 96*8*24
>>> >>> Maximum (over computational time) process memory:   total
>>> 5.9431e+09
>>> >>> Current process memory:
>>>total 5.9431e+09
>>> >>> Maximum (over computational time) space PetscMalloc()ed:  total
>>> 5.3202e+12
>>> >>> Current space PetscMalloc()ed:
>>>total 5.4844e+09
>>> >>>
>>> >>>> Test3:Mesh 3072*256*768  |  Process Mesh 96*8*24
>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated the
>>> job during "KSPSolve".
>>> >>>
>>> >>> I attached the output of ksp_view( the third test's output is from
>>> ksp_view_pre ), mem

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-14 Thread Dave May
On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:

> Hi,
>
> I write a simple code to re-produce the error. I hope this can help to
> diagnose the problem.
> The code just solves a 3d poisson equation.
>

Why is the stencil width a runtime parameter?? And why is the default value
2? For 7-pnt FD Laplace, you only need a stencil width of 1.

Was this choice made to mimic something in the real application code?


>
> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
> That's when I re-produce the OOM error. Each core has about 2G memory.
> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
> solver works fine.
> I attached the code, ksp_view_pre's output and my petsc option file.
>
> Thank you.
> Frank
>
> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>
> Hi Barry,
>
> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it
> is not in file I sent you. I am sorry for the confusion.
>
> Regards,
> Frank
>
> On Friday, September 9, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>>
>> > On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote:
>> >
>> > Hi Barry,
>> >
>> > I think the first KSP view output is from -ksp_view_pre. Before I
>> submitted the test, I was not sure whether there would be OOM error or not.
>> So I added both -ksp_view_pre and -ksp_view.
>>
>>   But the options file you sent specifically does NOT list the
>> -ksp_view_pre so how could it be from that?
>>
>>Sorry to be pedantic but I've spent too much time in the past trying
>> to debug from incorrect information and want to make sure that the
>> information I have is correct before thinking. Please recheck exactly what
>> happened. Rerun with the exact input file you emailed if that is needed.
>>
>>Barry
>>
>> >
>> > Frank
>> >
>> >
>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>> >>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt
>> has only one KSPView in it? Did you run two different solves in the 2 case
>> but not the one?
>> >>
>> >>   Barry
>> >>
>> >>
>> >>
>> >>> On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I want to continue digging into the memory problem here.
>> >>> I did find a work around in the past, which is to use less cores per
>> node so that each core has 8G memory. However this is deficient and
>> expensive. I hope to locate the place that uses the most memory.
>> >>>
>> >>> Here is a brief summary of the tests I did in past:
>> >>>> Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
>> >>> Maximum (over computational time) process memory:   total
>> 7.0727e+08
>> >>> Current process memory:
>>total 7.0727e+08
>> >>> Maximum (over computational time) space PetscMalloc()ed:  total
>> 6.3908e+11
>> >>> Current space PetscMalloc()ed:
>> total 1.8275e+09
>> >>>
>> >>>> Test2:Mesh 1536*128*384  |  Process Mesh 96*8*24
>> >>> Maximum (over computational time) process memory:   total
>> 5.9431e+09
>> >>> Current process memory:
>>total 5.9431e+09
>> >>> Maximum (over computational time) space PetscMalloc()ed:  total
>> 5.3202e+12
>> >>> Current space PetscMalloc()ed:
>>  total 5.4844e+09
>> >>>
>> >>>> Test3:Mesh 3072*256*768  |  Process Mesh 96*8*24
>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated the
>> job during "KSPSolve".
>> >>>
>> >>> I attached the output of ksp_view( the third test's output is from
>> ksp_view_pre ), memory_view and also the petsc options.
>> >>>
>> >>> In all the tests, each core can access about 2G memory. In test3,
>> there are 4223139840 non-zeros in the matrix. This will consume about
>> 1.74M, using double precision. Considering some extra memory used to store
>> integer index, 2G memory should still be way enough.
>> >>>
>> >>> Is there a way to find out which part of KSPSolve uses the most
>> memory?
>> >>> Thank you so much.
>> >>>
>> >>> BTW, ther

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-14 Thread Dave May
Hi Frank,

On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:

> Hi,
>
> I write a simple code to re-produce the error. I hope this can help to
> diagnose the problem.
> The code just solves a 3d poisson equation.
> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
> That's when I re-produce the OOM error. Each core has about 2G memory.
> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
> solver works fine.
>

Perfect! That's very helpful, I can use this to track down where the issue
is coming from. Give me some time to figure this out.

Thanks,
  Dave



>
> I attached the code, ksp_view_pre's output and my petsc option file.
>
> Thank you.
> Frank
>
> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>
> Hi Barry,
>
> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it
> is not in file I sent you. I am sorry for the confusion.
>
> Regards,
> Frank
>
> On Friday, September 9, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>>
>> > On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote:
>> >
>> > Hi Barry,
>> >
>> > I think the first KSP view output is from -ksp_view_pre. Before I
>> submitted the test, I was not sure whether there would be OOM error or not.
>> So I added both -ksp_view_pre and -ksp_view.
>>
>>   But the options file you sent specifically does NOT list the
>> -ksp_view_pre so how could it be from that?
>>
>>Sorry to be pedantic but I've spent too much time in the past trying
>> to debug from incorrect information and want to make sure that the
>> information I have is correct before thinking. Please recheck exactly what
>> happened. Rerun with the exact input file you emailed if that is needed.
>>
>>Barry
>>
>> >
>> > Frank
>> >
>> >
>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>> >>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt
>> has only one KSPView in it? Did you run two different solves in the 2 case
>> but not the one?
>> >>
>> >>   Barry
>> >>
>> >>
>> >>
>> >>> On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I want to continue digging into the memory problem here.
>> >>> I did find a work around in the past, which is to use less cores per
>> node so that each core has 8G memory. However this is deficient and
>> expensive. I hope to locate the place that uses the most memory.
>> >>>
>> >>> Here is a brief summary of the tests I did in past:
>> >>>> Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
>> >>> Maximum (over computational time) process memory:   total
>> 7.0727e+08
>> >>> Current process memory:
>>total 7.0727e+08
>> >>> Maximum (over computational time) space PetscMalloc()ed:  total
>> 6.3908e+11
>> >>> Current space PetscMalloc()ed:
>> total 1.8275e+09
>> >>>
>> >>>> Test2:Mesh 1536*128*384  |  Process Mesh 96*8*24
>> >>> Maximum (over computational time) process memory:   total
>> 5.9431e+09
>> >>> Current process memory:
>>total 5.9431e+09
>> >>> Maximum (over computational time) space PetscMalloc()ed:  total
>> 5.3202e+12
>> >>> Current space PetscMalloc()ed:
>>  total 5.4844e+09
>> >>>
>> >>>> Test3:Mesh 3072*256*768  |  Process Mesh 96*8*24
>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated the
>> job during "KSPSolve".
>> >>>
>> >>> I attached the output of ksp_view( the third test's output is from
>> ksp_view_pre ), memory_view and also the petsc options.
>> >>>
>> >>> In all the tests, each core can access about 2G memory. In test3,
>> there are 4223139840 non-zeros in the matrix. This will consume about
>> 1.74M, using double precision. Considering some extra memory used to store
>> integer index, 2G memory should still be way enough.
>> >>>
>> >>> Is there a way to find out which part of KSPSolve uses the most
>> memory?
>> >>> Thank you so much.
>> >>>
>> >>> BTW, there are 4 options remains unused and I don't under

Re: [petsc-users] Coloring of -mat_view draw

2016-09-05 Thread Dave May
On 5 September 2016 at 10:43, Justin Chang  wrote:

> Hi all,
>
> So i used the following command-line options to view the non-zero
> structure of my assembled matrix:
>
> -mat_view draw -draw_pause -1
>
> And I got an image filled with cyan squares and dots. However, if I
> right-click on the image a couple times, I now get cyan, blue, and red
> squares. What do the different colors mean?
>

Red represents positive numbers.
Blue represents negative numbers.

I believe cyan represents allocated non-zero entries which were never
populated with entries (explicit zeroes). Someone will correct me if I am
wrong here regarding cyan...

Thanks,
  Dave


> Attached are the images of the two cases:
>
> Thanks,
> Justin
>


Re: [petsc-users] MatGetDiagonalBlock and shell matrices

2016-08-26 Thread Dave May
On Friday, 26 August 2016, Steven Dargaville <dargaville.ste...@gmail.com>
wrote:

> Hi Dave
>
> Thanks for the response. I'm actually using fortran and I wasn't sure that
> PetscObjectComposeFunction would be accessible, and if so, what sort of
> fortran magic I might need to call this function (possibly an interface,
> with or without c_opt).
>
> Do you know if it is possible to call that routine directly from fortran?
>

I don't know. I'll have to appeal to the other petsc users for an answer to
these questions.

Thanks,
  Dave


> Thanks
> Steven
>
> On Fri, Aug 26, 2016 at 1:35 PM, Dave May <dave.mayhe...@gmail.com> wrote:
>
>>
>>
>> On 26 August 2016 at 14:34, Dave May <dave.mayhe...@gmail.com> wrote:
>>
>>>
>>>
>>> On 26 August 2016 at 14:14, Steven Dargaville <dargaville.ste...@gmail.com> wrote:
>>>
>>>> Hi all
>>>>
>>>> I'm just wondering if there is any plans in the future for
>>>> MatGetDiagonalBlock to support shell matrices by registering a
>>>> user-implemented MATOP? MatGetDiagonal supports MATOP, but the block
>>>> version of this does not.
>>>>
>>>> I found a previous query on the user list which touched on this and
>>>> mentioned that it would be easy to add:
>>>>
>>>> http://lists.mcs.anl.gov/pipermail/petsc-users/2011-May/008700.html
>>>>
>>>> I have implemented a matrix-free multigrid algorithm using shell
>>>> operations in PETSc, and it would be very convenient to be able to provide
>>>> a local shell Mat so I could apply local GMRES (or other matvec-based
>>>> solvers) as a local block smoother.
>>>>
>>>
>>> It looks like the thing you need to do is use
>>> PetscObjectComposeFunction() and not MatShellSetOperation()
>>>
>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/
>>> Sys/PetscObjectComposeFunction.html#PetscObjectComposeFunction
>>>
>>> with your MatShell object.
>>>
>>> That is, instead of calling MatShellSetOperation(), call
>>> ierr = PetscObjectComposeFunction(myshell,"MatGetDiagonalBlock_C",
>>> MatGetDiagonalBlock_MyShell);CHKERRQ(ierr);
>>>
>>
>> Oops - I forgot the cast, the above should be
>>
>> ierr = PetscObjectComposeFunction((PetscObject)myshell,"MatGetDiago
>> nalBlock_C", MatGetDiagonalBlock_MyShell);CHKERRQ(ierr);
>>
>>
>>>
>>> where MatGetDiagonalBlock_MyShell is a function pointer to your method
>>> to get the diagonal block.
>>>
>>> Important detail is that you don't change the string
>>> "MatGetDiagonalBlock_C". The method MatGetDiagonalBlock() does a function
>>> pointer query using this string. See
>>> http://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface
>>> /matrix.c.html#MatGetDiagonalBlock
>>>
>>>
>>> Thanks
>>> Dave
>>>
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> Thanks!
>>>> Steven
>>>>
>>>>
>>>>
>>>
>>
>


Re: [petsc-users] MatGetDiagonalBlock and shell matrices

2016-08-26 Thread Dave May
On 26 August 2016 at 14:34, Dave May <dave.mayhe...@gmail.com> wrote:

>
>
> On 26 August 2016 at 14:14, Steven Dargaville <dargaville.ste...@gmail.com
> > wrote:
>
>> Hi all
>>
>> I'm just wondering if there is any plans in the future for
>> MatGetDiagonalBlock to support shell matrices by registering a
>> user-implemented MATOP? MatGetDiagonal supports MATOP, but the block
>> version of this does not.
>>
>> I found a previous query on the user list which touched on this and
>> mentioned that it would be easy to add:
>>
>> http://lists.mcs.anl.gov/pipermail/petsc-users/2011-May/008700.html
>>
>> I have implemented a matrix-free multigrid algorithm using shell
>> operations in PETSc, and it would be very convenient to be able to provide
>> a local shell Mat so I could apply local GMRES (or other matvec-based
>> solvers) as a local block smoother.
>>
>
> It looks like the thing you need to do is use PetscObjectComposeFunction()
> and not MatShellSetOperation()
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/
> Sys/PetscObjectComposeFunction.html#PetscObjectComposeFunction
>
> with your MatShell object.
>
> That is, instead of calling MatShellSetOperation(), call
> ierr = PetscObjectComposeFunction(myshell,"MatGetDiagonalBlock_C",
> MatGetDiagonalBlock_MyShell);CHKERRQ(ierr);
>

Oops - I forgot the cast, the above should be

ierr = PetscObjectComposeFunction((PetscObject)myshell,"
MatGetDiagonalBlock_C", MatGetDiagonalBlock_MyShell);CHKERRQ(ierr);


>
> where MatGetDiagonalBlock_MyShell is a function pointer to your method to
> get the diagonal block.
>
> Important detail is that you don't change the string
> "MatGetDiagonalBlock_C". The method MatGetDiagonalBlock() does a function
> pointer query using this string. See
> http://www.mcs.anl.gov/petsc/petsc-current/src/mat/
> interface/matrix.c.html#MatGetDiagonalBlock
>
>
> Thanks
> Dave
>
>
>
>
>
>
>>
>> Thanks!
>> Steven
>>
>>
>>
>


Re: [petsc-users] MatGetDiagonalBlock and shell matrices

2016-08-26 Thread Dave May
On 26 August 2016 at 14:14, Steven Dargaville 
wrote:

> Hi all
>
> I'm just wondering if there is any plans in the future for
> MatGetDiagonalBlock to support shell matrices by registering a
> user-implemented MATOP? MatGetDiagonal supports MATOP, but the block
> version of this does not.
>
> I found a previous query on the user list which touched on this and
> mentioned that it would be easy to add:
>
> http://lists.mcs.anl.gov/pipermail/petsc-users/2011-May/008700.html
>
> I have implemented a matrix-free multigrid algorithm using shell
> operations in PETSc, and it would be very convenient to be able to provide
> a local shell Mat so I could apply local GMRES (or other matvec-based
> solvers) as a local block smoother.
>

It looks like the thing you need to do is use PetscObjectComposeFunction()
and not MatShellSetOperation()

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/
PetscObjectComposeFunction.html#PetscObjectComposeFunction

with your MatShell object.

That is, instead of calling MatShellSetOperation(), call
ierr = PetscObjectComposeFunction(myshell,"MatGetDiagonalBlock_C",
MatGetDiagonalBlock_MyShell);CHKERRQ(ierr);

where MatGetDiagonalBlock_MyShell is a function pointer to your method to
get the diagonal block.

Important detail is that you don't change the string
"MatGetDiagonalBlock_C". The method MatGetDiagonalBlock() does a function
pointer query using this string. See
http://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matrix.c.html#MatGetDiagonalBlock
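
A rough, untested sketch of the shape this takes (the context struct and the
way you create/manage the local block are placeholders for your own code):

  typedef struct {
    Mat diag_block;  /* hypothetical: the local (possibly shell) Mat you manage yourself */
    /* ... whatever else your shell needs ... */
  } MyShellCtx;

  static PetscErrorCode MatGetDiagonalBlock_MyShell(Mat A,Mat *a)
  {
    MyShellCtx     *ctx;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatShellGetContext(A,&ctx);CHKERRQ(ierr);
    *a = ctx->diag_block;
    PetscFunctionReturn(0);
  }

  /* after creating your shell matrix "myshell" */
  ierr = PetscObjectComposeFunction((PetscObject)myshell,"MatGetDiagonalBlock_C",
                                    MatGetDiagonalBlock_MyShell);CHKERRQ(ierr);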


Thanks
Dave






>
> Thanks!
> Steven
>
>
>


[petsc-users] How to solve nonlinear F(x) = b(x)?

2016-08-08 Thread Dave May
On Monday, 8 August 2016, Neiferd, David John wrote:

> Hello all,
>
> I've been searching through the PETSc documentation to try to find how to
> solve a nonlinear system where the right hand side (b) varies as a function
> of the state variables (x).  According to the PETSc documentation, SNES
> solves the equations F(x) = b where b is a constant vector.  What would I
> do to solve F(x) = b(x)?  An example of this would be a nonlinear
> thermoelastic structure where as the structure deforms the direction of the
> loads generated by the thermal expansion changes as well.  Any insight into
> how to implement this is appreciated.
>

All you need to do is define the non-linear residual F (a vector) such that
it includes b(x)

E.g., suppose I have some discrete non-linear system of the form Ax = b(x);
then I would define F(x) as F(x) = Ax - b(x).
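
A rough sketch of the residual evaluation (the AppCtx contents and how you
evaluate b(x) are placeholders for your application):

  typedef struct {
    Mat A;   /* hypothetical: your assembled operator */
    Vec b;   /* hypothetical: work vector holding b(x) */
  } AppCtx;

  static PetscErrorCode FormFunction(SNES snes,Vec x,Vec F,void *ptr)
  {
    AppCtx         *user = (AppCtx*)ptr;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatMult(user->A,x,F);CHKERRQ(ierr);     /* F <- A x */
    /* ... evaluate the state-dependent rhs b(x) into user->b here ... */
    ierr = VecAXPY(F,-1.0,user->b);CHKERRQ(ierr);  /* F <- A x - b(x) */
    PetscFunctionReturn(0);
  }

which you would register via SNESSetFunction(snes,r,FormFunction,&user).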

Thanks,
  Dave


Re: [petsc-users] false-positive leak report in log_view?

2016-08-04 Thread Dave May
On 4 August 2016 at 10:10, Patrick Sanan  wrote:

> I have a patch that I got from Dave that he got from Jed which seems
> to be related to this. I'll make a PR.
>

Jed wrote this variant of the VTK viewer so please mark him as a reviewer
for my bug fix.


>
>
> On Wed, Aug 3, 2016 at 8:18 PM, Mohammad Mirzadeh 
> wrote:
> > OK so I just ran the example under valgrind, and if I use two VecViews,
> it
> > complains about following leak:
> >
> > ==66838== 24,802 (544 direct, 24,258 indirect) bytes in 1 blocks are
> > definitely lost in loss record 924 of 926
> > ==66838==at 0x19EBB: malloc (in
> >
> /usr/local/Cellar/valgrind/3.11.0/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
> > ==66838==by 0x10005E638: PetscMallocAlign (in
> > /usr/local/Cellar/petsc/3.7.2/real/lib/libpetsc.3.7.2.dylib)
> > ==66838==by 0x100405F00: DMCreate_DA (in
> > /usr/local/Cellar/petsc/3.7.2/real/lib/libpetsc.3.7.2.dylib)
> > ==66838==by 0x1003CFFA4: DMSetType (in
> > /usr/local/Cellar/petsc/3.7.2/real/lib/libpetsc.3.7.2.dylib)
> > ==66838==by 0x100405B7F: DMDACreate (in
> > /usr/local/Cellar/petsc/3.7.2/real/lib/libpetsc.3.7.2.dylib)
> > ==66838==by 0x1003F825F: DMDACreate2d (in
> > /usr/local/Cellar/petsc/3.7.2/real/lib/libpetsc.3.7.2.dylib)
> > ==66838==by 0x11D89: main (main_test.cpp:7)
> >
> > By I am destroying the dm ... also I dont get this when using a single
> > VecView. As a bonus info, PETSC_VIEWER_STDOUT_WORLD is just fine, so this
> > looks like it is definitely vtk related.
>

Mohammad,

I can confirm that this VTK functionality bleeds memory if you write more
than 1 vector to disk.
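
Until a fix is in place, one possible (untested) workaround sketch is to give
each write its own viewer, so that whatever the viewer has queued up is
flushed and freed when it is destroyed - e.g., using the names from your
example below (file names are placeholders):

  PetscViewerVTKOpen(PETSC_COMM_WORLD, "step_0.vts", FILE_MODE_WRITE, &vtk);
  VecView(sol, vtk);
  PetscViewerDestroy(&vtk);

  PetscViewerVTKOpen(PETSC_COMM_WORLD, "step_1.vts", FILE_MODE_WRITE, &vtk);
  VecView(sol, vtk);
  PetscViewerDestroy(&vtk);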

Cheers
  Dave



> >
> > On Wed, Aug 3, 2016 at 2:05 PM, Mohammad Mirzadeh 
> > wrote:
> >>
> >> On Wed, Aug 3, 2016 at 10:59 AM, Matthew Knepley 
> >> wrote:
> >>>
> >>> On Tue, Aug 2, 2016 at 12:40 PM, Mohammad Mirzadeh  >
> >>> wrote:
> 
>  I often use the memory usage information in log_view as a way to check
>  on memory leaks and so far it has worked perfect. However, I had long
>  noticed a false-positive report in memory leak for Viewers, i.e.
> destruction
>  count is always one less than creation.
> >>>
> >>>
> >>> Yes, I believe that is the Viewer being used to print this information.
> >>
> >>
> >> That makes sense.
> >>>
> >>>
> 
>  Today, I noticed what seems to be a second one. If you use VecView to
>  write the same DA to vtk, i.e. call VecView(A, vtk); twice, it also
> report a
>  memory leak for vectors, vecscatters, dm, etc. I am calling this a
>  false-positive since the code is valgrind-clean.
> 
>  Is this known/expected?
> >>>
> >>>
> >>> The VTK viewers have to hold everything they output until they are
> >>> destroyed since the format does not allow immediate writing.
> >>> I think the VTK viewer is not destroyed at the time of this output. Can
> >>> you make a small example that does this?
> >>
> >>
> >> Here's a small example that illustrates the issues
> >>
> >> #include <petsc.h>
> >>
> >> int main(int argc, char *argv[]) {
> >>
> >>   PetscInitialize(&argc, &argv, NULL, NULL);
> >>
> >>   DM dm;
> >>   DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
> >>                DMDA_STENCIL_BOX,
> >>                -10, -10, PETSC_DECIDE, PETSC_DECIDE, 1, 1, NULL, NULL,
> >>                &dm);
> >> //  DMDASetUniformCoordinates(dm, -1, 1, -1, 1, 0, 0);
> >>
> >>   Vec sol;
> >>   DMCreateGlobalVector(dm, &sol);
> >>   VecSet(sol, 0);
> >>
> >>   PetscViewer vtk;
> >>   PetscViewerVTKOpen(PETSC_COMM_WORLD, "test.vts", FILE_MODE_WRITE, &vtk);
> >>   VecView(sol, vtk);
> >> //  VecView(sol, vtk);
> >>   PetscViewerDestroy(&vtk);
> >>
> >>   DMDestroy(&dm);
> >>   VecDestroy(&sol);
> >>
> >>   PetscFinalize();
> >>   return 0;
> >> }
> >>
> >>
> >> If you uncomment the second VecView you get reports for leaks in
> >> VecScatter and dm. If you also uncomment the DMDASetUniformCoordinates,
> and
> >> use both VecViews, you also get a leak report for Vecs ... its quite
> bizarre
> >> ...
> >>
> >>>
> >>> I have switched to HDF5 and XDMF due to the limitations of VTK format.
> >>>
> >>
> >> I had used XDMF + raw binary in the past and was satisfied with the
> >> result. Do you write a single XDMF as a "post-processing" step when the
> >> simulation is finished? If I remember correctly preview could not open
> xmf
> >> files as time-series.
> >>>
> >>>   Thanks,
> >>>
> >>>  Matt
> >>>
> 
>  Here's the relevant bit from log_view:
> 
>  --- Event Stage 0: Main Stage
> 
>    Vector 8  7   250992 0.
>    Vector Scatter 2  00 0.
>  Distributed Mesh 2  00 0.
>  Star Forest Bipartite Graph 4  00 0.
>  

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-13 Thread Dave May
On 14 July 2016 at 01:07, frank <hengj...@uci.edu> wrote:

> Hi Dave,
>
> Sorry for the late reply.
> Thank you so much for your detailed reply.
>
> I have a question about the estimation of the memory usage. There are
> 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is
> used. So the memory per process is:
>   4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
> Did I do sth wrong here? Because this seems too small.
>

No - I totally f***ed it up. You are correct. That'll teach me for fumbling
around with my iphone calculator and not using my brain. (Note that to
convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot
convert between units correctly)
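
Written out: 4223139840 nonzeros * 8 bytes is ~33.8 GB in total for the fine
grid operator, and 33.8 GB / 18432 ranks is ~1.8 MB (~1.75 MiB) per rank.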

From the PETSc objects associated with the solver, it looks like it
_should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities
are: somewhere in your usage of PETSc you've introduced a memory leak;
PETSc is doing a huge over allocation (e.g. as per our discussion of
MatPtAP); or in your application code there are other objects you have
forgotten to log the memory for.



> I am running this job on Bluewater
> <https://bluewaters.ncsa.illinois.edu/user-guide>
>
I am using the 7 points FD stencil in 3D.
>

I thought so on both counts.


>
> I apologize that I made a stupid mistake in computing the memory per core.
> My settings render each core can access only 2G memory on average instead
> of 8G which I mentioned in previous email. I re-run the job with 8G memory
> per core on average and there is no "Out Of Memory" error. I would do more
> test to see if there is still some memory issue.
>

Ok. I'd still like to know where the memory was being used since my
estimates were off.


Thanks,
  Dave


>
> Regards,
> Frank
>
>
>
> On 07/11/2016 01:18 PM, Dave May wrote:
>
> Hi Frank,
>
>
> On 11 July 2016 at 19:14, frank <hengj...@uci.edu> wrote:
>
>> Hi Dave,
>>
>> I re-run the test using bjacobi as the preconditioner on the coarse mesh
>> of telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The
>> petsc option file is attached.
>> I still got the "Out Of Memory" error. The error occurred before the
>> linear solver finished one step. So I don't have the full info from
>> ksp_view. The info from ksp_view_pre is attached.
>>
>
> Okay - that is essentially useless (sorry)
>
>
>>
>> It seems to me that the error occurred when the decomposition was going
>> to be changed.
>>
>
> Based on what information?
> Running with -info would give us more clues, but will create a ton of
> output.
> Please try running the case which failed with -info
>
>
>> I had another test with a grid of 1536*128*384 and the same process mesh
>> as above. There was no error. The ksp_view info is attached for comparison.
>> Thank you.
>>
>
>
> [3] Here is my crude estimate of your memory usage.
> I'll target the biggest memory hogs only to get an order of magnitude
> estimate
>
> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI
> rank assuming double precision.
> The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit
> integers)
>
> * You use 5 levels of coarsening, so the other operators should represent
> (collectively)
> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4  ~ 300 MB per MPI rank on the
> communicator with 18432 ranks.
> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator
> with 18432 ranks.
>
> * You use a reduction factor of 64, making the new communicator with 288
> MPI ranks.
> PCTelescope will first gather a temporary matrix associated with your
> coarse level operator assuming a comm size of 288 living on the comm with
> size 18432.
> This matrix will require approximately 0.5 * 64 = 32 MB per core on the
> 288 ranks.
> This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus
> require another 32 MB per rank.
> The temporary matrix is now destroyed.
>
> * Because a DMDA is detected, a permutation matrix is assembled.
> This requires 2 doubles per point in the DMDA.
> Your coarse DMDA contains 92 x 16 x 48 points.
> Thus the permutation matrix will require < 1 MB per MPI rank on the
> sub-comm.
>
> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting
> operator will have the same memory footprint as the unpermuted matrix (32
> MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held
> in memory when the DMDA is provided.
>
> From my rough estimates, the worst case memory foot print for any given
> core, given your options is approximately
> 2100 MB + 300 MB + 32 MB + 32 MB + 1 M

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-11 Thread Dave May
Hi Frank,


On 11 July 2016 at 19:14, frank <hengj...@uci.edu> wrote:

> Hi Dave,
>
> I re-run the test using bjacobi as the preconditioner on the coarse mesh
> of telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The
> petsc option file is attached.
> I still got the "Out Of Memory" error. The error occurred before the
> linear solver finished one step. So I don't have the full info from
> ksp_view. The info from ksp_view_pre is attached.
>

Okay - that is essentially useless (sorry)


>
> It seems to me that the error occurred when the decomposition was going to
> be changed.
>

Based on what information?
Running with -info would give us more clues, but will create a ton of
output.
Please try running the case which failed with -info


> I had another test with a grid of 1536*128*384 and the same process mesh
> as above. There was no error. The ksp_view info is attached for comparison.
> Thank you.
>


[3] Here is my crude estimate of your memory usage.
I'll target the biggest memory hogs only to get an order of magnitude
estimate

* The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI
rank assuming double precision.
The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit
integers)

* You use 5 levels of coarsening, so the other operators should represent
(collectively)
2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4  ~ 300 MB per MPI rank on the
communicator with 18432 ranks.
The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator
with 18432 ranks.

* You use a reduction factor of 64, making the new communicator with 288
MPI ranks.
PCTelescope will first gather a temporary matrix associated with your
coarse level operator assuming a comm size of 288 living on the comm with
size 18432.
This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288
ranks.
This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus
require another 32 MB per rank.
The temporary matrix is now destroyed.

* Because a DMDA is detected, a permutation matrix is assembled.
This requires 2 doubles per point in the DMDA.
Your coarse DMDA contains 92 x 16 x 48 points.
Thus the permutation matrix will require < 1 MB per MPI rank on the
sub-comm.

* Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting
operator will have the same memory footprint as the unpermuted matrix (32
MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held
in memory when the DMDA is provided.

From my rough estimates, the worst case memory footprint for any given
core, given your options is approximately
2100 MB + 300 MB + 32 MB + 32 MB + 1 MB  = 2465 MB
This is way below 8 GB.

Note this estimate completely ignores:
(1) the memory required for the restriction operator,
(2) the potential growth in the number of non-zeros per row due to Galerkin
coarsening (I wish -ksp_view_pre reported the output from MatView so we
could see the number of non-zeros required by the coarse level operators)
(3) all temporary vectors required by the CG solver, and those required by
the smoothers.
(4) internal memory allocated by MatPtAP
(5) memory associated with IS's used within PCTelescope

So either I am completely off in my estimates, or you have not carefully
estimated the memory usage of your application code. Hopefully others might
examine/correct my rough estimates

Since I don't have your code I cannot assess the latter.
Since I don't have access to the same machine you are running on, I think
we need to take a step back.

[1] What machine are you running on? Send me a URL if its available

[2] What discretization are you using? (I am guessing a scalar 7 point FD
stencil)
If it's a 7 point FD stencil, we should be able to examine the memory usage
of your solver configuration using a standard, light weight existing PETSc
example, run on your machine at the same scale.
This would hopefully enable us to correctly evaluate the actual memory
usage required by the solver configuration you are using.

Thanks,
  Dave


>
>
> Frank
>
>
>
>
> On 07/08/2016 10:38 PM, Dave May wrote:
>
>
>
> On Saturday, 9 July 2016, frank <hengj...@uci.edu> wrote:
>
>> Hi Barry and Dave,
>>
>> Thank both of you for the advice.
>>
>> @Barry
>> I made a mistake in the file names in last email. I attached the correct
>> files this time.
>> For all the three tests, 'Telescope' is used as the coarse preconditioner.
>>
>> == Test1:   Grid: 1536*128*384,   Process Mesh: 48*4*12
>> Part of the memory usage:  Vector   125   124   3971904   0.
>>                             Matrix   101   101   9462372   0.
>>
>> == Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24
>> Part of the memory usage:  Vector   125   124   681672   0.
>>

[petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-08 Thread Dave May
On Saturday, 9 July 2016, frank > wrote:

> Hi Barry and Dave,
>
> Thank both of you for the advice.
>
> @Barry
> I made a mistake in the file names in last email. I attached the correct
> files this time.
> For all the three tests, 'Telescope' is used as the coarse preconditioner.
>
> == Test1:   Grid: 1536*128*384,   Process Mesh: 48*4*12
> Part of the memory usage:  Vector   125   124   3971904   0.
>                             Matrix   101   101   9462372   0.
>
> == Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24
> Part of the memory usage:  Vector   125   124   681672    0.
>                             Matrix   101   101   1462180   0.
>
> In theory, the memory usage in Test1 should be 8 times of Test2. In my
> case, it is about 6 times.
>
> == Test3: Grid: 3072*256*768,   Process Mesh: 96*8*24. Sub-domain per
> process: 32*32*32
> Here I get the out of memory error.
>
> I tried to use -mg_coarse jacobi. In this way, I don't need to set
> -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
> The linear solver didn't work in this case. Petsc output some errors.
>
> @Dave
> In test3, I use only one instance of 'Telescope'. On the coarse mesh of
> 'Telescope', I used LU as the preconditioner instead of SVD.
> If my set the levels correctly, then on the last coarse mesh of MG where
> it calls 'Telescope', the sub-domain per process is 2*2*2.
> On the last coarse mesh of 'Telescope', there is only one grid point per
> process.
> I still got the OOM error. The detailed petsc option file is attached.


Do you understand the expected memory usage for the particular parallel
LU implementation you are using? I don't (seriously). Replace LU with
bjacobi and re-run this test. My point about solver debugging is still
valid.

And please send the result of KSPView so we can see what is actually used
in the computations

Thanks
  Dave


>
>
> Thank you so much.
>
> Frank
>
>
>
> On 07/06/2016 02:51 PM, Barry Smith wrote:
>
>> On Jul 6, 2016, at 4:19 PM, frank  wrote:
>>>
>>> Hi Barry,
>>>
>>> Thank you for you advice.
>>> I tried three test. In the 1st test, the grid is 3072*256*768 and the
>>> process mesh is 96*8*24.
>>> The linear solver is 'cg' the preconditioner is 'mg' and 'telescope' is
>>> used as the preconditioner at the coarse mesh.
>>> The system gives me the "Out of Memory" error before the linear system
>>> is completely solved.
>>> The info from '-ksp_view_pre' is attached. I seems to me that the error
>>> occurs when it reaches the coarse mesh.
>>>
>>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24.
>>> The 3rd test uses the same grid but a different process mesh 48*4*12.
>>>
>> Are you sure this is right? The total matrix and vector memory usage
>> goes from 2nd test
>>Vector   384   383   8,193,712    0.
>>Matrix   103   103   11,508,688   0.
>> to 3rd test
>>   Vector   384   383   1,590,520   0.
>>Matrix   103   103   3,508,664    0.
>> that is the memory usage got smaller but if you have only 1/8th the
>> processes and the same grid it should have gotten about 8 times bigger. Did
>> you maybe cut the grid by a factor of 8 also? If so that still doesn't
>> explain it because the memory usage changed by a factor of 5 something for
>> the vectors and 3 something for the matrices.
>>
>>
>> The linear solver and petsc options in 2nd and 3rd tests are the same in
>>> 1st test. The linear solver works fine in both test.
>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is
>>> from the option '-log_summary'. I tried to use '-momery_info' as you
>>> suggested, but in my case petsc treated it as an unused option. It output
>>> nothing about the memory. Do I need to add sth to my code so I can use
>>> '-memory_info'?
>>>
>> Sorry, my mistake the option is -memory_view
>>
>>Can you run the one case with -memory_view and -mg_coarse jacobi
>> -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory
>> is used without the telescope? Also run case 2 the same way.
>>
>>Barry
>>
>>
>>
>> In both tests the memory usage is not large.
>>>
>>> It seems to me that it might be the 'telescope'  preconditioner that
>>> allocated a lot of memory and caused the error in the 1st test.
>>> Is there is a way to show how much memory it allocated?
>>>
>>> Frank
>>>
>>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>>>
Frank,

  You can run with -ksp_view_pre to have it "view" the KSP before
 the solve so hopefully it gets that far.

   Please run the problem that does fit with -memory_info when the
 problem completes it will show the "high water mark" for PETSc allocated
 memory and total memory used. We first want to look at these numbers to see

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-07 Thread Dave May
Hi Frank,

On 6 July 2016 at 00:23, frank  wrote:

> Hi,
>
> I am using the CG ksp solver and Multigrid preconditioner  to solve a
> linear system in parallel.
> I chose to use the 'Telescope' as the preconditioner on the coarse mesh
> for its good performance.
> The petsc options file is attached.
>
> The domain is a 3d box.
> It works well when the grid is  1536*128*384 and the process mesh is
> 96*8*24. When I double the size of grid and keep the same process mesh and
> petsc options, I get an "out of memory" error from the super-cluster I am
> using.
>

When you increase the mesh resolution, did you also increase the number
of effective MG levels?
If the number of levels was held constant, then your coarse grid is
increasing in size.
I notice that you coarsest grid solver is PCSVD.
This can become expensive as PCSVD will convert your coarse level
operator into a dense matrix and could be the cause of your OOM error.

Telescope does have to store a couple of temporary matrices, but generally
when used in the context of multigrid coarse level solves these operators
represent a very small fraction of the fine level operator.

We need to isolate whether it is these temporary matrices from telescope
causing the OOM error, or whether the error is caused by something else (e.g. PCSVD).



> Each process has access to at least 8G memory, which should be more than
> enough for my application. I am sure that all the other parts of my code(
> except the linear solver ) do not use much memory. So I doubt if there is
> something wrong with the linear solver.
> The error occurs before the linear system is completely solved so I don't
> have the info from ksp view. I am not able to re-produce the error with a
> smaller problem either.
> In addition,  I tried to use the block jacobi as the preconditioner with
> the same grid and same decomposition. The linear solver runs extremely slow
> but there is no memory error.
>
> How can I diagnose what exactly cause the error?
>

This is going to be kinda hard as I notice your configuration uses nested
calls to telescope.
You need to debug the solver configuration.

The only way I know to do this is by invoking telescope one step at a time.
By this I mean, use telescope once, check the configuration is what you
want.
 Then add the next instance of telescope.
For solver debugging  purposes, get rid of PCSVD.
The constant null space is propagated with telescope so you can just use an
iterative method.
Furthermore, for debugging purposes, you don't care about the solve time or
even convergence, so set -ksp_max_it 1 everywhere in your solver stack
(e.g. outer most KSP and on the coarsest level).

If one instance of telescope works, e.g. no OOM error occurs, add the next
instance of telescope.
If two instances of telescope also work (no OOM), revert back to PCSVD.
If now you have an OOM error, you should consider adding more levels, or
getting rid of PCSVD as your coarse grid solver.

Lastly, the option

-repart_da_processors_x 24

has been deprecated.
It now inherits the prefix from the solver running on the sub-communicator.
For your use case, it should be something like
  -mg_coarse_telescope_repart_da_processors_x 24
Use -options_left 1 to verify the option is getting picked up (another
useful tool for solver config debugging).


Cheers
  Dave



> Thank you so much.
>
> Frank
>


Re: [petsc-users] reusing matrix created with MatCreateMPIAIJWithSplitArrays

2016-06-30 Thread Dave May
On Thursday, 30 June 2016, Hassan Raiesi 
wrote:

> Hello,
>
>
>
> We are using PETSC in our CFD code, and  noticed that using
> “MatCreateMPIAIJWithSplitArrays” is almost 60% faster for large problem
> size (i.e DOF > 725M, using GAMG each time-step only takes 5sec, compared
> to 8.3 sec when assembling the matrix one row at a time using
> matsetvaluesblocked()  as recommended).
>
>
>
> The problem is that the memory usage goes up after each call to
> MatCreateMPIAIJWithSplitArrays  to update the matrix values.
>

Did you call MatDestroy() at each time step?

> As MatCreateMPIAIJWithSplitArrays is not supposed to copy the values, do
> we need to call it each time to update the values? We tried to just update
> the values of the diagonal and off-diagonal part of the arrays passed to
> “MatCreateMPIAIJWithSplitArrays”, (the sparsity structure is fixed) but it
> looks like that the values are not updated, what is the proper way to
> update the values of the matrix created by MatCreateMPIAIJWithSplitArrays?
>

The foolproof way would be to simply call MatCreateMPIAIJWithSplitArrays()
and MatDestroy() at each time step (or whenever the values are changed). Since
no data is actually copied, the overhead of creating the MPIAIJ matrix will
be minimal.
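
For illustration, a rough sketch of that pattern (the CSR arrays di,dj,da and
oi,oj,oa, the sizes m,n,M,N and the ksp/rhs/sol objects are assumed to already
exist and to follow the layout described in the MatCreateMPIAIJWithSplitArrays()
man page):

  Mat            A;
  PetscInt       step;
  PetscErrorCode ierr;

  for (step = 0; step < nsteps; step++) {
    /* the application updates the entries of da[] and oa[] in place */
    ierr = MatCreateMPIAIJWithSplitArrays(PETSC_COMM_WORLD,m,n,M,N,
                                          di,dj,da,oi,oj,oa,&A);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
    ierr = KSPSolve(ksp,rhs,sol);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr); /* cheap - your arrays are neither copied nor freed */
  }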

Thanks,
  Dave

>
>
>
>
> Thank you
>
>
>
> *Hassan Raiesi, *
>
> Advanced Aerodynamics Department
>
> Bombardier Aerospace
>
>
>
> hassan.rai...@aero.bombardier.com
> 
>
>
>
> *2351 boul. Alfred-Nobel (BAN1)*
>
> *Ville Saint-Laurent, Québec, H4S 2A9*
>
>
>
>
>
>
>
> Tél.
>
>   514-855-5001# 62204
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


Re: [petsc-users] How to have a local copy (sequential) of a parallel matrix

2016-06-29 Thread Dave May
On Wednesday, 29 June 2016, ehsan sadrfaridpour  wrote:

> I faced the below error during compiling my code for using
> MatGetSubMatrices.
>
> error: cannot convert ‘IS {aka _p_IS*}’ to ‘_p_IS* const*’ for argument
>> ‘3’ to ‘PetscErrorCode MatGetSubMatrices(Mat, PetscInt, _p_IS* const*,
>> _p_IS* const*, MatReuse, _p_Mat***)’
>>  MatGetSubMatrices(m_WA_norm_T, 1, set, set, MAT_INITIAL_MATRIX,
>> _local_W);
>>
>
> My code :
>
>> PetscMPIIntrank;
>> MPI_Comm_rank(PETSC_COMM_WORLD, );
>>
>> if(rank ==0){
>> Mat m_local_W;
>> MatCreateSeqAIJ(PETSC_COMM_SELF,num_points,num_points, num_nz,
>> NULL,_local_W);// try to reserve space for only number of final non zero
>> entries for each fine node (e.g. 4)
>> IS set;
>> ISCreateStride(PETSC_COMM_SELF, num_points, 0, 1, _row);
>> MatGetSubMatrices(m_WA_norm_T, 1, set_row, set_col,
>> MAT_INITIAL_MATRIX, _local_W);
>>
>> }
>>
>
> I followed below example:
>
> http://www.mcs.anl.gov/petsc/petsc-current/src/vec/is/is/examples/tutorials/ex2.c.html
>

This code won't work in parallel.
The man page says this function is collective on Mat. You need to move the
call to MatGetSubMatrices outside of the if(rank==0) block so that every rank calls it.
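
Here is a rough sketch of the collective usage Barry suggested (see his reply
quoted below): rank 0 requests every row/column through ISCreateStride(), the
other ranks request empty index sets. m_WA_norm_T is your matrix; the other
names are placeholders.

  Mat            *subA;
  IS             isrow,iscol;
  PetscInt       M,N,nr,nc;
  PetscMPIInt    rank;
  PetscErrorCode ierr;

  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  ierr = MatGetSize(m_WA_norm_T,&M,&N);CHKERRQ(ierr);
  nr = (rank == 0) ? M : 0;
  nc = (rank == 0) ? N : 0;
  ierr = ISCreateStride(PETSC_COMM_SELF,nr,0,1,&isrow);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF,nc,0,1,&iscol);CHKERRQ(ierr);
  /* collective on the Mat - every rank must make this call */
  ierr = MatGetSubMatrices(m_WA_norm_T,1,&isrow,&iscol,MAT_INITIAL_MATRIX,&subA);CHKERRQ(ierr);
  /* on rank 0, subA[0] is now a sequential copy of the entire matrix;
     clean up with ISDestroy() on both ISs and MatDestroyMatrices(1,&subA) when done */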



>
>
>
>
>
> On Wed, Jun 29, 2016 at 3:19 PM, ehsan sadrfaridpour  > wrote:
>
>> Thanks a lot for great support.
>>
>> On Wed, Jun 29, 2016 at 3:11 PM, Barry Smith > > wrote:
>>
>>>
>>>MatGetSubmatrices() just have the first process request all the rows
>>> and columns and the others request none. You can use ISCreateStride() to
>>> create the ISs without having to make an array of all the indices.
>>>
>>>
>>> > On Jun 29, 2016, at 1:43 PM, ehsan sadrfaridpour >> > wrote:
>>> >
>>> > Hi,
>>> >
>>> > I need to have access to most of elements of a parallel MPIAIJ matrix
>>> only from 1 process (rank 0).
>>> > I tried to copy or duplicate it to SEQAIJ, but I faced problems.
>>> >
>>> > How can I have a local copy of a matrix which is distributed on
>>> multiple process? I don't want to update the matrix, and the read-only
>>> version of it would be enough.
>>> >
>>> > Best,
>>> > Ehsan
>>> >
>>> >
>>>
>>>
>>
>


Re: [petsc-users] Tuning performance for simple solver

2016-06-03 Thread Dave May
On 3 June 2016 at 11:37, Michael Becker <
michael.bec...@physik.uni-giessen.de> wrote:

> Dear all,
>
> I have a few questions regarding possible performance enhancements for the
> PETSc solver I included in my project.
>
> It's a particle-in-cell plasma simulation written in C++, where Poisson's
> equation needs to be solved repeatedly on every timestep.
> The simulation domain is discretized using finite differences, so the
> solver therefore needs to be able to efficiently solve the linear system A
> x = b successively with changing b. The solution x of the previous timestep
> is generally a good initial guess for the solution.
>
> I wrote a class PETScSolver that holds all PETSc objects and necessary
> information about domain size and decomposition. To solve the linear
> system, two arrays, 'phi' and 'charge', are passed to a member function
> solve(), where they are copied to PETSc vectors, and KSPSolve() is called.
> After convergence, the solution is then transferred again to the phi array
> so that other program parts can use it.
>
> The matrix is created using DMDA. An array 'bound' is used to determine
> whether a node is either a Dirichlet BC or holds a charge.
>
> I attached three files, petscsolver.h, petscsolver.cpp and main.cpp, that
> contain a shortened version of the solver class and a set-up to initialize
> and run a simple problem.
>
> Is there anything I can change to generally make the program run faster?
>

Before changing anything, you should profile your code to see where time is
being spent.

To that end, you should compile an optimized build of petsc, link it to your
application and run your code with the option -log_summary. The
-log_summary flag will generate a performance profile of specific
functionality within petsc (KSPSolve, MatMult etc) so you can see where all
the time is being spent.

As a second round of profiling, you should consider registering the specific
functionality in your code that you think is performance critical.
You can do this using the function PetscLogStageRegister()

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Profiling/PetscLogStageRegister.html

Check out the examples listed at the bottom of this web page to see how to
log stages. Once you've registered stages, these will appear in the report
provided by -log_summary
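
As a minimal sketch (the stage name, and the ksp/b/x objects, are just
placeholders for whatever piece of your code you want to time):

  PetscLogStage  stage;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("PoissonSolve",&stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); /* everything between push/pop is attributed to the stage */
  ierr = PetscLogStagePop();CHKERRQ(ierr);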

Thanks,
  Dave


And, since I'm rather unexperienced with KSP methods, how do I efficiently
> choose PC and KSP? Just by testing every combination?
> Would multigrid be a viable option as a pure solver (-ksp_type preonly)?
>
> Thanks,
> Michael
>


Re: [petsc-users] use of VecGetArray() with dof>1 (block size > 1)

2016-05-27 Thread Dave May
Hi Ed,


On 28 May 2016 at 00:18, Ed Bueler <elbue...@alaska.edu> wrote:

> Dave --
>
> You are right that I had a cut-paste-edit error in my VecGetArray()
> invocation in the email.   Sorry about that.  I should of cut and pasted
> from my functioning, and DMDA-free, code.
>
> In this context I meant
>
> Vec v;
> ...  // create or load the Vec
> Node *u;
> VecGetArray(v,(PetscScalar**));
>
> This *is* correct, and it works fine in the right context, without memory
> leaks.  I am *not* using a DMDA in this case.  At all.
>

Ah okay.


>
> My original quote from the PETSc User Manual should be read *as is*,
> however.  It *does* refer to DMDAVecGetArray().  And you are right that
> *it* has an error: should be [j][i] not [i][j].
>

Oh crap, that indexing error is in the manual (multiple times)!
Yikes.


> And DMDAVecGetArray() does a cast like the above, but internally.
>
> Again, my original question was about distinquishing/explaining the
> different types of the returned pointers from DMDAVecGetArray() and
> VecGetArray().
>

Okay.

VecGetArray() knows nothing about any DM. It only provides access to the
underlying entries in the vector which are PetscScalar's. So the last arg
is naturally PetscScalar**.

DMDAVecGet{Array,ArrayDOF}() exploits the DMDA ijk structure.
DMDAVecGet{Array,ArrayDOF}() exist solely for the convenience of the user
who wants to write an FD code, and wishes to express the discrete operators
with multi-dimensional arrays, e.g. which can be indexed like x[j+1][i-1].

DMDAVecGetArray() maps entries from a Vec to a users data type, which can
be indexed with the dimension of the DMDA (ijk). Since the dimension of the
DMDA can be 1,2,3 and the blocksize (e.g. number of members in your user
struct) is defined by the user - the last arg must be void*.

DMDAVecGetArrayDOF() maps entries from a Vec to multi-dimensional array
indexed by the dimension of the DMDA AND the number of fields  (defined via
the block size). Since the dimension of the DMDA can be 1,2 or 3, again,
the last arg must be void* unless a separate API is introduced for 1D, 2D
and 3D.

Why do both exist? Well, Jed provided one reason - the block size may be a
runtime choice and thus the definition of the struct cannot be changed at
runtime. Another reason could be that the user just doesn't think it is useful
to attach names (i.e. through members in a struct) to their DOFs - hence
they want DMDAVecGetArrayDOF(). This could arise if you used the DMDA to
describe a DG discretization. The DOFs would then just represent
coefficients associated with your basis functions. Maybe the user just
prefers to write out loops.

I see DMDAVec{Get,Restore}XXX as tools to help the user.
The user can pick the API they prefer.

I use the DMDA, but I always use plain old VecGetArray() and obtain the
result in a variable declared as PetscScalar*. I don't bother with mapping
the entries into a struct. I see no advantage to this in terms of code
clarity. I prefer not to use DMDAVecGetArray() and DMDAVecGetArrayDOF() as
these methods allocate additional memory.
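
To make the comparison concrete, here is a sketch for a 2D DMDA with dof = 2
(X is assumed to be a global vector obtained from the same da; the manual
index arithmetic assumes the usual interlaced DMDA layout of the local block):

  PetscScalar    *v,***a;
  PetscInt       i,j,xs,ys,xm,ym,dof = 2;
  PetscErrorCode ierr;

  ierr = DMDAGetCorners(da,&xs,&ys,NULL,&xm,&ym,NULL);CHKERRQ(ierr);

  /* plain VecGetArray(): no extra memory, but you index the interlaced entries yourself */
  ierr = VecGetArray(X,&v);CHKERRQ(ierr);
  for (j=ys; j<ys+ym; j++) {
    for (i=xs; i<xs+xm; i++) {
      v[((j-ys)*xm + (i-xs))*dof + 0] = 1.0;
      v[((j-ys)*xm + (i-xs))*dof + 1] = 2.0;
    }
  }
  ierr = VecRestoreArray(X,&v);CHKERRQ(ierr);

  /* DMDAVecGetArrayDOF(): global (j,i) indexing, at the cost of some temporary memory */
  ierr = DMDAVecGetArrayDOF(da,X,&a);CHKERRQ(ierr);
  for (j=ys; j<ys+ym; j++) {
    for (i=xs; i<xs+xm; i++) {
      a[j][i][0] = 1.0;
      a[j][i][1] = 2.0;
    }
  }
  ierr = DMDAVecRestoreArrayDOF(da,X,&a);CHKERRQ(ierr);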

In my opinion, the line you refer to from the manual regarding
multi-component PDEs should only applied in the context of usage with the
DMDA. Others may disagree.

I hope I finally helped answered your question.


Cheers,
  Dave




>
> Ed
>
>
> On Fri, May 27, 2016 at 2:47 PM, Dave May <dave.mayhe...@gmail.com> wrote:
>
>>
>>
>> On 27 May 2016 at 21:24, Ed Bueler <elbue...@alaska.edu> wrote:
>>
>>> Dave --
>>>
>>> Perhaps you should re-read my questions.
>>>
>>
>> Actually - maybe we got our wires crossed from the beginning.
>> I'm going back to the original email as I missed something.
>>
>>
>>>> """
>>>> The recommended approach for multi-component PDEs is to declare a
>>>> struct representing the fields defined
>>>> at each node of the grid, e.g.
>>>>
>>>> typedef struct {
>>>> PetscScalar u,v,omega,temperature;
>>>> } Node;
>>>>
>>>> and write residual evaluation using
>>>>
>>>> Node **f,**u;
>>>> DMDAVecGetArray(DM da,Vec local,);
>>>> DMDAVecGetArray(DM da,Vec global,);
>>>> ...
>>>>
>>>>
>>
>> 1)  The third argument to DMDAVec{Get,Restore}Array() is of type "void
>>>> *".  It makes the above convenient.  But the third argument of the
>>>> unstructured version Vec{Get,Restore}Array() is of type "PetscScalar **",
>>>> which means that in an unstructured case, with the same Node struct, I
>>>> would write
>>>> "VecGetArray(DM da,Vec local,(PetscScalar **));"
&g

Re: [petsc-users] use of VecGetArray() with dof>1 (block size > 1)

2016-05-27 Thread Dave May
On 27 May 2016 at 21:24, Ed Bueler  wrote:

> Dave --
>
> Perhaps you should re-read my questions.
>

Actually - maybe we got our wires crossed from the beginning.
I'm going back to the original email as I missed something.


>> """
>> The recommended approach for multi-component PDEs is to declare a struct
>> representing the fields defined
>> at each node of the grid, e.g.
>>
>> typedef struct {
>> PetscScalar u,v,omega,temperature;
>> } Node;
>>
>> and write residual evaluation using
>>
>> Node **f,**u;
>> DMDAVecGetArray(DM da,Vec local,);
>> DMDAVecGetArray(DM da,Vec global,);
>> ...
>>
>>

1)  The third argument to DMDAVec{Get,Restore}Array() is of type "void *".
>> It makes the above convenient.  But the third argument of the unstructured
>> version Vec{Get,Restore}Array() is of type "PetscScalar **", which means
>> that in an unstructured case, with the same Node struct, I would write
>> "VecGetArray(DM da,Vec local,(PetscScalar **));"
>> to get the same functionality.  Why is it this way?  More specifically,
>> why not have the argument to VecGetArray() be of type "void *"?
>>
>
>
Is the quoted text
  "VecGetArray(DM da,Vec local,(PetscScalar **));"
really what you meant?

Sorry I didn't spot this on the first read, but probably you meant
something else as VecGetArray() only takes two args (Vec,PetscScalar**).

This code

  Node **u;
  VecGetArray(Vec local,(PetscScalar**));

would not be correct, neither would

  Node ***u;
  VecGetArray(Vec local,(PetscScalar**));
if the DMDA was defined in 3d.



>
> I would say the reason why the last arg to VecGetArray() is not void* is
> because it is intended to give you direct access to the pointer associated
> with the entries within the vector - these are also PetscScalar's
>
>
>>
>> 2) Given that the "recommended approach"
>>
>
> I don't believe it is ever recommended anywhere to do the following:
>   VecGetArray(DM da,Vec local,(PetscScalar**))
>
> Trying to trick the compile with such a cast is just begging for memory
> corruption to occur.
>
> above works just fine, why do DMDAVec{Get,Restore}ArrayDOF() exist?  (I.e.
>> is there something I am missing about C indexing?)
>>
>
> As an additional point, DMDAVec{Get,Restore}ArrayDOF() return
>
> void *array
>
> so that the same API will work for 1D, 2D and 3D DMDA's which would
> require PetscScalar **data, PetscScalar ***data, PetscScalar ****data
> respectively.
>
>
> Cheers,
>   Dave
>
>
>> 3) There are parts of the PETSc API that refer to "dof" and parts that
>> refer to "block size".  Is this a systematic distinction with an underlying
>> reason?  It seems "block size" is more generic, but also it seems that it
>> could replace "dof" everywhere.
>>
>> Thanks for addressing silly questions.
>>
>> Ed
>>
>>
>> --
>> Ed Bueler
>> Dept of Math and Stat and Geophysical Institute
>> University of Alaska Fairbanks
>> Fairbanks, AK 99775-6660
>> 301C Chapman and 410D Elvey
>> 907 474-7693 and 907 474-7199  (fax 907 474-5394)
>>
>
>


-- 
Ed Bueler
Dept of Math and Stat and Geophysical Institute
University of Alaska Fairbanks
Fairbanks, AK 99775-6660
301C Chapman and 410D Elvey
907 474-7693 and 907 474-7199  (fax 907 474-5394)


Re: [petsc-users] use of VecGetArray() with dof>1 (block size > 1)

2016-05-27 Thread Dave May
On 27 May 2016 at 20:34, Ed Bueler  wrote:

> Dear PETSc --
>
> This is an "am I using it correctly" question.  Probably the API has the
> current design because of something I am missing.
>
> First, a quote from the PETSc manual which I fully understand; it works
> great and gives literate code (to the extent possible...):
>
> """
> The recommended approach for multi-component PDEs is to declare a struct
> representing the fields defined
> at each node of the grid, e.g.
>
> typedef struct {
> PetscScalar u,v,omega,temperature;
> } Node;
>
> and write residual evaluation using
>
> Node **f,**u;
> DMDAVecGetArray(DM da,Vec local,);
> DMDAVecGetArray(DM da,Vec global,);
> ...
> f[i][j].omega = ...
>

Note that here the indexing should be
  f[ *j* ][ *i* ].omega



> ...
> DMDAVecRestoreArray(DM da,Vec local,);
> DMDAVecRestoreArray(DM da,Vec global,);
> """
>
> Now the three questions:
>
> 1)  The third argument to DMDAVec{Get,Restore}Array() is of type "void
> *".  It makes the above convenient.  But the third argument of the
> unstructured version Vec{Get,Restore}Array() is of type "PetscScalar **",
> which means that in an unstructured case, with the same Node struct, I
> would write
> "VecGetArray(DM da,Vec local,(PetscScalar **));"
> to get the same functionality.  Why is it this way?  More specifically,
> why not have the argument to VecGetArray() be of type "void *"?
>

I would say the reason why the last arg to VecGetArray() is not void* is
because it is intended to give you direct access to the pointer associated
with the entries within the vector - these are also PetscScalar's


>
> 2) Given that the "recommended approach"
>

I don't believe it is ever recommended anywhere to do the following:
  VecGetArray(DM da,Vec local,(PetscScalar**))

Trying to trick the compile with such a cast is just begging for memory
corruption to occur.

above works just fine, why do DMDAVec{Get,Restore}ArrayDOF() exist?  (I.e.
> is there something I am missing about C indexing?)
>

As an additional point, DMDAVec{Get,Restore}ArrayDOF() return

void *array

so that the same API will work for 1D, 2D and 3D DMDA's which would require
PetscScalar **data, PetscScalar ***data, PetscScalar ****data respectively.


Cheers,
  Dave


> 3) There are parts of the PETSc API that refer to "dof" and parts that
> refer to "block size".  Is this a systematic distinction with an underlying
> reason?  It seems "block size" is more generic, but also it seems that it
> could replace "dof" everywhere.
>
> Thanks for addressing silly questions.
>
> Ed
>
>
> --
> Ed Bueler
> Dept of Math and Stat and Geophysical Institute
> University of Alaska Fairbanks
> Fairbanks, AK 99775-6660
> 301C Chapman and 410D Elvey
> 907 474-7693 and 907 474-7199  (fax 907 474-5394)
>


Re: [petsc-users] accessing DMDA Vec ghost values

2016-05-12 Thread Dave May
Matt beat me to the punch... :D
Anyway, here is my more detailed answer.


> Thanks!  Somehow I missed DM{Get,Create}LocalVector().  BTW what is the
> difference between the Get and Create versions?  It is not obvious from the
> documentation.
>

The DMDA contains a pool of vectors (both local and global) which can be
re-used by the user. This avoids the need to continually allocate and
deallocate memory. Thus, the Get methods are simply an optimization.

The Get methods retrieve, from the pool, a vector which isn't currently in
use. In this case, you can think of Restore as returning the vector to
the pool to be used somewhere else.
If all vectors in the pool are in use, a new one will be allocated for you.
In this case, Restore will actually deallocate memory.

Since Get methods may return vectors which have been used elsewhere in the
code, you should always call VecZeroEntries() on them.

The Create methods ALWAYS allocate new memory and thus you ALWAYS need to
call Destroy on them. Vectors obtained via VecCreate() will always be
initialized with 0's.
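
In code the two patterns look like this (just a sketch):

  Vec            lv;
  PetscErrorCode ierr;

  /* Get/Restore: borrow a vector from the DM's internal pool */
  ierr = DMGetLocalVector(da,&lv);CHKERRQ(ierr);
  ierr = VecZeroEntries(lv);CHKERRQ(ierr); /* contents may be stale - zero them */
  /* ... use lv ... */
  ierr = DMRestoreLocalVector(da,&lv);CHKERRQ(ierr);

  /* Create/Destroy: always allocates new memory, always starts zeroed, you own it */
  ierr = DMCreateLocalVector(da,&lv);CHKERRQ(ierr);
  /* ... use lv ... */
  ierr = VecDestroy(&lv);CHKERRQ(ierr);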


Thanks,
  Dave


>
> Also, can you explain the difference between DMDAVecGetArrayDOF and
> DMDAVecGetArrayDOFRead?
>
> Thanks again,
> Sean
>
>
>
> Thanks,
>   Dave
>
>
>
>>
>> Thanks very much!
>>
>> Sean Dettrick
>>
>>
>


Re: [petsc-users] DMDAGetAO and AODestroy

2016-05-12 Thread Dave May
On 12 May 2016 at 13:01, Miorelli, Federico <federico.miore...@cgg.com>
wrote:

> Dave,
>
>
>
> Thanks for your answer.
>
> For consistency with other PETSc routines it would perhaps make sense to
> create a DMDARestoreAO function?
>

Not really. The pattern used here is the same as

DMGetCoordinateDM()
DMGetCoordinates()

etc

I agree it's not always immediately obvious whether
one should call destroy on the object returned.

The best rule I can suggest to follow is that if the man page doesn't
explicitly instruct you to call the destroy method, you should not call
destroy. If a destroy is required, there will be a note in the man page
indicating this, for example

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPChebyshevEstEigGetKSP.html

or

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMCompositeGetGlobalISs.html


The man pages are not 100% consistent:
Sometimes they will say "don't call destroy on object XXX as it is
used internally by YYY".
Other times it will mention the reference counter has been incremented.
Other times nothing is stated (implicitly meaning no destroy is required).


If in doubt, just email the petsc-users list :D

Thanks,
  Dave





>
>
> Regards,
>
> Federico
>
>
>
> *__* *__* *__*
>
> Federico Miorelli
>
>
>
> Senior R Geophysicist
>
> *Subsurface Imaging - General Geophysics **Italy*
>
>
>
> CGG Electromagnetics (Italy) Srl
>
>
>
>
>
> *From:* Dave May [mailto:dave.mayhe...@gmail.com]
> *Sent:* giovedì 12 maggio 2016 13:03
> *To:* Miorelli, Federico
> *Cc:* petsc-users@mcs.anl.gov
> *Subject:* Re: [petsc-users] DMDAGetAO and AODestroy
>
>
>
>
>
>
>
> On 12 May 2016 at 11:36, Miorelli, Federico <federico.miore...@cgg.com>
> wrote:
>
> In one of my subroutines I'm calling DMDAGetAO to get the application
> ordering from a DM structure.
>
> After using it I was calling AODestroy.
>
>
>
> Everything worked fine until I called the subroutine for the second time,
> when the program crashed.
>
> Removing the call to AODestroy solved the crash.
>
>
>
> Am I supposed to AODestroy the output of DMDAGetAO or not? I was worried
> that DMDAGetAO would allocate memory that I need to release.
>
>
>
> You are not supposed to call AODestroy() on the AO returned.
>
> The pointer being returned is used internally by the DMDA.
>
> Thanks
>
>   Dave
>
>
>
>
>
>
>
> Thanks,
>
>
>
> Federico
>
>
>
> *__* *__* *__*
>
> Federico Miorelli
>
>
>
> Senior R Geophysicist
>
> *Subsurface Imaging - General Geophysics **Italy*
>
>
>
> CGG Electromagnetics (Italy) Srl
>
>
>
>
>
>
>


Re: [petsc-users] DMDAGetAO and AODestroy

2016-05-12 Thread Dave May
On 12 May 2016 at 11:36, Miorelli, Federico 
wrote:

> In one of my subroutines I'm calling DMDAGetAO to get the application
> ordering from a DM structure.
>
> After using it I was calling AODestroy.
>
>
>
> Everything worked fine until I called the subroutine for the second time,
> when the program crashed.
>
> Removing the call to AODestroy solved the crash.
>
>
>
> Am I supposed to AODestroy the output of DMDAGetAO or not? I was worried
> that DMDAGetAO would allocate memory that I need to release.
>

You are not supposed to call AODestroy() on the AO returned.
The pointer being returned is used internally by the DMDA.
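
In other words (a sketch; n and indices stand for whatever application-ordering
indices you want to convert):

  AO             ao;
  PetscErrorCode ierr;

  ierr = DMDAGetAO(da,&ao);CHKERRQ(ierr); /* borrowed reference, owned by the DMDA */
  ierr = AOApplicationToPetsc(ao,n,indices);CHKERRQ(ierr);
  /* do NOT call AODestroy(&ao); it is freed when the DMDA itself is destroyed */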

Thanks
  Dave



>
>
> Thanks,
>
>
>
> Federico
>
>
>
> *__* *__* *__*
>
> Federico Miorelli
>
>
>
> Senior R Geophysicist
>
> *Subsurface Imaging - General Geophysics **Italy*
>
>
>
> CGG Electromagnetics (Italy) Srl
>
>
>


Re: [petsc-users] accessing DMDA Vec ghost values

2016-05-12 Thread Dave May
On 12 May 2016 at 10:42, Sean Dettrick  wrote:

> Hi,
>
> When discussing DMDAVecGetArrayDOF etc in section 2.4.4,  the PETSc 3.7
> manual says "The array is accessed using the usual global indexing on the
> entire grid, but the user may only refer to the local and ghost entries
> of this array as all other entries are undefined”.
>
> OK so far.  But how to access the ghost entries?
>
> With a 2D DMDA, I can do this OK:
>
>
> PetscIntxs,xm,ys,ym;
>
> ierr=DMDAGetCorners(da,,,0,,,0);CHKERRQ(ierr);
>
> PetscScalar ***es;
>
> ierr=DMDAVecGetArrayDOF(da,Es,);CHKERRQ(ierr);
>
>
> for (int j=ys; j < ys+ym; j++) {
>
> for (int i=xs; i < xs+xm;i++) {
>
> es[j][i][0]=1.;
>
> es[j][i][1]=1.;
>
> }
>
> }
>
> ierr=DMDAVecRestoreArrayDOF(da,Es,);CHKERRQ(ierr);
>
> But if I replace DMDAGetCorners with DMDAGetGhostCorners, then the code
> crashes with a seg fault, presumably due to out of bounds memory access.
>
> Is that supposed to happen?
>

If you created the vector Es using
the function DM{Get,Create}GlobalVector(), then the answer is yes.


> What’s the remedy?
>

If you want to access the ghost entries, you need to create the vector
using the function DM{Get,Create}LocalVector().
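
A sketch of the complete pattern (assuming Es is a global vector obtained from
the same 2D da with dof = 2, as in your snippet):

  Vec            Esl;
  PetscScalar    ***es,sum = 0.0;
  PetscInt       i,j,gxs,gys,gxm,gym;
  PetscErrorCode ierr;

  ierr = DMGetLocalVector(da,&Esl);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(da,Es,INSERT_VALUES,Esl);CHKERRQ(ierr); /* fill the ghost slots */
  ierr = DMGlobalToLocalEnd(da,Es,INSERT_VALUES,Esl);CHKERRQ(ierr);
  ierr = DMDAVecGetArrayDOF(da,Esl,&es);CHKERRQ(ierr);
  ierr = DMDAGetGhostCorners(da,&gxs,&gys,NULL,&gxm,&gym,NULL);CHKERRQ(ierr);
  for (j=gys; j<gys+gym; j++) {
    for (i=gxs; i<gxs+gxm; i++) {
      sum += es[j][i][0] + es[j][i][1]; /* ghost entries are now valid to read */
    }
  }
  ierr = DMDAVecRestoreArrayDOF(da,Esl,&es);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(da,&Esl);CHKERRQ(ierr);

If you also need written ghost entries to contribute to the global vector,
follow this with DMLocalToGlobalBegin/End() using ADD_VALUES.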

Thanks,
  Dave



>
> Thanks very much!
>
> Sean Dettrick
>
>


Re: [petsc-users] a crash due to memory issue

2016-04-30 Thread Dave May
On 30 April 2016 at 16:04, Ilyas YILMAZ  wrote:

> Hello,
>
> The code segment I wrote based on "src/dm/da/examples/tutorials/ex2.c"
> crashes when destroying things / freeing memory as given below.
> I can't figure out what I'm missing? Any comments are welcome. (Petsc
> 3.1.p8,
>

I'd strongly suggest updating to version 3.7 before proceeding any further.


> intel fortran compiler). The code and error are below.
>
> !...Create scatter from global DA parallel vector to local vector
> that contains all entries
> call DAGlobalToNaturalAllCreate(da,tolocalall,ierr)
> call DAGlobalToNaturalAllCreate(da,tolocalall2,ierr)
> call DAGlobalToNaturalAllCreate(da,tolocalall3,ierr)
>
> call VecCreateSeq(PETSC_COMM_SELF,im*jm*km,CSRENOMlocalall,ierr)
> call
> VecScatterBegin(tolocalall,CSRENOM,CSRENOMlocalall,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> call VecScatterEnd(
> tolocalall,CSRENOM,CSRENOMlocalall,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> call VecCreateSeq(PETSC_COMM_SELF,im*jm*km,CSDENOMlocalall,ierr)
> call
> VecScatterBegin(tolocalall2,CSDENOM,CSDENOMlocalall,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> call VecScatterEnd(
> tolocalall2,CSDENOM,CSDENOMlocalall,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> call VecCreateSeq(PETSC_COMM_SELF,im*jm*km,CSlocalall,ierr)
> call
> VecScatterBegin(tolocalall3,CS,CSlocalall,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> call VecScatterEnd(
> tolocalall3,CS,CSlocalall,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
>
> call
> VecGetArray(CSRENOMlocalall,scaCSRENOMlocalall,idCSRENOMlocalall,ierr)
> call
> VecGetArray(CSDENOMlocalall,scaCSDENOMlocalall,idCSDENOMlocalall,ierr)
> call VecGetArray( CSlocalall,scaCSlocalall
> ,idCSlocalall ,ierr)
>
>
>   SOME WORK HERE
>
>
> call VecRestoreArray(CSlocalall ,scaCSlocalall
> ,idCSlocalall,ierr)
> call
> VecRestoreArray(CSRENOMlocalall,scaCSRENOMlocalall,idCSRENOMlocalall,ierr)
> call
> VecRestoreArray(CSDENOMlocalall,scaCSDENOMlocalall,idCSDENOMlocalall,ierr)
>
>
> !...scatter back to global vector
> call DANaturalAllToGlobalCreate(da,fromlocalall,ierr)
> call DANaturalAllToGlobalCreate(da,fromlocalall2,ierr)
> call DANaturalAllToGlobalCreate(da,fromlocalall3,ierr)
>
> call
> VecScatterBegin(fromlocalall,CSRENOMlocalall,CSRENOM,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> call VecScatterEnd(
> fromlocalall,CSRENOMlocalall,CSRENOM,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> call
> VecScatterBegin(fromlocalall2,CSDENOMlocalall,CSDENOM,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> call VecScatterEnd(
> fromlocalall2,CSDENOMlocalall,CSDENOM,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> call
> VecScatterBegin(fromlocalall3,CSlocalall,CS,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> call VecScatterEnd(
> fromlocalall3,CSlocalall,CS,INSERT_VALUES,SCATTER_FORWARD,ierr)
>
> !..free memory
>call VecScatterDestroy(fromlocalall,ierr)
>call VecScatterDestroy(fromlocalall2,ierr)
>call VecScatterDestroy(fromlocalall3,ierr)
>
>   call VecScatterDestroy(tolocalall,ierr)
>   call VecScatterDestroy(tolocalall2,ierr)
>   call VecScatterDestroy(tolocalall3,ierr)
>
>   call VecDestroy(CSDENOMlocalall,ierr)
>   call VecDestroy(CSRENOMlocalall,ierr)
>   call VecDestroy(CSlocalall,ierr)
>
>
> .
> .
> .
> .
> .
> .
>  .. SGS model is called
>  MUT computed
>  MUT computed
>  MUT computed
>  MUT computed
> [1]PETSC ERROR:
> 
> [1]PETSC ERROR: [0]PETSC ERROR:
> 
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC
> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find
> memory corruption errors
> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
> run
> [0]PETSC ERROR: to get more information on the crash.
> .
> .
> .
> .
> .
>
> Thank you
>
> Ilyas.
>


Re: [petsc-users] Allocating memory for off-diagonal matrix blocks when using DMComposite

2016-04-27 Thread Dave May
> It is also inconsistent with some of the other DMShellSetXXX() as some
> type check the type (e.g. DMShellSetCreateMatrix), whilst some do, e.g.
> DMShellSetCoarsen().
>
>
ooops - I meant some setters DO check types and some DON'T

DMShellSetCreateMatrix() --> doesn't check the type
DMShellSetCoarsen() --> DOES check the type


Re: [petsc-users] Allocating memory for off-diagonal matrix blocks when using DMComposite

2016-04-27 Thread Dave May
On 27 April 2016 at 22:49, Jed Brown <j...@jedbrown.org> wrote:

> Dave May <dave.mayhe...@gmail.com> writes:
> > This always bugged me.
> > I prefer to access the pointer as at least it's clear what I am doing and
> > when reading the code later, I am not required to ask myself whether the
> DM
> > is actually a shell or not.
> >
> > Why doesn't there exist a generic setter for each object which allows one
> > to set a method for a particular operation?
>
> You're just objecting to the function being named DMShellSetCreateMatrix
> instead of DMSetCreateMatrix?
>

Absolutely!

It's inconsistent with nearly all other PETSc methods.
With all other methods, this function would only take effect if the DM was
of type shell, e.g. through using either PetscTryMethod(), or by using
PetscObjectTypeCompare(). This is the way it is done in PCShell.

It is also inconsistent with some of the other DMShellSetXXX() as some type
check the type (e.g. DMShellSetCreateMatrix), whilst some do, e.g.
DMShellSetCoarsen().



>
> > The implementation for Mat defines typedef enum { } MatOperation.
> > Using this, we could have
> >   PetscErrorCode MatSetOperation(Mat mat,MatOperation op,z)
> >
> > If there was a similar typedef enum for all other objects, an
> > XXXSetOperation() would be viable.
>
> The MatOperation enums are a bit of a maintenance burden and don't offer
> any type checking.  We have them for Mat and Vec (sort of) because there
> are so many methods.  We could add them for other objects, but would
> give up type checking relative to the existing specialized functions.
>

Agreed, we shouldn't give up on type checking. But type checking is ignored
with MatShellSetOperation()... This function seems to do approximately what
I proposed. I'm sure I could autogenerate the functions from the op structure so
it wouldn't be a huge burden (just a lot of code).

Could we at least make the (i) XXXShell objects APIs consistent with each
other in terms of how methods/operations are set, (ii) change all the shell
setters to only take effect if the type is "shell", and (iii) add type
checked setters for Mat, PC and DM (which appear to be the only methods
with shell implementations).

I'd be happy to do this is everyone approves.

Cheers
  Dave


Re: [petsc-users] Allocating memory for off-diagonal matrix blocks when using DMComposite

2016-04-26 Thread Dave May
On 26 April 2016 at 23:58, Jed Brown <j...@jedbrown.org> wrote:

> Dave May <dave.mayhe...@gmail.com> writes:
> > You are always free to over-ride the method
> >   dm->ops->creatematrix
> > with your own custom code to create
> > and preallocate the matrix.
>
> DMShellSetCreateMatrix()
>
> No need to include the private header.  I know this isn't great.
>

This always bugged me.
I prefer to access the pointer as at least it's clear what I am doing and
when reading the code later, I am not required to ask myself whether the DM
is actually a shell or not.

Why doesn't there exist a generic setter for each object which allows one
to set a method for a particular operation?

The implementation for Mat defines typedef enum { } MatOperation.
Using this, we could have
  PetscErrorCode MatSetOperation(Mat mat,MatOperation op,z)

If there was a similar typedef enum for all other objects, an
XXXSetOperation() would be viable.

Is there a good reason to not have such a setter in the library?


Thanks,
  Dave


Re: [petsc-users] Allocating memory for off-diagonal matrix blocks when using DMComposite

2016-04-26 Thread Dave May
On 26 April 2016 at 16:50, Gautam Bisht  wrote:

> I want to follow up on this old thread. If a user knows the exact fill
> pattern of the off-diagonal block (i.e. d_nz+o_nz  or d_nnz+o_nnz ), can
> one still not preallocate memory for the off-diagonal matrix when using
> DMComposite?
>

You are always free to over-ride the method
  dm->ops->creatematrix
with your own custom code to create
and preallocate the matrix.

You will need to add
  #include <petsc/private/dmimpl.h>
into your source file to expose the contents of the DM object.
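
A sketch of what that could look like for a DMComposite (the preallocation body
is left as a comment since it depends on your known d_nnz/o_nnz; as mentioned
above in this thread, DMShellSetCreateMatrix(pack,MyCreateMatrix) sets the same
function pointer without needing the private header):

  #include <petsc/private/dmimpl.h>

  PetscErrorCode MyCreateMatrix(DM dm,Mat *A)
  {
    PetscErrorCode ierr;
    PetscFunctionBeginUser;
    ierr = MatCreate(PetscObjectComm((PetscObject)dm),A);CHKERRQ(ierr);
    /* ... set the type/sizes and call MatMPIAIJSetPreallocation() (and
       MatSeqAIJSetPreallocation()) with your d_nnz/o_nnz, including the
       entries needed for the off-diagonal coupling blocks ... */
    PetscFunctionReturn(0);
  }

  /* after creating the DMComposite 'pack' */
  pack->ops->creatematrix = MyCreateMatrix;
  /* later calls to DMCreateMatrix(pack,&J) will now use MyCreateMatrix */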




>
> -Gautam.
>
> On Tue, Sep 30, 2014 at 3:48 PM, Jed Brown  wrote:
>
>> Gautam Bisht  writes:
>>
>> > Hi,
>> >
>> > The comment on line 419 of SNES ex 28.c
>> > <
>> http://www.mcs.anl.gov/petsc/petsc-current/src/snes/examples/tutorials/ex28.c.html
>> >
>> > says
>> > that the approach used in this example is not the best way to allocate
>> > off-diagonal blocks. Is there an example that shows a better way off
>> > allocating memory for off-diagonal matrix blocks when using DMComposite?
>>
>> The problem is that the allocation interfaces specialize on the matrix
>> type.  Barry wrote a DMCompositeSetCoupling(), but there are no
>> examples.  This is something that PETSc needs to improve.  I have been
>> unsuccessful at conceiving a _simple_ yet flexible/extensible solution.
>>
>
>


Re: [petsc-users] Understanding the -ksp_monitor_true_residual output

2016-04-12 Thread Dave May
On 12 April 2016 at 15:39, Aulisa, Eugenio  wrote:

> Hi,
>
> I am trying to understand better the meaning of
> the output obtained using the option
>
> -ksp_monitor_true_residual
>
> For this particular ksp gmres solver I used
>
> KSPSetTolerances( ksp, 1.0e-4, 1.0e-20, 1.0e50, 60);
>
> Below I am reporting the output for 5 different solutions
>
> In each of them the convergence reason is 2, the relative tolerance.
>
> If I well understood the manual I would expect the solver to exit when the
> output of the third column ||r(i)||/||b|| would fall below 1.0e-04, but it
> is not.
>

By default, GMRES uses left preconditioning,
thus iterations are terminated when || P^{-1} r_i || / || P^{-1} b || <
1e-4
With GMRES, you can change the side of the preconditioning via
  -ksp_pc_side right
(note that not all methods support both left and right preconditioning)

FGMRES and GCR use right preconditioning by default
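
In code, the equivalent of -ksp_pc_side right is something like the following
(a sketch, reusing the tolerances from your message):

  ierr = KSPSetType(ksp,KSPGMRES);CHKERRQ(ierr);
  ierr = KSPSetPCSide(ksp,PC_RIGHT);CHKERRQ(ierr); /* convergence is then tested on the true residual */
  ierr = KSPSetTolerances(ksp,1.0e-4,1.0e-20,1.0e50,60);CHKERRQ(ierr);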


Thanks,
  Dave

Only in the last solution this is the case.
> So probably I did not understood well the manual, and there is a rescaling
> involved.
>
> Can anybody clarify?
>
> Thanks,
> Eugenio Aulisa
>
>
>
>* solution 1 *
>   0 KSP preconditioned resid norm 3.410562043865e+01 true resid norm
> 2.677893657847e-02 ||r(i)||/||b|| 3.993927934126e+00
>   1 KSP preconditioned resid norm 5.824255715066e+00 true resid norm
> 1.393228808527e-02 ||r(i)||/||b|| 2.077922489826e+00
>   2 KSP preconditioned resid norm 2.044834259922e+00 true resid norm
> 9.998998076659e-03 ||r(i)||/||b|| 1.491294384100e+00
>   3 KSP preconditioned resid norm 7.983786718994e-01 true resid norm
> 7.993995927780e-03 ||r(i)||/||b|| 1.192259578632e+00
>   4 KSP preconditioned resid norm 5.088731995172e-01 true resid norm
> 8.593150583543e-03 ||r(i)||/||b|| 1.281620129208e+00
>   5 KSP preconditioned resid norm 4.857006661570e-01 true resid norm
> 9.145941228163e-03 ||r(i)||/||b|| 1.364065748018e+00
>   6 KSP preconditioned resid norm 4.709624056986e-01 true resid norm
> 8.186977687270e-03 ||r(i)||/||b|| 1.221041723798e+00
>   7 KSP preconditioned resid norm 3.220045108449e-01 true resid norm
> 4.352636436074e-03 ||r(i)||/||b|| 6.491712692992e-01
>   8 KSP preconditioned resid norm 2.317017366518e-01 true resid norm
> 3.068984741076e-03 ||r(i)||/||b|| 4.577218311441e-01
>   9 KSP preconditioned resid norm 1.910797150631e-01 true resid norm
> 2.564718555990e-03 ||r(i)||/||b|| 3.825133628412e-01
>  10 KSP preconditioned resid norm 1.729613253747e-01 true resid norm
> 2.555884137267e-03 ||r(i)||/||b|| 3.811957589244e-01
>  11 KSP preconditioned resid norm 1.708323617518e-01 true resid norm
> 2.699641560429e-03 ||r(i)||/||b|| 4.026363708929e-01
>  12 KSP preconditioned resid norm 1.606003434286e-01 true resid norm
> 2.148175491821e-03 ||r(i)||/||b|| 3.203883051534e-01
>  13 KSP preconditioned resid norm 1.205953154320e-01 true resid norm
> 1.319733567910e-03 ||r(i)||/||b|| 1.968308467752e-01
>  14 KSP preconditioned resid norm 6.473668392252e-02 true resid norm
> 7.750007977497e-04 ||r(i)||/||b|| 1.155870146685e-01
>  15 KSP preconditioned resid norm 4.210976512647e-02 true resid norm
> 3.627137178803e-04 ||r(i)||/||b|| 5.409671312704e-02
>  16 KSP preconditioned resid norm 2.167079981806e-02 true resid norm
> 1.885937330161e-04 ||r(i)||/||b|| 2.812769567181e-02
>*** MG linear solver time: 1.710646e+01
>*** Number of outer ksp solver iterations = 16
>*** Convergence reason = 2
>*** Residual norm =  0.0216708
>
>
>
>
>
>* solution 2 *
>* Level Max 4 MG PROJECTION MATRICES TIME:   1.199155
>* Level Max 4 MGINIT TIME:   0.691589
>* Level Max 4 ASSEMBLY TIME: 11.470600
>  * Linear Cycle + Residual Update iteration 1
>*** Linear iteration 1 ***
>   0 KSP preconditioned resid norm 5.264510580035e+01 true resid norm
> 4.446470526095e-03 ||r(i)||/||b|| 1.343097318343e+00
>   1 KSP preconditioned resid norm 1.194632734776e+00 true resid norm
> 2.650838633150e-03 ||r(i)||/||b|| 8.007101899472e-01
>   2 KSP preconditioned resid norm 2.514505382950e-01 true resid norm
> 8.961867887367e-04 ||r(i)||/||b|| 2.707014621199e-01
>   3 KSP preconditioned resid norm 4.126701684642e-02 true resid norm
> 2.828561525607e-04 ||r(i)||/||b|| 8.543930242012e-02
>   4 KSP preconditioned resid norm 2.990078801994e-02 true resid norm
> 3.012123557665e-04 ||r(i)||/||b|| 9.098396242766e-02
>   5 KSP preconditioned resid norm 2.498986435717e-02 true resid norm
> 3.492907889867e-04 ||r(i)||/||b|| 1.055064953780e-01
>   6 KSP preconditioned resid norm 2.212280334220e-02 true resid norm
> 2.649097730890e-04 ||r(i)||/||b|| 8.001843344081e-02
>   7 KSP preconditioned resid norm 1.270611663385e-02 true resid norm
> 1.114378752133e-04 ||r(i)||/||b|| 3.366083514611e-02
>*** MG linear solver time: 1.032488e+01
>*** Number of 

Re: [petsc-users] How to extract array portions

2016-04-06 Thread Dave May
On 6 April 2016 at 16:20, FRANCAVILLA MATTEO ALESSANDRO <d019...@polito.it>
wrote:

> Fantastic, it definitely solved my problem! Thank you!
>
> Should I always create the vectors with MatCreateVecs when I use them for
> matrix-vector multiplications? I used to create vectors with VecCreate, and
> it has always worked fine with "standard" matrices (matrices not created
> with MATNEST structure).
>

Yes


>
> In the example at
> http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex2.c.html
> I read the following:
>
> "PETSc automatically generates appropriately partitioned matrices and
> vectors when MatCreate() and VecCreate() are used with the same
> communicator."
>

This comment also assumes that you use PETSC_DECIDE for the local sizes. So
if you create a matrix with global size M x M and a vector of global size M
and you (i) use PETSC_DECIDE for the local sizes for the row/col with
MatCreate() and PETSC_DECIDE for the local size with VecCreate(); and (ii)
you use the same communicator - then the row partitioning adopted by the
matrix will be consistent with the row partitioning for the vector.



> It is not clear to me when I can simply use VecCreate (as I've always done
> so far), and when to use MatCreateVecs...
>

You should always use MatCreateVecs(). The implementation of Mat will give
you back a consistent vector.
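
For your nest matrix the sequence would look something like this in C (a
sketch; the block names are taken from your earlier message):

  Mat            subs[4],TheMatrix;
  Vec            x,y;
  PetscErrorCode ierr;

  subs[0] = L_discretized; subs[1] = K_discretized; /* row 0 */
  subs[2] = L_MR;          subs[3] = K_MR;          /* row 1 */
  ierr = MatCreateNest(PETSC_COMM_WORLD,2,NULL,2,NULL,subs,&TheMatrix);CHKERRQ(ierr);
  ierr = MatCreateVecs(TheMatrix,&x,&y);CHKERRQ(ierr); /* x,y inherit compatible layouts */
  /* fill x with VecSet()/VecSetValues() + VecAssemblyBegin/End() as before */
  ierr = MatMult(TheMatrix,x,y);CHKERRQ(ierr);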

Thanks,
  Dave


>
>
>
> On Wed, 6 Apr 2016 14:38:12 +0200
>
>  Dave May <dave.mayhe...@gmail.com> wrote:
>
>> On 6 April 2016 at 14:12, FRANCAVILLA MATTEO ALESSANDRO <
>> d019...@polito.it>
>> wrote:
>>
>> Thanks Dave, that's exactly what I was looking for, and it (partially)
>>> solved my problem.
>>>
>>> The MatMult operation now works fine, unfortunately not for every number
>>> of MPI processes (while the MatMult involving each submatrix works fine
>>> for
>>> any number of processes):
>>>
>>> 
>>> matArray(1) = L_discretized
>>> matArray(2) = K_discretized
>>> matArray(3) = L_MR
>>> matArray(4) = K_MR
>>> CALL
>>>
>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_OBJECT,2,PETSC_NULL_OBJECT,matArray,TheMatrix,IERR)
>>> CALL MatSetFromOptions(TheMatrix,IERR)
>>> !
>>> CALL VecCreate(PETSC_COMM_WORLD,x,IERR)
>>> CALL VecSetSizes(x,PETSC_DECIDE,2*NumberOfUnknowns,IERR)
>>> CALL VecSetFromOptions(x,IERR)
>>> CALL VecSet(x,0.+0.*PETSC_i,IERR)
>>> CALL VecSetValues(x,1,0,1.+0.*PETSC_i,INSERT_VALUES,IERR)
>>> CALL VecAssemblyBegin(x,IERR)
>>> CALL VecAssemblyEnd(x,IERR)
>>> CALL VecCreate(PETSC_COMM_WORLD,y,IERR)
>>> CALL VecSetSizes(y,PETSC_DECIDE,NumberOfUnknowns+NumberOfMeasures,IERR)
>>> CALL VecSetFromOptions(y,IERR)
>>> CALL MatMult(TheMatrix,x,y,IERR)
>>> 
>>>
>>>
>>> I guess there is something wrong with the distribution of vectors between
>>> the MPI processes. With the specific sizes of my problem everything works
>>> fine with 1-2-3-6 processes, otherwise I get "Nonconforming object sizes"
>>> error, e.g. with 8 processes:
>>>
>>> [2]PETSC ERROR: - Error Message
>>> --
>>> [2]PETSC ERROR: Nonconforming object sizes
>>> [2]PETSC ERROR: Mat mat,Vec y: local dim 383 382
>>> [2]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015
>>> [2]PETSC ERROR: Nonconforming object sizes
>>> ./ISS on a arch-linux2-c-debug named alessandro-HP-ProDesk-490-G2-MT by
>>> alessandro Wed Apr  6 13:31:45 2016
>>> [2]PETSC ERROR: Configure options
>>> --with-blas-lapack-dir=/opt/intel/composer_xe_2015.2.164/mkl
>>> --with-mpi-cc=/home/alessandro/.openmpi-1.8.3/bin/mpicc
>>> --with-mpi-f90=/home/alessandro/.openmpi-1.8.3/bin/mpif90
>>> --with-mpiexec=/home/alessandro/.openmpi-1.8.3/bin/mpiexec
>>> --with-scalar-type=complex
>>> [2]PETSC ERROR: #1 MatMult() line 2216 in
>>> /home/alessandro/CODE/PETSc/petsc-3.6.3/src/mat/interface/matrix.c
>>> ...
>>> [6]PETSC ERROR: Mat mat,Vec y: local dim 381 382
>>> ...
>>> [7]PETSC ERROR: Mat mat,Vec y: local dim 381 382
>>> ...
>>> [1]PETSC ERROR: Mat mat,Vec y: local dim 383 382
>>> ...
>>>
>>>
>>> I noticed that the code works (runs and provides correct results) when
>>> the
>>> number of processes divides the length of y2 (number of rows of A21 and
>>> A22); however when I apply MatMult separately on A21 and A22 everyt

Re: [petsc-users] How to extract array portions

2016-04-06 Thread Dave May
On 6 April 2016 at 11:08, FRANCAVILLA MATTEO ALESSANDRO 
wrote:

> Hi,
>
> I'm trying to set a linear problem with a 2x2 block matrix (say A=[A11
> A12; A21 A22] in matlab notation). Obviously I could generate a single
> matrix, but I would like to keep it as a 2x2 block matrix with the 4 blocks
> generated and used independently (e.g., because at some point I need to
> change some of those blocks to matrix-free). My plan was then to generate a
> MatShell and set a MATOP_MULT shell operation to explicitly compute y=A*x
> by splitting x=[x1;x2] and y=[y1;y2] and carrying out the single
> matrix-vector multiplications, and reassembling the final result.


Take a look at MatNest - it does exactly what you want.

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateNest.html

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html#MATNEST

Thanks,
  Dave


> My problem is that my efforts in extracting x1 and x2 from the original x
> vector have failed. The following Fortran code is what seemed to me the
> most reasonable thing to do to set the matrix-vector multiplication, but it
> does not work
>
>
> !==
> subroutine MyMult(A,x,y,IERR)
>
> implicit none
>
> #include 
> #include 
> #include 
> #include 
>
> Mat A
> Vec x,y, x1, x2, y1, y2
> IS  :: ix1, ix2, iy1, iy2
> PetscErrorCode ierr
> PetscIntM, N, II, NumberOfUnknowns, NumberOfMeasures
> PetscInt, ALLOCATABLE :: INDICES(:)
> PetscMPIInt   :: RANK
>
> CALL VecGetSize(x,N,IERR)
> NumberOfUnknowns = N / 2
> CALL VecGetSize(y,N,IERR)
> NumberOfMeasures = N - NumberOfUnknowns
>
> N = NumberOfUnknowns + MAX(NumberOfUnknowns,NumberOfMeasures)
> ALLOCATE(INDICES(N))
> INDICES = (/ (ii, ii=0, N-1) /)
>
> CALL
> ISCreateGeneral(MPI_COMM_SELF,NumberOfUnknowns,INDICES(1:NumberOfUnknowns),PETSC_COPY_VALUES,ix1,IERR)
> CALL VecGetSubVector(x,ix1,x1,IERR)
> CALL
> ISCreateGeneral(MPI_COMM_SELF,NumberOfUnknowns,INDICES(NumberOfUnknowns+1:2*NumberOfUnknowns),PETSC_COPY_VALUES,ix2,IERR)
> CALL VecGetSubVector(x,ix2,x2,IERR)
> CALL
> ISCreateGeneral(MPI_COMM_SELF,NumberOfUnknowns,INDICES(1:NumberOfUnknowns),PETSC_COPY_VALUES,iy1,IERR)
> CALL VecGetSubVector(y,iy1,y1,IERR)
> CALL
> ISCreateGeneral(MPI_COMM_SELF,NumberOfUnknowns,INDICES(NumberOfUnknowns+1:NumberOfUnknowns+NumberOfMeasures),PETSC_COPY_VALUES,iy2,IERR)
> CALL VecGetSubVector(y,iy2,y2,IERR)
>
> CALL MatMult(L_discretized,x1,y1,IERR)
> CALL MatMultAdd(K_discretized,x2,y1,y1,IERR)
> CALL MatMult(L_MR,x1,y2,IERR)
> CALL MatMultAdd(K_MR,x2,y2,y2,IERR)
>
> CALL VecRestoreSubVector(y,iy1,y1,IERR)
> CALL VecRestoreSubVector(y,iy2,y2,IERR)
>
> return
> end subroutine MyMult
> !==
>
>
> Obviously the sequence of calls to ISCreateGeneral and VecGetSubVector
> does not do what I expect, as the errors I'm getting are in the following
> MatMult multiplications (the global sizes of x1 and x2 SHOULD BE both 771,
> while y1 and y2 should be 771 and 2286):
>
> 1) executed with 1 MPI process:
>
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Nonconforming object sizes
> [0]PETSC ERROR: Mat mat,Vec y: global dim 2286 771
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015
> [0]PETSC ERROR: ./ISS on a arch-linux2-c-debug named
> alessandro-HP-ProDesk-490-G2-MT by alessandro Wed Apr  6 10:54:03 2016
> [0]PETSC ERROR: Configure options
> --with-blas-lapack-dir=/opt/intel/composer_xe_2015.2.164/mkl
> --with-mpi-cc=/home/alessandro/.openmpi-1.8.3/bin/mpicc
> --with-mpi-f90=/home/alessandro/.openmpi-1.8.3/bin/mpif90
> --with-mpiexec=/home/alessandro/.openmpi-1.8.3/bin/mpiexec
> --with-scalar-type=complex
> [0]PETSC ERROR: #1 MatMult() line 2215 in
> /home/alessandro/CODE/PETSc/petsc-3.6.3/src/mat/interface/matrix.c
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Nonconforming object sizes
> [0]PETSC ERROR: Mat mat,Vec v3: local dim 2286 771
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015
> [0]PETSC ERROR: ./ISS on a arch-linux2-c-debug named
> alessandro-HP-ProDesk-490-G2-MT by alessandro Wed Apr  6 10:54:03 2016
> [0]PETSC ERROR: Configure options
> --with-blas-lapack-dir=/opt/intel/composer_xe_2015.2.164/mkl
> --with-mpi-cc=/home/alessandro/.openmpi-1.8.3/bin/mpicc
> --with-mpi-f90=/home/alessandro/.openmpi-1.8.3/bin/mpif90
> --with-mpiexec=/home/alessandro/.openmpi-1.8.3/bin/mpiexec
> --with-scalar-type=complex
> [0]PETSC ERROR: #2 MatMultAdd() line 2396 in
> /home/alessandro/CODE/PETSc/petsc-3.6.3/src/mat/interface/matrix.c
>
>
>
> 2) 

Re: [petsc-users] Vec is locked read only

2016-04-02 Thread Dave May
On 2 April 2016 at 11:18, Rongliang Chen  wrote:

> Hi Shri,
>
> Thanks for your reply.
>
> Do you mean that I need to change the VecGetArrary() in
> /home/rlchen/soft/petsc-3.6.3/src/dm/interface/dm.c to VecGetArrayRead()?
>

No - you should change it in your function FormMassTimeStepFunction().
The input vector x passed into SNESComputeFunction() is read only.
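
A rough C sketch of what the body of your residual evaluation should look like
(the function and variable names here are placeholders, not your actual code):

PetscErrorCode MyResidual(SNES snes,Vec x,Vec f,void *ctx)
{
  const PetscScalar *xx;
  PetscScalar       *ff;
  PetscErrorCode    ierr;

  ierr = VecGetArrayRead(x,&xx);CHKERRQ(ierr);   /* x is read-only */
  ierr = VecGetArray(f,&ff);CHKERRQ(ierr);       /* f is writable  */
  /* ... fill ff[] using xx[] ... */
  ierr = VecRestoreArray(f,&ff);CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(x,&xx);CHKERRQ(ierr);
  return 0;
}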



> I tried it and I got a warning when I make the petsc:
>
> /home/rlchen/soft/petsc-3.6.3/src/dm/interface/dm.c: In function
> ‘DMLocalToGlobalBegin’:
> /home/rlchen/soft/petsc-3.6.3/src/dm/interface/dm.c:1913:5: warning:
> passing argument 2 of ‘VecGetArrayRead’ from incompatible pointer type
> [enabled by default]
> In file included from /home/rlchen/soft/petsc-3.6.3/include/petscmat.h:6:0,
> from /home/rlchen/soft/petsc-3.6.3/include/petscdm.h:6,
> from /home/rlchen/soft/petsc-3.6.3/include/petsc/private/dmimpl.h:6,
> from /home/rlchen/soft/petsc-3.6.3/src/dm/interface/dm.c:1:
> /home/rlchen/soft/petsc-3.6.3/include/petscvec.h:420:29: note: expected
> ‘const PetscScalar **’ but argument is of type ‘PetscScalar **’
>
> Best regards,
> Rongliang
>
> -
> Rongliang Chen,   PhD
> Associate Professor
>
> Laboratory for Engineering and Scientific Computing
> Shenzhen Institutes of Advanced Technology
> Chinese Academy of Sciences
> Address: 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen,
> Guangdong (518055), P. R. China
> E-mail:  rl.c...@siat.ac.cn
> Phone: +86-755-86392312
>
>
> On 04/02/2016 05:03 PM, Abhyankar, Shrirang G. wrote:
>
>> Use VecGetArrayRead instead of VecGetArray
>>
>>
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetArray
>> Read.html
>>
>>
>> Shri
>>
>> -Original Message-
>> From: Rongliang Chen 
>> Date: Saturday, April 2, 2016 at 3:51 AM
>> To: PETSc users list , "rongliang.c...@gmail.com
>> "
>> 
>> Subject: [petsc-users] Vec is locked read only
>>
>> Dear All,
>>>
>>> My code got the following error messages, but the code works well for
>>> the petsc optimized version (--with-debugging=0). Anyone can tell me how
>>> to fix this problem?
>>>
>>> Best regards,
>>> Rongliang
>>>
>>>
>>> [0]PETSC ERROR: - Error Message
>>> --
>>> [0]PETSC ERROR: Object is in wrong state
>>> [0]PETSC ERROR:  Vec is locked read only, argument # 1
>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>> [0]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015
>>> [0]PETSC ERROR: ./Nwtun on a 64bit-debug named rlchen by rlchen Sat Apr
>>> 2 15:40:32 2016
>>> [0]PETSC ERROR: Configure options --download-blacs --download-scalapack
>>> --download-metis --download-parmetis --download-exodusii
>>> --download-netcdf --download-hdf5
>>> --with-mpi-dir=/home/rlchen/soft/Program/mpich2-shared
>>> --with-debugging=1 --download-fblaslapack --download-chaco
>>> [0]PETSC ERROR: #1 VecGetArray() line 1646 in
>>> /home/rlchen/soft/petsc-3.6.3/src/vec/vec/interface/rvector.c
>>> [0]PETSC ERROR: #2 DMLocalToGlobalBegin() line 1913 in
>>> /home/rlchen/soft/petsc-3.6.3/src/dm/interface/dm.c
>>> [0]PETSC ERROR: #3 FormMassTimeStepFunction() line 191 in
>>>
>>> /home/rlchen/soft/3D_fluid/FiniteVolumeMethod/PETScCodes/codefor3.6/SetupF
>>> unctions.c
>>> [0]PETSC ERROR: #4 FormFunction() line 46 in
>>>
>>> /home/rlchen/soft/3D_fluid/FiniteVolumeMethod/PETScCodes/codefor3.6/SetupF
>>> unctions.c
>>> [0]PETSC ERROR: #5 SNESComputeFunction() line 2067 in
>>> /home/rlchen/soft/petsc-3.6.3/src/snes/interface/snes.c
>>> [0]PETSC ERROR: #6 SNESSolve_NEWTONLS() line 184 in
>>> /home/rlchen/soft/petsc-3.6.3/src/snes/impls/ls/ls.c
>>> [0]PETSC ERROR: #7 SNESSolve() line 3906 in
>>> /home/rlchen/soft/petsc-3.6.3/src/snes/interface/snes.c
>>> [0]PETSC ERROR: #8 SolveTimeDependent() line 843 in
>>>
>>> /home/rlchen/soft/3D_fluid/FiniteVolumeMethod/PETScCodes/codefor3.6/Nwtun.
>>> c
>>> [0]PETSC ERROR: #9 main() line 452 in
>>>
>>> /home/rlchen/soft/3D_fluid/FiniteVolumeMethod/PETScCodes/codefor3.6/Nwtun.
>>> c
>>>
>>>
>>>
>>
>
>


Re: [petsc-users] MatSetSizes with blocked matrix

2016-03-15 Thread Dave May
On 15 March 2016 at 04:46, Matthew Knepley  wrote:

> On Mon, Mar 14, 2016 at 10:05 PM, Steena Monteiro 
> wrote:
>
>> Hello,
>>
>> I am having difficulty getting MatSetSize to work prior to using MatMult.
>>
>> For matrix A with rows=cols=1,139,905 and block size = 2,
>>
>
> It is inconsistent to have a row/col size that is not divisible by the
> block size.
>


To be honest, I don't think the error message being thrown clearly
indicates what the actual problem is (hence the email from Steena). What
about

"Cannot change/reset row sizes to 40 local 1139906 global after
previously setting them to 40 local 1139905 global. Local and global
sizes must be divisible by the block size"


>
>   Matt
>
>
>> rank 0 gets 40 rows and rank 1 739905 rows,  like so:
>>
>> /*Matrix setup*/
>>
>> ierr=PetscViewerBinaryOpen(PETSC_COMM_WORLD,file,FILE_MODE_READ,);
>> ierr = MatCreate(PETSC_COMM_WORLD,);
>> ierr = MatSetFromOptions(A);
>> ierr = MatSetType(A,MATBAIJ);
>> ierr = MatSetBlockSize(A,2);
>>
>> /*Unequal row assignment*/
>>
>>  if (!rank) {
>> ierr = MatSetSizes(A, 40, PETSC_DECIDE,
>> 1139905,1139905);CHKERRQ(ierr);
>>}
>> else {
>> ierr = MatSetSizes(A, 739905, PETSC_DECIDE,
>> 1139905,1139905);CHKERRQ(ierr);
>> }
>>
>> MatMult (A,x,y);
>>
>> //
>>
>> Error message:
>>
>> 1]PETSC ERROR: [0]PETSC ERROR: No support for this operation for this
>> object type
>> Cannot change/reset row sizes to 40 local 1139906 global after
>> previously setting them to 40 local 1139905 global
>>
>> [1]PETSC ERROR: [0]PETSC ERROR: Cannot change/reset row sizes to 739905
>> local 1139906 global after previously setting them to 739905 local 1139905
>> global
>>
>> -Without messing with row assignment,  MatMult works fine on this matrix
>> for block size = 2, presumably because an extra padded row is automatically
>> added to facilitate blocking.
>>
>> -The above code snippet works well for block size = 1.
>>
>> Is it possible to do unequal row distribution *while using blocking*?
>>
>> Thank you for any advice.
>>
>> -Steena
>>
>>
>>
>>
>>
>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] DMShellSetCreateRestriction

2016-03-11 Thread Dave May
> Other suggestions on how to best integrate staggered finite differences
> within the current PETSc framework are ofcourse also highly welcome.
> Our current thinking was to pack it into a DMSHELL (which has the problem
> of not having a restriction interface).
>
>
Using DMShell is the cleanest approach.

An alternative is to have your user code simply take control of all of the
configuration of the PCMG object. E.g. you call your user code which
creates the restriction operator, you pull out the PC and call
PCMGSetRestriction() on it, etc. This can easily be done in the
context of linear problems. For non-linear problems, you could jam this
setup code inside your ComputeJacobian function.
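
For the linear case, a rough sketch of that setup (untested; ksp, nlevels and
the R[]/P[] matrices are placeholders for objects your code provides):

PC       pc;
PetscInt l;

ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCMG);CHKERRQ(ierr);
ierr = PCMGSetLevels(pc,nlevels,NULL);CHKERRQ(ierr);
for (l = 1; l < nlevels; l++) {
  ierr = PCMGSetInterpolation(pc,l,P[l]);CHKERRQ(ierr); /* your prolongation */
  ierr = PCMGSetRestriction(pc,l,R[l]);CHKERRQ(ierr);   /* your (non-transpose) restriction */
}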

This is all possible, albeit clunky and kinda ugly. It works though if you
need something before Barry adds the required support in PCMG.

Cheers
  Dave


Re: [petsc-users] DMShellSetCreateRestriction

2016-03-11 Thread Dave May
On 11 March 2016 at 18:11, anton  wrote:

> Hi team,
>
> I'm implementing staggered grid in a PETSc-canonical way, trying to build
> a custom DM object, attach it to SNES, that should later transfered it
> further to KSP and PC.
>
> Yet, the Galerking coarsening for staggered grid is non-symmetric. The
> question is how possible is it that DMShellSetCreateRestriction can be
> implemented and included in 3.7 release?
>

It's a little more work than just adding a new method within the DM and new
APIs for DMCreateRestriction() and DMShellSetCreateRestriction().
PCMG needs to be modified to call DMCreateRestriction().



>
> Please, please.
>
> Thanks,
> Anton
>
>
>


Re: [petsc-users] Weighted Jacobi, and scaling 2nd matrix in KSPSetOperators

2016-02-22 Thread Dave May
On 22 February 2016 at 23:16, Timothée Nicolas 
wrote:

> Hi all,
>
> It sounds it should be obvious but I can't find it somehow. I would like
> to use weighted jacobi (more precisely point block jacobi) as a smoother
> for multigrid, but I don't find the options to set a weight omega, after I
> set -mg_levels_pc_type pbjacobi
>

Why not use  -ksp_type richardson -ksp_richardson_scale XXX ?

Cheers,
  Dave

So I was thinking I could just artificially scale the second matrix in
> KSPSetOperators with a weight omega. Since it is the one used in retrieving
> the diagonal, it should be equivalent to have a weight and should not
> affect the residuals, right ? However, I notice no change of behaviour
> whatsoever when I do this.
>
> Is it normal ? Have I grossly misunderstood something ?
>
> Thanks
>
> Timothee
>


Re: [petsc-users] Extracting CSR format from matrices for CUDA

2016-02-12 Thread Dave May
On 12 February 2016 at 19:16, Bhalla, Amneet Pal S 
wrote:

> Hi Folks,
>
> I want to extract the CSR format from PETSc matrices and ship it to CUDA.
> Is there an easy way of doing this?
>

Yep, see these web pages

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetColumnIJ.html

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSeqAIJGetArray.html



> If so is there an example/source code showing this extraction?
>

Not that I know of.
But there isn't much to it; the description on these webpages should be
sufficient.
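
For a sequential AIJ matrix, the extraction boils down to something like this
(untested sketch; for an MPIAIJ matrix you would first pull out the local
blocks, and Aseq is a placeholder name):

PetscInt       n;
const PetscInt *ia,*ja;
PetscScalar    *vals;
PetscBool      done;

ierr = MatGetRowIJ(Aseq,0,PETSC_FALSE,PETSC_FALSE,&n,&ia,&ja,&done);CHKERRQ(ierr);
ierr = MatSeqAIJGetArray(Aseq,&vals);CHKERRQ(ierr);
/* ia[0..n], ja[0..ia[n]-1] and vals[0..ia[n]-1] are the CSR arrays;
   copy them to the device here (e.g. cudaMemcpy) */
ierr = MatSeqAIJRestoreArray(Aseq,&vals);CHKERRQ(ierr);
ierr = MatRestoreRowIJ(Aseq,0,PETSC_FALSE,PETSC_FALSE,&n,&ia,&ja,&done);CHKERRQ(ierr);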

Thanks,
  Dave



>
> Thanks,
>
> — Amneet
>
>
>


Re: [petsc-users] Create DM matrix

2016-02-11 Thread Dave May
ETSC ERROR: Configure run at Thu Feb 11 11:44:35 2016
> [0]PETSC ERROR: Configure options --download-mpich=1
> --with-scalar-type=real --with-clanguage=cxx --download-mumps=1
> --download-blacs=1 --download-parmetis=1 --download-scalapack=1
> --with-debugging=1 --download-hypre=1 --with-fc=gfortran --download-metis=1
> -download-cmake=1 --download-f-blas-lapack=1
> [0]PETSC ERROR:
> 
> [0]PETSC ERROR: MatSetBlockSize() line 6686 in
> /home/sang/petsc/petsc-3.4.5/src/mat/interface/matrix.c
> [0]PETSC ERROR: PetscErrorCode
> sasMatVecPetsc::DMCreateMatrix_DA_3d_MPIAIJ_pvs(DM, sasSmesh*,
> sasVector&, sasVector&, sasVector&, sasVector&)() line
> 165 in "unknowndirectory/"src/mat_vec/sasMatVecPetsc.cpp
>
> where is the mistake?
>
> Many thanks.
>
> Pham
>
> On Thu, Feb 11, 2016 at 1:18 PM, Dave May <dave.mayhe...@gmail.com> wrote:
>
>> I think he wants the source location so that he can copy and
>> implementation and "tweak" it slightly
>>
>> The location is here
>> ${PETSC_DIR}/src/dm/impls/da/fdda.c
>>
>> /Users/dmay/software/petsc-3.6.0/src
>> dmay@nikkan:~/software/petsc-3.6.0/src $ grep -r
>> DMCreateMatrix_DA_3d_MPIAIJ *
>> dm/impls/da/fdda.c:extern PetscErrorCode
>> DMCreateMatrix_DA_3d_MPIAIJ(DM,Mat);
>> dm/impls/da/fdda.c:extern PetscErrorCode
>> DMCreateMatrix_DA_3d_MPIAIJ_Fill(DM,Mat);
>> dm/impls/da/fdda.c:ierr =
>> DMCreateMatrix_DA_3d_MPIAIJ_Fill(da,A);CHKERRQ(ierr);
>> dm/impls/da/fdda.c:ierr =
>> DMCreateMatrix_DA_3d_MPIAIJ(da,A);CHKERRQ(ierr);
>>
>>
>> dm/impls/da/fdda.c:#define __FUNCT__ "DMCreateMatrix_DA_3d_MPIAIJ"
>> dm/impls/da/fdda.c:PetscErrorCode DMCreateMatrix_DA_3d_MPIAIJ(DM da,Mat J)
>> dm/impls/da/fdda.c:#define __FUNCT__ "DMCreateMatrix_DA_3d_MPIAIJ_Fill"
>> dm/impls/da/fdda.c:PetscErrorCode DMCreateMatrix_DA_3d_MPIAIJ_Fill(DM da,Mat J)
>>
>>
>> On 11 February 2016 at 04:08, Matthew Knepley <knep...@gmail.com> wrote:
>>
>>> On Wed, Feb 10, 2016 at 8:59 PM, Sang pham van <pvsang...@gmail.com>
>>> wrote:
>>>
>>>> The irregular rows is quite many. The matrix really needs to be
>>>> preallocated.
>>>> Could you show me how to use DMCreateMatrix_DA_3d_MPIAIJ() directly?
>>>>
>>>> Just put the declaration right into your source.
>>>
>>>Matt
>>>
>>>> Pham
>>>> On Feb 11, 2016 9:52 AM, "Matthew Knepley" <knep...@gmail.com> wrote:
>>>>
>>>>> On Wed, Feb 10, 2016 at 8:44 PM, Sang pham van <pvsang...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> That is because my matrix has some rows which need more entries than
>>>>>> usual.
>>>>>>
>>>>>
>>>>> If its only a few, you could just turn off the allocation error.
>>>>>
>>>>>
>>>>>> Where can i find source code of DMCreateMatrix()?
>>>>>>
>>>>>>
>>>>>
>>>>> https://bitbucket.org/petsc/petsc/src/827b69d6bb12709ff9b9a0dede31640477fc2b74/src/dm/impls/da/fdda.c?at=master=file-view-default#fdda.c-1024
>>>>>
>>>>>   Matt
>>>>>
>>>>>> Pham.
>>>>>> On Feb 11, 2016 8:35 AM, "Matthew Knepley" <knep...@gmail.com> wrote:
>>>>>>
>>>>>>> On Wed, Feb 10, 2016 at 6:14 PM, Sang pham van <pvsang...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am trying to create a DM matrix with DMCreateMatrix_DA_3d_MPIAIJ()
>>>>>>>> instead of using DMCreateMatrix().
>>>>>>>>
>>>>>>>
>>>>>>> Why, that should be called automatically by DMCreateMatrix()?
>>>>>>>
>>>>>>>   Matt
>>>>>>>
>>>>>>>
>>>>>>>> Which header file should I include to use that routine? also, what
>>>>>>>> is the source file containing the DMCreateMatrix() routine?
>>>>>>>>
>>>>>>>> Many thanks in advance.
>>>>>>>> Pham
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which 
>>>>>>> their
>>>>>>> experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>


Re: [petsc-users] Create DM matrix

2016-02-10 Thread Dave May
I think he wants the source location so that he can copy and implementation
and "tweak" it slightly

The location is here
${PETSC_DIR}/src/dm/impls/da/fdda.c

/Users/dmay/software/petsc-3.6.0/src
dmay@nikkan:~/software/petsc-3.6.0/src $ grep -r
DMCreateMatrix_DA_3d_MPIAIJ *
dm/impls/da/fdda.c:extern PetscErrorCode
DMCreateMatrix_DA_3d_MPIAIJ(DM,Mat);
dm/impls/da/fdda.c:extern PetscErrorCode
DMCreateMatrix_DA_3d_MPIAIJ_Fill(DM,Mat);
dm/impls/da/fdda.c:ierr =
DMCreateMatrix_DA_3d_MPIAIJ_Fill(da,A);CHKERRQ(ierr);
dm/impls/da/fdda.c:ierr =
DMCreateMatrix_DA_3d_MPIAIJ(da,A);CHKERRQ(ierr);


dm/impls/da/fdda.c:#define __FUNCT__ "DMCreateMatrix_DA_3d_MPIAIJ"
dm/impls/da/fdda.c:PetscErrorCode DMCreateMatrix_DA_3d_MPIAIJ(DM da,Mat J)
dm/impls/da/fdda.c:#define __FUNCT__ "DMCreateMatrix_DA_3d_MPIAIJ_Fill"
dm/impls/da/fdda.c:PetscErrorCode DMCreateMatrix_DA_3d_MPIAIJ_Fill(DM da,Mat J)


On 11 February 2016 at 04:08, Matthew Knepley  wrote:

> On Wed, Feb 10, 2016 at 8:59 PM, Sang pham van 
> wrote:
>
>> The irregular rows is quite many. The matrix really needs to be
>> preallocated.
>> Could you show me how to use DMCreateMatrix_DA_3d_MPIAIJ() directly?
>>
>> Just put the declaration right into your source.
>
>Matt
>
>> Pham
>> On Feb 11, 2016 9:52 AM, "Matthew Knepley"  wrote:
>>
>>> On Wed, Feb 10, 2016 at 8:44 PM, Sang pham van 
>>> wrote:
>>>
 That is because my matrix has some rows which need more entries than
 usual.

>>>
>>> If its only a few, you could just turn off the allocation error.
>>>
>>>
 Where can i find source code of DMCreateMatrix()?


>>>
>>> https://bitbucket.org/petsc/petsc/src/827b69d6bb12709ff9b9a0dede31640477fc2b74/src/dm/impls/da/fdda.c?at=master=file-view-default#fdda.c-1024
>>>
>>>   Matt
>>>
 Pham.
 On Feb 11, 2016 8:35 AM, "Matthew Knepley"  wrote:

> On Wed, Feb 10, 2016 at 6:14 PM, Sang pham van 
> wrote:
>
>> Hi,
>>
>> I am trying to create a DM matrix with DMCreateMatrix_DA_3d_MPIAIJ()
>> instead of using DMCreateMatrix().
>>
>
> Why, that should be called automatically by DMCreateMatrix()?
>
>   Matt
>
>
>> Which header file should I include to use that routine? also, what is
>> the source file containing the DMCreateMatrix() routine?
>>
>> Many thanks in advance.
>> Pham
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>

>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] DIVERGED_PCSETUP_FAILED

2016-02-10 Thread Dave May
On 11 February 2016 at 07:05, Michele Rosso  wrote:

> I tried setting -mat_superlu_dist_replacetinypivot true: it does help to
> advance the run past the previous "critical"  point but eventually it stops
> later with the same error.
> I forgot to mention my system is singular: I remove the constant null
> space but I am not sure if the coarse solver needs to be explicity informed
> of this.
>

Right - are you using pure Neumann boundary conditions?

To make the solution unique, are you
(a) imposing a single Dirichlet boundary condition on your field
by explicitly modifying the matrix, or
(b) imposing a condition like
  \int \phi dV = 0
via something like -ksp_constant_null_space

If you removed the null space by modifying the matrix explicitly
(a), the sparse direct solver
should go through. If you use (b), then this method cannot be used to help
the direct solver.
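
If it helps, one way of doing (a) on the assembled fine-grid operator is
sketched below (untested; A is a placeholder for your matrix and the choice
of row 0 is arbitrary):

PetscInt    row  = 0;    /* any single dof you choose to pin */
PetscScalar diag = 1.0;

ierr = MatZeroRowsColumns(A,1,&row,diag,NULL,NULL);CHKERRQ(ierr);
/* also set the matching rhs entry to the value you want to pin
   the solution to (e.g. zero) */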

If this is the intended target problem size (16x16), gather the matrix and
use petsc Cholesky or Umfpack.
Cholesky is more stable than LU and can usually deal with a single zero
eigenvalue without resorting to tricks. Umfpack will solve the problem
easily as it uses clever re-ordering. If you have access to MKL-Pardiso,
that will also work great.

Thanks,
  Dave



>
>
> Michele
>
>
> On Wed, 2016-02-10 at 22:15 -0600, Barry Smith wrote:
>
>You can try the option
>
> -mat_superlu_dist_replacetinypivot true
>
> if you are luck it get you past the zero pivot but still give an adequate 
> preconditioner.
>
>   Barry
> > On Feb 10, 2016, at 9:49 PM, Michele Rosso  wrote:> > 
> > Hong,> > here if the output of grep -info:> > using diagonal shift 
> > on blocks to prevent zero pivot [INBLOCKS]>   Replace tiny 
> > pivots FALSE> tolerance for zero pivot 2.22045e-14> > It seems it 
> > is not replacing small pivots: could this be the problem?> I will also try 
> > Barry's suggestion to diagnose the problem.> > Thanks,> Michele> > > On 
> > Wed, 2016-02-10 at 21:22 -0600, Barry Smith wrote:>> > On Feb 10, 2016, at 
> > 9:00 PM, Hong  wrote:>> > >> > Michele :>> > 
> > Superlu_dist LU is used for coarse grid PC, which likely produces a 
> > zero-pivot.>> > Run your code with '-info |grep pivot' to  verify.>> >> >>  
> >  Michele>> >>You can also run with -ksp_error_if_not_converged in or 
> > not in the debugger and it will stop immediately when the problem is 
> > detected and hopefully provide additional useful  information about what 
> > has happened.>> >>   Barry>> >> >> > >> > Hong>> > >> > Hi Matt,>> > >> > 
> > the ksp_view output was an attachment to my previous email.>> > Here it 
> > is:>> > >> > KSP Object: 1 MPI processes>> >   type: cg>> >   maximum 
> > iterations=1>> >   tolerances:  relative=1e-08, absolute=1e-50, 
> > divergence=1.>> >   left preconditioning>> >   using nonzero initial 
> > guess>> >   using UNPRECONDITIONED norm type for convergence test>> > PC 
> > Object: 1 MPI processes>> >   type: mg>> > MG: type is MULTIPLICATIVE, 
> > levels=4 cycles=v>> >   Cycles per PCApply=1>> >   Using Galerkin 
> > computed coarse grid matrices>> >   Coarse grid solver -- level 
> > --->> > KSP Object:(mg_coarse_) 1 
> > MPI processes>> >   type: preonly>> >   maximum iterations=1, 
> > initial guess is zero>> >   tolerances:  relative=1e-05, 
> > absolute=1e-50, divergence=1.>> >   left preconditioning>> >   
> > using NONE norm type for convergence test>> > PC Object:
> > (mg_coarse_) 1 MPI processes>> >   type: lu>> > LU: 
> > out-of-place factorization>> > tolerance for zero pivot 
> > 2.22045e-14>> > using diagonal shift on blocks to prevent zero 
> > pivot [INBLOCKS]>> > matrix ordering: nd>> > factor fill 
> > ratio given 0., needed 0.>> >   Factored matrix follows:>> >
> >  Mat Object: 1 MPI processes>> >   type: 
> > seqaij>> >   rows=16, cols=16>> >   package used to 
> > perform factorization: superlu_dist>> >   total: nonzeros=0, 
> > allocated nonzeros=0>> >   total number of mallocs used during 
> > MatSetValues calls =0>> > SuperLU_DIST run parameters:>> >  
> >  Process grid nprow 1 x npcol 1 >> >   
> > Equilibrate matrix TRUE >> >   Matrix input mode 0 >> > 
> >   Replace tiny pivots FALSE >> >   Use 
> > iterative refinement FALSE >> >   Processors in row 1 col 
> > partition 1 >> >   Row permutation LargeDiag >> >   
> > Column permutation METIS_AT_PLUS_A>> >   Parallel 
> > symbolic factorization FALSE >> >   Repeated factorization 
> > SamePattern>> >   linear system matrix = precond matrix:>> >   Mat 
> > 

Re: [petsc-users] Different results with different distribution of rows

2016-02-09 Thread Dave May
If you don't specify a preconditioner via -pc_type XXX, the default being
used is BJacobi-ILU.
This preconditioner will yield different results on different numbers of
MPI-processes, and will also yield different results for a fixed number of
MPI-processes when the matrix partitioning changes. If your
operator is singular (or close to singular), ILU is likely to fail.

To be sure your code is working correctly, test it using a preconditioner
which isn't dependent on the partitioning of the matrix. I would use these
options:
  -pc_type jacobi
  -ksp_monitor_true_residual

The last option is useful as it will report both the preconditioned
residual and the true residual. If the operator is singular, or close to
singular, gmres-ILU or gmres-BJacobi-ILU can report a preconditioned
residual which is small, but is orders of magnitude different from the true
residual.

Thanks,
  Dave



On 9 February 2016 at 14:39, Florian Lindner  wrote:

> Addition. The KSP Solver shows very different convergence:
>
>
> WRONG:
>
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm
> 6.832362172732e+06 is less than relative tolerance 1.e-09
> times initial right hand side norm 6.934533099989e+15 at iteration 8447
>
> RIGHT:
>
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm
> 7.959757133341e-08 is less than relative tolerance 1.e-09
> times initial right hand side norm 1.731788191624e+02 at iteration 9
>
> Best,
> Florian
>
> On Tue, 9 Feb 2016 14:06:01 +0100
> Florian Lindner  wrote:
>
> > Hello,
> >
> > I use PETSc with 4 MPI processes and I experience different results
> > when using different distribution of rows amoung ranks. The code looks
> > like that:
> >
> >
> > KSPSetOperators(_solver, _matrixC.matrix, _matrixC.matrix);
> > // _solverRtol = 1e-9
> > KSPSetTolerances(_solver, _solverRtol, PETSC_DEFAULT, PETSC_DEFAULT,
> > PETSC_DEFAULT);
> > KSPSetFromOptions(_solver);
> >
> > // means: MatGetVecs(matrix, nullptr, );
> > petsc::Vector Au(_matrixC, "Au");
> > petsc::Vector out(_matrixC, "out");
> > petsc::Vector in(_matrixA, "in");
> >
> > // is an identity mapping here
> > ierr = VecSetLocalToGlobalMapping(in.vector, _ISmapping);
> >
> > // fill and assemble vector in
> >
> > in.view();
> > MatMultTranspose(_matrixA.matrix, in.vector, Au.vector);
> > Au.view();
> > KSPSolve(_solver, Au.vector, out.vector);
> >
> > out.view();
> >
> > I have experimented with two variants. The first one, non-working has
> > MatrixC rows distributed like that: 3, 4, 2, 2. The other, working,
> > one has 4, 3, 2, 2.
> >
> > All input values: _matrixA, _matrixC, in are identival, Au is
> > identical too, but out differs. See the results from object::view
> > below.
> >
> > Could the bad condition of matrixC be a problem? I'm not using any
> > special KSP options. matrixC is of type MATSBAIJ, matrixA is MATAIJ.
> >
> > Thanks,
> > Florian
> >
> > WRONG
> > =
> >
> > Vec Object:in 4 MPI processes
> >   type: mpi
> > Process [0]
> > 1
> > 2
> > Process [1]
> > 3
> > 4
> > Process [2]
> > 5
> > 6
> > Process [3]
> > 7
> > 8
> >
> > Vec Object:Au 4 MPI processes
> >   type: mpi
> > Process [0]
> > 36
> > 74
> > 20
> > Process [1]
> > 1.09292
> > 2.09259
> > 3.18584
> > 4.20349
> > Process [2]
> > 5.29708
> > 6.31472
> > Process [3]
> > 7.24012
> > 8.23978
> >
> > // should not be result
> > Vec Object:out 4 MPI processes
> >   type: mpi
> > Process [0]
> > -1.10633e+07
> > 618058
> > 9.01497e+06
> > Process [1]
> > 0.996195
> > 1.98711
> > 3.0
> > 4.00203
> > Process [2]
> > 5.00736
> > 6.01644
> > Process [3]
> > 6.98534
> > 7.99442
> >
> > Mat Object:C 4 MPI processes
> >   type: mpisbaij
> > row 0: (0, 0)  (3, 1)  (4, 1)  (5, 1)  (6, 1)  (7, 1)  (8, 1)  (9,
> > 1)  (10, 1) row 1: (1, 0)  (3, 0)  (4, 0)  (5, 1)  (6, 1)  (7, 2)
> > (8, 2)  (9, 3)  (10, 3) row 2: (2, 0)  (3, 0)  (4, 1)  (5, 0)  (6,
> > 1)  (7, 0)  (8, 1)  (9, 0)  (10, 1) row 3: (3, 1)  (4, 0.0183156)
> > (5, 0.0183156)  (6, 0.000335463)  (7, 1.12535e-07)  (8, 2.06115e-09)
> > row 4: (4, 1)  (5, 0.000335463)  (6, 0.0183156)  (7, 2.06115e-09)
> > (8, 1.12535e-07) row 5: (5, 1)  (6, 0.0183156)  (7, 0.0183156)  (8,
> > 0.000335463)  (9, 1.12535e-07)  (10, 2.06115e-09) row 6: (6, 1)  (7,
> > 0.000335463)  (8, 0.0183156)  (9, 2.06115e-09)  (10, 1.12535e-07) row
> > 7: (7, 1)  (8, 0.0183156)  (9, 0.0183156)  (10, 0.000335463) row 8:
> > (8, 1)  (9, 0.000335463)  (10, 0.0183156) row 9: (9, 1)  (10,
> > 0.0183156) row 10: (10, 1)
> >
> > Mat Object:A 4 MPI processes
> >   type: mpiaij
> >  1.0e+00  0.0e+00  0.0e+00  1.0e+00  1.83156e-02
> > 1.83156e-02  3.35463e-04  1.12535e-07  2.06115e-09  0.0e+00
> > 0.0e+00 1.0e+00  0.0e+00  1.0e+00  1.83156e-02
> > 1.0e+00  3.35463e-04  1.83156e-02  2.06115e-09  1.12535e-07
> > 0.0e+00  0.0e+00 1.0e+00  1.0e+00  0.0e+00
> > 1.83156e-02  3.35463e-04  

Re: [petsc-users] MatSetValues timing

2016-02-08 Thread Dave May
On 8 February 2016 at 12:31, Jacek Miloszewski 
wrote:

> Dear PETSc users,
>
> I use PETSc to assembly a square matrix (in the attached example it is n =
> 4356) which has around 12% of non-zero entries. I timed my code using
> various number of process (data in table). Now I have 2 questions:
>
> 1. Why with doubling number of processes the speed-up is 4x? I would
> expect 2x at the most.
>
> 2. Is there a way to speed-up matrix construction in general? I attach
> piece of my fortran code at the bottom. At compilation time I have the
> following knowledge about the matrix: the total number of non-zero matrix
> elements, all diagonal elements are non-zero.
>

Performance improvements of the assembly require the preallocation of the
matrix be specified.
Check out these pages for further information

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSeqAIJSetPreallocation.html

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html

also note the FAQ item

"Assembling large sparse matrices takes a long time. What can I do make
this process faster? or MatSetValues() is so slow, what can I do to speed
it up?"

http://www.mcs.anl.gov/petsc/documentation/faq.html
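
As a rough sketch (untested, written with the C calls for brevity - the
Fortran interface takes the same arguments plus ierr), the preallocation for
your matrix a looks like this; max_nz_row is a placeholder bound you need to
supply, and exact per-row counts via the d_nnz/o_nnz arrays are even better:

ierr = MatSeqAIJSetPreallocation(a,max_nz_row,NULL);CHKERRQ(ierr);
ierr = MatMPIAIJSetPreallocation(a,max_nz_row,NULL,max_nz_row,NULL);CHKERRQ(ierr);
/* call these after MatSetFromOptions() and before the MatSetValues() loop;
   only the call matching the actual matrix type takes effect */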


Thanks,
  Dave


> Timing data:
>
> number of proc time [s] speedup ratio
> 1 2044.941
> 2 504.692 4.051859352
> 4 149.678 3.371851575
> 8 64.102 2.334997348
> 16 17.296 3.706174838
> 32 4.43 3.904288939
> 64 1.096 4.041970803
>
> Code:
>
> call MatCreate(PETSC_COMM_WORLD, a, ierr)
> call MatSetSizes(a, PETSC_DECIDE, PETSC_DECIDE, n, n, ierr)
> call MatSetFromOptions(a, ierr)
> call MatSetUp(a, ierr)
>
> call MatGetOwnershipRange(a, Istart, Iend, ierr)
>
> call system_clock(t1)
> t_dim = h_dim*e_dim
> do row = Istart, Iend - 1 ! row
>do col = 0, t_dim - 1 ! col
>   call h_ij(row + 1, col + 1, n_h, n_e, b_h, b_e, h_dim, e_dim, e_sp,
> v, basis, ht, hs, info_1)
>   if (hs) then
>  hh = ht
>  call MatSetValues(a, 1, row, 1, col, hh, INSERT_VALUES, ierr)
>   end if
>end do
> end do
> call system_clock(t2, ct)
> if (rank == 0) then
>write(*, '(a, f0.3)') 'Matrix assembly time: ', real((t2 - t1),
> r8)/real(ct, r8)
> end if
>
> --
> Best Wishes
> Jacek Miloszewski
>


Re: [petsc-users] MatSetValues timing

2016-02-08 Thread Dave May
On 8 February 2016 at 12:31, Jacek Miloszewski 
wrote:

> Dear PETSc users,
>
> I use PETSc to assembly a square matrix (in the attached example it is n =
> 4356) which has around 12% of non-zero entries. I timed my code using
> various number of process (data in table). Now I have 2 questions:
>
> 1. Why with doubling number of processes the speed-up is 4x? I would
> expect 2x at the most.
>

Your timing doesn't appear to include the time required to scatter
off-processor values.
You should move the timer to be after the following calls

MatAssemblyBegin(a,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(a,MAT_FINAL_ASSEMBLY);



>
> 2. Is there a way to speed-up matrix construction in general? I attach
> piece of my fortran code at the bottom. At compilation time I have the
> following knowledge about the matrix: the total number of non-zero matrix
> elements, all diagonal elements are non-zero.
>
> Timing data:
>
> number of proc time [s] speedup ratio
> 1 2044.941
> 2 504.692 4.051859352
> 4 149.678 3.371851575
> 8 64.102 2.334997348
> 16 17.296 3.706174838
> 32 4.43 3.904288939
> 64 1.096 4.041970803
>
> Code:
>
> call MatCreate(PETSC_COMM_WORLD, a, ierr)
> call MatSetSizes(a, PETSC_DECIDE, PETSC_DECIDE, n, n, ierr)
> call MatSetFromOptions(a, ierr)
> call MatSetUp(a, ierr)
>
> call MatGetOwnershipRange(a, Istart, Iend, ierr)
>
> call system_clock(t1)
> t_dim = h_dim*e_dim
> do row = Istart, Iend - 1 ! row
>do col = 0, t_dim - 1 ! col
>   call h_ij(row + 1, col + 1, n_h, n_e, b_h, b_e, h_dim, e_dim, e_sp,
> v, basis, ht, hs, info_1)
>   if (hs) then
>  hh = ht
>  call MatSetValues(a, 1, row, 1, col, hh, INSERT_VALUES, ierr)
>   end if
>end do
> end do
> call system_clock(t2, ct)
> if (rank == 0) then
>write(*, '(a, f0.3)') 'Matrix assembly time: ', real((t2 - t1),
> r8)/real(ct, r8)
> end if
>
> --
> Best Wishes
> Jacek Miloszewski
>


Re: [petsc-users] VecGetValues not working.

2016-02-07 Thread Dave May
On Sunday, 7 February 2016, Kaushik Kulkarni  wrote:

> Hello,
> I am a beginner at PETSc, so please excuse for such a trivial doubt. I am
> writing a program to learn about PETSc vectors. So in the process I thought
> to write a small program to learn about initializing and accessing vectors,
> which creates vectors with size `rank+1' where rank is initialized using
> `MPI_Comm_rank (PETSC_COMM_WORLD, );`. And, then simply I looped
> thorugh all the elements to get the sum. But as can seen in the
> output(attached), it is returning "junk" values.
>

Your call to printf is treating Y[i] as an integer (%d) when it should be
using a specifier for a floating point number, e.g. %1.4e.
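
For example, something like the following (assuming PetscScalar is a real,
double precision scalar in your build; the names follow your description
rather than your actual source):

printf("Y[%d] = %1.4e\n",(int)i,(double)Y[i]);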





> Source Code:- http://pastebin.com/Swku7A3X
> Output:- http://pastebin.com/paSWnFwz
>
>
> Thanks,
>
> *Kaushik*
>


[petsc-users] repartition for dynamic load balancing

2016-01-28 Thread Dave May
On Thursday, 28 January 2016, Matthew Knepley > wrote:

> On Thu, Jan 28, 2016 at 11:36 AM, Xiangdong  wrote:
>
>> What functions/tools can I use for dynamic migration in DMPlex framework?
>>
>
> In this paper, http://arxiv.org/abs/1506.06194, we explain how to use the
> DMPlexMigrate() function to redistribute data.
> In the future, its likely we will add a function that wraps it up with
> determination of the new partition at the same time.
>
>
>> Can you also name some external mesh management systems? Thanks.
>>
>
> I will note that if load balance in the solve is your only concern,
> PCTelescope can redistribute the DMDA solve.
>

Currently Telescope will only repartition 2d and 3d DMDA's. It does perform
data migration and allows users to specify the number of ranks to be used
in each i,j,k direction via -xxx_grid_x etc. I wouldn't say it
supports "load balancing", as there is no mechanism to define the number of
points in each sub-domain.

Cheers
  Dave



>
>   Thanks,
>
> Matt
>
>
>>
>> Xiangdong
>>
>> On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith  wrote:
>>
>>>
>>> > On Jan 28, 2016, at 11:11 AM, Xiangdong  wrote:
>>> >
>>> > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with
>>> Nx=10 and np=2. At the beginning each processor owns 5 cells. After some
>>> simulation time, I found that repartition the 10 cells into 3 and 7 is
>>> better for load balancing. Is there an easy/efficient way to migrate data
>>> from one partition to another partition? I am wondering whether there are
>>> some functions or libraries help me manage this redistribution.
>>>
>>>   For DMDA we don't provide tools for doing this, nor do we expect to.
>>> For this type of need for dynamic migration we recommend using DMPlex or
>>> some external mesh management system.
>>>
>>>   Barry
>>>
>>> >
>>> > Thanks.
>>> > Xiangdong
>>> >
>>> > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown  wrote:
>>> > Xiangdong  writes:
>>> >
>>> > > I have a question on dynamic load balance in petsc. I started
>>> running a
>>> > > simulation with one partition. As the simulation goes on, that
>>> partition
>>> > > may lead to load imbalance since it is a non-steady problem. If it
>>> is worth
>>> > > to perform the load balance, is there an easy way to re-partition
>>> the mesh
>>> > > and continue the simulation?
>>> >
>>> > Are you using a PETSc DM?  What "mesh"?  If you own it, then
>>> > repartitioning it is entirely your business.
>>> >
>>> > In general, after adapting the mesh, you rebuild all algebraic data
>>> > structures.  Solvers can be reset (SNESReset, etc.).
>>> >
>>>
>>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] addition of two matrix

2016-01-21 Thread Dave May
Try this

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatAXPY.html
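
For C = A + B, a minimal sketch (untested) would be:

Mat C;
ierr = MatDuplicate(A,MAT_COPY_VALUES,&C);CHKERRQ(ierr);         /* C = A  */
ierr = MatAXPY(C,1.0,B,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); /* C += B */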

On 22 January 2016 at 00:11, wen zhao  wrote:

> Hello,
>
> I want to add to matrix, but i haven't found a function which can do this
> operation. Is there existe a kind of operation can do C = A + B
>
> Thanks
>


Re: [petsc-users] HPCToolKit/HPCViewer on OS X

2016-01-14 Thread Dave May
On 14 January 2016 at 14:24, Matthew Knepley  wrote:

> On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S <
> amne...@live.unc.edu> wrote:
>
>>
>>
>> On Jan 13, 2016, at 6:22 PM, Matthew Knepley  wrote:
>>
>> Can you mail us a -log_summary for a rough cut? Sometimes its hard
>> to interpret the data avalanche from one of those tools without a simple
>> map.
>>
>>
>> Does this indicate some hot spots?
>>
>
> 1) There is a misspelled option -stokes_ib_pc_level_ksp_
> richardson_self_scae
>
> You can try to avoid this by giving -options_left
>
> 2) Are you using any custom code during the solve? There is a gaping whole
> in the timing. It take 9s to
> do PCApply(), but something like a collective 1s to do everything we
> time under that.
>


You are looking at the timing from a debug build.
The results from the optimized build don't have such a gaping hole.



>
> Since this is serial, we can use something like kcachegrind to look at
> performance as well, which should
> at least tell us what is sucking up this time so we can put a PETSc even
> on it.
>
>   Thanks,
>
>  Matt
>
>
>
>>
>> 
>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
>> -fCourier9' to print this document***
>>
>> 
>>
>> -- PETSc Performance Summary:
>> --
>>
>> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1
>> processor, by Taylor Wed Jan 13 21:07:43 2016
>> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46  GIT Date:
>> 2015-11-16 13:07:08 -0600
>>
>>  Max   Max/MinAvg  Total
>> Time (sec):   1.039e+01  1.0   1.039e+01
>> Objects:  2.834e+03  1.0   2.834e+03
>> Flops:3.552e+08  1.0   3.552e+08  3.552e+08
>> Flops/sec:3.418e+07  1.0   3.418e+07  3.418e+07
>> Memory:   3.949e+07  1.0  3.949e+07
>> MPI Messages: 0.000e+00  0.0   0.000e+00  0.000e+00
>> MPI Message Lengths:  0.000e+00  0.0   0.000e+00  0.000e+00
>> MPI Reductions:   0.000e+00  0.0
>>
>> Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)
>> e.g., VecAXPY() for real vectors of length N
>> --> 2N flops
>> and VecAXPY() for complex vectors of length N
>> --> 8N flops
>>
>> Summary of Stages:   - Time --  - Flops -  --- Messages
>> ---  -- Message Lengths --  -- Reductions --
>> Avg %Total Avg %Total   counts
>> %Total Avg %Total   counts   %Total
>>  0:  Main Stage: 1.0391e+01 100.0%  3.5520e+08 100.0%  0.000e+00
>> 0.0%  0.000e+000.0%  0.000e+00   0.0%
>>
>>
>> 
>> See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.
>> Phase summary info:
>>Count: number of times phase was executed
>>Time and Flops: Max - maximum over all processors
>>Ratio - ratio of maximum to minimum over all processors
>>Mess: number of messages sent
>>Avg. len: average message length (bytes)
>>Reduct: number of global reductions
>>Global: entire computation
>>Stage: stages of a computation. Set stages with PetscLogStagePush()
>> and PetscLogStagePop().
>>   %T - percent time in this phase %F - percent flops in this
>> phase
>>   %M - percent messages in this phase %L - percent message
>> lengths in this phase
>>   %R - percent reductions in this phase
>>Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
>> over all processors)
>>
>> 
>>
>>
>>   ##
>>   ##
>>   #  WARNING!!!#
>>   ##
>>   #   This code was compiled with a debugging option,  #
>>   #   To get timing results run ./configure#
>>   #   using --with-debugging=no, the performance will  #
>>   #   be generally two or three times faster.  #
>>   ##
>>   ##
>>
>>
>> EventCount  

Re: [petsc-users] SNES norm control

2016-01-12 Thread Dave May
On 12 January 2016 at 14:14, Gideon Simpson 
wrote:

> That seems to to allow for me to cook up a convergence test in terms of
> the 2 norm.
>

While you are only provided the 2 norm of F, you are also given access to
the SNES object. Thus inside your user convergence test function, you can
call SNESGetFunction() and SNESGetSolution(), then you can compute your
convergence criteria and set the converged reason to whatever you want.

See

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html
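
A rough sketch of such a test (untested; the scaling, tolerance and names are
placeholders - register it with SNESSetConvergenceTest(snes,MyConverged,NULL,NULL)):

PetscErrorCode MyConverged(SNES snes,PetscInt it,PetscReal xnorm,PetscReal gnorm,
                           PetscReal fnorm,SNESConvergedReason *reason,void *ctx)
{
  Vec            F,X,W;
  PetscReal      scaled_norm;
  PetscErrorCode ierr;

  ierr = SNESGetFunction(snes,&F,NULL,NULL);CHKERRQ(ierr);
  ierr = SNESGetSolution(snes,&X);CHKERRQ(ierr);
  ierr = VecDuplicate(F,&W);CHKERRQ(ierr);
  ierr = VecPointwiseDivide(W,F,X);CHKERRQ(ierr); /* W_i = F_i / x_i; guard against
                                                     zero x_i and take |x_i| as needed */
  ierr = VecNorm(W,NORM_2,&scaled_norm);CHKERRQ(ierr);
  ierr = VecDestroy(&W);CHKERRQ(ierr);

  *reason = SNES_CONVERGED_ITERATING;
  if (scaled_norm < 1.0e-8) *reason = SNES_CONVERGED_FNORM_ABS; /* your tolerance */
  return 0;
}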

Cheers,
  Dave





> What I’m really looking for is the ability to change things to be
> something like the 2 norm of the vector with elements
>
> F_i/|x_i|
>
> where I am looking for a root of F(x).  I can just build that scaling into
> the form function, but is there a way to do it without rewriting that piece
> of the code?
>
>
> -gideon
>
> On Jan 12, 2016, at 12:14 AM, Barry Smith  wrote:
>
>
>   You can use SNESSetConvergenceTest() to use whatever test you want to
> decide on convergence.
>
> Barry
>
> On Jan 11, 2016, at 3:26 PM, Gideon Simpson 
> wrote:
>
> I’m solving nonlinear problem for a complex valued function which is
> decomposed into real and imaginary parts, Q = u + i v.  What I’m finding is
> that where |Q| is small, the numerical phase errors tend to be larger.  I
> suspect this is because it’s using the 2-norm for convergence in the SNES,
> so, where the solution is already, the phase errors are seen as small too.
> Is there a way to use something more like an infinity norm with SNES, to
> get more point wise control?
>
> -gideon
>
>
>
>


Re: [petsc-users] SNES norm control

2016-01-12 Thread Dave May
On 12 January 2016 at 14:33, Gideon Simpson <gideon.simp...@gmail.com>
wrote:

> I’m just a bit confused by the documentation
> for SNESConvergenceTestFunction.  the arguments for the xnorm, gnorm, and f
> are passed in, at the current iterate, correct?
>

Yes, but nothing requires you to use them :D


>  I interpreted this as though I had to build by convergence test based on
> those values.
>

This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm
and define any crazy stopping condition you like.

xnorm, gnorm and fnorm are commonly required for many stopping conditions
and are computed by the SNES methods. As such, they are readily available and
for efficiency and convenience they are provided to the user (e.g. to avoid
you having to re-compute norms).

Cheers,
  Dave


>
> -gideon
>
> On Jan 12, 2016, at 8:24 AM, Dave May <dave.mayhe...@gmail.com> wrote:
>
>
>
> On 12 January 2016 at 14:14, Gideon Simpson <gideon.simp...@gmail.com>
> wrote:
>
>> That seems to to allow for me to cook up a convergence test in terms of
>> the 2 norm.
>>
>
> While you are only provided the 2 norm of F, you are also given access to
> the SNES object. Thus inside your user convergence test function, you can
> call SNESGetFunction() and SNESGetSolution(), then you can compute your
> convergence criteria and set the converged reason to what ever you want.
>
> See
>
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html
>
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html
>
> Cheers,
>   Dave
>
>
>
>
>
>> What I’m really looking for is the ability to change things to be
>> something like the 2 norm of the vector with elements
>>
>> F_i/|x_i|
>>
>> where I am looking for a root of F(x).  I can just build that scaling
>> into the form function, but is there a way to do it without rewriting that
>> piece of the code?
>>
>>
>> -gideon
>>
>> On Jan 12, 2016, at 12:14 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>
>>
>>   You can use SNESSetConvergenceTest() to use whatever test you want to
>> decide on convergence.
>>
>> Barry
>>
>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson <gideon.simp...@gmail.com>
>> wrote:
>>
>> I’m solving nonlinear problem for a complex valued function which is
>> decomposed into real and imaginary parts, Q = u + i v.  What I’m finding is
>> that where |Q| is small, the numerical phase errors tend to be larger.  I
>> suspect this is because it’s using the 2-norm for convergence in the SNES,
>> so, where the solution is already, the phase errors are seen as small too.
>> Is there a way to use something more like an infinity norm with SNES, to
>> get more point wise control?
>>
>> -gideon
>>
>>
>>
>>
>
>


Re: [petsc-users] SNES norm control

2016-01-12 Thread Dave May
On 12 January 2016 at 15:06, <gideon.simp...@gmail.com> wrote:

> Do I have to manually code in the divergence criteria too?
>

Yes.

By calling SNESSetConvergenceTest() you are replacing the default SNES
convergence test function which will get called at each SNES iteration,
therefore you are responsible for defining all reasons for convergence and
divergence.

To make life easy, you could copy everything in the function
SNESConvergedDefault(),

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESConvergedDefault.html#SNESConvergedDefault

and just replace the rule for
  SNES_CONVERGED_FNORM_RELATIVE
with your custom scaled stopping condition.






>
> On Jan 12, 2016, at 8:37 AM, Dave May <dave.mayhe...@gmail.com> wrote:
>
>
>
> On 12 January 2016 at 14:33, Gideon Simpson <gideon.simp...@gmail.com>
> wrote:
>
>> I’m just a bit confused by the documentation
>> for SNESConvergenceTestFunction.  the arguments for the xnorm, gnorm, and f
>> are passed in, at the current iterate, correct?
>>
>
> Yes, but nothing requires you to use them :D
>
>
>>  I interpreted this as though I had to build by convergence test based on
>> those values.
>>
>
> This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm
> and define any crazy stopping condition you like.
>
> xnorm, gnorm and fnorm are commonly required for many stopping conditions
> and are computed by the snes methods. As such, are readily available and
> for efficiency and convenience they are provided to the user (e.g. to avoid
> you having to re-compute norms).
>
> Cheers,
>   Dave
>
>
>>
>> -gideon
>>
>> On Jan 12, 2016, at 8:24 AM, Dave May <dave.mayhe...@gmail.com> wrote:
>>
>>
>>
>> On 12 January 2016 at 14:14, Gideon Simpson <gideon.simp...@gmail.com>
>> wrote:
>>
>>> That seems to to allow for me to cook up a convergence test in terms of
>>> the 2 norm.
>>>
>>
>> While you are only provided the 2 norm of F, you are also given access to
>> the SNES object. Thus inside your user convergence test function, you can
>> call SNESGetFunction() and SNESGetSolution(), then you can compute your
>> convergence criteria and set the converged reason to what ever you want.
>>
>> See
>>
>>
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html
>>
>>
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html
>>
>> Cheers,
>>   Dave
>>
>>
>>
>>
>>
>>> What I’m really looking for is the ability to change things to be
>>> something like the 2 norm of the vector with elements
>>>
>>> F_i/|x_i|
>>>
>>> where I am looking for a root of F(x).  I can just build that scaling
>>> into the form function, but is there a way to do it without rewriting that
>>> piece of the code?
>>>
>>>
>>> -gideon
>>>
>>> On Jan 12, 2016, at 12:14 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>
>>>
>>>   You can use SNESSetConvergenceTest() to use whatever test you want to
>>> decide on convergence.
>>>
>>> Barry
>>>
>>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson <gideon.simp...@gmail.com>
>>> wrote:
>>>
>>> I’m solving nonlinear problem for a complex valued function which is
>>> decomposed into real and imaginary parts, Q = u + i v.  What I’m finding is
>>> that where |Q| is small, the numerical phase errors tend to be larger.  I
>>> suspect this is because it’s using the 2-norm for convergence in the SNES,
>>> so, where the solution is already, the phase errors are seen as small too.
>>> Is there a way to use something more like an infinity norm with SNES, to
>>> get more point wise control?
>>>
>>> -gideon
>>>
>>>
>>>
>>>
>>
>>
>


Re: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock

2016-01-05 Thread Dave May
The manpage

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetDiagonalBlock.html
indicates the reference counter on the returned matrix (a) isn't
incremented.

This statement would imply that in the absence of calling
PetscObjectReference() yourself, you should not call MatDestroy() on the
matrix returned.
If you do call MatDestroy(), a double free will occur when you call
MatDestroy() on the parent matrix from which you pulled the block matrix.
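
If you do want to manage the block yourself, a minimal sketch (untested;
A and Ad are placeholder names) is:

Mat Ad;
ierr = MatGetDiagonalBlock(A,&Ad);CHKERRQ(ierr);            /* borrowed reference */
ierr = PetscObjectReference((PetscObject)Ad);CHKERRQ(ierr);
/* ... use Ad ... */
ierr = MatDestroy(&Ad);CHKERRQ(ierr);                       /* now this is safe */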

Cheers,
  Dave

On 6 January 2016 at 00:14, Bhalla, Amneet Pal S 
wrote:

> Hi Folks,
>
> Is it safe to call MatDestroy on the sequential matrix returned by
> MatGetDiagonalBlock() after it’s no longer used?
>
>
> Thanks,
>
> — Amneet
> =
> Amneet Bhalla
> Postdoctoral Research Associate
> Department of Mathematics and McAllister Heart Institute
> University of North Carolina at Chapel Hill
> Email: amn...@unc.edu
> Web:  https://abhalla.web.unc.edu
> =
>
>


Re: [petsc-users] Slow MatAssemblyBegin MAT_FINAL_ASSEMBLY

2015-12-17 Thread Dave May
On 17 December 2015 at 08:06, Jose A. Abell M.  wrote:

> Hello dear PETSc users,
>
> This is a problem that pops up often, from what I see, in the mailing
> list. My program takes a long time assembling the matrix.
>
> What I know:
>
>
>- Matrix Size is (MatMPIAIJ) 2670402
>- Number of processes running PETSc: 95
>- Not going to virtual memory (no swapping, used mem well withing each
>node's capacity)
>- System is partitioned with ParMETIS for load balancing
>- I see memory moving around in each node (total used memory changes a
>bit, grows and then frees)
>- Matrix is filled in blocks of size 81x81 (FEM code, so this ends up
>being a sparse matrix)
>- I don't do flushes at all. Only MAT_FINAL_ASSEMBLY when all the
>MatSetValues are done.
>
> Should I do MAT_FLUSH_ASSEMBLY even though I have enough memory to store
> the buffers? If so, how often? Every 100 blocks?
>
> What else could it be?
>
> Its taking several hours to asseble this matrix. I re-use the sparsity
> pattern, so subsequent assemblies are fast. Does this mean that my
> preallocation is wrong?
>

The preallocation could be wrong. That is the usual cause of very slow
matrix assembly. To confirm this hypothesis, run your code with the command
line option -info. You will get an enormous amount of information in
stdout. You might consider using -info with a smallish problem size / core
count.

Inspect the output generated by -info and look for lines like this:

[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0

If the number of mallocs during MatSetValues() is not zero, then your
preallocation is not exactly correct. A small number of mallocs, say less
than 10, might be acceptable (performance-wise). However, if the number of
mallocs is > 100, then assembly time will be terribly slow.

Thanks,
  Dave




>
> Regards,
>
>
>
>
> --
>
> José Abell
> *PhD Candidate*
> Computational Geomechanics Group
> Dept. of Civil and Environmental Engineering
> UC Davis
> www.joseabell.com
>
>


Re: [petsc-users] Big discrepancy between machines

2015-12-17 Thread Dave May
On 17 December 2015 at 11:00, Timothée Nicolas <timothee.nico...@gmail.com>
wrote:

> Hi,
>
> So, valgrind is OK (at least on the local machine. Actually on the cluster
> helios, it produces strange results even for the simplest petsc program
> PetscInitialize followed by PetscFinalize, I will try to figure this out
> with their technical team), and I have also tried with exactly the same
> versions (3.6.0) and it does not change the behavior.
>
> So now I would like to now how to have a grip on what comes in and out of
> the SNES and the KSP internal to the SNES. That is, I would like to inspect
> manually the vector which enters the SNES in the first place (should be
> zero I believe), what is being fed to the KSP, and the vector which comes
> out of it, etc. if possible at each iteration of the SNES. I want to
> actually *see* these vectors, and compute there norm by hand. The trouble
> is, it is really hard to understand why the newton residuals are not
> reduced since the KSP converges so nicely. This does not make any sense to
> me, so I want to know what happens to the vectors. But on the SNES list of
> routines, I did not find the tools that would allow me to do that (and
> messing around with the C code is too hard for me, it would take me weeks).
> Does someone have a hint ?
>

The only sane way to do this is to write a custom monitor for your SNES
object.

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESMonitorSet.html

Inside your monitor, you have access to the SNES, and everything it defines,
e.g. the current solution, non-linear residual, KSP etc. See these pages

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetKSP.html

Then you can pull apart the residual and compute specific norms (or plot
the residual).
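
A rough sketch in C of such a monitor (untested; names are placeholders, and
the Fortran interface is analogous) - attach it with
SNESMonitorSet(snes,MyMonitor,NULL,NULL):

PetscErrorCode MyMonitor(SNES snes,PetscInt it,PetscReal fnorm,void *ctx)
{
  Vec            X,F;
  PetscReal      xnorm,finf;
  PetscErrorCode ierr;

  ierr = SNESGetSolution(snes,&X);CHKERRQ(ierr);
  ierr = SNESGetFunction(snes,&F,NULL,NULL);CHKERRQ(ierr);
  ierr = VecNorm(X,NORM_2,&xnorm);CHKERRQ(ierr);
  ierr = VecNorm(F,NORM_INFINITY,&finf);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"it %D: |F|_2 %1.12e |F|_inf %1.12e |x|_2 %1.12e\n",
                     it,(double)fnorm,(double)finf,(double)xnorm);CHKERRQ(ierr);
  /* VecView(F,PETSC_VIEWER_STDOUT_WORLD) would dump the full residual */
  return 0;
}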

Hopefully you can access everything you need to perform your analysis.

Cheers,
  Dave


>
> Thx
>
> Timothee
>
>
>
>
> 2015-12-15 14:20 GMT+09:00 Matthew Knepley <knep...@gmail.com>:
>
>> On Mon, Dec 14, 2015 at 11:06 PM, Timothée Nicolas <
>> timothee.nico...@gmail.com> wrote:
>>
>>> There is a diference in valgrind indeed between the two. It seems to be
>>> clean on my desktop Mac OS X but not on the cluster. I'll try to see what's
>>> causing this. I still don't understand well what's causing memory leaks in
>>> the case where all PETSc objects are freed correctly (as can pbe checked
>>> with -log_summary).
>>>
>>> Also, I have tried running either
>>>
>>> valgrind ./my_code -option1 -option2...
>>>
>>> or
>>>
>>> valgrind mpiexec -n 1 ./my_code -option1 -option2...
>>>
>>
>> Note here you would need --trace-children=yes for valgrind.
>>
>>   Matt
>>
>>
>>> It seems the second is the correct way to proceed right ? This gives
>>> very different behaviour for valgrind.
>>>
>>> Timothee
>>>
>>>
>>>
>>> 2015-12-14 17:38 GMT+09:00 Timothée Nicolas <timothee.nico...@gmail.com>
>>> :
>>>
>>>> OK, I'll try that, thx
>>>>
>>>> 2015-12-14 17:38 GMT+09:00 Dave May <dave.mayhe...@gmail.com>:
>>>>
>>>>> You have the configure line, so it should be relatively straight
>>>>> forward to configure / build petsc in your home directory.
>>>>>
>>>>>
>>>>> On 14 December 2015 at 09:34, Timothée Nicolas <
>>>>> timothee.nico...@gmail.com> wrote:
>>>>>
>>>>>> OK, The problem is that I don't think I can change this easily as far
>>>>>> as the cluster is concerned. I obtain access to petsc by loading the 
>>>>>> petsc
>>>>>> module, and even if I have a few choices, I don't see any debug builds...
>>>>>>
>>>>>> 2015-12-14 17:26 GMT+09:00 Dave May <dave.mayhe...@gmail.com>:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Monday, 14 December 2015, Timothée Nicolas <
>>>>>>> timothee.nico...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hum, OK. I use FORTRAN by the way. Is your comment still valid ?
>>>>>>>>
>>>>>>>
>>>>>>> No. Fortran compilers init variables to zero.
>>>>>>> In this case, I would run a debug 

Re: [petsc-users] Slow MatAssemblyBegin MAT_FINAL_ASSEMBLY

2015-12-17 Thread Dave May
On Thursday, 17 December 2015, Jose A. Abell M. <jaab...@ucdavis.edu> wrote:

> Thank you Dave!
>
> Do you have a rough idea of how long a matrix like that should take to
> assemble?
> Not hours. Right?
>

If the preallocation is correct, and most of the entries to be inserted
live locally (and don't need to be scattered to another rank), it should
definitely not take hours.



>
> Regards,
> Jose
>
> --
>
> José Abell
> *PhD Candidate*
> Computational Geomechanics Group
> Dept. of Civil and Environmental Engineering
> UC Davis
> www.joseabell.com
>
>
>> On Thu, Dec 17, 2015 at 12:58 AM, Dave May <dave.mayhe...@gmail.com> wrote:
>
>>
>>
>> On 17 December 2015 at 08:06, Jose A. Abell M. <jaab...@ucdavis.edu> wrote:
>>
>>> Hello dear PETSc users,
>>>
>>> This is a problem that pops up often, from what I see, in the mailing
>>> list. My program takes a long time assembling the matrix.
>>>
>>> What I know:
>>>
>>>
>>>- Matrix Size is (MatMPIAIJ) 2670402
>>>- Number of processes running PETSc: 95
>>>- Not going to virtual memory (no swapping, used mem well within
>>>each node's capacity)
>>>- System is partitioned with ParMETIS for load balancing
>>>- I see memory moving around in each node (total used memory changes
>>>a bit, grows and then frees)
>>>- Matrix is filled in blocks of size 81x81 (FEM code, so this ends
>>>up being a sparse matrix)
>>>- I don't do flushes at all. Only MAT_FINAL_ASSEMBLY when all the
>>>MatSetValues are done.
>>>
>>> Should I do MAT_FLUSH_ASSEMBLY even though I have enough memory to
>>> store the buffers? If so, how often? Every 100 blocks?
>>>
>>> What else could it be?
>>>
>>> It's taking several hours to assemble this matrix. I re-use the sparsity
>>> pattern, so subsequent assemblies are fast. Does this mean that my
>>> preallocation is wrong?
>>>
>>
>> The preallocation could be wrong. That is the usual cause of very slow
>> matrix assembly. To confirm this hypothesis, run your code with the command
>> line option -info. You will get an enormous amount of information in
>> stdout. You might consider using -info with a smallish problem size / core
>> count.
>>
>> Inspect the output generated by -info and look for lines like this:
>>
>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>
>> If the number of mallocs during MatSetValues() is not zero, then your
>> preallocation is not exactly correct. A small number of mallocs, say less
>> than 10, might be acceptable (performance-wise). However, if the number of
>> mallocs is > 100, then assembly time will be terribly slow.
>>
>> Thanks,
>>   Dave
>>
>>
>>
>>
>>>
>>> Regards,
>>>
>>>
>>>
>>>
>>> --
>>>
>>> José Abell
>>> *PhD Candidate*
>>> Computational Geomechanics Group
>>> Dept. of Civil and Environmental Engineering
>>> UC Davis
>>> www.joseabell.com
>>>
>>>
>>
>


Re: [petsc-users] Big discrepancy between machines

2015-12-14 Thread Dave May
One suggestion is you have some uninitialized variables in your pcshell.
Despite your arch being called "debug", your configure options indicate you
have turned debugging off.

The C standard doesn't prescribe how uninitialized variables should be treated - the
behavior is labelled as undefined. As a result, different compilers on
different archs with the same optimization flags can and will treat uninitialized
variables differently. I find OS X C compilers tend to set them to zero.

I suggest compiling a debug build on both machines and trying your
test again. Also, consider running the debug builds through valgrind.
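
For instance (a sketch only; substitute your own configure options, binary
name, run-time options and process count):

  ./configure --with-debugging=1 [your usual configure options]
  mpiexec -n 2 valgrind --track-origins=yes --log-file=valgrind.%p.log ./my_code -option1 -option2

Launching valgrind under mpiexec (rather than the other way around) means each
rank is instrumented directly, and %p gives one log file per process.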

Thanks,
  Dave

On Monday, 14 December 2015, Timothée Nicolas wrote:

> Hi,
>
> I have noticed I have a VERY big difference in behaviour between two
> machines in my problem, solved with SNES. I can't explain it, because I
> have tested my operators which give the same result. I also checked that
> the vectors fed to the SNES are the same. The problem happens only with my
> shell preconditioner. When I don't use it, and simply solve using -snes_mf,
> I don't see anymore than the usual 3-4 changing digits at the end of the
> residuals. However, when I use my pcshell, the results are completely
> different between the two machines.
>
> I have attached output_SuperComputer.txt and output_DesktopComputer.txt,
> which correspond to the output from the exact same code and options (and of
> course same input data file !). More precisely
>
> output_SuperComputer.txt : output on a supercomputer called Helios, sorry
> I don't know the exact specs.
> In this case, the SNES norms are reduced successively:
> 0 SNES Function norm 4.867111712420e-03
> 1 SNES Function norm 5.632325929998e-08
> 2 SNES Function norm 7.427800084502e-15
>
> output_DesktopComputer.txt : output on a Mac OS X Yosemite 3.4 GHz Intel
> Core i5 16GB 1600 MHz DDR3. (The same happens on another laptop with Mac
> OS X Mavericks).
> In this case, I obtain the following for the SNES norms:
> 0 SNES Function norm 4.867111713544e-03
> 1 SNES Function norm 1.56009405e-03
> 2 SNES Function norm 1.552118650943e-03
> 3 SNES Function norm 1.552106297094e-03
> 4 SNES Function norm 1.552106277949e-03
> which I can't explain, because otherwise the KSP residual (with the same
> operator, which I checked) behave well.
>
> As you can see, the first time the preconditioner is applied (DB_, DP_,
> Drho_ and PS_ solves), the two outputs coincide (except for the last few
> digits, up to 9 actually, which is more than I would expect), and
> everything starts to diverge at the first print of the main KSP (the one
> stemming from the SNES) residual norms.
>
> Do you have an idea what may cause such a strange behaviour ?
>
> Best
>
> Timothee
>

