Re: [petsc-users] PetscBarrier from fortran

2018-11-07 Thread Smith, Barry F. via petsc-users

   This comes from the new type checking for Fortran code.

   I suggest you just call MPI_Barrier() using MPI_COMM_WORLD from Fortran.

   Barry


> On Nov 6, 2018, at 11:37 PM, Marius Buerkle via petsc-users 
>  wrote:
> 
> Hi
>  
> When calling PetscBarrier from Fortran using "call 
> PetscBarrier(PETSC_NULL_MAT,ierr)" with the latest PETSc version, 3.10.2, I get 
> the following error: "Error: Type mismatch in argument ‘a’ at (1); passed 
> TYPE(tmat) to INTEGER(8)". I did not compile PETSc with 64-bit integers. It 
> works with previous versions.
>  
> best,
> Marius



Re: [petsc-users] PETSc (3.9.0) GAMG weak scaling test issue

2018-11-07 Thread Mark Adams via petsc-users
First I would add -gamg_est_ksp_type cg

You seem to be converging well so I assume you are setting the null space
for GAMG.

Note, you should test hypre also.

You probably want a bigger "-pc_gamg_process_eq_limit 50": 200 at least, but
test your machine with a range of values on the largest problem. This is a
parameter for reducing the number of active processors (on coarse grids).

I would only worry about "load3". This has 16K equations per process, which
is where you start noticing "strong scaling" problems, depending on the
machine.

An important parameter is "-pc_gamg_square_graph 0". I would probably start
with infinity (eg, 10).
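
Putting the suggestions above together, a starting option set to sweep might
look like this (just a sketch; these values are starting points, not final
settings):

-gamg_est_ksp_type cg
-pc_gamg_square_graph 10
-pc_gamg_process_eq_limit 200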

Now, I'm not sure about your domain, problem sizes, and thus the weak
scaling design. You seem to be scaling on the background mesh, but that may
not be a good proxy for complexity.

You can look at the number of flops and scale it appropriately by the
number of solver iterations to get a relative size of the problem. I would
recommend scaling the number of processors with this. For instance, here are
the MatMult lines for the 4-proc and the 16K-proc runs:


Event     Count      Time (sec)          Flop                                 --- Global ---   --- Stage ---    Total
          Max  Ratio Max        Ratio    Max      Ratio  Mess     Avg len  Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s

MatMult    636  1.0  1.9035e-01    1.0   3.12e+08  1.1   7.6e+03  3.0e+03  0.0e+00  0 47 62 44  0   0 47 62 44  0     6275  [2 procs]
MatMult   1416  1.0  1.9601e+00 2744.6   4.82e+08  0.0   4.3e+08  7.2e+02  0.0e+00  0 48 50 48  0   0 48 50 48  0  2757975  [16K procs]

Now, you have empty processors. See the massive load imbalance in time and
the zero for flops. The "Ratio" is max/min, and clearly min=0, so PETSc
reports a ratio of 0 (it is really infinity).

Also, weak scaling on a thin body (I don't know your domain) is a little
funny because as the problem scales up the mesh becomes more 3D and this
causes the cost per equation to go up. That is why I prefer to use the
number of non-zeros as the processor scaling function, but the number of
equations is easier ...

The PC setup times are large (I see 48 seconds at 16K but you report 16).
-pc_gamg_square_graph 10 should help that.

The max number of flops per processor in MatMult goes up by 50%, the max
time goes up by 10x, and the number of iterations goes up by 13/8. If I put
all of this together, I get that about 75% of the time at 16K processes is in
communication. I think that, and the absolute time, can be improved some by
optimizing parameters as I've suggested.

Mark





On Wed, Nov 7, 2018 at 11:03 AM "Alberto F. Martín" via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear All,
>
> we are performing a weak scaling test of the PETSc (v3.9.0) GAMG
> preconditioner when applied to the linear system arising
> from the conforming unfitted FE discretization (using Q1 Lagrangian
> FEs) of a 3D PDE Poisson problem, where
> the boundary of the domain (a popcorn flake)  is described as a
> zero-level-set embedded within a uniform background
> (Cartesian-like) hexahedral mesh. Details underlying the FEM formulation
> can be made available on demand if you
> believe that this might be helpful, but let me just point out that it is
> designed such that it addresses the well-known
> ill-conditioning issues of unfitted FE discretizations due to the small
> cut cell problem.
>
> The weak scaling test is set up as follows. We start from a single cube
> background mesh, and refine it uniformly several
> steps, until we have approximately either 10**3 (load1), 20**3 (load2), or
> 40**3 (load3) hexahedra/MPI task when
> distributing it over 4 MPI tasks. The benchmark is scaled such that the
> next larger scale problem to be tested is obtained
> by uniformly refining the mesh from the previous scale and running it on
> 8x times the number of MPI tasks that we used
> in the previous scale.  As a result, we obtain three weak scaling curves
> for each of the three fixed loads per MPI task
> above, on the following total number of MPI tasks: 4, 32, 262, 2097,
> 16777. The underlying mesh is not partitioned among
> MPI tasks using ParMETIS (unstructured multilevel graph partitioning)  nor
> optimally by hand, but following the so-called
> z-shape space-filling curves provided by an underlying octree-like mesh
> handler (i.e., p4est library).
>
> I configured the preconditioned linear solver as follows:
>
> -ksp_type cg
> -ksp_monitor
> -ksp_rtol 1.0e-6
> -ksp_converged_reason
> -ksp_max_it 500
> -ksp_norm_type unpreconditioned
> -ksp_view
> -log_view
>
> -pc_type gamg
> -pc_gamg_type agg
> -mg_levels_esteig_ksp_type cg
> -mg_coarse_sub_pc_type cholesky
> -mg_coarse_sub_pc_factor_mat_ordering_type nd
> -pc_gamg_process_eq_limit 50
> -pc_gamg_square_graph 0
> 

Re: [petsc-users] [petsc-maint] Correct use of PCFactorSetMatOrderingType

2018-11-07 Thread Mark Adams via petsc-users
please respond to petsc-users.

You are doing 5 solves here in 14 seconds. You seem to be saying that the
two pressure solves are taking all of this time. I don't know why the two
solves are different.

You seem to be saying that OpenFOAM solves the problem in 10 seconds and
PETSc solves it in 14 seconds. Is that correct? Hypre seems to be running
fine.
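
(For reference, the checks discussed further down in the quoted thread amount
to adding something like

-ksp_monitor
-log_view

to the run and looking at the pressure KSP's iteration counts and at the
KSPSolve line in the -log_view summary.)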



On Wed, Nov 7, 2018 at 11:24 AM Edoardo alinovi 
wrote:

> Thanks a lot, Mark, for your kind reply. The solver is mine and I use
> PETSc for the solution of momentum and pressure. The first is solved very
> fast by a standard bcgs + bjacobi, but the pressure is the source of all
> evils and, unfortunately, I am pretty sure that almost all the time within
> the time-step is spent by KSP solving the pressure (see log attached). I
> have verified this also by putting a couple of mpi_wtime calls around the
> kspsolve call. The pressure is solved 2 times (1 prediction + 1 correction);
> the prediction takes around 11s, the correction around 4s (here I avoid
> recomputing the preconditioner), and all the rest of the code (flux
> assembling + momentum solution + others) around 1s. OpenFOAM does the same
> procedure with the same tolerance in 10s using its GAMG version (50
> iterations to converge). The number of iterations required to solve the
> pressure with hypre is 12. GAMG performs similarly to hypre in terms of
> speed, but with 50 iterations to converge. Am I missing something in the
> setup, in your opinion?
>
> thanks a lot,
>
> Edo
>
> --
>
> Edoardo Alinovi, Ph.D.
>
> DICCA, Scuola Politecnica
> Universita' di Genova
> 1, via Montallegro
> 16145 Genova, Italy
>
> email: edoardo.alin...@dicca.unige.it
> Tel: +39 010 353 2540
>
>
>
>
> On Wed, Nov 7, 2018 at 16:50 Mark Adams  wrote:
>
>> You can try -pc_type gamg, but hypre is a pretty good solver for the
>> Laplacian. If hypre is just a little faster than LU on a 3D problem (that
>> takes 10 seconds to solve) then AMG is not doing well. I would expect that
>> AMG is taking a lot of iterations (eg, >> 10). You can check that with
>> -ksp_monitor.
>>
>> The PISO algorithm is a multistage algorithm with a pressure correction
>> in it. It also has a solve for the velocity, from what I can tell. Are you
>> building PISO yourself and using PETSc just for the pressure correction?
>> Are you sure the time is spent in this solver? You can use -log_view to see
>> performance numbers and look for KSPSolve to see how much time is spent in
>> the PETSc solver.
>>
>> Mark
>>
>>
>> On Wed, Nov 7, 2018 at 10:26 AM Zhang, Hong via petsc-maint <
>> petsc-ma...@mcs.anl.gov> wrote:
>>
>>> Edoardo:
>>> Forwarding your request to petsc-maint, where you can get fast and expert
>>> advice. I do not have a suggestion for your application, but someone on
>>> our team likely will make a suggestion.
>>> Hong
>>>
>>> Hello Hong,

 Well, using -sub_pc_type lu it is super slow. I am desperately trying
 to enhance the performance of my code (CFD, finite volume, PISO algorithm); in
 particular I have a strong bottleneck in the solution of the pressure
 correction equation, which takes almost 90% of the computational time. Using
 multigrid as a preconditioner (hypre with default options) is slightly
 better, but comparing the results against the multigrid used in OpenFOAM,
 my code is losing 10s/iteration, which is a huge amount of time. Now, since
 all the time is spent in KSPSolve, I feel a bit powerless.  Do you
 have any helpful advice?

 Thank you very much!
 --

 Edoardo Alinovi, Ph.D.

 DICCA, Scuola Politecnica
 Universita' di Genova
 1, via Montallegro
 16145 Genova, Italy

 email: edoardo.alin...@dicca.unige.it
 Tel: +39 010 353 2540




 On Tue, Nov 6, 2018 at 17:15 Zhang, Hong  wrote:

> Edoardo:
> Interesting. I thought it would not affect performance much. What
> happens if you use '-sub_pc_type lu'?
> Hong
>
> Dear Hong and Matt,
>>
>> thank you for your kind reply. I have just tested your suggestions
>> and applied "-sub_pc_type ilu -sub_pc_factor_mat_ordering_type nd/rcm"
>> and, in both cases, I have found a deterioration of performance
>> with respect to doing nothing (i.e., just using the default PCBJACOBI). Is
>> it normal? However, I guess this is very problem dependent.
>> --
>>
>> Edoardo Alinovi, Ph.D.
>>
>> DICCA, Scuola Politecnica
>> Universita' di Genova
>> 1, via Montallegro
>> 16145 Genova, Italy
>>
>> email: edoardo.alin...@dicca.unige.it
>> Tel: +39 010 353 2540
>>
>>
>>
>>
>> On Tue, Nov 6, 2018 at 16:04 Zhang, Hong <
>> hzh...@mcs.anl.gov> wrote:
>>
>>> Edoardo:
>>> You can test runtime option '-sub_pc_factor_mat_ordering_type' and
>>> use '-log_view' to get performance on different orderings,
>>> 
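
For what it's worth, a sketch of such an ordering comparison could be one run
per line below, then comparing the PCSetUp and KSPSolve lines of each
-log_view summary (ordering names as accepted by
-sub_pc_factor_mat_ordering_type):

-sub_pc_type ilu -sub_pc_factor_mat_ordering_type natural -log_view
-sub_pc_type ilu -sub_pc_factor_mat_ordering_type nd -log_view
-sub_pc_type ilu -sub_pc_factor_mat_ordering_type rcm -log_view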

Re: [petsc-users] Vec, Mat and binaryfiles.

2018-11-07 Thread Jed Brown via petsc-users
Please always use "reply-all" so that your messages go to the list.
This is standard mailing list etiquette.  It is important to preserve
threading for people who find this discussion later and so that we do
not waste our time re-answering the same questions that have already
been answered in private side-conversations.  You'll likely get an
answer faster that way too.

Sal Am  writes:

> Thank you Jed for the quick response!
>
>>
>> Yes, of course the formats would have to match.  I would recommend
>> writing the files in an existing format such as PETSc's binary format.
>>
>
> Unfortunately I do not think I can change the source code to output those
> two files in PETSc format and it would probably take a very long time
> converting everything into PETSc (it is not even my code).

File-based workflows are very often bottlenecks.  You can also use any
convenient software (e.g., Python or Matlab/Octave) to convert your
custom binary formats to PETSc binary format (see PetscBinaryIO provided
with PETSc), at which point you'll be able to read in parallel.  If you
don't care about scalability or only need to read once, then you can
write code of the type you propose.
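
For example, here is a minimal sketch of such a conversion with the
PetscBinaryIO module that ships in $PETSC_DIR/lib/petsc/bin (the file name,
dtype, and complex-scalar flag below are assumptions that have to match how
Vector_b.bin was actually written and how PETSc was configured):

  # convert_b.py -- turn a raw binary vector into a PETSc binary Vec (sketch)
  import numpy as np
  import PetscBinaryIO  # add $PETSC_DIR/lib/petsc/bin to PYTHONPATH

  # Assumption: the file is a flat array of double-precision complex scalars.
  b = np.fromfile('Vector_b.bin', dtype=np.complex128)

  # complexscalars should match a PETSc built with --with-scalar-type=complex;
  # see the module docstring for the precision/indices keywords as well.
  io = PetscBinaryIO.PetscBinaryIO(complexscalars=True)
  io.writeBinaryFile('b.petsc', [b.view(PetscBinaryIO.Vec)])

The resulting file can then be read in parallel with PetscViewerBinaryOpen()
and VecLoad(); PetscBinaryIO also has routines for sparse matrices (check the
module for the exact calls in your PETSc version).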

> Do you have any other suggestions on how to read in those two complex
> binary files? Also, why would it be more difficult to parallelise? I
> thought getting the two files into PETSc vector format would allow me to
> use the rest of the PETSc library in the usual way.
>
> Kind regards,
> S
>
>
> On Mon, Nov 5, 2018 at 4:49 PM Jed Brown  wrote:
>
>> Sal Am via petsc-users  writes:
>>
>> > Hi,
>> >
>> > I am trying to solve an Ax=b complex system. The vector b and "matrix" A
>> > are both binary and NOT created by PETSc. So I keep getting error messages
>> > that they are not in the correct format when I read the files with
>> > PetscViewerBinaryOpen; after some digging it seems that one cannot just
>> > read a binary file that was created by other software.
>>
>> Yes, of course the formats would have to match.  I would recommend
>> writing the files in an existing format such as PETSc's binary format.
>> While the method you describe can be made to work, it will be more work
>> to make it parallel.
>>
>> > How would I go on to solve this problem?
>> >
>> > More info and trials:
>> >
>> > "matrix" A consists of two files, one that contains row column index
>> > numbers and one that contains the non-zero values. So what I would have
>> to
>> > do is multiply the last term in a+b with PETSC_i to get a real +
>> imaginary
>> > vector A.
>> >
>> > vector b is in binary, so what I have done so far (not sure if it works)
>> is:
>> >
>> > std::ifstream input("Vector_b.bin", std::ios::binary);
>> > while (input.read(reinterpret_cast<char *>(&v), sizeof(float)))
>> >  ierr = VecSetValues(u, 1, &i, &v, INSERT_VALUES);CHKERRQ(ierr);
>> >
>> > where v is a PetscScalar.
>> >
>> > Once I am able to read both matrices I think I can figure out the solvers
>> > to solve the system.
>> >
>> > All the best,
>> > S
>>


Re: [petsc-users] Problems about PCtype bjacobi

2018-11-07 Thread Mark Adams via petsc-users
On Wed, Nov 7, 2018 at 10:16 AM Yingjie Wu via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear Petsc developer:
> Hi,
> Recently, while solving nonlinear systems of PDEs, I encountered some
> problems with preconditioning and wanted to seek help.
>
> 1. I set the preconditioning matrix in SNES to MPIAIJ in the program, and
> then use the matrix-free method to solve my problem. The log information of
> the program is as follows:
>
> SNES Object: 1 MPI processes
>   type: newtonls
>   maximum iterations=50, maximum function evaluations=1
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=177
>   total number of function evaluations=371
>   norm schedule ALWAYS
>   SNESLineSearch Object: 1 MPI processes
> type: bt
>   interpolation: cubic
>   alpha=1.00e-04
> maxstep=1.00e+08, minlambda=1.00e-12
> tolerances: relative=1.00e-08, absolute=1.00e-15,
> lambda=1.00e-08
> maximum iterations=40
>   KSP Object: 1 MPI processes
> type: gmres
>   restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>   happy breakdown tolerance 1e-30
> maximum iterations=1, initial guess is zero
> tolerances:  relative=0.01, absolute=1e-50, divergence=1.
> left preconditioning
> using PRECONDITIONED norm type for convergence test
>   PC Object: 1 MPI processes
> type: bjacobi
>   number of blocks = 1
>   Local solve is same for all blocks, in the following KSP and PC
> objects:
>   KSP Object: (sub_) 1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
>   PC Object: (sub_) 1 MPI processes
> type: bjacobi
>   number of blocks = 1
>   Local solve is same for all blocks, in the following KSP and PC
> objects:
>   KSP Object: (sub_sub_) 1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
>   PC Object: (sub_sub_) 1 MPI processes
>     type: ilu
>       out-of-place factorization
>       0 levels of fill
>       tolerance for zero pivot 2.22045e-14
>       matrix ordering: natural
>       factor fill ratio given 1., needed 1.
>         Factored matrix follows:
>           Mat Object: 1 MPI processes
>             type: seqaij
>             rows=961, cols=961
>             package used to perform factorization: petsc
>             total: nonzeros=4129, allocated nonzeros=4129
>             total number of mallocs used during MatSetValues calls =0
>               not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=961, cols=961
>   total: nonzeros=4129, allocated nonzeros=4805
>   total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
>   type: mpiaij
>   rows=961, cols=961
>   total: nonzeros=4129, allocated nonzeros=9610
>  total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
> linear system matrix followed by preconditioner matrix:
> Mat Object: 1 MPI processes
>   type: mffd
>   rows=961, cols=961
> Matrix-free approximation:
>   err=1.49012e-08 (relative error in function evaluation)
>   Using wp compute h routine
>   Does not compute normU
> Mat Object: 1 MPI processes
>   type: mpiaij
>   rows=961, cols=961
>   total: nonzeros=4129, allocated nonzeros=9610
>   total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
>
> Although a parallel matrix is used, the program runs on a single processor.
> Because parallel matrices are used, the overall preconditioning scheme should
> be bjacobi, which then builds a KSP for each block (there is only one block
> in my program). Therefore, there will be a sub KSP object in the PC
> information. But in the information above, a sub-sub KSP is also embedded in
> the sub KSP object. I don't understand the reason for this KSP. Please help
> me answer.
>

It looks like you are setting the sub PC type to bjacobi and it has a sub
(sub) PC.
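
A sketch of how that nesting arises in terms of options (inferred from the
prefixes in the view output above; this is to illustrate the difference, not
necessarily your exact command line):

-pc_type bjacobi -sub_pc_type ilu
  -> one KSP/PC per block, so only (sub_) objects appear

-pc_type bjacobi -sub_pc_type bjacobi -sub_sub_pc_type ilu
  -> each block is itself solved with block Jacobi, so nested (sub_sub_)
     KSP/PC objects appear, which is what the output above shows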


>
> 2. Bjacobi is a preconditioning method in theory. Why is there a subsystem of
> linear equations solver