Re: [petsc-users] PetscBarrier from fortran
This comes from the new type checking for Fortran code. I suggest you just call MPI_Barrier() using MPI_COMM_WORLD from Fortran.

   Barry

> On Nov 6, 2018, at 11:37 PM, Marius Buerkle via petsc-users wrote:
>
> Hi
>
> When calling PetscBarrier from Fortran using "call PetscBarrier(PETSC_NULL_MAT,ierr)" with the latest PETSc version, 3.10.2, I get the following error: "Error: Type mismatch in argument ‘a’ at (1); passed TYPE(tmat) to INTEGER(8)". I did not compile PETSc with integer(8). It worked with previous versions.
>
> best,
> Marius
Re: [petsc-users] PETSc (3.9.0) GAMG weak scaling test issue
First I would add -gamg_est_ksp_type cg

You seem to be converging well, so I assume you are setting the null space for GAMG.

Note, you should test hypre also.

You probably want a value larger than the 50 in "-pc_gamg_process_eq_limit 50": 200 at least, but you should test your machine with a range of values on the largest problem. This parameter controls the reduction of the number of active processors (on coarse grids).

I would only worry about "load3". This has 16K equations per process, which is where you start noticing "strong scaling" problems, depending on the machine.

An important parameter is "-pc_gamg_square_graph 0". I would probably start with infinity (e.g., 10).

Now, I'm not sure about your domain, problem sizes, and thus the weak scaling design. You seem to be scaling on the background mesh, but that may not be a good proxy for complexity. You can look at the number of flops and scale it appropriately by the number of solver iterations to get a relative size of the problem. I would recommend scaling the number of processors with this.

For instance, here is the MatMult line for the 4 proc and 16K proc runs:

Event     Count       Time (sec)         Flop                                    --- Global ---  --- Stage ---   Total
          Max  Ratio  Max         Ratio  Max       Ratio  Mess    Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s
MatMult    636   1.0  1.9035e-01    1.0  3.12e+08    1.1  7.6e+03 3.0e+03 0.0e+00  0 47 62 44  0   0 47 62 44  0     6275  [2 procs]
MatMult   1416   1.0  1.9601e+00 2744.6  4.82e+08    0.0  4.3e+08 7.2e+02 0.0e+00  0 48 50 48  0   0 48 50 48  0  2757975  [16K procs]

Now, you have empty processors. See the massive load imbalance on time and the zero on Flops. The "Ratio" is max/min, and clearly min=0, so PETSc reports a ratio of 0 (it is infinity really).

Also, weak scaling on a thin body (I don't know your domain) is a little funny, because as the problem scales up the mesh becomes more 3D and this causes the cost per equation to go up. That is why I prefer to use the number of non-zeros as the processor scaling function, but number of equations is easier ...
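
As an aside on the "Ratio" column described above, here is a minimal Python sketch of a max/min ratio with the zero-minimum case reported as 0. The function name and the per-process flop counts are made up for illustration; only the max/min definition and the "min=0 is reported as 0" behaviour come from the message above.

```python
def load_ratio(per_process_values):
    """Return max/min over processes.

    When the minimum is zero the true ratio is infinite; following the
    behaviour described in the message, report 0.0 in that case.
    """
    largest = max(per_process_values)
    smallest = min(per_process_values)
    if smallest == 0:
        return 0.0
    return largest / smallest


# Hypothetical per-process flop counts (not taken from any real log).
balanced = [3.12e8, 3.00e8, 2.90e8, 2.84e8]
with_idle_process = [4.82e8, 4.10e8, 0.0, 3.90e8]

print(load_ratio(balanced))           # modest imbalance, ratio slightly above 1
print(load_ratio(with_idle_process))  # an idle process makes the ratio print as 0.0
```

A reported flop ratio of 0.0 next to a very large time ratio is therefore a hint that at least one process did no work in that event.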
The PC setup times are large (I see 48 seconds at 16K but you report 16). -pc_gamg_square_graph 10 should help that.

The max number of flops per processor in MatMult goes up by 50%, the max time goes up by 10x, and the number of iterations goes up by 13/8. If I put all of this together I get that 75% of the time at 16K is in communication. I think that and the absolute time can be improved some by optimizing parameters as I've suggested.

Mark

On Wed, Nov 7, 2018 at 11:03 AM "Alberto F. Martín" via petsc-users <petsc-users@mcs.anl.gov> wrote:

> Dear All,
>
> we are performing a weak scaling test of the PETSc (v3.9.0) GAMG preconditioner when applied to the linear system arising from the *conforming unfitted FE discretization* (using Q1 Lagrangian FEs) of a 3D Poisson problem, where the boundary of the domain (a popcorn flake) is described as a zero-level-set embedded within a uniform background (Cartesian-like) hexahedral mesh. Details underlying the FEM formulation can be made available on demand if you believe that this might be helpful, but let me just point out that it is designed such that it addresses the well-known ill-conditioning issues of unfitted FE discretizations due to the small cut cell problem.
>
> The weak scaling test is set up as follows. We start from a single cube background mesh, and refine it uniformly several steps, until we have approximately either 10**3 (load1), 20**3 (load2), or 40**3 (load3) hexahedra/MPI task when distributing it over 4 MPI tasks. The benchmark is scaled such that the next larger scale problem to be tested is obtained by uniformly refining the mesh from the previous scale and running it on 8x the number of MPI tasks that we used in the previous scale. As a result, we obtain three weak scaling curves, one for each of the three fixed loads per MPI task above, on the following total number of MPI tasks: 4, 32, 262, 2097, 16777.
> The underlying mesh is not partitioned among MPI tasks using ParMETIS (unstructured multilevel graph partitioning) nor optimally by hand, but following the so-called z-shape space-filling curves provided by an underlying octree-like mesh handler (i.e., p4est library).
>
> I configured the preconditioned linear solver as follows:
>
> -ksp_type cg
> -ksp_monitor
> -ksp_rtol 1.0e-6
> -ksp_converged_reason
> -ksp_max_it 500
> -ksp_norm_type unpreconditioned
> -ksp_view
> -log_view
>
> -pc_type gamg
> -pc_gamg_type agg
> -mg_levels_esteig_ksp_type cg
> -mg_coarse_sub_pc_type cholesky
> -mg_coarse_sub_pc_factor_mat_ordering_type nd
> -pc_gamg_process_eq_limit 50
> -pc_gamg_square_graph 0
>
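
The 75% communication figure in Mark's reply at the top of this thread can be reproduced with a short calculation. The sketch below is only one reading of how the three growth factors combine, under the assumption that the computational part of the MatMult time grows like (flop growth) x (iteration growth) and that everything beyond that is communication. The function name is hypothetical, and the inputs are the rounded factors quoted in the reply (50% more flops, 13/8 more iterations, 10x more time).

```python
def communication_fraction(flop_growth, iteration_growth, time_growth):
    """Estimate the share of time not explained by extra computation.

    Assumption (not stated explicitly in the thread): with perfect weak
    scaling, time would grow only by flop_growth * iteration_growth.
    """
    expected_time_growth = flop_growth * iteration_growth
    return 1.0 - expected_time_growth / time_growth


# Rounded factors quoted in the reply: flops x1.5, iterations x13/8, time x10.
fraction = communication_fraction(1.5, 13.0 / 8.0, 10.0)
print(f"estimated communication share: {fraction:.2f}")
```

With these inputs the expected growth is about 2.4x against an observed 10x, which leaves roughly three quarters of the time unexplained by computation, in line with the estimate in the reply.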
Re: [petsc-users] [petsc-maint] Correct use of PCFactorSetMatOrderingType
Please respond to petsc-users.

You are doing 5 solves here in 14 seconds. You seem to be saying that the two pressure solves are taking all of this time. I don't know why the two solves are different.

You seem to be saying that OpenFOAM solves the problem in 10 seconds and PETSc solves it in 14 seconds. Is that correct? Hypre seems to be running fine.

On Wed, Nov 7, 2018 at 11:24 AM Edoardo Alinovi wrote:

> Thanks a lot, Mark, for your kind reply. The solver is mine and I use PETSc for the solution of momentum and pressure. The first is solved very fast by a standard bcgs + bjacobi, but the pressure is the source of all evils and, unfortunately, I am pretty sure that almost all the time within the time-step is spent by KSP solving the pressure (see log attached). I have also verified this by putting a couple of mpi_wtime calls around the kspsolve call. The pressure is solved 2 times (1 prediction + 1 correction): the prediction takes around 11s and the correction around 4s (here I avoid recomputing the preconditioner), while all the rest of the code (flux assembly + momentum solution + others) takes around 1s. OpenFOAM does the same procedure with the same tolerance in 10s using its GAMG version (50 iterations to converge). The number of iterations required to solve the pressure with hypre is 12. GAMG performs similarly to hypre in terms of speed, but with 50 iterations to converge. Am I missing something in the setup in your opinion?
>
> thanks a lot,
>
> Edo
>
> --
>
> Edoardo Alinovi, Ph.D.
>
> DICCA, Scuola Politecnica
> Universita' di Genova
> 1, via Montallegro
> 16145 Genova, Italy
>
> email: edoardo.alin...@dicca.unige.it
> Tel: +39 010 353 2540
>
>
> On Wed, Nov 7, 2018 at 16:50, Mark Adams wrote:
>
>> You can try -pc_type gamg, but hypre is a pretty good solver for the Laplacian. If hypre is just a little faster than LU on a 3D problem (that takes 10 seconds to solve) then AMG is not doing well.
>> I would expect that AMG is taking a lot of iterations (e.g., >> 10). You can check that with -ksp_monitor.
>>
>> The PISO algorithm is a multistage algorithm with a pressure correction in it. It also has a solve for the velocity, from what I can tell. Are you building PISO yourself and using PETSc just for the pressure correction? Are you sure the time is spent in this solver? You can use -log_view to see performance numbers and look for KSPSolve to see how much time is spent in the PETSc solver.
>>
>> Mark
>>
>>
>> On Wed, Nov 7, 2018 at 10:26 AM Zhang, Hong via petsc-maint <petsc-ma...@mcs.anl.gov> wrote:
>>
>>> Edoardo:
>>> Forwarding your request to petsc-maint, where you can get fast and expert advice. I do not have a suggestion for your application, but someone in our team likely will.
>>> Hong
>>>
>>> Hello Hong,
>>>
>>> Well, using -sub_pc_type lu it is super slow. I am desperately trying to enhance the performance of my code (CFD, finite volume, PISO algorithm). In particular, I have a strong bottleneck in the solution of the pressure correction equation, which takes almost 90% of the computational time. Using multigrid as preconditioner (hypre with default options) is slightly better, but compared with the multigrid used in OpenFOAM, my code is losing 10s/iteration, which is a huge amount of time. Now, since all the time is spent in KSPSolve, I feel a bit powerless. Do you have any helpful advice?
>>>
>>> Thank you very much!
>>>
>>> --
>>> Edoardo Alinovi, Ph.D.
>>>
>>> On Tue, Nov 6, 2018 at 17:15, Zhang, Hong wrote:
>>>
>>>> Edoardo:
>>>> Interesting. I thought it would not affect performance much. What happens if you use '-sub_pc_type lu'?
>>>> Hong
>>>>
>>>>> Dear Hong and Matt,
>>>>>
>>>>> thank you for your kind reply.
>>>>> I have just tested your suggestions and applied "-sub_pc_type ilu -sub_pc_factor_mat_ordering_type nd/rcm" and, in both cases, I have found a deterioration in performance with respect to doing nothing (thus just using the default PCBJACOBI). Is this normal? However, I guess this is very problem dependent.
>>>>>
>>>>> --
>>>>> Edoardo Alinovi, Ph.D.
>>>>>
>>>>> On Tue, Nov 6, 2018 at 16:04, Zhang, Hong <hzh...@mcs.anl.gov> wrote:
>>>>>
>>>>>> Edoardo:
>>>>>> You can test runtime option '-sub_pc_factor_mat_ordering_type' and use '-log_view' to get performance on different orderings,
>>>>>>
Re: [petsc-users] Vec, Mat and binaryfiles.
Please always use "reply-all" so that your messages go to the list. This is standard mailing list etiquette. It is important to preserve threading for people who find this discussion later and so that we do not waste our time re-answering the same questions that have already been answered in private side-conversations. You'll likely get an answer faster that way too.

Sal Am writes:

> Thank you Jed for the quick response!
>
>> Yes, of course the formats would have to match. I would recommend
>> writing the files in an existing format such as PETSc's binary format.
>
> Unfortunately I do not think I can change the source code to output those two files in PETSc format, and it would probably take a very long time to convert everything into PETSc (it is not even my code).

File-based workflows are very often bottlenecks. You can also use any convenient software (e.g., Python or Matlab/Octave) to convert your custom binary formats to PETSc binary format (see PetscBinaryIO provided with PETSc), at which point you'll be able to read in parallel. If you don't care about scalability or only need to read once, then you can write code of the type you propose.

> Do you have any other suggestions on how to read in those two complex binary files? Also, why would it be more difficult to parallelise? I thought getting the two files into PETSc vector format would allow me to use the rest of the PETSc library the usual way.
>
> Kind regards,
> S
>
>
> On Mon, Nov 5, 2018 at 4:49 PM Jed Brown wrote:
>
>> Sal Am via petsc-users writes:
>>
>> > Hi,
>> >
>> > I am trying to solve an Ax=b complex system. The vector b and "matrix" A are both binary and NOT created by PETSc. So I keep getting error messages that they are not in the correct format when I read the files with PetscViewerBinaryOpen. After some digging, it seems that one cannot just read a binary file that was created by another software.
>>
>> Yes, of course the formats would have to match.
>> I would recommend writing the files in an existing format such as PETSc's binary format. While the method you describe can be made to work, it will be more work to make it parallel.
>>
>> > How would I go about solving this problem?
>> >
>> > More info and trials:
>> >
>> > "matrix" A consists of two files, one that contains row and column index numbers and one that contains the non-zero values. So what I would have to do is multiply the last term in a+b with PETSC_i to get a real + imaginary vector A.
>> >
>> > vector b is in binary, so what I have done so far (not sure if it works) is:
>> >
>> > std::ifstream input("Vector_b.bin", std::ios::binary );
>> > while (input.read(reinterpret_cast(), sizeof(float)))
>> > ierr= VecSetValues(u,1,,,INSERT_VALUES);CHKERRQ(ierr);
>> >
>> > where v is a PetscScalar.
>> >
>> > Once I am able to read both matrices I think I can figure out the solvers to solve the system.
>> >
>> > All the best,
>> > S
>>
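
The conversion route suggested in the reply above (rewrite the custom file once into PETSc binary format, then load it in parallel) could look roughly like the Python sketch below. It uses only the standard library so that it stays self-contained; the PetscBinaryIO module shipped with PETSc remains the supported way to do this. Several things here are assumptions, not facts from the thread: the input layout (interleaved little-endian float32 real/imaginary pairs), the function name, and a PETSc build with complex double-precision scalars and 32-bit indices. The output layout (big-endian Vec class id 1211214, length, then the scalar values) should be checked against PetscBinaryIO for your PETSc version.

```python
import struct

# Class id that marks a Vec in PETSc's binary format (check against PetscBinaryIO).
VEC_FILE_CLASSID = 1211214


def convert_complex_vector(src_path, dst_path):
    """Rewrite a custom complex vector file as a PETSc binary Vec.

    Assumed (hypothetical) input layout: little-endian float32 values
    stored as re0, im0, re1, im1, ...
    Output: big-endian int32 class id, int32 length, then (re, im) pairs
    as float64, matching a complex-scalar PETSc build with 32-bit indices.
    Returns the number of complex entries written.
    """
    with open(src_path, "rb") as src:
        raw = src.read()
    if len(raw) % 8 != 0:
        raise ValueError("input does not contain whole (re, im) float32 pairs")

    n_floats = len(raw) // 4
    values = struct.unpack("<%df" % n_floats, raw)
    n_entries = n_floats // 2

    with open(dst_path, "wb") as dst:
        dst.write(struct.pack(">ii", VEC_FILE_CLASSID, n_entries))
        dst.write(struct.pack(">%dd" % n_floats, *values))
    return n_entries
```

For example, convert_complex_vector("Vector_b.bin", "Vector_b.petsc") would produce a file intended to be opened with PetscViewerBinaryOpen and read with VecLoad. The matrix needs the analogous treatment, with its row, column and value data assembled into PETSc's binary Mat layout.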
Re: [petsc-users] Problems about PCtype bjacobi
On Wed, Nov 7, 2018 at 10:16 AM Yingjie Wu via petsc-users <petsc-users@mcs.anl.gov> wrote:

> Dear PETSc developers:
> Hi,
> Recently, while solving nonlinear systems of PDEs, I encountered some problems with preconditioning and would like to ask for help.
>
> 1. I set the preconditioner matrix in SNES as MPIAIJ in the program, and then use the matrix-free method to solve my problem. The log information of the program is as follows:
>
> SNES Object: 1 MPI processes
>   type: newtonls
>   maximum iterations=50, maximum function evaluations=1
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=177
>   total number of function evaluations=371
>   norm schedule ALWAYS
>   SNESLineSearch Object: 1 MPI processes
>     type: bt
>       interpolation: cubic
>       alpha=1.00e-04
>     maxstep=1.00e+08, minlambda=1.00e-12
>     tolerances: relative=1.00e-08, absolute=1.00e-15, lambda=1.00e-08
>     maximum iterations=40
>   KSP Object: 1 MPI processes
>     type: gmres
>       restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>       happy breakdown tolerance 1e-30
>     maximum iterations=1, initial guess is zero
>     tolerances: relative=0.01, absolute=1e-50, divergence=1.
>     left preconditioning
>     using PRECONDITIONED norm type for convergence test
>   PC Object: 1 MPI processes
>     type: bjacobi
>       number of blocks = 1
>       Local solve is same for all blocks, in the following KSP and PC objects:
>       KSP Object: (sub_) 1 MPI processes
>         type: preonly
>         maximum iterations=1, initial guess is zero
>         tolerances: relative=1e-05, absolute=1e-50, divergence=1.
>         left preconditioning
>         using NONE norm type for convergence test
>       PC Object: (sub_) 1 MPI processes
>         type: bjacobi
>           number of blocks = 1
>           Local solve is same for all blocks, in the following KSP and PC objects:
>           KSP Object: (sub_sub_) 1 MPI processes
>             type: preonly
>             maximum iterations=1, initial guess is zero
>             tolerances: relative=1e-05, absolute=1e-50, divergence=1.
>             left preconditioning
>             using NONE norm type for convergence test
>           PC Object: (sub_sub_) 1 MPI processes
>             type: ilu
>               out-of-place factorization
>               0 levels of fill
>               tolerance for zero pivot 2.22045e-14
>               matrix ordering: natural
>               factor fill ratio given 1., needed 1.
>                 Factored matrix follows:
>                   Mat Object: 1 MPI processes
>                     type: seqaij
>                     rows=961, cols=961
>                     package used to perform factorization: petsc
>                     total: nonzeros=4129, allocated nonzeros=4129
>                     total number of mallocs used during MatSetValues calls =0
>                       not using I-node routines
>             linear system matrix = precond matrix:
>             Mat Object: 1 MPI processes
>               type: seqaij
>               rows=961, cols=961
>               total: nonzeros=4129, allocated nonzeros=4805
>               total number of mallocs used during MatSetValues calls =0
>                 not using I-node routines
>         linear system matrix = precond matrix:
>         Mat Object: 1 MPI processes
>           type: mpiaij
>           rows=961, cols=961
>           total: nonzeros=4129, allocated nonzeros=9610
>           total number of mallocs used during MatSetValues calls =0
>             not using I-node (on process 0) routines
>     linear system matrix followed by preconditioner matrix:
>     Mat Object: 1 MPI processes
>       type: mffd
>       rows=961, cols=961
>         Matrix-free approximation:
>           err=1.49012e-08 (relative error in function evaluation)
>           Using wp compute h routine
>               Does not compute normU
>     Mat Object: 1 MPI processes
>       type: mpiaij
>       rows=961, cols=961
>       total: nonzeros=4129, allocated nonzeros=9610
>       total number of mallocs used during MatSetValues calls =0
>         not using I-node (on process 0) routines
>
> Although a parallel matrix is used, the program runs on a single processor. Because parallel matrices are used, the overall preconditioning scheme should be bjacobi, with a KSP built for each block (there is only one block in my program). Therefore, there should be a sub KSP object in the PC information. But in the output above, a sub-sub KSP is also embedded in the sub KSP object. I don't understand the reason for this KSP. Please help me answer.
>

It looks like you are setting the sub PC type to bjacobi, and it has a sub (sub) PC.

>
> 2. Bjacobi is a preconditioning method in theory. Why is there a subsystem of linear equations solver