On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet <pie...@joliv.et> wrote:
>
>> On 4 Mar 2023, at 2:51 PM, Zongze Yang <yangzon...@gmail.com> wrote:
>>
>> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet <pie...@joliv.et> wrote:
>>>
>>>> On 4 Mar 2023, at 2:30 PM, Zongze Yang <yangzon...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am writing to seek your advice regarding a problem I encountered while using multigrid.
>>>> I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ):
>>>> ```shell
>>>> [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36
>>>> ```
>>>>
>>>> Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear once I set ICNTL(14) to 80. However, I am still curious why MUMPS failed randomly in the first place.
>>>>
>>>> Upon further inspection, I found that the numbers of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions:
>>>>
>>>> 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code?
>>>
>>> Is the Mat being fed to MUMPS distributed on a communicator of size greater than one?
>>> If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results.
>>
>> Hi, Pierre,
>> Thank you for your prompt reply. Yes, the size of the communicator is greater than one.
>> Even if the size of the communicator is the same across runs, are the results still non-deterministic?
>
> In the most general case, yes.
>
>> Can I assume the Mat being fed to MUMPS is the same in this case?
>
> Are you doing algebraic or geometric multigrid?
> Are the prolongation operators computed by Firedrake or by PETSc, e.g., through GAMG?
> If it’s the latter, I believe the Mat being fed to MUMPS should always be the same.
> If it’s the former, you’ll have to ask the Firedrake people if there may be non-determinism in the coarsening process.

I am using geometric multigrid, and the prolongation operators, I think, are computed by Firedrake. Thanks for your suggestion, I will ask the Firedrake people.
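For context, my setup looks roughly like the sketch below. This is a minimal stand-in, not the attached test: the Poisson problem, mesh, and function space are placeholders, and the "mg_coarse_" option prefix is my understanding of how the ICNTL(14) setting reaches the coarse-grid MUMPS factorization.

```python
from firedrake import *

# Placeholder problem: Poisson on a small mesh hierarchy, solved with
# geometric multigrid whose coarse grid is factorized by MUMPS.
mesh = UnitSquareMesh(8, 8)
hierarchy = MeshHierarchy(mesh, 2)         # Firedrake builds the GMG levels
V = FunctionSpace(hierarchy[-1], "CG", 1)  # solve on the finest level

u, v = TrialFunction(V), TestFunction(V)
a = inner(grad(u), grad(v)) * dx
L = inner(Constant(1.0), v) * dx
bcs = DirichletBC(V, 0, "on_boundary")

uh = Function(V)
solve(a == L, uh, bcs=bcs, solver_parameters={
    "ksp_type": "cg",
    "pc_type": "mg",
    "mg_coarse_ksp_type": "preonly",
    "mg_coarse_pc_type": "lu",
    "mg_coarse_pc_factor_mat_solver_type": "mumps",
    # ICNTL(14): percentage increase of the estimated working space.
    # The default is around 20; 80 is the value that made my failures disappear.
    "mg_coarse_mat_mumps_icntl_14": 80,
})
```

In my actual runs I passed the unprefixed -mat_mumps_icntl_14 on the command line.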
>> Is the pivoting and renumbering all done by MUMPS rather than by PETSc?
>
> You could provide your own numbering, but by default, this is outsourced to MUMPS indeed, which will itself outsource this to METIS, AMD, etc.

I think I won’t do this. By the way, does superlu_dist exhibit similar non-deterministic behavior?

Thanks,
Zongze

> Thanks,
> Pierre
>
>>>> 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)?
>>>
>>> Exact factorizations introduce fill-in.
>>> The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors.
>>>
>>>> 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure?
>>>
>>> Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with different numbers of nonzeros.
>>
>> Got it. Thank you for your clear explanation.
>> Zongze
>>
>>> Thanks,
>>> Pierre
>>
>>>> I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference.
>>>> In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177.
>>>>
>>>> ```shell
>>>> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>>>>   type: preonly
>>>>   maximum iterations=10000, initial guess is zero
>>>>   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>   left preconditioning
>>>> --
>>>>   type: lu
>>>>   out-of-place factorization
>>>>   tolerance for zero pivot 2.22045e-14
>>>>   matrix ordering: external
>>>> --
>>>>   type: mumps
>>>>   rows=1050625, cols=1050625
>>>>   package used to perform factorization: mumps
>>>>   total: nonzeros=115025949, allocated nonzeros=115025949
>>>> --
>>>>   type: mpiaij
>>>>   rows=1050625, cols=1050625
>>>>   total: nonzeros=7346177, allocated nonzeros=7346177
>>>>   total number of mallocs used during MatSetValues calls=0
>>>> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>>>>   type: preonly
>>>>   maximum iterations=10000, initial guess is zero
>>>>   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>   left preconditioning
>>>> --
>>>>   type: lu
>>>>   out-of-place factorization
>>>>   tolerance for zero pivot 2.22045e-14
>>>>   matrix ordering: external
>>>> --
>>>>   type: mumps
>>>>   rows=1050625, cols=1050625
>>>>   package used to perform factorization: mumps
>>>>   total: nonzeros=115377847, allocated nonzeros=115377847
>>>> --
>>>>   type: mpiaij
>>>>   rows=1050625, cols=1050625
>>>>   total: nonzeros=7346177, allocated nonzeros=7346177
>>>>   total number of mallocs used during MatSetValues calls=0
>>>> ```
>>>>
>>>> I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance.
>>>>
>>>> Best wishes,
>>>> Zongze
>>>> <test_mumps.py>
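P.S. For anyone who finds this thread later, here is a minimal petsc4py sketch of the knobs discussed above: ICNTL(14) for extra workspace, plus ICNTL(28)/ICNTL(7) to pin the analysis to a sequential METIS ordering instead of leaving the choice to MUMPS. The ICNTL values follow my reading of the MUMPS manual, and I have not verified that pinning the ordering removes all run-to-run variation in the factors.

```python
from petsc4py import PETSc

# Options must be in the database before the factorization is set up.
opts = PETSc.Options()
opts["mat_mumps_icntl_14"] = 80  # workspace increase (percent over the estimate)
opts["mat_mumps_icntl_28"] = 1   # force the sequential analysis phase
opts["mat_mumps_icntl_7"] = 5    # METIS ordering for the sequential analysis

# Tiny stand-in system (1D Laplacian), just to exercise the LU/MUMPS path.
n = 100
A = PETSc.Mat().createAIJ([n, n], nnz=3)
rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType("preonly")
pc = ksp.getPC()
pc.setType("lu")
pc.setFactorSolverType("mumps")
ksp.setFromOptions()  # picks up the mat_mumps_* options set above

b = A.createVecLeft()
b.set(1.0)
x = A.createVecRight()
ksp.solve(b, x)
```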