Thanks, I will give it a try.

Best wishes,
Zongze
On Sat, 4 Mar 2023 at 23:09, Pierre Jolivet <pie...@joliv.et> wrote:

> On 4 Mar 2023, at 3:26 PM, Zongze Yang <yangzon...@gmail.com> wrote:
>
>> On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet <pie...@joliv.et> wrote:
>>
>>> On 4 Mar 2023, at 2:51 PM, Zongze Yang <yangzon...@gmail.com> wrote:
>>>
>>>> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet <pie...@joliv.et> wrote:
>>>>
>>>>> On 4 Mar 2023, at 2:30 PM, Zongze Yang <yangzon...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am writing to seek your advice regarding a problem I encountered while using multigrid.
>>>>>> I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ):
>>>>>>
>>>>>> ```shell
>>>>>> [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36
>>>>>> ```
>>>>>>
>>>>>> Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I set the option -mat_mumps_icntl_14 to a higher value (I first tried 40), and the error disappeared once I set ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place.
>>>>>>
>>>>>> Upon further inspection, I found that the number of nonzeros reported for the MUMPS matrix differed from that of the PETSc matrix, and changed every time I ran the code. I am now left with the following questions:
>>>>>>
>>>>>> 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code?
>>>>>
>>>>> Is the Mat being fed to MUMPS distributed on a communicator of size greater than one?
>>>>> If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results.
>>>>
>>>> Hi, Pierre,
>>>> Thank you for your prompt reply. Yes, the size of the communicator is greater than one.
>>>> Even if the size of the communicator is kept the same, are the results still non-deterministic?
>>>
>>> In the most general case, yes.
>>>
>>>> Can I assume the Mat being fed to MUMPS is the same in this case?
>>>
>>> Are you doing algebraic or geometric multigrid?
>>> Are the prolongation operators computed by Firedrake or by PETSc, e.g., through GAMG?
>>> If it’s the latter, I believe the Mat being fed to MUMPS should always be the same.
>>> If it’s the former, you’ll have to ask the Firedrake people if there may be non-determinism in the coarsening process.
>>
>> I am using geometric multigrid, and the prolongation operators, I think, are computed by Firedrake.
>> Thanks for your suggestion, I will ask the Firedrake people.
>>
>>>> Are the pivoting and renumbering all done by MUMPS rather than PETSc?
>>>
>>> You could provide your own numbering, but by default, this is outsourced to MUMPS indeed, which will itself outsource it to METIS, AMD, etc.
>>
>> I think I won't do this.
>> By the way, does superlu_dist exhibit similar non-determinism?
>
> SuperLU_DIST uses static pivoting as far as I know, so it may be more deterministic.
>
> Thanks,
> Pierre
>
>> Thanks,
>> Zongze
>>
>>> Thanks,
>>> Pierre
>>>
>>>>>> 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)?
>>>>>
>>>>> Exact factorizations introduce fill-in.
>>>>> The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors.
>>>>>
>>>>>> 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure?
>>>>>
>>>>> Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with different numbers of nonzeros.
>>>>
>>>> Got it. Thank you for your clear explanation.
>>>> Zongze
>>>>
>>>>> Thanks,
>>>>> Pierre
>>>>>
>>>>>> I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference.
>>>>>> In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177.
>>>>>>
>>>>>> ```shell
>>>>>> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>>>>>> type: preonly
>>>>>> maximum iterations=10000, initial guess is zero
>>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>> left preconditioning
>>>>>> --
>>>>>> type: lu
>>>>>> out-of-place factorization
>>>>>> tolerance for zero pivot 2.22045e-14
>>>>>> matrix ordering: external
>>>>>> --
>>>>>> type: mumps
>>>>>> rows=1050625, cols=1050625
>>>>>> package used to perform factorization: mumps
>>>>>> total: nonzeros=115025949, allocated nonzeros=115025949
>>>>>> --
>>>>>> type: mpiaij
>>>>>> rows=1050625, cols=1050625
>>>>>> total: nonzeros=7346177, allocated nonzeros=7346177
>>>>>> total number of mallocs used during MatSetValues calls=0
>>>>>> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>>>>>> type: preonly
>>>>>> maximum iterations=10000, initial guess is zero
>>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>> left preconditioning
>>>>>> --
>>>>>> type: lu
>>>>>> out-of-place factorization
>>>>>> tolerance for zero pivot 2.22045e-14
>>>>>> matrix ordering: external
>>>>>> --
>>>>>> type: mumps
>>>>>> rows=1050625, cols=1050625
>>>>>> package used to perform factorization: mumps
>>>>>> total: nonzeros=115377847, allocated nonzeros=115377847
>>>>>> --
>>>>>> type: mpiaij
>>>>>> rows=1050625, cols=1050625
>>>>>> total: nonzeros=7346177, allocated nonzeros=7346177
>>>>>> total number of mallocs used during MatSetValues calls=0
>>>>>> ```
>>>>>>
>>>>>> I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance.
>>>>>>
>>>>>> Best wishes,
>>>>>> Zongze
>>>>>> <test_mumps.py>
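For readers wanting to try the same workaround, below is a minimal Firedrake sketch of the solver configuration discussed in the thread. The Helmholtz-type problem is invented for illustration (the attached test_mumps.py is not reproduced here); only the solver options themselves (preonly, lu, mumps, and -mat_mumps_icntl_14) come from the discussion.

```python
from firedrake import (UnitSquareMesh, FunctionSpace, TrialFunction,
                       TestFunction, Function, Constant, dx, inner, grad,
                       solve)

# Hypothetical problem, standing in for the elided test_mumps.py.
mesh = UnitSquareMesh(64, 64)
V = FunctionSpace(mesh, "CG", 1)
u = TrialFunction(V)
v = TestFunction(V)
a = inner(grad(u), grad(v)) * dx + inner(u, v) * dx  # nonsingular without BCs
L = inner(Constant(1.0), v) * dx
uh = Function(V)

solve(a == L, uh, solver_parameters={
    "ksp_type": "preonly",                   # matches the ksp_view above
    "pc_type": "lu",
    "pc_factor_mat_solver_type": "mumps",
    # ICNTL(14) is the percentage increase of MUMPS's estimated working
    # space; the default is 20. The thread found 80 was needed to make the
    # INFOG(1)=-9 failures disappear.
    "mat_mumps_icntl_14": 80,
})
```

In the multigrid setting described at the top of the thread, the same options would presumably sit under the coarse-solver prefix instead (e.g. "mg_coarse_pc_factor_mat_solver_type": "mumps" and "mg_coarse_mat_mumps_icntl_14": 80), although the thread itself sets the option globally.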
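The same control is also available programmatically through petsc4py, which can be handier than parsing logs when you want to inspect INFOG(1)/INFO(2) after a failed factorization. A minimal sketch, assuming `A` is an already assembled PETSc Mat; the function name and error handling are illustrative, while setFactorSetUpSolverType, setMumpsIcntl, getMumpsInfog, and getMumpsInfo are petsc4py's wrappers of the corresponding PETSc/MUMPS calls.

```python
from petsc4py import PETSc

def lu_factor_with_mumps(A, icntl14=80):
    """LU-factor A with MUMPS, raising ICNTL(14) before factorization.

    Illustrative sketch only; `A` is assumed to be an assembled PETSc Mat.
    """
    ksp = PETSc.KSP().create(comm=A.getComm())
    ksp.setOperators(A)
    ksp.setType("preonly")
    pc = ksp.getPC()
    pc.setType("lu")
    pc.setFactorSolverType("mumps")
    # Instantiate the MUMPS factor matrix early so ICNTL values can be
    # set on it before the numerical factorization runs.
    pc.setFactorSetUpSolverType()
    F = pc.getFactorMatrix()
    F.setMumpsIcntl(14, icntl14)  # % increase of estimated working space
    ksp.setUp()  # triggers the (possibly failing) numerical factorization
    # INFOG(1) = 0 on success; -9 is the workspace error from this thread,
    # with INFO(2) reporting the shortfall.
    if F.getMumpsInfog(1) < 0:
        raise RuntimeError(
            f"MUMPS failed: INFOG(1)={F.getMumpsInfog(1)}, "
            f"INFO(2)={F.getMumpsInfo(2)}")
    return ksp
```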