On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet <pie...@joliv.et> wrote:
>
>> On 4 Mar 2023, at 2:51 PM, Zongze Yang <yangzon...@gmail.com> wrote:
>>
>> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet <pie...@joliv.et> wrote:
>>>
>>>> On 4 Mar 2023, at 2:30 PM, Zongze Yang <yangzon...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am writing to seek your advice regarding a problem I encountered while using multigrid.
>>>> I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ):
>>>> ```shell
>>>> [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36
>>>> ```
>>>>
>>>> Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear once I set ICNTL(14) to 80. However, I am still curious why MUMPS failed randomly in the first place.
>>>>
>>>> Upon further inspection, I found that the numbers of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions:
>>>>
>>>> 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code?
>>>
>>> Is the Mat being fed to MUMPS distributed on a communicator of size greater than one?
>>> If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results.
>>
>> Hi, Pierre,
>> Thank you for your prompt reply. Yes, the size of the communicator is greater than one.
>> Even if the size of the communicator is the same across runs, are the results still non-deterministic?
>
> In the most general case, yes.
>
>> Can I assume the Mat being fed to MUMPS is the same in this case?
>
> Are you doing algebraic or geometric multigrid?
> Are the prolongation operators computed by Firedrake or by PETSc, e.g., through GAMG?
> If it’s the latter, I believe the Mat being fed to MUMPS should always be the same.
> If it’s the former, you’ll have to ask the Firedrake people if there may be non-determinism in the coarsening process.

I am using geometric multigrid, and the prolongation operators, I think, are computed by Firedrake. Thanks for your suggestion, I will ask the Firedrake people.
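For context, my setup looks roughly like the sketch below. This is a minimal stand-in, not the attached test: the Poisson problem, mesh, and function space are placeholders, and the "mg_coarse_" option prefix is my understanding of how the ICNTL(14) setting reaches the coarse-grid MUMPS factorization.

```python
from firedrake import *

# Placeholder problem: Poisson on a small mesh hierarchy, solved with
# geometric multigrid whose coarse grid is factorized by MUMPS.
mesh = UnitSquareMesh(8, 8)
hierarchy = MeshHierarchy(mesh, 2)         # Firedrake builds the GMG levels
V = FunctionSpace(hierarchy[-1], "CG", 1)  # solve on the finest level

u, v = TrialFunction(V), TestFunction(V)
a = inner(grad(u), grad(v)) * dx
L = inner(Constant(1.0), v) * dx
bcs = DirichletBC(V, 0, "on_boundary")

uh = Function(V)
solve(a == L, uh, bcs=bcs, solver_parameters={
    "ksp_type": "cg",
    "pc_type": "mg",
    "mg_coarse_ksp_type": "preonly",
    "mg_coarse_pc_type": "lu",
    "mg_coarse_pc_factor_mat_solver_type": "mumps",
    # ICNTL(14): percentage increase of the estimated working space.
    # The default is around 20; 80 is the value that made my failures disappear.
    "mg_coarse_mat_mumps_icntl_14": 80,
})
```

In my actual runs I passed the unprefixed -mat_mumps_icntl_14 on the command line.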
>> Is the pivoting and renumbering all done by MUMPS rather than by PETSc?
>
> You could provide your own numbering, but by default, this is outsourced to MUMPS indeed, which will itself outsource this to METIS, AMD, etc.

I think I won’t do this. By the way, does superlu_dist exhibit similar non-deterministic behavior?

Thanks,
Zongze

> Thanks,
> Pierre
>
>>>> 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)?
>>>
>>> Exact factorizations introduce fill-in.
>>> The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors.
>>>
>>>> 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure?
>>>
>>> Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with different numbers of nonzeros.
>>
>> Got it. Thank you for your clear explanation.
>> Zongze
>>
>>> Thanks,
>>> Pierre
>>
>>>> I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference.
>>>> In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177.
>>>>
>>>> ```shell
>>>> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>>>>   type: preonly
>>>>   maximum iterations=10000, initial guess is zero
>>>>   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>   left preconditioning
>>>> --
>>>>   type: lu
>>>>   out-of-place factorization
>>>>   tolerance for zero pivot 2.22045e-14
>>>>   matrix ordering: external
>>>> --
>>>>   type: mumps
>>>>   rows=1050625, cols=1050625
>>>>   package used to perform factorization: mumps
>>>>   total: nonzeros=115025949, allocated nonzeros=115025949
>>>> --
>>>>   type: mpiaij
>>>>   rows=1050625, cols=1050625
>>>>   total: nonzeros=7346177, allocated nonzeros=7346177
>>>>   total number of mallocs used during MatSetValues calls=0
>>>> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>>>>   type: preonly
>>>>   maximum iterations=10000, initial guess is zero
>>>>   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>   left preconditioning
>>>> --
>>>>   type: lu
>>>>   out-of-place factorization
>>>>   tolerance for zero pivot 2.22045e-14
>>>>   matrix ordering: external
>>>> --
>>>>   type: mumps
>>>>   rows=1050625, cols=1050625
>>>>   package used to perform factorization: mumps
>>>>   total: nonzeros=115377847, allocated nonzeros=115377847
>>>> --
>>>>   type: mpiaij
>>>>   rows=1050625, cols=1050625
>>>>   total: nonzeros=7346177, allocated nonzeros=7346177
>>>>   total number of mallocs used during MatSetValues calls=0
>>>> ```
>>>>
>>>> I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance.
>>>>
>>>> Best wishes,
>>>> Zongze
>>>> <test_mumps.py>
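P.S. For anyone who finds this thread later, here is a minimal petsc4py sketch of the knobs discussed above: ICNTL(14) for extra workspace, plus ICNTL(28)/ICNTL(7) to pin the analysis to a sequential METIS ordering instead of leaving the choice to MUMPS. The ICNTL values follow my reading of the MUMPS manual, and I have not verified that pinning the ordering removes all run-to-run variation in the factors.

```python
from petsc4py import PETSc

# Options must be in the database before the factorization is set up.
opts = PETSc.Options()
opts["mat_mumps_icntl_14"] = 80  # workspace increase (percent over the estimate)
opts["mat_mumps_icntl_28"] = 1   # force the sequential analysis phase
opts["mat_mumps_icntl_7"] = 5    # METIS ordering for the sequential analysis

# Tiny stand-in system (1D Laplacian), just to exercise the LU/MUMPS path.
n = 100
A = PETSc.Mat().createAIJ([n, n], nnz=3)
rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType("preonly")
pc = ksp.getPC()
pc.setType("lu")
pc.setFactorSolverType("mumps")
ksp.setFromOptions()  # picks up the mat_mumps_* options set above

b = A.createVecLeft()
b.set(1.0)
x = A.createVecRight()
ksp.solve(b, x)
```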