Re: [petsc-users] Error reported by MUMPS in numerical factorization phase

Danyang Su Wed, 02 Dec 2015 11:36:53 -0800

Hi Hong,

Thanks. I can test it, but it may take some time to install petsc-dev on the cluster. I will try more cases to see if I can reproduce this error on my local machine, which is much more convenient for testing in debug mode. So far, the error does not occur on my local machine with the same code, the same petsc-3.6.2 version, the same case and the same number of processors, although the system and PETSc configuration are different.


Regards,

Danyang

On 15-12-02 10:26 AM, Hong wrote:

Danyang:

It is likely a zero pivot. I'm adding a feature to PETSc: when the matrix factorization fails, the computation continues, with the error information stored in

ksp->reason=DIVERGED_PCSETUP_FAILED.

For your timestepping code, you may be able to automatically reduce the timestep and continue your simulation.

Do you want to test it? If so, you need to install petsc-dev with my branch hzhang/matpackage-erroriffpe on your cluster. We may merge this branch into petsc-master soon.



    It's not easy to run in debugging mode, as PETSc on the cluster
    was not built in debug mode. Restarting the case from the time of
    the crash does not reproduce the problem, so to catch this error I
    need to start the simulation from the beginning, which takes hours
    on the cluster.


This is why we are adding this new feature.


    Do you mean I need to redo symbolic factorization? For now, I only
    do factorization once at the first timestep and then reuse it.
    Some of the code is shown below.

                if (timestep == 1) then
                  call PCFactorSetMatSolverPackage(pc_flow,MATSOLVERMUMPS,ierr)
                  CHKERRQ(ierr)

                  call PCFactorSetUpMatSolverPackage(pc_flow,ierr)
                  CHKERRQ(ierr)

                  call PCFactorGetMatrix(pc_flow,a_flow_j,ierr)
                  CHKERRQ(ierr)
                end if

                call KSPSolve(ksp_flow,b_flow,x_flow,ierr)
                CHKERRQ(ierr)


I do not think you need to change this part of code.
Does your code check convergence at each time step?

Hong



    On 15-12-02 08:39 AM, Hong wrote:

    Danyang :

        My code fails due to an error in an external library. It works
        fine for the first 2000+ timesteps but then crashes.

        [4]PETSC ERROR: Error in external library
        [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-1, INFO(2)=0

    This simply says that an error occurred on proc[0] during numerical
    factorization, which usually means it either encountered a zero
    pivot or ran out of memory. Since it happens at a later timestep,
    where I guess you reuse the matrix factor, a zero pivot might be
    the problem. Is it possible to run it in debugging mode? That way,
    MUMPS would dump out more information.


        Then I tried the same simulation on another machine using the
        same number of processors, it does not fail.

    Does this machine have more memory?

    Hong
