Hi Hong,

Thanks. I can test it, but it may take some time to install petsc-dev on the cluster. I will try more cases to see if I can reproduce this error on my local machine, which is much more convenient for testing in debug mode. So far, the error does not occur on my local machine using the same code, the same petsc-3.6.2 version, the same case, and the same number of processors. The system and petsc configuration are different.

Regards,

Danyang

On 15-12-02 10:26 AM, Hong wrote:
Danyang:
It is likely a zero pivot. I'm adding a feature to petsc. When matrix factorization fails, computation continues with error information stored in
ksp->reason=DIVERGED_PCSETUP_FAILED.
For your timestepping code, you may be able to automatically reduce the timestep and continue your simulation.

Do you want to test it? If so, you need to install petsc-dev with my branch hzhang/matpackage-erroriffpe on your cluster. We may merge this branch to petsc-master soon.
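
For illustration, the loop could then look roughly like this (a sketch only; KSP_DIVERGED_PCSETUP_FAILED is the reason value from my branch, and reason, dt, and dt_min are illustrative names, not from your code):

      KSPConvergedReason :: reason

      call KSPSolve(ksp_flow,b_flow,x_flow,ierr)
      CHKERRQ(ierr)
      call KSPGetConvergedReason(ksp_flow,reason,ierr)
      CHKERRQ(ierr)
      if (reason == KSP_DIVERGED_PCSETUP_FAILED) then
        ! factorization failed (e.g. zero pivot): cut the timestep,
        ! rebuild the matrix, and redo this step instead of aborting
        dt = 0.5d0*dt
        ! give up (or switch solvers) once dt drops below dt_min
      end if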


    It's not easy to run in debugging mode, as the cluster does not
    have petsc installed in debug mode. Restarting the case from the
    crashing time does not reproduce the problem, so to catch this
    error I need to start the simulation from the beginning, which
    takes hours on the cluster.


This is why we are adding this new feature.


    Do you mean I need to redo the symbolic factorization? For now, I
    only do the factorization once at the first timestep and then
    reuse it. Some of the code is shown below.

            if (timestep == 1) then
              ! first timestep: select MUMPS as the factorization
              ! package and keep a handle to the factored matrix
              call PCFactorSetMatSolverPackage(pc_flow,MATSOLVERMUMPS,ierr)
              CHKERRQ(ierr)

              call PCFactorSetUpMatSolverPackage(pc_flow,ierr)
              CHKERRQ(ierr)

              call PCFactorGetMatrix(pc_flow,a_flow_j,ierr)
              CHKERRQ(ierr)
            end if

            ! every timestep: solve, reusing the factorization above
            call KSPSolve(ksp_flow,b_flow,x_flow,ierr)
            CHKERRQ(ierr)


I do not think you need to change this part of the code.
Does your code check convergence at each time step?
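
If not, a minimal check after each KSPSolve would be something like this (a sketch; reason is a new KSPConvergedReason variable):

      call KSPGetConvergedReason(ksp_flow,reason,ierr)
      CHKERRQ(ierr)
      if (reason < 0) then
        ! negative values mean divergence: stop here, or shorten
        ! the timestep and repeat the step
      end if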

Hong



    On 15-12-02 08:39 AM, Hong wrote:
    Danyang :

        My code fails due to an error in an external library. It
        works fine for the previous 2000+ timesteps but then crashes.

        [4]PETSC ERROR: Error in external library
        [4]PETSC ERROR: Error reported by MUMPS in numerical
        factorization phase: INFO(1)=-1, INFO(2)=0

    This simply says an error occurred on proc[0] during numerical
    factorization, which usually means it either hit a zero pivot or
    ran out of memory. Since it happens at a later timestep, where I
    guess you reuse the matrix factor, a zero pivot might be the
    problem. Is it possible to run it in debugging mode? That way,
    mumps would dump out more information.
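
    If you can rerun it, you can also make mumps more talkative and
    give it more workspace through PETSc's MUMPS interface (a
    sketch; a_flow_j is the factored matrix returned by
    PCFactorGetMatrix):

        call MatMumpsSetIcntl(a_flow_j,4,3,ierr)   ! ICNTL(4)=3: verbose diagnostics
        CHKERRQ(ierr)
        call MatMumpsSetIcntl(a_flow_j,14,30,ierr) ! ICNTL(14)=30: 30% extra workspace
        CHKERRQ(ierr)

    or, equivalently, on the command line:
    -mat_mumps_icntl_4 3 -mat_mumps_icntl_14 30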


        Then I tried the same simulation on another machine with the
        same number of processors, and it does not fail.

    Does this machine have larger memory?

    Hong


