Hi Hong,
Thanks. I can test it, but it may take some time to install petsc-dev on
the cluster. I will try more cases to see if I can reproduce this error on my
local machine, which is much more convenient for testing in debug
mode. So far, the error does not occur on my local machine with the
same code, the same petsc-3.6.2 version, the same case and the same
number of processors. The system and PETSc configurations are different.
Regards,
Danyang
On 15-12-02 10:26 AM, Hong wrote:
Danyang:
It is likely a zero pivot. I'm adding a feature to PETSc: when matrix
factorization fails, computation continues, with the error information
stored in ksp->reason=DIVERGED_PCSETUP_FAILED.
For your timestepping code, you may be able to automatically reduce the
timestep and continue your simulation.
Do you want to test it? If so, you need to install petsc-dev from my
branch hzhang/matpackage-erroriffpe on your cluster. We may merge this
branch into petsc-master soon.
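A minimal sketch of how the timestepping loop might react to such a failure, assuming a PETSc version that includes this feature (the KSP_DIVERGED_PCSETUP_FAILED converged reason) and reusing the names from the code below (ksp_flow, b_flow, x_flow). The variables dt and redo_step are hypothetical application variables, and this is a fragment meant to replace the KSPSolve call, not a complete program.

```fortran
!     Assumed declarations alongside the other PETSc variables:
!       KSPConvergedReason reason
!       logical redo_step        (hypothetical application flag)
      redo_step = .false.
      call KSPSolve(ksp_flow,b_flow,x_flow,ierr)
      CHKERRQ(ierr)
!     Ask PETSc why the solve stopped; with the new feature a failed
!     factorization is reported here instead of aborting the run
      call KSPGetConvergedReason(ksp_flow,reason,ierr)
      CHKERRQ(ierr)
      if (reason == KSP_DIVERGED_PCSETUP_FAILED) then
!       Hypothetical recovery: halve the timestep and repeat the step
        dt = 0.5d0*dt
        redo_step = .true.
      end if
```

When the step is repeated, the matrix is reassembled with the smaller dt and handed to the KSP again, so PETSc typically redoes only the numeric factorization on the existing symbolic structure. Whether a smaller timestep actually removes the zero pivot depends on the problem.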
It's not easy to run in debug mode, as PETSc on the cluster was not
built with debugging enabled. Restarting the case from the time of the
crash does not reproduce the problem, so to catch this error I need to
start the simulation from the beginning, which takes hours on the
cluster.
This is why we are adding this new feature.
Do you mean I need to redo the symbolic factorization? For now, I only
do the factorization setup once, at the first timestep, and then reuse it.
Some of the code is shown below.
      if (timestep == 1) then
!       Select MUMPS and create the factored matrix only once
        call PCFactorSetMatSolverPackage(pc_flow,MATSOLVERMUMPS,ierr)
        CHKERRQ(ierr)
        call PCFactorSetUpMatSolverPackage(pc_flow,ierr)
        CHKERRQ(ierr)
!       Retrieve the factored matrix (e.g. to set MUMPS controls on it)
        call PCFactorGetMatrix(pc_flow,a_flow_j,ierr)
        CHKERRQ(ierr)
      end if
      call KSPSolve(ksp_flow,b_flow,x_flow,ierr)
      CHKERRQ(ierr)
I do not think you need to change this part of code.
Does your code check convergence at each timestep?
Hong
On 15-12-02 08:39 AM, Hong wrote:
Danyang :
My code fails due to an error in an external library. It works
fine for the first 2000+ timesteps but then crashes.
[4]PETSC ERROR: Error in external library
[4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-1, INFO(2)=0
This simply says that an error occurred on proc[0] during numerical
factorization, which usually means it either encountered a zero pivot or
ran out of memory. Since it happens at a later timestep, where I guess
you reuse the matrix factor, a zero pivot might be the problem.
Is it possible to run it in debug mode? That way, MUMPS would
dump out more information.
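If a debug build is not available, one option that may help is raising MUMPS's own diagnostics through PETSc runtime options. This sketch assumes PETSc's MUMPS interface options -mat_mumps_icntl_4 (ICNTL(4), print level) and -mat_mumps_icntl_14 (ICNTL(14), percentage increase of the estimated working space, relevant only if memory turns out to be the cause). The executable name ./flow_sim and the process count are hypothetical, and if ksp_flow was given an options prefix, that prefix must be prepended to both options.

```shell
# Hypothetical executable and process count; substitute your own.
# -mat_mumps_icntl_4 3   : raise the MUMPS print level for more diagnostics
# -mat_mumps_icntl_14 50 : let MUMPS use 50% more working space than it estimates
mpiexec -n 8 ./flow_sim -mat_mumps_icntl_4 3 -mat_mumps_icntl_14 50
```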
I then tried the same simulation on another machine with the
same number of processors, and it did not fail.
Does this machine have larger memory?
Hong