I understand the situation you are facing. With pseudo-continuation, when you provide an RHS function, SNES is computing udot - F(u). Inside the SNESSolve(), specifically in TSComputeIFunction(), with the lines if (!ts->ops->ifunction) { ierr = TSComputeRHSFunction(ts,t,X,Y);CHKERRQ(ierr); ierr = VecAYPX(Y,-1,Xdot);CHKERRQ(ierr); But for the original system, we care about the norm for F(u). So in TSStep_Pseudo(), after the nonlinear solve there is the line pseudo->fnorm = -1; /* The current norm is no longer valid, monitor must recompute it. */ and further down if (pseudo->fnorm < 0) { PetscCall(VecZeroEntries(pseudo->xdot)); PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, pseudo->func, PETSC_FALSE)); PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm)); this norm is needed to determine of the nonlinear system has converged satisfactory (or if more pseudo time steps are needed). Hence, your FormRHSFunction gets called twice with identical input. To make matters even worse your FormRHSFunction is actually being called THREE times with identical input. Because at the start of TSStep_Pseudo() it calls TSPseudoComputeTimeStep(ts, &next_time_step) which by default uses TSPseudoTimeStepDefault() which has the same chunk of code PetscCall(VecZeroEntries(pseudo->xdot)); PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, pseudo->func, PETSC_FALSE)); PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm)); And the same norm is also recomputed. Now the diff diff --git a/src/ts/impls/pseudo/posindep.c b/src/ts/impls/pseudo/posindep.c index 7c013435f98..813a28311e7 100644 --- a/src/ts/impls/pseudo/posindep.c +++ b/src/ts/impls/pseudo/posindep.c @@ -660,9 +660,11 @@ PetscErrorCode TSPseudoTimeStepDefault(TS ts, PetscReal *newdt, void *dtctx) PetscReal inc = pseudo->dt_increment; PetscFunctionBegin; - PetscCall(VecZeroEntries(pseudo->xdot)); - PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, pseudo->func, PETSC_FALSE)); - PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm)); + if (pseudo->fnorm < 0.0) { + PetscCall(VecZeroEntries(pseudo->xdot)); + PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, pseudo->func, PETSC_FALSE)); + PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm)); + } if (pseudo->fnorm_initial < 0) { /* first time through so compute initial function norm */ pseudo->fnorm_initial = pseudo->fnorm will prevent the THIRD computation of the same function evaluation. So this is a good optimization But eliminating the second computation is not so trivial. To review, deep inside the TS code there is TSComputeRHSFunction(ts,t,X,Y); VecAYPX(Y,-1,Xdot); This code knows nothing about TSPPseudo etc. Then later in the Pseudo code PetscCall(VecZeroEntries(pseudo->xdot)); PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, pseudo->func, PETSC_FALSE)); PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm)); again calls TSComputeRHSFunction(ts,t,X,Y); with the same X. (since Xdot is not used directly by TSComputeRHSFunction()). Now if all the code was custom for Pseudo we could simply put in a line TSComputeRHSFunction(ts,t,X,Y); VecNorm(Y,NORM_2, &pseudo->fnorm) VecAYPX(Y,-1,Xdot); into TSComputeIFunction() and have the norm available later for pseudo. It is not clear that there is any definitive benefit for pseudo to utilize the TS framework. It possibly gets some code reuse from TS but at the cost of extra code that won't be needed if written directly. Plus this performance hit. Anyways I will make a merge request to remove the third redundant computation of FormRHSFunction but will await input from others on if anything else could be done to remove the inefficiency. Thanks for reporting this challenging problem. Barry Note: I don't think providing the Jacobian improves the efficiency, it is just better at hiding the redundant computation.
|
-dm_plex_dim 2 -dm_plex_filename /home/abonfi/grids/2D/aerofoils/naca0012/euler/gmsh/naca0012.msh #-dm_plex_filename /home/abonfi/grids/2D/aerofoils/naca0012/euler/gmsh/mesh.h5 -testcase 3 ##-dm_view ##-dm_plex_view_labels "marker" ##-dm_plex_view_labels "Face Sets" -freestream_Mach_number .5 -flow_angles 1.,0.,0. -dm_mat_type baij ##-Reynolds 1.e39 -colors 4,5,-1,-1,-1,-1 -dm_plex_simplex true -dm_plex_interpolate -dm_plex_check_all -petscpartitioner_view ##-dm_petscsection_view ##-ts_dt 100. -ts_max_steps 100 -ts_type pseudo -pc_factor_mat_ordering_type rcm -pc_factor_reuse_ordering -ksp_converged_reason -options_left ##-ksp_view