[petsc-users] Bug report VecNorm
Dear PETSc/Tao team, there is a bug in the voector interface: In the function VecNorm, see, eg. https://petsc.org/release/src/vec/vec/interface/rvector.c.html#VecNorm line 197 the check for consistency in line 214 is done on the wrong communicator. The communicator should be PETSC_COMM_SELF. Otherwise the program may hang when PetscCheck is executed. Please find a minimal example attached. Kind regards, Stephan Köhler -- Stephan Köhler TU Bergakademie Freiberg Institut für numerische Mathematik und Optimierung Akademiestraße 6 09599 Freiberg Gebäudeteil Mittelbau, Zimmer 2.07 Telefon: +49 (0)3731 39-3188 (Büro) #include "petscvec.h" int main(int argc, char **args) { PetscMPIInt size, rank; Vec vec; PetscReal norm; PetscBool flg = PETSC_FALSE, minflg = PETSC_FALSE; MPI_Commcomm; PetscScalar *xx; PetscCall(PetscInitialize(&argc, &args, PETSC_NULLPTR, PETSC_NULLPTR)); comm = PETSC_COMM_WORLD; PetscCallMPI(MPI_Comm_size(comm, &size)); PetscCallMPI(MPI_Comm_rank(comm, &rank)); PetscCheck(size > 1, comm, PETSC_ERR_ARG_WRONG, "example should be called with more than 1 MPI rank."); PetscCall(VecCreateMPI(comm, (rank+1)*10, PETSC_DETERMINE, &vec)); PetscCall(VecSet(vec, 1.0)); PetscCall(VecNorm(vec, NORM_INFINITY, &norm)); PetscSynchronizedPrintf(PETSC_COMM_WORLD, "rank = %d, size = %d, norm = %lf\n", rank, size, norm); PetscSynchronizedFlush(comm, PETSC_STDOUT); if(rank == 0) { PetscCall(VecGetArrayWrite(vec, &xx)); PetscCall(VecRestoreArrayWrite(vec, &xx)); } PetscCall(VecNormAvailable(vec, NORM_INFINITY, &flg, &norm)); PetscSynchronizedPrintf(comm, "rank = %d, size = %d, flg = %d, norm = %lf\n", rank, size, flg, norm); PetscSynchronizedFlush(comm, PETSC_STDOUT); PetscCall(MPIU_Allreduce(&flg, &minflg, 1, MPIU_BOOL, MPI_LAND, PetscObjectComm((PetscObject)vec))); /* wrong */ PetscCheck(flg == minflg, PetscObjectComm((PetscObject)vec), PETSC_ERR_ARG_WRONGSTATE, "Some MPI processes have cached norm, others do not. This may happen when some MPI processes call VecGetArray() and some others do not."); /* this is correct */ // PetscCheck(flg == minflg, PETSC_COMM_SELF, PETSC_ERR_ARG_WRONGSTATE, "Some MPI processes have cached norm, others do not. This may happen when some MPI processes call VecGetArray() and some others do not."); PetscCall(VecDestroy(&vec)); PetscFinalize(); return 0; } OpenPGP_0xC9BF2C20DFE9F713.asc Description: OpenPGP public key OpenPGP_signature.asc Description: OpenPGP digital signature
Re: [petsc-users] Bug Report TaoALMM class
Dear PETSc/Tao team, this is still an open issue andI haven't heard anything else so far that I'm wrong. Kind regards, Stephan Köhler Am 18.07.23 um 02:21 schrieb Matthew Knepley: Toby and Hansol, Has anyone looked at this? Thanks, Matt On Mon, Jun 12, 2023 at 8:24 AM Stephan Köhler < stephan.koeh...@math.tu-freiberg.de> wrote: Dear PETSc/Tao team, I think there might be a bug in the Tao ALMM class: In the function TaoALMMComputeAugLagAndGradient_Private(), see, eg. https://petsc.org/release/src/tao/constrained/impls/almm/almm.c.html#TAOALMM line 648 the gradient seems to be wrong. The given function and gradient computation is Lc = F + Ye^TCe + Yi^T(Ci - S) + 0.5*mu*[Ce^TCe + (Ci - S)^T(Ci - S)], dLc/dX = dF/dX + Ye^TAe + Yi^TAi + 0.5*mu*[Ce^TAe + (Ci - S)^TAi], but I think the gradient should be (without 0.5) dLc/dX = dF/dX + Ye^TAe + Yi^TAi + mu*[Ce^TAe + (Ci - S)^TAi]. Kind regards, Stephan Köhler -- Stephan Köhler TU Bergakademie Freiberg Institut für numerische Mathematik und Optimierung Akademiestraße 6 09599 Freiberg Gebäudeteil Mittelbau, Zimmer 2.07 Telefon: +49 (0)3731 39-3173 (Büro) -- Stephan Köhler TU Bergakademie Freiberg Institut für numerische Mathematik und Optimierung Akademiestraße 6 09599 Freiberg Gebäudeteil Mittelbau, Zimmer 2.07 Telefon: +49 (0)3731 39-3188 (Büro) OpenPGP_0xC9BF2C20DFE9F713.asc Description: OpenPGP public key OpenPGP_signature.asc Description: OpenPGP digital signature
[petsc-users] Bug Report TaoALMM class
Dear PETSc/Tao team, I think there might be a bug in the Tao ALMM class: In the function TaoALMMComputeAugLagAndGradient_Private(), see, eg. https://petsc.org/release/src/tao/constrained/impls/almm/almm.c.html#TAOALMM line 648 the gradient seems to be wrong. The given function and gradient computation is Lc = F + Ye^TCe + Yi^T(Ci - S) + 0.5*mu*[Ce^TCe + (Ci - S)^T(Ci - S)], dLc/dX = dF/dX + Ye^TAe + Yi^TAi + 0.5*mu*[Ce^TAe + (Ci - S)^TAi], but I think the gradient should be (without 0.5) dLc/dX = dF/dX + Ye^TAe + Yi^TAi + mu*[Ce^TAe + (Ci - S)^TAi]. Kind regards, Stephan Köhler -- Stephan Köhler TU Bergakademie Freiberg Institut für numerische Mathematik und Optimierung Akademiestraße 6 09599 Freiberg Gebäudeteil Mittelbau, Zimmer 2.07 Telefon: +49 (0)3731 39-3173 (Büro) OpenPGP_0xC9BF2C20DFE9F713.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
[petsc-users] Bug report LMVM matrix class
Dear PETSc/Tao team, it seems to be that there is a bug in the LMVM matrix class: The function MatMultAdd_LMVM, see, e.g., https://petsc.org/release/src/ksp/ksp/utils/lmvm/lmvmimpl.c.html at line 114, if the vectors Y and Z are the same, then the result is wrong, since the first MatMult overwrites also the value in Y. Best regards Stephan Köhler -- Stephan Köhler TU Bergakademie Freiberg Institut für numerische Mathematik und Optimierung Akademiestraße 6 09599 Freiberg Gebäudeteil Mittelbau, Zimmer 2.07 Telefon: +49 (0)3731 39-3173 (Büro) OpenPGP_0xC9BF2C20DFE9F713.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [petsc-users] Report Bug TaoALMM class
Barry, this is a nonartificial code. This is a problem in the ALMM subsolver. I want to solve a problem with a TaoALMM solver what then happens is: TaoSolve(tao) /* TaoALMM solver */ | | |> This calls the TaoALMM subsolver routine TaoSolve(subsolver) | | |---> The subsolver does not correctly work, at least with an Armijo line search, since the solution is overwritten within the line search. In my case, the subsolver does not make any progress although it is possible. To get to my real problem you can simply change line 268 to if(0) (from if(1) -> if(0)) and line 317 from // ierr = TaoSolve(tao); CHKERRQ(ierr); ---> ierr = TaoSolve(tao); CHKERRQ(ierr); What you can see is that the solver does not make any progress, but it should make progress. To be honest, I do not really know why the option -tao_almm_subsolver_tao_ls_monitor has know effect if the ALMM solver is called and not the subsolver. I also do not know why -tao_almm_subsolver_tao_view prints as termination reason for the subsolver Solution converged: ||g(X)|| <= gatol This is obviously not the case. I set the tolerance -tao_almm_subsolver_tao_gatol 1e-8 \ -tao_almm_subsolver_tao_grtol 1e-8 \ I encountered this and then I looked into the ALMM class and therefore I tried to call the subsolver (previous example). I attach the updated programm and also the options. Stephan <https://www.dict.cc/?s=obviously> On 03.11.22 22:15, Barry Smith wrote: Thanks for your response and the code. I understand the potential problem and how your code demonstrates a bug if the TaoALMMSubsolverObjective() is used in the manner you use in the example where you directly call TaoComputeObjective() multiple times line a line search code might. What I don't have or understand is how to reproduce the problem in a real code that uses Tao. That is where the Tao Armijo line search code has a problem when it is used (somehow) in a Tao solver with ALMM. You suggest "If you have an example for your own, you can switch the Armijo line search by the option -tao_ls_type armijo. The thing is that it will cause no problems if the line search accepts the steps with step length one." I don't see how to do this if I use -tao_type almm I cannot use -tao_ls_type armijo; that is the option -tao_ls_type doesn't seem to me to be usable in the context of almm (since almm internally does directly its own trust region approach for globalization). If we remove the if (1) code from your example, is there some Tao options I can use to get the bug to appear inside the Tao solve? I'll try to explain again, I agree that the fact that the Tao solution is aliased (within the ALMM solver) is a problem with repeated calls to TaoComputeObjective() but I cannot see how these repeated calls could ever happen in the use of TaoSolve() with the ALMM solver. That is when is this "design problem" a true problem as opposed to just a potential problem that can be demonstrated in artificial code? The reason I need to understand the non-artificial situation it breaks things is to come up with an appropriate correction for the current code. Barry On Nov 3, 2022, at 12:46 PM, Stephan Köhler wrote: Barry, so far, I have not experimented with trust-region methods, but I can imagine that this "design feature" causes no problem for trust-region methods, if the old point is saved and after the trust-region check fails the old point is copied to the actual point. But the implementation of the Armijo line search method does not work that way. Here, the actual point will always be overwritten. Only if the line search fails, then the old point is restored, but then the TaoSolve method ends with a line search failure. If you have an example for your own, you can switch the Armijo line search by the option -tao_ls_type armijo. The thing is that it will cause no problems if the line search accepts the steps with step length one. It is also possible that, by luck, it will cause no problems, if the "excessive" step brings a reduction of the objective Otherwise, I attach my example, which is not minimal, but here you can see that it causes problems. You need to set the paths to the PETSc library in the makefile. You find the options for this problem in the run_test_tao_neohooke.sh script. The import part begins at line 292 in test_tao_neohooke.cpp Stephan On 02.11.22 19:04, Barry Smith wrote: Stephan, I have located the troublesome line in TaoSetUp_ALMM() it has the line auglag->Px = tao->solution; and in alma.h it has Vec Px, LgradX, Ce, Ci, G; /* aliased vectors (do not destroy!) */ Now auglag->P in some situations alias
Re: [petsc-users] Report Bug TaoALMM class
Barry, so far, I have not experimented with trust-region methods, but I can imagine that this "design feature" causes no problem for trust-region methods, if the old point is saved and after the trust-region check fails the old point is copied to the actual point. But the implementation of the Armijo line search method does not work that way. Here, the actual point will always be overwritten. Only if the line search fails, then the old point is restored, but then the TaoSolve method ends with a line search failure. If you have an example for your own, you can switch the Armijo line search by the option -tao_ls_type armijo. The thing is that it will cause no problems if the line search accepts the steps with step length one. It is also possible that, by luck, it will cause no problems, if the "excessive" step brings a reduction of the objective Otherwise, I attach my example, which is not minimal, but here you can see that it causes problems. You need to set the paths to the PETSc library in the makefile. You find the options for this problem in the run_test_tao_neohooke.sh script. The import part begins at line 292 in test_tao_neohooke.cpp Stephan On 02.11.22 19:04, Barry Smith wrote: Stephan, I have located the troublesome line in TaoSetUp_ALMM() it has the line auglag->Px = tao->solution; and in alma.h it has Vec Px, LgradX, Ce, Ci, G; /* aliased vectors (do not destroy!) */ Now auglag->P in some situations alias auglag->P and in some cases auglag->Px serves to hold a portion of auglag->P. So then in TaoALMMSubsolverObjective_Private() the lines PetscCall(VecCopy(P, auglag->P)); PetscCall((*auglag->sub_obj)(auglag->parent)); causes, just as you said, tao->solution to be overwritten by the P at which the objective function is being computed. In other words, the solution of the outer Tao is aliased with the solution of the inner Tao, by design. You are definitely correct, the use of TaoALMMSubsolverObjective_Private and TaoALMMSubsolverObjectiveAndGradient_Private in a line search would be problematic. I am not an expert at these methods or their implementations. Could you point to an actual use case within Tao that triggers the problem. Is there a set of command line options or code calls to Tao that fail due to this "design feature". Within the standard use of ALMM I do not see how the objective function would be used within a line search. The TaoSolve_ALMM() code is self-correcting in that if a trust region check fails it automatically rolls back the solution. Barry On Oct 28, 2022, at 4:27 AM, Stephan Köhler wrote: Dear PETSc/Tao team, it seems to be that there is a bug in the TaoALMM class: In the methods TaoALMMSubsolverObjective_Private and TaoALMMSubsolverObjectiveAndGradient_Private the vector where the function value for the augmented Lagrangian is evaluate is copied into the current solution, see, e.g.,https://petsc.org/release/src/tao/constrained/impls/almm/almm.c.html line 672 or 682. This causes subsolver routine to not converge if the line search for the subsolver rejects the step length 1. for some update. In detail: Suppose the current iterate is xk and the current update is dxk. The line search evaluates the augmented Lagrangian now at (xk + dxk). This causes that the value (xk + dxk) is copied in the current solution. If the point (xk + dxk) is rejected, the line search should try the point (xk + alpha * dxk), where alpha < 1. But due to the copying, what happens is that the point ((xk + dxk) + alpha * dxk) is evaluated, see, e.g.,https://petsc.org/release/src/tao/linesearch/impls/armijo/armijo.c.html line 191. Best regards Stephan Köhler -- Stephan Köhler TU Bergakademie Freiberg Institut für numerische Mathematik und Optimierung Akademiestraße 6 09599 Freiberg Gebäudeteil Mittelbau, Zimmer 2.07 Telefon: +49 (0)3731 39-3173 (Büro) -- Stephan Köhler TU Bergakademie Freiberg Institut für numerische Mathematik und Optimierung Akademiestraße 6 09599 Freiberg Gebäudeteil Mittelbau, Zimmer 2.07 Telefon: +49 (0)3731 39-3173 (Büro) Minimal_example_without_vtk_2.tar.gz Description: application/gzip OpenPGP_0xC9BF2C20DFE9F713.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
[petsc-users] Bug report LMVM matrix class
Dear PETSc/Tao team, it seems to be that there is a bug in the LMVM matrix class: In the function MatCreateVecs_LMVM, see, e.g., https://petsc.org/release/src/ksp/ksp/utils/lmvm/lmvmimpl.c.html at line 214. it is not checked if the vectors *L, or *R are NULL. This is, in particular, a problem if this matrix class is combined with the Schur complement matrix class, since MatMult_SchurComplement calls this function with NULL as *R, see, e.g. https://petsc.org/release/src/ksp/ksp/utils/schurm/schurm.c.html at line 66. I attach a minimal example. You need to modify the paths to the PETSc installation in the makefile. Best regards Stephan Köhler -- Stephan Köhler TU Bergakademie Freiberg Institut für numerische Mathematik und Optimierung Akademiestraße 6 09599 Freiberg Gebäudeteil Mittelbau, Zimmer 2.07 Telefon: +49 (0)3731 39-3173 (Büro) Minimal_example_schur_lmvm.tar.gz Description: application/gzip OpenPGP_0xC9BF2C20DFE9F713.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
[petsc-users] Report Bug TaoALMM class
Dear PETSc/Tao team, it seems to be that there is a bug in the TaoALMM class: In the methods TaoALMMSubsolverObjective_Private and TaoALMMSubsolverObjectiveAndGradient_Private the vector where the function value for the augmented Lagrangian is evaluate is copied into the current solution, see, e.g., https://petsc.org/release/src/tao/constrained/impls/almm/almm.c.html line 672 or 682. This causes subsolver routine to not converge if the line search for the subsolver rejects the step length 1. for some update. In detail: Suppose the current iterate is xk and the current update is dxk. The line search evaluates the augmented Lagrangian now at (xk + dxk). This causes that the value (xk + dxk) is copied in the current solution. If the point (xk + dxk) is rejected, the line search should try the point (xk + alpha * dxk), where alpha < 1. But due to the copying, what happens is that the point ((xk + dxk) + alpha * dxk) is evaluated, see, e.g., https://petsc.org/release/src/tao/linesearch/impls/armijo/armijo.c.html line 191. Best regards Stephan Köhler -- Stephan Köhler TU Bergakademie Freiberg Institut für numerische Mathematik und Optimierung Akademiestraße 6 09599 Freiberg Gebäudeteil Mittelbau, Zimmer 2.07 Telefon: +49 (0)3731 39-3173 (Büro) OpenPGP_0xC9BF2C20DFE9F713.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature