Congratulations, you have found a ginormous bug in PETSc! Thanks for the 
detailed information on the problem.

   I will post a fix shortly.

   Barry


> On Nov 16, 2023, at 6:19 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
> 
> I have a program which reads a vector from a file into an array, and then uses 
> that array to create a PETSc Vec object. The Vec is defined on the global 
> communicator, but not all processes actually contain entries of it. For 
> example, suppose we have 4 processors, and the vector is of size 10. Rank 0 
> will contain entries 0-4 and Rank 1 will contain entries 5-9. Ranks 2 and 3 
> will not have any entries of the Vec.
> 
> This Vec is then used as an input to other parts of the code, and those work 
> fine. However, if I try to take the norm of the Vec with VecNorm(), I get the 
> error
> 
> `MPI_Allreduce() called in different locations (code lines) on different 
> processors`
> 
> The stack trace shows that ranks 0 and 1 (from the above example) are still 
> in the VecNorm() function while ranks 2 and 3 have moved on to a later part 
> of the code. If I add a PetscBarrier() after the VecNorm(), I find that the 
> program hangs. 
> 
> The funny thing is that part of the code duplicates the Vec with 
> VecDuplicate() and assigns to the duplicated vector the result of some 
> computations. The duplicated Vec has the same layout as the original Vec, but 
> taking VecNorm() on the duplicated Vec works fine. If I use VecCopy(), 
> however, the copied Vec also causes VecNorm() to hang. I've printed out the 
> original Vec, and there are no corrupted/NaN entries.
> 
> I have a temporary workaround where I perturb the original Vec slightly 
> before copying it to another Vec. This causes the program to successfully 
> terminate.
> 
> Any advice on how to get VecNorm() working with the original Vec?
> 
> Thanks,
> Sreeram
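
For anyone trying to picture the layout in the quoted message (4 ranks, global size 10, ranks 0 and 1 owning 5 entries each, ranks 2 and 3 owning none), here is a minimal sketch in plain C. It does not use PETSc or MPI, and ownership_ranges is a hypothetical helper written only for illustration. It just computes the half-open index range each rank would own from the local sizes; on a real Vec, I would expect VecGetOwnershipRange() to report the same ranges, with low == high on the ranks that own nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT a PETSc function: given the number of entries
   each rank owns, fill start[r] and end[r] with the half-open range
   [start, end) of global indices owned by rank r.
   Returns the global size (the sum of all local sizes). */
long ownership_ranges(const long *local_n, size_t nranks, long *start, long *end)
{
  long offset = 0;

  assert(local_n != NULL && start != NULL && end != NULL);
  for (size_t r = 0; r < nranks; ++r) {
    assert(local_n[r] >= 0);   /* a rank may own zero entries, never fewer */
    start[r] = offset;         /* first global index owned by rank r */
    offset  += local_n[r];
    end[r]   = offset;         /* one past the last index owned by rank r */
  }
  return offset;
}
```

With local sizes {5, 5, 0, 0} this gives [0, 5) and [5, 10) for ranks 0 and 1, and the empty range [10, 10) for ranks 2 and 3. Empty ranks are still part of the Vec's communicator, so they still have to take part in collective calls such as VecNorm().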
