Re: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue

Matthew Knepley Sun, 26 Feb 2023 08:38:07 -0800

On Sun, Feb 26, 2023 at 11:32 AM Mike Michell <mi.mike1...@gmail.com> wrote:


> This is what I get from petsc main which is not correct:
> Overall volume computed from median-dual ...
>    6.37050098781844
> Overall volume computed from PETSc ...
>    3.15470053800000
>
>
> This is what I get from petsc 3.18.4 which is correct:
> Overall volume computed from median-dual ...
>    3.15470053800000
> Overall volume computed from PETSc ...
>    3.15470053800000
>
>
> If there is a problem in the code, it is strange to me that PETSc
> 3.18.4 still gives the correct answer.
>

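As I said, this can happen because the data is laid out differently in
memory: an out-of-bounds access can land on harmless memory in one build and
corrupt results in another. If you run it under valgrind or AddressSanitizer,
you will see that there is a problem.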
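
For example, here is a minimal sketch (not the code from the attachment) of
guarding the kind of indexing that the sanitizer flagged. It assumes the
overflow comes from a cell index such as nc1 or nc2 falling outside 1..ncell;
the module and routine names are hypothetical.

```fortran
! Hypothetical helper module: not part of PETSc or of the attached test.F90.
module dual_face_check
  implicit none
  private
  public :: dual_face_midpoint

  integer, parameter :: dp = kind(1.0d0)

contains

  ! Midpoint between the centroid of cell nc and a face centroid yfc_ifa,
  ! i.e. 0.5d0*(yc(nc) + yfc(ifa)), computed only when nc is a valid index.
  ! ok is .false. (and ymid is set to 0) when nc lies outside the bounds of
  ! yc, which is the situation assumed here to cause the overflow.
  subroutine dual_face_midpoint(yc, yfc_ifa, nc, ymid, ok)
    real(dp), intent(in)  :: yc(:)     ! cell centroids, size ncell
    real(dp), intent(in)  :: yfc_ifa   ! centroid of face ifa
    integer,  intent(in)  :: nc        ! cell index (nc1 or nc2)
    real(dp), intent(out) :: ymid
    logical,  intent(out) :: ok

    ok = (nc >= 1) .and. (nc <= size(yc))
    if (ok) then
      ymid = 0.5_dp*(yc(nc) + yfc_ifa)
    else
      ymid = 0.0_dp
    end if
  end subroutine dual_face_midpoint

end module dual_face_check
```

A caller could use dual_face_midpoint(yc, yfc(ifa), nc2, axrf(ifa,2), ok) in
place of the direct assignment and stop when ok is false. With gfortran,
building with -g -fcheck=bounds, or with -fsanitize=address, reports this kind
of access at run time without changing the source.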

  Thanks,

     Matt


> Thanks,
> Mike
>
>
>> On Sun, Feb 26, 2023 at 11:19 AM Mike Michell <mi.mike1...@gmail.com>
>> wrote:
>>
>>> Which version of PETSc did you test? With PETSc 3.18.4, the median-dual
>>> volume matches the value PETSc computes with DMPlexComputeCellGeometryFVM().
>>>
>>
>> This is only an accident of the data layout. The code you sent writes
>> over memory in the local Fortran arrays.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>>
>>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell <mi.mike1...@gmail.com>
>>>> wrote:
>>>>
>>>>> My apologies for the late follow-up. There was a time conflict.
>>>>>
>>>>> A simple example code related to the issue I mentioned is attached
>>>>> here. The sample code does: (1) load grid on dm, (2) compute vertex-wise
>>>>> control volume for each node in a median-dual way, (3) halo exchange among
>>>>> procs to have complete control volume values, and (4) print out its field
>>>>> as a .vtu file. To make sure, the computed control volume is also compared
>>>>> with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see
>>>>> lines 771-793).
>>>>>
>>>>> Back to the original problem, I can get a proper control volume field
>>>>> with PETSc 3.18.4, which is the latest stable release. However, if I use
>>>>> PETSc from the main repo, it gives a strange control volume field.
>>>>> Something is clearly wrong around the parallel boundaries, so I think
>>>>> something went wrong with halo communication. To help illustrate, a
>>>>> comparison snapshot is also attached. I guess a certain part of my code is
>>>>> no longer compatible with PETSc, unless there is a bug in the library.
>>>>> Could I get comments on it?
>>>>>
>>>>
>>>> I can run your example. The numbers I get for "median-dual volume" do
>>>> not match the "PETSc volume", and the PETSc volume is correct. Moreover,
>>>> the median-dual numbers change, which suggests a memory fault. I compiled
>>>> it using address sanitizer, and it found an error:
>>>>
>>>>  Number of physical boundary edge ...            4           0
>>>>  Number of physical and parallel boundary edge ...            4           0
>>>>  Number of parallel boundary edge ...            0           0
>>>>  Number of physical boundary edge ...            4           1
>>>>  Number of physical and parallel boundary edge ...            4           1
>>>>  Number of parallel boundary edge ...            0           1
>>>> =================================================================
>>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address
>>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8
>>>> READ of size 8 at 0x603000022d40 thread T0
>>>> =================================================================
>>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address
>>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8
>>>> READ of size 8 at 0x60300000f0f0 thread T0
>>>>     #0 0x10cf702a7 in MAIN__ test.F90:657
>>>>     #1 0x10cf768ee in main test.F90:43
>>>>     #0 0x1068e12a7 in MAIN__ test.F90:657
>>>>     #1 0x1068e78ee in main test.F90:43
>>>>     #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>
>>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region
>>>> [0x60300000f0d0,0x60300000f0f0)
>>>> allocated by thread T0 here:
>>>>     #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>
>>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region
>>>> [0x603000022d20,0x603000022d40)
>>>> allocated by thread T0 here:
>>>>     #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f)
>>>>     #1 0x1068dba71 in MAIN__ test.F90:499
>>>>     #2 0x1068e78ee in main test.F90:43
>>>>     #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>
>>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__
>>>> Shadow bytes around the buggy address:
>>>>
>>>> which corresponds to
>>>>
>>>>      ! midpoint of median-dual face for inner face
>>>>      axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell
>>>>      axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell
>>>>
>>>> and these were allocated here
>>>>
>>>>  allocate(xc(ncell))
>>>>  allocate(yc(ncell))
>>>>
>>>> Hopefully the error is straightforward to see now.
>>>>
>>>>   Thanks,
>>>>
>>>>     Matt
>>>>
>>>>
>>>>> Thanks,
>>>>> Mike
>>>>>
>>>>>
>>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley <knep...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell <mi.mike1...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> As a follow-up, I tested:
>>>>>>>>
>>>>>>>> (1) The tarball for v3.18.4 downloaded from the PETSc GitLab (
>>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue with
>>>>>>>> DMPlex halo exchange. This version works as I expect.
>>>>>>>> (2) A clone of the main branch (git clone https://gitlab.com/petsc/petsc.git)
>>>>>>>> has issues with DMPlex halo exchange. Something is suspicious about
>>>>>>>> this main branch, related to DMPlex halo. The solution field I got is
>>>>>>>> not correct, but it works fine on 1 process.
>>>>>>>>
>>>>>>>> Does anyone have any comments on this issue? I am curious if other
>>>>>>>> DMPlex users have no problem regarding halo exchange. FYI, I do not
>>>>>>>> declare ghost layers for halo exchange.
>>>>>>>>
>>>>>>>
>>>>>>> There should not have been any changes there and there are
>>>>>>> definitely tests for this.
>>>>>>>
>>>>>>> It would be great if you could send something that failed. I could
>>>>>>> fix it and add it as a test.
>>>>>>>
>>>>>>
>>>>>> Just to follow up, we have tests of the low-level communication (Plex
>>>>>> tests ex1, ex12, ex18, ex29, ex31), and then we have tests that use
>>>>>> halo exchange for PDE calculations, for example SNES tutorials ex12,
>>>>>> ex13, ex62. The convergence rates would be off if the halo exchange
>>>>>> were wrong. Is there any example similar to your code that is failing
>>>>>> on your installation? Or is there a way to run your code?
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>>
>>>>>>>   Thanks,
>>>>>>>
>>>>>>>      Matt
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mike
>>>>>>>>
>>>>>>>>
>>>>>>>>> Dear PETSc team,
>>>>>>>>>
>>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this
>>>>>>>>> version of PETSc:
>>>>>>>>> >>git rev-parse origin
>>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1
>>>>>>>>> >>git rev-parse FETCH_HEAD
>>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b
>>>>>>>>>
>>>>>>>>> There has been no issue, before the one with VTK viewer, which Jed
>>>>>>>>> fixed today (
>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735
>>>>>>>>> ).
>>>>>>>>>
>>>>>>>>> Since that MR has been merged into the main repo, I pulled the
>>>>>>>>> latest version of PETSc (basically I cloned it from scratch). But if
>>>>>>>>> I use the same configure options as before and run my code, there is
>>>>>>>>> an issue with halo exchange. The code runs without any error message,
>>>>>>>>> but it gives a wrong solution field. I guess the issue is related to
>>>>>>>>> the graph partitioner or the halo exchange, because if I run the code
>>>>>>>>> on 1 process, the solution is correct. I only updated the version of
>>>>>>>>> PETSc and there was no change in my own code. Could I get any comments
>>>>>>>>> on the issue? I was wondering if there have been many changes in the
>>>>>>>>> halo exchange or the graph partitioning and distribution parts related
>>>>>>>>> to DMPlex.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Mike
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which
>>>>>>> their experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
