On Sun, Feb 26, 2023 at 11:32 AM Mike Michell <mi.mike1...@gmail.com> wrote:
> This is what I get from petsc main which is not correct: > Overall volume computed from median-dual ... > 6.37050098781844 > Overall volume computed from PETSc ... > 3.15470053800000 > > > This is what I get from petsc 3.18.4 which is correct: > Overall volume computed from median-dual ... > 3.15470053800000 > Overall volume computed from PETSc ... > 3.15470053800000 > > > If there is a problem in the code, it is also strange for me that petsc > 3.18.4 gives the correct answer > As I said, this can happen due to different layouts in memory. If you run it under valgrind, or address sanitizer, you will see that there is a problem. Thanks, Matt > Thanks, > Mike > > >> On Sun, Feb 26, 2023 at 11:19 AM Mike Michell <mi.mike1...@gmail.com> >> wrote: >> >>> Which version of petsc you tested? With petsc 3.18.4, median duan volume >>> gives the same value with petsc from DMPlexComputeCellGeometryFVM(). >>> >> >> This is only an accident of the data layout. The code you sent writes >> over memory in the local Fortran arrays. >> >> Thanks, >> >> Matt >> >> >>> >>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell <mi.mike1...@gmail.com> >>>> wrote: >>>> >>>>> My apologies for the late follow-up. There was a time conflict. >>>>> >>>>> A simple example code related to the issue I mentioned is attached >>>>> here. The sample code does: (1) load grid on dm, (2) compute vertex-wise >>>>> control volume for each node in a median-dual way, (3) halo exchange among >>>>> procs to have complete control volume values, and (4) print out its field >>>>> as a .vtu file. To make sure, the computed control volume is also compared >>>>> with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see >>>>> lines 771-793). >>>>> >>>>> Back to the original problem, I can get a proper control volume field >>>>> with PETSc 3.18.4, which is the latest stable release. However, if I use >>>>> PETSc from the main repo, it gives a strange control volume field. >>>>> Something is certainly strange around the parallel boundaries, thus I >>>>> think >>>>> something went wrong with halo communication. To help understand, a >>>>> comparing snapshot is also attached. I guess a certain part of my code is >>>>> no longer compatible with PETSc unless there is a bug in the library. >>>>> Could >>>>> I get comments on it? >>>>> >>>> >>>> I can run your example. The numbers I get for "median-dual volume" do >>>> not match the "PETSc volume", and the PETSc volume is correct. Moreover, >>>> the median-dual numbers change, which suggests a memory fault. I compiled >>>> it using address sanitizer, and it found an error: >>>> >>>> Number of physical boundary edge ... 4 0 >>>> Number of physical and parallel boundary edge ... 4 >>>> 0 >>>> Number of parallel boundary edge ... 0 0 >>>> Number of physical boundary edge ... 4 1 >>>> Number of physical and parallel boundary edge ... 4 >>>> 1 >>>> Number of parallel boundary edge ... 0 1 >>>> ================================================================= >>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 >>>> READ of size 8 at 0x603000022d40 thread T0 >>>> ================================================================= >>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 >>>> READ of size 8 at 0x60300000f0f0 thread T0 >>>> #0 0x10cf702a7 in MAIN__ test.F90:657 >>>> #1 0x10cf768ee in main test.F90:43 >>>> #0 0x1068e12a7 in MAIN__ test.F90:657 >>>> #1 0x1068e78ee in main test.F90:43 >>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>> >>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region >>>> [0x60300000f0d0,0x60300000f0f0) >>>> allocated by thread T0 here: >>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>> >>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region >>>> [0x603000022d20,0x603000022d40) >>>> allocated by thread T0 here: >>>> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) >>>> #1 0x1068dba71 in MAIN__ test.F90:499 >>>> #2 0x1068e78ee in main test.F90:43 >>>> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>> >>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__ >>>> Shadow bytes around the buggy address: >>>> >>>> which corresponds to >>>> >>>> ! midpoint of median-dual face for inner face >>>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>>> >>>> and these were allocated here >>>> >>>> allocate(xc(ncell)) >>>> allocate(yc(ncell)) >>>> >>>> Hopefully the error is straightforward to see now. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> Mike >>>>> >>>>> >>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley <knep...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell <mi.mike1...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> As a follow-up, I tested: >>>>>>>> >>>>>>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on >>>>>>>> DMPlex halo exchange. This version works as I expect. >>>>>>>> (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) >>>>>>>> has issues with DMPlex halo exchange. Something is suspicious about >>>>>>>> this >>>>>>>> main branch, related to DMPlex halo. The solution field I got is not >>>>>>>> correct. But it works okay with 1-proc. >>>>>>>> >>>>>>>> Does anyone have any comments on this issue? I am curious if other >>>>>>>> DMPlex users have no problem regarding halo exchange. FYI, I do not >>>>>>>> declare ghost layers for halo exchange. >>>>>>>> >>>>>>> >>>>>>> There should not have been any changes there and there are >>>>>>> definitely tests for this. >>>>>>> >>>>>>> It would be great if you could send something that failed. I could >>>>>>> fix it and add it as a test. >>>>>>> >>>>>> >>>>>> Just to follow up, we have tests of the low-level communication (Plex >>>>>> tests ex1, ex12, ex18, ex29, ex31), and then we have >>>>>> tests that use halo exchange for PDE calculations, for example SNES >>>>>> tutorial ex12, ex13, ex62. THe convergence rates >>>>>> should be off if the halo exchange were wrong. Is there any example >>>>>> similar to your code that is failing on your installation? >>>>>> Or is there a way to run your code? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> Mike >>>>>>>> >>>>>>>> >>>>>>>>> Dear PETSc team, >>>>>>>>> >>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this >>>>>>>>> version of PETSc: >>>>>>>>> >>git rev-parse origin >>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>>>>> >>git rev-parse FETCH_HEAD >>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>>>>> >>>>>>>>> There has been no issue, before the one with VTK viewer, which Jed >>>>>>>>> fixed today ( >>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>>>>>> ). >>>>>>>>> >>>>>>>>> Since that MR has been merged into the main repo, I pulled the >>>>>>>>> latest version of PETSc (basically I cloned it from scratch). But if >>>>>>>>> I use >>>>>>>>> the same configure option with before, and run my code, then there is >>>>>>>>> an >>>>>>>>> issue with halo exchange. The code runs without error message, but it >>>>>>>>> gives >>>>>>>>> wrong solution field. I guess the issue I have is related to graph >>>>>>>>> partitioner or halo exchange part. This is because if I run the code >>>>>>>>> with >>>>>>>>> 1-proc, the solution is correct. I only updated the version of PETSc >>>>>>>>> and >>>>>>>>> there was no change in my own code. Could I get any comments on the >>>>>>>>> issue? >>>>>>>>> I was wondering if there have been many changes in halo exchange or >>>>>>>>> graph >>>>>>>>> partitioning & distributing part related to DMPlex. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mike >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which >>>>>>> their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> <http://www.cse.buffalo.edu/~knepley/> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which >>>>>> their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> <http://www.cse.buffalo.edu/~knepley/> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> <http://www.cse.buffalo.edu/~knepley/> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> <http://www.cse.buffalo.edu/~knepley/> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>