Ok - cherry-picked and pushed to maint.

Satish

On Thu, 21 Mar 2019, Zhang, Junchao via petsc-users wrote:

> Yes, it does. It is a bug.
> --Junchao Zhang
>
> On Thu, Mar 21, 2019 at 11:16 AM Balay, Satish <ba...@mcs.anl.gov> wrote:
> Does maint also need this fix?
>
> Satish
>
> On Thu, 21 Mar 2019, Stefano Zampini via petsc-users wrote:
>
> > Derek
> >
> > I fixed the optimized plan a few weeks ago:
> >
> > https://bitbucket.org/petsc/petsc/commits/c3caad8634d376283f7053f3b388606b45b3122c
> >
> > Maybe this will fix your problem too?
> >
> > Stefano
> >
> > On Thu, 21 Mar 2019, 04:21 Zhang, Junchao via petsc-users
> > <petsc-users@mcs.anl.gov> wrote:
> >
> > > Hi, Derek,
> > > Try applying this tiny (but dirty) patch to your version of PETSc to
> > > disable the VecScatterMemcpyPlan optimization and see if it helps.
> > > Thanks.
> > > --Junchao Zhang
> > >
> > > On Wed, Mar 20, 2019 at 6:33 PM Junchao Zhang <jczh...@mcs.anl.gov> wrote:
> > >
> > >> Did you see the warning with small-scale runs? Is it possible to
> > >> provide a test code?
> > >> You mentioned "changing PETSc now would be pretty painful". Is it
> > >> because it will affect your performance (but not your code)? If yes,
> > >> could you try PETSc master and run your code with or without
> > >> -vecscatter_type sf? I want to isolate the problem and see if it is
> > >> due to possible bugs in VecScatter.
> > >> If the above suggestion is not feasible, I will disable
> > >> VecScatterMemcpy. It is an optimization I added. Sorry, I did not add
> > >> an option to turn it off because I thought it was always useful :)
> > >> I will provide you a patch later to disable it. With that, you can
> > >> run again to isolate possible bugs in VecScatterMemcpy.
> > >> Thanks.
> > >> --Junchao Zhang
> > >>
> > >> On Wed, Mar 20, 2019 at 5:40 PM Derek Gaston via petsc-users
> > >> <petsc-users@mcs.anl.gov> wrote:
> > >>
> > >>> Trying to track down some memory corruption I'm seeing on
> > >>> larger-scale runs (3.5B+ unknowns). I was able to run Valgrind on
> > >>> it... and I'm seeing quite a lot of uninitialized-value errors
> > >>> coming from ghost updating. Here are some of the traces:
> > >>>
> > >>> ==87695== Conditional jump or move depends on uninitialised value(s)
> > >>> ==87695==    at 0x73236D3: PetscMallocAlign (mal.c:28)
> > >>> ==87695==    by 0x7323C70: PetscMallocA (mal.c:390)
> > >>> ==87695==    by 0x739048E: VecScatterMemcpyPlanCreate_Index (vscat.c:284)
> > >>> ==87695==    by 0x73A5D97: VecScatterMemcpyPlanCreate_PtoP (vpscat_mpi1.c:312)
> > >>> ==64730==    by 0x7393E8A: VecScatterSetUp_vectype_private (vscat.c:857)
> > >>> ==64730==    by 0x7395E5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)
> > >>> ==64730==    by 0x73DDD39: VecScatterSetUp (vscatfce.c:212)
> > >>> ==64730==    by 0x73DCD73: VecScatterCreateWithData (vscreate.c:333)
> > >>> ==64730==    by 0x7444232: VecCreateGhostWithArray (pbvec.c:685)
> > >>> ==64730==    by 0x744490D: VecCreateGhost (pbvec.c:741)
> > >>>
> > >>> ==133582== Conditional jump or move depends on uninitialised value(s)
> > >>> ==133582==    at 0x4030384: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1034)
> > >>> ==133582==    by 0x739E4F9: PetscMemcpy (petscsys.h:1649)
> > >>> ==133582==    by 0x739E4F9: VecScatterMemcpyPlanExecute_Pack (vecscatterimpl.h:150)
> > >>> ==133582==    by 0x739E4F9: VecScatterBeginMPI1_1 (vpscat_mpi1.h:69)
> > >>> ==133582==    by 0x73DD964: VecScatterBegin (vscatfce.c:110)
> > >>> ==133582==    by 0x744E195: VecGhostUpdateBegin (commonmpvec.c:225)
> > >>>
> > >>> This is from a Git checkout of PETSc... the hash I branched from is
> > >>> 0e667e8fea4aa from December 23rd (updating would be really hard at
> > >>> this point, as I've completed 90% of my dissertation with this
> > >>> version... and changing PETSc now would be pretty painful!).
> > >>>
> > >>> Any ideas? Is it possible it's in my code? Is it possible that there
> > >>> are later PETSc commits that already fix this?
> > >>>
> > >>> Thanks for any help,
> > >>> Derek
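Both Valgrind traces in Derek's report have the same shape: memcheck flags a branch (inside the allocator in the first trace, inside memcpy in the second) whose outcome depends on memory that was never written, in code that first builds and then executes a memcpy-based scatter plan. The C sketch below is a hypothetical illustration of that class of bug and its usual remedy, zero-initializing the plan before any of its fields can feed an allocation size or a copy length. CopyPlan, alloc_bytes, plan_create, plan_pack and plan_destroy are made-up names; this is not PETSc's VecScatterMemcpyPlan code, and it is not the change in commit c3caad86.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy plan, NOT PETSc's VecScatterMemcpyPlan: a list of source
   indices is compressed into runs of consecutive indices so that each run
   can be packed with a single memcpy. */
typedef struct {
  int  nruns; /* number of runs */
  int *start; /* first source index of each run */
  int *len;   /* number of entries in each run */
} CopyPlan;

/* Allocator that branches on its size argument. If 'bytes' were computed
   from a field that was never written, memcheck would report
   "Conditional jump or move depends on uninitialised value(s)" here. */
static void *alloc_bytes(size_t bytes) {
  if (bytes == 0) return NULL;
  return malloc(bytes);
}

/* Build a plan for m source indices. Returns 0 on success, -1 if an
   allocation fails. */
static int plan_create(const int *idx, int m, CopyPlan *plan) {
  int i, r;

  /* Zero every field first, so no path (including m == 0) leaves nruns,
     start, or len uninitialized. */
  memset(plan, 0, sizeof *plan);
  if (m == 0) return 0;

  /* Pass 1: count runs of consecutive indices. */
  plan->nruns = 1;
  for (i = 1; i < m; i++)
    if (idx[i] != idx[i - 1] + 1) plan->nruns++;

  plan->start = alloc_bytes((size_t)plan->nruns * sizeof(int));
  plan->len   = alloc_bytes((size_t)plan->nruns * sizeof(int));
  if (!plan->start || !plan->len) {
    free(plan->start);
    free(plan->len);
    memset(plan, 0, sizeof *plan);
    return -1;
  }

  /* Pass 2: record the start and length of every run. */
  r = 0;
  plan->start[0] = idx[0];
  plan->len[0]   = 1;
  for (i = 1; i < m; i++) {
    if (idx[i] == idx[i - 1] + 1) {
      plan->len[r]++;
    } else {
      r++;
      plan->start[r] = idx[i];
      plan->len[r]   = 1;
    }
  }
  return 0;
}

/* Pack x[idx[0]], x[idx[1]], ... into buf, one memcpy per run. If start[r]
   or len[r] had never been written, Valgrind's memcpy replacement would
   likely flag a jump on an uninitialised value here. */
static void plan_pack(const CopyPlan *plan, const double *x, double *buf) {
  int r;
  assert(plan != NULL);
  for (r = 0; r < plan->nruns; r++) {
    memcpy(buf, x + plan->start[r], (size_t)plan->len[r] * sizeof(double));
    buf += plan->len[r];
  }
}

/* Release the plan and reset it to the empty state. */
static void plan_destroy(CopyPlan *plan) {
  free(plan->start);
  free(plan->len);
  memset(plan, 0, sizeof *plan);
}
```

The design choice that matters is that every allocation size and every memcpy length is derived from values the function has just written itself; the up-front memset is what keeps the m == 0 path from handing a garbage nruns to alloc_bytes or to the loop in plan_pack.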