Note in init.c that, by default, PETSc does not use PetscTrMallocDefault() when valgrind is running, because it does not make much sense to stack one memory checker on top of another. So, at a glance, I am puzzled how execution can end up inside PetscTrMallocDefault() at all. Do you perhaps have -malloc_debug in a .petscrc file or in the PETSC_OPTIONS environment variable? Anyway, there is a problem, but perhaps this is a hint as to where it is coming from?
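For readers following along, a minimal sketch of the kind of check being described, using the RUNNING_ON_VALGRIND client macro from <valgrind/valgrind.h> (which evaluates to zero unless the process runs under valgrind). The functions install_tracing_malloc() and use_system_malloc() are hypothetical placeholders for illustration only, not PETSc API:

#include <stdio.h>
#include <valgrind/valgrind.h>  /* provides RUNNING_ON_VALGRIND */

/* Hypothetical stand-ins for "install a tracing malloc" vs. "keep the
   plain system malloc"; they are not PETSc functions. */
static void install_tracing_malloc(void) { printf("tracing malloc installed\n"); }
static void use_system_malloc(void)      { printf("system malloc kept\n");       }

int main(void)
{
  if (RUNNING_ON_VALGRIND) {
    /* valgrind already tracks every allocation, so a second
       memory checker on top of it would only add noise */
    use_system_malloc();
  } else {
    install_tracing_malloc();
  }
  return 0;
}

The point of such a check is exactly the one made above: when valgrind is detected, the tracing allocator is normally skipped, so seeing PetscTrMallocDefault() in the valgrind backtrace suggests the debug malloc was switched on explicitly somewhere (options file or environment).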
Barry

> On Jul 30, 2019, at 5:38 PM, Zhang, Junchao via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
>
> Fabian,
> I happen to have an Ubuntu virtual machine and I could reproduce the error with your mini-test, even with two processes. It is horrible to see wrong results in such a simple test.
> We'd better figure out whether it is a PETSc bug or an Open MPI bug, and if it is the latter, which MPI call is at fault.
>
> --Junchao Zhang
>
>
> On Tue, Jul 30, 2019 at 9:47 AM Fabian.Jakub via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
> Dear PETSc Team,
> Our cluster recently switched to Ubuntu 18.04, which ships gcc 7.4 and Open MPI 2.1.1 - with this I ended up with segfaults and valgrind errors in DMDAGlobalToNatural.
>
> This is evident in a minimal Fortran example such as the attached petsc_ex.F90,
> which produces the following error:
>
> ==22616== Conditional jump or move depends on uninitialised value(s)
> ==22616==    at 0x4FA5CDB: PetscTrMallocDefault (mtr.c:185)
> ==22616==    by 0x4FA4DAC: PetscMallocA (mal.c:413)
> ==22616==    by 0x5090E94: VecScatterSetUp_SF (vscatsf.c:652)
> ==22616==    by 0x50A1104: VecScatterSetUp (vscatfce.c:209)
> ==22616==    by 0x509EE3B: VecScatterCreate (vscreate.c:280)
> ==22616==    by 0x577B48B: DMDAGlobalToNatural_Create (dagtol.c:108)
> ==22616==    by 0x577BB6D: DMDAGlobalToNaturalBegin (dagtol.c:155)
> ==22616==    by 0x5798446: VecView_MPI_DA (gr2.c:720)
> ==22616==    by 0x51BC7D8: VecView (vector.c:574)
> ==22616==    by 0x4F4ECA1: PetscObjectView (destroy.c:90)
> ==22616==    by 0x4F4F05E: PetscObjectViewFromOptions (destroy.c:126)
>
> and consequently wrong results in the natural Vec.
>
> I looked over the Fortran example to check whether I had forgotten something, but I can also see the same error, i.e. the run is not valgrind-clean, in pure C PETSc:
>
> cd $PETSC_DIR/src/dm/examples/tests && make ex14 && mpirun --allow-run-as-root -np 2 valgrind ./ex14
>
> I then tried various docker/podman Linux distributions to make sure that my setup is clean, and to me it seems that this error is confined to the particular combination of gcc 7.4 and Open MPI 2.1.1 from the ubuntu:latest repo.
>
> I tried other images from Docker Hub, including:
>
> gcc:7.4.0 :: I could install neither openmpi nor mpich through apt, however it works with --download-openmpi and --download-mpich
>
> ubuntu:rolling (19.04) <-- works
>
> debian:latest & :stable <-- works
>
> ubuntu:latest (18.04) <-- fails with openmpi, but works with mpich or with petsc-configure --download-openmpi or --download-mpich
>
> Is this error with Open MPI 2.1.1 a known issue? In the meantime, I guess I'll go with a custom MPI install, but given that ubuntu:latest is widely used, do you think there is an easy solution to the error?
>
> I guess you are not eager to delve into an issue with old MPI versions, but in case you find some spare time, maybe you can find the root cause and/or a workaround.
>
> Many thanks,
> Fabian
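For anyone who wants to reproduce the code path without the attachment: a minimal C sketch of what the reproducer exercises (a small DMDA, a global Vec, and the global-to-natural scatter whose setup appears in the valgrind backtrace). This is only an illustrative sketch with an assumed 1-D grid of 16 points; it is not the attached petsc_ex.F90 or ex14 themselves:

#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM             da;
  Vec            global, natural;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* 1-D distributed array: 16 grid points, 1 dof, stencil width 1 */
  ierr = DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 16, 1, 1, NULL, &da); CHKERRQ(ierr);
  ierr = DMSetFromOptions(da); CHKERRQ(ierr);
  ierr = DMSetUp(da); CHKERRQ(ierr);

  ierr = DMCreateGlobalVector(da, &global); CHKERRQ(ierr);
  ierr = DMDACreateNaturalVector(da, &natural); CHKERRQ(ierr);
  ierr = VecSet(global, 1.0); CHKERRQ(ierr);

  /* The VecScatter set up behind these two calls is where the
     valgrind warning above is reported */
  ierr = DMDAGlobalToNaturalBegin(da, global, INSERT_VALUES, natural); CHKERRQ(ierr);
  ierr = DMDAGlobalToNaturalEnd(da, global, INSERT_VALUES, natural); CHKERRQ(ierr);

  ierr = VecView(natural, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);

  ierr = VecDestroy(&natural); CHKERRQ(ierr);
  ierr = VecDestroy(&global); CHKERRQ(ierr);
  ierr = DMDestroy(&da); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Run under valgrind with two ranks, as in the report, to see whether the natural vector comes out wrong on the affected Open MPI build.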