Satish,

   Can you please add to MPI.py a check for this and simply reject it,
telling the user there are bugs in that version of OpenMPI/Ubuntu? It is
not debuggable, and hence not fixable, and it wastes everyone's time and
could even lead to wrong results (which is worse than crashing). We've had
multiple reports of this.

   Barry


> On Jul 30, 2019, at 10:17 AM, Balay, Satish via petsc-dev
> <petsc-dev@mcs.anl.gov> wrote:
>
> We've seen such behavior with ubuntu default OpenMPI - but have no
> idea why this happens or if we can work around it.
>
> Last I checked - the same version of openmpi - when installed
> separately did not exhibit such issues..
>
> Satish
>
> On Tue, 30 Jul 2019, Fabian.Jakub via petsc-dev wrote:
>
>> Dear Petsc Team,
>> Our cluster recently switched to Ubuntu 18.04 which has gcc 7.4 and
>> (Open MPI) 2.1.1 - with this I ended up with segfault and valgrind
>> errors in DMDAGlobalToNatural.
>>
>> This is evident in a minimal fortran example such as the attached
>> example petsc_ex.F90
>>
>> with the following error:
>>
>> ==22616== Conditional jump or move depends on uninitialised value(s)
>> ==22616==    at 0x4FA5CDB: PetscTrMallocDefault (mtr.c:185)
>> ==22616==    by 0x4FA4DAC: PetscMallocA (mal.c:413)
>> ==22616==    by 0x5090E94: VecScatterSetUp_SF (vscatsf.c:652)
>> ==22616==    by 0x50A1104: VecScatterSetUp (vscatfce.c:209)
>> ==22616==    by 0x509EE3B: VecScatterCreate (vscreate.c:280)
>> ==22616==    by 0x577B48B: DMDAGlobalToNatural_Create (dagtol.c:108)
>> ==22616==    by 0x577BB6D: DMDAGlobalToNaturalBegin (dagtol.c:155)
>> ==22616==    by 0x5798446: VecView_MPI_DA (gr2.c:720)
>> ==22616==    by 0x51BC7D8: VecView (vector.c:574)
>> ==22616==    by 0x4F4ECA1: PetscObjectView (destroy.c:90)
>> ==22616==    by 0x4F4F05E: PetscObjectViewFromOptions (destroy.c:126)
>>
>> and consequently wrong results in the natural vec
>>
>>
>> I was looking at the fortran example if I did forget something but I can
>> also see the same error, i.e.
>> not being valgrind clean, in pure C - PETSc:
>>
>> cd $PETSC_DIR/src/dm/examples/tests && make ex14 && mpirun
>> --allow-run-as-root -np 2 valgrind ./ex14
>>
>> I then tried various docker/podman linux distributions to make sure that
>> my setup is clean and to me it seems that this error is confined to the
>> particular gcc version 7.4 and (Open MPI) 2.1.1 from the ubuntu:latest repo.
>>
>> I tried other images from dockerhub including
>>
>> gcc:7.4.0 :: where I could neither install openmpi nor mpich through
>> apt, however works with --download-openmpi and --download-mpich
>>
>> ubuntu:rolling(19.04) <-- work
>>
>> debian:latest & :stable <-- works
>>
>> ubuntu:latest(18.04) <-- fails in case of openmpi, but works with mpich
>> or with petsc-configure --download-openmpi or --download-mpich
>>
>>
>> Is this error with (Open MPI) 2.1.1 a known issue? In the meantime, I
>> guess I'll go with a custom mpi install but given that ubuntu:latest is
>> widely spread, do you think there is an easy solution to the error?
>>
>> I guess you are not eager to delve into this issue with old mpi versions
>> but in case you find some spare time, maybe you find the root cause
>> and/or a workaround.
>>
>> Many thanks,
>> Fabian
>>
>
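
For reference, a rough sketch of the kind of check requested at the top of this thread. This is not the actual MPI.py code and does not use the PETSc BuildSystem API. The function names, the KNOWN_BAD table, and the "/usr/bin/" test for a distro-packaged MPI are all assumptions made for illustration. It only relies on the "(Open MPI) 2.1.1" string quoted in the report and on the standard ID and VERSION_ID fields of /etc/os-release.

```python
import re

# Hypothetical blacklist of (distro ID, distro release, Open MPI version).
# The single entry is the combination reported as failing in this thread.
KNOWN_BAD = {("ubuntu", "18.04", "2.1.1")}


def parse_os_release(text):
    """Return (ID, VERSION_ID) from the contents of /etc/os-release."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields.get("ID", ""), fields.get("VERSION_ID", "")


def parse_openmpi_version(text):
    """Extract the version from `mpirun --version` style output, or None."""
    match = re.search(r"\(Open MPI\)\s+(\d+(?:\.\d+)*)", text)
    return match.group(1) if match else None


def is_known_bad(os_release_text, mpirun_version_text, mpirun_path):
    """True if this looks like the distro-packaged Open MPI reported as broken.

    Assumption: a distro-packaged mpirun lives under /usr/bin/. A separately
    installed Open MPI of the same version is not rejected, since that case
    was reported as working.
    """
    if not mpirun_path.startswith("/usr/bin/"):
        return False
    distro, release = parse_os_release(os_release_text)
    version = parse_openmpi_version(mpirun_version_text)
    return (distro, release, version) in KNOWN_BAD
```

A configure step could read /etc/os-release, run `mpirun --version`, and stop with a message suggesting --download-openmpi or --download-mpich when is_known_bad() returns True. Matching on the install path is a crude heuristic, so a real check would need to be tested on the affected systems.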