Re: [petsc-dev] DMDAGlobalToNatural errors with Ubuntu:latest; gcc 7 & Open MPI 2.1.1

2019-07-31 Thread Fabian.Jakub via petsc-dev
Awesome, many thanks for your efforts! (A minimal sketch of the self-to-self send/receive pattern described below follows after the quoted message.)

On 7/31/19 9:17 PM, Zhang, Junchao wrote:
> Hi, Fabian,
> I found it is an Open MPI bug with self-to-self MPI_Send/Recv using
> MPI_ANY_SOURCE for message matching: Open MPI does not put the correct
> value in the receive buffer.
> I have a workaround in the branch jczhang/fix-ubuntu-openmpi-anysource
> (https://bitbucket.org/petsc/petsc/branch/jczhang/fix-ubuntu-openmpi-anysource).
> I tested it with your petsc_ex.F90 and $PETSC_DIR/src/dm/examples/tests/ex14;
> the majority of the valgrind errors disappeared, and the few that remain are
> in ompi_mpi_init and can be ignored.
> I filed a bug report with Open MPI
> (https://www.mail-archive.com/users@lists.open-mpi.org//msg33383.html) and
> hope the fix makes it into Ubuntu.
> Thanks.
> 
> --Junchao Zhang
> 
> 
> On Tue, Jul 30, 2019 at 9:47 AM Fabian.Jakub via petsc-dev
> <petsc-dev@mcs.anl.gov> wrote:
> Dear Petsc Team,
> Our cluster recently switched to Ubuntu 18.04, which ships gcc 7.4 and
> Open MPI 2.1.1. With this combination I get segfaults and valgrind errors
> in DMDAGlobalToNatural.
> 
> This is evident in a minimal Fortran example such as the attached
> petsc_ex.F90, which fails with the following error:
> 
> ==22616== Conditional jump or move depends on uninitialised value(s)
> ==22616==    at 0x4FA5CDB: PetscTrMallocDefault (mtr.c:185)
> ==22616==    by 0x4FA4DAC: PetscMallocA (mal.c:413)
> ==22616==    by 0x5090E94: VecScatterSetUp_SF (vscatsf.c:652)
> ==22616==    by 0x50A1104: VecScatterSetUp (vscatfce.c:209)
> ==22616==    by 0x509EE3B: VecScatterCreate (vscreate.c:280)
> ==22616==    by 0x577B48B: DMDAGlobalToNatural_Create (dagtol.c:108)
> ==22616==    by 0x577BB6D: DMDAGlobalToNaturalBegin (dagtol.c:155)
> ==22616==    by 0x5798446: VecView_MPI_DA (gr2.c:720)
> ==22616==    by 0x51BC7D8: VecView (vector.c:574)
> ==22616==    by 0x4F4ECA1: PetscObjectView (destroy.c:90)
> ==22616==    by 0x4F4F05E: PetscObjectViewFromOptions (destroy.c:126)
> 
> and, consequently, wrong results in the natural Vec.
> 
> 
> I checked the Fortran example for mistakes on my side, but I see the same
> error (i.e., not valgrind clean) in pure C PETSc:
> 
> cd $PETSC_DIR/src/dm/examples/tests && make ex14 && mpirun --allow-run-as-root -np 2 valgrind ./ex14
> 
> I then tried various Docker/Podman Linux distributions to make sure that my
> setup is clean; the error seems to be confined to the particular gcc 7.4 and
> Open MPI 2.1.1 combination from the ubuntu:latest repository.
> 
> I tried other images from Docker Hub, including:
> 
> gcc:7.4.0 <-- could install neither openmpi nor mpich through apt, but
> works with --download-openmpi and --download-mpich
> 
> ubuntu:rolling (19.04) <-- works
> 
> debian:latest & debian:stable <-- work
> 
> ubuntu:latest (18.04) <-- fails with the distribution openmpi, but works
> with mpich or with configure --download-openmpi or --download-mpich
> 
> 
> Is this error with Open MPI 2.1.1 a known issue? In the meantime I will go
> with a custom MPI install, but given how widespread ubuntu:latest is, do you
> think there is an easy solution to the error?
> 
> I understand you may not be eager to delve into an issue with old MPI
> versions, but if you find some spare time, perhaps you can track down the
> root cause and/or a workaround.
> 
> Many thanks,
> Fabian
> 
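
For illustration, here is a minimal, self-contained sketch (an editorial addition, not part of the original thread) of the pattern described above: a rank posts a nonblocking send to itself and matches it with a receive that uses MPI_ANY_SOURCE. Whether this standalone snippet by itself reproduces the wrong receive-buffer contents on the affected Open MPI 2.1.1 build is an assumption.

/* Sketch: self-to-self MPI_Isend matched by an MPI_Recv with MPI_ANY_SOURCE. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int         rank, sendbuf, recvbuf = -1;
  MPI_Request req;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  sendbuf = 42 + rank;
  MPI_Isend(&sendbuf, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &req);   /* send to myself */
  MPI_Recv(&recvbuf, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, /* matched with ANY_SOURCE */
           MPI_STATUS_IGNORE);
  MPI_Wait(&req, MPI_STATUS_IGNORE);

  if (recvbuf != sendbuf) printf("[%d] wrong data: got %d, expected %d\n", rank, recvbuf, sendbuf);

  MPI_Finalize();
  return 0;
}

Built with mpicc and run under mpirun; on a correct MPI implementation the check never fires.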



[petsc-dev] DMDAGlobalToNatural errors with Ubuntu:latest; gcc 7 & Open MPI 2.1.1

2019-07-30 Thread Fabian.Jakub via petsc-dev
Dear Petsc Team,
Our cluster recently switched to Ubuntu 18.04, which ships gcc 7.4 and
Open MPI 2.1.1. With this combination I get segfaults and valgrind errors
in DMDAGlobalToNatural.

This is evident in a minimal Fortran example such as the attached
petsc_ex.F90, which fails with the following error:

==22616== Conditional jump or move depends on uninitialised value(s)
==22616==    at 0x4FA5CDB: PetscTrMallocDefault (mtr.c:185)
==22616==    by 0x4FA4DAC: PetscMallocA (mal.c:413)
==22616==    by 0x5090E94: VecScatterSetUp_SF (vscatsf.c:652)
==22616==    by 0x50A1104: VecScatterSetUp (vscatfce.c:209)
==22616==    by 0x509EE3B: VecScatterCreate (vscreate.c:280)
==22616==    by 0x577B48B: DMDAGlobalToNatural_Create (dagtol.c:108)
==22616==    by 0x577BB6D: DMDAGlobalToNaturalBegin (dagtol.c:155)
==22616==    by 0x5798446: VecView_MPI_DA (gr2.c:720)
==22616==    by 0x51BC7D8: VecView (vector.c:574)
==22616==    by 0x4F4ECA1: PetscObjectView (destroy.c:90)
==22616==    by 0x4F4F05E: PetscObjectViewFromOptions (destroy.c:126)

and, consequently, wrong results in the natural Vec.


I checked the Fortran example for mistakes on my side, but I see the same
error (i.e., not valgrind clean) in pure C PETSc:

cd $PETSC_DIR/src/dm/examples/tests && make ex14 && mpirun --allow-run-as-root -np 2 valgrind ./ex14

I then tried various Docker/Podman Linux distributions to make sure that my
setup is clean; the error seems to be confined to the particular gcc 7.4 and
Open MPI 2.1.1 combination from the ubuntu:latest repository.

I tried other images from Docker Hub, including:

gcc:7.4.0 <-- could install neither openmpi nor mpich through apt, but
works with --download-openmpi and --download-mpich

ubuntu:rolling (19.04) <-- works

debian:latest & debian:stable <-- work

ubuntu:latest (18.04) <-- fails with the distribution openmpi, but works
with mpich or with configure --download-openmpi or --download-mpich


Is this error with Open MPI 2.1.1 a known issue? In the meantime I will go
with a custom MPI install, but given how widespread ubuntu:latest is, do you
think there is an easy solution to the error?

I understand you may not be eager to delve into an issue with old MPI
versions, but if you find some spare time, perhaps you can track down the
root cause and/or a workaround.

Many thanks,
Fabian
include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules

run:: petsc_ex
	mpirun -np 9 valgrind ./petsc_ex -show_gVec

petsc_ex:: petsc_ex.F90
	${PETSC_FCOMPILE} -c petsc_ex.F90 -o petsc_ex.o
	${FLINKER} petsc_ex.o -o petsc_ex ${PETSC_LIB}

clean::
	rm -rf *.o petsc_ex
program main
#include "petsc/finclude/petsc.h"

  use petsc
  implicit none

  PetscInt, parameter :: Ndof=1, stencil_size=1
  PetscInt, parameter :: Nx=3, Ny=3
  PetscErrorCode :: myid, commsize, ierr
  PetscScalar, pointer :: xv1d(:)

  type(tDM) :: da
  type(tVec) :: gVec !, naturalVec


  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
  call mpi_comm_rank(PETSC_COMM_WORLD, myid, ierr)
  call mpi_comm_size(PETSC_COMM_WORLD, commsize, ierr)

  call DMDACreate2d(PETSC_COMM_WORLD, &
DM_BOUNDARY_PERIODIC, DM_BOUNDARY_PERIODIC, &
DMDA_STENCIL_STAR, &
Nx, Ny, PETSC_DECIDE, PETSC_DECIDE, Ndof, stencil_size, &
PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, da, ierr)
  call DMSetup(da, ierr)
  call DMSetFromOptions(da, ierr)

  call DMCreateGlobalVector(da, gVec, ierr)
  call VecGetArrayF90(gVec, xv1d, ierr)
  xv1d(:) = real(myid, kind(xv1d))
  print *,myid, 'xv1d', xv1d, ':', xv1d
  call VecRestoreArrayF90(gVec, xv1d, ierr)

  call PetscObjectViewFromOptions(gVec, PETSC_NULL_VEC, "-show_gVec", ierr)

  call VecDestroy(gVec, ierr)
  call DMDestroy(da, ierr)
  call PetscFinalize(ierr)
end program
# Dockerfile to reproduce valgrind errors
# in the PETSc Example src/dm/examples/tests/ex14
# with the Ubuntu:latest (18.04) openmpi (2.1.1)
#
# invoked via: podman build -f Dockerfile.ubuntu_latest.test_petsc -t test_petsc_ex14_ubuntu_latest

FROM ubuntu:latest
#FROM ubuntu:rolling

RUN apt-get update && \
  apt-get install -fy cmake gfortran git libopenblas-dev libopenmpi-dev openmpi-bin python valgrind && \
  apt-get autoremove && apt-get clean

RUN cd $HOME && \
  echo "export PETSC_DIR=$HOME/petsc" >> $HOME/.profile && \
  echo "export PETSC_ARCH=debug" >> $HOME/.profile && \
  . $HOME/.profile && \
  git clone --depth=1 https://bitbucket.org/petsc/petsc -b master $PETSC_DIR && \
  cd $PETSC_DIR && \
  ./configure --with-cc=$(which mpicc) --with-fortran-bindings=0 --with-fc=0 && \
  make

RUN cd $HOME && . $HOME/.profile && \
  cd $PETSC_DIR/src/dm/examples/tests && make ex14 && mpirun --allow-run-as-root -np 2 valgrind ./ex14


Re: [petsc-dev] Issues with Fortran Interfaces for PetscSort routines

2019-07-29 Thread Fabian.Jakub via petsc-dev
Fixes it for me. Many thanks for the prompt reply!


On 7/30/19 12:34 AM, Zhang, Junchao wrote:
> Fixed in jczhang/fix-sort-fortran-binding and will be in master later. Thanks.
> --Junchao Zhang
> 
> 
> On Mon, Jul 29, 2019 at 10:14 AM Fabian.Jakub via petsc-dev
> <petsc-dev@mcs.anl.gov> wrote:
> Dear Petsc,
> 
> Commit 051fd8986cf23c0556f4229193defe128fafa1f7 changed the C signature of
> the sorting routines, and as a result I can no longer compile against them
> from Fortran.
> I rebuilt PETSc from scratch and ran make allfortranstubs, but to no avail.
> 
> I attach a simple Fortran program that calls PetscSortInt and fails at
> compile time with the following error:
> 
> 
> 
> 
> petsc_fortran_sort.F90:15:27:
> 
>call PetscSortInt(N, x, ierr)
>1
> Error: Rank mismatch in argument ‘b’ at (1) (scalar and rank-1)
> 
> The same applies to other routines such as PetscSortIntWithArrayPair...
> 
> I am not sure where to look for the generated Fortran interfaces and have
> not yet had time to dig deeper.
> 
> Please let me know if I have missed something stupid.
> 
> Many thanks,
> 
> Fabian
> 
> 
> P.S. Petsc was compiled with
> --with-fortran
> --with-fortran-interfaces
> --with-shared-libraries=1
> 



[petsc-dev] Issues with Fortran Interfaces for PetscSort routines

2019-07-29 Thread Fabian.Jakub via petsc-dev
Dear Petsc,

Commit 051fd8986cf23c0556f4229193defe128fafa1f7 changed the C signature of
the sorting routines, and as a result I can no longer compile against them
from Fortran.
I rebuilt PETSc from scratch and ran make allfortranstubs, but to no avail.

I attach a simple Fortran program that calls PetscSortInt and fails at
compile time with the following error:




petsc_fortran_sort.F90:15:27:

   call PetscSortInt(N, x, ierr)
   1
Error: Rank mismatch in argument ‘b’ at (1) (scalar and rank-1)

The same applies to other routines such as PetscSortIntWithArrayPair...

I am not sure where to look for the generated Fortran interfaces and have
not yet had time to dig deeper.

Please let me know if I have missed something stupid.

Many thanks,

Fabian


P.S. Petsc was compiled with
--with-fortran
--with-fortran-interfaces
--with-shared-libraries=1
include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules

run:: petsc_fortran_sort
	./petsc_fortran_sort

petsc_fortran_sort:: petsc_fortran_sort.F90
	${PETSC_FCOMPILE} -c petsc_fortran_sort.F90
	${FLINKER} petsc_fortran_sort.o -o petsc_fortran_sort ${PETSC_LIB}

clean::
	rm -rf *.o petsc_fortran_sort
program main
#include "petsc/finclude/petsc.h"

  use petsc
  implicit none

  PetscErrorCode :: ierr
  PetscInt, parameter :: N=3
  PetscInt :: x(N)

  call PetscInitialize(PETSC_NULL_CHARACTER,ierr)

  x = [3, 2, 1]

  call PetscSortInt(N, x, ierr)

  call PetscFinalize(ierr)
end program


[petsc-dev] Error in HDF5 dumps of DMPlex labels

2019-02-14 Thread Fabian.Jakub via petsc-dev
Dear Petsc Team!

I had an issue when writing out DMPlex objects through HDF5.


The problem comes from a DMLabel that has entries only on non-local mesh
points. The DMLabel write includes only the local part of the label, which
leads to a zero-sized write for the index set. That would be fine, except
that the HDF5 chunk size is then set to zero, which is not allowed.
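
The constraint can be demonstrated in isolation with a short sketch (an editorial illustration assuming a standard HDF5 installation, not part of the original report): H5Pset_chunk() rejects a chunk dimension of 0, which is exactly what a rank with an empty local index set would request, and clamping the dimension to 1, as the quick fix attached below does, makes the call succeed.

/* Sketch: H5Pset_chunk() fails for a zero chunk dimension and succeeds once it is clamped to 1. */
#include <hdf5.h>
#include <stdio.h>

int main(void)
{
  hid_t   plist        = H5Pcreate(H5P_DATASET_CREATE);
  hsize_t chunkDims[1] = {0};   /* what a rank with no local IS entries ends up requesting */
  herr_t  err          = H5Pset_chunk(plist, 1, chunkDims);

  printf("chunk dim 0: %s\n", err < 0 ? "rejected" : "accepted");

  chunkDims[0] = 1;             /* the clamp applied by the attached patch */
  err = H5Pset_chunk(plist, 1, chunkDims);
  printf("chunk dim 1: %s\n", err < 0 ? "rejected" : "accepted");

  H5Pclose(plist);
  return 0;
}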

I added a minimal example to illustrate the error.
It creates a 2D DMPlex in serial, distributes it, labels the nonlocal
points in the mesh and dumps it via PetscObjectViewer to HDF5.
Run with:

   make plex.h5

I also attached a quick fix to override the chunksize.

Please let me know if you need anything extra and also whether this is
expected behavior... I could certainly live with the fact that DMLabel is
not supposed to work this way.

Many thanks,

Fabian

From a8fb918b6f1ef49b8b14c5c492581ff84d484eb6 Mon Sep 17 00:00:00 2001
From: "Fabian.Jakub" 
Date: Thu, 14 Feb 2019 13:21:28 +0100
Subject: [PATCH] fix hdf5 chunksizes of 0

  The chunk size must not be 0; H5Pset_chunk(chunkspace, dim, chunkDims)
  will otherwise give an error.
  This happened, for example, when dumping a DMLabel inside a DMPlex which
  has entries only on non-local points.
---
 src/vec/is/is/impls/general/general.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/vec/is/is/impls/general/general.c b/src/vec/is/is/impls/general/general.c
index c6930cf..476a149 100644
--- a/src/vec/is/is/impls/general/general.c
+++ b/src/vec/is/is/impls/general/general.c
@@ -264,7 +264,7 @@ static PetscErrorCode ISView_General_HDF5(IS is, PetscViewer viewer)
   ierr = PetscHDF5IntCast(N/bs,dims + dim);CHKERRQ(ierr);
 
   maxDims[dim]   = dims[dim];
-  chunkDims[dim] = dims[dim];
+  chunkDims[dim] = PetscMax(1,dims[dim]);
   ++dim;
   if (bs >= 1) {
 dims[dim]  = bs;
-- 
2.7.4

program main
#include "petsc/finclude/petsc.h"

  use petsc
  implicit none

  PetscErrorCode :: ierr
  PetscInt, parameter :: petscint_dummy=0
  integer, parameter :: pi=kind(petscint_dummy)

  type(tDM) :: dm, dmdist

  call PetscInitialize(PETSC_NULL_CHARACTER,ierr); CHKERRQ(ierr)

  call create_plex(PETSC_COMM_WORLD, dm)
  call PetscObjectViewFromOptions(dm, PETSC_NULL_VEC, "-show_serial_plex", ierr); CHKERRQ(ierr)

  call distribute_dmplex(dm, dmdist)
  call PetscObjectViewFromOptions(dmdist, PETSC_NULL_VEC, "-show_dist_plex", ierr); CHKERRQ(ierr)

  call label_non_local_points(dmdist)
  call PetscObjectViewFromOptions(dmdist, PETSC_NULL_VEC, "-show_labeled_plex", ierr); CHKERRQ(ierr)

  call DMDestroy(dmdist, ierr);CHKERRQ(ierr)
  call DMDestroy(dm, ierr);CHKERRQ(ierr)
  call PetscFinalize(ierr)
  contains

subroutine create_plex(comm, dm)
  integer, intent(in) :: comm
  type(tDM), intent(out) :: dm
  integer :: myid

  PetscInt :: i, k, Nfaces, Nedges, Nverts, chartsize

  call mpi_comm_rank(comm, myid, ierr);CHKERRQ(ierr)
  call DMPlexCreate(comm, dm, ierr);CHKERRQ(ierr)
  call PetscObjectSetName(dm, 'testplex', ierr);CHKERRQ(ierr)
  call DMSetDimension(dm, 2_pi, ierr);CHKERRQ(ierr)

  if(myid.eq.0) then

!   Test mesh: four triangular cells (points 0 to 3), nine edges (points 4 to 12)
!   and six vertices (points 13 to 18); vertices 13, 14, 15 form the bottom row
!   and 16, 17, 18 the top row.

Nfaces = 4
Nedges = 9
Nverts = 6
chartsize = 19
  else
Nfaces = 0
Nedges = 0
Nverts = 0
chartsize = 0
  endif

  call DMPlexSetChart(dm, 0_pi, chartsize, ierr); CHKERRQ(ierr)

  ! Preallocation
  k=0
  ! cell has 3 edges
  do i = 1, Nfaces
call DMPlexSetConeSize(dm, k, 3_pi, ierr); CHKERRQ(ierr)
k = k+1
  enddo

  ! Edges have 2 vertices
  do i = 1, Nedges
call DMPlexSetConeSize(dm, k, 2_pi, ierr); CHKERRQ(ierr)
k = k+1
  enddo

  call DMSetUp(dm, ierr); CHKERRQ(ierr) ! Allocate space for cones

  if(myid.eq.0) then
! Setup Connections
call DMPlexSetCone(dm,  0_pi, [6_pi, 7_pi,11_pi], ierr); CHKERRQ(ierr)
call DMPlexSetCone(dm,  1_pi, [4_pi, 8_pi, 7_pi], ierr); CHKERRQ(ierr)
call DMPlexSetCone(dm,  2_pi, [5_pi, 9_pi, 8_pi], ierr); CHKERRQ(ierr)
call DMPlexSetCone(dm,  3_pi, [9_pi,10_pi,12_pi], ierr); CHKERRQ(ierr)

call DMPlexSetCone(dm,  4_pi, [13_pi,14_pi], ierr); CHKERRQ(ierr)
call 

[petsc-dev] patch for wrong integer type

2019-02-04 Thread Fabian.Jakub via petsc-dev
Dear Petsc Team,

I recently got segfaults when dumping DMPlexes through the
PetscObjectViewer into HDF5 files.

This happens with 64-bit integers; I think there is a PetscInt where an
int should be used.
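
A minimal sketch of this failure mode (an editorial illustration; takes_int() is a hypothetical stand-in for an API parameter of type int*, not PETSc code): a callee that writes through an int* fills only half of a 64-bit integer and leaves the rest untouched.

/* Sketch: writing through an int* into storage that is really a 64-bit integer. */
#include <stdint.h>
#include <stdio.h>

static void takes_int(int *n) { *n = 3; }    /* stand-in for a routine that expects int*, e.g. an argc-style output */

int main(void)
{
  int64_t n64 = -1;                          /* plays the role of a 64-bit PetscInt */
  takes_int((int *)&n64);                    /* the kind of type mismatch the patch removes */
  printf("n64 = %lld\n", (long long)n64);    /* typically not 3: only 4 of the 8 bytes were written */
  return 0;
}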

Please have a look at the attached patch.

Yours,

Fabian
From 78a8c48ed0956277273d50a22c456bf0d43db235 Mon Sep 17 00:00:00 2001
From: "Fabian.Jakub" 
Date: Mon, 4 Feb 2019 19:48:37 +0100
Subject: [PATCH] fix integer type given to PetscStrToArray

Had segfaults when dumping DMPlexes through HDF5 with 64-bit integers:
the given integer was PetscInt* but PetscStrToArray expects int*.
---
 src/sys/classes/viewer/impls/hdf5/hdf5v.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/sys/classes/viewer/impls/hdf5/hdf5v.c b/src/sys/classes/viewer/impls/hdf5/hdf5v.c
index a87d77c..ae58aa8 100644
--- a/src/sys/classes/viewer/impls/hdf5/hdf5v.c
+++ b/src/sys/classes/viewer/impls/hdf5/hdf5v.c
@@ -1009,7 +1009,7 @@ static PetscErrorCode PetscViewerHDF5Traverse_Internal(PetscViewer viewer, const
   const char rootGroupName[] = "/";
   hid_t  h5;
   PetscBool  exists=PETSC_FALSE;
-  PetscInt   i,n;
+  int        i, n;
   char   **hierarchy;
   char   buf[PETSC_MAX_PATH_LEN]="";
   PetscErrorCode ierr;
-- 
2.7.4


