Re: [petsc-dev] MatPinToCPU

2019-07-30 Thread Mark Adams via petsc-dev
On Mon, Jul 29, 2019 at 11:27 PM Smith, Barry F.  wrote:

>
>   Thanks. Could you please send the 24 processors with the GPU?
>

That is in  out_cuda_24


>    Note the final column of the table gives you the percentage of flops
> (not rates, actual operations) on the GPU. For your biggest run it is
>
>    For the MatMult it is 18 percent and for the KSP solve it is 23 percent. I
> think this is much too low; we'd like to see well over 90 percent of the
> flops on the GPU, or 95 or more. Is this because you are forced to put very
> large matrices only on the CPU?
>

Hmm, that is strange. BLAS1 stuff is 100% GPU but the coarse grids are on
the CPU. This could be because it is > 99.5%. And there is this in the last
solve phase:

MatMult  679 1.0 5.2220e+00 1.2 7.58e+09 1.3 8.0e+07 1.1e+04
0.0e+00  1 39 14  8  0   3 74 79 60  0 16438647   438720307578 1.99e+02
 519 2.55e+02 18
MatMultAdd   150 1.0 1.1836e+00 4.7 3.41e+08 1.2 1.0e+07 1.8e+03
0.0e+00  0  2  2  0  0   1  3 10  1  0 3409019   191195194120 2.48e+01
  60 2.25e+00 21
MatMultTranspose 150 1.0 5.7940e-01 2.4 3.37e+08 1.2 1.0e+07 1.8e+03
0.0e+00  0  2  2  0  0   0  3 10  1  0 6867795   2539317196 38 1.02e+02
 150 3.22e+00 92

I have added print statements to MatMult_[CUDA,CPU] and it looks fine. Well
over 90% should be on the GPU. I am puzzled. I'll keep digging but the log
statements look OK.


>    For the MatMult, if we assume the GPU flop rate is 25 times that of the
> CPU and 18 percent of the flops are done on the GPU, then the time for the
> GPU run should be 82.7 percent of the CPU time (0.82 + 0.18/25 = 0.827),
> but it is 0.90; so where is the extra time? That seems like too much to
> attribute to the communication alone.
>

I don't follow this analysis, but there is something funny about the
logging ...


>
>    There is so much information and so much happening in the final stage
> that it is hard to discern what is killing the performance in the GPU case
> for the KSP solve. Is there any way you can just have a stage at the end
> with several KSP solves and nothing else?
>

I added this, e.g.:
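A minimal sketch of how such a stage can be set up around the solves (the
function name and loop count here are illustrative, and the ksp, b, x objects
are assumed to already exist - this is not necessarily the exact code used for
the run below):

#include <petscksp.h>

/* Run several KSPSolve()s inside their own logging stage so that -log_view
   reports them separately from setup and everything else. */
static PetscErrorCode SolveOnlyStage(KSP ksp, Vec b, Vec x)
{
  PetscLogStage  stage;
  PetscInt       i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscLogStageRegister("KSP only", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  for (i = 0; i < 10; i++) {   /* several solves, nothing else */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  }
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  PetscFunctionReturn(0);
}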

--- Event Stage 7: KSP only

SFBcastOpBegin   263 1.0 8.4140e-03 2.7 0.00e+00 0.0 6.1e+04 2.5e+03
0.0e+00  0  0 15  7  0   1  0 91 98  0 0   0  0 0.00e+000
0.00e+00  0
SFBcastOpEnd 263 1.0 6.6676e-02 6.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   8  0  0  0  0 0   0  0 0.00e+000
0.00e+00  0
SFReduceBegin 48 1.0 4.5977e-04 2.1 0.00e+00 0.0 6.4e+03 6.0e+02
0.0e+00  0  0  2  0  0   0  0  9  2  0 0   0  0 0.00e+000
0.00e+00  0
SFReduceEnd   48 1.0 5.4065e-0321.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
0.00e+00  0
MatMult  215 1.0 3.9271e-01 1.0 6.33e+08 1.4 5.5e+04 2.7e+03
0.0e+00  1 24 14  7  0  83 89 81 95  0 33405   177859430 1.75e+01  358
2.23e+01 17
MatMultAdd48 1.0 3.3079e-02 1.3 3.20e+07 1.3 6.4e+03 6.0e+02
0.0e+00  0  1  2  0  0   7  5  9  2  0 20318   106989 48 2.33e+00   48
2.24e-01 20
MatMultTranspose  48 1.0 1.1967e-02 1.8 3.15e+07 1.3 6.4e+03 6.0e+02
0.0e+00  0  1  2  0  0   2  4  9  2  0 55325   781863  0 0.00e+00   72
3.23e-01 93
MatSolve  24 0.0 3.6270e-03 0.0 1.02e+07 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0  2810   0  0 0.00e+000
0.00e+00  0
MatResidual   48 1.0 8.2272e-02 1.0 1.33e+08 1.4 1.2e+04 2.6e+03
0.0e+00  0  5  3  1  0  17 19 18 20  0 33284   136803 96 3.62e+00   72
4.50e+00 19
VecTDot   46 1.0 6.1646e-03 1.3 1.13e+06 1.2 0.0e+00 0.0e+00
4.6e+01  0  0  0  0  2   1  0  0  0 66  41096814  0 0.00e+000
0.00e+00 100
VecNorm   24 1.0 5.2724e-03 1.9 5.90e+05 1.2 0.0e+00 0.0e+00
2.4e+01  0  0  0  0  1   1  0  0  0 34  25075050  0 0.00e+000
0.00e+00 100
VecCopy  146 1.0 3.9029e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   1  0  0  0  0 0   0  0 0.00e+00   24
9.87e-02  0
VecSet   169 1.0 1.3301e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
0.00e+00  0
VecAXPY   46 1.0 1.5963e-03 1.2 1.13e+06 1.2 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 15870   23070  0 0.00e+000
0.00e+00 100
VecAYPX  310 1.0 1.3059e-02 1.1 4.25e+06 1.2 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   3  1  0  0  0  7273   12000 48 1.97e-010
0.00e+00 100
VecAXPBYCZ96 1.0 6.8591e-03 1.2 6.19e+06 1.2 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   1  1  0  0  0 20134   46381  0 0.00e+000
0.00e+00 100
VecPointwiseMult 192 1.0 7.1075e-03 1.2 1.24e+06 1.2 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   1  0  0  0  0  38864184 24 9.87e-020
0.00e+00 100
VecScatterBegin  311 1.0 1.1026e-02 2.0 0.00e+00 0.0 6.8e+04 2.3e+03
0.0e+00  0  0 17  7  0   2  0100100  0 0   0  0 0.00e+00   72
3.50e-

[petsc-dev] DMDAGlobalToNatural errors with Ubuntu:latest; gcc 7 & Open MPI 2.1.1

2019-07-30 Thread Fabian.Jakub via petsc-dev
Dear Petsc Team,
Our cluster recently switched to Ubuntu 18.04, which has gcc 7.4 and
Open MPI 2.1.1 - with this I ended up with segfaults and valgrind
errors in DMDAGlobalToNatural.

This is evident in a minimal Fortran example such as the attached
petsc_ex.F90, which gives the following error:

==22616== Conditional jump or move depends on uninitialised value(s)
==22616==at 0x4FA5CDB: PetscTrMallocDefault (mtr.c:185)
==22616==by 0x4FA4DAC: PetscMallocA (mal.c:413)
==22616==by 0x5090E94: VecScatterSetUp_SF (vscatsf.c:652)
==22616==by 0x50A1104: VecScatterSetUp (vscatfce.c:209)
==22616==by 0x509EE3B: VecScatterCreate (vscreate.c:280)
==22616==by 0x577B48B: DMDAGlobalToNatural_Create (dagtol.c:108)
==22616==by 0x577BB6D: DMDAGlobalToNaturalBegin (dagtol.c:155)
==22616==by 0x5798446: VecView_MPI_DA (gr2.c:720)
==22616==by 0x51BC7D8: VecView (vector.c:574)
==22616==by 0x4F4ECA1: PetscObjectView (destroy.c:90)
==22616==by 0x4F4F05E: PetscObjectViewFromOptions (destroy.c:126)

and consequently wrong results in the natural vec


I went over the Fortran example to check whether I had forgotten something,
but I can also see the same error, i.e. it is not valgrind clean, in pure C
PETSc:

cd $PETSC_DIR/src/dm/examples/tests && make ex14 && mpirun
--allow-run-as-root -np 2 valgrind ./ex14

I then tried various docker/podman Linux distributions to make sure that
my setup is clean, and it seems to me that this error is confined to the
particular gcc 7.4 and Open MPI 2.1.1 from the ubuntu:latest repo.

I tried other images from dockerhub, including:

gcc:7.4.0 :: I could install neither openmpi nor mpich through apt,
however it works with --download-openmpi and --download-mpich

ubuntu:rolling (19.04) <-- works

debian:latest & :stable <-- works

ubuntu:latest (18.04) <-- fails with openmpi, but works with mpich
or with petsc configure --download-openmpi or --download-mpich


Is this error with Open MPI 2.1.1 a known issue? In the meantime, I
guess I'll go with a custom MPI install, but given that ubuntu:latest is
widely used, do you think there is an easy solution to the error?

I guess you are not eager to delve into an issue with old MPI versions,
but in case you find some spare time, maybe you can find the root cause
and/or a workaround.

Many thanks,
Fabian
include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules

run:: petsc_ex
	mpirun -np 9 valgrind ./petsc_ex -show_gVec

petsc_ex:: petsc_ex.F90
	${PETSC_FCOMPILE} -c petsc_ex.F90 -o petsc_ex.o
	${FLINKER} petsc_ex.o -o petsc_ex ${PETSC_LIB}

clean::
	rm -rf *.o petsc_ex
program main
#include "petsc/finclude/petsc.h"

  use petsc
  implicit none

  PetscInt, parameter :: Ndof=1, stencil_size=1
  PetscInt, parameter :: Nx=3, Ny=3
  PetscErrorCode :: myid, commsize, ierr
  PetscScalar, pointer :: xv1d(:)

  type(tDM) :: da
  type(tVec) :: gVec!, naturalVec


  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
  call mpi_comm_rank(PETSC_COMM_WORLD, myid, ierr)
  call mpi_comm_size(PETSC_COMM_WORLD, commsize, ierr)

  call DMDACreate2d(PETSC_COMM_WORLD, &
DM_BOUNDARY_PERIODIC, DM_BOUNDARY_PERIODIC, &
DMDA_STENCIL_STAR, &
Nx, Ny, PETSC_DECIDE, PETSC_DECIDE, Ndof, stencil_size, &
PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, da, ierr)
  call DMSetup(da, ierr)
  call DMSetFromOptions(da, ierr)

  call DMCreateGlobalVector(da, gVec, ierr)
  call VecGetArrayF90(gVec, xv1d, ierr)
  xv1d(:) = real(myid, kind(xv1d))
  print *,myid, 'xv1d', xv1d, ':', xv1d
  call VecRestoreArrayF90(gVec, xv1d, ierr)

  call PetscObjectViewFromOptions(gVec, PETSC_NULL_VEC, "-show_gVec", ierr)

  call VecDestroy(gVec, ierr)
  call DMDestroy(da, ierr)
  call PetscFinalize(ierr)
end program
# Dockerfile to reproduce valgrind errors
# in the PETSc Example src/dm/examples/tests/ex14
# with the Ubuntu:latest (18.04) openmpi (2.1.1)
#
# invoked via: podman build -f Dockerfile.ubuntu_latest.test_petsc -t test_petsc_ex14_ubuntu_latest

FROM ubuntu:latest
#FROM ubuntu:rolling

RUN apt-get update && \
  apt-get install -fy cmake gfortran git libopenblas-dev libopenmpi-dev openmpi-bin python valgrind && \
  apt-get autoremove && apt-get clean

RUN cd $HOME && \
  echo "export PETSC_DIR=$HOME/petsc" >> $HOME/.profile && \
  echo "export PETSC_ARCH=debug" >> $HOME/.profile && \
  . $HOME/.profile && \
  git clone --depth=1 https://bitbucket.org/petsc/petsc -b master $PETSC_DIR && \
  cd $PETSC_DIR && \
  ./configure --with-cc=$(which mpicc) --with-fortran-bindings=0 --with-fc=0 && \
  make

RUN cd $HOME && . $HOME/.profile && \
  cd $PETSC_DIR/src/dm/examples/tests && make ex14 && mpirun --allow-run-as-root -np 2 valgrind ./ex14


Re: [petsc-dev] DMDAGlobalToNatural errors with Ubuntu:latest; gcc 7 & Open MPI 2.1.1

2019-07-30 Thread Balay, Satish via petsc-dev
We've seen such behavior with the Ubuntu default OpenMPI - but we have no
idea why this happens or whether we can work around it.

Last I checked, the same version of OpenMPI, when installed
separately, did not exhibit such issues.

Satish

On Tue, 30 Jul 2019, Fabian.Jakub via petsc-dev wrote:

> Dear Petsc Team,
> Our cluster recently switched to Ubuntu 18.04, which has gcc 7.4 and
> Open MPI 2.1.1 - with this I ended up with segfaults and valgrind
> errors in DMDAGlobalToNatural.
> [...]



Re: [petsc-dev] MatPinToCPU

2019-07-30 Thread Smith, Barry F. via petsc-dev


  Sorry, I meant 24 CPU only


> On Jul 30, 2019, at 9:19 AM, Mark Adams  wrote:
> 
> 
> 
> On Mon, Jul 29, 2019 at 11:27 PM Smith, Barry F.  wrote:
> 
>   Thanks. Could you please send the 24 processors with the GPU? 
> 
> That is in  out_cuda_24
> [...]

Re: [petsc-dev] DMDAGlobalToNatural errors with Ubuntu:latest; gcc 7 & Open MPI 2.1.1

2019-07-30 Thread Smith, Barry F. via petsc-dev


  Satish,

  Can you please add a check for this to MPI.py and simply reject it, telling
the user that there are bugs in that version of Open MPI on Ubuntu?

  It is not debuggable, and hence not fixable; it wastes everyone's time and
could even lead to wrong results (which is worse than crashing). We've had
multiple reports of this.
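(For illustration only - one way a configure-time check could detect the
problematic release is to compile a tiny probe against mpi.h and reject it.
This is not the actual MPI.py logic, just the idea sketched in C using Open
MPI's version macros:)

#include <mpi.h>
#include <stdio.h>

int main(void)
{
  /* Open MPI exposes its version through these macros in mpi.h. */
#if defined(OMPI_MAJOR_VERSION) && OMPI_MAJOR_VERSION == 2 && OMPI_MINOR_VERSION == 1 && OMPI_RELEASE_VERSION == 1
  /* Open MPI 2.1.1 (as shipped with Ubuntu 18.04) is the version reported broken here. */
  printf("reject: Open MPI %d.%d.%d has known problems; use --download-openmpi or --download-mpich\n",
         OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
  return 1;
#else
  return 0;
#endif
}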

  Barry


> On Jul 30, 2019, at 10:17 AM, Balay, Satish via petsc-dev wrote:
> 
> We've seen such behavior with ubuntu default OpenMPI - but have no
> idea why this happens or if we can work around it.
> 
> Last I checked - the same version of openmpi - when installed
> separately did not exhibit such issues..
> 
> Satish
> 
> On Tue, 30 Jul 2019, Fabian.Jakub via petsc-dev wrote:
> 
>> Dear Petsc Team,
>> [...]
> 



Re: [petsc-dev] DMDAGlobalToNatural errors with Ubuntu:latest; gcc 7 & Open MPI 2.1.1

2019-07-30 Thread Zhang, Junchao via petsc-dev
Fabian,
  I happen to have an Ubuntu virtual machine and I could reproduce the error
with your mini-test, even with two processes. It is horrible to see wrong
results in such a simple test.
  We'd better figure out whether it is a PETSc bug or an OpenMPI bug, and if
it is the latter, which MPI call is at fault.

--Junchao Zhang


On Tue, Jul 30, 2019 at 9:47 AM Fabian.Jakub via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
Dear Petsc Team,
Our cluster recently switched to Ubuntu 18.04, which has gcc 7.4 and
Open MPI 2.1.1 - with this I ended up with segfaults and valgrind
errors in DMDAGlobalToNatural.
[...]


Re: [petsc-dev] [petsc-users] MatMultTranspose memory usage

2019-07-30 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-users"  writes:

>    The reason this worked for 4 processes is that the largest count in that
> case was roughly 6,653,750,976/4, which does fit into an int. PETSc only needs
> to know the number of nonzeros on each process; it doesn't need to know the
> total across all the processes. In other words, you may want to use a
> different PETSC_ARCH (different configuration) for small numbers of processors
> and for large numbers, depending on how large your problem is. Or you can
> always use 64-bit integers at a small performance and memory cost.

We could consider always using 64-bit ints for quantities like row
starts, keeping column indices (the "heavy" part) in 32-bit.  This may
become a more frequent issue with fatter nodes and many GPUs potentially
being driven by a single MPI rank.
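(To make the numbers above concrete: 6,653,750,976 nonzeros overflows a 32-bit
int, whose maximum is 2,147,483,647, while 6,653,750,976/4 = 1,663,437,744 does
not. Using 64-bit integers everywhere corresponds to configuring PETSc with
--with-64-bit-indices; Jed's suggestion is a narrower middle ground. A sketch
of what such a mixed-width layout could look like follows - this is only an
illustration of the idea, not PETSc's actual Mat data structure, and the type
name is made up:)

#include <stdint.h>

/* Hypothetical mixed-width CSR storage: 64-bit row starts let the number of
   nonzeros on a single rank exceed 2^31-1, while the per-entry column indices
   (the bulk of the memory) stay 32-bit. */
typedef struct {
  int64_t  nrows;     /* number of local rows */
  int64_t *rowstart;  /* length nrows+1; rowstart[nrows] = local nonzeros, may exceed INT32_MAX */
  int32_t *colidx;    /* length rowstart[nrows]; column indices must still fit in 32 bits */
  double  *val;       /* length rowstart[nrows]; matrix entries */
} CsrRow64Col32;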


Re: [petsc-dev] DMDAGlobalToNatural errors with Ubuntu:latest; gcc 7 & Open MPI 2.1.1

2019-07-30 Thread Smith, Barry F. via petsc-dev


 Note in init.c that, by default, PETSc does not use PetscTrMallocDefault()
when valgrind is running, because it doesn't necessarily make sense to put one
memory checker on top of another. So, at a glance, I'm puzzled how it can be in
the routine PetscTrMallocDefault(). Do you perhaps have -malloc_debug in a
.petscrc file or in the environment variable PETSC_OPTIONS? Anyway, there is a
problem, but perhaps this is a hint as to where it is coming from?
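(For illustration, a tiny standalone probe - not code from the report - that
prints whether -malloc_debug is present in PETSc's options database, which
also picks up a ~/.petscrc file and the PETSC_OPTIONS environment variable:)

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscBool      set = PETSC_FALSE;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* True if -malloc_debug was given on the command line, in ~/.petscrc,
     or through the PETSC_OPTIONS environment variable. */
  ierr = PetscOptionsHasName(NULL, NULL, "-malloc_debug", &set);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "-malloc_debug in options database: %s\n",
                     set ? "yes" : "no");CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}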

 Barry


> On Jul 30, 2019, at 5:38 PM, Zhang, Junchao via petsc-dev wrote:
> 
> Fabian,
>  I happen to have an Ubuntu virtual machine and I could reproduce the error
> with your mini-test, even with two processes. It is horrible to see wrong
> results in such a simple test.
>  We'd better figure out whether it is a PETSc bug or an OpenMPI bug, and if
> it is the latter, which MPI call is at fault.
> 
> --Junchao Zhang
> 
> 
> On Tue, Jul 30, 2019 at 9:47 AM Fabian.Jakub via petsc-dev wrote:
> Dear Petsc Team,
> Our cluster recently switched to Ubuntu 18.04, which has gcc 7.4 and
> Open MPI 2.1.1 - with this I ended up with segfaults and valgrind
> errors in DMDAGlobalToNatural.
> [...]