Re: [petsc-dev] Is master broken?

2019-07-31 Thread Smith, Barry F. via petsc-dev


  It is generated automatically and put in
arch-linux2-c-debug/include/petscpkg_version.h. This include file is included
at the top of the "bad" source file that crashes, so in theory everything is
in order. Check that arch-linux2-c-debug/include/petscpkg_version.h contains
PETSC_PKG_CUDA_VERSION_GE and similar macros; if it is empty or broken, send
configure.log.
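
For reference, the generated header defines the detected CUDA version and
comparison macros roughly along these lines (a sketch only; the exact
component names, macro layout, and version numbers shown are assumptions,
since the real file is written by configure):

#define PETSC_PKG_CUDA_VERSION_MAJOR    10
#define PETSC_PKG_CUDA_VERSION_MINOR    1
#define PETSC_PKG_CUDA_VERSION_SUBMINOR 105

/* true when the CUDA version detected at configure time is >= (MAJOR,MINOR,SUBMINOR) */
#define PETSC_PKG_CUDA_VERSION_GE(MAJOR,MINOR,SUBMINOR) \
  ((PETSC_PKG_CUDA_VERSION_MAJOR > (MAJOR)) || \
   (PETSC_PKG_CUDA_VERSION_MAJOR == (MAJOR) && \
    (PETSC_PKG_CUDA_VERSION_MINOR > (MINOR) || \
     (PETSC_PKG_CUDA_VERSION_MINOR == (MINOR) && \
      PETSC_PKG_CUDA_VERSION_SUBMINOR >= (SUBMINOR)))))

If the header is empty or never generated, PETSC_PKG_CUDA_VERSION_GE(10,1,0)
is an undefined function-like macro inside #if, which would produce exactly
the "function call is not allowed in a constant expression" error below.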


  Barry



> On Jul 31, 2019, at 9:28 PM, Mark Adams via petsc-dev wrote:
> 
> I am seeing this when I pull master into my branch:
> 
> "/autofs/nccs-svm1_home1/adams/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu"
>   , line 243: error: function call is not allowed in a constant
>   expression
>   #if PETSC_PKG_CUDA_VERSION_GE(10,1,0)
> 
> and I see that this macro does not seem to be defined:
> 
> 22:24 master= ~/Codes/petsc$ git grep PETSC_PKG_CUDA_VERSION_GE
> src/mat/impls/dense/seq/cuda/densecuda.cu:#if PETSC_PKG_CUDA_VERSION_GE(10,1,0)



[petsc-dev] Is master broken?

2019-07-31 Thread Mark Adams via petsc-dev
I am seeing this when I pull master into my branch:

"/autofs/nccs-svm1_home1/adams/petsc/src/mat/impls/dense/seq/cuda/
densecuda.cu"
  , line 243: error: function call is not allowed in a constant
  expression
  #if PETSC_PKG_CUDA_VERSION_GE(10,1,0)

and I see that this macro does not seem to be defined:

22:24 master= ~/Codes/petsc$ git grep PETSC_PKG_CUDA_VERSION_GE
src/mat/impls/dense/seq/cuda/densecuda.cu:#if PETSC_PKG_CUDA_VERSION_GE(10,1,0)


Re: [petsc-dev] DMDAGlobalToNatural errors with Ubuntu:latest; gcc 7 & Open MPI 2.1.1

2019-07-31 Thread Fabian.Jakub via petsc-dev
Awesome, many thanks for your efforts!

On 7/31/19 9:17 PM, Zhang, Junchao wrote:
> Hi, Fabian,
> I found it is an OpenMPI bug w.r.t. self-to-self MPI_Send/Recv using
> MPI_ANY_SOURCE for message matching: OpenMPI does not put the correct value
> in the recv buffer.
> I have a workaround in branch jczhang/fix-ubuntu-openmpi-anysource.
> I tested it with your petsc_ex.F90 and $PETSC_DIR/src/dm/examples/tests/ex14.
> The majority of the valgrind errors disappeared; the few that remain are in
> ompi_mpi_init and we can ignore them.
> I filed a bug report with OpenMPI
> https://www.mail-archive.com/users@lists.open-mpi.org//msg33383.html and hope
> they can fix it in Ubuntu.
> Thanks.
> 
> --Junchao Zhang
> 
> 
> On Tue, Jul 30, 2019 at 9:47 AM Fabian.Jakub via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
> Dear Petsc Team,
> Our cluster recently switched to Ubuntu 18.04, which has gcc 7.4 and
> Open MPI 2.1.1 - with this I ended up with segfaults and valgrind
> errors in DMDAGlobalToNatural.
> 
> This is evident in a minimal Fortran example such as the attached
> petsc_ex.F90, which produces the following error:
> 
> ==22616== Conditional jump or move depends on uninitialised value(s)
> ==22616==at 0x4FA5CDB: PetscTrMallocDefault (mtr.c:185)
> ==22616==by 0x4FA4DAC: PetscMallocA (mal.c:413)
> ==22616==by 0x5090E94: VecScatterSetUp_SF (vscatsf.c:652)
> ==22616==by 0x50A1104: VecScatterSetUp (vscatfce.c:209)
> ==22616==by 0x509EE3B: VecScatterCreate (vscreate.c:280)
> ==22616==by 0x577B48B: DMDAGlobalToNatural_Create (dagtol.c:108)
> ==22616==by 0x577BB6D: DMDAGlobalToNaturalBegin (dagtol.c:155)
> ==22616==by 0x5798446: VecView_MPI_DA (gr2.c:720)
> ==22616==by 0x51BC7D8: VecView (vector.c:574)
> ==22616==by 0x4F4ECA1: PetscObjectView (destroy.c:90)
> ==22616==by 0x4F4F05E: PetscObjectViewFromOptions (destroy.c:126)
> 
> and consequently wrong results in the natural vec
> 
> 
> I went over the Fortran example to check whether I had forgotten something,
> but I can also see the same error, i.e. not being valgrind clean, in pure C
> PETSc:
> 
> cd $PETSC_DIR/src/dm/examples/tests && make ex14 && mpirun
> --allow-run-as-root -np 2 valgrind ./ex14
> 
> I then tried various docker/podman Linux distributions to make sure that
> my setup is clean, and to me it seems that this error is confined to the
> particular gcc 7.4 and Open MPI 2.1.1 from the ubuntu:latest repo.
> 
> I tried other images from dockerhub, including:
> 
> gcc:7.4.0 :: where I could install neither openmpi nor mpich through
> apt; it does, however, work with --download-openmpi and --download-mpich
> 
> ubuntu:rolling(19.04) <-- works
> 
> debian:latest & :stable <-- works
> 
> ubuntu:latest(18.04) <-- fails in case of openmpi, but works with mpich
> or with petsc-configure --download-openmpi or --download-mpich
> 
> 
> Is this error with Open MPI 2.1.1 a known issue? In the meantime, I
> guess I'll go with a custom MPI install, but given that ubuntu:latest is
> so widespread, do you think there is an easy solution to the error?
> 
> I guess you are not eager to delve into this issue with old MPI versions,
> but in case you find some spare time, maybe you can find the root cause
> and/or a workaround.
> 
> Many thanks,
> Fabian
> 



Re: [petsc-dev] DMDAGlobalToNatural errors with Ubuntu:latest; gcc 7 & Open MPI 2.1.1

2019-07-31 Thread Zhang, Junchao via petsc-dev
Hi, Fabian,
I found it is an OpenMPI bug w.r.t. self-to-self MPI_Send/Recv using
MPI_ANY_SOURCE for message matching: OpenMPI does not put the correct value in
the recv buffer.
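The pattern involved is roughly the following (a minimal standalone sketch for
illustration, not the actual PETSc code path; the tag and payload are
arbitrary):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int         rank, sendval, recvval = -1;
  MPI_Request req;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  sendval = 42 + rank;

  /* each rank sends one int to itself ... */
  MPI_Isend(&sendval, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &req);
  /* ... and receives it with MPI_ANY_SOURCE; with Ubuntu 18.04's Open MPI
     2.1.1 the receive buffer can end up with the wrong value */
  MPI_Recv(&recvval, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  MPI_Wait(&req, MPI_STATUS_IGNORE);

  if (recvval != sendval) printf("[%d] mismatch: sent %d, received %d\n", rank, sendval, recvval);
  MPI_Finalize();
  return 0;
}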
I have a workaround in branch jczhang/fix-ubuntu-openmpi-anysource.
I tested it with your petsc_ex.F90 and $PETSC_DIR/src/dm/examples/tests/ex14.
The majority of the valgrind errors disappeared; the few that remain are in
ompi_mpi_init and we can ignore them.
I filed a bug report with OpenMPI
https://www.mail-archive.com/users@lists.open-mpi.org//msg33383.html and hope
they can fix it in Ubuntu.
Thanks.

--Junchao Zhang


On Tue, Jul 30, 2019 at 9:47 AM Fabian.Jakub via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
Dear Petsc Team,
Our cluster recently switched to Ubuntu 18.04, which has gcc 7.4 and
Open MPI 2.1.1 - with this I ended up with segfaults and valgrind
errors in DMDAGlobalToNatural.

This is evident in a minimal Fortran example such as the attached
petsc_ex.F90, which produces the following error:

==22616== Conditional jump or move depends on uninitialised value(s)
==22616==at 0x4FA5CDB: PetscTrMallocDefault (mtr.c:185)
==22616==by 0x4FA4DAC: PetscMallocA (mal.c:413)
==22616==by 0x5090E94: VecScatterSetUp_SF (vscatsf.c:652)
==22616==by 0x50A1104: VecScatterSetUp (vscatfce.c:209)
==22616==by 0x509EE3B: VecScatterCreate (vscreate.c:280)
==22616==by 0x577B48B: DMDAGlobalToNatural_Create (dagtol.c:108)
==22616==by 0x577BB6D: DMDAGlobalToNaturalBegin (dagtol.c:155)
==22616==by 0x5798446: VecView_MPI_DA (gr2.c:720)
==22616==by 0x51BC7D8: VecView (vector.c:574)
==22616==by 0x4F4ECA1: PetscObjectView (destroy.c:90)
==22616==by 0x4F4F05E: PetscObjectViewFromOptions (destroy.c:126)

and consequently wrong results in the natural vec
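
(The attached petsc_ex.F90 is not reproduced here; a minimal sketch of the
call pattern it exercises, written in C rather than Fortran and with arbitrary
grid sizes, looks roughly like this.)

#include <petscdmda.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  DM             da;
  Vec            global, natural;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* small 2D DMDA distributed over the ranks */
  ierr = DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                      DMDA_STENCIL_STAR, 8, 8, PETSC_DECIDE, PETSC_DECIDE,
                      1, 1, NULL, NULL, &da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);

  ierr = DMCreateGlobalVector(da, &global);CHKERRQ(ierr);
  ierr = VecSet(global, 1.0);CHKERRQ(ierr);
  ierr = DMDACreateNaturalVector(da, &natural);CHKERRQ(ierr);

  /* this is where DMDAGlobalToNatural_Create/VecScatterCreate from the
     backtrace above get invoked */
  ierr = DMDAGlobalToNaturalBegin(da, global, INSERT_VALUES, natural);CHKERRQ(ierr);
  ierr = DMDAGlobalToNaturalEnd(da, global, INSERT_VALUES, natural);CHKERRQ(ierr);
  ierr = VecView(natural, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = VecDestroy(&natural);CHKERRQ(ierr);
  ierr = VecDestroy(&global);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}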


I went over the Fortran example to check whether I had forgotten something,
but I can also see the same error, i.e. not being valgrind clean, in pure C
PETSc:

cd $PETSC_DIR/src/dm/examples/tests && make ex14 && mpirun
--allow-run-as-root -np 2 valgrind ./ex14

I then tried various docker/podman Linux distributions to make sure that
my setup is clean, and to me it seems that this error is confined to the
particular gcc 7.4 and Open MPI 2.1.1 from the ubuntu:latest repo.

I tried other images from dockerhub, including:

gcc:7.4.0 :: where I could install neither openmpi nor mpich through
apt; it does, however, work with --download-openmpi and --download-mpich

ubuntu:rolling(19.04) <-- works

debian:latest & :stable <-- works

ubuntu:latest(18.04) <-- fails in case of openmpi, but works with mpich
or with petsc-configure --download-openmpi or --download-mpich


Is this error with Open MPI 2.1.1 a known issue? In the meantime, I
guess I'll go with a custom MPI install, but given that ubuntu:latest is
so widespread, do you think there is an easy solution to the error?

I guess you are not eager to delve into this issue with old MPI versions,
but in case you find some spare time, maybe you can find the root cause
and/or a workaround.

Many thanks,
Fabian


Re: [petsc-dev] [petsc-users] MatMultTranspose memory usage

2019-07-31 Thread Jed Brown via petsc-dev
https://bitbucket.org/petsc/petsc/issues/333/use-64-bit-indices-for-row-offsets-in

"Smith, Barry F."  writes:

>   Make an issue
>
>
>> On Jul 30, 2019, at 7:00 PM, Jed Brown  wrote:
>> 
>> "Smith, Barry F. via petsc-users"  writes:
>> 
>>>   The reason this worked for 4 processes is that the largest count in that
>>> case was roughly 6,653,750,976/4, which does fit into an int. PETSc only
>>> needs to know the number of nonzeros on each process; it doesn't need to
>>> know the count across all the processes. In other words, you may want to
>>> use a different PETSC_ARCH (different configuration) for a small number of
>>> processors and a large number, depending on how large your problem is. Or
>>> you can always use 64-bit integers at a small performance and memory cost.
>> 
>> We could consider always using 64-bit ints for quantities like row
>> starts, keeping column indices (the "heavy" part) in 32-bit.  This may
>> become a more frequent issue with fatter nodes and many GPUs potentially
>> being driven by a single MPI rank.
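
(For scale, 6,653,750,976/4 = 1,663,437,744, which is below the 32-bit limit
of 2,147,483,647, while the full count is not.) A rough sketch of such a
mixed-width CSR layout, with made-up names for illustration rather than
PETSc's actual Mat internals:

#include <stdint.h>

/* CSR storage with 64-bit row offsets but 32-bit column indices: the row
   offsets can exceed 2^31-1 (total nonzeros on a rank), while each column
   index stays within a 32-bit range, so the "heavy" array stays small. */
typedef struct {
  int64_t  nrows;
  int64_t *rowptr;  /* length nrows+1; rowptr[nrows] = total nonzeros */
  int32_t *colidx;  /* length rowptr[nrows] */
  double  *values;  /* length rowptr[nrows] */
} CSR64_32;

/* y += A*x, iterating over nonzeros with 64-bit counters */
static void csr_multadd(const CSR64_32 *A, const double *x, double *y)
{
  for (int64_t i = 0; i < A->nrows; i++) {
    double sum = 0.0;
    for (int64_t k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
      sum += A->values[k] * x[A->colidx[k]];
    y[i] += sum;
  }
}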


Re: [petsc-dev] [petsc-users] MatMultTranspose memory usage

2019-07-31 Thread Smith, Barry F. via petsc-dev


  Make an issue


> On Jul 30, 2019, at 7:00 PM, Jed Brown  wrote:
> 
> "Smith, Barry F. via petsc-users"  writes:
> 
>>   The reason this worked for 4 processes is that the largest count in that
>> case was roughly 6,653,750,976/4, which does fit into an int. PETSc only
>> needs to know the number of nonzeros on each process; it doesn't need to
>> know the count across all the processes. In other words, you may want to
>> use a different PETSC_ARCH (different configuration) for a small number of
>> processors and a large number, depending on how large your problem is. Or
>> you can always use 64-bit integers at a small performance and memory cost.
> 
> We could consider always using 64-bit ints for quantities like row
> starts, keeping column indices (the "heavy" part) in 32-bit.  This may
> become a more frequent issue with fatter nodes and many GPUs potentially
> being driven by a single MPI rank.