I was thinking of catching the overflow BEFORE it hits the malloc.  For 
example in DMSetUp_DA_3D() we have the unattractive code

#if !defined(PETSC_USE_64BIT_INDICES)
  if (((Petsc64bitInt) M)*((Petsc64bitInt) N)*((Petsc64bitInt) P)*((Petsc64bitInt) dof) > (Petsc64bitInt) PETSC_MPI_INT_MAX) SETERRQ3(comm,PETSC_ERR_INT_OVERFLOW,"Mesh of %D by %D by %D (dof) is too large for 32 bit indices",M,N,dof);
#endif

  Maybe we could introduce some macros like

#if !defined(PETSC_USE_64BIT_INDICES)
#define PetscCheckOverflow2(M,N) \
  if (((Petsc64bitInt) (M))*((Petsc64bitInt) (N)) > (Petsc64bitInt) PETSC_MPI_INT_MAX) \
    SETERRQ2(comm,PETSC_ERR_INT_OVERFLOW,"Product of %D by %D is too large for 32 bit indices",M,N);
#else
#define PetscCheckOverflow2(a,b)
#endif

and then check large products when they are likely to occur.

  Barry



On Jun 26, 2014, at 3:25 PM, Karl Rupp <[email protected]> wrote:

> Hey,
> 
> > It would be nice if we could automatically detect this issue more often.
> 
> Indeed.
> 
> We have
> 
> #define PetscMalloc(a,b)  ((a != 0) ? do_something : (*(b) = 0,0) )
> 
> For our typical internal use cases, 'a' is of type PetscInt, so we could 
> check for (a < 0) and throw a better error message. On the other hand, 
> assuming 'a' to be signed might result in tautology warnings for unsigned 
> integers in user code, so it may be cleaner to hide this additional check in 
> something like "PetscSignedMalloc()" and only use it for the typical internal 
> routines where the argument to PetscMalloc() is a product of potentially 
> large values. A quick grep on the source tree only shows a handful of cases 
> where this is indeed the case (typically just matrices), allowing us to 
> capture most of these problems. Is this a viable approach?
> 
> Best regards,
> Karli
> 
> 
> 
> >
>> Begin forwarded message:
>> 
>>> *From: *Mathis Friesdorf <[email protected]>
>>> *Subject: **Re: [petsc-users] Unexpected "Out of memory error" with SLEPC*
>>> *Date: *June 26, 2014 at 5:08:23 AM CDT
>>> *To: *Karl Rupp <[email protected]>
>>> *Cc: *<[email protected]>, <[email protected]>
>>> 
>>> Dear Karl,
>>> 
>>> thanks a lot! This indeed worked. I recompiled PETSC with
>>> 
>>> ./configure --with-64-bit-indices=1
>>> PETSC_ARCH = arch-linux2-c-debug-int64
>>> 
>>> and the error is gone. Again thanks for your help!
>>> 
>>> All the best, Mathis
>>> 
>>> 
>>> On Wed, Jun 25, 2014 at 12:42 PM, Karl Rupp <[email protected]> wrote:
>>> 
>>>    Hi Mathis,
>>> 
>>>    this looks very much like an integer overflow:
>>>    
>>>    http://www.mcs.anl.gov/petsc/documentation/faq.html#with-64-bit-indices
>>> 
>>>    Best regards,
>>>    Karli
>>> 
>>> 
>>>    On 06/25/2014 12:31 PM, Mathis Friesdorf wrote:
>>> 
>>>        Dear all,
>>> 
>>>        after a very useful email exchange with Jed Brown quite a while
>>>        ago, I was able to find the lowest eigenvalue of a large matrix
>>>        which is constructed as a tensor product. Admittedly the solution
>>>        is a bit hacked, but it is based on a matrix shell and Armadillo
>>>        and is therefore reasonably fast. The program works well for
>>>        smaller systems, but once the vectors reach a certain size, I get
>>>        "out of memory" errors. I have tested the initialization of a
>>>        vector of that size and multiplication by the matrix. This works
>>>        fine and takes roughly 20 GB of memory. There are 256 GB
>>>        available, so I see no reason why the EPS solvers should
>>>        complain. Does anyone have an idea what goes wrong here? The
>>>        error message is not very helpful and claims that an amount of
>>>        memory is requested that is way beyond any reasonable number:
>>>        "Memory requested 18446744056529684480."
>>> 
>>> 
>>>        Thanks and all the best, Mathis Friesdorf
>>> 
>>> 
>>>        *Output of the Program:*
>>>        mathis@n180:~/localisation$ ./local_plus 27
>>>        System Size: 27
>>> 
>>>        
>>> --------------------------------------------------------------------------
>>>        [[30558,1],0]: A high-performance Open MPI point-to-point
>>>        messaging module
>>>        was unable to find any relevant network interfaces:
>>> 
>>>        Module: OpenFabrics (openib)
>>>           Host: n180
>>> 
>>>        Another transport will be used instead, although this may
>>>        result in
>>>        lower performance.
>>>        
>>> --------------------------------------------------------------------------
>>>        [0]PETSC ERROR: --------------------- Error Message
>>>        ------------------------------------
>>>        [0]PETSC ERROR: Out of memory. This could be due to allocating
>>>        [0]PETSC ERROR: too large an object or bleeding by not properly
>>>        [0]PETSC ERROR: destroying unneeded objects.
>>>        [0]PETSC ERROR: Memory allocated 3221286704 Memory used by process
>>>        3229827072
>>>        [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log
>>>        for info.
>>>        [0]PETSC ERROR: Memory requested 18446744056529684480!
>>>        [0]PETSC ERROR:
>>>        
>>> ------------------------------------------------------------------------
>>>        [0]PETSC ERROR: Petsc Release Version 3.4.4, Mar, 13, 2014
>>>        [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>>>        [0]PETSC ERROR: See docs/faq.html for hints about trouble
>>>        shooting.
>>>        [0]PETSC ERROR: See docs/index.html for manual pages.
>>>        [0]PETSC ERROR:
>>>        
>>> ------------------------------------------------------------------------
>>>        [0]PETSC ERROR: ./local_plus on a arch-linux2-cxx-debug named
>>>        n180 by
>>>        mathis Wed Jun 25 12:23:01 2014
>>>        [0]PETSC ERROR: Libraries linked from /home/mathis/bin_nodebug/lib
>>>        [0]PETSC ERROR: Configure run at Wed Jun 25 00:03:34 2014
>>>        [0]PETSC ERROR: Configure options
>>>        PETSC_DIR=/home/mathis/petsc-3.4.4
>>>        --with-debugging=1 COPTFLAGS="-O3 -march=p4 -mtune=p4"
>>>        --with-fortran=0
>>>        -with-mpi=1 --with-mpi-dir=/usr/lib/openmpi --with-clanguage=cxx
>>>        --prefix=/home/mathis/bin_nodebug
>>>        [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>>        [0]PETSC ERROR: PetscMallocAlign() line 46 in
>>>        /home/mathis/petsc-3.4.4/src/sys/memory/mal.c
>>>        [0]PETSC ERROR: PetscTrMallocDefault() line 189 in
>>>        /home/mathis/petsc-3.4.4/src/sys/memory/mtr.c
>>>        [0]PETSC ERROR: VecDuplicateVecs_Contiguous() line 62 in
>>>        src/vec/contiguous.c
>>>        [0]PETSC ERROR: VecDuplicateVecs() line 589 in
>>>        /home/mathis/petsc-3.4.4/src/vec/vec/interface/vector.c
>>>        [0]PETSC ERROR: EPSAllocateSolution() line 51 in
>>>        src/eps/interface/mem.c
>>>        [0]PETSC ERROR: EPSSetUp_KrylovSchur() line 141 in
>>>        src/eps/impls/krylov/krylovschur/krylovschur.c
>>>        [0]PETSC ERROR: EPSSetUp() line 147 in src/eps/interface/setup.c
>>>        [0]PETSC ERROR: EPSSolve() line 90 in src/eps/interface/solve.c
>>>        [0]PETSC ERROR: main() line 48 in local_plus.cpp
>>>        
>>> --------------------------------------------------------------------------
>>>        MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>        with errorcode 55.
>>> 
>>>        NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI
>>>        processes.
>>>        You may or may not see output from other processes, depending on
>>>        exactly when Open MPI kills them.
>>>        
>>> --------------------------------------------------------------------------
>>> 
>>>        *Output of make:*
>>>        mathis@n180:~/localisation$ make local_plus
>>> 
>>>        mpicxx -o local_plus.o -c -Wall -Wwrite-strings
>>>        -Wno-strict-aliasing
>>>        -Wno-unknown-pragmas -g   -fPIC
>>>        -I/home/mathis/armadillo-4.300.8/include -lblas -llapack
>>>        -L/home/mathis/armadillo-4.300.8 -O3 -larmadillo
>>>        -fomit-frame-pointer
>>>        -I/home/mathis/bin_nodebug/include
>>>        -I/home/mathis/bin_nodebug/include
>>>        -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
>>>          -D__INSDIR__= -I/home/mathis/bin_nodebug
>>>        -I/home/mathis/bin_nodebug//include
>>>        -I/home/mathis/bin_nodebug/include
>>>        local_plus.cpp
>>>        local_plus.cpp:22:0: warning: "__FUNCT__" redefined [enabled
>>>        by default]
>>>        In file included from
>>>        /home/mathis/bin_nodebug/include/petscvec.h:10:0,
>>>                          from local_plus.cpp:10:
>>>        /home/mathis/bin_nodebug/include/petscviewer.h:386:0: note:
>>>        this is the
>>>        location of the previous definition
>>>        g++ -o local_plus local_plus.o
>>>        -Wl,-rpath,/home/mathis/bin_nodebug//lib
>>>        -L/home/mathis/bin_nodebug//lib -lslepc
>>>        -Wl,-rpath,/home/mathis/bin_nodebug/lib
>>>        -L/home/mathis/bin_nodebug/lib
>>>          -lpetsc -llapack -lblas -lpthread
>>>        -Wl,-rpath,/usr/lib/openmpi/lib
>>>        -L/usr/lib/openmpi/lib
>>>        -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.7
>>>        -L/usr/lib/gcc/x86_64-linux-gnu/4.7
>>>        -Wl,-rpath,/usr/lib/x86_64-linux-gnu
>>>        -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
>>>        -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte
>>>        -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
>>>        /bin/rm -f local_plus.o
>>>        *Code:*
>>>        //Author: Mathis Friesdorf
>>>        //[email protected]
>>> 
>>>        static char help[] = "1D chain\n";
>>> 
>>>        #include <iostream>
>>>        #include <typeinfo>
>>>        #include <armadillo>
>>>        #include <petscsys.h>
>>>        #include <petscvec.h>
>>>        #include <petscmat.h>
>>>        #include <slepcsys.h>
>>>        #include <slepceps.h>
>>>        #include <math.h>
>>>        #include <assert.h>
>>> 
>>>        PetscErrorCode BathMult(Mat H, Vec x, Vec y);
>>>        PetscInt       L=30,d=2,dsys;
>>>        PetscErrorCode ierr;
>>>        arma::mat hint = "1.0 0 0 0.0; 0 -1.0 2.0 0; 0 2.0 -1.0 0; 0 0
>>>        0 1.0;";
>>> 
>>>        #define __FUNCT__ "main"
>>>        int main(int argc, char **argv)
>>>        {
>>>           Mat H;
>>>           EPS eps;
>>>           Vec xr,xi;
>>>           PetscScalar kr,ki;
>>>           PetscInt j, nconv;
>>> 
>>>           L = strtol(argv[1],NULL,10);
>>>           dsys = pow(d,L);
>>>           printf("%s","System Size: ");
>>>           printf("%i",L);
>>>           printf("%s","\n");
>>>           SlepcInitialize(&argc,&argv,(char*)0,help);
>>> 
>>>           MatCreateShell(PETSC_COMM_WORLD,dsys,dsys,dsys,dsys,NULL,&H);
>>>           MatShellSetOperation(H,MATOP_MULT,(void(*)())BathMult);
>>>           ierr = MatGetVecs(H,NULL,&xr); CHKERRQ(ierr);
>>>           ierr = MatGetVecs(H,NULL,&xi); CHKERRQ(ierr);
>>> 
>>>           ierr = EPSCreate(PETSC_COMM_WORLD, &eps); CHKERRQ(ierr);
>>>           ierr = EPSSetOperators(eps, H, NULL); CHKERRQ(ierr);
>>>           ierr = EPSSetProblemType(eps, EPS_HEP); CHKERRQ(ierr);
>>>           ierr = EPSSetWhichEigenpairs(eps,EPS_SMALLEST_REAL);
>>>        CHKERRQ(ierr);
>>>           ierr = EPSSetFromOptions( eps ); CHKERRQ(ierr);
>>>           ierr = EPSSolve(eps); CHKERRQ(ierr);
>>>           ierr = EPSGetConverged(eps, &nconv); CHKERRQ(ierr);
>>>           for (j=0; j<1; j++) {
>>>             EPSGetEigenpair(eps, j, &kr, &ki, xr, xi);
>>>             printf("%s","Lowest Eigenvalue: ");
>>>             PetscPrintf(PETSC_COMM_WORLD,"%9F",kr);
>>>             PetscPrintf(PETSC_COMM_WORLD,"\n");
>>>           }
>>>           EPSDestroy(&eps);
>>> 
>>>           ierr = SlepcFinalize();
>>>           return 0;
>>>        }
>>>        #undef __FUNCT__
>>> 
>>>        #define __FUNCT__ "BathMult"
>>>        PetscErrorCode BathMult(Mat H, Vec x, Vec y)
>>>        {
>>>           PetscInt l;
>>>           uint slice;
>>>           PetscScalar *arrayin,*arrayout;
>>> 
>>>           VecGetArray(x,&arrayin);
>>>           VecGetArray(y,&arrayout);
>>>           arma::cube A = arma::cube(arrayin,1,1,pow(d,L),
>>>               /*copy_aux_mem*/false,/*strict*/true);
>>>           arma::mat result = arma::mat(arrayout,pow(d,L),1,
>>>               /*copy_aux_mem*/false,/*strict*/true);
>>>           for (l=0;l<L-1;l++){
>>>             A.reshape(pow(d,L-2-l),pow(d,2),pow(d,l));
>>>             result.reshape(pow(d,L-l),pow(d,l));
>>>             for (slice=0;slice<A.n_slices;slice++){
>>>               result.col(slice) += vectorise(A.slice(slice)*hint);
>>>             }
>>>           }
>>>           arrayin = A.memptr();
>>>           ierr = VecRestoreArray(x,&arrayin); CHKERRQ(ierr);
>>>           arrayout = result.memptr();
>>>           ierr = VecRestoreArray(y,&arrayout); CHKERRQ(ierr);
>>>           PetscFunctionReturn(0);
>>>        }
>>>        #undef __FUNCT__
