I was thinking of catching the overflow BEFORE it hits the malloc. For example, in DMSetUp_DA_3D() we have the unattractive code:
#if !defined(PETSC_USE_64BIT_INDICES)
  if (((Petsc64bitInt) M)*((Petsc64bitInt) N)*((Petsc64bitInt) P)*((Petsc64bitInt) dof) > (Petsc64bitInt) PETSC_MPI_INT_MAX) SETERRQ3(comm,PETSC_ERR_INT_OVERFLOW,"Mesh of %D by %D by %D (dof) is too large for 32 bit indices",M,N,dof);
#endif

Maybe we could introduce some macros like

#if !defined(PETSC_USE_64BIT_INDICES)
#define PetscCheckOverflow2(M,N) if (((Petsc64bitInt) M)*((Petsc64bitInt) N) > (Petsc64bitInt) PETSC_MPI_INT_MAX) SETERRQ2(comm,PETSC_ERR_INT_OVERFLOW,"Product of %D by %D is too large for 32 bit indices",M,N);
#else
#define PetscCheckOverflow2(a,b)
#endif

and then check large products when they are likely to occur. (Compilable sketches of this macro and of the PetscSignedMalloc() idea discussed below are appended at the end of this mail.)

   Barry

On Jun 26, 2014, at 3:25 PM, Karl Rupp <[email protected]> wrote:

> Hey,
>
>> It would be nice if we could automatically detect this issue more often.
>
> Indeed.
>
> We have
>
>   #define PetscMalloc(a,b) ((a != 0) ? do_something : (*(b) = 0,0))
>
> For our typical internal use cases, 'a' is of type PetscInt, so we could
> check for (a < 0) and throw a better error message. On the other hand,
> assuming 'a' to be signed might result in tautology warnings for unsigned
> integers in user code, so it may be cleaner to hide this additional check in
> something like "PetscSignedMalloc()" and only use it for the typical internal
> routines where the argument to PetscMalloc() is a product of potentially
> large values. A quick grep on the source tree only shows a handful of cases
> where this is indeed the case (typically just matrices), allowing us to
> capture most of these problems. Is this a viable approach?
>
> Best regards,
> Karli
>
>
>> Begin forwarded message:
>>
>>> From: Mathis Friesdorf <[email protected]>
>>> Subject: Re: [petsc-users] Unexpected "Out of memory error" with SLEPC
>>> Date: June 26, 2014 at 5:08:23 AM CDT
>>> To: Karl Rupp <[email protected]>
>>> Cc: <[email protected]>, <[email protected]>
>>>
>>> Dear Karl,
>>>
>>> thanks a lot! This indeed worked. I recompiled PETSc with
>>>
>>>   ./configure --with-64-bit-indices=1 PETSC_ARCH=arch-linux2-c-debug-int64
>>>
>>> and the error is gone. Again thanks for your help!
>>>
>>> All the best, Mathis
>>>
>>>
>>> On Wed, Jun 25, 2014 at 12:42 PM, Karl Rupp <[email protected]> wrote:
>>>
>>>     Hi Mathis,
>>>
>>>     this looks very much like an integer overflow:
>>>     http://www.mcs.anl.gov/petsc/documentation/faq.html#with-64-bit-indices
>>>
>>>     Best regards,
>>>     Karli
>>>
>>>
>>>     On 06/25/2014 12:31 PM, Mathis Friesdorf wrote:
>>>
>>>         Dear all,
>>>
>>>         after a very useful email exchange with Jed Brown quite a while
>>>         ago, I was able to find the lowest eigenvalue of a large matrix
>>>         which is constructed as a tensor product. Admittedly the solution
>>>         is a bit hacked, but it is based on a matrix shell and Armadillo
>>>         and is therefore reasonably fast. The program seems to work well
>>>         for smaller systems, but once the vectors reach a certain size,
>>>         I get "out of memory" errors. I have tested the initialization of
>>>         a vector of that size and multiplication by the matrix. This works
>>>         fine and takes roughly 20 GB of memory. There are 256 GB available,
>>>         so I see no reason why the EPS solvers should complain. Does anyone
>>>         have an idea what goes wrong here?
>>>         The error message is not very helpful and claims that an amount
>>>         of memory is requested that is way beyond any reasonable number:
>>>         "Memory requested 18446744056529684480."
>>>
>>>         Thanks and all the best, Mathis Friesdorf
>>>
>>>
>>>         Output of the program:
>>>
>>>         mathis@n180:~/localisation$ ./local_plus 27
>>>         System Size: 27
>>>         --------------------------------------------------------------------------
>>>         [[30558,1],0]: A high-performance Open MPI point-to-point messaging module
>>>         was unable to find any relevant network interfaces:
>>>
>>>         Module: OpenFabrics (openib)
>>>         Host: n180
>>>
>>>         Another transport will be used instead, although this may result in
>>>         lower performance.
>>>         --------------------------------------------------------------------------
>>>         [0]PETSC ERROR: --------------------- Error Message ------------------------------------
>>>         [0]PETSC ERROR: Out of memory. This could be due to allocating
>>>         [0]PETSC ERROR: too large an object or bleeding by not properly
>>>         [0]PETSC ERROR: destroying unneeded objects.
>>>         [0]PETSC ERROR: Memory allocated 3221286704 Memory used by process 3229827072
>>>         [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
>>>         [0]PETSC ERROR: Memory requested 18446744056529684480!
>>>         [0]PETSC ERROR: ------------------------------------------------------------------------
>>>         [0]PETSC ERROR: Petsc Release Version 3.4.4, Mar, 13, 2014
>>>         [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>>>         [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>>         [0]PETSC ERROR: See docs/index.html for manual pages.
>>>         [0]PETSC ERROR: ------------------------------------------------------------------------
>>>         [0]PETSC ERROR: ./local_plus on a arch-linux2-cxx-debug named n180 by
>>>         mathis Wed Jun 25 12:23:01 2014
>>>         [0]PETSC ERROR: Libraries linked from /home/mathis/bin_nodebug/lib
>>>         [0]PETSC ERROR: Configure run at Wed Jun 25 00:03:34 2014
>>>         [0]PETSC ERROR: Configure options PETSC_DIR=/home/mathis/petsc-3.4.4
>>>         --with-debugging=1 COPTFLAGS="-O3 -march=p4 -mtune=p4" --with-fortran=0
>>>         -with-mpi=1 --with-mpi-dir=/usr/lib/openmpi --with-clanguage=cxx
>>>         --prefix=/home/mathis/bin_nodebug
>>>         [0]PETSC ERROR: ------------------------------------------------------------------------
>>>         [0]PETSC ERROR: PetscMallocAlign() line 46 in /home/mathis/petsc-3.4.4/src/sys/memory/mal.c
>>>         [0]PETSC ERROR: PetscTrMallocDefault() line 189 in /home/mathis/petsc-3.4.4/src/sys/memory/mtr.c
>>>         [0]PETSC ERROR: VecDuplicateVecs_Contiguous() line 62 in src/vec/contiguous.c
>>>         [0]PETSC ERROR: VecDuplicateVecs() line 589 in /home/mathis/petsc-3.4.4/src/vec/vec/interface/vector.c
>>>         [0]PETSC ERROR: EPSAllocateSolution() line 51 in src/eps/interface/mem.c
>>>         [0]PETSC ERROR: EPSSetUp_KrylovSchur() line 141 in src/eps/impls/krylov/krylovschur/krylovschur.c
>>>         [0]PETSC ERROR: EPSSetUp() line 147 in src/eps/interface/setup.c
>>>         [0]PETSC ERROR: EPSSolve() line 90 in src/eps/interface/solve.c
>>>         [0]PETSC ERROR: main() line 48 in local_plus.cpp
>>>         --------------------------------------------------------------------------
>>>         MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>         with errorcode 55.
>>>
>>>         NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>         You may or may not see output from other processes, depending on
>>>         exactly when Open MPI kills them.
>>>         --------------------------------------------------------------------------
>>>
>>>         Output of make:
>>>
>>>         mathis@n180:~/localisation$ make local_plus
>>>         mpicxx -o local_plus.o -c -Wall -Wwrite-strings -Wno-strict-aliasing
>>>         -Wno-unknown-pragmas -g -fPIC -I/home/mathis/armadillo-4.300.8/include
>>>         -lblas -llapack -L/home/mathis/armadillo-4.300.8 -O3 -larmadillo
>>>         -fomit-frame-pointer -I/home/mathis/bin_nodebug/include
>>>         -I/home/mathis/bin_nodebug/include -I/usr/lib/openmpi/include
>>>         -I/usr/lib/openmpi/include/openmpi -D__INSDIR__= -I/home/mathis/bin_nodebug
>>>         -I/home/mathis/bin_nodebug//include -I/home/mathis/bin_nodebug/include
>>>         local_plus.cpp
>>>         local_plus.cpp:22:0: warning: "__FUNCT__" redefined [enabled by default]
>>>         In file included from /home/mathis/bin_nodebug/include/petscvec.h:10:0,
>>>                          from local_plus.cpp:10:
>>>         /home/mathis/bin_nodebug/include/petscviewer.h:386:0: note: this is the
>>>         location of the previous definition
>>>         g++ -o local_plus local_plus.o -Wl,-rpath,/home/mathis/bin_nodebug//lib
>>>         -L/home/mathis/bin_nodebug//lib -lslepc -Wl,-rpath,/home/mathis/bin_nodebug/lib
>>>         -L/home/mathis/bin_nodebug/lib -lpetsc -llapack -lblas -lpthread
>>>         -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
>>>         -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.7 -L/usr/lib/gcc/x86_64-linux-gnu/4.7
>>>         -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
>>>         -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++
>>>         -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
>>>         /bin/rm -f local_plus.o
>>>
>>>         Code:
>>>
>>>         // Author: Mathis Friesdorf
>>>         // [email protected]
>>>
>>>         static char help[] = "1D chain\n";
>>>
>>>         #include <iostream>
>>>         #include <typeinfo>
>>>         #include <armadillo>
>>>         #include <petscsys.h>
>>>         #include <petscvec.h>
>>>         #include <petscmat.h>
>>>         #include <slepcsys.h>
>>>         #include <slepceps.h>
>>>         #include <math.h>
>>>         #include <assert.h>
>>>
>>>         PetscErrorCode BathMult(Mat H, Vec x, Vec y);
>>>         PetscInt L=30,d=2,dsys;
>>>         PetscErrorCode ierr;
>>>         arma::mat hint = "1.0 0 0 0.0; 0 -1.0 2.0 0; 0 2.0 -1.0 0; 0 0 0 1.0;";
>>>
>>>         #define __FUNCT__ "main"
>>>         int main(int argc, char **argv)
>>>         {
>>>           Mat H;
>>>           EPS eps;
>>>           Vec xr,xi;
>>>           PetscScalar kr,ki;
>>>           PetscInt j, nconv;
>>>
>>>           L = strtol(argv[1],NULL,10);
>>>           dsys = pow(d,L);
>>>           printf("%s","System Size: ");
>>>           printf("%i",L);
>>>           printf("%s","\n");
>>>           SlepcInitialize(&argc,&argv,(char*)0,help);
>>>
>>>           MatCreateShell(PETSC_COMM_WORLD,dsys,dsys,dsys,dsys,NULL,&H);
>>>           MatShellSetOperation(H,MATOP_MULT,(void(*)())BathMult);
>>>           ierr = MatGetVecs(H,NULL,&xr); CHKERRQ(ierr);
>>>           ierr = MatGetVecs(H,NULL,&xi); CHKERRQ(ierr);
>>>
>>>           ierr = EPSCreate(PETSC_COMM_WORLD, &eps); CHKERRQ(ierr);
>>>           ierr = EPSSetOperators(eps, H, NULL); CHKERRQ(ierr);
>>>           ierr = EPSSetProblemType(eps, EPS_HEP); CHKERRQ(ierr);
>>>           ierr = EPSSetWhichEigenpairs(eps,EPS_SMALLEST_REAL); CHKERRQ(ierr);
>>>           ierr = EPSSetFromOptions(eps); CHKERRQ(ierr);
>>>           ierr = EPSSolve(eps); CHKERRQ(ierr);
>>>           ierr = EPSGetConverged(eps, &nconv); CHKERRQ(ierr);
>>>           for (j=0; j<1; j++) {
>>>             EPSGetEigenpair(eps, j, &kr, &ki, xr, xi);
>>>             printf("%s","Lowest Eigenvalue: ");
Eigenvalue: "); >>> PetscPrintf(PETSC_COMM_WORLD,"__%9F",kr); >>> PetscPrintf(PETSC_COMM_WORLD,"__\n"); >>> } >>> EPSDestroy(&eps); >>> >>> ierr = SlepcFinalize(); >>> return 0; >>> } >>> #undef __FUNCT__ >>> >>> #define __FUNCT__ "BathMult" >>> PetscErrorCode BathMult(Mat H, Vec x, Vec y) >>> { >>> PetscInt l; >>> uint slice; >>> PetscScalar *arrayin,*arrayout; >>> >>> VecGetArray(x,&arrayin); >>> VecGetArray(y,&arrayout); >>> arma::cube A = arma::cube(arrayin,1,1,pow(d,__L), >>> /*copy_aux_mem*/false,/*__strict*/true); >>> arma::mat result = arma::mat(arrayout,pow(d,L),1, >>> /*copy_aux_mem*/false,/*__strict*/true); >>> for (l=0;l<L-1;l++){ >>> A.reshape(pow(d,L-2-l),pow(d,__2),pow(d,l)); >>> result.reshape(pow(d,L-l),pow(__d,l)); >>> for (slice=0;slice<A.n_slices;__slice++){ >>> result.col(slice) += vectorise(A.slice(slice)*hint)__; >>> } >>> } >>> arrayin = A.memptr(); >>> ierr = VecRestoreArray(x,&arrayin); CHKERRQ(ierr); >>> arrayout = result.memptr(); >>> ierr = VecRestoreArray(y,&arrayout); CHKERRQ(ierr); >>> PetscFunctionReturn(0); >>> } >>> #undef __FUNCT__/* >>> *
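
For reference, here is a minimal standalone sketch of the PetscCheckOverflow2() macro proposed at the top of this mail. The names Petsc64bitInt, PETSC_MPI_INT_MAX, and PetscCheckOverflow2 come from the proposal above; the typedef, the fprintf()/exit() error path, and the main() harness are stand-ins for PETSc's real types and SETERRQ2() so that the sketch compiles outside PETSc:

/* Standalone sketch of the proposed overflow check; mocks marked below. */
#include <stdio.h>
#include <stdlib.h>

typedef long long Petsc64bitInt;        /* stand-in for PETSc's 64-bit integer */
#define PETSC_MPI_INT_MAX 2147483647LL  /* largest count a 32-bit int can hold */

#if !defined(PETSC_USE_64BIT_INDICES)
/* The do/while wrapper makes the macro a single statement; the error
   path stands in for SETERRQ2(). */
#define PetscCheckOverflow2(M,N) \
  do { \
    if (((Petsc64bitInt)(M))*((Petsc64bitInt)(N)) > PETSC_MPI_INT_MAX) { \
      fprintf(stderr,"Product of %d by %d is too large for 32 bit indices\n", \
              (int)(M),(int)(N)); \
      exit(EXIT_FAILURE); \
    } \
  } while (0)
#else
#define PetscCheckOverflow2(a,b)
#endif

int main(void)
{
  int m = 134217728;          /* 2^27, the vector length for L = 27 above */
  PetscCheckOverflow2(m, 8);  /* 2^30 still fits in 32 bits: passes       */
  PetscCheckOverflow2(m, 16); /* 2^31 exceeds PETSC_MPI_INT_MAX: aborts   */
  printf("not reached without 64-bit indices\n");
  return 0;
}

The do { ... } while (0) wrapper is the one behavioral change from the proposal: the bare if in the mail would swallow a following else at the call site. A real PETSc version would also need the communicator (comm) to be in scope at every call site, or passed as a macro argument.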

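Likewise, a minimal sketch of the signed-size check behind the PetscSignedMalloc() idea in Karli's mail above; the function name comes from that mail and does not exist in PETSc, and PetscInt is mocked as a plain 32-bit int. It also shows where the absurd figure in the log comes from: 18446744056529684480 equals 2^64 - 17179867136, i.e. a negative byte count of roughly -16 GiB, produced by an overflowed 32-bit product and then reinterpreted as an unsigned 64-bit size.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

typedef int PetscInt;  /* stand-in for PetscInt built with 32-bit indices */

/* Hypothetical wrapper: reject a negative (i.e. already overflowed) size
   before the cast to size_t turns it into an enormous request. */
static int PetscSignedMalloc(PetscInt bytes, void **ptr)
{
  if (bytes < 0) {
    fprintf(stderr, "Requested size %d is negative: likely a 32-bit product "
                    "overflow, consider configuring --with-64-bit-indices\n",
            bytes);
    *ptr = NULL;
    return 1;                                  /* error code, caller checks */
  }
  *ptr = (bytes != 0) ? malloc((size_t)bytes) : NULL; /* mirrors PetscMalloc's zero case */
  return 0;
}

int main(void)
{
  void *p;
  /* 2^27 vector entries times a few dozen Krylov basis vectors overflows a
     32-bit product; INT_MIN stands in for such a wrapped result here, since
     writing the overflowing multiplication itself would be undefined behavior. */
  PetscInt overflowed = INT_MIN;
  if (PetscSignedMalloc(overflowed, &p))
    fprintf(stderr, "caught it; malloc() would have been asked for %zu bytes\n",
            (size_t)overflowed);
  return 0;
}

With a check like this in the handful of internal allocation paths Karli identified, the run above would have failed with a pointer to the 64-bit-indices FAQ instead of the misleading "Memory requested 18446744056529684480!".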