On Fri, Oct 10, 2008 at 6:29 PM, Deng, Ying <ying.deng at intel.com> wrote:
> Hi,
>
> I am seeing problems when trying to build petsc-dev code. My configure
> line is below, the same one I used successfully for 2.3.2-p10. I tried
> with MKL 9 and MKL 10 and got the same errors: references to undefined
> symbols. Please share any experience you have with this issue, or any
> suggestions for resolving it.
1) Please always send configure.log. The screen output does not tell us
enough to debug problems.

2) Specifying libraries directly is not usually a good idea, since some
packages, like MKL, tend to depend on other libraries (like libguide,
libpthread). I would use --with-blas-lapack-dir=$MKLDIR instead.

3) Mail about install problems should go to petsc-maint at mcs.anl.gov;
petsc-dev is for discussion of development.

  Thanks,

     Matt

> Thanks,
> Ying
>
>
> ./config/configure.py --with-batch=1 --with-clanguage=C++
>   --with-vendor-compilers=intel '--CXXFLAGS=-g
>   -gcc-name=/usr/intel/pkgs/gcc/4.2.2/bin/g++ -gcc-version=420 '
>   '--LDFLAGS=-L/usr/lib64 -L/usr/intel/pkgs/gcc/4.2.2/lib -ldl -lpthread
>   -Qlocation,ld,/usr/intel/pkgs/gcc/4.2.2/x86_64-suse-linux/bin
>   -L/usr/intel/pkgs/icc/10.1.008e/lib -lirc' --with-cxx=$ICCDIR/bin/icpc
>   --with-fc=$IFCDIR/bin/ifort --with-mpi-compilers=0 --with-mpi-shared=0
>   --with-debugging=yes --with-mpi=yes --with-mpi-include=$MPIDIR/include
>   --with-mpi-lib=\[$MPIDIR/lib64/libmpi.a,$MPIDIR/lib64/libmpiif.a,$MPIDIR/lib64/libmpigi.a\]
>   --with-blas-lapack-lib=\[$MKLLIBDIR/libguide.so,$MKLLIBDIR/libmkl_lapack.so,$MKLLIBDIR/libmkl_solver.a,$MKLLIBDIR/libmkl.so\]
>   --with-scalapack=yes --with-scalapack-include=$MKLDIR/include
>   --with-scalapack-lib=$MKLLIBDIR/libmkl_scalapack.a --with-blacs=yes
>   --with-blacs-include=$MKLDIR/include
>   --with-blacs-lib=$MKLLIBDIR/libmkl_blacs_intelmpi_lp64.a
>   --with-umfpack=1
>   --with-umfpack-lib=\[$UMFPACKDIR/UMFPACK/Lib/libumfpack.a,$UMFPACKDIR/AMD/Lib/libamd.a\]
>   --with-umfpack-include=$UMFPACKDIR/UMFPACK/Include
>   --with-parmetis=1 --with-parmetis-dir=$PARMETISDIR --with-mumps=1
>   --download-mumps=$PETSC_DIR/externalpackages/MUMPS_4.6.3.tar.gz
>   --with-superlu_dist=1
>   --download-superlu_dist=$PETSC_DIR/externalpackages/superlu_dist_2.0.tar.gz
>
>
> ....
>
> /nfs/pdx/proj/dt/pdx_sde02/x86-64_linux26/petsc/petsc-dev/conftest.c:7:
> undefined reference to `f2cblaslapack311_id_'
> /p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libguide.so:
> undefined reference to `pthread_atfork'
>
> ....
>
>
> --------------------------------------------------------------------------------------
> You set a value for --with-blas-lapack-lib=<lib>, but
> ['/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libguide.so',
>  '/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libmkl_lapack.so',
>  '/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libmkl_solver.a',
>  '/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libmkl.so']
> cannot be used
> *********************************************************************************
>
>
> -----Original Message-----
> From: Barry Smith [mailto:bsmith at mcs.anl.gov]
> Sent: Thursday, October 09, 2008 12:39 PM
> To: Rhew, Jung-hoon
> Cc: PETSc-Maint Smith; Linton, Tom; Cea, Stephen M; Stettler, Mark
> Subject: Re: [PETSC #18391] PETSc crash with memory allocation in ILU
> preconditioning
>
>
>    We don't have all the code just right to use those packages with
> 64-bit integers. I will try to get them all working by Monday and will
> let you know my progress. To use them you will need to be using
> petsc-dev,
> http://www-unix.mcs.anl.gov/petsc/petsc-as/developers/index.html
> so you can switch to it now, if you are not already using it, in
> preparation for my updates.
>
>
>    Barry
>
> On Oct 9, 2008, at 12:52 PM, Rhew, Jung-hoon wrote:
>
>> Hi,
>>
>> I found that the root cause of the malloc error was that our PETSc
>> library had been compiled without the 64-bit flag. Thus, PetscInt was
>> defined as "int" instead of "long long", and for large problems the
>> memory allocation requires more memory than an int can express,
>> causing integer overflow.
>>
>> But when I tried to build with the 64-bit flag
>> (--with-64-bit-indices=1), all files associated with the external
>> libraries built with PETSc (such as UMFPACK and MUMPS) started
>> failing to compile, mainly due to the incompatibility between "int"
>> in those libraries and "long long" in PETSc.
>>
>> I wonder if you can let us know how to resolve this conflict when
>> building PETSc with 64-bit indices. The brute-force way is to change
>> the source code of those libraries where the conflicts occur, but I
>> wonder if there is a neater way of doing this.
>>
>> Thanks.
>> jr
>>
>> Example:
>> libfast in: /nfs/ltdn/disks/td_disk49/usr.cdmg/jrhew/work/mds_work/PETSC/mypetsc-2.3.2-p10/src/mat/impls/aij/seq/umfpack
>>
>> umfpack.c(154): error: a value of type "PetscInt={long long} *"
>> cannot be used to initialize an entity of type "int *"
>>   int m=A->rmap.n,n=A->cmap.n,*ai=mat->i,*aj=mat->j,status,*ra,idx;
>>
>>
>> -----Original Message-----
>> From: Barry Smith [mailto:bsmith at mcs.anl.gov]
>> Sent: Tuesday, October 07, 2008 6:15 PM
>> To: Rhew, Jung-hoon
>> Cc: petsc-maint at mcs.anl.gov; Linton, Tom; Cea, Stephen M; Stettler,
>> Mark
>> Subject: Re: [PETSC #18391] PETSc crash with memory allocation in
>> ILU preconditioning
>>
>>
>>    During the symbolic phase of ILU(N) there is no way to know in
>> advance how many new nonzeros the factored version needs over the
>> original matrix (this is true for LU too). We handle this by starting
>> with a certain amount of memory; if that is not enough for the
>> symbolic factor, we double the memory allocated, copy the values over
>> from the old copy of the symbolic factor (what has been computed so
>> far), and then free the old copy.
>>
>>    To avoid this "memory doubling" (which is not super memory
>> efficient) you can use the option -pc_factor_fill or
>> PCFactorSetFill() to set slightly more than the "correct" value;
>> then only a single malloc is needed and you can do larger problems.
>>
>>    Of course, the question is "what value should I use for fill?"
>> There is no formula; if there were, we would use it automatically.
>> So the only way I know is to run smaller problems and get a feel for
>> what the ratio should be for your larger problem. Run with
>> -info | grep pc_factor_fill and it will tell you what "you should
>> have used".
>>
>>    Hope this helps,
>>
>>    Barry
>>
>>
>> On Oct 7, 2008, at 5:46 PM, Rhew, Jung-hoon wrote:
>>
>>> Hi,
>>>
>>> 1. I ran it on a 64-bit machine with 32GB of physical memory, but
>>> it still crashed. At the crash the peak memory was 17GB, so there
>>> was plenty of memory left. This is why I don't think the simulation
>>> needed the full 32GB plus swap space (more than 64GB in total).
>>>
>>> 2. The problem size is too big for a direct solver, as it can easily
>>> go beyond 32GB. Actually, we use MUMPS for smaller problems.
>>>
>>> 3. ILU(N) is the most robust preconditioner we have found for our
>>> production simulation, so we want to stick with it.
>>>
>>> I think I'll send a test case that reproduces the problem.
>>>
>>> -----Original Message-----
>>> From: knepley at gmail.com [mailto:knepley at gmail.com] On Behalf Of
>>> Matthew Knepley
>>> Sent: Tuesday, October 07, 2008 2:21 PM
>>> To: Rhew, Jung-hoon
>>> Cc: PETSC Maintenance
>>> Subject: Re: [PETSC #18391] PETSc crash with memory allocation in
>>> ILU preconditioning
>>>
>>> It's not hard for ILU(k) to exceed the 32-bit limit for large
>>> matrices.
>>> I would recommend:
>>>
>>> 1) Using a 64-bit machine with more memory
>>>
>>> 2) Trying a sparse direct solver like MUMPS
>>>
>>> 3) Trying another preconditioner, which is of course problem
>>> dependent
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>> On Tue, Oct 7, 2008 at 4:03 PM, Rhew, Jung-hoon
>>> <jung-hoon.rhew at intel.com> wrote:
>>>> Dear PETSc team,
>>>>
>>>> We use PETSc as the linear solver library in our tool, and in some
>>>> test cases using the ILU(N) preconditioner we have problems with
>>>> memory. I'm not sending our matrix at this time since it is huge,
>>>> but if you think it is needed, I'll send it to you.
>>>>
>>>> Thanks for your help in advance.
>>>>
>>>> Log file is attached.
>>>> OS: SUSE 64-bit SLES 9
>>>>   2.6.5-7.276.PTF.196309.1-smp #1 SMP Mon Jul 24 10:45:31 UTC 2006
>>>>   x86_64 x86_64 x86_64 GNU/Linux
>>>> PETSc ver: petsc-2.3.2-p10
>>>> MPI implementation: Intel MPI based on MPICH2 and MVAPICH2
>>>> Compiler: GCC 4.2.2
>>>> Probable PETSc component: n/a
>>>>
>>>> Problem Description
>>>>
>>>> Solver setting: BCGSL (L=2) and ILU(N=2)
>>>>
>>>>   -ksp_rtol=1e-14
>>>>   -ksp_type=bcgsl
>>>>   -ksp_bcgsl_ell=2
>>>>   -pc_factor_levels=2
>>>>   -pc_factor_reuseordering
>>>>   -pc_factor_zeropivot=0.0
>>>>   -pc_type=ilu
>>>>   -pc_factor_fill=2
>>>>   -pc_factor_mat_ordering_type=rcm
>>>>
>>>> malloc crash: sparse matrix size ~ 500K by 500K with NNZ ~ 0.002%
>>>> (full error message is attached.)
>>>>
>>>> In the debugger, symbolic ILU requires memory beyond the maximum
>>>> int. At line 1089 in aijfact.c, len becomes -2147483648 because
>>>> (bi[n])*sizeof(PetscScalar) > max int:
>>>>
>>>>   len = (bi[n])*sizeof(PetscScalar);
>>>>
>>>> Then it causes the following malloc error in subsequent function
>>>> calls (the call stack is also in the attached error message).
>>>>
>>>> [0]PETSC ERROR: --------------------- Error Message
>>>> ------------------------------------
>>>> [0]PETSC ERROR: Out of memory. This could be due to allocating
>>>> [0]PETSC ERROR: too large an object or bleeding by not properly
>>>> [0]PETSC ERROR: destroying unneeded objects.
>>>> [0]PETSC ERROR: Memory allocated -2147483648 Memory used by process
>>>> -2147483648
>>>> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for
>>>> info.
>>>> [0]PETSC ERROR: Memory requested 18446744071912865792!
>>>> [0]PETSC ERROR:
>>>> ------------------------------------------------------------------------
>>>> [0]PETSC ERROR: Petsc Release Version 2.3.2, Patch 10, Wed Mar 28
>>>> 19:13:22 CDT 2007 HG revision: d7298c71db7f5e767f359ae35d33cab3bed44428
>>>>
>>>> Possibly relevant symptom: the iterative solver with ILU(N)
>>>> consumes more memory than a direct solver as N gets larger (>5),
>>>> although the matrix is not big enough to cause a malloc crash like
>>>> the above.
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener
>>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener