[petsc-dev] Error during KSPDestroy

Alexander Grayver Tue, 08 May 2012 08:59:16 +0200

Barry,

Now it works.
Thanks everybody!


On 07.05.2012 22:36, Barry Smith wrote:
>     Alexander
>
>     Satish and I have determined the problem (took some valgrind and debugger 
> work). We were not allocating enough "workspace" for one of the work arrays 
> passed to zgesvd(). We have fixed it in petsc-dev. You should be able to do 
> hg pull -u, then recompile the libraries with make cmake, then relink and run 
> your example.
>
>      Thank you for your patience.
>
>       Barry
>
> On May 7, 2012, at 9:10 AM, Alexander Grayver wrote:
>
>> On 07.05.2012 15:04, Barry Smith wrote:
>>>     I am also running complex.
>>>
>>>      Look in the file dlasq2.f (it will be in the externalpackages 
>>> subdirectory of the PETSc directory). Look at line 215; this is where 
>>> valgrind has a problem. In my copy
>>>
>>>        END IF
>>> *
>>> *     Check for negative data and compute sums of q's and e's.
>>> *<------ this is line 215
>>>        Z( 2*N ) = ZERO
>>>
>>> it is a comment, which is not good. Is line 215 also a comment in your 
>>> copy of dlasq2.f?
>> Barry,
>>
>> *
>> *     Rearrange data for locality: Z=(q1,qq1,e1,ee1,q2,qq2,e2,ee2,...).
>> *
>>       DO 30 K = 2*N, 2, -2
>>          Z( 2*K ) = ZERO        !<----------- LINE 215
>>          Z( 2*K-1 ) = Z( K )
>>          Z( 2*K-2 ) = ZERO
>>          Z( 2*K-3 ) = Z( K-1 )
>>    30 CONTINUE
>>
>> In the valgrind log you can see that it complains about the following lines as well:
>>
>> ==9009== Invalid write of size 8
>> ==9009==    at 0x10651D5: dlasq2_ (dlasq2.f:215)
>> ==9009==    by 0x1064683: dlasq1_ (dlasq1.f:135)
>> ==9009==    by 0x104EB3F: zbdsqr_ (zbdsqr.f:225)
>> ==9009==    by 0x1023B74: zgesvd_ (zgesvd.f:2040)
>> ==9009==    by 0xD38725: KSPComputeExtremeSingularValues_GMRES (gmreig.c:46)
>> ==9009==    by 0xCB3CC7: KSPComputeExtremeSingularValues (itfunc.c:47)
>> ==9009==    by 0x406DF2: main (solveTest.c:47)
>> ==9009==  Address 0x6ef5d88 is 8 bytes before a block of size 832 alloc'd
>> ==9009==    at 0x4C2786E: memalign (vg_replace_malloc.c:581)
>> ==9009==    by 0x47E3CB: PetscMallocAlign (mal.c:30)
>> ==9009==    by 0xD2E286: KSPSetUp_GMRES (gmres.c:73)
>> ==9009==    by 0xCB5464: KSPSetUp (itfunc.c:239)
>> ==9009==    by 0xCB6E56: KSPSolve (itfunc.c:402)
>> ==9009==    by 0x406DDB: main (solveTest.c:46)
>> ==9009==
>> ==9009== Invalid write of size 8
>> ==9009==    at 0x1065204: dlasq2_ (dlasq2.f:216)
>> ....
>> ==9009==
>> ==9009== Invalid write of size 8
>> ==9009==    at 0x1065223: dlasq2_ (dlasq2.f:217)
>> ....
>> ==9009==
>> ==9009== Invalid write of size 8
>> ==9009==    at 0x1065255: dlasq2_ (dlasq2.f:218)
>> ....
>>
>> All further output is also related to the Z array.
>> Hard to believe this is a LAPACK problem... I tried 3 implementations over 2 
>> machines.
>> I have a bad feeling it's my stupid mistake somewhere... :)
>>
>> Just in case, I run ubuntu 11.1 and PETSc is configured like this with 
>> default gcc compiler:
>> ./configure --with-petsc-arch=mpich-gcc-complex-debug-c 
>> --download-f-blas-lapack --with-precision=double --with-scalar-type=complex 
>> --download-mpich
>>
>>> There are two possible causes I can think of for your problem
>>>
>>> 1) PETSc does not allocate enough work space for zgesvd() or
>>> 2) the BLAS/LAPACK routines have a bug where they sometimes access out of 
>>> their work space.
>>>
>>>
>>>     Satish,
>>>
>>>       Can you try the same build options on a Linux machine as close to 
>>> Alexander as we have and see if you can reproduce this?
>>>
>>>
>>>     Barry
>>>
>>>
>>>
>>> On May 7, 2012, at 2:16 AM, Alexander Grayver wrote:
>>>
>>>> On 06.05.2012 22:24, Barry Smith wrote:
>>>>>    Alexander,
>>>>>
>>>>>       I cannot reproduce this on my mac with 3 different blas/lapack.
>>>> Barry,
>>>>
>>>> I'm surprised. I ran it on my home PC with ubuntu and PETSc configured 
>>>> from scratch as follows:
>>>> --download-mpich --with-fortran-interfaces=1 --download-scalapack 
>>>> --download-blacs --with-scalar-type=complex --download-blas-lapack 
>>>> --with-precision=double
>>>>
>>>> And it's still there.
>>>> Please note that all my numbers are complex.
>>>>
>>>>>       Could you please run the case below but with 
>>>>> --download-f-blas-lapack   (you forgot the -f last time)? Send us the 
>>>>> valgrind results. This will tell us the exact line number in dlasq3() 
>>>>> that is triggering the bad read.
>>>> I did:
>>>> ./configure --with-petsc-arch=openmpi-intel-complex-debug-c 
>>>> --download-scalapack --download-blacs --download-f-blas-lapack 
>>>> --with-precision=double --with-scalar-type=complex
>>>>
>>>> And then ran the program under valgrind. The first message from the log:
>>>>
>>>> ==27656== Invalid write of size 8
>>>> ==27656==    at 0x15A8E9E: dlasq2_ (dlasq2.f:215)
>>>> ==27656==    by 0x15A83A4: dlasq1_ (dlasq1.f:135)
>>>> ==27656==    by 0x158ACEC: zbdsqr_ (zbdsqr.f:225)
>>>> ==27656==    by 0x154EC27: zgesvd_ (zgesvd.f:2038)
>>>> ==27656==    by 0x695DD3: KSPComputeExtremeSingularValues_GMRES (gmreig.c:46)
>>>> ==27656==    by 0x69DD76: KSPComputeExtremeSingularValues (itfunc.c:47)
>>>> ==27656==    by 0x44E98C: main (solveTest.c:62)
>>>> ==27656==  Address 0xfad2d98 is 8 bytes before a block of size 832 alloc'd
>>>> ==27656==    at 0x4C25D66: memalign (vg_replace_malloc.c:694)
>>>> ==27656==    by 0x4B642B: PetscMallocAlign (mal.c:30)
>>>> ==27656==    by 0x687775: KSPSetUp_GMRES (gmres.c:73)
>>>> ==27656==    by 0x69FE4A: KSPSetUp (itfunc.c:239)
>>>> ==27656==    by 0x6A2058: KSPSolve (itfunc.c:402)
>>>> ==27656==    by 0x44E969: main (solveTest.c:61)
>>>>
>>>> Please find full log attached.
>>>>
>>>>>      Barry
>>>>>
>>>>>
>>>>> On May 6, 2012, at 9:16 AM, Alexander Grayver wrote:
>>>>>
>>>>>> On 06.05.2012 15:34, Matthew Knepley wrote:
>>>>>>> On Sun, May 6, 2012 at 9:24 AM, Alexander Grayver<agrayver at 
>>>>>>> gfz-potsdam.de>    wrote:
>>>>>>> Hm, valgrind gives a lot of output like that (see full log in previous 
>>>>>>> message):
>>>>>>>
>>>>>>> Can you run this with --download-f-blas-lapack? This sounds much more 
>>>>>>> like an MKL bug.
>>>>>> I did:
>>>>>> --download-scalapack --download-blacs --download-blas-lapack 
>>>>>> --with-precision=double --with-scalar-type=complex
>>>>>>
>>>>>> The error is still there. I checked "ldd solveTest", mkl is not used for 
>>>>>> sure. This is not an MKL problem I guess:
>>>>>>
>>>>>> ==13600== Invalid read of size 8
>>>>>> ==13600==    at 0x58636AF: dlasq3_ (in /usr/local/lib/liblapack.so.3.2.2)
>>>>>> ==13600==    by 0x5862C84: dlasq2_ (in /usr/local/lib/liblapack.so.3.2.2)
>>>>>> ==13600==    by 0x5861F2C: dlasq1_ (in /usr/local/lib/liblapack.so.3.2.2)
>>>>>> ==13600==    by 0x571A479: zbdsqr_ (in /usr/local/lib/liblapack.so.3.2.2)
>>>>>> ==13600==    by 0x57466A7: zgesvd_ (in /usr/local/lib/liblapack.so.3.2.2)
>>>>>> ==13600==    by 0x694687: KSPComputeExtremeSingularValues_GMRES (gmreig.c:46)
>>>>>> ==13600==    by 0x69C62A: KSPComputeExtremeSingularValues (itfunc.c:47)
>>>>>> ==13600==    by 0x44E02C: main (solveTest.c:62)
>>>>>> ==13600==  Address 0x10826b90 is 16 bytes before a block of size 832 alloc'd
>>>>>> ==13600==    at 0x4C25D66: memalign (vg_replace_malloc.c:694)
>>>>>> ==13600==    by 0x4B5ACB: PetscMallocAlign (mal.c:30)
>>>>>> ==13600==    by 0x686181: KSPSetUp_GMRES (gmres.c:73)
>>>>>> ==13600==    by 0x69E6FE: KSPSetUp (itfunc.c:239)
>>>>>> ==13600==    by 0x6A090C: KSPSolve (itfunc.c:402)
>>>>>> ==13600==    by 0x44E009: main (solveTest.c:61)
>>>>>>
>>>>>> The weird thing is that it gives the correct result, so zgesvd works 
>>>>>> fine.
>>>>>>
>>>>>> And also running this program with 10 iterations in valgrind doesn't 
>>>>>> produce an error. The log above is with 100 iterations.
>>>>>> Without valgrind the error is always there.
>>>>>>
>>>>>> -- 
>>>>>> Regards,
>>>>>> Alexander
>>>>>>
>>>> -- 
>>>> Regards,
>>>> Alexander
>>>>
>>>> <valgrind.zip>
>>
>> -- 
>> Regards,
>> Alexander
>>


-- 
Regards,
Alexander
