>Submitter-Id: net
>Originator:
>Organization: University of Heidelberg
>Confidential: no
>Synopsis: -mfpmath=sse creates illegal code (movapd with misaligned argument)
>Severity: serious
>Priority: medium
>Category: fortran
>Class: wrong-code
>Release: 3.4.2 (Debian 3.4.2-2) (Debian testing/unstable)
>Environment:
System: Linux pc-lenz 2.4.27-rc5-tc2 #2 SMP Fri Aug 20 15:42:48 CEST 2004 i686 GNU/Linux
Architecture: i686
host: i486-pc-linux-gnu
build: i486-pc-linux-gnu
target: i486-pc-linux-gnu
configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit --enable-libstdcxx-allocator=mt --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk --disable-werror i486-linux
>Description:
Using the option "-mfpmath=sse" in g77 (to use SSE instructions for floating-point arithmetic) can generate illegal code, or at least code that is called with illegal arguments. Specifically, I have a test case where a "movapd" instruction is executed with a second argument (the memory operand) that is not 16-byte aligned, causing a segmentation violation. Details are below in the section "How-To-Repeat".

This bug is probably related to the one with ID 14776. However, that bug was reported for a different target, has been open for more than six months, and is still unassigned, so I consider it worthwhile to report this one separately.
>How-To-Repeat:
This is the Fortran 77 source of a program that shows the problem:

========== gccbug.f ==========
      program test

      implicit none

      integer n,error,i,j
      real*8 tol
      complex*16 work(5,5,2)

      n=5

      do j=1,n
         do i=1,n
            if (i.eq.j) then
               work(i,j,1)=dcmplx(1.0d0,0.0d0)
            else
               work(i,j,1)=dcmplx(0.0d0,0.0d0)
            endif
         enddo
      enddo

      tol=1.0d-8

      call matinvzhp(work(1,1,2),work(1,1,1),n,n,tol,error)
      write (*,*) 'Error=',error

      return
      end


C-----------------------------------------------------------------------
C CHOLESKYZHP
C
C Decomposes a complex hermitian (H), positive definite (P) matrix
C A into a product of a left lower triangular matrix L and its adjoint,
C i. e. A = L L^dagger where L_ij = 0 for j > i and L_ii is real and
C positive. All calculations are performed with double precision (Z).
C
C Input parameters:
C   a:     The matrix to be decomposed (a(i,j) is needed for j <= i
C          only).
C   dim:   The number of allocated rows (and columns) of "a".
C   n:     The number of used rows (and columns) of "a".
C   tol:   A (relative) tolerance for the check whether "a" is hermitian
C          and positive definite.
C
C Output parameters:
C   a:     The left lower triangular matrix L. (The part of "a" right of
C          the diagonal still contains the corresponding part of A.)
C   error: An error flag having the following meaning:
C          0: Everything was o. k.
C          1: The matrix is not hermitian and positive definite.
C-----------------------------------------------------------------------
      subroutine choleskyzhp (a,dim,n,tol,error)

      implicit none

      integer dim,n,error,i,j,k
      real*8 tol,y,tmp
      complex*16 a(dim,dim),x

      error = 0
      do j = 1,n
         do i = j,n
            x = a(i,j)
            do k = 1,j-1
               x = x-a(i,k)*dconjg(a(j,k))
            enddo
            if (i .eq. j) then
               tmp=dble(x)
               if (dabs(dimag(x)) .gt. tol*tmp) then
                  error = 1
                  return
               else
C                 write (*,*) tmp
                  tmp=dsqrt(tmp)
                  a(i,j) = dcmplx(tmp)
                  y = 1.0d0/tmp
               endif
            else
               a(i,j) = x*y
            endif
         enddo
      enddo
      return
      end

C-----------------------------------------------------------------------
C MATINVZHP
C
C Inverts a double precision complex (Z), Hermitian (H), and positive
C definite (P) matrix A employing a Cholesky-decomposition of A.
C
C Input parameters:
C   a:     The matrix to be inverted (a(i,j) is needed for j <= i only).
C   dim:   The number of allocated rows (and columns) of "a".
C   n:     The number of used rows (and columns) of "a".
C   tol:   A (relative) tolerance for the check whether "a" is Hermitian
C          and positive definite.
C Output parameters:
C   b:     The inverse of "a".
C   a:     The matrix "a" is overwritten.
C   error: An error flag having the following meaning:
C          0: Everything was o. k.
C          1: The matrix "a" is not hermitian and positive definite.
C Remarks:
C   Library routine "choleskyzhp" is used.
C-----------------------------------------------------------------------
      subroutine matinvzhp (b,a,dim,n,tol,error)

      implicit none

      complex*16 zeroz,onez
      parameter (zeroz = (0.0d0,0.0d0),onez = (1.0d0,0.0d0))

      integer dim,n,error,i,j,k
      real*8 tol
      complex*16 b(dim,dim),a(dim,dim),x

C 1. Construct the A = L L^dagger Cholesky-decomposition.
C 2. For every column j of the identity matrix do:
C    a. Solve L b = 1 (first do-loop over i).
C    b. Solve L_dagger u = b (u is stored directly in the j-th column of
C       the matrix b) (second do-loop over i).

      call choleskyzhp(a,dim,n,tol,error)
      if (error .ne. 0) return

      do j = 1,n
         do i = 1,n
            if (i .lt. j) then
               b(i,j) = zeroz
            elseif (i .eq. j) then
               b(i,j) = onez/a(i,i)
            else
               x = zeroz
               do k = j,i-1
                  x = x-a(i,k)*b(k,j)
               enddo
               b(i,j) = x/a(i,i)
            endif
         enddo
         do i = n,1,-1
            x = b(i,j)
            do k = i+1,n
               x = x-dconjg(a(k,i))*b(k,j)
            enddo
            b(i,j) = x/a(i,i)
         enddo
      enddo

      return
      end
========== END ==========

If compiled like this:

frank:~/gccbug> g77-3.4 -O -msse2 -mfpmath=sse -g -Wall -v gccbug.f -o gccbug
Driving: g77-3.4 -O -msse2 -mfpmath=sse -g -Wall -v gccbug.f -o gccbug -lfrtbegin -lg2c -lm -shared-libgcc
Reading specs from /usr/lib/gcc/i486-linux/3.4.2/specs
Configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit --enable-libstdcxx-allocator=mt --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk --disable-werror i486-linux
Thread model: posix
gcc version 3.4.2 (Debian 3.4.2-2)
/usr/lib/gcc/i486-linux/3.4.2/f771 gccbug.f -quiet -dumpbase gccbug.f -msse2 -mfpmath=sse -mtune=i486 -auxbase gccbug -g -O -Wall -version -o /tmp/ccuU2AmV.s
GNU F77 version 3.4.2 (Debian 3.4.2-2) (i486-linux)
compiled by GNU C version 3.4.2 (Debian 3.4.2-2).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
gccbug.f: In subroutine `choleskyzhp':
In file included from gccbug.f:0:
gccbug.f:58: warning: 'y' might be used uninitialized in this function
as -V -Qy --32 -o /tmp/ccQrG9WN.o /tmp/ccuU2AmV.s
GNU assembler version 2.15 (i386-linux) using BFD version 2.15
/usr/lib/gcc/i486-linux/3.4.2/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o gccbug /usr/lib/gcc/i486-linux/3.4.2/../../../../lib/crt1.o /usr/lib/gcc/i486-linux/3.4.2/../../../../lib/crti.o /usr/lib/gcc/i486-linux/3.4.2/crtbegin.o -L/usr/lib/gcc/i486-linux/3.4.2 -L/usr/lib/gcc/i486-linux/3.4.2 -L/usr/lib/gcc/i486-linux/3.4.2/../../../../lib -L/usr/lib/gcc/i486-linux/3.4.2/../../.. -L/lib/../lib -L/usr/lib/../lib /tmp/ccQrG9WN.o -lfrtbegin -lg2c -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/i486-linux/3.4.2/crtend.o /usr/lib/gcc/i486-linux/3.4.2/../../../../lib/crtn.o

the program will crash with "Segmentation violation". (OT: The one warning is understandable, but it is wrong.)

Here's the output from a gdb session:

frank:~/gccbug> gdb ./gccbug
GNU gdb 6.1-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) break choleskyzhp_
Breakpoint 1 at 0x804880a: file gccbug.f, line 53.
(gdb) run
Starting program: /home/frank/gccbug/gccbug

Breakpoint 1, choleskyzhp_ (a=0xbfffe7b0, dim=0xbfffe7ac, n=0xbfffe79c, tol=0xbfffe7a0, error=0xbfffe7ac) at gccbug.f:53
53            subroutine choleskyzhp (a,dim,n,tol,error)
Current language: auto; currently fortran
(gdb) disass
Dump of assembler code for function choleskyzhp_:
0x080487e9 <choleskyzhp_+0>:    push   %ebp
0x080487ea <choleskyzhp_+1>:    mov    %esp,%ebp
0x080487ec <choleskyzhp_+3>:    push   %edi
0x080487ed <choleskyzhp_+4>:    push   %esi
0x080487ee <choleskyzhp_+5>:    push   %ebx
0x080487ef <choleskyzhp_+6>:    sub    $0x8c,%esp
0x080487f5 <choleskyzhp_+12>:   mov    0x8(%ebp),%eax
0x080487f8 <choleskyzhp_+15>:   mov    %eax,0xffffffe4(%ebp)
0x080487fb <choleskyzhp_+18>:   mov    0x10(%ebp),%edx
0x080487fe <choleskyzhp_+21>:   mov    0x14(%ebp),%eax
0x08048801 <choleskyzhp_+24>:   mov    %eax,0xffffffe0(%ebp)
0x08048804 <choleskyzhp_+27>:   mov    0x18(%ebp),%eax
0x08048807 <choleskyzhp_+30>:   mov    %eax,0xffffffdc(%ebp)
0x0804880a <choleskyzhp_+33>:   mov    0xc(%ebp),%eax
0x0804880d <choleskyzhp_+36>:   mov    (%eax),%eax
0x0804880f <choleskyzhp_+38>:   mov    %eax,0xffffffd8(%ebp)
0x08048812 <choleskyzhp_+41>:   mov    0xffffffdc(%ebp),%eax
0x08048815 <choleskyzhp_+44>:   movl   $0x0,(%eax)
0x0804881b <choleskyzhp_+50>:   mov    $0x1,%edi
0x08048820 <choleskyzhp_+55>:   mov    (%edx),%eax
0x08048822 <choleskyzhp_+57>:   dec    %eax
0x08048823 <choleskyzhp_+58>:   mov    %eax,0xffffffcc(%ebp)
0x08048826 <choleskyzhp_+61>:   js     0x80489cc <choleskyzhp_+483>
0x0804882c <choleskyzhp_+67>:   mov    (%edx),%edx
0x0804882e <choleskyzhp_+69>:   mov    %edx,0xffffffc4(%ebp)
0x08048831 <choleskyzhp_+72>:   movsd  0x80490a0,%xmm0
0x08048839 <choleskyzhp_+80>:   movapd %xmm0,0xffffff78(%ebp)   <-- program will crash here
0x08048841 <choleskyzhp_+88>:   flds   0x804907c
0x08048847 <choleskyzhp_+94>:   flds   0x8049080
0x0804884d <choleskyzhp_+100>:  movsd  0x8049090,%xmm0
0x08048855 <choleskyzhp_+108>:  movapd %xmm0,0xffffff68(%ebp)
0x0804885d <choleskyzhp_+116>:  mov    %edi,%esi
0x0804885f <choleskyzhp_+118>:  mov    0xffffffc4(%ebp),%eax
0x08048862 <choleskyzhp_+121>:  sub    %edi,%eax
0x08048864 <choleskyzhp_+123>:  mov    %eax,0xffffffc8(%ebp)
0x08048867 <choleskyzhp_+126>:  js     0x80489be <choleskyzhp_+469>
0x0804886d <choleskyzhp_+132>:  fxch   %st(1)
0x0804886f <choleskyzhp_+134>:  lea    0xffffffff(%edi),%eax
0x08048872 <choleskyzhp_+137>:  imul   0xffffffd8(%ebp),%eax
0x08048876 <choleskyzhp_+141>:  mov    %eax,0xffffffc0(%ebp)
0x08048879 <choleskyzhp_+144>:  movapd 0xffffff78(%ebp),%xmm0
0x08048881 <choleskyzhp_+152>:  movapd %xmm0,0xffffff98(%ebp)
0x08048886 <choleskyzhp_+157>:  fstl   0xffffff90(%ebp)
0x08048889 <choleskyzhp_+160>:  fxch   %st(1)
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) step
61            error = 0
(gdb) step
62            do j = 1,n
(gdb) step
82            enddo
(gdb) stepi
0x08048822      82            enddo
(gdb) stepi
0x08048823      82            enddo
(gdb) stepi
0x08048826      82            enddo
(gdb) stepi
0x0804882c      82            enddo
(gdb) stepi
0x0804882e      82            enddo
(gdb) stepi
0x08048831      82            enddo
(gdb) stepi
0x08048839      82            enddo
(gdb) print $ebp
$1 = (PTR TO -> ( void )) 0xbfffe684   <-- ebp must be 0x.......8 for correct alignment
(gdb) stepi   <-- now executing the incorrect movapd...

Program received signal SIGSEGV, Segmentation fault.
0x08048839 in choleskyzhp_ (a=0xbfffe7b0, dim=0x4, n=0x5, tol=0xbfffe7a0, error=0xbfffe7ac) at gccbug.f:82
82            enddo
(gdb) q

I hope this is enough information for you to reproduce the bug.

One interesting observation I made: As you can see, the main program calls "matinvzhp", which in turn immediately calls "choleskyzhp", where the bug appears. If the main program is rewritten to call "choleskyzhp" directly, the bug disappears. That's why I had to post such a long test program.

The bug is also present in gcc-3.4.1 (the one currently in debian/sarge). The bug was not present in gcc-3.3.

I've tested Pentium4 as well as AMD64 processors (both have the SSE2 extensions); the crash happens on both. I've also tested the program on an AMD64 processor in a 64-bit environment (corresponding to target x86_64); there the program runs fine.
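The alignment remark in the gdb session can be checked by hand. The following Python sketch is not part of the original test case; it only redoes the address arithmetic with the two values printed by gdb (the contents of %ebp and the displacement of the faulting movapd). The names EBP, DISP_RAW, to_signed32 and effective_address are made up for this illustration.

```python
# Values copied from the gdb session above: %ebp at the faulting instruction
# and the displacement in "movapd %xmm0,0xffffff78(%ebp)".
EBP = 0xBFFFE684
DISP_RAW = 0xFFFFFF78  # gdb prints the 32-bit displacement as unsigned


def to_signed32(value):
    """Interpret a 32-bit unsigned value as a two's-complement signed number."""
    return value - (1 << 32) if value & 0x80000000 else value


def effective_address(base, disp_raw):
    """Address of the memory operand base+disp, wrapped to 32 bits."""
    return (base + to_signed32(disp_raw)) & 0xFFFFFFFF


addr = effective_address(EBP, DISP_RAW)

# Displacement as a signed offset from %ebp.
print(hex(to_signed32(DISP_RAW)))
# Operand address and its remainder modulo 16 (0 would mean aligned).
print(hex(addr), addr % 16)
# Remainder of %ebp modulo 16; the operand is aligned only if this is 8.
print(EBP % 16)
```

With the values from the session, the operand address is 0xbfffe5fc, which lies 12 bytes past a 16-byte boundary, whereas movapd requires a 16-byte-aligned memory operand. Because the displacement is -0x88, which is congruent to 8 modulo 16, the access would be aligned only if %ebp ended in 8; this agrees with the annotation in the session.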
>Fix:
The available workarounds are:
* not using option "-O"
* not using option "-mfpmath=sse"
* uncommenting the line "write (*,*) tmp" in choleskyzhp and recompiling (probably because this line disturbs the optimizer)