-mfpmath=sse creates illegal code (movapd with misaligned argument)

Frank . Otto Mon, 11 Oct 2004 07:18:03 -0500

>Submitter-Id:  net
>Originator:    
>Organization:  University of Heidelberg
>Confidential:  no
>Synopsis:      -mfpmath=sse creates illegal code (movapd with misaligned argument)
>Severity:      serious
>Priority:      medium
>Category:      fortran
>Class:         wrong-code
>Release:       3.4.2 (Debian 3.4.2-2) (Debian testing/unstable)
>Environment:
System: Linux pc-lenz 2.4.27-rc5-tc2 #2 SMP Fri Aug 20 15:42:48 CEST 2004 i686 GNU/Linux
Architecture: i686


host: i486-pc-linux-gnu
build: i486-pc-linux-gnu
target: i486-pc-linux-gnu
configured with: ../src/configure -v 
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr 
--libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4 
--enable-shared --with-system-zlib --enable-nls --without-included-gettext 
--program-suffix=-3.4 --enable-__cxa_atexit --enable-libstdcxx-allocator=mt 
--enable-clocale=gnu --enable-libstdcxx-debug --enable-java-gc=boehm 
--enable-java-awt=gtk --disable-werror i486-linux

>Description:
        Using the option "-mfpmath=sse" in g77 (to use SSE instructions for
        floating point arithmetic) can create illegal code, or at least
        code that is called with illegal arguments. Specifically, I have a
        test case where a "movapd" instruction is executed with a memory
        operand that is not 16-byte aligned, causing a segmentation violation.
        Details are below in section "How-To-Repeat".
        This bug is probably related to the one with ID 14776. However, that
        bug was reported for a different target, has been open for more than
        six months, and is still unassigned, so I consider it worthwhile to
        report this separately.
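
        As a rough illustration of the alignment rule involved (a Python
        sketch, not part of the test case; the helper name "is_aligned" and
        the sample addresses are made up for this note): "movapd" requires
        its memory operand to lie on a 16-byte boundary, i.e. the low four
        bits of the address must be zero.

```python
def is_aligned(address, boundary=16):
    """Return True if address is a multiple of boundary.

    boundary is assumed to be a power of two, so the check reduces to
    testing that the low bits of the address are all zero.
    """
    return address & (boundary - 1) == 0


if __name__ == "__main__":
    # Hypothetical sample addresses, chosen only to show both outcomes.
    for addr in (0x1000, 0x100c):
        state = "aligned" if is_aligned(addr) else "misaligned"
        print(hex(addr), state)
```

        An address ending in 0xc, like the second sample, is only 4-byte
        aligned, which is not sufficient for "movapd".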

>How-To-Repeat:
        This is the Fortran77 source of a program that shows the problem:

========== gccbug.f ==========
      program test

      implicit none

      integer    n,error,i,j
      real*8     tol
      complex*16 work(5,5,2)

      n=5
      do j=1,n
         do i=1,n
            if (i.eq.j) then
               work(i,j,1)=dcmplx(1.0d0,0.0d0)
            else
               work(i,j,1)=dcmplx(0.0d0,0.0d0)
            endif
         enddo
      enddo
      tol=1.0d-8

      call matinvzhp(work(1,1,2),work(1,1,1),n,n,tol,error)

      write (*,*) 'Error=',error

      return
      end


C-----------------------------------------------------------------------
C                               CHOLESKYZHP
C
C Decomposes a complex Hermitian (H), positive definite (P) matrix
C A into a product of a left lower triangular matrix L and its adjoint,
C i. e. A = L L^dagger where L_ij = 0 for j > i and L_ii is real and
C positive. All calculations are performed with double precision (Z).
C
C Input parameters:
C   a:     The matrix to be decomposed (a(i,j) is needed for j <= i
C          only).
C   dim:   The number of allocated rows (and columns) of "a".
C   n:     The number of used rows (and columns) of "a".
C   tol:   A (relative) tolerance for the check whether "a" is Hermitian
C          and positive definite.
C
C Output parameters:
C   a:     The left lower triangular matrix L. (The part of "a" right of
C          the diagonal still contains the corresponding part of A.)
C   error: An error flag having the following meaning:
C          0: Everything was o. k.
C          1: The matrix is not hermitian and positive definite.
C-----------------------------------------------------------------------

      subroutine choleskyzhp (a,dim,n,tol,error)

      implicit none

      integer    dim,n,error,i,j,k
      real*8     tol,y,tmp
      complex*16 a(dim,dim),x

      error = 0
      do j = 1,n
         do i = j,n
            x = a(i,j)
            do k = 1,j-1
               x = x-a(i,k)*dconjg(a(j,k))
            enddo
            if (i .eq. j) then
               tmp=dble(x)
               if (dabs(dimag(x)) .gt. tol*tmp) then
                  error = 1
                  return
               else
C                 write (*,*) tmp
                  tmp=dsqrt(tmp)
                  a(i,j) = dcmplx(tmp)
                  y = 1.0d0/tmp
               endif
            else
               a(i,j) = x*y
            endif
         enddo
      enddo

      return
      end

C-----------------------------------------------------------------------
C                               MATINVZHP
C
C Inverts a double precision complex (Z), Hermitian (H), and positive
C definite (P) matrix A employing a Cholesky-decomposition of A.
C
C Input parameters:
C   a:     The matrix to be inverted (a(i,j) is needed for j <= i only).
C   dim:   The number of allocated rows (and columns) of "a".
C   n:     The number of used rows (and columns) of "a".
C   tol:   A (relative) tolerance for the check whether "a" is Hermitian
C          and positive definite.
C Output parameters:
C   b:     The inverse of "a".
C   a:     The matrix "a" is overwritten.
C   error: An error flag having the following meaning:
C          0: Everything was o. k.
C          1: The matrix "a" is not hermitian and positive definite.
C Remarks:
C   Library routine "choleskyzhp" is used.
C-----------------------------------------------------------------------

      subroutine matinvzhp (b,a,dim,n,tol,error)

      implicit none

      complex*16 zeroz,onez
      parameter (zeroz = (0.0d0,0.0d0),onez = (1.0d0,0.0d0))

      integer    dim,n,error,i,j,k
      real*8     tol
      complex*16 b(dim,dim),a(dim,dim),x

C 1. Construct the A = L L^dagger Cholesky-decomposition.
C 2. For every column j of the identity matrix do:
C    a. Solve L b = 1 (first do-loop over i).
C    b. Solve L_dagger u = b (u is stored directly in the j-th column of
C       the matrix b) (second do-loop over i).

      call choleskyzhp(a,dim,n,tol,error)

      if (error .ne. 0) return
      do j = 1,n
         do i = 1,n
            if (i .lt. j) then
               b(i,j) = zeroz
            elseif (i .eq. j) then
               b(i,j) = onez/a(i,i)
            else
               x = zeroz
               do k = j,i-1
                  x = x-a(i,k)*b(k,j)
               enddo
               b(i,j) = x/a(i,i)
            endif
         enddo
         do i = n,1,-1
            x = b(i,j)
            do k = i+1,n
               x = x-dconjg(a(k,i))*b(k,j)
            enddo
            b(i,j) = x/a(i,i)
         enddo
      enddo

      return
      end
========== END ==========

        If compiled like this:

frank:~/gccbug> g77-3.4 -O -msse2 -mfpmath=sse -g -Wall -v gccbug.f -o gccbug
Driving: g77-3.4 -O -msse2 -mfpmath=sse -g -Wall -v gccbug.f -o gccbug 
-lfrtbegin -lg2c -lm -shared-libgcc
Reading specs from /usr/lib/gcc/i486-linux/3.4.2/specs
Configured with: ../src/configure -v 
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr 
--libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4 
--enable-shared --with-system-zlib --enable-nls --without-included-gettext 
--program-suffix=-3.4 --enable-__cxa_atexit --enable-libstdcxx-allocator=mt 
--enable-clocale=gnu --enable-libstdcxx-debug --enable-java-gc=boehm 
--enable-java-awt=gtk --disable-werror i486-linux
Thread model: posix
gcc version 3.4.2 (Debian 3.4.2-2)
 /usr/lib/gcc/i486-linux/3.4.2/f771 gccbug.f -quiet -dumpbase gccbug.f -msse2 
-mfpmath=sse -mtune=i486 -auxbase gccbug -g -O -Wall -version -o /tmp/ccuU2AmV.s
GNU F77 version 3.4.2 (Debian 3.4.2-2) (i486-linux)
        compiled by GNU C version 3.4.2 (Debian 3.4.2-2).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
gccbug.f: In subroutine `choleskyzhp':
In file included from gccbug.f:0:
gccbug.f:58: warning: 'y' might be used uninitialized in this function
 as -V -Qy --32 -o /tmp/ccQrG9WN.o /tmp/ccuU2AmV.s
GNU assembler version 2.15 (i386-linux) using BFD version 2.15
 /usr/lib/gcc/i486-linux/3.4.2/collect2 --eh-frame-hdr -m elf_i386 
-dynamic-linker /lib/ld-linux.so.2 -o gccbug 
/usr/lib/gcc/i486-linux/3.4.2/../../../../lib/crt1.o 
/usr/lib/gcc/i486-linux/3.4.2/../../../../lib/crti.o 
/usr/lib/gcc/i486-linux/3.4.2/crtbegin.o -L/usr/lib/gcc/i486-linux/3.4.2 
-L/usr/lib/gcc/i486-linux/3.4.2 -L/usr/lib/gcc/i486-linux/3.4.2/../../../../lib 
-L/usr/lib/gcc/i486-linux/3.4.2/../../.. -L/lib/../lib -L/usr/lib/../lib 
/tmp/ccQrG9WN.o -lfrtbegin -lg2c -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc 
/usr/lib/gcc/i486-linux/3.4.2/crtend.o 
/usr/lib/gcc/i486-linux/3.4.2/../../../../lib/crtn.o

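        the program will crash with "Segmentation violation".
        (Off-topic: the one warning above is understandable, but it is a
        false positive. For each j the inner loop starts at i = j, so "y" is
        always assigned in the diagonal branch before the off-diagonal
        branch reads it.)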
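
        The remark about the warning can be checked with a small Python
        sketch that replays only the loop order of "choleskyzhp" (the
        function name is made up for this note, and no arithmetic is
        performed). It assumes the early error return is not taken; if it
        is, the routine exits before "y" is read anyway.

```python
def y_is_always_current(n):
    """Replay the j/i loop order of choleskyzhp for an n x n matrix.

    Returns True if every off-diagonal step (which reads y) is preceded,
    within the same column j, by the diagonal step (which assigns y).
    """
    y_set_for_column = None
    for j in range(1, n + 1):          # do j = 1,n
        for i in range(j, n + 1):      # do i = j,n
            if i == j:
                y_set_for_column = j   # y = 1.0d0/tmp
            elif y_set_for_column != j:
                return False           # a(i,j) = x*y would read a stale y
    return True


if __name__ == "__main__":
    print(all(y_is_always_current(n) for n in range(0, 10)))
```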

        Here's the output from a gdb session:

frank:~/gccbug> gdb ./gccbug
GNU gdb 6.1-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-linux"...Using host libthread_db library 
"/lib/libthread_db.so.1".

(gdb) break choleskyzhp_
Breakpoint 1 at 0x804880a: file gccbug.f, line 53.
(gdb) run
Starting program: /home/frank/gccbug/gccbug 

Breakpoint 1, choleskyzhp_ (a=0xbfffe7b0, dim=0xbfffe7ac, n=0xbfffe79c, 
tol=0xbfffe7a0, error=0xbfffe7ac) at gccbug.f:53
53            subroutine choleskyzhp (a,dim,n,tol,error)
Current language:  auto; currently fortran
(gdb) disass
Dump of assembler code for function choleskyzhp_:
0x080487e9 <choleskyzhp_+0>:    push   %ebp
0x080487ea <choleskyzhp_+1>:    mov    %esp,%ebp
0x080487ec <choleskyzhp_+3>:    push   %edi
0x080487ed <choleskyzhp_+4>:    push   %esi
0x080487ee <choleskyzhp_+5>:    push   %ebx
0x080487ef <choleskyzhp_+6>:    sub    $0x8c,%esp
0x080487f5 <choleskyzhp_+12>:   mov    0x8(%ebp),%eax
0x080487f8 <choleskyzhp_+15>:   mov    %eax,0xffffffe4(%ebp)
0x080487fb <choleskyzhp_+18>:   mov    0x10(%ebp),%edx
0x080487fe <choleskyzhp_+21>:   mov    0x14(%ebp),%eax
0x08048801 <choleskyzhp_+24>:   mov    %eax,0xffffffe0(%ebp)
0x08048804 <choleskyzhp_+27>:   mov    0x18(%ebp),%eax
0x08048807 <choleskyzhp_+30>:   mov    %eax,0xffffffdc(%ebp)
0x0804880a <choleskyzhp_+33>:   mov    0xc(%ebp),%eax
0x0804880d <choleskyzhp_+36>:   mov    (%eax),%eax
0x0804880f <choleskyzhp_+38>:   mov    %eax,0xffffffd8(%ebp)
0x08048812 <choleskyzhp_+41>:   mov    0xffffffdc(%ebp),%eax
0x08048815 <choleskyzhp_+44>:   movl   $0x0,(%eax)
0x0804881b <choleskyzhp_+50>:   mov    $0x1,%edi
0x08048820 <choleskyzhp_+55>:   mov    (%edx),%eax
0x08048822 <choleskyzhp_+57>:   dec    %eax
0x08048823 <choleskyzhp_+58>:   mov    %eax,0xffffffcc(%ebp)
0x08048826 <choleskyzhp_+61>:   js     0x80489cc <choleskyzhp_+483>
0x0804882c <choleskyzhp_+67>:   mov    (%edx),%edx
0x0804882e <choleskyzhp_+69>:   mov    %edx,0xffffffc4(%ebp)
0x08048831 <choleskyzhp_+72>:   movsd  0x80490a0,%xmm0
0x08048839 <choleskyzhp_+80>:   movapd %xmm0,0xffffff78(%ebp)   <-- program will crash here
0x08048841 <choleskyzhp_+88>:   flds   0x804907c
0x08048847 <choleskyzhp_+94>:   flds   0x8049080
0x0804884d <choleskyzhp_+100>:  movsd  0x8049090,%xmm0
0x08048855 <choleskyzhp_+108>:  movapd %xmm0,0xffffff68(%ebp)
0x0804885d <choleskyzhp_+116>:  mov    %edi,%esi
0x0804885f <choleskyzhp_+118>:  mov    0xffffffc4(%ebp),%eax
0x08048862 <choleskyzhp_+121>:  sub    %edi,%eax
0x08048864 <choleskyzhp_+123>:  mov    %eax,0xffffffc8(%ebp)
0x08048867 <choleskyzhp_+126>:  js     0x80489be <choleskyzhp_+469>
0x0804886d <choleskyzhp_+132>:  fxch   %st(1)
0x0804886f <choleskyzhp_+134>:  lea    0xffffffff(%edi),%eax
0x08048872 <choleskyzhp_+137>:  imul   0xffffffd8(%ebp),%eax
0x08048876 <choleskyzhp_+141>:  mov    %eax,0xffffffc0(%ebp)
0x08048879 <choleskyzhp_+144>:  movapd 0xffffff78(%ebp),%xmm0
0x08048881 <choleskyzhp_+152>:  movapd %xmm0,0xffffff98(%ebp)
0x08048886 <choleskyzhp_+157>:  fstl   0xffffff90(%ebp)
0x08048889 <choleskyzhp_+160>:  fxch   %st(1)
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) step
61            error = 0
(gdb) step
62            do j = 1,n
(gdb) step
82            enddo
(gdb) stepi
0x08048822      82            enddo
(gdb) stepi
0x08048823      82            enddo
(gdb) stepi
0x08048826      82            enddo
(gdb) stepi
0x0804882c      82            enddo
(gdb) stepi
0x0804882e      82            enddo
(gdb) stepi
0x08048831      82            enddo
(gdb) stepi
0x08048839      82            enddo
(gdb) print $ebp
$1 = (PTR TO -> ( void )) 0xbfffe684   <-- ebp must be 0x.......8 for correct alignment
(gdb) stepi                            <-- now executing the incorrect movapd...

Program received signal SIGSEGV, Segmentation fault.
0x08048839 in choleskyzhp_ (a=0xbfffe7b0, dim=0x4, n=0x5, tol=0xbfffe7a0, 
error=0xbfffe7ac) at gccbug.f:82
82            enddo
(gdb) q

        I hope this is enough information for you to reproduce the bug.
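
        The address arithmetic behind the gdb session above can be spelled
        out as follows. This is a Python sketch using only the %ebp value
        and the "movapd" displacements shown in the session; the function
        names are made up for this note.

```python
def signed32(value):
    """Interpret a 32-bit displacement as printed by gdb as a signed offset."""
    return value - (1 << 32) if value & 0x80000000 else value


def effective_address(ebp, displacement):
    """Address of displacement(%ebp), wrapped to 32 bits."""
    return (ebp + signed32(displacement)) & 0xFFFFFFFF


EBP = 0xbfffe684  # value printed by "print $ebp" in the session
# Displacements of the movapd operands seen in the disassembly.
MOVAPD_DISPLACEMENTS = (0xffffff78, 0xffffff68, 0xffffff98)

if __name__ == "__main__":
    for disp in MOVAPD_DISPLACEMENTS:
        addr = effective_address(EBP, disp)
        print(f"{disp:#x}(%ebp) -> {addr:#x}, address mod 16 = {addr % 16}")
```

        The first displacement is -0x88, so the faulting store goes to
        0xbfffe5fc, which is 12 modulo 16. All three slots are offset from
        %ebp by 8 modulo 16, so they would be 16-byte aligned only if %ebp
        ended in 8.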

        One interesting observation:
        As you can see, the main program calls "matinvzhp", which in turn
        immediately calls "choleskyzhp", where the crash occurs. If the main
        program is rewritten to call "choleskyzhp" directly, the crash
        disappears. That is why the test program is rather long.

        The bug is also present in gcc-3.4.1 (the one currently in
        debian/sarge).
        The bug was not present in gcc-3.3.
        I've tested Pentium 4 as well as AMD64 processors (both have the SSE2
        extensions); the crash happens on both.
        I've also tested the program on an AMD64 processor in a 64-bit
        environment (corresponding to target x86_64); there the program
        runs fine.

>Fix:
        The available workarounds are:
        * not using option "-O"
        * not using option "-mfpmath=sse"
        * uncommenting the line "write (*,*) tmp" in choleskyzhp
          and recompiling (probably because this line disturbs the
          optimizer)
