Repository : ssh://darcs.haskell.org//srv/darcs/ghc

On branch  : master

http://hackage.haskell.org/trac/ghc/changeset/a9ce36118f0de3aeb427792f8f2c5ae097c94d3f

>---------------------------------------------------------------

commit a9ce36118f0de3aeb427792f8f2c5ae097c94d3f
Author: David M Peixotto <[email protected]>
Date:   Wed Oct 19 15:49:06 2011 -0500

    Change stack alignment to 16+8 bytes in STG code
    
    This patch changes the STG code so that %rsp is kept aligned
    to a 16-byte boundary + 8. This is the alignment required by
    the x86_64 ABI on entry to a function. Previously we kept
    %rsp aligned to a 16-byte boundary, but this was causing
    problems for the LLVM backend (see #4211).
    
    Since the stack is now 16+8 byte aligned in STG land on
    x86_64, we no longer need to mangle the stack manipulations
    with the LLVM stack mangler on x86_64 targets.
    
    This patch only modifies the alignment for x86_64 backends.
    
    Signed-off-by: David Terei <[email protected]>

>---------------------------------------------------------------

 compiler/llvmGen/LlvmMangler.hs   |    6 +++-
 compiler/nativeGen/X86/CodeGen.hs |   16 ++++++++------
 rts/StgCRun.c                     |   42 +++++++++++++++++++++---------------
 3 files changed, 37 insertions(+), 27 deletions(-)

diff --git a/compiler/llvmGen/LlvmMangler.hs b/compiler/llvmGen/LlvmMangler.hs
index 68e92cf..981bbf2 100644
--- a/compiler/llvmGen/LlvmMangler.hs
+++ b/compiler/llvmGen/LlvmMangler.hs
@@ -143,11 +143,13 @@ fixTables ss = fixed
     have been pushed, so sub 4). GHC though since it always uses jumps keeps
     the stack 16 byte aligned on both function calls and function entry.
 
-    We correct the alignment here.
+    We correct the alignment here for Mac OS X i386. The x86_64 target already
+    has the correct alignment since we keep the stack 16+8 aligned throughout
+    STG land for 64-bit targets.
 -}
 fixupStack :: B.ByteString -> B.ByteString -> B.ByteString
 
-#if !darwin_TARGET_OS
+#if !darwin_TARGET_OS || x86_64_TARGET_ARCH
 fixupStack = const
 
 #else
diff --git a/compiler/nativeGen/X86/CodeGen.hs b/compiler/nativeGen/X86/CodeGen.hs
index 1efa327..458f379 100644
--- a/compiler/nativeGen/X86/CodeGen.hs
+++ b/compiler/nativeGen/X86/CodeGen.hs
@@ -1842,15 +1842,17 @@ genCCall64 target dest_regs args =
             tot_arg_size = arg_size * length stack_args
 
             -- On entry to the called function, %rsp should be aligned
-            -- on a 16-byte boundary +8 (i.e. the first stack arg after
-            -- the return address is 16-byte aligned).  In STG land
-            -- %rsp is kept 16-byte aligned (see StgCRun.c), so we just
-            -- need to make sure we push a multiple of 16-bytes of args,
-            -- plus the return address, to get the correct alignment.
+            -- on a 16-byte boundary +8 (i.e. the first stack arg
+            -- above the return address is 16-byte aligned).  In STG
+            -- land %rsp is kept 8-byte aligned (see StgCRun.c), so we
+            -- just need to make sure we pad by eight bytes after
+            -- pushing a multiple of 16-bytes of args to get the
+            -- correct alignment. If we push an odd number of eight byte
+            -- arguments then no padding is needed.
             -- Urg, this is hard.  We need to feed the delta back into
             -- the arg pushing code.
         (real_size, adjust_rsp) <-
-            if tot_arg_size `rem` 16 == 0
+            if (tot_arg_size + 8) `rem` 16 == 0
                 then return (tot_arg_size, nilOL)
                 else do -- we need to adjust...
                     delta <- getDeltaNat
@@ -1865,7 +1867,7 @@ genCCall64 target dest_regs args =
         delta <- getDeltaNat
 
         -- deal with static vs dynamic call targets
-        (callinsns,cconv) <-
+        (callinsns,_cconv) <-
           case target of
             CmmCallee (CmmLit (CmmLabel lbl)) conv
                -> -- ToDo: stdcall arg sizes
diff --git a/rts/StgCRun.c b/rts/StgCRun.c
index 7251e64..11e0543 100644
--- a/rts/StgCRun.c
+++ b/rts/StgCRun.c
@@ -267,28 +267,35 @@ StgRunIsImplementedInAssembler(void)
        "addq %0, %%rsp\n\t"
        "retq"
 
-       : : "i"(RESERVED_C_STACK_BYTES+48+8 /*stack frame size*/));
+       : : "i"(RESERVED_C_STACK_BYTES+48 /*stack frame size*/));
     /* 
-       HACK alert!
-
-       The x86_64 ABI specifies that on a procedure call, %rsp is
+       The x86_64 ABI specifies that on entry to a procedure, %rsp is
        aligned on a 16-byte boundary + 8.  That is, the first
        argument on the stack after the return address will be
        16-byte aligned.  
        
-       Which should be fine: RESERVED_C_STACK_BYTES+48 is a multiple
-       of 16 bytes.  
-       
-       BUT... when we do a C-call from STG land, gcc likes to put the
-       stack alignment adjustment in the prolog.  eg. if we're calling
-       a function with arguments in regs, gcc will insert 'subq $8,%rsp'
-       in the prolog, to keep %rsp aligned (the return address is 8
-       bytes, remember).  The mangler throws away the prolog, so we
-       lose the stack alignment.
-
-       The hack is to add this extra 8 bytes to our %rsp adjustment
-       here, so that throughout STG code, %rsp is 16-byte aligned,
-       ready for a C-call.  
+       We maintain the 16+8 stack alignment throughout the STG code.
+
+       When we call STG_RUN the stack will be aligned to 16+8. We used
+       to subtract an extra 8 bytes so that %rsp would be 16 byte
+       aligned at all times in STG land. This worked fine for the
+       native code generator which knew that the stack was already
+       aligned on 16 bytes when it generated calls to C functions.
+
+       This arrangement caused problems for the LLVM backend. The LLVM
+       code generator would assume that on entry to each function the
+       stack is aligned to 16+8 as required by the ABI. However, since
+       we only enter STG functions by jumping to them with tail calls,
+       the stack was actually aligned to a 16-byte boundary. The LLVM
+       backend had its own mangler that would post-process the
+       assembly code to fix up the stack manipulation code to maintain
+       the correct alignment (see #4211).
+
+       Therefore, we now keep the stack aligned to 16+8 while in
+       STG land so that LLVM generates correct code without any
+       mangling. The native code generator can handle this alignment
+       just fine by making sure the stack is aligned to a 16-byte
+       boundary before it makes a C-call.
 
        A quick way to see if this is wrong is to compile this code:
 
@@ -300,7 +307,6 @@ StgRunIsImplementedInAssembler(void)
        stack isn't aligned, and calling exitWith from Haskell invokes
        shutdownHaskellAndExit using a C call.
 
-       Future gcc releases will almost certainly break this hack...
     */
 }
 



_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc
