http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59086

--- Comment #3 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
First of all, reload also cannot generate this code with the mentioned i386.c
change.

When we use -maccumulate-outgoing-args, we have before reload/LRA

(insn 6 10 7 2 (parallel [
            (set (reg:SI 87 [ x ])
                (asm_operands:SI ("1:
") ("=d") 0 [
                        (mem/c:SI (reg/f:SI 16 argp) [0 count+0 S4 A32])
                        (mem/c:V2DI (plus:SI (reg/f:SI 20 frame)
                                (const_int -32 [0xffffffffffffffe0])) [0 tmp1+0 S16 A128])
                        (reg:SI 87 [ x ])
                    ]
                     [
                        (asm_input:SI ("m") (null):0)
                        (asm_input:V2DI ("m") (null):0)
                        (asm_input:SI ("0") (null):0)
                    ]
                     [] b2.c:7))
            (clobber (reg:QI 18 fpsr))
            (clobber (reg:QI 17 flags))
            (clobber (reg:QI 0 ax))
            (clobber (reg:QI 5 di))
            (clobber (reg:QI 4 si))
            (clobber (reg:QI 2 cx))
        ]) b2.c:7 -1

and argp is transformed after lra/reload into fp+const:

(insn 6 10 7 2 (parallel [
            (set (reg:SI 1 dx [orig:87 x ] [87])
                (asm_operands:SI ("1:
") ("=d") 0 [
                        (mem/c:SI (plus:SI (reg/f:SI 6 bp)
                                (const_int 8 [0x8])) [0 count+0 S4 A32])
                        (mem/c:V2DI (reg/f:SI 7 sp) [0 tmp1+0 S16 A128])
                        (reg:SI 1 dx [orig:87 x ] [87])
                    ]

If we don't use -maccumulate-outgoing-args, we have before reload/LRA:

(insn/f 10 3 2 2 (set (reg:SI 89)
        (reg:SI 2 cx)) 86 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 2 cx)
        (expr_list:REG_CFA_SET_VDRAP (reg:SI 89)
            (nil))))
...
(insn 6 11 7 2 (parallel [
            (set (reg:SI 87 [ x ])
                (asm_operands:SI ("1:
") ("=d") 0 [
                        (mem/c:SI (reg:SI 89) [0 count+0 S4 A32])
                        (mem/c:V2DI (plus:SI (reg/f:SI 20 frame)
                                (const_int -32 [0xffffffffffffffe0])) [0 tmp1+0 S16 A128])
                        (reg:SI 87 [ x ])
                    ]

As we have only one free reg for the asm (four regs are clobbered in the asm,
ebx is taken for -fPIC, and ebp is always needed for -mstackrealign), we cannot
use it for both p87 (it has a + constraint) and p89.  The generated code also
has no equiv for p89 that could be used in its place.  So reload/LRA can do
nothing in this situation.
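
The register count above can be checked with a small sketch.  This is only an
illustration of the arithmetic, not GCC code: the enum, free_gprs() and
pr59086_unavailable() are hypothetical names, and treating esp as reserved is
an assumption (it is the stack pointer and is not listed explicitly above).

```c
/* Hypothetical illustration only; none of these names exist in GCC. */

/* The eight 32-bit general-purpose registers on i386. */
enum { EAX, ECX, EDX, EBX, ESI, EDI, EBP, ESP, NUM_GPRS };

/* Count the registers that are not set in the "unavailable" bitmask. */
static int free_gprs(unsigned unavailable)
{
    int n = 0;
    for (int r = 0; r < NUM_GPRS; r++)
        if (!(unavailable & (1u << r)))
            n++;
    return n;
}

/* Registers that cannot be allocated for the asm operands in this PR. */
static unsigned pr59086_unavailable(void)
{
    /* Clobbered by the asm itself: ax, di, si, cx. */
    unsigned clobbered = (1u << EAX) | (1u << EDI) | (1u << ESI) | (1u << ECX);
    /* ebx for -fPIC, ebp for -mstackrealign; esp assumed reserved as the
       stack pointer. */
    unsigned reserved = (1u << EBX) | (1u << EBP) | (1u << ESP);
    return clobbered | reserved;
}
```

Under these assumptions free_gprs(pr59086_unavailable()) gives 1 (only edx is
left), while the insn needs two registers: one for p87 and one for p89.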

Even if I implement fp elimination in the presence of sp changes in RTL (and I
am working on it, as it is needed for other existing PRs and is very important
for generated code performance), bp will still not be free (again because of
-mstackrealign).  So if we want to compile the code, we should revert the
original change:

-  /* ??? Unwind info is not correct around the CFG unless either a frame
-     pointer is present or M_A_O_A is set.  Fixing this requires rewriting
-     unwind info generation to be aware of the CFG and propagating states
-     around edges.  */
-  if ((flag_unwind_tables || flag_asynchronous_unwind_tables
-       || flag_exceptions || flag_non_call_exceptions)
-      && flag_omit_frame_pointer
-      && !(target_flags & MASK_ACCUMULATE_OUTGOING_ARGS))
-    {
-      if (target_flags_explicit & MASK_ACCUMULATE_OUTGOING_ARGS)
-       warning (0, "unwind tables currently require either a frame pointer "
-                "or %saccumulate-outgoing-args%s for correctness",
-                prefix, suffix);
-      target_flags |= MASK_ACCUMULATE_OUTGOING_ARGS;
-    }
-

although we could modify the comment if it is not true anymore.

Still, the code will be broken for some tunings that use
-mno-accumulate-outgoing-args by default.

To really solve the problem, we should free bp somehow.  Probably it can be
done by a smarter stack realign implementation or by saving/restoring bp around
the asm.  The latter is a very complicated task and cannot be done in the
gcc-4.9 time frame.
