Hi, consider the following C code:

static __inline__ __attribute__((__always_inline__))
_Fract rbits (const int i)
{
    _Fract f;
    __builtin_memcpy (&f, &i, sizeof (_Fract));
    return f;
}

_Fract func (void)
{
#if B == 1
    return rbits (0x1234);
#elif B == 2
    return 0.14222r;
#endif
}


Type-punning idioms like the one in rbits above are very common in libgcc, for
example in fixed-bit.c.
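
For readers without an avr toolchain, the same memcpy idiom can be sketched in
portable C with standard types.  This is only an illustration: the helper name
float_from_bits is hypothetical, and the sketch assumes that float is a 32-bit
IEEE-754 binary32 value, which the C standard does not guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this host.  */
_Static_assert (sizeof (float) == sizeof (uint32_t),
                "float must be 32 bits wide for this sketch");

/* Hypothetical helper, analogous to rbits above: reinterpret the bits
   of a 32-bit integer as a float without breaking aliasing rules.  */
static inline float
float_from_bits (uint32_t bits)
{
    float f;
    memcpy (&f, &bits, sizeof f);   /* copies the object representation */
    return f;
}
```

On an IEEE-754 host, float_from_bits (0x3f800000u) should compare equal to
1.0f.  As with rbits, one would hope that the copy is folded away entirely.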

In this example, both compilation variants are equivalent (provided int and
_Fract are 16 bits wide).  The problem with the B=1 variant is that it is
inefficient:

Variant B=2 needs 2 instructions.
Variant B=1 needs 11 instructions, 9 of which are not needed at all.
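
The equivalence of the two variants can be checked on a host machine.  The
sketch below assumes that _Fract on avr uses the s.15 layout (1 sign bit, 15
fractional bits), so that a raw bit pattern n represents n / 2^15.  The helper
name fract_bits_from_double is hypothetical, and it truncates toward zero,
which may differ from the compiler's rounding for other constants.

```c
#include <stdint.h>

/* Hypothetical helper: raw 16-bit pattern of a value in [-1, 1),
   assuming an s.15 fixed-point layout (15 fractional bits).
   Truncates toward zero.  */
static int16_t
fract_bits_from_double (double x)
{
    return (int16_t) (x * 32768.0);   /* 32768 = 2^15 */
}
```

Under these assumptions, 0.14222 * 32768 is about 4660.27, so
fract_bits_from_double (0.14222) yields 4660, i.e. 0x1234, the same value that
is passed to rbits in the B=1 variant.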

The problem arises as follows:

The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to memory
moves through the frame:


(insn 5 4 6 (set (reg:HI 45)
        (const_int 4660 [0x1234])) bloat.c:5 -1
     (nil))

(insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])
        (reg:HI 45)) bloat.c:5 -1
     (nil))

(insn 7 6 8 (set (reg:HQ 46)
        (mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1
     (nil))

(insn 8 7 9 (set (reg:HQ 43 [ <retval> ])
        (reg:HQ 46)) bloat.c:12 -1
     (nil))


Is there a specific reason why this is not expanded as a subreg, like this?


  (set (reg:HQ 46)
       (subreg:HQ (reg:HI 45) 0))


The insns are analyzed in .dfinit:

;;  regs ever live       24[r24] 25[r25] 29[r29]

24/25 is the return register, 28/29 is the frame pointer:


(insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28)
                (const_int 1 [0x1])) [2 S2 A8])
        (reg:HI 45)) bloat.c:5 82 {*movhi}


Is there a reason why R28 is not marked as live?


The memory accesses are optimized out in .fwprop2 so that no frame pointer is
needed any more.

However, in the subsequent passes, R29 is still reported as "regs ever live"
which is not correct.

The "regs ever live" set doesn't change until .ira, where R28 springs to life again:

;;  regs ever live       24[r24] 25[r25] 28[r28] 29[r29]

And in .reload:

;;  regular block artificial uses        28 [r28] 32 [__SP_L__]
;;  eh block artificial uses     28 [r28] 32 [__SP_L__] 34 [argL]
;;  entry block defs     8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
;;  exit block uses      24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
;;  regs ever live       24[r24] 25[r25] 28[r28] 29[r29]

The outcome is that the frame pointer is set up needlessly, which is very
costly on avr.


The compiler is trunk from 2013-01-16 configured for avr:

   --target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2

The example is compiled with

   avr-gcc bloat.c -S -dp -save-temps -Os -da -fdump-tree-optimized -DB=1


At what stage does the misoptimization happen?

Is this worth a PR? (I.e. is there a chance that anybody cares?)

Thanks.
