https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71374

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-06-02
                 CC|                            |vmakarov at gcc dot gnu.org
          Component|target                      |rtl-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
This is a register allocator failure, as is evident from an even more
simplified testcase:

int a, b, c;
extern inline void fn1 (void *p1, void *p2)
{ 
  __asm__ ("#": "=&c" (a), "=&D" (b), "=&S" (c): "r" (p2), "2" (p2));
}


LRA gets the following RTX:

(insn 10 4 7 2 (parallel [
            (set (reg:SI 89)
                (asm_operands:SI ("#") ("=&c") 0 [
                        (reg/v/f:DI 88 [ p2 ])
                        (reg/v/f:DI 88 [ p2 ])
                    ]
                     [
                        (asm_input:DI ("r") t.c:4)
                        (asm_input:DI ("2") t.c:4)
                    ]
                     [] t.c:4))
            (set (reg:SI 90)
                (asm_operands:SI ("#") ("=&D") 1 [
                        (reg/v/f:DI 88 [ p2 ])
                        (reg/v/f:DI 88 [ p2 ])
                    ]
                     [
                        (asm_input:DI ("r") t.c:4)
                        (asm_input:DI ("2") t.c:4)
                    ]
                     [] t.c:4))
            (set (reg:SI 91)
                (asm_operands:SI ("#") ("=&S") 2 [
                        (reg/v/f:DI 88 [ p2 ])
                        (reg/v/f:DI 88 [ p2 ])
                    ]
                     [
                        (asm_input:DI ("r") t.c:4)
                        (asm_input:DI ("2") t.c:4)
                    ]
                     [] t.c:4))
            (clobber (reg:CCFP 18 fpsr))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 -1
     (expr_list:REG_DEAD (reg/v/f:DI 88 [ p2 ])
        (expr_list:REG_UNUSED (reg:CCFP 18 fpsr)
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

Please note how both asm inputs are tied through the p2 variable. LRA ties the
"2" matching constraint with the "=&S" earlyclobber output constraint (BTW:
matching an earlyclobber output is allowed), but it can't resolve the tie
through p2. This results in:

(insn 10 4 15 2 (parallel [
            (set (reg:SI 2 cx [89])
                (asm_operands:SI ("#") ("=&c") 0 [
                        (reg/v/f:DI 4 si [orig:88 p2 ] [88])
                        (reg/v/f:DI 4 si [orig:88 p2 ] [88])
                    ]
                     [
                        (asm_input:DI ("r") t.c:4)
                        (asm_input:DI ("2") t.c:4)
                    ]
                     [] t.c:4))
            (set (reg:SI 5 di [90])
                (asm_operands:SI ("#") ("=&D") 1 [
                        (reg/v/f:DI 4 si [orig:88 p2 ] [88])
                        (reg/v/f:DI 4 si [orig:88 p2 ] [88])
                    ]
                     [
                        (asm_input:DI ("r") t.c:4)
                        (asm_input:DI ("2") t.c:4)
                    ]
                     [] t.c:4))
            (set (reg:SI 4 si [orig:88 p2 ] [88])
                (asm_operands:SI ("#") ("=&S") 2 [
                        (reg/v/f:DI 4 si [orig:88 p2 ] [88])
                        (reg/v/f:DI 4 si [orig:88 p2 ] [88])
                    ]
                     [
                        (asm_input:DI ("r") t.c:4)
                        (asm_input:DI ("2") t.c:4)
                    ]
                     [] t.c:4))
            (clobber (reg:CCFP 18 fpsr))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 -1
     (nil))

which results in reg SI being allocated to asm input 0. This violates the
earlyclobber requirement that "this operand may not lie in a register that is
read by the instruction or as part of any memory address" for output operand 2,
which is also reg SI.

LRA should copy asm input 0 to a temporary reg of an appropriate class in the
above case.
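
The earlyclobber rule quoted above can be sketched as a small standalone
check. This is only a toy model and not LRA code: the struct names, the
earlyclobber_ok() helper and the choice of AX as the temporary are assumptions
made for illustration. The hard register numbers follow the RTL dump
(cx = 2, si = 4, di = 5).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hard register numbers as they appear in the RTL dump; AX is an
   assumed choice for the temporary. */
enum { AX = 0, CX = 2, SI = 4, DI = 5 };

/* Hypothetical model of asm operands after allocation. */
struct asm_output { int hard_reg; bool earlyclobber; };
struct asm_input  { int hard_reg; int matched_output; /* -1: not tied */ };

/* Return true if no earlyclobber output shares a hard register with an
   input, except for the input tied to that output by a matching
   constraint. */
static bool earlyclobber_ok (const struct asm_output *outs, size_t n_out,
                             const struct asm_input *ins, size_t n_in)
{
  for (size_t o = 0; o < n_out; o++)
    {
      if (!outs[o].earlyclobber)
        continue;
      for (size_t i = 0; i < n_in; i++)
        {
          if (ins[i].matched_output == (int) o)
            continue;           /* the tied input may share the register */
          if (ins[i].hard_reg == outs[o].hard_reg)
            return false;       /* input read from an earlyclobber reg */
        }
    }
  return true;
}

/* Outputs "=&c", "=&D", "=&S" as allocated in the second dump. */
static const struct asm_output outputs[] = {
  { CX, true }, { DI, true }, { SI, true }
};

/* Allocation from the dump: both inputs live in si. */
bool reported_allocation_ok (void)
{
  const struct asm_input ins[] = { { SI, -1 }, { SI, 2 } };
  return earlyclobber_ok (outputs, 3, ins, 2);
}

/* Suggested outcome: input 0 copied to a temporary (AX assumed here). */
bool fixed_allocation_ok (void)
{
  const struct asm_input ins[] = { { AX, -1 }, { SI, 2 } };
  return earlyclobber_ok (outputs, 3, ins, 2);
}

/* The reported allocation breaks the rule, the copied one does not. */
void check_model (void)
{
  assert (!reported_allocation_ok ());
  assert (fixed_allocation_ok ());
}
```

In this model the reported allocation is rejected because the "r" input shares
si with the "=&S" output without being tied to it, while moving that input to
another general register satisfies the check.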

Confirmed as an rtl-optimization problem.
