On 14/01/15 02:20 PM, Robert Suchanek wrote:
> Hi Vladimir,
>
> An issue has been identified with LRA when running CPU2006 h264ref benchmark.
>
> I'll try to describe what the issue is and a fix applied as it is very
> difficult to reproduce it and it is next to impossible to create a narrowed
> testcase on top of the source code restrictions.
>
> The concerned LRA code in lra-constraints.c is the following:
>
> if (GET_CODE (*loc) == SUBREG)
>   {
>     reg = SUBREG_REG (*loc);
>     byte = SUBREG_BYTE (*loc);
>     if (REG_P (reg)
>         /* Strict_low_part requires reload the register not
>            the sub-register.  */
>         && (curr_static_id->operand[i].strict_low
>             || (GET_MODE_SIZE (mode)
>                 <= GET_MODE_SIZE (GET_MODE (reg))
>                 && (hard_regno
>                     = get_try_hard_regno (REGNO (reg))) >= 0
>                 && (simplify_subreg_regno
>                     (hard_regno,
>                      GET_MODE (reg), byte, mode) < 0)
>                 && (goal_alt[i] == NO_REGS
>                     || (simplify_subreg_regno
>                         (ira_class_hard_regs[goal_alt[i]][0],
>                          GET_MODE (reg), byte, mode) >= 0))))
>       {
>         loc = &SUBREG_REG (*loc);
>         mode = GET_MODE (*loc);
>       }
>   }
>
> The above works just fine when we deal with strict_low_part or a subreg
> smaller than a word.
>
> However, multi-word operations that were emitted as a sequence of operations
> on word sized parts of the DImode register appears to expose a problem with
> LRA e.g. '(set (subreg: SI (reg: DI)) ...)'.
> LRA does not realize that it actually uses the other halve of the DI-mode
> register leading to a situation where it modifies one halve of the result and
> spills the whole register with the other halve undefined.
>
> In the dump I can see the following:
>
>       Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
>  1487: r1552:DI#4=r1404:SI+r1509:SI
>       REG_DEAD r1509:SI
>       REG_DEAD r1404:SI
>     Inserting insn reload after:
>  1735: r521:DI=r1552:DI
>
> There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted
> to load the value before modifying it but it is spilled.
>
> As it is a multi-word register, the split pass emits an additional instruction
> to load the whole 64-bit value but since one halve was modified, only
> register $20 appears in the live-in set. In contrast to $20, $21 is being used
> but not added to the live-in set.
>
> ...
> ;; live  in      4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13]
> [$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25]
> 29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec]
> ...
> (insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521])
>     (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
>               (const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288
> {*movsi_internal}
>      (nil))
> (insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ])
>     (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
>               (const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288
> {*movsi_internal}
>      (nil))
> ...
>
> The potential fix for this is to promote the type of a subreg OP_OUT to 
> OP_INOUT
> to treat the pseudo register (r1552 in this case) as input and LRA will be 
> forced
> to insert a reload before modifying its contents. 
>
> Handling of strict_low_part case is fine as the operand is described in the 
> MD pattern
> as IN_OUT through modifiers.
>
> With the above change in place, we get a reload before assignment:
>
>       Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
>  1487: r1552:DI#4=r1404:SI+r1509:SI
>       REG_DEAD r1509:SI
>       REG_DEAD r1404:SI
>     Inserting insn reload before:
>  1735: r1552:DI=r521:DI
>     Inserting insn reload after:
>  1736: r521:DI=r1552:DI
>
> and the benchmark happily passes the runtime check.
>
> The question is whether changing the type to OP_INOUT is the correct and
> valid fix?
>
> Regards,
> Robert
>
> 2015-01-14  Robert Suchanek  <robert.sucha...@imgtec.com>
>
> gcc/
>         * lra-constraints.c (curr_insn_transform): Change the type of a reload
>         pseudo to OP_INOUT.
> ---
>  gcc/lra-constraints.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index ec28b7f..018968b 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -3798,6 +3798,7 @@ curr_insn_transform (void)
>                                   (ira_class_hard_regs[goal_alt[i]][0],
>                                    GET_MODE (reg), byte, mode) >= 0)))))
>                 {
> +                 type = OP_INOUT;
>                   loc = &SUBREG_REG (*loc);
>                   mode = GET_MODE (*loc);
>                 }
>
Robert, thanks for working on the issue.  Your approach is in the right
direction.  But I believe the change should be

if (type == OP_OUT)
  type = OP_INOUT;

Otherwise, I will generate still a right code but an additional insn
which will not go away later.

With this change the patch is ok.  Please, check x86/x86-64 target
bootstrap and GCC testsuite before submitting the patch.  It seems there
will be no problem with the patch on other targets but it is better to
check to be sure.


Reply via email to