[Bug rtl-optimization/77499] New: Regression after code-hoisting, due to combine pass failing to evaluate known value range

avieira at gcc dot gnu.org Tue, 06 Sep 2016 08:54:42 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77499


            Bug ID: 77499
           Summary: Regression after code-hoisting, due to combine pass
                    failing to evaluate known value range
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

Hello,

We are seeing a regression for arm-none-eabi on a Cortex-M7. This regression
was observed after Biener's and Bosscher's GIMPLE code-hoisting patch
(https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00360.html). The example below
will illustrate the regression:

$ cat t.c
unsigned short foo (unsigned short x, int c, int d, int e)
{
  unsigned short y;

  while (c > d)
    {
      if (c % 3)
        {
         x = (x >> 1) ^ 0xB121U;
        }
      else
         x = x >> 1;
      c-= e;
    }
  return x;
}

Comparing:
$ arm-none-eabi-gcc -mcpu=cortex-m7 -mthumb -O2 -S t.c 
vs
$ arm-none-eabi-gcc -mcpu=cortex-m7 -mthumb -O2 -S t.c -fno-code-hoisting

Will illustrate that the code-hoisting version has an extra zero_extension of
HImode to SImode. After some digging I found out that during the combine phase,
the no-code-hoisting version is able to recognize that the
'last_set_nonzero_bits' are 0x7fff whereas for the code-hoisted version it
seems to have lost this knowledge.

Looking at the graph dump for the no-code-hoisting t.c.246r.ud_dce.dot I see
the following insns:
23: r125:SI=r113:SI 0>>0x1
24: r111:SI=zero_extend(r125:SI#0)
27: r128:SI=r111:SI^r131:SI
28: r113:SI=zero_extend(r128:SI#0)

These are all in the same basic block. 

For the code-hoisting version we have:

BB A:
...
12: r116:SI=r112:SI 0>>0x1
13: r112:SI=zero_extend(r116:SI#0)
...
BB B:
27: r127:SI=r112:SI^r129:SI
28: r112:SI=zero_extend(r127:SI#0)


Now from what I have observed by debugging the combine pass is that combine
will first combine instructions 23 and 24. 
Here combine is able to optimize away the zero_extend, because in
'reg_nonzero_bits_for_combine' the reg_stat[113] has its 'last_set_value' to
'r0' (the unsigned short argument) and its corresponding
'last_set_nonzero_bits' to 0xffffffff0000ffff. Which means the zero_extend is
pointless. The code-hoisting version also combines 12 and 13, optimizing away
the zero_extend.

However, the next set of instructions is where things get tricky. In the
no-code-hoisting case it will end up combining all 4 instructions one by one
from the top down and it will end up figuring out that the last zero_extend is
also not required. For the code-hoisting case, when it tries to combine 28 with
27 (remember they are not in the basic block as 13 and 14, so it will never try
to combine all 4), it will eventually try to evaluate the nonzero bits based on
r112 and see that the last_set_value for r112 is:
(lshiftrt:SI (clobber:SI (const_int 0 [0]))
(const_int 1 [0x1]))

The last_set_nonzero_bits will be 0x7fffffff, instead of the expected
0x00007fff. This looks like the result of the code in 'record_value_for_reg' in
combine.c that sits bellow the comment:

  /* The value being assigned might refer to X (like in "x++;").  In that
     case, we must replace it with (clobber (const_int 0)) to prevent
     infinite loops.  */

Given that 12 and 13 were combined into:
r112:SI=r112:SI 0>>0x1

This seems to be invalidating the last_set_value and thus leaving us with a
weaker 'last_set_nonzero_bits' which isn't enough to optimize away the
zero_extend.

Any clue on how to "fix" this?

Cheers,
Andre

PS: I am not sure I completely understand the way the last_set_value stuff
works for pseudo's in combine, but it looks to me like each instruction is
visited in a top down order per basic block again in a top-down order. And each
pseudo will have its 'last_set_value' according to the last time that register
was seen being set, without any regards to loop or proper dataflow analysis.
Can anyone explain to me how this doesnt go horribly wrong?

[Bug rtl-optimization/77499] New: Regression after code-hoisting, due to combine pass failing to evaluate known value range

Reply via email to