[Bug rtl-optimization/90174] New: Bad register spill due to top-down allocation order

fxue at os dot amperecomputing.com Thu, 18 Apr 2019 22:43:38 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90174


            Bug ID: 90174
           Summary: Bad register spill due to top-down allocation order
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fxue at os dot amperecomputing.com
  Target Milestone: ---

Current regional RA uses a top-down allocation order, which may not properly
split a long live range that crosses sub-region with high register pressure. 

In the following graph, lv0 is live in whole outer region, and suppose inner
region is under high register pressure due to lots of live ranges inside it.

According to RA algorithm, out region is processed firstly, lv0 is picked up as
spill candidate. And then turn to inner region, also the part of lv0 in inner
region is marked as being spilled. Finally result is that the whole lv0 should
be spilled. But if in area excluding inner region, there is with low register
pressure, we can get a better choice to place lv0 in register instead of
memory, and only spill/reload lv0 at boundary of entry-into(A)/exist-from(B)
inner region. In other word, inner region boundary are split points for lv0.


               |  
       outer   | lv0 
       region  | __________  split point
               |/
    .----------A---------------.
    |          |               |
    |          |  |         |  |
    |  inner   |  | lv1     |  | 
    |  region  |  |         |  |
    |          |  |     lv2 |  |
    |          |  |         |  |
    |          |               |
    '----------B---------------'
               |\__________
               |            split point
               |

Here is an example to show this. gcc produces bad spills as we point out(-m64
-O3, for x86-64), but llvm generates better spill/reload as we expect.

  int value[20];

  int user0;
  int user1;
  int user2[100];
  int user3;

  int fncall(void);

  void foo(int cond)
  {
      int lv_out = value[0];
      int i;

      user0 = lv_out;   /* Better to place lv_out in register. */

      if (cond) {
          int sum = 0;
          int lv_in_1 = value[1];
          int lv_in_2 = value[2];
          int lv_in_3 = value[3];
          int lv_in_4 = value[4];
          int lv_in_5 = value[5];
          int lv_in_6 = value[6];
          int lv_in_7 = value[7];
          int lv_in_8 = value[8];
          int lv_in_9 = value[9];
          int lv_in_10 = value[10];
          int lv_in_11 = value[11];
          int lv_in_12 = value[12];
          int lv_in_13 = value[13];
          int lv_in_14 = value[14];
          int lv_in_15 = value[15];

          /* Better to spill lv_out here */

          for (i = 0; i < 1000; i++) {
              sum += lv_in_1;
              sum += lv_in_2;
              sum += lv_in_3;
              sum += lv_in_4;
              sum += lv_in_5;
              sum += lv_in_6;
              sum += lv_in_7;
              sum += lv_in_8;
              sum += lv_in_9;
              sum += lv_in_10;
              sum += lv_in_11;
              sum += lv_in_12;
              sum += lv_in_13;
              sum += lv_in_14;
              sum += lv_in_15;

              fncall();

              lv_in_1 ^= i;
              lv_in_2 ^= i;
              lv_in_3 ^= i;
              lv_in_4 ^= i;
              lv_in_5 ^= i;
              lv_in_6 ^= i;
              lv_in_7 ^= i;
              lv_in_8 ^= i;
              lv_in_9 ^= i;
              lv_in_10 ^= i;
              lv_in_11 ^= i;
              lv_in_12 ^= i;
              lv_in_13 ^= i;
              lv_in_14 ^= i;
              lv_in_15 ^= i;
          }

          /* Better to reload lv_out here */

          user1 = sum;
      }

      for (i = 0; i < 100; i++) {
          user2[i ^ 100] = lv_out; /* Better to place lv_out in register */
      }

      user3 = lv_out;  /* Better to place lv_out in register */
  }


For top-down allocation, we can only adjust inner region allocation result, but
no way to refine decision that has been made on outside live-range, it is an
intrinsic weakness of the top-down algorithm. To fix that, we may need to add a
new pass to explicitly split live ranges based on region boundary, or adopt a
reverse means, from inner to outer to perform allocation.

[Bug rtl-optimization/90174] New: Bad register spill due to top-down allocation order

Reply via email to