[Bug rtl-optimization/96796] [9 Regression] aarch64: ICE during RTL pass: reload

2021-04-25 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796

rsandifo at gcc dot gnu.org  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #12 from rsandifo at gcc dot gnu.org ---
Fixed for GCC 9 and above.  Thanks for the bug report.

[Bug rtl-optimization/96796] [9 Regression] aarch64: ICE during RTL pass: reload

2021-04-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796

--- Comment #11 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:49cc1253d079bbefc18275f29adc526679422176

commit r9-9463-g49cc1253d079bbefc18275f29adc526679422176
Author: Richard Sandiford 
Date:   Sun Apr 25 14:51:14 2021 +0100

lra: Avoid cycling on certain subreg reloads [PR96796]

This PR is about LRA cycling for a reload of the form:

---
Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI]
  Creating newreg=287, assigning class ALL_REGS to slow/invalid mem r287
  Creating newreg=288, assigning class ALL_REGS to slow/invalid mem r288
  103: r203:SI=r288:SI<<0x1+r196:DI#0
  REG_DEAD r196:DI
Inserting slow/invalid mem reload before:
  316: r287:DI=[r105:DI*0x8+r140:DI]
  317: r288:SI=r287:DI#0
---


The problem is with r287.  We rightly give it a broad starting class of
POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class).
However, we never make forward progress towards narrowing it down to
a specific choice of class (POINTER_REGS or FP_REGS).

I think in practice we rely on two things to narrow a reload pseudo's
class down to a specific choice:

(1) a restricted class is specified when the pseudo is created

This happens for input address reloads, where the class is taken
from the target's chosen base register class.  It also happens
for simple REG reloads, where the class is taken from the chosen
alternative's constraints.

(2) uses of the reload pseudo as a direct input operand

In this case get_reload_reg tries to reuse the existing register
and narrow its class, instead of creating a new reload pseudo.
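
The two narrowing paths above can be modelled in a few lines of C.  This is
only a sketch: the toy_ names, the bitmask encoding of register classes and
the mask values are assumptions made for illustration, and none of it
corresponds to LRA's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: a register class is a bitmask of hard registers.  The values
   are made up for illustration; they are not AArch64's real classes.  */
typedef uint64_t toy_class;

#define TOY_POINTER_REGS        ((toy_class) 0x00000000ffffffffULL)
#define TOY_FP_REGS             ((toy_class) 0xffffffff00000000ULL)
#define TOY_POINTER_AND_FP_REGS (TOY_POINTER_REGS | TOY_FP_REGS)

struct toy_pseudo
{
  int regno;
  toy_class cls;
};

/* Path (1): the pseudo is created with an already-restricted class.  */
struct toy_pseudo
toy_create_reload (int regno, toy_class cls)
{
  struct toy_pseudo p = { regno, cls };
  return p;
}

/* Path (2): P is reloaded as an input operand that needs class WANTED.
   If P is used directly (not wrapped in a subreg) and WANTED is a subset
   of its current class, narrow P in place and return true.  Otherwise
   leave P untouched and return false, meaning "a new pseudo is needed".  */
bool
toy_reuse_as_input (struct toy_pseudo *p, toy_class wanted, bool via_subreg)
{
  assert (p);
  if (via_subreg || (wanted & ~p->cls) != 0)
    return false;
  p->cls = wanted;
  return true;
}
```

In this model, a pseudo created with TOY_POINTER_AND_FP_REGS and only ever
passed to toy_reuse_as_input with via_subreg set keeps its wide class
indefinitely.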

However, neither occurs here.  As described above, r287 rightly
starts out with a wide choice of class, ultimately derived from
ALL_REGS, so we don't get (1).  And as the comments in the PR
explain, r287 is never used as an input reload, only the subreg is,
so we don't get (2):

---
 Choosing alt 13 in insn 317:  (0) r  (1) w {*movsi_aarch64}
  Creating newreg=291, assigning class FP_REGS to r291
  317: r288:SI=r291:SI
Inserting insn reload before:
  320: r291:SI=r287:DI#0
---


IMO, in this case we should rely on the reload of insn 316 to narrow
down the class of r287.  Currently we do:

---
 Choosing alt 7 in insn 316:  (0) r  (1) m {*movdi_aarch64}
  Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to r289
  316: r289:DI=[r105:DI*0x8+r140:DI]
Inserting insn reload after:
  318: r287:DI=r289:DI
---

i.e. we create a new pseudo register r289 and give *that* pseudo
GENERAL_REGS instead.  This is because get_reload_reg only narrows
down the existing class for OP_IN and OP_INOUT, not OP_OUT.

But if we have a reload pseudo in a reload instruction and have chosen
a specific class for the reload pseudo, I think we should simply install
it for OP_OUT reloads too, if the class is a subset of the existing class.
We will need to pick such a register whatever happens (for r289 in the
example above).  And as explained in the PR, doing this actually avoids
an unnecessary move via the FP registers too.

This backport is less aggressive than the trunk version, in that the new
code reuses the test for a reload move from in_class_p.  We will therefore
only narrow OP_OUT classes if the instruction is a register move or memory
load that was generated by LRA itself.
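
As a rough illustration of the rule described above, here is a toy C model
of the decision.  It is a sketch under stated assumptions, not the code in
lra-constraints.c: the sketch_ names are hypothetical, register classes are
modelled as bitmasks with arbitrary values, and the lra_reload_move_p flag
merely stands in for the reload-move test that the backport reuses from
in_class_p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: a register class is a bitmask of hard registers.  The mask
   values are illustrative only, not AArch64's real register sets.  */
typedef uint64_t sketch_class;

#define SKETCH_GENERAL_REGS        ((sketch_class) 0x000000007fffffffULL)
#define SKETCH_POINTER_REGS        ((sketch_class) 0x00000000ffffffffULL)
#define SKETCH_FP_REGS             ((sketch_class) 0xffffffff00000000ULL)
#define SKETCH_POINTER_AND_FP_REGS (SKETCH_POINTER_REGS | SKETCH_FP_REGS)

enum sketch_op_type { SKETCH_OP_IN, SKETCH_OP_OUT, SKETCH_OP_INOUT };

/* Return true if every register in A is also in B.  */
bool
sketch_subset_p (sketch_class a, sketch_class b)
{
  return (a & ~b) == 0;
}

/* Decide whether an existing reload pseudo with class *CUR can be reused
   for an operand whose chosen class is CHOSEN.  On success, narrow *CUR
   in place and return true.  A false return stands for "create a fresh
   reload pseudo instead".  For SKETCH_OP_OUT, narrowing is only allowed
   when LRA_RELOAD_MOVE_P is true, modelling the backport's restriction
   to moves and loads that LRA generated itself.  */
bool
sketch_narrow_existing (sketch_class *cur, sketch_class chosen,
                        enum sketch_op_type type, bool lra_reload_move_p)
{
  assert (cur);
  if (!sketch_subset_p (chosen, *cur))
    return false;
  if (type == SKETCH_OP_OUT && !lra_reload_move_p)
    return false;
  *cur = chosen;
  return true;
}
```

With these definitions, an output reload that starts from
SKETCH_POINTER_AND_FP_REGS and chooses SKETCH_GENERAL_REGS narrows the
existing pseudo when lra_reload_move_p is true, and asks for a new pseudo
when it is false.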

gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter.  Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instruction.

gcc/testsuite/
PR rtl-optimization/96796
* gcc.c-torture/compile/pr96796.c: New test.

[Bug rtl-optimization/96796] [9 Regression] aarch64: ICE during RTL pass: reload

2020-08-26 Thread acoplan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796

--- Comment #3 from Alex Coplan  ---
Adding -fcommon, I can reproduce this ICE on trunk. The default changed to
-fno-common in GCC 10 (as of 6271dd984d7f920d4fb17ad37af6a1f8e6b796dc).

[Bug rtl-optimization/96796] [9 Regression] aarch64: ICE during RTL pass: reload

2020-08-26 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #2 from Martin Liška  ---
There's a reduced test-case:

cat pr96796.c
struct S0 {
  signed f0 : 8;
  unsigned f1;
  unsigned f4;
};
struct S1 {
  long f3;
  char f4;
} g_3_4;

int g_5, func_1_l_32, func_50___trans_tmp_31;
static struct S0 g_144, g_834, g_1255, g_1261;

int g_273[120] = {};
int *g_555;
char **g_979;
static int g_1092_0;
static int g_1193;
int safe_mul_func_int16_t_s_s(int si1, int si2) { return si1 * si2; }
static struct S0 *func_50();
int func_1() { func_50(g_3_4, g_5, func_1_l_32, 8, 3); }
void safe_div_func_int64_t_s_s(int *);
void safe_mod_func_uint32_t_u_u(struct S0);
struct S0 *func_50(int p_51, struct S0 p_52, struct S1 p_53, int p_54,
   int p_55) {
  int __trans_tmp_30;
  char __trans_tmp_22;
  short __trans_tmp_19;
  long l_985_1;
  long l_1191[8];
  safe_div_func_int64_t_s_s(g_273);
  __builtin_printf((char*)g_1261.f4);
  safe_mod_func_uint32_t_u_u(g_834);
  g_144.f0 += 1;
  for (;;) {
    struct S1 l_1350 = {_1350};
    for (; p_53.f3; p_53.f3 -= 1)
      for (; g_1193 <= 2; g_1193 += 1) {
        __trans_tmp_19 = safe_mul_func_int16_t_s_s(l_1191[l_985_1 + p_53.f3],
                                                   p_55 % (**g_979 = 10));
        __trans_tmp_22 = g_1255.f1 * p_53.f4;
        __trans_tmp_30 = __trans_tmp_19 + __trans_tmp_22;
        if (__trans_tmp_30)
          g_1261.f0 = p_51;
        else {
          g_1255.f0 = p_53.f3;
          int *l_1422 = g_834.f0 = g_144.f4 != (*l_1422)++ > 0 < 0 ^ 51;
          g_555 = ~0;
          g_1092_0 |= func_50___trans_tmp_31;
        }
      }
  }
}

[Bug rtl-optimization/96796] [9 Regression] aarch64: ICE during RTL pass: reload

2020-08-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64
   Target Milestone|---                         |9.4

[Bug rtl-optimization/96796] [9 Regression] aarch64: ICE during RTL pass: reload

2020-08-26 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |10.1.1, 11.0, 8.4.1
                 CC|                            |ktkachov at gcc dot gnu.org
   Last reconfirmed|                            |2020-08-26
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
      Known to fail|                            |9.3.1
            Summary|aarch64: ICE during RTL     |[9 Regression] aarch64: ICE
                   |pass: reload                |during RTL pass: reload

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed on the GCC 9 branch. Other branches don't ICE for me.