[Bug rtl-optimization/96796] [9/10/11 Regression] aarch64: ICE during RTL pass: reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796 --- Comment #8 from CVS Commits --- The master branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:6001db79c477b03eacc7e7049560921fb54b7845 commit r11-3041-g6001db79c477b03eacc7e7049560921fb54b7845 Author: Richard Sandiford Date: Mon Sep 7 20:15:36 2020 +0100 lra: Avoid cycling on certain subreg reloads [PR96796] This PR is about LRA cycling for a reload of the form: Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI] Creating newreg=287, assigning class ALL_REGS to slow/invalid mem r287 Creating newreg=288, assigning class ALL_REGS to slow/invalid mem r288 103: r203:SI=r288:SI<<0x1+r196:DI#0 REG_DEAD r196:DI Inserting slow/invalid mem reload before: 316: r287:DI=[r105:DI*0x8+r140:DI] 317: r288:SI=r287:DI#0 The problem is with r287. We rightly give it a broad starting class of POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class). However, we never make forward progress towards narrowing it down to a specific choice of class (POINTER_REGS or FP_REGS). I think in practice we rely on two things to narrow a reload pseudo's class down to a specific choice: (1) a restricted class is specified when the pseudo is created This happens for input address reloads, where the class is taken from the target's chosen base register class. It also happens for simple REG reloads, where the class is taken from the chosen alternative's constraints. (2) uses of the reload pseudo as a direct input operand In this case get_reload_reg tries to reuse the existing register and narrow its class, instead of creating a new reload pseudo. However, neither occurs here. As described above, r287 rightly starts out with a wide choice of class, ultimately derived from ALL_REGS, so we don't get (1). And as the comments in the PR explain, r287 is never used as an input reload, only the subreg is, so we don't get (2): Choosing alt 13 in insn 317: (0) r (1) w {*movsi_aarch64} Creating newreg=291, assigning class FP_REGS to r291 317: r288:SI=r291:SI Inserting insn reload before: 320: r291:SI=r287:DI#0 IMO, in this case we should rely on the reload of r316 to narrow down the class of r278. Currently we do: Choosing alt 7 in insn 316: (0) r (1) m {*movdi_aarch64} Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to r289 316: r289:DI=[r105:DI*0x8+r140:DI] Inserting insn reload after: 318: r287:DI=r289:DI --- i.e. we create a new pseudo register r289 and give *that* pseudo GENERAL_REGS instead. This is because get_reload_reg only narrows down the existing class for OP_IN and OP_INOUT, not OP_OUT. But if we have a reload pseudo in a reload instruction and have chosen a specific class for the reload pseudo, I think we should simply install it for OP_OUT reloads too, if the class is a subset of the existing class. We will need to pick such a register whatever happens (for r289 in the example above). And as explained in the PR, doing this actually avoids an unnecessary move via the FP registers too. The patch is quite aggressive in that it does this for all reload pseudos in all reload instructions. I wondered about reusing the condition for a reload move in in_class_p: INSN_UID (curr_insn) >= new_insn_uid_start && curr_insn_set != NULL && ((OBJECT_P (SET_SRC (curr_insn_set)) && ! CONSTANT_P (SET_SRC (curr_insn_set))) || (GET_CODE (SET_SRC (curr_insn_set)) == SUBREG && OBJECT_P (SUBREG_REG (SET_SRC (curr_insn_set))) && ! CONSTANT_P (SUBREG_REG (SET_SRC (curr_insn_set))) but I can't really justify that on first principles. I think we should apply the rule consistently until we have a specific reason for doing otherwise. gcc/ PR rtl-optimization/96796 * lra-constraints.c (in_class_p): Add a default-false allow_all_reload_class_changes_p parameter. Do not treat reload moves specially when the parameter is true. (get_reload_reg): Try to narrow the class of an existing OP_OUT reload if we're reloading a reload pseudo in a reload instr
[Bug rtl-optimization/96796] [9/10/11 Regression] aarch64: ICE during RTL pass: reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796 --- Comment #7 from rsandifo at gcc dot gnu.org --- Created attachment 49149 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49149&action=edit Posted patch
[Bug rtl-optimization/96796] [9/10/11 Regression] aarch64: ICE during RTL pass: reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #6 from rsandifo at gcc dot gnu.org --- (In reply to Alex Coplan from comment #5) > Started with this change: > https://gcc.gnu.org/git/?p=gcc.git;a=commit; > h=8eaff6ef97836100801f7b40dc03f77fbebe03ac Ah, yeah. What the patch does looks good, but it seems to be exposing a latent problem with subreg reloads. The cycling starts with: Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI] Creating newreg=287, assigning class ALL_REGS to slow/invalid mem r287 Creating newreg=288, assigning class ALL_REGS to slow/invalid mem r288 103: r203:SI=r288:SI<<0x1+r196:DI#0 REG_DEAD r196:DI Inserting slow/invalid mem reload before: 316: r287:DI=[r105:DI*0x8+r140:DI] 317: r288:SI=r287:DI#0 where we now (IMO justifiably) have two reload moves, one for the memory load and one for the subreg. Next we have: Changing pseudo 196 in operand 3 of insn 103 on equiv [r105:DI*0x8+r140:DI] Reuse r287 for reload [r105:DI*0x8+r140:DI], change to class POINTER_AND_FP_REGS for r287 Reuse r288 for reload r287:DI#0, change to class POINTER_AND_FP_REGS for r288 1 Non pseudo reload: reject++ 3 Non pseudo reload: reject++ alt=0,overall=2,losers=0,rld_nregs=0 Choosing alt 0 in insn 103: (0) =r (1) r (2) n (3) r {*add_lsl_si} Change to class GENERAL_REGS for r288 POINTER_AND_FP_REGS is the class that aarch64 prefers for the reload, again IMO justifiably. This then gets narrowed to GENERAL_REGS for the main reload register (r288) because of the use in the *add_lsl_si instruction. But we're then left with a situation in which r287 has class POINTER_AND_FP_REGS and is only used in moves. In practice, each move alternative will require either POINTER_REGS or FP_REGS, but there's nothing to pin r287 down to a particular one, and we end up oscillating between them. More specifically, we reload insn 316 as follows: Choosing alt 7 in insn 316: (0) r (1) m {*movdi_aarch64} Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to r289 316: r289:DI=[r105:DI*0x8+r140:DI] Inserting insn reload after: 318: r287:DI=r289:DI Here we've effectively chosen to use GENERAL_REGS for the r287 reload, but made the choice via a new reload register (r289). Next we do: Choosing alt 13 in insn 318: (0) w (1) rZ {*movdi_aarch64} Creating newreg=290 from oldreg=287, assigning class FP_REGS to r290 318: r290:DI=r289:DI Inserting insn reload after: 319: r287:DI=r290:DI Here we've eschewed the r<-r alternative because of the risk of cycling, so this time we've effectively chosen to use FP_REGS for r287 (instead of GENERAL_REGS as above). This choice too is made via a new reload register (r290). We manage to break a potential cycle here, but we've still left r287 as POINTER_AND_FP_REGS. Next we move on to the second of the original two reload instructions: Choosing alt 13 in insn 317: (0) r (1) w {*movsi_aarch64} Creating newreg=291, assigning class FP_REGS to r291 317: r288:SI=r291:SI Inserting insn reload before: 320: r291:SI=r287:DI#0 Here too we've rejected r<-r because of potential cycling, and so have effectively chosen to put r287 in FP_REGS. The “problem” is that this time we've reloaded the subreg input rather than the register output, and so we have the same problem when reloading the subreg the next time round. IMO the handling of the first reload shows that it would be better to restrict the class of r287 rather than generate a new reload register r289. Doing that might then require a reload in the uses of r287, but that might happen anyway, since the new class would still be a subset of the old class, and so any register chosen for
[Bug rtl-optimization/96796] [9/10/11 Regression] aarch64: ICE during RTL pass: reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796 --- Comment #5 from Alex Coplan --- Started with this change: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8eaff6ef97836100801f7b40dc03f77fbebe03ac
[Bug rtl-optimization/96796] [9/10/11 Regression] aarch64: ICE during RTL pass: reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796 ktkachov at gcc dot gnu.org changed: What|Removed |Added Keywords||ice-on-valid-code Known to fail||10.1.1, 11.0 Summary|[9 Regression] aarch64: ICE |[9/10/11 Regression] |during RTL pass: reload |aarch64: ICE during RTL ||pass: reload Known to work|10.1.1, 11.0| --- Comment #4 from ktkachov at gcc dot gnu.org --- Updating regression markers then