Hi Jeff, Thanks for the speedy review(s).
> From: Jeff Law <jeffreya...@gmail.com> > Sent: 15 October 2023 00:03 > To: Roger Sayle <ro...@nextmovesoftware.com>; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in > make_compound_operation. > > On 10/14/23 16:14, Roger Sayle wrote: > > > > This patch is my proposed solution to PR rtl-optimization/91865. > > Normally RTX simplification canonicalizes a ZERO_EXTEND of a > > ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is > > possible for combine's make_compound_operation to unintentionally > > generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is > > unlikely to be matched by the backend. > > > > For the new test case: > > > > const int table[2] = {1, 2}; > > int foo (char i) { return table[i]; } > > > > compiling with -O2 -mlarge on msp430 we currently see: > > > > Trying 2 -> 7: > > 2: r25:HI=zero_extend(R12:QI) > > REG_DEAD R12:QI > > 7: r28:PSI=sign_extend(r25:HI)#0 > > REG_DEAD r25:HI > > Failed to match this instruction: > > (set (reg:PSI 28 [ iD.1772 ]) > > (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ])))) > > > > which results in the following code: > > > > foo: AND #0xff, R12 > > RLAM.A #4, R12 { RRAM.A #4, R12 > > RLAM.A #1, R12 > > MOVX.W table(R12), R12 > > RETA > > > > With this patch, we now see: > > > > Trying 2 -> 7: > > 2: r25:HI=zero_extend(R12:QI) > > REG_DEAD R12:QI > > 7: r28:PSI=sign_extend(r25:HI)#0 > > REG_DEAD r25:HI > > Successfully matched this instruction: > > (set (reg:PSI 28 [ iD.1772 ]) > > (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing > > combination of insns 2 and 7 original costs 4 + 8 = 12 replacement > > cost 8 > > > > foo: MOV.B R12, R12 > > RLAM.A #1, R12 > > MOVX.W table(R12), R12 > > RETA > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check, both with and without --target_board=unix{-m32} > > with no new failures. Ok for mainline? > > > > 2023-10-14 Roger Sayle <ro...@nextmovesoftware.com> > > > > gcc/ChangeLog > > PR rtl-optimization/91865 > > * combine.cc (make_compound_operation): Avoid creating a > > ZERO_EXTEND of a ZERO_EXTEND. > > > > gcc/testsuite/ChangeLog > > PR rtl-optimization/91865 > > * gcc.target/msp430/pr91865.c: New test case. > Neither an ACK or NAK at this point. > > The bug report includes a patch from Segher which purports to fix this in > simplify- > rtx. Any thoughts on Segher's approach and whether or not it should be > considered? > > The BZ also indicates that removal of 2 patterns from msp430.md would solve > this > too (though it may cause regressions elsewhere?). Any thoughts on that > approach > as well? > Great questions. I believe Segher's proposed patch (in comment #4) was an msp430-specific proof-of-concept workaround rather than intended to be fix. Eliminating a ZERO_EXTEND simply by changing the mode of a hard register is not a solution that'll work on many platforms (and therefore not really suitable for target-independent middle-end code in the RTL optimizers). For example, zero_extend:TI (and:QI (reg:QI hard_r1) (const_int 0x0f)) can't universally be reduced to (and:TI (reg:TI hard_r1) (const_int 0x0f)). Notice that Segher's code doesn't check TARGET_HARD_REGNO_MODE_OK or TARGET_MODES_TIEABLE_P or any of the other backend hooks necessary to confirm such a transformation is safe/possible. Secondly, the hard register aspect is a bit of a red herring. This work-around fixes the issue in the original BZ description, but not the slightly modified test case in comment #2 (with a global variable). This doesn't have a hard register, but does have the dubious ZERO_EXTEND/SIGN_EXTEND of a ZERO_EXTEND. The underlying issue, which is applicable to all targets, is that combine.cc's make_compound_operation is expected to reverse the local transformations made by expand_compound_operation. Hence, if an RTL expression is canonical going into expand_compound_operation, it is expected (hoped) to be canonical (and equivalent) coming out of make_compound_operation. Hence, rather than be a MSP430 specific issue, no target should expect (or be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND of a ZERO_EXTEND in the RTL stream. Much like a binary operator with two CONST_INT operands, or a shift by zero, it's something the middle-end might reasonably be expected to clean-up. [Yeah, I know... 😊] However, if it isn't considered a problem that make_compound_operand and simplify_rtx generate different forms of the same expression, requiring the backend to match multiple patterns (some for the combine pass, and others for other RTL passes), there is a simple fix for MSP430; Just add a pattern that matches (the slightly odd): > > (set (reg:PSI 28 [ iD.1772 ]) > > (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ])))) As a rule of thumb, if the missed optimization bug report includes combine's diagnostic "Failed to match this instruction:", things can be improved by adding a pattern (often a define_insn_and_split) that matches the shown RTL. In this case the perhaps reasonable assumption is that the above would/should (normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn pattern. Or that's my understanding of why this PR is classified as rtl-optimization and not target. Finally, my patch shouldn't block a (updated corrected) variant of Segher's patch or other solution to PR 91865. The more (safe) solutions the merrier. Thanks again.