On Fri, Aug 13, 2021 at 9:21 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Fri, Aug 13, 2021 at 2:48 AM Hongyu Wang <hongyu.w...@intel.com> wrote:
> >
> > Hi,
> >
> > For lea + zero_extendsidi insns, if dest of lea and src of zext are the
> > same, combine them with single leal under 64bit target since 32bit
> > register will be automatically zero-extended.
> >
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > Ok for master?
> >
> > gcc/ChangeLog:
> >
> >         PR target/101716
> >         * config/i386/i386.md (*lea<mode>_zext): New define_insn.
> >         (define_peephole2): New peephole2 to combine zero_extend
> >         with lea.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR target/101716
> >         * gcc.target/i386/pr101716.c: New test.
>
> This form should be covered by ix86_decompose_address via
> address_no_seg_operand predicate. Combine creates:
>
> Trying 6 -> 7:
>     6: {r86:DI=r87:DI<<0x1;clobber flags:CC;}
>       REG_DEAD r87:DI
>       REG_UNUSED flags:CC
>     7: r85:DI=zero_extend(r86:DI#0)
>       REG_DEAD r86:DI
> Failed to match this instruction:
> (set (reg:DI 85)
>     (and:DI (ashift:DI (reg:DI 87)
>             (const_int 1 [0x1]))
>         (const_int 4294967294 [0xfffffffe])))
>
> which does not fit:
>
>       else if (GET_CODE (addr) == AND
>                && const_32bit_mask (XEXP (addr, 1), DImode))
>
> After reload, we lose SUBREG, so REE does not trigger on:
>
> (insn 17 3 7 2 (set (reg:DI 0 ax [86])
>         (mult:DI (reg:DI 5 di [87])
>             (const_int 2 [0x2]))) "pr101716.c":4:13 204 {*leadi}
>      (nil))
> (insn 7 17 13 2 (set (reg:DI 0 ax [85])
>         (zero_extend:DI (reg:SI 0 ax [86]))) "pr101716.c":4:19 136 {*zero_extendsidi2}
>      (nil))
>
> So, the question is if the combine pass really needs to zero-extend
> with 0xfffffffe, the left shift << 1 guarantees zero in the LSB, so
> 0xffffffff should be better and in line with canonical zero-extension
> RTX.
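
To make the point about the two masks concrete, here is a minimal standalone C sketch (plain C, not GCC code; the function names are made up for illustration). It models the DImode shift followed by each mask and checks that both give the same value, because bit 0 is already zero after the shift by one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the form combine produced: shift left by one in 64 bits,
   then mask with 0xfffffffe.  */
static uint64_t
shift_then_narrow_mask (uint64_t x)
{
  return (x << 1) & UINT64_C (0xfffffffe);
}

/* Models the canonical zero-extension form: shift left by one in
   64 bits, then keep the low 32 bits (mask 0xffffffff).  */
static uint64_t
shift_then_zero_extend (uint64_t x)
{
  return (x << 1) & UINT64_C (0xffffffff);
}

/* The shift clears bit 0, so the only bit on which the two masks
   differ is already zero and both forms agree.  */
static bool
masks_agree (uint64_t x)
{
  return shift_then_narrow_mask (x) == shift_then_zero_extend (x);
}
```

For example, masks_agree (0x80000001) holds: the shifted value is 0x100000002, and either mask reduces it to 2.
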

Also, ix86_decompose_address accepts ASHIFT RTX when ASHIFT is
embedded in the PLUS chain, but naked ASHIFT is rejected (cf. the
call in ix86_legitimate_address_p) for some (historic?) reason. It
looks to me that this restriction is not necessary, since
ix86_legitimize_address can canonicalize ASHIFT RTXes without
problems. The attached patch, which survives bootstrap and regtest,
can help in your case.

Uros.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4d4ab6a03d6..9395716dd60 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10018,8 +10018,7 @@ ix86_live_on_entry (bitmap regs)

 /* Extract the parts of an RTL expression that is a valid memory address
    for an instruction.  Return 0 if the structure of the address is
-   grossly off.  Return -1 if the address contains ASHIFT, so it is not
-   strictly valid, but still used for computing length of lea instruction.  */
+   grossly off.  */

 int
 ix86_decompose_address (rtx addr, struct ix86_address *out)
@@ -10029,7 +10028,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   HOST_WIDE_INT scale = 1;
   rtx scale_rtx = NULL_RTX;
   rtx tmp;
-  int retval = 1;
   addr_space_t seg = ADDR_SPACE_GENERIC;

   /* Allow zero-extended SImode addresses,
@@ -10179,7 +10177,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
       if ((unsigned HOST_WIDE_INT) scale > 3)
         return 0;
       scale = 1 << scale;
-      retval = -1;
     }
   else
     disp = addr;                       /* displacement */
@@ -10252,7 +10249,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   out->scale = scale;
   out->seg = seg;

-  return retval;
+  return 1;
 }

 /* Return cost of the memory address x.
@@ -10765,7 +10762,7 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict)
   HOST_WIDE_INT scale;
   addr_space_t seg;

-  if (ix86_decompose_address (addr, &parts) <= 0)
+  if (ix86_decompose_address (addr, &parts) == 0)
     /* Decomposition failed.  */
     return false;

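
For readers who want the effect of the return-value change in isolation, the following is a toy model in plain C, not GCC code. All names (toy_shift_addr, toy_parts, toy_decompose_old and so on) are hypothetical, only a naked shift of a register is modeled, and the real ix86_legitimate_address_p performs many further checks that are omitted here. It is only meant to show how the old -1 result combined with the `<= 0` test rejected a naked ASHIFT, while the patched code accepts it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an address of the form (reg << shift).  */
struct toy_shift_addr
{
  unsigned shift;
};

/* Hypothetical stand-in for the decomposed address parts.  */
struct toy_parts
{
  int scale;
};

/* Old convention: 0 = grossly off, -1 = contains ASHIFT, 1 = valid.  */
static int
toy_decompose_old (struct toy_shift_addr a, struct toy_parts *out)
{
  if (a.shift > 3)
    return 0;
  out->scale = 1 << a.shift;	/* Shift count 0..3 becomes scale 1..8.  */
  return -1;
}

/* Patched convention: 0 = grossly off, 1 = decomposed.  */
static int
toy_decompose_new (struct toy_shift_addr a, struct toy_parts *out)
{
  if (a.shift > 3)
    return 0;
  out->scale = 1 << a.shift;
  return 1;
}

/* Mirrors the old caller test "<= 0 means decomposition failed".  */
static bool
toy_legitimate_old (struct toy_shift_addr a)
{
  struct toy_parts parts;
  return toy_decompose_old (a, &parts) > 0;
}

/* Mirrors the patched caller test "== 0 means decomposition failed".  */
static bool
toy_legitimate_new (struct toy_shift_addr a)
{
  struct toy_parts parts;
  return toy_decompose_new (a, &parts) != 0;
}
```

In this model a shift by 1 is rejected by toy_legitimate_old and accepted by toy_legitimate_new with scale 2, while a shift by 4 is rejected by both.
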