[PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.
Good catch. I agree with Hongtao that although my testing revealed no problems with the previous version of this patch, it makes sense to call gen_reg_rtx to generate an pseudo intermediate instead of attempting to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an intermediate. I've left the existing behaviour the same, so that memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx. This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap" and "make -k check" with no new failures. Ok for mainline? 2021-10-13 Roger Sayle gcc/ChangeLog * config/i386/i386-expand.c (ix86_expand_vector_move): Use a pseudo intermediate when moving a SUBREG into a hard register, by checking ix86_hardreg_mov_ok. (ix86_expand_vector_extract): Store zero-extended SImode intermediate in a pseudo, then set target using a SUBREG_PROMOTED annotated subreg. * config/i386/sse.md (mov_internal): Prevent CSE creating complex (SUBREG) sets of (vector) hard registers before reload, by checking ix86_hardreg_mov_ok. Thanks, Roger -Original Message- From: Hongtao Liu Sent: 11 October 2021 12:29 To: Roger Sayle Cc: GCC Patches Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend. On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle wrote: > gcc/ChangeLog > * config/i386/i386-expand.c (ix86_expand_vector_move): Use a > pseudo intermediate when moving a SUBREG into a hard register, > by checking ix86_hardreg_mov_ok. /* Make operand1 a register if it isn't already. */ if (can_create_pseudo_p () - && !register_operand (op0, mode) - && !register_operand (op1, mode)) + && (!ix86_hardreg_mov_ok (op0, op1) || (!register_operand (op0, + mode) + && !register_operand (op1, mode { rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0)); ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register. -- BR, Hongtao diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 3e6f7d8e..4a8fa2f 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -615,6 +615,16 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[]) return; } + /* If operand0 is a hard register, make operand1 a pseudo. */ + if (can_create_pseudo_p () + && !ix86_hardreg_mov_ok (op0, op1)) +{ + rtx tmp = gen_reg_rtx (GET_MODE (op0)); + emit_move_insn (tmp, op1); + emit_move_insn (op0, tmp); + return; +} + /* Make operand1 a register if it isn't already. */ if (can_create_pseudo_p () && !register_operand (op0, mode) @@ -16005,11 +16015,15 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) /* Let the rtl optimizers know about the zero extension performed. */ if (inner_mode == QImode || inner_mode == HImode) { + rtx reg = gen_reg_rtx (SImode); tmp = gen_rtx_ZERO_EXTEND (SImode, tmp); - target = gen_lowpart (SImode, target); + emit_move_insn (reg, tmp); + tmp = gen_lowpart (inner_mode, reg); + SUBREG_PROMOTED_VAR_P (tmp) = 1; + SUBREG_PROMOTED_SET (tmp, 1); } - emit_insn (gen_rtx_SET (target, tmp)); + emit_move_insn (target, tmp); } else { diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 4559b0c..e43f597 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1270,7 +1270,8 @@ " C,,vm,v"))] "TARGET_SSE && (register_operand (operands[0], mode) - || register_operand (operands[1], mode))" + || register_operand (operands[1], mode)) + && ix86_hardreg_mov_ok (operands[0], operands[1])" { switch (get_attr_type (insn)) {
Re: [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.
On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle wrote: > > > Good catch. I agree with Hongtao that although my testing revealed > no problems with the previous version of this patch, it makes sense to > call gen_reg_rtx to generate an pseudo intermediate instead of attempting > to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an > intermediate. I've left the existing behaviour the same, so that > memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx. > > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap" > and "make -k check" with no new failures. > > Ok for mainline? > > > 2021-10-13 Roger Sayle > > gcc/ChangeLog > * config/i386/i386-expand.c (ix86_expand_vector_move): Use a > pseudo intermediate when moving a SUBREG into a hard register, > by checking ix86_hardreg_mov_ok. > (ix86_expand_vector_extract): Store zero-extended SImode > intermediate in a pseudo, then set target using a SUBREG_PROMOTED > annotated subreg. > * config/i386/sse.md (mov_internal): Prevent CSE creating > complex (SUBREG) sets of (vector) hard registers before reload, by > checking ix86_hardreg_mov_ok. OK. Thanks, Uros. > > Thanks, > Roger > > -Original Message- > From: Hongtao Liu > Sent: 11 October 2021 12:29 > To: Roger Sayle > Cc: GCC Patches > Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 > backend. > > On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle > wrote: > > gcc/ChangeLog > > * config/i386/i386-expand.c (ix86_expand_vector_move): Use a > > pseudo intermediate when moving a SUBREG into a hard register, > > by checking ix86_hardreg_mov_ok. > >/* Make operand1 a register if it isn't already. */ >if (can_create_pseudo_p () > - && !register_operand (op0, mode) > - && !register_operand (op1, mode)) > + && (!ix86_hardreg_mov_ok (op0, op1) || (!register_operand (op0, > + mode) > + && !register_operand (op1, mode > { >rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0)); > > ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want > a pseudo register. > > -- > BR, > Hongtao >
Re: [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.
On Wed, Oct 13, 2021 at 2:08 AM Uros Bizjak via Gcc-patches wrote: > > On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle > wrote: > > > > > > Good catch. I agree with Hongtao that although my testing revealed > > no problems with the previous version of this patch, it makes sense to > > call gen_reg_rtx to generate an pseudo intermediate instead of attempting > > to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an > > intermediate. I've left the existing behaviour the same, so that > > memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx. > > > > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap" > > and "make -k check" with no new failures. > > > > Ok for mainline? > > > > > > 2021-10-13 Roger Sayle > > > > gcc/ChangeLog > > * config/i386/i386-expand.c (ix86_expand_vector_move): Use a > > pseudo intermediate when moving a SUBREG into a hard register, > > by checking ix86_hardreg_mov_ok. > > (ix86_expand_vector_extract): Store zero-extended SImode > > intermediate in a pseudo, then set target using a SUBREG_PROMOTED > > annotated subreg. > > * config/i386/sse.md (mov_internal): Prevent CSE creating > > complex (SUBREG) sets of (vector) hard registers before reload, by > > checking ix86_hardreg_mov_ok. > > OK. > > Thanks, > Uros. > > > > > Thanks, > > Roger > > > > -Original Message- > > From: Hongtao Liu > > Sent: 11 October 2021 12:29 > > To: Roger Sayle > > Cc: GCC Patches > > Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to > > i386 backend. > > > > On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle > > wrote: > > > gcc/ChangeLog > > > * config/i386/i386-expand.c (ix86_expand_vector_move): Use a > > > pseudo intermediate when moving a SUBREG into a hard register, > > > by checking ix86_hardreg_mov_ok. > > > >/* Make operand1 a register if it isn't already. */ > >if (can_create_pseudo_p () > > - && !register_operand (op0, mode) > > - && !register_operand (op1, mode)) > > + && (!ix86_hardreg_mov_ok (op0, op1) || (!register_operand (op0, > > + mode) > > + && !register_operand (op1, mode > > { > >rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0)); > > > > ix86_gen_scratch_sse_rtx probably returns a hard register, but here you > > want a pseudo register. > > > > -- > > BR, > > Hongtao > > This caused: https://gcc.gnu.org/pipermail/gcc-regression/2021-October/075498.html FAIL: gcc.target/i386/avx-1.c (internal compiler error) FAIL: gcc.target/i386/avx-1.c (test for excess errors) FAIL: gcc.target/i386/avx-2.c (internal compiler error) FAIL: gcc.target/i386/avx-2.c (test for excess errors) FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (internal compiler error) FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (test for excess errors) FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (internal compiler error) FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (test for excess errors) FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (internal compiler error) FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (test for excess errors) FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (internal compiler error) FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (test for excess errors) FAIL: gcc.target/i386/sse-13.c (internal compiler error) FAIL: gcc.target/i386/sse-13.c (test for excess errors) FAIL: gcc.target/i386/sse-14.c (internal compiler error) FAIL: gcc.target/i386/sse-14.c (test for excess errors) FAIL: gcc.target/i386/sse-22a.c (internal compiler error) FAIL: gcc.target/i386/sse-22a.c (test for excess errors) FAIL: gcc.target/i386/sse-22.c (internal compiler error) FAIL: gcc.target/i386/sse-22.c (test for excess errors) FAIL: gcc.target/i386/sse-23.c (internal compiler error) FAIL: gcc.target/i386/sse-23.c (test for excess errors) FAIL: gcc.target/i386/sse-24.c (internal compiler error) FAIL: gcc.target/i386/sse-24.c (test for excess errors) FAIL: gcc.target/i386/sse-25.c (internal compiler error) FAIL: gcc.target/i386/sse-25.c (test for excess errors) FAIL: gcc.target/i386/sse-26.c (internal compiler error) FAIL: gcc.target/i386/sse-26.c (test for excess errors) You can reproduce them by adding -march=cascadelake to these tests. -- H.J.