Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
On Wed, Dec 22, 2021 at 11:26 AM Uros Bizjak wrote:
> Yes, using ix86_red_zone_used looks safe.
>
> OTOH, is there a reason the transformation is not implemented via the
> peephole2 pass? IIRC, the frame is stable after pro_and_epilogue_pass, and
> the peephole2 pass runs well after register allocation.

Something like the attached patch.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 58b10643fcb..e5d603f0025 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2514,6 +2514,24 @@
            ]
            (symbol_ref "true")))])
 
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+        (match_operand:SWI48 1 "const_int_operand"))]
+  "optimize_insn_for_size_p () && optimize_size > 1
+   && IN_RANGE (INTVAL (operands[1]), -128, 127)
+   && !ix86_red_zone_used"
+  [(set (match_dup 2) (match_dup 1))
+   (set (match_dup 0) (match_dup 3))]
+{
+  if (GET_MODE (operands[0]) != word_mode)
+    operands[0] = gen_rtx_REG (word_mode, REGNO (operands[0]));
+
+  operands[2] = gen_rtx_MEM (word_mode,
+                             gen_rtx_PRE_DEC (Pmode, stack_pointer_rtx));
+  operands[3] = gen_rtx_MEM (word_mode,
+                             gen_rtx_POST_INC (Pmode, stack_pointer_rtx));
+})
+
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand"
     "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,m")
Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
On Wed, Dec 22, 2021 at 10:26 AM Roger Sayle wrote:
>
> Hi Uros,
> Would you consider the following variant that disables this optimization
> when a red zone is used by the current function? You're right that cfun's
> red_zone_size is recalculated dynamically, but ix86_red_zone_used should be
> a better "gate", given that this logic resides very late in compilation, in
> the output templates, where whether or not a red zone is used is known.

Yes, using ix86_red_zone_used looks safe.

OTOH, is there a reason the transformation is not implemented via the
peephole2 pass? IIRC, the frame is stable after pro_and_epilogue_pass, and
the peephole2 pass runs well after register allocation.

Uros.
RE: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
Hi Uros,

Would you consider the following variant that disables this optimization
when a red zone is used by the current function? You're right that cfun's
red_zone_size is recalculated dynamically, but ix86_red_zone_used should be
a better "gate", given that this logic resides very late in compilation, in
the output templates, where whether or not a red zone is used is known.

On CSiBE, disabling this optimization in non-leaf functions that use a red
zone costs 219 bytes, but it remains a significant win over -Os. (Alas, the
absolute numbers aren't comparable, as this testing included the 0/-1 write
to memory changes.)

Tested (overnight) on x86_64-pc-linux-gnu with make bootstrap and make -k
check with no new failures.

2021-12-22  Roger Sayle

gcc/ChangeLog
	PR target/103773
	* config/i386/i386.md (*movdi_internal): Only use short
	push/pop sequence for register (non-memory) destinations
	when the current function doesn't make use of a red zone.
	(*movsi_internal): Likewise.

gcc/testsuite/ChangeLog
	PR target/103773
	* gcc.target/i386/pr103773.c: New test case.

Please let me know what you think. I'll revert if this tweak doesn't
address your concerns.

Roger
--

> -----Original Message-----
> From: Uros Bizjak
> Sent: 22 December 2021 08:20
> To: Roger Sayle
> Cc: GCC Patches
> Subject: Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to
> memory.
>
> On Wed, Dec 22, 2021 at 9:10 AM Uros Bizjak wrote:
> > Ouch, as pointed out in the PR, this approach clobbers the red zone.
> >
> > Please revert the original patch.
>
> *Maybe* we can use frame->red_zone_size here, but the frame is recalculated
> several times during the compilation. I think it is just too dangerous to
> use push/pop w.r.t. red zone clobbering.
>
> Uros.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d25453f..489cede 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2217,7 +2217,9 @@
       if (optimize_size > 1
           && TARGET_64BIT
           && CONST_INT_P (operands[1])
-          && IN_RANGE (INTVAL (operands[1]), -128, 127))
+          && IN_RANGE (INTVAL (operands[1]), -128, 127)
+          && !MEM_P (operands[0])
+          && !ix86_red_zone_used)
         return "push{q}\t%1\n\tpop{q}\t%0";
       return "mov{l}\t{%k1, %k0|%k0, %k1}";
     }
@@ -2440,7 +2442,9 @@
         return "lea{l}\t{%E1, %0|%0, %E1}";
       else if (optimize_size > 1
                && CONST_INT_P (operands[1])
-               && IN_RANGE (INTVAL (operands[1]), -128, 127))
+               && IN_RANGE (INTVAL (operands[1]), -128, 127)
+               && !MEM_P (operands[0])
+               && !ix86_red_zone_used)
         {
           if (TARGET_64BIT)
             return "push{q}\t%1\n\tpop{q}\t%q0";
Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
On Wed, Dec 22, 2021 at 9:10 AM Uros Bizjak wrote:
> Ouch, as pointed out in the PR, this approach clobbers the red zone.
>
> Please revert the original patch.

*Maybe* we can use frame->red_zone_size here, but the frame is recalculated
several times during the compilation. I think it is just too dangerous to
use push/pop w.r.t. red zone clobbering.

Uros.
Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
On Tue, Dec 21, 2021 at 1:27 PM Roger Sayle wrote:
>
> My apologies for the inconvenience. The new support for -Oz using
> push/pop for small integer constants on x86_64 is only a win/correct
> for loading registers. Fixed by adding !MEM_P tests in the appropriate
> locations.

Ouch, as pointed out in the PR, this approach clobbers the red zone.

Please revert the original patch.

Thanks,
Uros.
[PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
My apologies for the inconvenience. The new support for -Oz using push/pop
for small integer constants on x86_64 is only a win/correct for loading
registers. Fixed by adding !MEM_P tests in the appropriate locations.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
make -k check with no new failures. Ok for mainline?

2021-12-21  Roger Sayle

gcc/ChangeLog
	PR target/103773
	* config/i386/i386.md (*movdi_internal): Only use short
	push/pop sequence for register (non-memory) destinations.
	(*movsi_internal): Likewise.

gcc/testsuite/ChangeLog
	PR target/103773
	* gcc.target/i386/pr103773.c: New test case.

Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d25453f..e596f8b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2217,7 +2217,8 @@
       if (optimize_size > 1
           && TARGET_64BIT
           && CONST_INT_P (operands[1])
-          && IN_RANGE (INTVAL (operands[1]), -128, 127))
+          && IN_RANGE (INTVAL (operands[1]), -128, 127)
+          && !MEM_P (operands[0]))
         return "push{q}\t%1\n\tpop{q}\t%0";
       return "mov{l}\t{%k1, %k0|%k0, %k1}";
     }
@@ -2440,7 +2441,8 @@
         return "lea{l}\t{%E1, %0|%0, %E1}";
       else if (optimize_size > 1
                && CONST_INT_P (operands[1])
-               && IN_RANGE (INTVAL (operands[1]), -128, 127))
+               && IN_RANGE (INTVAL (operands[1]), -128, 127)
+               && !MEM_P (operands[0]))
         {
           if (TARGET_64BIT)
             return "push{q}\t%1\n\tpop{q}\t%q0";
diff --git a/gcc/testsuite/gcc.target/i386/pr103773.c b/gcc/testsuite/gcc.target/i386/pr103773.c
new file mode 100644
index 000..1e4b8ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103773.c
@@ -0,0 +1,12 @@
+/* { dg-do run } */
+/* { dg-options "-Oz" } */
+
+unsigned long long x;
+
+int main (void)
+{
+  __builtin_memset (&x, 0xff, 4);
+  if (x != 0xffffffff)
+    __builtin_abort ();
+  return 0;
+}