Just to add a bit more color on this one... It was originally observed in (and isolated from) _ZN11xalanc_1_1027XalanReferenceCountedObject12addReferenceEPS0_ (which demangles to xalanc_1_10::XalanReferenceCountedObject::addReference), and it reproduces on both AArch64 and RISC-V.
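For context, the hot path is a reference-count increment followed by a
first-reference check. In C terms it is roughly the following sketch,
reconstructed from the disassembly below; the field names and the assumed
8-byte vtable pointer are hypothetical, not the actual Xalan-C source:

    /* Hypothetical layout: a vtable pointer in the first 8 bytes puts
       the count at offset 8, matching the lw a4,8(a0) below. */
    struct counted {
        void *vptr;             /* offset 0 on LP64 targets */
        unsigned int ref_count; /* offset 8 */
    };

    void add_reference(struct counted *obj)
    {
        if (++obj->ref_count == 1) {
            /* first-reference handling (the branch target 0x51149a) */
        }
    }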
The basic block (annotated with dynamic instructions executed and
percentage of total dynamic instructions) looks as follows:

> 0x0000000000511488 4589868875 0.4638%
> _ZN11xalanc_1_1027XalanReferenceCountedObject12addReferenceEPS0_
>   4518       lw     a4,8(a0)
>   0017029b   addiw  t0,a4,1
>   00552423   sw     t0,8(a0)
>   4685       addi   a3,zero,1
>   00d28363   beq    t0,a3,6    # 0x51149a

This change reduces the instruction count on RISC-V by one compressible
instruction (2 bytes) and on AArch64 by one instruction (4 bytes). There
is no execution-time improvement (measured on Neoverse-N1), as would be
expected.

--Philipp.

On Thu, 16 Mar 2023 at 17:41, Jeff Law <jeffreya...@gmail.com> wrote:
>
>
> On 3/16/23 09:27, Manolis Tsamis wrote:
> > For this C testcase:
> >
> > void g();
> > void f(unsigned int *a)
> > {
> >   if (++*a == 1)
> >     g();
> > }
> >
> > GCC will currently emit a comparison with 1 by using the value
> > of *a after the increment. This can be improved by comparing
> > against 0 and using the value before the increment. As a result
> > there is a potentially shorter dependency chain (no need to wait
> > for the result of +1), and on targets with compare-with-zero
> > instructions the generated code is one instruction shorter.
> >
> > Example from AArch64:
> >
> > Before
> >         ldr     w1, [x0]
> >         add     w1, w1, 1
> >         str     w1, [x0]
> >         cmp     w1, 1
> >         beq     .L4
> >         ret
> >
> > After
> >         ldr     w1, [x0]
> >         add     w2, w1, 1
> >         str     w2, [x0]
> >         cbz     w1, .L4
> >         ret
> >
> > gcc/ChangeLog:
> >
> >         * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> >         (forward_propagate_into_comparison_1): Optimize
> >         for zero comparisons.
> Deferring to gcc-14.  Though I'm generally supportive of normalizing to
> a comparison against zero when we safely can :-)
>
> jeff
>
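For readers following along, the transformation Manolis describes
corresponds to this source-level rewrite of the testcase above. This is
a sketch of the equivalent semantics only; the actual patch works on the
GIMPLE representation in tree-ssa-forwprop.cc:

    void g();
    void f(unsigned int *a)
    {
        unsigned int old = *a;  /* value before the increment */
        *a = old + 1;           /* store the incremented value */
        if (old == 0)           /* ++*a == 1 iff the old value was 0:
                                   in unsigned arithmetic old + 1 == 1
                                   exactly when old == 0, since
                                   UINT_MAX + 1 wraps to 0, not 1 */
            g();
    }

The comparison no longer depends on the result of the addition, which is
why the AArch64 "After" sequence can issue cbz w1, .L4 on the
pre-increment value while the add writes to a fresh register w2.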