On Tue, Feb 16, 2021 at 12:05 PM Jakub Jelinek <ja...@redhat.com> wrote: > > On Tue, Feb 16, 2021 at 11:47:51AM +0100, Uros Bizjak wrote: > > > In this case the match_scratch wouldn't work, since CC_REGNUM is fixed. > > > But as you said on irc, there's peep2_regno_dead_p instead. > > > > > > Haven't tried it and so don't know whether it would work with only > > > one construct in the first […]. But it seems like it would be a better > > > fit, since peephole2 is designed to have up-to-date register information. > > > > > > Uros, does that sound reasonable, or is it a non-starter? > > > > We have some splits after peephole2 on x86, e.g. "convert impossible > > pushes of immediate". There we try to get scratch register with > > define_peephole2 and if this fails (or when peephole2 pass is not ran > > at all with -O0), define_split after peephole2 splits the impossible > > insn to some less optimal sequence. > > > > Another usage of late split pass is SSE partial register dependency > > stall that should run after allocated registers won't change anymore. > > > > Please also note that peephole2 pass doesn't run with -O0. > > True, but it seems these ix86_avoid_lea_for_add{,r} splitters are only > an optimization (mostly for the Atom/Bonell like CPUs). > And actually looking at it, there are 2 separate cases. > > One is ix86_avoid_lea_for_add where I see no reason to check that > flags are dead, because it is used in splitters: > (define_split > [(set (match_operand:SWI48 0 "register_operand") > (plus:SWI48 (match_operand:SWI48 1 "register_operand") > (match_operand:SWI48 2 "x86_64_nonmemory_operand"))) > (clobber (reg:CC FLAGS_REG))] > "reload_completed && ix86_avoid_lea_for_add (insn, operands)" > ... > (define_split > [(set (match_operand:DI 0 "register_operand") > (zero_extend:DI > (plus:SI (match_operand:SI 1 "register_operand") > (match_operand:SI 2 "x86_64_nonmemory_operand")))) > (clobber (reg:CC FLAGS_REG))] > "TARGET_64BIT > && reload_completed && ix86_avoid_lea_for_add (insn, operands)" > where the splitter will only match if the flags reg is dead at that point. > So, I think we can just keep those as splitters and just don't call the > function that makes no sense in that case.
Agreed. > And then there is the ix86_avoid_lea_for_addr case, where indeed we > need to check if flags reg is dead at that point (i.e. if we can clobber > it). But the define_peephole2 pass seems to have much better infrastructure > for that, the walking of the rest of the bb can be quadratic. > > The splitting in that case when optimizing (nobody should care about code > performance microoptimizations with -O0) could happen during split2 and > split3 passes (when people don't disable sched2 or enable sel sched - the > latter only with a few days old trunk): > > NEXT_PASS (pass_split_after_reload); > NEXT_PASS (pass_ree); > NEXT_PASS (pass_compare_elim_after_reload); > NEXT_PASS (pass_thread_prologue_and_epilogue); > NEXT_PASS (pass_rtl_dse2); > NEXT_PASS (pass_stack_adjustments); > NEXT_PASS (pass_jump2); > NEXT_PASS (pass_duplicate_computed_gotos); > NEXT_PASS (pass_sched_fusion); > NEXT_PASS (pass_peephole2); > NEXT_PASS (pass_if_after_reload); > NEXT_PASS (pass_regrename); > NEXT_PASS (pass_cprop_hardreg); > NEXT_PASS (pass_fast_rtl_dce); > NEXT_PASS (pass_reorder_blocks); > NEXT_PASS (pass_leaf_regs); > NEXT_PASS (pass_split_before_sched2); > NEXT_PASS (pass_sched2); > > I think the passes between split2 and peephole2 won't really benefit > from the splitting being done earlier, and I doubt the passes between > peephole2 and split3 will often create new opportunities to split. > And during/after sched2 we weren't splitting anymore. OK. > So, I think doing following would be best: > > 2021-02-16 Jakub Jelinek <ja...@redhat.com> > > PR target/99104 > * config/i386/i386.c (ix86_ok_to_clobber_flags): Remove. > (ix86_avoid_lea_for_add): Don't call ix86_ok_to_clobber_flags. > (ix86_avoid_lea_for_addr): Likewise. Adjust function comment. > * config/i386/i386.m (*lea<mode>): Change from define_insn_and_split > into define_insn. Move the splitting to define_peephole2 and > check there using peep2_regno_dead_p if FLAGS_REG is dead. > > * gcc.dg/pr99104.c: New test. OK. Please see a small comment inline. Thanks, Uros. > > --- gcc/config/i386/i386.c.jj 2021-02-16 08:58:55.197890195 +0100 > +++ gcc/config/i386/i386.c 2021-02-16 11:45:44.233204555 +0100 > @@ -14968,38 +14968,6 @@ ix86_lea_outperforms (rtx_insn *insn, un > return dist_define >= dist_use; > } > > -/* Return true if it is legal to clobber flags by INSN and > - false otherwise. */ > - > -static bool > -ix86_ok_to_clobber_flags (rtx_insn *insn) > -{ > - basic_block bb = BLOCK_FOR_INSN (insn); > - df_ref use; > - bitmap live; > - > - while (insn) > - { > - if (NONDEBUG_INSN_P (insn)) > - { > - FOR_EACH_INSN_USE (use, insn) > - if (DF_REF_REG_USE_P (use) && DF_REF_REGNO (use) == FLAGS_REG) > - return false; > - > - if (insn_defines_reg (FLAGS_REG, INVALID_REGNUM, insn)) > - return true; > - } > - > - if (insn == BB_END (bb)) > - break; > - > - insn = NEXT_INSN (insn); > - } > - > - live = df_get_live_out(bb); > - return !REGNO_REG_SET_P (live, FLAGS_REG); > -} > - > /* Return true if we need to split op0 = op1 + op2 into a sequence of > move and add to avoid AGU stalls. */ > > @@ -15012,10 +14980,6 @@ ix86_avoid_lea_for_add (rtx_insn *insn, > if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun)) > return false; > > - /* Check it is correct to split here. */ > - if (!ix86_ok_to_clobber_flags(insn)) > - return false; > - > regno0 = true_regnum (operands[0]); > regno1 = true_regnum (operands[1]); > regno2 = true_regnum (operands[2]); > @@ -15051,7 +15015,7 @@ ix86_use_lea_for_mov (rtx_insn *insn, rt > } > > /* Return true if we need to split lea into a sequence of > - instructions to avoid AGU stalls. */ > + instructions to avoid AGU stalls during peephole2. */ > > bool > ix86_avoid_lea_for_addr (rtx_insn *insn, rtx operands[]) > @@ -15071,10 +15035,6 @@ ix86_avoid_lea_for_addr (rtx_insn *insn, > && REG_P (XEXP (operands[1], 0)))) > return false; > > - /* Check if it is OK to split here. */ > - if (!ix86_ok_to_clobber_flags (insn)) > - return false; > - > ok = ix86_decompose_address (operands[1], &parts); > gcc_assert (ok); > > --- gcc/config/i386/i386.md.jj 2021-01-13 10:12:07.036506101 +0100 > +++ gcc/config/i386/i386.md 2021-02-16 11:43:27.666737790 +0100 > @@ -5176,7 +5176,7 @@ (define_expand "floatunsdidf2" > > ;; Load effective address instructions > > -(define_insn_and_split "*lea<mode>" > +(define_insn "*lea<mode>" > [(set (match_operand:SWI48 0 "register_operand" "=r") > (match_operand:SWI48 1 "address_no_seg_operand" "Ts"))] > "ix86_hardreg_mov_ok (operands[0], operands[1])" > @@ -5189,7 +5189,19 @@ (define_insn_and_split "*lea<mode>" > else > return "lea{<imodesuffix>}\t{%E1, %0|%0, %E1}"; > } > - "reload_completed && ix86_avoid_lea_for_addr (insn, operands)" > + [(set_attr "type" "lea") > + (set (attr "mode") > + (if_then_else > + (match_operand 1 "SImode_address_operand") > + (const_string "SI") > + (const_string "<MODE>")))]) > + > +(define_peephole2 > + [(set (match_operand:SWI48 0 "register_operand") > + (match_operand:SWI48 1 "address_no_seg_operand"))] > + "ix86_hardreg_mov_ok (operands[0], operands[1]) > + && peep2_regno_dead_p (0, FLAGS_REG) > + && ix86_avoid_lea_for_addr (peep2_next_insn (0), operands)" > [(const_int 0)] > { > machine_mode mode = <MODE>mode; > @@ -5197,7 +5209,7 @@ (define_insn_and_split "*lea<mode>" > > /* ix86_avoid_lea_for_addr re-recognizes insn and may > change operands[] array behind our back. */ > - pat = PATTERN (curr_insn); > + pat = PATTERN (peep2_next_insn (0)); I remember there were several problems with the insn, so I introduced the above hack. Probably, there is better solution for this... > operands[0] = SET_DEST (pat); > operands[1] = SET_SRC (pat); > @@ -5206,7 +5218,7 @@ (define_insn_and_split "*lea<mode>" > if (SImode_address_operand (operands[1], VOIDmode)) > mode = SImode; > > - ix86_split_lea_for_addr (curr_insn, operands, mode); > + ix86_split_lea_for_addr (peep2_next_insn (0), operands, mode); > > /* Zero-extend return register to DImode for zero-extended addresses. */ > if (mode != <MODE>mode) > @@ -5214,13 +5226,7 @@ (define_insn_and_split "*lea<mode>" > (operands[0], gen_lowpart (mode, operands[0]))); > > DONE; > -} > - [(set_attr "type" "lea") > - (set (attr "mode") > - (if_then_else > - (match_operand 1 "SImode_address_operand") > - (const_string "SI") > - (const_string "<MODE>")))]) > +}) > > ;; Add instructions > > --- gcc/testsuite/gcc.dg/pr99104.c.jj 2021-02-16 11:47:08.793255200 +0100 > +++ gcc/testsuite/gcc.dg/pr99104.c 2021-02-16 11:47:08.793255200 +0100 > @@ -0,0 +1,15 @@ > +/* PR target/99104 */ > +/* { dg-do compile { target int128 } } */ > +/* { dg-options "-O2 -fsel-sched-pipelining -fselective-scheduling2 > -funroll-loops" } */ > + > +__int128 a; > +int b; > +int foo (void); > + > +int __attribute__ ((simd)) > +bar (void) > +{ > + a = ~a; > + if (foo ()) > + b = 0; > +} > > > Jakub >