Hi!

The following patch fixes two issues in the new rpad pass. One is that the insertion at the start of a basic block didn't work properly if the basic block didn't contain any non-NOTE/non-DEBUG_INSN instructions. In that case next_nonnote_nondebug_insn happily walks on into another basic block, so the new instruction can end up in between basic blocks or in a different basic block from the one we wanted to emit it into. I believe we want to emit it after the CODE_LABEL, after NOTE_INSN_BASIC_BLOCK and, if possible, after any debug insns in there, so the patch emits it before the first normal insn in the bb if there is one, or otherwise after BB_END (thus extending BB_END).
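
To make the intended placement concrete, here is a small standalone model of the walk. It is only a sketch: struct insn, struct block, link_after, emit_at_block_start and the two demo functions are hypothetical toy stand-ins, not GCC's rtx_insn/basic_block or its emit_insn_* routines, and moving the toy block's head in the insert-before case is a choice of the model only.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the insn chain; these are NOT GCC's real types.  */
enum kind { K_LABEL, K_NOTE, K_DEBUG, K_REAL };

struct insn
{
  enum kind kind;
  struct insn *prev, *next;
};

struct block
{
  struct insn *head, *end;	/* Play the role of BB_HEAD / BB_END.  */
};

/* Splice NEW_INSN into the chain right after AFTER.  */
static void
link_after (struct insn *after, struct insn *new_insn)
{
  new_insn->prev = after;
  new_insn->next = after->next;
  if (after->next)
    after->next->prev = new_insn;
  after->next = new_insn;
}

/* Put NEW_INSN before the first real insn of BB, or after BB's end if
   BB has no real insn.  The walk never looks past BB's end.  */
static void
emit_at_block_start (struct block *bb, struct insn *new_insn)
{
  struct insn *insn = bb->head;

  assert (bb->head && bb->end);
  while (insn && insn->kind != K_REAL)
    {
      if (insn == bb->end)
	{
	  insn = NULL;		/* No real insn in this block.  */
	  break;
	}
      insn = insn->next;
    }

  if (insn == bb->head)
    {
      /* The head itself is a real insn: go in front of it.  The toy
	 model moves the head so the block still covers the new insn.  */
      new_insn->prev = insn->prev;
      new_insn->next = insn;
      if (insn->prev)
	insn->prev->next = new_insn;
      insn->prev = new_insn;
      bb->head = new_insn;
    }
  else
    {
      struct insn *after = insn ? insn->prev : bb->end;
      link_after (after, new_insn);
      if (after == bb->end)
	bb->end = new_insn;	/* Extend the block.  */
    }
}

/* Block 1 is "label, note" with no real insn and block 2 follows it.
   Returns 1 if the new insn stays inside block 1.  */
static int
demo_empty_block (void)
{
  struct insn l1 = { K_LABEL, NULL, NULL }, n1 = { K_NOTE, NULL, NULL };
  struct insn l2 = { K_LABEL, NULL, NULL }, r2 = { K_REAL, NULL, NULL };
  struct insn set = { K_REAL, NULL, NULL };
  struct block bb1 = { &l1, &l1 }, bb2 = { &l2, &l2 };

  link_after (&l1, &n1);
  bb1.end = &n1;
  link_after (&n1, &l2);
  link_after (&l2, &r2);
  bb2.end = &r2;

  emit_at_block_start (&bb1, &set);
  return bb1.end == &set && n1.next == &set && set.next == &l2
	 && bb2.head == &l2;
}

/* Block is "label, debug, real".  Returns 1 if the new insn lands
   between the debug insn and the real insn without moving head/end.  */
static int
demo_block_with_insn (void)
{
  struct insn l = { K_LABEL, NULL, NULL }, d = { K_DEBUG, NULL, NULL };
  struct insn r = { K_REAL, NULL, NULL }, set = { K_REAL, NULL, NULL };
  struct block bb = { &l, &l };

  link_after (&l, &d);
  link_after (&d, &r);
  bb.end = &r;

  emit_at_block_start (&bb, &set);
  return d.next == &set && set.next == &r && bb.head == &l && bb.end == &r;
}
```

Calling demo_empty_block () covers the case from the PR (a block with only a label and a note), and demo_block_with_insn () covers the usual case where a real insn exists; both are meant to return 1 in this model.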

Another issue is that it is quite weird/dangerous to add the v4sf_const0 pseudo uses in lots of places in the IL, register those changes with df, then do df_analyze with different flags and finally emit the setter. I understand the goal was not to do df_analyze etc. in the usual case where there are no instructions that need this treatment. This patch does the df_analyze at the spot we find the first insn, but before we actually change that instruction, so the changes are after the df_analyze.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-02-25  Jakub Jelinek  <ja...@redhat.com>

	PR target/89474
	* config/i386/i386.c (remove_partial_avx_dependency): Call
	df_analyze etc. before creation of the v4sf_const0 pseudo, rather
	than after changing possibly many instructions to use that pseudo.
	Fix up insertion of v4sf_const0 setter at the start of bb.

	* gcc.target/i386/pr89474.c: New test.

--- gcc/config/i386/i386.c.jj	2019-02-22 23:02:47.805117610 +0100
+++ gcc/config/i386/i386.c	2019-02-25 14:20:05.793608879 +0100
@@ -2835,7 +2835,14 @@ remove_partial_avx_dependency (void)
 	    continue;
 
 	  if (!v4sf_const0)
-	    v4sf_const0 = gen_reg_rtx (V4SFmode);
+	    {
+	      calculate_dominance_info (CDI_DOMINATORS);
+	      df_set_flags (DF_DEFER_INSN_RESCAN);
+	      df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
+	      df_md_add_problem ();
+	      df_analyze ();
+	      v4sf_const0 = gen_reg_rtx (V4SFmode);
+	    }
 
 	  /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> DF,
 	     SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and
@@ -2883,12 +2890,6 @@ remove_partial_avx_dependency (void)
 
   if (v4sf_const0)
     {
-      calculate_dominance_info (CDI_DOMINATORS);
-      df_set_flags (DF_DEFER_INSN_RESCAN);
-      df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
-      df_md_add_problem ();
-      df_analyze ();
-
       /* (Re-)discover loops so that bb->loop_father can be used in the
 	 analysis below.  */
       loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
@@ -2904,11 +2905,23 @@ remove_partial_avx_dependency (void)
 	bb = get_immediate_dominator (CDI_DOMINATORS,
 				      bb->loop_father->header);
 
-      insn = BB_HEAD (bb);
-      if (!NONDEBUG_INSN_P (insn))
-	insn = next_nonnote_nondebug_insn (insn);
       set = gen_rtx_SET (v4sf_const0, CONST0_RTX (V4SFmode));
-      set_insn = emit_insn_before (set, insn);
+
+      insn = BB_HEAD (bb);
+      while (insn && !NONDEBUG_INSN_P (insn))
+	{
+	  if (insn == BB_END (bb))
+	    {
+	      insn = NULL;
+	      break;
+	    }
+	  insn = NEXT_INSN (insn);
+	}
+      if (insn == BB_HEAD (bb))
+	set_insn = emit_insn_before (set, insn);
+      else
+	set_insn = emit_insn_after (set,
+				    insn ? PREV_INSN (insn) : BB_END (bb));
       df_insn_rescan (set_insn);
       df_process_deferred_rescans ();
       loop_optimizer_finalize ();
--- gcc/testsuite/gcc.target/i386/pr89474.c.jj	2019-02-25 14:21:51.651867104 +0100
+++ gcc/testsuite/gcc.target/i386/pr89474.c	2019-02-25 14:21:34.373151405 +0100
@@ -0,0 +1,14 @@
+/* PR target/89474 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+int a;
+void foo (double);
+int baz (void);
+
+void
+bar (void)
+{
+  while (baz ())
+    foo (a);
+}

	Jakub