2016-06-21 15:48 GMT+03:00 H.J. Lu <hjl.to...@gmail.com>: > On Mon, Jun 20, 2016 at 11:53 AM, H.J. Lu <hjl.to...@gmail.com> wrote: >> On Mon, Jun 20, 2016 at 10:31 AM, Ilya Enkovich <enkovich....@gmail.com> >> wrote: >>> On 20 Jun 09:45, H.J. Lu wrote: >>>> On Mon, Jun 20, 2016 at 7:30 AM, Ilya Enkovich <enkovich....@gmail.com> >>>> wrote: >>>> > 2016-06-20 16:39 GMT+03:00 Uros Bizjak <ubiz...@gmail.com>: >>>> >> On Mon, Jun 20, 2016 at 1:55 PM, H.J. Lu <hongjiu...@intel.com> wrote: >>>> >>> TImode register referenced in debug insn can be converted to V1TImode >>>> >>> by scalar to vector optimization. We need to convert a debug insn if >>>> >>> it has a variable in a TImode register. >>>> >>> >>>> >>> Tested on x86-64. OK for trunk? >>>> >> >>>> >> Ilya, does this approach look good to you? Also, does DImode STV >>>> >> conversion need similar handling of debug insns? >>>> > >>>> > DImode conversion doesn't change register mode (i.e. never calls >>>> > PUT_MODE for registers). That keeps debug instructions valid. >>>> > >>>> > Overall I don't like the idea of having debug insns in candidates >>>> > set and in chains. Looks like it is possible to have a chain >>>> > consisting of a debug insn only which is weird (otherwise I don't >>>> > see why we may return false in timode_scalar_chain::convert_insn). >>>> >>>> Yes, it can happen: >>>> >>>> (insn 11 8 12 2 (parallel [ >>>> (set (reg/v:TI 91 [ <retval> ]) >>>> (plus:TI (reg/v:TI 92 [ a ]) >>>> (reg/v:TI 96 [ b ]))) >>>> (clobber (reg:CC 17 flags)) >>>> ]) y.i:5 210 {*addti3_doubleword} >>>> (expr_list:REG_UNUSED (reg:CC 17 flags) >>>> (nil))) >>>> (debug_insn 12 11 13 2 (var_location:TI w (reg/v:TI 91 [ <retval> ])) >>>> y.i:5 -1 >>>> (nil)) >>>> >>>> >>>> > What about other possible register uses? If debug insns are added >>>> > to candidates then NONDEBUG_INSN_P check for uses in >>>> > timode_check_non_convertible_regs becomes invalid, right? >>>> >>>> Debug insn has no impact on STV decision. We just need to convert >>>> register referenced in debug insn from V1TImode to TImode in >>>> timode_scalar_chain::convert_insn. >>>> >>>> > If we have (or want) to fix some register uses then it's probably >>>> > would be better to visit register uses when we convert its mode >>>> > and make required fix-ups. It seems better to me to not involve >>>> > debug insns in analysis phase. >>>> >>>> Here is the updated patch to add debug insn, which references the >>>> TImode register which will be converted to V1TImode to queue. >>>> I am testing it now. >>>> >>> >>> You still count and dump debug insns as optimized ones. Also we >>> try to use virtual functions to cover differences in DI and TI >>> optimizations and introducing additional TARGET_64BIT in common >>> STV code is undesirable. >>> >>> Also your conversion now depends on instructions processing order. >>> You will fail to process debug insn before non-debug ones. Required >>> order is not guaranteed because processing depends on instruction >>> UIDs only. >>> >>> I propose to modify transformation phase only like in the patch >>> (untested) below. I rely on your code which assumes the only >>> possible usage in debug insn is VAR_LOCATION. >>> >>> Thanks, >>> Ilya >>> -- >>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >>> index c5e5e12..ec955f0 100644 >>> --- a/gcc/config/i386/i386.c >>> +++ b/gcc/config/i386/i386.c >>> @@ -3139,6 +3139,7 @@ class timode_scalar_chain : public scalar_chain >>> >>> private: >>> void mark_dual_mode_def (df_ref def); >>> + void fix_debug_reg_uses (rtx reg); >>> void convert_insn (rtx_insn *insn); >>> /* We don't convert registers to difference size. */ >>> void convert_registers () {} >>> @@ -3790,6 +3791,34 @@ dimode_scalar_chain::convert_insn (rtx_insn *insn) >>> df_insn_rescan (insn); >>> } >>> >>> +/* Fix uses of converted REG in debug insns. */ >>> + >>> +void >>> +timode_scalar_chain::fix_debug_reg_uses (rtx reg) >>> +{ >>> + df_ref ref; >>> + for (ref = DF_REG_USE_CHAIN (REGNO (reg)); ref; ref = DF_REF_NEXT_REG >>> (ref)) >>> + { >>> + rtx_insn *insn = DF_REF_INSN (ref); >>> + >>> + if (DEBUG_INSN_P (insn)) >>> + { >>> + /* It must be a debug insn with a TImode variable in register. */ >>> + rtx val = PATTERN (insn); >>> + gcc_assert (GET_MODE (val) == TImode >>> + && GET_CODE (val) == VAR_LOCATION); >>> + rtx loc = PAT_VAR_LOCATION_LOC (val); >>> + gcc_assert (REG_P (loc) >>> + && GET_MODE (loc) == V1TImode); >>> + /* Convert V1TImode register, which has been updated by a SET >>> + insn before, to SUBREG TImode. */ >>> + PAT_VAR_LOCATION_LOC (val) = gen_rtx_SUBREG (TImode, loc, 0); >>> + df_insn_rescan (insn); >>> + return; >>> + } >>> + } >>> +} >>> + >>> /* Convert INSN from TImode to V1T1mode. */ >>> >>> void >>> @@ -3806,8 +3835,10 @@ timode_scalar_chain::convert_insn (rtx_insn *insn) >>> rtx tmp = find_reg_equal_equiv_note (insn); >>> if (tmp) >>> PUT_MODE (XEXP (tmp, 0), V1TImode); >>> + PUT_MODE (dst, V1TImode); >>> + fix_debug_reg_uses (dst); >>> } >>> - /* FALLTHRU */ >>> + break; >>> case MEM: >>> PUT_MODE (dst, V1TImode); >>> break; >> >> Won't be it waste CPU cycles when there is no debug insin >> which use TImode registers? >> > > Here is the updated patch. Tested on x86-64. OK for trunk?
I'm OK with this version. Thanks, Ilya > > Thanks. > > > -- > H.J.