Jakub Jelinek <ja...@redhat.com> writes:
> On Tue, Feb 16, 2021 at 03:03:56PM +0000, Richard Sandiford via Gcc-patches 
> wrote:
>> > On Tue, Feb 16, 2021 at 01:09:43PM +0000, Richard Sandiford wrote:
>> >> Can I put in a plea to put this in recog.[hc], and possibly also make
>> >> it a copy constructor for recog_data_d?  I can't think of any legitimate
>> >> cases in which we'd want to copy the whole structure, instead of just
>> >> the active parts.
>> >
>> > Good idea.  So like this?
>> >
>> > 2021-02-16  Jakub Jelinek  <ja...@redhat.com>
>> >
>> >    PR target/99104
>> >    * recog.h (recog_data_d::recog_data_d ()): New defaulted default ctor.
>> >    (recog_data_d::recog_data_d (const recog_data_d &)): Declare.
>> >    (recog_data_d::operator = (const recog_data_d &)): Declare.
>> >    * recog.c (recog_data_d::operator = (const recog_data_d &)): New
>> >    assignment operator.
>> >    (recog_data_d::recog_data_d (const recog_data_d &)): New copy ctor.
>> 
>> OK, thanks.
>
> Unfortunately, that caused regressions:
> +FAIL: gcc.target/i386/pad-3.c (internal compiler error)
> +FAIL: gcc.target/i386/pad-3.c (test for excess errors)
> +UNRESOLVED: gcc.target/i386/pad-3.c scan-assembler-not nop
> +UNRESOLVED: gcc.target/i386/pad-3.c scan-assembler-not rep
> +FAIL: gcc.target/i386/pr93611.c (internal compiler error)
> +FAIL: gcc.target/i386/pr93611.c (test for excess errors)
>
> The thing is that both generated split_insns and peephole2_insns functions
> start with:
>   rtx * const operands ATTRIBUTE_UNUSED = &recog_data.operand[0];
> ...
>   recog_data.insn = NULL;
> and then mix various tests with operands[???] = ...; statements.
> So, the recog.[ch] added copy ctor and assignment operator can be perhaps
> useful for other cases but not for the splitter/peephole2 conditions.
>
> So, either we can still use those and then copy rest of operands array
> (there is nothing that tells how many are valid), like in the patch below
> that passed x86_64-linux and i686-linux bootstrap and regtests, i.e.
>   /* Remember recog_data content.  */
>   struct recog_data_d recog_data_save = recog_data;
>   /* While recog_data normally contains just n_operands operands in
>      operand array, during split_all_insns or peephole2 it can contain
>      arbitrary number of operands that are being matched.  */
>   for (int i = recog_data.n_operands; i < MAX_RECOG_OPERANDS; i++)
>     recog_data_save.operand[i] = recog_data.operand[i];
>
>   dist_define = distance_non_agu_define (regno1, regno2, insn);
>   dist_use = distance_agu_use (regno0, insn);
>
>   /* distance_non_agu_define can call get_attr_type which can call
>      recog_memoized, restore recog_data back to previous content.  */
>   recog_data = recog_data_save;
>   for (int i = recog_data_save.n_operands; i < MAX_RECOG_OPERANDS; i++)
>     recog_data.operand[i] = recog_data_save.operand[i];
> or we could save less by:
>   rtx operands[MAX_RECOG_OPERANDS];
>   memcpy (operands, recog_data.operand, sizeof operands);
>
>   dist_define = distance_non_agu_define (regno1, regno2, insn);
>   dist_use = distance_agu_use (regno0, insn);
>
>   memcpy (recog_data.operand, operands, sizeof operands);
>   recog_data.insn = NULL;
> and up to you whether the recog.[hc] changes should be kept or not.

Hmm.  I think that just means that the optimisation performed by
the copy constructor isn't valid in practice (even if it should be
in principle).  Guess this is the curse of manipulating data structures
directly rather than through well-defined interfaces :-)

TBH, since the obvious thing didn't work, I think we should keep it
simple and take the hit of the full copy.  It seems less error-prone
and I'd be surprised if the overhead was noticeable in practice.
It also means that if we do manage to optimise the copy in future,
all callers would automatically benefit.

I.e. my preference would be to go for your original patch without
the recog.[hc] bits.

Thanks,
Richard


>
> In any case, I'm surprised why what the splitters did so far worked at all,
> because it isn't always the case that the split_insns or peephole2_insns
> filled operands don't really need to match what extract_insn would fill.
>
> Sorry for this taking so long.
>
> 2021-02-17  Jakub Jelinek  <ja...@redhat.com>
>
>       PR target/99104
>       * recog.h (recog_data_d::recog_data_d ()): New defaulted default ctor.
>       (recog_data_d::recog_data_d (const recog_data_d &)): Declare.
>       (recog_data_d::operator = (const recog_data_d &)): Declare.
>       * recog.c (recog_data_d::operator = (const recog_data_d &)): New
>       assignment operator.
>       (recog_data_d::recog_data_d (const recog_data_d &)): New copy ctor.
>       * config/i386/i386.c (distance_non_agu_define): Don't call
>       extract_insn_cached here.
>       (ix86_lea_outperforms): Save and restore recog_data around call
>       to distance_non_agu_define and distance_agu_use.
>       (ix86_ok_to_clobber_flags): Remove.
>       (ix86_avoid_lea_for_add): Don't call ix86_ok_to_clobber_flags.
>       (ix86_avoid_lea_for_addr): Likewise.  Adjust function comment.
>       * config/i386/i386.m (*lea<mode>): Change from define_insn_and_split
>       into define_insn.  Move the splitting to define_peephole2 and
>       check there using peep2_regno_dead_p if FLAGS_REG is dead.
>
>       * gcc.dg/pr99104.c: New test.
>
> --- gcc/recog.h.jj    2021-01-15 19:04:42.436445517 +0100
> +++ gcc/recog.h       2021-02-16 14:26:00.384934711 +0100
> @@ -372,6 +372,12 @@ struct recog_data_d
>  
>    /* In case we are caching, hold insn data was generated for.  */
>    rtx_insn *insn;
> +
> +  recog_data_d () = default;
> +  /* Copy constructor and assignment operator, so that when copying the
> +     structure we copy only the active elements.  */
> +  recog_data_d (const recog_data_d &);
> +  recog_data_d &operator = (const recog_data_d &);
>  };
>  
>  extern struct recog_data_d recog_data;
> --- gcc/recog.c.jj    2021-02-16 08:57:21.277960594 +0100
> +++ gcc/recog.c       2021-02-16 14:26:42.235464536 +0100
> @@ -109,7 +109,41 @@ init_recog (void)
>  {
>    volatile_ok = 1;
>  }
> +
> +/* recog_data_d assignment operator, so that recog_data_d copying
> +   only copies the active elements.  */
> +
> +recog_data_d &
> +recog_data_d::operator = (const recog_data_d &src)
> +{
> +  n_operands = src.n_operands;
> +  n_dups = src.n_dups;
> +  n_alternatives = src.n_alternatives;
> +  is_asm = src.is_asm;
> +  insn = src.insn;
> +  for (int i = 0; i < src.n_operands; i++)
> +    {
> +      operand[i] = src.operand[i];
> +      operand_loc[i] = src.operand_loc[i];
> +      constraints[i] = src.constraints[i];
> +      is_operator[i] = src.is_operator[i];
> +      operand_mode[i] = src.operand_mode[i];
> +      operand_type[i] = src.operand_type[i];
> +    }
> +  for (int i = 0; i < src.n_dups; i++)
> +    {
> +      dup_loc[i] = src.dup_loc[i];
> +      dup_num[i] = src.dup_num[i];
> +    }
> +  return *this;
> +}
>  
> +/* recog_data_d copy constructor, so that recog_data_d construction
> +   only copies the active elements.  */
> +recog_data_d::recog_data_d (const recog_data_d &src)
> +{
> +  *this = src;
> +}
>  
>  /* Return true if labels in asm operands BODY are LABEL_REFs.  */
>  
> --- gcc/config/i386/i386.c.jj 2021-02-16 08:58:55.197890195 +0100
> +++ gcc/config/i386/i386.c    2021-02-17 00:17:18.997017365 +0100
> @@ -14754,11 +14754,6 @@ distance_non_agu_define (unsigned int re
>       }
>      }
>  
> -  /* get_attr_type may modify recog data.  We want to make sure
> -     that recog data is valid for instruction INSN, on which
> -     distance_non_agu_define is called.  INSN is unchanged here.  */
> -  extract_insn_cached (insn);
> -
>    if (!found)
>      return -1;
>  
> @@ -14928,17 +14923,22 @@ ix86_lea_outperforms (rtx_insn *insn, un
>        return true;
>      }
>  
> -  rtx_insn *rinsn = recog_data.insn;
> +  /* Remember recog_data content.  */
> +  struct recog_data_d recog_data_save = recog_data;
> +  /* While recog_data normally contains just n_operands operands in
> +     operand array, during split_all_insns or peephole2 it can contain
> +     arbitrary number of operands that are being matched.  */
> +  for (int i = recog_data.n_operands; i < MAX_RECOG_OPERANDS; i++)
> +    recog_data_save.operand[i] = recog_data.operand[i];
>  
>    dist_define = distance_non_agu_define (regno1, regno2, insn);
>    dist_use = distance_agu_use (regno0, insn);
>  
> -  /* distance_non_agu_define can call extract_insn_cached.  If this function
> -     is called from define_split conditions, that can break insn splitting,
> -     because split_insns works by clearing recog_data.insn and then modifying
> -     recog_data.operand array and match the various split conditions.  */
> -  if (recog_data.insn != rinsn)
> -    recog_data.insn = NULL;
> +  /* distance_non_agu_define can call get_attr_type which can call
> +     recog_memoized, restore recog_data back to previous content.  */
> +  recog_data = recog_data_save;
> +  for (int i = recog_data_save.n_operands; i < MAX_RECOG_OPERANDS; i++)
> +    recog_data.operand[i] = recog_data_save.operand[i];
>  
>    if (dist_define < 0 || dist_define >= LEA_MAX_STALL)
>      {
> @@ -14968,38 +14968,6 @@ ix86_lea_outperforms (rtx_insn *insn, un
>    return dist_define >= dist_use;
>  }
>  
> -/* Return true if it is legal to clobber flags by INSN and
> -   false otherwise.  */
> -
> -static bool
> -ix86_ok_to_clobber_flags (rtx_insn *insn)
> -{
> -  basic_block bb = BLOCK_FOR_INSN (insn);
> -  df_ref use;
> -  bitmap live;
> -
> -  while (insn)
> -    {
> -      if (NONDEBUG_INSN_P (insn))
> -     {
> -       FOR_EACH_INSN_USE (use, insn)
> -         if (DF_REF_REG_USE_P (use) && DF_REF_REGNO (use) == FLAGS_REG)
> -           return false;
> -
> -       if (insn_defines_reg (FLAGS_REG, INVALID_REGNUM, insn))
> -         return true;
> -     }
> -
> -      if (insn == BB_END (bb))
> -     break;
> -
> -      insn = NEXT_INSN (insn);
> -    }
> -
> -  live = df_get_live_out(bb);
> -  return !REGNO_REG_SET_P (live, FLAGS_REG);
> -}
> -
>  /* Return true if we need to split op0 = op1 + op2 into a sequence of
>     move and add to avoid AGU stalls.  */
>  
> @@ -15012,10 +14980,6 @@ ix86_avoid_lea_for_add (rtx_insn *insn,
>    if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun))
>      return false;
>  
> -  /* Check it is correct to split here.  */
> -  if (!ix86_ok_to_clobber_flags(insn))
> -    return false;
> -
>    regno0 = true_regnum (operands[0]);
>    regno1 = true_regnum (operands[1]);
>    regno2 = true_regnum (operands[2]);
> @@ -15051,7 +15015,7 @@ ix86_use_lea_for_mov (rtx_insn *insn, rt
>  }
>  
>  /* Return true if we need to split lea into a sequence of
> -   instructions to avoid AGU stalls. */
> +   instructions to avoid AGU stalls during peephole2. */
>  
>  bool
>  ix86_avoid_lea_for_addr (rtx_insn *insn, rtx operands[])
> @@ -15071,10 +15035,6 @@ ix86_avoid_lea_for_addr (rtx_insn *insn,
>         && REG_P (XEXP (operands[1], 0))))
>      return false;
>  
> -  /* Check if it is OK to split here.  */
> -  if (!ix86_ok_to_clobber_flags (insn))
> -    return false;
> -
>    ok = ix86_decompose_address (operands[1], &parts);
>    gcc_assert (ok);
>  
> --- gcc/config/i386/i386.md.jj        2021-01-13 10:12:07.036506101 +0100
> +++ gcc/config/i386/i386.md   2021-02-16 13:48:13.650317709 +0100
> @@ -5176,7 +5176,7 @@ (define_expand "floatunsdidf2"
>  
>  ;; Load effective address instructions
>  
> -(define_insn_and_split "*lea<mode>"
> +(define_insn "*lea<mode>"
>    [(set (match_operand:SWI48 0 "register_operand" "=r")
>       (match_operand:SWI48 1 "address_no_seg_operand" "Ts"))]
>    "ix86_hardreg_mov_ok (operands[0], operands[1])"
> @@ -5189,38 +5189,36 @@ (define_insn_and_split "*lea<mode>"
>    else 
>      return "lea{<imodesuffix>}\t{%E1, %0|%0, %E1}";
>  }
> -  "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
> +  [(set_attr "type" "lea")
> +   (set (attr "mode")
> +     (if_then_else
> +       (match_operand 1 "SImode_address_operand")
> +       (const_string "SI")
> +       (const_string "<MODE>")))])
> +
> +(define_peephole2
> +  [(set (match_operand:SWI48 0 "register_operand")
> +     (match_operand:SWI48 1 "address_no_seg_operand"))]
> +  "ix86_hardreg_mov_ok (operands[0], operands[1])
> +   && peep2_regno_dead_p (0, FLAGS_REG)
> +   && ix86_avoid_lea_for_addr (peep2_next_insn (0), operands)"
>    [(const_int 0)]
>  {
>    machine_mode mode = <MODE>mode;
> -  rtx pat;
> -
> -  /* ix86_avoid_lea_for_addr re-recognizes insn and may
> -     change operands[] array behind our back.  */
> -  pat = PATTERN (curr_insn);
> -
> -  operands[0] = SET_DEST (pat);
> -  operands[1] = SET_SRC (pat);
>  
>    /* Emit all operations in SImode for zero-extended addresses.  */
>    if (SImode_address_operand (operands[1], VOIDmode))
>      mode = SImode;
>  
> -  ix86_split_lea_for_addr (curr_insn, operands, mode);
> +  ix86_split_lea_for_addr (peep2_next_insn (0), operands, mode);
>  
>    /* Zero-extend return register to DImode for zero-extended addresses.  */
>    if (mode != <MODE>mode)
> -    emit_insn (gen_zero_extendsidi2
> -            (operands[0], gen_lowpart (mode, operands[0])));
> +    emit_insn (gen_zero_extendsidi2 (operands[0],
> +                                  gen_lowpart (mode, operands[0])));
>  
>    DONE;
> -}
> -  [(set_attr "type" "lea")
> -   (set (attr "mode")
> -     (if_then_else
> -       (match_operand 1 "SImode_address_operand")
> -       (const_string "SI")
> -       (const_string "<MODE>")))])
> +})
>  
>  ;; Add instructions
>  
> --- gcc/testsuite/gcc.dg/pr99104.c.jj 2021-02-16 11:47:08.793255200 +0100
> +++ gcc/testsuite/gcc.dg/pr99104.c    2021-02-16 11:47:08.793255200 +0100
> @@ -0,0 +1,15 @@
> +/* PR target/99104 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O2 -fsel-sched-pipelining -fselective-scheduling2 
> -funroll-loops" } */
> +
> +__int128 a;
> +int b;
> +int foo (void);
> +
> +int __attribute__ ((simd))
> +bar (void)
> +{
> +  a = ~a;
> +  if (foo ())
> +    b = 0;
> +}
>
>       Jakub

Reply via email to