Hi Richard

> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: 11 November 2020 17:52
> To: Sudakshina Das <sudi....@arm.com>
> Cc: Wilco Dijkstra <wilco.dijks...@arm.com>; gcc-patches@gcc.gnu.org;
> Kyrylo Tkachov <kyrylo.tkac...@arm.com>; Richard Earnshaw
> <richard.earns...@arm.com>
> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> __builtin_memset
> 
> Sudakshina Das <sudi....@arm.com> writes:
> > Apologies for the delay. I have attached another version of the patch.
> > I have disabled the test cases for ILP32. This is only because
> > function body check fails because there is an addition unsigned extension
> instruction for src pointer in
> > every test (uxtw    x0, w0). The actual inlining is not different.
> 
> Yeah, agree that's the best way of handling the ILP32 difference.
> 
> > […]
> > +/* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant.
> Without
> > +   -mstrict-align, make decisions in "setmem".  Otherwise follow a sensible
> > +   default:  when optimizing for size adjust the ratio to account for
> > +the
> 
> nit: should just be one space after “:”
> 
> > […]
> > @@ -21289,6 +21292,134 @@ aarch64_expand_cpymem (rtx *operands)
> >    return true;
> >  }
> >
> > +/* Like aarch64_copy_one_block_and_progress_pointers, except for
> memset where
> > +   *src is a register we have created with the duplicated value to be
> > +set.  */
> 
> “*src” -> SRC
> since there's no dereference now
> 
> > […]
> > +  /* In case we are optimizing for size or if the core does not
> > +     want to use STP Q regs, lower the max_set_size.  */
> > +  max_set_size = (!speed_p
> > +             || (aarch64_tune_params.extra_tuning_flags
> > +                 & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
> > +             ? max_set_size/2 : max_set_size;
> 
> Formatting nit: should be a space either side of “/”.
> 
> > +  while (n > 0)
> > +    {
> > +      /* Find the largest mode in which to do the copy in without
> > +    over writing.  */
> 
> s/in without/without/
> 
> > +      opt_scalar_int_mode mode_iter;
> > +      FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> > +   if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
> > +     cur_mode = mode_iter.require ();
> > +
> > +      gcc_assert (cur_mode != BLKmode);
> > +
> > +      mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
> > +      aarch64_set_one_block_and_progress_pointer (src, &dst,
> > + cur_mode);
> > +
> > +      n -= mode_bits;
> > +
> > +      /* Do certain trailing copies as overlapping if it's going to be
> > +    cheaper.  i.e. less instructions to do so.  For instance doing a 15
> > +    byte copy it's more efficient to do two overlapping 8 byte copies
> than
> > +    8 + 4 + 2 + 1.  */
> > +      if (n > 0 && n < copy_limit / 2)
> > +   {
> > +     next_mode = smallest_mode_for_size (n, MODE_INT);
> > +     int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
> 
> Sorry for the runaround, but looking at this again, I'm a bit worried that we
> only indirectly test that n_bits is within the length of the original set.  I 
> guess
> it is because if n < copy_limit / 2 then n < mode_bits, and so n_bits will 
> never
> exceed mode_bits.  I think it might be worth adding an assert to make that
> “clearer” (maybe only to me, probably obvious to everyone else):
> 
>         gcc_assert (n_bits <= mode_bits);
> 
> OK with those changes, thanks.

Thank you! Committed as 54bbde5 with those changes.

Sudi

> 
> Richard
> 
> > +     dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
> > +     n = n_bits;
> > +   }
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +
> >  /* Split a DImode store of a CONST_INT SRC to MEM DST as two
> >     SImode stores.  Handle the case when the constant has identical
> >     bottom and top halves.  This is beneficial when the two stores can
> > be

Reply via email to