Hi Richard > -----Original Message----- > From: Richard Sandiford <richard.sandif...@arm.com> > Sent: 11 November 2020 17:52 > To: Sudakshina Das <sudi....@arm.com> > Cc: Wilco Dijkstra <wilco.dijks...@arm.com>; gcc-patches@gcc.gnu.org; > Kyrylo Tkachov <kyrylo.tkac...@arm.com>; Richard Earnshaw > <richard.earns...@arm.com> > Subject: Re: [PATCH] aarch64: Add backend support for expanding > __builtin_memset > > Sudakshina Das <sudi....@arm.com> writes: > > Apologies for the delay. I have attached another version of the patch. > > I have disabled the test cases for ILP32. This is only because > > function body check fails because there is an addition unsigned extension > instruction for src pointer in > > every test (uxtw x0, w0). The actual inlining is not different. > > Yeah, agree that's the best way of handling the ILP32 difference. > > > […] > > +/* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant. > Without > > + -mstrict-align, make decisions in "setmem". Otherwise follow a sensible > > + default: when optimizing for size adjust the ratio to account for > > +the > > nit: should just be one space after “:” > > > […] > > @@ -21289,6 +21292,134 @@ aarch64_expand_cpymem (rtx *operands) > > return true; > > } > > > > +/* Like aarch64_copy_one_block_and_progress_pointers, except for > memset where > > + *src is a register we have created with the duplicated value to be > > +set. */ > > “*src” -> SRC > since there's no dereference now > > > […] > > + /* In case we are optimizing for size or if the core does not > > + want to use STP Q regs, lower the max_set_size. */ > > + max_set_size = (!speed_p > > + || (aarch64_tune_params.extra_tuning_flags > > + & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)) > > + ? max_set_size/2 : max_set_size; > > Formatting nit: should be a space either side of “/”. > > > + while (n > 0) > > + { > > + /* Find the largest mode in which to do the copy in without > > + over writing. */ > > s/in without/without/ > > > + opt_scalar_int_mode mode_iter; > > + FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT) > > + if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit)) > > + cur_mode = mode_iter.require (); > > + > > + gcc_assert (cur_mode != BLKmode); > > + > > + mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant (); > > + aarch64_set_one_block_and_progress_pointer (src, &dst, > > + cur_mode); > > + > > + n -= mode_bits; > > + > > + /* Do certain trailing copies as overlapping if it's going to be > > + cheaper. i.e. less instructions to do so. For instance doing a 15 > > + byte copy it's more efficient to do two overlapping 8 byte copies > than > > + 8 + 4 + 2 + 1. */ > > + if (n > 0 && n < copy_limit / 2) > > + { > > + next_mode = smallest_mode_for_size (n, MODE_INT); > > + int n_bits = GET_MODE_BITSIZE (next_mode).to_constant (); > > Sorry for the runaround, but looking at this again, I'm a bit worried that we > only indirectly test that n_bits is within the length of the original set. I > guess > it is because if n < copy_limit / 2 then n < mode_bits, and so n_bits will > never > exceed mode_bits. I think it might be worth adding an assert to make that > “clearer” (maybe only to me, probably obvious to everyone else): > > gcc_assert (n_bits <= mode_bits); > > OK with those changes, thanks.
Thank you! Committed as 54bbde5 with those changes. Sudi > > Richard > > > + dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT); > > + n = n_bits; > > + } > > + } > > + > > + return true; > > +} > > + > > + > > /* Split a DImode store of a CONST_INT SRC to MEM DST as two > > SImode stores. Handle the case when the constant has identical > > bottom and top halves. This is beneficial when the two stores can > > be