On Thu, Jul 09, 2020 at 04:33:46PM +0800, Hongtao Liu via Gcc-patches wrote:
> +static void
> +replace_constant_pool_with_broadcast (rtx_insn* insn)
> +{
> +  subrtx_ptr_iterator::array_type array;
> +  FOR_EACH_SUBRTX_PTR (iter, array, &PATTERN (insn), ALL)
> +    {
> +      rtx *loc = *iter;
> +      rtx x = *loc;
> +      rtx broadcast_mem, vec_dup, constant, first;
> +      machine_mode mode;
> +      if (GET_CODE (x) != MEM

MEM_P

> +       || GET_CODE (XEXP (x, 0)) != SYMBOL_REF

SYMBOL_REF_P

> +       || !CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)))
> +     continue;
> +
> +      mode = GET_MODE (x);
> +      if (!VECTOR_MODE_P (mode))
> +     return;
> +
> +      constant = get_pool_constant (XEXP (x, 0));
> +      first = XVECEXP (constant, 0, 0);

Shouldn't this verify first that GET_CODE (constant) == CONST_VECTOR
and punt otherwise?

> +      broadcast_mem = force_const_mem (GET_MODE_INNER (mode), first);
> +      vec_dup = gen_rtx_VEC_DUPLICATE (mode, broadcast_mem);
> +      *loc = vec_dup;
> +      INSN_CODE (insn) = -1;
> +      /* Revert change if there's no corresponding pattern.  */
> +      if (recog_memoized (insn) < 0)
> +             {
> +               *loc = x;
> +               recog_memoized (insn);
> +             }

The usual way of doing this would be through
  validate_change (insn, loc, vec_dup, 0);

Also, isn't the pass also useful for TARGET_AVX and above (but in that case
only if it is a simple memory load)?  Or are avx/avx2 broadcast slower than
full vector loads?

As Jeff wrote, I wonder if when successfully replacing those pool constants
the old constant pool entries will be omitted.

Another thing I wonder about is whether more analysis shouldn't be used.
E.g. if the constant pool entry is already emitted into .rodata anyway
(e.g. some earlier function needed it), using the broadcast will mean
actually larger .rodata.  If {1to8} and similar is as fast as reading all
the same elements from memory (or faster), perhaps in that case it should
broadcast from the first element of the existing constant pool full vector
rather than creating a new one.
And similarly, perhaps the function should look at all constant pool entries
in the current function (not yet emitted into .rodata) and if it would
succeed for some and not for others, either use broadcast from its first
element or not perform it for the others too.

        Jakub

Reply via email to