"Richard Earnshaw (lists)" <richard.earns...@arm.com> writes:
> On 30/07/2019 11:50, Richard Sandiford wrote:
>> Richard Biener <richard.guent...@gmail.com> writes:
>>> On Tue, Jul 30, 2019 at 12:04 PM Richard Sandiford
>>> <richard.sandif...@arm.com> wrote:
>>>>
>>>> tree-ssa-math-opts supports FMA optabs for integers as well as
>>>> floating-point types, even though there's no distinction between
>>>> fused and unfused there.  It turns out to be pretty handy for the
>>>> IFN_COND_* handling though, so I don't want to remove it, however
>>>> weird it might seem.
>>>>
>>>> Instead this patch makes sure that we don't apply it to integer
>>>> multiplications that are actually shifts (but that are represented
>>>> in gimple as multiplications because that's the canonical form).
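
A minimal sketch of the shape being described, mirroring the new test added
below (the function name and signature here are only for illustration):

    void
    f (int *restrict a, int *restrict b, int *restrict c, int n)
    {
      for (int i = 0; i < n; ++i)
        /* The multiply by 8 stays a multiplication in gimple (the canonical
           form), even though it would normally be expanded as the shift
           b[i] << 3.  */
        a[i] = b[i] < 10 ? b[i] * 8 + c[i] : 10;
    }
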
>>>>
>>>> This is a preemptive strike.  The test doesn't fail for SVE as-is,
>>>> but would after a later patch.
>>>>
>>>> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
>>>> OK to install?
>>>>
>>>> Richard
>>>>
>>>>
>>>> 2019-07-30  Richard Sandiford  <richard.sandif...@arm.com>
>>>>
>>>> gcc/
>>>>          * tree-ssa-math-opts.c (convert_mult_to_fma): Reject integer
>>>>          multiplications by a power of 2 or a negative power of 2.
>>>>
>>>> gcc/testsuite/
>>>>          * gcc.dg/vect/vect-cond-arith-8.c: New test.
>>>>
>>>> Index: gcc/tree-ssa-math-opts.c
>>>> ===================================================================
>>>> --- gcc/tree-ssa-math-opts.c    2019-07-30 10:51:51.827405171 +0100
>>>> +++ gcc/tree-ssa-math-opts.c    2019-07-30 10:52:27.327139149 +0100
>>>> @@ -3074,10 +3074,20 @@ convert_mult_to_fma (gimple *mul_stmt, t
>>>>         && flag_fp_contract_mode == FP_CONTRACT_OFF)
>>>>       return false;
>>>>
>>>> -  /* We don't want to do bitfield reduction ops.  */
>>>> -  if (INTEGRAL_TYPE_P (type)
>>>> -      && (!type_has_mode_precision_p (type) || TYPE_OVERFLOW_TRAPS (type)))
>>>> -    return false;
>>>> +  if (ANY_INTEGRAL_TYPE_P (type))
>>>> +    {
>>>> +      /* We don't want to do bitfield reduction ops.  */
>>>> +      tree itype = INTEGRAL_TYPE_P (type) ? type : TREE_TYPE (type);
>>>
>>> you can use element_type () for this.  But I think it's easier to
>>> leave the existing
>>> test in-place since vector or complex types cannot have non-mode precision
>>> components.
>> 
>> Ah, yeah.  Guess I was over-generalising here :-)
>> 
>>>> +      if (!type_has_mode_precision_p (itype) || TYPE_OVERFLOW_TRAPS (itype))
>>>> +       return false;
>>>> +
>>>> +      /* Don't use FMA for multiplications that are actually shifts.  */
>>>
>>> So I question this - if the FMA can do the shift "for free" and ends up
>>> having the same cost/latency as an add, why do we want to disallow this
>>> (for all targets)?  Esp. for vector types I would guess an add plus a
>>> shift may be more costly (not all ISAs implement shifts anyhow).
>> 
>> The shift doesn't really come for free.  By using FMA we're converting
>> the shift by a constant into a general multiplication.
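
A rough illustration of that point, with assumed (not verified) AArch64
instruction sequences in the comments:

    /* Sketch only.  Left as a shift, the multiply-add below can usually be
       done with a shifted-operand add, roughly:
           add   w0, w2, w1, lsl #3
       Treated as an integer FMA instead, the power of 2 has to be put in a
       register and fed to a real multiply-add, roughly:
           mov   w3, #8
           madd  w0, w1, w3, w2
       (These sequences are illustrative, not actual compiler output.)  */
    int
    mul_add (int b, int c)
    {
      return b * 8 + c;
    }
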
>
> Isn't this just the problem that the madd<mode> pattern for aarch64 
> doesn't accept constants that are a power of 2?  So it ends up trying to 
> force the constant into a register unnecessarily.
>
> (define_insn "madd<mode>"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
>       (plus:GPI (mult:GPI (match_operand:GPI 1 "register_operand" "r")
>                           (match_operand:GPI 2 "register_operand" "r"))
> --- This should be reg_or_imm_power2, then a second alternative for the 
> immediate case.
>
>                 (match_operand:GPI 3 "register_operand" "r")))]
>    ""

That might be a problem too (e.g. maybe for intrinsics?), but this patch
was only dealing with the way that the "fma<mode>4" group of patterns is
used.  The AArch64 port doesn't yet define fma for any integer modes AFAICT,
but I have a patch that adds them for SVE.  (And it now handles shifts
specially, as Richard suggested.)

Thanks,
Richard


>
> R.
>
>> 
>>> Can this be fixed up in the target instead?  (by a splitter or appropriate
>>> expander?)
>> 
>> OK, I'll try to do it that way instead.
>> 
>> Thanks,
>> Richard
>> 
>>>
>>>> +      tree val = VECTOR_TYPE_P (type) ? uniform_vector_p (op2) : op2;
>>>> +      if (val
>>>> +         && TREE_CODE (val) == INTEGER_CST
>>>> +         && wi::popcount (wi::abs (wi::to_wide (val))) == 1)
>>>> +       return false;
>>>> +    }
>>>>
>>>>     /* If the target doesn't support it, don't generate it.  We assume that
>>>>        if fma isn't available then fms, fnma or fnms are not either.  */
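
A standalone sketch of what the new power-of-2 check accepts and rejects,
using a plain C stand-in for the wi::popcount (wi::abs (...)) == 1 test
(the helper name below is made up for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    /* A value is a power of 2 or a negative power of 2 exactly when its
       absolute value has a single bit set.  */
    static int
    is_pow2_or_neg_pow2 (long val)
    {
      return __builtin_popcountl (labs (val)) == 1;
    }

    int
    main (void)
    {
      printf ("%d\n", is_pow2_or_neg_pow2 (8));   /* 1: b * 8 is really b << 3 */
      printf ("%d\n", is_pow2_or_neg_pow2 (-8));  /* 1: b * -8 is -(b << 3) */
      printf ("%d\n", is_pow2_or_neg_pow2 (6));   /* 0: a genuine multiplication */
      return 0;
    }
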
>>>> Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c
>>>> ===================================================================
>>>> --- /dev/null   2019-07-30 08:53:31.317691683 +0100
>>>> +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c       2019-07-30 10:52:27.327139149 +0100
>>>> @@ -0,0 +1,57 @@
>>>> +/* { dg-require-effective-target scalar_all_fma } */
>>>> +/* { dg-additional-options "-fdump-tree-optimized -ffp-contract=fast" } */
>>>> +
>>>> +#include "tree-vect.h"
>>>> +
>>>> +#define N (VECTOR_BITS * 11 / 64 + 3)
>>>> +
>>>> +#define DEF(INV)                                       \
>>>> +  void __attribute__ ((noipa))                         \
>>>> +  f_##INV (int *restrict a, int *restrict b,           \
>>>> +          int *restrict c)                             \
>>>> +  {                                                    \
>>>> +    for (int i = 0; i < N; ++i)                                \
>>>> +      {                                                        \
>>>> +       int mb = (INV & 1 ? -b[i] : b[i]);              \
>>>> +       int mc = (INV & 2 ? -c[i] : c[i]);              \
>>>> +       a[i] = b[i] < 10 ? mb * 8 + mc : 10;            \
>>>> +      }                                                        \
>>>> +  }
>>>> +
>>>> +#define TEST(INV)                                      \
>>>> +  {                                                    \
>>>> +    f_##INV (a, b, c);                                 \
>>>> +    for (int i = 0; i < N; ++i)                                \
>>>> +      {                                                        \
>>>> +       int mb = (INV & 1 ? -b[i] : b[i]);              \
>>>> +       int mc = (INV & 2 ? -c[i] : c[i]);              \
>>>> +       int truev = mb * 8 + mc;                        \
>>>> +       if (a[i] != (i % 17 < 10 ? truev : 10))         \
>>>> +         __builtin_abort ();                           \
>>>> +       asm volatile ("" ::: "memory");                 \
>>>> +      }                                                        \
>>>> +  }
>>>> +
>>>> +#define FOR_EACH_INV(T) \
>>>> +  T (0) T (1) T (2) T (3)
>>>> +
>>>> +FOR_EACH_INV (DEF)
>>>> +
>>>> +int
>>>> +main (void)
>>>> +{
>>>> +  int a[N], b[N], c[N];
>>>> +  for (int i = 0; i < N; ++i)
>>>> +    {
>>>> +      b[i] = i % 17;
>>>> +      c[i] = i % 13 + 14;
>>>> +      asm volatile ("" ::: "memory");
>>>> +    }
>>>> +  FOR_EACH_INV (TEST)
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-tree-dump-not { = \.COND_FMA } "optimized" } } */
>>>> +/* { dg-final { scan-tree-dump-not { = \.COND_FMS } "optimized" } } */
>>>> +/* { dg-final { scan-tree-dump-not { = \.COND_FNMA } "optimized" } } */
>>>> +/* { dg-final { scan-tree-dump-not { = \.COND_FNMS } "optimized" } } */
