On Tue, Jul 30, 2019 at 12:04 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > tree-ssa-math-opts supports FMA optabs for integers as well as > floating-point types, even though there's no distinction between > fused and unfused there. It turns out to be pretty handy for the > IFN_COND_* handling though, so I don't want to remove it, however > weird it might seem. > > Instead this patch makes sure that we don't apply it to integer > multiplications that are actually shifts (but that are represented > in gimple as multiplications because that's the canonical form). > > This is a preemptive strike. The test doesn't fail for SVE as-is, > but would after a later patch. > > Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu. > OK to install? > > Richard > > > 2019-07-30 Richard Sandiford <richard.sandif...@arm.com> > > gcc/ > * tree-ssa-math-opts.c (convert_mult_to_fma): Reject integer > multiplications by a power of 2 or a negative power of 2. > > gcc/testsuite/ > * gcc.dg/vect/vect-cond-arith-8.c: New test. > > Index: gcc/tree-ssa-math-opts.c > =================================================================== > --- gcc/tree-ssa-math-opts.c 2019-07-30 10:51:51.827405171 +0100 > +++ gcc/tree-ssa-math-opts.c 2019-07-30 10:52:27.327139149 +0100 > @@ -3074,10 +3074,20 @@ convert_mult_to_fma (gimple *mul_stmt, t > && flag_fp_contract_mode == FP_CONTRACT_OFF) > return false; > > - /* We don't want to do bitfield reduction ops. */ > - if (INTEGRAL_TYPE_P (type) > - && (!type_has_mode_precision_p (type) || TYPE_OVERFLOW_TRAPS (type))) > - return false; > + if (ANY_INTEGRAL_TYPE_P (type)) > + { > + /* We don't want to do bitfield reduction ops. */ > + tree itype = INTEGRAL_TYPE_P (type) ? type : TREE_TYPE (type);
you can use element_type () for this. But I think it's easier to leave the existing test in-place since vector or complex types cannot have non-mode precision components. > + if (!type_has_mode_precision_p (itype) || TYPE_OVERFLOW_TRAPS (itype)) > + return false; > + > + /* Don't use FMA for multiplications that are actually shifts. */ So I question this - if the FMA can do the shift "for free" and it eventually is same cost/latency as an add why do we want to not allow this (for all targets)? Esp. for vector types I would guess a add plus a shift may be more costly (not all ISAs implement shifts anyhow). Can this be fixed up in the target instead? (by a splitter or appropriate expander?) > + tree val = VECTOR_TYPE_P (type) ? uniform_vector_p (op2) : op2; > + if (val > + && TREE_CODE (val) == INTEGER_CST > + && wi::popcount (wi::abs (wi::to_wide (val))) == 1) > + return false; > + } > > /* If the target doesn't support it, don't generate it. We assume that > if fma isn't available then fms, fnma or fnms are not either. */ > Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c > =================================================================== > --- /dev/null 2019-07-30 08:53:31.317691683 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c 2019-07-30 > 10:52:27.327139149 +0100 > @@ -0,0 +1,57 @@ > +/* { dg-require-effective-target scalar_all_fma } */ > +/* { dg-additional-options "-fdump-tree-optimized -ffp-contract=fast" } */ > + > +#include "tree-vect.h" > + > +#define N (VECTOR_BITS * 11 / 64 + 3) > + > +#define DEF(INV) \ > + void __attribute__ ((noipa)) \ > + f_##INV (int *restrict a, int *restrict b, \ > + int *restrict c) \ > + { \ > + for (int i = 0; i < N; ++i) \ > + { \ > + int mb = (INV & 1 ? -b[i] : b[i]); \ > + int mc = (INV & 2 ? -c[i] : c[i]); \ > + a[i] = b[i] < 10 ? mb * 8 + mc : 10; \ > + } \ > + } > + > +#define TEST(INV) \ > + { \ > + f_##INV (a, b, c); \ > + for (int i = 0; i < N; ++i) \ > + { \ > + int mb = (INV & 1 ? -b[i] : b[i]); \ > + int mc = (INV & 2 ? -c[i] : c[i]); \ > + int truev = mb * 8 + mc; \ > + if (a[i] != (i % 17 < 10 ? truev : 10)) \ > + __builtin_abort (); \ > + asm volatile ("" ::: "memory"); \ > + } \ > + } > + > +#define FOR_EACH_INV(T) \ > + T (0) T (1) T (2) T (3) > + > +FOR_EACH_INV (DEF) > + > +int > +main (void) > +{ > + int a[N], b[N], c[N]; > + for (int i = 0; i < N; ++i) > + { > + b[i] = i % 17; > + c[i] = i % 13 + 14; > + asm volatile ("" ::: "memory"); > + } > + FOR_EACH_INV (TEST) > + return 0; > +} > + > +/* { dg-final { scan-tree-dump-not { = \.COND_FMA } "optimized" } } */ > +/* { dg-final { scan-tree-dump-not { = \.COND_FMS } "optimized" } } */ > +/* { dg-final { scan-tree-dump-not { = \.COND_FNMA } "optimized" } } */ > +/* { dg-final { scan-tree-dump-not { = \.COND_FNMS } "optimized" } } */