Hi All,

This patch adds a match.pd rule that strips the type conversions when a value is converted to a type of the same class with twice the precision, a simple arithmetic operation is performed on it, and the result is converted back to the smaller type.
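For illustration, the shape of expression the rule targets can be sketched in plain C. This is only a sketch under stated assumptions: it uses float and double as stand-ins for the narrow and wide types (the fp16 case needs target support), the helper names are made up for this example and are not part of the patch, and it assumes float arithmetic is evaluated in float (FLT_EVAL_METHOD == 0) without -ffast-math.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not part of the patch.  The "wide" forms have the
   shape (F)((X)a op (X)b) that the new rule matches, with F = float and
   X = double.  The "narrow" forms are what is left once the conversions
   are stripped: a op b carried out in F.  */

static float mul_wide (float a, float b)
{
  return (float) ((double) a * (double) b);
}

static float mul_narrow (float a, float b)
{
  return a * b;
}

static float add_wide (float a, float b)
{
  return (float) ((double) a + (double) b);
}

static float add_narrow (float a, float b)
{
  return a + b;
}

/* Compare two floats bit for bit, so that differences in rounding are
   not hidden by the semantics of ==.  */
static int same_bits (float x, float y)
{
  uint32_t ux, uy;
  memcpy (&ux, &x, sizeof ux);
  memcpy (&uy, &y, sizeof uy);
  return ux == uy;
}
```

Under the assumptions above, a check such as same_bits (mul_wide (0.1f, 0.3f), mul_narrow (0.1f, 0.3f)) is expected to hold, because the product of two float values is exactly representable in double and is therefore rounded only once. This says nothing about the integer case, which the mode class check in the rule also admits.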
The change keeps the operation in the smaller type.

The motivation is that the imaginary constant I in C99 is defined as a single-precision float. For half precision this means the entire operation is carried out in single precision, which adds a lot of type-conversion instructions to the output and prevents optimal vectorization because it lowers the vectorization factor. If a and b are fp16 values, a * b * I gets vectorized in SFmode instead of HFmode.

Bootstrap and regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and x86_64-pc-linux-gnu are still ongoing, but the previous patch showed regressions in builtin-arith-overflow-8 to -11. Since it doesn't show regressions anywhere else, I am wondering whether it's just the tests that need updating or whether the idea itself is not acceptable. Perhaps it should be done only for unsafe math?

So I am posting the patch for comments.

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  <tamar.christ...@arm.com>

	* match.pd: Add type conversion stripping.

--
diff --git a/gcc/match.pd b/gcc/match.pd
index d07ceb7d087b8b5c5a7d7362ad9d8f71ac90dc08..3c2f8caca42d6a163fbf7faba6220d7304200100 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4709,6 +4709,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	  (convert (op (convert:utype @0) (convert:utype @1))))))))
 
+/* Strip out useless type conversions:
+   ((F)((X)a op (X)b)) -> a op b
+
+   when ((F)((X)a op (X)b)) where a and b are both of type F,
+   and X has twice the precision of F then the conversion is useless
+   and should be stripped away to allow more optimizations. */
+
+(for op (plus minus mult rdiv)
+ (simplify
+  (convert (op:s (convert@0 @1) (convert@2 @3)))
+   (if (types_match (@1, @3)
+	&& types_match (type, @1)
+	&& types_match (@0, @2)
+	&& GET_MODE_CLASS (TYPE_MODE (type))
+	   == GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (@0)))
+	&& TYPE_PRECISION (type) == (TYPE_PRECISION (TREE_TYPE (@0)) / 2))
+    (op @1 @3))))
+
 /* This is another case of narrowing, specifically when there's an outer
    BIT_AND_EXPR which masks off bits outside the type of the innermost
    operands. Like the previous case we have to convert the operands