Hi All,

This patch adds a match.pd rule that strips the type conversions when a value
is converted to a type of the same class with twice the precision, a simple
arithmetic operation is performed on it, and the result is converted back to
the narrower type.
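
To make the shape of the expression concrete, here is a minimal C sketch of
the "before" and "after" forms.  It uses float and double rather than fp16
only because those types are available everywhere; the function names are
made up for illustration and are not part of the patch.  The check holds for
the sample inputs I tried, which is not a proof for every input.

```c
#include <assert.h>

/* "Before": widen both operands, operate in the wide type, narrow again.
   This is the (F)((X)a op (X)b) shape with F = float and X = double.  */
float mul_via_double(float a, float b)
{
    return (float)((double)a * (double)b);
}

/* "After": the same operation kept entirely in the narrow type.  */
float mul_narrow(float a, float b)
{
    return a * b;
}

/* Hypothetical helper: compare both forms for one pair of inputs.  */
void check_same_result(float a, float b)
{
    assert(mul_via_double(a, b) == mul_narrow(a, b));
}
```

Calling check_same_result (1.5f, 2.25f) passes, since both forms round the
same exact product to float.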

With this change the operation is carried out in the narrower type.  The
motivation is that the imaginary constant I in C99 has single-precision type
(float _Complex).  For half precision this means the entire operation is
carried out in single precision, which adds a lot of type conversion
instructions to the output and prevents optimal vectorization because it
lowers the vectorization factor.


For example, if a and b are fp16 values, a * b * I is vectorized in SFmode
instead of HFmode.
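
A reduced sketch of the kind of loop I have in mind is below.  The HALF
macro, the function names and the float fallback are only there so the
example builds on compilers without _Float16; they are assumptions for
illustration and not taken from any real testcase.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Use _Float16 where the compiler provides it, otherwise fall back to
   float so that the sketch still builds (assumption for illustration).  */
#ifdef __FLT16_MAX__
#define HALF _Float16
#else
#define HALF float
#endif

/* a and b are half precision, but I is float _Complex, so the product
   is computed in single precision and then narrowed again on the store.  */
void scale_by_i(HALF _Complex *out, const HALF *a, const HALF *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] * I;
}

/* Read the imaginary part through the array layout that C guarantees
   for complex types (real part first, imaginary part second).  */
float imag_of(const HALF _Complex *z)
{
    return (float)((const HALF *)z)[1];
}

/* Small self-check with exactly representable values.  */
void demo(void)
{
    HALF a[2] = { 1.5f, 2.0f };
    HALF b[2] = { 2.0f, 0.5f };
    HALF _Complex out[2];

    scale_by_i(out, a, b, 2);
    assert(imag_of(&out[0]) == 3.0f);
    assert(imag_of(&out[1]) == 1.0f);
}
```

The multiplication by I is what pulls the whole expression up to single
precision, even though every input and the destination are half precision.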

Bootstrap and regression testing on aarch64-none-linux-gnu,
arm-none-gnueabihf and x86_64-pc-linux-gnu are still ongoing, but the
previous patch showed regressions in builtin-arith-overflow-8 to -11.

However, since it shows no regressions anywhere else, I am wondering whether
it's just the tests that need updating or whether the idea itself is not
acceptable.  Perhaps it should only be done for unsafe math?

So I am posting the patch for comments.

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  <tamar.christ...@arm.com>

        * match.pd: Add type conversion stripping.

-- 
diff --git a/gcc/match.pd b/gcc/match.pd
index d07ceb7d087b8b5c5a7d7362ad9d8f71ac90dc08..3c2f8caca42d6a163fbf7faba6220d7304200100 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4709,6 +4709,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	 (convert (op (convert:utype @0)
 		      (convert:utype @1))))))))
 
+/* Strip out useless type conversions:
+   ((F)((X)a op (X)b)) -> a op b
+
+   when ((F)((X)a op (X)b)) where a and b are both of type F,
+   and X has twice the precision of F then the conversion is useless
+   and should be stripped away to allow more optimizations.  */
+
+(for op (plus minus mult rdiv)
+ (simplify
+   (convert (op:s (convert@0 @1) (convert@2 @3)))
+   (if (types_match (@1, @3)
+        && types_match (type, @1)
+        && types_match (@0, @2)
+	&& GET_MODE_CLASS (TYPE_MODE (type))
+	   == GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (@0)))
+        && TYPE_PRECISION (type) == (TYPE_PRECISION (TREE_TYPE (@0)) / 2))
+     (op @1 @3))))
+
 /* This is another case of narrowing, specifically when there's an outer
    BIT_AND_EXPR which masks off bits outside the type of the innermost
    operands.   Like the previous case we have to convert the operands
