On Tue, 11 May 2021, Tamar Christina wrote:
> Hi All,
>
> We are looking to implement saturation support in the compiler. The aim is to
> recognize both scalar and vector variants of typical saturating expressions.
>
> As an example:
>
> 1. Saturating addition:
> char sat (char a, char b)
> {
> int tmp = a + b;
> return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> }
>
> 2. Saturating abs:
> char sat (char a)
> {
> int tmp = abs (a);
> return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> }
>
> 3. Rounding shifts:
> char rndshift (char dc)
> {
> int round_const = 1 << (shift - 1);
> return (dc + round_const) >> shift;
> }
>
> etc.
>
> Of course the first issue is that C does not really have a single idiom for
> expressing this.
>
> At the RTL level we have ss_truncate and us_truncate and float_truncate for
> truncation.
>
> At the Tree level we have nothing for truncation (I believe) for scalars. For
> Vector code there already seems to be VEC_PACK_SAT_EXPR but it looks like
> nothing actually generates this at the moment. It's just an unused tree code.
>
> For rounding there doesn't seem to be any existing infrastructure.
>
> The proposal to handle these is as follows. Keep in mind that all of these
> also exist in their scalar form, so detecting them in the vectorizer would be
> the wrong place.
>
> 1. Rounding:
> a) Use match.pd to rewrite various rounding idioms to shifts.
> b) Use backwards or forward prop to rewrite these to internal functions
> where, even if the target does not support these rounding instructions,
> they have a chance to provide a more efficient implementation than what
> would be generated normally.
>
> 2. Saturation:
> a) Use match.pd to rewrite the various saturation expressions into min/max
> operations which opens up the expressions to further optimizations.
> b) Use backwards or forward prop to convert to internal functions if the
> resulting min/max expression still meets the criteria for being a
> saturating expression. This follows the algorithm as outlined in "The
> Software Vectorization Handbook" by Aart J.C. Bik.
>
> We could get the right instructions by using combine if we don't rewrite
> the instructions to an internal function, however then during vectorization
> we would overestimate the cost of performing the saturation. The constants
> will then also be loaded into registers and so become a lot more difficult
> to clean up solely in the backend.
>
> The one thing I am wondering about is whether we would need an internal
> function for all operations supported, or if it should be modelled as an
> internal FN which just "marks" the operation as rounding/saturating. After
> all, the only difference between a normal and saturating expression in RTL is
> the xx_truncate RTL surrounding the expression. Doing so would also mean that
> all targets that have saturating instructions would automatically benefit
> from this.
>
> But it does mean a small adjustment to the costing, which would need to cost
> the internal function call and the argument together as a whole.
>
> Any feedback is appreciated to minimize the number of changes required to the
> final patch. Any objections to the outlined approach?

I think it makes sense to pattern-match the operations on GIMPLE
and follow the approach taken by __builtin_add_overflow & friends.
Maybe quickly check whether clang provides some builtins already
which we could implement.
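
For reference, a minimal sketch of how a saturating add can already be
written at the source level with the existing overflow builtin. The name
sat_add_s8 is made up for illustration; this is not a proposed builtin,
only the kind of idiom the matching would have to recognize:

```c
#include <limits.h>

/* Hypothetical helper: saturating signed 8-bit addition.
   __builtin_add_overflow computes a + b as if in infinite precision
   and returns true when the result does not fit the type of *res.  */
signed char
sat_add_s8 (signed char a, signed char b)
{
  signed char res;
  if (__builtin_add_overflow (a, b, &res))
    /* A signed add can only overflow when both operands have the
       same sign, so the sign of a selects the bound.  */
    return a < 0 ? SCHAR_MIN : SCHAR_MAX;
  return res;
}
```

Here sat_add_s8 (100, 100) saturates to 127 and sat_add_s8 (-100, -100)
saturates to -128.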

There's some appeal to mimicking what RTL does - thus have
the saturation be represented as saturating truncation.
Maybe that's what users expect of builtins as well.
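
A rough sketch of the "saturating truncation" view, assuming the operation
is done in a wider type and only the final narrowing saturates. All names
below (ss_trunc_s8 and the two wrappers) are hypothetical and only meant
to show the shape, with the truncation written as the min/max clamp from
the proposal:

```c
#include <limits.h>

/* Hypothetical model of a signed saturating truncation int -> signed
   char, expressed as max followed by min.  */
static inline signed char
ss_trunc_s8 (int x)
{
  int lo = x < SCHAR_MIN ? SCHAR_MIN : x;   /* max (x, SCHAR_MIN) */
  int hi = lo > SCHAR_MAX ? SCHAR_MAX : lo; /* min (lo, SCHAR_MAX) */
  return (signed char) hi;
}

/* Saturating add: widen, add, then saturating-truncate.  */
signed char
sat_add_trunc_s8 (signed char a, signed char b)
{
  return ss_trunc_s8 ((int) a + (int) b);
}

/* Saturating abs: widen, negate if needed, then saturating-truncate.  */
signed char
sat_abs_trunc_s8 (signed char a)
{
  return ss_trunc_s8 (a < 0 ? -(int) a : (int) a);
}
```

With this shape only one saturating primitive is needed, and
sat_abs_trunc_s8 (-128) yields 127 instead of wrapping.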

I'm not sure what the rounding shift would do - 'shift' isn't
an argument to rndshift here. It feels like it's a
rounding division but only by powers of two. Does
ROUND_DIV_EXPR already provide the desired semantics?
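
To make the question concrete, here is a sketch of the rounding shift with
the shift amount passed explicitly. The name rnd_shift and the extra
parameter are assumptions added for illustration, not part of the original
example:

```c
/* Hypothetical variant of rndshift taking the shift amount as a
   parameter.  Assumes 1 <= shift < 31, that x + round_const does not
   overflow, and that >> on a negative int is an arithmetic shift
   (implementation-defined in C; GCC sign-extends).  */
int
rnd_shift (int x, int shift)
{
  int round_const = 1 << (shift - 1);
  /* Adding half the divisor before the flooring shift rounds to
     nearest, with ties going toward plus infinity.  */
  return (x + round_const) >> shift;
}
```

Under those assumptions rnd_shift (3, 1) is 2 and rnd_shift (-3, 1) is -1,
so ties round toward plus infinity. Whether that matches ROUND_DIV_EXPR
depends on how that code treats negative ties, which would need checking.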

Richard.