On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
<richard.sandif...@arm.com> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>> <richard.sandif...@arm.com> wrote:
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>>>> <david.sherw...@arm.com> wrote:
>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>>> <david.sherw...@arm.com> wrote:
>>>>>> > Hi Richard,
>>>>>> >
>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>>>>>> > seemed more consistent with the existing MAX_EXPR and MIN_EXPR
>>>>>> > tree codes. In addition it would seem to provide more
>>>>>> > opportunities for optimisation than a target-specific builtin
>>>>>> > implementation would. I accept that optimisation opportunities
>>>>>> > will be more limited for strict math compilation, but that it was
>>>>>> > still worth having them. Also, if we did map it to builtins then
>>>>>> > the scalar version would go through the optabs and the vector
>>>>>> > version would go through the target's builtin expansion, which
>>>>>> > doesn't seem very consistent.
>>>>>>
>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs, 
>>>>>> correct?)
>>>>> I thought for this particular case associativity wasn't an issue?
>>>>> We're not doing any reductions here, just simply performing max/min
>>>>> operations on each pair of elements in the vectors. I thought for
>>>>> IEEE-compliant behaviour we just need to ensure that for each pair
>>>>> of elements if one element is a NaN we return the other one.
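(As a minimal standalone illustration of those semantics -- this snippet is
purely illustrative and not part of the patch -- the scalar C99 fmin/fmax
already behave exactly this way:

  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    double x = 1.0;
    double n = nan ("");
    /* If exactly one operand is a NaN, the numeric operand is returned.  */
    printf ("%f\n", fmin (x, n));   /* prints 1.000000 */
    printf ("%f\n", fmax (n, x));   /* prints 1.000000 */
    return 0;
  }

and the vectorized strict form would simply apply that rule lane-wise.)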
>>>>
>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>>>> tree code is the best thing to do here.  You can add fmin/max optabs
>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>>> builtin for the vectorized variant.
>>>>
>>>> The reason I am pushing against a new tree code is that we'd have an
>>>> awful lot of similar codes when pushing other flag related IL
>>>> specialities to actual IL constructs.  And we still need to find a
>>>> consistent way to do that.
>>>
>>> In this case though the new code is really the "native" min/max operation
>>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>>> the flag-related modification.  The STRICT_* prefix is forced by that and
>>> might make it seem like more of a special case than it really is.
>>
>> In some sense.  But the "strict" version already has a builtin (just no
>> special expander in builtins.c).  We usually don't add 1:1 tree codes
>> for existing builtins (why have builtins at all then?).
>
> We still need the builtin to match the C function (and to allow direct
> calls to __builtin_fmin, etc., which are occasionally useful).
>
>>> If you're still not convinced, how about an internal function instead
>>> of a built-in function, so that we can continue to use optabs for all
>>> cases?  I'd really like to avoid forcing such a generic concept down to
>>> target-specific builtins with target-specific expansion code, especially
>>> when the same concept is exposed by target-independent code for scalars.
>>
>> The target builtin is for the vectorized variant - not all targets might have
>> that and we'd need to query the target about this.  So using an IFN would
>> mean adding a target hook for that query.
>
> No, the idea is that if we have a tree code or an internal function, the
> decision about whether we have target support is based on a query of the
> optabs (just like it is for scalar, and for other vectorisable tree codes).
> No new hooks are needed.
>
> The patch checked for target support that way.

Fair enough.  Still, this means we should have tree codes for all builtins
that are eventually vectorized?  So why don't we have SIN_EXPR,
POW_EXPR (ok, I argued for that and have patches from the past),
RINT_EXPR, SQRT_EXPR, etc.?

This patch starts to go down that route, which is why I'm asking for the
whole picture to be considered and why I hinted at the alternative
implementation that follows existing practice: add an expander in
builtins.c, add an optab, and eventually add support in vectorized_function.
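Concretely -- and this is just a hypothetical sketch, the optab names and
pattern strings below are illustrative rather than anything from the patch --
the optab side of that route would amount to something like:

  /* optabs.def: scalar optabs that an expander in builtins.c could query
     when expanding BUILT_IN_FMIN / BUILT_IN_FMAX, falling back to the
     usual libcall when the target has no handler for the mode.  */
  OPTAB_D (fmin_optab, "fmin$a3")
  OPTAB_D (fmax_optab, "fmax$a3")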

See for example ix86_builtin_vectorized_function, which handles
sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
if the target supports it for the scalar mode, so I'm not sure whether
there is any x86 ISA that has vectorized FMA but not scalar FMA).
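Schematically, such a hook looks roughly like the following (a paraphrase
from memory, not a verbatim excerpt; TARGET_GET_V2DF_SQRT_BUILTIN is a
stand-in for whatever returns the target's vector sqrt builtin decl):

  static tree
  example_builtin_vectorized_function (tree fndecl, tree type_out,
                                       tree type_in)
  {
    machine_mode out_mode = TYPE_MODE (TREE_TYPE (type_out));
    machine_mode in_mode = TYPE_MODE (TREE_TYPE (type_in));
    int out_n = TYPE_VECTOR_SUBPARTS (type_out);
    int in_n = TYPE_VECTOR_SUBPARTS (type_in);

    switch (DECL_FUNCTION_CODE (fndecl))
      {
      case BUILT_IN_SQRT:
        /* Map double sqrt to the target's 2-lane V2DF sqrt builtin.  */
        if (out_mode == DFmode && in_mode == DFmode
            && out_n == 2 && in_n == 2)
          return TARGET_GET_V2DF_SQRT_BUILTIN ();
        break;
      default:
        break;
      }
    return NULL_TREE;
  }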

>>> TBH though I'm not sure why an internal_fn value (or a target-specific
>>> builtin enum value) is worse than a tree-code value, unless the limit
>>> of the tree_code bitfield is in sight (maybe it is).
>>
>> I think tree_code is 64bits now.
>
> Even better :-)

Yes.

I'm not against adding a corresponding tree code for all math builtin
functions; we just have to decide whether this is the way to go (and of
course support expanding those back to libcalls to libc/libm rather than
libgcc).  There are also constraints on what kind of STRICT_FMIN_EXPR the
compiler may generate: the target may not be able to expand, say, the
long double variant directly but needs a libcall, and libm might not be
linked or might not have support for it.  That would be a new thing
compared to libgcc providing a fallback for all other tree codes.
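As a purely hypothetical illustration of that constraint (not an example
from the thread): if the compiler were ever to recognise a hand-written,
NaN-aware minimum like the one below as a long double STRICT_FMIN_EXPR,
expanding it on a target without a native pattern would require a libcall
to fminl -- yet nothing in the user's code asked for libm to be linked.

  long double
  my_minl (long double a, long double b)
  {
    /* Returns the non-NaN operand if exactly one of them is a NaN,
       mirroring fmin semantics (modulo signed zeros).  */
    return (a < b || b != b) ? a : b;
  }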

Richard.

> Thanks,
> Richard
>
