Richard Biener <richard.guent...@gmail.com> writes: > On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford > <richard.sandif...@arm.com> wrote: >> Richard Biener <richard.guent...@gmail.com> writes: >>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford >>> <richard.sandif...@arm.com> wrote: >>>> Richard Biener <richard.guent...@gmail.com> writes: >>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood >>>>> <david.sherw...@arm.com> wrote: >>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood >>>>>>> <david.sherw...@arm.com> wrote: >>>>>>> > Hi Richard, >>>>>>> > >>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this >>>>>>> > seemed more >>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In >>>>>>> > addition it >>>>>>> > would seem to provide more opportunities for optimisation than a >>>>>>> > target-specific >>>>>>> > builtin implementation would. I accept that optimisation >>>>>>> > opportunities will >>>>>>> > be more limited for strict math compilation, but that it was still >>>>>>> > worth having >>>>>>> > them. Also, if we did map it to builtins then the scalar >>>>>>> > version would go >>>>>>> > through the optabs and the vector version would go through the >>>>>>> > target's builtin >>>>>>> > expansion, which doesn't seem very consistent. >>>>>>> >>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus >>>>>>> you can't vectorize anyway? (strict IEEE behavior is about NaNs, >>>>>>> correct?) >>>>>> I thought for this particular case associativity wasn't an issue? >>>>>> We're not doing any >>>>>> reductions here, just simply performing max/min operations on each >>>>>> pair of elements >>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to >>>>>> ensure that for >>>>>> each pair of elements if one element is a NaN we return the other one. >>>>> >>>>> Hmm, true. Ok, my comment still stands - I don't see that using a >>>>> tree code is the best thing to do here. You can add fmin/max optabs >>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target >>>>> builtin for the vectorized variant. >>>>> >>>>> The reason I am pushing against a new tree code is that we'd have an >>>>> awful lot of similar codes when pushing other flag related IL >>>>> specialities to actual IL constructs. And we still need to find a >>>>> consistent way to do that. >>>> >>>> In this case though the new code is really the "native" min/max operation >>>> for fp, rather than some weird flag-dependent behaviour. Maybe it's >>>> a bit unfortunate that the non-strict min/max fp operation got mapped >>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really >>>> the flag-related modification. The STRICT_* prefix is forced by that and >>>> might make it seem like more of a special case than it really is. >>> >>> In some sense. But the "strict" version already has a builtin (just no >>> special expander in builtins.c). We usually don't add 1:1 tree codes >>> for existing builtins (why have builtins at all then?). >> >> We still need the builtin to match the C function (and to allow direct >> calls to __builtin_fmin, etc., which are occasionally useful). >> >>>> If you're still not convinced, how about an internal function instead >>>> of a built-in function, so that we can continue to use optabs for all >>>> cases? I'd really like to avoid forcing such a generic concept down to >>>> target-specific builtins with target-specific expansion code, especially >>>> when the same concept is exposed by target-independent code for scalars. >>> >>> The target builtin is for the vectorized variant - not all targets might >>> have >>> that and we'd need to query the target about this. So using a IFN would >>> mean adding a target hook for that query. >> >> No, the idea is that if we have a tree code or an internal function, the >> decision about whether we have target support is based on a query of the >> optabs (just like it is for scalar, and for other vectorisable tree codes). >> No new hooks are needed. >> >> The patch checked for target support that way. > > Fair enough. Still this means we should have tree codes for all builtins > that eventually are vectorized? So why don't we have SIN_EXPR, > POW_EXPR (ok, I did argue and have patches for that in the past), > RINT_EXPR, SQRT_EXPR, etc?
Yeah, it doesn't sound so bad to me :-) The choice of what's a function in C and what's inherent is pretty arbitrary. E.g. % on doubles could have implemented fmod() or remainder(). Casts from double to int could have used the current rounding mode, but instead they truncate and conversions using the rounding mode need to go through something like (l)lrint(). Like you say, pow() could have been an operator (and is in many languages), but instead it's a function. > This patch starts to go down that route which is why I ask for the > whole picture to be considered and hinted at the alternative implementation > which follows existing practice. Add a expander in builtins.c, add an optab, > and eventual support to vectorized_function. > > See for example ix86_builtin_vectorized_function which handles > sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR > if the target supports it for the scalar mode, so not sure if there is > any x86 ISA where it has vectorized FMA but not scalar FMA). Yeah. TBH I'm really against doing that unless (a) there's good reason to believe that the concept really is specific to one target and wouldn't be implemented on others or (b) there really is a function rather than an instruction underneath (usually the case for sin, etc.). But (b) could also be handled by the optab support library mechanism. Reasons against using target-specific builtins for operations that have direct support in the ISA: 1. Like you say, in practice vector ops only tend to be supported if the associated scalar op is also supported. Sticking to this approach means that vector ops follow a different path from scalar ops whereas (for example) division follows the same path for both. It just seems confusing to have some floating-point optabs that support both scalar and vector operands and others that only support scalar operands. 2. Once converted to a target-specific function, the target-independent code has no idea what the function does or how expensive it is. We might start out with just one hook to convert a scalar operation to a target-dependent built-in function, but I bet over time we'll grow other hooks to query properties about the function, such as costs. 3. builtin_vectorized_function returns a decl rather than a call. If the target's vector API doesn't already have a built-in for the operation we need, with the exact types and arguments that we expect, the target needs to define one, presumably marked so that it isn't callable by input code. E.g. on targets where FP conversion instructions allow an explicit rounding mode to be specified as an operand, it's reasonable for a target's vector API to expose that operand as a constant argument to the API function. There'd then be one API function for all vector- float-to-vector-float integer rounding operations, rather than one for vector rint(), one for vector ceil(), etc. (I'm thinking of System z instructions here, although I don't know offhand what the vector API is there.) IMO it doesn't make sense to force the target to define "fake" built-in functions for all those possibilities purely for the sake of the target hook. It's a lot of extra code, and it's extra code that would be duplicated on any target that needs to do this. IMO optabs are the best way for the target to tell the target-independent code what it can do. If it supports sqrt on df it defines sqrtdf and if it supports vector sqrt on v2df it defines sqrtv2df. These patterns will often be a single define_expand or define_insn template -- the vectorness often comes "for free" in terms of writing the pattern. >>> > TBH though I'm not sure why an internal_fn value (or a target-specific >>> > builtin enum value) is worse than a tree-code value, unless the limit >>> > of the tree_code bitfield is in sight (maybe it is). >>> >>> I think tree_code is 64bits now. >> >> Even better :-) > > Yes. > > I'm not against adding a corresponding tree code for all math builtin > functions, > we just have to decide whether this is the way to go (and of course support > expanding those back to libcalls to libc/m rather than libgcc). There are > also constraints on what kind of STRICT_FMIN_EXPR the compiler may > generate as the target may not be able to expand the long double variant > directly but needs a libcall but libm might not be linked or may not > have support > for it. That would be a new thing compared to libgcc providing a fallback > for all other tree codes. True, but that doesn't seem too bad. The constraints would be the same if we're operating on built-in functions rather than codes. I suppose built-in functions make this more explicit, but at the end of the day it's a costing decision. We should no more be converting a cheap operation into an expensive libgcc function than converting a cheap operation into an expensive libm function, even if the libgcc conversion links. There's certainly precedent for introducing calls to things that libgcc doesn't define. E.g. we already introduce calls to memcpy in things like loop distribution, even though we don't provide a fallback memcpy in libgcc. Thanks, Richard