> On 6 Aug 2024, at 4:14 PM, Richard Sandiford <[email protected]>
> wrote:
>
> External email: Use caution opening links or attachments
>
>
> Kyrylo Tkachov <[email protected]> writes:
>>> On 5 Aug 2024, at 18:00, Richard Sandiford <[email protected]>
>>> wrote:
>>>
>>>
>>> Kyrylo Tkachov <[email protected]> writes:
>>>>> On 5 Aug 2024, at 12:01, Richard Sandiford <[email protected]>
>>>>> wrote:
>>>>>
>>>>>
>>>>> Jennifer Schmitz <[email protected]> writes:
>>>>>> This patch folds the SVE intrinsic svdiv into a vector of 1's in case
>>>>>> 1) the predicate is svptrue and
>>>>>> 2) dividend and divisor are equal.
>>>>>> This is implemented in the gimple_folder for signed and unsigned
>>>>>> integers. Corresponding test cases were added to the existing test
>>>>>> suites.
>>>>>>
>>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>>>>> regression.
>>>>>> OK for mainline?
>>>>>>
>>>>>> Please also advise whether it makes sense to implement the same
>>>>>> optimization for float types and if so, under which conditions?
>>>>>
>>>>> I think we should instead use const_binop to try to fold the division
>>>>> whenever the predicate is all-true, or if the function uses _x
>>>>> predication.
>>>>> (As a follow-on, we could handle _z and _m too, using VEC_COND_EXPR.)
>>>>>
>>>>
>>>> From what I can see const_binop only works on constant arguments.
>>>
>>> Yeah, it only produces a result for constant arguments. I see now
>>> that that isn't the case that the patch is interested in, sorry.
>>>
>>>> Is fold_binary a better interface to use ? I think it’d hook into the
>>>> match.pd machinery for divisions at some point.
>>>
>>> We shouldn't use that from gimple folders AIUI, but perhaps I misremember.
>>> (I realise we'd be using it only to test whether the result is constant,
>>> but even so.)
>>>
>>> Have you (plural) come across a case where svdiv is used with equal
>>> non-constant arguments? If it's just being done on first principles
>>> then how about starting with const_binop instead? If possible, it'd be
>>> good to structure it so that we can reuse the code for svadd, svmul,
>>> svsub, etc.
>>
>> We’ve had a bit of internal discussion on this to get our ducks in a row.
>> We are interested in having more powerful folding of SVE intrinsics
>> generally and we’d like some advice on how best to approach this.
>> Prathamesh suggested adding code to fold intrinsics to standard GIMPLE codes
>> where possible when they are _x-predicated or have a ptrue predicate.
>> Hopefully that would allow us to get all the match.pd and fold-const.cc
>> optimizations “for free”.
>> Would that be a reasonable direction rather than adding custom folding code
>> to individual intrinsics such as svdiv?
>> We’d need to ensure that the midend knows how to expand such GIMPLE codes
>> with VLA types and that the required folding rules exist in match.pd (though
>> maybe they work already for VLA types?)
>
> Expansion shouldn't be a problem, since we already rely on that for
> autovectorisation.
>
> But I think this comes back to what we discussed earlier, in the context
> of whether we should replace divisions by constants with multi-instruction
> alternatives. My comment there was:
>
>
> If people want to write out a calculation in natural arithmetic, it
> would be better to write the algorithm in scalar code and let the
> vectoriser handle it. That gives the opportunity for many more
> optimisations than just this one.
>
It’s been a while, and apologies if I’m coming in a bit late on this and
the thinking has possibly moved on. I’ve always viewed ACLE as an extension to
the language and thus fair game for compilers to optimise. For folks who
really, really need that instruction there’s also inline asm :)
The approach for implementing the ACLE intrinsics for both AArch32 and AArch64
used to be:
1. Express the intrinsics with GNU C / C++ (see implementations in arm_neon.h)
if feasible and the semantics match up.
2. Fall back to gimple folding / representation if the semantics match up.
3. Fall back to RTL unspecs if no such representation is feasible.
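
To illustrate what I mean by approach 1, here is a rough sketch using the
generic GNU C vector extension. The names my_int32x4_t, my_vaddq_s32 and
my_vmulq_s32 are made up for this mail; they are not the actual arm_neon.h
definitions, only an approximation of their shape. The point is that the
intrinsic body is plain vector arithmetic, so the midend sees an ordinary
addition or multiplication and can apply its usual folding.

```c
#include <stdint.h>

/* Hypothetical stand-in for int32x4_t: four 32-bit lanes in 16 bytes,
   declared with the GNU C vector_size attribute.  */
typedef int32_t my_int32x4_t __attribute__ ((vector_size (16)));

/* Hypothetical analogue of an Advanced SIMD add intrinsic.  The body is
   the generic vector '+' operator rather than a target builtin.  */
static inline my_int32x4_t
my_vaddq_s32 (my_int32x4_t a, my_int32x4_t b)
{
  return a + b;
}

/* Same idea for multiplication, using the generic vector '*' operator.  */
static inline my_int32x4_t
my_vmulq_s32 (my_int32x4_t a, my_int32x4_t b)
{
  return a * b;
}
```

Calling my_vaddq_s32 on {1, 2, 3, 4} and {10, 20, 30, 40} gives
{11, 22, 33, 44} lane by lane, with no target-specific builtin involved.
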
In the case of SVE VLA intrinsics there is no feasible GNU C representation,
but if a gimple representation is possible, shouldn’t we go to that?
With Advanced SIMD, the behaviour the user sees is as per 1 above
(see the implementation of the basic arithmetic operations for Neon in GNU C).
Is there any reason that SVE needs to be treated differently in the
backend?
I could be missing something here...
regards
Ramana
> Intrinsics are about giving programmers direct, architecture-level
> control over how something is implemented. I've seen Arm's library
> teams go to great lengths to work out which out of a choice of
> instruction sequences is the best one, even though the sequences in
> question would look functionally equivalent to a smart-enough compiler.
>
> So part of the work of using intrinsics is to figure out what the best
> sequence is. And IMO, part of the contract is that the compiler
> shouldn't interfere with the programmer's choices too much. If the
> compiler makes a change, it must be very confident that it is a win for
> the function as a whole.
>
> Replacing one division with one shift is fine, as an aid to the programmer.
> It removes the need for (say) templated functions to check for that case
> manually. Constant folding is fine too, for similar reasons. In these
> cases, there's not really a cost/benefit choice to be made between
> different expansions. One choice is objectively better in all
> realistic situations.
>
> But when it comes to general constants, there are many different choices
> that could be made when deciding which constants should be open-coded
> and which shouldn't. IMO we should leave the choice to the programmer
> in those cases. If the compiler gets it wrong, there will be no way
> for the programmer to force the compiler's hand ("no, when I say svdiv,
> I really do mean svdiv").
>
> If we just replace svmul and svdiv with MULT_EXPR and *DIV_EXPR,
> we'd be discarding the user's instruction choices and imposing our own.
>
> FWIW, Tejas is looking at adding support for C/C++ operators on VLA
> vectors (__ARM_FEATURE_SVE_VECTOR_OPERATORS). That would then give
> the user the choice of writing the arithmetic "naturally" or using
> intrinsics. The former is better for users who want the compiler to choose
> the instructions, while the latter is better for users who want to control
> the implementation themselves.
>
> Thanks,
> Richard
