RE: [PATCH] [Aarch64] Optimize subtract in shift counts

Michael Collison Wed, 06 Sep 2017 08:41:20 -0700

Richard,

The problem with this approach for Aarch64 is that TARGET_SHIFT_TRUNCATION_MASK 
is based on SHIFT_COUNT_TRUNCATED which is normally 0 as it based on the 
TARGET_SIMD flag.


-----Original Message-----
From: Richard Sandiford [mailto:[email protected]] 
Sent: Wednesday, September 6, 2017 11:32 AM
To: Michael Collison <[email protected]>
Cc: Richard Biener <[email protected]>; Richard Kenner 
<[email protected]>; GCC Patches <[email protected]>; nd 
<[email protected]>; Andrew Pinski <[email protected]>
Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts

Michael Collison <[email protected]> writes:
> Richard Sandiford do you have any objections to the patch as it stands?
> It doesn't appear as if anything is going to change in the mid-end 
> anytime soon.

I think one of the suggestions was to do it in expand, taking advantage of 
range info and TARGET_SHIFT_TRUNCATION_MASK.  This would be like the current 
FMA_EXPR handling in expand_expr_real_2.

I know there was talk about cleaner approaches, but at least doing the above 
seems cleaner than doing in the backend.  It should also be a nicely-contained 
piece of work.

Thanks,
Richard

> -----Original Message-----
> From: Richard Sandiford [mailto:[email protected]]
> Sent: Tuesday, August 22, 2017 9:11 AM
> To: Richard Biener <[email protected]>
> Cc: Richard Kenner <[email protected]>; Michael Collison 
> <[email protected]>; GCC Patches <[email protected]>; nd 
> <[email protected]>; Andrew Pinski <[email protected]>
> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts
>
> Richard Biener <[email protected]> writes:
>> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford 
>> <[email protected]> wrote:
>>> Richard Biener <[email protected]> writes:
>>>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford 
>>>> <[email protected]> wrote:
>>>>>Richard Biener <[email protected]> writes:
>>>>>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner 
>>>>>> <[email protected]> wrote:
>>>>>>>> Correct. It is truncated for integer shift, but not simd shift 
>>>>>>>> instructions. We generate a pattern in the split that only
>>>>>generates
>>>>>>>> the integer shift instructions.
>>>>>>>
>>>>>>> That's unfortunate, because it would be nice to do this in
>>>>>simplify_rtx,
>>>>>>> since it's machine-independent, but that has to be conditioned 
>>>>>>> on SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it.
>>>>>>
>>>>>> SHIFT_COUNT_TRUNCATED should go ... you should express this in 
>>>>>> the patterns, like for example with
>>>>>>
>>>>>> (define_insn ashlSI3
>>>>>>   [(set (match_operand 0 "")
>>>>>>          (ashl:SI (match_operand ... )
>>>>>>                      (subreg:QI (match_operand:SI ...)))]
>>>>>>
>>>>>> or an explicit and:SI and combine / simplify_rtx should apply the
>>>>>magic
>>>>>> optimization we expect.
>>>>>
>>>>>The problem with the explicit AND is that you'd end up with either 
>>>>>an AND of two constants for constant shifts, or with two separate 
>>>>>patterns, one for constant shifts and one for variable shifts.  
>>>>>(And the problem in theory with two patterns is that it reduces the 
>>>>>RA's freedom, although in practice I guess we'd always want a 
>>>>>constant shift where possible for cost reasons, and so the RA would 
>>>>>never need to replace pseudos with constants itself.)
>>>>>
>>>>>I think all useful instances of this optimisation will be exposed 
>>>>>by the gimple optimisers, so maybe expand could to do it based on 
>>>>>TARGET_SHIFT_TRUNCATION_MASK?  That describes the optab rather than 
>>>>>the rtx code and it does take the mode into account.
>>>>
>>>> Sure, that could work as well and also take into account range info. 
>>>> But we'd then need named expanders and the result would still have 
>>>> the explicit and or need to be an unspec or a different RTL operation.
>>>
>>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have 
>>> target-dependent rather than undefined behaviour, so it's OK for a 
>>> target to use shift codes with out-of-range values.
>>
>> Hmm, but that means simplify-rtx can't do anything with them because 
>> we need to preserve target dependent behavior.
>
> Yeah, it needs to punt.  In practice that shouldn't matter much.
>
>> I think the RTL IL should be always well-defined and its semantics 
>> shouldn't have any target dependences (ideally, and if, then they 
>> should be well specified via extra target hooks/macros).
>
> That would be nice :-) I think the problem has traditionally been that
>> shifts can be used in quite a few define_insn patterns besides those 
>> for shift instructions.  So if your target defines shifts to have 
>> 256-bit precision (say) then you need to make sure that every 
>> define_insn with a shift rtx will honour that.
>
> It's more natural for target guarantees to apply to instructions than 
> to
>> rtx codes.
>
>>>  And
>>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about 
>>> how the normal shift optabs behave, so I don't think we'd need new 
>>> optabs or new unspecs.
>>>
>>> E.g. it already works this way when expanding double-word shifts, 
>>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added.  There 
>>> it's possible to use a shorter sequence if you know that the shift 
>>> optab truncates the count, so we can do that even if 
>>> SHIFT_COUNT_TRUNCATED isn't defined.
>>
>> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK 
>> applies to the instructions generated by the named shift patterns but 
>> _not_ general shift RTXen.  But the generated pattern contains shift 
>> RTXen and how can we figure whether they were generated by the named 
>> expanders or by other means?  Don't define_expand also serve as 
>> define_insn for things like combine?
>
> Yeah, you can't (and aren't supposed to try to) reverse-engineer the
>> expander from the generated instructions.
>> TARGET_SHIFT_TRUNCATION_MASK should only be used if you're expanding 
>> a shift optab.
>
>> That said, from looking at the docs and your description above it 
>> seems that semantics are not fully reflected in the RTL instruction stream?
>
> Yeah, the semantics go from being well-defined in the optab interface to 
> being target-dependent in the rtl stream.
>
> Thanks,
> Richard
>
>>
>> Richard.
>>
>>> Thanks,
>>> Richard
>>>
>>>>
>>>> Richard.
>>>>
>>>>>Thanks,
>>>>>Richard

RE: [PATCH] [Aarch64] Optimize subtract in shift counts

Reply via email to