Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2018-01-24 Thread Alan Hayward
Ping.
Any comments on this?
The one-line summary is that using self sets instead of clobber_high would
result in a patch that is roughly the same, but with different condition
checks.
It depends on whether people think it really is useful for self sets not to be
live.

Given that we are at stage 4 now, and this can’t go in until stage 1, I’m
happy to leave the discussion until stage 1, but would appreciate any
suggestions before then.

Thanks,
Alan.

> On 12 Jan 2018, at 11:58, Alan Hayward  wrote:
> 
> 
> 
>> On 19 Dec 2017, at 16:27, Jeff Law  wrote:
>> 
>> On 12/19/2017 03:12 AM, Alan Hayward wrote:
>>> Ping ping.
>>> I think there should be enough information in the first test to show that
>>> any "set to self" registers become live. Let me know if there’s anything
>>> I’ve missed.
>> I think that both Richi and I would like you to investigate fixing the df
>> infrastructure so that a self-set doesn't make things go live.  Sorry if
>> we weren't clear about that.  I.e., the level of complexity and likely
>> fallout, so that we can evaluate that vs CLOBBER_HIGH.
>> 
>> 
>> 
>> jeff
>> 
> 
> 
> Right, sorry, I misunderstood. Ok, so I’ve been looking at trying to do this.
> 
> To summarise: to do this we need to check for 1) all reg sets to self where
> 2) the existing value of the register fits within the mode of the new set. In
> these cases we then want to (in effect) pretend the set doesn’t exist.
> To test this, I add a set to self in a tls_desc call for every V register
> (eg: (set (reg:TI V0_REGNUM) (reg:TI V0_REGNUM)), (set (reg:TI V1_REGNUM)
> (reg:TI V1_REGNUM)), etc). If the patch works, then these registers will not
> be backed up around a tls call.
> 
> First, I added some checks into df-scan and successfully stopped the sets to
> self from being added to the live and uses lists.  [To simplify the issue
> for now, I ignored the existing value of the register and just assumed it
> fits.]
> However, running my test case, the code still results in my vector registers
> being backed up around the tls call, even though the dumps show that my
> vector registers are no longer live.
> 
> Debugging further finds that the register backing up is happening as part of
> combine.c. Ok, I can add a check for reg set to self in here too…
> But as part of my clobber_high patch I already had code in combine.c that
> checked for CLOBBER_HIGH and then checked the mode of the previous value in
> the register. My new code ends up looking very similar to my clobber_high
> patch.
> 
> Instead of:
> 
>   if (GET_CODE (setter) == CLOBBER_HIGH
>       && reg_is_clobbered_by_clobber_high (REGNO (dest),
>                                            GET_MODE (rsp->last_set_value)))
> 
> it now becomes something like:
> 
>   if (GET_CODE (setter) == SET
>       && REG_P (dest) && HARD_REGISTER_P (dest)
>       && REG_P (src) && REGNO (dest) == REGNO (src)
>       && reg_is_clobbered_by_self_set (REGNO (dest),
>                                        GET_MODE (rsp->last_set_value)))
> 
> I then need to find the next pass that has similar checks… and again it’ll
> be in the same places where I already have clobber_high code.
> 
> I suspect in the end I’ll be able to remove the df-scan changes because, as I
> effectively found out with clobber_high, they aren’t what causes the register
> backups to happen.
> 
> Ok, in the new patch we do save a bit of code because there is no new
> expression to add, but that was a small part of the full patch.
> 
> I could rewrite the patch in this way, but personally I feel it’s now 
> exploiting a side effect of a set to self, rather than being explicit in what 
> it is trying to do. 
> 
> 
> Alan.
> 



Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2018-01-12 Thread Alan Hayward


> On 19 Dec 2017, at 16:27, Jeff Law  wrote:
> 
> On 12/19/2017 03:12 AM, Alan Hayward wrote:
>> Ping ping.
>> I think there should be enough information in the first test to show that
>> any "set to self" registers become live. Let me know if there’s anything
>> I’ve missed.
> I think that both Richi and I would like you to investigate fixing the df
> infrastructure so that a self-set doesn't make things go live.  Sorry if
> we weren't clear about that.  I.e., the level of complexity and likely
> fallout, so that we can evaluate that vs CLOBBER_HIGH.
> 
> 
> 
> jeff
> 


Right, sorry, I misunderstood. Ok, so I’ve been looking at trying to do this.

To summarise: to do this we need to check for 1) all reg sets to self where 2)
the existing value of the register fits within the mode of the new set. In
these cases we then want to (in effect) pretend the set doesn’t exist.
To test this, I add a set to self in a tls_desc call for every V register (eg:
(set (reg:TI V0_REGNUM) (reg:TI V0_REGNUM)), (set (reg:TI V1_REGNUM) (reg:TI
V1_REGNUM)), etc). If the patch works, then these registers will not be
backed up around a tls call.

First, I added some checks into df-scan and successfully stopped the sets to
self from being added to the live and uses lists.  [To simplify the issue for
now, I ignored the existing value of the register and just assumed it fits.]
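
For illustration (the helper name here is hypothetical, not from the actual
patch), the predicate such a df-scan check needs looks roughly like:

  /* True for a hard reg set to itself - the case the scan would skip
     when recording defs and uses.  */
  static bool
  self_set_p (rtx set)
  {
    rtx dest = SET_DEST (set);
    rtx src = SET_SRC (set);
    return (REG_P (dest) && REG_P (src)
            && HARD_REGISTER_P (dest)
            && REGNO (dest) == REGNO (src));
  }

The full check would also need the mode test described above.
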
However, running my test case, the code still results in my vector registers
being backed up around the tls call, even though the dumps show that my vector
registers are no longer live.

Debugging further finds that the register backing up is happening as part of
combine.c. Ok, I can add a check for reg set to self in here too…
But as part of my clobber_high patch I already had code in combine.c that
checked for CLOBBER_HIGH and then checked the mode of the previous value in the
register. My new code ends up looking very similar to my clobber_high patch.

Instead of:

  if (GET_CODE (setter) == CLOBBER_HIGH
      && reg_is_clobbered_by_clobber_high (REGNO (dest),
                                           GET_MODE (rsp->last_set_value)))

it now becomes something like:

  if (GET_CODE (setter) == SET
      && REG_P (dest) && HARD_REGISTER_P (dest)
      && REG_P (src) && REGNO (dest) == REGNO (src)
      && reg_is_clobbered_by_self_set (REGNO (dest),
                                       GET_MODE (rsp->last_set_value)))
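
As a sketch only (reg_is_clobbered_by_self_set is the hypothetical helper
named above; the set's mode is passed explicitly to keep the sketch
self-contained), the mode test would mirror the clobber_high one:

  /* True if a self-set preserving only SET_MODE worth of low bits fails
     to preserve a live value of mode LIVE_MODE.  */
  static bool
  reg_is_clobbered_by_self_set (machine_mode set_mode, machine_mode live_mode)
  {
    /* Assumes fixed-size modes; variable-width SVE modes would need the
       poly-int comparisons instead.  */
    return GET_MODE_SIZE (live_mode) > GET_MODE_SIZE (set_mode);
  }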

I then need to find the next pass that has similar checks… and again it’ll
be in the same places where I already have clobber_high code.

I suspect in the end I’ll be able to remove the df-scan changes because, as I
effectively found out with clobber_high, they aren’t what causes the register
backups to happen.

Ok, in the new patch we do save a bit of code because there is no new
expression to add, but that was a small part of the full patch.

I could rewrite the patch in this way, but personally I feel it’s now 
exploiting a side effect of a set to self, rather than being explicit in what 
it is trying to do. 


Alan.



Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-12-19 Thread Jeff Law
On 12/19/2017 03:12 AM, Alan Hayward wrote:
> Ping ping.
> I think there should be enough information in the first test to show that any
> "set to self" registers become live. Let me know if there’s anything I’ve
> missed.
I think that both Richi and I would like you to investigate fixing the df
infrastructure so that a self-set doesn't make things go live.  Sorry if
we weren't clear about that.  I.e., the level of complexity and likely
fallout, so that we can evaluate that vs CLOBBER_HIGH.



jeff



Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-12-19 Thread Alan Hayward
Ping ping.
I think there should be enough information in the first test to show that any
"set to self" registers become live. Let me know if there’s anything I’ve
missed.


Thanks,
Alan.

> On 12 Dec 2017, at 11:11, Alan Hayward  wrote:
> 
> Ping.
> 
>> On 30 Nov 2017, at 11:03, Alan Hayward  wrote:
>> 
>> 
>>> On 27 Nov 2017, at 17:29, Jeff Law  wrote:
>>> 
>>> On 11/23/2017 04:11 AM, Alan Hayward wrote:
 
> On 22 Nov 2017, at 17:33, Jeff Law  wrote:
> 
> On 11/22/2017 04:31 AM, Alan Hayward wrote:
>> 
>>> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
 
> 
> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
> totally forgotten about it.  And in fact it seems to come pretty close
> to what you need…
 
 Yes, some of the code is similar to the way
 TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
 CLOBBER expr code served as a starting point for writing the patch. 
 The main difference
 here, is that _PART_CLOBBERED is around all calls and is not tied to a 
 specific Instruction,
 it’s part of the calling abi. Whereas clobber_high is explicitly tied 
 to an expression (tls_desc).
 It meant there wasn’t really any opportunity to reuse any existing 
 code.
>>> Understood.  Though your first patch mentions that you're trying to
>>> describe partial preservation "around TLS calls". Presumably those are
>>> represented as normal insns, not call_insn.
>>> 
>>> That brings me back to Richi's idea of exposing a set of the low subreg
>>> to itself using whatever mode is wide enough to cover the neon part of
>>> the register.
>>> 
>>> That should tell the generic parts of the compiler that you're just
>>> clobbering the upper part and at least in theory you can implement in
>>> the aarch64 backend and the rest of the compiler should "just work"
>>> because that's the existing semantics of a subreg store.
>>> 
>>> The only worry would be if a pass tried to get overly smart and
>>> considered that kind of set a nop -- but I think I'd argue that's simply
>>> wrong given the semantics of a partial store.
>>> 
>> 
>> So, instead of using clobber_high(reg X), we use set(reg X, reg X).
>> It’s something we considered, and then dismissed.
>> 
>> The problem then is you are now using SET semantics on those registers, 
>> and it
>> would make the register live around the function, which might not be the 
>> case.
>> Whereas clobber semantics will just make the register dead - which is 
>> exactly
>> what we want (but only conditionally).
> ?!?  A set of the subreg is the *exact* semantics you want.  It says the
> low part is preserved while the upper part is clobbered across the TLS
> insns.
> 
> jeff
 
 Consider where the TLS call is inside a loop. The compiler would normally 
 want
 to hoist that out of the loop. By adding a set(x,x) into the parallel of 
 the tls_desc we
 are now making x live across the loop, x is dependent on the value from 
 the previous
 iteration, and the tls_desc can no longer be hoisted.
>>> Hmm.  I think I see the problem you're trying to point out.  Let me
>>> restate it and see if you agree.
>>> 
>>> The low subreg set does clearly indicate the upper part of the SVE
>>> register is clobbered.  The problem is from a liveness standpoint the
>>> compiler is considering the low part live, even though it's a self-set.
>>> 
>> 
>> Yes.
>> 
>>> In fact, if that is the case, then a single TLS call (independent of a
>>> loop) would make the low part of the register globally live.  This
>>> should be testable.  Include one of these low part self sets on the
>>> existing TLS calls and compile a little test function and let's look at
>>> the liveness data.
>>> 
>> 
>> 
>> Ok, so I’ve put 
>> (set (reg:TI V0_REGNUM) (reg:TI V0_REGNUM))
>> (set (reg:TI V1_REGNUM) (reg:TI V1_REGNUM))
>> etc up to V31 in the UNSPEC_TLSDESC pattern in aarch64.md, using up all the 
>> neon registers.
>> 
>> In an ideal world, this change would do nothing as neon reg size is TI.
>> 
>> First I compiled up  sve_tls_preserve_1.c (Test1 below)
>> 
>> Without the SET changes, gcc does not back up any neon registers before the
>> tlsdesc call (good).
>> With the SET changes, gcc backs up all the neon registers (not ideal).
>> 
>> Without the SET changes, dump logs give:
>> 
>> cse1:
>> ;;  regs ever live   0 [x0] 30 [x30] 32 [v0] 33 [v1] 34 [v2] 66 [cc]
>> 
>> cprop1:
>> ;; lr  in 29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
>> ;; lr  use   29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
>> ;; lr  def   0 [x0] 30 [x30] 32 [v0] 66 [cc] 79 80 81 82 84 85 86 87 88 89
>> 

Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-12-12 Thread Alan Hayward
Ping.

> On 30 Nov 2017, at 11:03, Alan Hayward  wrote:
> 
> 
>> On 27 Nov 2017, at 17:29, Jeff Law  wrote:
>> 
>> On 11/23/2017 04:11 AM, Alan Hayward wrote:
>>> 
 On 22 Nov 2017, at 17:33, Jeff Law  wrote:
 
 On 11/22/2017 04:31 AM, Alan Hayward wrote:
> 
>> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
>>> 
 
 You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
 totally forgotten about it.  And in fact it seems to come pretty close
 to what you need…
>>> 
>>> Yes, some of the code is similar to the way
>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>>> CLOBBER expr code served as a starting point for writing the patch. The 
>>> main difference
>>> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
>>> specific Instruction,
>>> it’s part of the calling abi. Whereas clobber_high is explicitly tied 
>>> to an expression (tls_desc).
>>> It meant there wasn’t really any opportunity to reuse any existing 
>>> code.
>> Understood.  Though your first patch mentions that you're trying to
>> describe partial preservation "around TLS calls". Presumably those are
>> represented as normal insns, not call_insn.
>> 
>> That brings me back to Richi's idea of exposing a set of the low subreg
>> to itself using whatever mode is wide enough to cover the neon part of
>> the register.
>> 
>> That should tell the generic parts of the compiler that you're just
>> clobbering the upper part and at least in theory you can implement in
>> the aarch64 backend and the rest of the compiler should "just work"
>> because that's the existing semantics of a subreg store.
>> 
>> The only worry would be if a pass tried to get overly smart and
>> considered that kind of set a nop -- but I think I'd argue that's simply
>> wrong given the semantics of a partial store.
>> 
> 
> So, instead of using clobber_high(reg X), we use set(reg X, reg X).
> It’s something we considered, and then dismissed.
> 
> The problem then is you are now using SET semantics on those registers, 
> and it
> would make the register live around the function, which might not be the 
> case.
> Whereas clobber semantics will just make the register dead - which is 
> exactly
> what we want (but only conditionally).
 ?!?  A set of the subreg is the *exact* semantics you want.  It says the
 low part is preserved while the upper part is clobbered across the TLS
 insns.
 
 jeff
>>> 
>>> Consider where the TLS call is inside a loop. The compiler would normally 
>>> want
>>> to hoist that out of the loop. By adding a set(x,x) into the parallel of 
>>> the tls_desc we
>> are now making x live across the loop, x is dependent on the value from the 
>>> previous
>>> iteration, and the tls_desc can no longer be hoisted.
>> Hmm.  I think I see the problem you're trying to point out.  Let me
>> restate it and see if you agree.
>> 
>> The low subreg set does clearly indicate the upper part of the SVE
>> register is clobbered.  The problem is from a liveness standpoint the
>> compiler is considering the low part live, even though it's a self-set.
>> 
> 
> Yes.
> 
>> In fact, if that is the case, then a single TLS call (independent of a
>> loop) would make the low part of the register globally live.  This
>> should be testable.  Include one of these low part self sets on the
>> existing TLS calls and compile a little test function and let's look at
>> the liveness data.
>> 
> 
> 
> Ok, so I’ve put 
> (set (reg:TI V0_REGNUM) (reg:TI V0_REGNUM))
> (set (reg:TI V1_REGNUM) (reg:TI V1_REGNUM))
> etc up to V31 in the UNSPEC_TLSDESC pattern in aarch64.md, using up all the 
> neon registers.
> 
> In an ideal world, this change would do nothing as neon reg size is TI.
> 
> First I compiled up  sve_tls_preserve_1.c (Test1 below)
> 
> Without the SET changes, gcc does not back up any neon registers before the
> tlsdesc call (good).
> With the SET changes, gcc backs up all the neon registers (not ideal).
> 
> Without the SET changes, dump logs give:
> 
> cse1:
> ;;  regs ever live 0 [x0] 30 [x30] 32 [v0] 33 [v1] 34 [v2] 66 [cc]
> 
> cprop1:
> ;; lr  in  29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
> ;; lr  use 29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
> ;; lr  def 0 [x0] 30 [x30] 32 [v0] 66 [cc] 79 80 81 82 84 85 86 87 88 89
> ;; live  in  29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
> ;; live  gen   0 [x0] 32 [v0] 78 79 80 81 82 83 84 85 86 87 88 89
> ;; live  kill  30 [x30] 66 [cc]
> ;; lr  out 29 [x29] 31 [sp] 32 [v0] 64 [sfp] 65 [ap]
> ;; live  out   29 [x29] 31 [sp] 32 [v0] 64 [sfp] 65 [ap]
> 
> reload:
> ;;  regs ever live 0 [x0] 1 [x1] 2 [x2] 29 [x29] 30 [x30] 32 

Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-30 Thread Alan Hayward

> On 27 Nov 2017, at 17:29, Jeff Law  wrote:
> 
> On 11/23/2017 04:11 AM, Alan Hayward wrote:
>> 
>>> On 22 Nov 2017, at 17:33, Jeff Law  wrote:
>>> 
>>> On 11/22/2017 04:31 AM, Alan Hayward wrote:
 
> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
>> 
>>> 
>>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>>> totally forgotten about it.  And in fact it seems to come pretty close
>>> to what you need…
>> 
>> Yes, some of the code is similar to the way
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>> CLOBBER expr code served as a starting point for writing the patch. The 
>> main difference
>> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
>> specific Instruction,
>> it’s part of the calling abi. Whereas clobber_high is explicitly tied to 
>> an expression (tls_desc).
>> It meant there wasn’t really any opportunity to reuse any existing code.
> Understood.  Though your first patch mentions that you're trying to
> describe partial preservation "around TLS calls". Presumably those are
> represented as normal insns, not call_insn.
> 
> That brings me back to Richi's idea of exposing a set of the low subreg
> to itself using whatever mode is wide enough to cover the neon part of
> the register.
> 
> That should tell the generic parts of the compiler that you're just
> clobbering the upper part and at least in theory you can implement in
> the aarch64 backend and the rest of the compiler should "just work"
> because that's the existing semantics of a subreg store.
> 
> The only worry would be if a pass tried to get overly smart and
> considered that kind of set a nop -- but I think I'd argue that's simply
> wrong given the semantics of a partial store.
> 
 
 So, instead of using clobber_high(reg X), we use set(reg X, reg X).
 It’s something we considered, and then dismissed.
 
 The problem then is you are now using SET semantics on those registers, 
 and it
 would make the register live around the function, which might not be the 
 case.
 Whereas clobber semantics will just make the register dead - which is 
 exactly
 what we want (but only conditionally).
>>> ?!?  A set of the subreg is the *exact* semantics you want.  It says the
>>> low part is preserved while the upper part is clobbered across the TLS
>>> insns.
>>> 
>>> jeff
>> 
>> Consider where the TLS call is inside a loop. The compiler would normally 
>> want
>> to hoist that out of the loop. By adding a set(x,x) into the parallel of the 
>> tls_desc we
>> are now making x live across the loop, x is dependent on the value from the 
>> previous
>> iteration, and the tls_desc can no longer be hoisted.
> Hmm.  I think I see the problem you're trying to point out.  Let me
> restate it and see if you agree.
> 
> The low subreg set does clearly indicate the upper part of the SVE
> register is clobbered.  The problem is from a liveness standpoint the
> compiler is considering the low part live, even though it's a self-set.
> 

Yes.

> In fact, if that is the case, then a single TLS call (independent of a
> loop) would make the low part of the register globally live.  This
> should be testable.  Include one of these low part self sets on the
> existing TLS calls and compile a little test function and let's look at
> the liveness data.
> 


Ok, so I’ve put 
(set (reg:TI V0_REGNUM) (reg:TI V0_REGNUM))
(set (reg:TI V1_REGNUM) (reg:TI V1_REGNUM))
etc up to V31 in the UNSPEC_TLSDESC pattern in aarch64.md, using up all the 
neon registers.

In an ideal world, this change would do nothing as neon reg size is TI.

First I compiled up sve_tls_preserve_1.c (Test1 below)

Without the SET changes, gcc does not back up any neon registers before the
tlsdesc call (good).
With the SET changes, gcc backs up all the neon registers (not ideal).

Without the SET changes, dump logs give:

cse1:
;;  regs ever live   0 [x0] 30 [x30] 32 [v0] 33 [v1] 34 [v2] 66 [cc]

cprop1:
;; lr  in29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
;; lr  use   29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
;; lr  def   0 [x0] 30 [x30] 32 [v0] 66 [cc] 79 80 81 82 84 85 86 87 88 89
;; live  in  29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
;; live  gen 0 [x0] 32 [v0] 78 79 80 81 82 83 84 85 86 87 88 89
;; live  kill30 [x30] 66 [cc]
;; lr  out   29 [x29] 31 [sp] 32 [v0] 64 [sfp] 65 [ap]
;; live  out 29 [x29] 31 [sp] 32 [v0] 64 [sfp] 65 [ap]

reload:
;;  regs ever live   0 [x0] 1 [x1] 2 [x2] 29 [x29] 30 [x30] 32 [v0] 33 [v1] 
34 [v2] 35 [v3] 36 [v4] 66 [cc]

With the set changes:

cse1:
;;  regs ever live   0 [x0] 30 [x30] 32 [v0] 33 [v1] 34 [v2] 35 [v3] 36 
[v4] 37 [v5] 38 [v6] 39 [v7] 40 [v8] 41 [v9] 42 [v10] 43 [v11] 44 

Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-28 Thread Jeff Law
On 11/28/2017 04:55 AM, Richard Biener wrote:

>>> Or consider a stream of code containing two tls_desc calls (ok, the 
>>> compiler might
>>> optimise one of the tls calls away, but this approach should be reusable 
>>> for other exprs).
>>> Between the two set(x,x)’s x is considered live so the register allocator 
>>> can’t use that
>>> register.
>>> Given that we are applying this to all the neon registers, the register 
>>> allocator now throws
>>> an ICE because it can’t find any free hard neon registers to use.
>> Given your statements it sounds like the liveness infrastructure is
>> making those neon regs globally live when it sees the low part subreg
>> self-set.  Let's confirm that one way or the other and see where it
>> takes us.
> 
> Indeed in (set (subreg:neon reg1) (subreg:neon reg1)) it appears that the
> lowpart of reg1 is used and thus it is live, but liveness analysis can (and
> should) simply ignore such sets.
My suggestion was going to be to peek a bit at the life analysis code if
indeed my suspicion was true.

Jeff


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-28 Thread Richard Biener
On Mon, Nov 27, 2017 at 6:29 PM, Jeff Law  wrote:
> On 11/23/2017 04:11 AM, Alan Hayward wrote:
>>
>>> On 22 Nov 2017, at 17:33, Jeff Law  wrote:
>>>
>>> On 11/22/2017 04:31 AM, Alan Hayward wrote:

> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
>>
>>>
>>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>>> totally forgotten about it.  And in fact it seems to come pretty close
>>> to what you need…
>>
>> Yes, some of the code is similar to the way
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>> CLOBBER expr code served as a starting point for writing the patch. The 
>> main difference
>> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
>> specific Instruction,
>> it’s part of the calling abi. Whereas clobber_high is explicitly tied to 
>> an expression (tls_desc).
>> It meant there wasn’t really any opportunity to reuse any existing code.
> Understood.  Though your first patch mentions that you're trying to
> describe partial preservation "around TLS calls". Presumably those are
> represented as normal insns, not call_insn.
>
> That brings me back to Richi's idea of exposing a set of the low subreg
> to itself using whatever mode is wide enough to cover the neon part of
> the register.
>
> That should tell the generic parts of the compiler that you're just
> clobbering the upper part and at least in theory you can implement in
> the aarch64 backend and the rest of the compiler should "just work"
> because that's the existing semantics of a subreg store.
>
> The only worry would be if a pass tried to get overly smart and
> considered that kind of set a nop -- but I think I'd argue that's simply
> wrong given the semantics of a partial store.
>

 So, instead of using clobber_high(reg X), we use set(reg X, reg X).
 It’s something we considered, and then dismissed.

 The problem then is you are now using SET semantics on those registers, 
 and it
 would make the register live around the function, which might not be the 
 case.
 Whereas clobber semantics will just make the register dead - which is 
 exactly
 what we want (but only conditionally).
>>> ?!?  A set of the subreg is the *exact* semantics you want.  It says the
>>> low part is preserved while the upper part is clobbered across the TLS
>>> insns.
>>>
>>> jeff
>>
>> Consider where the TLS call is inside a loop. The compiler would normally 
>> want
>> to hoist that out of the loop. By adding a set(x,x) into the parallel of the 
>> tls_desc we
>> are now making x live across the loop, x is dependent on the value from the 
>> previous
>> iteration, and the tls_desc can no longer be hoisted.
> Hmm.  I think I see the problem you're trying to point out.  Let me
> restate it and see if you agree.
>
> The low subreg set does clearly indicate the upper part of the SVE
> register is clobbered.  The problem is from a liveness standpoint the
> compiler is considering the low part live, even though it's a self-set.
>
> In fact, if that is the case, then a single TLS call (independent of a
> loop) would make the low part of the register globally live.  This
> should be testable.  Include one of these low part self sets on the
> existing TLS calls and compile a little test function and let's look at
> the liveness data.
>
>
> Now it could be the case that various local analyses could sub-optimally
> handle things.  You mention LICM.  I know our original LICM did have a
> problem in that if it saw a use of a hard reg in a loop without seeing a
> set of that hard reg it considered the register varying within the loop.
>  I have no idea if we carried that forward when the loop code was
> rewritten (when I looked at this it was circa 1992).
>
>
>>
>> Or consider a stream of code containing two tls_desc calls (ok, the compiler 
>> might
>> optimise one of the tls calls away, but this approach should be reusable for 
>> other exprs).
>> Between the two set(x,x)’s x is considered live so the register allocator 
>> can’t use that
>> register.
>> Given that we are applying this to all the neon registers, the register 
>> allocator now throws
>> an ICE because it can’t find any free hard neon registers to use.
> Given your statements it sounds like the liveness infrastructure is
> making those neon regs globally live when it sees the low part subreg
> self-set.  Let's confirm that one way or the other and see where it
> takes us.

Indeed in (set (subreg:neon reg1) (subreg:neon reg1)) it appears that the
lowpart of reg1 is used and thus it is live, but liveness analysis can (and
should) simply ignore such sets.

> Jeff


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-27 Thread Jeff Law
On 11/23/2017 04:11 AM, Alan Hayward wrote:
> 
>> On 22 Nov 2017, at 17:33, Jeff Law  wrote:
>>
>> On 11/22/2017 04:31 AM, Alan Hayward wrote:
>>>
 On 21 Nov 2017, at 03:13, Jeff Law  wrote:
>
>>
>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>> totally forgotten about it.  And in fact it seems to come pretty close
>> to what you need…
>
> Yes, some of the code is similar to the way
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
> CLOBBER expr code served as a starting point for writing the patch. The 
> main difference
> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
> specific Instruction,
> it’s part of the calling abi. Whereas clobber_high is explicitly tied to 
> an expression (tls_desc).
> It meant there wasn’t really any opportunity to reuse any existing code.
 Understood.  Though your first patch mentions that you're trying to
 describe partial preservation "around TLS calls". Presumably those are
 represented as normal insns, not call_insn.

 That brings me back to Richi's idea of exposing a set of the low subreg
 to itself using whatever mode is wide enough to cover the neon part of
 the register.

 That should tell the generic parts of the compiler that you're just
 clobbering the upper part and at least in theory you can implement in
 the aarch64 backend and the rest of the compiler should "just work"
 because that's the existing semantics of a subreg store.

 The only worry would be if a pass tried to get overly smart and
 considered that kind of set a nop -- but I think I'd argue that's simply
 wrong given the semantics of a partial store.

>>>
>>> So, instead of using clobber_high(reg X), we use set(reg X, reg X).
>>> It’s something we considered, and then dismissed.
>>>
>>> The problem then is you are now using SET semantics on those registers, and 
>>> it
>>> would make the register live around the function, which might not be the 
>>> case.
>>> Whereas clobber semantics will just make the register dead - which is 
>>> exactly
>>> what we want (but only conditionally).
>> ?!?  A set of the subreg is the *exact* semantics you want.  It says the
>> low part is preserved while the upper part is clobbered across the TLS
>> insns.
>>
>> jeff
> 
> Consider where the TLS call is inside a loop. The compiler would normally want
> to hoist that out of the loop. By adding a set(x,x) into the parallel of the 
> tls_desc we
> are now making x live across the loop, x is dependent on the value from the 
> previous
> iteration, and the tls_desc can no longer be hoisted.
Hmm.  I think I see the problem you're trying to point out.  Let me
restate it and see if you agree.

The low subreg set does clearly indicate the upper part of the SVE
register is clobbered.  The problem is from a liveness standpoint the
compiler is considering the low part live, even though it's a self-set.

In fact, if that is the case, then a single TLS call (independent of a
loop) would make the low part of the register globally live.  This
should be testable.  Include one of these low part self sets on the
existing TLS calls and compile a little test function and let's look at
the liveness data.


Now it could be the case that various local analyses could sub-optimally
handle things.  You mention LICM.  I know our original LICM did have a
problem in that if it saw a use of a hard reg in a loop without seeing a
set of that hard reg it considered the register varying within the loop.
I have no idea if we carried that forward when the loop code was
rewritten (when I looked at this it was circa 1992).


> 
> Or consider a stream of code containing two tls_desc calls (ok, the compiler 
> might
> optimise one of the tls calls away, but this approach should be reusable for 
> other exprs).
> Between the two set(x,x)’s x is considered live so the register allocator 
> can’t use that
> register.
> Given that we are applying this to all the neon registers, the register 
> allocator now throws
> an ICE because it can’t find any free hard neon registers to use.
Given your statements it sounds like the liveness infrastructure is
making those neon regs globally live when it sees the low part subreg
self-set.  Let's confirm that one way or the other and see where it
takes us.

Jeff


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-23 Thread Alan Hayward

> On 22 Nov 2017, at 17:33, Jeff Law  wrote:
> 
> On 11/22/2017 04:31 AM, Alan Hayward wrote:
>> 
>>> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
 
> 
> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
> totally forgotten about it.  And in fact it seems to come pretty close
> to what you need…
 
 Yes, some of the code is similar to the way
 TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
 CLOBBER expr code served as a starting point for writing the patch. The 
 main difference
 here, is that _PART_CLOBBERED is around all calls and is not tied to a 
 specific Instruction,
 it’s part of the calling abi. Whereas clobber_high is explicitly tied to 
 an expression (tls_desc).
 It meant there wasn’t really any opportunity to reuse any existing code.
>>> Understood.  Though your first patch mentions that you're trying to
>>> describe partial preservation "around TLS calls". Presumably those are
>>> represented as normal insns, not call_insn.
>>> 
>>> That brings me back to Richi's idea of exposing a set of the low subreg
>>> to itself using whatever mode is wide enough to cover the neon part of
>>> the register.
>>> 
>>> That should tell the generic parts of the compiler that you're just
>>> clobbering the upper part and at least in theory you can implement in
>>> the aarch64 backend and the rest of the compiler should "just work"
>>> because that's the existing semantics of a subreg store.
>>> 
>>> The only worry would be if a pass tried to get overly smart and
>>> considered that kind of set a nop -- but I think I'd argue that's simply
>>> wrong given the semantics of a partial store.
>>> 
>> 
>> So, instead of using clobber_high(reg X), we use set(reg X, reg X).
>> It’s something we considered, and then dismissed.
>> 
>> The problem then is you are now using SET semantics on those registers, and 
>> it
>> would make the register live around the function, which might not be the 
>> case.
>> Whereas clobber semantics will just make the register dead - which is exactly
>> what we want (but only conditionally).
> ?!?  A set of the subreg is the *exact* semantics you want.  It says the
> low part is preserved while the upper part is clobbered across the TLS
> insns.
> 
> jeff

Consider where the TLS call is inside a loop. The compiler would normally want
to hoist that out of the loop. By adding a set(x,x) into the parallel of the
tls_desc we are now making x live across the loop, x is dependent on the value
from the previous iteration, and the tls_desc can no longer be hoisted.
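
To illustrate (hypothetical C; tls_var stands in for any TLS variable whose
address is resolved through a tls_desc call):

  __thread int tls_var;

  void
  f (int *x, int n)
  {
    for (int i = 0; i < n; i++)
      x[i] += tls_var;  /* The tls_desc call computing the address of
                           tls_var wants hoisting out of the loop; a
                           set(v,v) in its pattern would make v
                           loop-carried and block the hoist.  */
  }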

Or consider a stream of code containing two tls_desc calls (ok, the compiler
might optimise one of the tls calls away, but this approach should be reusable
for other exprs).
Between the two set(x,x)’s x is considered live so the register allocator can’t
use that register.
Given that we are applying this to all the neon registers, the register
allocator now throws an ICE because it can’t find any free hard neon registers
to use.


Alan.

Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-22 Thread Jeff Law
On 11/22/2017 04:31 AM, Alan Hayward wrote:
> 
>> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
>>>

 You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
 totally forgotten about it.  And in fact it seems to come pretty close
 to what you need…
>>>
>>> Yes, some of the code is similar to the way
>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>>> CLOBBER expr code served as a starting point for writing the patch. The 
>>> main difference
>>> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
>>> specific Instruction,
>>> it’s part of the calling abi. Whereas clobber_high is explicitly tied to an 
>>> expression (tls_desc).
>>> It meant there wasn’t really any opportunity to reuse any existing code.
>> Understood.  Though your first patch mentions that you're trying to
>> describe partial preservation "around TLS calls".  Presumably those are
>> represented as normal insns, not call_insn.
>>
>> That brings me back to Richi's idea of exposing a set of the low subreg
>> to itself using whatever mode is wide enough to cover the neon part of
>> the register.
>>
>> That should tell the generic parts of the compiler that you're just
>> clobbering the upper part and at least in theory you can implement in
>> the aarch64 backend and the rest of the compiler should "just work"
>> because that's the existing semantics of a subreg store.
>>
>> The only worry would be if a pass tried to get overly smart and
>> considered that kind of set a nop -- but I think I'd argue that's simply
>> wrong given the semantics of a partial store.
>>
> 
> So, instead of using clobber_high(reg X), we use set(reg X, reg X).
> It’s something we considered, and then dismissed.
> 
> The problem then is you are now using SET semantics on those registers, and it
> would make the register live around the function, which might not be the case.
> Whereas clobber semantics will just make the register dead - which is exactly
> what we want (but only conditionally).
?!?  A set of the subreg is the *exact* semantics you want.  It says the
low part is preserved while the upper part is clobbered across the TLS
insns.

jeff


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-22 Thread Alan Hayward

> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
>> 
>>> 
>>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>>> totally forgotten about it.  And in fact it seems to come pretty close
>>> to what you need…
>> 
>> Yes, some of the code is similar to the way
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>> CLOBBER expr code served as a starting point for writing the patch. The main 
>> difference
>> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
>> specific Instruction,
>> it’s part of the calling abi. Whereas clobber_high is explicitly tied to an 
>> expression (tls_desc).
>> It meant there wasn’t really any opportunity to reuse any existing code.
> Understood.  Though your first patch mentions that you're trying to
> describe partial preservation "around TLS calls".  Presumably those are
> represented as normal insns, not call_insn.
> 
> That brings me back to Richi's idea of exposing a set of the low subreg
> to itself using whatever mode is wide enough to cover the neon part of
> the register.
> 
> That should tell the generic parts of the compiler that you're just
> clobbering the upper part and at least in theory you can implement in
> the aarch64 backend and the rest of the compiler should "just work"
> because that's the existing semantics of a subreg store.
> 
> The only worry would be if a pass tried to get overly smart and
> considered that kind of set a nop -- but I think I'd argue that's simply
> wrong given the semantics of a partial store.
> 

So, instead of using clobber_high(reg X), we use set(reg X, reg X).
It’s something we considered, and then dismissed.

The problem then is that you are now using SET semantics on those registers,
and that would make the register live throughout the function, which might not
be the case. Whereas clobber semantics will just make the register dead - which
is exactly what we want (but only conditionally).
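
To illustrate the difference in RTL terms (register choice illustrative):

  (clobber (reg:TI V0_REGNUM))                 ;; v0 is dead here
  (set (reg:TI V0_REGNUM) (reg:TI V0_REGNUM))  ;; v0 is used, hence live

The clobber says nothing useful survives in v0; the self-set both uses and
defines v0, so liveness extends across the insn.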


Alan.

Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-20 Thread Jeff Law
On 11/20/2017 08:04 AM, Alan Hayward wrote:
>>>
>>> Yes, that was an early alternative option for the patch.
>>>
>>> With that it would affect every operation that uses SVE registers. A simple
>>> add of two registers now has four inputs and two outputs. It would get in the
>>> way when debugging any sve dumps and be generally annoying.
>>> Possibly the code for that would all be in the aarch64 target
>>> (making everyone else happy!), but I suspect that there would still be
>>> strange dependency issues that’d need sorting in the common code.
>>>
>>> Whereas with this patch, there are no new oddities in non-tls 
>>> compiles/dumps.
>>> Although the patch touches a lot of files, the changes are mostly restricted
>>> to places where standard clobbers were already being checked.
>> I'm not entirely sure that it would require doubling the number of
>> inputs/outputs.  It's not conceptually much different than how we
>> describe DImode operations on 32bit targets.  The mode selects one or
>> more consecutive registers, so you don't actually need anything weird in
>> your patterns.  This is pretty standard stuff.
> 
> Ok, fair enough.
> 
>>
>>
>> It would be painful in that the Neon regs would have to interleave with
>> the upper part of the SVE regs in terms of register numbers.  It would
>> also mean that you couldn't tie together multiple neon regs into
>> something wider.  I'm not sure if the latter would be an issue or not.
> 
> And there’s also the weirdness that the register would not be split evenly - 
> it’ll be a TI
> reg followed by a reg of the size of multiple TIs.
> 
> All of that has the potential to complicate all non-sve aarch64 code.
Agreed.  Let's drop this line of exploration.


> 
>>
>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>> totally forgotten about it.  And in fact it seems to come pretty close
>> to what you need…
> 
> Yes, some of the code is similar to the way
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
> CLOBBER expr code served as a starting point for writing the patch. The main 
> difference
> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
> specific Instruction,
> it’s part of the calling abi. Whereas clobber_high is explicitly tied to an 
> expression (tls_desc).
> It meant there wasn’t really any opportunity to reuse any existing code.
Understood.  Though your first patch mentions that you're trying to
describe partial preservation "around TLS calls".  Presumably those are
represented as normal insns, not call_insn.

That brings me back to Richi's idea of exposing a set of the low subreg
to itself using whatever mode is wide enough to cover the neon part of
the register.

That should tell the generic parts of the compiler that you're just
clobbering the upper part and at least in theory you can implement in
the aarch64 backend and the rest of the compiler should "just work"
because that's the existing semantics of a subreg store.

The only worry would be if a pass tried to get overly smart and
considered that kind of set a nop -- but I think I'd argue that's simply
wrong given the semantics of a partial store.

jeff







Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-20 Thread Alan Hayward

> On 17 Nov 2017, at 19:31, Jeff Law  wrote:
> 
> On 11/16/2017 11:50 AM, Alan Hayward wrote:
>> 
>>> On 16 Nov 2017, at 18:24, Richard Biener  wrote:
>>> 
>>> On November 16, 2017 7:05:30 PM GMT+01:00, Jeff Law  wrote:
 On 11/16/2017 05:34 AM, Alan Hayward wrote:
> This is a set of patches aimed at supporting aarch64 SVE register
> preservation around TLS calls.
> 
> Across a TLS call, Aarch64 SVE does not explicitly preserve the
> SVE vector registers. However, the Neon vector registers are
 preserved.
> Due to overlapping of registers, this means the lower 128 bits of all
> SVE vector registers will be preserved.
> 
> The existing GCC code will currently incorrectly assume preservation
> of all of the SVE registers.
> 
> This patch introduces a CLOBBER_HIGH expression. This behaves a bit
 like
> a CLOBBER expression. CLOBBER_HIGH can only refer to a single
 register.
> The mode of the expression indicates the size of the lower bits which
> will be preserved. If the register contains a value bigger than this
> mode then the code will treat the register as clobbered.
> 
> This means that in order to evaluate if a clobber high is relevant, we need
> to ensure the mode of the existing value in a register is tracked.
> 
> The following patches in this series add support for the
 CLOBBER_HIGH,
> with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
> aarch64. The testing performed on these patches is also detailed in
 the
> final patch.
> 
> These patches are based on top of the linaro-dev/sve branch.
> 
> A simpler alternative to this patch would be to assume all Neon and
 SVE
> registers are clobbered across TLS calls, however this would be a
> performance regression against all Aarch64 targets.
 So just a couple design questions.
 
 Presumably there's no reasonable way to set up GCC's view of the
 register file to avoid this problem?  ISTM that if the SVE register was
 split into two, one for the part that overlapped with the neon register
 and one that did not, then this could be handled via standard
 mechanisms?
 
>> 
>> Yes, that was an early alternative option for the patch.
>> 
>> With that it would affect every operation that uses SVE registers. A simple
>> add of two registers now has four inputs and two outputs. It would get in the
>> way when debugging any sve dumps and be generally annoying.
>> Possibly the code for that would all be in the aarch64 target
>> (making everyone else happy!), but I suspect that there would still be
>> strange dependency issues that’d need sorting in the common code.
>> 
>> Whereas with this patch, there are no new oddities in non-tls compiles/dumps.
>> Although the patch touches a lot of files, the changes are mostly restricted
>> to places where standard clobbers were already being checked.
> I'm not entirely sure that it would require doubling the number of
> inputs/outputs.  It's not conceptually much different than how we
> describe DImode operations on 32bit targets.  The mode selects one or
> more consecutive registers, so you don't actually need anything weird in
> your patterns.  This is pretty standard stuff.

Ok, fair enough.

> 
> 
> It would be painful in that the Neon regs would have to interleave with
> the upper part of the SVE regs in terms of register numbers.  It would
> also mean that you couldn't tie together multiple neon regs into
> something wider.  I'm not sure if the latter would be an issue or not.

And there’s also the weirdness that the register would not be split evenly - 
it’ll be a TI
reg followed by a reg of the size of multiple TIs.

All of that has the potential to complicate all non-sve aarch64 code.

> 
> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
> totally forgotten about it.  And in fact it seems to come pretty close
> to what you need…

Yes, some of the code is similar to the way
TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
CLOBBER expr code served as a starting point for writing the patch. The main
difference here is that _PART_CLOBBERED applies around all calls and is not
tied to a specific instruction; it’s part of the calling ABI. Whereas
clobber_high is explicitly tied to an expression (tls_desc).
It meant there wasn’t really any opportunity to reuse any existing code.
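
For reference, a sketch of the hook's shape (an illustrative implementation,
not the actual aarch64 one):

  /* Say that calls preserve only the low 8 bytes of an FP register
     holding a value of mode MODE.  */
  static bool
  example_hard_regno_call_part_clobbered (unsigned int regno,
                                          machine_mode mode)
  {
    return FP_REGNUM_P (regno) && GET_MODE_SIZE (mode) > 8;
  }

The difference stands, though: the hook describes every call, not one
specific pattern.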


Alan.




Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-17 Thread Jeff Law
On 11/16/2017 11:50 AM, Alan Hayward wrote:
> 
>> On 16 Nov 2017, at 18:24, Richard Biener  wrote:
>>
>> On November 16, 2017 7:05:30 PM GMT+01:00, Jeff Law  wrote:
>>> On 11/16/2017 05:34 AM, Alan Hayward wrote:
 This is a set of patches aimed at supporting aarch64 SVE register
 preservation around TLS calls.

 Across a TLS call, Aarch64 SVE does not explicitly preserve the
 SVE vector registers. However, the Neon vector registers are
>>> preserved.
 Due to overlapping of registers, this means the lower 128 bits of all
 SVE vector registers will be preserved.

 The existing GCC code will currently incorrectly assume preservation
 of all of the SVE registers.

 This patch introduces a CLOBBER_HIGH expression. This behaves a bit
>>> like
 a CLOBBER expression. CLOBBER_HIGH can only refer to a single
>>> register.
 The mode of the expression indicates the size of the lower bits which
 will be preserved. If the register contains a value bigger than this
 mode then the code will treat the register as clobbered.

 This means that in order to evaluate if a clobber high is relevant, we need
 to ensure the mode of the existing value in a register is tracked.

 The following patches in this series add support for the
>>> CLOBBER_HIGH,
 with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
 aarch64. The testing performed on these patches is also detailed in
>>> the
 final patch.

 These patches are based on top of the linaro-dev/sve branch.

 A simpler alternative to this patch would be to assume all Neon and
>>> SVE
 registers are clobbered across TLS calls, however this would be a
 performance regression against all Aarch64 targets.
>>> So just a couple design questions.
>>>
>>> Presumably there's no reasonable way to set up GCC's view of the
>>> register file to avoid this problem?  ISTM that if the SVE register was
>>> split into two, one for the part that overlapped with the neon register
>>> and one that did not, then this could be handled via standard
>>> mechanisms?
>>>
> 
> Yes, that was an early alternative option for the patch.
> 
> With that it would affect every operation that uses SVE registers. A simple
> add of two registers now has four inputs and two outputs. It would get in the
> way when debugging any sve dumps and be generally annoying.
> Possibly the code for that would all be in the aarch64 target
> (making everyone else happy!), but I suspect that there would still be
> strange dependency issues that’d need sorting in the common code.
> 
> Whereas with this patch, there are no new oddities in non-tls compiles/dumps.
> Although the patch touches a lot of files, the changes are mostly restricted
> to places where standard clobbers were already being checked.
I'm not entirely sure that it would require doubling the number of
inputs/outputs.  It's not conceptually much different than how we
describe DImode operations on 32bit targets.  The mode selects one or
more consecutive registers, so you don't actually need anything weird in
your patterns.  This is pretty standard stuff.
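
(For comparison, something like

  (set (reg:DI 0) (plus:DI (reg:DI 0) (reg:DI 2)))

on a 32-bit target already operates on the hard register pairs 0/1 and 2/3
purely through the DI mode, with nothing extra in the pattern.)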


It would be painful in that the Neon regs would have to interleave with
the upper part of the SVE regs in terms of register numbers.  It would
also mean that you couldn't tie together multiple neon regs into
something wider.  I'm not sure if the latter would be an issue or not.

You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
totally forgotten about it.  And in fact it seems to come pretty close
to what you need...

> 
> 
>>> Alternately would it be easier to clobber a subreg representing the
>>> high
>>> part of the register?  Hmm, probably not.
>>
>> I thought of a set of the preserved part to itself that leaves the upper 
>> part undefined. Not sure if we have such a thing or if it would work in all 
>> places that a clobber does.
> 
> I’ve not seen such a thing in the code. But it would need specific handling in
> the all the existing clobber code.
It could probably be done with a set of the low part via a subreg or
somesuch (rather than trying to clobber the upper part which was my
initial flawed idea).
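
Illustratively, that low-part self-set would be something like

  (set (subreg:TI (reg:VNx16QI 90) 0)
       (subreg:TI (reg:VNx16QI 90) 0))

with the bits outside TImode left undefined (the mode names and pseudo
number here are only for illustration).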

jeff


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-16 Thread Alan Hayward

> On 16 Nov 2017, at 18:24, Richard Biener  wrote:
> 
> On November 16, 2017 7:05:30 PM GMT+01:00, Jeff Law  wrote:
>> On 11/16/2017 05:34 AM, Alan Hayward wrote:
>>> This is a set of patches aimed at supporting aarch64 SVE register
>>> preservation around TLS calls.
>>> 
>>> Across a TLS call, Aarch64 SVE does not explicitly preserve the
>>> SVE vector registers. However, the Neon vector registers are
>> preserved.
>>> Due to overlapping of registers, this means the lower 128 bits of all
>>> SVE vector registers will be preserved.
>>> 
>>> The existing GCC code will currently incorrectly assume preservation
>>> of all of the SVE registers.
>>> 
>>> This patch introduces a CLOBBER_HIGH expression. This behaves a bit
>> like
>>> a CLOBBER expression. CLOBBER_HIGH can only refer to a single
>> register.
>>> The mode of the expression indicates the size of the lower bits which
>>> will be preserved. If the register contains a value bigger than this
>>> mode then the code will treat the register as clobbered.
>>> 
>>> This means that in order to evaluate if a clobber high is relevant, we need
>>> to ensure the mode of the existing value in a register is tracked.
>>> 
>>> The following patches in this series add support for the
>> CLOBBER_HIGH,
>>> with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
>>> aarch64. The testing performed on these patches is also detailed in
>> the
>>> final patch.
>>> 
>>> These patches are based on top of the linaro-dev/sve branch.
>>> 
>>> A simpler alternative to this patch would be to assume all Neon and
>> SVE
>>> registers are clobbered across TLS calls, however this would be a
>>> performance regression against all Aarch64 targets.
>> So just a couple design questions.
>> 
>> Presumably there's no reasonable way to set up GCC's view of the
>> register file to avoid this problem?  ISTM that if the SVE register was
>> split into two, one for the part that overlapped with the neon register
>> and one that did not, then this could be handled via standard
>> mechanisms?
>> 

Yes, that was an early alternative option for the patch.

With that it would affect every operation that uses SVE registers. A simple
add of two registers now has four inputs and two outputs. It would get in the
way when debugging any sve dumps and be generally annoying.
Possibly the code for that would all be in the aarch64 target
(making everyone else happy!), but I suspect that there would still be
strange dependency issues that’d need sorting in the common code.

Whereas with this patch, there are no new oddities in non-tls compiles/dumps.
Although the patch touches a lot of files, the changes are mostly restricted
to places where standard clobbers were already being checked.


>> Alternately would it be easier to clobber a subreg representing the
>> high
>> part of the register?  Hmm, probably not.
> 
> I thought of a set of the preserved part to itself that leaves the upper part 
> undefined. Not sure if we have such a thing or if it would work in all places 
> that a clobber does.

I’ve not seen such a thing in the code. But it would need specific handling in
the all the existing clobber code.


Alan.




Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-16 Thread Richard Biener
On November 16, 2017 7:05:30 PM GMT+01:00, Jeff Law  wrote:
>On 11/16/2017 05:34 AM, Alan Hayward wrote:
>> This is a set of patches aimed at supporting aarch64 SVE register
>> preservation around TLS calls.
>> 
>> Across a TLS call, Aarch64 SVE does not explicitly preserve the
>> SVE vector registers. However, the Neon vector registers are
>preserved.
>> Due to overlapping of registers, this means the lower 128 bits of all
>> SVE vector registers will be preserved.
>> 
>> The existing GCC code will currently incorrectly assume preservation
>> of all of the SVE registers.
>> 
>> This patch introduces a CLOBBER_HIGH expression. This behaves a bit
>like
>> a CLOBBER expression. CLOBBER_HIGH can only refer to a single
>register.
>> The mode of the expression indicates the size of the lower bits which
>> will be preserved. If the register contains a value bigger than this
>> mode then the code will treat the register as clobbered.
>> 
>> This means that in order to evaluate if a clobber high is relevant, we need
>> to ensure the mode of the existing value in a register is tracked.
>> 
>> The following patches in this series add support for the
>CLOBBER_HIGH,
>> with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
>> aarch64. The testing performed on these patches is also detailed in
>the
>> final patch.
>> 
>> These patches are based on top of the linaro-dev/sve branch.
>> 
>> A simpler alternative to this patch would be to assume all Neon and
>SVE
>> registers are clobbered across TLS calls, however this would be a
>> performance regression against all Aarch64 targets.
>So just a couple design questions.
>
>Presumably there's no reasonable way to set up GCC's view of the
>register file to avoid this problem?  ISTM that if the SVE register was
>split into two, one for the part that overlapped with the neon register
>and one that did not, then this could be handled via standard
>mechanisms?
>
>Alternately would it be easier to clobber a subreg representing the
>high
>part of the register?  Hmm, probably not.

I thought of a set of the preserved part to itself that leaves the upper part 
undefined. Not sure if we have such a thing or if it would work in all places 
that a clobber does.

Richard. 

>Jeff



Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-16 Thread Jeff Law
On 11/16/2017 05:34 AM, Alan Hayward wrote:
> This is a set of patches aimed at supporting aarch64 SVE register
> preservation around TLS calls.
> 
> Across a TLS call, Aarch64 SVE does not explicitly preserve the
> SVE vector registers. However, the Neon vector registers are preserved.
> Due to overlapping of registers, this means the lower 128 bits of all
> SVE vector registers will be preserved.
> 
> The existing GCC code will currently incorrectly assume preservation
> of all of the SVE registers.
> 
> This patch introduces a CLOBBER_HIGH expression. This behaves a bit like
> a CLOBBER expression. CLOBBER_HIGH can only refer to a single register.
> The mode of the expression indicates the size of the lower bits which
> will be preserved. If the register contains a value bigger than this
> mode then the code will treat the register as clobbered.
> 
> This means that in order to evaluate if a clobber high is relevant, we need
> to ensure the mode of the existing value in a register is tracked.
> 
> The following patches in this series add support for the CLOBBER_HIGH,
> with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
> aarch64. The testing performed on these patches is also detailed in the
> final patch.
> 
> These patches are based on top of the linaro-dev/sve branch.
> 
> A simpler alternative to this patch would be to assume all Neon and SVE
> registers are clobbered across TLS calls, however this would be a
> performance regression against all Aarch64 targets.
So just a couple design questions.

Presumably there's no reasonable way to set up GCC's view of the
register file to avoid this problem?  ISTM that if the SVE register was
split into two, one for the part that overlapped with the neon register
and one that did not, then this could be handled via standard mechanisms?

Alternately would it be easier to clobber a subreg representing the high
part of the register?  Hmm, probably not.

Jeff


[PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-16 Thread Alan Hayward
This is a set of patches aimed at supporting aarch64 SVE register
preservation around TLS calls.

Across a TLS call, Aarch64 SVE does not explicitly preserve the
SVE vector registers. However, the Neon vector registers are preserved.
Due to overlapping of registers, this means the lower 128 bits of all
SVE vector registers will be preserved.

The existing GCC code will currently incorrectly assume preservation
of all of the SVE registers.

This patch introduces a CLOBBER_HIGH expression. This behaves a bit like
a CLOBBER expression. CLOBBER_HIGH can only refer to a single register.
The mode of the expression indicates the size of the lower bits which
will be preserved. If the register contains a value bigger than this
mode then the code will treat the register as clobbered.

This means that in order to evaluate if a clobber high is relevant, we need to
ensure the mode of the existing value in a register is tracked.
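
As an illustrative sketch of the intended use (pattern details elided), a
tls_desc call on aarch64 would carry one clobber_high per vector register,
with the TI mode marking the preserved low 128 bits:

  (parallel [(set (reg:DI R0_REGNUM) (unspec:DI [...] UNSPEC_TLSDESC))
             (clobber_high (reg:TI V0_REGNUM))
             (clobber_high (reg:TI V1_REGNUM))
             ...
             (clobber_high (reg:TI V31_REGNUM))])

A register holding a TI or narrower value across the insn is preserved; one
holding a wider SVE value is treated as clobbered.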

The following patches in this series add support for the CLOBBER_HIGH,
with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
aarch64. The testing performed on these patches is also detailed in the
final patch.

These patches are based on top of the linaro-dev/sve branch.

A simpler alternative to this patch would be to assume all Neon and SVE
registers are clobbered across TLS calls, however this would be a
performance regression against all Aarch64 targets.

Alan.


2017-11-16  Alan Hayward  

* doc/rtl.texi (clobber_high): Add.
(parallel): Add in clobber high
* rtl.c (rtl_check_failed_code3): Add function.
* rtl.def (CLOBBER_HIGH): Add expression.
* rtl.h (RTL_CHECKC3): Add macro.
(rtl_check_failed_code3): Add declaration.
(XC3EXP): Add macro.


diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 
f583940b9441b2111c8d65a00a064e89bdd2ffaf..951322258ddbb57900225bd501bd23a8a9970ead
 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -3209,6 +3209,18 @@ There is one other known use for clobbering a pseudo 
register in a
 clobbered by the insn.  In this case, using the same pseudo register in
 the clobber and elsewhere in the insn produces the expected results.

+@findex clobber_high
+@item (clobber_high @var{x})
+Represents the storing or possible storing of an unpredictable,
+undescribed value into the upper parts of @var{x}. The mode of the expression
+represents the lower parts of the register which will not be overwritten.
+@code{reg} must be a reg expression.
+
+One place this is used is when calling into functions where the registers are
+preserved, but only up to a given number of bits.  For example when using
+Aarch64 SVE, calling a TLS descriptor will cause only the lower 128 bits of
+each of the vector registers to be preserved.
+
 @findex use
 @item (use @var{x})
 Represents the use of the value of @var{x}.  It indicates that the
@@ -3262,7 +3274,8 @@ Represents several side effects performed in parallel.  
The square
 brackets stand for a vector; the operand of @code{parallel} is a
 vector of expressions.  @var{x0}, @var{x1} and so on are individual
 side effect expressions---expressions of code @code{set}, @code{call},
-@code{return}, @code{simple_return}, @code{clobber} or @code{use}.
+@code{return}, @code{simple_return}, @code{clobber}, @code{use} or
+@code{clobber_high}.

 ``In parallel'' means that first all the values used in the individual
 side-effects are computed, and second all the actual side-effects are
diff --git a/gcc/rtl.c b/gcc/rtl.c
index 
3b2728be8b506fb3c14a20297cf92368caa5ca3b..6db84f99627bb8617c6e227892ca44076f4e729b
 100644
--- a/gcc/rtl.c
+++ b/gcc/rtl.c
@@ -860,6 +860,17 @@ rtl_check_failed_code2 (const_rtx r, enum rtx_code code1, 
enum rtx_code code2,
 }

 void
+rtl_check_failed_code3 (const_rtx r, enum rtx_code code1, enum rtx_code code2,
+   enum rtx_code code3, const char *file, int line,
+   const char *func)
+{
+  internal_error
+("RTL check: expected code '%s', '%s' or '%s', have '%s' in %s, at %s:%d",
+ GET_RTX_NAME (code1), GET_RTX_NAME (code2), GET_RTX_NAME (code3),
+ GET_RTX_NAME (GET_CODE (r)), func, trim_filename (file), line);
+}
+
+void
 rtl_check_failed_code_mode (const_rtx r, enum rtx_code code, machine_mode mode,
bool not_mode, const char *file, int line,
const char *func)
diff --git a/gcc/rtl.def b/gcc/rtl.def
index 
83bcfcaadcacc45cce352bf7fba33fbbc87ccd58..a6c4d4a46c4eb4f6cb0eca66a3f6a558f94acc8a
 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -312,6 +312,16 @@ DEF_RTL_EXPR(USE, "use", "e", RTX_EXTRA)
is considered undeletable before reload.  */
 DEF_RTL_EXPR(CLOBBER, "clobber", "e", RTX_EXTRA)

+/* Indicate that the upper parts of something are clobbered in a way that we
+   don't want to explain.  The MODE references the lower bits that will be
+   preserved.  Anything above that size will be clobbered.
+
+   CLOBBER_HIGH only occurs as