subject:"rs6000 stack_tie mishap again"

Re: rs6000 stack_tie mishap again

2016-04-15 Thread Segher Boessenkool

On Fri, Apr 15, 2016 at 07:05:25PM +0200, Olivier Hainque wrote:
> > But don't you run the risk that the stack could be deallocated before the 
> > restores are done?  This came up on the PA port a long time ago. IIRC the 
> > situations was something like this:
> > 
> > We had a frame pointer and all the restores were being done via the frame 
> > pointer.  So the scheduler moved the stack pointer adjustment up past the 
> > register restores.  At which point the restores were reading from 
> > unallocated stack space.
> > 
> > 99.99% of the time it didn't cause a problem.  But if we got an interrupt 
> > in that brief window, boom, the interrupt handler could allocate a frame on 
> > the current stack, store some data in there which clobbered the saved 
> > register state.
> 
> Yes, that's exactly the kind of problem we're fighting and, indeed,
> the memory barrier alone isn't enough.
> 
> Here, the idea is to add a memory barrier to the set of things that the rs6000
> back-end already does, which tries to be better than a complete scheduling
> barrier (see rs6000_emit_stack_tie), but turns out to still fail in some 
> corner
> cases due to alias analysis intricacies (original problem description at
> the very beginning of this now long thread: 
> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01337.html)

I think the best thing to do is add the clobber-of-mem-scratch to the
final stack deallocate (as a parallel).  I don't see anything else that
will work reliably.


Segher

Re: rs6000 stack_tie mishap again

2016-04-15 Thread Olivier Hainque


> On Apr 15, 2016, at 18:42 , Jeff Law  wrote:

>>   /* (mem:BLK (scratch)) is a special mechanism to conflict with everything.
>>  This is used in epilogue deallocation functions.  */
> *That's* the one I was looking for  :-)

:-)

>> Yes, I pondered this one and thought that a memory barrier
>> would be less aggressive, while good enough.
> But don't you run the risk that the stack could be deallocated before the 
> restores are done?  This came up on the PA port a long time ago. IIRC the 
> situations was something like this:
> 
> We had a frame pointer and all the restores were being done via the frame 
> pointer.  So the scheduler moved the stack pointer adjustment up past the 
> register restores.  At which point the restores were reading from unallocated 
> stack space.
> 
> 99.99% of the time it didn't cause a problem.  But if we got an interrupt in 
> that brief window, boom, the interrupt handler could allocate a frame on the 
> current stack, store some data in there which clobbered the saved register 
> state.

Yes, that's exactly the kind of problem we're fighting and, indeed,
the memory barrier alone isn't enough.

Here, the idea is to add a memory barrier to the set of things that the rs6000
back-end already does, which tries to be better than a complete scheduling
barrier (see rs6000_emit_stack_tie), but turns out to still fail in some corner
cases due to alias analysis intricacies (original problem description at
the very beginning of this now long thread: 
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01337.html)

Olivier

Re: rs6000 stack_tie mishap again

2016-04-15 Thread Jeff Law


On 04/14/2016 11:10 AM, Olivier Hainque wrote:



On 14 Apr 2016, at 18:50, Jeff Law  wrote:

I thought we had code to handle this case specially, but I can't immediately 
find it in sched-deps.c.


Unless I misunderstood what you were exactly looking for,
the special code is in alias.c. E.g. write_dependence_p:

   /* (mem:BLK (scratch)) is a special mechanism to conflict with everything.
  This is used in epilogue deallocation functions.  */

*That's* the one I was looking for  :-)




   ...


Some ports also use an unspec_volatile to achieve the same effect:

;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
;; all of memory.  This blocks insns from being moved across this point.

(define_insn "blockage"
  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
  ""
  ""
  [(set_attr "length" "0")])


Yes, I pondered this one and thought that a memory barrier
would be less aggressive, while good enough.
But don't you run the risk that the stack could be deallocated before 
the restores are done?  This came up on the PA port a long time ago. 
IIRC the situations was something like this:


We had a frame pointer and all the restores were being done via the 
frame pointer.  So the scheduler moved the stack pointer adjustment up 
past the register restores.  At which point the restores were reading 
from unallocated stack space.


99.99% of the time it didn't cause a problem.  But if we got an 
interrupt in that brief window, boom, the interrupt handler could 
allocate a frame on the current stack, store some data in there which 
clobbered the saved register state.


Jeff

Re: rs6000 stack_tie mishap again

2016-04-15 Thread Olivier Hainque


> On Apr 15, 2016, at 00:42 , Segher Boessenkool  
> wrote:
> 
>> 
>> Something like the attached patch, at least for next stage1 until the
>> more general issue is sorted out ?
> 
> It's a bit heavy; as a workaround, we may want this clobber in the stack
> deallocation in the epilogue, but not in all other places stack_tie is
> used.

OK, I can add an epilogue_p argument or something like that.

> And for stage 1 we do not really want a workaround, we want to fix the
> actual problem ;-)

Sure. It might take some time though, and we might want
a workaround in the interim. Maintainers' call I suppose :)

Re: rs6000 stack_tie mishap again

2016-04-15 Thread Olivier Hainque


> On Apr 15, 2016, at 06:37 , Andreas Krebbel  
> wrote:
>>  /* (mem:BLK (scratch)) is a special mechanism to conflict with everything.
>> This is used in epilogue deallocation functions.  */
>>  ...
> 
> Ok thanks. I've verified that the dependencies are also generated when
> using this expression in a clobber.

Ok, thanks. This is consistent with what I've seen.

>> sequence was allowed to move past the stack pointer reset when
>> it's performed as a mere register to register move.
> 
> I've seen the same on S/390 when restoring the stack pointer
> from a floating point reg. But I think it is not limited to
> reg-reg stack pointer restores.  Potentially this could also
> happen with the stack pointer decrement in the prologue which
> could also get moved across a mem only barrier.

Indeed.

With Kind Regards,

Olivier

Re: rs6000 stack_tie mishap again

2016-04-14 Thread Andreas Krebbel

On 04/14/2016 07:10 PM, Olivier Hainque wrote:
> 
>> On 14 Apr 2016, at 18:50, Jeff Law  wrote:
>>
>> I thought we had code to handle this case specially, but I can't immediately 
>> find it in sched-deps.c.
> 
> Unless I misunderstood what you were exactly looking for,
> the special code is in alias.c. E.g. write_dependence_p:
> 
>   /* (mem:BLK (scratch)) is a special mechanism to conflict with everything.
>  This is used in epilogue deallocation functions.  */
>   ...

Ok thanks. I've verified that the dependencies are also generated when
using this expression in a clobber.

> Emitting just that alone isn't enough though. The first attempt
> I made did that and failed because the whole 
> 
>   reg-restore
>   reg-restore
>   tie (mem blockage only)
>   
> sequence was allowed to move past the stack pointer reset when
> it's performed as a mere register to register move.
> 
> So I ended up adding the clobber mem:blk scratch to the existing
> tie parallel, which references the stack pointer register as
> well so is forbidden to move past it's update.

I've seen the same on S/390 when restoring the stack pointer
from a floating point reg. But I think it is not limited to
reg-reg stack pointer restores.  Potentially this could also
happen with the stack pointer decrement in the prologue which
could also get moved across a mem only barrier.

Bye,

-Andreas-

Re: rs6000 stack_tie mishap again

2016-04-14 Thread Segher Boessenkool

On Mon, Apr 11, 2016 at 12:15:10PM +0200, Olivier Hainque wrote:
> 
> > On Mar 28, 2016, at 19:41 , Richard Henderson  wrote:
> > 
> > But I expect for stage4, the best solution is to strengthen the stack_tie 
> > pattern to block all memory.  Early scheduling of the stack frame 
> > deallocation (a simple logic insn) can't really be that important to 
> > performance.
> 
> Something like the attached patch, at least for next stage1 until the
> more general issue is sorted out ?

It's a bit heavy; as a workaround, we may want this clobber in the stack
deallocation in the epilogue, but not in all other places stack_tie is
used.

And for stage 1 we do not really want a workaround, we want to fix the
actual problem ;-)


Segher

Re: rs6000 stack_tie mishap again

2016-04-14 Thread Olivier Hainque

> On 14 Apr 2016, at 18:50, Jeff Law  wrote:
> 
> I thought we had code to handle this case specially, but I can't immediately 
> find it in sched-deps.c.

Unless I misunderstood what you were exactly looking for,
the special code is in alias.c. E.g. write_dependence_p:

  /* (mem:BLK (scratch)) is a special mechanism to conflict with everything.
 This is used in epilogue deallocation functions.  */
  ...

> Some ports also use an unspec_volatile to achieve the same effect:
> 
> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
> ;; all of memory.  This blocks insns from being moved across this point.
> 
> (define_insn "blockage"
>  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
>  ""
>  ""
>  [(set_attr "length" "0")])

Yes, I pondered this one and thought that a memory barrier
would be less aggressive, while good enough.

> ;; Do not schedule instructions accessing memory across this point.
> 
> (define_expand "memory_blockage"
>  [(set (match_dup 0)
>(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))]
>  ""
> {
>  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
>  MEM_VOLATILE_P (operands[0]) = 1;
> })
> 
> (define_insn "*memory_blockage"
>  [(set (match_operand:BLK 0)
>(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))]
>  ""
>  ""
>  [(set_attr "length" "0")])

Emitting just that alone isn't enough though. The first attempt
I made did that and failed because the whole 

  reg-restore
  reg-restore
  tie (mem blockage only)

sequence was allowed to move past the stack pointer reset when
it's performed as a mere register to register move.

So I ended up adding the clobber mem:blk scratch to the existing
tie parallel, which references the stack pointer register as
well so is forbidden to move past it's update.

Olivier

Re: rs6000 stack_tie mishap again

2016-04-14 Thread Jeff Law


On 04/14/2016 09:47 AM, Andreas Krebbel wrote:

On 04/11/2016 12:15 PM, Olivier Hainque wrote:



On Mar 28, 2016, at 19:41 , Richard Henderson  wrote:

But I expect for stage4, the best solution is to strengthen the stack_tie 
pattern to block all memory.  Early scheduling of the stack frame deallocation 
(a simple logic insn) can't really be that important to performance.


Something like the attached patch, at least for next stage1 until the
more general issue is sorted out ?

Only basic testing at this point. I can do more if considered appropriate.

Thanks for your feedback,

With Kind Regards,

Olivier

* config/rs6000/rs6000.c (rs6000_emit_stack_tie): Add a
 (clobber mem:BLK (scratch)) to the set of effects, blocking
all memory accesses.


We have the same problem on S/390 and I'm also looking for a way to solve that. 
I'm not sure about
the (clobber (mem)) approach. Wouldn't it be theoretically possible to move a 
memory write over a
(clobber (mem))?
I thought we had code to handle this case specially, but I can't 
immediately find it in sched-deps.c.


Some ports also use an unspec_volatile to achieve the same effect:

;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
;; all of memory.  This blocks insns from being moved across this point.

(define_insn "blockage"
  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
  ""
  ""
  [(set_attr "length" "0")])

;; Do not schedule instructions accessing memory across this point.

(define_expand "memory_blockage"
  [(set (match_dup 0)
(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))]
  ""
{
  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
  MEM_VOLATILE_P (operands[0]) = 1;
})

(define_insn "*memory_blockage"
  [(set (match_operand:BLK 0)
(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))]
  ""
  ""
  [(set_attr "length" "0")])


Jeff

Re: rs6000 stack_tie mishap again

2016-04-14 Thread Andreas Krebbel

On 04/11/2016 12:15 PM, Olivier Hainque wrote:
> 
>> On Mar 28, 2016, at 19:41 , Richard Henderson  wrote:
>>
>> But I expect for stage4, the best solution is to strengthen the stack_tie 
>> pattern to block all memory.  Early scheduling of the stack frame 
>> deallocation (a simple logic insn) can't really be that important to 
>> performance.
> 
> Something like the attached patch, at least for next stage1 until the
> more general issue is sorted out ?
> 
> Only basic testing at this point. I can do more if considered appropriate.
> 
> Thanks for your feedback,
> 
> With Kind Regards,
> 
> Olivier
> 
>   * config/rs6000/rs6000.c (rs6000_emit_stack_tie): Add a
> (clobber mem:BLK (scratch)) to the set of effects, blocking
>   all memory accesses.

We have the same problem on S/390 and I'm also looking for a way to solve that. 
I'm not sure about
the (clobber (mem)) approach. Wouldn't it be theoretically possible to move a 
memory write over a
(clobber (mem))?

The patch I'm currently testing follows a proposal from Joseph Myers in 2011:
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00977.html

+ [(set (mem:BLK (scratch))
+ (unspec:BLK [(match_operand:BLK 0 "memory_operand" "+m")] UNSPEC_TIE))]

What I'm wondering is whether the memory read is actually needed here?! 
Wouldn't the following have
the same effect?

[(set (mem:BLK (scratch)) (unspec:BLK [(const_int 0)] UNSPEC_TIE))]

To my understanding this should block memory reads and writes from being moved 
across.
The MEM probably should be set up to be in the frame alias set?

Bye,

-Andreas-

Re: rs6000 stack_tie mishap again

2016-04-11 Thread Olivier Hainque

> On Mar 28, 2016, at 19:41 , Richard Henderson  wrote:
> 
> But I expect for stage4, the best solution is to strengthen the stack_tie 
> pattern to block all memory.  Early scheduling of the stack frame 
> deallocation (a simple logic insn) can't really be that important to 
> performance.

Something like the attached patch, at least for next stage1 until the
more general issue is sorted out ?

Only basic testing at this point. I can do more if considered appropriate.

Thanks for your feedback,

With Kind Regards,

Olivier

* config/rs6000/rs6000.c (rs6000_emit_stack_tie): Add a
(clobber mem:BLK (scratch)) to the set of effects, blocking
all memory accesses.

stronger_rs6000_tie.diff
Description: Binary data

Re: rs6000 stack_tie mishap again

2016-04-08 Thread Olivier Hainque

> On Apr 8, 2016, at 17:37 , Segher Boessenkool  
> wrote:
> 
> On Fri, Apr 08, 2016 at 10:24:50AM +0200, Olivier Hainque wrote:
>>> But I expect for stage4, the best solution is to strengthen the stack_tie 
>>> pattern to block all memory.  Early scheduling of the stack frame 
>>> deallocation (a simple logic insn) can't really be that important to 
>>> performance.
>> 
>> My feeling as well. At least, it can't be important enough to warrant
>> a sustained exposure to the kind of bug we're discussing here.
> 
> Is it a regression?  Changing this in stage 4, and this late in stage 4,
> is super invasive.  Wrt performance, well, I'd like to see numbers :-/

Sorry, my comment was ambiguous: "My feeling as well" etc was in reaction
to Richard's second sentence. I didn't mean to comment on what we want to
do or not for stage4.

Regarding perfs, I agree that having numbers would be good.

Nevertheless, I'd rather see some perf drop than just hoping for this not
to show up, in any case, and I remain convinced that whatever we gain seems
unlikely to be worth the risk of hitting this bug.

Olivier

Re: rs6000 stack_tie mishap again

2016-04-08 Thread Segher Boessenkool

On Fri, Apr 08, 2016 at 10:24:50AM +0200, Olivier Hainque wrote:
> > But I expect for stage4, the best solution is to strengthen the stack_tie 
> > pattern to block all memory.  Early scheduling of the stack frame 
> > deallocation (a simple logic insn) can't really be that important to 
> > performance.
> 
> My feeling as well. At least, it can't be important enough to warrant
> a sustained exposure to the kind of bug we're discussing here.

Is it a regression?  Changing this in stage 4, and this late in stage 4,
is super invasive.  Wrt performance, well, I'd like to see numbers :-/


Segher

Re: rs6000 stack_tie mishap again

2016-04-08 Thread Olivier Hainque

Hello Richard & Alan,

Thanks for your feedback. Back on it after a few extra
experiments.

> On Mar 28, 2016, at 19:41 , Richard Henderson  wrote:
> 
>> Let's see what rth thinks.  He did say the patch might need to be
>> redone.  :)
>> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html
> 
> I be surprised if this is works as expected without side-effects.  You've now 
> exposed the restore of the frame pointer to alias analysis, and it's probably 
> not seen as constant anymore.  As you reference, I expect that any patch that 
> opens the epilogue to such scrutiny is going to have to special-case the 
> frame pointer as well.

> That said, as Segher points out later in the thread, one can arrange for hard 
> regs within the body to bleed into temporaries used within the epilogue, 
> which is bad.  So perhaps this is exactly what's needed longer-term.  More 
> investigation is required.

I'd be happy to help, as much as I can.

Can you please expand a bit on the kind of side-effects you'd expect
and hint at useful directions you believe the investigation should
take ? I have some ideas and would like to make sure they're in
line with what you think would be most relevant :)

> But I expect for stage4, the best solution is to strengthen the stack_tie 
> pattern to block all memory.  Early scheduling of the stack frame 
> deallocation (a simple logic insn) can't really be that important to 
> performance.

My feeling as well. At least, it can't be important enough to warrant
a sustained exposure to the kind of bug we're discussing here.

Olivier

Re: rs6000 stack_tie mishap again

2016-03-30 Thread Alan Modra

On Wed, Mar 30, 2016 at 11:02:41AM +0200, Olivier Hainque wrote:
> void g(int, char *);
> 
> void f(int x)
> {
>char big[20];
>  start:
>g(x, big);
>g(x, big);
>register void *p asm("r11") = &
>asm("" : : "r"(p));
>asm("" : : :"r28");
>asm("" : : :"r29");
>asm("" : : :"r30");
> }
> 
> I'm getting:
> 
> lis 11,.L2@ha
> la 11,.L2@l(11)
> 
> lwz 11,0(1)
> lwz 0,4(11)
> lwz 28,-16(11) 
> 
>   mr 1,11
> 
>   mtlr 0
>   lwz 29,-12(11)
>   lwz 30,-8(11)
>   lwz 31,-4(11)
> 
>   blr

BTW, the exact sequence you get depends on -mcpu (not surprising), but
yes, I see register restores after the "mr 1,11" too.

-- 
Alan Modra
Australia Development Lab, IBM

Re: rs6000 stack_tie mishap again

2016-03-30 Thread Olivier Hainque

Hello Segher,

> On Mar 28, 2016, at 13:18 , Segher Boessenkool  
> wrote:
> 
>> You need to have had r11 last used to designate a global
>> symbol as part of the function body (in order to have base_term
>> designate a symbol_ref etc), and then have the scheduler
>> decide that moving across is a good idea. It's certainly not
>> an easy combination to trigger.
> 
> Yes, I did that (with some asm's).  Like this:
> 
> ===
> void g(int, char *);
> 
> int dum;
> 
> void f(int x)
> {
>char big[20];
>g(x, big);
>g(x, big);
>register void *p asm("r11") = 
>asm("" : : "r"(p));
> }

Ah, I see, thanks. In this instance, the problem doesn't
trigger because CONSTANT_POOL_ADDRESS_P (base) is false in

  base = find_base_term (true_mem_addr);
  if (! writep
  && base
  && (GET_CODE (base) == LABEL_REF
  || (GET_CODE (base) == SYMBOL_REF
  && CONSTANT_POOL_ADDRESS_P (base
return 0;

  (part of write_dependence_p)

With a minor variation:

void g(int, char *);

void f(int x)
{
   char big[20];
 start:
   g(x, big);
   g(x, big);
   register void *p asm("r11") = &
   asm("" : : "r"(p));
   asm("" : : :"r28");
   asm("" : : :"r29");
   asm("" : : :"r30");
}

I'm getting:

lis 11,.L2@ha
la 11,.L2@l(11)

lwz 11,0(1)
lwz 0,4(11)
lwz 28,-16(11) 

mr 1,11

mtlr 0
lwz 29,-12(11)
lwz 30,-8(11)
lwz 31,-4(11)

blr

out of a powerpc-elf close-to-tunk compiler, despite the
presence of a stack_tie insn at the rtl level.

Olivier

Re: rs6000 stack_tie mishap again

2016-03-28 Thread Richard Henderson


On 03/23/2016 09:10 PM, Alan Modra wrote:

On Wed, Mar 23, 2016 at 05:04:39PM +0100, Olivier Hainque wrote:

The reason why 894 is not accounted in the base ref computation is because it
is part of the epilogue sequence, and init_alias_analysis has:

   /* Walk the insns adding values to the new_reg_base_value array.  */
   for (i = 0; i < rpo_cnt; i++)
{ ...
  if (could_be_prologue_epilogue
  && prologue_epilogue_contains (insn))
continue;

The motivation for this is unclear to me.


https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html


My rough understanding is that we probably really care about frame_related
insns only here, at least on targets where the flag is supposed to be set
accurately.


Also, possibly just prologue insns.  So you might be able to modify
init_alias_analysis just to ignore the prologue and skip any need for
yet another hook.

Let's see what rth thinks.  He did say the patch might need to be
redone.  :)
https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html


I be surprised if this is works as expected without side-effects.  You've now 
exposed the restore of the frame pointer to alias analysis, and it's probably 
not seen as constant anymore.  As you reference, I expect that any patch that 
opens the epilogue to such scrutiny is going to have to special-case the frame 
pointer as well.


That said, as Segher points out later in the thread, one can arrange for hard 
regs within the body to bleed into temporaries used within the epilogue, which 
is bad.  So perhaps this is exactly what's needed longer-term.  More 
investigation is required.


But I expect for stage4, the best solution is to strengthen the stack_tie 
pattern to block all memory.  Early scheduling of the stack frame deallocation 
(a simple logic insn) can't really be that important to performance.



r~

Re: rs6000 stack_tie mishap again

2016-03-28 Thread Segher Boessenkool

On Mon, Mar 28, 2016 at 12:45:03PM +0200, Olivier Hainque wrote:
> > The normal -m32 compiler here generates code like
> > 
> > lwz 11,0(1)
> > 
> > and try as I might I cannot get it to fail.  Maybe because the GPR11
> > setup here involves a load?
> 
> You need to have had r11 last used to designate a global
> symbol as part of the function body (in order to have base_term
> designate a symbol_ref etc), and then have the scheduler
> decide that moving across is a good idea. It's certainly not
> an easy combination to trigger.

Yes, I did that (with some asm's).  Like this:

===
void g(int, char *);

int dum;

void f(int x)
{
char big[20];
g(x, big);
g(x, big);
register void *p asm("r11") = 
asm("" : : "r"(p));
}
===

and the end of that becomes

===
bl g # 11   *call_nonlocal_sysvsi/1 [length = 4]
lis 11,dum@ha# 12   elf_high[length = 4]
la 11,dum@l(11)  # 13   elf_low [length = 4]
lwz 11,0(1)  # 33   *movsi_internal1/3  [length = 4]
lwz 0,4(11)  # 34   *movsi_internal1/3  [length = 4]
lwz 31,-4(11)# 36   *movsi_internal1/3  [length = 4]
mr 1,11  # 38   *movsi_internal1/1  [length = 4]
mtlr 0   # 35   *movsi_internal1/9  [length = 4]
blr  # 39   *return_internal_si [length = 4]
===

> > It seems clear that just considering the
> > prologue is enough to fix the original problem (frame pointer setup),
> > and problems like yours cannot happen in the prologue.
> 
> Right. What is unclear is if it's correct to consider only
> prologues here. ISTM we arrange to produce CFI for epilogues
> as well today, at least in some circumstances, and maybe the
> issue Richard had with prologues at the time would need to
> be addressed for epilogues as well today.

Correct is to do alias analysis in both the prologue and epilogue.
If we don't, ports have to do all kinds of stack ties etc. to fake it.

Currently we do neither, so doing just one is a step in the right
direction.

> > Better would be not to have this hack at all.
> 
> Sure.
> 
> >> My rough understanding is that we probably really care about frame_related
> >> insns only here, at least on targets where the flag is supposed to be set
> >> accurately.
> > 
> > On targets with DWARF2 unwind info the flag should be set on those insns
> > that need unwind info.  This does not include all insns in the epilogue
> > that access the frame, so I don't think this will help you?
> 
> My idea was that, maybe, the insns we need to see for alias analysis
> are precisely those for which the bit should not be set, which happened
> to be exactly the case in the problematic situation we hit. But as I said,
> I'm not 100% convinced the reasoning is globally correct.

All the register restores do not have the flag set, in most cases.
But they can in others (say, a target that does not optimise the CFI
stuff very well).

> >> This is the basis of the proposed patch, which essentially disconnects the
> >> shortcut above for !frame_related insns on targets which need precise alias
> >> analysis within epilogues, as indicated by a new target hook.
> > 
> > Eww.  Isn't that really all targets that schedule the epilogue at all?
> 
> Yes. Most of them use stronger barriers which the dependency analyser
> knows about, so aren't affected by this. 
> 
> That's a possible alternative approach for rs6000.

A clobber of scratch should work yes.  It's really heavy handed though,
we can move the GPR1 restore quite freely otherwise (in shrink-wrap),
but perhaps allowing scheduling to move it doesn't buy us much at all.

This doesn't solve the problem however that other dependencies in the
epilogue can be messed up in similar ways.


Segher

Re: rs6000 stack_tie mishap again

2016-03-28 Thread Olivier Hainque

> On Mar 28, 2016, at 05:01 , Segher Boessenkool  
> wrote:
> 
> The normal -m32 compiler here generates code like
> 
>   lwz 11,0(1)
> 
> and try as I might I cannot get it to fail.  Maybe because the GPR11
> setup here involves a load?

You need to have had r11 last used to designate a global
symbol as part of the function body (in order to have base_term
designate a symbol_ref etc), and then have the scheduler
decide that moving across is a good idea. It's certainly not
an easy combination to trigger.

>> We have observed this with a gcc 4.7 back-end and weren't able to reproduce
>> with a more recent version.
> 
> This makes it not a regression and thus out of scope for GCC 6.  We're
> supposed to stabilise things now ;-)

Sure, no problem if this is only for gcc 7. I posted the message
while the details were still fresh in my mind.

>>if (could_be_prologue_epilogue
>>&& prologue_epilogue_contains (insn))
>>  continue;
>> 
>> The motivation for this is unclear to me.
> 
> Alan linked to the history.

Right

>  It seems clear that just considering the
> prologue is enough to fix the original problem (frame pointer setup),
> and problems like yours cannot happen in the prologue.

Right. What is unclear is if it's correct to consider only
prologues here. ISTM we arrange to produce CFI for epilogues
as well today, at least in some circumstances, and maybe the
issue Richard had with prologues at the time would need to
be addressed for epilogues as well today.

> Better would be not to have this hack at all.

Sure.

>> My rough understanding is that we probably really care about frame_related
>> insns only here, at least on targets where the flag is supposed to be set
>> accurately.
> 
> On targets with DWARF2 unwind info the flag should be set on those insns
> that need unwind info.  This does not include all insns in the epilogue
> that access the frame, so I don't think this will help you?

My idea was that, maybe, the insns we need to see for alias analysis
are precisely those for which the bit should not be set, which happened
to be exactly the case in the problematic situation we hit. But as I said,
I'm not 100% convinced the reasoning is globally correct.

>> This is the basis of the proposed patch, which essentially disconnects the
>> shortcut above for !frame_related insns on targets which need precise alias
>> analysis within epilogues, as indicated by a new target hook.
> 
> Eww.  Isn't that really all targets that schedule the epilogue at all?

Yes. Most of them use stronger barriers which the dependency analyser
knows about, so aren't affected by this. 

That's a possible alternative approach for rs6000.

Thanks for your feedback,

Olivier

Re: rs6000 stack_tie mishap again

2016-03-27 Thread Segher Boessenkool

On Wed, Mar 23, 2016 at 05:04:39PM +0100, Olivier Hainque wrote:
> The visible effect is a powerpc-eabispe compiler (no red-zone) producing an
> epilogue sequence like
> 
>addi %r11,%r1,184# temp pointer within frame

The normal -m32 compiler here generates code like

lwz 11,0(1)

and try as I might I cannot get it to fail.  Maybe because the GPR11
setup here involves a load?

>addi %r1,%r11,104# release frame
> 
>evldd %r21,16(%r11)  # restore registers
>...  # ...
>evldd %r31,96(%r11)  # ...
> 
>blr  # return

> We have observed this with a gcc 4.7 back-end and weren't able to reproduce
> with a more recent version.

This makes it not a regression and thus out of scope for GCC 6.  We're
supposed to stabilise things now ;-)

>   if (! writep)
> {
>   base = find_base_term (mem_addr);
>   if (base && (GET_CODE (base) == LABEL_REF
>  || (GET_CODE (base) == SYMBOL_REF
>  && CONSTANT_POOL_ADDRESS_P (base
>   return 0;
> }
> 
> 
> with
> 
> (gdb) pr mem_addr
> (plus:SI (reg:SI 11 11)
> (const_int 96 [0x60]))
>  
> and
>  
> (gdb) pr base
> (symbol_ref/u:SI ("*.LC0") [flags 0x82])
>  
> coming from insn 710, despite 894 in between. Ug.

Yeah that is just Wrong.

> The reason why 894 is not accounted in the base ref computation is because it
> is part of the epilogue sequence, and init_alias_analysis has:
> 
>   /* Walk the insns adding values to the new_reg_base_value array.  */
>   for (i = 0; i < rpo_cnt; i++)
>   { ...
> if (could_be_prologue_epilogue
> && prologue_epilogue_contains (insn))
>   continue;
> 
> The motivation for this is unclear to me.

Alan linked to the history.  It seems clear that just considering the
prologue is enough to fix the original problem (frame pointer setup),
and problems like yours cannot happen in the prologue.

Better would be not to have this hack at all.

> My rough understanding is that we probably really care about frame_related
> insns only here, at least on targets where the flag is supposed to be set
> accurately.

On targets with DWARF2 unwind info the flag should be set on those insns
that need unwind info.  This does not include all insns in the epilogue
that access the frame, so I don't think this will help you?

> This is the basis of the proposed patch, which essentially disconnects the
> shortcut above for !frame_related insns on targets which need precise alias
> analysis within epilogues, as indicated by a new target hook.

Eww.  Isn't that really all targets that schedule the epilogue at all?

> On the key insn at hand, the frame_related bit was cleared on purpose,
> per:
> 
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html
> 
>   "1) Marking an instruction setting up r11 for use by _save64gpr_* as
>frame-related results in r11 being set as the cfa for the rest of the
>function.  That's bad when r11 gets used for other things later.
>   "

And that is correct.

> So, aside from the dependency issue which needs to be fixed somehow, I
> think it would make sense to consider using a strong blockage mecanism in
> expand_epilogue.

It would be very nice if we could directly express "the set of GPR1 should
stay behind any frame accesses", yeah.


Segher

Re: rs6000 stack_tie mishap again

2016-03-24 Thread Jeff Law


On 03/24/2016 02:17 AM, Olivier Hainque wrote:

[snip]

So, aside from the dependency issue which needs to be fixed somehow, I
think it would make sense to consider using a strong blockage mecanism in
expand_epilogue.


That's what we both said here
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html

and David agreed too
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html

but if you can have the alias analysis changes accepted that would be
even better.


I'd really like to come to a resolution we're confident is robust,
because these are really very nasty bugs.
The robust solution is to have a scheduling barrier just before the 
point where the stack is deallocated.


The alternative some folks have suggested would be for the generic parts 
of the compiler to add the scheduling barrier before the stack pointer 
adjustment.  I wouldn't object to that.



Jeff

Re: rs6000 stack_tie mishap again

2016-03-24 Thread Olivier Hainque

Hello Alan,

> On 24 Mar 2016, at 05:10, Alan Modra  wrote:
> 
>>if (could_be_prologue_epilogue
>>&& prologue_epilogue_contains (insn))
>>  continue;

> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html

Ah, interesting, thanks!

>> My rough understanding is that we probably really care about frame_related
>> insns only here, at least on targets where the flag is supposed to be set
>> accurately.
> 
> Also, possibly just prologue insns.  So you might be able to modify
> init_alias_analysis just to ignore the prologue and skip any need for
> yet another hook.

Which would be good.

> Let's see what rth thinks.

Definitely.

> He did say the patch might need to be redone.  :)
> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html

And here we have a case :)

> [snip]
>> So, aside from the dependency issue which needs to be fixed somehow, I
>> think it would make sense to consider using a strong blockage mecanism in
>> expand_epilogue.
> 
> That's what we both said here
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html
> 
> and David agreed too
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html
> 
> but if you can have the alias analysis changes accepted that would be
> even better.

I'd really like to come to a resolution we're confident is robust,
because these are really very nasty bugs.

To tell the truth, my current feeling is that relying on the frame_related
bit still seems fragile (*) so I'd be happier with something stronger.

(*) First, it's not easy to be positive that all the insns we'd need to
catch are not frame_related, even if only looking at the current rs6000
epilogue expander. Second, it's very easy to consider flipping one such
bit for whatever reason and not think about this kind of implications.

Thanks for your feedback on this!

Olivier

Re: rs6000 stack_tie mishap again

2016-03-24 Thread Olivier Hainque

> On 24 Mar 2016, at 05:58, Alan Modra  wrote:
> 
> On Wed, Mar 23, 2016 at 01:38:26PM -0400, David Edelsohn wrote:
>> The description and
>> references to prior SPE prologue and epilogue changes do not confirm a
>> wider problem.
> 
> There's a good chance this affects ABI_V4 large stack frames too.
> If restoring regs inline we'll be using r11 as a base, just like SPE
> does with moderate sized stack frames when restoring 64-bit regs.

Exactly. If I'm not mistaken, the set of problematic cases encompasses
everything which uses in the epilogue, as a base, a register which
might have been used last to designate a global object in the function
body. There are such uses of at least r11 not limited to SPE.

Olivier

Re: rs6000 stack_tie mishap again

2016-03-23 Thread Alan Modra

On Wed, Mar 23, 2016 at 01:38:26PM -0400, David Edelsohn wrote:
> The description and
> references to prior SPE prologue and epilogue changes do not confirm a
> wider problem.

There's a good chance this affects ABI_V4 large stack frames too.
If restoring regs inline we'll be using r11 as a base, just like SPE
does with moderate sized stack frames when restoring 64-bit regs.

-- 
Alan Modra
Australia Development Lab, IBM

Re: rs6000 stack_tie mishap again

2016-03-23 Thread Alan Modra

On Wed, Mar 23, 2016 at 05:04:39PM +0100, Olivier Hainque wrote:
> The reason why 894 is not accounted in the base ref computation is because it
> is part of the epilogue sequence, and init_alias_analysis has:
> 
>   /* Walk the insns adding values to the new_reg_base_value array.  */
>   for (i = 0; i < rpo_cnt; i++)
>   { ...
> if (could_be_prologue_epilogue
> && prologue_epilogue_contains (insn))
>   continue;
> 
> The motivation for this is unclear to me.

https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html

> My rough understanding is that we probably really care about frame_related
> insns only here, at least on targets where the flag is supposed to be set
> accurately.

Also, possibly just prologue insns.  So you might be able to modify
init_alias_analysis just to ignore the prologue and skip any need for
yet another hook.

Let's see what rth thinks.  He did say the patch might need to be
redone.  :)
https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html

[snip]
> So, aside from the dependency issue which needs to be fixed somehow, I
> think it would make sense to consider using a strong blockage mecanism in
> expand_epilogue.

That's what we both said here
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html

and David agreed too
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html

but if you can have the alias analysis changes accepted that would be
even better.

-- 
Alan Modra
Australia Development Lab, IBM

Re: rs6000 stack_tie mishap again

2016-03-23 Thread David Edelsohn

First, SPE has not been maintained and little participation from
Freescale.  I would rather deprecate all SPE support.  SPE ABI is
broken by design.

I find the approach very heavy-handed.  If you want to enable the
target hook for SPE *only*, that's fine with me.  The description and
references to prior SPE prologue and epilogue changes do not confirm a
wider problem.

- David

rs6000 stack_tie mishap again

2016-03-23 Thread Olivier Hainque

Hello,

This is a proposal to address what I think has been a long-standing, very
nasty latent code generation problem, which just manifested in-house on a
proprietary testcase.

This is intricate, so long email ...

Visible misbehavior


The visible effect is a powerpc-eabispe compiler (no red-zone) producing an
epilogue sequence like

   addi %r11,%r1,184# temp pointer within frame

   addi %r1,%r11,104# release frame

   evldd %r21,16(%r11)  # restore registers
   ...  # ...
   evldd %r31,96(%r11)  # ...

   blr  # return

This is causing troubles in situations where interrupt code somehow quicks in
between the frame release and the blr, clobbering the stack slots before the
register restore insn have run.

Triggering conditions
=

We are witnessing a situation where the rs6000 stack_tie mechanism is
ineffective, out of a missed dependency between the stack_tie insn and some
register restores.

We have observed this with a gcc 4.7 back-end and weren't able to reproduce
with a more recent version. I believe, however, that the problem is still
latent, just requiring extremely precise conditions to trigger the problematic
scheduling decision.

At the RTL level, with our 4.7 based compiler, the chain of events is as
follows:

Compiling our testcase with -O2 -mcpu=8540 -mfloat-gprs=double, we get up to:

 p.adb.216r.split4:

(insn 710 147 712 2 (set (reg/f:SI 11 %r11 [650])
(high:SI (symbol_ref/u:SI ("*.LC0") [flags 0x82]))) 495 {elf_high}
...

(note 891 627 892 32 NOTE_INSN_EPILOGUE_BEG)
...
... sequence of register restores ...

(insn 894 893 895 32 (set (reg:SI 11 %r11)
(plus:SI (reg/f:SI 1 %r1)
(const_int 184 [0xb8]))) 
... 
(insn 907 906 908 32 (set (reg:V2SI 31 31)
(mem:V2SI (plus:SI (reg:SI 11 11)
(const_int 96 [0x60])) [0 S8 A64])) 
 
... STACK TIE insn, intended to prevent the scheduler from moving ...
... restores past this point, so they remain issued prior to the  ...
... following insn:
 
(insn 908 907 909 32 (parallel [
(set (mem/c:BLK (reg/f:SI 1 1) [17 A8])
(const_int 0 [0]))
(set (mem/c:BLK (reg:SI 11 11) [17 A8])
(const_int 0 [0]))
]) ../p.adb:84:8 681 {stack_tie}
 (nil))
 
... bumping r1 (stack pointer), releasing the stack frame:
 
(insn/f 909 908 910 32 (set (reg/f:SI 1 1)
(plus:SI (reg:SI 11 11)
(const_int 104 [0x68]))) ../p.adb:84:8 78 {*addsi3_internal1}

The tie references a mem with alias set 17 (stack accesses). The restores
use alias set 0, which should conflict anyway, so this looks OK.

When sched2 kicks in, we however fail to establish a dependency between
907 (and other restores relative to r11) and the stack_tie.

At the bottom of the chain, we see write_dependence_p claiming absence
of dependence for 907 from

  if (! writep)
{
  base = find_base_term (mem_addr);
  if (base && (GET_CODE (base) == LABEL_REF
   || (GET_CODE (base) == SYMBOL_REF
   && CONSTANT_POOL_ADDRESS_P (base
return 0;
}


with

(gdb) pr mem_addr
(plus:SI (reg:SI 11 11)
(const_int 96 [0x60]))
 
and
 
(gdb) pr base
(symbol_ref/u:SI ("*.LC0") [flags 0x82])
 
coming from insn 710, despite 894 in between. Ug.

Further Analysis & patch proposal
=

The reason why 894 is not accounted in the base ref computation is because it
is part of the epilogue sequence, and init_alias_analysis has:

  /* Walk the insns adding values to the new_reg_base_value array.  */
  for (i = 0; i < rpo_cnt; i++)
{ ...
  if (could_be_prologue_epilogue
  && prologue_epilogue_contains (insn))
continue;

The motivation for this is unclear to me.

My rough understanding is that we probably really care about frame_related
insns only here, at least on targets where the flag is supposed to be set
accurately.

This is the basis of the proposed patch, which essentially disconnects the
shortcut above for !frame_related insns on targets which need precise alias
analysis within epilogues, as indicated by a new target hook.

On the key insn at hand, the frame_related bit was cleared on purpose,
per:

https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html

  "1) Marking an instruction setting up r11 for use by _save64gpr_* as
   frame-related results in r11 being set as the cfa for the rest of the
   function.  That's bad when r11 gets used for other things later.
  "

I have verified that it cures the problem with our gcc-4.7 based toolchain,
and we have been exercising it in gcc-4.9 based compilers on lots of ppc and
e500 configurations without problems for a few weeks now

I'll be happy to perform further testing if required. This is pretty heavy for
me, however, so I'd rather have a discussion and a preliminary agreement on
the approach first.

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

Re: rs6000 stack_tie mishap again

rs6000 stack_tie mishap again

27 matches

Site Navigation

Mail list logo

Footer information