Re: rs6000 stack_tie mishap again
On Fri, Apr 15, 2016 at 07:05:25PM +0200, Olivier Hainque wrote: > > But don't you run the risk that the stack could be deallocated before the > > restores are done? This came up on the PA port a long time ago. IIRC the > > situations was something like this: > > > > We had a frame pointer and all the restores were being done via the frame > > pointer. So the scheduler moved the stack pointer adjustment up past the > > register restores. At which point the restores were reading from > > unallocated stack space. > > > > 99.99% of the time it didn't cause a problem. But if we got an interrupt > > in that brief window, boom, the interrupt handler could allocate a frame on > > the current stack, store some data in there which clobbered the saved > > register state. > > Yes, that's exactly the kind of problem we're fighting and, indeed, > the memory barrier alone isn't enough. > > Here, the idea is to add a memory barrier to the set of things that the rs6000 > back-end already does, which tries to be better than a complete scheduling > barrier (see rs6000_emit_stack_tie), but turns out to still fail in some > corner > cases due to alias analysis intricacies (original problem description at > the very beginning of this now long thread: > https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01337.html) I think the best thing to do is add the clobber-of-mem-scratch to the final stack deallocate (as a parallel). I don't see anything else that will work reliably. Segher
Re: rs6000 stack_tie mishap again
> On Apr 15, 2016, at 18:42 , Jeff Lawwrote: >> /* (mem:BLK (scratch)) is a special mechanism to conflict with everything. >> This is used in epilogue deallocation functions. */ > *That's* the one I was looking for :-) :-) >> Yes, I pondered this one and thought that a memory barrier >> would be less aggressive, while good enough. > But don't you run the risk that the stack could be deallocated before the > restores are done? This came up on the PA port a long time ago. IIRC the > situations was something like this: > > We had a frame pointer and all the restores were being done via the frame > pointer. So the scheduler moved the stack pointer adjustment up past the > register restores. At which point the restores were reading from unallocated > stack space. > > 99.99% of the time it didn't cause a problem. But if we got an interrupt in > that brief window, boom, the interrupt handler could allocate a frame on the > current stack, store some data in there which clobbered the saved register > state. Yes, that's exactly the kind of problem we're fighting and, indeed, the memory barrier alone isn't enough. Here, the idea is to add a memory barrier to the set of things that the rs6000 back-end already does, which tries to be better than a complete scheduling barrier (see rs6000_emit_stack_tie), but turns out to still fail in some corner cases due to alias analysis intricacies (original problem description at the very beginning of this now long thread: https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01337.html) Olivier
Re: rs6000 stack_tie mishap again
On 04/14/2016 11:10 AM, Olivier Hainque wrote: On 14 Apr 2016, at 18:50, Jeff Lawwrote: I thought we had code to handle this case specially, but I can't immediately find it in sched-deps.c. Unless I misunderstood what you were exactly looking for, the special code is in alias.c. E.g. write_dependence_p: /* (mem:BLK (scratch)) is a special mechanism to conflict with everything. This is used in epilogue deallocation functions. */ *That's* the one I was looking for :-) ... Some ports also use an unspec_volatile to achieve the same effect: ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and ;; all of memory. This blocks insns from being moved across this point. (define_insn "blockage" [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)] "" "" [(set_attr "length" "0")]) Yes, I pondered this one and thought that a memory barrier would be less aggressive, while good enough. But don't you run the risk that the stack could be deallocated before the restores are done? This came up on the PA port a long time ago. IIRC the situations was something like this: We had a frame pointer and all the restores were being done via the frame pointer. So the scheduler moved the stack pointer adjustment up past the register restores. At which point the restores were reading from unallocated stack space. 99.99% of the time it didn't cause a problem. But if we got an interrupt in that brief window, boom, the interrupt handler could allocate a frame on the current stack, store some data in there which clobbered the saved register state. Jeff
Re: rs6000 stack_tie mishap again
> On Apr 15, 2016, at 00:42 , Segher Boessenkool> wrote: > >> >> Something like the attached patch, at least for next stage1 until the >> more general issue is sorted out ? > > It's a bit heavy; as a workaround, we may want this clobber in the stack > deallocation in the epilogue, but not in all other places stack_tie is > used. OK, I can add an epilogue_p argument or something like that. > And for stage 1 we do not really want a workaround, we want to fix the > actual problem ;-) Sure. It might take some time though, and we might want a workaround in the interim. Maintainers' call I suppose :)
Re: rs6000 stack_tie mishap again
> On Apr 15, 2016, at 06:37 , Andreas Krebbel> wrote: >> /* (mem:BLK (scratch)) is a special mechanism to conflict with everything. >> This is used in epilogue deallocation functions. */ >> ... > > Ok thanks. I've verified that the dependencies are also generated when > using this expression in a clobber. Ok, thanks. This is consistent with what I've seen. >> sequence was allowed to move past the stack pointer reset when >> it's performed as a mere register to register move. > > I've seen the same on S/390 when restoring the stack pointer > from a floating point reg. But I think it is not limited to > reg-reg stack pointer restores. Potentially this could also > happen with the stack pointer decrement in the prologue which > could also get moved across a mem only barrier. Indeed. With Kind Regards, Olivier
Re: rs6000 stack_tie mishap again
On 04/14/2016 07:10 PM, Olivier Hainque wrote: > >> On 14 Apr 2016, at 18:50, Jeff Lawwrote: >> >> I thought we had code to handle this case specially, but I can't immediately >> find it in sched-deps.c. > > Unless I misunderstood what you were exactly looking for, > the special code is in alias.c. E.g. write_dependence_p: > > /* (mem:BLK (scratch)) is a special mechanism to conflict with everything. > This is used in epilogue deallocation functions. */ > ... Ok thanks. I've verified that the dependencies are also generated when using this expression in a clobber. > Emitting just that alone isn't enough though. The first attempt > I made did that and failed because the whole > > reg-restore > reg-restore > tie (mem blockage only) > > sequence was allowed to move past the stack pointer reset when > it's performed as a mere register to register move. > > So I ended up adding the clobber mem:blk scratch to the existing > tie parallel, which references the stack pointer register as > well so is forbidden to move past it's update. I've seen the same on S/390 when restoring the stack pointer from a floating point reg. But I think it is not limited to reg-reg stack pointer restores. Potentially this could also happen with the stack pointer decrement in the prologue which could also get moved across a mem only barrier. Bye, -Andreas-
Re: rs6000 stack_tie mishap again
On Mon, Apr 11, 2016 at 12:15:10PM +0200, Olivier Hainque wrote: > > > On Mar 28, 2016, at 19:41 , Richard Hendersonwrote: > > > > But I expect for stage4, the best solution is to strengthen the stack_tie > > pattern to block all memory. Early scheduling of the stack frame > > deallocation (a simple logic insn) can't really be that important to > > performance. > > Something like the attached patch, at least for next stage1 until the > more general issue is sorted out ? It's a bit heavy; as a workaround, we may want this clobber in the stack deallocation in the epilogue, but not in all other places stack_tie is used. And for stage 1 we do not really want a workaround, we want to fix the actual problem ;-) Segher
Re: rs6000 stack_tie mishap again
> On 14 Apr 2016, at 18:50, Jeff Lawwrote: > > I thought we had code to handle this case specially, but I can't immediately > find it in sched-deps.c. Unless I misunderstood what you were exactly looking for, the special code is in alias.c. E.g. write_dependence_p: /* (mem:BLK (scratch)) is a special mechanism to conflict with everything. This is used in epilogue deallocation functions. */ ... > Some ports also use an unspec_volatile to achieve the same effect: > > ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and > ;; all of memory. This blocks insns from being moved across this point. > > (define_insn "blockage" > [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)] > "" > "" > [(set_attr "length" "0")]) Yes, I pondered this one and thought that a memory barrier would be less aggressive, while good enough. > ;; Do not schedule instructions accessing memory across this point. > > (define_expand "memory_blockage" > [(set (match_dup 0) >(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))] > "" > { > operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); > MEM_VOLATILE_P (operands[0]) = 1; > }) > > (define_insn "*memory_blockage" > [(set (match_operand:BLK 0) >(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))] > "" > "" > [(set_attr "length" "0")]) Emitting just that alone isn't enough though. The first attempt I made did that and failed because the whole reg-restore reg-restore tie (mem blockage only) sequence was allowed to move past the stack pointer reset when it's performed as a mere register to register move. So I ended up adding the clobber mem:blk scratch to the existing tie parallel, which references the stack pointer register as well so is forbidden to move past it's update. Olivier
Re: rs6000 stack_tie mishap again
On 04/14/2016 09:47 AM, Andreas Krebbel wrote: On 04/11/2016 12:15 PM, Olivier Hainque wrote: On Mar 28, 2016, at 19:41 , Richard Hendersonwrote: But I expect for stage4, the best solution is to strengthen the stack_tie pattern to block all memory. Early scheduling of the stack frame deallocation (a simple logic insn) can't really be that important to performance. Something like the attached patch, at least for next stage1 until the more general issue is sorted out ? Only basic testing at this point. I can do more if considered appropriate. Thanks for your feedback, With Kind Regards, Olivier * config/rs6000/rs6000.c (rs6000_emit_stack_tie): Add a (clobber mem:BLK (scratch)) to the set of effects, blocking all memory accesses. We have the same problem on S/390 and I'm also looking for a way to solve that. I'm not sure about the (clobber (mem)) approach. Wouldn't it be theoretically possible to move a memory write over a (clobber (mem))? I thought we had code to handle this case specially, but I can't immediately find it in sched-deps.c. Some ports also use an unspec_volatile to achieve the same effect: ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and ;; all of memory. This blocks insns from being moved across this point. (define_insn "blockage" [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)] "" "" [(set_attr "length" "0")]) ;; Do not schedule instructions accessing memory across this point. (define_expand "memory_blockage" [(set (match_dup 0) (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))] "" { operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); MEM_VOLATILE_P (operands[0]) = 1; }) (define_insn "*memory_blockage" [(set (match_operand:BLK 0) (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))] "" "" [(set_attr "length" "0")]) Jeff
Re: rs6000 stack_tie mishap again
On 04/11/2016 12:15 PM, Olivier Hainque wrote: > >> On Mar 28, 2016, at 19:41 , Richard Hendersonwrote: >> >> But I expect for stage4, the best solution is to strengthen the stack_tie >> pattern to block all memory. Early scheduling of the stack frame >> deallocation (a simple logic insn) can't really be that important to >> performance. > > Something like the attached patch, at least for next stage1 until the > more general issue is sorted out ? > > Only basic testing at this point. I can do more if considered appropriate. > > Thanks for your feedback, > > With Kind Regards, > > Olivier > > * config/rs6000/rs6000.c (rs6000_emit_stack_tie): Add a > (clobber mem:BLK (scratch)) to the set of effects, blocking > all memory accesses. We have the same problem on S/390 and I'm also looking for a way to solve that. I'm not sure about the (clobber (mem)) approach. Wouldn't it be theoretically possible to move a memory write over a (clobber (mem))? The patch I'm currently testing follows a proposal from Joseph Myers in 2011: https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00977.html + [(set (mem:BLK (scratch)) + (unspec:BLK [(match_operand:BLK 0 "memory_operand" "+m")] UNSPEC_TIE))] What I'm wondering is whether the memory read is actually needed here?! Wouldn't the following have the same effect? [(set (mem:BLK (scratch)) (unspec:BLK [(const_int 0)] UNSPEC_TIE))] To my understanding this should block memory reads and writes from being moved across. The MEM probably should be set up to be in the frame alias set? Bye, -Andreas-
Re: rs6000 stack_tie mishap again
> On Mar 28, 2016, at 19:41 , Richard Hendersonwrote: > > But I expect for stage4, the best solution is to strengthen the stack_tie > pattern to block all memory. Early scheduling of the stack frame > deallocation (a simple logic insn) can't really be that important to > performance. Something like the attached patch, at least for next stage1 until the more general issue is sorted out ? Only basic testing at this point. I can do more if considered appropriate. Thanks for your feedback, With Kind Regards, Olivier * config/rs6000/rs6000.c (rs6000_emit_stack_tie): Add a (clobber mem:BLK (scratch)) to the set of effects, blocking all memory accesses. stronger_rs6000_tie.diff Description: Binary data
Re: rs6000 stack_tie mishap again
> On Apr 8, 2016, at 17:37 , Segher Boessenkool> wrote: > > On Fri, Apr 08, 2016 at 10:24:50AM +0200, Olivier Hainque wrote: >>> But I expect for stage4, the best solution is to strengthen the stack_tie >>> pattern to block all memory. Early scheduling of the stack frame >>> deallocation (a simple logic insn) can't really be that important to >>> performance. >> >> My feeling as well. At least, it can't be important enough to warrant >> a sustained exposure to the kind of bug we're discussing here. > > Is it a regression? Changing this in stage 4, and this late in stage 4, > is super invasive. Wrt performance, well, I'd like to see numbers :-/ Sorry, my comment was ambiguous: "My feeling as well" etc was in reaction to Richard's second sentence. I didn't mean to comment on what we want to do or not for stage4. Regarding perfs, I agree that having numbers would be good. Nevertheless, I'd rather see some perf drop than just hoping for this not to show up, in any case, and I remain convinced that whatever we gain seems unlikely to be worth the risk of hitting this bug. Olivier
Re: rs6000 stack_tie mishap again
On Fri, Apr 08, 2016 at 10:24:50AM +0200, Olivier Hainque wrote: > > But I expect for stage4, the best solution is to strengthen the stack_tie > > pattern to block all memory. Early scheduling of the stack frame > > deallocation (a simple logic insn) can't really be that important to > > performance. > > My feeling as well. At least, it can't be important enough to warrant > a sustained exposure to the kind of bug we're discussing here. Is it a regression? Changing this in stage 4, and this late in stage 4, is super invasive. Wrt performance, well, I'd like to see numbers :-/ Segher
Re: rs6000 stack_tie mishap again
Hello Richard & Alan, Thanks for your feedback. Back on it after a few extra experiments. > On Mar 28, 2016, at 19:41 , Richard Hendersonwrote: > >> Let's see what rth thinks. He did say the patch might need to be >> redone. :) >> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html > > I be surprised if this is works as expected without side-effects. You've now > exposed the restore of the frame pointer to alias analysis, and it's probably > not seen as constant anymore. As you reference, I expect that any patch that > opens the epilogue to such scrutiny is going to have to special-case the > frame pointer as well. > That said, as Segher points out later in the thread, one can arrange for hard > regs within the body to bleed into temporaries used within the epilogue, > which is bad. So perhaps this is exactly what's needed longer-term. More > investigation is required. I'd be happy to help, as much as I can. Can you please expand a bit on the kind of side-effects you'd expect and hint at useful directions you believe the investigation should take ? I have some ideas and would like to make sure they're in line with what you think would be most relevant :) > But I expect for stage4, the best solution is to strengthen the stack_tie > pattern to block all memory. Early scheduling of the stack frame > deallocation (a simple logic insn) can't really be that important to > performance. My feeling as well. At least, it can't be important enough to warrant a sustained exposure to the kind of bug we're discussing here. Olivier
Re: rs6000 stack_tie mishap again
On Wed, Mar 30, 2016 at 11:02:41AM +0200, Olivier Hainque wrote: > void g(int, char *); > > void f(int x) > { >char big[20]; > start: >g(x, big); >g(x, big); >register void *p asm("r11") = & >asm("" : : "r"(p)); >asm("" : : :"r28"); >asm("" : : :"r29"); >asm("" : : :"r30"); > } > > I'm getting: > > lis 11,.L2@ha > la 11,.L2@l(11) > > lwz 11,0(1) > lwz 0,4(11) > lwz 28,-16(11) > > mr 1,11 > > mtlr 0 > lwz 29,-12(11) > lwz 30,-8(11) > lwz 31,-4(11) > > blr BTW, the exact sequence you get depends on -mcpu (not surprising), but yes, I see register restores after the "mr 1,11" too. -- Alan Modra Australia Development Lab, IBM
Re: rs6000 stack_tie mishap again
Hello Segher, > On Mar 28, 2016, at 13:18 , Segher Boessenkool> wrote: > >> You need to have had r11 last used to designate a global >> symbol as part of the function body (in order to have base_term >> designate a symbol_ref etc), and then have the scheduler >> decide that moving across is a good idea. It's certainly not >> an easy combination to trigger. > > Yes, I did that (with some asm's). Like this: > > === > void g(int, char *); > > int dum; > > void f(int x) > { >char big[20]; >g(x, big); >g(x, big); >register void *p asm("r11") = >asm("" : : "r"(p)); > } Ah, I see, thanks. In this instance, the problem doesn't trigger because CONSTANT_POOL_ADDRESS_P (base) is false in base = find_base_term (true_mem_addr); if (! writep && base && (GET_CODE (base) == LABEL_REF || (GET_CODE (base) == SYMBOL_REF && CONSTANT_POOL_ADDRESS_P (base return 0; (part of write_dependence_p) With a minor variation: void g(int, char *); void f(int x) { char big[20]; start: g(x, big); g(x, big); register void *p asm("r11") = & asm("" : : "r"(p)); asm("" : : :"r28"); asm("" : : :"r29"); asm("" : : :"r30"); } I'm getting: lis 11,.L2@ha la 11,.L2@l(11) lwz 11,0(1) lwz 0,4(11) lwz 28,-16(11) mr 1,11 mtlr 0 lwz 29,-12(11) lwz 30,-8(11) lwz 31,-4(11) blr out of a powerpc-elf close-to-tunk compiler, despite the presence of a stack_tie insn at the rtl level. Olivier
Re: rs6000 stack_tie mishap again
On 03/23/2016 09:10 PM, Alan Modra wrote: On Wed, Mar 23, 2016 at 05:04:39PM +0100, Olivier Hainque wrote: The reason why 894 is not accounted in the base ref computation is because it is part of the epilogue sequence, and init_alias_analysis has: /* Walk the insns adding values to the new_reg_base_value array. */ for (i = 0; i < rpo_cnt; i++) { ... if (could_be_prologue_epilogue && prologue_epilogue_contains (insn)) continue; The motivation for this is unclear to me. https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html My rough understanding is that we probably really care about frame_related insns only here, at least on targets where the flag is supposed to be set accurately. Also, possibly just prologue insns. So you might be able to modify init_alias_analysis just to ignore the prologue and skip any need for yet another hook. Let's see what rth thinks. He did say the patch might need to be redone. :) https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html I be surprised if this is works as expected without side-effects. You've now exposed the restore of the frame pointer to alias analysis, and it's probably not seen as constant anymore. As you reference, I expect that any patch that opens the epilogue to such scrutiny is going to have to special-case the frame pointer as well. That said, as Segher points out later in the thread, one can arrange for hard regs within the body to bleed into temporaries used within the epilogue, which is bad. So perhaps this is exactly what's needed longer-term. More investigation is required. But I expect for stage4, the best solution is to strengthen the stack_tie pattern to block all memory. Early scheduling of the stack frame deallocation (a simple logic insn) can't really be that important to performance. r~
Re: rs6000 stack_tie mishap again
On Mon, Mar 28, 2016 at 12:45:03PM +0200, Olivier Hainque wrote: > > The normal -m32 compiler here generates code like > > > > lwz 11,0(1) > > > > and try as I might I cannot get it to fail. Maybe because the GPR11 > > setup here involves a load? > > You need to have had r11 last used to designate a global > symbol as part of the function body (in order to have base_term > designate a symbol_ref etc), and then have the scheduler > decide that moving across is a good idea. It's certainly not > an easy combination to trigger. Yes, I did that (with some asm's). Like this: === void g(int, char *); int dum; void f(int x) { char big[20]; g(x, big); g(x, big); register void *p asm("r11") = asm("" : : "r"(p)); } === and the end of that becomes === bl g # 11 *call_nonlocal_sysvsi/1 [length = 4] lis 11,dum@ha# 12 elf_high[length = 4] la 11,dum@l(11) # 13 elf_low [length = 4] lwz 11,0(1) # 33 *movsi_internal1/3 [length = 4] lwz 0,4(11) # 34 *movsi_internal1/3 [length = 4] lwz 31,-4(11)# 36 *movsi_internal1/3 [length = 4] mr 1,11 # 38 *movsi_internal1/1 [length = 4] mtlr 0 # 35 *movsi_internal1/9 [length = 4] blr # 39 *return_internal_si [length = 4] === > > It seems clear that just considering the > > prologue is enough to fix the original problem (frame pointer setup), > > and problems like yours cannot happen in the prologue. > > Right. What is unclear is if it's correct to consider only > prologues here. ISTM we arrange to produce CFI for epilogues > as well today, at least in some circumstances, and maybe the > issue Richard had with prologues at the time would need to > be addressed for epilogues as well today. Correct is to do alias analysis in both the prologue and epilogue. If we don't, ports have to do all kinds of stack ties etc. to fake it. Currently we do neither, so doing just one is a step in the right direction. > > Better would be not to have this hack at all. > > Sure. > > >> My rough understanding is that we probably really care about frame_related > >> insns only here, at least on targets where the flag is supposed to be set > >> accurately. > > > > On targets with DWARF2 unwind info the flag should be set on those insns > > that need unwind info. This does not include all insns in the epilogue > > that access the frame, so I don't think this will help you? > > My idea was that, maybe, the insns we need to see for alias analysis > are precisely those for which the bit should not be set, which happened > to be exactly the case in the problematic situation we hit. But as I said, > I'm not 100% convinced the reasoning is globally correct. All the register restores do not have the flag set, in most cases. But they can in others (say, a target that does not optimise the CFI stuff very well). > >> This is the basis of the proposed patch, which essentially disconnects the > >> shortcut above for !frame_related insns on targets which need precise alias > >> analysis within epilogues, as indicated by a new target hook. > > > > Eww. Isn't that really all targets that schedule the epilogue at all? > > Yes. Most of them use stronger barriers which the dependency analyser > knows about, so aren't affected by this. > > That's a possible alternative approach for rs6000. A clobber of scratch should work yes. It's really heavy handed though, we can move the GPR1 restore quite freely otherwise (in shrink-wrap), but perhaps allowing scheduling to move it doesn't buy us much at all. This doesn't solve the problem however that other dependencies in the epilogue can be messed up in similar ways. Segher
Re: rs6000 stack_tie mishap again
> On Mar 28, 2016, at 05:01 , Segher Boessenkool> wrote: > > The normal -m32 compiler here generates code like > > lwz 11,0(1) > > and try as I might I cannot get it to fail. Maybe because the GPR11 > setup here involves a load? You need to have had r11 last used to designate a global symbol as part of the function body (in order to have base_term designate a symbol_ref etc), and then have the scheduler decide that moving across is a good idea. It's certainly not an easy combination to trigger. >> We have observed this with a gcc 4.7 back-end and weren't able to reproduce >> with a more recent version. > > This makes it not a regression and thus out of scope for GCC 6. We're > supposed to stabilise things now ;-) Sure, no problem if this is only for gcc 7. I posted the message while the details were still fresh in my mind. >>if (could_be_prologue_epilogue >>&& prologue_epilogue_contains (insn)) >> continue; >> >> The motivation for this is unclear to me. > > Alan linked to the history. Right > It seems clear that just considering the > prologue is enough to fix the original problem (frame pointer setup), > and problems like yours cannot happen in the prologue. Right. What is unclear is if it's correct to consider only prologues here. ISTM we arrange to produce CFI for epilogues as well today, at least in some circumstances, and maybe the issue Richard had with prologues at the time would need to be addressed for epilogues as well today. > Better would be not to have this hack at all. Sure. >> My rough understanding is that we probably really care about frame_related >> insns only here, at least on targets where the flag is supposed to be set >> accurately. > > On targets with DWARF2 unwind info the flag should be set on those insns > that need unwind info. This does not include all insns in the epilogue > that access the frame, so I don't think this will help you? My idea was that, maybe, the insns we need to see for alias analysis are precisely those for which the bit should not be set, which happened to be exactly the case in the problematic situation we hit. But as I said, I'm not 100% convinced the reasoning is globally correct. >> This is the basis of the proposed patch, which essentially disconnects the >> shortcut above for !frame_related insns on targets which need precise alias >> analysis within epilogues, as indicated by a new target hook. > > Eww. Isn't that really all targets that schedule the epilogue at all? Yes. Most of them use stronger barriers which the dependency analyser knows about, so aren't affected by this. That's a possible alternative approach for rs6000. Thanks for your feedback, Olivier
Re: rs6000 stack_tie mishap again
On Wed, Mar 23, 2016 at 05:04:39PM +0100, Olivier Hainque wrote: > The visible effect is a powerpc-eabispe compiler (no red-zone) producing an > epilogue sequence like > >addi %r11,%r1,184# temp pointer within frame The normal -m32 compiler here generates code like lwz 11,0(1) and try as I might I cannot get it to fail. Maybe because the GPR11 setup here involves a load? >addi %r1,%r11,104# release frame > >evldd %r21,16(%r11) # restore registers >... # ... >evldd %r31,96(%r11) # ... > >blr # return > We have observed this with a gcc 4.7 back-end and weren't able to reproduce > with a more recent version. This makes it not a regression and thus out of scope for GCC 6. We're supposed to stabilise things now ;-) > if (! writep) > { > base = find_base_term (mem_addr); > if (base && (GET_CODE (base) == LABEL_REF > || (GET_CODE (base) == SYMBOL_REF > && CONSTANT_POOL_ADDRESS_P (base > return 0; > } > > > with > > (gdb) pr mem_addr > (plus:SI (reg:SI 11 11) > (const_int 96 [0x60])) > > and > > (gdb) pr base > (symbol_ref/u:SI ("*.LC0") [flags 0x82]) > > coming from insn 710, despite 894 in between. Ug. Yeah that is just Wrong. > The reason why 894 is not accounted in the base ref computation is because it > is part of the epilogue sequence, and init_alias_analysis has: > > /* Walk the insns adding values to the new_reg_base_value array. */ > for (i = 0; i < rpo_cnt; i++) > { ... > if (could_be_prologue_epilogue > && prologue_epilogue_contains (insn)) > continue; > > The motivation for this is unclear to me. Alan linked to the history. It seems clear that just considering the prologue is enough to fix the original problem (frame pointer setup), and problems like yours cannot happen in the prologue. Better would be not to have this hack at all. > My rough understanding is that we probably really care about frame_related > insns only here, at least on targets where the flag is supposed to be set > accurately. On targets with DWARF2 unwind info the flag should be set on those insns that need unwind info. This does not include all insns in the epilogue that access the frame, so I don't think this will help you? > This is the basis of the proposed patch, which essentially disconnects the > shortcut above for !frame_related insns on targets which need precise alias > analysis within epilogues, as indicated by a new target hook. Eww. Isn't that really all targets that schedule the epilogue at all? > On the key insn at hand, the frame_related bit was cleared on purpose, > per: > > https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html > > "1) Marking an instruction setting up r11 for use by _save64gpr_* as >frame-related results in r11 being set as the cfa for the rest of the >function. That's bad when r11 gets used for other things later. > " And that is correct. > So, aside from the dependency issue which needs to be fixed somehow, I > think it would make sense to consider using a strong blockage mecanism in > expand_epilogue. It would be very nice if we could directly express "the set of GPR1 should stay behind any frame accesses", yeah. Segher
Re: rs6000 stack_tie mishap again
On 03/24/2016 02:17 AM, Olivier Hainque wrote: [snip] So, aside from the dependency issue which needs to be fixed somehow, I think it would make sense to consider using a strong blockage mecanism in expand_epilogue. That's what we both said here https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html and David agreed too https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html but if you can have the alias analysis changes accepted that would be even better. I'd really like to come to a resolution we're confident is robust, because these are really very nasty bugs. The robust solution is to have a scheduling barrier just before the point where the stack is deallocated. The alternative some folks have suggested would be for the generic parts of the compiler to add the scheduling barrier before the stack pointer adjustment. I wouldn't object to that. Jeff
Re: rs6000 stack_tie mishap again
Hello Alan, > On 24 Mar 2016, at 05:10, Alan Modrawrote: > >>if (could_be_prologue_epilogue >>&& prologue_epilogue_contains (insn)) >> continue; > https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html Ah, interesting, thanks! >> My rough understanding is that we probably really care about frame_related >> insns only here, at least on targets where the flag is supposed to be set >> accurately. > > Also, possibly just prologue insns. So you might be able to modify > init_alias_analysis just to ignore the prologue and skip any need for > yet another hook. Which would be good. > Let's see what rth thinks. Definitely. > He did say the patch might need to be redone. :) > https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html And here we have a case :) > [snip] >> So, aside from the dependency issue which needs to be fixed somehow, I >> think it would make sense to consider using a strong blockage mecanism in >> expand_epilogue. > > That's what we both said here > https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html > > and David agreed too > https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html > > but if you can have the alias analysis changes accepted that would be > even better. I'd really like to come to a resolution we're confident is robust, because these are really very nasty bugs. To tell the truth, my current feeling is that relying on the frame_related bit still seems fragile (*) so I'd be happier with something stronger. (*) First, it's not easy to be positive that all the insns we'd need to catch are not frame_related, even if only looking at the current rs6000 epilogue expander. Second, it's very easy to consider flipping one such bit for whatever reason and not think about this kind of implications. Thanks for your feedback on this! Olivier
Re: rs6000 stack_tie mishap again
> On 24 Mar 2016, at 05:58, Alan Modrawrote: > > On Wed, Mar 23, 2016 at 01:38:26PM -0400, David Edelsohn wrote: >> The description and >> references to prior SPE prologue and epilogue changes do not confirm a >> wider problem. > > There's a good chance this affects ABI_V4 large stack frames too. > If restoring regs inline we'll be using r11 as a base, just like SPE > does with moderate sized stack frames when restoring 64-bit regs. Exactly. If I'm not mistaken, the set of problematic cases encompasses everything which uses in the epilogue, as a base, a register which might have been used last to designate a global object in the function body. There are such uses of at least r11 not limited to SPE. Olivier
Re: rs6000 stack_tie mishap again
On Wed, Mar 23, 2016 at 01:38:26PM -0400, David Edelsohn wrote: > The description and > references to prior SPE prologue and epilogue changes do not confirm a > wider problem. There's a good chance this affects ABI_V4 large stack frames too. If restoring regs inline we'll be using r11 as a base, just like SPE does with moderate sized stack frames when restoring 64-bit regs. -- Alan Modra Australia Development Lab, IBM
Re: rs6000 stack_tie mishap again
On Wed, Mar 23, 2016 at 05:04:39PM +0100, Olivier Hainque wrote: > The reason why 894 is not accounted in the base ref computation is because it > is part of the epilogue sequence, and init_alias_analysis has: > > /* Walk the insns adding values to the new_reg_base_value array. */ > for (i = 0; i < rpo_cnt; i++) > { ... > if (could_be_prologue_epilogue > && prologue_epilogue_contains (insn)) > continue; > > The motivation for this is unclear to me. https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html > My rough understanding is that we probably really care about frame_related > insns only here, at least on targets where the flag is supposed to be set > accurately. Also, possibly just prologue insns. So you might be able to modify init_alias_analysis just to ignore the prologue and skip any need for yet another hook. Let's see what rth thinks. He did say the patch might need to be redone. :) https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html [snip] > So, aside from the dependency issue which needs to be fixed somehow, I > think it would make sense to consider using a strong blockage mecanism in > expand_epilogue. That's what we both said here https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html and David agreed too https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html but if you can have the alias analysis changes accepted that would be even better. -- Alan Modra Australia Development Lab, IBM
Re: rs6000 stack_tie mishap again
First, SPE has not been maintained and little participation from Freescale. I would rather deprecate all SPE support. SPE ABI is broken by design. I find the approach very heavy-handed. If you want to enable the target hook for SPE *only*, that's fine with me. The description and references to prior SPE prologue and epilogue changes do not confirm a wider problem. - David
rs6000 stack_tie mishap again
Hello, This is a proposal to address what I think has been a long-standing, very nasty latent code generation problem, which just manifested in-house on a proprietary testcase. This is intricate, so long email ... Visible misbehavior The visible effect is a powerpc-eabispe compiler (no red-zone) producing an epilogue sequence like addi %r11,%r1,184# temp pointer within frame addi %r1,%r11,104# release frame evldd %r21,16(%r11) # restore registers ... # ... evldd %r31,96(%r11) # ... blr # return This is causing troubles in situations where interrupt code somehow quicks in between the frame release and the blr, clobbering the stack slots before the register restore insn have run. Triggering conditions = We are witnessing a situation where the rs6000 stack_tie mechanism is ineffective, out of a missed dependency between the stack_tie insn and some register restores. We have observed this with a gcc 4.7 back-end and weren't able to reproduce with a more recent version. I believe, however, that the problem is still latent, just requiring extremely precise conditions to trigger the problematic scheduling decision. At the RTL level, with our 4.7 based compiler, the chain of events is as follows: Compiling our testcase with -O2 -mcpu=8540 -mfloat-gprs=double, we get up to: p.adb.216r.split4: (insn 710 147 712 2 (set (reg/f:SI 11 %r11 [650]) (high:SI (symbol_ref/u:SI ("*.LC0") [flags 0x82]))) 495 {elf_high} ... (note 891 627 892 32 NOTE_INSN_EPILOGUE_BEG) ... ... sequence of register restores ... (insn 894 893 895 32 (set (reg:SI 11 %r11) (plus:SI (reg/f:SI 1 %r1) (const_int 184 [0xb8]))) ... (insn 907 906 908 32 (set (reg:V2SI 31 31) (mem:V2SI (plus:SI (reg:SI 11 11) (const_int 96 [0x60])) [0 S8 A64])) ... STACK TIE insn, intended to prevent the scheduler from moving ... ... restores past this point, so they remain issued prior to the ... ... following insn: (insn 908 907 909 32 (parallel [ (set (mem/c:BLK (reg/f:SI 1 1) [17 A8]) (const_int 0 [0])) (set (mem/c:BLK (reg:SI 11 11) [17 A8]) (const_int 0 [0])) ]) ../p.adb:84:8 681 {stack_tie} (nil)) ... bumping r1 (stack pointer), releasing the stack frame: (insn/f 909 908 910 32 (set (reg/f:SI 1 1) (plus:SI (reg:SI 11 11) (const_int 104 [0x68]))) ../p.adb:84:8 78 {*addsi3_internal1} The tie references a mem with alias set 17 (stack accesses). The restores use alias set 0, which should conflict anyway, so this looks OK. When sched2 kicks in, we however fail to establish a dependency between 907 (and other restores relative to r11) and the stack_tie. At the bottom of the chain, we see write_dependence_p claiming absence of dependence for 907 from if (! writep) { base = find_base_term (mem_addr); if (base && (GET_CODE (base) == LABEL_REF || (GET_CODE (base) == SYMBOL_REF && CONSTANT_POOL_ADDRESS_P (base return 0; } with (gdb) pr mem_addr (plus:SI (reg:SI 11 11) (const_int 96 [0x60])) and (gdb) pr base (symbol_ref/u:SI ("*.LC0") [flags 0x82]) coming from insn 710, despite 894 in between. Ug. Further Analysis & patch proposal = The reason why 894 is not accounted in the base ref computation is because it is part of the epilogue sequence, and init_alias_analysis has: /* Walk the insns adding values to the new_reg_base_value array. */ for (i = 0; i < rpo_cnt; i++) { ... if (could_be_prologue_epilogue && prologue_epilogue_contains (insn)) continue; The motivation for this is unclear to me. My rough understanding is that we probably really care about frame_related insns only here, at least on targets where the flag is supposed to be set accurately. This is the basis of the proposed patch, which essentially disconnects the shortcut above for !frame_related insns on targets which need precise alias analysis within epilogues, as indicated by a new target hook. On the key insn at hand, the frame_related bit was cleared on purpose, per: https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html "1) Marking an instruction setting up r11 for use by _save64gpr_* as frame-related results in r11 being set as the cfa for the rest of the function. That's bad when r11 gets used for other things later. " I have verified that it cures the problem with our gcc-4.7 based toolchain, and we have been exercising it in gcc-4.9 based compilers on lots of ppc and e500 configurations without problems for a few weeks now I'll be happy to perform further testing if required. This is pretty heavy for me, however, so I'd rather have a discussion and a preliminary agreement on the approach first.