Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On 5/14/19 4:57 PM, Jakub Jelinek wrote: > On Tue, May 14, 2019 at 01:08:27PM -0600, Jeff Law wrote: >>> In https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00484.html I've posted a >>> patch that would set it earlier (or it could be set during gimplification >>> and propagated during inlining, would need to be in cfun->calls_eh_return >>> instead of crtl->calls_eh_return) and then targets for which we do not want >>> to bother with it or where it is not beneficial to have tail calls in >>> functions that call __builtin_eh_return (as I said in the thread, e.g. on >>> x86_64-linux it is both smaller and faster), the targets which we expect to >>> fail currently or even have a proof of that can just punt. >> I would go with a patch that got the info set earlier and just punt in >> the generic code. I just don't see this case as terribly important to >> optimize. > > So like this? Bootstrapped/regtested on x86_64-linux and i686-linux, > verified _Unwind_Resume_or_Rethrow no longer has a tail call. > > 2019-05-14 Jakub Jelinek > > PR c++/59813 > PR target/90418 > * function.h (struct function): Add calls_eh_return member. > * gimplify.c (gimplify_call_expr): Set cfun->calls_eh_return when > gimplifying __builtin_eh_return call. > * tree-inline.c (initialize_cfun): Copy calls_eh_return from src_cfun > to cfun. > (expand_call_inline): Or in src_cfun->calls_eh_return into > dst_cfun->calls_eh_return. > * tree-tailcall.c (suitable_for_tail_call_opt_p): Return false if > cfun->calls_eh_return. > * lto-streamer-in.c (input_struct_function_base): Read calls_eh_return. > * lto-streamer-out.c (output_struct_function_base): Write > calls_eh_return. Yea, this is how I envisioned it working. I think we should just run with this. Jeff
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
Hi Jakub, all, > On 14 May 2019, at 23:57, Jakub Jelinek wrote: > > On Tue, May 14, 2019 at 01:08:27PM -0600, Jeff Law wrote: >>> In https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00484.html I've posted a >>> patch that would set it earlier (or it could be set during gimplification >>> and propagated during inlining, would need to be in cfun->calls_eh_return >>> instead of crtl->calls_eh_return) and then targets for which we do not want >>> to bother with it or where it is not beneficial to have tail calls in >>> functions that call __builtin_eh_return (as I said in the thread, e.g. on >>> x86_64-linux it is both smaller and faster), the targets which we expect to >>> fail currently or even have a proof of that can just punt. >> I would go with a patch that got the info set earlier and just punt in >> the generic code. I just don't see this case as terribly important to >> optimize. > > So like this? Bootstrapped/regtested on x86_64-linux and i686-linux, > verified _Unwind_Resume_or_Rethrow no longer has a tail call. JFTR, this fixes bootstrap/90418 for powerpc-darwin9 I don’t have a strong policy view: I’m happy with either this, or some variant that provides ??->calls_eh_return at _function_ok_for_sibcall time. it would be nice to get consensus and apply a fix. Iain > > 2019-05-14 Jakub Jelinek > > PR c++/59813 > PR target/90418 > * function.h (struct function): Add calls_eh_return member. > * gimplify.c (gimplify_call_expr): Set cfun->calls_eh_return when > gimplifying __builtin_eh_return call. > * tree-inline.c (initialize_cfun): Copy calls_eh_return from src_cfun > to cfun. > (expand_call_inline): Or in src_cfun->calls_eh_return into > dst_cfun->calls_eh_return. > * tree-tailcall.c (suitable_for_tail_call_opt_p): Return false if > cfun->calls_eh_return. > * lto-streamer-in.c (input_struct_function_base): Read calls_eh_return. > * lto-streamer-out.c (output_struct_function_base): Write > calls_eh_return. > > --- gcc/function.h.jj 2019-01-01 12:37:15.383004075 +0100 > +++ gcc/function.h2019-05-14 21:40:13.837715081 +0200 > @@ -327,6 +327,9 @@ struct GTY(()) function { > either as a subroutine or builtin. */ > unsigned int calls_alloca : 1; > > + /* Nonzero if function being compiled can call __builtin_eh_return. */ > + unsigned int calls_eh_return : 1; > + > /* Nonzero if function being compiled receives nonlocal gotos > from nested functions. */ > unsigned int has_nonlocal_label : 1; > --- gcc/gimplify.c.jj 2019-05-03 15:22:07.416401170 +0200 > +++ gcc/gimplify.c2019-05-14 21:51:38.700288873 +0200 > @@ -3297,6 +3297,10 @@ gimplify_call_expr (tree *expr_p, gimple > break; > } > > + case BUILT_IN_EH_RETURN: > + cfun->calls_eh_return = true; > + break; > + > default: > ; > } > --- gcc/tree-inline.c.jj 2019-05-10 10:15:40.704283180 +0200 > +++ gcc/tree-inline.c 2019-05-14 21:49:19.038617963 +0200 > @@ -2662,6 +2662,7 @@ initialize_cfun (tree new_fndecl, tree c > cfun->va_list_gpr_size = src_cfun->va_list_gpr_size; > cfun->va_list_fpr_size = src_cfun->va_list_fpr_size; > cfun->has_nonlocal_label = src_cfun->has_nonlocal_label; > + cfun->calls_eh_return = src_cfun->calls_eh_return; > cfun->stdarg = src_cfun->stdarg; > cfun->after_inlining = src_cfun->after_inlining; > cfun->can_throw_non_call_exceptions > @@ -4778,6 +4779,7 @@ expand_call_inline (basic_block bb, gimp > src_properties = id->src_cfun->curr_properties & prop_mask; > if (src_properties != prop_mask) > dst_cfun->curr_properties &= src_properties | ~prop_mask; > + dst_cfun->calls_eh_return |= id->src_cfun->calls_eh_return; > > gcc_assert (!id->src_cfun->after_inlining); > > --- gcc/tree-tailcall.c.jj2019-05-10 23:20:33.849768476 +0200 > +++ gcc/tree-tailcall.c 2019-05-14 21:54:34.733353248 +0200 > @@ -140,6 +140,7 @@ suitable_for_tail_opt_p (void) > > return true; > } > + > /* Returns false when the function is not suitable for tail call optimization >for some reason (e.g. if it takes variable number of arguments). >This test must pass in addition to suitable_for_tail_opt_p in order to make > @@ -168,6 +169,11 @@ suitable_for_tail_call_opt_p (void) > if (cfun->calls_setjmp) > return false; > > + /* Various targets don't handle tail calls correctly in functions > + that call __builtin_eh_return. */ > + if (cfun->calls_eh_return) > +return false; > + > /* ??? It is OK if the argument of a function is taken in some cases, > but not in all cases. See PR15387 and PR19616. Revisit for 4.1. */ > for (param = DECL_ARGUMENTS (current_function_decl); > --- gcc/lto-streamer-in.c.jj 2019-03-08 11:43:35.062317743 +0100 > +++ gcc/lto-streamer-in.c 2019-05-14 21:41:48.900128559 +0200 > @@ -1005,6 +1005,7 @@ input_struct_function_base (struct funct > fn->has_forced_label_in_static = bp_unpack_value (&bp
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On Tue, May 14, 2019 at 01:08:27PM -0600, Jeff Law wrote: > > In https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00484.html I've posted a > > patch that would set it earlier (or it could be set during gimplification > > and propagated during inlining, would need to be in cfun->calls_eh_return > > instead of crtl->calls_eh_return) and then targets for which we do not want > > to bother with it or where it is not beneficial to have tail calls in > > functions that call __builtin_eh_return (as I said in the thread, e.g. on > > x86_64-linux it is both smaller and faster), the targets which we expect to > > fail currently or even have a proof of that can just punt. > I would go with a patch that got the info set earlier and just punt in > the generic code. I just don't see this case as terribly important to > optimize. So like this? Bootstrapped/regtested on x86_64-linux and i686-linux, verified _Unwind_Resume_or_Rethrow no longer has a tail call. 2019-05-14 Jakub Jelinek PR c++/59813 PR target/90418 * function.h (struct function): Add calls_eh_return member. * gimplify.c (gimplify_call_expr): Set cfun->calls_eh_return when gimplifying __builtin_eh_return call. * tree-inline.c (initialize_cfun): Copy calls_eh_return from src_cfun to cfun. (expand_call_inline): Or in src_cfun->calls_eh_return into dst_cfun->calls_eh_return. * tree-tailcall.c (suitable_for_tail_call_opt_p): Return false if cfun->calls_eh_return. * lto-streamer-in.c (input_struct_function_base): Read calls_eh_return. * lto-streamer-out.c (output_struct_function_base): Write calls_eh_return. --- gcc/function.h.jj 2019-01-01 12:37:15.383004075 +0100 +++ gcc/function.h 2019-05-14 21:40:13.837715081 +0200 @@ -327,6 +327,9 @@ struct GTY(()) function { either as a subroutine or builtin. */ unsigned int calls_alloca : 1; + /* Nonzero if function being compiled can call __builtin_eh_return. */ + unsigned int calls_eh_return : 1; + /* Nonzero if function being compiled receives nonlocal gotos from nested functions. */ unsigned int has_nonlocal_label : 1; --- gcc/gimplify.c.jj 2019-05-03 15:22:07.416401170 +0200 +++ gcc/gimplify.c 2019-05-14 21:51:38.700288873 +0200 @@ -3297,6 +3297,10 @@ gimplify_call_expr (tree *expr_p, gimple break; } + case BUILT_IN_EH_RETURN: + cfun->calls_eh_return = true; + break; + default: ; } --- gcc/tree-inline.c.jj2019-05-10 10:15:40.704283180 +0200 +++ gcc/tree-inline.c 2019-05-14 21:49:19.038617963 +0200 @@ -2662,6 +2662,7 @@ initialize_cfun (tree new_fndecl, tree c cfun->va_list_gpr_size = src_cfun->va_list_gpr_size; cfun->va_list_fpr_size = src_cfun->va_list_fpr_size; cfun->has_nonlocal_label = src_cfun->has_nonlocal_label; + cfun->calls_eh_return = src_cfun->calls_eh_return; cfun->stdarg = src_cfun->stdarg; cfun->after_inlining = src_cfun->after_inlining; cfun->can_throw_non_call_exceptions @@ -4778,6 +4779,7 @@ expand_call_inline (basic_block bb, gimp src_properties = id->src_cfun->curr_properties & prop_mask; if (src_properties != prop_mask) dst_cfun->curr_properties &= src_properties | ~prop_mask; + dst_cfun->calls_eh_return |= id->src_cfun->calls_eh_return; gcc_assert (!id->src_cfun->after_inlining); --- gcc/tree-tailcall.c.jj 2019-05-10 23:20:33.849768476 +0200 +++ gcc/tree-tailcall.c 2019-05-14 21:54:34.733353248 +0200 @@ -140,6 +140,7 @@ suitable_for_tail_opt_p (void) return true; } + /* Returns false when the function is not suitable for tail call optimization for some reason (e.g. if it takes variable number of arguments). This test must pass in addition to suitable_for_tail_opt_p in order to make @@ -168,6 +169,11 @@ suitable_for_tail_call_opt_p (void) if (cfun->calls_setjmp) return false; + /* Various targets don't handle tail calls correctly in functions + that call __builtin_eh_return. */ + if (cfun->calls_eh_return) +return false; + /* ??? It is OK if the argument of a function is taken in some cases, but not in all cases. See PR15387 and PR19616. Revisit for 4.1. */ for (param = DECL_ARGUMENTS (current_function_decl); --- gcc/lto-streamer-in.c.jj2019-03-08 11:43:35.062317743 +0100 +++ gcc/lto-streamer-in.c 2019-05-14 21:41:48.900128559 +0200 @@ -1005,6 +1005,7 @@ input_struct_function_base (struct funct fn->has_forced_label_in_static = bp_unpack_value (&bp, 1); fn->calls_alloca = bp_unpack_value (&bp, 1); fn->calls_setjmp = bp_unpack_value (&bp, 1); + fn->calls_eh_return = bp_unpack_value (&bp, 1); fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1); fn->has_simduid_loops = bp_unpack_value (&bp, 1); fn->va_list_fpr_size = bp_unpack_value (&bp, 8); --- gcc/lto-streamer-out.c.jj 2019-03-08 11:43:35.062317743 +0100 +++ gcc/lto-streamer-out.c 2019-05-14
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On 5/13/19 4:51 PM, Jakub Jelinek wrote: > On Mon, May 13, 2019 at 04:29:21PM -0600, Jeff Law wrote: >>> That is a serious misunderstanding of what __builtin_eh_return does. >>> For the most part, __builtin_eh_return works by forcing all call clobbered >>> registers to stack, having that described in the unwind info and having the >>> unwinding code modify that saved info when needed. The adjustment of the >>> stack is done only on approx. half of the targets where the destination >>> stack pointer should be derived from CFA instead of being restored with the >>> rest of the call saved registers. You can't do this in assembly, it needs >>> to be done in the same function that saves the state etc. >> So why not just reject tail calls when we see __builtin_eh_return. It's >> not ideal, but that feels better than the currently posted hack. > > The problem as has been said in this thread is that crtl->calls_eh_return > flag is only set during expansion, so if it is tested when deciding if we > should allow the tail call or not, it might be (and in the case of > _Unwind_Resume_or_Rethrow it is) too late, because we first expand the call > that could be tail call and only afterwards expand the __builtin_eh_return > and set crtl->calls_eh_return. > > In https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00484.html I've posted a > patch that would set it earlier (or it could be set during gimplification > and propagated during inlining, would need to be in cfun->calls_eh_return > instead of crtl->calls_eh_return) and then targets for which we do not want > to bother with it or where it is not beneficial to have tail calls in > functions that call __builtin_eh_return (as I said in the thread, e.g. on > x86_64-linux it is both smaller and faster), the targets which we expect to > fail currently or even have a proof of that can just punt. I would go with a patch that got the info set earlier and just punt in the generic code. I just don't see this case as terribly important to optimize. jeff
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
Hi, On Tue, 14 May 2019, Jakub Jelinek wrote: > In https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00484.html I've posted > a patch that would set it earlier (or it could be set during > gimplification and propagated during inlining, would need to be in > cfun->calls_eh_return instead of crtl->calls_eh_return) and then targets > for which we do not want to bother with it or where it is not beneficial > to have tail calls in functions that call __builtin_eh_return (as I said > in the thread, e.g. on x86_64-linux it is both smaller and faster), I really think you're over-obsessing with the optimization in this case. Given the general slowness of DWARF CFI interpretation (which always happens before eh_return) any optimization re tail-calls in any of _Unwind_RaiseException _Unwind_ForcedUnwind _Unwind_Resume or _Unwind_Resume_or_Rethrow (of which only the latter looks amenable) seems a bit silly. Disable tail-calls whenever builtin_eh_return is seen in the gimplifier (in all four functions in the world!), and be done. Ciao, Michael.
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On Mon, May 13, 2019 at 04:29:21PM -0600, Jeff Law wrote: > > That is a serious misunderstanding of what __builtin_eh_return does. > > For the most part, __builtin_eh_return works by forcing all call clobbered > > registers to stack, having that described in the unwind info and having the > > unwinding code modify that saved info when needed. The adjustment of the > > stack is done only on approx. half of the targets where the destination > > stack pointer should be derived from CFA instead of being restored with the > > rest of the call saved registers. You can't do this in assembly, it needs > > to be done in the same function that saves the state etc. > So why not just reject tail calls when we see __builtin_eh_return. It's > not ideal, but that feels better than the currently posted hack. The problem as has been said in this thread is that crtl->calls_eh_return flag is only set during expansion, so if it is tested when deciding if we should allow the tail call or not, it might be (and in the case of _Unwind_Resume_or_Rethrow it is) too late, because we first expand the call that could be tail call and only afterwards expand the __builtin_eh_return and set crtl->calls_eh_return. In https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00484.html I've posted a patch that would set it earlier (or it could be set during gimplification and propagated during inlining, would need to be in cfun->calls_eh_return instead of crtl->calls_eh_return) and then targets for which we do not want to bother with it or where it is not beneficial to have tail calls in functions that call __builtin_eh_return (as I said in the thread, e.g. on x86_64-linux it is both smaller and faster), the targets which we expect to fail currently or even have a proof of that can just punt. Jakub
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On 5/13/19 2:09 PM, Jakub Jelinek wrote: > On Mon, May 13, 2019 at 07:57:30PM +, Wilco Dijkstra wrote: >> Hi, >> >>> I think my worry would be all the targets that support tail calls, but >>> which we can't easily test. ie, we put in the hack, but when do we >>> know it can be removed? >> >> Well it would be easy to make a runnable torture test. I'm not sure >> how easy it would be to do it using a scan-assembler test given the >> stack pointer adjustment may not use a standard pattern (and the >> SP has many different names). >> >>> Is there any property of the code that we can look at instead of a hack >>> like this? >> >> We could simpify __builtin_eh_return and implement it as a libgcc >> function or automatically generated thunk. Basically all you need is >> tailcall a 2-instruction helper which adjusts the stack and then jumps >> to the given function pointer. This way you don't need the complex >> special cases and optimization disables for eh_return functions. > > That is a serious misunderstanding of what __builtin_eh_return does. > For the most part, __builtin_eh_return works by forcing all call clobbered > registers to stack, having that described in the unwind info and having the > unwinding code modify that saved info when needed. The adjustment of the > stack is done only on approx. half of the targets where the destination > stack pointer should be derived from CFA instead of being restored with the > rest of the call saved registers. You can't do this in assembly, it needs > to be done in the same function that saves the state etc. So why not just reject tail calls when we see __builtin_eh_return. It's not ideal, but that feels better than the currently posted hack. jeff
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
Hi Jakub, >> We could simpify __builtin_eh_return and implement it as a libgcc >> function or automatically generated thunk. Basically all you need is >> tailcall a 2-instruction helper which adjusts the stack and then jumps >> to the given function pointer. This way you don't need the complex >> special cases and optimization disables for eh_return functions. > > That is a serious misunderstanding of what __builtin_eh_return does. And this is exactly the issue - it's undocumented/untested. There isn't a well-defined interface that makes it clear which part of the feature is implemented by the midend and which part must be done by each backend. > For the most part, __builtin_eh_return works by forcing all call clobbered > registers to stack, having that described in the unwind info and having the > unwinding code modify that saved info when needed. Sure, and that's target independent. > The adjustment of the > stack is done only on approx. half of the targets where the destination > stack pointer should be derived from CFA instead of being restored with the > rest of the call saved registers. You can't do this in assembly, it needs > to be done in the same function that saves the state etc. No, it can be done by a simple 2-instruction thunk. Targets which save the stack pointer don't require it since they restore SP already. You could also enter the unwinder via a small assembler veneer which saves the callee-save registers and stack pointer (a bit like setjmp), and the unwinder will work just fine. Once you decide what the specification is, the implementation becomes really trivial. But without it, every target is just guessing what it should implement. Wilco
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On Mon, May 13, 2019 at 07:57:30PM +, Wilco Dijkstra wrote: > Hi, > > > I think my worry would be all the targets that support tail calls, but > > which we can't easily test. ie, we put in the hack, but when do we > > know it can be removed? > > Well it would be easy to make a runnable torture test. I'm not sure > how easy it would be to do it using a scan-assembler test given the > stack pointer adjustment may not use a standard pattern (and the > SP has many different names). > > > Is there any property of the code that we can look at instead of a hack > > like this? > > We could simpify __builtin_eh_return and implement it as a libgcc > function or automatically generated thunk. Basically all you need is > tailcall a 2-instruction helper which adjusts the stack and then jumps > to the given function pointer. This way you don't need the complex > special cases and optimization disables for eh_return functions. That is a serious misunderstanding of what __builtin_eh_return does. For the most part, __builtin_eh_return works by forcing all call clobbered registers to stack, having that described in the unwind info and having the unwinding code modify that saved info when needed. The adjustment of the stack is done only on approx. half of the targets where the destination stack pointer should be derived from CFA instead of being restored with the rest of the call saved registers. You can't do this in assembly, it needs to be done in the same function that saves the state etc. Jakub
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
Hi, > I think my worry would be all the targets that support tail calls, but > which we can't easily test. ie, we put in the hack, but when do we > know it can be removed? Well it would be easy to make a runnable torture test. I'm not sure how easy it would be to do it using a scan-assembler test given the stack pointer adjustment may not use a standard pattern (and the SP has many different names). > Is there any property of the code that we can look at instead of a hack > like this? We could simpify __builtin_eh_return and implement it as a libgcc function or automatically generated thunk. Basically all you need is tailcall a 2-instruction helper which adjusts the stack and then jumps to the given function pointer. This way you don't need the complex special cases and optimization disables for eh_return functions. Or drop support for eh_return altogether since its one and only use seems to be for unwinding (which already requires target specific code). Wilco
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
> On 13 May 2019, at 20:10, Jakub Jelinek wrote: > > On Mon, May 13, 2019 at 12:52:54PM -0600, Jeff Law wrote: >> On 5/10/19 9:17 AM, Jakub Jelinek wrote: >>> On Fri, May 10, 2019 at 01:46:07PM +, Wilco Dijkstra wrote: And looking at the generated code, emitting a tailcall for this case is actually a de-optimization since the large eh_return epilog must be repeated for every tailcall. >>> >>> On x86_64, the code is shorter with the tail call rather than without. >>> >>> That said, here is actually tested workaround until targets are fixed. >>> Richard or Jeff, do we want this workaround? >>> >>> I don't see why we would need extra testsuite coverage for this, given the >>> number of FAILs or bootstrap issues on targets that are broken. >>> >>> 2019-05-10 Jakub Jelinek >>> >>> PR c++/59813 >>> * unwind.inc (_Unwind_Resume_or_Rethrow): Add optimize attribute >>> to temporarily avoid tail calls in the function. >> I think my worry would be all the targets that support tail calls, but >> which we can't easily test. ie, we put in the hack, but when do we >> know it can be removed? >> >> Is there any property of the code that we can look at instead of a hack >> like this? > > We can test crtl->calls_eh_return if if make sure it is computed early > (before expansion), or we could just compute it in the ok_for_sibcall > hook if we add caching of that per function (check if any bb that doesn't > have successors and is still in GIMPLE IL ends with __builtin_eh_return > call), if not already crtl->calls_eh_return. > > I think only targets that define EH_RETURN_STACKADJ_RTX are problematic, and > from those only those that do not use eh_return_internal patterns (e.g. i386 > or bfin or riscv have special eh_return_internal patterns that expand > epilogue with not just true/false for sibcall/for_sibcall, but either a 3 > state argument or two bools whether it is sibcall/normal/eh_return). > Out of these, aarch64 and rs6000 have been fixed (though, the latter maybe > not for powerpc-darwin yet). no, it’s quite a lot more involved to fix that - I would favour the solution above - just punting when calls_eh_return is true in rs6000_ok_for_sibcall, until there’s a chance to try and remove (or improve) the save_world code. At least that’s also a simple immediate fix for any other affected target. Iain > alpha, arc, cr16, csky, frv, m32c, m68k, mmix, msp430, nds32, > nios2, or1k, pa, sh, tilegx, tilepro, visium, xtensa > might be still broken. > > Some of those might be even harder to fix (e.g. alpha, pa, mmix to name a > few), because they don't even bother to pass the sibcall/for_sibcall > argument to epilogue expansion. > > Jakub
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On Mon, May 13, 2019 at 12:52:54PM -0600, Jeff Law wrote: > On 5/10/19 9:17 AM, Jakub Jelinek wrote: > > On Fri, May 10, 2019 at 01:46:07PM +, Wilco Dijkstra wrote: > >> And looking at the generated code, emitting a tailcall for this case is > >> actually a > >> de-optimization since the large eh_return epilog must be repeated for every > >> tailcall. > > > > On x86_64, the code is shorter with the tail call rather than without. > > > > That said, here is actually tested workaround until targets are fixed. > > Richard or Jeff, do we want this workaround? > > > > I don't see why we would need extra testsuite coverage for this, given the > > number of FAILs or bootstrap issues on targets that are broken. > > > > 2019-05-10 Jakub Jelinek > > > > PR c++/59813 > > * unwind.inc (_Unwind_Resume_or_Rethrow): Add optimize attribute > > to temporarily avoid tail calls in the function. > I think my worry would be all the targets that support tail calls, but > which we can't easily test. ie, we put in the hack, but when do we > know it can be removed? > > Is there any property of the code that we can look at instead of a hack > like this? We can test crtl->calls_eh_return if if make sure it is computed early (before expansion), or we could just compute it in the ok_for_sibcall hook if we add caching of that per function (check if any bb that doesn't have successors and is still in GIMPLE IL ends with __builtin_eh_return call), if not already crtl->calls_eh_return. I think only targets that define EH_RETURN_STACKADJ_RTX are problematic, and from those only those that do not use eh_return_internal patterns (e.g. i386 or bfin or riscv have special eh_return_internal patterns that expand epilogue with not just true/false for sibcall/for_sibcall, but either a 3 state argument or two bools whether it is sibcall/normal/eh_return). Out of these, aarch64 and rs6000 have been fixed (though, the latter maybe not for powerpc-darwin yet). alpha, arc, cr16, csky, frv, m32c, m68k, mmix, msp430, nds32, nios2, or1k, pa, sh, tilegx, tilepro, visium, xtensa might be still broken. Some of those might be even harder to fix (e.g. alpha, pa, mmix to name a few), because they don't even bother to pass the sibcall/for_sibcall argument to epilogue expansion. Jakub
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On 5/10/19 9:17 AM, Jakub Jelinek wrote: > On Fri, May 10, 2019 at 01:46:07PM +, Wilco Dijkstra wrote: >> And looking at the generated code, emitting a tailcall for this case is >> actually a >> de-optimization since the large eh_return epilog must be repeated for every >> tailcall. > > On x86_64, the code is shorter with the tail call rather than without. > > That said, here is actually tested workaround until targets are fixed. > Richard or Jeff, do we want this workaround? > > I don't see why we would need extra testsuite coverage for this, given the > number of FAILs or bootstrap issues on targets that are broken. > > 2019-05-10 Jakub Jelinek > > PR c++/59813 > * unwind.inc (_Unwind_Resume_or_Rethrow): Add optimize attribute > to temporarily avoid tail calls in the function. I think my worry would be all the targets that support tail calls, but which we can't easily test. ie, we put in the hack, but when do we know it can be removed? Is there any property of the code that we can look at instead of a hack like this? jeff
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On Sat, May 11, 2019 at 09:38:22AM +0100, Richard Sandiford wrote: > Jakub Jelinek writes: > > Previously, the mere existence of the addressable variables this_context > > and cur_context prevented tail call on the early out > > return _Unwind_RaiseException (exc); > > but since r271013 the tailcall analysis figures that while those two > > variables are there, they aren't touched before the possible tail call > > site, so they can't be really live during the call. Ah! That explains why this never happened before. > > This does a lot of register saving and restoring, which is not needed but is > > not wrong-code (guess separate shrink wrapping would help here if > > implemented for the target). SWS does not handle eh_return: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/shrink-wrap.c;h=57124db92c662bf52efc7ea94c274d9b4e234d04;hb=HEAD#l1780 But, a function calling eh_return will not get *any* shrink-wrapping: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/shrink-wrap.c;h=57124db92c662bf52efc7ea94c274d9b4e234d04;hb=HEAD#l658 (it has done this since r179553, when shrink-wrapping was added). Some targets run splitters for their eh_return and evilness like that. It never was documented what the compiler expects the target to do with eh_return, and what the limitations are, etc. > > The only wrong-code is actually the > > add sp, sp, x4 instruction though. The previous instruction restored sp to > > the value it had at the start of the function and then we should just tail > > call. But is that correct? eh_return is supposed to adjust the stack? And, what should other targets do? > FWIW, I agree we should just fix the targets and not even use the > workaround you posted later. It won't be the first time that many > targets have got something wrong due to lack of coverage. eh_return isn't defined or documented very well. That should be fixed... Or maybe eh_return should just be removed. Segher > > --- gcc/config/aarch64/aarch64.c.jj 2019-05-02 12:18:40.004979690 +0200 > > +++ gcc/config/aarch64/aarch64.c2019-05-09 20:08:00.774718003 +0200 > > @@ -5913,7 +5913,7 @@ aarch64_expand_epilogue (bool for_sibcal > > } > > > >/* Stack adjustment for exception handler. */ > > - if (crtl->calls_eh_return) > > + if (crtl->calls_eh_return && !for_sibcall) > > { > >/* We need to unwind the stack by the offset computed by > > EH_RETURN_STACKADJ_RTX. We have already reset the CFA
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
Jakub Jelinek writes: > Hi! > > My recent patch for tail call improvement apparently affects also the > _Unwind_Resume_or_Rethrow function in libgcc: > > _Unwind_Reason_Code __attribute__((optimize ("no-omit-frame-pointer"))) > _Unwind_Resume_or_Rethrow (struct _Unwind_Exception *exc) > { > struct _Unwind_Context this_context, cur_context; > _Unwind_Reason_Code code; > unsigned long frames; > if (exc->private_1 == 0) > return _Unwind_RaiseException (exc); > do { __builtin_unwind_init (); uw_init_context_1 (&this_context, > __builtin_dwarf_cfa (), __builtin_return_address (0)); } while (0); > cur_context = this_context; > code = _Unwind_ForcedUnwind_Phase2 (exc, &cur_context, &frames); > ((void)(!(code == _URC_INSTALL_CONTEXT) ? abort (), 0 : 0)); > do { long offset = uw_install_context_1 ((&this_context), (&cur_context)); > void *handler = uw_frob_return_addr ((&this_context), (&cur_context)); > _Unwind_DebugHook ((&cur_context)->cfa, handler); ; __builtin_eh_return > (offset, handler); } while (0); > } > > Previously, the mere existence of the addressable variables this_context > and cur_context prevented tail call on the early out > return _Unwind_RaiseException (exc); > but since r271013 the tailcall analysis figures that while those two > variables are there, they aren't touched before the possible tail call > site, so they can't be really live during the call. > The problem is that this is one of the few calls that call > __builtin_eh_return which normally user code doesn't use, we use it just > for the unwinder routines and so some targets (in this case aarch64, in > another report powerpc-darwin) show that the combination of > cfun->calls_eh_return + tail calls has not been really tested. > One possibility is to do if (cfun->calls_eh_return) return false; in > the target hook *_ok_for_sibcall, the other possibility is to fix that > case properly. > > The following patch fixes the aarch64 case. > The code emitted for the code path from the start of the function > till the tail call was: > 2778 <_Unwind_Resume_or_Rethrow>: > 2778: d12143ffsub sp, sp, #0x850 > 277c: a9007bfdstp x29, x30, [sp] > 2780: 910003fdmov x29, sp > 2784: a90107e0stp x0, x1, [sp, #16] > 2788: f9400801ldr x1, [x0, #16] > 278c: a9020fe2stp x2, x3, [sp, #32] > 2790: a90353f3stp x19, x20, [sp, #48] > 2794: aa0003f3mov x19, x0 > 2798: a9045bf5stp x21, x22, [sp, #64] > 279c: a90563f7stp x23, x24, [sp, #80] > 27a0: a9066bf9stp x25, x26, [sp, #96] > 27a4: a90773fbstp x27, x28, [sp, #112] > 27a8: 6d0827e8stp d8, d9, [sp, #128] > 27ac: 6d092feastp d10, d11, [sp, #144] > 27b0: 6d0a37ecstp d12, d13, [sp, #160] > 27b4: 6d0b3feestp d14, d15, [sp, #176] > 27b8: b5000201cbnzx1, 27f8 > <_Unwind_Resume_or_Rethrow+0x80> > 27bc: a9407bfdldp x29, x30, [sp] > 27c0: a94107e0ldp x0, x1, [sp, #16] > 27c4: a9420fe2ldp x2, x3, [sp, #32] > 27c8: a94353f3ldp x19, x20, [sp, #48] > 27cc: a9445bf5ldp x21, x22, [sp, #64] > 27d0: a94563f7ldp x23, x24, [sp, #80] > 27d4: a9466bf9ldp x25, x26, [sp, #96] > 27d8: a94773fbldp x27, x28, [sp, #112] > 27dc: 6d4827e8ldp d8, d9, [sp, #128] > 27e0: 6d492fealdp d10, d11, [sp, #144] > 27e4: 6d4a37ecldp d12, d13, [sp, #160] > 27e8: 6d4b3feeldp d14, d15, [sp, #176] > 27ec: 912143ffadd sp, sp, #0x850 > 27f0: 8b2463ffadd sp, sp, x4 > 27f4: 1400b 23c8 <_Unwind_RaiseException> > 27f4: R_AARCH64_JUMP26 _Unwind_RaiseException > This does a lot of register saving and restoring, which is not needed but is > not wrong-code (guess separate shrink wrapping would help here if > implemented for the target). The only wrong-code is actually the > add sp, sp, x4 instruction though. The previous instruction restored sp to > the value it had at the start of the function and then we should just tail > call. This instruction is something that is needed in the spot where > __builtin_eh_return is emitted. > > Fixed thusly, bootstrapped/regtested on aarch64-linux, ok for trunk? > > 2019-05-10 Jakub Jelinek > > PR c++/59813 > * config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't add > EH_RETURN_STACKADJ_RTX to sp in sibcall epilogues. OK, thanks. FWIW, I agree we should just fix the targets and not even use the workaround you posted later. It won't be the firs
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On Fri, May 10, 2019 at 01:46:07PM +, Wilco Dijkstra wrote: > And looking at the generated code, emitting a tailcall for this case is > actually a > de-optimization since the large eh_return epilog must be repeated for every > tailcall. On x86_64, the code is shorter with the tail call rather than without. That said, here is actually tested workaround until targets are fixed. Richard or Jeff, do we want this workaround? I don't see why we would need extra testsuite coverage for this, given the number of FAILs or bootstrap issues on targets that are broken. 2019-05-10 Jakub Jelinek PR c++/59813 * unwind.inc (_Unwind_Resume_or_Rethrow): Add optimize attribute to temporarily avoid tail calls in the function. --- libgcc/unwind.inc.jj2019-01-01 12:38:17.345987416 +0100 +++ libgcc/unwind.inc 2019-05-10 17:10:32.436329516 +0200 @@ -252,6 +252,7 @@ _Unwind_Resume (struct _Unwind_Exception a normal exception that was handled. */ _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE +__attribute__((optimize ("no-optimize-sibling-calls"))) _Unwind_Resume_or_Rethrow (struct _Unwind_Exception *exc) { struct _Unwind_Context this_context, cur_context; Jakub
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
Hi Jakub, >> That's great to hear, but all the other targets still need to be fixed >> somehow. > > Not all other targets, just some. I don't see what you find wrong on > actually fixing them. The tail call in that spot is actually very > desirable, but only if we can do a shrink wrapping for it, by doing Without a torture test this it's not obvious how many targets actually get it right. If it's desirable to optimize RaiseException (is that really the common case?), then why not optimize the source code to: _Unwind_Reason_Code _Unwind_Resume_or_Rethrow (struct _Unwind_Exception *exc) { if (exc->private_1 == 0) return _Unwind_RaiseException (exc); _Unwind_Resume_Tailcall (exc); } _Unwind_Reason_Code __attribute__((optimize ("no-omit-frame-pointer"))) _Unwind_Resume_Tailcall (struct _Unwind_Exception *exc) { struct _Unwind_Context this_context, cur_context; _Unwind_Reason_Code code; unsigned long frames; do { __builtin_unwind_init (); uw_init_context_1 (&this_context, __builtin_dwarf_cfa (), __builtin_return_address (0)); } while (0); cur_context = this_context; code = _Unwind_ForcedUnwind_Phase2 (exc, &cur_context, &frames); ((void)(!(code == _URC_INSTALL_CONTEXT) ? abort (), 0 : 0)); do { long offset = uw_install_context_1 ((&this_context), (&cur_context)); void *handler = uw_frob_return_addr ((&this_context), (&cur_context)); _Unwind_DebugHook ((&cur_context)->cfa, handler); ; __builtin_eh_return (offset, handler); } while (0); } > If you want just ignore the problem, we can just --- libgcc/unwind.inc.jj 2019-01-01 12:38:17.345987416 +0100 +++ libgcc/unwind.inc 2019-05-10 16:12:18.330555718 +0200 @@ -252,6 +252,7 @@ _Unwind_Resume (struct _Unwind_Exception a normal exception that was handled. */ _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE +__attribute__((optimize ("no-omit-sibling-calls"))) _Unwind_Resume_or_Rethrow (struct _Unwind_Exception *exc) { struct _Unwind_Context this_context, cur_context; > and be done with that. Yes that's a safe temporary workaround to get things working again, I don't think anyone can object to that. Note typo "no-optimize-sibling-calls". And it needs a comment with the PRs created for this. Wilco
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On Fri, May 10, 2019 at 01:46:07PM +, Wilco Dijkstra wrote: > > So given this and the fact there is no real gain from emitting tailcalls in > > eh_return > > functions, wouldn't it make more sense to always block tailcalls in the > > mid-end? > > > No. E.g. x86 handles that just fine, > > That's great to hear, but all the other targets still need to be fixed > somehow. Not all other targets, just some. I don't see what you find wrong on actually fixing them. The tail call in that spot is actually very desirable, but only if we can do a shrink wrapping for it, by doing if (exc->private) tail_call _Unwind_RaiseException (exc); prologue; whatever; eh_epilogue; Or if (exc->private) tail_call _Unwind_RaiseException (exc); else tail_call _Unwind_Resume (exc); Except that _Unwind_Resume returns void and is (determined by the compiler) noreturn, so it doesn't matter what we return. That second option would be shortest, but we don't have ways to convince the compiler to do that right now (__attribute__((noipa)) on _Unwind_Resume perhaps, so that it doesn't tell callers it is noreturn, but the mismatch of the return value is bad. If you want just ignore the problem, we can just --- libgcc/unwind.inc.jj2019-01-01 12:38:17.345987416 +0100 +++ libgcc/unwind.inc 2019-05-10 16:12:18.330555718 +0200 @@ -252,6 +252,7 @@ _Unwind_Resume (struct _Unwind_Exception a normal exception that was handled. */ _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE +__attribute__((optimize ("no-omit-sibling-calls"))) _Unwind_Resume_or_Rethrow (struct _Unwind_Exception *exc) { struct _Unwind_Context this_context, cur_context; and be done with that. Jakub
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
Hi Jakub, > So given this and the fact there is no real gain from emitting tailcalls in > eh_return > functions, wouldn't it make more sense to always block tailcalls in the > mid-end? > No. E.g. x86 handles that just fine, That's great to hear, but all the other targets still need to be fixed somehow. Is the tailcall optimization so important for this case it is worth forcing 20+ targets to find and fix this issue? Given eh_return is already overly complex that seems like a lot of effort that could be used better on more productive optimizations. And looking at the generated code, emitting a tailcall for this case is actually a de-optimization since the large eh_return epilog must be repeated for every tailcall. Wilco
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On Fri, May 10, 2019 at 11:38:50AM +, Wilco Dijkstra wrote: > > __builtin_eh_return is a noreturn call, and we never tail call optimize > > noreturn calls: > > if (flags & ECF_NORETURN) > > { > > maybe_complain_about_tail_call (exp, "callee does not return"); > > return false; > > } > > As both the __builtin_eh_return and other tail calls are points in the CFG > > that are followed by EXIT only, it doesn't make sense to talk about > > tailcalls ignoring __builtin_eh_return. The tailcall is in one epilogue in > > the function, __builtin_eh_return is in another epilogue. We do want that > > add sp, sp, x4 in the eh return epilogue, not in the tail call epilogue. > > Thanks that makes sense, so this change would work fine. > > However I think almost all targets are affected by this bug - I tried 4 random > backends and all had code similar to AArch64: > > if (crtl->calls_eh_return) > { > rtx sa = EH_RETURN_STACKADJ_RTX; > emit_insn (gen_add3_insn (sp_reg_rtx, sp_reg_rtx, sa)); > } > ... with sibcall case handled after this... > > And no code in _function_ok_for_sibcall to block tailcalls... > > So given this and the fact there is no real gain from emitting tailcalls in > eh_return > functions, wouldn't it make more sense to always block tailcalls in the > mid-end? No. E.g. x86 handles that just fine, the EH_RETURN_STACKADJ_RTX addition is guarded with style == 2 argument to ix86_expand_epilogue, which is only set in eh_return epilogue, not in normal epilogue or sibcall_epilogue. That seems to be the cleanest approach to me, although guarding it with !sibcall or similar as done in this patch should be enough for most of the targets that have the problem; of course there needs to be a way to let targets disallow tail calls in __builtin_eh_return functions, ATM there is not an easy one, because crtl->calls_eh_return is set only during expansion and in the case of if (cond) return tail_call (...); ... __builtin_eh_return (...); the tail call is expanded before crtl->calls_eh_return is set. The question is if we want to replace crtl->calls_eh_return with cfun->calls_eh_return and where we should set it (say in builtins folding and propagate through inlining), or some pass should do that? Can't find a good match ATM though, perhaps just the tailcall pass should check that, like below (untested): --- gcc/tree-tailcall.c.jj 2019-05-08 19:04:58.947797821 +0200 +++ gcc/tree-tailcall.c 2019-05-10 14:16:57.203930630 +0200 @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3. #include "common/common-target.h" #include "ipa-utils.h" #include "tree-ssa-live.h" +#include "memmodel.h" +#include "emit-rtl.h" /* The file implements the tail recursion elimination. It is also used to analyze the tail calls in general, passing the results to the rtl level @@ -1171,6 +1173,25 @@ tree_optimize_tail_calls_1 (bool opt_tai if (phis_constructed) mark_virtual_operands_for_renaming (cfun); + /* If there are any tail calls, check if the function has any + __builtin_eh_return calls, as some targets can't handle tail calls + in functions that call __builtin_eh_return. */ + if (cfun->tail_call_marked) +{ + basic_block bb; + + FOR_EACH_BB_FN (bb, cfun) +if (EDGE_COUNT (bb->succs) == 0) + { + gimple *stmt = last_stmt (bb); + if (stmt && gimple_call_builtin_p (stmt, BUILT_IN_EH_RETURN)) + { + crtl->calls_eh_return = 1; + break; + } + } +} + if (changed) return TODO_cleanup_cfg | TODO_update_ssa_only_virtuals; return 0; Jakub
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
Hi Jakub, > __builtin_eh_return is a noreturn call, and we never tail call optimize > noreturn calls: > if (flags & ECF_NORETURN) > { > maybe_complain_about_tail_call (exp, "callee does not return"); > return false; > } > As both the __builtin_eh_return and other tail calls are points in the CFG > that are followed by EXIT only, it doesn't make sense to talk about > tailcalls ignoring __builtin_eh_return. The tailcall is in one epilogue in > the function, __builtin_eh_return is in another epilogue. We do want that > add sp, sp, x4 in the eh return epilogue, not in the tail call epilogue. Thanks that makes sense, so this change would work fine. However I think almost all targets are affected by this bug - I tried 4 random backends and all had code similar to AArch64: if (crtl->calls_eh_return) { rtx sa = EH_RETURN_STACKADJ_RTX; emit_insn (gen_add3_insn (sp_reg_rtx, sp_reg_rtx, sa)); } ... with sibcall case handled after this... And no code in _function_ok_for_sibcall to block tailcalls... So given this and the fact there is no real gain from emitting tailcalls in eh_return functions, wouldn't it make more sense to always block tailcalls in the mid-end? Wilco
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
On Fri, May 10, 2019 at 10:25:58AM +, Wilco Dijkstra wrote: > Hi Jakub, > > 27ec: 912143ffadd sp, sp, #0x850 > 27f0: 8b2463ffadd sp, sp, x4 > 27f4: 1400b 23c8 <_Unwind_RaiseException> > > > This does a lot of register saving and restoring, which is not needed but is > > not wrong-code (guess separate shrink wrapping would help here if > > implemented for the target). The only wrong-code is actually the > > add sp, sp, x4 instruction though. The previous instruction restored sp to > > the value it had at the start of the function and then we should just tail > > call. This instruction is something that is needed in the spot where > > __builtin_eh_return is emitted. > > The issue seems to be that x4 isn't initialized correctly for tailcalls. > However > doesn't that mean tailcalls will ignore __builtin_eh_return? Or does the > midend > block any tailcalls that are reachable from __builtin_eh_return? __builtin_eh_return is a noreturn call, and we never tail call optimize noreturn calls: if (flags & ECF_NORETURN) { maybe_complain_about_tail_call (exp, "callee does not return"); return false; } As both the __builtin_eh_return and other tail calls are points in the CFG that are followed by EXIT only, it doesn't make sense to talk about tailcalls ignoring __builtin_eh_return. The tailcall is in one epilogue in the function, __builtin_eh_return is in another epilogue. We do want that add sp, sp, x4 in the eh return epilogue, not in the tail call epilogue. Jakub
Re: [PATCH] Fix aarch64 exception handling (PR c++/59813)
Hi Jakub, 27ec: 912143ffadd sp, sp, #0x850 27f0: 8b2463ffadd sp, sp, x4 27f4: 1400b 23c8 <_Unwind_RaiseException> > This does a lot of register saving and restoring, which is not needed but is > not wrong-code (guess separate shrink wrapping would help here if > implemented for the target). The only wrong-code is actually the > add sp, sp, x4 instruction though. The previous instruction restored sp to > the value it had at the start of the function and then we should just tail > call. This instruction is something that is needed in the spot where > __builtin_eh_return is emitted. The issue seems to be that x4 isn't initialized correctly for tailcalls. However doesn't that mean tailcalls will ignore __builtin_eh_return? Or does the midend block any tailcalls that are reachable from __builtin_eh_return? Wilco