Re: "musttail" statement attribute for GCC?
On Fri, 2021-04-23 at 16:07 -0400, David Malcolm wrote: > On Fri, 2021-04-23 at 12:44 -0700, Josh Haberman via Gcc wrote: > > Would it be feasible to implement a "musttail" statement attribute > > in > > GCC to get a guarantee that tail call optimization will be > > performed? > > > > Such an attribute has just landed in the trunk of Clang > > (https://reviews.llvm.org/D99517). It makes it possible to write > > algorithms that use arbitrarily long chains of tail calls without > > risk > > of blowing the stack. I would love to see something like this land > > in > > GCC also (ultimately I'd love to see it standardized). > > > FWIW I implemented something like this in GCC's middle-end here: > > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9a385c2d3d74ffed78f2ed9ad47b844d2f294105 > exposing it in API form for libgccjit: > > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=15c671a79ca66df5b1de70dd1a0b78414fe003ef > > https://gcc.gnu.org/onlinedocs/jit/topics/expressions.html#c.gcc_jit_rvalue_set_bool_require_tail_call > but IIRC it's not yet exposed to the regular GCC frontends. I hacked on the plugin you created to test CALL_EXPR_MUST_TAIL_CALL to add the musttail attribute for functions and pushed it here: https://github.com/pietro/gcc-musttail-plugin This patch adds it to the plugin testsuite and successfully runs the same tests as the original plugin: https://gist.github.com/pietro/9a582b054b008e9a4ad3e1afe3a4cc22 It seems that in order to be able to add attributes to the return statement changes are needed to the frontend parser. If this attribute is something people think is usefull I'll try to implement it as a proper patch to GCC. Pietro
Re: "musttail" statement attribute for GCC?
On 26/04/2021 14:49, Iain Sandoe via Gcc wrote: Alexander Monakov wrote: On Fri, 23 Apr 2021, Josh Haberman via Gcc wrote: On Fri, Apr 23, 2021 at 1:10 PM Iain Sandoe wrote: I did try to use it this ^ for GCC coroutines (where such a guarantee is pretty important) However, the issue there is that not all targets support indirect tailcalls. I'm not familiar with this limitation. What targets do not support indirect tail calls? On Power64 tailcalls to anything that is not a static function are tricky, and current Clang ICEs for musttail indirect calls (or direct calls to a function which is not static) on that target. GCC also doesn't do indirect tailcalls on Power64. Also, NVPTX (being an intermediate instruction set) does not define a "tailcall instruction", so Clang also ICEs there. Just mentioning it for completeness. Also aarch64-linux if I remember correctly. Tailcalling is possible on aarch64 using the normal branch instructions (B & BR). There however be reasons why those can't be used in a specific compilation context. One could use a trampoline to deal with the loading of additional state, but I do not want to go down that route for coroutines, if possible - since we already have a state frame, it seems better to me to extend the ABI for that to include space for target-specific data at a known offset from the frame pointer. Iain What about tailcalls between DSOs? What issues are there around DSOs? Clang/LLVM don't treat these specially AFAIK, and it seems that tail calls through a PLT should work okay? No, some targets have additional requirements for calls that potentially go via a (PIC) PLT. The most well-known example is probably 32-bit x86, which requries the %ebx register to hold the address of GOT when entering a PLT trampoline. Since %ebx is callee-saved, this makes tailcalling impossible. LLVM solves this by transforming such calls to "no-plt" GOT-indirect calls under 'musttail', so evidently it does treat them specially. Another example is mips64, where even non-PIC PLT is special (but looks like LLVM does not do any tailcalls on mips64 at all). Alexander
Re: "musttail" statement attribute for GCC?
Alexander Monakov wrote: On Fri, 23 Apr 2021, Josh Haberman via Gcc wrote: On Fri, Apr 23, 2021 at 1:10 PM Iain Sandoe wrote: I did try to use it this ^ for GCC coroutines (where such a guarantee is pretty important) However, the issue there is that not all targets support indirect tailcalls. I'm not familiar with this limitation. What targets do not support indirect tail calls? On Power64 tailcalls to anything that is not a static function are tricky, and current Clang ICEs for musttail indirect calls (or direct calls to a function which is not static) on that target. GCC also doesn't do indirect tailcalls on Power64. Also, NVPTX (being an intermediate instruction set) does not define a "tailcall instruction", so Clang also ICEs there. Just mentioning it for completeness. Also aarch64-linux if I remember correctly. One could use a trampoline to deal with the loading of additional state, but I do not want to go down that route for coroutines, if possible - since we already have a state frame, it seems better to me to extend the ABI for that to include space for target-specific data at a known offset from the frame pointer. Iain What about tailcalls between DSOs? What issues are there around DSOs? Clang/LLVM don't treat these specially AFAIK, and it seems that tail calls through a PLT should work okay? No, some targets have additional requirements for calls that potentially go via a (PIC) PLT. The most well-known example is probably 32-bit x86, which requries the %ebx register to hold the address of GOT when entering a PLT trampoline. Since %ebx is callee-saved, this makes tailcalling impossible. LLVM solves this by transforming such calls to "no-plt" GOT-indirect calls under 'musttail', so evidently it does treat them specially. Another example is mips64, where even non-PIC PLT is special (but looks like LLVM does not do any tailcalls on mips64 at all). Alexander
Re: "musttail" statement attribute for GCC?
On Fri, 23 Apr 2021, Josh Haberman via Gcc wrote: > On Fri, Apr 23, 2021 at 1:10 PM Iain Sandoe wrote: > > I did try to use it this ^ for GCC coroutines (where such a guarantee is > > pretty important) > > > > However, the issue there is that not all targets support indirect tailcalls. > > I'm not familiar with this limitation. What targets do not support > indirect tail calls? On Power64 tailcalls to anything that is not a static function are tricky, and current Clang ICEs for musttail indirect calls (or direct calls to a function which is not static) on that target. GCC also doesn't do indirect tailcalls on Power64. Also, NVPTX (being an intermediate instruction set) does not define a "tailcall instruction", so Clang also ICEs there. Just mentioning it for completeness. > > What about tailcalls between DSOs? > > What issues are there around DSOs? Clang/LLVM don't treat these > specially AFAIK, and it seems that tail calls through a PLT should > work okay? No, some targets have additional requirements for calls that potentially go via a (PIC) PLT. The most well-known example is probably 32-bit x86, which requries the %ebx register to hold the address of GOT when entering a PLT trampoline. Since %ebx is callee-saved, this makes tailcalling impossible. LLVM solves this by transforming such calls to "no-plt" GOT-indirect calls under 'musttail', so evidently it does treat them specially. Another example is mips64, where even non-PIC PLT is special (but looks like LLVM does not do any tailcalls on mips64 at all). Alexander
Re: "musttail" statement attribute for GCC?
On Fri, Apr 23, 2021 at 4:34 PM Andrew Pinski wrote: > > On Fri, Apr 23, 2021 at 2:45 PM Josh Haberman via Gcc wrote: > > > > I wrote more about some motivation for guaranteed tail calls here: > > https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html > > So I was looking into the use case presented in that web page, it is > the same reason why GCC has the computed gotos extension already. > Is there a reason why you can't just use that instead? Computed gotos solve some of the problems associated with interpreter main loops, but not others. Computed gotos help avoid a bounds check on the switch(). They can sometimes help ensure that the dispatch jmp is replicated at the end of each case, thus giving each case a unique branch prediction address, though this effect is somewhat fragile since the compiler will sometimes decide to merge these paths into one: https://godbolt.org/z/T9a6173WP Computed gotos cannot help with the two main issues mentioned in the article: 1. The larger a function is, and the more complex and connected its control flow, the harder it is for the compiler’s register allocator to keep the most important data in registers. 2. When fast paths and slow paths are intermixed in the same function, the presence of the slow paths compromises the code quality of the fast paths. Tail calls can help solve both of these problems: 1. If we break down the big interpreter function into many small functions, one per operation, we get control over register allocation at each function boundary. As long as each function is simple enough not to spill, this data will be kept in registers continuously through all the fast paths. 2. If we put fast paths and slow paths in separate functions entirely, then the slow paths cannot negatively affect the code generation of the fast paths. We spent a lot of time trying to work within the framework of computed gotos but were unable to get the results we wanted. Tail calls generated substantially better code than anything else we tried. There are also some practical reasons why tail calls are better for us, particularly that they make it easier to reuse the code across different "instantiations" of the interpreter. For the case of protobuf parsing, each protobuf message has a distinct set of fields with different field numbers and types. With tail calls we can create a distinct dispatch table for each message type, but the function pointers in these tables can point to a set of functions that are shared across all message types. Thanks, Josh
Re: "musttail" statement attribute for GCC?
On Fri, Apr 23, 2021 at 2:45 PM Josh Haberman via Gcc wrote: > > Would it be feasible to implement a "musttail" statement attribute in > GCC to get a guarantee that tail call optimization will be performed? > > Such an attribute has just landed in the trunk of Clang > (https://reviews.llvm.org/D99517). It makes it possible to write > algorithms that use arbitrarily long chains of tail calls without risk > of blowing the stack. I would love to see something like this land in > GCC also (ultimately I'd love to see it standardized). > > I wrote more about some motivation for guaranteed tail calls here: > https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html So I was looking into the use case presented in that web page, it is the same reason why GCC has the computed gotos extension already. Is there a reason why you can't just use that instead? Thanks, Andrew Pinski > > GCC successfully optimizes tail calls in many cases already. What > would it take to provide an actual guarantee, and make it apply to > non-optimized builds also? > > The Clang implementation enforces several rules that must hold for the > attribute to be correct, including: > > - It must be on a function call that is tail position. > - Caller and callee must have compatible function signatures > (including the implicit "this", if any), differing only in cv > qualifiers. > - Caller and callee must use the same calling convention. > - Caller and callee may not be constructors or destructors. > - All arguments, the return type, and any temporaries created must be > trivially destructible. > - All variables currently in scope must be trivially destructible. > - The caller cannot be in a try block, an Objective-C block, or a > statement expression. > > Some of these restrictions may be overly strict for some calling > conventions, but they are useful as the "least common denominator" > that should be safe in all cases. When implementing this in Clang, we > found that we were able to reuse some of the checks around goto and > asm goto, since a tail call is sort of like a goto out of the > function's scope. > > Thanks, > Josh
Re: "musttail" statement attribute for GCC?
David Malcolm via Gcc wrote: > FWIW I implemented something like this in GCC's middle-end here: > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9a385c2d3d74ffed78f2ed9ad47b844d2f294105 > exposing it in API form for libgccjit: > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=15c671a79ca66df5b1de70dd1a0b78414fe003ef > https://gcc.gnu.org/onlinedocs/jit/topics/expressions.html#c.gcc_jit_rvalue_set_bool_require_tail_call > but IIRC it's not yet exposed to the regular GCC frontends. That's great to hear that there is some existing infrastructure around this. Your code includes some checks that we should consider adding to Clang (especially the check against "returns twice"). Why the prohibition against noreturn though? I've found that tail calls to noreturn functions are useful for longjmp()-based error handling, and musttail could be a useful workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837#c2. On Fri, Apr 23, 2021 at 1:10 PM Iain Sandoe wrote: > I did try to use it this ^ for GCC coroutines (where such a guarantee is > pretty important) > > However, the issue there is that not all targets support indirect tailcalls. I'm not familiar with this limitation. What targets do not support indirect tail calls? > What about tailcalls between DSOs? What issues are there around DSOs? Clang/LLVM don't treat these specially AFAIK, and it seems that tail calls through a PLT should work okay? Thanks, Josh
Re: "musttail" statement attribute for GCC?
David Malcolm via Gcc wrote: On Fri, 2021-04-23 at 12:44 -0700, Josh Haberman via Gcc wrote: Would it be feasible to implement a "musttail" statement attribute in GCC to get a guarantee that tail call optimization will be performed? Such an attribute has just landed in the trunk of Clang (https://reviews.llvm.org/D99517). It makes it possible to write algorithms that use arbitrarily long chains of tail calls without risk of blowing the stack. I would love to see something like this land in GCC also (ultimately I'd love to see it standardized). FWIW I implemented something like this in GCC's middle-end here: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9a385c2d3d74ffed78f2ed9ad47b844d2f294105 exposing it in API form for libgccjit: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=15c671a79ca66df5b1de70dd1a0b78414fe003ef https://gcc.gnu.org/onlinedocs/jit/topics/expressions.html#c.gcc_jit_rvalue_set_bool_require_tail_call but IIRC it's not yet exposed to the regular GCC frontends. I did try to use it this ^ for GCC coroutines (where such a guarantee is pretty important) However, the issue there is that not all targets support indirect tailcalls. What about tailcalls between DSOs? Are those also excluded in the clang impl? Iain Dave I wrote more about some motivation for guaranteed tail calls here: https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html GCC successfully optimizes tail calls in many cases already. What would it take to provide an actual guarantee, and make it apply to non-optimized builds also? The Clang implementation enforces several rules that must hold for the attribute to be correct, including: - It must be on a function call that is tail position. - Caller and callee must have compatible function signatures (including the implicit "this", if any), differing only in cv qualifiers. - Caller and callee must use the same calling convention. - Caller and callee may not be constructors or destructors. - All arguments, the return type, and any temporaries created must be trivially destructible. - All variables currently in scope must be trivially destructible. - The caller cannot be in a try block, an Objective-C block, or a statement expression. Some of these restrictions may be overly strict for some calling conventions, but they are useful as the "least common denominator" that should be safe in all cases. When implementing this in Clang, we found that we were able to reuse some of the checks around goto and asm goto, since a tail call is sort of like a goto out of the function's scope. Thanks, Josh
Re: "musttail" statement attribute for GCC?
On Fri, 2021-04-23 at 12:44 -0700, Josh Haberman via Gcc wrote: > Would it be feasible to implement a "musttail" statement attribute in > GCC to get a guarantee that tail call optimization will be performed? > > Such an attribute has just landed in the trunk of Clang > (https://reviews.llvm.org/D99517). It makes it possible to write > algorithms that use arbitrarily long chains of tail calls without risk > of blowing the stack. I would love to see something like this land in > GCC also (ultimately I'd love to see it standardized). FWIW I implemented something like this in GCC's middle-end here: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9a385c2d3d74ffed78f2ed9ad47b844d2f294105 exposing it in API form for libgccjit: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=15c671a79ca66df5b1de70dd1a0b78414fe003ef https://gcc.gnu.org/onlinedocs/jit/topics/expressions.html#c.gcc_jit_rvalue_set_bool_require_tail_call but IIRC it's not yet exposed to the regular GCC frontends. Dave > > I wrote more about some motivation for guaranteed tail calls here: > > https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html > > GCC successfully optimizes tail calls in many cases already. What > would it take to provide an actual guarantee, and make it apply to > non-optimized builds also? > The Clang implementation enforces several rules that must hold for the > attribute to be correct, including: > > - It must be on a function call that is tail position. > - Caller and callee must have compatible function signatures > (including the implicit "this", if any), differing only in cv > qualifiers. > - Caller and callee must use the same calling convention. > - Caller and callee may not be constructors or destructors. > - All arguments, the return type, and any temporaries created must be > trivially destructible. > - All variables currently in scope must be trivially destructible. > - The caller cannot be in a try block, an Objective-C block, or a > statement expression. > > Some of these restrictions may be overly strict for some calling > conventions, but they are useful as the "least common denominator" > that should be safe in all cases. When implementing this in Clang, we > found that we were able to reuse some of the checks around goto and > asm goto, since a tail call is sort of like a goto out of the > function's scope. > > Thanks, > Josh