On Fri, Apr 23, 2021 at 4:34 PM Andrew Pinski <pins...@gmail.com> wrote: > > On Fri, Apr 23, 2021 at 2:45 PM Josh Haberman via Gcc <gcc@gcc.gnu.org> wrote: > > > > I wrote more about some motivation for guaranteed tail calls here: > > https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html > > So I was looking into the use case presented in that web page, it is > the same reason why GCC has the computed gotos extension already. > Is there a reason why you can't just use that instead?
Computed gotos solve some of the problems associated with interpreter main loops, but not others. Computed gotos help avoid a bounds check on the switch(). They can sometimes help ensure that the dispatch jmp is replicated at the end of each case, thus giving each case a unique branch prediction address, though this effect is somewhat fragile since the compiler will sometimes decide to merge these paths into one: https://godbolt.org/z/T9a6173WP Computed gotos cannot help with the two main issues mentioned in the article: 1. The larger a function is, and the more complex and connected its control flow, the harder it is for the compiler’s register allocator to keep the most important data in registers. 2. When fast paths and slow paths are intermixed in the same function, the presence of the slow paths compromises the code quality of the fast paths. Tail calls can help solve both of these problems: 1. If we break down the big interpreter function into many small functions, one per operation, we get control over register allocation at each function boundary. As long as each function is simple enough not to spill, this data will be kept in registers continuously through all the fast paths. 2. If we put fast paths and slow paths in separate functions entirely, then the slow paths cannot negatively affect the code generation of the fast paths. We spent a lot of time trying to work within the framework of computed gotos but were unable to get the results we wanted. Tail calls generated substantially better code than anything else we tried. There are also some practical reasons why tail calls are better for us, particularly that they make it easier to reuse the code across different "instantiations" of the interpreter. For the case of protobuf parsing, each protobuf message has a distinct set of fields with different field numbers and types. With tail calls we can create a distinct dispatch table for each message type, but the function pointers in these tables can point to a set of functions that are shared across all message types. Thanks, Josh