On Fri, Apr 23, 2021 at 4:34 PM Andrew Pinski <pins...@gmail.com> wrote:
>
> On Fri, Apr 23, 2021 at 2:45 PM Josh Haberman via Gcc <gcc@gcc.gnu.org> wrote:
> >
> > I wrote more about some motivation for guaranteed tail calls here:
> > https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html
>
> So I was looking into the use case presented in that web page, it is
> the same reason why GCC has the computed gotos extension already.
> Is there a reason why you can't just use that instead?

Computed gotos solve some of the problems associated with interpreter
main loops, but not others.

Computed gotos help avoid a bounds check on the switch(). They can
sometimes help ensure that the dispatch jmp is replicated at the end
of each case, thus giving each case a unique branch prediction
address, though this effect is somewhat fragile since the compiler
will sometimes decide to merge these paths into one:
https://godbolt.org/z/T9a6173WP

Computed gotos cannot help with the two main issues mentioned in the article:
1. The larger a function is, and the more complex and connected its
control flow, the harder it is for the compiler’s register allocator
to keep the most important data in registers.
2. When fast paths and slow paths are intermixed in the same function,
the presence of the slow paths compromises the code quality of the
fast paths.

Tail calls can help solve both of these problems:
1. If we break down the big interpreter function into many small
functions, one per operation, we get control over register allocation
at each function boundary. As long as each function is simple enough
not to spill, this data will be kept in registers continuously through
all the fast paths.
2. If we put fast paths and slow paths in separate functions entirely,
then the slow paths cannot negatively affect the code generation of
the fast paths.

We spent a lot of time trying to work within the framework of computed
gotos but were unable to get the results we wanted. Tail calls
generated substantially better code than anything else we tried.

There are also some practical reasons why tail calls are better for
us, particularly that they make it easier to reuse the code across
different "instantiations" of the interpreter. For the case of
protobuf parsing, each protobuf message has a distinct set of fields
with different field numbers and types. With tail calls we can create
a distinct dispatch table for each message type, but the function
pointers in these tables can point to a set of functions that are
shared across all message types.

Thanks,
Josh

Reply via email to