[C++ coroutines 0/6] Implement C++ coroutines.

Iain Sandoe Sun, 17 Nov 2019 02:24:23 -0800


This patch series is an initial implementation of a coroutine feature,
expected to be standardised in C++20.


Standardisation status (and potential impact on this implementation):
----------------------

The facility was accepted into the working draft for C++20 by WG21 in
February 2019.  During the two following WG21 meetings, design and national
body comments were reviewed, with no significant change resulting.

Mature implementations (several years old) exist in MSVC, clang and EDG,
with some experience of using the clang one in production, so the
underlying principles are thought to be sound.

At this stage, the remaining potential for change comes from two areas of
national body comments that were not resolved during the last WG21 meeting:
(a) handling of the situation where aligned allocation is available.
(b) handling of the situation where a user wants coroutines, but does not
    want exceptions (e.g. a GPU).

It is not expected that the resolution to either of these will produce any
major change.

The current GCC implementation is against n4835 [1].

ABI
---

The various compiler developers have discussed a minimal ABI to allow one
implementation to call coroutines compiled by another; this amounts to:

1. The layout of a public portion of the coroutine frame.
2. A number of compiler builtins that the standard library might use.

The eventual home for the ABI is not decided yet; I will put a draft onto
the wiki this week.

The ABI currently has no target-specific content (a given psABI might elect
to mandate alignment, but the common ABI does not do this).

There is no need to add any new mangling, since the components of this are
regular functions with manipulation of the coroutine via a type-erased handle.

Standard Library impact
-----------------------

The current implementations require addition of only a single header to
the standard library (no change to the runtime).  This header is part of
the patch series.

GCC Implementation outline
--------------------------

The standard's design for coroutines does not decorate the definition of
a coroutine in any way, so that a function is only known to be a coroutine
when one of the keywords (co_await, co_yield, co_return) is encountered.

This means that we cannot special-case such functions from the outset, but
must process them differently when they are finalised - which we do from
"finish_function ()".

At a high level, this design of coroutine produces four pieces from the
original user's function:

  1. A coroutine state frame (taking the logical place of the activation
     record for a regular function).  One item stored in that state is the
     index of the current suspend point.
  2. A "ramp" function.
     This is what the user calls to construct the coroutine frame and start
     the coroutine execution.  This will return some object representing the
     coroutine's eventual return value (or means to continue it when it is
     suspended).
  3. A "resume" function.
     This is what gets called when the coroutine is resumed after being
     suspended.
  4. A "destroy" function.
     This is what gets called when the coroutine state should be destroyed
     and its memory returned.
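
The four pieces can be modelled by hand in plain C++.  The sketch below is
only an illustration for a coroutine that yields 1, 2, ..., limit: the
struct layout, the field names, and the convention of clearing the resume
pointer on completion are assumptions made for this example, not the layout
produced by the patch or fixed by the ABI draft.

```cpp
#include <cassert>

// 1. The coroutine state frame (hypothetical layout).
struct CounterFrame {
  void (*resume_fn)(CounterFrame*);   // assumption: nullptr once finished
  void (*destroy_fn)(CounterFrame*);
  int current;        // stands in for the value held by the promise
  int suspend_index;  // index of the current suspend point
  int limit;          // copy of the function parameter
  int i;              // local variable that lives across suspensions
};

// 3. The "resume" function: continue from the recorded suspend point.
void counter_resume(CounterFrame* f) {
  switch (f->suspend_index) {
  case 0:   // parked at the initial suspend point: enter the body
    f->i = 1;
    break;
  case 1:   // parked at a yield: continue the loop
    ++f->i;
    break;
  default:  // already at the final suspend point: nothing to do
    return;
  }
  if (f->i <= f->limit) {
    f->current = f->i;       // "yield" the value and suspend again
    f->suspend_index = 1;
  } else {
    f->suspend_index = 2;    // final suspend point
    f->resume_fn = nullptr;  // signals completion in this model
  }
}

// 4. The "destroy" function: release the frame.
void counter_destroy(CounterFrame* f) { delete f; }

// 2. The "ramp" function: build the frame and hand it back to the caller.
// This model suspends immediately, as if at an initial suspend point.
CounterFrame* counter_ramp(int limit) {
  return new CounterFrame{counter_resume, counter_destroy, 0, 0, limit, 0};
}

// Caller side: resume until finished, then destroy the frame.
int drain(int limit) {
  CounterFrame* f = counter_ramp(limit);
  int total = 0;
  while (true) {
    assert(f->resume_fn != nullptr);
    f->resume_fn(f);
    if (f->resume_fn == nullptr)
      break;
    total += f->current;
  }
  f->destroy_fn(f);
  return total;
}
```

Note that the caller only ever touches the two function pointers and the
yielded value, which is what makes a type-erased handle workable.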

The standard's coroutines involve cooperation of the user's authored function
with a provided "promise" class, which includes mandatory methods for
handling the state transitions and providing output values.  Most realistic
coroutines will also have one or more 'awaiter' classes that implement the
user's actions for each suspend point.  As we parse (or during template
expansion), the types of the promise and awaiter classes become known, and
can then be verified against the signatures expected by the standard.

Once the function is parsed (and templates expanded) we are able to make the
transformation into the four pieces noted above.

The implementation here takes the approach of a series of AST transforms.
The state machine suspend points are encoded in three internal functions
(one of which represents an exit from scope without cleanups).  These three 
IFNs are lowered early in the middle end, such that the majority of GCC's
optimisers can be run on the resulting output.

As a design choice, we have carried out the outlining of the user's function
in the front end, and taken advantage of the existing middle end's abilities
to inline and DCE where that is profitable.

Since the state machine is actually common to both resumer and destroyer
functions, we make only a single function "actor" that contains both the
resume and destroy paths.  The destroy function is represented by a small
stub that sets a value to signal the use of the destroy path and calls the
actor.  The idea is that optimisation of the state machine need only be done
once; the resume and destroy paths can then be identified, allowing the
middle end's inline and DCE machinery to optimise where profitable, as noted
above.
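
The actor arrangement can be sketched in plain C++ as follows.  This is a
hand-written model under stated assumptions: the Frame fields, the
"destroying" flag, and the "freed" test hook are hypothetical and do not
reflect how the patch encodes the destroy path.

```cpp
#include <cassert>

// Hypothetical frame for a two-step coroutine.
struct Frame {
  int suspend_index = 0;    // which suspend point we are parked at
  bool destroying = false;  // set by the destroy stub
  int value = 0;            // result visible to the caller
  bool* freed = nullptr;    // test hook: records that cleanup ran
};

// The single "actor": one state machine containing both paths.
void actor(Frame* f) {
  assert(f != nullptr);
  if (f->destroying) {
    // Destroy path: run cleanups, then release the frame.
    if (f->freed)
      *f->freed = true;
    delete f;
    return;
  }
  // Resume path: advance from the recorded suspend point.
  switch (f->suspend_index) {
  case 0:
    f->value = 10;
    f->suspend_index = 1;
    break;
  case 1:
    f->value = 20;
    f->suspend_index = 2;
    break;
  default:
    break;
  }
}

// The destroy function is a small stub: flag the destroy path and call
// the actor.
void destroy_stub(Frame* f) {
  f->destroying = true;
  actor(f);
}

// Resume twice, read the value, then destroy through the stub.
int run_actor_demo(bool& freed) {
  Frame* f = new Frame;
  f->freed = &freed;
  actor(f);  // value becomes 10
  actor(f);  // value becomes 20
  int last = f->value;
  destroy_stub(f);
  return last;
}
```

Because the flag is a constant at each call site, inlining the actor into
the stub lets dead-code elimination discard the resume path there.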

The middle end components for this implementation are:
 1. Lower the coroutine builtins that allow the standard library header to
    interact with the coroutine frame (these are fairly simple logical or
    numerical substitutions of values given a coroutine frame pointer).
 2. Lower the IFN that represents the exit from state without cleanup.
    Essentially, this becomes a gimple goto.
 3. Lower the IFNs that represent the state machine paths for the resume and
    destroy cases.
 4. A very late pass that is able to re-size the coroutine frame when there
    are unused entries and therefore choose the minimum allocation for it.

There are no back-end implications to this current design.

GCC Implementation Status
-------------------------

The current implementation should be considered somewhat experimental and is
guarded by a "-fcoroutines" flag.  I have set out to minimise impact on the
compiler (such that with the switch off, coroutines should be a NOP).

The branch has been feature-complete for a few weeks and published on Compiler
Explorer since late September.  I have been keeping a copy of the branch on
my github page, and some bug reports have been filed there (and dealt with).

The only common resource taken is a single bit in the function decl to flag
that this function is determined to be a coroutine.

Patch Series
------------

The patch series is against r278049 (Mon 11th Nov).

There are 6 pieces, to try to localise the areas of interest for reviewers.
However, it would not make sense to commit them except, possibly, as two
(main and testsuite).  I have not tested that the compiler would even build
part-way through this series.

1) Common code and base definitions.

This is the background content, defining the gating flag, keywords etc.

2) Builtins and internal functions.

Definitions of the builtins used by the standard library header and the
internal functions used to implement the state machine.

3) Front end parsing and AST transforms.

This is the largest part of the code, and has essentially two phases:
 1. parse (and template expansion)
 2. analysis and transformation, which does the code generation for the
    state machine.

4) Middle end expanders and transforms

As per the description above.

5) Standard library header.

This is mostly mandated by the standard, although (of course) the decision
to implement the interaction with the coroutine frame by inline builtin
calls is pertinent.

There is no runtime addition for this (the builtins are expanded directly).

6) Testsuite.

There are two chunks of tests.
 1. those that check for correct error handling
 2. those that check for the correct lowering of the state machine
 
Since the second set are checking code-gen, they are run as 'torture' tests
with the default options list.

======

I will put this patch series onto a git branch for those that would prefer
to view it in that form.

thanks
Iain

======

[1] https://wg21.link/n4835
