On Sat, Jan 18, 2020 at 4:54 AM Iain Sandoe <i...@sandoe.co.uk> wrote: > > Hi, > > Thanks to: > > * the reviewers, the code was definitely improved by your reviews. > > * those folks who tested the branch and/or compiler explorer > instance and reported problems with reproducers. > > * WG21 colleagues, especially Lewis and Gor for valuable input > and discussions on the design. > > ===== TL;DR: > > * This is not enabled by default (even for -std=c++2a), it needs -fcoroutines. > > * Like all the C++20 support, it is experimental, perhaps more experimental > than some other pieces because wording is still being amended. > > * The FE/ME tests are run for ALL targets; in principle this should be target- > agnostic, if we see fails then that is probably interesting input for the > ABI > panel. > > * I regstrapped on 64b LE and BE platforms and a 32b LE host with no observed > issues or regressions. > > * it’s just slightly too big to send uncompressed so attached as a bz2. > > * commit is r10-6063-g49789fd08 > > thanks again to all those who helped, > Iain > > ====== The full covering note: > > This is the squashed version of the first 6 patches that were split to > facilitate review. > > The changes to libiberty (7th patch) to support demangling the co_await > operator stand alone and are applied separately. > > The patch series is an initial implementation of a coroutine feature, > expected to be standardised in C++20. > > Standardisation status (and potential impact on this implementation) > -------------------------------------------------------------------- > > The facility was accepted into the working draft for C++20 by WG21 in > February 2019. During following WG21 meetings, design and national body > comments have been reviewed, with no significant change resulting. > > The current GCC implementation is against n4835 [1]. > > At this stage, the remaining potential for change comes from: > > * Areas of national body comments that were not resolved in the version we > have worked to: > (a) handling of the situation where aligned allocation is available. > (b) handling of the situation where a user wants coroutines, but does not > want exceptions (e.g. a GPU). > > * Agreed changes that have not yet been worded in a draft standard that we > have worked to. > > It is not expected that the resolution to these can produce any major > change at this phase of the standardisation process. Such changes should be > limited to the coroutine-specific code. > > ABI > --- > > The various compiler developers 'vendors' have discussed a minimal ABI to > allow one implementation to call coroutines compiled by another. > > This amounts to: > > 1. The layout of a public portion of the coroutine frame. > > Coroutines need to preserve state across suspension points, the storage for > this is called a "coroutine frame". > > The ABI mandates that pointers into the coroutine frame point to an area > begining with two function pointers (to the resume and destroy functions > described below); these are immediately followed by the "promise object" > described in the standard. > > This is sufficient that the builtins can take a coroutine frame pointer and > determine the address of the promise (or call the resume/destroy functions). > > 2. A number of compiler builtins that the standard library might use. > > These are implemented by this patch series. > > 3. This introduces a new operator 'co_await' the mangling for which is also > agreed between vendors (and has an issue filed for that against the upstream > c++abi). Demangling for this is added to libiberty in a separate patch. > > The ABI has currently no target-specific content (a given psABI might elect > to mandate alignment, but the common ABI does not do this). > > Standard Library impact > ----------------------- > > The current implementations require addition of only a single header to > the standard library (no change to the runtime). This header is part of > the patch. > > GCC Implementation outline > -------------------------- > > The standard's design for coroutines does not decorate the definition of > a coroutine in any way, so that a function is only known to be a coroutine > when one of the keywords (co_await, co_yield, co_return) is encountered. > > This means that we cannot special-case such functions from the outset, but > must process them differently when they are finalised - which we do from > "finish_function ()". > > At a high level, this design of coroutine produces four pieces from the > original user's function: > > 1. A coroutine state frame (taking the logical place of the activation > record for a regular function). One item stored in that state is the > index of the current suspend point. > 2. A "ramp" function > This is what the user calls to construct the coroutine frame and start > the coroutine execution. This will return some object representing the > coroutine's eventual return value (or means to continue it when it it > suspended). > 3. A "resume" function. > This is what gets called when a the coroutine is resumed when suspended. > 4. A "destroy" function. > This is what gets called when the coroutine state should be destroyed > and its memory released. > > The standard's coroutines involve cooperation of the user's authored function > with a provided "promise" class, which includes mandatory methods for > handling the state transitions and providing output values. Most realistic > coroutines will also have one or more 'awaiter' classes that implement the > user's actions for each suspend point. As we parse (or during template > expansion) the types of the promise and awaiter classes become known, and can > then be verified against the signatures expected by the standard. > > Once the function is parsed (and templates expanded) we are able to make the > transformation into the four pieces noted above. > > The implementation here takes the approach of a series of AST transforms. > The state machine suspend points are encoded in three internal functions > (one of which represents an exit from scope without cleanups). These three > IFNs are lowered early in the middle end, such that the majority of GCC's > optimisers can be run on the resulting output. > > As a design choice, we have carried out the outlining of the user's function > in the front end, and taken advantage of the existing middle end's abilities > to inline and DCE where that is profitable. > > Since the state machine is actually common to both resumer and destroyer > functions, we make only a single function "actor" that contains both the > resume and destroy paths. The destroy function is represented by a small > stub that sets a value to signal the use of the destroy path and calls the > actor. The idea is that optimisation of the state machine need only be done > once - and then the resume and destroy paths can be identified allowing the > middle end's inline and DCE machinery to optimise as profitable as noted > above. > > The middle end components for this implementation are: > > A pass that: > 1. Lowers the coroutine builtins that allow the standard library header to > interact with the coroutine frame (these fairly simple logical or > numerical substitution of values, given a coroutine frame pointer). > 2. Lowers the IFN that represents the exit from state without cleanup. > Essentially, this becomes a gimple goto. > 3. Sets the final size of the coroutine frame at this stage. > > A second pass (that requires the revised CFG that results from the lowering > of the scope exit IFNs in the first). > > 1. Lower the IFNs that represent the state machine paths for the resume and > destroy cases. > > Patches squashed into this commit: > > [C++ coroutines 1] Common code and base definitions. > > This part of the patch series provides the gating flag, the keywords, > cpp defines etc. > > [C++ coroutines 2] Define builtins and internal functions. > > This part of the patch series provides the builtin functions > used by the standard library code and the internal functions > used to implement lowering of the coroutine state machine. > > [C++ coroutines 3] Front end parsing and transforms. > > There are two parts to this. > > 1. Parsing, template instantiation and diagnostics for the standard- > mandated class entries. > > The user authors a function that becomes a coroutine (lazily) by > making use of any of the co_await, co_yield or co_return keywords. > > Unlike a regular function, where the activation record is placed on the > stack, and is destroyed on function exit, a coroutine has some state that > persists between calls - the 'coroutine frame' (thus analogous to a stack > frame). > > We transform the user's function into three pieces: > 1. A so-called ramp function, that establishes the coroutine frame and > begins execution of the coroutine. > 2. An actor function that contains the state machine corresponding to the > user's suspend/resume structure. > 3. A stub function that calls the actor function in 'destroy' mode. > > The actor function is executed: > * from "resume point 0" by the ramp. > * from resume point N ( > 0 ) for handle.resume() calls. > * from the destroy stub for destroy point N for handle.destroy() calls. > > The C++ coroutine design described in the standard makes use of some helper > methods that are authored in a so-called "promise" class provided by the > user. > > At parse time (or post substitution) the type of the coroutine promise > will be determined. At that point, we can look up the required promise > class methods and issue diagnostics if they are missing or incorrect. To > avoid repeating these actions at code-gen time, we make use of temporary > 'proxy' variables for the coroutine handle and the promise - which will > eventually be instantiated in the coroutine frame. > > Each of the keywords will expand to a code sequence (although co_yield is > just syntactic sugar for a co_await). > > We defer the analysis and transformatin until template expansion is > complete so that we have complete types at that time. > > 2. AST analysis and transformation which performs the code-gen for the > outlined state machine. > > The entry point here is morph_fn_to_coro () which is called from > finish_function () when we have completed any template expansion. > > This is preceded by helper functions that implement the phases below. > > The process proceeds in four phases. > > A Initial framing. > The user's function body is wrapped in the initial and final suspend > points and we begin building the coroutine frame. > We build empty decls for the actor and destroyer functions at this > time too. > When exceptions are enabled, the user's function body will also be > wrapped in a try-catch block with the catch invoking the promise > class 'unhandled_exception' method. > > B Analysis. > The user's function body is analysed to determine the suspend points, > if any, and to capture local variables that might persist across such > suspensions. In most cases, it is not necessary to capture compiler > temporaries, since the tree-lowering nests the suspensions correctly. > However, in the case of a captured reference, there is a lifetime > extension to the end of the full expression - which can mean across a > suspend point in which case it must be promoted to a frame variable. > > At the conclusion of analysis, we have a conservative frame layout and > maps of the local variables to their frame entry points. > > C Build the ramp function. > Carry out the allocation for the coroutine frame (NOTE; the actual size > computation is deferred until late in the middle end to allow for future > optimisations that will be allowed to elide unused frame entries). > We build the return object. > > D Build and expand the actor and destroyer function bodies. > The destroyer is a trivial shim that sets a bit to indicate that the > destroy dispatcher should be used and then calls into the actor. > > The actor function is the implementation of the user's state machine. > The current suspend point is noted in an index. > Each suspend point is encoded as a pair of internal functions, one in > the relevant dispatcher, and one representing the suspend point. > > During this process, the user's local variables and the proxies for the > self-handle and the promise class instanceare re-written to their > coroutine frame equivalents. > > The complete bodies for the ramp, actor and destroy function are passed > back to finish_function for folding and gimplification. > > [C++ coroutines 4] Middle end expanders and transforms. > > The first part of this is a pass that provides: > * expansion of the library support builtins, these are simple boolean > or numerical substitutions. > > * The functionality of implementing an exit from scope without cleanup > is performed here by lowering an IFN to a gimple goto. > > This pass has to run for non-coroutine functions, since functions calling > the builtins are not necessarily coroutines (i.e. they are implementing the > library interfaces which may be called from anywhere). > > The second part is the expansion of the coroutine IFNs that describe the > state machine connections to the dispatchers. This only has to be run > for functions that are coroutine components. The work done by this pass > is: > > In the front end we construct a single actor function that contains > the coroutine state machine. > > The actor function has three entry conditions: > 1. from the ramp, resume point 0 - to initial-suspend. > 2. when resume () is executed (resume point N). > 3. from the destroy () shim when that is executed. > > The actor function begins with two dispatchers; one for resume and > one for destroy (where the initial entry from the ramp is a special- > case of resume point 0). > > Each suspend point and each dispatch entry is marked with an IFN such > that we can connect the relevant dispatchers to their target labels. > > So, if we have: > > CO_YIELD (NUM, FINAL, RES_LAB, DEST_LAB, FRAME_PTR) > > This is await point NUM, and is the final await if FINAL is non-zero. > The resume point is RES_LAB, and the destroy point is DEST_LAB. > > We expect to find a CO_ACTOR (NUM) in the resume dispatcher and a > CO_ACTOR (NUM+1) in the destroy dispatcher. > > Initially, the intent of keeping the resume and destroy paths together > is that the conditionals controlling them are identical, and thus there > would be duplication of any optimisation of those paths if the split > were earlier. > > Subsequent inlining of the actor (and DCE) is then able to extract the > resume and destroy paths as separate functions if that is found > profitable by the optimisers. > > Once we have remade the connections to their correct postions, we elide > the labels that the front end inserted. > > [C++ coroutines 5] Standard library header. > > This provides the interfaces mandated by the standard and implements > the interaction with the coroutine frame by means of inline use of > builtins expanded at compile-time. There should be a 1:1 correspondence > with the standard sections which are cross-referenced. > > There is no runtime content. > > At this stage, we have the content in an inline namespace "__n4835" for > the CD we worked to. > > [C++ coroutines 6] Testsuite. > > There are two categories of test: > > 1. Checks for correctly formed source code and the error reporting. > 2. Checks for transformation and code-gen. > > The second set are run as 'torture' tests for the standard options > set, including LTO. These are also intentionally run with no options > provided (from the coroutines.exp script). > > gcc/ChangeLog: > > 2020-01-18 Iain Sandoe <i...@sandoe.co.uk> > > * Makefile.in: Add coroutine-passes.o. > * builtin-types.def (BT_CONST_SIZE): New. > (BT_FN_BOOL_PTR): New. > (BT_FN_PTR_PTR_CONST_SIZE_BOOL): New. > * builtins.def (DEF_COROUTINE_BUILTIN): New. > * coroutine-builtins.def: New file. > * coroutine-passes.cc: New file.
There are tree res_tgt = TREE_OPERAND (gimple_call_arg (stmt, 2), 0); tree &res_dest = destinations.get_or_insert (idx, &existed); if (existed && dump_file) Why does this behavior depend on dump_file? { fprintf ( dump_file, "duplicate YIELD RESUME point (" HOST_WIDE_INT_PRINT_DEC ") ?\n", idx); print_gimple_stmt (dump_file, stmt, 0, TDF_VOPS|TDF_MEMSYMS); } else res_dest = res_tgt; H.J.