Hi all,

I believe that typed mid-rule actions are ready, except for the lack of documentation. Please find below the proposed documentation, currently in the ad/typed-midrule branch, and available as a PDF here:
https://www.lrde.epita.fr/~akim/private/bison/bison.pdf

I would also like to enforce consistency in Bison, which uses both ‘midrule’ and ‘mid-rule’. I am in favor of ‘midrule’ (shorter, consistent with our move to ‘lookahead’ instead of ‘look-ahead’, more consistent between code and doc as there is no conversion from dash to underscore, etc.), but I am not a native speaker! Which one should we stick to? (FWIW, neither appears in the manual of YACC; only ‘An action appearing in the middle of a rule […]’ does.)

I must have gotten rusty in Texinfo, as both of my @ref's to {Typed Mid-Rule Actions} appear as ‘See ⟨undefined⟩ [Typed Mid-Rule Actions], page ⟨undefined⟩’ in the PDF, and I have no idea why. The section doesn’t appear in the table of contents either. Info and HTML are fine (latest gnulib).

Thanks in advance!

commit 962317f490ee018fd6f6177495c356e9effc92a3
Author: Akim Demaille <[email protected]>
Date:   Sun Aug 12 10:49:29 2018 +0200

    doc: typed mid-rule actions

    * doc/bison.texi (Mid-Rule Actions): Restructure to insert...
    (Typed Mid-Rule Actions): this new section.
    Move the manual translation of mid-rule actions into regular actions to...
    (Mid-Rule Action Translation): here.

diff --git a/doc/bison.texi b/doc/bison.texi
index 74ad9e5d..bad88cdb 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -225,6 +225,7 @@ Defining Language Semantics
 Actions in Mid-Rule

 * Using Mid-Rule Actions:: Putting an action in the middle of a rule.
+* Typed Mid-Rule Actions:: Specifying the semantic type of their values.
 * Mid-Rule Action Translation:: How mid-rule actions are actually processed.
 * Mid-Rule Conflicts:: Mid-rule actions can cause conflicts.

@@ -4071,6 +4072,7 @@ are executed before the parser even recognizes the following components.

 @menu
 * Using Mid-Rule Actions:: Putting an action in the middle of a rule.
+* Typed Mid-Rule Actions:: Specifying the semantic type of their values.
 * Mid-Rule Action Translation:: How mid-rule actions are actually processed.
 * Mid-Rule Conflicts:: Mid-rule actions can cause conflicts.
 @end menu
@@ -4158,64 +4160,86 @@ earlier action is used to restore the prior list of variables. This
 removes the temporary @code{let}-variable from the list so that it won't
 appear to exist while the rest of the program is parsed.

+Because the types of the semantic values of mid-rule actions are unknown to
+Bison, type-based features (e.g., @samp{%printer}, @samp{%destructor}) do
+not work, which could result in memory leaks. They also forbid the use of
+the @code{variant} implementation of the @code{api.value.type} in C++
+(@pxref{C++ Variants}).
+
+@xref{Typed Mid-Rule Actions}, for one way to address this issue, and
+@ref{Mid-Rule Action Translation}, for another: turning mid-rule actions
+into regular actions.
+
+
+@node Typed Mid-Rule Actions
+@subsubsection Typed Mid-Rule Actions
+
 @findex %destructor
 @cindex discarded symbols, mid-rule actions
 @cindex error recovery, mid-rule actions
 In the above example, if the parser initiates error recovery (@pxref{Error
 Recovery}) while parsing the tokens in the embedded statement @code{stmt},
 it might discard the previous semantic context @code{$<context>5} without
-restoring it.
-Thus, @code{$<context>5} needs a destructor (@pxref{Destructor Decl, , Freeing
-Discarded Symbols}).
-However, Bison currently provides no means to declare a destructor specific to
-a particular mid-rule action's semantic value.
-
-One solution is to bury the mid-rule action inside a nonterminal symbol and to
-declare a destructor for that symbol:
+restoring it. Thus, @code{$<context>5} needs a destructor
+(@pxref{Destructor Decl, , Freeing Discarded Symbols}), and Bison needs the
+type of the semantic value (@code{context}) to select the right destructor.

-@example
-@group
-%type <context> let
-%destructor @{ pop_context ($$); @} let
-@end group
+As an extension to Yacc's mid-rule actions, Bison offers a means to type
+their semantic value: specify its type tag (@samp{<...>}) before the mid-rule
+action.

-%%
+Consider the previous example, with an untyped mid-rule action:

+@example
 @group
 stmt:
-  let stmt
+  "let" '(' var ')'
     @{
-      $$ = $2;
-      pop_context ($let);
-    @};
+      $<context>$ = push_context (); // ***
+      declare_variable ($3);
+    @}
+  stmt
+    @{
+      $$ = $6;
+      pop_context ($<context>5);     // ***
+    @}
 @end group
+@end example
+
+@noindent
+If instead you write:

+@example
 @group
-let:
+stmt:
   "let" '(' var ')'
-    @{
-      $let = push_context ();
+    <context>@{                       // ***
+      $$ = push_context ();          // ***
       declare_variable ($3);
-    @};
-
+    @}
+  stmt
+    @{
+      $$ = $6;
+      pop_context ($5);              // ***
+    @}
 @end group
 @end example

 @noindent
-Note that the action is now at the end of its rule.
-Any mid-rule action can be converted to an end-of-rule action in this way, and
-this is what Bison actually does to implement mid-rule actions.
+then @code{%printer} and @code{%destructor} work properly (no more leaks!),
+C++ @code{variant}s can be used, and redundancy is reduced (@code{<context>}
+is specified once).
+

 @node Mid-Rule Action Translation
 @subsubsection Mid-Rule Action Translation
 @vindex $@@@var{n}
 @vindex @@@var{n}

-As hinted earlier, mid-rule actions are actually transformed into regular
-rules and actions. The various reports generated by Bison (textual,
-graphical, etc., see @ref{Understanding, , Understanding Your Parser})
-reveal this translation, best explained by means of an example. The
-following rule:
+Mid-rule actions are actually transformed into regular rules and actions.
+The various reports generated by Bison (textual, graphical, etc., see
+@ref{Understanding, , Understanding Your Parser}) reveal this translation,
+best explained by means of an example. The following rule:

 @example
 exp: @{ a(); @} "b" @{ c(); @} @{ d(); @} "e" @{ f(); @};
@@ -4273,6 +4297,45 @@ mid.y:2.19-31: warning: unused value: $3
 @end group
 @end example

+@sp 1
+
+It is sometimes useful to turn mid-rule actions into regular actions, e.g.,
+to factor them, or to escape from their limitations. For instance, as an
+alternative to @emph{typed} mid-rule actions, you may bury the mid-rule
+action inside a nonterminal symbol and declare a printer and a destructor
+for that symbol:
+
+@example
+@group
+%type <context> let
+%destructor @{ pop_context ($$); @} let
+%printer @{ print_context (yyo, $$); @} let
+@end group
+
+%%
+
+@group
+stmt:
+  let stmt
+    @{
+      $$ = $2;
+      pop_context ($let);
+    @};
+@end group
+
+@group
+let:
+  "let" '(' var ')'
+    @{
+      $let = push_context ();
+      declare_variable ($var);
+    @};
+
+@end group
+@end example
+
+
+
 @node Mid-Rule Conflicts
 @subsubsection Conflicts due to Mid-Rule Actions

@@ -10523,7 +10586,7 @@ To enable variant-based semantic values, set @code{%define} variable
 @code{%union} is ignored, and instead of using the name of the fields of the
 @code{%union} to ``type'' the symbols, use genuine types.

-For instance, instead of
+For instance, instead of:

 @example
 %union
@@ -10536,7 +10599,7 @@ For instance, instead of
 @end example

 @noindent
-write
+write:

 @example
 %token <int> NUMBER;
@@ -10555,7 +10618,10 @@ Variants are stricter than unions. When based on unions, you may play any
 dirty game with @code{yylval}, say storing an @code{int}, reading a
 @code{char*}, and then storing a @code{double} in it. This is no longer
 possible with variants: they must be initialized, then assigned to, and
-eventually, destroyed.
+eventually, destroyed. As a matter of fact, Bison variants forbid the use
+of alternative types such as @samp{$<int>2} or @samp{$<std::string>$}, even
+in mid-rule actions. It is mandatory to use typed mid-rule actions
+(@pxref{Typed Mid-Rule Actions}).

 @deftypemethod {semantic_type} {T&} build<T> ()
 Initialize, but leave empty. Returns the address where the actual value may
@@ -10575,10 +10641,13 @@ Boost.Variant not only stores the value, but also a tag specifying its type.
 But the parser already ``knows'' the type of the semantic value, so that
 would be duplicating the information.

+We do not use C++17's @code{std::variant} either: we want to support all the
+C++ standards, and of course @code{std::variant} also stores a tag to record
+the current type.
+
 Therefore we developed light-weight variants whose type tag is external (so
-they are really like @code{unions} for C++ actually). But our code is much
-less mature that Boost.Variant. So there is a number of limitations in
-(the current implementation of) variants:
+they are really like @code{unions} for C++ actually). There is a number of
+limitations in (the current implementation of) variants:
 @itemize
 @item
 Alignment must be enforced: values should be aligned in memory according to
@@ -10588,6 +10657,9 @@ therefore, since, as far as we know, @code{double} is the most demanding
 type on all platforms, alignments are enforced for @code{double} whatever
 types are actually used. This may waste space in some cases.

+@item
+Move semantics is not yet supported, but will soon be added.
+
 @item
 There might be portability issues we are not aware of.
 @end itemize
