Hi all,

Now that yytoken_kind_t is complete and includes all the token kinds that used to be #defined by hand, we can see that we have both YYUNDEF and YYERRCODE available as return values for yylex.

YYUNDEF is somewhat weird, as its main point is actually YYSYMBOL_YYUNDEF: any unexpected value returned by yylex is mapped to YYSYMBOL_YYUNDEF, but for the symbol YYSYMBOL_YYUNDEF to exist, the token YYUNDEF must exist. Yet I can see that it would be useful to provide a semantic value to it, if the scanner wants to return a bad sequence of characters to the parser, and let it report the error and enter recovery mode.

Likewise YYERRCODE (the token kind of the error token) exists because the error token exists. Yet POSIX refers to it, in a rather mysterious fashion (https://pubs.opengroup.org/onlinepubs/009695399/utilities/yacc.html):

> The token error shall be reserved for error handling. The name error can be
> used in grammar rules. It indicates places where the parser can recover from
> a syntax error. The default value of error shall be 256. Its value can be
> changed using a %token declaration. The lexical analyzer should not return
> the value of error.

I do not understand the point of defining the code of the error token (256), and even allowing that code to be changed, only to then state that "The lexical analyzer should not return the value of error". Obviously all the codes for tokens that are not literal characters should be larger than 255, but why they single out "error" this way is puzzling to me.

I see no reason to forbid the scanner from returning the error token. Maybe there is something I do not see, but AFAICT the machinery accepts this perfectly.

Here's an example with a modified yylex in examples/c/bistromathic:

> $ git diff
> diff --git a/examples/c/bistromathic/parse.y b/examples/c/bistromathic/parse.y
> index a3b34c38..fcab3517 100644
> --- a/examples/c/bistromathic/parse.y
> +++ b/examples/c/bistromathic/parse.y
> @@ -235,6 +235,10 @@ yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc)
>      case '(': return TOK_LPAREN;
>      case ')': return TOK_RPAREN;
>
> +    case '!':
> +      yyerror (yylloc, "error: user triggered error");
> +      return TOK_YYERRCODE;
> +
>      case '\0': return TOK_YYEOF;
>
>        // Numbers.
> @@ -276,8 +280,7 @@ yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc)
>
>        // Stray characters.
>      default:
> -      yyerror (yylloc, "error: invalid character");
> -      return yylex (line, yylval, yylloc);
> +      return TOK_YYUNDEF;
>      }
>  }

So in this example '!' is YYERRCODE and '$' is YYUNDEF:

$ ./_build/g9d/examples/c/bistromathic/bistromathic
> 1 ! 2
1.3: error: user triggered error
1.3: syntax error: expected end of file or + or - or * or / or ^ before error
> 1 $ 2
2.3: syntax error: expected end of file or + or - or * or / or ^ before invalid token
> exit

(The error messages with "before" are misleading; I should change that.)

However, we have two competing features: YYUNDEF and YYERRCODE. And maybe we should give them two different semantics.

How about using YYERRCODE to enter error recovery without reporting an error? Much like the YYERROR macro in user actions. That would provide a feature I personally felt the need for years ago: emit the error message from the scanner (which has more details about it), and enter error recovery. Currently there's no way to do that without having two messages: one from the scanner, the other from the parser.

WDYT?
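
To make the proposed split concrete, here is a minimal sketch of the two semantics. It is a toy model, not Bison's skeleton: every toy_* name is hypothetical, the numeric codes are only illustrative, and the "grammar" knows just a number token and end of file. The point is only that the error token kind would start recovery silently (the scanner having already printed its own message), while the undefined token kind, and any other unexpected value, would make the parser report the error itself.

```c
#include <stdio.h>

/* Hypothetical token kinds, loosely modeled on yytoken_kind_t.
   The numeric values are illustrative assumptions. */
enum toy_token
{
  TOY_EOF = 0,
  TOY_ERRCODE = 256,  /* stands in for YYERRCODE */
  TOY_UNDEF = 257,    /* stands in for YYUNDEF */
  TOY_NUM = 258
};

/* What the toy parser does with one incoming token. */
enum toy_action
{
  TOY_SHIFT,               /* a token the grammar knows */
  TOY_RECOVER_SILENTLY,    /* proposed behavior for the error token */
  TOY_REPORT_AND_RECOVER   /* behavior for undefined tokens */
};

/* Number of messages emitted by the toy parser itself. */
static int toy_reports = 0;

static enum toy_action
toy_handle (int token)
{
  switch (token)
    {
    case TOY_EOF:
    case TOY_NUM:
      return TOY_SHIFT;

    case TOY_ERRCODE:
      /* The scanner already reported: enter recovery, say nothing. */
      return TOY_RECOVER_SILENTLY;

    default:
      /* TOY_UNDEF and any unexpected value are treated alike:
         the parser reports, then recovers. */
      ++toy_reports;
      fprintf (stderr, "syntax error: invalid token\n");
      return TOY_REPORT_AND_RECOVER;
    }
}
```

With this split, a scanner that wants full control over the message prints it and returns the error token kind; a scanner that just wants to hand the problem to the parser returns the undefined token kind.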
