RFC: YYUNDEF vs. YYERRCODE

Akim Demaille Mon, 13 Apr 2020 23:20:25 -0700

Hi all,

Now that yytoken_kind_t is complete and includes all the guys that were
#defined by hand, we can see that we have both YYUNDEF and YYERRCODE available
as return values for yylex.


YYUNDEF is somewhat weird, as its main point is actually YYSYMBOL_YYUNDEF: any 
unexpected value returned by yylex is mapped to YYSYMBOL_YYUNDEF, but for the 
symbol YYSYMBOL_YYUNDEF to exist, the token YYUNDEF must exist.  Yet I can see 
that it would be useful to provide a semantic value to it, if the scanner wants 
to return a bad sequence of characters to the parser, and let it report the 
error and enter recovery mode.
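
To make the token/symbol relationship above concrete, here is a minimal sketch of such a mapping. It is not Bison's generated code: every name in it (TOK_*, SYM_*, translate, symbol_name) and the numeric codes are assumptions chosen for illustration. The point is only that any code the parser does not know about falls through to the "undefined" symbol, which is also what an explicit TOK_UNDEF maps to.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical token codes, as a scanner might return them.  */
enum token { TOK_EOF = 0, TOK_ERRCODE = 256, TOK_UNDEF = 257, TOK_NUM = 258 };

/* Hypothetical internal symbol numbers, as parser tables might index them.  */
enum symbol { SYM_EOF, SYM_ERROR, SYM_UNDEF, SYM_NUM, SYM_PLUS, SYM_COUNT };

/* Map a token code to a symbol.  Anything unexpected becomes SYM_UNDEF,
   so returning TOK_UNDEF explicitly and returning garbage end up in the
   same place.  */
enum symbol
translate (int tok)
{
  switch (tok)
    {
    case TOK_EOF:     return SYM_EOF;
    case TOK_ERRCODE: return SYM_ERROR;
    case TOK_UNDEF:   return SYM_UNDEF;
    case TOK_NUM:     return SYM_NUM;
    case '+':         return SYM_PLUS;
    default:          return SYM_UNDEF;
    }
}

/* Human-readable name of a symbol, for use in error messages.  */
const char *
symbol_name (enum symbol sym)
{
  static const char *const names[SYM_COUNT] =
    { "end of file", "error", "invalid token", "number", "+" };
  assert (sym < SYM_COUNT);
  return names[sym];
}
```

In this sketch a stray '$' and an explicit TOK_UNDEF are indistinguishable to the parser, which is why the token has to exist for the symbol to be reachable on purpose.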

Likewise YYERRCODE (the token kind of the error token) exists because the error 
token exists.  Yet POSIX refers to it, in a rather mysterious fashion 
(https://pubs.opengroup.org/onlinepubs/009695399/utilities/yacc.html):

> The token error shall be reserved for error handling. The name error can be 
> used in grammar rules. It indicates places where the parser can recover from 
> a syntax error. The default value of error shall be 256. Its value can be 
> changed using a %token declaration. The lexical analyzer should not return 
> the value of error.

I do not understand the point of defining the code of the error token
(256), and even allowing that code to be changed, only to then state that "The
lexical analyzer should not return the value of error".  Obviously all the
codes for tokens that are not literal characters should be larger than 255, but
why they single out "error" this way is puzzling to me.

I see no reason to forbid the scanner from returning the error token.  Maybe 
there is something I do not see, but AFAICT the machinery accepts this 
perfectly.

Here's an example with a modified yylex in examples/c/bistromathic:

> $ git diff
> diff --git a/examples/c/bistromathic/parse.y b/examples/c/bistromathic/parse.y
> index a3b34c38..fcab3517 100644
> --- a/examples/c/bistromathic/parse.y
> +++ b/examples/c/bistromathic/parse.y
> @@ -235,6 +235,10 @@ yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc)
>     case '(': return TOK_LPAREN;
>     case ')': return TOK_RPAREN;
> 
> +    case '!':
> +      yyerror (yylloc, "error: user triggered error");
> +      return TOK_YYERRCODE;
> +
>     case '\0': return TOK_YYEOF;
> 
>       // Numbers.
> @@ -276,8 +280,7 @@ yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc)
> 
>       // Stray characters.
>     default:
> -      yyerror (yylloc, "error: invalid character");
> -      return yylex (line, yylval, yylloc);
> +      return TOK_YYUNDEF;
>     }
> }

So in this example '!' is YYERRCODE and '$' is YYUNDEF:

$ ./_build/g9d/examples/c/bistromathic/bistromathic
> 1 ! 2
1.3: error: user triggered error
1.3: syntax error: expected end of file or + or - or * or / or ^ before error
> 1 $ 2
2.3: syntax error: expected end of file or + or - or * or / or ^ before invalid token
> exit

(The error messages with "before" are misleading; I should change that.)

However, we have two competing features: YYUNDEF and YYERRCODE.  And maybe we
should give them two different semantics.

How about using YYERRCODE to enter error-recovery without reporting an error?  
Much like the YYERROR macro in user actions.  That would provide a feature I 
personally felt the need for years ago: emit the error message from the scanner 
(which has more details about it), and enter error recovery.  Currently there's
no way to do that without having two messages: one from the scanner, the other
from the parser.
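
As a rough illustration of the proposed split, here is a toy token loop, not a parser and not Bison's implementation. All names (TOK_*, struct result, run) are hypothetical, and "recovery" is reduced to a counter. It only shows the intended difference: the "undefined" token makes the parser report and recover, while the error token makes it recover silently, on the assumption that the scanner has already reported.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical token codes, as a scanner might return them.  */
enum token { TOK_EOF = 0, TOK_ERRCODE = 256, TOK_UNDEF = 257, TOK_NUM = 258 };

/* What the toy "parser" did while consuming its input.  */
struct result
{
  int parser_messages;  /* Messages emitted by the parser itself.  */
  int recoveries;       /* Times error recovery was entered.  */
};

/* Consume TOKENS up to TOK_EOF, treating the two error tokens
   differently.  */
struct result
run (const int *tokens)
{
  struct result res = { 0, 0 };
  assert (tokens != NULL);
  for (; *tokens != TOK_EOF; ++tokens)
    switch (*tokens)
      {
      case TOK_UNDEF:
        /* The scanner said nothing: the parser reports, then recovers.  */
        fprintf (stderr, "syntax error: invalid token\n");
        ++res.parser_messages;
        ++res.recoveries;
        break;

      case TOK_ERRCODE:
        /* Proposed semantics: the scanner already reported the problem,
           so recover without a second message.  */
        ++res.recoveries;
        break;

      default:
        /* Ordinary tokens are simply consumed here.  */
        break;
      }
  return res;
}
```

With an input containing one of each error token, this sketch enters recovery twice but prints a single parser message, which is the "one message, not two" behavior asked for above.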

WDYT?
