Hi Akim,

> Why not indeed. It will have to be implemented in all our skeletons
> anyway, D included.

Great to hear! I wasn't sure if you were planning to cover all skeletons or
would focus on C.

> My proposal (but returning an array of token numbers rather than token
> names) would work for such a case, but it would be a bit wasteful: we
> would first gather all the possible tokens in an array, and then remove
> the useless ones.
>
> Would that be ok with you?

That would certainly work for us. We are not optimizing for "syntax errors
per second" anyway. My main ask here would be to expose the token numbers
instead of the token names. One can always get the corresponding token name
using a lookup in "yytname", so I would consider an interface which provides
the token numbers as strictly superior.
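To make that concrete, here is a rough sketch of what our filtering would
look like against such an interface. Everything below is illustrative only:
the assumption is that Bison hands us the expected tokens as an array of
numbers, and the helper name and its parameters are made up for this email.

    #include <string>
    #include <vector>

    // 'expected' would come from the hypothetical interface (token numbers);
    // 'ident' and 'abort_p' are yytranslate_ (token::IDENT) and
    // yytranslate_ (token::ABORT_P); 'yytname' is the generated name table.
    std::vector<std::string>
    expected_names (const std::vector<int>& expected,
                    int ident, int abort_p,
                    const char* const* yytname)
    {
      // If IDENT is acceptable, every unreserved keyword is too: drop them.
      bool has_ident = false;
      for (int t : expected)
        has_ident = has_ident || (t == ident);

      std::vector<std::string> res;
      for (int t : expected)
        if (!(has_ident && abort_p <= t))
          res.push_back (yytname[t]);   // number -> name is a plain lookup
      return res;
    }

Going the other way (from names back to numbers) would require string
comparisons against yytname, which is why I consider the numbers the more
useful primitive.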
> Do you have an alternative to propose?

One could also tackle this particular use case from a different angle: we
could introduce the concept of "opaque" rules, i.e. rules which are not
expanded when reporting syntax errors. E.g., if I could define
"unreserved_keyword" as

    unreserved_keyword [opaque]: ABORT_P | ABSOLUTE_P | <...>

Bison should then create the error message

    expected: Identifier, unreserved_keyword

instead of

    expected: Identifier, <long list containing all unreserved keywords>

I have no idea how common such use cases are, though. Also, I am currently
not even sure opaque rules could be efficiently implemented. Probably not
worth it… Especially given that we would still use custom error reporting
for internationalization, even independent of the filtering of the token
list.
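For completeness, this is roughly why custom reporting stays with us no
matter how the filtering question is settled: the sentence structure of the
message itself has to go through our translation machinery. A minimal
sketch, assuming we already have the filtered token names (the gettext usage
is ours, nothing here is a proposed Bison interface):

    #include <libintl.h>
    #include <string>
    #include <vector>
    #define _(msgid) gettext (msgid)

    // Assemble a translated "syntax error" message from the unexpected
    // token and the (already filtered) expected token names.
    std::string
    error_message (const std::string& unexpected,
                   const std::vector<std::string>& expected)
    {
      std::string res = std::string (_("syntax error, unexpected ")) + unexpected;
      if (!expected.empty ())
        {
          res += _(", expecting ");
          for (size_t i = 0; i < expected.size (); ++i)
            {
              if (i != 0)
                res += _(" or ");
              res += expected[i];
            }
        }
      return res;
    }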
Cheers,
Adrian

From: Akim Demaille <[email protected]>
Date: Saturday, 4 January 2020 at 12:15
To: Adrian Vogelsgesang <[email protected]>
Cc: Bison Patches <[email protected]>, Derek Clegg <[email protected]>, Christian Schoenebeck <[email protected]>, Kevin Villela <[email protected]>, Rici Lake <[email protected]>
Subject: Re: custom error messages

Hi Adrian,

> On 3 Jan 2020, at 13:13, Adrian Vogelsgesang <[email protected]>
> wrote:
>
> Hi Akim,
>
> I am happy to see you are looking into improving error reporting.
> We at Tableau also struggled with this. I think your proposed solution,
> in particular "parse.error custom", would improve things a lot for us.

Good!

> Our parser uses the C++ skeleton, though. In case you are interested,
> I could take care of the C++ work, as soon as we know which exact
> interface we want to support.

Why not indeed. It will have to be implemented in all our skeletons
anyway, D included.

> Our grammar is structured similarly to the Postgres grammar
> (https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y).
> If we take a look at ColId, type_function_name, NonReservedWord (lines 15064
> and following), we can see that in most places where IDENT (i.e. identifiers)
> is accepted, "unreserved_keyword" is also accepted. Now, if someone types
>
>     CREATE TABLE $1
>
> there is a syntax error near "$1" and the list of expected tokens would
> include the complete "unreserved_keyword" list, i.e. ~300 keywords. Obviously
> this is pointless.

Indeed. I see your point. This is probably the place where what I have in
mind for Bison 4 would shine.

> Hence, we have custom logic in place which checks if the IDENT token is
> among the expected tokens. If so, we suppress all tokens with an internal
> token number larger than the token number of "ABORT_P" - which happen to be
> all keywords.
>
> You can find our current "solution" attached to this email. You can run this
> Python script against a C++ parser generated by bison and it will inject our
> custom error handling code.

If I read correctly, the core of your approach is here:

>     int ident = SQLParser::yytranslate_ (token::IDENT);
>     int abort_p = SQLParser::yytranslate_ (token::ABORT_P);
>     got = yytname_[yytoken]; if ((yytoken < abort_p) && (got[0] == '"')) ++got;
>     bool expectedIdent = false;
>     int yyn = yypact_[yystate];
>     if (!yy_pact_value_is_default_ (yyn)) {
>       int yyxbegin = yyn < 0 ? -yyn : 0;
>       int yychecklim = yylast_ - yyn + 1;
>       int yyxend = yychecklim < yyntokens_ ? yychecklim : yyntokens_;
>       for (int yyx = yyxbegin; yyx < yyxend; ++yyx) {
>         if (yycheck_[yyx + yyn] == yyx && yyx != yyterror_ &&
>             !yy_table_value_is_error_ (yytable_[yyx + yyn]))
>           if (yyx == ident) { expectedIdent = true; break; }
>       }
>     }
>     if (!yy_pact_value_is_default_ (yyn)) {
>       int yyxbegin = yyn < 0 ? -yyn : 0;
>       int yychecklim = yylast_ - yyn + 1;
>       int yyxend = yychecklim < yyntokens_ ? yychecklim : yyntokens_;
>       for (int yyx = yyxbegin; yyx < yyxend; ++yyx) {
>         if (yycheck_[yyx + yyn] == yyx && yyx != yyterror_ &&
>             !yy_table_value_is_error_ (yytable_[yyx + yyn])) {
>           if (expectedIdent && (yyx >= abort_p)) continue;
>           const char* p = yytname_[yyx]; if ((yyx < abort_p) && (p[0] == '"')) ++p;
>           possibilities.push_back (p);
>         }
>       }
>     }

in particular

>     int ident = SQLParser::yytranslate_ (token::IDENT);
>     int abort_p = SQLParser::yytranslate_ (token::ABORT_P);

and

>     if (yyx == ident) { expectedIdent = true; break; }

which notices when IDENT is there, and then

>     if (expectedIdent && (yyx >= abort_p)) continue;

which filters out the alternatives in that case.

Such a transformation is easier to implement with the symbol numbers, indeed.

My proposal (but returning an array of token numbers rather than token
names) would work for such a case, but it would be a bit wasteful: we
would first gather all the possible tokens in an array, and then remove
the useless ones.

Would that be ok with you?

Do you have an alternative to propose?

I'm not super excited at the idea of exposing all these gory details to the
users; I don't want to commit myself to having to support
yy_pact_value_is_default_ and all these internal variables. I want to stay
high-level.
