On 2020-08-30 05:42, Akim Demaille wrote:
Let's face it: the POSIX spec for Yacc contains a lot of silly things.
For instance, using #define for tokens is silly. enums are the obvious
right choice.
I happen to have an #ifdef test for one of these tokens, which is a
proxy for "y.tab.h has been included, and therefore YYSTYPE is
declared".
(I know about YYSTYPE_IS_DECLARED, but it is not portable!)
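The pattern is something like this (a minimal sketch; IDENT stands
in for one of the grammar's real tokens):

    /* in code that may or may not have seen y.tab.h */
    #ifdef IDENT           /* any token macro serves as the test */
    /* y.tab.h was included, so YYSTYPE is declared */
    extern YYSTYPE yylval;
    #endif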
You can't just casually replace #defines with enums.
C code gets compiled as C++, and that includes Yacc stuff.
Enums behave differently in C++; they are more type safe.
If you actually use the enum type (not only the constants), you
run into the problem that the constants start at 257. Below that,
the range is taken by unnamed tokens corresponding to characters.
If you introduce some

    enum YYTOKTYPE {
        IDENT = 257,
        ...
    };
and then don't just use the constants, but do the equivalent
of this somewhere in the code:

    enum YYTOKTYPE tok = 'A';

it will not compile as C++; and it's "morally wrong" in C also.
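Such assignments aren't contrived, either; a classic Yacc lexer
returns raw character codes for single-character tokens, so values
below 257 flow through the same variables. A sketch (IDENT again
stands in for a real token):

    #include <stdio.h>
    #include "y.tab.h"       /* declares IDENT */

    int yylex(void)
    {
        int c = getchar();
        if (c == '+' || c == '(' || c == ')')
            return c;        /* unnamed token: the character's own code */
        /* ... scan an identifier ... */
        return IDENT;        /* named token: 257 or above */
    }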
That said, I've written switch statements on these tokens,
intended to handle all of the named cases. If they were enums,
the cases I missed would have been diagnosed.
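With a genuine enum type, GCC and Clang diagnose the omission under
-Wswitch (enabled by -Wall), at least when the switch has no default.
A sketch with made-up token names:

    enum YYTOKTYPE { IDENT = 257, NUMBER, STRING };

    const char *token_name(enum YYTOKTYPE tok)
    {
        switch (tok) {       /* -Wswitch: 'STRING' not handled in switch */
        case IDENT:  return "IDENT";
        case NUMBER: return "NUMBER";
        }
        return "?";
    }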
You can have both. The following compromise is not unheard of:
    enum {
        IDENT = 257,
    #define IDENT IDENT   /* the macro still exists, so #ifdef IDENT keeps working */
        ...
    };
I'd still be nervous implementing that. Unix has documented to
its users that there are just those #defines and nothing else.
That allows you to do something like

    #undef IDENT
    int IDENT = 42;
which won't work if removing the macro still leaves an enum
constant in scope. That may look stupid, but it could happen
somehow by accident.
Maybe some file somewhere ends up including some y.tab.h
through some twisted chain of includes that cannot be
unraveled, and some token constant happens to clash with
something, whereby the #undef provides the workaround.
Forcing the enum constant into file scope takes that
workaround away.
Imagine someone named some Yacc tokens TRUE and FALSE, and
then the y.tab.h is pulled into code that defines these
things.
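Concretely, something like (a sketch; the names and the clash are
hypothetical):

    /* y.tab.h arrived through the include chain;
       it names tokens TRUE and FALSE */
    #undef TRUE
    #undef FALSE
    typedef enum { FALSE, TRUE } boolean;
    /* fine when the tokens are pure #defines; an error all over
       again if file-scope enum constants TRUE and FALSE remain */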
The table "Internal Limits in yacc" is also a good
source of laugh.
Those are minimum limits. Standards have to define limits. C has
numerous minimum limits, like how many parameters in a function a
conforming implementation must support, how many levels of nesting,
and such. Some of them are laughably low, and were even more so back
in C90, which guaranteed only six significant characters in external
identifiers. If you make a minimum limit too high, it means that some
implementation which cannot provide it is no longer a POSIX-conforming
Yacc.
If you don't specify minimum limits at all, then some joker
will make something that supports 7 tokens and 5 states, and
call it a valid Yacc.
The description of y.output says:

    Limits for internal tables (see Limits) shall also be reported,
    in an implementation-defined manner. (Some implementations may
    use dynamic allocation techniques and have no specific limit
    values to report.)
The minimum Yacc limits in the table in fact look usable for
many applications. You can fit some reasonably featured languages
in 600 states. Not your C++, obviously.
The numbers are likely based on the historical values that were
used in practice.