(Modified from an email originally sent privately to Akim, who persuaded me to make it public. It's really a minor point.)

I only just noticed that %nterm is a bison directive, although I still don't fully understand the motivation. I can't find any reference in the manual, except for its unexplained use in one example. But Akim assures me that it has been accepted for quite some time, at least since 1993.

I noticed it more or less by accident when looking to see what the exact syntax for %token, %type and %[precedence-level] declarations was. And once I noticed it, I added it to a test suite and promptly discovered that this input segfaults on bison 3.0.5:

    %union {int n;}
    %nterm <n> start "a"
    %%
    start: "a" | start "a"

With bison 3.1, the segfault is gone, but the error messages are a bit mysterious:

    nterm.y:4.8-10: error: symbol "a" redefined
     start: "a" | start "a"
            ^^^
    nterm.y:4.1-5: error: rule given for start, which is a token
     start: "a" | start "a"
     ^^^^^

The %nterm grammar is essentially the same as the grammar for the %token directive, and I suppose that it was considered to be the complement for declaring non-terminals. But a non-terminal cannot be aliased to a quoted name. Moreover, non-terminals do not have any equivalent to the token enumeration, so there is no meaningful way to assign a number to a non-terminal.

The attempt to alias a non-terminal, as in the incorrect bison snippet above, should be flagged as an error in the %nterm declaration, not left to create havoc later on in the parse. The misbehaviour of the above sample program is the result of not checking to ensure that the alias syntax is only applied to tokens.

I didn't attempt to do a full analysis of the segfault in bison 3.0.5 since the bug was fixed before it was noticed :-) but it happens at a point in which bison is about to issue a different mysterious error message, alleging that a token's token number had previously been assigned. (In the course of that error report, bison tries to compare the source code locations for two definitions, and segfaults because one of the locations is NULL.)

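For comparison, here is a minimal sketch of what the failing input above was presumably meant to express, with the alias attached to a %token declaration and %nterm listing only the non-terminal. The token name A and the explicit semantic actions are assumptions added for illustration (they are not in the original input), and the exact diagnostics may differ between bison versions:

```yacc
%union {int n;}
/* A is a hypothetical token name; "a" is its alias. */
%token A "a"
/* Only the non-terminal itself is listed after the tag. */
%nterm <n> start
%%
start: "a"        { $$ = 1; }
     | start "a"  { $$ = $1 + 1; }
     ;
```

Since aliases are a property of tokens, the quoted string can then appear in rules without being taken for a non-terminal.
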
In 3.1, a new function is used which merges the attributes of a token and its alias; in the case of the incorrect %nterm declaration, this seems to have the effect of making the alias ("a") into a non-terminal. When "a" is subsequently encountered in a rule and converted into a token alias, bison complains that it was already defined (as a non-terminal). The redefinition appears to then change `start` into a token, which makes it an invalid left-hand side for a production.

For the casual user (or even the not-so-casual user), the error would have been much clearer if the report had been something like:

    nterm.y:2.18-20: error: Quoted strings cannot alias non-terminals
     %nterm <n> start "a"
                      ^^^

But it would probably suffice to just reject the declaration as a syntax error, by changing the grammar so that only IDs can be listed in an %nterm declaration. (The situation is actually unlikely to arise since %nterm is undocumented and, I think, not particularly well-known.)

The various symbol declarations are a bit of a jumble, thanks to backwards compatibility and some unfortunate decisions made before the start of recorded time. For what it's worth, this is my understanding of the different syntaxes (in a kind of EBNF):

    class-declaration      ::= ( %token | %nterm ) tag? ( ID NUMBER? QUOTED? )+
                               ( tag ( ID NUMBER? QUOTED? )+ )*
    precedence-declaration ::= ( %left | %right | %precedence | %nonassoc ) tag?
                               ( ID NUMBER? | QUOTED | CHARACTER )+
    type-declaration       ::= %type tag ( ID | QUOTED | CHARACTER )+

The inconsistency between %token, which allows aliases to be declared, and %precedence declarations, which treat aliases as new independent symbols, is noted in the manual, and apparently is necessary for Posix compliance.

Posix does not require an %nterm declaration, and the Posix grammar for declarations is much simpler, since it doesn't allow multiple tags in a single declaration, and it doesn't allow quoted strings.
(IDENTIFIER here includes character literals):

    declaration ::= ( %token | %type | %left | %right | %nonassoc ) tag?
                    ( IDENTIFIER NUMBER? )+

(Note that the Posix grammar allows %type declarations without a tag.)

The fact that bison's %token and %nterm declarations do allow multiple tags is probably the only justification for the existence of %nterm, since there is no need to predeclare non-terminals. (It could be considered superior to %type because it explicitly states that the targets are non-terminals. On the other hand, it is generally more useful IMHO to group terminals and non-terminals with the same type tag together.)

But it is many decades too late to suggest removing it; my only suggestion is that its grammar be limited to the semantically meaningful:

    token-declaration ::= %token tag? ( ID NUMBER? QUOTED? )+
                          ( tag ( ID NUMBER? QUOTED? )+ )*
    nterm-declaration ::= %nterm tag? ID+ ( tag ID+ )*

On the other hand, there is no real reason not to extend multiple-tag syntax to the other declarations, which would make the syntaxes a little less incoherent:

    token-declaration      ::= %token tag? ( ID NUMBER? QUOTED? )+
                               ( tag ( ID NUMBER? QUOTED? )+ )*
    nterm-declaration      ::= %nterm tag? ID+ ( tag ID+ )*
    precedence-declaration ::= ( %left | %right | %precedence | %nonassoc ) tag?
                               ( ID NUMBER? )+ ( tag ( ID NUMBER? )+ )*
    type-declaration       ::= %type ( tag ( ID | QUOTED | CHARACTER )+ )+

Such a change would not make bison any more or less Posix-compliant than it already is.

Rici.