> On 27 Sept. 2020, at 20:46, Rici Lake <[email protected]> wrote:
>
> Many parser generators do have the option to parse from various roots. One
> interesting case is ANTLR, which provides methods for parsing from *every*
> non-terminal (with names generated from the non-terminal).
Well, that's "cheating" (as you pointed out farther in your message):
ANTLR implements a recursive descent parser, i.e., its very technique
consists in emitting one parsing function per non-terminal. So actually,
I expect that all the LL generators support the free choice of the start
symbol.
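
To make that concrete, here is a minimal hand-written sketch in Python (not
ANTLR output; the toy grammar and the names Parser and parse_from are made
up for this illustration). Each non-terminal is one method, so any of them
can act as the start symbol at no extra cost:

```python
import re


class Parser:
    """Toy recursive descent parser (hypothetical grammar, for illustration):

        expr   : term ('+' term)*
        term   : factor ('*' factor)*
        factor : NUMBER | '(' expr ')'
    """

    def __init__(self, text):
        # Crude tokenizer: keeps numbers and the four operators/parentheses,
        # silently skips everything else (whitespace included).
        self.tokens = re.findall(r"\d+|[+*()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    # One parsing method per non-terminal.
    def expr(self):
        value = self.term()
        while self.peek() == "+":
            self.eat("+")
            value += self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.eat("*")
            value *= self.factor()
        return value

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        tok = self.eat()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)


def parse_from(nonterminal, text):
    """Parse `text` using the method named `nonterminal` as the start symbol."""
    parser = Parser(text)
    result = getattr(parser, nonterminal)()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return result
```

For instance, parse_from("expr", "1+2*3") and parse_from("term", "2*3") both
work, while parse_from("factor", "1+2") is rejected because of the trailing
input. Nothing comparable comes for free from a table-driven LR automaton,
which is built for one start symbol.
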
Bison generates LR parsers, so that does not apply.
> Although the
> vast majority of these interfaces will never be used, it turns out to be
> extremely convenient for debugging grammars (and for didactic purposes,
> such as drawing small parse trees). In ANTLR, these interfaces have little
> or no cost, since it fundamentally produces recursive descent parser
> anyway, but it might still be reasonable to allow "%start *" for parser
> debugging.
>
> Of course, in a C code generator, you most certainly wouldn't want to
> generate dozens (or hundreds) of unused interfaces, so this kind of feature
> would be better implemented by a general call which took a non-terminal
> enumerator as an argument. But that would require that the returned value
> type be the same regardless of non-terminal, which effectively reduces to
> the YYSTYPE union (or whatever it happens to be).
>
> OK, it's not necessarily a great idea to design a production interface
> around a feature only used for debugging.
Exactly :) Reading this sentence reminds me of one of my favorite
scenes in Ocean's 1[0-9]: https://www.youtube.com/watch?v=tcRvN2gtPiw
This feature, "%start *", would generate considerably larger automata.
In the case of Bison's own grammar, I get 450 states (that's only x3,
I was expecting more) *and* additional conflicts (because Bison is
still using LALR for its grammar, so you can still have "subautomata"
that share states).
What I did not anticipate, though, is that it crashes when generating
canonical LR on that grammar. However, I have not yet investigated
the impact of my changes on IELR and canonical LR, so that's a TODO.
Using LR, "%start *" should be safe. You do have a point here.