Greetings, Bison developers,

What would the sentiment be to include lexer support in core Bison?
I've thought for a long time that, for common usage of Bison, it doesn't really help developers to push them into a separate tool like Flex just to enter their token patterns. In theory this gives the developer flexibility to mix and match tools, but in practice they are going to use Bison and Flex together as a combo, for the single combined task of "implement a parser". That's my own gut feeling, anyway, based on my professional experience in language technology.

I've worked on this problem as a hobby project for about a year and a half, and at this point I am ready to share a detailed proposal <https://docs.google.com/document/d/1TuUDPB5RH842U-xbKut16q1Xl3gJYYm8dyYLWUO9G6Q/edit#> as well as a prototype <https://github.com/lexspoon/bison/compare/master...lex-lexer> for the C language backend. I'm wondering what to do next.

To get an idea of what I'm thinking, here is the calculator example, but with a built-in lexer. Take a look at the new "%%tokens" section, halfway through the file. It includes a lexical pattern for each token type. This section replaces the old "%token" declarations: instead of just declaring the tokens, it provides enough information that Bison can generate yylex().

This design has a lot of little niceties beyond the basic feature itself. There is automatic location tracking, a modern approach to Unicode based on UTF-8, a cleaned-up version of mode support, and support for line-oriented syntax, an area of syntax that seems pretty important nowadays. See the proposal for details.
%code {
  #include <stdio.h>   /* printf, fprintf */
  #include <stdlib.h>  /* atoi */
  void yyerror (const char *s);
}

%define api.value.type union
%type <int> expr term fact

%%

input:
  %empty
| input line
;

line:
  NL
| expr NL   { printf ("%d\n", $1); }
| error NL  { yyerrok; }
;

expr:
  expr PLUS term   { $$ = $1 + $3; }
| expr MINUS term  { $$ = $1 - $3; }
| term
;

term:
  term TIMES fact   { $$ = $1 * $3; }
| term DIVIDE fact  { $$ = $1 / $3; }
| fact
;

fact:
  NUM                 { $$ = atoi($1); }
| LPAREN expr RPAREN  { $$ = $2; }
;

%%tokens

DIVIDE: "/"
LPAREN: "("
MINUS: "-"
NL: "\n"
NUM: [0-9]+ ("." [0-9]+)?
PLUS: "+"
RPAREN: ")"
TIMES: "*"
WS: [ \t\r]+ -> skip

%%

void yyerror (const char *s)
{
  fprintf (stderr, "%s\n", s);
}

int main (int argc, char const* argv[])
{
  return yyparse ();
}

That's what I'm thinking. What does the group think about where to go next?

If the sentiment is positive, then perhaps we can talk about what the right kind of design review would be, and what the checklist or process would be before it feels ready to the maintainers for inclusion in the main Git branch. I've opened the document for comments, for anyone who wants to interact in that way.

If the sentiment is not that great, then no hard feelings. I realize this post is coming out of the blue. In that case, I'll leave my fork on GitHub, and I'll regroup.

Lex Spoon
