Hi Christian,

On Fri, Aug 14, 2020 at 11:11 AM Christian Schoenebeck <schoeneb...@crudebyte.com> wrote:
> On Thursday, 13 August 2020 07:49:52 CEST Giacinto Cifelli wrote:
> > Hi all,
> >
> > I am wondering whether it is possible to interpret a C preprocessor (the
> > second preprocessor, not the one expanding trigraphs and removing "\\\n")
> > or an m4 grammar through Bison, and whether this has already been done.
> > I think this kind of tool does not produce a type-2 Chomsky grammar,
> > rather a type-1 or even type-0.
>
> The common classification of languages like C is, I think, "attributed
> context-free language", and that is Chomsky type 2.
>
> If you just need to handle the preprocessor part, then all you need is a
> lexer with a start-condition stack enabled. A parser (e.g. Bison) only
> becomes relevant if you also need to process the aspects that come after
> the preprocessor.
>
> > Any idea how to build something like an AST from it?
> >
> > The purpose would be to use it in a text editor, to know how to format,
> > for example, a block between #if/#endif according to the condition (for
> > example, it could be greyed out if false).
>
> Just to give you a basic idea of how this can be done, e.g. with Flex, *very*
> roughly (i.e. you have to complete it yourself):
>
>
> /* yyscanner and yyextra as used below assume a reentrant scanner */
> %option reentrant
> /* enable functions yy_push_state(), yy_pop_state(), yy_top_state() */
> %option stack
>
> /* inclusive scanner conditions */
> %s PREPROC_BODY_USE
> /* exclusive scanner conditions */
> %x PREPROC_DEFINE PREPROC_DEFINE_BODY PREPROC_IF PREPROC_BODY_EAT
>
> DIGIT [0-9]
> ID    [a-zA-Z_][a-zA-Z0-9_]*
>
> %%
>
>  /* #define <name> <body> */
>
> <*>"#define"[ \t]* {
>     yy_push_state(PREPROC_DEFINE, yyscanner);
>     yyextra->token = PreprocessorToken(yytext);
>     return PREPROC_TOKEN_TYPE;
> }
>
> <PREPROC_DEFINE>{ID} {
>     yy_pop_state(yyscanner);
>     yy_push_state(PREPROC_DEFINE_BODY, yyscanner);
>     yyextra->macro_name = yytext;
>     yyextra->token = PreprocessorToken(yytext);
>     return PREPROC_TOKEN_TYPE;
> }
>
>  /* the macro body runs to the end of the line */
> <PREPROC_DEFINE_BODY>[^\n]* {
>     yy_pop_state(yyscanner);
>     yyextra->token = PreprocessorToken(yytext);
>     yyextra->macro_table[yyextra->macro_name] = yytext;
>     return PREPROC_TOKEN_TYPE;
> }
>
>
>  /*
>     #if <condition>
>     <body>
>     #endif
>  */
>
> <*>"#if"[ \t]* {
>     yy_push_state(PREPROC_IF, yyscanner);
>     yyextra->token = PreprocessorToken(yytext);
>     return PREPROC_TOKEN_TYPE;
> }
>
> <PREPROC_IF>{ID} {
>     yy_pop_state(yyscanner);
>     if (evaluate(yyextra->macro_table[yytext]))
>         yy_push_state(PREPROC_BODY_USE, yyscanner);
>     else
>         yy_push_state(PREPROC_BODY_EAT, yyscanner);
>     yyextra->token = PreprocessorToken(yytext);
>     return PREPROC_TOKEN_TYPE;
> }
>
>  /* must come before the "eat" rule: on a tie in match length Flex picks
>     the rule listed first, otherwise #endif would be eaten as well */
> <*>.*"#endif" {
>     yy_pop_state(yyscanner);
>     yyextra->token = PreprocessorToken(yytext);
>     return PREPROC_TOKEN_TYPE;
> }
>
> <PREPROC_BODY_EAT>.* /* eat up code block filtered out by preprocessor */
>
>  /* Language keywords */
>
> if|else|const|switch|case|int|unsigned {
>     yyextra->token = KeywordToken(yytext);
>     return KEYWORD_TOKEN_TYPE;
> }
>
>  /* String literal */
>
> \"[^"]*\" {
>     yyextra->token = StringLiteralToken(yytext);
>     return STRING_LITERAL_TYPE;
> }
>
>  /* Number literal */
>
> {DIGIT}+("."{DIGIT}+)? {
>     yyextra->token = NumberLiteralToken(yytext);
>     return NUMBER_LITERAL_TYPE;
> }
>
>  /* Other tokens */
>
> <*>. {
>     yyextra->token = OtherToken(yytext);
>     return OTHER_TOKEN_TYPE;
> }
>
> %%
>

Thank you for taking the time to answer. Unfortunately, this isn't exactly
what I was looking for: I am more interested in building a structure from the
macro syntax than in simply expanding the macros.

Regards,
Giacinto
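
P.S. To illustrate what "building a structure" could mean here, the following is a minimal C++ sketch, not a definitive implementation. It assumes a line-oriented input and only recognises #if/#ifdef/#ifndef and #endif at the start of a line; #else and #elif are simply kept as text lines inside the enclosing block, and "# if" with a space after the hash is not handled. The names `Node`, `build_tree`, `ltrim` and `starts_with` are hypothetical and belong to neither Flex nor Bison. The idea is that an editor could walk the resulting tree and use the recorded line ranges to grey out a block.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type: a plain text line or a conditional block.
struct Node {
    enum Kind { Text, Conditional };
    Kind kind = Text;
    std::string text;                   // the line itself, or the #if directive
    int first_line = 0, last_line = 0;  // 1-based source range
    std::vector<std::unique_ptr<Node>> children;  // used by Conditional only
};

// Strip leading blanks so that "  #if X" is recognised as a directive.
static std::string ltrim(const std::string& s) {
    std::size_t pos = s.find_first_not_of(" \t");
    return pos == std::string::npos ? std::string() : s.substr(pos);
}

static bool starts_with(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Build a tree of nested conditional blocks from the source text.
// Returns a synthetic root whose children are the top-level items.
std::unique_ptr<Node> build_tree(const std::string& source) {
    auto root = std::make_unique<Node>();
    root->kind = Node::Conditional;
    root->text = "<root>";
    std::vector<Node*> open{root.get()};  // stack of currently open blocks

    std::istringstream in(source);
    std::string line;
    int lineno = 0;
    while (std::getline(in, line)) {
        ++lineno;
        assert(!open.empty());  // the root is never popped
        const std::string t = ltrim(line);
        if (starts_with(t, "#if")) {  // also covers #ifdef and #ifndef
            auto node = std::make_unique<Node>();
            node->kind = Node::Conditional;
            node->text = t;
            node->first_line = lineno;
            Node* raw = node.get();
            open.back()->children.push_back(std::move(node));
            open.push_back(raw);
        } else if (starts_with(t, "#endif")) {
            if (open.size() > 1) {  // silently ignore an unbalanced #endif
                open.back()->last_line = lineno;
                open.pop_back();
            }
        } else {
            auto node = std::make_unique<Node>();
            node->text = line;
            node->first_line = node->last_line = lineno;
            open.back()->children.push_back(std::move(node));
        }
    }
    root->first_line = 1;
    root->last_line = lineno;
    return root;
}
```

Calling `build_tree` on a buffer yields one `Conditional` node per #if block, nested as in the source, each carrying the line range it spans. Evaluating the condition is left out on purpose and would be a separate pass over the tree.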