Hi,

On 7/5/22 09:55, 徐研辉 wrote:

> I encountered some trouble when attempting to recognize C identifiers via flex. For the program shown below, the identifier returned by flex was "main", whereas bison got "main(){ return 3; }".

This is a rather common mistake: yytext is a pointer into the input buffer, which contains more data beyond the current token. You need to look at the yyleng variable to get the length of the current token, and you may only look at the text of the token before fetching the next token.

An expensive, but safe, way is to return a copy

    {IDENTIFIER}    { yylval.id = strndup(yytext, yyleng); return ID; }

This allows rules like

    struct_definition: "struct" ID '{' struct_members '}' ';'

to work properly, because the yytext pointer can become invalid during the yylex call that gets the '{' token.

The alternative is something like

    %union {
        struct {
            char const *text;
            unsigned int leng;
        } sv;
        ...
    }
    %token<sv> IDENTIFIER

to get a stringview-like type into yylval, and then use

    {IDENTIFIER}    { yylval.sv.text = yytext; yylval.sv.leng = yyleng; return IDENTIFIER; }

but this needs to be evaluated immediately, before the parser fetches the next token, either in a mid-rule action, like

    goto_statement: "goto" IDENTIFIER <label>{ $$ = find_or_create_label($2.text, $2.leng); } ';' ;

or in a sub-production

    goto_statement: "goto" label ';' ;
    label: IDENTIFIER { $$ = find_or_create_label($1.text, $1.leng); } ;

Hope this helps,

   Simon