Hi,

On 7/5/22 09:55, 徐研辉 wrote:

I ran into some trouble when trying to recognize C identifiers with flex. For the
program shown below, the identifier flex returned was "main", whereas bison
got "main(){
    return 3;
}".

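This is a rather common mistake: yytext is a pointer into flex's input buffer, which contains more data beyond the current token. flex NUL-terminates the token only while that token's action runs; fetching the next token restores the overwritten character, so a saved yytext pointer then reads on into the rest of the input. Use the yyleng variable to get the length of the current token, and only look at the text of the token before fetching the next token.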
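To make the effect visible outside of flex, here is a small toy model in plain C. It is not flex's actual code: the names toy_scanner and toy_next are made up for this illustration, and the tokenizer is deliberately simplistic (identifiers, or single characters). It only mimics the idea of terminating a token in place and undoing that when the next token is fetched.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy stand-in for a flex-style scanner (hypothetical, for illustration).
 * Tokens are NUL-terminated in place inside the input buffer; the
 * overwritten character is put back when the next token is fetched. */
struct toy_scanner {
    char  *buf;        /* writable input buffer */
    size_t pos;        /* start of the not-yet-scanned input */
    char  *hold_at;    /* where a NUL was written, or NULL */
    char   hold_char;  /* the character that NUL replaced */
};

/* Returns a pointer to the next token and stores its length in *leng.
 * Returns NULL at end of input. The returned pointer plays the role of
 * yytext, *leng the role of yyleng. */
static char *toy_next(struct toy_scanner *s, size_t *leng)
{
    if (s->hold_at != NULL) {
        /* Undo the previous token's terminator. Any pointer saved from
         * the previous call now "sees" the following input again. */
        *s->hold_at = s->hold_char;
        s->hold_at = NULL;
    }
    while (isspace((unsigned char)s->buf[s->pos]))
        s->pos++;
    if (s->buf[s->pos] == '\0')
        return NULL;

    size_t start = s->pos;
    if (isalpha((unsigned char)s->buf[s->pos]) || s->buf[s->pos] == '_') {
        /* identifier: letters, digits and underscores */
        while (isalnum((unsigned char)s->buf[s->pos]) || s->buf[s->pos] == '_')
            s->pos++;
    } else {
        s->pos++;      /* any other character is a one-character token */
    }

    /* Terminate the token in place, remembering what was overwritten. */
    s->hold_at = s->buf + s->pos;
    s->hold_char = *s->hold_at;
    *s->hold_at = '\0';

    *leng = s->pos - start;
    return s->buf + start;
}
```

With the input "main(){ return 3; }", the pointer returned by the first call reads as "main" right away, as "main(" once the second token has been fetched, and as the whole input once scanning has finished, which matches the symptom described above.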

An expensive, but safe, way is to return a copy:

    {IDENTIFIER}    yylval.id = strndup(yytext, yyleng); return ID;

This allows rules like

    struct_definition: "struct" ID '{' struct_members '}' ';'

to work properly, because the yytext pointer can become invalid during the yylex call that gets the '{' token.
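Note that strndup comes from POSIX rather than from older ISO C standards, so it may be missing on some platforms. A minimal replacement could look like the sketch below; copy_token is a hypothetical name, not something flex or bison provide. Unlike strndup it copies exactly len bytes and does not stop early at an embedded NUL, which is fine when len comes from yyleng.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a NUL-terminated heap copy of the first
 * len bytes of text, or NULL if the allocation fails.
 * The caller owns the result and must free() it. */
char *copy_token(const char *text, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

In the scanner rule this would be used as yylval.id = copy_token(yytext, yyleng). Either way, the copy has to be freed by whoever consumes the token, for example in the parser action that uses it; Bison's %destructor directive can release values that get discarded during error recovery.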

The alternative is something like

    %union {
        struct {
            char const *text;
            unsigned int leng;
        } sv;
        ...
    }
    %token<sv> IDENTIFIER

to get a stringview-like type into yylval, and then use

    {IDENTIFIER}    {
                        yylval.sv.text = yytext;
                        yylval.sv.leng = yyleng;
                        return IDENTIFIER;
                    }

but this needs to be evaluated immediately, either in a mid-rule action, like

    goto_statement:
        "goto"
        IDENTIFIER
        <label>{
            $$ = find_or_create_label($2.text, $2.leng);
        }
        ';'
    ;

or in a sub-production

    goto_statement: "goto" label ';'

    label: IDENTIFIER
        { $$ = find_or_create_label($1.text, $1.leng); }
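For completeness, here is one way find_or_create_label could be written. This is only a sketch under my own assumptions: the struct label layout and the simple linked list are invented for illustration, and the grammar would need a matching struct label *label; member in the %union. The point is that the function compares by length (the text is not NUL-terminated at leng) and makes its own copy, so nothing refers to the scanner's buffer after it returns.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical label record; name is an owned, NUL-terminated copy. */
struct label {
    struct label *next;
    size_t        leng;
    char          name[];  /* flexible array member */
};

static struct label *labels;  /* head of the list of known labels */

/* Looks up the label named by the first leng bytes of text and
 * creates it if it does not exist yet. Returns NULL only if the
 * allocation fails. text does not need to be NUL-terminated. */
struct label *find_or_create_label(char const *text, unsigned int leng)
{
    for (struct label *l = labels; l != NULL; l = l->next)
        if (l->leng == leng && memcmp(l->name, text, leng) == 0)
            return l;

    struct label *l = malloc(sizeof *l + leng + 1);
    if (l == NULL)
        return NULL;
    memcpy(l->name, text, leng);  /* copy out of the scanner buffer */
    l->name[leng] = '\0';
    l->leng = leng;
    l->next = labels;
    labels = l;
    return l;
}
```

A linear list is enough to show the idea; a real compiler would more likely use a hash table, and would probably keep labels per function rather than in one global list.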

Hope this helps,

   Simon
