[il-antlr-interest: 33260] [antlr-interest] C target: unhelpful error messages from the default error handler in trivial cases

Vlad Wed, 20 Jul 2011 18:50:13 -0700

Greetings,

Like apparently many new ANTLR users, I've borrowed the implementation from
the default displayRecognitionError() to implement my own version. Somewhat
unfortunately, this version generates unhelpful/random errors in rather
trivial cases. Here is a full example:


grammar testerrors;

options
{
    language='C';
}

NAME    :   ( 'a'..'z' | 'A'..'Z' | '0'..'9' )+ ;
WS      :   ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; } ;

parse:
    decl ( options { greedy = true; }: ',' decl )* ','? EOF
    ;

decl:
    NAME ':' type
    ;

type:
    'int' | 'float'
    ;

Feeding "A : badtype" into parse() results in:

-memory-(1)  : error 10 : Unexpected token, at offset 3
    near [Index: 0 (Start: 0-Stop: 0) ='<missing <invalid>>', type<0> Line: 1 LinePos:3]
     : Missing <invalid>

What puzzles me is where the <invalid> comes from. It would seem easy to
compute that either 'int' or 'float' token was expected. In the stock error
handler this comes from tokenNames[ex->expecting] evaluated for
ex->expecting being 0. What change to the default implementation is
necessary to make this work correctly?
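
One defensive change is to guard the table lookup so that index 0 is reported as "no specific token recorded" instead of printing tokenNames[0]. The following is only a minimal sketch in standalone C. It does not include the ANTLR headers, so expected_token_name(), format_missing() and their parameters are hypothetical names, and it assumes that 0 is the invalid token type (consistent with the <invalid> in the output above) and that the caller knows the length of the name table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: look up the name of the expected token type.
 * Returns NULL when `expecting` is 0 (assumed to mean "invalid / not set")
 * or lies outside the table, so the caller never prints "<invalid>". */
const char *expected_token_name(const char *const *token_names,
                                size_t count,
                                unsigned int expecting)
{
    assert(token_names != NULL);
    if (expecting == 0 || expecting >= count)
    {
        return NULL;
    }
    return token_names[expecting];
}

/* Hypothetical helper: format the trailing part of the error message.
 * Falls back to a generic text when no single expected token is known. */
int format_missing(char *buf, size_t size,
                   const char *const *token_names, size_t count,
                   unsigned int expecting)
{
    const char *name = expected_token_name(token_names, count, expecting);
    if (name == NULL)
    {
        return snprintf(buf, size, " : Unexpected input (no expected token recorded)");
    }
    return snprintf(buf, size, " : Missing %s", name);
}
```

In a real handler the equivalent check would go in front of the tokenNames[ex->expecting] lookup. It only hides the misleading text; it does not recover the 'int'/'float' alternatives, which would have to come from somewhere else in the exception.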

Similarly, attempting to parse "A :" results in:

-unknown source-(1)  : error 10 : Unexpected token, at offset -1
    near [Index: 0 (Start: 0-Stop: 0) ='<missing <invalid>>', type<0> Line: 1 LinePos:1]
     : Missing <invalid>

Note how the source became "unknown" and the offset became -1. In the
default handler this is determined by "streamName" as follows:

if (ex->streamName == NULL)
{
    if (((pANTLR3_COMMON_TOKEN)(ex->token))->type == ANTLR3_TOKEN_EOF)
    {
        ANTLR3_FPRINTF(stderr, "-end of input-(");
    }
    else
    {
        ANTLR3_FPRINTF(stderr, "-unknown source-(");
    }
}
else
{
    ftext = ex->streamName->to8(ex->streamName);
    ANTLR3_FPRINTF(stderr, "%s(", ftext->chars);
}

It is frankly unexpected that a slightly different mismatch error should
have this effect, since the error type does not influence the branches taken
here at all (it is only examined later in the function). Anyone taking this
function as a blueprint for their own handler would conclude that
ex->streamName is NULL in one case but not the other, and that it is set
somewhere *outside* of displayRecognitionError(). Fixing the default
implementation begins to feel like it might snowball into patching the
runtime itself.

As the last example, trying to parse "A B" results in:

-memory-(1)  : error 1 : Unexpected token, at offset 1
    near [Index: 2 (Start: 15787098-Stop: 15787098) ='B', type<4> Line: 1 LinePos:1]
     : syntax error...

The start/stop indices are bogus, i.e. some uninitialized variables -- on
repeated parses they change randomly.
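
One possibility worth checking, stated here as an assumption and not something confirmed in this message: the start/stop values may be addresses inside the in-memory input buffer rather than zero-based offsets, which would also explain why they differ between runs. If so, subtracting the buffer's base address gives a stable offset. The sketch below is standalone C with hypothetical names (marker_to_offset, base, length) and does not use the ANTLR headers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: treat `marker` as an address that may point into
 * the buffer [base, base + length] and convert it to a zero-based offset.
 * Returns -1 when the marker does not fall inside the buffer, which would
 * indicate that the "address" assumption is wrong for this value. */
long marker_to_offset(const char *base, size_t length, intptr_t marker)
{
    uintptr_t start = (uintptr_t)base;
    uintptr_t value = (uintptr_t)marker;

    assert(length <= (size_t)LONG_MAX);  /* offset must fit in a long */

    if (base == NULL || value < start || value - start > length)
    {
        return -1;
    }
    return (long)(value - start);
}
```

If the helper returns -1 for the values printed above, the address explanation does not hold and the fields really are unset.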

My second question follows. Good error handling is a big selling point of
ANTLR, but with all due respect it hardly seems so for the C target. Is
there documentation covering the context needed to handle the main
mismatch error conditions? I have scanned everything in the available
examples download and cannot find any example of customizing the error
handler. Alternatively, could someone share a workable version of their own
that might serve as a good learning example?

Thank you,
Vlad
