Greetings,

Like many new ANTLR users, apparently, I borrowed the implementation of the default displayRecognitionError() to write my own version. Unfortunately, this version generates unhelpful or random errors in rather trivial cases. Here is a full example:

    grammar testerrors;

    options
    {
        language='C';
    }

    NAME : ( 'a'..'z' | 'A'..'Z' | '0'..'9' )+ ;

    WS : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; } ;

    parse : decl ( options { greedy = true; } : ',' decl )* ','? EOF ;

    decl : NAME ':' type ;

    type : 'int' | 'float' ;

Feeding "A : badtype" into parse() results in:

    -memory-(1) : error 10 : Unexpected token, at offset 3 near [Index: 0 (Start: 0-Stop: 0) ='<missing <invalid>>', type<0> Line: 1 LinePos:3] : Missing <invalid>

What puzzles me is where the <invalid> comes from. It would seem easy to compute that either an 'int' or a 'float' token was expected. In the stock error handler this text comes from tokenNames[ex->expecting], evaluated with ex->expecting equal to 0. What change to the default implementation is necessary to make this work correctly?

Similarly, attempting to parse "A :" results in:

    -unknown source-(1) : error 10 : Unexpected token, at offset -1 near [Index: 0 (Start: 0-Stop: 0) ='<missing <invalid>>', type<0> Line: 1 LinePos:1] : Missing <invalid>

Note how the source became "unknown" and the offset became -1. In the default handler the source is determined by "streamName" as follows:

    if (ex->streamName == NULL)
    {
        if (((pANTLR3_COMMON_TOKEN)(ex->token))->type == ANTLR3_TOKEN_EOF)
        {
            ANTLR3_FPRINTF(stderr, "-end of input-(");
        }
        else
        {
            ANTLR3_FPRINTF(stderr, "-unknown source-(");
        }
    }
    else
    {
        ftext = ex->streamName->to8(ex->streamName);
        ANTLR3_FPRINTF(stderr, "%s(", ftext->chars);
    }

It is frankly unexpected that a slightly different mismatch should have this effect, since the type of the error does not influence the branches taken here at all (that happens later in the function). Anyone taking this function as a blueprint for their own handler would conclude that ex->streamName is NULL in one case but not in the other, and that it is set somewhere *outside* of displayRecognitionError(). The problem of fixing the default implementation begins to feel like it might snowball into patching the runtime itself.
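
Going back to the <invalid> text in the first two outputs: a minimal sketch of guarding the tokenNames lookup, so that an "expecting" value of 0 does not get printed as if it were a real token name. This is plain C with no dependency on the ANTLR runtime. The helper names expected_token_name() and format_missing() are hypothetical, and the assumption that 0 means "no expected type was recorded" is exactly that, an assumption. It does not recover the 'int' or 'float' alternatives; it only avoids the misleading message.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the ANTLR runtime.
 * Returns the name for token type `expecting`, or NULL when the type
 * is 0 (assumed here to mean "invalid / not recorded"), out of range,
 * or has no name in the table.
 */
static const char *
expected_token_name(const char *const *token_names, size_t count,
                    unsigned expecting)
{
    if (expecting == 0 || expecting >= count || token_names[expecting] == NULL)
    {
        return NULL;
    }
    return token_names[expecting];
}

/*
 * Hypothetical helper: writes the "Missing ..." part of the message
 * into buf, falling back to a neutral text when no usable name exists.
 */
static void
format_missing(char *buf, size_t size, const char *const *token_names,
               size_t count, unsigned expecting)
{
    const char *name = expected_token_name(token_names, count, expecting);

    if (name != NULL)
    {
        snprintf(buf, size, "Missing %s", name);
    }
    else
    {
        snprintf(buf, size, "Missing token (expected type not recorded)");
    }
}
```

In a custom handler, the same check would sit in front of the tokenNames[ex->expecting] lookup; the table and its length would have to come from the generated parser, which is not shown here.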

As the last example, trying to parse "A B" results in:

    -memory-(1) : error 1 : Unexpected token, at offset 1 near [Index: 2 (Start: 15787098-Stop: 15787098) ='B', type<4> Line: 1 LinePos:1] : syntax error...

The start/stop indices are bogus, i.e. they look like uninitialized variables: on repeated parses they change randomly.

My second question follows. Good error handling is a big selling point of ANTLR, but with all due respect it hardly seems so for the C target. Is there documentation available for all the context relevant to handling the main mismatch error conditions? I have scanned everything in the available examples download, and I cannot find any example of customizing the error handler. Alternatively, could someone share a workable version of their own that might serve as a good learning example?

Thank you,
Vlad