> On 2 Jul 2019, at 14:15, Hans Åberg <haber...@telia.com> wrote:
> 
> 
>> On 2 Jul 2019, at 07:08, Akim Demaille <a...@lrde.epita.fr> wrote:
>> 
>> Hi Hans,
> 
> Hello,
> 
> >>> On 18 Jun 2019, at 18:09, Hans Åberg <haber...@telia.com> wrote:
>>> 
>>> As 8-bit character tokens are not useful with UTF-8, I have replaced it 
>>> with:
>>> %token token_error "token error"
>>> 
>>> . { return my_parser::token::token_error; }
>>> 
>>> Please let me know if there is a better way to generate a parser error.
>> 
>> I personally prefer to throw an exception.
>> 
>> .   throw parser::syntax_error(loc, "invalid character: "s + yytext);
> 
> I changed to that too, wording the message so that it looks as though it was thrown by the parser:
> . { throw my_parser::syntax_error(yylloc, "syntax error, unexpected my_parser token error."); }
> 
> When the match is a single byte of a multibyte UTF-8 character, it is not
> useful to report what it is.

You have a point.  I would still report the culprit, but improve the pattern.

 /* UTF-8 Encoded Unicode Code Point, from Flex's documentation. */
mbchar    [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})

%%

{mbchar}  throw parser::syntax_error(loc, "invalid character: "s + yytext);
.         throw parser::syntax_error(loc, "invalid byte: "s + yytext);
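The following is a minimal C++ sketch that makes the byte ranges of `mbchar` concrete and reproduces the choice between the two rules. It is not part of Flex or Bison. `mbchar_length` and `diagnose` are hypothetical helpers. The `diagnose` function also departs from the `.` rule above: it prints the stray byte in hex instead of echoing the raw byte. That choice is an assumption, made in the spirit of Hans's remark that the raw byte is not useful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Length of the well-formed UTF-8 sequence starting at s[i], or 0 if there
// is none. The byte ranges mirror the mbchar definition, including its
// restriction of ASCII to tab, LF, CR and 0x20-0x7E.
std::size_t mbchar_length(const std::string& s, std::size_t i) {
  // Byte at position k as 0-255, or 0x100 (matches no range) past the end.
  auto at = [&s](std::size_t k) -> unsigned {
    return k < s.size() ? static_cast<unsigned char>(s[k]) : 0x100u;
  };
  auto in = [](unsigned b, unsigned lo, unsigned hi) { return lo <= b && b <= hi; };
  // True if the n bytes starting at `from` are all continuation bytes (80-BF).
  auto cont = [&](std::size_t from, std::size_t n) {
    for (std::size_t k = from; k < from + n; ++k)
      if (!in(at(k), 0x80, 0xBF)) return false;
    return true;
  };
  const unsigned b0 = at(i);
  if (b0 == 0x09 || b0 == 0x0A || b0 == 0x0D || in(b0, 0x20, 0x7E)) return 1;
  if (in(b0, 0xC2, 0xDF)) return cont(i + 1, 1) ? 2 : 0;
  if (b0 == 0xE0) return in(at(i + 1), 0xA0, 0xBF) && cont(i + 2, 1) ? 3 : 0;
  if (in(b0, 0xE1, 0xEC) || b0 == 0xEE || b0 == 0xEF) return cont(i + 1, 2) ? 3 : 0;
  if (b0 == 0xED) return in(at(i + 1), 0x80, 0x9F) && cont(i + 2, 1) ? 3 : 0;  // no surrogates
  if (b0 == 0xF0) return in(at(i + 1), 0x90, 0xBF) && cont(i + 2, 2) ? 4 : 0;  // no overlongs
  if (in(b0, 0xF1, 0xF3)) return cont(i + 1, 3) ? 4 : 0;
  if (b0 == 0xF4) return in(at(i + 1), 0x80, 0x8F) && cont(i + 2, 2) ? 4 : 0;  // max U+10FFFF
  return 0;
}

// Message the two rules would produce at s[i] (requires i < s.size()).
// {mbchar} wins whenever it matches, by longest match or, for ASCII, by
// rule order; otherwise "." takes one byte, printed here in hex.
std::string diagnose(const std::string& s, std::size_t i) {
  if (std::size_t n = mbchar_length(s, i))
    return "invalid character: " + s.substr(i, n);
  char hex[8];
  std::snprintf(hex, sizeof hex, "\\x%02X",
                static_cast<unsigned>(static_cast<unsigned char>(s[i])));
  return std::string("invalid byte: ") + hex;
}
```

Flex prefers the longest match and, on a tie, the earlier rule. As a result, a complete UTF-8 character (or an allowed ASCII character) is always reported whole by the `{mbchar}` rule. Only a byte that does not begin a well-formed sequence, such as a lone `\xC3` or the `\xED` of an encoded surrogate, falls through to `.`.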

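Throwing `syntax_error` from a scanner action works because Bison's generated C++ parser (`lalr1.cc`) wraps its call to `yylex` in a `try` block. As far as I know, it catches `syntax_error`, reports it through `parser::error`, and then enters error recovery. The message therefore comes out like any other syntax error. The sketch below imitates only that control flow, with hypothetical stand-ins (`location`, `syntax_error`, `yylex`, `parse`). It is not Bison's code: its toy scanner accepts only ASCII letters, and where the real parser would continue with error recovery, this one simply returns the message.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Bison's parser::location_type.
struct location {
  int line = 1;
  int column = 1;
};

// Hypothetical stand-in for Bison's parser::syntax_error: an exception
// carrying a location and a message.
struct syntax_error : std::runtime_error {
  syntax_error(const location& l, const std::string& m)
      : std::runtime_error(m), loc(l) {}
  location loc;
};

// Toy scanner: returns 1 for an ASCII letter, 0 at end of input, and
// throws syntax_error for anything else (playing the role of the "." rule).
int yylex(const std::string& input, std::size_t& pos, location& loc) {
  if (pos == input.size()) return 0;
  char c = input[pos];
  if (('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')) {
    ++pos;
    ++loc.column;
    return 1;
  }
  throw syntax_error(loc, "invalid character: " + std::string(1, c));
}

// Toy parser loop: catches the scanner's exception and formats it as an
// ordinary "line.column: message" diagnostic.
std::string parse(const std::string& input) {
  std::size_t pos = 0;
  location loc;
  try {
    while (yylex(input, pos, loc) != 0) {
    }
  } catch (const syntax_error& e) {
    return std::to_string(e.loc.line) + "." + std::to_string(e.loc.column) +
           ": " + e.what();
  }
  return "ok";
}
```

For example, `parse("ab!")` yields `1.3: invalid character: !`, with the location taken from the scanner at the point of the throw.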
