> On 2 Jul 2019, at 14:15, Hans Åberg <haber...@telia.com> wrote:
>
>
>> On 2 Jul 2019, at 07:08, Akim Demaille <a...@lrde.epita.fr> wrote:
>>
>> Hi Hans,
>
> Hello,
>
> >>> On 18 Jun 2019, at 18:09, Hans Åberg <haber...@telia.com> wrote:
>>>
> >>> As 8-bit character tokens are not useful with UTF-8, I have replaced them
> >>> with:
>>> %token token_error "token error"
>>>
>>> . { return my_parser::token::token_error; }
>>>
>>> Please let me know if there is a better way to generate a parser error.
>>
>> I personally prefer to throw an exception.
>>
>> . throw parser::syntax_error(loc, "invalid character: "s + yytext);
>
> I changed to that too, writing the message so that it looks as though the parser threw it:
> . { throw my_parser::syntax_error(yylloc, "syntax error, unexpected my_parser token error."); }
>
> When the match is a single byte of a multibyte UTF-8 character, reporting it
> is not useful.
You have a point. I would still report the culprit, but with a better pattern that matches whole UTF-8 characters rather than single bytes:
/* UTF-8 encoded Unicode code point, from Flex's documentation. */
mbchar [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
%%
{mbchar} throw parser::syntax_error(loc, "invalid character: "s + yytext);
. throw parser::syntax_error(loc, "invalid byte: "s + yytext);
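To make concrete what {mbchar} accepts, and to show one way of making the "invalid byte" message readable, here is a small C++ sketch. Both `mbchar_length` and `escape_bytes` are hypothetical helpers, not part of Flex or Bison. The byte ranges follow the pattern above, reading the second byte after \xF0 as [\x90-\xBF].

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: length in bytes of the well-formed UTF-8 character at
// the start of s, checked against the same byte ranges as {mbchar}; returns 0
// when no such character starts there (the case left to the "." rule).
std::size_t mbchar_length(const std::string& s)
{
  auto in = [](unsigned char c, int lo, int hi) { return lo <= c && c <= hi; };
  auto byte = [&s](std::size_t i) { return static_cast<unsigned char>(s[i]); };
  // True if s holds `count` continuation bytes (0x80-0xBF) starting at `from`.
  auto cont = [&](std::size_t from, std::size_t count) {
    if (s.size() < from + count)
      return false;
    for (std::size_t i = from; i < from + count; ++i)
      if (!in(byte(i), 0x80, 0xBF))
        return false;
    return true;
  };
  // True if the second byte exists and lies in [lo, hi].
  auto second = [&](int lo, int hi) { return s.size() >= 2 && in(byte(1), lo, hi); };

  if (s.empty())
    return 0;
  unsigned char c = byte(0);
  if (c == 0x09 || c == 0x0A || c == 0x0D || in(c, 0x20, 0x7E))
    return 1;                                            // ASCII subset
  if (in(c, 0xC2, 0xDF))
    return cont(1, 1) ? 2 : 0;                           // 2-byte, no overlongs
  if (c == 0xE0)
    return (second(0xA0, 0xBF) && cont(2, 1)) ? 3 : 0;   // excludes overlongs
  if (in(c, 0xE1, 0xEC) || c == 0xEE || c == 0xEF)
    return cont(1, 2) ? 3 : 0;                           // plain 3-byte
  if (c == 0xED)
    return (second(0x80, 0x9F) && cont(2, 1)) ? 3 : 0;   // excludes surrogates
  if (c == 0xF0)
    return (second(0x90, 0xBF) && cont(2, 2)) ? 4 : 0;   // planes 1-3
  if (in(c, 0xF1, 0xF3))
    return cont(1, 3) ? 4 : 0;                           // planes 4-15
  if (c == 0xF4)
    return (second(0x80, 0x8F) && cont(2, 2)) ? 4 : 0;   // plane 16
  return 0;
}

// Hypothetical helper: printable ASCII is kept, every other byte becomes \xNN,
// so a stray byte shows up readably in an error message.
std::string escape_bytes(const std::string& s)
{
  std::string out;
  for (unsigned char c : s)
  {
    if (0x20 <= c && c <= 0x7E)
      out += static_cast<char>(c);
    else
    {
      char buf[5];  // "\xNN" plus the terminating NUL
      std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  return out;
}
```

With such a helper, the "." rule could, for instance, report `"invalid byte: " + escape_bytes(yytext)`, so that a lone \xC3 appears as the four characters `\xC3` instead of a broken glyph. This is only a suggestion; the thread itself reports `yytext` as-is.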