Re: Character error not reported

Hans Åberg Wed, 03 Jul 2019 01:25:24 -0700


> On 3 Jul 2019, at 07:24, Akim Demaille <[email protected]> wrote:
> 
>> On 2 Jul 2019, at 14:15, Hans Åberg <[email protected]> wrote:
>> 
>>> On 2 Jul 2019, at 07:08, Akim Demaille <[email protected]> wrote:
>>> 
>>>> On 18 Jun 2019, at 18:09, Hans Åberg <[email protected]> wrote:
>>>> 
>>>> As 8-bit character tokens are not useful with UTF-8, I have replaced it 
>>>> with:
>>>> %token token_error "token error"
>>>> 
>>>> . { return my_parser::token::token_error; }
>>>> 
>>>> Please let me know if there is a better way to generate a parser error.
>>> 
>>> I personally prefer to throw an exception.
>>> 
>>> .   throw parser::syntax_error(loc, "invalid character: "s + yytext);
>> 
>> I changed to that too, wording the message so that it looks as though it
>> was thrown by the parser:
>> . { throw my_parser::syntax_error(yylloc, "syntax error, unexpected my_parser token error."); }
>> 
>> When the match is a single byte of a multibyte UTF-8 sequence, it is not
>> useful to report the byte itself.
> 
> You have a point.  I would still report the culprit, but improve the pattern.


As for Bison, I was thinking of this as a possible suggestion for better diagnostics.

> /* UTF-8 Encoded Unicode Code Point, from Flex's documentation. */
> mbchar    [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
> 
> %%
> 
> {mbchar}  throw parser::syntax_error(loc, "invalid character: "s + yytext);
> .         throw parser::syntax_error(loc, "invalid byte: "s + yytext);
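For reference, here is a minimal C++ sketch that checks the same byte ranges as the `mbchar` definition above, which may help when unit-testing the pattern outside Flex. The names `mbchar_length` and `invalid_byte_message` are hypothetical, not part of Flex or Bison. It also shows one possible option for the `.` rule: spelling the stray byte as an escape such as `\xC3` instead of appending the raw byte from `yytext`, since a lone byte of a multibyte sequence would otherwise make the diagnostic itself invalid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Length of the {mbchar} match at the start of s (n bytes available),
// or 0 if no alternative of the pattern matches. Like the Flex definition,
// only TAB, LF, CR and 0x20-0x7E are accepted from ASCII.
std::size_t mbchar_length(const unsigned char* s, std::size_t n) {
    assert(n == 0 || s != nullptr);
    if (n == 0) return 0;
    const unsigned char b0 = s[0];
    auto between = [](unsigned char b, unsigned char lo, unsigned char hi) {
        return lo <= b && b <= hi;
    };
    // Accepts b0 followed by k more bytes: s[1] in [lo1, hi1] and the
    // remaining ones in [\x80-\xBF]. Returns k + 1 on success, else 0.
    auto tail = [&](std::size_t k, unsigned char lo1,
                    unsigned char hi1) -> std::size_t {
        if (n < k + 1 || !between(s[1], lo1, hi1)) return 0;
        for (std::size_t i = 2; i <= k; ++i)
            if (!between(s[i], 0x80, 0xBF)) return 0;
        return k + 1;
    };
    if (b0 == 0x09 || b0 == 0x0A || b0 == 0x0D || between(b0, 0x20, 0x7E))
        return 1;
    if (between(b0, 0xC2, 0xDF)) return tail(1, 0x80, 0xBF);
    if (b0 == 0xE0) return tail(2, 0xA0, 0xBF);
    if (between(b0, 0xE1, 0xEC) || b0 == 0xEE || b0 == 0xEF)
        return tail(2, 0x80, 0xBF);
    if (b0 == 0xED) return tail(2, 0x80, 0x9F);  // excludes surrogates
    if (b0 == 0xF0) return tail(3, 0x90, 0xBF);  // excludes overlongs
    if (between(b0, 0xF1, 0xF3)) return tail(3, 0x80, 0xBF);
    if (b0 == 0xF4) return tail(3, 0x80, 0x8F);  // stops at U+10FFFF
    return 0;
}

// Hypothetical helper: format a stray byte as \xNN for the error message.
std::string invalid_byte_message(unsigned char b) {
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(b));
    return "invalid byte: " + std::string(buf);
}
```

In the scanner, the last rule could then read `.  throw parser::syntax_error(loc, invalid_byte_message(static_cast<unsigned char>(yytext[0])));`, assuming the same `parser` and `loc` names as in the rules above.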

Thanks for the suggestion. I wrote a Haskell program that generates such regex
patterns for UTF-8 and UTF-32 character classes, and also a C++ version.
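The generator mentioned above is not shown in the thread. As an illustration only, here is a minimal C++ sketch of one well-known way to turn a code point range into a Flex-style UTF-8 pattern, similar in spirit to the `utf8` module of Rust's `regex-syntax` crate: split the range around the surrogates and at the encoding-length boundaries, then split further until each piece is a product of independent per-byte ranges. All names (`utf8_class_regex`, `split_range`, `encode_utf8`, `hex_byte`) are hypothetical, and the input is assumed to lie within U+0000..U+10FFFF.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <initializer_list>
#include <string>
#include <vector>

// UTF-8 encoding of a Unicode scalar value (surrogates excluded by caller).
std::vector<std::uint8_t> encode_utf8(std::uint32_t cp) {
    using u8 = std::uint8_t;
    if (cp < 0x80) return {u8(cp)};
    if (cp < 0x800) return {u8(0xC0 | (cp >> 6)), u8(0x80 | (cp & 0x3F))};
    if (cp < 0x10000)
        return {u8(0xE0 | (cp >> 12)), u8(0x80 | ((cp >> 6) & 0x3F)),
                u8(0x80 | (cp & 0x3F))};
    return {u8(0xF0 | (cp >> 18)), u8(0x80 | ((cp >> 12) & 0x3F)),
            u8(0x80 | ((cp >> 6) & 0x3F)), u8(0x80 | (cp & 0x3F))};
}

// A byte as a Flex escape, e.g. \xC3.
std::string hex_byte(std::uint8_t b) {
    char buf[5];
    std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(b));
    return buf;
}

std::string byte_range(std::uint8_t lo, std::uint8_t hi) {
    if (lo == hi) return hex_byte(lo);
    return "[" + hex_byte(lo) + "-" + hex_byte(hi) + "]";
}

// Appends one byte-sequence alternative per sub-range of [lo, hi].
void split_range(std::uint32_t lo, std::uint32_t hi,
                 std::vector<std::string>& out) {
    if (lo > hi) return;
    if (lo <= 0xDFFF && hi >= 0xD800) {  // cut out the surrogates
        if (lo < 0xD800) split_range(lo, 0xD7FF, out);
        if (hi > 0xDFFF) split_range(0xE000, hi, out);
        return;
    }
    for (std::uint32_t max : {0x7Fu, 0x7FFu, 0xFFFFu}) {  // length boundaries
        if (lo <= max && hi > max) {
            split_range(lo, max, out);
            split_range(max + 1, hi, out);
            return;
        }
    }
    if (hi <= 0x7F) {  // ASCII: a single byte range
        out.push_back(byte_range(std::uint8_t(lo), std::uint8_t(hi)));
        return;
    }
    // Split until, for every run of trailing continuation bytes, lo has
    // them all minimal and hi has them all maximal (or lo, hi agree above).
    for (int i = 1; i < 4; ++i) {
        const std::uint32_t m = (1u << (6 * i)) - 1;
        if ((lo & ~m) != (hi & ~m)) {
            if ((lo & m) != 0) {
                split_range(lo, lo | m, out);
                split_range((lo | m) + 1, hi, out);
                return;
            }
            if ((hi & m) != m) {
                split_range(lo, (hi & ~m) - 1, out);
                split_range(hi & ~m, hi, out);
                return;
            }
        }
    }
    const auto a = encode_utf8(lo), b = encode_utf8(hi);
    std::string seq;
    for (std::size_t k = 0; k < a.size(); ++k) seq += byte_range(a[k], b[k]);
    out.push_back(seq);
}

// Pattern for the code points in [lo, hi]; empty if only surrogates.
std::string utf8_class_regex(std::uint32_t lo, std::uint32_t hi) {
    assert(lo <= hi && hi <= 0x10FFFF);
    std::vector<std::string> alts;
    split_range(lo, hi, alts);
    std::string re;
    for (const auto& alt : alts) {
        if (!re.empty()) re += '|';
        re += alt;
    }
    return re;
}
```

For example, `utf8_class_regex(0x80, 0xFF)` yields `[\xC2-\xC3][\x80-\xBF]`. For the full range U+0000..U+10FFFF the alternatives match those of the `mbchar` definition, except that the ASCII part becomes `[\x00-\x7F]` and `\xE1-\xEC` and `\xEE-\xEF` come out as separate alternatives.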

That said, I am thinking of testing my own software, which I mentioned before,
as a replacement for Flex.
