> On 10 Nov 2018, at 10:38, Hans Åberg <[email protected]> wrote:
>
>> Also, see if using %param does not already
>> give you what you need to pass information from the scanner to the
>> parser’s yyerror.
>
> How would that get into the yyerror function?
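In C, the arguments declared with %parse-param are also passed to yyerror.
That’s why I mentioned %param, which declares the argument for both the
scanner and the parser, rather than %lex-param. In the C++ case, these are
members of the parser class.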
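For illustration, a minimal C sketch of the yyerror side. The struct
scan_ctx type, its fields, and the message format are assumptions made up
for this example; only the argument order (the %param argument first, then
the message) follows what Bison does for a parser without %locations or a
pure API.

```c
#include <stdio.h>

/* Hypothetical context shared by the scanner and the parser.  In the
   grammar it would be declared with
       %param { struct scan_ctx *ctx }
   so that both yylex and yyerror receive it.  */
struct scan_ctx {
  const char *last_text;  /* text of the token most recently scanned */
  int line;               /* line on which that token starts */
  char message[256];      /* last diagnostic, kept for inspection */
};

/* The %param argument comes first, the message last.  */
void yyerror(struct scan_ctx *ctx, const char *msg)
{
  snprintf(ctx->message, sizeof ctx->message,
           "line %d: %s (near \"%s\")", ctx->line, msg, ctx->last_text);
  fprintf(stderr, "%s\n", ctx->message);
}
```

The scanner would update ctx->last_text and ctx->line before returning each
token, so that yyerror can report what the scanner last saw.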
>>>> I believe that the right approach is rather the one we have in compilers
>>>> and in bison: caret errors.
>>>>
>>>> $ cat /tmp/foo.y
>>>> %token FOO 0xff 0xff
>>>> %%
>>>> exp:;
>>>> $ LC_ALL=C bison /tmp/foo.y
>>>> /tmp/foo.y:1.17-20: error: syntax error, unexpected integer
>>>> %token FOO 0xff 0xff
>>>>                 ^^^^
>>>> I would have been bothered by « unexpected 255 ».
>>>
>>> Currently, that’s for those still using only ASCII.
>>
>> No, it’s not, it works with UTF-8. Bison’s count of characters is mostly
>> correct. I’m talking about Bison’s own location, used to parse grammars,
>> which is improved compared to what we ship in generated parsers.
>
> Ah. I thought of errors for the generated parser only. Then I only report
> byte count, but using character count will probably not help much for caret
> errors, as they vary in width. The problem is that caret errors use two
> lines which are hard to synchronize in Unicode. So perhaps some kind of one
> line markup instead might do the trick.
Two things.

One is that the semantics of the column in Bison’s locations is not specified:
it is up to the user to track characters or bytes. As a matter of fact, Bison
is hardly concerned by this choice; rather, it’s the scanner that has to
deal with it.

The other is that once you have the location, you can decide how to display
it. In the case of Bison, I think the caret errors are fine, but you
could decide to do something different, say use colors or delimiters, to
be robust to varying character widths.
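Both points can be sketched in a few lines of C. The names utf8_length and
mark_span and the "[[" / "]]" delimiters are hypothetical choices for this
example, not anything Bison provides. The first function shows how a scanner
could count characters rather than bytes; the second marks a span on a single
line using byte offsets, so nothing has to line up with a second line.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Count UTF-8 code points in the first n bytes of s: every byte that is
   not a continuation byte (10xxxxxx) starts a new character.  */
size_t utf8_length(const char *s, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; ++i)
    if (((unsigned char) s[i] & 0xC0) != 0x80)
      ++count;
  return count;
}

/* Copy line into out, wrapping bytes [first, last) in "[[" and "]]".
   Returns 0 on success, -1 if the span is invalid or out is too small.  */
int mark_span(char *out, size_t out_size,
              const char *line, size_t first, size_t last)
{
  size_t len = strlen(line);
  if (first > last || last > len)
    return -1;
  int n = snprintf(out, out_size, "%.*s[[%.*s]]%s",
                   (int) first, line,
                   (int) (last - first), line + first,
                   line + last);
  return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

With the grammar line from the caret example and the zero-based byte span
16..20, mark_span yields `%token FOO 0xff [[0xff]]`.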
>>> I am using Unicode characters and LC_CTYPE=UTF-8, so it will not display
>>> properly. In fact, I am using special code to even write out Unicode
>>> characters in the error strings, since Bison assumes all strings are ASCII,
>>> the bytes with the high bit set being translated into escape sequences.
>>
>> Yes, I’m aware of this issue, and we have to address it.
>
> From what I could see, the function that converts it to escapes is sometimes
> applied once and sometimes twice, relying on it being idempotent.
It’s a bit trickier than that. I’m looking into it, and I’d like
to address this in 3.3.
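To make the idempotence question concrete, here is a small C sketch. It is
not Bison’s code: escape_bytes and its escape_backslash flag are
hypothetical, and the idea that backslashes may also need escaping is an
assumption of this example, not something stated above. An escaper that only
rewrites bytes with the high bit set is idempotent, since its output is pure
ASCII; once it also doubles backslashes, a second application changes the
result.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical escaper, not Bison's implementation.  Bytes with the high
   bit set become octal escapes; a backslash is doubled only when
   escape_backslash is nonzero.  Returns 0 on success, -1 if out is too
   small.  */
int escape_bytes(char *out, size_t out_size,
                 const char *in, int escape_backslash)
{
  size_t used = 0;
  if (out_size == 0)
    return -1;
  for (; *in; ++in) {
    unsigned char c = (unsigned char) *in;
    char piece[5];  /* longest piece is "\ooo" plus the terminator */
    if (c >= 0x80)
      snprintf(piece, sizeof piece, "\\%03o", (unsigned) c);
    else if (c == '\\' && escape_backslash)
      snprintf(piece, sizeof piece, "\\\\");
    else
      snprintf(piece, sizeof piece, "%c", c);
    size_t n = strlen(piece);
    if (used + n + 1 > out_size)
      return -1;
    memcpy(out + used, piece, n);
    used += n;
  }
  out[used] = '\0';
  return 0;
}
```

Applied to the two bytes of "¬", one pass gives `\302\254` in either mode. A
second pass leaves that unchanged when backslashes are not escaped, and turns
it into `\\302\\254` when they are.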
>> We also have to provide support for internationalization of
>> the token names.
>
> Personally, I don't have any need for that. I use strings, like
>   %token logical_not_key "¬"
>   %token logical_and_key "∧"
>   %token logical_or_key "∨"
> and in the case there are names, they typically match what the lexer
> identifies.
Yes, not all the strings should be translated. I was thinking of
something like

  %token NUM _("number")
  %token ID _("identifier")
  %token PLUS "+"

This way, we can even point xgettext at the grammar file
rather than at the generated parser.
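A hedged sketch, in C, of how such marking could play out at run time.
Everything here is hypothetical: the token_names table, the translatable
flag, and translate, a stand-in for gettext with a two-entry French catalog.
It is not what Bison generates; it only shows that marked names go through
translation while literal ones such as "+" are left alone.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for gettext(): a tiny French catalog, for illustration only.  */
static const char *translate(const char *msgid)
{
  if (strcmp(msgid, "number") == 0)
    return "nombre";
  if (strcmp(msgid, "identifier") == 0)
    return "identifiant";
  return msgid;
}

/* Hypothetical table: names written as _("...") in the grammar are
   flagged as translatable, plain string aliases are not.  */
struct token_name {
  const char *name;
  int translatable;
};

static const struct token_name token_names[] = {
  { "number", 1 },      /* %token NUM _("number") */
  { "identifier", 1 },  /* %token ID _("identifier") */
  { "+", 0 },           /* %token PLUS "+" */
};

/* Name to show in an error message, or NULL if i is out of range.  */
const char *token_display_name(size_t i)
{
  if (i >= sizeof token_names / sizeof token_names[0])
    return NULL;
  return token_names[i].translatable
           ? translate(token_names[i].name)
           : token_names[i].name;
}
```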