Re: [bug-gettext] [RFC Patch] Implement \u support in xgettext for C family (C11/C++11)

Miguel Ángel Wed, 13 Feb 2013 09:02:21 -0800

Hi Daiki,

Daiki Ueno <[email protected]> writes: 
> Hi Miguel,
> 
> Miguel Ángel <[email protected]> writes:
> 
> > I have implemented a very basic support for escaped unicode code
> > points.
> 
> Cool.  I haven't had time to review formally but some comments are
> below.
>


Thank you very much for the review. It was not an IETF RFC; I knew it
was basically wrong. :)

> > +#define P7_UNICODE (1000 + 'u')
> 
> Isn't it possible to skip Unicode escapes in 'phase7_getc', instead of
> 'phase5_get'?  Like the Python parser?
> 

No problem, but 'phase5_get' would then have to change to store the
actual character, using something like mixed_string_buffer to translate
the Unicode code point to the local encoding.
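
To make that translation step concrete, here is a minimal sketch of
turning the hex digits of a \u escape into UTF-8 bytes. Both helpers
(parse_ucn and encode_utf8) are hypothetical names for illustration;
they are not part of the patch or of gettext, and the real code would
presumably go through mixed_string_buffer instead.

```c
#include <stddef.h>

/* Hypothetical helper: read exactly ndigits hex digits from s
   (4 for \u, 8 for \U).  Returns 1 and stores the value in *cp on
   success, 0 if a non-hex character (including the terminating NUL)
   is found first.  */
static int
parse_ucn (const char *s, int ndigits, unsigned int *cp)
{
  unsigned int value = 0;
  int i;

  for (i = 0; i < ndigits; i++)
    {
      char c = s[i];
      unsigned int digit;

      if (c >= '0' && c <= '9')
        digit = (unsigned int) (c - '0');
      else if (c >= 'a' && c <= 'f')
        digit = (unsigned int) (c - 'a') + 10;
      else if (c >= 'A' && c <= 'F')
        digit = (unsigned int) (c - 'A') + 10;
      else
        return 0;
      value = (value << 4) | digit;
    }
  *cp = value;
  return 1;
}

/* Hypothetical helper: encode one code point as UTF-8 into buf, which
   must have room for 4 bytes.  Returns the number of bytes written, or
   0 if cp is a surrogate or lies beyond U+10FFFF.  */
static size_t
encode_utf8 (unsigned int cp, unsigned char *buf)
{
  if (cp < 0x80)
    {
      buf[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* surrogates are not valid code points */
  if (cp < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (cp >> 12));
      buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      buf[0] = (unsigned char) (0xF0 | (cp >> 18));
      buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

With this, the digits "00e9" parse to U+00E9 and encode as the two
bytes 0xC3 0xA9.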

> > I am not very sure if I have to change always
> > 'xgettext_current_source_encoding'. I have looked into x-java.c code.
> 
> The patch sets 'xgettext_current_source_encoding' to UTF-8 when it
> detects Unicode escapes.  I guess it only works if the source code
> encoding (see "gcc -finput-charset") is UTF-8.
> 
> I'm also not very sure how to handle this case though, maybe we should
> adjust to 'xgettext_global_source_encoding', if it is not ASCII?
> 

I have seen that iconv is used in CONVERT_STRING (in xgettext.c) to
translate each non-ASCII string to UTF-8. Is UTF-8 the default encoding
for PO(T) files?
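
For reference, the kind of conversion I mean looks roughly like the
sketch below. It is not the CONVERT_STRING code itself; latin1_to_utf8
is a hypothetical helper, and the ISO-8859-1 source encoding is only an
assumption for the example. Only the standard iconv_open(), iconv() and
iconv_close() calls are used.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not gettext's CONVERT_STRING: convert inlen
   bytes of ISO-8859-1 text to UTF-8.  Returns the number of bytes
   written to out, or (size_t) -1 on failure (unsupported conversion or
   output buffer too small).  */
static size_t
latin1_to_utf8 (const char *in, size_t inlen, char *out, size_t outsize)
{
  iconv_t cd = iconv_open ("UTF-8", "ISO-8859-1");  /* (to, from) */
  char *inptr = (char *) in;    /* iconv wants char **; the input is only read */
  char *outptr = out;
  size_t inleft = inlen;
  size_t outleft = outsize;
  size_t res;

  if (cd == (iconv_t) -1)
    return (size_t) -1;

  res = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  iconv_close (cd);

  if (res == (size_t) -1)
    return (size_t) -1;
  return outsize - outleft;
}
```

Converting the single ISO-8859-1 byte 0xE9 ('é') this way gives the two
UTF-8 bytes 0xC3 0xA9.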

Nevertheless, even when 'xgettext_current_source_encoding' is ASCII (so
no conversion is possible), gettext() can receive a UTF-8 string literal
(u8"") that is UTF-8 at run time regardless of an ASCII source and
execution encoding, so that string must be extracted as UTF-8. If I am
not wrong, the result is implementation-defined when the string is a
plain "" string literal.

Also, we translate the string to UTF-8 once we have seen a Unicode
escape (with mixed_string_buffer) and change
'xgettext_current_source_encoding' during the call to
'remember_a_message' (where CONVERT_STRING is used), but only for that
string, because it is no longer in the source encoding. This would work
with an ASCII file and u8"" string literals, but I am not sure that it
works with an ISO-8859-1 file and execution encoding and "" string
literals (e.g. with \u00e9, 'é', a character representable in that
character set). I mean gettext() as well, not only xgettext. I am not
sure where the "implementation-defined" line starts.
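
A small C11 sketch of the difference I am worried about, assuming a
compiler that accepts u8"" literals. The function name
narrow_literal_is_utf8 is made up for the example. The u8"" literal is
always UTF-8, while the bytes of the plain "" literal depend on the
execution character set chosen at build time (for GCC, see
-fexec-charset).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check: returns 1 if the plain literal "\u00e9" holds
   the same bytes as the u8 literal, i.e. if this build's execution
   character set is UTF-8, and 0 otherwise (for example a single 0xE9
   byte with an ISO-8859-1 execution character set).  */
static int
narrow_literal_is_utf8 (void)
{
  const char *narrow = "\u00e9";
  /* The cast keeps this valid whether u8"" has element type char
     (C11) or char8_t (newer standards).  */
  const unsigned char *utf8 = (const unsigned char *) u8"\u00e9";

  return strlen (narrow) == 2 && memcmp (narrow, utf8, 2) == 0;
}
```

So for the u8"" case xgettext can always emit 0xC3 0xA9, but for the
plain "" case the msgid that gettext() sees at run time depends on how
the program was compiled.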

> > I also have to extend testsuite, because I have tested it with simple
> > files and my current make check (with GtkBuilder support).
> 
> Nice.  A minor thing, it might be good to use spaces consistently in the
> source code.
> 

Sorry, I am far from perfect. :)
Please feel free to point out any problem you see. I try to review
every patch several times, but sometimes I miss a (big) point.

> Regards,

Best regards,
Miguel
