Hi Daiki,

Daiki Ueno <[email protected]> writes:

> Hi Miguel,
>
> Miguel Ángel <[email protected]> writes:
>
> > I have implemented a very basic support for escaped unicode code
> > points.
>
> Cool. I haven't had time to review formally but some comments are
> below.
>

Thank you very much for the review. It was not an IETF RFC; I knew it
was basically wrong. :)

> > +#define P7_UNICODE (1000 + 'u')
>
> Isn't it possible to skip Unicode escapes in 'phase7_getc', instead of
> 'phase5_get'? Like the Python parser?
>

No problem, but 'phase5_getc' would still need a change to store the
actual character, using something like mixed_string_buffer to translate
the Unicode code point to the local encoding.

> > I am not very sure if I have to change always
> > 'xgettext_current_source_encoding'. I have looked into x-java.c code.
>
> The patch sets 'xgettext_current_source_encoding' to UTF-8 when it
> detects Unicode escapes. I guess it only works if the source code
> encoding (see "gcc -finput-charset") is UTF-8.
>
> I'm also not very sure how to handle this case though, maybe we should
> adjust to 'xgettext_global_source_encoding', if it is not ASCII?
>

I have seen that iconv is used in CONVERT_STRING (in xgettext.c) to
translate each non-ASCII string to UTF-8. Is UTF-8 the default encoding
for PO(T) files?

Nevertheless, when 'xgettext_current_source_encoding' is ASCII (so the
conversion is not possible), gettext() could still receive a u8""
string literal, which is UTF-8 at run time even when both the source
and execution encodings are ASCII, so that string must be extracted as
UTF-8. If I am not wrong, the result is implementation-defined when the
string is a plain "" string literal.

Also, we translate the string to UTF-8 when we have seen a Unicode
character (with mixed_string_buffer) and change
'xgettext_current_source_encoding' during the call to
'remember_a_message' (where CONVERT_STRING is used), but only for that
string, because it is not in the source encoding. This would work with
an ASCII file and u8"" string literals, but I am not sure that it works
with an ISO-8859-1 file and execution encoding and "" string literals
(e.g. with \u00e9, 'é', a character representable in that character
set). I mean this for gettext() as well, not only xgettext.
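
For illustration only (this is not the patch): a minimal sketch of what
translating a code point taken from a \uXXXX escape into UTF-8 bytes
involves. 'encode_utf8' is a hypothetical helper name, not a gettext
function; the patch would rely on something like mixed_string_buffer
instead, and parsing the hex digits of the escape is left out.

```c
#include <stddef.h>

/* Hypothetical helper, not part of gettext: encode one Unicode code
   point CP as UTF-8 into BUF, which must have room for 4 bytes.
   Returns the number of bytes written, or 0 if CP is a surrogate
   (U+D800..U+DFFF) or lies beyond U+10FFFF.  */
static size_t
encode_utf8 (unsigned long cp, unsigned char *buf)
{
  if (cp < 0x80)
    {
      /* ASCII range: one byte, unchanged.  */
      buf[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      /* Two bytes: 110xxxxx 10xxxxxx.  */
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* Surrogates are not valid scalar values.  */
  if (cp < 0x10000)
    {
      /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.  */
      buf[0] = (unsigned char) (0xE0 | (cp >> 12));
      buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      /* Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.  */
      buf[0] = (unsigned char) (0xF0 | (cp >> 18));
      buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;                     /* Outside the Unicode range.  */
}
```

With this, \u00e9 becomes the two bytes 0xC3 0xA9, whereas the same
character in an ISO-8859-1 file is the single byte 0xE9. That mismatch
is why the extracted string cannot simply be labelled with the source
encoding.
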
I am not sure where exactly the "implementation-defined" line starts.

> > I also have to extend testsuite, because I have tested it with simple
> > files and my current make check (with GtkBuilder support).
>
> Nice. A minor thing, it might be good to use spaces consistently in the
> source code.
>

Sorry, I am far from perfect. :) Please feel free to point out any
problem you see. I try to review every patch several times, but
sometimes I miss a (big) point.

> Regards,

Best regards,
Miguel
