Re: [bug-gettext] broken handling of unicode code point escapes in Tcl

Guido Berhoerster Tue, 25 Jun 2013 05:22:59 -0700

* Daiki Ueno <[email protected]> [2013-06-25 05:58]:
> Hi Guido,
> 
> Guido Berhoerster <[email protected]> writes:
> 
> > xgettext parsing of Tcl Unicode code point escapes is broken: it tries
> > to replace the escape with the literal Unicode character but does not
> > consume the last character of the escape, copying it into the output
> > instead, which results in corrupt .po files, e.g.:
> >
> > $ cat gettext-bug.tcl
> > puts [msgcat::mc "Hello\u200e\u201cWorld\u201d"]
> >
> > $ /usr/bin/xgettext -o- gettext-bug.tcl
> > #: gettext-bug.tcl:5
> > msgid "Hello‎e“cWorld”d"
> > msgstr ""
> 
> Thanks for the report.
> 
> > It should probably not try to substitute these escapes at all as it
> > results in fragile .po files with embedded control characters, see
> > e.g. the U+200E left-to-right mark in the above example.
> 
> I've just pushed the attached patch (the \x fix in the patch is not really
> necessary, sorry; partially reverted in git).


Thanks for the quick fix; the substitution works correctly now.
I still wonder why \u escapes are substituted with Unicode
characters at all, since that allows unescaped control characters
into the .po file, which makes it quite fragile.
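
To illustrate the concern, here is a rough sketch in Python. It is not
xgettext's actual C code; the function names are made up for this
example, and it assumes a Tcl \u escape takes one to four hex digits and
ignores escaped backslashes for brevity. It substitutes the escapes from
the test string above and then lists the invisible characters that end
up in the result:

```python
import re
import unicodedata

# Assumption: a Tcl \u escape is followed by one to four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{1,4})")


def substitute_escapes(text):
    """Replace each \\uXXXX escape with the character it names,
    consuming every hex digit of the escape."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def invisible_characters(text):
    """List control (Cc) and format (Cf) characters present in text."""
    return [f"U+{ord(ch):04X}" for ch in text
            if unicodedata.category(ch) in ("Cc", "Cf")]


source = r"Hello\u200e\u201cWorld\u201d"
msgid = substitute_escapes(source)
print(invisible_characters(msgid))
```

The quotation marks are harmless, but the left-to-right mark is
reported as an invisible format character, and it is easy to lose or
overlook when someone edits the .po file by hand.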
-- 
Guido Berhoerster
