Re: plpython_unicode test (was Re: [HACKERS] buildfarm / handling (undefined) locales)

Andrew Dunstan Sun, 01 Jun 2014 14:59:16 -0700


On 06/01/2014 05:35 PM, Tom Lane wrote:

I wrote:

3. Try to select some "more portable" non-ASCII character, perhaps U+00A0
(non-breaking space) or U+00E1 (a-acute).  I think this would probably
work for most encodings but it might still fail in the Far East.  Another
objection is that the expected/plpython_unicode.out file would contain
that character in UTF8 form.  In principle that would work, since the test
sets client_encoding = utf8 explicitly, but I'm worried about accidental
corruption of the expected file by text editors, file transfers, etc.
(The current usage of U+0080 doesn't suffer from this risk because psql
special-cases printing of multibyte UTF8 control characters, so that we
get exactly "\u0080".)
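
To make the concern about the expected file concrete, here is a minimal Python sketch (not part of the PostgreSQL test suite) showing the UTF-8 byte form of each candidate character, and what happens if a tool reads those bytes as LATIN1 and then writes them back out as UTF-8. The helper names utf8_bytes and misread_as_latin1 are hypothetical and exist only for this illustration.

```python
# Illustrative sketch only; the helper names are hypothetical.
CANDIDATES = {
    "U+0080 (control character)": "\u0080",
    "U+00A0 (non-breaking space)": "\u00a0",
    "U+00E1 (a-acute)": "\u00e1",
}


def utf8_bytes(ch: str) -> bytes:
    """Return the bytes an expected-output file would hold for ch in UTF-8."""
    return ch.encode("utf-8")


def misread_as_latin1(ch: str) -> str:
    """Simulate a tool that decodes the UTF-8 bytes of ch as LATIN1.

    Each byte becomes its own character, so re-saving the result as UTF-8
    would silently change the contents of the file.
    """
    return ch.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    for label, ch in CANDIDATES.items():
        damaged = misread_as_latin1(ch)
        print(label, utf8_bytes(ch).hex(), "->", len(damaged), "characters after misread")
```

Every candidate occupies two bytes in UTF-8, so a single wrong guess about the file's encoding turns one character into two, and the expected output no longer matches.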

I did a little bit of experimentation and determined that none of the
LATIN1 characters are significantly more portable than what we've got:
for instance a-acute fails to convert into 16 of the 33 supported
server-side encodings (versus 17 failures for U+0080).  However,
non-breaking space is significantly better: it converts into all our
supported server encodings except EUC_CN, EUC_JP, EUC_KR, EUC_TW.
It seems likely that we won't do better than that except with a basic
ASCII character.
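
A rough way to reproduce this kind of survey outside the server is sketched below in Python. It relies on loud assumptions: Python's codec tables are not PostgreSQL's conversion routines, the codec list is only a small hypothetical sample rather than the 33 server encodings, and the results need not match the counts quoted above. The names SAMPLE_CODECS, can_encode and failures are made up for this example.

```python
# Illustrative sketch only: Python codecs stand in for server encodings,
# which is an assumption, not how PostgreSQL performs conversions.
SAMPLE_CODECS = ["latin_1", "iso8859_5", "koi8_r", "cp1252", "euc_jp", "euc_kr"]


def can_encode(ch: str, codec: str) -> bool:
    """Return True if ch can be represented in the given Python codec."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def failures(ch: str, codecs=SAMPLE_CODECS) -> list:
    """List the codecs from the sample that cannot represent ch."""
    return [codec for codec in codecs if not can_encode(ch, codec)]


if __name__ == "__main__":
    for label, ch in [("U+0080", "\u0080"), ("U+00A0", "\u00a0"),
                      ("U+00E1", "\u00e1"), ("ASCII 'a'", "a")]:
        print(label, "fails in:", failures(ch))
```

A plain ASCII character is the only candidate that the sketch reports as encodable everywhere in the sample, which is consistent with the conclusion reached in this thread.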


Yeah, I just looked at the copyright symbol, with similar results.

Let's just stick to ASCII.

cheers

andrew



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
