On 4/16/09, Sam Mason <s...@samason.me.uk> wrote:
> On Wed, Apr 15, 2009 at 11:19:42PM +0300, Marko Kreen wrote:
> > On 4/15/09, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > > Given Martijn's complaint about more-than-16-bit code points, I think
> > > the \u proposal is not mature enough to go into 8.4. We can think
> > > about some version of that later, if there's enough interest.
> >
> > I think it would be good idea. Basically we should pick one from
> > couple of pre-existing sane schemes. Here is quick summary
> > of Python, Perl and Java:
> >
> > Python [1]:
> >
> >   \uXXXX        - 16-bit codepoint
> >   \UXXXXXXXX    - 32-bit codepoint
> >   \N{char-name} - Character by name
>
> Microsoft have also gone this way in C#, named code points are not
> supported however.

It also handles non-BMP code points with the \u escape in a similar way:

  http://en.csharp-online.net/ECMA-334:_9.4.1_Unicode_escape_sequences

This makes it even more standard.

> > Perl [2]:
> >
> >   \x{XXXX..}    - {} contains hexadecimal codepoint
> >   \N{char-name} - Unicode char name
>
> Looks OK, but the 'x' seems somewhat redundant. Why not just:
>
>   \{xxxx}
>
> This would be following the BitC[2] project, especially if it was more
> like:
>
>   \{U+xxxx}
>
> e.g.
>
>   \{U+03BB}
>
> would be the lowercase lambda character. Added appeal is in the fact
> that this (i.e. U+03BB) is how the Unicode consortium spells code
> points.

We already have yet another unique way of escaping Unicode with U&.
Now let's try to support an actual standard as well.

> > Java [3]:
> >
> >   \uXXXX - 16-bit codepoint
>
> AFAIK, Java isn't the best reference to choose; it assumed from an early
> point in its design that Unicode characters were at most 16bits and
> hence had to switch its internal representation to UTF-16. I don't
> program much Java these days to know how it's all worked out, but it
> would be interesting to hear from people who regularly have to deal with
> characters outside the BMP (i.e. code points greater than 65535).

You did not read my mail carefully enough: Java, and also Python and C#,
already support non-BMP characters with '\u', and in exactly the same
(UTF-16) way.

-- 
marko
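
For reference, the UTF-16 approach mentioned above (two consecutive \uXXXX escapes forming a surrogate pair that denotes one non-BMP code point) can be sketched roughly as follows. This is only an illustration in Python, not proposed backend code: the function name decode_unicode_escapes is made up for this example, and rejecting unpaired surrogates is an assumption about the desired behaviour rather than something settled in this thread.

```python
import re

# Try a full surrogate pair first, then a single \uXXXX, then \UXXXXXXXX.
_ESCAPE = re.compile(
    r"\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})"
    r"|\\u([0-9a-fA-F]{4})"
    r"|\\U([0-9a-fA-F]{8})"
)


def _expand(match):
    high, low, short, long_ = match.groups()
    if high is not None:
        # Combine a high/low surrogate pair into one code point (UTF-16 rule).
        code = 0x10000 + ((int(high, 16) - 0xD800) << 10) + (int(low, 16) - 0xDC00)
    else:
        code = int(short if short is not None else long_, 16)
        # Assumption: a surrogate that is not part of a valid pair is an error.
        if 0xD800 <= code <= 0xDFFF:
            raise ValueError("unpaired surrogate: U+%04X" % code)
        if code > 0x10FFFF:
            raise ValueError("code point out of range: U+%X" % code)
    return chr(code)


def decode_unicode_escapes(text):
    # Hypothetical helper: expands \uXXXX and \UXXXXXXXX escapes in text.
    return _ESCAPE.sub(_expand, text)
```

With this sketch, r"\u03BB" decodes to the lowercase lambda, while r"\uD83D\uDE00" and r"\U0001F600" both decode to the same single code point, U+1F600.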