[HACKERS] Unicode escapes in literals

Peter Eisentraut Thu, 23 Oct 2008 01:42:26 -0700

I would like to add an escape mechanism to PostgreSQL for entering arbitrary Unicode characters into string literals. We currently have only two options: entering the character directly via the keyboard or cut-and-paste, which is difficult for a number of reasons, such as when the font doesn't have the character; or entering the UTF8-encoded bytes using E'...' strings, which is hardly usable.


SQL has the following escape syntax for it:


   U&'special character: \xxxx' [ UESCAPE '\' ]

where xxxx is the hexadecimal Unicode codepoint. So this is pretty much just another variant on what the E'...' syntax does.
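To make the syntax concrete, here is a rough sketch in Python (not PostgreSQL scanner code) of how the body of such a literal could be decoded. The function name decode_unicode_escapes is made up for illustration, and treating a doubled escape character as a literal one is an assumption on my part; only the four-hex-digit form described above is handled.

```python
import re


def decode_unicode_escapes(body: str, escape: str = "\\") -> str:
    """Decode <escape>XXXX sequences (four hex digits) in a U&'...' literal body.

    Hypothetical helper for illustration only. `escape` plays the role of
    the optional UESCAPE character and defaults to a backslash.
    """
    if len(escape) != 1:
        raise ValueError("escape must be a single character")

    esc = re.escape(escape)
    # Match either a doubled escape character (assumed here to stand for one
    # literal escape character) or the escape character followed by exactly
    # four hexadecimal digits.
    pattern = re.compile(esc + r"(?:(" + esc + r")|([0-9A-Fa-f]{4}))")

    def replace(match: "re.Match[str]") -> str:
        if match.group(1) is not None:
            return escape
        return chr(int(match.group(2), 16))

    # Anything that does not match is left untouched; a real scanner would
    # presumably reject a stray escape character instead.
    return pattern.sub(replace, body)


if __name__ == "__main__":
    print(decode_unicode_escapes("d\\0061t\\0061"))       # default escape
    print(decode_unicode_escapes("d!0061t!0061", "!"))    # UESCAPE '!'
```

Both calls in the usage section decode the same text; the second one shows what the UESCAPE clause is for when a backslash is inconvenient.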

The trick is that since we have user-definable encoding conversion routines, we can't convert the Unicode codepoint to the server encoding in the scanner stage. I imagine there are two ways to address this:

1. Only support this syntax when the server encoding is UTF8. This would probably cover most use cases anyway. We could have limited support for characters in the ASCII range for all server encodings.

2. Convert this syntax to a function call. But that would then create a lot of inconsistencies, such as needing functional indexes for matches against what should really be a literal.
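Option 1 could behave roughly like the following Python sketch. This is only an illustration of the rule, not proposed scanner code: the function name codepoint_to_server_bytes and the encoding-name strings are made up, and it assumes every server encoding is ASCII-compatible, so that code points below 0x80 map to the same single byte everywhere.

```python
def codepoint_to_server_bytes(codepoint: int, server_encoding: str) -> bytes:
    """Turn one Unicode code point into bytes in the server encoding.

    Hypothetical helper: supports every code point when the server
    encoding is UTF8, and only the ASCII range otherwise.
    """
    if not 0 <= codepoint <= 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError("invalid Unicode code point: %X" % codepoint)

    # ASCII range: assumed to be byte-identical in all server encodings.
    if codepoint < 0x80:
        return bytes([codepoint])

    # Beyond ASCII, no conversion routine is called; only UTF8 is accepted.
    if server_encoding.upper() in ("UTF8", "UTF-8"):
        return chr(codepoint).encode("utf-8")

    raise ValueError(
        "Unicode escapes above 007F require a UTF8 server encoding, got %s"
        % server_encoding
    )


if __name__ == "__main__":
    print(codepoint_to_server_bytes(0x00E4, "UTF8"))
    print(codepoint_to_server_bytes(0x0041, "LATIN1"))
```

The point of the design is that nothing here depends on the user-definable conversion routines, so it can run at the scanner stage.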


I'd be happy to start with UTF8 support only.  Other ideas?

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
