I would like to add an escape mechanism to PostgreSQL for entering arbitrary Unicode characters into string literals. We currently have only two options. One is entering the character directly via the keyboard or cut-and-paste, which is difficult for a number of reasons, for example when the font doesn't have the character. The other is entering the UTF8-encoded bytes using E'...' strings, which is hardly usable.

The SQL standard has the following escape syntax for this:

   U&'special character: \xxxx' [ UESCAPE '\' ]

where xxxx is the hexadecimal Unicode codepoint. So this is pretty much just another variant on what the E'...' syntax does.
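
To make the escape rule concrete, here is a minimal Python sketch of how the body of such a literal could be expanded. It is an illustration only, not scanner code: the function name decode_unicode_escapes is made up, it handles only the four-digit \xxxx form described above, and treating a doubled escape character as a literal escape character is an assumption taken from my reading of the standard.

```python
def decode_unicode_escapes(body: str, escape: str = "\\") -> str:
    """Expand <escape>xxxx sequences in the body of a U&'...' literal.

    Hypothetical helper for illustration; `escape` plays the role of the
    character given in the optional UESCAPE clause.
    """
    hex_digits = "0123456789abcdefABCDEF"
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != escape:
            # Ordinary character: copy it through unchanged.
            out.append(ch)
            i += 1
            continue
        # Assumption: a doubled escape character stands for itself.
        if body[i + 1:i + 2] == escape:
            out.append(escape)
            i += 2
            continue
        # Otherwise exactly four hexadecimal digits must follow.
        digits = body[i + 1:i + 5]
        if len(digits) != 4 or any(d not in hex_digits for d in digits):
            raise ValueError(f"invalid Unicode escape at offset {i}")
        out.append(chr(int(digits, 16)))
        i += 5
    return "".join(out)
```

For example, decode_unicode_escapes("d!0061ta", escape="!") expands !0061 to the letter a, in the same way a UESCAPE '!' clause would select a different escape character.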

The trick is that since we have user-definable encoding conversion routines, we can't convert the Unicode codepoint to the server encoding in the scanner stage. I imagine there are two ways to address this:

1. Only support this syntax when the server encoding is UTF8. This would probably cover most use cases anyway. We could have limited support for characters in the ASCII range for all server encodings.

2. Convert this syntax to a function call. But that would then create a lot of inconsistencies, such as needing functional indexes for matches against what should really be a literal.
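
Option 1 could look roughly like the following Python sketch. It is only meant to show the decision logic, not an actual implementation: the function name codepoint_to_server_bytes and the encoding names passed as strings are made up for illustration, and passing ASCII codepoints through as a single byte assumes that every server encoding is ASCII-compatible.

```python
def codepoint_to_server_bytes(codepoint: int, server_encoding: str) -> bytes:
    """Convert one escaped codepoint to bytes in the server encoding.

    Hypothetical helper sketching option 1: full support under UTF8,
    ASCII-only support under every other server encoding.
    """
    # Reject values that are not valid Unicode scalar values.
    if codepoint < 0 or codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError(f"invalid Unicode codepoint: {codepoint:#x}")
    # Assumption: all server encodings are ASCII-compatible, so codepoints
    # below 0x80 map to the same single byte everywhere.
    if codepoint < 0x80:
        return bytes([codepoint])
    if server_encoding.upper() == "UTF8":
        return chr(codepoint).encode("utf-8")
    # No conversion routines are available at this stage, so give up.
    raise ValueError(
        "Unicode escapes above 007F require server encoding UTF8, "
        f"not {server_encoding}"
    )
```

With this, codepoint_to_server_bytes(0x41, "LATIN1") succeeds because the codepoint is in the ASCII range, while 0xE4 under LATIN1 raises an error even though LATIN1 could represent it. That is the limitation option 1 accepts in exchange for not calling conversion routines from the scanner.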

I'd be happy to start with UTF8 support only.  Other ideas?
