[GENERAL] UTF-8 and =, LIKE problems

Edmund Lian Wed, 03 Nov 2004 20:55:35 -0800

I am running a web-based accounting package (SQL-Ledger) that supports multiple languages on PostgreSQL. When a database encoding is set to Unicode, multilingual operation is possible.

However, when a user's input language is set to say English, and the user enters data such as "79", the data that is sent back to PostgreSQL for storage is U+FF17 U+FF19, which are the Unicode half width characters "79". So far so good.

Now, if the user switches languages and enters "79" as a search key, the previously entered row will not be found with the LIKE or = operators, and all other comparison operations will fail too. The problem is that the browser now sends back U+0037 U+0039, which are Unicode full width characters for "79".

Semantically, one might expect U+FF17 U+FF19 to be identical to U+0037 U+0039, but of course they aren't if a simple-minded byte-by-byte or character-by-character comparison is done.

In the ideal case, one would probably want to convert all full width chars to their half width equivalents because the numbers look wierd on the screen (e.g., "7 9 B r i s b a n e S t r e e t" instead of "79 Brisbane Street". Is there any way to get PostgreSQL to do so?

Failing this, is there any way to get PostgreSQL to be a bit smarter in doing comparisons? I think I'm SOL, but I thought I'd ask anyway.


...Edmund.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

[GENERAL] UTF-8 and =, LIKE problems

Reply via email to