The following bug has been logged on the website: Bug reference: 8105 Logged by: András Kolesár Email address: p...@kolesar.hu PostgreSQL version: 9.1.5 Operating system: Windows Description:
If I specify an unicode field name without quotes, field name gets lowecased incorrectly. pgAdmin 1.14.2 on Linux, PostgreSQL server 9.1.5 on Windows: SELECT érték FROM (SELECT 1 AS "érték") AS x; ********** Error ********** SQL state: 42703 Character: 8 In the example above I specify an unicode column name ("érték" means "value" in Hungarian), then I try to read it. If I use double quotes in the outer query, it works. However, the above example works fine if the server runs on Linux: "PostgreSQL 9.1.9 on i686-pc-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 32-bit" I see the same problem from PHP client. There is a more verbose error message: ERROR: column "�rt�k" does not exist LINE 1: SELECT érték FROM (SELECT 1 AS "érték") AS x ^ The "é" character is represented incorrectly in the error message, it shows where the problem is. This character (U+00E9) is represented in UTF8 as C3 A9. In the error message it is an invalid UTF8 sequence: E3 A9. I think Windows uses Windows-1250 or Windows-1252 character set where C3 lowers to E3. A9 survives tolower() because it means © (copyright sign) in these charsets, without lowercase pair. I have localized the problem in PostgreSQL source: src/backend/parser/scansup.c:128 char * downcase_truncate_identifier(const char *ident, int len, bool warn) { // ... for (i = 0; i < len; i++) // ... if (IS_HIGHBIT_SET(ch) && isupper(ch)) ch = tolower(ch); This function walks through identifiers byte-by-byte, lowers them if they were individual characters. This is incorrect in multibyte character sets. It works on Linux with UTF8 system encoding because isupper() returns false both for C3 and A9. The same issue is reported below: Database object names and libpq in UTF-8 locale on Windows http://permalink.gmane.org/gmane.comp.db.postgresql.sql/29464 Solution 1: tolower() only A-Z. Solution 2: use a lowercase function that uses client_encoding -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs