I am trying to create database columns whose names contain umlauts, using the UTF8 client encoding. However, the server seems to mangle the column names. In particular, it seems to lowercase each individual byte of the UTF-8 multi-byte sequences.
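To make that hypothesis concrete, here is a simplified model of it in C++ (the language of the code below). It assumes that the server lowercases every byte of an unquoted identifier as if it were a single Windows-1252 character (suggested by the LC_CTYPE setting listed further down), and that a double-quoted identifier is taken verbatim. The names cp1252ToLower and foldIdentifier are hypothetical; this is a sketch of the suspected behaviour, not PostgreSQL's actual source.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: lowercase one byte the way a Windows-1252 tolower()
// would (assumption: LC_CTYPE = 'English_United Kingdom.1252').
// Only the upper-case letters that exist in code page 1252 are mapped.
unsigned char cp1252ToLower(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return static_cast<unsigned char>(c + 0x20);
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)  // À..Þ, except the sign ×
        return static_cast<unsigned char>(c + 0x20);
    switch (c) {
        case 0x8A: return 0x9A;  // Š -> š
        case 0x8C: return 0x9C;  // Œ -> œ
        case 0x8E: return 0x9E;  // Ž -> ž
        case 0x9F: return 0xFF;  // Ÿ -> ÿ
        default:   return c;     // everything else is left unchanged
    }
}

// Hypothetical model of identifier folding: an unquoted identifier is
// lowercased byte by byte, ignoring UTF-8 sequence boundaries; a
// double-quoted identifier only loses its quotes (embedded "" not handled).
std::string foldIdentifier(const std::string& ident) {
    if (ident.size() >= 2 && ident.front() == '"' && ident.back() == '"')
        return ident.substr(1, ident.size() - 2);
    std::string out;
    out.reserve(ident.size());
    for (char ch : ident)
        out.push_back(static_cast<char>(cp1252ToLower(static_cast<unsigned char>(ch))));
    return out;
}
```

Fed the UTF-8 bytes of id_äß (69 64 5F C3 A4 C3 9F), the unquoted path yields 69 64 5F E3 A4 E3 FF. Displayed as Windows-1252 that is id_ã¤ãÿ, the same string reported as "from DB" in the output below, and as UTF-8 it is no longer a valid sequence. The quoted path returns the original bytes unchanged.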
Here is my code:

    #include <string>
    #include <libpq-fe.h>
    using namespace std;

    const wchar_t *strName = L"id_äß";
    wstring strCreate = wstring(L"create table test_umlaut(") + strName + L" integer primary key)";

    PGconn *pConn = PQsetdbLogin("", "", NULL, NULL, "dev503", "postgres", "******");
    // PQsetdbLogin returns NULL only when out of memory; check the connection status too
    if (!pConn || PQstatus(pConn) != CONNECTION_OK) FAIL;
    if (PQsetClientEncoding(pConn, "UTF-8")) FAIL;  // returns 0 on success

    // Drop any table left over from a previous run (fails harmlessly the first time)
    PGresult *pResult = PQexec(pConn, "drop table test_umlaut");
    if (pResult) PQclear(pResult);

    // Send the CREATE TABLE statement as UTF-8; the column name is NOT double-quoted
    pResult = PQexec(pConn, ToUtf8(strCreate.c_str()).c_str());
    if (pResult) PQclear(pResult);

    // Read back the column name the server actually stored
    pResult = PQexec(pConn, "select * from test_umlaut");
    if (!pResult) FAIL;
    if (PQresultStatus(pResult) != PGRES_TUPLES_OK) FAIL;
    if (PQnfields(pResult) != 1) FAIL;
    const char *fName = PQfname(pResult, 0);

    ShowW("Name: ", strName);
    ShowA("in UTF8: ", ToUtf8(strName).c_str());
    ShowA("from DB: ", fName);
    ShowW("in UTF16: ", ToWide(fName).c_str());

    PQclear(pResult);
    PQreset(pConn);  // re-establishes the connection

(ShowA/W call OutputDebugStringA/W, and ToUtf8/ToWide use WideCharToMultiByte/MultiByteToWideChar with CP_UTF8.)

And this is the output generated:

    Name: id_äß
    in UTF8: id_äß
    from DB: id_ã¤ãÿ
    in UTF16: id_???

It seems as if the backend thinks the name is in ANSI encoding, not in UTF-8. If I change the strCreate query to put double quotes around the column name, the problem disappears. But the original name is already lowercase, so I think it should also work without quoting the column name. Am I missing some setup, either in the database or in the way I use libpq?

I'm using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit.

The database uses:

    ENCODING = 'UTF8'
    LC_COLLATE = 'English_United Kingdom.1252'
    LC_CTYPE = 'English_United Kingdom.1252'

Thanks for any help,
Martin