I am trying to create database columns whose names contain umlauts, using the
UTF8 client encoding. However, the server seems to mangle the column names. In
particular, it appears to apply a lowercase operation to each byte of the UTF-8
multi-byte sequence.

Here is my code:

    const wchar_t *strName = L"id_äß";
    wstring strCreate = wstring(L"create table test_umlaut(") + strName +
        L" integer primary key)";

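    PGconn *pConn = PQsetdbLogin("", "", NULL, NULL, "dev503", "postgres",
        "******");
    // PQsetdbLogin rarely returns NULL; a failed connection is reported by PQstatus.
    if (!pConn || PQstatus(pConn) != CONNECTION_OK) FAIL;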
    if (PQsetClientEncoding(pConn, "UTF-8")) FAIL;

    PGresult *pResult = PQexec(pConn, "drop table test_umlaut");
    if (pResult) PQclear(pResult);

    pResult = PQexec(pConn, ToUtf8(strCreate.c_str()).c_str());
    if (pResult) PQclear(pResult);

    pResult = PQexec(pConn, "select * from test_umlaut");
    if (!pResult) FAIL;
    if (PQresultStatus(pResult)!=PGRES_TUPLES_OK) FAIL;
    if (PQnfields(pResult)!=1) FAIL;
    const char *fName = PQfname(pResult,0);

    ShowW("Name:     ", strName);
    ShowA("in UTF8:  ", ToUtf8(strName).c_str());
    ShowA("from DB:  ", fName);
    ShowW("in UTF16: ", ToWide(fName).c_str());

    PQclear(pResult);
    PQreset(pConn);

(ShowA/W call OutputDebugStringA/W, and ToUtf8/ToWide use 
WideCharToMultiByte/MultiByteToWideChar with CP_UTF8.)

And this is the output generated:

Name:     id_äß
in UTF8:  id_äß
from DB:  id_ã¤ãÿ
in UTF16: id_???
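
To illustrate the suspected behaviour, here is a minimal sketch that lowercases a UTF-8 string one byte at a time using Windows-1252 rules. It is only a model of the hypothesis, not PostgreSQL's actual code; `lowerCp1252`, `lowerBytewise` and `isWellFormedUtf8` are hypothetical helper names. The UTF-8 bytes of "äß" are C3 A4 C3 9F. Read as Windows-1252, C3 is "Ã" and 9F is "Ÿ", whose lowercase forms are E3 ("ã") and FF ("ÿ"). The result, E3 A4 E3 FF, displays as "ã¤ãÿ" in Windows-1252 and is no longer well-formed UTF-8, which would explain the failed conversion to UTF-16.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of a single-byte lowercase under Windows-1252.
// Assumption: this mirrors what a locale-aware tolower() would do per byte.
unsigned char lowerCp1252(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return static_cast<unsigned char>(c + 0x20);
    // 0xC0..0xDE are uppercase letters, except 0xD7 (multiplication sign)
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7) return static_cast<unsigned char>(c + 0x20);
    // 0x8A, 0x8C, 0x8E (S/OE/Z with caron or ligature) map 0x10 higher
    if (c == 0x8A || c == 0x8C || c == 0x8E) return static_cast<unsigned char>(c + 0x10);
    if (c == 0x9F) return 0xFF; // Y with diaeresis: upper 0x9F, lower 0xFF
    return c;
}

// Lowercase every byte independently, ignoring multi-byte sequences.
std::string lowerBytewise(const std::string &s) {
    std::string out;
    out.reserve(s.size());
    for (unsigned char c : s) out.push_back(static_cast<char>(lowerCp1252(c)));
    return out;
}

// Structural check only: each lead byte must be followed by the right
// number of continuation bytes (10xxxxxx).
bool isWellFormedUtf8(const std::string &s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false; // stray continuation byte or invalid lead byte
        if (i + extra >= s.size()) return false; // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        i += extra + 1;
    }
    return true;
}
```

Calling `lowerBytewise("id_\xC3\xA4\xC3\x9F")` yields the bytes `id_` E3 A4 E3 FF, which matches the "from DB" line when shown in Windows-1252.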

It seems like the backend thinks the name is in ANSI encoding, not in UTF-8.
If I change the strCreate query and add double quotes around the column name, 
then the problem disappears. But the original name is already in lowercase, so 
I think it should also work without quoting the column name.
Am I missing some setup in either the database or in the use of libpq?
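
For reference, the quoting workaround can be sketched as follows. The helper names `quoteIdentifier` and `buildCreateTable` are hypothetical; the sketch wraps the name in double quotes and doubles any embedded double quote, which is the standard SQL rule for delimited identifiers. libpq also provides `PQescapeIdentifier` for this purpose, which is probably the better choice in real code because it takes the connection's encoding into account. Note that a quoted identifier is case-sensitive, so it would likely have to be quoted in later queries as well.

```cpp
#include <string>

// Hypothetical helper: wrap an identifier in double quotes,
// doubling any double quote that occurs inside the name.
std::string quoteIdentifier(const std::string &name) {
    std::string out = "\"";
    for (char c : name) {
        if (c == '"') out += '"'; // escape embedded quote by doubling it
        out += c;
    }
    out += '"';
    return out;
}

// Builds the CREATE TABLE statement from the post with a quoted column name.
// The argument is assumed to be UTF-8 already (e.g. the result of ToUtf8).
std::string buildCreateTable(const std::string &utf8Column) {
    return "create table test_umlaut(" + quoteIdentifier(utf8Column) +
           " integer primary key)";
}
```

With the code from the post, this would be used as `PQexec(pConn, buildCreateTable(ToUtf8(strName)).c_str())`.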

I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit.

The database uses:
ENCODING = 'UTF8'
LC_COLLATE = 'English_United Kingdom.1252'
LC_CTYPE = 'English_United Kingdom.1252'

Thanks for any help,

Martin
