> pg_mb2wchar_with_len() converts server encoded strings to pg_wchar
> strings. But pg_wchar is typedef'd as unsigned int which is not the
> same as wchar_t at least on Windows (unsigned short).
Oops. This is where the problem is: TParserInit allocates half as much memory as needed. This happens when sizeof(wchar_t) < sizeof(pg_wchar) and the C locale is used on a non-Windows box; on Windows, the encoding must also be non-UTF-8. So all p_is* functions are broken in this case, because they operate on wrong data.

> I modified it corresponding to the change in char2wchar() so that
> wchar2char(char2wchar(x)) becomes x. Though I'm not sure if it is
mbstowcs/wcstombs don't work with the C locale on other OSes either, so that's not needed.

> If there's an effective function like pg_wchar2mb_with_len() which
> converts wchar_t strings to server encoded strings, we had better
> simply call it for char2wchar().
I don't see a way to produce a correct result from char2wchar with the C locale and sizeof(wchar_t) = 2.

In summary, I suggest removing C-locale support from the char2wchar function; tsearch's parser should call pg_mb2wchar_with_len() directly in the case of the C locale and a multibyte encoding. Everywhere else, char2wchar is called only for non-C locales.

Please test the attached patch.

--
Teodor Sigaev                                   E-mail: teo...@sigaev.ru
                                                   WWW: http://www.sigaev.ru/

Attachment: clocale.patch.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
