Hello.

When a user tries to create a text search dictionary for the Russian language on Mac OS, the following error is raised:

  CREATE EXTENSION hunspell_ru_ru;
+ ERROR:  invalid byte sequence for encoding "UTF8": 0xd1
+ CONTEXT: line 341 of configuration file "/Users/stas/code/postgrespro2/tmp_install/Users/stas/code/postgrespro2/install/share/tsearch_data/ru_ru.affix": "SFX Y хаться шутся хаться

The Russian dictionary was downloaded from http://extensions.openoffice.org/en/project/slovari-dlya-russkogo-yazyka-dictionaries-russian . The affix and dictionary files were extracted from the archive and converted to UTF-8. A converted dictionary can also be downloaded from https://github.com/select-artur/hunspell_dicts/tree/master/ru_ru

This behavior occurs on:
- Mac OS X 10.10 Yosemite and Mac OS X 10.11 El Capitan.
- latest PostgreSQL version from git and PostgreSQL 9.5 (probably also on 9.4.5).

A test program that reproduces this bug is attached.

Has anyone encountered this bug? Is there a solution or a workaround?

Thanks in advance.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
#include <stdio.h>
#include <locale.h>

char *src = "SFX Y   хаться шутся        хаться";

int
main(int argc, char *argv[])
{
	char c1[1024], c2[1024], c3[1024], c4[1024], c5[1024];

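	/* Select a UTF-8 locale for character classification. */
	setlocale(LC_CTYPE, "ru_RU.UTF-8");

	/* Split the affix line into whitespace-separated fields. */
	sscanf(src, "%6s %204s %204s %204s %204s", c1, c2, c3, c4, c5);

	/* Print the fields separated by '/' so a wrong split is easy to spot. */
	printf("%s/%s/%s/%s/%s\n", c1, c2, c3, c4, c5);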

	return 0;
}
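
The test above prints the fields as text, which makes a split in the middle of a multibyte character hard to see. Below is a minimal sketch of a helper that prints each field as hex bytes instead. The names hex_dump and dump_fields are hypothetical and not part of the attached test; the sscanf format is copied from it. It assumes the source file is saved as UTF-8.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the bytes of s into out as space-separated hex ("d1 85 ...").
 * Returns the number of bytes dumped; stops early if out is too small.
 */
static size_t
hex_dump(const char *s, char *out, size_t outsize)
{
	size_t		len = strlen(s);
	size_t		pos = 0;
	size_t		i;

	assert(out != NULL && outsize > 0);
	out[0] = '\0';

	/* Each byte needs at most 3 characters plus the terminating NUL. */
	for (i = 0; i < len && pos + 4 <= outsize; i++)
		pos += (size_t) snprintf(out + pos, outsize - pos,
								 i ? " %02x" : "%02x",
								 (unsigned char) s[i]);
	return i;
}

/*
 * Split a line with the same sscanf format as the attached test and
 * print every field that was filled as hex bytes.
 * Returns the value returned by sscanf.
 */
static int
dump_fields(const char *line)
{
	char		f[5][205] = {{0}};
	char		hex[205 * 3 + 1];
	int			n;
	int			i;

	n = sscanf(line, "%6s %204s %204s %204s %204s",
			   f[0], f[1], f[2], f[3], f[4]);

	for (i = 0; i < n; i++)
	{
		hex_dump(f[i], hex, sizeof(hex));
		printf("field %d: %s\n", i + 1, hex);
	}
	return n;
}
```

Calling dump_fields(src) from main() in the attached test, after the setlocale() call, shows where each field ends. In UTF-8 the word "хаться" is the byte sequence d1 85 d0 b0 d1 82 d1 8c d1 81 d1 8f, so a field that dumps as a lone d1 would match the 0xd1 reported in the error message.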
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
