On Wed, Jun 22, 2011 at 3:01 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> Hello Peter
>
> > Pavel suggested using a collation of ucs_basic, but I get an error when I
> > try that on linux:
> > $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
> > createdb: database creation failed: ERROR: invalid locale name ucs_basic
>
> isn't this a bug in collations?
>

The more I read about this, the more this appears to be the case. It looks like the SQL standard has some baseline collations that are required, and it isn't at all clear how one would access those in postgres if the host in question doesn't have those locales defined. UCS_BASIC is a SQL collation, but it doesn't appear to have an explicit definition on a 'standard' linux host (CentOS 5, in my case).

There is another SQL collation called 'UNICODE', which is supposed to obey the Unicode Collation Algorithm with the Default Unicode Collation Element Table defined in Unicode10. It looks like that collation is relatively sensitive to language-specific sort orders, though it isn't a required collation in the SQL standard. I suspect that it is the UNICODE collation which would be the most 'sensible' within the context of this discussion: characters in expected order, spaces honoured, case sensitive.

I have so little experience with localization that I'm not sure if I'm reading this all correctly, though.
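
For comparison, here is a rough sketch of what a code point ordering looks like, which is how I understand the SQL standard to describe UCS_BASIC (ordering determined by the Unicode scalar values of the characters). It is plain Python and does not touch postgres at all; the sample strings are made up for illustration, and I am assuming Python's built-in string comparison is a fair stand-in for that ordering.

```python
# Sketch only: approximates a UCS_BASIC-style (code point) ordering.
# The sample strings are hypothetical, chosen to show case and space handling.
samples = ["ab", "a b", "B", "a", "é"]

# Python compares str values code point by code point, so plain sorted()
# orders uppercase before lowercase and a space before any letter.
by_code_point = sorted(samples)

# Sorting the UTF-8 encoded bytes gives the same order, because UTF-8
# preserves code point order under bytewise comparison.
by_utf8_bytes = sorted(samples, key=lambda s: s.encode("utf-8"))

print(by_code_point)
print(by_code_point == by_utf8_bytes)
```

Note that this ordering puts "B" ahead of "a" and sorts accented letters after all of ASCII, so it is case sensitive and honours spaces but is not the language-aware ordering that the UNICODE collation is supposed to give.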