Radek Strnad wrote:
Progress so far:
- created catalogs pg_collation and pg_charset which are filled with three
standard collations
- initdb changes rows called "DEFAULT" in both catalogs during the bki
bootstrap phase with current system LC_COLLATE and LC_CTYPE or those set by
command line.
- new collations can be defined with command CREATE COLLATION <collation
name> FOR <character set specification>  FROM <existing collation name>
[STRCOLFN <fn name>]
[ <pad characteristic> ] [ <case sensitive> ] [ LCCOLLATE <lc_collate> ] [
LCCTYPE <lc_ctype> ]
- because pg_collation and pg_charset are per-database catalogs, if you
want to create a database with a collation other than those specified,
create the collation in template1 and then create the database

I have to wonder, is all that really necessary? The feature you're trying to implement is to support database-level collation at first, and perhaps column-level collation later. We don't need support for user-defined collations and charsets for that.

If we leave all that out of the patch for now, we'll have a much slimmer and just as useful patch that implements database-level collation. We can add those catalogs later if we need them, but I don't think there's much point in adding all that infrastructure if they just reflect the locales installed in the operating system.

- when connecting to database, it retrieves locales from pg_database and
sets them

This is the real gist of this patch.
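
As a rough illustration of that step (Python rather than backend code), a connection could read the two locale names stored for the database and apply them with setlocale. The function name apply_database_locale and the row keys "lc_collate" and "lc_ctype" are hypothetical stand-ins, not names taken from the patch:

```python
import locale


def apply_database_locale(db_row: dict) -> None:
    """Apply the locale settings stored for a database.

    db_row is a hypothetical stand-in for the database's pg_database row;
    the key names are assumptions made for this sketch.
    """
    locale.setlocale(locale.LC_COLLATE, db_row["lc_collate"])
    locale.setlocale(locale.LC_CTYPE, db_row["lc_ctype"])


# The "C" locale is always available, so it is used here as the example value.
apply_database_locale({"lc_collate": "C", "lc_ctype": "C"})
```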

Design & functionality changes left:
- move retrieving collation from pg_database to pg_type

I don't understand this item. What will you move?

- get case sensitivity and pad characteristic working

I feel we should leave this to the collation implementation.

- when creating database with different collation than database cluster, the
database has to be reindexed. Any idea how to do it? Function
ReindexDatabase works only when database is opened.

That's a tricky one. One idea is to prohibit choosing a different collation than the one in the template database, unless we know it's safe to do so without reindexing. The problem is that we don't know whether it's safe. A simple but limiting solution would be to require that the template database has the same collation as the database that's being created, except that template0 can always be used as a template. template0 is safe, because there are no indexes on text columns there.
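
The restriction described above can be sketched as a small check. This is only an illustration in Python, not PostgreSQL code; the function name template_collation_ok and its parameters are made up for the example:

```python
def template_collation_ok(template_name: str,
                          template_collation: str,
                          new_collation: str) -> bool:
    """Return True if a database with new_collation may be cloned from the template.

    Hypothetical helper illustrating the proposed rule; it is not an
    existing PostgreSQL function.
    """
    # template0 has no indexes on text columns, so any collation is safe.
    if template_name == "template0":
        return True
    # Otherwise the indexes copied from the template stay valid only if
    # the sort order does not change.
    return template_collation == new_collation
```

With this rule, cloning template1 (collation "C") into a "cs_CZ.UTF-8" database would be rejected, while the same request against template0 would be allowed.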

Note that we already have the same problem with encodings. If you create a database with LATIN1 encoding, load it with data, and then use that as a template for a database with UTF-8 encoding, the text data will be incorrectly encoded. We should probably fix that too.
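
A small illustration of that encoding mismatch, again in Python rather than PostgreSQL code: bytes written as LATIN1 are in general not valid UTF-8, so copying them unchanged into a UTF-8 database leaves text that cannot be decoded. The helper name valid_in_encoding is made up for the example:

```python
def valid_in_encoding(raw: bytes, encoding: str) -> bool:
    """Return True if raw can be decoded using the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Bytes as they would be stored in the LATIN1 template database.
stored = "caf\u00e9".encode("latin-1")

# The same bytes are fine as LATIN1 but are not a valid UTF-8 sequence.
latin1_ok = valid_in_encoding(stored, "latin-1")
utf8_ok = valid_in_encoding(stored, "utf-8")
```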

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)