Re: [HACKERS] WIP patch: Collation support

Heikki Linnakangas Wed, 10 Sep 2008 01:32:07 -0700

Radek Strnad wrote:

Progress so far:
- created catalogs pg_collation and pg_charset which are filled with three
standard collations
- initdb changes rows called "DEFAULT" in both catalogs during the bki
bootstrap phase with current system LC_COLLATE and LC_CTYPE or those set by
command line.
- new collations can be defined with command CREATE COLLATION <collation name>
FOR <character set specification> FROM <existing collation name>
[ STRCOLFN <fn name> ] [ <pad characteristic> ] [ <case sensitive> ]
[ LCCOLLATE <lc_collate> ] [ LCCTYPE <lc_ctype> ]
- because pg_collation and pg_charset are catalogs individual to each
database, if you want to create a database with a collation other than
specified, create it in template1 and then create the database

I have to wonder, is all that really necessary? The feature you're trying to implement is to support database-level collation at first, and perhaps column-level collation later. We don't need support for user-defined collations and charsets for that.

If we leave all that out of the patch for now, we'll have a much slimmer, and just as useful, patch implementing database-level collation. We can add those catalogs later if we need them, but I don't think there's much point in adding all that infrastructure if they just reflect the locales installed in the operating system.

- when connecting to database, it retrieves locales from pg_database and
sets them


This is the real gist of this patch.

Design & functionality changes left:
- move retrieving collation from pg_database to pg_type


I don't understand this item. What will you move?

- get case sensitivity and pad characteristic working


I feel we should leave this to the collation implementation.

- when creating database with different collation than database cluster, the
database has to be reindexed. Any idea how to do it? Function
ReindexDatabase works only when database is opened.

That's a tricky one. One idea is to prohibit choosing a different collation than the one in the template database, unless we know it's safe to do so without reindexing. The problem is that we don't know whether it's safe. A simple but limiting solution would be to require that the template database has the same collation as the database that's being created, except that template0 can always be used as template. template0 is safe, because there are no indexes on text columns there.
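
To make the proposed rule concrete, here is a minimal sketch of the check in Python. It is only a model of the logic, not backend code: the function name and its three string arguments are hypothetical, and a real implementation would live in the CREATE DATABASE code path and read the template's collation from pg_database.

```python
def template_collation_allowed(template_name, template_collation, requested_collation):
    """Model of the proposed rule (all names here are hypothetical).

    A new database may use `requested_collation` only if it matches the
    template's collation, except that template0 is always acceptable
    because it has no indexes on text columns that could become invalid.
    """
    if template_name == "template0":
        return True
    return template_collation == requested_collation


# template0 may be copied with any collation; other templates must match.
print(template_collation_allowed("template0", "C", "cs_CZ.UTF-8"))
print(template_collation_allowed("template1", "C", "cs_CZ.UTF-8"))
```

The rule is deliberately conservative: it rejects some combinations that would in fact be safe, in exchange for never needing to reindex a database that isn't open.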

Note that we already have the same problem with encodings. If you create a database with LATIN1 encoding, load it with data, and then use that as a template for a database with UTF-8 encoding, the text data will be incorrectly encoded. We should probably fix that too.
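
The encoding problem is easy to see outside the database. The following Python sketch (the helper name is made up for illustration) shows that the bytes stored for a non-ASCII character under LATIN1 are not valid UTF-8, which is roughly what a UTF-8 database copied from a LATIN1 template would be left holding:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+00E9 (e with acute accent) as it would be stored under each encoding.
latin1_bytes = "\u00e9".encode("latin-1")  # single byte 0xE9
utf8_bytes = "\u00e9".encode("utf-8")      # two bytes 0xC3 0xA9

# The LATIN1 byte sequence is rejected when reinterpreted as UTF-8.
print(is_valid_utf8(latin1_bytes), is_valid_utf8(utf8_bytes))
```

Pure ASCII data would survive such a copy unchanged, which is why the problem can go unnoticed until non-ASCII text is read back.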


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
