"PFC" <[EMAIL PROTECTED]> writes: >> Hi. I have a few databases created with UNICODE encoding, and I would like to >> be able to search with accent insensitivity. There's something in Oracle >> (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do this, but I found >> nothing in PostgreSQL, just the 'to_ascii' function, which AFAIK, doesn't >> work with UNICODE.
Postgres supports localized collation orders, but what it doesn't
currently support is having multiple collation orders within the same
server. When you initialize the server with initdb, it takes the
setting of LC_COLLATE (usually from LC_ALL) and stores that
permanently. If you initdb with LC_COLLATE set to a locale like
en_GB.UTF-8, you may find one that has the behaviour you want. I don't
think any will be entirely accent-insensitive, but they'll consider
accents only if the rest of the string is identical.

You can test the sort order of a locale by writing a file of sample
words and sorting it with something like:

  LC_ALL=en_GB.UTF-8 sort /tmp/data

> The easiest way is to create an extra column which will hold a copy
> of your text, with all accents removed. You can also convert it to
> lowercase and remove apostrophes, punctuation etc. Said column is
> kept up to date with a trigger.

That's another alternative, which is useful if you need multiple
collations in your database. It gives you control over which collation
is used when, and over exactly what the rules are (see the sketch
below). The downside is that you have to reinvent the collation rules
which the localized collations already provide.

You don't necessarily have to keep a column in your table with the
normalized strings. You can normalize "on the fly" using an expression
index, as long as your function always returns the same data given the
same inputs (and is therefore marked "immutable").
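For concreteness, here is a minimal sketch of the extra-column
approach. The table, column, and function names (messages, body,
body_norm, normalize_text) are hypothetical, and the translate() call
only covers a handful of accented characters; a real mapping would
need to be much more complete:

  -- Hypothetical normalization: strip some accents and lowercase.
  -- Marked IMMUTABLE so it can also back an expression index.
  CREATE FUNCTION normalize_text(text) RETURNS text AS $$
    SELECT lower(translate($1,
        'áàâäéèêëíìîïóòôöúùûüç',
        'aaaaeeeeiiiioooouuuuc'));
  $$ LANGUAGE sql IMMUTABLE;

  CREATE TABLE messages (
      id        serial PRIMARY KEY,
      body      text,
      body_norm text
  );

  -- Trigger keeps the normalized copy in sync on INSERT and UPDATE.
  CREATE FUNCTION messages_norm_trig() RETURNS trigger AS $$
  BEGIN
      NEW.body_norm := normalize_text(NEW.body);
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER messages_norm
      BEFORE INSERT OR UPDATE ON messages
      FOR EACH ROW EXECUTE PROCEDURE messages_norm_trig();

Searches then compare normalized values, e.g.
WHERE body_norm = normalize_text('Résumé').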
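And a sketch of the expression-index variant, reusing the same
hypothetical normalize_text() function. No extra column or trigger is
needed; because the function is IMMUTABLE, the planner can use the
index for queries written with the same expression:

  CREATE INDEX messages_body_norm_idx
      ON messages (normalize_text(body));

  SELECT * FROM messages
   WHERE normalize_text(body) = normalize_text('Résumé');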
-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com