There are actually very many customizations to collations that are possible beyond what the "stock" ICU collations provide (whatever "stock" means). Some of these are really cool, and I can imagine use cases where they are very compelling that have nothing to do with internationalization (such customizations are how we should eventually implement case-insensitive collations, once the infrastructure for doing that without breaking hashing is in place).
I'd like to give a demo on what is already possible, but not currently documented. I didn't see anyone else comment on this, including Peter E (maybe I missed that?). We should improve the documentation in this area, to get this into the hands of users. Say we're unhappy that numbers come first, which we see here: postgres=# select * from (select '1a' i union select '1b' union select '1c' union select 'a1' union select 'b2' union select 'c3') j order by i collate "en-x-icu"; i ──── 1a 1b 1c a1 b2 c3 (6 rows) We may do this to get our desired sort order: postgres=# create collation digitlast (provider=icu, locale='en-u-kr-latn-digit'); CREATE COLLATION postgres=# select * from (select '1a' i union select '1b' union select '1c' union select 'a1' union select 'b2' union select 'c3') j order by i collate "digitlast"; i ──── a1 b2 c3 1a 1b 1c (6 rows) Note that 'kr' is a specific BCP47 Key [1]. Many different options can be set in this manner. Let's say we are unhappy with the fact that capital letters sort higher than lowercase: postgres=# select * from (select 'B' i union select 'b' union select 'A' union select 'a') j order by i collate "en-x-icu"; i ─── a A b B (4 rows) ICU provides a solution here, too: postgres=# create collation capitalfirst (provider=icu, locale='en-u-kf-upper'); CREATE COLLATION postgres=# select * from (select 'B' i union select 'b' union select 'A' union select 'a') j order by i collate "capitalfirst"; i ─── A a B b (4 rows) And, yes -- you can even *combine* these two options by creating a third custom collation. That can be spelled 'en-u-kf-upper-kr-latn-digit', in case you were wondering. Users have certainly complained about not liking this or that aspect of how glibc sorts text many times over the years, particularly around things like how whitespace and punctuation are handled, which they can now do something about [2]. Users can also have numbers sort like numbers should when compared against other numbers, by using the numericOrdering option (not shown). numericOrdering would be great for things like alphanumeric invoice numbers, or the alphanumeric car registration plate numbers that are used in certain countries [3], with fixed number/letter fields. These options are very powerful. [1] http://unicode.org/reports/tr35/tr35-collation.html#Setting_Options [2] http://unicode.org/reports/tr35/tr35-collation.html#Common_Settings [3] https://en.wikipedia.org/wiki/Vehicle_registration_plates_of_the_Republic_of_Ireland -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers