On Mon, Aug 14, 2017 at 9:15 AM, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> I'm having trouble finding some concrete documentation for this. The TR 35
> link you showed documents the key words and values, BCP 47 documents the
> syntax, but nothing puts it all together in a form consumable by users.
> The ICU documentation still mainly focuses on the "old" @keyword=value
> syntax. I guess we'll have to write our own for now.
There is an unusual style to the standards that apply here. They are incredibly detailed, and the options are very powerful, but they are written in an unfamiliar language. ICU just considers itself a consumer of the CLDR locale data, which is a broad standard.

We don't have to comprehensively document the kn/kb/ka/kh options that I pointed out. I think it would be nice to cover a few interesting cases, and to link to the BCP 47 Unicode extension (TR 35) material.

Here is a list of scripts, all of which can be reordered using the TR 35 options (the exact set varies somewhat with the CLDR/ICU version):

http://unicode.org/iso15924/iso15924-codes.html

Here is a CLDR-specific XML specification of the variant keywords (it can easily be mapped to a specific ICU version):

http://www.unicode.org/repos/cldr/tags/release-31/common/bcp47/collation.xml

> Given that we cannot reasonably preload all these new variants that you
> demonstrated, I think it would make sense to drop all the keyword
> variants from the preloaded set.

Cool. While I am of course in favor of this, I understand very well why you had initdb add them. I think that removing them creates a discoverability problem that cannot easily be fixed through documentation. ISTM that we ought to also add an SQL-callable function that lists the most common keyword variants. Some of those are specific to one or two locales, such as traditional Spanish or the alternative sort orders for Han characters. What do you think of that idea?

I guess an alternative is to just link to that XML document (collation.xml), which specifies the variants exactly. Users can get the "co" variants there. It should be mostly obvious which variant is relevant to which locale, since there aren't that many "co" variants to choose from, and users will probably know what to look for if they look at all.
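To make the BCP 47 syntax discussed above a little more concrete, here is a minimal sketch that pulls the Unicode extension keywords (the subtags after "-u-", such as co, kn, ka, kb, kh or kr) out of a language tag. The function name parse_unicode_extension is hypothetical: it is not part of ICU or PostgreSQL (ICU does its own parsing), and it assumes only the simple key/type form from TR 35, skipping any attributes and treating a key with no type as "true".

```python
from __future__ import annotations


def parse_unicode_extension(tag: str) -> dict[str, str]:
    """Hypothetical helper: return the -u- keyword/type pairs of a BCP 47 tag.

    Simplified reading of the TR 35 syntax: 2-character subtags are keys,
    longer subtags are types belonging to the preceding key, and a key
    with no type means "true".
    """
    subtags = tag.lower().split("-")
    keywords: dict[str, str] = {}

    # Locate the "u" singleton; anything after "x" is private use.
    start = None
    for i, sub in enumerate(subtags):
        if sub == "x":
            break
        if sub == "u":
            start = i + 1
            break
    if start is None:
        return keywords  # no Unicode extension present

    key = None
    values: list[str] = []
    for sub in subtags[start:]:
        if len(sub) == 1:
            break  # a singleton starts another extension or private use
        if len(sub) == 2:
            if key is not None:
                keywords[key] = "-".join(values) or "true"
            key, values = sub, []
        elif key is not None:
            values.append(sub)
        # Subtags before the first key are attributes; ignored here.
    if key is not None:
        keywords[key] = "-".join(values) or "true"
    return keywords
```

For example, parse_unicode_extension("es-u-co-trad") returns {"co": "trad"} (traditional Spanish), and parse_unicode_extension("und-u-kr-latn-digit") returns {"kr": "latn-digit"}, a script reordering using codes from the ISO 15924 list.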
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers