Re: Character expansion with ICU collations

Peter Eisentraut Fri, 11 Jun 2021 13:30:10 -0700

On 11.06.21 22:05, Finnerty, Jim wrote:

     You can have these queries return both rows if you use an
     accent-ignoring collation, like this example in the documentation:

     CREATE COLLATION ignore_accents (provider = icu, locale =
     'und-u-ks-level1-kc-true', deterministic = false);
<<

Indeed.  Is the dependency between the character expansion capability and 
accent-insensitive collations documented anywhere?

The above is merely a consequence of what the default collation elements
for 'ß' are.

Expansion isn't really a relevant concept in collation.  Any character
can map to 1..N collation elements.  The collation algorithm doesn't
care how many it is.
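
As a rough illustration of the point above (a sketch only, assuming
PostgreSQL 12 or later built with ICU, and reusing the ignore_accents
definition quoted earlier), a one-character string can be compared with
a two-character string under that collation.  The column alias is made
up for this example.

```sql
-- Collation from the documentation example quoted earlier in this message.
CREATE COLLATION ignore_accents (provider = icu,
    locale = 'und-u-ks-level1-kc-true', deterministic = false);

-- Compare one character against two characters.  The comparison works on
-- collation elements, so the differing string lengths do not matter.
SELECT 'ß' = 'ss' COLLATE ignore_accents AS sharp_s_equals_ss;
```

If the collation already exists, skip the CREATE COLLATION statement.
Under a deterministic collation the same comparison is false, because
strings that are not byte-wise equal are never considered equal there.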

Can a CI collation be ordered upper case first, or is this a limitation of ICU?

I don't know the authoritative answer to that, but to me it doesn't make
sense, since the effect of a case-insensitive collation is to throw away
the third-level weights, so there is nothing left for "upper case first"
to operate on.
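
One way to check this on a given installation might look like the
following.  This is a sketch, not an authoritative test: the collation
name ci_upper_first and the locale string combining ks-level2 with
kf-upper are assumptions made for this example, and the behavior may
depend on the ICU version in use.

```sql
-- Hypothetical collation: secondary strength (case ignored) combined
-- with an "upper case first" request.
CREATE COLLATION ci_upper_first (provider = icu,
    locale = 'und-u-ks-level2-kf-upper', deterministic = false);

-- Does the collation treat the two cases as equal?
SELECT 'a' = 'A' COLLATE ci_upper_first AS equal_ignoring_case;

-- Inspect the order actually produced for mixed-case input.
SELECT w
FROM (VALUES ('a'), ('A'), ('b'), ('B')) AS t(w)
ORDER BY w COLLATE ci_upper_first;
```

If the first query reports that 'a' and 'A' are equal, the collation
does not determine their relative order in the second query, which is
the situation described above.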

More generally, is there any interest in leveraging the full power of ICU 
tailoring rules to get whatever order someone may need, subject to the 
limitations of ICU itself?  What would be required to extend CREATE COLLATION 
to accept an optional sequence of tailoring rules that we would store in the 
pg_collation catalog and apply along with the modifiers in the locale string?

yes
