Re: Character expansion with ICU collations

Finnerty, Jim Fri, 11 Jun 2021 13:05:55 -0700

>>
    You can have these queries return both rows if you use an
    accent-ignoring collation, like this example in the documentation:


    CREATE COLLATION ignore_accents (provider = icu, locale =
    'und-u-ks-level1-kc-true', deterministic = false);
<<

Indeed.  Is the dependency between the character expansion capability and 
accent-insensitive collations documented anywhere?

Another unexpected dependency appears to be @colCaseFirst=upper.  If specified 
in combination with colStrength=secondary, it appears that the upper/lower case 
ordering is random within a group of characters that are secondary equal, e.g. 
'A' < 'a', but 'b' < 'B', 'c' < 'C', ...  , but then 'L' < 'l'.  It is not even 
consistently ordered with respect to case.  If I make it a nondeterministic 
CS_AI collation, then it sorts upper before lower consistently.  The rule seems 
to be that you can't sort by case within a group that is case-insensitive. 

Can a CI collation be ordered upper case first, or is this a limitation of ICU?

For example, this is part of the sort order that I'd like to achieve with ICU, 
with the code point in column 1 and dense_rank() shown in the rightmost column 
indicating that 'b' = 'B', for example:

66      B       B       138     151
98      b       b       138     151   <- so within a group that is CI_AS equal, 
the sort order needs to be upper case first
67      C       C       139     152
99      c       c       139     152
199     Ç       Ç       140     153
231     ç       ç       140     153
68      D       D       141     154
100     d       d       141     154
208     Ð       Ð       142     199
240     ð       ð       142     199
69      E       E       143     155
101     e       e       143     155

Can this sort order be achieved with ICU?

More generally, is there any interest in leveraging the full power of ICU 
tailoring rules to get whatever order someone may need, subject to the 
limitations of ICU itself?  what would be required to extend CREATE COLLATION 
to accept an optional sequence of tailoring rules that we would store in the 
pg_collation catalog and apply along with the modifiers in the locale string?

    /Jim

Re: Character expansion with ICU collations

Reply via email to