On 21.03.24 01:13, Jeff Davis wrote:
The v26 patch was not quite complete, so I didn't commit it yet.
Attached v27-0001 and 0002.
0002 is necessary because otherwise lc_collate_is_c() short-circuits
the version check in pg_newlocale_from_collation(). With 0002, the code
is simpler and all paths go through pg_newlocale_from_collation(), and
the version check happens even when lc_collate_is_c().
But perhaps there was a reason the code was the way it was, so
submitting for review in case I missed something.
0005 and 0006 don't contain any test cases. So I guess they are really
only usable via 0007. Is that understanding correct?
0005 is not a functional change; it's just a refactoring to use a
callback, in preparation for 0007.
Are there any test cases that illustrate the word boundary changes in
patch 0005? It might be useful to test those against Oracle as well.
The tests include initcap('123abc') which is '123abc' in the PG_C_UTF8
collation vs '123Abc' in PG_UNICODE_FAST.
The reason for the latter behavior is that the Unicode Default Case
Conversion algorithm for toTitlecase() advances to the next Cased
character before mapping to titlecase, and digits are not Cased. ICU
has a configurable adjustment, and defaults in a way that produces
'123abc'.
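
To make the difference concrete, here is a rough Python sketch of the two
behaviors. This is not the patch code: the function names are made up for
illustration, a "word" is assumed to be a run of alphanumeric characters,
and is_cased() only approximates the Unicode Cased property.

```python
import unicodedata


def is_cased(ch):
    # Approximation of the Unicode "Cased" property (assumption: good
    # enough for illustration, not an exact match for the property).
    return ch.isupper() or ch.islower() or unicodedata.category(ch) == "Lt"


def initcap_first_alnum(s):
    # Titlecase the first character of each alphanumeric run and
    # lowercase the rest. A leading digit absorbs the "titlecase" step.
    out = []
    in_word = False
    for ch in s:
        if ch.isalnum():
            out.append(ch.lower() if in_word else ch.title())
            in_word = True
        else:
            out.append(ch)
            in_word = False
    return "".join(out)


def initcap_first_cased(s):
    # Within each alphanumeric run, skip ahead to the first Cased
    # character, titlecase it, and lowercase everything after it.
    out = []
    in_word = False
    seen_cased = False
    for ch in s:
        if not ch.isalnum():
            out.append(ch)
            in_word = False
            continue
        if not in_word:
            in_word = True
            seen_cased = False
        if seen_cased:
            out.append(ch.lower())
        elif is_cased(ch):
            out.append(ch.title())
            seen_cased = True
        else:
            out.append(ch)  # uncased prefix (e.g. digits) is left alone
    return "".join(out)


print(initcap_first_alnum("123abc"))
print(initcap_first_cased("123abc"))
```

The first function corresponds to the '123abc' result and the second to
the '123Abc' result; for input such as 'hello world' both return the same
string.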
New rebased series attached.
The patch set v27 is ok with me, modulo (a) discussion about initcap
semantics, and (b) what collation to assign to ucs_basic, which can be
revisited later.