Re: Built-in CTYPE provider

Peter Eisentraut Fri, 22 Mar 2024 07:52:05 -0700

On 21.03.24 01:13, Jeff Davis wrote:

Are there any test cases that illustrate the word boundary changes in
patch 0005?  It might be useful to test those against Oracle as well.

The tests include initcap('123abc') which is '123abc' in the PG_C_UTF8
collation vs '123Abc' in PG_UNICODE_FAST.


The reason for the latter behavior is that the Unicode Default Case
Conversion algorithm for toTitlecase() advances to the next Cased
character before mapping to titlecase, and digits are not Cased. ICU
has a configurable adjustment, and defaults in a way that produces
'123abc'.
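
For illustration only, here is a minimal Python sketch of the two
behaviors being compared. It is not the PostgreSQL or ICU code: the
function names are made up, words are assumed to be runs of
alphanumeric characters, and "Cased" is approximated as
upper() != lower().

```python
def initcap_first_char(s: str) -> str:
    """Map the first character of each word; lowercase the rest.

    A leading digit is "mapped" to itself, so no letter after it is
    capitalized.
    """
    out = []
    at_word_start = True
    for ch in s:
        if ch.isalnum():
            out.append(ch.upper() if at_word_start else ch.lower())
            at_word_start = False
        else:
            out.append(ch)
            at_word_start = True
    return "".join(out)


def initcap_first_cased(s: str) -> str:
    """Advance to the first cased character of each word, then titlecase it."""
    out = []
    need_title = True
    for ch in s:
        if ch.isalnum():
            # Rough stand-in for the Unicode "Cased" property (assumption).
            is_cased = ch.upper() != ch.lower()
            if need_title and is_cased:
                out.append(ch.title())
                need_title = False
            else:
                out.append(ch.lower())
        else:
            out.append(ch)
            need_title = True
    return "".join(out)


if __name__ == "__main__":
    print(initcap_first_char("123abc"))
    print(initcap_first_cased("123abc"))
```

The two functions agree on inputs such as 'hello world' and differ
only when a word starts with uncased characters such as digits.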

I think this might be too big of a compatibility break. So far,
initcap('123abc') has always returned '123abc'. If the new collation
returns '123Abc' now, then that's quite a change. These are not some
obscure Unicode special case characters, after all.

What is the ICU configuration incantation for this? Maybe we could have
the builtin provider understand some of that, too.


Or we should create a function separate from initcap.
