On Thu, 2023-10-26 at 09:21 -0700, Jeff Davis wrote:
> Our initcap() is not defined in the standard, and we document that it
> only differentiates between alphanumeric and non-alphanumeric
> characters, so we could get that behavior pretty easily as well. If we
> wanted to do it the Unicode way instead, we can follow the
> toTitlecase() part of the Default Case Algorithm, which is based on
> word breaks and would require another lookup table for that.

Correction: the rules for word breaks are fairly complex, so it would
not be worth it to try to replicate that just to support initcap(). We
could just use the simple, existing, and documented rules for initcap(),
which only differentiate between alphanumeric and non-alphanumeric
characters. Anyone who wants the more sophisticated rules can just use
an ICU collation with initcap().

The point stands that it would be pretty simple to have a collation
that handles upper() and lower() in a standards-compliant way without
relying on libc or ICU. Unfortunately it's too late to call that
collation UCS_BASIC, but it would still be useful.

Regards,
	Jeff Davis
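
For illustration only, the simple initcap() rule discussed above
(uppercase the first character of each alphanumeric run, lowercase the
rest, pass everything else through) could be sketched roughly as below.
This is a Python sketch under stated assumptions, not PostgreSQL's C
implementation: the name simple_initcap is hypothetical, and Python's
str.isalnum() stands in for whatever character classification the
collation would actually use.

```python
def simple_initcap(text: str) -> str:
    """Sketch of the alphanumeric/non-alphanumeric initcap rule.

    Hypothetical helper for illustration; str.isalnum() is an assumed
    stand-in for the collation's character classification.
    """
    out = []
    at_word_start = True  # the next alphanumeric character starts a word
    for ch in text:
        if ch.isalnum():
            # First character of a run is uppercased, the rest lowercased.
            out.append(ch.upper() if at_word_start else ch.lower())
            at_word_start = False
        else:
            # Any non-alphanumeric character ends the current word.
            out.append(ch)
            at_word_start = True
    return "".join(out)


print(simple_initcap("hello wORLD"))
print(simple_initcap("o'neil foo-bar"))
```

Note that no word-break table is involved: an apostrophe or hyphen simply
starts a new word, which is exactly the behavior the Unicode word-break
rules would refine.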