On 17/06/2025 20:14, Jeff Davis wrote:
On Tue, 2025-06-17 at 17:37 +0200, Vik Fearing wrote:
If the character set of <character factor> is UTF8, UTF16, or UTF32,
then FR is replaced by
Case:
i) If the <search condition> S IS NORMALIZED evaluates to
True, then NORMALIZE (FR)
ii) Otherwise, FR.
I read that as "if the input is normalized, then the output should be
normalized", IOW preserve the normalization. But does it mean "preserve
whatever the input normal form is" or "preserve NFC if the input is
NFC, otherwise the normalization is undefined"?
The above wording seems to mean "preserve NFC if the input is NFC",
because that's what NORMALIZE(FR) does when the normal form is
unspecified.
Yes, and that is also the default for <normalized predicate>.
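
For illustration, the rule quoted above could be emulated by hand roughly
like this. It is only a sketch: it assumes a UTF8 database built with ICU,
uses the same test string that appears elsewhere in this thread, and the
column aliases are made up.

```sql
-- Sketch of the standard's rule applied to UPPER: if the input is
-- NFC-normalized, normalize the folded result; otherwise return the
-- folded result unchanged.
-- Assumes a UTF8 database with ICU; the column aliases are hypothetical.
WITH s(t) AS
  (SELECT NORMALIZE(U&'\00C1\00DF\0301' COLLATE "en-US-x-icu"))
SELECT t IS NORMALIZED        AS input_is_nfc,   -- NFC is the default form
       UPPER(t) IS NORMALIZED AS upper_is_nfc,
       CASE WHEN t IS NORMALIZED
            THEN NORMALIZE(UPPER(t))
            ELSE UPPER(t)
       END                    AS standard_upper
FROM s;
```

The CASE expression mirrors the two branches in the standard's text, so the
normalization step is only paid when the input was already NFC.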
It does not appear to me that our LOWER and UPPER functions obey this
rule,
You are correct:
WITH s(t) AS
(SELECT NORMALIZE(U&'\00C1\00DF\0301' COLLATE "en-US-x-icu"))
SELECT UPPER(t) = NORMALIZE(UPPER(t)) FROM s;
 ?column?
----------
 f
so there is a valid argument that we should continue to ignore it.
Or, we can say that we have at least one of three compliant.
What do other databases do?
I don't know. I am just pointing out what the Standard says. I think
we should either comply, or say that we don't do it for LOWER and UPPER,
so let's keep things implementation-consistent.
Given how costly normalization can be, imposing that on every caller
seems like a bit much.
How much does it cost to check for NFC? I honestly don't know the
answer to that question, but that is the only case where we need to
maintain normalization.
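
One rough way to get a feel for that cost would be to time the NFC check
against a full normalization over the same data. This is a sketch only,
assuming a UTF8 database; the table name, row count, and string length are
arbitrary, and the numbers will vary by build and hardware.

```sql
-- Build 100000 strings that are already in NFC (sizes are arbitrary;
-- norm_test is a hypothetical scratch table).
CREATE TEMP TABLE norm_test AS
SELECT repeat(U&'\00C1\00DF\0301', 50) || g::text AS s
FROM generate_series(1, 100000) AS g;

-- Cost of only checking whether each string is NFC.
EXPLAIN ANALYZE
SELECT count(*) FROM norm_test WHERE s IS NFC NORMALIZED;

-- Cost of actually normalizing each string, for comparison.
EXPLAIN ANALYZE
SELECT count(*) FROM norm_test WHERE NORMALIZE(s, NFC) = s;
```

Comparing the two execution times gives a first approximation of what the
check alone would add to a case-folding call on already-normalized input.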
And favoring NFC for the user unconditionally
might not be the best thing. Then again, NFC is good most of the time,
and there are patches to speed up normalization.
It's not unconditional; it applies only if the input was NFC.
I tend to think that a lot of users who want casefolding would also
want normalization, but it's hard to weigh that against the performance
cost. It might not matter outside of a few edge cases, though I'm not
sure exactly how many.
I defer to you and others in the thread to make this decision.
--
Vik Fearing