On 05.06.25 21:56, Jeff Davis wrote:
> On Thu, 2025-06-05 at 10:12 +0200, Peter Eisentraut wrote:
>> The reason we don't do it at parse time is that we don't have the
>> information which functions care about collations, which is exactly
>> what you are proposing here to add.
>
> Currently, we have:
>
>   create table c(x text collate "C", y text collate "en_US");
>   insert into c values ('x', 'y');
>   select x < y from c; -- fails (runtime check)
>   select x || y from c; -- succeeds
>
> Surely, "<" would be marked as ordering-sensitive, and we could move
> the error to parse-time.
>
> But what about UDFs? If we assume that all UDFs are ordering-sensitive
> unless marked otherwise, then a user-defined version of "||" that
> previously worked would now start failing, until they add the
> ordering-insensitive mark.

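To make that case concrete, here is a minimal sketch of such a user-defined concatenation, reusing table c from the example. The names my_concat and ||| are made up for illustration, and the comments describe what I would expect from the current runtime-only check rather than a verified result.

```sql
-- Hypothetical user-defined concatenation; my_concat and ||| are
-- illustrative names, not existing objects.
create function my_concat(a text, b text) returns text
    language sql immutable
    as $$ select a || b $$;

create operator ||| (leftarg = text, rightarg = text, function = my_concat);

-- Expected to succeed today, like the built-in ||: nothing in the
-- function body needs a collation, so no runtime check is reached.
select x ||| y from c;
```

Under the "ordering-sensitive unless marked otherwise" assumption, that last query would be rejected at parse time until my_concat carries the ordering-insensitive mark.
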
I think no matter how we slice it, there is going to be some case that
will be degraded until some update is applied. I would be content to
accept this particular variant, because it doesn't seem very realistic.
Why would a user define their own concatenation function? There already
is one. Unless your concatenation function does something special, in
which case you should probably think about this collations topic. More
generally, I think there are only so many operations on character
strings that you can do without considering the
collation/ctype/etc. These are essentially all the operations that you
can do without looking at the characters, like length(), ||, repeat().
Everything beyond that looks at the characters and needs to take
collation/ctype/etc. into account.
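
As a rough illustration of that split, again using table c from the example (the comments state what I would expect, not tested output):

```sql
-- These never look at the characters, so the conflicting column
-- collations should not matter.
select length(x || y), repeat(x || y, 2) from c;

-- These inspect the characters (case mapping, sorting) and are
-- expected to fail because no collation can be determined.
select upper(x || y) from c;
select x || y from c order by x || y;
```
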

> We'd need some kind of migration path where we could retain the runtime
> checks and disable the parse time checks until people have a chance to
> add the right marks to their UDFs. Migration paths like that are not
> great because they take several releases to work out, and we're never
> quite sure when to finally remove the deprecated behavior.

Perhaps pg_dump can apply some properties during upgrades?

> If we make the opposite assumption, that none are ordering-sensitive
> unless we mark them so, that would allow properly-marked functions to
> fail at parse time, and the rest to fail at runtime. But this
> assumption doesn't work as well for recording dependencies, because
> we'd miss the dependencies for UDFs that aren't properly marked.

That feels like the worst of both worlds.