On Thu, Oct 05, 2023 at 07:31:54AM -0400, Robert Haas wrote:
> [...] On the other hand, to do that in PostgreSQL, we'd need to
> propagate the character set/encoding information into all of the
> places that currently get the typmod and collation, and that is not a
> small number of places.  It's a lot of infrastructure for the project
> to carry around for a feature that's probably only going to continue
> to become less relevant.
Text+encoding can be just like bytea with a one- or two-byte prefix
indicating what codeset+encoding it's in.  That'd be how to encode such
text values on the wire, though on disk the column's type should
indicate the codeset+encoding, so there's no need to add a prefix to
the stored value.

Complexity would creep in around when and whether to perform automatic
conversions.  The easy answer would be "never, on the server side", but
on the client side it might be useful to convert to/from the locale's
codeset+encoding when displaying to the user or accepting user input.

If there are no automatic server-side codeset/encoding conversions,
then the server-side cost of supporting non-UTF-8 text should not be
too high dev-wise -- it's just (famous last words) a generic text type
parameterized by codeset+encoding.  There would not even be a hard need
for conversion functions, though there would be demand for them.

But I agree that if there's no need, there's no need.  UTF-8 is great,
and if only all PG users would just switch, there's not much more to do.

Nico
--