On Thu, Oct 05, 2023 at 07:31:54AM -0400, Robert Haas wrote:
> [...] On the other hand, to do that in PostgreSQL, we'd need to
> propagate the character set/encoding information into all of the
> places that currently get the typmod and collation, and that is not a
> small number of places.  It's a lot of infrastructure for the project
> to carry around for a feature that's probably only going to continue
> to become less relevant.
Text+encoding can be just like bytea with a one- or two-byte prefix
indicating what codeset+encoding it's in.  That'd be how to encode such
text values on the wire, though on disk the column's type should
indicate the codeset+encoding, so there's no need to add a prefix to
the stored value.

Complexity would creep in around when and whether to perform automatic
conversions.  The easy answer would be "never, on the server side", but
on the client side it might be useful to convert to/from the locale's
codeset+encoding when displaying to the user or accepting user input.

If there are no automatic server-side codeset/encoding conversions,
then the server-side cost of supporting non-UTF-8 text should not be
too high dev-wise -- it's just (famous last words) a generic text type
parameterized by codeset+encoding.  There would not even be a hard need
for conversion functions, though there would be demand for them.

But I agree that if there's no need, there's no need.  UTF-8 is great,
and if only all PG users would just switch, there's not much more to do.

Nico
--