On Thu, Oct 5, 2023 at 3:15 PM Nico Williams <n...@cryptonector.com> wrote:
> Text+encoding can be just like bytea with a one- or two-byte prefix
> indicating what codeset+encoding it's in. That'd be how to encode
> such text values on the wire, though on disk the column's type should
> indicate the codeset+encoding, so no need to add a prefix to the value.

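To make the quoted idea concrete, here is a minimal Python sketch of a
one-byte encoding prefix. The tag numbers and the function names
encode_tagged and decode_tagged are hypothetical, chosen only for
illustration; nothing like this exists in PostgreSQL today.

```python
# Hypothetical one-byte tags; PostgreSQL defines no such wire format.
ENCODING_TAGS = {1: "utf-8", 2: "euc_kr", 3: "latin-1"}  # latin-1 is ISO8859-1
TAG_FOR = {name: tag for tag, name in ENCODING_TAGS.items()}


def encode_tagged(text: str, encoding: str) -> bytes:
    """Prefix the encoded text with a one-byte tag naming its encoding."""
    return bytes([TAG_FOR[encoding]]) + text.encode(encoding)


def decode_tagged(value: bytes) -> str:
    """Read the tag byte, then decode the remainder accordingly."""
    return value[1:].decode(ENCODING_TAGS[value[0]])


values = [
    encode_tagged("zebra", "utf-8"),
    encode_tagged("apple", "latin-1"),
    encode_tagged("한국", "euc_kr"),
]

# A bytewise sort groups values by tag, not by any one collation's rules.
print([decode_tagged(v) for v in sorted(values)])
```

Comparing the raw bytes orders the values by their tag first, so "zebra"
lands ahead of "apple". Getting a meaningful order would require
decoding every value to a common representation before comparing.
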
Well, that would be making the encoding a per-value property, rather
than a per-column property like collation, as I proposed. I can't see
that working out very nicely, because collations are
encoding-specific. It wouldn't make any sense if the column collation
were en_US.UTF8 or ko_KR.eucKR or en_CA.ISO8859-1 (just to pick a few
values that are legal on my machine) while the data stored in the
column came from a whole bunch of different encodings, at most one of
which could be the one to which the column's collation applied. That
would end up meaning, for example, that such a column was very hard to
sort.

For that and other reasons, I suspect that the utility of storing data
from a variety of different encodings in the same database column is
quite limited. What I think people really want is a whole column in
some encoding that isn't the normal one for that database. That's not
to say we should add such a feature, but if we do, I think it should
be that, not a different encoding for every individual value.

--
Robert Haas
EDB: http://www.enterprisedb.com