On Fri, Oct 6, 2023 at 2:25 PM Nico Williams <n...@cryptonector.com> wrote:
> > > > Well, that would be making the encoding a per-value property, rather
> > > > than a per-column property like collation as I proposed. I can't see
> > >
> > > On-disk it would be just a property of the type, not part of the value.
> >
> > I mean, that's not how it works.
>
> Sure, because TEXT in PG doesn't have codeset+encoding as part of it --
> it's whatever the database's encoding is. Collation can and should be a
> property of a column, since for Unicode it wouldn't be reasonable to
> make that part of the type. But codeset+encoding should really be a
> property of the type if PG were to support more than one. IMO.
No, what I mean is, you can't just be like "oh, the varlena will be different in memory than on disk" as if that were no big deal.

I agree that, as an alternative to encoding being a column property, it could instead be completely a type property, meaning that if you want to store, say, LATIN1 text in your UTF-8 database, you first create a latin1text data type and then use it, rather than, as in the model I proposed, creating a text column and then applying a setting like ENCODING latin1 to it. I think that there might be some problems with that model, but it could also have some benefits. If someone were going to make a run at implementing this, they might want to consider both designs and evaluate the tradeoffs.

But, even if we were all convinced that this kind of feature was good to add, I think it would almost certainly be wrong to invent new varlena features along the way.

--
Robert Haas
EDB: http://www.enterprisedb.com