On Mon, Apr 19, 2021 at 1:03 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> That doco is explaining the users-eye view of it. Places addressed
> to datatype developers, such as the CREATE TYPE reference page, see
> it a bit differently. CREATE TYPE for instance points out that
>
>     All storage values other than plain imply that the functions of the
>     data type can handle values that have been toasted, as described in ...

Interesting. It feels to me like SET STORAGE PLAIN is really trying to
be two different things. Either you want to inhibit compression and
external storage for performance reasons, or your data type can't
support either one. Maybe we should separate those concepts, since
there's no mode right now that says "don't ever compress, and
externalize only if there's absolutely no other way," and there's no
way to disable compression and externalization without also killing
off short headers. :-(

> The notion that short header doesn't cost anything seems extremely
> Intel-centric to me.

I don't think so. It's true that Intel is very forgiving about
unaligned accesses compared to some other architectures, but I think
if you have a terabyte of data, you want it to fit into as few disk
pages as possible pretty much no matter what architecture you're
using. The dominant costs are going to be the I/O costs, not the CPU
costs of dealing with unaligned bytes.

In fact, even if you have a gigabyte of data, I bet it's *still*
better to use a more compact on-disk representation. At that size, the
dominant cost is going to be pumping the data through the L3 CPU
cache, which is still - I think - going to be quite a lot more
important than the CPU costs of dealing with unaligned bytes. The CPU
bus is an I/O bottleneck not unlike the disk itself, just at a higher
speed that is still way slower than the CPU itself.

Now if you have a megabyte of data, or better yet a kilobyte of data,
then I think optimizing for CPU efficiency may well be the right thing
to do. I don't know how much 4-byte varlena headers really save there,
but if I were designing a storage representation for very small data
sets, I'd definitely be thinking about how I could waste space to
shave cycles.

--
Robert Haas
EDB: http://www.enterprisedb.com
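
For a rough sense of the space arithmetic behind the short-header argument, here is a minimal sketch. It is a toy model, not PostgreSQL's actual tuple-forming code: the function names are hypothetical, and the constants (a 4-byte header aligned to 4 bytes, a 1-byte header with no alignment, and a 127-byte cap on a short-header value including its header) are assumptions meant to mirror the usual varlena layout.

```python
# Toy model of varlena packing; every constant below is an assumption,
# not something read out of the PostgreSQL source.
LONG_HEADER = 4        # bytes in a regular varlena header
SHORT_HEADER = 1       # bytes in a short varlena header
SHORT_MAX_TOTAL = 127  # assumed cap on header + payload for a short header
ALIGNMENT = 4          # assumed alignment for values with a 4-byte header


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def packed_size(payload_lengths, allow_short_headers):
    """Return the bytes needed to store the payloads one after another."""
    offset = 0
    for length in payload_lengths:
        if allow_short_headers and length + SHORT_HEADER <= SHORT_MAX_TOTAL:
            # Short header: stored unaligned, so no padding is added.
            offset += SHORT_HEADER + length
        else:
            # 4-byte header: pad to the alignment boundary first.
            offset = align_up(offset, ALIGNMENT)
            offset += LONG_HEADER + length
    return offset


# Hypothetical row holding three small strings of 5, 3, and 10 bytes.
payloads = [5, 3, 10]
print(packed_size(payloads, allow_short_headers=True))   # 21
print(packed_size(payloads, allow_short_headers=False))  # 34
```

In this model the saving comes as much from the skipped alignment padding as from the smaller header, which is why it matters most for tables full of short values.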