>>> not compressing very small datums (< 256 bytes) also seems smart,
>>> since that could end up producing a lot of extra compression attempts,
>>> most of which will end up saving little or no space.
>
> That was presumably the rationale for the original logic. However, experience
> shows that there are certainly databases that store a lot of compressible
> short strings.
>
> Obviously databases with CHAR(n) desperately need us to compress them. But
> even plain text data are often moderately compressible, even with our fairly
> weak compression algorithm.
>
> One other thing that bothers me about our toast mechanism is that it only
> kicks in for tuples that are "too large". It seems weird that the same
> column is worth compressing or not depending on what other columns are in
> the same tuple.
That's a fair point. There's definitely some inconsistency in the current
behavior. It seems to me that, in theory, compression and out-of-line storage
are two separate behaviors. Out-of-line storage is pretty much a requirement
for dealing with large objects, given that the page size is a constant;
compression is not a requirement, but it is definitely beneficial under some
circumstances, particularly when it removes the need for out-of-line storage.

char(n) is kind of a weird case, because you could also compress it by storing
a count of the trailing spaces, without applying a general-purpose compression
algorithm. But either way the field is no longer fixed-width, and therefore
field access can't be done as a simple byte offset from the start of the
tuple.

It's difficult even to enumerate the possible use cases, let alone the knobs
that would be needed to cater to all of them.

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers