On Fri, Jan 2, 2009 at 11:01 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> "Stephen R. van den Berg" <s...@cuci.nl> writes:
>> What seems to be hurting the most is the 1MB upper limit.  What is the
>> rationale behind that limit?
>
> The argument was that compressing/decompressing such large chunks would
> require a lot of CPU effort; also it would defeat attempts to fetch
> subsections of a large string.  In the past we've required people to
> explicitly "ALTER TABLE SET STORAGE external" if they wanted to make
> use of the substring-fetch optimization, but it was argued that this
> would make that more likely to work automatically.
>
> I'm not entirely convinced by Alex' analysis anyway; the only way
> those 39 large values explain the size difference is if they are
> *tremendously* compressible, like almost all zeroes.  The toast
> compressor isn't so bright that it's likely to get 10X compression
> on typical data.

I've seen gzip approach 10X on what was basically a large tab-separated
values file, but I agree that some more experimentation to determine the
real cause of the problem would be useful.

I am a little mystified by the apparent double standard regarding
compressibility.  My suggestion that we disable compression for
pg_statistic columns was perfunctorily shot down even though I provided
detailed performance results demonstrating that it greatly sped up query
planning on toasted statistics and even though the space savings from
compression in that case are bound to be tiny.  Here, we have a case
where the space savings are potentially much larger, and the only
argument against it is that someone might be disappointed in the
performance of substring operations, if they happen to do any.

What if they know that they don't want to do any and want to get
compression?  Even if the benefit is only 1.5X on their data rather than
10X, that seems like a pretty sane and useful thing to want to do.  It's
easy to shut off compression if you don't want it; if the system makes
an arbitrary decision to disable it, how do you get it back?

...Robert
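
As a rough illustration of how strongly the achievable ratio depends on the data, here is a minimal Python sketch that compares a synthetic tab-separated table, a run of zeroes, and random bytes. It uses zlib from the Python standard library, which is not the toast compressor, so the numbers say nothing about what PostgreSQL itself would achieve. The helper names (compression_ratio, make_tsv) and the made-up rows are assumptions for illustration only.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by zlib-compressed size."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(data) / len(zlib.compress(data, level))


def make_tsv(rows: int) -> bytes:
    """Build a synthetic tab-separated table with repetitive columns.

    The column contents are invented; real data may compress better or worse.
    """
    lines = [
        f"{i}\tcustomer_{i % 50}\tregion_{i % 7}\tACTIVE\t0.00"
        for i in range(rows)
    ]
    return "\n".join(lines).encode("ascii")


# Three inputs spanning the range discussed in the thread:
# repetitive text, "almost all zeroes", and incompressible noise.
tsv_ratio = compression_ratio(make_tsv(20000))
zero_ratio = compression_ratio(bytes(1_000_000))
random_ratio = compression_ratio(os.urandom(1_000_000))

print(f"synthetic TSV: {tsv_ratio:.1f}X")
print(f"all zeroes:    {zero_ratio:.1f}X")
print(f"random bytes:  {random_ratio:.2f}X")
```

Running the same measurement on a sample of the actual column values would be one way to do the experimentation mentioned above, with the caveat that a different compressor will give different ratios.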