On Sat, Aug 04, 2007 at 09:04:33PM +0100, Gregory Stark wrote:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>
> > Gregory Stark <[EMAIL PROTECTED]> writes:
> >> The scenario I was describing was having, for example, 20 fields each
> >> of which are char(100) and store 'x' (which are padded with 99
> >> spaces). So the row is 2k but the fields are highly compressible, but
> >> shorter than the 256 byte minimum.
> >
> > To be blunt, the solution to problems like that is sending the DBA to a
> > re-education camp. I don't think we should invest huge amounts of
> > effort on something that's trivially fixed by using the correct datatype
> > instead of the wrong datatype.
>
> Sorry, there was a bit of a mixup here. The scenario I described above is
> what it would take to get Postgres to actually try to compress a small
> string given the way the toaster works.
>
> In the real world interesting cases wouldn't be so extreme. Having a single
> CHAR(n) or a text field which contains any other very compressible string
> could easily not be compressed currently due to being under 256 bytes.
>
> I think the richer target here is doing some kind of cross-record
> compression. For example, xml text columns often contain the same tags over
> and over again in successive records but any single datum wouldn't be
> compressible.
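As a concrete illustration of the scenario quoted above (the table and column
names below are invented), something along these lines builds ~2kB rows out of
CHAR(100) datums that are each well under the 256-byte minimum;
pg_column_size() and pg_relation_size() can then be used to check whether any
individual datum actually got compressed:

    CREATE TABLE padded_chars (
        c01 char(100), c02 char(100), c03 char(100), c04 char(100), c05 char(100),
        c06 char(100), c07 char(100), c08 char(100), c09 char(100), c10 char(100),
        c11 char(100), c12 char(100), c13 char(100), c14 char(100), c15 char(100),
        c16 char(100), c17 char(100), c18 char(100), c19 char(100), c20 char(100)
    );

    -- Every column stores 'x', blank-padded to 100 characters.
    INSERT INTO padded_chars
    SELECT 'x','x','x','x','x', 'x','x','x','x','x',
           'x','x','x','x','x', 'x','x','x','x','x'
    FROM generate_series(1, 10000);

    -- Per-datum size versus whole-table on-disk size.
    SELECT pg_column_size(c01)             AS datum_bytes,
           pg_relation_size('padded_chars') AS table_bytes
    FROM padded_chars
    LIMIT 1;
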
I have a table of (id serial primary key, url text unique) with a few hundred
million urls that average about 120 bytes each. The url index is only used
when a possibly new url is to be inserted, but between the data and the index
this table occupies a large part of the page cache. Any form of compression
here would be really helpful.

-dg

-- 
David Gould    [EMAIL PROTECTED]
If simplicity worked, the world would be overrun with insects.
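
For reference, a sketch of the insert-if-new path described above, assuming
the table is simply named urls (the URL literal is made up). The NOT EXISTS
probe can use the unique index, and the uniqueness check does; a concurrent
duplicate insert would just error out on the constraint:

    INSERT INTO urls (url)
    SELECT 'http://example.com/some/long/path'
    WHERE NOT EXISTS (
        SELECT 1 FROM urls WHERE url = 'http://example.com/some/long/path'
    );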