On a fine day, Fri, 2006-05-19 at 14:53, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote:
> >> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
> >> unbelievable. What's in the table? It would seem to imply that our
> >> tuple format is far more compressible than we expected.
>
> > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> > If the tape routines were actually storing visibility information, I'd
> > expect that to be pretty compressible in this case since all the tuples
> > were presumably created in a single transaction by pgbench.
>
> It's worse than that: IIRC what passes through a heaptuple sort are
> tuples manufactured by heap_form_tuple, which will have consistently
> zeroed header fields. However, the above isn't very helpful since the
> rest of us have no idea what that "accounts" table contains. How wide
> is the tuple data, and what's in it?
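The effect described above can be reproduced in miniature. This is a sketch under loudly stated assumptions, not PostgreSQL's real tape format: it packs rows shaped like pgbench's accounts table (aid, bid, abalance, plus a blank filler) behind a hypothetical all-zero 24-byte header standing in for the consistently zeroed fields, orders them by bid, and measures the zlib compression ratio with and without that header. The names HEADER, FILLER, make_rows, serialize and ratio are invented for this illustration, and the actual layout written by tuplesort.c differs.

```python
import struct
import zlib

# Hypothetical layout, for illustration only: a fixed 24-byte all-zero header
# (standing in for consistently zeroed tuple header fields), followed by
# pgbench-style columns aid, bid, abalance and a blank-padded filler.
HEADER = bytes(24)
FILLER = b" " * 84


def make_rows(n: int, rows_per_bid: int = 100_000) -> list[tuple[int, int, int]]:
    """Generate (aid, bid, abalance) rows shaped like pgbench's accounts table."""
    return [(aid, (aid - 1) // rows_per_bid + 1, 0) for aid in range(1, n + 1)]


def serialize(rows, with_header: bool = True) -> bytes:
    """Pack rows into one flat byte stream, roughly what a sort tape would hold."""
    out = bytearray()
    for aid, bid, abalance in rows:
        if with_header:
            out += HEADER
        out += struct.pack("<iii", aid, bid, abalance)  # three 4-byte ints
        out += FILLER
    return bytes(out)


def ratio(data: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


if __name__ == "__main__":
    rows = sorted(make_rows(50_000), key=lambda r: r[1])  # ORDER BY bid
    full = serialize(rows)
    stripped = serialize(rows, with_header=False)
    print(f"with header:     {len(full)} bytes, ratio {ratio(full):.1f}:1")
    print(f"header stripped: {len(stripped)} bytes, ratio {ratio(stripped):.1f}:1")
```

Stripping the header shrinks the raw stream by 24 bytes per row, while the compressed sizes stay close, because zero runs cost almost nothing after compression. That is why both stripping header fields and compressing the tapes attack the same redundancy.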
Was he not using pgbench data?

> (This suggests that we might try harder to strip unnecessary header info
> from tuples being written to tape inside tuplesort.c. I think most of
> the required fields could be reconstructed given the TupleDesc.)

I guess that tape files compress better than the average table because they are sorted, and thus at least a little more repetitive than the rest. If there are varlen types, then they usually also have an abundance of small 4-byte integers, which should also compress better than 4:1, maybe a lot better.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing