On Sun, 2008-03-16 at 21:33 -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
>
> > The idea that's becoming attractive to me while contemplating the
> > multiple-maps problem is that we should adopt something similar to
> > the old Mac OS idea of multiple "forks" in a relation.  In addition
> > to the main data fork which contains the same info as now, there could
> > be one or more map forks which are separate files in the filesystem.

Aren't we in a way doing this for indexes already?

> I think something similar could be used to store tuple visibility bits
> separately from heap tuple data itself, so +1 to this idea.

Not just the "bits" - the whole visibility info (xmin, xmax, cmin, cmax,
plus the hint bits) should be stored separately.

A separate "fork" for visibility should be organized as a b-tree index,
since we already have well-honed mechanisms for dealing with those
efficiently.  The visibility fork would be kept in a compressed form:
ranges of all-visible or all-deleted tuples are stored as their two
endpoints only, and the tree is reorganized when possible, similar to
what we currently do for HOT updates (a rough sketch is in the P.S.
below).  This keeps the visibility index really small for tables with
few updates - most likely one or two pages regardless of table size.

One important difference from ordinary indexes is that the visibility
info should be written first, before the data is written to the heap
and ordinary index entries are created.

> (The rough idea in my head was that you can do an indexscan and look
> up visibility bits without having to pull the whole heap along; and
> visibility updates are also cheaper, whether they come from indexscans
> or heap scans.  Of course, the implicit cost is that a seqscan needs to
> fetch the visibility pages, too; and the locking is more complex.)

Another cost is heavy inserting/updating, where there will probably be
more lock contention: because visibility info is generally much smaller
than the tuples themselves, visibility info for new tuples will more
often land on the same visibility pages.

Of course, with visibility info in a separate fork, very narrow tables
will have the ratios reversed - for a one-byte-wide table the
visibility info will be a few times bigger than the actual data, at
least initially, before compression has kicked in.

--------------------
Hannu
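
P.S.  To make the "two endpoints only" idea above a bit more concrete,
here is a rough sketch, in plain C, of what a leaf entry of such a
compressed visibility b-tree could look like.  All of the names here
(VisTid, VisInfo, VisEntry, ...) are made up for illustration only;
nothing like this exists in the tree today.

#include <stdint.h>

/* Location of a heap tuple: block number plus line pointer offset. */
typedef struct VisTid
{
    uint32_t    block;          /* heap block number */
    uint16_t    offset;         /* line pointer within that block */
} VisTid;

/* Full per-tuple visibility info, kept apart from the heap tuple. */
typedef struct VisInfo
{
    uint32_t    xmin;           /* inserting transaction id */
    uint32_t    xmax;           /* deleting transaction id, or 0 */
    uint32_t    cmin;           /* inserting command id */
    uint32_t    cmax;           /* deleting command id */
    uint16_t    infomask;       /* committed/aborted style hint bits */
} VisInfo;

/*
 * Kind of leaf entry.  A run of consecutive tuples that are all visible
 * (or all deleted) collapses into a single range entry holding only the
 * two endpoints; everything else keeps a full VisInfo per tuple.
 */
typedef enum VisEntryKind
{
    VIS_SINGLE,                 /* one tuple, info field is valid */
    VIS_RANGE_ALL_VISIBLE,      /* [first, last] visible to everyone */
    VIS_RANGE_ALL_DELETED       /* [first, last] dead, reclaimable */
} VisEntryKind;

typedef struct VisEntry
{
    VisEntryKind kind;
    VisTid       first;         /* first tuple covered by this entry */
    VisTid       last;          /* last tuple covered
                                 * (== first for VIS_SINGLE) */
    VisInfo      info;          /* only meaningful for VIS_SINGLE */
} VisEntry;

A mostly-static table then needs one range entry per contiguous run of
all-visible tuples instead of one VisInfo per tuple, which is where the
"one or two pages regardless of table size" estimate above comes from.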