FWIW I think I haven't responded appropriately to the points raised by Heikki. Basically, as I see it there are three main items:
1. The revmap physical-to-logical mapping is too complex; let's use
something else.

We had the revmap originally in a separate fork.  The current approach
grew out of the need to put it in the main fork while keeping fast
access to individual pages possible.  There are of course many ways to
skin this cat; Heikki's proposal is to have it always occupy the first
few physical pages, rather than requiring a logical-to-physical mapping
table.  To implement this he proposes to move other pages out of the
way as the index grows.  I don't really have much love for this idea.
We can change how this is implemented later in the cycle, if we find
that a different approach is better than my proposal.  I don't want to
spend endless time meddling with this (and I definitely don't want this
to delay the eventual commit of the patch).

2. Vacuuming is not optimal.

Right now, to eliminate garbage index tuples we first need to scan the
revmap to figure out which tuples are unreferenced.  There is a concern
that if there is an excess of dead tuples, the index becomes
unvacuumable because palloc() fails due to the request size.  This is
largely theoretical, because for it to happen there would need to be
several million dead index tuples.  As a minimal fix that alleviates
the problem without a complete rework of vacuuming, we can cap that
palloc() request at maintenance_work_mem and remove dead tuples in a
loop instead of trying to remove all of them in a single pass.

Another proposal was to store range numbers (or just heap page numbers)
within each index tuple.  I felt that this would add bloat
unnecessarily.  However, there is some padding space in the index tuple
that we could perhaps use to store range numbers.  I will think some
more about how we can use this to simplify vacuuming.

3. Avoid MMTuple, as it is just unnecessary extra complexity.
The main thing that MMTuple adds is not the fact that we save 2 bytes
by storing a BlockNumber as-is instead of within a TID field.  Rather,
it's that we can construct and deconstruct tuples using our own design,
which means we can use however many Datum entries we want and however
many "null" flags.  In normal heap and index tuples, the number of
datums and nulls is always the same.  In minmax, the number of nulls is
twice the number of indexed columns, while the number of datum values
is determined by how many datum values are stored per opclass
("sortable" opclasses store 2 columns, but geometry would store only
one).  If we were to use regular IndexTuples, we would lose that
flexibility, and I have no idea how it would work.

In other words, MMTuples look fine to me.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
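[The capped-allocation vacuum loop described in item 2 can be
illustrated with a rough sketch.  This is illustrative Python, not
code from the patch; `budget` is a hypothetical stand-in for
maintenance_work_mem expressed as a count of dead-tuple ids per pass.]

```python
# Hypothetical sketch of item 2's idea: rather than one allocation
# sized for every dead tuple, cap each pass at a fixed budget and
# loop until all dead tuples have been processed.

def dead_tuple_batches(dead_ids, budget):
    """Yield successive batches of at most `budget` dead tuple ids."""
    for start in range(0, len(dead_ids), budget):
        yield dead_ids[start:start + budget]

dead = list(range(1000))      # pretend: 1000 dead index tuples
passes = 0
removed = 0
for batch in dead_tuple_batches(dead, 64):
    removed += len(batch)     # the real code would delete these tuples here
    passes += 1

print(passes, removed)        # 16 passes, 1000 tuples removed
```

The point of the loop is that memory use is bounded by the budget, at
the cost of rescanning per pass, instead of one unbounded request.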
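[To make the MMTuple counting rule in item 3 concrete, here is a small
illustrative sketch.  The helper is hypothetical, not code from the
patch; it only encodes the arithmetic stated above: two "null" flags
per indexed column, and a per-opclass number of stored datums (2 for a
"sortable" opclass, 1 for a geometry-style one).]

```python
# Hypothetical sketch of the MMTuple sizing rule: unlike heap or
# ordinary index tuples, the datum count and null-flag count differ.

def mmtuple_counts(datums_per_column):
    """datums_per_column: per indexed column, how many datum values
    its opclass stores (e.g. 2 for sortable min/max, 1 for geometry)."""
    nulls = 2 * len(datums_per_column)   # two null flags per column
    datums = sum(datums_per_column)      # opclass-dependent datum count
    return nulls, datums

# Three sortable columns plus one geometry-style column:
nulls, datums = mmtuple_counts([2, 2, 2, 1])
print(nulls, datums)   # 8 null flags, 7 datum values
```

A regular IndexTuple assumes one datum and one null flag per attribute,
which is exactly the assumption this layout breaks.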