Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Though 8.3 isn't out of the oven just yet, I've been thinking about
>> the dead space map a lot, and decided I have to start writing down
>> those thoughts.

> I think we should do this at the same time as pushing the FSM out of
> shared memory, and design a data structure that serves both needs.

I'm afraid the data structure required for the FSM is very different.

For the visibility map, we need something that can be quickly addressed by heap block number. For FSM, we need something that's addressable by tuple size.

Cache locality of having the FSM and the visibility map in the same structure is also not very good. The visibility map is accessed by reads, and by updates/inserts/deletes on pages that previously had their bit set in the map, while the FSM is only needed for inserts and cold updates.
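To illustrate what I mean by addressing by block number, here's a minimal sketch of a visibility bit lookup. The constants and vm_read_page() are made up for illustration, not a proposed API:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

#define VM_PAGE_SIZE      8192                /* assuming 8 KB map pages */
#define VM_BITS_PER_PAGE  (VM_PAGE_SIZE * 8)  /* one bit per heap block */

/* Stand-in for reading a map page through the buffer manager. */
extern const uint8_t *vm_read_page(BlockNumber map_blk);

static bool
vm_test_bit(BlockNumber heap_blk)
{
    BlockNumber    map_blk     = heap_blk / VM_BITS_PER_PAGE;
    uint32_t       bit_in_page = heap_blk % VM_BITS_PER_PAGE;
    const uint8_t *page        = vm_read_page(map_blk);

    /* Plain array arithmetic: O(1), no searching involved. */
    return (page[bit_in_page / 8] & (1 << (bit_in_page % 8))) != 0;
}

There's no comparably cheap way to ask that structure for "a page with at least X bytes free"; for that you want something ordered or tree-structured by free space, which is why I think the two maps want different shapes.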

>> Where to store the visibility map?
>> ----------------------------------
>>
>> a) In a fixed size shared memory struct. Not acceptable.
>>
>> b) In the special area of every nth heap page. I played a bit with
>> this back in February 2006, because it's simple and requires few
>> changes.
>>
>> c) As a new relation kind, like toast tables.
>>
>> d) As an auxiliary smgr relation, attached to the heap.
>>
>> I'm leaning towards D at the moment. It requires some changes to the
>> buffer manager API and elsewhere, but not too many.

> I like B.  The main objection I have to D is that it'll be far more
> invasive to the buffer manager (and a bunch of other code) than you
> admit; for starters RelFileNode won't work as-is.

One problem with B is that you have to update the visibility map atomically with the heap update, which means holding the visibility map page and the heap page locked at the same time. If the visibility map lives inside the heap itself, you need to take care that you don't deadlock.
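The obvious discipline is a fixed lock ordering: always lock the heap page first, then the map page that covers it. A sketch, reusing the made-up names from above (the locking calls stand in for the real buffer manager API):

typedef uint32_t BlockNumber;

#define VM_BITS_PER_PAGE  (8192 * 8)

/* Made-up stand-ins for taking and releasing an exclusive content
 * lock on a page's buffer. */
extern void LockPageExclusive(BlockNumber blk);
extern void UnlockPage(BlockNumber blk);

static void
heap_update_clear_bit(BlockNumber heap_blk)
{
    BlockNumber map_blk = heap_blk / VM_BITS_PER_PAGE;

    LockPageExclusive(heap_blk);   /* 1st: the heap page being updated */
    LockPageExclusive(map_blk);    /* 2nd: the map page covering it */

    /* ... modify the tuple and clear the bit, WAL-logging both ... */

    UnlockPage(map_blk);
    UnlockPage(heap_blk);
}

With B, though, the map page is itself a heap page in the same relation, so any code path that locks two heap pages in some other order (block number order, say) could deadlock against this.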

We can't directly use B for the FSM, because we also need an FSM for indexes, so we'll need something else for indexes anyway.

B is also problematic if we ever get to do upgrades without dump/restore, because it requires changes to the format of the heap itself.

> What I'm envisioning is that we dedicate a quarter or half of every n'th
> heap page to visibility + free space map.  Probably a byte per page is
> needed (this would give us 1 visibility bit and 7 bits for free space,
> or other tradeoffs if needed).  So there would be 2K or 4K heap pages
> associated with each such special page.  Or we could dial it down to
> even less, say 1K heap pages per special page, which would have the
> advantage of reducing update contention for those pages.

How would you search that for a page with X bytes of free space?
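To spell the problem out: with a flat byte per heap page, the only way to satisfy that request is to scan the map linearly. Something like this sketch, where the 7-bit encoding in 64-byte units is my assumption, and I've ignored the interleaving of the special pages for simplicity:

#include <stdint.h>

typedef uint32_t BlockNumber;

#define BLCKSZ         8192
#define BYTES_PER_MAP  4096             /* half a special page, one byte per heap page */
#define FSM_UNIT       (BLCKSZ / 128)   /* assumption: 7 bits => 64-byte units */

/* Stand-in for fetching the n'th special page. */
extern const uint8_t *read_special_page(BlockNumber n);

/*
 * Find a heap page with at least needed_bytes free.  Nothing narrows
 * the search by size, so in the worst case this reads every special
 * page in the relation for a single tuple placement.
 */
static BlockNumber
find_page_with_space(BlockNumber num_map_pages, int needed_bytes)
{
    for (BlockNumber m = 0; m < num_map_pages; m++)
    {
        const uint8_t *map = read_special_page(m);

        for (int i = 0; i < BYTES_PER_MAP; i++)
        {
            int free_bytes = (map[i] & 0x7F) * FSM_UNIT;

            if (free_bytes >= needed_bytes)
                return m * BYTES_PER_MAP + i;
        }
    }
    return (BlockNumber) 0xFFFFFFFF;    /* not found */
}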

> I don't think this really works.  You are effectively assuming that no
> kind of crash-induced corruption can set a bit that you didn't intend
> to set.

Don't we already assume that for hint-bit updates?

> Stated in those terms it seems obviously bogus.  What we will
> need is that every WAL-logged update operation also include the
> appropriate setting or clearing of the page's visibility bit; where
> necessary, add new information to the WAL trace to allow this.

I think we already emit a WAL record whenever we would set the bit in the visibility map, so we can certainly do that. The hint-bit comparison was more an interesting observation than anything else.
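In other words, the replay side would look roughly like this; purely illustrative, every name here is made up:

typedef uint32_t BlockNumber;

typedef struct WalRecord       /* made-up stand-in for a WAL record */
{
    BlockNumber target_block;
    /* ... payload ... */
} WalRecord;

extern void apply_heap_update(const WalRecord *rec);  /* replay as today */
extern void vm_clear_bit(BlockNumber heap_blk);       /* made up */

static void
redo_heap_update(const WalRecord *rec)
{
    apply_heap_update(rec);

    /*
     * Also clear the covering visibility bit, so that crash recovery
     * can never leave a stale "all visible" bit behind for this page.
     */
    vm_clear_bit(rec->target_block);
}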

>> To further reduce contention, we can have a copy of the bit in the
>> page header of the heap page itself. That way we'd only need to
>> access the visibility map on the first update on a page that clears
>> a bit.

> Seems exceedingly prone to corruption.

It does?

It seems fairly simple to keep it up-to-date. Whenever we update the visibility map, we're holding the heap page locked as well.

If you were thinking of hardware failure, I don't see how that bit would be more susceptible to that, and the potential damage isn't any bigger than from any other random bit-flip either.

We can double-check and correct both the bit in the page header and the bit in the visibility map whenever we perform a regular vacuum (or prune), in case they did get corrupted for some reason.
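Roughly like this (a sketch, all names invented; the point is that on disagreement we clear both copies, since a falsely cleared bit only costs an extra vacuum visit, while a falsely set bit is the dangerous direction):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
typedef uint8_t *Page;

/* Made-up stand-ins for the page-header copy and the map proper. */
extern bool page_header_all_visible(Page page);
extern void page_header_clear_all_visible(Page page);
extern bool vm_test_bit(BlockNumber blk);
extern void vm_clear_bit(BlockNumber blk);

static void
recheck_visibility_bits(Page heap_page, BlockNumber blk)
{
    bool hdr_bit = page_header_all_visible(heap_page);
    bool map_bit = vm_test_bit(blk);

    if (hdr_bit != map_bit)
    {
        /* Disagreement: fall back to the conservative state. */
        page_header_clear_all_visible(heap_page);
        vm_clear_bit(blk);
    }
}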

>> - We don't need to clear the bit on HOT updates, because by
>>   definition none of the indexed columns changed.

> Huh?  I don't think I believe that, and I definitely don't believe your
> argument for it.

HOT-updating a row doesn't change what an index-only scan would see. An index-only scan only needs the columns that are in the index, and since it's a HOT update, none of those columns have changed, so it doesn't matter which tuple version they come from.

Pages that have HOT-updated tuples but haven't otherwise been modified since the last vacuum are also not interesting for VACUUM, because a prune will get rid of the dead HOT-updated tuples just as well.

Am I missing something?


Another thought: should we support having an FSM and a visibility map on temp tables? Doesn't seem very useful, but why not, I guess.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
