Hi,

The idea of using a so called dead space map to speed up vacuum has come up multiple times in this list in the last couple of years. I wrote an initial implementation of it to measure the performance impact it has on updates and on vacuum.

Potential uses for a dead space map are:

* speed up vacuum when there's few dead tuples

Vacuum will need to be modified to use index lookups to find index tuples corresponding the dead heap tuples. Otherwise you have to scan through all the indexes anyway.

* vacuuming pages one by one as they're written by bgwriter

I'm not sure how much difference this would make, but it would be an interesting experiment. In theory, you could save a lot of total I/O, because you would not need to come back to vacuum the pages later, but you would have to read in any index pages pointing to the dead heap tuples inside bgwriter.

* implementation of index-only scans

An index scan would not have to check the visibility information of heap tuples on those heap pages that are marked as clean in the dead space map. This requires that the dead space map is implemented so that a page is reliably marked as dirty in all circumstances when it contains any tuples that are not visible to all backends.

The obvious drawback is that heap updates need to update the dead space map too.

My current implementation stores a bitmap of 32k bits in the special space of every 32k heap pages. Each bit in the bitmap corresponds one heap page. The bit is set every time a tuple is updated, and it's cleared by vacuum. This is a very simple approach, and doesn't take much space.

Is there something I'm missing? Any ideas?

I'm going to have some spare time to hack PostgreSQL in the coming months, and I'm thinking of refining this if there's interest. Is anyone else working on this?

- Heikki

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to