I just had this thought a few minutes ago, discussed it briefly with RhodiumToad on #postgresql and wanted to put it out here for discussion. Feel free to rip it apart. It probably is a bit "al-dente" at this point and needs more cooking.

The reason we need full_page_writes is to guard against torn pages, i.e. partial writes. So what if smgr managed a mapping between logical page numbers and their physical locations in the relation?

At the point where we require a full page write into WAL today, we would mark the buffer as "needs relocation". The smgr would then write this page to a different physical location whenever it is time to write it (via the background writer, hopefully). After that page is flushed, it would update the page location pointer, or whatever we want to call it. A physical page location freed this way can be reused once the location pointer has been flushed to disk. The ordering of these writes is critical: first the page at the new location, second the pointer to the current location. Doing so would make write(2) appear atomic to us, which is exactly what we need for crash recovery.
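To make the write ordering concrete, here is a toy in-memory model in Python. It is only a sketch of the idea, not anything that exists in PostgreSQL: RelocatingStore, write_relocated, recover and the crash parameter are all made-up names, and the "disk" is just a list. It assumes the pointer update itself is atomic, which a real design would still have to guarantee.

```python
class RelocatingStore:
    """Hypothetical model of smgr with a logical -> physical page map."""

    def __init__(self, nslots):
        self.slots = [None] * nslots     # physical page slots on "disk"
        self.page_map = {}               # durable map: logical -> physical
        self.free = list(range(nslots))  # slots nothing points to

    def write_relocated(self, logical, data, crash=None):
        new_slot = self.free.pop(0)
        if crash == "torn":
            # Simulated torn write: only half the page reaches the new
            # slot. Nothing points at that slot yet, so no harm is done.
            self.slots[new_slot] = data[: len(data) // 2]
            return
        # Step 1: write (and flush) the page at its new location.
        self.slots[new_slot] = data
        # Step 2: flush the pointer; the new image becomes current.
        old_slot = self.page_map.get(logical)
        self.page_map[logical] = new_slot
        # Step 3: the old slot is reusable only once the pointer is durable.
        if old_slot is not None:
            self.free.append(old_slot)

    def recover(self):
        # After a crash, every slot the map does not reference is free.
        used = set(self.page_map.values())
        self.free = [s for s in range(len(self.slots)) if s not in used]

    def read(self, logical):
        return self.slots[self.page_map[logical]]
```

If the crash happens during step 1, the map still points at the old, intact image, so a reader after recover() sees the previous version of the page rather than a torn one.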

In addition to that, vacuum would now be able to tell smgr "hey, this page is completely empty". Instead of doing the second "empty page for truncate" scan, smgr could slowly migrate pages, on first touch after a checkpoint, towards the head of the file and into these empty pages. This would free pages at the end of the file, and smgr would be completely at liberty to truncate them off whenever it sees fit. No extra scan required, just a little more bookkeeping. This would apply not only to heap pages, but to empty index pages as well. Shrinking/truncating indexes is something we are completely unable to do today. Whenever the buffer manager is asked for such a page that doesn't exist physically any more, it would just initialize an empty one of that kind (heap/index) in a buffer and mark it "needs relocation". It would get recreated physically on eviction/checkpoint without freeing any previously occupied space.
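The bookkeeping for that could look roughly like the following Python sketch. Again, this is a toy under stated assumptions: mark_empty, migrate_on_touch and truncated_length are hypothetical names, page_map is a plain dict from logical page number to physical slot, and free_slots is a set. It ignores the flush ordering described earlier (a vacated slot would really only become reusable after the pointer is on disk).

```python
def mark_empty(page_map, free_slots, logical):
    """Vacuum reports a completely empty page: give up its physical slot."""
    slot = page_map.pop(logical, None)
    if slot is not None:
        free_slots.add(slot)


def migrate_on_touch(page_map, free_slots, logical):
    """On first touch after a checkpoint, move the page to the lowest
    free slot if that slot is closer to the head of the file."""
    current = page_map[logical]
    target = min(free_slots, default=current)
    if target < current:
        free_slots.remove(target)
        free_slots.add(current)
        page_map[logical] = target
    return page_map[logical]


def truncated_length(page_map, free_slots):
    """File length in pages once trailing free slots are cut off."""
    length = max(page_map.values(), default=-1) + 1
    # Free slots beyond the new end of file no longer exist.
    free_slots.difference_update({s for s in free_slots if s >= length})
    return length
```

With six pages, of which vacuum marks logical pages 1 and 2 empty, touching pages 5 and 4 moves them into slots 1 and 2, and the file can then be truncated from six pages to four without a separate scan.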


Comments?
Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

