I just had this thought a few minutes ago, discussed it briefly with
RhodiumToad on #postgresql and wanted to put it out here for discussion.
Feel free to rip it apart. It probably is a bit "al-dente" at this point
and needs more cooking.

The reason we need full_page_writes is to guard against torn pages,
i.e. partial writes. So what if smgr managed a mapping between logical
page numbers and their physical locations in the relation?
At the point where we currently require a full page write into WAL, we
would mark the buffer as "needs relocation". The smgr would then write
this page to a different physical location whenever it is time to write
it (via the background writer, hopefully). After that page is flushed,
it would update the page location pointer, or whatever we want to call
it.
A physical page location freed this way can be reused once the location
pointer has been flushed to disk. The ordering of writes is critical:
first the page at the new location, then the pointer to the current
location. Doing so would make write(2) appear atomic to us, which is
exactly what we need for crash recovery.
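
To make the ordering concrete, here is a toy Python model of it. This is
only a sketch, not PostgreSQL code: RelocatingStore, write_relocated and
the torn flag are names made up for illustration, the "disk" is a list in
memory, and a crash is simulated by returning early before the pointer
update.

```python
class RelocatingStore:
    """Toy model of an smgr that maps logical pages to physical slots."""

    def __init__(self):
        self.slots = []        # "on disk": physical slot -> page contents
        self.durable_map = {}  # "on disk": logical page -> physical slot

    def _allocate(self):
        # A slot is reusable only if no durable pointer refers to it.
        used = set(self.durable_map.values())
        for slot in range(len(self.slots)):
            if slot not in used:
                return slot
        self.slots.append(None)
        return len(self.slots) - 1

    def write_relocated(self, logical, data, torn=False):
        new_slot = self._allocate()
        # Step 1: write the page at its NEW location. The old copy is
        # never overwritten, so a torn write cannot damage it.
        self.slots[new_slot] = data[: len(data) // 2] if torn else data
        if torn:
            return  # simulated crash: the pointer was never updated
        # Step 2: only after the page is safely written, flip the pointer.
        # The old slot becomes reusable because nothing points at it.
        self.durable_map[logical] = new_slot

    def read(self, logical):
        return self.slots[self.durable_map[logical]]


store = RelocatingStore()
store.write_relocated(0, b"AAAA")
store.write_relocated(0, b"BBBB", torn=True)  # crash in the middle of a write
print(store.read(0))                          # still the old, intact page
```

The torn copy sits in a slot that no pointer refers to, so after the
simulated crash the reader still sees the previous version and the slot
is simply picked up again by the next allocation.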

In addition to that, vacuum would now be able to tell smgr "hey, this
page is completely empty". Instead of doing the second "empty page for
truncate" scan, smgr could slowly migrate pages, on first touch after a
checkpoint, towards the head of the file, into these empty pages. This
would free pages at the end of the file, and smgr would be at liberty
to truncate them off whenever it sees fit. No extra scan required, just
a little more bookkeeping. This would apply not only to heap pages but
to empty index pages as well. Shrinking/truncating indexes is something
we are completely unable to do today. Whenever the buffer manager is
asked for a page that no longer exists physically, it would just
initialize an empty one of that kind (heap/index) in a buffer and mark
it "needs relocation". It would get recreated physically on
eviction/checkpoint without freeing any previously occupied space.
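
The migration and truncation part could look roughly like the following
toy model. Again this is a sketch under assumptions: CompactingStore,
mark_empty and truncate are hypothetical names, every write relocates
(the real thing would only do so on first touch after a checkpoint), and
durability and write ordering are ignored here.

```python
EMPTY_PAGE = b""  # stand-in for a freshly initialized heap/index page


class CompactingStore:
    """Toy model: relocation fills holes near the head of the file."""

    def __init__(self):
        self.slots = []     # physical slot -> page contents
        self.page_map = {}  # logical page -> physical slot

    def _lowest_free_slot(self):
        used = set(self.page_map.values())
        for slot in range(len(self.slots)):
            if slot not in used:
                return slot
        self.slots.append(None)
        return len(self.slots) - 1

    def write(self, logical, data):
        # Relocate into the lowest free slot, then repoint the page.
        slot = self._lowest_free_slot()
        self.slots[slot] = data
        self.page_map[logical] = slot

    def mark_empty(self, logical):
        # What vacuum would report: this page holds nothing any more.
        self.page_map.pop(logical, None)

    def read(self, logical):
        # Returns (page, needs_relocation). A page with no physical
        # location is handed out as a fresh empty one.
        slot = self.page_map.get(logical)
        if slot is None:
            return EMPTY_PAGE, True
        return self.slots[slot], False

    def truncate(self):
        # Drop trailing slots that no pointer refers to; no scan of
        # page contents is needed, only the map.
        used = set(self.page_map.values())
        while self.slots and (len(self.slots) - 1) not in used:
            self.slots.pop()
        return len(self.slots)


store = CompactingStore()
for i in range(4):
    store.write(i, f"p{i}".encode())
store.mark_empty(0)
store.mark_empty(1)
store.write(3, b"p3")    # touched again: moves into slot 0
store.write(2, b"p2")    # touched again: moves into slot 1
print(store.truncate())  # number of physical slots left
```

Note that the logical page numbers never change; only the physical file
shrinks, which is why the same trick would apply to index pages.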
Comments?
Jan
--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin