Heap Only Tuples ("HOT") is a simplification of earlier proposals for improving the way the server handles frequent updates, based upon what's been learned and feedback received.
Heap Only Tuples ---------------- The basic idea is that when a tuple is UPDATEd we can, in certain circumstances, avoid inserting index tuples for a tuple. Such tuples are marked HEAP_ONLY_TUPLE, but their storage is otherwise identical to other tuples. The pre-conditions for allowing a HOT UPDATE are - UPDATE doesn't change any indexed columns - there is space on the same block as the tuple being updated There is no restriction on tuple length changes, nor any requirement for an additional field in the tuple header; as a result this change does not require activation by an additional WITH parameter and this technique can be used on *all* tables. HOT will, in some cases, perform better in conjunction with the use of the fillfactor storage parameter. For smaller tables, this will seldom be required, so database tuning will not increase in complexity (in comparison with carefully planned VACUUM strategies in earlier releases). In many cases, the update rate will cause a steady state to be reached, with on-block space being reused cyclically. At the same time we insert the HEAP_ONLY_TUPLE, the just-updated tuple will be marked HEAP_UPDATE_ROOT. When we use an index to locate a heap tuple, we start from this root tuple and hop forwards using the ctid chain until we find the appropriate tuple. CREATE INDEX requires some careful work to allow it to identify and correctly index HEAP_ONLY_TUPLEs that need to become ~HEAP_ONLY_TUPLE as a result of the new index. This will cause additional work to be required for those cases. CREATE INDEX on a newly loaded table will be completely unaffected. There is some complexity there, though we don't go into detail on those issues here. Please read on! To allow HOT to work effectively we need to consider how we will VACUUM, noting that in many cases we can remove HEAP_ONLY_TUPLEs much more easily because they have no index tuples referencing them. There are various options at this stage, but for clarity only one of those options is presented here. When we try to UPDATE a tuple and the new tuple version doesn't fit on the block, we get the BufferCleanupLock if possible and then perform a single-block VACUUM. Any tuple that is both HEAP_DEAD & HEAP_ONLY_TUPLE can be removed completely. This is possible by changing the t_ctid field so that it points at the first visible-to-someone tuple in the chain, so it points "over" the previous HOT tuples. The root tuple is also dead - it cannot be removed completely, so it is replaced it with "just a TupleHeader", which is referred to as a TupleStub. (Credit to Itagaki for this concept). e.g. t1 (t_ctid: t2 ) - info HEAP_UPDATE_ROOT status HEAPTUPLE_DEAD t2 (t_ctid: t3 ) - info HEAP_ONLY status HEAPTUPLE_DEAD t3 (t_ctid:self) - info HEAP_ONLY status HEAPTUPLE_LIVE after single-page VACUUM t1 (t_ctid: t3 ) - info HEAP_UPDATE_ROOT & HEAP_TUPLE_STUB - status HEAPTUPLE_RECENTLY_DEAD - t1 is now a TupleStub only t3 (t_ctid:self) - info HEAP_ONLY status HEAPTUPLE_LIVE Status shown is the return value from HeapTupleSatisfiesVacuum() The single-block VACUUM would alter *all* tuple chains on the block, not just the one for the current tuple being UPDATEd. This technique means that a tuple never changes its CTID, so everything that currently uses CTID can continue normally. SeqScan would also work identically to the way it works today. It also means that we can't easily remove the root tuple, even if it is now just a TupleStub (unless the whole chain is also removable because of DELETE). Removing the root tuple will require a VACUUM *FULL*. Even so, this option is still space-neutral in the worst-case, in comparison with inserting index tuples. When we perform the single-block VACUUM we don't change the FSM, nor do we try to check/increment the table's freezelimit. HOT would alter slightly the way that UPDATEs are signalled to stats, so that these HOT UPDATEs don't count towards the threshold for autovacuuming - so that a frequently HOT-updated table may only very seldom require a normal VACUUM. The number of itempointers would increase in many cases, though this would still be limited by current maximums. Various tweaks on this basic idea exist, which can be considered in more detail if the basic concept is accepted. - - - This design is aimed at being a no-frills version of the code that has already been written. The existing version is available for testing now and will be made available on community.enterprisedb.com shortly. Four PostgreSQL developers have various amounts of time to contribute to developing the above solution and customising it further according to the wishes of the Community. That is myself, Heikki Linnakangas, Pavan Deolasee and Nikhil Sontakke. Taken together, it seems possible to craft something that can be acceptable for PostgreSQL 8.3 Plan from here is to publish the WIP patch(es) weekly until 8.3 code freeze, together with any performance results. Your comments are welcome. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster