Heap Only Tuples ("HOT") is a simplification of earlier proposals for
improving the way the server handles frequent updates, based upon what's
been learned and feedback received.

Heap Only Tuples
----------------

The basic idea is that when a tuple is UPDATEd we can, in certain
circumstances, avoid inserting index tuples for a tuple. Such tuples are
marked HEAP_ONLY_TUPLE, but their storage is otherwise identical to
other tuples. The pre-conditions for allowing a HOT UPDATE are
- UPDATE doesn't change any indexed columns
- there is space on the same block as the tuple being updated

There is no restriction on tuple length changes, nor any requirement for
an additional field in the tuple header; as a result this change does
not require activation by an additional WITH parameter and this
technique can be used on *all* tables. 

HOT will, in some cases, perform better in conjunction with the use of
the fillfactor storage parameter. For smaller tables, this will seldom
be required, so database tuning will not increase in complexity (in
comparison with carefully planned VACUUM strategies in earlier
releases). In many cases, the update rate will cause a steady state to
be reached, with on-block space being reused cyclically.
 
At the same time we insert the HEAP_ONLY_TUPLE, the just-updated tuple
will be marked HEAP_UPDATE_ROOT. When we use an index to locate a heap
tuple, we start from this root tuple and hop forwards using the ctid
chain until we find the appropriate tuple. 

CREATE INDEX requires some careful work to allow it to identify and
correctly index HEAP_ONLY_TUPLEs that need to become ~HEAP_ONLY_TUPLE as
a result of the new index. This will cause additional work to be
required for those cases. CREATE INDEX on a newly loaded table will be
completely unaffected. There is some complexity there, though we don't
go into detail on those issues here. Please read on!

To allow HOT to work effectively we need to consider how we will VACUUM,
noting that in many cases we can remove HEAP_ONLY_TUPLEs much more
easily because they have no index tuples referencing them. There are
various options at this stage, but for clarity only one of those options
is presented here. 

When we try to UPDATE a tuple and the new tuple version doesn't fit on
the block, we get the BufferCleanupLock if possible and then perform a
single-block VACUUM. Any tuple that is both HEAP_DEAD & HEAP_ONLY_TUPLE
can be removed completely. This is possible by changing the t_ctid field
so that it points at the first visible-to-someone tuple in the chain, so
it points "over" the previous HOT tuples. The root tuple is also dead -
it cannot be removed completely, so it is replaced it with "just a
TupleHeader", which is referred to as a TupleStub. (Credit to Itagaki
for this concept).

e.g.

t1 (t_ctid: t2 ) - info HEAP_UPDATE_ROOT status HEAPTUPLE_DEAD
t2 (t_ctid: t3 ) - info HEAP_ONLY        status HEAPTUPLE_DEAD
t3 (t_ctid:self) - info HEAP_ONLY        status HEAPTUPLE_LIVE

after single-page VACUUM

t1 (t_ctid: t3 ) - info HEAP_UPDATE_ROOT & HEAP_TUPLE_STUB
                 - status HEAPTUPLE_RECENTLY_DEAD
                 - t1 is now a TupleStub only
t3 (t_ctid:self) - info HEAP_ONLY        status HEAPTUPLE_LIVE

Status shown is the return value from HeapTupleSatisfiesVacuum()

The single-block VACUUM would alter *all* tuple chains on the block, not
just the one for the current tuple being UPDATEd.

This technique means that a tuple never changes its CTID, so everything
that currently uses CTID can continue normally. SeqScan would also work
identically to the way it works today.

It also means that we can't easily remove the root tuple, even if it is
now just a TupleStub (unless the whole chain is also removable because
of DELETE). Removing the root tuple will require a VACUUM *FULL*. Even
so, this option is still space-neutral in the worst-case, in comparison
with inserting index tuples. 

When we perform the single-block VACUUM we don't change the FSM, nor do
we try to check/increment the table's freezelimit. HOT would alter
slightly the way that UPDATEs are signalled to stats, so that these HOT
UPDATEs don't count towards the threshold for autovacuuming - so that a
frequently HOT-updated table may only very seldom require a normal
VACUUM.

The number of itempointers would increase in many cases, though this
would still be limited by current maximums.

Various tweaks on this basic idea exist, which can be considered in more
detail if the basic concept is accepted. 

- - - 

This design is aimed at being a no-frills version of the code that has
already been written. The existing version is available for testing now
and will be made available on community.enterprisedb.com shortly.

Four PostgreSQL developers have various amounts of time to contribute to
developing the above solution and customising it further according to
the wishes of the Community. That is myself, Heikki Linnakangas, Pavan
Deolasee and Nikhil Sontakke. Taken together, it seems possible to craft
something that can be acceptable for PostgreSQL 8.3

Plan from here is to publish the WIP patch(es) weekly until 8.3 code
freeze, together with any performance results.


Your comments are welcome.

-- 
  Simon Riggs             
  EnterpriseDB   http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to