On Thu, May 30, 2013 at 9:33 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
> The reason we have to freeze is that otherwise our 32-bit XIDs wrap around
> and become ambiguous. The obvious solution is to extend XIDs to 64 bits, but
> that would waste a lot of space. The trick is to add a field to the page
> header indicating the 'epoch' of the XID, while keeping the XIDs in the
> tuple header 32 bits wide (*).

Check.
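
For concreteness, the page-header change might look something like the
sketch below. To be clear, this is illustration only: PageHeader64 and
pd_base_xid are made-up names, and the real PageHeaderData has other
fields I'm eliding.

    /* Hypothetical layout -- not the actual PageHeaderData */
    typedef struct PageHeader64
    {
        /* ... all existing page header fields ... */
        uint64      pd_base_xid;    /* 64-bit reference XID for this page */
    } PageHeader64;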

> The other reason we freeze is to truncate the clog. But with 64-bit XIDs, we
> wouldn't actually need to change old XIDs on disk to FrozenXid. Instead, we
> could implicitly treat anything older than relfrozenxid as frozen.

Check.
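
In code form the test becomes trivial. A minimal sketch, assuming we
can reconstruct full 64-bit XIDs as described above (xid_is_frozen is a
hypothetical helper, not an existing function):

    /* Anything older than relfrozenxid is implicitly frozen:
     * no clog lookup and no tuple rewrite needed. */
    static bool
    xid_is_frozen(uint64 xid, uint64 relfrozenxid)
    {
        return xid < relfrozenxid;
    }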

> That's the basic idea. Vacuum freeze only needs to remove dead tuples, but
> doesn't need to dirty pages that contain no dead tuples.

Check.

> Since we're not storing 64-bit wide XIDs on every tuple, we'd still need to
> replace the XIDs with FrozenXid whenever the difference between the smallest
> and largest XID on a page exceeds 2^31. But that would only happen when
> you're updating the page, in which case the page is dirtied anyway, so it
> wouldn't cause any extra I/O.

It would cause some extra WAL activity, but it wouldn't dirty the page
an extra time.
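
Something like this, at page-update time (a sketch only; all the names
here are hypothetical):

    /* If the XID span on the page no longer fits within 2^31,
     * freeze the oldest tuples now, while the update is dirtying
     * the page anyway. */
    if (max_xid_on_page - min_xid_on_page > ((uint64) 1 << 31))
        freeze_old_tuples(page);    /* rewrite old XIDs as FrozenXid */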

> This would also be the first step in allowing the clog to grow larger than 2
> billion transactions, eliminating the need for anti-wraparound freezing
> altogether. You'd still want to truncate the clog eventually, but it would
> be nice to not be pressed against the wall with "run vacuum freeze now, or
> the system will shut down".

Interesting.  That seems like a major advantage.

> (*) "Adding an epoch" is inaccurate, but I like to use that as my mental
> model. If you just add a 32-bit epoch field, then you cannot have xids from
> different epochs on the page, which would be a problem. In reality, you
> would store one 64-bit XID value in the page header, and use that as the
> "reference point" for all the 32-bit XIDs on the tuples. See existing
> convert_txid() function for how that works. Another method is to store the
> 32-bit xid values in tuple headers as offsets from the per-page 64-bit
> value, but then you'd always need to have the 64-bit value at hand when
> interpreting the XIDs, even if they're all recent.
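
For what it's worth, the "reference point" decoding could be about as
simple as the sketch below, modeled loosely on convert_txid()
(xid_decode is a hypothetical name, not an existing function):

    /* Reconstruct a 64-bit XID from the page's 64-bit reference
     * value and a tuple's 32-bit XID.  Valid as long as all XIDs
     * on the page lie within 2^31 of the reference point. */
    static uint64
    xid_decode(uint64 ref_xid64, uint32 tuple_xid)
    {
        /* signed 32-bit distance from the reference point */
        int32   delta = (int32) (tuple_xid - (uint32) ref_xid64);

        return ref_xid64 + (int64) delta;
    }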

As I see it, the main downsides of this approach are:

(1) It breaks binary compatibility (unless you do something to
provide for it, like put the epoch in the special space).

(2) It consumes 8 bytes per page.  I think it would be possible to get
this down to, say, 5 bytes per page pretty easily; we'd simply decide
that the low-order 3 bytes of the reference XID must always be 0.
Possibly you could even make do with 4 bytes, or 4 bytes plus some
number of extra bits.  (A sketch of the 5-byte encoding follows this
list.)

(3) You still need to periodically scan the entire relation, or else
have a freeze map as Simon and Josh suggested.
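
Here is the 5-byte encoding from (2) as a sketch (ref_xid_store and
ref_xid_load are hypothetical names; PG-style typedefs assumed):

    /* Require the low-order 3 bytes of the reference XID to be zero,
     * so only the high-order 40 bits need to be stored on the page. */
    static void
    ref_xid_store(uint8 *dest, uint64 ref_xid)
    {
        uint64  hi40 = ref_xid >> 24;   /* low 24 bits are zero by rule */
        int     i;

        for (i = 0; i < 5; i++)
            dest[i] = (uint8) (hi40 >> (8 * i));
    }

    static uint64
    ref_xid_load(const uint8 *src)
    {
        uint64  hi40 = 0;
        int     i;

        for (i = 0; i < 5; i++)
            hi40 |= (uint64) src[i] << (8 * i);
        return hi40 << 24;
    }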

The upsides of this approach as compared with what Andres and I are
proposing are:

(1) It provides a stepping stone towards allowing indefinite expansion
of CLOG, which is quite appealing as an alternative to a hard
shut-down.

(2) It doesn't place any particular requirements on PD_ALL_VISIBLE.  I
don't personally find this much of a benefit as I want to keep
PD_ALL_VISIBLE, but I know Jeff and perhaps others disagree.

Random thought: Could you compute the reference XID based on the page
LSN?  That would eliminate the storage overhead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

