On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
> We are also planning to implement CSN based snapshot.
> So I am curious to know whether any further development is happening on this.

I started looking into this, and plan to work on this for 9.5. It's a big project, so any help is welcome. The design I have in mind is to use the LSN of the commit record as the CSN (as Greg Stark suggested).

Some problems and solutions I have been thinking of:

The core of the design is to store the LSN of the commit record in pg_clog. Currently, we only store 2 bits per transaction there, indicating whether the transaction committed or not; the patch will expand that to 64 bits, enough to store the LSN. To check the visibility of an XID in a snapshot, the XID's commit LSN is looked up in pg_clog and compared with the snapshot's LSN.
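A minimal sketch of the lookup-and-compare idea, assuming a toy in-memory array in place of the real pg_clog. The type names, the `toy_clog` array, and the use of 0 as "no commit LSN recorded" are all hypothetical. The post doesn't say whether the stored LSN is the start or the end of the commit record, so whether the comparison should be `<` or `<=` is left open; the sketch uses `<`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's TransactionId and XLogRecPtr. */
typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

/* Hypothetical encoding: 0 means no commit LSN has been recorded. */
#define CLOG_NO_COMMIT ((XLogRecPtr) 0)

/* Toy clog: one 64-bit slot per XID instead of 2 bits.
 * Callers must keep xid < TOY_CLOG_SIZE. */
#define TOY_CLOG_SIZE 1024
static XLogRecPtr toy_clog[TOY_CLOG_SIZE];

static void toy_clog_set_commit_lsn(TransactionId xid, XLogRecPtr lsn)
{
    toy_clog[xid] = lsn;
}

/* An XID is visible if it has a commit LSN that precedes the snapshot LSN. */
static bool xid_visible_in_snapshot(TransactionId xid, XLogRecPtr snapshot_lsn)
{
    XLogRecPtr commit_lsn = toy_clog[xid];

    return commit_lsn != CLOG_NO_COMMIT && commit_lsn < snapshot_lsn;
}
```

With this scheme, a snapshot carries a single LSN, and checking an XID needs only one clog read.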

Currently, before consulting the clog for an XID's status, it is necessary to first check whether the transaction is still in progress by scanning the proc array. To get rid of that requirement, just before writing the commit record to the WAL, the backend will mark the clog slot with a magic value that says "I'm just about to commit". After writing the commit record, it replaces the magic value with the record's actual LSN. If a backend sees the magic value in the clog, it will wait for the transaction to finish the insertion, and then check again to get the real LSN. I'm thinking of just using XactLockTableWait() for that. This mechanism makes inserting the commit WAL record and updating the clog appear atomic to the rest of the system.
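The commit protocol above can be sketched as follows, assuming a toy clog of C11 atomics. The sentinel values, function names, and the `wait_for_xact_fn` callback are hypothetical; the callback only stands in for XactLockTableWait(), which in the real server blocks on the transaction's lock instead of taking a callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

/* Hypothetical slot encodings; the post only specifies "a magic value". */
#define CLOG_NOT_COMMITTED ((XLogRecPtr) 0)  /* in progress or aborted */
#define CLOG_COMMITTING    UINT64_MAX        /* "I'm just about to commit" */

/* Toy clog; atomics so a 64-bit slot is never read half-written.
 * Static atomics are zero-initialized, i.e. CLOG_NOT_COMMITTED. */
#define TOY_CLOG_SIZE 1024
static _Atomic XLogRecPtr toy_clog[TOY_CLOG_SIZE];

/* Stand-in for XactLockTableWait(): returns once xid has finished. */
typedef void (*wait_for_xact_fn)(TransactionId xid);

/* Committer, step 1: mark the slot before inserting the commit record. */
static void mark_committing(TransactionId xid)
{
    atomic_store(&toy_clog[xid], CLOG_COMMITTING);
}

/* Committer, step 2: after the insertion, publish the record's LSN. */
static void set_commit_lsn(TransactionId xid, XLogRecPtr lsn)
{
    atomic_store(&toy_clog[xid], lsn);
}

/* Reader: never returns the magic value; waits and re-reads instead. */
static XLogRecPtr read_commit_lsn(TransactionId xid, wait_for_xact_fn wait_for_xact)
{
    XLogRecPtr v = atomic_load(&toy_clog[xid]);

    while (v == CLOG_COMMITTING)
    {
        wait_for_xact(xid);
        v = atomic_load(&toy_clog[xid]);
    }
    return v;
}

/* Demo-only wait callback: pretends the committer finished its insertion
 * and stored LSN 0x1000. */
static void demo_wait(TransactionId xid)
{
    set_commit_lsn(xid, 0x1000);
}
```

Because readers loop until the magic value disappears, no reader can observe a commit whose WAL record is not yet fully inserted.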

With this mechanism, taking a snapshot is just a matter of reading the current WAL insertion point. There is no need to scan the proc array, which is good. However, it probably still makes sense to record an xmin and an xmax in SnapshotData, for performance reasons. An xmax, in particular, will allow us to skip checking the clog for transactions that will surely not be visible. We will no longer track the latest completed XID or the xmin like we do today, but we can use ShmemVariableCache->nextXid as a conservative value for xmax, and keep a cached global xmin value in shared memory, updated when convenient, that can just be copied to the snapshot.
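A minimal sketch of snapshot-taking and the xmax shortcut, assuming hypothetical `ToySnapshot` and `ToySharedState` structs; the real SnapshotData has many more fields. Any XID at or beyond the nextXid value copied at snapshot time had not been assigned yet, so it cannot be visible and the clog is never consulted for it. The sketch ignores XID wraparound, and the `demo_lookup` function is a demo-only stand-in for a pg_clog read that counts calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

/* Hypothetical CSN-style snapshot: the LSN decides visibility;
 * xmin and xmax are only shortcuts. */
typedef struct ToySnapshot
{
    XLogRecPtr    lsn;   /* WAL insertion point at snapshot time */
    TransactionId xmin;  /* copied from a cached global xmin */
    TransactionId xmax;  /* conservative: nextXid at snapshot time */
} ToySnapshot;

/* Hypothetical shared state: WAL insert position, nextXid, cached xmin. */
typedef struct ToySharedState
{
    XLogRecPtr    insert_lsn;
    TransactionId next_xid;
    TransactionId cached_global_xmin;
} ToySharedState;

/* Taking a snapshot copies three values; no proc array scan. */
static ToySnapshot take_snapshot(const ToySharedState *shared)
{
    ToySnapshot s = { shared->insert_lsn, shared->cached_global_xmin,
                      shared->next_xid };
    return s;
}

/* Stand-in for reading an XID's commit LSN from pg_clog (0 = none). */
typedef XLogRecPtr (*commit_lsn_lookup_fn)(TransactionId xid);

/* Demo-only lookup: XID 50 committed at LSN 900; counts calls. */
static int demo_lookups = 0;
static XLogRecPtr demo_lookup(TransactionId xid)
{
    demo_lookups++;
    return xid == 50 ? (XLogRecPtr) 900 : (XLogRecPtr) 0;
}

static bool xid_visible(const ToySnapshot *s, TransactionId xid,
                        commit_lsn_lookup_fn lookup)
{
    /* Not yet assigned when the snapshot was taken: skip the clog. */
    if (xid >= s->xmax)
        return false;

    XLogRecPtr commit_lsn = lookup(xid);
    return commit_lsn != 0 && commit_lsn < s->lsn;
}
```

For example, with `insert_lsn = 1000` and `next_xid = 60`, checking XID 70 returns false without a lookup, while XID 50 needs one clog read.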

In theory, we could use a snapshot LSN as the cutoff-point for HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but that makes me feel uneasy. In any case, I think we'll need a cut-off point defined as an XID rather than an LSN for freezing purposes. In particular, we need a cut-off XID to determine how far the pg_clog can be truncated, and to store in relfrozenxid. So, we will still need the concept of a global oldest xmin.

When a snapshot is just an LSN, taking a snapshot can no longer calculate an xmin like we currently do (there will be a snapshot LSN in place of an xmin in the proc array). So we will need a new mechanism to calculate the global oldest xmin. First, scan the proc array to find the oldest still in-progress XID. That XID minus one will become the new oldest global xmin, once all currently active snapshots have finished. We don't want to sleep in GetOldestXmin() waiting for the snapshots to finish, so we should periodically advance a system-wide oldest xmin value, for example whenever the walwriter process wakes up, so that when we need an oldest-xmin value, we will always have a fairly recently calculated value ready in shared memory.
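One way to picture the periodic two-phase advance is sketched below; it is an illustration under stated assumptions, not the planned implementation. `XminAdvancer` and `advance_tick()` are hypothetical names. "All currently active snapshots have finished" is approximated by "every active snapshot's LSN is at or after the WAL insert position recorded when the candidate was computed". When nothing is running, nextXid is used as the oldest in-progress XID. XID wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

#define INVALID_XID ((TransactionId) 0)

/* Hypothetical state kept in shared memory. */
typedef struct XminAdvancer
{
    TransactionId published_xmin;  /* what oldest-xmin callers read */
    TransactionId candidate_xmin;  /* INVALID_XID if none pending */
    XLogRecPtr    candidate_lsn;   /* insert LSN when candidate computed */
} XminAdvancer;

/* One periodic tick, e.g. whenever a background process wakes up. */
static void advance_tick(XminAdvancer *adv, XLogRecPtr insert_lsn,
                         TransactionId next_xid,
                         const TransactionId *running_xids, size_t n_running,
                         const XLogRecPtr *snapshot_lsns, size_t n_snapshots)
{
    /* Phase 2: publish the pending candidate once no active snapshot
     * predates the moment it was computed. */
    if (adv->candidate_xmin != INVALID_XID)
    {
        bool all_newer = true;

        for (size_t i = 0; i < n_snapshots; i++)
            if (snapshot_lsns[i] < adv->candidate_lsn)
            {
                all_newer = false;
                break;
            }
        if (all_newer)
        {
            adv->published_xmin = adv->candidate_xmin;
            adv->candidate_xmin = INVALID_XID;
        }
    }

    /* Phase 1: with nothing pending, scan for the oldest running XID. */
    if (adv->candidate_xmin == INVALID_XID)
    {
        TransactionId oldest = next_xid;

        for (size_t i = 0; i < n_running; i++)
            if (running_xids[i] < oldest)
                oldest = running_xids[i];
        adv->candidate_xmin = oldest - 1;   /* "that minus one" */
        adv->candidate_lsn = insert_lsn;
    }
}
```

Readers never wait: they just read `published_xmin`, which lags behind the true value by at most a couple of ticks.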

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)