On Mon, 2007-02-26 at 23:07 -0500, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > On Mon, 2007-02-26 at 18:14 -0500, Tom Lane wrote:
> >> What does this accomplish other than adding syntactic sugar over a
> >> feature that really doesn't work well anyway?
>
> > This patch doesn't intend to implement group commit. I've changed the
> > meaning of commit_delay, sorry if that confuses.
>
> Ah. The patch was pretty much unintelligible without the discussion
> (which got here considerably later :-(). I've still got misgivings
> about how safe it really is, but at least this is better than what
> commit_delay wishes it could do.
The latest WIP version of the patch is now ready for performance testing.
It applies cleanly to CVS HEAD and adds two new files:
src/backend/postmaster/walwriter.c
src/include/postmaster/walwriter.h
The patch passes make installcheck in these cases:
- no options set
- wal_writer_delay = 10
- wal_writer_delay = 10 and transaction_guarantee = off for all
transactions by default in postgresql.conf.
Normal checkpoints and restarts work without problem after these runs.
What this patch does
--
Implements unguaranteed transactions, which skip the XLogFlush step when
they commit. The flush point is updated in shared memory so that a
separate WAL writer process will perform the flush each time it cycles.
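As a rough illustration of that split between committing backends and the WAL writer, here is a minimal single-threaded sketch in C. It is not the patch's code: SharedWalState, wal_flush(), commit_transaction() and wal_writer_cycle() are hypothetical names, an LSN is modelled as a plain 64-bit counter, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: an LSN is modelled as a plain 64-bit byte position. */
typedef uint64_t XLogRecPtr;

/* Hypothetical shared-memory state; locking is omitted in this model. */
typedef struct
{
    XLogRecPtr  inserted_upto;   /* end of the last WAL record inserted */
    XLogRecPtr  async_request;   /* furthest unguaranteed commit awaiting flush */
    XLogRecPtr  flushed_upto;    /* WAL known to be safely on disk */
} SharedWalState;

/* Models XLogFlush(): make WAL durable up to at least 'upto'. */
static void
wal_flush(SharedWalState *wal, XLogRecPtr upto)
{
    assert(upto <= wal->inserted_upto);
    if (upto > wal->flushed_upto)
        wal->flushed_upto = upto;   /* real code would write() and fsync() here */
}

/* Commit a transaction whose commit record ends at commit_lsn. */
static void
commit_transaction(SharedWalState *wal, XLogRecPtr commit_lsn, bool guaranteed)
{
    if (commit_lsn > wal->inserted_upto)
        wal->inserted_upto = commit_lsn;    /* the commit record is inserted either way */

    if (guaranteed)
        wal_flush(wal, commit_lsn);         /* wait for disk before reporting success */
    else if (commit_lsn > wal->async_request)
        wal->async_request = commit_lsn;    /* only advertise the new flush point */
}

/* One cycle of the WAL writer, run every wal_writer_delay milliseconds. */
static void
wal_writer_cycle(SharedWalState *wal)
{
    wal_flush(wal, wal->async_request);
}
```

An unguaranteed commit returns with flushed_upto unchanged; the next wal_writer_cycle() call makes it durable. A guaranteed commit flushes immediately and also covers any earlier unguaranteed commits, since WAL is flushed in order.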
These parameters control this behaviour:
transaction_guarantee = on (default) | off      USERSET
wal_writer_delay = 0 (default, == off)          SIGHUP
log_transaction_guarantee = on (default) | off  SIGHUP
(the default for log_transaction_guarantee would be off in a later
production version)
The WAL writer starts when wal_writer_delay is set to a non-zero value
and stops when it is set back to zero.
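The start/stop rule can be sketched as a small decision function evaluated after the configuration is re-read on SIGHUP. This is only an illustration under assumptions: WalWriterSettings, default_settings and walwriter_action() are hypothetical names, and the sketch does not claim which process performs the start or stop in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the settings; field names mirror the GUCs listed above. */
typedef struct
{
    bool transaction_guarantee;      /* USERSET, default on */
    int  wal_writer_delay;           /* SIGHUP, milliseconds, 0 = off */
    bool log_transaction_guarantee;  /* SIGHUP, default on in this WIP patch */
} WalWriterSettings;

static const WalWriterSettings default_settings = { true, 0, true };

typedef enum { WALWRITER_KEEP, WALWRITER_START, WALWRITER_STOP } WalWriterAction;

/* Decide what to do with the WAL writer after the config is re-read. */
static WalWriterAction
walwriter_action(bool running, const WalWriterSettings *s)
{
    assert(s->wal_writer_delay >= 0);
    if (s->wal_writer_delay > 0 && !running)
        return WALWRITER_START;     /* non-zero delay and no writer yet */
    if (s->wal_writer_delay == 0 && running)
        return WALWRITER_STOP;      /* delay set back to zero */
    return WALWRITER_KEEP;          /* state already matches the setting */
}
```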
Unguaranteed transactions are only allowed for commits issued from:
- Execute message
- Fastpath message
- Sync message
- the implicit commit at the end of a simple query, and explicit COMMITs
All other transaction commits always use the guaranteed commit path.
These include VACUUM, various DDL commands and about a dozen other
places that execute commits. The abort path is never fast in any case.
In addition, any transaction that deletes files follows the guaranteed
commit path, regardless of how the commit was requested.
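The rules above can be condensed into one predicate. This is a sketch, not the patch's code: the CommitSource enum and commit_must_be_guaranteed() are hypothetical names, and COMMIT_FROM_OTHER lumps together VACUUM, DDL and the other internal commit sites.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for where a commit is issued from. */
typedef enum
{
    COMMIT_FROM_EXECUTE_MSG,
    COMMIT_FROM_FASTPATH_MSG,
    COMMIT_FROM_SYNC_MSG,
    COMMIT_FROM_SIMPLE_QUERY,   /* implicit commit at end, or explicit COMMIT */
    COMMIT_FROM_OTHER           /* VACUUM, DDL and the other internal commit sites */
} CommitSource;

/* Returns true when the commit must flush WAL before returning. */
static bool
commit_must_be_guaranteed(CommitSource src, bool transaction_guarantee,
                          bool deletes_files)
{
    assert(src >= COMMIT_FROM_EXECUTE_MSG && src <= COMMIT_FROM_OTHER);

    if (transaction_guarantee)
        return true;            /* the session asked for a guaranteed commit */
    if (deletes_files)
        return true;            /* file deletion always forces the guaranteed path */

    switch (src)
    {
        case COMMIT_FROM_EXECUTE_MSG:
        case COMMIT_FROM_FASTPATH_MSG:
        case COMMIT_FROM_SYNC_MSG:
        case COMMIT_FROM_SIMPLE_QUERY:
            return false;       /* eligible for the unguaranteed path */
        default:
            return true;        /* every other commit site stays guaranteed */
    }
}
```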
The interlock between commits and checkpoints is maintained. Once the
bgwriter has acquired CheckpointStartLock, all unguaranteed transactions
are flushed.
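A single-threaded sketch of that ordering, assuming a bool stands in for CheckpointStartLock held exclusively by the bgwriter; CheckpointModel, try_record_commit(), checkpoint_begin() and checkpoint_release() are hypothetical names. While the lock is held no commit can be recorded, so flushing up to the highest unguaranteed commit LSN covers every unguaranteed commit that precedes the checkpoint's redo point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical model; a bool stands in for CheckpointStartLock. */
typedef struct
{
    bool        checkpoint_start_lock;  /* true while held by the bgwriter */
    XLogRecPtr  inserted_upto;
    XLogRecPtr  async_request;          /* highest unguaranteed commit LSN */
    XLogRecPtr  flushed_upto;
} CheckpointModel;

/* An unguaranteed commit; returns false where the real backend would block. */
static bool
try_record_commit(CheckpointModel *m, XLogRecPtr commit_lsn)
{
    if (m->checkpoint_start_lock)
        return false;
    if (commit_lsn > m->inserted_upto)
        m->inserted_upto = commit_lsn;
    if (commit_lsn > m->async_request)
        m->async_request = commit_lsn;
    return true;
}

/* Acquire the lock, flush all unguaranteed commits, return the redo point. */
static XLogRecPtr
checkpoint_begin(CheckpointModel *m)
{
    assert(!m->checkpoint_start_lock);
    m->checkpoint_start_lock = true;
    if (m->async_request > m->flushed_upto)
        m->flushed_upto = m->async_request;
    return m->inserted_upto;
}

static void
checkpoint_release(CheckpointModel *m)
{
    assert(m->checkpoint_start_lock);
    m->checkpoint_start_lock = false;
}
```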
(In addition, the fsync GUC has been removed from postgresql.conf.sample,
but not from the server itself. If this patch goes ahead, I suggest we
deprecate it for one release and then remove it in the next...)
What this patch doesn't do yet
--
Crash recovery does not yet work, but can be made to work by completing
TODO items (1) and (2) below.
1. The interlock between the buffer manager and WAL is maintained, but
not sufficiently to avoid problems in all cases. Specifically, commit
hint bits must not be written to disk before the commit record of the
transaction they describe has been flushed to WAL.
Two approaches are possible:
   1. avoid setting the hint bits for unguaranteed transactions
   2. set the hint bits *and* update the LSN of the page to be the LSN
      of the unguaranteed transaction for which we are setting the hint
      bits.
Either way, we need to maintain a list of unguaranteed transactions in
shared memory that can be consulted when hint bits are set. The list
would need to contain the Xid and the LSN of each unguaranteed
transaction. This requires keeping the list of unguaranteed transactions
fairly small, so some care is needed to ensure this. That can be
achieved by keeping commit_fsync_delay small, or by adding a trigger
point at which a would-be unguaranteed transaction is forced to flush
WAL instead. Some testing has shown that flushing once every 8
transactions gives a considerable leap in performance in many cases.
2. As originally discussed, during crash recovery any in-flight
transactions would need to be explicitly aborted in clog, to override
the possibility that an unguaranteed transaction had been marked
committed. An alternative would be to flush all unguaranteed
transactions before flushing dirty clog and multitrans pages. That
could be achieved by tracking the LSN of the last write to each of those
pages and performing XLogFlush up to that LSN before we write a dirty
page out. I'm leaning towards the new alternative now, since it's
cleaner and fits better with the way the rest of the server works.
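The alternative amounts to applying the usual WAL-before-data rule to clog pages. In this minimal sketch, WalModel, ClogPage, xlog_flush(), clog_set_committed() and clog_write_page() are hypothetical names, and the actual page write is left as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

typedef struct
{
    XLogRecPtr inserted_upto;
    XLogRecPtr flushed_upto;
} WalModel;

/* Hypothetical clog buffer: remembers the LSN of the latest commit it records. */
typedef struct
{
    bool       dirty;
    XLogRecPtr last_update_lsn;
} ClogPage;

/* Models XLogFlush(). */
static void
xlog_flush(WalModel *wal, XLogRecPtr upto)
{
    assert(upto <= wal->inserted_upto);
    if (upto > wal->flushed_upto)
        wal->flushed_upto = upto;
}

/* Mark a transaction committed in this clog page. */
static void
clog_set_committed(ClogPage *page, XLogRecPtr commit_lsn)
{
    page->dirty = true;
    if (commit_lsn > page->last_update_lsn)
        page->last_update_lsn = commit_lsn;
}

/* Write a dirty clog page out, flushing the WAL it depends on first. */
static void
clog_write_page(WalModel *wal, ClogPage *page)
{
    if (!page->dirty)
        return;
    xlog_flush(wal, page->last_update_lsn);   /* WAL first ... */
    /* ... then the page itself would be written to disk here */
    page->dirty = false;
}
```

With this rule a clog page can never reach disk showing a commit whose WAL record is still only in memory, so recovery would not need to abort in-flight transactions in clog explicitly.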
3. The WAL writer could be used for various additional tasks, such as
performing the WAL cache-half-filled check. Those options have been left
out for now, to avoid complicating discussion and review.
4. We probably need more padding in XLogCtlData to ensure that the data
protected by WALInsertLock, WALWriteLock and info_lck sit in separate
cache lines, to avoid CPU false sharing. That should be done whether or
not this patch goes ahead.
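One way to express such padding in C11 is to align each lock-protected group to its own cache line. This is a sketch only: the 64-byte line size is an assumption, and the field groups and the names InsertPart, WritePart, InfoPart and PaddedXLogCtl are hypothetical stand-ins, not the real XLogCtlData layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* assumption: common x86-64 cache line size */

/* Hypothetical field groups, one per lock; the real XLogCtlData differs. */
typedef struct { uint64_t curr_insert; uint64_t prev_record; } InsertPart;   /* WALInsertLock */
typedef struct { uint64_t write_upto;  uint64_t flush_upto;  } WritePart;    /* WALWriteLock */
typedef struct { volatile int info_lck; uint64_t async_request; } InfoPart;  /* info_lck */

/* Each group starts on its own cache line relative to the struct start. */
typedef struct
{
    _Alignas(CACHE_LINE_SIZE) InsertPart insert;
    _Alignas(CACHE_LINE_SIZE) WritePart  write;
    _Alignas(CACHE_LINE_SIZE) InfoPart   info;
} PaddedXLogCtl;

_Static_assert(offsetof(PaddedXLogCtl, write) - offsetof(PaddedXLogCtl, insert)
               >= CACHE_LINE_SIZE, "insert and write groups share a cache line");
_Static_assert(offsetof(PaddedXLogCtl, info) - offsetof(PaddedXLogCtl, write)
               >= CACHE_LINE_SIZE, "write and info groups share a cache line");
```

The alignment only separates the groups relative to the start of the struct; the shared-memory allocation holding it would also need to start on a cache-line boundary for the groups to land on distinct lines.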
Tests, reviews and comments please?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Index: src/backend/access/transam/xact.c
=