Re: [HACKERS] Group commit, revised

Heikki Linnakangas Mon, 16 Jan 2012 00:12:34 -0800

On 16.01.2012 00:42, Peter Geoghegan wrote:

I've also attached the results of a pgbench-tools driven benchmark,
which are quite striking (Just the most relevant image - e-mail me
privately if you'd like a copy of the full report, as I don't want to
send a large PDF file to the list as a courtesy to others). Apart from
the obvious improvement in throughput, there is also a considerable
improvement in average and worst latency at all client counts.


Impressive results. How about uploading the PDF to the community wiki?

The main way that I've added value here is by refactoring and fixing
bugs. There were some tricky race conditions that caused the
regression tests to fail for that early draft patch, but there were a
few other small bugs too. There is an unprecedented latch pattern
introduced by this patch: Two processes (the WAL Writer and any given
connection backend) have a mutual dependency. Only one may sleep, in
which case the other is responsible for waking it. Much of the
difficulty in debugging this patch, and much of the complexity that
I've added, came from preventing both from simultaneously sleeping,
even in the face of various known failure modes like postmaster death.
If this happens, it does of course represent a denial-of-service, so
that's something that reviewers are going to want to heavily
scrutinise. I suppose that sync rep should be fine here, as it waits
on walsenders, but I haven't actually comprehensively tested the
patch, so there may be undiscovered unpleasant interactions with other
xlog related features. I can report that it passes the regression
tests successfully, and on an apparently consistently basis - I
battled with intermittent failures for a time.

I think it might be simpler if it wasn't the background writer that'sresponsible for "driving" the group commit queue, but the backendsthemselves. When a flush request comes in, you join the queue, and ifsomeone else is already doing the flush, sleep until the driver wakesyou up. If no-one is doing the flush yet (ie. the queue is empty), startdoing it yourself. You'll need a state variable to keep track who'sdriving the queue, but otherwise I think it would be simpler as therewould be no dependency on WAL writer.

I tend think of the group commit facility as a bus. Passengers can hopon board at any time, and they take turns on who drives the bus. Whenthe first passengers hops in, there is no driver so he takes the driverseat. When the next passenger hops in, he sees that someone is drivingthe bus already, so he sits down, and places a big sign on his foreheadstating the bus stop where he wants to get off, and goes to sleep. Whenthe driver has reached his own bus stop, he wakes up all the passengerswho wanted to get off at the same stop or any of the earlier stops [1].He also wakes up the passenger whose bus stop is the farthest from thecurrent stop, and gets off the bus. The woken-up passengers who havealready reached their stops can immediately get off the bus, and the onewho has not notices that no-one is driving the bus anymore, so he takesthe driver seat.

[1] in a real bus, a passenger would not be happy if he's woken up toolate and finds himself at the next stop instead of the one where hewanted to go, but for group commit, that is fine.

In this arrangement, you could use the per-process semaphore for thesleep/wakeups, instead of latches. I'm not sure if there's anydifference, but semaphores are more tried and tested, at least.

The benefits (and, admittedly, the controversies) of this patch go
beyond mere batching of commits: it also largely, though not entirely,
obviates the need for user backends to directly write/flush WAL, and
the WAL Writer may never sleep if it continually finds work to do -
wal_writer_delay is obsoleted, as are commit_siblings and
commit_delay.

wal_writer_delay is still needed for controlling how often asynchronouscommits are flushed to disk.

Auxiliary processes cannot use group commit. The changes made prevent
them from availing of commit_siblings/commit_delay parallelism,
because it doesn't exist anymore.


Auxiliary processes have PGPROC entries too. Why can't they participate?

Group commit is sometimes throttled, which seems appropriate - if a
backend requests that the WAL Writer flush an LSN deemed too far from
the known flushed point, that request is rejected and the backend goes
through another path, where XLogWrite() is called.

Hmm, if the backend doing the big flush gets the WALWriteLock before abunch of group committers, the group committers will have to wait untilthe big flush is finished, anyway. I presume the idea of the throttlingis to avoid the situation where a bunch of small commits need to waitfor a huge flush to finish.

Perhaps the big flusher should always join the queue, but use someheuristic to first flush up to the previous commit request, to wake upothers quickly, and do another flush to flush its own request after that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Group commit, revised

Reply via email to