It's been known for years that commit_delay isn't very good at giving us group commit behavior. I did some experiments with this simple test case: "BEGIN; INSERT INTO test VALUES (1); COMMIT;", with different numbers of concurrent clients and with and without commit_delay.

Summary for the impatient:
1. Current behavior sucks.
2. commit_delay doesn't help with # of clients < ~10. It does help with higher numbers, but it still sucks.
3. I'm working on a patch.


I added logging to show how many commit records are flushed on each fsync. The output with otherwise unpatched PG head looks like this, with 5 clients:

LOG:  Flushed 4 out of 5 commits
LOG:  Flushed 1 out of 5 commits
LOG:  Flushed 4 out of 5 commits
LOG:  Flushed 1 out of 5 commits
LOG:  Flushed 4 out of 5 commits
LOG:  Flushed 1 out of 5 commits
LOG:  Flushed 4 out of 5 commits
LOG:  Flushed 1 out of 5 commits
LOG:  Flushed 3 out of 5 commits
LOG:  Flushed 2 out of 5 commits
LOG:  Flushed 3 out of 5 commits
LOG:  Flushed 2 out of 5 commits
LOG:  Flushed 3 out of 5 commits
LOG:  Flushed 2 out of 5 commits
LOG:  Flushed 3 out of 5 commits
...

Here's what's happening:

1. Client 1 issues fsync (A)
2. Clients 2-5 write their commit records and try to fsync, but they have to wait for fsync (A) to finish.
3. fsync (A) finishes, freeing client 1.
4. One of clients 2-5 starts the next fsync (B), which will flush the commits of clients 2-5 to disk.
5. Client 1 begins a new transaction, inserts its commit record and tries to fsync. It needs to wait for the previous fsync (B) to finish.
6. fsync (B) finishes, freeing clients 2-5.
7. Client 1 issues fsync (C)
8. ...
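The steps above can be sketched as a toy model: one client is permanently out of phase with the other four, so each fsync flushes either 4 or 1 commit records, never all 5. This is only an illustration of the resonance, not code from PostgreSQL; the function name and parameters are made up.

```python
def resonance(nclients=5, ngroup1=1, nfsyncs=6):
    """Toy model of the fsync resonance: 'ngroup1' clients are in one
    phase, the remaining clients in the other, and each fsync flushes
    whichever group was waiting for it."""
    flushed = []
    for i in range(nfsyncs):
        # Even-numbered fsyncs flush the big group, odd ones the small group.
        flushed.append(nclients - ngroup1 if i % 2 == 0 else ngroup1)
    return flushed

print(resonance())  # alternating 4, 1, ...
```

With ngroup1=2 the same model produces the alternating 3-2 pattern from the later part of the log.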

The 2-3-2-3 pattern can be explained by a similar unfortunate "resonance", but with two clients playing the role of client 1 above, possibly because they run on separate cores (the test was run on a dual-core laptop).

I also drew a diagram illustrating the above; see attached.

I wrote a quick & dirty patch for this that I'm going to refine further, but I wanted to get the results out for others to look at first. I'm not posting the patch yet, but it basically adds some synchronization to the WAL flushes. It introduces a counter of inserted but not yet flushed commit records. Instead of sleeping for commit_delay, the counter is checked: if it's smaller than NBackends, the process waits until the count reaches NBackends, or until a timeout expires.

There are two significant differences to commit_delay here:

1. Instead of waiting for commit_delay to expire, processes are woken up and the fsync is started immediately as soon as we know there are no more commit records coming that we should wait for. Even though commit_delay is given in microseconds, the real granularity of the wait can be as coarse as 10 ms, which is in the same ballpark as the fsync itself.

2. commit_delay is not used when there are fewer than commit_siblings non-idle backends in the system. With very short transactions it's worthwhile to wait even in that case, because a client can begin and finish a transaction in much less time than it takes to fsync. This is what makes commit_delay not work at all in my test case with 2 clients.
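The counter-plus-timeout idea can be sketched roughly like this. This is not the patch itself (which lives in the C backend); the class and method names are made up for illustration, using Python's threading primitives in place of the backend's synchronization:

```python
import threading

class GroupCommitGate:
    """Sketch of the idea: each backend bumps a counter of inserted but
    not yet flushed commit records, then waits until the counter reaches
    nbackends or a timeout expires, whichever comes first."""

    def __init__(self, nbackends, timeout=0.01):
        self.nbackends = nbackends
        self.timeout = timeout      # safety net, like the patch's timeout
        self.unflushed = 0          # commit records inserted, not flushed
        self.cond = threading.Condition()

    def commit_record_inserted(self):
        """Called after a backend inserts its commit record; returns when
        it is time to fsync."""
        with self.cond:
            self.unflushed += 1
            if self.unflushed >= self.nbackends:
                # No more commit records coming: wake everyone, fsync now.
                self.cond.notify_all()
            else:
                self.cond.wait_for(
                    lambda: self.unflushed >= self.nbackends,
                    timeout=self.timeout)

    def flushed(self):
        """Called by whichever process performed the fsync."""
        with self.cond:
            self.unflushed = 0
```

The key property is in the `notify_all` branch: the last committer wakes the group immediately, instead of everyone sleeping out a fixed commit_delay.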

Here's a spreadsheet with the results of the tests I ran:
http://community.enterprisedb.com/groupcommit-comparison.ods

It contains a graph showing that the patch works very well for this test case. As it is, though, it's not very good for real life. An obvious flaw is that with a longer-running transaction, benefit 1 above goes away. Instead of waiting for NBackends commit records, we should try to guess the number of transactions that are likely to finish within a reasonably short time. I'm thinking of keeping a running average of commits per second, or of the number of transactions that finish while an fsync is taking place.
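One way the running average might look, sketched as an exponential moving average of how many commits each fsync has been absorbing. This is purely a hypothetical illustration, not anything in the patch; the function name and the smoothing factor `alpha` are assumptions:

```python
def update_estimate(prev_estimate, commits_this_fsync, alpha=0.2):
    """Blend the latest observation into the running estimate of how
    many commit records to wait for before the next fsync."""
    return (1 - alpha) * prev_estimate + alpha * commits_this_fsync

estimate = 0.0
for flushed in [4, 1, 4, 1, 5, 5, 5]:  # commits observed per fsync
    estimate = update_estimate(estimate, flushed)
```

The estimate tracks recent behavior, so a long-running transaction drops out of the target group size instead of stalling every fsync until the timeout.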

Any thoughts?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

