Re: [HACKERS] Hot standby, recovery infra

Heikki Linnakangas Thu, 05 Feb 2009 04:20:21 -0800

Simon Riggs wrote:

On Thu, 2009-02-05 at 13:18 +0200, Heikki Linnakangas wrote:
Simon Riggs wrote:
On Thu, 2009-02-05 at 11:46 +0200, Heikki Linnakangas wrote:
Simon Riggs wrote:
So we might end up flushing more often *and* we will be doing it
potentially in the code path of other users.
For example, imagine a database that fits completely in shared buffers. If we update at every XLogFileRead, we have to fsync every 16MB of WAL. If we update in XLogFlush the way I described, you only need to update when we flush a page from the buffer cache, which will only happen at restartpoints. That's far fewer updates.
Oh, did you change the bgwriter so it doesn't do normal page cleaning?
No. Ok, that wasn't completely accurate. The page cleaning by bgwriter will perform XLogFlushes, but that should be pretty insignificant. When there's little page replacement going on, bgwriter will do a small trickle of page cleaning, which won't matter much.
Yes, that case is good, but it wasn't the use case we're trying to speed
up by having the bgwriter active during recovery. We're worried about
I/O bound recoveries.


Ok, let's do the math:

By updating minRecoveryPoint in XLogFileRead, you're fsyncing the control file once every 16MB of WAL.

By updating in XLogFlush, the frequency depends on the amount of shared_buffers available to buffer the modified pages, the average WAL record size, and the cache hit ratio. Let's determine the worst case:

The smallest WAL record that dirties a page is a heap deletion record. That contains just enough information to locate the tuple. If I'm reading the headers right, that record is 48 bytes long (28 bytes of xlog header + 18 bytes of payload + padding). Assuming that the WAL is full of just those records, that there are no full page images, and that the cache hit ratio is 0%, we will need (16 MB / 48 B) * 8 kB = 2730 MB of shared_buffers to reach the mark of one fsync per 16 MB of WAL.

So if you have a lower shared_buffers setting than 2.7 GB, you can have more frequent fsyncs this way in the worst case. If you think of the typical case, you're probably not doing all deletes, and you have a non-zero cache hit ratio, so you achieve the same frequency with a much lower shared_buffers setting. And if you're really that I/O bound, I doubt the few extra fsyncs matter much.
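
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The constants are the ones from the estimate above. The hit_ratio parameter is my own simplification, not something measured: it assumes each record that misses the cache dirties a distinct page and that cache hits dirty no additional pages. The function name is made up for illustration.

```python
# Rough sketch of the worst-case estimate; constants come from the message above.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024   # one WAL segment (16 MB)
RECORD_BYTES = 48                      # smallest page-dirtying record, as estimated above
PAGE_BYTES = 8 * 1024                  # one buffer page (8 kB)


def shared_buffers_needed_mb(hit_ratio=0.0):
    """shared_buffers (in MB) needed to hold the pages dirtied by one WAL segment.

    Assumption (not from the original estimate): a record that misses the
    cache dirties a distinct page, and a cache hit dirties no new page.
    """
    records = WAL_SEGMENT_BYTES / RECORD_BYTES
    distinct_pages = records * (1.0 - hit_ratio)
    return distinct_pages * PAGE_BYTES / (1024 * 1024)


print(int(shared_buffers_needed_mb()))      # worst case, 0% hit ratio: 2730
print(int(shared_buffers_needed_mb(0.9)))   # hypothetical 90% hit ratio: 273
```

Under that simplified model the required shared_buffers shrinks linearly with the hit ratio, which is the reason the typical case needs far less than 2.7 GB.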

Also note that when the control file is updated in XLogFlush, it's typically the bgwriter doing it as it cleans buffers ahead of the clock hand, not the startup process.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
