Re: [HACKERS] Spread checkpoint sync

Greg Smith Sat, 04 Dec 2010 21:57:28 -0800

Greg Stark wrote:

Using sync_file_range you can specify the set of blocks to sync and
then block on them only after some time has passed. But there's no
documentation on how this relates to the I/O scheduler so it's not
clear it would have any effect on the problem.

I believe this is the exact spot we're stalled at in regards to getting
this improved on the Linux side, as I understand it at least. *The*
answer for this class of problem on Linux is to use sync_file_range, and
I don't think we'll ever get any sympathy from those kernel developers
until we do. But that's a Linux specific call, so doing that is going
to add a write path fork with platform-specific code into the database.
If I thought sync_file_range was a silver bullet guaranteed to make this
better, maybe I'd go for that. I think there's some relatively
low-hanging fruit on the database side that would do better before going
to that extreme though, thus the patch.
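
For anyone who hasn't used the call, here is a rough sketch of the
two-phase pattern being discussed: start writeback on a range now, then
block on it later. This is only an illustration, not code from the patch
or from PostgreSQL. The helper names start_range_writeback and
finish_range_writeback are hypothetical, and it assumes Linux with glibc.

```c
#define _GNU_SOURCE             /* exposes sync_file_range() in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to begin writing out the dirty
 * pages in [offset, offset + nbytes) without waiting for completion.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
start_range_writeback(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

/*
 * Hypothetical helper: block until the pages in the range have been
 * written. WAIT_BEFORE waits for writeback already in flight, WRITE
 * submits anything still dirty, WAIT_AFTER waits for that to finish.
 */
int
finish_range_writeback(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);
    return sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}
```

A caller would invoke start_range_writeback() after writing a batch of
blocks, do other work or sleep for a while, and then call
finish_range_writeback() on the same range. Note that sync_file_range
does not write out file metadata, so by itself it is not a durability
guarantee; a sketch like this would still need an fsync() at the end of
the checkpoint.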

We might still have to delay the beginning of the sync to allow the
dirty blocks to be synced naturally, and then when we issue it still
end up catching a lot of other I/O as well.

Whether it's "lots" or not is really workload dependent. I work from
the assumption that the blocks being written out by the checkpoint are
the most popular ones in the database, the ones that accumulate a high
usage count and stay there. If that's true, my guess is that the writes
being done while the checkpoint is executing are a bit less likely to be
touching the same files. You raise a valid concern, I just haven't seen
that actually happen in practice yet.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us


