Re: [HACKERS] checkpoint writeback via sync_file_range

Greg Smith Tue, 10 Jan 2012 20:38:58 -0800

On 1/10/12 9:14 PM, Robert Haas wrote:

> Based on that, I whipped up the attached patch, which,
> if sync_file_range is available, simply iterates through everything
> that will eventually be fsync'd before beginning the write phase and
> tells the Linux kernel to put them all under write-out.

I hadn't really thought of using it that way. The kernel expects that when this is called the normal way, you're going to track exactly which segments you want it to sync. And that data isn't really passed through the fsync absorption code yet; the list of things to fsync has already lost that level of detail.

What you're doing here doesn't care though, and I hadn't considered that SYNC_FILE_RANGE_WRITE could be used that way on my last pass through its docs. Used this way, it's basically fsync without the wait or guarantee; it just tries to push what's already dirty further ahead in the write queue than those writes would otherwise be.
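
For reference, the whole-file form of the call described above can be sketched roughly as follows. This is only a minimal illustration, not code from the attached patch; the helper name hint_writeback is hypothetical. It relies on the documented behavior that an nbytes of 0 means "through the end of the file", and it falls back to an error on platforms without the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the patch): ask the kernel to start
 * writeback of every dirty page of fd without waiting for completion.
 * Returns 0 on success, -1 with errno set on failure.
 *
 * This gives no durability guarantee; a later fsync() is still required.
 */
static int
hint_writeback(int fd)
{
    assert(fd >= 0);
#ifdef SYNC_FILE_RANGE_WRITE
    /* offset 0 with nbytes 0 covers the file from start to end */
    return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
#else
    /* Not Linux, or headers too old: report "not supported" */
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would treat a failure here as harmless, since the real fsync at the end of the checkpoint still does the work that matters.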

One idea I was thinking about here was building a little hash table inside of the fsync absorb code, tracking how many absorb operations have happened for whatever the most popular relation files are. The idea is that we might say "use sync_file_range every time <N> calls for a relation have come in", just to keep from ever accumulating too many writes to any one file before trying to nudge some of it out of there. The bat that keeps hitting me in the head here is that right now, a single fsync might have a full 1GB of writes to flush out, perhaps because it extended a table and then wrote more than that to it. And in everything but an SSD or giant SAN cache situation, 1GB of I/O is just too much to fsync at a time without the OS choking a little on it.
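
To make the counting idea concrete, here is a rough sketch of what such a table might look like. Everything in it is an assumption for illustration: the names (absorb_note, absorb_reset), the fixed table size, the integer file id used as a key, and NUDGE_EVERY standing in for <N>. The real absorb code works with its own request structures and this does not touch them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ABSORB_SLOTS 64     /* hypothetical table size; must be a power of two */
#define NUDGE_EVERY  4      /* hypothetical stand-in for <N> */

typedef struct
{
    uint32_t    file_id;    /* hypothetical key identifying a relation file */
    uint32_t    count;      /* absorbs seen since the last nudge */
    bool        used;
} AbsorbSlot;

static AbsorbSlot absorb_table[ABSORB_SLOTS];

/* Forget all counts, e.g. once a checkpoint has fsync'd everything. */
static void
absorb_reset(void)
{
    memset(absorb_table, 0, sizeof(absorb_table));
}

/*
 * Record one absorbed fsync request for file_id.  Returns true on every
 * NUDGE_EVERY-th request for that file, which is when the caller might
 * issue sync_file_range() on it.
 */
static bool
absorb_note(uint32_t file_id)
{
    uint32_t    start = (file_id * 2654435761u) & (ABSORB_SLOTS - 1);

    assert((ABSORB_SLOTS & (ABSORB_SLOTS - 1)) == 0);

    /* Open addressing with linear probing */
    for (uint32_t i = 0; i < ABSORB_SLOTS; i++)
    {
        AbsorbSlot *slot = &absorb_table[(start + i) & (ABSORB_SLOTS - 1)];

        if (!slot->used)
        {
            slot->used = true;
            slot->file_id = file_id;
            slot->count = 0;
        }
        if (slot->file_id == file_id)
        {
            if (++slot->count >= NUDGE_EVERY)
            {
                slot->count = 0;
                return true;
            }
            return false;
        }
    }
    return false;           /* table full: this file simply goes untracked */
}
```

Letting a full table silently drop new files keeps the sketch bounded in memory, which fits the goal of tracking only the most popular relation files; a real version would need some eviction policy.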

> I don't know that I have a suitable place to test this, and I'm not
> quite sure what a good test setup would look like either, so while
> I've tested that this appears to issue the right kernel calls, I am
> not sure whether it actually fixes the problem case.


I'll put this into my testing queue after the upcoming CF starts.

--
Greg Smith   2ndQuadrant US    [email protected]   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
