ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> I encountered overflow of bgwriter's file-fsync request queue. It occurred
> during checkpoints. In such cases each backend calls fsync itself, in no
> particular order, so the checkpoint takes a long time and performance drops.
> It seems to happen frequently on machines with a lot of memory and
> poor disks.

I can't help thinking that this is a situation that could only be got
into with a seriously misconfigured database --- per the comments for
ForwardFsyncRequest, we really don't want this code to run at all,
let alone run so often that a queue with NBuffers entries overflows.
What exactly are the test conditions under which you're seeing this
happen?

If there actually is a problem that needs to be solved, I think it'd be
better to try to do AbsorbFsyncRequests somewhere in the main checkpoint
loops.  I don't like the idea of holding the BgWriterCommLock long
enough to do a qsort ... especially not if this occurs only with very
large NBuffers settings.  Also, what if the qsort fails to eliminate any
duplicates, or eliminates only a few?  You could get into a scenario
where the qsort gets repeated every few ForwardFsyncRequest calls, in
which case it'd become a drag on performance itself.  (See also recent
discussion with Qingqing about converting BgWriterCommLock to a
spinlock.  Though I was against that because no performance problem had
been shown, it could still become something we want to do ... but
putting a qsort here would foreclose that option.)
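
To make the absorb-in-the-loop idea concrete, here is a toy,
single-threaded sketch. It is not the bgwriter code: RequestQueue,
forward_request, absorb_requests, simulate_checkpoint and QUEUE_SIZE are
all hypothetical stand-ins, there is no locking, and each loop iteration
merely stands in for a request arriving from some backend while the
checkpoint is running. It only shows that draining the queue at intervals
inside the loop keeps a fixed-size queue from overflowing, with no sort
required.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not PostgreSQL's actual definitions. */
#define QUEUE_SIZE 8

typedef struct
{
    int requests[QUEUE_SIZE];   /* pending fsync requests, by segment id */
    int num_requests;           /* number of valid entries */
} RequestQueue;

/*
 * Backend side: append a request.  Returns false when the queue is full,
 * which is the case where the backend would have to fsync for itself.
 */
static bool
forward_request(RequestQueue *q, int segment)
{
    if (q->num_requests >= QUEUE_SIZE)
        return false;
    q->requests[q->num_requests++] = segment;
    return true;
}

/*
 * Checkpoint side: copy the pending requests out and empty the queue.
 * In a real system only this short copy would need the lock.
 */
static int
absorb_requests(RequestQueue *q, int *out)
{
    int n = q->num_requests;

    for (int i = 0; i < n; i++)
        out[i] = q->requests[i];
    q->num_requests = 0;
    return n;
}

/*
 * Simulate a checkpoint loop of nbuffers iterations, one request arriving
 * per iteration.  The queue is absorbed every absorb_interval iterations
 * (0 means never).  Returns how many requests found the queue full.
 */
static int
simulate_checkpoint(int nbuffers, int absorb_interval)
{
    RequestQueue q = {{0}, 0};
    int scratch[QUEUE_SIZE];
    int overflows = 0;

    for (int i = 0; i < nbuffers; i++)
    {
        if (!forward_request(&q, i % 4))
            overflows++;
        if (absorb_interval > 0 && (i + 1) % absorb_interval == 0)
            (void) absorb_requests(&q, scratch);
    }
    return overflows;
}
```

In this model, simulate_checkpoint(32, 0) overflows on every request
after the first eight, while simulate_checkpoint(32, 4) never overflows,
and the time spent in absorb_requests is a plain copy rather than a sort.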

                        regards, tom lane
