Re: [HACKERS] checkpointer continuous flushing

Fabien COELHO Sat, 09 Jan 2016 07:52:14 -0800


Hello Andres,

Hm. New theory: The current flush interface does the flushing inside
FlushBuffer()->smgrwrite()->mdwrite()->FileWrite()->FlushContextSchedule(). The
problem with that is that at that point we (need to) hold a content lock
on the buffer!

You are worried that FlushBuffer is holding a lock on a buffer at the moment the "sync_file_range" call is issued.

Although I agree that it is not that good, I would be surprised if that was the explanation for a performance regression, because sync_file_range with the chosen parameters is an async call: it "advises" the OS to send the file data to disk, but it does not wait for the write to be completed.
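
To make the "advisory" nature concrete, here is a minimal sketch of such a call. It assumes the chosen parameter is SYNC_FILE_RANGE_WRITE alone (the flags are not spelled out above), and hint_writeback is a hypothetical helper name, not a function from the patch. This is Linux-only.

```c
#define _GNU_SOURCE             /* needed for sync_file_range() */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to start writing back the dirty
 * pages in [offset, offset + nbytes) of fd.
 *
 * Assumption: only SYNC_FILE_RANGE_WRITE is passed, so the call starts
 * write-out but does not wait for it to finish (no WAIT_BEFORE/WAIT_AFTER).
 *
 * Returns 0 on success, -1 on error with errno set.
 */
int
hint_writeback(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}
```

A caller would typically invoke hint_writeback(fd, offset, BLCKSZ) right after writing a block. Note that the sync_file_range(2) man page warns that even SYNC_FILE_RANGE_WRITE may block if more is submitted than the request queue can take, so "async" holds only while the device queue has room.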

Moreover, for this issue to have a significant impact, it would require that another backend just happens to need this very buffer. ISTM that the performance regression you are arguing about is on random-IO-bound performance, that is a few 100 tps in the best case, for very large bases and hence a lot of buffers, so the probability of such a collision is very small and it would not explain a significant regression.
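
As a rough back-of-the-envelope check of this argument, the sketch below estimates the chance that at least one of a number of concurrent buffer accesses lands on the one buffer being flushed. It assumes accesses are independent and uniformly spread over all buffers, which is a crude model; the function name and any figures plugged into it are illustrative assumptions, not measurements.

```c
#include <assert.h>

/*
 * Hypothetical estimate: probability that at least one of `accesses`
 * independent, uniformly distributed buffer accesses hits one specific
 * buffer out of `nbuffers`.
 *
 *     p = 1 - (1 - 1/nbuffers)^accesses
 */
double
collision_probability(double nbuffers, int accesses)
{
    double      miss = 1.0;     /* probability that every access misses */
    int         i;

    assert(nbuffers >= 1.0 && accesses >= 0);

    for (i = 0; i < accesses; i++)
        miss *= 1.0 - 1.0 / nbuffers;

    return 1.0 - miss;
}
```

For instance, with an assumed 1048576 buffers (8 GB of 8 kB pages) and 100 accesses during the window in which the lock is held, the estimate stays below one in a thousand. Skewed access patterns would raise it, so this only illustrates the order of magnitude.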

Especially on a system that's bottlenecked on IO that means we'll
frequently hold content locks for a noticeable amount of time, while
flushing blocks, without any need to.

I'm not that sure it is really noticeable, because sync_file_range does not wait for completion.

Even if that's not the reason for the slowdowns I observed, I think this
fact gives further credence to the current "pending flushes" tracking
residing on the wrong level.

ISTM that I put the tracking at the level where the information is available without having to recompute it several times, as the flush needs to know the fd and offset. Doing it differently would mean more code and translating buffer to file/offset several times, I think.

Also, maybe you could answer a question I had about the performance regression you observed. I could not find the post where you gave the detailed information about it, so that I could try reproducing it: what are the exact settings and conditions (shared_buffers, pgbench scaling, host memory, ...), what is the observed regression (tps? other?), and what is the responsiveness of the database under the regression (e.g. % of seconds with 0 tps, or something like that).


--
Fabien.


