Hello Andres,
I thought of adding a pointer to the current flush structure at the vfd
level, so that on closing a file with a flush in progress the flush can be
completed and the structure properly cleaned up; the checkpointer would
then see a clean state later and be able to skip it, instead of generating
flushes on a closed file or on a different file...
Maybe I'm missing something, but that is the plan I had in mind.
That might work, although it'd not be pretty (not fatally so
though).
Alas, any solution has to communicate somehow between the API levels, so
it cannot be "pretty", although we should avoid the worst.
But I'm inclined to go a different way: I think it's a mistake to do
flushing based on a single file. It seems better to track a fixed number
of outstanding 'block flushes', independent of the file. Whenever the
number of outstanding blocks is exceeded, sort that list, and flush all
outstanding flush requests after merging neighbouring flushes.
Hmmm. I'm not sure I understand your strategy.
I do not think that flushing without a prior sorting would be effective,
because there is no clear reason why buffers written together would then
be next to each other and thus give sequential write benefits; we would
just get flushed random IO. I tested that and it worked badly.
One of the points of aggregating flushes is that the range flush call cost
is significant, as shown by preliminary tests I did, probably up in the
thread, so it makes sense to limit this cost, hence the aggregation. This
removed some performance regressions I saw in some cases.
Also, the granularity of the buffer flush call is a file + offset + size,
so it necessarily has to be done this way (i.e. per file).
Once buffers are sorted per file and by offset within each file, written
buffers are as close as possible to one another, the merging is very easy
to compute (it is done on the fly, with no need to keep a list of buffers,
for instance), and it is optimally effective. When the checkpointed file
changes we will never go back to it before the next checkpoint, so there
is no reason not to flush right then.
So basically I do not see a clear advantage to your suggestion, especially
when taking the checkpointer's scheduling into consideration:
In effect the checkpointer already works in little bursts of activity
between sleep phases, writing buffers a few at a time, so it may already
behave more or less as you expect, but not for the same reason.
The closest strategy I experimented with, which is maybe close to your
suggestion, was to enforce a minimum number of buffers to write when
awoken and to vary the sleep delay in between, but I had no clear way to
choose values, and the experiments I did showed no significant performance
impact from varying these parameters, so I left that out. If you find a
magic number of buffers that yields consistently better performance, fine
with me, but that is independent of whether aggregation happens before or
after.
Imo that means that we'd better track writes on a relfilenode + block
number level.
I do not think that it is a better option. Moreover, the current approach
has proven very effective over hundreds of runs, so redoing it differently
for its own sake does not look like good resource allocation.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers