Re: [HACKERS] Spread checkpoint sync

Greg Smith Mon, 31 Jan 2011 13:34:04 -0800

Tom Lane wrote:

Robert Haas <robertmh...@gmail.com> writes:

3. Pause for 3 seconds after every fsync.

I think something along the lines of #3 is probably a good idea,


Really?  Any particular delay is guaranteed wrong.

'3 seconds' is just a placeholder for whatever comes out of a "total time scheduled to sync / relations to sync" computation. (Still doing all my thinking in terms of time, although I recognize a showdown with segment-based checkpoints is coming too.)
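To make that computation concrete, here is a minimal sketch in Python rather than backend C. The function and parameter names are hypothetical, chosen only for illustration; nothing here is taken from an actual patch.

```python
def sync_pause_seconds(sync_time_budget_s: float, relations_to_sync: int) -> float:
    """Pause to take after each fsync so the syncs spread over the budget.

    Both arguments are illustrative: the budget is the total time scheduled
    for the sync phase, and the count is the number of relations that need
    an fsync during this checkpoint.
    """
    if relations_to_sync <= 0:
        # Nothing to sync, so there is nothing to spread out.
        return 0.0
    return sync_time_budget_s / relations_to_sync


if __name__ == "__main__":
    # 90 seconds of sync budget over 30 relations gives the "3 seconds" case.
    print(sync_pause_seconds(90.0, 30))
```

The point is only that the pause is derived from the two inputs on every checkpoint rather than being a fixed setting.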

I think the right way to compute "relations to sync" is to finish the sorted writes patch, which I've already sent over a not-quite-right-yet update to and which is my next thing to work on here. I remain pessimistic that any attempt to issue fsync calls without the maximum possible delay after asking the kernel to write things out first will work out well. My recent tests with low values of dirty_bytes on Linux just reinforce how bad that can turn out. In addition to computing the relation count while sorting them, placing writes in order by relation and then doing all writes followed by all syncs should place the database right in the middle of the throughput/latency trade-off here. It will have had the maximum amount of time we can give it to sort and flush writes for any given relation before it is asked to sync it. I don't want to try to be any smarter than that without trying to be a *lot* smarter--timing individual sync calls, feedback loops on time estimation, etc.
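The ordering described above can be sketched roughly as follows. This is a Python illustration, not the sorted writes patch itself: the (relation, block) tuple and the write/sync callbacks are hypothetical stand-ins for the real buffer and file-sync machinery.

```python
from itertools import groupby
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for a dirty buffer: (relation name, block number).
Buffer = Tuple[str, int]


def plan_checkpoint(dirty: Iterable[Buffer]) -> Tuple[List[Buffer], List[str]]:
    """Sort dirty buffers by relation and block; collect the distinct relations."""
    ordered = sorted(dirty)
    # groupby yields each relation once because the input is already sorted.
    relations = [rel for rel, _ in groupby(ordered, key=lambda buf: buf[0])]
    return ordered, relations


def run_checkpoint(
    dirty: Iterable[Buffer],
    write: Callable[[str, int], None],
    sync: Callable[[str], None],
) -> int:
    """Issue every write in sorted order, then every sync; return relation count."""
    ordered, relations = plan_checkpoint(dirty)
    for rel, block in ordered:
        write(rel, block)  # hand the block to the kernel, no fsync yet
    for rel in relations:
        sync(rel)  # each relation's writes have had as long as possible to flush
    return len(relations)
```

The relation count falls out of the sort for free, and no sync is issued until the last write has been handed to the kernel.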

At this point I have to agree with Robert's observation that splitting checkpoints into checkpoint_write_target and checkpoint_sync_target is the only reasonable thing left that might be possible to complete in a short period. So that's how this can compute the total time numerator here.
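As a rough sketch of where that numerator would come from, assume (and this is only an assumption) that checkpoint_sync_target would be expressed as a fraction of the checkpoint interval, the way checkpoint_completion_target is. The function name is hypothetical.

```python
def sync_phase_budget_seconds(checkpoint_interval_s: float,
                              checkpoint_sync_target: float) -> float:
    """Time scheduled for the sync phase, i.e. the "total time" numerator.

    Assumes checkpoint_sync_target is a fraction between 0 and 1 of the
    interval between checkpoints; the real parameter, if it ever exists,
    may be defined differently.
    """
    if not 0.0 <= checkpoint_sync_target <= 1.0:
        raise ValueError("checkpoint_sync_target must be between 0 and 1")
    return checkpoint_interval_s * checkpoint_sync_target


if __name__ == "__main__":
    # A 300 second interval with a sync target of 0.25 leaves 75 seconds.
    print(sync_phase_budget_seconds(300.0, 0.25))
```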

The main thing I will warn about in relation to today's discussion is the danger of true deadline-oriented scheduling in this area. The checkpoint process may discover the sync phase is falling behind expectations because the individual sync calls are taking longer than expected. If that happens, aiming for the "finish on target anyway" goal puts you right back to a guaranteed nasty write spike again. I think many people would prefer logging the overrun as tuning feedback for the DBA rather than accelerating, which is likely to make the problem even worse if the checkpoint is falling behind. But since ultimately the feedback for this will be "make the checkpoints longer or increase checkpoint_sync_target", sync acceleration to meet the deadline isn't unacceptable; the DBA can try both of those themselves if seeing spikes.
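The "log the overrun instead of accelerating" behavior might look something like this sketch. It is Python with hypothetical names throughout; the clock, sleep, and log callbacks are injected only so the loop can be exercised without real waiting.

```python
from typing import Callable, Sequence


def sync_with_fixed_pause(
    relations: Sequence[str],
    sync: Callable[[str], None],
    pause_s: float,
    budget_s: float,
    clock: Callable[[], float],
    sleep: Callable[[float], None],
    log: Callable[[str], None],
) -> float:
    """Sync each relation with an unchanging pause; report any overrun.

    The pause is never shortened when sync calls run slow, so a late
    checkpoint does not turn into a burst of back-to-back fsyncs.
    Returns the overrun in seconds (0.0 if the phase finished on time).
    """
    start = clock()
    for rel in relations:
        sync(rel)
        sleep(pause_s)  # fixed pause, even if we are already behind
    overrun = (clock() - start) - budget_s
    if overrun > 0:
        log(
            f"checkpoint sync phase overran its target by {overrun:.1f} s; "
            "consider longer checkpoints or a larger checkpoint_sync_target"
        )
        return overrun
    return 0.0
```

A deadline-oriented variant would instead shrink the pause as the remaining budget runs out, which is exactly the acceleration being warned about.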


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
