On Sat, Jul 20, 2013 at 6:28 PM, Greg Smith <g...@2ndquadrant.com> wrote:
> On 7/20/13 4:48 AM, didier wrote:
>
>> With your tests, did you try to write the hot buffers first? i.e.
>> buffers with a high refcount, either by sorting them on refcount or at
>> least by sweeping the buffer list in reverse?
>
> I never tried that version. After a few rounds of seeing that all the
> changes I tried were just rearranging the good and bad cases, I got
> pretty bored with trying new changes in that same style.
>
>> By writing the buffers least likely to be recycled to the OS first, it
>> may have less work to do at fsync time; hopefully they have been
>> written out by the OS background task during the spread and are not
>> re-dirtied by other backends.
>
> That is the theory. In practice, write caches are so large now that
> there is almost no pressure forcing writes to happen until the fsync
> calls show up. It's easily possible to enter the checkpoint fsync phase
> only to discover there are 4GB of dirty writes ahead of you, ones that
> have nothing to do with the checkpoint's I/O.
>
> Backends are constantly pounding the write cache with new writes in
> situations with checkpoint spikes. The writes and fsync calls made by
> the checkpoint process are only a fraction of the real I/O going on.
> The volume of data squeezed out by each fsync call is based on the
> total writes to that relation since the checkpoint. That's connected to
> the writes the checkpoint itself makes to that relation, but the
> checkpoint's writes can easily be the minority there.
>
> It is not a coincidence that the next feature I'm working on attempts
> to quantify the total writes to each 1GB relation chunk. That's the
> most promising path forward on the checkpoint problem I've found.
>
> --
> Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com
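For what it's worth, didier's sorting idea is cheap to prototype. Here is a
minimal sketch of ordering the checkpoint's dirty-buffer list so the
hottest buffers are handed to the OS first. CkptBufferDesc and
ckpt_buf_cmp are illustrative stand-ins, not the real BufferDesc or
checkpoint code in bufmgr.c:

    #include <stdlib.h>

    /* Illustrative stand-in for a buffer descriptor; the real
     * BufferDesc (src/include/storage/buf_internals.h) carries
     * much more state. */
    typedef struct
    {
        int     buf_id;
        int     usage_count;    /* clock-sweep popularity counter */
    } CkptBufferDesc;

    /* qsort comparator: higher usage_count first, so the hottest
     * buffers reach the OS earliest and get the longest window to be
     * written out in the background before the fsync phase starts. */
    static int
    ckpt_buf_cmp(const void *a, const void *b)
    {
        const CkptBufferDesc *ba = a;
        const CkptBufferDesc *bb = b;

        return bb->usage_count - ba->usage_count;
    }

    /* Before the checkpoint write loop:
     *     qsort(dirty_bufs, n_dirty,
     *           sizeof(CkptBufferDesc), ckpt_buf_cmp);
     */

Sweeping the buffer list in reverse, as didier also suggested, would get a
rough approximation of the same ordering without paying for the sort.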
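Greg's point about entering the fsync phase behind gigabytes of unrelated
dirty data is easy to observe on Linux: the kernel exposes the dirty-page
backlog in /proc/meminfo. A small standalone program to watch it during
the spread phase (nothing PostgreSQL-specific here):

    #include <stdio.h>
    #include <string.h>

    /* Print the kernel's dirty-page backlog, i.e. the writes an
     * fsync may end up waiting behind. Linux-specific. */
    int
    main(void)
    {
        char    line[128];
        FILE   *f = fopen("/proc/meminfo", "r");

        if (f == NULL)
            return 1;
        while (fgets(line, sizeof(line), f) != NULL)
        {
            if (strncmp(line, "Dirty:", 6) == 0 ||
                strncmp(line, "Writeback:", 10) == 0)
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }

Run in a loop alongside a checkpoint, the Dirty: figure shows how much
unrelated data has piled up in the cache before the first fsync lands.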
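As for quantifying writes per 1GB relation chunk: with the default 8KB
block size a 1GB segment is RELSEG_SIZE = 131072 blocks, so the
accounting reduces to a counter keyed by (relfilenode, segment number).
A hypothetical sketch only, since the actual patch isn't shown here; it
uses a toy open-addressed table (assumed never full) where a real patch
would use PostgreSQL's dynahash:

    #include <stdint.h>

    #define BLOCKS_PER_SEG  131072   /* RELSEG_SIZE at 8KB BLCKSZ */
    #define STATS_SLOTS     4096     /* illustrative table size */

    typedef struct
    {
        uint32_t relfilenode;        /* which relation */
        uint32_t segno;              /* which 1GB chunk of it */
        uint64_t writes_since_ckpt;  /* writes since last checkpoint */
    } SegWriteStats;

    static SegWriteStats stats[STATS_SLOTS];

    /* Bump the write counter for the 1GB segment a block falls
     * into; mapping block to segment is integer division. */
    static void
    count_segment_write(uint32_t relfilenode, uint32_t blocknum)
    {
        uint32_t segno = blocknum / BLOCKS_PER_SEG;
        uint32_t slot = (relfilenode * 2654435761u + segno)
                        % STATS_SLOTS;

        while (stats[slot].writes_since_ckpt != 0 &&
               (stats[slot].relfilenode != relfilenode ||
                stats[slot].segno != segno))
            slot = (slot + 1) % STATS_SLOTS;

        stats[slot].relfilenode = relfilenode;
        stats[slot].segno = segno;
        stats[slot].writes_since_ckpt++;
    }

Zeroing the table at each checkpoint's end would give exactly the
"total writes to each 1GB relation chunk since the checkpoint" figure
the fsync stall size depends on.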