Hello Andres,

>> - Move fsync as early as possible, suggested by Andres Freund?

>> My opinion is that this should be left out for the nonce.

"for the nonce" - what does that mean?

 Nonce \Nonce\ (n[o^]ns), n. [For the nonce, OE. for the nones, ...
     {for the nonce}, i. e. for the present time.

> I'm doubtful that it's a good idea to separate this out, if you did.

Actually I did: as explained in another mail, the fsync time reported in the logs is essentially null when the other options are activated, so moving fsync earlier would not bring significant improvements on these runs. Also, the patch changes enough things as it is.

So this is an evidence-based decision.

I also agree that it seems interesting in principle and should be beneficial in some cases, but I would rather keep it on a TODO list, together with trying to do better things in the bgwriter, and focus on the current proposal, which already changes the checkpointer throttling logic significantly.

>> - as version 2: checkpoint buffer sorting based on a 2007 patch by
>>   Takahiro Itagaki but with a smaller and static buffer allocated once.
>>   Also, sorting is done by chunks of 131072 pages in the current version,
>>   with a GUC to change this value.

> I think it's a really bad idea to do this in chunks.

The small problem I see is that for a very large setting there could be several seconds or even minutes of sorting, which may or may not be desirable, so having some control on that seems a good idea.

Another argument is that Tom said he wanted that:-)

In practice the value can be set high enough that the sort is nearly always done in one go. Maybe the value "0" could be made special, used to trigger this behavior systematically, and be the default.
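
To make the intent concrete, here is a rough stand-alone sketch of the chunked sort logic. The names, the simplified buffer tag and the chunk_size parameter (standing in for the GUC) are illustrative only, not the actual patch code:

  #include <stdlib.h>

  typedef struct BufTag          /* simplified stand-in for a buffer tag */
  {
      unsigned int relfilenode;  /* which relation the page belongs to */
      unsigned int blocknum;     /* block number within that relation */
  } BufTag;

  static int
  buftag_cmp(const void *a, const void *b)
  {
      const BufTag *ta = a, *tb = b;

      if (ta->relfilenode != tb->relfilenode)
          return ta->relfilenode < tb->relfilenode ? -1 : 1;
      if (ta->blocknum != tb->blocknum)
          return ta->blocknum < tb->blocknum ? -1 : 1;
      return 0;
  }

  /* Sort the to-be-written buffers chunk by chunk; chunk_size == 0
   * (the proposed default) means "sort everything in one go". */
  static void
  sort_checkpoint_buffers(BufTag *tags, size_t n, size_t chunk_size)
  {
      size_t start;

      if (chunk_size == 0)
          chunk_size = n;

      for (start = 0; start < n; start += chunk_size)
      {
          size_t len = (n - start < chunk_size) ? n - start : chunk_size;

          qsort(tags + start, len, sizeof(BufTag), buftag_cmp);
      }
  }

With the chunk size set to the current 131072, pages of the same relation are written in increasing block order within each chunk; with 0 the whole array is sorted at once.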

> That'll mean we'll frequently uselessly cause repetitive random IO,

This is not an issue if the chunks are large enough, and in any case the GUC allows the behavior to be changed as desired. As I said, keeping some control seems a good idea, and the "full sorting" can be made the default behavior.

> often interleaved. That pattern is horrible for SSDs too. We should always try to do this at once, and only fall back to using less memory if we couldn't allocate everything.

The memory is needed anyway, to avoid duplicating or significantly complicating the implementation of the throttling loop. It is allocated once, on the first checkpoint; the allocation could be moved to the checkpointer initialization if this is a concern. The memory needed is one int per buffer, which is less than in the 2007 patch.
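
For illustration, the kind of allocation I mean, with illustrative names only (the real code lives in the checkpointer and would use the backend's memory management rather than plain malloc):

  #include <stdlib.h>

  /* One int per shared buffer, allocated once (here lazily, on the first
   * checkpoint) and then reused for every subsequent checkpoint. */
  static int *CkptBufferIds = NULL;

  static void
  checkpoint_sort_setup(int nbuffers)
  {
      if (CkptBufferIds != NULL)
          return;             /* already allocated by a previous checkpoint */

      CkptBufferIds = malloc((size_t) nbuffers * sizeof(int));
      if (CkptBufferIds == NULL)
      {
          /* the real code would report an out-of-memory error instead */
          abort();
      }
      /* this could just as well be done once in checkpointer initialization */
  }

For the 1GB of shared_buffers used in the "tiny" runs, that is 131072 buffers, i.e. about 512 KB with 4-byte ints.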

>> . tiny: scale=10 shared_buffers=1GB checkpoint_timeout=30s time=6400s

> It'd be interesting to see numbers for tiny, without the overly small
> checkpoint timeout value. 30s is below the OS's writeback time.

The point of tiny was to trigger a lot of checkpoints. The size is pretty ridiculous anyway, as "tiny" implies. I think I did some tests with other versions of the patch and a longer checkpoint_timeout on a pretty small database, and they showed a smaller benefit from the options, as one would expect. I'll try to re-run some.

> So you've not run things at more serious concurrency, that'd be
> interesting to see.

I do not have a box available for "serious concurrency".

> I'd also like to see concurrent workloads with synchronous_commit=off -
> I've seen absolutely horrible latency behaviour for that, and I'm hoping
> this will help. It's also a good way to simulate faster hardware than
> you have.

> It's also curious that sorting is detrimental for full speed 'tiny'.

Yep.

>> With SSD, both options would probably have limited benefit.

> I doubt that. Small random writes have bad consequences for wear
> leveling. You might not notice that with short tests - again, I doubt
> it - but it'll definitely become visible over time.

Possibly. Testing such effects does not seem easy, though. At least I have not seen "write stalls" on SSD, which is my primary concern.

--
Fabien.


