Hi Noah,

On 27 January 2013 02:31, Noah Misch <n...@leadboat.com> wrote:
> I did a few more benchmarks along the spectrum.
> So that's a nice 27-53% improvement, fairly similar to the pattern for your
> laptop pgbench numbers.

I presume that this applies to a tpc-b benchmark (the pgbench default). Note that the really compelling numbers that I reported in that blog post (where there is an increase of over 80% in transaction throughput at lower client counts) occur with an insert-based benchmark (i.e. a maximally commit-bound workload).

> Next, based on your comment about the possible value
> for cloud-hosted applications
>
> -clients-  -tps@commit_delay=0-  -tps@commit_delay=500-
> 32         1224,1391,1584        1175,1229,1394
> 64         1553,1647,1673        1544,1546,1632
> 128        1717,1833,1900        1621,1720,1951
> 256        1664,1717,1918        1734,1832,1918
>
> The numbers are all over the place, but there's more loss than gain.

I suspected that the latency of cloud storage might be relatively poor. Since that is evidently not actually the case with Amazon EBS, it makes sense that commit_delay isn't compelling there. I am not disputing that Amazon EBS should be considered representative of such systems in general - I'm sure that it should be.

> There was no appreciable
> performance advantage from setting commit_delay=0 as opposed to relying on
> commit_siblings to suppress the delay.

That's good news. Thank you for doing that research; I had satisfied myself that the fast path in MinimumActiveBackends() works well, but it's useful to have my findings verified.

> On the GNU/Linux VM, pg_sleep() achieves precision on the order of 10us.
> However, the sleep was consistently around 70us longer than requested. A
> 300us request yielded a 370us sleep, and a 3000us request gave a 3080us sleep.
> Mac OS X was similarly precise for short sleeps, but it could oversleep a full
> 1000us on a 35000us sleep.

Ugh.

> The beginning of this paragraph still says "commit_delay causes a delay just
> before a synchronous commit attempts to flush WAL to disk". Since it now
> applies to every WAL flush, that should be updated.

Agreed.
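Incidentally, the kind of oversleeping you measured is easy to reproduce outside Postgres. A quick sketch of the measurement (my own throwaway script, not part of the patch - the figures will of course vary with kernel timer resolution and load):

```python
import time

def median_oversleep_us(request_us, trials=50):
    """Measure how much longer a sleep runs than requested, in microseconds.

    Analogous to the pg_sleep() observations quoted above: on Linux the
    overshoot is typically tens of microseconds, but results depend on
    the kernel's timer resolution and on system load.
    """
    overshoots = []
    for _ in range(trials):
        start = time.perf_counter()
        time.sleep(request_us / 1_000_000)
        elapsed_us = (time.perf_counter() - start) * 1_000_000
        overshoots.append(elapsed_us - request_us)
    overshoots.sort()
    return overshoots[len(overshoots) // 2]  # median overshoot

for req in (300, 3000):
    print(f"requested {req}us, median overshoot {median_oversleep_us(req):+.0f}us")
```

Something along these lines could eventually inform the "detect limited sleep granularity" idea, though pg_test_fsync-style tooling would be the natural home for it.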
> There's a similar problem at the beginning of this paragraph; it says
> specifically, "The commit_delay parameter defines for how many microseconds
> the server process will sleep after writing a commit record to the log with
> LogInsert but before performing a LogFlush."

Right.

> As a side note, if we're ever going to recommend a fire-and-forget method for
> setting commit_delay, it may be worth detecting whether the host sleep
> granularity is limited like this. Setting commit_delay = 20 for your SSD and
> silently getting commit_delay = 10000 would make for an unpleasant surprise.

Yes, it would. A note on possible oversleeping has been added.

>> ! <para>
>> !  Since the purpose of <varname>commit_delay</varname> is to allow
>> !  the cost of each flush operation to be more effectively amortized
>> !  across concurrently committing transactions (potentially at the
>> !  expense of transaction latency), it is necessary to quantify that
>> !  cost when altering the setting. The higher that cost is, the more
>> !  effective <varname>commit_delay</varname> is expected to be in
>> !  increasing transaction throughput. The
>
> That's true for spinning disks, but I suspect it does not hold for storage
> with internal parallelism, notably virtualized storage. Consider an iSCSI
> configuration with high bandwidth and high latency. When network latency is
> the limiting factor, will sending larger requests less often still help?

Well, I don't like to speculate about things like that, because it's just too easy to be wrong. That said, it doesn't immediately occur to me why the statement that you've highlighted wouldn't be true of virtualised storage with the characteristics you describe. Any kind of latency at flush time means that clients sit idle, so the CPU may not be kept fully busy for a greater amount of wall time than it otherwise would be.
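To put the amortization argument in numbers, here's a back-of-the-envelope model (a toy of my own devising, not PostgreSQL's actual group-commit logic): assume WAL flushes serialize on one device, each flush costs a fixed amount, and a delay lets several commits share a single flush.

```python
def max_tps(flush_ms, group_size, delay_ms=0.0):
    """Toy upper bound on commits/sec when WAL flushes serialize on a
    single device: each flush costs flush_ms, and group_size commits
    share one flush after waiting delay_ms.  The higher flush_ms is,
    the more a given group size buys - the point of the quoted doc
    paragraph."""
    return 1000.0 * group_size / (flush_ms + delay_ms)

# 20ms flush, e.g. a 7200rpm disk with no write cache:
print(max_tps(20.0, 1))        # one commit per flush: 50 tps
print(max_tps(20.0, 8, 3.0))   # eight commits share a flush: ~348 tps

# 0.05ms flush, e.g. behind a battery-backed write cache:
print(max_tps(0.05, 8, 3.0))   # the 3ms delay now dominates and hurts
```

The same arithmetic also shows why the gains evaporate on low-latency storage, which matches your EBS numbers.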
> One would be foolish to run a performance-sensitive workload like those in
> question, including the choice to have synchronous_commit=on, on spinning
> disks with no battery-backed write cache. A cloud environment is more
> credible, but my benchmark showed no gain there.

In an everyday sense you are correct. It would typically be fairly senseless to run an application that was severely limited by transaction throughput like this, when a battery-backed cache could be had for a couple of hundred dollars. However, it's quite possible to imagine a scenario in which the economics favoured using commit_delay instead. For example, I am aware that at Facebook, a similar Facebook-flavoured-MySQL setting (sync_binlog_timeout_usecs) is used.

Furthermore, it might not be obvious that fsync speed is an issue in practice. Setting commit_delay to 4,000 has seemingly no downside on my laptop - it *positively* affects both average and worst-case transaction latency - so with spinning disks, it probably would actually be sensible to set it and forget it, regardless of workload.

When Robert committed this feature, he added an additional check when WALWriteLock is acquired. That check could see the lock acquired in a way that turned out to be needless, but it also prevented a flush that was technically needless from the group commit leader/lock holder backend's own selfish perspective. I never got around to satisfying myself that that change helped more than it hurt, if in fact it had any measurable impact either way. Perhaps I should. The benchmark that appears on my blog was actually produced with the slightly different, original version.

> Overall, I still won't
> personally recommend changing commit_delay without measuring the performance
> change for one's particular workload and storage environment. commit_delay
> can now bring some impressive gains in the right situations, but I doubt those
> are common enough for a fire-and-forget setting to do more good than harm.
I agree.

> I suggest having the documentation recommend half of the fsync time as a
> starting point for benchmarking different commit_delay settings against your
> own workload. Indicate that it's more likely to help for direct use of
> spinning disks than for BBWC/solid state/virtualized storage. Not sure what
> else can be credibly given as general advice for PostgreSQL DBAs.

That all seems reasonable. The really important thing is that we don't state that we haven't a clue what helps - that inspires no confidence, could turn someone off what would be a really useful feature for them, and just isn't accurate. I also think it's important that we don't say "setting commit_delay can only help when there are many concurrently committing transactions", because roughly the opposite is true. With many connections, there are already enough committing transactions to effectively amortize the cost of a flush, and commit_delay is then only very slightly helpful. Lower client counts are where commit_delay actually helps (at least with slow fsync times, which are the compelling case).

I attach a revision that I think addresses your concerns. I've polished it a bit further too - in particular, my elaborations about commit_delay have been concentrated at the end of wal.sgml, where they belong. I've also removed the reference to XLogInsert, because, since all XLogFlush call sites are now covered by commit_delay, XLogInsert isn't particularly relevant. I have also increased the default time that pg_test_fsync runs - I think that the kind of variability commonly seen in its output, which you yourself have reported, justifies doing so in passing.

--
Regards,
Peter Geoghegan
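P.S. For what it's worth, the "half of the fsync time" starting point is easy to compute mechanically from pg_test_fsync output. A hypothetical helper (function name and sample figures are mine, not anything in the patch):

```python
def suggested_commit_delay_us(fsync_ops_per_sec):
    """Half the measured fsync time, in commit_delay's microsecond units.

    fsync_ops_per_sec is the ops/sec figure that pg_test_fsync reports
    for your chosen wal_sync_method.  This is only a starting point for
    benchmarking one's own workload, not a fire-and-forget value.
    """
    fsync_us = 1_000_000 / fsync_ops_per_sec
    return round(fsync_us / 2)

print(suggested_commit_delay_us(120))    # spinning disk: ~4167us, worth trying
print(suggested_commit_delay_us(8000))   # BBWC/SSD: ~62us, likely below sleep resolution
```

The second case illustrates your sleep-granularity concern: on fast storage the suggested value can fall below what the host can actually sleep for.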
commit_delay_doc.2013_01_28.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers