On 10/25/2016 06:10 AM, Amit Kapila wrote:
On Mon, Oct 24, 2016 at 2:48 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:
On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:

In the results you posted on 10/12, you mentioned a regression with 32
clients, where you got 52k tps on master but only 48k tps with the patch
(roughly an 8% difference). I have no idea what scale was used for those tests,

That test was with scale factor 300 on a 4-socket POWER machine. I
think I need to repeat this test with multiple readings to confirm
whether it was a regression or just run-to-run variation. I will do
that soon and post the results.

As promised, I have rerun my test (3 times), and I did not see any regression.


Thanks Tomas and Dilip for doing detailed performance tests for this
patch.  I would like to summarise the performance testing results.

1. With an update-intensive workload, we are seeing gains of 23% to
192% at client counts >= 64 with the group_update patch [1].
2. With the tpc-b pgbench workload (at scale factor 1000), we are
seeing gains of 12% to ~70% at client counts >= 64 [2].  Tests were
done on an 8-socket Intel machine.
3. With pgbench workloads (both simple-update and tpc-b at scale
factor 300), we are seeing gains of 10% to >50% at client counts
>= 64 [3].  Tests were done on an 8-socket Intel machine.
4. To see why the patch only helps at higher client counts, we have
done wait event testing for various workloads [4], [5] (see the
sampling query below this list), and the results indicate that at
lower client counts the waits are mostly on transactionid or
ClientRead.  At client counts where contention on CLOGControlLock is
significant, this patch helps a lot to reduce that contention.  These
tests were done on an 8-socket Intel machine and a 4-socket POWER
machine.
5. With a pgbench workload on unlogged tables, we are seeing gains of
15% to >300% at client counts >= 72 [6].
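
(For reference, wait events are exposed in 9.6's pg_stat_activity, so the sampling can be as simple as polling a query like the one below in a loop -- an illustrative query, not necessarily the exact harness used for [4], [5]:)

  -- One sample of what every active backend is currently waiting on;
  -- run repeatedly (e.g. once per second) and aggregate the samples.
  -- Contention on CLogControlLock shows up as wait_event_type = 'LWLockNamed'.
  SELECT wait_event_type, wait_event, count(*) AS backends
    FROM pg_stat_activity
   WHERE state = 'active' AND wait_event IS NOT NULL
   GROUP BY wait_event_type, wait_event
   ORDER BY backends DESC;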


It's not entirely clear which of the above tests were done on unlogged tables, and I don't see that in the referenced e-mails. That would be an interesting thing to mention in the summary, I think.

Many more tests have been done for the proposed patches, with gains
either along similar lines as above or neutral.  We do see regressions
in some cases.

1. When the data doesn't fit in shared buffers, there is a regression
at some client counts [7], but on analysis it has been found that it
is mainly due to the contention shifting from CLOGControlLock to
WALWriteLock and/or other locks.

The question is why shifting the lock contention to WALWriteLock should cause such a significant performance drop, particularly when the test was done on unlogged tables. Or, if that is indeed the cause, why that makes the performance drop less problematic / acceptable.

FWIW I plan to run the same test with logged tables - if it shows a similar regression, I'll be much more worried, because that's a fairly typical scenario (logged tables, data set > shared buffers), and we surely can't just go and break that.

2. We do see in some cases that the granular_locking and
no_content_lock patches have shown a significant increase in
contention on CLOGControlLock.  I have already shared my analysis of
the same upthread [8].

I do agree that in some cases this significantly reduces contention on CLogControlLock. I do however think that currently the performance gains are limited almost exclusively to unlogged tables, and some logged+async cases.

On logged tables it usually looks like this (i.e. a modest increase for high client counts at the expense of significantly higher variability):

  http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64

or like this (i.e. only a partial recovery from the drop above 36 clients):

  http://tvondra.bitbucket.org/#pgbench-3000-logged-async-skip-64

And of course, there are cases like this:

  http://tvondra.bitbucket.org/#dilip-300-logged-async

I'd really like to understand why the patched results behave that differently depending on client count.

>
> Attached is the latest group update clog patch.
>

How is that different from the previous versions?

>
In the last commit fest, the patch was returned with feedback to
evaluate the cases where it can show a win, and I think the above
results indicate that the patch has a significant benefit on various
workloads.  What I think is pending at this stage is that one of the
committers or reviewers of this patch needs to provide feedback on my
analysis [8] for the cases where the patches are not showing a win.

Thoughts?


I do agree the patch(es) significantly reduce CLogControlLock contention, although with WAL logging enabled (which is what matters for most production deployments) they pretty much only shift the contention to a different lock (so the immediate performance benefit is 0).

Which raises the question of why to commit this patch now, before we have a patch addressing the WAL locks. I realize this is a chicken-and-egg problem, but my worry is that the increased WALWriteLock contention will cause regressions in current workloads.
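
(As an aside, for anyone skimming the thread: as I understand it, the core idea of the group_update approach is the same leader-based batching that ProcArrayGroupClearXid already does for the proc array. Below is a minimal standalone sketch of that pattern -- illustrative names only, this is not the actual patch:)

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NBACKENDS 8
#define NXIDS     1024

typedef struct Proc
{
    int          xid;   /* transaction to mark committed */
    _Atomic int  done;  /* set once the leader has applied our update */
    struct Proc *next;  /* link in the pending group */
} Proc;

static _Atomic(Proc *) group_head;  /* lock-free list of pending updates */
static pthread_mutex_t clog_lock = PTHREAD_MUTEX_INITIALIZER;
static int clog_status[NXIDS];      /* toy stand-in for the clog */

static void
group_set_committed(Proc *me)
{
    Proc *head = atomic_load(&group_head);

    /* Push ourselves onto the pending list (lock-free). */
    do
        me->next = head;
    while (!atomic_compare_exchange_weak(&group_head, &head, me));

    if (head != NULL)
    {
        /*
         * Someone beat us to the list, so a leader already exists.  Wait
         * until it has applied our update -- the real patch sleeps on a
         * semaphore here instead of spinning.
         */
        while (!atomic_load(&me->done))
            ;
        return;
    }

    /* We are the leader: one lock acquisition serves the whole batch. */
    pthread_mutex_lock(&clog_lock);
    Proc *batch = atomic_exchange(&group_head, NULL);
    for (Proc *p = batch; p != NULL; p = p->next)
        clog_status[p->xid] = 1;    /* "mark committed" */
    pthread_mutex_unlock(&clog_lock);

    /* Wake the group; read 'next' before setting 'done', because once
     * 'done' is set the Proc belongs to its owner again. */
    for (Proc *p = batch; p != NULL;)
    {
        Proc *next = p->next;
        atomic_store(&p->done, 1);
        p = next;
    }
}

static void *
worker(void *arg)
{
    Proc me = {.xid = (int) (long) arg};

    group_set_committed(&me);
    return NULL;
}

int
main(void)
{
    pthread_t tids[NBACKENDS];

    for (long i = 0; i < NBACKENDS; i++)
        pthread_create(&tids[i], NULL, worker, (void *) (i + 1));
    for (int i = 0; i < NBACKENDS; i++)
        pthread_join(tids[i], NULL);

    for (int xid = 1; xid <= NBACKENDS; xid++)
        printf("xid %d committed: %d\n", xid, clog_status[xid]);
    return 0;
}

The point being that N backends pay for a single exclusive lock acquisition instead of N of them, which also explains why the benefit only materializes once the lock is actually contended.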

BTW I've run some tests with the number of clog buffers increased to 512, and the results seem fairly positive. Compare for example these two results:

  http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip
  http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip-clog-512

The first one is with the default 128 buffers, the other with 512 buffers. The impact on master is pretty obvious - for 72 clients the tps jumps from 160k to 197k, and for higher client counts it gives us about +50k tps (typically an increase from ~80k to ~130k tps). And the tps variability is significantly reduced.

For the other workload, the results are less convincing though:

  http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
  http://tvondra.bitbucket.org/#dilip-300-unlogged-sync-clog-512

It's interesting that master adopts the zig-zag pattern too, but shifted.
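
(In case anyone wants to reproduce the clog-512 runs: there's no GUC for the number of clog buffers -- the cap is hard-coded in CLOGShmemBuffers() in src/backend/access/transam/clog.c, so "clog-512" means raising that cap, roughly as sketched below; the exact build may have differed in detail:)

Size
CLOGShmemBuffers(void)
{
    /* stock 9.6 code: cap the clog SLRU at 128 buffers (1 MB of clog) */
    return Min(128, Max(4, NBuffers / 512));

    /*
     * For a 512-buffer build, raise the cap instead:
     *
     *     return Min(512, Max(4, NBuffers / 512));
     *
     * Note that NBuffers / 512 must also reach 512 -- i.e. shared_buffers
     * must be >= 2GB -- for the higher cap to make any difference.
     */
}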

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


