On 10/21/2016 08:13 AM, Amit Kapila wrote:
On Fri, Oct 21, 2016 at 6:31 AM, Robert Haas <robertmh...@gmail.com> wrote:
On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
I then started a run at 96 clients which I accidentally killed shortly
before it was scheduled to finish, but the results are not much
different; there is no hint of the runaway CLogControlLock contention
that Dilip sees on power2.

What shared_buffers size were you using? I assume the data set fit into
shared buffers, right?

8GB.

FWIW as I explained in the lengthy post earlier today, I can actually
reproduce the significant CLogControlLock contention (and the patches do
reduce it), even on x86_64.

/me goes back, rereads post.  Sorry, I didn't look at this carefully
the first time.

For example consider these two tests:

* http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
* http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip

However, it seems I can also reproduce fairly bad regressions, like for
example this case with data set exceeding shared_buffers:

* http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip

I'm not sure how seriously we should take the regressions.  I mean,
what I see there is that CLogControlLock contention goes down by about
50% -- which is the point of the patch -- and WALWriteLock contention
goes up dramatically -- which sucks, but can't really be blamed on the
patch except in the indirect sense that a backend can't spend much
time waiting for A if it's already spending all of its time waiting
for B.


Right, I think it's not only WALWriteLock; contention on other locks
goes up as well, as you can see in the table below.  I don't think there
is much we can do about that with this patch.  One thing that is unclear
is why the unlogged tests show WALWriteLock contention at all?


Well, although we don't write the table data to WAL, we still need to write commit records and other stuff, right? And at scale 3000 (which exceeds the 16GB of shared buffers in this case) there's a continuous stream of dirty pages being evicted from shared buffers and written out (that's regular data file I/O, not WAL), so iostat looks like this:

      time    tps  wr_sec/s  avgrq-sz  avgqu-sz     await   %util
  08:48:21  81654   1367483     16.75 127264.60   1294.80   97.41
  08:48:31  41514    697516     16.80 103271.11   3015.01   97.64
  08:48:41  78892   1359779     17.24  97308.42    928.36   96.76
  08:48:51  58735    978475     16.66  92303.00   1472.82   95.92
  08:49:01  62441   1068605     17.11  78482.71   1615.56   95.57
  08:49:11  55571    945365     17.01 113672.62   1923.37   98.07
  08:49:21  69016   1161586     16.83  87055.66   1363.05   95.53
  08:49:31  54552    913461     16.74  98695.87   1761.30   97.84

That's ~500-600 MB/s of continuous writes (wr_sec/s is in 512-byte sectors). I'm sure the storage could handle more than this (I'll do some testing after the benchmark runs complete), but the WAL surely has to compete for bandwidth, as it's on the same volume / devices. Another thing is that we only have 8 WAL insert locks (NUM_XLOGINSERT_LOCKS), and maybe that leads to contention with such high client counts.
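
FWIW it's easy to convince yourself that a commit on an unlogged table still produces WAL. A rough sketch (using the 9.6 function names; on newer branches it'd be pg_current_wal_insert_lsn / pg_wal_lsn_diff) shows a small but nonzero amount of WAL, which is essentially just the commit record:

    -- sketch only: unlogged heap pages are not WAL-logged, but the
    -- commit record still goes through XLogInsert / WALWriteLock
    CREATE UNLOGGED TABLE t (a int);

    SELECT pg_current_xlog_insert_location() AS before \gset
    INSERT INTO t SELECT generate_series(1, 100000);
    SELECT pg_xlog_location_diff(pg_current_xlog_insert_location(),
                                 :'before') AS wal_bytes;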

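And since most of this thread is about lock wait profiles anyway: on 9.6 even a simple sampling of pg_stat_activity gives a crude picture of where backends are waiting (CLogControlLock and WALWriteLock both show up as LWLockNamed wait events), something like:

    -- crude snapshot of current waits; run repeatedly / aggregate over time
    SELECT wait_event_type, wait_event, count(*)
      FROM pg_stat_activity
     WHERE wait_event IS NOT NULL
     GROUP BY 1, 2
     ORDER BY count(*) DESC;
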
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

