On 09/21/2016 08:04 AM, Amit Kapila wrote:
> On Wed, Sep 21, 2016 at 3:48 AM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> ...
>>
>> I'll repeat the test on the 4-socket machine with a newer kernel,
>> but that's probably the last benchmark I'll do for this patch for
>> now.


Attached are results from benchmarks running on kernel 4.5 (instead of the old 3.2.80). I've only done synchronous_commit=on, and I've added a few client counts (mostly at the lower end). The data have been pushed to the git repository, see

    git push --set-upstream origin master

The summary looks like this (showing both the 3.2.80 and 4.5.5 results):

1) Dilip's workload

 3.2.80                             16     32     64    128    192
-------------------------------------------------------------------
 master                          26138  37790  38492  13653   8337
 granular-locking                25661  38586  40692  14535   8311
 no-content-lock                 25653  39059  41169  14370   8373
 group-update                    26472  39170  42126  18923   8366

 4.5.5                 1      8     16     32     64    128    192
-------------------------------------------------------------------
 granular-locking   4050  23048  27969  32076  34874  36555  37710
 no-content-lock    4025  23166  28430  33032  35214  37576  39191
 group-update       4002  23037  28008  32492  35161  36836  38850
 master             3968  22883  27437  32217  34823  36668  38073


2) pgbench

 3.2.80                             16     32     64    128    192
-------------------------------------------------------------------
 master                          22904  36077  41295  35574   8297
 granular-locking                23323  36254  42446  43909   8959
 no-content-lock                 23304  36670  42606  48440   8813
 group-update                    23127  36696  41859  46693   8345

 4.5.5                 1      8     16     32     64    128    192
-------------------------------------------------------------------
 granular-locking   3116  19235  27388  29150  31905  34105  36359
 no-content-lock    3206  19071  27492  29178  32009  34140  36321
 group-update       3195  19104  26888  29236  32140  33953  35901
 master             3136  18650  26249  28731  31515  33328  35243


The 4.5 kernel clearly changed the results significantly:

(a) Compared to the results from the 3.2.80 kernel, some numbers improved and some got worse. For example, on 3.2.80 pgbench did ~23k tps with 16 clients, while on 4.5.5 it does ~26k tps. With 64 clients the performance dropped from ~41k tps to ~32k tps (both on master).

(b) The drop above 64 clients is gone - on 3.2.80 it dropped very quickly to only ~8k with 192 clients. On 4.5 the tps actually continues to increase, and we get ~35k with 192 clients.

(c) Although it's not visible in the results, 4.5.5 almost perfectly eliminated the fluctuations in the results. For example, where 3.2.80 produced these results (10 runs with the same parameters):

    12118 11610 27939 11771 18065
    12152 14375 10983 13614 11077

we get this on 4.5.5:

    37354 37650 37371 37190 37233
    38498 37166 36862 37928 38509

Notice how much more even the 4.5.5 results are, compared to 3.2.80.
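
The difference in run-to-run variation can be put into numbers. The following is a minimal sketch, not part of the benchmark scripts; it takes the two sets of 10 runs quoted above and computes the mean, sample standard deviation and coefficient of variation for each kernel. The `runs` and `spread` names are made up for illustration.

```python
from statistics import mean, stdev

# tps from the 10 runs quoted above (same parameters, different kernels)
runs = {
    "3.2.80": [12118, 11610, 27939, 11771, 18065,
               12152, 14375, 10983, 13614, 11077],
    "4.5.5": [37354, 37650, 37371, 37190, 37233,
              38498, 37166, 36862, 37928, 38509],
}

def spread(values):
    """Return (mean, sample stddev, coefficient of variation in %)."""
    m = mean(values)
    s = stdev(values)
    return m, s, 100.0 * s / m

for kernel, values in runs.items():
    m, s, cv = spread(values)
    print(f"{kernel}: mean={m:.0f} stddev={s:.0f} cv={cv:.1f}%")
```

A lower coefficient of variation means the individual runs sit closer to their mean, which is what "more even" refers to here.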

(d) There's no sign of any benefit from any of the patches (they were only helpful at >= 128 clients, but that's where the tps actually dropped on 3.2.80 - apparently 4.5.5 fixes that and the benefit is gone).
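
To make the comparison in (d) concrete, here is a small sketch that computes the gain of each patch over master at 128 clients for Dilip's workload, using only the numbers from the tables above. The dictionary layout and the `gain_pct` helper are assumptions made for this illustration.

```python
# Dilip's workload, 128 clients, tps copied from the tables above
tps_128 = {
    "3.2.80": {"master": 13653, "granular-locking": 14535,
               "no-content-lock": 14370, "group-update": 18923},
    "4.5.5": {"master": 36668, "granular-locking": 36555,
              "no-content-lock": 37576, "group-update": 36836},
}

def gain_pct(kernel):
    """Percentage gain of each patch over master on the given kernel."""
    results = tps_128[kernel]
    base = results["master"]
    return {name: 100.0 * (tps - base) / base
            for name, tps in results.items() if name != "master"}

for kernel in tps_128:
    for name, gain in gain_pct(kernel).items():
        print(f"{kernel} {name}: {gain:+.1f}%")
```

These are single aggregated values per configuration, so small differences should be read with the run-to-run variation from (c) in mind.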

It's a bit annoying that after upgrading from 3.2.80 to 4.5.5, the performance with 32 and 64 clients dropped quite noticeably (by more than 10%). I believe that might be a kernel regression, but perhaps it's a price for improved scalability for higher client counts.
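
The size of the change between kernels can be checked directly against the pgbench table. This is a sketch using the master rows only; the variable names and the `pct_change` helper are made up for illustration.

```python
# pgbench tps on master, copied from the tables above: clients -> tps
master_3_2_80 = {16: 22904, 32: 36077, 64: 41295, 128: 35574, 192: 8297}
master_4_5_5 = {16: 26249, 32: 28731, 64: 31515, 128: 33328, 192: 35243}

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

# Only client counts measured on both kernels can be compared
changes = {c: pct_change(master_3_2_80[c], master_4_5_5[c])
           for c in master_3_2_80}

for clients, change in changes.items():
    print(f"{clients:>3} clients: {change:+.1f}%")
```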

It of course raises the question of what kernel version is running on the machine used by Dilip (i.e. cthulhu). Although it's a Power machine, so I'm not sure how much the kernel matters on it.

I'll ask someone else with access to this particular machine to repeat the tests, as I have a nagging suspicion that I've missed something important when compiling / running the benchmarks. I'll also retry the benchmarks on 3.2.80 to see if I get the same numbers.


> Okay, but I think it is better to see the results between 64~128
> client count and maybe greater than 128 client counts, because it is
> clear that patch won't improve performance below that.


There are results for 64, 128 and 192 clients. Why should we care about numbers in between? How likely (and useful) would it be to get improvement with 96 clients, but no improvement for 64 or 128 clients?

>> I agree with Robert that the cases the patch is supposed to
>> improve are a bit contrived because of the very high client
>> counts.


> No issues, I have already explained why I think it is important to
> reduce the remaining CLOGControlLock contention in yesterday's and
> this mail. If none of you is convinced, then I think we have no
> choice but to drop this patch.


I agree it's useful to reduce lock contention in general, but considering that the last set of benchmarks shows no benefit with a recent kernel, I think we really need a better understanding of what's going on, what workloads / systems it's supposed to improve, etc.

I don't dare to suggest rejecting the patch, but I don't see how we could commit any of the patches at this point. So perhaps "returned with feedback" and resubmitting in the next CF (along with analysis of improved workloads) would be appropriate.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: results.ods
Description: application/vnd.oasis.opendocument.spreadsheet
