[HACKERS] strange case of kernel performance regression (4.3.x and newer)

Tomas Vondra Tue, 04 Oct 2016 23:19:20 -0700

Hi,

Over the past couple of days I've been doing a bit of benchmarking for the "group clog" patch [1], and I've run into what I suspect might be a fairly serious performance regression on newer kernels (essentially 4.3.0 and newer). I've reached the point where I need help with further investigation, testing on other systems, etc.

The workload tested in the patch [1] is quite simple - a transaction with 3 x SELECT FOR UPDATE queries and 2 x SAVEPOINT on unlogged tables. The results (average tps from 5 x 5 minute runs, for 32 and 64 clients) on multiple kernels look like this:


    kernel          32           64
   ---------------------------------
    3.19.8       48524        59291
    4.1.33       47193        59574
    4.2.8        48901        59877
    4.3.0        32187        38970
    4.3.6        31889        38815
    4.4.0        31946        37702
    4.4.23       31498        37724
    4.5.5        31531        37351
    4.7.6        32859        38490

Notice the sudden drop from ~50k to ~30k tps between 4.2.8 and 4.3.0 (for 32 clients) and from ~60k to ~40k (for 64 clients). See the attached kernel-regression-e5-4620.png.
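To put a number on that drop, here is a small Python sketch that computes the relative loss from the 4.2.8 and 4.3.0 rows of the table above. The tps values are copied from the table; the function and variable names are just for illustration. It works out to roughly a third of the throughput at both client counts.

```python
# tps figures copied from the table above: kernel -> (32 clients, 64 clients)
results = {
    "4.2.8": (48901, 59877),
    "4.3.0": (32187, 38970),
}


def relative_drop(before, after):
    """Return the throughput loss as a fraction of the 'before' value."""
    return (before - after) / before


# Map client count -> fractional drop between the two kernels
drops = {
    clients: relative_drop(results["4.2.8"][i], results["4.3.0"][i])
    for i, clients in enumerate((32, 64))
}

for clients, drop in drops.items():
    print(f"{clients} clients: {drop:.1%} lower tps on 4.3.0 than on 4.2.8")
```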

Those results are from a 4-socket machine with e5-4620 CPUs, so 32 physical cores in total, 64 with HT. The CPU is the v1 model (Sandy Bridge EP, released in 2012 and discontinued in Q2 2015), so neither particularly new nor obsolete.

This is on scale 300, which easily fits into RAM on the machine. The results are very stable and IMHO quite consistent.
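For anyone who wants a feel for the workload without digging through the scripts, here is a rough Python sketch that generates a pgbench custom script of the shape described above (3 x SELECT FOR UPDATE, 2 x SAVEPOINT). This is an assumption about what such a script might look like, not a copy of the one actually used: the variable names, the savepoint names and the choice of pgbench_accounts are illustrative, and the \set ... random() syntax assumes pgbench 9.6 or newer (older versions use \setrandom). Unlogged tables would come from initializing with pgbench -i --unlogged-tables.

```python
SCALE = 300                  # scale factor used in the runs above
ACCOUNTS = 100_000 * SCALE   # pgbench creates 100,000 accounts per scale unit


def build_script(accounts=ACCOUNTS):
    """Return a pgbench custom script: 3 x SELECT FOR UPDATE, 2 x SAVEPOINT.

    Hypothetical reconstruction of the workload; the real test script
    may differ in table, column and variable names.
    """
    # One random account id per SELECT (pgbench 9.6+ syntax)
    lines = [f"\\set aid{i} random(1, {accounts})" for i in (1, 2, 3)]
    lines.append("BEGIN;")
    for i in (1, 2, 3):
        lines.append(
            f"SELECT abalance FROM pgbench_accounts "
            f"WHERE aid = :aid{i} FOR UPDATE;"
        )
        # A savepoint goes between consecutive SELECTs, so two in total
        if i < 3:
            lines.append(f"SAVEPOINT s{i};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


script = build_script()
print(script)
```

The output can be saved to a file and passed to pgbench with -f, along with the client count and run duration.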

I've also done some tests with regular pgbench (both read-only and read-write), with WAL-logged tables, and the results are quite similar.


     type      kernel           32           64
    --------------------------------------------
     ro        3.19.8        55796        81563
               4.4.23        38188        50983
    --------------------------------------------
     rw        3.19.8        32282        46234
               4.4.23        23367        31311

I've tried to reproduce the issue on another machine, but without much success. That machine, however, has only 2 sockets and a much newer CPU (e5-2620 v4, so Broadwell, released Q1 2016).

So it might be somewhat related to the older CPU, maybe a slightly different kernel config, or something else.

If you have access to similar machines (2 or 4 sockets), it'd be very helpful if you could repeat the benchmarks and report the results, to confirm (or invalidate) my results.


The test scripts (and results), and kernel configs are available here:

    https://bitbucket.org/tvondra/kernel-perf-regression

It's nothing fancy, mostly trivial shell scripts (you'll need to modify some paths in those, I guess). Testing a single kernel version takes roughly 1h.
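If you run the benchmarks your own way and want numbers comparable to the tables above (average tps over several runs), something like the following Python sketch may help. It is not taken from the repository; it assumes each run's pgbench output was captured as text and that it contains the "tps = ... (excluding connections establishing)" line printed by pgbench releases of this era (newer releases word that line differently). The sample outputs are made up.

```python
import re
from statistics import mean

# Matches the summary line printed by 9.x-era pgbench (assumed format)
TPS_LINE = re.compile(
    r"^tps = ([0-9.]+) \(excluding connections establishing\)", re.M
)


def tps_from_output(text):
    """Extract the tps value from one captured pgbench run."""
    match = TPS_LINE.search(text)
    if match is None:
        raise ValueError("no tps line found in pgbench output")
    return float(match.group(1))


def average_tps(outputs):
    """Average the tps over several captured runs (e.g. 5 x 5 minutes)."""
    return mean(tps_from_output(text) for text in outputs)


# Made-up sample outputs, for illustration only
sample_runs = [
    "tps = 100.5 (including connections establishing)\n"
    "tps = 101.0 (excluding connections establishing)\n",
    "tps = 102.5 (including connections establishing)\n"
    "tps = 103.0 (excluding connections establishing)\n",
]
print(average_tps(sample_runs))
```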

Those are vanilla kernels, BTW - no customized distribution kernels with extra patches, etc.

[1] https://www.postgresql.org/message-id/flat/CAA4eK1+8=X9mSNeVeHg_NqMsOR-XKsjuqrYzQf=icsdh3u4...@mail.gmail.com


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
