On 17/07/14 11:58, Mark Kirkwood wrote:
Trying out with numa_balancing=0 seemed to get essentially the same performance, and similarly for wrapping postgres startup with --interleave. All this made me want to try with numa *really* disabled, so I rebooted the box with "numa=off" appended to the kernel cmdline. Somewhat surprisingly (to me anyway), the numbers were essentially identical. The profile, however, is quite different:
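(For reference, the NUMA experiments above boil down to commands roughly like the following - the data directory path is just an example:)

    # disable automatic NUMA balancing at runtime
    sysctl -w kernel.numa_balancing=0

    # start postgres with its memory interleaved across all NUMA nodes
    numactl --interleave=all pg_ctl -D /data/pgdata start

    # or disable NUMA entirely: append "numa=off" to the kernel command line and reboot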
A little more tweaking got some further improvement:

rwlocks patch as before
wal_buffers = 256MB
checkpoint_segments = 1920
wal_sync_method = open_datasync
LSI RAID adaptor: disable read ahead and write cache for SSD fast path mode
numa_balancing = 0

Pgbench scale 2000 again:

 clients | tps (prev) | tps (tweaked config)
---------+------------+----------------------
       6 |       8175 |                 8281
      12 |      14409 |                15896
      24 |      17191 |                19522
      48 |      23122 |                29776
      96 |      22308 |                32352
     192 |      23109 |                28804

Now recall we were seeing no actual tps changes with numa_balancing=0 or 1 (so the improvement above is from the other changes), but we figured it might be informative to try to track down what the non-numa bottlenecks looked like. We tried profiling the entire 10 minute run, which showed the stats collector as a possible source of contention:
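(Each data point above is a 10 minute pgbench run along the lines of the following - the database name and thread count are illustrative:)

    pgbench -i -s 2000 pgbench             # initialise at scale 2000
    pgbench -c 48 -j 48 -T 600 pgbench     # clients varied over 6..192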
3.86% postgres [kernel.kallsyms] [k] _raw_spin_lock_bh
|
--- _raw_spin_lock_bh
|
|--95.78%-- lock_sock_nested
| udpv6_sendmsg
| inet_sendmsg
| sock_sendmsg
| SYSC_sendto
| sys_sendto
| tracesys
| __libc_send
| |
| |--99.17%-- pgstat_report_stat
| | PostgresMain
| | ServerLoop
| | PostmasterMain
| | main
| | __libc_start_main
| |
| |--0.77%-- pgstat_send_bgwriter
| | BackgroundWriterMain
| | AuxiliaryProcessMain
| | 0x7f08efe8d453
| | reaper
| | __restore_rt
| | PostmasterMain
| | main
| | __libc_start_main
| --0.07%-- [...]
|
|--2.54%-- __lock_sock
| |
| |--91.95%-- lock_sock_nested
| | udpv6_sendmsg
| | inet_sendmsg
| | sock_sendmsg
| | SYSC_sendto
| | sys_sendto
| | tracesys
| | __libc_send
| | |
| | |--99.73%-- pgstat_report_stat
| | | PostgresMain
| | | ServerLoop
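(The profiles in this mail are perf call graphs, captured with something like:)

    perf record -a -g -- sleep 600    # system wide, for the duration of the run
    perf report --stdio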
Disabling track_counts and rerunning pgbench:
clients | tps (no counts)
---------+------------
6 | 9806
12 | 18000
24 | 29281
48 | 43703
96 | 54539
192 | 36114
While these numbers look great in the middle range (12-96 clients), the
benefit looks to be tailing off as client numbers increase. Also running
with no stats (and hence no auto vacuum or analyze) is way too scary!
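(For completeness, turning the counts off was just a config change, along these lines:)

    echo "track_counts = off" >> $PGDATA/postgresql.conf   # autovacuum/analyze depend on these counts
    pg_ctl -D $PGDATA reload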
Trying out less write heavy workloads shows that the stats overhead does not appear to be significant for *read* heavy cases, so this result above is perhaps more of a curiosity than anything (given that read heavy is more typical...and our real workload is more similar to read heavy).
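(pgbench's select-only mode is one easy way to get a read heavy comparison, e.g.:)

    pgbench -S -c 48 -j 48 -T 600 pgbench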
The profile for counts off looks like:
 4.79%  swapper  [kernel.kallsyms]  [k] read_hpet
        |
        --- read_hpet
           |
           |--97.10%-- ktime_get
           |          |
           |          |--35.24%-- clockevents_program_event
           |          |          tick_program_event
           |          |          |
           |          |          |--56.59%-- __hrtimer_start_range_ns
           |          |          |          |
           |          |          |          |--78.12%-- hrtimer_start_range_ns
           |          |          |          |          tick_nohz_restart
           |          |          |          |          tick_nohz_idle_exit
           |          |          |          |          cpu_startup_entry
           |          |          |          |          |
           |          |          |          |          |--98.84%-- start_secondary
           |          |          |          |          |
           |          |          |          |           --1.16%-- rest_init
           |          |          |          |                     start_kernel
           |          |          |          |                     x86_64_start_reservations
           |          |          |          |                     x86_64_start_kernel
           |          |          |          |
           |          |          |           --21.88%-- hrtimer_start
           |          |          |                      tick_nohz_stop_sched_tick
           |          |          |                      __tick_nohz_idle_enter
           |          |          |                      |
           |          |          |                      |--99.89%-- tick_nohz_idle_enter
           |          |          |                      |          cpu_startup_entry
           |          |          |                      |          |
           |          |          |                      |          |--98.30%-- start_secondary
           |          |          |                      |          |
           |          |          |                      |           --1.70%-- rest_init
           |          |          |                      |                     start_kernel
           |          |          |                      |                     x86_64_start_reservations
           |          |          |                      |                     x86_64_start_kernel
           |          |          |                      |
           |          |          |                       --0.11%-- [...]
           |          |          |
           |          |          |--40.25%-- hrtimer_force_reprogram
           |          |          |          __remove_hrtimer
           |          |          |          |
           |          |          |          |--89.68%-- __hrtimer_start_range_ns
           |          |          |          |          hrtimer_start
           |          |          |          |          tick_nohz_stop_sched_tick
           |          |          |          |          __tick_nohz_idle_enter
           |          |          |          |          |
           |          |          |          |          |--99.90%-- tick_nohz_idle_enter
           |          |          |          |          |          cpu_startup_entry
           |          |          |          |          |          |
           |          |          |          |          |          |--99.04%-- start_secondary
           |          |          |          |          |          |
           |          |          |          |          |           --0.96%-- rest_init
           |          |          |          |          |                     start_kernel
           |          |          |          |          |                     x86_64_start_reservations
           |          |          |          |          |                     x86_64_start_kernel
           |          |          |          |          |
           |          |          |          |           --0.10%-- [...]
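(Incidentally, all that read_hpet time suggests the box is on the hpet clocksource; the current and available clocksources can be checked, and switched to tsc if available, with:)

    cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource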
Any thoughts on how to proceed further appreciated!
Cheers,
Mark