On Thu, Mar 24, 2016 at 8:08 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Thu, Mar 24, 2016 at 5:40 AM, Andres Freund <and...@anarazel.de> wrote:
> >
> > Have you, in your evaluation of the performance of this patch, done
> > profiles over time? I.e. whether the performance benefits are there
> > immediately, or only after a significant amount of test time? Comparing
> > TPS over time, for both patched/unpatched looks relevant.
> >
>
> I have mainly done it with half-hour read-write tests.  What do you want
> to observe via smaller tests?  They sometimes give inconsistent data for
> read-write tests.
>
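For reference, the runs reported below look roughly like this (the thread
count and database name here are just illustrative; the exact parameters
are listed further down):

pgbench -i --unlogged-tables -s 300 postgres
pgbench -M prepared -c 128 -j 128 -T 1800 postgres

Adding -P 300 to the second command makes pgbench report TPS every five
minutes within a single run, which is one way to compare TPS over time for
patched vs. unpatched.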
I have done some tests on both the Intel and Power m/c (configurations of
which are mentioned at the end of this mail) to see the results at
different time-intervals, and they always show greater than 50%
improvement on the Power m/c at 128 client-count and greater than 29%
improvement on the Intel m/c at 88 client-count.

Non-default parameters
------------------------------------
max_connections = 300
shared_buffers = 8GB
min_wal_size = 10GB
max_wal_size = 15GB
checkpoint_timeout = 35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 256MB

pgbench setup
------------------------
scale factor - 300
used *unlogged* tables : pgbench -i --unlogged-tables -s 300 ..
pgbench -M prepared tpc-b

Results on Intel m/c
--------------------------------
client-count - 88

Time (min)    Base TPS    Patch TPS    Improvement (%)
 5            39978       51858        29.71
10            38169       52195        36.74
20            36992       52173        41.03
30            37042       52149        40.78

Results on Power m/c
-----------------------------------
client-count - 128

Time (min)    Base TPS    Patch TPS    Improvement (%)
 5            42479       65655        54.55
10            41876       66050        57.72
20            38099       65200        71.13
30            37838       61908        63.61

> >
> > Even after changing to scale 500, the performance benefits on this,
> > older 2 socket, machine were minor; even though contention on the
> > ClogControlLock was the second most severe (after ProcArrayLock).
> >
>
> I have tried this patch mainly on an 8 socket machine with 300 & 1000
> scale factors.  I am hoping that you have tried this test on unlogged
> tables, and by the way, at what client count have you seen these results?
>

Do you think we don't see an increase in performance in your tests because
of the m/c difference (sockets/CPU cores) or the client-count?

Intel m/c config (lscpu)
-------------------------------------
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
Stepping:              2
CPU MHz:               1064.000
BogoMIPS:              4266.62
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K
NUMA node0 CPU(s):     0,65-71,96-103
NUMA node1 CPU(s):     72-79,104-111
NUMA node2 CPU(s):     80-87,112-119
NUMA node3 CPU(s):     88-95,120-127
NUMA node4 CPU(s):     1-8,33-40
NUMA node5 CPU(s):     9-16,41-48
NUMA node6 CPU(s):     17-24,49-56
NUMA node7 CPU(s):     25-32,57-64

Power m/c config (lscpu)
-------------------------------------
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                192
On-line CPU(s) list:   0-191
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             24
NUMA node(s):          4
Model:                 IBM,8286-42A
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0-47
NUMA node1 CPU(s):     48-95
NUMA node2 CPU(s):     96-143
NUMA node3 CPU(s):     144-191

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com