On Sat, Apr 2, 2016 at 5:25 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Thu, Mar 31, 2016 at 3:48 PM, Andres Freund <and...@anarazel.de> wrote:
>
> Here is the performance data (configuration of machine used to perform
> this test is mentioned at end of mail):
>
> Non-default parameters
> ------------------------------------
> max_connections = 300
> shared_buffers=8GB
> min_wal_size=10GB
> max_wal_size=15GB
> checkpoint_timeout =35min
> maintenance_work_mem = 1GB
> checkpoint_completion_target = 0.9
> wal_buffers = 256MB
>
> median of 3, 20-min pgbench tpc-b results for --unlogged-tables

I have run exactly the same test on an Intel x86 m/c and the results are as below:

Client Count/Patch_ver (tps)                 2      128      256
HEAD – Commit 2143f5e1                    2832    35001    26756
clog_buf_128                              2909    50685    40998
clog_buf_128 + group_update_clog_v8       2981    53043    50779
clog_buf_128 + content_lock               2843    56261    54059
clog_buf_128 + nocontent_lock             2630    56554    54429

On this m/c, I don't see any run-to-run variation; however, the trend of the results is similar to the power m/c. Clearly, the first patch, which increases the number of clog buffers to 128, shows up to 50% performance improvement at 256 client count. We can also observe that the group clog patch gives a further ~24% gain on top of the increase clog bufs patch at 256 client count. Both the content lock and no content lock patches show similar performance gains, and their performance is 6~7% better than the group clog patch. Also, as on the power m/c, the no content lock patch shows some regression at lower client count (2 clients in this case).

Based on the above results, increase_clog_bufs to 128 is a clear winner, and I think we might not want to proceed with the no content lock approach patch, as it shows some regression and is also no better than the content lock approach patch.
Now, I think we need to decide between the group clog approach patch and the content lock approach patch. The difference between the two is not high (6~7%), and I think that when sub-transactions are involved (sub-transactions on the same page as the main transaction), the group clog patch should give better performance, as the content lock itself will then start to become a bottleneck. We could address that case for the content lock approach by using the grouping technique on the content lock, or something similar, but I am not sure that is worth the effort. Also, I see some variation in the performance data with the content lock patch on the power m/c, but again, that might be attributable to m/c characteristics. So, I think we can proceed with either the group clog patch or the content lock patch; if we want to proceed with the content lock approach, then we need to do some more work on it.

Note - For both the content and no content lock patches, I have applied the 0001-Improve-64bit-atomics-support patch.

m/c config (lscpu)
---------------------------
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
Stepping:              2
CPU MHz:               1064.000
BogoMIPS:              4266.62
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K
NUMA node0 CPU(s):     0,65-71,96-103
NUMA node1 CPU(s):     72-79,104-111
NUMA node2 CPU(s):     80-87,112-119
NUMA node3 CPU(s):     88-95,120-127
NUMA node4 CPU(s):     1-8,33-40
NUMA node5 CPU(s):     9-16,41-48
NUMA node6 CPU(s):     17-24,49-56
NUMA node7 CPU(s):     25-32,57-64

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com