Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 1, 2017 at 9:17 PM, Robert Haas wrote:
> On Fri, Sep 1, 2017 at 10:03 AM, Dilip Kumar wrote:
>>> Sure will do so. In the meantime, I have rebased the patch.
>>
>> I have repeated some of the tests we have performed earlier.
>

Thanks for repeating the performance tests.

> OK, these tests seem to show that this is still working. Committed, again. Let's hope this attempt goes better than the last one.
>

Thanks for committing.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 1, 2017 at 10:03 AM, Dilip Kumar wrote:
>> Sure will do so. In the meantime, I have rebased the patch.
>
> I have repeated some of the tests we have performed earlier.

OK, these tests seem to show that this is still working. Committed, again. Let's hope this attempt goes better than the last one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Aug 30, 2017 at 12:54 PM, Amit Kapila wrote:
>
> That would have been better. In any case, will do the tests on some higher end machine and will share the results.
>
>> Given that we've changed the approach here somewhat, I think we need to validate that we're still seeing a substantial reduction in CLogControlLock contention on big machines.
>>
>
> Sure will do so. In the meantime, I have rebased the patch.

I have repeated some of the tests we have performed earlier.

Machine: Intel 8-socket machine with 128 cores.

Configuration:
shared_buffers=8GB
checkpoint_timeout=40min
max_wal_size=20GB
max_connections=300
maintenance_work_mem=4GB
synchronous_commit=off
checkpoint_completion_target=0.9

I have taken one reading for each test to measure the wait events. The observation is the same as before: at higher client counts there is a significant reduction in contention on CLogControlLock.

Benchmark: pgbench simple_update, 30 mins run:

Head (64 clients): (TPS 60720)
53808  Client   | ClientRead
26147  IPC      | ProcArrayGroupUpdate
 7866  LWLock   | CLogControlLock
 3705  Activity | LogicalLauncherMain
 3699  Activity | AutoVacuumMain
 3353  LWLock   | ProcArrayLock
 3099  LWLock   | wal_insert
 2825  Activity | BgWriterMain
 2688  Lock     | extend
 1436  Activity | WalWriterMain

Patch (64 clients): (TPS 67207)
53235  Client   | ClientRead
29470  IPC      | ProcArrayGroupUpdate
 4302  LWLock   | wal_insert
 3717  Activity | LogicalLauncherMain
 3715  Activity | AutoVacuumMain
 3463  LWLock   | ProcArrayLock
 3140  Lock     | extend
 2934  Activity | BgWriterMain
 1434  Activity | WalWriterMain
 1198  Activity | CheckpointerMain
 1073  LWLock   | XidGenLock
  869  IPC      | ClogGroupUpdate

Head (72 clients): (TPS 57856)
55820  Client   | ClientRead
34318  IPC      | ProcArrayGroupUpdate
15392  LWLock   | CLogControlLock
 3708  Activity | LogicalLauncherMain
 3705  Activity | AutoVacuumMain
 3436  LWLock   | ProcArrayLock

Patch (72 clients): (TPS 65740)
60356  Client   | ClientRead
38545  IPC      | ProcArrayGroupUpdate
 4573  LWLock   | wal_insert
 3708  Activity | LogicalLauncherMain
 3705  Activity | AutoVacuumMain
 3508  LWLock   | ProcArrayLock
 3492  Lock     | extend
 2903  Activity | BgWriterMain
 1903  LWLock   | XidGenLock
 1383  Activity | WalWriterMain
 1212  Activity | CheckpointerMain
 1056  IPC      | ClogGroupUpdate

Head (96 clients): (TPS 52170)
62841  LWLock   | CLogControlLock
56150  IPC      | ProcArrayGroupUpdate
54761  Client   | ClientRead
 7037  LWLock   | wal_insert
 4077  Lock     | extend
 3727  Activity | LogicalLauncherMain
 3727  Activity | AutoVacuumMain
 3027  LWLock   | ProcArrayLock

Patch (96 clients): (TPS 67932)
87378  IPC      | ProcArrayGroupUpdate
80201  Client   | ClientRead
11511  LWLock   | wal_insert
 4102  Lock     | extend
 3971  LWLock   | ProcArrayLock
 3731  Activity | LogicalLauncherMain
 3731  Activity | AutoVacuumMain
 2948  Activity | BgWriterMain
 1763  LWLock   | XidGenLock
 1736  IPC      | ClogGroupUpdate

Head (128 clients): (TPS 40820)
182569  LWLock   | CLogControlLock
 61484  IPC      | ProcArrayGroupUpdate
 37969  Client   | ClientRead
  5135  LWLock   | wal_insert
  3699  Activity | LogicalLauncherMain
  3699  Activity | AutoVacuumMain

Patch (128 clients): (TPS 67054)
174583  IPC      | ProcArrayGroupUpdate
 66084  Client   | ClientRead
 16738  LWLock   | wal_insert
  4993  IPC      | ClogGroupUpdate
  4893  LWLock   | ProcArrayLock
  4839  Lock     | extend

Benchmark: SELECT ... FOR UPDATE with 3 savepoints, 10 mins run.

Script:
\set aid random (1,3000)
\set tid random (1,3000)
BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s1;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s3;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
END;

Head (64 clients): (TPS 44577.1802)
53808  Client   | ClientRead
26147  IPC      | ProcArrayGroupUpdate
 7866  LWLock   | CLogControlLock
 3705  Activity | LogicalLauncherMain
 3699  Activity | AutoVacuumMain
 3353  LWLock   | ProcArrayLock
 3099  LWLock   | wal_insert

Patch (64 clients): (TPS 46156.245)
53235  Client   | ClientRead
29470  IPC      | Pr
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Aug 30, 2017 at 2:43 AM, Robert Haas wrote:
> On Tue, Jul 4, 2017 at 12:33 AM, Amit Kapila wrote:
>> I have updated the patch to support wait events and moved it to the upcoming CF.
>
> This patch doesn't apply any more, but I made it apply with a hammer and then did a little benchmarking (scylla, EDB server, Intel Xeon E5-2695 v3 @ 2.30GHz, 2 sockets, 14 cores/socket, 2 threads/core). The results were not impressive. There's basically no clog contention to remove, so the patch just doesn't really do anything.
>

Yeah, in such a case the patch won't help.

> For example, here's a wait event profile with master and using Ashutosh's test script with 5 savepoints:
>
>    1 Lock     | tuple
>    2 IO       | SLRUSync
>    5 LWLock   | wal_insert
>    5 LWLock   | XidGenLock
>    9 IO       | DataFileRead
>   12 LWLock   | lock_manager
>   16 IO       | SLRURead
>   20 LWLock   | CLogControlLock
>   97 LWLock   | buffer_content
>  216 Lock     | transactionid
>  237 LWLock   | ProcArrayLock
> 1238 IPC      | ProcArrayGroupUpdate
> 2266 Client   | ClientRead
>
> This is just a 5-minute test; maybe things would change if we ran it for longer, but if only 0.5% of the samples are blocked on CLogControlLock without the patch, obviously the patch can't help much. I did some other experiments too, but I won't bother summarizing the results here because they're basically boring. I guess I should have used a bigger machine.
>

That would have been better. In any case, will do the tests on some higher end machine and will share the results.

> Given that we've changed the approach here somewhat, I think we need to validate that we're still seeing a substantial reduction in CLogControlLock contention on big machines.
>

Sure will do so. In the meantime, I have rebased the patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

group_update_clog_v14.patch
Description: Binary data
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Tue, Jul 4, 2017 at 12:33 AM, Amit Kapila wrote:
> I have updated the patch to support wait events and moved it to the upcoming CF.

This patch doesn't apply any more, but I made it apply with a hammer and then did a little benchmarking (scylla, EDB server, Intel Xeon E5-2695 v3 @ 2.30GHz, 2 sockets, 14 cores/socket, 2 threads/core). The results were not impressive. There's basically no clog contention to remove, so the patch just doesn't really do anything. For example, here's a wait event profile with master and using Ashutosh's test script with 5 savepoints:

   1 Lock     | tuple
   2 IO       | SLRUSync
   5 LWLock   | wal_insert
   5 LWLock   | XidGenLock
   9 IO       | DataFileRead
  12 LWLock   | lock_manager
  16 IO       | SLRURead
  20 LWLock   | CLogControlLock
  97 LWLock   | buffer_content
 216 Lock     | transactionid
 237 LWLock   | ProcArrayLock
1238 IPC      | ProcArrayGroupUpdate
2266 Client   | ClientRead

This is just a 5-minute test; maybe things would change if we ran it for longer, but if only 0.5% of the samples are blocked on CLogControlLock without the patch, obviously the patch can't help much. I did some other experiments too, but I won't bother summarizing the results here because they're basically boring. I guess I should have used a bigger machine.

Given that we've changed the approach here somewhat, I think we need to validate that we're still seeing a substantial reduction in CLogControlLock contention on big machines.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Jul 3, 2017 at 6:15 PM, Amit Kapila wrote:
> On Thu, Mar 23, 2017 at 1:18 PM, Ashutosh Sharma wrote:
>>
>> Conclusion:
>> As seen from the test results mentioned above, there is some performance improvement with 3 SP(s), with 5 SP(s) the results with the patch are slightly better than HEAD, and with 7 and 10 SP(s) we do see regression with the patch. Therefore, I think the threshold value of 4 for the number of subtransactions considered in the patch looks fine to me.
>>
>
> Thanks for the tests. Attached, find the rebased patch on HEAD. I have run the latest pgindent on the patch. I have yet to add a wait event for group lock waits in this patch, as is done by Robert in commit d4116a771925379c33cf4c6634ca620ed08b551d for ProcArrayGroupUpdate.
>

I have updated the patch to support wait events and moved it to the upcoming CF.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

group_update_clog_v13.patch
Description: Binary data
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Mar 23, 2017 at 1:18 PM, Ashutosh Sharma wrote:
>
> *Conclusion:*
> As seen from the test results mentioned above, there is some performance improvement with 3 SP(s), with 5 SP(s) the results with the patch are slightly better than HEAD, and with 7 and 10 SP(s) we do see regression with the patch. Therefore, I think the threshold value of 4 for the number of subtransactions considered in the patch looks fine to me.
>

Thanks for the tests. Attached, find the rebased patch on HEAD. I have run the latest pgindent on the patch. I have yet to add a wait event for group lock waits in this patch, as is done by Robert in commit d4116a771925379c33cf4c6634ca620ed08b551d for ProcArrayGroupUpdate.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

group_update_clog_v12.patch
Description: Binary data
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Mar 9, 2017 at 5:49 PM, Robert Haas wrote:
> However, I just realized that in both this case and in the case of group XID clearing, we weren't advertising a wait event for the PGSemaphoreLock calls that are part of the group locking machinery. I think we should fix that, because a quick test shows that it can happen fairly often -- not, I think, as often as we would have seen LWLock waits without these patches, but often enough that you'll want to know. Patch attached.

I've pushed the portion of this that relates to ProcArrayLock. (I know this hasn't been discussed much, but there doesn't really seem to be any reason for anybody to object, and looking at just the LWLock/ProcArrayLock wait events gives a highly misleading answer.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Hi All,

I have tried to test 'group_update_clog_v11.1.patch' shared upthread by Amit on a high end machine. I have tested the patch with various numbers of savepoints in my test script. The machine details, along with the test script and the test results, are shown below.

Machine details:
24 sockets, 192 CPU(s)
RAM - 500GB

test script:
\set aid random (1,3000)
\set tid random (1,3000)
BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s1;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s3;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s4;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s5;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
END;

Non-default parameters
======================
max_connections = 200
shared_buffers=8GB
min_wal_size=10GB
max_wal_size=15GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
checkpoint_timeout=900
synchronous_commit=off

pgbench -M prepared -c $thread -j $thread -T $time_for_reading postgres -f ~/test_script.sql
where, time_for_reading = 10 mins

Test Results:
=============

With 3 savepoints
=================
CLIENT COUNT    TPS (HEAD)    TPS (PATCH)    % IMPROVEMENT
128             50275         53704          6.82048732
64              62860         66561          5.887686923
8               18464         18752          1.559792028

With 5 savepoints
=================
CLIENT COUNT    TPS (HEAD)    TPS (PATCH)    % IMPROVEMENT
128             46559         47715          2.482871196
64              52306         52082          -0.4282491492
8               12289         12852          4.581332899

With 7 savepoints
=================
CLIENT COUNT    TPS (HEAD)    TPS (PATCH)    % IMPROVEMENT
128             41367         41500          0.3215123166
64              42996         41473          -3.542189971
8               9665          9657           -0.0827728919

With 10 savepoints
==================
CLIENT COUNT    TPS (HEAD)    TPS (PATCH)    % IMPROVEMENT
128             34513         34597          0.24338655
64              32581         32035          -1.67582
8               7293          7622           4.511175099

*Conclusion:*
As seen from the test results mentioned above, there is some performance improvement with 3 SP(s), with 5 SP(s) the results with the patch are slightly better than HEAD, and with 7 and 10 SP(s) we do see regression with the patch. Therefore, I think the threshold value of 4 for the number of subtransactions considered in the patch looks fine to me.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Tue, Mar 21, 2017 at 6:19 PM, Amit Kapila wrote:
> On Mon, Mar 20, 2017 at 8:27 AM, Robert Haas wrote:
> > On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila wrote:
> >>> I was wondering about doing an explicit test: if the XID being committed matches the one in the PGPROC, and nsubxids matches, and the actual list of XIDs matches, then apply the optimization. That could replace the logic that you've proposed to exclude non-commit cases, gxact cases, etc. and it seems fundamentally safer. But it might be a more expensive test, too, so I'm not sure.
> >>
> >> I think if the number of subxids is very small, let us say under 5 or so, then such a check might not matter, but otherwise it could be expensive.
> >
> > We could find out by testing it. We could also restrict the optimization to cases with just a few subxids, because if you've got a large number of subxids this optimization probably isn't buying much anyway.
> >
>
> Yes, and I have modified the patch to compare xids and subxids for group update. In the initial short tests (with few client counts), it seems like up to 3 savepoints we can win, and from 10 savepoints onwards there is some regression, or at the very least there doesn't appear to be any benefit. We need more tests to identify what the safe number is, but I thought it is better to share the patch to see if we agree on the changes, because if not, then the whole testing needs to be repeated. Let me know what you think about the attached?
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Mar 20, 2017 at 8:27 AM, Robert Haas wrote:
> On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila wrote:
>>> I was wondering about doing an explicit test: if the XID being committed matches the one in the PGPROC, and nsubxids matches, and the actual list of XIDs matches, then apply the optimization. That could replace the logic that you've proposed to exclude non-commit cases, gxact cases, etc. and it seems fundamentally safer. But it might be a more expensive test, too, so I'm not sure.
>>
>> I think if the number of subxids is very small, let us say under 5 or so, then such a check might not matter, but otherwise it could be expensive.
>
> We could find out by testing it. We could also restrict the optimization to cases with just a few subxids, because if you've got a large number of subxids this optimization probably isn't buying much anyway.
>

Yes, and I have modified the patch to compare xids and subxids for group update. In the initial short tests (with few client counts), it seems like up to 3 savepoints we can win, and from 10 savepoints onwards there is some regression, or at the very least there doesn't appear to be any benefit. We need more tests to identify what the safe number is, but I thought it is better to share the patch to see if we agree on the changes, because if not, then the whole testing needs to be repeated. Let me know what you think about the attached?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

group_update_clog_v11.patch
Description: Binary data
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila wrote:
>> I was wondering about doing an explicit test: if the XID being committed matches the one in the PGPROC, and nsubxids matches, and the actual list of XIDs matches, then apply the optimization. That could replace the logic that you've proposed to exclude non-commit cases, gxact cases, etc. and it seems fundamentally safer. But it might be a more expensive test, too, so I'm not sure.
>
> I think if the number of subxids is very small, let us say under 5 or so, then such a check might not matter, but otherwise it could be expensive.

We could find out by testing it. We could also restrict the optimization to cases with just a few subxids, because if you've got a large number of subxids this optimization probably isn't buying much anyway. We're trying to avoid grabbing CLogControlLock to do a very small amount of work, but if you've got 10 or 20 subxids we're doing as much work anyway as the group update optimization is attempting to put into one batch.

> So we have four ways to proceed:
> 1. Have this optimization for subtransactions and make it safe by having some additional conditions like a check for recovery and an explicit check that the actual transaction ids match the ids stored in proc.
> 2. Have this optimization when there are no subtransactions. In this case, we can have a very simple check for this optimization.
> 3. Drop this patch and idea.
> 4. Consider it for the next version.
>
> I personally think the second way is okay for this release, as that looks safe and gets us the maximum benefit we can achieve by this optimization, and then we can consider adding the optimization for subtransactions (the first way) in a future version if we think it is safe and gives us a benefit.
>
> Thoughts?

I don't like #2 very much. Restricting it to a relatively small number of transactions - whatever we can show doesn't hurt performance - seems OK, but restricting it to the exactly-zero-subtransactions case seems poor.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
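[The cap being discussed here can be pictured with a small sketch. This is an illustration only; the constant name and value below are assumptions, not something settled in the thread. The idea is simply that TransactionIdSetPageStatus would stop considering the group path once a single transaction already brings a sizeable batch of subxids of its own.]

    /*
     * Hypothetical guard in TransactionIdSetPageStatus (sketch, not the
     * actual patch): only small batches go through the group-update path.
     */
    #define CLOG_GROUP_UPDATE_MAX_SUBXIDS 4   /* assumed cutoff */

        if (all_xact_same_page &&
            nsubxids <= CLOG_GROUP_UPDATE_MAX_SUBXIDS &&
            !IsGXactActive())
        {
            /* try TransactionGroupUpdateXidStatus() first */
        }
        else
        {
            /* grab CLogControlLock and set the status bits directly */
        }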
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Sun, Mar 12, 2017 at 8:11 AM, Robert Haas wrote:
> On Fri, Mar 10, 2017 at 7:39 PM, Amit Kapila wrote:
>> I agree that more analysis can help us to decide if we can use subxids from PGPROC and if so under what conditions. Have you considered the other patch I have posted to fix the issue, which is to do this optimization only when subxids are not present? That patch removes the dependency of relying on subxids in PGPROC.
>
> Well, that's an option, but it narrows the scope of the optimization quite a bit. I think Simon previously opposed handling only the no-subxid cases (although I may be misremembering) and I'm not that keen about it either.
>
> I was wondering about doing an explicit test: if the XID being committed matches the one in the PGPROC, and nsubxids matches, and the actual list of XIDs matches, then apply the optimization. That could replace the logic that you've proposed to exclude non-commit cases, gxact cases, etc. and it seems fundamentally safer. But it might be a more expensive test, too, so I'm not sure.
>

I think if the number of subxids is very small, let us say under 5 or so, then such a check might not matter, but otherwise it could be expensive.

> It would be nice to get some other opinions on how (and whether) to proceed with this. I'm feeling really nervous about this right at the moment, because it seems like everybody including me missed some fairly critical points relating to the safety (or lack thereof) of this patch, and I want to make sure that if it gets committed again, we've really got everything nailed down tight.
>

I think the basic thing missing in the last patch was that we can't apply this optimization during WAL replay, because during recovery/hot standby the xids/subxids are tracked in KnownAssignedXids. The same is mentioned in the header comments in procarray.c and in GetSnapshotData (look at the else branch of the check if (!snapshot->takenDuringRecovery)). As far as I can see, the patch had considered that in the initial versions, but then the check got dropped in one of the later revisions by mistake. Patch version 5 [1] has the check for recovery, but during some code rearrangement it got dropped in version 6 [2].

Having said that, I think the improvement in the case where there are subtransactions will be smaller, because having subtransactions means more work under the LWLock and hence fewer context switches. This optimization is all about reducing frequent context switches, so even if we don't optimize the subtransaction case we are not leaving much on the table, and it will make this optimization much safer. To substantiate this theory with data, see the difference in performance when subtransactions are used [3] and when they are not used [4].

So we have four ways to proceed:
1. Have this optimization for subtransactions and make it safe by having some additional conditions like a check for recovery and an explicit check that the actual transaction ids match the ids stored in proc (a rough sketch of such a check follows this message).
2. Have this optimization when there are no subtransactions. In this case, we can have a very simple check for this optimization.
3. Drop this patch and idea.
4. Consider it for the next version.

I personally think the second way is okay for this release, as that looks safe and gets us the maximum benefit we can achieve by this optimization, and then we can consider adding the optimization for subtransactions (the first way) in a future version if we think it is safe and gives us a benefit.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAA4eK1KUVPxBcGTdOuKyvf5p1sQ0HeUbSMbTxtQc%3DP65OxiZog%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1L4iV-2qe7AyMVsb%2Bnz7SiX8JvCO%2BCqhXwaiXgm3CaBUw%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAFiTN-u3%3DXUi7z8dTOgxZ98E7gL1tzL%3Dq9Yd%3DCwWCtTtS6pOZw%40mail.gmail.com
[4] - https://www.postgresql.org/message-id/CAFiTN-u-XEzhd%3DhNGW586fmQwdTy6Qy6_SXe09tNB%3DgBcVzZ_A%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
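[Option 1 above can be pictured concretely; the following is only a rough sketch of the guard being described. It is not taken from any posted patch, and the exact placement is an assumption.]

        /*
         * Sketch of option 1: use the group mechanism only for a normal
         * top-level commit, never during recovery, and only when the xid and
         * subxids being written exactly match what this backend advertises
         * in shared memory (MyPgXact/MyProc).
         */
        if (!InRecovery &&
            status == TRANSACTION_STATUS_COMMITTED &&
            nsubxids == MyPgXact->nxids &&
            memcmp(subxids, MyProc->subxids.xids,
                   nsubxids * sizeof(TransactionId)) == 0)
        {
            /* safe to go through TransactionGroupUpdateXidStatus() */
        }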
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Mar 10, 2017 at 7:39 PM, Amit Kapila wrote:
> I agree that more analysis can help us to decide if we can use subxids from PGPROC and if so under what conditions. Have you considered the other patch I have posted to fix the issue, which is to do this optimization only when subxids are not present? That patch removes the dependency of relying on subxids in PGPROC.

Well, that's an option, but it narrows the scope of the optimization quite a bit. I think Simon previously opposed handling only the no-subxid cases (although I may be misremembering) and I'm not that keen about it either.

I was wondering about doing an explicit test: if the XID being committed matches the one in the PGPROC, and nsubxids matches, and the actual list of XIDs matches, then apply the optimization. That could replace the logic that you've proposed to exclude non-commit cases, gxact cases, etc. and it seems fundamentally safer. But it might be a more expensive test, too, so I'm not sure.

It would be nice to get some other opinions on how (and whether) to proceed with this. I'm feeling really nervous about this right at the moment, because it seems like everybody including me missed some fairly critical points relating to the safety (or lack thereof) of this patch, and I want to make sure that if it gets committed again, we've really got everything nailed down tight.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Sat, Mar 11, 2017 at 2:10 AM, Robert Haas wrote:
> On Fri, Mar 10, 2017 at 6:25 AM, Amit Kapila wrote:
>> On Fri, Mar 10, 2017 at 11:51 AM, Tom Lane wrote:
>>> Amit Kapila writes:
>>>> Just to let you know that I think I have figured out the reason for the failure. If we run the regressions with the attached patch, it will make the regression tests fail consistently in the same way. The patch just makes all transaction status updates go via the group clog update mechanism.
>>>
>>> This does *not* give me a warm fuzzy feeling that this patch was ready to commit. Or even that it was tested to the claimed degree.
>>>
>>
>> I think this is more of an implementation detail missed by me. We have done quite some performance/stress testing with different numbers of savepoints, but this could have been caught only by having Rollback to Savepoint followed by a commit. I agree that we could have devised some simple way (like the one I shared above) to run a wide range of tests with this new mechanism earlier. This is a learning from here and I will try to be more cautious about such things in the future.
>
> After some study, I don't feel confident that it's this simple. The underlying issue here is that TransactionGroupUpdateXidStatus thinks it can assume that proc->clogGroupMemberXid, pgxact->nxids, and proc->subxids.xids match the values that were passed to TransactionIdSetPageStatus, but that's not checked anywhere. For example, I thought about adding these assertions:
>
>    Assert(nsubxids == MyPgXact->nxids);
>    Assert(memcmp(subxids, MyProc->subxids.xids,
>                  nsubxids * sizeof(TransactionId)) == 0);
>
> There's not even a comment in the patch anywhere that notes that we're assuming this, let alone anything that checks that it's actually true, which seems worrying.
>
> One thing that seems off is that we have this new field clogGroupMemberXid, which we use to determine the XID that is being committed, but for the subxids we think it's going to be true in every case. Well, that seems a bit odd, right? I mean, if the contents of the PGXACT are a valid way to figure out the subxids that we need to worry about, then why not also use it to get the toplevel XID?
>
> Another point that's kind of bothering me is that this whole approach now seems to me to be an abstraction violation. It relies on the set of subxids for which we're setting status in clog matching the set of subxids advertised in PGPROC. But actually there's a fair amount of separation between those things. What's getting passed down to clog is coming from xact.c's transaction state stack, which is completely separate from the procarray. Now after going over the logic in some detail, it does look to me that you're correct that in the case of a toplevel commit they will always match, but in some sense that looks accidental.
>
> For example, look at this code from RecordTransactionAbort:
>
>     /*
>      * If we're aborting a subtransaction, we can immediately remove failed
>      * XIDs from PGPROC's cache of running child XIDs. We do that here for
>      * subxacts, because we already have the child XID array at hand. For
>      * main xacts, the equivalent happens just after this function returns.
>      */
>     if (isSubXact)
>         XidCacheRemoveRunningXids(xid, nchildren, children, latestXid);
>
> That code paints the removal of the aborted subxids from our PGPROC as an optimization, not a requirement for correctness. And without this patch, that's correct: the XIDs are advertised in PGPROC so that we construct correct snapshots, but they only need to be present there for so long as there is a possibility that those XIDs might in the future commit. Once they've aborted, it's not *necessary* for them to appear in PGPROC any more, but it doesn't hurt anything if they do. However, with this patch, removing them from PGPROC becomes a hard requirement, because otherwise the set of XIDs that are running according to the transaction state stack and the set that are running according to the PGPROC might be different. Yet, neither the original patch nor your proposed fix patch updated any of the comments here.
>

There was a comment in the existing code (proc.h) which states that it will contain non-aborted transactions. I agree that having it explicitly mentioned in the patch would have been much better.

/*
 * Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
 * for non-aborted subtransactions of its current top transaction. These
 * have to be treated as running XIDs by other backends.

> One might wonder whether it's even wise to tie these things together too closely. For example, you can imagine a future patch for autonomous transactions stashing their XIDs in the subxids array. That'd be fine for snapshot purposes, but it would break this.
>
> Finally, I had an unexplained hang during
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Robert Haas wrote:
> The smoking gun was in 009_twophase_slave.log:
>
> TRAP: FailedAssertion("!(nsubxids == MyPgXact->nxids)", File: "clog.c", Line: 288)
>
> ...and then the node shuts down, which is why this hangs forever. (Also... what's up with it hanging forever instead of timing out or failing or something?)

This bit me while messing with 2PC tests recently. I think it'd be worth doing something about this, such as causing the test to die if we request a server to (re)start and it doesn't start or it immediately crashes. This doesn't solve the problem of a server crashing at a point not immediately after start, though. (It'd be very annoying to have to sprinkle the Perl test code with "assert $server->islive", but perhaps we can add assertions of some kind in PostgresNode itself.)

--
Álvaro Herrera
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Mar 10, 2017 at 3:40 PM, Robert Haas wrote:
> Finally, I had an unexplained hang during the TAP tests while testing out your fix patch. I haven't been able to reproduce that so it might've just been an artifact of something stupid I did, or of some unrelated bug, but I think it's best to back up and reconsider a bit here.

I was able to reproduce this with the following patch:

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index bff42dc..0546425 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -268,9 +268,11 @@ set_status_by_pages(int nsubxids, TransactionId *subxids,
  * has a race condition (see TransactionGroupUpdateXidStatus) but the
  * worst thing that happens if we mess up is a small loss of efficiency;
  * the intent is to avoid having the leader access pages it wouldn't
- * otherwise need to touch.  Finally, we skip it for prepared transactions,
- * which don't have the semaphore we would need for this optimization,
- * and which are anyway probably not all that common.
+ * otherwise need to touch.  We also skip it if the transaction status is
+ * other than commit, because for rollback and rollback to savepoint, the
+ * list of subxids won't be same as subxids array in PGPROC. Finally, we skip
+ * it for prepared transactions, which don't have the semaphore we would need
+ * for this optimization, and which are anyway probably not all that common.
  */
 static void
 TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
@@ -280,15 +282,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 {
 	if (all_xact_same_page && nsubxids < PGPROC_MAX_CACHED_SUBXIDS &&
+		status == TRANSACTION_STATUS_COMMITTED &&
 		!IsGXactActive())
 	{
+		Assert(nsubxids == MyPgXact->nxids);
+		Assert(memcmp(subxids, MyProc->subxids.xids,
+					  nsubxids * sizeof(TransactionId)) == 0);
+
 		/*
 		 * If we can immediately acquire CLogControlLock, we update the status
 		 * of our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(CLogControlLock, LW_EXCLUSIVE))
+		if (false && LWLockConditionalAcquire(CLogControlLock, LW_EXCLUSIVE))
 		{
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status, lsn, pageno);
 			LWLockRelease(CLogControlLock);

make check-world hung here:

t/009_twophase.pl ..
1..13
ok 1 - Commit prepared transaction after restart
ok 2 - Rollback prepared transaction after restart

[rhaas pgsql]$ ps uxww | grep postgres
rhaas 72255 0.0 0.0 2447996 1684 s000 S+ 3:40PM 0:00.00 /Users/rhaas/pgsql/tmp_install/Users/rhaas/install/dev/bin/psql -XAtq -d port=64230 host=/var/folders/y8/r2ycj_jj2vd65v71rmyddpr4gn/T/ZVWy0JGbuw dbname='postgres' -f - -v ON_ERROR_STOP=1
rhaas 72253 0.0 0.0 2478532 1548 ?? Ss 3:40PM 0:00.00 postgres: bgworker: logical replication launcher
rhaas 72252 0.0 0.0 2483132 740 ?? Ss 3:40PM 0:00.05 postgres: stats collector process
rhaas 72251 0.0 0.0 2486724 1952 ?? Ss 3:40PM 0:00.02 postgres: autovacuum launcher process
rhaas 72250 0.0 0.0 2477508 880 ?? Ss 3:40PM 0:00.03 postgres: wal writer process
rhaas 72249 0.0 0.0 2477508 972 ?? Ss 3:40PM 0:00.03 postgres: writer process
rhaas 72248 0.0 0.0 2477508 1252 ?? Ss 3:40PM 0:00.00 postgres: checkpointer process
rhaas 72246 0.0 0.0 2481604 5076 s000 S+ 3:40PM 0:00.03 /Users/rhaas/pgsql/tmp_install/Users/rhaas/install/dev/bin/postgres -D /Users/rhaas/pgsql/src/test/recovery/tmp_check/data_master_Ylq1/pgdata
rhaas 72337 0.0 0.0 2433796 688 s002 S+ 4:14PM 0:00.00 grep postgres
rhaas 72256 0.0 0.0 2478920 2984 ?? Ss 3:40PM 0:00.00 postgres: rhaas postgres [local] COMMIT PREPARED waiting for 0/301D0D0

Backtrace of PID 72256:

#0 0x7fff8ecc85c2 in poll ()
#1 0x0001078eb727 in WaitEventSetWaitBlock [inlined] () at /Users/rhaas/pgsql/src/backend/storage/ipc/latch.c:1118
#2 0x0001078eb727 in WaitEventSetWait (set=0x7fab3c8366c8, timeout=-1, occurred_events=0x7fff585e5410, nevents=1, wait_event_info=) at latch.c:949
#3 0x0001078eb409 in WaitLatchOrSocket (latch=, wakeEvents=, sock=-1, timeout=, wait_event_info=134217741) at latch.c:349
#4 0x0001078cf077 in SyncRepWaitForLSN (lsn=, commit=) at syncrep.c:284
#5 0x0001076a2dab in FinishPreparedTransaction (gid=, isCommit=1 '\001') at twophase.c:2110
#6 0x000107919420 in standard_ProcessUtility (pstmt=, queryString=, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x7fab3c853cf8, completionTag=) at utility.c:452
#7 0x0001079186f3 in PortalRunUtility (portal=0x7fab3c874a40, pstmt=0x7fab3c853c00, isTopLeve
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Mar 10, 2017 at 6:25 AM, Amit Kapila wrote:
> On Fri, Mar 10, 2017 at 11:51 AM, Tom Lane wrote:
>> Amit Kapila writes:
>>> Just to let you know that I think I have figured out the reason for the failure. If we run the regressions with the attached patch, it will make the regression tests fail consistently in the same way. The patch just makes all transaction status updates go via the group clog update mechanism.
>>
>> This does *not* give me a warm fuzzy feeling that this patch was ready to commit. Or even that it was tested to the claimed degree.
>>
>
> I think this is more of an implementation detail missed by me. We have done quite some performance/stress testing with different numbers of savepoints, but this could have been caught only by having Rollback to Savepoint followed by a commit. I agree that we could have devised some simple way (like the one I shared above) to run a wide range of tests with this new mechanism earlier. This is a learning from here and I will try to be more cautious about such things in the future.

After some study, I don't feel confident that it's this simple. The underlying issue here is that TransactionGroupUpdateXidStatus thinks it can assume that proc->clogGroupMemberXid, pgxact->nxids, and proc->subxids.xids match the values that were passed to TransactionIdSetPageStatus, but that's not checked anywhere. For example, I thought about adding these assertions:

    Assert(nsubxids == MyPgXact->nxids);
    Assert(memcmp(subxids, MyProc->subxids.xids,
                  nsubxids * sizeof(TransactionId)) == 0);

There's not even a comment in the patch anywhere that notes that we're assuming this, let alone anything that checks that it's actually true, which seems worrying.

One thing that seems off is that we have this new field clogGroupMemberXid, which we use to determine the XID that is being committed, but for the subxids we think it's going to be true in every case. Well, that seems a bit odd, right? I mean, if the contents of the PGXACT are a valid way to figure out the subxids that we need to worry about, then why not also use it to get the toplevel XID?

Another point that's kind of bothering me is that this whole approach now seems to me to be an abstraction violation. It relies on the set of subxids for which we're setting status in clog matching the set of subxids advertised in PGPROC. But actually there's a fair amount of separation between those things. What's getting passed down to clog is coming from xact.c's transaction state stack, which is completely separate from the procarray. Now after going over the logic in some detail, it does look to me that you're correct that in the case of a toplevel commit they will always match, but in some sense that looks accidental.

For example, look at this code from RecordTransactionAbort:

    /*
     * If we're aborting a subtransaction, we can immediately remove failed
     * XIDs from PGPROC's cache of running child XIDs. We do that here for
     * subxacts, because we already have the child XID array at hand. For
     * main xacts, the equivalent happens just after this function returns.
     */
    if (isSubXact)
        XidCacheRemoveRunningXids(xid, nchildren, children, latestXid);

That code paints the removal of the aborted subxids from our PGPROC as an optimization, not a requirement for correctness. And without this patch, that's correct: the XIDs are advertised in PGPROC so that we construct correct snapshots, but they only need to be present there for so long as there is a possibility that those XIDs might in the future commit. Once they've aborted, it's not *necessary* for them to appear in PGPROC any more, but it doesn't hurt anything if they do. However, with this patch, removing them from PGPROC becomes a hard requirement, because otherwise the set of XIDs that are running according to the transaction state stack and the set that are running according to the PGPROC might be different. Yet, neither the original patch nor your proposed fix patch updated any of the comments here.

One might wonder whether it's even wise to tie these things together too closely. For example, you can imagine a future patch for autonomous transactions stashing their XIDs in the subxids array. That'd be fine for snapshot purposes, but it would break this.

Finally, I had an unexplained hang during the TAP tests while testing out your fix patch. I haven't been able to reproduce that so it might've just been an artifact of something stupid I did, or of some unrelated bug, but I think it's best to back up and reconsider a bit here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Mar 10, 2017 at 11:51 AM, Tom Lane wrote:
> Amit Kapila writes:
>> Just to let you know that I think I have figured out the reason for the failure. If we run the regressions with the attached patch, it will make the regression tests fail consistently in the same way. The patch just makes all transaction status updates go via the group clog update mechanism.
>
> This does *not* give me a warm fuzzy feeling that this patch was ready to commit. Or even that it was tested to the claimed degree.
>

I think this is more of an implementation detail missed by me. We have done quite some performance/stress testing with different numbers of savepoints, but this could have been caught only by having Rollback to Savepoint followed by a commit. I agree that we could have devised some simple way (like the one I shared above) to run a wide range of tests with this new mechanism earlier. This is a learning from here and I will try to be more cautious about such things in the future.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Mar 10, 2017 at 11:43 AM, Amit Kapila wrote:
> On Fri, Mar 10, 2017 at 10:51 AM, Tom Lane wrote:
>>
>> Also, I see clam reported in green just now, so it's not 100% reproducible :-(
>>
>
> Just to let you know that I think I have figured out the reason for the failure. If we run the regressions with the attached patch, it will make the regression tests fail consistently in the same way. The patch just makes all transaction status updates go via the group clog update mechanism. Now, the reason for the problem is that the patch relied on the XidCache in PGPROC for subtransactions when they have not overflowed, which is okay for commits, but not for Rollback to Savepoint and Rollback. For Rollback to Savepoint, we just pass the particular (sub)transaction id to abort, but the group mechanism will abort all the sub-transactions in that top transaction. I am still analysing what could be the best way to fix this issue. I think there could be multiple ways to fix this problem. One way is that we can advertise the fact that the status update for the transaction involves subtransactions and then use the xid cache for actually processing the status update. Second is to advertise all the subtransaction ids for which the status needs to be updated, but I am sure that is not at all efficient, as it will consume a lot of memory. The last resort could be that we don't use the group clog update optimization when the transaction has sub-transactions.
>

On further analysis, I don't think the first way mentioned above can work for Rollback To Savepoint, because it can pass just a subset of sub-transactions, in which case we can never identify it by looking at the subxids in PGPROC unless we advertise all such subxids. The case I am talking about is something like:

Begin;
Savepoint one;
Insert ...
Savepoint two;
Insert ..
Savepoint three;
Insert ...
Rollback to Savepoint two;

Now, for Rollback to Savepoint two, we pass the transaction ids corresponding to Savepoints three and two. So, I think we can apply this optimization only for transactions that commit, which will anyway be the most common use case. Another alternative, as mentioned above, is to do this optimization when there are no subtransactions involved. The two attached patches implement these two approaches (fix_clog_group_commit_opt_v1.patch allows the optimization only for commits; fix_clog_group_commit_opt_v2.patch allows the optimization only for transaction status updates that don't involve subxids). I think the first approach is a better way to deal with this; let me know your thoughts.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

fix_clog_group_commit_opt_v1.patch
Description: Binary data

fix_clog_group_commit_opt_v2.patch
Description: Binary data
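[To make the difference between the two fixes concrete, here is a rough sketch of the guard each one would place in TransactionIdSetPageStatus. The shapes are inferred from the descriptions above; the actual patches may differ in detail.]

        /* v1: only ordinary commits may use the group-update mechanism */
        if (all_xact_same_page &&
            nsubxids < PGPROC_MAX_CACHED_SUBXIDS &&
            status == TRANSACTION_STATUS_COMMITTED &&
            !IsGXactActive())
        {
            /* try the group update */
        }

        /* v2: only status updates that involve no subtransactions at all */
        if (all_xact_same_page &&
            nsubxids == 0 &&
            !IsGXactActive())
        {
            /* try the group update */
        }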
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Amit Kapila writes:
> Just to let you know that I think I have figured out the reason for the failure. If we run the regressions with the attached patch, it will make the regression tests fail consistently in the same way. The patch just makes all transaction status updates go via the group clog update mechanism.

This does *not* give me a warm fuzzy feeling that this patch was ready to commit. Or even that it was tested to the claimed degree.

			regards, tom lane
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Mar 10, 2017 at 10:51 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Mar 9, 2017 at 9:17 PM, Tom Lane wrote:
>>> Buildfarm thinks eight wasn't enough.
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01
>
>> At first I was confused how you knew that this was the fault of this patch, but this seems like a pretty good indicator:
>> TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 574)
>
> Yeah, that's what led me to blame the clog-group-update patch.
>
>> I'm not sure whether it's related to this problem or not, but now that I look at it, this (preexisting) comment looks like entirely wishful thinking:
>> * If we update more than one xid on this page while it is being written
>> * out, we might find that some of the bits go to disk and others don't.
>> * If we are updating commits on the page with the top-level xid that
>> * could break atomicity, so we subcommit the subxids first before we mark
>> * the top-level commit.
>
> Maybe, but that comment dates to 2008 according to git, and clam has been, er, happy as a clam up to now. My money is on a newly-introduced memory-access-ordering bug.
>
> Also, I see clam reported in green just now, so it's not 100% reproducible :-(
>

Just to let you know that I think I have figured out the reason for the failure. If we run the regressions with the attached patch, it will make the regression tests fail consistently in the same way. The patch just makes all transaction status updates go via the group clog update mechanism. Now, the reason for the problem is that the patch relied on the XidCache in PGPROC for subtransactions when they have not overflowed, which is okay for commits, but not for Rollback to Savepoint and Rollback. For Rollback to Savepoint, we just pass the particular (sub)transaction id to abort, but the group mechanism will abort all the sub-transactions in that top transaction. I am still analysing what could be the best way to fix this issue.

I think there could be multiple ways to fix this problem. One way is that we can advertise the fact that the status update for the transaction involves subtransactions and then use the xid cache for actually processing the status update. Second is to advertise all the subtransaction ids for which the status needs to be updated, but I am sure that is not at all efficient, as it will consume a lot of memory. The last resort could be that we don't use the group clog update optimization when the transaction has sub-transactions.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

force_clog_group_commit_v1.patch
Description: Binary data
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Robert Haas writes:
> On Thu, Mar 9, 2017 at 9:17 PM, Tom Lane wrote:
>> Buildfarm thinks eight wasn't enough.
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01

> At first I was confused how you knew that this was the fault of this patch, but this seems like a pretty good indicator:
> TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 574)

Yeah, that's what led me to blame the clog-group-update patch.

> I'm not sure whether it's related to this problem or not, but now that I look at it, this (preexisting) comment looks like entirely wishful thinking:
> * If we update more than one xid on this page while it is being written
> * out, we might find that some of the bits go to disk and others don't.
> * If we are updating commits on the page with the top-level xid that
> * could break atomicity, so we subcommit the subxids first before we mark
> * the top-level commit.

Maybe, but that comment dates to 2008 according to git, and clam has been, er, happy as a clam up to now. My money is on a newly-introduced memory-access-ordering bug.

Also, I see clam reported in green just now, so it's not 100% reproducible :-(

			regards, tom lane
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Mar 9, 2017 at 9:17 PM, Tom Lane wrote:
> Robert Haas writes:
>> I think eight is enough. Committed with some cosmetic changes.
>
> Buildfarm thinks eight wasn't enough.
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01

At first I was confused how you knew that this was the fault of this patch, but this seems like a pretty good indicator:

TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 574)

I'm not sure whether it's related to this problem or not, but now that I look at it, this (preexisting) comment looks like entirely wishful thinking:

 * If we update more than one xid on this page while it is being written
 * out, we might find that some of the bits go to disk and others don't.
 * If we are updating commits on the page with the top-level xid that
 * could break atomicity, so we subcommit the subxids first before we mark
 * the top-level commit.

The problem with that is the word "before". There are no memory barriers here, so there's zero guarantee that other processes see the writes in the order they're performed here. But it might be a stretch to suppose that that would cause this symptom. Maybe we should replace that Assert() with an elog() and dump out the actual values.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
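[A purely illustrative aside on the "before" concern above, not something proposed in the thread: if the sub-commit bits really had to be visible to other backends before the top-level commit bit, one way to enforce that ordering would be an explicit write barrier between the two groups of stores, roughly as follows.]

        /*
         * Hypothetical ordering fix (illustration only): publish the
         * SUB_COMMITTED bits before the top-level COMMITTED bit.
         */
        for (i = 0; i < nsubxids; i++)
            TransactionIdSetStatusBit(subxids[i],
                                      TRANSACTION_STATUS_SUB_COMMITTED,
                                      lsn, slotno);

        pg_write_barrier();

        TransactionIdSetStatusBit(xid, TRANSACTION_STATUS_COMMITTED, lsn, slotno);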
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Mar 10, 2017 at 7:47 AM, Tom Lane wrote:
> Robert Haas writes:
>> I think eight is enough. Committed with some cosmetic changes.
>
> Buildfarm thinks eight wasn't enough.
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01
>

Will look into this. I don't have access to that machine, but it looks to be a POWER machine and I have access to a somewhat similar one.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Robert Haas writes:
> I think eight is enough. Committed with some cosmetic changes.

Buildfarm thinks eight wasn't enough.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01

			regards, tom lane
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Tue, Jan 31, 2017 at 11:35 PM, Michael Paquier wrote:
>> Thanks for the review.
>
> Moved to CF 2017-03, the 8th commit fest of this patch.

I think eight is enough. Committed with some cosmetic changes.

I think the turning point for this somewhat-troubled patch was when we realized that, while results were somewhat mixed on whether it improved performance, wait event monitoring showed that it definitely reduced contention significantly. However, I just realized that in both this case and in the case of group XID clearing, we weren't advertising a wait event for the PGSemaphoreLock calls that are part of the group locking machinery. I think we should fix that, because a quick test shows that it can happen fairly often -- not, I think, as often as we would have seen LWLock waits without these patches, but often enough that you'll want to know. Patch attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

group-update-waits-v1.patch
Description: Binary data
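[The attached patch is not reproduced here, but the idea can be sketched roughly as follows; the placement and the exact wait-event name are assumptions. The point is to report a wait event around the semaphore sleep that a group member performs while waiting for the leader, so the time shows up in pg_stat_activity instead of being invisible.]

        /* Sleep until the leader updates our XID status. */
        pgstat_report_wait_start(WAIT_EVENT_CLOG_GROUP_UPDATE); /* assumed event */
        for (;;)
        {
            /* acts as a read barrier */
            PGSemaphoreLock(proc->sem);
            if (!proc->clogGroupMember)
                break;
            extraWaits++;
        }
        pgstat_report_wait_end();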
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Tue, Jan 17, 2017 at 9:18 PM, Amit Kapila wrote:
> On Tue, Jan 17, 2017 at 11:39 AM, Dilip Kumar wrote:
>> On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar wrote:
>>> I have reviewed the latest patch and I don't have any more comments. So if there is no objection from other reviewers I can move it to "Ready For Committer"?
>>
>> Seeing no objections, I have moved it to Ready For Committer.
>>
>
> Thanks for the review.

Moved to CF 2017-03, the 8th commit fest of this patch.

--
Michael
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Tue, Jan 17, 2017 at 11:39 AM, Dilip Kumar wrote:
> On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar wrote:
>> I have reviewed the latest patch and I don't have any more comments. So if there is no objection from other reviewers I can move it to "Ready For Committer"?
>
> Seeing no objections, I have moved it to Ready For Committer.
>

Thanks for the review.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar wrote: > I have reviewed the latest patch and I don't have any more comments. > So if there is no objection from other reviewers I can move it to > "Ready For Committer"? Seeing no objections, I have moved it to Ready For Committer. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Sat, Dec 31, 2016 at 9:01 AM, Amit Kapila wrote: > Agreed and changed accordingly. > >> 2. It seems that we have missed one unlock in case of absorbed >> wakeups. You have initialised extraWaits with -1 and if there is one >> extra wake up then extraWaits will become 0 (it means we have made one >> extra call to PGSemaphoreLock and it's our responsibility to fix it as >> the leader will Unlock only once). But it appear in such case we will >> not make any call to PGSemaphoreUnlock. >> > > Good catch! I have fixed it by initialising extraWaits to 0. This > same issue exists from Group clear xid for which I will send a patch > separately. > > Apart from above, the patch needs to be adjusted for commit be7b2848 > which has changed the definition of PGSemaphore. I have reviewed the latest patch and I don't have any more comments. So if there is no objection from other reviewers I can move it to "Ready For Committer"? I have performed one more test at scale factor 3000, because previously I had tested only up to scale factor 1000. The purpose of this test is to check whether there is any regression at a higher scale factor.

Machine: Intel 8-socket machine
Scale factor: 3000
Shared buffers: 8GB
Test: pgbench RW test
Run: 30 mins, median of 3
Other modified GUCs: -N 300 -c min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c checkpoint_completion_target=0.9

Summary:
- Did not observe any regression.
- The performance gain is in sync with what we have observed with other tests at lower scale factors.

Sync_Commit_Off:
client   Head    Patch
8        10065   10009
16       18487   18826
32       28167   28057
64       26655   28712
128      20152   24917
256      16740   22891

Sync_Commit_On:
client   Head    Patch
8        5102    5110
16       8087    8282
32       12523   12548
64       14701   15112
128      14656   15238
256      13421   16424

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Dec 29, 2016 at 10:41 AM, Dilip Kumar wrote: > > I have done one more pass of the review today. I have few comments. > > + if (nextidx != INVALID_PGPROCNO) > + { > + /* Sleep until the leader updates our XID status. */ > + for (;;) > + { > + /* acts as a read barrier */ > + PGSemaphoreLock(&proc->sem); > + if (!proc->clogGroupMember) > + break; > + extraWaits++; > + } > + > + Assert(pg_atomic_read_u32(&proc->clogGroupNext) == INVALID_PGPROCNO); > + > + /* Fix semaphore count for any absorbed wakeups */ > + while (extraWaits-- > 0) > + PGSemaphoreUnlock(&proc->sem); > + return true; > + } > > 1. extraWaits is used only locally in this block so I guess we can > declare inside this block only. > Agreed and changed accordingly. > 2. It seems that we have missed one unlock in case of absorbed > wakeups. You have initialised extraWaits with -1 and if there is one > extra wake up then extraWaits will become 0 (it means we have made one > extra call to PGSemaphoreLock and it's our responsibility to fix it as > the leader will Unlock only once). But it appear in such case we will > not make any call to PGSemaphoreUnlock. > Good catch! I have fixed it by initialising extraWaits to 0. This same issue exists from Group clear xid for which I will send a patch separately. Apart from above, the patch needs to be adjusted for commit be7b2848 which has changed the definition of PGSemaphore. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com group_update_clog_v10.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
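[Putting the quoted hunks and the fix together, the corrected follower-side wait loop looks roughly like this. This is a sketch assembled from the snippets in this thread, still using the pre-be7b2848 &proc->sem form quoted above; it is not the v10 patch verbatim.]

    int extraWaits = 0;     /* count of absorbed wakeups that must be repaid */

    if (nextidx != INVALID_PGPROCNO)
    {
        /* Sleep until the leader updates our XID status. */
        for (;;)
        {
            /* acts as a read barrier */
            PGSemaphoreLock(&proc->sem);
            if (!proc->clogGroupMember)
                break;          /* this wakeup came from the leader */
            extraWaits++;       /* wakeup intended for something else */
        }

        Assert(pg_atomic_read_u32(&proc->clogGroupNext) == INVALID_PGPROCNO);

        /* Fix semaphore count for any absorbed wakeups */
        while (extraWaits-- > 0)
            PGSemaphoreUnlock(&proc->sem);
        return true;
    }

[With extraWaits starting at 0, n absorbed wakeups leave extraWaits == n after the loop, so exactly n PGSemaphoreUnlock calls are issued; with the earlier -1 initialisation one repayment was always lost, which is the bug discussed above.]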
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Dec 23, 2016 at 8:28 AM, Amit Kapila wrote: > The results look positive. Do you think we can conclude based on all > the tests you and Dilip have done, that we can move forward with this > patch (in particular group-update) or do you still want to do more > tests? I am aware that in one of the tests we have observed that > reducing contention on CLOGControlLock has increased the contention on > WALWriteLock, but I feel we can leave that point as a note to > committer and let him take a final call. From the code perspective > already Robert and Andres have taken one pass of review and I have > addressed all their comments, so surely more review of code can help, > but I think that is not a big deal considering patch size is > relatively small. I have done one more pass of the review today. I have few comments. + if (nextidx != INVALID_PGPROCNO) + { + /* Sleep until the leader updates our XID status. */ + for (;;) + { + /* acts as a read barrier */ + PGSemaphoreLock(&proc->sem); + if (!proc->clogGroupMember) + break; + extraWaits++; + } + + Assert(pg_atomic_read_u32(&proc->clogGroupNext) == INVALID_PGPROCNO); + + /* Fix semaphore count for any absorbed wakeups */ + while (extraWaits-- > 0) + PGSemaphoreUnlock(&proc->sem); + return true; + } 1. extraWaits is used only locally in this block so I guess we can declare inside this block only. 2. It seems that we have missed one unlock in case of absorbed wakeups. You have initialised extraWaits with -1 and if there is one extra wake up then extraWaits will become 0 (it means we have made one extra call to PGSemaphoreLock and it's our responsibility to fix it as the leader will Unlock only once). But it appear in such case we will not make any call to PGSemaphoreUnlock. Am I missing something? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 12/23/2016 03:58 AM, Amit Kapila wrote: On Thu, Dec 22, 2016 at 6:59 PM, Tomas Vondra wrote: Hi, But as discussed with Amit in Tokyo at pgconf.asia, I got access to a Power8e machine (IBM 8247-22L to be precise). It's a much smaller machine compared to the x86 one, though - it only has 24 cores in 2 sockets, 128GB of RAM and less powerful storage, for example. I've repeated a subset of x86 tests and pushed them to https://bitbucket.org/tvondra/power8-results-2 The new results are prefixed with "power-" and I've tried to put them right next to the "same" x86 tests. In all cases the patches significantly reduce the contention on CLogControlLock, just like on x86. Which is good and expected. The results look positive. Do you think we can conclude based on all the tests you and Dilip have done, that we can move forward with this patch (in particular group-update) or do you still want to do more tests? I am aware that in one of the tests we have observed that reducing contention on CLOGControlLock has increased the contention on WALWriteLock, but I feel we can leave that point as a note to committer and let him take a final call. From the code perspective already Robert and Andres have taken one pass of review and I have addressed all their comments, so surely more review of code can help, but I think that is not a big deal considering patch size is relatively small. Yes, I believe that seems like a reasonable conclusion. I've done a few more tests on the Power machine with data placed on a tmpfs filesystem (to minimize all the I/O overhead), but the results are the same. I don't think more testing is needed at this point, at lest not with the synthetic test cases we've been using for the testing. The patch already received way more benchmarking than most other patches. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Dec 22, 2016 at 6:59 PM, Tomas Vondra wrote: > Hi, > > But as discussed with Amit in Tokyo at pgconf.asia, I got access to a > Power8e machine (IBM 8247-22L to be precise). It's a much smaller machine > compared to the x86 one, though - it only has 24 cores in 2 sockets, 128GB > of RAM and less powerful storage, for example. > > I've repeated a subset of x86 tests and pushed them to > > https://bitbucket.org/tvondra/power8-results-2 > > The new results are prefixed with "power-" and I've tried to put them right > next to the "same" x86 tests. > > In all cases the patches significantly reduce the contention on > CLogControlLock, just like on x86. Which is good and expected. > The results look positive. Do you think we can conclude based on all the tests you and Dilip have done, that we can move forward with this patch (in particular group-update) or do you still want to do more tests? I am aware that in one of the tests we have observed that reducing contention on CLOGControlLock has increased the contention on WALWriteLock, but I feel we can leave that point as a note to committer and let him take a final call. From the code perspective already Robert and Andres have taken one pass of review and I have addressed all their comments, so surely more review of code can help, but I think that is not a big deal considering patch size is relatively small. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Hi, The attached results show that: (a) master shows the same zig-zag behavior - No idea why this wasn't observed on the previous runs. (b) group_update actually seems to improve the situation, because the performance keeps stable up to 72 clients, while on master the fluctuation starts way earlier. I'll redo the tests with a newer kernel - this was on 3.10.x which is what Red Hat 7.2 uses, I'll try on 4.8.6. Then I'll try with the patches you submitted, if the 4.8.6 kernel does not help. Overall, I'm convinced this issue is unrelated to the patches. I've been unable to rerun the tests on this hardware with a newer kernel, so nothing new on the x86 front. But as discussed with Amit in Tokyo at pgconf.asia, I got access to a Power8e machine (IBM 8247-22L to be precise). It's a much smaller machine compared to the x86 one, though - it only has 24 cores in 2 sockets, 128GB of RAM and less powerful storage, for example. I've repeated a subset of x86 tests and pushed them to https://bitbucket.org/tvondra/power8-results-2 The new results are prefixed with "power-" and I've tried to put them right next to the "same" x86 tests. In all cases the patches significantly reduce the contention on CLogControlLock, just like on x86. Which is good and expected. Otherwise the results are rather boring - no major regressions compared to master, and all the patches perform almost exactly the same. Compare for example this: * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync * http://tvondra.bitbucket.org/#power-dilip-300-unlogged-sync So the results seem much smoother compared to x86, and the performance difference is roughly 3x, which matches the 24 vs. 72 cores. For pgbench, the difference is much more significant, though: * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip * http://tvondra.bitbucket.org/#power-pgbench-300-unlogged-sync-skip So, we're doing ~40k on Power8, but 220k on x86 (which is ~6x more, so double per-core throughput). My first guess was that this is due to the x86 machine having better I/O subsystem, so I've reran the tests with data directory in tmpfs, but that produced almost the same results. Of course, this observation is unrelated to this patch. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Dec 5, 2016 at 1:14 PM, Amit Kapila wrote: > On Mon, Dec 5, 2016 at 6:00 AM, Haribabu Kommi > wrote: > > No, that is not true. You have quoted the wrong message, that > discussion was about WALWriteLock contention not about the patch being > discussed in this thread. I have posted the latest set of patches > here [1]. Tomas is supposed to share the results of his tests. He > mentioned to me in PGConf Asia last week that he ran few tests on > Power Box, so let us wait for him to share his findings. > > > Moved to next CF with "waiting on author" status. Please feel free to > > update the status if the current status differs with the actual patch > > status. > > > > I think we should keep the status as "Needs Review". > > [1] - https://www.postgresql.org/message-id/CAA4eK1JjatUZu0% > 2BHCi%3D5VM1q-hFgN_OhegPAwEUJqxf-7pESbg%40mail.gmail.com Thanks for the update. I changed the status to "needs review" in 2017-01 commitfest. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Dec 5, 2016 at 6:00 AM, Haribabu Kommi wrote: > > > On Fri, Nov 4, 2016 at 8:20 PM, Amit Kapila wrote: >> >> On Thu, Nov 3, 2016 at 8:38 PM, Robert Haas wrote: >> > On Tue, Nov 1, 2016 at 11:31 PM, Tomas Vondra >> >> The difference is that both the fast-path locks and msgNumLock went >> >> into >> >> 9.2, so that end users probably never saw that regression. But we don't >> >> know >> >> if that happens for clog and WAL. >> >> >> >> Perhaps you have a working patch addressing the WAL contention, so that >> >> we >> >> could see how that changes the results? >> > >> > I don't think we do, yet. >> > >> >> Right. At this stage, we are just evaluating the ways (basic idea is >> to split the OS writes and Flush requests in separate locks) to reduce >> it. It is difficult to speculate results at this stage. I think >> after spending some more time (probably few weeks), we will be in >> position to share our findings. >> > > As per my understanding the current state of the patch is waiting for the > performance results from author. > No, that is not true. You have quoted the wrong message, that discussion was about WALWriteLock contention not about the patch being discussed in this thread. I have posted the latest set of patches here [1]. Tomas is supposed to share the results of his tests. He mentioned to me in PGConf Asia last week that he ran few tests on Power Box, so let us wait for him to share his findings. > Moved to next CF with "waiting on author" status. Please feel free to > update the status if the current status differs with the actual patch > status. > I think we should keep the status as "Needs Review". [1] - https://www.postgresql.org/message-id/CAA4eK1JjatUZu0%2BHCi%3D5VM1q-hFgN_OhegPAwEUJqxf-7pESbg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Nov 4, 2016 at 8:20 PM, Amit Kapila wrote: > On Thu, Nov 3, 2016 at 8:38 PM, Robert Haas wrote: > > On Tue, Nov 1, 2016 at 11:31 PM, Tomas Vondra > >> The difference is that both the fast-path locks and msgNumLock went into > >> 9.2, so that end users probably never saw that regression. But we don't > know > >> if that happens for clog and WAL. > >> > >> Perhaps you have a working patch addressing the WAL contention, so that > we > >> could see how that changes the results? > > > > I don't think we do, yet. > > > > Right. At this stage, we are just evaluating the ways (basic idea is > to split the OS writes and Flush requests in separate locks) to reduce > it. It is difficult to speculate results at this stage. I think > after spending some more time (probably few weeks), we will be in > position to share our findings. > > As per my understanding the current state of the patch is waiting for the performance results from author. Moved to next CF with "waiting on author" status. Please feel free to update the status if the current status differs with the actual patch status. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Nov 3, 2016 at 8:38 PM, Robert Haas wrote: > On Tue, Nov 1, 2016 at 11:31 PM, Tomas Vondra >> The difference is that both the fast-path locks and msgNumLock went into >> 9.2, so that end users probably never saw that regression. But we don't know >> if that happens for clog and WAL. >> >> Perhaps you have a working patch addressing the WAL contention, so that we >> could see how that changes the results? > > I don't think we do, yet. > Right. At this stage, we are just evaluating the ways (basic idea is to split the OS writes and Flush requests in separate locks) to reduce it. It is difficult to speculate results at this stage. I think after spending some more time (probably few weeks), we will be in position to share our findings. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Tue, Nov 1, 2016 at 11:31 PM, Tomas Vondra wrote: > I don't think I've suggested not committing any of the clog patches (or > other patches in general) because shifting the contention somewhere else > might cause regressions. At the end of the last CF I've however stated that > we need to better understand the impact on various wokloads, and I think > Amit agreed with that conclusion. > > We have that understanding now, I believe - also thanks to your idea of > sampling wait events data. > > You're right we can't fix all the contention points in one patch, and that > shifting the contention may cause regressions. But we should at least > understand what workloads might be impacted, how serious the regressions may > get etc. Which is why all the testing was done. OK. > Sure, I understand that. My main worry was that people will get worse > performance with the next major version that what they get now (assuming we > don't manage to address the other contention points). Which is difficult to > explain to users & customers, no matter how reasonable it seems to us. > > The difference is that both the fast-path locks and msgNumLock went into > 9.2, so that end users probably never saw that regression. But we don't know > if that happens for clog and WAL. > > Perhaps you have a working patch addressing the WAL contention, so that we > could see how that changes the results? I don't think we do, yet. Amit or Kuntal might know more. At some level I think we're just hitting the limits of the hardware's ability to lay bytes on a platter, and fine-tuning the locking may not help much. > I might be wrong, but I doubt the kernel guys are running particularly wide > set of tests, so how likely is it they will notice issues with specific > workloads? Wouldn't it be great if we could tell them there's a bug and > provide a workload that reproduces it? > > I don't see how "it's a Linux issue" makes it someone else's problem. The > kernel guys can't really test everything (and are not obliged to). It's up > to us to do more testing in this area, and report issues to the kernel guys > (which is not happening as much as it should). I don't exactly disagree with any of that. I just want to find a course of action that we can agree on and move forward. This has been cooking for a long time, and I want to converge on some resolution. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 11/02/2016 05:52 PM, Amit Kapila wrote: On Wed, Nov 2, 2016 at 9:01 AM, Tomas Vondra wrote: On 11/01/2016 08:13 PM, Robert Haas wrote: On Mon, Oct 31, 2016 at 5:48 PM, Tomas Vondra wrote: The one remaining thing is the strange zig-zag behavior, but that might easily be a due to scheduling in kernel, or something else. I don't consider it a blocker for any of the patches, though. The only reason I could think of for that zig-zag behaviour is frequent multiple clog page accesses and it could be due to below reasons: a. transaction and its subtransactions (IIRC, Dilip's case has one main transaction and two subtransactions) can't fit into same page, in which case the group_update optimization won't apply and I don't think we can do anything for it. b. In the same group, multiple clog pages are being accessed. It is not a likely scenario, but it can happen and we might be able to improve a bit if that is happening. c. The transactions at same time tries to update different clog page. I think as mentioned upthread we can handle it by using slots an allowing multiple groups to work together instead of a single group. To check if there is any impact due to (a) or (b), I have added few logs in code (patch - group_update_clog_v9_log). The log message could be "all xacts are not on same page" or "Group contains different pages". Patch group_update_clog_v9_slots tries to address (c). So if there is any problem due to (c), this patch should improve the situation. Can you please try to run the test where you saw zig-zag behaviour with both the patches separately? I think if there is anything due to postgres, then you can see either one of the new log message or performance will be improved, OTOH if we see same behaviour, then I think we can probably assume it due to scheduler activity and move on. Also one point to note here is that even when the performance is down in that curve, it is equal to or better than HEAD. Will do. Based on the results with more client counts (increment by 6 clients instead of 36), I think this really looks like something unrelated to any of the patches - kernel, CPU, or something already present in current master. The attached results show that: (a) master shows the same zig-zag behavior - No idea why this wasn't observed on the previous runs. (b) group_update actually seems to improve the situation, because the performance keeps stable up to 72 clients, while on master the fluctuation starts way earlier. I'll redo the tests with a newer kernel - this was on 3.10.x which is what Red Hat 7.2 uses, I'll try on 4.8.6. Then I'll try with the patches you submitted, if the 4.8.6 kernel does not help. Overall, I'm convinced this issue is unrelated to the patches. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services zig-zag.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Nov 2, 2016 at 9:01 AM, Tomas Vondra wrote: > On 11/01/2016 08:13 PM, Robert Haas wrote: >> >> On Mon, Oct 31, 2016 at 5:48 PM, Tomas Vondra >> wrote: >>> > > The one remaining thing is the strange zig-zag behavior, but that might > easily be a due to scheduling in kernel, or something else. I don't consider > it a blocker for any of the patches, though. > The only reason I could think of for that zig-zag behaviour is frequent multiple clog page accesses and it could be due to below reasons: a. transaction and its subtransactions (IIRC, Dilip's case has one main transaction and two subtransactions) can't fit into same page, in which case the group_update optimization won't apply and I don't think we can do anything for it. b. In the same group, multiple clog pages are being accessed. It is not a likely scenario, but it can happen and we might be able to improve a bit if that is happening. c. The transactions at same time tries to update different clog page. I think as mentioned upthread we can handle it by using slots an allowing multiple groups to work together instead of a single group. To check if there is any impact due to (a) or (b), I have added few logs in code (patch - group_update_clog_v9_log). The log message could be "all xacts are not on same page" or "Group contains different pages". Patch group_update_clog_v9_slots tries to address (c). So if there is any problem due to (c), this patch should improve the situation. Can you please try to run the test where you saw zig-zag behaviour with both the patches separately? I think if there is anything due to postgres, then you can see either one of the new log message or performance will be improved, OTOH if we see same behaviour, then I think we can probably assume it due to scheduler activity and move on. Also one point to note here is that even when the performance is down in that curve, it is equal to or better than HEAD. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com group_update_clog_v9_log.patch Description: Binary data group_update_clog_v9_slots.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
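[For a sense of how often reasons (a) and (b) above can apply: clog stores two status bits per transaction, so with the default 8 kB block size one clog page covers 32768 XIDs. Back-of-the-envelope, assuming the usual definitions in clog.c:]

    #define CLOG_BITS_PER_XACT  2
    #define CLOG_XACTS_PER_BYTE 4
    #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)   /* 8192 * 4 = 32768 */

[So a group, or a transaction plus its subtransactions, only spans multiple clog pages when its XIDs straddle a 32768-XID boundary, which matches the observation above that (b) is not a likely scenario.]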
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 11/01/2016 08:13 PM, Robert Haas wrote: On Mon, Oct 31, 2016 at 5:48 PM, Tomas Vondra wrote: Honestly, I have no idea what to think about this ... I think a lot of the details here depend on OS scheduler behavior. For example, here's one of the first scalability graphs I ever did: http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html It's a nice advertisement for fast-path locking, but look at the funny shape of the red and green lines between 1 and 32 cores. The curve is oddly bowl-shaped. As the post discusses, we actually dip WAY under linear scalability in the 8-20 core range and then shoot up like a rocket afterwards so that at 32 cores we actually achieve super-linear scalability. You can't blame this on anything except Linux. Someone shared BSD graphs (I forget which flavor) with me privately and they don't exhibit this poor behavior. (They had different poor behaviors instead - performance collapsed at high client counts. That was a long time ago so it's probably fixed now.) This is why I think it's fundamentally wrong to look at this patch and say "well, contention goes down, and in some cases that makes performance go up, but because in other cases it decreases performance or increases variability we shouldn't commit it". If we took that approach, we wouldn't have fast-path locking today, because the early versions of fast-path locking could exhibit *major* regressions precisely because of contention shifting to other locks, specifically SInvalReadLock and msgNumLock. (cf. commit b4fbe392f8ff6ff1a66b488eb7197eef9e1770a4). If we say that because the contention on those other locks can get worse as a result of contention on this lock being reduced, or even worse, if we try to take responsibility for what effect reducing lock contention might have on the operating system scheduler discipline (which will certainly differ from system to system and version to version), we're never going to get anywhere, because there's almost always going to be some way that reducing contention in one place can bite you someplace else. I don't think I've suggested not committing any of the clog patches (or other patches in general) because shifting the contention somewhere else might cause regressions. At the end of the last CF I've however stated that we need to better understand the impact on various wokloads, and I think Amit agreed with that conclusion. We have that understanding now, I believe - also thanks to your idea of sampling wait events data. You're right we can't fix all the contention points in one patch, and that shifting the contention may cause regressions. But we should at least understand what workloads might be impacted, how serious the regressions may get etc. Which is why all the testing was done. I also believe it's pretty normal for patches that remove lock contention to increase variability. If you run an auto race where every car has a speed governor installed that limits it to 80 kph, there will be much less variability in the finish times than if you remove the governor, but that's a stupid way to run a race. You won't get much innovation around increasing the top speed of the cars under those circumstances, either. Nobody ever bothered optimizing the contention around msgNumLock before fast-path locking happened, because the heavyweight lock manager burdened the system so heavily that you couldn't generate enough contention on it to matter. 
Similarly, we're not going to get much traction around optimizing the other locks to which contention would shift if we applied this patch unless we apply it. This is not theoretical: EnterpriseDB staff have already done work on trying to optimize WALWriteLock, but it's hard to get a benefit. The more contention other contention we eliminate, the easier it will be to see whether a proposed change to WALWriteLock helps. Sure, I understand that. My main worry was that people will get worse performance with the next major version that what they get now (assuming we don't manage to address the other contention points). Which is difficult to explain to users & customers, no matter how reasonable it seems to us. The difference is that both the fast-path locks and msgNumLock went into 9.2, so that end users probably never saw that regression. But we don't know if that happens for clog and WAL. Perhaps you have a working patch addressing the WAL contention, so that we could see how that changes the results? > Of course, we'll also be more at the mercy of operating system scheduler discipline, but that's not all a bad thing either. The Linux kernel guys have been known to run PostgreSQL to see whether proposed changes help or hurt, but they're not going to try those tests after applying patches that we rejected because they expose us to existing Linux shortcomings. I might be wrong, but I doubt the kernel guys are running particularly wide set of tests, so how likely is it t
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Oct 31, 2016 at 5:48 PM, Tomas Vondra wrote: > Honestly, I have no idea what to think about this ... I think a lot of the details here depend on OS scheduler behavior. For example, here's one of the first scalability graphs I ever did: http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html It's a nice advertisement for fast-path locking, but look at the funny shape of the red and green lines between 1 and 32 cores. The curve is oddly bowl-shaped. As the post discusses, we actually dip WAY under linear scalability in the 8-20 core range and then shoot up like a rocket afterwards so that at 32 cores we actually achieve super-linear scalability. You can't blame this on anything except Linux. Someone shared BSD graphs (I forget which flavor) with me privately and they don't exhibit this poor behavior. (They had different poor behaviors instead - performance collapsed at high client counts. That was a long time ago so it's probably fixed now.) This is why I think it's fundamentally wrong to look at this patch and say "well, contention goes down, and in some cases that makes performance go up, but because in other cases it decreases performance or increases variability we shouldn't commit it". If we took that approach, we wouldn't have fast-path locking today, because the early versions of fast-path locking could exhibit *major* regressions precisely because of contention shifting to other locks, specifically SInvalReadLock and msgNumLock. (cf. commit b4fbe392f8ff6ff1a66b488eb7197eef9e1770a4). If we say that because the contention on those other locks can get worse as a result of contention on this lock being reduced, or even worse, if we try to take responsibility for what effect reducing lock contention might have on the operating system scheduler discipline (which will certainly differ from system to system and version to version), we're never going to get anywhere, because there's almost always going to be some way that reducing contention in one place can bite you someplace else. I also believe it's pretty normal for patches that remove lock contention to increase variability. If you run an auto race where every car has a speed governor installed that limits it to 80 kph, there will be much less variability in the finish times than if you remove the governor, but that's a stupid way to run a race. You won't get much innovation around increasing the top speed of the cars under those circumstances, either. Nobody ever bothered optimizing the contention around msgNumLock before fast-path locking happened, because the heavyweight lock manager burdened the system so heavily that you couldn't generate enough contention on it to matter. Similarly, we're not going to get much traction around optimizing the other locks to which contention would shift if we applied this patch unless we apply it. This is not theoretical: EnterpriseDB staff have already done work on trying to optimize WALWriteLock, but it's hard to get a benefit. The more contention other contention we eliminate, the easier it will be to see whether a proposed change to WALWriteLock helps. Of course, we'll also be more at the mercy of operating system scheduler discipline, but that's not all a bad thing either. The Linux kernel guys have been known to run PostgreSQL to see whether proposed changes help or hurt, but they're not going to try those tests after applying patches that we rejected because they expose us to existing Linux shortcomings. 
I don't want to be perceived as advocating too forcefully for a patch that was, after all, written by a colleague. However, I sincerely believe it's a mistake to say that a patch which reduces lock contention must show a tangible win or at least no loss on every piece of hardware, on every kernel, at every client count with no increase in variability in any configuration. Very few (if any) patches are going to be able to meet that bar, and if we make that the bar, people aren't going to write patches to reduce lock contention in PostgreSQL. For that to be worth doing, you have to be able to get the patch committed in finite time. We've spent an entire release cycle dithering over this patch. Several alternative patches have been written that are not any better (and the people who wrote those patches don't seem especially interested in doing further work on them anyway). There is increasing evidence that the patch is effective at solving the problem it claims to solve, and that any downsides are just the result of poor lock-scaling behavior elsewhere which we could be working on fixing if we weren't still spending time on this. Is that really not good enough? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/31/2016 02:24 PM, Tomas Vondra wrote: On 10/31/2016 05:01 AM, Jim Nasby wrote: On 10/30/16 1:32 PM, Tomas Vondra wrote: Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's some sort of CPU / OS scheduling artifact. For example, the system has 36 physical cores, 72 virtual ones (thanks to HT). I find it strange that the "good" client counts are always multiples of 72, while the "bad" ones fall in between. 72 = 72 * 1 (good) 108 = 72 * 1.5 (bad) 144 = 72 * 2 (good) 180 = 72 * 2.5 (bad) 216 = 72 * 3 (good) 252 = 72 * 3.5 (bad) 288 = 72 * 4 (good) So maybe this has something to do with how OS schedules the tasks, or maybe some internal heuristics in the CPU, or something like that. It might be enlightening to run a series of tests that are 72*.1 or *.2 apart (say, 72, 79, 86, ..., 137, 144). Yeah, I've started a benchmark with client a step of 6 clients 36 42 48 54 60 66 72 78 ... 252 258 264 270 276 282 288 instead of just 36 72 108 144 180 216 252 288 which did a test every 36 clients. To compensate for the 6x longer runs, I'm only running tests for "group-update" and "master", so I should have the results in ~36h. So I've been curious and looked at results of the runs executed so far, and for the group_update patch it looks like this: clients tps - 36 117663 42 139791 48 129331 54 144970 60 124174 66 137227 72 146064 78 100267 84 141538 90 96607 96 139290 102 93976 108 136421 114 91848 120 133563 126 89801 132 132607 138 87912 144 129688 150 87221 156 129608 162 85403 168 130193 174 83863 180 129337 186 81968 192 128571 198 82053 204 128020 210 80768 216 124153 222 80493 228 125503 234 78950 240 125670 246 78418 252 123532 258 77623 264 124366 270 76726 276 119054 282 76960 288 121819 So, similar saw-like behavior, perfectly periodic. But the really strange thing is the peaks/valleys don't match those observed before! That is, during the previous runs, 72, 144, 216 and 288 were "good" while 108, 180 and 252 were "bad". But in those runs, all those client counts are "good" ... Honestly, I have no idea what to think about this ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/31/2016 08:43 PM, Amit Kapila wrote: On Mon, Oct 31, 2016 at 7:58 PM, Tomas Vondra wrote: On 10/31/2016 02:51 PM, Amit Kapila wrote: And moreover, this setup (single device for the whole cluster) is very common, we can't just neglect it. But my main point here really is that the trade-off in those cases may not be really all that great, because you get the best performance at 36/72 clients, and then the tps drops and variability increases. At least not right now, before tackling contention on the WAL lock (or whatever lock becomes the bottleneck). Okay, but does wait event results show increase in contention on some other locks for pgbench-3000-logged-sync-skip-64? Can you share wait events for the runs where there is a fluctuation? Sure, I do have wait event stats, including a summary for different client counts - see this: http://tvondra.bitbucket.org/by-test/pgbench-3000-logged-sync-skip-64.txt Looking only at group_update patch for three interesting client counts, it looks like this:

 wait_event_type | wait_event        |    108 |    144 |     180
-----------------+-------------------+--------+--------+---------
 LWLockNamed     | WALWriteLock      | 661284 | 847057 | 1006061
                 |                   | 126654 | 191506 |  265386
 Client          | ClientRead        |  37273 |  52791 |   64799
 LWLockTranche   | wal_insert        |  28394 |  51893 |   79932
 LWLockNamed     | CLogControlLock   |   7766 |  14913 |   23138
 LWLockNamed     | WALBufMappingLock |   3615 |   3739 |    3803
 LWLockNamed     | ProcArrayLock     |    913 |   1776 |    2685
 Lock            | extend            |    909 |   2082 |    2228
 LWLockNamed     | XidGenLock        |    301 |    349 |     675
 LWLockTranche   | clog              |    173 |    331 |     607
 LWLockTranche   | buffer_content    |    163 |    468 |     737
 LWLockTranche   | lock_manager      |     88 |    140 |     145

Compared to master, this shows significant reduction of contention on CLogControlLock (which on master has 20k, 83k and 200k samples), and moving the contention to WALWriteLock. But perhaps you're asking about variability during the benchmark? I suppose that could be extracted from the collected data, but I haven't done that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
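[For reference, a per-wait-event summary like the one above can be collected by sampling pg_stat_activity at a fixed interval during the run and accumulating the counts; a minimal version of such a query, assuming PostgreSQL 9.6 or later where the wait_event columns exist, might look like the following. How the linked reports were actually generated is not shown in this thread, so treat this only as an illustration of the technique.]

    -- one sample of what every backend is currently waiting on;
    -- repeat periodically and sum the counts per event
    SELECT wait_event_type, wait_event, count(*) AS samples
      FROM pg_stat_activity
     WHERE wait_event IS NOT NULL
     GROUP BY wait_event_type, wait_event
     ORDER BY samples DESC;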
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Oct 31, 2016 at 7:58 PM, Tomas Vondra wrote: > On 10/31/2016 02:51 PM, Amit Kapila wrote: > And moreover, this setup (single device for the whole cluster) is very > common, we can't just neglect it. > > But my main point here really is that the trade-off in those cases may not > be really all that great, because you get the best performance at 36/72 > clients, and then the tps drops and variability increases. At least not > right now, before tackling contention on the WAL lock (or whatever lock > becomes the bottleneck). > Okay, but does wait event results show increase in contention on some other locks for pgbench-3000-logged-sync-skip-64? Can you share wait events for the runs where there is a fluctuation? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/31/2016 02:51 PM, Amit Kapila wrote: On Mon, Oct 31, 2016 at 12:02 AM, Tomas Vondra wrote: Hi, On 10/27/2016 01:44 PM, Amit Kapila wrote: I've read that analysis, but I'm not sure I see how it explains the "zig zag" behavior. I do understand that shifting the contention to some other (already busy) lock may negatively impact throughput, or that the group_update may result in updating multiple clog pages, but I don't understand two things: (1) Why this should result in the fluctuations we observe in some of the cases. For example, why should we see 150k tps on, 72 clients, then drop to 92k with 108 clients, then back to 130k on 144 clients, then 84k on 180 clients etc. That seems fairly strange. I don't think hitting multiple clog pages has much to do with client-count. However, we can wait to see your further detailed test report. (2) Why this should affect all three patches, when only group_update has to modify multiple clog pages. No, all three patches can be affected due to multiple clog pages. Read second paragraph ("I think one of the probable reasons that could happen for both the approaches") in same e-mail [1]. It is basically due to frequent release-and-reacquire of locks. On logged tables it usually looks like this (i.e. modest increase for high client counts at the expense of significantly higher variability): http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64 What variability are you referring to in those results? Good question. What I mean by "variability" is how stable the tps is during the benchmark (when measured on per-second granularity). For example, let's run a 10-second benchmark, measuring number of transactions committed each second. Then all those runs do 1000 tps on average: run 1: 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000 run 2: 500, 1500, 500, 1500, 500, 1500, 500, 1500, 500, 1500 run 3: 0, 2000, 0, 2000, 0, 2000, 0, 2000, 0, 2000 Generally, such behaviours are seen due to writes. Are WAL and DATA on same disk in your tests? Yes, there's one RAID device on 10 SSDs, with 4GB of the controller. I've done some tests and it easily handles > 1.5GB/s in sequential writes, and >500MB/s in sustained random writes. Also, let me point out that most of the tests were done so that the whole data set fits into shared_buffers, and with no checkpoints during the runs (so no writes to data files should really happen). For example these tests were done on scale 3000 (45GB data set) with 64GB shared buffers: [a] http://tvondra.bitbucket.org/index2.html#pgbench-3000-unlogged-sync-noskip-64 [b] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-async-noskip-64 and I could show similar cases with scale 300 on 16GB shared buffers. In those cases, there's very little contention between WAL and the rest of the data base (in terms of I/O). And moreover, this setup (single device for the whole cluster) is very common, we can't just neglect it. But my main point here really is that the trade-off in those cases may not be really all that great, because you get the best performance at 36/72 clients, and then the tps drops and variability increases. At least not right now, before tackling contention on the WAL lock (or whatever lock becomes the bottleneck). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Oct 31, 2016 at 7:02 PM, Tomas Vondra wrote: > > The remaining benchmark with 512 clog buffers completed, and the impact > roughly matches Dilip's benchmark - that is, increasing the number of clog > buffers eliminates all positive impact of the patches observed on 128 > buffers. Compare these two reports: > > [a] http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-noskip-retest > > [b] http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-noskip-retest-512 > > With 128 buffers the group_update and granular_locking patches achieve up to > 50k tps, while master and no_content_lock do ~30k tps. After increasing > number of clog buffers, we get only ~30k in all cases. > > I'm not sure what's causing this, whether we're hitting limits of the simple > LRU cache used for clog buffers, or something else. > I have also seen previously that increasing clog buffers to 256 can impact performance negatively. So, probably here the gains due to group_update patch is negated due to the impact of increasing clog buffers. I am not sure if it is good idea to see the impact of increasing clog buffers along with this patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Oct 31, 2016 at 12:02 AM, Tomas Vondra wrote: > Hi, > > On 10/27/2016 01:44 PM, Amit Kapila wrote: > > I've read that analysis, but I'm not sure I see how it explains the "zig > zag" behavior. I do understand that shifting the contention to some other > (already busy) lock may negatively impact throughput, or that the > group_update may result in updating multiple clog pages, but I don't > understand two things: > > (1) Why this should result in the fluctuations we observe in some of the > cases. For example, why should we see 150k tps on, 72 clients, then drop to > 92k with 108 clients, then back to 130k on 144 clients, then 84k on 180 > clients etc. That seems fairly strange. > I don't think hitting multiple clog pages has much to do with client-count. However, we can wait to see your further detailed test report. > (2) Why this should affect all three patches, when only group_update has to > modify multiple clog pages. > No, all three patches can be affected due to multiple clog pages. Read second paragraph ("I think one of the probable reasons that could happen for both the approaches") in same e-mail [1]. It is basically due to frequent release-and-reacquire of locks. > > >>> On logged tables it usually looks like this (i.e. modest increase for >>> high >>> client counts at the expense of significantly higher variability): >>> >>> http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64 >>> >> >> What variability are you referring to in those results? > >> > > Good question. What I mean by "variability" is how stable the tps is during > the benchmark (when measured on per-second granularity). For example, let's > run a 10-second benchmark, measuring number of transactions committed each > second. > > Then all those runs do 1000 tps on average: > > run 1: 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000 > run 2: 500, 1500, 500, 1500, 500, 1500, 500, 1500, 500, 1500 > run 3: 0, 2000, 0, 2000, 0, 2000, 0, 2000, 0, 2000 > Generally, such behaviours are seen due to writes. Are WAL and DATA on same disk in your tests? [1] - https://www.postgresql.org/message-id/CAA4eK1J9VxJUnpOiQDf0O%3DZ87QUMbw%3DuGcQr4EaGbHSCibx9yA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/30/2016 07:32 PM, Tomas Vondra wrote: Hi, On 10/27/2016 01:44 PM, Amit Kapila wrote: On Thu, Oct 27, 2016 at 4:15 AM, Tomas Vondra wrote: FWIW I plan to run the same test with logged tables - if it shows similar regression, I'll be much more worried, because that's a fairly typical scenario (logged tables, data set > shared buffers), and we surely can't just go and break that. Sure, please do those tests. OK, so I do have results for those tests - that is, scale 3000 with shared_buffers=16GB (so continuously writing out dirty buffers). The following reports show the results slightly differently - all three "tps charts" next to each other, then the speedup charts and tables. Overall, the results are surprisingly positive - look at these results (all ending with "-retest"): [1] http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest [2] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-noskip-retest [3] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest All three show significant improvement, even with fairly low client counts. For example with 72 clients, the tps improves 20%, without significantly affecting variability variability of the results( measured as stdddev, more on this later). It's however interesting that "no_content_lock" is almost exactly the same as master, while the other two cases improve significantly. The other interesting thing is that "pgbench -N" [3] shows no such improvement, unlike regular pgbench and Dilip's workload. Not sure why, though - I'd expect to see significant improvement in this case. I have also repeated those tests with clog buffers increased to 512 (so 4x the current maximum of 128). I only have results for Dilip's workload and "pgbench -N": [4] http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest-512 [5] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest-512 The results are somewhat surprising, I guess, because the effect is wildly different for each workload. For Dilip's workload increasing clog buffers to 512 pretty much eliminates all benefits of the patches. For example with 288 client, the group_update patch gives ~60k tps on 128 buffers [1] but only 42k tps on 512 buffers [4]. With "pgbench -N", the effect is exactly the opposite - while with 128 buffers there was pretty much no benefit from any of the patches [3], with 512 buffers we suddenly get almost 2x the throughput, but only for group_update and master (while the other two patches show no improvement at all). The remaining benchmark with 512 clog buffers completed, and the impact roughly matches Dilip's benchmark - that is, increasing the number of clog buffers eliminates all positive impact of the patches observed on 128 buffers. Compare these two reports: [a] http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-noskip-retest [b] http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-noskip-retest-512 With 128 buffers the group_update and granular_locking patches achieve up to 50k tps, while master and no_content_lock do ~30k tps. After increasing number of clog buffers, we get only ~30k in all cases. I'm not sure what's causing this, whether we're hitting limits of the simple LRU cache used for clog buffers, or something else. But maybe there's something in the design of clog buffers that make them work less efficiently with more clog buffers? I'm not sure whether that's something we need to fix before eventually committing any of them. 
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
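[For context, the number of clog buffers is not a user-settable GUC; it is derived from shared_buffers and capped. Roughly, paraphrasing CLOGShmemBuffers() in clog.c as of the 128-buffer cap rather than quoting it verbatim:]

    /* Approximate sizing rule for clog (SLRU) buffers; NBuffers is
     * shared_buffers expressed in 8 kB pages. Shown for context only. */
    Size
    CLOGShmemBuffers(void)
    {
        return Min(128, Max(4, NBuffers / 512));
    }

[With shared_buffers in the multi-gigabyte range, as in these runs, the 128-buffer cap is what actually applies, so the "512 buffers" results above require patching that cap in the source.]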
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/31/2016 05:01 AM, Jim Nasby wrote: On 10/30/16 1:32 PM, Tomas Vondra wrote: Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's some sort of CPU / OS scheduling artifact. For example, the system has 36 physical cores, 72 virtual ones (thanks to HT). I find it strange that the "good" client counts are always multiples of 72, while the "bad" ones fall in between. 72 = 72 * 1 (good) 108 = 72 * 1.5 (bad) 144 = 72 * 2 (good) 180 = 72 * 2.5 (bad) 216 = 72 * 3 (good) 252 = 72 * 3.5 (bad) 288 = 72 * 4 (good) So maybe this has something to do with how OS schedules the tasks, or maybe some internal heuristics in the CPU, or something like that. It might be enlightening to run a series of tests that are 72*.1 or *.2 apart (say, 72, 79, 86, ..., 137, 144). Yeah, I've started a benchmark with client a step of 6 clients 36 42 48 54 60 66 72 78 ... 252 258 264 270 276 282 288 instead of just 36 72 108 144 180 216 252 288 which did a test every 36 clients. To compensate for the 6x longer runs, I'm only running tests for "group-update" and "master", so I should have the results in ~36h. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/30/16 1:32 PM, Tomas Vondra wrote: Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's some sort of CPU / OS scheduling artifact. For example, the system has 36 physical cores, 72 virtual ones (thanks to HT). I find it strange that the "good" client counts are always multiples of 72, while the "bad" ones fall in between. 72 = 72 * 1 (good) 108 = 72 * 1.5 (bad) 144 = 72 * 2 (good) 180 = 72 * 2.5 (bad) 216 = 72 * 3 (good) 252 = 72 * 3.5 (bad) 288 = 72 * 4 (good) So maybe this has something to do with how OS schedules the tasks, or maybe some internal heuristics in the CPU, or something like that. It might be enlightening to run a series of tests that are 72*.1 or *.2 apart (say, 72, 79, 86, ..., 137, 144). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Hi, On 10/27/2016 01:44 PM, Amit Kapila wrote: On Thu, Oct 27, 2016 at 4:15 AM, Tomas Vondra wrote: FWIW I plan to run the same test with logged tables - if it shows similar regression, I'll be much more worried, because that's a fairly typical scenario (logged tables, data set > shared buffers), and we surely can't just go and break that. Sure, please do those tests. OK, so I do have results for those tests - that is, scale 3000 with shared_buffers=16GB (so continuously writing out dirty buffers). The following reports show the results slightly differently - all three "tps charts" next to each other, then the speedup charts and tables. Overall, the results are surprisingly positive - look at these results (all ending with "-retest"): [1] http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest [2] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-noskip-retest [3] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest All three show significant improvement, even with fairly low client counts. For example with 72 clients, the tps improves 20%, without significantly affecting variability variability of the results( measured as stdddev, more on this later). It's however interesting that "no_content_lock" is almost exactly the same as master, while the other two cases improve significantly. The other interesting thing is that "pgbench -N" [3] shows no such improvement, unlike regular pgbench and Dilip's workload. Not sure why, though - I'd expect to see significant improvement in this case. I have also repeated those tests with clog buffers increased to 512 (so 4x the current maximum of 128). I only have results for Dilip's workload and "pgbench -N": [4] http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest-512 [5] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest-512 The results are somewhat surprising, I guess, because the effect is wildly different for each workload. For Dilip's workload increasing clog buffers to 512 pretty much eliminates all benefits of the patches. For example with 288 client, the group_update patch gives ~60k tps on 128 buffers [1] but only 42k tps on 512 buffers [4]. With "pgbench -N", the effect is exactly the opposite - while with 128 buffers there was pretty much no benefit from any of the patches [3], with 512 buffers we suddenly get almost 2x the throughput, but only for group_update and master (while the other two patches show no improvement at all). I don't have results for the regular pgbench ("noskip") with 512 buffers yet, but I'm curious what that will show. In general I however think that the patches don't show any regression in any of those workloads (at least not with 128 buffers). Based solely on the results, I like the group_update more, because it performs as good as master or significantly better. 2. We do see in some cases that granular_locking and no_content_lock patches has shown significant increase in contention on CLOGControlLock. I have already shared my analysis for same upthread [8]. I've read that analysis, but I'm not sure I see how it explains the "zig zag" behavior. I do understand that shifting the contention to some other (already busy) lock may negatively impact throughput, or that the group_update may result in updating multiple clog pages, but I don't understand two things: (1) Why this should result in the fluctuations we observe in some of the cases. 
For example, why should we see 150k tps on 72 clients, then drop to 92k with 108 clients, then back to 130k on 144 clients, then 84k on 180 clients etc. That seems fairly strange.

(2) Why this should affect all three patches, when only group_update has to modify multiple clog pages.

For example consider this: http://tvondra.bitbucket.org/index2.html#dilip-300-logged-async

For example looking at % of time spent on different locks with the group_update patch, I see this (ignoring locks with ~1%):

 event_type    | wait_event      | 36 | 72 | 108 | 144 | 180 | 216 | 252 | 288
---------------+-----------------+----+----+-----+-----+-----+-----+-----+-----
               |                 | 60 | 63 |  45 |  53 |  38 |  50 |  33 |  48
 Client        | ClientRead      | 33 | 23 |   9 |  14 |   6 |  10 |   4 |   8
 LWLockNamed   | CLogControlLock |  2 |  7 |  33 |  14 |  34 |  14 |  33 |  14
 LWLockTranche | buffer_content  |  0 |  2 |   9 |  13 |  19 |  18 |  26 |  22

I don't see any sign of contention shifting to other locks, just CLogControlLock fluctuating between 14% and 33% for some reason.

Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's some sort of CPU / OS scheduling artifact. For example, the system has 36 physical cores, 72 virtual ones (thanks to HT). I find it strange that the "good" client counts are always multiples of 72, while the "bad" ones fall in between.

72 = 72 * 1 (good)
108 = 72 * 1.5 (bad)
144 = 72
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 27, 2016 at 5:14 PM, Amit Kapila wrote: >>> Thanks Tomas and Dilip for doing detailed performance tests for this >>> patch. I would like to summarise the performance testing results. >>> >>> 1. With update intensive workload, we are seeing gains from 23%~192% >>> at client count >=64 with group_update patch [1]. this is with unlogged table >>> 2. With tpc-b pgbench workload (at 1000 scale factor), we are seeing >>> gains from 12% to ~70% at client count >=64 [2]. Tests are done on >>> 8-socket intel m/c. this is with synchronous_commit=off >>> 3. With pgbench workload (both simple-update and tpc-b at 300 scale >>> factor), we are seeing gain 10% to > 50% at client count >=64 [3]. >>> Tests are done on 8-socket intel m/c. this is with synchronous_commit=off >>> 4. To see why the patch only helps at higher client count, we have >>> done wait event testing for various workloads [4], [5] and the results >>> indicate that at lower clients, the waits are mostly due to >>> transactionid or clientread. At client-counts where contention due to >>> CLOGControlLock is significant, this patch helps a lot to reduce that >>> contention. These tests are done on on 8-socket intel m/c and >>> 4-socket power m/c these both are with synchronous_commit=off + unlogged table >>> 5. With pgbench workload (unlogged tables), we are seeing gains from >>> 15% to > 300% at client count >=72 [6]. >>> >> >> It's not entirely clear which of the above tests were done on unlogged >> tables, and I don't see that in the referenced e-mails. That would be an >> interesting thing to mention in the summary, I think. >> > > One thing is clear that all results are on either > synchronous_commit=off or on unlogged tables. I think Dilip can > answer better which of those are on unlogged and which on > synchronous_commit=off. I have mentioned this above under each of your test point.. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 27, 2016 at 4:15 AM, Tomas Vondra wrote: > On 10/25/2016 06:10 AM, Amit Kapila wrote: >> >> On Mon, Oct 24, 2016 at 2:48 PM, Dilip Kumar >> wrote: >>> >>> On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar >>> wrote: On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra wrote: > In the results you've posted on 10/12, you've mentioned a regression > with 32 > clients, where you got 52k tps on master but only 48k tps with the > patch (so > ~10% difference). I have no idea what scale was used for those tests, That test was with scale factor 300 on POWER 4 socket machine. I think I need to repeat this test with multiple reading to confirm it was regression or run to run variation. I will do that soon and post the results. >>> >>> >>> As promised, I have rerun my test (3 times), and I did not see any >>> regression. >>> >> >> Thanks Tomas and Dilip for doing detailed performance tests for this >> patch. I would like to summarise the performance testing results. >> >> 1. With update intensive workload, we are seeing gains from 23%~192% >> at client count >=64 with group_update patch [1]. >> 2. With tpc-b pgbench workload (at 1000 scale factor), we are seeing >> gains from 12% to ~70% at client count >=64 [2]. Tests are done on >> 8-socket intel m/c. >> 3. With pgbench workload (both simple-update and tpc-b at 300 scale >> factor), we are seeing gain 10% to > 50% at client count >=64 [3]. >> Tests are done on 8-socket intel m/c. >> 4. To see why the patch only helps at higher client count, we have >> done wait event testing for various workloads [4], [5] and the results >> indicate that at lower clients, the waits are mostly due to >> transactionid or clientread. At client-counts where contention due to >> CLOGControlLock is significant, this patch helps a lot to reduce that >> contention. These tests are done on on 8-socket intel m/c and >> 4-socket power m/c >> 5. With pgbench workload (unlogged tables), we are seeing gains from >> 15% to > 300% at client count >=72 [6]. >> > > It's not entirely clear which of the above tests were done on unlogged > tables, and I don't see that in the referenced e-mails. That would be an > interesting thing to mention in the summary, I think. > One thing is clear that all results are on either synchronous_commit=off or on unlogged tables. I think Dilip can answer better which of those are on unlogged and which on synchronous_commit=off. >> There are many more tests done for the proposed patches where gains >> are either or similar lines as above or are neutral. We do see >> regression in some cases. >> >> 1. When data doesn't fit in shared buffers, there is regression at >> some client counts [7], but on analysis it has been found that it is >> mainly due to the shift in contention from CLOGControlLock to >> WALWriteLock and or other locks. > > > The questions is why shifting the lock contention to WALWriteLock should > cause such significant performance drop, particularly when the test was done > on unlogged tables. Or, if that's the case, how it makes the performance > drop less problematic / acceptable. > Whenever the contention shifts to other lock, there is a chance that it can show performance dip in some cases and I have seen that previously as well. The theory behind that could be like this, say you have two locks L1 and L2, and there are 100 processes that are contending on L1 and 50 on L2. 
Now say, you reduce contention on L1 such that it leads to 120 processes contending on L2, so increased contention on L2 can slowdown the overall throughput of all processes. > FWIW I plan to run the same test with logged tables - if it shows similar > regression, I'll be much more worried, because that's a fairly typical > scenario (logged tables, data set > shared buffers), and we surely can't > just go and break that. > Sure, please do those tests. >> 2. We do see in some cases that granular_locking and no_content_lock >> patches has shown significant increase in contention on >> CLOGControlLock. I have already shared my analysis for same upthread >> [8]. > > > I do agree that some cases this significantly reduces contention on the > CLogControlLock. I do however think that currently the performance gains are > limited almost exclusively to cases on unlogged tables, and some > logged+async cases. > Right, because the contention is mainly visible for those workloads. > On logged tables it usually looks like this (i.e. modest increase for high > client counts at the expense of significantly higher variability): > > http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64 > What variability are you referring to in those results? > or like this (i.e. only partial recovery for the drop above 36 clients): > > http://tvondra.bitbucket.org/#pgbench-3000-logged-async-skip-64 > > And of course, there are cases like this: > > http://tvondra.bitbucket.org/#dilip-300-logged-async > > I'd re
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/25/2016 06:10 AM, Amit Kapila wrote: On Mon, Oct 24, 2016 at 2:48 PM, Dilip Kumar wrote: On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar wrote: On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra wrote: In the results you've posted on 10/12, you've mentioned a regression with 32 clients, where you got 52k tps on master but only 48k tps with the patch (so ~10% difference). I have no idea what scale was used for those tests, That test was with scale factor 300 on POWER 4 socket machine. I think I need to repeat this test with multiple reading to confirm it was regression or run to run variation. I will do that soon and post the results. As promised, I have rerun my test (3 times), and I did not see any regression. Thanks Tomas and Dilip for doing detailed performance tests for this patch. I would like to summarise the performance testing results. 1. With update intensive workload, we are seeing gains from 23%~192% at client count >=64 with group_update patch [1]. 2. With tpc-b pgbench workload (at 1000 scale factor), we are seeing gains from 12% to ~70% at client count >=64 [2]. Tests are done on 8-socket intel m/c. 3. With pgbench workload (both simple-update and tpc-b at 300 scale factor), we are seeing gain 10% to > 50% at client count >=64 [3]. Tests are done on 8-socket intel m/c. 4. To see why the patch only helps at higher client count, we have done wait event testing for various workloads [4], [5] and the results indicate that at lower clients, the waits are mostly due to transactionid or clientread. At client-counts where contention due to CLOGControlLock is significant, this patch helps a lot to reduce that contention. These tests are done on on 8-socket intel m/c and 4-socket power m/c 5. With pgbench workload (unlogged tables), we are seeing gains from 15% to > 300% at client count >=72 [6]. It's not entirely clear which of the above tests were done on unlogged tables, and I don't see that in the referenced e-mails. That would be an interesting thing to mention in the summary, I think. There are many more tests done for the proposed patches where gains are either or similar lines as above or are neutral. We do see regression in some cases. 1. When data doesn't fit in shared buffers, there is regression at some client counts [7], but on analysis it has been found that it is mainly due to the shift in contention from CLOGControlLock to WALWriteLock and or other locks. The questions is why shifting the lock contention to WALWriteLock should cause such significant performance drop, particularly when the test was done on unlogged tables. Or, if that's the case, how it makes the performance drop less problematic / acceptable. FWIW I plan to run the same test with logged tables - if it shows similar regression, I'll be much more worried, because that's a fairly typical scenario (logged tables, data set > shared buffers), and we surely can't just go and break that. 2. We do see in some cases that granular_locking and no_content_lock patches has shown significant increase in contention on CLOGControlLock. I have already shared my analysis for same upthread [8]. I do agree that some cases this significantly reduces contention on the CLogControlLock. I do however think that currently the performance gains are limited almost exclusively to cases on unlogged tables, and some logged+async cases. On logged tables it usually looks like this (i.e. 
modest increase for high client counts at the expense of significantly higher variability): http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64 or like this (i.e. only partial recovery for the drop above 36 clients): http://tvondra.bitbucket.org/#pgbench-3000-logged-async-skip-64 And of course, there are cases like this: http://tvondra.bitbucket.org/#dilip-300-logged-async I'd really like to understand why the patched results behave that differently depending on client count. > > Attached is the latest group update clog patch. > How is that different from the previous versions? > In last commit fest, the patch was returned with feedback to evaluate the cases where it can show win and I think above results indicates that the patch has significant benefit on various workloads. What I think is pending at this stage is the either one of the committer or the reviewers of this patch needs to provide feedback on my analysis [8] for the cases where patches are not showing win. Thoughts? I do agree the patch(es) significantly reduce CLogControlLock, although with WAL logging enabled (which is what matters for most production deployments) it pretty much only shifts the contention to a different lock (so the immediate performance benefit is 0). Which raises the question why to commit this patch now, before we have a patch addressing the WAL locks. I realize this is a chicken-egg problem, but my worry is that the increased WALWriteLock contention will cause regressions in current workload
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Oct 24, 2016 at 2:48 PM, Dilip Kumar wrote:
> On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar wrote:
>> On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
>> wrote:
>>
>>> In the results you've posted on 10/12, you've mentioned a regression with 32
>>> clients, where you got 52k tps on master but only 48k tps with the patch (so
>>> ~10% difference). I have no idea what scale was used for those tests,
>>
>> That test was with scale factor 300 on POWER 4 socket machine. I think
>> I need to repeat this test with multiple reading to confirm it was
>> regression or run to run variation. I will do that soon and post the
>> results.
>
> As promised, I have rerun my test (3 times), and I did not see any regression.
>

Thanks Tomas and Dilip for doing detailed performance tests for this patch. I would like to summarise the performance testing results.

1. With an update-intensive workload, we are seeing gains from 23%~192% at client count >=64 with the group_update patch [1].
2. With the tpc-b pgbench workload (at 1000 scale factor), we are seeing gains from 12% to ~70% at client count >=64 [2]. Tests are done on an 8-socket intel m/c.
3. With the pgbench workload (both simple-update and tpc-b at 300 scale factor), we are seeing gains of 10% to >50% at client count >=64 [3]. Tests are done on an 8-socket intel m/c.
4. To see why the patch only helps at higher client count, we have done wait event testing for various workloads [4], [5] and the results indicate that at lower clients, the waits are mostly due to transactionid or clientread. At client counts where contention due to CLOGControlLock is significant, this patch helps a lot to reduce that contention. These tests are done on an 8-socket intel m/c and a 4-socket power m/c.
5. With the pgbench workload (unlogged tables), we are seeing gains from 15% to >300% at client count >=72 [6].

There are many more tests done for the proposed patches where gains are either on similar lines as above or are neutral. We do see regression in some cases.

1. When data doesn't fit in shared buffers, there is regression at some client counts [7], but on analysis it has been found that it is mainly due to the shift in contention from CLOGControlLock to WALWriteLock and/or other locks.
2. We do see in some cases that the granular_locking and no_content_lock patches have shown a significant increase in contention on CLOGControlLock. I have already shared my analysis for the same upthread [8].

Attached is the latest group update clog patch.

In the last commitfest, the patch was returned with feedback to evaluate the cases where it can show a win, and I think the above results indicate that the patch has a significant benefit on various workloads. What I think is pending at this stage is that either one of the committers or the reviewers of this patch needs to provide feedback on my analysis [8] for the cases where the patches are not showing a win.

Thoughts?
[1] - https://www.postgresql.org/message-id/CAFiTN-u-XEzhd%3DhNGW586fmQwdTy6Qy6_SXe09tNB%3DgBcVzZ_A%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAFiTN-tr_%3D25EQUFezKNRk%3D4N-V%2BD6WMxo7HWs9BMaNx7S3y6w%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAFiTN-v5hm1EO4cLXYmpppYdNQk%2Bn4N-O1m%2B%2B3U9f0Ga1gBzRQ%40mail.gmail.com [4] - https://www.postgresql.org/message-id/CAFiTN-taV4iVkPHrxg%3DYCicKjBS6%3DQZm_cM4hbS_2q2ryLhUUw%40mail.gmail.com [5] - https://www.postgresql.org/message-id/CAFiTN-uQ%2BJbd31cXvRbj48Ba6TqDUDpLKSPnsUCCYRju0Y0U8Q%40mail.gmail.com [6] - http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip [7] - http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip [8] - https://www.postgresql.org/message-id/CAA4eK1J9VxJUnpOiQDf0O%3DZ87QUMbw%3DuGcQr4EaGbHSCibx9yA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com group_update_clog_v9.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
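For anyone who hasn't followed the earlier versions of the patch: the idea behind group_update is that, instead of every committing backend taking CLogControlLock exclusively for its own clog update, backends queue themselves on a shared list and a single "leader" takes the lock once and applies the whole group's status updates. The sketch below is a toy model of that general idea (similar in spirit to the existing group XID clearing in the proc array), written with C11 atomics and invented names rather than the PostgreSQL primitives the actual patch uses; it is illustrative only, not the attached patch:

#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BACKENDS  256          /* stand-in for MaxBackends */
#define INVALID_PROC  (-1)

typedef struct Backend
{
    int          xid_status;       /* commit/abort status this backend wants recorded */
    int          next;             /* next group member, or INVALID_PROC */
    _Atomic bool done;             /* set by the leader once our update is applied */
} Backend;

static Backend     backends[MAX_BACKENDS];
static _Atomic int group_first = INVALID_PROC;   /* head of the pending group */

/* stand-ins for LWLockAcquire/LWLockRelease on CLogControlLock and for the
 * actual clog page update (TransactionIdSetPageStatus in the real code) */
static void clog_lock_acquire(void) { }
static void clog_lock_release(void) { }
static void clog_apply_update(int proc) { (void) proc; }

void
group_update_xid_status(int me)
{
    /* lock-free push of ourselves onto the pending group */
    int head = atomic_load(&group_first);

    do
    {
        backends[me].next = head;
    } while (!atomic_compare_exchange_weak(&group_first, &head, me));

    if (head != INVALID_PROC)
    {
        /* someone else is (or will become) the leader; wait for our update */
        while (!atomic_load(&backends[me].done))
            ;                       /* the real patch sleeps on a semaphore instead */
        atomic_store(&backends[me].done, false);
        return;
    }

    /* we found the list empty, so we are the leader: take the lock once and
     * apply the clog updates for everyone who joined the group meanwhile */
    clog_lock_acquire();

    int member = atomic_exchange(&group_first, INVALID_PROC);

    while (member != INVALID_PROC)
    {
        int next = backends[member].next;

        clog_apply_update(member);
        if (member != me)
            atomic_store(&backends[member].done, true);
        member = next;
    }

    clog_lock_release();
}

The net effect is that at high client counts many clog updates are performed per CLogControlLock acquisition, which is why the lock largely disappears from the wait-event profiles in the cases where the patch helps.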
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar wrote:
> On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
> wrote:
>
>> In the results you've posted on 10/12, you've mentioned a regression with 32
>> clients, where you got 52k tps on master but only 48k tps with the patch (so
>> ~10% difference). I have no idea what scale was used for those tests,
>
> That test was with scale factor 300 on POWER 4 socket machine. I think
> I need to repeat this test with multiple reading to confirm it was
> regression or run to run variation. I will do that soon and post the
> results.

As promised, I have rerun my test (3 times), and I did not see any regression. The median of the 3 runs on head and with the group lock patch is the same. However, I am posting the results of all three runs. I think in my earlier reading we saw TPS ~48K with the patch, but over multiple runs we get this reading both on head and with the patch.

Head:

run1: transaction type: scaling factor: 300 query mode: prepared number of clients: 32 number of threads: 32 duration: 1800 s number of transactions actually processed: 87784836 latency average = 0.656 ms tps = 48769.327513 (including connections establishing) tps = 48769.543276 (excluding connections establishing)

run2: transaction type: scaling factor: 300 query mode: prepared number of clients: 32 number of threads: 32 duration: 1800 s number of transactions actually processed: 91240374 latency average = 0.631 ms tps = 50689.069717 (including connections establishing) tps = 50689.263505 (excluding connections establishing)

run3: transaction type: scaling factor: 300 query mode: prepared number of clients: 32 number of threads: 32 duration: 1800 s number of transactions actually processed: 90966003 latency average = 0.633 ms tps = 50536.639303 (including connections establishing) tps = 50536.836924 (excluding connections establishing)

With group lock patch:

run1: transaction type: scaling factor: 300 query mode: prepared number of clients: 32 number of threads: 32 duration: 1800 s number of transactions actually processed: 87316264 latency average = 0.660 ms tps = 48509.008040 (including connections establishing) tps = 48509.194978 (excluding connections establishing)

run2: transaction type: scaling factor: 300 query mode: prepared number of clients: 32 number of threads: 32 duration: 1800 s number of transactions actually processed: 91950412 latency average = 0.626 ms tps = 51083.507790 (including connections establishing) tps = 51083.704489 (excluding connections establishing)

run3: transaction type: scaling factor: 300 query mode: prepared number of clients: 32 number of threads: 32 duration: 1800 s number of transactions actually processed: 90378462 latency average = 0.637 ms tps = 50210.225983 (including connections establishing) tps = 50210.405401 (excluding connections establishing)

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Oct 21, 2016 at 1:07 PM, Tomas Vondra wrote: > On 10/21/2016 08:13 AM, Amit Kapila wrote: >> >> On Fri, Oct 21, 2016 at 6:31 AM, Robert Haas >> wrote: >>> >>> On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra >>> wrote: > > I then started a run at 96 clients which I accidentally killed shortly > before it was scheduled to finish, but the results are not much > different; there is no hint of the runaway CLogControlLock contention > that Dilip sees on power2. > What shared_buffer size were you using? I assume the data set fit into shared buffers, right? >>> >>> >>> 8GB. >>> FWIW as I explained in the lengthy post earlier today, I can actually reproduce the significant CLogControlLock contention (and the patches do reduce it), even on x86_64. >>> >>> >>> /me goes back, rereads post. Sorry, I didn't look at this carefully >>> the first time. >>> For example consider these two tests: * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip However, it seems I can also reproduce fairly bad regressions, like for example this case with data set exceeding shared_buffers: * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip >>> >>> >>> I'm not sure how seriously we should take the regressions. I mean, >>> what I see there is that CLogControlLock contention goes down by about >>> 50% -- which is the point of the patch -- and WALWriteLock contention >>> goes up dramatically -- which sucks, but can't really be blamed on the >>> patch except in the indirect sense that a backend can't spend much >>> time waiting for A if it's already spending all of its time waiting >>> for B. >>> >> >> Right, I think not only WALWriteLock, but contention on other locks >> also goes up as you can see in below table. I think there is nothing >> much we can do for that with this patch. One thing which is unclear >> is why on unlogged tests it is showing WALWriteLock? >> > > Well, although we don't write the table data to the WAL, we still need to > write commits and other stuff, right? > We do need to write commit, but do we need to flush it immediately to WAL for unlogged tables? It seems we allow WALWriter to do that, refer logic in RecordTransactionCommit. And on scale 3000 (which exceeds the > 16GB shared buffers in this case), there's a continuous stream of dirty > pages (not to WAL, but evicted from shared buffers), so iostat looks like > this: > > timetps wr_sec/s avgrq-sz avgqu-sz await %util > 08:48:21 81654 1367483 16.75 127264.60 1294.80 97.41 > 08:48:31 41514697516 16.80 103271.11 3015.01 97.64 > 08:48:41 78892 1359779 17.24 97308.42928.36 96.76 > 08:48:51 58735978475 16.66 92303.00 1472.82 95.92 > 08:49:01 62441 1068605 17.11 78482.71 1615.56 95.57 > 08:49:11 55571945365 17.01 113672.62 1923.37 98.07 > 08:49:21 69016 1161586 16.83 87055.66 1363.05 95.53 > 08:49:31 54552913461 16.74 98695.87 1761.30 97.84 > > That's ~500-600 MB/s of continuous writes. I'm sure the storage could handle > more than this (will do some testing after the tests complete), but surely > the WAL has to compete for bandwidth (it's on the same volume / devices). > Another thing is that we only have 8 WAL insert locks, and maybe that leads > to contention with such high client counts. > Yeah, quite possible, but I don't think increasing that would benefit in general, because while writing WAL we need to take all the wal_insert locks. In any case, I think that is a separate problem to study. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
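To make the RecordTransactionCommit() point concrete: with synchronous_commit=off the committing backend still inserts a commit record (hence wal_insert and, indirectly, WALWriteLock traffic even for unlogged tables), but it does not flush it itself; it only advertises the LSN so the WAL writer can flush it later. A condensed paraphrase of that branch in xact.c, with secondary conditions (forceSyncCommit, pending relation deletions, etc.) omitted:

/* Condensed paraphrase of the flush decision in RecordTransactionCommit()
 * (xact.c); not the exact code, several conditions are omitted. */
if (wrote_xlog && markXidCommitted &&
    synchronous_commit > SYNCHRONOUS_COMMIT_OFF)
{
    /* Synchronous commit: flush WAL up to our commit record before the
     * transaction reports success to the client. */
    XLogFlush(XactLastRecEnd);
}
else
{
    /* Asynchronous commit (synchronous_commit = off): no flush here; just
     * record how far the WAL writer eventually has to write and flush.
     * The commit record itself was still inserted by this backend. */
    XLogSetAsyncXactLSN(XactLastRecEnd);
}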
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/21/2016 08:13 AM, Amit Kapila wrote:

On Fri, Oct 21, 2016 at 6:31 AM, Robert Haas wrote:

On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra wrote:

I then started a run at 96 clients which I accidentally killed shortly before it was scheduled to finish, but the results are not much different; there is no hint of the runaway CLogControlLock contention that Dilip sees on power2.

What shared_buffer size were you using? I assume the data set fit into shared buffers, right?

8GB.

FWIW as I explained in the lengthy post earlier today, I can actually reproduce the significant CLogControlLock contention (and the patches do reduce it), even on x86_64.

/me goes back, rereads post. Sorry, I didn't look at this carefully the first time.

For example consider these two tests:

* http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
* http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip

However, it seems I can also reproduce fairly bad regressions, like for example this case with data set exceeding shared_buffers:

* http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip

I'm not sure how seriously we should take the regressions. I mean, what I see there is that CLogControlLock contention goes down by about 50% -- which is the point of the patch -- and WALWriteLock contention goes up dramatically -- which sucks, but can't really be blamed on the patch except in the indirect sense that a backend can't spend much time waiting for A if it's already spending all of its time waiting for B.

Right, I think not only WALWriteLock, but contention on other locks also goes up as you can see in below table. I think there is nothing much we can do for that with this patch. One thing which is unclear is why on unlogged tests it is showing WALWriteLock?

Well, although we don't write the table data to the WAL, we still need to write commits and other stuff, right?

And on scale 3000 (which exceeds the 16GB shared buffers in this case), there's a continuous stream of dirty pages (not to WAL, but evicted from shared buffers), so iostat looks like this:

    time       tps  wr_sec/s  avgrq-sz   avgqu-sz    await  %util
08:48:21     81654   1367483     16.75  127264.60  1294.80  97.41
08:48:31     41514    697516     16.80  103271.11  3015.01  97.64
08:48:41     78892   1359779     17.24   97308.42   928.36  96.76
08:48:51     58735    978475     16.66   92303.00  1472.82  95.92
08:49:01     62441   1068605     17.11   78482.71  1615.56  95.57
08:49:11     55571    945365     17.01  113672.62  1923.37  98.07
08:49:21     69016   1161586     16.83   87055.66  1363.05  95.53
08:49:31     54552    913461     16.74   98695.87  1761.30  97.84

That's ~500-600 MB/s of continuous writes. I'm sure the storage could handle more than this (will do some testing after the tests complete), but surely the WAL has to compete for bandwidth (it's on the same volume / devices). Another thing is that we only have 8 WAL insert locks, and maybe that leads to contention with such high client counts.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
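On the "only 8 WAL insert locks" point: the count is a compile-time constant in xlog.c, and raising it is not a free win because flushing WAL first has to wait for every in-progress insertion to finish, which means examining all of the insertion locks. A rough sketch of that dependency, assuming the 9.6 code; the constant is real, but the loop is a simplified paraphrase of WaitXLogInsertionsToFinish(), with an invented helper (wait_for_insertions_below) standing in for the per-lock wait:

/* xlog.c: the number of WAL insertion locks is fixed at compile time */
#define NUM_XLOGINSERT_LOCKS  8

typedef unsigned long long XLogRecPtr;   /* stand-in for the real typedef */

/* invented helper: wait until the insertion protected by lock 'lockno' has
 * either finished or advanced past 'upto', and return its current position
 * (0 if the lock is not held) */
extern XLogRecPtr wait_for_insertions_below(int lockno, XLogRecPtr upto);

/*
 * Simplified paraphrase of WaitXLogInsertionsToFinish(): before WAL can be
 * flushed up to 'upto', every insertion slot has to be checked, so the cost
 * of preparing a flush grows with NUM_XLOGINSERT_LOCKS.  That is the
 * trade-off behind "just add more insert locks".
 */
static XLogRecPtr
wait_xlog_insertions_to_finish(XLogRecPtr upto)
{
    XLogRecPtr finished = upto;

    for (int i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
    {
        XLogRecPtr insertingat = wait_for_insertions_below(i, upto);

        if (insertingat != 0 && insertingat < finished)
            finished = insertingat;     /* flush can only cover up to here */
    }
    return finished;
}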
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Oct 21, 2016 at 6:31 AM, Robert Haas wrote:
> On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra
> wrote:
>>> I then started a run at 96 clients which I accidentally killed shortly
>>> before it was scheduled to finish, but the results are not much
>>> different; there is no hint of the runaway CLogControlLock contention
>>> that Dilip sees on power2.
>>>
>> What shared_buffer size were you using? I assume the data set fit into
>> shared buffers, right?
>
> 8GB.
>
>> FWIW as I explained in the lengthy post earlier today, I can actually
>> reproduce the significant CLogControlLock contention (and the patches do
>> reduce it), even on x86_64.
>
> /me goes back, rereads post. Sorry, I didn't look at this carefully
> the first time.
>
>> For example consider these two tests:
>>
>> * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
>> * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip
>>
>> However, it seems I can also reproduce fairly bad regressions, like for
>> example this case with data set exceeding shared_buffers:
>>
>> * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip
>
> I'm not sure how seriously we should take the regressions. I mean,
> what I see there is that CLogControlLock contention goes down by about
> 50% -- which is the point of the patch -- and WALWriteLock contention
> goes up dramatically -- which sucks, but can't really be blamed on the
> patch except in the indirect sense that a backend can't spend much
> time waiting for A if it's already spending all of its time waiting
> for B.
>

Right, I think not only WALWriteLock, but contention on other locks also goes up as you can see in below table. I think there is nothing much we can do for that with this patch. One thing which is unclear is why on unlogged tests it is showing WALWriteLock?

              test               | clients | wait_event_type |   wait_event    | master | granular_locking | no_content_lock | group_update
---------------------------------+---------+-----------------+-----------------+--------+------------------+-----------------+--------------
 pgbench-3000-unlogged-sync-skip |      72 | LWLockNamed     | CLogControlLock | 217012 |            37326 |           32288 |        12040
 pgbench-3000-unlogged-sync-skip |      72 | LWLockNamed     | WALWriteLock    |  13188 |           104183 |          123359 |       103267
 pgbench-3000-unlogged-sync-skip |      72 | LWLockTranche   | buffer_content  |  10532 |            65880 |           57007 |        86176
 pgbench-3000-unlogged-sync-skip |      72 | LWLockTranche   | wal_insert      |   9280 |            85917 |          109472 |        99609
 pgbench-3000-unlogged-sync-skip |      72 | LWLockTranche   | clog            |   4623 |            25692 |           10422 |        11755

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 20, 2016 at 9:15 PM, Robert Haas wrote:
> So here's my theory. The whole reason why Tomas is having difficulty
> seeing any big effect from these patches is because he's testing on
> x86. When Dilip tests on x86, he doesn't see a big effect either,
> regardless of workload. But when Dilip tests on POWER, which I think
> is where he's mostly been testing, he sees a huge effect, because for
> some reason POWER has major problems with this lock that don't exist
> on x86.

Right, because on POWER we can see big contention on ClogControlLock with 300 scale factor, even at 96 client count, but on X86 with 300 scale factor there is almost no contention on ClogControlLock. However, at 1000 scale factor we can see significant contention on ClogControlLock on the X86 machine.

I want to test on POWER with 1000 scale factor to see whether contention on ClogControlLock becomes much worse. I will run this test and post the results.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra wrote: > In the results you've posted on 10/12, you've mentioned a regression with 32 > clients, where you got 52k tps on master but only 48k tps with the patch (so > ~10% difference). I have no idea what scale was used for those tests, That test was with scale factor 300 on POWER 4 socket machine. I think I need to repeat this test with multiple reading to confirm it was regression or run to run variation. I will do that soon and post the results. > and I > see no such regression in the current results (but you only report results > for some of the client counts). This test is on X86 8 socket machine, At 1000 scale factor I have given reading with all client counts (32,64,96,192), but at 300 scale factor I posted only with 192 because on this machine (X86 8 socket machine) I did not see much load on ClogControlLock at 300 scale factor. > > Also, which of the proposed patches have you been testing? I tested with GroupLock patch. > Can you collect and share a more complete set of data, perhaps based on the > scripts I use to do tests on the large machine with 36/72 cores, available > at https://bitbucket.org/tvondra/hp05-results ? I think from my last run I did not share data for -> X86 8 socket machine, 300 scale factor, 32,64,96 client. I already have those data so I ma sharing it. (Please let me know if you want to see at some other client count, for that I need to run another test.) Head: scaling factor: 300 query mode: prepared number of clients: 32 number of threads: 32 duration: 1800 s number of transactions actually processed: 77233356 latency average: 0.746 ms tps = 42907.363243 (including connections establishing) tps = 42907.546190 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 300_32_ul.txt 111757 | 3666 1289 LWLockNamed | ProcArrayLock 1142 Lock| transactionid 318 LWLockNamed | CLogControlLock 299 Lock| extend 109 LWLockNamed | XidGenLock 70 LWLockTranche | buffer_content 35 Lock| tuple 29 LWLockTranche | lock_manager 14 LWLockTranche | wal_insert 1 Tuples only is on. 1 LWLockNamed | CheckpointerCommLock Group Lock Patch: scaling factor: 300 query mode: prepared number of clients: 32 number of threads: 32 duration: 1800 s number of transactions actually processed: 77544028 latency average: 0.743 ms tps = 43079.783906 (including connections establishing) tps = 43079.960331 (excluding connections establishing 112209 | 3718 1402 LWLockNamed | ProcArrayLock 1070 Lock| transactionid 245 LWLockNamed | CLogControlLock 188 Lock| extend 80 LWLockNamed | XidGenLock 76 LWLockTranche | buffer_content 39 LWLockTranche | lock_manager 31 Lock| tuple 7 LWLockTranche | wal_insert 1 Tuples only is on. 1 LWLockTranche | buffer_mapping Head: number of clients: 64 number of threads: 64 duration: 1800 s number of transactions actually processed: 76211698 latency average: 1.512 ms tps = 42339.731054 (including connections establishing) tps = 42339.930464 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 300_64_ul.txt 215734 | 5106 Lock| transactionid 3754 LWLockNamed | ProcArrayLock 3669 3267 LWLockNamed | CLogControlLock 661 Lock| extend 339 LWLockNamed | XidGenLock 310 Lock| tuple 289 LWLockTranche | buffer_content 205 LWLockTranche | lock_manager 50 LWLockTranche | wal_insert 2 LWLockTranche | buffer_mapping 1 Tuples only is on. 
1 LWLockTranche | proc GroupLock patch: scaling factor: 300 query mode: prepared number of clients: 64 number of threads: 64 duration: 1800 s number of transactions actually processed: 76629309 latency average: 1.503 ms tps = 42571.704635 (including connections establishing) tps = 42571.905157 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 300_64_ul.txt 217840 | 5197 Lock| transactionid 3744 LWLockNamed | ProcArrayLock 3663 966 Lock| extend 849 LWLockNamed | CLogControlLock 372 Lock| tuple 305 LWLockNamed | XidGenLock 199 LWLockTranche | buffer_content 184 LWLockTranche | lock_manager 35 LWLockTranche | wal_insert 1 Tuples only is on. 1 LWLockTranche | proc 1 LWLockTranche | buffer_mapping Head: scaling factor: 300 query mode: prepared number of clients: 96 number of threads: 96 duration: 1800 s number of transactions actually processed: 77663593 latency average: 2.225 ms tps = 43145.624864 (including connections establishing) tps = 43145.838167 (excluding connections establishing) 302317 | 18836 Lo
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra wrote: >> I then started a run at 96 clients which I accidentally killed shortly >> before it was scheduled to finish, but the results are not much >> different; there is no hint of the runaway CLogControlLock contention >> that Dilip sees on power2. >> > What shared_buffer size were you using? I assume the data set fit into > shared buffers, right? 8GB. > FWIW as I explained in the lengthy post earlier today, I can actually > reproduce the significant CLogControlLock contention (and the patches do > reduce it), even on x86_64. /me goes back, rereads post. Sorry, I didn't look at this carefully the first time. > For example consider these two tests: > > * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync > * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip > > However, it seems I can also reproduce fairly bad regressions, like for > example this case with data set exceeding shared_buffers: > > * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip I'm not sure how seriously we should take the regressions. I mean, what I see there is that CLogControlLock contention goes down by about 50% -- which is the point of the patch -- and WALWriteLock contention goes up dramatically -- which sucks, but can't really be blamed on the patch except in the indirect sense that a backend can't spend much time waiting for A if it's already spending all of its time waiting for B. It would be nice to know why it happened, but we shouldn't allow CLogControlLock to act as an admission control facility for WALWriteLock (I think). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/20/2016 07:59 PM, Robert Haas wrote: On Thu, Oct 20, 2016 at 11:45 AM, Robert Haas wrote: On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar wrote: On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas wrote: >> ... So here's my theory. The whole reason why Tomas is having difficulty seeing any big effect from these patches is because he's testing on x86. When Dilip tests on x86, he doesn't see a big effect either, regardless of workload. But when Dilip tests on POWER, which I think is where he's mostly been testing, he sees a huge effect, because for some reason POWER has major problems with this lock that don't exist on x86. If that's so, then we ought to be able to reproduce the big gains on hydra, a community POWER server. In fact, I think I'll go run a quick test over there right now... And ... nope. I ran a 30-minute pgbench test on unpatched master using unlogged tables at scale factor 300 with 64 clients and got these results: 14 LWLockTranche | wal_insert 36 LWLockTranche | lock_manager 45 LWLockTranche | buffer_content 223 Lock| tuple 527 LWLockNamed | CLogControlLock 921 Lock| extend 1195 LWLockNamed | XidGenLock 1248 LWLockNamed | ProcArrayLock 3349 Lock| transactionid 85957 Client | ClientRead 135935 | I then started a run at 96 clients which I accidentally killed shortly before it was scheduled to finish, but the results are not much different; there is no hint of the runaway CLogControlLock contention that Dilip sees on power2. What shared_buffer size were you using? I assume the data set fit into shared buffers, right? FWIW as I explained in the lengthy post earlier today, I can actually reproduce the significant CLogControlLock contention (and the patches do reduce it), even on x86_64. For example consider these two tests: * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip However, it seems I can also reproduce fairly bad regressions, like for example this case with data set exceeding shared_buffers: * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 20, 2016 at 11:45 AM, Robert Haas wrote: > On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar wrote: >> On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas wrote: >>> I agree with these conclusions. I had a chance to talk with Andres >>> this morning at Postgres Vision and based on that conversation I'd >>> like to suggest a couple of additional tests: >>> >>> 1. Repeat this test on x86. In particular, I think you should test on >>> the EnterpriseDB server cthulhu, which is an 8-socket x86 server. >> >> I have done my test on cthulhu, basic difference is that In POWER we >> saw ClogControlLock on top at 96 and more client with 300 scale >> factor. But, on cthulhu at 300 scale factor transactionid lock is >> always on top. So I repeated my test with 1000 scale factor as well on >> cthulhu. > > So the upshot appears to be that this problem is a lot worse on power2 > than cthulhu, which suggests that this is architecture-dependent. I > guess it could also be kernel-dependent, but it doesn't seem likely, > because: > > power2: Red Hat Enterprise Linux Server release 7.1 (Maipo), > 3.10.0-229.14.1.ael7b.ppc64le > cthulhu: CentOS Linux release 7.2.1511 (Core), 3.10.0-229.7.2.el7.x86_64 > > So here's my theory. The whole reason why Tomas is having difficulty > seeing any big effect from these patches is because he's testing on > x86. When Dilip tests on x86, he doesn't see a big effect either, > regardless of workload. But when Dilip tests on POWER, which I think > is where he's mostly been testing, he sees a huge effect, because for > some reason POWER has major problems with this lock that don't exist > on x86. > > If that's so, then we ought to be able to reproduce the big gains on > hydra, a community POWER server. In fact, I think I'll go run a quick > test over there right now... And ... nope. I ran a 30-minute pgbench test on unpatched master using unlogged tables at scale factor 300 with 64 clients and got these results: 14 LWLockTranche | wal_insert 36 LWLockTranche | lock_manager 45 LWLockTranche | buffer_content 223 Lock| tuple 527 LWLockNamed | CLogControlLock 921 Lock| extend 1195 LWLockNamed | XidGenLock 1248 LWLockNamed | ProcArrayLock 3349 Lock| transactionid 85957 Client | ClientRead 135935 | I then started a run at 96 clients which I accidentally killed shortly before it was scheduled to finish, but the results are not much different; there is no hint of the runaway CLogControlLock contention that Dilip sees on power2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar wrote: > On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas wrote: >> I agree with these conclusions. I had a chance to talk with Andres >> this morning at Postgres Vision and based on that conversation I'd >> like to suggest a couple of additional tests: >> >> 1. Repeat this test on x86. In particular, I think you should test on >> the EnterpriseDB server cthulhu, which is an 8-socket x86 server. > > I have done my test on cthulhu, basic difference is that In POWER we > saw ClogControlLock on top at 96 and more client with 300 scale > factor. But, on cthulhu at 300 scale factor transactionid lock is > always on top. So I repeated my test with 1000 scale factor as well on > cthulhu. So the upshot appears to be that this problem is a lot worse on power2 than cthulhu, which suggests that this is architecture-dependent. I guess it could also be kernel-dependent, but it doesn't seem likely, because: power2: Red Hat Enterprise Linux Server release 7.1 (Maipo), 3.10.0-229.14.1.ael7b.ppc64le cthulhu: CentOS Linux release 7.2.1511 (Core), 3.10.0-229.7.2.el7.x86_64 So here's my theory. The whole reason why Tomas is having difficulty seeing any big effect from these patches is because he's testing on x86. When Dilip tests on x86, he doesn't see a big effect either, regardless of workload. But when Dilip tests on POWER, which I think is where he's mostly been testing, he sees a huge effect, because for some reason POWER has major problems with this lock that don't exist on x86. If that's so, then we ought to be able to reproduce the big gains on hydra, a community POWER server. In fact, I think I'll go run a quick test over there right now... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
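One way to picture why the same lock can behave so differently across architectures: since 9.5, acquiring an LWLock such as CLogControlLock boils down to an atomic compare-and-swap on a single shared 32-bit state word (LWLockAttemptLock in lwlock.c), so every acquisition attempt from every socket has to bounce that one cache line around. How expensive that ping-pong is under heavy contention depends on the atomics implementation and the interconnect, which plausibly differs a lot between an 8-socket x86 box and a large POWER box. The toy below is not the PostgreSQL code, just the shape of the operation, written with C11 atomics and an invented flag value:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EXCLUSIVE_FLAG  ((uint32_t) 1 << 31)    /* invented; the real code uses
                                                 * LW_VAL_EXCLUSIVE and friends */

/* Toy model of an exclusive lightweight-lock acquisition attempt: a CAS on a
 * shared state word.  Under contention, every caller on every socket is
 * hammering the same cache line, and the cost of that is architecture- and
 * interconnect-dependent. */
static bool
lwlock_attempt_exclusive(_Atomic uint32_t *state)
{
    uint32_t old = atomic_load(state);

    while ((old & EXCLUSIVE_FLAG) == 0)
    {
        if (atomic_compare_exchange_weak(state, &old, old | EXCLUSIVE_FLAG))
            return true;            /* got the lock */
        /* CAS failed: 'old' was refreshed, retry unless now held exclusively */
    }
    return false;                   /* already held; caller queues and sleeps */
}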
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/20/2016 09:36 AM, Dilip Kumar wrote: On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas wrote: I agree with these conclusions. I had a chance to talk with Andres this morning at Postgres Vision and based on that conversation I'd like to suggest a couple of additional tests: 1. Repeat this test on x86. In particular, I think you should test on the EnterpriseDB server cthulhu, which is an 8-socket x86 server. I have done my test on cthulhu, basic difference is that In POWER we saw ClogControlLock on top at 96 and more client with 300 scale factor. But, on cthulhu at 300 scale factor transactionid lock is always on top. So I repeated my test with 1000 scale factor as well on cthulhu. All configuration is same as my last test. Test with 1000 scale factor - Test1: number of clients: 192 Head: tps = 21206.108856 (including connections establishing) tps = 21206.245441 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt 310489 LWLockNamed | CLogControlLock 296152 | 35537 Lock| transactionid 15821 LWLockTranche | buffer_mapping 10342 LWLockTranche | buffer_content 8427 LWLockTranche | clog 3961 3165 Lock| extend 2861 Lock| tuple 2781 LWLockNamed | ProcArrayLock 1104 LWLockNamed | XidGenLock 745 LWLockTranche | lock_manager 371 LWLockNamed | CheckpointerCommLock 70 LWLockTranche | wal_insert 5 BufferPin | BufferPin 3 LWLockTranche | proc Patch: tps = 28725.038933 (including connections establishing) tps = 28725.367102 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt 540061 | 57810 LWLockNamed | CLogControlLock 36264 LWLockTranche | buffer_mapping 29976 Lock| transactionid 4770 Lock| extend 4735 LWLockTranche | clog 4479 LWLockNamed | ProcArrayLock 4006 3955 LWLockTranche | buffer_content 2505 LWLockTranche | lock_manager 2179 Lock| tuple 1977 LWLockNamed | XidGenLock 905 LWLockNamed | CheckpointerCommLock 222 LWLockTranche | wal_insert 8 LWLockTranche | proc Test2: number of clients: 96 Head: tps = 25447.861572 (including connections establishing) tps = 25448.012739 (excluding connections establishing) 261611 | 69604 LWLockNamed | CLogControlLock 6119 Lock| transactionid 4008 2874 LWLockTranche | buffer_mapping 2578 LWLockTranche | buffer_content 2355 LWLockNamed | ProcArrayLock 1245 Lock| extend 1168 LWLockTranche | clog 232 Lock| tuple 217 LWLockNamed | CheckpointerCommLock 160 LWLockNamed | XidGenLock 158 LWLockTranche | lock_manager 78 LWLockTranche | wal_insert 5 BufferPin | BufferPin Patch: tps = 32708.368938 (including connections establishing) tps = 32708.765989 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 1000_96_ul.txt 326601 | 7471 LWLockNamed | CLogControlLock 5387 Lock| transactionid 4018 3331 LWLockTranche | buffer_mapping 3144 LWLockNamed | ProcArrayLock 1372 Lock| extend 722 LWLockTranche | buffer_content 393 LWLockNamed | XidGenLock 237 LWLockTranche | lock_manager 234 Lock| tuple 194 LWLockTranche | clog 96 Lock| relation 88 LWLockTranche | wal_insert 34 LWLockNamed | CheckpointerCommLock Test3: number of clients: 64 Head: tps = 28264.194438 (including connections establishing) tps = 28264.336270 (excluding connections establishing) 218264 | 10314 LWLockNamed | CLogControlLock 4019 2067 Lock| transactionid 1950 LWLockTranche | buffer_mapping 1879 LWLockNamed | ProcArrayLock 592 Lock| extend 565 LWLockTranche | buffer_content 222 LWLockNamed | XidGenLock 143 LWLockTranche | clog 131 LWLockNamed | CheckpointerCommLock 63 LWLockTranche | lock_manager 52 Lock| tuple 35 LWLockTranche | wal_insert 
Patch: tps = 27906.376194 (including connections establishing) tps = 27906.531392 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 1000_64_ul.txt 228108 | 4039 2294 Lock| transactionid 2116 LWLockTranche | buffer_mapping 1757 LWLockNamed | ProcArrayLock 1553 LWLockNamed | CLogControlLock 800 Lock| extend 403 LWLockTranche | buffer_content 92 LWLockNamed | XidGenLock 74 LWLockTranche | lock_manager 42 Lock| tuple 35 LWLockTranche | wal_insert 34 LWLockTranche | clog 14 LWLockNamed
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas wrote: > I agree with these conclusions. I had a chance to talk with Andres > this morning at Postgres Vision and based on that conversation I'd > like to suggest a couple of additional tests: > > 1. Repeat this test on x86. In particular, I think you should test on > the EnterpriseDB server cthulhu, which is an 8-socket x86 server. I have done my test on cthulhu, basic difference is that In POWER we saw ClogControlLock on top at 96 and more client with 300 scale factor. But, on cthulhu at 300 scale factor transactionid lock is always on top. So I repeated my test with 1000 scale factor as well on cthulhu. All configuration is same as my last test. Test with 1000 scale factor - Test1: number of clients: 192 Head: tps = 21206.108856 (including connections establishing) tps = 21206.245441 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt 310489 LWLockNamed | CLogControlLock 296152 | 35537 Lock| transactionid 15821 LWLockTranche | buffer_mapping 10342 LWLockTranche | buffer_content 8427 LWLockTranche | clog 3961 3165 Lock| extend 2861 Lock| tuple 2781 LWLockNamed | ProcArrayLock 1104 LWLockNamed | XidGenLock 745 LWLockTranche | lock_manager 371 LWLockNamed | CheckpointerCommLock 70 LWLockTranche | wal_insert 5 BufferPin | BufferPin 3 LWLockTranche | proc Patch: tps = 28725.038933 (including connections establishing) tps = 28725.367102 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt 540061 | 57810 LWLockNamed | CLogControlLock 36264 LWLockTranche | buffer_mapping 29976 Lock| transactionid 4770 Lock| extend 4735 LWLockTranche | clog 4479 LWLockNamed | ProcArrayLock 4006 3955 LWLockTranche | buffer_content 2505 LWLockTranche | lock_manager 2179 Lock| tuple 1977 LWLockNamed | XidGenLock 905 LWLockNamed | CheckpointerCommLock 222 LWLockTranche | wal_insert 8 LWLockTranche | proc Test2: number of clients: 96 Head: tps = 25447.861572 (including connections establishing) tps = 25448.012739 (excluding connections establishing) 261611 | 69604 LWLockNamed | CLogControlLock 6119 Lock| transactionid 4008 2874 LWLockTranche | buffer_mapping 2578 LWLockTranche | buffer_content 2355 LWLockNamed | ProcArrayLock 1245 Lock| extend 1168 LWLockTranche | clog 232 Lock| tuple 217 LWLockNamed | CheckpointerCommLock 160 LWLockNamed | XidGenLock 158 LWLockTranche | lock_manager 78 LWLockTranche | wal_insert 5 BufferPin | BufferPin Patch: tps = 32708.368938 (including connections establishing) tps = 32708.765989 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 1000_96_ul.txt 326601 | 7471 LWLockNamed | CLogControlLock 5387 Lock| transactionid 4018 3331 LWLockTranche | buffer_mapping 3144 LWLockNamed | ProcArrayLock 1372 Lock| extend 722 LWLockTranche | buffer_content 393 LWLockNamed | XidGenLock 237 LWLockTranche | lock_manager 234 Lock| tuple 194 LWLockTranche | clog 96 Lock| relation 88 LWLockTranche | wal_insert 34 LWLockNamed | CheckpointerCommLock Test3: number of clients: 64 Head: tps = 28264.194438 (including connections establishing) tps = 28264.336270 (excluding connections establishing) 218264 | 10314 LWLockNamed | CLogControlLock 4019 2067 Lock| transactionid 1950 LWLockTranche | buffer_mapping 1879 LWLockNamed | ProcArrayLock 592 Lock| extend 565 LWLockTranche | buffer_content 222 LWLockNamed | XidGenLock 143 LWLockTranche | clog 131 LWLockNamed | CheckpointerCommLock 63 LWLockTranche | lock_manager 52 Lock| tuple 35 LWLockTranche | wal_insert Patch: tps = 27906.376194 
(including connections establishing) tps = 27906.531392 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 1000_64_ul.txt 228108 | 4039 2294 Lock| transactionid 2116 LWLockTranche | buffer_mapping 1757 LWLockNamed | ProcArrayLock 1553 LWLockNamed | CLogControlLock 800 Lock| extend 403 LWLockTranche | buffer_content 92 LWLockNamed | XidGenLock 74 LWLockTranche | lock_manager 42 Lock| tuple 35 LWLockTranche | wal_insert 34 LWLockTranche | clog 14 LWLockNamed | CheckpointerCommLock Test4: numb
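
For reference, wait-event counts like the ones above are normally produced by polling pg_stat_activity at a fixed interval for the whole run and then aggregating the samples; rows with an empty event name correspond to sessions that were not waiting at sample time. A minimal sketch, assuming the PostgreSQL 9.6+ wait-event columns; the interval, duration and file names are illustrative and not the exact script used for these runs:

    # Sample wait events once per second for a 30-minute run.
    for i in $(seq 1 1800); do
        psql -At -F' | ' -c "SELECT wait_event_type, wait_event FROM pg_stat_activity" postgres
        sleep 1
    done > wait_event_samples.txt

    # Collapse the samples into "count | type | event" lines like those quoted above.
    sort wait_event_samples.txt | uniq -c | sort -rn > 1000_192_ul.txt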
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 13, 2016 at 7:53 AM, Tomas Vondra wrote: > On 10/12/2016 08:55 PM, Robert Haas wrote: >> On Wed, Oct 12, 2016 at 3:21 AM, Dilip Kumar wrote: >>> I think at higher client count from client count 96 onwards contention >>> on CLogControlLock is clearly visible and which is completely solved >>> with group lock patch. >>> >>> And at lower client count 32,64 contention on CLogControlLock is not >>> significant hence we can not see any gain with group lock patch. >>> (though we can see some contention on CLogControlLock is reduced at 64 >>> clients.) >> >> I agree with these conclusions. I had a chance to talk with Andres >> this morning at Postgres Vision and based on that conversation I'd >> like to suggest a couple of additional tests: >> >> 1. Repeat this test on x86. In particular, I think you should test on >> the EnterpriseDB server cthulhu, which is an 8-socket x86 server. >> >> 2. Repeat this test with a mixed read-write workload, like -b >> tpcb-like@1 -b select-only@9 >> > > FWIW, I'm already running similar benchmarks on an x86 machine with 72 > cores (144 with HT). It's "just" a 4-socket system, but the results I > got so far seem quite interesting. The tooling and results (pushed > incrementally) are available here: > > https://bitbucket.org/tvondra/hp05-results/overview > > The tooling is completely automated, and it also collects various stats, > like for example the wait event. So perhaps we could simply run it on > ctulhu and get comparable results, and also more thorough data sets than > just snippets posted to the list? > > There's also a bunch of reports for the 5 already completed runs > > - dilip-300-logged-sync > - dilip-300-unlogged-sync > - pgbench-300-logged-sync-skip > - pgbench-300-unlogged-sync-noskip > - pgbench-300-unlogged-sync-skip > > The name identifies the workload type, scale and whether the tables are > wal-logged (for pgbench the "skip" means "-N" while "noskip" does > regular pgbench). > > For example the "reports/wait-events-count-patches.txt" compares the > wait even stats with different patches applied (and master): > > https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/wait-events-count-patches.txt?at=master&fileviewer=file-view-default > > and average tps (from 3 runs, 5 minutes each): > > https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/tps-avg-patches.txt?at=master&fileviewer=file-view-default > > There are certainly interesting bits. For example while the "logged" > case is dominated y WALWriteLock for most client counts, for large > client counts that's no longer true. > > Consider for example dilip-300-logged-sync results with 216 clients: > > wait_event | master | gran_lock | no_cont_lock | group_upd > +-+---+--+--- > CLogControlLock | 624566 |474261 | 458599 |225338 > WALWriteLock| 431106 |623142 | 619596 |699224 > | 331542 |358220 | 371393 |537076 > buffer_content | 261308 |134764 | 138664 |102057 > ClientRead | 59826 |100883 | 103609 |118379 > transactionid | 26966 | 23155 |23815 | 31700 > ProcArrayLock |3967 | 3852 | 4070 | 4576 > wal_insert |3948 | 10430 | 9513 | 12079 > clog|1710 | 4006 | 2443 | 925 > XidGenLock |1689 | 3785 | 4229 | 3539 > tuple | 965 | 617 | 655 | 840 > lock_manager| 300 | 571 | 619 | 802 > WALBufMappingLock | 168 | 140 | 158 | 147 > SubtransControlLock | 60 | 115 | 124 | 105 > > Clearly, CLOG is an issue here, and it's (slightly) improved by all the > patches (group_update performing the best). 
And with 288 clients (which > is 2x the number of virtual cores in the machine, so not entirely crazy) > you get this: > > wait_event | master | gran_lock | no_cont_lock | group_upd > +-+---+--+--- > CLogControlLock | 901670 |736822 | 728823 |398111 > buffer_content | 492637 |318129 | 319251 |270416 > WALWriteLock| 414371 |593804 | 589809 |656613 > | 380344 |452936 | 470178 |745790 > ClientRead | 60261 |111367 | 111391 |126151 > transactionid | 43627 | 34585 |35464 | 48679 > wal_insert |5423 | 29323 |25898 | 30191 > ProcArrayLock |4379 | 3918 | 4006 | 4582 > clog|2952 | 9135 | 5304 | 2514 > XidGenLock |2182 | 9488 | 8894
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/12/2016 08:55 PM, Robert Haas wrote: > On Wed, Oct 12, 2016 at 3:21 AM, Dilip Kumar wrote: >> I think at higher client count from client count 96 onwards contention >> on CLogControlLock is clearly visible and which is completely solved >> with group lock patch. >> >> And at lower client count 32,64 contention on CLogControlLock is not >> significant hence we can not see any gain with group lock patch. >> (though we can see some contention on CLogControlLock is reduced at 64 >> clients.) > > I agree with these conclusions. I had a chance to talk with Andres > this morning at Postgres Vision and based on that conversation I'd > like to suggest a couple of additional tests: > > 1. Repeat this test on x86. In particular, I think you should test on > the EnterpriseDB server cthulhu, which is an 8-socket x86 server. > > 2. Repeat this test with a mixed read-write workload, like -b > tpcb-like@1 -b select-only@9 > FWIW, I'm already running similar benchmarks on an x86 machine with 72 cores (144 with HT). It's "just" a 4-socket system, but the results I got so far seem quite interesting. The tooling and results (pushed incrementally) are available here: https://bitbucket.org/tvondra/hp05-results/overview The tooling is completely automated, and it also collects various stats, like for example the wait event. So perhaps we could simply run it on ctulhu and get comparable results, and also more thorough data sets than just snippets posted to the list? There's also a bunch of reports for the 5 already completed runs - dilip-300-logged-sync - dilip-300-unlogged-sync - pgbench-300-logged-sync-skip - pgbench-300-unlogged-sync-noskip - pgbench-300-unlogged-sync-skip The name identifies the workload type, scale and whether the tables are wal-logged (for pgbench the "skip" means "-N" while "noskip" does regular pgbench). For example the "reports/wait-events-count-patches.txt" compares the wait even stats with different patches applied (and master): https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/wait-events-count-patches.txt?at=master&fileviewer=file-view-default and average tps (from 3 runs, 5 minutes each): https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/tps-avg-patches.txt?at=master&fileviewer=file-view-default There are certainly interesting bits. For example while the "logged" case is dominated y WALWriteLock for most client counts, for large client counts that's no longer true. Consider for example dilip-300-logged-sync results with 216 clients: wait_event | master | gran_lock | no_cont_lock | group_upd +-+---+--+--- CLogControlLock | 624566 |474261 | 458599 |225338 WALWriteLock| 431106 |623142 | 619596 |699224 | 331542 |358220 | 371393 |537076 buffer_content | 261308 |134764 | 138664 |102057 ClientRead | 59826 |100883 | 103609 |118379 transactionid | 26966 | 23155 |23815 | 31700 ProcArrayLock |3967 | 3852 | 4070 | 4576 wal_insert |3948 | 10430 | 9513 | 12079 clog|1710 | 4006 | 2443 | 925 XidGenLock |1689 | 3785 | 4229 | 3539 tuple | 965 | 617 | 655 | 840 lock_manager| 300 | 571 | 619 | 802 WALBufMappingLock | 168 | 140 | 158 | 147 SubtransControlLock | 60 | 115 | 124 | 105 Clearly, CLOG is an issue here, and it's (slightly) improved by all the patches (group_update performing the best). 
And with 288 clients (which is 2x the number of virtual cores in the machine, so not entirely crazy) you get this: wait_event | master | gran_lock | no_cont_lock | group_upd +-+---+--+--- CLogControlLock | 901670 |736822 | 728823 |398111 buffer_content | 492637 |318129 | 319251 |270416 WALWriteLock| 414371 |593804 | 589809 |656613 | 380344 |452936 | 470178 |745790 ClientRead | 60261 |111367 | 111391 |126151 transactionid | 43627 | 34585 |35464 | 48679 wal_insert |5423 | 29323 |25898 | 30191 ProcArrayLock |4379 | 3918 | 4006 | 4582 clog|2952 | 9135 | 5304 | 2514 XidGenLock |2182 | 9488 | 8894 | 8595 tuple |2176 | 1288 | 1409 | 1821 lock_manager| 323 | 797 | 827 | 1006 WALBufMappingLock | 124 | 124 |
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Oct 12, 2016 at 3:21 AM, Dilip Kumar wrote: > I think at higher client count from client count 96 onwards contention > on CLogControlLock is clearly visible and which is completely solved > with group lock patch. > > And at lower client count 32,64 contention on CLogControlLock is not > significant hence we can not see any gain with group lock patch. > (though we can see some contention on CLogControlLock is reduced at 64 > clients.) I agree with these conclusions. I had a chance to talk with Andres this morning at Postgres Vision and based on that conversation I'd like to suggest a couple of additional tests: 1. Repeat this test on x86. In particular, I think you should test on the EnterpriseDB server cthulhu, which is an 8-socket x86 server. 2. Repeat this test with a mixed read-write workload, like -b tpcb-like@1 -b select-only@9 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
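
For reference, the mixed workload suggested in point 2 maps directly onto pgbench's built-in script weights. A hedged example invocation; the client count, duration and database name are placeholders:

    # ~10% tpcb-like (read-write) and ~90% select-only transactions
    pgbench -M prepared -j 8 -c 96 -T 1800 \
        -b tpcb-like@1 -b select-only@9 postgres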
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Oct 10, 2016 at 2:17 AM, Tomas Vondra wrote: > after testing each combination (every ~9 hours). Inspired by Robert's wait > event post a few days ago, I've added wait event sampling so that we can > perform similar analysis. (Neat idea!) I have done wait event test on for head vs group lock patch. I have used similar script what Robert has mentioned in below thread https://www.postgresql.org/message-id/ca+tgmoav9q5v5zgt3+wp_1tqjt6tgyxrwrdctrrwimc+zy7...@mail.gmail.com Test details and Results: Machine, POWER, 4 socket machine (machine details are attached in file.) 30-minute pgbench runs with configurations, had max_connections = 200, shared_buffers = 8GB, maintenance_work_mem = 4GB, synchronous_commit =off, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9, log_line_prefix = '%t [%p] max_wal_size = 40GB, log_checkpoints =on. Test1: unlogged table, 192 clients - On Head: tps = 44898.862257 (including connections establishing) tps = 44899.761934 (excluding connections establishing) 262092 LWLockNamed | CLogControlLock 224396 | 114510 Lock| transactionid 42908 Client | ClientRead 20610 Lock| tuple 13700 LWLockTranche | buffer_content 3637 2562 LWLockNamed | XidGenLock 2359 LWLockNamed | ProcArrayLock 1037 Lock| extend 948 LWLockTranche | lock_manager 46 LWLockTranche | wal_insert 12 BufferPin | BufferPin 4 LWLockTranche | buffer_mapping With Patch: tps = 77846.622956 (including connections establishing) tps = 77848.234046 (excluding connections establishing) 101832 Lock| transactionid 91358 Client | ClientRead 16691 LWLockNamed | XidGenLock 12467 Lock| tuple 6007 LWLockNamed | CLogControlLock 3640 3531 LWLockNamed | ProcArrayLock 3390 LWLockTranche | lock_manager 2683 Lock| extend 1112 LWLockTranche | buffer_content 72 LWLockTranche | wal_insert 8 LWLockTranche | buffer_mapping 2 LWLockTranche | proc 2 BufferPin | BufferPin Test2: unlogged table, 96 clients -- On head: tps = 58632.065563 (including connections establishing) tps = 58632.767384 (excluding connections establishing) 77039 LWLockNamed | CLogControlLock 39712 Client | ClientRead 18358 Lock| transactionid 4238 LWLockNamed | XidGenLock 3638 3518 LWLockTranche | buffer_content 2717 LWLockNamed | ProcArrayLock 1410 Lock| tuple 792 Lock| extend 182 LWLockTranche | lock_manager 30 LWLockTranche | wal_insert 3 LWLockTranche | buffer_mapping 1 Tuples only is on. 1 BufferPin | BufferPin With Patch: tps = 75204.166640 (including connections establishing) tps = 75204.922105 (excluding connections establishing) [dilip.kumar@power2 bin]$ cat out_300_96_ul.txt 261917 | 53407 Client | ClientRead 14994 Lock| transactionid 5258 LWLockNamed | XidGenLock 3660 3604 LWLockNamed | ProcArrayLock 2096 LWLockNamed | CLogControlLock 1102 Lock| tuple 823 Lock| extend 481 LWLockTranche | buffer_content 372 LWLockTranche | lock_manager 192 Lock| relation 65 LWLockTranche | wal_insert 6 LWLockTranche | buffer_mapping 1 Tuples only is on. 1 LWLockTranche | proc Test3: unlogged table, 64 clients -- On Head: tps = 66231.203018 (including connections establishing) tps = 66231.664990 (excluding connections establishing) 43446 Client | ClientRead 6992 LWLockNamed | CLogControlLock 4685 Lock| transactionid 3650 3381 LWLockNamed | ProcArrayLock 810 LWLockNamed | XidGenLock 734 Lock| extend 439 LWLockTranche | buffer_content 247 Lock| tuple 136 LWLockTranche | lock_manager 64 Lock| relation 24 LWLockTranche | wal_insert 2 LWLockTranche | buffer_mapping 1 Tuples only is on. 
With Patch: tps = 67294.042602 (including connections establishing) tps = 67294.532650 (excluding connections establishing) 28186 Client | ClientRead 3655 1172 LWLockNamed | ProcArrayLock 619 Lock| transactionid 289 LWLockNamed | CLogControlLock 237 Lock| extend 81 LWLockTranche | buffer_content 48 LWLockNamed | XidGenLock 28 LWLockTranche | lock_manager 23 Lock| tuple 6 LWLockTranche | wal_insert Test4: unlogged table, 32 clients Head: tps = 52320.190549 (including connections establishing)
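
For reference, the non-default settings listed at the top of this message, written out as a postgresql.conf excerpt. The log_line_prefix value appears truncated in the message, so the closing quote below is an assumption:

    max_connections = 200
    shared_buffers = 8GB
    maintenance_work_mem = 4GB
    synchronous_commit = off
    checkpoint_timeout = 15min
    checkpoint_completion_target = 0.9
    max_wal_size = 40GB
    log_checkpoints = on
    log_line_prefix = '%t [%p] '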
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/08/2016 07:47 AM, Amit Kapila wrote: On Fri, Oct 7, 2016 at 3:02 PM, Tomas Vondra wrote: > > ... > In total, I plan to test combinations of: (a) Dilip's workload and pgbench (regular and -N) (b) logged and unlogged tables (c) scale 300 and scale 3000 (both fits into RAM) (d) sync_commit=on/off sounds sensible. Thanks for doing the tests. FWIW I've started those tests on the big machine provided by Oleg and Alexander, an estimate to complete all the benchmarks is 9 days. The results will be pushed to https://bitbucket.org/tvondra/hp05-results/src after testing each combination (every ~9 hours). Inspired by Robert's wait event post a few days ago, I've added wait event sampling so that we can perform similar analysis. (Neat idea!) While messing with the kernel on the other machine I've managed to misconfigure it to the extent that it's not accessible anymore. I'll start similar benchmarks once I find someone with console access who can fix the boot. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Oct 7, 2016 at 3:02 PM, Tomas Vondra wrote: > > I got access to a large machine with 72/144 cores (thanks to Oleg and > Alexander from Postgres Professional), and I'm running the tests on that > machine too. > > Results from Dilip's workload (with scale 300, unlogged tables) look like > this: > > 32 64128 192224 256288 > master104943 128579 72167 100967 66631 97088 63767 > granular-locking 103415 141689 83780 120480 71847 115201 67240 > group-update 105343 144322 92229 130149 81247 126629 76638 > no-content-lock 103153 140568 80101 119185 70004 115386 66199 > > So there's some 20-30% improvement for >= 128 clients. > So here we see performance improvement starting at 64 clients, this is somewhat similar to what Dilip saw in his tests. > But what I find much more intriguing is the zig-zag behavior. I mean, 64 > clients give ~130k tps, 128 clients only give ~70k but 192 clients jump up > to >100k tps again, etc. > No clear answer. > FWIW I don't see any such behavior on pgbench, and all those tests were done > on the same cluster. > >>> With 4.5.5, results for the same benchmark look like this: >>> >>>64128192 >>> >>> master 35693 39822 42151 >>> granular-locking 35370 39409 41353 >>> no-content-lock36201 39848 42407 >>> group-update 35697 39893 42667 >>> >>> That seems like a fairly bad regression in kernel, although I have not >>> identified the feature/commit causing it (and it's also possible the >>> issue >>> lies somewhere else, of course). >>> >>> With regular pgbench, I see no improvement on any kernel version. For >>> example on 3.19 the results look like this: >>> >>>64128192 >>> >>> master 54661 61014 59484 >>> granular-locking 55904 62481 60711 >>> no-content-lock56182 62442 61234 >>> group-update 55019 61587 60485 >>> >> >> Are the above results with synchronous_commit=off? >> > > No, but I can do that. > >>> I haven't done much more testing (e.g. with -N to eliminate >>> collisions on branches) yet, let's see if it changes anything. >>> >> >> Yeah, let us see how it behaves with -N. Also, I think we could try >> at higher scale factor? >> > > Yes, I plan to do that. In total, I plan to test combinations of: > > (a) Dilip's workload and pgbench (regular and -N) > (b) logged and unlogged tables > (c) scale 300 and scale 3000 (both fits into RAM) > (d) sync_commit=on/off > sounds sensible. Thanks for doing the tests. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/05/2016 10:03 AM, Amit Kapila wrote: On Wed, Oct 5, 2016 at 12:05 PM, Tomas Vondra wrote: Hi, After collecting a lot more results from multiple kernel versions, I can confirm that I see a significant improvement with 128 and 192 clients, roughly by 30%: 64128192 master 62482 43181 50985 granular-locking 61701 59611 47483 no-content-lock62650 59819 47895 group-update 63702 64758 62596 But I only see this with Dilip's workload, and only with pre-4.3.0 kernels (the results above are from kernel 3.19). That appears positive. I got access to a large machine with 72/144 cores (thanks to Oleg and Alexander from Postgres Professional), and I'm running the tests on that machine too. Results from Dilip's workload (with scale 300, unlogged tables) look like this: 32 64128 192224 256288 master104943 128579 72167 100967 66631 97088 63767 granular-locking 103415 141689 83780 120480 71847 115201 67240 group-update 105343 144322 92229 130149 81247 126629 76638 no-content-lock 103153 140568 80101 119185 70004 115386 66199 So there's some 20-30% improvement for >= 128 clients. But what I find much more intriguing is the zig-zag behavior. I mean, 64 clients give ~130k tps, 128 clients only give ~70k but 192 clients jump up to >100k tps again, etc. FWIW I don't see any such behavior on pgbench, and all those tests were done on the same cluster. With 4.5.5, results for the same benchmark look like this: 64128192 master 35693 39822 42151 granular-locking 35370 39409 41353 no-content-lock36201 39848 42407 group-update 35697 39893 42667 That seems like a fairly bad regression in kernel, although I have not identified the feature/commit causing it (and it's also possible the issue lies somewhere else, of course). With regular pgbench, I see no improvement on any kernel version. For example on 3.19 the results look like this: 64128192 master 54661 61014 59484 granular-locking 55904 62481 60711 no-content-lock56182 62442 61234 group-update 55019 61587 60485 Are the above results with synchronous_commit=off? No, but I can do that. I haven't done much more testing (e.g. with -N to eliminate collisions on branches) yet, let's see if it changes anything. Yeah, let us see how it behaves with -N. Also, I think we could try at higher scale factor? Yes, I plan to do that. In total, I plan to test combinations of: (a) Dilip's workload and pgbench (regular and -N) (b) logged and unlogged tables (c) scale 300 and scale 3000 (both fits into RAM) (d) sync_commit=on/off regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Oct 5, 2016 at 12:05 PM, Tomas Vondra wrote: > Hi, > > After collecting a lot more results from multiple kernel versions, I can > confirm that I see a significant improvement with 128 and 192 clients, > roughly by 30%: > >64128192 > > master 62482 43181 50985 > granular-locking 61701 59611 47483 > no-content-lock62650 59819 47895 > group-update 63702 64758 62596 > > But I only see this with Dilip's workload, and only with pre-4.3.0 kernels > (the results above are from kernel 3.19). > That appears positive. > With 4.5.5, results for the same benchmark look like this: > >64128192 > > master 35693 39822 42151 > granular-locking 35370 39409 41353 > no-content-lock36201 39848 42407 > group-update 35697 39893 42667 > > That seems like a fairly bad regression in kernel, although I have not > identified the feature/commit causing it (and it's also possible the issue > lies somewhere else, of course). > > With regular pgbench, I see no improvement on any kernel version. For > example on 3.19 the results look like this: > >64128192 > > master 54661 61014 59484 > granular-locking 55904 62481 60711 > no-content-lock56182 62442 61234 > group-update 55019 61587 60485 > Are the above results with synchronous_commit=off? > I haven't done much more testing (e.g. with -N to eliminate collisions on > branches) yet, let's see if it changes anything. > Yeah, let us see how it behaves with -N. Also, I think we could try at higher scale factor? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Hi,

After collecting a lot more results from multiple kernel versions, I can confirm that I see a significant improvement with 128 and 192 clients, roughly by 30%:

                        64     128     192
 master              62482   43181   50985
 granular-locking    61701   59611   47483
 no-content-lock     62650   59819   47895
 group-update        63702   64758   62596

But I only see this with Dilip's workload, and only with pre-4.3.0 kernels (the results above are from kernel 3.19).

With 4.5.5, results for the same benchmark look like this:

                        64     128     192
 master              35693   39822   42151
 granular-locking    35370   39409   41353
 no-content-lock     36201   39848   42407
 group-update        35697   39893   42667

That seems like a fairly bad regression in the kernel, although I have not identified the feature/commit causing it (and it's also possible the issue lies somewhere else, of course).

With regular pgbench, I see no improvement on any kernel version. For example on 3.19 the results look like this:

                        64     128     192
 master              54661   61014   59484
 granular-locking    55904   62481   60711
 no-content-lock     56182   62442   61234
 group-update        55019   61587   60485

I haven't done much more testing (e.g. with -N to eliminate collisions on branches) yet, let's see if it changes anything.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
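
The -N mode mentioned above is pgbench's built-in "skip some updates" variant: it leaves pgbench_tellers and pgbench_branches untouched and therefore removes most of the row-level contention of the regular script. A sketch; the client count and duration are placeholders:

    pgbench -N -M prepared -j 8 -c 128 -T 300 postgres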
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Sep 29, 2016 at 8:05 PM, Robert Haas wrote: > OK, another theory: Dilip is, I believe, reinitializing for each run, > and you are not. Yes, I am reinitializing for each run. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
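
Reinitializing between runs simply means rebuilding the pgbench tables before every measurement, so each run starts from freshly loaded data with a narrow range of XIDs. A sketch, assuming the scale factor 300 used elsewhere in the thread; the remaining parameters are placeholders:

    pgbench -i -s 300 postgres          # drop and reload the pgbench tables
    psql -c "CHECKPOINT" postgres       # flush the load before measuring
    pgbench -M prepared -j 8 -c 128 -T 1800 postgres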
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Sep 29, 2016 at 10:14 AM, Tomas Vondra wrote: >> It's not impossible that the longer runs could matter - performance >> isn't necessarily stable across time during a pgbench test, and the >> longer the run the more CLOG pages it will fill. > > Sure, but I'm not doing just a single pgbench run. I do a sequence of > pgbench runs, with different client counts, with ~6h of total runtime. > There's a checkpoint in between the runs, but as those benchmarks are on > unlogged tables, that flushes only very few buffers. > > Also, the clog SLRU has 128 pages, which is ~1MB of clog data, i.e. ~4M > transactions. On some kernels (3.10 and 3.12) I can get >50k tps with 64 > clients or more, which means we fill the 128 pages in less than 80 seconds. > > So half-way through the run only 50% of clog pages fits into the SLRU, and > we have a data set with 30M tuples, with uniform random access - so it seems > rather unlikely we'll get transaction that's still in the SLRU. > > But sure, I can do a run with larger data set to verify this. OK, another theory: Dilip is, I believe, reinitializing for each run, and you are not. Maybe somehow the effect Dilip is seeing only happens with a newly-initialized set of pgbench tables. For example, maybe the patches cause a huge improvement when all rows have the same XID, but the effect fades rapidly once the XIDs spread out... I'm not saying any of what I'm throwing out here is worth the electrons upon which it is printed, just that there has to be some explanation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/29/2016 03:47 PM, Robert Haas wrote: On Wed, Sep 28, 2016 at 9:10 PM, Tomas Vondra wrote: I feel like we must be missing something here. If Dilip is seeing huge speedups and you're seeing nothing, something is different, and we don't know what it is. Even if the test case is artificial, it ought to be the same when one of you runs it as when the other runs it. Right? Yes, definitely - we're missing something important, I think. One difference is that Dilip is using longer runs, but I don't think that's a problem (as I demonstrated how stable the results are). It's not impossible that the longer runs could matter - performance isn't necessarily stable across time during a pgbench test, and the longer the run the more CLOG pages it will fill. Sure, but I'm not doing just a single pgbench run. I do a sequence of pgbench runs, with different client counts, with ~6h of total runtime. There's a checkpoint in between the runs, but as those benchmarks are on unlogged tables, that flushes only very few buffers. Also, the clog SLRU has 128 pages, which is ~1MB of clog data, i.e. ~4M transactions. On some kernels (3.10 and 3.12) I can get >50k tps with 64 clients or more, which means we fill the 128 pages in less than 80 seconds. So half-way through the run only 50% of clog pages fits into the SLRU, and we have a data set with 30M tuples, with uniform random access - so it seems rather unlikely we'll get transaction that's still in the SLRU. But sure, I can do a run with larger data set to verify this. I wonder what CPU model is Dilip using - I know it's x86, but not which generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer model and it makes a difference (although that seems unlikely). The fact that he's using an 8-socket machine seems more likely to matter than the CPU generation, which isn't much different. Maybe Dilip should try this on a 2-socket machine and see if he sees the same kinds of results. Maybe. I wouldn't expect a major difference between 4 and 8 sockets, but I may be wrong. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
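
For reference, the "128 pages ~ 4M transactions" figure above follows directly from CLOG's layout, assuming 8 kB pages and two status bits per transaction:

    # 4 transactions/byte * 8192 bytes/page = 32768 transactions per CLOG page
    # 128 SLRU pages * 32768 transactions   = 4,194,304 transactions (~1 MB of clog)
    # At exactly 50k tps that window is consumed in ~84 seconds, and less at the
    # higher rates mentioned, which matches the "less than 80 seconds" estimate.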
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Sep 28, 2016 at 9:10 PM, Tomas Vondra wrote: >> I feel like we must be missing something here. If Dilip is seeing >> huge speedups and you're seeing nothing, something is different, and >> we don't know what it is. Even if the test case is artificial, it >> ought to be the same when one of you runs it as when the other runs >> it. Right? >> > Yes, definitely - we're missing something important, I think. One difference > is that Dilip is using longer runs, but I don't think that's a problem (as I > demonstrated how stable the results are). It's not impossible that the longer runs could matter - performance isn't necessarily stable across time during a pgbench test, and the longer the run the more CLOG pages it will fill. > I wonder what CPU model is Dilip using - I know it's x86, but not which > generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer > model and it makes a difference (although that seems unlikely). The fact that he's using an 8-socket machine seems more likely to matter than the CPU generation, which isn't much different. Maybe Dilip should try this on a 2-socket machine and see if he sees the same kinds of results. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Sep 29, 2016 at 12:56 PM, Dilip Kumar wrote:
> On Thu, Sep 29, 2016 at 6:40 AM, Tomas Vondra wrote:
>> Yes, definitely - we're missing something important, I think. One difference
>> is that Dilip is using longer runs, but I don't think that's a problem (as I
>> demonstrated how stable the results are).
>>
>> I wonder what CPU model is Dilip using - I know it's x86, but not which
>> generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer
>> model and it makes a difference (although that seems unlikely).
>
> I am using "Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz "
>

Another difference is that the machine on which Dilip is doing the tests has 8 sockets.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Sep 29, 2016 at 6:40 AM, Tomas Vondra wrote: > Yes, definitely - we're missing something important, I think. One difference > is that Dilip is using longer runs, but I don't think that's a problem (as I > demonstrated how stable the results are). > > I wonder what CPU model is Dilip using - I know it's x86, but not which > generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer > model and it makes a difference (although that seems unlikely). I am using "Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz " -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/29/2016 01:59 AM, Robert Haas wrote: On Wed, Sep 28, 2016 at 6:45 PM, Tomas Vondra wrote: So, is 300 too little? I don't think so, because Dilip saw some benefit from that. Or what scale factor do we think is needed to reproduce the benefit? My machine has 256GB of ram, so I can easily go up to 15000 and still keep everything in RAM. But is it worth it? Dunno. But it might be worth a test or two at, say, 5000, just to see if that makes any difference. OK, I have some benchmarks to run on that machine, but I'll do a few tests with scale 5000 - probably sometime next week. I don't think the delay matters very much, as it's clear the patch will end up with RwF in this CF round. I feel like we must be missing something here. If Dilip is seeing huge speedups and you're seeing nothing, something is different, and we don't know what it is. Even if the test case is artificial, it ought to be the same when one of you runs it as when the other runs it. Right? Yes, definitely - we're missing something important, I think. One difference is that Dilip is using longer runs, but I don't think that's a problem (as I demonstrated how stable the results are). I wonder what CPU model is Dilip using - I know it's x86, but not which generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer model and it makes a difference (although that seems unlikely). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Sep 28, 2016 at 6:45 PM, Tomas Vondra wrote: > So, is 300 too little? I don't think so, because Dilip saw some benefit from > that. Or what scale factor do we think is needed to reproduce the benefit? > My machine has 256GB of ram, so I can easily go up to 15000 and still keep > everything in RAM. But is it worth it? Dunno. But it might be worth a test or two at, say, 5000, just to see if that makes any difference. I feel like we must be missing something here. If Dilip is seeing huge speedups and you're seeing nothing, something is different, and we don't know what it is. Even if the test case is artificial, it ought to be the same when one of you runs it as when the other runs it. Right? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
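
Testing that suggestion only requires re-initializing the pgbench tables at the larger scale; since these benchmarks run on unlogged tables, the corresponding pgbench flag is included. The scale value comes from Robert's suggestion above; the database name is a placeholder:

    pgbench -i -s 5000 --unlogged-tables postgres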
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/28/2016 05:39 PM, Robert Haas wrote: On Tue, Sep 27, 2016 at 5:15 PM, Tomas Vondra wrote: So, I got the results from 3.10.101 (only the pgbench data), and it looks like this: 3.10.101 1 8 16 32 64128192 granular-locking2582 18492 33416 49583 53759 53572 51295 no-content-lock 2580 18666 33860 49976 54382 54012 51549 group-update2635 18877 33806 49525 54787 54117 51718 master 2630 18783 33630 49451 54104 53199 50497 So 3.10.101 performs even better tnan 3.2.80 (and much better than 4.5.5), and there's no sign any of the patches making a difference. I'm sure that you mentioned this upthread somewhere, but I can't immediately find it. What scale factor are you testing here? 300, the same scale factor as Dilip. It strikes me that the larger the scale factor, the more CLogControlLock contention we expect to have. We'll pretty much do one CLOG access per update, and the more rows there are, the more chance there is that the next update hits an "old" row that hasn't been updated in a long time. So a larger scale factor also increases the number of active CLOG pages and, presumably therefore, the amount of CLOG paging activity. > So, is 300 too little? I don't think so, because Dilip saw some benefit from that. Or what scale factor do we think is needed to reproduce the benefit? My machine has 256GB of ram, so I can easily go up to 15000 and still keep everything in RAM. But is it worth it? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Tue, Sep 27, 2016 at 5:15 PM, Tomas Vondra wrote: > So, I got the results from 3.10.101 (only the pgbench data), and it looks > like this: > > 3.10.101 1 8 16 32 64128192 > > granular-locking2582 18492 33416 49583 53759 53572 51295 > no-content-lock 2580 18666 33860 49976 54382 54012 51549 > group-update2635 18877 33806 49525 54787 54117 51718 > master 2630 18783 33630 49451 54104 53199 50497 > > So 3.10.101 performs even better tnan 3.2.80 (and much better than 4.5.5), > and there's no sign any of the patches making a difference. I'm sure that you mentioned this upthread somewhere, but I can't immediately find it. What scale factor are you testing here? It strikes me that the larger the scale factor, the more CLogControlLock contention we expect to have. We'll pretty much do one CLOG access per update, and the more rows there are, the more chance there is that the next update hits an "old" row that hasn't been updated in a long time. So a larger scale factor also increases the number of active CLOG pages and, presumably therefore, the amount of CLOG paging activity. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/26/2016 08:48 PM, Tomas Vondra wrote:

On 09/26/2016 07:16 PM, Tomas Vondra wrote:

The averages (over the 10 runs, 5 minutes each) look like this:

 3.2.80                1      8     16     32     64    128    192
 granular-locking   1567  12146  26341  44188  43263  49590  15042
 no-content-lock    1567  12180  25549  43787  43675  51800  16831
 group-update       1550  12018  26121  44451  42734  51455  15504
 master             1566  12057  25457  42299  42513  42562  10462

 4.5.5                 1      8     16     32     64    128    192
 granular-locking   3018  19031  27394  29222  32032  34249  36191
 no-content-lock    2988  18871  27384  29260  32120  34456  36216
 group-update       2960  18848  26870  29025  32078  34259  35900
 master             2984  18917  26430  29065  32119  33924  35897

So, I got the results from 3.10.101 (only the pgbench data), and it looks like this:

 3.10.101              1      8     16     32     64    128    192
 granular-locking   2582  18492  33416  49583  53759  53572  51295
 no-content-lock    2580  18666  33860  49976  54382  54012  51549
 group-update       2635  18877  33806  49525  54787  54117  51718
 master             2630  18783  33630  49451  54104  53199  50497

So 3.10.101 performs even better than 3.2.80 (and much better than 4.5.5), and there's no sign of any of the patches making a difference.

It also seems there's a major regression in the kernel, somewhere between 3.10 and 4.5. With 64 clients, 3.10 does ~54k transactions, while 4.5 does only ~32k - that's a huge difference. I wonder if this might be due to running the benchmark on unlogged tables (and thus not waiting for WAL), but I don't see why that should result in such a drop on a new kernel.

In any case, this seems like an issue unrelated to the patch, so I'll post further data to a new thread instead of hijacking this one.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Sep 21, 2016 at 8:47 AM, Dilip Kumar wrote:
> Summary:
> --
> At 32 clients no gain, I think at this workload Clog Lock is not a problem.
> At 64 Clients we can see ~10% gain with simple update and ~5% with TPCB.
> At 128 Clients we can see > 50% gain.
>
> Currently I have tested with synchronous commit=off, later I can try
> with on. I can also test at 80 client, I think we will see some
> significant gain at this client count also, but as of now I haven't
> yet tested.
>
> With above results, what we think ? should we continue our testing ?

I have done further testing on the TPCB workload to see the impact on the performance gain of increasing the scale factor. Again, at 32 clients there is no gain, but at 64 clients the gain is 12% and at 128 clients it is 75%, which shows that the improvement with the group lock patch is better at a higher scale factor (at scale factor 300 the gain was 5% at 64 clients and 50% at 128 clients).

8 socket machine (kernel 3.10)
10 min run (median of 3 runs)
synchronous_commit = off
scale factor = 1000
shared_buffers = 40GB

Test results (TPS):

 client      head   group lock
     32     27496        27178
     64     31275        35205
    128     20656        34490

LWLOCK_STATS approx. block count on ClogControlLock ("lwlock main 11"):

 client      head   group lock
     32         8            6
     64        15           10
    128        14            7

Note: These are approximate block counts; I have the detailed LWLOCK_STATS output in case someone wants to look into it. LWLOCK_STATS shows that the ClogControlLock block count is reduced by 25% at 32 clients, 33% at 64 clients and 50% at 128 clients.

Conclusion:
1. I think both the LWLOCK_STATS and the performance data show that we get a significant reduction in contention on ClogControlLock with the patch.
2. It also shows that although we are not seeing any performance gain at 32 clients, there is a contention reduction with the patch.

I am planning to do some more tests with a higher scale factor (3000 or more).

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
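
LWLOCK_STATS is a compile-time instrumentation option rather than a GUC; a common way to enable it is to rebuild the server with the macro defined, after which per-backend lock statistics (including block counts like those quoted above) are written to the server log when each backend exits. The flags below are one way to do it, not necessarily how Dilip built his binaries:

    ./configure CPPFLAGS="-DLWLOCK_STATS" --prefix=$HOME/pg-lwstats
    make -s -j8 install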
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/26/2016 07:16 PM, Tomas Vondra wrote:

The averages (over the 10 runs, 5 minutes each) look like this:

 3.2.80                1      8     16     32     64    128    192
 granular-locking   1567  12146  26341  44188  43263  49590  15042
 no-content-lock    1567  12180  25549  43787  43675  51800  16831
 group-update       1550  12018  26121  44451  42734  51455  15504
 master             1566  12057  25457  42299  42513  42562  10462

 4.5.5                 1      8     16     32     64    128    192
 granular-locking   3018  19031  27394  29222  32032  34249  36191
 no-content-lock    2988  18871  27384  29260  32120  34456  36216
 group-update       2960  18848  26870  29025  32078  34259  35900
 master             2984  18917  26430  29065  32119  33924  35897

That is:

(1) 3.2.80 performs a bit better than before, particularly for 128 and 256 clients - I'm not sure whether that's thanks to the reboots or something else.

(2) 4.5.5 performs measurably worse for >= 32 clients (by ~30%). That's a pretty significant regression, on a fairly common workload.

FWIW, now that I think about this, the regression is roughly in line with my findings presented in my recent blog post:

http://blog.2ndquadrant.com/postgresql-vs-kernel-versions/

Those numbers were collected on a much smaller machine (2/4 cores only), which might be why the difference observed on the 32-core machine is much more significant.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 9:20 AM, Amit Kapila wrote: > On Fri, Sep 23, 2016 at 6:50 AM, Robert Haas wrote: >> On Thu, Sep 22, 2016 at 7:44 PM, Tomas Vondra >> wrote: >>> I don't dare to suggest rejecting the patch, but I don't see how we could >>> commit any of the patches at this point. So perhaps "returned with feedback" >>> and resubmitting in the next CF (along with analysis of improved workloads) >>> would be appropriate. >> >> I think it would be useful to have some kind of theoretical analysis >> of how much time we're spending waiting for various locks. So, for >> example, suppose we one run of these tests with various client counts >> - say, 1, 8, 16, 32, 64, 96, 128, 192, 256 - and we run "select >> wait_event from pg_stat_activity" once per second throughout the test. >> Then we see how many times we get each wait event, including NULL (no >> wait event). Now, from this, we can compute the approximate >> percentage of time we're spending waiting on CLogControlLock and every >> other lock, too, as well as the percentage of time we're not waiting >> for lock. That, it seems to me, would give us a pretty clear idea >> what the maximum benefit we could hope for from reducing contention on >> any given lock might be. >> > As mentioned earlier, such an activity makes sense, however today, > again reading this thread, I noticed that Dilip has already posted > some analysis of lock contention upthread [1]. It is clear that patch > has reduced LWLock contention from ~28% to ~4% (where the major > contributor was TransactionIdSetPageStatus which has reduced from ~53% > to ~3%). Isn't it inline with what you are looking for? Hmm, yes. But it's a little hard to interpret what that means; I think the test I proposed in the quoted material above would provide clearer data. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
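
The function-level percentages Amit cites (e.g. TransactionIdSetPageStatus dropping from ~53% to ~3%) are the kind of numbers a system profiler produces; the thread does not restate which tool produced them. A generic way to capture a comparable profile, assuming perf is available on the test machine, would be:

    # Profile the whole machine for 60 seconds during the steady state of a run,
    # then summarize the samples by symbol.
    perf record -g -a -- sleep 60
    perf report --sort symbol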
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/24/2016 06:06 AM, Amit Kapila wrote: On Fri, Sep 23, 2016 at 8:22 PM, Tomas Vondra wrote: ... >> So I'm using 16GB shared buffers (so with scale 300 everything fits into shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint timeout 1h etc. So no, there are no checkpoints during the 5-minute runs, only those triggered explicitly before each run. Thanks for clarification. Do you think we should try some different settings *_flush_after parameters as those can help in reducing spikes in writes? I don't see why that settings would matter. The tests are on unlogged tables, so there's almost no WAL traffic and checkpoints (triggered explicitly before each run) look like this: checkpoint complete: wrote 17 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 13 recycled; write=0.062 s, sync=0.006 s, total=0.092 s; sync files=10, longest=0.004 s, average=0.000 s; distance=309223 kB, estimate=363742 kB So I don't see how tuning the flushing would change anything, as we're not doing any writes. Moreover, the machine has a bunch of SSD drives (16 or 24, I don't remember at the moment), behind a RAID controller with 2GB of write cache on it. Also, I think instead of 5 mins, read-write runs should be run for 15 mins to get consistent data. Where does the inconsistency come from? Thats what I am also curious to know. Lack of warmup? Can't say, but at least we should try to rule out the possibilities. I think one way to rule out is to do slightly longer runs for Dilip's test cases and for pgbench we might need to drop and re-create database after each reading. My point is that it's unlikely to be due to insufficient warmup, because the inconsistencies appear randomly - generally you get a bunch of slow runs, one significantly faster one, then slow ones again. I believe the runs to be sufficiently long. I don't see why recreating the database would be useful - the whole point is to get the database and shared buffers into a stable state, and then do measurements on it. I don't think bloat is a major factor here - I'm collecting some additional statistics during this run, including pg_database_size, and I can see the size oscillates between 4.8GB and 5.4GB. That's pretty negligible, I believe. I'll let the current set of benchmarks complete - it's running on 4.5.5 now, I'll do tests on 3.2.80 too. Then we can re-evaluate if longer runs are needed. Considering how uniform the results from the 10 runs are (at least on 4.5.5), I claim this is not an issue. It is quite possible that it is some kernel regression which might be fixed in later version. Like we are doing most tests in cthulhu which has 3.10 version of kernel and we generally get consistent results. I am not sure if later version of kernel say 4.5.5 is a net win, because there is a considerable difference (dip) of performance in that version, though it produces quite stable results. Well, the thing is - the 4.5.5 behavior is much nicer in general. I'll always prefer lower but more consistent performance (in most cases). In any case, we're stuck with whatever kernel version the people are using, and they're likely to use the newer ones. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
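
The database-size tracking Tomas mentions can be done with a trivial polling loop running alongside the benchmark; the interval and output file below are illustrative:

    while true; do
        psql -At -c "SELECT now(), pg_size_pretty(pg_database_size(current_database()))" postgres
        sleep 10
    done >> dbsize.log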
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 8:22 PM, Tomas Vondra wrote: > On 09/23/2016 03:07 PM, Amit Kapila wrote: >> >> On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra >> wrote: >>> >>> On 09/23/2016 01:44 AM, Tomas Vondra wrote: ... The 4.5 kernel clearly changed the results significantly: >>> ... (c) Although it's not visible in the results, 4.5.5 almost perfectly eliminated the fluctuations in the results. For example when 3.2.80 produced this results (10 runs with the same parameters): 12118 11610 27939 11771 18065 12152 14375 10983 13614 11077 we get this on 4.5.5 37354 37650 37371 37190 37233 38498 37166 36862 37928 38509 Notice how much more even the 4.5.5 results are, compared to 3.2.80. >>> >>> The more I think about these random spikes in pgbench performance on >>> 3.2.80, >>> the more I find them intriguing. Let me show you another example (from >>> Dilip's workload and group-update patch on 64 clients). >>> >>> This is on 3.2.80: >>> >>> 44175 34619 51944 38384 49066 >>> 37004 47242 36296 46353 36180 >>> >>> and on 4.5.5 it looks like this: >>> >>> 34400 35559 35436 34890 34626 >>> 35233 35756 34876 35347 35486 >>> >>> So the 4.5.5 results are much more even, but overall clearly below >>> 3.2.80. >>> How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we >>> randomly do something right, but what is it and why doesn't it happen on >>> the >>> new kernel? And how could we do it every time? >>> >> >> As far as I can see you are using default values of min_wal_size, >> max_wal_size, checkpoint related params, have you changed default >> shared_buffer settings, because that can have a bigger impact. > > > Huh? Where do you see me using default values? > I was referring to one of your script @ http://bit.ly/2doY6ID. I haven't noticed that you have changed default values in postgresql.conf. > There are settings.log with a > dump of pg_settings data, and the modified values are > > checkpoint_completion_target = 0.9 > checkpoint_timeout = 3600 > effective_io_concurrency = 32 > log_autovacuum_min_duration = 100 > log_checkpoints = on > log_line_prefix = %m > log_timezone = UTC > maintenance_work_mem = 524288 > max_connections = 300 > max_wal_size = 8192 > min_wal_size = 1024 > shared_buffers = 2097152 > synchronous_commit = on > work_mem = 524288 > > (ignoring some irrelevant stuff like locales, timezone etc.). > >> Using default values of mentioned parameters can lead to checkpoints in >> between your runs. > > > So I'm using 16GB shared buffers (so with scale 300 everything fits into > shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint timeout > 1h etc. So no, there are no checkpoints during the 5-minute runs, only those > triggered explicitly before each run. > Thanks for clarification. Do you think we should try some different settings *_flush_after parameters as those can help in reducing spikes in writes? >> Also, I think instead of 5 mins, read-write runs should be run for 15 >> mins to get consistent data. > > > Where does the inconsistency come from? Thats what I am also curious to know. > Lack of warmup? Can't say, but at least we should try to rule out the possibilities. I think one way to rule out is to do slightly longer runs for Dilip's test cases and for pgbench we might need to drop and re-create database after each reading. > Considering how > uniform the results from the 10 runs are (at least on 4.5.5), I claim this > is not an issue. > It is quite possible that it is some kernel regression which might be fixed in later version. 
For example, we do most of our tests on cthulhu, which has a 3.10 kernel, and we generally get consistent results there. I am not sure that a later kernel version, say 4.5.5, is a net win, because there is a considerable dip in performance on that version, even though it produces quite stable results.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
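
For readers unfamiliar with the parameters Amit refers to: the *_flush_after settings (new in 9.6) control how much dirty data may accumulate before the kernel is asked to write it back. The values below are merely examples of the knobs involved; the thread does not converge on specific settings, and Tomas argues in his reply that they should not matter for this unlogged-table workload:

    checkpoint_flush_after = 256kB
    bgwriter_flush_after = 512kB
    backend_flush_after = 0
    wal_writer_flush_after = 1MB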
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/23/2016 02:59 PM, Pavan Deolasee wrote: On Fri, Sep 23, 2016 at 6:05 PM, Tomas Vondra mailto:tomas.von...@2ndquadrant.com>> wrote: On 09/23/2016 05:10 AM, Amit Kapila wrote: On Fri, Sep 23, 2016 at 5:14 AM, Tomas Vondra mailto:tomas.von...@2ndquadrant.com>> wrote: On 09/21/2016 08:04 AM, Amit Kapila wrote: (c) Although it's not visible in the results, 4.5.5 almost perfectly eliminated the fluctuations in the results. For example when 3.2.80 produced this results (10 runs with the same parameters): 12118 11610 27939 11771 18065 12152 14375 10983 13614 11077 we get this on 4.5.5 37354 37650 37371 37190 37233 38498 37166 36862 37928 38509 Notice how much more even the 4.5.5 results are, compared to 3.2.80. how long each run was? Generally, I do half-hour run to get stable results. 10 x 5-minute runs for each client count. The full shell script driving the benchmark is here: http://bit.ly/2doY6ID and in short it looks like this: for r in `seq 1 $runs`; do for c in 1 8 16 32 64 128 192; do psql -c checkpoint pgbench -j 8 -c $c ... done done I see couple of problems with the tests: 1. You're running regular pgbench, which also updates the small tables. At scale 300 and higher clients, there is going to heavy contention on the pgbench_branches table. Why not test with pgbench -N? Sure, I can do a bunch of tests with pgbench -N. Good point. But notice that I've also done the testing with Dilip's workload, and the results are pretty much the same. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
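
For completeness, the driver loop quoted above expanded into a runnable sketch; the full script lives behind the http://bit.ly/2doY6ID link, so the run length, result paths and the pgbench arguments elided by "..." are assumptions here:

    runs=10
    mkdir -p results
    for r in $(seq 1 "$runs"); do
        for c in 1 8 16 32 64 128 192; do
            psql -c "CHECKPOINT" postgres
            pgbench -M prepared -j 8 -c "$c" -T 300 postgres \
                > "results/run-$r-clients-$c.log"
        done
    done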
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/23/2016 03:07 PM, Amit Kapila wrote: On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra wrote: On 09/23/2016 01:44 AM, Tomas Vondra wrote: ... The 4.5 kernel clearly changed the results significantly: ... (c) Although it's not visible in the results, 4.5.5 almost perfectly eliminated the fluctuations in the results. For example when 3.2.80 produced this results (10 runs with the same parameters): 12118 11610 27939 11771 18065 12152 14375 10983 13614 11077 we get this on 4.5.5 37354 37650 37371 37190 37233 38498 37166 36862 37928 38509 Notice how much more even the 4.5.5 results are, compared to 3.2.80. The more I think about these random spikes in pgbench performance on 3.2.80, the more I find them intriguing. Let me show you another example (from Dilip's workload and group-update patch on 64 clients). This is on 3.2.80: 44175 34619 51944 38384 49066 37004 47242 36296 46353 36180 and on 4.5.5 it looks like this: 34400 35559 35436 34890 34626 35233 35756 34876 35347 35486 So the 4.5.5 results are much more even, but overall clearly below 3.2.80. How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we randomly do something right, but what is it and why doesn't it happen on the new kernel? And how could we do it every time? As far as I can see you are using default values of min_wal_size, max_wal_size, checkpoint related params, have you changed default shared_buffer settings, because that can have a bigger impact. Huh? Where do you see me using default values? There are settings.log with a dump of pg_settings data, and the modified values are checkpoint_completion_target = 0.9 checkpoint_timeout = 3600 effective_io_concurrency = 32 log_autovacuum_min_duration = 100 log_checkpoints = on log_line_prefix = %m log_timezone = UTC maintenance_work_mem = 524288 max_connections = 300 max_wal_size = 8192 min_wal_size = 1024 shared_buffers = 2097152 synchronous_commit = on work_mem = 524288 (ignoring some irrelevant stuff like locales, timezone etc.). Using default values of mentioned parameters can lead to checkpoints in between your runs. So I'm using 16GB shared buffers (so with scale 300 everything fits into shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint timeout 1h etc. So no, there are no checkpoints during the 5-minute runs, only those triggered explicitly before each run. Also, I think instead of 5 mins, read-write runs should be run for 15 mins to get consistent data. Where does the inconsistency come from? Lack of warmup? Considering how uniform the results from the 10 runs are (at least on 4.5.5), I claim this is not an issue. For Dilip's workload where he is using only Select ... For Update, i think it is okay, but otherwise you need to drop and re-create the database between each run, otherwise data bloat could impact the readings. And why should it affect 3.2.80 and 4.5.5 differently? I think in general, the impact should be same for both the kernels because you are using same parameters, but I think if use appropriate parameters, then you can get consistent results for 3.2.80. I have also seen variation in read-write tests, but the variation you are showing is really a matter of concern, because it will be difficult to rely on final data. Both kernels use exactly the same parameters (fairly tuned, IMHO). 
--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 6:50 AM, Robert Haas wrote:
> On Thu, Sep 22, 2016 at 7:44 PM, Tomas Vondra wrote:
>> I don't dare to suggest rejecting the patch, but I don't see how we
>> could commit any of the patches at this point. So perhaps "returned
>> with feedback" and resubmitting in the next CF (along with analysis of
>> improved workloads) would be appropriate.
>
> I think it would be useful to have some kind of theoretical analysis of
> how much time we're spending waiting for various locks. So, for example,
> suppose we do one run of these tests with various client counts - say,
> 1, 8, 16, 32, 64, 96, 128, 192, 256 - and we run "select wait_event from
> pg_stat_activity" once per second throughout the test. Then we see how
> many times we get each wait event, including NULL (no wait event). Now,
> from this, we can compute the approximate percentage of time we're
> spending waiting on CLogControlLock and every other lock, too, as well
> as the percentage of time we're not waiting for any lock. That, it seems
> to me, would give us a pretty clear idea what the maximum benefit we
> could hope for from reducing contention on any given lock might be.
>

As mentioned earlier, such an activity makes sense. However, reading this
thread again today, I noticed that Dilip has already posted some analysis
of lock contention upthread [1]. It is clear that the patch has reduced
LWLock contention from ~28% to ~4% (where the major contributor was
TransactionIdSetPageStatus, which has reduced from ~53% to ~3%). Isn't
that in line with what you are looking for?

[1] - https://www.postgresql.org/message-id/CAFiTN-u-XEzhd%3DhNGW586fmQwdTy6Qy6_SXe09tNB%3DgBcVzZ_A%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
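A rough sketch of the sampling Robert describes might look like the
following. It simply polls pg_stat_activity once per second and tallies
how often each wait event (including "no wait event") is observed; the
duration and the database name "bench" are illustrative placeholders:

    #!/bin/bash
    # Hypothetical sketch: sample wait_event from pg_stat_activity once
    # per second for the length of a run, then count occurrences of each
    # wait event so the share of time spent on each lock can be estimated.
    DURATION=300

    for s in $(seq 1 $DURATION); do
        psql -At -d bench \
            -c "SELECT coalesce(wait_event_type || ' | ' || wait_event, 'NULL')
                FROM pg_stat_activity"
        sleep 1
    done | sort | uniq -c | sort -rn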
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 23, 2016 at 6:29 PM, Pavan Deolasee wrote:
> On Fri, Sep 23, 2016 at 6:05 PM, Tomas Vondra wrote:
>> On 09/23/2016 05:10 AM, Amit Kapila wrote:
>>> On Fri, Sep 23, 2016 at 5:14 AM, Tomas Vondra wrote:
>>>> On 09/21/2016 08:04 AM, Amit Kapila wrote:
>>>>
>>>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>>>> eliminated the fluctuations in the results. For example when 3.2.80
>>>> produced these results (10 runs with the same parameters):
>>>>
>>>> 12118 11610 27939 11771 18065 12152 14375 10983 13614 11077
>>>>
>>>> we get this on 4.5.5:
>>>>
>>>> 37354 37650 37371 37190 37233 38498 37166 36862 37928 38509
>>>>
>>>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.
>>>
>>> how long was each run? Generally, I do half-hour runs to get stable
>>> results.
>>
>> 10 x 5-minute runs for each client count. The full shell script driving
>> the benchmark is here: http://bit.ly/2doY6ID and in short it looks like
>> this:
>>
>> for r in `seq 1 $runs`; do
>>     for c in 1 8 16 32 64 128 192; do
>>         psql -c checkpoint
>>         pgbench -j 8 -c $c ...
>>     done
>> done
>
> I see a couple of problems with the tests:
>
> 1. You're running regular pgbench, which also updates the small tables.
> At scale 300 and higher client counts, there is going to be heavy
> contention on the pgbench_branches table. Why not test with pgbench -N?
> As far as this patch is concerned, we are only interested in seeing
> contention on ClogControlLock. In fact, how about a test which only
> consumes an XID, but does not do any write activity at all? A completely
> artificial workload, but good enough to tell us if and how much the
> patch helps in the best case. We can probably do that with a simple
> txid_current() call, right?
>

Right, that is why in the initial tests done by Dilip, he used
Select .. for Update. I think using txid_current will generate a lot of
contention on XidGenLock, which will mask the contention around
CLOGControlLock; in fact, we have tried that.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
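For completeness, the XID-only workload Pavan suggests could be sketched as
a custom pgbench script along these lines. This is only an illustrative
sketch (client count, duration and the database name "bench" are made-up
placeholders), and per Amit's point above it mostly shifts the contention
onto XidGenLock rather than exercising CLogControlLock:

    #!/bin/bash
    # Hypothetical sketch: a pgbench custom script that only allocates an
    # XID via txid_current(), with no write activity at all.
    cat > xid_only.sql <<'EOF'
    SELECT txid_current();
    EOF

    # -n skips vacuuming of the standard pgbench tables, which this
    # custom script does not touch.
    pgbench -n -M prepared -f xid_only.sql -j 8 -c 64 -T 300 bench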