Hi guys,

I have noticed that during VACUUM FULL on reasonably big tables, replication lag climbs. In order to smooth down the replication lag, I propose the attached patch which enables vacuum delay for VACUUM FULL.

Please find attached the patch and below more information on this specific issue.

Cheers,
Gabriele


== Scenario

I have setup a simple SyncRep scenario with one master and one standby on the same server.
On the master I have setup vacuum_cost_delay = 10 milliseconds.

I have created a scale 50 pgbench database, which produces a 640MB pgbench_accounts table (about 82k pages). I have then launched a 60 seconds pgbench activity with 4 concurrent clients with the goal to make some changes to the pgbench table (approximately 1800 changes on my laptop).

== Problem observed

Replication lag climbs during VACUUM FULL.

== Proposed change

Enable vacuum delay for VACUUM FULL (and CLUSTER).

== Test

I have then launched a VACUUM FULL operation on the pgbench_accounts table and measured the lag in bytes every 5 seconds, by calculating the difference between the current location and the sent location.

Here is a table with lag values. The first column (sec) is the sampling time (every 5 seconds for the sake of simplicity here), the second column (mlag) is the master lag on the current HEAD instance, the third column (mlagpatch) is the lag measured on the patched Postgres instance.

 sec |  mlag     | mlagpatch
-----+-----------+-----------
  0  |  1896424  |        0
  5  | 15654912  |  4055040
 10  |  8019968  | 13893632
 15  | 16850944  |  4177920
 20  | 10969088  | 21102592
 25  | 11468800  |  2277376
 30  |  7995392  | 13893632
 35  | 14811136  | 20660224
 40  |  6127616  |        0
 45  |  6914048  |  5136384
 50  |  5996544  | 13500416
 55  | 14155776  |  9043968
 60  | 23298048  | 11722752
 65  | 15400960  | 18202624
 70  | 17858560  | 28049408
 75  |  8560640  | 34865152
 80  | 19628032  | 33161216
 85  | 25526272  | 39976960
 90  | 23183360  | 23683072
 95  | 23265280  |   303104
100  | 24346624  |  3710976
105  | 24813568  |        0
110  | 32587776  |  7651328
115  | 42827776  | 12369920
120  | 50167808  | 14991360
125  | 60260352  |  3850240
130  | 62750720  |  5160960
135  | 68255744  |  9355264
140  | 60653568  | 14336000
145  | 68780032  | 16564224
150  | 74342400  |  5398528
155  | 84639744  | 11321344
160  | 92741632  | 16302080
165  | 70123520  | 20234240
170  | 13606912  | 23248896
175  | 20586496  | 29278208
180  | 16482304  |  1900544
185  |        0  |        0

As you can see, replication lag on HEAD's PostgreSQL reaches 92MB (160 seconds) before starting to decrease (when the operation terminates). The test result is consistent with the expected behaviour of cost-based vacuum delay.

--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 191ef54..f10ae3c 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -877,7 +877,8 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid 
OIDOldIndex,
                Buffer          buf;
                bool            isdead;
 
-               CHECK_FOR_INTERRUPTS();
+               /* Launches vacuum delay */
+               vacuum_delay_point();
 
                if (indexScan != NULL)
                {
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to