On Wed, Feb 10, 2016 at 10:55:10AM -0500, Tom Lane wrote: > Interestingly, we seem to have managed to greatly reduce the "other" > time (which I presume is basically mdpostchkpt unlinking) since 9.2. > The worst case observed in HEAD is about 100s: > > regression=# select ts,write,sync,total-write-sync as other, total from > chkpts order by total-write-sync desc limit 10; > ts | write | sync | other | total > ----------------------------+---------+---------+---------+--------- > 2016-02-09 08:27:04.365-08 | 14.479 | 298.043 | 100.586 | 413.108 > 2016-02-09 06:15:22.477-08 | 75.099 | 26.590 | 90.452 | 192.141 > 2016-02-09 16:31:34.661-08 | 254.462 | 186.246 | 51.541 | 492.249 > 2016-02-09 15:25:01.586-08 | 0.015 | 2.985 | 47.822 | 50.822 > 2016-02-09 11:40:13.565-08 | 287.327 | 314.218 | 44.059 | 645.604 > 2016-02-09 08:56:51.339-08 | 135.464 | 31.242 | 39.131 | 205.837 > 2016-02-09 20:23:52.797-08 | 1.309 | 12.155 | 36.350 | 49.814 > 2016-02-09 00:51:41.157-08 | 134.502 | 103.482 | 34.110 | 272.094 > 2016-02-09 19:28:33.154-08 | 0.575 | 20.434 | 33.803 | 54.812 > 2016-02-09 03:47:02.57-08 | 0.066 | 37.475 | 33.119 | 70.660 > (10 rows) > > but 9.2's worst cases greatly exceed that: > > regression=# select ts,write,sync,total-write-sync as other, total from > chkpts92 order by total-write-sync desc limit 10; > ts | write | sync | other | total > ----------------------------+--------+---------+---------+--------- > 2016-02-09 03:27:12.904-08 | 48.761 | 124.453 | 299.877 | 473.091 > 2016-02-09 16:31:25.657-08 | 3.860 | 184.042 | 248.488 | 436.390 > 2016-02-08 23:33:36.16-08 | 66.904 | 397.653 | 229.020 | 693.577 > 2016-02-09 11:39:57.061-08 | 2.471 | 303.924 | 201.923 | 508.318 > 2016-02-09 07:37:44.48-08 | 3.317 | 1.494 | 158.159 | 162.970 > 2016-02-09 04:32:49.668-08 | 11.328 | 292.310 | 148.088 | 451.726 > 2016-02-09 12:37:43.52-08 | 17.620 | 333.983 | 141.172 | 492.775 > 2016-02-09 08:27:04.407-08 | 12.019 | 300.510 | 102.855 | 415.384 > 2016-02-09 18:17:32.193-08 | 90.370 | 25.369 | 63.626 | 179.365 > 2016-02-09 11:02:14.977-08 | 2.713 | 2.306 | 38.581 | 43.600 > (10 rows) > > Not real sure where that improvement came from. We could hardly > be unlinking fewer files overall, and the bottleneck seems to be > in the kernel, so what changed? The *average*, as opposed to worst > case, checkpoint time has definitely grown since 9.2:
Good question; I don't know. I would have expected 9.2's xlog-driven checkpoints to create more stability than HEAD's time-driven checkpoints. On the AIX animals, I have now set PGCTLTIMEOUT=900 and removed the "-t 120" arguments. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers