Noah Misch <n...@leadboat.com> writes:
>>> That's reasonable.  If you would like higher-fidelity data, I can run
>>> loops of "pg_ctl -w start; make installcheck; pg_ctl -t900 -w stop",
>>> and I could run that for HEAD and 9.2 simultaneously.  A day of logs
>>> from that should show clearly if HEAD is systematically worse than 9.2.
>> That sounds like a fine plan, please do it.

> Log files:
> HEAD:
> https://drive.google.com/uc?export=download&id=0B9IURs2-_2ZMakl2TjFHUlpvc1k
> 92:
> https://drive.google.com/uc?export=download&id=0B9IURs2-_2ZMYVZtY3VqcjBFX1k

Thanks for doing this.  I did a bit of analysis of these logs.  There is
no case in the HEAD log where 'lock files all released' comes out more
than 2.221 seconds after 'shutdown checkpoint complete'.  So it doesn't
appear that stats writing is ever a big problem for your machine.

The checkpoints themselves, though ... here are the slowest few
checkpoints on HEAD:

regression=# select ts,write,sync,total-write-sync as other, total from chkpts order by total desc limit 10;
             ts             |  write  |  sync   |  other  |  total
----------------------------+---------+---------+---------+---------
 2016-02-09 03:33:14.671-08 | 287.691 | 355.528 |  12.509 | 655.728
 2016-02-09 11:40:13.565-08 | 287.327 | 314.218 |  44.059 | 645.604
 2016-02-09 12:36:49.753-08 | 149.652 | 481.736 |  13.094 | 644.482
 2016-02-09 04:32:53.708-08 | 214.896 | 304.102 |  17.929 | 536.927
 2016-02-09 16:31:34.661-08 | 254.462 | 186.246 |  51.541 | 492.249
 2016-02-09 08:27:04.365-08 |  14.479 | 298.043 | 100.586 | 413.108
 2016-02-09 07:37:57.999-08 | 280.960 |  10.408 |  14.287 | 305.655
 2016-02-09 07:06:30.121-08 | 155.331 |  93.433 |  23.814 | 272.578
 2016-02-09 17:45:34.422-08 | 149.471 | 104.185 |  18.598 | 272.254
 2016-02-09 00:51:41.157-08 | 134.502 | 103.482 |  34.110 | 272.094
(10 rows)

and the same for 9.2:

regression=# select ts,write,sync,total-write-sync as other, total from chkpts92 order by total desc limit 10;
             ts             |  write  |  sync   |  other  |  total
----------------------------+---------+---------+---------+---------
 2016-02-08 23:33:36.16-08  |  66.904 | 397.653 | 229.020 | 693.577
 2016-02-09 11:39:57.061-08 |   2.471 | 303.924 | 201.923 | 508.318
 2016-02-09 12:37:43.52-08  |  17.620 | 333.983 | 141.172 | 492.775
 2016-02-09 03:27:12.904-08 |  48.761 | 124.453 | 299.877 | 473.091
 2016-02-09 04:32:49.668-08 |  11.328 | 292.310 | 148.088 | 451.726
 2016-02-09 16:31:25.657-08 |   3.860 | 184.042 | 248.488 | 436.390
 2016-02-09 08:27:04.407-08 |  12.019 | 300.510 | 102.855 | 415.384
 2016-02-09 03:32:30.718-08 | 296.907 |  15.260 |   5.644 | 317.811
 2016-02-09 15:31:41.691-08 |   0.385 | 307.996 |   0.155 | 308.536
 2016-02-09 12:23:10.92-08  |  93.634 | 161.581 |  19.342 | 274.557
(10 rows)

It looks like you're going to have to set PGCTLTIMEOUT somewhere north
of 10 minutes to keep these animals from failing under load.
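(For concreteness, the queries above and below assume the log data was
loaded into a table along these lines.  This is a sketch of the assumed
schema, not the actual load script; parsing the 'checkpoint complete'
log lines into rows is omitted.)

-- One row per 'checkpoint complete' log line; the write=/sync=/total=
-- fields are in seconds.  Column names mirror the queries shown here.
create table chkpts (
    ts    timestamptz,  -- timestamp of the 'checkpoint complete' message
    write numeric,      -- write phase time
    sync  numeric,      -- sync phase time
    total numeric       -- total checkpoint time
);
-- Example row, the slowest HEAD checkpoint from the first table above:
insert into chkpts values ('2016-02-09 03:33:14.671-08', 287.691, 355.528, 655.728);
-- chkpts92 has the same shape, loaded from the 9.2 log.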
Interestingly, we seem to have managed to greatly reduce the "other"
time (which I presume is basically mdpostckpt unlinking) since 9.2.
The worst-case "other" time observed in HEAD is about 100 seconds:

regression=# select ts,write,sync,total-write-sync as other, total from chkpts order by total-write-sync desc limit 10;
             ts             |  write  |  sync   |  other  |  total
----------------------------+---------+---------+---------+---------
 2016-02-09 08:27:04.365-08 |  14.479 | 298.043 | 100.586 | 413.108
 2016-02-09 06:15:22.477-08 |  75.099 |  26.590 |  90.452 | 192.141
 2016-02-09 16:31:34.661-08 | 254.462 | 186.246 |  51.541 | 492.249
 2016-02-09 15:25:01.586-08 |   0.015 |   2.985 |  47.822 |  50.822
 2016-02-09 11:40:13.565-08 | 287.327 | 314.218 |  44.059 | 645.604
 2016-02-09 08:56:51.339-08 | 135.464 |  31.242 |  39.131 | 205.837
 2016-02-09 20:23:52.797-08 |   1.309 |  12.155 |  36.350 |  49.814
 2016-02-09 00:51:41.157-08 | 134.502 | 103.482 |  34.110 | 272.094
 2016-02-09 19:28:33.154-08 |   0.575 |  20.434 |  33.803 |  54.812
 2016-02-09 03:47:02.57-08  |   0.066 |  37.475 |  33.119 |  70.660
(10 rows)

but 9.2's worst cases greatly exceed that:

regression=# select ts,write,sync,total-write-sync as other, total from chkpts92 order by total-write-sync desc limit 10;
             ts             | write  |  sync   |  other  |  total
----------------------------+--------+---------+---------+---------
 2016-02-09 03:27:12.904-08 | 48.761 | 124.453 | 299.877 | 473.091
 2016-02-09 16:31:25.657-08 |  3.860 | 184.042 | 248.488 | 436.390
 2016-02-08 23:33:36.16-08  | 66.904 | 397.653 | 229.020 | 693.577
 2016-02-09 11:39:57.061-08 |  2.471 | 303.924 | 201.923 | 508.318
 2016-02-09 07:37:44.48-08  |  3.317 |   1.494 | 158.159 | 162.970
 2016-02-09 04:32:49.668-08 | 11.328 | 292.310 | 148.088 | 451.726
 2016-02-09 12:37:43.52-08  | 17.620 | 333.983 | 141.172 | 492.775
 2016-02-09 08:27:04.407-08 | 12.019 | 300.510 | 102.855 | 415.384
 2016-02-09 18:17:32.193-08 | 90.370 |  25.369 |  63.626 | 179.365
 2016-02-09 11:02:14.977-08 |  2.713 |   2.306 |  38.581 |  43.600
(10 rows)

Not real sure where that improvement came from.  We could hardly be
unlinking fewer files overall, and the bottleneck seems to be in the
kernel, so what changed?

The *average*, as opposed to worst-case, checkpoint time has definitely
grown since 9.2:

regression=# select avg(write) as write, avg(sync) as sync, avg(total-write-sync) as other, avg(total) as total from chkpts;
        write        |        sync        |       other        |        total
---------------------+--------------------+--------------------+---------------------
 14.4432267389340560 | 7.4136738934056007 | 4.9619313459801265 | 26.8188319783197832
(1 row)

regression=# select avg(write) as write, avg(sync) as sync, avg(total-write-sync) as other, avg(total) as total from chkpts92;
       write        |        sync        |       other        |        total
--------------------+--------------------+--------------------+---------------------
 9.1272341075794621 | 6.2059480440097800 | 4.0349553789731051 | 19.3681375305623472
(1 row)

I don't find that troublesome, since we're probably syncing more data
in the newer branch's tests.  But the improvement in worst-case
behavior is puzzling.

			regards, tom lane