Hi, I really like the idea of tuning checkpoint segments based on disk space usage.
I performed series of tests for this patch and would like to share the results. My comments are in-line. To start with, I applied this patch to the master successfully - > But ... do I understand things correctly that checkpoint wouldn't "kick >> in" until you hit checkpoint_wal_size? If that's the case, isn't real >> disk space usage around 2X checkpoint_wal_size if spread checkpoint is >> set to 0.9? Or does checkpoint kick in sometime earlier? >> > > It kicks in earlier, so that the checkpoint *completes* just when > checkpoint_wal_size of WAL is used up. So the real disk usage is > checkpoint_wal_size. > > There is still an internal variable called CheckPointSegments that > triggers the checkpoint, but it is now derived from checkpoint_wal_size > (see CalculateCheckpointSegments function): > > CheckPointSegments = (checkpoint_wal_size / 16 MB) / (2 + > checkpoint_completion_target) > Yes, i see this happening. This is the same formula we've always had in the manual for calculating the > amount of WAL space used, but in reverse. I.e. we calculate > CheckPointSegments based on the desired disk space usage, not the other way > round. > > As a note, pgBench would be a terrible test for this patch; we really >> need something which creates uneven traffic. I'll see if I can devise >> something. >> > > Attached is a rebased version of this patch. Everyone, please try this out > on whatever workloads you have, and let me know: > > a) How does the auto-tuning of the number of recycled segments work? Does > pg_xlog reach a steady-state size, or does it fluctuate a lot? > I performed the tests by executing heavy INSERT operations (INSERTS only) using benchmarksql. I do see that pg_xlog size is increasing at times. I have inserted about 6GB of data for testing. Below are the test results. *Test 1 :* In this test, i see removed+recycled segments = 3 (except for the first 3 checkpoint cycles) and has been steady through out until the INSERT operation completed. Actual calculation of CheckPointSegments = 3.2 ( is getting rounded up to 3 ) pg_xlog size is 128M and has increased to 160M max during the INSERT operation. shared_buffers = 128M checkpoint_wal_size = 128M min_recycle_wal_size = 80M checkpoint_timeout = 5min TimeStamp=2015-01-27 09:39:14.325 GMT-10 DB=bsql SID=54c6cfd6.5e4 User=postgres LOG: statement: update order_line set ol_amount = 0.01; TimeStamp=2015-01-27 09:39:15.407 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoint complete: wrote 5123 buffers (31.3%); 0 transaction log file(s) added, 1 removed, 0 recycled; write=0.593 s, sync=2.492 s, total=3.273 s; sync files=26, longest=0.399 s, average=0.095 s; distance=52653 KB, estimate=52653 KB TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoints are occurring too frequently (3 seconds apart) TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= HINT: Consider increasing the configuration parameter "checkpoint_wal_size". TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoint complete: wrote 8145 buffers (49.7%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=0.913 s, sync=1.476 s, total=2.530 s; sync files=4, longest=0.534 s, average=0.369 s; distance=87446 KB, estimate=87446 KB TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoints are occurring too frequently (3 seconds apart) TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= HINT: Consider increasing the configuration parameter "checkpoint_wal_size". TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 09:39:23.169 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoint complete: wrote 4598 buffers (28.1%); 0 transaction log file(s) added, 3 removed, 2 recycled; write=0.716 s, sync=1.083 s, total=1.957 s; sync files=4, longest=0.486 s, average=0.270 s; distance=47964 KB, estimate=83498 KB TimeStamp=2015-01-27 09:39:23.235 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoints are occurring too frequently (2 seconds apart) TimeStamp=2015-01-27 09:39:23.235 GMT-10 DB= SID=54bee4a1.3002 User= HINT: Consider increasing the configuration parameter "checkpoint_wal_size". TimeStamp=2015-01-27 09:39:23.235 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 09:39:24.968 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoint complete: wrote 3417 buffers (20.9%); 0 transaction log file(s) added, 1 removed, 2 recycled; write=0.539 s, sync=1.059 s, total=1.732 s; sync files=4, longest=0.535 s, average=0.264 s; distance=44814 KB, estimate=79629 KB TimeStamp=2015-01-27 09:39:25.118 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoints are occurring too frequently (2 seconds apart) TimeStamp=2015-01-27 09:39:25.118 GMT-10 DB= SID=54bee4a1.3002 User= HINT: Consider increasing the configuration parameter "checkpoint_wal_size". TimeStamp=2015-01-27 09:39:25.118 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 09:39:26.879 GMT-10 DB= SID=54bee4a1.3002 User= LOG: checkpoint complete: wrote 4721 buffers (28.8%); 0 transaction log file(s) added, 1 removed, 2 recycled; write=0.474 s, sync=1.166 s, total=1.761 s; sync files=4, longest=0.583 s, average=0.291 s; distance=49145 KB, estimate=76581 KB *Test 2 :* removed+recycled segments remained 3 even after i increased the checkpoint_wal_size = 144M. This is obviously due to the calculation in CalculateCheckpointSegments() functions. checkpoint_wal_size = 144M min_recycle_wal_size = 104M checkpoint_timeout = 5min shared_buffers = 1 GB Actual calculation of CheckPointSegments = 3.6 TimeStamp=2015-01-27 13:54:38.469 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 13:54:42.831 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint complete: wrote 5419 buffers (4.1%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.408 s, sync=1.820 s, total=4.361 s; sync files=3, longest=1.432 s, average=0.606 s; distance=48175 KB, estimate=49972 KB TimeStamp=2015-01-27 13:54:44.824 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 13:54:49.008 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint complete: wrote 5570 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.769 s, sync=1.268 s, total=4.184 s; sync files=3, longest=0.843 s, average=0.422 s; distance=51720 KB, estimate=51720 KB TimeStamp=2015-01-27 13:54:50.754 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 13:54:55.127 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint complete: wrote 5155 buffers (3.9%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.977 s, sync=1.273 s, total=4.372 s; sync files=3, longest=0.848 s, average=0.424 s; distance=46133 KB, estimate=51161 KB TimeStamp=2015-01-27 13:54:57.164 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 13:55:01.622 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint complete: wrote 5345 buffers (4.1%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.598 s, sync=1.290 s, total=4.458 s; sync files=3, longest=0.894 s, average=0.430 s; distance=49604 KB, estimate=51006 KB TimeStamp=2015-01-27 13:55:03.501 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 13:55:07.390 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint complete: wrote 5482 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.549 s, sync=1.193 s, total=3.889 s; sync files=3, longest=0.837 s, average=0.397 s; distance=49963 KB, estimate=50901 KB TimeStamp=2015-01-27 13:55:09.381 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 13:55:13.626 GMT-10 DB= SID=54c70b57.21a0 User= LOG: checkpoint complete: wrote 5481 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.778 s, sync=1.280 s, total=4.244 s; sync *Test 3 :* checkpoint_wal_size = 244M min_recycle_wal_size = 204M checkpoint_timeout = 5min shared_buffers = 1 GB removed+recycled segments remained 6. Actual calculation of checkpointsegments = 6.1 TimeStamp=2015-01-27 14:02:01.936 GMT-10 DB= SID=54c70d58.22f4 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 14:02:10.638 GMT-10 DB= SID=54c70d58.22f4 User= LOG: checkpoint complete: wrote 14111 buffers (10.8%); 0 transaction log file(s) added, 1 removed, 5 recycled; write=5.527 s, sync=2.719 s, total=8.701 s; sync files=14, longest=1.789 s, average=0.194 s; distance=98617 KB, estimate=99036 KB TimeStamp=2015-01-27 14:02:14.243 GMT-10 DB= SID=54c70d58.22f4 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 14:02:22.783 GMT-10 DB= SID=54c70d58.22f4 User= LOG: checkpoint complete: wrote 16524 buffers (12.6%); 0 transaction log file(s) added, 1 removed, 5 recycled; write=7.013 s, sync=1.394 s, total=8.540 s; sync files=3, longest=0.867 s, average=0.464 s; distance=98724 KB, estimate=99005 KB TimeStamp=2015-01-27 14:02:28.066 GMT-10 DB= SID=54c70d58.22f4 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 14:02:36.946 GMT-10 DB= SID=54c70d58.22f4 User= LOG: checkpoint complete: wrote 16541 buffers (12.6%); 0 transaction log file(s) added, 1 removed, 5 recycled; write=4.899 s, sync=3.801 s, total=8.879 s; sync files=9, longest=2.800 s, average=0.422 s; distance=98719 KB, estimate=98976 KB TimeStamp=2015-01-27 14:02:40.611 GMT-10 DB= SID=54c70d58.22f4 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 14:02:48.066 GMT-10 DB= SID=54c70d58.22f4 User= LOG: checkpoint complete: wrote 10998 buffers (8.4%); 0 transaction log file(s) added, 1 removed, 5 recycled; write=4.874 s, sync=1.998 s, total=7.455 s; sync files=3, longest=1.998 s, average=0.666 s; distance=98771 KB, estimate=98956 KB TimeStamp=2015-01-27 14:02:53.327 GMT-10 DB= SID=54c70d58.22f4 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-27 14:03:00.872 GMT-10 DB= SID=54c70d58.22f4 User= LOG: checkpoint complete: wrote 10640 buffers (8.1%); 0 transaction log file(s) added, 1 removed, 5 recycled; write=5.247 s, sync=2.097 s, total=7.544 s; sync files=3, longest=1.640 s, average=0.699 s; distance=98624 KB, estimate=98923 KB *Test 4 :* This time i tested with wal_keep_segments = 300 (4.8 G) checkpoint_wal_size = 512MB min_recycle_wal_size = 80M wal_keep_segments = 300 checkpoint_timeout = 5min shared_buffers = 1 GB Actual calculation of checkpointsegments = 12.8 TimeStamp=2015-01-29 12:51:48.276 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-29 12:52:04.325 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint complete: wrote 20965 buffers (16.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=11.676 s, sync=3.830 s, total=16.049 s; sync files=18, longest=2.991 s, average=0.212 s; distance=196705 KB, estimate=196705 KB TimeStamp=2015-01-29 12:52:16.068 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-29 12:52:33.529 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint complete: wrote 22009 buffers (16.8%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=12.705 s, sync=3.559 s, total=17.460 s; sync files=3, longest=3.002 s, average=1.186 s; distance=200401 KB, estimate=200401 KB TimeStamp=2015-01-29 12:52:43.321 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint starting: xlog Since the wal_keep_segments is 300, recycling or removing of the transactions logs begins after the required number of wal_keep_segments are retained. Which is 4.8G in this case. removed+recycled has always been 12 except for the first 3 checkpoint cycles after pg_xlog size reached 4.8G. TimeStamp=2015-01-29 13:03:29.167 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-29 13:03:58.401 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint complete: wrote 20316 buffers (15.5%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=11.963 s, sync=16.840 s, total=29.233 s; sync files=16, longest=15.137 s, average=1.052 s; distance=197432 KB, estimate=197432 KB TimeStamp=2015-01-29 13:04:05.451 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-29 13:04:52.416 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint complete: wrote 20280 buffers (15.5%); 0 transaction log file(s) added, 5 removed, 8 recycled; write=10.989 s, sync=35.791 s, total=46.965 s; sync files=10, longest=17.927 s, average=3.579 s; distance=196668 KB, estimate=197356 KB TimeStamp=2015-01-29 13:04:52.635 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-29 13:05:15.520 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint complete: wrote 31394 buffers (24.0%); 0 transaction log file(s) added, 0 removed, 10 recycled; write=10.270 s, sync=12.404 s, total=22.884 s; sync files=17, longest=5.014 s, average=0.729 s; distance=197961 KB, estimate=197961 KB TimeStamp=2015-01-29 13:05:20.356 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-29 13:05:35.060 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint complete: wrote 32731 buffers (25.0%); 0 transaction log file(s) added, 0 removed, 10 recycled; write=11.433 s, sync=3.055 s, total=14.703 s; sync files=13, longest=1.300 s, average=0.235 s; distance=196510 KB, estimate=197816 KB TimeStamp=2015-01-29 13:05:43.059 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-29 13:05:59.518 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint complete: wrote 30264 buffers (23.1%); 0 transaction log file(s) added, 0 removed, 12 recycled; write=10.687 s, sync=5.624 s, total=16.459 s; sync files=12, longest=3.971 s, average=0.468 s; distance=193348 KB, estimate=197369 KB TimeStamp=2015-01-29 13:06:07.371 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint starting: xlog TimeStamp=2015-01-29 13:06:23.870 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG: checkpoint complete: wrote 30723 buffers (23.4%); 0 transaction log file(s) added, 0 removed, 12 recycled; write=10.132 s, sync=6.159 s, total=16.498 s; sync file > b) Are the two GUCs, checkpoint_wal_size, and min_recycle_wal_size, > intuitive to set? > > During my tests, I did not observe the significance of min_recycle_wal_size parameter yet. Ofcourse, i had sufficient disk space for pg_xlog. I would like to understand more about "min_recycle_wal_size" parameter. In theory, i only understand from the note in the patch that if the disk space usage falls below certain threshold, min_recycle_wal_size number of WALs will be removed to accommodate future pg_xlog segments. I will try to test this out. Please let me know if there is any specific test to understand min_recycle_wal_size behaviour. I will try to perform some more stress testing with different set of high workloads and will share the results. I did not review the patch code completely. Will comment once done. Please share your thoughts on this. Regards, Venkata B N