On Wednesday, October 21, 2020 4:37 PM, Tsunakawa-san wrote:
> RelationTruncate() invalidates the cached fork sizes as follows.  This causes
> smgrnblocks() return accurate=false, resulting in not running optimization.
> Try commenting out for non-recovery case.
> 
>     /*
>      * Make sure smgr_targblock etc aren't pointing somewhere past new
> end
>      */
>     rel->rd_smgr->smgr_targblock = InvalidBlockNumber;
>     for (int i = 0; i <= MAX_FORKNUM; ++i)
>         rel->rd_smgr->smgr_cached_nblocks[i] = InvalidBlockNumber;

Hello, I have updated the set of patches which incorporated all your feedback 
in the previous email.
Thank you for also looking into it. The patch 0003 (DropRelFileNodeBuffers 
improvement)
is indeed for vacuum optimization and not for truncate.
I'll post a separate patch for the truncate optimization in the coming days.

1. Vacuum Optimization
I have confirmed that the above comment (commenting out the lines in 
RelationTruncate)
solves the issue for non-recovery case.
The attached 0004 patch is just for non-recovery testing and is not included in 
the
final set of patches to be committed for vacuum optimization.

The table below shows the vacuum execution time for non-recovery case.
I've also subtracted the execution time when VACUUM (truncate off) is set.

[NON-RECOVERY CASE - VACUUM execution Time in seconds]

| s_b   | master | patched | %reg      | 
|-------|--------|---------|-----------| 
| 128MB | 0.22   | 0.181   | -21.55%   | 
| 1GB   | 0.701  | 0.712   | 1.54%     | 
| 20GB  | 15.027 | 1.920   | -682.66%  | 
| 100GB | 65.456 | 1.795   | -3546.57% |

[RECOVERY CASE, VACUUM execution + failover]
I've made a mistake in my writing of the previous email [1].
DELETE from was executed before pausing the WAL replay on standby.
In short, the procedure and results were correct. But I repeated the
performance measurement just in case. The results are still great and 
almost the same as the previous measurement.

| s_b   | master | patched | %reg   | 
|-------|--------|---------|--------| 
| 128MB | 3.043  | 3.009   | -1.13% | 
| 1GB   | 3.417  | 3.410   | -0.21% | 
| 20GB  | 20.597 | 2.410   | -755%  | 
| 100GB | 65.734 | 2.409   | -2629% |

Based from the results above, with the patches applied,
the performance for both recovery and non-recovery were relatively close.
For default and small shared_buffers (128MB, 1GB), the performance is
relatively the same as master. But we see the benefit when we have large 
shared_buffers setting.

I've tested using the same test case I indicated in the previous email,
Including the following additional setting:
vacuum_cost_delay = 0
vacuum_cost_limit = 10000

That's it for the vacuum optimization. Feedback and comments would be highly 
appreciated.

2. Truncate Optimization
I'll post a separate patch in the future for the truncate optimization which 
modifies the
DropRelFileNodesAllBuffers and related functions along the truncate path..

Thank you.

Regards,
Kirk Jamison

[1] 
https://www.postgresql.org/message-id/OSBPR01MB2341672E9A95E5EC6D2E79B5EF020%40OSBPR01MB2341.jpnprd01.prod.outlook.com

Attachment: v26-0001-Prevent-invalidating-blocks-in-smgrextend-during.patch
Description: v26-0001-Prevent-invalidating-blocks-in-smgrextend-during.patch

Attachment: v26-0002-Add-bool-param-in-smgrnblocks-for-cached-blocks.patch
Description: v26-0002-Add-bool-param-in-smgrnblocks-for-cached-blocks.patch

Attachment: v26-0003-Optimize-DropRelFileNodeBuffers-during-recovery.patch
Description: v26-0003-Optimize-DropRelFileNodeBuffers-during-recovery.patch

Attachment: v26-0004-For-non-recovery-performance-test-case-purposes-.patch
Description: v26-0004-For-non-recovery-performance-test-case-purposes-.patch

Reply via email to