Hi hackers,

During logical replication, a large write transaction causes spill files to be written to disk, depending on the setting of logical_decoding_work_mem.
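For context, the spilling kicks in once the changes buffered for decoding exceed logical_decoding_work_mem. Below is a simplified sketch, from memory, of the existing logic in reorderbuffer.c (streaming and bookkeeping details omitted):

```
/*
 * Simplified sketch of ReorderBufferCheckMemoryLimit() in
 * src/backend/replication/logical/reorderbuffer.c: once the buffered
 * changes exceed logical_decoding_work_mem, the largest transaction is
 * serialized into xid-*.spill files under pg_replslot/<slot>/.
 */
static void
ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
{
    ReorderBufferTXN *txn;

    /* logical_decoding_work_mem is in kB */
    while (rb->size >= logical_decoding_work_mem * 1024L)
    {
        /* pick the largest (sub)transaction ... */
        txn = ReorderBufferLargestTXN(rb);

        /* ... and evict it from memory by writing its changes to disk */
        ReorderBufferSerializeTXN(rb, txn);
    }
}
```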
This behavior effectively avoids OOM, but if a transaction generates many changes before committing, the large number of spill files can fill the disk; updating a TB-scale table, for example, will do it. To some extent that is unavoidable. However, I noticed an inelegant case: if the large table being updated is not published, its changes are still written out as a large number of spill files. Here is an example.

publisher:
```
create table tbl_pub(id int, val1 text, val2 text, val3 text);
create table tbl_t1(id int, val1 text, val2 text, val3 text);
CREATE PUBLICATION mypub FOR TABLE public.tbl_pub;
```

subscriber:
```
create table tbl_pub(id int, val1 text, val2 text, val3 text);
create table tbl_t1(id int, val1 text, val2 text, val3 text);
CREATE SUBSCRIPTION mysub CONNECTION 'host=127.0.0.1 port=5432 user=postgres dbname=postgres' PUBLICATION mypub;
```

publisher:
```
begin;
insert into tbl_t1 select i, repeat('xyzzy', i), repeat('abcba', i), repeat('dfds', i) from generate_series(0, 999999) i;
```

You will then see a large number of spill files in the "$PGDATA/pg_replslot/mysub/" directory:
```
$ ll -sh
total 4.5G
4.0K -rw------- 1 postgres postgres 200 Nov 30 09:24 state
17M -rw------- 1 postgres postgres 17M Nov 30 08:22 xid-750-lsn-0-10000000.spill
12M -rw------- 1 postgres postgres 12M Nov 30 08:20 xid-750-lsn-0-1000000.spill
17M -rw------- 1 postgres postgres 17M Nov 30 08:23 xid-750-lsn-0-11000000.spill
......
```

Table tbl_t1 is not part of mypub, and since it is not subscribed its changes will never be sent downstream anyway. After the transaction is reassembled, the pgoutput decoding plugin filters out changes to these unpublished relations when sending logical changes; see the function pgoutput_change.

The key question: if we filter out changes to unpublished relations after a change has been constructed but before it is queued into the transaction, can we reduce the work done by logical decoding and avoid most of this disk or memory growth? Attached is the patch I used to implement this optimization.

Design:
1. Add a callback, LogicalDecodeFilterByRelCB, for the output plugin (a rough sketch of how it could be used follows the test notes below).
2. Add an implementation of this callback, pgoutput_table_filter, to the pgoutput plugin. It is based on the table filtering already done in pgoutput_change: it decides, from the publication parameters, whether a change needs to be published, and filters the change out if not.
3. After a change is constructed and before it is queued into the transaction, use RelidByRelfilenumber to obtain the relation associated with the change, just as ReorderBufferProcessTXN does.
4. The relation may be a TOAST relation, and there is no good way to get the owning table's relation from a TOAST relation directly. For now, I derive the owning table's OID from the TOAST relation's name and then open the owning table's relation.
5. The filtering currently covers INSERT/UPDATE/DELETE. Other change types are not handled yet and can be added in the future.

Test:
1. Added a test case, 034_table_filter.pl.
2. As in the example above, it creates two tables: the published table tbl_pub and the unpublished table tbl_t1.
3. It inserts 10,000 rows of TOAST data into tbl_t1 on the publisher and uses pg_ls_replslotdir to record the total size of the slot directory every second.
4. It compares the size of the slot directory at the beginning of the transaction (size1), at the end of the transaction (size2), and the average size over the whole run (size3).
5. It asserts that size1 == size2 == size3, i.e. the slot directory does not grow during the transaction.
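To make the design points above a bit more concrete, here is a rough, untested sketch of the callback and of the check before queuing a change. The callback member name filter_by_rel_cb and the helper FilterChangeByRelation are placeholder names for illustration only, not necessarily what the attached patch uses; get_rel_sync_entry() refers to the existing static helper in pgoutput.c, so the pgoutput part would live in that file.

```
/* Illustrative sketch only; filter_by_rel_cb and FilterChangeByRelation
 * are placeholder names. */

#include "postgres.h"
#include "replication/logical.h"
#include "replication/pgoutput.h"
#include "replication/reorderbuffer.h"
#include "utils/rel.h"
#include "utils/relcache.h"
#include "utils/relfilenumbermap.h"

/* 1. New output-plugin callback: returns true if changes for this
 *    relation/action should be filtered out (i.e. never queued). */
typedef bool (*LogicalDecodeFilterByRelCB) (struct LogicalDecodingContext *ctx,
                                            Relation relation,
                                            ReorderBufferChangeType action);

/* 2. pgoutput implementation (in pgoutput.c, where RelationSyncEntry and
 *    get_rel_sync_entry() are defined), reusing the publication checks
 *    that pgoutput_change() already performs. */
static bool
pgoutput_table_filter(LogicalDecodingContext *ctx, Relation relation,
                      ReorderBufferChangeType action)
{
    PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
    RelationSyncEntry *entry = get_rel_sync_entry(data, relation);

    switch (action)
    {
        case REORDER_BUFFER_CHANGE_INSERT:
            return !entry->pubactions.pubinsert;
        case REORDER_BUFFER_CHANGE_UPDATE:
            return !entry->pubactions.pubupdate;
        case REORDER_BUFFER_CHANGE_DELETE:
            return !entry->pubactions.pubdelete;
        default:
            return false;       /* keep everything else */
    }
}

/* 3. Called after a change is constructed but before
 *    ReorderBufferQueueChange(): resolve the relation from the change's
 *    relfilelocator, as ReorderBufferProcessTXN() does, and ask the
 *    plugin whether the change can be dropped. */
static bool
FilterChangeByRelation(LogicalDecodingContext *ctx, ReorderBufferChange *change)
{
    Oid         reloid;
    Relation    relation;
    bool        filter = false;

    reloid = RelidByRelfilenumber(change->data.tp.rlocator.spcOid,
                                  change->data.tp.rlocator.relNumber);
    if (!OidIsValid(reloid))
        return false;           /* cannot resolve the relation; keep it */

    relation = RelationIdGetRelation(reloid);
    if (RelationIsValid(relation))
    {
        /* A TOAST relation would first be mapped back to its owning
         * table here (design point 4) before asking the plugin. */
        filter = ctx->callbacks.filter_by_rel_cb(ctx, relation, change->action);
        RelationClose(relation);
    }

    return filter;
}
```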
I sincerely look forward to your feedback.

Regards,
lijie
v1-Reduce-useless-changes-before-reassemble-during-logi.patch
Description: Binary data