Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables
Hi wenhui,

I carefully analyzed the reason for the performance regression with fewer temporary tables in the previous patch (v1-0002-): the k_hash_funcs chosen by the bloom_create function was 10 (MAX_HASH_FUNCS), which led to excessive calculation overhead for the bloom filter.

Based on the bloom filter's false-positive formula, for a 1KB bloom filter with k_hash_funcs = 2, the false positive rate is 0.0006096 at 100 items and 0.048929094 at 1000 items. Therefore, k_hash_funcs = 2 already achieves a decent false positive rate while effectively reducing the computational overhead of the bloom filter. I have re-implemented a bloom_create_v2 function that creates a bloom filter with a specified number of hash functions and a specified memory size.

From the test data below, it can be seen that the new patch in the attachment (v1-0003-) does not lead to performance regression in any scenario. Furthermore, the default threshold value can be lowered to 2.

Here is the TPS performance data for different numbers of temporary tables under different thresholds, as compared with head (98347b5a). The testing tool is pgbench, with the workload being to insert into one temporary table (when the number of temporary tables is 0, the workload is SELECT 1):

|table num     |0           |1           |2          |5          |10         |100        |1000       |
|--------------|------------|------------|-----------|-----------|-----------|-----------|-----------|
|head(98347b5a)|39912.722209|10064.306268|9183.871298|7452.071689|5641.487369|1073.203851|114.530958 |
|threshold-2   |40097.047974|10009.598155|9982.172866|9955.651235|.338901    |9785.626296|8278.828828|

Here is the TPS performance data for different numbers of temporary tables at a threshold of 2, compared with head (commit 98347b5a).
The testing tool is pgbench, with the workload being to insert into all temporary tables:

|table num     |1          |2          |5          |10         |100       |1000     |
|--------------|-----------|-----------|-----------|-----------|----------|---------|
|head(98347b5a)|7243.945042|5734.545012|3627.290594|2262.594766|297.856756|27.745808|
|threshold-2   |7289.171381|5740.849676|3626.135510|2207.439931|293.145036|27.020953|

I have previously researched the implementation of the Global Temp Table (GTT) you mentioned; it has been used in Alibaba Cloud's PolarDB (link [1]). GTT can avoid truncation of temporary tables that have not been accessed by the current session (those not in the OnCommitItem list), but GTTs that have been accessed by the current session still need to be truncated at commit time. Therefore, GTTs also require the optimizations in the above patch.

[1] https://www.alibabacloud.com/help/en/polardb/polardb-for-oracle/using-global-temporary-tables?spm=a3c0i.23458820.2359477120.1.66e16e9bUpV7cK

Best Regards,
Fei Changhong

v1-0003-Optimize-commit-with-temp-tables-guc.patch
Description: Binary data

v1-0003-Optimize-commit-with-temp-tables-without-guc.patch
Description: Binary data
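For reference, the false positive rates quoted above can be reproduced with the standard bloom filter approximation. A minimal sketch (assuming the 1KB filter is counted as 1000 bytes = 8000 bits, which matches the figures in this mail exactly):

```python
from math import exp

def bloom_fpr(m_bits, n_items, k):
    """Standard bloom filter false-positive approximation:
    p = (1 - e^(-k*n/m))^k, for m bits, n inserted items, k hash functions."""
    return (1 - exp(-k * n_items / m_bits)) ** k

# 1KB filter (taken as 8000 bits) with k_hash_funcs = 2:
print(round(bloom_fpr(8000, 100, 2), 7))   # → 0.0006096
print(round(bloom_fpr(8000, 1000, 2), 9))  # → 0.048929094
```

This also shows why k = 2 is sufficient here: the false positive rate stays below 5% even at 1000 items, and each lookup costs only two hash evaluations instead of ten.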
Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables
Hi wenhui,

> On Jul 8, 2024, at 12:18, wenhui qiu wrote:
>
> Hi feichanghong
> I don't think it's acceptable to introduce a patch to fix a problem that
> leads to performance degradation, or can we take tom's suggestion to optimise
> PreCommit_on_commit_actions? I think it to miss the forest for the trees

You're right, any performance regression is certainly unacceptable. That's why we've introduced a threshold: the bloom filter optimization is only applied when the number of temporary tables exceeds it. The test data also shows that with a threshold of 10, barring cases where all temporary tables are involved in a transaction, there is hardly any performance loss.

"Improve PreCommit_on_commit_actions by having it just quit immediately if XACT_FLAGS_ACCESSEDTEMPNAMESPACE is not set" can only reduce the overhead of traversing the OnCommitItem list; it still doesn't address the cost of temporary table truncation.

Looking forward to more suggestions!

Best Regards,
Fei Changhong
Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables
Hi wenhui,

Thank you for your suggestions. I have supplemented some performance tests.

Here is the TPS performance data for different numbers of temporary tables under different thresholds, as compared with head (98347b5a). The testing tool is pgbench, with the workload being to insert into one temporary table (when the number of temporary tables is 0, the workload is SELECT 1):

| table num     | 0            | 1            | 5           | 10          | 100         | 1000        |
|---------------|--------------|--------------|-------------|-------------|-------------|-------------|
| head 98347b5a | 39912.722209 | 10064.306268 | 7452.071689 | 5641.487369 | 1073.203851 | 114.530958  |
| threshold 1   | 40332.367414 | 7078.117192  | 7044.951156 | 7020.249434 | 6893.652062 | 5826.597260 |
| threshold 5   | 40173.562744 | 10017.532933 | 7023.770203 | 7024.283577 | 6919.769315 | 5806.314494 |

Here is the TPS performance data for different numbers of temporary tables at a threshold of 5, compared with head (commit 98347b5a). The testing tool is pgbench, with the workload being to insert into all temporary tables:

| table num     | 1           | 5           | 10          | 100        | 1000      |
|---------------|-------------|-------------|-------------|------------|-----------|
| head 98347b5a | 7243.945042 | 3627.290594 | 2262.594766 | 297.856756 | 27.745808 |
| threshold 5   | 7287.764656 | 3130.814888 | 2038.308763 | 288.226032 | 27.705149 |

According to the test results, the patch does cause some performance loss with fewer temporary tables, but the benefits are substantial when many temporary tables are used. The specific threshold could be set to 10 (HDDs may require a smaller one). I've provided two patches in the attachments, both with a default threshold of 10: one has the threshold configured as a GUC parameter, while the other hardcodes it to 10.

Best Regards,
Fei Changhong

v1-0002-Optimize-commit-with-temp-tables-guc.patch
Description: Binary data

v1-0002-Optimize-commit-with-temp-tables-without-guc.patch
Description: Binary data
Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables
The patch in the attachment, compared to the previous one, adds a threshold for using the bloom filter. The current ON_COMMITS_FILTER_THRESHOLD is set to 64, which may not be the optimal value. Perhaps this threshold could be configured as a GUC parameter?

Best Regards,
Fei Changhong

v1-0001-Optimize-commit-with-temp-tables.patch
Description: Binary data
Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables
Thank you for your attention and suggestions.

> On Jul 6, 2024, at 00:15, Tom Lane wrote:
>
> writes:
>> PostgreSQL maintains a list of temporary tables for 'on commit
>> drop/delete rows' via an on_commits list in the session. Once a
>> transaction accesses a temp table or namespace, the
>> XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag is set. Before committing, the
>> PreCommit_on_commit_actions function truncates all 'commit delete
>> rows' temp tables, even those not accessed in the current transaction.
>> Commit performance can degrade if there are many such temp tables.
>
> Hmm. I can sympathize with wanting to improve the performance of
> this edge case, but it is an edge case: you are the first to
> complain about it. You cannot trash the performance of more typical
> cases in order to get there ...
>
>> In the attached patch (based on HEAD):
>> - A Bloom filter (can also be a list or hash table) maintains
>> the temp tables accessed by the current transaction.
>
> ... and I'm afraid this proposal may do exactly that. Our bloom
> filters are pretty heavyweight objects, so making one in situations
> where it buys nothing is likely to add a decent amount of overhead.
> (I've not tried to quantify that for this particular patch.)

Yes, this is an edge case, but we have more than one customer facing the issue, and unfortunately, they are not willing to modify their service code.

We should indeed avoid negatively impacting typical cases:
- Each connection requires an extra 1KB for the filter (the original bloom filter implementation had a minimum of 1MB, which I've adjusted to this smaller value).
- The filter is reset at the start of each transaction, which is unnecessary for sessions that do not access temporary tables.
- In the PreCommit_on_commit_actions function, each 'on commit delete rows' temporary table has to be checked against the bloom filter, which incurs some CPU overhead. However, this is likely negligible compared to the IO cost of truncation.

Adding a threshold for using the bloom filter is a good idea. We can create the bloom filter only when the current number of OnCommitItems exceeds the threshold at the start of a transaction, which should effectively avoid affecting typical cases. I will provide a new patch later to implement this.

> I wonder if we could instead add marker fields to the OnCommitItem
> structs indicating whether their rels were touched in the current
> transaction, and use those to decide whether we need to truncate.

Adding a flag to OnCommitItem to indicate whether the temp table was accessed by the current transaction is feasible. But locating the OnCommitItem by relid efficiently when opening a relation may require an extra hash table mapping relids to OnCommitItems.

> Another possibility is to make the bloom filter only when the
> number of OnCommitItems exceeds some threshold (compare d365ae705).
>
> BTW, I wonder if we could improve PreCommit_on_commit_actions by
> having it just quit immediately if XACT_FLAGS_ACCESSEDTEMPNAMESPACE
> is not set. I think that must be set if any ON COMMIT DROP tables
> have been made, so there should be nothing to do if not. In normal
> cases that's not going to buy much because the OnCommitItems list
> is short, but in your scenario maybe it could win.

I also think that when XACT_FLAGS_ACCESSEDTEMPNAMESPACE is not set, it's unnecessary to iterate over on_commits (unless I'm overlooking something), which would also benefit the aforementioned scenarios.

Best Regards,
Fei Changhong
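As a rough illustration of the threshold idea discussed above (not the actual patch code; the names and the plain relid set standing in for the bloom filter are hypothetical), the commit-time decision could be sketched as:

```python
# Hypothetical sketch of the commit-time decision. The real patch uses a
# bloom filter, which this set merely stands in for: a bloom filter may
# report false positives (harmless extra truncations) but never false
# negatives (a missed truncation).
ON_COMMITS_FILTER_THRESHOLD = 64  # value from the v1-0001 patch

def tables_to_truncate(on_commit_relids, accessed_relids):
    # Below the threshold, behave like head: truncate every
    # 'on commit delete rows' temp table.
    if len(on_commit_relids) < ON_COMMITS_FILTER_THRESHOLD:
        return list(on_commit_relids)
    # At or above it, truncate only tables the transaction touched.
    return [r for r in on_commit_relids if r in accessed_relids]

few = [16384, 16385]
many = list(range(16384, 16384 + 1000))
print(len(tables_to_truncate(few, {16384})))   # → 2 (below threshold: all)
print(len(tables_to_truncate(many, {16384})))  # → 1 (above: only accessed)
```

The point of the threshold is that sessions with only a handful of temp tables never pay for building or consulting the filter at all.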
Optimize commit performance with a large number of 'on commit delete rows' temp tables
Hi hackers,

# Background

PostgreSQL maintains a list of temporary tables for 'on commit drop/delete rows' via an on_commits list in the session. Once a transaction accesses a temp table or namespace, the XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag is set. Before committing, the PreCommit_on_commit_actions function truncates all 'commit delete rows' temp tables, even those not accessed in the current transaction. Commit performance can degrade if there are many such temp tables.

In practice, users created many 'commit delete rows' temp tables in a session, but each transaction only accessed a few. With varied access frequencies, users were reluctant to change to 'on commit drop'.

Below is an example showing the effect of the number of temp tables on commit performance:

```
-- 100
DO $$
DECLARE
begin
  FOR i IN 1..100 LOOP
    EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) on commit delete ROWS', i);
  END LOOP;
END; $$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 1.325 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 1.330 ms
```

```
-- 1000
DO $$
DECLARE
begin
  FOR i IN 1..1000 LOOP
    EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) on commit delete ROWS', i);
  END LOOP;
END; $$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 10.939 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 10.955 ms
```

```
-- 1
DO $$
DECLARE
begin
  FOR i IN 1..1 LOOP
    EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) on commit delete ROWS', i);
  END LOOP;
END; $$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 110.253 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 175.875 ms
```

# Solution

An intuitive solution is to truncate, at transaction commit, only the temp tables that the current process has accessed. In the attached patch (based on HEAD):
- A Bloom filter (could also be a list or hash table) maintains the temp tables accessed by the current transaction.
- Only temp tables that pass the Bloom filter need truncation. False positives may occur, but they are acceptable.
- The Bloom filter is reset at the start of the transaction, indicating that no temp tables have been accessed by the current transaction yet.

After optimization, the performance for the same cases is as follows:

```
-- 100
DO $$
DECLARE
begin
  FOR i IN 1..100 LOOP
    EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) on commit delete ROWS', i);
  END LOOP;
END; $$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 0.447 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 0.453 ms
```

```
-- 1000
DO $$
DECLARE
begin
  FOR i IN 1..1000 LOOP
    EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) on commit delete ROWS', i);
  END LOOP;
END; $$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 0.531 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 0.567 ms
```

```
-- 1
DO $$
DECLARE
begin
  FOR i IN 1..1 LOOP
    EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) on commit delete ROWS', i);
  END LOOP;
END; $$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 1.370 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 1.362 ms
```

Hoping for some suggestions from hackers.

Best Regards,
Fei Changhong

v1--Optimize-commit-with-temp-tables.patch
Description: Binary data
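The three bullet points above can be sketched as a minimal per-transaction filter (illustrative only; the sizes, hashing, and names here are assumptions and differ from the actual patch, which works on PostgreSQL's internal bloom filter):

```python
# Minimal per-transaction bloom filter mirroring the design described above.
M_BITS = 8 * 1024   # ~1KB of filter state per connection (assumed size)
K_HASH = 2          # two hash functions

class TxnTempTableFilter:
    def __init__(self):
        self.bits = 0
    def reset(self):
        # Called at transaction start: no temp tables accessed yet.
        self.bits = 0
    def _positions(self, relid):
        # Derive K_HASH bit positions from the relid (toy hashing scheme).
        return [hash((seed, relid)) % M_BITS for seed in range(K_HASH)]
    def record_access(self, relid):
        # Called when the transaction touches an 'on commit delete rows' table.
        for pos in self._positions(relid):
            self.bits |= 1 << pos
    def maybe_accessed(self, relid):
        # May return a false positive (one harmless extra truncation),
        # never a false negative (a missed truncation).
        return all(self.bits >> pos & 1 for pos in self._positions(relid))

f = TxnTempTableFilter()
f.reset()                 # transaction start
f.record_access(16384)    # transaction touches one temp table
# Commit: truncate only the tables that pass the filter.
to_truncate = [r for r in (16384, 16385, 16386) if f.maybe_accessed(r)]
```

With only the accessed relid guaranteed to pass, commit cost scales with the tables the transaction actually used rather than with every temp table in the session.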
logical decoding build wrong snapshot with subtransactions
This issue has been reported at https://www.postgresql.org/message-id/18280-4c8060178cb41750%40postgresql.org. Hoping for some feedback from kernel hackers, thanks!

Hi, hackers,

I've encountered a problem with logical decoding history snapshots. The specific error message is: "ERROR: could not map filenode "base/5/16390" to relation OID".

If a subtransaction that modified the catalog ends before the restart_lsn of the logical replication slot, and the commit WAL record of its top transaction is after the restart_lsn, the WAL records related to the subtransaction won't be decoded during logical decoding. Therefore, the subtransaction won't be marked as having modified the catalog, resulting in its absence from the snapshot's committed list.

The issue seems to be caused by SnapBuildXidSetCatalogChanges (introduced in 272248a) skipping checks for subtransactions when the top transaction is marked as containing catalog changes.

The following steps reproduce the problem (I increased the value of LOG_SNAPSHOT_INTERVAL_MS to avoid the impact of bgwriter writing XLOG_RUNNING_XACTS WAL records):

session 1:
```
CREATE TABLE tbl1 (val1 integer, val2 integer);
CREATE TABLE tbl1_part (val1 integer) PARTITION BY RANGE (val1);
SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
BEGIN;
SAVEPOINT sp1;
CREATE TABLE tbl1_part_p1 PARTITION OF tbl1_part FOR VALUES FROM (0) TO (10);
RELEASE SAVEPOINT sp1;
```

session 2:
```
CHECKPOINT;
```

session 1:
```
CREATE TABLE tbl1_part_p2 PARTITION OF tbl1_part FOR VALUES FROM (10) TO (20);
COMMIT;
BEGIN;
TRUNCATE tbl1;
```

session 2:
```
CHECKPOINT;
SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
INSERT INTO tbl1_part VALUES (1);
SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
```

To fix this issue, it is sufficient to remove the condition check for ReorderBufferXidHasCatalogChanges in SnapBuildXidSetCatalogChanges. This fix may add subtransactions that didn't change the catalog to the committed list, which amounts to a false positive. However, this is acceptable since we only use the snapshot built during decoding to read system catalogs, as stated in 272248a's commit message.

I have verified that the patch in the attachment resolves the issue, and I have added some test cases. I am eager to hear your suggestions on this!

Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.

fix_wrong_snapshot_for_logical_decoding.patch
Description: Binary data
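The bookkeeping failure described above can be modeled in a toy sketch (the function name and xid values are illustrative, not the actual snapbuild structures):

```python
def committed_catalog_xids(top_xid, sub_xids, skip_subs_when_top_marked):
    """Toy model of the bookkeeping in SnapBuildXidSetCatalogChanges.

    In the failing scenario the subtransaction's own WAL ended before
    restart_lsn, so the only chance to mark it as a catalog-modifying
    xact is via its top transaction's commit record. If the code skips
    the subtransactions once the top transaction is already marked, the
    sub xids never reach the committed-with-catalog-changes list.
    """
    committed = {top_xid}              # top xact changed the catalog
    if not skip_subs_when_top_marked:  # proposed fix: always add the subs
        committed.update(sub_xids)
    return committed

# Buggy path: the subtransaction (e.g. the one that created tbl1_part_p1)
# is missing from the snapshot's committed list.
print(sorted(committed_catalog_xids(728, [730], True)))   # → [728]
# Fixed path: it is present, at worst alongside some false positives.
print(sorted(committed_catalog_xids(728, [730], False)))  # → [728, 730]
```

The false-positive side of the fix is visible here too: every sub xid is added whether or not it touched the catalog, which is safe for catalog-only snapshots.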
Re: "ERROR: could not open relation with OID 16391" error was encountered when reindexing
It has been verified that the patch in the attachment solves the above problem. I sincerely look forward to your suggestions!

v1-try_index_open.patch
Description: Binary data
Re: "ERROR: could not open relation with OID 16391" error was encountered when reindexing
> For this specific job, I have always wanted a try_index_open() that
> would attempt to open the index with a relkind check, perhaps we could
> introduce one and reuse it here?

Yes, replacing index_open with try_index_open solves the problem. The idea is similar to my initial report: "after successfully opening the heap table in reindex_index, check again whether the index still exists". But it should be better to introduce try_index_open.

v1-try_index_open.patch
Description: Binary data

Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.
Re: "ERROR: could not open relation with OID 16391" error was encountered when reindexing
> This is extremely nonspecific, as line numbers in our code change
> constantly. Please quote a chunk of code surrounding that
> and indicate which line you are trying to stop at.

Thanks for the suggestion; I've refined the reproduction steps below:

1. Initialize the data
```
DROP TABLE IF EXISTS tbl_part;
CREATE TABLE tbl_part (a integer) PARTITION BY RANGE (a);
CREATE TABLE tbl_part_p1 PARTITION OF tbl_part FOR VALUES FROM (0) TO (10);
CREATE INDEX ON tbl_part(a);
```
2. In session 1, run the reindex and have gdb break after the reindex_index function successfully obtains the heapId, as noted in the code chunk below:
```
reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
              const ReindexParams *params)
{
    ...
    /*
     * Open and lock the parent heap relation. ShareLock is sufficient since
     * we only need to be sure no schema or data changes are going on.
     */
    heapId = IndexGetRelation(indexId,
                              (params->options & REINDEXOPT_MISSING_OK) != 0);
    > gdb break at here
    /* if relation is missing, leave */
    if (!OidIsValid(heapId))
        return;
```
```
REINDEX INDEX tbl_part_a_idx;
```
3. In session 2, the drop index succeeds
```
DROP INDEX tbl_part_a_idx;
```
4. session 1: gdb continue

Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.
Re: "ERROR: could not open relation with OID 16391" error was encountered when reindexing
I have provided a python script in the attachment to minimize the reproduction of the issue. I'm sorry that I lost the attached script in my last reply; I've added it to this one.

You can also use psql to reproduce it with the following steps:

1. Initialize the data
```
DROP TABLE IF EXISTS tbl_part;
CREATE TABLE tbl_part (a integer) PARTITION BY RANGE (a);
CREATE TABLE tbl_part_p1 PARTITION OF tbl_part FOR VALUES FROM (0) TO (10);
CREATE INDEX ON tbl_part(a);
```
2. session 1: reindex, with gdb breaking at index.c:3585
```
REINDEX INDEX tbl_part_a_idx;
```
3. session 2: the drop index succeeds
```
DROP INDEX tbl_part_a_idx;
```
4. session 1: gdb continue

Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.

reproduce.py
Description: Binary data
Re: "ERROR: could not open relation with OID 16391" error was encountered when reindexing
Thank you for your attention.

> Any chance you could provide minimal steps to reproduce the issue on
> an empty PG instance, ideally as a script? That's going to be helpful
> to reproduce / investigate the issue and also make sure that it's
> fixed.

I have provided a python script in the attachment to minimize the reproduction of the issue. The specific reproduction steps are as follows:

1. Initialize the data
```
DROP TABLE IF EXISTS tbl_part;
CREATE TABLE tbl_part (a integer) PARTITION BY RANGE (a);
CREATE TABLE tbl_part_p1 PARTITION OF tbl_part FOR VALUES FROM (0) TO (10);
CREATE INDEX ON tbl_part(a);
```
2. session 1: reindex, with gdb breaking at index.c:3585
```
REINDEX INDEX tbl_part_a_idx;
```
3. session 2: the drop index succeeds
```
DROP INDEX tbl_part_a_idx;
```
4. session 1: gdb continue

reproduce.py:
```
import psycopg2
import os
import threading
import time

dbname = 'postgres'
user = 'postgres'
host = '127.0.0.1'
port = 5432

conn1 = psycopg2.connect(database=dbname, user=user, host=host, port=port)
conn1.autocommit = True
conn2 = psycopg2.connect(database=dbname, user=user, host=host, port=port)
conn2.autocommit = True

def execute_sql(conn, sql):
    cur = conn.cursor()
    cur.execute(sql)
    try:
        res = cur.fetchall()
    except psycopg2.ProgrammingError as e:
        # ddl, dml will raise 'no results to fetch'
        if 'no results to fetch' in str(e):
            res = []
    cur.close()
    return res

def init_data():
    conn = psycopg2.connect(database=dbname, user=user, host=host, port=port)
    conn.autocommit = True
    for sql in [
            'DROP TABLE IF EXISTS tbl_part',
            'CREATE TABLE tbl_part (a integer) PARTITION BY RANGE (a)',
            'CREATE TABLE tbl_part_p1 PARTITION OF tbl_part FOR VALUES FROM (0) TO (10)',
            'CREATE INDEX ON tbl_part(a)'
    ]:
        execute_sql(conn=conn1, sql=sql)
    conn.close()

def session1():
    conn = psycopg2.connect(database=dbname, user=user, host=host, port=port)
    conn.autocommit = True
    pg_backend_pid = execute_sql(conn, 'select pg_backend_pid()')[0][0]
    gdb_cmd = "gdb --quiet --batch -p {} -ex 'b index.c:3585' -ex 'c' -ex 'shell sleep 5' &".format(pg_backend_pid)
    os.system(gdb_cmd)
    time.sleep(1)
    execute_sql(conn, 'REINDEX INDEX tbl_part_a_idx')
    time.sleep(5)
    conn.close()

def session2():
    conn = psycopg2.connect(database=dbname, user=user, host=host, port=port)
    conn.autocommit = True
    time.sleep(3)
    execute_sql(conn, 'DROP INDEX tbl_part_a_idx')
    conn.close()

# init table
init_data()
t1 = threading.Thread(target=session1)
t2 = threading.Thread(target=session2)
t1.start()
t2.start()
t1.join()
t2.join()
```

Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.
"ERROR: could not open relation with OID 16391" error was encountered when reindexing
This issue has been reported in the list at the link below, but has not received a reply:
https://www.postgresql.org/message-id/18286-f6273332500c2a62%40postgresql.org
Hoping to get some response from kernel hackers, thanks!

Hi,

When a REINDEX of a partitioned table's index and a DROP INDEX are executed concurrently, we may encounter the error "could not open relation with OID".

The rebuild of a partitioned table's index is completed in multiple transactions and can be summarized in the following steps:
1. Obtain the OIDs of all partition indexes in the ReindexPartitions function, then commit the transaction to release all locks.
2. Reindex each index in turn:
   2.1 Start a new transaction
   2.2 Check whether the index still exists
   2.3 Call the reindex_index function to complete the index rebuild
   2.4 Commit the transaction

No lock is held between steps 2.2 and 2.3 to protect the heap table and index from being deleted, so the reindex_index function checks whether the heap table still exists, but the index is not checked.

One fix I can think of is: after successfully opening the heap table in reindex_index, check again whether the index still exists, something like this:

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 143fae01eb..21777ec98c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3594,6 +3594,17 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 	if (!heapRelation)
 		return;
 
+	/*
+	 * Before opening the index, check if the index relation still exists.
+	 * If the index relation is gone, leave.
+	 */
+	if ((params->options & REINDEXOPT_MISSING_OK) != 0 &&
+		!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexId)))
+	{
+		table_close(heapRelation, NoLock);
+		return;
+	}
+
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
 	 * as that user. Also lock down security-restricted operations and

The above analysis is based on the latest master branch. I'm not sure if my idea is reasonable; I hope you can give me some suggestions. Thanks.

Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.