Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear All, On Wednesday, April 10, 2024 17:16 MSK, Давыдов Виталий wrote: Hi Amit, Ajin, All Thank you for the patch and the responses. I apologize for my delayed answer due to some circumstances. On Wednesday, April 10, 2024 14:18 MSK, Amit Kapila wrote: Vitaly, does the minimal solution provided by the proposed patch (Allow to alter two_phase option of a subscriber provided no uncommitted prepared transactions are pending on that subscription.) address your use case? In general, the idea behind the patch seems suitable for my case. Furthermore, the case of a two_phase switch from false to true with uncommitted pending prepared transactions probably never happens in my case. The switch from false to true means that the replica completes the catchup from the master and switches to normal mode, in which it participates in the multi-node configuration. There should be no uncommitted pending prepared transactions at the moment of the switch to normal mode. I'm going to try this patch. Please give me some time to investigate it; I will come back with feedback a little later. I looked at the patch and realized that I can't try it easily in the near future because the solution I'm working on is based on PG16 or earlier. The patch is not easily applicable to the older releases. I have to port my solution to master, which is not done yet. I apologize for that - so much work has to be done before I can apply the patch. BTW, I tested the idea of async 2PC commit on my side and it seems to work fine in my case. Anyway, I agree, the idea of altering the subscription seems the best one, but it is much harder to implement. To summarise my case of a synchronous multimaster where twophase is used to implement global transactions: * The replica may have prepared but not committed transactions when I toggle the subscription's two_phase from true to false. In this case, all prepared transactions may be aborted on the replica before altering the subscription.
* The replica will not have prepared transactions when the subscription is toggled from false to true. In this scenario, the replica completes the catchup (with twophase = off), becomes part of the multi-node cluster, and is ready to accept new 2PC transactions. All the new pending transactions will wait until the replica responds. But it may work differently for some other solutions. In general, it would be great to allow toggling in all scenarios. Just out of interest, has anyone tried to reproduce the problem with slow catchup of twophase transactions (pgbench should be used with a big number of clients)? I haven't seen any messages from anyone other than me confirming that the problem takes place. Thank you for your help! With best regards, Vitaly
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Amit, Ajin, All Thank you for the patch and the responses. I apologize for my delayed answer due to some circumstances. On Wednesday, April 10, 2024 14:18 MSK, Amit Kapila wrote: Vitaly, does the minimal solution provided by the proposed patch (Allow to alter two_phase option of a subscriber provided no uncommitted prepared transactions are pending on that subscription.) address your use case? In general, the idea behind the patch seems suitable for my case. Furthermore, the case of a two_phase switch from false to true with uncommitted pending prepared transactions probably never happens in my case. The switch from false to true means that the replica completes the catchup from the master and switches to normal mode, in which it participates in the multi-node configuration. There should be no uncommitted pending prepared transactions at the moment of the switch to normal mode. I'm going to try this patch. Please give me some time to investigate it; I will come back with feedback a little later. Thank you for your help! With best regards, Vitaly Davydov
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Heikki, Thank you for the reply. On Tuesday, March 05, 2024 12:05 MSK, Heikki Linnakangas wrote: In a nutshell, this changes PREPARE TRANSACTION so that if synchronous_commit is 'off', the PREPARE TRANSACTION is not fsync'd to disk. So if you crash after the PREPARE TRANSACTION has returned, the transaction might be lost. I think that's completely unacceptable. You are right, the prepared transaction might be lost after a crash. The same may happen with regular transactions that are not fsync-ed on the replica in logical replication by default: the subscription parameter synchronous_commit is off by default. I'm not sure whether there is some auto recovery for regular transactions. I think the main difference between these two cases is how to manually recover when some PREPARE TRANSACTION or COMMIT PREPARED is lost. For regular transactions, some updates or deletes in tables on the replica may be enough to fix the problem. For twophase transactions it may be harder to fix by hand, but it is possible, I believe. If you create a custom solution based on twophase transactions (like multimaster), such recovery may happen automatically. Another solution is to ignore errors on COMMIT PREPARED if the corresponding prepared transaction is missing. I don't know of other risks that may come with async commit of twophase transactions. If you're ok to lose the prepared state of twophase transactions on crash, why don't you create the subscription with 'two_phase=off' to begin with? In normal operation, the subscription has two_phase = on. I have to change this option at the catchup stage only, but this parameter cannot be altered. There was a patch proposal in the past to implement altering of the two_phase option, but it was rejected. I think recreating the subscription with two_phase = off will not work. I believe async commit for twophase transactions during catchup will significantly improve catchup performance. Such a feature is worth considering. P.S.
We might introduce a GUC option to allow async commit for twophase transactions. By default, sync commit will be applied for twophase transactions, as it is now. With best regards, Vitaly Davydov
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear All, Consider, please, my patch for async commit of twophase transactions. It can be applicable when catchup performance is not enough with the publication parameter twophase = on. The key changes are: * Use XLogSetAsyncXactLSN instead of XLogFlush, as is done for usual transactions. * In the case of async commit only, save the 2PC state in the pg_twophase file (but do not fsync it) in addition to saving it in the WAL. The file is used as an alternative to storing the 2PC state in memory. * On recovery, reject pg_twophase files with future xids. Probably, 2PC async commit should be enabled by a GUC (not implemented in the patch). With best regards, Vitaly

From cbaaa7270d771f9ccd6def08f0f02ce61dc15ff6 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov
Date: Thu, 29 Feb 2024 18:58:13 +0300
Subject: [PATCH] Async commit support for twophase transactions

---
 src/backend/access/transam/twophase.c | 171 +-
 1 file changed, 138 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 234c8d08eb..352266be14 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -109,6 +109,8 @@
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
+#define POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT
+
 /*
  * Directory where Two-phase commit files reside within PGDATA
  */
@@ -169,6 +171,7 @@ typedef struct GlobalTransactionData
 	BackendId	locking_backend;	/* backend currently working on the xact */
 	bool		valid;			/* true if PGPROC entry is in proc array */
 	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		infile;			/* true if prepared state saved in file (but not fsync-ed) */
 	bool		inredo;			/* true if entry was added via xlog_redo */
 	char		gid[GIDSIZE];	/* The GID assigned to the prepared xact */
 } GlobalTransactionData;
@@ -227,12 +230,14 @@ static void RemoveGXact(GlobalTransaction gxact);
 static void XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len);
 static char *ProcessTwoPhaseBuffer(TransactionId xid,
 								   XLogRecPtr prepare_start_lsn,
-								   bool fromdisk, bool setParent, bool setNextXid);
+								   bool fromdisk, bool setParent, bool setNextXid,
+								   const char *filename);
 static void MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid,
 								const char *gid, TimestampTz prepared_at, Oid owner,
 								Oid databaseid);
 static void RemoveTwoPhaseFile(TransactionId xid, bool giveWarning);
-static void RecreateTwoPhaseFile(TransactionId xid, void *content, int len);
+static void RemoveTwoPhaseFileByName(const char *filename, bool giveWarning);
+static void RecreateTwoPhaseFile(TransactionId xid, void *content, int len, bool dosync);
 
 /*
  * Initialization of shared memory
@@ -427,6 +432,7 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 
 	MarkAsPreparingGuts(gxact, xid, gid, prepared_at, owner, databaseid);
 
+	gxact->infile = false;
 	gxact->ondisk = false;
 
 	/* And insert it into the active array */
@@ -1204,6 +1210,37 @@ EndPrepare(GlobalTransaction gxact)
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 				 errmsg("two-phase state file maximum length exceeded")));
 
+#ifdef POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT
+
+	Assert(gxact->infile == false);
+
+	if (synchronous_commit == SYNCHRONOUS_COMMIT_OFF)
+	{
+		char	   *buf;
+		size_t		len = 0;
+		size_t		offset = 0;
+
+		for (record = records.head; record != NULL; record = record->next)
+			len += record->len;
+
+		if (len > 0)
+		{
+			buf = palloc(len);
+
+			for (record = records.head; record != NULL; record = record->next)
+			{
+				memcpy(buf + offset, record->data, record->len);
+				offset += record->len;
+			}
+
+			RecreateTwoPhaseFile(gxact->xid, buf, len, false);
+			pfree(buf);
+			gxact->infile = true;
+		}
+	}
+
+#endif
+
 	/*
 	 * Now writing 2PC state data to WAL. We let the WAL's CRC protection
 	 * cover us, so no need to calculate a separate CRC.
@@ -1239,8 +1276,24 @@ EndPrepare(GlobalTransaction gxact)
 							gxact->prepare_end_lsn);
 	}
 
+#if !defined(POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT)
+
 	XLogFlush(gxact->prepare_end_lsn);
+#else
+
+	if (synchronous_commit > SYNCHRONOUS_COMMIT_OFF)
+	{
+		/* Flush XLOG to disk */
+		XLogFlush(gxact->prepare_end_lsn);
+	}
+	else
+	{
+		XLogSetAsyncXactLSN(gxact->prepare_end_lsn);
+	}
+
+#endif
+
 	/* If we crash now, we have prepared: WAL replay will fix things */
 
 	/* Store record's start location to read that later on Commit */
@@ -1546,12 +1599,11 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	 * in WAL files if the LSN is after the last checkpoint record, or moved
 	 * to disk if for some reason they have lived for a long time.
 	 */
-	if (gxact->ondisk)
+	if (gxact->infile || gxact->ondisk)
 		buf = ReadTwoPhaseFile(xid, false);
 	else
 		XlogReadTwoPhaseData(gxact->prepare_start_lsn, &buf, NULL);
-
 	/*
 	 * Disassemble the header area
 	 */
@@
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Amit, On Tuesday, February 27, 2024 16:00 MSK, Amit Kapila wrote: As we do XLogFlush() at the time of prepare then why it is not available? OR are you talking about this state after your idea/patch where you are trying to make both Prepare and Commit_prepared records async? Right, I'm talking about my patch where async commit is implemented. There is no such problem with reading 2PC state from unflushed WAL in the vanilla code, because XLogFlush is called unconditionally, as you've described. But an attempt to add some async behaviour leads to the problem of reading unflushed WAL. That is why I store the 2PC state in local memory in my patch. It would be good if you could link those threads. Sure, I will find and add some links to the discussions from the past. Thank you! With best regards, Vitaly On Tue, Feb 27, 2024 at 4:49 PM Давыдов Виталий wrote: > > Thank you for your interest in the discussion! > > On Monday, February 26, 2024 16:24 MSK, Amit Kapila > wrote: > > > I think the reason is probably that when the WAL record for prepared is > already flushed then what will be the idea of async commit here? > > I think, the idea of async commit should be applied for both transactions: > PREPARE and COMMIT PREPARED, which are actually two separate local > transactions. For both these transactions we may call XLogSetAsyncXactLSN on > commit instead of XLogFlush when async commit is enabled. When I use async > commit, I mean to apply async commit to local transactions, not to a twophase > (prepared) transaction itself. > > > At commit prepared, it seems we read prepare's WAL record, right? If so, it > is not clear to me do you see a problem with a flush of commit_prepared or > reading WAL for prepared or both of these. > > The problem with reading WAL is due to async commit of PREPARE TRANSACTION > which saves 2PC in the WAL. At the moment of COMMIT PREPARED the WAL with > PREPARE TRANSACTION 2PC state may not be XLogFlush-ed yet.
> As we do XLogFlush() at the time of prepare then why it is not available? OR are you talking about this state after your idea/patch where you are trying to make both Prepare and Commit_prepared records async? So, PREPARE TRANSACTION should wait until its 2PC state is flushed. > > I did some experiments with saving 2PC state in the local memory of logical > replication worker and, I think, it worked and demonstrated much better > performance. Logical replication worker utilized up to 100% CPU. I'm just > concerned about possible problems with async commit for twophase transactions. > > To be more specific, I've attached a patch to support async commit for > twophase. It is not the final patch but it is presented only for discussion > purposes. There were some attempts to save 2PC in memory in past but it was > rejected. > It would be good if you could link those threads. -- With Regards, Amit Kapila.
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Amit, Thank you for your interest in the discussion! On Monday, February 26, 2024 16:24 MSK, Amit Kapila wrote: I think the reason is probably that when the WAL record for prepared is already flushed then what will be the idea of async commit here? I think the idea of async commit should be applied to both transactions: PREPARE and COMMIT PREPARED, which are actually two separate local transactions. For both of these transactions we may call XLogSetAsyncXactLSN on commit instead of XLogFlush when async commit is enabled. When I say async commit, I mean applying async commit to the local transactions, not to the twophase (prepared) transaction itself. At commit prepared, it seems we read prepare's WAL record, right? If so, it is not clear to me do you see a problem with a flush of commit_prepared or reading WAL for prepared or both of these. The problem with reading WAL is due to async commit of PREPARE TRANSACTION, which saves the 2PC state in the WAL. At the moment of COMMIT PREPARED, the WAL with the PREPARE TRANSACTION 2PC state may not have been XLogFlush-ed yet. So, PREPARE TRANSACTION should wait until its 2PC state is flushed. I did some experiments with saving the 2PC state in the local memory of the logical replication worker and, I think, it worked and demonstrated much better performance: the logical replication worker utilized up to 100% CPU. I'm just concerned about possible problems with async commit for twophase transactions. To be more specific, I've attached a patch to support async commit for twophase. It is not the final patch; it is presented for discussion purposes only. There were some attempts to save 2PC state in memory in the past, but they were rejected. Now, there might be a second round to discuss it.
With best regards, Vitaly

From 549f809fa122ca0842ec4bfc775afd08feee0d80 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov
Date: Tue, 27 Feb 2024 14:02:23 +0300
Subject: [PATCH] Add asynchronous commit support for 2PC

---
 src/backend/access/transam/twophase.c | 111 +-
 1 file changed, 108 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index c6af8cfd7e..52f0853db8 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -109,6 +109,8 @@
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
+#define POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT
+
 /*
  * Directory where Two-phase commit files reside within PGDATA
  */
@@ -163,6 +165,9 @@ typedef struct GlobalTransactionData
 	 */
 	XLogRecPtr	prepare_start_lsn;	/* XLOG offset of prepare record start */
 	XLogRecPtr	prepare_end_lsn;	/* XLOG offset of prepare record end */
+	void	   *prepare_2pc_mem_data;
+	size_t		prepare_2pc_mem_len;
+	pid_t		prepare_2pc_proc;
 	TransactionId xid;			/* The GXACT id */
 
 	Oid			owner;			/* ID of user that executed the xact */
@@ -427,6 +432,9 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 
 	MarkAsPreparingGuts(gxact, xid, gid, prepared_at, owner, databaseid);
 
+	Assert(gxact->prepare_2pc_mem_data == NULL);
+	Assert(gxact->prepare_2pc_proc == 0);
+
 	gxact->ondisk = false;
 
 	/* And insert it into the active array */
@@ -1129,6 +1137,8 @@ StartPrepare(GlobalTransaction gxact)
 	}
 }
 
+extern bool IsLogicalWorker(void);
+
 /*
  * Finish preparing state data and writing it to WAL.
 */
@@ -1167,6 +1177,37 @@ EndPrepare(GlobalTransaction gxact)
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 				 errmsg("two-phase state file maximum length exceeded")));
 
+	Assert(gxact->prepare_2pc_mem_data == NULL);
+	Assert(gxact->prepare_2pc_proc == 0);
+
+	if (IsLogicalWorker())
+	{
+		size_t		len = 0;
+		size_t		offset = 0;
+
+		for (record = records.head; record != NULL; record = record->next)
+			len += record->len;
+
+		if (len > 0)
+		{
+			MemoryContext oldmemctx;
+
+			oldmemctx = MemoryContextSwitchTo(TopMemoryContext);
+
+			gxact->prepare_2pc_mem_data = palloc(len);
+			gxact->prepare_2pc_mem_len = len;
+			gxact->prepare_2pc_proc = getpid();
+
+			for (record = records.head; record != NULL; record = record->next)
+			{
+				memcpy((char *) gxact->prepare_2pc_mem_data + offset, record->data, record->len);
+				offset += record->len;
+			}
+
+			MemoryContextSwitchTo(oldmemctx);
+		}
+	}
+
 	/*
 	 * Now writing 2PC state data to WAL. We let the WAL's CRC protection
 	 * cover us, so no need to calculate a separate CRC.
@@ -1202,8 +1243,24 @@ EndPrepare(GlobalTransaction gxact)
 							gxact->prepare_end_lsn);
 	}
 
+#if !defined(POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT)
+
 	XLogFlush(gxact->prepare_end_lsn);
+#else
+
+	if (synchronous_commit > SYNCHRONOUS_COMMIT_OFF)
+	{
+		/* Flush XLOG to disk */
+		XLogFlush(gxact->prepare_end_lsn);
+	}
+	else
+	{
+		XLogSetAsyncXactLSN(gxact->prepare_end_lsn);
+	}
+
+#endif
+
 	/* If we crash now, we have prepared: WAL replay will fix things */
 
 	/* Store record's start location to read that later on Commit */
@@ -1
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Amit, Amit Kapila wrote: I don't see we do anything specific for 2PC transactions to make them behave differently than regular transactions with respect to synchronous_commit setting. What makes you think so? Can you pin point the code you are referring to? Yes, sure. The function RecordTransactionCommitPrepared is called on prepared transaction commit (twophase.c). It calls XLogFlush unconditionally. The function RecordTransactionCommit (for regular transactions, xact.c) calls XLogFlush if synchronous_commit > off; otherwise it calls XLogSetAsyncXactLSN. There is a comment in RecordTransactionCommitPrepared (by Bruce Momjian) that shows that async commit is not supported yet:

/*
 * We don't currently try to sleep before flush here ... nor is there any
 * support for async commit of a prepared xact (the very idea is probably
 * a contradiction)
 */

/* Flush XLOG to disk */
XLogFlush(recptr);

Right, I think for this we need to implement parallel apply. Yes, parallel apply is a good point. But, I believe, it will not work if asynchronous commit is not supported. You have only one receiver process, which has to dispatch the incoming messages to parallel workers. I guess you will never reach the same rate of parallel execution on the replica as on the master with multiple backends. Can you be a bit more specific about what exactly you have in mind to achieve the above solutions? My proposal is to implement async commit for 2PC transactions as it is done for regular transactions. It should significantly speed up the catchup process. Then, we can think about how to apply transactions in parallel, which is much more difficult to do. The current problem is to get the 2PC state from the WAL on commit prepared. At that moment the WAL may not be flushed yet, and the commit function waits until the WAL with the 2PC state is flushed. I just tried to do it in my sandbox and found this problem. The inability to get the 2PC state from unflushed WAL stops me right now. I am thinking about possible solutions.
The idea with enableFsync is not a suitable solution in general, I think; I just pointed it out as an alternative idea. You simply set enableFsync = false before PREPARE or COMMIT PREPARED and set enableFsync = true after these functions. In this case, 2PC records will not be fsync-ed, but FlushPtr will be advanced. Thus, the 2PC state can be read from the WAL on commit prepared without waiting. To make it work correctly, I guess, we have to do some additional work to keep more WAL on the master and to filter out some duplicate transactions on the replica if the replica restarts during catchup. With best regards, Vitaly Davydov
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Ajin, Thank you for your feedback. Could you please try to increase the number of clients (the -c pgbench option) up to 20 or more? It seems I forgot to specify it. With best regards, Vitaly Davydov On Fri, Feb 23, 2024 at 12:29 AM Давыдов Виталий wrote: Dear All, I'd like to present and discuss a problem where 2PC transactions are applied quite slowly on a replica during logical replication. There is a master and a replica with logical replication established from the master to the replica with twophase = true. With some load level on the master, the replica starts to lag behind the master, and the lag keeps increasing. We have to significantly decrease the load on the master to allow the replica to complete the catchup. Such a problem may create significant difficulties in production. The problem appears at least on the REL_16_STABLE branch. To reproduce the problem: * Set up logical replication from master to replica with the subscription parameter twophase = true. * Create some intermediate load on the master (use pgbench with custom sql with prepare+commit). * Optionally switch off the replica for some time (keeping the load on the master). * Switch on the replica and wait until it catches up with the master. The replica will never catch up with the master even with some low load on the master. If the load is removed, the replica catches up, but it takes much more time than expected. I tried the same with regular transactions, but this problem doesn't appear even with a decent load. I tried this setup and I do see that the logical subscriber does reach the master in a short time. I'm not sure what I'm missing.
I stopped the logical subscriber in between while pgbench was running and then started it again and ran the following:

postgres=# SELECT sent_lsn, pg_current_wal_lsn() FROM pg_stat_replication;
 sent_lsn  | pg_current_wal_lsn
-----------+--------------------
 0/6793FA0 | 0/6793FA0    <=== caught up
(1 row)

My pgbench command:

pgbench postgres -p 6972 -c 2 -j 3 -f /home/ajin/test.sql -T 200 -P 5

My custom sql file:

cat test.sql
SELECT md5(random()::text) as mygid \gset
BEGIN;
DELETE FROM test WHERE v = pg_backend_pid();
INSERT INTO test(v) SELECT pg_backend_pid();
PREPARE TRANSACTION $$:mygid$$;
COMMIT PREPARED $$:mygid$$;

regards,
Ajin Cherian
Fujitsu Australia
Slow catchup of 2PC (twophase) transactions on replica in LR
Dear All, I'd like to present and discuss a problem where 2PC transactions are applied quite slowly on a replica during logical replication. There is a master and a replica with logical replication established from the master to the replica with twophase = true. With some load level on the master, the replica starts to lag behind the master, and the lag keeps increasing. We have to significantly decrease the load on the master to allow the replica to complete the catchup. Such a problem may create significant difficulties in production. The problem appears at least on the REL_16_STABLE branch. To reproduce the problem: * Set up logical replication from master to replica with the subscription parameter twophase = true. * Create some intermediate load on the master (use pgbench with custom sql with prepare+commit). * Optionally switch off the replica for some time (keeping the load on the master). * Switch on the replica and wait until it catches up with the master. The replica will never catch up with the master even with some low load on the master. If the load is removed, the replica catches up, but it takes much more time than expected. I tried the same with regular transactions, but this problem doesn't appear even with a decent load. I think the main problem behind the bad 2PC catchup performance is the lack of asynchronous commit support for 2PC. For regular transactions, asynchronous commit is used on the replica by default (subscription synchronous_commit = off). It allows the replication worker process on the replica to avoid fsync (XLogFlush) and to utilize up to 100% CPU (the background wal writer or checkpointer will do the fsync). I agree, 2PC transactions are mostly used in multimaster configurations with two or more nodes which operate synchronously, but when a node is in catchup (the node is not online in the multimaster cluster), asynchronous commit has to be used to speed up the catchup. There is another thing that contributes to the performance imbalance between the master and the replica.
When the master executes requests from multiple clients, an fsync optimization takes place in XLogFlush. It decreases the number of fsync calls when a number of parallel backends write to the WAL simultaneously. The replica applies the received transactions sequentially in one process, so this optimization does not apply. I see some possible solutions: * Implement asynchronous commit for 2PC transactions. * Do some hacking with enableFsync when it is possible. I think asynchronous commit support for 2PC transactions should significantly increase replica performance and help to solve this problem. I tried to implement it (as for usual transactions) but found another problem: the 2PC state is stored in the WAL on prepare; on commit we have to read the 2PC state from the WAL, but the read is delayed until the WAL is flushed by the background wal writer (the read LSN must be less than the flush LSN). Storing the 2PC state in shared memory (as proposed earlier) may help.

I used the following query to monitor the catchup progress on the master:

SELECT sent_lsn, pg_current_wal_lsn() FROM pg_stat_replication;

I used the following pgbench script against the master:

SELECT md5(random()::text) as mygid \gset
BEGIN;
DELETE FROM test WHERE v = pg_backend_pid();
INSERT INTO test(v) SELECT pg_backend_pid();
PREPARE TRANSACTION $$:mygid$$;
COMMIT PREPARED $$:mygid$$;

What do you think? With best regards, Vitaly Davydov
Re: How to accurately determine when a relation should use local buffers?
Hi Aleksander, Well even assuming this patch will make it to the upstream some day, which I seriously doubt, it will take somewhere between 2 and 5 years. Personally I would recommend reconsidering this design. I understand what you are saying. I have no plans to create a patch for this issue. I would like to believe that my case will be taken into consideration in future development. Thank you very much for your help! With best regards, Vitaly
Re: How to accurately determine when a relation should use local buffers?
Hi Aleksander, I sort of suspect that you are working on a very specific extension and/or feature for a PG fork. Any chance you could give us more details about the case? I'm trying to adapt a multimaster solution to some changes in pg16. We replicate temp table DDL for certain reasons. Furthermore, such tables should be accessible from processes other than the replication receiver process on a replica, and they should still be temporary. I understand that DML replication for temporary tables would cause severe performance degradation, but that is not our case. There are some changes in the ReadBuffer logic compared with pg15. To decide which buffers to use, ReadBuffer used the SmgrIsTemp function in pg15: the decision was based on the backend id of the relation. In pg16 the decision is based on the relpersistence attribute, which caused some problems on my side. In my opinion, we should choose local buffers based on the backend ids of relations, not on their persistence. An additional check on relpersistence prior to the backend id check may improve performance in some cases, I think. The internal design may become more flexible as a result. With best regards, Vitaly Davydov
Re: How to accurately determine when a relation should use local buffers?
Hi Aleksander, Thank you for your answers. It seems local buffers are used for temporary relations unconditionally. In this case, we may check either relpersistence or the backend id, or both of them. I didn't do a deep investigation of the code in this particular aspect but that could be a fair point. Would you like to propose a refactoring that unifies the way we check if the relation is temporary? I would propose not to associate temporary relations with local buffers unconditionally. I would say that we should choose local buffers only in a backend context; it is the primary condition. Thus, to choose local buffers, two checks should succeed: * relpersistence (RelationUsesLocalBuffers) * backend id (SmgrIsTemp) I know it may not be as efficient as checking relpersistence only, but it makes the internal architecture more flexible, I believe. With best regards, Vitaly Davydov
How to accurately determine when a relation should use local buffers?
Dear Hackers, I would like to clarify what the correct way is to determine that a given relation is using local buffers. Local buffers, as far as I know, are used for temporary tables in backends. There are two functions/macros (bufmgr.c): SmgrIsTemp and RelationUsesLocalBuffers. The first verifies that the current process is a regular session backend, while the other macro checks the relation's persistence characteristic. It seems the use of either function independently is not correct. I think these functions should be applied as a pair to check for local buffer use, but, it seems, they are used independently. It works as long as temporary tables are allowed only in session backends. I'm wondering how to determine the use of local buffers in some other, theoretical cases - for example, if we decide to replicate temporary tables. Are there other cases when local buffers can be used with relations in vanilla PostgreSQL? Do we allow the use of relations with RELPERSISTENCE_TEMP outside session backends? Thank you in advance for your help! With best regards, Vitaly Davydov