XACT_EVENT for 'commit prepared'
Hi hackers,

I found that the enum XactEvent has XACT_EVENT_PREPARE for PREPARE TRANSACTION, but there is no event for COMMIT PREPARED or ROLLBACK PREPARED. Take the following SQL:

    begin;
    create table test(a int);
    PREPARE TRANSACTION 'foo';
    rollback prepared 'foo';

When executing "rollback prepared 'foo';", I expected to get XACT_EVENT_ABORT, but the event type is actually XACT_EVENT_COMMIT. I think XACT_EVENT_COMMIT_PREPARED and XACT_EVENT_ROLLBACK_PREPARED could be added in the function FinishPreparedTransaction. I'm confused about why there are no events for these cases.
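The proposal can be sketched with a toy model (Python for brevity, not PostgreSQL source; the names mirror the C ones, and `finish_prepared_transaction` is a simplified stand-in for the real FinishPreparedTransaction and its callback machinery):

```python
from enum import Enum, auto

# Toy model (NOT PostgreSQL source): the existing XactEvent values
# plus the two proposed ones.
class XactEvent(Enum):
    XACT_EVENT_COMMIT = auto()
    XACT_EVENT_ABORT = auto()
    XACT_EVENT_PREPARE = auto()
    XACT_EVENT_COMMIT_PREPARED = auto()    # proposed
    XACT_EVENT_ROLLBACK_PREPARED = auto()  # proposed

# Stand-in for the callback list populated by RegisterXactCallback().
callbacks = []

def call_xact_callbacks(event):
    for callback in callbacks:
        callback(event)

def finish_prepared_transaction(gid, is_commit):
    """Simplified stand-in for FinishPreparedTransaction(): after the
    prepared transaction 'gid' is committed or rolled back, fire a
    dedicated event instead of reporting a plain COMMIT."""
    # ... replay or discard the two-phase state for `gid` here ...
    call_xact_callbacks(XactEvent.XACT_EVENT_COMMIT_PREPARED if is_commit
                        else XactEvent.XACT_EVENT_ROLLBACK_PREPARED)

seen = []
callbacks.append(seen.append)
finish_prepared_transaction('foo', is_commit=False)
assert seen == [XactEvent.XACT_EVENT_ROLLBACK_PREPARED]
```

With something like this, an extension's xact callback could tell COMMIT PREPARED / ROLLBACK PREPARED apart from an ordinary commit.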
Re: Recovering from detoast-related catcache invalidations
This is an interesting idea. Although some catalog tables are not in catcaches, such as pg_depend, when scanning them, if there is any SharedInvalidationMessage, the CatalogSnapshot will be invalidated and recreated (see "RelationInvalidatesSnapshotsOnly" in syscache.c). So during a systable scan, the backend may accept SharedInvalidationMessages and return tuples that are already out of date. systable_recheck_tuple is used in dependency.c for such cases.

Tom Lane wrote on 2024-01-14 03:12:
> I wrote:
> > Xiaoran Wang writes:
> >> Hmm, how about first checking if any invalidated shared messages have
> >> been accepted, then rechecking the tuple's visibility?  If there is no
> >> invalidated shared message accepted during 'toast_flatten_tuple',
> >> there is no need to do the visibility check, so it can save several
> >> CPU cycles.
>
> > Meh, I'd just as soon not add the additional dependency/risk of bugs.
> > This is an expensive and seldom-taken code path, so I don't think
> > shaving a few cycles is really important.
>
> It occurred to me that this idea might be more interesting if we
> could encapsulate it right into systable_recheck_tuple: something
> like having systable_beginscan capture the current
> SharedInvalidMessageCounter and save it in the SysScanDesc struct,
> then compare in systable_recheck_tuple to possibly short-circuit
> that work.  This'd eliminate one of the main bug hazards in the
> idea, namely that you might capture SharedInvalidMessageCounter too
> late, after something's already happened.  However, the whole idea
> only works for catalogs that have catcaches, and the other users of
> systable_recheck_tuple are interested in pg_depend which doesn't.
> So that put a damper on my enthusiasm for the idea.
>
> 			regards, tom lane
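Tom's suggested encapsulation — capture SharedInvalidMessageCounter at scan start and short-circuit the recheck when it has not advanced — can be sketched as a toy model (Python stand-ins, not the real genam.c code):

```python
# Toy model of the idea: systable_beginscan records the current
# SharedInvalidMessageCounter; systable_recheck_tuple can then skip the
# visibility recheck entirely if no invalidation message has arrived
# since the scan started.
shared_inval_message_counter = 0

class SysScanDesc:
    def __init__(self):
        # Snapshot the counter at scan start, as early as possible,
        # to avoid capturing it "too late".
        self.inval_count_at_start = shared_inval_message_counter

def systable_recheck_tuple(scan, do_visibility_check):
    if scan.inval_count_at_start == shared_inval_message_counter:
        return True  # nothing was invalidated: tuple still good, skip check
    return do_visibility_check()

scan = SysScanDesc()
assert systable_recheck_tuple(scan, lambda: False) is True   # short-circuited
shared_inval_message_counter += 1                            # message arrives
assert systable_recheck_tuple(scan, lambda: False) is False  # real check runs
```

As Tom notes, the short-circuit is only safe for catalogs covered by catcaches; pg_depend-style callers would still need the full recheck.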
Re: Recovering from detoast-related catcache invalidations
Hmm, how about first checking if any invalidated shared messages have been accepted, then rechecking the tuple's visibility? If no invalidated shared message was accepted during 'toast_flatten_tuple', there is no need to do the visibility check at all, which can save several CPU cycles:

    if (inval_count != SharedInvalidMessageCounter &&
        !systable_recheck_tuple(scandesc, ntp))
    {
        heap_freetuple(dtp);
        return NULL;
    }

Xiaoran Wang wrote on 2024-01-13 13:16:
> Great! That's exactly what we need.
>
> The patch LGTM, +1
>
> Tom Lane wrote on 2024-01-13 04:47:
>> I wrote:
>> > This is uncomfortably much in bed with the tuple table slot code,
>> > perhaps, but I don't see a way to do it more cleanly unless we want
>> > to add some new provisions to that API.  Andres, do you have any
>> > thoughts about that?
>>
>> Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
>> which is meant for exactly this purpose.  So v4 attached.
>>
>> 			regards, tom lane
Re: Recovering from detoast-related catcache invalidations
Great! That's exactly what we need.

The patch LGTM, +1

Tom Lane wrote on 2024-01-13 04:47:
> I wrote:
> > This is uncomfortably much in bed with the tuple table slot code,
> > perhaps, but I don't see a way to do it more cleanly unless we want
> > to add some new provisions to that API.  Andres, do you have any
> > thoughts about that?
>
> Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
> which is meant for exactly this purpose.  So v4 attached.
>
> 			regards, tom lane
Re: Recovering from detoast-related catcache invalidations
> Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
> than GetCatalogSnapshot is the right thing, because the catcaches
> use the latter.

Yes, you are right, we should use GetCatalogSnapshot here.

> Maybe, but that undocumented hack in SetHintBits seems completely
> unacceptable.  Isn't there a cleaner way to make this check?

Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check if the tuple has been deleted. As the tuple's xmin must have been committed, we just need to check if its xmax is committed, like below:

@@ -1956,9 +1956,11 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 	 */
 	if (HeapTupleHasExternal(ntp))
 	{
+		TransactionId xmax;
 		dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
-		if (!HeapTupleSatisfiesVisibility(ntp, GetNonHistoricCatalogSnapshot(cache->cc_reloid), InvalidBuffer))
+		xmax = HeapTupleHeaderGetUpdateXid(ntp->t_data);
+		if (TransactionIdIsValid(xmax) && TransactionIdDidCommit(xmax))
 		{
 			heap_freetuple(dtp);
 			return NULL;

I'm not quite sure the code is correct; I cannot yet fully understand 'HeapTupleHeaderGetUpdateXid', and I need more time to dive into it. Any thoughts?

Tom Lane wrote on 2024-01-12 06:21:
> Xiaoran Wang writes:
> >>> The detection of "get an invalidation" could be refined: what I did
> >>> here is to check for any advance of SharedInvalidMessageCounter,
> >>> which clearly will have a significant number of false positives.
>
> > I have reviewed your patch, and it looks good.  But instead of checking
> > for any advance of SharedInvalidMessageCounter (if the invalidation
> > message is not related to the current tuple, it is a little expensive),
> > I have another idea: we can recheck the visibility of the tuple with the
> > CatalogSnapshot (the CatalogSnapshot must be refreshed if there are any
> > SharedInvalidMessages); if it is not visible, we re-fetch the tuple,
> > otherwise we can continue to use it as it is not outdated.
>
> Maybe, but that undocumented hack in SetHintBits seems completely
> unacceptable.  Isn't there a cleaner way to make this check?
>
> Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
> than GetCatalogSnapshot is the right thing, because the catcaches
> use the latter.
>
> 			regards, tom lane
Re: Recovering from detoast-related catcache invalidations
Hi,

>> BTW, while nosing around I found what seems like a very nasty related
>> bug.  Suppose that a catalog tuple being loaded into syscache contains
>> some toasted fields.  CatalogCacheCreateEntry will flatten the tuple,
>> involving fetches from toast tables that will certainly cause
>> AcceptInvalidationMessages calls.  What if one of those should have
>> invalidated this tuple?  We will not notice, because it's not in
>> the hashtable yet.  When we do add it, we will mark it not-dead,
>> meaning that the stale entry looks fine and could persist for a long
>> while.

I spent some time trying to understand the bug, and finally I can reproduce it locally with the following steps:

Step 1: Create a function called 'test' with a long body that must be stored in a toast table, and put it in schema 'yy' by "alter function test set schema yy;".

Step 2: Add a breakpoint at 'toast_flatten_tuple' for session 1, then execute:

    set search_path = 'public';
    alter function test set schema xx;

Step 3: When session 1 stops at the breakpoint, open session 2 and execute:

    set search_path = 'yy';
    alter function test set schema public;

Step 4: Resume session 1; it reports the error "ERROR: could not find a function named "test"".

Step 5: Continue to execute "alter function test set schema xx;" in session 1, but it still does not work and reports the same error, although the function 'test' already belongs to schema 'public'.

Obviously, in session 1, the "test" proc tuple in the cache is outdated.

>> The detection of "get an invalidation" could be refined: what I did
>> here is to check for any advance of SharedInvalidMessageCounter,
>> which clearly will have a significant number of false positives.
>> However, the only way I see to make that a lot better is to
>> temporarily create a placeholder catcache entry (probably a negative
>> one) with the same keys, and then see if it got marked dead.
>> This seems a little expensive, plus I'm afraid that it'd be actively
>> wrong in the recursive-lookup cases that the existing comment in
>> SearchCatCacheMiss is talking about (that is, the catcache entry
>> might mislead any recursive lookup that happens).

I have reviewed your patch, and it looks good. But instead of checking for any advance of SharedInvalidMessageCounter (if the invalidation message is not related to the current tuple, it is a little expensive), I have another idea: we can recheck the visibility of the tuple with the CatalogSnapshot (the CatalogSnapshot must be refreshed if there are any SharedInvalidMessages); if it is not visible, we re-fetch the tuple, otherwise we can continue to use it as it is not outdated.

I added a commit based on your patch and attached it.

0001-Recheck-the-tuple-visibility.patch
Description: Binary data
Re: Avoid computing ORDER BY junk columns unnecessarily
Hi Heikki,

I haven't dug into your patch yet, but for this problem I have another idea.

---
explain verbose select foo from mytable order by sha256(bar::bytea);
                          QUERY PLAN
----------------------------------------------------------------
 Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..737.28 rows=1 width=64)
   Output: foo, sha256((bar)::bytea)
(2 rows)

The index is used to satisfy the ORDER BY, but the expensive ORDER BY expression is still computed for every row, just to be thrown away by the junk filter.
---

How about storing the ORDER BY column value together with 'xs_heaptid'? Then we could use that value directly instead of recomputing it when an index scan is used to fetch the ordered data.

Another problem I am concerned about is that if we exclude junk columns in the sort plan, we may change its behavior. I'm not sure whether that can lead to other issues.

Heikki Linnakangas wrote on 2023-12-22 02:39:
> Problem
> -------
>
> We are using junk columns for (at least) two slightly different purposes:
>
> 1. For passing row IDs and other such data from lower plan nodes to
> LockRows / ModifyTable.
>
> 2. To represent ORDER BY and GROUP BY columns that don't appear in the
> SELECT list.  For example, in a query like:
>
> SELECT foo FROM mytable ORDER BY bar;
>
> The parser adds 'bar' to the target list as a junk column.  You can see
> that with EXPLAIN VERBOSE:
>
> explain (verbose, costs off)
>   select foo from mytable order by bar;
>
>           QUERY PLAN
> ------------------------------
>  Sort
>    Output: foo, bar
>    Sort Key: mytable.bar
>    ->  Seq Scan on public.mytable
>          Output: foo, bar
> (5 rows)
>
> The 'bar' column gets filtered away in the executor, by the so-called
> junk filter.  That's fine for simple cases like the above, but in some
> cases, that causes the ORDER BY value to be computed unnecessarily.
> For example:
>
> create table mytable (foo text, bar text);
> insert into mytable select g, g from generate_series(1, 1) g;
> create index on mytable (sha256(bar::bytea));
> explain verbose
> select foo from mytable order by sha256(bar::bytea);
>
>                           QUERY PLAN
> ----------------------------------------------------------------
>  Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..737.28 rows=1 width=64)
>    Output: foo, sha256((bar)::bytea)
> (2 rows)
>
> The index is used to satisfy the ORDER BY, but the expensive ORDER BY
> expression is still computed for every row, just to be thrown away by
> the junk filter.
>
> This came up with pgvector, as the vector distance functions are pretty
> expensive.  All vector operations are expensive, so one extra distance
> function call per row doesn't necessarily make that much difference, but
> it sure looks silly.  See
> https://github.com/pgvector/pgvector/issues/359#issuecomment-1840786021
> (thanks Matthias for the investigation!).
>
> Solution
> --------
>
> The obvious solution is that the planner should not include those junk
> columns in the plan.  But how exactly to implement that is a different
> matter.
>
> I came up with the attached patch set, which adds a projection to all
> the paths at the end of planning in grouping_planner().  The projection
> filters out the unnecessary junk columns.  With that, the plan for the
> above example:
>
> postgres=# explain verbose select foo from mytable order by
> sha256(bar::bytea);
>                           QUERY PLAN
> ----------------------------------------------------------------
>  Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..662.24 rows=1 width=4)
>    Output: foo
> (2 rows)
>
> Problems with the solution
> --------------------------
>
> So this seems to work, but I have a few doubts:
>
> 1.
> Because Sort cannot project, this adds an extra Result node on top of
> Sort nodes when the ORDER BY is implemented by sorting:
>
> postgres=# explain verbose select foo from mytable order by bar;
>                           QUERY PLAN
> ----------------------------------------------------------------
>  Result  (cost=818.39..843.39 rows=1 width=4)
>    Output: foo
>    ->  Sort  (cost=818.39..843.39 rows=1 width=8)
>          Output: foo, bar
>          Sort Key: mytable.bar
>          ->  Seq Scan on public.mytable  (cost=0.00..154.00 rows=1 width=8)
>                Output: foo, bar
> (7 rows)
>
> From a performance point of view, I think that's not as bad as it
> sounds.  Remember that without this patch, the executor needs to execute
> the junk filter to filter out the extra column instead.  It's not clear
> that an extra Result is worse than that, although I haven't tried
> benchmarking it.
>
> This
Re: [PATCH]: Not to invalidate CatalogSnapshot for local invalidation messages
Hi,

I updated the comment about the CatalogSnapshot in src/backend/utils/time/snapmgr.c.

Xiaoran Wang wrote on 2023-12-18 15:02:
> Hi,
> Thanks for your reply.
>
> jian he wrote on 2023-12-18 08:20:
>> Hi
>> ---setup.
>> drop table s2;
>> create table s2(a int);
>>
>> After applying the patch:
>> alter table s2 add primary key (a);
>>
>> watch CatalogSnapshot
>>
>> #0  GetNonHistoricCatalogSnapshot (relid=1259)
>>     at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:412
>> #1  0x55ba78f0d6ba in GetCatalogSnapshot (relid=1259)
>>     at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:371
>> #2  0x55ba785ffbe1 in systable_beginscan (heapRelation=0x7f256f30b5a8,
>>     indexId=2662, indexOK=false, snapshot=0x0, nkeys=1, key=0x7ffe230f0180)
>>     at ../../Desktop/pg_src/src7/postgresql/src/backend/access/index/genam.c:413
>> (More stack frames follow...)
>>
>> -----
>> Hardware watchpoint 13: CatalogSnapshot
>>
>> Old value = (Snapshot) 0x55ba7980b6a0
>> New value = (Snapshot) 0x0
>> InvalidateCatalogSnapshot () at
>> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
>> 435             SnapshotResetXmin();
>> (gdb) bt 4
>> #0  InvalidateCatalogSnapshot ()
>>     at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
>> #1  0x55ba78f0ee85 in AtEOXact_Snapshot (isCommit=true, resetXmin=false)
>>     at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:1057
>> #2  0x55ba7868201b in CommitTransaction ()
>>     at ../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:2373
>> #3  0x55ba78683495 in CommitTransactionCommand ()
>>     at ../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:3061
>> (More stack frames follow...)
>>
>> ------
>> but the whole process changes pg_class, pg_index, pg_attribute,
>> pg_constraint etc.  Only one GetCatalogSnapshot and one
>> InvalidateCatalogSnapshot seems not correct?
>> What if there are concurrent changes in the related pg_catalog tables?
>>
>> Your patch did pass the isolation test!
>
> Yes, I have run the installcheck-world locally, and all the tests passed.
>
> There are two kinds of invalidation messages.
>
> One kind is from the local backend: for example, what you did in the
> example "alter table s2 add primary key (a);" modifies pg_class,
> pg_attribute etc., so it generates some invalidation messages to
> invalidate the "s2"-related tuples in pg_class, pg_attribute etc., and an
> invalidation message to invalidate the s2 relation cache.  When the
> command is finished, in CommandCounterIncrement, those invalidation
> messages are processed to make the system caches work well for the
> following commands.
>
> The other kind of invalidation messages come from other backends.
> Suppose there are two sessions:
>
> session 1
> ---
> 1: create table foo(a int);
> ---
> session 2
> ---
> 1: create table test(a int);  (before session1:1)
> 2: insert into foo values(1); (executed after session1:1)
> ---
>
> Session 1 will generate invalidation messages and send them when the
> transaction is committed, and session 2 will accept those invalidation
> messages from session 1 and then execute its second command.
>
> Before the patch, Postgres invalidated the CatalogSnapshot for both
> kinds of invalidation messages.  So I did a small optimization in this
> patch: for local invalidation messages, we don't call
> InvalidateCatalogSnapshot; we can use one CatalogSnapshot in a
> transaction even if we modify the catalog and generate invalidation
> messages, as the visibility of a tuple is determined by the curcid.
> As long as we update the curcid of the CatalogSnapshot in
> SnapshotSetCommandId, it works correctly.
>
>> I think what your patch does goes against the following code comments
>> in src/backend/utils/time/snapmgr.c:
>>
>> /*
>>  * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
>>  * mode, and to the latest one taken in a read-committed transaction.
>>  * SecondarySnapshot is a snapshot that's always up-to-date as of the current
>>  * instant, even in transaction-snapshot mode.  It should only be used for
>>  * special-purpose code (say, RI checking.)  CatalogSnapshot points to an
>>  * MVCC snapshot intended to be used for catalog scans
Re: [PATCH]: Not to invalidate CatalogSnapshot for local invalidation messages
Hi,
Thanks for your reply.

jian he wrote on 2023-12-18 08:20:
> Hi
> ---setup.
> drop table s2;
> create table s2(a int);
>
> After applying the patch:
> alter table s2 add primary key (a);
>
> watch CatalogSnapshot
>
> #0  GetNonHistoricCatalogSnapshot (relid=1259)
>     at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:412
> #1  0x55ba78f0d6ba in GetCatalogSnapshot (relid=1259)
>     at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:371
> #2  0x55ba785ffbe1 in systable_beginscan (heapRelation=0x7f256f30b5a8,
>     indexId=2662, indexOK=false, snapshot=0x0, nkeys=1, key=0x7ffe230f0180)
>     at ../../Desktop/pg_src/src7/postgresql/src/backend/access/index/genam.c:413
> (More stack frames follow...)
>
> -----
> Hardware watchpoint 13: CatalogSnapshot
>
> Old value = (Snapshot) 0x55ba7980b6a0
> New value = (Snapshot) 0x0
> InvalidateCatalogSnapshot () at
> ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
> 435             SnapshotResetXmin();
> (gdb) bt 4
> #0  InvalidateCatalogSnapshot ()
>     at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:435
> #1  0x55ba78f0ee85 in AtEOXact_Snapshot (isCommit=true, resetXmin=false)
>     at ../../Desktop/pg_src/src7/postgresql/src/backend/utils/time/snapmgr.c:1057
> #2  0x55ba7868201b in CommitTransaction ()
>     at ../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:2373
> #3  0x55ba78683495 in CommitTransactionCommand ()
>     at ../../Desktop/pg_src/src7/postgresql/src/backend/access/transam/xact.c:3061
> (More stack frames follow...)
>
> ------
> but the whole process changes pg_class, pg_index, pg_attribute,
> pg_constraint etc.  Only one GetCatalogSnapshot and one
> InvalidateCatalogSnapshot seems not correct?
> What if there are concurrent changes in the related pg_catalog tables?
>
> Your patch did pass the isolation test!

Yes, I have run the installcheck-world locally, and all the tests passed.

There are two kinds of invalidation messages.

One kind is from the local backend: for example, "alter table s2 add primary key (a);" modifies pg_class, pg_attribute etc., so it generates some invalidation messages to invalidate the "s2"-related tuples in pg_class, pg_attribute etc., and an invalidation message to invalidate the s2 relation cache. When the command is finished, in CommandCounterIncrement, those invalidation messages are processed to make the system caches work well for the following commands.

The other kind of invalidation messages come from other backends. Suppose there are two sessions:

session 1
---
1: create table foo(a int);
---
session 2
---
1: create table test(a int);  (before session1:1)
2: insert into foo values(1); (executed after session1:1)
---

Session 1 will generate invalidation messages and send them when the transaction is committed, and session 2 will accept those invalidation messages from session 1 and then execute its second command.

Before the patch, Postgres invalidated the CatalogSnapshot for both kinds of invalidation messages. So I did a small optimization in this patch: for local invalidation messages, we don't call InvalidateCatalogSnapshot; we can use one CatalogSnapshot in a transaction even if we modify the catalog and generate invalidation messages, as the visibility of a tuple is determined by the curcid. As long as we update the curcid of the CatalogSnapshot in SnapshotSetCommandId, it works correctly.

> I think what your patch does goes against the following code comments
> in src/backend/utils/time/snapmgr.c:
>
> /*
>  * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
>  * mode, and to the latest one taken in a read-committed transaction.
>  * SecondarySnapshot is a snapshot that's always up-to-date as of the current
>  * instant, even in transaction-snapshot mode.  It should only be used for
>  * special-purpose code (say, RI checking.)  CatalogSnapshot points to an
>  * MVCC snapshot intended to be used for catalog scans; we must invalidate it
>  * whenever a system catalog change occurs.
>  *
>  * These SnapshotData structs are static to simplify memory allocation
>  * (see the hack in GetSnapshotData to avoid repeated malloc/free).
>  */
> static SnapshotData CurrentSnapshotData = {SNAPSHOT_MVCC};
> static SnapshotData SecondarySnapshotData = {SNAPSHOT_MVCC};
> SnapshotData CatalogSnapshotData = {SNAPSHOT_MVCC};
> SnapshotData SnapshotSelfData = {SNAPSHOT_SELF};
> SnapshotData SnapshotAnyData = {SNAPSHOT_ANY};

Thank you for pointing it out. I think I need to update the comment in the patch too.
Re: [PATCH]: Not to invalidate CatalogSnapshot for local invalidation messages
Hi hackers,

I would like to give more details of my patch.

Postgres uses a global snapshot "CatalogSnapshot" to check catalog data visibility. The "CatalogSnapshot" is always updated to the latest version so that the latest catalog table content is visible. If there is any update on a catalog table, then to make the change visible to the following commands in the current transaction, the call chain

    CommandCounterIncrement
      -> AtCCI_LocalCache
        -> CommandEndInvalidationMessages
          -> LocalExecuteInvalidationMessage
            -> InvalidateCatalogSnapshot

invalidates the "CatalogSnapshot" by setting it to NULL. The next time the "CatalogSnapshot" is needed and found to be NULL, it is regenerated.

In a query, "CommandCounterIncrement" may be called many times, so the "CatalogSnapshot" may be destroyed and recreated many times. To reduce this overhead, instead of invalidating the "CatalogSnapshot", we can keep it and just increase its "curcid". When the transaction is committed or aborted, or when there are catalog invalidation messages from other backends, the "CatalogSnapshot" is invalidated and regenerated. Sometimes the "CatalogSnapshot" is not the same as the transaction's "CurrentSnapshot", but we can still update the CatalogSnapshot's "curcid", as the "curcid" is only checked when the tuple was inserted or deleted by the current transaction.

Xiaoran Wang wrote on 2023-12-07 10:13:
> Hi hackers,
>
> For local invalidation messages, there is no need to call
> `InvalidateCatalogSnapshot` to set the CatalogSnapshot to NULL and
> rebuild it later.  Instead, just update the CatalogSnapshot's `curcid`
> in `SnapshotSetCommandId`; this way the CatalogSnapshot keeps working
> well too.
>
> This optimization can reduce the overhead of rebuilding the
> CatalogSnapshot after each command.
>
> Best regards, xiaoran
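The before/after behaviour described above can be sketched as a toy model (Python stand-ins for the snapmgr.c machinery, not PostgreSQL source; `local_inval_only` is a hypothetical flag standing in for "only local invalidation messages were generated"):

```python
# Toy model of the patch's idea: for invalidations caused by the local
# backend, keep the CatalogSnapshot and only bump its curcid; only
# cross-backend invalidation messages (or end of transaction) force a
# full rebuild.
snapshots_built = 0

class Snapshot:
    def __init__(self, curcid):
        global snapshots_built
        snapshots_built += 1        # count how often we pay the rebuild cost
        self.curcid = curcid

catalog_snapshot = None

def get_catalog_snapshot(curcid):
    """Stand-in for GetCatalogSnapshot(): rebuild only if invalidated."""
    global catalog_snapshot
    if catalog_snapshot is None:
        catalog_snapshot = Snapshot(curcid)
    return catalog_snapshot

def command_counter_increment(curcid, local_inval_only):
    global catalog_snapshot
    if local_inval_only and catalog_snapshot is not None:
        catalog_snapshot.curcid = curcid   # the patched behaviour
    else:
        catalog_snapshot = None            # InvalidateCatalogSnapshot()

get_catalog_snapshot(0)
for cid in range(1, 5):                    # 4 commands, all local changes
    command_counter_increment(cid, local_inval_only=True)
    get_catalog_snapshot(cid)
assert snapshots_built == 1                # no rebuilds for local invals
command_counter_increment(5, local_inval_only=False)  # remote message
get_catalog_snapshot(5)
assert snapshots_built == 2
```

The unpatched behaviour corresponds to always taking the `else` branch, i.e. one rebuild per CommandCounterIncrement.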
[PATCH]: Not to invalidate CatalogSnapshot for local invalidation messages
Hi hackers,

For local invalidation messages, there is no need to call `InvalidateCatalogSnapshot` to set the CatalogSnapshot to NULL and rebuild it later. Instead, just update the CatalogSnapshot's `curcid` in `SnapshotSetCommandId`; this way the CatalogSnapshot keeps working well too.

This optimization can reduce the overhead of rebuilding the CatalogSnapshot after each command.

Best regards, xiaoran

0001-Not-to-invalidate-CatalogSnapshot-for-local-invalida.patch
Description: Binary data
[PATCH] update the comment in SnapshotSetCommandId
Hi,

In this patch I updated the comment in 'SnapshotSetCommandId', which specifies the reason why it is not necessary to update the 'curcid' of the CatalogSnapshot there.

Best regards, xiaoran

0001-Update-the-comment-in-SnapshotSetCommandId.patch
Description: 0001-Update-the-comment-in-SnapshotSetCommandId.patch
About `GetNonHistoricCatalogSnapshot`: does it allow developers to use catalog snapshot to scan non-catalog tables?
Hi,

I have a question about the routine GetNonHistoricCatalogSnapshot. It has a parameter "Oid relid". It first checks whether the relation has a syscache or is one of the "RelationInvalidatesSnapshotsOnly" relations; if so, it invalidates the CatalogSnapshot.

I just wonder in which situation a developer would try to scan a non-catalog table with the CatalogSnapshot.

By the way, in the routine SnapshotSetCommandId, there is a comment:

    /* Should we do the same with CatalogSnapshot? */

From my point of view, there is no need to update the curcid of the CatalogSnapshot, as the CatalogSnapshot will be invalidated if there are any updates on the catalog tables in the current transaction.

If I misunderstood, please correct me!

Best regards, xiaoran
Re: [PATCH] Use RelationClose rather than table_close in heap_create_with_catalog
Thanks for all your responses.

It seems better to change the comment on the code rather than call RelationClose here:

    table_close(new_rel_desc, NoLock);	/* do not unlock till end of xact */

Do I need to create another patch to fix the comment?

Best regards, xiaoran

From: tender wang
Sent: Thursday, May 11, 2023 3:26 PM
To: Tom Lane
Cc: Bharath Rupireddy; Xiaoran Wang; pgsql-hackers@lists.postgresql.org
Subject: Re: [PATCH] Use RelationClose rather than table_close in heap_create_with_catalog

Tom Lane wrote on 2023-05-11 00:32:
> Bharath Rupireddy writes:
> > And, the /* do not unlock till end of xact */, it looks like it's been
> > there from day 1.  It may be indicating that the ref count of the new
> > relation created in heap_create_with_catalog() will be decremented at
> > the end of xact, but I'm not sure what it means.
>
> Hmm, I think it's been copied-and-pasted from somewhere.  It's quite
> common for us to not release locks on modified tables until end of
> transaction.  However, that's not what's happening here, because we
> actually *don't have any such lock* at this point, as you can easily
> prove by stepping through this code and watching the contents of
> pg_locks from another session.  We do acquire AccessExclusiveLock on
> the new table eventually, but not till control returns to
> DefineRelation.
>
> I'm not real sure that I like the proposed code change: it's unclear
> to me whether it's an unwise piercing of a couple of abstraction
> layers or an okay change because those abstraction layers haven't yet
> been applied to the new relation at all.  However, I think the
> existing comment is actively misleading and needs to be changed.
> Maybe something like
>
> /*
>  * Close the relcache entry, since we return only an OID not a
>  * relcache reference.  Note that we do not yet hold any lockmanager
>  * lock on the new rel, so there's nothing to release.
>  */
> table_close(new_rel_desc, NoLock);
>
> /*
>  * ok, the relation has been cataloged, so close catalogs and return
>  * the OID of the newly created relation.
>  */
> table_close(pg_class_desc, RowExclusiveLock);

+1. Personally, I prefer the above code.

> Given these comments, maybe changing the first call to RelationClose
> would be sensible, but I'm still not quite convinced.
>
> 			regards, tom lane
Re: [PATCH] Use RelationClose rather than table_close in heap_create_with_catalog
The routine table_close takes two params, a Relation and a LOCKMODE: it first calls RelationClose to decrease the relation cache reference count, then deals with the lock on the table based on the LOCKMODE param.

In heap_create_with_catalog, the Relation new_rel_desc is only a local relation cache entry, created by RelationBuildLocalRelation. No other process can see this relation, as the transaction is not committed, so there is no lock on it. There is no problem with releasing the relation cache entry via table_close(new_rel_desc, NoLock) here. However, from my point of view,

    table_close(new_rel_desc, NoLock);	/* do not unlock till end of xact */

this line is a little confusing, since there is no lock on the relation at all. So I think it's better to use RelationClose here.

From: tender wang
Sent: Wednesday, May 10, 2023 10:57 AM
To: Xiaoran Wang
Cc: PostgreSQL-development
Subject: Re: [PATCH] Use RelationClose rather than table_close in heap_create_with_catalog

Xiaoran Wang wrote on 2023-03-18 15:04:
> Hi hackers,
>
> In heap_create_with_catalog, the Relation new_rel_desc is created by
> RelationBuildLocalRelation, not table_open.  So it's better to call
> RelationClose to release it.

Why is it better to call RelationClose? Is there a problem with using table_close()?

> What's more, the comment for it seems useless, just delete it.
>
> Thanks!

regards, tender wang
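The relationship described above can be sketched as a toy model (Python stand-ins, not the real tableam/relcache code): table_close(rel, NoLock) reduces to RelationClose plus a no-op lock release, so for a relcache entry on which no lock is held the two calls behave identically, and the question is only which abstraction layer to name.

```python
# Toy model: table_close(rel, lockmode) = RelationClose(rel) + lock release.
# With NoLock, the lock-release step is a no-op, so table_close degenerates
# to RelationClose.
NO_LOCK = 0  # stand-in for NoLock

class Relation:
    def __init__(self):
        self.refcnt = 1          # relcache reference count, held by creator
        self.locks_released = [] # record any lock-manager activity

def relation_close(rel):
    rel.refcnt -= 1              # drop the relcache reference count only

def table_close(rel, lockmode):
    relation_close(rel)
    if lockmode != NO_LOCK:
        rel.locks_released.append(lockmode)  # UnlockRelation(...) stand-in

rel = Relation()
table_close(rel, NO_LOCK)
assert rel.refcnt == 0 and rel.locks_released == []  # no lock ever touched
```

This is why the observable behaviour is the same either way; the debate in the thread is about readability and the misleading comment, not correctness.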
[PATCH] Use RelationClose rather than table_close in heap_create_with_catalog
Hi hackers,

In heap_create_with_catalog, the Relation new_rel_desc is created by RelationBuildLocalRelation, not table_open. So it's better to call RelationClose to release it.

What's more, the comment for it seems useless, so just delete it.

Thanks!

0001-Use-RelationClose-rather-than-table_close-in-heap_cr.patch
Description: 0001-Use-RelationClose-rather-than-table_close-in-heap_cr.patch
postgres_fdw: deadlock in the same transaction when the postgres_fdw server is loopback
Hi,

I created a postgres_fdw server "loopback" as the regression test does. Then I ran the following SQL:

    create table t1(c0 int);
    insert into t1 values(1);
    create foreign table ft1(c0 int) SERVER loopback OPTIONS (schema_name 'public', table_name 't1');

Then I started a transaction that runs queries on both the t1 and ft1 tables:

    begin;
    select * from ft1;
     c0
    ----
      1
    (1 row)

    select * from t1;
     c0
    ----
      1
    (1 row)

    insert into ft1 values(2);
    select * from ft1;
     c0
    ----
      1
      2
    (2 rows)

    select * from t1;
     c0
    ----
      1
    (1 row)

Though t1 and ft1 actually share the same data and are used in the same transaction, they have different transaction ids and different snapshots, and their queries run in different processes, so they see different data.

Then I attempted to run updates on both of them:

    update t1 set c0=8;
    update ft1 set c0=9;

and the transaction got stuck. Should a "loopback" server be disabled in postgres_fdw? Thoughts?

thanks, Xiaoran
[patch] Adding an assertion to report too long hash table name
Hi,

The max size for a shmem hash table name is SHMEM_INDEX_KEYSIZE - 1, but when a caller uses a longer hash table name, no error is reported; instead the first SHMEM_INDEX_KEYSIZE - 1 chars are silently used as the hash table name.

I created some shmem hash tables whose names shared a prefix longer than SHMEM_INDEX_KEYSIZE - 1 chars and whose sizes were the same; only one hash table was actually created, although it looked as if all of them had been created successfully.

I know this is a corner case, but it's difficult to figure out when you run into it. So I added an assertion to prevent it.

Thanks, Xiaoran

Adding-an-assertion-to-report-too-long-hash-table-name.patch
Description: Adding-an-assertion-to-report-too-long-hash-table-name.patch
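The failure mode can be sketched as a toy model (Python stand-ins for ShmemInitStruct and the ShmemIndex hash table, not PostgreSQL source; the KEYSIZE value here is assumed from shmem.h):

```python
# Toy model: ShmemIndex stores hash table names in a fixed-size key, so
# two names sharing a long prefix silently collapse into one entry; a
# length assertion at creation time catches this.
SHMEM_INDEX_KEYSIZE = 48  # assumed; the real value lives in shmem.h

shmem_index = {}

def shmem_init_struct(name, check_length=False):
    """Return (entry, found): found=True means an existing entry with the
    same (truncated) key was returned instead of creating a new one."""
    if check_length:
        # The proposed assertion: reject names that would be truncated.
        assert len(name) < SHMEM_INDEX_KEYSIZE, "hash table name too long"
    key = name[:SHMEM_INDEX_KEYSIZE - 1]   # silent truncation, as today
    found = key in shmem_index
    shmem_index.setdefault(key, object())
    return shmem_index[key], found

prefix = "x" * (SHMEM_INDEX_KEYSIZE - 1)
t1, found1 = shmem_init_struct(prefix + "_table_A")
t2, found2 = shmem_init_struct(prefix + "_table_B")
assert not found1 and found2   # second "creation" found the first table!
assert t1 is t2                # both callers share one hash table
```

With the assertion enabled, the second caller fails loudly at creation time instead of silently sharing the first caller's table.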
Add an assert to the length of shared hashtable name
The max size for a shared memory hash table name is SHMEM_INDEX_KEYSIZE - 1 (shared hash table names are stored in and indexed by the ShmemIndex hash table, whose key size is SHMEM_INDEX_KEYSIZE). But when a caller uses a longer hash table name, no error is reported; instead the first SHMEM_INDEX_KEYSIZE - 1 chars are silently used as the hash table name.

When several hash tables' names share a prefix longer than SHMEM_INDEX_KEYSIZE - 1 chars, a problem arises: those hash tables are actually created as one and the same hash table, whose name is the truncated prefix. So add an assert to prevent it.

fix_shmem_hash_table_name_length.patch
Description: fix_shmem_hash_table_name_length.patch