Re: "PANIC: could not open critical system index 2662" - twice
On Tue, May 9, 2023 at 3:15 AM Michael Paquier wrote:
>
> On Mon, May 08, 2023 at 07:15:20PM +0530, Dilip Kumar wrote:
> > I am able to reproduce this using the steps given above, I am also
> > trying to analyze this further. I will send the update once I get
> > some clue.
>
> Have you been able to reproduce this on HEAD or at the top of
> REL_15_STABLE, or is that 15.2-ish without fa5dd46?

I am able to reproduce this on REL_15_STABLE as well with your test case. The only difference is that without fa5dd46 the issue reproduces within a couple of seconds and very consistently, whereas on REL_15_STABLE it takes 2-3 minutes to reproduce and the reproduction is not very consistent.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: "PANIC: could not open critical system index 2662" - twice
On Mon, May 8, 2023 at 7:55 AM Michael Paquier wrote: > > On Sun, May 07, 2023 at 10:30:52PM +1200, Thomas Munro wrote: > > Bug-in-PostgreSQL explanations could include that we forgot it was > > dirty, or some backend wrote it out to the wrong file; but if we were > > forgetting something like permanent or dirty, would there be a more > > systematic failure? Oh, it could require special rare timing if it is > > similar to 8a8661828's confusion about permanence level or otherwise > > somehow not setting BM_PERMANENT, but in the target blocks, so I think > > that'd require a checkpoint AND a crash. It doesn't reproduce for me, > > but perhaps more unlucky ingredients are needed. > > > > Bug-in-OS/FS explanations could include that a whole lot of writes > > were mysteriously lost in some time window, so all those files still > > contain the zeroes we write first in smgrextend(). I guess this > > previously rare (previously limited to hash indexes?) use of sparse > > file hole-punching could be a factor in an it's-all-ZFS's-fault > > explanation: > > Yes, you would need a bit of all that. > > I can reproduce the same backtrace here. That's just my usual laptop > with ext4, so this would be a Postgres bug. 
> First, here are the four
> things running in parallel so as I can get a failure in loading a
> critical index when connecting:
> 1) Create and drop a database with WAL_LOG as strategy and the
> regression database as template:
> while true; do
>   createdb --template=regression --strategy=wal_log testdb;
>   dropdb testdb;
> done
> 2) Feeding more data to pg_class in the middle, while testing the
> connection to the database created:
> while true;
> do psql -c 'create table popo as select 1 as a;' regression > /dev/null 2>&1 ;
>   psql testdb -c "select 1" > /dev/null 2>&1 ;
>   psql -c 'drop table popo' regression > /dev/null 2>&1 ;
>   psql testdb -c "select 1" > /dev/null 2>&1 ;
> done;
> 3) Force some checkpoints:
> while true; do psql -c 'checkpoint' > /dev/null 2>&1; sleep 4; done
> 4) Force a few crashes and recoveries:
> while true ; do pg_ctl stop -m immediate ; pg_ctl start ; sleep 4 ; done

I am able to reproduce this using the steps given above, and I am also trying to analyze this further. I will send an update once I get some clue.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: log shipping with pg_receivewal
On Tue, Dec 14, 2021 at 9:55 AM Marc Millas wrote:
>
> Hi,
> postgres 13 question,
>
> if I setup pg_receivewal from a primary instance to a given directory, and
> then setup a standby with a restore_command pointing to that directory, the
> changes on the primary are applied on the standby at each wal change. And
> this, even if I run pg_receivewal with the --synchronous option.
>
> My question: as the synchronous option is supposed to make pg_receivewal
> write transactions immediately in the wal files, is there a way to ask the
> standby to apply them on the fly ie. without waiting for a wal file change ?

I don't think that can be done. The walreceiver process runs inside the same cluster, so the startup process can interact with it, knows exactly up to what LSN it has flushed and which WAL data is valid to apply, and the walreceiver keeps appending more WAL to the same partial file. But pg_receivewal is a separate utility, so its target directory can only serve as an archive location, which means the standby can restore only completed WAL files from it.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Max connections reached without max connections reached
On Sun, Dec 5, 2021 at 10:55 AM Dilip Kumar wrote:
>
> On Fri, Dec 3, 2021 at 9:02 PM Tom Lane wrote:
> >
> > Dilip Kumar writes:
> > > On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar wrote:
> > >> I think there is no such view or anything which tells about which
> > >> backend or transaction has more than 64 sub transaction. But if we
> > >> are ready to modify the code then we can LOG that information in
> > >> GetNewTransactionId(), when first time we are marking it overflown.
> >
> > > I have prepared a small patch to log this information.
> >
> > Putting an elog call into GetNewTransactionId seems like a completely
> > horrid idea from a performance standpoint. Especially if you put it
> > inside the XidGenLock hold, where it can block the entire system not just
> > the one process. But even without that, this seems like a performance
> > penalty with basically no real-world benefit. People who have issues
> > like this are not going to want to trawl the postmaster log for such
> > messages.
>
> Agreed with both points. What about we add, subxid count and overflow
> status in LocalPgBackendStatus and through that, we can show in
> pg_stat_activity. That way we don't have to report it ever and
> whenever the user is running pg_stat_activity they can fetch it
> directly from "proc->subxidStatus", along with fetching the proc.xid
> and proc.xmin. Does this make sense?

The attached patch implements the idea I explained.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From 622e1012667c3cfa0c71f27590e2a49833970e22 Mon Sep 17 00:00:00 2001
From: Dilip Kumar
Date: Sun, 5 Dec 2021 17:56:16 +0530
Subject: [PATCH v1] Add subtransaction count and overflow status in
 pg_stat_activity

If there are backends having a lot of nested subtransactions, or whose
subtransaction cache has overflowed, there is no way to detect that. So
this patch makes that easy by adding those fields to the
pg_stat_activity view.
---
 doc/src/sgml/monitoring.sgml                | 18 ++
 src/backend/catalog/system_views.sql        |  4 +++-
 src/backend/storage/ipc/sinvaladt.c         | 13 +
 src/backend/utils/activity/backend_status.c |  4 +++-
 src/backend/utils/adt/pgstatfuncs.c         | 13 -
 src/include/catalog/pg_proc.dat             |  6 +++---
 src/include/storage/sinvaladt.h             |  4 +++-
 src/include/utils/backend_status.h          | 10 ++
 src/test/regress/expected/rules.out         | 12 +++-
 9 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a33..3eca83a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -918,6 +918,24 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser
        The current backend's xmin horizon.
+
+      subxact_count xid
+
+       The current backend's active subtransactions count.
+
+
+      subxact_overflowed xid
+
+       Set to true if current backend's subtransaction cache is overflowed.
+
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 61b515c..3df23df 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -839,7 +839,9 @@ CREATE VIEW pg_stat_activity AS
             S.backend_xmin,
             S.query_id,
             S.query,
-            S.backend_type
+            S.backend_type,
+            S.subxact_count,
+            S.subxact_overflowed
     FROM pg_stat_get_activity(NULL) AS S
         LEFT JOIN pg_database AS D ON (S.datid = D.oid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index 946bd8e..876d7fe 100644
--- a/src/backend/storage/ipc/sinvaladt.c
+++ b/src/backend/storage/ipc/sinvaladt.c
@@ -395,17 +395,20 @@ BackendIdGetProc(int backendID)
 /*
  * BackendIdGetTransactionIds
- *		Get the xid and xmin of the backend. The result may be out of date
- *		arbitrarily quickly, so the caller must be careful about how this
- *		information is used.
+ *		Get the xid and xmin, nsubxid and overflow status of the backend. The
+ *		result may be out of date arbitrarily quickly, so the caller must be
+ *		careful about how this information is used.
  */
 void
-BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmin)
+BackendIdGetTransactionIds(int backendID, TransactionId *xid,
+						   TransactionId *xmin, int *nsubxid, bool *overflowed)
 {
 	SISeg	   *segP = shmInvalBuffer;

 	*xid = InvalidTransactionId;
 	*xmin = InvalidTransactionId;
Re: Max connections reached without max connections reached
On Fri, Dec 3, 2021 at 9:02 PM Tom Lane wrote:
>
> Dilip Kumar writes:
> > On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar wrote:
> >> I think there is no such view or anything which tells about which
> >> backend or transaction has more than 64 sub transaction. But if we
> >> are ready to modify the code then we can LOG that information in
> >> GetNewTransactionId(), when first time we are marking it overflown.
>
> > I have prepared a small patch to log this information.
>
> Putting an elog call into GetNewTransactionId seems like a completely
> horrid idea from a performance standpoint. Especially if you put it
> inside the XidGenLock hold, where it can block the entire system not just
> the one process. But even without that, this seems like a performance
> penalty with basically no real-world benefit. People who have issues
> like this are not going to want to trawl the postmaster log for such
> messages.

Agreed with both points. What if we add the subxid count and overflow status to LocalPgBackendStatus and, through that, show them in pg_stat_activity? That way we never have to report it proactively, and whenever the user runs pg_stat_activity they can fetch it directly from "proc->subxidStatus", along with fetching proc.xid and proc.xmin. Does this make sense?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Max connections reached without max connections reached
On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar wrote:
> I think there is no such view or anything which tells about which
> backend or transaction has more than 64 sub transaction. But if we
> are ready to modify the code then we can LOG that information in
> GetNewTransactionId(), when first time we are marking it overflown.
>
> if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
> {
>     MyProc->subxids.xids[nxids] = xid;
>     pg_write_barrier();
>     MyProc->subxidStatus.count = substat->count = nxids + 1;
> }
> else
> {
>     MyProc->subxidStatus.overflowed = substat->overflowed = true;
>     <-- we can log or put breakpoint here and identify which statement is
>     creating overflow -->
> }
>
> IMHO, it is good to LOG such information if we are not already logging
> this anywhere.

I have prepared a small patch to log this information.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From 83aa6b7afbe50c1b2f4404f68c3f1a08bd16f7e8 Mon Sep 17 00:00:00 2001
From: Dilip Kumar
Date: Fri, 3 Dec 2021 14:28:41 +0530
Subject: [PATCH] LOG subxid cache overflow message

---
 src/backend/access/transam/varsup.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index a6e98e7..4877aaa 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -245,8 +245,13 @@ GetNewTransactionId(bool isSubXact)
 			pg_write_barrier();
 			MyProc->subxidStatus.count = substat->count = nxids + 1;
 		}
-		else
+		else if (!substat->overflowed)
+		{
+			ereport(LOG,
+					(errmsg("subxid cache for top transaction %u, overflowed",
+							MyProc->xid)));
 			MyProc->subxidStatus.overflowed = substat->overflowed = true;
+		}
 	}

 	LWLockRelease(XidGenLock);
--
1.8.3.1
Re: Max connections reached without max connections reached
On Thu, Dec 2, 2021 at 3:59 AM James Sewell wrote:
>
>> Looking again into the back trace[1], it appeared that the backend is
>> getting stuck while getting the subtransaction's parent information,
>> and that can only happen if the snapshot is getting marked as
>> overflowed. So it seems that some of the scripts are using a lot of
>> sub-transaction (> 64) within a transaction and in such case, we can
>> not cache the sub-transaction information and the snapshot will be
>> marked overflowed. After that, for checking the visibility of any
>> tuple we need to consult the SLRU to get TopMost transaction
>> information and if the subtransaction SLRU cache is full then it will
>> have to do the I/O. So in this case it appeared that a lot of
>> parallel scans/new connections are trying to read the pg_subtrans and
>> due to limited cache size that is not even able to load the data in
>> the cache and getting frequently blocked. So I assume that when you
>> say the startup is getting stuck forever, how long did you wait, maybe
>> it is moving but getting stuck again for the next tuple, so to confirm
>> that attach the process with gdb put breakpoint in XidInMVCCSnapshot()
>> and check whether it is hitting that breakpoint again or not. Also,
>> check whether you are seeing "SubtransControlLock" in
>> pg_stat_activity. There is one nice blog[2] explaining this issue so
>> you can refer to that as well.
>
> Ah. Snap. This makes a lot of sense. The developers have a nasty habit of
> using LOOP over rows which has never been stamped out. I bet they have
> applied this to something with a lot of rows.
>
> Is there a way of seeing which statements / backends / processes have greater
> than 64 stxns? I'd like to monitor this.

I think there is no such view or anything which tells about which backend or transaction has more than 64 subtransactions.
But if we are ready to modify the code then we can LOG that information in GetNewTransactionId(), when we mark it overflowed for the first time:

if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
{
    MyProc->subxids.xids[nxids] = xid;
    pg_write_barrier();
    MyProc->subxidStatus.count = substat->count = nxids + 1;
}
else
{
    MyProc->subxidStatus.overflowed = substat->overflowed = true;
    <-- we can log or put a breakpoint here and identify which statement is creating overflow -->
}

IMHO, it is good to LOG such information if we are not already logging this anywhere.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
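To make the overflow condition above concrete, here is a tiny Python model of the per-backend subxid cache (a sketch of the logic quoted above, not the actual C code; PGPROC_MAX_CACHED_SUBXIDS is 64):

```python
# Toy model of the per-backend subtransaction cache described above.
# Once more than PGPROC_MAX_CACHED_SUBXIDS subxids are assigned, the
# cache is marked overflowed and visibility checks must fall back to
# pg_subtrans (SLRU) lookups.
PGPROC_MAX_CACHED_SUBXIDS = 64

class SubxidCache:
    def __init__(self):
        self.xids = []
        self.overflowed = False

    def assign_subxid(self, xid):
        if len(self.xids) < PGPROC_MAX_CACHED_SUBXIDS:
            self.xids.append(xid)
        else:
            # Mirrors the else branch in GetNewTransactionId() quoted
            # above: the cache stays truncated and the flag sticks for
            # the rest of the top-level transaction.
            self.overflowed = True

cache = SubxidCache()
for xid in range(1000, 1070):   # 70 subtransactions, i.e. more than 64
    cache.assign_subxid(xid)

print(len(cache.xids), cache.overflowed)  # 64 True
```

Once the flag is set there is no way back within the transaction, which is why logging (or exposing it in pg_stat_activity) at the moment it first flips is attractive.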
Re: Max connections reached without max connections reached
On Tue, Nov 30, 2021 at 5:16 AM James Sewell wrote:
>>
>> How did you verify that, maybe some process started IO and stuck
>> there? Can we check pg_stat_activity that is there some process that
>> shows in the wait event as SLRURead/SLRUWrite and not coming out of
>> that state?

Looking again into the back trace [1], it appears that the backend is getting stuck while getting the subtransaction's parent information, and that can only happen if the snapshot has been marked as overflowed. So it seems that some of the scripts are using a lot of subtransactions (> 64) within a transaction; in that case we cannot cache the subtransaction information, and the snapshot will be marked overflowed. After that, for checking the visibility of any tuple, we need to consult the SLRU to get the topmost transaction information, and if the subtransaction SLRU cache is full then it will have to do I/O. So in this case it appears that a lot of parallel scans/new connections are trying to read pg_subtrans and, due to the limited cache size, they are not even able to load the data into the cache and are getting blocked frequently. When you say the startup is getting stuck forever, how long did you wait? Maybe it is moving but getting stuck again on the next tuple. To confirm that, attach the process with gdb, put a breakpoint in XidInMVCCSnapshot(), and check whether it hits that breakpoint repeatedly. Also, check whether you are seeing "SubtransControlLock" in pg_stat_activity. There is one nice blog [2] explaining this issue, so you can refer to that as well.
[1]
#2  0x7f18fb9dec6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x0071c482 in PGSemaphoreLock ()
#4  0x0079078c in LWLockAcquire ()
#5  0x004fa9ae in SimpleLruWaitIO.isra.1 ()
#6  0x004fb2a4 in SimpleLruReadPage ()
#7  0x004fbc07 in SubTransGetParent ()
#8  0x004fbc57 in SubTransGetTopmostTransaction ()
#9  0x008f65a7 in XidInMVCCSnapshot ()
#10 0x008f662e in HeapTupleSatisfiesMVCC ()
#11 0x004c436e in heapgetpage ()

[2] https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Max connections reached without max connections reached
On Thu, Nov 25, 2021 at 10:53 AM James Sewell wrote:
>
>> > Seems like some of the processes are taking a long time or stuck while
>> > reading/writing SLRU pages, and due to that while creating a new
>> > connection the backend process is not able to check the transaction
>> > status (from pg_xlog) of the pg_class tuple and gets stuck/taking a
>> > long time in a startup.
>>
>> My next question would be whether this particular process shown in the
>> stack is stuck forever or finally, it is able to make a connection? I
>> want to understand is this just due to slow I/O or something else?
>
> They don't ever seem to clear, (more and more show up) and IO doesn't seem
> to be deadlocked at that time.

How did you verify that? Maybe some process started I/O and got stuck there. Can you check pg_stat_activity to see whether some process shows SLRURead/SLRUWrite as its wait event and is not coming out of that state?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Max connections reached without max connections reached
On Thu, Nov 25, 2021 at 9:50 AM Dilip Kumar wrote:
> > Does that shed any light?
>
> Seems like some of the processes are taking a long time or stuck while
> reading/writing SLRU pages, and due to that while creating a new
> connection the backend process is not able to check the transaction
> status (from pg_xlog) of the pg_class tuple and gets stuck/taking a
> long time in a startup.

Correction: I meant pg_xact (pg_clog in older versions), not pg_xlog :)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Max connections reached without max connections reached
On Thu, Nov 25, 2021 at 9:50 AM Dilip Kumar wrote: > > On Thu, Nov 25, 2021 at 8:58 AM James Sewell > wrote: > >> > >> The hypothesis I'm thinking of is that incoming sessions are being blocked > >> somewhere before they can acquire a ProcArray entry; if so, they'd not > >> show up in either pg_stat_activity or pg_locks. What we have to look for > >> then is evidence of somebody holding a strong lock on a shared relation. > >> Try "select * from pg_locks where locktype = 'relation' and database = 0". > >> > > > > That doesn't show anything when the issue is happening. > > > > Strace shows the startup processes looping like so: > > > > 13:51:15.510850 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=733559666}) = 0 > > 13:51:15.511254 futex(0x7ee891405238, FUTEX_WAKE, 1) = 1 > > 13:51:15.511422 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=734037545}) = 0 > > 13:51:15.511690 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0 > > 13:51:15.520470 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=743136450}) = 0 > > 13:51:15.520772 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=743376746}) = 0 > > 13:51:15.521005 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0 > > 13:51:15.539309 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=761920081}) = 0 > > 13:51:15.539508 futex(0x7ee891406bb8, FUTEX_WAKE, 1) = 1 > > 13:51:15.539690 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=762298276}) = 0 > > 13:51:15.539882 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0 > > 13:51:15.547311 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=769929341}) = 0 > > 13:51:15.547757 futex(0x7ee89140a738, FUTEX_WAKE, 1) = 1 > > 13:51:15.547931 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=770524120}) = 0 > > 13:51:15.548089 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0 > > 13:51:15.551482 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN > > (Resource temporarily unavailable) > > 13:51:15.551950 
futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0 > > 13:51:15.593631 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN > > (Resource temporarily unavailable) > > 13:51:15.593807 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0 > > 13:51:15.626167 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0 > > 13:51:15.627590 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=850204434}) = 0 > > 13:51:15.627817 futex(0x7ee8914033b8, FUTEX_WAKE, 1) = 1 > > 13:51:15.627987 open("pg_subtrans/A100", O_RDWR) = 16 > > 13:51:15.628270 lseek(16, 131072, SEEK_SET) = 131072 > > 13:51:15.628461 read(16, > > "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > 8192) = 8192 > > 13:51:15.628677 close(16) = 0 > > 13:51:15.628911 open("pg_subtrans/A0E1", O_RDWR) = 16 > > 13:51:15.629132 lseek(16, 122880, SEEK_SET) = 122880 > > 13:51:15.629309 read(16, > > "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > 8192) = 8192 > > 13:51:15.629503 close(16) = 0 > > 13:51:15.629685 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=852272645}) = 0 > > 13:51:15.629863 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0 > > 13:51:15.656694 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=879314944}) = 0 > > 13:51:15.656923 futex(0x7ee8914125b8, FUTEX_WAKE, 1) = 1 > > 13:51:15.657129 open("pg_subtrans/A0EE", O_RDWR) = 16 > > 13:51:15.657385 lseek(16, 73728, SEEK_SET) = 73728 > > 13:51:15.657638 read(16, > > "\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"..., > > 8192) = 8192 > > 13:51:15.657907 close(16) = 0 > > 13:51:15.658141 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=880753952}) = 0 > > 13:51:15.658346 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0 > > 13:51:15.689705 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, > > tv_nsec=912331058}) = 0 > > 13:51:15.689950 futex(0x7ee89140a0b8, FUTEX_WAKE, 1) = 1 > > 13:51:15.690182 futex(0x7ee8914118b8, FUTEX_WAKE, 1) = 1 > > 13:51:15.690420 
open("pg_subtrans/A0EE", O_RDWR) = 16 > > 13:51:15.690742 lseek(16, 73728, SEEK_SET) = 73728 > > 13:51:15.691037 read(16, > > "\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"..., > > 8192) = 8192 > > 13:51:15.691303 close(16) = 0 > > 13:51:15.691582 clock_
gettime(CLOCK_MONOTON
> IC, {tv_sec=19439358, tv_nsec=939899218}) = 0
> 13:51:15.717578 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.722270 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.746391 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, tv_nsec=969020657}) = 0
> 13:51:15.746664 futex(0x7ee89140fa38, FUTEX_WAKE, 1) = 1
> 13:51:15.746877 open("pg_subtrans/A0EE", O_RDWR) = 16
>
> While gdb shows similar to:
>
> (gdb) bt
> #0  0x7f18fb9deb3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
> #1  0x7f18fb9debcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
> #2  0x7f18fb9dec6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> #3  0x0071c482 in PGSemaphoreLock ()
> #4  0x0079078c in LWLockAcquire ()
> #5  0x004fa9ae in SimpleLruWaitIO.isra.1 ()
> #6  0x004fb2a4 in SimpleLruReadPage ()
> #7  0x004fbc07 in SubTransGetParent ()
> #8  0x004fbc57 in SubTransGetTopmostTransaction ()
> #9  0x008f65a7 in XidInMVCCSnapshot ()
> #10 0x008f662e in HeapTupleSatisfiesMVCC ()
> #11 0x004c436e in heapgetpage ()
> #12 0x004c58e4 in heap_getnext ()
> #13 0x008ab8b5 in ScanPgRelation ()
> #14 0x008ae124 in RelationBuildDesc ()
> #15 0x008b010e in load_critical_index ()
> #16 0x008b39ac in RelationCacheInitializePhase3 ()
> #17 0x008cbd56 in InitPostgres ()
> #18 0x0000007a1cfe in PostgresMain ()
> #19 0x0048a9c0 in ServerLoop ()
> #20 0x0072f11e in PostmasterMain ()
> #21 0x0048b854 in main ()
>
> Does that shed any light?

Seems like some of the processes are taking a long time or are stuck while reading/writing SLRU pages, and due to that, while creating a new connection, the backend process is not able to check the transaction status (from pg_xlog) of the pg_class tuple and gets stuck or takes a long time in startup.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: WAL File Recovery on Standby Server Stops Before End of WAL Files
On Thu, Oct 28, 2021 at 7:28 AM Kyotaro Horiguchi wrote:
>
> At Wed, 27 Oct 2021 16:42:52 +, "Ryan, Les" wrote in
> > 2021-10-27 10:26:31.467 MDT [2012] LOG: redo starts at 419/5229A858
> ...
> > 2021-10-27 10:26:36.188 MDT [2012] LOG: restored log file
> > "00010419005A" from archive
> > 2021-10-27 10:26:36.750 MDT [2012] LOG: consistent recovery state reached
> > at 419/5AB8
> > 2021-10-27 10:26:36.752 MDT [6204] LOG: database system is ready to accept
> > read only connections
> > 2021-10-27 10:26:36.823 MDT [6040] LOG: started streaming WAL from primary
> > at 419/5A00 on timeline 1
> >
> > * There are many more WAL files available starting with
> > 00010419005B but the restore process always stops at
> > 00010419005A.
> >
> > I have two questions:
> >
> > * Why does the WAL file recovery process now stop after it reads
> > 00010419005A?
> > * What do I need to do to get PostgreSQL to recover the rest of the
> > available WAL files.
>
> The info above alone doesn't clearly suggest anything about the
> reason. Could you show the result of "pg_waldump
> 00010419005A 2>&1 | tail -5"? What I'm expecting to see
> is an error message from pg_waldump before the end of the file. It
> would be the immediate cause of the problem.

+1, that will be the best place to start. Additionally, you can enable DEBUG2 messages so that from the logs we can identify why it could not continue recovery from the archive.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re:
On Mon, May 17, 2021 at 7:37 PM David G. Johnston wrote:
>
> On Monday, May 17, 2021, Tom Lane wrote:
>>
>> "David G. Johnston" writes:
>> > On Sunday, May 16, 2021, Loles wrote:
>> >> I don't understand why the server starts that process if I'm not using
>> >> replication of any kind.
>>
>> > It starts the process because at any time someone can execute a “create
>> > subscription” command. If no one does so the amount of time spent
>> > confirming the subscriptions table is empty is trivial.
>>
>> It looks like it won't be started if you set
>> max_logical_replication_workers = 0.
>>
>
> I was wondering about that. Would you mind posting a link to the code where
> that is checked? I actually looked though admittedly not that hard.

/*
 * ApplyLauncherRegister
 *		Register a background worker running the logical replication launcher.
 */
void
ApplyLauncherRegister(void)
{
	BackgroundWorker bgw;

	if (max_logical_replication_workers == 0)
		return;

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re:
On Mon, May 17, 2021 at 12:18 AM Loles wrote:
>
> Hi!
>
> Let's see if someone can answer my question.
>
> In a default PostgreSQL 13 installation, starting the server starts a process
> called "logical replication launcher".
>
> I know what logical replication is but I don't know what this process does on
> the server.
>
> Even if you change the default setting of wal_level to minimal, the process
> continues to start after the server is restarted so that it takes the change
> in the wal_level parameter.

This process launches an apply worker for each subscription, to apply the logical replication changes. The wal_level does not matter here: wal_level must be "logical" to create a publication, but that applies on the publisher side, whereas this launcher serves the subscriber side. So, in short, this process scans the subscriptions and launches an apply worker for any subscription that does not already have one.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
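The behaviour described above can be sketched as a tiny reconciliation loop (a toy Python model of what the launcher does, not the actual PostgreSQL code; the subscription names are made up for illustration):

```python
# Toy model of the logical replication launcher's main loop: for every
# subscription that has no apply worker yet, start one.
def launcher_pass(subscriptions, workers):
    """Return the set of subscription names a new worker was started for."""
    started = set()
    for sub in subscriptions:
        if sub not in workers:
            workers.add(sub)      # stands in for launching an apply worker
            started.add(sub)
    return started

workers = {"sub_a"}               # sub_a already has an apply worker
started = launcher_pass({"sub_a", "sub_b"}, workers)
print(sorted(started))  # ['sub_b']
```

With no subscriptions at all, every pass is a no-op, which is why the idle launcher costs essentially nothing.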
Re: Race condition with restore_command on streaming replica
On Thu, Nov 5, 2020 at 1:39 AM Brad Nicholson wrote:
>
> Hi,
>
> I've recently been seeing a race condition with the restore_command on
> replicas using streaming replication.
>
> On the primary, we are archiving wal files to s3 compatible storage via
> pgBackRest. In the recovery.conf section of the postgresql.conf file on the
> replicas, we define the restore command as follows:
>
> restore_command = '/usr/bin/pgbackrest --config
> /conf/postgresql/pgbackrest_restore.conf --stanza=formation archive-get %f
> "%p"'
>
> We have a three member setup - m-0, m-1, m-2. Consider the case where m-0 is
> the Primary and m-1 and m-2 are replicas connected to the m-0.
>
> When issuing a switchover (via Patroni) from m-0 to m-1, the connection from
> m-2 to m-0 is terminated. The restore_command on m-2 is run, and it looks for
> the .history file for the new timeline. If this happens before the history
> file is created and pushed to the archive, m-2 will look for the next wal
> file on the existing timeline in the archive. It will never be created as the
> source has moved on, so this m-2 hangs waiting on that file. The
> restore_command on the replica looking for this non-existent file is only run
> once. This seems like an odd state to be in. The replica is waiting on a new
> file, but it's not actually looking for it. Is this expected, or should the
> restore_command be polling for that file?

I am not sure how Patroni does it internally; can you explain the scenario in more detail? Suppose you execute the promotion on m-1: if the promotion is successful it will switch the timeline and create the timeline history file. Once the promotion is successful, if we change the primary_conninfo on m-2 then it will restart its walreceiver and look for the latest .history file, which it should find either through direct streaming or through the restore_command. If you are saying that m-2 tried to look for the history file before m-1 created it, then it seems like you changed the primary_conninfo on m-2 before the m-1 promotion completed.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: 12.2: Howto check memory-leak in worker?
y appreciated (Dtrace
> should work) - processes will usually fail with OOM at this size, due
> to machine configuration - I'm waiting for that now (it is a very very
> old pentium3 machine ;) ).

One idea is that you can attach to your process with gdb and call MemoryContextStats(TopMemoryContext). This will show which context is using how much memory. So basically you can call this function 2-3 times at some interval and see in which context the memory is continuously increasing.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Help needed with XMLTABLE
On Wed, Apr 15, 2020 at 12:56 PM Pavel Stehule wrote: > st 15. 4. 2020 v 7:32 odesílatel Dilip Kumar napsal: >> >> One of our customers tried to use XMLTABLE syntax without >> row_expression, which works fine with ORACLE but doesn't work with >> PostgreSQL. So can anyone suggest what alternative we can use? >> >> CREATE TABLE user_pool_clean ( >> fk_user_pool_pk bytea NOT NULL, >> user_id character varying(255) NOT NULL, >> email_address character varying(250), >> is_mil numeric, >> is_civ numeric, >> is_ctr numeric, >> is_gov numeric, >> is_edu numeric, >> role_id character varying(50), >> user_profile_id character varying(50), >> service_branch_id character varying(50), >> mil_pay_grade_id character varying(50), >> my_auds character varying(4000), >> my_orgs character varying(4000), >> processed character(1) DEFAULT 'N'::bpchar NOT NULL >> ); >> >> insert into user_pool_clean >> values('995CECDC1881375DE05312A270C7CF56','10015706','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','4','76','3','1706882','1707720','Y'); >> >> insert into user_pool_clean >> values('995CECDC1905375DE05312A270C7CF56','10015848','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','3','700015','11',' >> 1705562,1708486','1710621','Y'); >> >> >> SQL> SELECT upc.is_mil,TRIM(column_value) src >> FROM user_pool_clean upc >>,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"')); > > > It's very famous how people are creative - when xmltable has only one > argument, then it is evaluated like XQuery expression. > > https://stewashton.wordpress.com/2016/08/01/splitting-strings-surprise/ > > Unfortunately, Postgres has not support of XQuery, so nothing similar is > possible. Fortunately, Postgres has much better tools for parsing string. Thanks for the help, Pavel. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
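For reference, the usual PostgreSQL substitute for that Oracle trick is splitting the string directly, e.g. with `unnest(string_to_array(upc.my_auds, ','))` or `regexp_split_to_table`. The split-and-trim transformation being performed can be sketched in Python (the sample value is taken from the second insert statement in the original question):

```python
# What the Oracle one-argument XMLTABLE trick effectively does: split a
# comma-separated column into one value per row, trimming whitespace.
# Sample my_auds value from the second user_pool_clean row.
my_auds = ' 1705562,1708486'

rows = [v.strip() for v in my_auds.split(',')]
print(rows)  # ['1705562', '1708486']
```

Each element then becomes one output row, joined back to the other columns of the source row.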
Help needed with XMLTABLE
One of our customers tried to use XMLTABLE syntax without row_expression, which works fine with ORACLE but doesn't work with PostgreSQL. So can anyone suggest what alternative we can use?

CREATE TABLE user_pool_clean (
fk_user_pool_pk bytea NOT NULL,
user_id character varying(255) NOT NULL,
email_address character varying(250),
is_mil numeric,
is_civ numeric,
is_ctr numeric,
is_gov numeric,
is_edu numeric,
role_id character varying(50),
user_profile_id character varying(50),
service_branch_id character varying(50),
mil_pay_grade_id character varying(50),
my_auds character varying(4000),
my_orgs character varying(4000),
processed character(1) DEFAULT 'N'::bpchar NOT NULL
);

insert into user_pool_clean values('995CECDC1881375DE05312A270C7CF56','10015706','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','4','76','3','1706882','1707720','Y');

insert into user_pool_clean values('995CECDC1905375DE05312A270C7CF56','10015848','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','3','700015','11',' 1705562,1708486','1710621','Y');

Output from the Oracle database:

SQL> SELECT upc.is_mil,TRIM(column_value) src
FROM user_pool_clean upc
,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));

IS_MIL SRC
------ -------
0      1705562
0      1708486
0      1706882

On PostgreSQL:

postgres[7604]=# SELECT upc.is_mil,TRIM(column_value) src
postgres-# FROM user_pool_clean upc
postgres-# ,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));
ERROR: syntax error at or near ")"
LINE 3: ... ,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Worse performance with higher work_mem?
On Tue, Jan 14, 2020 at 5:29 AM Israel Brewster wrote: > > I was working on diagnosing a “slow” (about 6 second run time) query: > > SELECT > to_char(bucket,'-MM-DD"T"HH24:MI:SS') as dates, > x_tilt, > y_tilt, > rot_x, > rot_y, > date_part('epoch', bucket) as timestamps, > temp > FROM > (SELECT > time_bucket('1 week', read_time) as bucket, > avg(tilt_x::float) as x_tilt, > avg(tilt_y::float) as y_tilt, > avg(rot_x::float) as rot_x, > avg(rot_y::float) as rot_y, > avg(temperature::float) as temp > FROM tilt_data > WHERE station='c08883c0-fbe5-11e9-bd6e-aec49259cebb' > AND read_time::date<='2020-01-13'::date > GROUP BY bucket) s1 > ORDER BY bucket; > > In looking at the explain analyze output, I noticed that it had an “external > merge Disk” sort going on, accounting for about 1 second of the runtime > (explain analyze output here: https://explain.depesz.com/s/jx0q). Since the > machine has plenty of RAM available, I went ahead and increased the work_mem > parameter. Whereupon the query plan got much simpler, and performance of said > query completely tanked, increasing to about 15.5 seconds runtime > (https://explain.depesz.com/s/Kl0S), most of which was in a HashAggregate. > > I am running PostgreSQL 11.6 on a machine with 128GB of ram (so, like I said, > plenty of RAM) > > How can I fix this? Thanks. I have noticed that after increasing work_mem, your plan switched from a parallel plan to a non-parallel plan. Earlier it was being executed with 3 workers, and after it became non-parallel the execution time is about 3x. For the analysis, can we reduce the values of parallel_tuple_cost and parallel_setup_cost and see how it behaves? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
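The experiment Dilip proposes can be run in a single session before re-executing the query; the zero values below are illustrative extremes that make the planner strongly prefer parallel plans, not recommended production settings:

```sql
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... ;  -- the time_bucket query from the message above
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
```

If the parallel plan comes back and is again the faster one, that points at the planner's cost model (rather than work_mem itself) driving the regression.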
Re: ERROR: too many dynamic shared memory segments
On Tue, Nov 28, 2017 at 7:13 PM, Robert Haas wrote: > On Tue, Nov 28, 2017 at 2:32 AM, Dilip Kumar > wrote: > > I think BitmapHeapScan should check whether the DSA is valid or not; if the DSA is not > > valid then it should assume it's a non-parallel plan. > > > > The attached patch should fix the issue. > > So, create the pstate and then pretend we didn't? Why not just avoid > creating it in the first place, like this? > This is a better way to fix it. > > I haven't checked whether this fixes the bug, but if it does, we can > avoid introducing an extra branch in BitmapHeapNext. With my test, it fixes the problem. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: ERROR: too many dynamic shared memory segments
On Tue, Nov 28, 2017 at 4:18 AM, Thomas Munro wrote: > On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa > wrote: > > As for the crash. I dug up the initial log and it looks like a segmentation > > fault... > > > > 2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR: > too > > many dynamic shared memory segments > > I think there are two failure modes: one of your sessions showed the > "too many ..." error (that's good, ran out of slots and said so and > our error machinery worked as it should), and another crashed with a > segfault, because it tried to use a NULL "area" pointer (bad). I > think this is a degenerate case where we completely failed to launch > parallel query, but we ran the parallel query plan anyway and this > code thinks that the DSA is available. Oops. > I think BitmapHeapScan should check whether the DSA is valid or not; if the DSA is not valid then it should assume it's a non-parallel plan. The attached patch should fix the issue. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com bug_fix_in_pbhs_when_dsa_not_initialized.patch Description: Binary data