Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Dilip Kumar
On Tue, May 9, 2023 at 3:15 AM Michael Paquier  wrote:
>
> On Mon, May 08, 2023 at 07:15:20PM +0530, Dilip Kumar wrote:
> > I am able to reproduce this using the steps given above, I am also
> > trying to analyze this further.  I will send the update once I get
> > some clue.
>
> Have you been able to reproduce this on HEAD or at the top of
> REL_15_STABLE, or is that 15.2-ish without fa5dd46?
>

I am able to reproduce this on REL_15_STABLE as well with your test case.
The only difference is that without fa5dd46 the issue reproduces within a
couple of seconds and very consistently, whereas on REL_15_STABLE it takes
2-3 minutes to reproduce and the reproduction is not very consistent.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Dilip Kumar
On Mon, May 8, 2023 at 7:55 AM Michael Paquier  wrote:
>
> On Sun, May 07, 2023 at 10:30:52PM +1200, Thomas Munro wrote:
> > Bug-in-PostgreSQL explanations could include that we forgot it was
> > dirty, or some backend wrote it out to the wrong file; but if we were
> > forgetting something like permanent or dirty, would there be a more
> > systematic failure?  Oh, it could require special rare timing if it is
> > similar to 8a8661828's confusion about permanence level or otherwise
> > somehow not setting BM_PERMANENT, but in the target blocks, so I think
> > that'd require a checkpoint AND a crash.  It doesn't reproduce for me,
> > but perhaps more unlucky ingredients are needed.
> >
> > Bug-in-OS/FS explanations could include that a whole lot of writes
> > were mysteriously lost in some time window, so all those files still
> > contain the zeroes we write first in smgrextend().  I guess this
> > previously rare (previously limited to hash indexes?) use of sparse
> > file hole-punching could be a factor in an it's-all-ZFS's-fault
> > explanation:
>
> Yes, you would need a bit of all that.
>
> I can reproduce the same backtrace here.  That's just my usual laptop
> with ext4, so this would be a Postgres bug.  First, here are the four
> things running in parallel so as I can get a failure in loading a
> critical index when connecting:
> 1) Create and drop a database with WAL_LOG as strategy and the
> regression database as template:
> while true; do
>   createdb --template=regression --strategy=wal_log testdb;
>   dropdb testdb;
> done
> 2) Feeding more data to pg_class in the middle, while testing the
> connection to the database created:
> while true;
>   do psql -c 'create table popo as select 1 as a;' regression > /dev/null 
> 2>&1 ;
>   psql testdb -c "select 1" > /dev/null 2>&1 ;
>   psql -c 'drop table popo' regression > /dev/null 2>&1 ;
>   psql testdb -c "select 1" > /dev/null 2>&1 ;
> done;
> 3) Force some checkpoints:
> while true; do psql -c 'checkpoint' > /dev/null 2>&1; sleep 4; done
> 4) Force a few crashes and recoveries:
> while true ; do pg_ctl stop -m immediate ; pg_ctl start ; sleep 4 ; done
>

I am able to reproduce this using the steps given above, and I am also
trying to analyze it further.  I will send an update once I get some clue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: log shipping with pg_receivewal

2021-12-13 Thread Dilip Kumar
On Tue, Dec 14, 2021 at 9:55 AM Marc Millas  wrote:
>
> Hi,
> postgres 13 question,
>
> if I setup pg_receivewal from a primary instance to a given directory, and 
> then setup a standby with a restore_command pointing to that directory,, the 
> changes on the primary are applied on the standby at each wal change. And 
> this, even if I run pg_receivewal with the --synchronous option.
>
> My question: as the synchronous option is supposed to make pg_receivewal 
> write transaction immediately in the wal files, is there a way to ask the 
> standby to apply them on the fly ie. without waiting a wal file change ?

I don't think that can be done.  The WAL receiver process runs inside the
same cluster, so we can interact with it, know exactly up to what LSN it
has flushed and what WAL data is valid to be applied, and the WAL receiver
also keeps appending more WAL to the same file.  pg_receivewal, on the
other hand, is a separate utility, so it can only serve as an archive
location, which means we can restore complete WAL files only.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Max connections reached without max connections reached

2021-12-05 Thread Dilip Kumar
On Sun, Dec 5, 2021 at 10:55 AM Dilip Kumar  wrote:
>
> On Fri, Dec 3, 2021 at 9:02 PM Tom Lane  wrote:
> >
> > Dilip Kumar  writes:
> > > On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar  wrote:
> > >> I think there is no such view or anything which tells about which
> > >> backend or transaction has more than 64 sub transaction.  But if we
> > >> are ready to modify the code then we can LOG that information in
> > >> GetNewTransactionId(), when first time we are marking it overflown.
> >
> > > I have prepared a small patch to log this information.
> >
> > Putting an elog call into GetNewTransactionId seems like a completely
> > horrid idea from a performance standpoint.  Especially if you put it
> > inside the XidGenLock hold, where it can block the entire system not just
> > the one process.  But even without that, this seems like a performance
> > penalty with basically no real-world benefit.  People who have issues
> > like this are not going to want to trawl the postmaster log for such
> > messages.
>
> Agreed with both points.  What about we add, subxid count and overflow
> status in LocalPgBackendStatus and through that, we can show in
> pg_stat_activity.  That way we don't have to report it ever and
> whenever the user is running pg_stat_activity they can fetch it
> directly from "proc->subxidStatus", along with fetching the proc.xid
> and proc.xmin.  Does this make sense?
>

The attached patch implements the idea I explained.
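
With the patch applied, an overflowed backend could then be spotted with a
query along these lines (just a sketch; the column names come from the
attached patch):

SELECT pid, backend_xid, subxact_count, subxact_overflowed
FROM pg_stat_activity
WHERE subxact_overflowed;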

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 622e1012667c3cfa0c71f27590e2a49833970e22 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Sun, 5 Dec 2021 17:56:16 +0530
Subject: [PATCH v1] Add subtransaction count and overflow status in
 pg_stat_activity

If some backends have a lot of nested subtransactions, or the
subtransaction cache has overflowed, there is currently no way to detect
that.  So this patch makes that easy by adding those fields to the
pg_stat_activity view.
---
 doc/src/sgml/monitoring.sgml| 18 ++
 src/backend/catalog/system_views.sql|  4 +++-
 src/backend/storage/ipc/sinvaladt.c | 13 +
 src/backend/utils/activity/backend_status.c |  4 +++-
 src/backend/utils/adt/pgstatfuncs.c | 13 -
 src/include/catalog/pg_proc.dat |  6 +++---
 src/include/storage/sinvaladt.h |  4 +++-
 src/include/utils/backend_status.h  | 10 ++
 src/test/regress/expected/rules.out | 12 +++-
 9 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a33..3eca83a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -918,6 +918,24 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
The current backend's xmin horizon.
   
  
+
+ 
+  
+   subxact_count xid
+  
+  
+   The current backend's active subtransactions count.
+  
+ 
+
+ 
+  
+   subxact_overflowed xid
+  
+  
+   Set to true if current backend's subtransaction cache is overflowed.
+  
+ 
 
 
  
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 61b515c..3df23df 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -839,7 +839,9 @@ CREATE VIEW pg_stat_activity AS
 s.backend_xmin,
 S.query_id,
 S.query,
-S.backend_type
+S.backend_type,
+S.subxact_count,
+S.subxact_overflowed
 FROM pg_stat_get_activity(NULL) AS S
 LEFT JOIN pg_database AS D ON (S.datid = D.oid)
 LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index 946bd8e..876d7fe 100644
--- a/src/backend/storage/ipc/sinvaladt.c
+++ b/src/backend/storage/ipc/sinvaladt.c
@@ -395,17 +395,20 @@ BackendIdGetProc(int backendID)
 
 /*
  * BackendIdGetTransactionIds
- *		Get the xid and xmin of the backend. The result may be out of date
- *		arbitrarily quickly, so the caller must be careful about how this
- *		information is used.
+ *		Get the xid and xmin, nsubxid and overflow status of the backend. The
+ *		result may be out of date arbitrarily quickly, so the caller must be
+ *		careful about how this information is used.
  */
 void
-BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmin)
+BackendIdGetTransactionIds(int backendID, TransactionId *xid,
+		   TransactionId *xmin, int *nsubxid, bool *overflowed)
 {
 	SISeg	   *segP = shmInvalBuffer;
 
 	*xid = InvalidTransactionId;
 	*xmin = InvalidTransactionI

Re: Max connections reached without max connections reached

2021-12-04 Thread Dilip Kumar
On Fri, Dec 3, 2021 at 9:02 PM Tom Lane  wrote:
>
> Dilip Kumar  writes:
> > On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar  wrote:
> >> I think there is no such view or anything which tells about which
> >> backend or transaction has more than 64 sub transaction.  But if we
> >> are ready to modify the code then we can LOG that information in
> >> GetNewTransactionId(), when first time we are marking it overflown.
>
> > I have prepared a small patch to log this information.
>
> Putting an elog call into GetNewTransactionId seems like a completely
> horrid idea from a performance standpoint.  Especially if you put it
> inside the XidGenLock hold, where it can block the entire system not just
> the one process.  But even without that, this seems like a performance
> penalty with basically no real-world benefit.  People who have issues
> like this are not going to want to trawl the postmaster log for such
> messages.

Agreed on both points.  What about adding the subxid count and overflow
status to LocalPgBackendStatus and, through that, showing them in
pg_stat_activity?  That way we never have to report it; whenever the user
runs pg_stat_activity, it can be fetched directly from
"proc->subxidStatus", along with proc.xid and proc.xmin.  Does this make
sense?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Max connections reached without max connections reached

2021-12-03 Thread Dilip Kumar
On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar  wrote:

> I think there is no such view or anything which tells about which
> backend or transaction has more than 64 sub transaction.  But if we
> are ready to modify the code then we can LOG that information in
> GetNewTransactionId(), when first time we are marking it overflown.
> if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
> {
>MyProc->subxids.xids[nxids] = xid;
>   pg_write_barrier();
>   MyProc->subxidStatus.count = substat->count = nxids + 1;
> }
> else
> {
>   MyProc->subxidStatus.overflowed = substat->overflowed = true;
>  <-- we can log or put a breakpoint here and identify which statement is
> creating the overflow -->
> }
>
> IMHO, it is good to LOG such information if we are not already logging
> this anywhere.
>

I have prepared a small patch to log this information.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 83aa6b7afbe50c1b2f4404f68c3f1a08bd16f7e8 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Fri, 3 Dec 2021 14:28:41 +0530
Subject: [PATCH] LOG subxid cache overflow message

---
 src/backend/access/transam/varsup.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index a6e98e7..4877aaa 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -245,8 +245,13 @@ GetNewTransactionId(bool isSubXact)
 			pg_write_barrier();
 			MyProc->subxidStatus.count = substat->count = nxids + 1;
 		}
-		else
+		else if (!substat->overflowed)
+		{
+			ereport(LOG,
+	(errmsg("subxid cache for top transaction %u, overflowed",
+			MyProc->xid)));
 			MyProc->subxidStatus.overflowed = substat->overflowed = true;
+		}
 	}
 
 	LWLockRelease(XidGenLock);
-- 
1.8.3.1



Re: Max connections reached without max connections reached

2021-12-01 Thread Dilip Kumar
On Thu, Dec 2, 2021 at 3:59 AM James Sewell  wrote:
>
>
>> Looking again into the back trace[1], it appeared that the backend is
>> getting stuck while getting the subtransaction's parent information,
>> and that can only happen if the snapshot is getting marked as
>> overflowed.  So it seems that some of the scripts are using a lot of
>> sub-transaction (> 64) within a transaction and in such case, we can
>> not cache the sub-transaction information and the snapshot will be
>> marked overflowed.  After that, for checking the visibility of any
>> tuple we need to consult the SLRU to get TopMost transaction
>> information and if the subtransaction SLRU cache is full then it will
>> have to do the I/O.  So in this case it appeared that a lot of
>> parallel scans/new connections are trying to read the pg_subtrans and
>> due to limited cache size that is not even able to load the data in
>> the cache and getting frequently blocked.  So I assume that when you
>> say the startup is getting stuck forever, how long did you wait, maybe
>> it is moving but getting stuck again for the next tuple, so to confirm
>> that attach the process with gdb put breakpoint in XidInMVCCSnapshot()
>> and check whether it is hitting that breakpoint again or not.  Also,
>> check whether you are seeing "SubtransControlLock" in
>> pg_stat_activity.  There is one nice blog[2] explaining this issue so
>> you can refer to that as well.
>
>
> Ah. Snap. This makes a lot of sense. The developers have a nasty habit of 
> using LOOP over rows which has never been stamped out. I bet they have 
> applied this to something with a lot of rows.
>
> Is there a way of seeing which statements / backends / processes have greater 
> than 64 stxns? I'd like to monitor this.

I think there is no view or anything else that tells which backend or
transaction has more than 64 subtransactions.  But if we are ready to
modify the code, then we can LOG that information in
GetNewTransactionId(), at the point where we first mark it overflowed.
if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
{
   MyProc->subxids.xids[nxids] = xid;
  pg_write_barrier();
  MyProc->subxidStatus.count = substat->count = nxids + 1;
}
else
{
  MyProc->subxidStatus.overflowed = substat->overflowed = true;
 <-- we can log or put a breakpoint here and identify which statement is
creating the overflow -->
}

IMHO, it would be good to LOG such information if we are not already
logging it anywhere.
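
For testing this locally, a quick way to push a session past the cache
limit is a plpgsql loop in which every iteration opens a subtransaction
that writes something, e.g. (just a sketch with a made-up table name,
assuming the default PGPROC_MAX_CACHED_SUBXIDS of 64):

CREATE TEMP TABLE subxact_test(i int);
DO $$
BEGIN
  FOR i IN 1..70 LOOP
    BEGIN
      -- each EXCEPTION block opens a subtransaction, and the INSERT
      -- assigns it an xid, so after 64 of them the cache overflows
      INSERT INTO subxact_test VALUES (i);
    EXCEPTION WHEN OTHERS THEN
      NULL;
    END;
  END LOOP;
END $$;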

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Max connections reached without max connections reached

2021-12-01 Thread Dilip Kumar
On Tue, Nov 30, 2021 at 5:16 AM James Sewell  wrote:
>>
>> How did you verify that, maybe some process started IO and stuck
>> there? Can we check pg_stat_activity that is there some process that
>> shows in the wait event as SLRURead/SLRUWrite and not coming out of
>> that state?

Looking again at the backtrace [1], it appears that the backend is getting
stuck while fetching a subtransaction's parent information, and that can
only happen if the snapshot is marked as overflowed.  So it seems that
some of the scripts are using a lot of subtransactions (> 64) within a
transaction; in that case we cannot cache the subtransaction information,
and the snapshot will be marked overflowed.  After that, to check the
visibility of any tuple we need to consult the SLRU to get the topmost
transaction information, and if the subtransaction SLRU cache is full then
it will have to do I/O.  So in this case it appears that a lot of parallel
scans/new connections are trying to read pg_subtrans, and due to the
limited cache size they are not even able to load the data into the cache
and are getting blocked frequently.  So when you say the startup is
getting stuck forever, how long did you wait?  Maybe it is moving but
getting stuck again on the next tuple.  To confirm that, attach the
process with gdb, put a breakpoint in XidInMVCCSnapshot(), and check
whether it hits that breakpoint repeatedly.  Also, check whether you are
seeing "SubtransControlLock" in pg_stat_activity.  There is a nice
blog [2] explaining this issue, so you can refer to that as well.


[1]
   21 #2  0x7f18fb9dec6b in sem_wait@@GLIBC_2.2.5 () from
/lib64/libpthread.so.0
   20 #3  0x0071c482 in PGSemaphoreLock ()
   19 #4  0x0079078c in LWLockAcquire ()
   18 #5  0x004fa9ae in SimpleLruWaitIO.isra.1 ()
   17 #6  0x004fb2a4 in SimpleLruReadPage ()
   16 #7  0x004fbc07 in SubTransGetParent ()
   15 #8  0x004fbc57 in SubTransGetTopmostTransaction ()
   14 #9  0x008f65a7 in XidInMVCCSnapshot ()
   13 #10 0x008f662e in HeapTupleSatisfiesMVCC ()
   12 #11 0x004c436e in heapgetpage ()

[2]https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Max connections reached without max connections reached

2021-11-24 Thread Dilip Kumar
On Thu, Nov 25, 2021 at 10:53 AM James Sewell  wrote:
>
>
>> > Seems like some of the processes are taking a long time or stuck while
>> > reading/writing SLRU pages, and due to that while creating a new
>> > connection the backend process is not able to check the transaction
>> > status (from pg_xlog) of the pg_class tuple and gets stuck/taking a
>> > long time in a startup.
>>
>> My next question would be whether this particular process shown is
>> stack is stuck forever or finally, it is able to make a connection?  I
>> want to understand is this just due to slow I/O or something else?
>
>
> They don't ever seem to clear, (more and more show up)  and IO doesn't seem 
> to be deadlocked at that time.

How did you verify that?  Maybe some process started I/O and got stuck
there.  Can we check pg_stat_activity to see whether some process shows
SLRURead/SLRUWrite as its wait event and is not coming out of that state?
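
For example, something like this (the exact wait event names vary a little
across versions):

SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event IN ('SLRURead', 'SLRUWrite', 'SubtransControlLock');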

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Max connections reached without max connections reached

2021-11-24 Thread Dilip Kumar
On Thu, Nov 25, 2021 at 9:50 AM Dilip Kumar  wrote:

> > Does that shed any light?
>
> Seems like some of the processes are taking a long time or stuck while
> reading/writing SLRU pages, and due to that while creating a new
> connection the backend process is not able to check the transaction
> status (from pg_xlog) of the pg_class tuple and gets stuck/taking a
> long time in a startup.

Correction: I meant pg_xact (pg_clog in older versions), not pg_xlog :)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Max connections reached without max connections reached

2021-11-24 Thread Dilip Kumar
On Thu, Nov 25, 2021 at 9:50 AM Dilip Kumar  wrote:
>
> On Thu, Nov 25, 2021 at 8:58 AM James Sewell  
> wrote:
> >>
> >> The hypothesis I'm thinking of is that incoming sessions are being blocked
> >> somewhere before they can acquire a ProcArray entry; if so, they'd not
> >> show up in either pg_stat_activity or pg_locks.  What we have to look for
> >> then is evidence of somebody holding a strong lock on a shared relation.
> >> Try "select * from pg_locks where locktype = 'relation' and database = 0".
> >>
> >
> > That doesn't show anything when the issue is happening.
> >
> > Strace shows the startup processes looping like so:
> >
> > 13:51:15.510850 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=733559666}) = 0
> > 13:51:15.511254 futex(0x7ee891405238, FUTEX_WAKE, 1) = 1
> > 13:51:15.511422 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=734037545}) = 0
> > 13:51:15.511690 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.520470 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=743136450}) = 0
> > 13:51:15.520772 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=743376746}) = 0
> > 13:51:15.521005 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.539309 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=761920081}) = 0
> > 13:51:15.539508 futex(0x7ee891406bb8, FUTEX_WAKE, 1) = 1
> > 13:51:15.539690 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=762298276}) = 0
> > 13:51:15.539882 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.547311 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=769929341}) = 0
> > 13:51:15.547757 futex(0x7ee89140a738, FUTEX_WAKE, 1) = 1
> > 13:51:15.547931 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=770524120}) = 0
> > 13:51:15.548089 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.551482 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN 
> > (Resource temporarily unavailable)
> > 13:51:15.551950 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.593631 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN 
> > (Resource temporarily unavailable)
> > 13:51:15.593807 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.626167 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.627590 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=850204434}) = 0
> > 13:51:15.627817 futex(0x7ee8914033b8, FUTEX_WAKE, 1) = 1
> > 13:51:15.627987 open("pg_subtrans/A100", O_RDWR) = 16
> > 13:51:15.628270 lseek(16, 131072, SEEK_SET) = 131072
> > 13:51:15.628461 read(16, 
> > "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
> > 8192) = 8192
> > 13:51:15.628677 close(16)   = 0
> > 13:51:15.628911 open("pg_subtrans/A0E1", O_RDWR) = 16
> > 13:51:15.629132 lseek(16, 122880, SEEK_SET) = 122880
> > 13:51:15.629309 read(16, 
> > "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
> > 8192) = 8192
> > 13:51:15.629503 close(16)   = 0
> > 13:51:15.629685 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=852272645}) = 0
> > 13:51:15.629863 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.656694 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=879314944}) = 0
> > 13:51:15.656923 futex(0x7ee8914125b8, FUTEX_WAKE, 1) = 1
> > 13:51:15.657129 open("pg_subtrans/A0EE", O_RDWR) = 16
> > 13:51:15.657385 lseek(16, 73728, SEEK_SET) = 73728
> > 13:51:15.657638 read(16, 
> > "\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"...,
> >  8192) = 8192
> > 13:51:15.657907 close(16)   = 0
> > 13:51:15.658141 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=880753952}) = 0
> > 13:51:15.658346 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> > 13:51:15.689705 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> > tv_nsec=912331058}) = 0
> > 13:51:15.689950 futex(0x7ee89140a0b8, FUTEX_WAKE, 1) = 1
> > 13:51:15.690182 futex(0x7ee8914118b8, FUTEX_WAKE, 1) = 1
> > 13:51:15.690420 open("pg_subtrans/A0EE", O_RDWR) = 16
> > 13:51:15.690742 lseek(16, 73728, SEEK_SET) = 73728
> > 13:51:15.691037 read(16, 
> > "\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"...,
> >  8192) = 8192
> > 13:51:15.691303 close(16)   = 0
> > 13:51:15.691582 clock_

Re: Max connections reached without max connections reached

2021-11-24 Thread Dilip Kumar
IC, {tv_sec=19439358, 
> tv_nsec=939899218}) = 0
> 13:51:15.717578 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.722270 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
> 13:51:15.746391 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358, 
> tv_nsec=969020657}) = 0
> 13:51:15.746664 futex(0x7ee89140fa38, FUTEX_WAKE, 1) = 1
> 13:51:15.746877 open("pg_subtrans/A0EE", O_RDWR) = 16
>
>
> While gdb shows similar to:
>
>24 ^[[?1034h(gdb) bt
>23 #0  0x7f18fb9deb3b in do_futex_wait.constprop.1 () from 
> /lib64/libpthread.so.0
>22 #1  0x7f18fb9debcf in __new_sem_wait_slow.constprop.0 () from 
> /lib64/libpthread.so.0
>21 #2  0x7f18fb9dec6b in sem_wait@@GLIBC_2.2.5 () from 
> /lib64/libpthread.so.0
>20 #3  0x0071c482 in PGSemaphoreLock ()
>19 #4  0x0079078c in LWLockAcquire ()
>18 #5  0x004fa9ae in SimpleLruWaitIO.isra.1 ()
>17 #6  0x004fb2a4 in SimpleLruReadPage ()
>16 #7  0x004fbc07 in SubTransGetParent ()
>15 #8  0x004fbc57 in SubTransGetTopmostTransaction ()
>14 #9  0x008f65a7 in XidInMVCCSnapshot ()
>13 #10 0x008f662e in HeapTupleSatisfiesMVCC ()
>12 #11 0x004c436e in heapgetpage ()
>11 #12 0x004c58e4 in heap_getnext ()
>10 #13 0x008ab8b5 in ScanPgRelation ()
> 9 #14 0x008ae124 in RelationBuildDesc ()
> 8 #15 0x008b010e in load_critical_index ()
> 7 #16 0x008b39ac in RelationCacheInitializePhase3 ()
> 6 #17 0x008cbd56 in InitPostgres ()
> 5 #18 0x0000007a1cfe in PostgresMain ()
> 4 #19 0x0048a9c0 in ServerLoop ()
> 3 #20 0x0072f11e in PostmasterMain ()
> 2 #21 0x0048b854 in main ()
>
> Does that shed any light?

It seems like some of the processes are taking a long time, or are stuck,
while reading/writing SLRU pages.  Because of that, while creating a new
connection, the backend process is not able to check the transaction
status (from pg_xlog) of the pg_class tuple and gets stuck, or takes a
long time, during startup.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: WAL File Recovery on Standby Server Stops Before End of WAL Files

2021-10-27 Thread Dilip Kumar
On Thu, Oct 28, 2021 at 7:28 AM Kyotaro Horiguchi
 wrote:
>
> At Wed, 27 Oct 2021 16:42:52 +, "Ryan, Les"  wrote in
> > 2021-10-27 10:26:31.467 MDT [2012] LOG:  redo starts at 419/5229A858
> ...
> > 2021-10-27 10:26:36.188 MDT [2012] LOG:  restored log file 
> > "00010419005A" from archive
> > 2021-10-27 10:26:36.750 MDT [2012] LOG:  consistent recovery state reached 
> > at 419/5AB8
> > 2021-10-27 10:26:36.752 MDT [6204] LOG:  database system is ready to accept 
> > read only connections
> > 2021-10-27 10:26:36.823 MDT [6040] LOG:  started streaming WAL from primary 
> > at 419/5A00 on timeline 1
> >
> >   *   There are many more WAL files available starting with 
> > 00010419005B but the restore process always stops at 
> > 00010419005A.
> >
> > I have two questions:
> >
> >   *   Why does the WAL file recovery process now stop after it reads 
> > 00010419005A?
> >   *   What do I need to do to get PostgreSQL to recover the rest of the 
> > available WAL files.
>
> The info above alone donesn't clearly suggest anything about the
> reason. Could you show the result of "pg_waldump
> 00010419005A 2>&1 | tail -5"?  What I'm expecting to see
> is an error message from pg_waldump before the end of the file. It
> would be the immediate cause of the problem.

+1, that will be the best place to start.  Additionally, you can enable
DEBUG2 messages so that from the logs we can identify why it could not
continue recovery from the archive.
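
For example (assuming you can still connect to the standby; otherwise set
it in postgresql.conf and reload):

ALTER SYSTEM SET log_min_messages = 'debug2';
SELECT pg_reload_conf();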

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re:

2021-05-17 Thread Dilip Kumar
On Mon, May 17, 2021 at 7:37 PM David G. Johnston
 wrote:
>
> On Monday, May 17, 2021, Tom Lane  wrote:
>>
>> "David G. Johnston"  writes:
>> > On Sunday, May 16, 2021, Loles  wrote:
>> >> I don't understand why the server starts that process if I'm not using
>> >> replication of any kind.
>>
>> >  It starts the process because at any time someone can execute a “create
>> > subscription” command.  If no one does so the amount of time spent
>> > confirming the subscriptions table is empty is trivial.
>>
>> It looks like it won't be started if you set
>> max_logical_replication_workers = 0.
>>
>
> I was wondering about that.  Would you mind posting a link to the code where 
> that is checked?  I actually looked though admittedly not that hard.
>

/*
* ApplyLauncherRegister
* Register a background worker running the logical replication launcher.
*/
void
ApplyLauncherRegister(void)
{
BackgroundWorker bgw;

if (max_logical_replication_workers == 0)
return;
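
So if someone really wants to keep the launcher from starting, something
like this should do it (it is a postmaster-level setting, so it needs a
server restart to take effect):

ALTER SYSTEM SET max_logical_replication_workers = 0;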


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re:

2021-05-16 Thread Dilip Kumar
On Mon, May 17, 2021 at 12:18 AM Loles  wrote:
>
> Hi!
>
> Let's see if someone can answer my question.
>
> In a default PostgreSQL 13 installation, starting the server starts a process 
> called "logical replication launcher".
>
> I know what logical replication is but I don't know what this process does on 
> the server.
>
> Even if you change the default setting of wal_level to minimal, the process 
> continues to start after the server is restarted so that it takes the change 
> in the wal_level parameter.

This process launches a worker process for each subscription to apply the
logical replication changes.  The wal_level does not matter here:
wal_level must be logical to act as a publisher, but this process is used
for launching workers on the subscriber side.  So, in short, this process
scans the subscriptions and launches an apply worker for any subscription
that does not already have one.
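
You can see the launcher in pg_stat_activity as its own backend type,
e.g.:

SELECT pid, backend_type
FROM pg_stat_activity
WHERE backend_type = 'logical replication launcher';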

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Race condition with restore_command on streaming replica

2020-11-15 Thread Dilip Kumar
On Thu, Nov 5, 2020 at 1:39 AM Brad Nicholson  wrote:
>
> Hi,
>
> I've recently been seeing a race condition with the restore_command on 
> replicas using streaming replication.
>
> On the primary, we are archiving wal files to s3 compatible storage via 
> pgBackRest. In the recovery.conf section of the postgresql.conf file on the 
> replicas, we define the restore command as follows:
>
> restore_command = '/usr/bin/pgbackrest --config 
> /conf/postgresql/pgbackrest_restore.conf --stanza=formation archive-get %f 
> "%p"'
>
> We have a three member setup - m-0, m-1, m-2. Consider the case where m-0 is 
> the Primary and m-1 and m-2 are replicas connected to the m-0.
>
> When issuing a switchover (via Patroni) from m-0 to m-1, the connection from 
> m-2 to m-0 is terminated. The restore_command on m-2 is run, and it looks for 
> the .history file for the new timeline. If this happens before the history 
> file is created and pushed to the archive, m-2 will look for the next wal 
> file on the existing timeline in the archive. It will never be created as the 
> source has moved on, so this m-2 hangs waiting on that file. The 
> restore_command on the replica looking for this non-existent file is only run 
> once. This seems like an odd state to be in. The replica is waiting on a new 
> file, but it's not actually looking for it. Is this expected, or should the 
> restore_command be polling for that file?

I am not sure how Patroni does it internally; can you explain the scenario
in more detail?  Suppose you execute the promotion on m-1: if the
promotion is successful, it will switch the timeline and create the
timeline history file.  Now, once the promotion is successful, if we
change the primary_conninfo on m-2 then it will restart the walreceiver
and look for the latest .history file, which it should find either through
direct streaming or through the restore_command.  If you are saying that
m-2 tried to look for the history file before m-1 created it, then it
seems like you changed the primary_conninfo on m-2 before the m-1
promotion had completed.
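
It might also help to check on m-2, at that point, whether the walreceiver
is actually streaming or not, e.g.:

SELECT pid, status, conninfo FROM pg_stat_wal_receiver;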

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: 12.2: Howto check memory-leak in worker?

2020-05-04 Thread Dilip Kumar
y appreciated (Dtrace
> should work) - processes will usually fail with OOM at this size, due
> to machine configuration - I'm waiting for that now (it is a very very
> old pentium3 machine ;) ).

One idea is that you can attach to your process with gdb and call
MemoryContextStats(TopMemoryContext).  This will show which context is
using how much memory.  So basically you can call this function 2-3 times
at some interval and see in which context the memory is continuously
increasing.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Help needed with XMLTABLE

2020-04-15 Thread Dilip Kumar
On Wed, Apr 15, 2020 at 12:56 PM Pavel Stehule  wrote:
> On Wed, Apr 15, 2020 at 7:32 AM Dilip Kumar  wrote:
>>
>> One of our customers tried to use XMLTABLE syntax without
>> row_expression, which works fine with ORACLE but doesn't work with
>> PostgreSQL.  So can anyone suggest what alternative we can use?
>>
>> CREATE TABLE user_pool_clean (
>> fk_user_pool_pk bytea NOT NULL,
>> user_id character varying(255) NOT NULL,
>> email_address character varying(250),
>> is_mil numeric,
>> is_civ numeric,
>> is_ctr numeric,
>> is_gov numeric,
>> is_edu numeric,
>> role_id character varying(50),
>> user_profile_id character varying(50),
>> service_branch_id character varying(50),
>> mil_pay_grade_id character varying(50),
>> my_auds character varying(4000),
>> my_orgs character varying(4000),
>> processed character(1) DEFAULT 'N'::bpchar NOT NULL
>> );
>>
>> insert into user_pool_clean
>> values('995CECDC1881375DE05312A270C7CF56','10015706','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','4','76','3','1706882','1707720','Y');
>>
>> insert into user_pool_clean
>> values('995CECDC1905375DE05312A270C7CF56','10015848','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','3','700015','11','
>> 1705562,1708486','1710621','Y');
>>
>>
>> SQL> SELECT upc.is_mil,TRIM(column_value) src
>>  FROM   user_pool_clean upc
>>,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));
>
>
> It's very famous how people are creative - when xmltable has only one 
> argument, then it is evaluated like XQuery expression.
>
> https://stewashton.wordpress.com/2016/08/01/splitting-strings-surprise/
>
> Unfortunately, Postgres has not support of XQuery, so nothing similar is 
> possible. Fortunately, Postgres has much better tools for parsing string.

Thanks for the help, Pavel.
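
For the archives, one Postgres-native way to split the comma-separated
list is something along these lines (just a sketch using string_to_array,
with the same table and columns as the original query):

SELECT upc.is_mil, trim(src) AS src
FROM user_pool_clean upc,
     unnest(string_to_array(upc.my_auds, ',')) AS src;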

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Help needed with XMLTABLE

2020-04-14 Thread Dilip Kumar
One of our customers tried to use the XMLTABLE syntax without a
row_expression, which works fine with Oracle but doesn't work with
PostgreSQL.  Can anyone suggest what alternative we can use?

CREATE TABLE user_pool_clean (
fk_user_pool_pk bytea NOT NULL,
user_id character varying(255) NOT NULL,
email_address character varying(250),
is_mil numeric,
is_civ numeric,
is_ctr numeric,
is_gov numeric,
is_edu numeric,
role_id character varying(50),
user_profile_id character varying(50),
service_branch_id character varying(50),
mil_pay_grade_id character varying(50),
my_auds character varying(4000),
my_orgs character varying(4000),
processed character(1) DEFAULT 'N'::bpchar NOT NULL
);

insert into user_pool_clean
values('995CECDC1881375DE05312A270C7CF56','10015706','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','4','76','3','1706882','1707720','Y');

insert into user_pool_clean
values('995CECDC1905375DE05312A270C7CF56','10015848','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','3','700015','11','
1705562,1708486','1710621','Y');


SQL> SELECT upc.is_mil,TRIM(column_value) src
 FROM   user_pool_clean upc
   ,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));

IS_MIL  SRC
------  -------
     0  1705562    <-- output from the Oracle database
     0  1708486
     0  1706882

postgres[7604]=# SELECT upc.is_mil,TRIM(column_value) src
postgres-#  FROM   user_pool_clean upc
postgres-#,xmltable(('"'|| REPLACE( upc.my_auds, ',',
'","')|| '"'));
ERROR:  syntax error at or near ")"
LINE 3: ... ,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Worse performance with higher work_mem?

2020-01-13 Thread Dilip Kumar
On Tue, Jan 14, 2020 at 5:29 AM Israel Brewster  wrote:
>
> I was working on diagnosing a “slow” (about 6 second run time) query:
>
> SELECT
> to_char(bucket,'-MM-DD"T"HH24:MI:SS') as dates,
> x_tilt,
> y_tilt,
> rot_x,
> rot_y,
> date_part('epoch', bucket) as timestamps,
> temp
> FROM
> (SELECT
>   time_bucket('1 week', read_time) as bucket,
>   avg(tilt_x::float) as x_tilt,
>   avg(tilt_y::float) as y_tilt,
>   avg(rot_x::float) as rot_x,
>   avg(rot_y::float) as rot_y,
>   avg(temperature::float) as temp
> FROM tilt_data
> WHERE station='c08883c0-fbe5-11e9-bd6e-aec49259cebb'
> AND read_time::date<='2020-01-13'::date
> GROUP BY bucket) s1
> ORDER BY bucket;
>
> In looking at the explain analyze output, I noticed that it had an “external 
> merge Disk” sort going on, accounting for about 1 second of the runtime 
> (explain analyze output here: https://explain.depesz.com/s/jx0q). Since the 
> machine has plenty of RAM available, I went ahead and increased the work_mem 
> parameter. Whereupon the query plan got much simpler, and performance of said 
> query completely tanked, increasing to about 15.5 seconds runtime 
> (https://explain.depesz.com/s/Kl0S), most of which was in a HashAggregate.
>
> I am running PostgreSQL 11.6 on a machine with 128GB of ram (so, like I said, 
> plenty of RAM)
>
> How can I fix this? Thanks.

I have noticed that after increasing work_mem your plan switched from a
parallel plan to a non-parallel plan.  Basically, earlier it was getting
executed with 3 workers, and after it becomes a non-parallel plan the
execution time is about 3x.  For the analysis, can we just reduce the
values of parallel_tuple_cost and parallel_setup_cost and see how it
behaves?
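
For example, in the same session, something like:

SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
-- then re-run EXPLAIN (ANALYZE, BUFFERS) on the query above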

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: ERROR: too many dynamic shared memory segments

2017-11-28 Thread Dilip Kumar
On Tue, Nov 28, 2017 at 7:13 PM, Robert Haas  wrote:

> On Tue, Nov 28, 2017 at 2:32 AM, Dilip Kumar 
> wrote:
> >  I think BitmapHeapScan check whether dsa is valid or not if DSA is not
> > valid then it should assume it's non-parallel plan.
> >
> > Attached patch should fix the issue.
>
> So, create the pstate and then pretend we didn't?  Why not just avoid
> creating it in the first place, like this?
>

This is a better way to fix it.

>
> I haven't checked whether this fixes the bug, but if it does, we can
> avoid introducing an extra branch in BitmapHeapNext.


With my test, this fixes the problem.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: ERROR: too many dynamic shared memory segments

2017-11-27 Thread Dilip Kumar
On Tue, Nov 28, 2017 at 4:18 AM, Thomas Munro  wrote:

> On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa 
> wrote:
> > As for the crash. I dug up the initial log and it looks like a
> segmentation
> > fault...
> >
> > 2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR:
> too
> > many dynamic shared memory segments
>
> I think there are two failure modes: one of your sessions showed the
> "too many ..." error (that's good, ran out of slots and said so and
> our error machinery worked as it should), and another crashed with a
> segfault, because it tried to use a NULL "area" pointer (bad).  I
> think this is a degenerate case where we completely failed to launch
> parallel query, but we ran the parallel query plan anyway and this
> code thinks that the DSA is available.  Oops.
>

I think BitmapHeapScan should check whether the DSA is valid or not; if
the DSA is not valid, then it should assume it's a non-parallel plan.

The attached patch should fix the issue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


bug_fix_in_pbhs_when_dsa_not_initialized.patch
Description: Binary data