Re: In-place persistence change of a relation

2021-12-22 Thread Kyotaro Horiguchi
At Thu, 23 Dec 2021 15:01:41 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> I added a TAP test to exercise the in-place persistence change.

We don't need a base table for every index. The TAP test is revised accordingly.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 112c077561bb24a0b40995e2d6ada7b33edd6475 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v13 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data
rewriting, currently it runs a heap rewrite, which causes a large
amount of file I/O.  This patch makes the command run without a heap
rewrite.  In addition, SET LOGGED while wal_level > minimal emits WAL
using XLOG_FPI instead of a massive number of HEAP_INSERTs, which
should be smaller.

This also allows cleanup of files left behind by a crash of the
transaction that created them.
---
 src/backend/access/rmgrdesc/smgrdesc.c|  52 ++
 src/backend/access/transam/README |   8 +
 src/backend/access/transam/xact.c |   7 +
 src/backend/access/transam/xlog.c |  17 +
 src/backend/catalog/storage.c | 545 +-
 src/backend/commands/tablecmds.c  | 266 +++--
 src/backend/replication/basebackup.c  |   3 +-
 src/backend/storage/buffer/bufmgr.c   |  88 +++
 src/backend/storage/file/fd.c |   4 +-
 src/backend/storage/file/reinit.c | 344 +++
 src/backend/storage/smgr/md.c |  93 ++-
 src/backend/storage/smgr/smgr.c   |  32 +
 src/backend/storage/sync/sync.c   |  20 +-
 src/bin/pg_rewind/parsexlog.c |  24 +
 src/common/relpath.c  |  47 +-
 src/include/catalog/storage.h |   3 +
 src/include/catalog/storage_xlog.h|  42 +-
 src/include/common/relpath.h  |   9 +-
 src/include/storage/bufmgr.h  |   2 +
 src/include/storage/fd.h  |   1 +
 src/include/storage/md.h  |   8 +-
 src/include/storage/reinit.h  |  10 +-
 src/include/storage/smgr.h|  17 +
 src/test/recovery/t/027_persistence_change.pl | 244 
 24 files changed, 1704 insertions(+), 182 deletions(-)
 create mode 100644 src/test/recovery/t/027_persistence_change.pl

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index 773d57..d251f22207 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,49 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
+	else if (info == XLOG_SMGR_UNLINK)
+	{
+		xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+		char	   *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+		appendStringInfoString(buf, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_MARK)
+	{
+		xl_smgr_mark *xlrec = (xl_smgr_mark *) rec;
+		char	   *path = GetRelationPath(xlrec->rnode.dbNode,
+		   xlrec->rnode.spcNode,
+		   xlrec->rnode.relNode,
+		   InvalidBackendId,
+		   xlrec->forkNum, xlrec->mark);
+		char	   *action;
+
+		switch (xlrec->action)
+		{
+			case XLOG_SMGR_MARK_CREATE:
+action = "CREATE";
+break;
+			case XLOG_SMGR_MARK_UNLINK:
+action = "DELETE";
+break;
+			default:
+action = "";
+break;
+		}
+
+		appendStringInfo(buf, "%s %s", action, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_BUFPERSISTENCE)
+	{
+		xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec;
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
+
+		appendStringInfoString(buf, path);
+		appendStringInfo(buf, " persistence %d", xlrec->persistence);
+		pfree(path);
+	}
 }
 
 const char *
@@ -55,6 +98,15 @@ smgr_identify(uint8 info)
 		case XLOG_SMGR_TRUNCATE:
 			id = "TRUNCATE";
 			break;
+		case XLOG_SMGR_UNLINK:
+			id = "UNLINK";
+			break;
+		case XLOG_SMGR_MARK:
+			id = "MARK";
+			break;
+		case XLOG_SMGR_BUFPERSISTENCE:
+			id = "BUFPERSISTENCE";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 1edc8180c1..b344bbe511 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -724,6 +724,14 @@ we must panic and abort recovery.  The DBA will have to manually clean up
 then restart recovery.  This is part of the reason for not writing a WAL
 entry until we've successfully done the original action.
 
+The Smgr MARK files
+
+
+An smgr mark file is created when a new relation file is created, to
+mark that the relfilenode needs to be cleaned up at recovery time.  In
+contrast to the four actions above, failure to remove an smgr mark file
+will lead to data loss, in which case the server will shut down.
+
 
 Skipping WAL for New RelFileNode
 

Re: In-place persistence change of a relation

2021-12-22 Thread Kyotaro Horiguchi
At Wed, 22 Dec 2021 08:42:14 +, Jakub Wartak  
wrote in 
> I think there's slight omission:
...
> apparently reindex_index() params cannot be NULL - the same happens with 
> switching persistent

Hmm, commit a3dc926009 changed the interface. (But the name was also
changed after that.)

-reindex_relation(Oid relid, int flags, int options)
+reindex_relation(Oid relid, int flags, ReindexParams *params)
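
For reference, a minimal call under the new interface looks roughly like
this (a sketch only -- the flags value is just an example; the point is
that params must point at a zero-initialized struct rather than be NULL):

	ReindexParams params = {0};		/* zero-initialized, never NULL */

	reindex_relation(relid, REINDEX_REL_PROCESS_TOAST, &params);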

> I'll also try to give another shot to the patch early next year - as we are 
> starting long Christmas/holiday break here 
> - with verifying WAL for GiST and more advanced setup (more crashes, and 
> standby/archiving/barman to see 
> how it's possible to use wal_level=minimal <-> replica transitions). 

Thanks. I added a TAP test to exercise the in-place persistence change.

have a nice holiday, Jakub!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 19a8bb14dc863981e33ce8a10ecb9b87a4aa3937 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v12 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data
rewriting, currently it runs a heap rewrite, which causes a large
amount of file I/O.  This patch makes the command run without a heap
rewrite.  In addition, SET LOGGED while wal_level > minimal emits WAL
using XLOG_FPI instead of a massive number of HEAP_INSERTs, which
should be smaller.

This also allows cleanup of files left behind by a crash of the
transaction that created them.
---
 src/backend/access/rmgrdesc/smgrdesc.c|  52 ++
 src/backend/access/transam/README |   8 +
 src/backend/access/transam/xact.c |   7 +
 src/backend/access/transam/xlog.c |  17 +
 src/backend/catalog/storage.c | 545 +-
 src/backend/commands/tablecmds.c  | 266 +++--
 src/backend/replication/basebackup.c  |   3 +-
 src/backend/storage/buffer/bufmgr.c   |  88 +++
 src/backend/storage/file/fd.c |   4 +-
 src/backend/storage/file/reinit.c | 344 +++
 src/backend/storage/smgr/md.c |  93 ++-
 src/backend/storage/smgr/smgr.c   |  32 +
 src/backend/storage/sync/sync.c   |  20 +-
 src/bin/pg_rewind/parsexlog.c |  24 +
 src/common/relpath.c  |  47 +-
 src/include/catalog/storage.h |   3 +
 src/include/catalog/storage_xlog.h|  42 +-
 src/include/common/relpath.h  |   9 +-
 src/include/storage/bufmgr.h  |   2 +
 src/include/storage/fd.h  |   1 +
 src/include/storage/md.h  |   8 +-
 src/include/storage/reinit.h  |  10 +-
 src/include/storage/smgr.h|  17 +
 src/test/recovery/t/027_persistence_change.pl | 268 +
 24 files changed, 1728 insertions(+), 182 deletions(-)
 create mode 100644 src/test/recovery/t/027_persistence_change.pl

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index 773d57..d251f22207 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,49 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
+	else if (info == XLOG_SMGR_UNLINK)
+	{
+		xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+		char	   *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+		appendStringInfoString(buf, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_MARK)
+	{
+		xl_smgr_mark *xlrec = (xl_smgr_mark *) rec;
+		char	   *path = GetRelationPath(xlrec->rnode.dbNode,
+		   xlrec->rnode.spcNode,
+		   xlrec->rnode.relNode,
+		   InvalidBackendId,
+		   xlrec->forkNum, xlrec->mark);
+		char	   *action;
+
+		switch (xlrec->action)
+		{
+			case XLOG_SMGR_MARK_CREATE:
+action = "CREATE";
+break;
+			case XLOG_SMGR_MARK_UNLINK:
+action = "DELETE";
+break;
+			default:
+action = "";
+break;
+		}
+
+		appendStringInfo(buf, "%s %s", action, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_BUFPERSISTENCE)
+	{
+		xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec;
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
+
+		appendStringInfoString(buf, path);
+		appendStringInfo(buf, " persistence %d", xlrec->persistence);
+		pfree(path);
+	}
 }
 
 const char *
@@ -55,6 +98,15 @@ smgr_identify(uint8 info)
 		case XLOG_SMGR_TRUNCATE:
 			id = "TRUNCATE";
 			break;
+		case XLOG_SMGR_UNLINK:
+			id = "UNLINK";
+			break;
+		case XLOG_SMGR_MARK:
+			id = "MARK";
+			break;
+		case XLOG_SMGR_BUFPERSISTENCE:
+			id = "BUFPERSISTENCE";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 1edc8180c1..b344bbe511 100644
--- a/src/backend/access/transam/README
+++ 

Re: parallel vacuum comments

2021-12-22 Thread Masahiko Sawada
On Wed, Dec 22, 2021 at 10:55 PM Amit Kapila  wrote:
>
> On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila  wrote:
> >
> > On Wed, Dec 22, 2021 at 5:39 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > >
> > > 2)
> > > +#include "utils/rel.h"
> > > +#include "utils/lsyscache.h"
> > > +#include "utils/memutils.h"
> > >
> > > It might be better to keep the header file in alphabetical order.
> > > :
> > > +#include "utils/lsyscache.h"
> > > +#include "utils/memutils.h"
> > > +#include "utils/rel.h"
> > >
> >
> > Right, I'll take care of this as I am already making some other edits
> > in the patch.
> >
>
> Fixed this and made a few other changes in the patch that includes (a)
> passed down the num_index_scans information in parallel APIs based on
> which it can make the decision whether to reinitialize DSM or consider
> conditional parallel vacuum clean up; (b) get rid of first-time
> variable in ParallelVacuumState as that is not required if we have
> num_index_scans information; (c) there seems to be quite a few
> unnecessary includes in vacuumparallel.c which I have removed; (d)
> unnecessary error callback info was being set in ParallelVacuumState
> in leader backend; (e) changed/added comments at quite a few places.
>
> Can you please once verify the changes in the attached?

Thank you for updating the patch!

I agree with these changes, and the patch looks good to me.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Buildfarm support for older versions

2021-12-22 Thread Larry Rosenman

On 12/22/2021 10:15 pm, Tom Lane wrote:

Larry Rosenman  writes:

On 12/22/2021 9:59 pm, Tom Lane wrote:

Does it work if you drop --enable-nls?  (It'd likely be worth fixing
if so, but I'm trying to narrow the possible causes.)



Nope...


OK.  Since 9.3 succeeds, it seems like it's a link problem
we fixed at some point.  Can you bisect to find where we
fixed it?

regards, tom lane


I can try -- I haven't been very good at that.

I can give you access to the machine and the ID the buildfarm runs
under.

(Or give me a good process, starting from a buildfarm layout.)


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: Buildfarm support for older versions

2021-12-22 Thread Tom Lane
Larry Rosenman  writes:
> On 12/22/2021 9:59 pm, Tom Lane wrote:
>> Does it work if you drop --enable-nls?  (It'd likely be worth fixing
>> if so, but I'm trying to narrow the possible causes.)

> Nope...

OK.  Since 9.3 succeeds, it seems like it's a link problem
we fixed at some point.  Can you bisect to find where we
fixed it?

regards, tom lane




Re: Buildfarm support for older versions

2021-12-22 Thread Larry Rosenman

On 12/22/2021 9:59 pm, Tom Lane wrote:

Larry Rosenman  writes:

On 12/22/2021 9:34 pm, Tom Lane wrote:

What configure options did you use?



config_opts =>[
 qw(
   --enable-cassert
   --enable-debug
   --enable-nls
   --enable-tap-tests
   --with-perl
   )
 ],


Does it work if you drop --enable-nls?  (It'd likely be worth fixing
if so, but I'm trying to narrow the possible causes.)

regards, tom lane



Nope...

gmake[3]: Leaving directory 
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/contrib/dummy_seclabel'

cp ../../../contrib/dummy_seclabel/dummy_seclabel.so dummy_seclabel.so
cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv 
-Wno-unused-command-line-argument -Wno-compound-token-split-by-macro 
-Wno-sometimes-uninitialized -g -fPIC -DPIC -L../../src/port 
-L/usr/local/lib  -Wl,--as-needed 
-Wl,-R'/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/lib' 
-L../../src/port -lpgport -shared -o moddatetime.so moddatetime.o
cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv 
-Wno-unused-command-line-argument -Wno-compound-token-split-by-macro 
-Wno-sometimes-uninitialized -g -fPIC -DPIC -L../../src/port 
-L/usr/local/lib  -Wl,--as-needed 
-Wl,-R'/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/lib' 
-L../../src/port -lpgport -shared -o insert_username.so 
insert_username.o
cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv 
-Wno-unused-command-line-argument -Wno-compound-token-split-by-macro 
-Wno-sometimes-uninitialized -g -fPIC -DPIC -L../../src/port 
-L/usr/local/lib  -Wl,--as-needed 
-Wl,-R'/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/lib' 
-L../../src/port -lpgport -shared -o autoinc.so autoinc.o
cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv 
-Wno-unused-command-line-argument -Wno-compound-token-split-by-macro 
-Wno-sometimes-uninitialized -g -fPIC -DPIC -L../../src/port 
-L/usr/local/lib  -Wl,--as-needed 
-Wl,-R'/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/lib' 
-L../../src/port -lpgport -shared -o timetravel.so timetravel.o
cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv 
-Wno-unused-command-line-argument -Wno-compound-token-split-by-macro 
-Wno-sometimes-uninitialized -g -fPIC -DPIC -L../../src/port 
-L/usr/local/lib  -Wl,--as-needed 
-Wl,-R'/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/lib' 
-L../../src/port -lpgport -shared -o refint.so refint.o
ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
_CurrentRuneLocale; recompile with -fPIC

defined in /lib/libc.so.7
referenced by pgstrcasecmp.c:37
  pgstrcasecmp.o:(pg_strcasecmp) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
__mb_sb_limit; recompile with -fPIC

defined in /lib/libc.so.7
referenced by pgstrcasecmp.c:37
  pgstrcasecmp.o:(pg_strcasecmp) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
_CurrentRuneLocale; recompile with -fPIC

defined in /lib/libc.so.7
referenced by pgstrcasecmp.c:70
  pgstrcasecmp.o:(pg_strncasecmp) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
__mb_sb_limit; recompile with -fPIC

defined in /lib/libc.so.7
referenced by pgstrcasecmp.c:70
  pgstrcasecmp.o:(pg_strncasecmp) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
__mb_sb_limit; recompile with -fPIC

defined in /lib/libc.so.7
referenced by pgstrcasecmp.c:109
  pgstrcasecmp.o:(pg_toupper) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
_CurrentRuneLocale; recompile with -fPIC

defined in /lib/libc.so.7
referenced by runetype.h:0 (/usr/include/runetype.h:0)
  pgstrcasecmp.o:(pg_toupper) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
__mb_sb_limit; recompile with -fPIC

defined in /lib/libc.so.7
referenced by pgstrcasecmp.c:126
  pgstrcasecmp.o:(pg_tolower) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
_CurrentRuneLocale; recompile with -fPIC

defined in /lib/libc.so.7
referenced by runetype.h:0 (/usr/include/runetype.h:0)
  

Re: Buildfarm support for older versions

2021-12-22 Thread Tom Lane
Larry Rosenman  writes:
> On 12/22/2021 9:34 pm, Tom Lane wrote:
>> What configure options did you use?

> config_opts =>[
>  qw(
>--enable-cassert
>--enable-debug
>--enable-nls
>--enable-tap-tests
>--with-perl
>)
>  ],

Does it work if you drop --enable-nls?  (It'd likely be worth fixing
if so, but I'm trying to narrow the possible causes.)

regards, tom lane




Re: Buildfarm support for older versions

2021-12-22 Thread Larry Rosenman

On 12/22/2021 9:34 pm, Tom Lane wrote:

Larry Rosenman  writes:

REL9_2_STABLE make dies on:
ld: error: relocation R_X86_64_PC32 cannot be used against symbol
_CurrentRuneLocale; recompile with -fPIC
[etc]


What configure options did you use?

regards, tom lane


config_opts =>[
qw(
  --enable-cassert
  --enable-debug
  --enable-nls
  --enable-tap-tests
  --with-perl
  )
],


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: Buildfarm support for older versions

2021-12-22 Thread Tom Lane
Larry Rosenman  writes:
> REL9_2_STABLE make dies on:
> ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
> _CurrentRuneLocale; recompile with -fPIC
> [etc]

What configure options did you use?

regards, tom lane




RE: parallel vacuum comments

2021-12-22 Thread houzj.f...@fujitsu.com
On Wed, Dec 22, 2021 9:55 PM Amit Kapila  wrote:
> On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila 
> wrote:
> >
> > On Wed, Dec 22, 2021 at 5:39 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > >
> > > 2)
> > > +#include "utils/rel.h"
> > > +#include "utils/lsyscache.h"
> > > +#include "utils/memutils.h"
> > >
> > > It might be better to keep the header file in alphabetical order.
> > > :
> > > +#include "utils/lsyscache.h"
> > > +#include "utils/memutils.h"
> > > +#include "utils/rel.h"
> > >
> >
> > Right, I'll take care of this as I am already making some other edits
> > in the patch.
> >
> 
> Fixed this and made a few other changes in the patch that includes (a) passed
> down the num_index_scans information in parallel APIs based on which it can
> make the decision whether to reinitialize DSM or consider conditional parallel
> vacuum clean up; (b) get rid of first-time variable in ParallelVacuumState as 
> that
> is not required if we have num_index_scans information; (c) there seems to be
> quite a few unnecessary includes in vacuumparallel.c which I have removed; (d)
> unnecessary error callback info was being set in ParallelVacuumState in leader
> backend; (e) changed/added comments at quite a few places.
> 
> Can you please once verify the changes in the attached?

The changes look OK to me.
I tested the patch for multi-pass parallel vacuum cases and ran 'make
check-world'; all the tests passed.

Best regards,
Hou zj





Re: Buildfarm support for older versions

2021-12-22 Thread Larry Rosenman

On 12/22/2021 7:16 pm, Larry Rosenman wrote:

On 12/22/2021 7:20 am, Andrew Dunstan wrote:

On 12/21/21 15:06, Larry Rosenman wrote:

I filled out that form on the 16th, and haven't gotten a new animal
assignment.  Is there
a problem with my data?




It's a manual process, done when your friendly admins have time. I 
have

approved it now.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com



REL9_2_STABLE make dies on:
gmake[4]: Entering directory
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/src/backend/utils'
'/usr/bin/perl' ./generate-errcodes.pl
../../../src/backend/utils/errcodes.txt > errcodes.h
cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -Wno-unused-command-line-argument
-Wno-compound-token-split-by-macro -Wno-sometimes-uninitialized -g
-I../../src/port -DFRONTEND -I../../src/include -I/usr/local/include
-c -o path.o path.c
gmake[4]: Leaving directory
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/src/backend/utils'
prereqdir=`cd 'utils/' >/dev/null && pwd` && \
  cd '../../src/include/utils/' && rm -f errcodes.h && \
  ln -s "$prereqdir/errcodes.h" .
gmake[3]: Leaving directory
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/src/backend'
cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
$ tail -30  make.log
ld: error: relocation R_X86_64_PC32 cannot be used against symbol
__mb_sb_limit; recompile with -fPIC

defined in /lib/libc.so.7
referenced by pgstrcasecmp.c:109
  pgstrcasecmp.o:(pg_toupper) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol
_CurrentRuneLocale; recompile with -fPIC

defined in /lib/libc.so.7
referenced by runetype.h:0 (/usr/include/runetype.h:0)
  pgstrcasecmp.o:(pg_toupper) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol
__mb_sb_limit; recompile with -fPIC

defined in /lib/libc.so.7
referenced by pgstrcasecmp.c:126
  pgstrcasecmp.o:(pg_tolower) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol
_CurrentRuneLocale; recompile with -fPIC

defined in /lib/libc.so.7
referenced by runetype.h:0 (/usr/include/runetype.h:0)
  pgstrcasecmp.o:(pg_tolower) in archive 
../../src/port/libpgport.a
cc: error: linker command failed with exit code 1 (use -v to see 
invocation)

gmake[3]: *** [../../src/Makefile.port:20: timetravel.so] Error 1
gmake[3]: *** Waiting for unfinished jobs
rm moddatetime.o autoinc.o refint.o timetravel.o insert_username.o
gmake[3]: Leaving directory
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/contrib/spi'
gmake[2]: *** [GNUmakefile:126: submake-contrib-spi] Error 2
gmake[2]: *** Waiting for unfinished jobs
gmake[2]: Leaving directory
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/src/test/regress'
gmake[1]: *** [Makefile:33: all-test/regress-recurse] Error 2
gmake[1]: Leaving directory
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/src'
gmake: *** [GNUmakefile:11: all-src-recurse] Error 2
$

The other branches are still running.



Here's the full run:
$ bin/latest/run_branches.pl --test --config $(pwd)/conf/gerenuk.conf 
--run-all
Wed Dec 22 19:05:53 2021: buildfarm run for gerenuk:REL9_2_STABLE 
starting

gerenuk:REL9_2_STABLE [19:05:54] checking out source ...
gerenuk:REL9_2_STABLE [19:06:06] checking if build run needed ...
gerenuk:REL9_2_STABLE [19:06:06] copying source to pgsql.build ...
gerenuk:REL9_2_STABLE [19:06:08] running configure ...
gerenuk:REL9_2_STABLE [19:06:36] running make ...
Branch: REL9_2_STABLE
Stage Make failed with status 2
Wed Dec 22 19:08:21 2021: buildfarm run for gerenuk:REL9_3_STABLE 
starting

gerenuk:REL9_3_STABLE [19:08:21] checking out source ...
gerenuk:REL9_3_STABLE [19:08:27] checking if build run needed ...
gerenuk:REL9_3_STABLE [19:08:28] copying source to pgsql.build ...
gerenuk:REL9_3_STABLE [19:08:29] running configure ...
gerenuk:REL9_3_STABLE [19:08:52] running make ...
gerenuk:REL9_3_STABLE [19:10:38] running make check ...
gerenuk:REL9_3_STABLE [19:11:05] running make contrib ...
gerenuk:REL9_3_STABLE [19:11:15] running make install ...
gerenuk:REL9_3_STABLE [19:11:19] running make contrib install ...
gerenuk:REL9_3_STABLE [19:11:21] checking pg_upgrade
gerenuk:REL9_3_STABLE [19:12:29] running make check miscellaneous 
modules ...

gerenuk:REL9_3_STABLE [19:12:29] setting up db cluster (C)...
gerenuk:REL9_3_STABLE [19:12:32] starting db (C)...
gerenuk:REL9_3_STABLE [19:12:33] running make installcheck (C)...
gerenuk:REL9_3_STABLE [19:13:00] restarting db (C)...
gerenuk:REL9_3_STABLE [19:13:03] running make isolation check ...
gerenuk:REL9_3_STABLE [19:14:00] restarting db (C)...

Re: Delay the variable initialization in get_rel_sync_entry

2021-12-22 Thread Kyotaro Horiguchi
At Wed, 22 Dec 2021 13:11:38 +, "houzj.f...@fujitsu.com" 
 wrote in 
> Hi,
> 
> While reviewing some logical replication patches, I noticed that in
> function get_rel_sync_entry() we always invoke get_rel_relispartition()
> and get_rel_relkind() at the beginning, which can cause unnecessary
> cache access.
> 
> ---
> get_rel_sync_entry(PGOutputData *data, Oid relid)
> {
>   RelationSyncEntry *entry;
>   boolam_partition = get_rel_relispartition(relid);
>   charrelkind = get_rel_relkind(relid);
> ---
> 
> The extra cost could sometimes be noticeable because get_rel_sync_entry is a
> hot function which is executed for each change. And the 'am_partition' and
> 'relkind' are necessary only when we need to rebuild the RelationSyncEntry.
> 
> Here is the perf result for the case where large amounts of data are
> inserted into an un-published table, in which case the cost is
> noticeable.
> 
> --12.83%--pgoutput_change
> |--11.84%--get_rel_sync_entry
>   |--4.76%--get_rel_relispartition
>   |--4.70%--get_rel_relkind
> 
> So, I think it would be better if we do the initialization only when
> the RelationSyncEntry is invalid.
> 
> Attached is a small patch which delays the initialization.
> 
> Thoughts?

A simple benchmark that replicates the pgbench workload showed me that
the function doesn't enter the path that uses am_partition and
relkind in almost all (99.999..%) cases, and I don't think that is a
special case. Thus I think we can expect a gain of about 10%
without any possibility of loss.

In addition, it is simply good practice to keep variable scopes
narrow.

So +1 for this change.
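
For reference, the shape of the change under discussion is roughly the
following (a sketch based on the quoted code, not the patch itself;
RelationSyncCache and replicate_valid are the existing pgoutput.c names):

	entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
											  (void *) &relid,
											  HASH_ENTER, &found);

	if (!entry->replicate_valid)
	{
		/*
		 * Look these up only on the rebuild path; doing it
		 * unconditionally at the top costs two catalog cache accesses
		 * per change even when the entry is already valid.
		 */
		bool	am_partition = get_rel_relispartition(relid);
		char	relkind = get_rel_relkind(relid);

		/* ... rebuild the entry using am_partition and relkind ... */
	}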

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Buildfarm support for older versions

2021-12-22 Thread Larry Rosenman

On 12/22/2021 7:20 am, Andrew Dunstan wrote:

On 12/21/21 15:06, Larry Rosenman wrote:

I filled out that form on the 16th, and haven't gotten a new animal
assignment.  Is there
a problem with my data?




It's a manual process, done when your friendly admins have time. I have
approved it now.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com



REL9_2_STABLE make dies on:
gmake[4]: Entering directory 
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/src/backend/utils'
'/usr/bin/perl' ./generate-errcodes.pl 
../../../src/backend/utils/errcodes.txt > errcodes.h
cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv 
-Wno-unused-command-line-argument -Wno-compound-token-split-by-macro 
-Wno-sometimes-uninitialized -g -I../../src/port -DFRONTEND 
-I../../src/include -I/usr/local/include  -c -o path.o path.c
gmake[4]: Leaving directory 
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/src/backend/utils'

prereqdir=`cd 'utils/' >/dev/null && pwd` && \
  cd '../../src/include/utils/' && rm -f errcodes.h && \
  ln -s "$prereqdir/errcodes.h" .
gmake[3]: Leaving directory 
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/src/backend'
cc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing

$ tail -30  make.log
ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
__mb_sb_limit; recompile with -fPIC

defined in /lib/libc.so.7
referenced by pgstrcasecmp.c:109
  pgstrcasecmp.o:(pg_toupper) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
_CurrentRuneLocale; recompile with -fPIC

defined in /lib/libc.so.7
referenced by runetype.h:0 (/usr/include/runetype.h:0)
  pgstrcasecmp.o:(pg_toupper) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
__mb_sb_limit; recompile with -fPIC

defined in /lib/libc.so.7
referenced by pgstrcasecmp.c:126
  pgstrcasecmp.o:(pg_tolower) in archive 
../../src/port/libpgport.a


ld: error: relocation R_X86_64_PC32 cannot be used against symbol 
_CurrentRuneLocale; recompile with -fPIC

defined in /lib/libc.so.7
referenced by runetype.h:0 (/usr/include/runetype.h:0)
  pgstrcasecmp.o:(pg_tolower) in archive 
../../src/port/libpgport.a
cc: error: linker command failed with exit code 1 (use -v to see 
invocation)

gmake[3]: *** [../../src/Makefile.port:20: timetravel.so] Error 1
gmake[3]: *** Waiting for unfinished jobs
rm moddatetime.o autoinc.o refint.o timetravel.o insert_username.o
gmake[3]: Leaving directory 
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/contrib/spi'

gmake[2]: *** [GNUmakefile:126: submake-contrib-spi] Error 2
gmake[2]: *** Waiting for unfinished jobs
gmake[2]: Leaving directory 
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/src/test/regress'

gmake[1]: *** [Makefile:33: all-test/regress-recurse] Error 2
gmake[1]: Leaving directory 
'/home/pgbuildfarm/buildroot/REL9_2_STABLE/pgsql.build/src'

gmake: *** [GNUmakefile:11: all-src-recurse] Error 2
$

The other branches are still running.
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-22 Thread SATYANARAYANA NARLAPURAM
Hi Hackers,

I am considering implementing an RPO (recovery point objective)
enforcement feature for Postgres where WAL writes on the primary are
stalled when the WAL distance between the primary and standby exceeds the
configured (replica_lag_in_bytes) threshold. This feature is particularly
useful in disaster recovery setups where primary and standby are in
different regions and synchronous replication can't be set up for latency
and performance reasons, yet some level of RPO enforcement is required.

The idea here is to calculate the lag between the primary and the
(async?) standby server during XLogInsert and block the caller until the
lag is less than the threshold value. We can calculate the max lag by
iterating over ReplicationSlotCtl->replication_slots. If this is not
something we want to do in core, at least adding a hook in XLogInsert
would be of great value, as sketched below.
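
A hypothetical sketch of such a check (the GUC and function names are
invented for illustration; using each slot's restart_lsn as the proxy for
standby progress is a simplification -- a real implementation would more
likely use the flush/replay positions reported by the walsenders):

	/* invented GUC: maximum allowed WAL lag in bytes; 0 disables */
	static int64 replica_lag_in_bytes = 0;

	static void
	ThrottleWALInsert(void)
	{
		if (replica_lag_in_bytes <= 0)
			return;

		for (;;)
		{
			XLogRecPtr	write_lsn = GetXLogWriteRecPtr();
			XLogRecPtr	min_lsn = InvalidXLogRecPtr;

			/* find the furthest-behind replication slot */
			LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
			for (int i = 0; i < max_replication_slots; i++)
			{
				ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

				if (!s->in_use)
					continue;
				if (XLogRecPtrIsInvalid(min_lsn) ||
					s->data.restart_lsn < min_lsn)
					min_lsn = s->data.restart_lsn;
			}
			LWLockRelease(ReplicationSlotControlLock);

			/* within the configured lag (or no slots at all)? proceed */
			if (XLogRecPtrIsInvalid(min_lsn) ||
				(int64) (write_lsn - min_lsn) <= replica_lag_in_bytes)
				return;

			pg_usleep(10000L);	/* wait for the standby to catch up */
		}
	}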

A few other scenarios I can think of with the hook are:

   1. Enforcing RPO as described above
   2. Enforcing rate limit and slow throttling when sync standby is falling
   behind (could be flush lag or replay lag)
   3. Transactional log rate governance - useful for cloud providers to
   provide SKU sizes based on allowed WAL writes.

Thoughts?

Thanks,
Satya


Re: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output

2021-12-22 Thread Peter Geoghegan
On Tue, Dec 21, 2021 at 9:46 PM Greg Stark  wrote:
> Or rather I think a better way to look at it is that the progress
> output for the operator should be separated from the metrics logged.
> As an operator what I want to see is some progress indicator:
> "starting table scan", "overflow at x% of table scanned, starting
> index scan", "processing index 1", "index 2"... so I can have some idea
> of how much longer the vacuum will take and see whether I need to
> raise maintenance_work_mem and by how much. I don't need to see all
> the metrics while it's running.

We have the pg_stat_progress_vacuum view for that these days, of
course. Which has the advantage of working with autovacuum and
manually-run VACUUMs in exactly the same way. I am generally opposed
to any difference between autovacuum and manual VACUUM that isn't
clearly necessary. For example, ANALYZE behaves very differently in a
VACUUM ANALYZE run on a table with a GIN index in autovacuum -- that
seems awful to me.

> 2) I don't much like the format. I want to be able to parse the output
> with awk or mtail or even just grep for relevant lines. Things like
> "index scan not needed" make it hard to parse since you don't know
> what it will look like if they are needed. I would have expected
> something like "index scans: 0" which is actually already there up
> above. I'm not clear how this line is meant to be read. Is it just
> explaining *why* the index scan was skipped? It would just be missing
> entirely if it wasn't skipped?

No, a line that looks very much like the "index scan not needed" line
will always be there. IOW there will reliably be a line that explains
whether or not any index scan took place, and why (or why not).
Whereas there won't ever be a line in VACUUM VERBOSE (as currently
implemented) that tells you about something that might have been
expected to happen, but didn't actually happen.

The same thing cannot be said for every line of the log output,
though. For example, the line about I/O timings only appears with
track_io_timing=on.

I have changed things here quite a bit in the last year. I do try to
stick to the "always show line" convention, if only for the benefit of
humans. If the line doesn't generalize to every situation, then I tend
to doubt that it merits appearing in the summary in the first place.

> Fwiw, having it be parsable is why I wouldn't want it to be multiple
> ereports. That would mean it could get interleaved with other errors
> from other backends. That would be a disaster.

That does seem relevant, but honestly I haven't made that a goal here.

Part of the problem has been with what we've actually shown. Postgres
14 was the first version to separately report on the number of LP_DEAD
line pointers in the table (or left behind in the table when we didn't
do index vacuuming). Prior to 14 we only reported dead tuples. These
seemed to be assumed to be roughly equivalent in the past, but
actually they're totally different things, with many practical
consequences:

https://www.postgresql.org/message-id/flat/CAH2-WzkkGT2Gt4XauS5eQOQi4mVvL5X49hBTtWccC8DEqeNfKA%40mail.gmail.com#b7bd96573a2ca27b023ce78b4a8c2b13

This means that we only just started showing one particular metric
that is of fundamental importance in this log output (and VACUUM
VERBOSE). We also used to show things that had very little relevance,
with slightly different (confusingly similar) metrics shown in each
variant of the instrumentation (a problem that I'm trying to
permanently avoid by unifying everything). While things have improved
a lot here recently, I don't think that things have fully settled yet
-- the output will probably change quite a bit more in Postgres 15.
That makes me a little hesitant to promise very much about making the
output parseable or stable.

That said, I don't want to make it needlessly difficult. That should be avoided.

--
Peter Geoghegan




Are datcollate/datctype always libc even under --with-icu ?

2021-12-22 Thread Chapman Flack
... ok, I see that the answer is yes, according to the commit comment
for eccfef8:

  Currently, ICU-provided collations can only be explicitly named
  collations.  The global database locales are still always libc-provided.

I got there the long way, by first wondering how to tell whether a
datcollate or datctype string was intended for libc or ICU, and then
reading pg_perm_setlocale, and then combing through the docs at
CREATE DATABASE and createdb and initdb and Collation Support and
pg_database and the release notes for 10, sure that I would find
the answer staring at me in one of those places once I knew I was asking.

Next question: the "currently" in that comment suggests that could change,
but is there any present intention to change it, or is this likely to just
be the way it is for the foreseeable future?

Regards,
-Chap




Re: do only critical work during single-user vacuum?

2021-12-22 Thread Peter Geoghegan
On Tue, Dec 21, 2021 at 6:39 PM Masahiko Sawada  wrote:
> Even not in the situation where the database has to run as the
> single-user mode to freeze tuples, I think there would be some use
> cases where users want to run vacuum (in failsafe mode) on tables with
> relfrozenxid/relminmxid greater than their failsafe thresholds before
> falling into that situation. I think it’s common that users are
> somehow monitoring relfrozenxid/relminmxid and want to manually run
> vacuum on them rather than relying on autovacuums. --min-xid-age
> option and --min-mxid-age option of vacuumdb command would be good
> examples. So I think this new command/facility might not necessarily
> need to be specific to single-user mode.
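
(For context, the monitoring referred to above is typically a query along
these lines, shown here just as an illustration:

	SELECT oid::regclass AS relation,
		   age(relfrozenxid) AS xid_age,
		   mxid_age(relminmxid) AS mxid_age
	FROM pg_class
	WHERE relkind IN ('r', 'm', 't')
	ORDER BY age(relfrozenxid) DESC
	LIMIT 10;
)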

It wouldn't be specific to single-user mode, since that is not really
special. It's only special in a way that's quite artificial (it can
continue to allocate XIDs past the point where we usually deem it
unsafe).

So, I think we agree; this new emergency vacuuming feature shouldn't
be restricted to single-user mode in any way, and shouldn't care about
whether we're in single user mode or not when run. OTOH, it probably
will be presented as something that is typically run in single user
mode, in situations like the one John's customer found themselves in
-- disastrous, unpleasant situations. It's not just a good policy
(that makes testing easy). The same kind of problem can easily be
caught a little earlier, before the system actually becomes unable to
allocate new XIDs (when not in single-user mode) -- that's quite
likely, and almost as scary.

As I said before, ISTM that the important thing is to have something
dead simple -- something that is easy to use when woken at 4am, when
the DBA is tired and stressed. Something that makes generic choices,
that are not way too conservative, but also don't risk making the
problem worse instead of better.

-- 
Peter Geoghegan




Re: \d with triggers: more than one row returned by a subquery used as an expression

2021-12-22 Thread Justin Pryzby
On Fri, Dec 17, 2021 at 09:43:56AM -0600, Justin Pryzby wrote:
> I want to mention that the 2nd problem I mentioned here is still broken.
> https://www.postgresql.org/message-id/20210717010259.gu20...@telsasoft.com
> 
> It happens if non-inheritted triggers on child and parent have the same name.

This is the fix I was proposing.

It depends on pg_partition_ancestors() returning the ancestors in order:
this partition => parent => ... => root.
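
For instance, with the trigpart tables from the regression test below, the
assumption is that

	SELECT relid FROM pg_catalog.pg_partition_ancestors('trigpart1');

returns trigpart1 first and then trigpart, i.e. child before parent.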

>From 1c7282f5c06197e178bd23c50a226a148cb66bc8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 16 Jul 2021 19:57:00 -0500
Subject: [PATCH 1/2] psql: fix \d for statement triggers with same name on
 partition and its parent

This depends on pg_partition_ancestors() returning the ancestors in order:
this partition => parent => ... => root.

20210716193319.gs20...@telsasoft.com
---
 src/bin/psql/describe.c|  2 +-
 src/test/regress/expected/triggers.out | 13 +
 src/test/regress/sql/triggers.sql  |  4 
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 42bad9281ed..7ad10fa1a3f 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3012,7 +3012,7 @@ describeOneTableDetails(const char *schemaname,
 		   " FROM pg_catalog.pg_trigger AS u, "
 		   "  pg_catalog.pg_partition_ancestors(t.tgrelid) AS a"
 		   " WHERE u.tgname = t.tgname AND u.tgrelid = a.relid"
-		   "   AND u.tgparentid = 0) AS parent" :
+		   "   AND u.tgparentid = 0 LIMIT 1) AS parent" :
 		   "NULL AS parent"),
 		  oid);
 		if (pset.sversion >= 11)
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 9d529e949f4..44c8cdf206f 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2122,6 +2122,19 @@ Triggers:
 alter table trigpart attach partition trigpart3 FOR VALUES FROM (2000) to (3000); -- fail
 ERROR:  trigger "trg1" for relation "trigpart3" already exists
 drop table trigpart3;
+create trigger samename after delete on trigpart execute function trigger_nothing();
+create trigger samename after delete on trigpart1 execute function trigger_nothing();
+\d trigpart1
+ Table "public.trigpart1"
+ Column |  Type   | Collation | Nullable | Default 
++-+---+--+-
+ a  | integer |   |  | 
+ b  | integer |   |  | 
+Partition of: trigpart FOR VALUES FROM (0) TO (1000)
+Triggers:
+samename AFTER DELETE ON trigpart1 FOR EACH STATEMENT EXECUTE FUNCTION trigger_nothing()
+trg1 AFTER INSERT ON trigpart1 FOR EACH ROW EXECUTE FUNCTION trigger_nothing(), ON TABLE trigpart
+
 drop table trigpart;
 drop function trigger_nothing();
 --
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index eaab642950b..f644a64ca7d 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1428,6 +1428,10 @@ create trigger trg1 after insert on trigpart3 for each row execute procedure tri
 alter table trigpart attach partition trigpart3 FOR VALUES FROM (2000) to (3000); -- fail
 drop table trigpart3;
 
+create trigger samename after delete on trigpart execute function trigger_nothing();
+create trigger samename after delete on trigpart1 execute function trigger_nothing();
+\d trigpart1
+
 drop table trigpart;
 drop function trigger_nothing();
 
-- 
2.17.0

>From 543badde5a9bc536d5072cdafd4947a46bfb3efa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 3 Apr 2020 22:43:26 -0500
Subject: [PATCH 2/2] psql: Add newlines and spacing in trigger query for \d

cosmetic change to c33869cc3 to make the output of psql -E look pretty.
---
 src/bin/psql/describe.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 7ad10fa1a3f..a82c945a9a2 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3007,12 +3007,12 @@ describeOneTableDetails(const char *schemaname,
 		  "t.tgenabled, t.tgisinternal, %s\n"
 		  "FROM pg_catalog.pg_trigger t\n"
 		  "WHERE t.tgrelid = '%s' AND ",
-		  (pset.sversion >= 13 ?
-		   "(SELECT (NULLIF(a.relid, t.tgrelid))::pg_catalog.regclass"
-		   " FROM pg_catalog.pg_trigger AS u, "
-		   "  pg_catalog.pg_partition_ancestors(t.tgrelid) AS a"
-		   " WHERE u.tgname = t.tgname AND u.tgrelid = a.relid"
-		   "   AND u.tgparentid = 0 LIMIT 1) AS parent" :
+		  (pset.sversion >= 13 ? "\n"
+		   "  (SELECT (NULLIF(a.relid, t.tgrelid))::pg_catalog.regclass\n"
+		   "   FROM pg_catalog.pg_trigger AS u,\n"
+		   "  pg_catalog.pg_partition_ancestors(t.tgrelid) AS a\n"
+		   "   WHERE u.tgname = t.tgname AND u.tgrelid = a.relid\n"
+		   "AND u.tgparentid = 0 LIMIT 1) AS parent" :
 		   "NULL AS parent"),
 		  oid);
 		

Re: track_io_timing default setting

2021-12-22 Thread Peter Geoghegan
On Wed, Dec 22, 2021 at 11:16 AM Stephen Frost  wrote:
> > I set "track_io_timing" to "on" all the time, same as "log_lock_waits",
> > so I'd want them both on by default.
>
> Same.  I'd also push back and ask what modern platforms still require a
> kernel call for gettimeofday, and are we really doing ourselves a favor
> by holding back on enabling this by default due to those?

+1

> If it's such
> an issue, could we figure out a way to have an 'auto' option where we
> detect if the platform has such an issue and disable in that case, but
> enable otherwise?

This is the same principle behind wal_sync_method's per-platform
default, of course. Seems like a similar case to me.

I think that the following heuristic might be a good one: If the
platform uses clock_gettime() (for INSTR_TIME_SET_CURRENT() stuff),
then enable track_io_timing by default on that platform. Otherwise,
disable it by default. (Plus do whatever makes the most sense on
Windows, which uses something else entirely.)
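
A sketch of how that default could be wired up (DEFAULT_TRACK_IO_TIMING is
an invented name; HAVE_CLOCK_GETTIME is the existing configure symbol that
instr_time.h already keys on, with Windows decided separately):

	#if defined(HAVE_CLOCK_GETTIME)
	#define DEFAULT_TRACK_IO_TIMING	true	/* cheap monotonic clock */
	#else
	#define DEFAULT_TRACK_IO_TIMING	false	/* gettimeofday may be a syscall */
	#endif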

The issue with gettimeofday() seems to be that it isn't really
intended for the same purpose as clock_gettime() -- it's just for
getting the time, not measuring elapsed time. It seems reasonable to
suppose that an operating system that offers a facility for measuring
elapsed time won't have horrible performance problems. clock_gettime()
first appeared in POSIX almost 30 years ago.

-- 
Peter Geoghegan




Re: track_io_timing default setting

2021-12-22 Thread Stephen Frost
Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> On Fri, 2021-12-10 at 10:20 -0500, Tom Lane wrote:
> > Jeff Janes  writes:
> > > Can we change the default setting of track_io_timing to on?
> > 
> > That adds a very significant amount of overhead on some platforms
> > (gettimeofday is not cheap if it requires a kernel call).  And I
> > doubt the claim that the average Postgres user needs this, and
> > doubt even more that they need it on all the time.
> > So I'm -1 on the idea.
> 
> I set "track_io_timing" to "on" all the time, same as "log_lock_waits",
> so I'd want them both on by default.

Same.  I'd also push back and ask what modern platforms still require a
kernel call for gettimeofday, and are we really doing ourselves a favor
by holding back on enabling this by default due to those?  If it's such
an issue, could we figure out a way to have an 'auto' option where we
detect if the platform has such an issue and disable in that case, but
enable otherwise?

Thanks,

Stephen




Re: sequences vs. synchronous replication

2021-12-22 Thread Tomas Vondra

On 12/22/21 18:50, Fujii Masao wrote:



On 2021/12/22 21:11, Tomas Vondra wrote:

Interesting idea, but I think it has a couple of issues :-(


Thanks for the review!

1) We'd need to know the LSN of the last WAL record for any given 
sequence, and we'd need to communicate that between backends somehow. 
Which seems rather tricky to do without affecting performance.


How about using the page lsn for the sequence? nextval_internal() 
already uses that to check whether it's less than or equal to checkpoint 
redo location.




Hmm, maybe.



2) SyncRepWaitForLSN() is used only in commit-like situations, and 
it's a simple wait, not a decision to write more WAL. Environments 
without sync replicas are affected by this too - yes, the data loss 
issue is not there, but the amount of WAL is still increased.


How about reusing only a part of code in SyncRepWaitForLSN()? Attached 
is the PoC patch that implemented what I'm thinking.



IIRC sync_standby_names can change while a transaction is running, 
even just right before commit, at which point we can't just go back in 
time and generate WAL for sequences accessed earlier. But we still 
need to ensure the sequence is properly replicated.


Yes. In the PoC patch, SyncRepNeedsWait() still checks 
sync_standbys_defined and uses SyncRepWaitMode. But they should not be 
checked nor used because their values can be changed on the fly, as you 
pointed out. Probably SyncRepNeedsWait() will need to be changed so that 
it doesn't use them.




Right. I think the data loss with sync standby is merely a symptom, not 
the root cause. We'd need to deduce the LSN for which to wait at commit.




3) I don't think it'd actually reduce the amount of WAL records in 
environments with many sessions (incrementing the same sequence). In 
those cases the WAL (generated by in-progress xact from another 
session) is likely to not be flushed, so we'd generate the extra WAL 
record. (And if the other backends would need flush LSN of this new 
WAL record, which would make it more likely they have to generate WAL 
too.)


With the PoC patch, only when a previous transaction that executed
nextval() and caused a WAL record is aborted does a subsequent nextval()
generate an additional WAL record. So this approach can reduce WAL volume
compared to the other approach?

> In the PoC patch, to reduce WAL volume more, it might be better to make
> nextval_internal() update XactLastRecEnd and assign an XID rather than
> emitting a new WAL record, when SyncRepNeedsWait() returns true.




Yes, but I think there are other cases. For example, the WAL might have
been generated by another backend, in a transaction that might still be
running, in which case I don't see how updating XactLastRecEnd in
nextval_internal() would fix this, right?


I did some experiments with increasing CACHE for the sequence, and that 
mostly eliminates the overhead. See the message I sent a couple minutes 
ago. IMHO that's a reasonable solution for the tiny number of people 
using nextval() in a way that'd be affected by this (i.e. without 
writing anything else in the xact).



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: sequences vs. synchronous replication

2021-12-22 Thread Tomas Vondra



On 12/21/21 03:49, Tomas Vondra wrote:

On 12/21/21 02:01, Tom Lane wrote:

Tomas Vondra  writes:

OK, I did a quick test with two very simple benchmarks - simple select
from a sequence, and 'pgbench -N' on scale 1. Benchmark was on current
master, patched means SEQ_LOG_VALS was set to 1.


But ... pgbench -N doesn't use sequences at all, does it?

Probably inserts into a table with a serial column would constitute a
plausible real-world case.



D'oh! For some reason I thought pgbench has a sequence on the history 
table, but clearly I was mistaken. There's another thinko, because after 
inspecting pg_waldump output I realized "SEQ_LOG_VALS 1" actually logs 
only every 2nd increment. So it should be "SEQ_LOG_VALS 0".


So I repeated the test fixing SEQ_LOG_VALS, and doing the pgbench with a 
table like this:


   create table test (a serial, b int);

and a script doing

   insert into test (b) values (1);

The results look like this:

1) select nextval('s');

  clients        1        4
  --------------------------
  master     39533   124998
  patched     3748     9114
  --------------------------
  diff        -91%     -93%


2) insert into test (b) values (1);

  clients        1        4
  --------------------------
  master      3718     9188
  patched     3698     9209
  --------------------------
  diff          0%       0%

So the nextval() results are a bit worse, due to not caching 1/2 the 
nextval calls. The -90% is roughly expected, due to generating about 32x 
more WAL (and having to wait for commit).


But results for the more realistic insert workload are about the same as 
before (i.e. no measurable difference). Also kinda expected, because 
those transactions have to wait for WAL anyway.




Attached is a patch tweaking WAL logging - in wal_level=minimal we do 
the same thing as now, in higher levels we log every sequence fetch.


After thinking about this a bit more, I think even the nextval workload 
is not such a big issue, because we can set cache for the sequences. 
Until now this had fairly limited impact, but it can significantly 
reduce the performance drop caused by WAL-logging every sequence fetch.


I've repeated the nextval test on a different machine (the one I used 
before is busy with something else), and the results look like this:


1) 1 client

  cache          1       32      128
  -----------------------------------
  master     13975    14425    19886
  patched      886     7900    18397
  -----------------------------------
  diff        -94%     -45%      -7%

4) 4 clients

  cache          1       32      128
  -----------------------------------
  master      8338    12849    18248
  patched      331     8124    18983
  -----------------------------------
  diff        -96%     -37%       4%

So I think this makes it acceptable / manageable. Of course, this means
the values are much less monotonic (across backends), but I don't think
we ever really promised that. And I doubt anyone is really using sequences
like this (just nextval) in performance-critical use cases.
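
(That is, the mitigation is just the ordinary sequence option, e.g.

	CREATE SEQUENCE s CACHE 128;
	ALTER SEQUENCE s CACHE 128;

with which each backend pre-allocates 128 values locally and, with the
patch, writes one WAL record per batch instead of one per nextval() call.)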


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

>From eeaa7cb36c69af048f0321e4883864ebe2542429 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Wed, 22 Dec 2021 03:18:46 +0100
Subject: [PATCH 1/6] WAL-log individual sequence fetches

---
 src/backend/commands/sequence.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 72bfdc07a4..0f309d0a4e 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -52,6 +52,9 @@
  * We don't want to log each fetching of a value from a sequence,
  * so we pre-log a few fetches in advance. In the event of
  * crash we can lose (skip over) as many values as we pre-logged.
+ *
+ * We only pre-log fetches in wal_level=minimal. For higher levels we
+ * WAL-log every individual sequence increment, as if this was 0.
  */
 #define SEQ_LOG_VALS	32
 
@@ -666,11 +669,18 @@ nextval_internal(Oid relid, bool check_permissions)
 	 * WAL record to be written anyway, else replay starting from the
 	 * checkpoint would fail to advance the sequence past the logged values.
 	 * In this case we may as well fetch extra values.
+	 *
+	 * We only pre-log fetches in wal_level=minimal. For higher levels we
+	 * WAL-log every individual sequence increment.
 	 */
 	if (log < fetch || !seq->is_called)
 	{
 		/* forced log to satisfy local demand for values */
-		fetch = log = fetch + SEQ_LOG_VALS;
+		if (XLogIsNeeded())
+			fetch = log = fetch;
+		else
+			fetch = log = fetch + SEQ_LOG_VALS;
+
 		logit = true;
 	}
 	else
@@ -680,7 +690,11 @@ nextval_internal(Oid relid, bool check_permissions)
 		if (PageGetLSN(page) <= redoptr)
 		{
 			

Re: sequences vs. synchronous replication

2021-12-22 Thread Fujii Masao



On 2021/12/22 21:11, Tomas Vondra wrote:

Interesting idea, but I think it has a couple of issues :-(


Thanks for the review!


1) We'd need to know the LSN of the last WAL record for any given sequence, and 
we'd need to communicate that between backends somehow. Which seems rather 
tricky to do without affecting performance.


How about using the page lsn for the sequence? nextval_internal() already uses 
that to check whether it's less than or equal to checkpoint redo location.



2) SyncRepWaitForLSN() is used only in commit-like situations, and it's a 
simple wait, not a decision to write more WAL. Environments without sync 
replicas are affected by this too - yes, the data loss issue is not there, but 
the amount of WAL is still increased.


How about reusing only a part of code in SyncRepWaitForLSN()? Attached is the 
PoC patch that implemented what I'm thinking.



IIRC sync_standby_names can change while a transaction is running, even just 
right before commit, at which point we can't just go back in time and generate 
WAL for sequences accessed earlier. But we still need to ensure the sequence is 
properly replicated.


Yes. In the PoC patch, SyncRepNeedsWait() still checks sync_standbys_defined 
and uses SyncRepWaitMode. But they should not be checked nor used because their 
values can be changed on the fly, as you pointed out. Probably 
SyncRepNeedsWait() will need to be changed so that it doesn't use them.



3) I don't think it'd actually reduce the amount of WAL records in environments 
with many sessions (incrementing the same sequence). In those cases the WAL 
(generated by in-progress xact from another session) is likely to not be 
flushed, so we'd generate the extra WAL record. (And if the other backends 
would need flush LSN of this new WAL record, which would make it more likely 
they have to generate WAL too.)


With the PoC patch, only when a previous transaction that executed nextval()
and caused a WAL record is aborted does a subsequent nextval() generate an
additional WAL record. So this approach can reduce WAL volume compared to
the other approach?

In the PoC patch, to reduce WAL volume more, it might be better to make
nextval_internal() update XactLastRecEnd and assign an XID rather than
emitting a new WAL record, when SyncRepNeedsWait() returns true.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 72bfdc07a4..38cd55b81a 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -37,6 +37,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_type.h"
+#include "replication/syncrep.h"
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "storage/smgr.h"
@@ -676,8 +677,9 @@ nextval_internal(Oid relid, bool check_permissions)
else
{
XLogRecPtr  redoptr = GetRedoRecPtr();
+   XLogRecPtr  pagelsn = PageGetLSN(page);
 
-   if (PageGetLSN(page) <= redoptr)
+   if (pagelsn <= redoptr || SyncRepNeedsWait(pagelsn))
{
/* last update of seq was before checkpoint */
fetch = log = fetch + SEQ_LOG_VALS;
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index bdbc9ef844..589364cb58 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -130,6 +130,34 @@ static bool SyncRepQueueIsOrderedByLSN(int mode);
  * ===
  */
 
+bool
+SyncRepNeedsWait(XLogRecPtr lsn)
+{
+   int mode;
+
+   if (!SyncRepRequested() ||
+   !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
+   return false;
+
+   mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
+
+   Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
+   Assert(WalSndCtl != NULL);
+
+   LWLockAcquire(SyncRepLock, LW_SHARED);
+   Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);
+
+   if (!WalSndCtl->sync_standbys_defined ||
+   lsn <= WalSndCtl->lsn[mode])
+   {
+   LWLockRelease(SyncRepLock);
+   return false;
+   }
+
+   LWLockRelease(SyncRepLock);
+   return true;
+}
+
 /*
  * Wait for synchronous replication, if requested by user.
  *
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 4266afde8b..b08f3f32d1 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -82,6 +82,7 @@ extern char *syncrep_parse_error_msg;
 extern char *SyncRepStandbyNames;
 
 /* called by user backend */
+extern bool SyncRepNeedsWait(XLogRecPtr lsn);
 extern void SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
 
 /* called at backend exit */


Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno

2021-12-22 Thread Andrey Lepikhov

On 22/12/2021 20:42, Tom Lane wrote:

Andrey Lepikhov  writes:

On 5/2/2020 01:24, Tom Lane wrote:

I've not written any actual code, but am close to being ready to.



This thread gives us hope to get started on solving some of the basic
planner problems.
But there is no activity for a long time, as far as I can see. Have you tried
to implement this idea? Is it still relevant now?


It's been on the back burner for awhile :-(.  I've not forgotten
about it though.
I would like to try to develop this feature. The idea is clear to me, but the 
definition of the NullableVars structure is not obvious. Could you prepare a 
sketch of this struct, or do you already have some draft code?


--
regards,
Andrey Lepikhov
Postgres Professional




Re: pg_upgrade should truncate/remove its logs before running

2021-12-22 Thread Justin Pryzby
On Wed, Dec 22, 2021 at 09:52:26AM -0500, Tom Lane wrote:
> Michael Paquier  writes:
> > On Mon, Dec 20, 2021 at 09:39:26PM -0600, Justin Pryzby wrote:
> >> Are you suggesting to remove these ?
> >> -/pg_upgrade_internal.log
> >> -/loadable_libraries.txt
> 
> > Yep, it looks so as these are part of the logs, the second one being a
> > failure state.
> 
> >> -/reindex_hash.sql
> 
> > But this one is not, no?
> 
> I'd like to get to a state where there's just one thing to "rm -rf"
> to clean up after any pg_upgrade run.  If we continue to leave the
> we-suggest-you-run-these scripts loose in $CWD then we've not really
> improved things much.

My patch moves reindex_hash.sql, and I'm having trouble seeing why it shouldn't
be handled in .gitignore the same way as other stuff that's moved.

But delete-old-cluster.sh is not moved, and I'm not sure how to improve on
that.

> Perhaps there'd be merit in putting log files into an additional
> subdirectory of that output directory, like
> pg_upgrade_output.d/logs/foo.log, so that the more-ignorable
> output files would be separated from the less-ignorable ones.
> Or perhaps that's just gilding the lily.

In the case it's successful, everything is removed - except for the delete
script.  I can see the case for separating the dumps (which are essentially
internal and of which there may be many) and the logs (same), from
the .txt error files like loadable_libraries.txt (which are user-facing).

It could also be divided with each DB having its own subdir, with a dumpfile
and a logfile.
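
Hypothetically (all names below are invented, just to illustrate), that
layout might look like:

    pg_upgrade_output.d/
        log/
            pg_upgrade_internal.log
        postgres/
            dump.sql
            log.txt
        template1/
            dump.sql
            log.txt
        loadable_libraries.txt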

Should the unix socket be created underneath the "output dir"?

Should it be possible to set the output dir to "."?  That would give the
pre-existing behavior, but only if we don't use subdirs for log/ and dump/.
>From 8d82297d0e46b48460820ca338781d352093443a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 11 Dec 2021 19:19:50 -0600
Subject: [PATCH] pg_upgrade: write log files and dumps in subdir..

The subdir must not exist when pg_upgrade is started.
This avoids appending to logfiles which have pre-existing errors in them.
It also makes cleaning up after pg_upgrade easier.
---
 doc/src/sgml/ref/pgupgrade.sgml | 15 ++-
 src/bin/pg_upgrade/.gitignore   |  3 --
 src/bin/pg_upgrade/check.c  | 14 +++---
 src/bin/pg_upgrade/dump.c   |  6 ++-
 src/bin/pg_upgrade/exec.c   |  5 ++-
 src/bin/pg_upgrade/function.c   |  3 +-
 src/bin/pg_upgrade/option.c | 31 --
 src/bin/pg_upgrade/pg_upgrade.c | 76 +++--
 src/bin/pg_upgrade/pg_upgrade.h |  1 +
 src/bin/pg_upgrade/server.c |  6 ++-
 10 files changed, 119 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index c5ce732ee98..e6d730a2574 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -130,6 +130,19 @@ PostgreSQL documentation
   cluster
  
 
+ 
+  -l
+  --logdir
+  Directory where SQL and log files will be written.
+  The directory will be created and must not exist before calling
+  pg_upgrade.
+  It will be removed after a successful upgrade unless
+  --retain is specified.
+  The default is pg_upgrade_output.d in the current
+  working directory.
+  
+ 
+
  
   -N
   --no-sync
@@ -768,7 +781,7 @@ psql --username=postgres --file=script.sql postgres
 
   
pg_upgrade creates various working files, such
-   as schema dumps, in the current working directory.  For security, be sure
+   as schema dumps, below the current working directory.  For security, be sure
that that directory is not readable or writable by any other users.
   
 
diff --git a/src/bin/pg_upgrade/.gitignore b/src/bin/pg_upgrade/.gitignore
index 2d3bfeaa502..7222f8a6b1f 100644
--- a/src/bin/pg_upgrade/.gitignore
+++ b/src/bin/pg_upgrade/.gitignore
@@ -1,9 +1,6 @@
 /pg_upgrade
 # Generated by test suite
-/pg_upgrade_internal.log
 /delete_old_cluster.sh
 /delete_old_cluster.bat
-/reindex_hash.sql
-/loadable_libraries.txt
 /log/
 /tmp_check/
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 1b8ef79242b..d422e580e07 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -552,7 +552,7 @@ create_script_for_old_cluster_deletion(char **deletion_script_file_name)
 
 	prep_status("Creating script to delete old cluster");
 
-	if ((script = fopen_priv(*deletion_script_file_name, "w")) == NULL)
+	if ((script = fopen_priv(*deletion_script_file_name, "w")) == NULL) // OK ?
 		pg_fatal("could not open file \"%s\": %s\n",
  *deletion_script_file_name, strerror(errno));
 
@@ -784,7 +784,8 @@ check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster)
 		return;
 	}
 
-	snprintf(output_path, sizeof(output_path),
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
 			 "contrib_isn_and_int8_pass_by_value.txt");
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
@@ -861,7 

Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno

2021-12-22 Thread Tom Lane
Andrey Lepikhov  writes:
> On 5/2/2020 01:24, Tom Lane wrote:
>> I've not written any actual code, but am close to being ready to.

> This thread gives us hope to get started on solving some of the basic 
> planner problems.
> But there is no activity for a long time, as far as I can see. Have you tried 
> to implement this idea? Is it still relevant now?

It's been on the back burner for awhile :-(.  I've not forgotten
about it though.

regards, tom lane




Re: Logical replication timeout problem

2021-12-22 Thread Fabrice Chapuis
Hello Amit,

I was able to reproduce the timeout problem in the lab.
After loading more than 20 million rows into a table which is not
replicated (the insert command ends without error), errors related to logical
replication processes appear in the postgres log.
The worker process is restarted approximately every 5 minutes. The snap files
in the slot directory are still present, and the replication system seems to
be blocked. Why are these snap files not removed? What do they contain?
I will recompile postgres with your patch to debug.

2021-12-22 14:54:21.506 CET [64939] STATEMENT:  START_REPLICATION SLOT
"sub008_s00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s00"')
2021-12-22 15:01:20.908 CET [64938] ERROR:  terminating logical replication
worker due to timeout
2021-12-22 15:01:20.911 CET [61827] LOG:  worker process: logical
replication worker for subscription 26994 (PID 64938) exited with exit code
1
2021-12-22 15:01:20.923 CET [65037] LOG:  logical replication apply worker
for subscription "sub008_s00" has started
2021-12-22 15:01:20.932 CET [65038] ERROR:  replication slot
"sub008_s00" is active for PID 64939
2021-12-22 15:01:20.932 CET [65038] STATEMENT:  START_REPLICATION SLOT
"sub008_s00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s00"')
2021-12-22 15:01:20.932 CET [65037] ERROR:  could not start WAL streaming:
ERROR:  replication slot "sub008_s00" is active for PID 64939
2021-12-22 15:01:20.933 CET [61827] LOG:  worker process: logical
replication worker for subscription 26994 (PID 65037) exited with exit code
1
2021-12-22 15:01:25.944 CET [65039] LOG:  logical replication apply worker
for subscription "sub008_s00" has started
2021-12-22 15:01:25.951 CET [65040] ERROR:  replication slot
"sub008_s00" is active for PID 64939
2021-12-22 15:01:25.951 CET [65040] STATEMENT:  START_REPLICATION SLOT
"sub008_s00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s00"')
2021-12-22 15:01:25.951 CET [65039] ERROR:  could not start WAL streaming:
ERROR:  replication slot "sub008_s00" is active for PID 64939
2021-12-22 15:01:25.952 CET [61827] LOG:  worker process: logical
replication worker for subscription 26994 (PID 65039) exited with exit code
1
2021-12-22 15:01:30.962 CET [65041] LOG:  logical replication apply worker
for subscription "sub008_s00" has started
2021-12-22 15:01:30.970 CET [65042] ERROR:  replication slot
"sub008_s00" is active for PID 64939
2021-12-22 15:01:30.970 CET [65042] STATEMENT:  START_REPLICATION SLOT
"sub008_s00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s00"')
2021-12-22 15:01:30.970 CET [65041] ERROR:  could not start WAL streaming:
ERROR:  replication slot "sub008_s00" is active for PID 64939
2021-12-22 15:01:30.971 CET [61827] LOG:  worker process: logical
replication worker for subscription 26994 (PID 65041) exited with exit code
1
2021-12-22 15:01:35.982 CET [65043] LOG:  logical replication apply worker
for subscription "sub008_s00" has started
2021-12-22 15:01:35.990 CET [65044] ERROR:  replication slot
"sub008_s00" is active for PID 64939
2021-12-22 15:01:35.990 CET [65044] STATEMENT:  START_REPLICATION SLOT
"sub008_s00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s00"')

-rw---. 1 postgres postgres 16270723 Dec 22 16:02
xid-14312-lsn-23-9900.snap
-rw---. 1 postgres postgres 16145717 Dec 22 16:02
xid-14312-lsn-23-9A00.snap
-rw---. 1 postgres postgres 10889437 Dec 22 16:02
xid-14312-lsn-23-9B00.snap
[postgres@s729058a debug]$ ls -ltr pg_replslot/sub008_s012a00/ | wc -l
1420

On Fri, Nov 12, 2021 at 5:57 PM Fabrice Chapuis 
wrote:

> I made a mistake in the configuration of my test script, in fact I cannot
> reproduce the problem at the moment.
> Yes, on the original environment there is physical replication, that's why
> for the lab I configured 2 nodes with physical replication.
> I'll try new tests next week
> Regards
>
> On Fri, Nov 12, 2021 at 7:23 AM Amit Kapila 
> wrote:
>
>> On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis
>>  wrote:
>> >
>> > Hello,
>> > Our lab is ready now. Amit, I compiled Postgres 10.18 with your
>> patch. Tang, I used your script to configure logical replication between 2
>> databases and to generate 10 million entries in an unreplicated foo table.
>> On a standalone instance no error message appears in log.
>> > I activated the physical replication between 2 nodes, and I got
>> the following error:
>> >
>> > 2021-11-10 10:49:12.297 CET [12126] LOG:  attempt to send keep alive
>> message
>> > 2021-11-10 10:49:12.297 CET [12126] STATEMENT:  START_REPLICATION
>> 0/300 TIMELINE 1
>> > 2021-11-10 10:49:15.127 CET [12064] FATAL:  terminating logical
>> replication worker due to administrator command
>> > 2021-11-10 10:49:15.127 CET [12036] LOG:  worker process: logical
>> replication worker for subscription 16413 (PID 12064) exited with 

Re: do only critical work during single-user vacuum?

2021-12-22 Thread John Naylor
On Tue, Dec 21, 2021 at 10:39 PM Masahiko Sawada  wrote:
>
> On Wed, Dec 22, 2021 at 6:56 AM Peter Geoghegan  wrote:
> >
> > This new command/facility should probably not be a new flag to the
> > VACUUM command, as such. Rather, I think that it should either be an
> > SQL-callable function, or a dedicated top-level command (that doesn't
> > accept any tables). The only reason to have this is for scenarios
> > where the user is already in a tough spot with wraparound failure,
> > like that client of yours. Nobody wants to force the failsafe for one
> > specific table. It's not general purpose, at all, and shouldn't claim
> > to be.
>
> Even outside the situation where the database has to run in single-user
> mode to freeze tuples, I think there would be some use
> cases where users want to run vacuum (in failsafe mode) on tables with
> relfrozenxid/relminmxid greater than their failsafe thresholds before
> falling into that situation. I think it’s common that users are
> somehow monitoring relfrozenxid/relminmxid and want to manually run
> vacuum on them rather than relying on autovacuums. --min-xid-age
> option and --min-mxid-age option of vacuumdb command would be good
> examples. So I think this new command/facility might not necessarily
> need to be specific to single-user mode.

If we want to leave open the possibility to specify these parameters,
a SQL-callable function seems like the way to go. And even if we
don't, a function is fine.
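
For illustration, the shell of such a SQL-callable function might look
roughly like this (the function name, signature, and message are all
hypothetical, just to show the shape):

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(pg_vacuum_wraparound);

Datum
pg_vacuum_wraparound(PG_FUNCTION_ARGS)
{
	int32		min_xid_age = PG_GETARG_INT32(0);

	/*
	 * Here one would scan pg_class for relations whose relfrozenxid age
	 * exceeds min_xid_age and vacuum them with the failsafe forced on;
	 * omitted in this sketch.
	 */
	elog(NOTICE, "would vacuum tables with xid age >= %d", min_xid_age);

	PG_RETURN_VOID();
}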

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: pg_upgrade should truncate/remove its logs before running

2021-12-22 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Dec 20, 2021 at 09:39:26PM -0600, Justin Pryzby wrote:
>> Are you suggesting to remove these ?
>> -/pg_upgrade_internal.log
>> -/loadable_libraries.txt

> Yep, it looks so as these are part of the logs, the second one being a
> failure state.

>> -/reindex_hash.sql

> But this one is not, no?

I'd like to get to a state where there's just one thing to "rm -rf"
to clean up after any pg_upgrade run.  If we continue to leave the
we-suggest-you-run-these scripts loose in $CWD then we've not really
improved things much.

Perhaps there'd be merit in putting log files into an additional
subdirectory of that output directory, like
pg_upgrade_output.d/logs/foo.log, so that the more-ignorable
output files would be separated from the less-ignorable ones.
Or perhaps that's just gilding the lily.

regards, tom lane




Re: [Proposal][WIP] Add option to log auto_explain output to separate logfile

2021-12-22 Thread Pavel Stehule
Hi

On Wed, Dec 22, 2021 at 14:54, Timofey wrote:

> Hello, hackers!
>
> Currently, all auto_explain output is directed to the postgres log, and it
> is not convenient to extract on big, heavily loaded systems.
> My proposal is to add an option for auto_explain to log its data to a
> separate logfile. In my patch I plan to (re)open the file every time the
> associated GUC variable changes, including at extension boot. In case of an
> error or unexpected situation, the file is closed and output is directed to
> the common postgres log.
> What do you think about this idea? I would also be grateful for any ideas
> on how to conveniently and suitably rotate that new log.
>

It is a good idea, but I think it needs to be implemented more generally.
There should be a) the possibility to add some extra information that allows
redirecting into other tools like rsyslog, and b) the possibility to use more
than one logfile - for more extensions, or for a slow-query log, or for
logging sensitive or audit data, ... A general design would also solve the
problem of log rotation.

Regards

Pavel


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-22 Thread Ashutosh Sharma
On Wed, Dec 22, 2021 at 5:06 PM Dilip Kumar  wrote:
>
> On Wed, Dec 22, 2021 at 4:26 PM Ashutosh Sharma  wrote:
>
> >> Basically, ALTER TABLE SET TABLESPACE, will register the
> >> SYNC_UNLINK_REQUEST for the table files w.r.t the old tablespace, but
> >> those will get unlinked during the next checkpoint.  Although the
> >> files are truncated during commit itself, the unlink might not be
> >> processed until the next checkpoint.  This is the explanation for
> >> the behavior you found during your investigation, but I haven't looked
> >> into the issue so I will do it latest by tomorrow and send my
> >> analysis.
> >>
> >> Thanks for working on this.
> >
> >
> > Yeah the problem here is that the old rel file that needs to be unlinked 
> > still exists in the old tablespace. Earlier, without your changes we were 
> > doing force checkpoint before starting with the actual work for the alter 
> > database which unlinked/deleted the rel file from the old tablespace, but 
> > that is not the case here.  Now we have removed the force checkpoint from 
> > movedb() which means until the auto checkpoint happens the old rel file 
> > will remain in the old tablespace thereby creating this problem.
>
> One solution to this problem could be that, similar to mdpostckpt(),
> we invent one more function which takes dboid and dsttblspc oid as
> input and it will unlink all the requests which are w.r.t. the dboid
> and tablespaceoid, and before doing it we should also do
> ForgetDatabaseSyncRequests(), so that next checkpoint does not flush
> some old request.

I couldn't find the mdpostckpt() function. Are you talking about
SyncPostCheckpoint()? Anyway, as you have rightly said, we need to
unlink all the files available inside the dst_tablespaceoid/dst_dboid/
directory by scanning the pendingUnlinks list. And finally, we don't
want the next checkpoint to unlink this file again and PANIC, so for
that we have to update the entry for this unlinked rel file in the
hash table, i.e. cancel the sync request for it.

--
With Regards,
Ashutosh Sharma.




RE: Failed transaction statistics to measure the logical replication progress

2021-12-22 Thread osumi.takami...@fujitsu.com
On Wednesday, December 22, 2021 9:38 PM Wang, Wei/王 威  
wrote:
> I have a question on the v19-0002 patch:
> 
> When I tested this patch, I found pg_stat_subscription_workers has some
> unexpected data.
> For example:
> [Publisher]
> create table replica_test1(a int, b text); create publication pub1 for table
> replica_test1; create table replica_test2(a int, b text); create publication 
> pub2
> for table replica_test2;
> 
> [Subscriber]
> create table replica_test1(a int, b text); create subscription sub1 CONNECTION
> 'dbname=postgres' publication pub1; create table replica_test2(a int, b text);
> create subscription sub2 CONNECTION 'dbname=postgres' publication pub2;
> 
> [Publisher]
> insert into replica_test1 values(1,'1');
> 
> [Subscriber]
> select * from pg_stat_subscription_workers;
> 
> -[ RECORD 1 ]--+--
> subid | 16389
> subname   | sub1
> subrelid  |
> commit_count  | 1
> ...
> -[ RECORD 2 ]--+--
> subid | 16395
> subname   | sub2
> subrelid  |
> commit_count  | 1
> ...
> 
> I originally expected only one record for "sub1".
> 
> I think the reason is that apply_handle_commit() always invokes
> pgstat_report_subworker_xact_end().
> But when we insert data to replica_test1 in publish side:
> [In the publish]
> pub1's walsender1 will send three messages (LOGICAL_REP_MSG_BEGIN,
> LOGICAL_REP_MSG_INSERT and LOGICAL_REP_MSG_COMMIT)
> to sub1's apply worker1.
> pub2's walsender2 will send only two messages (LOGICAL_REP_MSG_BEGIN
> and LOGICAL_REP_MSG_COMMIT) to sub2's apply worker2, because the
> inserted table is not published by pub2.
> 
> [In the subscription]
> sub1's apply worker1 receives LOGICAL_REP_MSG_COMMIT,
>   so it invokes pgstat_report_subworker_xact_end() to increase the
> commit_count of sub1's stats.
> sub2's apply worker2 receives LOGICAL_REP_MSG_COMMIT,
>   and it will do the same to increase the commit_count of sub2's stats.
> 
> Do we expect these commit counts which come from empty transactions?
Hi, thank you so much for your test !


This is another issue, discussed in [1], where the patch in that thread is
still a work in progress, I think.

The point you reported will cause a lot of confusion for users interpreting
the subscription stats values, if those patches, including the subscription
stats patch, do not get committed together (i.e. in the same version).

I've confirmed that HEAD with v19-* applied, plus
v15-0001-Skip-empty-transactions-for-logical-replication.patch
on top of v19-*, showed only one record, as you expected, like below,
although all the patches are not finished yet.

postgres=# select * from pg_stat_subscription_workers;
-[ RECORD 1 ]--+--
subid  | 16389
subname| sub1
subrelid   | 
commit_count   | 1
abort_count| 0
error_count| 0
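
(For reference, my rough understanding of the idea in that
skip-empty-transactions patch is sketched below; this is a simplification,
not the actual patch - the real one would keep the pending-BEGIN state per
transaction, e.g. in txn->output_plugin_private, rather than a static flag.)

static bool begin_pending = false;

static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
	/* Defer sending BEGIN until we see an actual change. */
	begin_pending = true;
}

static void
pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
				Relation relation, ReorderBufferChange *change)
{
	if (begin_pending)
	{
		OutputPluginPrepareWrite(ctx, false);
		logicalrep_write_begin(ctx->out, txn);
		OutputPluginWrite(ctx, false);
		begin_pending = false;
	}
	/* ... then send the change itself as before ... */
}

static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
					XLogRecPtr commit_lsn)
{
	if (begin_pending)
	{
		/* Empty transaction: send neither BEGIN nor COMMIT. */
		begin_pending = false;
		return;
	}
	/* ... send COMMIT as usual ... */
}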


IMHO, the conclusion is we are currently in the middle of fixing the behavior.

[1] - 
https://www.postgresql.org/message-id/CAFPTHDbVLWxpfnwMxJcXq703gLXciXHE83hwKQ_0OTCZ6oLCjg%40mail.gmail.com


Best Regards,
Takamichi Osumi



Re: parallel vacuum comments

2021-12-22 Thread Amit Kapila
On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila  wrote:
>
> On Wed, Dec 22, 2021 at 5:39 PM houzj.f...@fujitsu.com
>  wrote:
> >
> >
> > 2)
> > +#include "utils/rel.h"
> > +#include "utils/lsyscache.h"
> > +#include "utils/memutils.h"
> >
> > It might be better to keep the header file in alphabetical order.
> > :
> > +#include "utils/lsyscache.h"
> > +#include "utils/memutils.h"
> > +#include "utils/rel.h"
> >
>
> Right, I'll take care of this as I am already making some other edits
> in the patch.
>

Fixed this and made a few other changes in the patch, which include (a)
passing down the num_index_scans information in the parallel APIs, based on
which we can decide whether to reinitialize the DSM or consider a
conditional parallel vacuum cleanup; (b) getting rid of the first-time
variable in ParallelVacuumState, as that is not required if we have the
num_index_scans information; (c) removing quite a few seemingly
unnecessary includes in vacuumparallel.c; (d) no longer setting unnecessary
error callback info in ParallelVacuumState in the leader backend; and (e)
changing/adding comments in quite a few places.

Can you please verify the changes in the attached patch?

-- 
With Regards,
Amit Kapila.


v12-0001-Move-parallel-vacuum-code-to-vacuumparallel.c.patch
Description: Binary data


[Proposal][WIP] Add option to log auto_explain output to separate logfile

2021-12-22 Thread Timofey

Hello, hackers!

Currently, all auto_explain output is directed to the postgres log, and it is
not convenient to extract on big, heavily loaded systems.
My proposal is to add an option for auto_explain to log its data to a separate
logfile. In my patch I plan to (re)open the file every time the associated GUC
variable changes, including at extension boot. In case of an error or
unexpected situation, the file is closed and output is directed to the common
postgres log.
What do you think about this idea? I would also be grateful for any ideas on
how to conveniently and suitably rotate that new log.



diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
--- a/contrib/auto_explain/auto_explain.c	(revision dfaa346c7c00ff8a3fd8ea436a7d5be7527e67cb)
+++ b/contrib/auto_explain/auto_explain.c	(date 1640174495515)
@@ -19,6 +19,8 @@
 #include "common/pg_prng.h"
 #include "executor/instrument.h"
 #include "jit/jit.h"
+#include "storage/ipc.h"
+#include "storage/fd.h"
 #include "utils/guc.h"
 
 PG_MODULE_MAGIC;
@@ -36,6 +38,7 @@
 static int	auto_explain_log_level = LOG;
 static bool auto_explain_log_nested_statements = false;
 static double auto_explain_sample_rate = 1;
+static char *auto_explain_logfile = NULL;
 
 static const struct config_enum_entry format_options[] = {
 	{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -62,6 +65,9 @@
 /* Current nesting depth of ExecutorRun calls */
 static int	nesting_level = 0;
 
+/* Descriptor for open logfile */
+static int ae_logfile = 0;
+
 /* Is the current top-level query to be sampled? */
 static bool current_query_sampled = false;
 
@@ -71,6 +77,7 @@
 	 current_query_sampled)
 
 /* Saved hook values in case of unload */
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
 static ExecutorStart_hook_type prev_ExecutorStart = NULL;
 static ExecutorRun_hook_type prev_ExecutorRun = NULL;
 static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
@@ -86,6 +93,27 @@
 static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
+static void explain_shmem_startup_hook(void);
+
+static void explain_proc_exit(int code, Datum arg);
+
+/* Try to open logfile then value is assigned to auto_explain_logfile, in case of failure -- block assigning */
+static bool explain_guc_open_logfile_check(char **newval, void **extra, GucSource source);
+
+//todo: rotate log
+
+static void close_logfile(void) {
+	if (ae_logfile <= 0)
+		return;
+	if (pg_fsync(ae_logfile) != 0)
+		ereport(WARNING, (errcode_for_file_access(),
+errmsg("auto_explain could not fsync its logfile: \"%s\"", auto_explain_logfile)));
+	if (close(ae_logfile) != 0)
+		ereport(WARNING, (errcode_for_file_access(),
+errmsg("auto_explain could not close its logfile: \"%s\"", auto_explain_logfile)));
+	ae_logfile = 0;
+	SetConfigOption("auto_explain.logfile", NULL, PGC_INTERNAL, PGC_S_OVERRIDE);
+}
 
 /*
  * Module load callback
@@ -231,9 +259,21 @@
 			 NULL,
 			 NULL);
 
+	DefineCustomStringVariable("auto_explain.logfile",
+			   "File to log explained queries",
+			   "If set extension will store explained queries in this file, otherwise in syslog",
+			   _explain_logfile,
+			   NULL,
+			   PGC_SUSET,
+			   0,
+			   explain_guc_logfile_open_check,
+			   NULL, NULL);
+
 	EmitWarningsOnPlaceholders("auto_explain");
 
 	/* Install hooks. */
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = explain_shmem_startup_hook;
 	prev_ExecutorStart = ExecutorStart_hook;
 	ExecutorStart_hook = explain_ExecutorStart;
 	prev_ExecutorRun = ExecutorRun_hook;
@@ -251,12 +291,46 @@
 _PG_fini(void)
 {
 	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
 	ExecutorStart_hook = prev_ExecutorStart;
 	ExecutorRun_hook = prev_ExecutorRun;
 	ExecutorFinish_hook = prev_ExecutorFinish;
 	ExecutorEnd_hook = prev_ExecutorEnd;
 }
 
+/*
+ * shmem_startup_hook: set on_proc_exit
+ */
+static void
+explain_shmem_startup_hook(void) {
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	on_proc_exit(explain_proc_exit, 0);
+}
+
+static bool
+explain_guc_open_logfile_check(char **newval, void **extra, GucSource source) {
+	if (newval && *newval) {
+		if (ae_logfile > 0) {
+			close_logfile();
+		}
+		ae_logfile = BasicOpenFile(*newval, O_WRONLY | O_APPEND | O_CREAT | O_BINARY);
+		if (ae_logfile < 0) {
+			GUC_check_errcode(errcode_for_file_access());
+			GUC_check_errmsg("auto_explain could not access file \"%s\", continue logging to syslog", *newval);
+			GUC_check_errdetail("continue logging to syslog");
+			return false;
+		}
+		return true;
+	}
+	/* No new value: nothing to open; fall back to the regular log. */
+	return true;
+}
+
+static void
+explain_proc_exit(int code, Datum arg) {
+	close_logfile();
+}
+
 /*
  * ExecutorStart hook: start up logging if needed
  */
@@ -427,10 +501,30 @@
 			 * reported.  

Re: Replication slot drop message is sent after pgstats shutdown.

2021-12-22 Thread Masahiko Sawada
On Mon, Dec 13, 2021 at 12:11 PM Kyotaro Horiguchi
 wrote:
>
> At Fri, 10 Dec 2021 18:13:31 +0900, Masahiko Sawada  
> wrote in
> > I agreed with Andres and Horiguchi-san and attached an updated patch.
>
> Thanks for the new version.
>
> It seems fine, but I have some comments from a cosmetic viewpoint.
>
> +   /*
> +* Register callback to make sure cleanup and releasing the 
> replication
> +* slot on exit.
> +*/
> +   ReplicationSlotInitialize();
>
> Actually all the function does is that but it looks slightly
> inconsistent with the function name. I think we can call it just
> "initialization" here. I think we don't need to change the function
> comment the same way but either will do for me.
>
> +ReplicationSlotBeforeShmemExit(int code, Datum arg)
>
> The name looks a bit too verbose. Doesn't just "ReplicationSlotShmemExit" 
> work?
>
> -* so releasing here is fine. There's another cleanup in 
> ProcKill()
> -* ensuring we'll correctly cleanup on FATAL errors as well.
> +* so releasing here is fine. There's another cleanup in
> +* ReplicationSlotBeforeShmemExit() callback ensuring we'll 
> correctly
> +* cleanup on FATAL errors as well.
>
> I'd prefer "during before_shmem_exit()" than "in
> ReplicationSlotBeforeShmemExit() callback" here. (But the current
> wording is also fine by me.)

Thank you for the comments! Agreed with all comments.

I've attached an updated patch. Please review it.

> The attached detects that bug, but I'm not sure it's worth expending
> test time, or this might be in the server test suite.

Thanks. It's convenient to test this issue but I'm also not sure it's
worth adding to the test suite.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v3-0001-Move-replication-slot-release-to-before_shmem_exi.patch
Description: Binary data


Re: Buildfarm support for older versions

2021-12-22 Thread Andrew Dunstan


On 12/21/21 15:06, Larry Rosenman wrote:
> I filled out that form on the 16th, and haven't gotten a new animal
> assignment.  Is there
> a problem with my data?



It's a manual process, done when your friendly admins have time. I have
approved it now.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno

2021-12-22 Thread Andrey Lepikhov

On 5/2/2020 01:24, Tom Lane wrote:

I've not written any actual code, but am close to being ready to.
This thread gives us hope to get started on solving some of the basic 
planner problems.
But there is no activity for a long time, as far as I can see. Have you tried 
to implement this idea? Is it still relevant now?


--
regards,
Andrey Lepikhov
Postgres Professional




Delay the variable initialization in get_rel_sync_entry

2021-12-22 Thread houzj.f...@fujitsu.com
Hi,

When reviewing some logical replication patches, I noticed that in the
function get_rel_sync_entry() we always invoke get_rel_relispartition()
and get_rel_relkind() at the beginning, which can cause unnecessary
cache access.

---
get_rel_sync_entry(PGOutputData *data, Oid relid)
{
RelationSyncEntry *entry;
	bool		am_partition = get_rel_relispartition(relid);
	char		relkind = get_rel_relkind(relid);
---

The extra cost could sometimes be noticeable because get_rel_sync_entry is a
hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.

Here is the perf result for a case where large amounts of data were inserted
into an unpublished table, in which case the cost is noticeable.

--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkind

So, I think it would be better if we do the initialization only when the
RelationSyncEntry is invalid.

Attached is a small patch which delays the initialization.
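
In outline, the idea looks like this (a simplified sketch only, not the
patch itself):

static RelationSyncEntry *
get_rel_sync_entry(PGOutputData *data, Oid relid)
{
	RelationSyncEntry *entry;
	bool		found;

	/* Find cached relation info, creating it if not found */
	entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
											  (void *) &relid,
											  HASH_ENTER, &found);
	...

	if (!entry->replicate_valid)
	{
		/* Pay for the syscache lookups only when rebuilding the entry */
		bool		am_partition = get_rel_relispartition(relid);
		char		relkind = get_rel_relkind(relid);

		...
	}
	...
}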

Thoughts?

Best regards,
Hou zj


0001-Delay-the-variable-initialization-in-get_rel_sync_en.patch
Description:  0001-Delay-the-variable-initialization-in-get_rel_sync_en.patch


Re: parallel vacuum comments

2021-12-22 Thread Amit Kapila
On Wed, Dec 22, 2021 at 5:39 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wed, Dec 22, 2021 11:36 AM Masahiko Sawada  wrote:
> > On Tue, Dec 21, 2021 at 10:24 PM Amit Kapila 
> > wrote:
>
> The patch looks mostly good to me.
> I only have a few comments.
>
> 1)
> +/*
> + * Do parallel index bulk-deletion with parallel workers.
> + */
> +void
> +parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long 
> num_table_tuples)
> +{
> +   Assert(!IsParallelWorker());
> +
>
> Would it be better to also put Assert(pvs != NULL) here? Because we removed
> the Assert(ParallelVacuumIsActive(vacrel)) check in the old function.
>

I am not sure if that is helpful or not because there is only one
caller of it which checks pvs before calling this function.

>
> 2)
> +#include "utils/rel.h"
> +#include "utils/lsyscache.h"
> +#include "utils/memutils.h"
>
> It might be better to keep the header file in alphabetical order.
> :
> +#include "utils/lsyscache.h"
> +#include "utils/memutils.h"
> +#include "utils/rel.h"
>

Right, I'll take care of this as I am already making some other edits
in the patch.

-- 
With Regards,
Amit Kapila.




RE: Failed transaction statistics to measure the logical replication progress

2021-12-22 Thread wangw.f...@fujitsu.com
On Mon, Dec 22, 2021 at 6:14 PM osumi.takami...@fujitsu.com 
 wrote:
>
> Attached the new patch v19.
>

I have a question on the v19-0002 patch:

When I tested this patch, I found pg_stat_subscription_workers has some 
unexpected data.
For example:
[Publisher]
create table replica_test1(a int, b text); create publication pub1 for table 
replica_test1;
create table replica_test2(a int, b text); create publication pub2 for table 
replica_test2;

[Subscriber]
create table replica_test1(a int, b text); create subscription sub1 CONNECTION 
'dbname=postgres' publication pub1;
create table replica_test2(a int, b text); create subscription sub2 CONNECTION 
'dbname=postgres' publication pub2;

[Publisher]
insert into replica_test1 values(1,'1');

[Subscriber]
select * from pg_stat_subscription_workers;

-[ RECORD 1 ]--+--
subid   | 16389
subname | sub1
subrelid|
commit_count| 1
...
-[ RECORD 2 ]--+--
subid   | 16395
subname | sub2
subrelid|
commit_count| 1
...

I originally expected only one record for "sub1". 

I think the reason is that apply_handle_commit() always invokes
pgstat_report_subworker_xact_end().
But when we insert data to replica_test1 in publish side:
[In the publish]
pub1's walsender1 will send three messages (LOGICAL_REP_MSG_BEGIN,
LOGICAL_REP_MSG_INSERT and LOGICAL_REP_MSG_COMMIT)
to sub1's apply worker1.
pub2's walsender2 will send only two messages (LOGICAL_REP_MSG_BEGIN and
LOGICAL_REP_MSG_COMMIT)
to sub2's apply worker2, because the inserted table is not published by pub2.

[In the subscription]
sub1's apply worker1 receives LOGICAL_REP_MSG_COMMIT,
so it invokes pgstat_report_subworker_xact_end() to increase the commit_count
of sub1's stats.
sub2's apply worker2 receives LOGICAL_REP_MSG_COMMIT,
and it will do the same to increase the commit_count of sub2's stats.

Do we expect these commit counts which come from empty transactions?

Regards,
Wang wei


Re: sequences vs. synchronous replication

2021-12-22 Thread Tomas Vondra




On 12/22/21 05:56, Fujii Masao wrote:



On 2021/12/22 10:57, Tomas Vondra wrote:



On 12/19/21 04:03, Amit Kapila wrote:

On Sat, Dec 18, 2021 at 7:24 AM Tomas Vondra
 wrote:


while working on logical decoding of sequences, I ran into an issue 
with

nextval() in a transaction that rolls back, described in [1]. But after
thinking about it a bit more (and chatting with Petr Jelinek), I think
this issue affects physical sync replication too.

Imagine you have a primary <-> sync_replica cluster, and you do this:

    CREATE SEQUENCE s;

    -- shutdown the sync replica

    BEGIN;
    SELECT nextval('s') FROM generate_series(1,50);
    ROLLBACK;

    BEGIN;
    SELECT nextval('s');
    COMMIT;

The natural expectation would be the COMMIT gets stuck, waiting for the
sync replica (which is not running), right? But it does not.



How about if we always WAL log the first sequence change in a 
transaction?




I've been thinking about doing something like this, but I think it 
would not have any significant advantages compared to using 
"SEQ_LOG_VALS 0". It would still have the same performance hit for 
plain nextval() calls, and there's no measurable impact on simple 
workloads that already write WAL in transactions even with 
SEQ_LOG_VALS 0.


Just an idea: if wal_level > minimal, how about making nextval_internal() 
(1) check whether WAL is replicated to sync standbys, up to the page lsn 
of the sequence, and (2) forcibly emit a WAL record if not replicated 
yet? A similar check is performed at the beginning of 
SyncRepWaitForLSN(), so probably we can reuse that code.




Interesting idea, but I think it has a couple of issues :-(

1) We'd need to know the LSN of the last WAL record for any given 
sequence, and we'd need to communicate that between backends somehow. 
Which seems rather tricky to do without affecting performance.


2) SyncRepWaitForLSN() is used only in commit-like situations, and it's 
a simple wait, not a decision to write more WAL. Environments without 
sync replicas are affected by this too - yes, the data loss issue is not 
there, but the amount of WAL is still increased.


IIRC sync_standby_names can change while a transaction is running, even 
just right before commit, at which point we can't just go back in time 
and generate WAL for sequences accessed earlier. But we still need to 
ensure the sequence is properly replicated.


3) I don't think it'd actually reduce the amount of WAL records in 
environments with many sessions (incrementing the same sequence). In 
those cases the WAL (generated by in-progress xact from another session) 
is likely to not be flushed, so we'd generate the extra WAL record. (And 
then the other backends would need the flush LSN of this new WAL record, 
which would make it more likely that they have to generate WAL too.)



So I don't think this would actually help much.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: parallel vacuum comments

2021-12-22 Thread houzj.f...@fujitsu.com
On Wed, Dec 22, 2021 11:36 AM Masahiko Sawada  wrote:
> On Tue, Dec 21, 2021 at 10:24 PM Amit Kapila 
> wrote:
> >
> > On Tue, Dec 21, 2021 at 11:24 AM Masahiko Sawada
>  wrote:
> > >
> > > On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila 
> wrote:
> > > >
> > >
> > > Thank you for the comment. Agreed.
> > >
> > > I've attached updated version patches. Please review them.
> > >
> >
> > These look mostly good to me. Please find attached the slightly edited
> > version of the 0001 patch. I have modified comments, ran pgindent, and
> > modify the commit message. I'll commit this tomorrow if you are fine
> > with it.
> 
> Thank you for committing the first patch.
> 
> I've attached an updated version second patch. Please review it.
> 
> Regards,

Hi,

The patch looks mostly good to me.
I only have a few comments.

1)
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long 
num_table_tuples)
+{
+   Assert(!IsParallelWorker());
+

Would it be better to also put Assert(pvs != NULL) here? Because we removed
the Assert(ParallelVacuumIsActive(vacrel)) check in the old function.


2)
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"

It might be better to keep the header file in alphabetical order.
:
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"

Best regards,
Hou zj


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-22 Thread Dilip Kumar
On Wed, Dec 22, 2021 at 4:26 PM Ashutosh Sharma  wrote:

>> Basically, ALTER TABLE SET TABLESPACE, will register the
>> SYNC_UNLINK_REQUEST for the table files w.r.t the old tablespace, but
>> those will get unlinked during the next checkpoint.  Although the
>> files are truncated during commit itself, the unlink might not be
>> processed until the next checkpoint.  This is the explanation for
>> the behavior you found during your investigation, but I haven't looked
>> into the issue so I will do it latest by tomorrow and send my
>> analysis.
>>
>> Thanks for working on this.
>
>
> Yeah the problem here is that the old rel file that needs to be unlinked 
> still exists in the old tablespace. Earlier, without your changes we were 
> doing force checkpoint before starting with the actual work for the alter 
> database which unlinked/deleted the rel file from the old tablespace, but 
> that is not the case here.  Now we have removed the force checkpoint from 
> movedb() which means until the auto checkpoint happens the old rel file will 
> remain in the old tablespace thereby creating this problem.

One solution to this problem could be that, similar to mdpostckpt(),
we invent one more function which takes dboid and dsttblspc oid as
input and it will unlink all the requests which are w.r.t. the dboid
and tablespaceoid, and before doing it we should also do
ForgetDatabaseSyncRequests(), so that next checkpoint does not flush
some old request.
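
To make that concrete, a very rough sketch of such a helper, modeled on
SyncPostCheckpoint() in src/backend/storage/sync/sync.c (the function name
is invented and the details are unchecked):

void
UnlinkRequestsForMovedDatabase(Oid dbid, Oid tblspcid)
{
	ListCell   *lc;

	foreach(lc, pendingUnlinks)
	{
		PendingUnlinkEntry *entry = (PendingUnlinkEntry *) lfirst(lc);
		RelFileNode *rnode = &entry->tag.rnode;
		char	   *path;

		if (rnode->dbNode != dbid || rnode->spcNode != tblspcid)
			continue;

		/* unlink the first segment, as SyncPostCheckpoint() does */
		path = relpathperm(*rnode, MAIN_FORKNUM);
		if (unlink(path) < 0 && errno != ENOENT)
			ereport(WARNING,
					(errcode_for_file_access(),
					 errmsg("could not remove file \"%s\": %m", path)));
		pfree(path);

		/* drop the request so the next checkpoint doesn't redo it */
		pendingUnlinks = foreach_delete_current(pendingUnlinks, lc);
		pfree(entry);
	}
}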

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-22 Thread Ashutosh Sharma
On Wed, Dec 22, 2021 at 2:44 PM Dilip Kumar  wrote:

> On Tue, Dec 21, 2021 at 11:10 AM Ashutosh Sharma 
> wrote:
> >
> > I am getting the below error when running the same test-case that Neha
> shared in her previous email.
> >
> > ERROR:  55000: some relations of database "test1" are already in
> tablespace "tab1"
> > HINT:  You must move them back to the database's default tablespace
> before using this command.
> > LOCATION:  movedb, dbcommands.c:1555
> >
> > test-case:
> > 
> > create tablespace tab1 location '/home/ashu/test1';
> > create tablespace tab location '/home/ashu/test';
> >
> > create database test tablespace tab;
> > \c test
> >
> > create table t(a int primary key, b text);
> >
> > create or replace function large_val() returns text language sql as
> 'select array_agg(md5(g::text))::text from generate_series(1, 256) g';
> >
> > insert into t values (generate_series(1,10), large_val());
> >
> > alter table t set tablespace tab1 ;
> >
> > \c postgres
> > create database test1 template test;
> >
> > \c test1
> > alter table t set tablespace tab;
> >
> > \c postgres
> > alter database test1 set tablespace tab1; -- this fails with  the given
> error.
> >
> > Observations:
> > ===
> > Please note that before running above alter database statement, the
> table 't'  is moved to tablespace 'tab' from 'tab1' so not sure why
> ReadDir() is returning true when searching for table 't' in tablespace
> 'tab1'. It should have returned NULL here:
> >
> >  while ((xlde = ReadDir(dstdir, dst_dbpath)) != NULL)
> > {
> > if (strcmp(xlde->d_name, ".") == 0 ||
> > strcmp(xlde->d_name, "..") == 0)
> > continue;
> >
> > ereport(ERROR,
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >  errmsg("some relations of database \"%s\" are
> already in tablespace \"%s\"",
> > dbname, tblspcname),
> >  errhint("You must move them back to the database's
> default tablespace before using this command.")));
> > }
> >
> > Also, if I run the checkpoint explicitly before executing the above
> alter database statement, this error doesn't appear which means it only
> happens with the new changes because earlier we were doing the force
> checkpoint at the end of createdb statement.
> >
>
> Basically, ALTER TABLE SET TABLESPACE, will register the
> SYNC_UNLINK_REQUEST for the table files w.r.t the old tablespace, but
> those will get unlinked during the next checkpoint.  Although the
> files are truncated during commit itself, the unlink might not be
> processed until the next checkpoint.  This is the explanation for
> the behavior you found during your investigation, but I haven't looked
> into the issue so I will do it latest by tomorrow and send my
> analysis.
>
> Thanks for working on this.
>

Yeah the problem here is that the old rel file that needs to be unlinked
still exists in the old tablespace. Earlier, without your changes we were
doing force checkpoint before starting with the actual work for the alter
database which unlinked/deleted the rel file from the old tablespace, but
that is not the case here.  Now we have removed the force checkpoint from
movedb() which means until the auto checkpoint happens the old rel file
will remain in the old tablespace thereby creating this problem.

--
With Regards,
Ashutosh Sharma.


RE: Optionally automatically disable logical replication subscriptions on error

2021-12-22 Thread osumi.takami...@fujitsu.com
On Tuesday, December 21, 2021 11:18 PM I wrote:
> On Thursday, December 16, 2021 9:51 PM I wrote:
> > Attached the updated patch v14.
> FYI, I've conducted a test of the disable_on_error flag using pg_upgrade.  I
> prepared PG14 and HEAD with the disable_on_error patch applied.
> Then, I set up a logical replication pair of publisher and subscriber on 14
> and executed pg_upgrade for both the publisher and the subscriber
> individually.
> 
> After the upgrade, on the subscriber, I've confirmed that disable_on_error
> is false via both pg_subscription and \dRs+, as expected.
Additionally, I've tested the new TAP test in a tight loop
that executed 027_disable_on_error.pl 100 times sequentially.
There was no failure, which suggests the test has no timing issues.

Best Regards,
Takamichi Osumi



RE: Failed transaction statistics to measure the logical replication progress

2021-12-22 Thread osumi.takami...@fujitsu.com
On Tuesday, December 21, 2021 6:00 PM Greg Nancarrow  
wrote:
> Some review comments on the v18 patches:
Thank you for your review !

> v18-0002
> 
> doc/src/sgml/monitoring.sgml
> (1) tablesync worker stats?
> 
> Shouldn't the comment below only mention the apply worker? (since we're no
> longer recording stats of the tablesync worker)
> 
> +   Number of transactions that failed to be applied by the table
> +   sync worker or main apply worker in this subscription. This
> +   counter is updated after confirming the error is not same as
> +   the previous one.
> +   
> 
> Also, it should say "... the error is not the same as the previous one."
Fixed.
 
> src/backend/catalog/system_views.sql
> (2) pgstat_report_subworker_xact_end()
> 
> Fix typo and some wording:
> 
> BEFORE:
> + *  This should be called before the call of process_syning_tables()
> + not to
> AFTER:
> + *  This should be called before the call of
> process_syncing_tables(), so to not
Fixed.

> src/backend/postmaster/pgstat.c
> (3) pgstat_send_subworker_xact_stats()
> 
> BEFORE:
> + * Send a subworker transaction stats to the collector.
> AFTER:
> + * Send a subworker's transaction stats to the collector.
Fixed.

> (4)
> Wouldn't it be best for:
> 
> +   if (!TimestampDifferenceExceeds(last_report, now,
> + PGSTAT_STAT_INTERVAL))
> 
> to be:
> 
> +   if (last_report != 0 && !TimestampDifferenceExceeds(last_report,
> now, PGSTAT_STAT_INTERVAL))
> 
> ?
I'm not sure which is better, and I have no strong objection to your idea,
but currently I prefer the previous code: with it we don't need to add one
extra condition (last_report != 0) to a function that is called really
frequently, and with your change the optimization that avoids calling
TimestampDifferenceExceeds() only takes effect once, I'd say.

We call pgstat_send_subworker_xact_stats() in LogicalRepApplyLoop's loop.
For the apply worker, in normal operation this should be the first call,
before any call of apply_dispatch() (and the subsequent commit-like functions
which call pgstat_send_subworker_xact_stats() in the end).

In the first call, the existing v18 code (without your suggested change) just
initializes the 'last_report' variable, because of the difference between 0
and 'now', and returns because there are no stats values yet in commit_count
and abort_count. After this, 'last_report' will not be 0 for the apply worker.

On the other hand, with your change, in the first call of
pgstat_send_subworker_xact_stats(), 'last_report' is similarly initialized,
but without one call of TimestampDifferenceExceeds(), which is the
optimization effect, and the function again returns with no stats. Here too,
'last_report' will not be 0 after this. But then we'll have to check the
extra condition in the apply worker in every loop iteration. Besides, after
the first setting of 'last_report', every call of
pgstat_send_subworker_xact_stats() calculates the time difference anyway.
This means the one skipped call looks less worthwhile given how frequently
the function is called. So, I'm not sure it's a good idea to incorporate this
change.

Kindly let me know if I am missing something.
At present, I'll keep the code as it is.

> (5) pgstat_send_subworker_xact_stats()
> 
> I think that the comment:
> 
> +   * Clear out the statistics buffer, so it can be re-used.
> 
> should instead say:
> 
> +   * Clear out the supplied statistics.
> 
> because the current comment infers that repWorker is pointed at the
> MyLogicalRepWorker buffer (which it might be but it shouldn't be relying on
> that) Also, I think that the function header should mention something like:
> "The supplied repWorker statistics are cleared upon return, to assist re-use 
> by
> the caller."
Fixed.

Attached the new patch v19.

Best Regards,
Takamichi Osumi



v19-0002-Extend-pg_stat_subscription_workers-to-include-g.patch
Description:  v19-0002-Extend-pg_stat_subscription_workers-to-include-g.patch


v19-0001-Fix-alignments-of-TAP-tests-for-pg_stat_subscrip.patch
Description:  v19-0001-Fix-alignments-of-TAP-tests-for-pg_stat_subscrip.patch


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-22 Thread Dilip Kumar
On Tue, Dec 21, 2021 at 11:10 AM Ashutosh Sharma  wrote:
>
> I am getting the below error when running the same test-case that Neha shared 
> in her previous email.
>
> ERROR:  55000: some relations of database "test1" are already in tablespace 
> "tab1"
> HINT:  You must move them back to the database's default tablespace before 
> using this command.
> LOCATION:  movedb, dbcommands.c:1555
>
> test-case:
> 
> create tablespace tab1 location '/home/ashu/test1';
> create tablespace tab location '/home/ashu/test';
>
> create database test tablespace tab;
> \c test
>
> create table t(a int primary key, b text);
>
> create or replace function large_val() returns text language sql as 'select 
> array_agg(md5(g::text))::text from generate_series(1, 256) g';
>
> insert into t values (generate_series(1,10), large_val());
>
> alter table t set tablespace tab1 ;
>
> \c postgres
> create database test1 template test;
>
> \c test1
> alter table t set tablespace tab;
>
> \c postgres
> alter database test1 set tablespace tab1; -- this fails with  the given error.
>
> Observations:
> ===
> Please note that before running above alter database statement, the table 't' 
>  is moved to tablespace 'tab' from 'tab1' so not sure why ReadDir() is 
> returning true when searching for table 't' in tablespace 'tab1'. It should 
> have returned NULL here:
>
>  while ((xlde = ReadDir(dstdir, dst_dbpath)) != NULL)
> {
> if (strcmp(xlde->d_name, ".") == 0 ||
> strcmp(xlde->d_name, "..") == 0)
> continue;
>
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("some relations of database \"%s\" are already in 
> tablespace \"%s\"",
> dbname, tblspcname),
>  errhint("You must move them back to the database's 
> default tablespace before using this command.")));
> }
>
> Also, if I run the checkpoint explicitly before executing the above alter 
> database statement, this error doesn't appear which means it only happens 
> with the new changes because earlier we were doing the force checkpoint at 
> the end of createdb statement.
>

Basically, ALTER TABLE SET TABLESPACE, will register the
SYNC_UNLINK_REQUEST for the table files w.r.t the old tablespace, but
those will get unlinked during the next checkpoint.  Although the
files are truncated during commit itself, the unlink might not be
processed until the next checkpoint.  This is the explanation for
the behavior you found during your investigation, but I haven't looked
into the issue so I will do it latest by tomorrow and send my
analysis.

Thanks for working on this.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: In-placre persistance change of a relation

2021-12-22 Thread Jakub Wartak
Hi Kyotaro,

> At Tue, 21 Dec 2021 13:07:28 +, Jakub Wartak
>  wrote in
> > So what's suspicious is that 122880 -> 0 file size truncation. I've
> > investigated WAL and it seems to contain TRUNCATE records after logged
> FPI images, so when the crash recovery would kick in it probably clears this
> table (while it shouldn't).
> 
> Darn..  It is too silly that I wrongly issued truncate records for the target
> relation of the function (rel) instead of the relation on which we're
> currently operating at that time (r).
> 
> [..]
> The following fix works.

Cool, I have verified the basic stuff that came to my mind, and now even cfbot 
is happy with v11. You should be happy too, I hope :)

> I made another change in this version. Previously only btree among all index
> AMs was processed in the in-place manner.  In this version we do that for all
> AMs except GiST.  Maybe if gistGetFakeLSN behaved the same way for
> permanent and unlogged indexes, we could skip the index rebuild in exchange for
> some extra WAL records emitted while it is unlogged.

I think there's a slight omission:

-- unlogged table -> logged with GiST:
DROP TABLE IF EXISTS testcase;
CREATE UNLOGGED TABLE testcase(geom geometry not null);
CREATE INDEX idx_testcase_gist ON testcase USING gist(geom);
INSERT INTO testcase(geom) SELECT ST_Buffer(ST_SetSRID(ST_MakePoint(-1.0, 
2.0),4326), 0.0001);
ALTER TABLE testcase SET LOGGED;

-- crashes with:
(gdb) where
#0  reindex_index (indexId=indexId@entry=65541, 
skip_constraint_checks=skip_constraint_checks@entry=true, 
persistence=persistence@entry=112 'p', params=params@entry=0x0) at index.c:3521
#1  0x0062f494 in RelationChangePersistence (tab=tab@entry=0x1947258, 
persistence=112 'p', lockmode=lockmode@entry=8) at tablecmds.c:5434
#2  0x00642819 in ATRewriteTables (context=0x7ffc19c04520, 
lockmode=, wqueue=0x7ffc19c04388, parsetree=0x1925ec8) at 
tablecmds.c:5644
[..]
#10 0x007f078f in exec_simple_query (query_string=0x1925340 "ALTER 
TABLE testcase SET LOGGED;") at postgres.c:1215

Apparently reindex_index()'s params cannot be NULL - the same happens when
switching a persistent table to an unlogged one too (with GiST).
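
If the crash is just that NULL dereference, a minimal fix sketch in
RelationChangePersistence() (matching the call seen in frame #0 above) would
be to pass an empty ReindexParams instead of NULL:

	ReindexParams params = {0};

	reindex_index(indexId, true, persistence, &params);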

I'll also try to give the patch another shot early next year - as we are
starting a long Christmas/holiday break here - verifying WAL for GiST and a
more advanced setup (more crashes, and standby/archiving/barman to see how
it's possible to use wal_level=minimal <-> replica transitions).

-J.








Re: Checkpointer crashes with "PANIC: could not fsync file "pg_tblspc/.."

2021-12-22 Thread Ashutosh Sharma
On Wed, Dec 22, 2021 at 7:20 AM Dilip Kumar  wrote:

> On Wed, 22 Dec 2021 at 12:28 AM, Ashutosh Sharma 
> wrote:
>
>>
>> Is it okay to share the same tablespace (in fact, the relfile) between the
>> primary and standby server? Perhaps NO.
>>
>
> Oops, yeah absolutely they can never share the tablespace path.  So we
> can ignore this issue.
>

That's right, I agree. Thanks!

--
With Regards,
Ashutosh sharma.


fix crash with Python 3.11

2021-12-22 Thread Peter Eisentraut


This patch needs another close pass and possibly some refactoring to 
avoid copy-and-paste, but I'm putting this out here, since people are 
already testing with Python 3.11 and will surely run into this problem.


The way plpy.commit() and plpy.rollback() handle errors is not ideal. 
They end up just longjmping back to the main loop, without telling the 
Python interpreter about it.  This hasn't been a problem so far, 
apparently, but with Python 3.11-to-be, this appears to cause corruption 
in the state of the Python interpreter.  This is readily reproducible 
and will cause crashes in the plpython_transaction test.


The fix is that we need to catch the PostgreSQL error and turn it into a 
Python exception, like we do for other places where plpy.* methods call 
into PostgreSQL internals.
>From 31bbd62f43ceb0542e0a311dc28704fd702caeb3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 22 Dec 2021 09:01:24 +0100
Subject: [PATCH v1] Set Python exception after failed commit or rollback

When plpy.commit() or plpy.rollback() failed, the SPI code would
ereport(ERROR) and jump all the way back to the main loop, bypassing
PL/Python.  By contrast, in plpy.execute(), we catch the ereport() and
generate a Python exception.  This has the advantage that you can
catch the exception and you get better backtraces.

An additional problem is apparently that beginning in Python
3.11-to-be, longjmping out of the Python interpreter leaves its
internal state inconsistent, and subsequent invocations of PL/Python
code will crash.  With previous Python versions, this has apparently
not been a problem.

To fix, adapt the logic in PLy_spi_subtransaction_abort() (which is
used by plpy.execute() and others) to catch the PostgreSQL error and
turn it into a Python exception.  Move some code around to avoid
additional cross-file zigzag from this.
---
 .../expected/plpython_transaction.out |  18 ++-
 src/pl/plpython/plpy_plpymodule.c |  30 -
 src/pl/plpython/plpy_spi.c| 106 ++
 src/pl/plpython/plpy_spi.h|   3 +
 4 files changed, 121 insertions(+), 36 deletions(-)

diff --git a/src/pl/plpython/expected/plpython_transaction.out b/src/pl/plpython/expected/plpython_transaction.out
index 14152993c7..9fd9ef500e 100644
--- a/src/pl/plpython/expected/plpython_transaction.out
+++ b/src/pl/plpython/expected/plpython_transaction.out
@@ -55,8 +55,11 @@ for i in range(0, 10):
 return 1
 $$;
 SELECT transaction_test2();
-ERROR:  invalid transaction termination
-CONTEXT:  PL/Python function "transaction_test2"
+ERROR:  spiexceptions.InvalidTransactionTermination: invalid transaction termination
+CONTEXT:  Traceback (most recent call last):
+  PL/Python function "transaction_test2", line 5, in <module>
+    plpy.commit()
+PL/Python function "transaction_test2"
 SELECT * FROM test1;
  a | b 
 ---+---
@@ -70,7 +73,7 @@ plpy.execute("CALL transaction_test1()")
 return 1
 $$;
 SELECT transaction_test3();
-ERROR:  spiexceptions.InvalidTransactionTermination: invalid transaction termination
+ERROR:  spiexceptions.InvalidTransactionTermination: spiexceptions.InvalidTransactionTermination: invalid transaction termination
 CONTEXT:  Traceback (most recent call last):
   PL/Python function "transaction_test3", line 2, in <module>
     plpy.execute("CALL transaction_test1()")
@@ -88,7 +91,7 @@ plpy.execute("DO LANGUAGE plpythonu $x$ plpy.commit() $x$")
 return 1
 $$;
 SELECT transaction_test4();
-ERROR:  spiexceptions.InvalidTransactionTermination: invalid transaction termination
+ERROR:  spiexceptions.InvalidTransactionTermination: spiexceptions.InvalidTransactionTermination: invalid transaction termination
 CONTEXT:  Traceback (most recent call last):
   PL/Python function "transaction_test4", line 2, in <module>
     plpy.execute("DO LANGUAGE plpythonu $x$ plpy.commit() $x$")
@@ -100,8 +103,11 @@ s.enter()
 plpy.commit()
 $$;
 WARNING:  forcibly aborting a subtransaction that has not been exited
-ERROR:  cannot commit while a subtransaction is active
-CONTEXT:  PL/Python anonymous code block
+ERROR:  spiexceptions.InvalidTransactionTermination: cannot commit while a subtransaction is active
+CONTEXT:  Traceback (most recent call last):
+  PL/Python anonymous code block, line 4, in <module>
+    plpy.commit()
+PL/Python anonymous code block
 -- commit inside cursor loop
 CREATE TABLE test2 (x int);
 INSERT INTO test2 VALUES (0), (1), (2), (3), (4);
diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index 0365acc95b..907f89d153 100644
--- a/src/pl/plpython/plpy_plpymodule.c
+++ b/src/pl/plpython/plpy_plpymodule.c
@@ -40,8 +40,6 @@ static PyObject *PLy_fatal(PyObject *self, PyObject *args, PyObject *kw);
 static PyObject *PLy_quote_literal(PyObject *self, PyObject *args);
 static PyObject *PLy_quote_nullable(PyObject *self, PyObject *args);
 static PyObject *PLy_quote_ident(PyObject *self, PyObject *args);
-static PyObject *PLy_commit(PyObject *self, PyObject 

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-22 Thread Ashutosh Sharma
Hi Dilip,

On Tue, Dec 21, 2021 at 11:10 AM Ashutosh Sharma wrote:

> I am getting the below error when running the same test-case that Neha
> shared in her previous email.
>
> ERROR:  55000: some relations of database "test1" are already in
> tablespace "tab1"
> HINT:  You must move them back to the database's default tablespace before
> using this command.
> LOCATION:  movedb, dbcommands.c:1555
>
> test-case:
> 
> create tablespace tab1 location '/home/ashu/test1';
> create tablespace tab location '/home/ashu/test';
>
> create database test tablespace tab;
> \c test
>
> create table t(a int primary key, b text);
>
> create or replace function large_val() returns text language sql as
> 'select array_agg(md5(g::text))::text from generate_series(1, 256) g';
>
> insert into t values (generate_series(1,10), large_val());
>
> alter table t set tablespace tab1 ;
>
> \c postgres
> create database test1 template test;
>
> \c test1
> alter table t set tablespace tab;
>
> \c postgres
> alter database test1 set tablespace tab1; -- this fails with the given error.
>
> Observations:
> ===
> Please note that before running the above alter database statement, the
> table 't' is moved from tablespace 'tab1' to 'tab', so I am not sure why
> ReadDir() still returns an entry when scanning the directory of tablespace
> 'tab1'. It should have returned NULL here:
>
>  while ((xlde = ReadDir(dstdir, dst_dbpath)) != NULL)
> {
> if (strcmp(xlde->d_name, ".") == 0 ||
> strcmp(xlde->d_name, "..") == 0)
> continue;
>
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("some relations of database \"%s\" are already
> in tablespace \"%s\"",
> dbname, tblspcname),
>  errhint("You must move them back to the database's
> default tablespace before using this command.")));
> }
>
> Also, if I run a checkpoint explicitly before executing the above alter
> database statement, this error doesn't appear, which means it only happens
> with the new changes, because earlier we were forcing a checkpoint at the
> end of the createdb statement. (Perhaps the leftover file of the moved
> table's old relfilenode is only removed at the next checkpoint, so the
> directory scan still sees it.)
>

Is this expected? I think it is not.

--
With Regards,
Ashutosh Sharma.


Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2021-12-22 Thread Pavel Borisov
>
> The tests in check_btree.sql no longer create a bttest_unique table, so
> the DROP TABLE is surplusage:
>
> +DROP TABLE bttest_unique;
> +ERROR:  table "bttest_unique" does not exist
>
>
> The changes in pg_amcheck.c to pass the new checkunique parameter will
> likely need to be based on a amcheck version check.  The implementation of
> prepare_btree_command() in pg_amcheck.c should be kept compatible with
> older versions of amcheck, because it connects to remote servers and you
> can't know in advance that the remote servers are as up-to-date as the
> machine where pg_amcheck is installed.  I'm thinking specifically about
> this change:
>
> @@ -871,7 +877,8 @@ prepare_btree_command(PQExpBuffer sql, RelationInfo *rel, PGconn *conn)
>     if (opts.parent_check)
>         appendPQExpBuffer(sql,
>                           "SELECT %s.bt_index_parent_check("
> -                         "index := c.oid, heapallindexed := %s, rootdescend := %s)"
> +                         "index := c.oid, heapallindexed := %s, rootdescend := %s, "
> +                         "checkunique := %s)"
>                           "\nFROM pg_catalog.pg_class c, pg_catalog.pg_index i "
>                           "WHERE c.oid = %u "
>                           "AND c.oid = i.indexrelid "
>
> If the user calls pg_amcheck with --checkunique, and one or more remote
> servers have an amcheck version < 1.4, at a minimum you'll need to avoid
> calling bt_index_parent_check with that parameter, and probably also you'll
> either need to raise a warning or perhaps an error telling the user that
> such a check cannot be performed.
>
>
> You've forgotten to include contrib/amcheck/amcheck--1.3--1.4.sql in the
> v5 patch, resulting in a failed install:
>
> /usr/bin/install -c -m 644 ./amcheck--1.3--1.4.sql ./amcheck--1.2--1.3.sql
> ./amcheck--1.1--1.2.sql ./amcheck--1.0--1.1.sql ./amcheck--1.0.sql
> '/Users/mark.dilger/hydra/unique_review.5/tmp_install/Users/mark.dilger/pgtest/test_install/share/postgresql/extension/'
> install: ./amcheck--1.3--1.4.sql: No such file or directory
> make[2]: *** [install] Error 71
> make[1]: *** [checkprep] Error 2
>
> Using the one from the v4 patch fixes the problem.  Please include this
> file in v6, to simplify the review process.
>
>
> The changes to t/005_opclass_damage.pl look ok.  The creation of a new
> table for the new test seems unnecessary, but only problematic in that it
> makes the test slightly longer to read.  I recommend changing the test to
> use the same table that the prior test uses, but that is just a
> recommendation, not a requirement.
>
> You should add coverage for --checkunique to t/003_check.pl.
>
> You should add coverage for multiple PostgreSQL::Test::Cluster instances
> running differing versions of amcheck, perhaps some on version 1.3 and some
> on version 1.4.  Then test that the --checkunique option works adequately.
>

Thank you, Mark!

In v6 (PFA) I've made the changes you advised, i.e.:

- pg_amcheck with the --checkunique option will skip the uniqueness check
(with a warning) if the amcheck version in a database is < 1.4 and doesn't
support the feature (a rough sketch of the gating follows below).
- removed the unnecessary DROP TABLE from the regression test
- used the existing table for the uniqueness check in 005_opclass_damage.pl
- added tests to 003_check.pl; it is only a smoke test that verifies the new
functions.
- added the test contrib/amcheck/t/004_verify_nbtree_unique.pl; it is a more
extensive test based on opclass damage, which was intended to be the main
test for this feature but which I forgot to include in the v5 commit. The
005_opclass_damage.pl test you saw in v5 is largely based on the first part
of 004_verify_nbtree_unique.pl (the latter calls pg_amcheck, the former
calls bt_index_check() and bt_index_parent_check()).
- added the forgotten upgrade script amcheck--1.3--1.4.sql (from v4)
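
The version gating looks conceptually like the sketch below (illustrative
only, not the patch text; it assumes the installed amcheck version was
fetched earlier into amcheck_version, e.g. with "SELECT extversion FROM
pg_extension WHERE extname = 'amcheck'", and that opts.checkunique is the
new pg_amcheck option):

    /*
     * Sketch only: skip checkunique when the database's amcheck is too
     * old to know the parameter.  A real implementation would compare
     * parsed version numbers rather than strings.
     */
    bool        use_checkunique = opts.checkunique;

    if (use_checkunique && strcmp(amcheck_version, "1.4") < 0)
    {
        pg_log_warning("amcheck version %s does not support checkunique, skipping that check",
                       amcheck_version);
        use_checkunique = false;
    }

    if (use_checkunique)
        appendPQExpBufferStr(sql, ", checkunique := true");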

Further comments and considerations are welcome.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


v6-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-un.patch
Description: Binary data