Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

2022-05-30 Thread Michael Paquier
On Sat, May 28, 2022 at 05:30:51PM +0200, Daniel Gustafsson wrote:
> On 27 May 2022, at 23:07, Thomas Munro  wrote:
>> We should go full Marie Kondo on EOL'd OSes that are not in our CI or
>> build farm, IMHO.
> 
> FWIW, +1

Okay, I am outnumbered, and that would mean bumping MIN_WINNT to
0x0A00.  So, ready to move to this version at full speed for 16?  We
still have a couple of weeks ahead before the next dev cycle begins,
so feel free to comment, of course.
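
For reference, a minimal sketch of what the discussed bump could look like
in src/include/port/win32.h, assuming the define keeps its current shape
(0x0A00 is the _WIN32_WINNT value for Windows 10):

/*
 * Sketch only: with the EOL'd OSes gone, the conditional collapses
 * to a single minimum version.
 */
#define MIN_WINNT 0x0A00		/* Windows 10 */

#if defined(_WIN32_WINNT) && _WIN32_WINNT < MIN_WINNT
#undef _WIN32_WINNT
#endif

#ifndef _WIN32_WINNT
#define _WIN32_WINNT MIN_WINNT
#endif
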
--
Michael




Re: Shmem queue is not flushed if receiver is not yet attached

2022-05-30 Thread Pavan Deolasee
Hi Robert,

On Tue, May 24, 2022 at 8:35 PM Robert Haas  wrote:

>
>
> I think that this patch is basically correct, except that it's not
> correct to set mqh_counterparty_attached when receiver is still NULL.
> Here's a v2 where I've attempted to correct that while preserving the
> essence of your proposed fix.
>

This looks good to me.


>
> I'm not sure that we need a shm_mq_flush(), but we definitely don't
> have one currently, so I've also adjusted your patch to remove the
> dead prototype.
>
>
Makes sense to me.
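
For readers without the patch at hand, a hedged sketch of the guard being
discussed; shm_mq_get_receiver(), mqh_queue, and mqh_counterparty_attached
are the real shm_mq.c names, while the surrounding context is illustrative:

/*
 * Sketch of the essence of the v2 fix: remember the counterparty as
 * attached only once a receiver PGPROC actually exists.  A flush that
 * runs while the receiver is still NULL must not set the flag.
 */
PGPROC	   *receiver = shm_mq_get_receiver(mqh->mqh_queue);

if (receiver != NULL)
{
	mqh->mqh_counterparty_attached = true;
	SetLatch(&receiver->procLatch); /* wake the receiver to consume data */
}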

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB: https://www.enterprisedb.com


Re: Shmem queue is not flushed if receiver is not yet attached

2022-05-30 Thread Pavan Deolasee
On Wed, May 25, 2022 at 7:01 AM Japin Li  wrote:

>
> I have a problem that is also related to the shmem queue [1]; however, I
> cannot reproduce it.  How did you reproduce this problem?
>
>
I discovered this bug while working on an extension that makes use of the
shared memory queue facility. Not sure how useful that is for your purpose.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB: https://www.enterprisedb.com


Re: Remove support for Visual Studio 2013

2022-05-30 Thread Michael Paquier
On Thu, May 26, 2022 at 05:50:40PM -0400, Tom Lane wrote:
> The comments about that in win32_port.h and cygwin.h only date back
> to 2019, so it seems unlikely that the situation has changed much.
> We could try removing HAVE_BUGGY_STRTOF to see if the buildfarm
> complains, but I wouldn't bet money on that succeeding.  What we
> *do* need to do is update the #if tests and comments to make clear
> that HAVE_BUGGY_STRTOF is only needed for Mingw and Cygwin, not
> for any supported MSVC release.

After looking at that again, the whole comment related to VS in
strtof.c can be removed.  While on it, I noticed more places that
still referred to VS2013 in ./configure[.ac], and that win32_langinfo()
had an overall incorrect description.  This leads to v2 as attached.
--
Michael
From 86b838c70b30998a981f6e53cacb0885963eda66 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 30 May 2022 16:44:22 +0900
Subject: [PATCH v2] Remove support for VS 2013

---
 src/include/port/win32.h  | 10 +++
 src/include/port/win32_port.h |  7 +
 src/backend/main/main.c   | 22 ---
 src/backend/optimizer/path/costsize.c |  4 +--
 src/backend/utils/adt/float.c |  6 +---
 src/backend/utils/adt/pg_locale.c | 40 ---
 src/port/chklocale.c  | 33 --
 src/port/strtof.c |  9 --
 doc/src/sgml/install-windows.sgml | 10 +++
 configure |  7 ++---
 configure.ac  |  7 ++---
 src/tools/msvc/MSBuildProject.pm  | 27 +-
 src/tools/msvc/README |  8 +++---
 src/tools/msvc/Solution.pm| 28 ---
 src/tools/msvc/VSObjectFactory.pm | 12 ++--
 15 files changed, 30 insertions(+), 200 deletions(-)

diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index c6213c77c3..539f3ec6d1 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -11,12 +11,12 @@
 
 /*
  * Make sure _WIN32_WINNT has the minimum required value.
- * Leave a higher value in place. When building with at least Visual
- * Studio 2015 the minimum requirement is Windows Vista (0x0600) to
- * get support for GetLocaleInfoEx() with locales. For everything else
- * the minimum version is Windows XP (0x0501).
+ * Leave a higher value in place. When building with Visual Studio the
+ * minimum requirement is Windows Vista (0x0600) to get support for
+ * GetLocaleInfoEx() with locales. For everything else the minimum
+ * version is Windows XP (0x0501).
  */
-#if defined(_MSC_VER) && _MSC_VER >= 1900
+#ifdef _MSC_VER
 #define MIN_WINNT 0x0600
 #else
 #define MIN_WINNT 0x0501
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index dbbf88f8e8..c0225603f2 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -531,13 +531,8 @@ typedef unsigned short mode_t;
 
 #endif			/* _MSC_VER */
 
-#if (defined(_MSC_VER) && (_MSC_VER < 1900)) || \
-	defined(__MINGW32__) || defined(__MINGW64__)
+#if  defined(__MINGW32__) || defined(__MINGW64__)
 /*
- * VS2013 has a strtof() that seems to give correct answers for valid input,
- * even on the rounding edge cases, but which doesn't handle out-of-range
- * input correctly. Work around that.
- *
  * Mingw claims to have a strtof, and my reading of its source code suggests
  * that it ought to work (and not need this hack), but the regression test
  * results disagree with me; whether this is a version issue or not is not
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index c43a527d3f..bb782fa1ec 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -30,11 +30,6 @@
 #include 
 #endif
 
-#if defined(_M_AMD64) && _MSC_VER == 1800
-#include 
-#include 
-#endif
-
 #include "bootstrap/bootstrap.h"
 #include "common/username.h"
 #include "port/atomics.h"
@@ -290,23 +285,6 @@ startup_hacks(const char *progname)
 		_CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
 		_CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG);
 		_CrtSetReportFile(_CRT_WARN, _CRTDBG_FILE_STDERR);
-
-#if defined(_M_AMD64) && _MSC_VER == 1800
-
-		/*--
-		 * Avoid crashing in certain floating-point operations if we were
-		 * compiled for x64 with MS Visual Studio 2013 and are running on
-		 * Windows prior to 7/2008R2 SP1 on an AVX2-capable CPU.
-		 *
-		 * Ref: https://connect.microsoft.com/VisualStudio/feedback/details/811093/visual-studio-2013-rtm-c-x64-code-generation-bug-for-avx2-instructions
-		 *--
-		 */
-		if (!IsWindows7SP1OrGreater())
-		{
-			_set_FMA3_enable(0);
-		}
-#endif			/* defined(_M_AMD64) && _MSC_VER == 1800 */
-
 	}
 #endif			/* WIN32 */
 
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index fcc26b01a4..5e5732f6e1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c

Re: Remove useless tests about TRUNCATE on foreign table

2022-05-30 Thread Michael Paquier
On Fri, May 27, 2022 at 05:25:43PM +0900, Yugo NAGATA wrote:
> --- TRUNCATE doesn't work on foreign tables, either directly or recursively
> -TRUNCATE ft2;  -- ERROR
> -ERROR:  foreign-data wrapper "dummy" has no handler
> -TRUNCATE fd_pt1;  -- ERROR
> -ERROR:  foreign-data wrapper "dummy" has no handler
>  DROP TABLE fd_pt1 CASCADE;

In the case of this test, fd_pt1 is a normal table that ft2 inherits,
so this TRUNCATE command somewhat checks that the TRUNCATE falls
through to the foreign table.  However, this happens to be tested
in postgres_fdw (see around tru_ftable_parent).

> --- TRUNCATE doesn't work on foreign tables, either directly or recursively
> -TRUNCATE fd_pt2_1;  -- ERROR
> -ERROR:  foreign-data wrapper "dummy" has no handler
> -TRUNCATE fd_pt2;  -- ERROR
> -ERROR:  foreign-data wrapper "dummy" has no handler

Partitions also have some coverage as far as I can see, so I agree
that it makes little sense to keep the tests you are removing here.
--
Michael




RE: Hash index build performance tweak from sorting

2022-05-30 Thread shiy.f...@fujitsu.com
On Tue, May 10, 2022 5:43 PM Simon Riggs  wrote:
> 
> On Sat, 30 Apr 2022 at 12:12, Amit Kapila 
> wrote:
> >
> > Few comments on the patch:
> > 1. I think it is better to use DatumGetUInt32 to fetch the hash key as
> > the nearby code is using.
> > 2. You may want to change the below comment in HSpool
> > /*
> > * We sort the hash keys based on the buckets they belong to. Below
> masks
> > * are used in _hash_hashkey2bucket to determine the bucket of given
> hash
> > * key.
> > */
> 
> Addressed in new patch, v2.
> 

I think your changes look reasonable.
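
To illustrate the masks mentioned in the quoted comment, here is a hedged
sketch of the bucket comparison; _hash_hashkey2bucket() is the real helper
from src/backend/access/hash/hashutil.c, while the struct holding the masks
is an illustrative stand-in for the build state:

#include "postgres.h"
#include "access/hash.h"		/* _hash_hashkey2bucket(), Bucket */

typedef struct HashSortMasks	/* illustrative stand-in */
{
	uint32		max_buckets;
	uint32		high_mask;
	uint32		low_mask;
} HashSortMasks;

/* Sort support: two hash keys compare by the bucket they will land in. */
static int
compare_hash_buckets(uint32 hash1, uint32 hash2, HashSortMasks *m)
{
	Bucket		b1 = _hash_hashkey2bucket(hash1, m->max_buckets,
										  m->high_mask, m->low_mask);
	Bucket		b2 = _hash_hashkey2bucket(hash2, m->max_buckets,
										  m->high_mask, m->low_mask);

	if (b1 != b2)
		return (b1 > b2) ? 1 : -1;
	return 0;
}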

Besides, I tried this patch with Simon's script, and index creation time
was about 7.5% faster after applying this patch on my machine, which looks
good to me.

RESULT - index creation time
===
HEAD: 9513.466 ms
Patched: 8796.75 ms

I ran it 10 times and got the average, and here are the configurations used in
the test:
shared_buffers = 2GB
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off

Regards,
Shi yu


Re: CREATE COLLATION must be specified

2022-05-30 Thread jian he
--ok
CREATE COLLATION some_collation (
PROVIDER = icu,
LOCALE = 'en-u-ks-primary',
DETERMINISTIC = FALSE
);

CREATE COLLATION some_collation1 (
PROVIDER = icu,
LC_COLLATE = 'en-u-ks-primary',
LC_CTYPE = 'en-u-ks-primary',
DETERMINISTIC = FALSE
);
--ERROR: parameter "locale" must be specified

CREATE COLLATION some_collation2 (
LC_COLLATE = 'en-u-ks-primary',
LC_CTYPE = 'en-u-ks-primary',
LOCALE = 'en-u-ks-primary',
PROVIDER = icu,
DETERMINISTIC = FALSE
);
--ERROR:  conflicting or redundant options
--DETAIL:  LOCALE cannot be specified together with LC_COLLATE or LC_CTYPE.

Since LC_COLLATE is bundled together with LC_CTYPE: in 15, if the provider
is ICU, are LC_COLLATE and LC_CTYPE no longer required?


On Sat, May 28, 2022 at 11:55 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 28.05.22 20:16, Shay Rojansky wrote:
> > CREATE COLLATION some_collation (LC_COLLATE = 'en-u-ks-primary',
> >  LC_CTYPE = 'en-u-ks-primary',
> >  PROVIDER = icu,
> >  DETERMINISTIC = False
> > );
> >
> > This works on PG14, but on PG15 it errors with 'parameter "locale" must
> > be specified'.
> >
> > I wanted to make sure this breaking change is intentional (it doesn't
> > seem documented in the release notes or in the docs for CREATE
> COLLATION).
>
> This change is intentional, but the documentation could be improved.
>
>
>

-- 
 I recommend David Deutsch's <>

  Jian


Re: Skipping schema changes in publication

2022-05-30 Thread Peter Smith
Here are some minor review comments for v7-0001.

==

1. General

Probably the commit message and all the PG docs and code comments
should be changed to refer to "publication parameters" instead of
(currently) "publication options". This is because these things are
really called "publication_parameters" in the PG docs [1].

All the following review comments are just examples of this suggestion.

~~~

2. Commit message

"includes resetting the publication options," -> "includes resetting
the publication parameters,"

~~~

3. doc/src/sgml/ref/alter_publication.sgml

+  
+   The RESET clause will reset the publication to the
+   default state which includes resetting the publication options, setting
+   ALL TABLES flag to false and
+   dropping all relations and schemas that are associated with the publication.
   


"resetting the publication options," -> "resetting the publication parameters,"

~~~

4. src/backend/commands/publicationcmds.c

@@ -53,6 +53,14 @@
 #include "utils/syscache.h"
 #include "utils/varlena.h"

+/* CREATE PUBLICATION default values for flags and options */
+#define PUB_DEFAULT_ACTION_INSERT true
+#define PUB_DEFAULT_ACTION_UPDATE true
+#define PUB_DEFAULT_ACTION_DELETE true
+#define PUB_DEFAULT_ACTION_TRUNCATE true
+#define PUB_DEFAULT_VIA_ROOT false
+#define PUB_DEFAULT_ALL_TABLES false

"flags and options" -> "flags and publication parameters"

~~~

5. src/backend/commands/publicationcmds.c

+/*
+ * Reset the publication.
+ *
+ * Reset the publication options, setting ALL TABLES flag to false and drop
+ * all relations and schemas that are associated with the publication.
+ */
+static void
+AlterPublicationReset(ParseState *pstate, AlterPublicationStmt *stmt,
+   Relation rel, HeapTuple tup)

"Reset the publication options," -> "Reset the publication parameters,"

~~~

6. src/test/regress/sql/publication.sql

+-- Verify that publish options and publish_via_partition_root option are reset
+\dRp+ testpub_reset
+ALTER PUBLICATION testpub_reset RESET;
+\dRp+ testpub_reset

SUGGESTION
-- Verify that 'publish' and 'publish_via_partition_root' publication
parameters are reset

--
[1] https://www.postgresql.org/docs/current/sql-createpublication.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-05-30 Thread Kyotaro Horiguchi
At Sat, 28 May 2022 13:22:45 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2022-05-26 16:27:53 +0900, Kyotaro Horiguchi wrote:
> > It could be in SQL, but *I* prefer to use perl for this, since it
> > allows me to write a bit complex things (than simple string
> > comparison) simpler.
> 
> I wonder if we shouldn't just expose a C function to do this, rather than
> having a separate implementation in a tap test.

It was annoying that I needed to copy the unit-conversion stuff.  I
did that in the attached.  parse_val(), check_val(), and the duplicated
data are removed.

> > +# parameter names that cannot get consistency check performed
> > +my @ignored_parameters =
> 
> I think most of these we could ignore by relying on source <> 'override'
> instead of listing them?
> 
> 
> > +# parameter names that requires case-insensitive check
> > +my @case_insensitive_params =
> > +  ('ssl_ciphers',
> > +   'log_filename',
> > +   'event_source',
> > +   'log_timezone',
> > +   'timezone',
> > +   'lc_monetary',
> > +   'lc_numeric',
> > +   'lc_time');
> 
> Why do these differ by case?

Mmm. It just came out of a thinko. I somehow believed that the script
down-cases only the parameter names among the values from
pg_settings. I felt that something was strange while on it,
though.  After fixing it, only the following values differ purely in
letter case. In passing I made the "bool" and "enum" comparisons
case-sensitive, too.

name             conf          bootval
client_encoding  "sql_ascii"   "SQL_ASCII"
datestyle        "iso, mdy"    "ISO, MDY"
syslog_facility  "LOCAL0"      "local0"

It seems to me that the bootval is right for all variables.


I added a testing-aid function pg_normalize_config_value(name, value)
so the consistency check can be performed like this:

SELECT f.n, f.v, s.boot_val
FROM (VALUES ('work_mem','4MB'),...) f(n,v)
JOIN pg_settings s ON s.name = f.n
WHERE pg_normalize_config_value(f.n, f.v) <>
      pg_normalize_config_value(f.n, s.boot_val);

There are some concerns about the function.

- _ShowConfig() returns archive_command as "(disabled)" regardless of
  its value.  The test passes accidentally for the variable...

- _ShowConfig() errors out for "timezone_abbreviations" and "" since
  the check function tries to open the timezone file. (It is excluded
  from the test.)

I don't want to create a copy of the function only for this purpose.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 1e52b23111b5e8860092dcfea0ac9459f1725f0d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 30 May 2022 16:11:34 +0900
Subject: [PATCH v2] Add fileval-bootval consistency check of GUC parameters

We should keep GUC variables consistent between the default values
written in postgresql.conf.sample and those defined in guc.c. Add an
automated consistency check to the TAP test suite. Some variables are
still excluded since they cannot be checked in a simple way.
---
 src/backend/utils/misc/guc.c  | 70 +++
 src/backend/utils/misc/postgresql.conf.sample |  6 +-
 src/include/catalog/pg_proc.dat   |  5 ++
 src/test/modules/test_misc/t/003_check_guc.pl | 55 +--
 4 files changed, 127 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 55d41ae7d6..cd47dede1a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -7514,6 +7514,76 @@ parse_and_validate_value(struct config_generic *record,
 	return true;
 }
 
+/*
+ * Helper function for pg_normalize_config_value().
+ * Makes a palloced copy of src then link val to it.
+ * DO NOT destroy val while dst is in use.
+ */
+static struct config_generic *
+copy_config_and_set_value(struct config_generic *src, union config_var_val *val)
+{
+	struct config_generic *dst;
+
+#define CREATE_CONFIG_COPY(dst, src, t)	\
+	dst = palloc(sizeof(struct t));\
+	*(struct t *) dst = *(struct t *) src;		\
+	
+	switch (src->vartype)
+	{
+		case PGC_BOOL:
+			CREATE_CONFIG_COPY(dst, src, config_bool);
+			((struct config_bool *)dst)->variable = &val->boolval;
+			break;
+		case PGC_INT:
+			CREATE_CONFIG_COPY(dst, src, config_int);
+			((struct config_int *)dst)->variable = &val->intval;
+			break;
+		case PGC_REAL:
+			CREATE_CONFIG_COPY(dst, src, config_real);
+			((struct config_real *)dst)->variable = &val->realval;
+			break;
+		case PGC_STRING:
+			CREATE_CONFIG_COPY(dst, src, config_string);
+			((struct config_string *)dst)->variable = &val->stringval;
+			break;
+		case PGC_ENUM:
+			CREATE_CONFIG_COPY(dst, src, config_enum);
+			((struct config_enum *)dst)->variable = &val->enumval;
+			break;
+	}
+
+	return dst;
+}
+
+
+/*
+ * Normalize given value according to the specified GUC variable
+ */
+Datum
+pg_normalize_config_value(PG_FUNCTION_ARGS)
+{
+	char   *name = "";
+	char   *value = "";
+	struct config_generic *record;
+	char   *result;
+	void   *extra;
+	union config_var_val altval;
+
+	if (!

ParseTzFile doesn't FreeFile on error

2022-05-30 Thread Kyotaro Horiguchi
While working on some patch, I saw an "N temporary files and directories
not closed at end-of-transaction" error when a transaction ended
successfully after a failed call to parse_and_validate_value().

The cause is ParseTzFile() returns leaving an open file descriptor
unfreed in some error cases.

This happens only in a special case when the errors are ignored, but
in principle the file descriptor should be released before exiting the
function.

I'm not sure it's worth fixing but the attached fixes that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From a81f2fd6f0293f3e813575d76a59beda93bc030a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 30 May 2022 15:49:16 +0900
Subject: [PATCH 1/2] Fix ParseTzFile to call FreeFile() on error

ParseTzFile() forgot to FreeFile() the zone file on error. Thus when this
is called within a successfully-ended transaction, it leaves an open
file descriptor, which leads to the error "n temporary files and
directories not closed at end-of-transaction".
---
 src/backend/utils/misc/tzparser.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/backend/utils/misc/tzparser.c b/src/backend/utils/misc/tzparser.c
index a69cb2d268..bfb259b3c3 100644
--- a/src/backend/utils/misc/tzparser.c
+++ b/src/backend/utils/misc/tzparser.c
@@ -364,7 +364,7 @@ ParseTzFile(const char *filename, int depth,
 			{
 GUC_check_errmsg("could not read time zone file \"%s\": %m",
  filename);
-return -1;
+goto error_exit;
 			}
 			/* else we're at EOF after all */
 			break;
@@ -374,7 +374,7 @@ ParseTzFile(const char *filename, int depth,
 			/* the line is too long for tzbuf */
 			GUC_check_errmsg("line is too long in time zone file \"%s\", line %d",
 			 filename, lineno);
-			return -1;
+			goto error_exit;
 		}
 
 		/* skip over whitespace */
@@ -397,12 +397,12 @@ ParseTzFile(const char *filename, int depth,
 			{
 GUC_check_errmsg("@INCLUDE without file name in time zone file \"%s\", line %d",
  filename, lineno);
-return -1;
+goto error_exit;
 			}
 			n = ParseTzFile(includeFile, depth + 1,
 			base, arraysize, n);
 			if (n < 0)
-return -1;
+goto error_exit;
 			continue;
 		}
 
@@ -413,14 +413,19 @@ ParseTzFile(const char *filename, int depth,
 		}
 
 		if (!splitTzLine(filename, lineno, line, &tzentry))
-			return -1;
+			goto error_exit;
 		if (!validateTzEntry(&tzentry))
-			return -1;
+			goto error_exit;
 		n = addToArray(base, arraysize, n, &tzentry, override);
 		if (n < 0)
-			return -1;
+			goto error_exit;
 	}
+	goto success;
 
+error_exit:
+	n = -1;
+
+success:
 	FreeFile(tzFile);
 
 	return n;
-- 
2.31.1



Re: Prevent writes on large objects in read-only transactions

2022-05-30 Thread Yugo NAGATA
On Sat, 28 May 2022 18:00:54 +0900
Michael Paquier  wrote:

> On Fri, May 27, 2022 at 03:30:28PM +0900, Yugo NAGATA wrote:
> > Currently, lo_creat(e), lo_import, lo_unlink, lowrite, lo_put,
> > and lo_from_bytea are allowed even in read-only transactions.
> > By using them, pg_largeobject and pg_largeobject_metatable can
> > be modified in read-only transactions and the effect remains
> > after the transaction finishes. Isn't this unacceptable
> > behaviour?
> 
> Well, there is an actual risk to break applications that have worked
> until now for a behavior that has existed for years with zero
> complaints on the matter, so I would leave that alone.  Saying that, I
> don't really disagree with improving the error messages a bit if we
> are in recovery.

Thank you for your comment. I am fine with leaving the behaviour in
read-only transactions as is if no one complains and there are no
risks.

As to the error messages during recovery, I think it is better to improve
them, because the current messages are emitted by elog(), which implies
they are internal errors that we don't expect. I attached an updated patch
for it.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
index 5804532881..55b25b19ea 100644
--- a/src/backend/libpq/be-fsstubs.c
+++ b/src/backend/libpq/be-fsstubs.c
@@ -245,6 +245,8 @@ be_lo_creat(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId;
 
+	PreventCommandDuringRecovery("lo_creat()");
+
 	lo_cleanup_needed = true;
 	lobjId = inv_create(InvalidOid);
 
@@ -256,6 +258,8 @@ be_lo_create(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId = PG_GETARG_OID(0);
 
+	PreventCommandDuringRecovery("lo_create()");
+
 	lo_cleanup_needed = true;
 	lobjId = inv_create(lobjId);
 
@@ -306,6 +310,8 @@ be_lo_unlink(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId = PG_GETARG_OID(0);
 
+	PreventCommandDuringRecovery("lo_unlink()");
+
 	/*
 	 * Must be owner of the large object.  It would be cleaner to check this
 	 * in inv_drop(), but we want to throw the error before not after closing
@@ -368,6 +374,8 @@ be_lowrite(PG_FUNCTION_ARGS)
 	int			bytestowrite;
 	int			totalwritten;
 
+	PreventCommandDuringRecovery("lowrite()");
+
 	bytestowrite = VARSIZE_ANY_EXHDR(wbuf);
 	totalwritten = lo_write(fd, VARDATA_ANY(wbuf), bytestowrite);
 	PG_RETURN_INT32(totalwritten);
@@ -413,6 +421,8 @@ lo_import_internal(text *filename, Oid lobjOid)
 	LargeObjectDesc *lobj;
 	Oid			oid;
 
+	PreventCommandDuringRecovery("lo_import()");
+
 	/*
 	 * open the file to be read in
 	 */
@@ -539,6 +549,8 @@ lo_truncate_internal(int32 fd, int64 len)
 {
 	LargeObjectDesc *lobj;
 
+	PreventCommandDuringRecovery("lo_truncate()");
+
 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
 		ereport(ERROR,
 (errcode(ERRCODE_UNDEFINED_OBJECT),
@@ -815,6 +827,8 @@ be_lo_from_bytea(PG_FUNCTION_ARGS)
 	LargeObjectDesc *loDesc;
 	int			written PG_USED_FOR_ASSERTS_ONLY;
 
+	PreventCommandDuringRecovery("lo_from_bytea()");
+
 	lo_cleanup_needed = true;
 	loOid = inv_create(loOid);
 	loDesc = inv_open(loOid, INV_WRITE, CurrentMemoryContext);
@@ -837,6 +851,8 @@ be_lo_put(PG_FUNCTION_ARGS)
 	LargeObjectDesc *loDesc;
 	int			written PG_USED_FOR_ASSERTS_ONLY;
 
+	PreventCommandDuringRecovery("lo_put()");
+
 	lo_cleanup_needed = true;
 	loDesc = inv_open(loOid, INV_WRITE, CurrentMemoryContext);
 


Amcheck verification of GiST and GIN

2022-05-30 Thread Andrey Borodin
Hello world!

A few years ago we had a thread with $subj [0]. A year ago Heikki put a lot
of effort into improving GIN checks [1] while hunting a GIN bug.
And in view of some releases that recommend reindexing anything that fails
or lacks amcheck verification, I decided to revive the thread.

PFA $subj incorporating all of Heikki's improvements and the restored GiST
checks. Also I've added heapallindexed verification for GiST. I'm sure that
we must add it for GIN too, yet I do not know how to implement it. Maybe
just check that every entry generated from the heap is present in the entry
tree? Or that every TID is present in the index?
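
One possible heapallindexed scheme for GIN, borrowed from what
verify_nbtree.c already does, sketched here under the assumption that the
callbacks named below are hypothetical while the bloomfilter API is the
real one from src/include/lib/bloomfilter.h:

#include "postgres.h"
#include "access/itup.h"
#include "lib/bloomfilter.h"

/* Filter sized from the entry-tree tuple count; populated during the
 * index scan, probed during the heap scan. */
static bloom_filter *entry_filter;

/* Pass 1 (hypothetical entry-tree scan callback): fingerprint every
 * item found in the entry tree. */
static void
fingerprint_entry_item(IndexTuple itup)
{
	bloom_add_element(entry_filter, (unsigned char *) itup,
					  IndexTupleSize(itup));
}

/* Pass 2 (hypothetical heap-scan callback): re-generate the entry a
 * heap tuple should have produced and probe the filter. */
static void
check_heap_entry(IndexTuple itup)
{
	if (bloom_lacks_element(entry_filter, (unsigned char *) itup,
							IndexTupleSize(itup)))
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("heap entry is missing from the GIN entry tree")));
}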

GiST verification does a parent check despite taking only AccessShareLock.
This is possible because, when a key discrepancy is found, we acquire the
parent tuple with lock coupling. I'm sure it is correct to check keys this
way. And I'm almost sure it will not deadlock, because a split does the
same locking.

What do you think?

Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/message-id/flat/CAF3eApa07-BajjG8%2BRYx-Dr_cq28ZA0GsZmUQrGu5b2ayRhB5A%40mail.gmail.com
[1] 
https://www.postgresql.org/message-id/flat/9fdbb584-1e10-6a55-ecc2-9ba8b5dca1cf%40iki.fi#fec2751faf1ca52495b0a61acc0f5532


v10-0001-Amcheck-for-GIN-and-GiST.patch
Description: Binary data


Re: Perform streaming logical transactions by background workers and parallel apply

2022-05-30 Thread Amit Kapila
On Mon, May 30, 2022 at 2:22 PM wangw.f...@fujitsu.com
 wrote:
>
> Attach the new patches(only changed 0001 and 0002)
>

Few comments/suggestions for 0001 and 0003
=
0001

1.
+ else
+ snprintf(bgw.bgw_name, BGW_MAXLEN,
+ "logical replication apply worker for subscription %u", subid);

Can we slightly change the message to: "logical replication background
apply worker for subscription %u"?

2. Can we think of separating the new logic for applying the xact by
bgworker into a new file like applybgwroker or applyparallel? We have
previously done the same in the case of vacuum (see vacuumparallel.c).

3.
+ /*
+ * XXX The publisher side doesn't always send relation update messages
+ * after the streaming transaction, so update the relation in main
+ * apply worker here.
+ */
+ if (action == LOGICAL_REP_MSG_RELATION)
+ {
+ LogicalRepRelation *rel = logicalrep_read_rel(s);
+ logicalrep_relmap_update(rel);
+ }

I think the publisher side won't send the relation update message
after a streaming transaction only if it has already been sent for a
non-streaming transaction, in which case we don't need to update the
local cache here. This is as per my understanding of
maybe_send_schema(); do let me know if I am missing something. If my
understanding is correct, then we don't need this change.

4.
+ * For the main apply worker, if in streaming mode (receiving a block of
+ * streamed transaction), we send the data to the apply background worker.
  *
- * If in streaming mode (receiving a block of streamed transaction), we
- * simply redirect it to a file for the proper toplevel transaction.

This comment is slightly confusing. Can we change it to something
like: "In streaming case (receiving a block of streamed transaction),
for SUBSTREAM_ON mode, we simply redirect it to a file for the proper
toplevel transaction, and for SUBSTREAM_APPLY mode, we send the
changes to background apply worker."?

5.
+apply_handle_stream_abort(StringInfo s)
 {
...
...
+ /*
+ * If the two XIDs are the same, it's in fact abort of toplevel xact,
+ * so just free the subxactlist.
+ */
+ if (subxid == xid)
+ {
+ set_apply_error_context_xact(subxid, InvalidXLogRecPtr);

- fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDONLY,
- false);
+ AbortCurrentTransaction();

- buffer = palloc(BLCKSZ);
+ EndTransactionBlock(false);
+ CommitTransactionCommand();
+
+ in_remote_transaction = false;
...
...
}

Here, can we update the replication origin as we are doing in
apply_handle_rollback_prepared? Currently, we don't do it because we
are just cleaning up temporary files for which we don't even have a
transaction. Also, we don't have the required infrastructure to
advance origins for aborts as we have for abort prepared. See commits
[1eb6d6527a][8a812e5106]. If we think it is a good idea then I think
we need to send abort_lsn and abort_time from the publisher and we
need to be careful to make it work with lower subscriber versions that
don't have the facility to process these additional values.

0003

6.
+ /*
+ * If any unique index exist, check that they are same as remoterel.
+ */
+ if (!rel->sameunique)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot replicate relation with different unique index"),
+ errhint("Please change the streaming option to 'on' instead of 'apply'.")));

I think we can do better here. Instead of simply erroring out and
asking the user to change the streaming mode, we can remember this in
the system catalog, probably in pg_subscription, and then on restart, we
can change the streaming mode to 'on', perform the transaction, and
again change the streaming mode to apply. I am not sure whether we
want to do it in the first version or not, so if you agree with this,
developing it as a separate patch would be a good idea.

Also, please update comments here as to why we don't handle such cases.

-- 
With Regards,
Amit Kapila.




Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word

2022-05-30 Thread Michael Meskes
> I looked briefly at whether we could improve that situation.
> Two of the four uses of ECPGColLabelCommon in ecpg.trailer can be
> converted to the more general ECPGColLabel without difficulty,
> but trying to change either of the uses in var_type results in
> literally thousands of shift/reduce and reduce/reduce conflicts.
> This seems to be because what follows ecpgstart can be either a general
> SQL statement or an ECPGVarDeclaration (beginning with var_type),
> and bison isn't smart enough to disambiguate.  I have a feeling that
> this situation could be improved with enough elbow grease, because
> plpgsql manages to solve a closely-related problem in allowing its
> assignment statements to coexist with general SQL statements.  But
> I don't have the interest to tackle it personally, and anyway any
> fix would probably be more invasive than we want to put in post-beta.

Right, the reason for all this is that the part after the "exec sql" could be
either language, SQL or C. It has been like this for all those years. I do not
claim that the solution we have is the best, it's only the best I could come up
with when I implemented it. If anyone has a better way, please be my guest.

> I also wondered briefly about just changing the affected test cases,
> reasoning that breaking some ECPG applications that use "string" is
> less bad than breaking everybody else that uses "string".  But type
> "string" is apparently a standard ECPG and/or INFORMIX thing, so we
> probably can't get away with that.

IIRC STRING is a normal datatype for INFORMIX and thus is implemented in ECPG
for compatibility.

> Hence, I propose the attached, which fixes the easily-fixable
> ECPGColLabelCommon uses and adds a hard-wired special case for
> STRING to handle the var_type usage.

I'm fine with this approach.

Thanks Tom.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org




Re: Ignoring BRIN for HOT udpates seems broken

2022-05-30 Thread Matthias van de Meent
On Sat, 28 May 2022 at 22:51, Tomas Vondra
 wrote:
>
>
>
> On 5/28/22 21:24, Matthias van de Meent wrote:
> > On Sat, 28 May 2022 at 16:51, Tomas Vondra
> >  wrote:
> >>
> >> Hi,
> >>
> >> while working on some BRIN stuff, I realized (my) commit 5753d4ee320b
> >> ignoring BRIN indexes for HOT is likely broken. Consider this example:
> >>
> >> --
> >> CREATE TABLE t (a INT) WITH (fillfactor = 10);
> >>
> >> INSERT INTO t SELECT i
> >>   FROM generate_series(0,10) s(i);
> >>
> >> CREATE INDEX ON t USING BRIN (a);
> >>
> >> UPDATE t SET a = 0 WHERE random() < 0.01;
> >>
> >> SET enable_seqscan = off;
> >> EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) SELECT * FROM t WHERE a = 0;
> >>
> >> SET enable_seqscan = on;
> >> EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) SELECT * FROM t WHERE a = 0;
> >> --
> >>
> >> which unfortunately produces this:
> >>
> >>   QUERY PLAN
> >> ---
> >>  Bitmap Heap Scan on t (actual rows=23 loops=1)
> >>Recheck Cond: (a = 0)
> >>Rows Removed by Index Recheck: 2793
> >>Heap Blocks: lossy=128
> >>->  Bitmap Index Scan on t_a_idx (actual rows=1280 loops=1)
> >>  Index Cond: (a = 0)
> >>  Planning Time: 0.049 ms
> >>  Execution Time: 0.424 ms
> >> (8 rows)
> >>
> >> SET
> >>QUERY PLAN
> >> -
> >>  Seq Scan on t (actual rows=995 loops=1)
> >>Filter: (a = 0)
> >>Rows Removed by Filter: 99006
> >>  Planning Time: 0.027 ms
> >>  Execution Time: 7.670 ms
> >> (5 rows)
> >>
> >> That is, the index fails to return some of the rows :-(
> >>
> >> I don't remember the exact reasoning behind the commit, but the commit
> >> message justifies the change like this:
> >>
> >> There are no index pointers to individual tuples in BRIN, and the
> >> page range summary will be updated anyway as it relies on visibility
> >> info.
> >>
> >> AFAICS that's a misunderstanding of how BRIN uses visibility map, or
> >> rather does not use. In particular, bringetbitmap() does not look at the
> >> vm at all, so it'll produce incomplete bitmap.
> >>
> >> So it seems I made a boo boo here. Luckily, this is a PG15 commit, not a
> >> live issue. I don't quite see if this can be salvaged - I'll think about
> >> this a bit more, but it'll probably end with a revert.
> >
> > The principle of 'amhotblocking' for only blocking HOT updates seems
> > correct, except for the fact that the HOT flag bit is also used as a
> > way to block the propagation of new values to existing indexes.
> >
> > A better abstraction would be 'amSummarizes[Block]', in which updates
> > that modify only columns that are included only in summarizing indexes
> > still allow HOT, but will still see an update call to all (relevant?)
> > summarizing indexes. That should still improve performance
> > significantly for the relevant cases.
> >
>
> Yeah, I think that might/should work. We could still create the HOT
> chain, but we'd have to update the BRIN indexes. But that seems like a
> fairly complicated change to be done this late for PG15.

Here's an example patch for that (based on a branch derived from
master @ 5bb2b6ab). A nod to the authors of the pHOT patch, as that is
a related patch and was informative in how this could/should impact AM
APIs -- this is doing things similar (but not exactly the same) to
that by only updating select indexes.

Note that this is an ABI change in some critical places -- I'm not
sure it's OK to commit a fix like this into PG15 unless we really
don't want to revert 5753d4ee320b.

Also of note is that this still updates _all_ summarizing indexes, not
only those involved in the tuple update. Better performance is up to a
different implementation.
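
To make the API shift concrete, a hedged sketch of how an AM would
advertise the property; the amsummarizing field name follows this patch's
terminology, and the rest mirrors the existing brinhandler() boilerplate:

/*
 * Sketch: instead of "blocks HOT", BRIN declares itself a summarizing
 * AM, so HOT updates stay possible but the summary must be refreshed.
 */
Datum
brinhandler(PG_FUNCTION_ARGS)
{
	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

	amroutine->amsummarizing = true;	/* all-or-nothing per block range */
	/* ... all other fields as in src/backend/access/brin/brin.c ... */

	PG_RETURN_POINTER(amroutine);
}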

The patch includes a new regression test based on your example, which
fails on master but succeeds after applying the patch.

-Matthias
From fdcb789a52d6f63c035984da7006a276c7c946e1 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Mon, 30 May 2022 15:55:02 +0200
Subject: [PATCH v1] Rework 5753d4ee's amhotblocking infrastructure -- replaced
 with amsummarizing

Pain points of 'hot blocking': it is a strange name for an indexam-specific
API, as HOT is a heap-specific optimization that is not necessarily shared
across table AMs. The specific feature that allows HOT to still be applied
when there is an index on an updated column would be that the index provides
a 'summary of indexed values on page X'.

The specific feature for those indexes thus becomes whether the index is
'summarizing' the data of the block in a way that doesn't allow point tuple
lookups, such that the values returned by the AM are either all tuples of a
page, or no tuples of that page.

The new code in 5753d4ee320

Re: ParseTzFile doesn't FreeFile on error

2022-05-30 Thread Tom Lane
Kyotaro Horiguchi  writes:
> The cause is ParseTzFile() returns leaving an open file descriptor
> unfreed in some error cases.
> This happens only in a special case when the errors are ignored, but
> in principle the file descriptor should be released before exiting the
> function.
> I'm not sure it's worth fixing but the attached fixes that.

I agree this is worth fixing, but adding all these gotos seems a bit
inelegant.  What do you think of the attached version?

BTW, my first thought about it was "what if one of the callees throws
elog(ERROR), eg palloc out-of-memory"?  But I think that's all right
since then we'll reach transaction abort cleanup, which won't whine
about open files.  The problem is limited to the case where no error
gets thrown.

regards, tom lane

diff --git a/src/backend/utils/misc/tzparser.c b/src/backend/utils/misc/tzparser.c
index a69cb2d268..8f2c95f055 100644
--- a/src/backend/utils/misc/tzparser.c
+++ b/src/backend/utils/misc/tzparser.c
@@ -364,7 +364,8 @@ ParseTzFile(const char *filename, int depth,
 			{
 GUC_check_errmsg("could not read time zone file \"%s\": %m",
  filename);
-return -1;
+n = -1;
+break;
 			}
 			/* else we're at EOF after all */
 			break;
@@ -374,7 +375,8 @@ ParseTzFile(const char *filename, int depth,
 			/* the line is too long for tzbuf */
 			GUC_check_errmsg("line is too long in time zone file \"%s\", line %d",
 			 filename, lineno);
-			return -1;
+			n = -1;
+			break;
 		}
 
 		/* skip over whitespace */
@@ -397,12 +399,13 @@ ParseTzFile(const char *filename, int depth,
 			{
 GUC_check_errmsg("@INCLUDE without file name in time zone file \"%s\", line %d",
  filename, lineno);
-return -1;
+n = -1;
+break;
 			}
 			n = ParseTzFile(includeFile, depth + 1,
 			base, arraysize, n);
 			if (n < 0)
-return -1;
+break;
 			continue;
 		}
 
@@ -413,12 +416,18 @@ ParseTzFile(const char *filename, int depth,
 		}
 
 		if (!splitTzLine(filename, lineno, line, &tzentry))
-			return -1;
+		{
+			n = -1;
+			break;
+		}
 		if (!validateTzEntry(&tzentry))
-			return -1;
+		{
+			n = -1;
+			break;
+		}
 		n = addToArray(base, arraysize, n, &tzentry, override);
 		if (n < 0)
-			return -1;
+			break;
 	}
 
 	FreeFile(tzFile);


Re: Race conditions in 019_replslot_limit.pl

2022-05-30 Thread Andres Freund
Hi,

On 2022-03-27 22:37:34 -0700, Andres Freund wrote:
> On 2022-03-27 17:36:14 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > I still feel like there's something off here. But that's probably not 
> > > enough
> > > to keep causing failures. I'm inclined to leave the debugging in for a bit
> > > longer, but not fail the test anymore?
> > 
> > WFM.
> 
> I've done so now.

I did look over the test results a couple times since then and once more
today. There were a few cases with pretty significant numbers of iterations:

The highest is
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2022-04-07%2022%3A14%3A03
showing:
# multiple walsenders active in iteration 19

It's somewhat interesting that the worst case was just around the feature
freeze, where the load on my buildfarm animal boxes was higher than normal.


In comparison to earlier approaches, with the current in-tree approach, we
don't do anything when hitting the "problem", other than wait. Which does give
us additional information - afaics there's nothing at all indicating that some
other backend existed allowing the replication slot drop to finish.

It just looks like for reasons I still do not understand, removing a directory
and 2 files or so takes multiple seconds (at least ~36 new connections, 18
pg_usleep(100_100)), while there are no other indications of problems.

I also still don't have a theory why this suddenly started to happen.


Unless somebody has another idea, I'm planning to remove all the debugging
code added, but keep the retry based approach in 019_replslot_limit.pl, so we
don't again get all the spurious failures.

Greetings,

Andres Freund




Re: Ignoring BRIN for HOT udpates seems broken

2022-05-30 Thread Andres Freund
Hi,

On 2022-05-30 17:22:35 +0200, Matthias van de Meent wrote:
> > Yeah, I think that might/should work. We could still create the HOT
> > chain, but we'd have to update the BRIN indexes. But that seems like a
> > fairly complicated change to be done this late for PG15.
> 
> Here's an example patch for that (based on a branch derived from
> master @ 5bb2b6ab). A nod to the authors of the pHOT patch, as that is
> a related patch and was informative in how this could/should impact AM
> APIs -- this is doing things similar (but not exactly the same) to
> that by only updating select indexes.
> 
> Note that this is an ABI change in some critical places -- I'm not
> sure it's OK to commit a fix like this into PG15 unless we really
> don't want to revert 5753d4ee320b.
> 
> Also of note is that this still updates _all_ summarizing indexes, not
> only those involved in the tuple update. Better performance is up to a
> different implementation.
> 
> The patch includes a new regression test based on your example, which
> fails on master but succeeds after applying the patch.

This seems like a pretty clear cut case for reverting and retrying in
16. There's plenty subtlety in this area (as evidenced by this thread and the
index/reindex concurrently breakage), and building infrastructure post beta1
isn't exactly conducive to careful analysis and testing.

Greetings,

Andres Freund




Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word

2022-05-30 Thread Tom Lane
Michael Meskes  writes:
>> This seems to be because what follows ecpgstart can be either a general
>> SQL statement or an ECPGVarDeclaration (beginning with var_type),
>> and bison isn't smart enough to disambiguate.  I have a feeling that
>> this situation could be improved with enough elbow grease, because
>> plpgsql manages to solve a closely-related problem in allowing its
>> assignment statements to coexist with general SQL statements.

> Right, the reason for all this is that the part after the "exec sql" could be
> either language, SQL or C. It has been like this for all those years. I do not
> claim that the solution we have is the best, it's only the best I could come 
> up
> with when I implemented it. If anyone has a better way, please be my guest.

I pushed the proposed patch, but after continuing to think about
it I have an idea for a possible solution to the older problem.
I noticed that the problematic cases in var_type aren't really
interested in seeing any possible unreserved keyword: they care about
certain specific built-in type names (most of which are keywords
already) and about typedef names.  Now, almost every C-parsing program
I've ever seen has to lex typedef names specially, so what if we made
ecpg do that too?  After a couple of false starts, I came up with the
attached draft patch.  The key ideas are:

1. In pgc.l, if an identifier is a typedef name, ignore any possible
keyword meaning and return it as an IDENT.  (I'd originally supposed
that we'd want to return some new TYPEDEF token type, but that does
not seem to be necessary right now, and adding a new token type would
increase the patch footprint quite a bit.)

2. In the var_type production, forget about ECPGColLabel[Common]
and just handle the keywords we know we need, plus IDENT for the
typedef case.  It turns out that we have to have duplicate coding
because most of these keywords are not keywords in C lexing mode,
so that they'll come through the IDENT path anyway when we're
in a C rather than SQL context.  That seemed acceptable to me.
I thought about adding them all to the C keywords list but that
seemed likely to have undesirable side-effects, and again it'd
bloat the patch footprint.
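
A hedged sketch of idea 1, as a pgc.l rule in SQL mode; the typedef lookup
(lookup_typedef) is a hypothetical stand-in, while ScanECPGKeywordLookup()
is the existing keyword-lookup routine:

{identifier}	{
					int			kwvalue;

					/* a known typedef name shadows any SQL keyword */
					if (lookup_typedef(yytext) != NULL)
						return IDENT;

					kwvalue = ScanECPGKeywordLookup(yytext);
					if (kwvalue >= 0)
						return kwvalue;

					return IDENT;	/* ordinary identifier */
				}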

This fix is not without downsides.  Disabling recognition of
keywords that match typedefs means that, for example, if you
declare a typedef named "work" then ECPG will fail to parse
"EXEC SQL BEGIN WORK".  So in a real sense this is just trading
one hazard for another.  But there is an important difference:
with this, whether your ECPG program works depends only on what
typedef names and SQL commands are used in the program.  If
it compiles today it'll still compile next year, whereas with
the present implementation the addition of some new unreserved
SQL keyword could break it.  We'd have to document this change
for sure, and it wouldn't be something to back-patch, but it
seems like it might be acceptable from the users' standpoint.

We could narrow (not eliminate) this hazard if we could get the
typedef lookup in pgc.l to happen only when we're about to parse
a var_type construct.  But because of Bison's lookahead behavior,
that seems to be impossible, or at least undesirably messy
and fragile.  But perhaps somebody else will see a way.

Anyway, this seems like too big a change to consider for v15,
so I'll stick this patch into the v16 CF queue.  It's only
draft quality anyway --- lacks documentation changes and test
cases.  There are also some coding points that could use review.
Notably, I made the typedef lookup override SQL keywords but
not C keywords; this is for consistency with the C-mode lookup
rules, but is it the right thing?

regards, tom lane

diff --git a/src/interfaces/ecpg/preproc/ecpg.trailer b/src/interfaces/ecpg/preproc/ecpg.trailer
index b95fc44314..fba35f6be6 100644
--- a/src/interfaces/ecpg/preproc/ecpg.trailer
+++ b/src/interfaces/ecpg/preproc/ecpg.trailer
@@ -564,8 +564,29 @@ var_type:	simple_type
 			$$.type_index = mm_strdup("-1");
 			$$.type_sizeof = NULL;
 		}
-		| ECPGColLabelCommon '(' precision opt_scale ')'
+		| NUMERIC '(' precision opt_scale ')'
 		{
+			$$.type_enum = ECPGt_numeric;
+			$$.type_str = mm_strdup("numeric");
+			$$.type_dimension = mm_strdup("-1");
+			$$.type_index = mm_strdup("-1");
+			$$.type_sizeof = NULL;
+		}
+		| DECIMAL_P '(' precision opt_scale ')'
+		{
+			$$.type_enum = ECPGt_decimal;
+			$$.type_str = mm_strdup("decimal");
+			$$.type_dimension = mm_strdup("-1");
+			$$.type_index = mm_strdup("-1");
+			$$.type_sizeof = NULL;
+		}
+		| IDENT '(' precision opt_scale ')'
+		{
+			/*
+			 * In C parsing mode, NUMERIC and DECIMAL are not keywords, so
+			 * they will show up here as a plain identifier, and we need
+			 * this duplicate code to recognize them.
+			 */
 			if (strcmp($1, "numeric") == 0)
 			{
 $$.type_enum = ECPGt_numeric;
@@ -587,15 +608,98 @@ var_type:	simple_type
 			$$.type_index = mm_strdup("-1");
 			$$.type_sizeof = NULL;
 		}

Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-30 Thread Peter Geoghegan
On Mon, May 30, 2022 at 2:37 PM David Rowley  wrote:
> The results compare PG14 @ 0adff38d against master @ b3fb16e8b.  In
> the chart, anything below 100% is a performance improvement over PG14
> and anything above 100% means PG15 is slower.  You can see there's
> only the 64-byte / 64MB work_mem test that gets significantly slower
> and that there are only a small amount of other tests that are
> slightly slower.  Most are faster and on average PG15 takes 90% of the
> time that PG14 took.

Shouldn't this be using the geometric mean rather than the arithmetic
mean? That's pretty standard practice when summarizing a set of
benchmark results that are expressed as ratios to some baseline.

If I tweak your spreadsheet to use the geometric mean, the patch looks
slightly better -- 89%.
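
For reference, the geometric mean of n ratios is exp of the mean of their
logs, which keeps the summary symmetric when you invert the baseline; a
tiny self-contained sketch:

#include <math.h>

/* Geometric mean of n benchmark ratios: exp(mean(log(r_i))). */
static double
geometric_mean(const double *ratios, int n)
{
	double		logsum = 0.0;

	for (int i = 0; i < n; i++)
		logsum += log(ratios[i]);

	return exp(logsum / n);
}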

-- 
Peter Geoghegan




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-30 Thread David Rowley
On Tue, 31 May 2022 at 09:48, Peter Geoghegan  wrote:
> Shouldn't this be using the geometric mean rather than the arithmetic
> mean? That's pretty standard practice when summarizing a set of
> benchmark results that are expressed as ratios to some baseline.

Maybe just comparing the SUM of the seconds of each version is the
best way.  That comes out to 86.6%.

David




Re: "ERROR: latch already owned" on gharial

2022-05-30 Thread Thomas Munro
On Sat, May 28, 2022 at 8:11 AM Tom Lane  wrote:
> Robert Haas  writes:
> > On Fri, May 27, 2022 at 10:21 AM Tom Lane  wrote:
> >> What I'd suggest is to promote that failure to elog(PANIC), which
> >> would at least give us the PID and if we're lucky a stack trace.
>
> > That proposed change is fine with me.
>
> > As to the question of whether it's a real bug, nobody can prove
> > anything unless we actually run it down.
>
> Agreed, and I'll even grant your point that if it is an HPUX-specific
> or IA64-specific bug, it is not worth spending huge amounts of time
> to isolate.  The problem is that we don't know that.  What we do know
> so far is that if it can occur elsewhere, it's rare --- so we'd better
> be prepared to glean as much info as possible if we do get such a
> failure.  Hence my thought of s/ERROR/PANIC/.  And I'd be in favor of
> any other low-effort change we can make to instrument the case better.

OK, pushed (except I realised that all the PIDs involved were int, not
pid_t).  Let's see...




Re: ParseTzFile doesn't FreeFile on error

2022-05-30 Thread Kyotaro Horiguchi
At Mon, 30 May 2022 13:11:04 -0400, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > The cause is ParseTzFile() returns leaving an open file descriptor
> > unfreed in some error cases.
> > This happens only in a special case when the errors are ignored, but
> > in principle the file descriptor should be released before exiting the
> > function.
> > I'm not sure it's worth fixing but the attached fixes that.
> 
> I agree this is worth fixing, but adding all these gotos seems a bit
> inelegant.  What do you think of the attached version?

It is what first came to mind for me, and it is natural, so I'm fine with
it. The point of the "goto"s was that the repeated "n = -1; break;" looked
somewhat noisy to me in the loop.

> BTW, my first thought about it was "what if one of the callees throws
> elog(ERROR), eg palloc out-of-memory"?  But I think that's all right
> since then we'll reach transaction abort cleanup, which won't whine
> about open files.  The problem is limited to the case where no error
> gets thrown.

Right. This "issue" is not a problem unless the caller continues
without throwing an exception after the function errors out, which is
not done by the current code.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: "ERROR: latch already owned" on gharial

2022-05-30 Thread Thomas Munro
On Sat, May 28, 2022 at 1:56 AM Robert Haas  wrote:
> What I'm inclined to do is get gharial and anole removed from the
> buildfarm. anole was set up by Heikki in 2011. I don't know when
> gharial was set up, or by whom. I don't think anyone at EDB cares
> about these machines any more, or has any interest in maintaining
> them. I think the only reason they're still running is that, just by
> good fortune, they haven't fallen over and died yet. The hardest part
> of getting them taken out of the buildfarm is likely to be finding
> someone who has a working username and password to log into them and
> take the jobs out of the crontab.

FWIW, in a previous investigation, Semab and Sandeep had access:

https://www.postgresql.org/message-id/CABimMB4mRs9N3eivR-%3DqF9M8oWc5E6OX7GywsWF0DXN4P5gNEA%40mail.gmail.com




Re: Remove useless tests about TRUNCATE on foreign table

2022-05-30 Thread Michael Paquier
On Mon, May 30, 2022 at 05:08:10PM +0900, Michael Paquier wrote:
> Partitions have also some coverage as far as I can see, so I agree
> that it makes little sense to keep the tests you are removing here.

And done as of 0efa513.
--
Michael




Re: Ignore heap rewrites for materialized views in logical replication

2022-05-30 Thread Euler Taveira
On Sat, May 28, 2022, at 7:07 AM, Amit Kapila wrote:
> I agree with your analysis and the fix looks correct to me.
Thanks for checking.

> Instead of waiting for an error, we can try to insert into a new table
> created by the test case after the 'Refresh ..' command and wait for
> the change to be replicated by using wait_for_caught_up.
That's a good idea. [modifying the test...] I used the same table. Whenever the
new row arrives on the subscriber or it reads that error message, it bails out.

> Let's try to see if we can simplify the test so that it can be
> committed along with a fix. If we are not able to find any reasonable
> way then we can think of skipping it.
The new test is attached.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From fe13dc44f210b5f14b97bf9a29a242c2211ba700 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Fri, 27 May 2022 11:35:27 -0300
Subject: [PATCH v2 1/2] Ignore heap rewrites for materialized views in logical
 replication

If you have a publication that specifies FOR ALL TABLES clause, a REFRESH
MATERIALIZED VIEW can break your setup with the following message

ERROR:  logical replication target relation "public.pg_temp_NNN" does not exist

Commit 1a499c2520 introduced a heuristic to prevent transient heaps (due to
table rewrites) from being included in a FOR ALL TABLES publication. However,
it forgot to exclude materialized views (a heap rewrite occurs when you
execute REFRESH MATERIALIZED VIEW). Since materialized views are not
supported in logical decoding, it is appropriate to filter them too.

As explained in the commit message, a more robust solution was included in
version 11 (commit 325f2ec555), hence only version 10 is affected.
---
 src/backend/replication/pgoutput/pgoutput.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f96fde3df0..e03ff4a11e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -540,7 +540,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 if (sscanf(relname, "pg_temp_%u%n", &u, &n) == 1 &&
 	relname[n] == '\0')
 {
-	if (get_rel_relkind(u) == RELKIND_RELATION)
+	char	relkind = get_rel_relkind(u);
+
+	if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
 		break;
 }
 			}
-- 
2.30.2

From 04b6b6566f4cf4ddc8cf3ddc51420e93c599024d Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Fri, 27 May 2022 14:43:10 -0300
Subject: [PATCH v2 2/2] test: heap rewrite for materialized view in logical
 replication

---
 src/test/subscription/t/006_rewrite.pl | 35 +-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/src/test/subscription/t/006_rewrite.pl b/src/test/subscription/t/006_rewrite.pl
index 5e3211aefa..27eb6a1cb7 100644
--- a/src/test/subscription/t/006_rewrite.pl
+++ b/src/test/subscription/t/006_rewrite.pl
@@ -3,7 +3,8 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Test::More tests => 3;
 
 sub wait_for_caught_up
 {
@@ -26,6 +27,13 @@ my $ddl = "CREATE TABLE test1 (a int, b text);";
 $node_publisher->safe_psql('postgres', $ddl);
 $node_subscriber->safe_psql('postgres', $ddl);
 
+$ddl = "CREATE TABLE test2 (a int, b text);";
+$node_publisher->safe_psql('postgres', $ddl);
+$node_subscriber->safe_psql('postgres', $ddl);
+
+$node_publisher->safe_psql('postgres', q{INSERT INTO test2 (a, b) VALUES (10, 'ten'), (20, 'twenty');});
+$node_publisher->safe_psql('postgres', 'CREATE MATERIALIZED VIEW test3 AS SELECT a, b FROM test2;');
+
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
 my $appname   = 'encoding_test';
 
@@ -69,5 +77,30 @@ is($node_subscriber->safe_psql('postgres', q{SELECT a, b, c FROM test1}),
 3|three|33),
'data replicated to subscriber');
 
+$node_publisher->safe_psql('postgres', 'REFRESH MATERIALIZED VIEW test3;');
+
+# an additional row to check if the REFRESH worked
+$node_publisher->safe_psql('postgres', q{INSERT INTO test2 (a, b) VALUES (30, 'thirty');});
+
+my $max_attempts = 10 * $TestLib::timeout_default;
+my $attempts = 0;
+my $ret  = 0;
+my $errmsg = qr/logical replication target relation.*does not exist/;
+while ($attempts < $max_attempts)
+{
+	# Succeed. Bail out.
+	if ($node_subscriber->safe_psql('postgres', 'SELECT COUNT(1) FROM test2') == 3) {
+		$ret = 1;
+		last;
+	}
+
+	# Failed. Bail out.
+	last if (slurp_file($node_subscriber->logfile) =~ $errmsg);
+
+	usleep(100_000);
+	$attempts++;
+}
+is($ret, 1, 'heap rewrite for materialized view on subscriber');
+
 $node_subscriber->stop;
 $node_publisher->stop;
-- 
2.30.2



Re: Race conditions in 019_replslot_limit.pl

2022-05-30 Thread Kyotaro Horiguchi
At Mon, 30 May 2022 12:01:55 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2022-03-27 22:37:34 -0700, Andres Freund wrote:
> > On 2022-03-27 17:36:14 -0400, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > I still feel like there's something off here. But that's probably not 
> > > > enough
> > > > to keep causing failures. I'm inclined to leave the debugging in for a 
> > > > bit
> > > > longer, but not fail the test anymore?
> > > 
> > > WFM.
> > 
> > I've done so now.
> 
> I did look over the test results a couple times since then and once more
> today. There were a few cases with pretty significant numbers of iterations:
> 
> The highest is
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2022-04-07%2022%3A14%3A03
> showing:
> # multiple walsenders active in iteration 19
> 
> It's somewhat interesting that the worst case was just around the feature
> freeze, where the load on my buildfarm animal boxes was higher than normal.

If the disk is too busy, CheckPointReplicationSlots may take a very long time.

> In comparison to earlier approaches, with the current in-tree approach, we
> don't do anything when hitting the "problem", other than wait. Which does give
> us additional information - afaics there's nothing at all indicating that some
> other backend existed allowing the replication slot drop to finish.

preventing?  The checkpointer and a client backend that ran "SELECT *
FROM pg_stat_activity" are the only processes running during the
blocking state.

> It just looks like for reasons I still do not understand, removing a directory
> and 2 files or so takes multiple seconds (at least ~36 new connections, 18
> pg_usleep(100_100)), while there are no other indications of problems.

That fact supports the theory that CheckPointReplicationSlots took a long time.

> I also still don't have a theory why this suddenly started to happen.

Maybe we need to see the OS-wide disk load at that time.  Couldn't a
compiler or other non-postgres tools have put significant load on the
disks?

> Unless somebody has another idea, I'm planning to remove all the debugging
> code added, but keep the retry based approach in 019_replslot_limit.pl, so we
> don't again get all the spurious failures.

+1.  

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Prevent writes on large objects in read-only transactions

2022-05-30 Thread Michael Paquier
On Mon, May 30, 2022 at 05:44:18PM +0900, Yugo NAGATA wrote:
> As to the error messages during recovery, I think it is better to improve
> them, because the current messages are emitted by elog(), which implies
> they are internal errors that we didn't expect.  I attached an updated
> patch for it.

Yeah, elog() messages should never be user-facing as they refer to
internal errors, and any of those errors are rather deep in the tree
while being unexpected.

lo_write() is published in be-fsstubs.h, though the core backend code
has no callers of it.  Couldn't there be a point in having the recovery
protection there rather than in the upper SQL routine be_lowrite()?  At
the end, we would likely generate a failure when attempting to insert
the LO data into the catalogs through inv_api.c, but I was wondering if
we should make an extra effort to improve the report in this case as
well, for any direct caller of this LO write routine.  The final
picture might be better if we made lo_write() a routine static to
be-fsstubs.c, but it has been available for ages, so I'd rather leave
it declared as-is.
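
To illustrate the placement question, here is a rough sketch with the
guard kept at the SQL-callable layer (hypothetical code, not the patch
itself; PreventCommandIfReadOnly() is the existing helper from
tcop/utility.h, and the body mirrors what be_lowrite() does today):

#include "postgres.h"

#include "libpq/be-fsstubs.h"
#include "tcop/utility.h"

/*
 * Hypothetical variant of be_lowrite() with a read-only-transaction
 * guard.  Whether the check belongs here or down in lo_write() is
 * exactly the question above.
 */
Datum
be_lowrite(PG_FUNCTION_ARGS)
{
	int			fd = PG_GETARG_INT32(0);
	bytea	   *wbuf = PG_GETARG_BYTEA_PP(1);
	int			bytestowrite;
	int			totalwritten;

	PreventCommandIfReadOnly("lowrite()");

	bytestowrite = VARSIZE_ANY_EXHDR(wbuf);
	totalwritten = lo_write(fd, VARDATA_ANY(wbuf), bytestowrite);
	PG_RETURN_INT32(totalwritten);
}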

libpq fetches the OIDs of the large object functions and caches them
for PQfn() as far as I can see, so it is fine by me to have the
protections in be-fsstubs.c, letting inv_api.c deal with the internals
of the catalogs, ACL checks, etc.  Should we complain on lo_open()
with the write mode though?

The change for lo_truncate_internal() is a bit confusing for the
64-bit version, as we would complain about lo_truncate() and not
lo_truncate64().

While on it, could we remove -DFSDB?
--
Michael


signature.asc
Description: PGP signature


Re: Prevent writes on large objects in read-only transactions

2022-05-30 Thread Michael Paquier
On Fri, May 27, 2022 at 02:02:24PM +0200, Laurenz Albe wrote:
> On Fri, 2022-05-27 at 15:30 +0900, Yugo NAGATA wrote:
>> Currently, lo_creat(e), lo_import, lo_unlink, lowrite, lo_put,
>> and lo_from_bytea are allowed even in read-only transactions.
>> By using them, pg_largeobject and pg_largeobject_metadata can
>> be modified in read-only transactions, and the effect remains
>> after the transaction has finished.  This is unacceptable
>> behaviour, isn't it?
> 
> +1

And I have forgotten to add your name as a reviewer.  Sorry about
that!
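
For the archives, a minimal reproduction of the complaint (my own
example; on an unpatched server every step succeeds):

BEGIN TRANSACTION READ ONLY;
SELECT lo_create(42);       -- writes to pg_largeobject_metadata
SELECT lo_put(42, 0, 'x');  -- writes to pg_largeobject
SELECT lo_unlink(42);       -- removes it again, still no error
COMMIT;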
--
Michael


signature.asc
Description: PGP signature


Failures in sto_using_cursor test

2022-05-30 Thread Thomas Munro
Hi,

Here are some failures in the test sto_using_cursor, on the 12, 13 and
HEAD branches:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2020-03-15%2023:18:30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-03-03%2005:59:30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-20%2020:49:17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-08-10%2004:47:08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-09-16%2002:15:25
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2022-05-16%2010:21:03

Unfortunately the build farm doesn't capture regression.diffs.  I
recall that a problem like that was fixed at some point in the BF
code, but peripatus is running the latest tag (REL_14) so I'm not sure
why.

That reminds me: there was a discussion about whether the whole STO
feature should be deprecated/removed[1], starting from Andres's
complaints about correctness and testability problems in STO while
hacking on snapshot scalability.  I'd be happy to come up with the
patch for that, if we decide to do that, perhaps when the tree opens
for 16?

Or perhaps someone wants to try to address the remaining known issues
with it?  Robert fixed some stuff, and I had some more patches[2] that
tried to fix a few more things relating to testability and a
wraparound problem, not committed.  I'd happily rebase them with a
view to getting them in, but only if there is interest in reviewing
them and really trying to save this design.  There were more problems
though: there isn't a systematic way to make sure that
TestForOldSnapshot() is in all the right places, and IIRC there are
some known mistakes?  And maybe more.

[1] https://www.postgresql.org/message-id/flat/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de
[2] https://www.postgresql.org/message-id/CA%2BhUKGJyw%3DuJ4eL1x%3D%2BvKm16fLaxNPvKUYtnChnRkSKi024u_A%40mail.gmail.com




Re: convert libpq uri-regress tests to tap test

2022-05-30 Thread Michael Paquier
On Sun, May 29, 2022 at 10:18:50AM -0500, Justin Pryzby wrote:
> On Sat, Feb 26, 2022 at 05:46:26PM -0800, Andres Freund wrote:
> > On 2022-02-25 17:52:29 -0800, Andres Freund wrote:
> > > I'd like to commit 0001 and 0002 soon, unless somebody sees a reason not 
> > > to?
> > 
> > Pushed.
> 
> If I'm not wrong, this isn't being run by check-world.

You are right.

> -$(call recurse,check-world,src/test src/pl src/interfaces/ecpg contrib src/bin,check)
> -$(call recurse,checkprep,  src/test src/pl src/interfaces/ecpg contrib src/bin)
> +$(call recurse,check-world,src/test src/pl src/interfaces/ecpg src/interfaces/libpq contrib src/bin,check)
> +$(call recurse,checkprep,  src/test src/pl src/interfaces/ecpg src/interfaces/libpq contrib src/bin)
>  
> -$(call recurse,installcheck-world,src/test src/pl src/interfaces/ecpg contrib src/bin,installcheck)
> +$(call recurse,installcheck-world,src/test src/pl src/interfaces/ecpg src/interfaces/libpq contrib src/bin,installcheck)
>  $(call recurse,install-tests,src/test/regress,install-tests)

Why don't you just use src/interfaces/ instead of adding a direct
path to libpq?
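
I mean something like the following (a sketch, assuming the recursion
into src/interfaces descends into ecpg and libpq alike):

$(call recurse,check-world,src/test src/pl src/interfaces contrib src/bin,check)
$(call recurse,checkprep,  src/test src/pl src/interfaces contrib src/bin)

$(call recurse,installcheck-world,src/test src/pl src/interfaces contrib src/bin,installcheck)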
--
Michael


signature.asc
Description: PGP signature


Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-30 Thread Kyotaro Horiguchi
At Fri, 20 May 2022 12:00:14 +0900 (JST), Kyotaro Horiguchi  wrote in 
> At Thu, 19 May 2022 17:16:03 -0400, Tom Lane  wrote in 
> > Justin Pryzby  writes:
> > > ./tmp_install/usr/local/pgsql/bin/postgres -D ./src/test/regress/tmp_check/data -c min_dynamic_shared_memory=1MB
> > > TRAP: FailedAssertion("val > base", File: "../../../../src/include/utils/relptr.h", Line: 67, PID: 21912)
> > 
> > Yeah, I see it too.
> > 
> > > It looks like this may be a pre-existing problem exposed by
> > > commit e07d4ddc55fdcf82082950b3eb0cd8f728284c9d
> > 
> > Agreed.  Here I see
> > 
> > #5  FreePageManagerInitialize (fpm=fpm@entry=0x7f34b3ddd300, base=base@entry=0x7f34b3ddd300 "") at freepage.c:187
> > #6  0x0082423e in dsm_shmem_init () at dsm.c:473
> > 
> > so that where we do
> > 
> > relptr_store(base, fpm->self, fpm);
> > 
> > the "relative" pointer value would have to be zero, making the case
> > indistinguishable from a NULL pointer.  We can either change the
> > caller so that these addresses aren't the same, or give up the
> > ability to store NULL in relptrs ... doesn't seem like a hard call.
> > 
> > One interesting question I didn't look into is why it takes a nondefault
> > value of min_dynamic_shared_memory to expose this bug.
> 
> The path is taken only when a valid value is given to the
> parameter.  If we don't use preallocated dsm, it is initialized
> elsewhere.  In those cases the first bytes of the base address (the
> second parameter of FreePageManagerInitialize) are used for the
> dsa_segment_header, so the relptr won't be zero (!= NULL).
> 
> It can be silenced by wasting the first MAXALIGN bytes of
> dsm_main_space_begin.

Actually, that change doesn't waste any usable memory, since it
doesn't move the first effective page.
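
To make the failure mode concrete, here is a minimal standalone model
of the encoding (my own sketch; the real macros in
src/include/utils/relptr.h differ in detail):

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Model of PostgreSQL's relative-pointer encoding: the stored value is
 * the byte offset from a base address, and offset 0 is reserved to
 * mean NULL.
 */
typedef struct
{
	size_t		relptr_off;
} relptr_model;

static void
relptr_store_model(char *base, relptr_model *rp, void *val)
{
	if (val == NULL)
		rp->relptr_off = 0;
	else
	{
		/*
		 * If val == base, the offset would be 0 and the pointer would
		 * later decode as NULL; this is the "val > base" assertion
		 * that fires when the FreePageManager sits at the very start
		 * of the preallocated DSM space.
		 */
		assert((char *) val > base);
		rp->relptr_off = (size_t) ((char *) val - base);
	}
}

int
main(void)
{
	char		segment[64];
	relptr_model rp;

	/* Fine: the stored object starts after the segment base. */
	relptr_store_model(segment, &rp, segment + 8);
	printf("offset = %zu\n", rp.relptr_off);

	/* Fires the assertion: an object at the base would decode as NULL. */
	relptr_store_model(segment, &rp, segment);
	return 0;
}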

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 9d86bbe872..c67134ec27 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -461,12 +461,19 @@ dsm_shmem_init(void)
 	dsm_main_space_begin = ShmemInitStruct("Preallocated DSM", size, &found);
 	if (!found)
 	{
-		FreePageManager *fpm = (FreePageManager *) dsm_main_space_begin;
+		/*
+		 * fpm cannot be the same as dsm_main_space_begin because of the
+		 * relptr restriction imposed by FreePageManagerInitialize().  Add
+		 * MAXALIGN(1) to fpm to live with that restriction.
+		 */
+		int			gap = MAXALIGN(1);
+		FreePageManager *fpm =
+			(FreePageManager *) ((char *)dsm_main_space_begin + gap);
 		size_t		first_page = 0;
 		size_t		pages;
 
 		/* Reserve space for the FreePageManager. */
-		while (first_page * FPM_PAGE_SIZE < sizeof(FreePageManager))
+		while (first_page * FPM_PAGE_SIZE < sizeof(FreePageManager) + gap)
 			++first_page;
 
 		/* Initialize it and give it all the rest of the space. */
diff --git a/src/test/modules/test_misc/t/004_check_min_dsm_size.pl b/src/test/modules/test_misc/t/004_check_min_dsm_size.pl
new file mode 100644
index 00..025dca33e4
--- /dev/null
+++ b/src/test/modules/test_misc/t/004_check_min_dsm_size.pl
@@ -0,0 +1,14 @@
+# Tests of min_dynamic_shared_memory
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf('postgresql.conf', 'min_dynamic_shared_memory = 1MB');
+$node->start;
+pass('check if server can start with min_dynamic_shared_memory');
+done_testing();


Re: Skipping schema changes in publication

2022-05-30 Thread Peter Smith
Here are my review comments for patch v7-0002.

==

1. doc/src/sgml/logical-replication.sgml

@@ -1167,8 +1167,9 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
   
To add tables to a publication, the user must have ownership rights on the
table. To add all tables in schema to a publication, the user must be a
-   superuser. To create a publication that publishes all tables or all tables in
-   schema automatically, the user must be a superuser.
+   superuser. To add all tables to a publication, the user must be a superuser.
+   To create a publication that publishes all tables or all tables in schema
+   automatically, the user must be a superuser.
   

I felt that maybe this whole paragraph should be rearranged: put the
"create publication" parts before the "alter publication" parts, and
re-word the sentences more similarly. I also felt that ALL TABLES, ALL
TABLES IN SCHEMA, etc. should be written uppercase/literal, since that
is what is meant.

SUGGESTION
To create a publication using FOR ALL TABLES or FOR ALL TABLES IN
SCHEMA, the user must be a superuser. To add ALL TABLES or ALL TABLES
IN SCHEMA to a publication, the user must be a superuser. To add
tables to a publication, the user must have ownership rights on the
table.

~~~

2. doc/src/sgml/ref/alter_publication.sgml

@@ -82,8 +88,8 @@ ALTER PUBLICATION name RESET

   
You must own the publication to use ALTER PUBLICATION.
-   Adding a table to a publication additionally requires owning that table.
-   The ADD ALL TABLES IN SCHEMA,
+   Adding a table to or excluding a table from a publication additionally
+   requires owning that table. The ADD ALL TABLES IN SCHEMA,
SET ALL TABLES IN SCHEMA to a publication and

Isn't this missing some information that says ADD ALL TABLES requires
the invoking user to be a superuser?

~~~

3. doc/src/sgml/ref/alter_publication.sgml - examples

+  
+   Alter publication production_publication to publish
+   all tables except users and
+   departments tables:
+
+ALTER PUBLICATION production_publication ADD ALL TABLES EXCEPT users, departments;
+
+

I didn't think it needed to say "tables" twice (e.g. remove the last "tables").

~~~

4. doc/src/sgml/ref/create_publication.sgml - examples

+  
+   Create a publication that publishes all changes in all the tables except for
+   the changes of users and
+   departments tables:
+
+CREATE PUBLICATION mypublication FOR ALL TABLES EXCEPT users, departments;
+
+  

I didn't think it needed to say "tables" twice (e.g. remove the last "tables").

~~~

5. src/backend/catalog/pg_publication.c

 	foreach(lc, ancestors)
 	{
 		Oid			ancestor = lfirst_oid(lc);
-		List	   *apubids = GetRelationPublications(ancestor);
-		List	   *aschemaPubids = NIL;
+		List	   *apubids = GetRelationPublications(ancestor, false);
+		List	   *aschemapubids = NIL;
+		List	   *aexceptpubids = NIL;

 		level++;

-		if (list_member_oid(apubids, puboid))
+		/* check if member of table publications */
+		if (!list_member_oid(apubids, puboid))
 		{
-			topmost_relid = ancestor;
-
-			if (ancestor_level)
-				*ancestor_level = level;
-		}
-		else
-		{
-			aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
-			if (list_member_oid(aschemaPubids, puboid))
+			/* check if member of schema publications */
+			aschemapubids = GetSchemaPublications(get_rel_namespace(ancestor));
+			if (!list_member_oid(aschemapubids, puboid))
 			{
-				topmost_relid = ancestor;
-
-				if (ancestor_level)
-					*ancestor_level = level;
+				/*
+				 * If the publication is all tables publication and the table
+				 * is not part of exception tables.
+				 */
+				if (puballtables)
+				{
+					aexceptpubids = GetRelationPublications(ancestor, true);
+					if (list_member_oid(aexceptpubids, puboid))
+						goto next;
+				}
+				else
+					goto next;
 			}
 		}

+		topmost_relid = ancestor;
+
+		if (ancestor_level)
+			*ancestor_level = level;
+
+next:
 		list_free(apubids);
-		list_free(aschemaPubids);
+		list_free(aschemapubids);
+		list_free(aexceptpubids);
 	}


I felt those negated (!) conditions and those gotos make this logic
hard to understand. Can’t it be simplified more than this? Even just
having another bool flag might make it easier.

e.g. Perhaps something a bit like this (but add some comments)

foreach(lc, ancestors)
{
	Oid			ancestor = lfirst_oid(lc);
	List	   *apubids = GetRelationPublications(ancestor);
	List	   *aschemaPubids = NIL;
	List	   *aexceptpubids = NIL;
	bool		set_top = false;

	level++;

	set_top = list_member_oid(apubids, puboid);
	if (!set_top)
	{
		aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
		set_top = list_member_oid(aschemaPubids, puboid);

		if (!set_top && puballtables)
		{
			aexceptpubids = GetRelationPublications(ancestor, true);
			set_top = !list_member_oid(aexceptpubids, puboid);
		}
	}
	if (set_top)
	{
		topmost_relid = ancestor;

		if (ancestor_level)
			*ancestor_level = level;
	}

	list_free(apubids);
	list_free(aschemaPubids);
	list_free(aexceptpubids);
}

--

6. src/backend/commands/publicationcmds.c - CheckPublicationDefValue