Re: bogus: logical replication rows/cols combinations
On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera wrote:
>
> On 2022-May-02, Amit Kapila wrote:
> >
> > I think it is possible to expose a list of publications for each
> > walsender as it is stored in each walsender's
> > LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
> > can have one such LogicalDecodingContext and we can probably share it
> > via shared memory?
>
> I guess we need to create a DSM each time a walsender opens a
> connection, at START_REPLICATION time.  Then ALTER PUBLICATION needs to
> connect to all DSMs of all running walsenders and see if they are
> reading from it.  Is that what you have in mind?  Alternatively, we
> could have one DSM per publication with a PID array of all walsenders
> that are sending it (each walsender needs to add its PID as it starts).
> The latter might be better.
>

While thinking about using DSM here, I came across one of your commits,
f2f9fcb303, which seems to indicate that it is not a good idea to rely on
DSM. However, I think you changed dynamic shared memory to fixed shared
memory usage there because fixed shared memory was more suitable, not
because DSM is unportable: I see a later commit, bcbd940806, where we
removed the 'none' option of dynamic_shared_memory_type. So, I think it
should be okay to use DSM in this context. What do you think?

--
With Regards,
Amit Kapila.
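For illustration, a minimal sketch of the "one DSM per publication with a PID array" idea could look like the following. This is only a sketch: the struct and function names are hypothetical, and the sizing policy, locking, and DSM handle bookkeeping (how ALTER PUBLICATION would find the segment) are all elided.

    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/dsm.h"
    #include "storage/shmem.h"

    /* Hypothetical per-publication shared state: PIDs of attached walsenders. */
    typedef struct PublicationWalSenderState
    {
        int         npids;
        int         pids[FLEXIBLE_ARRAY_MEMBER];    /* MyProcPid values */
    } PublicationWalSenderState;

    /* Walsender side: create the segment and advertise our PID (no locking shown). */
    static dsm_segment *
    advertise_walsender_for_publication(int max_walsenders)
    {
        Size        size = offsetof(PublicationWalSenderState, pids) +
            mul_size(max_walsenders, sizeof(int));
        dsm_segment *seg = dsm_create(size, 0);
        PublicationWalSenderState *state =
            (PublicationWalSenderState *) dsm_segment_address(seg);

        state->npids = 0;
        state->pids[state->npids++] = MyProcPid;
        return seg;
    }

ALTER PUBLICATION would then attach with dsm_attach() using a handle published somewhere findable (that part is the open design question above).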
Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes
On Wed, May 4, 2022 at 2:23 PM Thomas Munro wrote:
> Assuming no objections or CI failures show up, I'll consider pushing
> the first two patches tomorrow.

Done. Time to watch the build farm.

It's possible that these changes will produce some blowback, now that we're
using PROCSIGNAL_BARRIER_SMGRRELEASE on common platforms. Obviously the
earlier work started down this path already, but that was Windows-only, so
it probably didn't get much attention from our mostly Unix crowd. For
example, if there are bugs in the procsignal barrier system, or if there
are places that don't process interrupts at all or promptly, we might hear
complaints about that. That bug surface includes, for example, background
workers created by extensions.

An example on my mind is a place where we hold interrupts while cleaning up
temporary files (= a loop of arbitrary size with filesystem ops that might
hang on slow storage), so a concurrent DROP TABLESPACE will have to wait,
which is kinda strange because it would in fact be perfectly safe to
process this particular "interrupt". In that case we really just don't want
the kinds of interrupts that perform non-local exits and prevent our
cleanup work from running to completion, but we don't have a way to say
that.

I think we'll probably also want to invent a way to report which backend is
holding up progress, since without that it's practically impossible for an
end user to understand why their command is hanging.
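To make the shape of that problem concrete, here is a hedged sketch (not the actual PostgreSQL code; the helper function and variables are hypothetical) of the pattern being described:

    /* Compiles only in a backend context; illustrative names throughout. */
    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/fd.h"

    static void
    cleanup_temp_files(const char **paths, int npaths)
    {
        /*
         * While interrupts are held, a pending ProcSignalBarrier (such as
         * PROCSIGNAL_BARRIER_SMGRRELEASE) cannot be absorbed, so a concurrent
         * DROP TABLESPACE waiting on the barrier is stuck for the duration of
         * this loop, even though absorbing that particular barrier here would
         * be harmless.
         */
        HOLD_INTERRUPTS();
        for (int i = 0; i < npaths; i++)
            durable_unlink(paths[i], LOG);  /* may hang on slow storage */
        RESUME_INTERRUPTS();
    }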
Re: Support logical replication of DDLs
On Fri, May 6, 2022 at 10:21 PM Zheng Li wrote:
>
> > Attached is a set of two patches as an attempt to evaluate this approach.
>
> Thanks for exploring this direction. I read the deparsing thread and
> your patch. Here is my thought:
> 1. The main concern on maintainability of the deparsing code still
> applies if we want to adapt it for DDL replication.
>

I agree that it adds to our maintainability effort, in that every time we
enhance any DDL or add a new DDL that needs to be replicated, we probably
need to change the deparsing code. But OTOH this approach seems to provide
better flexibility. So, in the long run, maybe the effort is worthwhile. I
am not completely sure at this stage which approach is better, but I
thought it worth evaluating this approach, as Alvaro and Robert seem to
prefer this idea.

> 2. The search_path and role still need to be handled, in the deparsing
> code. And I think it's actually more overhead to qualify every object
> compared to just logging the search_path and enforcing it on the apply
> worker.
>

But doing it in the deparsing code will have the benefit that the other
plugins won't have to develop similar logic.

--
With Regards,
Amit Kapila.
Re: Collation version tracking for macOS
On Mon, Feb 14, 2022 at 10:00 PM Peter Eisentraut wrote:
> During development, I have been using the attached patch to simulate
> libc collation versions on macOS.  It just uses the internal major OS
> version number.  I don't know to what extent the libc locales on
> macOS are maintained or updated at all, so I don't know what practical
> effect this would have.  Again, it's mainly for development.  If there
> is interest from others, I think we could add this, maybe disabled by
> default, or we just keep it in the mailing list archives for interested
> parties.

Last time I looked into this it seemed like macOS's strcoll() gave sensible
answers in the traditional single-byte encodings, but didn't understand
UTF-8 at all, so you get C/strcmp() order. In other words, there was
effectively nothing to version. I remember that other old Unixes used to be
like that, and I suspect that they might be using old pre-UTF-8 FreeBSD
code for locales, based on a quick peek at [1] (though FreeBSD itself has
since learned to do CLDR-based UTF-8 sorting with a completely new
implementation shared with other OSes).

This makes me wonder if Apple is hiding another collation implementation
somewhere up its sleeve -- surely that libc support is not good enough for
the world's shiny globalised macOS/iOS apps? Maybe UCCompareText() and
friends (UnicodeUtilitiesCoreLib) and the various Obj-C NSString comparison
stuff, all of which probably predate Unixoid macOS (google tells me that
UnicodeUtilities.h was present in macOS 9). It wouldn't be surprising if it
shares nothing with the modern OS's C runtime stuff that came via NeXT.

Just mentioning this as a curiosity, because I was trying to figure out how
that could be left non-working without anyone complaining...

[1] https://github.com/apple-open-source-mirror/Libc/tree/master/locale
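A quick way to check this behaviour for yourself is a tiny test program along these lines (a sketch; which locale names are actually available varies by platform):

    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        if (setlocale(LC_COLLATE, "en_US.UTF-8") == NULL)
        {
            fprintf(stderr, "locale not available\n");
            return 1;
        }

        /*
         * A CLDR-aware collation sorts "apple" before "Banana" (negative
         * result); raw strcmp() byte order puts 'B' (0x42) before 'a' (0x61),
         * so a libc that doesn't really understand UTF-8 returns the same
         * sign as strcmp() here.
         */
        printf("strcoll: %d\n", strcoll("apple", "Banana"));
        printf("strcmp:  %d\n", strcmp("apple", "Banana"));
        return 0;
    }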
Re: make MaxBackends available in _PG_init
On Fri, May 6, 2022 at 5:15 PM Nathan Bossart wrote:
> On Fri, May 06, 2022 at 08:27:11AM -0700, Nathan Bossart wrote:
> > +1, I'll put together a new patch set.
>
> As promised...

Looks reasonable to me. Anyone else have thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Mark all GUC variable as PGDLLIMPORT
Hi,

On 2022-04-08 08:42:38 -0400, Robert Haas wrote:
> On Wed, Apr 6, 2022 at 7:56 PM Michael Paquier wrote:
> > On Wed, Apr 06, 2022 at 12:57:29AM +0700, John Naylor wrote:
> > > For these two patches, I'd say a day or two after feature freeze is a
> > > reasonable goal.
> >
> > Yeah.  For patches as invasive as the PGDLLIMPORT business and the
> > frontend error refactoring, I am also fine to have two exceptions with
> > the freeze deadline.
>
> Done now.

Just noticed that
  extern sigset_t UnBlockSig, BlockSig, StartupBlockSig;
are unadorned.

There's also a number of variables that are only declared in .c files that
!windows can still access. Some likely aren't worth caring about, but some
others are underlying GUCs, so we probably do?

E.g.
  CommitDelay
  CommitSiblings
  default_tablespace
  ignore_checksum_failure
  ignore_invalid_pages
  Log_disconnections
  ssl_renegotiation_limit
  temp_tablespaces
  Unix_socket_group
  Unix_socket_permissions
  wal_level_options

Likely don't care:
  dynamic_shared_memory_options
  gistBufferingOptValues
  recovery_target_action_options
  ssl_protocol_versions_info

Also noticed that the bbsink_ops variables are const, instead of static
const; was that intentional?

Greetings,

Andres Freund
Re: Possible corruption by CreateRestartPoint at promotion
On Fri, May 06, 2022 at 08:52:45AM -0700, Nathan Bossart wrote:
> I was looking at other changes in this area (e.g., 3c64dcb), and now I'm
> wondering if we actually should invalidate the minRecoveryPoint when the
> control file no longer indicates archive recovery.  Specifically, what
> happens if a base backup of a newly promoted standby is used for a
> point-in-time restore?  If the checkpoint location is updated and all
> previous segments have been recycled/removed, it seems like the
> minRecoveryPoint might point to a missing segment.

A new checkpoint is enforced at the beginning of the backup, which would
update minRecoveryPoint and minRecoveryPointTLI, and we don't allow a
backup to finish if it began on a standby that was promoted in between.

--
Michael
Re: [RFC] building postgres with meson -v8
Hi,

On 2022-05-06 14:27:24 -0700, Andres Freund wrote:
> > 0003-meson-Install-all-server-headers.patch
> >
> > With this, all the server headers installed by a makefile-based build are
> > installed. I tried to strike a balance between using install_subdir() with
> > exclude list versus listing things explicitly. Different variations might be
> > possible, but this looked pretty sensible to me.
>
> I locally had something similar, but I'm worried that this approach will be
> too fragile. Leads to e.g. editor temp files getting installed. I've merged it
> for now, but I think we need a different approach.

Meant to add potential alternatives here:

The easiest likely would be to just add an install script that globs *.h.
Alternatively, we could build a file list at configure time and then
install that with install_headers(). The advantage would be that it would
be available for things like cpluspluscheck; the disadvantage is that
something needs to trigger reconfiguration to update the file list.

Greetings,

Andres Freund
Re: [RFC] building postgres with meson -v8
Hi,

On 2022-04-21 17:34:47 -0400, Tom Lane wrote:
> FWIW, I don't think that either gaur or prairiedog need be factored into
> this conversation.  They cannot build ninja at all for lack of ,
> so whether they could run meson is pretty much beside the point.

Yea.

> (I wonder if we should stick in a configure test for ,
> just to see if anything else doesn't have it?)

Might be worth doing?

> We should worry a little more about Solaris and AIX, but even there I
> think it's largely up to the platform owner whether they've updated
> python to something modern.

Looks like "AIX toolbox" is at 3.7. Solaris 11.4 apparently has 3.5 (11.3
is EOL January 2024). I think it's worth caring about supporting 3.6 due to
RHEL 7 for now.

> If it isn't, you need to move the goalposts
> back some more :-(.  As of today I see the following pre-3.6 pythons
> in the buildfarm, exclusive of mine:
>
> skate      3.2.3
> snapper    3.2.3

Debian wheezy, I feel ok with dropping that.

> topminnow  3.4.2

Debian jessie, similar.

> hornet     3.4.3
> sungazer   3.4.3

Looks like a newer python version is available for AIX, without manually
compiling.

> wrasse     3.4.3

Apparently solaris 11.4 has python 3.5 (still not great :/)

> shelduck   3.4.10

This animal seems to have retired.

> curculio   3.5.1

Supported versions of openbsd have modern versions of python.

> hoverfly   3.5.1

AIX

> batfish    3.5.2
> spurfowl   3.5.2
> cuon       3.5.2

Ubuntu 16.04 is EOL (since 2021-04), outside of paid extended support.

> ayu        3.5.3
> chimaera   3.5.3
> chipmunk   3.5.3
> grison     3.5.3
> mussurana  3.5.3
> tadarida   3.5.3
> urocryon   3.5.3

These are all [variants of] debian stretch. I think we should be ok
dropping support for that; the extended "LTS" support for stretch ends
June 30, 2022 (with the last non-extended update at July 18, 2020).

Greetings,

Andres Freund

[1] https://repology.org/project/python/versions
Re: [RFC] building postgres with meson -v8
Hi,

On 2022-05-04 13:53:54 +0200, Peter Eisentraut wrote:
> 0001-meson-Assorted-compiler-test-tweaks.patch
>
> I was going through a diff of pg_config.h between old and new build and
> found a few omissions and small differences.

Thanks, merged that.

> is of course annoying and can be removed eventually, but it's useful when
> analyzing the diff, and since it's already done in other places it seems
> reasonable to apply it consistently.

Yea, I'd tried to minimize the difference at some point, but haven't done
so in a while...

> 0002-meson-Add-pg_walinspect.patch
>
> This was added more recently and was not ported yet.  Nothing too
> interesting here.

Merged that.

> 0003-meson-Install-all-server-headers.patch
>
> With this, all the server headers installed by a makefile-based build are
> installed. I tried to strike a balance between using install_subdir() with
> exclude list versus listing things explicitly. Different variations might be
> possible, but this looked pretty sensible to me.

I locally had something similar, but I'm worried that this approach will be
too fragile. Leads to e.g. editor temp files getting installed. I've merged
it for now, but I think we need a different approach.

> With these patches, the list of files installed with make versus meson match
> up, except for known open items (postmaster symlink, some library naming
> differences, pkgconfig, pgxs, test modules installed, documentation).

I added pkgconfig since then. They're not exactly the same, but pretty
close, except for one thing: it looks like some of the ecpg libraries
really should link to some other ecpg libs? I think we're missing something
there... That then leads to missing requirements in the .pc files.

Re symlink: Do you have an opinion about dropping the symlink vs.
implementing it (likely via a small helper script)?

Re library naming: It'd obviously be easy to adjust the library names, but
I wonder if it'd not be worth keeping the _static.a suffix; right now the
unsuffixed library name is imo quite confusing.

Re test modules: Not sure what the best fix for that is yet. Except that we
don't have a search path for server libs, I'd just install them to a
dedicated path or add the build dir to the search path. But we don't, so ...

Re docs: I think the best approach here would be to have a new
meson_options.txt option defining whether the docs should be built. But not
quite sure.

Greetings,

Andres Freund
Re: make MaxBackends available in _PG_init
On Fri, May 06, 2022 at 08:27:11AM -0700, Nathan Bossart wrote:
> +1, I'll put together a new patch set.

As promised...

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

From 831218d6e0c04d6342bf593dbf6799efdd831b6b Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Mon, 18 Apr 2022 15:25:37 -0700
Subject: [PATCH v6 1/2] Add a new shmem_request_hook hook.

Currently, preloaded libraries are expected to request additional
shared memory and LWLocks in _PG_init().  However, it is not unusual
for such requests to depend on MaxBackends, which won't be initialized
at that time.  Such requests could also depend on GUCs that other
modules might change.  This introduces a new hook where modules can
safely use MaxBackends and GUCs to request additional shared memory
and LWLocks.

Furthermore, this change restricts requests for shared memory and
LWLocks to this hook.  Previously, libraries could make requests until
the size of the main shared memory segment was calculated.  Unlike
before, we no longer silently ignore requests received at invalid
times.  Instead, we FATAL if someone tries to request additional
shared memory or LWLocks outside of the hook.

Authors: Julien Rouhaud, Nathan Bossart
Discussion: https://postgr.es/m/20220412210112.GA2065815%40nathanxps13
---
 contrib/pg_prewarm/autoprewarm.c              | 17 +++-
 .../pg_stat_statements/pg_stat_statements.c   | 27 +--
 src/backend/postmaster/postmaster.c           |  5
 src/backend/storage/ipc/ipci.c                | 20 +-
 src/backend/storage/lmgr/lwlock.c             | 18 -
 src/backend/utils/init/miscinit.c             | 15 +++
 src/include/miscadmin.h                       |  5
 src/tools/pgindent/typedefs.list              |  1 +
 8 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 45e012a63a..c0c4f5d9ca 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -96,6 +96,8 @@ static void apw_start_database_worker(void);
 static bool apw_init_shmem(void);
 static void apw_detach_shmem(int code, Datum arg);
 static int	apw_compare_blockinfo(const void *p, const void *q);
+static void autoprewarm_shmem_request(void);
+static shmem_request_hook_type prev_shmem_request_hook = NULL;
 
 /* Pointer to shared-memory state. */
 static AutoPrewarmSharedState *apw_state = NULL;
@@ -139,13 +141,26 @@ _PG_init(void)
 
 	MarkGUCPrefixReserved("pg_prewarm");
 
-	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+	prev_shmem_request_hook = shmem_request_hook;
+	shmem_request_hook = autoprewarm_shmem_request;
 
 	/* Register autoprewarm worker, if enabled. */
 	if (autoprewarm)
 		apw_start_leader_worker();
 }
 
+/*
+ * Requests any additional shared memory required for autoprewarm.
+ */
+static void
+autoprewarm_shmem_request(void)
+{
+	if (prev_shmem_request_hook)
+		prev_shmem_request_hook();
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+}
+
 /*
  * Main entry point for the leader autoprewarm process.  Per-database workers
  * have a separate entry point.
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index df2ce63790..87b75d779e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -252,6 +252,7 @@ static int	exec_nested_level = 0;
 static int	plan_nested_level = 0;
 
 /* Saved hook values in case of unload */
+static shmem_request_hook_type prev_shmem_request_hook = NULL;
 static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
 static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;
 static planner_hook_type prev_planner_hook = NULL;
@@ -317,6 +318,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_1_10);
 PG_FUNCTION_INFO_V1(pg_stat_statements);
 PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
+static void pgss_shmem_request(void);
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
 static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
@@ -452,17 +454,11 @@ _PG_init(void)
 
 	MarkGUCPrefixReserved("pg_stat_statements");
 
-	/*
-	 * Request additional shared resources.  (These are no-ops if we're not in
-	 * the postmaster process.)  We'll allocate or attach to the shared
-	 * resources in pgss_shmem_startup().
-	 */
-	RequestAddinShmemSpace(pgss_memsize());
-	RequestNamedLWLockTranche("pg_stat_statements", 1);
-
 	/*
 	 * Install hooks.
 	 */
+	prev_shmem_request_hook = shmem_request_hook;
+	shmem_request_hook = pgss_shmem_request;
 	prev_shmem_startup_hook = shmem_startup_hook;
 	shmem_startup_hook = pgss_shmem_startup;
 	prev_post_parse_analyze_hook = post_parse_analyze_hook;
@@ -488,6 +484,7 @@ void
 _PG_fini(void)
 {
 	/* Uninstall hooks. */
+	shmem_request_hook = prev_shmem_request_hook;
 	shmem_startup_hook =
Re: configure openldap crash warning
I wrote:
> After thinking about this for awhile, it seems like the best solution
> is to make configure proceed like this:
> 1. Find libldap.
> 2. Detect whether it's OpenLDAP 2.5 or newer.
> 3. If not, try to find libldap_r.

Here's a proposed patch for that. It seems to do the right thing with
openldap 2.4.x and 2.6.x, but I don't have a 2.5 installation at hand to
try.

			regards, tom lane

diff --git a/configure b/configure
index 364f37559d..274e0db8c5 100755
--- a/configure
+++ b/configure
@@ -13410,7 +13410,18 @@ _ACEOF
 fi
 done
 
-if test "$enable_thread_safety" = yes; then
+# The separate ldap_r library only exists in OpenLDAP < 2.5, and if we
+# have 2.5 or later, we shouldn't even probe for ldap_r (we might find a
+# library from a separate OpenLDAP installation).  The most reliable
+# way to check that is to check for a function introduced in 2.5.
+ac_fn_c_check_func "$LINENO" "ldap_verify_credentials" "ac_cv_func_ldap_verify_credentials"
+if test "x$ac_cv_func_ldap_verify_credentials" = xyes; then :
+  thread_safe_libldap=yes
+else
+  thread_safe_libldap=no
+fi
+
+if test "$enable_thread_safety" = yes -a "$thread_safe_libldap" = no; then
   # Use ldap_r for FE if available, else assume ldap is thread-safe.
   # On some platforms ldap_r fails to link without PTHREAD_LIBS.
   LIBS="$_LIBS"
diff --git a/configure.ac b/configure.ac
index bfd8b713a9..bba1ac5878 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1374,7 +1374,14 @@ if test "$with_ldap" = yes ; then
     LDAP_LIBS_BE="-lldap $EXTRA_LDAP_LIBS"
     # This test is carried out against libldap.
     AC_CHECK_FUNCS([ldap_initialize])
-    if test "$enable_thread_safety" = yes; then
+    # The separate ldap_r library only exists in OpenLDAP < 2.5, and if we
+    # have 2.5 or later, we shouldn't even probe for ldap_r (we might find a
+    # library from a separate OpenLDAP installation).  The most reliable
+    # way to check that is to check for a function introduced in 2.5.
+    AC_CHECK_FUNC([ldap_verify_credentials],
+                  [thread_safe_libldap=yes],
+                  [thread_safe_libldap=no])
+    if test "$enable_thread_safety" = yes -a "$thread_safe_libldap" = no; then
       # Use ldap_r for FE if available, else assume ldap is thread-safe.
       # On some platforms ldap_r fails to link without PTHREAD_LIBS.
       LIBS="$_LIBS"
Re: configure openldap crash warning
I wrote:
> We still have a bit of work to do, because this setup isn't getting
> all the way through src/test/ldap/:
> 2022-05-04 11:01:33.459 EDT [21304] LOG:  server process (PID 21312) was
> terminated by signal 11: Segmentation fault: 11
> Many of the test cases pass, but it looks like ldaps-related ones don't.

Sadly, this still happens with MacPorts' build of openldap 2.6.1. I was
able to get a stack trace from the point of the segfault this time:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00018f120628 libsystem_pthread.dylib`pthread_rwlock_rdlock
    frame #1: 0x0001019174c4 libcrypto.3.dylib`CRYPTO_THREAD_read_lock + 12
    frame #2: 0x0001019099d0 libcrypto.3.dylib`ossl_lib_ctx_get_data + 56
    frame #3: 0x0001019144d0 libcrypto.3.dylib`get_provider_store + 28
    frame #4: 0x00010191641c libcrypto.3.dylib`ossl_provider_deregister_child_cb + 32
    frame #5: 0x000101909748 libcrypto.3.dylib`OSSL_LIB_CTX_free + 48
    frame #6: 0x00010aa982d8 legacy.dylib`legacy_teardown + 24
    frame #7: 0x000101914840 libcrypto.3.dylib`ossl_provider_free + 76
    frame #8: 0x0001018ec404 libcrypto.3.dylib`evp_cipher_free_int + 48
    frame #9: 0x000101472804 libssl.3.dylib`SSL_CTX_free + 420
    frame #10: 0x0001013a4768 libldap.2.dylib`ldap_int_tls_destroy + 40
    frame #11: 0x00018f00fdd0 libsystem_c.dylib`__cxa_finalize_ranges + 464
    frame #12: 0x00018f00fb74 libsystem_c.dylib`exit + 44
    frame #13: 0x000100941cb8 postgres`proc_exit(code=) at ipc.c:152:2 [opt]
    frame #14: 0x00010096d804 postgres`PostgresMain(dbname=, username=) at postgres.c:4756:5 [opt]
    frame #15: 0x0001008d7730 postgres`BackendRun(port=) at postmaster.c:4489:2 [opt]
    frame #16: 0x0001008d6ff4 postgres`ServerLoop [inlined] BackendStartup(port=) at postmaster.c:4217:3 [opt]
    frame #17: 0x0001008d6fcc postgres`ServerLoop at postmaster.c:1791:7 [opt]
    frame #18: 0x0001008d474c postgres`PostmasterMain(argc=, argv=) at postmaster.c:1463:11 [opt]
    frame #19: 0x00010083a248 postgres`main(argc=, argv=) at main.c:202:3 [opt]
    frame #20: 0x00010117908c dyld`start + 520

So (1) libldap relies on libssl to implement ldaps ... no surprise there,
and (2) something's going wrong in the atexit callback that it seemingly
installs to close down its SSL context.

It's not clear from this whether this is purely libldap's fault or if
there is something we're doing that sends it off the rails. I could believe
that the problem is essentially a double shutdown of libssl, except that
there doesn't seem to be any reason why PG itself would have touched
libssl; this isn't an SSL-enabled build. (Adding --with-openssl doesn't
make it better, either.) On the whole I'm leaning to the position that this
is openldap's fault not ours.

			regards, tom lane
Re: Configuration Parameter/GUC value validation hook
On Wed, May 4, 2022, at 8:12 AM, Bharath Rupireddy wrote:
> How about we provide a sample extension (limiting some important
> parameters say shared_buffers, work_mem and so on to some
> "reasonable/recommended" limits) in the core along with the
> set_config_option_hook? This way, all the people using open source
> postgres out-of-the-box will benefit and whoever wants, can modify
> that sample extension to suit their needs. The sample extension can
> also serve as an example to implement set_config_option_hook.

I agree with Robert that providing a feature for it instead of a hook is
the way to go. It is not just one or two vendors that will benefit from it;
almost all vendors, if not all, will use this feature. Hooks should be used
for niche features; that's not the case here.

The commit a0ffa885e47 introduced the GRANT SET ON PARAMETER command. It
could be used for this purpose. Despite accepting GRANT on PGC_USERSET
GUCs, it has no use today, but that doesn't mean additional properties
couldn't be added to the current syntax. This additional use case should be
enforced before or while executing set_config_option(). Is it ok to extend
this SQL command? The syntax could be:

GRANT SET ON PARAMETER work_mem (MIN '1MB', MAX '512MB') TO bob;

The NULL keyword can be used to remove the MIN and MAX limits. The idea is
to avoid a verbose syntax (adding an "action" to MIN/MAX -- ADD MIN 1, DROP
MAX 234, SET MIN 456).

The other alternative is ALTER USER SET and ALTER DATABASE SET. The current
user can set a parameter for himself, and he could adjust the limits.
Besides that, the purpose of these SQL commands is to apply initial
settings for a combination of user/database; I'm afraid it is out of scope
to check after the session is established.

--
Euler Taveira
EDB   https://www.enterprisedb.com/
Re: failures in t/031_recovery_conflict.pl on CI
Andres Freund writes: > Done. Perhaps you could trigger a run on longfin, that seems to have been the > most reliably failing animal? No need, its cron job launched already. regards, tom lane
Re: configure openldap crash warning
I wrote:
> Oh, I have a theory about this: I bet your Homebrew installation
> has a recent OpenLDAP version that only supplies libldap not libldap_r.
> In that case, configure will still find libldap_r available and will
> bind libpq to it, and you get the observed result.  The configure
> check is not sophisticated enough to realize that it's finding chunks
> of two different OpenLDAP installations.

After thinking about this for awhile, it seems like the best solution is to
make configure proceed like this:

1. Find libldap.
2. Detect whether it's OpenLDAP 2.5 or newer.
3. If not, try to find libldap_r.

There are various ways we could perform step 2, but I think the most
reliable is to try to link to some function that's present in 2.5 but not
before. (In particular, this doesn't require any strong assumptions about
whether the installation's header files match the library.) After a quick
dig in 2.4 and 2.5, it looks like ldap_verify_credentials() would serve.
Barring objections, I'll make a patch for that.

BTW, I was a little distressed to read this in the 2.4 headers:

** If you fail to define LDAP_THREAD_SAFE when linking with
** -lldap_r or define LDAP_THREAD_SAFE when linking with -lldap,
** provided header definations and declarations may be incorrect.

That's not something we do or ever have done, AFAIK. Given the lack of
complaints and the fact that 2.4 is more or less EOL, I don't feel a strong
need to worry about it; but it might be something to keep in mind in case
we get bug reports.

			regards, tom lane
Re: failures in t/031_recovery_conflict.pl on CI
On 2022-05-06 12:12:19 -0400, Tom Lane wrote:
> Andres Freund writes:
> > I looked through all the failures I found and it's two kinds of failures,
> > both related to the deadlock test. So I'm thinking of skipping just that
> > test as in the attached.
>
> > Working on committing / backpatching that, unless somebody suggests changes
> > quickly...
>
> WFM.

Done. Perhaps you could trigger a run on longfin, that seems to have been
the most reliably failing animal?

Greetings,

Andres Freund
Re: Estimating HugePages Requirements?
On Tue, Apr 26, 2022 at 10:34:06AM +0900, Michael Paquier wrote: > Yes, the redirection issue would apply to all the run-time GUCs. Should this be tracked as an open item for v15? There was another recent report about the extra log output [0]. [0] https://www.postgresql.org/message-id/YnARlI5nvbziobR4%40momjian.us -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Support logical replication of DDLs
> Attached is a set of two patches as an attempt to evaluate this approach.
>
> The first patch provides functions to deparse DDL commands. Currently,
> it is restricted to just a simple CREATE TABLE statement; the required
> code is extracted from one of the patches posted in the thread [1].
>
> The second patch allows replicating a simple CREATE TABLE DDL. To do
> that we used an event trigger and DDL deparsing facilities. While
> creating a publication, we register a command end trigger that deparses
> the DDL as a JSON blob and WAL-logs it. The event trigger is
> automatically removed at the time of drop publication. The WALSender
> decodes the WAL and sends it downstream similar to other DML commands.
> The subscriber then converts JSON back to the DDL command string and
> executes it. In the subscriber, we also add the newly added rel to
> pg_subscription_rel so that the DML changes on the new table can be
> replicated without having to manually run "ALTER SUBSCRIPTION ...
> REFRESH PUBLICATION". Some of the code related to WAL logging and
> subscriber-side work is taken from the patch posted by Zheng in this
> thread, but there are quite a few changes in that as we don't need
> schema, role, or transactional vs. non-transactional handling.
>
> Note that for now, we have hacked the Create Publication code such that
> when the user specifies the "FOR ALL TABLES" clause, we invoke this new
> functionality. So, this will work only for "FOR ALL TABLES"
> publications. For example, we need the below to replicate the simple
> Create Table command.
>
> Publisher:
> Create Publication pub1 For All Tables;
>
> Subscriber:
> Create Subscription sub1 Connection '...' Publication pub1;
>
> Publisher:
> Create Table t1(c1 int);
>
> Subscriber:
> \d should show t1.
>
> As we have hacked the CreatePublication function for this POC, the
> regression tests are not passing, but we can easily change it so that we
> invoke the new functionality with the syntax proposed in this thread or
> with some other syntax, and we shall do that in the next patch unless
> this approach is not worth pursuing.
>
> This POC is prepared by Ajin Cherian, Hou-San, and me.
>
> Thoughts?
>
> [1] - https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.no-ip.org

Hi,

Thanks for exploring this direction. I read the deparsing thread and your
patch. Here is my thought:

1. The main concern on maintainability of the deparsing code still applies
if we want to adapt it for DDL replication.

2. The search_path and role still need to be handled in the deparsing code.
And I think it's actually more overhead to qualify every object compared to
just logging the search_path and enforcing it on the apply worker.

3. I'm trying to understand if deparsing helps with edge cases like "ALTER
TABLE foo ADD COLUMN bar double precision DEFAULT random();". I don't think
it helps out of the box. The crux of separating table rewrite and DDL still
needs to be solved for such cases.

Regards,
Zheng
Re: Can postgres ever delete the recycled future WAL files to free-up disk space if max_wal_size is reduced or wal_recycle is set to off?
Bharath Rupireddy writes: > Can postgres delete the recycled future WAL files once max_wal_size is > reduced and/or wal_recycle is set to off? A checkpoint should do that, see RemoveOldXlogFiles. Maybe you have a broken WAL archiving setup, or something else preventing removal of old WAL files? regards, tom lane
Can postgres ever delete the recycled future WAL files to free-up disk space if max_wal_size is reduced or wal_recycle is set to off?
Hi,

I have a scenario where max_wal_size = 10GB and wal_recycle = on: postgres
recycles and keeps WAL files for future use, and eventually around ~600 WAL
files had accumulated in the pg_wal directory. Checkpoints were happening
at regular intervals. But the disk was about to get full (of course,
scaling up the disk is an option), so to avoid "no space left on device"
crashes I changed max_wal_size = 5GB and issued a checkpoint, thinking that
postgres would free up 5GB of disk space.

It seems that's not the case: postgres will not remove future WAL files
even after max_wal_size is reduced. If it could delete the future WAL
file(s) immediately, the server would have had 5GB of free disk space to
stay up and avoid a crash while the disk scaling was performed.

Can postgres delete the recycled future WAL files once max_wal_size is
reduced and/or wal_recycle is set to off?

Thoughts?

Regards,
Bharath Rupireddy.
Re: failures in t/031_recovery_conflict.pl on CI
Andres Freund writes: > I looked through all the failures I found and it's two kinds of failures, both > related to the deadlock test. So I'm thinking of skipping just that test as in > the attached. > Working on committing / backpatching that, unless somebody suggests changes > quickly... WFM. regards, tom lane
Re: failures in t/031_recovery_conflict.pl on CI
Hi,

On 2022-05-05 23:57:28 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-05-05 23:36:22 -0400, Tom Lane wrote:
> >> So I reluctantly vote for removing 031_recovery_conflict.pl in the
> >> back branches for now, with the expectation that we'll fix the
> >> infrastructure and put it back after the current release round
> >> is done.
>
> > What about instead marking the flapping test TODO? That'd still give us most
> > of the coverage...
>
> Are you sure there's just one test that's failing?  I haven't checked
> the buildfarm history close enough to be sure of that.  But if it's
> true, disabling just that one would be fine (again, as a stopgap
> measure).

I looked through all the failures I found and it's two kinds of failures,
both related to the deadlock test. So I'm thinking of skipping just that
test as in the attached.

Working on committing / backpatching that, unless somebody suggests changes
quickly...

Greetings,

Andres Freund

diff --git c/src/test/recovery/t/031_recovery_conflict.pl i/src/test/recovery/t/031_recovery_conflict.pl
index 52f00a6f514..72808095d21 100644
--- c/src/test/recovery/t/031_recovery_conflict.pl
+++ i/src/test/recovery/t/031_recovery_conflict.pl
@@ -228,6 +228,10 @@ check_conflict_stat("lock");
 
 ## RECOVERY CONFLICT 5: Deadlock
 
+SKIP:
+{
+	skip "disabled until after minor releases, due to instability";
+
 $sect = "startup deadlock";
 $expected_conflicts++;
 
@@ -286,6 +290,7 @@ check_conflict_stat("deadlock");
 
 # clean up for next tests
 $node_primary->safe_psql($test_db, qq[ROLLBACK PREPARED 'lock';]);
+}
 
 # Check that expected number of conflicts show in pg_stat_database. Needs to
Re: Possible corruption by CreateRestartPoint at promotion
On Fri, May 06, 2022 at 07:58:43PM +0900, Michael Paquier wrote:
> And I have spent a bit of time on this stuff to finish with the attached.
> It will be a plus to get that done on HEAD for beta1, so I'll try to deal
> with it on Monday.  I am still a bit stressed about the back branches, as
> concurrent checkpoints are possible via CreateCheckPoint() from the
> startup process (not the case of HEAD), but the stable branches will
> have a new point release very soon, so let's revisit this choice there
> later.  v6 attached includes a TAP test, but I don't intend to include
> it as it is expensive.

I was looking at other changes in this area (e.g., 3c64dcb), and now I'm
wondering if we actually should invalidate the minRecoveryPoint when the
control file no longer indicates archive recovery. Specifically, what
happens if a base backup of a newly promoted standby is used for a
point-in-time restore? If the checkpoint location is updated and all
previous segments have been recycled/removed, it seems like the
minRecoveryPoint might point to a missing segment.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: libpq async duplicate error results
I wrote:
> What is happening is that this bit in PQsendQueryStart is deciding
> not to clear the message buffer because conn->cmd_queue_head isn't
> NULL:
>
>     /*
>      * If this is the beginning of a query cycle, reset the error state.
>      * However, in pipeline mode with something already queued, the error
>      * buffer belongs to that command and we shouldn't clear it.
>      */
>     if (newQuery && conn->cmd_queue_head == NULL)
>         pqClearConnErrorState(conn);
>
> So apparently the fault is with the pipelined-query logic.  I'm
> not sure what cleanup step is missing though.

I experimented with the attached patch, which is based on the idea that
clearing the command queue is being done in entirely the wrong place. It's
more appropriate to do it where we drop unsent output data, i.e. in
pqDropConnection not pqDropServerData. The fact that it was jammed into the
latter without any commentary doesn't do much to convince me that that
placement was thought about carefully.

This fixes the complained-of symptom and still passes check-world.
However, I wonder whether cutting the queue down to completely empty is the
right thing. Why is the queue set up like this in the first place --- that
is, why don't we remove an entry the moment the command is sent? I also
wonder whether pipelineStatus ought to get reset here. We aren't resetting
asyncStatus here, so maybe it's fine, but I'm not quite sure.

			regards, tom lane

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 4c12f1411f..6e936bbff3 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -377,6 +377,7 @@ static int	connectDBStart(PGconn *conn);
 static int	connectDBComplete(PGconn *conn);
 static PGPing internal_ping(PGconn *conn);
 static PGconn *makeEmptyPGconn(void);
+static void pqFreeCommandQueue(PGcmdQueueEntry *queue);
 static bool fillPGconn(PGconn *conn, PQconninfoOption *connOptions);
 static void freePGconn(PGconn *conn);
 static void closePGconn(PGconn *conn);
@@ -462,6 +463,12 @@ pqDropConnection(PGconn *conn, bool flushInput)
 	/* Always discard any unsent data */
 	conn->outCount = 0;
 
+	/* Likewise, discard any pending pipelined commands */
+	pqFreeCommandQueue(conn->cmd_queue_head);
+	conn->cmd_queue_head = conn->cmd_queue_tail = NULL;
+	pqFreeCommandQueue(conn->cmd_queue_recycle);
+	conn->cmd_queue_recycle = NULL;
+
 	/* Free authentication/encryption state */
 #ifdef ENABLE_GSS
 	{
@@ -569,12 +576,6 @@ pqDropServerData(PGconn *conn)
 	}
 	conn->notifyHead = conn->notifyTail = NULL;
 
-	pqFreeCommandQueue(conn->cmd_queue_head);
-	conn->cmd_queue_head = conn->cmd_queue_tail = NULL;
-
-	pqFreeCommandQueue(conn->cmd_queue_recycle);
-	conn->cmd_queue_recycle = NULL;
-
 	/* Reset ParameterStatus data, as well as variables deduced from it */
 	pstatus = conn->pstatus;
 	while (pstatus != NULL)
Re: make MaxBackends available in _PG_init
On Fri, May 06, 2022 at 10:43:21AM -0400, Tom Lane wrote:
> Robert Haas writes:
> > On Fri, May 6, 2022 at 9:57 AM Tom Lane wrote:
> >> I agree that _PG_fini functions as they stand are worthless.
> >> What I'm not getting is why we should care enough about that
> >> to break just about everybody's extension.  Even if unloading
> >> extensions were feasible, who would bother?
>
> > Well, if we think that, then we ought to remove the NOT_USED code and
> > all the random _PG_fini() stuff that's still floating around.
>
> I think that's exactly what we should do, if it bugs you that stuff
> is just sitting there.  I see no prospect that we'll ever make it
> work, because the question of unhooking from hooks is just the tip
> of the iceberg.  As an example, what should happen with any custom
> GUCs the module has defined?  Dropping their values might not be
> very nice, but if we leave them around then the next LOAD (if any)
> will see a conflict.  Another fun question is whether it's ever
> safe to unload a module that was preloaded by the postmaster.
>
> In short, this seems like a can of very wriggly worms, with not
> a lot of benefit that would ensue from opening it.

+1, I'll put together a new patch set.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: gcc 12.1.0 warning
Erik Rijkers writes:
> Not sure if these compiler-mutterings are worth reporting but I guess
> we're trying to get a silent compile.
> System: Debian 4.9.303-1 (2022-03-07) x86_64 GNU/Linux
> Compiling with gcc 12.1.0 causes the below 'warning' and 'note'.
> (Compiling with --enable-cassert --enable-debug is silent, no warnings.)
>
> In function ‘guc_var_compare’,
>     inlined from ‘bsearch’ at /usr/include/x86_64-linux-gnu/bits/stdlib-bsearch.h:33:23,
>     inlined from ‘find_option’ at guc.c:5649:35:
> guc.c:5736:38: warning: array subscript ‘const struct config_generic[0]’
> is partly outside array bounds of ‘const char[8]’ [-Warray-bounds]
>  5736 |         return guc_name_compare(confa->name, confb->name);
>       |                                 ~^~

I'd call that a compiler bug.

			regards, tom lane
Re: make MaxBackends available in _PG_init
Robert Haas writes:
> On Fri, May 6, 2022 at 9:57 AM Tom Lane wrote:
> > I agree that _PG_fini functions as they stand are worthless.
> > What I'm not getting is why we should care enough about that
> > to break just about everybody's extension.  Even if unloading
> > extensions were feasible, who would bother?

> Well, if we think that, then we ought to remove the NOT_USED code and
> all the random _PG_fini() stuff that's still floating around.

I think that's exactly what we should do, if it bugs you that stuff is just
sitting there. I see no prospect that we'll ever make it work, because the
question of unhooking from hooks is just the tip of the iceberg. As an
example, what should happen with any custom GUCs the module has defined?
Dropping their values might not be very nice, but if we leave them around
then the next LOAD (if any) will see a conflict. Another fun question is
whether it's ever safe to unload a module that was preloaded by the
postmaster.

In short, this seems like a can of very wriggly worms, with not a lot of
benefit that would ensue from opening it.

			regards, tom lane
gcc 12.1.0 warning
Hi,

Not sure if these compiler-mutterings are worth reporting, but I guess
we're trying to get a silent compile.

System: Debian 4.9.303-1 (2022-03-07) x86_64 GNU/Linux

Compiling with gcc 12.1.0 causes the below 'warning' and 'note'.
(Compiling with --enable-cassert --enable-debug is silent, no warnings.)

In function ‘guc_var_compare’,
    inlined from ‘bsearch’ at /usr/include/x86_64-linux-gnu/bits/stdlib-bsearch.h:33:23,
    inlined from ‘find_option’ at guc.c:5649:35:
guc.c:5736:38: warning: array subscript ‘const struct config_generic[0]’ is partly outside array bounds of ‘const char[8]’ [-Warray-bounds]
 5736 |         return guc_name_compare(confa->name, confb->name);
      |                                 ~^~

guc.c: In function ‘find_option’:
guc.c:5636:25: note: object ‘name’ of size 8
 5636 | find_option(const char *name, bool create_placeholders, bool skip_errors,
      |             ^~~~

(Compiling with gcc 6.3.0 does not complain.)

Below are the two configure lines, FWIW.

Erik Rijkers

# cassert-build: no warning/note
./configure \
  --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD \
  --bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/bin \
  --libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/lib \
  --with-pgport=6515 --quiet --enable-depend \
  --enable-cassert --enable-debug \
  --with-openssl --with-perl --with-libxml --with-libxslt --with-zlib \
  --enable-tap-tests --with-extra-version=_0506_HEAD_701d --with-lz4

# normal build: causes warning/note:
./configure \
  --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD \
  --bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/bin.fast \
  --libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/lib.fast \
  --with-pgport=6515 --quiet --enable-depend \
  --with-openssl --with-perl --with-libxml --with-libxslt --with-zlib \
  --enable-tap-tests --with-extra-version=_0506_HEAD_701d --with-lz4
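For readers wanting to study this outside the tree, a reduced, self-contained sketch of the pattern involved could look like the following. This is an assumption-laden reduction, not the real guc.c: the struct is cut down to two fields, and guc_name_compare() is replaced by strcmp(). The essence is that find_option() relies on 'name' being the first field of config_generic, so a pointer to a pointer to a bare name string can stand in for a pointer to a pointer to a full struct:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Cut-down stand-in for guc.c's config_generic; name must be first. */
    struct config_generic
    {
        const char *name;
        int         flags;      /* stand-in for the many other fields */
    };

    static int
    guc_var_compare(const void *a, const void *b)
    {
        const struct config_generic *confa = *(struct config_generic *const *) a;
        const struct config_generic *confb = *(struct config_generic *const *) b;

        return strcmp(confa->name, confb->name);
    }

    int
    main(void)
    {
        static struct config_generic v1 = {"log_disconnections", 0};
        static struct config_generic v2 = {"work_mem", 0};
        struct config_generic *guc_variables[] = {&v1, &v2};

        const char *name = "work_mem";
        const char **key = &name;   /* &key is reinterpreted as config_generic ** */

        struct config_generic **res =
            bsearch(&key, guc_variables, 2,
                    sizeof(struct config_generic *), guc_var_compare);

        printf("%s\n", res ? (*res)->name : "not found");
        return 0;
    }

Under inlining, GCC 12 can see that the "struct config_generic" dereferenced for the key is really an 8-byte pointer variable, hence the "partly outside array bounds of 'const char[8]'" wording.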
Re: libpq async duplicate error results
Peter Eisentraut writes:
> I took another look at this.  The output from the test program at the
> beginning of the thread is now (master branch):
> ...
> command = SELECT 'after';
> PQsendQuery() error: FATAL:  terminating connection due to administrator command
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> no connection to the server

> It appears the "after" query is getting the error message from the
> previous cycle somehow.

What is happening is that this bit in PQsendQueryStart is deciding not to
clear the message buffer because conn->cmd_queue_head isn't NULL:

    /*
     * If this is the beginning of a query cycle, reset the error state.
     * However, in pipeline mode with something already queued, the error
     * buffer belongs to that command and we shouldn't clear it.
     */
    if (newQuery && conn->cmd_queue_head == NULL)
        pqClearConnErrorState(conn);

So apparently the fault is with the pipelined-query logic. I'm not sure
what cleanup step is missing though.

			regards, tom lane
Re: BufferAlloc: don't take two simultaneous locks
On Thu, Apr 21, 2022 at 6:58 PM Yura Sokolov wrote:
> At the master state:
> - SharedBufHash is not declared as HASH_FIXED_SIZE
> - get_hash_entry falls back to element_alloc too fast (just if it doesn't
>   find a free entry in the current freelist partition).
> - get_hash_entry has races.
> - if there is a small number of spare items (and NUM_BUFFER_PARTITIONS is
>   a small number) and HASH_FIXED_SIZE is set, it becomes contended and
>   therefore slow.
>
> HASH_REUSE solves (for shared buffers) most of these issues. The free list
> becomes a rare fallback, so HASH_FIXED_SIZE for SharedBufHash doesn't lead
> to a performance hit. And with a fair number of spare items, get_hash_entry
> will find a free entry despite its races.

Hmm, I see. The idea of trying to arrange to reuse entries rather than
pushing them onto a freelist and immediately trying to take them off again
is an interesting one, and I kind of like it. But I can't imagine that
anyone would commit this patch the way you have it. It's way too much
action at a distance. If any ereport(ERROR,...) could happen between the
HASH_REUSE operation and the subsequent HASH_ENTER, it would be disastrous,
and those things are separated by multiple levels of call stack across
different modules, so mistakes would be easy to make.

If this could be made into something dynahash takes care of internally
without requiring extensive cooperation with the calling code, I think it
would very possibly be accepted. One approach would be to have a
hash_replace() call that takes two const void * arguments, one to delete
and one to insert. Then maybe you propagate that idea upward and have,
similarly, a BufTableReplace operation that uses that, and then the bufmgr
code calls BufTableReplace instead of BufTableDelete. Maybe there are other
better ideas out there...

--
Robert Haas
EDB: http://www.enterprisedb.com
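To make the suggested shape concrete, a hedged sketch of what such a hash_replace() could look like from the caller's side follows. This is not an actual dynahash API: the function name, signature, and semantics are all hypothetical, and a real implementation would hand the element freed for del_key directly to the insertion of ins_key inside dynahash rather than round-tripping through the freelist as the naive composition below does.

    #include "postgres.h"
    #include "utils/hsearch.h"

    /*
     * Hypothetical API: replace one key with another, intended to reuse the
     * deleted element for the inserted one.  Returns the entry for ins_key.
     */
    static void *
    hash_replace(HTAB *hashp, const void *del_key, const void *ins_key)
    {
        bool        found;

        /*
         * Naive composition shown for exposition only; the whole point of
         * the proposal is that dynahash would fuse these two steps and
         * never touch the (contended) freelist.
         */
        (void) hash_search(hashp, del_key, HASH_REMOVE, &found);
        Assert(found);
        return hash_search(hashp, ins_key, HASH_ENTER, &found);
    }

A BufTableReplace() built on top of this would let bufmgr swap a buffer tag in one call, keeping the reuse logic invisible to callers.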
Re: make MaxBackends available in _PG_init
On Fri, May 6, 2022 at 9:57 AM Tom Lane wrote:
> Robert Haas writes:
> > perhaps for v16 or some future release we can think about redoing a
> > bunch of hooks this way. There would be some migration pain for
> > extension authors for sure, but then we might get to a point where
> > extensions can be cleanly unloaded in at least some circumstances,
> > instead of continuing to write silly _PG_fini() functions that
> > couldn't possibly ever work.
>
> I agree that _PG_fini functions as they stand are worthless.
> What I'm not getting is why we should care enough about that
> to break just about everybody's extension.  Even if unloading
> extensions were feasible, who would bother?

Well, if we think that, then we ought to remove the NOT_USED code and all
the random _PG_fini() stuff that's still floating around.

I don't actually have a clear answer to whether it's a useful thing to be
able to unload modules. If the module is just providing a bunch of
SQL-callable functions, it probably isn't. If it's modifying the behavior
of your session, it might be. Now, it could also have an "off" switch in
the form of a GUC, and probably should - but you'd probably save at least a
few cycles by detaching from the hooks rather than just still getting
called and doing nothing, and maybe that's worth something. Whether it's
worth enough to justify making changes that will affect extensions, I'm not
sure.

IOW, I don't really know what we ought to do here, but I think that
maintaining a vestigial system that has never worked and can never work is
not it.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: make MaxBackends available in _PG_init
Robert Haas writes:
> perhaps for v16 or some future release we can think about redoing a
> bunch of hooks this way. There would be some migration pain for
> extension authors for sure, but then we might get to a point where
> extensions can be cleanly unloaded in at least some circumstances,
> instead of continuing to write silly _PG_fini() functions that
> couldn't possibly ever work.

I agree that _PG_fini functions as they stand are worthless. What I'm not
getting is why we should care enough about that to break just about
everybody's extension. Even if unloading extensions were feasible, who
would bother?

			regards, tom lane
Re: bogus: logical replication rows/cols combinations
On 5/6/22 15:40, Amit Kapila wrote:
> On Fri, May 6, 2022 at 5:56 PM Tomas Vondra wrote:
>>
>>>> I could think of below two options:
>>>> 1. Forbid any case where column list is different for the same table
>>>> when combining publications.
>>>> 2. Forbid if the column list and row filters for a table are different
>>>> in the set of publications we are planning to combine. This means we
>>>> will allow combining column lists when row filters are not present or
>>>> when column list is the same (we don't get anything additional by
>>>> combining but the idea is we won't forbid such cases) and row filters
>>>> are different.
>>>>
>>>> Now, I think the points in favor of (1) are that the main purpose of
>>>> introducing a column list are: (a) the structure/schema of the
>>>> subscriber is different from the publisher, (b) want to hide sensitive
>>>> columns data. In both cases, it should be fine if we follow (1) and
>>>> from Peter E.'s latest email [1] he also seems to be indicating the
>>>> same. If we want to be slightly more relaxed then we can probably (2).
>>>> We can decide on something else as well but I feel it should be such
>>>> that it is easy to explain.
>>>
>>> I also think it makes sense to add a restriction like (1). I am planning
>>> to implement the restriction if no one objects.
>>
>> I'm not going to block that approach if that's the consensus here,
>> though I'm not convinced.
>>
>> Let me point out (1) does *not* work for data redaction use case,
>> certainly not the example Alvaro and me presented, because that relies
>> on a combination of row filters and column filters.
>
> This should just forbid the case presented by Alvaro in his first
> email in this thread [1].
>
>> Requiring all column
>> lists to be the same (and not specific to row filter) prevents that
>> example from working. Yes, you can create multiple subscriptions, but
>> that brings its own set of challenges too.
>>
>> I doubt forcing users to use the more complex setup is good idea, and
>> combining the column lists per [1] seems sound to me.
>>
>> That being said, the good thing is this restriction seems it might be
>> relaxed in the future to work per [1], without causing any backwards
>> compatibility issues.
>
> These are my thoughts as well. Even if we decide to go via the column
> list merging approach (in selective cases), we need to do some
> performance testing of that approach as it does much more work per
> tuple. It is possible that the impact is not much but still worth
> evaluating, so let's try to see the patch to prohibit combining the
> column lists then we can decide.

Surely we could do some performance testing now. I doubt it's very
expensive - sure, you can construct cases with many row filters / column
lists, but how likely is that in practice? Moreover, it's not like this
would affect existing setups, so even if it's a bit expensive, we may
interpret that as a cost of the feature.

>> Should we do something similar for row filters, though? It seems quite
>> weird we're so concerned about unexpected behavior due to combining
>> column lists (despite having a patch that makes it behave sanely), and
>> at the same time wave off similarly strange behavior due to combining
>> row filters because "that's what you get if you define the publications
>> in a strange way".
>
> During development, we found that we can't combine the row-filters for
> 'insert' and 'update'/'delete' because of replica identity
> restrictions, so we have kept them separate. But if we came across
> other such things then we can either try to fix those or forbid them.

I understand how we got to the current state. I'm just saying that this
allows defining separate publications for insert, update, and delete
actions, and setting different row filters for each of them, which results
in behavior that is hard to explain/understand, especially when it comes to
tablesync. It seems quite strange to prohibit merging column lists because
there might be some strange behavior that no one described, while allowing
setups with different row filters that definitely have strange behavior.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: make MaxBackends available in _PG_init
On Tue, Apr 19, 2022 at 11:47 AM Nathan Bossart wrote:
> Okay, I did it this way in v5.

I pushed 0001. Regarding 0002, I think there's no point in adding a
_PG_fini(). The code to call _PG_fini() has been surrounded by #ifdef
NOT_USED forever, and that seems unlikely to change any time soon as long
as we stick with these stupid hook designs where there's effectively a
linked list of hooks, but you can't actually access the list because it's
spread across a bunch of random static variables in each module.

I think we ought to go through all the hooks that are being used in this
kind of way and replace them with a system where there's an explicit List
of functions to call, and instead of this lame stuff like:

+	prev_shmem_request_hook = shmem_request_hook;
+	shmem_request_hook = autoprewarm_shmem_request;

You instead do:

	shmem_request_functions = lappend(shmem_request_functions,
									  autoprewarm_shmem_request);

Or maybe wrap that up in an API, like:

	add_shmem_request_function(autoprewarm_shmem_request);

Then it would be easy to have a corresponding remove function. For shmem
request, it would probably make sense to ditch the part where each hook
function calls the next hook function and instead just have the calling
code loop over the list and call them one by one. For things like the
executor start/run/end hooks, that wouldn't work, but you could have
something like invoke_next_executor_start_hook_function(args).

I don't think we should try to make this kind of change now - it seems to
me that what you've done here is consistent with existing practice and we
might as well commit it more or less like this for now. But perhaps for v16
or some future release we can think about redoing a bunch of hooks this
way. There would be some migration pain for extension authors for sure, but
then we might get to a point where extensions can be cleanly unloaded in at
least some circumstances, instead of continuing to write silly _PG_fini()
functions that couldn't possibly ever work.

--
Robert Haas
EDB: http://www.enterprisedb.com
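As a concrete illustration of that design, a minimal sketch could look like the following. This is hypothetical code, not an actual PostgreSQL API: the typedef, the function names, and the choice to store function pointers in a List are all assumptions made for exposition.

    #include "postgres.h"
    #include "nodes/pg_list.h"

    /* Hypothetical: signature shared by all shmem-request callbacks. */
    typedef void (*shmem_request_function) (void);

    static List *shmem_request_functions = NIL;

    /* What an extension would call from _PG_init() instead of hook chaining. */
    void
    add_shmem_request_function(shmem_request_function func)
    {
        /*
         * Storing a function pointer as a List pointer is itself a
         * sketch-level shortcut; a real API might use a dedicated array or
         * node type instead.
         */
        shmem_request_functions = lappend(shmem_request_functions,
                                          (void *) func);
    }

    /* The matching remove function becomes trivial... */
    void
    remove_shmem_request_function(shmem_request_function func)
    {
        shmem_request_functions = list_delete_ptr(shmem_request_functions,
                                                  (void *) func);
    }

    /* ...and core code just loops, rather than chaining through hooks. */
    static void
    call_shmem_request_functions(void)
    {
        ListCell   *lc;

        foreach(lc, shmem_request_functions)
        {
            shmem_request_function func = (shmem_request_function) lfirst(lc);

            func();
        }
    }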
Re: libpq async duplicate error results
I took another look at this. The output from the test program at the
beginning of the thread is now (master branch):

command = SELECT 'before';
result 1 status = PGRES_TUPLES_OK
error message = ""

command = SELECT pg_terminate_backend(pg_backend_pid());
result 1 status = PGRES_FATAL_ERROR
error message = "FATAL:  terminating connection due to administrator command
"
result 2 status = PGRES_FATAL_ERROR
error message = "server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
"

command = SELECT 'after';
PQsendQuery() error: FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
no connection to the server

It appears the "after" query is getting the error message from the previous
cycle somehow.

The output in PG14 and PG13 is:

command = SELECT 'after';
PQsendQuery() error: no connection to the server

Is the change intended, or do we need to think about more tweaking?
Re: Trying to add more tests to gistbuild.c
The attached patch is failing on make check due to a typo, resubmitting the correct one. -- Matheus Alcantara
diff --git a/src/test/regress/expected/gist.out b/src/test/regress/expected/gist.out index a36b4c9c56..b5edc44250 100644 --- a/src/test/regress/expected/gist.out +++ b/src/test/regress/expected/gist.out @@ -387,6 +387,12 @@ select p from gist_tbl order by circle(p,1) <-> point(0,0) limit 1; select p from gist_tbl order by circle(p,1) <-> point(0,0) limit 1; ERROR: lossy distance functions are not supported in index-only scans +-- Build an index using buffering, caused by an index build that doesn't fit in cache. +set effective_cache_size = '1MB'; +create index gist_tbl_box_index_buffering on gist_tbl using gist (p, b, c); +reset effective_cache_size; +-- Force an index build using buffering. +create index gist_tbl_box_index_forcing_buffering on gist_tbl using gist (p) with (buffering=on); -- Clean up reset enable_seqscan; reset enable_bitmapscan; diff --git a/src/test/regress/sql/gist.sql b/src/test/regress/sql/gist.sql index 3360266370..214366c157 100644 --- a/src/test/regress/sql/gist.sql +++ b/src/test/regress/sql/gist.sql @@ -169,6 +169,15 @@ explain (verbose, costs off) select p from gist_tbl order by circle(p,1) <-> point(0,0) limit 1; select p from gist_tbl order by circle(p,1) <-> point(0,0) limit 1; + +-- Build an index using buffering, caused by an index build that doesn't fit in cache. +set effective_cache_size = '1MB'; +create index gist_tbl_box_index_buffering on gist_tbl using gist (p, b, c); +reset effective_cache_size; + +-- Force an index build using buffering. +create index gist_tbl_box_index_forcing_buffering on gist_tbl using gist (p) with (buffering=on); + -- Clean up reset enable_seqscan; reset enable_bitmapscan;
Re: bogus: logical replication rows/cols combinations
On Fri, May 6, 2022 at 5:56 PM Tomas Vondra wrote: > > >> > >> I could think of below two options: > >> 1. Forbid any case where column list is different for the same table > >> when combining publications. > >> 2. Forbid if the column list and row filters for a table are different > >> in the set of publications we are planning to combine. This means we > >> will allow combining column lists when row filters are not present or > >> when column list is the same (we don't get anything additional by > >> combining but the idea is we won't forbid such cases) and row filters > >> are different. > >> > >> Now, I think the points in favor of (1) are that the main purpose of > >> introducing a column list are: (a) the structure/schema of the > >> subscriber is different from the publisher, (b) want to hide sensitive > >> columns data. In both cases, it should be fine if we follow (1) and > >> from Peter E.'s latest email [1] he also seems to be indicating the > >> same. If we want to be slightly more relaxed then we can probably (2). > >> We can decide on something else as well but I feel it should be such > >> that it is easy to explain. > > > > I also think it makes sense to add a restriction like (1). I am planning to > > implement the restriction if no one objects. > > > > I'm not going to block that approach if that's the consensus here, > though I'm not convinced. > > Let me point out (1) does *not* work for the data redaction use case, > certainly not the example Alvaro and I presented, because that relies > on a combination of row filters and column filters. > This should just forbid the case presented by Alvaro in his first email in this thread [1]. > Requiring all column > lists to be the same (and not specific to a row filter) prevents that > example from working. Yes, you can create multiple subscriptions, but > that brings its own set of challenges too. > > I doubt forcing users to use the more complex setup is a good idea, and > combining the column lists per [1] seems sound to me. > > That being said, the good thing is this restriction seems like it might be > relaxed in the future to work per [1], without causing any backwards > compatibility issues. > These are my thoughts as well. Even if we decide to go via the column list merging approach (in selective cases), we need to do some performance testing of that approach, as it does much more work per tuple. It is possible that the impact is not much, but it is still worth evaluating, so let's try to see the patch to prohibit combining the column lists, and then we can decide. > Should we do something similar for row filters, though? It seems quite > weird we're so concerned about unexpected behavior due to combining > column lists (despite having a patch that makes it behave sanely), and > at the same time wave off similarly strange behavior due to combining > row filters because "that's what you get if you define the publications > in a strange way". > During development, we found that we can't combine the row-filters for 'insert' and 'update'/'delete' because of replica identity restrictions, so we have kept them separate. But if we come across other such things then we can either try to fix those or forbid them. [1] - https://www.postgresql.org/message-id/202204251548.mudq7jbqnh7r%40alvherre.pgsql -- With Regards, Amit Kapila.
Re: Fix typo in comment
Alvaro Herrera writes: > On 2022-May-06, Zaorang Yang wrote: >> Maybe, the first letter of comments in postinit.c should be capitalized. > Hmm, typically these one-line comments are not "full sentences", so they > don't have capitals and no ending periods either. I wouldn't like the > endless stream of patches that would result if we let this go in. There is no project style guideline suggesting one over the other, so I think we should just leave it alone. regards, tom lane
Trying to add more tests to gistbuild.c
I'm studying how the gist index works, trying to improve the test coverage of gistbuild.c. Reading the source code I noticed that the gistInitBuffering function is not covered, so I decided to start with it. Reading the documentation and the source I understood that for this function to be executed it is necessary to force buffering=on when creating the index, or the index to be created must be big enough to not fit in the cache -- am I correct? Considering the above, I added two new index creation statements in the gist regression test (patch attached) to create an index using buffering=on and another to try to simulate an index that does not fit in the cache. With these new tests the coverage went from 45.3% to 85.5%, but I have some doubts: - Does this test make sense? - Would there be a way to validate that the buffering was done correctly? - Is this test necessary? I've been studying the PostgreSQL implementation and I'm just trying to start contributing to the source code. -- Matheus Alcantara
diff --git a/src/test/regress/expected/gist.out b/src/test/regress/expected/gist.out index a36b4c9c56..044986433a 100644 --- a/src/test/regress/expected/gist.out +++ b/src/test/regress/expected/gist.out @@ -46,6 +46,12 @@ vacuum analyze gist_tbl; set enable_seqscan=off; set enable_bitmapscan=off; set enable_indexonlyscan=on; +-- Build an index using buffering, caused by an index build that doesn't fit in cache. +set effective_cache_size = '1MB'; +create index gist_tbl_box_index_buffering on gist_tbl using gist (p, b, c); +reset effective_cache_size; +-- Force an index build using buffering. +create index gist_tbl_box_index_forcing_buffering on gist_tbl using gist (p) with (buffering=on); -- Test index-only scan with point opclass create index gist_tbl_point_index on gist_tbl using gist (p); -- check that the planner chooses an index-only scan diff --git a/src/test/regress/sql/gist.sql b/src/test/regress/sql/gist.sql index 3360266370..836ce84d71 100644 --- a/src/test/regress/sql/gist.sql +++ b/src/test/regress/sql/gist.sql @@ -55,6 +55,14 @@ set enable_seqscan=off; set enable_bitmapscan=off; set enable_indexonlyscan=on; + +-- Build an index using buffering, caused by an index build that doesn't fit in cache. +set effective_cache_size = '1MB'; +create index gist_tbl_box_index_buffering on gist_tbl using gist (p, b, c); +reset effective_cache_size; + +-- Force an index build using buffering. +create index gist_tbl_box_index_forcing_buffering on gist_tbl using gist (p) with (buffering=on); + -- Test index-only scan with point opclass create index gist_tbl_point_index on gist_tbl using gist (p);
Re: Fix typo in comment
> On 6 May 2022, at 14:50, Alvaro Herrera wrote: > > On 2022-May-06, Zaorang Yang wrote: > >> Maybe, the first letter of comments in postinit.c should be capitalized. > > Hmm, typically these one-line comments are not "full sentences", so they > don't have capitals and no ending periods either. I wouldn't like the > endless stream of patches that would result if we let this go in. Agreed. A quick grep turns up a fair number of such comments: $ git grep "^\s*\/\* [a-z].\+\*\/$" src/ | wc -l 16588 If anything should be done about this, it would perhaps be better to add it to pgindent or a similar automated process. If anything. -- Daniel Gustafsson https://vmware.com/
Re: Fix typo in comment
On 2022-May-06, Zaorang Yang wrote: > Maybe, the first letter of comments in postinit.c should be capitalized. Hmm, typically these one-line comments are not "full sentences", so they don't have capitals and no ending periods either. I wouldn't like the endless stream of patches that would result if we let this go in. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Fix typo in comment
Maybe, the first letter of comments in postinit.c should be capitalized. Attaching a tiny patch to fix it. 0001-Fix-typo-in-comment.patch Description: Binary data
Re: bogus: logical replication rows/cols combinations
On 5/6/22 05:23, houzj.f...@fujitsu.com wrote: > On Tuesday, May 3, 2022 11:31 AM Amit Kapila wrote: >> >> On Tue, May 3, 2022 at 12:10 AM Tomas Vondra >> wrote: >>> >>> On 5/2/22 19:51, Alvaro Herrera wrote: > Why would we need to know publications replicated by other >> walsenders? > And what if the subscriber is not connected at the moment? In that case > there'll be no walsender. Sure, if the replica is not connected then there's no issue -- as you say, that replica will fail at START_REPLICATION time. >>> >>> Right, I got confused a bit. >>> >>> Anyway, I think the main challenge is defining what exactly we want to >>> check, in order to ensure "sensible" behavior, without preventing way >>> too many sensible use cases. >>> >> >> I could think of below two options: >> 1. Forbid any case where column list is different for the same table >> when combining publications. >> 2. Forbid if the column list and row filters for a table are different >> in the set of publications we are planning to combine. This means we >> will allow combining column lists when row filters are not present or >> when column list is the same (we don't get anything additional by >> combining but the idea is we won't forbid such cases) and row filters >> are different. >> >> Now, I think the points in favor of (1) are that the main purpose of >> introducing a column list are: (a) the structure/schema of the >> subscriber is different from the publisher, (b) want to hide sensitive >> columns data. In both cases, it should be fine if we follow (1) and >> from Peter E.'s latest email [1] he also seems to be indicating the >> same. If we want to be slightly more relaxed then we can probably (2). >> We can decide on something else as well but I feel it should be such >> that it is easy to explain. > > I also think it makes sense to add a restriction like (1). I am planning to > implement the restriction if no one objects. > I'm not going to block that approach if that's the consensus here, though I'm not convinced. Let me point out (1) does *not* work for the data redaction use case, certainly not the example Alvaro and I presented, because that relies on a combination of row filters and column filters. Requiring all column lists to be the same (and not specific to a row filter) prevents that example from working. Yes, you can create multiple subscriptions, but that brings its own set of challenges too. I doubt forcing users to use the more complex setup is a good idea, and combining the column lists per [1] seems sound to me. That being said, the good thing is this restriction seems like it might be relaxed in the future to work per [1], without causing any backwards compatibility issues. Should we do something similar for row filters, though? It seems quite weird we're so concerned about unexpected behavior due to combining column lists (despite having a patch that makes it behave sanely), and at the same time wave off similarly strange behavior due to combining row filters because "that's what you get if you define the publications in a strange way". regards [1] https://www.postgresql.org/message-id/5a85b8b7-fc1c-364b-5c62-0bb3e1e25824%40enterprisedb.com -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Fix typo in code comment - origin.c
On Fri, May 06, 2022 at 07:25:34PM +1000, Peter Smith wrote: > PSA trivial patch to fix a very old comment typo. Thanks, fixed. -- Michael signature.asc Description: PGP signature
Re: Possible corruption by CreateRestartPoint at promotion
On Thu, Apr 28, 2022 at 03:49:42PM +0900, Michael Paquier wrote: > I am not sure what you mean here. FWIW, I am translating the > suggestion of Nathan to split the existing check in > CreateRestartPoint() that we are discussing here into two if blocks, > instead of just one: > - Move the update of checkPoint and checkPointCopy into its own if > block, controlled only by the check on > (ControlFile->checkPointCopy.redo < lastCheckPoint.redo) > - Keep the code updating minRecoveryPoint and minRecoveryPointTLI > mostly as-is, but just do the update when the control file state is > DB_IN_ARCHIVE_RECOVERY. Of course we need to keep the check on > (minRecoveryPoint < lastCheckPointEndPtr). And I have spent a bit of time on this stuff to finish with the attached. It will be a plus to get that done on HEAD for beta1, so I'll try to deal with it on Monday. I am still a bit stressed about the back branches, as concurrent checkpoints are possible via CreateCheckPoint() from the startup process (not the case on HEAD), but the stable branches will have a new point release very soon, so let's revisit this choice there later. v6 attached includes a TAP test, but I don't intend to include it as it is expensive. -- Michael From ffc1469819b28ff76e3290cd695e60511d94305a Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Fri, 6 May 2022 19:57:54 +0900 Subject: [PATCH v6] Correctly update control file at promotion If the cluster ends recovery on a promotion (aka the control file shows a state different than DB_IN_ARCHIVE_RECOVERY) while CreateRestartPoint() is still processing, this function could miss the update of the control file but still do the recycling/removal of the past WAL segments, causing a follow-up restart doing recovery to fail on a missing checkpoint record (aka PANIC) because the redo LSN referred to in the control file was located in a segment already gone. This commit fixes the control file update so the redo LSN is updated to reflect what gets recycled, with the minimum recovery point changed only if the cluster is still doing archive recovery. --- src/backend/access/transam/xlog.c | 48 .../t/027_invalid_checkpoint_after_promote.pl | 113 ++ 2 files changed, 141 insertions(+), 20 deletions(-) create mode 100644 src/test/recovery/t/027_invalid_checkpoint_after_promote.pl diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 61cda56c6f..c9c0b3f90a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6921,6 +6921,9 @@ CreateRestartPoint(int flags) XLogSegNo _logSegNo; TimestampTz xtime; + /* Concurrent checkpoint/restartpoint cannot happen */ + Assert (!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER); + /* Get a local copy of the last safe checkpoint record. */ SpinLockAcquire(&XLogCtl->info_lck); lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr; @@ -7013,30 +7016,33 @@ CreateRestartPoint(int flags) */ PriorRedoPtr = ControlFile->checkPointCopy.redo; - /* - * Update pg_control, using current time. Check that it still shows - * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; - * this is a quick hack to make sure nothing really bad happens if somehow - * we get here after the end-of-recovery checkpoint. - */ + /* Update pg_control, using current time */ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); + + /* + * Check if the control file still shows an older checkpoint, and update + * its information.
+ */ + if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo) { ControlFile->checkPoint = lastCheckPointRecPtr; ControlFile->checkPointCopy = lastCheckPoint; + } - /* - * Ensure minRecoveryPoint is past the checkpoint record. Normally, - * this will have happened already while writing out dirty buffers, - * but not necessarily - e.g. because no buffers were dirtied. We do - * this because a backup performed in recovery uses minRecoveryPoint to - * determine which WAL files must be included in the backup, and the - * file (or files) containing the checkpoint record must be included, - * at a minimum. Note that for an ordinary restart of recovery there's - * no value in having the minimum recovery point any earlier than this - * anyway, because redo will begin just after the checkpoint record. - */ + /* + * Ensure minRecoveryPoint is past the checkpoint record while archive + * recovery is still ongoing. Normally, this will have happened already + * while writing out dirty buffers, but not necessarily - e.g. because no + * buffers were dirtied. We do this because a base backup uses + * minRecoveryPoint to determine which WAL files must be included in the + * backup, and the file (or files) containing the checkpoint record must + * be included, at a minimum. Note that for an ordinary restart of + *
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 10:05, David Rowley wrote: > On Fri, 6 May 2022 at 17:52, Pavel Stehule > wrote: > > Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at > nodeMemoize.c:268 > > 268 if (size == 0) > > (gdb) p size > > $1 = 4369066 > > Thanks for the report. I think I now see the problem. Looking at > [1], it seems that's a bushy plan. That's fine, but less common than a > left deep plan. > > I think the problem is down to some incorrect code in > get_memoize_path() where we pass the wrong value of "calls" to > create_memoize_path(). I think instead of outer_path->parent->rows it > should be outer_path->rows. > > If you look closely at the plan, you'll see that the outer side of the > inner-most Nested Loop is parameterized by some higher-level nested > loop. > > -> Nested Loop (cost=1.14..79.20 rows=91 width=8) (actual > time=0.024..0.024 rows=1 loops=66) > -> Index Only Scan using > uq_isi_itemid_itemimageid on item_share_image itemsharei2__1 > (cost=0.57..3.85 rows=91 width=16) (actual time=0.010..0.010 rows=1 > loops=66) >Index Cond: (item_id = itembo0_.id) >Heap Fetches: 21 > -> Memoize (cost=0.57..2.07 rows=1 width=8) > (actual time=0.013..0.013 rows=1 loops=66) > > so instead of passing 91 to create_memoize_path() as I thought, it > passes outer_path->parent->rows. Since > I can't see any WHERE clause items filtering rows from the > itemsharei2__1 relation, the outer_path->parent->rows should > be whatever pg_class.reltuples says. > > Are you able to send the results of: > > explain select item_id from item_share_image group by item_id; -- I'm > interested in the estimated number of groups in the plan's top node. > > > select reltuples from pg_class where oid = 'item_share_image'::regclass; > > I'm expecting the estimated number of rows in the top node of the > group by plan to be about 4369066. > (2022-05-06 12:30:23) prd_aukro=# explain select item_id from item_share_image group by item_id; QUERY PLAN Finalize HashAggregate (cost=1543418.63..1554179.08 rows=1076045 width=8) Group Key: item_id -> Gather (cost=1000.57..1532658.18 rows=4304180 width=8) Workers Planned: 4 -> Group (cost=0.57..1101240.18 rows=1076045 width=8) Group Key: item_id -> Parallel Index Only Scan using ixfk_isi_itemid on item_share_image (cost=0.57..1039823.86 rows=24566530 width=8) (7 rows) Time: 1.808 ms (2022-05-06 12:30:26) prd_aukro=# select reltuples from pg_class where oid = 'item_share_image'::regclass; reltuples 9.826612e+07 (1 row) Time: 0.887 ms Regards Pavel > David > > [1] https://explain.depesz.com/s/2rBw#source
Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
On Thu, May 5, 2022 at 6:39 AM David Zhang wrote: > On 2022-05-02 1:25 a.m., Etsuro Fujita wrote: > > On Wed, Apr 20, 2022 at 4:55 AM David Zhang wrote: > >> I tried to apply the patch to master and plan to run some tests, but got > >> below errors due to other commits. > > I rebased the patch against HEAD. Attached is an updated patch. > > Applied the patch v8 to the master branch today, and `make check` is OK. > I also repeated the previous performance tests on three virtual Ubuntu > 18.04 machines, and the performance improvement of parallel abort, averaged > over 10 runs, is more consistent. > > before: > > abort.1 = 2.6344 ms > abort.2 = 4.2799 ms > > after: > > abort.1 = 1.4105 ms > abort.2 = 2.2075 ms Good to know! Thanks for testing! > >> + * remote server in parallel at (sub)transaction end. > >> > >> Here, I think the comment above could potentially apply to multiple > >> remote servers. > > I agree on that point, but I think it's correct to say "the remote > > server" here, because we determine this for the given remote server. > > Maybe I'm missing something, so could you elaborate on it? > Your understanding is correct. I was thinking `remote server(s)` would > be easy for end users to understand, but this is a comment in source code, > so either way is fine for me. Ok, but I noticed that the comment failed to mention that the parallel_commit option is disabled by default. Also, I noticed a comment above it: * It's enough to determine this only when making new connection because * all the connections to the foreign server whose keep_connections option * is changed will be closed and re-made later. This would apply to the parallel_commit option as well. How about updating these like the attached? (I simplified the latter comment and moved it to a more appropriate place.) Best regards, Etsuro Fujita update-comment-in-make_new_connection.patch Description: Binary data
Re: Configuration Parameter/GUC value validation hook
On Wed, May 4, 2022 at 7:12 AM Bharath Rupireddy wrote: > Thanks Tom and Robert for your responses. > > How about we provide a sample extension (limiting some important > parameters, say shared_buffers, work_mem and so on, to some > "reasonable/recommended" limits) in the core along with the > set_config_option_hook? This way, all the people using open source > postgres out-of-the-box will benefit, and whoever wants can modify > that sample extension to suit their needs. The sample extension can > also serve as an example of how to implement set_config_option_hook. > > Thoughts? Well, it's better than just adding a hook and stopping, but I'm not really sure that it's as good as what I'd like to have. -- Robert Haas EDB: http://www.enterprisedb.com
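Just to illustrate the idea being discussed, a purely hypothetical sketch of such a sample extension is below. Note that no set_config_option_hook exists in core today, so the hook signature here is invented for illustration only:

/*
 * HYPOTHETICAL: assumes core called a hook of this shape from
 * set_config_option() with the GUC name and the proposed new value,
 * where returning false rejects the change.  No such hook exists yet.
 */
static bool
limit_guc_check(const char *name, const char *value)
{
    if (strcmp(name, "work_mem") == 0)
    {
        int         kbytes = atoi(value);   /* simplistic parsing */

        if (kbytes > 1024 * 1024)           /* cap at 1GB */
        {
            ereport(WARNING,
                    (errmsg("rejecting work_mem setting larger than 1GB")));
            return false;
        }
    }
    return true;
}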
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 20:04, David Rowley wrote: > Thanks for the report. I think I now see the problem. Looking at > [1], it seems that's a bushy plan. That's fine, but less common than a > left deep plan. On second thoughts, it does not need to be a bushy plan for the outer side of the nested loop to be parameterized by some higher-level nested loop. There's an example of a plan like this in the regression tests. regression=# explain (analyze, costs off, summary off) regression-# select * from tenk1 t1 left join regression-# (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2) regression-# on t1.hundred = t2.hundred and t1.ten = t3.ten regression-# where t1.unique1 = 1; QUERY PLAN - Nested Loop Left Join (actual time=0.258..0.487 rows=20 loops=1) -> Index Scan using tenk1_unique1 on tenk1 t1 (actual time=0.049..0.049 rows=1 loops=1) Index Cond: (unique1 = 1) -> Nested Loop (actual time=0.204..0.419 rows=20 loops=1) Join Filter: (t1.ten = t3.ten) Rows Removed by Join Filter: 80 -> Bitmap Heap Scan on tenk1 t2 (actual time=0.064..0.194 rows=100 loops=1) Recheck Cond: (t1.hundred = hundred) Heap Blocks: exact=86 -> Bitmap Index Scan on tenk1_hundred (actual time=0.036..0.036 rows=100 loops=1) Index Cond: (hundred = t1.hundred) -> Memoize (actual time=0.001..0.001 rows=1 loops=100) Cache Key: t2.thousand Cache Mode: logical Hits: 90 Misses: 10 Evictions: 0 Overflows: 0 Memory Usage: 4kB -> Index Scan using tenk1_unique2 on tenk1 t3 (actual time=0.009..0.009 rows=1 loops=10) Index Cond: (unique2 = t2.thousand) (17 rows) Debugging this, I see that the memoize plan won because it was passing 1 as the number of calls. It should have been passing 100. The Memoize node's number of loops agrees with that. Fixing the calls to correctly pass 100 gets rid of the Memoize node. I've attached a patch to fix it. I'll look at it in more detail after the weekend. I'm very tempted to change the EXPLAIN output in at least master to display the initial and final (maximum) hash table sizes. Wondering if anyone would object to that?
David
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 9a8c5165b0..313f379ed8 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -610,7 +610,7 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel, hash_operators, extra->inner_unique, binary_mode, - outer_path->parent->rows); + outer_path->rows); } return NULL; diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index bf1a2db2cf..bd3375f2ba 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -3673,8 +3673,8 @@ select * from tenk1 t1 left join (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2) on t1.hundred = t2.hundred and t1.ten = t3.ten where t1.unique1 = 1; - QUERY PLAN --- + QUERY PLAN + Nested Loop Left Join -> Index Scan using tenk1_unique1 on tenk1 t1 Index Cond: (unique1 = 1) @@ -3684,20 +3684,17 @@ where t1.unique1 = 1; Recheck Cond: (t1.hundred = hundred) -> Bitmap Index Scan on tenk1_hundred Index Cond: (hundred = t1.hundred) - -> Memoize - Cache Key: t2.thousand - Cache Mode: logical - -> Index Scan using tenk1_unique2 on tenk1 t3 - Index Cond: (unique2 = t2.thousand) -(14 rows) + -> Index Scan using tenk1_unique2 on tenk1 t3 + Index Cond: (unique2 = t2.thousand) +(11 rows) explain (costs off) select * from tenk1 t1 left join (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2) on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten where t1.unique1 = 1; - QUERY PLAN --- + QUERY PLAN
Fix typo in code comment - origin.c
PSA trivial patch to fix a very old comment typo. -- Kind Regards, Peter Smith. Fujitsu Australia v1-0001-Fix-typo-in-comment.patch Description: Binary data
Re: Perform streaming logical transactions by background workers and parallel apply
On Fri, Apr 29, 2022 at 3:22 PM shiy.f...@fujitsu.com wrote: ... > Thanks for your patch. > > The patch modified the streaming option in logical replication; it can be set to > 'on', 'off' and 'apply'. The new option 'apply' hasn't been tested in the > tap tests. > Attached is a patch which modifies the subscription tap tests to cover both the 'on' and > 'apply' options. (The main patch is also attached to make cfbot happy.) > Here are my review comments for v5-0002 (TAP tests) == 1. Commit message For the tap tests about streaming option in logical replication, test both 'on' and 'apply' option. SUGGESTION Change all TAP tests using the PUBLICATION "streaming" option, so they now test both 'on' and 'apply' values. ~~~ 2. src/test/subscription/t/015_stream.pl +sub test_streaming +{ I think the function should have a comment to say that its purpose is to encapsulate all the common (stream-related) test steps so the same code can be run both for the streaming=on and streaming=apply cases. ~~~ 3. src/test/subscription/t/015_stream.pl + +# Test streaming mode on +# Test streaming mode apply These comments feel too small. IMO they should both be more prominent like: # Test using streaming mode 'on' ### # Test using streaming mode 'apply' ### ~~~ 4. src/test/subscription/t/015_stream.pl +# Test streaming mode apply +$node_publisher->safe_psql('postgres', "DELETE FROM test_tab WHERE (a > 2)"); $node_publisher->wait_for_catchup($appname); I think those 2 lines do not really belong after the "# Test streaming mode apply" comment. IIUC they are really just doing cleanup from the prior test part, so I think they should a) be *above* this comment (and say "# cleanup the test data") or b) maybe it is best to put all the cleanup lines actually inside the 'test_streaming' function so that the last thing the function does is clean up after itself. Option b seems tidier to me. ~~~ 5. src/test/subscription/t/016_stream_subxact.pl sub test_streaming should be commented. (same as comment #2) ~~~ 6. src/test/subscription/t/016_stream_subxact.pl The comments for the different streaming modes should be more prominent. (same as comment #3) ~~~ 7. src/test/subscription/t/016_stream_subxact.pl +# Test streaming mode apply +$node_publisher->safe_psql('postgres', "DELETE FROM test_tab WHERE (a > 2)"); $node_publisher->wait_for_catchup($appname); These don't seem to belong here. They are cleanup from the prior test. (same as comment #4) ~~~ 8. src/test/subscription/t/017_stream_ddl.pl sub test_streaming should be commented. (same as comment #2) ~~~ 9. src/test/subscription/t/017_stream_ddl.pl The comments for the different streaming modes should be more prominent. (same as comment #3) ~~~ 10. src/test/subscription/t/017_stream_ddl.pl +# Test streaming mode apply $node_publisher->safe_psql( 'postgres', q{ -BEGIN; -INSERT INTO test_tab VALUES (2001, md5(2001::text), -2001, 2*2001); -ALTER TABLE test_tab ADD COLUMN e INT; -SAVEPOINT s1; -INSERT INTO test_tab VALUES (2002, md5(2002::text), -2002, 2*2002, -3*2002); -COMMIT; +DELETE FROM test_tab WHERE (a > 2); +ALTER TABLE test_tab DROP COLUMN c, DROP COLUMN d, DROP COLUMN e, DROP COLUMN f; }); $node_publisher->wait_for_catchup($appname); These don't seem to belong here. They are cleanup from the prior test. (same as comment #4) ~~~ 11. .../t/018_stream_subxact_abort.pl sub test_streaming should be commented. (same as comment #2) ~~~ 12.
.../t/018_stream_subxact_abort.pl The comments for the different streaming modes should be more prominent. (same as comment #3) ~~~ 13. .../t/018_stream_subxact_abort.pl +# Test streaming mode apply +$node_publisher->safe_psql('postgres', "DELETE FROM test_tab WHERE (a > 2)"); $node_publisher->wait_for_catchup($appname); These don't seem to belong here. They are cleanup from the prior test. (same as comment #4) ~~~ 14. .../t/019_stream_subxact_ddl_abort.pl sub test_streaming should be commented. (same as comment #2) ~~~ 15. .../t/019_stream_subxact_ddl_abort.pl The comments for the different streaming modes should be more prominent. (same as comment #3) ~~~ 16. .../t/019_stream_subxact_ddl_abort.pl +test_streaming($node_publisher, $node_subscriber, $appname); + +# Test streaming mode apply $node_publisher->safe_psql( 'postgres', q{ -BEGIN; -INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3,500) s(i); -ALTER TABLE test_tab ADD COLUMN c INT; -SAVEPOINT s1; -INSERT INTO test_tab SELECT i, md5(i::text), -i FROM generate_series(501,1000) s(i); -ALTER TABLE test_tab ADD COLUMN d INT; -SAVEPOINT s2; -INSERT INTO test_tab SELECT i, md5(i::text), -i, 2*i FROM generate_series(1001,1500) s(i); -ALTER TABLE test_tab ADD COLUMN e INT; -SAVEPOINT
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 17:52, Pavel Stehule wrote: > Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at > nodeMemoize.c:268 > 268 if (size == 0) > (gdb) p size > $1 = 4369066 Thanks for the report. I think I now see the problem. Looking at [1], it seems that's a bushy plan. That's fine, but less common than a left deep plan. I think the problem is down to some incorrect code in get_memoize_path() where we pass the wrong value of "calls" to create_memoize_path(). I think instead of outer_path->parent->rows it should be outer_path->rows. If you look closely at the plan, you'll see that the outer side of the inner-most Nested Loop is parameterized by some higher-level nested loop. -> Nested Loop (cost=1.14..79.20 rows=91 width=8) (actual time=0.024..0.024 rows=1 loops=66) -> Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1 (cost=0.57..3.85 rows=91 width=16) (actual time=0.010..0.010 rows=1 loops=66) Index Cond: (item_id = itembo0_.id) Heap Fetches: 21 -> Memoize (cost=0.57..2.07 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=66) so instead of passing 91 to create_memoize_path() as I thought, it passes outer_path->parent->rows. Since I can't see any WHERE clause items filtering rows from the itemsharei2__1 relation, the outer_path->parent->rows should be whatever pg_class.reltuples says. Are you able to send the results of: explain select item_id from item_share_image group by item_id; -- I'm interested in the estimated number of groups in the plan's top node. select reltuples from pg_class where oid = 'item_share_image'::regclass; I'm expecting the estimated number of rows in the top node of the group by plan to be about 4369066. David [1] https://explain.depesz.com/s/2rBw#source
RE: Logical replication timeout problem
On Fri, May 6, 2022 at 9:54 AM Masahiko Sawada wrote: > On Wed, May 4, 2022 at 7:18 PM Amit Kapila wrote: > > > > On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada > wrote: > > > > > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila > wrote: > > > > > > > > > > > > So, shall we go back to the previous approach of using a separate > > > > function update_replication_progress? > > > > > > Ok, agreed. > > > > > > > Attached, please find the updated patch accordingly. Currently, I have > > prepared it for HEAD, if you don't see any problem with this, we can > > prepare the back-branch patches based on this. > > Thank you for updating the patch. Looks good to me. Thanks for your review. I improved the back-branch patches according to the discussion, moving the CHANGES_THRESHOLD logic from the function OutputPluginUpdateProgress to the new function update_replication_progress. In addition, I improved the formatting of all patches with pgindent. Attached are the patches. Regards, Wang wei HEAD_v21-0001-Fix-the-logical-replication-timeout-during-large.patch Description: HEAD_v21-0001-Fix-the-logical-replication-timeout-during-large.patch REL14_v4-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL14_v4-0001-Fix-the-logical-replication-timeout-during-large-.patch REL13_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL13_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL12_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL12_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL11_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL11_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL10_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL10_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch
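To recap the shape of the fix, here is a simplified sketch only; the threshold handling and surrounding details differ slightly across the attached branch patches:

/*
 * Sketch: send a progress/keepalive update every CHANGES_THRESHOLD
 * changes, so that a large transaction that produces no network
 * traffic does not run into wal_sender_timeout.
 */
#define CHANGES_THRESHOLD 100

static void
update_replication_progress(LogicalDecodingContext *ctx, bool skipped_xact)
{
    static int  changes_count = 0;

    if (++changes_count >= CHANGES_THRESHOLD)
    {
        OutputPluginUpdateProgress(ctx, skipped_xact);
        changes_count = 0;
    }
}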
Re: make MaxBackends available in _PG_init
Hello! On 19.04.2022 18:46, Nathan Bossart wrote: Okay, I did it this way in v5. Recently I ran into a problem where it would be preferable in our extended pg_stat_statements to use MaxBackends in _PG_init(), but it's equal to zero there. And I was very happy to find this patch and this thread. It was not only very interesting and informative for me, but it also solves the current problem. The patch applies cleanly on current master. Tests under Linux and Windows were successful. I've tried this patch with our extended pg_stat_statements and checked that MaxBackends has the correct value during extra shmem allocation. Thanks a lot! +1 for this patch. I have a little doubt about the comment in postmaster.c: "Now that loadable modules have had their chance to alter any GUCs". Perhaps this comment is too general. Not sure that it is possible to change any arbitrary GUC here. And maybe it would not be bad to add Assert(size > 0) in RequestAddinShmemSpace() to catch cases where the size calculation in contrib is wrong. With best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
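For example, with the suggested assertion in place, a shmem request computed from MaxBackends would trip it immediately if the extension ran too early. A minimal sketch (my_shmem_request and MyPerBackendEntry are placeholder names, not part of any real extension):

typedef struct MyPerBackendEntry
{
    int         counter;        /* placeholder payload */
} MyPerBackendEntry;

static void
my_shmem_request(void)
{
    Size        size = mul_size(MaxBackends, sizeof(MyPerBackendEntry));

    /*
     * An Assert(size > 0) inside RequestAddinShmemSpace() would catch
     * MaxBackends still being 0 at this point.
     */
    RequestAddinShmemSpace(size);
}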
Re: strange slow query - lost lot of time somewhere
Pavel Stehule writes: > Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at > nodeMemoize.c:268 > 268 if (size == 0) > (gdb) p size > $1 = 4369066 Uh-huh regards, tom lane