Re: New docs chapter on Transaction Management and related changes

2022-11-22 Thread Laurenz Albe
On Tue, 2022-11-22 at 13:50 -0500, Bruce Momjian wrote:
> Agreed, updated patch attached.

I cannot find any more problems, and I shouldn't mention the extra empty
line at the end of the patch.

I'd change the commitfest status to "ready for committer" now if it were
not already in that status.

Yours,
Laurenz Albe




Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-22 Thread Julien Rouhaud
Hi,

Sorry for the very late answer, I had quite a lot of other things going on
recently.  And thanks for taking care of the patchset!

On Wed, Nov 23, 2022 at 03:05:18PM +0900, Michael Paquier wrote:
> On Tue, Nov 22, 2022 at 05:20:01PM +0900, Michael Paquier wrote:
> > + /* XXX: this should stick to elevel for some cases? */
> > + ereport(LOG,
> > + (errmsg("skipping missing authentication file \"%s\"",
> > + inc_fullname)));
> > Should we always issue a LOG here?  In some cases we use an elevel of
> > DEBUG3.
>
> And here I think that we need to use elevel.  In hba.c, this would
> generate a LOG while hbafuncs.c uses DEBUG3, leading to less noise in
> the log files.

Yeah, I agree the LOG message is only interesting if you're reloading the conf.
I actually thought that's what I did; sorry about that.
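For the archives, the hba.c call would then presumably end up looking like
this (same message as the hunk above, just passing the caller's elevel;
untested sketch):

    ereport(elevel,
            (errmsg("skipping missing authentication file \"%s\"",
                    inc_fullname)));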

> > So, what do you think about something like the attached?  I have begun
> > looking at the tests, but I don't have enough material for an actual
> > review of this part yet.  Note that I have temporarily removed
> > process_included_authfile(), as I was looking at all the code branches
> > in detail.  The final result ought to include AbsoluteConfigLocation(),
> > open_auth_file() and tokenize_auth_file() with an extra missing_ok
> > argument, I guess ("strict" as proposed originally is not the usual
> > PG-way for ENOENT-ish problems).  process_line should be used only
> > when we have no err_msg, meaning that these have been consumed in some
> > TokenizedAuthLines already.
>
> This, however, was still brittle in terms of memory handling.
> Reloading the server a few hundred times proved that this was leaking
> memory in the tokenization.

Oh, nice catch.

> This becomes quite simple once you switch
> to an approach where tokenize_auth_file() uses its own memory context,
> while we store all the TokenizedAuthLines in the static context.  It
> also occurred to me that we can create and drop the static context
> holding the tokens when opening the top-root auth file, aka with a
> depth at 0.  It may be a bit simpler to switch to a single-context
> approach for all the allocations of tokenize_auth_file(), though I
> find the use of a static context much cleaner for the inclusions with
> @ files when we expand an existing list.

Agreed.

The depth 0 is getting used quite a lot now, maybe we should have a define for
it to make it easier to grep, like TOP_LEVEL_AUTH_FILE or something like that?
And also add a define for the magical 10 for the max inclusion depth, for both
auth files and GUC files while at it?
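Just to illustrate what I mean (the names here are entirely hypothetical,
whatever we pick is fine by me), something like:

    /* depth of the top-level (root) auth file */
    #define TOP_LEVEL_AUTH_FILE_DEPTH       0

    /* replaces the magical 10 used as the maximum inclusion depth */
    #define MAX_AUTH_FILE_INCLUSION_DEPTH   10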

> The patch can be split, again, into more pieces.  Attached 0001
> reworks the memory allocation contexts for the tokenization and 0002
> is the main feature.  As things are, I am quite happy with the
> shapes of 0001 and 0002.  I have tested its robustness quite a bit,
> with valgrind for example, to make sure that there are no leaks in the
> postmaster (with[out] -DEXEC_BACKEND).  The inclusion logic is
> refactored to be in a state close to your previous patches, see
> tokenize_inclusion_file().

Yep, I saw that.  I don't have much to say about the patch, it looks good to me.
The only nitpick I have is:

+/*
+ * Memory context holding the list of TokenizedAuthLines when parsing
+ * HBA or ident config files.  This is created when loading the top HBA
+ * or ident file (depth of 0).
+ */
+static MemoryContext tokenize_context = NULL;

The comment seems a bit ambiguous, as with "loading the top..." you probably
meant something like "loading the file in memory" rather than "(re)loading the
configuration".  Maybe s/loading/opening/?
>
> Note that the tests are failing for some of the Windows CIs, actually,
> due to a difference in some of the paths generated vs the file paths
> (aka c:\cirrus vs c:/cirrus, as far as I saw).

Mmm, I haven't looked deeply so I'm not sure if the Perl modules are aware of
it or not, but maybe we could somehow detect the delimiter in use at the
beginning, after normalizing the directory, and use a $DELIM rather than a plain
"/"?




drop postmaster symlink

2022-11-22 Thread Peter Eisentraut
A little while ago we discussed briefly over in the meson thread whether 
we could remove the postmaster symlink [0].  The meson build system 
currently does not install a postmaster symlink.  (AFAICT, the MSVC 
build system does not either.)  So if we want to elevate the meson build 
system, we either need to add the postmaster symlink, or remove it from 
the other build system(s) as well.  Seeing that it's been deprecated for 
a long time, I propose we just remove it.  See attached patches.



[0]: 
https://www.postgresql.org/message-id/bfdf03c8-c24f-c5b1-474e-4c9a96210...@enterprisedb.com

From 179eaab96dc5215970b2b3485cf77f1a58f9d337 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 13 Nov 2022 20:52:00 +0100
Subject: [PATCH 1/2] Remove gratuitous references to postmaster instead of
 postgres

---
 contrib/start-scripts/freebsd | 2 +-
 contrib/start-scripts/linux   | 2 +-
 src/port/path.c   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/contrib/start-scripts/freebsd b/contrib/start-scripts/freebsd
index 3323237a54b4..5e22b2d2f0e4 100644
--- a/contrib/start-scripts/freebsd
+++ b/contrib/start-scripts/freebsd
@@ -29,7 +29,7 @@ 
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 
 # What to use to start up the postmaster.  (If you want the script to wait
 # until the server has started, you could use "pg_ctl start" here.)
-DAEMON="$prefix/bin/postmaster"
+DAEMON="$prefix/bin/postgres"
 
 # What to use to shut down the postmaster
 PGCTL="$prefix/bin/pg_ctl"
diff --git a/contrib/start-scripts/linux b/contrib/start-scripts/linux
index a7757162fc4b..9292855df716 100644
--- a/contrib/start-scripts/linux
+++ b/contrib/start-scripts/linux
@@ -61,7 +61,7 @@ 
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 
 # What to use to start up the postmaster.  (If you want the script to wait
 # until the server has started, you could use "pg_ctl start" here.)
-DAEMON="$prefix/bin/postmaster"
+DAEMON="$prefix/bin/postgres"
 
 # What to use to shut down the postmaster
 PGCTL="$prefix/bin/pg_ctl"
diff --git a/src/port/path.c b/src/port/path.c
index 05fe812f757b..fb64873c7a43 100644
--- a/src/port/path.c
+++ b/src/port/path.c
@@ -651,7 +651,7 @@ dir_strcmp(const char *s1, const char *s2)
  * For example:
  * target_path  = '/usr/local/share/postgresql'
  * bin_path = '/usr/local/bin'
- * my_exec_path = '/opt/pgsql/bin/postmaster'
+ * my_exec_path = '/opt/pgsql/bin/postgres'
  * Given these inputs, the common prefix is '/usr/local/', the tail of
  * bin_path is 'bin' which does match the last directory component of
  * my_exec_path, so we would return '/opt/pgsql/share/postgresql'
-- 
2.38.1

From 6118b4098a50b78ca7437938117cc3ab41b72e08 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 13 Nov 2022 20:52:56 +0100
Subject: [PATCH 2/2] Don't install postmaster symlink anymore

Also remove man page.
---
 doc/src/sgml/ref/postmaster.sgml | 44 
 doc/src/sgml/reference.sgml  |  1 -
 src/backend/Makefile |  8 +-
 3 files changed, 1 insertion(+), 52 deletions(-)
 delete mode 100644 doc/src/sgml/ref/postmaster.sgml

diff --git a/doc/src/sgml/ref/postmaster.sgml b/doc/src/sgml/ref/postmaster.sgml
deleted file mode 100644
index 7b544ed0b613..
--- a/doc/src/sgml/ref/postmaster.sgml
+++ /dev/null
@@ -1,44 +0,0 @@
-
-
-
- 
-  postmaster
- 
-
- 
-  postmaster
-  1
-  Application
- 
-
- 
-  postmaster
-  PostgreSQL database 
server
- 
-
- 
-  
-   postmaster
-   option
-  
- 
-
- 
-  Description
-
-  
-   postmaster is a deprecated alias of 
postgres.
-  
- 
-
- 
-  See Also
-
-  
-   
-  
- 
-
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index a3b743e8c1e7..e11b4b613075 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -289,7 +289,6 @@ PostgreSQL Server Applications



-   
 
  
 
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 181c217fae4c..ed364543b74e 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -208,12 +208,6 @@ endif
 
 install-bin: postgres $(POSTGRES_IMP) installdirs
$(INSTALL_PROGRAM) postgres$(X) '$(DESTDIR)$(bindir)/postgres$(X)'
-ifneq ($(PORTNAME), win32)
-   @rm -f '$(DESTDIR)$(bindir)/postmaster$(X)'
-   ln -s postgres$(X) '$(DESTDIR)$(bindir)/postmaster$(X)'
-else
-   $(INSTALL_PROGRAM) postgres$(X) '$(DESTDIR)$(bindir)/postmaster$(X)'
-endif
 ifeq ($(MAKE_EXPORTS), true)
$(INSTALL_DATA) $(POSTGRES_IMP) '$(DESTDIR)$(pkglibdir)/$(POSTGRES_IMP)'
$(INSTALL_PROGRAM) $(MKLDEXPORT) 
'$(DESTDIR)$(pgxsdir)/$(MKLDEXPORT_DIR)/mkldexport.sh'
@@ -242,7 +236,7 @@ endif
 ##
 
 uninstall:
-   rm -f '$(DESTDIR)$(bindir)/postgres$(X)' 
'$(DESTDIR)$(bindir)/postmaster'
+   rm -f '$(DESTDIR)$(bindir)/postgres$(X)'
 ifeq 

Re: Prefetch the next tuple's memory during seqscans

2022-11-22 Thread David Rowley
On Wed, 23 Nov 2022 at 20:29, sirisha chamarthi
 wrote:
> I ran your test1 exactly like your setup except the row count is 300
> (with 13275 blocks). Shared_buffers is 128MB and the hardware configuration
> details are at the bottom of the mail. It appears Master + 0001 + 0005 regressed
> slightly compared to master.

Thank you for running these tests.

Can you share whether the plans used for these queries were parallel plans?
I had set max_parallel_workers_per_gather to 0 to remove the
additional variability from parallel query.

Also, 13275 blocks is about 104MB; does EXPLAIN (ANALYZE, BUFFERS) indicate
that all pages were in shared buffers? I used pg_prewarm() to ensure
they were, so that the runs were consistent.

David




Re: Prefetch the next tuple's memory during seqscans

2022-11-22 Thread sirisha chamarthi
On Tue, Nov 22, 2022 at 1:58 PM David Rowley  wrote:

> On Thu, 3 Nov 2022 at 06:25, Andres Freund  wrote:
> > Attached is an experimental patch/hack for that. It ended up being more
> > beneficial to make the access ordering more optimal than prefetching the
> tuple
> > contents, but I'm not at all sure that's the be-all-end-all.
>
> Thanks for writing that patch. I've been experimenting with it.
>
> I tried unrolling the loop (patch 0003) as you mentioned in:
>
> + * FIXME: Worth unrolling so that we don't fetch the same cacheline
> + * over and over, due to line items being smaller than a cacheline?
>
> but didn't see any gains from doing that.
>
> I also adjusted your patch a little so that instead of doing:
>
> - OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */
> + OffsetNumber *rs_vistuples;
> + OffsetNumber rs_vistuples_d[MaxHeapTuplesPerPage]; /* their offsets */
>
> to work around the issue of having to populate rs_vistuples_d in
> reverse, I added a new field called rs_startindex to mark where the
> first element in the rs_vistuples array is. The way you wrote it seems
> to require fewer code changes, but per the FIXME comment you left, I
> get the idea you just did it the way you did to make it work enough
> for testing.
>
> I'm quite keen to move forward in committing the 0001 patch to add the
> pg_prefetch_mem macro. What I'm a little undecided about is what the
> best patch is to commit first to make use of the new macro.
>
> I did some tests on the attached set of patches:
>
> alter system set max_parallel_workers_per_gather = 0;
> select pg_reload_conf();
>
> create table t as select a from generate_series(1,1000)a;
> alter table t set (autovacuum_enabled=false);
>
> $ cat bench.sql
> select * from t where a = 0;
>
> psql -c "select pg_prewarm('t');" postgres
>
> -- Test 1 no frozen tuples in "t"
>
> Master (@9c6ad5eaa):
> $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 383.332 ms
> latency average = 375.747 ms
> latency average = 376.090 ms
>
> Master + 0001 + 0002:
> $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 370.133 ms
> latency average = 370.149 ms
> latency average = 370.157 ms
>
> Master + 0001 + 0005:
> $ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 372.662 ms
> latency average = 371.034 ms
> latency average = 372.709 ms
>
> -- Test 2 "select count(*) from t" with all tuples frozen
>
> $ cat bench1.sql
> select count(*) from t;
>
> psql -c "vacuum freeze t;" postgres
> psql -c "select pg_prewarm('t');" postgres
>
> Master (@9c6ad5eaa):
> $ pgbench -n -f bench1.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 406.238 ms
> latency average = 407.029 ms
> latency average = 406.962 ms
>
> Master + 0001 + 0005:
> $ pgbench -n -f bench1.sql -M prepared -T 10 postgres | grep -E "^latency"
> latency average = 345.470 ms
> latency average = 345.775 ms
> latency average = 345.354 ms
>
> My current thoughts are that it might be best to go with 0005 to start
> with.  I know Melanie is working on making some changes in this area,
> so perhaps it's best to leave 0002 until that work is complete.
>

I ran your test1 exactly like your setup except the row count is 300
(with 13275 blocks). Shared_buffers is 128MB and the hardware configuration
details are at the bottom of the mail. It appears Master + 0001 + 0005 regressed
slightly compared to master.

Master (@56d0ed3b756b2e3799a7bbc0ac89bc7657ca2c33)

Before vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 430.287 ms

After Vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 369.046 ms

Master + 0001 + 0002:

Before vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 427.983 ms

After Vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 367.185 ms

Master + 0001 + 0005:

Before vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 447.045 ms

After Vacuum:
/usr/local/pgsql/bin/pgbench -n -f bench.sql -M prepared -T 30 -P 10
postgres | grep -E "^latency"
latency average = 374.484 ms

lscpu output

Architecture:x86_64
CPU op-mode(s):  32-bit, 64-bit
Byte Order:  Little Endian
Address sizes:   46 bits physical, 48 bits virtual
CPU(s):  1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):   1
NUMA node(s):1
Vendor ID:   GenuineIntel
CPU family:  6
Model: 

Re: Documentation for building with meson

2022-11-22 Thread Justin Pryzby
On Mon, Nov 14, 2022 at 10:41:21AM -0800, samay sharma wrote:

> You need LZ4, if you want to support compression of data with that
> method; see default_toast_compression and wal_compression. 

=> The first comma is odd.  Maybe it should say "LZ4 is needed to
support .."

> You need Zstandard, if you want to support compression of data or
> backups with that method; see wal_compression. The minimum required
> version is 1.4.0. 

Same.

Also, since v15, LZ4 and zstd can both be used by basebackup.

> Some commonly used ones are mentioned in the subsequent sections

=> Some commonly used options ...

>  Most of these require additional software, as described in Section
>  17.3.2, and are set to be auto features.

=> "Are set to be auto features" sounds odd.  I think it should say
something like " .. and are automatically enabled if the required
software is detected.".

> You can change this behavior by manually setting the auto features to
> enabled to require them or disabled to not build with them. 

remove "auto".  Maybe "enabled" and "disabled" need markup.

> On Windows, the default WinLDAP library is used. It defults to auto

typo: defults

> It defaults to auto and libsystemd and the associated header files need
> to be installed to use this option.

=> write this as two separate sentences.  Same for libxml.

> bsd to use the UUID functions found in FreeBSD, NetBSD, and some other
> BSD-derived systems 

=> should remove mention of netbsd, like c4b6d218e

> Enables use of the Zlib library. It defaults to auto and enables
> support for compressed archives in pg_dump ,pg_restore and
> pg_basebackup and is recommended. 

=> The comma is mis-placed.

> The default backend meson uses is ninja and that should suffice for
> most use cases. However, if you'd like to fully integrate with
> visual studio, you can set the BACKEND to vs. 

=> BACKEND is missing markup.

> This option can be used to specify the buildtype to use; defaults to
> release

=> release is missing markup

>  Specify the optimization level. LEVEL can be set to any of
>  {0,g,1,2,3,s}. 

=> LEVEL is missing markup

Thanks,
-- 
Justin




Help running 010_tab_completion.pl on windows

2022-11-22 Thread Kirk Wolak
I have psql working with readline (roughly) in Windows 10!
In my attempt to test it, I get:

>> 1..0 # SKIP IO::Pty is needed to run this test

I would like to run these tests to see how far off I am...
(Randomly typing sql and squealing like a child has its limits)

I have built this using VS 2022 Community Edition.

A quick search failed to find an obvious answer.
One note in one of the Strawberry Perl .pm files read:
> ptys are not supported yet under Win32, but will be emulated...

Thanks in advance!


Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2022-11-22 Thread Michael Paquier
On Tue, Nov 22, 2022 at 05:42:24PM -0800, Andres Freund wrote:
> The failure has to be happening in wait_for_postmaster_promote(), because the
> standby2 is actually successfully promoted.

That's the one under -fsanitize=address.  It really smells to me like
a bug with a race condition all over it.
--
Michael




Re: Introduce a new view for checkpointer related stats

2022-11-22 Thread Bharath Rupireddy
On Wed, Nov 23, 2022 at 2:23 AM Andres Freund  wrote:
>
> On 2022-11-22 18:08:28 +0530, Bharath Rupireddy wrote:
> >
> >  CREATE VIEW pg_stat_bgwriter AS
> >  SELECT
> > -pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
> > -pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
> > -pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
> > -pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
> > -pg_stat_get_bgwriter_buf_written_checkpoints() AS 
> > buffers_checkpoint,
> >  pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
> >  pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
> > -pg_stat_get_buf_written_backend() AS buffers_backend,
> > -pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
> >  pg_stat_get_buf_alloc() AS buffers_alloc,
> >  pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
>
>
> I think we should consider deprecating the pg_stat_bgwriter columns but
> leaving them in place for a few years. New stuff should only be added to
> pg_stat_checkpointer, but we don't need to break old monitoring queries.

May I know what it means to deprecate the pg_stat_bgwriter columns? Are
you suggesting adding deprecation warnings to the corresponding functions
pg_stat_get_bgwriter_buf_written_clean(),
pg_stat_get_bgwriter_maxwritten_clean(), pg_stat_get_buf_alloc() and
pg_stat_get_bgwriter_stat_reset_time(), and in the docs? And eventually
doing away with the bgwriter stats and the file pgstat_bgwriter.c? Aren't
the bgwriter stats buf_written_clean, maxwritten_clean and buf_alloc
useful?

I think we need to discuss the pg_stat_bgwriter deprecation separately,
independent of the patch here, no?

PS: I noticed some discussion here
https://www.postgresql.org/message-id/20221121003815.qnwlnz2lhkow2e5w%40awork3.anarazel.de,
but I haven't spent enough time on it yet.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-22 Thread Michael Paquier
On Tue, Nov 22, 2022 at 05:20:01PM +0900, Michael Paquier wrote:
> + /* XXX: this should stick to elevel for some cases? */
> + ereport(LOG,
> + (errmsg("skipping missing authentication file \"%s\"",
> + inc_fullname)));
> Should we always issue a LOG here?  In some cases we use an elevel of
> DEBUG3.

And here I think that we need to use elevel.  In hba.c, this would
generate a LOG while hbafuncs.c uses DEBUG3, leading to less noise in
the log files.

> So, what do you think about something like the attached?  I have begun
> looking at the tests, but I don't have enough material for an actual
> review of this part yet.  Note that I have temporarily removed
> process_included_authfile(), as I was looking at all the code branches
> in detail.  The final result ought to include AbsoluteConfigLocation(),
> open_auth_file() and tokenize_auth_file() with an extra missing_ok
> argument, I guess ("strict" as proposed originally is not the usual
> PG-way for ENOENT-ish problems).  process_line should be used only
> when we have no err_msg, meaning that these have been consumed in some
> TokenizedAuthLines already.

This, however, was still brittle in terms of memory handling.
Reloading the server a few hundred times proved that this was leaking
memory in the tokenization.  This becomes quite simple once you switch
to an approach where tokenize_auth_file() uses its own memory context,
while we store all the TokenizedAuthLines in the static context.  It
also occurred to me that we can create and drop the static context
holding the tokens when opening the top-root auth file, aka with a
depth at 0.  It may be a bit simpler to switch to a single-context
approach for all the allocations of tokenize_auth_file(), though I
find the use of a static context much cleaner for the inclusions with
@ files when we expand an existing list.
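To put the idea in code, the shape is roughly the following (purely
illustrative sketch; the parent context and the context name are
placeholders, not necessarily what the patch commits):

    /* when opening the top-root auth file, create the static context */
    if (depth == 0)
        tokenize_context = AllocSetContextCreate(CurrentMemoryContext,
                                                 "tokenize_auth_file",
                                                 ALLOCSET_SMALL_SIZES);

    [...]

    /* and drop it once we are done with the top-root file */
    if (depth == 0)
    {
        MemoryContextDelete(tokenize_context);
        tokenize_context = NULL;
    }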

The patch can be split, again, into more pieces.  Attached 0001
reworks the memory allocation contexts for the tokenization and 0002
is the main feature.  As things are, I am quite happy with the
shapes of 0001 and 0002.  I have tested its robustness quite a bit,
with valgrind for example, to make sure that there are no leaks in the
postmaster (with[out] -DEXEC_BACKEND).  The inclusion logic is
refactored to be in a state close to your previous patches, see
tokenize_inclusion_file().

Note that the tests are failing for some of the Windows CIs, actually,
due to a difference in some of the paths generated vs the file paths
(aka c:\cirrus vs c:/cirrus, as far as I saw).
--
Michael
From 7949bda7adcb378f8355f06325d4fade56077337 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 23 Nov 2022 14:39:07 +0900
Subject: [PATCH v21 1/2] Rework memory contexts in charge of HBA/ident
 tokenization

The list of TokenizedAuthLines generated for the tokens are now stored
in a static context called tokenize_context, where only all the parsed
elements are stored.  This context is stored when opening the top-root
of an authentication file, and is cleaned up once we are done with it
with a new routine called free_auth_file().  One call of
open_auth_file() should have one matching call of free_auth_file().

Rather than having tokenize_auth_file() return a memory context that
includes all the records, this now creates and drops one memory context
each time the function is called.  This will simplify recursive calls to
this routine for the upcoming inclusion record logic.

While on it, rename tokenize_inc_file() to tokenize_expand_file() as
this would conflict with the upcoming patch that will add inclusion
records for HBA/ident files.

Reloading the HBA/ident configuration in a tight loop shows no leaks, for the
one type of test done (with and without -DEXEC_BACKEND).
---
 src/include/libpq/hba.h  |   5 +-
 src/backend/libpq/hba.c  | 152 +++
 src/backend/utils/adt/hbafuncs.c |  12 +--
 3 files changed, 119 insertions(+), 50 deletions(-)

diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index a84a5f0961..90c51ad6fa 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -179,7 +179,8 @@ extern IdentLine *parse_ident_line(TokenizedAuthLine *tok_line, int elevel);
 extern bool pg_isblank(const char c);
 extern FILE *open_auth_file(const char *filename, int elevel, int depth,
 			char **err_msg);
-extern MemoryContext tokenize_auth_file(const char *filename, FILE *file,
-		List **tok_lines, int elevel, int depth);
+extern void free_auth_file(FILE *file, int depth);
+extern void tokenize_auth_file(const char *filename, FILE *file,
+			   List **tok_lines, int elevel, int depth);
 
 #endif			/* HBA_H */
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index abdebeb3f8..9064ad6714 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -76,6 +76,13 @@ typedef struct
 #define token_is_keyword(t, k)	(!t->quoted && strcmp(t->string, k) == 0)
 #define 

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-11-22 Thread Justin Pryzby
Note that 001 fails to compile without 002:

../src/backend/storage/buffer/bufmgr.c:1257:43: error: ‘from_ring’ undeclared 
(first use in this function)
 1257 |   StrategyRejectBuffer(strategy, buf, from_ring))

My "warnings" script informed me about these gripes from MSVC:

[03:42:30.607] c:\cirrus>call sh -c 'if grep ": warning " build.txt; then exit 
1; fi; exit 0' 
[03:42:30.749] c:\cirrus\src\backend\storage\buffer\freelist.c(699) : warning 
C4715: 'IOContextForStrategy': not all control paths return a value
[03:42:30.749] c:\cirrus\src\backend\utils\activity\pgstat_io_ops.c(190) : 
warning C4715: 'pgstat_io_context_desc': not all control paths return a value
[03:42:30.749] c:\cirrus\src\backend\utils\activity\pgstat_io_ops.c(204) : 
warning C4715: 'pgstat_io_object_desc': not all control paths return a value
[03:42:30.749] c:\cirrus\src\backend\utils\activity\pgstat_io_ops.c(226) : 
warning C4715: 'pgstat_io_op_desc': not all control paths return a value
[03:42:30.749] c:\cirrus\src\backend\utils\adt\pgstatfuncs.c(1816) : warning 
C4715: 'pgstat_io_op_get_index': not all control paths return a value

In the docs table, you say things like:
| io_context vacuum refers to the IO operations incurred while vacuuming and 
analyzing.

..but it's a bit unclear (maybe due to the way the docs are rendered).
I think it may be more clear to say "when io_context is
vacuum, ..."

| acquiring the equivalent number of shared buffers

I don't think "equivelent" fits here, since it's actually acquiring a
different number of buffers.

There's a missing period before " The difference is"

The sentence beginning "read plus extended for backend_types" is difficult to
parse due to having a bulleted list in its middle.

There aren't many references to "IOOps", which is good, because I
started to read it as "I oops".

+* Flush IO Operations statistics now. pgstat_report_stat() will flush 
IO
+* Operation stats, however this will not be called after an entire

=> I think that's intended to say *until* after ?

+ * Functions to assert that invalid IO Operation counters are zero.

=> There's a missing newline above this comment.

+   Assert(counters->evictions == 0 && counters->extends == 0 &&
+   counters->fsyncs == 0 && counters->reads == 0 && counters->reuses
+   == 0 && counters->writes == 0);

=> It'd be more readable and also maybe help debugging if these were separate
assertions.  I wondered in the past if that should be a general policy
for all assertions.
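i.e. something along these lines (same checks as the quoted hunk, just one
Assert per counter):

    Assert(counters->evictions == 0);
    Assert(counters->extends == 0);
    Assert(counters->fsyncs == 0);
    Assert(counters->reads == 0);
    Assert(counters->reuses == 0);
    Assert(counters->writes == 0);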

+pgstat_io_op_stats_collected(BackendType bktype)
+{
+   return bktype != B_INVALID && bktype != B_ARCHIVER && bktype != B_LOGGER &&
+   bktype != B_WAL_RECEIVER && bktype != B_WAL_WRITER;

Similar: I'd prefer to see this as 5 "ifs" or a "switch" to return
false, else return true.  But YMMV.
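For example (untested sketch; "static bool" is assumed from context, it is not
in the quoted hunk):

    static bool
    pgstat_io_op_stats_collected(BackendType bktype)
    {
        switch (bktype)
        {
            case B_INVALID:
            case B_ARCHIVER:
            case B_LOGGER:
            case B_WAL_RECEIVER:
            case B_WAL_WRITER:
                return false;
            default:
                return true;
        }
    }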

+* CREATE TEMPORRARY TABLE AS ...

=> typo: temporary

+   if (strategy_io_context  && io_op == IOOP_FSYNC)

=> Extra space.

pgstat_count_io_op() has a superfluous newline before "}".

I think there may be a problem/deficiency with hint bits:

|postgres=# DROP TABLE u2; CREATE TABLE u2 AS SELECT 
generate_series(1,99)a; SELECT pg_stat_reset_shared('io'); explain 
(analyze,buffers) SELECT * FROM u2;
|...
| Seq Scan on u2  (cost=0.00..15708.75 rows=1128375 width=4) (actual 
time=0.111..458.239 rows=99 loops=1)
|   Buffers: shared hit=2048 read=2377 dirtied=2377 written=2345

|postgres=# SELECT COUNT(1), relname, COUNT(1) FILTER(WHERE isdirty) FROM 
pg_buffercache b LEFT JOIN pg_class c ON 
pg_relation_filenode(c.oid)=b.relfilenode GROUP BY 2 ORDER BY 1 DESC LIMIT 11;
| count | relname | count
|---+-+---
| 13619 | | 0
|  2080 | u2  |  2080
|   104 | pg_attribute| 4
|71 | pg_statistic| 1
|51 | pg_class| 1

It says that SELECT caused 2377 buffers to be dirtied, of which 2080 are
associated with the new table in pg_buffercache.

|postgres=# SELECT * FROM pg_stat_io WHERE 
backend_type!~'autovac|archiver|logger|standalone|startup|^wal|background 
worker' or true ORDER BY 2;
|backend_type | io_context  |   io_object   | read | written | extended 
| op_bytes | evicted | reused | files_synced |  stats_reset
|...
| client backend  | bulkread| relation  | 2377 |2345 |  
| 8192 |   0 |   2345 |  | 2022-11-22 22:32:33.044552-06

I think it's a known behavior that hint bits do not use the strategy
ring buffer.  For BAS_BULKREAD, ring_size = 256kB (32 8kB pages), but
there are 2080 dirty pages in the buffercache (~16MB).

But the IO view says that 2345 of the pages were "reused", which seems
misleading to me.  Maybe that just follows from the behavior and the view is
fine.  If the view is fine, maybe this case should still be 

Re: Collation version tracking for macOS

2022-11-22 Thread Thomas Munro
On Tue, Nov 22, 2022 at 7:34 PM Jeff Davis  wrote:
> On Sat, 2022-10-22 at 14:22 +1300, Thomas Munro wrote:
> > Problem 2:  If ICU 67 ever decides to report a different version for
> > a
> > given collation (would it ever do that?  I don't expect so, but ...),
> > we'd be unable to open the collation with the search-by-collversion
> > design, and potentially the database.  What is a user supposed to do
> > then?  Presumably our error/hint for that would be "please insert the
> > correct ICU library into drive A", but now there is no correct
> > library
>
> Let's say that Postgres is compiled against version 67.X, and the
> sysadmin upgrades the ICU package to 67.Y, which reports a different
> collation version for some locale.
>
> Your current patch makes this impossible for the administrator to fix,
> because there's no way to have two different libraries loaded with the
> same major version number, so it will always pick the compiled-in ICU.
> The user will be forced to accept the new version of the collation, see
> WARNINGs in their logs, and possibly corrupt their indexes.

They could probably also 'pin' the older minor version package using
their package manager (= downgrade) until they're ready to upgrade and
use REFRESH VERSION to certify that they've rebuilt everything
relevant or are OK with risks.  Not pretty I admit, but I think the
end result is about the same for search-for-collversion, because I
imagine that (1) the default behaviour on failure to search would
likely be to use the linked library instead and WARN about
[dat]collversion mismatch, so far the same, and (2) the set of people
who would really be prepared to compile their own copy of 67.X instead
of downgrading or REFRESHing (with or without rebuilding) is
vanishingly small.

Two questions I wondered about:

1.  *Do* they change ucol_getVersion() values in minor releases?  I
tried to find a written policy on that.
https://icu.unicode.org/processes is not encouraging: it gives the
example of a "third digit in an official release number" [changing]
because a CLDR change was incorporated.  Hrmph.  But that's clearly
not even the modern ICU versioning system (it made a change a bit like
ours in 49, making the first number only major, so maybe that "third"
number is now the second number, AKA minor version), and also that's a
CLDR minor version change; is CLDR minor even in the recipe for
ucol_getVersion()?  Even without data changes, I guess that bug fixes
could apply to the UCA logic, and I assume that UCA logic is included
in it.  Hmm.

A non-hypothetical example of a CLDR change within an ICU major
version that I've been able to find is:

https://cldr.unicode.org/index/downloads/cldr-38

Here we see that CLDR had a minor version bump 38 -> 38.1, "a very
small number of incremental additions to version 38 to address the
specific bugs listed in Δ38.1", and was included in ICU 68.2.  Being a
minor ICU release 68.1 -> 68.2, perhaps you could finish up running
that just with a regular upgrade on typical distros (not a major OS
upgrade), and since PostgreSQL would normally be linked against eg
.68, not .68.1, it'd start using it at the next cluster start when
that symlink is updated to point to .68.2.  As it happens, if you
follow the documentation links to see what actually changed in that
particular pair of CLDR+ICU minor releases, it's timezones and locale
stuff other than collations, so wouldn't affect us.  Can we find a
chapter and verse that says that ICU would only ever move to a new
CLDR in a minor release, and CLDR would never change order of
pre-existing code points in a minor release?

It might be interesting to see if
https://github.com/unicode-org/icu/tree/release-68-1 and
https://github.com/unicode-org/icu/tree/release-68-2 report a
different ucol_getVersion() for any locale, but not conclusive if it
doesn't; it might be because something in the version pipeline knew
that particular CLDR change didn't affect collators...

This speculation feels pretty useless.  Maybe we should go and read
the code or ask an ICU expert, but I'm not against making it
theoretically possible to access two different minor versions at once,
just to cover all the bases for future-proofing.

2.  Would package managers ever allow two minor versions to be
installed at once?  I highly doubt it; they're probably more
interested in ABI stability so that dependent packages work when
bugfixes are shipped, and that's certainly nailed down at the major
version level.  It'd probably be a case of having to compile it
yourself, which seems unlikely to me in the real world.  That's why I
left minor version out of earlier patches, but I'm OK with changing
that.

As for how, I think that depends on our modelling decision (see below).

> Search-by-collversion would still be frustrating for the admin, but at
> least it would be possible to fix by compiling their own 67.X and
> asking Postgres to search that library, too. We could make it slightly
> more friendly by 

Re: Non-decimal integer literals

2022-11-22 Thread John Naylor
On Tue, Nov 22, 2022 at 8:36 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:
>
> On 15.11.22 11:31, Peter Eisentraut wrote:
> > On 14.11.22 08:25, John Naylor wrote:
> >> Regarding the patch, it looks good overall. My only suggestion would
> >> be to add a regression test for just below and just above overflow, at
> >> least for int2.
> >
> > ok
>
> This was a valuable suggestion, because this found some breakage.  In
> particular, the handling of grammar-level literals that overflow to
> "Float" was not correct.  (The radix prefix was simply stripped and
> forgotten.)  So I added a bunch more tests for this.  Here is a new patch.

Looks good to me.

--
John Naylor
EDB: http://www.enterprisedb.com


RE: Ability to reference other extensions by schema in extension scripts

2022-11-22 Thread Regina Obe
> 
> "Regina Obe"  writes:
> >> I have a distinct sense of deja vu here.  I think this idea, or
> >> something isomorphic to it, was previously discussed with some other
> syntax details.
> 
> > I found the old discussion I recalled having and Stephen had suggested
> > using @extschema{'postgis'}@ On this thread --
> > https://www.postgresql.org/message-id/20160425232251.GR10850@tamriel.snowman.net
> > Is that the one you remember?
> 
> Hmmm ... no, ISTM it was considerably more recent than that.
> [ ...digs... ]  Here we go, it was in the discussion around converting
> contrib SQL functions to new-style:
> 
> https://www.postgresql.org/message-id/flat/3395418.1618352794%40sss.pgh.pa.us
> 
> There are a few different ideas bandied around in there.
> Personally I still like the @extschema:extensionname@ option the best,
> though.
> 
>   regards, tom lane

Here is the first version of my patch using the @extschema:extensionname@ syntax
you proposed.

This patch includes:
1) Changes to replace references of @extschema:extensionname@ with the
schema of the required extension
2) Documentation for the feature
3) Tests for the feature.

There is one issue I thought about that is not addressed by this.

If an extension is required by another extension and that required extension's
schema is referenced in the extension scripts using the
@extschema:extensionname@ syntax, then ideally we should prevent the
required extension from being relocatable.  This would prevent a user from
accidentally moving the required extension, thus breaking the dependent
extensions.

I didn't add that feature because I wasn't sure if it was overstepping the
bounds of what should be done, or if we should leave it up to the user to just
know better.

Thanks,
Regina


0001-Allow-use-of-extschema-reqextname-to-reference.patch
Description: Binary data


Re: [PoC] configurable out of disk space elog level

2022-11-22 Thread Greg Stark
On Thu, 17 Nov 2022 at 14:56, Robert Haas  wrote:
>
> Having a switch for one particular kind of error (out of many that
> could possibly occur) that triggers one particular coping strategy
> (out of many that could possibly be used) seems far too specific a
> thing to add as a core feature. And even if we had something much more
> general, I'm not sure why that should go into the database rather than
> being implemented outside it. After all, nothing at all prevents the
> user from scanning the database logs for "out of space" errors and
> shutting down the database if any are found. While you're at it, you
> could make your monitoring script also check the free space on the
> relevant partition using statfs() and page somebody if the utilization
> goes above 95% or whatever threshold you like, which would probably
> avoid service outages much more effectively than $SUBJECT.

I have often thought we report a lot of errors to the user as
transaction errors that database admins are often going to feel they
would rather treat as system-wide errors. Often the error the user
sees seems like a very low-level error with no context, which they can't
do anything about. This seems to be an example of that.

I don't really have a good solution for it, but I do think most users
would rather deal with these errors at a higher level than individual
queries from individual clients. Out of disk space, hardware errors,
out of memory, etc., they would rather handle in one centralized place
as a global condition. You can work around that with
middleware/libraries/monitoring, but it's kind of working around the
database giving you the information at the wrong time and place for
your needs.




Re: Logical Replication Custom Column Expression

2022-11-22 Thread Amit Kapila
On Wed, Nov 23, 2022 at 1:40 AM Stavros Koureas
 wrote:
>
> Reading more carefully what you described, I think you are interested in 
> getting something you call origin from publishers, probably some metadata 
> from the publications.
>
> This identifier in those metadata maybe does not have business value on the 
> reporting side. The idea is to use a value which has specific meaning to the 
> user at the end.
>
> For example assigning 1 for tenant 1, 2 for tenant 2 and so on, at the end
> based on a dimension table which holds this mapping the user would be able to 
> filter the data. So programmatically the user can set the id value of the 
> column plus creating the mapping table from an application let’s say and be 
> able to distinguish the data.
>

In your example, do different tenants represent different publisher
nodes? If so, why can't we have a predefined column and value for the
required tables on each publisher rather than having logical replication
generate that value while replicating data?

-- 
With Regards,
Amit Kapila.




Re: Logical Replication Custom Column Expression

2022-11-22 Thread Amit Kapila
On Tue, Nov 22, 2022 at 6:22 PM Stavros Koureas
 wrote:
>
> Sure, this can be implemented as a subscription option, and it will cover 
> this use case scenario as each subscriber points only to one database.
> I also have some more analytical/reporting use-cases which need additions in 
> logical-replication, I am not sure if I need to open different discussions 
> for each one, all ideas are for publication/subscription.
>

I think to some extent it depends on how unique each idea is, but
initially you may want to post here and then we can spin off different
threads for discussion if required. Are you interested in working on
one or more of those ideas to make them a reality, or do you want others
to pick them up based on their interest?

-- 
With Regards,
Amit Kapila.




Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn

2022-11-22 Thread Amit Kapila
On Tue, Nov 22, 2022 at 10:33 PM Maxim Orlov  wrote:
>>
>>
>> Regarding the tests, the patch includes a new scenario to
>> reproduce this issue. However, since the issue can be reproduced also
>> by the existing scenario (with low probability, though), I'm not sure
>> it's worth adding the new scenario.
>
> AFAICS, the test added doesn't 100% reproduce this issue, so maybe it is
> not worth it. But I do not have a strong opinion here.
> Let's add tests in a separate commit and let the actual committer decide
> what to do, should we?
>

+1 to not have a test for this as the scenario can already be tested
by the existing set of tests.

-- 
With Regards,
Amit Kapila.




Re: ssl tests aren't concurrency safe due to get_free_port()

2022-11-22 Thread Andrew Dunstan


> On Nov 22, 2022, at 8:36 PM, Tom Lane  wrote:
> 
> Andres Freund  writes:
>> While looking into a weird buildfarm failure ([1]), I noticed this:
> 
>> # Checking port 62707
>> Use of uninitialized value $pid in scalar chomp at 
>> /mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
>>  line 1247.
>> Use of uninitialized value $pid in addition (+) at 
>> /mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
>>  line 1248.
> 
> Yeah, my animals are showing that too.
> 
>> Not quite sure how $pid ends up uninitialized, given the code:
>># see if someone else has or had a reservation of this port
>>my $pid = <$portfile>;
>>chomp $pid;
>>if ($pid +0 > 0)
> 
> I guess the <$portfile> might return undef if the file is empty?
> 

Probably, will fix in the morning 

Cheers

Andrew



Re: Logical replication missing information

2022-11-22 Thread Peter Smith
On Fri, Nov 18, 2022 at 4:50 AM PG Doc comments form
 wrote:
>
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/15/logical-replication-row-filter.html
> Description:

Hi,

FYI - I have forwarded this post to the hacker's list, where I think
it will receive more attention.

I am not sure why that (above) page was cited -- the section "31.3 Row
Filters" is specifically about row filtering, whereas the items you
reported seem unrelated to row filters, but are generic for all
Logical Replication.

>
> There are several things missing here and some of them I found to be highly
> important:
> 1. How can I find why a logical replication failed. Currently I only can see
> it "does nothing" in pg_stat_subscriptions.

There should be logs reporting any replication conflicts etc. See [1]
for example logs. See also the answer for #2 below.

> 2. In case of copying the existing data: how can I find which tables or
> partitions were processed and which are on the processing queue (while
> monitoring I have observed no specific order or rule).

There is no predictable processing queue or order - The initial
tablesyncs might be happening in multiple asynchronous processes
according to the GUC max_sync_workers_per_subscription [2].

Below I show examples of replicating two tables (tab1 and tab2).

~~

From the logs you should see which table syncs have completed OK:

e.g. (the initial copy is all good)
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2022-11-23 12:23:18.501 AEDT [27961] LOG:  logical
replication apply worker for subscription "sub1" has started
2022-11-23 12:23:18.513 AEDT [27963] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
started
2022-11-23 12:23:18.524 AEDT [27965] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:23:18.593 AEDT [27963] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
finished
2022-11-23 12:23:18.611 AEDT [27965] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
finished

e.g. (where there is conflict in table tab2)
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2022-11-23 12:40:56.794 AEDT [23401] LOG:  logical
replication apply worker for subscription "sub1" has started
2022-11-23 12:40:56.808 AEDT [23403] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
started
2022-11-23 12:40:56.819 AEDT [23405] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:40:56.890 AEDT [23405] ERROR:  duplicate key value
violates unique constraint "tab2_pkey"
2022-11-23 12:40:56.890 AEDT [23405] DETAIL:  Key (id)=(1) already exists.
2022-11-23 12:40:56.890 AEDT [23405] CONTEXT:  COPY tab2, line 1
2022-11-23 12:40:56.891 AEDT [3233] LOG:  background worker "logical
replication worker" (PID 23405) exited with exit code 1
2022-11-23 12:40:56.902 AEDT [23403] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
finished
...

~~

Alternatively, you can use some SQL query to discover which tables of
the subscription had attained a READY state. The READY state (denoted
by 'r') means that the initial COPY was completed ok. The table
replication state is found in the 'srsubstate' column. See [3]

e.g. (the initial copy is all good)
test_sub=# select
sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from
pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s
where ut.relid = sr.srrelid and s.oid=sr.srsubid;
 srsubid | srrelid | subname | relname | srsubstate
-+-+-+-+
   16418 |   16409 | sub1| tab1| r
   16418 |   16402 | sub1| tab2| r
(2 rows)

e.g. (where it has a conflict in table tab2, so it did not get to READY state)
test_sub=# select
sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from
pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s
where ut.relid = sr.srrelid and s.oid=sr.srsubid;2022-11-23
12:41:37.686 AEDT [24501] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:41:37.774 AEDT [24501] ERROR:  duplicate key value
violates unique constraint "tab2_pkey"
2022-11-23 12:41:37.774 AEDT [24501] DETAIL:  Key (id)=(1) already exists.
2022-11-23 12:41:37.774 AEDT [24501] CONTEXT:  COPY tab2, line 1
2022-11-23 12:41:37.775 AEDT [3233] LOG:  background worker "logical
replication worker" (PID 24501) exited with exit code 1

 srsubid | srrelid | 

odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2022-11-22 Thread Andres Freund
Hi,

My buildfarm animal grassquit just showed an odd failure [1] in REL_11_STABLE:

ok 10 - standby is in recovery
# Running: pg_ctl -D 
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/src/bin/pg_ctl/tmp_check/t_003_promote_standby2_data/pgdata
 promote
waiting for server to promotepg_ctl: control file appears to be corrupt
not ok 11 - pg_ctl promote of standby runs

#   Failed test 'pg_ctl promote of standby runs'
#   at 
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/TestLib.pm
 line 474.


I didn't find other references to this kind of failure. Nor has the error
re-occurred on grassquit.


I don't immediately see a way for this message to be hit that doesn't indicate
a bug somewhere. We should be updating the control file in an atomic way and
reading it in an atomic way.


The failure has to be happening in wait_for_postmaster_promote(), because the
standby2 is actually successfully promoted.

Greetings,

Andres Freund

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit=2022-11-22%2016%3A33%3A57




Re: ssl tests aren't concurrency safe due to get_free_port()

2022-11-22 Thread Tom Lane
Andres Freund  writes:
> While looking into a weird buildfarm failure ([1]), I noticed this:

> # Checking port 62707
> Use of uninitialized value $pid in scalar chomp at 
> /mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
>  line 1247.
> Use of uninitialized value $pid in addition (+) at 
> /mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
>  line 1248.

Yeah, my animals are showing that too.

> Not quite sure how $pid ends up uninitialized, given the code:
>   # see if someone else has or had a reservation of this port
>   my $pid = <$portfile>;
>   chomp $pid;
>   if ($pid +0 > 0)

I guess the <$portfile> might return undef if the file is empty?

regards, tom lane




Re: Prefetch the next tuple's memory during seqscans

2022-11-22 Thread John Naylor
On Wed, Nov 23, 2022 at 5:00 AM David Rowley  wrote:
>
> On Thu, 3 Nov 2022 at 22:09, John Naylor 
wrote:
> > I tried a similar test, but with text fields of random length, and
there is improvement here:
>
> Thank you for testing that. Can you share which CPU this was on?

That was an Intel Core i7-10750H

--
John Naylor
EDB: http://www.enterprisedb.com


Re: ssl tests aren't concurrency safe due to get_free_port()

2022-11-22 Thread Andres Freund
Hi,

On 2022-11-22 10:57:41 -0500, Andrew Dunstan wrote:
> On 2022-11-20 Su 14:05, Andres Freund wrote:
> >> If it works ok I will backpatch in couple of days.
> > +1
> Done.

While looking into a weird buildfarm failure ([1]), I noticed this:

# Checking port 62707
Use of uninitialized value $pid in scalar chomp at 
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
 line 1247.
Use of uninitialized value $pid in addition (+) at 
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
 line 1248.

This isn't related to the failure afaics. I think it's happening for all runs on
all branches on my host. And also on a few other animals [2].

Not quite sure how $pid ends up uninitialized, given the code:
# see if someone else has or had a reservation of this port
my $pid = <$portfile>;
chomp $pid;
if ($pid +0 > 0)

Greetings,

Andres Freund


[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit=2022-11-22%2016%3A33%3A57

The main symptom is
# Running: pg_ctl -D 
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/src/bin/pg_ctl/tmp_check/t_003_promote_standby2_data/pgdata
 promote
waiting for server to promote
pg_ctl: control file appears to be corrupt

[2] 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=peripatus=2022-11-23%2000%3A20%3A13=pg_ctl-check




Re: More efficient build farm animal wakeup?

2022-11-22 Thread Andres Freund
Hi,

On 2022-11-22 17:35:12 -0500, Andrew Dunstan wrote:
> The server side appears to be working well.
> 
> The new client side code is being tested on crake and working fine - the
> all-up-to-date case takes just a second or two, almost all of which is
> taken with getting the json file from the server. No git calls at all
> are done on the client in this case.

It's a huge improvement here. I start one set of animals via systemd timers
and updated that buildfarm client to 24a6bb0 (because it shows the cpu/network
resources used):

Before:
Nov 21 20:36:01 bf-valgrind-v4 systemd[1]: Starting PG buildfarm spinlock...
...
Nov 21 20:36:14 bf-valgrind-v4 systemd[1]: bf@spinlock.service: Consumed 2.346s 
CPU time, received 578.3K IP traffic, sent 32.3K IP traffic.

Now:

Nov 23 00:59:25 bf-valgrind-v4 systemd[1]: Starting PG buildfarm spinlock...
...
Nov 23 00:59:26 bf-valgrind-v4 systemd[1]: bf@spinlock.service: Consumed 173ms 
CPU time, received 5.2K IP traffic, sent 1.8K IP traffic.

Both of these are for builds that didn't do anything.


Leaving wall clock time and resource usage aside, the output is also just a
lot more readable:

Nov 21 20:36:02 bf-valgrind-v4 run_branches.pl[1188989]: Mon Nov 21 20:36:02 
2022: buildfarm run for francolin:REL_11_STABLE starting
Nov 21 20:36:02 bf-valgrind-v4 run_branches.pl[1188989]: 
francolin:REL_11_STABLE [20:36:02] checking out source ...
Nov 21 20:36:04 bf-valgrind-v4 run_branches.pl[1188989]: 
francolin:REL_11_STABLE [20:36:04] checking if build run needed ...
Nov 21 20:36:04 bf-valgrind-v4 run_branches.pl[1188989]: 
francolin:REL_11_STABLE [20:36:04] No build required: last status = Sat Nov 19 
21:56:55 2022 GMT, current snapshot = Sat Nov 19 20:36:52 2022 GMT, changed 
files = 0
Nov 21 20:36:04 bf-valgrind-v4 run_branches.pl[1189119]: Mon Nov 21 20:36:04 
2022: buildfarm run for francolin:REL_12_STABLE starting
Nov 21 20:36:04 bf-valgrind-v4 run_branches.pl[1189119]: 
francolin:REL_12_STABLE [20:36:04] checking out source ...
Nov 21 20:36:06 bf-valgrind-v4 run_branches.pl[1189119]: 
francolin:REL_12_STABLE [20:36:06] checking if build run needed ...
Nov 21 20:36:06 bf-valgrind-v4 run_branches.pl[1189119]: 
francolin:REL_12_STABLE [20:36:06] No build required: last status = Sat Nov 19 
22:52:54 2022 GMT, current snapshot = Sat Nov 19 20:36:48 2022 GMT, changed 
files = 0
Nov 21 20:36:06 bf-valgrind-v4 run_branches.pl[1189233]: Mon Nov 21 20:36:06 
2022: buildfarm run for francolin:REL_13_STABLE starting
Nov 21 20:36:06 bf-valgrind-v4 run_branches.pl[1189233]: 
francolin:REL_13_STABLE [20:36:06] checking out source ...
Nov 21 20:36:08 bf-valgrind-v4 run_branches.pl[1189233]: 
francolin:REL_13_STABLE [20:36:08] checking if build run needed ...
Nov 21 20:36:08 bf-valgrind-v4 run_branches.pl[1189233]: 
francolin:REL_13_STABLE [20:36:08] No build required: last status = Sat Nov 19 
23:12:55 2022 GMT, current snapshot = Sat Nov 19 20:36:33 2022 GMT, changed 
files = 0
Nov 21 20:36:08 bf-valgrind-v4 run_branches.pl[1189298]: Mon Nov 21 20:36:08 
2022: buildfarm run for francolin:REL_14_STABLE starting
Nov 21 20:36:08 bf-valgrind-v4 run_branches.pl[1189298]: 
francolin:REL_14_STABLE [20:36:08] checking out source ...
Nov 21 20:36:10 bf-valgrind-v4 run_branches.pl[1189298]: 
francolin:REL_14_STABLE [20:36:10] checking if build run needed ...
Nov 21 20:36:10 bf-valgrind-v4 run_branches.pl[1189298]: 
francolin:REL_14_STABLE [20:36:10] No build required: last status = Mon Nov 21 
15:51:38 2022 GMT, current snapshot = Mon Nov 21 15:50:50 2022 GMT, changed 
files = 0
Nov 21 20:36:10 bf-valgrind-v4 run_branches.pl[1189364]: Mon Nov 21 20:36:10 
2022: buildfarm run for francolin:REL_15_STABLE starting
Nov 21 20:36:10 bf-valgrind-v4 run_branches.pl[1189364]: 
francolin:REL_15_STABLE [20:36:10] checking out source ...
Nov 21 20:36:12 bf-valgrind-v4 run_branches.pl[1189364]: 
francolin:REL_15_STABLE [20:36:12] checking if build run needed ...
Nov 21 20:36:12 bf-valgrind-v4 run_branches.pl[1189364]: 
francolin:REL_15_STABLE [20:36:12] No build required: last status = Mon Nov 21 
16:26:28 2022 GMT, current snapshot = Mon Nov 21 15:50:50 2022 GMT, changed 
files = 0
Nov 21 20:36:12 bf-valgrind-v4 run_branches.pl[1189432]: Mon Nov 21 20:36:12 
2022: buildfarm run for francolin:HEAD starting
Nov 21 20:36:12 bf-valgrind-v4 run_branches.pl[1189432]: francolin:HEAD 
 [20:36:12] checking out source ...
Nov 21 20:36:14 bf-valgrind-v4 run_branches.pl[1189432]: francolin:HEAD 
 [20:36:14] checking if build run needed ...
Nov 21 20:36:14 bf-valgrind-v4 run_branches.pl[1189432]: francolin:HEAD 
 [20:36:14] No build required: last status = Mon Nov 21 17:31:31 2022 GMT, 
current snapshot = Mon Nov 21 16:59:29 2022 GMT, changed files = 0

vs

Nov 23 00:59:26 bf-valgrind-v4 run_branches.pl[4125973]: Wed Nov 23 00:59:26 
2022: francolin:REL_11_STABLE is up to date.
Nov 23 00:59:26 bf-valgrind-v4 run_branches.pl[4125973]: Wed Nov 23 00:59:26 
2022: francolin:REL_12_STABLE is up 

Re: TAP output format in pg_regress

2022-11-22 Thread Andres Freund
Hi,


On 2022-11-22 23:17:44 +0100, Daniel Gustafsson wrote:
> The attached v10 attempts to address the points raised above.  Notes and
> diagnostics are printed to stdout/stderr respectively and the TAP emitter is
> changed to move more of the syntax into it making it less painful on the
> translators.

Thanks! I played a bit with it locally and it's nice.

I think it might be worth adding a few more details to the output on stderr,
e.g. a condensed list of failed tests, as that's then printed in the errorlogs
summary in meson after running all tests. As we don't do that in the current
output, that seems more like an optimization for later.


My compiler complains with:

[6/80 7   7%] Compiling C object src/test/regress/pg_regress.p/pg_regress.c.o
../../../../home/andres/src/postgresql/src/test/regress/pg_regress.c: In 
function ‘emit_tap_output_v’:
../../../../home/andres/src/postgresql/src/test/regress/pg_regress.c:354:9: 
warning: function ‘emit_tap_output_v’ might be a candidate for ‘gnu_printf’ 
format attribute [-Wsuggest-attribute=format]
  354 | vfprintf(fp, fmt, argp);
  | ^~~~
../../../../home/andres/src/postgresql/src/test/regress/pg_regress.c:356:17: 
warning: function ‘emit_tap_output_v’ might be a candidate for ‘gnu_printf’ 
format attribute [-Wsuggest-attribute=format]
  356 | vfprintf(logfile, fmt, argp_logfile);
  | ^~~~

Which seems justified and easily remedied via pg_attribute_printf().
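
For reference, the usual shape of that fix is to put the attribute on the
function's forward declaration; a minimal sketch (the declaration below is a
reconstruction, not taken from the patch):

/*
 * Argument 2 is the printf-style format string; 0 is used for the variadic
 * argument position because the arguments arrive as a va_list.  This is what
 * silences -Wsuggest-attribute=format.
 */
static void emit_tap_output_v(TAPtype type, const char *fmt, va_list argp)
			pg_attribute_printf(2, 0);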


I think there might be something slightly wrong with 'tags' - understandable,
since there's basically no comment explaining what it's supposed to do. I
added an intentional failure to 'show.pgc', which then yielded the following
tap output:
ok 51sql/quote  15 ms
not ok 52sql/show9 ms
# tags: stdout source ok 53sql/insupd 
11 ms
ok 54sql/parser 13 ms

which then subsequently leads meson to complain that
TAP parsing error: out of order test numbers
TAP parsing error: Too few tests run (expected 62, got 61)

Looks like all that's needed is a \n after "tags: %s"


I think the patch is also missing a \n in log_child_failure().

If I kill postgres during a test I get:

# parallel group (12 tests):  regex tstypes geometry type_sanity misc_sanity 
horology expressions unicode mvcc opr_sanity comments xid
not ok 43   geometry 8 ms
# (test process exited with exit code 2)not ok 44   horology
 9 ms
# (test process exited with exit code 2)not ok 45   tstypes 
 7 ms
# (test process exited with exit code 2)not ok 46   regex   
 7 ms




> Subject: [PATCH v10 2/2] Experimental: meson: treat regress tests as TAP

I'd just squash that in I think. Isn't really experimental either IMO ;)


> +static void
> +emit_tap_output_v(TAPtype type, const char *fmt, va_list argp)
> +{
> + va_list argp_logfile;
> + FILE   *fp = stdout;
> +
> + /* We only need to copy the arg array in case we actually need it */
>   if (logfile)
> - fprintf(logfile, "\n");
> + va_copy(argp_logfile, argp);
> +
> + /*
> +  * Diagnostic output will be hidden by prove unless printed to stderr.
> +  * The Bail message is also printed to stderr to aid debugging under a
> +  * harness which might otherwise not emit such an important message.
> +  */
> + if (type == DIAG || type == BAIL)
> + fp = stderr;

Personally I'd move the initialization of fp to an else branch here, but
that's a very minor taste thing.
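
Something like this, just to make the suggestion concrete (illustrative only):

	FILE	   *fp;

	/*
	 * Diagnostics and Bail messages must go to stderr to be visible under
	 * prove; everything else goes to stdout.
	 */
	if (type == DIAG || type == BAIL)
		fp = stderr;
	else
		fp = stdout;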


> + /*
> +  * Non-protocol output such as diagnostics or notes must be prefixed by
> +  * a '#' character. We print the Bail message like this too.
> +  */
> + if (type == NOTE || type == DIAG || type == BAIL)
> + {
> + fprintf(fp, "# ");
> + if (logfile)
> + fprintf(logfile, "# ");
> + }
> + vfprintf(fp, fmt, argp);
> + if (logfile)
> + vfprintf(logfile, fmt, argp_logfile);
> + /*
> +  * If this was a Bail message, the bail protocol message must go to
> +  * stdout separately.
> +  */
> + if (type == BAIL)
> + {
> + fprintf(stdout, "Bail Out!");
> + if (logfile)
> + fprintf(logfile, "Bail Out!");

I think this needs a \n.


> + }
> +
> + fflush(NULL);
>  }

I was wondering whether it's worth having a printf wrapper that does the
vfprintf(); if (logfile) vfprintf() dance. But it's probably not.


> @@ -1089,9 +1201,8 @@ spawn_process(const char *cmdline)
>  
>   cmdline2 = psprintf("exec %s", cmdline);
>   execl(shellprog, shellprog, "-c", cmdline2, (char *) NULL);
> - fprintf(stderr, _("%s: 

Re: More efficient build farm animal wakeup?

2022-11-22 Thread Tom Lane
Andrew Dunstan  writes:
> The new client side code is being tested on crake and working fine - the
> all-up-to-date case takes just a second or two, almost all of which is
> taken with getting the json file from the server. No git calls at all
> are done on the client in this case.

Nice!  I installed the new run_branches.pl file on sifaka, and it seems to
be doing the right things.  With the much lower overhead, I've reduced
that cronjob's cycle time from five minutes to one, so that machine's
response time should be even better.

It'll probably help my slower animals even more, so I'm off to
update them as well.

regards, tom lane




Re: Logical Replication Custom Column Expression

2022-11-22 Thread Peter Smith
On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas
 wrote:
>
> Reading more carefully what you described, I think you are interested in 
> getting something you call origin from publishers, probably some metadata 
> from the publications.
>
> This identifier in those metadata maybe does not have business value on the 
> reporting side. The idea is to use a value which has specific meaning to the 
> user at the end.
>
> For example assigning 1 for tenant 1, 2 for tenant 2 and so on, at the end 
> based on a dimension table which holds this mapping the user would be able to 
> filter the data. So programmatically the user can set the id value of the 
> column plus creating the mapping table from an application let’s say and be 
> able to distinguish the data.
>
> In addition this column should have the ability to be part of the primary key 
> on the subscription table in order to not conflict with lines from other 
> tenants having the same keys.
>
>

I was wondering if a simpler syntax solution might also work here.

Imagine another SUBSCRIPTION parameter that indicates to write the
*name* of the subscription to some pre-defined table column:
e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1
CONNECTION '...' WITH (subscription_column);

Logical Replication already allows the subscriber table to have extra
columns, so you just need to manually create the extra 'subscription'
column up-front.

Then...

~~

On Publisher:

test_pub=# CREATE TABLE tab(id int primary key, description varchar);
CREATE TABLE

test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three');
INSERT 0 3

test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES;
CREATE PUBLICATION

~~

On Subscriber:

test_sub=# CREATE TABLE tab(id int, description varchar, subscription varchar);
CREATE TABLE

test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column);
CREATE SUBSCRIPTION

test_sub=# SELECT * FROM tab;
 id | description | subscription
----+-------------+---------------
  1 | one         | sub_tenant1
  2 | two         | sub_tenant1
  3 | three       | sub_tenant1
(3 rows)

~~

Subscriptions to different tenants would be named differently.

And using other SQL you can map/filter those names however your
application wants.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Make ON_ERROR_STOP stop on shell script failure

2022-11-22 Thread Matheus Alcantara
--- Original Message ---
On Tuesday, November 22nd, 2022 at 20:10, bt22nakamorit 
 wrote:


> There was a mistake in the error message for \! so I updated the patch.
> 
> Best,
> Tatsuhiro Nakamori

Hi

I was checking your patch and it seems that it failed to apply cleanly onto
master. Could you please rebase it?

--
Matheus Alcantara




Re: CI and test improvements

2022-11-22 Thread Justin Pryzby
On Mon, Nov 21, 2022 at 02:45:42PM -0800, Andres Freund wrote:
> > > > +ninja -C build |tee build/meson-logs/build.txt
> > > > +REM Since pipes lose exit status of the preceding command, rerun 
> > > > compilation,
> > > > +REM without the pipe exiting now if it fails, rather than trying 
> > > > to run checks
> > > > +ninja -C build > nul
> > > 
> > > This seems mighty grotty :(. but I guess it's quick enough not worry 
> > > about,
> > > and I can't come up with a better plan.
> > > 
> > > It doesn't seem quite right to redirect into meson-logs/ to me, my
> > > interpretation is that that's "meson's namespace".  Why not just store it 
> > > in
> > > build/?
> > 
> > I put it there so it'd be included with the build artifacts.
> 
> Wouldn't just naming it build-warnings.log suffice? I don't think we
> want to actually upload build.txt - it already is captured.

Originally, I wanted the input and the output to be available as files
and not just in cirrus' web GUI, but maybe that's not important anymore.
I rewrote it again.

> > I'm not sure if this ought to be combined with/before/after your "move
> > compilerwarnings task to meson" patch?  (Regarding that patch: I
> > mentioned that it shouldn't use ccache -C, and it should use
> > meson_log_artifacts.)
> 
> TBH, I'm not quite sure a separate docs task does really still make
> sense after the SanityCheck task. It's worth building the docs even if
> some flappy test fails, but I don't think we should try to build the
> docs if the code doesn't even compile, in all likelihood a lot more is
> wrong in that case.

It'd be okay either way.  I had split it out to 1) isolate the changes
in the "upload changed docs as artifacts" patch; and, 2) so the docs
artifacts are visible in a cfbot link called "Documentation"; and, 3) so
the docs task runs without a dependency on "Linux", since (as you said)
docs/errors are worth showing/reviewing/reporting/addressing separately
from test errors (perhaps similar to compiler warnings...).

I shuffled my branch around and sending now the current "docs" patches,
but I suppose this is waiting on the "convert CompilerWarnings task to
meson" patch.

-- 
Justin
>From e7eb22b85dcfe503810ea0f89f48833f04f80d3f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 25 May 2022 21:53:22 -0500
Subject: [PATCH 01/10] cirrus/windows: add compiler_warnings_script

I'm not sure how to write this test in windows shell; it's also easy to
write something that doesn't work in posix sh, since windows shell is
interpretting && and ||...

https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de

See also:
8a1ce5e54f6d144e4f8e19af7c767b026ee0c956
https://cirrus-ci.com/task/6241060062494720
https://cirrus-ci.com/task/6496366607204352

ci-os-only: windows
---
 .cirrus.yml | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index f3192ef..6ce4f393e29 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -547,12 +547,21 @@ task:
 
   build_script: |
 vcvarsall x64
-ninja -C build
+ninja -C build |tee build.txt
+REM Since pipes lose the exit status of the preceding command, rerun the compilation
+REM without the pipe, exiting now if it fails, to avoid trying to run checks
+ninja -C build > nul
 
   check_world_script: |
 vcvarsall x64
 meson test %MTEST_ARGS% --num-processes %TEST_JOBS%
 
+  # This should be last, so check_world is run even if there are warnings
+  always:
+compiler_warnings_script:
+  # this avoids using metachars which would be interpretted by the windows shell
+  - sh -c 'if grep ": warning " build.txt; then exit 1; fi; exit 0'
+
   on_failure:
 <<: *on_failure_meson
 crashlog_artifacts:
-- 
2.25.1

>From 79586f7ac46ce48648b150b6ad08987578c730f5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 11 Nov 2022 21:14:39 -0600
Subject: [PATCH 02/10] cirrus/macos: switch to "macos_instance" / M1..

This uses an M1 ARM processor rather than intel.

See:

https://cirrus-ci.org/blog/2022/11/08/sunsetting-intel-macos-instances/
https://github.com/justinpryzby/postgres/runs/9446478561
| warning We are sunsetting support for Intel-based macOS instances! Please migrate before January 1st 2023.

ci-os-only: macos
---
 .cirrus.yml | 10 ++
 src/test/kerberos/t/001_auth.pl | 14 --
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 6ce4f393e29..ed05c4adf87 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -391,7 +391,7 @@ task:
   name: macOS - Monterey - Meson
 
   env:
-CPUS: 12 # always get that much for cirrusci macOS instances
+CPUS: 4 # always get that much for cirrusci macOS instances
 BUILD_JOBS: $CPUS
 # Test performance regresses noticably when using all cores. 8 seems to
 # work OK. See
@@ -412,8 +412,10 @@ task:
   depends_on: SanityCheck
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ 

Re: Allow placeholders in ALTER ROLE w/o superuser

2022-11-22 Thread Steve Chavez
Hey Alexander,

Looks like your latest patch addresses the original issue I posted!

So now I can create a placeholder with the USERSET modifier without a
superuser, while non-USERSET placeholders still require superuser:

```sql
create role foo noinherit;
set role to foo;

alter role foo set prefix.bar to true user set;
ALTER ROLE

alter role foo set prefix.baz to true;
ERROR:  permission denied to set parameter "prefix.baz"

set role to postgres;
alter role foo set prefix.baz to true;
ALTER ROLE
```

Also USERSET gucs are marked(`(u)`) on `pg_db_role_setting`:

```sql
select * from pg_db_role_setting ;
 setdatabase | setrole |               setconfig
-------------+---------+--------------------------------------
           0 |   16384 | {prefix.bar(u)=true,prefix.baz=true}
```

Which I guess avoids the need for adding columns to `pg_catalog` and makes
the "fix" simpler.

So from my side this all looks good!

Best regards,
Steve

On Sun, 20 Nov 2022 at 12:50, Alexander Korotkov 
wrote:

> .On Sun, Nov 20, 2022 at 8:48 PM Alexander Korotkov
>  wrote:
> > I've drafted a patch implementing ALTER ROLE ... SET ... TO ... USER SET
> syntax.
> >
> > These options are working only for USERSET GUC variables, but require
> > less privileges to set.  I think there is no problem to implement
> >
> > Also it seems that this approach doesn't conflict with future
> > privileges for USERSET GUCs [1].  I expect that USERSET GUCs should be
> > available unless explicitly REVOKEd.  That mean we should be able to
> > check those privileges during ALTER ROLE.
> >
> > Opinions on the patch draft?
> >
> > Links
> > 1.
> https://mail.google.com/mail/u/0/?ik=a20b091faa=om=msg-f%3A1749871710745577015
>
> Uh, sorry for the wrong link.  I meant
> https://www.postgresql.org/message-id/2271988.1668807...@sss.pgh.pa.us
>
> --
> Regards,
> Alexander Korotkov
>


Re: fixing CREATEROLE

2022-11-22 Thread Mark Dilger



> On Nov 22, 2022, at 2:02 PM, Robert Haas  wrote:
> 
>> Patch 0004 feels like something that won't get committed.  The 
>> INHERITCREATEDROLES and SETCREATEDROLES in 0004 seems clunky.
> 
> I think role properties are kind of clunky in general, the way we've
> implemented them in PostgreSQL, but I don't really see why these are
> worse than anything else. I think we need some way to control the
> behavior, and I don't really see a reasonable place to put it other
> than a per-role property. And if we're going to do that then they
> might as well look like the other properties that we've already got.
> 
> Do you have a better idea?

Whatever behavior is to happen in the CREATE ROLE statement should be spelled 
out in that statement.  "CREATE ROLE bob WITH INHERIT false WITH SET false" 
doesn't seem too unwieldy, and has the merit that it can be read and understood 
without reference to hidden parameters.  Forcing this to be explicit should be 
safer if these statements ultimately make their way into dump/restore scripts, 
or into logical replication.

That's not to say that I wouldn't rather that it always work one way or always 
the other.  It's just to say that I don't want it to work differently based on 
some poorly advertised property of the role executing the command.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: More efficient build farm animal wakeup?

2022-11-22 Thread Andrew Dunstan


On 2022-11-22 Tu 13:04, Magnus Hagander wrote:
>
>
> On Tue, Nov 22, 2022 at 12:10 AM Magnus Hagander 
> wrote:
>
>
>
> On Mon, Nov 21, 2022 at 11:42 PM Andrew Dunstan
>  wrote:
>
>
> On 2022-11-21 Mo 16:20, Magnus Hagander wrote:
> > n Mon, Nov 21, 2022 at 9:58 PM Tom Lane 
> wrote:
> >
> >     Andrew Dunstan  writes:
> b>     > The buildfarm server now creates a companion to
> >     branches_of_interest.txt
> >     > called branches_of_interest.json which looks like this:
> >
> >     ... okay ...
> >
> >
> > Yeah, it's not as efficient as something like long polling
> or web
> > sockets, but it is most definitely a lot simpler!
> >
> > If we're going to have a lot of animals do pulls of this
> file every
> > minute or more, it's certainly a lot better to pull this
> small file
> > than to make multiple git calls.
> >
> > It could trivially be made even more efficient by making the
> request
> > with either a If-None-Match or If-Modified-Since. While it's
> still
> > small, that cuts the size approximately in half, and would
> allow you
> > to skip even more processing if nothing has changed.
>
>
> I'll look at that.
>
>
> >
> >
> >     > It updates this every time it does a git fetch,
> currently every
> >     5 minutes.
> >
> >     That up-to-five-minute delay, on top of whatever cronjob
> delay one has
> >     on one's animals, seems kind of sad.  I've gotten kind
> of spoiled
> >     maybe
> >     by seeing first buildfarm results typically within 15
> minutes of a
> >     push.
> >     But if we're trying to improve matters in this area,
> this doesn't seem
> >     like quite the way to go.
> >
> >     But it does seem like this eliminates one expense.  Now
> that you have
> >     that bit, maybe we could arrange a webhook or something
> that allows
> >     branches_of_interest.json to get updated immediately
> after a push?
> >
> >
> > Webhooks are definitely a lot easier to implement in between our
> > servers yeah, so that shouldn't be too hard. We could use
> the same
> > hooks that we use for borka to build the docs, but have it
> just run
> > whatever script it is the buildfarm needs. I assume it's just
> > something trivial to run there, Andrew?
>
>
> Yes, I think much better between servers. Currently the cron
> job looks
> something like this:
>
>
> */5 * * * * cd $HOME/postgresql.git && git fetch -q &&
> $HOME/website/bin/branches_of_interest.pl
> 
>
>
> That script is what sets up the json files.
>
>
> I know nothing about git webhooks though, someone will have to
> point me
> in the right direction.
>
>
> I can set that up for you -- we have ready-made packages for 95%
> of what's needed for that one as we use it elsewhere in the infra.
> So I'll just set something up that will run that exact script (as
> the correct user of course) and comment out the cronjob,and then
> send you the details of what is set up where (I don't recall it
> offhand, but as it's the same we have elsewhere I'll find it
> quickly once I look into it).
>  
>
>
> Hi!
>
> This should now be set up, and Andrew has been sent the instructions
> for how to access that setup on the buildfarm server. So hopefully it
> will now be updating the buildfarm server side of things within a
> couple of seconds from a commit, and not do any speculative pulls. But
> we'll keep an extra eye on it for a bit of course, as it's entirely
> possible I got something worng :)
>
> (This is only the part git -> bf server, of course, as that step
> doesn't need any client changes it was easier to do quickly)
>
>

The server side appears to be working well.

The new client side code is being tested on crake and working fine - the
all-up-to-date case takes just a second or two, almost all of which is
taken with getting the json file from the server. No git calls at all
are done on the client in this case.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pgstattuple: add test for coverage

2022-11-22 Thread Andres Freund
Hi,

On 2022-10-03 00:42:27 +0900, Dong Wook Lee wrote:
> > Which indeed is the case, e.g. on 32bit systems it fails:
> >
> > https://cirrus-ci.com/task/4619535222308864?logs=test_world_32#L253
> >
> > https://api.cirrus-ci.com/v1/artifact/task/4619535222308864/testrun/build-32/testrun/pgstattuple/regress/regression.diffs
> >
> >   table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | 
> > dead_tuple_len | dead_tuple_percent | free_space | free_percent
> >  
> > ---+-+---+---+--++++--
> > -   1171456 |5000 |56 |  47.8 | 5000 |  
> >56 |   47.8 |   7452 | 0.64
> > +   1138688 |5000 |54 | 47.42 | 5000 |  
> >54 |  47.42 |  14796 |  1.3
> >  (1 row)
> >
> > ...
> >
> >
> > You definitely can't rely on such details not to change across platforms.

> Thank you for letting me know I'll fix it and check if there's any problem.

I've marked the patch as returned with feedback for now. Please change that
once you submit an updated version.

Greetings,

Andres Freund




Re: TAP output format in pg_regress

2022-11-22 Thread Daniel Gustafsson
> On 21 Nov 2022, at 14:42, Dagfinn Ilmari Mannsåker  wrote:
> 
> Andres Freund  writes:
> 
>> But either way, it seems nicer to output the # inside a helper function?
> 
> Note that the helper function should inject '# ' at the start of every
> line in the message, not just the first line. It might also be worth
> having two separate functions: one that prints to stdout, so is only
> shown by the harness in verbose mode, and another which prints to
> stderr, so is always shown. Perl's Test::More calls these note() and
> diag(), respectively.
> 
>>> +   /*
>>> +* In order for this information (or any error information) to be shown
>>> +* when running pg_regress test suites under prove it needs to be 
>>> emitted
>>> +* stderr instead of stdout.
>>> +*/
>>> if (file_size(difffilename) > 0)
>>> {
>>> -   printf(_("The differences that caused some tests to fail can be 
>>> viewed in the\n"
>>> -"file \"%s\". A copy of the test summary that 
>>> you see\n"
>>> -"above is saved in the file \"%s\".\n\n"),
>>> +   status(_("\n# The differences that caused some tests to fail 
>>> can be viewed in the\n"
>>> +"# file \"%s\". A copy of the test summary 
>>> that you see\n"
>>> +"# above is saved in the file \"%s\".\n\n"),
>>>  difffilename, logfilename);
>> 
>> The comment about needing to print to stderr is correct - but the patch
>> doesn't do so (anymore)?
>> 
>> The count of failed tests also should go to stderr (but not the report of all
>> tests having passed).
>> 
>> bail() probably also ought to print the error to stderr, so the most 
>> important
>> detail is immediately visible when looking at the failed test result.
> 
> Agreed on all three points.

The attached v10 attempts to address the points raised above.  Notes and
diagnostics are printed to stdout/stderr respectively and the TAP emitter is
changed to move more of the syntax into it making it less painful on the
translators.

--
Daniel Gustafsson   https://vmware.com/



v10-0002-Experimental-meson-treat-regress-tests-as-TAP.patch
Description: Binary data


v10-0001-Make-pg_regress-output-format-TAP-compliant.patch
Description: Binary data


Re: PGDOCS - Logical replication GUCs - added some xrefs

2022-11-22 Thread Peter Smith
On Wed, Nov 16, 2022 at 10:24 PM vignesh C  wrote:
>
...

> One suggestion:
> The format of subscribers includes the data type and default values,
> the format of publishers does not include data type and default
> values. We can try to maintain the consistency for both publisher and
> subscriber configurations.
> +   
> +wal_level must be set to logical.
> +   
>
> + max_logical_replication_workers
> (integer)
> + 
> +  max_logical_replication_workers
> configuration parameter
> + 
> + 
> + 
> +  
> +   Specifies maximum number of logical replication workers. This
> must be set
> +   to at least the number of subscriptions (for apply workers), plus some
> +   reserve for the table synchronization workers.
> +  
> +  
>
> If we don't want to keep the same format, we could give a link to
> runtime-config-replication where data type and default is defined for
> publisher configurations max_replication_slots and max_wal_senders.
>

Thanks for your suggestions.

I have included xref links to the original definitions, rather than
defining the same GUC in multiple places.

PSA v3.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v3-0001-Logical-replication-GUCs-consolidated.patch
Description: Binary data


Re: Make mesage at end-of-recovery less scary.

2022-11-22 Thread Justin Pryzby
On Fri, Nov 18, 2022 at 05:25:37PM +0900, Kyotaro Horiguchi wrote:
> + while (*p == 0 && p < pe)
> + p++;

The bug reported by Andres/cfbot/ubsan is here.

Fixed in attached.

I didn't try to patch the test case to output the failing stderr, but
that might be good.
>From 9bdf59ed0d78fff3f690584fc3c49c863d53f321 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 15 Nov 2022 13:41:46 +0900
Subject: [PATCH] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is
happening. Actually if recovery meets a record with length = 0, that
usually means it finished applying all available WAL records.

Make this message less scary as "reached end of WAL". Instead, raise
the error level for other kind of WAL failure to WARNING.
---
 src/backend/access/transam/xlogreader.c   | 135 +-
 src/backend/access/transam/xlogrecovery.c |  94 +++
 src/backend/replication/walreceiver.c |   7 +-
 src/bin/pg_waldump/pg_waldump.c   |  13 ++-
 src/include/access/xlogreader.h   |   1 +
 src/test/recovery/t/011_crash_recovery.pl | 106 +
 6 files changed, 298 insertions(+), 58 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 93f667b2544..137de967951 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -48,6 +48,8 @@ static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
 			 int reqLen);
 static void XLogReaderInvalReadState(XLogReaderState *state);
 static XLogPageReadResult XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking);
+static bool ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr,
+  XLogRecord *record);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
   XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
@@ -149,6 +151,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir,
 		pfree(state);
 		return NULL;
 	}
+	state->EndOfWAL = false;
 	state->errormsg_buf[0] = '\0';
 
 	/*
@@ -558,6 +561,7 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking)
 	/* reset error state */
 	state->errormsg_buf[0] = '\0';
 	decoded = NULL;
+	state->EndOfWAL = false;
 
 	state->abortedRecPtr = InvalidXLogRecPtr;
 	state->missingContrecPtr = InvalidXLogRecPtr;
@@ -640,25 +644,21 @@ restart:
 	Assert(pageHeaderSize <= readOff);
 
 	/*
-	 * Read the record length.
+	 * Validate the record header.
 	 *
-	 * NB: Even though we use an XLogRecord pointer here, the whole record
-	 * header might not fit on this page. xl_tot_len is the first field of the
-	 * struct, so it must be on this page (the records are MAXALIGNed), but we
-	 * cannot access any other fields until we've verified that we got the
-	 * whole header.
-	 */
-	record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
-	total_len = record->xl_tot_len;
-
-	/*
-	 * If the whole record header is on this page, validate it immediately.
-	 * Otherwise do just a basic sanity check on xl_tot_len, and validate the
-	 * rest of the header after reading it from the next page.  The xl_tot_len
+	 * Even though we use an XLogRecord pointer here, the whole record header
+	 * might not fit on this page.  If the whole record header is on this page,
+	 * validate it immediately.  Even otherwise xl_tot_len must be on this page
+	 * (it is the first field of MAXALIGNed records), but we still cannot
+	 * access any further fields until we've verified that we got the whole
+	 * header, so do just a basic sanity check on record length, and validate
+	 * the rest of the header after reading it from the next page.  The length
 	 * check is necessary here to ensure that we enter the "Need to reassemble
 	 * record" code path below; otherwise we might fail to apply
 	 * ValidXLogRecordHeader at all.
 	 */
+	record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
+
 	if (targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord)
 	{
 		if (!ValidXLogRecordHeader(state, RecPtr, state->DecodeRecPtr, record,
@@ -668,18 +668,14 @@ restart:
 	}
 	else
 	{
-		/* XXX: more validation should be done here */
-		if (total_len < SizeOfXLogRecord)
-		{
-			report_invalid_record(state,
-  "invalid record length at %X/%X: wanted %u, got %u",
-  LSN_FORMAT_ARGS(RecPtr),
-  (uint32) SizeOfXLogRecord, total_len);
+		if (!ValidXLogRecordLength(state, RecPtr, record))
 			goto err;
-		}
+
 		gotheader = false;
 	}
 
+	total_len = record->xl_tot_len;
+
 	/*
 	 * Find space to decode this record.  Don't allow oversized allocation if
 	 * the caller requested nonblocking.  Otherwise, we *have* to try to
@@ -1106,16 +1102,47 @@ XLogReaderInvalReadState(XLogReaderState *state)
 }
 
 /*
- * 

Re: fixing CREATEROLE

2022-11-22 Thread Robert Haas
On Tue, Nov 22, 2022 at 3:01 PM Mark Dilger
 wrote:
> > On Nov 21, 2022, at 12:39 PM, Robert Haas  wrote:
> > I have drafted a few patches to try to improve the situation.
>
> The 0001 and 0002 patches appear to be uncontroversial refactorings.  Patch 
> 0003 looks on-point and a move in the right direction.  The commit message in 
> that patch is well written.

Thanks.

> Patch 0004 feels like something that won't get committed.  The 
> INHERITCREATEDROLES and SETCREATEDROLES in 0004 seems clunky.

I think role properties are kind of clunky in general, the way we've
implemented them in PostgreSQL, but I don't really see why these are
worse than anything else. I think we need some way to control the
behavior, and I don't really see a reasonable place to put it other
than a per-role property. And if we're going to do that then they
might as well look like the other properties that we've already got.

Do you have a better idea?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Prefetch the next tuple's memory during seqscans

2022-11-22 Thread David Rowley
On Thu, 3 Nov 2022 at 22:09, John Naylor  wrote:
> I tried a similar test, but with text fields of random length, and there is 
> improvement here:

Thank you for testing that. Can you share which CPU this was on?

My tests were all on AMD Zen 2. I'm keen to see what the results are
on Intel hardware.

David




Re: Prefetch the next tuple's memory during seqscans

2022-11-22 Thread David Rowley
On Thu, 3 Nov 2022 at 06:25, Andres Freund  wrote:
> Attached is an experimental patch/hack for that. It ended up being more
> beneficial to make the access ordering more optimal than prefetching the tuple
> contents, but I'm not at all sure that's the be-all-end-all.

Thanks for writing that patch. I've been experimenting with it.

I tried unrolling the loop (patch 0003) as you mentioned in:

+ * FIXME: Worth unrolling so that we don't fetch the same cacheline
+ * over and over, due to line items being smaller than a cacheline?

but didn't see any gains from doing that.

I also adjusted your patch a little so that instead of doing:

- OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */
+ OffsetNumber *rs_vistuples;
+ OffsetNumber rs_vistuples_d[MaxHeapTuplesPerPage]; /* their offsets */

to work around the issue of having to populate rs_vistuples_d in
reverse, I added a new field called rs_startindex to mark where the
first element in the rs_vistuples array is. The way you wrote it seems
to require fewer code changes, but per the FIXME comment you left, I
get the idea you just did it the way you did to make it work enough
for testing.
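
In case it helps review, this is roughly what I mean (illustration only, with a
placeholder for the visibility checks; the attached patches are authoritative):

	int			startindex = MaxHeapTuplesPerPage;

	for (offnum = maxoff; offnum >= FirstOffsetNumber; offnum--)
	{
		if (visible)			/* placeholder for the real checks */
			scan->rs_vistuples[--startindex] = offnum;
	}
	scan->rs_startindex = startindex;

so readers walk rs_vistuples[rs_startindex .. MaxHeapTuplesPerPage - 1] instead
of needing rs_vistuples to be a separately maintained pointer.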

I'm quite keen to move forward in committing the 0001 patch to add the
pg_prefetch_mem macro. What I'm a little undecided about is what the
best patch is to commit first to make use of the new macro.
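
For anyone skimming the thread, 0001 essentially boils down to a macro along
these lines (sketch only; the attached patch is authoritative, and the MSVC
spelling here in particular is approximate):

#if defined(HAVE__BUILTIN_PREFETCH)
#define pg_prefetch_mem(a)	__builtin_prefetch(a)
#elif defined(_MSC_VER)
#define pg_prefetch_mem(a)	_mm_prefetch((const char *) (a), _MM_HINT_T0)
#else
#define pg_prefetch_mem(a)	((void) 0)
#endif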

I did some tests on the attached set of patches:

alter system set max_parallel_workers_per_gather = 0;
select pg_reload_conf();

create table t as select a from generate_series(1,1000)a;
alter table t set (autovacuum_enabled=false);

$ cat bench.sql
select * from t where a = 0;

psql -c "select pg_prewarm('t');" postgres

-- Test 1 no frozen tuples in "t"

Master (@9c6ad5eaa):
$ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
latency average = 383.332 ms
latency average = 375.747 ms
latency average = 376.090 ms

Master + 0001 + 0002:
$ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
latency average = 370.133 ms
latency average = 370.149 ms
latency average = 370.157 ms

Master + 0001 + 0005:
$ pgbench -n -f bench.sql -M prepared -T 10 postgres | grep -E "^latency"
latency average = 372.662 ms
latency average = 371.034 ms
latency average = 372.709 ms

-- Test 2 "select count(*) from t" with all tuples frozen

$ cat bench1.sql
select count(*) from t;

psql -c "vacuum freeze t;" postgres
psql -c "select pg_prewarm('t');" postgres

Master (@9c6ad5eaa):
$ pgbench -n -f bench1.sql -M prepared -T 10 postgres | grep -E "^latency"
latency average = 406.238 ms
latency average = 407.029 ms
latency average = 406.962 ms

Master + 0001 + 0005:
$ pgbench -n -f bench1.sql -M prepared -T 10 postgres | grep -E "^latency"
latency average = 345.470 ms
latency average = 345.775 ms
latency average = 345.354 ms

My current thoughts are that it might be best to go with 0005 to start
with.  I know Melanie is working on making some changes in this area,
so perhaps it's best to leave 0002 until that work is complete.

David
From 491df9d6ab87a54bbc76b876484733d02d6c94ea Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Wed, 19 Oct 2022 08:54:01 +1300
Subject: [PATCH v2 1/5] Add pg_prefetch_mem() macro to load cache lines.

Initially mapping to GCC, Clang and MSVC builtins.

Discussion: 
https://postgr.es/m/CAEepm%3D2y9HM9QP%2BHhRZdQ3pU6FShSMyu%3DV1uHXhQ5gG-dketHg%40mail.gmail.com
---
 config/c-compiler.m4   | 17 
 configure  | 40 ++
 configure.ac   |  3 +++
 meson.build|  3 ++-
 src/include/c.h|  8 
 src/include/pg_config.h.in |  3 +++
 src/tools/msvc/Solution.pm |  1 +
 7 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 000b075312..582a47501c 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -355,6 +355,23 @@ AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1,
[Define to 1 if your compiler understands $1.])
 fi])# PGAC_CHECK_BUILTIN_FUNC
 
+# PGAC_CHECK_BUILTIN_VOID_FUNC
+# ---
+# Variant for void functions.
+AC_DEFUN([PGAC_CHECK_BUILTIN_VOID_FUNC],
+[AC_CACHE_CHECK(for $1, pgac_cv$1,
+[AC_LINK_IFELSE([AC_LANG_PROGRAM([
+void
+call$1($2)
+{
+$1(x);
+}], [])],
+[pgac_cv$1=yes],
+[pgac_cv$1=no])])
+if test x"${pgac_cv$1}" = xyes ; then
+AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1,
+   [Define to 1 if your compiler understands $1.])
+fi])# PGAC_CHECK_BUILTIN_VOID_FUNC
 
 
 # PGAC_CHECK_BUILTIN_FUNC_PTR
diff --git a/configure b/configure
index 3966368b8d..c4685b8a1e 100755
--- a/configure
+++ b/configure
@@ -15988,6 +15988,46 @@ _ACEOF
 
 fi
 
+# Can we use a built-in to prefetch memory?
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_prefetch" >&5
+$as_echo_n "checking for __builtin_prefetch... " >&6; }
+if ${pgac_cv__builtin_prefetch+:} false; then :
+  $as_echo_n "(cached) " 

Re: Slow standby snapshot

2022-11-22 Thread Michail Nikolaev
Hello, everyone.

I have tried to put it all together.

> In the absence of that approach, falling back to a counter that
> compresses every N xids would be best, in addition to the two new
> forced compression events.

Done.

> Also, if we add more forced compressions, it seems like we should have
> a short-circuit for a forced compression where there's nothing to do.

Done.

> I'm also wondering why there's not an
>
> Assert(compress_index == pArray->numKnownAssignedXids);
>
> after the loop, to make sure our numKnownAssignedXids tracking
> is sane.

Done.

> * when idle - since we have time to do it when that happens, which
> happens often since most workloads are bursty

I have added acquisition of ProcArrayLock for this case.
Also, I have capped the frequency at once per second to avoid
contention with a heavy read load in the case of small,
episodic but regular WAL traffic (WakeupRecovery() every 100ms, for
example). Or is that useless?

> It'd be more reliable
> to use a static counter to skip all but every N'th compress attempt
> (something we could do inside KnownAssignedXidsCompress itself, instead
> of adding warts at the call sites).

Done. I have added a “reason” enum for calling KnownAssignedXidsCompress
to keep it as clean as possible.
But I am not sure that I was successful here.
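
To make the intended heuristic easier to review, it has roughly this shape
(sketch only; a couple of the names below are spelled out for illustration and
the patch is authoritative):

	if (reason == NO_SPACE || reason == RECOVERY_INFO)
		doCompress = true;		/* forced compression */
	else if (reason == TRANSACTION_END)		/* member name assumed */
		doCompress = (++transactionEndsCounter % 64) == 0;
	else	/* startup process is idle: at most once per second */
		doCompress = TimestampDifferenceExceeds(lastCompressTs,
												GetCurrentTimestamp(), 1000);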

Also, while we are still in this context, I think it would be good to add:
* Simon's optimization (1) for KnownAssignedXidsRemoveTree (it is
simple and effective in some situations, with no observed drawbacks)
* maybe my patch (2) for replacing known_assigned_xids_lck with a memory barrier?

New version attached. Running benchmarks now.
Preliminary results are in the attachments (16 CPUs, 5000 max_connections, now 64
active connections instead of 16).
Also, an interesting observation: with 64 connections, the vanilla version is
unable to recover its performance after a 30-second transaction on the primary.

[1]: 
https://www.postgresql.org/message-id/flat/CANbhV-Ey8HRYPvnvQnsZAteCfzN3VHVhZVKfWMYcnjMnSzs4dQ%40mail.gmail.com#05993cf2bc87e35e0dff38fec26b9805
[2]: 
https://www.postgresql.org/message-id/flat/CANtu0oiPoSdQsjRd6Red5WMHi1E83d2%2B-bM9J6dtWR3c5Tap9g%40mail.gmail.com#cc4827dee902978f93278732435e8521

--
Michail Nikolaev
From 926403598e9506edfa32c9534be9a4e48f756002 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev 
Date: Wed, 23 Nov 2022 00:13:32 +0300
Subject: [PATCH vnext] Currently, KnownAssignedXidsGetAndSetXmin requires an
 iterative loop through KnownAssignedXids array, including xids marked as
 invalid. Performance impact is especially noticeable in the presence of long
 (few seconds) transactions on primary, high value (few thousands) of
 max_connections and high read workload on standby. Most of the CPU spent on
 looping throw KnownAssignedXids while almost all xid are invalid anyway.
 KnownAssignedXidsCompress removes invalid xid from time to time, but
 performance is still affected.

To increase performance, frequency of running KnownAssignedXidsCompress is increased. Now it is called for each xid % 64 == 0 (number selected by running benchmarks). Also, the minimum bound of element to compress (4 * PROCARRAY_MAXPROCS) is removed.

Additionally, compression is called on RecoveryInfo and idle state of startup process.

Simon Riggs, Michail Nikolaev with help by Tom Lane and
Andres Freund.
---
 src/backend/access/transam/xlogrecovery.c |   3 +
 src/backend/storage/ipc/procarray.c   | 103 +-
 src/include/storage/standby.h |   1 +
 3 files changed, 85 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..2cc9581cb6 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3836,6 +3836,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		streaming_reply_sent = true;
 	}
 
+	/* Do any background tasks that might benefit us later */
+	StartupProcessIdleMaintenance();
+
 	/* Update pg_stat_recovery_prefetch before sleeping. */
 	XLogPrefetcherComputeStats(xlogprefetcher);
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 283517d956..fde748519e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -274,6 +274,10 @@ static TransactionId *KnownAssignedXids;
 static bool *KnownAssignedXidsValid;
 static TransactionId latestObservedXid = InvalidTransactionId;
 
+/* Counters for KnownAssignedXids compression heuristic */
+static int transactionEndsCounter;
+static TimestampTz lastCompressTs;
+
 /*
  * If we're in STANDBY_SNAPSHOT_PENDING state, standbySnapshotPendingXmin is
  * the highest xid that might still be running that we don't have in
@@ -335,8 +339,16 @@ static void DisplayXidCache(void);
 #define xc_slow_answer_inc()		((void) 0)
 #endif			/* XIDCACHE_DEBUG */
 
+typedef enum KnownAssignedXidsCompressReason
+{
+	NO_SPACE,
+	RECOVERY_INFO,
+	

Re: Introduce a new view for checkpointer related stats

2022-11-22 Thread Andres Freund
Hi,

On 2022-11-22 18:08:28 +0530, Bharath Rupireddy wrote:
> diff --git a/src/backend/catalog/system_views.sql 
> b/src/backend/catalog/system_views.sql
> index 2d8104b090..131d949dfb 100644
> --- a/src/backend/catalog/system_views.sql
> +++ b/src/backend/catalog/system_views.sql
> @@ -1105,18 +1105,22 @@ CREATE VIEW pg_stat_archiver AS
>  
>  CREATE VIEW pg_stat_bgwriter AS
>  SELECT
> -pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
> -pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
> -pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
> -pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
> -pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
>  pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
>  pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
> -pg_stat_get_buf_written_backend() AS buffers_backend,
> -pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
>  pg_stat_get_buf_alloc() AS buffers_alloc,
>  pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
>  
> +CREATE VIEW pg_stat_checkpointer AS
> +SELECT
> +pg_stat_get_timed_checkpoints() AS checkpoints_timed,
> +pg_stat_get_requested_checkpoints() AS checkpoints_req,
> +pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
> +pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
> +pg_stat_get_buf_written_checkpoints() AS buffers_checkpoint,
> +pg_stat_get_buf_written_backend() AS buffers_backend,
> +pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
> +pg_stat_get_checkpointer_stat_reset_time() AS stats_reset;


I think we should consider deprecating the pg_stat_bgwriter columns but
leaving them in place for a few years. New stuff should only be added to
pg_stat_checkpointer, but we don't need to break old monitoring queries.

Greetings,

Andres Freund




Re: Logical Replication Custom Column Expression

2022-11-22 Thread Stavros Koureas
Reading more carefully what you described, I think you are interested in 
getting something you call origin from publishers, probably some metadata from 
the publications.

This identifier in those metadata maybe does not have business value on the 
reporting side. The idea is to use a value which has specific meaning to the 
user at the end.

For example assigning 1 for tenant 1, 2 for tenant 2 and so on, at the end 
based on a dimension table which holds this mapping the user would be able to 
filter the data. So programmatically the user can set the id value of the 
column plus creating the mapping table from an application let’s say and be 
able to distinguish the data.

In addition this column should have the ability to be part of the primary key 
on the subscription table in order to not conflict with lines from other 
tenants having the same keys.

> 
> 22 Νοε 2022, 14:52, ο χρήστης «Stavros Koureas » 
> έγραψε:
> 
> 
> Sure, this can be implemented as a subscription option, and it will cover 
> this use case scenario as each subscriber points only to one database.
> I also have some more analytical/reporting use-cases which need additions in 
> logical-replication, I am not sure if I need to open different discussions 
> for each one, all ideas are for publication/subscription.
> 
> Στις Τρί 22 Νοε 2022 στις 2:22 μ.μ., ο/η Amit Kapila 
>  έγραψε:
>> On Mon, Nov 21, 2022 at 5:05 PM Ashutosh Bapat
>>  wrote:
>> >
>> > On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas
>> >  wrote:
>> > >
>> > > What does not support is the option for defining custom column 
>> > > expressions, as keys or values, into the upstream (publication). This 
>> > > will give more flexibility into making replication from multiple 
>> > > upstreams into less downstreams adding more logic. For instance, in a 
>> > > project for analytical purposes there is the need to consolidate data 
>> > > from multiple databases into one and at the same time keep the origin of 
>> > > each replicated data identified by a tenanant_id column. In this case we 
>> > > also need the ability to define the new column as an additional key 
>> > > which will participate into the destination table.
>> > >
>> > > Tenant 1 table
>> > > id serial pk
>> > > description varchar
>> > >
>> > > Tenant 2 table
>> > > id integer pk
>> > > description varchar
>> > >
>> > > Group table
>> > > tenant integer pk
>> > > id integer pk
>> > > description varchar
>> > >
>> > > Possible syntax to archive that
>> > > CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias} 
>> > > ,"id", "name")
>> > >
>> > > Example
>> > > CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} 
>> > > ,"id", "name")
>> >
>> > I think that's a valid usecase.
>> >
>> > This looks more like a subscription option to me. In multi-subscriber
>> > multi-publisher scenarios, on one subscriber a given upstream may be
>> > tenant 1 but on some other it could be 2. But I don't think we allow
>> > specifying subscription options for a single table. AFAIU, the origin
>> > ids are available as part of the commit record which contained this
>> > change; that's how conflict resolution is supposed to know it. So
>> > somehow the subscriber will need to fetch those from there and set the
>> > tenant.
>> >
>> 
>> Yeah, to me also it appears that we can handle it on the subscriber
>> side. We have the provision of sending origin information in proto.c.
>> But note that by default publishers won't have any origin associated
>> with change unless someone has defined it. I think this work needs
>> more thought but sounds to be an interesting feature.
>> 
>> -- 
>> With Regards,
>> Amit Kapila.


Re: fixing CREATEROLE

2022-11-22 Thread Mark Dilger



> On Nov 21, 2022, at 12:39 PM, Robert Haas  wrote:
> 
> I have drafted a few patches to try to improve the situation.

The 0001 and 0002 patches appear to be uncontroversial refactorings.  Patch 
0003 looks on-point and a move in the right direction.  The commit message in 
that patch is well written.  Patch 0004 feels like something that won't get 
committed.  The INHERITCREATEDROLES and SETCREATEDROLES in 0004 seems clunky.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Support % wildcard in extension upgrade filenames

2022-11-22 Thread Regina Obe
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

I reviewed this patch 
https://www.postgresql.org/message-id/20221117095734.igldlk6kngr6ogim%40c19

Most of it looks good to me.  The only things that have changed in the new patch
since I last reviewed it are:

1) wildcard_upgrades=true is no longer needed in the control file and will
break if present.  This is an expected change, since we now go strictly by the
presence of % extension upgrade scripts, per the earlier suggestions.
2) The meson build issue has been fixed.  Cirrus is passing on that now.  I'm 
still fiddling with my meson setup, so didn't personally test this.

3) The documentation has been updated to no longer include wildcard_upgrades as
a control file variable.

and it now has this text describing its use:

 The literal value % can be used as the old_version component in an extension 
update script for it to match any version. Such wildcard update scripts will 
only be used when no explicit path is found from old to target version.

Given that a suitable update script is available, the command ALTER EXTENSION 
UPDATE will update an installed extension to the specified new version. The 
update script is run in the same environment that CREATE EXTENSION provides for 
installation scripts: in particular, search_path is set up in the same way, and 
any new objects created by the script are automatically added to the extension. 
Also, if the script chooses to drop extension member objects, they are 
automatically dissociated from the extension. 

I built the html docs but ran into a lot of warnings I don't recall getting
last time.  I think this is just because my local doc setup is a bit messed up,
or something else changed in the doc toolchain.

My main gripe with this patch is that Sandro did not increment its version
number, so it has the same name as an old patch he submitted before.

Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Robert Haas
On Tue, Nov 22, 2022 at 1:44 PM Tom Lane  wrote:
> I wrote:
> > Still wondering if there's really no CHECK_FOR_INTERRUPT anywhere
> > else in this loop.
>
> I did some experimentation using the test case Jakub presented
> to start with, and verified that that loop does respond promptly
> to control-C even in HEAD.  So there are CFI(s) in the loop as
> I thought, and we don't need another.

OK. Although an extra CFI isn't such a bad thing, either.

> What we do need is some more work on nearby comments.  I'll
> see about that and push it.

Great!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New docs chapter on Transaction Management and related changes

2022-11-22 Thread Bruce Momjian
On Tue, Nov 22, 2022 at 07:47:26PM +0100, Erik Rijkers wrote:
> Op 22-11-2022 om 19:00 schreef Bruce Momjian:
> > On Mon, Nov 21, 2022 at 11:15:36AM +0100, Laurenz Albe wrote:
> > >..., while both columns will be set in read-write transactions.
> > 
> > Agreed, changed.  Updated patch attached.
> 
> In func.sgml:
> 
> 'Only top-level transaction ID are'  should be
> 'Only top-level transaction IDs are'
> 
> 'subtransaction ID are'  should be
> 'subtransaction IDs are'
> 
> In xact.sgml:
> 
> 'Non-virtual TransactionId (or xid)' should
> be
> 'Non-virtual TransactionIds (or xids)'

Agreed, updated patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 24b1624bad..01c65ede4c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7229,12 +7229,14 @@ local0.*/var/log/postgresql
 
 
  %v
- Virtual transaction ID (backendID/localXID)
+ Virtual transaction ID (backendID/localXID);  see
+ 
  no
 
 
  %x
- Transaction ID (0 if none is assigned)
+ Transaction ID (0 if none is assigned);  see
+ 
  no
 
 
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index b030b36002..fdffba4442 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4992,7 +4992,8 @@ WHERE ...
 xmin and xmax.  Transaction identifiers are 32-bit quantities.
 In some contexts, a 64-bit variant xid8 is used.  Unlike
 xid values, xid8 values increase strictly
-monotonically and cannot be reused in the lifetime of a database cluster.
+monotonically and cannot be reused in the lifetime of a database
+cluster.  See  for more details.

 

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index de450cd661..0d6be9a2fa 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -104,6 +104,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 82fba48d5f..8affaf2a3d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24676,7 +24676,10 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns the current transaction's ID.  It will assign a new one if the
 current transaction does not have one already (because it has not
-performed any database updates).
+performed any database updates);  see  for details.  If executed in a
+subtransaction, this will return the top-level transaction ID;
+see  for details.

   
 
@@ -24693,6 +24696,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
 ID is assigned yet.  (It's best to use this variant if the transaction
 might otherwise be read-only, to avoid unnecessary consumption of an
 XID.)
+If executed in a subtransaction, this will return the top-level
+transaction ID.

   
 
@@ -24736,6 +24741,9 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns a current snapshot, a data structure
 showing which transaction IDs are now in-progress.
+Only top-level transaction IDs are included in the snapshot;
+subtransaction IDs are not shown;  see 
+for details.

   
 
@@ -24790,7 +24798,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
 Is the given transaction ID visible according
 to this snapshot (that is, was it completed before the snapshot was
 taken)?  Note that this function will not give the correct answer for
-a subtransaction ID.
+a subtransaction ID (subxid);  see  for
+details.

   
  
@@ -24802,8 +24811,9 @@ SELECT collation for ('foo' COLLATE "de_DE");
 wraps around every 4 billion transactions.  However,
 the functions shown in  use a
 64-bit type xid8 that does not wrap around during the life
-of an installation, and can be converted to xid by casting if
-required.  The data type pg_snapshot stores information about
+of an installation and can be converted to xid by casting if
+required;  see  for details. 
+The data type pg_snapshot stores information about
 transaction ID visibility at a particular moment in time.  Its components
 are described in .
 pg_snapshot's textual representation is
@@ -24849,7 +24859,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 xmax and not in this list was already completed at the time
 of the snapshot, and thus is either visible or dead according to its
 commit status.  This list does not include the transaction IDs of
-subtransactions.
+subtransactions (subxids).
 

Re: New docs chapter on Transaction Management and related changes

2022-11-22 Thread Erik Rijkers

Op 22-11-2022 om 19:00 schreef Bruce Momjian:

On Mon, Nov 21, 2022 at 11:15:36AM +0100, Laurenz Albe wrote:

   ..., while both columns will be set in read-write transactions.


Agreed, changed.  Updated patch attached.


In func.sgml:

'Only top-level transaction ID are'  should be
'Only top-level transaction IDs are'

'subtransaction ID are'  should be
'subtransaction IDs are'

In xact.sgml:

'Non-virtual TransactionId (or xid)' 
should be

'Non-virtual TransactionIds (or xids)'



Erik Rijkers






Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Tom Lane
I wrote:
> Still wondering if there's really no CHECK_FOR_INTERRUPT anywhere
> else in this loop.

I did some experimentation using the test case Jakub presented
to start with, and verified that that loop does respond promptly
to control-C even in HEAD.  So there are CFI(s) in the loop as
I thought, and we don't need another.

What we do need is some more work on nearby comments.  I'll
see about that and push it.

regards, tom lane




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Tom Lane
Robert Haas  writes:
> On Tue, Nov 22, 2022 at 11:35 AM Tom Lane  wrote:
>> Is it appropriate to count distinct pages, rather than just the
>> number of times we have to visit a heap tuple?  That seems to
>> complicate the logic a good deal, and I'm not sure it's buying
>> much, especially since (as you noted) it's imprecise anyway.

> FWW, the same question also occurred to me. But after mulling it over,
> what Simon did seems kinda reasonable to me. Although it's imprecise,
> it will generally cause us to stop sooner if we're bouncing all over
> the heap and be willing to explore further if we're just hitting the
> same heap page. I feel like that's pretty reasonable behavior.
> Stopping early could hurt, so if we know that continuing isn't costing
> much, why not?

Fair I guess --- and I did say that I wanted it to be based on number
of pages visited not number of tuples.  So objection withdrawn to that
aspect.

Still wondering if there's really no CHECK_FOR_INTERRUPT anywhere
else in this loop.

regards, tom lane




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Robert Haas
On Tue, Nov 22, 2022 at 11:35 AM Tom Lane  wrote:
> Simon Riggs  writes:
> > New patch version reporting for duty, sir. Please take it from here!
>
> Why the CHECK_FOR_INTERRUPTS?  I'd supposed that there's going to be
> one somewhere down inside the index or heap access --- do you have
> reason to think there isn't?
>
> Is it appropriate to count distinct pages, rather than just the
> number of times we have to visit a heap tuple?  That seems to
> complicate the logic a good deal, and I'm not sure it's buying
> much, especially since (as you noted) it's imprecise anyway.

FWIW, the same question also occurred to me. But after mulling it over,
what Simon did seems kinda reasonable to me. Although it's imprecise,
it will generally cause us to stop sooner if we're bouncing all over
the heap and be willing to explore further if we're just hitting the
same heap page. I feel like that's pretty reasonable behavior.
Stopping early could hurt, so if we know that continuing isn't costing
much, why not?
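
For readers following along, here is a minimal, hypothetical sketch (not the
test case posted earlier in the thread; the table and column names are made
up) of the kind of workload where planning ends up bouncing over many dead
heap pages in get_actual_variable_endpoint():

    -- A mass DELETE leaves a long run of dead index entries at the upper end
    -- of the index, which planning then has to skip over when it looks up
    -- the column's actual maximum.
    CREATE TABLE bloated (id integer PRIMARY KEY);
    INSERT INTO bloated SELECT g FROM generate_series(1, 1000000) g;
    ANALYZE bloated;

    DELETE FROM bloated WHERE id > 100000;   -- dead tuples, not yet vacuumed

    -- Planning this query consults the actual endpoint of bloated.id, so it
    -- wades through the dead entries left behind by the DELETE.
    EXPLAIN SELECT * FROM bloated WHERE id > 999000;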

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New docs chapter on Transaction Management and related changes

2022-11-22 Thread Bruce Momjian
On Mon, Nov 21, 2022 at 11:35:09AM +0100, Álvaro Herrera wrote:
> Agreed on not using "unaborted", per previous discussion.
> 
> On 2022-Nov-21, Laurenz Albe wrote:
> 
> > Perhaps we should also avoid the term "transaction block".  Even without 
> > speaking
> > of a "block", way too many people confuse PL/pgSQL's BEGIN ... END blocks
> > with transactions.  On the other hand, we use "transaction block" everywhere
> > else in the documentation...
> 
> Yeah, I don't understand why we need this "transaction block" term at
> all.  It adds nothing.  We could just use the term "transaction", and
> little meaning would be lost.  When necessary, we could just say
> "explicit transaction" or something to that effect.  In this particular
> case, we could modify your proposed wording,

Yes, I just posted that same thing:

Transactions can be created explicitly using

> >   
> >Multi-statement transactions can be created explicitly using
> >BEGIN or START TRANSACTION and
> >are ended using COMMIT or ROLLBACK.
> >An SQL statement outside of a transaction block automatically uses
> >a single-statement transaction.
> >   
> 
> by removing the word "block":
> 
> >Any SQL statement outside of an transaction automatically uses
> >a single-statement transaction.
> 
> and perhaps add "explicit", but I don't think it's necessary:
> 
> >Any SQL statement outside of an explicit transaction automatically
> >uses a single-statement transaction.

Full paragraph is now:

   Transactions can be created explicitly using BEGIN
   or START TRANSACTION and ended using
   COMMIT or ROLLBACK.  An SQL
   statement outside of an explicit transaction automatically uses a
   single-statement transaction.

> (I also changed "An" to "Any" because it seems more natural, but I
> suppose it's a stylistic choice.)

I think we have a plurality mismatch, so I went with "SQL statements" and
didn't need "an" or "any" (here is an even newer version of the paragraph):

   Transactions can be created explicitly using BEGIN
   or START TRANSACTION and ended using
   COMMIT or ROLLBACK.  SQL
   statements outside of explicit transactions automatically use
   single-statement transactions.
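
To illustrate the distinction this paragraph draws (the table and column
names below are hypothetical):

    -- Explicit transaction: both statements commit or roll back together.
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;

    -- Outside an explicit transaction, each statement is its own
    -- single-statement transaction.
    UPDATE accounts SET balance = 0 WHERE id = 3;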

Updated patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 24b1624bad..01c65ede4c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7229,12 +7229,14 @@ local0.*/var/log/postgresql
 
 
  %v
- Virtual transaction ID (backendID/localXID)
+ Virtual transaction ID (backendID/localXID);  see
+ 
  no
 
 
  %x
- Transaction ID (0 if none is assigned)
+ Transaction ID (0 if none is assigned);  see
+ 
  no
 
 
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index b030b36002..fdffba4442 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4992,7 +4992,8 @@ WHERE ...
 xmin and xmax.  Transaction identifiers are 32-bit quantities.
 In some contexts, a 64-bit variant xid8 is used.  Unlike
 xid values, xid8 values increase strictly
-monotonically and cannot be reused in the lifetime of a database cluster.
+monotonically and cannot be reused in the lifetime of a database
+cluster.  See  for more details.

 

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index de450cd661..0d6be9a2fa 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -104,6 +104,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 82fba48d5f..2a083eb015 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24676,7 +24676,10 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns the current transaction's ID.  It will assign a new one if the
 current transaction does not have one already (because it has not
-performed any database updates).
+performed any database updates);  see  for details.  If executed in a
+subtransaction, this will return the top-level transaction ID;
+see  for details.

   
 
@@ -24693,6 +24696,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
 ID is assigned yet.  (It's best to use this variant if the transaction
 might otherwise be read-only, to avoid unnecessary consumption of an
 XID.)
+If executed in a subtransaction, this will return the top-level
+transaction ID.

   
 
@@ -24736,6 +24741,9 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns a current snapshot, a data structure
 showing which transaction IDs are 

Re: More efficient build farm animal wakeup?

2022-11-22 Thread Magnus Hagander
On Tue, Nov 22, 2022 at 12:10 AM Magnus Hagander 
wrote:

>
>
> On Mon, Nov 21, 2022 at 11:42 PM Andrew Dunstan 
> wrote:
>
>>
>> On 2022-11-21 Mo 16:20, Magnus Hagander wrote:
>> > On Mon, Nov 21, 2022 at 9:58 PM Tom Lane  wrote:
>> >
>> > Andrew Dunstan  writes:
>> > > The buildfarm server now creates a companion to
>> > branches_of_interest.txt
>> > > called branches_of_interest.json which looks like this:
>> >
>> > ... okay ...
>> >
>> >
>> > Yeah, it's not as efficient as something like long polling or web
>> > sockets, but it is most definitely a lot simpler!
>> >
>> > If we're going to have a lot of animals do pulls of this file every
>> > minute or more, it's certainly a lot better to pull this small file
>> > than to make multiple git calls.
>> >
>> > It could trivially be made even more efficient by making the request
>> > with either a If-None-Match or If-Modified-Since. While it's still
>> > small, that cuts the size approximately in half, and would allow you
>> > to skip even more processing if nothing has changed.
>>
>>
>> I'll look at that.
>>
>>
>> >
>> >
>> > > It updates this every time it does a git fetch, currently every
>> > 5 minutes.
>> >
>> > That up-to-five-minute delay, on top of whatever cronjob delay one
>> has
>> > on one's animals, seems kind of sad.  I've gotten kind of spoiled
>> > maybe
>> > by seeing first buildfarm results typically within 15 minutes of a
>> > push.
>> > But if we're trying to improve matters in this area, this doesn't
>> seem
>> > like quite the way to go.
>> >
>> > But it does seem like this eliminates one expense.  Now that you
>> have
>> > that bit, maybe we could arrange a webhook or something that allows
>> > branches_of_interest.json to get updated immediately after a push?
>> >
>> >
>> > Webhooks are definitely a lot easier to implement in between our
>> > servers yeah, so that shouldn't be too hard. We could use the same
>> > hooks that we use for borka to build the docs, but have it just run
>> > whatever script it is the buildfarm needs. I assume it's just
>> > something trivial to run there, Andrew?
>>
>>
>> Yes, I think much better between servers. Currently the cron job looks
>> something like this:
>>
>>
>> */5 * * * * cd $HOME/postgresql.git && git fetch -q &&
>> $HOME/website/bin/branches_of_interest.pl
>>
>>
>> That script is what sets up the json files.
>>
>>
>> I know nothing about git webhooks though, someone will have to point me
>> in the right direction.
>>
>
> I can set that up for you -- we have ready-made packages for 95% of what's
> needed for that one as we use it elsewhere in the infra. So I'll just set
> something up that will run that exact script (as the correct user of
> course) and comment out the cronjob, and then send you the details of what
> is set up where (I don't recall it offhand, but as it's the same we have
> elsewhere I'll find it quickly once I look into it).
>
>

Hi!

This should now be set up, and Andrew has been sent the instructions for
how to access that setup on the buildfarm server. So hopefully it will now
be updating the buildfarm server side of things within a couple of seconds
from a commit, and not do any speculative pulls. But we'll keep an extra
eye on it for a bit of course, as it's entirely possible I got something
wrong :)

(This is only the part git -> bf server, of course, as that step doesn't
need any client changes it was easier to do quickly)

//Magnus


Re: New docs chapter on Transaction Management and related changes

2022-11-22 Thread Bruce Momjian
On Mon, Nov 21, 2022 at 11:15:36AM +0100, Laurenz Albe wrote:
> > --- a/doc/src/sgml/ref/release_savepoint.sgml
> > +++ b/doc/src/sgml/ref/release_savepoint.sgml
> 
> > +   RELEASE SAVEPOINT releases the named savepoint and
> > +   all active savepoints that were created after the named savepoint,
> > +   and frees their resources.  All changes made since the creation of the
> > +   savepoint, excluding rolled back savepoints changes, are merged into
> > +   the transaction or savepoint that was active when the named savepoint
> > +   was created.  Changes made after RELEASE SAVEPOINT
> > +   will also be part of this active transaction or savepoint.
> 
> I am not sure if "rolled back savepoints changes" is clear enough.
> I understand that you are trying to avoid the term "subtransaction".
> How about:
> 
>   All changes made since the creation of the savepoint that didn't already
>   get rolled back are merged ...

Yes, I like that, changed.
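
As an illustration of the semantics described above (the table t is
hypothetical): changes that were already rolled back stay rolled back, and
everything else is merged into the surrounding transaction when the savepoint
is released.

    BEGIN;
    INSERT INTO t VALUES (1);
    SAVEPOINT a;
    INSERT INTO t VALUES (2);
    SAVEPOINT b;
    INSERT INTO t VALUES (3);
    ROLLBACK TO SAVEPOINT b;   -- the row (3) is rolled back
    RELEASE SAVEPOINT a;       -- releases a (and b); the row (2) is merged
                               -- into the outer transaction, (3) stays rolled back
    COMMIT;                    -- t now contains the rows 1 and 2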

> > --- a/doc/src/sgml/ref/rollback.sgml
> > +++ b/doc/src/sgml/ref/rollback.sgml
> >
> > +  If AND CHAIN is specified, a new (unaborted)
> 
> *Sigh* I'll make one last plea for "not aborted".

Uh, I thought you wanted "unaborted", but I now changed it to "not
aborted".

> > --- /dev/null
> > +++ b/doc/src/sgml/xact.sgml
> 
> > +  
> > +   Transactions can be created explicitly using BEGIN
> > +   and COMMIT, which creates a transaction block.
> > +   An SQL statement outside of a transaction block automatically uses
> > +   a single-statement transaction.
> > +  
> 
> Sorry, I should have noted that the first time around.
> 
> Transactions are not created using COMMIT.
> 
> Perhaps we should also avoid the term "transaction block".  Even without 
> speaking
> of a "block", way too many people confuse PL/pgSQL's BEGIN ... END blocks
> with transactions.  On the other hand, we use "transaction block" everywhere
> else in the documentation...
> 
> How about:
> 
>   
>Multi-statement transactions can be created explicitly using
>BEGIN or START TRANSACTION and
>are ended using COMMIT or ROLLBACK.
>An SQL statement outside of a transaction block automatically uses
>a single-statement transaction.
>   

I used your wording, but technically you can use BEGIN/COMMIT with a
single statement, so multi-statement is not a requirement; I therefore used
your text but removed "Multi-statement":

Transactions can be created explicitly using BEGIN or
START TRANSACTION and ended using
COMMIT or ROLLBACK.

> > + 
> > +
> > +  Transactions and Locking
> > +
> > +  
> > +   The transaction IDs of currently executing transactions are shown in
> > +   pg_locks
> > +   in columns virtualxid and
> > +   transactionid.  Read-only transactions
> > +   will have virtualxids but NULL
> > +   transactionids, while read-write transactions
> > +   will have both as non-NULL.
> > +  
> 
> Perhaps the following will be prettier than "have both as non-NULL":
> 
>   ..., while both columns will be set in read-write transactions.

Agreed, changed.  Updated patch attached.
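
For reference, the behavior described in that paragraph can be observed
directly; a quick illustration (the table being modified is hypothetical):

    BEGIN;
    -- Read-only so far: only the virtualxid lock is present.
    SELECT locktype, virtualxid, transactionid
    FROM pg_locks
    WHERE pid = pg_backend_pid()
      AND locktype IN ('virtualxid', 'transactionid');

    INSERT INTO some_table DEFAULT VALUES;   -- first write assigns an XID

    -- Now a transactionid lock with a non-NULL transactionid appears as well.
    SELECT locktype, virtualxid, transactionid
    FROM pg_locks
    WHERE pid = pg_backend_pid()
      AND locktype IN ('virtualxid', 'transactionid');
    COMMIT;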

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 24b1624bad..01c65ede4c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7229,12 +7229,14 @@ local0.*/var/log/postgresql
 
 
  %v
- Virtual transaction ID (backendID/localXID)
+ Virtual transaction ID (backendID/localXID);  see
+ 
  no
 
 
  %x
- Transaction ID (0 if none is assigned)
+ Transaction ID (0 if none is assigned);  see
+ 
  no
 
 
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index b030b36002..fdffba4442 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4992,7 +4992,8 @@ WHERE ...
 xmin and xmax.  Transaction identifiers are 32-bit quantities.
 In some contexts, a 64-bit variant xid8 is used.  Unlike
 xid values, xid8 values increase strictly
-monotonically and cannot be reused in the lifetime of a database cluster.
+monotonically and cannot be reused in the lifetime of a database
+cluster.  See  for more details.

 

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index de450cd661..0d6be9a2fa 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -104,6 +104,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 82fba48d5f..2a083eb015 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24676,7 +24676,10 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns the current transaction's ID.  It will assign a new one if the
  

Re: [PATCH] Const'ify the arguments of ilist.c/ilist.h functions

2022-11-22 Thread Andres Freund
Hi,

On 2022-11-07 12:03:23 +0300, Aleksander Alekseev wrote:
> During the [1] discussion it was suggested to constify the arguments
> of ilist.c/ilist.h functions. Bharath (cc'ed) pointed out that it's
> better to start a new thread in order to attract more hackers that may
> be interested in this change, so I started one.

I needed something similar in 
https://postgr.es/m/20221120055930.t6kl3tyivzhlrzu2%40awork3.anarazel.de



> @@ -484,7 +484,7 @@ dlist_has_prev(dlist_head *head, dlist_node *node)
>   * Return the next node in the list (there must be one).
>   */
>  static inline dlist_node *
> -dlist_next_node(dlist_head *head, dlist_node *node)
> +dlist_next_node(const dlist_head *head, const dlist_node *node)
>  {
>   Assert(dlist_has_next(head, node));
>   return node->next;
> @@ -494,7 +494,7 @@ dlist_next_node(dlist_head *head, dlist_node *node)
>   * Return previous node in the list (there must be one).
>   */
>  static inline dlist_node *
> -dlist_prev_node(dlist_head *head, dlist_node *node)
> +dlist_prev_node(const dlist_head *head, const dlist_node *node)
>  {
>   Assert(dlist_has_prev(head, node));
>   return node->prev;
> @@ -502,7 +502,7 @@ dlist_prev_node(dlist_head *head, dlist_node *node)
>  
>  /* internal support function to get address of head element's struct */
>  static inline void *
> -dlist_head_element_off(dlist_head *head, size_t off)
> +dlist_head_element_off(const dlist_head *head, size_t off)
>  {
>   Assert(!dlist_is_empty(head));
>   return (char *) head->head.next - off;
> @@ -512,14 +512,14 @@ dlist_head_element_off(dlist_head *head, size_t off)
>   * Return the first node in the list (there must be one).
>   */
>  static inline dlist_node *
> -dlist_head_node(dlist_head *head)
> +dlist_head_node(const dlist_head *head)
>  {
>   return (dlist_node *) dlist_head_element_off(head, 0);
>  }
>  
>  /* internal support function to get address of tail element's struct */
>  static inline void *
> -dlist_tail_element_off(dlist_head *head, size_t off)
> +dlist_tail_element_off(const dlist_head *head, size_t off)
>  {
>   Assert(!dlist_is_empty(head));
>   return (char *) head->head.prev - off;
> @@ -529,7 +529,7 @@ dlist_tail_element_off(dlist_head *head, size_t off)
>   * Return the last node in the list (there must be one).
>   */
>  static inline dlist_node *
> -dlist_tail_node(dlist_head *head)
> +dlist_tail_node(const dlist_head *head)
>  {
>   return (dlist_node *) dlist_tail_element_off(head, 0);
>  }

I don't think it is correct for any of these to add const. The only reason it
works is because of casting etc.


> @@ -801,7 +801,7 @@ dclist_has_prev(dclist_head *head, dlist_node *node)
>   *   Return the next node in the list (there must be one).
>   */
>  static inline dlist_node *
> -dclist_next_node(dclist_head *head, dlist_node *node)
> +dclist_next_node(const dclist_head *head, const dlist_node *node)
>  {
>   Assert(head->count > 0);
>  
> @@ -813,7 +813,7 @@ dclist_next_node(dclist_head *head, dlist_node *node)
>   *   Return the prev node in the list (there must be one).
>   */
>  static inline dlist_node *
> -dclist_prev_node(dclist_head *head, dlist_node *node)
> +dclist_prev_node(const dclist_head *head, const dlist_node *node)
>  {
>   Assert(head->count > 0);
>  
> @@ -822,7 +822,7 @@ dclist_prev_node(dclist_head *head, dlist_node *node)
>  
>  /* internal support function to get address of head element's struct */
>  static inline void *
> -dclist_head_element_off(dclist_head *head, size_t off)
> +dclist_head_element_off(const dclist_head *head, size_t off)
>  {
>   Assert(!dclist_is_empty(head));
>  
> @@ -834,7 +834,7 @@ dclist_head_element_off(dclist_head *head, size_t off)
>   *   Return the first node in the list (there must be one).
>   */
>  static inline dlist_node *
> -dclist_head_node(dclist_head *head)
> +dclist_head_node(const dclist_head *head)
>  {
>   Assert(head->count > 0);
>  
> @@ -843,7 +843,7 @@ dclist_head_node(dclist_head *head)
>  
>  /* internal support function to get address of tail element's struct */
>  static inline void *
> -dclist_tail_element_off(dclist_head *head, size_t off)
> +dclist_tail_element_off(const dclist_head *head, size_t off)
>  {
>   Assert(!dclist_is_empty(head));
>  
> @@ -854,7 +854,7 @@ dclist_tail_element_off(dclist_head *head, size_t off)
>   * Return the last node in the list (there must be one).
>   */
>  static inline dlist_node *
> -dclist_tail_node(dclist_head *head)
> +dclist_tail_node(const dclist_head *head)
>  {
>   Assert(head->count > 0);
>  

Dito.


> @@ -988,7 +988,7 @@ slist_has_next(slist_head *head, slist_node *node)
>   * Return the next node in the list (there must be one).
>   */
>  static inline slist_node *
> -slist_next_node(slist_head *head, slist_node *node)
> +slist_next_node(const slist_head *head, const slist_node *node)
>  {
>   Assert(slist_has_next(head, node));
>   return 

Re: Make mesage at end-of-recovery less scary.

2022-11-22 Thread Andres Freund
Hi,

On 2022-11-18 17:25:37 +0900, Kyotaro Horiguchi wrote:
> Just rebased.

Fails with address sanitizer:
https://cirrus-ci.com/task/5632986241564672

Unfortunately one of the failures is in pg_waldump and we don't seem to
capture its output in 011_crash_recovery. So we don't see the nice formatted
output...

[11:07:18.868] #0  0x7fcf43803ce1 in raise () from 
/lib/x86_64-linux-gnu/libc.so.6
[11:07:18.912] 
[11:07:18.912] Thread 1 (Thread 0x7fcf43662780 (LWP 39124)):
[11:07:18.912] #0  0x7fcf43803ce1 in raise () from 
/lib/x86_64-linux-gnu/libc.so.6
[11:07:18.912] No symbol table info available.
[11:07:18.912] #1  0x7fcf437ed537 in abort () from 
/lib/x86_64-linux-gnu/libc.so.6
[11:07:18.912] No symbol table info available.
[11:07:18.912] #2  0x7fcf43b8511b in __sanitizer::Abort () at 
../../../../src/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:155
[11:07:18.912] No locals.
[11:07:18.912] #3  0x7fcf43b8fce8 in __sanitizer::Die () at 
../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:58
[11:07:18.912] No locals.
[11:07:18.912] #4  0x7fcf43b7244c in 
__asan::ScopedInErrorReport::~ScopedInErrorReport (this=0x7ffd4fde18e6, 
__in_chrg=) at 
../../../../src/libsanitizer/asan/asan_report.cpp:186
[11:07:18.912] buffer_copy = 
{<__sanitizer::InternalMmapVectorNoCtor> = {data_ = 0x7fcf4035 '=' 
, "\n==39124==ERROR: AddressSanitizer: heap-buffer-overflow 
on address 0x62502100 at pc 0x55c36c21e315 bp 0x7ffd4fde2550 sp 
0x7ffd4fde2"..., capacity_bytes_ = 65536, size_ = }, }
...
[11:07:18.912] #6  0x7fcf43b72788 in __asan::__asan_report_load1 
(addr=) at ../../../../src/libsanitizer/asan/asan_rtl.cpp:117
[11:07:18.912] bp = 140725943412048
[11:07:18.912] pc = 
[11:07:18.912] local_stack = 140528180793728
[11:07:18.912] sp = 140725943412040
[11:07:18.912] #7  0x55c36c21e315 in ValidXLogRecordLength 
(state=state@entry=0x61a00680, RecPtr=RecPtr@entry=33655480, 
record=record@entry=0x62500bb8) at xlogreader.c:1126
[11:07:18.912] p = 
[11:07:18.912] pe = 0x62502100 ""
[11:07:18.912] #8  0x55c36c21e3b1 in ValidXLogRecordHeader 
(state=state@entry=0x61a00680, RecPtr=RecPtr@entry=33655480, 
PrevRecPtr=33655104, record=record@entry=0x62500bb8, 
randAccess=randAccess@entry=false) at xlogreader.c:1169
[11:07:18.912] No locals.

The most important bit is "AddressSanitizer: heap-buffer-overflow on address
0x62502100 at pc 0x55c36c21e315 bp 0x7ffd4fde2550 sp 0x7ffd4fde2"

Greetings,

Andres Freund




Re: Allow single table VACUUM in transaction block

2022-11-22 Thread Simon Riggs
On Tue, 22 Nov 2022 at 16:43, Justin Pryzby  wrote:
>
> On Mon, Nov 21, 2022 at 03:07:25PM +, Simon Riggs wrote:
> > Attached patch implements VACUUM (BACKGROUND).
> >
> > There are quite a few small details to consider; please read the docs
> > and comments.
> >
> > There is a noticeable delay before the background vacuum starts.
>
> You disallowed some combinations of unsupported options, but not others,
> like FULL, PARALLEL, etc.  They should either be supported or
> prohibited.
>
> +   /* use default values */
> +   tab.at_params.log_min_duration = 0;
>
> 0 isn't the default ?
>
> Maybe VERBOSE should mean to set min_duration=0, otherwise it should use
> the default ?

+1

> You only handle one rel, but ExecVacuum() has a loop around rels.
>
> +NOTICE:  autovacuum of "vactst" has been requested, using the options 
> specified
>
> => I don't think it's useful to say "using the options specified".
>
> Should autovacuum de-duplicate requests ?
> BRIN doesn't do that, but it's intended for append-only tables, so the
> issue doesn't really come up.

Easy to do

> Could add psql tab-completion.
>
> Is it going to be confusing that the session's GUC variables won't be
> transmitted to autovacuum ?  For example, the freeze and costing
> parameters.

I think we should start with the "how do I want it to behave" parts of
the above and leave spelling and tab completion as final items.

Other questions are whether there should be a limit on number of
background vacuums submitted at any time.
Whether there should be a GUC that specifies the max number of queued tasks.
Do we need a query that shows what items are queued?
etc

Justin, if you wanted to take up the patch from here, I would be more
than happy. You have the knowledge and insight to make this work
right.

We should probably start a new CF patch entry so we can return the
original patch as rejected, then continue with this new idea
separately.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Postgres picks suboptimal index after building of an extended statistics

2022-11-22 Thread Andres Freund
Hi,

On 2022-07-11 12:57:36 +0500, Andrey Lepikhov wrote:
> On 7/8/22 03:07, Tom Lane wrote:
> > Andrey Lepikhov  writes:
> > > On 12/8/21 04:26, Tomas Vondra wrote:
> > > > I wonder if we should teach clauselist_selectivity about UNIQUE indexes,
> > > > and improve the cardinality estimates directly, not just costing for
> > > > index scans.
> > 
> > > I tried to implement this in different ways. But it causes additional
> > > overhead and code complexity - analyzing a list of indexes and match
> > > clauses of each index with input clauses in each selectivity estimation.
> > > I don't like that way and propose a new patch in attachment.
> > 
> > I looked at this briefly.  I do not think that messing with
> > btcostestimate/genericcostestimate is the right response at all.
> > The problem can be demonstrated with no index whatever, as in the
> > attached shortened version of the original example.  I get
> 
> I partly agree with you. Yes, I see the problem too. But we also have a
> problem that I described above: the optimizer doesn't choose a path with minimal
> selectivity from a set of selectivities which show cardinality less than 1
> (see badestimate2.sql).
> The new patch (see attachment) fixes this problem.

This causes the main regression tests to fail due to a planner change:

https://cirrus-ci.com/build/6680222884429824

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out 
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-11-22 
12:27:18.852087140 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 
2022-11-22 12:28:47.934938882 +
@@ -6671,10 +6671,9 @@
Merge Cond: (j1.id1 = j2.id1)
Join Filter: (j2.id2 = j1.id2)
->  Index Scan using j1_id1_idx on j1
-   ->  Index Only Scan using j2_pkey on j2
+   ->  Index Scan using j2_id1_idx on j2
  Index Cond: (id1 >= ANY ('{1,5}'::integer[]))
- Filter: ((id1 % 1000) = 1)
-(7 rows)
+(6 rows)
 
 select * from j1
 inner join j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
 
Greetings,

Andres Freund




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-11-22 Thread Andres Freund
On 2022-11-21 17:06:56 +0900, Masahiko Sawada wrote:
> Sure. I've attached the v10 patches. 0004 is the pure refactoring
> patch and 0005 patch introduces the pointer tagging.

This failed on cfbot, with so many crashes that the VM ran out of disk space
for core dumps. It happened during testing with 32-bit builds, so there's
probably something broken around that.

https://cirrus-ci.com/task/4635135954386944

A failure is e.g. at: 
https://api.cirrus-ci.com/v1/artifact/task/4635135954386944/testrun/build-32/testrun/adminpack/regress/log/initdb.log

performing post-bootstrap initialization ... 
../src/backend/lib/radixtree.c:1696:21: runtime error: member access within 
misaligned address 0x590faf74 for type 'struct radix_tree_control', which 
requires 8 byte alignment
0x590faf74: note: pointer points here
  90 11 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 
00 00 00 00 00 00 00
  ^
==55813==Using libbacktrace symbolizer.
#0 0x56dcc274 in rt_create ../src/backend/lib/radixtree.c:1696
#1 0x56953d1b in tidstore_create ../src/backend/access/common/tidstore.c:57
#2 0x56a1ca4f in dead_items_alloc 
../src/backend/access/heap/vacuumlazy.c:3109
#3 0x56a2219f in heap_vacuum_rel ../src/backend/access/heap/vacuumlazy.c:539
#4 0x56cb77ed in table_relation_vacuum ../src/include/access/tableam.h:1681
#5 0x56cb77ed in vacuum_rel ../src/backend/commands/vacuum.c:2062
#6 0x56cb9a16 in vacuum ../src/backend/commands/vacuum.c:472
#7 0x56cba904 in ExecVacuum ../src/backend/commands/vacuum.c:272
#8 0x5711b6d0 in standard_ProcessUtility ../src/backend/tcop/utility.c:866
#9 0x5711bdeb in ProcessUtility ../src/backend/tcop/utility.c:530
#10 0x5711759f in PortalRunUtility ../src/backend/tcop/pquery.c:1158
#11 0x57117cb8 in PortalRunMulti ../src/backend/tcop/pquery.c:1315
#12 0x571183d2 in PortalRun ../src/backend/tcop/pquery.c:791
#13 0x57111049 in exec_simple_query ../src/backend/tcop/postgres.c:1238
#14 0x57113f9c in PostgresMain ../src/backend/tcop/postgres.c:4551
#15 0x5711463d in PostgresSingleUserMain ../src/backend/tcop/postgres.c:4028
#16 0x56df4672 in main ../src/backend/main/main.c:197
#17 0xf6ad8e45 in __libc_start_main (/lib/i386-linux-gnu/libc.so.6+0x1ae45)
#18 0x5691d0f0 in _start 
(/tmp/cirrus-ci-build/build-32/tmp_install/usr/local/pgsql/bin/postgres+0x3040f0)

Aborted (core dumped)
child process exited with exit code 134
initdb: data directory 
"/tmp/cirrus-ci-build/build-32/testrun/adminpack/regress/tmp_check/data" not 
removed at user's request




Re: Slow standby snapshot

2022-11-22 Thread Simon Riggs
On Tue, 22 Nov 2022 at 16:53, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > On Tue, 22 Nov 2022 at 16:28, Tom Lane  wrote:
> >> If we do those things, do we need a wasted-work counter at all?
>
> > The wasted work counter works well to respond to heavy read-only
> > traffic and also avoids wasted compressions for write-heavy workloads.
> > So I still like it the best.
>
> This argument presumes that maintenance of the counter is free,
> which it surely is not.  I don't know how bad contention on that
> atomically-updated variable could get, but it seems like it could
> be an issue when lots of processes are acquiring snapshots.

I understand. I was assuming that you and Andres liked that approach.

In the absence of that approach, falling back to a counter that
compresses every N xids would be best, in addition to the two new
forced compression events.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn

2022-11-22 Thread Maxim Orlov
> I've attached a draft patch. To fix it, I think we can reset
> InitialRunningXacts and NInitialRunningXacts at FreeSnapshotBuilder()
> and add an assertion in AllocateSnapshotBuilder() to make sure both
> are reset.
>
Thanks for the patch. It works fine. I've tested this patch on versions 15
and 11 on x86_64 and ARM and see no failures. But the function
pg_current_xact_id added by 4c04be9b05ad doesn't exist in PG11.


> Regarding the tests, the patch includes a new scenario to
> reproduce this issue. However, since the issue can be reproduced also
> by the existing scenario (with low probability, though), I'm not sure
> it's worth adding the new scenario.
>
AFAICS, the added test doesn't reproduce this issue 100% of the time, so
maybe it is not worth it. But I do not have a strong opinion here.
Let's add the tests in a separate commit and let the actual committer decide
what to do, shall we?

-- 
Best regards,
Maxim Orlov.


Re: proposal: possibility to read dumped table's name from file

2022-11-22 Thread Pavel Stehule
Hi

On Tue, Nov 22, 2022 at 8:39 AM Andres Freund  wrote:

> Hi,
>
> On 2022-11-13 20:32:47 +0100, Pavel Stehule wrote:
> > updated patch attached
>
> It fails with address sanitizer that's now part of CI:
>
> https://cirrus-ci.com/task/6031397744279552?logs=test_world#L2659
>
> [06:33:11.271] # ==31965==ERROR: AddressSanitizer: heap-buffer-overflow on
> address 0x61900480 at pc 0x559f1ac40822 bp 0x7ffea83e1ad0 sp
> 0x7ffea83e1ac8
> [06:33:11.271] # READ of size 1 at 0x61900480 thread T0
> [06:33:11.271] # #0 0x559f1ac40821 in read_pattern
> /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302
> [06:33:11.271] # #1 0x559f1ac40e4d in filter_read_item
> /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:459
> [06:33:11.271] # #2 0x559f1abe6fa5 in read_dump_filters
> /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:18229
> [06:33:11.271] # #3 0x559f1ac2bb1b in main
> /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:630
> [06:33:11.271] # #4 0x7fd91fabfd09 in __libc_start_main
> (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
> [06:33:11.271] # #5 0x559f1abe5d29 in _start
> (/tmp/cirrus-ci-build/tmp_install/usr/local/pgsql/bin/pg_dump+0x39d29)
> [06:33:11.271] #
> [06:33:11.271] # 0x61900480 is located 0 bytes to the right of
> 1024-byte region [0x61900080,0x61900480)
> [06:33:11.271] # allocated by thread T0 here:
> [06:33:11.271] # #0 0x7fd91fe14e8f in __interceptor_malloc
> ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
> [06:33:11.271] # #1 0x559f1ac69f35 in pg_malloc_internal
> /tmp/cirrus-ci-build/src/common/fe_memutils.c:30
> [06:33:11.271] # #2 0x559f1ac69f35 in palloc
> /tmp/cirrus-ci-build/src/common/fe_memutils.c:117
> [06:33:11.271] #
> [06:33:11.271] # SUMMARY: AddressSanitizer: heap-buffer-overflow
> /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302 in read_pattern
>
>
This should be fixed in the attached patch.

I also found and fixed a small memory leak of 24 bytes per filter row (PQExpBufferData).

Regards

Pavel


>
> Greetings,
>
> Andres Freund
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8b9d9f4cad..d5a6e2c7ee 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | table_data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with 

Re: Slow standby snapshot

2022-11-22 Thread Tom Lane
Simon Riggs  writes:
> On Tue, 22 Nov 2022 at 16:28, Tom Lane  wrote:
>> If we do those things, do we need a wasted-work counter at all?

> The wasted work counter works well to respond to heavy read-only
> traffic and also avoids wasted compressions for write-heavy workloads.
> So I still like it the best.

This argument presumes that maintenance of the counter is free,
which it surely is not.  I don't know how bad contention on that
atomically-updated variable could get, but it seems like it could
be an issue when lots of processes are acquiring snapshots.

regards, tom lane




Re: Allow single table VACUUM in transaction block

2022-11-22 Thread Justin Pryzby
On Mon, Nov 21, 2022 at 03:07:25PM +, Simon Riggs wrote:
> Attached patch implements VACUUM (BACKGROUND).
> 
> There are quite a few small details to consider; please read the docs
> and comments.
> 
> There is a noticeable delay before the background vacuum starts.

You disallowed some combinations of unsupported options, but not others,
like FULL, PARALLEL, etc.  They should either be supported or
prohibited.

+   /* use default values */
+   tab.at_params.log_min_duration = 0;

0 isn't the default ?

Maybe VERBOSE should mean to set min_duration=0, otherwise it should use
the default ?

You only handle one rel, but ExecVacuum() has a loop around rels.

+NOTICE:  autovacuum of "vactst" has been requested, using the options specified

=> I don't think it's useful to say "using the options specified".

Should autovacuum de-duplicate requests ?
BRIN doesn't do that, but it's intended for append-only tables, so the
issue doesn't really come up.

Could add psql tab-completion.

Is it going to be confusing that the session's GUC variables won't be
transmitted to autovacuum ?  For example, the freeze and costing
parameters.

-- 
Justin




Re: fixing CREATEROLE

2022-11-22 Thread Joe Conway

On 11/21/22 15:39, Robert Haas wrote:

I'm curious to hear what other people think of these proposals, but
let me first say what I think about them. First, I think it's clear
that we need to do something, because things right now are pretty
badly broken and in a way that affects security. Although these
patches are not back-patchable, they at least promise to improve
things as older versions go out of use.


+1


Second, it's possible that we should look for back-patchable fixes
here, but I can't really see that we're going to come up with
anything much better than just telling people not to use this feature
against older releases, because back-patching catalog changes or
dramatic behavior changes seems like a non-starter. In other words, I
think this is going to be a master-only fix.


Yep, seems highly likely


Third, someone could well have a better or just different idea how to
fix the problems in this area than what I'm proposing here. This is
the best that I've been able to come up with so far, but that's not
to say it's free of problems or that no improvements are possible.


On quick inspection I like what you have proposed and no significantly 
"better" ideas jump to mind. I will try to think on it though.



Finally, I think that whatever we do about the code, the documentation
needs quite a bit of work, because the code is doing a lot of stuff
that is security-critical and entirely non-obvious from the
documentation. I have not in this version of these patches included
any documentation changes and the regression test changes that I have
included are quite minimal. That all needs to be fixed up before there
could be any thought of moving forward with these patches. However, I
thought it best to get rough patches and an outline of the proposed
direction on the table first, before doing a lot of work refining
things.


I have looked at, and even made some doc improvements to, this area in
the past, and concluded that it is simply hard to describe in a
clear, straightforward way.


There are multiple competing concepts (privs on objects, attributes of 
roles, membership, when things are inherited versus not, settings bound 
to roles, etc). I don't know what to do about it, but yeah, fixing the 
documentation would be a noble goal.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Slow standby snapshot

2022-11-22 Thread Simon Riggs
On Tue, 22 Nov 2022 at 16:28, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > We seem to have replaced one magic constant with another, so not sure
> > if this is autotuning, but I like it much better than what we had
> > before (i.e. better than my prev patch).
>
> Yeah, the magic constant is still magic, even if it looks like it's
> not terribly sensitive to the exact value.
>
> > 1. I was surprised that you removed the limits on size and just had
> > the wasted work limit. If there is no read traffic that will mean we
> > hardly ever compress, which means the removal of xids at commit will
> > get slower over time.  I would prefer that we forced compression on a
> > regular basis, such as every time we process an XLOG_RUNNING_XACTS
> > message (every 15s), as well as when we hit certain size limits.
>
> > 2. If there is lots of read traffic but no changes flowing, it would
> > also make sense to force compression when the startup process goes
> > idle rather than wait for the work to be wasted first.
>
> If we do those things, do we need a wasted-work counter at all?
>
> I still suspect that 90% of the problem is the max_connections
> dependency in the existing heuristic, because of the fact that
> you have to push max_connections to the moon before it becomes
> a measurable problem.  If we do
>
> -if (nelements < 4 * PROCARRAY_MAXPROCS ||
> -nelements < 2 * pArray->numKnownAssignedXids)
> +if (nelements < 2 * pArray->numKnownAssignedXids)
>
> and then add the forced compressions you suggest, where
> does that put us?

The forced compressions I propose happen
* when idle - since we have time to do it when that happens, which
happens often since most workloads are bursty
* every 15s - since we already have the lock
which is overall much less often than every 64 commits, as benchmarked
by Michail.
I didn't mean to imply that this superseded the wasted-work approach; it
was meant to be in addition to it.

The wasted work counter works well to respond to heavy read-only
traffic and also avoids wasted compressions for write-heavy workloads.
So I still like it the best.

> Also, if we add more forced compressions, it seems like we should have
> a short-circuit for a forced compression where there's nothing to do.
> So more or less like
>
> nelements = head - tail;
> if (!force)
> {
> if (nelements < 2 * pArray->numKnownAssignedXids)
> return;
> }
> else
> {
> if (nelements == pArray->numKnownAssignedXids)
> return;
> }

+1

> I'm also wondering why there's not an
>
> Assert(compress_index == pArray->numKnownAssignedXids);
>
> after the loop, to make sure our numKnownAssignedXids tracking
> is sane.

+1

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Tom Lane
Simon Riggs  writes:
> New patch version reporting for duty, sir. Please take it from here!

Why the CHECK_FOR_INTERRUPTS?  I'd supposed that there's going to be
one somewhere down inside the index or heap access --- do you have
reason to think there isn't?

Is it appropriate to count distinct pages, rather than just the
number of times we have to visit a heap tuple?  That seems to
complicate the logic a good deal, and I'm not sure it's buying
much, especially since (as you noted) it's imprecise anyway.

regards, tom lane




Re: fixing CREATEROLE

2022-11-22 Thread Tom Lane
walt...@technowledgy.de writes:
>> No, we don't support partial indexes on catalogs, and I don't think
>> we want to change that.  Partial indexes would require expression
>> evaluations occurring at very inopportune times.

> I see. Is that the same for indexes *on* an expression? Or would those 
> be ok?

Right, we don't support those on catalogs either.

> Now, you are going to tell me that EXCLUDE constraints are not supported 
> on catalogs either, right? ;)

Nor those.

regards, tom lane




Re: Slow standby snapshot

2022-11-22 Thread Tom Lane
Simon Riggs  writes:
> We seem to have replaced one magic constant with another, so not sure
> if this is autotuning, but I like it much better than what we had
> before (i.e. better than my prev patch).

Yeah, the magic constant is still magic, even if it looks like it's
not terribly sensitive to the exact value.

> 1. I was surprised that you removed the limits on size and just had
> the wasted work limit. If there is no read traffic that will mean we
> hardly ever compress, which means the removal of xids at commit will
> get slower over time.  I would prefer that we forced compression on a
> regular basis, such as every time we process an XLOG_RUNNING_XACTS
> message (every 15s), as well as when we hit certain size limits.

> 2. If there is lots of read traffic but no changes flowing, it would
> also make sense to force compression when the startup process goes
> idle rather than wait for the work to be wasted first.

If we do those things, do we need a wasted-work counter at all?

I still suspect that 90% of the problem is the max_connections
dependency in the existing heuristic, because of the fact that
you have to push max_connections to the moon before it becomes
a measurable problem.  If we do

-if (nelements < 4 * PROCARRAY_MAXPROCS ||
-nelements < 2 * pArray->numKnownAssignedXids)
+if (nelements < 2 * pArray->numKnownAssignedXids)

and then add the forced compressions you suggest, where
does that put us?

Also, if we add more forced compressions, it seems like we should have
a short-circuit for a forced compression where there's nothing to do.
So more or less like

nelements = head - tail;
if (!force)
{
if (nelements < 2 * pArray->numKnownAssignedXids)
return;
}
else
{
if (nelements == pArray->numKnownAssignedXids)
return;
}

I'm also wondering why there's not an

Assert(compress_index == pArray->numKnownAssignedXids);

after the loop, to make sure our numKnownAssignedXids tracking
is sane.

regards, tom lane




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Simon Riggs
On Mon, 21 Nov 2022 at 20:55, Robert Haas  wrote:
>
> On Mon, Nov 21, 2022 at 1:17 PM Andres Freund  wrote:
> > On November 21, 2022 9:37:34 AM PST, Robert Haas  
> > wrote:
> > >On Mon, Nov 21, 2022 at 12:30 PM Andres Freund  wrote:
> > >> This can't quite be right - isn't this only applying the limit if we 
> > >> found a
> > >> visible tuple?
> > >
> > >It doesn't look that way to me, but perhaps I'm just too dense to see
> > >the problem?
> >
> > The earlier version didn't have the issue, but the latest seems to only 
> > limit after a visible tuple has been found. Note the continue; when 
> > fetching a heap tuple fails.
>
> Oh, that line was removed in Simon's patch but not in Jakub's version,
> I guess. Jakub's version also leaves out the last_block = block line
> which seems pretty critical.

New patch version reporting for duty, sir. Please take it from here!

-- 
Simon Riggs    http://www.EnterpriseDB.com/


0001-Damage-control-for-planner-s-get_actual_variable_end.v3.patch
Description: Binary data


Re: fixing CREATEROLE

2022-11-22 Thread walther

Wolfgang Walther:

Tom Lane:

No, we don't support partial indexes on catalogs, and I don't think
we want to change that.  Partial indexes would require expression
evaluations occurring at very inopportune times.


I see. Is that the same for indexes *on* an expression? Or would those 
be ok?


With a custom operator, an EXCLUDE constraint on the ROW(reldatabase, 
relname) expression could work. The operator would compare:

- (0, name1) and (0, name2) as name1 == name2
- (db1, name1) and (0, name2) as name1 == name2
- (0, name1) and (db2, name2) as name1 == name2
- (db1, name1) and (db2, name2) as db1 == db2 && name1 == name2

or just (db1 == 0 || db2 == 0 || db1 == db2) && name1 == name2.


Does it even need to be on the expression? I don't think so. It would be 
enough to just make it compare relname WITH = and reldatabase WITH the 
custom operator (db1 == 0 || db2 == 0 || db1 == db2), right?
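
A hedged sketch of just the operator half of that idea, on ordinary types
(the function and operator names are made up, and the EXCLUDE constraint
itself would additionally need an operator class for an index access method,
which is omitted here):

    -- "Same database or wildcard" match; 0 acts as "any database".
    CREATE FUNCTION db_match(oid, oid) RETURNS boolean
        LANGUAGE sql IMMUTABLE
        AS 'SELECT $1 = 0::oid OR $2 = 0::oid OR $1 = $2';

    CREATE OPERATOR === (
        LEFTARG = oid,
        RIGHTARG = oid,
        FUNCTION = db_match,
        COMMUTATOR = ===
    );

    SELECT 1::oid === 0::oid AS wildcard_match,   -- true
           1::oid === 2::oid AS different_dbs;    -- false

    -- The intended constraint would then conceptually be:
    --   EXCLUDE (relname WITH =, reldatabase WITH ===)
    -- rejecting two rows with the same name whenever their databases match
    -- or either side is the 0 ("all databases") wildcard.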


Best,

Wolfgang




Re: Partial aggregates pushdown

2022-11-22 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote on 2022-11-22 04:01:

Hi Mr.Vondra, Mr.Pyhalov, Everyone.

I discussed the above draft with Mr. Pyhalov by mailing him directly
(outside of pgsql-hackers). Mr. Pyhalov allowed me to update his patch
along with the above draft. So I have updated Mr. Pyhalov's patch v10.



Hi, Yuki. Thank you for your work on this.

I've looked through the patch. Overall I like this approach, but have 
the following comments.


1) Why should we require partialaggfn for min()/max()/count()? We could 
just use the original functions for a lot of aggregates, and so it would be 
possible to push down some partial aggregates to older servers. I'm not 
sure that it's a strict requirement, but it's a nice thing to think about. 
Can we use the function itself as partialaggfn, for example, for 
sum(int4)? For functions with an internal aggtranstype (like sum(int8)) it 
would be more difficult.


2) fpinfo->server_version is not aggregated when, for example, we form 
fpinfo in foreign_join_ok(); it seems we should spread it to more places 
in postgres_fdw.c.


3) In add_foreign_grouping_paths() it seems there's no need for an 
additional argument; we can look at extra->patype. Also, the Assert() in 
add_foreign_grouping_paths() will fire in an --enable-cassert build.


4) Why do you modify the lookup_agg_function() signature? I don't see tests 
showing that it's necessary. Perhaps more precise function naming 
should be used instead?


5) In tests:
 - Why does version_num have the "name" type in the 
f_alter_server_version() function?
 - You modify the server_version option of the 'loopback' server, but 
don't reset it after the test. This could affect further tests.
 - "It's unsafe to push down partial aggregates with distinct" 
in postgres_fdw.sql:3002 seems to be misleading.

3001
3002 -- It's unsafe to push down partial aggregates with distinct
3003 SELECT f_alter_server_version('loopback', 'set', -1);
3004 EXPLAIN (VERBOSE, COSTS OFF)
3005 SELECT avg(d) FROM pagg_tab;
3006 SELECT avg(d) FROM pagg_tab;
3007 select * from pg_foreign_server;

6) While looking at it, I could cause a crash with something like

CREATE TYPE COMPLEX AS (re FLOAT, im FLOAT);

CREATE OR REPLACE FUNCTION
sum_complex (sum complex, el complex)
RETURNS complex AS
$$
DECLARE
s complex;
BEGIN
if el is not null and sum is not null then
sum.re:=coalesce(sum.re,0)+el.re;
sum.im:=coalesce(sum.im,0)+el.im;
end if;
RETURN sum;
END;
$$ LANGUAGE plpgSQL;

CREATE AGGREGATE SUM(COMPLEX) (
SFUNC=sum_complex,
STYPE=complex,
partialaggfunc=scomplex,
partialagg_minversion=1400
);

where scomplex is something nonexistent


enforce_generic_type_consistency (actual_arg_types=0x56269873d200, 
declared_arg_types=0x0, nargs=1, rettype=0, allow_poly=true) at 
parse_coerce.c:2132
2132Oid decl_type = 
declared_arg_types[j];

(gdb) bt
#0  enforce_generic_type_consistency (actual_arg_types=0x56269873d200, 
declared_arg_types=0x0, nargs=1, rettype=0, allow_poly=true) at 
parse_coerce.c:2132
#1  0x5626960072de in lookup_agg_function (fnName=0x5626986715a0, 
nargs=1, input_types=0x56269873d200, variadicArgType=0, 
rettype=0x7ffd1a4045d8, only_normal=false) at pg_aggregate.c:916
#2  0x5626960064ba in AggregateCreate (aggName=0x562698671000 "sum", 
aggNamespace=2200, replace=false, aggKind=110 'n', numArgs=1, 
numDirectArgs=0, parameterTypes=0x56269873d1e8, allParameterTypes=0, 
parameterModes=0,
parameterNames=0, parameterDefaults=0x0, variadicArgType=0, 
aggtransfnName=0x5626986712c0, aggfinalfnName=0x0, aggcombinefnName=0x0, 
aggserialfnName=0x0, aggdeserialfnName=0x0, aggmtransfnName=0x0, 
aggminvtransfnName=0x0,
aggmfinalfnName=0x0, partialaggfnName=0x5626986715a0, 
finalfnExtraArgs=false, mfinalfnExtraArgs=false, finalfnModify=114 'r', 
mfinalfnModify=114 'r', aggsortopName=0x0, aggTransType=16390, 
aggTransSpace=0, aggmTransType=0,
aggmTransSpace=0, partialaggMinversion=1400, agginitval=0x0, 
aggminitval=0x0, proparallel=117 'u') at pg_aggregate.c:582
#3  0x5626960a1e1c in DefineAggregate (pstate=0x56269869ab48, 
name=0x562698671038, args=0x5626986711b0, oldstyle=false, 
parameters=0x5626986713b0, replace=false) at aggregatecmds.c:450
#4  0x56269643061f in ProcessUtilitySlow (pstate=0x56269869ab48, 
pstmt=0x562698671a68,
queryString=0x5626986705d8 "CREATE AGGREGATE SUM(COMPLEX) 
(\nSFUNC=sum_complex,\nSTYPE=COMPLEX,\npartialaggfunc=scomplex,\npartialagg_minversion=1400\n);", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,

dest=0x562698671b48, qc=0x7ffd1a4053c0) at utility.c:1407
#5  0x56269642fbb4 in standard_ProcessUtility (pstmt=0x562698671a68, 
queryString=0x5626986705d8 "CREATE AGGREGATE SUM(COMPLEX) 
(\nSFUNC=sum_complex,\nSTYPE=COMPLEX,\npartialaggfunc=scomplex,\npartialagg_minversion=1400\n);",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, 
queryEnv=0x0, dest=0x562698671b48, qc=0x7ffd1a4053c0) at utility.c:1074



Later will look at it 

Re: fixing CREATEROLE

2022-11-22 Thread walther

Tom Lane:

No, we don't support partial indexes on catalogs, and I don't think
we want to change that.  Partial indexes would require expression
evaluations occurring at very inopportune times.


I see. Is that the same for indexes *on* an expression? Or would those 
be ok?


With a custom operator, an EXCLUDE constraint on the ROW(reldatabase, 
relname) expression could work. The operator would compare:

- (0, name1) and (0, name2) as name1 == name2
- (db1, name1) and (0, name2) as name1 == name2
- (0, name1) and (db2, name2) as name1 == name2
- (db1, name1) and (db2, name2) as db1 == db2 && name1 == name2

or just (db1 == 0 || db2 == 0 || db1 == db2) && name1 == name2.

Now, you are going to tell me that EXCLUDE constraints are not supported 
on catalogs either, right? ;)


Best,

Wolfgang




Re: Code checks for App Devs, using new options for transaction behavior

2022-11-22 Thread Simon Riggs
On Mon, 7 Nov 2022 at 14:25, Simon Riggs  wrote:
>
> On Wed, 2 Nov 2022 at 07:40, Simon Riggs  wrote:
>
> > > What will be the behavior if someone declares a savepoint with this
> > > name ("_internal_nested_xact").  Will this interfere with this new
> > > functionality?
> >
> > Clearly! I guess you are saying we should disallow them.
> >
> > > Have we tested scenarios like that?
> >
> > No, but that can be done.
>
> More tests as requested, plus minor code rework, plus comment updates.

New versions

-- 
Simon Riggs    http://www.EnterpriseDB.com/


001_psql_parse_only.v1.patch
Description: Binary data


002_nested_xacts.v11.patch
Description: Binary data


003_rollback_on_commit.v2.patch
Description: Binary data


Re: ssl tests aren't concurrency safe due to get_free_port()

2022-11-22 Thread Andrew Dunstan


On 2022-11-20 Su 14:05, Andres Freund wrote:
>> If it works ok I will backpatch in couple of days.
> +1



Done.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: dynamic result sets support in extended query protocol

2022-11-22 Thread Peter Eisentraut

On 14.10.22 09:11, Peter Eisentraut wrote:
Now that the psql support for multiple result sets exists, I want to 
revive this patch.  It's the same as the last posted version, except now 
it doesn't require any psql changes or any weird test modifications 
anymore.


(Old news: This patch allows declaring a cursor WITH RETURN in a 
procedure to make the cursor's data be returned as a result of the CALL 
invocation.  The procedure needs to be declared with the DYNAMIC RESULT 
SETS attribute.)
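
A rough usage sketch based on that description (the procedure and table
names are hypothetical, and since this is an in-progress patch the exact
syntax of the DYNAMIC RESULT SETS clause and the WITH RETURN cursor option
should be checked against the patch itself):

    CREATE PROCEDURE show_items()
    LANGUAGE plpgsql
    DYNAMIC RESULT SETS 1
    AS $$
    DECLARE
        c CURSOR WITH RETURN FOR SELECT * FROM items;
    BEGIN
        OPEN c;
    END;
    $$;

    CALL show_items();   -- the cursor's rows come back as a result of the CALL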


I added tests using the new psql \bind command to test this 
functionality in the extended query protocol, which showed that this got 
broken since I first wrote this patch.  This "blame" is on the pipeline 
mode in libpq patch (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659).  I need 
to spend more time on this and figure out how to repair it.  In the 
meantime, here is an updated patch set with the current status.
From 31ab22f7f4aab5666abec9c9b0f8e2e9a8e6421a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 14 Oct 2022 09:01:17 +0200
Subject: [PATCH v6 1/2] Dynamic result sets from procedures

Declaring a cursor WITH RETURN in a procedure makes the cursor's data be
returned as a result of the CALL invocation.  The procedure needs to
be declared with the DYNAMIC RESULT SETS attribute.

Discussion: 
https://www.postgresql.org/message-id/flat/6e747f98-835f-2e05-cde5-86ee444a7...@2ndquadrant.com
---
 doc/src/sgml/catalogs.sgml| 10 +++
 doc/src/sgml/information_schema.sgml  |  3 +-
 doc/src/sgml/plpgsql.sgml | 27 +-
 doc/src/sgml/protocol.sgml| 19 +
 doc/src/sgml/ref/alter_procedure.sgml | 12 +++
 doc/src/sgml/ref/create_procedure.sgml| 14 +++
 doc/src/sgml/ref/declare.sgml | 34 +++-
 src/backend/catalog/information_schema.sql|  2 +-
 src/backend/catalog/pg_aggregate.c|  3 +-
 src/backend/catalog/pg_proc.c |  4 +-
 src/backend/catalog/sql_features.txt  |  2 +-
 src/backend/commands/functioncmds.c   | 79 +++--
 src/backend/commands/portalcmds.c | 23 +
 src/backend/commands/typecmds.c   | 12 ++-
 src/backend/parser/gram.y | 18 +++-
 src/backend/tcop/postgres.c   | 61 -
 src/backend/tcop/pquery.c |  6 ++
 src/backend/utils/errcodes.txt|  1 +
 src/backend/utils/mmgr/portalmem.c| 48 +++
 src/bin/pg_dump/pg_dump.c | 16 +++-
 src/include/catalog/pg_proc.h |  6 +-
 src/include/commands/defrem.h |  1 +
 src/include/nodes/parsenodes.h|  1 +
 src/include/parser/kwlist.h   |  2 +
 src/include/utils/portal.h| 14 +++
 src/interfaces/libpq/fe-protocol3.c   |  6 +-
 src/pl/plpgsql/src/expected/plpgsql_call.out  | 78 +
 src/pl/plpgsql/src/pl_exec.c  |  6 ++
 src/pl/plpgsql/src/pl_gram.y  | 58 +++--
 src/pl/plpgsql/src/pl_unreserved_kwlist.h |  2 +
 src/pl/plpgsql/src/sql/plpgsql_call.sql   | 46 ++
 .../regress/expected/create_procedure.out | 85 ++-
 src/test/regress/sql/create_procedure.sql | 61 -
 33 files changed, 719 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9ed2b020b7d9..1fd7ff81a764 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6031,6 +6031,16 @@ pg_proc Columns
   
  
 
+ 
+  
+   prodynres int4
+  
+  
+   For procedures, this records the maximum number of dynamic result sets
+   the procedure may create.  Otherwise zero.
+  
+ 
+
  
   
pronargs int2
diff --git a/doc/src/sgml/information_schema.sgml 
b/doc/src/sgml/information_schema.sgml
index 350c75bc31ef..5fc9dc22aeff 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -5885,7 +5885,8 @@ routines Columns
max_dynamic_result_sets 
cardinal_number
   
   
-   Applies to a feature not available in 
PostgreSQL
+   For a procedure, the maximum number of dynamic result sets.  Otherwise
+   zero.
   
  
 
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index dda667e68e8c..c9129559a570 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -3128,7 +3128,7 @@ Declaring Cursor Variables
  Another way is to use the cursor declaration syntax,
  which in general is:
 
-name   NO  SCROLL 
 CURSOR  ( arguments ) 
 FOR query;
+name   NO  SCROLL 
 CURSOR   WITH RETURN  ( 
arguments )  FOR 
query;
 
  (FOR can be replaced by IS for
  Oracle compatibility.)
@@ -3136,6 +3136,10 @@ Declaring Cursor Variables
  scrolling backward; if NO SCROLL is specified, backward
  fetches will be 

Re: psql: Add command to use extended query protocol

2022-11-22 Thread Peter Eisentraut

On 21.11.22 23:02, Corey Huinker wrote:
I got thinking about this, and while things may be fine as-is, I would 
like to hear some opinions as to whether this behavior is correct:


This is all psql syntax, nothing specific to this command.  The only 
leeway is choosing the appropriate enum slash_option_type, but the 
choices other than OT_NORMAL don't seem to be particularly applicable to 
this.






Re: Logical Replication Custom Column Expression

2022-11-22 Thread Stavros Koureas
Sure, this can be implemented as a subscription option, and it will cover
this use-case scenario since each subscriber points to only one database.
I also have some more analytical/reporting use cases that need additions
to logical replication; I am not sure whether I should open a separate
discussion for each one. All of the ideas concern publication/subscription.
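
For illustration only, a subscriber-side variant could look something like
this (the option names here are purely hypothetical, nothing of the sort
exists today):

CREATE SUBSCRIPTION sub_tenant1
    CONNECTION 'host=tenant1.example.com dbname=app'
    PUBLICATION pb_test
    WITH (origin_column = 'tenant', origin_value = '1');  -- hypothetical options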

On Tue, 22 Nov 2022 at 2:22 PM, Amit Kapila <
amit.kapil...@gmail.com> wrote:

> On Mon, Nov 21, 2022 at 5:05 PM Ashutosh Bapat
>  wrote:
> >
> > On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas
> >  wrote:
> > >
> > > What does not support is the option for defining custom column
> expressions, as keys or values, into the upstream (publication). This will
> give more flexibility into making replication from multiple upstreams into
> less downstreams adding more logic. For instance, in a project for
> analytical purposes there is the need to consolidate data from multiple
> databases into one and at the same time keep the origin of each replicated
> data identified by a tenant_id column. In this case we also need the
> ability to define the new column as an additional key which will
> participate into the destination table.
> > >
> > > Tenant 1 table
> > > id serial pk
> > > description varchar
> > >
> > > Tenant 2 table
> > > id integer pk
> > > description varchar
> > >
> > > Group table
> > > tenant integer pk
> > > id integer pk
> > > description varchar
> > >
> > > Possible syntax to achieve that
> > > CREATE PUBLICATION pb_test FOR TABLE test
> ({value:datatype:iskey:alias} ,"id", "name")
> > >
> > > Example
> > > CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant}
> ,"id", "name")
> >
> > I think that's a valid usecase.
> >
> > This looks more like a subscription option to me. In multi-subscriber
> > multi-publisher scenarios, on one subscriber a given upstream may be
> > tenant 1 but on some other it could be 2. But I don't think we allow
> > specifying subscription options for a single table. AFAIU, the origin
> > ids are available as part of the commit record which contained this
> > change; that's how conflict resolution is supposed to know it. So
> > somehow the subscriber will need to fetch those from there and set the
> > tenant.
> >
>
> Yeah, to me also it appears that we can handle it on the subscriber
> side. We have the provision of sending origin information in proto.c.
> But note that by default publishers won't have any origin associated
> with change unless someone has defined it. I think this work needs
> more thought but sounds to be an interesting feature.
>
> --
> With Regards,
> Amit Kapila.
>


Re: fixing CREATEROLE

2022-11-22 Thread Tom Lane
walt...@technowledgy.de writes:
> Robert Haas:
>> 2. There are some serious implementation challenges because the
>> constraints on duplicate object names must be something which can be
>> enforced by unique constraints on the relevant catalogs. Off-hand, I
>> don't see how to do that.

> For each database created, create a partial unique index:
> CREATE UNIQUE INDEX ... ON pg_authid (rolname) WHERE roldatabase IN (0, 
> <dboid>);
> Is that possible on catalogs?

No, we don't support partial indexes on catalogs, and I don't think
we want to change that.  Partial indexes would require expression
evaluations occurring at very inopportune times.

Also, we don't support creating shared indexes post-initdb.
The code has hard-wired lists of which relations are shared,
besides which there's no way to update other databases' pg_class.

Even without that, the idea of a shared catalog ending up with 10001
indexes after you create 10000 databases (requiring 10^8 pg_class
entries across the whole cluster) seems ... unattractive.

regards, tom lane




Re: fixing CREATEROLE

2022-11-22 Thread walther

Robert Haas:

2. There are some serious implementation challenges because the
constraints on duplicate object names must be something which can be
enforced by unique constraints on the relevant catalogs. Off-hand, I
don't see how to do that. It would be easy to make the cluster roles
all have unique names, and it would be easy to make the database roles
have unique names within each database, but I have no idea how you
would keep a database role from having the same name as a cluster
role. For anyone to try to implement this, we'd need to have a
solution to that problem.


For each database created, create a partial unique index:

CREATE UNIQUE INDEX ... ON pg_authid (rolname) WHERE roldatabase IN (0, 
<dboid>);


Is that possible on catalogs?

Best,

Wolfgang




Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-22 Thread Dilip Kumar
On Tue, Nov 22, 2022 at 7:44 PM Aleksander Alekseev
 wrote:
>
> Hi hackers,
>
> [ Excluding my personal e-mail from cc:, not sure how it got there.
> Please don't cc: to afis...@gmail.com, I'm not using it for reading
> pgsql-hackers@. ]
>
> > I agree with Alexander, that notifications for DBA are a little bit
> > outside the scope of the activity in this thread unless we've just
> > dropped some existing notifications, considering they're not
> > significant anymore. If that was the point, please Chris mention what
> > existing notifications you want to return. I don't think it's a big
> > deal to have the patch with certain notifications inherited from
> > Master branch.
>
> To clarify a bit: currently we DO notify the user about the upcoming
> wraparound point [1]:
>
> """
> If for some reason autovacuum fails to clear old XIDs from a table,
> the system will begin to emit warning messages like this when the
> database's oldest XIDs reach forty million transactions from the
> wraparound point:
>
> WARNING:  database "mydb" must be vacuumed within 39985967 transactions
> HINT:  To avoid a database shutdown, execute a database-wide VACUUM in
> that database.
> """
>
> So I'm not sure how the notification Chris proposes should differ or
> why it is in scope of this patch. If the point was to make sure
> certain existing notifications will be preserved - sure, why not.

IMHO, after having 64-bit XIDs this WARNING doesn't really make sense.
Those warnings exist because the limits were problematic with 32-bit
XIDs, but now they are not, so I think we should not have such warnings.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Simon Riggs
On Mon, 21 Nov 2022 at 22:15, Tom Lane  wrote:
>
> Andres Freund  writes:
> > On 2022-11-21 16:17:56 -0500, Robert Haas wrote:
> >> But ... what if they're not? Could the index contain a large number of
> >> pages containing just 1 tuple each, or no tuples at all? If so, maybe
> >> we can read ten bazillion index pages trying to find each heap tuple
> >> and still end up in trouble.
>
> > ISTM that if you have an index in such a poor condition that a single
> > value lookup reads thousands of pages inside the index, planner
> > estimates taking long is going to be the smallest of your worries...
>
> Yeah, that sort of situation is going to make any operation on the
> index slow, not only get_actual_variable_endpoint().

That was also my conclusion: this is actually a common antipattern for
our indexes, not anything specific to the planner.

In another recent situation, I saw a very bad case of performance for
a "queue table". In that use case the rows are inserted at head and
removed from tail. Searching for the next item to be removed from the
queue involves an increasingly long tail search, in the case that a
long running transaction prevents us from marking the index entries
killed. Many tables exhibit such usage, for example, the neworder
table in TPC-C.
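
For concreteness, the usage pattern in question looks roughly like this
(a minimal sketch, not taken from TPC-C itself):

CREATE TABLE queue (id bigserial PRIMARY KEY, payload text);

-- producers insert at the head (the rightmost leaf page of the index)
INSERT INTO queue (payload) VALUES ('new work item');

-- consumers repeatedly remove the oldest entry (the leftmost leaf pages);
-- with a long-running transaction open elsewhere the dead index entries
-- cannot be marked killed, so this scan walks an ever-growing tail of
-- deleted keys before finding a live one
DELETE FROM queue
 WHERE id = (SELECT min(id) FROM queue)
 RETURNING payload;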

We optimized the case of frequent insertions into the rightmost index
page; now we also need to optimize the case of a long series of
deletions from the leftmost index pages. Not sure how, just framing
the problem.


--
Simon Riggs                http://www.EnterpriseDB.com/




Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-22 Thread Aleksander Alekseev
Hi hackers,

[ Excluding my personal e-mail from cc:, not sure how it got there.
Please don't cc: to afis...@gmail.com, I'm not using it for reading
pgsql-hackers@. ]

> I agree with Alexander, that notifications for DBA are a little bit
> outside the scope of the activity in this thread unless we've just
> dropped some existing notifications, considering they're not
> significant anymore. If that was the point, please Chris mention what
> existing notifications you want to return. I don't think it's a big
> deal to have the patch with certain notifications inherited from
> Master branch.

To clarify a bit: currently we DO notify the user about the upcoming
wraparound point [1]:

"""
If for some reason autovacuum fails to clear old XIDs from a table,
the system will begin to emit warning messages like this when the
database's oldest XIDs reach forty million transactions from the
wraparound point:

WARNING:  database "mydb" must be vacuumed within 39985967 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in
that database.
"""

So I'm not sure how the notification Chris proposes should differ or
why it is in scope of this patch. If the point was to make sure
certain existing notifications will be preserved - sure, why not.

[1]: 
https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

-- 
Best regards,
Aleksander Alekseev




Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-22 Thread Pavel Borisov
Hi, Alexander!

On Tue, 22 Nov 2022 at 13:01, Aleksander Alekseev
 wrote:
>
> Hi Chris,
>
> > Right now the way things work is:
> > 1.  Database starts throwing warnings that xid wraparound is approaching
> > 2.  Database-owning team initiates an emergency response, may take downtime 
> > or degradation of services as a result
> > 3.  People get frustrated with PostgreSQL because this is a reliability 
> > problem.
> >
> > What I am worried about is:
> > 1.  Database is running out of space
> > 2.  Database-owning team initiates an emergency response and takes more 
> > downtime to into a good spot
> > 3.  People get frustrated with PostgreSQL because this is a reliability 
> > problem.
> >
> > If that's the way we go, I don't think we've solved that much.  And as 
> > humans we also bias our judgments towards newsworthy events, so rarer, more 
> > severe problems are a larger perceived problem than the more routine, less 
> > severe problems.  So I think our image as a reliable database would suffer.
> >
> > An ideal resolution from my perspective would be:
> > 1.  Database starts throwing warnings that xid lag has reached severely 
> > abnormal levels
> > 2.  Database owning team initiates an effort to correct this, and does not 
> > take downtime or degradation of services as a result
> > 3.  People do not get frustrated because this is not a reliability problem 
> > anymore.
> >
> > Now, 64-big xids are necessary to get us there but they are not sufficient. 
> >  One needs to fix the way we handle this sort of problem.  There is 
> > existing logic to warn if we are approaching xid wraparound.  This should 
> > be changed to check how many xids we have used rather than remaining and 
> > have a sensible default there (optionally configurable).
> >
> > I agree it is not vacuum's responsibility.  It is the responsibility of the 
> > current warnings we have to avoid more serious problems arising from this 
> > change.  These should just be adjusted rather than dropped.
>
> I disagree with the axiom that XID wraparound is merely a symptom and
> not a problem.
>
> Using 32-bit XIDs was a reasonable design decision back when disk
> space was limited and disks were slow. The drawback of this approach
> is the need to do the wraparound, but again, back then it was a
> reasonable design choice. If XIDs were 64-bit from the beginning, users
> could run one billion (1,000,000,000) TPS for 584 years without a
> wraparound. We wouldn't have wraparound at all, similar to how there is
> no wraparound for WAL segments. Now that disks are much faster and much
> cheaper, 32-bit XIDs are almost certainly not a good design choice anymore.
> (Especially considering the fact that this particular patch mitigates
> the problem of increased disk consumption greatly.)
>
> Also I disagree with an argument that a DBA that doesn't monitor disk
> space would care much about some strange warnings in the logs. If a
> DBA doesn't monitor basic system metrics I'm afraid we can't help this
> person much.
>
> I do agree that we could probably provide some additional help for the
> rest of the users when it comes to configuring VACUUM. This is indeed
> non-trivial. However I don't think this is in scope of this particular
> patchset. I suggest we keep the focus in this discussion. If you have
> a concrete proposal please consider starting a new thread.
>
> This at least is my personal opinion. Let's give the rest of the
> community a chance to share their thoughts.

I agree with Alexander, that notifications for DBA are a little bit
outside the scope of the activity in this thread unless we've just
dropped some existing notifications, considering they're not
significant anymore. If that was the point, please Chris mention what
existing notifications you want to return. I don't think it's a big
deal to have the patch with certain notifications inherited from
Master branch.

Kind regards,
Pavel Borisov
Supabase.




Re: On login trigger: take three

2022-11-22 Thread Mikhail Gribkov
Hi hackers,



Since the original authors, as Daniel said, seem to have retired from the
patch, I have taken the liberty of continuing to polish it.

Attached v32 includes fresh rebase and the following fixes:

- Copying dathasloginevt flag during DB creation from template;

- Restoring dathasloginevt flag after re-enabling a disabled trigger;

- Preserving the dathasloginevt flag when a trigger is not run because of
some filter (running with 'session_replication_role=replica', for example);

- Checking tags during the trigger creation;

The GUC for enabling/disabling the feature was removed from this patch because
it is now discussed and supported in a separate thread by Daniel Gustafsson:


https://www.postgresql.org/message-id/9140106e-f9bf-4d85-8fc8-f2d3c094a...@yesql.se


Still, the back-ported version of Daniel's patch is attached here too, just
for convenience.



While the above changes are a straightforward continuation of the work, there
is another version to consider. Most of the patch's problems seem to originate
from the dathasloginevt flag support. There are lots of known problems
(partly fixed) and probably more yet to be found. At the same time the flag
itself is not a critical element, but just a performance optimization for
logins by users who are NOT using on-login triggers.

I have made a simpler version without the flag and tried to compare the
performance. The results show that life without dathasloginevt is still
possible. When there is a small number of non-login event triggers, the
difference is barely noticeable indeed. To show the difference, I have
added 1000 sql_drop event triggers and used pgbench to continuously query
the database for “select 1”  with reconnects for 3 seconds. Here are the
results. For each version I recorded average connection time with 0 and
with 1000 event triggers:

With dathasloginevt flag: 0.609 ms VS 0.611 ms

No-flag version: 0.617 ms VS 0.727 ms

You can see that the difference is noticeable, but not as critical as one
might expect for 1000 triggers (and in the vast majority of real cases there
would be far fewer triggers). I.e. the flagless version of the patch brings
a large gain in reliability at the cost of a small performance drop.
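
For reference, the measurement setup can be reproduced with something along
these lines (a rough sketch with assumed object names; the exact commands are
not part of this mail):

-- a no-op event trigger function plus 1000 sql_drop event triggers
CREATE FUNCTION noop_evt() RETURNS event_trigger
    LANGUAGE plpgsql AS $$ BEGIN NULL; END; $$;

DO $$
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE format('CREATE EVENT TRIGGER et_drop_%s ON sql_drop
                        EXECUTE FUNCTION noop_evt()', i);
    END LOOP;
END $$;

and then e.g. "pgbench -n -C -T 3 -f select1.sql" (with select1.sql containing
just "SELECT 1;") to measure the average connection time with reconnects.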

What do you think? Maybe it's better to use the flagless version? Or is even
a small performance drop considered unacceptable?



The attached files include the two versions of the patch: "v32" for the
updated version with the flag and "v31_nf" for the flagless version. Both
versions contain a "0001" file for the patch itself and a "0002" file for
the back-ported controlling GUC.
--
 best regards,
Mikhail A. Gribkov



On Fri, Nov 4, 2022 at 3:58 AM Ian Lawrence Barwick 
wrote:

> On Fri, 4 Nov 2022 at 5:23, Daniel Gustafsson wrote:
> >
> > > On 20 Sep 2022, at 16:10, Daniel Gustafsson  wrote:
> >
> > > Since the original authors seems to have retired from the patch
> > > (I've only rebased it to try and help) I am inclined to mark it as
> returned
> > > with feedback.
> >
> > Doing so now since no updates have been posted for quite some time and
> holes
> > have been poked in the patch.
>
> I see it was still "Waiting on Author" so I went ahead and changed the
> status.
>
> > The GUC for temporarily disabling event triggers, to avoid the need for
> single-
> > user mode during troubleshooting, might however be of interest so I will
> break
> > that out and post to a new thread.
>
> For reference this is the new thread:
>
>
> https://www.postgresql.org/message-id/9140106E-F9BF-4D85-8FC8-F2D3C094A6D9%40yesql.se
>
> Regards
>
> Ian Barwick
>
>
>


v32-0002-Add-GUC-for-temporarily-disabling-event-triggers.patch
Description: Binary data


v31_nf-0001-Add-support-event-triggers-on-authenticated-login-noflag.patch
Description: Binary data


v31_nf-0002-Add-GUC-for-temporarily-disabling-event-triggers.patch
Description: Binary data


v32-0001-Add-support-event-triggers-on-authenticated-login.patch
Description: Binary data


RE: Perform streaming logical transactions by background workers and parallel apply

2022-11-22 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

Thanks for updating the patch!
I tested whether a deadlock caused by a foreign key constraint could be
detected, and it worked well.

The following are my review comments. They are basically related to 0001, but
some of them may not be. It is taking time to understand 0002 correctly...

01. typedefs.list

LeaderFileSetState should be added to typedefs.list.


02. 032_streaming_parallel_apply.pl

As I said in [1], the test name may not match. Do you have a reason to
revert the change?


03. 032_streaming_parallel_apply.pl

The test does not cover the case where the backend process is involved in the
deadlock. IIUC this is another motivation to use a stream/transaction lock.
I think it should be added.

04. log output

While spooled changes are being applied by the PA worker, there are many messages
like "replayed %d changes from file..." and "applied %u changes...". They come from
apply_handle_stream_stop() and apply_spooled_messages(). They have the same meaning,
so I think one of them can be removed.

05. system_views.sql

In the previous version you modified the pg_stat_subscription system view. Why
did you revert that?

06. interrupt.c - SignalHandlerForShutdownRequest()

The comment atop SignalHandlerForShutdownRequest() lists some processes that
assign the function to signals other than SIGTERM. We may be able to add the
parallel apply worker there.

07. proto.c - logicalrep_write_stream_abort()

We may be able to add assertions for abort_lsn and abort_time, as we do for xid and subxid.


08. guc_tables.c - ConfigureNamesInt

```
_sync_workers_per_subscription,
+   2, 0, MAX_PARALLEL_WORKER_LIMIT,
+   NULL, NULL, NULL
+   },
```

The upper limit for max_sync_workers_per_subscription seems to be wrong; it
should be used for max_parallel_apply_workers_per_subscription.


10. worker.c - maybe_reread_subscription()


```
+   if (am_parallel_apply_worker())
+   ereport(LOG,
+   /* translator: first %s is the name of logical 
replication worker */
+   (errmsg("%s for subscription \"%s\" 
will stop because of a parameter change",
+   get_worker_name(), 
MySubscription->name)));
```

I was not sure whether get_worker_name() is needed here. I think "logical
replication apply worker" should be embedded.


11. worker.c - ApplyWorkerMain()

```
+   (errmsg_internal("%s for subscription \"%s\" 
two_phase is %s",
+
get_worker_name(),
```

A comment for translators is needed for this message.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB58666A97D40AB8919D106AD5F5709%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: fixing CREATEROLE

2022-11-22 Thread Robert Haas
On Tue, Nov 22, 2022 at 3:02 AM  wrote:
> My suggestion to $subject and the namespace problem would be to
> introduce database-specific roles, i.e. add a database column to
> pg_authid. Having this column set to 0 will make the role a cluster-wide
> role ("cluster role") just as currently the case. But having a database
> oid set will make the role exist in the context of that database only
> ("database role"). Then, the following principles should be enforced:
>
> - database roles can not share the same name with a cluster role.
> - database roles can have the same name as database roles in other
> databases.
> - database roles can not be members of database roles in **other**
> databases.
> - database roles with CREATEROLE can only create or alter database roles
> in their own database, but not roles in other databases and also not
> cluster roles.
> - database roles with CREATEROLE can GRANT all database roles in the
> same database, but only those cluster roles they have ADMIN privilege on.
> - database roles with CREATEROLE can not set SUPERUSER.
>
> To be able to create database roles with a cluster role, there needs to
> be some syntax, e.g. something like
>
> CREATE ROLE name IN DATABASE dbname ...

I have three comments on this:

1. It's a good idea and might make for some interesting followup work.

2. There are some serious implementation challenges because the
constraints on duplicate object names must be something which can be
enforced by unique constraints on the relevant catalogs. Off-hand, I
don't see how to do that. It would be easy to make the cluster roles
all have unique names, and it would be easy to make the database roles
have unique names within each database, but I have no idea how you
would keep a database role from having the same name as a cluster
role. For anyone to try to implement this, we'd need to have a
solution to that problem.

3. I don't want to sidetrack this thread into talking about possible
future features or followup work. There is enough to do just getting
consensus on the design ideas that I proposed without addressing the
question of what else we might do later. I do not think there is any
reasonable argument that we can't clean up the CREATEROLE mess without
also implementing database-specific roles, and I have no intention of
including that in this patch series. Whether I or someone else might
work on it in the future is a question we can leave for another day.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Non-decimal integer literals

2022-11-22 Thread Peter Eisentraut

On 15.11.22 11:31, Peter Eisentraut wrote:

On 14.11.22 08:25, John Naylor wrote:
Regarding the patch, it looks good overall. My only suggestion would 
be to add a regression test for just below and just above overflow, at 
least for int2.


ok


This was a valuable suggestion, because this found some breakage.  In 
particular, the handling of grammar-level literals that overflow to 
"Float" was not correct.  (The radix prefix was simply stripped and 
forgotten.)  So I added a bunch more tests for this.  Here is a new patch.
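
For a quick illustration of the accepted forms (values worked out by hand,
not copied from the patch's regression output):

SELECT 0x42f AS hex, 0o273 AS oct, 0b100101 AS bin;
--  hex  | oct | bin
-- ------+-----+-----
--  1071 | 187 |  37

-- the int2 overflow boundary that the new tests exercise:
SELECT int2 '0x7fff';   -- 32767, just below overflow
SELECT int2 '0x8000';   -- error: out of range for type smallint
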
From c0daab31eb145fbe54c2822bc093d774b993cd3d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 22 Nov 2022 14:22:09 +0100
Subject: [PATCH v10] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42F
0o273
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml   |  34 +
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt   |   1 +
 src/backend/parser/parse_node.c|  24 ++-
 src/backend/parser/scan.l  | 101 +---
 src/backend/utils/adt/numutils.c   | 170 +++--
 src/fe_utils/psqlscan.l|  78 +++---
 src/interfaces/ecpg/preproc/pgc.l  | 106 +++--
 src/test/regress/expected/int2.out |  80 ++
 src/test/regress/expected/int4.out |  80 ++
 src/test/regress/expected/int8.out |  80 ++
 src/test/regress/expected/numerology.out   | 127 ++-
 src/test/regress/sql/int2.sql  |  22 +++
 src/test/regress/sql/int4.sql  |  22 +++
 src/test/regress/sql/int8.sql  |  22 +++
 src/test/regress/sql/numerology.sql|  37 -
 16 files changed, 881 insertions(+), 109 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 93ad71737f51..956182e7c6a8 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,40 @@ Numeric Constants
 
 
 
+
+ Additionally, non-decimal integer constants can be used in these forms:
+
+0xhexdigits
+0ooctdigits
+0bbindigits
+
+ hexdigits is one or more hexadecimal digits
+ (0-9, A-F), octdigits is one or more octal
+ digits (0-7), bindigits is one or more binary
+ digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+ upper or lower case.  Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+
+
+
+ These are some examples of this:
+0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+
+
+
+
+ 
+  Nondecimal integer constants are currently only supported in the range
+  of the bigint type (see ).
+ 
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/information_schema.sql 
b/src/backend/catalog/information_schema.sql
index 18725a02d1fb..95c27a625e7e 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod 
int4) RETURNS integer
  WHEN 1700 /*numeric*/ THEN
   CASE WHEN $2 = -1
THEN null
-   ELSE (($2 - 4) >> 16) & 65535
+   ELSE (($2 - 4) >> 16) & 0xFFFF
END
  WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
  WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) 
RETURNS integer
WHEN $1 IN (1700) THEN
 CASE WHEN $2 = -1
  THEN null
- ELSE ($2 - 4) & 65535
+ ELSE ($2 - 4) & 0xFFFF
  END
ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod 
int4) RETURNS integer
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
-   THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 
END
+   THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 
0xFFFF END
ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index da7c9c772e09..e897e28ed148 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -527,6 +527,7 @@ T652SQL-dynamic statements in SQL routines  
NO
 T653   SQL-schema statements in external routines  YES 
 T654   SQL-dynamic statements in external routines NO  
 T655   Cyclically dependent routines   YES 

Re: postgres_fdw binary protocol support

2022-11-22 Thread Ashutosh Bapat
Hi Ilya,

On Mon, Nov 21, 2022 at 8:50 PM Ilya Gladyshev
 wrote:
>
> Hi everyone,
>
> I have made a patch that introduces support for libpq binary protocol
> in postgres_fdw. The idea is simple, when a user knows that the foreign
> server is binary compatible with the local and his workload could
> somehow benefit from using binary protocol, it can be switched on for a
> particular server or even a particular table.
>

Why do we need this feature? If it's for performance then do we have
performance numbers?

AFAIU, binary compatibility of two PostgreSQL servers depends upon the
binary compatibility of the platforms on which they run. So probably
postgres_fdw cannot infer the binary compatibility by itself. Is that
true? We have many postgres_fdw options that the user needs to set
manually to benefit from them. It would be good to infer those
automatically as much as possible. Hence this question.

> The patch adds a new foreign server and table option 'binary_format'
> (by default off) and implements serialization/deserialization of query
> results and parameters for binary protocol. I have tested the patch by
> switching foreign servers in postgres_fdw.sql tests to binary_mode, the
> only diff was in the text of the error for parsing an invalid integer
> value, so it worked as expected for the test. There are a few minor
> issues I don't like in the code and I am yet to write the tests and
> docs for it. It would be great to get some feedback and understand,
> whether this is a welcome feature, before proceeding with all of the
> abovementioned.
>

About the patch itself, I see a lot of if (binary) {} else {} blocks
which are repeated. It would be good if we could add functions/macros to
avoid the duplication.
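
Something along these lines, for example (a hypothetical sketch of the kind
of helper meant here, not code from the patch):

#include "postgres.h"
#include "fmgr.h"
#include "utils/lsyscache.h"

/*
 * Convert one Datum into the wire representation for a PQexecParams-style
 * call, hiding the binary-vs-text branch behind a single helper so that
 * call sites do not need their own if (binary) ... else ... blocks.
 */
static char *
convert_value_for_send(Datum value, Oid typid, bool binary,
                       int *length, int *format)
{
    Oid         funcid;
    bool        isvarlena;

    if (binary)
    {
        bytea      *outbytes;

        getTypeBinaryOutputInfo(typid, &funcid, &isvarlena);
        outbytes = OidSendFunctionCall(funcid, value);
        *length = VARSIZE(outbytes) - VARHDRSZ;
        *format = 1;            /* binary */
        return VARDATA(outbytes);
    }
    else
    {
        char       *outstr;

        getTypeOutputInfo(typid, &funcid, &isvarlena);
        outstr = OidOutputFunctionCall(funcid, value);
        *length = strlen(outstr);
        *format = 0;            /* text */
        return outstr;
    }
}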

-- 
Best Wishes,
Ashutosh Bapat




Re: Introduce a new view for checkpointer related stats

2022-11-22 Thread Bharath Rupireddy
On Tue, Nov 22, 2022 at 1:26 PM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 11/17/22 1:51 PM, Bharath Rupireddy wrote:
> > Hi,
> >
> > pg_stat_bgwriter view currently reports checkpointer stats as well. It
> > is that way because historically checkpointer was part of bgwriter
> > until the commits 806a2ae and bf405ba, that went into PG 9.2,
> > separated them out. I think it is time for us to separate checkpointer
> > stats to its own view. I discussed it in another thread [1] and it
> > seems like there's an unequivocal agreement for the proposal.
> >
> > I'm attaching a patch introducing a new pg_stat_checkpointer view,
> > with this change the pg_stat_bgwriter view only focuses on bgwriter
> > related stats. The patch does mostly mechanical changes. I'll add it
> > to CF in a bit.
> >
> > Thoughts?
>
> +1 for the dedicated view.
>
> +  
> +   The pg_stat_checkpointer view will always have a
> +   single row, containing global data for the cluster.
> +  
>
> what about "containing data about checkpointer activity of the cluster"? (to 
> provide more "details" (even if that seems obvious given the name of the 
> view) and be consistent with the pg_stat_wal description too).
> And if it makes sense to you, While at it, why not go for "containing data 
> about bgwriter activity of the cluster" for pg_stat_bgwriter too?

Nice catch. Modified.

> +CREATE VIEW pg_stat_checkpointer AS
> +SELECT
> +pg_stat_get_timed_checkpoints() AS checkpoints_timed,
> +pg_stat_get_requested_checkpoints() AS checkpoints_req,
> +pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
> +pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
> +pg_stat_get_buf_written_checkpoints() AS buffers_checkpoint,
> +pg_stat_get_buf_written_backend() AS buffers_backend,
> +pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
> +pg_stat_get_checkpointer_stat_reset_time() AS stats_reset;
>
> I don't think we should keep the checkpoints_ prefix (or _checkpoint suffix) 
> in the column names now that they belong to a dedicated view (also the 
> pg_stat_bgwriter view's columns don't have a
> bgwriter prefix/suffix).
>
> And while at it, I'm not sure the wal_ suffix in pg_stat_wal make sense too.
>
> The idea is to have consistent naming between the views and their columns: 
> I'd vote without prefix/suffix.

-1. If the prefix is removed, some column names become unreadable -
timed, requested, write_time, sync_time, buffers. We might think of
renaming those columns to something more readable, but I tend not to do
that, as it can badly break the application/service layer and monitoring
tools. Of course, even with the new pg_stat_checkpointer view we can't
avoid that entirely, but the changes are smaller, i.e. just replacing
pg_stat_bgwriter with the new view.
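
With the prefixed names kept, a monitoring query against the new view would
look like this (column names as in the view definition quoted above):

SELECT checkpoints_timed,
       checkpoints_req,
       checkpoint_write_time,
       checkpoint_sync_time,
       buffers_checkpoint,
       stats_reset
FROM pg_stat_checkpointer;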

I'm attaching the v2 patch for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 724d9bee5bc3bc124dd37909878a6450c4fe1019 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 22 Nov 2022 12:06:20 +
Subject: [PATCH v2] Introduce a new view for checkpointer related stats

pg_stat_bgwriter view currently reports checkpointer stats as well.
It is that way because historically checkpointer was part of
bgwriter until the commits 806a2ae and bf405ba, that went into
PG 9.2, separated them out.

It is time for us to separate checkpointer stats to its own view
called pg_stat_checkpointer.

Bump catalog version.
---
 doc/src/sgml/monitoring.sgml  | 110 +-
 src/backend/catalog/system_views.sql  |  18 +--
 .../utils/activity/pgstat_checkpointer.c  |   1 +
 src/backend/utils/adt/pgstatfuncs.c   |  19 +--
 src/include/catalog/pg_proc.dat   |  22 ++--
 src/include/pgstat.h  |   1 +
 src/test/recovery/t/029_stats_restart.pl  |   6 +-
 src/test/regress/expected/rules.out   |  15 +--
 src/test/regress/expected/stats.out   |  21 +++-
 src/test/regress/sql/stats.sql|  12 +-
 10 files changed, 155 insertions(+), 70 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e2426f7210..e532c4be08 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -448,6 +448,15 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  
  
 
+ 
+  pg_stat_checkpointerpg_stat_checkpointer
+  One row only, showing statistics about the
+   checkpointer process's activity. See
+   
+   pg_stat_checkpointer for details.
+ 
+ 
+
  
   pg_stat_walpg_stat_wal
   One row only, showing statistics about WAL activity. See
@@ -3631,7 +3640,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
   
The pg_stat_bgwriter view will always have a
-   single row, containing global data for 

Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn

2022-11-22 Thread Masahiko Sawada
Hi,

On Tue, Nov 22, 2022 at 6:37 PM Amit Kapila  wrote:
>
> On Mon, Nov 21, 2022 at 6:17 PM Maxim Orlov  wrote:
> >
> > PROBLEM
> >
> > After some investigation, I think, the problem is in the snapbuild.c 
> > (commit 272248a0c1b1, see [0]). We do allocate InitialRunningXacts
> > array in the context of builder->context, but for the time when we call 
> > SnapBuildPurgeOlderTxn this context may be already free'd.
> >
>
> I think you are seeing it freed in SnapBuildPurgeOlderTxn when we
> finish and restart decoding in the same session. After finishing the
> first decoding, it frees the decoding context but we forgot to reset
> NInitialRunningXacts and InitialRunningXacts array. So, next time when
> we start decoding in the same session where we don't restore any
> serialized snapshot, it can lead to the problem you are seeing because
> NInitialRunningXacts (and InitialRunningXacts array) won't have sane
> values.
>
> This can happen in the catalog_change_snapshot test as we have
> multiple permutations and those use the same session across a restart
> of decoding.

I have the same analysis. In order to restart the decoding from the
LSN where we don't restore any serialized snapshot, we somehow need to
advance the slot's restart_lsn. In this case, I think it happened
because the same session drops the replication slot at the end of the
first scenario and creates one with the same name at the beginning of
the second scenario in catalog_change_snapshot.spec.

>
> >
> > Simple fix like:
> > @@ -1377,7 +1379,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr 
> > lsn, xl_running_xacts *runn
> >  * changes. See SnapBuildXidSetCatalogChanges.
> >  */
> > NInitialRunningXacts = nxacts;
> > -   InitialRunningXacts = MemoryContextAlloc(builder->context, 
> > sz);
> > +   InitialRunningXacts = MemoryContextAlloc(TopMemoryContext, 
> > sz);
> > memcpy(InitialRunningXacts, running->xids, sz);
> > qsort(InitialRunningXacts, nxacts, sizeof(TransactionId), 
> > xidComparator);
> >
> > seems to solve the described problem, but I'm not in the context of [0] and 
> > why array is allocated in builder->context.
> >
>
> It will leak the memory for InitialRunningXacts. We need to reset
> NInitialRunningXacts (and InitialRunningXacts) as mentioned above.
>
> Thank you for the report and initial analysis. I have added Sawada-San
> to know his views as he was the primary author of this work.

Thanks!

I've attached a draft patch. To fix it, I think we can reset
InitialRunningXacts and NInitialRunningXacts at FreeSnapshotBuilder()
and add an assertion in AllocateSnapshotBuilder() to make sure both
are reset. Regarding the tests, the patch includes a new scenario to
reproduce this issue. However, since the issue can be reproduced also
by the existing scenario (with low probability, though), I'm not sure
it's worth adding the new scenario.
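
In other words, roughly the following (a simplified sketch of the idea, not
the attached patch itself; the variables are the statics in snapbuild.c):

/* called from FreeSnapshotBuilder(), whose context held the array */
static void
SnapBuildResetInitialRunningXacts(void)
{
    /*
     * InitialRunningXacts was allocated in builder->context, which
     * FreeSnapshotBuilder() deletes, so only the static references
     * need to be cleared here.
     */
    NInitialRunningXacts = 0;
    InitialRunningXacts = NULL;
}

AllocateSnapshotBuilder() can then simply
Assert(NInitialRunningXacts == 0 && InitialRunningXacts == NULL).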

I've not checked if the patch works for version 14 or older yet.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


reset_initial_running_xacts.patch
Description: Binary data


Re: Logical Replication Custom Column Expression

2022-11-22 Thread Amit Kapila
On Mon, Nov 21, 2022 at 5:05 PM Ashutosh Bapat
 wrote:
>
> On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas
>  wrote:
> >
> > What does not support is the option for defining custom column expressions, 
> > as keys or values, into the upstream (publication). This will give more 
> > flexibility into making replication from multiple upstreams into less 
> > downstreams adding more logic. For instance, in a project for analytical 
> > purposes there is the need to consolidate data from multiple databases into 
> > one and at the same time keep the origin of each replicated data identified 
> > by a tenant_id column. In this case we also need the ability to define 
> > the new column as an additional key which will participate into the 
> > destination table.
> >
> > Tenant 1 table
> > id serial pk
> > description varchar
> >
> > Tenant 2 table
> > id integer pk
> > description varchar
> >
> > Group table
> > tenant integer pk
> > id integer pk
> > description varchar
> >
> > Possible syntax to achieve that
> > CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias} 
> > ,"id", "name")
> >
> > Example
> > CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} ,"id", 
> > "name")
>
> I think that's a valid usecase.
>
> This looks more like a subscription option to me. In multi-subscriber
> multi-publisher scenarios, on one subscriber a given upstream may be
> tenant 1 but on some other it could be 2. But I don't think we allow
> specifying subscription options for a single table. AFAIU, the origin
> ids are available as part of the commit record which contained this
> change; that's how conflict resolution is supposed to know it. So
> somehow the subscriber will need to fetch those from there and set the
> tenant.
>

Yeah, to me also it appears that we can handle it on the subscriber
side. We have the provision of sending origin information in proto.c.
But note that by default publishers won't have any origin associated
with change unless someone has defined it. I think this work needs
more thought but sounds to be an interesting feature.

-- 
With Regards,
Amit Kapila.




Fix for visibility check on 14.5 fails on tpcc with high concurrency

2022-11-22 Thread Dimos Stamatakis
Hi hackers,

When running tpcc on sysbench with high concurrency (96 threads, scale factor
5) we found that a fix for the visibility check (introduced in PostgreSQL 14.5)
causes sysbench to fail in 1 out of 70 runs.
The error is the following:

SQL error, errno = 0, state = 'XX000': new multixact has more than one updating 
member

And it is caused by the following statement:

UPDATE warehouse1
  SET w_ytd = w_ytd + 234
  WHERE w_id = 3;

The commit that fixes the visibility check is the following:
https://github.com/postgres/postgres/commit/e24615a0057a9932904317576cf5c4d42349b363

We reverted this commit and tpcc does not fail anymore, proving that this 
change is problematic.
Steps to reproduce:
1. Install sysbench
  https://github.com/akopytov/sysbench
2. Install percona sysbench TPCC
  https://github.com/Percona-Lab/sysbench-tpcc
3. Run percona sysbench -- prepare
  # sysbench-tpcc/tpcc.lua --pgsql-host=localhost --pgsql-port=5432 
--pgsql-user={USER} --pgsql-password={PASSWORD} --pgsql-db=test_database 
--db-driver=pgsql --tables=1 --threads=96 --scale=5 --time=60 prepare
4. Run percona sysbench -- run
  # sysbench-tpcc/tpcc.lua --pgsql-host=localhost --pgsql-port=5432 
--pgsql-user={USER} --pgsql-password={PASSWORD} --pgsql-db=test_database 
--db-driver=pgsql --tables=1 --report-interval=1 --rand-seed=1 --threads=96 
--scale=5 --time=60 run

We tested on a machine with 2 NUMA nodes, 16 physical cores per node, and 2 
threads per core, resulting in 64 threads total. The total memory is 376GB.
Attached please find the configuration file we used (postgresql.conf).

This commit was supposed to fix a race condition during the visibility check. 
Please let us know whether you are aware of this issue and if there is a quick 
fix.
Any input is highly appreciated.

Thanks,
Dimos


postgresql.conf
Description: postgresql.conf


Re: Fix comments atop pg_get_replication_slots

2022-11-22 Thread Amit Kapila
On Mon, Nov 21, 2022 at 1:22 PM sirisha chamarthi
 wrote:
>
> Looks good to me. Attached a patch for the same.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: wake up logical workers after ALTER SUBSCRIPTION

2022-11-22 Thread Amit Kapila
On Tue, Nov 22, 2022 at 6:11 AM Nathan Bossart  wrote:
>
> While working on avoiding unnecessary wakeups in logical/worker.c (as was
> done for walreceiver.c in 05a7be9), I noticed that the tests began taking
> much longer.  This seems to be caused by the reduced frequency of calls to
> maybe_reread_subscription() in LogicalRepApplyLoop().
>

I think it would be interesting to know why tests started taking more
time after a reduced frequency of calls to
maybe_reread_subscription(). IIRC, we anyway call
maybe_reread_subscription for each xact.

-- 
With Regards,
Amit Kapila.




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Simon Riggs
On Mon, 21 Nov 2022 at 23:34, Robert Haas  wrote:
>
> On Mon, Nov 21, 2022 at 6:17 PM Tom Lane  wrote:
> > evidence that it's a live problem.  API warts are really hard to
> > get rid of once instituted.
>
> Yeah, agreed.

Agreed, happy not to; that version was just a WIP to see how it might work.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: when the startup process doesn't (logging startup delays)

2022-11-22 Thread Bharath Rupireddy
On Mon, Nov 21, 2022 at 10:37 PM Robert Haas  wrote:
>
> On Sun, Nov 20, 2022 at 9:20 PM Kyotaro Horiguchi
>  wrote:
> > I prefer Robert's approach as it is more robust for future changes and
> > simple. I prefer to avoid this kind of piggy-backing and it doesn't
> > seem to be needed in this case. XLogShutdownWalRcv() looks like a
> > similar case to me and honestly I don't like it in the sense of
> > robustness but it is simpler than checking walreceiver status at every
> > site that refers to the flag.
>
> I don't understand what you want changed. Can you be more specific
> about what you mean by "Robert's approach"?
>
> I don't agree with Bharath's logic for preferring an if-test to an
> Assert. There are some cases where we think we've written the code
> correctly but also recognize that the logic is complicated enough that
> something might slip through the cracks. Then, a runtime check makes
> sense, because otherwise something real bad might happen on a
> production instance. But here, I don't think that's the main risk. I
> think the main risk is that some future patch tries to add code that
> should print startup log messages later on. That would probably be a
> coding mistake, and Assert would alert the patch author about that,
> whereas amending the if-test would just make the code do something
> differently then the author intended.
>
> But I don't feel super-strongly about this, which is why I mentioned
> both options in my previous email. I'm not clear on whether you are
> expressing an opinion on this point in particular or something more
> general.

If we place just the Assert(!StandbyMode); in
enable_startup_progress_timeout(), it fails for
begin_startup_progress_phase() in ResetUnloggedRelations(), because
InitWalRecovery(), which sets StandbyMode to true, is called before
ResetUnloggedRelations(). However, with the if (StandbyMode) {
return; }, we fail to report the progress of ResetUnloggedRelations() on
a standby, which isn't a good idea at all because we only want to
disable the timeout during recovery's main loop.

I introduced an assert-only function that returns true when we're in the
standby's main redo apply loop and modified the assertion to
Assert(!InStandbyMainRedoApplyLoop()); this works better but complicates
the code a bit. FWIW, I'm attaching the v6 patch with this change.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v6-0001-Disable-STARTUP_PROGRESS_TIMEOUT-in-standby-mode.patch
Description: Binary data


Re: [PATCH] Add initial xid/mxid/mxoff to initdb

2022-11-22 Thread Maxim Orlov
Hi!

CF bot says patch does not apply. Rebased.
Your reviews are very much welcome!

-- 
Best regards,
Maxim Orlov.


v2-0001-Add-initdb-option-to-initialize-cluster-with-non-.patch
Description: Binary data


Re: pg_upgrade, tables_with_oids.txt -> tables_with_oids.sql?

2022-11-22 Thread Daniel Gustafsson
> On 6 Nov 2022, at 09:48, Daniel Westermann (DWE) 
>  wrote:

> as I've just upgraded an instance which contained tables "WITH OIDS" I wonder 
> if it would make sense if pg_upgrade directly creates a script to fix those. 
> I know it is easy to that with e.g. sed over tables_with_oids.txt but it 
> would be more convenient to have the script generated directly.

For the checks on the old system we don't generate any scripts, only reports of
problems.  I don't recall the reasoning, but I would assume it stems from some
checks being up to the user to deal with, where no one-size-fits-all script is
possible.  Having them all generate reports rather than scripts keeps that
consistent across the old checks.

In this particular case we probably could safely make a script, but if we did,
we'd need to expand testing to validate it, etc., so I'm not sure it's worth it.

--
Daniel Gustafsson   https://vmware.com/





Re: [CAUTION!! freemail] Re: Partial aggregates pushdown

2022-11-22 Thread Ted Yu
On Tue, Nov 22, 2022 at 1:11 AM fujii.y...@df.mitsubishielectric.co.jp <
fujii.y...@df.mitsubishielectric.co.jp> wrote:

> Hi Mr.Yu.
>
> Thank you for comments.
>
> > + * Check that partial aggregate agg has compatibility
> >
> > If the `agg` refers to func parameter, the parameter name is aggform
> I fixed the above typo and made the above comment easy to understand.
> The new comment is "Check that partial aggregate function of aggform exists in
> remote"
>
> > +   int32  partialagg_minversion = PG_VERSION_NUM;
> > +   if (aggform->partialagg_minversion ==
> > PARTIALAGG_MINVERSION_DEFAULT) {
> > +   partialagg_minversion = PG_VERSION_NUM;
> >
> >
> > I am curious why the same variable is assigned the same value twice. It
> seems
> > the if block is redundant.
> >
> > +   if ((fpinfo->server_version >= partialagg_minversion)) {
> > +   compatible = true;
> >
> >
> > The above can be simplified as: return fpinfo->server_version >=
> > partialagg_minversion;
> I fixed according to your comment.
>
> Sincerely yours,
> Yuuki Fujii
>
>
Hi,
Thanks for the quick response.

