RE: Ability to reference other extensions by schema in extension scripts

2023-03-11 Thread Regina Obe
> Subject: Re: Ability to reference other extensions by schema in extension
> scripts
> 
> "Regina Obe"  writes:
> >> requires = 'extfoo, extbar'
> >> no_relocate = 'extfoo'
> 
> > So when no_relocate is specified, where would that live?
> 
> In the control file.
> 
> > Would I mark the extfoo as not relocatable on CREATE / ALTER of said
> > extension?
> > Or add an extra field to pg_extension
> 
> We don't record dependent extensions in pg_extension now, so that doesn't
> seem like it would fit well.  I was envisioning that ALTER EXTENSION SET
> SCHEMA would do something along the lines of
> 
> (1) scrape the list of dependent extensions out of pg_depend
> (2) open and parse each of their control files
> (3) fail if any of their control files mentions the target one in
> no_relocate.
> 
> Admittedly, this'd be a bit slow, but I doubt that ALTER EXTENSION SET
> SCHEMA is a performance bottleneck for anybody.
> 
> > I had tried to do that originally, e.g. instead of even bothering with
> > such an extra arg, just mark it as not relocatable if the extension's
> > script contains references to the required extension's schema.
> 
> I don't think that's a great approach, because those references might
> appear in places that can track a rename (ie, in an object name that's
> resolved to a stored OID).  Short of fully parsing the script file you
> aren't going to get a reliable answer.  I'm content to lay that problem
> off on the extension authors.
> 
> > But then what if extfoo is upgraded?
> 
> We already have mechanisms for version-dependent control files, so I don't
> see where there's a problem.
> 
>   regards, tom lane

Attached is a revised patch with these changes in place.
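
Under the scheme discussed above, a dependent extension's control file might look
roughly like this (extbar and extfoo are hypothetical names used only for
illustration):

```
# extbar.control (hypothetical example)
comment = 'example extension depending on extfoo'
default_version = '1.0'
relocatable = true
requires = 'extfoo'
# make ALTER EXTENSION extfoo SET SCHEMA fail while extbar is installed
no_relocate = 'extfoo'
```

With this in place, ALTER EXTENSION extfoo SET SCHEMA would scrape extbar out of
pg_depend, parse this control file, find extfoo listed in no_relocate, and fail.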


0006-Allow-use-of-extschema-reqextname-to-reference.patch
Description: Binary data


Re: optimize several list functions with SIMD intrinsics

2023-03-11 Thread Ankit Kumar Pandey
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hello, 

Adding some review comments:

1. In list_member_ptr, would it be okay to move `const ListCell *cell` from

#ifdef USE_NO_SIMD
const ListCell *cell;
#endif

into the #else branch, as shown below? This would make the visual separation
between the #if cases cleaner:

#else
	const ListCell *cell;

	foreach(cell, list)
	{
		if (lfirst(cell) == datum)
			return true;
	}

	return false;

#endif

2. There are many duplicated `if (list == NIL)` checks before calls to
list_member_inline_internal(list, datum). Can we not add this check in
list_member_inline_internal itself?

3. In list_delete_int/oid we have:

	if (cell)
		return list_delete_cell(list, cell);

Can we change `if (cell)` to `if (cell != NULL)`, as used elsewhere?

4. list_member_inline_interal_idx: there is a typo in the name.

5. list_member_inline_interal_idx and list_member_inline_internal are
practically the same, with 90+% duplication. Can we not refactor them into one
function that returns either the cell or true/false? Although I see the usage
of const ListCell, so this might be problematic.

6. The loop `for (i = 0; i < tail_idx; i += nelem_per_iteration)` for Vector32
in list.c looks duplicated from pg_lfind32 (in pg_lfind.h); can we not reuse
that?

7. Is it possible to add a benchmark which shows the improvement against HEAD?

Regards,
Ankit

The new status of this patch is: Waiting on Author


Re: Add LZ4 compression in pg_dump

2023-03-11 Thread gkokolatos
--- Original Message ---
On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin 
 wrote:

> Hello,
> 23.02.2023 23:24, Tomas Vondra wrote:
> 
> > On 2/23/23 16:26, Tomas Vondra wrote:
> > 
> > > Thanks for v30 with the updated commit messages. I've pushed 0001 after
> > > fixing a comment typo and removing (I think) an unnecessary change in an
> > > error message.
> > > 
> > > I'll give the buildfarm a bit of time before pushing 0002 and 0003.
> > 
> > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> > and marked the CF entry as committed. Thanks for the patch!
> > 
> > I wonder how difficult would it be to add the zstd compression, so that
> > we don't have the annoying "unsupported" cases.
> 
> 
> With the patch 0003 committed, a single warning -Wtype-limits appeared in the
> master branch:
> $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
> compress_lz4.c: In function ‘LZ4File_gets’:
> compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is
> always false [-Wtype-limits]
> 492 | if (dsize < 0)
> |

Thank you Alexander. Please find attached an attempt to address it.

> (I wonder, is it accidental that there no other places that triggers
> the warning, or some buildfarm animals had this check enabled before?)

I cannot answer for the buildfarms. Do you think that adding an explicit
check for this warning in meson would help? I am a bit uncertain, as I
think that -Wtype-limits is included in -Wextra.

@@ -1748,6 +1748,7 @@ common_warning_flags = [
   '-Wshadow=compatible-local',
   # This was included in -Wall/-Wformat in older GCC versions
   '-Wformat-security',
+  '-Wtype-limits',
 ]

> 
> It is not a false positive as can be proved by the 002_pg_dump.pl modified as
> follows:
> - program => $ENV{'LZ4'},
> 
> + program => 'mv',
> 
> args => [
> 
> - '-z', '-f', '--rm',
> "$tempdir/compression_lz4_dir/blobs.toc",
> "$tempdir/compression_lz4_dir/blobs.toc.lz4",
> ],
> },

Correct, it is not a false positive. The existing testing framework provides
limited support for exercising error branches, especially when those depend
on generated output.

> A diagnostic logging added shows:
> LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615
> 
> and pg_restore fails with:
> error: invalid line in large object TOC file
> ".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": 
> ""

It is a good thing that the restore fails with bad input. Yet it should
have failed earlier. The attached makes certain it does fail earlier. 

Cheers,
//Georgios

> 
> Best regards,
> Alexander

From b80bb52ef6546aee8c8431d7cc126fa4a76b543c Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Sat, 11 Mar 2023 09:54:40 +
Subject: [PATCH v1] Respect return type of LZ4File_read_internal

The function LZ4File_gets() was storing the return value of
LZ4File_read_internal() in a variable of the wrong type, disregarding signedness.
As a consequence, LZ4File_gets() would not take the error path when it should.

In an attempt to improve readability, spell out the significance of a negative
return value of LZ4File_read_internal() in LZ4File_read().

Reported-by: Alexander Lakhin 
---
 src/bin/pg_dump/compress_lz4.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 63e794cdc6..cc039f0b47 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -453,7 +453,7 @@ LZ4File_read(void *ptr, size_t size, CompressFileHandle *CFH)
 	int			ret;
 
 	ret = LZ4File_read_internal(fs, ptr, size, false);
-	if (ret != size && !LZ4File_eof(CFH))
+	if (ret < 0 || (ret != size && !LZ4File_eof(CFH)))
 		pg_fatal("could not read from input file: %s", LZ4File_get_error(CFH));
 
 	return ret;
@@ -486,14 +486,14 @@ static char *
 LZ4File_gets(char *ptr, int size, CompressFileHandle *CFH)
 {
 	LZ4File*fs = (LZ4File *) CFH->private_data;
-	size_t		dsize;
+	int			ret;
 
-	dsize = LZ4File_read_internal(fs, ptr, size, true);
-	if (dsize < 0)
+	ret = LZ4File_read_internal(fs, ptr, size, true);
+	if (ret < 0)
 		pg_fatal("could not read from input file: %s", LZ4File_get_error(CFH));
 
 	/* Done reading */
-	if (dsize == 0)
+	if (ret == 0)
 		return NULL;
 
 	return ptr;
-- 
2.34.1



Re: optimize several list functions with SIMD intrinsics

2023-03-11 Thread Ankit Kumar Pandey
> 7. Is it possible to add a benchmark which shows improvement against
> HEAD ?

Please ignore this from my earlier mail, I just saw the stats now.

Thanks,

Ankit


Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2023-03-11 Thread Andrei Zubkov
Hi,

I've rebased the patch onto the current master.

-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From e43e9eab39bbd377665a7cad3b7fe7162f8f6578 Mon Sep 17 00:00:00 2001
From: Andrei Zubkov 
Date: Sat, 11 Mar 2023 09:53:10 +0300
Subject: [PATCH] pg_stat_statements: Track statement entry timestamp

This patch adds stats_since and minmax_stats_since columns to the
pg_stat_statements view and pg_stat_statements() function. The new min/max reset
mode for the pg_stat_statements_reset() function is controlled by the
parameter minmax_only.
The stats_since column is populated with the current timestamp when a new
statement is added to the pg_stat_statements hashtable. It provides clean
information about the statistics collection time interval for each statement.
Besides, it can be used by sampling solutions to detect situations where a
statement was evicted and stored again between samples.
Such a sampling solution could derive any pg_stat_statements statistic values
for an interval between two samples, with the exception of all min/max
statistics. To address this issue, this patch adds the ability to reset min/max
statistics independently of the statement reset using the new minmax_only
parameter of the pg_stat_statements_reset(userid oid, dbid oid, queryid bigint,
minmax_only boolean) function. The timestamp of such a reset is stored in the
minmax_stats_since field for each statement.
The pg_stat_statements_reset() function now returns the timestamp of the reset
as its result.

Discussion:
https://www.postgresql.org/message-id/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru
---
 contrib/pg_stat_statements/Makefile   |   4 +-
 .../pg_stat_statements/expected/cursors.out   |  28 +--
 contrib/pg_stat_statements/expected/dml.out   |  20 +--
 .../expected/entry_timestamp.out  | 159 ++
 .../expected/level_tracking.out   |  80 -
 .../expected/oldextversions.out   |  70 
 .../pg_stat_statements/expected/planning.out  |  18 +-
 .../pg_stat_statements/expected/select.out|  44 ++---
 .../expected/user_activity.out| 120 ++---
 .../pg_stat_statements/expected/utility.out   | 150 -
 contrib/pg_stat_statements/expected/wal.out   |  10 +-
 contrib/pg_stat_statements/meson.build|   2 +
 .../pg_stat_statements--1.10--1.11.sql|  81 +
 .../pg_stat_statements/pg_stat_statements.c   | 139 +++
 .../pg_stat_statements.control|   2 +-
 contrib/pg_stat_statements/sql/cursors.sql|   6 +-
 contrib/pg_stat_statements/sql/dml.sql|   4 +-
 .../sql/entry_timestamp.sql   | 114 +
 .../pg_stat_statements/sql/level_tracking.sql |  12 +-
 .../pg_stat_statements/sql/oldextversions.sql |   7 +
 contrib/pg_stat_statements/sql/planning.sql   |   4 +-
 contrib/pg_stat_statements/sql/select.sql |  10 +-
 .../pg_stat_statements/sql/user_activity.sql  |  15 +-
 contrib/pg_stat_statements/sql/utility.sql|  28 +--
 contrib/pg_stat_statements/sql/wal.sql|   2 +-
 doc/src/sgml/pgstatstatements.sgml|  66 +++-
 26 files changed, 881 insertions(+), 314 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/entry_timestamp.out
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.10--1.11.sql
 create mode 100644 contrib/pg_stat_statements/sql/entry_timestamp.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 5578a9dd4e3..b2446e7a1cf 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,7 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql \
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.10--1.11.sql \
 	pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
@@ -18,7 +18,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = select dml cursors utility level_tracking planning \
-	user_activity wal cleanup oldextversions
+	user_activity wal entry_timestamp cleanup oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/cursors.out b/contrib/pg_stat_statements/expected/cursors.out
index 46375ea9051..0fc4b2c098d 100644
--- a/contrib/pg_stat_statements/expected/cursors.out
+++ b/contrib/pg_stat_statements/expected/cursors.out
@@ -3,10 +3,10 @@
 --
 -- These tests require track_utility to be enabled.
 SET pg_stat_statements.track_utility = TRUE;
-SELECT pg_stat_statements_

Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-11 Thread Melanie Plageman
> On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy
>  wrote:
>
> > On Thu, Jan 12, 2023 at 6:06 AM Andres Freund  wrote:
> > >
> > > On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:
> > > > Should we just add "ring_buffers" to the existing "shared_buffers" and
> > > > "temp_buffers" settings?
> > >
> > > The different types of ring buffers have different sizes, for good 
> > > reasons. So
> > > I don't see that working well. I also think it'd be more often useful to
> > > control this on a statement basis - if you have a parallel import tool 
> > > that
> > > starts NCPU COPYs you'd want a smaller buffer than a single threaded 
> > > COPY. Of
> > > course each session can change the ring buffer settings, but still.
> >
> > How about having GUCs for each ring buffer (bulk_read_ring_buffers,
> > bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)?
> > These options can help especially when statement level controls aren't
> > easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If
> > needed users can also set them at the system level. For instance, one
> > can set bulk_write_ring_buffers to other than 16MB or -1 to disable
> > the ring buffer to use shared_buffers and run a bunch of bulk write
> > queries.

In the attached v3, I've changed the name of the guc from buffer_usage_limit
to vacuum_buffer_usage_limit, since it is only used for vacuum and
autovacuum.

I haven't added the other suggested strategy gucs, as those could easily
be done in a future patchset.

I've also changed GetAccessStrategyExt() to GetAccessStrategyWithSize()
-- similar to initArrayResultWithSize().

And I've added tab completion for BUFFER_USAGE_LIMIT so that it is
easier to try out my patch.

Most of the TODOs in the code are related to the question of how
autovacuum uses the guc vacuum_buffer_usage_limit. autovacuum creates
the buffer access strategy ring in do_autovacuum() before looping
through and vacuuming tables. It passes this strategy object on to
vacuum(). Since we reuse the same strategy object for all tables in a
given invocation of do_autovacuum(), only failsafe autovacuum would
change buffer access strategies. This is probably okay, but it does mean
that the table-level VacuumParams variable, ring_size, means something
different for autovacuum than vacuum. Autovacuum workers will always
have set it to -1. We won't ever reach code in vacuum() which relies on
VacuumParams->ring_size as long as autovacuum workers pass a non-NULL
BufferAccessStrategy object to vacuum(), though.

- Melanie
From 6f40d87f4f462d48a67721260be7e30a7438520e Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 22 Feb 2023 12:26:01 -0500
Subject: [PATCH v3 2/3] use shared buffers when failsafe active

---
 src/backend/access/heap/vacuumlazy.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8f14cf85f3..b319a244d5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2622,6 +2622,11 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
 	if (unlikely(vacuum_xid_failsafe_check(&vacrel->cutoffs)))
 	{
 		vacrel->failsafe_active = true;
+		/*
+		 * Assume the caller who allocated the memory for the
+		 * BufferAccessStrategy object will free it.
+		 */
+		vacrel->bstrategy = NULL;
 
 		/* Disable index vacuuming, index cleanup, and heap rel truncation */
 		vacrel->do_index_vacuuming = false;
-- 
2.37.2

From 139e71241729d7304188dffb603e6f43db0bf67c Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 22 Feb 2023 15:28:34 -0500
Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc

---
 src/backend/commands/vacuum.c | 51 ++-
 src/backend/commands/vacuumparallel.c |  6 ++-
 src/backend/postmaster/autovacuum.c   | 15 +-
 src/backend/storage/buffer/README |  2 +
 src/backend/storage/buffer/freelist.c | 40 +++
 src/backend/utils/init/globals.c  |  2 +
 src/backend/utils/misc/guc_tables.c   | 11 
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c   |  2 +-
 src/include/commands/vacuum.h |  1 +
 src/include/miscadmin.h   |  1 +
 src/include/storage/bufmgr.h  |  5 ++
 12 files changed, 136 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index c62be52e3b..22fe42ee2e 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -127,6 +127,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	/* By default parallel vacuum is enabled */
 	params.nworkers = 0;
 
+	/* by default use buffer access strategy with default size */
+	params.ring_size = -1;
+
 	/* Parse options list */
 	foreach(lc, vacstmt->options)
 	{
@@ -210,6 +213,43 @@ ExecVacuum(ParseState *pstate, Vacuu

Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-03-11 Thread John Naylor
On Fri, Mar 10, 2023 at 9:30 PM Masahiko Sawada 
wrote:
>
> On Fri, Mar 10, 2023 at 3:42 PM John Naylor
>  wrote:

> > I'd suggest sharing your todo list in the meanwhile, it'd be good to
> > discuss what's worth doing and what is not.
>
> Apart from more rounds of reviews and tests, my todo items that need
> discussion and possibly implementation are:

Quick thoughts on these:

> * The memory measurement in radix trees and the memory limit in
> tidstores. I've implemented it in v30-0007 through 0009 but we need to
> review it. This is the highest priority for me.

Agreed.

> * Additional size classes. It's important for an alternative of path
> compression as well as supporting our decoupling approach. Middle
> priority.

I'm going to push back a bit and claim this doesn't bring much gain, while
it does have a complexity cost. The node1 from Andres's prototype is 32
bytes in size, same as our node3, so it's roughly equivalent as a way to
ameliorate the lack of path compression. I say "roughly" because the loop
in node3 is probably noticeably slower. A new size class will by definition
still use that loop.

About a smaller node125-type class: I'm actually not even sure we need to
have any sub-max node bigger than about 64 (node size 768 bytes). I'd just
let 65+ go to the max node -- there won't be many of them, at least in the
synthetic workloads we've seen so far.

> * Node shrinking support. Low priority.

This is an architectural wart that's been neglected since the tid store
doesn't perform deletion. We'll need it sometime. If we're not going to
make this work, why ship a deletion API at all?

I took a look at this a couple weeks ago, and fixing it wouldn't be that
hard. I even had an idea of how to detect when to shrink size class within
a node kind, while keeping the header at 5 bytes. I'd be willing to put
effort into that, but to have a chance of succeeding, I'm unwilling to make
it more difficult by adding more size classes at this point.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Add LZ4 compression in pg_dump

2023-03-11 Thread Alexander Lakhin

Hi Georgios,

11.03.2023 13:50, gkokola...@pm.me wrote:

> I cannot answer for the buildfarms. Do you think that adding an explicit
> check for this warning in meson would help? I am a bit uncertain, as I
> think that -Wtype-limits is included in -Wextra.
>
> @@ -1748,6 +1748,7 @@ common_warning_flags = [
>    '-Wshadow=compatible-local',
>    # This was included in -Wall/-Wformat in older GCC versions
>    '-Wformat-security',
> +  '-Wtype-limits',
>  ]

I'm not sure that I can promote additional checks (or determine where
to put them), but if some patch introduces a warning of a type that wasn't
present before, I think it's worth eliminating the warning (if it is
sensible) to keep the source-code check baseline at the same level,
or even to raise it gradually.
I've also found that the same commit introduced a single instance of
the analyzer-possible-null-argument warning:
CPPFLAGS="-Og -fanalyzer -Wno-analyzer-malloc-leak -Wno-analyzer-file-leak 
-Wno-analyzer-null-dereference -Wno-analyzer-shift-count-overflow 
-Wno-analyzer-free-of-non-heap -Wno-analyzer-null-argument 
-Wno-analyzer-double-free -Wanalyzer-possible-null-argument" ./configure 
--with-lz4 -q && make -s -j8

compress_io.c: In function ‘hasSuffix’:
compress_io.c:158:47: warning: use of possibly-NULL ‘filename’ where non-null 
expected [CWE-690] [-Wanalyzer-possible-null-argument]

  158 | int filenamelen = strlen(filename);
  | ^~~~
  ‘InitDiscoverCompressFileHandle’: events 1-3
...

(I use gcc-11.3.)
As far as I can see, many existing uses of strdup() are followed by a check
for a null result, so maybe it's common practice and a similar check should
be added in InitDiscoverCompressFileHandle().
(There are also a couple of other warnings introduced by the lz4 compression
patches, but those are not unique, so maybe they aren't worth fixing.)


> It is a good thing that the restore fails with bad input. Yet it should
> have failed earlier. The attached makes certain it does fail earlier.


Thanks! Your patch definitely fixes the issue.

Best regards,
Alexander


Re: Transparent column encryption

2023-03-11 Thread Mark Dilger



> On Mar 9, 2023, at 11:18 PM, Peter Eisentraut 
>  wrote:
> 
> Here is an updated patch.

Thanks, Peter.  The patch appears to be in pretty good shape, but I have a few 
comments and concerns.

CEKIsVisible() and CMKIsVisible() are obviously copied from 
TSParserIsVisible(), and the code comments weren't fully updated.  
Specifically, the phrase "hidden by another parser of the same name" should be 
updated to not mention "parser".

Why does get_cmkalg_name() return the string "unspecified" for 
PG_CMK_UNSPECIFIED, but the next function get_cmkalg_jwa_name() returns NULL 
for PG_CMK_UNSPECIFIED?  It seems they would both return NULL, or both return 
"unspecified".  If there's a reason for the divergence, could you add a code 
comment to clarify?  BTW, get_cmkalg_jwa_name() has no test coverage.
 
Looking further at code coverage, the new conditional in printsimple_startup() 
is never tested with (MyProcPort->column_encryption_enabled), so the block is 
never entered.  This would seem to be a consequence of backends like walsender 
not using column encryption, which is not terribly surprising, but it got me 
wondering if you had a particular client use case in mind when you added this 
block?

The new function pg_encrypted_in() appears totally untested, but I have to 
wonder if that's because we're not round-trip testing pg_dump with column 
encryption...?  The code coverage in pg_dump looks fairly decent, but some 
column encryption code is not covered.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [Proposal] Allow pg_dump to include all child tables with the root table

2023-03-11 Thread Gilles Darold

On 04/03/2023 at 20:18, Tom Lane wrote:

> As noted, "childs" is bad English and "partitions" is flat out wrong
> (unless you change it to recurse only to partitions, which doesn't
> seem like a better definition).  We could go with
> --[exclude-]table-and-children, or maybe
> --[exclude-]table-and-child-tables, but those are getting into
> carpal-tunnel-syndrome-inducing territory 🙁.  I lack a better
> naming suggestion offhand.



Attached is version 3 of the patch; it uses the option names suggested by
Stephane and Tom:

    --table-and-children
    --exclude-table-and-children
    --exclude-table-data-and-children

Documentation has been updated too.


Thanks

--
Gilles Darold
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 334e4b7fd1..e97b73194e 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -775,6 +775,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --exclude-table-and-children=pattern
+  
+   
+Same as -T/--exclude-table option but automatically
+exclude partitions or child tables of the specified tables if any.
+   
+  
+ 
+
  
   --exclude-table-data=pattern
   
@@ -793,6 +803,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --exclude-table-data-and-children=pattern
+  
+   
+Same as --exclude-table-data  but automatically
+exclude partitions or child tables of the specified tables if any.
+   
+  
+ 
+
+
  
   --extra-float-digits=ndigits
   
@@ -1158,6 +1179,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --table-with-children=pattern
+  
+   
+Same as -t/--table option but automatically
+include partitions or child tables of the specified tables if any.
+   
+  
+ 
+
  
   --use-set-session-authorization
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 4217908f84..09d37991d6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -130,6 +130,10 @@ static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 static SimpleStringList extension_include_patterns = {NULL, NULL};
 static SimpleOidList extension_include_oids = {NULL, NULL};
 
+static SimpleStringList table_include_patterns_and_children = {NULL, NULL};
+static SimpleStringList table_exclude_patterns_and_children = {NULL, NULL};
+static SimpleStringList tabledata_exclude_patterns_and_children = {NULL, NULL};
+
 static const CatalogId nilCatalogId = {0, 0};
 
 /* override for standard extra_float_digits setting */
@@ -180,7 +184,8 @@ static void expand_foreign_server_name_patterns(Archive *fout,
 static void expand_table_name_patterns(Archive *fout,
 	   SimpleStringList *patterns,
 	   SimpleOidList *oids,
-	   bool strict_names);
+	   bool strict_names,
+	   bool with_child_tables);
 static void prohibit_crossdb_refs(PGconn *conn, const char *dbname,
   const char *pattern);
 
@@ -421,6 +426,9 @@ main(int argc, char **argv)
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
+		{"table-and-children", required_argument, NULL, 12},
+		{"exclude-table-and-children", required_argument, NULL, 13},
+		{"exclude-table-data-and-children", required_argument, NULL, 14},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -631,6 +639,19 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* include table(s) with children */
+simple_string_list_append(&table_include_patterns_and_children, optarg);
+dopt.include_everything = false;
+break;
+
+			case 13:			/* exclude table(s) with children */
+simple_string_list_append(&table_exclude_patterns_and_children, optarg);
+break;
+
+			case 14:/* exclude table(s) data */
+simple_string_list_append(&tabledata_exclude_patterns_and_children, optarg);
+break;
+
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -814,17 +835,34 @@ main(int argc, char **argv)
 	{
 		expand_table_name_patterns(fout, &table_include_patterns,
    &table_include_oids,
-   strict_names);
+   strict_names, false);
 		if (table_include_oids.head == NULL)
 			pg_fatal("no matching tables were found");
 	}
 	expand_table_name_patterns(fout, &table_exclude_patterns,
 			   &table_exclude_oids,
-			   false);
+			   false, false);
 
 	expand_table_name_patterns(fout, &tabledata_exclude_patterns,
 			   &tabledata_exclude_oids,
-			   false);
+			   false, false);
+
+	/* Expand table and children selection patterns into OID lists */
+	if (table_include_patterns_and_children.head != NULL)
+	{
+		expand_table_name_patterns(fout, &table_include_patterns_and_children,
+   &table_in

Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-11 Thread Justin Pryzby
On Sat, Mar 11, 2023 at 09:55:33AM -0500, Melanie Plageman wrote:
> Subject: [PATCH v3 2/3] use shared buffers when failsafe active
> 
> + /*
> +  * Assume the caller who allocated the memory for the
> +  * BufferAccessStrategy object will free it.
> +  */
> + vacrel->bstrategy = NULL;

This comment could use elaboration:

** VACUUM normally restricts itself to a small ring buffer; but in
failsafe mode, in order to process tables as quickly as possible, allow
leaving behind a large number of dirty buffers.

> Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc

>  #define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var))))
> +#define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ)

Macros are normally capitalized.

It's a good idea to write "(bufsize)", in case someone passes "a+b".

> @@ -586,6 +587,45 @@ GetAccessStrategy(BufferAccessStrategyType btype)
> +BufferAccessStrategy
> +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size)

Maybe it would make sense for GetAccessStrategy() to call
GetAccessStrategyWithSize().  Or maybe not.

> + {"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM,
> + gettext_noop("Sets the buffer pool size for operations 
> employing a buffer access strategy."),

The description should mention vacuum, if that's the scope of the GUC's
behavior.

> +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring.
> + # -1 to use default,
> + # 0 to disable vacuum buffer access strategy 
> and use shared buffers
> + # > 0 to specify size

If I'm not wrong, there's still no documentation about "ring buffers" or
postgres' "strategy".  Which seems important to do for this patch, along
with other documentation.

This patch should add support in vacuumdb.c.  And maybe a comment about
adding support there, since it's annoying when the release notes one year
say "support VACUUM (FOO)" and then one year later say "support
vacuumdb --foo".

-- 
Justin




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-03-11 Thread Nitin Jadhav
> [1]
> subscription tests:
> PATCHED: WAL buffers hit - 1972, misses - 32616

Can you share more details about the test here?

I went through the v8 patch. Following are my thoughts to improve the
WAL buffer hit ratio.

Currently the no-longer-needed WAL data present in WAL buffers gets
cleared in XLogBackgroundFlush() which is called based on the
wal_writer_delay config setting. Once the data is flushed to the disk,
it is treated as no-longer-needed and it will be cleared as soon as
possible based on some config settings. I have done some testing by
tweaking the wal_writer_delay config setting to confirm the behaviour.
We can see that the WAL buffer hit ratio is good when the
wal_writer_delay is big enough [2] compared to smaller
wal_writer_delay [1]. So irrespective of the wal_writer_delay
settings, we should keep the WAL data in the WAL buffers as long as
possible so that all the readers (Mainly WAL senders) can take
advantage of this. The WAL page should be evicted from the WAL buffers
only when the WAL buffer is full and we need room for the new page.
The patch attached takes care of this. We can see the improvement in the
WAL buffer hit ratio even when wal_writer_delay is set to a lower
value [3].

Second, in WALRead() we try to read the data from disk whenever we don't
find it in the WAL buffers, but we don't store what we read back into the
WAL buffers; we just read the data, use it and leave it. If we stored this
data in the WAL buffers, we might avoid a few disk reads.

[1]:
wal_buffers=1GB
wal_writer_delay=1ms
./pgbench --initialize --scale=300 postgres

WAL buffers hit=5046
WAL buffers miss=56767

[2]:
wal_buffers=1GB
wal_writer_delay=10s
./pgbench --initialize --scale=300 postgres

WAL buffers hit=45454
WAL buffers miss=14064

[3]:
wal_buffers=1GB
wal_writer_delay=1ms
./pgbench --initialize --scale=300 postgres

WAL buffers hit=37214
WAL buffers miss=844

Please share your thoughts.

Thanks & Regards,
Nitin Jadhav

On Tue, Mar 7, 2023 at 12:39 PM Bharath Rupireddy
 wrote:
>
> On Tue, Mar 7, 2023 at 3:30 AM Nathan Bossart  
> wrote:
> >
> > +void
> > +XLogReadFromBuffers(XLogRecPtr startptr,
> >
> > Since this function presently doesn't return anything, can we have it
> > return the number of bytes read instead of storing it in a pointer
> > variable?
>
> Done.
>
> > +ptr = startptr;
> > +nbytes = count;
> > +dst = buf;
> >
> > These variables seem superfluous.
>
> Needed startptr and count for DEBUG1 message and assertion at the end.
> Removed dst and used buf in the new patch now.
>
> > +/*
> > + * Requested WAL isn't available in WAL buffers, so return with
> > + * what we have read so far.
> > + */
> > +break;
> >
> > nitpick: I'd move this to the top so that you can save a level of
> > indentation.
>
> Done.
>
> > +/*
> > + * All the bytes are not in one page. Read available bytes 
> > on
> > + * the current page, copy them over to output buffer and
> > + * continue to read remaining bytes.
> > + */
> >
> > Is it possible to memcpy more than a page at a time?
>
> It would complicate things a lot there; the logic to figure out the
> last page bytes that may or may not fit in the whole page gets
> complicated. Also, the logic to verify each page's header gets
> complicated. We might lose out if we memcpy all the pages at once and
> start verifying each page's header in another loop.
>
> I would like to keep it simple - read a single page from WAL buffers,
> verify it and continue.
>
> > +/*
> > + * The fact that we acquire WALBufMappingLock while reading 
> > the WAL
> > + * buffer page itself guarantees that no one else initializes 
> > it or
> > + * makes it ready for next use in AdvanceXLInsertBuffer().
> > + *
> > + * However, we perform basic page header checks for ensuring 
> > that
> > + * we are not reading a page that just got initialized. Callers
> > + * will anyway perform extensive page-level and record-level
> > + * checks.
> > + */
> >
> > Hm.  I wonder if we should make these assertions instead.
>
> Okay. I added XLogReaderValidatePageHeader for assert-only builds
> which will help catch any issues there. But we can't perform record
> level checks here because this function doesn't know where the record
> starts from, it knows only pages. This change required us to pass in
> XLogReaderState to XLogReadFromBuffers. I marked it as
> PG_USED_FOR_ASSERTS_ONLY and did page header checks only when it is
> passed as non-null so that someone who doesn't have XLogReaderState
> can still read from buffers.
>
> > +elog(DEBUG1, "read %zu bytes out of %zu bytes from WAL buffers for 
> > given LSN %X/%X, Timeline ID %u",
> > + *read_bytes, count, LSN_FORMAT_ARGS(startptr), tli);
> >
> > I definitely don't think we should

Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-11 Thread Önder Kalacı
Hi all,

(cc'ed Amit as he has the context)

While working on [1], I realized that on HEAD there is a problem with the
$subject.  Here is the relevant discussion on the thread [2]. Quoting my
own notes on that thread below;

I realized that the dropped columns also get into the tuples_equal()
> function. And,
> the remote sends NULL for the dropped columns (i.e., remoteslot), but
> index_getnext_slot() (or table_scan_getnextslot) indeed fills the dropped
> columns in the outslot. So, the dropped columns are not NULL in the outslot.


Amit also suggested checking generated columns, which indeed has the same
problem.

Here are the steps to repro the problem with dropped columns:

- pub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3
timestamptz);
ALTER TABLE test REPLICA IDENTITY FULL;
INSERT INTO test SELECT NULL, i, i, (i)::text, now() FROM
generate_series(0,1)i;
CREATE PUBLICATION pub FOR ALL TABLES;

-- sub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3
timestamptz);
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION pub;

-- show that before dropping the columns, the data in the source and
-- target are deleted properly
DELETE FROM test WHERE x = 0;

-- both on the source and target
SELECT count(*) FROM test WHERE x = 0;
┌───┐
│ count │
├───┤
│ 0 │
└───┘
(1 row)

-- drop columns on the source
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- drop columns on the target
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- on the target
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;

-- after dropping the columns
DELETE FROM test WHERE x = 1;

-- source
SELECT count(*) FROM test WHERE x = 1;
┌───┐
│ count │
├───┤
│ 0 │
└───┘
(1 row)


-- target, OOPS wrong result
SELECT count(*) FROM test WHERE x = 1;
┌───┐
│ count │
├───┤
│ 1 │
└───┘
(1 row)



Attaching a patch that could possibly solve the problem.

Thanks,
Onder KALACI


[1]
https://www.postgresql.org/message-id/flat/CACawEhUN%3D%2BvjY0%2B4q416-rAYx6pw-nZMHQYsJZCftf9MjoPN3w%40mail.gmail.com#2f7fa76f9e4496e3b52a9be6736e5b43
[2]
https://www.postgresql.org/message-id/CACawEhUu6S8E4Oo7%2Bs5iaq%3DyLRZJb6uOZeEQSGJj-7NVkDzSaw%40mail.gmail.com


v1-0001-Skip-dropped-and-generated-columns-when-REPLICA-I.patch
Description: Binary data


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-11 Thread Önder Kalacı
Hi Amit, all


> > This triggers tuples_equal to fail. To fix that, I improved the
> tuples_equal
> > such that it skips the dropped columns.
> >
>
>  By any chance, have you tried with generated columns?


Yes, it shows the same behavior.


> See
> logicalrep_write_tuple()/logicalrep_write_attrs() where we neither
> send anything for dropped columns nor for generated columns.

> Similarly, on the receiving side, in logicalrep_rel_open() and
> slot_store_data(), we seem to be using NULL for such columns.
>
>
Thanks for the explanation, it helps a lot.


>
> Yes, it would be better to report and discuss this in a separate thread,
>

Done via [1]

>
> > Attached as v40_0001 on the patch.
> >
> > Note that I need to have that commit as 0001 so that 0002 patch
> > passes the tests.
> >
>
> I think we can add such a test (which relies on existing buggy
> behavior) later after fixing the existing bug. For now, it would be
> better to remove that test and add it after we fix dropped columns
> issue in HEAD.
>

Alright, when I push the next version (hopefully tomorrow), I'll follow
this suggestion.

Thanks,
Onder KALACI

[1]
https://www.postgresql.org/message-id/CACawEhVQC9WoofunvXg12aXtbqKnEgWxoRx3%2Bv8q32AWYsdpGg%40mail.gmail.com


Re: buildfarm + meson

2023-03-11 Thread Andres Freund
Hi,

On 2023-03-09 18:31:10 -0500, Andrew Dunstan wrote:
> Another thing: the test for uuid.h is too strict. On Fedora 36 the OSSP
> header is in /usr/include, not /usr/include/ossp (I got around that for now
> by symlinking it, but obviously that's a nasty hack we can't ask people to
> do)

Yea, that was just wrong. It happened to work on debian and a few other OSs,
but ossp's .pc puts whatever the right directory is into the include
path. Pushed the fairly obvious fix.

Greetings,

Andres Freund




Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

2023-03-11 Thread Karl O. Pinc
Hi Alvaro,

On Thu, 9 Mar 2023 10:22:49 +0100
Alvaro Herrera  wrote:

> On 2023-Jan-22, Karl O. Pinc wrote:
> 
> > Actually, this CSS, added to doc/src/sgml/stylesheet.css,
> > makes the column spacing look pretty good:
> Okay, this looks good to me too.  However, for it to actually work, we
> need to patch the corresponding CSS file in the pgweb repository too.
> I'll follow up in the other mailing list.

Do you also like the page breaking in the PDF for each
contributed package, per the
v10-0002-Page-break-before-sect1-in-contrib-appendix-when-pdf.patch
of
https://www.postgresql.org/message-id/20230122144246.0ff87372%40slate.karlpinc.com
?

No need to reply if I don't need to do anything.  (I didn't
want the patch to get lost.)

Thanks for the review.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2023-03-11 Thread Justin Pryzby
On Thu, Dec 15, 2022 at 10:13:23AM -0600, Justin Pryzby wrote:
> Rebased on c727f511b.

Rebased on 30a53b792.
With minor changes including fixes to an intermediate patch.

> This patch record was "closed for lack of interest", but I think what's
> actually needed is committer review of which approach to take.
> 
>  - add backend functions but do not modify psql ?
>  - add to psql slash-plus commnds ?
>  - introduce psql double-plus commands for new options ?
>  - change pre-existing psql plus commands to only show size with
>double-plus ?
>  - go back to the original, two-line client-side sum() ?
> 
> Until then, the patchset is organized with those questions in mind.
>From cf1596faa449ff1bd1d8b56306118d5eac764291 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 13 Jul 2021 21:25:48 -0500
Subject: [PATCH 1/4] Add pg_am_size(), pg_namespace_size() ..

See also: 358a897fa, 528ac10c7
---
 src/backend/utils/adt/dbsize.c  | 132 
 src/include/catalog/pg_proc.dat |  19 +
 2 files changed, 151 insertions(+)

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index e5c0f1c45b6..af0955d1790 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -13,19 +13,25 @@
 
 #include 
 
+#include "access/genam.h"
 #include "access/htup_details.h"
 #include "access/relation.h"
+#include "access/table.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_database.h"
+#include "catalog/pg_namespace.h"
 #include "catalog/pg_tablespace.h"
 #include "commands/dbcommands.h"
+#include "commands/defrem.h"
 #include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/rel.h"
 #include "utils/relfilenumbermap.h"
@@ -858,6 +864,132 @@ pg_size_bytes(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(result);
 }
 
+/*
+ * Return the sum of size of relations for which the given attribute of
+ * pg_class matches the specified OID value.
+ */
+static int64
+calculate_size_attvalue(AttrNumber attnum, Oid attval)
+{
+	int64		totalsize = 0;
+	ScanKeyData skey;
+	Relation	pg_class;
+	SysScanDesc scan;
+	HeapTuple	tuple;
+
+	ScanKeyInit(&skey, attnum,
+BTEqualStrategyNumber, F_OIDEQ, attval);
+
+	pg_class = table_open(RelationRelationId, AccessShareLock);
+	scan = systable_beginscan(pg_class, InvalidOid, false, NULL, 1, &skey);
+	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+	{
+		Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
+		Relation	rel;
+
+		rel = try_relation_open(classtuple->oid, AccessShareLock);
+		if (!rel)
+			continue;
+
+		for (ForkNumber forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+			totalsize += calculate_relation_size(&(rel->rd_locator), rel->rd_backend, forkNum);
+
+		relation_close(rel, AccessShareLock);
+	}
+
+	systable_endscan(scan);
+	table_close(pg_class, AccessShareLock);
+	return totalsize;
+}
+
+/* Compute the size of relations in a schema (namespace) */
+static int64
+calculate_namespace_size(Oid nspOid)
+{
+	/*
+	 * User must be a member of pg_read_all_stats or have CREATE privilege for
+	 * target namespace.
+	 */
+	if (!is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+	{
+		AclResult	aclresult;
+
+		aclresult = object_aclcheck(NamespaceRelationId, nspOid, GetUserId(), ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_SCHEMA,
+		   get_namespace_name(nspOid));
+	}
+
+	return calculate_size_attvalue(Anum_pg_class_relnamespace, nspOid);
+}
+
+Datum
+pg_namespace_size_oid(PG_FUNCTION_ARGS)
+{
+	Oid			nspOid = PG_GETARG_OID(0);
+	int64		size;
+
+	size = calculate_namespace_size(nspOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
+Datum
+pg_namespace_size_name(PG_FUNCTION_ARGS)
+{
+	Name		nspName = PG_GETARG_NAME(0);
+	Oid			nspOid = get_namespace_oid(NameStr(*nspName), false);
+	int64		size;
+
+	size = calculate_namespace_size(nspOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
+/* Compute the size of relations using the given access method */
+static int64
+calculate_am_size(Oid amOid)
+{
+	/* XXX acl_check? */
+
+	return calculate_size_attvalue(Anum_pg_class_relam, amOid);
+}
+
+Datum
+pg_am_size_oid(PG_FUNCTION_ARGS)
+{
+	Oid			amOid = PG_GETARG_OID(0);
+	int64		size;
+
+	size = calculate_am_size(amOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
+Datum
+pg_am_size_name(PG_FUNCTION_ARGS)
+{
+	Name		amName = PG_GETARG_NAME(0);
+	Oid			amOid = get_am_oid(NameStr(*amName), false);
+	int64		size;
+
+	size = calculate_am_size(amOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
 /*
  * Get the filenode of a relation
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1351d4be67c

Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-03-11 Thread Andres Freund
Hi,

On 2023-03-09 12:15:16 -0800, Mark Dilger wrote:
> > Somewhat random note:
> > 
> > Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's
> > effectively O(ROWCOUNT^2), albeit with small enough constants to not really
> > matter. I don't think we need to insert the rows one-by-one either. Changing
> > that to a single INSERT and FREEZE shaves 10-12% off the tests.  I didn't
> > change that, but we also fire off a psql for each tuple for 
> > heap_page_items(),
> > with offset $N no less. That seems to be another 500ms.
> 
> I don't recall the reasoning.  Feel free to optimize the tests.

Something like the attached.

I don't know enough perl to know how to interpolate something like
use constant ROWCOUNT => 17;
so I just made it a variable.

Greetings,

Andres Freund
>From a01e1481505e74097112c0ea358e7e0eef6a5684 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 11 Mar 2023 15:15:35 -0800
Subject: [PATCH v1] pg_amcheck: Minor test speedups

Discussion: https://postgr.es/m/20230309001558.b7shzvio645eb...@awork3.anarazel.de
---
 src/bin/pg_amcheck/t/004_verify_heapam.pl | 33 +++
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl
index 9984d0d9f87..e5ae7e6aada 100644
--- a/src/bin/pg_amcheck/t/004_verify_heapam.pl
+++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl
@@ -217,17 +217,17 @@ my $rel = $node->safe_psql('postgres',
 my $relpath = "$pgdata/$rel";
 
 # Insert data and freeze public.test
-use constant ROWCOUNT => 17;
+my $ROWCOUNT = 17;
 $node->safe_psql(
 	'postgres', qq(
 	INSERT INTO public.test (a, b, c)
-		VALUES (
+		SELECT
 			x'DEADF9F9DEADF9F9'::bigint,
 			'abcdefg',
 			repeat('w', 1)
-		);
-	VACUUM FREEZE public.test
-	)) for (1 .. ROWCOUNT);
+FROM generate_series(1, $ROWCOUNT);
+	VACUUM FREEZE public.test;)
+);
 
 my $relfrozenxid = $node->safe_psql('postgres',
 	q(select relfrozenxid from pg_class where relname = 'test'));
@@ -246,16 +246,13 @@ if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
 }
 
 # Find where each of the tuples is located on the page.
-my @lp_off;
-for my $tup (0 .. ROWCOUNT - 1)
-{
-	push(
-		@lp_off,
-		$node->safe_psql(
-			'postgres', qq(
-select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
-	offset $tup limit 1)));
-}
+my @lp_off = split '\n', $node->safe_psql(
+	'postgres', qq(
+	select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+		where lp <= $ROWCOUNT
+)
+);
+is(scalar @lp_off, $ROWCOUNT, "acquired row offsets");
 
 # Sanity check that our 'test' table on disk layout matches expectations.  If
 # this is not so, we will have to skip the test until somebody updates the test
@@ -267,7 +264,7 @@ open($file, '+<', $relpath)
 binmode $file;
 
 my $ENDIANNESS;
-for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+for (my $tupidx = 0; $tupidx < $ROWCOUNT; $tupidx++)
 {
 	my $offnum = $tupidx + 1;# offnum is 1-based, not zero-based
 	my $offset = $lp_off[$tupidx];
@@ -345,7 +342,7 @@ open($file, '+<', $relpath)
   or BAIL_OUT("open failed: $!");
 binmode $file;
 
-for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+for (my $tupidx = 0; $tupidx < $ROWCOUNT; $tupidx++)
 {
 	my $offnum = $tupidx + 1;# offnum is 1-based, not zero-based
 	my $offset = $lp_off[$tupidx];
@@ -522,7 +519,7 @@ for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
 		$tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
 
 		push @expected,
-  qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID 0:\d+/;
+		  qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID 0:\d+/;
 	}
 	write_tuple($file, $offset, $tup);
 }
-- 
2.38.0



Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-03-11 Thread Mark Dilger



> On Mar 11, 2023, at 3:22 PM, Andres Freund  wrote:
> 
> Something like the attached.

I like that your patch doesn't make the test longer.  I assume you've already 
run the tests and that it works.

> I don't know enough perl to know how to interpolate something like
> use constant ROWCOUNT => 17;
> so I just made it a variable.

Seems fair.  I certainly don't mind.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







slapd logs to syslog during tests

2023-03-11 Thread Andres Freund
Hi,

On my buildfarm host (for all my animals) I noted that slapd was by far the
biggest contributor to syslog. Even though there's not normally slapd
running. It's of course the slapds started by various tests.

Would anybody mind if I add 'logfile_only' to slapd's config in LdapServer.pm?
That still leaves a few log lines from before the config file parsing, but it's
a lot better than all requests getting logged.

Obviously I also could reconfigure syslog to just filter this stuff, but it
seems that the tests shouldn't spam like that.

Greetings,

Andres Freund




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-03-11 Thread Andres Freund
Hi,

On 2023-03-11 15:34:55 -0800, Mark Dilger wrote:
> > On Mar 11, 2023, at 3:22 PM, Andres Freund  wrote:
> > 
> > Something like the attached.
> 
> I like that your patch doesn't make the test longer.  I assume you've already 
> run the tests and that it works.

I did check that, yes :). My process of writing perl is certainly, uh,
iterative. No way I would get anything close to working without testing it.

CI now finished the tests as well:
https://cirrus-ci.com/build/6675457702100992

So I'll go ahead and push that.

Greetings,

Andres Freund




Re: [BUG] pg_stat_statements and extended query protocol

2023-03-11 Thread Imseih (AWS), Sami
> If I remove this patch and recompile again, then "initdb -D $PGDATA" works.

It appears you must "make clean; make install" to correctly compile after
applying the patch.

Regards,

Sami Imseih
Amazon Web Services (AWS)











Re: Transparent column encryption

2023-03-11 Thread Andres Freund
Hi,

On 2023-03-10 08:18:34 +0100, Peter Eisentraut wrote:
> Here is an updated patch.  I have done some cosmetic polishing and fixed a
> minor Windows-related bug.
> 
> In my mind, the patch is complete.
> 
> If someone wants to do some in-depth code review, I suggest focusing on the
> following files:
> 
> * src/backend/access/common/printtup.c

Have you done benchmarks of some simple workloads to verify this doesn't cause
slowdowns (when not using encryption, obviously)? printtup.c is a performance
sensitive portion for simple queries, particularly when they return multiple
columns.

And making tupledescs even wider is likely to have some price, both due to the
increase in memory usage, and due to the lower cache density - and that's code
where we're already hurting noticeably.

Greetings,

Andres Freund




Re: unsafe_tests module

2023-03-11 Thread Andres Freund
Hi,

On 2023-02-22 06:47:34 -0500, Andrew Dunstan wrote:
> Given its nature and purpose as a module we don't want to run against an
> installed instance, shouldn't src/test/modules/unsafe_tests have
> NO_INSTALLCHECK=1 in its Makefile and runningcheck:false in its meson.build?

Seems like a good idea to me.

Greetings,

Andres Freund




Re: slapd logs to syslog during tests

2023-03-11 Thread Andrew Dunstan



> On Mar 11, 2023, at 6:37 PM, Andres Freund  wrote:
> 
> Hi,
> 
> On my buildfarm host (for all my animals) I noted that slapd was by far the
> biggest contributor to syslog. Even though there's not normally slapd
> running. It's of course the slapds started by various tests.
> 
> Would anybody mind if I add 'logfile_only' to slapd's config in LdapServer.pm?
> That still leaves a few logline, from before the config file parsing, but it's
> a lot better than all requests getting logged.

Makes sense

Cheers 

Andrew




Re: [BUG] pg_stat_statements and extended query protocol

2023-03-11 Thread Michael Paquier
On Sat, Mar 11, 2023 at 11:55:22PM +, Imseih (AWS), Sami wrote:
> It appears you must "make clean; make install" to correctly compile after
> applying the patch.

In a git repository, I've learnt to rely on this simple formula, even
if it means extra cycles when running ./configure:
git clean -d -x -f
--
Michael


signature.asc
Description: PGP signature


Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken

2023-03-11 Thread Thomas Munro
On Fri, Mar 10, 2023 at 1:05 PM Thomas Munro  wrote:
> On Fri, Mar 10, 2023 at 11:37 AM Nathan Bossart
>  wrote:
> > On Thu, Mar 09, 2023 at 05:27:08PM -0500, Tom Lane wrote:
> > > Is it reasonable to assume that all modern platforms can time
> > > millisecond delays accurately?  Ten years ago I'd have suggested
> > > truncating the delay to a multiple of 10msec and using this logic
> > > to track the remainder, but maybe now that's unnecessary.
> >
> > If so, it might also be worth updating or removing this comment in
> > pgsleep.c:
> >
> >  * NOTE: although the delay is specified in microseconds, the effective
> >  * resolution is only 1/HZ, or 10 milliseconds, on most Unixen.  Expect
> >  * the requested delay to be rounded up to the next resolution boundary.
> >
> > I've had doubts for some time about whether this is still accurate...

Unfortunately I was triggered by this Unix archeology discussion, and
wasted some time this weekend testing every Unix we target.  I found 3
groups:

1.  OpenBSD, NetBSD: Like the comment says, kernel ticks still control
sleep resolution.  I measure an average time of ~20ms when I ask for
1ms sleeps in a loop with select() or nanosleep().  I don't actually
understand why it's not ~10ms because HZ is 100 on these systems, but
I didn't look harder.

2.  AIX, Solaris, illumos: select() can sleep for 1ms accurately, but
not fractions of 1ms.  If you use nanosleep() instead of select(),
then AIX joins the third group (huh, maybe it's just that its
select(us) calls poll(ms) under the covers?), but Solaris does not
(maybe it's still tick-based, but HZ == 1000?).

3.  Linux, FreeBSD, macOS: sub-ms sleeps are quite accurate (via
various system calls).

I didn't test Windows but it sounds a lot like it is in group 1 if you
use WaitForMultipleObjects() or SleepEx(), as we do.

You can probably tune some of the above; for example FreeBSD can go
back to the old way with kern.eventtimer.periodic=1 to get a thousand
interrupts per second (kern.hz) instead of programming a hardware
timer to get an interrupt at just the right time, and then 0.5ms sleep
requests get rounded to an average of 1ms, just like on Solaris.  And
power usage probably goes up.

As for what to do about it, I dunno, how about this?

  * NOTE: although the delay is specified in microseconds, the effective
- * resolution is only 1/HZ, or 10 milliseconds, on most Unixen.  Expect
- * the requested delay to be rounded up to the next resolution boundary.
+ * resolution is only 1/HZ on systems that use periodic kernel ticks to limit
+ * sleeping.  This may cause sleeps to be rounded up by as much as 1-20
+ * milliseconds on old Unixen and Windows.

As for the following paragraph about the dangers of select() and
interrupts and restarts, I suspect it is describing the HP-UX
behaviour (a dropped platform), which I guess must have led to POSIX's
reluctance to standardise that properly, but in any case all
hypothetical concerns would disappear if we just used POSIX
[clock_]nanosleep(), no?  It has defined behaviour on signals, and it
also tells you the remaining time (if we cared, which we wouldn't).




Re: Testing autovacuum wraparound (including failsafe)

2023-03-11 Thread Peter Geoghegan
On Wed, Mar 8, 2023 at 10:47 PM Michael Paquier  wrote:
> I may be missing something, but you cannot use directly a "postgres"
> command in a TAP test, can you?  See 1a9d802, that has fixed a problem
> when TAP tests run with a privileged account on Windows.

I was joking. But I did have a real point: once we have tests for the
xidStopLimit mechanism, why not take the opportunity to correct the
long standing issue with the documentation advising the use of single
user mode?

-- 
Peter Geoghegan




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-11 Thread Amit Kapila
On Sun, Mar 12, 2023 at 1:34 AM Önder Kalacı  wrote:
>
>>
>> I think we can add such a test (which relies on existing buggy
>> behavior) later after fixing the existing bug. For now, it would be
>> better to remove that test and add it after we fix dropped columns
>> issue in HEAD.
>
>
> Alright, when I push the next version (hopefully tomorrow), I'll follow this 
> suggestion.
>

Okay, thanks. See, if you can also include your changes in the patch
wip_for_optimize_index_column_match (after discussed modification).
Few other minor comments:

1.
+   are enforced for primary keys.  Internally, we follow a similar approach for
+   supporting index scans within logical replication scope.  If there are no

I think we can remove the above line: "Internally, we follow a similar
approach for supporting index scans within logical replication scope."
This didn't seem useful for users.

2.
diff --git a/src/backend/executor/execReplication.c
b/src/backend/executor/execReplication.c
index bc6409f695..646e608eb7 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -83,11 +83,8 @@ build_replindex_scan_key(ScanKey skey, Relation
rel, Relation idxrel,
if (!AttributeNumberIsValid(table_attno))
{
/*
-* XXX: For a non-primary/unique index with an
additional
-* expression, we do not have to continue at
this point. However,
-* the below code assumes the index scan is
only done for simple
-* column references. If we can relax the
assumption in the below
-* code-block, we can also remove the continue.
+* XXX: Currently, we don't support
expressions in the scan key,
+* see code below.
 */


I have tried to simplify the above comment. See, if that makes sense to you.

3.
/*
+ * We only need to allocate once. This is allocated within per
+ * tuple context -- ApplyMessageContext -- hence no need to
+ * explicitly pfree().
+ */

We normally don't write why we don't need to explicitly pfree. It is
good during review, but I'm not sure it is a good idea to keep it in
the final code.

4. I have modified the proposed commit message as follows; see if that
makes sense to you, and let me know if I missed anything, especially in
the review/author credit.

Allow the use of indexes other than PK and REPLICA IDENTITY on the subscriber.

Using REPLICA IDENTITY FULL on the publisher can lead to a full table
scan per tuple change on the subscription when REPLICA IDENTITY or PK
index is not available. This makes REPLICA IDENTITY FULL impractical
to use apart from some small number of use cases.

This patch allows using indexes other than PRIMARY KEY or REPLICA
IDENTITY on the subscriber during apply of update/delete. The index
that can be used must be a btree index, not a partial index, and it
must have at least one column reference (i.e. cannot consist of only
expressions). We can lift these restrictions in the future. There is
no smart mechanism to pick the index. If there is more than one index
that satisfies these requirements, we just pick the first one. We discussed
using some of the optimizer's low-level APIs for this but ruled it out
as that can be a maintenance burden in the long run.

This patch improves the performance in the vast majority of cases and
the improvement is proportional to the amount of data in the table.
However, there could be some regression in a small number of cases
where the indexes have a lot of duplicate and dead rows. It was
discussed that those are mostly impractical cases but we can provide a
table or subscription level option to disable this feature if
required.

Author: Onder Kalaci
Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda
Hayato, Amit Kapila
Discussion: 
https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqoniigz7g73hfys7+ng...@mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

2023-03-11 Thread Thomas Munro
And again:

TRAP: failed Assert("PMSignalState->PMChildFlags[slot] ==
PM_CHILD_ASSIGNED"), File: "../src/backend/storage/ipc/pmsigna...

https://cirrus-ci.com/task/6558324615806976
https://api.cirrus-ci.com/v1/artifact/task/6558324615806976/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/002_pg_upgrade_old_node.log
https://api.cirrus-ci.com/v1/artifact/task/6558324615806976/crashlog/crashlog-postgres.exe_0974_2023-03-11_13-57-27-982.txt




Re: [PoC] Implementation of distinct in Window Aggregates

2023-03-11 Thread Ankit Kumar Pandey


On 04/01/23 18:10, Ankit Kumar Pandey wrote:

On 29/12/22 20:58, Ankit Kumar Pandey wrote:
>
> On 24/12/22 18:22, Ankit Pandey wrote:
>> Hi,
>>
>> This is a PoC patch which implements distinct operation in window 
>> aggregates (without order by and for single column aggregation, final 
>> version may vary wrt these limitations). Purpose of this PoC is to 
>> get feedback on the approach used and corresponding implementation, 
>> any nitpicking as deemed reasonable.

>>
>> Distinct operation is mirrored from implementation in nodeAgg. 
>> Existing partitioning logic determines if row is in partition and 
>> when distinct is required, all tuples for the aggregate column are 
>> stored in tuplesort. When finalize_windowaggregate gets called, 
>> tuples are sorted and duplicates are removed, followed by calling the 
>> transition function on each tuple.
>> When distinct is not required, the above process is skipped and the 
>> transition function gets called directly and nothing gets inserted 
>> into tuplesort.
>> Note: For each partition, tuplesort_begin and tuplesort_end are 
>> invoked to reset the tuplesort, so at any time the number of tuples in 
>> the tuplesort is at most the number of tuples in that partition.

>>
>> I have verified it for integer and interval column aggregates (to 
>> rule out obvious issues related to data types).

>>
>> Sample cases:
>>
>> create table mytable(id int, name text);
>> insert into mytable values(1, 'A');
>> insert into mytable values(1, 'A');
>> insert into mytable values(5, 'B');
>> insert into mytable values(3, 'A');
>> insert into mytable values(1, 'A');
>>
>> select avg(distinct id) over (partition by name) from mytable;
>>         avg
>> --------------------
>>  2.0000000000000000
>>  2.0000000000000000
>>  2.0000000000000000
>>  2.0000000000000000
>>  5.0000000000000000
>>
>> select avg(id) over (partition by name) from mytable;
>>         avg
>> --------------------
>>  1.5000000000000000
>>  1.5000000000000000
>>  1.5000000000000000
>>  1.5000000000000000
>>  5.0000000000000000
>>
>> select avg(distinct id) over () from mytable;
>>         avg
>> --------------------
>>  3.0000000000000000
>>  3.0000000000000000
>>  3.0000000000000000
>>  3.0000000000000000
>>  3.0000000000000000
>>
>> select avg(distinct id)  from mytable;
>>         avg
>> --------------------
>>  3.0000000000000000
>>
>> This is my first contribution. Please let me know if anything 
>> can be improved, as I'm eager to learn.
>>
>> Regards,
>> Ankit Kumar Pandey
>
> Hi all,
>
> I know everyone is busy with holidays (well, Happy Holidays!) but I 
> will be glad if someone can take a quick look at this PoC and share 
> thoughts.

>
> This is my first-time contribution, so I am pretty sure there will be 
> some very obvious feedback (which will help me move forward with 
> this change).

>
>
Updated the patch against the latest master; the previous patch was a year old.


Attaching the patch rebased onto the latest HEAD.


Thanks,

Ankit
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 9240c691c1..7c07fb0684 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -155,13 +155,6 @@ typedef struct WindowStatePerAggData
 
 	int64		transValueCount;	/* number of currently-aggregated rows */
 
-	Datum		lastdatum;		/* used for single-column DISTINCT */
-	FmgrInfo	equalfnOne; /* single-column comparisons*/
-	Oid			*eq_ops;
-	Oid			*sort_ops;
-
-	bool sort_in;
-
 	/* Data local to eval_windowaggregates() */
 	bool		restart;		/* need to restart this agg in this cycle? */
 } WindowStatePerAggData;
@@ -171,7 +164,7 @@ static void initialize_windowaggregate(WindowAggState *winstate,
 	   WindowStatePerAgg peraggstate);
 static void advance_windowaggregate(WindowAggState *winstate,
 	WindowStatePerFunc perfuncstate,
-	WindowStatePerAgg peraggstate, Datum value, bool isNull);
+	WindowStatePerAgg peraggstate);
 static bool advance_windowaggregate_base(WindowAggState *winstate,
 		 WindowStatePerFunc perfuncstate,
 		 WindowStatePerAgg peraggstate);
@@ -181,9 +174,6 @@ static void finalize_windowaggregate(WindowAggState *winstate,
 	 Datum *result, bool *isnull);
 
 static void eval_windowaggregates(WindowAggState *winstate);
-static void process_ordered_aggregate_single(WindowAggState *winstate, 
-			 WindowStatePerFunc perfuncstate,
- 			 WindowStatePerAgg peraggstate);
 static void eval_windowfunction(WindowAggState *winstate,
 WindowStatePerFunc perfuncstate,
 Datum *result, bool *isnull);
@@ -241,7 +231,6 @@ initialize_windowaggregate(WindowAggState *winstate,
 	peraggstate->transValueIsNull = peraggstate->initValueIsNull;
 	peraggstate->transValueCount = 0;
 	peraggstate->resultValue = (Datum) 0;
-	peraggstate->lastdatum = (Datum) 0;
 	peraggstate->resultValueIsNull = true;
 }
 
@@ -252,21 +241,43 @@ initialize_windowaggregate(WindowAggState *winstate,
 static void