RE: Ability to reference other extensions by schema in extension scripts

2023-02-25 Thread Regina Obe
> So in conclusion we have 4 possible paths to go with this
> 
> 1) Just don't allow any extensions referenced by other extensions to be
> relocatable.
> It will show a message something like
> "SET SCHEMA not allowed because other extensions depend on it"
> Given that if you don't specify relocatable in your .control file, the
> assumption is relocatable = false, this isn't too far off from standard
> protocol.
> 
> 2) Use objsubid=1 to denote that another extension explicitly references
> the schema of another extension, so setting the schema of the other
> extension is not okay.
> So instead of introducing another dependency, we'd update the
> DEPENDENCY_NORMAL one between the two extensions with objsubid=1
> instead of 0.
> 
> This has 2 approaches:
> 
> a) Update the existing DEPENDENCY_NORMAL between the two extensions,
> setting objsubid=1
> 
> or
> b) Create a new DEPENDENCY_NORMAL between the two extensions with
> objsubid=1
> 
> I'm not sure if either has implications for backup / restore.  I suspect
> b would be safer, since I suspect objsubid might be checked and this
> dependency only needs checking during SET SCHEMA time.
> 
> 3) Create a whole new DEPENDENCY type, perhaps calling it something like
> DEPENDENCY_EXTENSION_SCHEMA
> 
> 4) Just don't allow @extschema:extensionname@ syntax to be used unless
> the extension is marked as relocatable=false.  This one I don't like
> because it doesn't solve my fundamental issue of postgis_tiger_geocoder
> relying on fuzzystrmatch, which is marked as relocatable.
> 
> The main issue I was trying to solve is that my extension references
> fuzzystrmatch functions in a function used for functional indexes, and
> this fails restore of table indexes because I can't schema-qualify the
> fuzzystrmatch extension in the backing function.
> 
> If no one has any opinion, I'll go with option 1, which is the one that
> strk had actually proposed before and seems least programmatically
> invasive, but perhaps more annoying user facing.
> 
> My preferred would be #2
> 
> Thanks,
> Regina

Attached is my revision 3 patch, which follows the proposed #1: don't
allow schema relocation of an extension if another extension requires it.
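
For illustration, the enforced behavior using the names from this thread
(a sketch; the exact error wording comes from the patch and may differ):

    CREATE EXTENSION postgis_tiger_geocoder CASCADE;
    -- NOTICE:  installing required extension "fuzzystrmatch"
    ALTER EXTENSION fuzzystrmatch SET SCHEMA other_schema;
    -- ERROR:  extension cannot be relocated because a dependent
    -- extension references its location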



0003-Allow-use-of-extschema-reqextname-to-reference.patch
Description: Binary data


Re: verbose mode for pg_input_error_message?

2023-02-25 Thread Michael Paquier
On Sat, Feb 25, 2023 at 08:58:17PM -0800, Nathan Bossart wrote:
> pg_input_error_info() seems more descriptive to me.  I changed the name to
> that in v4.

error_info() is fine by me.  My recent track record is poor when it
comes to naming new things.

+   values[0] = CStringGetTextDatum(escontext.error_data->message);
+
+   if (escontext.error_data->detail != NULL)
+   values[1] = CStringGetTextDatum(escontext.error_data->detail);
+   else
+   isnull[1] = true;
+
+   if (escontext.error_data->hint != NULL)
+   values[2] = CStringGetTextDatum(escontext.error_data->hint);
+   else
+   isnull[2] = true;
+
+   values[3] = CStringGetTextDatum(
+   unpack_sql_state(escontext.error_data->sqlerrcode));

I am OK with this data set as well.  If somebody makes a case about
more fields in ErrorData, we could always consider these separately.

FWIW, I would like to change some of the regression tests while we are
bikeshedding the whole thing.

+SELECT pg_input_error_info(repeat('too_long', 32), 'rainbow');
For example, we could use the expanded display for this case in
enum.sql.
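
Something like this, with the record expanded (a sketch assuming the v4
signature and columns -- message, detail, hint, and the SQLSTATE column):

    \x on
    SELECT * FROM pg_input_error_info(repeat('too_long', 32), 'rainbow');
    \x off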

 -- test non-error-throwing API
 SELECT str as jsonpath,
pg_input_is_valid(str,'jsonpath') as ok,
-   pg_input_error_message(str,'jsonpath') as errmsg
+   pg_input_error_info(str,'jsonpath') as errmsg
This case in jsonpath.sql is actually wrong, because we have more than
just the error message.

For the others, I would make the choice of expanding the calls of
pg_input_error_info() rather than just showing row outputs, though I
agree that this part is minor.
--
Michael




Re: verbose mode for pg_input_error_message?

2023-02-25 Thread Nathan Bossart
On Sat, Feb 25, 2023 at 08:07:33PM -0500, Tom Lane wrote:
> Maybe pg_input_error_info()?  I tend to agree with Michael that as
> soon as you throw things like the SQLSTATE code into it, "message"
> seems not very apropos.  I'm not dead set on that position, though.

pg_input_error_info() seems more descriptive to me.  I changed the name to
that in v4.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From e06bba8374ba486c8593138b10a256aeef42a8af Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 23 Feb 2023 10:31:24 -0800
Subject: [PATCH v4 1/1] add details to pg_input_error_message and rename to
 pg_input_error_info

---
 contrib/cube/expected/cube.out|   8 +-
 contrib/cube/sql/cube.sql |   2 +-
 contrib/hstore/expected/hstore.out|  16 +--
 contrib/hstore/sql/hstore.sql |   4 +-
 contrib/intarray/expected/_int.out|  12 +-
 contrib/intarray/sql/_int.sql |   2 +-
 contrib/isn/expected/isn.out  |  12 +-
 contrib/isn/sql/isn.sql   |   2 +-
 contrib/ltree/expected/ltree.out  |  22 ++--
 contrib/ltree/sql/ltree.sql   |   2 +-
 contrib/seg/expected/seg.out  |  18 +--
 contrib/seg/sql/seg.sql   |   2 +-
 doc/src/sgml/func.sgml|  33 +++--
 src/backend/utils/adt/misc.c  |  42 +--
 src/include/catalog/pg_proc.dat   |  10 +-
 src/test/regress/expected/arrays.out  |   8 +-
 src/test/regress/expected/bit.out |  40 +++---
 src/test/regress/expected/boolean.out |   8 +-
 src/test/regress/expected/box.out |  16 +--
 src/test/regress/expected/char.out|   8 +-
 src/test/regress/expected/char_1.out  |   8 +-
 src/test/regress/expected/char_2.out  |   8 +-
 src/test/regress/expected/date.out|  16 +--
 src/test/regress/expected/domain.out  |  34 +++---
 src/test/regress/expected/enum.out|  16 +--
 .../expected/float4-misrounded-input.out  |   4 +-
 src/test/regress/expected/float4.out  |   8 +-
 src/test/regress/expected/float8.out  |   8 +-
 src/test/regress/expected/geometry.out|  16 +--
 src/test/regress/expected/inet.out|  24 ++--
 src/test/regress/expected/int2.out|  24 ++--
 src/test/regress/expected/int4.out|   8 +-
 src/test/regress/expected/int8.out|   8 +-
 src/test/regress/expected/interval.out|  16 +--
 src/test/regress/expected/json.out|   8 +-
 src/test/regress/expected/json_encoding.out   |   8 +-
 src/test/regress/expected/json_encoding_1.out |   8 +-
 src/test/regress/expected/jsonb.out   |  16 +--
 src/test/regress/expected/jsonpath.out|  16 +--
 src/test/regress/expected/line.out|  40 +++---
 src/test/regress/expected/lseg.out|   8 +-
 src/test/regress/expected/macaddr.out |  16 +--
 src/test/regress/expected/macaddr8.out|  16 +--
 src/test/regress/expected/money.out   |  16 +--
 src/test/regress/expected/multirangetypes.out |  16 +--
 src/test/regress/expected/numeric.out |  24 ++--
 src/test/regress/expected/oid.out |  32 ++---
 src/test/regress/expected/path.out|  16 +--
 src/test/regress/expected/pg_lsn.out  |   8 +-
 src/test/regress/expected/point.out   |   8 +-
 src/test/regress/expected/polygon.out |  16 +--
 src/test/regress/expected/privileges.out  |  24 ++--
 src/test/regress/expected/rangetypes.out  |  40 +++---
 src/test/regress/expected/regproc.out | 114 +-
 src/test/regress/expected/rowtypes.out|  16 +--
 src/test/regress/expected/strings.out |  24 ++--
 src/test/regress/expected/tid.out |  16 +--
 src/test/regress/expected/time.out|  16 +--
 src/test/regress/expected/timestamp.out   |  16 +--
 src/test/regress/expected/timestamptz.out |  16 +--
 src/test/regress/expected/timetz.out  |  16 +--
 src/test/regress/expected/tstypes.out |  24 ++--
 src/test/regress/expected/uuid.out|   8 +-
 src/test/regress/expected/varchar.out |   8 +-
 src/test/regress/expected/varchar_1.out   |   8 +-
 src/test/regress/expected/varchar_2.out   |   8 +-
 src/test/regress/expected/xid.out |  32 ++---
 src/test/regress/expected/xml.out |  10 +-
 src/test/regress/expected/xml_1.out   |   4 +-
 src/test/regress/expected/xml_2.out   |   8 +-
 src/test/regress/sql/arrays.sql   |   2 +-
 src/test/regress/sql/bit.sql  |  10 +-
 src/test/regress/sql/boolean.sql  |   2 +-
 src/test/regress/sql/box.sql  |   4 +-
 src/test/regress/sql/char.sql |   2 +-
 src/test/regress/sql/date.sql |   4 +-
 

Re: pg_upgrade and logical replication

2023-02-25 Thread Julien Rouhaud
On Sat, Feb 25, 2023 at 11:24:17AM +0530, Amit Kapila wrote:
> On Wed, Feb 22, 2023 at 12:13 PM Julien Rouhaud  wrote:
> >
> > > Is there really a use case for dumping the content of pg_subscription_rel
> > > outside of pg_upgrade?
>
> I think the users who want to take a dump and restore the entire
> cluster may need it there for the same reason as pg_upgrade needs it.
> TBH, I have not seen such a request but this is what I imagine one
> would expect if we provide this functionality via pg_upgrade.

But the pg_subscription_rel data are only needed if you want to resume
logical replication from the exact previous state; otherwise you can
always refresh the subscription and it will retrieve the list of relations
automatically (dealing with initial sync and so on).  It's hard to see how
that scenario could arise with a plain pg_dump.

The only usable scenario I can see would be to disable all subscriptions on
the logical replica, maybe make sure that no one does any writes to those
tables if you want to eventually switch over to the restored node, do a
pg_dump(all), restore it and then resume the logical replication /
subscription(s) on the restored server.  That's a lot of constraints for
something that pg_upgrade deals with so much more efficiently.  Maybe one
plausible use case would be to split a single logical replica into N
servers, one per database / publication or something like that.  In that
case pg_upgrade won't be that useful, and if each target subset is small
enough a pg_dump/pg_restore may be a viable option.  But if that's a
viable option then surely creating the logical replica from scratch using
normal logical table sync should be an even better option.

I'm really worried that it's going to be a giant foot-gun that any user should
really avoid.

> > > For the pg_upgrade use-case, do you see any reason to not restore the
> > > pg_subscription_rel by default?
>
> As I said earlier, one can very well say that giving more flexibility
> (in terms of where the publications will be after restore) after a
> restore is a better idea. Also, we are doing the same till now without
> any major complaints about the same, so it makes sense to keep the
> current behavior as default.

I'm a bit dubious that anyone actually tried to run pg_upgrade on a logical
replica and then kept using logical replication, as it's currently impossible
to safely resume replication without truncating all target relations.

As I mentioned before, if we keep the current behavior as a default there
should be an explicit warning in the documentation stating that you need
to truncate all target relations before resuming logical replication, as
otherwise you are guaranteed to lose data.
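
Concretely, the only safe sequence on the upgraded subscriber today would
be something like the following (subscription and table names are
placeholders):

    -- wipe whatever the previous initial sync had copied ...
    TRUNCATE TABLE public.t1, public.t2;
    -- ... then restart replication with a fresh sync
    ALTER SUBSCRIPTION mysub ENABLE;
    ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;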

> > >  Maybe having an option to not restore it would
> > > make sense if it indeed add noticeable overhead when publications have a 
> > > lot of
> > > tables?
>
> Yeah, that could be another reason to not do it default.

I will do some benchmarks with various numbers of relations, from high to
unreasonable.

> >
> > Since I didn't hear any objection I worked on a POC patch with this 
> > approach.
> >
> > For now when pg_dump is invoked with --binary, it will always emit extra
> > commands to restore the relation list.  This command is only allowed when 
> > the
> > server is started in binary upgrade mode.
> >
> > The new command is of the form
> >
> > ALTER SUBSCRIPTION name ADD TABLE (relid = X, state = 'Y', lsn = 'Z/Z')
> >
> > with the lsn part being optional.
> >
>
> BTW, do we restore the origin and its LSN after the upgrade? Because
> without that this won't be sufficient as that is required for apply
> worker to ensure that it is in sync with table sync workers.

We currently don't, which is yet another sign that no one actually tried to
resume logical replication after a pg_upgrade.  That being said, trying to
pg_upgrade a node that's currently syncing relations seems like a bad idea
(I didn't even think to try), but I guess it should also be supported.  I will
work on that too.  Assuming we add a new option for controlling either plain
pg_dump and/or pg_upgrade behavior, should this option control both
pg_subscription_rel and replication origins and their data or do we need more
granularity?




Re: zstd compression for pg_dump

2023-02-25 Thread Justin Pryzby
On Fri, Feb 24, 2023 at 01:18:40PM -0600, Justin Pryzby wrote:
> This is a draft patch - review is welcome and would help to get this
> ready to be considered for v16, if desired.
> 
> I'm going to add this thread to the old CF entry.
> https://commitfest.postgresql.org/31/2888/

This resolves cfbot warnings: windows and cppcheck.
And refactors zstd routines.
And updates docs.
And includes some fixes for earlier patches that these patches conflict
with/depend on.
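
With the patches applied, the new method would plug into the existing
--compress syntax, e.g. (database name, path and level are placeholders):

    pg_dump -Fd --compress=zstd:9 -f /path/to/dump.dir mydb
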
>From ea9b67d09fe1b51e7946cf34fca5795e57dd3858 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Feb 2023 14:07:09 -0600
Subject: [PATCH 1/4] f!fixes for LZ4

---
 doc/src/sgml/ref/pg_dump.sgml| 4 ++--
 src/bin/pg_dump/compress_gzip.c  | 8 
 src/bin/pg_dump/compress_io.c| 2 +-
 src/bin/pg_dump/compress_io.h| 4 ++--
 src/bin/pg_dump/compress_lz4.c   | 4 ++--
 src/bin/pg_dump/pg_backup_archiver.c | 4 ++--
 src/bin/pg_dump/pg_dump.c| 2 --
 src/bin/pg_dump/t/002_pg_dump.pl | 6 +++---
 8 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 49d218905fb..6fbe49f7ede 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -331,7 +331,7 @@ PostgreSQL documentation
can read. A directory format archive can be manipulated with
standard Unix tools; for example, files in an uncompressed archive
can be compressed with the gzip or
-   lz4tool.
+   lz4 tools.
This format is compressed by default using gzip
and also supports parallel dumps.
   
@@ -655,7 +655,7 @@ PostgreSQL documentation

 Specify the compression method and/or the compression level to use.
 The compression method can be set to gzip or
-lz4 or none for no compression.
+lz4, or none for no compression.
 A compression detail string can optionally be specified.  If the
 detail string is an integer, it specifies the compression level.
 Otherwise, it should be a comma-separated list of items, each of the
diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4e..52f41c2e58c 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -123,17 +123,9 @@ WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs,
 		gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
 		gzipcs->outsize = ZLIB_OUT_SIZE;
 
-		/*
-		 * A level of zero simply copies the input one block at the time. This
-		 * is probably not what the user wanted when calling this interface.
-		 */
-		if (cs->compression_spec.level == 0)
-			pg_fatal("requested to compress the archive yet no level was specified");
-
 		if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
 			pg_fatal("could not initialize compression library: %s", zp->msg);
 
-		/* Just be paranoid - maybe End is called after Start, with no Write */
 		zp->next_out = gzipcs->outbuf;
 		zp->avail_out = gzipcs->outsize;
 	}
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index ce06f1eac9c..9239dbb2755 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -83,7 +83,7 @@
  * used by the caller in an error message.
  */
 char *
-supports_compression(const pg_compress_specification compression_spec)
+pgdump_supports_compression(const pg_compress_specification compression_spec)
 {
 	const pg_compress_algorithm	algorithm = compression_spec.algorithm;
 	bool		supported = false;
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index bbde2693915..46815fa2ebe 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -21,7 +21,7 @@
 #define ZLIB_OUT_SIZE	4096
 #define ZLIB_IN_SIZE	4096
 
-extern char *supports_compression(const pg_compress_specification compression_spec);
+extern char *pgdump_supports_compression(const pg_compress_specification compression_spec);
 
 /*
  * Prototype for callback function used in writeData()
@@ -172,7 +172,7 @@ struct CompressFileHandle
 extern CompressFileHandle *InitCompressFileHandle(const pg_compress_specification compression_spec);
 
 /*
- * Initialize a compress file stream. Deffer the compression algorithm
+ * Initialize a compress file stream. Infer the compression algorithm
  * from 'path', either by examining its suffix or by appending the supported
  * suffixes in 'path'.
  */
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index fe1014e6e77..63e794cdc68 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -161,8 +161,8 @@ typedef struct LZ4File
 } LZ4File;
 
 /*
- * LZ4 equivalent to feof() or gzeof(). The end of file is reached if there
- * is no decompressed output in the overflow buffer and the end of the file
+ * LZ4 equivalent to feof() or gzeof().  Return true iff there is no
+ * decompressed 

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-02-25 Thread Vik Fearing

On 1/4/23 12:57, Alvaro Herrera wrote:

> I haven't read this patch other than superficially; I suppose the
> feature it's introducing is an OK one to have as an extension to the
> standard.  (I hope the community members that are committee members
> will propose this extension to become part of the standard.)

I have been doing some research on this, reading the original papers
that introduced the feature and its improvements.

I don't see anything that ever considered what this patch proposes, even
though SQL Server has it. (The initial MERGE didn't even have DELETE!)

SOURCE and TARGET are not currently keywords, but the only things that
can come after MATCHED are THEN and AND, so I don't foresee any issues
with us implementing this before the committee accepts such a change
proposal.  I also don't see how the committee could possibly change the
semantics of this, and two implementations having it is a good argument
for getting it in.
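
For concreteness, the syntax being proposed (matching SQL Server's; table
and column names here are made up) looks like:

    MERGE INTO target t
    USING source s ON t.id = s.id
    WHEN MATCHED THEN
        UPDATE SET val = s.val
    WHEN NOT MATCHED THEN
        INSERT (id, val) VALUES (s.id, s.val)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;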


We should be cautious about doing something differently from SQL Server
here, and I would appreciate any differences being brought to my
attention so I can incorporate them into a specification, even if that
means resorting to the hated "implementation-defined".

--
Vik Fearing





Re: verbose mode for pg_input_error_message?

2023-02-25 Thread Tom Lane
Nathan Bossart  writes:
> On Sat, Feb 25, 2023 at 01:39:21PM +0900, Michael Paquier wrote:
>> pg_input_error_message() does not strike me as a good function name,
>> though, because it now returns much more than an error message.
>> Hence, couldn't something like pg_input_error() be better, because
>> more generic?

> I personally think the existing name is fine.  It returns the error
> message, which includes the primary, detail, and hint messages.  Also, I'm
> not sure that pg_input_error() is descriptive enough.  That being said, I'm
> happy to run the sed command to change the name to whatever folks think is
> best.

Maybe pg_input_error_info()?  I tend to agree with Michael that as
soon as you throw things like the SQLSTATE code into it, "message"
seems not very apropos.  I'm not dead set on that position, though.

regards, tom lane




Re: verbose mode for pg_input_error_message?

2023-02-25 Thread Nathan Bossart
On Sat, Feb 25, 2023 at 01:39:21PM +0900, Michael Paquier wrote:
> pg_input_error_message() does not strike me as a good function name,
> though, because it now returns much more than an error message.
> Hence, couldn't something like pg_input_error() be better, because
> more generic?

I personally think the existing name is fine.  It returns the error
message, which includes the primary, detail, and hint messages.  Also, I'm
not sure that pg_input_error() is descriptive enough.  That being said, I'm
happy to run the sed command to change the name to whatever folks think is
best.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: windows/meson cfbot warnings

2023-02-25 Thread Justin Pryzby
On Sat, Feb 25, 2023 at 10:29:33AM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-12-28 17:44:47 -0800, Andres Freund wrote:
> > On 2022-12-28 18:35:38 -0600, Justin Pryzby wrote:
> > > Since a few days ago, the windows/meson task has been spewing messages 
> > > for each tap
> > > test:
> > > 
> > > | Unknown TAP version. The first line MUST be `TAP version `. 
> > > Assuming version 12.
> > > 
> > > I guess because the image is updated to use meson v1.0.0.
> > > https://github.com/mesonbuild/meson/commit/b7a5c384a1f1ba80c09904e7ef4f5160bdae3345
> > > 
> > > mesonbuild/mtest.py-if version is None:
> > > mesonbuild/mtest.py:self.warnings.append('Unknown TAP 
> > > version. The first line MUST be `TAP version `. Assuming version 
> > > 12.')
> > > mesonbuild/mtest.py-version = 12
> > 
> > I think the change is somewhat likely to be reverted in the next meson minor
> > version. It apparently came about due to somewhat odd language in the tap-14
> > spec...  So I think we should just wait for now.
> 
> FWIW, that did happen in 1.0.1, and the output is now clean again.

Unrelated, but something else changed and now there's this.

https://cirrus-ci.com/task/6202242768830464

[20:10:34.310] c:\cirrus>call sh -c 'if grep ": warning " build.txt; then exit 
1; fi; exit 0' 
[20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': 
macro redefinition
[20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': 
macro redefinition
[20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': 
macro redefinition
[20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': 
macro redefinition




Re: stopgap fix for signal handling during restore_command

2023-02-25 Thread Nathan Bossart
On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote:
> I think I opined on this before, but we really ought to have a function to do
> some minimal signal safe output. Implemented centrally, instead of being open
> coded in a bunch of places.

While looking around for the right place to put this, I noticed that
there's a write_stderr() function in elog.c that we might be able to use.
I used that in v9.  WDYT?
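
For reference, the centralized helper being discussed could be as small as
the sketch below (the name and retry handling are hypothetical; only
write() is used, which is async-signal-safe):

    #include <unistd.h>

    /* Emit a string from a signal handler using only write(). */
    static void
    write_stderr_sigsafe(const char *str, size_t len)
    {
        while (len > 0)
        {
            ssize_t     rc = write(STDERR_FILENO, str, len);

            if (rc <= 0)
                break;          /* cannot report the error from here */
            str += rc;
            len -= (size_t) rc;
        }
    }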

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 7dcf8fe947ab7b6e0b37ddd42a08bbbc560d4a37 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 23 Feb 2023 14:27:48 -0800
Subject: [PATCH v9 1/2] Move extra code out of the Pre/PostRestoreCommand()
 block.

If SIGTERM is received within this block, the startup process will
immediately proc_exit() in the signal handler, so it is inadvisable
to include any more code than is required in this section.  This
change moves the code recently added to this block (see 1b06d7b and
7fed801) to outside of the block.  This ensures that only system()
is called while proc_exit() might be called in the SIGTERM handler,
which is how this code worked from v8.4 to v14.
---
 src/backend/access/transam/xlogarchive.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index fcc87ff44f..41684418b6 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -159,20 +159,27 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			(errmsg_internal("executing restore command \"%s\"",
 			 xlogRestoreCmd)));
 
+	fflush(NULL);
+	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
+
 	/*
-	 * Check signals before restore command and reset afterwards.
+	 * PreRestoreCommand() informs the SIGTERM handler for the startup process
+	 * that it should proc_exit() right away.  This is done for the duration of
+	 * the system() call because there isn't a good way to break out while it
+	 * is executing.  Since we might call proc_exit() in a signal handler, it
+	 * is best to put any additional logic before or after the
+	 * PreRestoreCommand()/PostRestoreCommand() section.
 	 */
 	PreRestoreCommand();
 
 	/*
 	 * Copy xlog from archival storage to XLOGDIR
 	 */
-	fflush(NULL);
-	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
 	rc = system(xlogRestoreCmd);
-	pgstat_report_wait_end();
 
 	PostRestoreCommand();
+
+	pgstat_report_wait_end();
 	pfree(xlogRestoreCmd);
 
 	if (rc == 0)
-- 
2.25.1

>From e41a13a77b1c6cf156387cee3ca49fb2f4b1253b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 14 Feb 2023 09:44:53 -0800
Subject: [PATCH v9 2/2] Don't proc_exit() in startup's SIGTERM handler if
 forked by system().

Instead, emit a message to STDERR and _exit() in this case.  This
change also adds assertions to proc_exit(), ProcKill(), and
AuxiliaryProcKill() to verify that these functions are not called
by a process forked by system(), etc.
---
 src/backend/postmaster/startup.c | 17 -
 src/backend/storage/ipc/ipc.c|  3 +++
 src/backend/storage/lmgr/proc.c  |  2 ++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index efc2580536..261f992357 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -19,6 +19,8 @@
  */
 #include "postgres.h"
 
+#include 
+
 #include "access/xlog.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
@@ -121,7 +123,20 @@ StartupProcShutdownHandler(SIGNAL_ARGS)
 	int			save_errno = errno;
 
 	if (in_restore_command)
-		proc_exit(1);
+	{
+		/*
+		 * If we are in a child process (e.g., forked by system() in
+		 * RestoreArchivedFile()), we don't want to call any exit callbacks.
+		 * The parent will take care of that.
+		 */
+		if (MyProcPid == (int) getpid())
+			proc_exit(1);
+		else
+		{
+			write_stderr("StartupProcShutdownHandler() called in child process\n");
+			_exit(1);
+		}
+	}
 	else
 		shutdown_requested = true;
 	WakeupRecovery();
diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index 1904d21795..d5097dc008 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -103,6 +103,9 @@ static int	on_proc_exit_index,
 void
 proc_exit(int code)
 {
+	/* proc_exit() is not safe in forked processes from system(), etc. */
+	Assert(MyProcPid == (int) getpid());
+
 	/* Clean up everything that must be cleaned up */
 	proc_exit_prepare(code);
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..b9e2c3aafe 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
 	dlist_head *procgloballist;
 
 	Assert(MyProc != NULL);
+	Assert(MyProc->pid == (int) getpid());  /* not safe if forked by system(), etc. */
 
 	/* Make sure we're out of the sync rep 

Evaluate arguments of correlated SubPlans in the referencing ExprState

2023-02-25 Thread Andres Freund
Hi,

Around
https://www.postgresql.org/message-id/20230224015417.75yimxbksejpffh3%40awork3.anarazel.de
I suggested that we should evaluate the arguments of correlated SubPlans as
part of the expression referencing the subplan.

Here's a patch for that.

Ended up simpler than I'd thought.  I see small, consistent speedups and
reductions in memory usage.

I think individual arguments are mainly (always?) Var nodes.  By
evaluating them as part of the containing expression we avoid the
increased memory usage, and the increased dispatch of going through
another layer of ExprState.  Because each argument is a single Var, which
ends up with a slot_getattr() via ExecJust*Var, we also elide redundant
slot_getattr() checks.  I think we already avoided redundant tuple
deforming, because the parent ExprState will have done that already.
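
As a concrete example of the shape being optimized (schema made up), t1.a
below is the correlated SubPlan's argument and is evaluated once per outer
row:

    SELECT t1.a,
           (SELECT max(t2.b) FROM t2 WHERE t2.a = t1.a) AS max_b
    FROM t1;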

Greetings,

Andres Freund
>From 3b68577bbcd0f78b80abe1ac07eedd6998254951 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 25 Feb 2023 13:39:19 -0800
Subject: [PATCH v1] WIP: Evaluate arguments of correlated SubPlans in the
 referencing ExprState

---
 src/include/nodes/execnodes.h   |  1 -
 src/backend/executor/execExpr.c | 81 +
 src/backend/executor/execProcnode.c |  5 ++
 src/backend/executor/nodeSubplan.c  | 30 +--
 4 files changed, 66 insertions(+), 51 deletions(-)

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 20f4c8b35f3..437cf8b5a02 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -947,7 +947,6 @@ typedef struct SubPlanState
 	struct PlanState *planstate;	/* subselect plan's state tree */
 	struct PlanState *parent;	/* parent plan node's state tree */
 	ExprState  *testexpr;		/* state of combining expression */
-	List	   *args;			/* states of argument expression(s) */
 	HeapTuple	curTuple;		/* copy of most recent tuple from subplan */
 	Datum		curArray;		/* most recent array from ARRAY() subplan */
 	/* these are used when hashing the subselect's output: */
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index c61f23c6c18..7a9d5729b4b 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -87,6 +87,9 @@ static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
   FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
   int transno, int setno, int setoff, bool ishash,
   bool nullcheck);
+static void ExecInitSubPlanExpr(SubPlan *subplan,
+ExprState *state,
+Datum *resv, bool *resnull);
 
 
 /*
@@ -1388,7 +1391,6 @@ ExecInitExprRec(Expr *node, ExprState *state,
 		case T_SubPlan:
 			{
 SubPlan*subplan = (SubPlan *) node;
-SubPlanState *sstate;
 
 /*
  * Real execution of a MULTIEXPR SubPlan has already been
@@ -1405,19 +1407,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
 	break;
 }
 
-if (!state->parent)
-	elog(ERROR, "SubPlan found with no parent plan");
-
-sstate = ExecInitSubPlan(subplan, state->parent);
-
-/* add SubPlanState nodes to state->parent->subPlan */
-state->parent->subPlan = lappend(state->parent->subPlan,
- sstate);
-
-scratch.opcode = EEOP_SUBPLAN;
-scratch.d.subplan.sstate = sstate;
-
-ExprEvalPushStep(state, );
+ExecInitSubPlanExpr(subplan, state, resv, resnull);
 break;
 			}
 
@@ -2618,29 +2608,12 @@ ExecPushExprSetupSteps(ExprState *state, ExprSetupInfo *info)
 	foreach(lc, info->multiexpr_subplans)
 	{
 		SubPlan*subplan = (SubPlan *) lfirst(lc);
-		SubPlanState *sstate;
 
 		Assert(subplan->subLinkType == MULTIEXPR_SUBLINK);
 
-		/* This should match what ExecInitExprRec does for other SubPlans: */
-
-		if (!state->parent)
-			elog(ERROR, "SubPlan found with no parent plan");
-
-		sstate = ExecInitSubPlan(subplan, state->parent);
-
-		/* add SubPlanState nodes to state->parent->subPlan */
-		state->parent->subPlan = lappend(state->parent->subPlan,
-		 sstate);
-
-		scratch.opcode = EEOP_SUBPLAN;
-		scratch.d.subplan.sstate = sstate;
-
 		/* The result can be ignored, but we better put it somewhere */
-		scratch.resvalue = >resvalue;
-		scratch.resnull = >resnull;
-
-		ExprEvalPushStep(state, );
+		ExecInitSubPlanExpr(subplan, state,
+			>resvalue, >resnull);
 	}
 }
 
@@ -4040,3 +4013,45 @@ ExecBuildParamSetEqual(TupleDesc desc,
 
 	return state;
 }
+
+static void
+ExecInitSubPlanExpr(SubPlan *subplan,
+	ExprState *state,
+	Datum *resv, bool *resnull)
+{
+	ExprEvalStep scratch = {0};
+	SubPlanState *sstate;
+	ListCell   *pvar;
+	ListCell   *l;
+	EState	   *estate = state->parent->state;
+
+	if (!state->parent)
+		elog(ERROR, "SubPlan found with no parent plan");
+
+	/*
+	 * Generate steps to evaluate input arguments for the subplan.
+	 *
+	 * Any calculation we have to do can be done in the parent econtext, since
+	 * the Param values don't need to have per-query lifetime.
+	 */
+	forboth(l, subplan->parParam, pvar, 

Re: use __builtin_clz to compute most significant bit set

2023-02-25 Thread Magnus Hagander
On Sat, Feb 25, 2023 at 9:32 PM Joseph Yu  wrote:

> hi community
>
> This is the first time for me to submit a patch to Postgres community.
>
> Instead of using a for loop to find the most significant bit set, we
> could use the __builtin_clz function to first find the number of leading
> zeros for the mask and then find the index as 32 - __builtin_clz(mask).
>

Hi!

This file has already been removed, as of 4f1f5a7f85, which already uses
__builtin_clz if it's available.

Were you perhaps looking at an old version instead of the master branch?


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


RE: Ability to reference other extensions by schema in extension scripts

2023-02-25 Thread Regina Obe
> On Mon, Feb 06, 2023 at 05:19:39AM -0500, Regina Obe wrote:
> >
> > Attached is a revised version of the original patch. It is revised to
> > prevent
> >
> > ALTER EXTENSION .. SET SCHEMA  if there is a dependent extension that
> > references the extension in their scripts using
> > @extschema:extensionname@.  It also adds additional tests to verify
> > that new feature.
> >
> > In going thru the code base, I was tempted to add a new dependency
> > type instead of using the existing DEPENDENCY_AUTO.  I think this
> > would be cleaner, but I felt that was overstepping the area a bit,
> > since it requires making changes to dependency.h and dependency.c
> >
> > My main concern with using DEPENDENCY_AUTO is because it was designed
> > for cases where an object can be dropped without need for CASCADE.  In
> > this case, we don't want a dependent extension to be dropped if it's
> > required is dropped.  However since there will already exist a
> > DEPENDENCY_NORMAL between the 2 extensions, I figure we are protected
> > against that issue already.
> 
> I was thinking: how about using the "refobjsubid" to encode the "level"
> of dependency on an extension?  Right now "refobjsubid" is always 0 when
> the referenced object is an extension.
> Could we consider subid=1 to mean the dependency is not only on the
> extension but ALSO on its schema location?
> 

I like that idea.  It's only ever been used for tables, I think, but I
don't see why it wouldn't apply in this case, as the concept is kinda the
same.  My only concern is whether other parts rely on this being 0.

The other question: should this just update the existing DEPENDENCY_NORMAL
entry between the extensions or add a new DEPENDENCY_NORMAL between them
with subid=1?
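
For illustration, the pg_depend row that would carry the marker can be
inspected with something like this (ext1/ext2 as in your test below; the
subid=1 meaning is the proposal, not existing behavior):

    SELECT objsubid, refobjsubid, deptype
    FROM pg_depend
    WHERE classid = 'pg_extension'::regclass
      AND refclassid = 'pg_extension'::regclass
      AND objid = (SELECT oid FROM pg_extension WHERE extname = 'ext2')
      AND refobjid = (SELECT oid FROM pg_extension WHERE extname = 'ext1');
    -- deptype 'n' (NORMAL); refobjsubid = 1 would mean ext2's script
    -- used @extschema:ext1@, so ext1 must refuse SET SCHEMA.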


> Also: should we really allow extensions to rely on other extensions
> without fully-qualifying calls to their functions?  Or should it be
> discouraged and thus forbidden?  If we wanted to forbid it we then would
> not need to encode any additional dependency but rather always forbid
> `ALTER EXTENSION .. SET SCHEMA` whenever the extension is a dependency
> of any other extension.
> 
> On the code in the patch itself, I tried with this simple use case:
> 
>   - ext1, relocatable, exposes an ext1log(text) function
> 
>   - ext2, relocatable, exposes an ext2log(text) function
> calling @extschema:ext1@.ext1log()
> 

This would be an okay solution to me too if everyone is okay with it.


> What is not good:
> 
>   - Drop of ext1 automatically cascades to drop of ext2 without even a
> notice:
> 
>   test=# create extension ext2 cascade;
>   NOTICE:  installing required extension "ext1"
>   CREATE EXTENSION
>   test=# drop extension ext1;
>   DROP EXTENSION -- no WARNING, no NOTICE, ext2 is gone
> 

Oops.  I don't know why I thought the normal dependency would protect
against this.  I should have tested that.  So DEPENDENCY_AUTO is not an
option to use, and creating a new type of dependency seems like
overstepping the bounds of this patch.


> What is good:
> 
>   - ext1 cannot be relocated while ext2 is loaded:
> 
>   test=# create extension ext2 cascade;
>   NOTICE:  installing required extension "ext1"
>   CREATE EXTENSION
>   test=# alter extension ext1 set schema n1;
>   ERROR:  Extension can not be relocated because dependent
> extension references it's location
>   test=# drop extension ext2;
>   DROP EXTENSION
>   test=# alter extension ext1 set schema n1;
>   ALTER EXTENSION
> 
> --strk;
> 
>   Libre GIS consultant/developer
>   https://strk.kbt.io/services.html

So in conclusion we have 4 possible paths to go with this

1) Just don't allow any extensions referenced by other extensions to be
relocatable.
It will show a message something like
"SET SCHEMA not allowed because other extensions depend on it"
Given that if you don't specify relocatable in your .control file, the
assumption is relocatable = false, this isn't too far off from standard
protocol.

2) Use objsubid=1 to denote that another extension explicitly references
the schema of another extension, so setting the schema of the other
extension is not okay.  So instead of introducing another dependency, we'd
update the DEPENDENCY_NORMAL one between the two extensions with objsubid=1
instead of 0.

This has 2 approaches:

a) Update the existing DEPENDENCY_NORMAL between the two extensions,
setting objsubid=1

or
b) Create a new DEPENDENCY_NORMAL between the two extensions with
objsubid=1

I'm not sure if either has implications for backup / restore.  I suspect b
would be safer, since I suspect objsubid might be checked and this
dependency only needs checking during SET SCHEMA time.

3) Create a whole new DEPENDENCY type, perhaps calling it something like
DEPENDENCY_EXTENSION_SCHEMA

4) Just don't allow @extschema:extensionname@ syntax to be used unless the
extension is marked as relocatable=false.  This one I don't like because it
doesn't solve my fundamental issue of postgis_tiger_geocoder relying on
fuzzystrmatch, which is marked as relocatable.

use __builtin_clz to compute most significant bit set

2023-02-25 Thread Joseph Yu
hi community

This is the first time for me to submit a patch to Postgres community.

Instead of using a for loop to find the most significant bit set, we could
use the __builtin_clz function to first find the number of leading zeros
for the mask and then find the index as 32 - __builtin_clz(mask).

diff --git a/src/port/fls.c b/src/port/fls.c
index 19b4221826..4f4c412732 100644
--- a/src/port/fls.c
+++ b/src/port/fls.c
@@ -54,11 +54,7 @@
 int
 fls(int mask)
 {
- int bit;
-
  if (mask == 0)
  return (0);
- for (bit = 1; mask != 1; bit++)
- mask = (unsigned int) mask >> 1;
- return (bit);
+ return (sizeof(int) << 3) - __builtin_clz(mask);
 }
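
As a quick sanity check of the identity (hypothetical spot checks,
assuming a 32-bit int):

    /* fls(0)  == 0   (special-cased; __builtin_clz(0) is undefined)
     * fls(1)  == 1   (clz(1) == 31, 32 - 31 == 1)
     * fls(20) == 5   (20 == 0b10100, clz == 27, 32 - 27 == 5)
     * fls(INT_MIN) == 32
     */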

Best Regards,

Joseph


Add shared buffer hits to pg_stat_io

2023-02-25 Thread Melanie Plageman
Hi,

As suggested in [1], the attached patch adds shared buffer hits to
pg_stat_io.

I remember at some point having this in the view and then removing it
but I can't quite remember what the issue was -- nor do I see a
rationale mentioned in the thread [2].

It might have had something to do with the interaction between the
now-removed "rejected" buffers column.

I am looking for input as to the order of this column in the view. I
think it should go after op_bytes since it is not relevant for
non-block-oriented IO. However, I'm not sure what the order of hits,
evictions, and reuses should be (all have to do with buffers).

While adding this, I noticed that I had made all of the IOOP columns
int8 in the view, and I was wondering if this is sufficient for hits (I
imagine you could end up with quite a lot of those).
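
For reference, with the patch applied the new column would read naturally
alongside the existing ones (column names per the patch; a sketch):

    SELECT backend_type, object, context, reads, hits, evictions, reuses
    FROM pg_stat_io
    WHERE object = 'relation';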

- Melanie

[1] 
https://www.postgresql.org/message-id/20230209050319.chyyup4vtq4jzobq%40awork3.anarazel.de
[2] 
https://www.postgresql.org/message-id/flat/20230209050319.chyyup4vtq4jzobq%40awork3.anarazel.de#63ff7a97b7a5bb7b86c1a250065be7f9
From 4055b14e3c2c610575a9bb3fe631b2094d3ea1a6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Sat, 25 Feb 2023 14:37:25 -0500
Subject: [PATCH v1 1/2] Reorder pgstatfuncs-local enum

---
 src/backend/utils/adt/pgstatfuncs.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9d707c3521..1e55e077b7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1276,16 +1276,16 @@ pgstat_get_io_op_index(IOOp io_op)
 	{
 		case IOOP_EVICT:
 			return IO_COL_EVICTIONS;
+		case IOOP_EXTEND:
+			return IO_COL_EXTENDS;
+		case IOOP_FSYNC:
+			return IO_COL_FSYNCS;
 		case IOOP_READ:
 			return IO_COL_READS;
 		case IOOP_REUSE:
 			return IO_COL_REUSES;
 		case IOOP_WRITE:
 			return IO_COL_WRITES;
-		case IOOP_EXTEND:
-			return IO_COL_EXTENDS;
-		case IOOP_FSYNC:
-			return IO_COL_FSYNCS;
 	}
 
 	elog(ERROR, "unrecognized IOOp value: %d", io_op);
-- 
2.37.2

From a3821a166b57464839ee5878598c233b5a722572 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Sat, 25 Feb 2023 14:36:06 -0500
Subject: [PATCH v1 2/2] Add IOOP_HIT to pg_stat_io

Begin tracking shared buffer hits in pg_stat_io.
---
 doc/src/sgml/monitoring.sgml   | 11 
 src/backend/catalog/system_views.sql   |  1 +
 src/backend/storage/buffer/bufmgr.c| 38 ++
 src/backend/storage/buffer/localbuf.c  | 10 +--
 src/backend/utils/activity/pgstat_io.c |  2 +-
 src/backend/utils/adt/pgstatfuncs.c|  3 ++
 src/include/catalog/pg_proc.dat|  6 ++--
 src/include/pgstat.h   |  1 +
 src/include/storage/buf_internals.h|  2 +-
 src/test/regress/expected/rules.out|  3 +-
 10 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b0b997f092..53e7e40a07 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3855,6 +3855,17 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ 
+  
+   
+hits bigint
+   
+   
+The number of times a desired block was found in a shared buffer.
+   
+  
+ 
+
  
   

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 34ca0e739f..87bbbdfcb3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1126,6 +1126,7 @@ SELECT
b.writes,
b.extends,
b.op_bytes,
+   b.hits,
b.evictions,
b.reuses,
b.fsyncs,
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 98904a7c05..7a23088c0b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -472,7 +472,7 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr,
 			   ForkNumber forkNum,
 			   BlockNumber blockNum,
 			   BufferAccessStrategy strategy,
-			   bool *foundPtr, IOContext *io_context);
+			   bool *foundPtr, IOContext io_context);
 static void FlushBuffer(BufferDesc *buf, SMgrRelation reln,
 		IOObject io_object, IOContext io_context);
 static void FindAndDropRelationBuffers(RelFileLocator rlocator,
@@ -850,13 +850,14 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	if (isLocalBuf)
 	{
 		/*
-		 * LocalBufferAlloc() will set the io_context to IOCONTEXT_NORMAL. We
-		 * do not use a BufferAccessStrategy for I/O of temporary tables.
+		 * We do not use a BufferAccessStrategy for I/O of temporary tables.
 		 * However, in some cases, the "strategy" may not be NULL, so we can't
 		 * rely on IOContextForStrategy() to set the right IOContext for us.
 		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
 		 */
-		bufHdr = LocalBufferAlloc(smgr, 

Re: stopgap fix for signal handling during restore_command

2023-02-25 Thread Andres Freund
Hi,

On 2023-02-25 11:28:25 -0800, Nathan Bossart wrote:
> On Sat, Feb 25, 2023 at 11:07:42AM -0800, Andres Freund wrote:
> > Why do we need that rc variable? Don't we normally get away with (void)
> > write(...)?
> 
> My compiler complains about that.  :/
> 
>   ../postgresql/src/backend/postmaster/startup.c: In function 
> ‘StartupProcShutdownHandler’:
>   ../postgresql/src/backend/postmaster/startup.c:139:11: error: ignoring 
> return value of ‘write’, declared with attribute warn_unused_result 
> [-Werror=unused-result]
> 139 |(void) write(STDERR_FILENO, msg, sizeof(msg));
> |   ^~
>   cc1: all warnings being treated as errors

Ick.  I guess we've already encountered this, because we've apparently removed
all the (void) write cases. Which I am certain we had at some point. We still
do it for a bunch of other functions though.  Ah, yes: aa90e148ca7,
27314d32a88, 6c72a28e5ce etc.

I think I opined on this before, but we really ought to have a function to do
some minimal signal safe output. Implemented centrally, instead of being open
coded in a bunch of places.


> >> diff --git a/src/backend/storage/lmgr/proc.c 
> >> b/src/backend/storage/lmgr/proc.c
> >> index 22b4278610..e3da0622d7 100644
> >> --- a/src/backend/storage/lmgr/proc.c
> >> +++ b/src/backend/storage/lmgr/proc.c
> >> @@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
> >>dlist_head *procgloballist;
> >>  
> >>Assert(MyProc != NULL);
> >> +  Assert(MyProcPid == (int) getpid());  /* not safe if forked by 
> >> system(), etc. */
> >>  
> >>/* Make sure we're out of the sync rep lists */
> >>SyncRepCleanupAtProcExit();
> >> @@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
> >>PGPROC *proc;
> >>  
> >>Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
> >> +  Assert(MyProcPid == (int) getpid());  /* not safe if forked by 
> >> system(), etc. */
> >>  
> >>auxproc = [proctype];
> >>  
> >> -- 
> >> 2.25.1
> > 
> > I think the much more interesting assertion here would be to check that
> > MyProc->pid equals the current pid.
> 
> I don't mind changing this, but why is this a more interesting assertion?

Because we so far have little to no protection against some state
corruption leading to releasing a PGPROC that's not ours.  I didn't
actually mean that we shouldn't check that MyProcPid == (int) getpid(),
just that the much more interesting case to check is that MyProc->pid
matches, because that protects against multiple releases, releasing the
wrong slot, etc.

Greetings,

Andres Freund




Re: stopgap fix for signal handling during restore_command

2023-02-25 Thread Nathan Bossart
On Sat, Feb 25, 2023 at 11:28:25AM -0800, Nathan Bossart wrote:
> On Sat, Feb 25, 2023 at 11:07:42AM -0800, Andres Freund wrote:
>> I think the much more interesting assertion here would be to check that
>> MyProc->pid equals the current pid.
> 
> I don't mind changing this, but why is this a more interesting assertion?

Here is a new patch set with this change.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 575b959d5e2980454c7ebe10794fff7c5aaf68ed Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 23 Feb 2023 14:27:48 -0800
Subject: [PATCH v8 1/2] Move extra code out of the Pre/PostRestoreCommand()
 block.

If SIGTERM is received within this block, the startup process will
immediately proc_exit() in the signal handler, so it is inadvisable
to include any more code than is required in this section.  This
change moves the code recently added to this block (see 1b06d7b and
7fed801) to outside of the block.  This ensures that only system()
is called while proc_exit() might be called in the SIGTERM handler,
which is how this code worked from v8.4 to v14.
---
 src/backend/access/transam/xlogarchive.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index fcc87ff44f..41684418b6 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -159,20 +159,27 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			(errmsg_internal("executing restore command \"%s\"",
 			 xlogRestoreCmd)));
 
+	fflush(NULL);
+	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
+
 	/*
-	 * Check signals before restore command and reset afterwards.
+	 * PreRestoreCommand() informs the SIGTERM handler for the startup process
+	 * that it should proc_exit() right away.  This is done for the duration of
+	 * the system() call because there isn't a good way to break out while it
+	 * is executing.  Since we might call proc_exit() in a signal handler, it
+	 * is best to put any additional logic before or after the
+	 * PreRestoreCommand()/PostRestoreCommand() section.
 	 */
 	PreRestoreCommand();
 
 	/*
 	 * Copy xlog from archival storage to XLOGDIR
 	 */
-	fflush(NULL);
-	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
 	rc = system(xlogRestoreCmd);
-	pgstat_report_wait_end();
 
 	PostRestoreCommand();
+
+	pgstat_report_wait_end();
 	pfree(xlogRestoreCmd);
 
 	if (rc == 0)
-- 
2.25.1

>From 42a9760cd34e9aafa6ffe30b0e28e2c893ae6a2c Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 14 Feb 2023 09:44:53 -0800
Subject: [PATCH v8 2/2] Don't proc_exit() in startup's SIGTERM handler if
 forked by system().

Instead, emit a message to STDERR and _exit() in this case.  This
change also adds assertions to proc_exit(), ProcKill(), and
AuxiliaryProcKill() to verify that these functions are not called
by a process forked by system(), etc.
---
 src/backend/postmaster/startup.c | 20 +++-
 src/backend/storage/ipc/ipc.c|  3 +++
 src/backend/storage/lmgr/proc.c  |  2 ++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index efc2580536..bace915881 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -19,6 +19,8 @@
  */
 #include "postgres.h"
 
+#include 
+
 #include "access/xlog.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
@@ -121,7 +123,23 @@ StartupProcShutdownHandler(SIGNAL_ARGS)
 	int			save_errno = errno;
 
 	if (in_restore_command)
-		proc_exit(1);
+	{
+		/*
+		 * If we are in a child process (e.g., forked by system() in
+		 * RestoreArchivedFile()), we don't want to call any exit callbacks.
+		 * The parent will take care of that.
+		 */
+		if (MyProcPid == (int) getpid())
+			proc_exit(1);
+		else
+		{
+			const char	msg[] = "StartupProcShutdownHandler() called in child process\n";
+			int			rc pg_attribute_unused();
+
+			rc = write(STDERR_FILENO, msg, sizeof(msg));
+			_exit(1);
+		}
+	}
 	else
 		shutdown_requested = true;
 	WakeupRecovery();
diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index 1904d21795..d5097dc008 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -103,6 +103,9 @@ static int	on_proc_exit_index,
 void
 proc_exit(int code)
 {
+	/* proc_exit() is not safe in forked processes from system(), etc. */
+	Assert(MyProcPid == (int) getpid());
+
 	/* Clean up everything that must be cleaned up */
 	proc_exit_prepare(code);
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..b9e2c3aafe 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
 	dlist_head *procgloballist;
 
 	Assert(MyProc != NULL);
+	Assert(MyProc->pid == (int) getpid());  /* not safe if forked by system(), etc. */
 
 	

Re: stopgap fix for signal handling during restore_command

2023-02-25 Thread Nathan Bossart
On Sat, Feb 25, 2023 at 11:07:42AM -0800, Andres Freund wrote:
> On 2023-02-23 20:33:23 -0800, Nathan Bossart wrote:>  
>>  if (in_restore_command)
>> -proc_exit(1);
>> +{
>> +/*
>> + * If we are in a child process (e.g., forked by system() in
>> + * RestoreArchivedFile()), we don't want to call any exit 
>> callbacks.
>> + * The parent will take care of that.
>> + */
>> +if (MyProcPid == (int) getpid())
>> +proc_exit(1);
>> +else
>> +{
>> +const char  msg[] = "StartupProcShutdownHandler() 
>> called in child process\n";
>> +int rc pg_attribute_unused();
>> +
>> +rc = write(STDERR_FILENO, msg, sizeof(msg));
>> +_exit(1);
>> +}
>> +}
> 
> Why do we need that rc variable? Don't we normally get away with (void)
> write(...)?

My compiler complains about that.  :/

../postgresql/src/backend/postmaster/startup.c: In function 
‘StartupProcShutdownHandler’:
../postgresql/src/backend/postmaster/startup.c:139:11: error: ignoring 
return value of ‘write’, declared with attribute warn_unused_result 
[-Werror=unused-result]
  139 |(void) write(STDERR_FILENO, msg, sizeof(msg));
  |   ^~
cc1: all warnings being treated as errors

>> diff --git a/src/backend/storage/lmgr/proc.c 
>> b/src/backend/storage/lmgr/proc.c
>> index 22b4278610..e3da0622d7 100644
>> --- a/src/backend/storage/lmgr/proc.c
>> +++ b/src/backend/storage/lmgr/proc.c
>> @@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
>>  dlist_head *procgloballist;
>>  
>>  Assert(MyProc != NULL);
>> +Assert(MyProcPid == (int) getpid());  /* not safe if forked by 
>> system(), etc. */
>>  
>>  /* Make sure we're out of the sync rep lists */
>>  SyncRepCleanupAtProcExit();
>> @@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
>>  PGPROC *proc;
>>  
>>  Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
>> +Assert(MyProcPid == (int) getpid());  /* not safe if forked by 
>> system(), etc. */
>>  
>>  auxproc = [proctype];
>>  
>> -- 
>> 2.25.1
> 
> I think the much more interesting assertion here would be to check that
> MyProc->pid equals the current pid.

I don't mind changing this, but why is this a more interesting assertion?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: stopgap fix for signal handling during restore_command

2023-02-25 Thread Andres Freund
Hi,

On 2023-02-23 20:33:23 -0800, Nathan Bossart wrote:>  
>   if (in_restore_command)
> - proc_exit(1);
> + {
> + /*
> +  * If we are in a child process (e.g., forked by system() in
> +  * RestoreArchivedFile()), we don't want to call any exit 
> callbacks.
> +  * The parent will take care of that.
> +  */
> + if (MyProcPid == (int) getpid())
> + proc_exit(1);
> + else
> + {
> + const char  msg[] = "StartupProcShutdownHandler() 
> called in child process\n";
> + int rc pg_attribute_unused();
> +
> + rc = write(STDERR_FILENO, msg, sizeof(msg));
> + _exit(1);
> + }
> + }

Why do we need that rc variable? Don't we normally get away with (void)
write(...)?


> diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
> index 22b4278610..e3da0622d7 100644
> --- a/src/backend/storage/lmgr/proc.c
> +++ b/src/backend/storage/lmgr/proc.c
> @@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
>   dlist_head *procgloballist;
>  
>   Assert(MyProc != NULL);
> + Assert(MyProcPid == (int) getpid());  /* not safe if forked by 
> system(), etc. */
>  
>   /* Make sure we're out of the sync rep lists */
>   SyncRepCleanupAtProcExit();
> @@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
>   PGPROC *proc;
>  
>   Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
> + Assert(MyProcPid == (int) getpid());  /* not safe if forked by 
> system(), etc. */
>  
>   auxproc = [proctype];
>  
> -- 
> 2.25.1

I think the much more interesting assertion here would be to check that
MyProc->pid equals the current pid.

Greetings,

Andres Freund




Re: Weird failure with latches in curculio on v15

2023-02-25 Thread Andres Freund
Hi,

On 2023-02-19 20:06:24 +0530, Robert Haas wrote:
> On Sun, Feb 19, 2023 at 2:45 AM Andres Freund  wrote:
> > To me that seems even simpler? Nothing but the archiver is supposed to 
> > create
> > .done files and nothing is supposed to remove .ready files without archiver
> > having created the .done files.  So the archiver process can scan
> > archive_status until its done or until N archives have been collected, and
> > then process them at once?  Only the creation of the .done files would be
> > serial, but I don't think that's commonly a problem (and could be optimized 
> > as
> > well, by creating multiple files and then fsyncing them in a second pass,
> > avoiding N filesystem journal flushes).
> >
> > Maybe I am misunderstanding what you see as the problem?
> 
> Well right now the archiver process calls ArchiveFileCB when there's a
> file ready for archiving, and that process is supposed to archive the
> whole thing before returning. That pretty obviously seems to preclude
> having more than one file being archived at the same time. What
> callback structure do you have in mind to allow for that?

TBH, I think the current archive and restore module APIs aren't useful. I
think it was a mistake to add archive modules without having demonstrated that
one can do something useful with them that the restore_command didn't already
do. If anything, archive modules have made it harder to improve archiving
performance via concurrency.

My point was that it's easy to have multiple archive commands in progress at
the same time, because we already have a queuing system, and that
archive_command is entirely compatible with doing that, because running multiple
subprocesses is pretty trivial. It wasn't that the archive API is suitable for
that.


> I mean, my idea was to basically just have one big callback:
> ArchiverModuleMainLoopCB(). Which wouldn't return, or perhaps, would
> only return when archiving was totally caught up and there was nothing
> more to do right now. And then that callback could call functions like
> AreThereAnyMoreFilesIShouldBeArchivingAndIfYesWhatIsTheNextOne(). So
> it would call that function and it would find out about a file and
> start an HTTP session or whatever and then call that function again
> and start another HTTP session for the second file and so on until it
> had as much concurrency as it wanted. And then when it hit the
> concurrency limit, it would wait until at least one HTTP request
> finished. At that point it would call
> HeyEverybodyISuccessfullyArchivedAWalFile(), after which it could
> again ask for the next file and start a request for that one and so on
> and so forth.

> I don't really understand what the other possible model is here,
> honestly. Right now, control remains within the archive module for the
> entire time that a file is being archived. If we generalize the model
> to allow multiple files to be in the process of being archived at the
> same time, the archive module is going to need to have control as long
> as >= 1 of them are in progress, at least AFAICS. If you have some
> other idea how it would work, please explain it to me...

I don't think that a main loop approach is the only viable one. It might be
the most likely to succeed one though. As an alternative, consider something
like

struct ArchiveFileState {
   int fd;
   enum WaitFor { READ, WRITE, CONNECT } wait_for;
   void *file_private;
};

typedef bool (*ArchiveFileStartCB)(ArchiveModuleState *state,
   ArchiveFileState *file_state,
   const char *file, const char *path);

typedef bool (*ArchiveFileContinueCB)(ArchiveModuleState *state,
   ArchiveFileState *file_state);

An archive module could open an HTTP connection, do IO until it's blocked, put
the fd in file_state, and return. The main loop could run a big event loop over
all of the file descriptors and, whenever any of the FDs signals that IO is
ready, call ArchiveFileContinueCB() for that file.
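
For illustration, the archiver's side might then look roughly like this (a
sketch only; "waitset", "module", "state" and RemoveArchiveFd() are assumed
names, not existing code):

for (;;)
{
	WaitEvent	event;

	/* sleep until one of the in-flight archive FDs is ready for IO */
	WaitEventSetWait(waitset, -1, &event, 1, WAIT_EVENT_ARCHIVER_MAIN);

	/* resume IO for just that file */
	ArchiveFileState *fs = (ArchiveFileState *) event.user_data;

	if (module->archive_file_continue_cb(state, fs))
		RemoveArchiveFd(waitset, fs);	/* assumed helper: drop fd, mark .done */
}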

I don't know if that's better than ArchiverModuleMainLoopCB(). I can see both
advantages and disadvantages.

Greetings,

Andres Freund




Re: [Proposal] Allow pg_dump to include all child tables with the root table

2023-02-25 Thread Gilles Darold

On 25/02/2023 at 16:40, Stéphane Tachoires wrote:

Hi,

I'm not sure about the "child" -> "partition" change as it also 
selects children that are not partitions.
I'm more dubious about the --with-childs option, I'd rather have 
--table-with-childs= and 
--exclude-table-with-childs=. That will be clearer about what 
is what.


I'm working on that, but have a hard time with 
test pg_dump/002_pg_dump (It's brand new to me)


Stéphane

On Fri, Feb 24, 2023 at 23:50, Cary Huang wrote:

The following review has been posted through the commitfest
application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Hi

the patch applies fine on current master branch and it works as
described. However, I would suggest changing the new option name
from "--with-childs" to "--with-partitions" for several reasons.

"childs" is grammatically incorrect and in the PG community, the
term "partitioned table" is normally used to denote a parent
table, and the term "partition" is used to denote the child table
under the parent table. We should use these terms to stay
consistent with the community.

Also, I would rephrase the documentation as:

Used in conjunction with -t/--table or
-T/--exclude-table options to include or exclude
partitions of the specified tables if any.

thank you

Cary Huang

HighGo Software Canada
www.highgo.ca 



Hi,


This is right, this patch also works with inherited tables, so
--with-partitions can be confusing; this is why --with-childs was chosen.
But I disagree with the use of --table-with-childs and
--exclude-table-with-childs because we already have --table and
--exclude-table, and it would add a lot of code where we just need a switch
to include child tables. Actually my first thought was that this
behavior (dump child tables when the root table is dumped using --table)
should be the default in pg_dump, but the problem is that it could break
existing scripts using pg_dump, so I prefer to implement the
--with-childs option.



I think we can use --with-partitions, provided that it is clear in the 
documentation that this option also works with inheritance.



Attached is a new patch v2 using --with-partitions and including the
documentation fix.
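
For illustration, dumping a partitioned table together with all of its
partitions would then look like this (table and database names invented):

$ pg_dump --table=measurement --with-partitions mydb > mydb.sql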



--
Gilles Darold
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 49d218905f..15ada5c8ee 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1173,6 +1173,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --with-partitions
+  
+   
+	Used in conjunction with -t/--table
+	or -T/--exclude-table options to
+	include or exclude partitions of the specified tables if any.
+	This option is also valid for tables using inheritance.
+   
+  
+ 
+
  
-?
--help
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..2e7e919391 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -200,6 +200,7 @@ typedef struct _dumpOptions
 
 	int			sequence_data;	/* dump sequence data even in schema-only mode */
 	int			do_nothing;
+	bool			with_partitions;
 } DumpOptions;
 
 /*
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 24ba936332..2b7a122d7e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -421,6 +421,7 @@ main(int argc, char **argv)
 		{"on-conflict-do-nothing", no_argument, _nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
+		{"with-partitions", no_argument, NULL, 12},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -631,6 +632,10 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:/* dump child table too */
+dopt.with_partitions = true;
+break;
+
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -809,6 +814,14 @@ main(int argc, char **argv)
 false);
 	/* non-matching exclusion patterns aren't an error */
 
+	/*
+	 * The --with-partitions option requires that there is
+	 * at least one table inclusion or exclusion
+	 */
+	if (dopt.with_partitions && table_include_patterns.head == NULL
+			&& table_exclude_patterns.head == NULL)
+		pg_fatal("option --with-partitions requires option -t/--table or -T/--exclude-table");
+
 	/* Expand table selection patterns into OID lists */
 	if (table_include_patterns.head != NULL)
 	{
@@ -1087,6 +1100,9 @@ help(const char *progname)
 	printf(_("  --use-set-session-authorization\n"
 			 "   use SET SESSION AUTHORIZATION commands instead of\n"
 			 "   ALTER OWNER commands to set ownership\n"));
+	printf(_("  --with-partitions

Re: meson: Optionally disable installation of test modules

2023-02-25 Thread Andres Freund
Hi,

On 2023-02-22 10:09:10 +0100, Peter Eisentraut wrote:
> On 20.02.23 20:48, Andres Freund wrote:
> > On 2023-02-20 19:43:56 +0100, Peter Eisentraut wrote:
> > > I don't think any callers try to copy a directory source, so the
> > > shutil.copytree() stuff isn't necessary.
> > 
> > I'd like to use it for installing docs outside of the normal install
> > target. Of course we could add the ability at a later point, but that seems a
> > bit pointless back-and-forth to me.
> 
> I figured it could be useful as a general installation tool, but the current
> script has specific command-line options for this specific purpose, so I
> don't think it would work for your purpose anyway.
> 
> For the purpose here, we really just need something that does
> 
> for src in sys.argv[1:-1]:
> shutil.copy2(src, sys.argv[-1])
> 
> But we need to call it twice for different sets of files and destinations,
> and since we can't have more than one command per test, we either need to
> write two "tests" or write a wrapper script like the one we have here.

How about making the arguments
  --install target-path list of files or directories
  --install another-path another set of files
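
The script side could then be a straightforward loop over those groups, e.g.
(a sketch; directory sources go through copytree(), as discussed above):

import os
import shutil
import sys

args = sys.argv[1:]
while args:
    if args.pop(0) != '--install':
        sys.exit('expected --install')
    dest = args.pop(0)
    while args and args[0] != '--install':
        src = args.pop(0)
        if os.path.isdir(src):
            shutil.copytree(src, os.path.join(dest, os.path.basename(src)),
                            dirs_exist_ok=True)
        else:
            shutil.copy2(src, dest)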


> I don't know what the best way to slice this is, but it's not a lot of code
> that we couldn't move around again in the future.

That's true. The main work here is going through all the test modules, and
that won't be affected by changing the argument syntax.

Greetings,

Andres Freund




Re: windows/meson cfbot warnings

2023-02-25 Thread Andres Freund
Hi,

On 2022-12-28 17:44:47 -0800, Andres Freund wrote:
> On 2022-12-28 18:35:38 -0600, Justin Pryzby wrote:
> > Since a few days ago, the windows/meson task has been spewing messages for
> > each tap test:
> > 
> > | Unknown TAP version. The first line MUST be `TAP version `. Assuming version 12.
> > 
> > I guess because the image is updated to use meson v1.0.0.
> > https://github.com/mesonbuild/meson/commit/b7a5c384a1f1ba80c09904e7ef4f5160bdae3345
> > 
> > mesonbuild/mtest.py-if version is None:
> > mesonbuild/mtest.py:self.warnings.append('Unknown TAP version. The first line MUST be `TAP version `. Assuming version 12.')
> > mesonbuild/mtest.py-version = 12
> 
> I think the change is somewhat likely to be reverted in the next meson minor
> version. It apparently came about due to somewhat odd language in the tap-14
> spec...  So I think we should just wait for now.

FWIW, that did happen in 1.0.1, and the output is now clean again.

Greetings,

Andres Freund




Re: broken formatting?

2023-02-25 Thread Pavel Stehule
On Sat, Feb 25, 2023 at 18:13, Tom Lane wrote:

> Pavel Stehule  writes:
> > On Sat, Feb 25, 2023 at 17:57, Tom Lane wrote:
> >> Pavel Stehule  writes:
> >>> -numeric(PG_FUNCTION_ARGS)
> >>> +numeric(PG_FUNCTION_ARGS)
>
> >> Sadly, pgindent will just put that back, because it knows that "numeric"
> >> is a typedef.
>
> > Is it possible to rename this function?
>
> That would be a way out, but it never seemed worth the trouble.
>

ook

Regards

Pavel


>
> regards, tom lane
>


locale/encoding vs vcregress.pl installcheck

2023-02-25 Thread Andrew Dunstan
vcregress's installcheck_internal has "--encoding=SQL_ASCII --no-locale" 
hardcoded. It's been like that for a long time, for no good reason that 
I can see. The practical effect is to make it well nigh impossible to 
run the regular regression tests against any other encoding/locale. This 
in turn has apparently masked an issue with the collate.windows.win1252 
test, which only runs on a WIN1252-encoded database.


I propose simply to remove those settings for the installcheck target. 
We already run the regression tests under these conditions in 
'vcregress.pl check', so we wouldn't be giving up anything important. 
Although this particular test is only present in HEAD, I think we should 
backpatch the change to all live branches.


(Yes, I know we are trying to get rid of these tools, but we haven't 
done so yet. I'm working on it for the buildfarm, which is how I 
discovered this issue.)



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: broken formatting?

2023-02-25 Thread Tom Lane
Pavel Stehule  writes:
> On Sat, Feb 25, 2023 at 17:57, Tom Lane wrote:
>> Pavel Stehule  writes:
>>> -numeric(PG_FUNCTION_ARGS)
>>> +numeric(PG_FUNCTION_ARGS)

>> Sadly, pgindent will just put that back, because it knows that "numeric"
>> is a typedef.

> Is it possible to rename this function?

That would be a way out, but it never seemed worth the trouble.

regards, tom lane




Re: how does postgresql handle LOB/CLOB/BLOB column data that dies before the query ends

2023-02-25 Thread Tom Lane
Noel Grandin  writes:
> OK, so it seems like so far my design is not far off the PostgreSQL design
> (which is very comforting).

> I wonder if the difference is in the client<->server protocol.

That could be a piece of the puzzle, yeah.

> Does PostgreSQL hold the transaction open until the client side has closed
> the resultset (or the query object possibly, not sure about the PostgreSQL
> API here).

We use single-threaded server processes, so we couldn't close the
transaction (or more to the point, drop the query's snapshot) until
we've computed and sent the whole resultset.  I should think that
there's a similar requirement even if multi-threaded: if you do MVCC
at all then you have to hold your snapshot (or whatever mechanism
you use) until the resultset is all computed, or else later rows
in the query result might be wrong.

In the scenario I'm describing with a query fetching some large
object OID(s) followed by separate queries retrieving those large
objects, we put it on the client to create an explicit transaction
block around those queries (ie send BEGIN and COMMIT commands),
and to select a transaction mode that causes the same snapshot to
be used across the whole transaction.  If the client fails to do
this, there could be concurrency anomalies.  Any one of those
queries will still deliver self-consistent results, but they
might not match up with earlier or later queries.
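
Concretely, the client-side pattern is roughly this (table name and OID
invented for illustration):

BEGIN ISOLATION LEVEL REPEATABLE READ;  -- one snapshot for the whole transaction
SELECT doc_oid FROM documents WHERE id = 42;
SELECT lo_get(16437);  -- read the large object under the same snapshot
COMMIT;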

regards, tom lane




Re: broken formatting?

2023-02-25 Thread Pavel Stehule
On Sat, Feb 25, 2023 at 17:57, Tom Lane wrote:

> Pavel Stehule  writes:
> > -numeric(PG_FUNCTION_ARGS)
> > +numeric(PG_FUNCTION_ARGS)
>
> Sadly, pgindent will just put that back, because it knows that "numeric"
> is a typedef.
>

Is it possible to rename this function?



>
> regards, tom lane
>


Re: broken formatting?

2023-02-25 Thread Tom Lane
Pavel Stehule  writes:
> -numeric(PG_FUNCTION_ARGS)
> +numeric(PG_FUNCTION_ARGS)

Sadly, pgindent will just put that back, because it knows that "numeric"
is a typedef.

regards, tom lane




Re: [Proposal] Allow pg_dump to include all child tables with the root table

2023-02-25 Thread Stéphane Tachoires
Hi,

I'm not sure about the "child" -> "partition" change as it also selects
children that are not partitions.
I'm more dubious about the --with-childs option, I'd rather have
--table-with-childs= and --exclude-table-with-childs=.
That will be clearer about what is what.

I'm working on that, but have a hard time with test pg_dump/002_pg_dump
(It's brand new to me)

Stéphane

On Fri, Feb 24, 2023 at 23:50, Cary Huang wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> Hi
>
> the patch applies fine on current master branch and it works as described.
> However, I would suggest changing the new option name from "--with-childs"
> to "--with-partitions" for several reasons.
>
> "childs" is grammatically incorrect and in the PG community, the term
> "partitioned table" is normally used to denote a parent table, and the term
> "partition" is used to denote the child table under the parent table. We
> should use these terms to stay consistent with the community.
>
> Also, I would rephrase the documentation as:
>
> Used in conjunction with -t/--table or
> -T/--exclude-table options to include or
> exclude partitions of the specified tables if any.
>
> thank you
>
> Cary Huang
> 
> HighGo Software Canada
> www.highgo.ca



-- 
"Où se posaient les hirondelles avant l'invention du téléphone ?"
  -- Grégoire Lacroix


Re: zstd compression for pg_dump

2023-02-25 Thread Euler Taveira
On Sat, Feb 25, 2023, at 7:31 AM, Tomas Vondra wrote:
> On 2/24/23 20:18, Justin Pryzby wrote:
> > This is a draft patch - review is welcome and would help to get this
> > ready to be considererd for v16, if desired.
> > 
> > I'm going to add this thread to the old CF entry.
> > https://commitfest.postgresql.org/31/2888/
> > 
> 
> Thanks. Sadly cfbot is unhappy - the windows and cplusplus builds failed
> because of some issue in pg_backup_archiver.h. But it's a bit bizarre
> because the patch does not modify that file at all ...
cpluspluscheck says

# pg_dump is not C++-clean because it uses "public" and "namespace"
# as field names, which is unfortunate but we won't change it now.

Hence, the patch should exclude the new header file from it.

--- a/src/tools/pginclude/cpluspluscheck
+++ b/src/tools/pginclude/cpluspluscheck
@@ -153,6 +153,7 @@ do
test "$f" = src/bin/pg_dump/compress_gzip.h && continue
test "$f" = src/bin/pg_dump/compress_io.h && continue
test "$f" = src/bin/pg_dump/compress_lz4.h && continue
+   test "$f" = src/bin/pg_dump/compress_zstd.h && continue
test "$f" = src/bin/pg_dump/compress_none.h && continue
test "$f" = src/bin/pg_dump/parallel.h && continue
test "$f" = src/bin/pg_dump/pg_backup_archiver.h && continue


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Add LZ4 compression in pg_dump

2023-02-25 Thread Justin Pryzby
On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> I have some fixes (attached) and questions while polishing the patch for
> zstd compression.  The fixes are small and could be integrated with the
> patch for zstd, but could be applied independently.

One more - WriteDataToArchiveGzip() says:

+   if (cs->compression_spec.level == 0)
+   pg_fatal("requested to compress the archive yet no level was 
specified");

That was added at e9960732a.  

But if you specify gzip:0, the compression level is already enforced by
validate_compress_specification(), before hitting gzip.c:

| pg_dump: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9 (default at -1)

5e73a6048 intended that to work as before, and you *can* specify -Z0:

The change is backward-compatible, hence specifying only an integer
leads to no compression for a level of 0 and gzip compression when the
level is greater than 0.

$ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp --compress 0 | file -
/dev/stdin: ASCII text

Right now, I think that pg_fatal in gzip.c is dead code - that was first
added in the patch version sent on 21 Dec 2022.

-- 
Justin
>From 07b446803ec89ccd0954583640f314fa7f77048f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Feb 2023 14:07:09 -0600
Subject: [PATCH] f!fixes for LZ4

---
 doc/src/sgml/ref/pg_dump.sgml| 4 ++--
 src/bin/pg_dump/compress_gzip.c  | 7 ---
 src/bin/pg_dump/compress_io.c| 2 +-
 src/bin/pg_dump/compress_io.h| 4 ++--
 src/bin/pg_dump/compress_lz4.c   | 4 ++--
 src/bin/pg_dump/pg_backup_archiver.c | 4 ++--
 src/bin/pg_dump/pg_dump.c| 2 --
 src/bin/pg_dump/t/002_pg_dump.pl | 6 +++---
 8 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 49d218905fb..6fbe49f7ede 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -331,7 +331,7 @@ PostgreSQL documentation
can read. A directory format archive can be manipulated with
standard Unix tools; for example, files in an uncompressed archive
can be compressed with the gzip or
-   lz4tool.
+   lz4 tools.
This format is compressed by default using gzip
and also supports parallel dumps.
   
@@ -655,7 +655,7 @@ PostgreSQL documentation

 Specify the compression method and/or the compression level to use.
 The compression method can be set to gzip or
-lz4 or none for no compression.
+lz4, or none for no compression.
 A compression detail string can optionally be specified.  If the
 detail string is an integer, it specifies the compression level.
 Otherwise, it should be a comma-separated list of items, each of the
diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4e..af68fd27a12 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -123,13 +123,6 @@ WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs,
 		gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
 		gzipcs->outsize = ZLIB_OUT_SIZE;
 
-		/*
-		 * A level of zero simply copies the input one block at the time. This
-		 * is probably not what the user wanted when calling this interface.
-		 */
-		if (cs->compression_spec.level == 0)
-			pg_fatal("requested to compress the archive yet no level was specified");
-
 		if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
 			pg_fatal("could not initialize compression library: %s", zp->msg);
 
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index ce06f1eac9c..9239dbb2755 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -83,7 +83,7 @@
  * used by the caller in an error message.
  */
 char *
-supports_compression(const pg_compress_specification compression_spec)
+pgdump_supports_compression(const pg_compress_specification compression_spec)
 {
 	const pg_compress_algorithm	algorithm = compression_spec.algorithm;
 	bool		supported = false;
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index bbde2693915..46815fa2ebe 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -21,7 +21,7 @@
 #define ZLIB_OUT_SIZE	4096
 #define ZLIB_IN_SIZE	4096
 
-extern char *supports_compression(const pg_compress_specification compression_spec);
+extern char *pgdump_supports_compression(const pg_compress_specification compression_spec);
 
 /*
  * Prototype for callback function used in writeData()
@@ -172,7 +172,7 @@ struct CompressFileHandle
 extern CompressFileHandle *InitCompressFileHandle(const pg_compress_specification compression_spec);
 
 /*
- * Initialize a compress file stream. Deffer the compression algorithm
+ * Initialize a compress file stream. 

Re: Doc update for pg_stat_statements normalization

2023-02-25 Thread Imseih (AWS), Sami
>> Could things be done in a more stable way?  For example, imagine that
>> we have an extra Query field called void *private_data that extensions
>> can use to store custom data associated to a query ID, then we could
>> do something like that:
>> - In the post-analyze hook, check if an entry with the query ID
>> calculated exists.
>> -- If the entry exists, grab a copy of the existing query string,
>> which may be normalized or not, and save it into Query->private_data.
>> -- If the entry does not exist, normalize the query, store it in
>> Query->private_data but do not yet create an entry in the hash table.
>> - In the planner/utility hook, fetch the normalized query from
>> private_data, then use it if an entry needs to be created in the hash
>> table.  The entry may have been deallocated since the post-analyze
>> hook, in which case it is re-created with the normalized copy saved in
>> the first phase.

> I think the idea of a "private_data" like thing has been discussed before and
> rejected IIRC, as it could be quite expensive and would also need to
> accommodate for multiple extensions and so on.

The overhead of storing this additional private data for the life of the query
execution may not be desirable. I think we will also need to copy the
private data to QueryDesc to make it available to the planner/utility/exec
hooks.

> Overall, I think that if the pgss eviction rate is high enough that it's
> problematic for doing performance analysis, the performance overhead will be so
> bad that simply removing pg_stat_statements will give you a ~ x2 performance
> increase.  I don't see much point trying to make such a performance killer
> scenario more usable.

In v14, we added a dealloc metric to pg_stat_statements_info, which is helpful.
However, this only deals with the pgss_hash entry deallocation.
I think we should also add a metric for the text file garbage collection.

Regards

-- 
Sami Imseih
Amazon Web Services



broken formatting?

2023-02-25 Thread Pavel Stehule
Hi

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index a83feea396..12c6548675 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -1233,7 +1233,7 @@ numeric_support(PG_FUNCTION_ARGS)
  * scale of the attribute have to be applied on the value.
  */
 Datum
-numeric(PG_FUNCTION_ARGS)
+numeric(PG_FUNCTION_ARGS)
 {
Numeric num = PG_GETARG_NUMERIC(0);
int32   typmod = PG_GETARG_INT32(1);

Regards

Pavel


Re: PG_FREE_IF_COPY extraneous in numeric_cmp?

2023-02-25 Thread CK Tan
Thanks!

On Fri, Feb 24, 2023 at 2:16 PM Tom Lane  wrote:
>
> CK Tan  writes:
> > Isn't it true that pfree() will never be called by PG_FREE_IF_COPY?
>
> No.  You're forgetting the possibility that PG_GETARG_NUMERIC will
> have to de-toast a toasted input.  Granted, numerics are seldom
> going to be long enough to get compressed or pushed out-of-line;
> but that's possible, and what's very possible is that they'll have
> a short header.
>
> regards, tom lane
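
For reference, the pattern under discussion in numeric.c looks roughly like
this (paraphrased, not a verbatim copy):

Datum
numeric_cmp(PG_FUNCTION_ARGS)
{
	Numeric		num1 = PG_GETARG_NUMERIC(0);	/* may detoast, i.e. palloc a copy */
	Numeric		num2 = PG_GETARG_NUMERIC(1);
	int			result;

	result = cmp_numerics(num1, num2);

	PG_FREE_IF_COPY(num1, 0);	/* pfree only if a detoasted copy was made */
	PG_FREE_IF_COPY(num2, 1);

	PG_RETURN_INT32(result);
}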




Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

2023-02-25 Thread Tomas Vondra
On 2/24/23 22:07, Heikki Linnakangas wrote:
> I had a quick look at just the preliminary cleanup patches:
> 
>> 0001-BRIN-bloom-cleanup-20230218.patch
> 
> Looks good to me
> 
>> 0002-BRIN-minmax-multi-cleanup-20230218.patch
> 
> Looks good, although it would feel more natural to me to do it the other
> way round, and define 'matches' as 'bool matches', and use DatumGetBool.
> 

Yeah, probably. I was trying to only do the minimal change because of
(maybe) backpatching this.

> Not new with this patch, but I find the 'matches' and 'matching'
> variables a bit strange. Wouldn't it be simpler to have just one variable?
> 

True. I don't recall why we did it this way.

>> 0003-Introduce-bloom_filter_size-20230218.patch
> 
> Looks good
> 
>> 0004-Add-minmax-multi-inequality-tests-20230218.patch
> 
> Looks good
> 
>> +SELECT i/5 + mod(911 * i + 483, 25),
>> +   i/10 + mod(751 * i + 221, 41)
> 
> Peculiar formulas. Was there a particular reason for these values?
> 

No, not really. I simply wanted random-looking data, but reproducible
and deterministic. And a linear congruential generator is a simple way to
do that. I just picked a couple of co-prime numbers, and that's it.
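
For example, run standalone it produces the same "random" values every time:

SELECT i/5 + mod(911 * i + 483, 25) AS a,
       i/10 + mod(751 * i + 221, 41) AS b
FROM generate_series(1, 10) AS s(i);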


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Disable rdns for Kerberos tests

2023-02-25 Thread Heikki Linnakangas
On 25 February 2023 00:50:30 EET, Stephen Frost  wrote:
>Thanks for reviewing!  Comments added and updated the commit message.
>
>Unless there's anything else, I'll push this early next week.

s/capture portal/captive portal/. Other than that, looks good to me.

- Heikki




Re: zstd compression for pg_dump

2023-02-25 Thread Tomas Vondra
On 2/24/23 20:18, Justin Pryzby wrote:
> This is a draft patch - review is welcome and would help to get this
> ready to be considererd for v16, if desired.
> 
> I'm going to add this thread to the old CF entry.
> https://commitfest.postgresql.org/31/2888/
> 

Thanks. Sadly cfbot is unhappy - the windows and cplusplus builds failed
because of some issue in pg_backup_archiver.h. But it's a bit bizarre
because the patch does not modify that file at all ...

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-25 Thread Amit Kapila
On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı  wrote:
>
>>
>> I think this overhead seems to be mostly due to the need to perform
>> tuples_equal multiple times for duplicate values. I don't know if
>> there is any simple way to avoid this without using the planner stuff
>> as was used in the previous approach. So, this brings us to the
>> question of whether just providing a way to disable/enable the use of
>> index scan for such cases is sufficient or if we need any other way.
>>
>> Tom, Andres, or others, do you have any suggestions on how to move
>> forward with this patch?
>>
>
> Thanks for the feedback and testing. Due to personal circumstances,
> I could not reply the thread in the last 2 weeks, but I'll be more active
> going forward.
>
>  I also agree that we should have a way to control the behavior.
>
> I created another patch (v24_0001_optionally_disable_index.patch) which can 
> be applied
> on top of v23_0001_use_index_on_subs_when_pub_rep_ident_full.patch.
>
> The new patch adds a new subscription_parameter for both CREATE and ALTER 
> subscription
> named: enable_index_scan. The setting is valid only when REPLICA IDENTITY is 
> full.
>
> What do you think about such a patch to control the behavior? It does not 
> give a per-relation
> level of control, but still useful for many cases.
>

Wouldn't a table-level option like 'apply_index_scan' be better than a
subscription-level option with a default value as false? Anyway, the
bigger point is that we don't see a better way to proceed here than to
introduce some option to control this behavior.
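
For reference, the proposed subscription-level knob would be used roughly
like this (a sketch based on the description above; connection string
elided):

CREATE SUBSCRIPTION mysub
	CONNECTION '...'
	PUBLICATION mypub
	WITH (enable_index_scan = false);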

I see this as a way to provide this feature for users but I would
prefer to proceed with this if we can get some more buy-in from senior
community members (at least one more committer) and some user(s) if
possible. So, I once again request others to chime in and share their
opinion.

-- 
With Regards,
Amit Kapila.




Re: how does postgresql handle LOB/CLOB/BLOB column data that dies before the query ends

2023-02-25 Thread Noel Grandin
On Sat, 25 Feb 2023 at 08:33, Tom Lane  wrote:

> Yeah, Postgres has an analogous kind of problem.  Our standard way to
> use "large objects" is to store their identifying OIDs in tables,
>
...

> and in particular they can *not* close the transaction that read the
> OID if they'd like to read a matching state of the large object.
> So far there's not been a lot of complaints about that ...
>
>
OK, so it seems like so far my design is not far off the PostgreSQL design
(which is very comforting).

I wonder if the difference is in the client<->server protocol.

Does PostgreSQL hold the transaction open until the client side has closed
the resultset (or the query object possibly, not sure about the PostgreSQL
API here).
H2 has a very simple client-server protocol: the client simply sends a
query and gets back a result-set stream. There is no explicit
acknowledgement of when the client closes the resultset, which means that
the MVCC transaction is typically closed by the time the client even starts
reading the resultset.


Re: Doc update for pg_stat_statements normalization

2023-02-25 Thread Julien Rouhaud
On Sat, Feb 25, 2023 at 02:58:36PM +0900, Michael Paquier wrote:
> On Fri, Feb 24, 2023 at 08:54:00PM +, Imseih (AWS), Sami wrote:
> > I think the only thing to do here is to call this out in docs with a
> > suggestion to increase pg_stat_statements.max to reduce the
> > likelihood. I also attached the suggested  doc enhancement as well.
> 
> Improving the docs about that sounds like a good idea.  This would be
> less surprising for users, if we had some details about that.
> 
> > Any thoughts?
> 
> The risk of deallocation of an entry between the post-analyze hook and
> the planner/utility hook represented with two calls of pgss_store()
> means that this will never be able to work correctly as long as we
> don't know the shape of the normalized query in the second code path
> (planner, utility execution) updating the entries with the call
> information, etc.  And we only want to pay the cost of normalization
> once, after the post-analyze where we do the query jumbling.

Note also that this is a somewhat wanted behavior (to evict queries that didn't
have any planning or execution stats record), per the
STICKY_DECREASE_FACTOR and related stuff.

> Could things be done in a more stable way?  For example, imagine that
> we have an extra Query field called void *private_data that extensions
> can use to store custom data associated to a query ID, then we could
> do something like that:
> - In the post-analyze hook, check if an entry with the query ID
> calculated exists.
> -- If the entry exists, grab a copy of the existing query string,
> which may be normalized or not, and save it into Query->private_data.
> -- If the entry does not exist, normalize the query, store it in
> Query->private_data but do not yet create an entry in the hash table.
> - In the planner/utility hook, fetch the normalized query from
> private_data, then use it if an entry needs to be created in the hash
> table.  The entry may have been deallocated since the post-analyze
> hook, in which case it is re-created with the normalized copy saved in
> the first phase.

I think the idea of a "private_data" like thing has been discussed before and
rejected IIRC, as it could be quite expensive and would also need to
accommodate for multiple extensions and so on.

Overall, I think that if the pgss eviction rate is high enough that it's
problematic for doing performance analysis, the performance overhead will be so
bad that simply removing pg_stat_statements will give you a ~ x2 performance
increase.  I don't see much point trying to make such a performance killer
scenario more usable.