Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-12-01 Thread Masahiko Sawada
On Thu, Dec 1, 2022 at 4:00 PM John Naylor  wrote:
>
>
> On Wed, Nov 30, 2022 at 11:09 PM Masahiko Sawada  
> wrote:
> >
> > I've investigated this issue and have a question about using atomic
> > variables on palloc'ed memory. In non-parallel vacuum cases,
> > radix_tree_control is allocated via aset.c. IIUC in 32-bit machines,
> > the memory allocated by aset.c is 4-bytes aligned so these atomic
> > variables are not always 8-bytes aligned. Is there any way to enforce
> > 8-bytes aligned memory allocations in 32-bit machines?
>
> The bigger question in my mind is: Why is there an atomic variable in 
> backend-local memory?

Because I use the same radix_tree and radix_tree_control structs for
non-parallel and parallel vacuum. Therefore, radix_tree_control is
allocated in DSM for parallel-vacuum cases or in backend-local memory
for non-parallel vacuum cases.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Bug in row_number() optimization

2022-12-01 Thread Richard Guo
On Mon, Nov 28, 2022 at 5:59 PM Sergey Shinderuk 
wrote:

> Not quite sure that we don't need to do anything for the
> WINDOWAGG_PASSTHROUGH_STRICT case. Although, we won't return any more
> tuples for the current partition, we still call ExecProject with
> dangling pointers. Is it okay?


AFAIU once we go into WINDOWAGG_PASSTHROUGH_STRICT we will spool all the
remaining tuples in the current partition without storing them and then
move to the next partition if available and become WINDOWAGG_RUN again
or become WINDOWAGG_DONE if there are no further partitions.  It seems
we would not have chance to see the dangling pointers.


> +   if (!func_strict(opexpr->opfuncid))
> +   return false;
>
> Should return true instead?


Yeah, you're right.  This should be a thinko.

Thanks
Richard


Re: pgsql: Revoke PUBLIC CREATE from public schema, now owned by pg_databas

2022-12-01 Thread Noah Misch
On Wed, Nov 30, 2022 at 05:35:01PM -0500, Tom Lane wrote:
> Also, I'd like to structure things so that the first para covers what
> you need to know in a clean v15+ installation, and details that only
> apply in upgrade scenarios are in the second para.  The upgrade scenario
> is going to be interesting to fewer and fewer people over time, so let's
> not clutter the lede with it.
> 
> So maybe about like this?
> 
> Constrain ordinary users to user-private schemas.  To implement
> this pattern, for every user needing to create non-temporary
> objects, create a schema with the same name as that user.  (Recall
> that the default search path starts with $user, which resolves to
> the user name. Therefore, if each user has a separate schema, they
> access their own schemas by default.)  Also ensure that no other
> schemas have public CREATE privileges.  This pattern is a secure
> schema usage pattern unless an untrusted user is the database
> owner or holds the CREATEROLE privilege, in which case no secure
> schema usage pattern exists.

This is free from the problem found in ddl-create-public-reorg-really.patch.
However, the word "other" doesn't belong there.  (The per-user schemas should
not have public CREATE privilege.)  I would also move that same sentence up
front, like this:

Constrain ordinary users to user-private schemas.  To implement this
pattern, first ensure that no schemas have public CREATE privileges.
Then, for every user needing to create non-temporary objects, create a
schema with the same name as that user.  (Recall that the default search
path starts with $user, which resolves to the user name. Therefore, if
each user has a separate schema, they access their own schemas by
default.)  This pattern is a secure schema usage pattern unless an
untrusted user is the database owner or holds the CREATEROLE privilege, in
which case no secure schema usage pattern exists.

With that, I think you have improved on the status quo.  Thanks.

> In PostgreSQL 15 and later, the default configuration supports
> this usage pattern.  In prior versions, or when using a database
> that has been upgraded from a prior version, you will need to
> remove the public CREATE privilege from the public schema (issue
> REVOKE CREATE ON SCHEMA public FROM PUBLIC).  Then consider
> auditing the public schema for objects named like objects in
> schema pg_catalog.

> BTW, is "create a schema with the same name" sufficient detail?
> You have to either make it owned by that user, or explicitly
> grant CREATE permission on it.  I'm not sure if that detail
> belongs here, but it feels like maybe it does.

Maybe.  Failing to GRANT that will yield a clear error when the user starts
work, so it's not critical to explain here.




File API cleanup

2022-12-01 Thread Peter Eisentraut
Here are a couple of patches that clean up the internal File API and 
related things a bit:


0001-Update-types-in-File-API.patch

Make the argument types of the File API match stdio better:

- Change the data buffer to void *, from char *.
- Change FileWrite() data buffer to const on top of that.
- Change amounts to size_t, from int.

In passing, change the FilePrefetch() amount argument from int to
off_t, to match the underlying posix_fadvise().

0002-Remove-unnecessary-casts.patch

Some code carefully cast all data buffer arguments for
BufFileWrite() and BufFileRead() to void *, even though the
arguments are already void * (and AFAICT were never anything else).
Remove this unnecessary clutter.

(I had initially thought these casts were related to the first patch, 
but as I said the BufFile API never used char * arguments, so this 
turned out to be unrelated, but still weird.)From 0ca35b75e383272356ba49211913d8aead5ca10d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 1 Dec 2022 08:36:09 +0100
Subject: [PATCH 1/2] Update types in File API

Make the argument types of the File API match stdio better:

- Change the data buffer to void *, from char *.
- Change FileWrite() data buffer to const on top of that.
- Change amounts to size_t, from int.

In passing, change the FilePrefetch() amount argument from int to
off_t, to match the underlying posix_fadvise().
---
 src/backend/storage/file/fd.c | 8 
 src/include/storage/fd.h  | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 4151cafec547..f6c938202309 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1980,7 +1980,7 @@ FileClose(File file)
  * to read into.
  */
 int
-FilePrefetch(File file, off_t offset, int amount, uint32 wait_event_info)
+FilePrefetch(File file, off_t offset, off_t amount, uint32 wait_event_info)
 {
 #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
int returnCode;
@@ -2031,7 +2031,7 @@ FileWriteback(File file, off_t offset, off_t nbytes, 
uint32 wait_event_info)
 }
 
 int
-FileRead(File file, char *buffer, int amount, off_t offset,
+FileRead(File file, void *buffer, size_t amount, off_t offset,
 uint32 wait_event_info)
 {
int returnCode;
@@ -2039,7 +2039,7 @@ FileRead(File file, char *buffer, int amount, off_t 
offset,
 
Assert(FileIsValid(file));
 
-   DO_DB(elog(LOG, "FileRead: %d (%s) " INT64_FORMAT " %d %p",
+   DO_DB(elog(LOG, "FileRead: %d (%s) " INT64_FORMAT " %zu %p",
   file, VfdCache[file].fileName,
   (int64) offset,
   amount, buffer));
@@ -2087,7 +2087,7 @@ FileRead(File file, char *buffer, int amount, off_t 
offset,
 }
 
 int
-FileWrite(File file, char *buffer, int amount, off_t offset,
+FileWrite(File file, const void *buffer, size_t amount, off_t offset,
  uint32 wait_event_info)
 {
int returnCode;
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index c0a212487d92..7144fc9f6050 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -102,9 +102,9 @@ extern File PathNameOpenFile(const char *fileName, int 
fileFlags);
 extern File PathNameOpenFilePerm(const char *fileName, int fileFlags, mode_t 
fileMode);
 extern File OpenTemporaryFile(bool interXact);
 extern void FileClose(File file);
-extern int FilePrefetch(File file, off_t offset, int amount, uint32 
wait_event_info);
-extern int FileRead(File file, char *buffer, int amount, off_t offset, 
uint32 wait_event_info);
-extern int FileWrite(File file, char *buffer, int amount, off_t offset, 
uint32 wait_event_info);
+extern int FilePrefetch(File file, off_t offset, off_t amount, uint32 
wait_event_info);
+extern int FileRead(File file, void *buffer, size_t amount, off_t offset, 
uint32 wait_event_info);
+extern int FileWrite(File file, const void *buffer, size_t amount, off_t 
offset, uint32 wait_event_info);
 extern int FileSync(File file, uint32 wait_event_info);
 extern off_t FileSize(File file);
 extern int FileTruncate(File file, off_t offset, uint32 wait_event_info);

base-commit: 43351557d0d2b9c5e20298b5fee2849abef86aff
-- 
2.38.1

From 8d0530fbf9095bc16fa3465b69aef2c31c4c3c9b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 1 Dec 2022 08:36:09 +0100
Subject: [PATCH 2/2] Remove unnecessary casts

Some code carefully cast all data buffer arguments for BufFileWrite()
and BufFileRead() to void *, even though the arguments are already
void * (and AFAICT were never anything else).  Remove this unnecessary
clutter.
---
 src/backend/executor/nodeHashjoin.c |  8 
 src/backend/utils/sort/tuplestore.c | 12 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/e

Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-01 Thread Amit Kapila
On Thu, Dec 1, 2022 at 11:44 AM Masahiko Sawada  wrote:
>
> On Wed, Nov 30, 2022 at 7:54 PM Amit Kapila  wrote:
> >
> > On Tue, Nov 29, 2022 at 10:18 AM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > Attach the new version patch which addressed all comments.
> > >
> >
> > Some comments on v53-0002*
> > 
> > 1. I think testing the scenario where the shm_mq buffer is full
> > between the leader and parallel apply worker would require a large
> > amount of data and then also there is no guarantee. How about having a
> > developer GUC [1] force_apply_serialize which allows us to serialize
> > the changes and only after commit the parallel apply worker would be
> > allowed to apply it?
>
> +1
>
> The code coverage report shows that we don't cover the partial
> serialization codes. This GUC would improve the code coverage.
>

Shall we keep it as a boolean or an integer? Keeping it as an integer
as suggested by Kuroda-San [1] would have an added advantage that we
can easily test the cases where serialization would be triggered after
sending some changes.

[1] - 
https://www.postgresql.org/message-id/TYAPR01MB5866160DE81FA2D88B8F22DEF5159%40TYAPR01MB5866.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-12-01 Thread John Naylor
On Thu, Dec 1, 2022 at 3:03 PM Masahiko Sawada 
wrote:
>
> On Thu, Dec 1, 2022 at 4:00 PM John Naylor 
wrote:
> >
> > The bigger question in my mind is: Why is there an atomic variable in
backend-local memory?
>
> Because I use the same radix_tree and radix_tree_control structs for
> non-parallel and parallel vacuum. Therefore, radix_tree_control is
> allocated in DSM for parallel-vacuum cases or in backend-local memory
> for non-parallel vacuum cases.

Ok, that could be yet another reason to compile local- and shared-memory
functionality separately, but now I'm wondering why there are atomic
variables at all, since there isn't yet any locking support.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: File API cleanup

2022-12-01 Thread Bharath Rupireddy
On Thu, Dec 1, 2022 at 1:55 PM Peter Eisentraut
 wrote:
>
> Here are a couple of patches that clean up the internal File API and
> related things a bit:
>
> 0001-Update-types-in-File-API.patch
>
>  Make the argument types of the File API match stdio better:
>
>  - Change the data buffer to void *, from char *.
>  - Change FileWrite() data buffer to const on top of that.
>  - Change amounts to size_t, from int.
>
>  In passing, change the FilePrefetch() amount argument from int to
>  off_t, to match the underlying posix_fadvise().
>
> 0002-Remove-unnecessary-casts.patch
>
>  Some code carefully cast all data buffer arguments for
>  BufFileWrite() and BufFileRead() to void *, even though the
>  arguments are already void * (and AFAICT were never anything else).
>  Remove this unnecessary clutter.
>
> (I had initially thought these casts were related to the first patch,
> but as I said the BufFile API never used char * arguments, so this
> turned out to be unrelated, but still weird.)

Thanks. Please note that I've not looked at the patches attached.
However, I'm here after reading the $subject - can we have a generic,
single function file_exists() in fd.c/file_utils.c so that both
backend and frontend code can use it? I see there are 3 uses and
definitions of it in jit.c, dfmgr.c and pg_regress.c. This will reduce
the code duplication. Thoughts?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Allow round() function to accept float and double precision

2022-12-01 Thread Dean Rasheed
On Thu, 1 Dec 2022 at 02:58, David Rowley  wrote:
>
> On Thu, 1 Dec 2022 at 15:41, David G. Johnston
>  wrote:
> > I don't get the point of adding a function here (or at least one called 
> > round) - the type itself is inexact so, as you say, it is actually more of 
> > a type conversion with an ability to specify precision, which is exactly 
> > what you get today when you write 1.48373::numeric(20,3) - though it is a 
> > bit annoying having to specify an arbitrary precision.
>
> An additional problem with that which you might have missed is that
> you'd need to know what to specify in the precision part of the
> typemod.  You might start getting errors one day if you don't select a
> value large enough. That problem does not exist with round().  Having
> to specify 131072 each time does not sound like a great solution, it's
> not exactly a very memorable number.
>

I don't really see the point of such a function either.

Casting to numeric(1000, n) will work fine in all cases AFAICS (1000
being the maximum allowed precision in a numeric typemod, and somewhat
more memorable).

Note that double precision numbers range in magnitude from something
like 2.2e-308 to 1.8e308, so you won't ever get an error (except, I
suppose, if you also chose "n" larger than 692 or so, but that would
be silly, given the input).


> > At present round does allow you to specify a negative position to round at 
> > positions to the left of the decimal point (this is undocumented though...) 
> > which the actual cast cannot do, but that seems like a marginal case.

Note that, as of PG15, "n" can be negative in such typemods, if you
want to round before the decimal point.

The fact that passing a negative scale to round() isn't documented
does seem like an oversight though...

Regards,
Dean




Re: [DOCS] Stats views and functions not in order?

2022-12-01 Thread Peter Eisentraut

On 29.11.22 08:29, Peter Smith wrote:

PSA v8* patches.

Here, patches 0001 and 0002 are unchanged, but 0003 has many changes
per David's suggestion [1] to change all these views to 
blocks.


I don't understand what order 0001 is trying to achieve.  I know we 
didn't necessarily want to go fully alphabetic, but if we're going to 
spend time on this, let's come up with a system that the next 
contributor who adds a view will be able to understand and follow.


As an aside, I find the mixing of pg_stat_* and pg_statio_* views 
visually distracting.  It was easier to read before when they were in 
separate blocks.


I think something like this would be manageable:


pg_stat_archiver
pg_stat_bgwriter
pg_stat_database
pg_stat_database_conflicts
pg_stat_replication_slots
pg_stat_slru
pg_stat_subscription_stats
pg_stat_wal


pg_stat_all_tables
pg_stat_sys_tables
pg_stat_user_tables
pg_stat_xact_all_tables
pg_stat_xact_sys_tables
pg_stat_xact_user_tables
pg_stat_all_indexes
pg_stat_sys_indexes
pg_stat_user_indexes
pg_stat_user_functions
pg_stat_xact_user_functions


pg_statio_all_tables
pg_statio_sys_tables
pg_statio_user_tables
pg_statio_all_indexes
pg_statio_sys_indexes
pg_statio_user_indexes
pg_statio_all_sequences
pg_statio_sys_sequences
pg_statio_user_sequences


In any case, the remaining patches are new and need further review, so 
I'll move this to the next CF.







Re: pg_upgrade allows itself to be run twice

2022-12-01 Thread Peter Eisentraut

On 01.11.22 14:07, Justin Pryzby wrote:

On Tue, Nov 01, 2022 at 01:54:35PM +0100, Peter Eisentraut wrote:

On 07.07.22 08:22, Justin Pryzby wrote:

This one comes from NextOID in the control data file after a fresh
initdb, and GetNewObjectId() would enforce that in a postmaster
environment to be FirstNormalObjectId when assigning the first user
OID.  Would you imply an extra step at the end of initdb to update the
control data file of the new cluster to reflect FirstNormalObjectId?

I added a call to reset xlog, similar to what's in pg_upgrade.
Unfortunately, I don't see an easy way to silence it.


I think it would be better to update the control file directly instead of
going through pg_resetwal.  (See src/include/common/controldata_utils.h for
the required functions.)

However, I don't know whether we need to add special provisions that guard
against people using postgres --single in complicated ways.  Many consider
the single-user mode deprecated outside of initdb use.


Thanks for looking.


I think the above is a "returned with feedback" at this point.


One other thing I noticed (by accident!) is that pg_upgrade doesn't
prevent itself from trying to upgrade a cluster on top of itself:

| $ /usr/pgsql-15/bin/initdb -D pg15.dat -N
| $ /usr/pgsql-15/bin/pg_upgrade -D pg15.dat -d pg15.dat -b /usr/pgsql-15/bin
| Performing Upgrade
| --
| Analyzing all rows in the new cluster   ok
| Freezing all rows in the new clusterok
| Deleting files from new pg_xact ok
   ^^^
| Copying old pg_xact to new server
| *failure*
|
| Consult the last few lines of 
"pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" 
for



| command: cp -Rf "pg15.dat/pg_xact" "pg15.dat/pg_xact" >> 
"pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" 2>&1
| cp: cannot stat 'pg15.dat/pg_xact': No such file or directory

This may be of little concern since it's upgrading a version to itself, which
only applies to developers.


I think this would be worth addressing nonetheless, for robustness.  For 
comparison, "cp" and "mv" will error if you give source and destination 
that are the same file.






Re: Improve performance of pg_strtointNN functions

2022-12-01 Thread Dean Rasheed
On Thu, 1 Dec 2022 at 05:38, David Rowley  wrote:
>
> On Thu, 1 Dec 2022 at 18:27, John Naylor  wrote:
> > I don't see why the non-decimal literal patch needs to be "immediately" 
> > faster? If doing this first leads to less code churn, that's another 
> > consideration, but you haven't made that argument.
>
> My view is that Peter wants to keep the code he's adding for the hex,
> octal and binary parsing as similar to the existing code as possible.
> I very much understand Peter's point of view on that. Consistency is
> good. However, if we commit the hex literals patch first, people might
> ask "why don't we use bit-wise operators to make the power-of-2 bases
> faster?", which seems like a very legitimate question. I asked it,
> anyway...  On the other hand, if Peter adds the bit-wise operators
> then the problem of code inconsistency remains.
>
> As an alternative to those 2 options, I'm proposing we commit this
> first then the above dilemma disappears completely.
>
> If this was going to cause huge conflicts with Peter's patch then I
> might think differently. I feel like it's a fairly trivial task to
> rebase.
>
> If the consensus is that we should fix this afterwards, then I'm happy to 
> delay.
>

I feel like it should be done afterwards, so that any performance
gains can be measured for all bases. Otherwise, we won't really know,
or have any record of, how much faster this was for other bases, or be
able to go back and test that.

Regards,
Dean




initdb: Refactor PG_CMD_PUTS loops

2022-12-01 Thread Peter Eisentraut

Keeping the SQL commands that initdb runs in string arrays before
feeding them to PG_CMD_PUTS() seems unnecessarily verbose and
inflexible.  In some cases, the array only has one member.  In other
cases, one might want to use PG_CMD_PRINTF() instead, to parametrize a
command, but that would require breaking up the loop or using
workarounds like replace_token().  This patch unwinds all that; it's 
much simpler that way.From e177a142a9a5412ff8aeb271330005ef518b32d1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 1 Dec 2022 09:49:48 +0100
Subject: [PATCH] initdb: Refactor PG_CMD_PUTS loops

Keeping the SQL commands that initdb runs in string arrays before
feeding them to PG_CMD_PUTS() seems unnecessarily verbose and
inflexible.  In some cases, the array only has one member.  In other
cases, one might want to use PG_CMD_PRINTF() instead, to parametrize a
command, but that would require breaking up the loop or using
workarounds like replace_token().  Unwind all that; it's much simpler
that way.
---
 src/bin/initdb/initdb.c | 379 ++--
 1 file changed, 170 insertions(+), 209 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f61a04305590..7c391aaf0b13 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1350,18 +1350,11 @@ bootstrap_template1(void)
 static void
 setup_auth(FILE *cmdfd)
 {
-   const char *const *line;
-   static const char *const pg_authid_setup[] = {
-   /*
-* The authid table shouldn't be readable except through views, 
to
-* ensure passwords are not publicly visible.
-*/
-   "REVOKE ALL ON pg_authid FROM public;\n\n",
-   NULL
-   };
-
-   for (line = pg_authid_setup; *line != NULL; line++)
-   PG_CMD_PUTS(*line);
+   /*
+* The authid table shouldn't be readable except through views, to
+* ensure passwords are not publicly visible.
+*/
+   PG_CMD_PUTS("REVOKE ALL ON pg_authid FROM public;\n\n");
 
if (superuser_password)
PG_CMD_PRINTF("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n",
@@ -1433,18 +1426,11 @@ get_su_pwd(void)
 static void
 setup_depend(FILE *cmdfd)
 {
-   const char *const *line;
-   static const char *const pg_depend_setup[] = {
-   /*
-* Advance the OID counter so that subsequently-created objects 
aren't
-* pinned.
-*/
-   "SELECT pg_stop_making_pinned_objects();\n\n",
-   NULL
-   };
-
-   for (line = pg_depend_setup; *line != NULL; line++)
-   PG_CMD_PUTS(*line);
+   /*
+* Advance the OID counter so that subsequently-created objects aren't
+* pinned.
+*/
+   PG_CMD_PUTS("SELECT pg_stop_making_pinned_objects();\n\n");
 }
 
 /*
@@ -1530,147 +1516,138 @@ setup_collation(FILE *cmdfd)
 static void
 setup_privileges(FILE *cmdfd)
 {
-   char  **line;
-   char  **priv_lines;
-   static char *privileges_setup[] = {
-   "UPDATE pg_class "
-   "  SET relacl = (SELECT array_agg(a.acl) FROM "
-   " (SELECT E'=r/\"$POSTGRES_SUPERUSERNAME\"' as acl "
-   "  UNION SELECT unnest(pg_catalog.acldefault("
-   "CASE WHEN relkind = " CppAsString2(RELKIND_SEQUENCE) " 
THEN 's' "
-   " ELSE 'r' END::\"char\"," 
CppAsString2(BOOTSTRAP_SUPERUSERID) "::oid))"
-   " ) as a) "
-   "  WHERE relkind IN (" CppAsString2(RELKIND_RELATION) ", "
-   CppAsString2(RELKIND_VIEW) ", " CppAsString2(RELKIND_MATVIEW) 
", "
-   CppAsString2(RELKIND_SEQUENCE) ")"
-   "  AND relacl IS NULL;\n\n",
-   "GRANT USAGE ON SCHEMA pg_catalog, public TO PUBLIC;\n\n",
-   "REVOKE ALL ON pg_largeobject FROM PUBLIC;\n\n",
-   "INSERT INTO pg_init_privs "
-   "  (objoid, classoid, objsubid, initprivs, privtype)"
-   "SELECT"
-   "oid,"
-   "(SELECT oid FROM pg_class WHERE relname = 'pg_class'),"
-   "0,"
-   "relacl,"
-   "'i'"
-   "FROM"
-   "pg_class"
-   "WHERE"
-   "relacl IS NOT NULL"
-   "AND relkind IN (" CppAsString2(RELKIND_RELATION) ", "
-   CppAsString2(RELKIND_VIEW) ", " CppAsString2(RELKIND_MATVIEW) 
", "
-   CppAsString2(RELKIND_SEQUENCE) ");\n\n",
-   "INSERT INTO pg_init_privs "
-   "  (objoid, classoid, objsubid, initprivs, privtype)"
-   "SELECT"
-   "pg_class.oid,"
-   "(SELECT oid FROM pg_class WHERE relname = 'pg_class'),"
-   "pg_attribute.attnum,"
-   "pg_at

RE: Perform streaming logical transactions by background workers and parallel apply

2022-12-01 Thread houzj.f...@fujitsu.com
On Thursday, December 1, 2022 3:58 PM Masahiko Sawada  
wrote:
> 
> On Wed, Nov 30, 2022 at 10:51 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Wednesday, November 30, 2022 9:41 PM houzj.f...@fujitsu.com
>  wrote:
> > >
> > > On Tuesday, November 29, 2022 8:34 PM Amit Kapila
> > > > Review comments on v53-0001*
> > >
> > > Attach the new version patch set.
> >
> > Sorry, there were some mistakes in the previous patch set.
> > Here is the correct V54 patch set. I also ran pgindent for the patch set.
> >
> 
> Thank you for updating the patches. Here are random review comments for
> 0001 and 0002 patches.

Thanks for the comments!

> 
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("logical replication parallel apply worker exited
> abnormally"),
>  errcontext("%s", edata.context))); and
> 
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("logical replication parallel apply worker exited
> because of subscription information change")));
> 
> I'm not sure ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is appropriate
> here. Given that parallel apply worker has already reported the error message
> with the error code, I think we don't need to set the errorcode for the logs
> from the leader process.
> 
> Also, I'm not sure the term "exited abnormally" is appropriate since we use it
> when the server crashes for example. I think ERRORs reported here don't mean
> that in general.

How about reporting "xxx worker exited due to error" ?

> ---
> if (am_parallel_apply_worker() && on_subinfo_change) {
> /*
>  * If a parallel apply worker exits due to the subscription
>  * information change, we notify the leader apply worker so that the
>  * leader can report more meaningful message in time and restart the
>  * logical replication.
>  */
> pq_putmessage('X', NULL, 0);
> }
> 
> and
> 
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("logical replication parallel apply worker exited
> because of subscription information change")));
> 
> Do we really need an additional message in case of 'X'? When we call
> apply_worker_clean_exit with on_subinfo_change = true, we have reported the
> error message such as:
> 
> ereport(LOG,
> (errmsg("logical replication parallel apply worker for subscription
> \"%s\" will stop because of a parameter change",
> MySubscription->name)));
> 
> I think that reporting a similar message from the leader might not be
> meaningful for users.

The intention is to let leader report more meaningful message if a worker
exited due to subinfo change. Otherwise, the leader is likely to report an
error like " lost connection ... to parallel apply worker" when trying to send
data via shared memory if the worker exited. What do you think ?

> ---
> -if (options->proto.logical.streaming &&
> -PQserverVersion(conn->streamConn) >= 14)
> -appendStringInfoString(&cmd, ", streaming 'on'");
> +if (options->proto.logical.streaming_str)
> +appendStringInfo(&cmd, ", streaming '%s'",
> +
> options->proto.logical.streaming_str);
> 
> and
> 
> +/*
> + * Assign the appropriate option value for streaming option
> according to
> + * the 'streaming' mode and the publisher's ability to
> support that mode.
> + */
> +if (server_version >= 16 &&
> +MySubscription->stream == SUBSTREAM_PARALLEL)
> +{
> +options.proto.logical.streaming_str = pstrdup("parallel");
> +MyLogicalRepWorker->parallel_apply = true;
> +}
> +else if (server_version >= 14 &&
> + MySubscription->stream != SUBSTREAM_OFF)
> +{
> +options.proto.logical.streaming_str = pstrdup("on");
> +MyLogicalRepWorker->parallel_apply = false;
> +}
> +else
> +{
> +options.proto.logical.streaming_str = NULL;
> +MyLogicalRepWorker->parallel_apply = false;
> +}
> 
> This change moves the code of adjustment of the streaming option based on
> the publisher server version from libpqwalreceiver.c to worker.c.
> On the other hand, the similar logic for other parameters such as "two_phase"
> and "origin" are still done in libpqwalreceiver.c. How about passing
> MySubscription->stream via WalRcvStreamOptions and constructing a
> streaming option string in libpqrcv_startstreaming()?
> In ApplyWorkerMain(), we just need to set
> MyLogicalRepWorker->parallel_apply = true if (server_version >= 16
> && MySubscription->stream == SUBSTREAM_PARALLEL). We won't need
> pstrdup for "parallel" and "on", and it's more consistent with other 
> parameters.

Thanks for the suggestion. I thought 

Re: ExecRTCheckPerms() and many prunable partitions

2022-12-01 Thread Alvaro Herrera
Hello,

This didn't apply, so I rebased it on current master, excluding the one
I already pushed.  No further changes.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No me acuerdo, pero no es cierto.  No es cierto, y si fuera cierto,
 no me acuerdo." (Augusto Pinochet a una corte de justicia)
>From c8a9bcd071ad88d5ae286bcb1a241b4fccd2449a Mon Sep 17 00:00:00 2001
From: amitlan 
Date: Mon, 28 Nov 2022 16:12:15 +0900
Subject: [PATCH v30 1/2] Add ri_RootToChildMap and ExecGetRootToChildMap()

It's a AttrMap provided for converting "root" table column bitmapsets
into their child relation counterpart, as a more generalized
alternative to using ri_RootToPartitionMap.attrMap to do the same.
More generalized in the sense that it can also be requested for
regular inheritance child relations, whereas ri_RootToPartitionMap
is currently only initialized in tuple-routing "partition" result
relations.

One of the differences between the two cases is that the regular
inheritance child relations can have their own columns that are not
present in the "root" table, so the map must be created in a way that
ignores such columns.  To that end, ExecGetRootToChildMap() passes
true for the missing_ok argument of build_attrmap_by_name(), so that
it puts 0 (InvalidAttr) in the map for the columns of a child table
that are not present in the root table.

root-table-to-child-table bitmapset conversions that would need
ri_RootToChildMap (cannot be done with ri_RootToPartitionMap) are
as of this commit unnecessary, but will become necessary in a
subsequent commit that will remove the insertedCols et al bitmapset
fields from RangeTblEntry node in favor of a new type of node that
will only be created and added to the plan for root tables
in a query and never for children.  The child table bitmapsets will
be created on-the-fly during execution if needed, by copying the
root table bitmapset and converting with the aforementioned map as
required.
---
 src/backend/executor/execUtils.c | 39 
 src/include/executor/executor.h  |  2 ++
 src/include/nodes/execnodes.h|  8 +++
 3 files changed, 49 insertions(+)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 0e595ffa6e..e2b4272d90 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1252,6 +1252,45 @@ ExecGetChildToRootMap(ResultRelInfo *resultRelInfo)
 	return resultRelInfo->ri_ChildToRootMap;
 }
 
+/*
+ * Return the map needed to convert "root" table column bitmapsets to the
+ * rowtype of an individual child table.  A NULL result is valid and means
+ * that no conversion is needed.
+ */
+AttrMap *
+ExecGetRootToChildMap(ResultRelInfo *resultRelInfo,
+	  EState *estate)
+{
+	/* If we didn't already do so, compute the map for this child. */
+	if (!resultRelInfo->ri_RootToChildMapValid)
+	{
+		ResultRelInfo *rootRelInfo = resultRelInfo->ri_RootResultRelInfo;
+		MemoryContext oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+		if (rootRelInfo)
+		{
+			/*
+			 * Passing 'true' below means any columns present in the child
+			 * table but not in the root parent are to be ignored; note that
+			 * such a case is possible with traditional inheritance but never
+			 * with partitioning.
+			 */
+			resultRelInfo->ri_RootToChildMap =
+build_attrmap_by_name_if_req(RelationGetDescr(rootRelInfo->ri_RelationDesc),
+			 RelationGetDescr(resultRelInfo->ri_RelationDesc),
+			 true);
+		}
+		else	/* this isn't a child result rel */
+			resultRelInfo->ri_RootToChildMap = NULL;
+
+		resultRelInfo->ri_RootToChildMapValid = true;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	return resultRelInfo->ri_RootToChildMap;
+}
+
 /* Return a bitmap representing columns being inserted */
 Bitmapset *
 ExecGetInsertedCols(ResultRelInfo *relinfo, EState *estate)
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..5c02a1521f 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -600,6 +600,8 @@ extern TupleTableSlot *ExecGetTriggerOldSlot(EState *estate, ResultRelInfo *relI
 extern TupleTableSlot *ExecGetTriggerNewSlot(EState *estate, ResultRelInfo *relInfo);
 extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relInfo);
 extern TupleConversionMap *ExecGetChildToRootMap(ResultRelInfo *resultRelInfo);
+extern AttrMap *ExecGetRootToChildMap(ResultRelInfo *resultRelInfo,
+	  EState *estate);
 
 extern Bitmapset *ExecGetInsertedCols(ResultRelInfo *relinfo, EState *estate);
 extern Bitmapset *ExecGetUpdatedCols(ResultRelInfo *relinfo, EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18e572f171..313840fe32 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -563,6 +563,14 @@ typedef struct ResultRelInfo
 	TupleConversionMap *ri_ChildToRootMap;
 	boo

Re: pgsql: Revoke PUBLIC CREATE from public schema, now owned by pg_databas

2022-12-01 Thread Alvaro Herrera
On 2022-Dec-01, Noah Misch wrote:

> This is free from the problem found in ddl-create-public-reorg-really.patch.
> However, the word "other" doesn't belong there.  (The per-user schemas should
> not have public CREATE privilege.)  I would also move that same sentence up
> front, like this:
> 
> Constrain ordinary users to user-private schemas.  To implement this
> pattern, first ensure that no schemas have public CREATE privileges.
> Then, for every user needing to create non-temporary objects, create a
> schema with the same name as that user.  (Recall that the default search
> path starts with $user, which resolves to the user name. Therefore, if
> each user has a separate schema, they access their own schemas by
> default.)  This pattern is a secure schema usage pattern unless an
> untrusted user is the database owner or holds the CREATEROLE privilege, in
> which case no secure schema usage pattern exists.

+1 LGTM

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: generic plans and "initial" pruning

2022-12-01 Thread Alvaro Herrera
On 2022-Dec-01, Amit Langote wrote:

> Hmm, how about keeping the [Merge]Append's parent relation's RT index
> in the PartitionPruneInfo and passing it down to
> ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> cross-checking?  Both Append and MergeAppend already have a
> 'apprelids' field that we can save a copy of in the
> PartitionPruneInfo.  Tried that in the attached delta patch.

Ah yeah, that sounds about what I was thinking.  I've merged that in and
pushed to github, which had a strange pg_upgrade failure on Windows
mentioning log files that were not captured by the CI tooling.  So I
pushed another one trying to grab those files, in case it wasn't an
one-off failure.  It's running now:
  https://cirrus-ci.com/task/5857239638999040

If all goes well with this run, I'll get this 0001 pushed.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)




Re: Bug in row_number() optimization

2022-12-01 Thread Sergey Shinderuk

On 01.12.2022 11:18, Richard Guo wrote:


On Mon, Nov 28, 2022 at 5:59 PM Sergey Shinderuk 
mailto:s.shinde...@postgrespro.ru>> wrote:


Not quite sure that we don't need to do anything for the
WINDOWAGG_PASSTHROUGH_STRICT case. Although, we won't return any more
tuples for the current partition, we still call ExecProject with
dangling pointers. Is it okay?

AFAIU once we go into WINDOWAGG_PASSTHROUGH_STRICT we will spool all the
remaining tuples in the current partition without storing them and then
move to the next partition if available and become WINDOWAGG_RUN again
or become WINDOWAGG_DONE if there are no further partitions.  It seems
we would not have chance to see the dangling pointers.


Maybe I'm missing something, but the previous call to spool_tuples() 
might have read extra tuples (if the tuplestore spilled to disk), and 
after switching to WINDOWAGG_PASSTHROUGH_STRICT mode we nevertheless 
would loop through these extra tuples and call ExecProject if only to 
increment winstate->currentpos.


--
Sergey Shinderukhttps://postgrespro.com/





Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-01 Thread Amit Kapila
On Wed, Nov 30, 2022 at 4:23 PM Amit Kapila  wrote:
>
> 2.
> + /*
> + * The stream lock is released when processing changes in a
> + * streaming block, so the leader needs to acquire the lock here
> + * before entering PARTIAL_SERIALIZE mode to ensure that the
> + * parallel apply worker will wait for the leader to release the
> + * stream lock.
> + */
> + if (in_streamed_transaction &&
> + action != LOGICAL_REP_MSG_STREAM_STOP)
> + {
> + pa_lock_stream(winfo->shared->xid, AccessExclusiveLock);
>
> This comment is not completely correct because we can even acquire the
> lock for the very streaming chunk. This check will work but doesn't
> appear future-proof or at least not very easy to understand though I
> don't have a better suggestion at this stage. Can we think of a better
> check here?
>

One idea is that we acquire this lock every time and callers like
stream_commit are responsible to release it. Also, we can handle the
close of stream file in the respective callers. I think that will make
this part of the patch easier to follow.

Some other comments:
=
1. The handling of buffile inside pa_stream_abort() looks bit ugly to
me. I think you primarily required it because the buffile opened by
parallel apply worker is in CurrentResourceOwner. Can we think of
having a new resource owner to apply spooled messages? I think that
will avoid the need to have a special purpose code to handle buffiles
in parallel apply worker.

2.
@@ -564,6 +571,7 @@ handle_streamed_transaction(LogicalRepMsgType
action, StringInfo s)
  TransactionId current_xid;
  ParallelApplyWorkerInfo *winfo;
  TransApplyAction apply_action;
+ StringInfoData original_msg;

  apply_action = get_transaction_apply_action(stream_xid, &winfo);

@@ -573,6 +581,8 @@ handle_streamed_transaction(LogicalRepMsgType
action, StringInfo s)

  Assert(TransactionIdIsValid(stream_xid));

+ original_msg = *s;
+
  /*
  * We should have received XID of the subxact as the first part of the
  * message, so extract it.
@@ -596,10 +606,14 @@ handle_streamed_transaction(LogicalRepMsgType
action, StringInfo s)
  stream_write_change(action, s);
  return true;

+ case TRANS_LEADER_PARTIAL_SERIALIZE:
  case TRANS_LEADER_SEND_TO_PARALLEL:
  Assert(winfo);

- pa_send_data(winfo, s->len, s->data);
+ if (apply_action == TRANS_LEADER_SEND_TO_PARALLEL)
+ pa_send_data(winfo, s->len, s->data);
+ else
+ stream_write_change(action, &original_msg);

Please add the comment to specify the reason to remember the original string.

3.
@@ -1797,8 +1907,8 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
  changes_filename(path, MyLogicalRepWorker->subid, xid);
  elog(DEBUG1, "replaying changes from file \"%s\"", path);

- fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDONLY,
- false);
+ stream_fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);
+ stream_xid = xid;

Why do we need stream_xid here? I think we can avoid having global
stream_fd if the comment #1 is feasible.

4.
+ * TRANS_LEADER_APPLY:
+ * The action means that we

/The/This. Please make a similar change for other actions.

5. Apart from the above, please find a few changes to the comments for
0001 and 0002 patches in the attached patches.


-- 
With Regards,
Amit Kapila.


changes_amit_v54_0001.patch
Description: Binary data


changes_amit_v54_0002.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-12-01 Thread Amit Langote
On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera  wrote:
> On 2022-Dec-01, Amit Langote wrote:
> > Hmm, how about keeping the [Merge]Append's parent relation's RT index
> > in the PartitionPruneInfo and passing it down to
> > ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> > cross-checking?  Both Append and MergeAppend already have a
> > 'apprelids' field that we can save a copy of in the
> > PartitionPruneInfo.  Tried that in the attached delta patch.
>
> Ah yeah, that sounds about what I was thinking.  I've merged that in and
> pushed to github, which had a strange pg_upgrade failure on Windows
> mentioning log files that were not captured by the CI tooling.  So I
> pushed another one trying to grab those files, in case it wasn't an
> one-off failure.  It's running now:
>   https://cirrus-ci.com/task/5857239638999040
>
> If all goes well with this run, I'll get this 0001 pushed.

Thanks for pushing 0001.

Rebased 0002 attached.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v25-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


Optimizing Node Files Support

2022-12-01 Thread Ranier Vilela
Hi,

I believe that has room for improving generation node files.

The patch attached reduced the size of generated files by 27 kbytes.
>From 891 kbytes to 864 kbytes.

About the patch:
1. Avoid useless attribution when from->field is NULL, once that
the new node is palloc0.

2. Avoid useless declaration variable Size, when it is unnecessary.

3. Optimize comparison functions like memcmp and strcmp, using
 a short-cut comparison of the first element.

4. Switch several copy attributions like COPY_SCALAR_FIELD or
COPY_LOCATION_FIELD
by one memcpy call.

5. Avoid useless attribution, ignoring the result of pg_strtok when it is
unnecessary.

regards,
Ranier Vilela


optimize_gen_nodes_support.patch
Description: Binary data


Re: Support logical replication of DDLs

2022-12-01 Thread li jie
I applied patch 0005.

I think this modification is a bit overdone.
This design skips all subcommands, which results in many ddl
replication failures.
For example:
```
CREATE TABLE datatype_table (id SERIAL);
```
deparsed ddl is:
CREATE  TABLE  public.datatype_table (id pg_catalog.int4 STORAGE plain
NOT NULL DEFAULT
pg_catalog.nextval('public.datatype_table_id_seq'::pg_catalog.regclass))
CREATE SEQUENCE subcommand will be skipped.

OR:
```
CREATE SCHEMA element_test
CREATE TABLE foo (id int)
CREATE VIEW bar AS SELECT * FROM foo;
```
deparsed ddl is:
CREATE SCHEMA element_test.

Its subcommands will be skipped.
There may be other cases.

For the initial CREATE LIKE statement, It is special,
It derives the subcommand of alter table column.
Just skipping them may be enough.
Instead of skipping subcommands of all statements.
After all, our design is to obtain the actual ddl information from the
catalog instead of parsing raw parsetree.
This is why we cannot skip all subcommands.

Do you have any better ideas?

Regards, Adger.




Re: Asynchronous execution support for Custom Scan

2022-12-01 Thread Kazutaka Onishi
Thank you for your comment.
I've removed the tabs.

>  I can think of at least a few use cases where this customscan is helpful and 
> not merely testing code.

IIUC, we already can use ctid in the where clause on the latest
PostgreSQL, can't we?

2022年11月22日(火) 18:07 Ronan Dunklau :
>
> Le mardi 6 septembre 2022, 11:29:55 CET Etsuro Fujita a écrit :
> > On Mon, Sep 5, 2022 at 10:32 PM Kazutaka Onishi  wrote:
> > > I'm sorry for my error on your name...
> >
> > No problem.
> >
> > > >  IIUC, it uses the proposed
> > > >
> > > > APIs, but actually executes ctidscans *synchronously*, so it does not
> > > > improve performance.  Right?
> > >
> > > Exactly.
> > > The actual CustomScan that supports asynchronous execution will
> > > start processing in CustomScanAsyncRequest,
> > > configure to detect completion via file descriptor in
> > > CustomScanAsyncConfigureWait,
> > > and receive the result in CustomScanAsyncNotify.
> >
> > Ok, thanks!
>
> Thanks for this patch, seems like a useful addition to the CustomScan API.
> Just to nitpick: there are extraneous tabs in createplan.c on a blank line.
>
> Sorry for the digression, but I know your ctidscan module had been proposed
> for inclusion in contrib a long time ago, and I wonder if the rationale for
> not including it could have changed. We still don't have tests which cover
> CustomScan, and I can think of at least a few use cases where this customscan
> is helpful and not merely testing code.
>
> One of those use case is when performing migrations on a table, and one wants
> to update the whole table by filling a new column with a computed value. You
> obviously don't want to do it in a single transaction, so you end up batching
> updates using an index looking for null values. If you want to do this, it's
> much faster to update rows in a range of block, performing a first series of
> batch updating all such block ranges, and then finally update the ones we
> missed transactionally (inserted in a block we already processed while in the
> middle of the batch, or in new blocks resulting from a relation extension).
>
> Best regards,
>
> --
> Ronan Dunklau
>
>


v3-0001-allow-custon-scan-asynchronous-execution.patch
Description: Binary data


Re: Collation version tracking for macOS

2022-12-01 Thread Dagfinn Ilmari Mannsåker
Jeff Davis  writes:

> On Mon, 2022-11-28 at 19:36 -0800, Jeff Davis wrote:
>> On Mon, 2022-11-28 at 21:57 -0500, Robert Haas wrote:
>> > That is ... astonishingly bad.
>> 
>> https://unicode-org.atlassian.net/browse/CLDR-16175
>
> Oops, reported in CLDR instead of ICU. Moved to:
>
> https://unicode-org.atlassian.net/browse/ICU-22215

Out of morbid curiosity I went source diving, and the culprit is this
bit (which will also break if a version component ever goes above 999):

/* write the decimal field value */
field=versionArray[part];
if(field>=100) {
*versionString++=(char)('0'+field/100);
field%=100;
}
if(field>=10) {
*versionString++=(char)('0'+field/10);
field%=10;
}
*versionString++=(char)('0'+field);

(https://sources.debian.org/src/icu/72.1-3/source/common/putil.cpp#L2308)

because apparently snprintf() is too hard?

- ilmari




Re: Asynchronous execution support for Custom Scan

2022-12-01 Thread Ronan Dunklau
> IIUC, we already can use ctid in the where clause on the latest
> PostgreSQL, can't we?

Oh, sorry, I missed the TidRangeScan. My apologies for the noise.

Best regards,

--
Ronan Dunklau








Warning When Creating FOR EACH STATEMENT Trigger On Logical Replication Subscriber Side

2022-12-01 Thread Avi Weinberg
Hi Hackers,

There is no error or warning when creating FOR EACH STATEMENT trigger on  
Logical Replication subscriber side, but it is not doing anything.  Shouldn't a 
warning be helpful?

CREATE TRIGGER set_updated_time_trig
   AFTER INSERT OR UPDATE OR DELETE ON test
FOR EACH STATEMENT EXECUTE FUNCTION set_updated_time();

Thanks

IMPORTANT - This email and any attachments is intended for the above named 
addressee(s), and may contain information which is confidential or privileged. 
If you are not the intended recipient, please inform the sender immediately and 
delete this email: you should not copy or use this e-mail for any purpose nor 
disclose its contents to any person.


Re: pgsql: Revoke PUBLIC CREATE from public schema, now owned by pg_databas

2022-12-01 Thread Tom Lane
Alvaro Herrera  writes:
> On 2022-Dec-01, Noah Misch wrote:
>> This is free from the problem found in ddl-create-public-reorg-really.patch.
>> However, the word "other" doesn't belong there.  (The per-user schemas should
>> not have public CREATE privilege.)  I would also move that same sentence up
>> front, like this:
>> 
>> Constrain ordinary users to user-private schemas.  To implement this
>> pattern, first ensure that no schemas have public CREATE privileges.
>> Then, for every user needing to create non-temporary objects, create a
>> schema with the same name as that user.  (Recall that the default search
>> path starts with $user, which resolves to the user name. Therefore, if
>> each user has a separate schema, they access their own schemas by
>> default.)  This pattern is a secure schema usage pattern unless an
>> untrusted user is the database owner or holds the CREATEROLE privilege, in
>> which case no secure schema usage pattern exists.

> +1 LGTM

Sounds good.  I'll make it so in a bit.

regards, tom lane




Re: [DOCS] Stats views and functions not in order?

2022-12-01 Thread David G. Johnston
On Thu, Dec 1, 2022 at 2:20 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 29.11.22 08:29, Peter Smith wrote:
> > PSA v8* patches.
> >
> > Here, patches 0001 and 0002 are unchanged, but 0003 has many changes
> > per David's suggestion [1] to change all these views to 
> > blocks.
>
> I don't understand what order 0001 is trying to achieve.


The rule behind 0001 is:

All global object stats
All table object stats (stat > statio > xact; all > sys > user)
All index object stats
All sequence object stats
All function object stats


> As an aside, I find the mixing of pg_stat_* and pg_statio_* views
> visually distracting.  It was easier to read before when they were in
> separate blocks.
>

I found that having the statio at the end of each object type block added a
natural partitioning for tables and indexes that the existing order lacked
and that made reading the table be more "wall-of-text-ish", and thus more
difficult to read, than necessary.

I'm not opposed to the following though.  The object-type driven order just
feels more useful but I really cannot justify it beyond that.

I'm not particularly enamored with the existing single large table but
don't have a better structure to offer at this time.


> I think something like this would be manageable:
>
> 
> pg_stat_archiver
> pg_stat_bgwriter
> pg_stat_database
> pg_stat_database_conflicts
> pg_stat_replication_slots
> pg_stat_slru
> pg_stat_subscription_stats
> pg_stat_wal
>

WAL being adjacent to archiver/bgwriter seemed reasonable so I left that
alone.
Replication and Subscription being adjacent seemed reasonable so I left
that alone.
Thus slru ended up last, with database* remaining as-is.

At 8 items, with a group size average of 2, pure alphabetical is also
reasonable.


> 
>
> 
>
>
David J.


Re: Allow round() function to accept float and double precision

2022-12-01 Thread Tom Lane
Dean Rasheed  writes:
> I don't really see the point of such a function either.
> Casting to numeric(1000, n) will work fine in all cases AFAICS (1000
> being the maximum allowed precision in a numeric typemod, and somewhat
> more memorable).

Right, but I think what the OP wants is to not have to think about
whether the input is of exact or inexact type.  That's easily soluble
locally by making your own function:

create function round(float8, int) returns numeric
  as $$select pg_catalog.round($1::pg_catalog.numeric, $2)$$
  language sql strict immutable parallel safe;

but I'm not sure that the argument for it is strong enough to
justify putting it into Postgres.

> The fact that passing a negative scale to round() isn't documented
> does seem like an oversight though...

Agreed, will do something about that.

regards, tom lane




Re: Add LZ4 compression in pg_dump

2022-12-01 Thread gkokolatos





--- Original Message ---
On Thursday, December 1st, 2022 at 3:05 AM, Michael Paquier 
 wrote:


> 
> 
> On Wed, Nov 30, 2022 at 05:11:44PM +, gkokola...@pm.me wrote:
> 
> > Fair enough. The atteched v11 does that. 0001 introduces compression
> > specification and is using it throughout. 0002 paves the way to the
> > new interface by homogenizing the use of cfp. 0003 introduces the new
> > API and stores the compression algorithm in the custom format header
> > instead of the compression level integer. Finally 0004 adds support for
> > LZ4.
> 
> 
> I have been looking at 0001, and.. Hmm. I am really wondering
> whether it would not be better to just nuke this warning into orbit.
> This stuff enforces non-compression even if -Z has been used to a
> non-default value. This has been moved to its current location by
> cae2bb1 as of this thread:
> https://www.postgresql.org/message-id/20160526.185551.242041780.horiguchi.kyotaro%40lab.ntt.co.jp
> 
> However, this is only active if -Z is used when not building with
> zlib. At the end, it comes down to whether we want to prioritize the
> portability of pg_dump commands specifying a -Z/--compress across
> environments knowing that these may or may not be built with zlib,
> vs the amount of simplification/uniformity we would get across the
> binaries in the tree once we switch everything to use the compression
> specifications. Now that pg_basebackup and pg_receivewal are managed
> by compression specifications, and that we'd want more compression
> options for pg_dump, I would tend to do the latter and from now on
> complain if attempting to do a pg_dump -Z under --without-zlib with a
> compression level > 0. zlib is also widely available, and we don't
> document the fact that non-compression is enforced in this case,
> either. (Two TAP tests with the custom format had to be tweaked.)

Fair enough. Thank you for looking. However I have a small comment
on your new patch.

-   /* Custom and directory formats are compressed by default, others not */
-   if (compressLevel == -1)
-   {
-#ifdef HAVE_LIBZ
-   if (archiveFormat == archCustom || archiveFormat == 
archDirectory)
-   compressLevel = Z_DEFAULT_COMPRESSION;
-   else
-#endif
-   compressLevel = 0;
-   }


Nuking the warning from orbit and changing the behaviour around disabling
the requested compression when the libraries are not present, should not
mean that we need to change the behaviour of default values for different
formats. Please find v13 attached which reinstates it.

Which in itself it got me looking and wondering why the tests succeeded.
The only existing test covering that path is `defaults_dir_format` in
`002_pg_dump.pl`. However as the test is currently written it does not
check whether the output was compressed. The restore command would succeed
in either case. A simple `gzip -t -r` against the directory will not
suffice to test it, because there exist files which are never compressed
in this format (.toc). A little bit more involved test case would need
to be written, yet before I embark to this journey, I would like to know
if you would agree to reinstate the defaults for those formats.

> 
> As per the patch, it is true that we do not need to bump the format of
> the dump archives, as we can still store only the compression level
> and guess the method from it. I have added some notes about that in
> ReadHead and WriteHead to not forget.

Agreed. A minor suggestion if you may.

 #ifndef HAVE_LIBZ
-   if (AH->compression != 0)
+   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
pg_log_warning("archive is compressed, but this installation 
does not support compression -- no data will be available");
 #endif

It would seem a more consistent to error out in this case. We do error
in all other cases where the compression is not available.

> 
> Most of the changes are really-straight forward, and it has resisted
> my tests, so I think that this is in a rather-commitable shape as-is.

Thank you.

Cheers,
//Georgios

> --
> MichaelFrom 16e10b38cc8eb6eb5b1ffc15365d7e6ce23eef0a Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Thu, 1 Dec 2022 08:58:51 +
Subject: [PATCH v13] Teach pg_dump about compress_spec and use it throughout.

Align pg_dump with the rest of the binaries which use common compression. It is
teaching pg_dump.c about the common compression definitions and interfaces. Then
it propagates those throughout the code.
---
 doc/src/sgml/ref/pg_dump.sgml   |  34 +--
 src/bin/pg_dump/compress_io.c   | 107 
 src/bin/pg_dump/compress_io.h   |  20 ++--
 src/bin/pg_dump/pg_backup.h |   7 +-
 src/bin/pg_dump/pg_backup_archiver.c|  78 +-
 src/bin/pg_dump/pg_backup_archiver.h|  10 +-
 src/bin/pg_dump/pg_backup_custom.c  |   6 +-
 src/bin/pg_dump/pg_back

Re: Documentation for building with meson

2022-12-01 Thread Peter Eisentraut

On 23.11.22 22:24, samay sharma wrote:

Thank you. Attaching v7 addressing most of the points below.


I have committed this, after some editing and making some structural 
changes.  I moved the "Requirements" section back to the top level.  It 
did not look appealing to have to maintain two copies of this that have 
almost no substantial difference (but for some reason were written with 
separate structure and wording).  Also, I rearranged the Building with 
Meson section to use the same internal structure as the Building with 
Autoconf and Make section.  This will make it easier to maintain going 
forward.  For example if someone adds a new option, it will be easier to 
find the corresponding places in the lists where to add them.


We will likely keep iterating on the contents for the next little while, 
but I'm glad we now have a structure in place that we should be able to 
live with.






Re: File API cleanup

2022-12-01 Thread Peter Eisentraut

On 01.12.22 09:55, Bharath Rupireddy wrote:

can we have a generic,
single function file_exists() in fd.c/file_utils.c so that both
backend and frontend code can use it? I see there are 3 uses and
definitions of it in jit.c, dfmgr.c and pg_regress.c. This will reduce
the code duplication. Thoughts?


Well, the first problem with that would be that all three of those 
implementations are slightly different.  Maybe that is intentional, or 
maybe not, in which case a common implementation might be beneficial.


(Another thing to consider is that checking whether a file exists is not 
often actually useful.  If you want to use the file, you should just 
open it and then check for any errors.  The cases above have special 
requirements, so there obviously are uses, but I'm not sure how many in 
the long run.)






pg_upgrade: Make testing different transfer modes easier

2022-12-01 Thread Peter Eisentraut
I wanted to test the different pg_upgrade transfer modes (--link, 
--clone), but that was not that easy, because there is more than one 
place in the test script you have to find and manually change.  So I 
wrote a little patch to make that easier.  It's still manual, but it's a 
start.  (In principle, we could automatically run the tests with each 
supported mode in a loop, but that would be very slow.)


While doing that, I also found it strange that the default transfer mode 
(referred to as "copy" internally) did not have any external 
representation, so it is awkward to refer to it in text, and obscure to 
see where it is used for example in those test scripts.  So I added an 
option --copy, which effectively does nothing, but it's not uncommon to 
have options that select default behaviors explicitly.  (I also thought 
about something like a "mode" option with an argument, but given that we 
already have --link and --clone, this seemed the most sensible.)


Thoughts?From e01feabdfc0e2ea01242afd93261885c035e7942 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 1 Dec 2022 15:34:00 +0100
Subject: [PATCH 1/2] pg_upgrade: Add --copy option

This option selects the default transfer mode.  Having an explicit
option is handy to make scripts and tests more explicit.  It also
makes it easier to talk about a "copy" mode rather than "the default
mode" or something like that, since until now the default mode didn't
have an externally visible name.
---
 doc/src/sgml/ref/pgupgrade.sgml | 10 ++
 src/bin/pg_upgrade/option.c |  6 ++
 2 files changed, 16 insertions(+)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 8f7a3025c368..7816b4c6859b 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -230,6 +230,16 @@ Options
   
  
 
+ 
+  --copy
+  
+   
+Copy files to the new cluster.  This is the default.  (See also
+--link and --clone.)
+   
+  
+ 
+
  
   -?
   --help
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index f441668c612a..f986129c2fb9 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -56,6 +56,7 @@ parseCommandLine(int argc, char *argv[])
{"socketdir", required_argument, NULL, 's'},
{"verbose", no_argument, NULL, 'v'},
{"clone", no_argument, NULL, 1},
+   {"copy", no_argument, NULL, 2},
 
{NULL, 0, NULL, 0}
};
@@ -194,6 +195,10 @@ parseCommandLine(int argc, char *argv[])
user_opts.transfer_mode = TRANSFER_MODE_CLONE;
break;
 
+   case 2:
+   user_opts.transfer_mode = TRANSFER_MODE_COPY;
+   break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more 
information.\n"),
os_info.progname);
@@ -283,6 +288,7 @@ usage(void)
printf(_("  -v, --verbose enable verbose internal 
logging\n"));
printf(_("  -V, --version display version information, 
then exit\n"));
printf(_("  --clone   clone instead of copying 
files to new cluster\n"));
+   printf(_("  --copycopy files to new cluster 
(default)\n"));
printf(_("  -?, --helpshow this help, then 
exit\n"));
printf(_("\n"
 "Before running pg_upgrade you must:\n"

base-commit: ec386948948c1708c0c28c48ef08b9c4dd9d47cc
-- 
2.38.1

From b87a6bbb293deb94693774fa7b5c1e4918704f57 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 1 Dec 2022 15:36:12 +0100
Subject: [PATCH 2/2] pg_upgrade: Make testing different transfer modes easier

It still requires a manual change in the test script, but now there is
only one well-marked place to change.  (Automatically running the
pg_upgrade tests for all supported modes would be too slow.)
---
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl 
b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index add6ea9c3437..365d81a8a380 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -12,6 +12,9 @@
 use PostgreSQL::Test::Utils;
 use Test::More;
 
+# Can be changed (manually) to test the other modes.
+my $mode = '--copy';
+
 # Generate a database with a name made of a range of ASCII characters.
 sub generate_db
 {
@@ -256,6 +259,7 @@ sub filter_dump
'-s', $newnode->host,
'-p', $oldnode->port,
'-P', $newnode->port,
+   $mode,
'--check'
],
'run of pg_upgrade --check for new instance with incorr

Re: New docs chapter on Transaction Management and related changes

2022-12-01 Thread Bruce Momjian
On Wed, Nov 30, 2022 at 12:31:55PM -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
> > I find it a bit shocking to have had it backpatched, even to 15 -- a
> > whole chapter in the documentation?  I don't see why it wouldn't be
> > treated like any other "major feature" patch, which we only consider for
> > the development branch.  Also, this is a first cut -- presumably we'll
> > want to copy-edit it before it becomes released material.
> 
> I think that last point is fairly convincing.  I've not read the
> new material, but I didn't get further than the first line of
> the new chapter file before noting a copy-and-paste error:
> 
> --- /dev/null
> +++ b/doc/src/sgml/xact.sgml
> @@ -0,0 +1,205 @@
> +

Fixed in master.

> That doesn't leave me with a warm feeling that it's ready to ship.
> I too vote for reverting it out of the released branches.

Patch reverted in all back branches.  I was hoping to get support for
more aggressive backpatches of docs, but obviously failed.  I should
have been clearer about my intent to backpatch, and will have to
consider these issues in future doc backpatches.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Temporary tables versus wraparound... again

2022-12-01 Thread Greg Stark
On Sat, 5 Nov 2022 at 11:34, Tom Lane  wrote:
>
> Greg Stark  writes:
> > Simple Rebase
>
> I took a little bit of a look through these.
>
> * I find 0001 a bit scary, specifically that it's decided it's
> okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> and especially relation_needs_vacanalyze to another session's
> temp table.  How safe is that really?

I  can look a bit more closely but none of them are doing any thing
with the table itself, just the catalog entries which afaik have
always been fair game for other sessions. So I'm not really clear what
kind of unsafeness you're asking about.

> * Don't see much point in renaming checkTempNamespaceStatus.
> That doesn't make it not an ABI break.  If we want to back-patch
> this we'll have to do something different than what you did here.

Well it's an ABI break but at least it's an ABI break that gives a
build-time error or shared library loading error rather than one that
just crashes or writes to random memory at runtime.

> * In 0002, I don't especially approve of what you've done with
> the relminmxid calculation --- that seems to move it out of
> "pure bug fix" territory and into "hmm, I wonder if this
> creates new bugs" territory.

Hm. Ok, I can separate that into a separate patch. I admit I have a
lot of trouble remembering how multixactids work.


>  Also, skipping that update
> for non-temp tables immediately falsifies ResetVacStats'
> claimed charter of "resetting to the same values used when
> creating tables".  Surely GetOldestMultiXactId isn't *that*
> expensive, especially compared to the costs of file truncation.
> I think you should just do GetOldestMultiXactId straight up,
> and maybe submit a separate performance-improvement patch
> to make it do the other thing (in both places) for temp tables.

Hm. the feedback I got earlier was that it was quite expensive. That
said, I think the concern was about the temp tables where the truncate
was happening on every transaction commit when they're used. For
regular truncates I'm not sure how important optimizing it is.

> * I wonder if this is the best place for ResetVacStats --- it
> doesn't seem to be very close to the code it needs to stay in
> sync with.  If there's no better place, perhaps adding cross-
> reference comments in both directions would be advisable.

I'll look at that. I think relfrozenxid and relfrozenmxid are touched
in a lot of places so it may be tilting at windmills to try to
centralize the code working with them at this point...

> * 0003 says it's running temp.sql by itself to avoid interference
> from other sessions, but sadly that cannot avoid interference
> from background autovacuum/autoanalyze.  I seriously doubt this
> patch would survive contact with the buildfarm.  Do we actually
> need a new test case?  It's not like the code won't get exercised
> without it --- we have plenty of temp table truncations, surely.

No I don't think we do. I kept it in a separate commit so it could be
dropped when committing.

But just having truncate working isn't really good enough either. An
early version of the patch had a bug that meant it didn't run at all
so truncate worked fine but relfrozenxid never got reset.

In thinking about whether we could have a basic test that temp tables
are getting reset at all it occurs to me that there's still a gap
here:

You can have a session attached to a temp namespace that never
actually uses the temp tables. That would prevent autovacuum from
dropping them and still never reset their vacuum stats. :( Offhand I
think PreCommit_on_commit_actions() could occasionally truncate all ON
COMMIT TRUNCATE tables even if they haven't been touched in this
transaction.


--
greg




Re: Partial aggregates pushdown

2022-12-01 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp писал 2022-12-01 05:23:

Hi Mr.Pyhalov.


Hi.

Attaching minor fixes. I haven't proof-read all comments (but perhaps, 
they need attention from some native speaker).


Tested it with queries from 
https://github.com/swarm64/s64da-benchmark-toolkit, works as expected.

--
Best regards,
Alexander Pyhalov,
Postgres Professionaldiff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 35f2d102374..bd8a4acc112 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -3472,9 +3472,9 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context)
 		if ((aggform->aggtranstype != INTERNALOID) && (aggform->aggfinalfn == InvalidOid)) {
 			appendFunctionName(node->aggfnoid, context);
 		} else if(aggform->partialaggfn) {
-			appendFunctionName((Oid)(aggform->partialaggfn), context);
+			appendFunctionName(aggform->partialaggfn, context);
 		} else {
-			elog(ERROR, "there in no partialaggfn %u", node->aggfnoid);
+			elog(ERROR, "there is no partialaggfn %u", node->aggfnoid);
 		}
 		ReleaseSysCache(aggtup);
 	}
@@ -3986,7 +3986,8 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel,
 }
 
 /*
- * Check that partial aggregate function of aggform exsits in remote
+ * Check that partial aggregate function, described by aggform,
+ * exists on remote server, described by fpinfo.
  */
 static bool
 partial_agg_compatible(Form_pg_aggregate aggform, PgFdwRelationInfo *fpinfo)


Re: [PATCH] Add native windows on arm64 support

2022-12-01 Thread Niyas Sait



On 05/11/2022 18:31, Andres Freund wrote:

On 2022-11-03 11:06:46 +, Niyas Sait wrote:

I've attached a new version of the patch which excludes the already merged
ASLR changes and add
small changes to handle latest changes in the build scripts.

Note that we're planning to remove the custom windows build scripts before the
next release, relying on the meson build instead.



Thanks. I will add changes to add meson build support.


This won't suffice with the meson build, since the relevant configure test
also uses arm_acle.h:
elif host_cpu == 'arm' or host_cpu == 'aarch64'

  prog = '''
#include 

int main(void)
{
unsigned int crc = 0;
crc = __crc32cb(crc, 0);
crc = __crc32ch(crc, 0);
crc = __crc32cw(crc, 0);
crc = __crc32cd(crc, 0);

/* return computed value, to prevent the above being optimized away */
return crc == 0;
}
'''

  if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd 
without -march=armv8-a+crc',
  args: test_c_args)
# Use ARM CRC Extension unconditionally
cdata.set('USE_ARMV8_CRC32C', 1)
have_optimized_crc = true
  elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd 
with -march=armv8-a+crc',
  args: test_c_args + ['-march=armv8-a+crc'])
# Use ARM CRC Extension, with runtime check
cflags_crc += '-march=armv8-a+crc'
cdata.set('USE_ARMV8_CRC32C', false)
cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
have_optimized_crc = true
  endif
endif

The meson checking logic is used both for msvc and other compilers, so this
will need to work with both.


Yes, will handle that.

--
Niyas




Re: [PATCH] Add native windows on arm64 support

2022-12-01 Thread Niyas Sait



On 07/11/2022 06:58, Michael Paquier wrote:

  #if defined(_WIN64)
  static __forceinline void
  spin_delay(void)
  {
+#ifdef _M_ARM64
+   /*
+* arm64 way of hinting processor for spin loops optimisations
+* ref: 
https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq
+   */
+   __isb(_ARM64_BARRIER_SY);
+#else
_mm_pause();
+#endif
  }
  #else
  static __forceinline void


I think we should just apply this, there seems very little risk of making
anything worse, given the gating to _WIN64 && _M_ARM64.


Seems so.  Hmm, where does _ARM64_BARRIER_SY come from?  Perhaps it
would be better to have a comment referring to it from a different
place than the forums of arm, like some actual docs?



_ARM64_BARRIER_SY is defined in Microsoft Arm64 intrinsic documentation 
- 
https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#BarrierRestrictions


I couldn't find something more official for the sse2neon library part.

--
Niyas




Re: Documentation for building with meson

2022-12-01 Thread Andres Freund
Hi,

On 2022-12-01 15:58:39 +0100, Peter Eisentraut wrote:
> On 23.11.22 22:24, samay sharma wrote:
> > Thank you. Attaching v7 addressing most of the points below.
> 
> I have committed this, after some editing and making some structural
> changes.

Thanks. I was working on that too, but somehow felt a bit stuck...

I'll try if I can adapt my pending changes.


> I moved the "Requirements" section back to the top level.  It did
> not look appealing to have to maintain two copies of this that have almost
> no substantial difference (but for some reason were written with separate
> structure and wording).

I don't think this is good. The whole "The following software packages are
required for building PostgreSQL" section is wrong now.  "They are not
required in the default configuration, but they are needed when certain build
options are enabled, as explained below:" section is misleading as well.

By the time we fix all of those we'll end up with a different section again.


> Also, I rearranged the Building with Meson section to use the same internal
> structure as the Building with Autoconf and Make section.  This will make it
> easier to maintain going forward.  For example if someone adds a new option,
> it will be easier to find the corresponding places in the lists where to add
> them.

I don't know. The existing list order makes very little sense to me. The
E.g. --enable-nls is before the rest in configure, presumably because it sorts
there alphabetically. But that's not the case for meson.

Copying "Anti-Features" as a structure element to the meson docs seems bogus
(also the name is bogus, but that's a pre-existing issue). There's no
difference in -Dreadline= to the other options meson-options-features list.

Nor does -Dspinlocks= -Datomics= make sense in the "anti features" section. It
made some sense for autoconf because of the --without- prefix, but that's not
at play in meson.  Their placement in the "Developer Options" made a whole lot
more sense.


I don't like "Miscellaneous" bit containing minor stuff like krb_srvnam and
data layout changing options like blocksize,segsize,wal_blocksize. But it
makes sense to change that for both at the same time.


> We will likely keep iterating on the contents for the next little while, but
> I'm glad we now have a structure in place that we should be able to live
> with.

I agree that it's good to have something we can work from more
iteratively. But I don't think this is a structure that we can live with.


I'm not particularly happy about this level of structural change made without
discussing it prior.

Greetings,

Andres Freund




Re: Error-safe user functions

2022-12-01 Thread Andrew Dunstan

On 2022-11-21 Mo 00:26, Tom Lane wrote:
> Corey Huinker  writes:
>> On Tue, Nov 15, 2022 at 11:36 AM Andrew Dunstan  wrote:
>>> Nikita,
>>> just checking in, are you making progress on this? I think we really
>>> need to get this reviewed and committed ASAP if we are to have a chance
>>> to get the SQL/JSON stuff reworked to use it in time for release 16.
>> I'm making an attempt at this or something very similar to it. I don't yet
>> have a patch ready.
> Cool.  We can't delay too much longer on this if we want to have
> a credible feature in v16.  Although I want a minimal initial
> patch, there will still be a ton of incremental work to do after
> the core capability is reviewed and committed, so there's no
> time to lose.
>
>   


OK, here's a set of minimal patches based on Nikita's earlier work and
also some work by my colleague Amul Sul. It tries to follow Tom's
original outline at [1], and do as little else as possible.

Patch 1 introduces the IOCallContext node. The caller should set the
no_error_throw flag and clear the error_found flag, which will be set by
a conforming IO function if an error is found. It also includes a string
field for an error message. I haven't used that, it's more there to
stimulate discussion. Robert suggested to me that maybe it should be an
ErrorData, but I'm not sure how we would use it.

Patch 2 introduces InputFunctionCallContext(), which is similar to
InputFunctionCall() but with an extra context parameter. Note that it's
ok for an input function to return a NULL to this function if an error
is found.

Patches 3 and 4 modify the bool_in() and int4in() functions respectively
to handle an IOContextCall appropriately if provided one in their
fcinfo.context.

Patch 5 introduces COPY FROM ... NULL_ON_ERROR which, in addition to
being useful in itself, provides a test of the previous patches.

I hope we can get a fairly quick agreement so that work can being on
adjusting at least those things needed for the SQL/JSON patches ASAP.
Our goal should be to adjust all the core input functions, but that's
not quite so urgent, and can be completed in parallel with the SQL/JSON
work.


cheers


andrew

[1] https://www.postgresql.org/message-id/13351.1661965592%40sss.pgh.pa.us




--
Andrew Dunstan
EDB: https://www.enterprisedb.com
From e84cf9461fe3898c0326e7c369c65fc4ef1761c6 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan 
Date: Wed, 30 Nov 2022 10:44:27 -0500
Subject: [PATCH 1/5] Add IOCallContext node

---
 src/include/nodes/primnodes.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 74f228d959..3e2ecd11b3 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1692,4 +1692,21 @@ typedef struct OnConflictExpr
 	List	   *exclRelTlist;	/* tlist of the EXCLUDED pseudo relation */
 } OnConflictExpr;
 
+/*--
+ * IOCallContext - passed to an input function as FunctionCallInfo.context
+ *
+ * if the field no_error_throw is set then a conforming IO function will
+ * set the field error_found on error rather than calling ereport(ERROR ...)
+ *
+ */
+typedef struct IOCallContext
+{
+	pg_node_attr(nodetag_only) /* don't need copy etc. support functions */
+
+	NodeTag		type;
+	boolno_error_throw; /* set by caller to suppress errors */
+	boolerror_found;/* cleared by caller, set by IO function */
+	char   *error_message;
+} IOCallContext;
+
 #endif			/* PRIMNODES_H */
-- 
2.34.1

From b6b277f5dd3837ab9ac2df819f4e67f39d790d1f Mon Sep 17 00:00:00 2001
From: Andrew Dunstan 
Date: Wed, 30 Nov 2022 10:46:03 -0500
Subject: [PATCH 2/5] Add InputFunctionCallContext()

---
 src/backend/utils/fmgr/fmgr.c | 50 +++
 src/include/fmgr.h|  3 +++
 2 files changed, 53 insertions(+)

diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index 3c210297aa..bd91ae1dac 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -24,6 +24,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/primnodes.h"
 #include "pgstat.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -1548,6 +1549,55 @@ InputFunctionCall(FmgrInfo *flinfo, char *str, Oid typioparam, int32 typmod)
 	return result;
 }
 
+Datum
+InputFunctionCallContext(FmgrInfo *flinfo, char *str,
+		 Oid typioparam, int32 typmod,
+		 void * ctxt)
+{
+	LOCAL_FCINFO(fcinfo, 3);
+	Datum		result;
+
+	if (str == NULL && flinfo->fn_strict)
+		return (Datum) 0;		/* just return null result */
+
+	InitFunctionCallInfoData(*fcinfo, flinfo, 3, InvalidOid, NULL, NULL);
+
+	fcinfo->args[0].value = CStringGetDatum(str);
+	fcinfo->args[0].isnull = false;
+	fcinfo->args[1].value = ObjectIdGetDatum(typioparam);
+	fcinfo->args[1].isnull = false;
+	fcinfo->args[2].value = Int32GetDatum(typmod);
+	fcinfo->args[2].isnull = false;
+
+	fcinfo->context = ctxt;
+

Re: [PATCH] Add native windows on arm64 support

2022-12-01 Thread Niyas Sait

Hello,

I've attached a new revision of the patch (v4) and includes following 
changes,


1. Add support for meson build system
2. Extend MSVC scripts to handle ARM64 platform.
3. Add arm64 definition of spin_delay function.
4. Exclude arm_acle.h import with MSVC compiler.

V3->V4: Add support for meson build system
V2->V3: Removed ASLR enablement and rebased on master.
V1->V2: Rebased on top of master

--
NiyasFrom 872f538f6261b36a32cbcecae82e99778b747799 Mon Sep 17 00:00:00 2001
From: Niyas Sait 
Date: Tue, 22 Feb 2022 13:07:24 +
Subject: [PATCH v4] Enable postgres native build for windows-arm64 platform

Following changes are included

- Extend MSVC scripts to handle ARM64 platform
- Add arm64 definition of spin_delay function
- Exclude arm_acle.h import for MSVC
- Add support for meson build
---
 meson.build  | 33 +++-
 src/include/storage/s_lock.h | 10 +-
 src/port/pg_crc32c_armv8.c   |  2 ++
 src/tools/msvc/MSBuildProject.pm | 16 
 src/tools/msvc/Mkvcbuild.pm  |  9 +++--
 src/tools/msvc/Solution.pm   | 11 ---
 src/tools/msvc/gendef.pl |  8 
 7 files changed, 62 insertions(+), 27 deletions(-)

diff --git a/meson.build b/meson.build
index 725e10d815..e354ad7650 100644
--- a/meson.build
+++ b/meson.build
@@ -1944,7 +1944,13 @@ int main(void)
 
 elif host_cpu == 'arm' or host_cpu == 'aarch64'
 
-  prog = '''
+  if cc.get_id() == 'msvc'
+cdata.set('USE_ARMV8_CRC32C', false)
+cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+have_optimized_crc = true
+  else
+
+prog = '''
 #include 
 
 int main(void)
@@ -1960,18 +1966,19 @@ int main(void)
 }
 '''
 
-  if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd 
without -march=armv8-a+crc',
-  args: test_c_args)
-# Use ARM CRC Extension unconditionally
-cdata.set('USE_ARMV8_CRC32C', 1)
-have_optimized_crc = true
-  elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd 
with -march=armv8-a+crc',
-  args: test_c_args + ['-march=armv8-a+crc'])
-# Use ARM CRC Extension, with runtime check
-cflags_crc += '-march=armv8-a+crc'
-cdata.set('USE_ARMV8_CRC32C', false)
-cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
-have_optimized_crc = true
+if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd 
without -march=armv8-a+crc',
+args: test_c_args)
+  # Use ARM CRC Extension unconditionally
+  cdata.set('USE_ARMV8_CRC32C', 1)
+  have_optimized_crc = true
+elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd 
with -march=armv8-a+crc',
+args: test_c_args + ['-march=armv8-a+crc'])
+  # Use ARM CRC Extension, with runtime check
+  cflags_crc += '-march=armv8-a+crc'
+  cdata.set('USE_ARMV8_CRC32C', false)
+  cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+  have_optimized_crc = true
+endif
   endif
 endif
 
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 8b19ab160f..bf6a6dba35 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -708,13 +708,21 @@ typedef LONG slock_t;
 #define SPIN_DELAY() spin_delay()
 
 /* If using Visual C++ on Win64, inline assembly is unavailable.
- * Use a _mm_pause intrinsic instead of rep nop.
+ * Use _mm_pause (x64) or __isb(arm64) intrinsic instead of rep nop.
  */
 #if defined(_WIN64)
 static __forceinline void
 spin_delay(void)
 {
+#ifdef _M_ARM64
+   /*
+* arm64 way of hinting processor for spin loops optimisations
+* ref: 
https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq
+   */
+   __isb(_ARM64_BARRIER_SY);
+#else
_mm_pause();
+#endif
 }
 #else
 static __forceinline void
diff --git a/src/port/pg_crc32c_armv8.c b/src/port/pg_crc32c_armv8.c
index 9e301f96f6..981718752f 100644
--- a/src/port/pg_crc32c_armv8.c
+++ b/src/port/pg_crc32c_armv8.c
@@ -14,7 +14,9 @@
  */
 #include "c.h"
 
+#ifndef _MSC_VER
 #include 
+#endif
 
 #include "port/pg_crc32c.h"
 
diff --git a/src/tools/msvc/MSBuildProject.pm b/src/tools/msvc/MSBuildProject.pm
index 58590fdac2..274ddc8860 100644
--- a/src/tools/msvc/MSBuildProject.pm
+++ b/src/tools/msvc/MSBuildProject.pm
@@ -310,10 +310,18 @@ sub WriteItemDefinitionGroup
  : ($self->{type} eq "dll" ? 'DynamicLibrary' : 'StaticLibrary');
my $libs = $self->GetAdditionalLinkerDependencies($cfgname, ';');
 
-   my $targetmachine =
- $self->{platform} eq 'Win32' ? 'MachineX86' : 'MachineX64';
-   my $arch =
- $self->{platform} eq 'Win32' ? 'x86' : 'x86_64';
+   my $targetmachine;
+   my $arch;
+   if ($self->{platform} eq 'Win32') {
+   $targetmachine = 'MachineX86';
+   $arch = 'x86';
+   } elsif ($self->{platform} eq 'ARM64'){
+   $targetmachine = 'MachineARM64';
+   $arch = 'a

Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))

2022-12-01 Thread Justin Pryzby
On Sat, Nov 05, 2022 at 10:43:07AM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > You set the patch to "waiting on author", which indicates that there's
> > no need for further input or review.  But, I think that's precisely
> > what's needed - without input from more people, what could I do to
> > progress the patch ?  I don't think it's reasonable to put 001 first and
> > change thousands (actually, 1338) of regression results.  If nobody
> > wants to discuss 001, then this patch series won't progress.
> 
> Well ...
> 
> 1. 0001 invents a new GUC but provides no documentation for it.
> That certainly isn't committable, and it's discouraging the
> discussion you seek because people have to read the whole patch
> in detail to understand what is being proposed.
> 
> 2. The same complaint for 0004, which labors under the additional

I suggested that the discussion be limited to the early patches (004 is
an optional, possible idea that I had, but it's evidently still causing
confusion).

> 3. I'm not really on board with the entire approach.  Making
> EXPLAIN work significantly differently for developers and test
> critters than it does for everyone else seems likely to promote
> confusion and hide bugs.  I don't think getting rid of the need
> for filter functions in test scripts is worth that.

I'm open to suggestions how else to move towards the goal.

Note that there's a few other things which have vaguely-similar
behavior:

HIDE_TABLEAM=on
HIDE_TOAST_COMPRESSION=on
compute_query_id = regress

Another possibility is to make a new "explain" format, like
| explain(FORMAT REGRESS, ...).

...but I don't see how that's would be less objectionable than what I've
written.

The patch record should probably be closed until someone proposes
another way to implement what's necessary to enable explain(BUFFERS) by
default.

-- 
Justin




Re: Error-safe user functions

2022-12-01 Thread Tom Lane
Andrew Dunstan  writes:
> OK, here's a set of minimal patches based on Nikita's earlier work and
> also some work by my colleague Amul Sul. It tries to follow Tom's
> original outline at [1], and do as little else as possible.

This is not really close at all to what I had in mind.

The main objection is that we shouldn't be passing back a "char *"
error string (though I observe that your 0003 and 0004 patches aren't
even bothering to do that much).  I think we want to pass back a
fully populated ErrorData struct so that we can report everything
the actual error would have done (notably, the SQLSTATE).  That means
that elog.h/.c has to be intimately involved in this.  I liked Nikita's
overall idea of introducing an "ereturn" macro, with the idea that
where we have, say,

ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("value \"%s\" is out of range for type %s",
s, "integer")));

we would write

ereturn(context, ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("value \"%s\" is out of range for type %s",
s, "integer")));
return NULL;  // or whatever is appropriate

and the involvement with the contents of the context node would
all be confined to some new code in elog.c.  That would help
prevent the #include-footprint-bloat that is otherwise going to
ensue.

(Maybe we could assume that ereturn's elevel must be ERROR, and
save a little notation.  I'm not very wedded to "ereturn" as the
new macro name either, though it's not awful.)

Also, as I said before, the absolute first priority has to be
documentation explaining what function authors are supposed to
do differently than before.

I'd be willing to have a go at this myself, except that Corey
said he was working on it, so I don't want to step on his toes.

regards, tom lane




Re: New docs chapter on Transaction Management and related changes

2022-12-01 Thread Alvaro Herrera
On 2022-Dec-01, Bruce Momjian wrote:

> Patch reverted in all back branches.  I was hoping to get support for
> more aggressive backpatches of docs, but obviously failed.  I should
> have been clearer about my intent to backpatch, and will have to
> consider these issues in future doc backpatches.

FWIW I am in favor of backpatching doc fixes (even if they're not
completely trivial, such as 02d43ad6262d) and smallish additions of
content (63a370938).  But we've added a few terms to the glossary
(606c38459, 3dddb2a82) and those weren't backpatched, which seems
appropriate to me.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
   (Paul Graham)




Re: Error-safe user functions

2022-12-01 Thread Robert Haas
On Thu, Dec 1, 2022 at 1:14 PM Tom Lane  wrote:
> The main objection is that we shouldn't be passing back a "char *"
> error string (though I observe that your 0003 and 0004 patches aren't
> even bothering to do that much).  I think we want to pass back a
> fully populated ErrorData struct so that we can report everything
> the actual error would have done (notably, the SQLSTATE).

+1.

> That means that elog.h/.c has to be intimately involved in this.
> I liked Nikita's
> overall idea of introducing an "ereturn" macro, with the idea that
> where we have, say,
>
> ereport(ERROR,
> (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
>  errmsg("value \"%s\" is out of range for type %s",
> s, "integer")));
>
> we would write
>
> ereturn(context, ERROR,
> (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
>  errmsg("value \"%s\" is out of range for type %s",
> s, "integer")));
> return NULL;  // or whatever is appropriate

It sounds like you're imagining that ereturn doesn't return, which
seems confusing. But I don't know that I'd like it better if it did.
Magic return statements hidden inside macros seem not too fun. What
I'd like to see is a macro that takes a pointer to an ErrorData and
the rest of the arguments like ereport() and stuffs everything in
there. And then you can pass that to ThrowErrorData() later if you
like. That way it's visible when you're using the macro where you're
putting the error. I think that would make the code more readable.

> Also, as I said before, the absolute first priority has to be
> documentation explaining what function authors are supposed to
> do differently than before.

+1.

> I'd be willing to have a go at this myself, except that Corey
> said he was working on it, so I don't want to step on his toes.

Time is short, and I do not think Corey will be too sad if you decide
to have a go at it. The chances of person A being able to write the
code person B is imagining as well as person B could write it are not
great, regardless of who A and B are. And I think the general
consensus around here is that you're a better coder than most.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Questions regarding distinct operation implementation

2022-12-01 Thread Ankit Kumar Pandey


On 25/11/22 11:00, Ankit Kumar Pandey wrote:


On 25/11/22 02:14, David Rowley wrote:
On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey 
 wrote:

Please let me know any opinions on this.

I think if you're planning on working on this then step 1 would have
to be checking the SQL standard to see which set of rows it asks
implementations to consider for duplicate checks when deciding if the
transition should be performed or not.  Having not looked, I don't
know if this is the entire partition or just the rows in the current
frame.

Depending on what you want, an alternative today would be to run a
subquery to uniquify the rows the way you want and then do the window
function stuff.

David
Thanks David, these are excellent pointers, I will look into SQL 
standard first and so on.



Hi,

Looking further into it, I am bit clear about expectations of having 
distinct in Windows Aggregates (although I couldn't got hands on SQL 
standard as it is not in public domain but distinct in windows aggregate 
is supported by Oracle and I am using it as reference).


For table (mytable):

id, name

1, A

1, A

10, B

3, A

1, A


/select avg(distinct id) over (partition by name)/ from mytable (in 
oracle db) yields:


2

2

2

2

10


From this, it is seen distinct is taken across the all rows in the 
partition.


I also thought of using a subquery approach: /select avg(id) over 
(partition by name) from (select distinct(id), name from mytable)/


but this obviously doesn't yield right answer because result should 
contain same number of rows as input. This implies we need to find 
partition first and then remove duplicates within the partition.


Can we avoid any ordering/sort until existing logic finds if value is in 
frame (so as to respect any /order by/ clause given by user), and once 
it is determined that tuple is in frame, skip the tuple if it is a 
duplicate? If aforementioned approach is right, question is how do we 
check if it is duplicate? Should we create a lookup table (as tuples 
coming to advance_windowaggregate can be in arbitrary order)? Or any 
other approach would be better?


Any opinion on this will be appreciated.

--
Regards,
Ankit Kumar Pandey


Re: Temporary tables versus wraparound... again

2022-12-01 Thread Andres Freund
Hi,

On 2022-12-01 11:13:01 -0500, Greg Stark wrote:
> On Sat, 5 Nov 2022 at 11:34, Tom Lane  wrote:
> >
> > Greg Stark  writes:
> > > Simple Rebase
> >
> > I took a little bit of a look through these.
> >
> > * I find 0001 a bit scary, specifically that it's decided it's
> > okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> > and especially relation_needs_vacanalyze to another session's
> > temp table.  How safe is that really?
> 
> I  can look a bit more closely but none of them are doing any thing
> with the table itself, just the catalog entries which afaik have
> always been fair game for other sessions. So I'm not really clear what
> kind of unsafeness you're asking about.

Is that actually true? Don't we skip some locking operations for temporary
tables, which then also means catalog modifications cannot safely be done in
other sessions?


> >  Also, skipping that update
> > for non-temp tables immediately falsifies ResetVacStats'
> > claimed charter of "resetting to the same values used when
> > creating tables".  Surely GetOldestMultiXactId isn't *that*
> > expensive, especially compared to the costs of file truncation.
> > I think you should just do GetOldestMultiXactId straight up,
> > and maybe submit a separate performance-improvement patch
> > to make it do the other thing (in both places) for temp tables.
> 
> Hm. the feedback I got earlier was that it was quite expensive. That
> said, I think the concern was about the temp tables where the truncate
> was happening on every transaction commit when they're used. For
> regular truncates I'm not sure how important optimizing it is.

And it's not called just once, but once for each relation.


> > * I wonder if this is the best place for ResetVacStats --- it
> > doesn't seem to be very close to the code it needs to stay in
> > sync with.  If there's no better place, perhaps adding cross-
> > reference comments in both directions would be advisable.
> 
> I'll look at that. I think relfrozenxid and relfrozenmxid are touched
> in a lot of places so it may be tilting at windmills to try to
> centralize the code working with them at this point...

Not convinced. Yes, there's plenty of references to relfrozenxid, but most of
them don't modify it.


I find it problematic that ResetVacStats() bypasses tableam. Normal vacuums
etc go through tableam but you put a ResetVacStats() besides each call to
table_relation_nontransactional_truncate().  Seems like this should just be in
heapam_relation_nontransactional_truncate()?

Is it a good idea to use heap_inplace_update() in ResetVacStats()?

Greetings,

Andres Freund




Re: Allow round() function to accept float and double precision

2022-12-01 Thread David G. Johnston
On Thu, Dec 1, 2022 at 7:39 AM Tom Lane  wrote:

> Dean Rasheed  writes:
>
> > The fact that passing a negative scale to round() isn't documented
> > does seem like an oversight though...
>
> Agreed, will do something about that.
>
>
Thanks. I'm a bit surprised you left "Rounds v to s decimal places." alone
though.  I feel like the prose should also make clear that positions to the
left of the decimal, which are not conventionally considered decimal
places, can be identified.

Rounds v at s digits before or after the decimal place.

The examples will hopefully clear up any off-by-one concerns that someone
may have.

David J.


Re: Error-safe user functions

2022-12-01 Thread Tom Lane
Robert Haas  writes:
> It sounds like you're imagining that ereturn doesn't return, which
> seems confusing. But I don't know that I'd like it better if it did.

The spec I had in mind was that it would behave as ereport(ERROR)
unless a suitable FuncErrorContext node is passed, in which case
it'd store the error data into that node and return.  This leaves
the invoker with only the job of passing control back afterwards,
if it gets control back.  I'd be the first to agree that "ereturn"
doesn't capture that detail very well, but I don't have a better name.
(And I do like the fact that this name is the same length as "ereport",
so that we won't end up with lots of reindentation to do.)
   
> Magic return statements hidden inside macros seem not too fun. What
> I'd like to see is a macro that takes a pointer to an ErrorData and
> the rest of the arguments like ereport() and stuffs everything in
> there. And then you can pass that to ThrowErrorData() later if you
> like. That way it's visible when you're using the macro where you're
> putting the error. I think that would make the code more readable.

I think that'd just complicate the places that are having to report
such errors --- of which there are likely to be hundreds by the time
we are done.  I will not accept a solution that requires more than
the absolute minimum of additions to the error-reporting spots.

regards, tom lane




Re: Allow round() function to accept float and double precision

2022-12-01 Thread David Rowley
On Thu, 1 Dec 2022 at 21:55, Dean Rasheed  wrote:
> Casting to numeric(1000, n) will work fine in all cases AFAICS (1000
> being the maximum allowed precision in a numeric typemod, and somewhat
> more memorable).

I wasn't aware of the typemod limit.

I don't really agree that it will work fine in all cases though. If
the numeric has more than 1000 digits left of the decimal point then
the method won't work at all.

# select length(('1' || repeat('0',2000))::numeric(1000,0)::text);
ERROR:  numeric field overflow
DETAIL:  A field with precision 1000, scale 0 must round to an
absolute value less than 10^1000.

No issue with round() with the same number.

# select length(round(('1' || repeat('0',2000))::numeric,0)::text);
 length

   2001

David




Re: Allow round() function to accept float and double precision

2022-12-01 Thread Tom Lane
David Rowley  writes:
> I don't really agree that it will work fine in all cases though. If
> the numeric has more than 1000 digits left of the decimal point then
> the method won't work at all.

But what we're talking about is starting from a float4 or float8
input, so it can't be more than ~308 digits.

regards, tom lane




Re: Error-safe user functions

2022-12-01 Thread Robert Haas
On Thu, Dec 1, 2022 at 2:41 PM Tom Lane  wrote:
> Robert Haas  writes:
> > It sounds like you're imagining that ereturn doesn't return, which
> > seems confusing. But I don't know that I'd like it better if it did.
>
> The spec I had in mind was that it would behave as ereport(ERROR)
> unless a suitable FuncErrorContext node is passed, in which case
> it'd store the error data into that node and return.  This leaves
> the invoker with only the job of passing control back afterwards,
> if it gets control back.  I'd be the first to agree that "ereturn"
> doesn't capture that detail very well, but I don't have a better name.
> (And I do like the fact that this name is the same length as "ereport",
> so that we won't end up with lots of reindentation to do.)

I don't think it's sensible to make decisions about important syntax
on the basis of byte-length. Reindenting is a one-time effort; code
clarity will be with us forever.

> > Magic return statements hidden inside macros seem not too fun. What
> > I'd like to see is a macro that takes a pointer to an ErrorData and
> > the rest of the arguments like ereport() and stuffs everything in
> > there. And then you can pass that to ThrowErrorData() later if you
> > like. That way it's visible when you're using the macro where you're
> > putting the error. I think that would make the code more readable.
>
> I think that'd just complicate the places that are having to report
> such errors --- of which there are likely to be hundreds by the time
> we are done.

OK, that's a fair point.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: meson PGXS compatibility

2022-12-01 Thread Andres Freund
Hi,

On 2022-12-01 08:43:19 +0100, Peter Eisentraut wrote:
> On 13.10.22 07:23, Andres Freund wrote:
> > > > 0005: meson: Add PGXS compatibility
> > > > 
> > > > The actual meson PGXS compatibility. Plenty more replacements to 
> > > > do, but
> > > > suffices to build common extensions on a few platforms.
> > > > 
> > > > What level of completeness do we want to require here?
> > > 
> > > I have tried this with a few extensions.  Seems to work alright.  I don't
> > > think we need to overthink this.  The way it's set up, if someone needs
> > > additional variables set, they can easily be added.
> > 
> > Yea, I am happy enough with it now that the bulk is out of src/meson.build 
> > and
> > strip isn't set to an outright wrong value.
> 
> How are you planning to proceed with this?  I thought it was ready, but it
> hasn't moved in a while.

I basically was hoping for a review of the prerequisite patches I posted at
[1], particularly 0003 (different approach than before, comparatively large).

Here's an updated version of the patches. There was a stupid copy-paste bug in
the prior version of the 0003 / export_dynamic patch.

I'll push 0001, 0002 shortly. I don't think 0002 is the most elegant approach,
but it's not awful. I'd still like some review for 0003, but will try to push
it in a few days if that's not forthcoming.

Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/20221013051648.ufz7ud2b5tioctyt%40awork3.anarazel.de
>From 53b4063ff6230963ee60122d47b154f91990d097 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 5 Oct 2022 12:24:14 -0700
Subject: [PATCH v4 1/4] autoconf: Unify CFLAGS_SSE42 and CFLAGS_ARMV8_CRC32C

Until now we emitted the cflags to build the CRC objects into architecture
specific variables. That doesn't make a whole lot of sense to me - we're never
going to target x86 and arm at the same time, so they don't need to be
separate variables.

It might be better to instead continue to have CFLAGS_SSE42 /
CFLAGS_ARMV8_CRC32C be computed by PGAC_ARMV8_CRC32C_INTRINSICS /
PGAC_SSE42_CRC32_INTRINSICS and then set CFLAGS_CRC based on those. But it
seems unlikely that we'd need other sets of CRC specific flags for those two
architectures at the same time.

This simplifies the upcoming meson PGXS compatibility.

Discussion: https://postgr.es/m/20221005200710.luvw5evhwf6cl...@awork3.anarazel.de
---
 configure.ac   | 10 +-
 config/c-compiler.m4   |  8 
 src/port/Makefile  | 16 
 configure  | 19 +--
 src/Makefile.global.in |  3 +--
 5 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/configure.ac b/configure.ac
index f76b7ee31fc..6e7c8e09411 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2091,12 +2091,11 @@ fi
 #
 # First check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used
 # with the default compiler flags. If not, check if adding the -msse4.2
-# flag helps. CFLAGS_SSE42 is set to -msse4.2 if that's required.
+# flag helps. CFLAGS_CRC is set to -msse4.2 if that's required.
 PGAC_SSE42_CRC32_INTRINSICS([])
 if test x"$pgac_sse42_crc32_intrinsics" != x"yes"; then
   PGAC_SSE42_CRC32_INTRINSICS([-msse4.2])
 fi
-AC_SUBST(CFLAGS_SSE42)
 
 # Are we targeting a processor that supports SSE 4.2? gcc, clang and icc all
 # define __SSE4_2__ in that case.
@@ -2110,12 +2109,13 @@ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [
 #
 # First check if __crc32c* intrinsics can be used with the default compiler
 # flags. If not, check if adding -march=armv8-a+crc flag helps.
-# CFLAGS_ARMV8_CRC32C is set if the extra flag is required.
+# CFLAGS_CRC is set if the extra flag is required.
 PGAC_ARMV8_CRC32C_INTRINSICS([])
 if test x"$pgac_armv8_crc32c_intrinsics" != x"yes"; then
   PGAC_ARMV8_CRC32C_INTRINSICS([-march=armv8-a+crc])
 fi
-AC_SUBST(CFLAGS_ARMV8_CRC32C)
+
+AC_SUBST(CFLAGS_CRC)
 
 # Select CRC-32C implementation.
 #
@@ -2144,7 +2144,7 @@ if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" &&
   USE_SSE42_CRC32C_WITH_RUNTIME_CHECK=1
 else
   # Use ARM CRC Extension if available.
-  if test x"$pgac_armv8_crc32c_intrinsics" = x"yes" && test x"$CFLAGS_ARMV8_CRC32C" = x""; then
+  if test x"$pgac_armv8_crc32c_intrinsics" = x"yes" && test x"$CFLAGS_CRC" = x""; then
 USE_ARMV8_CRC32C=1
   else
 # ARM CRC Extension, with runtime check?
diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 000b075312e..eb8cc8ce170 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -597,7 +597,7 @@ fi])# PGAC_HAVE_GCC__ATOMIC_INT64_CAS
 # the other ones are, on x86-64 platforms)
 #
 # An optional compiler flag can be passed as argument (e.g. -msse4.2). If the
-# intrinsics are supported, sets pgac_sse42_crc32_intrinsics, and CFLAGS_SSE42.
+# intrinsics are supported, sets pgac_sse42_crc32_intrinsics, and CFLAGS_CRC.
 AC_DEFUN([PGAC_SSE42_CRC32_INTRINSICS],
 [define([Ac_cachevar], [AS_TR_SH([pgac_cv_sse42_crc32_intrins

Re: Error-safe user functions

2022-12-01 Thread Tom Lane
Robert Haas  writes:
> On Thu, Dec 1, 2022 at 2:41 PM Tom Lane  wrote:
>> I'd be the first to agree that "ereturn"
>> doesn't capture that detail very well, but I don't have a better name.
>> (And I do like the fact that this name is the same length as "ereport",
>> so that we won't end up with lots of reindentation to do.)

> I don't think it's sensible to make decisions about important syntax
> on the basis of byte-length. Reindenting is a one-time effort; code
> clarity will be with us forever.

Sure, but without a proposal for a better name, that's irrelevant.

regards, tom lane




Re: Lazy JIT IR code generation to increase JIT speed with partitions

2022-12-01 Thread Dmitry Dolgov
> On Thu, Jul 14, 2022 at 02:45:29PM +0200, David Geier wrote:
> On Mon, Jul 4, 2022 at 10:32 PM Andres Freund  wrote:
> > On 2022-06-27 16:55:55 +0200, David Geier wrote:
> > > Indeed, the total JIT time increases the more modules are used. The
> > reason
> > > for this to happen is that the inlining pass loads and deserializes all
> > to
> > > be inlined modules (.bc files) from disk prior to inlining them via
> > > llvm::IRMover. There's already a cache for such modules in the code, but
> > it
> > > is currently unused. This is because llvm::IRMover takes the module to be
> > > inlined as std::unique_ptr. The by-value argument requires
> > > the source module to be moved, which means it cannot be reused
> > afterwards.
> > > The code is accounting for that by erasing the module from the cache
> > after
> > > inlining it, which in turns requires reloading the module next time a
> > > reference to it is encountered.
> > >
> > > Instead of each time loading and deserializing all to be inlined modules
> > > from disk, they can reside in the cache and instead be cloned via
> > > llvm::CloneModule() before they get inlined. Key to calling
> > > llvm::CloneModule() is fully deserializing the module upfront, instead of
> > > loading the module lazily. That is why I changed the call from
> > > LLVMGetBitcodeModuleInContext2() (which lazily loads the module via
> > > llvm::getOwningLazyBitcodeModule()) to LLVMParseBitCodeInContext2()
> > (which
> > > fully loads the module via llvm::parseBitcodeFile()). Beyond that it
> > seems
> > > like that prior to LLVM 13, cloning modules could fail with an assertion
> > > (not sure though if that would cause problems in a release build without
> > > assertions). Andres reported this problem back in the days here [1]. In
> > the
> > > meanwhile the issue got discussed in [2] and finally fixed for LLVM 13,
> > see
> > > [3].
> >
> > Unfortunately that doesn't work right now - that's where I had started. The
> > problem is that IRMover renames types. Which, in the case of cloned modules
> > unfortunately means that types used cloned modules are also renamed in the
> > "origin" module. Which then causes problems down the line, because parts of
> > the LLVM code match types by type names.
> >
> > That can then have the effect of drastically decreasing code generation
> > quality over time, because e.g. inlining never manages to find signatures
> > compatible.

Hi,

Thanks for the patch, looks quite interesting!

First to summarize things a bit: from what I understand there are two
suggestions on the table, one is about caching modules when doing
inlining, the second is about actual lazy jitting. Are those to tightly
coupled together, could they be split into two separate patches? It
would make it a bit easier to review and test.

I was playing with the caching part of the patch (still have to think
about the lazy jitting), which generally seems to be a good idea.  From
what I see the main concern here is a chance that IRMover will rename
types, degrading performance of the generated code in long run. I have
to admit, I'm not fully sure mentioned LLVM 13 fix [1] completely
eliminates those concerns, somehow its description is formulated in not
very committing way ("don't know if this patch fixes ..., but it does
fix a few soundness issues that have crept in."). But I haven't found
any crashes or false asserts coming from the LLVM side (using v13),
running several rounds of regression tests with forced jitting, so a
point to the fix.

I would be curious to learn how Andres was troubleshooting type renaming
issues? Using LLVM 13 from packages, jitting the same query twice and
dumping the bitcode out showed some difference in types a-la
"%struct.ExprState.142" vs "%struct.ExprState.430" (from what I
understood, the whole thing is an identifier, including the number) with
the patch, but the very same changes are happening on the main branch as
well. Of course, I was inspecting bitcode only for certain queries, it
doesn't exclude that some other examples actually feature type renaming.

In general, it would be interesting to know how to do some sort of "smoke
tests" for the generated code, e.g. in case if LLVM has fixed this
particular issue, but they might reappear in the future?

I did few performance tests and got numbers similar to posted in the
thread, inlining time being impressively reduced (~10 times) as well as
(suspiciously) optimization time (~1.5 times). The approach was a bit
different though, I've run the sequence of example queries from the
thread using pgbench and checked jit counters from pgss.

Few small notes:

If caching of modules is safe from LLVM >= 13, I guess it should be
wrapped into a corresponding condition on LLVM_VERSION_MAJOR, right?

Why the assert about hasExternalLinkage was removed from the
llvmjit_inline.cpp?

For so much discussion about such a small change there is definitely not
enough commentaries in the code about dangers of cloning modules.

A

Using WaitEventSet in the postmaster

2022-12-01 Thread Thomas Munro
Hi,

Here's a work-in-progress patch that uses WaitEventSet for the main
event loop in the postmaster, with a latch as the wakeup mechanism for
"PM signals" (requests from backends to do things like start a
background worker, etc).  There are still raw signals that are part of
the external interface (SIGHUP etc), but those handlers just set a
flag and set the latch, instead of doing the state machine work.  Some
advantages I can think of:

1.  Inherits various slightly more efficient modern kernel APIs for
multiplexing.
2.  Will automatically benefit from later improvements to WaitEventSet.
3.  Looks much more like the rest of our code.
4.  Requires no obscure signal programming knowledge to understand.
5.  Removes the strange call stacks we have, where most of postgres is
forked from inside a signal handler.
6.  Might help with weirdness and bugs in some signal implementations
(Cygwin, NetBSD?).
7.  Removes the need to stat() PROMOTE_SIGNAL_FILE and
LOGROTATE_SIGNAL_FILE whenever PM signals are sent, now that SIGUSR1
is less overloaded.
8.  It's a small step towards removing the need to emulate signals on Windows.

In order to avoid adding a new dependency on the contents of shared
memory, I introduced SetLatchRobustly() that will always use the slow
path kernel wakeup primitive, even in cases where SetLatch() would
not.  The idea here is that if one backend trashes shared memory,
others backends can still wake the postmaster even though it may
appear that the postmaster isn't waiting or the latch is already set.
It would be possible to go further and have a robust wait mode that
doesn't read is_set too.  It was indecision here that stopped me
proposing this sooner...

One thing that might need more work is cleanup of the PM's WES in
child processes.  Also I noticed in passing that the Windows kernel
event handles for latches are probably leaked on crash-reinit, but
that was already true, this just adds one more.  Also the way I re-add
the latch every time through the event loop in case there was a
crash-reinit is stupid, I'll tidy that up.

This is something I extracted and rejuvenated from a larger set of
patches I was hacking on a year or two ago to try to get rid of lots
of users of raw signals.  The recent thread about mamba's struggles
and the possibility that latches might help reminded me to dust this
part off, and potentially avoid some duplicated effort.

I'm not saying this is free of bugs, but it's passing on CI and seems
like enough to share and see what people think.

(Some other ideas I thought about back then: we could invent
WL_SIGNAL, and not need all those global flag variables or the
handlers that set the latch.  For eg kqueue it's trivial, and for
ancient Unixes you could do a sort of home-made signalfd with a single
generic signal handler that just does write(self_pipe, &siginfo,
sizeof(siginfo)).  But that starts to seems like refactoring for
refactoring's sake; that's probably how I'd write a native kqueue
program, but it's not even that obvious to me that we really should be
pretending that Windows has signals, which put me off that idea.
Perhaps in a post-fake-signals world we just have the postmaster's
event loop directly consume commands from pg_ctl from a control pipe?
Too many decisions at once, I gave that line of thinking up for now.)
From 978aa358885312372f842cd47549bb04a78477ab Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
its main work entirely inside signal handlers, which were only unblocked
while waiting in select().

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Work in progress!
---
 src/backend/libpq/pqsignal.c|  40 
 src/backend/postmaster/postmaster.c | 336 ++--
 src/backend/storage/ipc/latch.c |  54 -
 src/backend/storage/ipc/pmsignal.c  |  26 ++-
 src/backend/utils/init/miscinit.c   |  13 +-
 src/include/libpq/pqsignal.h|   3 -
 src/include/miscadmin.h |   1 +
 src/include/storage/latch.h |   9 +-
 src/include/storage/pmsignal.h  |   3 +
 9 files changed, 259 insertions(+), 226 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 1ab34c5214..718043a39d 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we b

Re: Error-safe user functions

2022-12-01 Thread Robert Haas
On Thu, Dec 1, 2022 at 3:49 PM Tom Lane  wrote:
> > I don't think it's sensible to make decisions about important syntax
> > on the basis of byte-length. Reindenting is a one-time effort; code
> > clarity will be with us forever.
>
> Sure, but without a proposal for a better name, that's irrelevant.

Sure, but you're far too clever not to be able to come up with
something good without any help from me. io_error_return_or_throw()?
store_or_report_io_error()? Or just store_io_error()?

It sounds to me like we're crafting something that is specific to and
can only be used with type input and output functions, so the name
probably should reflect that rather than being something totally
generic like ereturn() or error_stash() or whatever. If we were making
this into a general-purpose way of sticking an error someplace, then a
name like that would make sense and this would be an extension of the
elog.c interface. But what you're proposing is a great deal more
specialized than that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Allow round() function to accept float and double precision

2022-12-01 Thread David Rowley
On Fri, 2 Dec 2022 at 09:02, Tom Lane  wrote:
>
> David Rowley  writes:
> > I don't really agree that it will work fine in all cases though. If
> > the numeric has more than 1000 digits left of the decimal point then
> > the method won't work at all.
>
> But what we're talking about is starting from a float4 or float8
> input, so it can't be more than ~308 digits.

I may have misunderstood. I thought David J was proposing this as a
useful method for rounding numeric too. Re-reading what he wrote, I no
longer think he was.

David




Re: Questions regarding distinct operation implementation

2022-12-01 Thread David Rowley
On Fri, 2 Dec 2022 at 08:10, Ankit Kumar Pandey  wrote:
> select avg(distinct id) over (partition by name) from mytable (in oracle db) 
> yields:
> 2
> 2
> 2
> 2
> 10
>
> From this, it is seen distinct is taken across the all rows in the partition.

Due to the lack of ORDER BY clause, all rows in the partition are in
the window frame at once.  The question is, what *should* happen if
you add an ORDER BY.

Looking at the copy of the standard that I have, I see nothing
explicitly mentioned about aggregates with DISTINCT used as window
functions, however, I do see in the Window Function section:

"The window aggregate functions compute an 
(COUNT, SUM, AVG, etc.), the same as
a group aggregate function, except that the computation aggregates
over the window frame of a row rather than
over a group of a grouped table. The hypothetical set functions are
not permitted as window aggregate functions."

So you could deduce that the DISTINCT would also need to be applied
over the frame too.

The question is, what do you want to make work?  If you're not worried
about supporting DISTINCT when there is an ORDER BY clause and the
frame options are effectively ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING, then it's going to be much easier to make work.
You never need to worry about rows dropping out of visibility in the
frame. Simply all rows in the partition are in the frame.

You do need to be careful as, if I remember correctly, we do support
some non-standard things here. I believe the standard requires an
ORDER BY when specifying frame options. I think we didn't see any
technical reason to apply that limitation, so didn't.  That means in
Postgres, you can do things like:

select avg(id) over (partition by name ROWS BETWEEN CURRENT ROW AND 3
FOLLOWING) from mytable;

but that's unlikely to work on most other databases without adding an ORDER BY.

So if you are going to limit this to only being supported without an
ORDER BY, then you'll need to ensure that the specified frame options
don't cause your code to break.  I'm unsure, but this might be a case
of checking for FRAMEOPTION_NONDEFAULT unless it's
FRAMEOPTION_START_UNBOUNDED_PRECEDING|FRAMEOPTION_END_UNBOUNDED_FOLLOWING.
You'll need to study that a bit more than I just did though.

One way to make that work might be to add code to
eval_windowaggregates() around the call to advance_windowaggregate(),
you can see the row being aggregated is set by:

winstate->tmpcontext->ecxt_outertuple = agg_row_slot;

what you'd need to do here is change the code so that you put all the
rows to aggregate into a tuplesort then sort them by the distinct
column and instead, feed the tuplesort rows to
advance_windowaggregate(). You'd need to add code similar to what is
in process_ordered_aggregate_single() in nodeAgg.c to have the
duplicate consecutive rows skipped.

Just a word of warning on this. This is a hugely complex area of
Postgres. If I was you, I'd make sure and spend quite a bit of time
reading nodeWindowAgg.c and likely much of nodeAgg.c. Any changes we
accept in that area are going to have to be very carefully done. Make
sure you're comfortable with the code before doing too much. It would
be very easy to end up with a giant mess if you try to do this without
fully understanding the implications of your changes.  Also, you'll
need to show you've not regressed the performance of the existing
features with the code you've added.

Good luck!

David




Re: O(n) tasks cause lengthy startups and checkpoints

2022-12-01 Thread Nathan Bossart
On Wed, Nov 30, 2022 at 05:27:10PM +0530, Bharath Rupireddy wrote:
> On Wed, Nov 30, 2022 at 4:52 PM Bharath Rupireddy
>  wrote:
>> Thanks for the patches. I spent some time on reviewing v17 patch set
>> and here are my comments:

Thanks for reviewing!

>> 0001:
>> 1. I think the custodian process needs documentation - it needs a
>> definition in glossary.sgml and perhaps a dedicated page describing
>> what tasks it takes care of.

Good catch.  I added this in v18.  I stopped short of adding a dedicated
page to describe the tasks because 1) there are no parameters for the
custodian and 2) AFAICT none of its tasks are described in the docs today.

>> 2.
>> +LWLockReleaseAll();
>> +ConditionVariableCancelSleep();
>> +AbortBufferIO();
>> +UnlockBuffers();
>> +ReleaseAuxProcessResources(false);
>> +AtEOXact_Buffers(false);
>> +AtEOXact_SMgr();
>> +AtEOXact_Files(false);
>> +AtEOXact_HashTables(false);
>> Do we need all of these in the exit path? Isn't the stuff that
>> ShutdownAuxiliaryProcess() does enough for the custodian process?
>> AFAICS, the custodian process uses LWLocks (which the
>> ShutdownAuxiliaryProcess() takes care of) and it doesn't access shared
>> buffers and so on.
>> Having said that, I'm fine to keep them for future use and all of
>> those cleanup functions exit if nothing related occurs.

Yeah, I don't think we need a few of these.  In v18, I've kept the
following:
* LWLockReleaseAll()
* ConditionVariableCancelSleep()
* ReleaseAuxProcessResources(false)
* AtEOXact_Files(false)

>> 3.
>> + * Advertise out latch that backends can use to wake us up while we're
>> Typo - %s/out/our

fixed

>> 4. Is it a good idea to add log messages in the DoCustodianTasks()
>> loop? Maybe at a debug level? The log message can say the current task
>> the custodian is processing. And/Or setting the custodian's status on
>> the ps display is also a good idea IMO.

I'd like to pick these up in a new thread if/when this initial patch set is
committed.  The tasks already do some logging, and the checkpointer process
doesn't update the ps display for these tasks today.

>> 0002 and 0003:
>> 1.
>> +CHECKPOINT;
>> +DO $$
>> I think we need to ensure that there are some snapshot files before
>> the checkpoint. Otherwise, it may happen that the above test case
>> exits without the custodian process doing anything.
>>
>> 2. I think the best way to test the custodian process code is by
>> adding a TAP test module to see actually the custodian process kicks
>> in. Perhaps, add elog(DEBUGX,...) messages to various custodian
>> process functions and see if we see the logs in server logs.

The test appears to reliably create snapshot and mapping files, so if the
directories are empty at some point after the checkpoint at the end, we can
be reasonably certain the custodian took action.  I didn't add explicit
checks that there are files in the directories before the checkpoint
because a concurrent checkpoint could make such checks unreliable.

>> 0004:
>> I think the 0004 patch can be merged into 0001, 0002 and 0003 patches.
>> Otherwise the patch LGTM.

I'm keeping this one separate because I've received conflicting feedback
about the idea.

>> 1. I think we can trivially extend the custodian process to remove any
>> future WAL files on the old timeline, something like the attached
>> 0001-Move-removal-of-future-WAL-files-on-the-old-timeline.text file).
>> While this offloads the recovery a bit, the server may archive such
>> WAL files before the custodian removes them. We can do a bit more to
>> stop the server from archiving such WAL files, but that needs more
>> coding. I don't think we need to do all that now, perhaps, we can give
>> it a try once the basic custodian stuff gets in.
>> 2. Moving RemovePgTempFiles() to the custodian can bring up the server
>> soon. The idea is that the postmaster just renames the temp
>> directories and informs the custodian so that it can go delete such
>> temp files and directories. I have personally seen cases where the
>> server spent a good amount of time cleaning up temp files. We can park
>> it for later.
>> 3. Moving RemoveOldXlogFiles() to the custodian can make checkpoints faster.
>> 4. PreallocXlogFiles() - if we ever have plans to make pre-allocation
>> more aggressive (pre-allocate more than 1 WAL file), perhaps letting
>> custodian do that is a good idea. Again, too many tasks for a single
>> process.

I definitely want to do #2.  І have some patches for that upthread, but I
removed them for now based on Simon's feedback.  I intend to pick that up
in a new thread.  I haven't thought too much about the others yet.

> Another comment:
> IIUC, there's no custodian_delay GUC as we want to avoid unnecessary
> wakeups for power savings (being discussed in the other thread).
> However, can it happen that the custodian missed to capture SetLatch
> wakeups by other backends? In o

Re: Allow round() function to accept float and double precision

2022-12-01 Thread David G. Johnston
On Thu, Dec 1, 2022 at 2:21 PM David Rowley  wrote:

> On Fri, 2 Dec 2022 at 09:02, Tom Lane  wrote:
> >
> > David Rowley  writes:
> > > I don't really agree that it will work fine in all cases though. If
> > > the numeric has more than 1000 digits left of the decimal point then
> > > the method won't work at all.
> >
> > But what we're talking about is starting from a float4 or float8
> > input, so it can't be more than ~308 digits.
>
> I may have misunderstood. I thought David J was proposing this as a
> useful method for rounding numeric too. Re-reading what he wrote, I no
> longer think he was.
>
>
I was not, my response was that what is being asked for is basically a cast
from float to numeric, and doing that via a "round()" function seems odd.
And we can handle the desired rounding aspect of that process already via
the existing numeric(1000, n) syntax.

David J.


Re: Questions regarding distinct operation implementation

2022-12-01 Thread David G. Johnston
On Thu, Dec 1, 2022 at 2:37 PM David Rowley  wrote:

>
> The question is, what do you want to make work?  If you're not worried
> about supporting DISTINCT when there is an ORDER BY clause and the
> frame options are effectively ROWS BETWEEN UNBOUNDED PRECEDING AND
> UNBOUNDED FOLLOWING, then it's going to be much easier to make work.
> You never need to worry about rows dropping out of visibility in the
> frame. Simply all rows in the partition are in the frame.
>

I would definitely want the ability to have the output ordered and distinct
at the same time.

array_agg(distinct col) over (order by whatever)

Conceptually this seems like it can be trivially accomplished with a simple
lookup table, the key being the distinct column(s) and the value being a
counter - with the entry being removed when the counter goes to zero
(decreases happening each time a row goes out of scope).  The main concern,
I suspect, isn't implementation ability, it is speed and memory consumption.

I would expect the distinct output to be identical to the non-distinct
output except for duplicates removed.  Using array_agg as an example makes
seeing the distinction quite easy.

Thinking over the above a bit more, is something like this possible?

array_agg(distinct col order by col) over (order by whatever)

i.e., can we add order by within the aggregate to control its internal
ordering separately from the ordering needed for the window framing?

David J.


Re: Error-safe user functions

2022-12-01 Thread Tom Lane
Robert Haas  writes:
> It sounds to me like we're crafting something that is specific to and
> can only be used with type input and output functions, so the name
> probably should reflect that rather than being something totally
> generic like ereturn() or error_stash() or whatever.

My opinion is exactly the opposite.  Don't we already have a need
for error-safe type conversions, too, in the JSON stuff?  Even if
I couldn't point to a need-it-now requirement, I think we will
eventually find a use for this with some other classes of functions.

> If we were making
> this into a general-purpose way of sticking an error someplace, then a
> name like that would make sense and this would be an extension of the
> elog.c interface. But what you're proposing is a great deal more
> specialized than that.

I'm proposing *exactly* an extension of the elog.c interface;
so were you, a couple messages back.  It's only specialized to I/O
in the sense that our current need is for that.

regards, tom lane




Re: Asynchronous execution support for Custom Scan

2022-12-01 Thread Kohei KaiGai
> > IIUC, we already can use ctid in the where clause on the latest
> > PostgreSQL, can't we?
>
> Oh, sorry, I missed the TidRangeScan. My apologies for the noise.
>
I made the ctidscan extension when we developed CustomScan API
towards v9.5 or v9.6, IIRC. It would make sense just an example of
CustomScan API (e.g, PG-Strom code is too large and complicated
to learn about this API), however, makes no sense from the standpoint
of the functionality.

Best regards,

2022年12月1日(木) 22:27 Ronan Dunklau :
>
> > IIUC, we already can use ctid in the where clause on the latest
> > PostgreSQL, can't we?
>
> Oh, sorry, I missed the TidRangeScan. My apologies for the noise.
>
> Best regards,
>
> --
> Ronan Dunklau
>
>
>
>


-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Using AF_UNIX sockets always for tests on Windows

2022-12-01 Thread Thomas Munro
Hi,

Commit f5580882 established that all supported computers have AF_UNIX.
One of the follow-up consequences that was left unfinished is that we
could simplify our test harness code to make it the same on all
platforms.  Currently we have hundreds of lines of C and perl to use
secure TCP connections instead for the benefit of defunct Windows
versions.  Here's a patch set for that.  These patches and some
discussion of them were buried in the recent
clean-up-after-recently-dropped-obsolete-systems thread[1], and I
didn't want to lose track of them.  I think they would need review and
testing from a Windows-based hacker to make progress.  The patches
are:

1.  Teach mkdtemp() to make a non-world-accessible directory.  This is
required to be able to make a socket that other processes can't
connect to, to match the paranoia level used on Unix.  This was
written just by reading documentation, because I am not a Windows
user, so I would be grateful for a second opinion and/or testing from
a Windows hacker, which would involve testing with two different
users.  The idea is that Windows' mkdir() is completely ignoring the
permissions (we can see in the mingw headers that it literally throws
away the mode argument), so we shouldn't use that, but native
CreateDirectory() when given a pointer to a SECURITY_ATTRIBUTES with
lpSecurityDesciptor set to NULL should only allow the current user to
access the object (directory).  Does this really work, and would it be
better to create some more explicit private-keep-out
SECURITY_ATTRIBUTE, and how would that look?

I'm fairly sure that filesystem permissions must be enough to stop
another OS user from connecting, because it's clear from documentation
that AF_UNIX works on Windows by opening the file and reading some
magic "reparse" data from inside it, so if you can't see into the
containing directory, you can't do it.  This patch is the one the rest
are standing on, because the tests should match Unix in their level of
security.

2.  Always use AF_UNIX for pg_regress.  Remove a bunch of
no-longer-needed sspi auth stuff.  Remove comments that worried about
signal handler safety (referring here to real Windows signals, not
fake PostgreSQL signals that are a backend-only concept).  By my
reading of the Windows documentation and our code, there is no real
concern here, so the remove_temp() stuff should be fine, as I have
explained in a new comment.  But I have not tested this signal safety
claim, not being a Windows user.  I added an assertion that should
hold if I am right.  If you run this on Windows and interrupt
pg_regress with ^C, does it hold?

3.  Use AF_UNIX for TAP tests too.

4.  In passing, remove our documentation's claim that Linux's
"abstract" AF_UNIX namespace is available on Windows.  It does not
work at all, according to all reports (IMHO it seems like an
inherently insecure interface that other OSes would be unlikely to
adopt).

Note that this thread is not about libpq, which differs from Unix by
defaulting to host=localhost rather than AF_UNIX IIRC.  That's a
user-facing policy decision I'm not touching; this thread is just
about cleaning up old test infrastructure of interest to hackers.

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ3LHeP9w5Fgzdr4G8AnEtJ%3Dz%3Dp6hGDEm4qYGEUX5B6fQ%40mail.gmail.com
From 62b1cdbdc848f96eef02ed97f14be9c1f4557539 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 7 Sep 2022 07:35:11 +1200
Subject: [PATCH 1/4] WIP: Make mkdtemp() more secure on Windows.

Our POSIX mkdtemp() implementation in src/port/mkdtemp.c code would
create directories with default permissions on Windows.  Fix, using
native Windows API instead of mkdir().
---
 src/port/mkdtemp.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/port/mkdtemp.c b/src/port/mkdtemp.c
index 8809957dcd..8116317435 100644
--- a/src/port/mkdtemp.c
+++ b/src/port/mkdtemp.c
@@ -187,8 +187,20 @@ GETTEMP(char *path, int *doopen, int domkdir)
 		}
 		else if (domkdir)
 		{
+#ifdef WIN32
+			SECURITY_ATTRIBUTES sa = {
+.nLength = sizeof(SECURITY_ATTRIBUTES),
+.lpSecurityDescriptor = NULL,
+.bInheritHandle = false
+			};
+
+			if (CreateDirectory(path, &sa))
+return 1;
+			_dosmaperr(GetLastError());
+#else
 			if (mkdir(path, 0700) >= 0)
 return 1;
+#endif
 			if (errno != EEXIST)
 return 0;
 		}
-- 
2.38.1

From 388719a6fbb45896ff87794ed4bfdbc0f0aac329 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 19 Aug 2022 11:28:38 +1200
Subject: [PATCH 2/4] WIP: Always use Unix-domain sockets in pg_regress on
 Windows.

Since we can now rely on Unix-domain sockets working on supported
Windows versions (10+), we can remove a source of instability and a
difference between Unix and Windows in pg_regress.

Previously, we thought the socket cleanup code was unsafe, so we made
Unix-domain sockets an option with a "use-at-your-own-risk" note.  On
closer inspection, the concerns about signal handlers don't seem to
apply here.  (initdb.c h

Re: HOT chain validation in verify_heapam()

2022-12-01 Thread Andres Freund
Hi,

On 2022-11-30 16:09:19 +0530, Himanshu Upadhyaya wrote:
> has been updated to handle cases of sub-transaction.

Thanks!


> + /* Loop over offsets and validate the data in the predecessor 
> array. */
> + for (OffsetNumber currentoffnum = FirstOffsetNumber; 
> currentoffnum <= maxoff;
> +  currentoffnum = OffsetNumberNext(currentoffnum))
> + {
> + HeapTupleHeader pred_htup;
> + HeapTupleHeader curr_htup;
> + TransactionId pred_xmin;
> + TransactionId curr_xmin;
> + ItemId  pred_lp;
> + ItemId  curr_lp;
> + boolpred_in_progress;
> + XidCommitStatus xid_commit_status;
> + XidBoundsViolation xid_status;
> +
> + ctx.offnum = predecessor[currentoffnum];
> + ctx.attnum = -1;
> + curr_lp = PageGetItemId(ctx.page, currentoffnum);
> + if (!lp_valid[currentoffnum] || 
> ItemIdIsRedirected(curr_lp))
> + continue;

I don't think we should do PageGetItemId(ctx.page, currentoffnum); if 
!lp_valid[currentoffnum].


> + curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, 
> curr_lp);
> + curr_xmin = HeapTupleHeaderGetXmin(curr_htup);
> + xid_status = get_xid_status(curr_xmin, &ctx, 
> &xid_commit_status);
> + if (!(xid_status == XID_BOUNDS_OK || xid_status == 
> XID_INVALID))
> + continue;

Why can we even get here if the xid status isn't XID_BOUNDS_OK?


> + if (ctx.offnum == 0)

For one, I think it'd be better to use InvalidOffsetNumber here. But more
generally, storing the predecessor in ctx.offnum seems quite confusing.


> + {
> + /*
> +  * No harm in overriding value of ctx.offnum as 
> we will always
> +  * continue if we are here.
> +  */
> + ctx.offnum = currentoffnum;
> + if (TransactionIdIsInProgress(curr_xmin) || 
> TransactionIdDidCommit(curr_xmin))

Is it actually ok to call TransactionIdDidCommit() here? There's a reason
get_xid_status() exists after all. And we do have the xid status for xmin
already, so this could just check xid_commit_status, no?



Greetings,

Andres Freund




Re: wake up logical workers after ALTER SUBSCRIPTION

2022-12-01 Thread Nathan Bossart
On Tue, Nov 29, 2022 at 09:04:41PM -0800, Nathan Bossart wrote:
> I don't mind fixing it!  There are a couple more I'd like to track down
> before posting another revision.

Okay, here is a new version of the patch.  This seems to clear up
everything that I could find via the tests.

Thanks to this effort, I discovered that we need to include
wal_retrieve_retry_interval in our wait time calculations after failed
tablesyncs (for the suppress-unnecessary-wakeups patch).  I'll make that
change and post that patch in a new thread.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 79db8837e5b3054fb6007326d230aef50f3cda87 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 21 Nov 2022 16:01:01 -0800
Subject: [PATCH v4 1/1] wake up logical workers after ALTER SUBSCRIPTION and
 when two_phase mode can be enabled

---
 src/backend/access/transam/xact.c|  3 ++
 src/backend/catalog/pg_subscription.c|  7 
 src/backend/commands/alter.c |  7 
 src/backend/commands/subscriptioncmds.c  |  4 +++
 src/backend/replication/logical/worker.c | 46 
 src/include/replication/logicalworker.h  |  3 ++
 6 files changed, 70 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8086b857b9..dc00e66cfb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -47,6 +47,7 @@
 #include "pgstat.h"
 #include "replication/logical.h"
 #include "replication/logicallauncher.h"
+#include "replication/logicalworker.h"
 #include "replication/origin.h"
 #include "replication/snapbuild.h"
 #include "replication/syncrep.h"
@@ -2360,6 +2361,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_LogicalRepWorkers(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2860,6 +2862,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_LogicalRepWorkers(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index a506fc3ec8..d9f21b7dcf 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "replication/logicalworker.h"
 #include "storage/lmgr.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
@@ -320,6 +321,12 @@ UpdateSubscriptionRelState(Oid subid, Oid relid, char state,
 	/* Update the catalog. */
 	CatalogTupleUpdate(rel, &tup->t_self, tup);
 
+	/*
+	 * We might be ready to enable two_phase mode, which requires the workers
+	 * to restart.  Wake them up at commit time to check the status.
+	 */
+	LogicalRepWorkersWakeupAtCommit(subid);
+
 	/* Cleanup. */
 	table_close(rel, NoLock);
 }
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 10b6fe19a2..da14e91552 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -59,6 +59,7 @@
 #include "commands/user.h"
 #include "miscadmin.h"
 #include "parser/parse_func.h"
+#include "replication/logicalworker.h"
 #include "rewrite/rewriteDefine.h"
 #include "tcop/utility.h"
 #include "utils/builtins.h"
@@ -410,6 +411,12 @@ ExecRenameStmt(RenameStmt *stmt)
 		   stmt->newname);
 table_close(catalog, RowExclusiveLock);
 
+/*
+ * Wake up the logical replication workers to handle this
+ * change quickly.
+ */
+LogicalRepWorkersWakeupAtCommit(address.objectId);
+
 return address;
 			}
 
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index d673557ea4..e29cdcbff9 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
 #include "nodes/makefuncs.h"
 #include "pgstat.h"
 #include "replication/logicallauncher.h"
+#include "replication/logicalworker.h"
 #include "replication/origin.h"
 #include "replication/slot.h"
 #include "replication/walreceiver.h"
@@ -1358,6 +1359,9 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 
 	table_close(rel, RowExclusiveLock);
 
+	/* Wake up the logical replication workers to handle this change quickly. */
+	LogicalRepWorkersWakeupAtCommit(subid);
+
 	ObjectAddressSet(myself, SubscriptionRelationId, subid);
 
 	InvokeObjectPostAlterHook(SubscriptionRelationId, subid, 0);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e48a3f589a..c39ca5a3cb 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -253,6 +253,8 @@ WalReceiverConn *LogRepWorkerWalRcvConn = NULL;
 Subscription *MySubscription = NULL;
 static bool MySubscriptionValid = false;
 
+static List *on_commit_w

Re: Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish()

2022-12-01 Thread Andres Freund
On 2022-11-25 16:54:19 +0530, Bharath Rupireddy wrote:
> On Fri, Nov 25, 2022 at 12:16 AM Andres Freund  wrote:
> > I think we could improve this code more significantly by avoiding the call 
> > to
> > LWLockWaitForVar() for all locks that aren't acquired or don't have a
> > conflicting insertingAt, that'd require just a bit more work to handle 
> > systems
> > without tear-free 64bit writes/reads.
> >
> > The easiest way would probably be to just make insertingAt a 64bit atomic,
> > that transparently does the required work to make even non-atomic 
> > read/writes
> > tear free. Then we could trivially avoid the spinlock in
> > LWLockConflictsWithVar(), LWLockReleaseClearVar() and with just a bit more
> > work add a fastpath to LWLockUpdateVar(). We don't need to acquire the wait
> > list lock if there aren't any waiters.
> >
> > I'd bet that start to have visible effects in a workload with many small
> > records.
> 
> Thanks Andres! I quickly came up with the attached patch. I also ran
> an insert test [1], below are the results. I also attached the results
> graph. The cirrus-ci is happy with the patch -
> https://github.com/BRupireddy/postgres/tree/wal_insertion_lock_improvements_v1_2.
> Please let me know if the direction of the patch seems right.
> clientsHEADPATCHED
> 113541499
> 214511464
> 430693073
> 857125797
> 161133111157
> 322202022074
> 644174242213
> 1287130076638
> 256103652118944
> 512111250161582
> 76899544161987
> 102496743164161
> 204872711156686
> 409654158135713

Nice.


> From 293e789f9c1a63748147acd613c556961f1dc5c4 Mon Sep 17 00:00:00 2001
> From: Bharath Rupireddy 
> Date: Fri, 25 Nov 2022 10:53:56 +
> Subject: [PATCH v1] WAL Insertion Lock Improvements
> 
> ---
>  src/backend/access/transam/xlog.c |  8 +++--
>  src/backend/storage/lmgr/lwlock.c | 56 +--
>  src/include/storage/lwlock.h  |  7 ++--
>  3 files changed, 41 insertions(+), 30 deletions(-)
> 
> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index a31fbbff78..b3f758abb3 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -376,7 +376,7 @@ typedef struct XLogwrtResult
>  typedef struct
>  {
>   LWLock  lock;
> - XLogRecPtr  insertingAt;
> + pg_atomic_uint64insertingAt;
>   XLogRecPtr  lastImportantAt;
>  } WALInsertLock;
>  
> @@ -1482,6 +1482,10 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto)
>   {
>   XLogRecPtr  insertingat = InvalidXLogRecPtr;
>  
> + /* Quickly check and continue if no one holds the lock. */
> + if (!IsLWLockHeld(&WALInsertLocks[i].l.lock))
> + continue;

I'm not sure this is quite right - don't we need a memory barrier. But I don't
see a reason to not just leave this code as-is. I think this should be
optimized entirely in lwlock.c


I'd probably split the change to an atomic from other changes either way.

Greetings,

Andres Freund




Re: pg_upgrade: Make testing different transfer modes easier

2022-12-01 Thread Kyotaro Horiguchi
At Thu, 1 Dec 2022 16:18:21 +0100, Peter Eisentraut 
 wrote in 
> I wanted to test the different pg_upgrade transfer modes (--link,
> --clone), but that was not that easy, because there is more than one
> place in the test script you have to find and manually change.  So I
> wrote a little patch to make that easier.  It's still manual, but it's
> a start.  (In principle, we could automatically run the tests with
> each supported mode in a loop, but that would be very slow.)
> 
> While doing that, I also found it strange that the default transfer
> mode (referred to as "copy" internally) did not have any external
> representation, so it is awkward to refer to it in text, and obscure
> to see where it is used for example in those test scripts.  So I added
> an option --copy, which effectively does nothing, but it's not
> uncommon to have options that select default behaviors explicitly.  (I

I don't have a clear idea of wheter it is common or not, but I suppose many 
such commands allow to choose the default behavior by a configuration file or 
an environment variable, etc. But I don't mind the command had the effetively 
nop option only for completeness.

> also thought about something like a "mode" option with an argument,
> but given that we already have --link and --clone, this seemed the
> most sensible.)
> 
> Thoughts?

When I read up to the point of the --copy option, what came to my mind
was the --mode= option.  IMHO, if I was going to add an option
to choose the copy behavior, I would add --mode option instead, like
pg_ctl does, as it implicitly signals that the suboptions are mutually
exclusive.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Using AF_UNIX sockets always for tests on Windows

2022-12-01 Thread Tom Lane
Thomas Munro  writes:
> Commit f5580882 established that all supported computers have AF_UNIX.
> One of the follow-up consequences that was left unfinished is that we
> could simplify our test harness code to make it the same on all
> platforms.  Currently we have hundreds of lines of C and perl to use
> secure TCP connections instead for the benefit of defunct Windows
> versions.  Here's a patch set for that.

If we remove that, won't we have a whole lot of code that's not
tested at all on any platform, ie all the TCP-socket code?

regards, tom lane




Re: Using WaitEventSet in the postmaster

2022-12-01 Thread Andres Freund
Hi,

On 2022-12-02 10:12:25 +1300, Thomas Munro wrote:
> Here's a work-in-progress patch that uses WaitEventSet for the main
> event loop in the postmaster

Wee!


> with a latch as the wakeup mechanism for "PM signals" (requests from
> backends to do things like start a background worker, etc).

Hm - is that directly related? ISTM that using a WES in the main loop, and
changing pmsignal.c to a latch are somewhat separate things?

Using a latch for pmsignal.c seems like a larger lift, because it means that
all of latch.c needs to be robust against a corrupted struct Latch.


> In order to avoid adding a new dependency on the contents of shared
> memory, I introduced SetLatchRobustly() that will always use the slow
> path kernel wakeup primitive, even in cases where SetLatch() would
> not.  The idea here is that if one backend trashes shared memory,
> others backends can still wake the postmaster even though it may
> appear that the postmaster isn't waiting or the latch is already set.

Why is that a concern that needs to be addressed?


ISTM that the important thing is that either a) the postmaster's latch can't
be corrupted, because it's not shared with backends or b) struct Latch can be
overwritten with random contents without causing additional problems in
postmaster.

I don't think b) is the case as the patch stands. Imagine some process
overwriting pm_latch->owner_pid. That'd then break the SetLatch() in
postmaster's signal handler, because it wouldn't realize that itself needs to
be woken up, and we'd just signal some random process.


It doesn't seem trivial (but not impossible either) to make SetLatch() robust
against arbitrary corruption. So it seems easier to me to just put the latch
in process local memory, and do a SetLatch() in postmaster's SIGUSR1 handler.

Greetings,

Andres Freund




Re: Using AF_UNIX sockets always for tests on Windows

2022-12-01 Thread Andres Freund
Hi,

On 2022-12-01 20:30:36 -0500, Tom Lane wrote:
> Thomas Munro  writes:
> > Commit f5580882 established that all supported computers have AF_UNIX.
> > One of the follow-up consequences that was left unfinished is that we
> > could simplify our test harness code to make it the same on all
> > platforms.  Currently we have hundreds of lines of C and perl to use
> > secure TCP connections instead for the benefit of defunct Windows
> > versions.  Here's a patch set for that.
> 
> If we remove that, won't we have a whole lot of code that's not
> tested at all on any platform, ie all the TCP-socket code?

There's some coverage via the auth and ssl tests. But I agree it's an
issue. But to me the fix for that seems to be to add a dedicated test for
that, rather than relying on windows to test our socket code - that's quite a
few separate code paths from the tcp support of other platforms.

Greetings,

Andres Freund




Re: Prefetch the next tuple's memory during seqscans

2022-12-01 Thread David Rowley
On Thu, 1 Dec 2022 at 18:18, John Naylor  wrote:
> I then tested a Power8 machine (also kernel 3.10 gcc 4.8). Configure reports 
> "checking for __builtin_prefetch... yes", but I don't think it does anything 
> here, as the results are within noise level. A quick search didn't turn up 
> anything informative on this platform, and I'm not motivated to dig deeper. 
> In any case, it doesn't make things worse.

Thanks for testing the power8 hardware.

Andres just let me test on some Apple M1 hardware (those cores are
insanely fast!)

Using the table and running the script from [1], with trimmed-down
output, I see:

Master @ edf12e7bbd

Testing a -> 158.037 ms
Testing a2 -> 164.442 ms
Testing a3 -> 171.523 ms
Testing a4 -> 189.892 ms
Testing a5 -> 217.197 ms
Testing a6 -> 186.790 ms
Testing a7 -> 189.491 ms
Testing a8 -> 195.384 ms
Testing a9 -> 200.547 ms
Testing a10 -> 206.149 ms
Testing a11 -> 211.708 ms
Testing a12 -> 217.976 ms
Testing a13 -> 224.565 ms
Testing a14 -> 230.642 ms
Testing a15 -> 237.372 ms
Testing a16 -> 244.110 ms

(checking for __builtin_prefetch... yes)

Master + v2-0001 + v2-0005

Testing a -> 157.477 ms
Testing a2 -> 163.720 ms
Testing a3 -> 171.159 ms
Testing a4 -> 186.837 ms
Testing a5 -> 205.220 ms
Testing a6 -> 184.585 ms
Testing a7 -> 189.879 ms
Testing a8 -> 195.650 ms
Testing a9 -> 201.220 ms
Testing a10 -> 207.162 ms
Testing a11 -> 213.255 ms
Testing a12 -> 219.313 ms
Testing a13 -> 225.763 ms
Testing a14 -> 237.337 ms
Testing a15 -> 239.440 ms
Testing a16 -> 245.740 ms

It does not seem like there's any improvement on this architecture.
There is a very small increase from "a" to "a6", but a very small
decrease in performance from "a7" to "a16". It's likely within the
expected noise level.

David

[1] 
https://postgr.es/m/caaphdvqwexy_6jgmb39vr3oqxz_w6stafkq52hodvwaw-19...@mail.gmail.com




Re: Using AF_UNIX sockets always for tests on Windows

2022-12-01 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-01 20:30:36 -0500, Tom Lane wrote:
>> If we remove that, won't we have a whole lot of code that's not
>> tested at all on any platform, ie all the TCP-socket code?

> There's some coverage via the auth and ssl tests. But I agree it's an
> issue. But to me the fix for that seems to be to add a dedicated test for
> that, rather than relying on windows to test our socket code - that's quite a
> few separate code paths from the tcp support of other platforms.

IMO that's not the best way forward, because you'll always have
nagging questions about whether a single-purpose test covers
everything that needs coverage.  I think the best place to be in
would be to be able to run the whole test suite using either TCP or
UNIX sockets, on any platform (with stuff like the SSL test
overriding the choice as needed).  I doubt ripping out the existing
Windows-only support for TCP-based testing is a good step in that
direction.

regards, tom lane




Re: Add LZ4 compression in pg_dump

2022-12-01 Thread Michael Paquier
On Thu, Dec 01, 2022 at 02:58:35PM +, gkokola...@pm.me wrote:
> Nuking the warning from orbit and changing the behaviour around disabling
> the requested compression when the libraries are not present, should not
> mean that we need to change the behaviour of default values for different
> formats. Please find v13 attached which reinstates it.

Gah, thanks!  And this default behavior is documented as dependent on
the compilation as well.

> Which in itself it got me looking and wondering why the tests succeeded.
> The only existing test covering that path is `defaults_dir_format` in
> `002_pg_dump.pl`. However as the test is currently written it does not
> check whether the output was compressed. The restore command would succeed
> in either case. A simple `gzip -t -r` against the directory will not
> suffice to test it, because there exist files which are never compressed
> in this format (.toc). A little bit more involved test case would need
> to be written, yet before I embark to this journey, I would like to know
> if you would agree to reinstate the defaults for those formats.

On top of my mind, I briefly recall that -r is not that portable.  And
the toc format makes the files generated non-deterministic as these
use OIDs..

[.. thinks ..]

We are going to need a new thing here, as compress_cmd cannot be
directly used.  What if we used only an array of glob()-able elements?
Let's say "expected_contents" that could include a "dir_path/*.gz"
conditional on $supports_gzip?  glob() can only be calculated when the
test is run as the file names cannot be known beforehand :/

>> As per the patch, it is true that we do not need to bump the format of
>> the dump archives, as we can still store only the compression level
>> and guess the method from it. I have added some notes about that in
>> ReadHead and WriteHead to not forget.
> 
> Agreed. A minor suggestion if you may.
> 
>  #ifndef HAVE_LIBZ
> -   if (AH->compression != 0)
> +   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> pg_log_warning("archive is compressed, but this installation 
> does not support compression -- no data will be available");
>  #endif
> 
> It would seem a more consistent to error out in this case. We do error
> in all other cases where the compression is not available.

Makes sense.

I have gone through the patch again, and applied it.  Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Using AF_UNIX sockets always for tests on Windows

2022-12-01 Thread Andres Freund
Hi,

On 2022-12-01 20:56:18 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-12-01 20:30:36 -0500, Tom Lane wrote:
> >> If we remove that, won't we have a whole lot of code that's not
> >> tested at all on any platform, ie all the TCP-socket code?
> 
> > There's some coverage via the auth and ssl tests. But I agree it's an
> > issue. But to me the fix for that seems to be to add a dedicated test for
> > that, rather than relying on windows to test our socket code - that's quite 
> > a
> > few separate code paths from the tcp support of other platforms.
> 
> IMO that's not the best way forward, because you'll always have
> nagging questions about whether a single-purpose test covers
> everything that needs coverage.

Still seems better than not having any coverage in our development
environments...


> I think the best place to be in would be to be able to run the whole test
> suite using either TCP or UNIX sockets, on any platform (with stuff like the
> SSL test overriding the choice as needed).

I agree that that's useful. But it seems somewhat independent from the
majority of the proposed changes. To be able to test force-tcp-everywhere we
don't need e.g.  code for setting sspi auth in pg_regress etc - it's afaik
just needed so there's a secure way of running tests at all on windows.

I think 0003 should be "trimmed" to only change the default for
$use_unix_sockets on windows and to remove PG_TEST_USE_UNIX_SOCKETS. Whoever
wants to, can then add a new environment variable to force tap tests to use
tcp.

Greetings,

Andres Freund




Re: Using WaitEventSet in the postmaster

2022-12-01 Thread Thomas Munro
On Fri, Dec 2, 2022 at 2:40 PM Andres Freund  wrote:
> On 2022-12-02 10:12:25 +1300, Thomas Munro wrote:
> > with a latch as the wakeup mechanism for "PM signals" (requests from
> > backends to do things like start a background worker, etc).
>
> Hm - is that directly related? ISTM that using a WES in the main loop, and
> changing pmsignal.c to a latch are somewhat separate things?

Yeah, that's a good question.  This comes from a larger patch set
where my *goal* was to use latches everywhere possible for
interprocess wakeups, but it does indeed make a lot of sense to do the
postmaster WaitEventSet retrofit completely independently of that, and
leaving the associated robustness problems for later proposals (the
posted patch clearly fails to solve them).

> I don't think b) is the case as the patch stands. Imagine some process
> overwriting pm_latch->owner_pid. That'd then break the SetLatch() in
> postmaster's signal handler, because it wouldn't realize that itself needs to
> be woken up, and we'd just signal some random process.

Right.  At some point I had an idea about a non-shared table of
latches where OS-specific things like pids and HANDLEs live, so only
the maybe_waiting and is_set flags are in shared memory, and even
those are ignored when accessing the latch in 'robust' mode (they're
only optimisations after all).  I didn't try it though.  First you
might have to switch to a model with a finite set of latches
identified by index, or something like that.  But I like your idea of
separating that whole problem.

> It doesn't seem trivial (but not impossible either) to make SetLatch() robust
> against arbitrary corruption. So it seems easier to me to just put the latch
> in process local memory, and do a SetLatch() in postmaster's SIGUSR1 handler.

Alright, good idea, I'll do a v2 like that.




Missing MaterialPath support in reparameterize_path_by_child

2022-12-01 Thread Tom Lane
Whilst fooling with my outer-join-aware-Vars patch, I tripped
across a multi-way join query that failed with
  ERROR:  could not devise a query plan for the given query
when enable_partitionwise_join is on.

I traced that to the fact that reparameterize_path_by_child()
omits support for MaterialPath, so that if the only surviving
path(s) for a child join include materialization steps, we'll
fail outright to produce a plan for the parent join.

Unfortunately, I don't have an example that produces such a
failure against HEAD.  It seems certain to me that such cases
exist, though, so I'd like to apply and back-patch the attached.

I'm suspicious now that reparameterize_path() should be
extended likewise, but I don't really have any hard
evidence for that.

regards, tom lane

commit f74ca5d8611af3306eb96719d5d1910c9b6a8db5
Author: Tom Lane 
Date:   Thu Dec 1 21:04:49 2022 -0500

Add missing MaterialPath case in reparameterize_path_by_child.

Surprised we hadn't noticed this already.

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c77399ca92..224ba84107 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -4209,6 +4209,16 @@ do { \
 			}
 			break;
 
+		case T_MaterialPath:
+			{
+MaterialPath *mpath;
+
+FLAT_COPY_PATH(mpath, path, MaterialPath);
+REPARAMETERIZE_CHILD_PATH(mpath->subpath);
+new_path = (Path *) mpath;
+			}
+			break;
+
 		case T_MemoizePath:
 			{
 MemoizePath *mpath;


Re: Warning When Creating FOR EACH STATEMENT Trigger On Logical Replication Subscriber Side

2022-12-01 Thread Amit Kapila
On Thu, Dec 1, 2022 at 7:29 PM Avi Weinberg  wrote:
>
>
> There is no error or warning when creating FOR EACH STATEMENT trigger on  
> Logical Replication subscriber side, but it is not doing anything.  Shouldn't 
> a warning be helpful?
>
>
>
> CREATE TRIGGER set_updated_time_trig
>
>AFTER INSERT OR UPDATE OR DELETE ON test
>
> FOR EACH STATEMENT EXECUTE FUNCTION set_updated_time();
>

I think we need to consider a few things for this. First, how will we
decide whether a particular node is a subscriber-side? One can create
a subscription after creating the trigger. Also, it won't be
straightforward to know whether the particular table is involved in
the replication. Second, the statement triggers do get fired during
initial sync, see docs [1] (The logical replication apply process
currently only fires row triggers, not statement triggers. The initial
table synchronization, however, is implemented like a COPY command and
thus fires both row and statement triggers for INSERT.). So,
considering these points, I don't know if it is worth showing a
WARNING here.

[1] - 
https://www.postgresql.org/docs/devel/logical-replication-architecture.html

-- 
With Regards,
Amit Kapila.




Re: Failed Assert while pgstat_unlink_relation

2022-12-01 Thread Andres Freund
Hi,

On 2022-11-29 15:42:45 +0900, Kyotaro Horiguchi wrote:
> At Mon, 28 Nov 2022 14:01:30 +0530, vignesh C  wrote in 
> > Hi,
> > 
> > While reviewing/testing one of the patches I found the following Assert:
> > #0  0x55c6312ba639 in pgstat_unlink_relation (rel=0x7fb73bcbac58)
> > at pgstat_relation.c:161
> > #1  0x55c6312bbb5a in pgstat_relation_delete_pending_cb
> > (entry_ref=0x55c6335563a8) at pgstat_relation.c:839
> > #2  0x55c6312b72bc in pgstat_delete_pending_entry
> > (entry_ref=0x55c6335563a8) at pgstat.c:1124
> > #3  0x55c6312be3f1 in pgstat_release_entry_ref (key=...,
> > entry_ref=0x55c6335563a8, discard_pending=true) at pgstat_shmem.c:523
> > #4  0x55c6312bee9a in pgstat_drop_entry
> > (kind=PGSTAT_KIND_RELATION, dboid=5, objoid=40960) at
> > pgstat_shmem.c:867
> > #5  0x55c6312c034a in AtEOXact_PgStat_DroppedStats
> > (xact_state=0x55c6334baac8, isCommit=false) at pgstat_xact.c:97
> > #6  0x55c6312c0240 in AtEOXact_PgStat (isCommit=false,
> > parallel=false) at pgstat_xact.c:55
> > #7  0x55c630df8bee in AbortTransaction () at xact.c:2861
> > #8  0x55c630df94fd in AbortCurrentTransaction () at xact.c:3352
> > 
> > I could reproduce this issue with the following steps:
> > create table t1(c1 int);
> > BEGIN;
> > CREATE TABLE t (c1 int);
> > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD  SELECT * FROM t1;
> > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD  SELECT * FROM t1;
> 
> Good catch!
> 
> AtEOXact_PgStat_DroppedStats() visits a relation that has been dropped
> then wiped (when CLOBBER_FREED_MEMORY) by AtEOXact_RelationCache()
> called just before.  Since the relcache content directly pointed from
> stats module is lost in this case, the stats side cannot defend itself
> from this.
> 
> Perhaps RelationDestroyRelation() need to do pgstat_drop_entry() or
> AtEOXact_PgStat_DroppedStats() should be called before
> AtEOXact_RelationCache(). the latter of which is simpler. I think we
> need to test this case, too.

This doesn't strike me as the right fix. What do you think about my patch at
https://postgr.es/m/20221128210908.hyffmljjylj447nu%40awork3.anarazel.de ,
leaving the quibbles around error handling aside?

Afaict it fixes the issue.

Greetings,

Andres Freund




Re: Failed Assert in pgstat_assoc_relation

2022-12-01 Thread Andres Freund
Hi,

On 2022-11-28 16:33:20 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Something like the attached. Still needs a bit of polish, e.g. adding the 
> > test
> > case from above.
>
> > I'm a bit uncomfortable adding a function call below
> >  * Perform swapping of the relcache entry contents.  Within this
> >  * process the old entry is momentarily invalid, so there 
> > *must* be no
> >  * possibility of CHECK_FOR_INTERRUPTS within this sequence. Do 
> > it in
> >  * all-in-line code for safety.
>
> Ugh.  I don't know what pgstat_unlink_relation does, but assuming
> that it can never throw an error seems like a pretty bad idea,

I don't think it'd be an issue - it just resets the pointer from a pgstat
entry to the relcache entry.

But you're right:

> Can't that part be done outside the critical section?

we can do that. See the attached.


Do we have any cases of relcache entries changing their relkind?

Greetings,

Andres Freund
diff --git i/src/backend/utils/cache/relcache.c w/src/backend/utils/cache/relcache.c
index bd6cd4e47b5..e2521b51ad4 100644
--- i/src/backend/utils/cache/relcache.c
+++ w/src/backend/utils/cache/relcache.c
@@ -2626,6 +2626,7 @@ RelationClearRelation(Relation relation, bool rebuild)
 		bool		keep_rules;
 		bool		keep_policies;
 		bool		keep_partkey;
+		bool		keep_pgstats;
 
 		/* Build temporary entry, but don't link it into hashtable */
 		newrel = RelationBuildDesc(save_relid, false);
@@ -2667,6 +2668,21 @@ RelationClearRelation(Relation relation, bool rebuild)
 		/* partkey is immutable once set up, so we can always keep it */
 		keep_partkey = (relation->rd_partkey != NULL);
 
+		/*
+		 * Keep stats pointers, except when the relkind changes
+		 * (e.g. converting tables into views). Different kinds of relations
+		 * might have different types of stats.
+		 *
+		 * If we don't want to keep the stats, unlink the stats and relcache
+		 * entry (and do so before entering the "critical section"
+		 * below). This is important because otherwise
+		 * PgStat_TableStatus->relation would get out of sync with
+		 * relation->pgstat_info.
+		 */
+		keep_pgstats = relation->rd_rel->relkind == newrel->rd_rel->relkind;
+		if (!keep_pgstats)
+			pgstat_unlink_relation(relation);
+
 		/*
 		 * Perform swapping of the relcache entry contents.  Within this
 		 * process the old entry is momentarily invalid, so there *must* be no
@@ -2720,9 +2736,14 @@ RelationClearRelation(Relation relation, bool rebuild)
 			SWAPFIELD(RowSecurityDesc *, rd_rsdesc);
 		/* toast OID override must be preserved */
 		SWAPFIELD(Oid, rd_toastoid);
+
 		/* pgstat_info / enabled must be preserved */
-		SWAPFIELD(struct PgStat_TableStatus *, pgstat_info);
-		SWAPFIELD(bool, pgstat_enabled);
+		if (keep_pgstats)
+		{
+			SWAPFIELD(struct PgStat_TableStatus *, pgstat_info);
+			SWAPFIELD(bool, pgstat_enabled);
+		}
+
 		/* preserve old partition key if we have one */
 		if (keep_partkey)
 		{
diff --git i/src/test/regress/expected/create_view.out w/src/test/regress/expected/create_view.out
index 17ca29ddbf7..4c108c4c40b 100644
--- i/src/test/regress/expected/create_view.out
+++ w/src/test/regress/expected/create_view.out
@@ -2202,6 +2202,29 @@ select pg_get_viewdef('tt26v', true);
 FROM ( VALUES (1,2,3)) v(x, y, z);
 (1 row)
 
+-- Test that changing the relkind of a relcache entry doesn't cause
+-- trouble. Prior instances of where it did:
+-- caldanm2yxz+zotv7y5zbd5wkt8o0ld3yxikuu3dcycvxf7g...@mail.gmail.com
+-- caldanm3oza-8wbps2jd1g5_gjrr-x3ywrjpek-mf5asrrvz...@mail.gmail.com
+CREATE TABLE tt26(c int);
+BEGIN;
+CREATE TABLE tt27(c int);
+SAVEPOINT q;
+CREATE RULE "_RETURN" AS ON SELECT TO tt27 DO INSTEAD SELECT * FROM tt26;
+SELECT * FROM tt27;
+ c 
+---
+(0 rows)
+
+ROLLBACK TO q;
+CREATE RULE "_RETURN" AS ON SELECT TO tt27 DO INSTEAD SELECT * FROM tt26;
+ROLLBACK;
+BEGIN;
+CREATE TABLE tt28(c int);
+CREATE RULE "_RETURN" AS ON SELECT TO tt28 DO INSTEAD SELECT * FROM tt26;
+CREATE RULE "_RETURN" AS ON SELECT TO tt28 DO INSTEAD SELECT * FROM tt26;
+ERROR:  "tt28" is already a view
+ROLLBACK;
 -- clean up all the random objects we made above
 DROP SCHEMA temp_view_test CASCADE;
 NOTICE:  drop cascades to 27 other objects
@@ -2233,7 +2256,7 @@ drop cascades to view aliased_view_2
 drop cascades to view aliased_view_3
 drop cascades to view aliased_view_4
 DROP SCHEMA testviewschm2 CASCADE;
-NOTICE:  drop cascades to 77 other objects
+NOTICE:  drop cascades to 78 other objects
 DETAIL:  drop cascades to table t1
 drop cascades to view temporal1
 drop cascades to view temporal2
@@ -2311,3 +2334,4 @@ drop cascades to view tt23v
 drop cascades to view tt24v
 drop cascades to view tt25v
 drop cascades to view tt26v
+drop cascades to table tt26
diff --git i/src/test/regress/sql/create_view.sql w/src/test/regress/sql/create_view.sql
index 8838a40f7ab..f305632c6aa 100644
--- i/src/test/regress/sql/create_view.sql
+++ w/src/test/regress/sql/create_view.sql
@@ -813,6 +813

Re: [PATCH] Add native windows on arm64 support

2022-12-01 Thread Michael Paquier
On Thu, Dec 01, 2022 at 05:53:47PM +, Niyas Sait wrote:
> I've attached a new revision of the patch (v4) and includes following
> changes,
> 
> 1. Add support for meson build system
> 2. Extend MSVC scripts to handle ARM64 platform.
> 3. Add arm64 definition of spin_delay function.
> 4. Exclude arm_acle.h import with MSVC compiler.
> 
> V3->V4: Add support for meson build system
> V2->V3: Removed ASLR enablement and rebased on master.
> V1->V2: Rebased on top of master

Thanks for the updated version.  I have been looking at it closely and
it looks like it should be able to do the job (no arm64 machine for
Windows here, sigh).

I have one tiny comment about this part:

-   USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => undef,
+   USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => 1,

Shouldn't we only enable this flag when we are under aarch64?
Similarly, I don't think that it is a good idea to switch on
USE_SSE42_CRC32C_WITH_RUNTIME_CHECK all the time.  We should only set
it when building under x86 and x86_64.

I would also add your link [1] in s_lock.h.

[1]: 
https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics#BarrierRestrictions
--
Michael


signature.asc
Description: PGP signature


Re: Failed Assert in pgstat_assoc_relation

2022-12-01 Thread Tom Lane
Andres Freund  writes:
> Do we have any cases of relcache entries changing their relkind?

Just the table-to-view hack.  I'm not aware that there are any other
cases, and it seems hard to credit that there ever will be any.
I think we could get rid of table-to-view in HEAD, and use your patch
only in v15.

regards, tom lane




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-12-01 Thread Michael Paquier
On Mon, Nov 21, 2022 at 12:21:09PM +0300, Aleksander Alekseev wrote:
> After merging 1489b1ce [1] the patchset needed a rebase. PFA v47.
> 
> [1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1489b1ce

The CF bot is showing some failures here.  You may want to
double-check.
--
Michael


signature.asc
Description: PGP signature


Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2022-12-01 Thread Michael Paquier
On Fri, Nov 04, 2022 at 09:52:39AM +0900, Ian Lawrence Barwick wrote:
> CommitFest 2022-11 is currently underway, so if you are interested
> in moving this patch forward, now would be a good time to update it.

No replies after 4 weeks, so I have marked this entry as returned
with feedback.  I am still wondering what would be the best thing to
do here..
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2022-12-01 Thread Michael Paquier
On Mon, Nov 07, 2022 at 05:04:14PM -0800, Jacob Champion wrote:
> Done. sslrootcert=system now prevents you from explicitly setting a
> weaker sslmode, to try to cement it as a Do What I Mean sort of
> feature. If you need something weird then you can still jump through
> the hoops by setting sslrootcert to a real file, same as today.
> 
> The macOS/OpenSSL 3.0.0 failure is still unfixed.

Err, could you look at that?  I am switching the patch as waiting on
author.
--
Michael


signature.asc
Description: PGP signature


Re: Failed Assert in pgstat_assoc_relation

2022-12-01 Thread Andres Freund
Hi,

On 2022-12-02 00:08:20 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Do we have any cases of relcache entries changing their relkind?
>
> Just the table-to-view hack.  I'm not aware that there are any other
> cases, and it seems hard to credit that there ever will be any.

I can see some halfway credible scenarios. E.g. converting a view to a
matview, or a table into a partition. I kind of wonder if it's worth keeping
the change, just in case we do - it's not that easy to hit...


> I think we could get rid of table-to-view in HEAD, and use your patch
> only in v15.

WFM. I'll push it to 15 tomorrow.

Greetings,

Andres Freund




Re: [PATCH] Add native windows on arm64 support

2022-12-01 Thread John Naylor
On Fri, Dec 2, 2022 at 12:20 AM Niyas Sait  wrote:
>
>
> On 07/11/2022 06:58, Michael Paquier wrote:
> > Seems so.  Hmm, where does _ARM64_BARRIER_SY come from?  Perhaps it
> > would be better to have a comment referring to it from a different
> > place than the forums of arm, like some actual docs?
>
>
> _ARM64_BARRIER_SY is defined in Microsoft Arm64 intrinsic documentation
> -
>
https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#BarrierRestrictions

In particular, at the bottom of that section:

"For the __isb intrinsic, the only restriction that is currently valid is
_ARM64_BARRIER_SY; all other values are reserved by the architecture."

This corresponds to

https://developer.arm.com/documentation/ddi0596/2021-06/Base-Instructions/ISB--Instruction-Synchronization-Barrier-

which says

"SY Full system barrier operation, encoded as CRm = 0b. Can be omitted."

> I couldn't find something more official for the sse2neon library part.

Not quite sure what this is referring to, but it seems we can just point to
the __aarch64__ section in the same file, which uses the same instruction:

spin_delay(void)
{
  __asm__ __volatile__(
  " isb; \n");
}

...and which already explains the choice with a comment.

About v4:

+ * Use _mm_pause (x64) or __isb(arm64) intrinsic instead of rep nop.

Need a space here after __isb.

+  if cc.get_id() == 'msvc'
+cdata.set('USE_ARMV8_CRC32C', false)
+cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+have_optimized_crc = true
+  else

That seems like a heavy-handed way to force it. Could we just use the same
gating in the test program that the patch puts in the code of interest?
Namely:

+#ifndef _MSC_VER
 #include 
+#endif

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Failed Assert in pgstat_assoc_relation

2022-12-01 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-02 00:08:20 -0500, Tom Lane wrote:
>> Just the table-to-view hack.  I'm not aware that there are any other
>> cases, and it seems hard to credit that there ever will be any.

> I can see some halfway credible scenarios. E.g. converting a view to a
> matview, or a table into a partition. I kind of wonder if it's worth keeping
> the change, just in case we do - it's not that easy to hit...

I'd suggest putting in an assertion that the relkind isn't changing,
instead.  When and if somebody makes a credible feature patch that'd
require relaxing that, we can see what to do.

(There's a couple of places in rewriteHandler.c that could
perhaps be simplified, too.)

regards, tom lane




Re: Introduce a new view for checkpointer related stats

2022-12-01 Thread Bharath Rupireddy
On Wed, Nov 30, 2022 at 5:15 PM Bharath Rupireddy
 wrote:
>
> I don't have a strong opinion about changing column names. However, if
> we were to change it, I prefer to use names that
> PgStat_CheckpointerStats has. BTW, that's what
> PgStat_BgWriterStats/pg_stat_bgwriter and
> PgStat_ArchiverStats/pg_stat_archiver uses.

After thinking about this a while, I convinced myself to change the
column names to be a bit more meaningful. I still think having
checkpoints in the column names is needed because it also has other
backend related columns. I'm attaching the v4 patch for further
review.
CREATE VIEW pg_stat_checkpointer AS
SELECT
pg_stat_get_timed_checkpoints() AS timed_checkpoints,
pg_stat_get_requested_checkpoints() AS requested_checkpoints,
pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
pg_stat_get_buf_written_checkpoints() AS buffers_written_checkpoints,
pg_stat_get_buf_written_backend() AS buffers_written_backend,
pg_stat_get_buf_fsync_backend() AS buffers_fsync_backend,
pg_stat_get_checkpointer_stat_reset_time() AS stats_reset;

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 8280223b2b8fde888dd8540328336421bbf6138c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 2 Dec 2022 04:58:46 +
Subject: [PATCH v4] Introduce a new view for checkpointer related stats

pg_stat_bgwriter view currently reports checkpointer stats as well.
It is that way because historically checkpointer was part of
bgwriter until the commits 806a2ae and bf405ba, that went into
PG 9.2, separated them out.

It is time for us to separate checkpointer stats to its own view
called pg_stat_checkpointer.

Bump catalog version.
---
 doc/src/sgml/monitoring.sgml  | 110 +-
 src/backend/catalog/system_views.sql  |  18 +--
 .../utils/activity/pgstat_checkpointer.c  |   1 +
 src/backend/utils/adt/pgstatfuncs.c   |  19 +--
 src/include/catalog/pg_proc.dat   |  22 ++--
 src/include/pgstat.h  |   1 +
 src/test/recovery/t/029_stats_restart.pl  |   6 +-
 src/test/regress/expected/rules.out   |  17 +--
 src/test/regress/expected/stats.out   |  21 +++-
 src/test/regress/sql/stats.sql|  12 +-
 10 files changed, 156 insertions(+), 71 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 11a8ebe5ec..1279b47d27 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -448,6 +448,15 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  
  
 
+ 
+  pg_stat_checkpointerpg_stat_checkpointer
+  One row only, showing statistics about the
+   checkpointer process's activity. See
+   
+   pg_stat_checkpointer for details.
+ 
+ 
+
  
   pg_stat_walpg_stat_wal
   One row only, showing statistics about WAL activity. See
@@ -3633,7 +3642,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
   
The pg_stat_bgwriter view will always have a
-   single row, containing global data for the cluster.
+   single row, containing data about the bgwriter of the cluster.
   
 
   
@@ -3653,97 +3662,138 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
   
-   checkpoints_timed bigint
+   buffers_clean bigint
   
   
-   Number of scheduled checkpoints that have been performed
+   Number of buffers written by the background writer
   
  
 
  
   
-   checkpoints_req bigint
+   maxwritten_clean bigint
   
   
-   Number of requested checkpoints that have been performed
+   Number of times the background writer stopped a cleaning
+   scan because it had written too many buffers
   
  
 
  
   
-   checkpoint_write_time double precision
+   buffers_alloc bigint
   
   
-   Total amount of time that has been spent in the portion of
-   checkpoint processing where files are written to disk, in milliseconds
+   Number of buffers allocated
   
  
 
  
   
-   checkpoint_sync_time double precision
+   stats_reset timestamp with time zone
   
   
-   Total amount of time that has been spent in the portion of
-   checkpoint processing where files are synchronized to disk, in
-   milliseconds
+   Time at which these statistics were last reset
   
  
+
+   
+  
+
+ 
+
+ 
+  pg_stat_checkpointer
+
+  
+   pg_stat_checkpointer
+  
+
+  
+   The pg_stat_checkpointer view will always have a
+   single row, containing data about the checkpointer process of the cluster.
+  
 
+  
+   pg_stat_checkpointer View
+   
+
  
   
-   bu

Re: Failed Assert in pgstat_assoc_relation

2022-12-01 Thread Andres Freund
Hi, 

On December 1, 2022 9:48:48 PM PST, Tom Lane  wrote:
>Andres Freund  writes:
>> On 2022-12-02 00:08:20 -0500, Tom Lane wrote:
>>> Just the table-to-view hack.  I'm not aware that there are any other
>>> cases, and it seems hard to credit that there ever will be any.
>
>> I can see some halfway credible scenarios. E.g. converting a view to a
>> matview, or a table into a partition. I kind of wonder if it's worth keeping
>> the change, just in case we do - it's not that easy to hit...
>
>I'd suggest putting in an assertion that the relkind isn't changing,
>instead.

Sounds like a plan. Will you do that when you remove the table-to-view hack? 

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Failed Assert in pgstat_assoc_relation

2022-12-01 Thread Tom Lane
Andres Freund  writes:
> On December 1, 2022 9:48:48 PM PST, Tom Lane  wrote:
>> I'd suggest putting in an assertion that the relkind isn't changing,
>> instead.

> Sounds like a plan. Will you do that when you remove the table-to-view hack? 

I'd suggest committing it concurrently with the v15 fix, instead,
so that there's a cross-reference to what some future hacker might
need to install if they remove the assertion.

I guess that means that the table-to-view removal has to go in
first.  I should be able to take care of that tomorrow, or if
you're in a hurry I don't mind if you commit it for me.

regards, tom lane




Re: O(n) tasks cause lengthy startups and checkpoints

2022-12-01 Thread Bharath Rupireddy
On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart  wrote:
>
> >> 4. Is it a good idea to add log messages in the DoCustodianTasks()
> >> loop? Maybe at a debug level? The log message can say the current task
> >> the custodian is processing. And/Or setting the custodian's status on
> >> the ps display is also a good idea IMO.
>
> I'd like to pick these up in a new thread if/when this initial patch set is
> committed.  The tasks already do some logging, and the checkpointer process
> doesn't update the ps display for these tasks today.

It'll be good to have some kind of dedicated monitoring for the
custodian process as it can do a "good" amount of work at times and
users will have a way to know what it currently is doing - it can be
logs at debug level, progress reporting via
ereport_startup_progress()-sort of mechanism, ps display,
pg_stat_custodian or a special function that tells some details, or
some other. In any case, I agree to park this for later.

> >> 0002 and 0003:
> >> 1.
> >> +CHECKPOINT;
> >> +DO $$
> >> I think we need to ensure that there are some snapshot files before
> >> the checkpoint. Otherwise, it may happen that the above test case
> >> exits without the custodian process doing anything.
> >>
> >> 2. I think the best way to test the custodian process code is by
> >> adding a TAP test module to see actually the custodian process kicks
> >> in. Perhaps, add elog(DEBUGX,...) messages to various custodian
> >> process functions and see if we see the logs in server logs.
>
> The test appears to reliably create snapshot and mapping files, so if the
> directories are empty at some point after the checkpoint at the end, we can
> be reasonably certain the custodian took action.  I didn't add explicit
> checks that there are files in the directories before the checkpoint
> because a concurrent checkpoint could make such checks unreliable.

I think you're right. I added sqls to see if the snapshot and mapping
files count > 0, see [1] and the cirrus-ci members are happy too -
https://github.com/BRupireddy/postgres/tree/custodian_review_2. I
think we can consider adding these count > 0 checks to tests.

> >> 0004:
> >> I think the 0004 patch can be merged into 0001, 0002 and 0003 patches.
> >> Otherwise the patch LGTM.
>
> I'm keeping this one separate because I've received conflicting feedback
> about the idea.

If we classify custodian as a process doing non-critical tasks that
have nothing to do with regular server functioning, then processing
ShutdownRequestPending looks okay. However, delaying these
non-critical tasks such as file removals which reclaims disk space
might impact the server overall especially when it's reaching 100%
disk usage and we want the custodian to do its job fully before we
shutdown the server.

If we delay processing shutdown requests, that can impact the server
overall (might delay restarts, failovers etc.), because at times there
can be a lot of tasks with a good amount of work pending in the
custodian's task queue.

Having said above, I'm okay to process ShutdownRequestPending as early
as possible, however, should we also add CHECK_FOR_INTERRUPTS()
alongside ShutdownRequestPending?

Also, I think it's enough to just have ShutdownRequestPending check in
DoCustodianTasks(void)'s main loop and we can let
RemoveOldSerializedSnapshots() and RemoveOldLogicalRewriteMappings()
do their jobs to the fullest as they do today.

While thinking about this, one thing that really struck me is what
happens if we let the custodian exit, say after processing
ShutdownRequestPending immediately or after a restart, leaving other
queued tasks? The custodian will never get to work on those tasks
unless the requestors (say checkpoint or some other process) requests
it to do so after restart. Maybe, we don't need to worry about it.
Maybe we need to worry about it. Maybe it's an overkill to save the
custodian's task state to disk so that it can come up and do the
leftover tasks upon restart.

> > Another comment:
> > IIUC, there's no custodian_delay GUC as we want to avoid unnecessary
> > wakeups for power savings (being discussed in the other thread).
> > However, can it happen that the custodian missed to capture SetLatch
> > wakeups by other backends? In other words, can the custodian process
> > be sleeping when there's work to do?
>
> I'm not aware of any way this could happen, but if there is one, I think we
> should treat it as a bug instead of relying on the custodian process to
> periodically wake up and check for work to do.

One possible scenario is that the requestor adds its task details to
the queue and sets the latch, the custodian can miss this SetLatch()
when it's in the midst of processing a task. However, it guarantees
the requester that it'll process the added task after it completes the
current task. And, I don't know the other reasons when the custodian
can miss SetLatch().

[1]
diff --git a/contrib/test_decoding/expected/rewrite.out
b/contrib/test_decoding/expected/rewrite

Re: initdb: Refactor PG_CMD_PUTS loops

2022-12-01 Thread John Naylor
On Thu, Dec 1, 2022 at 5:02 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:
>
> Keeping the SQL commands that initdb runs in string arrays before
> feeding them to PG_CMD_PUTS() seems unnecessarily verbose and
> inflexible.  In some cases, the array only has one member.  In other
> cases, one might want to use PG_CMD_PRINTF() instead, to parametrize a
> command, but that would require breaking up the loop or using
> workarounds like replace_token().  This patch unwinds all that; it's
> much simpler that way.

+1,  I can't think of a reason to keep the current coding

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Time delayed LR (WAS Re: logical replication restrictions)

2022-12-01 Thread Amit Kapila
On Tue, Nov 15, 2022 at 12:33 PM Amit Kapila  wrote:
>
> On Mon, Nov 14, 2022 at 6:52 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Dear Amit,
> >
> > > > > It seems to me Kuroda-San has proposed this change [1] to fix the test
> > > > > but it is not clear to me why such a change is required. Why can't
> > > > > CHECK_FOR_INTERRUPTS() after waiting, followed by the existing below
> > > > > code [2] in LogicalRepApplyLoop() sufficient to handle parameter
> > > > > updates?
> >
> > (I forgot to say, this change was not proposed by me. I said that there 
> > should be
> > modified. I thought workers should wake up after the transaction was 
> > committed.)
> >
> > > So, why only honor the 'disable' option of the subscription? For
> > > example, one can change 'min_apply_delay' and it seems
> > > recoveryApplyDelay() honors a similar change in the recovery
> > > parameter. Is there a way to set the latch of the worker process, so
> > > that it can recheck if anything is changed?
> >
> > I have not considered about it, but seems reasonable. We may be able to
> > do maybe_reread_subscription() if subscription parameters are changed
> > and latch is set.
> >
>
> One more thing I would like you to consider is the point raised by me
> related to this patch's interaction with the parallel apply feature as
> mentioned in the email [1]. I am not sure the idea proposed in that
> email [1] is a good one because delaying after applying commit may not
> be good as we want to delay the apply of the transaction(s) on
> subscribers by this feature. I feel this needs more thought.
>

I have thought a bit more about this and we have the following options
to choose the delay point from. (a) apply delay just before committing
a transaction. As mentioned in comments in the patch this can lead to
bloat and locks held for a long time. (b) apply delay before starting
to apply changes for a transaction but here the problem is which time
to consider. In some cases, like for streaming transactions, we don't
receive the commit/prepare xact time in the start message. (c) use (b)
but use the previous transaction's commit time. (d) apply delay after
committing a transaction by using the xact's commit time.

At this stage, among above, I feel any one of (c) or (d) is worth
considering. Now, the difference between (c) and (d) is that if after
commit the next xact's data is already delayed by more than
min_apply_delay time then we don't need to kick the new logic of apply
delay.

The other thing to consider whether we need to process any keepalive
messages during the delay because otherwise, walsender may think that
the subscriber is not available and time out. This may not be a
problem for synchronous replication but otherwise, it could be a
problem.

Thoughts?

-- 
With Regards,
Amit Kapila.




Re: pg_dump: Remove "blob" terminology

2022-12-01 Thread Peter Eisentraut

On 30.11.22 09:07, Daniel Gustafsson wrote:

The commit message contains a typo: functinos


fixed


   * called for both BLOB and TABLE data; it is the responsibility of
- * the format to manage each kind of data using StartBlob/StartData.
+ * the format to manage each kind of data using StartLO/StartData.

Should BLOB be changed to BLOBS here (and in similar comments) to make it
clearer that it refers to the archive entry and the concept of a binary large
object in general?


I changed this one and went through it again to tidy it up a bit more. 
(There are both "BLOB" and "BLOBS" archive entries, so both forms still 
exist in the code now.)



Theres an additional mention in src/test/modules/test_pg_dump/t/001_base.pl:

 # Tests which are considered 'full' dumps by pg_dump, but there.
 # are flags used to exclude specific items (ACLs, blobs, etc).


fixed

I also put back the old long options forms in the documentation and help 
but marked them deprecated.
From e32d43952a4f2a054f25883c1136177cd1ad1fc3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 2 Dec 2022 07:55:52 +0100
Subject: [PATCH v2] pg_dump: Remove "blob" terminology

For historical reasons, pg_dump refers to large objects as "BLOBs".
This term is not used anywhere else in PostgreSQL, and it also means
something different in the SQL standard and other SQL systems.

This patch renames internal functions, code comments, documentation,
etc. to use the "large object" or "LO" terminology instead.  There is
no functionality change, so the archive format still uses the name
"BLOB" for the archive entry.  Additional long command-line options
are added with the new naming.

Discussion: 
https://www.postgresql.org/message-id/flat/868a381f-4650-9460-1726-1ffd39a270b4%40enterprisedb.com
---
 doc/src/sgml/ref/pg_dump.sgml   |  16 +-
 src/bin/pg_dump/pg_backup.h |  12 +-
 src/bin/pg_dump/pg_backup_archiver.c|  78 +++
 src/bin/pg_dump/pg_backup_archiver.h|  34 +--
 src/bin/pg_dump/pg_backup_custom.c  |  66 +++---
 src/bin/pg_dump/pg_backup_db.c  |   4 +-
 src/bin/pg_dump/pg_backup_directory.c   | 112 +-
 src/bin/pg_dump/pg_backup_null.c|  50 ++---
 src/bin/pg_dump/pg_backup_tar.c |  68 +++---
 src/bin/pg_dump/pg_dump.c   | 216 ++--
 src/bin/pg_dump/pg_dump.h   |   8 +-
 src/bin/pg_dump/pg_dump_sort.c  |  16 +-
 src/bin/pg_dump/t/002_pg_dump.pl|  46 ++---
 src/test/modules/test_pg_dump/t/001_base.pl |   2 +-
 14 files changed, 367 insertions(+), 361 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 363d1327e2..2c938cd7e1 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -132,7 +132,8 @@ Options
 
  
   -b
-  --blobs
+  --large-objects
+  --blobs (deprecated)
   

 Include large objects in the dump.  This is the default behavior
@@ -140,7 +141,7 @@ Options
 --schema-only is specified.  The -b
 switch is therefore only useful to add large objects to dumps
 where a specific schema or table has been requested.  Note that
-blobs are considered data and therefore will be included when
+large objects are considered data and therefore will be included when
 --data-only is used, but not
 when --schema-only is.

@@ -149,7 +150,8 @@ Options
 
  
   -B
-  --no-blobs
+  --no-large-objects
+  --no-blobs (deprecated)
   

 Exclude large objects in the dump.
@@ -323,7 +325,7 @@ Options
   
Output a directory-format archive suitable for input into
pg_restore. This will create a directory
-   with one file for each table and blob being dumped, plus a
+   with one file for each table and large object being dumped, plus a
so-called Table of Contents file describing the dumped objects in a
machine-readable format that pg_restore
can read. A directory format archive can be manipulated with
@@ -434,9 +436,9 @@ Options
 

 
- Non-schema objects such as blobs are not dumped when 
-n is
- specified.  You can add blobs back to the dump with the
- --blobs switch.
+ Non-schema objects such as large objects are not dumped when 
-n is
+ specified.  You can add large objects back to the dump with the
+ --large-objects switch.
 

 
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index bc6b6594af..aba780ef4b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -53,9 +53,9 @@ typedef enum _archiveMode
 
 typedef enum _teSection
 {
-   SECTION_NONE = 1,   /* COMMENTs, ACLs, etc; can be 
anywhere */
+   SECTION_NONE = 1,   /* com

Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-01 Thread Michael Paquier
On Thu, Nov 17, 2022 at 11:53:23AM +0530, Bharath Rupireddy wrote:
> Please do, if you feel so. Thanks for your review.

I don't really mind the addition of the LSN when operating on a given
record where we are reading a location, like in the five error paths
for the header validation or the three ones in ReadRecord()

Now this one looks confusing:
+   XLogSegNoOffsetToRecPtr(openLogSegNo, startoffset,
+   wal_segment_size, lsn);
ereport(PANIC,
(errcode_for_file_access(),
 errmsg("could not write to log file %s "
-   "at offset %u, length %zu: %m",
-   xlogfname, startoffset, nleft)));
+   "at offset %u, LSN %X/%X, length %zu: %m",
+   xlogfname, startoffset,
+   LSN_FORMAT_ARGS(lsn), nleft)));

This does not always refer to an exact LSN of a record as we may be in
the middle of a write, so I would leave it as-is.

Similarly the addition of wre_lsn would be confusing?  The offset
looks kind of enough to me when referring to the middle of a page in
WALReadRaiseError().
--
Michael


signature.asc
Description: PGP signature


Re: Introduce a new view for checkpointer related stats

2022-12-01 Thread sirisha chamarthi
On Thu, Dec 1, 2022 at 9:50 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Wed, Nov 30, 2022 at 5:15 PM Bharath Rupireddy
>  wrote:
> >
> > I don't have a strong opinion about changing column names. However, if
> > we were to change it, I prefer to use names that
> > PgStat_CheckpointerStats has. BTW, that's what
> > PgStat_BgWriterStats/pg_stat_bgwriter and
> > PgStat_ArchiverStats/pg_stat_archiver uses.
>
> After thinking about this a while, I convinced myself to change the
> column names to be a bit more meaningful. I still think having
> checkpoints in the column names is needed because it also has other
> backend related columns. I'm attaching the v4 patch for further
> review.
> CREATE VIEW pg_stat_checkpointer AS
> SELECT
> pg_stat_get_timed_checkpoints() AS timed_checkpoints,
> pg_stat_get_requested_checkpoints() AS requested_checkpoints,
> pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
> pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
> pg_stat_get_buf_written_checkpoints() AS
> buffers_written_checkpoints,
> pg_stat_get_buf_written_backend() AS buffers_written_backend,
> pg_stat_get_buf_fsync_backend() AS buffers_fsync_backend,
> pg_stat_get_checkpointer_stat_reset_time() AS stats_reset;
>

IMO, “buffers_written_checkpoints” is confusing. What do you think?



> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


Re: Missing MaterialPath support in reparameterize_path_by_child

2022-12-01 Thread Richard Guo
On Fri, Dec 2, 2022 at 10:55 AM Tom Lane  wrote:

> I traced that to the fact that reparameterize_path_by_child()
> omits support for MaterialPath, so that if the only surviving
> path(s) for a child join include materialization steps, we'll
> fail outright to produce a plan for the parent join.


Yeah, that's true.  It's weird we neglect MaterialPath here.


> Unfortunately, I don't have an example that produces such a
> failure against HEAD.  It seems certain to me that such cases
> exist, though, so I'd like to apply and back-patch the attached.


I tried on HEAD and got one, which leverages sampled rel to generate the
MaterialPath and lateral reference to make it the only available path.

SET enable_partitionwise_join to true;

CREATE TABLE prt (a int, b int) PARTITION BY RANGE(a);
CREATE TABLE prt_p1 PARTITION OF prt FOR VALUES FROM (0) TO (10);

CREATE EXTENSION tsm_system_time;

explain (costs off)
select * from prt t1 left join lateral (select t1.a as t1a, t2.a as t2a
from prt t2 TABLESAMPLE system_time (10)) ss on ss.t1a = ss.t2a;
ERROR:  could not devise a query plan for the given query

Thanks
Richard


Re: Introduce a new view for checkpointer related stats

2022-12-01 Thread Bharath Rupireddy
On Fri, Dec 2, 2022 at 12:54 PM sirisha chamarthi
 wrote:
>
> On Thu, Dec 1, 2022 at 9:50 PM Bharath Rupireddy 
>  wrote:
>>
>> On Wed, Nov 30, 2022 at 5:15 PM Bharath Rupireddy
>>  wrote:
>> >
>> > I don't have a strong opinion about changing column names. However, if
>> > we were to change it, I prefer to use names that
>> > PgStat_CheckpointerStats has. BTW, that's what
>> > PgStat_BgWriterStats/pg_stat_bgwriter and
>> > PgStat_ArchiverStats/pg_stat_archiver uses.
>>
>> After thinking about this a while, I convinced myself to change the
>> column names to be a bit more meaningful. I still think having
>> checkpoints in the column names is needed because it also has other
>> backend related columns. I'm attaching the v4 patch for further
>> review.
>> CREATE VIEW pg_stat_checkpointer AS
>> SELECT
>> pg_stat_get_timed_checkpoints() AS timed_checkpoints,
>> pg_stat_get_requested_checkpoints() AS requested_checkpoints,
>> pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
>> pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
>> pg_stat_get_buf_written_checkpoints() AS buffers_written_checkpoints,
>> pg_stat_get_buf_written_backend() AS buffers_written_backend,
>> pg_stat_get_buf_fsync_backend() AS buffers_fsync_backend,
>> pg_stat_get_checkpointer_stat_reset_time() AS stats_reset;
>
>
> IMO, “buffers_written_checkpoints” is confusing. What do you think?

Thanks. We can be "more and more" meaningful by naming
buffers_written_by_checkpoints, buffers_written_by_backend,
buffers_fsync_by_backend. However, I don't think that's a good idea
here as names get too long.

Having said that, I'll leave it to the committer's discretion.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




refactor ExecGrant_*() functions

2022-12-01 Thread Peter Eisentraut
Continuing the ideas in [0], this patch refactors the ExecGrant_Foo() 
functions and replaces many of them by a common function that is driven 
by the ObjectProperty table.


It would be nice to do more here, for example ExecGrant_Language(), 
which has additional non-generalizable checks, but I think this is 
already a good start.  For example, the work being discussed on 
privileges on publications [1] would be able to take good advantage of this.



[0]: 
https://www.postgresql.org/message-id/95c30f96-4060-2f48-98b5-a4392d3b6...@enterprisedb.com

[1]: https://www.postgresql.org/message-id/flat/20330.1652105397@antosFrom 200879e5edfc1ce93b7af3cbfafc1f618626cbe9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 2 Dec 2022 08:16:53 +0100
Subject: [PATCH] Refactor ExecGrant_*() functions

Instead of half a dozen of mostly-duplicate ExecGrant_Foo() functions,
write one common function ExecGrant_generic() that can handle most of
them.  We already have all the information we need, such as which
system catalog corresponds to which catalog table and which column is
the ACL column.
---
 src/backend/catalog/aclchk.c | 662 ++-
 1 file changed, 34 insertions(+), 628 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 8d84a7b056..612ef1a666 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -106,14 +106,9 @@ bool   binary_upgrade_record_init_privs = 
false;
 
 static void ExecGrantStmt_oids(InternalGrant *istmt);
 static void ExecGrant_Relation(InternalGrant *istmt);
-static void ExecGrant_Database(InternalGrant *istmt);
-static void ExecGrant_Fdw(InternalGrant *istmt);
-static void ExecGrant_ForeignServer(InternalGrant *istmt);
-static void ExecGrant_Function(InternalGrant *istmt);
+static void ExecGrant_generic(InternalGrant *istmt, Oid classid, AclMode 
default_privs);
 static void ExecGrant_Language(InternalGrant *istmt);
 static void ExecGrant_Largeobject(InternalGrant *istmt);
-static void ExecGrant_Namespace(InternalGrant *istmt);
-static void ExecGrant_Tablespace(InternalGrant *istmt);
 static void ExecGrant_Type(InternalGrant *istmt);
 static void ExecGrant_Parameter(InternalGrant *istmt);
 
@@ -602,22 +597,22 @@ ExecGrantStmt_oids(InternalGrant *istmt)
ExecGrant_Relation(istmt);
break;
case OBJECT_DATABASE:
-   ExecGrant_Database(istmt);
+   ExecGrant_generic(istmt, DatabaseRelationId, 
ACL_ALL_RIGHTS_DATABASE);
break;
case OBJECT_DOMAIN:
case OBJECT_TYPE:
ExecGrant_Type(istmt);
break;
case OBJECT_FDW:
-   ExecGrant_Fdw(istmt);
+   ExecGrant_generic(istmt, ForeignDataWrapperRelationId, 
ACL_ALL_RIGHTS_FDW);
break;
case OBJECT_FOREIGN_SERVER:
-   ExecGrant_ForeignServer(istmt);
+   ExecGrant_generic(istmt, ForeignServerRelationId, 
ACL_ALL_RIGHTS_FOREIGN_SERVER);
break;
case OBJECT_FUNCTION:
case OBJECT_PROCEDURE:
case OBJECT_ROUTINE:
-   ExecGrant_Function(istmt);
+   ExecGrant_generic(istmt, ProcedureRelationId, 
ACL_ALL_RIGHTS_FUNCTION);
break;
case OBJECT_LANGUAGE:
ExecGrant_Language(istmt);
@@ -626,10 +621,10 @@ ExecGrantStmt_oids(InternalGrant *istmt)
ExecGrant_Largeobject(istmt);
break;
case OBJECT_SCHEMA:
-   ExecGrant_Namespace(istmt);
+   ExecGrant_generic(istmt, NamespaceRelationId, 
ACL_ALL_RIGHTS_SCHEMA);
break;
case OBJECT_TABLESPACE:
-   ExecGrant_Tablespace(istmt);
+   ExecGrant_generic(istmt, TableSpaceRelationId, 
ACL_ALL_RIGHTS_TABLESPACE);
break;
case OBJECT_PARAMETER_ACL:
ExecGrant_Parameter(istmt);
@@ -2132,380 +2127,22 @@ ExecGrant_Relation(InternalGrant *istmt)
 }
 
 static void
-ExecGrant_Database(InternalGrant *istmt)
-{
-   Relationrelation;
-   ListCell   *cell;
-
-   if (istmt->all_privs && istmt->privileges == ACL_NO_RIGHTS)
-   istmt->privileges = ACL_ALL_RIGHTS_DATABASE;
-
-   relation = table_open(DatabaseRelationId, RowExclusiveLock);
-
-   foreach(cell, istmt->objects)
-   {
-   Oid datId = lfirst_oid(cell);
-   Form_pg_database pg_database_tuple;
-   Datum   aclDatum;
-   boolisNull;
-   AclMode avail_goptions;
-   AclMode this_p

  1   2   >