Re: First draft of the PG 15 release notes

2022-05-18 Thread David Rowley
On Thu, 19 May 2022 at 14:41, Amit Langote  wrote:
> Though a bit late given beta is now wrapped, I have another partition
> item wording improvement suggestion:
>
> -Previously, a partitioned table with any LIST partition containing
> multiple values could not be used for ordered partition scans.  Now
> only non-pruned LIST partitions are checked.  This also helps with
> -partitioned tables with DEFAULT partitions.
>
> +Previously, an ordered partition scan would not be considered for a
> LIST-partitioned table with any partition containing multiple values,
> nor for partitioned tables with DEFAULT partition.

I think your proposed wording does not really improve things.  The
"Now only non-pruned LIST partitions are checked" is important and I
think Bruce did the right thing to mention that. Prior to this change,
ordered scans were not possible if there was a DEFAULT or if any LIST
partition allowed >1 value. Now, if the default partition is pruned
and there are no non-pruned partitions that allow Datum values that
are inter-mixed with ones from another non-pruned partition, then an
ordered scan can be performed.

For example, if non-pruned partition a allows IN(1,3) and non-pruned
partition b allows IN(2,4), we cannot do an ordered scan. With IN(1,2)
and IN(3,4), we can.
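
A minimal sketch of the two cases (the table and partition names are
placeholders):

CREATE TABLE t (a int) PARTITION BY LIST (a);
CREATE TABLE t_a PARTITION OF t FOR VALUES IN (1, 3);
CREATE TABLE t_b PARTITION OF t FOR VALUES IN (2, 4);
-- SELECT * FROM t ORDER BY a;  -- values interleave across partitions,
--                                 so no ordered partition scan

CREATE TABLE s (a int) PARTITION BY LIST (a);
CREATE TABLE s_a PARTITION OF s FOR VALUES IN (1, 2);
CREATE TABLE s_b PARTITION OF s FOR VALUES IN (3, 4);
-- SELECT * FROM s ORDER BY a;  -- partitions cover disjoint, ordered
--                                 value sets, so an ordered scan works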

David




Re: weird comments in Memoize nodes

2022-05-18 Thread David Rowley
On Tue, 17 May 2022 at 08:02, David Rowley  wrote:
> Yeah, must be a copy-pasto.  I'll fix it with the attached after beta1
> is tagged.

Pushed.

David




Re: bogus: logical replication rows/cols combinations

2022-05-18 Thread Amit Kapila
On Mon, May 16, 2022 at 6:50 PM Alvaro Herrera  wrote:
>
> On 2022-May-16, Amit Kapila wrote:
>
> > Few comments:
> > =============
> > 1.
> > postgres=# select * from pg_publication_tables;
> >  pubname | schemaname | tablename | columnlist | rowfilter
> > ---------+------------+-----------+------------+-----------
> >  pub1    | public     | t1        |            |
> >  pub2    | public     | t1        | 1 2        | (c3 < 10)
> > (2 rows)
> >
> > I think it is better to display column names for columnlist in the
> > exposed view similar to attnames in the pg_stats_ext view. I think
> > that will make it easier for users to understand this information.
>
> +1
>

I have committed the first patch after fixing this part. It seems Tom
is not very happy doing this after beta-1 [1]. The reason we get this
information via this view (and underlying function) is that it
simplifies the queries on the subscriber-side as you can see in the
second patch. The query change is as below:
@@ -1761,17 +1762,18 @@ fetch_table_list(WalReceiverConn *wrconn, List
*publications)
  WalRcvExecResult *res;
  StringInfoData cmd;
  TupleTableSlot *slot;
- Oid tableRow[2] = {TEXTOID, TEXTOID};
+ Oid tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
  List*tablelist = NIL;

  initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
+ appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename, \n"
+"  t.attnames\n"
 "  FROM pg_catalog.pg_publication_tables t\n"
 " WHERE t.pubname IN (");


Now, there is another way to change this query, as done by Hou-San in his
first version [2] of the patch. The changed query with
that approach will be something like:
@@ -1761,17 +1762,34 @@ fetch_table_list(WalReceiverConn *wrconn, List
*publications)
  WalRcvExecResult *res;
  StringInfoData cmd;
  TupleTableSlot *slot;
- Oid tableRow[2] = {TEXTOID, TEXTOID};
+ Oid tableRow[3] = {TEXTOID, TEXTOID, INT2VECTOROID};
  List*tablelist = NIL;

  initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
-"  FROM pg_catalog.pg_publication_tables t\n"
+ appendStringInfoString(&cmd,
+"SELECT DISTINCT t.schemaname,\n"
+"t.tablename,\n"
+"(CASE WHEN (array_length(pr.prattrs, 1) = t.relnatts)\n"
+"THEN NULL ELSE pr.prattrs END)\n"
+"  FROM (SELECT P.pubname AS pubname,\n"
+"   N.nspname AS schemaname,\n"
+"   C.relname AS tablename,\n"
+"   P.oid AS pubid,\n"
+"   C.oid AS reloid,\n"
+"   C.relnatts\n"
+"  FROM pg_publication P,\n"
+"  LATERAL pg_get_publication_tables(P.pubname) GPT,\n"
+"  pg_class C JOIN pg_namespace N\n"
+" ON (N.oid = C.relnamespace)\n"
+"  WHERE C.oid = GPT.relid) t\n"
+"  LEFT OUTER JOIN pg_publication_rel pr\n"
+"   ON (t.pubid = pr.prpubid AND\n"
+"pr.prrelid = reloid)\n"

It appeared slightly complex and costly to me, so I suggested changing it
to what we now have in the second patch, as shown above. Now, I can think
of the following ways to proceed:

a. Revert the change in view (and underlying function) as done in
commit 0ff20288e1 and consider the alternate way (using a slightly
complex query) to fix. Then maybe for PG-16, we can simplify it by
changing the underlying function and view.
b. Proceed with the current approach of using a simplified query.
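
For reference, option (b)'s simplified query from the second patch, written
as plain SQL (a sketch; the publication names are placeholders, and attnames
is the column added by the committed first patch):

SELECT DISTINCT t.schemaname, t.tablename, t.attnames
  FROM pg_catalog.pg_publication_tables t
 WHERE t.pubname IN ('pub1', 'pub2');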

What do you think?

[1] - https://www.postgresql.org/message-id/91075.1652929852%40sss.pgh.pa.us
[2] - 
https://www.postgresql.org/message-id/OS0PR01MB5716A594C58DE4FFD1F8100B94C89%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Build-farm - intermittent error in 031_column_list.pl

2022-05-18 Thread Peter Smith
Hi hackers.

FYI, I saw that there was a recent Build-farm error on the "grison" machine [1]
[1] https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grison&br=HEAD

The error happened during the "subscriptionCheck" phase in the TAP test
t/031_column_list.pl.
This test file was added by this commit [2].
[2] 
https://github.com/postgres/postgres/commit/923def9a533a7d986acfb524139d8b9e5466d0a5

~~

I checked the history of fails for that TAP test t/031_column_list.pl
and found that this same error seems to have been happening
intermittently for at least the last 50 days.

Details of similar previous errors from the BF are listed below.

~~~

1. Details for system "grison" failure at stage subscriptionCheck,
snapshot taken 2022-05-18 18:11:45
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2022-05-18%2018%3A11%3A45

[22:02:08] t/029_on_error.pl .. ok    25475 ms ( 0.01
usr  0.00 sys + 15.39 cusr  5.59 csys = 20.99 CPU)
# poll_query_until timed out executing this query:
# SELECT '0/1530588' <= replay_lsn AND state = 'streaming'
#  FROM pg_catalog.pg_stat_replication
#  WHERE application_name IN ('sub1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
#
# with stderr:
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 22.
[22:09:25] t/031_column_list.pl ...
...
[22:02:47.887](1.829s) ok 22 - partitions with different replica
identities not replicated correctly Waiting for replication conn
sub1's replay_lsn to pass 0/1530588 on publisher
[22:09:25.395](397.508s) # poll_query_until timed out executing this query:
# SELECT '0/1530588' <= replay_lsn AND state = 'streaming'
#  FROM pg_catalog.pg_stat_replication
#  WHERE application_name IN ('sub1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
#
# with stderr:
timed out waiting for catchup at t/031_column_list.pl line 728.
### Stopping node "publisher" using mode immediate

~~~

2. Details for system "xenodermus" failure at stage subscriptionCheck,
snapshot taken 2022-04-16 21:00:04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-04-16%2021%3A00%3A04

[00:15:32] t/029_on_error.pl .. ok 8278 ms ( 0.00
usr  0.00 sys +  1.33 cusr  0.55 csys =  1.88 CPU)
# poll_query_until timed out executing this query:
# SELECT '0/1543648' <= replay_lsn AND state = 'streaming'
#  FROM pg_catalog.pg_stat_replication
#  WHERE application_name IN ('sub1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
#
# with stderr:
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 25.
[00:22:30] t/031_column_list.pl ...
...
[00:16:04.100](0.901s) ok 25 - partitions with different replica
identities not replicated correctly Waiting for replication conn
sub1's replay_lsn to pass 0/1543648 on publisher
[00:22:29.923](385.823s) # poll_query_until timed out executing this query:
# SELECT '0/1543648' <= replay_lsn AND state = 'streaming'
#  FROM pg_catalog.pg_stat_replication
#  WHERE application_name IN ('sub1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
#
# with stderr:
timed out waiting for catchup at t/031_column_list.pl line 818.

~~~

3. Details for system "phycodurus" failure at stage subscriptionCheck,
snapshot taken 2022-04-05 17:30:04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-04-05%2017%3A30%3A04

# poll_query_until timed out executing this query:
# SELECT '0/1528640' <= replay_lsn AND state = 'streaming'
#  FROM pg_catalog.pg_stat_replication
#  WHERE application_name IN ('sub1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
#
# with stderr:
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 22.
[20:50:25] t/031_column_list.pl ...
...
ok 22 - partitions with different replica identities not replicated
correctly Waiting for replication conn sub1's replay_lsn to pass
0/1528640 on publisher # poll_query_until timed out executing this
query:
# SELECT '0/1528640' <= replay_lsn AND state = 'streaming'
#  FROM pg_catalog.pg_stat_replication
#  WHERE application_name IN ('sub1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
#
# with stderr:
timed out waiting for catchup at t/031_column_list.pl line 667.

~~~

4. Details for system "phycodurus" failure at stage subscriptionCheck,
snapshot taken 2022-04-05 17:30:04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-04-05%2017%3A30%3A04

[20:43:04] t/030_sequences.pl . ok    11108 ms ( 0.00
usr  0.00 sys +  1.49 cusr  0.40 csys =  1.89 CPU)
# poll_query_until timed out executing this query:
# SELECT '0/1528640' <= 

Re: Skipping schema changes in publication

2022-05-18 Thread vignesh C
On Tue, May 17, 2022 at 7:35 AM Peter Smith  wrote:
>
> Below are my review comments for v5-0002.
>
> There may be an overlap with comments recently posted by Osumi-san [1].
>
> (I also have review comments for v5-0002; will post them tomorrow)
>
> ==
>
> 1. General
>
> Is it really necessary to have to say "EXCEPT TABLE" instead of just
> "EXCEPT". It seems unnecessarily verbose and redundant when you write
> "FOR ALL TABLES EXCEPT TABLE...".
>
> If you want to keep this TABLE keyword (maybe you have plans for other
> kinds of except?) then IMO perhaps at least it can be the optional
> default except type. e.g. EXCEPT [TABLE].

I have made TABLE optional.
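
A sketch of the resulting syntax (proposed only, not in core; the table
names are placeholders):

CREATE PUBLICATION pub_except FOR ALL TABLES EXCEPT TABLE t1, t2;
CREATE PUBLICATION pub_except2 FOR ALL TABLES EXCEPT t3;  -- TABLE now optional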

> ~~~
>
> 2. General
>
> (I was unsure whether to even mention this one).
>
> I understand the "EXCEPT" is chosen as the user-facing syntax, but it
> still seems strange when reading the patch to see attribute members
> and column names called 'except'. I think the problem is that "except"
> is not a verb, so saying except=t/f just does not make much sense.
> Sometimes I feel that for all the internal usage
> (code/comments/catalog) using "skip" and "skip-list" etc would be a
> much better choice of names. OTOH I can see that having consistency
> with the outside syntax might also be good. Anyway, please consider -
> maybe other people feel the same?

Earlier we had discussed whether to use SKIP, but felt SKIP was not
appropriate and planned to use except as in [1]. Let's use except
unless we find a better alternative.

> ~~~
>
> 3. General
>
> The ONLY keyword seems supported by the syntax for tables of the
> except-list (more on this in later comments) but:
> a) I am not sure if the patch code is accounting for that, and

I have kept the behavior similar to FOR TABLE

> b) There are no test cases using ONLY.

Added tests for the same

> ~~~
>
> 4. Commit message
>
> A new option "EXCEPT TABLE" in Create/Alter Publication allows
> one or more tables to be excluded, publisher will exclude sending the data
> of the excluded tables to the subscriber.
>
> SUGGESTION
> A new "EXCEPT TABLE" clause for CREATE/ALTER PUBLICATION allows one or
> more tables to be excluded. The publisher will not send the data of
> excluded tables to the subscriber.

Modified

> ~~
>
> 5. Commit message
>
> The new syntax allows specifying exclude relations while creating a 
> publication
> or exclude relations in alter publication. For example:
>
> SUGGESTION
> The new syntax allows specifying excluded relations when creating or
> altering a publication. For example:

Modified

> ~~~
>
> 6. Commit message
>
> A new column prexcept is added to table "pg_publication_rel", to maintain
> the relations that the user wants to exclude publishing through the 
> publication.
>
> SUGGESTION
> A new column "prexcept" is added to table "pg_publication_rel", to
> maintain the relations that the user wants to exclude from the
> publications.

Modified

> ~~~
>
> 7. Commit message
>
> Modified the output plugin (pgoutput) to exclude publishing the changes of the
> excluded tables.
>
> I did not feel it was necessary to say this. It is already said above
> that the data is not sent, so that seems enough.

Modified

> ~~~
>
> 8. Commit message
>
> Updates pg_dump to identify and dump the excluded tables of the publications.
> Updates the \d family of commands to display excluded tables of the
> publications and \dRp+ variant will now display associated except tables if 
> any.
>
> SUGGESTION
> pg_dump is updated to identify and dump the excluded tables of the 
> publications.
>
> The psql \d family of commands is updated to display excluded tables. e.g. psql
> \dRp+ variant will now display associated "except tables" if any.

Modified

> ~~~
>
> 9. doc/src/sgml/catalogs.sgml
>
> @@ -6426,6 +6426,15 @@ SCRAM-SHA-256$iteration
> count:
>if there is no publication qualifying condition.
>   
>
> + 
> +  
> +  prexcept bool
> +  
> +  
> +   True if the table must be excluded
> +  
> + 
>
> Other descriptions on this page refer to "relation" instead of
> "table". Probably this should do the same to be consistent.

Modified

> ~~~
>
> 10. doc/src/sgml/logical-replication.sgml
>
> @@ -1167,8 +1167,9 @@ CONTEXT:  processing remote data for replication
> origin "pg_16395" during "INSER
>
> To add tables to a publication, the user must have ownership rights on the
> table. To add all tables in schema to a publication, the user must be a
> -   superuser. To create a publication that publishes all tables or
> all tables in
> -   schema automatically, the user must be a superuser.
> +   superuser. To add all tables to a publication, the user must be a 
> superuser.
> +   To create a publication that publishes all tables or all tables in schema
> +   automatically, the user must be a superuser.
>
>
> It seems like a valid change but how is this related to this EXCEPT
> patch. Maybe this fix should be patched separately?

Earlier we were not allowed to add 

Re: First draft of the PG 15 release notes

2022-05-18 Thread Amit Langote
On Sat, May 14, 2022 at 12:42 AM Bruce Momjian  wrote:
> On Fri, May 13, 2022 at 10:48:41AM +0900, Amit Langote wrote:
> > Btw, perhaps the following should be listed under E.1.3.2.1. Logical
> > Replication, not E.1.3.1.1. Partitioning?
> >
> > 
> >
> > 
> > 
> > Remove incorrect duplicate partitions in system view
> > pg_publication_tables (Hou Zhijie)
> > 
> > 
> >
> > Attached a patch to do so.
>
> Agreed, done.

Thank you.

Though a bit late given beta is now wrapped, I have another partition
item wording improvement suggestion:

-Previously, a partitioned table with any LIST partition containing
multiple values could not be used for ordered partition scans.  Now
only non-pruned LIST partitions are checked.  This also helps with
-partitioned tables with DEFAULT partitions.

+Previously, an ordered partition scan would not be considered for a
LIST-partitioned table with any partition containing multiple values,
nor for partitioned tables with DEFAULT partition.

I think the "Now only non-pruned LIST partitions are checked" bit in
the original wording is really an implementation detail of the actual
improvement that ordered partition scans are now possible in more
cases -- it simply became easier for the code that implements this
optimization to refer to non-pruned partitions, using a bitmapset
rather than having to trawl through the whole array of partition rels,
which is what I think the commit message of this item mentions.  David
can correct me if I got that wrong.

Attached a patch.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


reword-ordered-partition-scan-item.diff
Description: Binary data


RE: Multi-Master Logical Replication

2022-05-18 Thread kuroda.hay...@fujitsu.com
Hi hackers,

I created a small PoC. Please see the attached patches.

REQUIREMENT

Before patching them, patches in [1] must also be applied.


DIFFERENCES FROM PREVIOUS DESCRIPTIONS

* LRG is now implemented as SQL functions, not as a contrib module.
* New tables are added as system catalogs. Therefore, added tables have oid 
column.
* The node_id is the concatenation of the system identifier and the dbid.


HOW TO USE

In the document patch, a subsection 'Example' was added for understanding LRG. 
In short, we can do

1. lrg_create on one node
2. lrg_node_attach on another node

Also attached is a test script that constructs a three-node system.
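
A rough usage sketch (the function names come from the patch; the argument
lists here are guesses, so see the attached test script for the real calls):

SELECT lrg_create('my_group');       -- on the first node
SELECT lrg_node_attach('my_group');  -- on each additional node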


LIMITATIONS

This feature is under development, so there are many limitations on its use.

* The function for detaching a node from a group is not implemented.
* The function for removing a group is not implemented.
* LRG does not lock system catalogs and databases. Concurrent operations may 
cause inconsistent state.
* LRG does not wait until the upstream node reaches the latest lsn of the 
remaining nodes.
* LRG does not support initial data sync. That is, it works correctly only
when none of the nodes has initial data.


[1]: https://commitfest.postgresql.org/38/3610/

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v1-0001-PoC-implement-LRG.patch
Description: v1-0001-PoC-implement-LRG.patch


v1-0002-add-doc.patch
Description: v1-0002-add-doc.patch


test.sh
Description: test.sh


Re: create_help.pl treats as replaceable

2022-05-18 Thread Kyotaro Horiguchi
At Wed, 18 May 2022 18:23:57 +0200, Peter Eisentraut 
 wrote in 
> "Boolean" is correct; see  for
> example.

Ok, so, don't we in turn need to replace "boolean"s with "Boolean"?

"only boolean operators can have negators"
"only boolean operators can have restriction selectivity"
...

And I'm not sure how to do with "bool". Should it be "Boolean" instead
from the point of uniformity?

errmsg("only bool, numeric, and text types could be "

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Skipping schema changes in publication

2022-05-18 Thread osumi.takami...@fujitsu.com
On Thursday, May 19, 2022 2:45 AM vignesh C  wrote:
> On Mon, May 16, 2022 at 8:32 AM osumi.takami...@fujitsu.com
>  wrote:
> > (3) src/test/regress/expected/publication.out
> >
> > +-- Verify that only superuser can reset a publication ALTER
> > +PUBLICATION testpub_reset OWNER TO regress_publication_user2; SET
> > +ROLE regress_publication_user2; ALTER PUBLICATION testpub_reset
> > +RESET; -- fail
> >
> >
> > We have "-- fail" for one case in this patch.
> > On the other hand, isn't better to add "-- ok" (or "-- success") for
> > other successful statements, when we consider the entire tests
> > description consistency ?
> 
> We generally do not mention success comments for all the success cases as
> that might be an overkill. I felt it is better to keep it as it is.
> Thoughts?
Thank you for updating the patches !

In terms of this point, I meant that we would add "-- ok" for each
successful "ALTER PUBLICATION testpub_reset RESET;" statement. That
means we would have just three places to add "-- ok", which I thought
was not overkill.

*But*, I'm also OK with your idea. Please don't change the comments;
keep them as they are in v6.
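
For reference, the test shape being discussed looks roughly like this
(a sketch based on the quoted regression test):

ALTER PUBLICATION testpub_reset OWNER TO regress_publication_user2;
SET ROLE regress_publication_user2;
ALTER PUBLICATION testpub_reset RESET;  -- fail: owner but not superuser
RESET ROLE;
ALTER PUBLICATION testpub_reset RESET;  -- ok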


Best Regards,
Takamichi Osumi



Re: Add --{no-,}bypassrls flags to createuser

2022-05-18 Thread David G. Johnston
On Wed, May 18, 2022 at 6:35 PM Shinya Kato 
wrote:

> > Too bad there's no --comment parameter to do COMMENT ON ROLE name IS
> > 'Comment';
> >
> > As you already make such changes in createuser, I would like to ask
> > for an additional --comment parameter
> > that will allow sysadmins to set a comment with additional information
> > about the new DB user.
> > psql is scary for some. :-)
>
> Since the createuser command is a wrapper for the CREATE ROLE command, I
> do not think it is appropriate to add options that the CREATE ROLE
> command does not have.
>
>
I think that this feature is at least worth considering - but absent an
existing command that does this I would agree that doing so constitutes a
separate feature.

As an aside, I'd rather overcome this particular objection by having the
CREATE commands for all object types accept an optional "COMMENT IS" clause.
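
Concretely, what exists today versus the hypothetical clause (the role
name is made up):

CREATE ROLE app_user LOGIN;
COMMENT ON ROLE app_user IS 'application service account';

-- Hypothetical, if CREATE commands grew such a clause (not implemented):
-- CREATE ROLE app_user LOGIN COMMENT IS 'application service account';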

David J.


Addition of PostgreSQL::Test::Cluster::pg_version()

2022-05-18 Thread Michael Paquier
Hi all,
(Added Andrew in CC.)

While working more on expanding the tests of pg_upgrade for
cross-version checks, I have noticed that we don't expose a routine
able to get back _pg_version from a node, which should remain a
private field of Cluster.pm.  We already do that for install_path, as
of the case added by 87076c4.

Any objections or comments about the addition of a routine to get the
PostgreSQL::Version, as of the attached?

Thanks,
--
Michael
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index f842be1a72..c8c7bc5045 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -331,6 +331,20 @@ sub install_path
 
 =pod
 
+=item $node->pg_version()
+
+The version number for the node, from PostgreSQL::Version.
+
+=cut
+
+sub pg_version
+{
+	my ($self) = @_;
+	return $self->{_pg_version};
+}
+
+=pod
+
 =item $node->config_data($option)
 
 Return a string holding configuration data from pg_config, with $option


signature.asc
Description: PGP signature


Re: Add --{no-,}bypassrls flags to createuser

2022-05-18 Thread Shinya Kato

Thanks for reviews and comments!

On 2022-05-06 07:08, Przemysław Sztoch wrote:


> Thanks for the new patch!  Would you mind adding some tests for the new
> options?


I created a new patch to test the new options!
However, not all option tests exist, so it may be necessary to consider 
whether to actually add this test.




> Too bad there's no --comment parameter to do COMMENT ON ROLE name IS
> 'Comment';
>
> As you already make such changes in createuser, I would like to ask
> for an additional --comment parameter
> that will allow sysadmins to set a comment with additional information
> about the new DB user.
> psql is scary for some. :-)


Since the createuser command is a wrapper for the CREATE ROLE command, I 
do not think it is appropriate to add options that the CREATE ROLE 
command does not have.
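
For context, each of the new createuser flags maps onto existing SQL
(a sketch; the role names are made up):

CREATE ROLE app_user LOGIN BYPASSRLS          -- --bypassrls
    VALID UNTIL '2024-01-01';                 -- --valid-until
GRANT readers TO app_user;                    -- --member=readers
GRANT writers TO app_user WITH ADMIN OPTION;  -- --admin=writers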



--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATIONdiff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 17579e50af..189ca5bb67 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -76,6 +76,20 @@ PostgreSQL documentation
   
  
 
+ 
+  -a role
+  --admin=role
+  
+   
+ The new role will be added immediately as a member with admin option
+ of this role.
+ Multiple roles to which new role will be added as a member with admin
+ option can be specified by writing multiple
+ -a switches.
+ 
+  
+ 
+
  
   -c number
   --connection-limit=number
@@ -204,6 +218,19 @@ PostgreSQL documentation
   
  
 
+ 
+  -m role
+  --member=role
+  
+   
+ The new role will be added immediately as a member of this role.
+ Multiple roles to which new role will be added as a member
+ can be specified by writing multiple
+ -m switches.
+ 
+  
+ 
+
  
   -P
   --pwprompt
@@ -258,6 +285,17 @@ PostgreSQL documentation
   
  
 
+ 
+  -v timestamp
+  --valid-until=timestamp
+  
+   
+Set a timestamp after which the role's password is no longer valid.
+The default is to set no expiration.
+   
+  
+ 
+
  
-V
--version
@@ -290,6 +328,28 @@ PostgreSQL documentation
   
  
 
+ 
+  --bypassrls
+  
+   
+The new user will have the BYPASSRLS privilege,
+which is described more fully in the documentation for .
+   
+  
+ 
+
+ 
+  --no-bypassrls
+  
+   
+The new user will not have the BYPASSRLS
+privilege, which is described more fully in the documentation for .
+   
+  
+ 
+
  
-?
--help
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index bfba0d09d1..73cd1b479e 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -51,6 +51,11 @@ main(int argc, char *argv[])
 		{"connection-limit", required_argument, NULL, 'c'},
 		{"pwprompt", no_argument, NULL, 'P'},
 		{"encrypted", no_argument, NULL, 'E'},
+		{"bypassrls", no_argument, NULL, 4},
+		{"no-bypassrls", no_argument, NULL, 5},
+		{"valid-until", required_argument, NULL, 'v'},
+		{"member", required_argument, NULL, 'm'},
+		{"admin", required_argument, NULL, 'a'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -69,6 +74,9 @@ main(int argc, char *argv[])
 	int			conn_limit = -2;	/* less than minimum valid value */
 	bool		pwprompt = false;
 	char	   *newpassword = NULL;
+	SimpleStringList members = {NULL, NULL};
+	SimpleStringList admins = {NULL, NULL};
+	char	   *timestamp = NULL;
 
 	/* Tri-valued variables.  */
 	enum trivalue createdb = TRI_DEFAULT,
@@ -76,7 +84,8 @@ main(int argc, char *argv[])
 createrole = TRI_DEFAULT,
 inherit = TRI_DEFAULT,
 login = TRI_DEFAULT,
-replication = TRI_DEFAULT;
+replication = TRI_DEFAULT,
+bypassrls = TRI_DEFAULT;
 
 	PQExpBufferData sql;
 
@@ -89,7 +98,7 @@ main(int argc, char *argv[])
 
 	handle_help_version_opts(argc, argv, "createuser", help);
 
-	while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSrRiIlLc:PE",
+	while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSrRiIlLc:PEv:m:a:",
			long_options, &optindex)) != -1)
 	{
 		switch (c)
@@ -165,6 +174,21 @@ main(int argc, char *argv[])
 			case 3:
 				interactive = true;
 				break;
+			case 4:
+				bypassrls = TRI_YES;
+				break;
+			case 5:
+				bypassrls = TRI_NO;
+				break;
+			case 'v':
+				timestamp = pg_strdup(optarg);
+				break;
+			case 'm':
+				simple_string_list_append(&members, optarg);
+				break;
+			case 'a':
+				simple_string_list_append(&admins, optarg);
+				break;
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -304,8 +328,14 @@ main(int argc, char *argv[])
 		appendPQExpBufferStr(&sql, " REPLICATION");
 	if (replication == TRI_NO)
 		

Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2022-05-18 Thread Zhihong Yu
On Wed, May 18, 2022 at 5:49 PM Soumyadeep Chakraborty <
soumyadeep2...@gmail.com> wrote:

>
> On Wed, May 18, 2022 at 4:14 PM Justin Pryzby 
> wrote:
>
> > I didn't look closely yet, but this comment is wrong:
> >
> > + * Since these have no storage the tablespace can be updated with a simple
> > + * metadata only operation to update the tablespace.
>
> Good catch. Fixed.
>
> > It'd be convenient if AMs worked the same way (and a bit odd that they don't).
> > Note that in v15, pg_dump/restore now allow --no-table-am, an exact parallel to
> > --no-tablespace.
>
> I agree that ATSET AM should behave in a similar fashion to ATSET
> tablespaces. However, the way that ATSET tablespace currently behaves is
> not consistent with the ONLY clause.
>
> On a given partition root:
> ALTER TABLE ONLY am_partitioned SET TABLESPACE ts;
> has the same effect as:
> ALTER TABLE am_partitioned SET TABLESPACE ts;
>
> We are missing out on the feature to set the AM/tablespace throughout the
> partition hierarchy, with one command.
>
> Regards,
> Soumyadeep (VMware)
>
Hi,

+   accessMethodId = ((Form_pg_class) GETSTRUCT(tup))->relam;

-   /* look up the access method, verify it is for a table */
-   if (accessMethod != NULL)
-   accessMethodId = get_table_am_oid(accessMethod, false);
+   if (!HeapTupleIsValid(tup))
+   elog(ERROR, "cache lookup failed for relation %u", relid);

The validity check of tup should be done before fetching the value of the
relam field.

Cheers


Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2022-05-18 Thread Soumyadeep Chakraborty
On Wed, May 18, 2022 at 4:14 PM Justin Pryzby  wrote:

> I didn't look closely yet, but this comment is wrong:
>
> + * Since these have no storage the tablespace can be updated with a simple
> + * metadata only operation to update the tablespace.



Good catch. Fixed.

> It'd be convenient if AMs worked the same way (and a bit odd that they don't).
> Note that in v15, pg_dump/restore now allow --no-table-am, an exact parallel to
> --no-tablespace.

I agree that ATSET AM should behave in a similar fashion to ATSET
tablespaces. However, the way that ATSET tablespace currently behaves is
not consistent with the ONLY clause.

On a given partition root:
ALTER TABLE ONLY am_partitioned SET TABLESPACE ts;
has the same effect as:
ALTER TABLE am_partitioned SET TABLESPACE ts;

We are missing out on the feature to set the AM/tablespace throughout the
partition hierarchy, with one command.
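
Under the 0002 patch below, the recursion behavior would look like this
(a sketch using the names from the regression test):

ALTER TABLE am_partitioned SET ACCESS METHOD heap2;
-- recurses: the root and all existing partitions switch to heap2
ALTER TABLE ONLY am_partitioned SET ACCESS METHOD heap2;
-- root only: existing partitions keep their AM; future partitions get heap2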

Regards,
Soumyadeep (VMware)
From 3a4a78d46fc74bb3e9b7ac9aefc689d250e1ecf4 Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty 
Date: Tue, 17 May 2022 15:22:48 -0700
Subject: [PATCH v2 2/2] Make ATSETAM recurse by default

ATSETAM now recurses to partition children by default. To prevent
recursion, and have the new am apply to future children only, users must
specify the ONLY clause.
---
 src/backend/commands/tablecmds.c|  2 ++
 src/test/regress/expected/create_am.out | 13 -
 src/test/regress/sql/create_am.sql  |  5 -
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5febd834d5b..c5bbb71c66d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4711,6 +4711,8 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 		(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		 errmsg("cannot have multiple SET ACCESS METHOD subcommands")));
 
+			if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+				ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode, context);
 			ATPrepSetAccessMethod(tab, rel, cmd->name);
 			pass = AT_PASS_MISC;	/* does not matter; no work in Phase 2 */
 			break;
diff --git a/src/test/regress/expected/create_am.out b/src/test/regress/expected/create_am.out
index a48199df9a2..e99de1ac912 100644
--- a/src/test/regress/expected/create_am.out
+++ b/src/test/regress/expected/create_am.out
@@ -284,7 +284,7 @@ CREATE TABLE am_partitioned(x INT, y INT)
 CREATE TABLE am_partitioned_1 PARTITION OF am_partitioned FOR VALUES WITH (MODULUS 3,REMAINDER 0);
 CREATE TABLE am_partitioned_2 PARTITION OF am_partitioned FOR VALUES WITH (MODULUS 3,REMAINDER 1);
 ALTER TABLE am_partitioned_1 SET ACCESS METHOD heap2;
-ALTER TABLE am_partitioned SET ACCESS METHOD heap2;
+ALTER TABLE ONLY am_partitioned SET ACCESS METHOD heap2;
 CREATE TABLE am_partitioned_3 PARTITION OF am_partitioned FOR VALUES WITH (MODULUS 3,REMAINDER 2);
 SELECT relname, amname FROM pg_class c, pg_am am
   WHERE c.relam = am.oid AND c.relname LIKE 'am_partitioned%' ORDER BY 1;
@@ -296,6 +296,17 @@ SELECT relname, amname FROM pg_class c, pg_am am
  am_partitioned_3 | heap2
 (4 rows)
 
+ALTER TABLE am_partitioned SET ACCESS METHOD heap;
+SELECT relname, amname FROM pg_class c, pg_am am
+WHERE c.relam = am.oid AND c.relname LIKE 'am_partitioned%' ORDER BY 1;
+ relname  | amname 
+--+
+ am_partitioned   | heap
+ am_partitioned_1 | heap
+ am_partitioned_2 | heap
+ am_partitioned_3 | heap
+(4 rows)
+
 DROP TABLE am_partitioned;
 -- Second, create objects in the new AM by changing the default AM
 BEGIN;
diff --git a/src/test/regress/sql/create_am.sql b/src/test/regress/sql/create_am.sql
index 8f3db03bba2..f4e22684782 100644
--- a/src/test/regress/sql/create_am.sql
+++ b/src/test/regress/sql/create_am.sql
@@ -184,10 +184,13 @@ CREATE TABLE am_partitioned(x INT, y INT)
 CREATE TABLE am_partitioned_1 PARTITION OF am_partitioned FOR VALUES WITH (MODULUS 3,REMAINDER 0);
 CREATE TABLE am_partitioned_2 PARTITION OF am_partitioned FOR VALUES WITH (MODULUS 3,REMAINDER 1);
 ALTER TABLE am_partitioned_1 SET ACCESS METHOD heap2;
-ALTER TABLE am_partitioned SET ACCESS METHOD heap2;
+ALTER TABLE ONLY am_partitioned SET ACCESS METHOD heap2;
 CREATE TABLE am_partitioned_3 PARTITION OF am_partitioned FOR VALUES WITH (MODULUS 3,REMAINDER 2);
 SELECT relname, amname FROM pg_class c, pg_am am
   WHERE c.relam = am.oid AND c.relname LIKE 'am_partitioned%' ORDER BY 1;
+ALTER TABLE am_partitioned SET ACCESS METHOD heap;
+SELECT relname, amname FROM pg_class c, pg_am am
+WHERE c.relam = am.oid AND c.relname LIKE 'am_partitioned%' ORDER BY 1;
 DROP TABLE am_partitioned;
 
 -- Second, create objects in the new AM by changing the default AM
-- 
2.25.1

From 6edce14b5becb30ca7c12224acaabea62a4d999f Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty 
Date: Mon, 16 May 2022 14:46:20 -0700
Subject: [PATCH v2 1/2] Allow ATSETAM on partition roots

Setting the access method on partition roots was disallowed. 

Re: PostgreSQL 15 Beta 1 release announcement draft

2022-05-18 Thread Thomas Munro
On Wed, May 18, 2022 at 1:50 AM Jonathan S. Katz  wrote:
> On 5/15/22 10:58 PM, Amit Kapila wrote:
> > On Sun, May 15, 2022 at 12:22 AM Jonathan S. Katz  
> > wrote:
> >> Please provide feedback no later than 2022-05-19 0:00 AoE[1].
> >
> >> [`recovery_prefetch`](https://www.postgresql.org/docs/15/runtime-config-wal.html#GUC-RECOVERY-PREFETCH)
> >> that can help speed up all recovery operations by prefetching data blocks.
> >
> > Is it okay to say that this feature speeds up *all* recovery
> > operations? See the discussion between Simon and Tomas [1] related to
> > this.
>
> I'll update it to hedge.

+1, thanks.




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-05-18 Thread Ranier Vilela
On Wed, May 18, 2022 at 7:57 PM David Rowley wrote:

> On Thu, 19 May 2022 at 02:08, Ranier Vilela  wrote:
> > That would initialize the content at compilation and not at runtime,
> correct?
>
> Your mental model of compilation and run-time might be flawed here.
> Here's no such thing as zeroing memory at compile time. There's only
> emitting instructions that perform those tasks at run-time.
> https://godbolt.org/ might help your understanding.
>
> > There are a lot of cases using MemSet (with struct variables) and at
> Windows 64 bits, long are 4 (four) bytes.
> > So I believe that MemSet is less efficient on Windows than on Linux.
> > "The size of the '_vstart' buffer is not a multiple of the element size
> of the type 'long'."
> > message from PVS-Studio static analysis tool.
>
> I've been wondering for a while if we really need to have the MemSet()
> macro. I see it was added in 8cb415449 (1997).  I think compilers have
> evolved quite a bit in the past 25 years, so it could be time to
> revisit that.
>
> Your comment on the sizeof(long) on win64 is certainly true.  I wrote
> the attached C program to test the performance difference.
>
> (windows 64-bit)
> >cl memset.c /Ox
> >memset 2
> Running 2 loops
> MemSet: size 8: 1.833000 seconds
> MemSet: size 16: 1.841000 seconds
> MemSet: size 32: 1.838000 seconds
> MemSet: size 64: 1.851000 seconds
> MemSet: size 128: 3.228000 seconds
> MemSet: size 256: 5.278000 seconds
> MemSet: size 512: 3.943000 seconds
> memset: size 8: 0.065000 seconds
> memset: size 16: 0.131000 seconds
> memset: size 32: 0.262000 seconds
> memset: size 64: 0.53 seconds
> memset: size 128: 1.169000 seconds
> memset: size 256: 2.95 seconds
> memset: size 512: 3.191000 seconds
>
> It seems like there's no cases there where MemSet is faster than
> memset.  I was careful to only provide MemSet() with inputs that
> result in it not using the memset fallback.  I also provided constants
> so that the decision about which method to use was known at compile
> time.
>
> It's not clear to me why 512 is faster than 256. I saw the same on a
> repeat run.
>
> Changing "long" to "long long" it looks like:
>
> >memset 2
> Running 2 loops
> MemSet: size 8: 0.066000 seconds
> MemSet: size 16: 1.978000 seconds
> MemSet: size 32: 1.982000 seconds
> MemSet: size 64: 1.973000 seconds
> MemSet: size 128: 1.97 seconds
> MemSet: size 256: 3.225000 seconds
> MemSet: size 512: 5.366000 seconds
> memset: size 8: 0.069000 seconds
> memset: size 16: 0.132000 seconds
> memset: size 32: 0.265000 seconds
> memset: size 64: 0.527000 seconds
> memset: size 128: 1.161000 seconds
> memset: size 256: 2.976000 seconds
> memset: size 512: 3.179000 seconds
>
> The situation is a little different on my Linux machine:
>
> $ gcc memset.c -o memset -O2
> $ ./memset 2
> Running 2 loops
> MemSet: size 8: 0.02 seconds
> MemSet: size 16: 0.00 seconds
> MemSet: size 32: 0.094041 seconds
> MemSet: size 64: 0.184618 seconds
> MemSet: size 128: 1.781503 seconds
> MemSet: size 256: 2.547910 seconds
> MemSet: size 512: 4.005173 seconds
> memset: size 8: 0.046156 seconds
> memset: size 16: 0.046123 seconds
> memset: size 32: 0.092291 seconds
> memset: size 64: 0.184509 seconds
> memset: size 128: 1.781518 seconds
> memset: size 256: 2.577104 seconds
> memset: size 512: 4.004757 seconds
>
> It looks like part of the work might be getting optimised away in the
> 8-16 MemSet() calls.
>
> clang seems to have the opposite for size 8.
>
> $ clang memset.c -o memset -O2
> $ ./memset 2
> Running 2 loops
> MemSet: size 8: 0.007653 seconds
> MemSet: size 16: 0.005771 seconds
> MemSet: size 32: 0.011539 seconds
> MemSet: size 64: 0.023095 seconds
> MemSet: size 128: 0.046130 seconds
> MemSet: size 256: 0.092269 seconds
> MemSet: size 512: 0.968564 seconds
> memset: size 8: 0.00 seconds
> memset: size 16: 0.005776 seconds
> memset: size 32: 0.011559 seconds
> memset: size 64: 0.023069 seconds
> memset: size 128: 0.046129 seconds
> memset: size 256: 0.092243 seconds
> memset: size 512: 0.968534 seconds
>
The results from clang only reinforce the argument in favor of native
memset.
There is still room for gcc to improve with 8/16 bytes, and at some point
it surely will, which will make memset faster on all platforms and
compilers.

regards,
Ranier Vilela


Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-05-18 Thread Ranier Vilela
On Wed, May 18, 2022 at 8:20 PM Tom Lane wrote:


> zeroing
> relatively small, known-aligned node structs is THE use case.
>
Currently, especially on 64-bit Windows, MemSet can break alignment.

regards,
Ranier Vilela


Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-05-18 Thread Ranier Vilela
On Wed, May 18, 2022 at 7:57 PM David Rowley wrote:

> On Thu, 19 May 2022 at 02:08, Ranier Vilela  wrote:
> > That would initialize the content at compilation and not at runtime,
> correct?
>
> Your mental model of compilation and run-time might be flawed here.
> Here's no such thing as zeroing memory at compile time. There's only
> emitting instructions that perform those tasks at run-time.
> https://godbolt.org/ might help your understanding.
>
> > There are a lot of cases using MemSet (with struct variables) and at
> Windows 64 bits, long are 4 (four) bytes.
> > So I believe that MemSet is less efficient on Windows than on Linux.
> > "The size of the '_vstart' buffer is not a multiple of the element size
> of the type 'long'."
> > message from PVS-Studio static analysis tool.
>
> I've been wondering for a while if we really need to have the MemSet()
> macro. I see it was added in 8cb415449 (1997).  I think compilers have
> evolved quite a bit in the past 25 years, so it could be time to
> revisit that.
>
+1
All compilers currently have memset optimized.


> Your comment on the sizeof(long) on win64 is certainly true.  I wrote
> the attached C program to test the performance difference.
>
> (windows 64-bit)
> >cl memset.c /Ox
> >memset 2
> Running 2 loops
> MemSet: size 8: 1.833000 seconds
> MemSet: size 16: 1.841000 seconds
> MemSet: size 32: 1.838000 seconds
> MemSet: size 64: 1.851000 seconds
> MemSet: size 128: 3.228000 seconds
> MemSet: size 256: 5.278000 seconds
> MemSet: size 512: 3.943000 seconds
> memset: size 8: 0.065000 seconds
> memset: size 16: 0.131000 seconds
> memset: size 32: 0.262000 seconds
> memset: size 64: 0.53 seconds
> memset: size 128: 1.169000 seconds
> memset: size 256: 2.95 seconds
> memset: size 512: 3.191000 seconds
>
> It seems like there's no cases there where MemSet is faster than
> memset.  I was careful to only provide MemSet() with inputs that
> result in it not using the memset fallback.  I also provided constants
> so that the decision about which method to use was known at compile
> time.
>
> It's not clear to me why 512 is faster than 256.

Probably broken alignment with 256?
Another warning from PVS-Studio:
[1] "The pointer '_start' is cast to a more strictly aligned pointer type."

src/contrib/postgres_fdw/connection.c (Line 1690)
MemSet(values, 0, sizeof(values));



> I saw the same on a repeat run.
>
> Changing "long" to "long long" it looks like:
>
> >memset 2
> Running 2 loops
> MemSet: size 8: 0.066000 seconds
> MemSet: size 16: 1.978000 seconds
> MemSet: size 32: 1.982000 seconds
> MemSet: size 64: 1.973000 seconds
> MemSet: size 128: 1.97 seconds
> MemSet: size 256: 3.225000 seconds
> MemSet: size 512: 5.366000 seconds
> memset: size 8: 0.069000 seconds
> memset: size 16: 0.132000 seconds
> memset: size 32: 0.265000 seconds
> memset: size 64: 0.527000 seconds
> memset: size 128: 1.161000 seconds
> memset: size 256: 2.976000 seconds
> memset: size 512: 3.179000 seconds
>
> The situation is a little different on my Linux machine:
>
> $ gcc memset.c -o memset -O2
> $ ./memset 2
> Running 2 loops
> MemSet: size 8: 0.02 seconds
> MemSet: size 16: 0.00 seconds
> MemSet: size 32: 0.094041 seconds
> MemSet: size 64: 0.184618 seconds
> MemSet: size 128: 1.781503 seconds
> MemSet: size 256: 2.547910 seconds
> MemSet: size 512: 4.005173 seconds
> memset: size 8: 0.046156 seconds
> memset: size 16: 0.046123 seconds
> memset: size 32: 0.092291 seconds
> memset: size 64: 0.184509 seconds
> memset: size 128: 1.781518 seconds
> memset: size 256: 2.577104 seconds
> memset: size 512: 4.004757 seconds
>
> It looks like part of the work might be getting optimised away in the
> 8-16 MemSet() calls.
>
On Linux, long is 8 bytes.
I'm still surprised that MemSet (8/16) is faster.


> clang seems to have the opposite for size 8.
>
> $ clang memset.c -o memset -O2
> $ ./memset 2
> Running 2 loops
> MemSet: size 8: 0.007653 seconds
> MemSet: size 16: 0.005771 seconds
> MemSet: size 32: 0.011539 seconds
> MemSet: size 64: 0.023095 seconds
> MemSet: size 128: 0.046130 seconds
> MemSet: size 256: 0.092269 seconds
> MemSet: size 512: 0.968564 seconds
> memset: size 8: 0.00 seconds
> memset: size 16: 0.005776 seconds
> memset: size 32: 0.011559 seconds
> memset: size 64: 0.023069 seconds
> memset: size 128: 0.046129 seconds
> memset: size 256: 0.092243 seconds
> memset: size 512: 0.968534 seconds
>
> There does not seem to be any significant reduction in the size of the
> binary from changing the MemSet macro to directly use memset. It went
> from 9865008 bytes down to 9860800 bytes (4208 bytes less).
>
Anyway, I think that on 64-bit Windows it is very worthwhile to remove
the MemSet macro.

regards,
Ranier Vilela

[1] https://pvs-studio.com/en/docs/warnings/v1032/


Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-05-18 Thread Tom Lane
David Rowley  writes:
> I've been wondering for a while if we really need to have the MemSet()
> macro. I see it was added in 8cb415449 (1997).  I think compilers have
> evolved quite a bit in the past 25 years, so it could be time to
> revisit that.

Yeah, I've thought for awhile that technology has moved on from that.
Nobody's really taken the trouble to measure it though.  (And no,
results from one compiler on one machine are not terribly convincing.)

The thing that makes this a bit more difficult than it might be is
the special cases we have for known-aligned and so on targets, which
are particularly critical for palloc0 and makeNode etc.  So there's
more than one case to look into.  But I'd argue that those special
cases are actually what we want to worry about the most: zeroing
relatively small, known-aligned node structs is THE use case.

regards, tom lane




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2022-05-18 Thread Justin Pryzby
Thanks for copying me.

I didn't look closely yet, but this comment is wrong:

+ * Since these have no storage the tablespace can be updated with a simple 

   
+ * metadata only operation to update the tablespace.   

   

As I see it, AMs are a strong parallel to tablespaces.  The default tablespace
is convenient: 1) explicitly specified tablespace; 2) tablespace of parent,
partitioned table; 3) DB tablespace; 4) default_tablespace:
https://www.postgresql.org/message-id/20190423222633.GA8364%40alvherre.pgsql
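
A sketch of rule 2, tablespace inheritance from the parent (assuming a
tablespace ts1 already exists):

CREATE TABLE p (a int) PARTITION BY RANGE (a) TABLESPACE ts1;
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (10);
-- p1 is created in ts1, inherited from the partitioned parent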

It'd be convenient if AMs worked the same way (and a bit odd that they don't).
Note that in v15, pg_dump/restore now allow --no-table-am, an exact parallel to
--no-tablespace.

-- 
Justin




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2022-05-18 Thread Zhihong Yu
Hi,

+   tup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+   accessMethodId = ((Form_pg_class) GETSTRUCT(tup))->relam;

-   /* look up the access method, verify it is for a table */
-   if (accessMethod != NULL)
-   accessMethodId = get_table_am_oid(accessMethod, false);
+   if (!HeapTupleIsValid(tup))
+   elog(ERROR, "cache lookup failed for relation %u", relid);

Shouldn't the validity of tup be checked before relam field is accessed ?

Cheers


Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-05-18 Thread David Rowley
On Thu, 19 May 2022 at 02:08, Ranier Vilela  wrote:
> That would initialize the content at compilation and not at runtime, correct?

Your mental model of compilation and run-time might be flawed here.
Here's no such thing as zeroing memory at compile time. There's only
emitting instructions that perform those tasks at run-time.
https://godbolt.org/ might help your understanding.

> There are a lot of cases using MemSet (with struct variables) and at Windows 
> 64 bits, long are 4 (four) bytes.
> So I believe that MemSet is less efficient on Windows than on Linux.
> "The size of the '_vstart' buffer is not a multiple of the element size of 
> the type 'long'."
> message from PVS-Studio static analysis tool.

I've been wondering for a while if we really need to have the MemSet()
macro. I see it was added in 8cb415449 (1997).  I think compilers have
evolved quite a bit in the past 25 years, so it could be time to
revisit that.

Your comment on the sizeof(long) on win64 is certainly true.  I wrote
the attached C program to test the performance difference.

(windows 64-bit)
>cl memset.c /Ox
>memset 2
Running 2 loops
MemSet: size 8: 1.833000 seconds
MemSet: size 16: 1.841000 seconds
MemSet: size 32: 1.838000 seconds
MemSet: size 64: 1.851000 seconds
MemSet: size 128: 3.228000 seconds
MemSet: size 256: 5.278000 seconds
MemSet: size 512: 3.943000 seconds
memset: size 8: 0.065000 seconds
memset: size 16: 0.131000 seconds
memset: size 32: 0.262000 seconds
memset: size 64: 0.53 seconds
memset: size 128: 1.169000 seconds
memset: size 256: 2.95 seconds
memset: size 512: 3.191000 seconds

It seems like there's no cases there where MemSet is faster than
memset.  I was careful to only provide MemSet() with inputs that
result in it not using the memset fallback.  I also provided constants
so that the decision about which method to use was known at compile
time.

It's not clear to me why 512 is faster than 256. I saw the same on a repeat run.

Changing "long" to "long long" it looks like:

>memset 2
Running 2 loops
MemSet: size 8: 0.066000 seconds
MemSet: size 16: 1.978000 seconds
MemSet: size 32: 1.982000 seconds
MemSet: size 64: 1.973000 seconds
MemSet: size 128: 1.97 seconds
MemSet: size 256: 3.225000 seconds
MemSet: size 512: 5.366000 seconds
memset: size 8: 0.069000 seconds
memset: size 16: 0.132000 seconds
memset: size 32: 0.265000 seconds
memset: size 64: 0.527000 seconds
memset: size 128: 1.161000 seconds
memset: size 256: 2.976000 seconds
memset: size 512: 3.179000 seconds

The situation is a little different on my Linux machine:

$ gcc memset.c -o memset -O2
$ ./memset 2
Running 2 loops
MemSet: size 8: 0.02 seconds
MemSet: size 16: 0.00 seconds
MemSet: size 32: 0.094041 seconds
MemSet: size 64: 0.184618 seconds
MemSet: size 128: 1.781503 seconds
MemSet: size 256: 2.547910 seconds
MemSet: size 512: 4.005173 seconds
memset: size 8: 0.046156 seconds
memset: size 16: 0.046123 seconds
memset: size 32: 0.092291 seconds
memset: size 64: 0.184509 seconds
memset: size 128: 1.781518 seconds
memset: size 256: 2.577104 seconds
memset: size 512: 4.004757 seconds

It looks like part of the work might be getting optimised away in the
8-16 MemSet() calls.

clang seems to have the opposite for size 8.

$ clang memset.c -o memset -O2
$ ./memset 2
Running 2 loops
MemSet: size 8: 0.007653 seconds
MemSet: size 16: 0.005771 seconds
MemSet: size 32: 0.011539 seconds
MemSet: size 64: 0.023095 seconds
MemSet: size 128: 0.046130 seconds
MemSet: size 256: 0.092269 seconds
MemSet: size 512: 0.968564 seconds
memset: size 8: 0.00 seconds
memset: size 16: 0.005776 seconds
memset: size 32: 0.011559 seconds
memset: size 64: 0.023069 seconds
memset: size 128: 0.046129 seconds
memset: size 256: 0.092243 seconds
memset: size 512: 0.968534 seconds

There does not seem to be any significant reduction in the size of the
binary from changing the MemSet macro to directly use memset. It went
from 9865008 bytes down to 9860800 bytes (4208 bytes less).

David
#include 
#include 
#include 
#include 
#include 

#define LONG_ALIGN_MASK (sizeof(long) - 1)
#define MEMSET_LOOP_LIMIT 1024

#define MemSet(start, val, len) \
	do \
	{ \
		/* must be void* because we don't know if it is integer aligned yet */ \
		void   *_vstart = (void *) (start); \
		int		_val = (val); \
		Size	_len = (len); \
\
		if ((((uintptr_t) _vstart) & LONG_ALIGN_MASK) == 0 && \
			(_len & LONG_ALIGN_MASK) == 0 && \
			_val == 0 && \
			_len <= MEMSET_LOOP_LIMIT && \
			/* \
			 *	If MEMSET_LOOP_LIMIT == 0, optimizer should find \
			 *	the whole "if" false at compile time. \
			 */ \
			MEMSET_LOOP_LIMIT != 0) \
		{ \

Re: Limiting memory allocation

2022-05-18 Thread Joe Conway

On 5/18/22 16:20, Alvaro Herrera wrote:
> On 2022-May-18, Joe Conway wrote:
>
>> On 5/18/22 11:11, Alvaro Herrera wrote:
>
>>> Apparently, if the cgroup goes over the "high" limit, the processes are
>>> *throttled*.  Then if the group goes over the "max" limit, OOM-killer is
>>> invoked.
>
>> You may be misinterpreting "throttle" in this context. From [1]:
>>
>>   The memory.high boundary on the other hand can be set
>>   much more conservatively. When hit, it throttles
>>   allocations by forcing them into direct reclaim to
>>   work off the excess, but it never invokes the OOM
>>   killer.
>
> Well, that means the backend processes don't do their expected task
> (process some query) but instead they have to do "direct reclaim".  I
> don't know what that is, but it sounds like we'd need to add
> Linux-specific code in order for this to fix anything.

Postgres does not need to do anything. The kernel just does its thing
(e.g. clearing page cache or swapping out anon memory) more aggressively
than normal to clear up some space for the impending allocation.

> And what would we do in such a situation anyway?  Seems like our
> best hope would be to get malloc() to return NULL and have the
> resulting transaction abort free enough memory that things in other
> backends can continue to run.

With the right hooks an extension could detect the memory pressure in an
OS specific way and return null.

> *If* there is a way to have cgroups make Postgres do that, then that
> would be useful enough.

Memory accounting under cgroups (particularly v2) can provide the signal
needed for a Linux specific extension to do that.

>>> So ditch cgroups.
>
>> You cannot ditch cgroups if you are running in a container. And in fact most
>> non-container installations these days are also running in a cgroup under
>> systemd.
>
> I just meant that the cgroup abstraction doesn't offer any interfaces
> that we can use to improve this, not that we would be running without
> them.

I agree that cgroups is very Linux specific, so likely we would not want
such code in core.



--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Limiting memory allocation

2022-05-18 Thread Alvaro Herrera
On 2022-May-18, Joe Conway wrote:

> On 5/18/22 11:11, Alvaro Herrera wrote:

> > Apparently, if the cgroup goes over the "high" limit, the processes are
> > *throttled*.  Then if the group goes over the "max" limit, OOM-killer is
> > invoked.

> You may be misinterpreting "throttle" in this context. From [1]:
> 
>   The memory.high boundary on the other hand can be set
>   much more conservatively. When hit, it throttles
>   allocations by forcing them into direct reclaim to
>   work off the excess, but it never invokes the OOM
>   killer.

Well, that means the backend processes don't do their expected task
(process some query) but instead they have to do "direct reclaim".  I
don't know what that is, but it sounds like we'd need to add
Linux-specific code in order for this to fix anything.  And what would
we do in such a situation anyway?  Seems like our best hope would be to
get malloc() to return NULL and have the resulting transaction abort
free enough memory that things in other backends can continue to run.

*If* there is a way to have cgroups make Postgres do that, then that
would be useful enough.

> > So ditch cgroups.
> 
> You cannot ditch cgroups if you are running in a container. And in fact most
> non-container installations these days are also running in a cgroup under
> systemd.

I just meant that the cgroup abstraction doesn't offer any interfaces
that we can use to improve this, not that we would be running without
them.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it." (ncm, http://lwn.net/Articles/174769/)




Re: [RFC] building postgres with meson -v8

2022-05-18 Thread Andres Freund
Hi,

On 2022-05-18 10:30:12 +0200, Peter Eisentraut wrote:
> Here are some more patches that clean up various minor issues.

I rebased the meson tree, squashed a lot of the existing commits, merged your
changes, and fixed a few more differences between autoconf and meson.


For me the difference in defines now boils down to:

- CONFIGURE_ARGS - empty in meson, not clear what to fill it with
- GETTIMEOFDAY_1ARG - test doesn't exist - I suspect it might not be necessary
- PACKAGE_STRING, PACKAGE_TARNAME - unclear if they should be implemented?
- AC_APPLE_UNIVERSAL_BUILD logic - which I don't think we need?
- pg_restrict is defined in a simplistic way
- "missing" a bunch of defines that don't appear to be referenced:
  HAVE_FSEEKO
  HAVE_GSSAPI_GSSAPI_H
  HAVE_INTTYPES_H
  HAVE_LDAP_H
  HAVE_LIBCRYPTO
  HAVE_LIBLDAP
  HAVE_LIBM
  HAVE_LIBPAM
  HAVE_LIBSSL
  HAVE_LIBXML2
  HAVE_LIBXSLT
  HAVE_MEMORY_H
  HAVE_PTHREAD
  HAVE_PTHREAD_PRIO_INHERIT
  HAVE_STDINT_H
  HAVE_STDLIB_H
  HAVE_STRING_H
  HAVE_SYS_STAT_H
  HAVE_SYS_TYPES_H
  HAVE_UNISTD_H
  SIZEOF_BOOL
  SIZEOF_OFF_T
  STDC_HEADERS
- meson additional defines, seems harmless:
  HAVE_GETTIMEOFDAY - only defined on windows rn
  HAVE_SHM_UNLINK
  HAVE_SSL_NEW
  HAVE_STRTOQ
  HAVE_STRTOUQ
  HAVE_CRYPTO_NEW_EX_DATA
- a bunch of additional #undef's


Greetings,

Andres Freund




Re: Limiting memory allocation

2022-05-18 Thread Joe Conway

On 5/18/22 11:11, Alvaro Herrera wrote:

Now that's where cgroup's memory limiting features would prove useful,
if they weren't totally braindead:
https://www.kernel.org/doc/Documentation/cgroup-v2.txt
Apparently, if the cgroup goes over the "high" limit, the processes are
*throttled*.  Then if the group goes over the "max" limit, OOM-killer is
invoked.

(I can't see any way to make this even more counterproductive to the
database use case.  Making the database work more slowly doesn't fix
anything.)


You may be misinterpreting "throttle" in this context. From [1]:

  The memory.high boundary on the other hand can be set
  much more conservatively. When hit, it throttles
  allocations by forcing them into direct reclaim to
  work off the excess, but it never invokes the OOM
  killer.


So ditch cgroups.


You cannot ditch cgroups if you are running in a container. And in fact 
most non-container installations these days are also running in a cgroup 
under systemd.


The only difference is that you are more likely to see a memory limit 
set in a container than under systemd.
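
For reference, under cgroup v2 the current usage and the limits are exposed
as plain files, so a process can poll its own pressure cheaply. A minimal
sketch, assuming the unified hierarchy is mounted at /sys/fs/cgroup and the
reader runs inside the cgroup of interest:

    #include <stdio.h>
    #include <stdlib.h>

    /* Read one numeric value from a cgroup v2 control file. */
    static long long
    read_cgroup_value(const char *path)
    {
        char    buf[64];
        FILE   *f = fopen(path, "r");

        if (f == NULL || fgets(buf, sizeof(buf), f) == NULL)
        {
            if (f)
                fclose(f);
            return -1;
        }
        fclose(f);
        /* note: a limit of "max" parses as 0 and means "no limit" here */
        return strtoll(buf, NULL, 10);
    }

    int
    main(void)
    {
        long long cur = read_cgroup_value("/sys/fs/cgroup/memory.current");
        long long high = read_cgroup_value("/sys/fs/cgroup/memory.high");

        if (cur >= 0 && high > 0 && cur > high)
            printf("over memory.high: %lld > %lld\n", cur, high);
        return 0;
    }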


[1] 
https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/cgroup-v2.rst


--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Zstandard support for toast compression

2022-05-18 Thread Peter Geoghegan
On Wed, May 18, 2022 at 9:17 AM Robert Haas  wrote:
> But I want to point out here that you haven't really offered any kind
> of argument in favor of supporting Zstd. You basically seem to just be
> arguing that it's dumb to worry about running out of bit space, and I
> think that's just obviously false.

+1

-- 
Peter Geoghegan




Re: support for MERGE

2022-05-18 Thread Alvaro Herrera
On 2022-May-18, Tom Lane wrote:

> I don't think the committed patch will behave nicely in edge cases:
[...]
> If it's text format and total is 0, I'd wish it to print nothing,
> not to revert to the verbose format.

Oops, true.  Fixed.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers




Re: Zstandard support for toast compression

2022-05-18 Thread Justin Pryzby
On Tue, May 17, 2022 at 02:54:28PM -0400, Robert Haas wrote:
> I don't particularly have anything against adding Zstandard
> compression here, but I wonder whether there's any rush. If we decide
> not to add this now, we can always change our minds and add it later,
> but if we decide to add it now, there's no backing it out. I'd
> probably be inclined to wait and see if our public demands it of us.

+1

One consideration is that zstd with negative compression levels is comparable
to LZ4, and with positive levels gets better compression.  It can serve both
purposes (oltp vs DW or storage-limited vs cpu-limited).

If zstd is supported, then for sure at least its compression level should be
configurable.  default_toast_compression should support it.
https://commitfest.postgresql.org/35/3102/

Also, zstd is a few years newer than lz4, which I hope means that the API is a
bit better/further advanced - but (as we've seen) it may still be evolving.

Zstd allows some of its options to be set by environment variable - in
particular, the number of threads.  We should consider explicitly setting
that to zero in the toast context unless we're convinced it's no issue for
every backend (not just basebackup).
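
For concreteness, pinning the worker count is one call with the advanced
API; a sketch against the zstd >= 1.4 interface (illustrative only, not a
patch):

    #include <zstd.h>

    /* Build a compression context that is explicitly single-threaded. */
    static ZSTD_CCtx *
    make_single_threaded_cctx(int level)
    {
        ZSTD_CCtx  *cctx = ZSTD_createCCtx();

        if (cctx != NULL)
        {
            ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 0);  /* no worker threads */
            ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);
        }
        return cctx;
    }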

-- 
Justin




Re: Privileges on PUBLICATION

2022-05-18 Thread Euler Taveira
On Wed, May 18, 2022, at 6:44 AM, Antonin Houska wrote:
> ok, please see the next version.
The new paragraph looks good to me. I'm not sure if the CREATE PUBLICATION is
the right place to provide such information. As I suggested in a previous email
[1], you could add it to "Logical Replication > Security".

[1] https://postgr.es/m/d96103fe-99e2-4119-bd76-952d326b7...@www.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Privileges on PUBLICATION

2022-05-18 Thread Euler Taveira
On Wed, May 18, 2022, at 6:16 AM, Antonin Houska wrote:
> The patch is attached to this message.
Great. Add it to the next CF. I'll review it when I have some spare time.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Skipping schema changes in publication

2022-05-18 Thread vignesh C
On Mon, May 16, 2022 at 2:00 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Saturday, May 14, 2022 10:33 PM vignesh C  wrote:
> > Thanks for the comments, the attached v5 patch has the changes for the same.
> > Also I have made the changes for SKIP Table based on the new syntax, the
> > changes for the same are available in
> > v5-0002-Skip-publishing-the-tables-specified-in-EXCEPT-TA.patch.
> Hi,
>
>
>
> Several comments on v5-0002.
>
> (1) One unnecessary space before "except_pub_obj_list" syntax definition
>
> + except_pub_obj_list:  ExceptPublicationObjSpec
> +   { $$ = list_make1($1); }
> +   | except_pub_obj_list ',' ExceptPublicationObjSpec
> +   { $$ = lappend($1, $3); }
> +   |  /*EMPTY*/  
>   { $$ = NULL; }
> +   ;
> +
>
> From above part, kindly change
> FROM:
> " except_pub_obj_list:  ExceptPublicationObjSpec"
> TO:
> "except_pub_obj_list:  ExceptPublicationObjSpec"
>

Modified

> (2) doc/src/sgml/ref/create_publication.sgml
>
> (2-1)
>
> @@ -22,7 +22,7 @@ PostgreSQL documentation
>   
>  
>  CREATE PUBLICATION name
> -[ FOR ALL TABLES
> +[ FOR ALL TABLES [EXCEPT TABLE [ ONLY ] table_name [ * ] [, ... ]]
>| FOR publication_object [, ... ] ]
>  [ WITH ( publication_parameter [= value] [, ... ] ) ]
>
>
> Here I think we need to add two more whitespaces around square brackets.
> Please change
> FROM:
> "[ FOR ALL TABLES [EXCEPT TABLE [ ONLY ]  class="parameter">table_name [ * ] [, ... ]]"
> TO:
> "[ FOR ALL TABLES [ EXCEPT TABLE [ ONLY ]  class="parameter">table_name [ * ] [, ... ] ]"
>
> When I check other documentations, I see whitespaces before/after square 
> brackets.
>
> (2-2)
> This whitespace alignment applies to alter_publication.sgml as well.

Modified

> (3)
>
>
> @@ -156,6 +156,24 @@ CREATE PUBLICATION name
>  
> 
>
> +
> +   
> +EXCEPT TABLE
> +
> + 
> +  Marks the publication as one that excludes replicating changes for the
> +  specified tables.
> + 
> +
> + 
> +  EXCEPT TABLE can be specified only for
> +  FOR ALL TABLES publication. It is not supported for
> +  FOR ALL TABLES IN SCHEMA  publication and
> +  FOR TABLE publication.
> + 
> +
> +   
> +
>
> This EXCEPT TABLE clause is only for FOR ALL TABLES.
> So, how about extracting the main message from above part and
> moving it to an existing paragraph below, instead of having one independent 
> paragraph ?
>
>
> FOR ALL TABLES
> 
>  
>   Marks the publication as one that replicates changes for all tables in
>   the database, including tables created in the future.
>  
> 
>
>
> Something like
> "Marks the publication as one that replicates changes for all tables in
> the database, including tables created in the future. EXCEPT TABLE indicates
> excluded tables for the defined publication.
> "
>

Modified

> (4) One minor confirmation about the syntax
>
> Currently, we allow one way of writing to indicate excluded tables like below.
>
> (example) CREATE PUBLICATION mypub FOR ALL TABLES EXCEPT TABLE tab3, tab4, 
> EXCEPT TABLE tab5;
>
> This is because we define ExceptPublicationObjSpec with EXCEPT TABLE.
> Is it OK to have room to write duplicate "EXCEPT TABLE" clauses ?
> I think there is no harm in having this,
> but I'd like to confirm whether this syntax might be better to be adjusted or 
> not.

Changed it to allow except table only once

>
> (5) CheckAlterPublication
>
> +
> +   if (excepttable && !stmt->for_all_tables)
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +errmsg("publication \"%s\" is not defined as 
> FOR ALL TABLES",
> +   NameStr(pubform->pubname)),
> +errdetail("except table cannot be added to, 
> dropped from, or set on NON ALL TABLES publications.")));
>
> Could you please add a test for this ?

This code can be removed because of the grammar optimization: the grammar
will not allow tables without "ALL TABLES". Removed this code.

The v6 patch attached at [1] has the changes for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm0iZZDB300Dez_97S8G6_RW5QpQ8ef6X3wq8tyK-8wnXQ%40mail.gmail.com

Regards,
Vignesh




Re: Skipping schema changes in publication

2022-05-18 Thread vignesh C
On Wed, May 18, 2022 at 8:30 AM shiy.f...@fujitsu.com
 wrote:
>
> On Sat, May 14, 2022 9:33 PM vignesh C  wrote:
> >
> > Thanks for the comments, the attached v5 patch has the changes for the
> > same. Also I have made the changes for SKIP Table based on the new
> > syntax, the changes for the same are available in
> > v5-0002-Skip-publishing-the-tables-specified-in-EXCEPT-TA.patch.
> >
>
> Thanks for your patch. Here are some comments on v5-0001 patch.
>
> +   Oid relid = lfirst_oid(lc);
> +
> +   prid = GetSysCacheOid2(PUBLICATIONRELMAP, Anum_pg_publication_rel_oid,
> +  ObjectIdGetDatum(relid),
> +  ObjectIdGetDatum(pubid));
> +   if (!OidIsValid(prid))
> +   ereport(ERROR,
> +   (errcode(ERRCODE_UNDEFINED_OBJECT),
> +errmsg("relation \"%s\" is not part of the publication",
> +   RelationGetRelationName(rel))));
>
> I think the relation in the error message should be the one whose oid is
> "relid", instead of relation "rel".

Modified it

> Besides, I think it might be better not to report an error in this case. If
> "prid" is invalid, just ignore this relation. Because in RESET cases, we want 
> to
> drop all tables in the publications, and there is no specific table.
> (If you agree with that, similarly missing_ok should be set to true when 
> calling
> PublicationDropSchemas().)

Ideally this scenario should not happen, but if it happens I felt we
should throw an error in this case.

The v6 patch attached at [1] has the changes for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm0iZZDB300Dez_97S8G6_RW5QpQ8ef6X3wq8tyK-8wnXQ%40mail.gmail.com

Regards,
Vignesh




Re: Skipping schema changes in publication

2022-05-18 Thread vignesh C
On Mon, May 16, 2022 at 2:53 PM Peter Smith  wrote:
>
> Below are my review comments for v5-0001.
>
> There is some overlap with comments recently posted by Osumi-san [1].
>
> (I also have review comments for v5-0002; will post them tomorrow)
>
> ==
>
> 1. Commit message
>
> This patch adds a new RESET clause to ALTER PUBLICATION which will reset
> the publication to default state which includes resetting the publication
> options, setting ALL TABLES option to false and dropping the relations and
> schemas that are associated with the publication.
>
> SUGGEST
> "to default state" -> "to the default state"
> "ALL TABLES option" -> "ALL TABLES flag"

Modified

> ~~~
>
> 2. doc/src/sgml/ref/alter_publication.sgml
>
> +  
> +   The RESET clause will reset the publication to the
> +   default state which includes resetting the publication options, setting
> +   ALL TABLES option to false and
> +   dropping all relations and schemas that are associated with the 
> publication.
>
>
> "ALL TABLES option" -> "ALL TABLES flag"

Modified

> ~~~
>
> 3. doc/src/sgml/ref/alter_publication.sgml
>
> +   invoking user to be a superuser.  RESET of publication
> +   requires the invoking user to be a superuser. To alter the owner, you must
>
> SUGGESTION
> To RESET a publication requires the invoking user
> to be a superuser.

 I have combined it with the earlier sentence.

> ~~~
>
> 4. src/backend/commands/publicationcmds.c
>
> @@ -53,6 +53,13 @@
>  #include "utils/syscache.h"
>  #include "utils/varlena.h"
>
> +#define PUB_ATION_INSERT_DEFAULT true
> +#define PUB_ACTION_UPDATE_DEFAULT true
> +#define PUB_ACTION_DELETE_DEFAULT true
> +#define PUB_ACTION_TRUNCATE_DEFAULT true
> +#define PUB_VIA_ROOT_DEFAULT false
> +#define PUB_ALL_TABLES_DEFAULT false
>
> 4a.
> Typo: "ATION" -> "ACTION"

Modified

> 4b.
> I think these #defines deserve a 1 line comment.
> e.g.
> /* CREATE PUBLICATION default values for flags and options */

Added comment

> 4c.
> Since the "_DEFAULT" is a common part of all the names, maybe it is
> tidier if it comes first.
> e.g.
> #define PUB_DEFAULT_ACTION_INSERT true
> #define PUB_DEFAULT_ACTION_UPDATE true
> #define PUB_DEFAULT_ACTION_DELETE true
> #define PUB_DEFAULT_ACTION_TRUNCATE true
> #define PUB_DEFAULT_VIA_ROOT false
> #define PUB_DEFAULT_ALL_TABLES false

Modified

The v6 patch attached at [1] has the changes for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm0iZZDB300Dez_97S8G6_RW5QpQ8ef6X3wq8tyK-8wnXQ%40mail.gmail.com

Regards,
Vignesh




Re: Skipping schema changes in publication

2022-05-18 Thread vignesh C
On Mon, May 16, 2022 at 8:32 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Saturday, May 14, 2022 10:33 PM vignesh C  wrote:
> > Thanks for the comments, the attached v5 patch has the changes for the same.
> > Also I have made the changes for SKIP Table based on the new syntax, the
> > changes for the same are available in
> > v5-0002-Skip-publishing-the-tables-specified-in-EXCEPT-TA.patch.
> Hi,
>
>
> Thank you for updating the patch.
> I'll share few minor review comments on v5-0001.
>
>
> (1) doc/src/sgml/ref/alter_publication.sgml
>
> @@ -73,12 +85,13 @@ ALTER PUBLICATION name RENAME TO new_name
> Adding a table to a publication additionally requires owning that table.
> The ADD ALL TABLES IN SCHEMA and
> SET ALL TABLES IN SCHEMA to a publication requires the
> -   invoking user to be a superuser.  To alter the owner, you must also be a
> -   direct or indirect member of the new owning role. The new owner must have
> -   CREATE privilege on the database.  Also, the new owner
> -   of a FOR ALL TABLES or FOR ALL TABLES IN
> -   SCHEMA publication must be a superuser. However, a superuser can
> -   change the ownership of a publication regardless of these restrictions.
> +   invoking user to be a superuser.  RESET of publication
> +   requires the invoking user to be a superuser. To alter the owner, you must
> ...
>
>
> I suggest to combine the first part of your change with one existing sentence
> before your change, to make our description concise.
>
> FROM:
> "The ADD ALL TABLES IN SCHEMA and
> SET ALL TABLES IN SCHEMA to a publication requires the
> invoking user to be a superuser.  RESET of publication
> requires the invoking user to be a superuser."
>
> TO:
> "The ADD ALL TABLES IN SCHEMA,
> SET ALL TABLES IN SCHEMA to a publication and
> RESET of publication requires the invoking user to be a 
> superuser."

Modified

>
> (2) typo
>
> +++ b/src/backend/commands/publicationcmds.c
> @@ -53,6 +53,13 @@
>  #include "utils/syscache.h"
>  #include "utils/varlena.h"
>
> +#define PUB_ATION_INSERT_DEFAULT true
> +#define PUB_ACTION_UPDATE_DEFAULT true
>
>
> Kindly change
> FROM:
> "PUB_ATION_INSERT_DEFAULT"
> TO:
> "PUB_ACTION_INSERT_DEFAULT"

Modified

>
> (3) src/test/regress/expected/publication.out
>
> +-- Verify that only superuser can reset a publication
> +ALTER PUBLICATION testpub_reset OWNER TO regress_publication_user2;
> +SET ROLE regress_publication_user2;
> +ALTER PUBLICATION testpub_reset RESET; -- fail
>
>
> We have "-- fail" for one case in this patch.
> On the other hand, isn't better to add "-- ok" (or "-- success") for
> other successful statements,
> when we consider the entire tests description consistency ?

We generally do not mention success comments for all the success cases
as that might be overkill. I felt it is better to keep it as it is.
Thoughts?

The attached v6 patch has the changes for the same.

Regards,
Vignesh
From 72f81a7c56ef764f13949721e227541acf6b719c Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Sat, 14 May 2022 13:13:46 +0530
Subject: [PATCH v6 1/2] Add RESET clause to Alter Publication which will reset
 the publication with default values.

This patch adds a new RESET clause to ALTER PUBLICATION which will reset
the publication to the default state which includes resetting the publication
options, setting ALL TABLES flag to false and dropping the relations and
schemas that are associated with the publication.
Usage:
ALTER PUBLICATION pub1 RESET;
---
 doc/src/sgml/ref/alter_publication.sgml   | 38 ++---
 src/backend/commands/publicationcmds.c| 99 +--
 src/backend/parser/gram.y |  9 +++
 src/bin/psql/tab-complete.c   |  2 +-
 src/include/nodes/parsenodes.h|  3 +-
 src/test/regress/expected/publication.out | 69 
 src/test/regress/sql/publication.sql  | 37 +
 7 files changed, 241 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index e2cce49471..47bd15f1fa 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -27,6 +27,7 @@ ALTER PUBLICATION name DROP name SET ( publication_parameter [= value] [, ... ] )
 ALTER PUBLICATION name OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER | SESSION_USER }
 ALTER PUBLICATION name RENAME TO new_name
+ALTER PUBLICATION name RESET
 
 where publication_object is one of:
 
@@ -65,20 +66,33 @@ ALTER PUBLICATION name RENAME TO 
 
   
-   The remaining variants change the owner and the name of the publication.
+   The OWNER clause will change the owner of the publication.
+  
+
+  
+   The RENAME clause will change the name of the publication.
+  
+
+  
+   The RESET clause will reset the publication to the
+   default state which includes resetting the publication options, setting
+   ALL TABLES flag to false and
+   dropping all relations and schemas that are associated with the 

Re: support for MERGE

2022-05-18 Thread Tom Lane
Alvaro Herrera  writes:
> I chose the other way :-)

I don't think the committed patch will behave nicely in edge cases:

+   if (es->format == EXPLAIN_FORMAT_TEXT && total > 0)
+   {
+   ExplainIndentText(es);
+   appendStringInfoString(es->str, "Tuples:");
+   if (insert_path > 0)
+   appendStringInfo(es->str, " inserted=%.0f", insert_path);
+   if (update_path > 0)
+   appendStringInfo(es->str, " updated=%.0f", update_path);
+   if (delete_path > 0)
+   appendStringInfo(es->str, " deleted=%.0f", delete_path);
+   if (skipped_path > 0)
+   appendStringInfo(es->str, " skipped=%.0f", skipped_path);
+   appendStringInfoChar(es->str, '\n');
+   }
+   else
+   {
+   ExplainPropertyFloat("Tuples Inserted", NULL, insert_path, 0, 
es);
+   ExplainPropertyFloat("Tuples Updated", NULL, update_path, 0, 
es);
+   ExplainPropertyFloat("Tuples Deleted", NULL, delete_path, 0, 
es);
+   ExplainPropertyFloat("Tuples Skipped", NULL, skipped_path, 0, 
es);
+   }

If it's text format and total is 0, I'd wish it to print nothing,
not to revert to the verbose format.

regards, tom lane




Re: support for MERGE

2022-05-18 Thread Justin Pryzby
On Wed, May 18, 2022 at 06:42:38PM +0200, Alvaro Herrera wrote:
> On 2022-May-11, Justin Pryzby wrote:
> 
> > I suggest to reference the mvcc docs from the merge docs, or to make the 
> > merge
> > docs themselves include the referenced information.
> > 
> > diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
> > index f68aa09736c..99dd5814f36 100644
> > --- a/doc/src/sgml/ref/merge.sgml
> > +++ b/doc/src/sgml/ref/merge.sgml
> > @@ -544,6 +544,7 @@ MERGE total_count
> > UPDATE if a concurrent INSERT
> > occurs.  There are a variety of differences and restrictions between
> > the two statement types and they are not interchangeable.
> > +   See <xref linkend="mvcc"/> for more information.
> 
> Reading the paragraph, I think it may be better to do it the other way
> around: first mention that concurrency is described in the MVCC page,
> then explain that INSERT ON CONFLICT also exists.  What do you think of
> the attached?

Hmm, it seems odd to put "see also" first.

My original complaint is that 1) merge.sgml doesn't include the detailed
information; and 2) mentions it only vaguely, without linking to it: "There are a
variety of differences and restrictions between the two statement types and
they are not interchangeable."

I prefer my original, but the most important thing is to include the link
*somewhere*.

-- 
Justin




Re: Limiting memory allocation

2022-05-18 Thread Stephen Frost
Greetings,

* Jan Wieck (j...@wi3ck.info) wrote:
> On 5/17/22 18:30, Stephen Frost wrote:
> >This isn’t actually a solution though and that’s the problem- you end up
> >using swap but if you use more than “expected” the OOM killer comes in and
> >happily blows you up anyway. Cgroups are containers and exactly what kube
> >is doing.
> 
> Maybe I'm missing something, but what is it that you would actually consider
> a solution? Knowing your current memory consumption doesn't make the need
> for allocating some right now go away. What do you envision the response of
> PostgreSQL to be if we had that information about resource pressure? I don't
> see us using mallopt(3) or malloc_trim(3) anywhere in the code, so I don't
> think any of our processes give back unused heap at this point (please
> correct me if I'm wrong). This means that even if we knew about the memory
> pressure of the system, adjusting things like work_mem on the fly may not do
> much at all, unless there is a constant turnover of backends.
> 
> So what do you propose PostgreSQL's response to high memory pressure to be?

Fail the allocation, just how most PG systems are set up to do.  In such
a case, PG will almost always be able to fail the transaction, free up
the memory used, and continue running *without* ending up with a crash.
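
That is roughly what the palloc machinery already does when the underlying
allocation fails -- a simplified sketch of the existing
MemoryContextAlloc() behavior, with details and the exact error text
elided:

    void *
    MemoryContextAlloc(MemoryContext context, Size size)
    {
        void   *ret = context->methods->alloc(context, size);

        if (ret == NULL)
            ereport(ERROR,      /* aborts the transaction and frees its memory */
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("out of memory"),
                     errdetail("Failed on request of size %zu.", size)));
        return ret;
    }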

Thanks,

Stephen




Re: support for MERGE

2022-05-18 Thread Alvaro Herrera
On 2022-May-11, Justin Pryzby wrote:

> I suggest to reference the mvcc docs from the merge docs, or to make the merge
> docs themselves include the referenced information.
> 
> diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
> index f68aa09736c..99dd5814f36 100644
> --- a/doc/src/sgml/ref/merge.sgml
> +++ b/doc/src/sgml/ref/merge.sgml
> @@ -544,6 +544,7 @@ MERGE total_count
> UPDATE if a concurrent INSERT
> occurs.  There are a variety of differences and restrictions between
> the two statement types and they are not interchangeable.
> +   See <xref linkend="mvcc"/> for more information.

Reading the paragraph, I think it may be better to do it the other way
around: first mention that concurrency is described in the MVCC page,
then explain that INSERT ON CONFLICT also exists.  What do you think of
the attached?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
From 5bad2c5a4bcd2bf04751f980f947c1050aeafd03 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 18 May 2022 18:41:04 +0200
Subject: [PATCH v2] Link to MVCC docs in MERGE docs.

---
 doc/src/sgml/mvcc.sgml  | 2 +-
 doc/src/sgml/ref/merge.sgml | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 341fea524a..4446e1c484 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -425,7 +425,7 @@ COMMIT;

 MERGE allows the user to specify various
 combinations of INSERT, UPDATE
-or DELETE subcommands. A MERGE
+and DELETE subcommands. A MERGE
 command with both INSERT and UPDATE
 subcommands looks similar to INSERT with an
 ON CONFLICT DO UPDATE clause but does not
diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
index f68aa09736..6b94c863b5 100644
--- a/doc/src/sgml/ref/merge.sgml
+++ b/doc/src/sgml/ref/merge.sgml
@@ -539,6 +539,8 @@ MERGE total_count
   
 
   
+   See <xref linkend="mvcc"/> for more details on the behavior of
+   MERGE under concurrency.
You may also wish to consider using INSERT ... ON CONFLICT
as an alternative statement which offers the ability to run an
UPDATE if a concurrent INSERT
-- 
2.30.2



Re: support for MERGE

2022-05-18 Thread Alvaro Herrera
On 2022-May-18, Justin Pryzby wrote:

> On Wed, May 11, 2022 at 11:33:50AM -0500, Justin Pryzby wrote:
> 
> > That's separate from the question about eliding zeros.
> 
> Actually, the existing uses suggest that these *aren't* separate.
> 
> The cases where 0 output is elided (like Heap Blocks and Buffers) uses "=" and
> not ":".  The cases using ":" always show all fields (Sort Method, Buckets,
> Hits, Batches).  I'm not sure which is preferable for MERGE, but here's one
> way.

I chose the other way :-)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: create_help.pl treats <literal> as replaceable

2022-05-18 Thread Tom Lane
Peter Eisentraut  writes:
> On 17.05.22 17:09, Tom Lane wrote:
>> Isn't that a documentation bug rather than a problem with create_help?

> Yeah, there is no need for a <literal> inside a <replaceable>.  So I just
> removed it.

I think you should have upper-cased MATCH while at it, to make it clear
that it acts like a keyword in this context.  The current situation is
quite unreadable in plain-ASCII output:

regression=# \help copy
Command: COPY
...
HEADER [ boolean | match ]
...

Since "boolean" is a metasyntactic variable here, it's absolutely
not obvious that "match" isn't.

regards, tom lane




Re: create_help.pl treats <literal> as replaceable

2022-05-18 Thread Peter Eisentraut

On 18.05.22 02:58, Kyotaro Horiguchi wrote:

Oh, agreed. Thanks for the correction. By the way the error message in
defGetCopyHeaderChoice is as follows.

"%s requires a Boolean value or \"match\""

Should it be "%s requires a boolean value or MATCH"?


The documentation of COPY currently appears to use the capitalization

OPTION value

so I left it lower-case.


At least I think "Boolean" should be un-capitalized.


"Boolean" is correct; see  for 
example.





Re: create_help.pl treats <literal> as replaceable

2022-05-18 Thread Peter Eisentraut

On 17.05.22 17:09, Tom Lane wrote:

By a quick look it seems to me that the "match" in "COPY.. HEADER
match" is the first and only instance of a literal parameter as of
PG15.

Isn't that a documentation bug rather than a problem with create_help?


Yeah, there is no need for a <literal> inside a <replaceable>.  So I just 
removed it.





Re: Zstandard support for toast compression

2022-05-18 Thread Robert Haas
On Tue, May 17, 2022 at 4:12 PM Stephen Frost  wrote:
> I'm getting a bit of deja-vu here from when I was first trying to add
> TRUNCATE as a GRANT'able option and being told we didn't want to burn
> those precious bits.

Right, it's the same issue ... although in that case there are a lot
more bits available than we have here.

> But, fine, then I'd suggest to Michael that he work on actively solving
> the problem we've now got where we have such a limited number of bits,
> and then come back and add Zstd after that's done.  I disagree that we
> should be pushing back so hard on adding Zstd in general, but if we are
> going to demand that we have a way to support more than these few
> compression options before ever adding any new ones (considering how
> long it's taken Zstd to get to the level it is now, we're talking
> about close to a *decade* from such a new algorithm showing up and
> getting to a similar level of adoption, and then apparently more because
> we don't feel it's 'ready' yet), then let's work towards that and not
> complain when it shows up that it's not needed yet (as I fear would
> happen ... and just leave us unable to make useful progress).

It's kind of ridiculous to talk about "pushing back so hard on adding
Zstd in general" when there's like 2 emails expressing only moderate
skepticism. I clearly said I wasn't 100% against it.

But I want to point out here that you haven't really offered any kind
of argument in favor of supporting Zstd. You basically seem to just be
arguing that it's dumb to worry about running out of bit space, and I
think that's just obviously false. PostgreSQL is full of things that
are hard to improve because nearly all of the bit space was gobbled up
early on, and there's not much left for future features. The heap
tuple header format is an excellent example of this. Surely if we were
designing that over again today we wouldn't have expended some of
those bits on the things we did.

I do understand that Zstandard is a good compression algorithm, and if
we had an extensibility mechanism here where one of the four possible
bit patterns then indicates that the next byte (or two or four) stores
the real algorithm type, then what about adding Zstandard that way
instead of consuming one of our four primary bit patterns? That way
we'd have this option for people who want it, but we'd have more
options for the future instead of fewer.

i.e. something like:

00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte (1 of 256 possible
values reserved for Zstandard)

I wouldn't be worried about getting backed into a corner with that approach.
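
To make the encoding concrete, a decoder for such a scheme might look like
this -- a hypothetical sketch only; the names and the exact header layout
are invented for illustration, not taken from any proposed patch:

    typedef enum ToastCompressionId
    {
        TOAST_PGLZ = 0,             /* bits 00 */
        TOAST_LZ4 = 1,              /* bits 01 */
        TOAST_RESERVED = 2,         /* bits 10, kept for future emergencies */
        TOAST_EXTENDED = 3          /* bits 11, real method id in next byte */
    } ToastCompressionId;

    /* Hypothetical: extract the compression method id from a datum header. */
    static int
    toast_get_method(const unsigned char *hdr)
    {
        int     id = hdr[0] >> 6;   /* top two bits */

        if (id != TOAST_EXTENDED)
            return id;
        /* extended space: 256 more ids, one of which could be zstd */
        return 256 + hdr[1];
    }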

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Valgrind mem-check for postgres extension

2022-05-18 Thread Andrew Dunstan


On 2022-05-18 We 01:12, Tom Lane wrote:
> Natarajan R  writes:
>> I have few doubts in here,
>> 1. When I run with *--leak-check=full*, I get memory leaks for postgres
>> functions under possibly or definitely lost categories.. Is this expected?
> Maybe ... you did not show your test case, so it's hard to say.  But it
> could well be that this is an artifact of failing to define USE_VALGRIND.
>
>> 2. Is there any other way to test my extension memory leaks alone, because
>> combining with postgres leaks is making instrumentation complex?..
> No, not really.
>
>> 3. I have seen some macros for valgrind support within postgres source code
>> under utils/memdebug.h, but couldn't get complete idea of using it from the
>> comments in pg_config_manual.h under *USE_VALGRIND *macro, pls provide some
>> guidance here..
> If you didn't build the core code with USE_VALGRIND defined, then none of
> this stuff is going to work ideally.
>
> The way I like to do it is to run configure, and then manually add
> "#define USE_VALGRIND" to the generated src/include/pg_config.h
> file before invoking "make".  Probably other people have different
> habits.
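
For concreteness, the manual edit described above amounts to one line in
the generated header (a sketch of that approach, not a recommendation):

    /* src/include/pg_config.h -- added by hand after running configure */
    #define USE_VALGRIND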


The standard buildfarm config uses these for valgrind builds:

    CFLAGS   => "-fno-omit-frame-pointer -O0 -fPIC",
    CPPFLAGS => "-DUSE_VALGRIND  -DRELCACHE_FORCE_RELEASE",

cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Limiting memory allocation

2022-05-18 Thread Jan Wieck

On 5/18/22 11:11, Alvaro Herrera wrote:

On 2022-May-18, Jan Wieck wrote:


Maybe I'm missing something, but what is it that you would actually consider
a solution? Knowing your current memory consumption doesn't make the need
for allocating some right now go away. What do you envision the response of
PostgreSQL to be if we had that information about resource pressure?


What was mentioned in the talk where this issue was presented, is that
people would like malloc() to return NULL when there's memory pressure,
even if Linux has been configured indicating that memory overcommit is
OK.  The reason they can't set overcommit off is that it prevents other
services in the same system from running properly.


Thank you Alvaro, that was the missing piece. Now I understand what we 
are trying to do.



As I understand, setrlimit() sets the memory limit for any single
process.  But that isn't useful -- the limit needed is for the whole set
of processes under postmaster.  Limiting any individual process does no
good.

Now that's where cgroup's memory limiting features would prove useful,
if they weren't totally braindead:
https://www.kernel.org/doc/Documentation/cgroup-v2.txt
Apparently, if the cgroup goes over the "high" limit, the processes are
*throttled*.  Then if the group goes over the "max" limit, OOM-killer is
invoked.

(I can't see any way to make this even more counterproductive to the
database use case.  Making the database work more slowly doesn't fix
anything.)

So ditch cgroups.


Agreed.


What they (Timescale) do, is have a LD_PRELOAD library that checks
status of memory pressure, and return NULL from malloc().  This then
leads to clean abort of transactions and all is well.  There's nothing
that Postgres needs to do different than today.

I suppose that what they would like, is a way to inquire into the memory
pressure status at MemoryContextAlloc() time and return NULL if it is
too high.  How exactly this would work is unclear to me; maybe one
process keeps an eye on it in an OS-specific manner, and if it does get
near the maximum, set a bit in shared memory that other processes can
examine when MemoryContextAlloc is called.  It doesn't have to be
exactly accurate; an approximation is probably okay.


Correct, it doesn't have to be accurate. Something /proc based setting a 
flag in shared memory WOULD be good enough, IF MemoryContextAlloc() had 
some way of figuring out that its process is actually the right one to 
abort.


On a high transaction throughput system, having such a background 
process be the only one setting and clearing a flag in shared memory 
could prove disastrous. Let it check and set/clear the flag every second 
... the whole system would throw malloc(3) failures for a whole second 
on every session. Not the system I would like to benchmark ... although 
the result charts would look hilarious.


However, once we are under memory pressure to the point of aborting 
transactions, it may be reasonable to have MemoryContextAlloc() calls 
work through a queue and return NULL one by one until the pressure is 
low enough again.


I'll roll this problem around in my head for a little longer. There 
certainly is a way to do this a bit more intelligently.
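
A minimal sketch of that division of labor, assuming a hypothetical monitor
process and a flag in shared memory (all the names here are invented for
illustration; this is not proposed code):

    #include "postgres.h"
    #include "port/atomics.h"

    /* Hypothetical: lives in shared memory, initialized at postmaster start. */
    typedef struct MemPressureShared
    {
        pg_atomic_uint32 under_pressure;
    } MemPressureShared;

    extern MemPressureShared *memPressure;

    /* Monitor process, once per polling interval: */
    void
    mem_pressure_update(bool pressured)
    {
        pg_atomic_write_u32(&memPressure->under_pressure, pressured ? 1 : 0);
    }

    /* Allocation path, before calling malloc(): */
    bool
    mem_pressure_should_fail(void)
    {
        return pg_atomic_read_u32(&memPressure->under_pressure) != 0;
    }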



Thanks again, Jan




Re: Limiting memory allocation

2022-05-18 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > On Tue, May 17, 2022 at 18:12 Tom Lane  wrote:
> >> ulimit might be interesting to check into as well.  The last time I
> >> looked, it wasn't too helpful for this on Linux, but that was years ago.
> 
> > Unfortunately I really don’t think anything here has materially changed in
> > a way which would help us.  This would also apply across all of PG’s
> > processes and I would think it’d be nice to differentiate between user
> > backends running away and sucking up a ton of memory vs backend processes
> > that shouldn’t be constrained in this way.
> 
> It may well be that they've not fixed its shortcomings, but the claim
> that it couldn't be applied selectively is nonsense.  See setrlimit(2),
> which we already use successfully (AFAIK) to set stack space on a
> per-process basis.

Yeah, that thought wasn't quite properly formed, sorry for the confusion.

That it's per-process is actually the issue, unless we were to split
up what we're given evenly across max_connections or such, which might
work but would surely end up wasting an unfortunate amount of memory.

Consider:

shared_buffers = 8G
max_memory = 8G
max_connections = 1000 (for easy math)

With setrlimit(2), we could at process start of all user backends set
RLIMIT_AS to 8G + 8G/1000 (8M) + some fudge for code, stack, etc, 
meaning each process would only be allowed about 8M of memory for
work space, even though there's perhaps only 10 processes running,
resulting in over 7G of memory that PG should be able to use, but isn't.
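
For reference, the per-process knob itself is trivial; it's the policy on
top of it that is hard. A sketch using the standard interface, with a
made-up limit value:

    #include <sys/resource.h>

    /* Cap this process's address space (error handling elided). */
    static int
    set_process_memory_cap(rlim_t bytes)
    {
        struct rlimit rl;

        rl.rlim_cur = bytes;    /* soft limit: allocations start failing here */
        rl.rlim_max = bytes;    /* hard limit */
        return setrlimit(RLIMIT_AS, &rl);
    }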

Maybe we could do some tracking of per-process actual memory usage of
already running processes and consider that when starting new ones and
even allow processes to change their limit if they hit it, depending on
what else is going on in the system, but I'm really not sure that all of
this would end up being that much more efficient than just directly
tracking allocations and failing when we hit them ourselves, and it sure
seems like it'd be a lot more complicated.

Thanks,

Stephen




Re: Limiting memory allocation

2022-05-18 Thread Alvaro Herrera
On 2022-May-18, Jan Wieck wrote:

> Maybe I'm missing something, but what is it that you would actually consider
> a solution? Knowing your current memory consumption doesn't make the need
> for allocating some right now go away. What do you envision the response of
> PostgreSQL to be if we had that information about resource pressure?

What was mentioned in the talk where this issue was presented, is that
people would like malloc() to return NULL when there's memory pressure,
even if Linux has been configured indicating that memory overcommit is
OK.  The reason they can't set overcommit off is that it prevents other
services in the same system from running properly.

As I understand, setrlimit() sets the memory limit for any single
process.  But that isn't useful -- the limit needed is for the whole set
of processes under postmaster.  Limiting any individual process does no
good.

Now that's where cgroup's memory limiting features would prove useful,
if they weren't totally braindead:
https://www.kernel.org/doc/Documentation/cgroup-v2.txt
Apparently, if the cgroup goes over the "high" limit, the processes are
*throttled*.  Then if the group goes over the "max" limit, OOM-killer is
invoked.

(I can't see any way to make this even more counterproductive to the
database use case.  Making the database work more slowly doesn't fix
anything.)

So ditch cgroups.


What they (Timescale) do, is have a LD_PRELOAD library that checks
status of memory pressure, and return NULL from malloc().  This then
leads to clean abort of transactions and all is well.  There's nothing
that Postgres needs to do different than today.
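
The general shape of such a preload shim, as a sketch assuming glibc and
dlsym(RTLD_NEXT) -- this is not the actual library referred to above, and a
real shim must also guard against dlsym() itself allocating:

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stddef.h>

    static void *(*real_malloc) (size_t) = NULL;

    /* Placeholder for whatever pressure check the shim performs. */
    static int
    under_memory_pressure(void)
    {
        return 0;
    }

    void *
    malloc(size_t size)
    {
        if (real_malloc == NULL)
            real_malloc = (void *(*) (size_t)) dlsym(RTLD_NEXT, "malloc");

        if (under_memory_pressure())
            return NULL;        /* caller sees an allocation failure */

        return real_malloc(size);
    }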

I suppose that what they would like, is a way to inquire into the memory
pressure status at MemoryContextAlloc() time and return NULL if it is
too high.  How exactly this would work is unclear to me; maybe one
process keeps an eye on it in an OS-specific manner, and if it does get
near the maximum, set a bit in shared memory that other processes can
examine when MemoryContextAlloc is called.  It doesn't have to be
exactly accurate; an approximation is probably okay.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Limiting memory allocation

2022-05-18 Thread Ronan Dunklau
On Wednesday, May 18, 2022 at 16:23:34 CEST, Jan Wieck wrote:
> On 5/17/22 18:30, Stephen Frost wrote:
> > Greetings,
> > 
> > On Tue, May 17, 2022 at 18:12 Tom Lane wrote:
> > Jan Wieck <j...@wi3ck.info> writes:
> >  > On 5/17/22 15:42, Stephen Frost wrote:
> >  >> Thoughts?
> >  > 
> >  > Using cgroups one can actually force a certain process (or user, or
> >  > service) to use swap if and when that service is using more
> > 
> > memory than
> > 
> >  > it was "expected" to use.
> > 
> > I wonder if we shouldn't just provide documentation pointing to
> > OS-level
> > facilities like that one.  The kernel has a pretty trivial way to
> > check
> > the total memory used by a process.  We don't: it'd require tracking
> > total
> > space used in all our memory contexts, and then extracting some
> > number out
> > of our rear ends for allocations made directly from malloc.  In short,
> > anything we do here will be slow and unreliable, unless you want to
> > depend
> > on platform-specific things like looking at /proc/self/maps.
> > 
> > This isn’t actually a solution though and that’s the problem- you end up
> > using swap but if you use more than “expected” the OOM killer comes in
> > and happily blows you up anyway. Cgroups are containers and exactly what
> > kube is doing.
> 
> Maybe I'm missing something, but what is it that you would actually
> consider a solution? Knowing your current memory consumption doesn't
> make the need for allocating some right now go away. What do you
> envision the response of PostgreSQL to be if we had that information
> about resource pressure? I don't see us using mallopt(3) or
> malloc_trim(3) anywhere in the code, so I don't think any of our
> processes give back unused heap at this point (please correct me if I'm
> wrong). This means that even if we knew about the memory pressure of the
> system, adjusting things like work_mem on the fly may not do much at
> all, unless there is a constant turnover of backends.

I'm not sure I understand your point: when we free() a pointer, malloc is 
allowed to release the corresponding memory to the kernel. In the case of 
glibc, it doesn't necessarily do so, but it trims the top of the heap if it is 
in excess of M_TRIM_THRESHOLD. In the default glibc configuration, this 
parameter is dynamically adjusted by malloc itself, to a maximum value of 64MB 
IIRC. So any memory freed on the top of the heap totalling more than that 
threshold ends up actually freed.

In another thread, I proposed to take control over this tuning instead of 
letting malloc do it itself, as we may have better knowledge of the memory 
allocations pattern than what malloc empirically discovers: in particular, we 
could lower work_mem, adjust the threshold and maybe even call malloc_trim 
ourselves when work_mem is lowered, to reduce the padding we may keep.
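
The knobs in question are glibc-specific; a sketch with arbitrary values:

    #include <malloc.h>

    static void
    tune_glibc_malloc(void)
    {
        /* Pin the trim threshold instead of letting glibc auto-tune it. */
        mallopt(M_TRIM_THRESHOLD, 4 * 1024 * 1024);
    }

    static void
    release_retained_heap(void)
    {
        /* Return as much of the heap top to the kernel as possible. */
        malloc_trim(0);
    }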

> 
> So what do you propose PostgreSQL's response to high memory pressure to be?
> 
> 
> Regards, Jan


-- 
Ronan Dunklau






Re: Limiting memory allocation

2022-05-18 Thread Jan Wieck

On 5/17/22 18:30, Stephen Frost wrote:

Greetings,

On Tue, May 17, 2022 at 18:12 Tom Lane wrote:


Jan Wieck <j...@wi3ck.info> writes:
 > On 5/17/22 15:42, Stephen Frost wrote:
 >> Thoughts?

 > Using cgroups one can actually force a certain process (or user, or
 > service) to use swap if and when that service is using more
memory than
 > it was "expected" to use.

I wonder if we shouldn't just provide documentation pointing to OS-level
facilities like that one.  The kernel has a pretty trivial way to check
the total memory used by a process.  We don't: it'd require tracking
total
space used in all our memory contexts, and then extracting some
number out
of our rear ends for allocations made directly from malloc.  In short,
anything we do here will be slow and unreliable, unless you want to
depend
on platform-specific things like looking at /proc/self/maps.


This isn’t actually a solution though and that’s the problem- you end up 
using swap but if you use more than “expected” the OOM killer comes in 
and happily blows you up anyway. Cgroups are containers and exactly what 
kube is doing.


Maybe I'm missing something, but what is it that you would actually 
consider a solution? Knowing your current memory consumption doesn't 
make the need for allocating some right now go away. What do you 
envision the response of PostgreSQL to be if we had that information 
about resource pressure? I don't see us using mallopt(3) or 
malloc_trim(3) anywhere in the code, so I don't think any of our 
processes give back unused heap at this point (please correct me if I'm 
wrong). This means that even if we knew about the memory pressure of the 
system, adjusting things like work_mem on the fly may not do much at 
all, unless there is a constant turnover of backends.


So what do you propose PostgreSQL's response to high memory pressure to be?


Regards, Jan




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-05-18 Thread Ranier Vilela
Em qua., 18 de mai. de 2022 às 10:52, Peter Eisentraut <
peter.eisentr...@enterprisedb.com> escreveu:

> On 18.05.22 01:18, Justin Pryzby wrote:
> > Take the first one as an example.  It says:
> >
> >  GenericCosts costs;
>  MemSet(&costs, 0, sizeof(costs));
> >
> > You sent a patch to change it to sizeof(GenericCosts).
> >
> > But it's not a pointer, so they are the same.
>
> This instance can more easily be written as
>
>  costs = {0};
>
That would initialize the content at compilation and not at runtime,
correct?
And we would avoid MemSet/memset altogether.

There are a lot of cases using MemSet (with struct variables), and on
64-bit Windows, long is 4 (four) bytes.
So I believe that MemSet is less efficient on Windows than on Linux.
"The size of the '_vstart' buffer is not a multiple of the element size of
the type 'long'." is the message from the PVS-Studio static analysis tool.

regards,
Ranier Vilela


Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-05-18 Thread Peter Eisentraut

On 18.05.22 01:18, Justin Pryzby wrote:

Take the first one as an example.  It says:

 GenericCosts costs;
 MemSet(&costs, 0, sizeof(costs));

You sent a patch to change it to sizeof(GenericCosts).

But it's not a pointer, so they are the same.


This instance can more easily be written as

costs = {0};
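
Strictly speaking that works as an initializer at the point of declaration;
as a plain assignment, C99 needs a compound literal. A fragment showing
both forms:

    GenericCosts costs = {0};           /* every member zero-initialized */

    /* ...or, after the declaration, as a statement: */
    costs = (GenericCosts) {0};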




Re: Minor improvements to test log navigability

2022-05-18 Thread Peter Eisentraut

On 16.05.22 01:01, Thomas Munro wrote:

2.  The TAP test logs are strangely named.  Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?


I'm in favor of a saner name, but wouldn't something.log be confusable 
with a server log?  Maybe something.out would be clearer.  Or 
something_output.log, if the .log suffix is somehow desirable.






Re: support for MERGE

2022-05-18 Thread Justin Pryzby
On Wed, May 11, 2022 at 11:33:50AM -0500, Justin Pryzby wrote:
> I suggest to reference the mvcc docs from the merge docs, or to make the merge
> docs themselves include the referenced information.

No comments here ?

> Also, EXPLAIN output currently looks like this:
> 
> | Merge on ex_mtarget t (actual rows=0 loops=1)
> |   Tuples Inserted: 0
> |   Tuples Updated: 50
> |   Tuples Deleted: 0
> |   Tuples Skipped: 0
> 
> Should the "zero" rows be elided from the text output ?
> And/or, should it use a more compact output format ?
> 
> There are two output formats already in use, so the options would look like
> this:
> 
>Tuples: Inserted: 1  Updated: 2  Deleted: 3  Skipped: 4
> or
>Tuples: inserted=1 updated=2 deleted=3 skipped=4
> 
> Note double spaces and capitals.
> That's separate from the question about eliding zeros.

Actually, the existing uses suggest that these *aren't* separate.

The cases where 0 output is elided (like Heap Blocks and Buffers) uses "=" and
not ":".  The cases using ":" always show all fields (Sort Method, Buckets,
Hits, Batches).  I'm not sure which is preferable for MERGE, but here's one
way.

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index b902ef30c87..491105263ff 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -4119,10 +4119,20 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 			skipped_path = total - insert_path - update_path - delete_path;
 			Assert(skipped_path >= 0);
 
-			ExplainPropertyFloat("Tuples Inserted", NULL, insert_path, 0, es);
-			ExplainPropertyFloat("Tuples Updated", NULL, update_path, 0, es);
-			ExplainPropertyFloat("Tuples Deleted", NULL, delete_path, 0, es);
-			ExplainPropertyFloat("Tuples Skipped", NULL, skipped_path, 0, es);
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+						"Tuples Inserted: %.0f  Updated: %.0f  Deleted: %.0f  Skipped: %.0f\n",
+						insert_path, update_path, delete_path, skipped_path);
+			}
+			else
+			{
+				ExplainPropertyFloat("Tuples Inserted", NULL, insert_path, 0, es);
+				ExplainPropertyFloat("Tuples Updated", NULL, update_path, 0, es);
+				ExplainPropertyFloat("Tuples Deleted", NULL, delete_path, 0, es);
+				ExplainPropertyFloat("Tuples Skipped", NULL, skipped_path, 0, es);
+			}
 		}
 }
 




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-05-18 Thread Ranier Vilela
Em qua., 18 de mai. de 2022 às 05:54, Alvaro Herrera <
alvhe...@alvh.no-ip.org> escreveu:

> This one caught my attention:
>
> diff --git a/contrib/pgcrypto/crypt-blowfish.c
> b/contrib/pgcrypto/crypt-blowfish.c
> index a663852ccf..63fcef562d 100644
> --- a/contrib/pgcrypto/crypt-blowfish.c
> +++ b/contrib/pgcrypto/crypt-blowfish.c
> @@ -750,7 +750,7 @@ _crypt_blowfish_rn(const char *key, const char
> *setting,
>  /* Overwrite the most obvious sensitive data we have on the stack. Note
>   * that this does not guarantee there's no sensitive data left on the
>   * stack and/or in registers; I'm not aware of portable code that does. */
> -   px_memset(&data, 0, sizeof(data));
> +   px_memset(&data, 0, sizeof(struct data));
>
> return output;
>  }
>
> The curious thing here is that sizeof(data) is correct, because it
> refers to a variable defined earlier in that function, whose type is an
> anonymous struct declared there.  But I don't know what "struct data"
> refers to, precisely because that struct is unnamed.  Am I misreading it?
>
 No, you are right.
This is definitely wrong.


>
> Also:
>
> diff --git a/contrib/pgstattuple/pgstatindex.c
> b/contrib/pgstattuple/pgstatindex.c
> index e1048e47ff..87be62f023 100644
> --- a/contrib/pgstattuple/pgstatindex.c
> +++ b/contrib/pgstattuple/pgstatindex.c
> @@ -601,7 +601,7 @@ pgstathashindex(PG_FUNCTION_ARGS)
>  errmsg("cannot access temporary indexes
> of other sessions")));
>
> /* Get the information we need from the metapage. */
> -   memset(&stats, 0, sizeof(stats));
> +   memset(&stats, 0, sizeof(HashIndexStat));
> metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ,
> LH_META_PAGE);
> metap = HashPageGetMeta(BufferGetPage(metabuf));
> stats.version = metap->hashm_version;
>
> I think the working theory here is that the original line is correct
> now, and it continues to be correct if somebody edits the function and
> makes variable 'stats' be of a different type.  But if you change the
> sizeof() to use the type name, then there are two places that you need
> to edit, and they are not necessarily close together; so it is correct
> now and could become a bug in the future.  I don't think we're fully
> consistent about this, but I think you're proposing to change it in the
> opposite direction that we'd prefer.
>
Yes. I think the only advantage of using the structure name is that
when you read the MemSet line, you know what type is being filled.


> For the case where the variable is a pointer, the developer could write
> 'sizeof(*variable)' instead of being forced to specify the type name,
> for example (just a random one):
>
I could have used this style to make the patch.
But the intention was to correct a possible misinterpretation,
which in this case showed that I was totally wrong.

Sorry for the noise.
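
For the record, the style in question keeps sizeof tied to the variable
rather than to a type name; an illustrative fragment with stand-in names:

    SomeStruct *ptr = palloc(sizeof(*ptr));     /* 'SomeStruct' is a stand-in */

    memset(ptr, 0, sizeof(*ptr));   /* no type name repeated; survives a type change */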

regards,
Ranier Vilela


Re: Handle infinite recursion in logical replication setup

2022-05-18 Thread Amit Kapila
On Wed, May 18, 2022 at 10:29 AM Amit Kapila  wrote:
>
> On Wed, May 4, 2022 at 12:17 PM vignesh C  wrote:
> >
> > Thanks for the comments, the attached v13 patch has the changes for the 
> > same.
> >
>
> Few comments on v13-0001
> ==
>

Few comments on v13-0002
===
1.
The steps
+ to create a two-node bidirectional replication are given below:
+   

The steps given after this will be successful only when there is no
data in any of the nodes and that is not clear by reading docs.

2.
+  
+   Adding a new node when data is present in the existing nodes
+
+ Adding a new node node3 to the existing
+ node1 and node2 when data is present
+ in existing nodes node1 and node2
+ needs similar steps. The only change required here is that
+ node3 should create a subscription with
+ copy_data = force to one of the existing nodes to
+ receive the existing data during initial data synchronization.
+   

I think the steps for these require the user to lock the required
tables/database (or in some other way hold the operations on required
tables) for node-2 till the time setup is complete, otherwise, node-3
might miss some data. It seems the same is also missing in the
section: "Adding a new node when data is present in the new node".

3.
+
+  
+   Adding a new node when data is present in the new node
...
...

+   
+Create a subscription in node1 to subscribe to
+node3. Use copy_data specified as
+force so that the existing table data is copied during
+initial sync:
+
+node1=# CREATE SUBSCRIPTION sub_node1_node3
+node1-# CONNECTION 'dbname=foo host=node3 user=repuser'
+node1-# PUBLICATION pub_node3
+node1-# WITH (copy_data = force, local_only = on);
+CREATE SUBSCRIPTION
+
+
+   
+Create a subscription in node2 to subscribe to
+node3. Use copy_data specified as
+force so that the existing table data is copied during
+initial sync:
+
+node2=# CREATE SUBSCRIPTION sub_node2_node3
+node2-# CONNECTION 'dbname=foo host=node3 user=repuser'
+node2-# PUBLICATION pub_node3
+node2-# WITH (copy_data = force, local_only = on);
+CREATE SUBSCRIPTION
+

Why do we need to use "copy_data = force" here? AFAIU, unless, we
create any subscription on node-3, we don't need the 'force' option.

4. We should have a generic section to explain how users can add a new
node using the new options to the existing set of nodes in all cases.
For example, say when the existing set of nodes has some data and the
new node also has some pre-existing data. I think the basic steps are
something like: a. create a required publication(s) on the new node.
(b) create subscriptions on existing nodes pointing to publication on
the new node with the local_only option as true and copy_data = on.
(c) wait for data to be copied from the new node to existing nodes.
(d) Truncate the data on the new node. (e) create subscriptions
corresponding to each of the existing publisher nodes on the new node
with local_only as true and copy_data = force for one of the nodes.
(f) One needs to ensure that there is no new activity on required
tables/database (either by locking or in some other way) in all the
nodes for which copy_data option is kept as false while creating
subscriptions in the previous step to avoid any data loss.

We also need to mention in Notes that, as all these operations are not
transactional, the user is advised to take a backup of the existing data to
avoid any inconsistency.

5.
 * It is quite possible that subscriber has not yet pulled data to
+ * the tables, but in ideal cases the table data will be subscribed.
+ * To keep the code simple it is not checked if the subscriber table
+ * has pulled the data or not.
+ */
+ if (copydata == COPY_DATA_ON && local_only && !slot_attisnull(slot, 3))

Sorry, but I don't understand what you intend to say by the above
comment. Can you please explain?

6. I feel we can explain the case with only two nodes in the commit
message, why do we need to use a three-node case?

-- 
With Regards,
Amit Kapila.




Re: Privileges on PUBLICATION

2022-05-18 Thread Antonin Houska
Euler Taveira  wrote:

> On Fri, May 13, 2022, at 3:36 AM, Antonin Houska wrote:
> 
>  Attached is my proposal. It tries to be more specific and does not mention 
> the
>  absence of the privileges explicitly.
> 
> You explained the current issue but say nothing about the limitation. This
> information will trigger a question possibly in one of the MLs. IMO if you say
> something like the sentence above at the end, it will make it clear why that
> setup expose all data (there is no access control to publications) and
> explicitly say there is a TODO here.
> 
> Additional privileges might be added to control access to table data in a
> future version of PostgreSQL.

I thought it sounded too negative if the absence of some feature was mentioned
explicitly. However, it makes sense to be clear from a technical point of view.

> I also wouldn't use the warning tag because it fits in the same category as 
> the
> other restrictions listed in the page.

ok, please see the next version.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index 1a828e8d2ff..259fe20a148 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -112,6 +112,17 @@ CREATE PUBLICATION name
   Specifying a table that is part of a schema specified by
   FOR ALL TABLES IN SCHEMA is not supported.
  
+
+ 
+  Note that there are currently no privileges on publication, and that any
+  subscriber can access any publication. Thus if you're trying to hide
+  some information from particular subscribers (by using the
+  WHERE clause or the column list, or by not adding the
+  whole table to the publication), please be aware that other publications
+  can expose the same information. Publication privileges might be added
+  to PostgreSQL in the future to allow for
+  fine-grained access control.
+ 
 

 


Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-05-18 Thread Michael Paquier
On Wed, May 18, 2022 at 01:03:15AM -0700, Noah Misch wrote:
> On Mon, May 16, 2022 at 02:30:00PM +0900, Michael Paquier wrote:
>> Because the shape of the new names does not change the test coverage
>> ("regression" prefix or the addition of the double quotes with
>> backslashes for all the database names), while keeping the code a bit
>> simpler.  If you think that the older names are more adapted, I have
>> no objections to use them, FWIW, which is something like the patch
>> attached would achieve.
>> 
>> This uses the same convention as vcregress.pl before 322becb, but not
>> the one of test.sh where "regression" was appended to the database
>> names.
> 
> I would have picked the test.sh names, both because test.sh was the senior
> implementation and because doing so avoids warnings under
> -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS.  See the warnings here:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=longfin=2022-05-18%2000%3A59%3A35=pg_upgrade-check

Yes, I saw that.  This did not bother me much as the TAP tests run in
isolation, but I am fine to stick to your option and silence these.

> More-notable line from that same log:
> sh: 
> /Users/buildfarm/bf-data/HEAD/pgsql.build/src/bin/pg_upgrade/../../../src/test/regress/pg_regress--port=5678:
>  No such file or directory

So you are using EXTRA_REGRESS_OPTS, then; a space is missing in the
first argument of the command used to make that work properly.

> Commit 7dd3ee5 adopted much of the 027_stream_regress.pl approach to running
> pg_regress, but it didn't grab the "is($rc, 0, 'regression tests pass')"
> needed to make defects like that report a failure.

Okay, added this one.

>> +generate_db($oldnode, "\\\"\\", 1,  45,  "\"\\");
>> +generate_db($oldnode, '',   46, 90,  '');
>> +generate_db($oldnode, '',   91, 127, '');
> 
> Does this pass on Windows?  I'm 65% confident that released IPC::Run can't
> handle this input due to https://github.com/toddr/IPC-Run/issues/142.  If it's
> passing for you on Windows, then disregard.

Hmm.  The CI has been passing for me with this name pattern in place,
as of https://github.com/michaelpq/postgres/tree/upgrade_tap_fixes.

Attached is an updated patch to address your concerns.
--
Michael
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 8372a85e6e..86fe1b4d09 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -13,18 +13,16 @@ use Test::More;
 # Generate a database with a name made of a range of ASCII characters.
 sub generate_db
 {
-	my ($node, $from_char, $to_char) = @_;
+	my ($node, $prefix, $from_char, $to_char, $suffix) = @_;
 
-	my $dbname = '';
+	my $dbname = $prefix;
 	for my $i ($from_char .. $to_char)
 	{
 		next if $i == 7 || $i == 10 || $i == 13;# skip BEL, LF, and CR
 		$dbname = $dbname . sprintf('%c', $i);
 	}
 
-	# Exercise backslashes adjacent to double quotes, a Windows special
-	# case.
-	$dbname = '\\"\\' . $dbname . '"\\';
+	$dbname .= $suffix;
 	$node->command_ok([ 'createdb', $dbname ]);
 }
 
@@ -79,10 +77,12 @@ else
 {
 	# Default is to use pg_regress to set up the old instance.
 
-	# Create databases with names covering most ASCII bytes
-	generate_db($oldnode, 1,  45);
-	generate_db($oldnode, 46, 90);
-	generate_db($oldnode, 91, 127);
+	# Create databases with names covering most ASCII bytes.  The
+	# first name exercises backslashes adjacent to double quotes, a
+	# Windows special case.
+	generate_db($oldnode, 'regression\\"\\', 1,  45,   '"\\');
+	generate_db($oldnode, 'regression',  46, 90,  '');
+	generate_db($oldnode, 'regression',  91, 127, '');
 
 	# Grab any regression options that may be passed down by caller.
 	my $extra_opts = $ENV{EXTRA_REGRESS_OPTS} || "";
@@ -99,7 +99,7 @@ else
 
 	my $rc =
 	  system($ENV{PG_REGRESS}
-		  . "$extra_opts "
+		  . " $extra_opts "
 		  . "--dlpath=\"$dlpath\" "
 		  . "--bindir= "
 		  . "--host="
@@ -121,6 +121,7 @@ else
 			print "=== EOF ===\n";
 		}
 	}
+	is($rc, 0, 'regression tests pass');
 }
 
 # Before dumping, get rid of objects not existing or not supported in later




Re: Privileges on PUBLICATION

2022-05-18 Thread Antonin Houska
Antonin Houska  wrote:

> Euler Taveira  wrote:
> 
> > On Tue, May 10, 2022, at 5:37 AM, Antonin Houska wrote:
> > 
> >  My understanding is that the rows/columns filtering is a way for the
> >  *publisher* to control which data is available to particular replica. From
> >  this point of view, the publication privileges would just make the control
> >  complete.
> > 
> > I agree. IMO it is a new feature. We already require high privilege for 
> > logical
> > replication. Hence, we expect the replication user to have access to all 
> > data.
> > Unfortunately, nobody mentioned about this requirement during the row 
> > filter /
> > column list development; someone could have written a patch for GRANT ... ON
> > PUBLICATION.
> 
> I can try that for PG 16, unless someone is already working on it.

The patch is attached to this message.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

From 4004260cf1f3443455b7db89a70a50e38d1663b0 Mon Sep 17 00:00:00 2001
From: Antonin Houska 
Date: Wed, 18 May 2022 10:58:42 +0200
Subject: [PATCH] Draft implementation of USAGE privilege on PUBLICATION.

This privilege seems to be useful for databases from which multiple
subscribers replicate the data. It should not be assumed that any publication
can be exposed to any subscription.
---
 doc/src/sgml/ddl.sgml   |   7 +
 doc/src/sgml/logical-replication.sgml   |   4 +-
 doc/src/sgml/ref/grant.sgml |   7 +-
 doc/src/sgml/ref/revoke.sgml|   7 +
 src/backend/catalog/aclchk.c| 206 
 src/backend/commands/copyto.c   | 114 +++
 src/backend/commands/publicationcmds.c  |   2 +
 src/backend/parser/gram.y   |   8 +
 src/backend/replication/pgoutput/pgoutput.c |  13 +-
 src/backend/utils/adt/acl.c |   7 +
 src/bin/pg_dump/dumputils.c |   2 +
 src/bin/pg_dump/pg_dump.c   |  28 ++-
 src/bin/pg_dump/pg_dump.h   |   1 +
 src/bin/psql/tab-complete.c |   3 +
 src/include/catalog/pg_publication.h|   6 +
 src/include/nodes/parsenodes.h  |   4 +-
 src/include/utils/acl.h |   4 +
 17 files changed, 413 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index f2ac1ba0034..dc499c50756 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1935,6 +1935,13 @@ REVOKE ALL ON accounts FROM PUBLIC;
statements that have previously performed this lookup, so this is not
a completely secure way to prevent object access.
   
+  
+   For publications, allows logical replication via a particular
+   publication. The user specified in
+   the CREATE
+   SUBSCRIPTION command must have this privilege on all
+   publications listed in that command.
+  
   
For sequences, allows use of the
currval and nextval functions.
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 145ea71d61b..63194b34541 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1156,7 +1156,9 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
   
In order to be able to copy the initial table data, the role used for the
replication connection must have the SELECT privilege on
-   a published table (or be a superuser).
+   a published table (or be a superuser). In addition, the role must have
+   the USAGE privilege on all the publications referenced
+   by the particular subscription.
   
 
   
diff --git a/doc/src/sgml/ref/grant.sgml b/doc/src/sgml/ref/grant.sgml
index f744b05b55d..95b9c46137f 100644
--- a/doc/src/sgml/ref/grant.sgml
+++ b/doc/src/sgml/ref/grant.sgml
@@ -82,6 +82,11 @@ GRANT { { SET | ALTER SYSTEM } [, ... ] | ALL [ PRIVILEGES ] }
 TO role_specification [, ...] [ WITH GRANT OPTION ]
 [ GRANTED BY role_specification ]
 
+GRANT { USAGE | ALL [ PRIVILEGES ] }
+ON PUBLICATION pub_name [, ...]
+TO role_specification [, ...] [ WITH GRANT OPTION ]
+[ GRANTED BY role_specification ]
+
 GRANT { { CREATE | USAGE } [, ...] | ALL [ PRIVILEGES ] }
 ON SCHEMA schema_name [, ...]
 TO role_specification [, ...] [ WITH GRANT OPTION ]
@@ -460,7 +465,7 @@ GRANT admins TO joe;

 

-Privileges on databases, tablespaces, schemas, languages, and
+Privileges on databases, tablespaces, schemas, languages, publications and
 configuration parameters are
 PostgreSQL extensions.

diff --git a/doc/src/sgml/ref/revoke.sgml b/doc/src/sgml/ref/revoke.sgml
index 62f19710369..c21ea4c9c69 100644
--- a/doc/src/sgml/ref/revoke.sgml
+++ b/doc/src/sgml/ref/revoke.sgml
@@ -104,6 +104,13 @@ REVOKE [ GRANT OPTION FOR ]
 [ GRANTED BY role_specification ]
 [ CASCADE | RESTRICT ]
 
+REVOKE [ GRANT OPTION FOR ]
+{ USAGE | ALL [ PRIVILEGES ] }
+ON 
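
For illustration, here is a sketch of how the proposed grammar above would
be exercised (the object names are hypothetical, and this privilege exists
only in the draft patch, not in any released PostgreSQL):

CREATE PUBLICATION sales_pub FOR TABLE sales;
GRANT USAGE ON PUBLICATION sales_pub TO replication_role;
REVOKE USAGE ON PUBLICATION sales_pub FROM replication_role;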

Re: Remove support for Visual Studio 2013

2022-05-18 Thread Michael Paquier
On Wed, May 18, 2022 at 10:06:50AM +0200, Juan José Santamaría Flecha wrote:
> Right now we are ifdefing that code out for MinGW, so it's not a visible
> issue, but it will be once we bump MIN_WINNT.

OK.  Thanks, got it.
--
Michael




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-05-18 Thread Alvaro Herrera
This one caught my attention:

diff --git a/contrib/pgcrypto/crypt-blowfish.c 
b/contrib/pgcrypto/crypt-blowfish.c
index a663852ccf..63fcef562d 100644
--- a/contrib/pgcrypto/crypt-blowfish.c
+++ b/contrib/pgcrypto/crypt-blowfish.c
@@ -750,7 +750,7 @@ _crypt_blowfish_rn(const char *key, const char *setting,
 /* Overwrite the most obvious sensitive data we have on the stack. Note
  * that this does not guarantee there's no sensitive data left on the
  * stack and/or in registers; I'm not aware of portable code that does. */
-   px_memset(, 0, sizeof(data));
+   px_memset(, 0, sizeof(struct data));
 
return output;
 }

The curious thing here is that sizeof(data) is correct, because it
refers to a variable defined earlier in that function, whose type is an
anonymous struct declared there.  But I don't know what "struct data"
refers to, precisely because that struct is unnamed.  Am I misreading it?


Also:

diff --git a/contrib/pgstattuple/pgstatindex.c 
b/contrib/pgstattuple/pgstatindex.c
index e1048e47ff..87be62f023 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -601,7 +601,7 @@ pgstathashindex(PG_FUNCTION_ARGS)
 errmsg("cannot access temporary indexes of 
other sessions")));
 
/* Get the information we need from the metapage. */
-   memset(, 0, sizeof(stats));
+   memset(, 0, sizeof(HashIndexStat));
metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
metap = HashPageGetMeta(BufferGetPage(metabuf));
stats.version = metap->hashm_version;

I think the working theory here is that the original line is correct
now, and it continues to be correct if somebody edits the function and
makes variable 'stats' be of a different type.  But if you change the
sizeof() to use the type name, then there are two places that you need
to edit, and they are not necessarily close together; so it is correct
now and could become a bug in the future.  I don't think we're fully
consistent about this, but I think you're proposing to change it in the
opposite direction to the one we'd prefer.

For the case where the variable is a pointer, the developer could write
'sizeof(*variable)' instead of being forced to specify the type name,
for example (just a random one):

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index a434cf93ef..e92c03686f 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -438,7 +438,7 @@ BloomFillMetapage(Relation index, Page metaPage)
 */
BloomInitPage(metaPage, BLOOM_META);
metadata = BloomPageGetMeta(metaPage);
-   memset(metadata, 0, sizeof(BloomMetaPageData));
+   memset(metadata, 0, sizeof(*metadata));
metadata->magickNumber = BLOOM_MAGICK_NUMBER;
metadata->opts = *opts;
((PageHeader) metaPage)->pd_lower += sizeof(BloomMetaPageData);
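
As a minimal self-contained illustration of the hazard described above (the
type and names here are hypothetical, not from the PostgreSQL tree):

#include <string.h>

typedef struct { int a; long b; } FooStats;

void
reset_stats(FooStats *stats)
{
    /* Fragile: correct today, but silently zeroes the wrong number of
     * bytes if the variable's type is later changed without updating
     * the type name here. */
    memset(stats, 0, sizeof(FooStats));

    /* Robust: always tracks the pointed-to variable's actual type. */
    memset(stats, 0, sizeof(*stats));
}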

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




RE: Data is copied twice when specifying both child and parent table in publication

2022-05-18 Thread wangw.f...@fujitsu.com
On Wed, May 18, 2022 4:38 PM I wrote:
> Attach the patches.(Only changed the patch for HEAD.)
Sorry, I forgot to update the commit message.

Attached is the new patch.
1. Only updated the commit message for HEAD_v5.

Regards,
Wang wei


HEAD_v5-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch
Description:  HEAD_v5-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch


REL14_v5-0001-Fix-data-replicated-twice-when-specifying-PUBLISH_patch
Description:  REL14_v5-0001-Fix-data-replicated-twice-when-specifying-PUBLISH_patch


RE: Data is copied twice when specifying both child and parent table in publication

2022-05-18 Thread wangw.f...@fujitsu.com
On Fri, May 13, 2022 10:57 PM Osumi, Takamichi/大墨 昂道 
 wrote:
> On Friday, May 13, 2022 6:42 PM Wang, Wei/王 威 
> wrote:
> > Attach the patches. (Only changed the patch for HEAD.)
> > 1. Optimize the code. Reduce calls to function filter_partitions.
> >    [suggestions by Amit-san]
> > 2. Improve the alias name in SQL. [suggestions by Amit-san]
> > 3. Improve coding alignments. [suggestions by Amit-san]
> > 4. Do some optimizations for list Concatenate.
> Hi, thank you for updating the patch.
> 
> 
> I have one minor comment on fetch_table_list() in HEAD v4.
Thanks for your comments.

> @@ -1759,17 +1759,22 @@ static List *
>  fetch_table_list(WalReceiverConn *wrconn, List *publications)  {
> WalRcvExecResult *res;
> -   StringInfoData cmd;
> +   StringInfoData cmd,
> +   pub_names;
> TupleTableSlot *slot;
> Oid tableRow[2] = {TEXTOID, TEXTOID};
> List   *tablelist = NIL;
> 
> +   initStringInfo(_names);
> +   get_publications_str(publications, _names, true);
> +
> 
> Kindly free the pub_names's data along with the cmd.data.
Improved it according to your suggestion: 'pub_names.data' is now freed.

I also made some other changes.
Kindly have a look at the new patch shared in [1].

[1] 
https://www.postgresql.org/message-id/OS3PR01MB6275B26B6BDF23651B8CE86D9ED19%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei


RE: Data is copied twice when specifying both child and parent table in publication

2022-05-18 Thread wangw.f...@fujitsu.com
On Tues, May 17, 2022 9:03 PM Amit Kapila  wrote:
> On Fri, May 13, 2022 at 3:11 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > Attach the patches.(Only changed the patch for HEAD.).
> >
Thanks for your comments.

>  # publications
> -{ oid => '6119', descr => 'get OIDs of tables in a publication',
> +{ oid => '6119', descr => 'get OIDs of tables in one or more publications',
>proname => 'pg_get_publication_tables', prorows => '1000', proretset => 
> 't',
> -  provolatile => 's', prorettype => 'oid', proargtypes => 'text',
> -  proallargtypes => '{text,oid}', proargmodes => '{i,o}',
> +  provolatile => 's', prorettype => 'oid', proargtypes => 'any',
> +  proallargtypes => '{any,oid}', proargmodes => '{i,o}',
> 
> Won't our use case (input one or more publication names) requires the
> parameter type to be 'VARIADIC text[]' instead of 'any'? I might be
> missing something here so please let me know your reason to change the
> type to 'any' from 'text'?
Yes, you are right. I have improved the approach according to your suggestion.
I hadn't noticed the "provariadic" field in pg_proc before. Now I have found
that we can change the input type from text to variadic text by setting the
"provariadic" field and specifying 'v' in "proargmodes".
I have also made corresponding changes to the processing of the input in
function pg_get_publication_tables.
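
For reference, a variadic text parameter behaves like this at the SQL level
(a minimal sketch with a hypothetical function name; the actual change is
made via the provariadic field in pg_proc.dat):

CREATE FUNCTION list_pub_names(VARIADIC pubnames text[])
RETURNS SETOF text
LANGUAGE sql
AS $$ SELECT unnest(pubnames) $$;

-- Callers can then pass any number of publication names:
SELECT * FROM list_pub_names('pub1', 'pub2', 'pub3');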

BTW, in the previous patch HEAD_v4, when invoking GetPublicationByName in
function pg_get_publication_tables, I changed the second argument from
"false" to "true". I changed this because, when we invoke
pg_get_publication_tables in the query in function fetch_table_list, it
errors out if the publication does not exist.
But this change would affect the compatibility of pg_get_publication_tables,
so I have reverted it in HEAD_v5 and instead filter out the publications that
do not exist in the query in function fetch_table_list.

Attached are the patches. (Only changed the patch for HEAD.)
1. Improve the approach to modify the input type of the function
   pg_get_publication_tables. [suggestions by Amit-san]
2. Free allocated memory in function fetch_table_list. [suggestions by 
Osumi-san]
3. Improve the approach of the handling of non-existing publications.

BTW, I renamed the patch for REL14
from
"REL14_v4-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch"
to
"REL14_v5-0001-Fix-data-replicated-twice-when-specifying-PUBLISH_patch",
just so the versions don't get mixed up between the two branches, and for
the cfbot.

Regards,
Wang wei


HEAD_v5-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch
Description:  HEAD_v5-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch


REL14_v5-0001-Fix-data-replicated-twice-when-specifying-PUBLISH_patch
Description:  REL14_v5-0001-Fix-data-replicated-twice-when-specifying-PUBLISH_patch


Re: [RFC] building postgres with meson -v8

2022-05-18 Thread Peter Eisentraut


Here are some more patches that clean up various minor issues.

From bc06fda7198d182dce73f39cfff8e4724e00b12d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 4 May 2022 14:38:02 +0200
Subject: [PATCH 1/7] meson: Put genbki header files back into original order

The order affects the output order in postgres.bki and other files.
This makes it match the order in the makefiles.
---
 src/include/catalog/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/include/catalog/meson.build b/src/include/catalog/meson.build
index b2acc3ce30..813a963493 100644
--- a/src/include/catalog/meson.build
+++ b/src/include/catalog/meson.build
@@ -30,7 +30,6 @@ catalog_headers = [
   'pg_conversion.h',
   'pg_depend.h',
   'pg_database.h',
-  'pg_parameter_acl.h',
   'pg_db_role_setting.h',
   'pg_tablespace.h',
   'pg_authid.h',
@@ -54,6 +53,7 @@ catalog_headers = [
   'pg_seclabel.h',
   'pg_shseclabel.h',
   'pg_collation.h',
+  'pg_parameter_acl.h',
   'pg_partitioned_table.h',
   'pg_range.h',
   'pg_transform.h',
-- 
2.35.1

From b4ffaa02f1eb132b011871415c94b84e84fd5245 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 18 May 2022 09:22:22 +0200
Subject: [PATCH 2/7] meson: Some capitalization adjustments in setup output

This makes the style more uniform with what meson produces by itself.
---
 meson.build | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index 7b1c8d47c4..3acae5c3e4 100644
--- a/meson.build
+++ b/meson.build
@@ -2583,10 +2583,10 @@ alias_target('backend', backend_targets)
 if meson.version().version_compare('>=0.57')
 
   summary({
-'Data Block Size' : cdata.get('BLCKSZ'),
-'WAL Block Size' : cdata.get('XLOG_BLCKSZ'),
-'Segment Size' : cdata.get('RELSEG_SIZE')
-}, section: 'Data Layout'
+'data block size' : cdata.get('BLCKSZ'),
+'WAL block size' : cdata.get('XLOG_BLCKSZ'),
+'segment size' : cdata.get('RELSEG_SIZE')
+}, section: 'Data layout'
   )
 
   summary(
@@ -2644,7 +2644,7 @@ if meson.version().version_compare('>=0.57')
   'zlib': zlib,
   'zstd': zstd,
 },
-section: 'External Libraries'
+section: 'External libraries'
   )
 
 endif
-- 
2.35.1

From 23a7e7ffb4db7554b2029ba3026d93f8e32b801f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 18 May 2022 09:47:47 +0200
Subject: [PATCH 3/7] meson: Formatting tweaks for pg_config.h to match
 autoheader better

---
 meson.build | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/meson.build b/meson.build
index 3acae5c3e4..20dea263a2 100644
--- a/meson.build
+++ b/meson.build
@@ -227,19 +227,14 @@ meson_bin = find_program(meson_binpath, native: true)
 
 cdata.set('USE_ASSERT_CHECKING', get_option('cassert') ? 1 : false)
 
-cdata.set('BLCKSZ', 8192, description: '''
- Size of a disk block --- this also limits the size of a tuple.  You
- can set it bigger if you need bigger tuples (although TOAST should
- reduce the need to have large tuples, since fields can be spread
- across multiple tuples).
-
- BLCKSZ must be a power of 2.  The maximum possible value of BLCKSZ
- is currently 2^15 (32768).  This is determined by the 15-bit widths
- of the lp_off and lp_len fields in ItemIdData (see
- include/storage/itemid.h).
-
- Changing BLCKSZ requires an initdb.
-''')
+cdata.set('BLCKSZ', 8192, description:
+'''Size of a disk block --- this also limits the size of a tuple. You can set
+   it bigger if you need bigger tuples (although TOAST should reduce the need
+   to have large tuples, since fields can be spread across multiple tuples).
+   BLCKSZ must be a power of 2. The maximum possible value of BLCKSZ is
+   currently 2^15 (32768). This is determined by the 15-bit widths of the
+   lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h).
+   Changing BLCKSZ requires an initdb.''')
 
 cdata.set('XLOG_BLCKSZ', get_option('wal-blocksize') * 1024)
 cdata.set('RELSEG_SIZE', get_option('segsize') * 131072)
@@ -1000,7 +995,8 @@ if get_option('ssl') == 'openssl'
 description: 'Define to 1 to build with OpenSSL support. 
(-Dssl=openssl)')
 
   cdata.set('OPENSSL_API_COMPAT', 0x10001000,
-description: 'Define to the OpenSSL API version in use. This 
avoids deprecation warnings from newer OpenSSL versions.')
+description: '''Define to the OpenSSL API version in use. This 
avoids deprecation warnings
+   from newer OpenSSL versions.''')
 else
   ssl = dependency('', required : false)
 endif
@@ -1639,7 +1635,7 @@ foreach c : decl_checks
 kwargs: args)
   cdata.set10(varname, found, description:
 '''Define to 1 if you have the declaration of `@0@', and to 0 if you
-  don't.'''.format(func))
+   don't.'''.format(func))
 endforeach
 
 
-- 
2.35.1

From 1d2f4c3d7eb32a01e3ea75816f9ec09add6500db Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 18 May 2022 09:48:43 +0200
Subject: [PATCH 4/7] meson: Remove 

RE: Handle infinite recursion in logical replication setup

2022-05-18 Thread shiy.f...@fujitsu.com
On Wed, May 4, 2022 2:47 PM vignesh C  wrote:
> 
> Thanks for the comments, the attached v13 patch has the changes for the same.
> 

Thanks for your patch. Here are some comments on the v13-0001 patch.

1) 
@@ -152,6 +152,18 @@ CREATE SUBSCRIPTION subscription_name

 
+   
+local_only (boolean)
+
+ 
+  Specifies whether the subscription will request the publisher to send
+  locally originated changes at the publisher node, or send any
+  publisher node changes regardless of their origin. The default is
+  false.
+ 
+
+   
+

 slot_name (string)
 

I think this change should be put after "The following parameters control the
subscription's replication behavior after it has been created". Thoughts?
(A usage sketch of the option follows these comments.)

2)
A new column "sublocalonly" is added to pg_subscription, so maybe we need add it
to pg_subscription document, too. (in doc/src/sgml/catalogs.sgml)

3)
/*
 * Currently we always forward.
 */
static bool
pgoutput_origin_filter(LogicalDecodingContext *ctx,
   RepOriginId origin_id)

Should we modify the comment of pgoutput_origin_filter()? It doesn't match the
code.
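
For context, the option being reviewed would be specified as in this sketch,
based on the documentation hunk quoted in comment 1 (the connection details
are hypothetical):

CREATE SUBSCRIPTION sub1
    CONNECTION 'host=node1 dbname=postgres'
    PUBLICATION pub1
    WITH (local_only = true);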

Regards,
Shi yu


Re: [PATCH] New [relation] option engine

2022-05-18 Thread Alvaro Herrera
forbid_realloc is only tested in an assert.  There needs to be an "if"
test for it somewhere (suppose some extension author uses this API and
only runs it in an assert-disabled environment; they'll never know they
made a mistake).  But do we really need this option?  Why do we need a
hardcoded limit on the number of options?
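
A minimal sketch of the kind of explicit runtime check being asked for here
(the structure and field names are hypothetical):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    int     num;             /* options currently stored */
    int     max;             /* allocated capacity */
    bool    forbid_realloc;  /* caller promised the set will not grow */
} OptionSpecSet;

void
check_capacity(OptionSpecSet *set)
{
    /* Report the mistake even in assert-disabled builds. */
    if (set->forbid_realloc && set->num >= set->max)
    {
        fprintf(stderr, "cannot add option: spec set is full (max %d)\n",
                set->max);
        exit(EXIT_FAILURE);
    }
}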


In allocateOptionsSpecSet there's a new error message with a typo:
"grater" should be "greater".  But I think the message is
confusingly worded.  Maybe a better wording is "the value of parameter
XXX may not be greater than YYY".

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Remove support for Visual Studio 2013

2022-05-18 Thread Juan José Santamaría Flecha
On Wed, May 18, 2022 at 2:27 AM Michael Paquier  wrote:

>
> @@ -1757,7 +1757,7 @@ get_collation_actual_version(char collprovider,
> const char *collcollate)
>  collcollate,
>  GetLastError(;
>  }
> -collversion = psprintf("%d.%d,%d.%d",
> +collversion = psprintf("%ld.%ld,%ld.%ld",
> (version.dwNLSVersion >> 8) & 0x,
> version.dwNLSVersion & 0xFF,
> (version.dwDefinedVersion >> 8) & 0x,
>
> Is this change still required even if we bump MIN_WINNT to 0x0600 for
> all the environments that include win32.h?


Right now we are ifdefing that code out for MinGW, so it's not a visible
issue, but it will be once we bump MIN_WINNT.


> At the end, this would
> mean dropping support for Windows XP and Windows Server 2003 as
> run-time environments as listed in [1], which are not supported
> officially since 2014 (even if there have been some patches for
> some critical issues).  So I'd be fine to raise the bar for v16~,
> particularly as this would allow us to get rid of this code related to
> locales.
>

Even Windows Server 2008 [1] is at its End of Life, so this should surprise
no one.

[1]
https://docs.microsoft.com/en-us/troubleshoot/windows-server/windows-server-eos-faq/end-of-support-windows-server-2008-2008r2

Regards,

Juan José Santamaría Flecha

>
>


Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-05-18 Thread Noah Misch
On Mon, May 16, 2022 at 02:30:00PM +0900, Michael Paquier wrote:
> On Sat, May 14, 2022 at 01:27:28AM -0700, Noah Misch wrote:
> > Here, I requested the rationale for the differences you had just described.
> > You made a choice to stop testing one list of database names and start 
> > testing
> > a different list of database names.  Why?
> 
> Because the shape of the new names does not change the test coverage
> ("regression" prefix or the addition of the double quotes with
> backslashes for all the database names), while keeping the code a bit
> simpler.  If you think that the older names are more adapted, I have
> no objections to use them, FWIW, which is something like the patch
> attached would achieve.
> 
> This uses the same convention as vcregress.pl before 322becb, but not
> the one of test.sh where "regression" was appended to the database
> names.

I would have picked the test.sh names, both because test.sh was the senior
implementation and because doing so avoids warnings under
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS.  See the warnings here:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=longfin=2022-05-18%2000%3A59%3A35=pg_upgrade-check

More-notable line from that same log:
sh: 
/Users/buildfarm/bf-data/HEAD/pgsql.build/src/bin/pg_upgrade/../../../src/test/regress/pg_regress--port=5678:
 No such file or directory

Commit 7dd3ee5 adopted much of the 027_stream_regress.pl approach to running
pg_regress, but it didn't grab the "is($rc, 0, 'regression tests pass')"
needed to make defects like that report a failure.

> --- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
> +++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
> @@ -13,18 +13,16 @@ use Test::More;
>  # Generate a database with a name made of a range of ASCII characters.
>  sub generate_db
>  {
> - my ($node, $from_char, $to_char) = @_;
> + my ($node, $prefix, $from_char, $to_char, $suffix) = @_;
>  
> - my $dbname = '';
> + my $dbname = $prefix;
>   for my $i ($from_char .. $to_char)
>   {
>   next if $i == 7 || $i == 10 || $i == 13;# skip BEL, LF, and 
> CR
>   $dbname = $dbname . sprintf('%c', $i);
>   }
>  
> - # Exercise backslashes adjacent to double quotes, a Windows special
> - # case.
> - $dbname = '\\"\\' . $dbname . '"\\';
> + $dbname .= $suffix;
>   $node->command_ok([ 'createdb', $dbname ]);
>  }
>  
> @@ -79,10 +77,12 @@ else
>  {
>   # Default is to use pg_regress to set up the old instance.
>  
> - # Create databases with names covering most ASCII bytes
> - generate_db($oldnode, 1,  45);
> - generate_db($oldnode, 46, 90);
> - generate_db($oldnode, 91, 127);
> + # Create databases with names covering most ASCII bytes.  The
> + # first name exercises backslashes adjacent to double quotes, a
> + # Windows special case.
> + generate_db($oldnode, "\\\"\\", 1,  45,  "\"\\");
> + generate_db($oldnode, '',   46, 90,  '');
> + generate_db($oldnode, '',   91, 127, '');

Does this pass on Windows?  I'm 65% confident that released IPC::Run can't
handle this input due to https://github.com/toddr/IPC-Run/issues/142.  If it's
passing for you on Windows, then disregard.




Re: Intermittent buildfarm failures on wrasse

2022-05-18 Thread Masahiko Sawada
On Sun, May 15, 2022 at 12:29 AM Alvaro Herrera  wrote:
>
> On 2022-Apr-20, Masahiko Sawada wrote:
>
> > > MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) |
> > >   (proc->statusFlags & PROC_XMIN_FLAGS);
> > >
> > > Perhaps the latter is more future-proof.
>
> > Copying only xmin-related flags in this way also makes sense to me and
> > there is no problem at least for now. A note would be that when we
> > introduce a new flag that needs to be copied in the future, we need to
> > make sure to add it to PROC_XMIN_FLAGS so it is copied. Otherwise a
> > similar issue we fixed by 0f0cfb494004befb0f6e could happen again.
>
> OK, done this way -- patch attached.

Thank you for updating the patch.

>
> Reading the comment I wrote about it, I wonder if flags
> PROC_AFFECTS_ALL_HORIZONS and PROC_IN_LOGICAL_DECODING should also be
> included.  I think the only reason we don't care at this point is that
> walsenders (logical or otherwise) do not engage in snapshot copying.
> But if we were to implement usage of parallel workers sharing a common
> snapshot to do table sync in parallel, then it ISTM it would be
> important to copy at least the latter.  Not sure there are any cases
> were we might care about the former.

Yeah, the comment (and the new name) seems inconsistent with the flags
actually included. I think we can include all xmin-related flags in
PROC_XMIN_FLAGS, as the comment says. That way it would also be useful
for other use cases, and I don't see any downside for now.
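
A self-contained sketch of the flag-copying idiom under discussion (the flag
values are hypothetical, not the real PROC_* bits):

#include <stdint.h>

#define FLAG_XMIN_A  0x01   /* xmin-related */
#define FLAG_XMIN_B  0x02   /* xmin-related */
#define FLAG_OTHER   0x04   /* unrelated; must be preserved */
#define XMIN_FLAGS   (FLAG_XMIN_A | FLAG_XMIN_B)

/* Copy only the xmin-related flags from 'src' into 'dst', leaving
 * dst's other bits untouched. */
uint8_t
copy_xmin_flags(uint8_t dst, uint8_t src)
{
    return (dst & ~XMIN_FLAGS) | (src & XMIN_FLAGS);
}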

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Allow file inclusion in pg_hba and pg_ident files

2022-05-18 Thread Julien Rouhaud
Hi,

On Tue, Mar 29, 2022 at 04:14:54PM +0800, Julien Rouhaud wrote:
> 
> I'm attaching a rebase of the rest of the patchset.  Note that I also added
> support for an "include_dir" directive, as it's also something that would be
> quite helpful (and very welcome for my $work use case).

The cfbot reports that the patch doesn't apply anymore; rebased v7 attached.
From 04943f3ff8dfb8efa5e538e0d9524fb041c3b39b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 21 Feb 2022 15:45:26 +0800
Subject: [PATCH v7 1/2] Allow file inclusion in pg_hba and pg_ident files.

Catversion is bumped.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya%40jrouhaud
---
 doc/src/sgml/catalogs.sgml |  48 +++-
 doc/src/sgml/client-auth.sgml  |  48 +++-
 src/backend/libpq/hba.c| 343 +
 src/backend/libpq/pg_hba.conf.sample   |  10 +-
 src/backend/libpq/pg_ident.conf.sample |  12 +-
 src/backend/utils/adt/hbafuncs.c   |  51 +++-
 src/include/catalog/pg_proc.dat|  11 +-
 src/include/libpq/hba.h|   2 +
 src/test/regress/expected/rules.out|  12 +-
 9 files changed, 453 insertions(+), 84 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a533a2153e..b44e62d388 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10554,12 +10554,31 @@ SCRAM-SHA-256$iteration 
count:
 
 
 
+ 
+  
+   rule_number int4
+  
+  
+   Rule number, in priority order, of this rule if the rule is valid,
+   otherwise null
+  
+ 
+
+ 
+  
+   file_name text
+  
+  
+   File name of this rule
+  
+ 
+
  
   
line_number int4
   
   
-   Line number of this rule in pg_hba.conf
+   Line number of this rule in the given file_name
   
  
 
@@ -10694,6 +10713,33 @@ SCRAM-SHA-256$iteration 
count:
 
 
 
+ 
+  
+   mapping_number int4
+  
+  
+   Rule number, in priority order, of this mapping if the mapping is valid,
+   otherwise null
+  
+ 
+
+ 
+  
+   file_name text
+  
+  
+   File name of this mapping
+  
+ 
+
+ 
+  
+   line_number int4
+  
+  
+   Line number of this mapping in the given file_name
+  
+ 
  
   
line_number int4
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index b2a459fb0d..f7d871e660 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -89,8 +89,21 @@
   
 
   
-   Each record specifies a connection type, a client IP address range
-   (if relevant for the connection type), a database name, a user name,
+   Each record can either be an inclusion directive or an authentication rule.
+   Inclusion records specify files that can be included, which contain
+   additional records.  The records will be inserted in lieu of the inclusion
+   records.  Those records contain only two fields: the
+   include or include_dir directive and
+   the file or directory to be included.  The file or directory can be a
+   relative or absolute path, and can be double-quoted if needed.  For the
+   include_dir form, all files not starting with a
+   . and ending with .conf will be
+   included.
+  
+
+  
+   Each authentication record specifies a connection type, a client IP address
+   range (if relevant for the connection type), a database name, a user name,
and the authentication method to be used for connections matching
these parameters. The first record with a matching connection type,
client address, requested database, and user name is used to perform
@@ -103,6 +116,8 @@
   
A record can have several formats:
 
+include   file
+include_dir   directory
 local database  
user  auth-method 
auth-options
 host  database  
user  address 
auth-method  
auth-options
 hostssl   database  
user  address 
auth-method  
auth-options
@@ -118,6 +133,26 @@ hostnogssenc  database  
user
+
+ include
+ 
+  
+   This line will be replaced with the content of the given file.
+  
+ 
+
+
+
+ include_dir
+ 
+  
+   This line will be replaced with the content of all the files found in
+   the directory, if they don't start with a . and end
+   with .conf.
+  
+ 
+
+
 
  local
  
@@ -835,8 +870,10 @@ local   db1,db2,@demodbs  all  
 md5
cluster's data directory.  (It is possible to place the map file
elsewhere, however; see the 
configuration parameter.)
-   The ident map file contains lines of the general form:
+   The ident map file contains lines of two general forms:
 
+include file
+include_dir directory
 map-name system-username 
database-username
 
Comments, whitespace and line continuations are 
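
To illustrate the proposed directives, a pg_hba.conf using them might look
like this sketch (the paths are hypothetical):

# Pull in shared rules, then any drop-in rule files.
include /etc/postgresql/common_hba.conf
include_dir /etc/postgresql/hba.d

host    all    all    10.0.0.0/8    scram-sha-256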

Re: Perform streaming logical transactions by background workers and parallel apply

2022-05-18 Thread Peter Smith
"Here are my review comments for v6-0001.

==

1. General

I saw that now in most places you are referring to the new kind of
worker as the "apply background worker". But there are a few comments
remaining that still refer to "bgworker". Please search the entire
patch for "bgworker" in the comments and replace them with "apply
background worker".

==

2. Commit message

We also need to allow stream_stop to complete by the
apply background worker to finish it to avoid deadlocks because T-1's current
stream of changes can update rows in conflicting order with T-2's next stream
of changes.

Something is not right with this wording: "to complete by the apply
background worker to finish it...".

Maybe just omit the words "to finish it" (??).

~~~

3. Commit message

This patch also extends the subscription streaming option so that...

SUGGESTION
This patch also extends the SUBSCRIPTION 'streaming' option so that...

==

4. src/backend/commands/subscriptioncmds.c - defGetStreamingMode

+/*
+ * Extract the streaming mode value from a DefElem.  This is like
+ * defGetBoolean() but also accepts the special value and "apply".
+ */
+static char
+defGetStreamingMode(DefElem *def)

Typo: "special value and..." -> "special value of..."

==

5. src/backend/replication/logical/launcher.c - logicalrep_worker_launch

+
+ if (subworker_dsm == DSM_HANDLE_INVALID)
+ snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ApplyWorkerMain");
+ else
+ snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ApplyBgworkerMain");
+
+

5a.
This condition should be using the new 'is_subworker' bool

5b.
Double blank lines?

~~~

6. src/backend/replication/logical/launcher.c - logicalrep_worker_launch

- else
+ else if (subworker_dsm == DSM_HANDLE_INVALID)
  snprintf(bgw.bgw_name, BGW_MAXLEN,
  "logical replication worker for subscription %u", subid);
+ else
+ snprintf(bgw.bgw_name, BGW_MAXLEN,
+ "logical replication apply worker for subscription %u", subid);
  snprintf(bgw.bgw_type, BGW_MAXLEN, "logical replication worker");

This condition also should be using the new 'is_subworker' bool

~~~

7. src/backend/replication/logical/launcher.c - logicalrep_worker_stop_internal

+
+ Assert(LWLockHeldByMe(LogicalRepWorkerLock));
+

I think there should be a comment here to say that this lock is
required/expected to be released by the caller of this function.

==

8. src/backend/replication/logical/origin.c - replorigin_session_setup

@@ -1068,7 +1068,7 @@ ReplicationOriginExitCleanup(int code, Datum arg)
  * with replorigin_session_reset().
  */
 void
-replorigin_session_setup(RepOriginId node)
+replorigin_session_setup(RepOriginId node, bool acquire)
 {

This function has been problematic for several reviews. I saw that you
removed the previously confusing comment but I still feel some kind of
explanation is needed for the vague 'acquire' parameter. OTOH perhaps
if you just change the param name to 'must_acquire' then I think it
would be self-explanatory.

==

9. src/backend/replication/logical/worker.c - General

Some of the logs have a prefix "[Apply BGW #%u]" and some do not; I
did not really understand how you decided to prefix or not, so I did
not comment on them individually. Are they all OK? Perhaps if you
can explain the reason for the choices I can review it better next
time.

~~~

10. src/backend/replication/logical/worker.c - General

There are multiple places in the code with if/else checks for the
bgworker versus the normal apply worker. In those places, there is
often a comment like:

"If we are in main apply worker..."

But it is redundant to say "If we are" because we know we are.
Instead, those cases should have a comment at the top of the else, like:

/* This is the main apply worker. */

And then the "If we are in main apply worker" text can be removed from
the comment. There are many examples in the patch like this. Please
search and modify all of them.

~~~

11. src/backend/replication/logical/worker.c - file header comment

The whole comment is similar to the commit message so any changes made
there (for #2, #3) should be made here also.

~~~

12. src/backend/replication/logical/worker.c

+typedef struct WorkerEntry
+{
+ TransactionId xid;
+ WorkerState*wstate;
+} WorkerEntry;

Missing comment for this structure

~~~

13. src/backend/replication/logical/worker.c

WorkerState
WorkerEntry

I felt that these struct names seem too generic; shouldn't they be
something more like ApplyBgworkerState and ApplyBgworkerEntry?

~~~

14. src/backend/replication/logical/worker.c

+static List *ApplyWorkersIdleList = NIL;

IMO maybe ApplyWorkersFreeList is a better name than IdleList for
this. "Idle" sounds just like it is paused rather than available for
someone else to use. If you change this then please search the rest of
the patch for mentions in log messages etc

~~~

15. src/backend/replication/logical/worker.c

+static WorkerState *stream_apply_worker = NULL;
+
+/* check if we apply transaction in apply bgworker