Re: Parallel copy

2022-03-06 Thread Bharath Rupireddy
On Mon, Dec 28, 2020 at 3:14 PM vignesh C  wrote:
>
> Attached is a patch that was used for the same. The patch is written
> on top of the parallel copy patch.
> The design Amit, Andres & I voted for, i.e. the leader
> identifying the line boundaries and sharing them in shared memory,
> is performing better.

Hi Hackers, I see the following as some of the open problems with the parallel copy feature:

1) Leader identifying the line/tuple boundaries from the file and letting
the workers pick and insert them in parallel, vs. the leader reading the
file and letting the workers identify line/tuple boundaries and insert
2) Determining parallel safety of partitioned tables
3) Bulk extension of the relation while inserting, i.e. adding more than
one extra block to the relation in RelationAddExtraBlocks

Please let me know if I'm missing anything.

For (1) - Vignesh's experiments above show that the "leader
identifying the line/tuple boundaries from the file, letting the
workers pick and insert in parallel" approach fares better.
For (2) - while it's being discussed in another thread (I'm not sure
of the status of that thread), how about we take this feature
without support for partitioned tables, i.e. parallel copy is
disabled for partitioned tables? Once the other discussion reaches a
logical end, we can come back and enable parallel copy for partitioned
tables.
For (3) - we need a way to extend or add new blocks quickly - fallocate
might help here; I'm not sure who's working on it, others can comment
better here.
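To make the approach in (1) concrete, here is a toy Python sketch (names and structure invented for illustration; the real implementation is C inside the server and would publish the chunk boundaries in shared memory): the leader scans the input once for newline offsets and hands out (start, end) ranges, and each worker independently parses and inserts the lines in its ranges.

```python
def find_line_bounds(data: bytes):
    """Leader pass: locate (start, end) byte ranges of complete lines.
    In the real design these ranges would go into shared memory."""
    bounds, start = [], 0
    while True:
        nl = data.find(b"\n", start)
        if nl == -1:
            break
        bounds.append((start, nl + 1))
        start = nl + 1
    return bounds

def worker_parse(data: bytes, bound):
    """Worker step: parse one pre-delimited line into a list of fields."""
    s, e = bound
    return data[s:e].rstrip(b"\n").split(b",")

csv = b"1,apple\n2,banana\n3,cherry\n"
bounds = find_line_bounds(csv)                    # done once by the leader
tuples = [worker_parse(csv, b) for b in bounds]   # done in parallel by workers
print(bounds)     # [(0, 8), (8, 17), (17, 26)]
print(tuples[2])  # [b'3', b'cherry']
```

The split keeps the sequential part (boundary detection) cheap, while the expensive parsing and insertion is fanned out to the workers.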

Can we take the "parallel copy" feature forward, of course with some
restrictions in place?

Thoughts?

Regards,
Bharath Rupireddy.




Re: pg_tablespace_location() failure with allow_in_place_tablespaces

2022-03-06 Thread Kyotaro Horiguchi
At Fri, 4 Mar 2022 23:26:43 +1300, Thomas Munro  wrote 
in 
> On Fri, Mar 4, 2022 at 10:04 PM Kyotaro Horiguchi
>  wrote:
> > And I made a quick hack on do_pg_start_backup.  And I found that
> > pg_basebackup copies in-place tablespaces under the *current
> > directory*, which is not ok at all:(
> 
> Hmm.  Which OS are you on?  Looks OK here -- the "in place" tablespace
> gets copied as a directory under pg_tblspc, no symlink:

> The warning from readlink() while making the mapping file isn't ideal,
> and perhaps we should suppress that with something like the attached.
> Or does the missing map file entry break something on Windows?

Ah, OK.  Somehow I thought that pg_basebackup failed because of the
readlink failure and that the tweak I made had made things worse.  I was
able to make it work.

Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Dilip Kumar
On Mon, Mar 7, 2022 at 10:15 AM Amit Kapila  wrote:
> > > I haven't yet gone through the patch, but I have a question about the
> > > idea.  Suppose I want to set up a logical replication like,
> > > node1->node2->node3->node1.  So how would I create the subscriber at
> > > node1?  only_local=on or off?.  I mean on node1, I want the changes
> > > from node3 which are generated on node3 or which are replicated from
> > > node2 but I do not want changes that are replicated from node1 itself?
> > >  So if I set only_local=on then node1 will not get the changes
> > > replicated from node2, is that right? And if I set only_local=off then
> > > it will create the infinite loop again?  So how are we protecting
> > > against this case?
> > >
> >
> > In the above topology, if you want local changes from both node3 and
> > node2, then I think the way to get that would be to create two
> > subscriptions on node1. The first one points to node2 (with
> > only_local=off) and the second one points to node3 (with
> > only_local=off).
> >
>
> Sorry, I intend to say 'only_local=on' at both places in my previous email.

Hmm, okay, so for this topology we will have to connect node1 directly
to node2 as well as to node3, but cannot cascade the changes.  I was
wondering whether it can be done without the extra connection between
node2 and node1.  I mean, instead of making this a boolean flag for
whether we want local changes or remote changes, can't we control the
changes based on the origin id?  That way node1 would get the local
changes of node3, and using the same subscription it would also get the
changes from node3 that originated on node2, but it would not
receive the changes that originated on node1.
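The origin-id idea can be sketched in a few lines of Python (hypothetical semantics, purely to illustrate the filtering rule; none of these names exist in the patch): each change carries the node where it was first committed, and the sender filters out only the changes whose origin is the subscriber itself.

```python
def changes_to_send(stream, subscriber):
    """Forward every change except those whose origin is the subscriber
    itself; those are the ones that would loop back around the ring."""
    return [c for c in stream if c["origin"] != subscriber]

# What node3 would send to node1 in the ring node1 -> node2 -> node3 -> node1:
node3_stream = [
    {"xid": 1, "origin": "node3"},  # node3's own change: node1 wants it
    {"xid": 2, "origin": "node2"},  # cascaded from node2: node1 wants it
    {"xid": 3, "origin": "node1"},  # originated on node1: must be dropped
]
print([c["xid"] for c in changes_to_send(node3_stream, "node1")])  # [1, 2]
```

With a plain boolean only_local flag, xid 2 would be filtered out as well, which is exactly the limitation being discussed.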

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [Proposal] vacuumdb --schema only

2022-03-06 Thread Gilles Darold
On 06/03/2022 at 16:04, Justin Pryzby wrote:
> On Sun, Mar 06, 2022 at 09:39:37AM +0100, Gilles Darold wrote:
>> Attached a new patch version that adds the -N | --exclude-schema option
>> to the vacuumdb command as suggested. Documentation updated too.
>>
>> +pg_log_error("cannot vacuum all tables in schema(s) and and 
>> exclude specific schema(s) at the same time");
> and and
>
> It's odd that schema_exclusion is a global var, but schemas/excluded are not.
>
> Also, it seems unnecessary to have two schemas vars, since they can't be used
> together.  Maybe there's a better way than what I did in 003.
>
>> +    for (cell = schemas ? schemas->head : NULL; cell; cell = cell->next)
> It's preferred to write cell != NULL
>
>> +    bool        schemas_listed = false;
> ...
>> +    for (cell = schemas ? schemas->head : NULL; cell; cell = cell->next)
>> +    {
>> +        if (!schemas_listed) {
>> +            appendPQExpBufferStr(&catalog_query,
>> +                                 " AND pg_catalog.quote_ident(ns.nspname)");
>> +            if (schema_exclusion)
>> +                appendPQExpBufferStr(&catalog_query, " NOT IN (");
>> +            else
>> +                appendPQExpBufferStr(&catalog_query, " IN (");
>> +
>> +            schemas_listed = true;
>> +        }
>> +        else
>> +            appendPQExpBufferStr(&catalog_query, ", ");
>> +
>> +        appendStringLiteralConn(&catalog_query, cell->val, conn);
>> +        appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace::pg_catalog.name");
>> +
>> +    }
>> +    /* Finish formatting schema filter */
>> +    if (schemas_listed)
>> +        appendPQExpBufferStr(&catalog_query, ")\n");
>>  }
> Maybe it's clearer to write this with =ANY() / != ALL() ?
> See 002.
>

I have applied your changes and produced a new version v3 of the patch;
thanks for the improvements. The patch has been added to the commitfest
app, see https://commitfest.postgresql.org/38/3587/
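For readers following the review, the two filtering styles Justin compares (a per-schema IN list vs. a single = ANY / != ALL array predicate) produce SQL along these lines. This is a simplified Python sketch of the string building only; it deliberately skips the identifier quoting and escaping the real code does:

```python
def schema_filter(schemas, exclude=False, use_any=False):
    """Build the catalog-query predicate in either of the two styles."""
    quoted = ", ".join("'%s'" % s for s in schemas)
    if use_any:
        op = "!= ALL" if exclude else "= ANY"
        return "AND ns.nspname %s (ARRAY[%s])" % (op, quoted)
    op = "NOT IN" if exclude else "IN"
    return "AND ns.nspname %s (%s)" % (op, quoted)

print(schema_filter(["s1", "s2"]))
# AND ns.nspname IN ('s1', 's2')
print(schema_filter(["s1", "s2"], exclude=True, use_any=True))
# AND ns.nspname != ALL (ARRAY['s1', 's2'])
```

The array form avoids the schemas_listed state machine: the whole list becomes a single appended predicate.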


-- 
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..378328afb3 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+	   
+	
+	 
+	  -n
+	  --schema
+	 
+	 schema
+	
+	   
+
+	   
+	
+	 
+	  -N
+	  --exclude-schema
+	 
+	 schema
+	
+	   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the Foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='"Foo"' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..4c4f47e32a 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,12 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".*/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: 

RE: Handle infinite recursion in logical replication setup

2022-03-06 Thread kuroda.hay...@fujitsu.com
Dear Vignesh,

> I felt changing the only_local option might be useful for the user while
> modifying the subscription, e.g. setting it with a different set of
> publications. Changes for this are included in the v2 patch attached
> at [1].

+1, thanks. I'll post if I notice something to say.

> Shall we get a few opinions on this and take it in that direction?

I prefer subscriber-option, but I also think both are reasonable.
+1 about asking other reviewers.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread vignesh C
On Mon, Mar 7, 2022 at 11:45 AM Peter Smith  wrote:
>
> On Mon, Mar 7, 2022 at 4:20 PM vignesh C  wrote:
> >
> > On Mon, Mar 7, 2022 at 10:26 AM Peter Smith  wrote:
> > >
> > > Hi Vignesh, I also have not looked at the patch yet, but I have what
> > > seems like a very fundamental (and possibly dumb) question...
> > >
> > > Basically, I do not understand the choice of syntax for setting things up.
> > >
> > > IMO that "only-local" option sounds very similar to the other
> > > PUBLICATION ("publish") options which decide the kinds of things that
> > > will be published. So it feels more natural for me to think of the
> > > publisher as being the one to decide what will be published.
> > >
> > > e.g.
> > >
> > > option 1:
> > > CREATE PUBLICATION p1 FOR TABLE t1;
> > > CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = true);
> > >
> > > option 2:
> > > CREATE PUBLICATION p1 FOR TABLE t1 WEHRE (publish = 'only_local');
> > > CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;
> > >
> > > ~~
> > >
> > > IIUC the patch is using option 1. My first impression was it feels
> > > back-to-front for the SUBSCRIPTION telling the PUBLICATION what to
> > > publish.
> > >
> > > So, why does the patch use syntax option 1?
> >
> > I felt the advantage of keeping it on the subscription side is that
> > a subscriber from one node can subscribe with the only_local option on,
> > and a different subscriber from a different node can subscribe with
> > the only_local option off. This might not be possible with the
> > option on the publisher side. Having it on the subscriber side might
> > give more flexibility to the user.
> >
>
> OK.  Option 2 needs two publications for that scenario. IMO it's more
> intuitive this way, but maybe you wanted to avoid the extra
> publications?

Yes, I wanted to avoid the extra publication creation that you pointed
out. Option 1 can handle this scenario without creating the extra
publications:
node0: CREATE PUBLICATION p1 FOR TABLE t1;
node1: CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 with (only_local = on);
node2: CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 with (only_local = off);

I'm OK with either approach, now that this scenario can be handled
by both options, i.e. providing only_local as an option while creating
the publication, or providing it as an option while creating the
subscription, as Peter pointed out at [1]:
option 1:
CREATE PUBLICATION p1 FOR TABLE t1;
CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = true);

option 2:
CREATE PUBLICATION p1 FOR TABLE t1 WITH (publish = 'only_local');
CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;

Shall we get a few opinions on this and take it in that direction?

[1] - 
https://www.postgresql.org/message-id/CAHut%2BPsAWaETh9VMymbBfMrqiE1KuqMq%2BwpBg0s7eMzwLATr%2Bw%40mail.gmail.com

Regards,
Vignesh




Re: wal_compression=zstd

2022-03-06 Thread Michael Paquier
On Fri, Mar 04, 2022 at 08:10:35AM -0500, Andrew Dunstan wrote:
> I don't see why patch 5 shouldn't be applied forthwith.

Only applying 0005 would result in a failure in the TAP test for a
problem whose fix is attempted in 0006.  This is an issue unrelated to
this thread.

FWIW, I am a bit disturbed by EnsureTopTransactionIdLogged() and its
design in 0006, where we'd finish by using an XLogFlush() call within
two SQL functions, but I have not really looked at the problem to see
if it is a viable solution or not.
--
Michael


signature.asc
Description: PGP signature


Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Peter Smith
On Mon, Mar 7, 2022 at 5:12 PM kuroda.hay...@fujitsu.com
 wrote:
>
> Dear Peter,
>
> > > So, why does the patch use syntax option 1?
>
> IMU it might be useful for the following case.
>
> Assume a multi-master configuration with node1 and node2:
> node1 has a publication pub1 and a subscription sub2, and node2 has pub2 and sub1.
> 
> From that situation, please consider that a new node, node3, is added
> that subscribes to some changes from node2.
> 
> If the feature is introduced as option 1, a new publication must be defined
> on node2.
> If it is introduced as option 2, however, maybe pub2 can be reused,
> i.e. multiple declarations of publications can be avoided.
>

Yes, thanks for the example. I had the same observation in my last post [1].

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtRxiQR_4UFLNThg-NNRV447FvwtcR-BvqMzjyMJXKwfw%40mail.gmail.com

Kind Regards,
Peter Smith
Fujitsu Australia.




Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Peter Smith
On Mon, Mar 7, 2022 at 4:20 PM vignesh C  wrote:
>
> On Mon, Mar 7, 2022 at 10:26 AM Peter Smith  wrote:
> >
> > Hi Vignesh, I also have not looked at the patch yet, but I have what
> > seems like a very fundamental (and possibly dumb) question...
> >
> > Basically, I do not understand the choice of syntax for setting things up.
> >
> > IMO that "only-local" option sounds very similar to the other
> > PUBLICATION ("publish") options which decide the kinds of things that
> > will be published. So it feels more natural for me to think of the
> > publisher as being the one to decide what will be published.
> >
> > e.g.
> >
> > option 1:
> > CREATE PUBLICATION p1 FOR TABLE t1;
> > CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = true);
> >
> > option 2:
> > CREATE PUBLICATION p1 FOR TABLE t1 WEHRE (publish = 'only_local');
> > CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;
> >
> > ~~
> >
> > IIUC the patch is using option 1. My first impression was it feels
> > back-to-front for the SUBSCRIPTION telling the PUBLICATION what to
> > publish.
> >
> > So, why does the patch use syntax option 1?
>
> I felt the advantage of keeping it on the subscription side is that
> a subscriber from one node can subscribe with the only_local option on,
> and a different subscriber from a different node can subscribe with
> the only_local option off. This might not be possible with the
> option on the publisher side. Having it on the subscriber side might
> give more flexibility to the user.
>

OK.  Option 2 needs two publications for that scenario. IMO it's more
intuitive this way, but maybe you wanted to avoid the extra
publications?

node0:
CREATE PUBLICATION p1 FOR TABLE t1;
CREATE PUBLICATION p1_local FOR TABLE t1 WITH (publish = 'only_local');

node1: CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1_local;
node2: CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;

--
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Handle infinite recursion in logical replication setup

2022-03-06 Thread kuroda.hay...@fujitsu.com
Dear Peter,

> > So, why does the patch use syntax option 1?

IMU it might be useful for the following case.

Assume a multi-master configuration with node1 and node2:
node1 has a publication pub1 and a subscription sub2, and node2 has pub2 and sub1.

From that situation, please consider that a new node, node3, is added
that subscribes to some changes from node2.

If the feature is introduced as option 1, a new publication must be defined
on node2.
If it is introduced as option 2, however, maybe pub2 can be reused,
i.e. multiple declarations of publications can be avoided.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-03-06 Thread Julien Rouhaud
On Mon, Mar 07, 2022 at 06:35:45AM +0100, Pavel Stehule wrote:
> 
> this patch should be rejected. There is no consensus.

Thanks for the confirmation, I will take care of it!




Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread vignesh C
On Tue, Mar 1, 2022 at 4:12 PM kuroda.hay...@fujitsu.com
 wrote:
>
> Hi Vignesh,
>
> > In logical replication, currently Walsender sends the data that is
> > generated locally and the data that are replicated from other
> > instances. This results in infinite recursion in circular logical
> > replication setup.
>
> Thank you for the good explanation. I understand that this fix can be used
> for bidirectional replication.
>
> > Here there are two problems for the user: a) incremental
> > synchronization of table sending both local data and replicated data
> > by walsender b) Table synchronization of table using copy command
> > sending both local data and replicated data
>
> So you wanted to solve these two problems and are currently focused on
> the first one, right? We can check them one by one.
>
> > For the first problem "Incremental synchronization of table by
> > Walsender" can be solved by:
> > Currently the locally generated data does not have replication origin
> > associated and the data that has originated from another instance will
> > have a replication origin associated. We could use this information to
> > differentiate locally generated data and replicated data and send only
> > the locally generated data. This "only_local" could be provided as an
> > option while subscription is created:
> > ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> > PUBLICATION pub1 with (only_local = on);
>
> Sounds good, but I cannot tell whether the assumption will hold.
>
> I played with your patch, but it could not be applied to the current master.
> I tested from bd74c40 and confirmed that the infinite loop did not appear.
>
> only_local could not be set from the ALTER SUBSCRIPTION command.
> Is that expected?

I felt changing the only_local option might be useful for the user while
modifying the subscription, e.g. setting it with a different set of
publications. Changes for this are included in the v2 patch attached
at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm0WSo5369pr2eN1obTGBeiJU9cQdF6Ju1sC4hMQNy5BfQ%40mail.gmail.com

Regards,
Vignesh




Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-03-06 Thread Pavel Stehule
Hi

po 7. 3. 2022 v 3:31 odesílatel Julien Rouhaud  napsal:

> On Mon, Mar 07, 2022 at 11:27:14AM +0900, Michael Paquier wrote:
> > On Sat, Mar 05, 2022 at 07:31:53PM +0900, Michael Paquier wrote:
> > > I got a short look at what was proposed in the patch a couple of
> > > months ago, and still found the implementation confusing with the way
> > > aliases are handled, particularly when it came to several layers of
> > > pl/pgsql.  I am fine to let it go for now.
> >
> > Just had an extra look at the patch, still same impression.  So done
> > this way.
>
> I was actually waiting a bit to make sure that Pavel could read the thread,
> since it was the weekend and right now it's 3:30 AM in the Czech Republic...
>

this patch should be rejected. There is no consensus.

Regards

Pavel


RE: Optionally automatically disable logical replication subscriptions on error

2022-03-06 Thread osumi.takami...@fujitsu.com
On Monday, March 7, 2022 12:01 PM Shi, Yu/侍 雨  wrote:
> On Wed, Mar 2, 2022 5:39 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Attached an updated patch v26.
> >
> 
> Thanks for your patch. A comment on the document.
Hi, thank you for checking my patch!


> @@ -7771,6 +7771,16 @@ SCRAM-SHA-256$iteration
> count:
> 
>   
>
> +   subdisableonerr bool
> +  
> +  
> +   If true, the subscription will be disabled if one of its workers
> +   detects an error
> +  
> + 
> +
> + 
> +  
> subconninfo text
>
>
> 
> The document for "subdisableonerr" option is placed after "The following
> parameters control what happens during subscription creation: ". I think it
> should be placed after "The following parameters control the subscription's
> replication behavior after it has been created: ", right?
Addressed your comment for create_subscription.sgml
(not for catalogs.sgml).

Attached an updated patch v28.


Best Regards,
Takamichi Osumi



v28-0001-Optionally-disable-subscriptions-on-error.patch
Description: v28-0001-Optionally-disable-subscriptions-on-error.patch


Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread vignesh C
On Mon, Mar 7, 2022 at 10:26 AM Peter Smith  wrote:
>
> Hi Vignesh, I also have not looked at the patch yet, but I have what
> seems like a very fundamental (and possibly dumb) question...
>
> Basically, I do not understand the choice of syntax for setting things up.
>
> IMO that "only-local" option sounds very similar to the other
> PUBLICATION ("publish") options which decide the kinds of things that
> will be published. So it feels more natural for me to think of the
> publisher as being the one to decide what will be published.
>
> e.g.
>
> option 1:
> CREATE PUBLICATION p1 FOR TABLE t1;
> CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = true);
>
> option 2:
> CREATE PUBLICATION p1 FOR TABLE t1 WEHRE (publish = 'only_local');
> CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;
>
> ~~
>
> IIUC the patch is using option 1. My first impression was it feels
> back-to-front for the SUBSCRIPTION telling the PUBLICATION what to
> publish.
>
> So, why does the patch use syntax option 1?

I felt the advantage of keeping it on the subscription side is that
a subscriber from one node can subscribe with the only_local option on,
and a different subscriber from a different node can subscribe with
the only_local option off. This might not be possible with the
option on the publisher side. Having it on the subscriber side might
give more flexibility to the user.

Regards,
Vignesh




Re: [PATCH] Add extra statistics to explain for Nested Loop

2022-03-06 Thread Julien Rouhaud
Hi,

On Thu, Feb 03, 2022 at 12:59:03AM +0300, Ekaterina Sokolova wrote:
>
> I've attached the new version of the patch.
>
> I wanted to measure the overhead, but couldn't choose a correct way. Thanks
> for the idea with auto_explain.
> I loaded it and made 10 runs of pgbench (number of clients: 1, number of
> threads: 1).
> I'm not sure I chose the right way to measure the overhead, so any
> suggestions are welcome.
> Current results are in the file overhead_v0.txt.
>
> Please feel free to share your suggestions and comments. Regards,
>

>     | master latency (ms) | master tps | | new latency (ms) |   new tps
> -----------------------------------------------------------------------
>   1 |        2.462        | 406.190341 | |       4.485      | 222.950527
>   2 |        3.877        | 257.89813  | |       4.141      | 241.493395
>   3 |        3.789        | 263.935811 | |       2.837      | 352.522297
>   4 |        3.530        | 283.310196 | |       5.510      | 181.488203
>   5 |        3.413        | 292.997363 | |       6.475      | 154.432999
>   6 |        3.757        | 266.148564 | |       4.073      | 245.507218
>   7 |        3.752        | 266.560043 | |       3.901      | 256.331385
>   8 |        4.389        | 227.847524 | |       4.658      | 214.675196
>   9 |        4.341        | 230.372282 | |       4.220      | 236.983672
>  10 |        3.893        | 256.891104 | |       7.059      | 141.667139
> -----------------------------------------------------------------------
> avg |        3.7203       | 275.215136 | |       4.03       | 224.8052031
>
> master/new latency | 0.92315 |
> master/new tps     | 1.22424 |
>
> new/master latency | 1.08325 |
> new/master tps     | 0.81683 |
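As a sanity check, the tps averages and the master/new tps ratio in the table can be reproduced directly from the per-run numbers:

```python
# Per-run tps values copied from the table above.
master_tps = [406.190341, 257.89813, 263.935811, 283.310196, 292.997363,
              266.148564, 266.560043, 227.847524, 230.372282, 256.891104]
new_tps = [222.950527, 241.493395, 352.522297, 181.488203, 154.432999,
           245.507218, 256.331385, 214.675196, 236.983672, 141.667139]

avg_master = sum(master_tps) / len(master_tps)
avg_new = sum(new_tps) / len(new_tps)
print(round(avg_master, 6))            # 275.215136
print(round(avg_new, 7))               # 224.8052031
print(round(avg_master / avg_new, 5))  # 1.22424, i.e. roughly an 18% tps loss
```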

The overhead is quite significant (at least for an OLTP-style workload).

I think this should be done with a new InstrumentOption, like
INSTRUMENT_LOOP_DETAILS or something like that, and set it where appropriate.
Otherwise you will have to pay that overhead even if you won't use the new
fields at all.  It could be EXPLAIN (ANALYZE, VERBOSE OFF), but also simply
using pg_stat_statements which doesn't seem acceptable.

One problem is that some extensions (like pg_stat_statements) can rely on
INSTRUMENT_ALL but may or may not care about those extra counters.  Maybe we
should remove that alias and instead provide two (like INSTRUMENT_ALL_VERBOSE
and INSTRUMENT_ALL_SOMETHINGELSE, I don't have any bright name right now) so
that authors can decide what they need, instead of silently having such
extensions ruin the performance for no reason.
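The split being proposed amounts to turning the catch-all alias into two explicit bitmask aliases so extensions opt into the costly counters. A hypothetical Python sketch of the flag layout (flag names and bit positions are invented for illustration; the real InstrumentOption flags live in the server's instrument.h):

```python
# Existing-style instrument flags (bit positions illustrative only):
INSTRUMENT_TIMER   = 1 << 0
INSTRUMENT_BUFFERS = 1 << 1
INSTRUMENT_ROWS    = 1 << 2
INSTRUMENT_WAL     = 1 << 3
# Proposed: the new per-loop min/max counters behind their own flag.
INSTRUMENT_LOOP_DETAILS = 1 << 4

# Two explicit aliases instead of a single INSTRUMENT_ALL:
INSTRUMENT_ALL_BASIC   = (INSTRUMENT_TIMER | INSTRUMENT_BUFFERS |
                          INSTRUMENT_ROWS | INSTRUMENT_WAL)
INSTRUMENT_ALL_VERBOSE = INSTRUMENT_ALL_BASIC | INSTRUMENT_LOOP_DETAILS

def wants_loop_details(options):
    """Only pay the extra bookkeeping when it was explicitly requested."""
    return bool(options & INSTRUMENT_LOOP_DETAILS)

print(wants_loop_details(INSTRUMENT_ALL_BASIC))    # False
print(wants_loop_details(INSTRUMENT_ALL_VERBOSE))  # True
```

An extension that only needs the classic counters would then request the basic alias and never pay for the loop counters.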

About the implementation itself:

+static void show_loop_info(Instrumentation *instrument, bool isworker,
+  ExplainState *es);

I think this should be done as a separate refactoring commit.

+   /*
+* this is first loop
+*
+* We only initialize the min values. We don't need to bother with the
+* max, because those are 0 and the non-zero values will get updated a
+* couple lines later.
+*/
+   if (instr->nloops == 0)
+   {
+   instr->min_t = totaltime;
+   instr->min_tuples = instr->tuplecount;
+   }
+
+   if (instr->min_t > totaltime)
+   instr->min_t = totaltime;
+
+   if (instr->max_t < totaltime)
+   instr->max_t = totaltime;
+
instr->ntuples += instr->tuplecount;
+
+   if (instr->min_tuples > instr->tuplecount)
+   instr->min_tuples = instr->tuplecount;
+
+   if (instr->max_tuples < instr->tuplecount)
+   instr->max_tuples = instr->tuplecount;
+
instr->nloops += 1;

Why do you need to initialize min_t and min_tuples but not max_t and
max_tuples, while both will initially be 0 and possibly updated afterwards?

I think you should either remove this if (instr->nloops == 0) part
entirely, or handle it in an else block.
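For what it's worth, the asymmetry exists because a minimum must be seeded from the first observation while a maximum can safely start at zero. A small Python stand-in for the accumulator logic in the quoted hunk (not the patch's actual C code):

```python
class LoopStats:
    """Track per-loop min/max time the way the quoted C hunk does."""
    def __init__(self):
        self.nloops = 0
        self.min_t = 0.0
        self.max_t = 0.0

    def end_loop(self, totaltime):
        if self.nloops == 0:
            # Seed only the minimum: if min_t stayed at 0, no later
            # (positive) time could ever lower it.  max_t can stay 0
            # because any real time exceeds it in the comparison below.
            self.min_t = totaltime
        self.min_t = min(self.min_t, totaltime)
        self.max_t = max(self.max_t, totaltime)
        self.nloops += 1

s = LoopStats()
for t in (0.5, 0.2, 0.9):
    s.end_loop(t)
print(s.min_t, s.max_t, s.nloops)  # 0.2 0.9 3
```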




Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Peter Smith
On Mon, Mar 7, 2022 at 3:56 PM Peter Smith  wrote:
>
> Hi Vignesh, I also have not looked at the patch yet, but I have what
> seems like a very fundamental (and possibly dumb) question...
>
> Basically, I do not understand the choice of syntax for setting things up.
>
> IMO that "only-local" option sounds very similar to the other
> PUBLICATION ("publish") options which decide the kinds of things that
> will be published. So it feels more natural for me to think of the
> publisher as being the one to decide what will be published.
>
> e.g.
>
> option 1:
> CREATE PUBLICATION p1 FOR TABLE t1;
> CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = true);
>
> option 2:
> CREATE PUBLICATION p1 FOR TABLE t1 WEHRE (publish = 'only_local');
> CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;
>

Sorry, I meant to write WITH.

option 2:
CREATE PUBLICATION p1 FOR TABLE t1 WITH (publish = 'only_local');
CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;

> ~~
>
> IIUC the patch is using option 1. My first impression was it feels
> back-to-front for the SUBSCRIPTION telling the PUBLICATION what to
> publish.
>
> So, why does the patch use syntax option 1?
>
> --
> Kind Regards,
> Peter Smith
> Fujitsu Australia




Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-03-06 Thread Michael Paquier
On Mon, Mar 07, 2022 at 10:31:40AM +0800, Julien Rouhaud wrote:
> I was actually waiting a bit to make sure that Pavel could read the thread,
> since it was the weekend and right now it's 3:30 AM in the Czech Republic...

Sorry about that.  I have reset the state of the patch.
--
Michael


signature.asc
Description: PGP signature


Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Peter Smith
Hi Vignesh, I also have not looked at the patch yet, but I have what
seems like a very fundamental (and possibly dumb) question...

Basically, I do not understand the choice of syntax for setting things up.

IMO that "only-local" option sounds very similar to the other
PUBLICATION ("publish") options which decide the kinds of things that
will be published. So it feels more natural for me to think of the
publisher as being the one to decide what will be published.

e.g.

option 1:
CREATE PUBLICATION p1 FOR TABLE t1;
CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = true);

option 2:
CREATE PUBLICATION p1 FOR TABLE t1 WEHRE (publish = 'only_local');
CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;

~~

IIUC the patch is using option 1. My first impression was it feels
back-to-front for the SUBSCRIPTION telling the PUBLICATION what to
publish.

So, why does the patch use syntax option 1?

--
Kind Regards,
Peter Smith
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Amit Kapila
On Mon, Mar 7, 2022 at 10:05 AM Amit Kapila  wrote:
>
> On Mon, Mar 7, 2022 at 9:26 AM Dilip Kumar  wrote:
> >
> > On Wed, Feb 23, 2022 at 11:59 AM vignesh C  wrote:
> > > Here there are two problems for the user: a) incremental
> > > synchronization of table sending both local data and replicated data
> > > by walsender b) Table synchronization of table using copy command
> > > sending both local data and replicated data
> > >
> > > For the first problem "Incremental synchronization of table by
> > > Walsender" can be solved by:
> > > Currently the locally generated data does not have replication origin
> > > associated and the data that has originated from another instance will
> > > have a replication origin associated. We could use this information to
> > > differentiate locally generated data and replicated data and send only
> > > the locally generated data. This "only_local" could be provided as an
> > > option while subscription is created:
> > > ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> > > PUBLICATION pub1 with (only_local = on);
> >
> > I haven't yet gone through the patch, but I have a question about the
> > idea.  Suppose I want to set up a logical replication like,
> > node1->node2->node3->node1.  So how would I create the subscriber at
> > node1?  only_local=on or off?.  I mean on node1, I want the changes
> > from node3 which are generated on node3 or which are replicated from
> > node2 but I do not want changes that are replicated from node1 itself?
> >  So if I set only_local=on then node1 will not get the changes
> > replicated from node2, is that right? And if I set only_local=off then
> > it will create the infinite loop again?  So how are we protecting
> > against this case?
> >
>
> In the above topology, if you want local changes from both node3 and
> node2, then I think the way to get that would be to create two
> subscriptions on node1. The first one points to node2 (with
> only_local=off) and the second one points to node3 (with
> only_local=off).
>

Sorry, I intend to say 'only_local=on' at both places in my previous email.

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-06 Thread Amit Kapila
On Mon, Mar 7, 2022 at 6:36 AM Masahiko Sawada  wrote:
>
> Thank you for the comment. +1.
>
> I've attached updated patches.
>

Pushed the first patch.  I fixed one typo in the second patch and
slightly changed the commit message; otherwise, it looks good to me.
I'll push it tomorrow unless there are more comments.

-- 
With Regards,
Amit Kapila.


v6-0001-Add-the-additional-information-to-the-logical-rep.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Amit Kapila
On Mon, Mar 7, 2022 at 9:26 AM Dilip Kumar  wrote:
>
> On Wed, Feb 23, 2022 at 11:59 AM vignesh C  wrote:
> > Here there are two problems for the user: a) incremental
> > synchronization of table sending both local data and replicated data
> > by walsender b) Table synchronization of table using copy command
> > sending both local data and replicated data
> >
> > For the first problem "Incremental synchronization of table by
> > Walsender" can be solved by:
> > Currently the locally generated data does not have replication origin
> > associated and the data that has originated from another instance will
> > have a replication origin associated. We could use this information to
> > differentiate locally generated data and replicated data and send only
> > the locally generated data. This "only_local" could be provided as an
> > option while subscription is created:
> > ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> > PUBLICATION pub1 with (only_local = on);
>
> I haven't yet gone through the patch, but I have a question about the
> idea.  Suppose I want to set up a logical replication like,
> node1->node2->node3->node1.  So how would I create the subscriber at
> node1?  only_local=on or off?.  I mean on node1, I want the changes
> from node3 which are generated on node3 or which are replicated from
> node2 but I do not want changes that are replicated from node1 itself?
>  So if I set only_local=on then node1 will not get the changes
> replicated from node2, is that right? and If I set only_local=off then
> it will create the infinite loop again?  So how are we protecting
> against this case?
>

In the above topology if you want local changes from both node3 and
node2 then I think the way to get that would be you have to create two
subscriptions on node1. The first one points to node2 (with
only_local=off) and the second one points to node3 (with only_local
=off). Will that address your case or am I missing something?

-- 
With Regards,
Amit Kapila.




Re: role self-revocation

2022-03-06 Thread David G. Johnston
On Sun, Mar 6, 2022 at 8:19 AM Robert Haas  wrote:

> The choice of names in my example wasn't accidental. If the granted
> role is a login role, then the superuser's intention was to vest the
> privileges of that role in some other role, and it is surely not right
> for that role to be able to decide that it doesn't want its
> privileges to be so granted. That's why I chose the name "peon".


 >> rhaas [as peon] => revoke peon from boss; -- i don't like being bossed
around!

Well, the peon is not getting bossed around, the boss is getting peoned
around and the peon has decided that they like boss too much and don't need
to do that anymore.

When you grant a group "to" a role you place the role under the group - and
inheritance flows downward.

In the original thread Stephen wrote:

"This is because we allow 'self administration' of roles, meaning that
they can decide what other roles they are a member of."

The example, which you moved here, then attempts to demonstrate this "fact"
but gets it wrong.  Boss became a member of peon so if you want to
demonstrate self-administration of a role's membership in a different group
you have to login as boss, not peon.  Doing that, and then revoking peon
from boss, yields "ERROR: must have admin option on role "peon"".

So no, without "WITH ADMIN OPTION" a role cannot decide what other roles
they are a member of.

I don't necessarily have an issue changing self-administration but if the
motivating concern is that all these new pg_* roles we are creating are
something a normal user can opt-out of/revoke that simply isn't the case
today, unless they are added to the pg_* role WITH ADMIN OPTION.

That all said, permissions SHOULD BE strictly additive.  If boss doesn't
want to be a member of pg_read_all_files allowing them to revoke themself
from that role seems like it should be acceptable.  If there is fear in
allowing someone to revoke (not add) themselves as a member of a different
role that suggests we have a design issue in another feature of the
system.  Today, they neither grant nor revoke, and the self-revocation
doesn't seem that important to add.

David J.


Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Dilip Kumar
On Wed, Feb 23, 2022 at 11:59 AM vignesh C  wrote:
> Here there are two problems for the user: a) incremental
> synchronization of table sending both local data and replicated data
> by walsender b) Table synchronization of table using copy command
> sending both local data and replicated data
>
> For the first problem "Incremental synchronization of table by
> Walsender" can be solved by:
> Currently the locally generated data does not have replication origin
> associated and the data that has originated from another instance will
> have a replication origin associated. We could use this information to
> differentiate locally generated data and replicated data and send only
> the locally generated data. This "only_local" could be provided as an
> option while subscription is created:
> ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> PUBLICATION pub1 with (only_local = on);

I haven't yet gone through the patch, but I have a question about the
idea.  Suppose I want to set up a logical replication like,
node1->node2->node3->node1.  So how would I create the subscriber at
node1?  only_local=on or off?.  I mean on node1, I want the changes
from node3 which are generated on node3 or which are replicated from
node2 but I do not want changes that are replicated from node1 itself?
 So if I set only_local=on then node1 will not get the changes
replicated from node2, is that right? and If I set only_local=off then
it will create the infinite loop again?  So how are we protecting
against this case?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Peter Smith
FYI, the v2 patch did not apply to HEAD

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v2-0001-Skip-replication-of-non-local-data.patch
--verbose
...
error: patch failed: src/backend/replication/slotfuncs.c:231
error: src/backend/replication/slotfuncs.c: patch does not apply

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: timestamp for query in pg_stat_statements

2022-03-06 Thread Julien Rouhaud
On Sun, Mar 06, 2022 at 07:10:49PM -0800, Zhihong Yu wrote:
> On Sun, Mar 6, 2022 at 6:23 PM Julien Rouhaud  wrote:
> 
> > On Sun, Mar 06, 2022 at 12:37:00PM -0800, Zhihong Yu wrote:
> > >
> > > Here is one example (same query, q, is concerned).
> > > At t1, q is performed, leaving one row in pg_stat_statements with
> > mean_time
> > > of 10.
> > > At t2, operator examines pg_stat_statements and provides some suggestion
> > > for tuning q (which is carried out).
> > > At t3, q is run again leaving the row with mean_time of 9.
> > > Now with two rows for q, how do we know whether the row written at t3 is
> > > prior to or after implementing the suggestion made at t2 ?
> >
> > Well, if pg_stat_statements is read by people doing performance tuning
> > shouldn't they be able to distinguish which query text is the one they just
> > rewrote?
> >
> Did I mention rewriting?

How else would you end up with two entries in pg_stat_statements?

> As you said below, adding index is one way of tuning which doesn't involve
> rewriting.

Yes, and in that case you have a single row for that query, and mean_time is
useless.  You need to compute it yourself using snapshots of
pg_stat_statements if you want to know how that query performed since the
optimization.

> So some information in pg_stat_statements (or related table) is needed to
> disambiguate.

In my opinion that's not pg_stat_statements' job.  Like all other similar
infrastructure in postgres it only provides cumulative counters.  You would
have exactly the same issue with e.g. pg_stat_user_indexes or pg_stat_bgwriter.




Re: timestamp for query in pg_stat_statements

2022-03-06 Thread Zhihong Yu
On Sun, Mar 6, 2022 at 6:23 PM Julien Rouhaud  wrote:

> On Sun, Mar 06, 2022 at 12:37:00PM -0800, Zhihong Yu wrote:
> > The current design of pg_stat_statements doesn't have the concept of
> > observation.
> >
> > By observation I mean scenarios where pg_stat_statements is read by
> people
> > doing performance tuning.
> >
> > Here is one example (same query, q, is concerned).
> > At t1, q is performed, leaving one row in pg_stat_statements with
> mean_time
> > of 10.
> > At t2, operator examines pg_stat_statements and provides some suggestion
> > for tuning q (which is carried out).
> > At t3, q is run again leaving the row with mean_time of 9.
> > Now with two rows for q, how do we know whether the row written at t3 is
> > prior to or after implementing the suggestion made at t2 ?
>
> Well, if pg_stat_statements is read by people doing performance tuning
> shouldn't they be able to distinguish which query text is the one they just
> rewrote?
>
Did I mention rewriting?
As you said below, adding index is one way of tuning which doesn't involve
rewriting.

Please also note that the person tuning the query may be different from the
person writing the query.
So some information in pg_stat_statements (or related table) is needed to
disambiguate.


> > Using other tools, a lot of the information in pg_stat_statements would
> be
> > duplicated to distinguish the counters recorded w.r.t. tuning operation.
>
> Yes, which is good.  Your example was about rewriting a query, but what
> about
> other possibilities like creating an index, changing
> hash_mem_multiplier...?
> You won't get a new record and the mean_time will mostly be useless.
>
> If you take regular snapshots, then you will be able to compute the
> mean_time
> for each interval, and that will answer both this scenario and the one in
> your
> example (since the 2nd row won't exist in the earlier snapshots).
>


RE: Optionally automatically disable logical replication subscriptions on error

2022-03-06 Thread osumi.takami...@fujitsu.com
On Friday, March 4, 2022 3:55 PM Masahiko Sawada  wrote:
> Thank you for updating the patch.
> 
> Here are some comments on v26 patch:
Thank you for your review !



> +/*
> + * Disable the current subscription.
> + */
> +static void
> +DisableSubscriptionOnError(void)
> 
> This function now just updates the pg_subscription catalog so can we move it
> to pg_subscription.c while having this function accept the subscription OID to
> disable? If you agree, the function comment will also need to be updated.
Agreed. Fixed.


> ---
> +/*
> + * First, ensure that we log the error message so
> that it won't be
> + * lost if some other internal error occurs in the
> following code.
> + * Then, abort the current transaction and send the
> stats message of
> + * the table synchronization failure in an idle state.
> + */
> +HOLD_INTERRUPTS();
> +EmitErrorReport();
> +FlushErrorState();
> +AbortOutOfAnyTransaction();
> +RESUME_INTERRUPTS();
> +pgstat_report_subscription_error(MySubscription->oid,
> + false);
> +
> +if (MySubscription->disableonerr)
> +{
> +DisableSubscriptionOnError();
> +proc_exit(0);
> +}
> +
> +PG_RE_THROW();
> 
> If the disableonerr is false, the same error is reported twice. Also, the 
> code in
> PG_CATCH() in both start_apply() and start_table_sync() are almost the same.
> Can we create a common function to do post-error processing?
Yes. Also, calling PG_RE_THROW wasn't appropriate,
because in the previous v26, for the second error you mentioned,
the patch didn't call errstart when disable_on_error = false.
This was introduced by recent patch refactoring around this code and the rebase
of this patch, but has been fixed by your suggestion.


> The worker should exit with return code 1.
> I've attached a fixup patch for changes to worker.c for your reference. Feel 
> free
> to adopt the changes.
Yes. I adopted almost all of your suggestion.
One thing I fixed was a comment that mentioned table sync
in worker_post_error_processing(), because start_apply()
also uses the function.


> 
> ---
> +
> +# Confirm that we have finished the table sync.
> +is( $node_subscriber->safe_psql(
> +'postgres', qq(SELECT MAX(i), COUNT(*) FROM tbl)),
> +"1|3",
> +"subscription sub replicated data");
> +
> 
> Can we store the result to a local variable to check? I think it's more 
> consistent
> with other tap tests.
Agreed. Fixed.


Attached the v27. Kindly review the patch.


Best Regards,
Takamichi Osumi



v27-0001-Optionally-disable-subscriptions-on-error.patch
Description: v27-0001-Optionally-disable-subscriptions-on-error.patch


RE: Optionally automatically disable logical replication subscriptions on error

2022-03-06 Thread shiy.f...@fujitsu.com
On Wed, Mar 2, 2022 5:39 PM osumi.takami...@fujitsu.com 
 wrote:
> 
> Attached an updated patch v26.
> 

Thanks for your patch. A comment on the document.

@@ -7771,6 +7771,16 @@ SCRAM-SHA-256$iteration 
count:
 
  
   
+   subdisableonerr bool
+  
+  
+   If true, the subscription will be disabled if one of its workers
+   detects an error
+  
+ 
+
+ 
+  
subconninfo text
   
   

The document for "subdisableonerr" option is placed after "The following
parameters control what happens during subscription creation: ". I think it
should be placed after "The following parameters control the subscription's
replication behavior after it has been created: ", right?

Regards,
Shi yu


Re: row filtering for logical replication

2022-03-06 Thread Amit Kapila
On Mon, Mar 7, 2022 at 12:50 AM Tomas Vondra
 wrote:
>
> On 3/3/22 21:07, Euler Taveira wrote:
> > On Thu, Mar 3, 2022, at 7:47 AM, Amit Kapila wrote:
> >> LGTM. I'll push this tomorrow unless Tomas or Euler feels otherwise.
> > Sounds good to me.
> >
>
> +1
>

Thanks, Pushed 
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ceb57afd3ce177e897cb4c5b44aa683fc0036782).

-- 
With Regards,
Amit Kapila.




Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-03-06 Thread Julien Rouhaud
On Mon, Mar 07, 2022 at 11:27:14AM +0900, Michael Paquier wrote:
> On Sat, Mar 05, 2022 at 07:31:53PM +0900, Michael Paquier wrote:
> > I got a short look at what was proposed in the patch a couple of
> > months ago, and still found the implementation confusing with the way
> > aliases are handled, particularly when it came to several layers of
> > pl/pgsql.  I am fine to let it go for now.
>
> Just had an extra look at the patch, still same impression.  So done
> this way.

I was actually waiting a bit to make sure that Pavel could read the thread,
since it was the weekend and right now it's 3:30 AM in the Czech Republic...




Re: make tuplestore helper function

2022-03-06 Thread Michael Paquier
On Wed, Mar 02, 2022 at 03:43:17PM +0900, Michael Paquier wrote:
> This is actually setting up a function in the context of a single call
> where we fill the tuplestore with all its values, so instead I have
> settled down to name that SetSingleFuncCall(), to make a parallel with
> the existing MultiFuncCall*().  funcapi.c is the right place for
> that, and I have added more documentation in the fmgr's README as well
> as funcapi.h.

I have tortured all those code paths for the last couple of days, and
the new function name, as well as its options, still seemed fine to
me, so I have applied the patch.  The buildfarm is cool with it.  It
is worth noting that there are more code paths in contrib/ that could
be simplified with this new routine.
--
Michael


signature.asc
Description: PGP signature


Re: Comment typo in CheckCmdReplicaIdentity

2022-03-06 Thread Julien Rouhaud
On Mon, Mar 07, 2022 at 10:36:24AM +0900, Michael Paquier wrote:
> On Mon, Mar 07, 2022 at 09:31:33AM +1100, Peter Smith wrote:
> > PSA patch to fix a comment typo.
> > 
> > (The 'OR' should not be uppercase - that keyword is irrelevant here).
> 
> I was looking at the whole routine, and your suggestion looks like an
> improvement to me.  Will apply if there are no objections.

+1




Re: pl/pgsql feature request: shorthand for argument and local variable references

2022-03-06 Thread Michael Paquier
On Sat, Mar 05, 2022 at 07:31:53PM +0900, Michael Paquier wrote:
> I got a short look at what was proposed in the patch a couple of
> months ago, and still found the implementation confusing with the way
> aliases are handled, particularly when it came to several layers of
> pl/pgsql.  I am fine to let it go for now.

Just had an extra look at the patch, still same impression.  So done
this way.
--
Michael


signature.asc
Description: PGP signature


Re: timestamp for query in pg_stat_statements

2022-03-06 Thread Julien Rouhaud
On Sun, Mar 06, 2022 at 12:37:00PM -0800, Zhihong Yu wrote:
> The current design of pg_stat_statements doesn't have the concept of
> observation.
>
> By observation I mean scenarios where pg_stat_statements is read by people
> doing performance tuning.
>
> Here is one example (same query, q, is concerned).
> At t1, q is performed, leaving one row in pg_stat_statements with mean_time
> of 10.
> At t2, operator examines pg_stat_statements and provides some suggestion
> for tuning q (which is carried out).
> At t3, q is run again leaving the row with mean_time of 9.
> Now with two rows for q, how do we know whether the row written at t3 is
> prior to or after implementing the suggestion made at t2 ?

Well, if pg_stat_statements is read by people doing performance tuning
shouldn't they be able to distinguish which query text is the one they just
rewrote?

> Using other tools, a lot of the information in pg_stat_statements would be
> duplicated to distinguish the counters recorded w.r.t. tuning operation.

Yes, which is good.  Your example was about rewriting a query, but what about
other possibilities like creating an index, changing hash_mem_multiplier...?
You won't get a new record and the mean_time will mostly be useless.

If you take regular snapshots, then you will be able to compute the mean_time
for each interval, and that will answer both this scenario and the one in your
example (since the 2nd row won't exist in the earlier snapshots).
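A minimal sketch of that snapshot approach; the column names match pg_stat_statements 1.8+, while the snapshot table name and join are illustrative:

```sql
-- Take a snapshot before the tuning change:
CREATE TABLE pgss_snap AS
    SELECT now() AS snap_time, queryid, calls, total_exec_time
    FROM pg_stat_statements;

-- After the change, compute the per-interval mean from the deltas:
SELECT p.queryid,
       (p.total_exec_time - s.total_exec_time)
           / NULLIF(p.calls - s.calls, 0) AS interval_mean_time
FROM pg_stat_statements p
JOIN pgss_snap s USING (queryid);
```

Taking snapshots on a schedule generalizes this to one mean_time per interval, independent of whether the tuning produced a new pg_stat_statements row.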




Re: logical decoding and replication of sequences

2022-03-06 Thread Peter Smith
On Tue, Mar 1, 2022 at 10:54 PM Amit Kapila  wrote:
>
> On Mon, Feb 28, 2022 at 5:16 PM Amit Kapila  wrote:
> >
> > On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
> >  wrote:
> > >
> > > On 2/10/22 19:17, Tomas Vondra wrote:
> > > > I've polished & pushed the first part adding sequence decoding
> > > > infrastructure etc. Attached are the two remaining parts.
> > > >
> > > > I plan to wait a day or two and then push the test_decoding part. The
> > > > last part (for built-in replication) will need more work and maybe
> > > > rethinking the grammar etc.
> > > >
> > >
> > > I've pushed the second part, adding sequences to test_decoding.
> > >
> >
> > The test_decoding is failing randomly in the last few days. I am not
> > completely sure but they might be related to this work. The two of
> > these appears to be due to the same reason:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
> >
> > TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
> > "reorderbuffer.c", Line: 1173, PID: 35013)
> > 0   postgres   0x00593de0 ExceptionalCondition + 160
> >

FYI, it looks like the same assertion has failed again on the same
build-farm machine [1]

>
> While reviewing the code for this, I noticed that in
> sequence_decode(), we don't call ReorderBufferProcessXid to register
> the first known lsn in WAL for the current xid. The similar functions
> logicalmsg_decode() or heap_decode() do call ReorderBufferProcessXid
> even if they decide not to queue or send the change. Is there a reason
> for not doing the same here? However, I am not able to deduce any
> scenario where lack of this will lead to such an Assertion failure.
> Any thoughts?
>

--
[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-03-03%2023%3A14%3A26

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Time to drop plpython2?

2022-03-06 Thread Andres Freund
Hi,

On 2022-03-06 17:30:15 -0800, Andres Freund wrote:
> 0003, the removal of code level support for Python 2, is now a good bit
> bigger, due to the removal of the last remnants of the Py2/3 porting layer.

Oh, huh. Something here seems to be broken, causing a crash on windows, but
not elsewhere.

https://cirrus-ci.com/task/5288088032247808
stack trace and regression.diffs:
https://api.cirrus-ci.com/v1/artifact/task/5288088032247808/crashlog/crashlog-postgres.exe_1768_2022-03-07_01-07-48-535.txt
https://api.cirrus-ci.com/v1/artifact/task/5288088032247808/log/src/pl/plpython/regression.diffs

The previous version ran successfully on windows too. So I probably screwed
something up when removing the remaining Py2/3 porting layer bits... The fact
that it fails on windows but not elsewhere perhaps suggests it's something
around the width of long?

Greetings,

Andres Freund




Re: Comment typo in CheckCmdReplicaIdentity

2022-03-06 Thread Michael Paquier
On Mon, Mar 07, 2022 at 09:31:33AM +1100, Peter Smith wrote:
> PSA patch to fix a comment typo.
> 
> (The 'OR' should not be uppercase - that keyword is irrelevant here).

I was looking at the whole routine, and your suggestion looks like an
improvement to me.  Will apply if there are no objections.
--
Michael


signature.asc
Description: PGP signature


Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-06 Thread Andres Freund
Hi,

On 2022-03-07 01:12:07 +0200, Victor Yegorov wrote:
> On Mon, 7 Mar 2022 at 00:34, Andres Freund :
> 
> > One thing that's likely worth doing as part of the cross version upgrade
> > test,
> > even if it wouldn't even help in this case, is to run amcheck post
> > upgrade. Just dumping data isn't going to touch indices at all.
> >
> > A sequence of
> >   pg_upgrade; amcheck; upgrade all extensions; amcheck;
> > would make sense.
> >
> 
> Is it possible to amcheck gist indexes?..

No. That was what I was alluding to with "even if it wouldn't even help in
this case".

Greetings,

Andres Freund




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-06 Thread Masahiko Sawada
On Fri, Mar 4, 2022 at 9:32 PM Euler Taveira  wrote:
>
> On Fri, Mar 4, 2022, at 2:54 AM, Amit Kapila wrote:
>
> The LSN of the transaction that contains the change violating the
> constraint and the replication origin name can be found from the
> server log (LSN 0/14C0378 and replication origin
> pg_16395 in the above case).  To skip the
> transaction, the subscription needs to be disabled temporarily by
> ALTER SUBSCRIPTION ... DISABLE first. Then, the
> transaction can be skipped by calling the
> pg_replication_origin_advance() function
> with the node_name (i.e.,
> pg_16395) and the next LSN of the commit LSN (i.e.,
> LSN 0/14C0379).
>
> You could also add:
>
> After that the replication can be resumed by ALTER SUBSCRIPTION ...
> ENABLE.
>
> Let's provide complete instructions.

Thank you for the comment. +1.

I've attached updated patches.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v5-0002-Add-the-origin-name-and-remote-commit-LSN-to-logi.patch
Description: Binary data


v5-0001-Use-complete-sentences-in-logical-replication-wor.patch
Description: Binary data


Re: Granting SET and ALTER SYSTE privileges for GUCs

2022-03-06 Thread Tom Lane
Mark Dilger  writes:
> On Mar 6, 2022, at 2:57 PM, Tom Lane  wrote:
>> I don't think this is materially different from what we do with
>> permissions on (say) functions.  If you want to revoke the public
>> SET privilege on some USERSET variable, you instantiate the default
>> and then revoke.  You end up with an empty ACL stored in pg_setting_acl,
>> and voila.

> I assume you mean the implementation of REVOKE does this, not that the user 
> needs to do both a grant and a revoke.

Right.  Again, look at what happens when you create a function and
then revoke its default PUBLIC EXECUTE permission.
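That function behavior can be observed directly; this is standard PostgreSQL, with an illustrative function name:

```sql
CREATE FUNCTION f() RETURNS int LANGUAGE sql AS 'SELECT 1';
SELECT proacl FROM pg_proc WHERE proname = 'f';
-- NULL: the built-in default (PUBLIC has EXECUTE) is in effect
REVOKE EXECUTE ON FUNCTION f() FROM PUBLIC;
SELECT proacl FROM pg_proc WHERE proname = 'f';
-- now an explicit ACL containing only the owner's grant, PUBLIC removed
```

The REVOKE instantiates the default ACL and then removes the PUBLIC grant, which is the model being proposed for pg_setting_acl.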

>> It'd likely be necessary to refuse to record a grant/revoke on
>> an unknown GUC, since if we don't know the GUC then we can't know
>> what the relevant default ACL ought to be.  But I bet your existing
>> patch has some dubious behavior in that case too.

> The existing patch allows grants on unknown gucs, because it can't know what 
> guc an upgrade script will introduce, and the grant statement may need to 
> execute before the guc exists.

Yeah, that's the problematic case.  It might mostly work to assume that
an unknown GUC has an empty default ACL.  This could fail to retain the
default PUBLIC SET permission if it later turns out the GUC is USERSET
... but I suspect in most cases anybody who's messing with the permissions
would've started out by revoking that anyway.  We could make this
definitely work in pg_dump if we teach pg_dump to explicitly grant or
revoke the PUBLIC SET permission anytime it's emitting anything for
a GUC, even if it thinks that would be the default state anyway.
Extension scripts that want to modify permissions for their GUCs
could follow that same principle to be sure.

regards, tom lane




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-06 Thread Peter Smith
On Fri, Mar 4, 2022 at 5:55 PM Masahiko Sawada  wrote:
>
> On Wed, Mar 2, 2022 at 6:38 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Wednesday, March 2, 2022 12:47 PM Masahiko Sawada 
> >  wrote:
> > > After more thoughts, should we do both AbortOutOfAnyTransaction() and 
> > > error
> > > message handling while holding interrupts? That is,
> > >
> > > HOLD_INTERRUPTS();
> > > EmitErrorReport();
> > > FlushErrorState();
> > > AbortOutOfAny Transaction();
> > > RESUME_INTERRUPTS();
> > > pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_work
> > > er());
> > >
> > > I think it's better that we do clean up first and then do other works 
> > > such as
> > > sending the message to the stats collector and updating the catalog.
> > I agree. Fixed. Along with this change, I corrected the header comment of
> > DisableSubscriptionOnError, too.
> >
> >
> > > Here are some comments on v24 patch:
> > >
> > > +/* Look up our subscription in the catalogs */
> > > +tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId,
> > > +
> > > CStringGetDatum(MySubscription->name));
> > >
> > > s/catalogs/catalog/
> > >
> > > Why don't we use SUBSCRIPTIONOID with MySubscription->oid?
> > Changed.
> >
> >
> > > ---
> > > +if (!HeapTupleIsValid(tup))
> > > +ereport(ERROR,
> > > +errcode(ERRCODE_UNDEFINED_OBJECT),
> > > +errmsg("subscription \"%s\" does not
> > > exist",
> > > +   MySubscription->name));
> > >
> > > I think we should use elog() here rather than ereport() since it's a
> > > should-not-happen error.
> > Fixed
> >
> >
> > > ---
> > > +/* Notify the subscription will be no longer valid */
> > >
> > > I'd suggest rephrasing it to like "Notify the subscription will be 
> > > disabled". (the
> > > subscription is still valid actually, but just disabled).
> > Fixed. Also, I've made this sentence past one, because of the code place
> > change below.
> >
> >
> > > ---
> > > +/* Notify the subscription will be no longer valid */
> > > +ereport(LOG,
> > > +errmsg("logical replication subscription
> > > \"%s\" will be disabled due to an error",
> > > +   MySubscription->name));
> > > +
> > >
> > > I think we can report the log at the end of this function rather than 
> > > during the
> > > transaction.
> > Fixed. In this case, I needed to adjust the comment to indicate the 
> > processing
> > to disable the sub has *completed* as well.
> >
> > > ---
> > > +my $cmd = qq(
> > > +CREATE TABLE tbl (i INT);
> > > +ALTER TABLE tbl REPLICA IDENTITY FULL;
> > > +CREATE INDEX tbl_idx ON tbl(i));
> > >
> > > I think we don't need to set REPLICA IDENTITY FULL to this table since 
> > > there is
> > > no update/delete.
> > >
> > > Do we need tbl_idx?
> > Removed both the replica identity and tbl_idx;
> >
> >
> > > ---
> > > +$cmd = qq(
> > > +SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr WHERE
> > > +sr.srsubstate IN ('s', 'r'));
> > > +$node_subscriber->poll_query_until('postgres', $cmd);
> > >
> > > It seems better to add a condition of srrelid just in case.
> > Makes sense. Fixed.
> >
> >
> > > ---
> > > +$cmd = qq(
> > > +SELECT count(1) = 1 FROM pg_catalog.pg_subscription s WHERE
> > > s.subname =
> > > +'sub' AND s.subenabled IS FALSE);
> > > +$node_subscriber->poll_query_until('postgres', $cmd)
> > > +  or die "Timed out while waiting for subscriber to be disabled";
> > >
> > > I think that it's more natural to directly check the subscription's 
> > > subenabled.
> > > For example:
> > >
> > > SELECT subenabled = false FROM pg_subscription WHERE subname = 'sub';
> > Fixed. I modified another code similar to this for tablesync error as well.
> >
> >
> > > ---
> > > +$cmd = q(ALTER SUBSCRIPTION sub ENABLE);
> > > +$node_subscriber->safe_psql('postgres', $cmd); $cmd = q(SELECT
> > > COUNT(1)
> > > += 3 FROM tbl WHERE i = 3);
> > > +$node_subscriber->poll_query_until('postgres', $cmd)
> > > +  or die "Timed out while waiting for applying";
> > >
> > > I think it's better to wait for the subscriber to catch up and check the 
> > > query
> > > result instead of using poll_query_until() so that we can check the query 
> > > result
> > > in case where the test fails.
> > I modified the code to wait for the subscriber and deleted poll_query_until.
> > Also, when I consider the test failure for this test as you mentioned,
> > expecting and checking that the returned count equals 3
> > would be better. So, I expressed this point in my test as well,
> > according to your advice.
> >
> >
> > > ---
> > > +$cmd = qq(DROP INDEX tbl_unique);
> > > +$node_subscriber->safe_psql('postgres', $cmd);
> > >
> > > In the newly added tap tests, all queries are consistently assigned to 
> > > $cmd and
> > > executed even when the query is used only once. It seems a different 
> > > style from
> 

Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-06 Thread Victor Yegorov
On Mon, 7 Mar 2022 at 00:34, Andres Freund :

> One thing that's likely worth doing as part of the cross version upgrade
> test,
> even if it wouldn't even help in this case, is to run amcheck post
> upgrade. Just dumping data isn't going to touch indices at all.
>
> A sequence of
>   pg_upgrade; amcheck; upgrade all extensions; amcheck;
> would make sense.
>

Is it possible to amcheck gist indexes?..


-- 
Victor Yegorov


Re: Granting SET and ALTER SYSTE privileges for GUCs

2022-03-06 Thread Mark Dilger



> On Mar 6, 2022, at 2:57 PM, Tom Lane  wrote:
> 
> I don't think this is materially different from what we do with
> permissions on (say) functions.  If you want to revoke the public
> SET privilege on some USERSET variable, you instantiate the default
> and then revoke.  You end up with an empty ACL stored in pg_setting_acl,
> and voila.

I assume you mean the implementation of REVOKE does this, not that the user 
needs to do both a grant and a revoke.

> It'd likely be necessary to refuse to record a grant/revoke on
> an unknown GUC, since if we don't know the GUC then we can't know
> what the relevant default ACL ought to be.  But I bet your existing
> patch has some dubious behavior in that case too.

The existing patch allows grants on unknown gucs, because it can't know what 
guc an upgrade script will introduce, and the grant statement may need to 
execute before the guc exists.  That opens a window for granting privileges on 
non-existent gucs.  That sounds bad, but I don't know of any actual harm that 
it does, beyond just being ugly.

With your proposal, it sounds like we could avoid that ugliness, so I'm 
inclined to at least try doing it as you propose.

Thanks for the suggestion!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Granting SET and ALTER SYSTEM privileges for GUCs

2022-03-06 Thread Tom Lane
Mark Dilger  writes:
>> On Mar 6, 2022, at 2:13 PM, Tom Lane  wrote:
>> ... Or, if that's our position, why are there
>> per-GUC changes at all, rather than just redefining what the
>> context values mean?  (That is, why not redefine USERSET and
>> SUSET as simply indicating the default ACL to be applied if there's
>> no entry in the catalog.)

> To my knowledge, there is no mechanism to revoke an implicit privilege.  You 
> can revoke a privilege explicitly listed in an aclitem[], but only if the 
> privilege is being tracked that way.

So?  What I'm suggesting is along the lines of

(1) pg_setting_acl starts out empty, or at least mostly empty (maybe
there are a few GUCs that need custom values).

(2) If there's a pg_setting_acl entry for a GUC that's to be set,
we apply it: either it grants the desired permission or it doesn't.

(3) If there's no entry, then for a USERSET GUC we assume that the
entry would be like "=s/postgres", while for any other context value
we assume the ACL grants nothing.

I don't think this is materially different from what we do with
permissions on (say) functions.  If you want to revoke the public
SET privilege on some USERSET variable, you instantiate the default
and then revoke.  You end up with an empty ACL stored in pg_setting_acl,
and voila.

It'd likely be necessary to refuse to record a grant/revoke on
an unknown GUC, since if we don't know the GUC then we can't know
what the relevant default ACL ought to be.  But I bet your existing
patch has some dubious behavior in that case too.

regards, tom lane




Re: Granting SET and ALTER SYSTEM privileges for GUCs

2022-03-06 Thread Mark Dilger



> On Mar 6, 2022, at 2:13 PM, Tom Lane  wrote:
> 
> 1. If we need to change these two contrib modules, doesn't that imply
> a lot of changes forced on external modules as well?  What are the
> security implications if somebody doesn't make such a change?
> 
> 2. It looks to me like if someone installs the updated postgres_fdw.so,
> but doesn't run ALTER EXTENSION UPDATE, they are going to be left with a
> rather broken extension.  This seems not good either, especially if it's
> multiplied by a boatload of third-party extensions requiring updates.
> 
> So I think some more thought is needed to see if we can't avoid
> the need to touch extension modules in this patch.  Or at least
> avoid the need for synchronized C-level and SQL-level updates,
> because that is going to create a lot of pain for end users.
> 
> I'm also fairly distressed by the number of changes in guc.c,
> mainly because I fear that that means that pending patches that
> add GUC variables will be subtly incorrect/insecure if they're not
> updated to account for this.  Frankly, I also reject the apparent
> position that we need to support forbidding users from touching
> many of these GUCs.  Or, if that's our position, why are there
> per-GUC changes at all, rather than just redefining what the
> context values mean?  (That is, why not redefine USERSET and
> SUSET as simply indicating the default ACL to be applied if there's
> no entry in the catalog.)

To my knowledge, there is no mechanism to revoke an implicit privilege.  You 
can revoke a privilege explicitly listed in an aclitem[], but only if the 
privilege is being tracked that way.

Userset variables are implicitly settable by any user.  There was a request, 
off-list as I recall, to make it possible to revoke the privilege to set 
variables such as "work_mem".  To make that possible, but not change the 
default behavior vis-a-vis prior releases, I upgraded most userset variables to 
suset with a corresponding grant to public on the variable.  Sites which wish 
to have a more restrictive policy on such variables can revoke that privilege 
from public and instead issue more restrictive grants.  There were a few 
variables where such treatment didn't seem sensible, such as ones to do with 
client connections, and I left them alone.  I didn't insist on a defense for 
why any particular setting needed to be revocable in order to apply this 
treatment.  My assumption was that sites should be allowed to determine their 
own security policies per setting unless there is a technical difficulty for a 
given setting that would make it overly burdensome to implement.

It seemed more complete to do the same thing for contrib modules, so I did.

If a third-party module doesn't do the equivalent operation, they would simply 
continue with their userset variables working as before.  I don't see a 
specific security concern with that, though I'd be happy to consider arguments 
to the contrary.

I believe there are also some issues with this patch relating to 
pg_dump/pg_restore and to pg_upgrade, which EDB is working to address.  We 
intend to repost something soon.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-06 Thread Andres Freund
Hi,

On 2022-03-06 07:46:04 -0500, Andrew Dunstan wrote:
> This is an area not currently touched by the buildfarm's cross version
> upgrade testing, which basically compares a pre-upgrade and post-upgrade
> dump of the databases. The upgraded cluster does contain
> contrib_regression_ltree.
> 
> I'm open to suggestions on how we might improve the buildfarm's testing
> of upgraded indexes generally.

One thing that's likely worth doing as part of the cross version upgrade test,
even if it wouldn't even help in this case, is to run amcheck post
upgrade. Just dumping data isn't going to touch indices at all.

A sequence of
  pg_upgrade; amcheck; upgrade all extensions; amcheck;
would make sense.

Greetings,

Andres Freund




Comment typo in CheckCmdReplicaIdentity

2022-03-06 Thread Peter Smith
PSA patch to fix a comment typo.

(The 'OR' should not be uppercase - that keyword is irrelevant here).

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Fix-comment-typo-CheckCmdReplicaIdentity.patch
Description: Binary data


Re: timestamp for query in pg_stat_statements

2022-03-06 Thread Zhihong Yu
On Sat, Mar 5, 2022 at 8:17 PM Julien Rouhaud  wrote:

> On Sat, Mar 05, 2022 at 06:10:44PM -0800, Zhihong Yu wrote:
> >
> > Looking at pg_stat_statements, there doesn't seem to be a timestamp column
> > for when the underlying query is performed.
> > Since the same query can be run multiple times, the absence of a timestamp
> > column makes finding the most recent invocation of the query difficult.
> >
> > Does it make sense to add such a column ?
>
> I don't think it would be that helpful.  Why do you need to only know when
> the last execution was, but no other details among every other cumulated
> counter?
>
> You should consider using some other tools on top of pg_stat_statements (and
> possibly other extensions) that perform snapshots regularly and can show you
> all the details at the given frequency.
>
Hi,
The current design of pg_stat_statements doesn't have the concept of
observation.

By observation I mean scenarios where pg_stat_statements is read by people
doing performance tuning.

Here is one example (same query, q, is concerned).
At t1, q is performed, leaving one row in pg_stat_statements with mean_time
of 10.
At t2, operator examines pg_stat_statements and provides some suggestion
for tuning q (which is carried out).
At t3, q is run again leaving the row with mean_time of 9.
Now with two rows for q, how do we know whether the row written at t3 is
prior to or after implementing the suggestion made at t2 ?

Using other tools, a lot of the information in pg_stat_statements would have
to be duplicated to distinguish the counters recorded before and after the
tuning operation.

I think pg_stat_statements can do better in this regard.

Cheers


Re: row filtering for logical replication

2022-03-06 Thread Tomas Vondra
On 3/3/22 21:07, Euler Taveira wrote:
> On Thu, Mar 3, 2022, at 7:47 AM, Amit Kapila wrote:
>> LGTM. I'll push this tomorrow unless Tomas or Euler feels otherwise.
> Sounds good to me.
> 

+1

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: role self-revocation

2022-03-06 Thread David G. Johnston
On Sun, Mar 6, 2022 at 9:53 AM Tom Lane  wrote:

> Robert Haas  writes:
> > ... Suppose the superuser grants "admin" to both "joe" and "sally".
> > Now "joe" can SET ROLE to "admin" and revoke it from "sally", and the
> > superuser has no tool to prevent this.
>
> Really?
>
> regression=# grant admin to joe;
> GRANT ROLE
> regression=# grant admin to sally;
> GRANT ROLE
> regression=# \c - joe
> You are now connected to database "regression" as user "joe".
> regression=> revoke admin from sally;
> ERROR:  must have admin option on role "admin"
> regression=> set role admin;
> SET
> regression=> revoke admin from sally;
> ERROR:  must have admin option on role "admin"
>
> I think there is an issue here around exactly what the admin option
> means, but if it doesn't grant you the ability to remove grants
> made by other people, it's pretty hard to see what it's for.
>
>
Precisely.

The current system, with the session_user exception, basically guides a
superuser to define two kinds of roles.

Groups: No login, permission grants
Users: Login, inherits permissions from groups, can manage group membership
if given WITH ADMIN OPTION.

The original example using only users is not all that compelling to me.
IMO, DBAs should not be setting up their system that way.

Two questions remain:

1. Are we willing to get rid of the session_user exception?

2. Do we want to track who the grantor is for role membership grants and
institute a requirement that non-superusers can only revoke the grants that
they personally made?

I'm personally in favor of getting rid of the session_user exception, which
nicely prevents the problem at the beginning of this thread and further
encourages the DBA to define groups and roles with a greater
separation-of-concerns design.  WITH ADMIN OPTION is sufficient.

I think tracking grantor information for role membership would allow for
greater auditing capabilities and a better degree of control in the
permissions system.

In short, I am in favor of both options.  The grantor tracking seems to be
headed for acceptance.

So, do we really want to treat every single login role as a potential group
by keeping the session_user exception?

David J.


Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-06 Thread Tomas Vondra
On 3/6/22 08:09, Alexander Korotkov wrote:
> Hi!
> 
> Sorry for this terrible oversight by me.
> 
> On Sat, Mar 5, 2022 at 10:13 AM Tomas Vondra
>  wrote:
>> On 3/4/22 23:09, Nikita Glukhov wrote:
>>> On 04.03.2022 23:28, Tom Lane wrote:
>>>
 Tomas Vondra  writes:
> On 3/4/22 20:29, Nikita Glukhov wrote:
>> So, we probably have corrupted indexes that were updated since such
>> "incomplete" upgrade of ltree.
> IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
> installed version of the extension, and that's intentional.
 Yeah, exactly.  But this opens up an additional consideration we
 have to account for: whatever we do needs to work with either 1.1
 or 1.2 SQL-level versions of the extension.

  regards, tom lane
>>>
>>> It becomes clear that ltree upgrade 1.1 => 1.2 is broken, the problem
>>> is not so much related to PG12 => PG13+ upgrades.
> 
> So, it seems that ltree 1.1 in PG13+ is incompatible with ltree on
> PG12 and ltree 1.2 on PG13+.  And there are many scenarios involving.
> 
> It seems too difficult to identify all the broken cases in the release
> notes.  What about applying a patch and asking all ltree users to
> reindex their indexes?
> 

Yeah. I think this is getting so complicated that there's little chance
we'd be able to clearly explain when to reindex.

>> Well, it's quite a mess :-(
>>
>> It very clearly is not just 1.1 -> 1.2 upgrade issue, because you can
>> get a crash with 1.1 after PG12 -> PG13 upgrade, as demonstrated
>> earlier. So just "fixing" the extension upgrade is no enough.
>>
>> But as you showed, 1.1 -> 1.2 upgrade is broken too.
>>
>>>
>>> You can see here another problem: installation of opclass options
>>> procedure does not invalidate relation's opclass options cache.
>>>
>>
>> Seems like that.
> 
> I think the opclass options procedure shouldn't normally change the opclass
> options cache.  Before installation of the options procedure, there
> should be no options.  And options procedure shouldn't change the
> defaults in this case.
> 

I should clarify I haven't looked into the details - I merely verified
accessing the index in the same backend works, and after reconnection it
crashes (per Nikita's example). I don't know why that happens.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: role self-revocation

2022-03-06 Thread Tom Lane
Robert Haas  writes:
> ... Suppose the superuser grants "admin" to both "joe" and "sally".
> Now "joe" can SET ROLE to "admin" and revoke it from "sally", and the
> superuser has no tool to prevent this.

Really?

regression=# grant admin to joe;
GRANT ROLE
regression=# grant admin to sally;
GRANT ROLE
regression=# \c - joe
You are now connected to database "regression" as user "joe".
regression=> revoke admin from sally;
ERROR:  must have admin option on role "admin"
regression=> set role admin;
SET
regression=> revoke admin from sally;
ERROR:  must have admin option on role "admin"

I think there is an issue here around exactly what the admin option
means, but if it doesn't grant you the ability to remove grants
made by other people, it's pretty hard to see what it's for.

regards, tom lane




Re: role self-revocation

2022-03-06 Thread Joshua Brindle
On Sun, Mar 6, 2022 at 10:19 AM Robert Haas  wrote:
>
> On Fri, Mar 4, 2022 at 5:20 PM David G. Johnston
>  wrote:
> > I think I disagree.  Or, at least, the superuser has full control of 
> > dictating how role membership is modified and that seems sufficient.
>
> The point is that the superuser DOES NOT have full control. The
> superuser cannot prevent relatively low-privileged users from undoing
> things that the superuser did intentionally and doesn't want reversed.
>
> The choice of names in my example wasn't accidental. If the granted
> role is a login role, then the superuser's intention was to vest the
> privileges of that role in some other role, and it is surely not right
> for that role to be able to decide that it doesn't want its
> privileges to be so granted. That's why I chose the name "peon". In
> your example, where you chose the name "admin", the situation is less
> clear. If we imagine the granted role as a container for a bundle of
> privileges, giving it the ability to administer itself feels more
> reasonable. However, I am very much unconvinced that it's correct even
> there. Suppose the superuser grants "admin" to both "joe" and "sally".
> Now "joe" can SET ROLE to "admin" and revoke it from "sally", and the
> superuser has no tool to prevent this.
>
> Now you can imagine a situation where the superuser is totally OK with
> either "joe" or "sally" having the ability to lock the other one out,
> but I don't think it's right to say that this will be true in all
> cases.
>

Another example here is usage of groups in pg_hba.conf, if the admin
has a group of users with stronger authentication requirements: e.g.,

hostssl all +certonlyusers all cert map=certmap clientcert=1

and one can remove their membership, they can change their
authentication requirements.




Re: role self-revocation

2022-03-06 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 4, 2022 at 4:34 PM Tom Lane  wrote:
>> Agreed, this is not something to move on quickly.  We might want
>> to think about adjusting pg_dump to use explicit GRANTED BY
>> options in GRANT/REVOKE a release or two before making incompatible
>> changes.

> Uggh. I really want to make some meaningful progress here before the
> heat death of the universe, and I'm not sure that this manner of
> proceeding is really going in that direction. That said, I do entirely
> see your point. Are you thinking we'd actually add a GRANTED BY clause
> to GRANT/REVOKE, vs. just wrapping it in SET ROLE incantations of some
> sort?

I was thinking the former ... however, after a bit of experimentation
I see that we accept "grant foo to bar granted by baz" a VERY long
way back, but the "granted by" option for object privileges is
(a) pretty new and (b) apparently restrictively implemented:

regression=# grant delete on alices_table to bob granted by alice;
ERROR:  grantor must be current user

That's ... surprising.  I guess whoever put that in was only
interested in pro-forma SQL syntax compliance and not in making
a usable feature.

So if we decide to extend this change into object privileges
it would be advisable to use SET ROLE, else we'd be giving up
an awful lot of backwards compatibility in dump scripts.
But if we're only talking about role grants then I think
GRANTED BY would work fine.

regards, tom lane




Re: Adding CI to our tree (ccache)

2022-03-06 Thread Justin Pryzby
On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote:
> I tried to use it, but saw that no caching was happening, and debugged
> it. Which yielded that it can't be used due to the way output files are
> specified (and due to multiple files, but that can be prevented with an
> msbuild parameter).

Could you give a hint about the msbuild param to avoid processing multiple
files with cl.exe?  I'm not able to find it...

I don't know about the issue with output filenames ..

-- 
Justin




Re: Fix overflow in DecodeInterval

2022-03-06 Thread Joseph Koshakow
Hi All,

Sorry for the delay in the new patch, I've attached my most recent
patch to this email. I ended up reworking a good portion of my previous
patch so below I've included some reasons why, notes on my current
approach, and some pro/cons to the approach.

* The main reason for the rework had to do with double conversions and
shared code.

* The existing code for rounding had a lot of int to double
casting and vice versa. I *think* that doubles are able to completely
represent the range of ints. However doubles are not able to represent
the full range of int64. After making the change I started noticing
a lot of lossy behavior. One thought I had was to change the doubles
to long doubles, but I wasn't able to figure out if long doubles could
completely represent the range of int64. Especially since their size
varies depending on the architecture. Does anyone know the answer to
this?

* I ended up creating two intermediate data structures for Intervals.
One for decoding and one for everything else. I'll go into more detail
below.
* One common benefit was that they both contain a usec field which
means that the Interval methods no longer need to carry around a
separate fsec argument.
* The obvious con here is that Intervals require two unique
intermediate data structures, while all other date/time types
can share a single intermediate data structure. I find this to
be a bit clunky.

* pg_itm_in is the struct used for Interval decoding. It's very similar
to pg_tm, except all of the time related fields are collapsed into a
single `int64 usec` field.
* The biggest benefit of this was that all int64-double conversions
are limited to a single function, AdjustFractMicroseconds. Instead
of fractional units flowing down over every single time field, they
only need to flow down into the single `int64 usec` field.
* Overflows are caught much earlier in the decoding process which
helps avoid wasted work.
* I found that the decoding code became simpler for time fields,
though this is a bit subjective.

* pg_itm is the struct used for all other Interval functionality. It's
very similar to pg_tm, except the tm_hour field is converted from int
to int64 and an `int tm_usec` field was added.
* When encoding and working with Intervals, we almost always want
to break the time field out into hours, min, sec, usec. So it's
helpful to have a common place to do this, instead of every
function duplicating this code.
* When breaking the time fields out, a single field will never
contain a value greater than could have fit in the next unit
higher. Meaning that minutes will never be greater than 60, seconds
will never be greater than 60, and usec will never be greater than
1,000,000. So hours is actually the only field that needs to be int64
and the rest can be an int.
* This also helps limit the impact to shared code (see below).

* There's some shared code between Intervals and other date/time types.
Specifically the DecodeTime function and the datetime_to_char_body
function. These functions take in a `struct pg_tm` and a `fsec_t fsec`
(fsec_t is just an alias for int32) which allows them to be re-used by
all date/time types. The only difference now between pg_tm and pg_itm
is the tm_hour field size (the tm_usec field in pg_itm can be used as
the fsec). So to get around this I changed the function signatures to
take a `struct pg_tm`, `fsec_t fsec`, and an `int64 hour` argument.
It's up to the caller to provide the correct hour field. Intervals can
easily convert pg_itm to a pg_tm, fsec, and hour. It's honestly a bit
error-prone since those functions have to explicitly ignore the
pg_tm->tm_hour field and use the provided hour argument instead, but I
couldn't think of a better less intrusive solution. If anyone has a
better idea, please don't hesitate to bring it up.

* This partly existed in the previous patch, but I just wanted to
restate it. All modifications to pg_itm_in during decoding are done via
helper functions that check for overflow. All invocations of these
functions either return an error on overflow or explicitly state why an
overflow is impossible.

* I completely rewrote the ParseISO8601Number function to try and avoid
double to int64 conversions. I tried to model it after the parsing done
in DecodeInterval, though I would appreciate extra scrutiny here.

- Joe Koshakow
From a2afce720fb65b87638a634078067a796a639ddc Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Mon, 28 Feb 2022 22:52:55 -0500
Subject: [PATCH] Rework Interval encoding and decoding

The current Interval encoding and decoding has the following issues:
  * The decoding functions have many uncaught errors that allow the
Interval value to overflow/underflow.
  * Both the decoding and encoding functions do not protect against
taking the absolute value or negating INT_MIN which leads to
undefined behavior (usually it just leaves the value unchanged at
INT_MIN, which is not the desired result).
  * The encoding and decoding process 

Re: role self-revocation

2022-03-06 Thread Robert Haas
On Fri, Mar 4, 2022 at 4:34 PM Tom Lane  wrote:
> If we are not tracking the grantors of role authorizations,
> then we are doing it wrong and we ought to fix that.

Hmm, so maybe that's the place to start. We are tracking it in the
sense that we record an OID in the catalog, but nothing that happens
after that makes a lot of sense.

> > 3. What happens if a user is dropped after being recorded as a
> > grantor?
>
> Should work the same as it does now for ordinary ACLs, ie, you
> gotta drop the grant first.

OK, that makes sense to me.

> Changing it could have some bad compatibility consequences though.
> In particular, I believe it would break existing pg_dump files,
> in that after restore all privileges would be attributed to the
> restoring superuser, and there'd be no very easy way to clean that
> up.

I kind of wonder whether we ought to attribute all privileges granted
by any superuser to the bootstrap superuser. That doesn't seem to have
any meaningful downside, and it could avoid a lot of annoying
dependencies that serve no real purpose.

> Agreed, this is not something to move on quickly.  We might want
> to think about adjusting pg_dump to use explicit GRANTED BY
> options in GRANT/REVOKE a release or two before making incompatible
> changes.

Uggh. I really want to make some meaningful progress here before the
heat death of the universe, and I'm not sure that this manner of
proceeding is really going in that direction. That said, I do entirely
see your point. Are you thinking we'd actually add a GRANTED BY clause
to GRANT/REVOKE, vs. just wrapping it in SET ROLE incantations of some
sort?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-06 Thread Robert Haas
On Fri, Mar 4, 2022 at 5:20 PM David G. Johnston
 wrote:
> I think I disagree.  Or, at least, the superuser has full control of 
> dictating how role membership is modified and that seems sufficient.

The point is that the superuser DOES NOT have full control. The
superuser cannot prevent relatively low-privileged users from undoing
things that the superuser did intentionally and doesn't want reversed.

The choice of names in my example wasn't accidental. If the granted
role is a login role, then the superuser's intention was to vest the
privileges of that role in some other role, and it is surely not right
for that role to be able to decide that it doesn't want its
privileges to be so granted. That's why I chose the name "peon". In
your example, where you chose the name "admin", the situation is less
clear. If we imagine the granted role as a container for a bundle of
privileges, giving it the ability to administer itself feels more
reasonable. However, I am very much unconvinced that it's correct even
there. Suppose the superuser grants "admin" to both "joe" and "sally".
Now "joe" can SET ROLE to "admin" and revoke it from "sally", and the
superuser has no tool to prevent this.

Now you can imagine a situation where the superuser is totally OK with
either "joe" or "sally" having the ability to lock the other one out,
but I don't think it's right to say that this will be true in all
cases.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] vacuumdb --schema only

2022-03-06 Thread Justin Pryzby
On Sun, Mar 06, 2022 at 09:39:37AM +0100, Gilles Darold wrote:
> Attached a new patch version that adds the -N | --exclude-schema option
> to the vacuumdb command as suggested. Documentation updated too.
> 
> + pg_log_error("cannot vacuum all tables in schema(s) and and 
> exclude specific schema(s) at the same time");

and and

It's odd that schema_exclusion is a global var, but schemas/excluded are not.

Also, it seems unnecessary to have two schemas vars, since they can't be used
together.  Maybe there's a better way than what I did in 003.

> + for (cell = schemas ? schemas->head : NULL; cell; cell = cell->next)

It's preferred to write cell != NULL

> +   bool schemas_listed = false;
...
> + for (cell = schemas ? schemas->head : NULL; cell; cell = cell->next)
> + {
> + if (!schemas_listed) {
> + appendPQExpBufferStr(&catalog_query,
> +  " AND pg_catalog.quote_ident(ns.nspname)");
> + if (schema_exclusion)
> + appendPQExpBufferStr(&catalog_query, " NOT IN (");
> + else
> + appendPQExpBufferStr(&catalog_query, " IN (");
> +
> + schemas_listed = true;
> + }
> + else
> + appendPQExpBufferStr(&catalog_query, ", ");
> +
> + appendStringLiteralConn(&catalog_query, cell->val, conn);
> + appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace::pg_catalog.name");
> +
> + }
> + /* Finish formatting schema filter */
> + if (schemas_listed)
> + appendPQExpBufferStr(&catalog_query, ")\n");
>   }

Maybe it's clearer to write this with =ANY() / != ALL() ?
See 002.

-- 
Justin
>From 9f1b7f2fb0849a810dc1fa5c1c03d5ff4e2e7d55 Mon Sep 17 00:00:00 2001
From: Gilles Darold 
Date: Sun, 6 Mar 2022 09:39:37 +0100
Subject: [PATCH 1/3] vacuumdb --schema only
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 04/03/2022 at 11:56, Justin Pryzby wrote:
> On Fri, Mar 04, 2022 at 10:11:28AM +0100, Gilles Darold wrote:
>> The attached patch implements that. Option -n | --schema can be used
>> multiple time and can not be used together with options -a or -t.
> Yes, thanks.
>
> I suggest there should also be an --exclude-schema.
>
>> I do not propose to extend the VACUUM and ANALYZE commands because their
>> current syntax doesn't allow me to see an easy way to do that
> I think this would be easy with the parenthesized syntax.
> I'm not suggesting to do it there, though.
>
>> +	/*
>> +	 * When filtering on schema name, filter by table is not allowed.
>> +	 * The schema name can already be set in a fqdn table name.
> set *to*
>

Attached a new patch version that adds the -N | --exclude-schema option
to the vacuumdb command as suggested. Documentation updated too.

I will add this patch to the commitfest unless there is cons about
adding these options.

--
Gilles Darold
---
 doc/src/sgml/ref/vacuumdb.sgml| 64 ++
 src/bin/scripts/t/100_vacuumdb.pl |  9 +++
 src/bin/scripts/t/101_vacuumdb_all.pl |  3 +
 src/bin/scripts/vacuumdb.c| 79 ++-
 4 files changed, 154 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cbc..378328afb3d 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+	   
+	
+	 
+	  -n
+	  --schema
+	 
+	 schema
+	
+	   
+
+	   
+	
+	 
+	  -N
+	  --exclude-schema
+	 
+	 schema
+	
+	   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the Foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='"Foo"' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 

Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-06 Thread Andrew Dunstan


On 3/4/22 15:28, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 3/4/22 20:29, Nikita Glukhov wrote:
>>> So, we probably have corrupted indexes that were updated since such 
>>> "incomplete" upgrade of ltree.
>> IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
>> installed version of the extension, and that's intentional.
> Yeah, exactly.  But this opens up an additional consideration we
> have to account for: whatever we do needs to work with either 1.1
> or 1.2 SQL-level versions of the extension.
>
>   


This is an area not currently touched by the buildfarm's cross version
upgrade testing, which basically compares a pre-upgrade and post-upgrade
dump of the databases. The upgraded cluster does contain
contrib_regression_ltree.

I'm open to suggestions on how we might improve the buildfarm's testing
of upgraded indexes generally.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: New Table Access Methods for Multi and Single Inserts

2022-03-06 Thread Bharath Rupireddy
On Fri, Mar 4, 2022 at 8:07 PM Matthias van de Meent
 wrote:
> > Another rebase due to conflicts in 0003. Attaching v6 for review.
>
> I recently touched the topic of multi_insert, and I remembered this
> patch. I had to dig a bit to find it, but as it's still open I've
> added some comments:

Thanks for reviving the thread; I had almost lost hope in it. In fact, it
took me a while to recollect the work and respond to your comments.
I'm happy to answer questions or continue working on this patch if you or
someone else is really interested in reviewing it and taking it to completion.

> > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> > +#define MAX_BUFFERED_TUPLES1000
> > +#define MAX_BUFFERED_BYTES65535
>
> It looks like these values were copied over from copyfrom.c, but are
> now expressed in the context of the heapam.
> As these values are now heap-specific (as opposed to the
> TableAM-independent COPY infrastructure), shouldn't we instead
> optimize for heap page insertions? That is, I suggest using a multiple
> (1 or more) of MaxHeapTuplesPerPage for _TUPLES, and that same / a
> similar multiple of BLCKSZ for MAX_BUFFERED_BYTES.

We can do that. In fact, it is a good idea to let callers pass these
limits in as inputs to tuple_insert_begin and keep them as part of
TableInsertState. If that's okay, I will do that in the next patch.

> > TableInsertState->flushed
> > TableInsertState->mi_slots
>
> I don't quite like the current storage-and-feedback mechanism for
> flushed tuples. The current assumptions in this mechanism seem to be
> that
> 1.) access methods always want to flush all available tuples at once,
> 2.) access methods want to maintain the TupleTableSlots for all
> inserted tuples that have not yet had all triggers handled, and
> 3.) we need access to the not-yet-flushed TupleTableSlots.
>
> I think that that is not a correct set of assumptions; I think that
> only flushed tuples need to be visible to the tableam-agnostic code;
> and that tableams should be allowed to choose which tuples to flush at
> which point; as long as they're all flushed after a final
> multi_insert_flush.
>
> Examples:
> A heap-based access method might want bin-pack tuples and write out
> full pages only; and thus keep some tuples in the buffers as they
> didn't fill a page; thus having flushed only a subset of the current
> buffered tuples.
> A columnstore-based access method might not want to maintain the
> TupleTableSlots of original tuples, but instead use temporary columnar
> storage: TupleTableSlots are quite large when working with huge
> amounts of tuples; and keeping lots of tuple data in memory is
> expensive.
>
> As such, I think that this should be replaced with a
> TableInsertState->mi_flushed_slots + TableInsertState->mi_flushed_len,
> managed by the tableAM, in which only the flushed tuples are visible
> to the AM-agnostic code. This is not much different from how the
> implementation currently works; except that ->mi_slots now does not
> expose unflushed tuples; and that ->flushed is replaced by an integer
> value of number of flushed tuples.

Yeah, that makes sense. Let's have table AMs expose the flushed tuples
so that the callers can handle the after-insert row triggers
on them.

IIUC, TableInsertState needs to have a few other variables:

/* Below members are only used for multi inserts. */
/* Array of buffered slots. */
TupleTableSlot  **mi_slots;
/* Number of slots that are currently buffered. */
int32   mi_cur_slots;
/* Array of flushed slots that will be used by callers to handle
after-insert row triggers or similar events outside */
TupleTableSlot  **mi_flushed_slots;
/* Number of slots that have been flushed. */
int32   mi_no_of_flushed_slots;

The implementation of heap_multi_insert_flush will just set
mi_flushed_slots to the buffered mi_slots.
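To make the intended contract concrete, here is a minimal standalone sketch of that bookkeeping. The types and function names (multi_insert, multi_insert_flush) are mocks for illustration only, not the patch's actual API; the real TupleTableSlot and flush routine live in the executor and heapam.

```c
#include <assert.h>

/* Mock stand-ins for the PostgreSQL structs (illustrative only). */
typedef struct TupleTableSlot { int dummy; } TupleTableSlot;

typedef struct TableInsertState
{
    TupleTableSlot **mi_slots;          /* buffered, not yet flushed */
    int              mi_cur_slots;
    TupleTableSlot **mi_flushed_slots;  /* visible to AM-agnostic code */
    int              mi_num_flushed;
    int              max_slots;         /* buffer capacity */
} TableInsertState;

/* Hand the buffered slots over to the caller-visible flushed array and
 * reset the buffer, mirroring the proposed multi_insert_flush contract. */
static void
multi_insert_flush(TableInsertState *state)
{
    for (int i = 0; i < state->mi_cur_slots; i++)
        state->mi_flushed_slots[state->mi_num_flushed++] = state->mi_slots[i];
    state->mi_cur_slots = 0;
}

/* Buffer one slot; flush automatically once the buffer fills up. */
static void
multi_insert(TableInsertState *state, TupleTableSlot *slot)
{
    state->mi_slots[state->mi_cur_slots++] = slot;
    if (state->mi_cur_slots >= state->max_slots)
        multi_insert_flush(state);
}
```

The point of the shape above is that AM-agnostic code only ever looks at mi_flushed_slots after a flush; the AM remains free to decide when to flush the buffered slots.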

> A further improvement (in my opinion) would be the change from a
> single multi_insert_flush() to a signalling-based multi_insert_flush:
> It is not unreasonable for e.g. a columnstore to buffer tens of
> thousands of inserts; but doing so in TupleTableSlots would introduce
> a high memory usage. Allowing for batched retrieval of flushed tuples
> would help in memory usage; which is why multiple calls to
> multi_insert_flush() could be useful. To handle this gracefully, we'd
> probably add TIS->mi_flush_remaining, where each insert adds one to
> mi_flush_remaining; and each time mi_flushed_slots has been handled
> mi_flush_remaining is decreased by mi_flushed_len by the handler code.
> Once we're done inserting into the table, we keep calling
> multi_insert_flush until no more tuples are being flushed (and error
> out if we're still waiting for flushes but no new flushed tuples are
> returned).

The current approach is already signalling-based, right?
heap_multi_insert_v2
if (state->mi_cur_slots >= mistate->max_slots ||
mistate->cur_size >= mistate->max_size)
heap_multi_insert_flush(state);

The 

Re: [Proposal] vacuumdb --schema only

2022-03-06 Thread Gilles Darold
On 04/03/2022 at 11:56, Justin Pryzby wrote:
> On Fri, Mar 04, 2022 at 10:11:28AM +0100, Gilles Darold wrote:
>> The attached patch implements that. Option -n | --schema can be used
>> multiple time and can not be used together with options -a or -t.
> Yes, thanks.
>
> I suggest there should also be an --exclude-schema.
>
>> I do not propose to extend the VACUUM and ANALYZE commands because their
>> current syntax doesn't allow me to see an easy way to do that
> I think this would be easy with the parenthesized syntax.
> I'm not suggesting to do it there, though.
>
>> +/*
>> + * When filtereing on schema name, filter by table is not allowed.
>> + * The schema name can already be set in a fqdn table name.
> set *to*
>

Attached is a new patch version that adds the -N | --exclude-schema option
to the vacuumdb command, as suggested. The documentation is updated too.


I will add this patch to the commitfest unless there are objections to
adding these options.
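To illustrate the intent of the two options, the include/exclude logic essentially boils down to building an nspname filter for the catalog query. A standalone sketch follows; the helper name and the naive quoting are hypothetical (the real vacuumdb builds its query with PQExpBuffer and proper identifier escaping):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the patch's actual code: render a list of
 * schema names as an IN / NOT IN predicate on pg_namespace.nspname.
 * Quoting here is naive and for illustration only. */
static void
build_schema_filter(char *dst, size_t len, const char *const *schemas,
                    int nschemas, int exclude)
{
    int used = snprintf(dst, len, "ns.nspname %sIN (", exclude ? "NOT " : "");

    for (int i = 0; i < nschemas; i++)
        used += snprintf(dst + used, len - used, "%s'%s'",
                         i > 0 ? ", " : "", schemas[i]);
    snprintf(dst + used, len - used, ")");
}
```

With -n the predicate selects only the listed schemas; with -N it inverts to exclude them, which is why the two options (and -t / -a) cannot be combined.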


-- 
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..378328afb3 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+	   
+	
+	 
+	  -n
+	  --schema
+	 
+	 schema
+	
+	   
+
+	   
+	
+	 
+	  -N
+	  --exclude-schema
+	 
+	 schema
+	
+	   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
  </listitem>
 </varlistentry>
 
+ <varlistentry>
+  <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+  <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+  <listitem>
+   <para>
+    Clean or analyze all tables in <replaceable class="parameter">schema</replaceable> only.
+    Multiple schemas can be vacuumed by writing multiple <option>-n</option> switches.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
+  <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+  <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+  <listitem>
+   <para>
+    Clean or analyze all tables NOT in <replaceable class="parameter">schema</replaceable>.
+    Multiple schemas can be excluded from the vacuum by writing multiple <option>-N</option> switches.
+   </para>
+  </listitem>
+ </varlistentry>
+
 <varlistentry>
  <term><option>--no-index-cleanup</option></term>
  <listitem>
@@ -619,6 +675,14 @@ PostgreSQL documentation
 <prompt>$ </prompt><userinput>vacuumdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
 </screen>
 
+   <para>
+    To clean all tables in the <literal>Foo</literal> and <literal>bar</literal> schemas
+    only in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>vacuumdb --schema='"Foo"' --schema='bar' xyzzy</userinput>
+</screen>
+   </para>
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..4c4f47e32a 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,12 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".*/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a',  '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 
 done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4f6917fd39..3dca22e1c8 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -46,10 +46,12 @@ typedef struct vacuumingOptions
 	bool		process_toast;
 } vacuumingOptions;
 
+static bool schema_exclusion = false;
 
 static void vacuum_one_database(ConnParams *cparams,
 vacuumingOptions *vacopts,
 int stage,
+SimpleStringList *schemas,
 SimpleStringList *tables,
 int concurrentCons,
 const char *progname, bool echo, bool quiet);
@@ -94,6 +96,8 @@ main(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"jobs", required_argument, NULL, 'j'},
 		{"parallel", required_argument, NULL, 'P'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
 		{"maintenance-db", required_argument, NULL, 2},
 		{"analyze-in-stages", no_argument, NULL, 3},
 		{"disable-page-skipping", no_argument, NULL, 4},
@@ -125,6 +129,8 @@ main(int argc, char *argv[])
 	SimpleStringList tables = {NULL,