Re: Should vacuum process config file reload more often

2023-04-06 Thread Masahiko Sawada
On Fri, Apr 7, 2023 at 8:08 AM Daniel Gustafsson  wrote:
>
> > On 7 Apr 2023, at 00:12, Melanie Plageman  wrote:
> >
> > On Thu, Apr 6, 2023 at 5:45 PM Daniel Gustafsson  wrote:
> >>
> >>> On 6 Apr 2023, at 23:06, Melanie Plageman  wrote:
> >>
> >>> Autovacuum workers, at the end of VacuumUpdateCosts(), check if cost
> >>> limit or cost delay have been changed. If they have, they assert that
> >>> they don't already hold the AutovacuumLock, take it in shared mode, and
> >>> do the logging.
> >>
> >> Another idea would be to copy the values to local temp variables while
> >> holding the lock, and release the lock before calling elog() to avoid
> >> holding the lock over potential IO.
> >
> > Good idea. I've done this in attached v19.
> > Also I looked through the docs and everything still looks correct for
> > balancing algo.
>
> I had another read-through and test-through of this version, and have applied
> it with some minor changes to comments and whitespace.  Thanks for the quick
> turnaround times on reviews in this thread!

Cool!

Regarding the commit 7d71d3dd08, I have one comment:

+   /* Only log updates to cost-related variables */
+   if (vacuum_cost_delay == original_cost_delay &&
+   vacuum_cost_limit == original_cost_limit)
+   return;

IIUC, by default we log not only before starting the vacuum but also
when changing cost-related variables, which is good, I think, because
logging the initial values is also helpful for investigation.
However, depending on the values, we may not log the initial vacuum
cost values at all. For example, if the
autovacuum_vacuum_cost_delay storage option is set to 0, we don't log
the initial values. I think that instead of comparing old and new
values, we can write the log only if
message_level_is_interesting(DEBUG2) is true. That way, we don't need
to acquire the lwlock unnecessarily, and the code looks cleaner to me.
I've attached the patch (use_message_level_is_interesting.patch).

Also, while testing the autovacuum delay with the relopt
autovacuum_vacuum_cost_delay = 0, I realized that even if we set
autovacuum_vacuum_cost_delay = 0 on a table, wi_dobalance is set to
true. wi_dobalance comes from the following expression:

/*
 * If any of the cost delay parameters has been set individually for
 * this table, disable the balancing algorithm.
 */
tab->at_dobalance =
!(avopts && (avopts->vacuum_cost_limit > 0 ||
 avopts->vacuum_cost_delay > 0));

The initial values of both avopts->vacuum_cost_limit and
avopts->vacuum_cost_delay are -1. I think we should use ">= 0" instead
of "> 0". Otherwise, we include an autovacuum worker working on a
table whose autovacuum_vacuum_cost_delay is 0 in the balancing
algorithm. This behavior probably also exists on back branches,
but I haven't checked yet.


Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


fix.patch
Description: Binary data


use_message_level_is_interesting.patch
Description: Binary data


Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/7/23 7:56 AM, Andres Freund wrote:

Hi,

On 2023-04-07 07:02:04 +0200, Drouvot, Bertrand wrote:

Done in V63 attached and did change the associated comment a bit.


Can you send your changes incrementally, relative to V62? I'm polishing them
right now, and that'd make it a lot easier to apply your changes on top.



Sure, please find them enclosed.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index a4f9a3c972..5b6d19d379 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -6,7 +6,7 @@ use warnings;
 
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 67;
+use Test::More;
 
 my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, $ret, 
$handle, $slot);
 
@@ -112,8 +112,7 @@ sub check_slots_dropped
check_pg_recvlogical_stderr($slot_user_handle, "conflict with 
recovery");
 }
 
-# Check if all the slots on standby are dropped. These include the 'activeslot'
-# that was acquired by make_slot_active(), and the non-active 'inactiveslot'.
+# Change hot_standby_feedback and check xmin and catalog_xmin values.
 sub change_hot_standby_feedback_and_wait_for_xmins
 {
my ($hsf, $invalidated) = @_;
@@ -560,7 +559,7 @@ ok( find_in_log(
   'activeslot slot invalidation is logged due to wal_level');
 
 # Verify that pg_stat_database_conflicts.confl_active_logicalslot has been 
updated
-# we now expect 3 conflicts reported as the counter persist across reloads
+# we now expect 4 conflicts reported as the counter persist across reloads
 ok( $node_standby->poll_query_until(
'postgres',
"select (confl_active_logicalslot = 4) from pg_stat_database_conflicts 
where datname = 'testdb'", 't'),
@@ -703,3 +702,5 @@ ok( pump_until(
 chomp($cascading_stdout);
 is($cascading_stdout, $expected,
 'got same expected output from pg_recvlogical decoding session on 
cascading standby');
+
+done_testing();
diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index cc4f7b5302..e6427c54c5 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1945,7 +1945,7 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord 
*record, TimeLineID *repl
 * Indeed, logical walsenders on standby can't decode and send data 
until
 * it's been applied.
 *
-* Physical walsenders don't need to be wakon up during replay unless
+* Physical walsenders don't need to be woken up during replay unless
 * cascading replication is allowed and time line change occured (so 
that
 * they can notice that they are on a new time line).
 *
@@ -1953,9 +1953,11 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord 
*record, TimeLineID *repl
 *
 *  - physical walsenders in case of new time line and cascade
 *  replication is allowed.
-*  - always true for logical walsenders.
+*  - logical walsenders in case cascade replication is allowed (could 
not
+*  be created otherwise).
 */
-   WalSndWakeup(switchedTLI && AllowCascadeReplication(), true);
+   if (AllowCascadeReplication())
+   WalSndWakeup(switchedTLI, true);
 
/*
 * If rm_redo called XLogRequestWalReceiverReply, then we wake up the


Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2023-04-06 Thread Michael Paquier
On Fri, Apr 07, 2023 at 08:59:22AM +0900, Michael Paquier wrote:
> Okay, cool!

Done this one with 8fcb32d.
--
Michael




Re: Minimal logical decoding on standbys

2023-04-06 Thread Andres Freund
Hi,

On 2023-04-07 07:02:04 +0200, Drouvot, Bertrand wrote:
> Done in V63 attached and did change the associated comment a bit.

Can you send your changes incrementally, relative to V62? I'm polishing them
right now, and that'd make it a lot easier to apply your changes on top.

Greetings,

Andres Freund




Re: CREATE SUBSCRIPTION -- add missing tab-completes

2023-04-06 Thread Amit Kapila
On Wed, Apr 5, 2023 at 5:58 AM Peter Smith  wrote:
>
> There are some recent commits that added new options for CREATE SUBSCRIPTION
>
...
> PSA patches to add those tab completions.
>

LGTM, so pushed. BTW, while looking at this, I noticed that the newly
added options "password_required" and "run_as_owner" have their
datatype incorrectly documented as string. It should be boolean. I
also think "password_required" belongs in the first section of the
docs, which says: "The following parameters control what happens
during subscription creation".

The attached patch makes those changes. What do you think?

-- 
With Regards,
Amit Kapila.


fix_doc_create_sub_1.patch
Description: Binary data


Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/7/23 5:47 AM, Amit Kapila wrote:

On Thu, Apr 6, 2023 at 7:50 PM Drouvot, Bertrand
 wrote:


Thanks! Will update 0005.



I noticed a few typos in the latest patches.

0004
1.
+ * Physical walsenders don't need to be wakon up during replay unless

Typo.


Thanks! Fixed in V63 just posted up-thread.



0005
2.
+# Check if all the slots on standby are dropped. These include the 'activeslot'
+# that was acquired by make_slot_active(), and the non-active 'inactiveslot'.
+sub check_slots_dropped
+{
+ my ($slot_user_handle) = @_;
+
+ is($node_standby->slot('inactiveslot')->{'slot_type'}, '',
'inactiveslot on standby dropped');
+ is($node_standby->slot('activeslot')->{'slot_type'}, '', 'activeslot
on standby dropped');
+
+ check_pg_recvlogical_stderr($slot_user_handle, "conflict with recovery");
+}
+
+# Check if all the slots on standby are dropped. These include the 'activeslot'
+# that was acquired by make_slot_active(), and the non-active 'inactiveslot'.
+sub change_hot_standby_feedback_and_wait_for_xmins
+{
+ my ($hsf, $invalidated) = @_;
+
+ $node_standby->append_conf('postgresql.conf',qq[
+ hot_standby_feedback = $hsf
+ ]);
+
+ $node_standby->reload;
+
+ if ($hsf && $invalidated)
+ {
...

The comment above change_hot_standby_feedback_and_wait_for_xmins seems
to be wrong. It seems to be copied from the previous test.



Good catch! Fixed in V63.



3.
+# Verify that pg_stat_database_conflicts.confl_active_logicalslot has
been updated
+# we now expect 3 conflicts reported as the counter persist across reloads
+ok( $node_standby->poll_query_until(
+ 'postgres',
+ "select (confl_active_logicalslot = 4) from
pg_stat_database_conflicts where datname = 'testdb'", 't'),
+ 'confl_active_logicalslot updated') or die "Timed out waiting
confl_active_logicalslot to be updated";

The comment incorrectly mentions 3 conflicts whereas the query expects 4.



Good catch, fixed in v63.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 4:20 PM, Drouvot, Bertrand wrote:

Hi,

On 4/6/23 3:39 PM, Amit Kapila wrote:

On Thu, Apr 6, 2023 at 6:32 PM Drouvot, Bertrand
 wrote:



I don't think it could be possible to create logical walsenders on a standby if
AllowCascadeReplication() is not true, or am I missing something?



Right, so why even traverse the walsenders in that case? The code I was
imagining is like:
if (AllowCascadeReplication())
 WalSndWakeup(switchedTLI, true);

Do you see any problem with this change?


Not at all, it looks good to me.





Done in V63 attached and did change the associated comment a bit.



Few more minor comments on 0005
=
0005
1.
+   
+    Take a snapshot of running transactions and write this to WAL without
+    having to wait bgwriter or checkpointer to log one.

/wait bgwriter/wait for bgwriter

2.
+use Test::More tests => 67;

We no longer use the number of tests. Please refer to other similar tests.



Thanks! Will update 0005.



Done in V63.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From b5ff35d8ace1e429c6a15d53203b00304a3ff1f4 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 14:08:11 +
Subject: [PATCH v63 6/6] Doc changes describing details about logical
 decoding.

Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/logicaldecoding.sgml | 27 +++
 1 file changed, 27 insertions(+)
 100.0% doc/src/sgml/

diff --git a/doc/src/sgml/logicaldecoding.sgml 
b/doc/src/sgml/logicaldecoding.sgml
index 4e912b4bd4..8651024b8a 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -316,6 +316,33 @@ postgres=# select * from 
pg_logical_slot_get_changes('regression_slot', NULL, NU
  may consume changes from a slot at any given time.
 
 
+
+ A logical replication slot can also be created on a hot standby. To 
prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot between the 
primary
+ and the standby. Otherwise, hot_standby_feedback will work, but only 
while the
+ connection is alive (for example a node restart would break it). Then, the
+ primary may delete system catalog rows that could be needed by the logical
+ decoding on the standby (as it does not know about the catalog_xmin on the
+ standby). Existing logical slots on standby also get invalidated if 
wal_level
+ on primary is reduced to less than 'logical'. This is done as soon as the
+ standby detects such a change in the WAL stream. It means, that for 
walsenders
+ that are lagging (if any), some WAL records up to the wal_level parameter 
change
+ on the primary won't be decoded.
+
+
+
+ For a logical slot to be created, it builds a historic snapshot, for which
+ information of all the currently running transactions is essential. On
+ primary, this information is available, but on standby, this information
+ has to be obtained from primary. So, slot creation may wait for some
+ activity to happen on the primary. If the primary is idle, creating a
+ logical slot on standby may take a noticeable time. One option to speed it
+ is to call the pg_log_standby_snapshot on the 
primary.
+
+
 
  
   Replication slots persist across crashes and know nothing about the state
-- 
2.34.1

From ee35ea655de6d3178a1ec1b7b70345e2f4adccb5 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 7 Feb 2023 09:04:12 +
Subject: [PATCH v63 5/6] New TAP test for logical decoding on standby.

In addition to the new TAP test, this commit introduces a new 
pg_log_standby_snapshot()
function.

The idea is to be able to take a snapshot of running transactions and write this
to WAL without requesting a (costly) checkpoint.

Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes 
Mello
---
 doc/src/sgml/func.sgml|  15 +
 src/backend/access/transam/xlogfuncs.c|  32 +
 src/backend/catalog/system_functions.sql  |   2 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  37 +
 src/test/recovery/meson.build |   1 +
 .../t/035_standby_logical_decoding.pl | 706 ++
 7 files changed, 796 insertions(+)
   3.1% src/backend/
   4.0% src/test/perl/PostgreSQL/Test/
  89.6% src/test/recovery/t/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index dc44a74eb2..9253cd1c18 100644
--- a/doc/src/sgm

Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/7/23 4:18 AM, Andres Freund wrote:

Hi,

TBH, I don't like the state of 0001 much. I'm working on polishing it now.



Thanks Andres!


A lot of the new functions in slot.h don't seem right to me:
- ObsoleteSlotIsInvalid() - isn't an obsolete slot by definition invalid?


bad naming, agree.


- Why does ObsoleteSlotIsInvalid() sometimes check invalidated_at and sometimes
   not?


because part of the existing code was doing so (checking if s->data.restart_lsn
is valid with/without checking if data.invalidated_at is valid) and I thought
it was better not to change it.


- TransactionIdIsValidPrecedesOrEquals() shouldn't be defined in slot.h -
   also, it's not actually clear what semantics it's trying to have.


Oh right, my bad for the location.


- there's no commonality in naming between the functions used to test if a
   slot needs to be invalidated (SlotIsFreshEnough() vs
   LogicalSlotIsNotConflicting()).


Agree, my bad.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

On 4/7/23 3:59 AM, Amit Kapila wrote:

On Fri, Apr 7, 2023 at 6:55 AM Andres Freund  wrote:


On 2023-04-06 12:10:57 +0530, Amit Kapila wrote:

After this, I think for backends that have active slots, it would
simply cancel the current query. Will that be sufficient? Because we
want the backend process should exit and release the slot so that the
startup process can mark it invalid.


We don't need them to exit, we just need them to release the slot. Which does
happen when the query is cancelled. Imagine if that weren't the case - if a
cancellation of pg_logical_slot_* wouldn't release the slot, we couldn't call
it again before disconnecting. I also did verify that indeed the slot is
released upon a cancellation.



makes sense. Thanks for the clarification!



+1, thanks Andres!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Thu, Apr 6, 2023 at 7:50 PM Drouvot, Bertrand
 wrote:
>
> Thanks! Will update 0005.
>

I noticed a few typos in the latest patches.

0004
1.
+ * Physical walsenders don't need to be wakon up during replay unless

Typo.

0005
2.
+# Check if all the slots on standby are dropped. These include the 'activeslot'
+# that was acquired by make_slot_active(), and the non-active 'inactiveslot'.
+sub check_slots_dropped
+{
+ my ($slot_user_handle) = @_;
+
+ is($node_standby->slot('inactiveslot')->{'slot_type'}, '',
'inactiveslot on standby dropped');
+ is($node_standby->slot('activeslot')->{'slot_type'}, '', 'activeslot
on standby dropped');
+
+ check_pg_recvlogical_stderr($slot_user_handle, "conflict with recovery");
+}
+
+# Check if all the slots on standby are dropped. These include the 'activeslot'
+# that was acquired by make_slot_active(), and the non-active 'inactiveslot'.
+sub change_hot_standby_feedback_and_wait_for_xmins
+{
+ my ($hsf, $invalidated) = @_;
+
+ $node_standby->append_conf('postgresql.conf',qq[
+ hot_standby_feedback = $hsf
+ ]);
+
+ $node_standby->reload;
+
+ if ($hsf && $invalidated)
+ {
...

The comment above change_hot_standby_feedback_and_wait_for_xmins seems
to be wrong. It seems to be copied from the previous test.

3.
+# Verify that pg_stat_database_conflicts.confl_active_logicalslot has
been updated
+# we now expect 3 conflicts reported as the counter persist across reloads
+ok( $node_standby->poll_query_until(
+ 'postgres',
+ "select (confl_active_logicalslot = 4) from
pg_stat_database_conflicts where datname = 'testdb'", 't'),
+ 'confl_active_logicalslot updated') or die "Timed out waiting
confl_active_logicalslot to be updated";

The comment incorrectly mentions 3 conflicts whereas the query expects 4.

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Fri, Apr 7, 2023 at 8:43 AM Andres Freund  wrote:
>
> On 2023-04-06 12:10:57 +0530, Amit Kapila wrote:
> > Also, it seems you have removed the checks related to
> > slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only
> > used for logical slots? If so, do you think an Assert would make
> > sense?
>
> The asserts that have been added aren't correct. There's no guarantee that the
> receiver of the procsignal still holds the same slot or any slot at all.
>

For backends that don't hold any slot, can we skip setting
RecoveryConflictPending and the other flags?

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-06 Thread Andres Freund
Hi,

On 2023-04-06 12:10:57 +0530, Amit Kapila wrote:
> Also, it seems you have removed the checks related to
> slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only
> used for logical slots? If so, do you think an Assert would make
> sense?

The asserts that have been added aren't correct. There's no guarantee that the
receiver of the procsignal still holds the same slot or any slot at all.

Greetings,

Andres Freund




Re: Schema variables - new implementation for Postgres 15

2023-04-06 Thread Kirk Wolak
On Thu, Mar 30, 2023 at 4:06 AM Pavel Stehule 
wrote:

> Hi
>
> ne 26. 3. 2023 v 19:44 odesílatel Dmitry Dolgov <9erthali...@gmail.com>
> napsal:
>
>> > On Fri, Mar 24, 2023 at 08:04:08AM +0100, Pavel Stehule wrote:
>> > čt 23. 3. 2023 v 19:54 odesílatel Pavel Stehule <
>> pavel.steh...@gmail.com>
>> > napsal:
>> >
>> > > čt 23. 3. 2023 v 16:33 odesílatel Peter Eisentraut <
>> > > peter.eisentr...@enterprisedb.com> napsal:
>> > >
>> > >> The other issue is that by its nature this patch adds a lot of code
>> in a
>> > >> lot of places.  Large patches are more likely to be successful if
>> they
>> ...
>> I agree, the patch scale is a bit overwhelming. It's worth noting that
>> due to the nature of this change certain heavy lifting has to be done in
>> any case, plus I've got an impression that some part of the patch are
>> quite solid (although I haven't reviewed everything, did anyone achieve
>> that milestone?). But still, it would be of great help to simplify the
>> current implementation, and I'm afraid the only way of doing this is to
>> make trades-off about functionality vs change size & complexity.
>>
>
> There is not much room for reduction; moreover, there is sometimes code
> reuse between features.
>
> I can reduce temporary session variables, but the same AtSubXact routines
> are used by the memory purging routines, and only if you drop all dependent
> features can you get an interesting number of reduced lines. I can
> imagine a very reduced feature set like
>
> 1) no temporary variables, no reset at transaction end
> 2) without default expressions - default is null
> 3) direct memory cleaning on drop (without the possibility of a saved value
> after a reverted drop), or cleaning at session end always
>
> Note - @1 and @3 share code
>
Please don't remove #2. With default values, I was eyeballing these as
pseudo-constants. I find I have a DRY (Don't Repeat Yourself) issue in our
current code base (PL/pgSQL) because of the lack of shared constants
throughout the application layer. We literally created a CONST schema with
SQL functions that return a set value. It's kludgy, but clear enough. (We
have approximately 50 of these.)

Regards, Kirk


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-06 Thread Julien Rouhaud
Hi,

On Tue, Apr 04, 2023 at 07:00:01AM +, Hayato Kuroda (Fujitsu) wrote:
> Dear hackers,
> (CC: Amit and Julien)

(thanks for the Cc)

> This is a fork thread of Julien's thread, which allows upgrading subscribers
> without losing changes [1].
>
> I briefly implemented a prototype for allowing the publisher node to be
> upgraded. IIUC the key missing piece was that replication slots used for
> logical replication could not be copied to the new node by the pg_upgrade
> command, so this patch allows that. This feature can be used when
> '--include-replication-slot' is specified. Also, I added a small test for
> the typical case. It may be helpful to understand.
>
> pg_upgrade internally executes pg_dump for dumping database objects from the
> old node. This feature follows that and adds a new option '--slot-only' to
> the pg_dump command. When specified, it extracts the needed info from the
> old node and generates an SQL file that executes
> pg_create_logical_replication_slot().
>
> The notable difference from the pre-existing behavior is that slots are
> restored at a different time. Currently pg_upgrade works with the following
> steps:
>
> ...
> 1. dump schema from the old node
> 2. run pg_resetwal several times on the new node
> 3. restore schema to the new node
> 4. run pg_resetwal again on the new node
> ...
>
> The problem is that if we create replication slots at step 3, the
> restart_lsn and confirmed_flush_lsn are set to the current_wal_insert_lsn at
> that time, whereas pg_resetwal discards the WAL files. Such slots cannot
> extract changes. To handle the issue, the restoring is separated into two
> phases. In the first phase, restoring is done at step 3, except for
> replication slots. In the second phase, replication slots are restored at
> step 5, after running pg_resetwal.
>
> Before upgrading a publisher node, all the changes generated on the
> publisher must be sent and applied on the subscriber. This is because the
> restart_lsn and confirmed_flush_lsn of the copied replication slots are the
> same as the current_wal_insert_lsn. The new node resets the information
> about which WALs have actually been applied on the subscriber and restarts.
> Basically this is not problematic because before shutting down the
> publisher, its walsender processes confirm that all data is replicated. See
> WalSndDone() and related code.

As I mentioned in my original thread, I'm not very familiar with that code, but
I'm a bit worried about "all the changes generated on the publisher must be
sent and applied".  Is that a hard requirement for the feature to work
reliably?  If yes, how does this work if some subscriber node isn't connected
when the publisher node is stopped?  I guess you could add a check in
pg_upgrade to make sure that all logical slots are indeed caught up, and fail
if that's not the case, rather than assuming that a clean shutdown implies it.
It would be good to cover that in the TAP test, and also to cover some corner
cases, like checking that any new row added on the publisher node after
pg_upgrade but before the subscriber is
reconnected is also replicated as expected.
>
> Currently physical slots are ignored because this is out-of-scope for me.
> I have not done any analysis of it.

Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?




Re: pg_upgrade and logical replication

2023-04-06 Thread Julien Rouhaud
Hi,

On Thu, Apr 06, 2023 at 04:49:59AM +, Hayato Kuroda (Fujitsu) wrote:
> Dear Julien,
>
> > I'm attaching a v3 to fix a recent conflict with pg_dump due to 
> > a563c24c9574b7
> > (Allow pg_dump to include/exclude child tables automatically).
>
> Thank you for making the patch.
> FYI - it could not be applied due to recent commits. SUBOPT_* and attributes
> in SubscriptionInfo were added recently.

Thanks a lot for warning me!

While rebasing and testing the patch, I realized that I forgot to git-add a
chunk, so I went ahead and added some minimal TAP tests to make sure that the
feature and various checks work as expected. They also demonstrate that you can
safely resume, after running pg_upgrade, a logical replication setup where only
some of the tables are added to a publication, and where new rows and new
tables are added to the publication while pg_upgrade is running (for the new
table you obviously need to make sure that the same relation exists on the
subscriber side, but that's orthogonal to this patch).

While doing so, I also realized that the subscription's underlying replication
origin remote LSN is only set after some activity is seen *after* the initial
sync, so I also added a new check in pg_upgrade to make sure that all remote
origins tied to a subscription have a valid remote_lsn when the new option is
used.  Documentation is updated to cover that, same for the TAP tests.

v4 attached.
>From a5823a0ea289860367e0ebfb76c7dad7be5337e7 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Wed, 22 Feb 2023 09:19:32 +0800
Subject: [PATCH v4] Optionally preserve the full subscription's state during
 pg_upgrade

Previously, only the subscription metadata information was preserved.  Without
the list of relations and their state it's impossible to re-enable the
subscriptions without missing some records as the list of relations can only be
refreshed after enabling the subscription (and therefore starting the apply
worker).  Even if we added a way to refresh the subscription while enabling a
publication, we still wouldn't know which relations are new on the publication
side, and therefore should be fully synced, and which shouldn't.

Similarly, the subscription's replication origins are needed to ensure
that we don't replicate anything twice.

To fix this problem, this patch teaches pg_dump in binary upgrade mode to emit
additional commands to be able to restore the content of pg_subscription_rel,
and an additional LSN parameter in the subscription creation to restore the
underlying replication origin remote LSN.  The LSN parameter is only accepted
in CREATE SUBSCRIPTION in binary upgrade mode.

The new ALTER SUBSCRIPTION subcommand, usable only during binary upgrade, has
the following syntax:

ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 'X/Y'])

The relation is identified by its oid, as it's preserved during pg_upgrade.
The lsn is optional, and defaults to NULL / InvalidXLogRecPtr if not provided.
Explicitly passing InvalidXLogRecPtr (0/0) is however not allowed.

This mode is optional and not enabled by default.  A new
--preserve-subscription-state option is added to pg_upgrade to use it.  For
now, pg_upgrade will check that all the subscriptions have a valid replication
origin remote_lsn, and that all underlying relations are in 'r' (ready) state,
and will error out if that's not the case, logging the reason for the failure.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
---
 doc/src/sgml/ref/pgupgrade.sgml  |  19 +++
 src/backend/commands/subscriptioncmds.c  |  67 +++-
 src/backend/parser/gram.y|  11 ++
 src/bin/pg_dump/pg_backup.h  |   2 +
 src/bin/pg_dump/pg_dump.c| 114 -
 src/bin/pg_dump/pg_dump.h|  13 ++
 src/bin/pg_upgrade/check.c   |  82 +
 src/bin/pg_upgrade/dump.c|   3 +-
 src/bin/pg_upgrade/meson.build   |   1 +
 src/bin/pg_upgrade/option.c  |   7 +
 src/bin/pg_upgrade/pg_upgrade.h  |   1 +
 src/bin/pg_upgrade/t/003_subscription.pl | 204 +++
 src/include/nodes/parsenodes.h   |   3 +-
 13 files changed, 522 insertions(+), 5 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/003_subscription.pl

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..b23c536954 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,25 @@ PostgreSQL documentation
   
  
 
+ 
+  --preserve-subscription-state
+  
+   
+Fully preserve the logical subscription state if any.  That includes
+the underlying replication origin with their remote LSN and the list of
+relations in each subscription so that replication can be simply
+resumed if the subscriptions are reactivated.
+If that option isn't used, it is up to th

Re: Minimal logical decoding on standbys

2023-04-06 Thread Andres Freund
Hi,

TBH, I don't like the state of 0001 much. I'm working on polishing it now.

A lot of the new functions in slot.h don't seem right to me:
- ObsoleteSlotIsInvalid() - isn't an obsolete slot by definition invalid?
- Why does ObsoleteSlotIsInvalid() sometimes check invalidated_at and sometimes
  not?
- DoNotInvalidateSlot() seems too generic a name for a function exposed to the
  outside of slot.c
- TransactionIdIsValidPrecedesOrEquals() shouldn't be defined in slot.h -
  also, it's not actually clear what semantics it's trying to have.
- there's no commonality in naming between the functions used to test if a
  slot needs to be invalidated (SlotIsFreshEnough() vs
  LogicalSlotIsNotConflicting()).

Leaving naming etc aside, most of these don't seem to belong in slot.h, but
should just be in slot.c - there aren't conceivable users from outside slot.c.


Independent of this patch: What's the point of invalidated_at? The only reads
of it are done like
invalidated = (!XLogRecPtrIsInvalid(s->data.invalidated_at) &&
               XLogRecPtrIsInvalid(s->data.restart_lsn));
i.e. the actual LSN is not used.

ISTM that we should just have it be a boolean, and that it should be used by
the different kinds of slot invalidation.

Greetings,

Andres Freund




Re: zstd compression for pg_dump

2023-04-06 Thread Tom Lane
Justin Pryzby  writes:
> On Thu, Apr 06, 2023 at 05:34:30PM +0200, Tomas Vondra wrote:
>> I think that's all for PG16 in this patch series.  If there's more we want to
>> do, it'll have to wait for PG17 - 

> Yes

Shouldn't the CF entry be closed as committed?  It's certainly
making the cfbot unhappy because the patch-of-record doesn't apply.

regards, tom lane




Re: Partial aggregates pushdown

2023-04-06 Thread Bruce Momjian
On Fri, Mar 31, 2023 at 05:49:21AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr.Momjian
> 
> > First, am I correct?
> Yes, you are correct. This patch uses new special aggregate functions for
> partial aggregate (which we call partialaggfunc).

First, my apologies for not addressing this sooner.  I was so focused on
my own tasks that I didn't realize this very important patch was not
getting attention.  I will try my best to get it into PG 17.

What amazes me is that you didn't need to create _any_ actual aggregate
functions.  Rather, you just needed to hook existing functions into the
aggregate tables for partial FDW execution.

> > Second, how far away is this from being committable
> > and/or what work needs to be done to get it committable, either for PG 16 
> > or 17?
> I believe there are three: it is not clear whether 1. and 2. are necessary;
> 3. is clearly necessary.
> I would like to hear the opinions of the development community on whether
> 1. and 2. need to be addressed.
> 
> 1. Making partialaggfunc user-defined function
> In v17, I make partialaggfuncs as built-in functions.
> Because of this approach, v17 changes specification of BKI file and 
> pg_aggregate.
> For now, partialaggfuncs are needed by only postgres_fdw which is just an 
> extension of PostgreSQL.
> In the future, when revising the specifications for BKI files and 
> pg_aggregate when modifying existing PostgreSQL functions,
> It is necessary to align them with this patch's changes.
> I am concerned that this may be undesirable.
> So I am thinking that v17 should be modified to make partialaggfunc a
> user-defined function.

I think we have three possible cases for aggregates pushdown to FDWs:

1)  Postgres built-in aggregate functions
2)  Postgres user-defined & extension aggregate functions
3)  aggregate functions calls to non-PG FDWs

Your patch handles #1 by checking that the FDW Postgres version is the
same as the calling Postgres version.  However, it doesn't check for
extension versions, and frankly, I don't see how we could implement that
cleanly without significant complexity.

I suggest we remove the version check requirement --- instead just
document that the FDW Postgres version should be the same or newer than
the calling Postgres server --- that way, we can assume that whatever is
in the system catalogs of the caller is in the receiving side.  We
should add a GUC to turn off this optimization for cases where the FDW
Postgres version is older than the caller.  This handles case 1-2.

For case 3, I don't even know how much pushdown those do of _any_
aggregates to non-PG servers, let alone parallel FDW ones.  Does anyone
know the details?

> 2. Automation of creating definition of partialaggfuncs
> In development of v17, I manually create definition of partialaggfuncs for 
> avg, min, max, sum, count.
> I am concerned that this may be undesirable.
> So I am thinking that v17 should be modified to automate creating the
> definitions of partialaggfuncs
> for all built-in aggregate functions.

Are there any other built-in functions that need this?  I think we can
just provide documentation for extensions on how to do this.

> 3. Documentation
> I need add explanation of partialaggfunc to documents on postgres_fdw and 
> other places.

I can help with that once we decide on the above.

I think 'partialaggfn' should be named 'aggpartialfn' to match other
columns in pg_aggregate.

I am confused by these changes to pg_aggregate:

+{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum',
+  aggfinalfn => 'int8_avg_serialize', aggcombinefn => 'int8_avg_combine',
+  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
+  aggtranstype => 'internal', aggtransspace => '48' },

...

+{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum',
+  aggfinalfn => 'numeric_avg_serialize', aggcombinefn => 'numeric_avg_combine',
+  aggserialfn => 'numeric_avg_serialize',
+  aggdeserialfn => 'numeric_avg_deserialize',
+  aggtranstype => 'internal', aggtransspace => '128' },

Why are these marked as 'sum' but use 'avg' functions?

It would be good to explain exactly how this is different from background
worker parallelism.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 09d6dd60dd..fe219b8d2f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -202,7 +202,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel,
 			int *relno, int *colno);
 static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel,
 		  int *relno, int *colno);
-
+static bool partial_agg_ok(Aggref* agg, PgFdwRelationInfo* fpinfo);
 
 /*
  * Examine each qual clause in input_conds, and classify 

Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Fri, Apr 7, 2023 at 6:55 AM Andres Freund  wrote:
>
> On 2023-04-06 12:10:57 +0530, Amit Kapila wrote:
> > After this, I think for backends that have active slots, it would
> > simply cancel the current query. Will that be sufficient? Because we
> > want the backend process should exit and release the slot so that the
> > startup process can mark it invalid.
>
> We don't need them to exit, we just need them to release the slot. Which does
> happen when the query is cancelled. Imagine if that weren't the case - if a
> cancellation of pg_logical_slot_* wouldn't release the slot, we couldn't call
> it again before disconnecting. I also did verify that indeed the slot is
> released upon a cancellation.
>

Makes sense. Thanks for the clarification!

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-06 Thread Andres Freund
Hi,

On 2023-04-06 12:10:57 +0530, Amit Kapila wrote:
> After this, I think for backends that have active slots, it would
> simply cancel the current query. Will that be sufficient? Because we
> want the backend process should exit and release the slot so that the
> startup process can mark it invalid.

We don't need them to exit, we just need them to release the slot. Which does
happen when the query is cancelled. Imagine if that weren't the case - if a
cancellation of pg_logical_slot_* wouldn't release the slot, we couldn't call
it again before disconnecting. I also did verify that indeed the slot is
released upon a cancellation.

Greetings,

Andres Freund




Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-04-06 Thread Andres Freund
Hi,

On 2023-04-05 18:46:16 -0700, Andres Freund wrote:
> On 2023-04-04 17:39:45 -0700, Andres Freund wrote:
> > After that I'm planning to wait for a buildfarm cycle, and push the changes
> > necessary to use bulk extension in hio.c (the main win).
> 
> Working on that. Might end up being tomorrow.

It did. So far no complaints from the buildfarm. Although I pushed the last
piece just now...

Besides editing the commit message, not a lot of changes between what I posted
last and what I pushed. A few typos and awkward sentences, code formatting,
etc. I did change the API of RelationAddBlocks() to set *did_unlock = false,
if it didn't unlock (and thus removed setting it in the caller).


> > I might split the patch to use ExtendBufferedRelTo() into two, one for
> > vm_extend() and fsm_extend(), and one for xlogutils.c. The latter is more
> > complicated and has more of a complicated history (see [1]).
> 
> I've pushed the vm_extend() and fsm_extend() piece, and did split out the
> xlogutils.c case.

Which I pushed, just now. I did perform manual testing with creating
disconnected segments on the standby, and checking that everything behaves
well in that case.


I think it might be worth having a C test for some of the bufmgr.c API. Things
like testing that retrying a failed relation extension works the second time
round.

Greetings,

Andres Freund




Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

2023-04-06 Thread Andres Freund
Hi,

On 2023-04-03 12:00:30 -0700, Andres Freund wrote:
> It's not great, I agree. I tried to make it easier to read in this version by
> a) changing GetVisibilityMapPins() as I proposed
> b) added a new variable "recheckVmPins", that gets set in
>if (unlockedTargetBuffer)
>and
>if (otherBuffer != InvalidBuffer)
> c) reformulated comments

I pushed this version a couple hours ago, after a bit more polishing.

Greetings,

Andres Freund




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-06 Thread David Rowley
On Fri, 7 Apr 2023 at 09:44, Melanie Plageman  wrote:
> Otherwise, LGTM.

Thanks for looking.  I've also taken Justin's comments about the
README into account and fixed that part.

I've pushed the patch after a little more adjusting.  I added some
text to the docs mentioning that larger VACUUM_BUFFER_LIMITs can speed up
vacuum, and also a reason why one might not want to go nuts with it.

I've also just now pushed the vacuumdb patch too. I ended up adjusting
some of the ERROR messages in the main patch after the following not
so nice user experience:

$ vacuumdb --buffer-usage-limit=1TB --analyze postgres
vacuumdb: vacuuming database "postgres"
SQL: VACUUM (SKIP_DATABASE_STATS, ANALYZE, BUFFER_USAGE_LIMIT '1TB')
vacuumdb: error: processing of database "postgres" failed: ERROR:
buffer_usage_limit option must be 0 or between 128 KB and 16777216 KB

$ vacuumdb --buffer-usage-limit=128KB --analyze postgres
vacuumdb: vacuuming database "postgres"
SQL: VACUUM (SKIP_DATABASE_STATS, ANALYZE, BUFFER_USAGE_LIMIT '128KB')
vacuumdb: error: processing of database "postgres" failed: ERROR:
value: "128KB": is invalid for buffer_usage_limit
HINT:  Valid units for this parameter are "B", "kB", "MB", "GB", and "TB".

David




Re: what should install-world do when docs are not available?

2023-04-06 Thread Andres Freund
Hi,

On 2023-04-06 14:52:51 -0400, Andrew Dunstan wrote:
> On 2023-04-05 We 00:57, Andres Freund wrote:
> > Hi,
> > 
> > On 2023-04-04 21:46:11 -0700, Andres Freund wrote:
> > > Pushed the changes.
> > This failed on crake - afaict because the meson buildfarm code disables all
> > features. Because 'docs' is a feature now, the BF code building
> > doc/src/sgml/html fails.
> 
> 
> I changed it so that if the config mandates building docs we add
> -Ddocs=enabled and if it mandates building a pdf we also add
> -Ddocs_pdf=enabled. See

Sounds good, thanks!


> 
> 
> It's a slight pity that you have to pick this at setup time,

FWIW, you can change options with meson configure -Ddocs=enabled (or whatnot),
in an existing buildtree. It'll rerun configure (with caching). For options
like docs, it won't lead to rebuilding binaries.


> but I guess the upside is that we don't spend time looking for stuff we're
> not actually going to use.

And that you'll learn that tools are missing before you get through most of
the build...

Greetings,

Andres Freund




Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2023-04-06 Thread Michael Paquier
On Fri, Apr 07, 2023 at 01:50:00AM +0200, Matthias van de Meent wrote:
> Yes, that was a bad oversight, which would've shown up in tests on a system
> with an endianness that my computer doesn't have...

I don't think that we have many bigendian animals in the buildfarm,
either.. 

> That looks fine to me. Thanks for picking this up and fixing the issue.

Okay, cool!
--
Michael


signature.asc
Description: PGP signature


Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2023-04-06 Thread Matthias van de Meent
On Fri, 7 Apr 2023, 01:35 Michael Paquier,  wrote:

> On Fri, Apr 07, 2023 at 08:08:34AM +0900, Michael Paquier wrote:
> > So bumping mainrdata_len to uint64 is actually not entirely in line
> > with this code.  Well, it will work because we'd still fail a couple
> > of lines down, but perhaps its readability should be improved so as
> > we have an extra check in this code path to make sure that
> > mainrdata_len is not higher than PG_UINT32_MAX, then use an
> > intermediate casted variable before saving the length in the record
> > data to make clear that the type of the main static length in
> > xloginsert.c is not the same as what a record has?  The v10 I sent
> > previously blocked this possibility, but not v11.
>

Yes, that was a bad oversight, which would've shown up in tests on a system
with an endianness that my computer doesn't have...


> So, I was thinking about something like the attached tweaking this
> point, the error details a bit, applying an indentation and writing a
> commit message...  Matthias?
>

That looks fine to me. Thanks for picking this up and fixing the issue.



Kind regards,

Matthias van de Meent


Re: pg_recvlogical prints bogus error when interrupted

2023-04-06 Thread Cary Huang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Hello

The patch applies and tests fine. I like the way both the ready_to_exit and
time_to_abort variables are used to control the exit sequence. I think the
(void) casts in front of PQputCopyEnd() and PQflush() can be removed for
consistency, since they do not suppress any warnings and such casts are not
used elsewhere.

thank you
Cary

Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2023-04-06 Thread Michael Paquier
On Fri, Apr 07, 2023 at 08:08:34AM +0900, Michael Paquier wrote:
> So bumping mainrdata_len to uint64 is actually not entirely in line
> with this code.  Well, it will work because we'd still fail a couple
> of lines down, but perhaps its readability should be improved so as
> we have an extra check in this code path to make sure that
> mainrdata_len is not higher than PG_UINT32_MAX, then use an
> intermediate casted variable before saving the length in the record
> data to make clear that the type of the main static length in
> xloginsert.c is not the same as what a record has?  The v10 I sent
> previously blocked this possibility, but not v11.

So, I was thinking about something like the attached tweaking this
point, the error details a bit, applying an indentation and writing a
commit message...  Matthias?
--
Michael
From 10eff110176a4988295118b0f17a2a868fb9b96e Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 7 Apr 2023 08:19:24 +0900
Subject: [PATCH v12] Add more protections in WAL record APIs against overflows

This commit caps the size of an XLogRecord at 1020MiB, based
on a suggestion by Heikki Linnakangas.  This accounts for the overhead
needed by the xlogreader when allocating the memory it needs to read a
record in DecodeXLogRecordRequiredSpace().  An assertion based on that
is added to detect whether any additions to the XLogReader
facilities would cause an overflow.  If that's ever the case, the
upper bound would need to be adjusted.

Before this, it was possible for an external module to create WAL
records large enough to be assembled but still not replayable, causing
failures when replaying such WAL records on standbys.  One case
mentioned where this is possible is the in-core function
pg_logical_emit_message(), which allows emitting WAL records with an
arbitrary amount of data up to 2GB, much higher than the replay limit
of approximately 1GB (the limit of a palloc).

This commit is a follow-up of ffd1b6b that has added similar protections
for the block-level data.  Here, the checks are extended to the whole
record length.  All the error messages related to overflow checks are
improved to provide more context about what is happening.

Author: Matthias van de Meent
Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n...@mail.gmail.com
---
 src/include/access/xlogrecord.h | 11 +
 src/backend/access/transam/xloginsert.c | 60 ++---
 2 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index 0d576f7883..f355e08e1d 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -62,6 +62,17 @@ typedef struct XLogRecord
 #define XLR_INFO_MASK			0x0F
 #define XLR_RMGR_INFO_MASK		0xF0
 
+/*
+ * XLogReader needs to allocate all the data of a WAL record in a single
+ * chunk.  This means that a single XLogRecord cannot exceed MaxAllocSize
+ * in length if we ignore any allocation overhead of the XLogReader.
+ *
+ * To accommodate some overhead, this value allows for 4MB of allocation
+ * overhead, which should be plenty for what
+ * DecodeXLogRecordRequiredSpace() expects as extra.
+ */
+#define XLogRecordMaxSize	(1020 * 1024 * 1024)
+
 /*
  * If a WAL record modifies any relation files, in ways not covered by the
  * usual block references, this flag is set. This is not used for anything
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 008612e032..b71e690c5b 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -98,7 +98,7 @@ static int	max_registered_block_id = 0;	/* highest block_id + 1 currently
  */
 static XLogRecData *mainrdata_head;
 static XLogRecData *mainrdata_last = (XLogRecData *) &mainrdata_head;
-static uint32 mainrdata_len;	/* total # of bytes in chain */
+static uint64 mainrdata_len;	/* total # of bytes in chain */
 
 /* flags for the in-progress insertion */
 static uint8 curinsert_flags = 0;
@@ -355,7 +355,10 @@ XLogRegisterData(char *data, uint32 len)
 	Assert(begininsert_called);
 
 	if (num_rdatas >= max_rdatas)
-		elog(ERROR, "too much WAL data");
+		ereport(ERROR,
+(errmsg_internal("too much WAL data"),
+ errdetail_internal("%u out of %u data segments are already in use.",
+	num_rdatas, max_rdatas)));
 	rdata = &rdatas[num_rdatas++];
 
 	rdata->data = data;
@@ -405,9 +408,16 @@ XLogRegisterBufData(uint8 block_id, char *data, uint32 len)
 	 * regbuf->rdata_len does not grow beyond what
 	 * XLogRecordBlockHeader->data_length can hold.
 	 */
-	if (num_rdatas >= max_rdatas ||
-		regbuf->rdata_len + len > UINT16_MAX)
-		elog(ERROR, "too much WAL data");
+	if (num_rdatas >= max_rdatas)
+		ereport(ERROR,
+(errmsg_internal("too much WAL data"),
+ errdetail_internal("%u out of %u data segments are already in use.",
+	num_rdatas, max_rdatas)));
+	if (regbuf->rdata_len + le

Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2023-04-06 Thread Michael Paquier
On Thu, Apr 06, 2023 at 10:54:43AM +0900, Michael Paquier wrote:
> 0002 can also be done before 0001, so I'd like to get that part
> applied on HEAD before the feature freeze and close this thread.  If
> there are any objections, please feel free..

I was doing a pre-commit review of the patch, and double-checked the
uses of mainrdata_len.   And there is this part:
	/* followed by main data, if any */
	if (mainrdata_len > 0)
	{
		if (mainrdata_len > 255)
		{
			*(scratch++) = (char) XLR_BLOCK_ID_DATA_LONG;
			memcpy(scratch, &mainrdata_len, sizeof(uint32));
			scratch += sizeof(uint32);
		}
		else
		{
			*(scratch++) = (char) XLR_BLOCK_ID_DATA_SHORT;
			*(scratch++) = (uint8) mainrdata_len;
		}
		rdt_datas_last->next = mainrdata_head;
		rdt_datas_last = mainrdata_last;
		total_len += mainrdata_len;
	}
	rdt_datas_last->next = NULL;

So bumping mainrdata_len to uint64 is actually not entirely in line
with this code.  Well, it will work because we'd still fail a couple
of lines down, but perhaps its readability should be improved: we could
add an extra check in this code path to make sure that mainrdata_len is
not higher than PG_UINT32_MAX, then use an intermediate casted variable
before saving the length in the record data, to make clear that the type
of the main static length in xloginsert.c is not the same as what a
record has.  The v10 I sent previously blocked this possibility, but
v11 does not.
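A sketch of the readability fix being discussed, in Python for brevity (the real code is C in xloginsert.c, and the function name here is invented for illustration): keep the accumulator 64-bit, reject anything above PG_UINT32_MAX explicitly, and only then narrow to the width the record format actually stores.

```python
PG_UINT32_MAX = 2**32 - 1

# Block ID markers per xlogrecord.h: DATA_SHORT is 255, DATA_LONG is 254.
XLR_BLOCK_ID_DATA_SHORT = 255
XLR_BLOCK_ID_DATA_LONG = 254


def encode_main_data_header(mainrdata_len: int) -> bytes:
    """Encode the main-data length marker roughly as the record-assembly
    code does.  mainrdata_len is conceptually a uint64 accumulator; the
    record format only has room for a uint32 (long form) or uint8
    (short form) length."""
    if mainrdata_len > PG_UINT32_MAX:
        # Explicit guard before narrowing, instead of relying on a later
        # total-length check to catch the overflow.
        raise OverflowError("too much WAL data")
    main_data_len = mainrdata_len & PG_UINT32_MAX  # now provably fits in 32 bits
    if main_data_len > 255:
        # Long form: marker byte plus a 4-byte little-endian length.
        return bytes([XLR_BLOCK_ID_DATA_LONG]) + main_data_len.to_bytes(4, "little")
    # Short form: marker byte plus a single length byte.
    return bytes([XLR_BLOCK_ID_DATA_SHORT, main_data_len])
```

The intermediate `main_data_len` plays the role of the casted uint32 variable suggested above: after the guard, the narrowing is visibly lossless.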
--
Michael


signature.asc
Description: PGP signature


Re: Should vacuum process config file reload more often

2023-04-06 Thread Daniel Gustafsson
> On 7 Apr 2023, at 00:12, Melanie Plageman  wrote:
> 
> On Thu, Apr 6, 2023 at 5:45 PM Daniel Gustafsson  wrote:
>> 
>>> On 6 Apr 2023, at 23:06, Melanie Plageman  wrote:
>> 
>>> Autovacuum workers, at the end of VacuumUpdateCosts(), check if cost
>>> limit or cost delay have been changed. If they have, they assert that
>>> they don't already hold the AutovacuumLock, take it in shared mode, and
>>> do the logging.
>> 
>> Another idea would be to copy the values to local temp variables while 
>> holding
>> the lock, and release the lock before calling elog() to avoid holding the 
>> lock
>> over potential IO.
> 
> Good idea. I've done this in attached v19.
> Also I looked through the docs and everything still looks correct for
> balancing algo.

I had another read-through and test-through of this version, and have applied
it with some minor changes to comments and whitespace.  Thanks for the quick
turnaround times on reviews in this thread!

I opted for keeping the three individual commits, squashing them didn't seem
helpful enough to future commitlog readers and no other combination of the
three made more sense than what has been in the thread.
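The copy-then-log pattern applied here (snapshot shared values while holding the lock, release it, then log) is a general technique; a minimal Python sketch with illustrative names, not the committed C code:

```python
import threading

# Stand-ins for the AutovacuumLock and the shared cost parameters.
autovacuum_lock = threading.Lock()
shared = {"cost_delay": 2.0, "cost_limit": 200}


def log_cost_params(logger):
    # Copy shared values to locals while holding the lock...
    with autovacuum_lock:
        delay = shared["cost_delay"]
        limit = shared["cost_limit"]
    # ...and only log after releasing it, so any I/O performed by the
    # logging call never happens while the lock is held.
    logger(f"cost_delay={delay} cost_limit={limit}")
```

The lock is held only for the two cheap reads; the potentially slow logging call runs lock-free on the local copies.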

--
Daniel Gustafsson





Re: Using each rel as both outer and inner for JOIN_ANTI

2023-04-06 Thread Thomas Munro
On Thu, Apr 6, 2023 at 6:40 PM Richard Guo  wrote:
> Seems it wins if the parallel scan becomes part of a hash join in final
> plan.  I wonder if we have a way to know that in this early stage.

I haven't tried but I'm not sure off the top of my head how to make a
decision that early unless it's super coarse grained, like is there an
anti join in here somewhere...

Generally, this seems to be a consequence of our parallel query
planner design where parallel paths are generated separately and
compete on cost, as foretold by Hong and Stonebraker[1].  It's going
to be hard to prune those early without missing out on some good
plans, if you don't have a crystal ball.

I wondered for a while what horrible technical problems would come up
if we could defer creation of paths, so partial_pathlist is empty but
a new RelOptInfo flag says "you can call
finish_the_partial_pathlist_please(root, rel)) if you really want
one".  We could try to be sparing about calling it that so we don't
finish up creating them all.  That would at least move the
should-I-bother-to-make-this-path? logic close to the place with the
knowledge that it'd be useful, in this case the inner side of a
parallel hash join.  One problem is that you have to get a
parallel_workers number from somewhere to make a partial path.  The
hash join path code knows what its own parallel_workers number will be
(it inherits it from the outer path, though I can't immediately think
of any good reason why it shouldn't be Max(inner, outer)), but if we
were to feed that into a created-on-demand inner path that is then
cached, we'd have a horrible ordering dependency (some other join in
the query gets planned first with a different parallel_workers number
and it gets cached differently).  Yuck.  As you noted, 0 isn't a great
number either, but it leads to another thought...

I wondered if we could somehow use the complete (non-partial) path
directly in some cases here, if certain conditions are met.  Aside
from any other technical problems, you might ask "but what about the
estimates/costing we already did for that path"; well the numbers are
usually wrong anyway!  We have complete paths, and partial paths for
arbitrary parallel_workers numbers that bubbled up from our scan
size-based heuristics.  Every time we combine more than one partial
path, we have to handle non-matching "parallel degree" (I'm using that
word to mean the result of get_parallel_divisor(path), a lightly
grilled version of parallel_workers that makes dubious assumptions,
but I digress).  Concretely, parallel append and parallel hash join
already rescale some of their input variables to match their own
nominal parallel degree (ugh, I see a few things we should fix in that
code, but this email is long enough already).  I wonder if we might
need some infrastructure to formalise that sort of thing.  For
example, look at the row estimates in the EXPLAIN of parallel append
over tables with parallel_workers explicitly set to different numbers
using ALTER TABLE.  They are wrong (they're based on different
parallel degrees; turn off parallel_leader_participation to make the
arithmetic easier to follow), while the append node itself has
rescaled them and has a good row estimate for its own nominal parallel
degree, which in turn might be wrong depending on what is above.
Perhaps EXPLAIN and everything else should use some common
infrastructure to deal with this.
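The rescaling described above can be sketched numerically. The divisor below follows the shape of costsize.c's get_parallel_divisor() (with leader participation, the leader contributes a declining fraction of a worker, 1 - 0.3 * workers, floored at zero); rescale_rows is an invented helper showing how a per-worker row estimate planned at one parallel degree could be mapped onto another via the total-row estimate:

```python
def parallel_divisor(workers: int, leader_participation: bool = True) -> float:
    """Approximation of PostgreSQL's get_parallel_divisor()."""
    divisor = float(workers)
    if leader_participation:
        # The leader does progressively less useful work as workers grow.
        leader_contribution = 1.0 - (0.3 * workers)
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor


def rescale_rows(per_worker_rows: float, old_workers: int, new_workers: int) -> float:
    """Invented helper: map a per-worker row estimate planned at one
    parallel degree onto another, going through the total estimate."""
    total = per_worker_rows * parallel_divisor(old_workers)
    return total / parallel_divisor(new_workers)
```

For example, 1000 rows per worker at 2 workers (divisor 2.4) means 2400 rows in total, which becomes 600 rows per worker at 4 workers (divisor 4.0, since the leader contribution has gone negative and is dropped) — the kind of renumbering parallel append and parallel hash join already do ad hoc.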

In other words, we avoid the need for a try-every-parallel-degree
explosion by rescaling from some arbitrary input degree to some
arbitrary output degree, but we can't go all the way and do the
two-phase Hong thing and rescale from non-partial paths in general
(for various technical reasons that apply to more complex nodes, but
not to basic scans).  From where we are, I'm not sure how much of a
big step it is to (1) formalise the path rescaling system and (2) be
able to rescale some qualifying simple complete paths too, if needed
for places like this.

Of course you could quibble with the concept of linear scaling of
various number by parallel degrees; various things aren't linear or
even continuous (probably why Hong's system included hash join
thresholds).  Even the concept of get_parallel_divisor(path) as
applied to row estimates is suspect, because it assumes that the
planned workers will show up; if a smaller number shows up (at all,
due to max_parallel_workers, or just to this node because a higher
level parallel append sent workers to different subplans) then a node
might receive many more input tuples than expected and blow through
work_mem, even if all estimates were 100% perfect.  I have no idea
what to do about that.  At least *that* problem doesn't apply to
parallel hash, which shares the memory for the number of planned
workers, even if fewer show up, but that idea isn't without critics
either.

I dunno.  Sorry for the wall of text/ideas.  I see unfinished business
in every direction.

[1] 
https://www.postgresql.org/message-id/flat/

Re: Add the ability to limit the amount of memory that can be allocated to backends.

2023-04-06 Thread reid . thompson
Updated patches attached. 

Rebased to current master.
Added additional columns to pg_stat_global_memory_allocation to summarize 
backend allocations by type.
Updated documentation.
Corrected some issues noted in review by John Morris.
Added code re EXEC_BACKEND for dev-max-memory branch.
From 34514ae2bebe5e3ab2a0b5b680d3932b5e7706ee Mon Sep 17 00:00:00 2001
From: Reid Thompson 
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
 allocated to backends.

This builds on the work that adds backend memory allocated tracking.

Add GUC variable max_total_backend_memory.

Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0, it is disabled. It is intended as a tool to help avoid the OOM
killer on Linux and manage resources in general. A backend request that would
exhaust max_total_backend_memory memory will be denied with an out of memory
error causing that backend's current query/transaction to fail.  Further
requests will not be allocated until dropping below the limit. Keep this in
mind when setting this value. Due to the dynamic nature of memory allocations,
this limit is not exact. This limit does not affect auxiliary backend
processes. Backend memory allocations are displayed in the
pg_stat_memory_allocation and pg_stat_global_memory_allocation views.
---
 doc/src/sgml/config.sgml  |  30 +++
 doc/src/sgml/monitoring.sgml  |  38 +++-
 src/backend/catalog/system_views.sql  |   2 +
 src/backend/port/sysv_shmem.c |   9 +
 src/backend/postmaster/postmaster.c   |   5 +
 src/backend/storage/ipc/dsm_impl.c|  18 ++
 src/backend/storage/lmgr/proc.c   |  45 +
 src/backend/utils/activity/backend_status.c   | 183 ++
 src/backend/utils/adt/pgstatfuncs.c   |  16 +-
 src/backend/utils/hash/dynahash.c |   3 +-
 src/backend/utils/init/miscinit.c |   8 +
 src/backend/utils/misc/guc_tables.c   |  11 ++
 src/backend/utils/misc/postgresql.conf.sample |   3 +
 src/backend/utils/mmgr/aset.c |  33 
 src/backend/utils/mmgr/generation.c   |  16 ++
 src/backend/utils/mmgr/slab.c |  15 +-
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/storage/proc.h|   7 +
 src/include/utils/backend_status.h| 120 ++--
 src/test/regress/expected/rules.out   |   4 +-
 20 files changed, 537 insertions(+), 35 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bcc49aec45..4c735e180f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2133,6 +2133,36 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_total_backend_memory (integer)
+  
+   max_total_backend_memory configuration parameter
+  
+  
+  
+   
+Specifies a limit to the amount of memory (MB) that may be allocated to
+backends in total (i.e. this is not a per user or per backend limit).
+If unset or set to 0, it is disabled.  At database startup
+max_total_backend_memory is reduced by shared_memory_size_mb
+(includes shared buffers and other memory required for initialization).
+Each backend process is initialized with a 1MB local allowance, which
+also reduces max_total_bkend_mem_bytes_available. Keep this in mind
+when setting this value. A backend request that would exhaust the limit
+will be denied with an out of memory error causing that backend's
+current query/transaction to fail. Further requests will not be
+allocated until dropping below the limit.  This limit does not affect
+auxiliary backend processes or the postmaster process.
+Backend memory allocations (allocated_bytes) are
+displayed in the
+pg_stat_memory_allocation
+view.  Due to the dynamic nature of memory allocations, this limit is
+not exact.
+   
+  
+ 
+
  
  
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 70b3441412..704a75bd6e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5704,10 +5704,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
   Memory currently allocated to this backend in bytes. This is the balance
-  of bytes allocated and freed by this backend. Dynamic shared memory
-  allocations are included only in the value displayed for the backend that
-  created them, they are not included in the value for backends that are
-  attached to them to avoid double counting.
+  of bytes allocated and freed by this backend.
  
  
 
@@ -5824,6 +5821,39 @@ SELECT pid, wait_event_type, wait_event 

Re: cursor use vs pg_stat_statements

2023-04-06 Thread Michael Paquier
On Wed, Oct 20, 2021 at 09:53:38AM -0400, Andrew Dunstan wrote:
> Try again with autocommit turned off. Sorry, I omitted that crucial
> detail. Exact test code attached (name/password removed)

For the sake of the archives, this should be OK now under 1d477a9.
See also this thread:
https://www.postgresql.org/message-id/ebe6c507-9eb6-4142-9e4d-38b167336...@amazon.com
--
Michael


signature.asc
Description: PGP signature


Re: Improve logging when using Huge Pages

2023-04-06 Thread Justin Pryzby
On Tue, Mar 28, 2023 at 09:35:30AM +0900, Michael Paquier wrote:
> On Thu, Mar 23, 2023 at 08:50:50PM -0500, Justin Pryzby wrote:
> 
> You mean that you abused of it in some custom branch running the CI on
> github?  If I may ask, what did you do to make sure that huge pages
> are set when re-attaching a backend to a shmem area?

Yes.  I hijacked a tap test to first run "show huge_pages_active" and then
also caused it to fail, such that I could check its output.

https://cirrus-ci.com/task/6309510885670912
https://cirrus-ci.com/task/6613427737591808

> > If there's continuing aversions to using a GUC, please say so, and try
> > to suggest an alternate implementation you think would be justified.
> > It'd need to work under windows with EXEC_BACKEND.
> 
> Looking at the patch, it seems like that this should work even for
> EXEC_BACKEND on *nix when a backend reattaches..  It may be better to
> be sure of that, as well, if it has not been tested.

Arg, no - it wasn't working.  I added an assert to check that a running
server won't output "unknown".

> +++ b/src/backend/port/win32_shmem.c
> @@ -327,6 +327,10 @@ retry:
> Sleep(1000);
> continue;
> }
> +
> +   SetConfigOption("huge_pages_status", (flProtect & SEC_LARGE_PAGES) ?
> +   "on" : "off", PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT)
> 
> Perhaps better to just move that at the end of PGSharedMemoryCreate()
> for WIN32?

Maybe - I don't know.

> + 
> +  huge_pages_status (enum)
> +  
> +   huge_pages_status configuration 
> parameter
> +  
> +  
> +  
> +   
> +Reports the state of huge pages in the current instance.
> +See  for more information.
> +   
> +  
> + 
> 
> The patch needs to provide more details about the unknown state, IMO,
> to make it crystal-clear to the users what are the limitations of this
> GUC, especially regarding the fact that this is useful when "try"-ing
> to allocate huge pages, and that the value can only be consulted after
> the server has been started.

I'm not sure I agree.  It can be confusing (even harmful) to overspecify the
documentation.  I used the word "instance" specifically to refer to a running
server.  Anyone querying the status of huge pages is going to need to
understand that it doesn't make sense to ask about the memory state of a server
that's not running.  Actually, I'm having trouble imagining that anybody would
ever run postgres -C huge_pages_status except if they wondered whether it might
misbehave.  If what I've written is inadequate, could you propose something?

-- 
Justin

PS. I hadn't updated this thread because my last message from you (on
another thread) indicated that you'd gotten busy.  I'm hoping you'll
respond to these other inquiries when you're able.

https://www.postgresql.org/message-id/ZCUZLiCeXq4Im7OG%40telsasoft.com
https://www.postgresql.org/message-id/ZCUKL22GutwGrrZk%40telsasoft.com
>From 078724d0ec411de1c52cb9a43a3a29c644a8d19a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 23 Jan 2023 18:33:51 -0600
Subject: [PATCH v7] add GUC: huge_pages_active

This is useful to show the current state of huge pages when
huge_pages=try.  The effective status is not otherwise visible without
OS level tools like gdb or /proc/N/smaps.

https://www.postgresql.org/message-id/flat/tu4pr8401mb1152ebb0d271f827e2e37a01ee...@tu4pr8401mb1152.namprd84.prod.outlook.com
---
 doc/src/sgml/config.sgml| 21 -
 src/backend/port/sysv_shmem.c   |  7 +++
 src/backend/port/win32_shmem.c  |  4 
 src/backend/storage/ipc/ipci.c  |  2 ++
 src/backend/utils/misc/guc_tables.c | 20 
 src/include/storage/pg_shmem.h  |  5 +++--
 6 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 096be7cb8cc..de74d3d1a81 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1728,7 +1728,8 @@ include_dir 'conf.d'
 server will try to request huge pages, but fall back to the default if
 that fails. With on, failure to request huge pages
 will prevent the server from starting up. With off,
-huge pages will not be requested.
+huge pages will not be requested.  The actual state of huge pages is
+indicated by the server variable .

 

@@ -10710,6 +10711,24 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  huge_pages_status (enum)
+  
+   huge_pages_status configuration parameter
+  
+  
+  
+   
+Reports the state of huge pages in the current instance:
+on, off, or (if displayed with
+postgres -C) unknown.
+This parameter is useful to determine whether allocation of huge pages
+was successful when huge_pages=try.
+See  for more information.
+   
+  
+ 
+
  
   integer

Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-06 Thread Yurii Rashkovskii
Hi Tom,

On Wed, Mar 29, 2023 at 6:55 PM Tom Lane  wrote:

> Yurii Rashkovskii  writes:
> > I would like to suggest a patch against master (although it may be worth
> > backporting it) that makes it possible to listen on any unused port.
>
> I think this is a bad idea, mainly because this:
>
> > Instead, with this patch, one can specify `port` as `0` (the "wildcard"
> > port) and retrieve the assigned port from postmaster.pid
>
> is a horrid way to find out what was picked, and yet there could
> be no other.
>

I answered you before (
https://www.postgresql.org/message-id/CA+RLCQwYw-Er-E_RGNCDfA514w+1YL8HGhNstxX=y1glaab...@mail.gmail.com),
but I am wondering whether you missed that response. I would really be
interested to learn why you think reading port from the pid file is a
"horrid way" to find out what was picked.

I've outlined my reasoning for this feature in detail in the referenced
message. Hope you can consider it.

-- 
http://omnigres.org
Yurii


Re: Temporary tables versus wraparound... again

2023-04-06 Thread Greg Stark
On Wed, 5 Apr 2023 at 13:42, Andres Freund  wrote:
>
> Not if you determine a relation specific xmin, and the relation is not a
> shared relation.
>
> ISTM that the problem here really is that you're relying on RecentXmin, rather
> than computing something more accurate. Why not use
> GetOldestNonRemovableTransactionId(rel) - It's a bit more expensive, but I
> don't think it'll matter compared to the cost of truncating the relation.

Thanks for the review!

Hm, I was just copying heapam_handler.c:593 so it would be consistent
with what we do when we create a new table. I wasn't aware we had
anything that did this extra work. I'll look at it.

But I'm not sure it's the best idea to decide on how
truncate/vacuum/create table work based on what happens to be easier
to test. I mean I'm all for testable code but tying vacuum behaviour
to what our test framework happens to not interfere with might be a
bit fragile. Like, if we happen to want to change the testing
framework I think this demonstrates that it will be super easy for it
to break the tests again. And if we discover we have to change the
relfrozenxid behaviour it might be hard to keep this test working.


> Somehow it doesn't feel right to use vac_update_relstats() in
> heapam_handler.c.
>
> I also don't like that your patch references
> heapam_relation_nontransactional_truncate in AddNewRelationTuple() - we
> shouldn't add more comments piercing tableam than necessary.

I'll take another look at this tomorrow. Probably I can extract the
common part of that function or I've misunderstood which bits of code
are above or below the tableam.

I think fundamentally the hardest bit was that the initial
relfrozenxid bubbles up from heapam_handler.c via a return value from
set_new_filelocator. So unless I want to add a new tableam method just
for relfrozenxid it's a bit awkward to get the right data to
AddNewRelationTuple and vac_update_relstats without duplicating code
and crosslinking in comments.

> To me it seems important to test that your change actually does what it
> intends to. Possibly the test needs to be relaxed some, but I do think we want
> tests for the change.

I missed the comment about relaxing the tests until just now. I'll
think about if there's an easy way out in that direction too.

If it's cutting it too fine to the end of the commitfest we could
always just commit the warnings from the 001 patch which would already
be a *huge* help for admins running into this issue.

Chag Sameach!


--
greg




Re: Should vacuum process config file reload more often

2023-04-06 Thread Melanie Plageman
On Thu, Apr 6, 2023 at 5:45 PM Daniel Gustafsson  wrote:
>
> > On 6 Apr 2023, at 23:06, Melanie Plageman  wrote:
>
> > Autovacuum workers, at the end of VacuumUpdateCosts(), check if cost
> > limit or cost delay have been changed. If they have, they assert that
> > they don't already hold the AutovacuumLock, take it in shared mode, and
> > do the logging.
>
> Another idea would be to copy the values to local temp variables while holding
> the lock, and release the lock before calling elog() to avoid holding the lock
> over potential IO.

Good idea. I've done this in attached v19.
Also I looked through the docs and everything still looks correct for
balancing algo.

- Melanie
From 0042067ce72a474fe4087245b978847c0b835b72 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Fri, 31 Mar 2023 10:38:39 -0400
Subject: [PATCH v19 1/3] Make vacuum failsafe_active global

While vacuuming a table in failsafe mode, VacuumCostActive should not be
re-enabled. This currently isn't a problem because vacuum cost
parameters are only refreshed in between vacuuming tables and failsafe
status is reset for every table. In preparation for allowing vacuum cost
parameters to be updated more frequently, elevate
LVRelState->failsafe_active to a global, VacuumFailsafeActive, which
will be checked when determining whether or not to re-enable vacuum
cost-related delays.

Reviewed-by: Masahiko Sawada 
Reviewed-by: Daniel Gustafsson 
Reviewed-by: Kyotaro Horiguchi 
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 16 +++-
 src/backend/commands/vacuum.c| 15 +++
 src/include/commands/vacuum.h|  1 +
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 639179aa46..2ba85bd3d6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -153,8 +153,6 @@ typedef struct LVRelState
 	bool		aggressive;
 	/* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
 	bool		skipwithvm;
-	/* Wraparound failsafe has been triggered? */
-	bool		failsafe_active;
 	/* Consider index vacuuming bypass optimization? */
 	bool		consider_bypass_optimization;
 
@@ -391,7 +389,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	Assert(params->index_cleanup != VACOPTVALUE_UNSPECIFIED);
 	Assert(params->truncate != VACOPTVALUE_UNSPECIFIED &&
 		   params->truncate != VACOPTVALUE_AUTO);
-	vacrel->failsafe_active = false;
+	VacuumFailsafeActive = false;
 	vacrel->consider_bypass_optimization = true;
 	vacrel->do_index_vacuuming = true;
 	vacrel->do_index_cleanup = true;
@@ -709,7 +707,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			}
 			else
 			{
-if (!vacrel->failsafe_active)
+if (!VacuumFailsafeActive)
 	appendStringInfoString(&buf, _("index scan bypassed: "));
 else
 	appendStringInfoString(&buf, _("index scan bypassed by failsafe: "));
@@ -2293,7 +2291,7 @@ lazy_vacuum(LVRelState *vacrel)
 		 * vacuuming or heap vacuuming.  This VACUUM operation won't end up
 		 * back here again.
 		 */
-		Assert(vacrel->failsafe_active);
+		Assert(VacuumFailsafeActive);
 	}
 
 	/*
@@ -2374,7 +2372,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
 	 */
 	Assert(vacrel->num_index_scans > 0 ||
 		   vacrel->dead_items->num_items == vacrel->lpdead_items);
-	Assert(allindexes || vacrel->failsafe_active);
+	Assert(allindexes || VacuumFailsafeActive);
 
 	/*
 	 * Increase and report the number of index scans.
@@ -2616,12 +2614,12 @@ static bool
 lazy_check_wraparound_failsafe(LVRelState *vacrel)
 {
 	/* Don't warn more than once per VACUUM */
-	if (vacrel->failsafe_active)
+	if (VacuumFailsafeActive)
 		return true;
 
 	if (unlikely(vacuum_xid_failsafe_check(&vacrel->cutoffs)))
 	{
-		vacrel->failsafe_active = true;
+		VacuumFailsafeActive = true;
 
 		/*
 		 * Abandon use of a buffer access strategy to allow use of all of
@@ -2820,7 +2818,7 @@ should_attempt_truncation(LVRelState *vacrel)
 {
 	BlockNumber possibly_freeable;
 
-	if (!vacrel->do_rel_truncate || vacrel->failsafe_active ||
+	if (!vacrel->do_rel_truncate || VacuumFailsafeActive ||
 		old_snapshot_threshold >= 0)
 		return false;
 
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ea1d8960f4..7fc5c19e37 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -72,6 +72,21 @@ int			vacuum_multixact_freeze_table_age;
 int			vacuum_failsafe_age;
 int			vacuum_multixact_failsafe_age;
 
+/*
+ * VacuumFailsafeActive is defined as a global so that we can determine
+ * whether or not to re-enable cost-based vacuum delay when vacuuming a table.
+ * If failsafe mode has been engaged, we will not re-enable cost-based delay
+ * for the table until after vacuuming has completed, regardless of other
+ * settings.
+ *
+ * Only VACUUM code shou

Re: Should vacuum process config file reload more often

2023-04-06 Thread Daniel Gustafsson
> On 6 Apr 2023, at 23:06, Melanie Plageman  wrote:

> Autovacuum workers, at the end of VacuumUpdateCosts(), check if cost
> limit or cost delay have been changed. If they have, they assert that
> they don't already hold the AutovacuumLock, take it in shared mode, and
> do the logging.

Another idea would be to copy the values to local temp variables while holding
the lock, and release the lock before calling elog() to avoid holding the lock
over potential IO.

--
Daniel Gustafsson





Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-06 Thread Melanie Plageman
On Fri, Apr 07, 2023 at 09:12:32AM +1200, David Rowley wrote:
> On Fri, 7 Apr 2023 at 05:20, Melanie Plageman  
> wrote:
> > GUC name mentioned in comment is inconsistent with current GUC name.
> >
> > > +/*
> > > + * Upper and lower hard limits for the buffer access strategy ring size
> > > + * specified by the vacuum_buffer_usage_limit GUC and BUFFER_USAGE_LIMIT
> > > + * option to VACUUM and ANALYZE.
> 
> I did adjust this. I wasn't sure it was incorrect as I mentioned "GUC"
as in, the user-facing setting.

Oh, actually maybe you are right then.

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index bcc49aec45..220f9ee84c 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2001,6 +2001,35 @@ include_dir 'conf.d'
>
>   
>  
> +  xreflabel="vacuum_buffer_usage_limit">
> +  
> +   vacuum_buffer_usage_limit (integer)
> +   
> +vacuum_buffer_usage_limit configuration 
> parameter
> +   
> +  
> +  
> +   
> +Specifies the size of shared_buffers to be reused
> +for each backend participating in a given invocation of
> +VACUUM or ANALYZE or in
> +autovacuum.  

Rereading this, I think it is not a good sentence (my fault).
Perhaps we should use the same language as the BUFFER_USAGE_LIMIT
options. Something like:

Specifies the
Buffer Access 
Strategy
ring buffer size used by each backend participating in a given
invocation of VACUUM or ANALYZE or
in autovacuum.

Last part is admittedly a bit awkward...

> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index ea1d8960f4..995b4bd54a 100644
> --- a/src/backend/commands/vacuum.c
> +++ b/src/backend/commands/vacuum.c
> @@ -56,6 +56,7 @@
>  #include "utils/acl.h"
>  #include "utils/fmgroids.h"
>  #include "utils/guc.h"
> +#include "utils/guc_hooks.h"
>  #include "utils/memutils.h"
>  #include "utils/pg_rusage.h"
>  #include "utils/snapmgr.h"
> @@ -95,6 +96,26 @@ static VacOptValue get_vacoptval_from_boolean(DefElem 
> *def);
>  static bool vac_tid_reaped(ItemPointer itemptr, void *state);
>  static int   vac_cmp_itemptr(const void *left, const void *right);
>  
> +/*
> + * GUC check function to ensure GUC value specified is within the allowable
> + * range.
> + */
> +bool
> +check_vacuum_buffer_usage_limit(int *newval, void **extra,
> + GucSource 
> source)
> +{

I'm not so hot on this comment. It seems very...generic. Like it could
be the comment on any GUC check function. I'm also okay with leaving it
as is.

> @@ -341,7 +422,19 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool 
> isTopLevel)
>  
>   MemoryContext old_context = MemoryContextSwitchTo(vac_context);
>  
> - bstrategy = GetAccessStrategy(BAS_VACUUM);
> + Assert(ring_size >= -1);
> +
> + /*
> +  * If BUFFER_USAGE_LIMIT was specified by the VACUUM or ANALYZE
> +  * command, it overrides the value of VacuumBufferUsageLimit.  
> Either
> +  * value may be 0, in which case GetAccessStrategyWithSize() 
> will
> +  * return NULL, effectively allowing full use of shared buffers.

Maybe "unlimited" is better than "full"?

> +  */
> + if (ring_size != -1)
> + bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, 
> ring_size);
> + else
> + bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, 
> VacuumBufferUsageLimit);
> +
>   MemoryContextSwitchTo(old_context);
>   }
>  
> diff --git a/src/backend/commands/vacuumparallel.c 
> b/src/backend/commands/vacuumparallel.c
> @@ -365,6 +371,9 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int 
> nindexes,
>   maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
>   maintenance_work_mem;
>  
> + /* Use the same buffer size for all workers */

I would say ring buffer size -- this sounds like it is the size of a
single buffer.

> + shared->ring_nbuffers = GetAccessStrategyBufferCount(bstrategy);
> +
>   pg_atomic_init_u32(&(shared->cost_balance), 0);
>   pg_atomic_init_u32(&(shared->active_nworkers), 0);
>   pg_atomic_init_u32(&(shared->idx), 0);

> + * Upper and lower hard limits for the buffer access strategy ring size
> + * specified by the VacuumBufferUsageLimit GUC and BUFFER_USAGE_LIMIT option

I agree with your original usage of the actual GUC name, now that I
realize why you were doing it and am rereading it.

> + * to VACUUM and ANALYZE.
> + */
> +#define MIN_BAS_VAC_RING_SIZE_KB 128
> +#define MAX_BAS_VAC_RING_SIZE_KB (16 * 1024 * 1024)


Otherwise, LGTM.




Re: Add SHELL_EXIT_CODE to psql

2023-04-06 Thread Tom Lane
Corey Huinker  writes:
> This is a follow up patch to apply the committed pattern to the various
> piped output commands.

Pushed with some changes:

* You didn't update the documentation.

* I thought this was way too many copies of the same logic.  I made a
  subroutine to set these variables, and changed the original uses too.

* You didn't change \w (exec_command_write) to set these variables.
  I'm assuming that was an oversight; if it was intentional, please
  explain why.

I looked through psql's other uses of pclose() and system(),
and found:
* pager invocations
* backtick evaluation within a prompt
* \e (edit query buffer)
I think that not changing these variables in those places is a good
idea.  For instance, if prompt display could change them then they'd
never survive long enough to be useful; plus, having the behavior
vary depending on your prompt settings seems pretty horrid.
In general, these things are mainly useful to scripts, and I doubt
that we want their interactive behavior to vary from scripted behavior,
so setting them during interactive-only operations seems bad.

regards, tom lane




Re: PostgreSQL 16 Release Management Team & Feature Freeze

2023-04-06 Thread Jonathan S. Katz

On 3/21/23 11:35 AM, Jonathan S. Katz wrote:

Additionally, the RMT has set the feature freeze to be **April 8, 2023 
at 0:00 AoE**[1]. This is the last time to commit features for 
PostgreSQL 16. In  other words, no new PostgreSQL 16 feature can be 
committed after April 8, 2023 at 0:00 AoE.


This is a reminder that feature freeze is rapidly approaching. The freeze 
begins at April 8, 2023 at 0:00 AoE. No new PostgreSQL 16 features can 
be committed after this time.


Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-06 Thread Justin Pryzby
+VACUUM uses a ring like sequential scans, however, the size this ring
+controlled by the vacuum_buffer_usage_limit GUC.  Dirty pages are not removed

should say: ".. the size OF this ring IS .." ?




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-06 Thread David Rowley
On Fri, 7 Apr 2023 at 05:20, Melanie Plageman  wrote:
> > diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml

> This still says that the default value is -1.

Oops, I had this staged but didn't commit and formed the patch with
"git diff master.."  instead of "git diff master", so missed a few
staged changes.

> > diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
> I noticed you seemed to have removed the reference to the GUC
> vacuum_buffer_usage_limit here. Was that intentional?
> We may not need to mention "falling back" as I did before, however, the
> GUC docs mention max/min values and such, which might be useful.

Unintentional. I removed it when removing the -1 stuff. It's useful to
keep something about the fallback, so I put that part back.

> > + /* Allow specifying the default or disabling Buffer Access Strategy */
> > + if (*newval == -1 || *newval == 0)
> > + return true;
>
> This should not check for -1 since that isn't the default anymore.
> It should only need to check for 0 I think?

Thanks. That one was one of the staged fixes.

> > + /* Value upper and lower hard limits are inclusive */
> > + if (*newval >= MIN_BAS_VAC_RING_SIZE_KB &&
> > + *newval <= MAX_BAS_VAC_RING_SIZE_KB)
> > + return true;
> > +
> > + /* Value does not fall within any allowable range */
> > + GUC_check_errdetail("\"vacuum_buffer_usage_limit\" must be -1, 0 or 
> > between %d KB and %d KB",
> > + MIN_BAS_VAC_RING_SIZE_KB, 
> > MAX_BAS_VAC_RING_SIZE_KB);
>
> Also remove -1 here.

And this one.

> >   * Primary entry point for manual VACUUM and ANALYZE commands
> >   *
> > @@ -114,6 +139,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, 
> > bool isTopLevel)
> >   booldisable_page_skipping = false;
> >   boolprocess_main = true;
> >   boolprocess_toast = true;
> > + /* by default use buffer access strategy with default size */
> > + int ring_size = -1;
>
> We need to update this comment to something like, "use an invalid value
> for ring_size" so it is clear whether or not the BUFFER_USAGE_LIMIT was
> specified when making the access strategy later". Also, I think just
> removing the comment would be okay, because this is the normal behavior
> for initializing values, I think.

Yeah, I've moved the assignment away from the declaration and wrote
something along those lines.

> > @@ -240,6 +309,17 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, 
> > bool isTopLevel)
> >   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >errmsg("VACUUM FULL cannot be performed in 
> > parallel")));
> >
> > + /*
> > +  * BUFFER_USAGE_LIMIT does nothing for VACUUM (FULL) so just raise an
> > +  * ERROR for that case.  VACUUM (FULL, ANALYZE) does make use of it, 
> > so
> > +  * we'll permit that.
> > +  */
> > + if ((params.options & VACOPT_FULL) && !(params.options & 
> > VACOPT_ANALYZE) &&
> > + ring_size > -1)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +  errmsg("BUFFER_USAGE_LIMIT cannot be 
> > specified for VACUUM FULL")));
> > +
> >   /*
> >* Make sure VACOPT_ANALYZE is specified if any column lists are 
> > present.
> >*/
> > @@ -341,7 +421,20 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, 
> > bool isTopLevel)
> >
> >   MemoryContext old_context = 
> > MemoryContextSwitchTo(vac_context);
> >
> > - bstrategy = GetAccessStrategy(BAS_VACUUM);
>
> Is it worth moving this assert up above when we do the "sanity checking"
> for VACUUM FULL with BUFFER_USAGE_LIMIT?

I didn't do this, but I did adjust that check to check ring_size != -1
and put that as the first condition. It's likely more rare to have
ring_size not set to -1, so probably should check that first.

> s/expliot/exploit

oops

> I might rephrase the last sentence(s). Since it overrides it, I think it
> is clear that if it is not specified, then the thing it overrides is
> used. Then you could phrase the whole thing like:
>
>  "If BUFFER_USAGE_LIMIT was specified by the VACUUM or ANALYZE command,
>  it overrides the value of VacuumBufferUsageLimit.  Either value may be
>  0, in which case GetAccessStrategyWithSize() will return NULL, which is
>  the expected behavior."

Fixed.

> > diff --git a/src/backend/postmaster/autovacuum.c 
> > b/src/backend/postmaster/autovacuum.c
> > index c1e911b1b3..1b5f779384 100644
> > --- a/src/backend/postmaster/autovacuum.c
> > +++ b/src/backend/postmaster/autovacuum.c
> > @@ -2287,11 +2287,21 @@ do_autovacuum(void)
> >   /*
> > -  * Create a buffer access strategy object for VACUUM to use.  We want 
> > to
> > -  * use the same one across all the vacuum operations we perf

Re: Should vacuum process config file reload more often

2023-04-06 Thread Melanie Plageman
I think attached v18 addresses all outstanding issues except a run
through the docs making sure all mentions of the balancing algorithm are
still correct.

On Wed, Apr 5, 2023 at 9:10 AM Daniel Gustafsson  wrote:
> > On 4 Apr 2023, at 22:04, Melanie Plageman  wrote:
> >> +extern int VacuumCostLimit;
> >> +extern double VacuumCostDelay;
> >> ...
> >> -extern PGDLLIMPORT int VacuumCostLimit;
> >> -extern PGDLLIMPORT double VacuumCostDelay;
> >>
> >> Same with these, I don't think this is according to our default visibility.
> >> Moreover, I'm not sure it's a good idea to perform this rename.  This will 
> >> keep
> >> VacuumCostLimit and VacuumCostDelay exported, but change their meaning.  
> >> Any
> >> external code referring to these thinking they are backing the GUCs will 
> >> still
> >> compile, but may be broken in subtle ways.  Is there a reason for not 
> >> keeping
> >> the current GUC variables and instead add net new ones?
> >
> > When VacuumCostLimit was the same variable in the code and for the GUC
> > vacuum_cost_limit, everytime we reload the config file, VacuumCostLimit
> > is overwritten. Autovacuum workers have to overwrite this value with the
> > appropriate one for themselves given the balancing logic and the value
> > of autovacuum_vacuum_cost_limit. However, the problem is, because you
> > can specify -1 for autovacuum_vacuum_cost_limit to indicate it should
> > fall back to vacuum_cost_limit, we have to reference the value of
> > VacuumCostLimit when calculating the new autovacuum worker's cost limit
> > after a config reload.
> >
> > But, you have to be sure you *only* do this after a config reload when
> > the value of VacuumCostLimit is fresh and unmodified or you risk
> > dividing the value of VacuumCostLimit over and over. That means it is
> > unsafe to call functions updating the cost limit more than once.
> >
> > This orchestration wasn't as difficult when we only reloaded the config
> > file once every table. We were careful about it and also kept the
> > original "base" cost limit around from table_recheck_autovac(). However,
> > once we started reloading the config file more often, this no longer
> > works.
> >
> > By separating the variables modified when the gucs are set and the ones
> > used the code, we can make sure we always have the original value the
> > guc was set to in vacuum_cost_limit and autovacuum_vacuum_cost_limit,
> > whenever we need to reference it.
> >
> > That being said, perhaps we should document what extensions should do?
> > Do you think they will want to use the variables backing the gucs or to
> > be able to overwrite the variables being used in the code?
>
> I think I wasn't clear in my comment, sorry.  I don't have a problem with
> introducing a new variable to split the balanced value from the GUC value.
> What I don't think we should do is repurpose an exported symbol into doing a
> new thing.  In the case at hand I think VacuumCostLimit and VacuumCostDelay
> should remain the backing variables for the GUCs, with vacuum_cost_limit and
> vacuum_cost_delay carrying the balanced values.  So the inverse of what is in
> the patch now.
>
> The risk of these symbols being used in extensions might be very low but on
> principle it seems unwise to alter a symbol and risk subtle breakage.

In attached v18, I have flipped them. Existing (in master) GUCs which
were exported for VacuumCostLimit and VacuumCostDelay retain their names
and new globals vacuum_cost_limit and vacuum_cost_delay have been
introduced for use in the code.

Flipping these kind of melted my mind, so I could definitely use another
set of eyes double checking that the correct ones are being used in the
correct places throughout 0002 and 0003.

On Thu, Apr 6, 2023 at 3:09 PM Melanie Plageman
 wrote:
>
> v17 attached does not yet fix the logging problem or variable naming
> problem.
>
> I have not changed where AutoVacuumUpdateCostLimit() is called either.
>
> This is effectively just a round of cleanup. I hope I have managed to
> address all other code review feedback so far, though some may have
> slipped through the cracks.
>
> On Wed, Apr 5, 2023 at 2:56 PM Robert Haas  wrote:
> > On Wed, Apr 5, 2023 at 11:29 AM Melanie Plageman 
> >  wrote:
> > + /*
> > + * Balance and update limit values for autovacuum workers. We must
> > + * always do this in case the autovacuum launcher or another
> > + * autovacuum worker has recalculated the number of workers across
> > + * which we must balance the limit. This is done by the launcher when
> > + * launching a new worker and by workers before vacuuming each table.
> > + */
> >
> > I don't quite understand what's going on here. A big reason that I'm
> > worried about this whole issue in the first place is that sometimes
> > there's a vacuum going on a giant table and you can't get it to go
> > fast. You want it to absorb new settings, and to do so quickly. I
> > realize that this is about the number of workers, not the actual cost
> > limit, so th

Re: Git sources doesn't contain the INSATLL file?

2023-04-06 Thread Daniel Gustafsson
> On 6 Apr 2023, at 22:13, tison  wrote:
> 
> Hi Hackers,
> 
> I opened https://git.postgresql.org/gitweb/?p=postgresql.git;a=tree and clone 
> the repo. The README file says that I can read the INSTALL file to understand 
> how to build from source. But there is no such file in Git sources. Is it 
> expected? If so, why?

It's very expected; the README.git file contains:

"In a release or snapshot tarball of PostgreSQL, a documentation file
named INSTALL will appear in this directory.  However, this file is not
stored in git and so will not be present if you are using a git
checkout."

The INSTALL file was removed in 54d314c93c0baf5d3bd303d206d8ab9f58be1c37 a long
time ago.

That being said, maybe README should have wording along the lines of the above
since it's now referring to a file which might not exist?

--
Daniel Gustafsson





Git sources doesn't contain the INSATLL file?

2023-04-06 Thread tison
Hi Hackers,

I opened https://git.postgresql.org/gitweb/?p=postgresql.git;a=tree and
clone the repo. The README file says that I can read the INSTALL file to
understand how to build from source. But there is no such file in Git
sources. Is it expected? If so, why?

Best,
tison.


Re: Rethinking the implementation of ts_headline()

2023-04-06 Thread Tom Lane
Alexander Lakhin  writes:
> I've found that starting from commit 5a617d75 this query:
> SELECT ts_headline('english', 'To be, or not to be', to_tsquery('english', 
> 'or'));
> invokes a valgrind-detected error:
> ==00:00:00:03.950 3241424== Invalid read of size 1

On my machine, I also see PG-reported errors such as "unrecognized
operator: 0".  It's a live bug all right.  We need to be more careful
about empty tsqueries.

> But the less-verbose call:
> SELECT ts_headline('', '');

> discovers a different error even on 5a617d75~1:
> ==00:00:00:04.113 3139158== Conditional jump or move depends on uninitialised 
> value(s)
> ==00:00:00:04.113 3139158==    at 0x77B44F: mark_fragment (wparser_def.c:2100)

Yeah, this one seems to be ancient sloppiness.  I don't think it has
any bad effect beyond annoying valgrind, but I fixed it anyway.

Thanks for the report!

regards, tom lane




Re: Should vacuum process config file reload more often

2023-04-06 Thread Melanie Plageman
v17 attached does not yet fix the logging problem or variable naming
problem.

I have not changed where AutoVacuumUpdateCostLimit() is called either.

This is effectively just a round of cleanup. I hope I have managed to
address all other code review feedback so far, though some may have
slipped through the cracks.

On Wed, Apr 5, 2023 at 2:56 PM Robert Haas  wrote:
> On Wed, Apr 5, 2023 at 11:29 AM Melanie Plageman  
> wrote:
> + /*
> + * Balance and update limit values for autovacuum workers. We must
> + * always do this in case the autovacuum launcher or another
> + * autovacuum worker has recalculated the number of workers across
> + * which we must balance the limit. This is done by the launcher when
> + * launching a new worker and by workers before vacuuming each table.
> + */
>
> I don't quite understand what's going on here. A big reason that I'm
> worried about this whole issue in the first place is that sometimes
> there's a vacuum going on a giant table and you can't get it to go
> fast. You want it to absorb new settings, and to do so quickly. I
> realize that this is about the number of workers, not the actual cost
> limit, so that makes what I'm about to say less important. But ... is
> this often enough? Like, the time before we move onto the next table
> could be super long. The time before a new worker is launched should
> be ~autovacuum_naptime/autovacuum_max_workers or ~20s with default
> settings, so that's not horrible, but I'm kind of struggling to
> understand the rationale for this particular choice. Maybe it's fine.

I've at least updated this comment to be more correct/less misleading.

>
> +   if (autovacuum_vac_cost_limit > 0)
> +   VacuumCostLimit = autovacuum_vac_cost_limit;
> +   else
> +   VacuumCostLimit = vacuum_cost_limit;
> +
> +   /* Only balance limit if no cost-related storage
> parameters specified */
> +   if (pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_dobalance))
> +   return;
> +   Assert(VacuumCostLimit > 0);
> +
> +   nworkers_for_balance = pg_atomic_read_u32(
> +
> &AutoVacuumShmem->av_nworkersForBalance);
> +
> +   /* There is at least 1 autovac worker (this worker). */
> +   if (nworkers_for_balance <= 0)
> +   elog(ERROR, "nworkers_for_balance must be > 0");
> +
> +   VacuumCostLimit = Max(VacuumCostLimit /
> nworkers_for_balance, 1);
>
> I think it would be better stylistically to use a temporary variable
> here and only assign the final value to VacuumCostLimit.

I tried that and thought it added confusing clutter. If it is a code
cleanliness issue, I am willing to change it, though.

On Wed, Apr 5, 2023 at 3:04 PM Daniel Gustafsson  wrote:
>
> > On 5 Apr 2023, at 20:55, Robert Haas  wrote:
>
> > Again, I don't think this is something we should try to
> > address right now under time pressure, but in the future, I think we
> > should consider ripping this behavior out.
>
> I would not be opposed to that, but I wholeheartedly agree that it's not the
> job of this patch (or any patch at this point in the cycle).
>
> > +   if (autovacuum_vac_cost_limit > 0)
> > +   VacuumCostLimit = autovacuum_vac_cost_limit;
> > +   else
> > +   VacuumCostLimit = vacuum_cost_limit;
> > +
> > +   /* Only balance limit if no cost-related storage
> > parameters specified */
> > +   if 
> > (pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_dobalance))
> > +   return;
> > +   Assert(VacuumCostLimit > 0);
> > +
> > +   nworkers_for_balance = pg_atomic_read_u32(
> > +
> > &AutoVacuumShmem->av_nworkersForBalance);
> > +
> > +   /* There is at least 1 autovac worker (this worker). */
> > +   if (nworkers_for_balance <= 0)
> > +   elog(ERROR, "nworkers_for_balance must be > 0");
> > +
> > +   VacuumCostLimit = Max(VacuumCostLimit /
> > nworkers_for_balance, 1);
> >
> > I think it would be better stylistically to use a temporary variable
> > here and only assign the final value to VacuumCostLimit.
>
> I can agree with that.  Another supertiny nitpick on the above is to not end a
> single-line comment with a period.

I have fixed this.

On Thu, Apr 6, 2023 at 2:40 AM Masahiko Sawada  wrote:
>
> On Thu, Apr 6, 2023 at 12:29 AM Melanie Plageman
>  wrote:
> >
> > Thanks all for the reviews.
> >
> > v16 attached. I put it together rather quickly, so there might be a few
> > spurious whitespaces or similar. There is one rather annoying pgindent
> > outlier that I have to figure out what to do about as well.
> >
> > The remaining functional TODOs that I know of are:
> >
> > - Resolve what to do about names of GUC and vacuum variables for cost
> >   limit and cost delay (since it may affect extensions)
> >
> > - Figure o

Re: Should vacuum process config file reload more often

2023-04-06 Thread Daniel Gustafsson
> On 6 Apr 2023, at 19:18, Robert Haas  wrote:
> 
> On Thu, Apr 6, 2023 at 11:52 AM Melanie Plageman
>  wrote:
>>> Gah, I think I misunderstood you. You are saying that only calling
>>> AutoVacuumUpdateCostLimit() after napping while vacuuming a table may
>>> not be enough. The frequency at which the number of workers changes will
>>> likely be different. This is a good point.
>>> It's kind of weird to call AutoVacuumUpdateCostLimit() only after napping...
>> 
>> A not fully baked idea for a solution:
>> 
>> Why not keep the balanced limit in the atomic instead of the number of
>> workers for balance. If we expect all of the workers to have the same
>> value for cost limit, then why would we just count the workers and not
>> also do the division and store that in the atomic variable. We are
>> worried about the division not being done often enough, not the number
>> of workers being out of date. This solves that, right?
> 
> A bird in the hand is worth two in the bush, though. We don't really
> have time to redesign the patch before feature freeze, and I can't
> convince myself that there's a big enough problem with what you
> already did that it would be worth putting off fixing this for another
> year.

+1, I'd rather see we did a conservative version of the feature first and
expand upon it in the 17 cycle.

> Reading your newer emails, I think that the answer to my
> original question is "we don't want to do it at every
> vacuum_delay_point because it might be too costly," which is
> reasonable.

I think we kind of need to get to that granularity eventually, but it's not a
showstopper for this feature, and can probably benefit from being done in the
context of a larger av-worker re-think (the importance of which discussed
downthread).

--
Daniel Gustafsson





Re: what should install-world do when docs are not available?

2023-04-06 Thread Andrew Dunstan


On 2023-04-05 We 00:57, Andres Freund wrote:

Hi,

On 2023-04-04 21:46:11 -0700, Andres Freund wrote:

Pushed the changes.

This failed on crake - afaict because the meson buildfarm code disables all
features. Because 'docs' is a feature now, the BF code building
doc/src/sgml/html fails.



I changed it so that if the config mandates building docs we add 
-Ddocs=enabled and if it mandates building a pdf we also add 
-Ddocs_pdf=enabled. See




It's a slight pity that you have to pick this at setup time, but I guess 
the upside is that we don't spend time looking for stuff we're not 
actually going to use.




cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-04-06 Thread Daniel Gustafsson
> On 6 Apr 2023, at 18:39, Justin Pryzby  wrote:

> *If* you wanted to do something to fix this, you could create a key
> called files_to_remove_after_loading, and run unlink on those files
> rather than running a shell command.  Or maybe just remove the file
> unconditionally at the start of the script ?

Since the test is written in Perl, and Perl has a function for deleting files
that abstracts away the platform differences, using it seems like a logical
choice?  {cleanup_cmd} could be replaced with {cleanup_files}, with unlink
called on that?

--
Daniel Gustafsson





Re: cataloguing NOT NULL constraints

2023-04-06 Thread Justin Pryzby
On Thu, Apr 06, 2023 at 01:33:56AM +0200, Alvaro Herrera wrote:
> -   The forms ADD (without USING INDEX),
> +   The forms ADD (without USING INDEX, 
> and
> +   except for the NOT NULL 
> column_name
> +   form to add a table constraint),

The "except" part seems pretty incoherent to me :(

> + if (isnull)
> + elog(ERROR, "null conkey for NOT NULL constraint %u", 
> conForm->oid);

could use SysCacheGetAttrNotNull()

> + if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> + ereport(ERROR,
> + 
> errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("cannot add constraint to only 
> the partitioned table when partitions exist"),
> + errhint("Do not specify the ONLY 
> keyword."));
> + else
> + ereport(ERROR,
> + 
> errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("cannot add constraint to table 
> with inheritance children"),

missing "only" ?

> + conrel = table_open(ConstraintRelationId, RowExclusiveLock);

Should this be opened after the following error check ?

> + arr = DatumGetArrayTypeP(adatum);   /* ensure not toasted */
> + numkeys = ARR_DIMS(arr)[0];
> + if (ARR_NDIM(arr) != 1 ||
> + numkeys < 0 ||
> + ARR_HASNULL(arr) ||
> + ARR_ELEMTYPE(arr) != INT2OID)
> + elog(ERROR, "conkey is not a 1-D smallint array");
> + attnums = (int16 *) ARR_DATA_PTR(arr);
> +
> + for (int i = 0; i < numkeys; i++)
> + unconstrained_cols = lappend_int(unconstrained_cols, 
> attnums[i]);
> + }

Does "arr" need to be freed ?

> +  * Since the above deletion has been made visible, we 
> can now
> +  * search for any remaining constraints on this column 
> (or these
> +  * columns, in the case we're dropping a multicol 
> primary key.)
> +  * Then, verify whether any further NOT NULL or primary 
> key exist,

If I'm reading it right, I think it should say "exists"

> +/*
> + * When a primary key index on a partitioned table is to be attached an index
> + * on a partition, the partition's columns should also be marked NOT NULL.
> + * Ensure that is the case.

I think the comment may be missing words, or be backwards.
The index on the *partitioned* table wouldn't be attached;
it's the index on the *partition* that's attached *to* the former index.

> +create table c() inherits(inh_p1, inh_p2, inh_p3, inh_p4);
> +NOTICE:  merging multiple inherited definitions of column "f1"
> +NOTICE:  merging multiple inherited definitions of column "f1"
> +ERROR:  relation "c" already exists

Do you intend to make an error here ?

Also, I think these table names may be too generic, and conflict with
other parallel tests, now or in the future.

> +create table d(a int not null, f1 int) inherits(inh_p3, c);
> +ERROR:  relation "d" already exists

And here ?

> +-- with explicitely specified not null constraints

sp: explicitly

-- 
Justin




Re: monitoring usage count distribution

2023-04-06 Thread Nathan Bossart
On Thu, Apr 06, 2023 at 01:32:35PM -0400, Tom Lane wrote:
> There seems to be enough support for the existing summary function
> definition to leave it as-is; Andres likes it for one, and I'm not
> excited about trying to persuade him he's wrong.  But a second
> slightly-less-aggregated summary function is clearly useful as well.
> So I'm now thinking that we do want the patch as-submitted.
> (Caveat: I've not read the patch, just the description.)

In case we want to do both, here's a 0002 that changes usagecount_avg to an
array of usage counts.
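
For illustration, a query against the function proposed in 0001 might look
like the following sketch (column names are taken from the patch below; it
assumes the updated pg_buffercache extension is installed, and the actual
output of course depends on the buffer cache state):

```sql
-- Sketch only: one row per usage count, with the number of valid buffers
-- at that count, broken out by dirty and pinned status.
SELECT usage_count, buffers, dirty, pinned
FROM pg_buffercache_usage_counts()
ORDER BY usage_count;
```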

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 6ad6a8e3a9ed0d0265e1869c0eaa793881c2fa77 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 27 Jan 2023 16:39:43 -0800
Subject: [PATCH v2 1/2] introduce pg_buffercache_usage_counts()

---
 .../expected/pg_buffercache.out   |  14 +++
 .../pg_buffercache--1.3--1.4.sql  |  13 +++
 contrib/pg_buffercache/pg_buffercache_pages.c |  46 
 contrib/pg_buffercache/sql/pg_buffercache.sql |   4 +
 doc/src/sgml/pgbuffercache.sgml   | 101 +-
 5 files changed, 176 insertions(+), 2 deletions(-)

diff --git a/contrib/pg_buffercache/expected/pg_buffercache.out b/contrib/pg_buffercache/expected/pg_buffercache.out
index 635f01e3b2..b745dc69ea 100644
--- a/contrib/pg_buffercache/expected/pg_buffercache.out
+++ b/contrib/pg_buffercache/expected/pg_buffercache.out
@@ -17,6 +17,12 @@ from pg_buffercache_summary();
  t| t| t
 (1 row)
 
+SELECT count(*) > 0 FROM pg_buffercache_usage_counts() WHERE buffers >= 0;
+ ?column? 
+--
+ t
+(1 row)
+
 -- Check that the functions / views can't be accessed by default. To avoid
 -- having to create a dedicated user, use the pg_database_owner pseudo-role.
 SET ROLE pg_database_owner;
@@ -26,6 +32,8 @@ SELECT * FROM pg_buffercache_pages() AS p (wrong int);
 ERROR:  permission denied for function pg_buffercache_pages
 SELECT * FROM pg_buffercache_summary();
 ERROR:  permission denied for function pg_buffercache_summary
+SELECT * FROM pg_buffercache_usage_counts();
+ERROR:  permission denied for function pg_buffercache_usage_counts
 RESET role;
 -- Check that pg_monitor is allowed to query view / function
 SET ROLE pg_monitor;
@@ -41,3 +49,9 @@ SELECT buffers_used + buffers_unused > 0 FROM pg_buffercache_summary();
  t
 (1 row)
 
+SELECT count(*) > 0 FROM pg_buffercache_usage_counts();
+ ?column? 
+--
+ t
+(1 row)
+
diff --git a/contrib/pg_buffercache/pg_buffercache--1.3--1.4.sql b/contrib/pg_buffercache/pg_buffercache--1.3--1.4.sql
index 8f212dc5e9..f4702e4b4b 100644
--- a/contrib/pg_buffercache/pg_buffercache--1.3--1.4.sql
+++ b/contrib/pg_buffercache/pg_buffercache--1.3--1.4.sql
@@ -15,3 +15,16 @@ LANGUAGE C PARALLEL SAFE;
 -- Don't want these to be available to public.
 REVOKE ALL ON FUNCTION pg_buffercache_summary() FROM PUBLIC;
 GRANT EXECUTE ON FUNCTION pg_buffercache_summary() TO pg_monitor;
+
+CREATE FUNCTION pg_buffercache_usage_counts(
+OUT usage_count int4,
+OUT buffers int4,
+OUT dirty int4,
+OUT pinned int4)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_buffercache_usage_counts'
+LANGUAGE C PARALLEL SAFE;
+
+-- Don't want these to be available to public.
+REVOKE ALL ON FUNCTION pg_buffercache_usage_counts() FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_buffercache_usage_counts() TO pg_monitor;
diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index 1c6a2f22ca..f333967c51 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -18,6 +18,7 @@
 #define NUM_BUFFERCACHE_PAGES_MIN_ELEM	8
 #define NUM_BUFFERCACHE_PAGES_ELEM	9
 #define NUM_BUFFERCACHE_SUMMARY_ELEM 5
+#define NUM_BUFFERCACHE_USAGE_COUNTS_ELEM 4
 
 PG_MODULE_MAGIC;
 
@@ -61,6 +62,7 @@ typedef struct
  */
 PG_FUNCTION_INFO_V1(pg_buffercache_pages);
 PG_FUNCTION_INFO_V1(pg_buffercache_summary);
+PG_FUNCTION_INFO_V1(pg_buffercache_usage_counts);
 
 Datum
 pg_buffercache_pages(PG_FUNCTION_ARGS)
@@ -304,3 +306,47 @@ pg_buffercache_summary(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(result);
 }
+
+Datum
+pg_buffercache_usage_counts(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	int			usage_counts[BM_MAX_USAGE_COUNT + 1] = {0};
+	int			dirty[BM_MAX_USAGE_COUNT + 1] = {0};
+	int			pinned[BM_MAX_USAGE_COUNT + 1] = {0};
+	Datum		values[NUM_BUFFERCACHE_USAGE_COUNTS_ELEM];
+	bool		nulls[NUM_BUFFERCACHE_USAGE_COUNTS_ELEM] = {0};
+
+	InitMaterializedSRF(fcinfo, 0);
+
+	for (int i = 0; i < NBuffers; i++)
+	{
+		BufferDesc *bufHdr = GetBufferDescriptor(i);
+		uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
+		int			usage_count;
+
+		if ((buf_state & BM_VALID) == 0)
+			continue;
+
+		usage_count = BUF_STATE_GET_USAGECOUNT(buf_state);
+		usage_counts[usage_count]++;
+
+		if (buf_state & BM_DIRTY)
+			dirty[usage_count]++;
+
+		if (BUF_STATE_GET_REFCOUNT(buf_state) > 

Re: psql \watch 2nd argument: iteration count

2023-04-06 Thread Andrey Borodin
On Thu, Apr 6, 2023 at 10:23 PM Tom Lane  wrote:
>
> Kirk Wolak  writes:
> > Marked as Ready for Committer.
>
> Pushed with a pretty fair number of cosmetic changes.

Great, thank you!

> If you write a semicolon first, you get four, but it's the semicolon
> producing the first result, not \watch.

I did not know that. Well, I knew it in parts, but did not understand
as a whole. Thanks!


Best regards, Andrey Borodin.




Re: Should vacuum process config file reload more often

2023-04-06 Thread Robert Haas
On Wed, Apr 5, 2023 at 4:59 PM Peter Geoghegan  wrote:
> I think that I agree. I think that the difficulty of tuning autovacuum
> is the actual real problem. (Or maybe it's just very closely related
> to the real problem -- the precise definition doesn't seem important.)

I agree, and I think that bad choices around what the parameters do
are a big part of the problem. autovacuum_max_workers is one example
of that, but there are a bunch of others. It's not at all intuitive
that if your database gets really big you either need to raise
autovacuum_vacuum_cost_limit or lower autovacuum_vacuum_cost_delay.
And, it's not intuitive either that raising autovacuum_max_workers
doesn't increase the amount of vacuuming that gets done. In my
experience, it's very common for people to observe that autovacuum is
running constantly, and to figure out that the number of running
workers is equal to autovacuum_max_workers at all times, and to then
conclude that they need more workers. So they raise
autovacuum_max_workers and nothing gets any better. In fact, things
might get *worse*, because the time required to complete vacuuming of
a large table can increase if the available bandwidth is potentially
spread across more workers, and it's very often the time to vacuum the
largest tables that determines whether things hold together adequately
or not.
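
As a hedged illustration of the kind of manual adjustment described above
(the values are arbitrary examples, not recommendations):

```sql
-- Raise the global autovacuum budget for a large, busy database;
-- the new values take effect after a configuration reload.
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';
SELECT pg_reload_conf();
```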

This kind of stuff drives me absolutely batty. It's impossible to make
every database behavior completely intuitive, but here we have a
parameter that seems like it is exactly the right thing to solve the
problem that the user knows they have, and it actually does nothing on
a good day and causes a regression on a bad one. That's incredibly
poor design.

The way it works at the implementation level is pretty kooky, too. The
available resources are split between the workers, but if any of the
relevant vacuum parameters are set for the table currently being
vacuumed, then that worker gets the full resources configured for that
table, and everyone else divides up the amount that's configured
globally. So if you went and set the cost delay and cost limit for all
of your tables to exactly the same values that are configured
globally, you'd vacuum 3 times faster than if you relied on the
identical global defaults (or N times faster, where N is the value
you've picked for autovacuum_max_workers). If you have one really big
table that requires continuous vacuuming, you could slow down
vacuuming on that table through manual configuration settings and
still end up speeding up vacuuming overall, because the remaining
workers would be dividing the budget implied by the default settings
among N-1 workers instead of N workers. As far as I can see, none of
this is documented, which is perhaps for the best, because IMV it
makes no sense.
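
For concreteness, the per-table override described above is set with storage
parameters. Under the behavior Robert describes, a sketch like this (table
name and values are hypothetical) would give one table its own budget while
the remaining workers divide the globally configured budget among themselves:

```sql
-- Hypothetical: give one large table dedicated cost settings, taking it
-- out of the balancing calculation applied to the other workers.
ALTER TABLE big_table SET (
    autovacuum_vacuum_cost_delay = 2,
    autovacuum_vacuum_cost_limit = 200
);
```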

I think we need to move more toward a model where VACUUM just keeps
up. Emergency mode is a step in that direction, because the definition
of an emergency is that we're definitely not keeping up, but I think
we need something less Boolean. If the database gets bigger or smaller
or more or less active, autovacuum should somehow just adjust to that,
without so much manual fiddling. I think it's good to have the
possibility of some manual fiddling to handle problematic situations,
but you shouldn't have to do it just because you made a table bigger.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: monitoring usage count distribution

2023-04-06 Thread Tom Lane
Robert Haas  writes:
> On Wed, Apr 5, 2023 at 4:16 PM Tom Lane  wrote:
>> Having two functions doesn't seem unreasonable to me either.
>> Robert spoke against it to start with, does he still want to
>> advocate for that?

> My position is that if we replace the average usage count with
> something that gives a count for each usage count, that's a win. I
> don't have a strong opinion on an array vs. a result set vs. some
> other way of doing that. If we leave the average usage count in there
> and add yet another function to give the detail, I tend to think
> that's not a great plan, but I'll desist if everyone else thinks
> otherwise.

There seems to be enough support for the existing summary function
definition to leave it as-is; Andres likes it for one, and I'm not
excited about trying to persuade him he's wrong.  But a second
slightly-less-aggregated summary function is clearly useful as well.
So I'm now thinking that we do want the patch as-submitted.
(Caveat: I've not read the patch, just the description.)

regards, tom lane




Re: Schema variables - new implementation for Postgres 15

2023-04-06 Thread Pavel Stehule
>
>
>> example
>>
>> create variable a as int;
>> create table foo(a int);
>>
>> select a from foo; -- the "a" is ambiguous, variable "a" is shadowed
>>
>> This is a basic case, and unique names don't help. Variables are more
>> aggressive about the namespace than tables, because they don't need to
>> appear in a FROM clause. This is the reason we specify that variables are
>> always shadowed. Only this behaviour is safe and robust: I cannot break
>> any query (that doesn't use variables) by creating a variable. On the
>> other hand, experience from Oracle's PL/SQL and from old PL/pgSQL is that
>> unwanted shadowing can be hard to investigate (without some tools).
>>
>> PL/pgSQL doesn't allow conflicts between PL/pgSQL variables and SQL
>> (now), and I think that is best. But the scope of PL/pgSQL variables is
>> relatively small, so very strict behaviour is acceptable.
>>
>> Session variables sit somewhere between tables and attributes. The catalog
>> pg_class could be extended with columns for variables, but it already does
>> a lot now, so I think that is not practical.
>>
>>>
>>> I agree about shadowing schema variables.  But is there no way to fix
> that so that you can dereference the variable?
> [Does an Alias work inside a procedure against a schema var?]
> Does adding a schema prefix resolve it  properly, so your example, I could
> do:
> SELECT schema_var.a AS var_a, a as COL_A from t;
>

Yes, using a schema prefix fixes collisions in almost all cases. There are
still corner cases, when the schema name is the same as some variable name,
where collisions remain possible (and even then catalog.schema.object can
resolve the collision). A qualified identifier fixes collisions in most
cases. These cases are tested in the regression tests.
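
A minimal sketch of the qualification Kirk asked about, using the syntax
from the example upthread (the schema and object names are hypothetical):

```sql
CREATE SCHEMA s;
CREATE VARIABLE s.a AS int;
CREATE TABLE foo(a int);

-- The unqualified "a" resolves to the column (the variable is shadowed);
-- qualifying with the schema name reaches the variable explicitly.
SELECT s.a AS var_a, a AS col_a FROM foo;
```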

Regards

Pavel


> Again, I like the default that it is hidden, but I can envision needing
> both?
>
> Regards, Kirk
>


Re: psql \watch 2nd argument: iteration count

2023-04-06 Thread Tom Lane
Kirk Wolak  writes:
> Marked as Ready for Committer.

Pushed with a pretty fair number of cosmetic changes.

One non-cosmetic change I made is that I didn't agree with your
interpretation of the execution count.  IMO this ought to produce
three executions:

regression=# select 1 \watch c=3
Thu Apr  6 13:17:50 2023 (every 2s)

 ?column? 
--
1
(1 row)

Thu Apr  6 13:17:52 2023 (every 2s)

 ?column? 
--
1
(1 row)

Thu Apr  6 13:17:54 2023 (every 2s)

 ?column? 
--
1
(1 row)

regression=# 

If you write a semicolon first, you get four, but it's the semicolon
producing the first result, not \watch.

regards, tom lane




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-06 Thread Melanie Plageman
On Thu, Apr 06, 2023 at 11:34:44PM +1200, David Rowley wrote:
> On Thu, 6 Apr 2023 at 14:14, Melanie Plageman
>  wrote:
> >
> > Attached is v14 which adds back in tests for the BUFFER_USAGE_LIMIT
> > option.
> 
> I've spent quite a bit of time looking at this since you sent it. I've
> also made quite a few changes, mostly cosmetic ones, but there are a
> few things below which are more fundamental.
> 
> 1. I don't really think we need to support VACUUM (BUFFER_USAGE_LIMIT
> -1);  It's just the same as VACUUM;  Removing that makes the
> documentation more simple.

Agreed.
 
> 2. I don't think we really need to allow vacuum_buffer_usage_limit =
> -1.  I think we can just set this to 256 and leave it.  If we allow -1
> then we need to document what -1 means. The more I think about it, the
> more strange it seems to allow -1. I can't quite imagine work_mem = -1
> means 4MB. Why 4MB?

Agreed.
 
> 3. There was a bug in GetAccessStrategyBufferCount() (I renamed it to
> that from StrategyGetBufferCount()) that didn't handle NULL input. The
> problem was that if you set vacuum_buffer_usage_limit = 0 then did a
> parallel vacuum that GetAccessStrategyWithSize() would return NULL due
> to the 0 buffer input, but GetAccessStrategyBufferCount() couldn't
> handle NULL.  I've adjusted GetAccessStrategyBufferCount() just to
> return 0 on NULL input.

Good catch.
 
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index bcc49aec45..c421da348d 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2001,6 +2001,35 @@ include_dir 'conf.d'
>
>   
>  
> +  xreflabel="vacuum_buffer_usage_limit">
> +  
> +   vacuum_buffer_usage_limit (integer)
> +   
> +vacuum_buffer_usage_limit configuration 
> parameter
> +   
> +  
> +  
> +   
> +Specifies the size of shared_buffers to be reused
> +for each backend participating in a given invocation of
> +VACUUM or ANALYZE or in
> +autovacuum.  This size is converted to the number of shared buffers
> +which will be reused as part of a
> +Buffer Access 
> Strategy.
> +A setting of 0 will allow the operation to use any
> +number of shared_buffers.  The maximum size is
> +16 GB and the minimum size is
> +128 KB.  If the specified size would exceed 1/8 
> the
> +size of shared_buffers, it is silently capped to
> +that value.  The default value is -1.  If this

This still says that the default value is -1.

> diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
> index b6d30b5764..0b02d9faef 100644
> --- a/doc/src/sgml/ref/vacuum.sgml
> +++ b/doc/src/sgml/ref/vacuum.sgml
> @@ -345,6 +346,24 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ 
> Buffer Access 
> Strategy
> +  ring buffer size for VACUUM.  This size is used to
> +  calculate the number of shared buffers which will be reused as part of
> +  this strategy.  0 disables use of a
> +  Buffer Access Strategy.  If ANALYZE
> +  is also specified, the BUFFER_USAGE_LIMIT value is 
> used
> +  for both the vacuum and analyze stages.  This option can't be used with
> +  the FULL option except if ANALYZE is
> +  also specified.
> + 
> +
> +   

I noticed you seemed to have removed the reference to the GUC
vacuum_buffer_usage_limit here. Was that intentional?
We may not need to mention "falling back" as I did before, however, the
GUC docs mention max/min values and such, which might be useful.
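
For reference, hedged examples of the option as documented in the quoted
hunk (the sizes and table name are arbitrary; the syntax is per the patch
under review, not a released feature):

```sql
-- Cap the Buffer Access Strategy ring at 512 kB for this vacuum.
VACUUM (BUFFER_USAGE_LIMIT '512 kB') big_table;

-- 0 disables the strategy; with ANALYZE, the limit applies to both
-- the vacuum and analyze stages.
VACUUM (ANALYZE, BUFFER_USAGE_LIMIT 0) big_table;
```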

> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index ea1d8960f4..e92738c7f0 100644
> --- a/src/backend/commands/vacuum.c
> +++ b/src/backend/commands/vacuum.c
> @@ -56,6 +56,7 @@
>  #include "utils/acl.h"
>  #include "utils/fmgroids.h"
>  #include "utils/guc.h"
> +#include "utils/guc_hooks.h"
>  #include "utils/memutils.h"
>  #include "utils/pg_rusage.h"
>  #include "utils/snapmgr.h"
> @@ -95,6 +96,30 @@ static VacOptValue get_vacoptval_from_boolean(DefElem 
> *def);
>  static bool vac_tid_reaped(ItemPointer itemptr, void *state);
>  static int   vac_cmp_itemptr(const void *left, const void *right);
>  
> +/*
> + * GUC check function to ensure GUC value specified is within the allowable
> + * range.
> + */
> +bool
> +check_vacuum_buffer_usage_limit(int *newval, void **extra,
> + GucSource 
> source)
> +{
> + /* Allow specifying the default or disabling Buffer Access Strategy */
> + if (*newval == -1 || *newval == 0)
> + return true;

This should not check for -1 since that isn't the default anymore.
It should only need to check for 0, I think?

> + /* Value upper and lower hard limits are inclusive */
> + if (*newval >= MIN_BAS_VAC_RING_SIZE_KB &&
> + *newval <= MAX_BAS_VAC_RING_SIZE_KB)
> + return true;
> +
> + /* Value does not fall within any allowab

Re: monitoring usage count distribution

2023-04-06 Thread Robert Haas
On Wed, Apr 5, 2023 at 4:16 PM Tom Lane  wrote:
> Nathan Bossart  writes:
> > On Wed, Apr 05, 2023 at 12:09:21PM -0700, Andres Freund wrote:
> >> I would not mind having a separate function returning 6 rows, if we really
> >> want that, but making pg_buffercache_summary() return 6 rows would imo 
> >> make it
> >> less useful for getting a quick overview. At least I am not that quick 
> >> summing
> >> up multple rows, just to get a quick overview over the number of dirty 
> >> rows.
>
> > This is what v1-0001 does.  We could probably make pg_buffercache_summary a
> > view on pg_buffercache_usage_counts, too, but that doesn't strike me as
> > tremendously important.
>
> Having two functions doesn't seem unreasonable to me either.
> Robert spoke against it to start with, does he still want to
> advocate for that?

My position is that if we replace the average usage count with
something that gives a count for each usage count, that's a win. I
don't have a strong opinion on an array vs. a result set vs. some
other way of doing that. If we leave the average usage count in there
and add yet another function to give the detail, I tend to think
that's not a great plan, but I'll desist if everyone else thinks
otherwise.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Should vacuum process config file reload more often

2023-04-06 Thread Robert Haas
On Thu, Apr 6, 2023 at 11:52 AM Melanie Plageman
 wrote:
> > Gah, I think I misunderstood you. You are saying that only calling
> > AutoVacuumUpdateCostLimit() after napping while vacuuming a table may
> > not be enough. The frequency at which the number of workers changes will
> > likely be different. This is a good point.
> > It's kind of weird to call AutoVacuumUpdateCostLimit() only after napping...
>
> A not fully baked idea for a solution:
>
> Why not keep the balanced limit in the atomic instead of the number of
> workers for balance. If we expect all of the workers to have the same
> value for cost limit, then why would we just count the workers and not
> also do the division and store that in the atomic variable. We are
> worried about the division not being done often enough, not the number
> of workers being out of date. This solves that, right?

A bird in the hand is worth two in the bush, though. We don't really
have time to redesign the patch before feature freeze, and I can't
convince myself that there's a big enough problem with what you
already did that it would be worth putting off fixing this for another
year. Reading your newer emails, I think that the answer to my
original question is "we don't want to do it at every
vacuum_delay_point because it might be too costly," which is
reasonable.

I don't particularly like this new idea, either, I think. While it may
be true that we expect all the workers to come up with the same
answer, they need not, because rereading the configuration file isn't
synchronized. It would be pretty lame if a worker that had reread an
updated value from the configuration file recomputed the value, and
then another worker that still had an older value recalculated it
again just afterward. Keeping only the number of workers in memory
avoids the possibility of thrashing around in situations like that.

I do kind of wonder if it would be possible to rejigger things so that
we didn't have to keep recalculating av_nworkersForBalance, though.
Perhaps now is not the time due to the impending freeze, but maybe we
should explore maintaining that value in such a way that it is correct
at every instant, instead of recalculating it at intervals.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Schema variables - new implementation for Postgres 15

2023-04-06 Thread Kirk Wolak
On Wed, Apr 5, 2023 at 1:58 PM Pavel Stehule 
wrote:

>
>
> st 5. 4. 2023 v 19:20 odesílatel Greg Stark  napsal:
>
>> On Sun, 26 Mar 2023 at 07:34, Julien Rouhaud  wrote:
>> >
>> > This feature can significantly increase log size, so it's disabled by
>> default.
>> > For testing or development environments it's recommended to enable it
>> if you
>> > use session variables.
>>
>> I think it's generally not practical to have warnings for valid DML.
>> Effectively warnings in DML are errors since they make the syntax just
>> unusable. I suppose it's feasible to have it as a debugging option
>> that defaults to off but I'm not sure it's really useful.
>>
>
> It is a tool that should help with collision detection.  Without it, collisions
> can be pretty hard to detect. It is similar to plpgsql's extra warnings.
>
>
>> I suppose it raises the question of whether session variables should
>> be in pg_class and be in the same namespace as tables so that
>> collisions are impossible. I haven't looked at the code to see if
>> that's feasible or reasonable. But this feels a bit like what happened
>> with sequences where they used to be a wholly special thing and later
>> we realized everything was simpler if they were just a kind of
>> relation.
>>
>
> The first patch did it. But in the end, it doesn't reduce conflicts,
> because usually the conflicts are between variables and tables' attributes
> (columns).
>
> example
>
> create variable a as int;
> create table foo(a int);
>
> select a from foo; -- the "a" is ambiguous, variable "a" is shadowed
>
> This is a basic case, and unique names don't help. The variables are
> more aggressive in the namespace than tables, because they don't need to be
> in a FROM clause. This is the reason why we specify that variables are always
> shadowed. Only this behaviour is safe and robust. I cannot break any query
> (that doesn't use variables) by creating a variable. On the other hand,
> experience from Oracle's PL/SQL or from old PLpgSQL is that unwanted
> shadowing can be hard to investigate (without some tools).
>
> PL/pgSQL doesn't allow conflicts between PL/pgSQL variables and SQL (now),
> and I think that is best. But the scope of PLpgSQL variables is relatively
> small, so very strict behaviour is acceptable.
>
> Session variables sit somewhere between tables and attributes. The catalog
> pg_class could be extended with columns for variables, but it already does a
> lot, so I think that is not practical.
>
I agree about shadowing schema variables.  But is there no way to fix
that so that you can dereference the variable?
[Does an Alias work inside a procedure against a schema var?]
Does adding a schema prefix resolve it properly, so in your example, I could
do:
SELECT schema_var.a AS var_a, a AS col_a FROM foo;

Again, I like the default that it is hidden, but I can envision needing
both?

Regards, Kirk


Re: Rethinking the implementation of ts_headline()

2023-04-06 Thread Alexander Lakhin

Hi,

19.01.2023 19:13, Tom Lane wrote:

Alvaro Herrera  writes:


Anyway, I don't think this needs to stop your current patch.

Many thanks for looking at it!


I've found that starting from commit 5a617d75 this query:
SELECT ts_headline('english', 'To be, or not to be', to_tsquery('english', 
'or'));

triggers a valgrind-detected error:
==00:00:00:03.950 3241424== Invalid read of size 1
==00:00:00:03.950 3241424==    at 0x8EE2A9: TS_execute_locations_recurse 
(tsvector_op.c:2046)
==00:00:00:03.950 3241424==    by 0x8EE220: TS_execute_locations 
(tsvector_op.c:2017)
==00:00:00:03.950 3241424==    by 0x77EAC4: prsd_headline (wparser_def.c:2677)
==00:00:00:03.950 3241424==    by 0x94536C: FunctionCall3Coll (fmgr.c:1173)
==00:00:00:03.950 3241424==    by 0x778648: ts_headline_byid_opt (wparser.c:322)
==00:00:00:03.950 3241424==    by 0x9440F5: DirectFunctionCall3Coll (fmgr.c:836)
==00:00:00:03.950 3241424==    by 0x778763: ts_headline_byid (wparser.c:343)
==00:00:00:03.950 3241424==    by 0x4AC9F2: ExecInterpExpr 
(execExprInterp.c:752)
==00:00:00:03.950 3241424==    by 0x4AEDFE: ExecInterpExprStillValid 
(execExprInterp.c:1838)
==00:00:00:03.950 3241424==    by 0x636A7E: ExecEvalExprSwitchContext 
(executor.h:344)
==00:00:00:03.950 3241424==    by 0x63E92D: evaluate_expr (clauses.c:4843)
==00:00:00:03.950 3241424==    by 0x63DA53: evaluate_function (clauses.c:4345)
...
(Initially I had encountered an asan-detected heap-buffer-overflow with a
more informative document.)

But the less-verbose call:
SELECT ts_headline('', '');

reveals a different error even on 5a617d75~1:
==00:00:00:04.113 3139158== Conditional jump or move depends on uninitialised 
value(s)
==00:00:00:04.113 3139158==    at 0x77B44F: mark_fragment (wparser_def.c:2100)
==00:00:00:04.113 3139158==    by 0x77E2F2: mark_hl_words (wparser_def.c:2519)
==00:00:00:04.113 3139158==    by 0x77E891: prsd_headline (wparser_def.c:2610)
==00:00:00:04.113 3139158==    by 0x944B68: FunctionCall3Coll (fmgr.c:1173)
==00:00:00:04.113 3139158==    by 0x778648: ts_headline_byid_opt (wparser.c:322)
==00:00:00:04.113 3139158==    by 0x9438F1: DirectFunctionCall3Coll (fmgr.c:836)
==00:00:00:04.113 3139158==    by 0x7787B6: ts_headline (wparser.c:352)
==00:00:00:04.113 3139158==    by 0x4AC9F2: ExecInterpExpr 
(execExprInterp.c:752)
==00:00:00:04.113 3139158==    by 0x4AEDFE: ExecInterpExprStillValid 
(execExprInterp.c:1838)
==00:00:00:04.113 3139158==    by 0x50BD0C: ExecEvalExprSwitchContext 
(executor.h:344)
==00:00:00:04.113 3139158==    by 0x50BD84: ExecProject (executor.h:378)
==00:00:00:04.113 3139158==    by 0x50BFBB: ExecResult (nodeResult.c:136)
==00:00:00:04.113 3139158==

I've reproduced it on REL9_4_STABLE (REL9_4_15) and it looks like a defect
in mark_hl_words():
    int bestb = -1,
    beste = -1;
    int bestlen = -1;
    int pose = 0,
...
    if (highlight == 0)
    {
    while (hlCover(prs, query, &p, &q))
    {
...
    if (bestlen < 0)
    {
    curlen = 0;
    for (i = 0; i < prs->curwords && curlen < min_words; 
i++)
    {
    if (!NONWORDTOKEN(prs->words[i].type))
    curlen++;
    pose = i;
    }
    bestb = 0;
    beste = pose;
    }
...
// here we have bestb == 0, beste == 0, but no prs->words in this case
    for (i = bestb; i <= beste; i++)
    {
    if (prs->words[i].item)
    prs->words[i].selected = 1;

exists since 2a0083ede.
(Sorry for the distraction.)

Best regards,
Alexander

Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-04-06 Thread Justin Pryzby
On Tue, Mar 14, 2023 at 12:16:16AM +0100, Tomas Vondra wrote:
> On 3/9/23 19:00, Tomas Vondra wrote:
> > On 3/9/23 01:30, Michael Paquier wrote:
> >> On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote:
> >>> IMO we should fix that. We have a bunch of buildfarm members running on
> >>> Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
> >>> tests. But considering how trivial the fix is ...
> >>>
> >>> Barring objections, I'll push a fix early next week.
> >>
> >> +1, better to change that, thanks.  Actually, would --rm be OK even on
> >> Windows?  As far as I can see, the CI detects a LZ4 command for the
> >> VS2019 environment but not the liblz4 libraries that would be needed
> >> to trigger the set of tests.
> > 
> > Thanks for noticing that. I'll investigate next week.
> 
> So, here's a fix that should (I think) replace the 'lz4 --rm' with a
> simple 'rm'. I have two doubts about this, though:
> 
> 1) I haven't found a simple way to inject additional command into the
> test. The pg_dump runs have a predefined list of "steps" to run:
> 
>   -- compress_cmd
>   -- glob_patterns
>   -- command_like
>   -- restore_cmd
> 
> and I don't think there's a good place to inject the 'rm' so I ended up
> adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit
> strange / hacky. Maybe there's a better way?

I don't know if there's a better way, and I don't think it's worth
complicating the tests by more than about 2 lines to handle this.

> 2) I wonder if Windows will know what 'rm' means. I haven't found any
> TAP test doing 'rm' and don't see 'rm' in any $ENV either.

CI probably will, since it's Andres' image built with git-tools and
other helpful stuff installed.  But in general I think it won't; perl is
being used for all the portable stuff.

*If* you wanted to do something to fix this, you could create a key
called files_to_remove_after_loading, and run unlink on those files
rather than running a shell command.  Or maybe just remove the file
unconditionally at the start of the script ?

> That being said, I have no idea how to make this work on our Windows CI.
> As mentioned, the environment is missing the lz4 library - there's a
> 
>   setup_additional_packages_script: |
> REM choco install -y --no-progress ...
> 
> in the .yml file, but AFAICS the chocolatey does not have lz4 :-/

I updated what I'd done in the zstd patch to also run with LZ4.
This won't apply directly due to other patches, but you get the idea...

Maybe it'd be good to have a commented-out "wraps" hint like there is
for choco.  The downloaded files could be cached, too.

diff --git a/.cirrus.yml b/.cirrus.yml
index a3977a4036e..b4387a739f3 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -644,9 +644,11 @@ task:
 vcvarsall x64
 mkdir subprojects
 meson wrap install zstd
-meson configure -D zstd:multithread=enabled --force-fallback-for=zstd
+meson wrap install lz4
+meson subprojects download
+meson configure -D zstd:multithread=enabled --force-fallback-for=zstd 
--force-fallback-for=lz4
 set 
CC=c:\ProgramData\chocolatey\lib\ccache\tools\ccache-4.8-windows-x86_64\ccache.exe
 cl.exe
-meson setup --backend ninja --buildtype debug 
-Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Db_pch=false 
-Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include 
-DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" -D zstd=enabled -Dc_args="/Z7 
/MDd" build
+meson setup --backend ninja --buildtype debug 
-Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Db_pch=false 
-Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include 
-DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" -D zstd=enabled -D lz4=enabled 
-Dc_args="/Z7 /MDd" build
 
   build_script: |
 vcvarsall x64

-- 
Justin




Re: zstd compression for pg_dump

2023-04-06 Thread Justin Pryzby
On Thu, Apr 06, 2023 at 05:34:30PM +0200, Tomas Vondra wrote:
> I looked at the long mode patch again, updated the commit message and
> pushed it. I was wondering if long_mode should really be bool -
> logically it is, but ZSTD_CCtx_setParameter() expects int. But I think
> that's fine.

Thanks!

> I think that's all for PG16 in this patch series.  If there's more we want to
> do, it'll have to wait for PG17 - 

Yes

> Justin, can you update and submit the patches that you think are relevant for
> the next CF?

Yeah.

It sounds like a shiny new feature, but it's not totally clear if it's safe
here or even how useful it is.  (It might be like my patch for
wal_compression=zstd:level, and Michael's for toast_compression=zstd, neither
of which saw any support).

Last year's basebackup thread had some interesting comments about safety of
threads, although pg_dump's considerations may be different.

The patch itself is trivial, so it'd be fine to wait until PG16 is released to
get some experience.  If someone else wanted to do that, it'd be fine with me.

-- 
Justin




Re: Should vacuum process config file reload more often

2023-04-06 Thread Melanie Plageman
On Wed, Apr 5, 2023 at 11:10 PM Melanie Plageman
 wrote:
> On Wed, Apr 5, 2023 at 3:43 PM Melanie Plageman  
> wrote:
> > On Wed, Apr 5, 2023 at 2:56 PM Robert Haas  wrote:
> > >
> > > + /*
> > > + * Balance and update limit values for autovacuum workers. We must
> > > + * always do this in case the autovacuum launcher or another
> > > + * autovacuum worker has recalculated the number of workers across
> > > + * which we must balance the limit. This is done by the launcher when
> > > + * launching a new worker and by workers before vacuuming each table.
> > > + */
> > >
> > > I don't quite understand what's going on here. A big reason that I'm
> > > worried about this whole issue in the first place is that sometimes
> > > there's a vacuum going on a giant table and you can't get it to go
> > > fast. You want it to absorb new settings, and to do so quickly. I
> > > realize that this is about the number of workers, not the actual cost
> > > limit, so that makes what I'm about to say less important. But ... is
> > > this often enough? Like, the time before we move onto the next table
> > > could be super long. The time before a new worker is launched should
> > > be ~autovacuum_naptime/autovacuum_max_workers or ~20s with default
> > > settings, so that's not horrible, but I'm kind of struggling to
> > > understand the rationale for this particular choice. Maybe it's fine.
> >
> > VacuumUpdateCosts() also calls AutoVacuumUpdateCostLimit(), so this will
> > happen if a config reload is pending the next time vacuum_delay_point()
> > is called (which is pretty often -- roughly once per block vacuumed but
> > definitely more than once per table).
> >
> > Relevant code is at the top of vacuum_delay_point():
> >
> > if (ConfigReloadPending && IsAutoVacuumWorkerProcess())
> > {
> > ConfigReloadPending = false;
> > ProcessConfigFile(PGC_SIGHUP);
> > VacuumUpdateCosts();
> > }
> >
>
> Gah, I think I misunderstood you. You are saying that only calling
> AutoVacuumUpdateCostLimit() after napping while vacuuming a table may
> not be enough. The frequency at which the number of workers changes will
> likely be different. This is a good point.
> It's kind of weird to call AutoVacuumUpdateCostLimit() only after napping...

A not fully baked idea for a solution:

Why not keep the balanced limit in the atomic instead of the number of
workers for balance? If we expect all of the workers to have the same
value for cost limit, then why would we just count the workers and not
also do the division and store that in the atomic variable? We are
worried about the division not being done often enough, not about the
number of workers being out of date. This solves that, right?

- Melanie




Re: Schema variables - new implementation for Postgres 15

2023-04-06 Thread Julien Rouhaud
On Thu, Apr 6, 2023 at 1:58 AM Pavel Stehule  wrote:
>
> st 5. 4. 2023 v 19:20 odesílatel Greg Stark  napsal:
>>
>> On Sun, 26 Mar 2023 at 07:34, Julien Rouhaud  wrote:
>> >
>> > This feature can significantly increase log size, so it's disabled by 
>> > default.
>> > For testing or development environments it's recommended to enable it if 
>> > you
>> > use session variables.
>>
>> I think it's generally not practical to have warnings for valid DML.
>> Effectively warnings in DML are errors since they make the syntax just
>> unusable. I suppose it's feasible to have it as a debugging option
>> that defaults to off but I'm not sure it's really useful.
>
>
> It is a tool that should help with collision detection.  Without it, it can 
> be pretty hard to detect it. It is similar to plpgsql's extra warnings.

Another example is escape_string_warning, which can also emit warnings
for valid DML.  I once had to fix some random framework that a
previous employer was using, in order to move to a more recent pg
version and have standard_conforming_strings on, and having
escape_string_warning was quite helpful.




Re: zstd compression for pg_dump

2023-04-06 Thread Tomas Vondra



On 4/5/23 21:42, Tomas Vondra wrote:
> On 4/4/23 05:04, Justin Pryzby wrote:
>> On Mon, Apr 03, 2023 at 11:26:09PM +0200, Tomas Vondra wrote:
>>> On 4/3/23 21:17, Justin Pryzby wrote:
 On Sat, Apr 01, 2023 at 10:26:01PM +0200, Tomas Vondra wrote:
>> Feel free to mess around with threads (but I'd much rather see the patch
>> progress for zstd:long).
>
> OK, understood. The long mode patch is pretty simple. IIUC it does not
> change the format, i.e. in the worst case we could leave it for PG17
> too. Correct?

 Right, libzstd only has one "format", which is the same as what's used
 by the commandline tool.  zstd:long doesn't change the format of the
 output: the library just uses a larger memory buffer to allow better
 compression.  There's no format change for zstd:workers, either.
>>>
>>> OK. I plan to do a bit more review/testing on this, and get it committed
>>> over the next day or two, likely including the long mode. One thing I
>>> noticed today is that maybe long_distance should be a bool, not int.
>>> Yes, ZSTD_c_enableLongDistanceMatching() accepts int, but it'd be
>>> cleaner to cast the value during a call and keep it bool otherwise.
>>
>> Thanks for noticing.  Evidently I wrote it using "int" to get the
>> feature working, and then later wrote the bool parsing bits but never
>> changed the data structure.
>>
>> This also updates a few comments, indentation, removes a useless
>> assertion, and updates the warning about zstd:workers.
>>
> 
> Thanks. I've cleaned up the 0001 a little bit (a couple comment
> improvements), updated the commit message and pushed it. I plan to take
> care of the 0002 (long distance mode) tomorrow, and that'll be it for
> PG16 I think.

I looked at the long mode patch again, updated the commit message and
pushed it. I was wondering if long_mode should really be bool -
logically it is, but ZSTD_CCtx_setParameter() expects int. But I think
that's fine.

I think that's all for PG16 in this patch series. If there's more we
want to do, it'll have to wait for PG17 - Justin, can you update and
submit the patches that you think are relevant for the next CF?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-06 Thread Imseih (AWS), Sami
> As one thing,
> for example, it introduces a dependency to parallel.h to do progress
> reporting without touching at backend_progress.h.  

Keeping the logic in backend_progress.h is reasonable
from a maintenance standpoint.

We can create a new function in backend_progress.h called
pgstat_progress_update_leader, which is called from
vacuumparallel.c.

pgstat_progress_update_leader can then call pq_putmessage('P', NULL, 0).

> Is a callback
> approach combined with a counter in shared memory the best thing there
> could be?  

It seems to be the best way.

The shared memory, ParallelVacuumState, is already tracking the
counters for the Parallel Vacuum.

Also, the callback in ParallelContext is the only way I can see
to let the 'P' message know what to do for updating progress
to the leader.


> Could it be worth thinking about a different design where
> the value incremented and the parameters of
> pgstat_progress_update_param() are passed through the 'P' message
> instead?

I am not sure how this is different from the approach suggested.
In the current design, the 'P' message is used to pass the
ParallelVacuumState to parallel_vacuum_update_progress, which then
calls pgstat_progress_update_param.


> It strikes me that gathering data in the leader from a poke
> of the workers is something that could be useful in so much more areas
> than just the parallel index operations done in a vacuum because we do
> more and more things in parallel these days, so the API interface
> ought to have some attention.

We may need an interface that does more than progress
reporting, but I am not sure what those use cases would be at
this point.


> There is also an argument where we could
> have each worker report their progress independently of the leader?

In this case, we wouldn't need ParallelContext at all, nor would we
need to go through the 'P' message.


--
Regards,

Sami Imseih
Amazon Web Services (AWS)



Re: doc: add missing "id" attributes to extension packaging page

2023-04-06 Thread Karl O. Pinc
On Thu, 6 Apr 2023 16:19:30 +0200
Brar Piening  wrote:

> Reviewer is Karl O. Pink

"Karl O. Pinc" actually, with a "c".

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 3:39 PM, Amit Kapila wrote:
> On Thu, Apr 6, 2023 at 6:32 PM Drouvot, Bertrand
>  wrote:
>>
>> I don't think it could be possible to create logical walsenders on a standby if
>> AllowCascadeReplication() is not true, or am I missing something?
>
> Right, so why even traverse walsenders in that case? The code I was
> imagining is like:
> if (AllowCascadeReplication())
>     WalSndWakeup(switchedTLI, true);
>
> Do you see any problem with this change?

Not at all, it looks good to me.

> Few more minor comments on 0005
> =
> 0005
> 1.
> +   
> +Take a snapshot of running transactions and write this to WAL without
> +having to wait bgwriter or checkpointer to log one.
>
> /wait bgwriter/wait for bgwriter
>
> 2.
> +use Test::More tests => 67;
>
> We no longer use the number of tests. Please refer to other similar tests.

Thanks! Will update 0005.

Regards,


--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: doc: add missing "id" attributes to extension packaging page

2023-04-06 Thread Brar Piening

On 06.04.2023 at 11:06, Peter Eisentraut wrote:
> On 04.04.23 21:52, Brar Piening wrote:
>>> The XSLT implementation looks sound to me.  It would be a touch better
>>> if it had some comments about which parts of the templates were copied
>>> from upstream stylesheets and which were changed.  There are examples
>>> of such commenting in the existing customization layer.  Also, avoid
>>> introducing whitespace differences during said copying.
>>
>> I will amend the patch if we agree that this is the way forward.
>
> Ok, it appears everyone agrees that this is the correct approach.
> Please post an updated patch.  There have been so many partial patches
> posted recently, I'm not sure which one is the most current one and
> who is really the author.


Attached are two patches:

001-make_html_ids_discoverable_v5.postgresql.patch which needs to be
applied to the postgresql repository. It adds the XSLT to generate the
id links and the CSS to hide/display them. I've added comments as
suggested above.

002-add-discoverable-id-style_v1.pgweb.patch which needs to be applied
to the pgweb repository. It adds the CSS to the official documentation site.

At the moment (commit 983ec23007) there are no missing ids, so the build
should just work after applying the patch but, as we already know, this
may change with every commit that gets added.

Reviewer is Karl O. Pink

Author is Brar Piening (with some additions from Karl O. Pink)

Best regards,

Brar
diff --git a/media/css/main.css b/media/css/main.css
index 5f8bfdd5..c3464cd0 100644
--- a/media/css/main.css
+++ b/media/css/main.css
@@ -1175,6 +1175,16 @@ code,
   padding-right: 2em;
 }
 
+/** Links to ids of headers and definition terms */
+#docContent a.id_link {
+  color: inherit;
+  visibility: hidden;
+}
+
+#docContent *:hover > a.id_link {
+  visibility: visible;
+}
+
 /**
   * Various callout boxes for docs, including warning, caution, note, tip
   */
diff --git a/doc/src/sgml/stylesheet-html-common.xsl 
b/doc/src/sgml/stylesheet-html-common.xsl
index bb6429ef7c..8e5d9912d5 100644
--- a/doc/src/sgml/stylesheet-html-common.xsl
+++ b/doc/src/sgml/stylesheet-html-common.xsl
@@ -309,4 +309,137 @@ set   toc,title
   
 
 
+
+
+
+  
+  
+  
+  
+  
+
+  
+
+  
+  
+
+  
+
+  
+  
+  
+
+  
+
+  
+  
+
+  
+
+  
+
+  
+
+  
+  
+
+  
+  6
+  
+
+  
+
+  
+  http://www.w3.org/1999/xhtml";>
+
+
+  
+clear: both
+  
+
+
+  
+
+
+  
+
+
+
+
+
+  
+
+
+  
+
+
+
+
+
+  
+  
+
+  
+  
+
+  
+
+  
+
+
+
+
+
+  
+  
+
+   
+  
+
+  #
+  
+
+  
+
+
+  id_link
+
+#
+  
+
+
+  
+  
+
+  
  
+  Ids are required in order to provide the public
+   HTML documentation with stable URLs for <
+  
+  > element content; id missing at: 
+  
+/
+
+
+  [@
+  
+   = '
+  
+  ']
+
+  
+  
   
+
+  
+
+  
+
+
 
diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css
index cc14efa1ca..15bcc95d41 100644
--- a/doc/src/sgml/stylesheet.css
+++ b/doc/src/sgml/stylesheet.css
@@ -169,3 +169,13 @@ acronym{ font-style: inherit; }
 width: 75%;
   }
 }
+
+/* Links to ids of headers and definition terms */
+a.id_link {
+color: inherit;
+visibility: hidden;
+}
+
+*:hover > a.id_link {
+visibility: visible;
+}


Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Thu, Apr 6, 2023 at 6:32 PM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 4/6/23 11:55 AM, Amit Kapila wrote:
> > On Thu, Apr 6, 2023 at 12:10 PM Amit Kapila  wrote:
> >>
> >> On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand
> >>  wrote:
> >>>
> >>
> >> Another comment on 0001.
> >>   extern void CheckSlotRequirements(void);
> >>   extern void CheckSlotPermissions(void);
> >> +extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid,
> >> TransactionId xid, char *reason);
> >>
> >> This doesn't seem to be called from anywhere.
> >>
> >
> > Few other comments:
> > ==
> > 0004
> > 1.
> > + *  - physical walsenders in case of new time line and cascade
> > + *  replication is allowed.
> > + *  - logical walsenders in case of new time line or recovery is in 
> > progress
> > + *  (logical decoding on standby).
> > + */
> > + WalSndWakeup(switchedTLI && AllowCascadeReplication(),
> > + switchedTLI || RecoveryInProgress());
> >
> > Do we need AllowCascadeReplication() check specifically for physical
> > walsenders? I think this should be true for both physical and logical
> > walsenders.
> >
>
> I don't think it could be possible to create logical walsenders on a standby 
> if
> AllowCascadeReplication() is not true, or am I missing something?
>

Right, so why even traverse walsenders in that case? The code I was
imagining is like:
if (AllowCascadeReplication())
WalSndWakeup(switchedTLI, true);

Do you see any problem with this change?

Few more minor comments on 0005
=
0005
1.
+   
+Take a snapshot of running transactions and write this to WAL without
+having to wait bgwriter or checkpointer to log one.

/wait bgwriter/wait for bgwriter

2.
+use Test::More tests => 67;

We no longer use the number of tests. Please refer to other similar tests.

-- 
With Regards,
Amit Kapila.




Re: Initial Schema Sync for Logical Replication

2023-04-06 Thread Masahiko Sawada
On Thu, Mar 30, 2023 at 10:11 PM Masahiko Sawada  wrote:
>
> On Thu, Mar 30, 2023 at 12:18 AM Masahiko Sawada  
> wrote:
> >
> > On Wed, Mar 29, 2023 at 7:57 PM Kumar, Sachin  wrote:
> > >
> > > > > > > From: Amit Kapila 
> > > > > > > > I think we won't be able to use same snapshot because the
> > > > > > > > transaction will be committed.
> > > > > > > > In CreateSubscription() we can use the transaction snapshot from
> > > > > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am
> > > > > > > > not sure about this part maybe walrcv_disconnect() calls the 
> > > > > > > > commits
> > > > internally ?).
> > > > > > > > So somehow we need to keep this snapshot alive, even after
> > > > > > > > transaction is committed(or delay committing the transaction ,
> > > > > > > > but we can have CREATE SUBSCRIPTION with ENABLED=FALSE, so we
> > > > > > > > can have a restart before tableSync is able to use the same
> > > > > > > > snapshot.)
> > > > > > > >
> > > > > > >
> > > > > > > Can we think of getting the table data as well along with schema
> > > > > > > via pg_dump? Won't then both schema and initial data will
> > > > > > > correspond to the same snapshot?
> > > > > >
> > > > > > Right , that will work, Thanks!
> > > > >
> > > > > While it works, we cannot get the initial data in parallel, no?
> > > > >
> > >
> > > I was thinking each TableSync process will call pg_dump --table, This way 
> > > if we have N
> > > tableSync process, we can have N pg_dump --table=table_name called in 
> > > parallel.
> > > In fact we can use --schema-only to get schema and then let COPY take 
> > > care of data
> > > syncing . We will use same snapshot for pg_dump as well as COPY table.
> >
> > How can we postpone creating the pg_subscription_rel entries until the
> > tablesync worker starts and does the schema sync? I think that since
> > pg_subscription_rel entry needs the table OID, we need either to do
> > the schema sync before creating the entry (i.e, during CREATE
> > SUBSCRIPTION) or to postpone creating entries as Amit proposed[1]. The
> > apply worker needs the information of tables to sync in order to
> > launch the tablesync workers, but it needs to create the table schema
> > to get that information.
>
> For the above reason, I think that step 6 of the initial proposal won't work.
>
> If we can have the tablesync worker create an entry of
> pg_subscription_rel after creating the table, it may give us the
> flexibility to perform the initial sync. One idea is that we add a
> relname field to pg_subscription_rel so that we can create entries
> with relname instead of OID if the table is not created yet. Once the
> table is created, we clear the relname field and set the OID of the
> table instead. It's not an ideal solution but we might make it simpler
> later.

While writing a PoC patch, I found some difficulties with this idea.
First, I tried to add schemaname+relname to pg_subscription_rel but I
could not define the primary key of pg_subscription_rel. The primary
key on (srsubid, srrelid) doesn't work since srrelid could be NULL.
Similarly, the primary key on (srsubid, srrelid, schemaname, relname)
also doesn't work. So I tried another idea: that we generate a new OID
for srrelid and the tablesync worker will replace it with the new
table's OID once it creates the table. However, since we use srrelid
in replication slot names, changing srrelid during the initial
schema+data sync is not straightforward (please note that the slot is
created by the tablesync worker but is removed by the apply worker).
Using relname in the slot name instead of srrelid is not a good idea since
it would require all pg_subscription_rel entries to have a relname, and slot
names could be duplicated, for example, when the relname is very long
and gets truncated.

I'm trying to consider the idea from another angle: the apply worker
fetches the table list and passes the relname to the tablesync worker.
But a problem of this approach is that the table list is not
persisted. If the apply worker restarts during the initial table sync,
it could not get the same list as before.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 2:23 PM, Amit Kapila wrote:
> On Thu, Apr 6, 2023 at 11:29 AM Amit Kapila  wrote:
>
> Thinking some more on this, I think such a slot won't decode any other
> records. During CreateInitDecodingContext->ReplicationSlotReserveWal,
> for standbys, we use lastReplayedEndRecPtr as restart_lsn. This
> should be a record before the parameter_change record in the above
> scenario. So, ideally, the first record to decode by such a walsender
> should be parameter_change, which will anyway error out. So, this
> shouldn't be a problem.

Agree.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 7:59 AM, Amit Kapila wrote:

On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand
 wrote:


On 4/5/23 12:28 PM, Amit Kapila wrote:

On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
 wrote:

Maybe we could change the doc with something among those lines instead?

"
Existing logical slots on standby also get invalidated if wal_level on primary 
is reduced to
less than 'logical'. This is done as soon as the standby detects such a change 
in the WAL stream.

It means that, for walsenders that are lagging (if any), some WAL records up to 
the parameter change on the
primary won't be decoded".

I don't know whether this is what one would expect but that should be less of a 
surprise if documented.

What do you think?



Yeah, I think it is better to document to avoid any surprises if
nobody else sees any problem with it.


Ack.



This doesn't seem to be addressed in the latest version.


Right, I was waiting to see if "nobody else sees any problem with it".

Added it now in V62 posted up-thread.


And today, I
think I see one more point about this doc change:
+
+ A logical replication slot can also be created on a hot standby.
To prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot
between the primary
+ and the standby. Otherwise, hot_standby_feedback will work, but
only while the
+ connection is alive (for example a node restart would break it). Existing
+ logical slots on standby also get invalidated if wal_level on
primary is reduced to
+ less than 'logical'.

If hot_standby_feedback is not set then can logical decoding on
standby misbehave? If so, that is not very clear from this doc change
if that is acceptable.


I don't think it would misbehave, but the primary may delete system catalog rows
that are still needed by logical decoding on the standby (as it does not know
about the catalog_xmin on the standby).

Added this remark in V62.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 11:55 AM, Amit Kapila wrote:

On Thu, Apr 6, 2023 at 12:10 PM Amit Kapila  wrote:


On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand
 wrote:




Another comment on 0001.
  extern void CheckSlotRequirements(void);
  extern void CheckSlotPermissions(void);
+extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid,
TransactionId xid, char *reason);

This doesn't seem to be called from anywhere.



Few other comments:
==
0004
1.
+ *  - physical walsenders in case of new time line and cascade
+ *  replication is allowed.
+ *  - logical walsenders in case of new time line or recovery is in progress
+ *  (logical decoding on standby).
+ */
+ WalSndWakeup(switchedTLI && AllowCascadeReplication(),
+ switchedTLI || RecoveryInProgress());

Do we need AllowCascadeReplication() check specifically for physical
walsenders? I think this should be true for both physical and logical
walsenders.



I don't think it is possible to create logical walsenders on a standby if
AllowCascadeReplication() is not true, or am I missing something?

If so, I think it has to be set to true for the logical walsenders in all
cases (as done in V62 posted up-thread).

Andres made the point up-thread that RecoveryInProgress() is always true, and
as we don't want to be woken up only when there is a time line change, I think
it has to be always true for logical walsenders.


0005
2.
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -38,6 +38,7 @@
  #include "utils/pg_lsn.h"
  #include "utils/timestamp.h"
  #include "utils/tuplestore.h"
+#include "storage/standby.h"

The header includes should be in alphabetical order.



Good catch, thanks! Done in V62.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-06 Thread Drouvot, Bertrand

Hi,

On 4/6/23 8:40 AM, Amit Kapila wrote:

On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand

After this, I think for backends that have active slots, it would
simply cancel the current query. Will that be sufficient? Because we
want the backend process should exit and release the slot so that the
startup process can mark it invalid. 
For walsender, an ERROR will lead

to its exit, so that is fine. If this understanding is correct, then
if 'am_cascading_walsender' is false, we should set ProcDiePending
apart from other parameters. Sorry, I haven't tested this, so I could
be wrong here. 


Oops, my bad. You are fully right. Fixed in V62, posted up-thread.


Also, it seems you have removed the checks related to
slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only
used for logical slots? If so, do you think an Assert would make
sense?



Yes, indeed adding an Assert makes sense: done in V62 posted up-thread.



Another comment on 0001.
  extern void CheckSlotRequirements(void);
  extern void CheckSlotPermissions(void);
+extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid,
TransactionId xid, char *reason);

This doesn't seem to be called from anywhere.


Good catch, removed in V62 posted up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Should vacuum process config file reload more often

2023-04-06 Thread Daniel Gustafsson
> On 6 Apr 2023, at 08:39, Masahiko Sawada  wrote:

> Also I agree with
> where to put the log but I think the log message should start with
> lower cases:
> 
> +elog(DEBUG2,
> + "Autovacuum VacuumUpdateCosts(db=%u, rel=%u,

In principle I agree, but in this case Autovacuum is a name and IMO should
start with a capital A in user-facing messages.

> +/*
> + * autovac_recalculate_workers_for_balance
> + * Recalculate the number of workers to consider, given
> cost-related
> + * storage parameters and the current number of active workers.
> + *
> + * Caller must hold the AutovacuumLock in at least shared mode to access
> + * worker->wi_proc.
> + */
> 
> Does it make sense to add Assert(LWLockHeldByMe(AutovacuumLock)) at
> the beginning of this function?

That's probably not a bad idea.

> ---
> /* rebalance in case the default cost parameters changed */
> -LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
> -autovac_balance_cost();
> +LWLockAcquire(AutovacuumLock, LW_SHARED);
> +autovac_recalculate_workers_for_balance();
> LWLockRelease(AutovacuumLock);
> 
> Do we really need to have the autovacuum launcher recalculate
> av_nworkersForBalance after reloading the config file? Since the cost
> parameters are not used in autovac_recalculate_workers_for_balance(),
> the comment also needs to be updated.

If I understand this comment right: there was a discussion upthread that simply
doing it in both launcher and worker simplifies the code with little overhead.
A comment can reflect that choice though.

--
Daniel Gustafsson





Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Thu, Apr 6, 2023 at 11:29 AM Amit Kapila  wrote:
>
> >
>
> This doesn't seem to be addressed in the latest version. And today, I
> think I see one more point about this doc change:
> +
> + A logical replication slot can also be created on a hot standby.
> To prevent
> + VACUUM from removing required rows from the system
> + catalogs, hot_standby_feedback should be set on the
> + standby. In spite of that, if any required rows get removed, the slot 
> gets
> + invalidated. It's highly recommended to use a physical slot
> between the primary
> + and the standby. Otherwise, hot_standby_feedback will work, but
> only while the
> + connection is alive (for example a node restart would break it). 
> Existing
> + logical slots on standby also get invalidated if wal_level on
> primary is reduced to
> + less than 'logical'.
>
> If hot_standby_feedback is not set then can logical decoding on
> standby misbehave? If so, that is not very clear from this doc change
> if that is acceptable. One scenario where I think it can misbehave is
> if applying WAL records generated after changing wal_level from
> 'logical' to 'replica' physically removes catalog tuples that could be
> referenced by the logical decoding on the standby. Now, as mentioned
> in patch 0003's comment in decode.c that it is possible that some
> slots may creep even after we invalidate the slots on parameter
> change, so while decoding using that slot if some required catalog
> tuple has been removed by physical replication then the decoding can
> misbehave even before reaching XLOG_PARAMETER_CHANGE record.
>

Thinking some more on this, I think such a slot won't decode any other
records. During CreateInitDecodingContext->ReplicationSlotReserveWal,
for standby's, we use lastReplayedEndRecPtr as restart_lsn. This
should be a record before parameter_change record in the above
scenario. So, ideally, the first record to decode by such a walsender
should be parameter_change which will anyway error out. So, this
shouldn't be a problem.

-- 
With Regards,
Amit Kapila.




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-06 Thread David Rowley
On Thu, 6 Apr 2023 at 14:14, Melanie Plageman
 wrote:
>
> Attached is v14 which adds back in tests for the BUFFER_USAGE_LIMIT
> option.

I've spent quite a bit of time looking at this since you sent it. I've
also made quite a few changes, mostly cosmetic ones, but there are a
few things below which are more fundamental.

1. I don't really think we need to support VACUUM (BUFFER_USAGE_LIMIT
-1);  It's just the same as VACUUM;  Removing that makes the
documentation simpler.

2. I don't think we really need to allow vacuum_buffer_usage_limit =
-1.  I think we can just set this to 256 and leave it.  If we allow -1
then we need to document what -1 means. The more I think about it, the
more strange it seems to allow -1. I can't quite imagine work_mem = -1
means 4MB. Why 4MB?  Changing this means we don't really need to do
anything special in:

+ if (VacuumBufferUsageLimit > -1)
+ bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit);
+ else
+ bstrategy = GetAccessStrategy(BAS_VACUUM);

That simply becomes:

bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit);

The code inside GetAccessStrategyWithSize() handles returning NULL
when the GUC is zero.

The equivalent in ExecVacuum() becomes:

if (ring_size > -1)
bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, ring_size);
else
bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit);

instead of:

if (ring_size > -1)
bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, ring_size);
else if (VacuumBufferUsageLimit > -1)
bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit);
else
bstrategy = GetAccessStrategy(BAS_VACUUM);

3. There was a bug in GetAccessStrategyBufferCount() (I renamed it to
that from StrategyGetBufferCount()) that didn't handle NULL input. The
problem was that if you set vacuum_buffer_usage_limit = 0 then did a
parallel vacuum that GetAccessStrategyWithSize() would return NULL due
to the 0 buffer input, but GetAccessStrategyBufferCount() couldn't
handle NULL.  I've adjusted GetAccessStrategyBufferCount() just to
return 0 on NULL input.

Most of the rest is cosmetic. GetAccessStrategyWithSize() ended up
looking quite different.  I didn't see the sense in converting the
shared_buffer size into kilobytes to compare when we could just
convert ring_size_kb to buffers slightly sooner and then just do:

/* Cap to 1/8th of shared_buffers */
ring_buffers = Min(NBuffers / 8, ring_buffers);

I renamed nbuffers to ring_buffers as it was a little too confusing to
have nbuffers (for ring size) and NBuffers (for shared_buffers).

A few other changes like getting rid of the regression test and code
check for VACUUM (ONLY_DATABASE_STATS, BUFFER_USAGE_LIMIT 0); There is
already an if check and ERROR that looks for ONLY_DATABASE_STATS with
any other option slightly later in the function.  I also got rid of
the documentation that mentioned that wasn't supported as there's
already a mention in the ONLY_DATABASE_STATS which says it's not
supported with anything else. No other option seemed to care enough to
mention it, so I don't think BUFFER_USAGE_LIMIT is an exception.

I've attached v15.  I've only glanced at the vacuumdb patch so far.
I'm not expecting it to be too controversial.

I'm fairly happy with v15 now but would welcome anyone who wants to
have a look in the next 8 hours or so, else I plan to push it.

David


v15_buffer_usage_limit.patch
Description: Binary data


Re: [BUG] Logical replica crash if there was an error in a function.

2023-04-06 Thread Anton A. Melnikov

On 05.04.2023 17:35, Tom Lane wrote:

"Anton A. Melnikov"  writes:

On 03.04.2023 21:49, Tom Lane wrote:

I did not think this case was worth memorializing in a test before,
and I still do not.  I'm inclined to reject this patch.



Could you help me to figure out, please.


The problem was an Assert that was speculative when it went in,
and which we eventually found was wrong in the context of logical
replication.  We removed the Assert.  I don't think we need a test
case to keep us from putting back the Assert.  That line of thinking
leads to test suites that run for fourteen hours and are near useless
because developers can't run them easily.

regards, tom lane


Ok, I understand! Thanks a lot for the clarification. A rather important point;
I'll take it into account in the future.
Let's do that. I have withdrawn the patch.

With the best wishes!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Minimal logical decoding on standbys

2023-04-06 Thread Amit Kapila
On Thu, Apr 6, 2023 at 12:10 PM Amit Kapila  wrote:
>
> On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand
>  wrote:
> >
>
> Another comment on 0001.
>  extern void CheckSlotRequirements(void);
>  extern void CheckSlotPermissions(void);
> +extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid,
> TransactionId xid, char *reason);
>
> This doesn't seem to be called from anywhere.
>

Few other comments:
==
0004
1.
+ *  - physical walsenders in case of new time line and cascade
+ *  replication is allowed.
+ *  - logical walsenders in case of new time line or recovery is in progress
+ *  (logical decoding on standby).
+ */
+ WalSndWakeup(switchedTLI && AllowCascadeReplication(),
+ switchedTLI || RecoveryInProgress());

Do we need AllowCascadeReplication() check specifically for physical
walsenders? I think this should be true for both physical and logical
walsenders.

0005
2.
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -38,6 +38,7 @@
 #include "utils/pg_lsn.h"
 #include "utils/timestamp.h"
 #include "utils/tuplestore.h"
+#include "storage/standby.h"

The header includes should be in alphabetical order.

-- 
With Regards,
Amit Kapila.




Re: meson documentation build open issues

2023-04-06 Thread Peter Eisentraut

On 05.04.23 16:45, Andres Freund wrote:

I think it's still an issue that "make docs" builds html and man but "ninja
docs" only builds html.  For some reason the wiki page actually claims that
ninja docs builds both, but this does not happen for me.


It used to, but Tom insisted that it should not. I'm afraid that it's not
quite possible to emulate make here. 'make docs' at the toplevel builds both
HTML and manpages, but 'make -C doc/src/sgml' only builds HTML.


Ok, not a topic for this thread then.


5. There doesn't appear to be an equivalent of "make world" and "make
install-world" that includes documentation builds.


This has been addressed with the additional meson auto options.  But it
seems that these options only control building, not installing, so there is
no "install-world" aspect yet.


I'm not following - install-world installs docs if the docs feature is
available, and not if not?


I had expected that if meson setup enables the 'docs' feature, then 
meson compile will build the documentation, which happens, and meson 
install will install it, which does not happen.






Re: doc: add missing "id" attributes to extension packaging page

2023-04-06 Thread Peter Eisentraut

On 04.04.23 21:52, Brar Piening wrote:

The XSLT implementation looks sound to me.  It would be a touch better
if it had some comments about which parts of the templates were copied
from upstream stylesheets and which were changed.  There are examples
of such commenting in the existing customization layer.  Also, avoid
introducing whitespace differences during said copying.


I will amend the patch if we agree that this is the way forward.


Ok, it appears everyone agrees that this is the correct approach. 
Please post an updated patch.  There have been so many partial patches 
posted recently, I'm not sure which one is the most current one and who 
is really the author.






Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2023-04-06 Thread Etsuro Fujita
On Tue, Apr 4, 2023 at 7:28 PM Etsuro Fujita  wrote:
> The parallel-abort patch received a review from David, and I addressed
> his comments.  Also, he tested with the patch, and showed that it
> reduces time taken to abort remote transactions.  So, if there are no
> objections, I will commit the patch.

Pushed after adding/modifying comments a little bit.

Best regards,
Etsuro Fujita




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-06 Thread Peter Smith
Hi Kuroda-san.

This is a WIP review. I'm yet to do more testing and more study of the
POC patch's design.

While reading the code I kept a local list of my review comments.
Meanwhile, there is a long weekend coming up here, so I thought it
would be better to pass these to you now rather than next week in case
you want to address them.

==
General

1.
Since these two new options are made to work together, I think the
names should be more similar. e.g.

pg_dump: "--slot_only" --> "--replication-slots-only"
pg_upgrade: "--include-replication-slot" --> "--include-replication-slots"

help/comments/commit-message all should change accordingly, but I did
not give separate review comments for each of these.

~~~

2.
I felt there maybe should be some pg_dump test cases for that new
option, rather than the current patch where it only seems to be
testing the new pg_dump option via the pg_upgrade TAP tests.

==
Commit message

3.
This commit introduces a new option called "--include-replication-slot".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

~

"new option" --> "new pg_upgrade" option

~~~

4.
For pg_upgrade, when '--include-replication-slot' is specified, it
executes pg_dump
with added option and restore from the dump. Apart from restoring
schema, pg_resetwal
must not be called after restoring replicaiton slots. This is because
the command
discards WAL files and starts from a new segment, even if they are required by
replication slots. This leads an ERROR: "requested WAL segment XXX has already
been removed". To avoid this, replication slots are restored at a different time
than other objects, after running pg_resetwal.

~

4a.
"with added option and restore from the dump" --> "with the new
"--slot-only" option and restores from the dump"

~

4b.
Typo: /replicaiton/replication/

~

4c
"leads an ERROR" --> "leads to an ERROR"

==

doc/src/sgml/ref/pg_dump.sgml

5.
+ 
+  --slot-only
+  
+   
+Dump only replication slots, neither the schema (data definitions) nor
+data. Mainly this is used for upgrading nodes.
+   
+  

SUGGESTION
Dump only replication slots; not the schema (data definitions), nor
data. This is mainly used when upgrading nodes.

==

doc/src/sgml/ref/pgupgrade.sgml

6.
+   
+Transport replication slots. Currently this can work only for logical
+slots, and temporary slots are ignored. Note that pg_upgrade does not
+check the installation of plugins.
+   

SUGGESTION
Upgrade replication slots. Only logical replication slots are
currently supported, and temporary slots are ignored. Note that...

==

src/bin/pg_dump/pg_dump.c

7. main
  {"exclude-table-data-and-children", required_argument, NULL, 14},
-
+ {"slot-only", no_argument, NULL, 15},
  {NULL, 0, NULL, 0}

The blank line is misplaced.

~~~

8. main
+ case 15: /* dump onlu replication slot(s) */
+ dopt.slot_only = true;
+ dopt.include_everything = false;
+ break;

typo: /onlu/only/

~~~

9. main
+ if (dopt.slot_only && dopt.dataOnly)
+ pg_fatal("options --replicatin-slots and -a/--data-only cannot be
used together");
+ if (dopt.slot_only && dopt.schemaOnly)
+ pg_fatal("options --replicatin-slots and -s/--schema-only cannot be
used together");
+

9a.
typo: /replicatin/replication/

~

9b.
I am wondering if these checks are enough. E.g. is "slots-only"
compatible with "no-publications" ?

~~~

10. main
+ /*
+ * If dumping replication slots are request, dumping them and skip others.
+ */
+ if (dopt.slot_only)
+ {
+ getRepliactionSlots(fout);
+ goto dump;
+ }

10a.
SUGGESTION
If dump replication-slots-only was requested, dump only them and skip
everything else.

~

10b.
This code seems mutually exclusive to every other option. I'm
wondering if this code even needs 'collectRoleNames', or should the
slots option check be moved above that (and also above the 'Dumping
LOs' etc...)

~~~

11. help

+ printf(_("  --slot-only  dump only replication
slots, no schema and data\n"));

11a.
SUGGESTION
"no schema and data" --> "no schema or data"

~

11b.
This help is misplaced. It should be in alphabetical order consistent
with all the other help.

~~~
12. getRepliactionSlots

+/*
+ * getRepliactionSlots
+ *   get information about replication slots
+ */
+static void
+getRepliactionSlots(Archive *fout)

Function name typo / getRepliactionSlots/ getReplicationSlots/
(also in the comment)

~~~

13. getRepliactionSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 16 && !dopt->slot_only)
+ return;

Hmmm, is that condition correct? Shouldn't the && be || here?

~~~

14. dumpReplicationSlot

+static void
+dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+ PQExpBuffer query;
+ char *slotname;
+
+ if (!dopt->slot_only)
+ return;
+
+ slotname = pg_strdup(slotinfo->dobj.