Re: Understanding, testing and improving our Windows filesystem code

2022-11-27 Thread Ian Lawrence Barwick
2022-10-25 (Tue) 13:12 Thomas Munro :
>
> I pushed the bug fixes from this series, without their accompanying
> tests.  Here's a rebase of the test suite, with all those tests now
> squashed into the main test patch, and also the
> tell-Windows-to-be-more-like-Unix patch.  Registered in the
> commitfest.

For reference: https://commitfest.postgresql.org/40/3951/

For my understanding, does this entry supersede the proposal in
https://commitfest.postgresql.org/40/3347/ ?

(Apologies for the previous mail with no additional content).

Regards

Ian Barwick




Re: Remove an unused argument from qual_is_pushdown_safe

2022-11-27 Thread Yugo NAGATA
On Mon, 28 Nov 2022 16:40:52 +0900
Michael Paquier  wrote:

> On Mon, Nov 28, 2022 at 11:54:45AM +0900, Michael Paquier wrote:
> > On Fri, Nov 25, 2022 at 04:05:13PM +0800, Richard Guo wrote:
> >> I wonder if we need to revise the comment atop qual_is_pushdown_safe()
> >> too which says
> >> 
> >>  * rinfo is a restriction clause applying to the given subquery (whose RTE
> >>  * has index rti in the parent query).
> >> 
> >> since there is no 'given subquery' after we remove it from the params.
> 
> I was thinking about this point, and it seems to me that we could just
> do s/the given subquery/a subquery/.  But perhaps you have a different
> view on the matter?

+1
I was also just about to send a patch updated that way; it is attached.

Regards,
Yugo Nagata

> --
> Michael


-- 
Yugo NAGATA 
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4ddaed31a4..3ca67b0e2e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -136,8 +136,7 @@ static void check_output_expressions(Query *subquery,
 static void compare_tlist_datatypes(List *tlist, List *colTypes,
 	pushdown_safety_info *safetyInfo);
 static bool targetIsInAllPartitionLists(TargetEntry *tle, Query *query);
-static bool qual_is_pushdown_safe(Query *subquery, Index rti,
-  RestrictInfo *rinfo,
+static bool qual_is_pushdown_safe(Index rti, RestrictInfo *rinfo,
   pushdown_safety_info *safetyInfo);
 static void subquery_push_qual(Query *subquery,
 			   RangeTblEntry *rte, Index rti, Node *qual);
@@ -2527,7 +2526,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 			Node	   *clause = (Node *) rinfo->clause;
 
 			if (!rinfo->pseudoconstant &&
-qual_is_pushdown_safe(subquery, rti, rinfo, &safetyInfo))
+qual_is_pushdown_safe(rti, rinfo, &safetyInfo))
 			{
 /* Push it down */
 subquery_push_qual(subquery, rte, rti, clause);
@@ -3787,7 +3786,7 @@ targetIsInAllPartitionLists(TargetEntry *tle, Query *query)
 /*
  * qual_is_pushdown_safe - is a particular rinfo safe to push down?
  *
- * rinfo is a restriction clause applying to the given subquery (whose RTE
+ * rinfo is a restriction clause applying to a subquery (whose RTE
  * has index rti in the parent query).
  *
  * Conditions checked here:
@@ -3813,7 +3812,7 @@ targetIsInAllPartitionLists(TargetEntry *tle, Query *query)
  * found to be unsafe to reference by subquery_is_pushdown_safe().
  */
 static bool
-qual_is_pushdown_safe(Query *subquery, Index rti, RestrictInfo *rinfo,
+qual_is_pushdown_safe(Index rti, RestrictInfo *rinfo,
 	  pushdown_safety_info *safetyInfo)
 {
 	bool		safe = true;


Re: Understanding, testing and improving our Windows filesystem code

2022-11-27 Thread Ian Lawrence Barwick
2022-10-25 (Tue) 13:12 Thomas Munro :
>
> I pushed the bug fixes from this series, without their accompanying
> tests.  Here's a rebase of the test suite, with all those tests now
> squashed into the main test patch, and also the
> tell-Windows-to-be-more-like-Unix patch.  Registered in the
> commitfest.




Re: Fix comment in SnapBuildFindSnapshot

2022-11-27 Thread Michael Paquier
On Mon, Nov 28, 2022 at 11:13:23AM +0900, Masahiko Sawada wrote:
> We have the following comment in SnapBuildFindSnapshot():
> 
> * c) transition from FULL_SNAPSHOT to CONSISTENT.
> *
> * In FULL_SNAPSHOT state (see d) ), and this xl_running_xacts'
> 
> It mentions "(state d) )", which seems like a typo of "(state d)", but
> there is no "state d" in the first place. Reading the discussion of
> the commit 955a684e040 that introduced this comment, this was a
> comment for an old version patch[1]. So I think we can remove this
> part.

Hm, yes, that seems right.  There are three "c) states" in these
paragraphs; they are incremental steps.  Will apply if there are no
objections.
--
Michael




Re: Remove an unused argument from qual_is_pushdown_safe

2022-11-27 Thread Michael Paquier
On Mon, Nov 28, 2022 at 11:54:45AM +0900, Michael Paquier wrote:
> On Fri, Nov 25, 2022 at 04:05:13PM +0800, Richard Guo wrote:
>> I wonder if we need to revise the comment atop qual_is_pushdown_safe()
>> too which says
>> 
>>  * rinfo is a restriction clause applying to the given subquery (whose RTE
>>  * has index rti in the parent query).
>> 
>> since there is no 'given subquery' after we remove it from the params.

I was thinking about this point, and it seems to me that we could just
do s/the given subquery/a subquery/.  But perhaps you have a different
view on the matter?
--
Michael




Re: Perform streaming logical transactions by background workers and parallel apply

2022-11-27 Thread Peter Smith
Here are some review comments for patch v51-0002

==

1.

GENERAL - terminology:  spool/serialize and data/changes/message

The terminology seems to be used at random. IMO it might be worthwhile
rechecking at least that terms are used consistently in all the
comments. e.g "serialize message data to disk" ... and later ...
"apply the spooled messages".

Also for places where it says "Write the message to file" maybe
consider using consistent terminology like "serialize the message to a
file".

Also, try to standardize the way things are described by using
consistent (if they really are the same) terminology for "writing
data" VS "writing data" VS "writing messages" etc. It is confusing
trying to know if the different wording has some intended meaning or
is it just random.

==

Commit message

2.
When the leader apply worker times out while sending a message to the parallel
apply worker. Instead of erroring out, switch to partial serialize mode and let
the leader serialize all remaining changes to the file and notify the parallel
apply workers to read and apply them at the end of the transaction.

~

The first sentence seems incomplete.

SUGGESTION.
In patch 0001 if the leader apply worker times out while attempting to
send a message to the parallel apply worker it results in an ERROR.

This patch (0002) modifies that behaviour, so instead of erroring it
will switch to "partial serialize" mode -  in this mode the leader
serializes all remaining changes to a file and notifies the parallel
apply workers to read and apply them at the end of the transaction.

~~~

3.

This patch 0002 is called “Serialize partial changes to disk if the
shm_mq buffer is full”, but the commit message says nothing about the
buffer filling up. I think the commit message should mention something
that makes the patch name more relevant; otherwise, change the patch
name.

==

.../replication/logical/applyparallelworker.c

4. File header comment

+ * timeout is exceeded, the LA will write to file and indicate PA-2 that it
+ * needs to read file for remaining messages. Then LA will start waiting for
+ * commit which will detect deadlock if any. (See pa_send_data() and typedef
+ * enum TransApplyAction)

"needs to read file for remaining messages" -> "needs to read that
file for the remaining messages"

~~~

5. pa_free_worker

+ /*
+ * Stop the worker if there are enough workers in the pool.
+ *
+ * XXX we also need to stop the worker if the leader apply worker
+ * serialized part of the transaction data to a file due to send timeout.
+ * This is because the message could be partially written to the queue due
+ * to send timeout and there is no way to clean the queue other than
+ * resending the message until it succeeds. To avoid complexity, we
+ * directly stop the worker in this case.
+ */
+ if (winfo->serialize_changes ||
+ napplyworkers > (max_parallel_apply_workers_per_subscription / 2))

5a.

+ * XXX we also need to stop the worker if the leader apply worker
+ * serialized part of the transaction data to a file due to send timeout.

SUGGESTION
XXX The worker is also stopped if the leader apply worker needed to
serialize part of the transaction data due to a send timeout.

~

5b.

+ /* Unlink the files with serialized changes. */
+ if (winfo->serialize_changes)
+ stream_cleanup_files(MyLogicalRepWorker->subid, winfo->shared->xid);

A better comment might be

SUGGESTION
Unlink any files that were needed to serialize partial changes.

~~~

6. pa_spooled_messages

/*
 * Replay the spooled messages in the parallel apply worker if leader apply
 * worker has finished serializing changes to the file.
 */
static void
pa_spooled_messages(void)

6a.
IMO a better name for this function would be pa_apply_spooled_messages();

~

6b.
"if leader apply" -> "if the leader apply"

~

7.

+ /*
+ * Acquire the stream lock if the leader apply worker is serializing
+ * changes to the file, because the parallel apply worker will no longer
+ * have a chance to receive a STREAM_STOP and acquire the lock until the
+ * leader serialize all changes to the file.
+ */
+ if (fileset_state == LEADER_FILESET_BUSY)
+ {
+ pa_lock_stream(MyParallelShared->xid, AccessShareLock);
+ pa_unlock_stream(MyParallelShared->xid, AccessShareLock);
+ }

SUGGESTION (rearranged comment - please check, I am not sure if I got
this right)

If the leader apply worker is still (busy) serializing partial changes
then the parallel apply worker acquires the stream lock now.
Otherwise, it would not have a chance to receive a STREAM_STOP (and
acquire the stream lock) until the leader had serialized all changes.

~~~

8. pa_send_data

+ *
+ * When sending data times out, data will be serialized to disk. And the
+ * current streaming transaction will enter PARTIAL_SERIALIZE mode, which means
+ * that subsequent data will also be serialized to disk.
  */
 void
 pa_send_data(ParallelApplyWorkerInfo *winfo, Size nbytes, const void *data)

SUGGESTION (minor comment 

Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-27 Thread Michael Paquier
On Sun, Nov 27, 2022 at 07:04:46PM +0800, Julien Rouhaud wrote:
> And here's the rebased patch for the TAP tests.  I will switch the CF entry to
> Needs Review.

I have been looking at that, and applied the part 1 of the test for
the positive tests to cover the basic ground I'd like to see covered
for this feature.  I have made quite a few changes to this part, like
reworking the way the incrementation of the counters is done for the
rule/map numbers and the line numbers of each file.  The idea to use a
global dictionary to track the current number of lines was rather
clear, so I have kept that.  add_{hba,ident}_line() rely on the
files included in a directory to be ordered in the same way as the
backend, and I am fine to live with that.

+# Normalize the data directory for Windows
+$data_dir =~ s/\/\.\//\//g;# reduce /./ to /
+$data_dir =~ s/\/\//\//g;  # reduce // to /
+$data_dir =~ s/\/$//;  # remove trailing /
+$data_dir =~ s/\\/\//g;# change \ to /
The mysticism around the conversion of the data directory path is
something I'd like to avoid, or I suspect that we are going to have a
hard time debugging any issues in this area.  For the positive cases
committed, I have gone through the approach of using the base name of
the configuration file to avoid those compatibility issues.  Wouldn't
it better to do the same for the expected error messages, switching to
an approach where we only check for the configuration file name in all
these regexps?

+# Generate the expected output for the auth file view error reporting (file
+# name, file line, error), for the given array of error conditions, as
+# generated by add_hba_line/add_ident_line with an err_str.
+sub generate_log_err_rows
+{
generate_log_err_rows() does not strike me as a good interface.  The
issue I am pointing at here is that it depends directly on the result
returned by add_hba_line() or add_ident_line() when the caller passes
down an error string.  As far as I understand it, this interface is
done as-is to correctly increment the line number of one file, but I'd
like add_{hba,ident}_line() to keep being designed so that they only
return the expected matching entry in their catalog views, without any
code path returning something else (in your case the triplet
[ $filename, $fileline, $err_str ]).  That makes the whole thing
harder to understand.

Attached is a rebased patch of the rest.  With everything we have
dealt with in this CF, perhaps it would be better to mark this entry
as committed and switch to a new thread where the negative TAP tests
could be discussed?  It looks like the part for the error pattern
checks compared with the logs could be the easiest portion of what's
remaining.
--
Michael
From adb076649534bf9ef3020c8a000ed7fd269b13b7 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 28 Nov 2022 16:08:15 +0900
Subject: [PATCH v24] Add regression tests for file inclusion in HBA and ident
 configuration files

This covers the error tests.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya@jrouhaud
---
 .../authentication/t/004_file_inclusion.pl| 454 +-
 1 file changed, 452 insertions(+), 2 deletions(-)

diff --git a/src/test/authentication/t/004_file_inclusion.pl b/src/test/authentication/t/004_file_inclusion.pl
index c420f3ebca..4a49d55c6a 100644
--- a/src/test/authentication/t/004_file_inclusion.pl
+++ b/src/test/authentication/t/004_file_inclusion.pl
@@ -33,6 +33,10 @@ my %line_counters = ('hba_rule' => 0, 'ident_rule' => 0);
 # the general hba rule number as an include directive generates no data
 # in pg_hba_file_rules.
 #
+# If an err_str is provided, it returns an arrayref containing the provided
+# filename, the current line number in that file and the provided err_str.  The
+# err_str has to be a valid regex string.
+#
 # This function returns the entry of pg_hba_file_rules expected when this
 # is loaded by the backend.
 sub add_hba_line
@@ -40,6 +44,7 @@ sub add_hba_line
 	my $node = shift;
 	my $filename = shift;
 	my $entry= shift;
+	my $err_str  = shift;
 	my $globline;
 	my $fileline;
 	my @tokens;
@@ -58,11 +63,27 @@ sub add_hba_line
 	$fileline = ++$line_counters{$filename};
 
 	# Include directive, that does not generate a view entry.
-	return '' if ($entry =~ qr/^include/);
+	if ($entry =~ qr/^include/)
+	{
+		if (defined $err_str)
+		{
+			return [ $filename, $fileline, $err_str ];
+		}
+		else
+		{
+			return '';
+		}
+	}
 
 	# Increment pg_hba_file_rules.rule_number and save it.
 	$globline = ++$line_counters{'hba_rule'};
 
+	# If caller provided an err_str, just return the needed metadata
+	if (defined $err_str)
+	{
+		return [ $filename, $fileline, $err_str ];
+	}
+
 	# Generate the expected pg_hba_file_rules line
 	@tokens= split(/ /, $entry);
 	$tokens[1] = '{' . $tokens[1] . '}';# database
@@ -91,6 +112,10 @@ sub add_hba_line
 # the general map number as an include directive 

Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

2022-11-27 Thread Bharath Rupireddy
On Mon, Nov 28, 2022 at 12:57 AM Andrey Borodin  wrote:
>
> Some funny stuff. If a user tries to cancel a non-replicated transaction
> Azure Postgres will answer: "user requested cancel while waiting for
> synchronous replication ack. The COMMIT record has already flushed to
> WAL locally and might not have been replicatead to the standby. We
> must wait here."
> AWS RDS will answer: "ignoring request to cancel wait for synchronous
> replication"
> Yandex Managed Postgres will answer: "canceling wait for synchronous
> replication due requested, but cancelation is not allowed. The
> transaction has already committed locally and might not have been
> replicated to the standby. We must wait here."
>
> So, for many services providing Postgres as a service it's only a
> matter of wording.

Thanks for verifying the behaviour. And many thanks for an off-list chat.

FWIW, I'm planning to prepare a patch as per the below idea which is
something similar to the initial proposal in this thread. Meanwhile,
thoughts are welcome.

1. Disable query cancel/CTRL+C/SIGINT when a backend is waiting for
sync replication acknowledgement (see the sketch after this list).
2. Process proc die immediately when a backend is waiting for sync
replication acknowledgement, as it does today, however, upon restart,
don't open up for business (don't accept read-only connections)
unless the sync standbys have caught up.
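
For illustration, here is a minimal sketch of what point 1 could look
like, with the wait loop reduced to its bones.  HOLD_CANCEL_INTERRUPTS()
and RESUME_CANCEL_INTERRUPTS() are the existing miscadmin.h macros;
sync_rep_ack_received() is a hypothetical stand-in for the real latch
wait in SyncRepWaitForLSN(), not an actual function.

#include "postgres.h"
#include "miscadmin.h"

/* hypothetical stand-in for "standby flushed up to our commit LSN" */
static bool
sync_rep_ack_received(void)
{
    return true;                /* placeholder; the real code waits on a latch */
}

/*
 * Sketch of point 1: the commit record is already flushed locally, so a
 * query cancel must not release the backend early.  Holding off cancel
 * interrupts makes CTRL+C a no-op here, while a proc-die signal is still
 * processed immediately (point 2 then handles consistency after restart).
 */
static void
SyncRepWaitSketch(void)
{
    HOLD_CANCEL_INTERRUPTS();

    while (!sync_rep_ack_received())
        CHECK_FOR_INTERRUPTS(); /* die is processed; cancel is deferred */

    RESUME_CANCEL_INTERRUPTS();
}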

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Decouple last important WAL record LSN from WAL insert locks

2022-11-27 Thread Bharath Rupireddy
On Sun, Nov 27, 2022 at 2:43 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-11-23 19:12:07 +0530, Bharath Rupireddy wrote:
> > While working on something else, I noticed that each WAL insert lock
> > tracks its own last important WAL record's LSN (lastImportantAt) and
> > both the bgwriter and checkpointer later computes the max
> > value/server-wide last important WAL record's LSN via
> > GetLastImportantRecPtr(). While doing so, each WAL insertion lock is
> > acquired in exclusive mode in a for loop. This seems like too much
> > overhead to me.
>
> GetLastImportantRecPtr() should be a very rare operation, so it's fine for it
> to be expensive. The important thing is for the maintenance of the underlying
> data to be very cheap.
>
> > I quickly coded a patch (attached herewith) that
> > tracks the server-wide last important WAL record's LSN in
> > XLogCtlInsert (lastImportantPos) protected with a spinlock and gets
> > rid of lastImportantAt from each WAL insert lock.
>
> That strikes me as a very bad idea. It adds another point of contention to a
> very very hot code path, to make a very rare code path cheaper.

Thanks for the response. I agree that GetLastImportantRecPtr() gets
called rarely, however, what concerns me is that it's taking all the
WAL insertion locks when it gets called.

Is tracking lastImportantPos as pg_atomic_uint64 in XLogCtlInsert any
better than an explicit spinlock? I think it's better on platforms
where atomics are supported; however, it boils down to using a
spinlock on the platforms where atomics aren't supported.
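
To make that concrete, here is a rough sketch of the atomic variant,
assuming a lastImportantPos field were added to shared state (this is
not the posted patch).  Because a single shared slot is updated by every
inserter, a plain atomic write could move the value backwards, so a
compare-and-swap loop that only advances it is needed; on platforms
without native 64-bit atomics these operations fall back to spinlock
emulation anyway.

#include "postgres.h"
#include "access/xlogdefs.h"
#include "port/atomics.h"

/* assumption: this would live in XLogCtlInsert, in shared memory */
static pg_atomic_uint64 lastImportantPos;

/*
 * Hot path: advance the shared LSN, never moving it backwards.  On CAS
 * failure, "cur" is refreshed with the current value and we retry.
 */
static void
AdvanceLastImportantRecPtr(XLogRecPtr EndPos)
{
    uint64      cur = pg_atomic_read_u64(&lastImportantPos);

    while (cur < (uint64) EndPos &&
           !pg_atomic_compare_exchange_u64(&lastImportantPos,
                                           &cur, (uint64) EndPos))
        ;
}

/* Rare path: one atomic read instead of taking all WAL insertion locks. */
static XLogRecPtr
GetLastImportantRecPtrSketch(void)
{
    return (XLogRecPtr) pg_atomic_read_u64(&lastImportantPos);
}

Of course, that CAS loop is exactly the kind of extra hot-path
contention the objection above is about: every inserter then competes
for one shared cache line instead of updating its own insertion lock's
field.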

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Collation version tracking for macOS

2022-11-27 Thread Thomas Munro
On Sat, Nov 26, 2022 at 6:27 PM Thomas Munro  wrote:
> This is just a first cut, but enough to try out and see if we like it,
> what needs to be improved, what edge cases we haven't thought about
> etc.  Let me know what you think.

BTW one problem to highlight (mentioned but buried in the test
comments), is that REFRESH VERSION doesn't affect other sessions or
even the current session.  You have to log out and back in again to
pick up the new version.  Obviously that's not good enough, but fixing
that involves making it transactional, I think.  If you abort, we have
to go back to using the old version; if you commit, you keep the new
version, and we might also consider telling other backends to start
using the new version -- or something like that.  I think that's just
a Small Matter of Programming, but a little bit finickity and I need
to take a break for a bit and go work on bugs elsewhere, hence v8
didn't address that yet.




Re: Allow processes to reset procArrayGroupNext themselves instead of leader resetting for all the followers

2022-11-27 Thread Bharath Rupireddy
On Sun, Nov 27, 2022 at 2:48 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-11-24 10:43:46 +0530, Bharath Rupireddy wrote:
> > While working on something else, I noticed that the proc array group
> > XID clearing leader resets procArrayGroupNext of all the followers
> > atomically along with procArrayGroupMember. ISTM that it's enough for
> > the followers to exit the wait loop and continue if the leader resets
> > just procArrayGroupMember, the followers can reset procArrayGroupNext
> > by themselves. This relieves the leader a bit, especially when there
> > are many followers, as it avoids a bunch of atomic writes and
> pg_write_barrier() for the leader.
>
> I doubt this is a useful change - the leader already has to modify the
> relevant cacheline (for procArrayGroupMember). That makes it pretty much free
> to modify another field in the same cacheline.

Agreed. Thanks for the response.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Reducing power consumption on idle servers

2022-11-27 Thread Bharath Rupireddy
On Mon, Nov 28, 2022 at 5:27 AM Thomas Munro  wrote:
>
> "The trigger_file and promote_trigger_file have been removed." was
> missing some words.  I've also added a sentence to say which releases
> were involved, to make it like nearby paragraphs about other obsolete
> stuff.

LGTM.

> The funny thing about that "obsolete" appendix is that it's intended
> as a URL-preserving way to document the recovery.conf file's demise in
> r12.  It doesn't look like the right place to document ongoing changes
> to recovery-related GUCs in general.  In this particular case it's
> sort of OK because the old trigger_file GUC was indeed in
> recovery.conf, so it's not completely irrelevant.  So maybe it's OK?
> Perhaps in future, in a separate commit, we could have a new appendix
> "Obsolete settings" that has an alphabetical list of the fallen?

Tracking down the history to figure out and add all the removed/renamed
settings under a new appendix seems like a tedious task. IMO the best
approach is to look into the corresponding release notes to figure out
which ones have been removed/renamed/altered.

FWIW, it's been a while (3 major releases have happened) since
recovery.conf was removed; I think we need to do away with
appendix-obsolete-recovery-config.sgml someday.

> The main documentation of pg_promote() etc now has "The parameter
> promote_trigger_file has been removed" in the
> places where the GUC was previously mentioned.  Perhaps we should just
> remove the mentions completely (it's somehow either too much and not
> enough information), but I'm OK with leaving those in for now until
> some better place exists for historical notes.

+1 for the way it is in the v12 patch, i.e. specifying the removal in
the docs. However, I don't see any reason to keep the traces of it in
the code; can we remove the "Use of trigger file via promote_trigger_file
is now fully removed." sentence in the code below?

+ * Wait for more WAL to arrive, when we will be woken
+ * immediately by the WAL receiver. Use of trigger file
+ * via promote_trigger_file is now fully removed.
  */

> So this is the version I will push unless someone sees anything more
> to tweak about the documentation.

Thanks.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Support logical replication of DDLs

2022-11-27 Thread li jie
>
>
> All three commands are captured by the event trigger. The first and
> second command ends up getting deparsed, WAL-logged and
> replayed on the subscriber. The replay of the ALTER TABLE command
> causes a duplicate constraint error. The problem is that
> while subcommands are captured by event triggers by default, they
> don't need to be deparsed and WAL-logged for DDL replication.
> To do that we can pass the isCompleteQuery variable in
> ProcessUtilitySlow to EventTriggerCollectSimpleCommand() and
> EventTriggerAlterTableEnd() and make this information available in
> CollectedCommand so that any subcommands can be skipped.
>

We may not be able to skip all subcommands. For example,
'ALTER TABLE ctlt1_like ALTER COLUMN b SET STORAGE EXTERNAL;'
cannot be represented in the CREATE TABLE statement.


Regards,  Adger


Re: Remove an unused argument from qual_is_pushdown_safe

2022-11-27 Thread Michael Paquier
On Fri, Nov 25, 2022 at 04:05:13PM +0800, Richard Guo wrote:
> +1.  In 964c0d0f the checks in qual_is_pushdown_safe() that need to
> reference 'subquery' were moved to subquery_is_pushdown_safe(), so param
> 'subquery' is not used any more.  I think we can just remove it.
> 
> I wonder if we need to revise the comment atop qual_is_pushdown_safe()
> too which says
> 
>  * rinfo is a restriction clause applying to the given subquery (whose RTE
>  * has index rti in the parent query).
> 
> since there is no 'given subquery' after we remove it from the params.

When it comes to specific subpaths of the tree, it is sometimes good
to keep some symmetry in the arguments of the sub-routines used, but
that does not seem to apply much to allpaths.c.  Removing that is fine
by me, so let's do this.
--
Michael




Fix comment in SnapBuildFindSnapshot

2022-11-27 Thread Masahiko Sawada
Hi,

We have the following comment in SnapBuildFindSnapshot():

* c) transition from FULL_SNAPSHOT to CONSISTENT.
*
* In FULL_SNAPSHOT state (see d) ), and this xl_running_xacts'

It mentions "(state d) )", which seems like a typo of "(state d)", but
there is no "state d" in the first place. Reading the discussion of
the commit 955a684e040 that introduced this comment, this was a
comment for an old version patch[1]. So I think we can remove this
part.

I've attached the patch.

Regards,

[1] 
https://www.postgresql.org/message-id/20170505004237.edtahvrwb3uwd5rs%40alap3.anarazel.de

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


fix_comment_in_SnapBuildFindSnapshot.patch
Description: Binary data


Re: Report roles in pg_upgrade pg_ prefix check

2022-11-27 Thread Michael Paquier
On Thu, Nov 24, 2022 at 12:31:09PM +0100, Daniel Gustafsson wrote:
> Looking at a recent pg_upgrade thread I happened to notice that the check for
> roles with a pg_ prefix only reports the error, not the roles it found.  Other
> similar checks where the user is expected to alter the old cluster typically
> reports the found objects in a textfile.  The attached adds reporting to make
> that class of checks consistent (the check for prepared transactions which 
> also
> isn't reporting is different IMO as it doesn't expect ALTER commands).
> 
> As this check is only executed against the old cluster the patch removes the
> check when printing the error.

+1.  A backpatch would be nice, though not strictly mandatory as
that's not a bug fix.

+   ntups = PQntuples(res);
+   i_rolname = PQfnumber(res, "rolname");

Would it be worth adding the OID on top of the role name in the
generated report?  That would be a free meal.
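
For what it's worth, a sketch of what the report loop could look like,
assuming the check's query selects both columns; "script" stands for the
report file the check opens, and all names here are illustrative only:

#include <stdio.h>
#include "libpq-fe.h"

/* sketch only: report role name plus OID for each pg_-prefixed role */
static void
report_pg_prefixed_roles(PGresult *res, FILE *script)
{
    int         ntups = PQntuples(res);
    int         i_oid = PQfnumber(res, "oid");
    int         i_rolname = PQfnumber(res, "rolname");

    for (int rowno = 0; rowno < ntups; rowno++)
        fprintf(script, "%s (oid %s)\n",
                PQgetvalue(res, rowno, i_rolname),
                PQgetvalue(res, rowno, i_oid));
}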
--
Michael




Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn

2022-11-27 Thread Masahiko Sawada
On Fri, Nov 25, 2022 at 5:58 PM Maxim Orlov  wrote:
>
>
>
> On Fri, 25 Nov 2022 at 09:40, Amit Kapila  wrote:
>>
>> On Thu, Nov 24, 2022 at 4:43 PM Amit Kapila  wrote:
>> >
>> > On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada  
>> > wrote:
>> > >
>> > > On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila  
>> > > wrote:
>> > >
>> > > Agreed not to have a test case for this.
>> > >
>> > > I've attached an updated patch. I've confirmed this patch works for
>> > > all supported branches.
>> > >
>> >
>> > I have slightly changed the checks used in the patch, otherwise looks
>> > good to me. I am planning to push (v11-v15) the attached tomorrow
>> > unless there are more comments.
>> >
>>
>> Pushed.
>
> A big thanks to you! Could you also close the commitfest entry
> https://commitfest.postgresql.org/41/4024/, please?

Closed.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-27 Thread Michael Paquier
On Sun, Nov 27, 2022 at 09:49:31PM +0900, Ian Lawrence Barwick wrote:
> Thanks for the quick update!

FWIW, I do intend to tackle this last part ASAP, as the last piece to
commit to get the full picture in the tree.
--
Michael




Re: Amcheck verification of GiST and GIN

2022-11-27 Thread Andrey Borodin
On Sun, Nov 27, 2022 at 1:29 PM Andrey Borodin  wrote:
>
> GiST verification checks only one invariant that can be verified if
> page locks are acquired the same way as a page split does.
> GIN does not require ShareLock because it does not check cross-level 
> invariants.
>

I was wrong. The GIN check does a similar gin_refind_parent() to lock
pages in a bottom-up manner and truly verify the downlink-child_page
invariant.

Here's v17. The only difference is that I added progress reporting to
GiST verification.
I still did not implement heapallindexed for GIN. Existence of pending
lists makes this just too difficult for a weekend coding project :(

Thank you!

Best regards, Andrey Borodin.


v17-0002-Add-gist_index_parent_check-function-to-verify-G.patch
Description: Binary data


v17-0003-Add-gin_index_parent_check-to-verify-GIN-index.patch
Description: Binary data


v17-0001-Refactor-amcheck-to-extract-common-locking-routi.patch
Description: Binary data


Re: Bug in row_number() optimization

2022-11-27 Thread David Rowley
On Sat, 26 Nov 2022 at 05:19, Tom Lane  wrote:
>
> Sergey Shinderuk  writes:
> > What about user-defined operators? I created my own <= operator for int8
> > which returns true on null input, and put it in a btree operator class.
> > Admittedly, it's weird that (null <= 1) evaluates to true. But does it
> > violate  the contract of the btree operator class or something? Didn't
> > find a clear answer in the docs.
>
> It's pretty unlikely that this would work during an actual index scan.
> I'm fairly sure that btree (and other index AMs) have hard-wired
> assumptions that index operators are strict.

If we're worried about that then we could just restrict this
optimization to only work with strict quals.

The proposal to copy the datums into the query context does not seem
to me to be a good idea. If there are a large number of partitions
then it sounds like we'll leak lots of memory.  We could invent some
partition context that we reset after each partition, but that's
probably more complexity than it would be worth.

I've attached a draft patch to move the code to nullify the aggregate
results so that's only done once per partition and adjusted the
planner to limit this to strict quals.

David
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 81ba024bba..110c594edd 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2300,7 +2300,27 @@ ExecWindowAgg(PlanState *pstate)
 					continue;
 				}
 				else
+				{
 					winstate->status = WINDOWAGG_PASSTHROUGH;
+
+					/*
+					 * Otherwise, if we're not the top-window, we'd better
+					 * NULLify the aggregate results, else, by-ref result
+					 * types will point to freed memory.  We can get away
+					 * without storing the final result of the WindowFunc
+					 * because in the planner we insisted that the
+					 * runcondition is strict.  Setting these to NULL will
+					 * ensure the top-level WindowAgg filter will always
+					 * filter out the rows produced in this WindowAgg when
+					 * not in WINDOWAGG_RUN state.
+					 */
+					numfuncs = winstate->numfuncs;
+					for (i = 0; i < numfuncs; i++)
+					{
+						econtext->ecxt_aggvalues[i] = (Datum) 0;
+						econtext->ecxt_aggnulls[i] = true;
+					}
+				}
 			}
 			else
 			{
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4ddaed31a4..a25956200e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2399,6 +2399,22 @@ check_and_push_window_quals(Query *subquery, RangeTblEntry *rte, Index rti,
 	if (list_length(opexpr->args) != 2)
 		return true;
 
+	/*
+	 * Currently, we restrict this optimization to strict OpExprs.  The reason
+	 * for this is that during execution, once we stop evaluating WindowFuncs
+	 * on lower-level WindowAgg nodes, since we're no longer updating them,
+	 * we NULLify the aggregate and window aggregate results to prevent by-ref
+	 * result typed window aggregates from pointing to free'd memory (this can
+	 * happen on non-FLOAT8PASSBYVAL builds).  Upper-level WindowAgg nodes may
+	 * make reference to these results in their filter clause and we can
+	 * ensure the filter remain false by making sure the operator is strict
+	 * and by performing the NULLification when the run-condition is no longer
+	 * true.
+	 */
+	set_opfuncid(opexpr);
+	if (!func_strict(opexpr->opfuncid))
+		return false;
+
 	/*
 	 * Check for plain Vars that reference window functions in the subquery.
 	 * If we find any, we'll ask find_window_run_conditions() if 'opexpr' can

Re: Missing update of all_hasnulls in BRIN opclasses

2022-11-27 Thread Tomas Vondra
Here's an improved version of the fix I posted about a month ago.

0001

Adds tests demonstrating the issue, as before. I realized there's an
isolation test in src/test/modules/brin that can demonstrate this, so I
modified it too, not just the pageinspect test as before.


0002

Uses the combination of all_nulls/has_nulls to identify "empty" range,
and does not store them to disk. I however realized not storing "empty"
ranges is probably not desirable. Imagine a table with a "gap" (e.g. due
to a batch DELETE) of pages with no rows:

create table x (a int) with (fillfactor = 10);
insert into x select i from generate_series(1,1000) s(i);
delete from x where a < 1000;
create index on x using brin(a) with (pages_per_range=1);

Any bitmap index scan using this index would have to scan all those
empty ranges, because there are no summaries.


0003

Still uses the all_nulls/has_nulls flags to identify empty ranges, but
stores them - and then we check the combination in bringetbitmap() to
skip those ranges as not matching any scan keys.

This also restores some of the existing behavior - for example creating
a BRIN index on entirely empty table (no pages at all) still allocates a
48kB index (3 index pages, 3 fsm pages). Seems a bit strange, but it's
an existing behavior.


As explained before, I've considered adding a new flag to one of the
BRIN structs - BrinMemTuple or BrinValues. But we can't add it as the
last field of BrinMemTuple because there already is a FLEXIBLE_ARRAY_MEMBER,
and adding a field to BrinValues would change stride of the bt_columns
array. So this would break ABI, making this not backpatchable.

Furthermore, if we want to store summaries for empty ranges (which is
what 0003 does), we need to store the flag in the BRIN index tuple. And
we can't change the on-disk representation in backbranches, so encoding
this in the existing tuple seems like the only way.

So using the combination of all_nulls/has_nulls flag seems like the only
viable option, unfortunately.
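
To make the encoding explicit, here is a sketch using the existing
BrinValues fields from brin_tuple.h; the helper name is made up:

#include "postgres.h"
#include "access/brin_tuple.h"

/*
 * Sketch: for a normally-summarized range, all_nulls and has_nulls can
 * never both be set (an all-nulls range has no non-null values that
 * would set has_nulls), so the otherwise-impossible combination is free
 * to mean "summarized but empty".  bringetbitmap() can then skip such
 * ranges before consulting any scan keys.
 */
static inline bool
brin_range_is_empty(const BrinValues *bval)
{
    return bval->bv_allnulls && bval->bv_hasnulls;
}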

Opinions? Considering this will need to be backpatched, it'd be good to
get some feedback on the approach. I think it's fine, but it would be
unfortunate to fix one issue but break BRIN in a different way.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From dea5e7aa821ddf745e509371f33bf1953ff6e853 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 22 Oct 2022 12:47:33 +0200
Subject: [PATCH 1/3] pageinspect brinbugs test

Introduce a brinbugs.sql test suite into pageinspect, demonstrating the
issue with forgetting about initial NULL values. Ultimately this should
be added to the existing brin.sql suite.

Furthermore, this tweaks an existing isolation test, originally intended
to test concurrent inserts and summarization, to also test this - it's
enough to ensure the first value added to the table is NULL.
---
 contrib/pageinspect/Makefile  |   2 +-
 contrib/pageinspect/expected/brinbugs.out | 222 ++
 contrib/pageinspect/sql/brinbugs.sql  | 114 +
 ...summarization-and-inprogress-insertion.out |   6 +-
 ...ummarization-and-inprogress-insertion.spec |   1 +
 5 files changed, 341 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pageinspect/expected/brinbugs.out
 create mode 100644 contrib/pageinspect/sql/brinbugs.sql

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index ad5a3ac5112..67eb02b78fd 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -22,7 +22,7 @@ DATA =  pageinspect--1.10--1.11.sql \
 	pageinspect--1.0--1.1.sql
 PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
 
-REGRESS = page btree brin gin gist hash checksum oldextversions
+REGRESS = page btree brin gin gist hash checksum oldextversions brinbugs
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/pageinspect/expected/brinbugs.out b/contrib/pageinspect/expected/brinbugs.out
new file mode 100644
index 000..23843caa138
--- /dev/null
+++ b/contrib/pageinspect/expected/brinbugs.out
@@ -0,0 +1,222 @@
+create extension pageinspect;
+create table t (a int, b int);
+create index on t using brin (a, b);
+-- both columns should have has_nulls=false and [1,1] range
+truncate t;
+insert into t values (1,1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ itemoffset | blknum | attnum | allnulls | hasnulls | placeholder |  value
+------------+--------+--------+----------+----------+-------------+----------
+          1 |      0 |      1 | f        | f        | f           | {1 .. 1}
+          1 |      0 |      2 | f        | f        | f           | {1 .. 1}
+(2 rows)
+
+-- first column should have all_nulls=true, second has_nulls=false and [1,1] range
+truncate t;
+insert into t values (null, 1);
+vacuum full t;
+select * from brin_page_items(get_raw_page('t_a_b_idx', 2), 't_a_b_idx'::regclass);
+ 

Re: [DOCS] Stats views and functions not in order?

2022-11-27 Thread Peter Smith
On Sat, Nov 26, 2022 at 2:43 PM David G. Johnston
 wrote:
>
> On Wed, Nov 23, 2022 at 1:36 AM Peter Smith  wrote:
>>
>> On Thu, Nov 17, 2022 at 8:46 AM David G. Johnston
>>  wrote:
>>
>> > Also, make it so each view ends up being its own separate page.
>> >
>>
>> I did not do this. AFAIK those views of chapter 54 get rendered to
>> separate pages only because they are top-level . So I do not
>> know how to put all these stats views onto different pages without
>> radically changing the document structure. Anyway – doing this would
>> be incompatible with my  changes of patch 0006 (see above).
>>
>
> I did some experimentation and reading on this today.  Short answer - turn 
> each view into a refentry under a dedicated sect2 where the table resides.

Thanks very much for your suggestion.

I will look at redoing the v7-0003 patch using that approach when I
get some more time (maybe in a day or so).

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [DOCS] Stats views and functions not in order?

2022-11-27 Thread Peter Smith
On Fri, Nov 25, 2022 at 11:09 PM Peter Eisentraut
 wrote:
>
> On 23.11.22 09:36, Peter Smith wrote:
> > PSA new patches. Now there are 6 of them. If some of the earlier
> > patches are agreeable can those ones please be committed? (because I
> > think this patch may be susceptible to needing a big rebase if
> > anything in those tables changes).
>
> I have committed
>
> v6-0001-Re-order-sections-of-28.4.-Progress-Reporting.patch
> v6-0003-Re-order-Table-28.12-Wait-Events-of-type-LWLock.patch
> v6-0004-Re-order-Table-28.35-Per-Backend-Statistics-Funct.patch
>
> which seemed to have clear consensus.
>
> v6-0002-Re-order-Table-28.2-Collected-Statistics-Views.patch
>
> This one also seems ok, need a bit more time to look it over.
>
> v6-0005-Cleanup-view-name-hyperlinks-for-Tables-28.1-and-.patch
> v6-0006-Remove-all-stats-views-from-the-ToC-of-28.2.patch
>
> I wasn't sure yet whether these had been reviewed yet, sine they were
> late additions to the patch series.

Thank you for pushing those ones.

PSA the remaining patches re-posted so cfbot can keep working

v6-0002 -> v7-0001
v6-0005 -> v7-0002
v6-0006 -> v7-0003

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v7-0001-Re-order-Table-28.2-Collected-Statistics-Views.patch
Description: Binary data


v7-0003-Remove-all-stats-views-from-the-ToC-of-28.2.patch
Description: Binary data


v7-0002-Cleanup-view-name-hyperlinks-for-Tables-28.1-and-.patch
Description: Binary data


Re: Reducing power consumption on idle servers

2022-11-27 Thread Thomas Munro
On Sun, Nov 27, 2022 at 6:15 PM Ian Lawrence Barwick  wrote:
> 2022-11-22 (Tue) 5:50 Laurenz Albe :
> > On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote:
> > > Robert Haas  writes:
> > > > The reason that I pushed back -- not as successfully as I would have
> > > > liked -- on the changes to pg_stop_backup / pg_start_backup is that I
> > > > know there are people using the old method successfully, and it's not
> > > > just a 1:1 substitution. Here I don't, and it is. I'm totally open to
> > > > the feedback that such people exist and to hearing why adopting one of
> > > > the newer methods would be a problem for them, if that's the case. But
> > > > if there's no evidence that such people exist or that changing is a
> > > > problem for them, I don't think waiting 5 years on principle is good
> > > > for the project.
> > >
> > > We make incompatible changes in every release; see the release notes.
> > > Unless somebody can give a plausible use-case where this'd be a
> > > difficult change to deal with, I concur that we don't need to
> > > deprecate it ahead of time.
> >
> > Since I am the only one that seems to worry, I'll shut up.  You are probably
> > right that the feature won't be missed by many users.
>
> FWIW, though I prefer to err on the side of caution when removing features
> from anything, I am struggling to remember ever having used
> "promote_trigger_file"
>  (or "trigger_file" as it was in Pg11 and earlier); grepping my private notes
> brings up a single example from ca. 2012 when I appear to have been
> experimenting with replication.
>
> On a quick web search, a large part of the results are related to its change
> to a GUC in Pg 12 and/or commented out entries in sample postgresql.conf 
> files;
> most of the rest seem to be blog articles/tutorials on setting up replication.

Thanks Ian, Laurenz, Robert, Tom for the discussion.  Seems like we're
all set then.  Sorry for delaying over trivialities, but I have a
couple more comments about the documentation before committing:

"The trigger_file and promote_trigger_file have been removed." was
missing some words.  I've also added a sentence to say which releases
were involved, to make it like nearby paragraphs about other obsolete
stuff.

The funny thing about that "obsolete" appendix is that it's intended
as a URL-preserving way to document the recovery.conf file's demise in
r12.  It doesn't look like the right place to document ongoing changes
to recovery-related GUCs in general.  In this particular case it's
sort of OK because the old trigger_file GUC was indeed in
recovery.conf, so it's not completely irrelevant.  So maybe it's OK?
Perhaps in future, in a separate commit, we could have a new appendix
"Obsolete settings" that has an alphabetical list of the fallen?

The main documentation of pg_promote() etc now has "The parameter
promote_trigger_file has been removed" in the
places where the GUC was previously mentioned.  Perhaps we should just
remove the mentions completely (it's somehow either too much and not
enough information), but I'm OK with leaving those in for now until
some better place exists for historical notes.

So this is the version I will push unless someone sees anything more
to tweak about the documentation.
From 15da25b8c46e3e2a57431784d07bb87c6cce5e9f Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 28 Nov 2022 10:50:44 +1300
Subject: [PATCH v12] Remove promote_trigger_file.

Previously, an idle startup (recovery) process would wake up every 5
seconds to have a chance to poll for promote_trigger_file, even if that
GUC was not configured.  That promotion triggering mechanism was
effectively superseded by pg_ctl promote and pg_promote() a long time
ago.  There probably aren't many users left and it's very easy to change
to the modern mechanisms, so we agreed to remove the feature.

This is part of a campaign to reduce wakeups on idle systems.

Author: Simon Riggs 
Reviewed-by: Bharath Rupireddy 
Reviewed-by: Robert Haas 
Reviewed-by: Thomas Munro 
Discussion: https://postgr.es/m/CANbhV-FsjnzVOQGBpQ589%3DnWuL1Ex0Ykn74Nh1hEjp2usZSR5g%40mail.gmail.com
---
 .../appendix-obsolete-recovery-config.sgml| 12 +++
 doc/src/sgml/config.sgml  | 18 ---
 doc/src/sgml/high-availability.sgml   | 24 ++
 src/backend/access/transam/xlogrecovery.c | 32 ---
 src/backend/utils/misc/guc_tables.c   | 10 --
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/include/access/xlogrecovery.h |  1 -
 7 files changed, 20 insertions(+), 78 deletions(-)

diff --git a/doc/src/sgml/appendix-obsolete-recovery-config.sgml b/doc/src/sgml/appendix-obsolete-recovery-config.sgml
index 1cf4913114..2414b5e9ec 100644
--- a/doc/src/sgml/appendix-obsolete-recovery-config.sgml
+++ b/doc/src/sgml/appendix-obsolete-recovery-config.sgml
@@ -34,14 +34,10 @@

 

-The
-trigger_file
-
- trigger_file
- promote_trigger_file
-
-

Re: wake up logical workers after ALTER SUBSCRIPTION

2022-11-27 Thread Nathan Bossart
On Thu, Nov 24, 2022 at 05:26:27AM +, Hayato Kuroda (Fujitsu) wrote:
> I have no comments for the v3 patch.

Thanks for reviewing!  Does anyone else have thoughts on the patch?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-27 Thread Nathan Bossart
Thanks for taking a look!

On Thu, Nov 24, 2022 at 05:31:02PM +, Simon Riggs wrote:
> * not sure I believe that everything it does can always be aborted out
> of and shutdown - to achieve that you will need a
> CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least

I did something like this earlier, but was advised to simply let the
functions finish as usual during shutdown [0].  I think this is what the
checkpointer process does today, anyway.

> * not sure why you want immediate execution of custodian tasks - I
> feel supporting two modes will be a lot harder. For me, I would run
> locally when !IsUnderPostmaster and also in an Assert build, so we can
> test it works right - i.e. running in its own process is just a
> production optimization for performance (which is the stated reason
> for having this)

I added this because 0004 involves requesting a task from the postmaster,
so checking for IsUnderPostmaster doesn't work.  Those tasks would always
run immediately.  However, we could use IsPostmasterEnvironment instead,
which would allow us to remove the "immediate" argument.  I did it this way
in v14.

I'm not sure about running locally in Assert builds.  It's true that would
help ensure there's test coverage for the task logic, but it would also
reduce coverage for the custodian logic.  And in general, I'm worried about
having Assert builds use a different code path than production builds.

> 0005 seems good from what I know
> * There is no check to see if it worked in any sane time

What did you have in mind?  Should the custodian begin emitting WARNINGs
after a while?

> * It seems possible that "Old" might change meaning - will that make
> it break/fail?

I don't believe so.

> Rather than explicitly use DEBUG1 everywhere I would have an
> #define CUSTODIAN_LOG_LEVEL LOG
> so we can run with it in LOG mode and then set it to DEBUG1 with a one
> line change in a later phase of Beta

I can create a separate patch for this, but I don't think I've ever seen
this sort of thing before.  Is the idea just to help with debugging during
the development phase?

> I can't really comment with knowledge on sub-patches 0002 to 0004.
> 
> Perhaps you should aim to get 1, 5, 6 committed first and then return
> to the others in a later CF/separate thread?

That seems like a good idea since those are all relatively self-contained.
I removed 0002-0004 in v14.

[0] https://postgr.es/m/20220217065938.x2esfdppzypegn5j%40alap3.anarazel.de

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 443c3f842785554476b1a353bcb1af13f426116b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v14 1/3] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks that could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 382 
 src/backend/postmaster/meson.build  |   1 +
 src/backend/postmaster/postmaster.c |  38 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 13 files changed, 482 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();

Re: Amcheck verification of GiST and GIN

2022-11-27 Thread Andrey Borodin
Hello!

Thank you for the review!

On Thu, Nov 24, 2022 at 6:04 PM Jose Arthur Benetasso Villanova
 wrote:
>
> It compiled with those 2 warnings:
>
> verify_gin.c: In function 'gin_check_parent_keys_consistency':
> verify_gin.c:481:38: warning: declaration of 'maxoff' shadows a previous
> local [-Wshadow=compatible-local]
>481 | OffsetNumber maxoff =
> PageGetMaxOffsetNumber(page);
>|  ^~
> verify_gin.c:453:41: note: shadowed declaration is here
>453 | maxoff;
>| ^~
> verify_gin.c:423:25: warning: unused variable 'heapallindexed'
> [-Wunused-variable]

Fixed.

>423 | boolheapallindexed = *((bool*)callback_state);
>| ^~
>

This one is still in progress; the heapallindexed check is not implemented yet...


>
> Also, I'm not sure about postgres' headers conventions, inside amcheck.h,
> there is "miscadmin.h" included, and inside verify_gin.c, verify_gist.h
> and verify_nbtree.c both amcheck.h and miscadmin.h are included.
Fixed.

>
> About the documentation, the bt_index_parent_check has comments about the
> ShareLock and "SET client_min_messages = DEBUG1;", and both
> gist_index_parent_check and gin_index_parent_check lack it. verify_gin
> uses DEBUG3, I'm not sure if it is on purpose, but it would be nice to
> document it or put DEBUG1 to be consistent.
GiST and GIN verifications do not take ShareLock for parent checks.
The B-tree check cannot verify cross-level invariants while the index
is changing.

GiST verification checks only one invariant that can be verified if
page locks are acquired the same way as a page split does.
GIN does not require ShareLock because it does not check cross-level invariants.

Reporting progress with DEBUG1 is a good idea; I did not know that
this feature exists. I'll implement something similar in the following
versions.

> I did the following test:

Cool! Thank you!

>
> There are more code paths to follow to check the entire code, and I had a
> hard time to corrupt the indices. Is there any automated code to corrupt
> index to test such code?
>

Heapam tests do this in an automated way; look into the file
t/001_verify_heapam.pl.
Surely we can write these tests, at least to automate what you have
just done in the review. However, committing similar checks is very
tedious work: something will inevitably turn the buildfarm red as a
watermelon.

I hope I'll post a version with DEBUG1 reporting and heapallindexed soon.
PFA current state.
Thank you for looking into this!

Best regards, Andrey Borodin.


v16-0001-Refactor-amcheck-to-extract-common-locking-routi.patch
Description: Binary data


v16-0003-Add-gin_index_parent_check-to-verify-GIN-index.patch
Description: Binary data


v16-0002-Add-gist_index_parent_check-function-to-verify-G.patch
Description: Binary data


Re: ALTER TABLE uses a bistate but not for toast tables

2022-11-27 Thread Justin Pryzby
On Wed, Sep 07, 2022 at 10:48:39AM +0200, Drouvot, Bertrand wrote:
> Hi,
> 
> On 6/22/22 4:38 PM, Justin Pryzby wrote:
> > ATRewriteTable() calls table_tuple_insert() with a bistate, to avoid 
> > clobbering
> > and polluting the buffers.
> > 
> > But heap_insert() then calls
> > heap_prepare_insert() >
> > heap_toast_insert_or_update >
> > toast_tuple_externalize >
> > toast_save_datum >
> > heap_insert(toastrel, toasttup, mycid, options, NULL /* without bistate:( 
> > */);
> 
> What do you think about creating earlier a new dedicated bistate for the
> toast table?

Yes, but I needed to think about what data structure to put it in...

Here, I created a 2nd bistate for toast whenever creating a bistate for
heap.  That avoids the need to add arguments to tableam's
table_tuple_insert(), in addition to the 6 other functions in the call
stack.

I also updated rewriteheap.c to handle the same problem in CLUSTER:

postgres=# DROP TABLE t;
postgres=# CREATE TABLE t AS SELECT i, repeat((555+i)::text, 123456) t
           FROM generate_series(1,)i;
postgres=# VACUUM FULL VERBOSE t;
postgres=# SELECT COUNT(1), datname, coalesce(c.relname, b.relfilenode::text), d.relname
           FROM pg_buffercache b
           LEFT JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
           LEFT JOIN pg_class d ON d.reltoastrelid = c.oid
           LEFT JOIN pg_database db ON db.oid = b.reldatabase
           GROUP BY 2, 3, 4 ORDER BY 1 DESC LIMIT 22;

Unpatched:
  5000 | postgres | pg_toast_96188840   | t
  => 40MB of shared buffers

Patched:
  2048 | postgres | pg_toast_17097  | t

Note that a similar problem seems to exist in COPY ... but I can't see
how to fix that one.
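
To sketch the shape of the change (mirroring the attached patch's idea,
not the patch itself): GetBulkInsertState() sets up a second,
independent buffer ring for the toast relation, so toast insertions
reuse a small set of buffers instead of flooding shared buffers.  The
"toast_state" member is the new field added by the patch; everything
else is as in hio.h today.

#include "postgres.h"
#include "access/hio.h"
#include "storage/bufmgr.h"

static BulkInsertState
GetBulkInsertStateWithToast(void)
{
    BulkInsertState bistate;

    bistate = (BulkInsertState) palloc(sizeof(BulkInsertStateData));
    bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);
    bistate->current_buf = InvalidBuffer;

    /* second, independent ring used by toast_save_datum() */
    bistate->toast_state = (BulkInsertState) palloc(sizeof(BulkInsertStateData));
    bistate->toast_state->strategy = GetAccessStrategy(BAS_BULKWRITE);
    bistate->toast_state->current_buf = InvalidBuffer;
    bistate->toast_state->toast_state = NULL;

    return bistate;
}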

-- 
Justin
>From 90449459a6d8c4f87bfe23f462a45649beed4a8d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 21 Jun 2022 22:28:06 -0500
Subject: [PATCH v3] WIP: use BulkInsertState for toast tuples, too

DONE: ALTER, CLUSTER
TODO: copyto, copyfrom?

slot_getsomeattrs slot_deform_heap_tuple fetchatt
heap_getnextslot => heapgettup => heapgetpage => ReadBufferExtended
initscan
table_beginscan table_scan_getnextslot
RelationCopyStorageUsingBuffer ReadBufferWithoutRelcache

(gdb) bt
 #0  table_open (relationId=relationId@entry=16390, lockmode=lockmode@entry=1) at table.c:40
 #1  0x56444cb23d3c in toast_fetch_datum (attr=attr@entry=0x7f67933cc6cc) at detoast.c:372
 #2  0x56444cb24217 in detoast_attr (attr=attr@entry=0x7f67933cc6cc) at detoast.c:123
 #3  0x56444d07a4c8 in pg_detoast_datum_packed (datum=datum@entry=0x7f67933cc6cc) at fmgr.c:1743
 #4  0x56444d042c8d in text_to_cstring (t=0x7f67933cc6cc) at varlena.c:224
 #5  0x56444d0434f9 in textout (fcinfo=) at varlena.c:573
 #6  0x56444d078f10 in FunctionCall1Coll (flinfo=flinfo@entry=0x56444e4706b0, collation=collation@entry=0, arg1=arg1@entry=140082828592844) at fmgr.c:1124
 #7  0x56444d079d7f in OutputFunctionCall (flinfo=flinfo@entry=0x56444e4706b0, val=val@entry=140082828592844) at fmgr.c:1561
 #8  0x56444ccb1665 in CopyOneRowTo (cstate=cstate@entry=0x56444e470898, slot=slot@entry=0x56444e396d20) at copyto.c:975
 #9  0x56444ccb2c7d in DoCopyTo (cstate=cstate@entry=0x56444e470898) at copyto.c:891
 #10 0x56444ccab4c2 in DoCopy (pstate=pstate@entry=0x56444e396bb0, stmt=stmt@entry=0x56444e3759b0, stmt_location=0, stmt_len=48, processed=processed@entry=0x7ffc212a6310) at copy.c:308

cluster:
heapam_relation_copy_for_cluster
reform_and_rewrite_tuple
rewrite_heap_tuple
raw_heap_insert

//-os-only: linux
---
 src/backend/access/common/toast_internals.c |  6 --
 src/backend/access/heap/heapam.c| 23 +++--
 src/backend/access/heap/heaptoast.c | 11 ++
 src/backend/access/heap/rewriteheap.c   |  6 +-
 src/backend/access/table/toast_helper.c |  6 --
 src/include/access/heaptoast.h  |  4 +++-
 src/include/access/hio.h|  1 +
 src/include/access/toast_helper.h   |  3 ++-
 src/include/access/toast_internals.h|  4 +++-
 9 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 576e585a89f..e5f8a0b7073 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -118,7 +118,8 @@ toast_compress_datum(Datum value, char cmethod)
  */
 Datum
 toast_save_datum(Relation rel, Datum value,
- struct varlena *oldexternal, int options)
+ struct varlena *oldexternal, int options,
+ BulkInsertState bistate)
 {
 	Relation	toastrel;
 	Relation   *toastidxs;
@@ -321,7 +322,8 @@ toast_save_datum(Relation rel, Datum value,
 		memcpy(VARDATA(_data), data_p, chunk_size);
 		toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);
 
-		heap_insert(toastrel, toasttup, mycid, options, NULL);
+		heap_insert(toastrel, toasttup, mycid, options, bistate ?
+bistate->toast_state : NULL);
 
 		/*
 		 * Create the index entry.  We cheat a little here by not using
diff --git 

Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

2022-11-27 Thread Andrey Borodin
On Tue, Nov 8, 2022 at 9:06 PM Andrey Borodin  wrote:
>
> On Thu, Sep 29, 2022 at 3:53 PM Bruce Momjian  wrote:
> >
> > So, what happens when an insufficient number of synchronous replicas
> > reply?
>
> It's a failover.
>
> > Sessions hang because the synchronous behavior cannot be
> > guaranteed.  We then _allow_ query cancel so the user or administrator
> > can get out of the hung sessions and perhaps modify
> > synchronous_standby_names.
>
> Administrators should not modify synchronous_standby_names.
> Administrator must shoot this node in the head.
>

Some funny stuff. If a user tries to cancel a non-replicated transaction,
Azure Postgres will answer: "user requested cancel while waiting for
synchronous replication ack. The COMMIT record has already flushed to
WAL locally and might not have been replicated to the standby. We
must wait here."
AWS RDS will answer: "ignoring request to cancel wait for synchronous
replication"
Yandex Managed Postgres will answer: "canceling wait for synchronous
replication due requested, but cancelation is not allowed. The
transaction has already committed locally and might not have been
replicated to the standby. We must wait here."

So, for many services providing Postgres as a service it's only a
matter of wording.

Best regards, Andrey Borodin.




Re: Use fadvise in wal replay

2022-11-27 Thread Andrey Borodin
On Fri, Nov 25, 2022 at 1:12 PM Pavel Borisov  wrote:
>
> As I've written up in the thread we can not gain much from this
> optimization. The results of Jakub shows around 2% difference:
>
> >baseline, master, default Linux readahead (128kb):
> >33.979, 0.478
> >35.137, 0.504
> >34.649, 0.518>
>
> >master+patched, readahead disabled:
> >34.338, 0.528
> >34.568, 0.575
> >34.007, 1.136
>
> >master+patched, readahead enabled (as default):
> >33.935, 0.523
> >34.109, 0.501
> >33.408, 0.557
>

The performance benefit shows up only when readahead is disabled. And
on many workloads readahead brings unneeded data into the page cache,
so disabling it is the preferred configuration.
In this particular case, time to apply WAL decreases from 53s to 33s.
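
For anyone skimming the thread, the mechanism being discussed is roughly
the following (a sketch only, not the patch itself; fd, off and len stand
for an already-opened WAL segment and the range about to be replayed):

#include <fcntl.h>

/* Advisory hint: ask the kernel to start reading the upcoming WAL range
 * in the background so replay does not block on I/O.  Returns 0 on
 * success or an error number on failure; errno is not set. */
int rc = posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);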

Thanks!

Best Regards, Andrey Borodin.




Re: pglz compression performance, take two

2022-11-27 Thread Andrey Borodin
Hi Tomas,

On Sun, Nov 27, 2022 at 8:02 AM Tomas Vondra
 wrote:
>
> 1) For PGLZ_HISTORY_SIZE it uses literal 0x0fff, with the explanation:
>
> /* to avoid compare in iteration */
>
> which I think means intent to use this value as a bit mask, but then the
> only place using PGLZ_HISTORY_SIZE does
>
> if (hist_next == PGLZ_HISTORY_SIZE) ...
>
> i.e. a comparison. Maybe I misunderstand the comment, though.
>

As far as I recollect, it's a leftover from an attempt to optimize the
code into a branchless version, i.e. instead of

if (hist_next >= PGLZ_HISTORY_SIZE)
    hist_next = 1;

use something like

hist_next = hist_next & PGLZ_HISTORY_SIZE;
But the optimization did not show any measurable impact and was
improperly rolled back.
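
For the record, the trick in question looks like this (a sketch, assuming
the 0x0fff literal now named PGLZ_HISTORY_SIZE is used as a mask):

/* Branchless wrap of a ring-buffer index: 0x0fff is a power of two
 * minus one, so the AND folds any value back into 0..4095 without a
 * compare-and-branch. */
static inline int16
hist_advance(int16 hist_next)
{
    return (hist_next + 1) & 0x0fff;
}

Note that this wraps to 0 rather than 1, while pglz reserves entry 0 as
the invalid sentinel, which may be why dropping the branch needed more
than a one-line change.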

>
> 2) PGLZ_HistEntry was modified and replaces links (pointers) with
> indexes, but the comments still talk about "links", so maybe that needs
> to be changed.

The offsets still form a "linked list"... however I removed some
mentions of pointers, since they are not pointers anymore.

> Also, I wonder why next_id is int16 while hist_idx is
> uint16 (and also why id vs. idx)?

+1. I'd call them next and hash.

int16 next; /* instead of next_id */
uint16 hash; /* instead of hist_idx */

What do you think?
hist_idx comes from the function name... I'm not sure how far renaming
should go here.


>
> 3) minor formatting of comments
>
> 4) the comment in pglz_find_match about traversing the history seems too
> early - it's before handling invalid entries and cleanup, but it does
> not talk about that at all, and the actual while loop is after that.

Yes, this seems right to me.

PFA review fixes (step 1 is unchanged).
I did not include the next_id -> next and hist_idx -> hash renames.

Thank you!


Best regards, Andrey Borodin.


v7-0002-Review-fixes.patch
Description: Binary data


v7-0001-Reorganize-pglz-compression-code.patch
Description: Binary data


Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2022-11-27 Thread Peter Geoghegan
On Sun, Nov 27, 2022 at 8:54 AM Laurenz Albe  wrote:
> That's exactly what I was trying to debate.  Wouldn't it make sense to
> trigger VACUUM earlier so that it has a chance of being less heavy?
> On the other hand, if there are not sufficiently many modifications
> on the table to trigger autovacuum, perhaps it doesn't matter in many
> cases.

Maybe. There is a deeper problem here, though: table age is a really
terrible proxy for whether or not it's appropriate for VACUUM to
freeze preexisting all-visible pages. It's not obvious that half
autovacuum_freeze_max_age is much better than
autovacuum_freeze_max_age if your concern is avoiding getting too far
into debt on freezing. After all, this is debt that must be paid back
by freezing some number of physical heap pages, which in general has
approximately zero relationship with table age (we need physical units
for this, not logical units).

This is a long-standing problem that I hope and expect will be fixed
in 16, by my ongoing work to completely remove the concept of
aggressive mode VACUUM:

https://commitfest.postgresql.org/40/3843/

This makes VACUUM care about both table age and the number of unfrozen
heap pages (mostly the latter). It weighs everything at the start of
each VACUUM, and decides on how it must advance relfrozenxid based on
the conditions in the table and the picture over time. Note that
performance stability is the main goal; we will not just keep
accumulating unfrozen pages for no good reason. All of the behaviors
previously associated with aggressive mode are retained, but are
individually applied on a timeline that is attuned to the needs of the
table (we can still wait for a cleanup lock, but that happens much
later than the point that the same page first becomes eligible for
freezing, not at exactly the same time).

In short, "aggressiveness" becomes a continuous thing, rather than a
discrete mode of operation, improving performance stability. We go
back to having only one kind of lazy vacuum, which is how things
worked prior to the introduction of the visibility map. (We did have
antiwraparound autovacuums in 8.3, but we did not have
aggressive/scan_all VACUUMs at the time.)

> Is that really so much less aggressive?  Will that autovacuum run want
> to process all pages that are not all-frozen?  If not, it probably won't
> do much good.  If yes, it will be just as heavy as an anti-wraparound
> autovacuum (except that it won't block other sessions).

Even if we assume that my much bigger patch set won't make it into 16,
it'll probably still be a good idea to do this in 16. I admit that I
haven't really given that question enough thought to be sure of that,
though. Naturally my goal is to get everything in. Hopefully I'll
never have to make that call.

It is definitely true that this patch is "the autovacuum side" of the
work from the other much larger patchset (which handles "the VACUUM
side" of things). This antiwraparound patch should probably be
considered in that context, even though it's theoretically independent
work. It just worked out that way.

> True.  On the other hand, it might happen that after this, people start
> worrying about normal autovacuum runs because they occasionally experience
> a table age autovacuum that is much heavier than the other ones.  And
> they can no longer tell the reason, because it doesn't show up anywhere.

But you can tell the reason, just by looking at the autovacuum log
reports. The only thing you can't do is see "(to prevent wraparound)"
in pg_stat_activity. That (and the autocancellation behavioral change)
are the only differences.

The big picture is that users really will have no good reason to care
very much about autovacuums that were triggered to advance
relfrozenxid (at least in the common case where we haven't needed to
make them antiwraparound autovacuums). They could almost (though not
quite) now be explained as "an autovacuum that takes place because
it's been a while since we did an autovacuum to deal with bloat and/or
tuple inserts". That will at least be reasonable if you assume all of
the patches get in.

-- 
Peter Geoghegan




Re: Add tracking of backend memory allocated to pg_stat_activity

2022-11-27 Thread Ted Yu
On Sun, Nov 27, 2022 at 7:41 AM Justin Pryzby  wrote:

> On Sun, Nov 27, 2022 at 12:32:19AM -0500, Reid Thompson wrote:
> > @@ -32,6 +33,12 @@ typedef enum BackendState
> >   STATE_DISABLED
> >  } BackendState;
> >
> > +/* Enum helper for reporting memory allocated bytes */
> > +enum allocation_direction
> > +{
> > + DECREASE = -1,
> > + INCREASE = 1,
> > +};
>
> BTW, these should have some kind of prefix, like PG_ALLOC_* to avoid
> causing the same kind of problem for someone else that another header
> caused for you by defining something somewhere called IGNORE (ignore
> what, I don't know).  The other problem was probably due to a define,
> though.  Maybe instead of an enum, the function should take a boolean.
>
> I still wonder whether there needs to be a separate CF entry for the
> 0001 patch.  One issue is that there's two different lists of people
> involved in the threads.
>
> --
> Justin
>
>
I am a bit curious: why is the allocation_direction enum needed?

pgstat_report_allocated_bytes() can be given the amount (either negative or
positive) to adjust directly.
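
In other words, something like this hypothetical signature (a sketch
only; allocated_bytes is the counter the patch adds to the backend
status entry, and locking/ordering details are omitted):

/* Take a signed delta instead of (amount, direction): callers pass a
 * negative value to decrease the counter and a positive one to
 * increase it. */
static inline void
pgstat_report_allocated_bytes(int64 delta)
{
    MyBEEntry->allocated_bytes += delta;    /* illustrative only */
}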

Cheers


Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2022-11-27 Thread Laurenz Albe
On Sat, 2022-11-26 at 11:00 -0800, Peter Geoghegan wrote:

> > I think that is a good idea.
> > Wouldn't it make sense to trigger that at *half* 
> > "autovacuum_freeze_max_age"?
> 
> That would be equivalent to what I've done here, provided you also
> double the autovacuum_freeze_max_age setting.

That's exactly what I was trying to debate.  Wouldn't it make sense to
trigger VACUUM earlier so that it has a chance of being less heavy?
On the other hand, if there are not sufficiently many modifications
on the table to trigger autovacuum, perhaps it doesn't matter in many
cases.

> I did it this way
> because I believe that it has fewer problems. The approach I took
> makes the general perception that antiwraparound autovacuum are a
> scary thing (really just needed for emergencies) become true, for the
> first time.
> 
> We should expect to see very few antiwraparound autovacuums with the
> patch, but when we do see even one it'll be after a less aggressive
> approach was given the opportunity to succeed, but (for whatever
> reason) failed.

Is that really so much less aggressive?  Will that autovacuum run want
to process all pages that are not all-frozen?  If not, it probably won't
do much good.  If yes, it will be just as heavy as an anti-wraparound
autovacuum (except that it won't block other sessions).

> Just seeing any antiwraparound autovacuums will become
> a useful signal of something being amiss in a way that it just isn't
> at the moment. The docs can be updated to talk about antiwraparound
> autovacuum as a thing that you should encounter approximately never.
> This is possible even though the patch isn't invasive at all.

True.  On the other hand, it might happen that after this, people start
worrying about normal autovacuum runs because they occasionally experience
a table age autovacuum that is much heavier than the other ones.  And
they can no longer tell the reason, because it doesn't show up anywhere.

Yours,
Laurenz Albe




Re: pglz compression performance, take two

2022-11-27 Thread Tomas Vondra
On 11/27/22 17:02, Tomas Vondra wrote:
> Hi,
> 
> I took a look at the v6 patch, with the intention to get it committed. I
> have a couple minor comments:
> 
> 1) For PGLZ_HISTORY_SIZE it uses literal 0x0fff, with the explanation:
> 
> /* to avoid compare in iteration */
> 
> which I think means intent to use this value as a bit mask, but then the
> only place using PGLZ_HISTORY_SIZE does
> 
> if (hist_next == PGLZ_HISTORY_SIZE) ...
> 
> i.e. a comparison. Maybe I misunderstand the comment, though.
> 
> 
> 2) PGLZ_HistEntry was modified and replaces links (pointers) with
> indexes, but the comments still talk about "links", so maybe that needs
> to be changed. Also, I wonder why next_id is int16 while hist_idx is
> uint16 (and also why id vs. idx)?
> 
> 3) minor formatting of comments
> 
> 4) the comment in pglz_find_match about traversing the history seems too
> early - it's before handling invalid entries and cleanup, but it does
> not talk about that at all, and the actual while loop is after that.
> 
> Attached is v6 in 0001 (verbatim), with the review comments in 0002.
> 

BTW I've switched this to WoA, but the comments should be trivial to
resolve and to get it committed.

Also, I still see roughly 15-20% improvement on some compression-heavy
tests, as reported before. Which is nice.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pglz compression performance, take two

2022-11-27 Thread Tomas Vondra
Hi,

I took a look at the v6 patch, with the intention to get it committed. I
have a couple minor comments:

1) For PGLZ_HISTORY_SIZE it uses literal 0x0fff, with the explanation:

/* to avoid compare in iteration */

which I think means intent to use this value as a bit mask, but then the
only place using PGLZ_HISTORY_SIZE does

if (hist_next == PGLZ_HISTORY_SIZE) ...

i.e. a comparison. Maybe I misunderstand the comment, though.


2) PGLZ_HistEntry was modified and replaces links (pointers) with
indexes, but the comments still talk about "links", so maybe that needs
to be changed. Also, I wonder why next_id is int16 while hist_idx is
uint16 (and also why id vs. idx)?

3) minor formatting of comments

4) the comment in pglz_find_match about traversing the history seems too
early - it's before handling invalid entries and cleanup, but it does
not talk about that at all, and the actual while loop is after that.

Attached is v6 in 0001 (verbatim), with the review comments in 0002.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 5aee164e5d6ec716e607751d179755a35969a333 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 27 Nov 2022 15:02:27 +0100
Subject: [PATCH 1/2] Reorganize pglz compression code

This patch accumulates several changes:
1. Convert macro-functions to regular functions for readability
2. Use more compact hash table with uint16 indexes instead of pointers
3. Avoid prev pointer in hash table
4. Use 4-byte comparisons in a search instead of 1-byte
Total speedup about x1.4
---
 src/common/pg_lzcompress.c | 432 +
 1 file changed, 242 insertions(+), 190 deletions(-)

diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c
index 9de5d5a0d43..248545a108e 100644
--- a/src/common/pg_lzcompress.c
+++ b/src/common/pg_lzcompress.c
@@ -118,13 +118,13 @@
  *			our 4K maximum look-back distance is too small.
  *
  *			The compressor creates a table for lists of positions.
- *			For each input position (except the last 3), a hash key is
+ *			For each input position (except the last 4), a hash key is
  *			built from the 4 next input bytes and the position remembered
  *			in the appropriate list. Thus, the table points to linked
  *			lists of likely to be at least in the first 4 characters
  *			matching strings. This is done on the fly while the input
  *			is compressed into the output area.  Table entries are only
- *			kept for the last 4096 input positions, since we cannot use
+ *			kept for the last 4094 input positions, since we cannot use
  *			back-pointers larger than that anyway.  The size of the hash
  *			table is chosen based on the size of the input - a larger table
  *			has a larger startup cost, as it needs to be initialized to
@@ -146,17 +146,15 @@
  *			used for the next tag in the output.
  *
  *			For each subsequent entry in the history list, the "good_match"
- *			is lowered by 10%. So the compressor will be more happy with
- *			short matches the further it has to go back in the history.
+ *			is lowered roughly by 10%. So the compressor will be more happy
+ *			with short matches the further it has to go back in the history.
  *			Another "speed against ratio" preference characteristic of
  *			the algorithm.
  *
- *			Thus there are 3 stop conditions for the lookup of matches:
+ *			Thus there are 2 stop conditions for the lookup of matches:
  *
  *- a match >= good_match is found
  *- there are no more history entries to look at
- *- the next history entry is already too far back
- *  to be coded into a tag.
  *
  *			Finally the match algorithm checks that at least a match
  *			of 3 or more bytes has been found, because that is the smallest
@@ -187,13 +185,12 @@
 
 #include "common/pg_lzcompress.h"
 
-
 /* --
  * Local definitions
  * --
  */
 #define PGLZ_MAX_HISTORY_LISTS	8192	/* must be power of 2 */
-#define PGLZ_HISTORY_SIZE		4096
+#define PGLZ_HISTORY_SIZE		0x0fff	/* to avoid compare in iteration */
 #define PGLZ_MAX_MATCH			273
 
 
@@ -202,17 +199,16 @@
  *
  *		Linked list for the backward history lookup
  *
- * All the entries sharing a hash key are linked in a doubly linked list.
- * This makes it easy to remove an entry when it's time to recycle it
- * (because it's more than 4K positions old).
+ * All the entries sharing a hash key are linked in a singly linked list.
+ * Links are not changed during insertion in order to speed it up.
+ * Instead more complicated stop condition is used during list iteration.
  * --
  */
 typedef struct PGLZ_HistEntry
 {
-	struct PGLZ_HistEntry *next;	/* links for my hash key's list */
-	struct PGLZ_HistEntry *prev;
-	int			hindex;			/* my current hash key */
-	const char *pos;			/* my input position */
+	int16		next_id;		/* links for my hash key's list */
+	uint16		hist_idx;		/* my current hash key */
+	const unsigned char *pos;	/* my input position */
 } PGLZ_HistEntry;
 
 

Re: Add tracking of backend memory allocated to pg_stat_activity

2022-11-27 Thread Justin Pryzby
On Sun, Nov 27, 2022 at 12:32:19AM -0500, Reid Thompson wrote:
> @@ -32,6 +33,12 @@ typedef enum BackendState
>   STATE_DISABLED
>  } BackendState;
>  
> +/* Enum helper for reporting memory allocated bytes */
> +enum allocation_direction
> +{
> + DECREASE = -1,
> + INCREASE = 1,
> +};

BTW, these should have some kind of prefix, like PG_ALLOC_* to avoid
causing the same kind of problem for someone else that another header
caused for you by defining something somewhere called IGNORE (ignore
what, I don't know).  The other problem was probably due to a define,
though.  Maybe instead of an enum, the function should take a boolean.
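
Concretely, either of these would avoid the collision (a sketch; the
names are illustrative, not from the patch):

/* Option 1: prefixed enumerators, so another header's macros can't
 * clash with DECREASE/INCREASE. */
enum pg_allocation_direction
{
    PG_ALLOC_DECREASE = -1,
    PG_ALLOC_INCREASE = 1,
};

/* Option 2: drop the enum entirely and take a boolean. */
void pgstat_report_allocated_bytes(int64 allocated_bytes, bool increase);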

I still wonder whether there needs to be a separate CF entry for the
0001 patch.  One issue is that there's two different lists of people
involved in the threads.

-- 
Justin




Re: Add tracking of backend memory allocated to pg_stat_activity

2022-11-27 Thread Ted Yu
On Sat, Nov 26, 2022 at 9:32 PM Reid Thompson 
wrote:

> On Sat, 2022-11-26 at 22:10 -0500, Reid Thompson wrote:
> > Code rebased to current master.
> > Updated to incorporate additional recommendations from the the list
> >- add units to variables in view
> >- remove 'backend_' prefix from variables/functions
> >- update documentation
> >- add functional test for allocated_bytes
> >- refactor allocation reporting to reduce number of functions and
> >  branches/reduce performance hit
> >- zero allocated bytes after fork to avoid double counting
> > postmaster allocations
> >
> >
> >
> >
>
> attempt to remedy cfbot windows build issues
>
>
Hi,

+   if (request_size > *mapped_size)
+   {
+       pgstat_report_allocated_bytes(*mapped_size, DECREASE);
+       pgstat_report_allocated_bytes(request_size, INCREASE);

pgstat_report_allocated_bytes() is called twice for this case. Can the two
calls be combined into one (with request_size - *mapped_size, INCREASE)?
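
That is, something like this (a sketch of the suggested simplification,
keeping the patch's current enum-based signature):

+   if (request_size > *mapped_size)
+       pgstat_report_allocated_bytes(request_size - *mapped_size, INCREASE);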

Cheers


Re: Documentation for building with meson

2022-11-27 Thread Justin Pryzby
Thanks; two more things I saw:

 - In other docs,  isn't used around { a | b } lists:
   git grep '[^<]*|' doc
 - I think this is(was) missing a word;
   Setting this option allows you to override THE value of all 'auto' features

-- 
Justin




Re: Add tracking of backend memory allocated to pg_stat_activity

2022-11-27 Thread Justin Pryzby
On Sun, Nov 27, 2022 at 12:32:19AM -0500, Reid Thompson wrote:
> attempt to remedy cfbot windows build issues

You can trigger those tests under your own/private repo by pushing a
branch to github.  See src/tools/ci/README

I suppose the cfbot page ought to point that out.

-- 
Justin




Re: MSVC vs Perl

2022-11-27 Thread Andrew Dunstan


On 2022-11-26 Sa 16:25, Andrew Dunstan wrote:
> On 2022-11-26 Sa 16:05, Andres Freund wrote:
>> Hi,
>>
>> On 2022-11-26 09:43:19 -0500, Andrew Dunstan wrote:
>>> OK, so this cures the problem for drongo:
>>>
>>>
>>> diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
>>> index 83a3e40425..dc6b94b74f 100644
>>> --- a/src/tools/msvc/Mkvcbuild.pm
>>> +++ b/src/tools/msvc/Mkvcbuild.pm
>>> @@ -707,6 +707,7 @@ sub mkvcbuild
>>>     print "CFLAGS recommended by Perl: $Config{ccflags}\n";
>>>     print "CFLAGS to compile embedded Perl: ",
>>>   (join ' ', map { "-D$_" } @perl_embed_ccflags), "\n";
>>> +   push @perl_embed_ccflags,'NO_THREAD_SAFE_LOCALE';
>>>     foreach my $f (@perl_embed_ccflags)
>>>     {
>>>     $plperl->AddDefine($f);
>> This likely is just a test patch; in case it is not, it seems we should add
>> NO_THREAD_SAFE_LOCALE to @perl_embed_ccflags before printing it.
> Sure



OK, pushed something like that, after testing that it didn't break my
remaining ActiveState instance.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Improve tab completion for ALTER EXTENSION ADD/DROP

2022-11-27 Thread vignesh C
Hi,

Tab completion for ALTER EXTENSION ADD and DROP was missing; this
patch adds it.

Regards,
Vignesh
From 35e9e8bd22ae8767baa95a04c41ee8f68a9c5338 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sat, 26 Nov 2022 20:18:58 +0530
Subject: [PATCH v1] Missing tab completion for ALTER EXTENSION ADD/DROP

Missing tab completion for ALTER EXTENSION ADD/DROP
---
 src/bin/psql/tab-complete.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 13014f074f..9c3bc751dd 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1937,6 +1937,27 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches("ALTER", "EXTENSION", MatchAny))
 		COMPLETE_WITH("ADD", "DROP", "UPDATE", "SET SCHEMA");
 
+	/* ALTER EXTENSION  ADD|DROP */
+	else if (Matches("ALTER", "EXTENSION", MatchAny, "ADD|DROP"))
+		COMPLETE_WITH("ACCESS METHOD", "AGGREGATE", "CAST", "COLLATION",
+	  "CONVERSION", "DOMAIN", "EVENT TRIGGER", "FOREIGN",
+	  "FUNCTION", "MATERIALIZED VIEW", "OPERATOR",
+	  "PROCEDURAL LANGUAGE", "PROCEDURE", "LANGUAGE",
+	  "ROUTINE", "SCHEMA", "SEQUENCE", "SERVER", "TABLE",
+	  "TEXT SEARCH", "TRANSFORM FOR", "TYPE", "VIEW");
+
+	/* ALTER EXTENSION  ADD|DROP FOREIGN*/
+	else if (Matches("ALTER", "EXTENSION", MatchAny, "ADD|DROP", "FOREIGN"))
+		COMPLETE_WITH("DATA WRAPPER", "TABLE");
+
+	/* ALTER EXTENSION  ADD|DROP OPERATOR*/
+	else if (Matches("ALTER", "EXTENSION", MatchAny, "ADD|DROP", "OPERATOR"))
+		COMPLETE_WITH("CLASS", "FAMILY");
+
+	/* ALTER EXTENSION  ADD|DROP TEXT SEARCH*/
+	else if (Matches("ALTER", "EXTENSION", MatchAny, "ADD|DROP", "TEXT", "SEARCH"))
+		COMPLETE_WITH("CONFIGURATION", "DICTIONARY", "PARSER", "TEMPLATE");
+
 	/* ALTER EXTENSION  UPDATE */
 	else if (Matches("ALTER", "EXTENSION", MatchAny, "UPDATE"))
 		COMPLETE_WITH("TO");
-- 
2.34.1



Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-27 Thread Ian Lawrence Barwick
2022年11月27日(日) 20:04 Julien Rouhaud :
>
> On Sun, Nov 27, 2022 at 03:39:44PM +0800, Julien Rouhaud wrote:
> > On Sun, 27 Nov 2022 at 15:31, Ian Lawrence Barwick
> >
> > >
> > > I'm trying to reconcile open CommitFest entries with their actual
> > > status; the entry for this:
> > >
> > >   https://commitfest.postgresql.org/40/3558/
> > >
> > > shows as "Waiting on Author", but looks like it's all been committed;
> > > is there anything
> > > left to do on this?
> > >
> >
> > right the CF entry is out of date. there's still the additional tap test
> > that needs to be taken care of. I sent a new version recently that works on
> > windows CI, so I guess the correct status should now be needs review,
> > although the latest patchset probably doesn't apply anymore since the first
> > patch has been committed. I'm traveling today I'll try to send a rebased
> > version in the evening
>
> And here's the rebased patch for the TAP tests.  I will switch the CF entry to
> Needs Review.

Thanks for the quick update!

Regards

Ian Barwick




Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-27 Thread Julien Rouhaud
On Sun, Nov 27, 2022 at 03:39:44PM +0800, Julien Rouhaud wrote:
> On Sun, 27 Nov 2022 at 15:31, Ian Lawrence Barwick
> 
> >
> > I'm trying to reconcile open CommitFest entries with their actual
> > status; the entry for this:
> >
> >   https://commitfest.postgresql.org/40/3558/
> >
> > shows as "Waiting on Author", but looks like it's all been committed;
> > is there anything
> > left to do on this?
> >
> 
> right the CF entry is out of date. there's still the additional tap test
> that needs to be taken care of. I sent a new version recently that works on
> windows CI, so I guess the correct status should now be needs review,
> although the latest patchset probably doesn't apply anymore since the first
> patch has been committed. I'm traveling today I'll try to send a rebased
> version in the evening

And here's the rebased patch for the TAP tests.  I will switch the CF entry to
Needs Review.
>From 8c0c6f9f477c8f1a28d475267f3bdca59c84ed86 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 24 Nov 2022 14:20:22 +0800
Subject: [PATCH v23] Add regression tests for file inclusion in HBA and ident
 configuration files

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya@jrouhaud
---
 src/test/authentication/meson.build   |   1 +
 .../authentication/t/004_file_inclusion.pl| 722 ++
 2 files changed, 723 insertions(+)
 create mode 100644 src/test/authentication/t/004_file_inclusion.pl

diff --git a/src/test/authentication/meson.build 
b/src/test/authentication/meson.build
index c2b48c43c9..cfc23fa213 100644
--- a/src/test/authentication/meson.build
+++ b/src/test/authentication/meson.build
@@ -7,6 +7,7 @@ tests += {
   't/001_password.pl',
   't/002_saslprep.pl',
   't/003_peer.pl',
+  't/004_file_inclusion.pl',
 ],
   },
 }
diff --git a/src/test/authentication/t/004_file_inclusion.pl 
b/src/test/authentication/t/004_file_inclusion.pl
new file mode 100644
index 00..1be807c0a2
--- /dev/null
+++ b/src/test/authentication/t/004_file_inclusion.pl
@@ -0,0 +1,722 @@
+
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# Set of tests for authentication and pg_hba.conf inclusion.
+# This test can only run with Unix-domain sockets.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use Time::HiRes qw(usleep);
use IPC::Run qw(pump finish timer);
+use Data::Dumper;
+
+if (!$use_unix_sockets)
+{
+   plan skip_all =>
+ "authentication tests cannot run without Unix-domain sockets";
+}
+
+# stores the current line counter for each file.  hba_rule and ident_rule are
+# fake file names used for the global rule number for each auth view.
+my %cur_line = ('hba_rule' => 1, 'ident_rule' => 1);
+
+my $hba_file   = 'subdir1/pg_hba_custom.conf';
+my $ident_file = 'subdir2/pg_ident_custom.conf';
+
+# Initialize primary node
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->start;
+
+my $data_dir = $node->data_dir;
+
+# Normalize the data directory for Windows
+$data_dir =~ s/\/\.\//\//g;# reduce /./ to /
+$data_dir =~ s/\/\//\//g;  # reduce // to /
+$data_dir =~ s/\/$//;  # remove trailing /
+$data_dir =~ s/\\/\//g;# change \ to /
+
+
+# Add the given payload to the given relative HBA file of the given node.
+# This function maintains the %cur_line metadata, so it has to be called in the
+# expected inclusion evaluation order in order to keep it in sync.
+#
+# If the payload starts with "include" or "ignore", the function doesn't
+# increase the general hba rule number.
+#
+# If an err_str is provided, it returns an arrayref containing the provided
+# filename, the current line number in that file and the provided err_str.  The
+# err_str has to be a valid regex string.
+# Otherwise it only returns the line number of the payload in the wanted file.
+# This function has to be called in the expected inclusion evaluation order to
+# keep the %cur_line information in sync.
+sub add_hba_line
+{
+   my $node = shift;
+   my $filename = shift;
+   my $payload  = shift;
+   my $err_str  = shift;
+   my $globline;
+   my $fileline;
+   my @tokens;
+   my $line;
+
+   # Append the payload to the given file
+   $node->append_conf($filename, $payload);
+
+   # Get the current %cur_line counter for the file
+   if (not defined $cur_line{$filename})
+   {
+   $cur_line{$filename} = 1;
+   }
+   $fileline = $cur_line{$filename}++;
+
+   # Include directive, don't generate an underlying pg_hba_file_rules line
+   # but make sure we incremented the %cur_line counter.
+   # Also ignore lines beginning with "ignore", for content of files that
+   # should not be included
+   if ($payload =~ qr/^(include|ignore)/)
+   {
+   if (defined $err_str)
+   {
+   

Re: TAP output format in pg_regress

2022-11-27 Thread Nikolay Shaplov
In a message of Saturday, 26 November 2022 23:35:45 MSK, Daniel
Gustafsson wrote:
> > + #define bail_noatexit(...) bail_out(true, __VA_ARGS__)
> > 
> > BTW what does "noat" stand for? I thought it was a typo too :-) and
> > was originally meant to be "not".
> 
> Calling _exit() will cause exit handler functions registered with atexit()
> to not be invoked, so the noatexit spelling was intentional.

I've read some man pages:

The function _exit() terminates the calling process "immediately".

I guess, "immediately" is a good word here that very precisely describes what 
is happening here. 

I would suggest using the word "immediate" instead of "noatexit". This
would make the code much more sensible to me.

---
/*
 * Bailing out is for unrecoverable errors which prevent further testing
 * from occurring and after which the test run should be aborted. By
 * passing immediate as true, the process will terminate with _exit()
 * instead of exit(). This skips registered exit handlers, thus avoiding
 * possible infinite recursion while exiting.
 */
static void
bail_out(bool immediate, const char *fmt,...)
{
va_list ap;

va_start(ap, fmt);
emit_tap_output_v(BAIL, fmt, ap);
va_end(ap);

if (immediate)
_exit(2);

exit(2);
}

#define bail_immediate(...) bail_out(true, __VA_ARGS__)
#define bail(...)   bail_out(false, __VA_ARGS__)
---

I've also rewritten the comment the way I would understand it better if I
read it for the first time. I am not sure about my English, but the key
features are:

- "terminate process" instead of "exit". Too many "exit"s in the sentence;
better to use synonyms wherever possible.
- "_exit() instead of exit()" is a more accurate description of what is
happening here.
- Split the sentence into two. First sentence: what it does. Second
sentence: why it does so.
- Added the word "infinite" before "recursion". Recursion is not a bad
thing; infinite recursion is. This explicitly states what we are trying
to avoid.

==

> The diff algorithm decided that this was the compact way of displaying the
> unified diff, probably because too many lines in proximity changed.
When a line is not changed it is never added to a change block by diff... I
guess this part should look like this:

-   snprintf(buf, sizeof(buf),
-_(" All %d tests passed. "),
+ note(_("All %d tests passed.\n"), 
success_count);
   else if (fail_count == 0)   /* fail_count=0, fail_ignore_count>0 */
-   snprintf(buf, sizeof(buf),
-_(" %d of %d tests passed, %d failed test(s) 
ignored. "),
+   note(_("%d of %d tests passed, %d failed test(s) ignored.\n"),
success_count,
success_count + fail_ignore_count,
fail_ignore_count);
   else if (fail_ignore_count == 0)/* fail_count>0 && 
fail_ignore_count=0 */
-   snprintf(buf, sizeof(buf),
-_(" %d of %d tests failed. "),
+   diag(_("%d of %d tests failed.\n"),
fail_count,
success_count + fail_count);

Working with my students, I usually insist that they provide such patches.
 
> While
> avoiding moving the comments to before the line might mitigate that somewhat
> I prefer this greatly to a slightly easier to read diff.

I do not quite understand what you are trying to achieve here, or what
stands behind "prefer", but if it is important to you, I will no longer
insist.

==
> No, they are aligned in such a way because they are running outside of a
> parallel group.  Note that it's not part of the "parallel group" note
> preceding the tests:

> In the previous format it's a bit clearer, and maybe we should adopt that
> for
> TAP as well?

> That would if so make the output something like the below.  Personally I
> think the "test" prefix adds little value since everything printed are test
> suites, and we are already today using indentation for grouping parallel
> tests.

So this extra offset indicates that the test is included in a parallel
group? I guess it's not really obvious...

I am not sure I have a clear idea what can be done here.

Maybe in some ideal world I would give each group a name. Like:

ok 8    types.int2       45 ms
ok 9    types.int4       73 ms
ok 10   types.int8       91 ms
ok 11   types.oid        47 ms
ok 12   types.float4     88 ms
ok 13   types.float8    139 ms
ok 14   types.bit       165 ms
ok 15