Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-04-16 Thread Michael Paquier
On Tue, Apr 16, 2024 at 02:19:56PM +0900, Michael Paquier wrote:
> Actually, I've come up with an idea just after hitting the send
> button: let's use an extra ALTER TABLE SET ACCESS METHOD rather than
> rely on the GUC to set the AM of the partitioned table correctly.
> This extra command should be optional, depending on
> --no-table-access-method.  If a partitioned table has 0 as relam,
> let's not add this extra ALTER TABLE at all.

I have explored this idea, and while it is tempting it faces a
couple of challenges:
1) Binary upgrades would fail because the table rewrite created by
ALTER TABLE SET ACCESS METHOD for relkinds with physical storage
expects heap_create_with_catalog to use a fixed OID, but the rewrite
would require extra steps to be able to handle that, and I am not
convinced that adding more binary_upgrade_set_next_heap_relfilenode()
calls is a good idea.
2) We could limit these extra ALTER TABLE commands to be generated for
partitioned tables.  This is kind of confusing as resulting dumps
would mix SET commands for default_table_access_method that would
affect tables with physical storage, while partitioned tables would
have their own extra ALTER TABLE commands.  Another issue that needs
more consideration is that TocEntrys don't hold any relkind
information so pg_backup_archiver.c cannot make a difference with
tables and partitioned tables to select if SET or ALTER TABLE should
be generated.

Several designs are possible, like:
- Mix SET and ALTER TABLE commands in the dumps to set the AM, SET for
tables and matviews, ALTER TABLE for relations without storage.  This
would bypass the binary upgrade problem with the fixed relid.
- Use only SET, requiring a new "default" value for
default_table_access_method that would force a partitioned table's
relam to be 0.  Be stricter with the "current" table AM tracked in
pg_dump's backup archiver.
- Use only ALTER TABLE commands, with extra binary upgrade tweaks to
force relation OIDs for the second heap_create_with_catalog() done
with the rewrite to update a relation's AM.

With all that in mind, it may be better to revert 374c7a229042 and
e2395cdbe83a from HEAD and reconsider how to tackle the dump issues in
v18 or newer versions, as all of the approaches I can think of lead to
complications of their own.

Please find attached an unpolished POC that switches dumps to use
ALTER TABLE, which I've used to detect the upgrade problems.

Thoughts or comments are welcome.
--
Michael
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index c7a6c918a6..93df260986 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -61,7 +61,7 @@ static void _becomeUser(ArchiveHandle *AH, const char *user);
 static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
 static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
 static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
-static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam);
+static void _selectTableAccessMethod(ArchiveHandle *AH, TocEntry *te);
 static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
 static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
 static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -2373,7 +2373,6 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
 	AH->currUser = NULL;		/* unknown */
 	AH->currSchema = NULL;		/* ditto */
 	AH->currTablespace = NULL;	/* ditto */
-	AH->currTableAm = NULL;		/* ditto */
 
 	AH->toc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
 
@@ -3356,9 +3355,6 @@ _reconnectToDB(ArchiveHandle *AH, const char *dbname)
 	free(AH->currSchema);
 	AH->currSchema = NULL;
 
-	free(AH->currTableAm);
-	AH->currTableAm = NULL;
-
 	free(AH->currTablespace);
 	AH->currTablespace = NULL;
 
@@ -3519,31 +3515,32 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
 }
 
 /*
- * Set the proper default_table_access_method value for the table.
+ * Set the proper default table access method for the table.
  */
 static void
-_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+_selectTableAccessMethod(ArchiveHandle *AH, TocEntry *te)
 {
 	RestoreOptions *ropt = AH->public.ropt;
+	const char *tableam = te->tableam;
 	PQExpBuffer cmd;
-	const char *want,
-			   *have;
 
 	/* do nothing in --no-table-access-method mode */
 	if (ropt->noTableAm)
 		return;
 
-	have = AH->currTableAm;
-	want = tableam;
-
-	if (!want)
-		return;
-
-	if (have && strcmp(want, have) == 0)
+	if (!tableam)
 		return;
 
 	cmd = createPQExpBuffer();
-	appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", fmtId(want));
+
+	appendPQExpBufferStr(cmd, "ALTER ");
+	if (strcmp(te->desc, "MATERIALIZED VIEW") == 0)
+		appendPQExpBufferStr(cmd, "MATERIALIZED VIEW ");
+	else
+		appendPQExpBufferStr(cmd, "TABLE ");
+	appendPQExpBuffer(cmd, "%s ", fmtQualifiedId(te->namespace, te->tag));
+	appendPQExpBuffer(cmd, 

Re: Extract numeric field in JSONB more effectively

2024-04-16 Thread Andy Fan

Andy Fan  writes:

> Here is the latest version; nothing changed besides the rebase to the
> latest master. The most recent 3 questions should be addressed:
>
> - The error message compatibility issue [1] and Peter's answer at [2].
> - Peter's new question at [2] and my answer at [3].
>
> Any effort to move this patch ahead is welcome and thanks to all the
> people who provided high quality feedback so far, especially Chapman!
>
> [1] https://www.postgresql.org/message-id/87r0hmvuvr@163.com
> [2]
> https://www.postgresql.org/message-id/8102ff5b-b156-409e-a48f-e53e63a39b36%40eisentraut.org
> [3] https://www.postgresql.org/message-id/8734t6c5rh.fsf%40163.com

rebase to the latest master again.

commit bc990b983136ef658cd3be03cdb07f2eadc4cd5c (HEAD -> jsonb_numeric)
Author: yizhi.fzh 
Date:   Mon Apr 1 09:36:08 2024 +0800

Improve the performance of Jsonb numeric/bool extraction.

JSONB uses a binary-compatible numeric format with the numeric data
type in SQL. However, in the past, extracting a numeric value from a
JSONB field still needed to find the corresponding JsonbValue first,
then convert the JsonbValue to Jsonb, and finally use the cast system to
convert the Jsonb to a Numeric data type. This approach was very
inefficient in terms of performance.

With the current patch, the JsonbValue is converted to the numeric data
type directly.  This is done by a planner support function which
detects the above case and simplifies the expression.  Because the
boolean type and numeric type share certain similarities in their
attributes, we have implemented the same optimization approach for
both.  In the ideal test case, the performance can be 2x better than
before.

The optimized functions and operators include:
1. jsonb_object_field / ->
2. jsonb_array_element / ->
3. jsonb_extract_path / #>
4. jsonb_path_query
5. jsonb_path_query_first

-- 
Best Regards
Andy Fan

>From bc990b983136ef658cd3be03cdb07f2eadc4cd5c Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" 
Date: Mon, 1 Apr 2024 09:36:08 +0800
Subject: [PATCH v18 1/1] Improve the performance of Jsonb numeric/bool
 extraction.

JSONB uses a binary-compatible numeric format with the numeric data
type in SQL. However, in the past, extracting a numeric value from a
JSONB field still needed to find the corresponding JsonbValue first,
then convert the JsonbValue to Jsonb, and finally use the cast system to
convert the Jsonb to a Numeric data type. This approach was very
inefficient in terms of performance.

With the current patch, the JsonbValue is converted to the numeric data
type directly.  This is done by a planner support function which
detects the above case and simplifies the expression.  Because the
boolean type and numeric type share certain similarities in their
attributes, we have implemented the same optimization approach for
both.  In the ideal test case, the performance can be 2x better than
before.

The optimized functions and operators include:
1. jsonb_object_field / ->
2. jsonb_array_element / ->
3. jsonb_extract_path / #>
4. jsonb_path_query
5. jsonb_path_query_first
---
 src/backend/utils/adt/jsonb.c | 206 ++
 src/backend/utils/adt/jsonbsubs.c |   4 +-
 src/backend/utils/adt/jsonfuncs.c | 123 ++-
 src/backend/utils/adt/jsonpath_exec.c |  32 +++-
 src/include/catalog/pg_proc.dat   |  46 +-
 src/include/utils/jsonb.h |  11 +-
 src/test/regress/expected/jsonb.out   | 112 +-
 src/test/regress/sql/jsonb.sql|  66 -
 src/tools/pgindent/typedefs.list  |   1 +
 9 files changed, 542 insertions(+), 59 deletions(-)

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index e4562b3c6c..e05b5b35f1 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -16,9 +16,15 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "funcapi.h"
+#include "nodes/makefuncs.h"
+#include "nodes/supportnodes.h"
+#include "parser/parse_coerce.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/date.h"
+#include "utils/datetime.h"
+#include "utils/fmgroids.h"
 #include "utils/json.h"
 #include "utils/jsonb.h"
 #include "utils/jsonfuncs.h"
@@ -2034,6 +2040,206 @@ cannotCastJsonbValue(enum jbvType type, const char *sqltype)
 	elog(ERROR, "unknown jsonb type: %d", (int) type);
 }
 
+
+/*
+ * jsonb_cast_support()
+ *
+ * Planner support function for extracting numeric or bool data type more
+ * effectively. After finding out the corresponding JsonbValue, instead of
+ * casting it to Jsonb as an intermediate type, we convert it to the desired
+ * data type directly.
+ */
+Datum
+jsonb_cast_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+
+	if (IsA(rawreq, SupportRequestSimplify))
+	{
+		SupportRequestSimplify *req = 

Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

2024-04-16 Thread Thomas Munro
Hi,

I noticed that margay (Solaris) has started running more of the tests
lately, but is failing in pg_basebackup/010_pg_basebackup.  It runs
successfully on wrasse (in older branches; Solaris 11.3 is desupported
in 17/master), and also on pollock (illumos, forked from the common
ancestor Solaris 10 while it was open source).

Hmm, wrasse is using "/opt/csw/bin/gtar xf ..." and pollock is using
"/usr/gnu/bin/tar xf ...", while margay is using "/usr/bin/tar xf
...".  The tar command is indicating success (it's run by
system_or_bail and it's not bailing), but the replica doesn't want to
come up:

pg_ctl: directory
"/home/marcel/build-farm-15/buildroot/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_replica_data/pgdata"
is not a database cluster directory"

So one idea would be that our tar format is incompatible with Sun tar
in some way that corrupts the output, or there is still some difference
in the nesting of the directory structure it creates, or something like
that.  I wonder if this is already common knowledge in the repressed
memories of this list, but I couldn't find anything specific.  I'd be
curious to know why exactly, if so (in terms of POSIX conformance etc.,
who is doing something wrong).




Re: POC: GROUP BY optimization

2024-04-16 Thread Andrei Lepikhov

On 4/12/24 06:44, Tom Lane wrote:

* I'm pretty unconvinced by group_keys_reorder_by_pathkeys (which
I notice has already had one band-aid added to it since commit).
In particular, it seems to believe that the pathkeys and clauses
lists match one-for-one, but I seriously doubt that that invariant
remains guaranteed after the cleanup steps

 /* append the remaining group pathkeys (will be treated as not sorted) */
 *group_pathkeys = list_concat_unique_ptr(new_group_pathkeys,
  *group_pathkeys);
 *group_clauses = list_concat_unique_ptr(new_group_clauses,
 *group_clauses);

For that to be reliable, the SortGroupClauses added to
new_group_clauses in the main loop have to be exactly those
that are associated with the same pathkeys in the old lists.
I doubt that that's necessarily true in the presence of redundant
grouping clauses.  (Maybe we can't get here with any redundant
grouping clauses, but still, we don't really guarantee uniqueness of
SortGroupClauses, much less that they are never copied which is what
you need if you want to believe that pointer equality is sufficient
for de-duping here.  PathKeys are explicitly made to be safe to compare
pointer-wise, but I know of no such guarantee for SortGroupClauses.)
I spent a lot of time inventing situations with SortGroupClause
duplicates. Unfortunately, it looks impossible so far. But because we
really don't guarantee uniqueness, I changed the code to survive in this
case. Also, I added assertion checking to be sure we don't have logical
mistakes here - see attachment.
About the band-aid mentioned above - as I see it, 4169850 introduces the
same trick in planner.c. So it looks like a result of the design of the
current code.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 5d9597adcd..aa80f6486c 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -380,15 +380,18 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 
 	/*
 	 * We're going to search within just the first num_groupby_pathkeys of
-	 * *group_pathkeys.  The thing is that root->group_pathkeys is passed as
+	 * *group_pathkeys. The thing is that root->group_pathkeys is passed as
 	 * *group_pathkeys containing grouping pathkeys altogether with aggregate
-	 * pathkeys.  If we process aggregate pathkeys we could get an invalid
+	 * pathkeys. If we process aggregate pathkeys we could get an invalid
 	 * result of get_sortgroupref_clause_noerr(), because their
-	 * pathkey->pk_eclass->ec_sortref doesn't referece query targetlist.  So,
+	 * pathkey->pk_eclass->ec_sortref doesn't reference query targetlist. So,
 	 * we allocate a separate list of pathkeys for lookups.
 	 */
 	grouping_pathkeys = list_copy_head(*group_pathkeys, num_groupby_pathkeys);
 
+	/* Make a new copy before reordering clauses */
+	*group_clauses = list_copy(*group_clauses);
+
 	/*
 	 * Walk the pathkeys (determining ordering of the input path) and see if
 	 * there's a matching GROUP BY key. If we find one, we append it to the
@@ -400,8 +403,8 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 	 */
 	foreach(lc, pathkeys)
 	{
-		PathKey*pathkey = (PathKey *) lfirst(lc);
-		SortGroupClause *sgc;
+		PathKey			   *pathkey = (PathKey *) lfirst(lc);
+		SortGroupClause	   *sgc;
 
 		/*
 		 * Pathkeys are built in a way that allows simply comparing pointers.
@@ -431,17 +434,25 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 		Assert(OidIsValid(sgc->sortop));
 
 		new_group_pathkeys = lappend(new_group_pathkeys, pathkey);
+
+		/*
+		 * Keeping in mind that the SortGroupClause list doesn't guarantee
+		 * unique pointers, we must explicitly transfer elements one-by-one.
+		 */
 		new_group_clauses = lappend(new_group_clauses, sgc);
+		*group_clauses = list_delete_ptr(*group_clauses, sgc);
 	}
 
 	/* remember the number of pathkeys with a matching GROUP BY key */
 	n = list_length(new_group_pathkeys);
 
-	/* append the remaining group pathkeys (will be treated as not sorted) */
+	/*
+	 * Append the remaining group pathkeys (will be treated as not sorted) and
+	 * grouping clauses.
+	 */
 	*group_pathkeys = list_concat_unique_ptr(new_group_pathkeys,
 			 *group_pathkeys);
-	*group_clauses = list_concat_unique_ptr(new_group_clauses,
-			*group_clauses);
+	*group_clauses = list_concat(new_group_clauses, *group_clauses);
 
 	list_free(grouping_pathkeys);
 	return n;
@@ -484,9 +495,9 @@ pathkeys_are_duplicate(List *infos, List *pathkeys)
 List *
 get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 {
-	Query	   *parse = root->parse;
-	List	   *infos = NIL;
-	PathKeyInfo *info;
+	Query		   *parse = root->parse;
+	List		   *infos = NIL;
+	PathKeyInfo	   *info;
 
 	List	   *pathkeys = root->group_pathkeys;
 	List	 

Re: An improved README experience for PostgreSQL

2024-04-16 Thread Nathan Bossart
On Wed, Feb 28, 2024 at 02:21:49PM -0600, Nathan Bossart wrote:
> I see many projects have files like SECURITY.md, CODE_OF_CONDUCT.md, and
> CONTRIBUTING.md, and I think it would be relatively easy to add content to
> each of those for PostgreSQL, even if they just link elsewhere.

Here's a first attempt at this.  You can see some of the effects of these
files at [0].  More information about these files is available at [1] [2]
[3].

I figured we'd want to keep these pretty bare-bones to avoid duplicating
information that's already available at postgresql.org, but perhaps it's
worth filling them out a bit more.  Anyway, I just wanted to gauge interest
in this stuff.

[0] https://github.com/nathan-bossart/postgres/tree/special-files
[1] 
https://docs.github.com/en/code-security/getting-started/adding-a-security-policy-to-your-repository
[2] 
https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-code-of-conduct-to-your-project
[3] 
https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From d2a7f5ae384dde6305571bd7042e6e03846dd1d5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 16 Apr 2024 21:23:52 -0500
Subject: [PATCH v1 1/1] Add code of conduct, contributing, and security files.

---
 CODE_OF_CONDUCT.md | 2 ++
 CONTRIBUTING.md| 2 ++
 SECURITY.md| 2 ++
 3 files changed, 6 insertions(+)
 create mode 100644 CODE_OF_CONDUCT.md
 create mode 100644 CONTRIBUTING.md
 create mode 100644 SECURITY.md

diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
new file mode 100644
index 00..99bb1905d6
--- /dev/null
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,2 @@
+The PostgreSQL code of conduct can be found at
+.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 00..ff93024352
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,2 @@
+For information about contributing to PostgreSQL, see
+.
diff --git a/SECURITY.md b/SECURITY.md
new file mode 100644
index 00..ebdbe609db
--- /dev/null
+++ b/SECURITY.md
@@ -0,0 +1,2 @@
+For information about reporting security issues, see
+.
-- 
2.25.1



Re: Why is parula failing?

2024-04-16 Thread David Rowley
On Tue, 16 Apr 2024 at 18:58, Robins Tharakan  wrote:
> The last 25 consecutive runs have passed [1] after switching
> REL_12_STABLE to -O0 ! So I am wondering whether that confirms that
> the compiler version is to blame, and while we're still here,
> is there anything else I could try?

I don't think it's conclusive proof that it's a compiler bug.  -O0
could equally just have changed the timing sufficiently to not trigger
the issue.

It might be a long shot, but I wonder if it might be worth running a
workload such as:

psql -c "create table a (a int primary key, b text not null, c int not
null); insert into a values(1,'',0);" postgres
echo "update a set b = repeat('a', random(1,10)), c=c+1 where a = 1;"
> bench.sql
pgbench -n -t 12500 -c 8 -j 8 -f bench.sql postgres
psql -c "table a;" postgres

On a build with the original CFLAGS.  I expect the value of "c" to be
100k after running that. If it's not then something bad has happened.

That would exercise the locking code heavily and would show us if any
updates were missed due to locks not being correctly respected or seen
by other backends.

David




Re: cpluspluscheck/headerscheck require build in REL_16_STABLE

2024-04-16 Thread John Naylor
On Mon, Apr 15, 2024 at 9:20 PM Marina Polyakova
 wrote:
>
> On 2024-04-13 08:40, John Naylor wrote:
> > so that they build fine on PG16 as well. The problem is, not all the
> > required headers are generated when invoking `make headerscheck`. The
> > attached patch brings in some Makefile rules from master to make this
> > work. Does this fix it for you?
>
> Everything seems to work with this patch, thank you!

Glad to hear it -- I'll push next week when I get back from vacation,
unless there are objections.




Re: pg_combinebackup does not detect missing files

2024-04-16 Thread David Steele

On 4/16/24 23:50, Robert Haas wrote:

On Wed, Apr 10, 2024 at 9:36 PM David Steele  wrote:

I've been playing around with the incremental backup feature trying to
get a sense of how it can be practically used. One of the first things I
always try is to delete random files and see what happens.

You can delete pretty much anything you want from the most recent
incremental backup (not the manifest) and it will not be detected.


Sure, but you can also delete anything you want from the most recent
non-incremental backup and it will also not be detected. There's no
reason at all to hold incremental backup to a higher standard than we
do in general.


Except that we are running pg_combinebackup on the incremental, which 
the user might reasonably expect to check backup integrity. It actually 
does a bunch of integrity checks -- but not this one.



Maybe the answer here is to update the docs to specify that
pg_verifybackup should be run on all backup directories before
pg_combinebackup is run. Right now that is not at all clear.


I don't want to make those kinds of prescriptive statements. If you
want to verify the backups that you use as input to pg_combinebackup,
you can use pg_verifybackup to do that, but it's not a requirement.
I'm not averse to having some kind of statement in the documentation
along the lines of "Note that pg_combinebackup does not attempt to
verify that the individual backups are intact; for that, use
pg_verifybackup." 


I think we should do this at a minimum.


But I think it should be blindingly obvious to
everyone that you can't go whacking around the inputs to a program and
expect to get perfectly good output. I know it isn't blindingly
obvious to everyone, which is why I'm not averse to adding something
like what I just mentioned, and maybe it wouldn't be a bad idea to
document in a few other places that you shouldn't randomly remove
files from the data directory of your live cluster, either, because
people seem to keep doing it, but really, the expectation that you
can't just blow files away and expect good things to happen afterward
should hardly need to be stated.


And yet, we see it all the time.


I think it's very easy to go overboard with warnings of this type.
Weird stuff comes to me all the time because people call me when the
weird stuff happens, and I'm guessing that your experience is similar.
But my actual personal experience, as opposed to the cases reported to
me by others, practically never features files evaporating into the
ether. 


Same -- if it happens at all it is very rare. Virtually every time I am 
able to track down the cause of missing files it is because the user 
deleted them, usually to "save space" or because they "did not seem 
important".


But given that this occurrence is pretty common in my experience, I 
think it is smart to mitigate against it, rather than just take it on 
faith that the user hasn't done anything destructive.


Especially given how pg_combinebackup works, backups are going to 
undergo a lot of user manipulation (pushing to and pulling from storage, 
decompressing, untarring, etc.) and I think that means we should take 
extra care.


Regards,
-David




Re: post-freeze damage control

2024-04-16 Thread Jeff Davis
On Mon, 2024-04-08 at 15:47 -0400, Robert Haas wrote:
> - I couldn't understand why the "Operate
> XLogCtl->log{Write,Flush}Result with atomics" code was correct when I
> read it.

I reviewed ee1cbe806d. It followed a good process of discussion and
review. It was a bit close to the feature freeze for comfort, but did
not feel rushed, to me.

Additional eyes are always welcome. There are also some related commits
in that area like f3ff7bf83b, c9920a9068 and 91f2cae7a4.

> But, a spinlock
> protecting two variables together guarantees more than atomic access
> to each of those variables separately.

We maintain the invariant:

   XLogCtl->logFlushResult <= XLogCtl->logWriteResult

and the non-shared version:

   LogwrtResult.Flush <= LogwrtResult.Write

and that the requests don't fall behind the results:

   XLogCtl->LogwrtRqst.Write >= XLogCtl->logWriteResult &&
   XLogCtl->LogwrtRqst.Flush >= XLogCtl->logFlushResult

Are you concerned that:

  (a) you aren't convinced that we're maintaining the invariants
properly? or
  (b) you aren't convinced that the invariant is strong enough, and
that there are other variables that we aren't considering?

And if you think this is right after all, where could we add some
clarification?

Regards,
Jeff Davis





Re: pg17 issues with not-null contraints

2024-04-16 Thread Justin Pryzby
On Tue, Apr 16, 2024 at 08:11:49PM +0200, Alvaro Herrera wrote:
> On 2024-Apr-15, Alvaro Herrera wrote:
> 
> > - Fourth thought: we do as in the third thought, except we also allow
> > DROP CONSTRAINT a constraint that's marked "local, inherited" to be
> > simply an inherited constraint (remove its "local" marker).
> 
> Here is an initial implementation of what I was thinking.  Can you
> please give it a try and see if it fixes this problem?  At least in my
> run of your original test case, it seems to work as expected.

Yes, this fixes the issue I reported.

BTW, that seems to be the same issue Andrew reported in January.
https://www.postgresql.org/message-id/CAJnzarwkfRu76_yi3dqVF_WL-MpvT54zMwAxFwJceXdHB76bOA%40mail.gmail.com

-- 
Justin




Re: Parallel CREATE INDEX for BRIN indexes

2024-04-16 Thread Tomas Vondra
On 4/15/24 20:35, Tomas Vondra wrote:
> On 4/15/24 10:18, Tomas Vondra wrote:
>> ...
>>
>> I'll try a bit more to make this work without the temp table.
>>
> 
> Considering the earlier discussion in e2933a6e1, I think making the
> table TEMP is the best fix, so I'll do that. Thanks for remembering that
> change, Alexander!
> 

D'oh! I pushed this fix to stabilize the test earlier today, but I just
realized it unfortunately makes the test useless. The idea of the test
was to build BRIN indexes with/without parallelism, and check that the
indexes are exactly the same.

The instability comes from deletes, which I added to get "empty" ranges
in the table, which may not be cleaned up in time for the CREATE INDEX
commands, depending on what else is happening. A TEMPORARY table does
not have this issue (as observed in e2933a6e1), but there's the minor
problem that plan_create_index_workers() does this:

  /*
   * Determine if it's safe to proceed.
   *
   * Currently, parallel workers can't access the leader's temporary
   * tables. Furthermore, any index predicate or index expressions must
   * be parallel safe.
   */
  if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
!is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index)) ||
!is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index)))
  {
parallel_workers = 0;
goto done;
  }

That is, no parallel index builds on temporary tables. Which means the
test is not actually testing anything :-( Much more stable, but not very
useful for finding issues.

I think the best way to stabilize the test is to just not delete the
rows. That means we won't have any "empty" ranges (runs of pages without
any tuples).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Differential code coverage between 16 and HEAD

2024-04-16 Thread Jeff Davis
On Tue, 2024-04-16 at 11:58 -0700, Andres Freund wrote:
> 
> Hm, that seems annoying, even for update-unicode :/. But I guess it
> won't be very common to have such failures?

Things don't change a lot between Unicode versions (and are subject to
the stability policy), but the tests are exhaustive, so even a single
character's property being changed will cause a failure when compared
against an older version of ICU. The case mapping test succeeds back to
ICU 64 (based on Unicode 12.1), but the category/properties test
succeeds only back to ICU 72 (based on Unicode 15.0).

I agree this is annoying, and I briefly documented it in
src/common/unicode/README. It means whoever updates Unicode for a
Postgres version should probably know how to build ICU from source and
point the Postgres build process at it. Maybe I should add more details
in the README to make that easier for others.

But it's also a really good test. The ICU parsing, interpretation of
data files, and lookup code is entirely independent of ours. Therefore,
if the results agree for all codepoints, we have a high degree of
confidence that the results are correct. That level of confidence seems
worth a bit of annoyance.

This kind of test is possible because the category/property and case
mapping functions accept a single code point, and there are only
0x110000 code points (U+0000 through U+10FFFF).

> > That's not to say that the C code shouldn't be tested, of course.
> > Maybe we can just do some spot checks for the functions that are
> > reachable via SQL and get rid of the functions that aren't yet
> > reachable (and re-add them when they are)?
> 
> Yes, I think that'd be a good start. I don't think we necessarily
> need exhaustive coverage, just a bit more coverage than we have.

OK, I'll submit a test module or something.

Regards,
Jeff Davis





Re: documentation structure

2024-04-16 Thread Andres Freund
Hi,

On 2024-04-16 15:05:32 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I think we should work on generating a lot of func.sgml.  Particularly the
> > signature etc. should just come from pg_proc.dat; it's pointlessly painful
> > to generate that by hand. And for a lot of the functions we should probably
> > move the existing func.sgml comments to the description in pg_proc.dat.
>
> Where are you going to get the examples and text descriptions from?

I think there are a few different ways to do that, e.g. having long_desc
and example fields in pg_proc.dat, or having the examples and description
in a separate file and "enriching" that with auto-generated function
signatures.


> (And no, I don't agree that the pg_description string should match
> what's in the docs.  The description string has to be a short
> one-liner in just about every case.)

Definitely shouldn't be the same in all cases, but I think there's a decent
number of cases where they can be the same. The difference between the two
is often minimal today.

Entirely randomly chosen example:

{ oid => '2825',
  descr => 'slope of the least-squares-fit linear equation determined by the (X, Y) pairs',
  proname => 'regr_slope', prokind => 'a', proisstrict => 'f',
  prorettype => 'float8', proargtypes => 'float8 float8',
  prosrc => 'aggregate_dummy' },

and

  <row>
   <entry role="func_table_entry"><para role="func_signature">
    <indexterm>
     <primary>regression slope</primary>
    </indexterm>
    <indexterm>
     <primary>regr_slope</primary>
    </indexterm>
    <function>regr_slope</function> ( <parameter>Y</parameter> <type>double precision</type>, <parameter>X</parameter> <type>double precision</type> )
    <returnvalue>double precision</returnvalue>
   </para>
   <para>
    Computes the slope of the least-squares-fit linear equation determined
    by the (<parameter>X</parameter>, <parameter>Y</parameter>)
    pairs.
   </para></entry>
   <entry>Yes</entry>
  </row>


The description is quite similar, but the pg_proc entry lacks argument names.


> This sounds to me like it would be a painful exercise with not a
> lot of benefit in the end.

I think the manual work for writing signatures in sgml is not insignificant,
nor is the volume of sgml for them. Manually maintaining the signatures makes
it impractical to significantly improve the presentation - which I don't think
is all that great today.

And the lack of argument names in the pg_proc entries is occasionally fairly
annoying, because a \df+ doesn't provide enough information to use functions.

It'd also be quite useful if clients could render more of the documentation
for functions. People are used to language servers providing full
documentation for functions etc...

Greetings,

Andres Freund




Re: Differential code coverage between 16 and HEAD

2024-04-16 Thread Jeff Davis
On Mon, 2024-04-15 at 21:35 -0400, Tom Lane wrote:
> It's definitely not OK for the standard test suite to include
> internet access.

The update-unicode target is not run as part of the standard test
suite.

>   Seems like we need to separate "download new
> source files" from "generate the derived files".

I'm not sure that's the right dividing line. There are three-ish steps:

1. Download the Unicode files
2. Generate the derived .h files
3. Run tests

If we stop after 1, then do we check in the Unicode files? If so, then
there's inconsistency between the Unicode files and the .h files, which
doesn't seem like a good idea. If we don't check in the files, then
nobody can skip to step 2, so I don't see the point in separating the
steps.

If we separate out step 3 that makes more sense: we check in the result
after step 2, and anyone can run step 3 without downloading anything.
The only problem with that is the tests I added depend on a recent-
enough version of ICU, so I'm not sure how many people will run it,
anyway.

Andres's complaints seem mainly about code coverage in the standard
test suite for the thin layer of C code above the generated arrays. I
agree: code coverage is a good goal by itself, and having a few more
platforms exercising that C code can't hurt. I think we should just
address that concern directly by spot-checking the results for a few
code points rather than trying to make the exhaustive ICU tests run on
more hosts.
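The spot-check idea could look something like this sketch (Python's str casing and unicodedata stand in here for the generated C arrays and ICU; the chosen code points are illustrative):

```python
# Instead of exhaustively comparing every code point against ICU,
# verify a handful of known-stable cases directly.
import unicodedata

# case-mapping spot checks: (uppercase input, expected lowercase)
SPOT_CHECKS = [
    ("A", "a"),            # ASCII
    ("\u00c9", "\u00e9"),  # LATIN CAPITAL E WITH ACUTE -> lowercase
    ("\u0391", "\u03b1"),  # GREEK CAPITAL ALPHA -> alpha
]

for upper, lower in SPOT_CHECKS:
    assert upper.lower() == lower, (upper, upper.lower())
    # and check the general category tables agree on "uppercase letter"
    assert unicodedata.category(upper) == "Lu"
```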

Regards,
Jeff Davis





Re: documentation structure

2024-04-16 Thread Bruce Momjian
On Tue, Apr 16, 2024 at 03:05:32PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I think we should work on generating a lot of func.sgml.  Particularly the
> > signature etc should just come from pg_proc.dat, it's pointlessly painful to
> > generate that by hand. And for a lot of the functions we should probably 
> > move
> > the existing func.sgml comments to the description in pg_proc.dat.
> 
> Where are you going to get the examples and text descriptions from?
> (And no, I don't agree that the pg_description string should match
> what's in the docs.  The description string has to be a short
> one-liner in just about every case.)
> 
> This sounds to me like it would be a painful exercise with not a
> lot of benefit in the end.

Maybe we could _verify_ the contents of func.sgml against pg_proc.
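A minimal sketch of that verification (hedged: the markup pattern and names here are illustrative, not the real docs tooling, which would need to handle the full DocBook structure):

```python
# Cross-check that every function documented in func.sgml exists in
# pg_proc, and report catalog functions missing from the docs.
import re

def doc_function_names(sgml_text):
    # crude pattern, good enough for the sketch
    return set(re.findall(r"<function>(\w+)</function>", sgml_text))

sgml = ("<function>regr_slope</function> computes ... "
        "<function>regr_r2</function>")
catalog = {"regr_slope", "regr_r2", "regr_count"}

phantom = doc_function_names(sgml) - catalog      # documented but not in catalog
undocumented = sorted(catalog - doc_function_names(sgml))
assert not phantom
```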

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: documentation structure

2024-04-16 Thread Tom Lane
Andres Freund  writes:
> I think we should work on generating a lot of func.sgml.  Particularly the
> signature etc should just come from pg_proc.dat, it's pointlessly painful to
> generate that by hand. And for a lot of the functions we should probably move
> the existing func.sgml comments to the description in pg_proc.dat.

Where are you going to get the examples and text descriptions from?
(And no, I don't agree that the pg_description string should match
what's in the docs.  The description string has to be a short
one-liner in just about every case.)

This sounds to me like it would be a painful exercise with not a
lot of benefit in the end.

I do agree with Andrew that splitting func.sgml into multiple files
would be beneficial.

regards, tom lane




Re: Differential code coverage between 16 and HEAD

2024-04-16 Thread Andres Freund
Hi,

On 2024-04-15 18:23:21 -0700, Jeff Davis wrote:
> On Mon, 2024-04-15 at 17:05 -0700, Andres Freund wrote:
> > Can't we test this as part of the normal testsuite?
> 
> One thing that complicates things a bit is that the test compares the
> results against ICU, so a mismatch in Unicode version between ICU and
> Postgres can cause test failures. The test ignores unassigned code
> points, so normally it just results in less-exhaustive test coverage.
> But sometimes things really do change, and that would cause a failure.

Hm, that seems annoying, even for update-unicode :/. But I guess it won't be
very common to have such failures?


> Stepping back a moment, my top worry is really not to test those C
> functions, but to test the perl code that parses the text files and
> generates those arrays. Imagine a future Unicode version does something
> that the perl scripts didn't anticipate, and they fail to add array
> entries for half the code points, or something like that. By testing
> the arrays generated from freshly-parsed files exhaustively against
> ICU, then we have a good defense against that. That situation really
> only comes up when updating Unicode.

That's a good point.


> That's not to say that the C code shouldn't be tested, of course. Maybe
> we can just do some spot checks for the functions that are reachable
> via SQL and get rid of the functions that aren't yet reachable (and re-
> add them when they are)?

Yes, I think that'd be a good start. I don't think we necessarily need
exhaustive coverage, just a bit more coverage than we have.


> > I don't at all like that the tests depend on downloading new unicode
> > data. What if there was an update but I just want to test the current
> > state?
> 
> I was mostly following the precedent for normalization. Should we
> change that, also?

Yea, I think we should. But I think it's less urgent if we end up testing more
of the code without those test binaries.  I don't immediately know what
dependencies would be best, tbh.

Greetings,

Andres Freund




Re: pg17 issues with not-null contraints

2024-04-16 Thread Alvaro Herrera
On 2024-Apr-15, Justin Pryzby wrote:

> Here's a couple more issues affecting upgrades from v16 to v17.
> 
> postgres=# CREATE TABLE a(i int NOT NULL); CREATE TABLE b(i int PRIMARY KEY) 
> INHERITS (a);
> pg_restore: error: could not execute query: ERROR:  constraint 
> "pgdump_throwaway_notnull_0" of relation "b" does not exist

This one requires a separate pg_dump fix, which should --I hope-- be
pretty simple.

> postgres=# CREATE TABLE a(i int CONSTRAINT a NOT NULL PRIMARY KEY); CREATE 
> TABLE b()INHERITS(a); ALTER TABLE b ADD CONSTRAINT pkb PRIMARY KEY (i);
> pg_restore: error: could not execute query: ERROR:  cannot drop inherited 
> constraint "pgdump_throwaway_notnull_0" of relation "b"

This one seems to be fixed with the patch I just posted.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."   (Fotis)
  https://postgr.es/m/200606261359.k5qdxe2p004...@auth-smtp.hol.gr




Re: documentation structure

2024-04-16 Thread Andres Freund
Hi,

On 2024-03-19 17:39:39 -0400, Andrew Dunstan wrote:
> My own pet docs peeve is a purely editorial one: func.sgml is a 30k line
> beast, and I think there's a good case for splitting out at least the
> larger chunks of it.

I think we should work on generating a lot of func.sgml.  Particularly the
signature etc should just come from pg_proc.dat, it's pointlessly painful to
generate that by hand. And for a lot of the functions we should probably move
the existing func.sgml comments to the description in pg_proc.dat.

I suspect that we can't just generate all the documentation from pg_proc,
because of xrefs etc.  Although perhaps we could just strip those out for
pg_proc.

We'd need to add some more metadata to pg_proc, for grouping kinds of
functions together. But that seems doable.

Greetings,

Andres Freund




Re: pg17 issues with not-null contraints

2024-04-16 Thread Alvaro Herrera
On 2024-Apr-15, Alvaro Herrera wrote:

> - Fourth thought: we do as in the third thought, except we also allow
> DROP CONSTRAINT a constraint that's marked "local, inherited" to be
> simply an inherited constraint (remove its "local" marker).

Here is an initial implementation of what I was thinking.  Can you
please give it a try and see if it fixes this problem?  At least in my
run of your original test case, it seems to work as expected.

This is still missing some cleanup and additional tests, of course.
Speaking of which, I wonder if I should modify pg16's tests so that they
leave behind tables set up in this way, to immortalize pg_upgrade
testing.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
From b40a418dfb3d7ce23ffa568c8a8d03640ce28435 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 16 Apr 2024 13:44:34 +0200
Subject: [PATCH] Fix add/drop of not-null constraints

---
 src/backend/catalog/heap.c  | 36 +
 src/backend/catalog/pg_constraint.c | 36 -
 src/backend/commands/tablecmds.c| 26 -
 src/include/catalog/pg_constraint.h |  2 +-
 4 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index cc31909012..f0278b9c01 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2539,6 +2539,7 @@ AddRelationNewConstraints(Relation rel,
 			CookedConstraint *nncooked;
 			AttrNumber	colnum;
 			char	   *nnname;
+			int			existing;
 
 			/* Determine which column to modify */
 			colnum = get_attnum(RelationGetRelid(rel), strVal(linitial(cdef->keys)));
@@ -2547,12 +2548,39 @@ AddRelationNewConstraints(Relation rel,
 	 strVal(linitial(cdef->keys)), RelationGetRelid(rel));
 
 			/*
-			 * If the column already has a not-null constraint, we need only
-			 * update its catalog status and we're done.
+			 * If the column already has an inheritable not-null constraint,
+			 * we need only adjust its inheritance status and we're done.  If
+			 * the constraint is there but marked NO INHERIT, then it is
+			 * updated in the same way, but we also recurse to the children
+			 * (if any) to add the constraint there as well.
 			 */
-			if (AdjustNotNullInheritance1(RelationGetRelid(rel), colnum,
-		  cdef->inhcount, cdef->is_no_inherit))
+			existing = AdjustNotNullInheritance1(RelationGetRelid(rel), colnum,
+ cdef->inhcount, cdef->is_no_inherit);
+			if (existing == 1)
+continue;		/* all done! */
+			else if (existing == -1)
+			{
+List	   *children;
+
+children = find_inheritance_children(RelationGetRelid(rel), NoLock);
+foreach_oid(childoid, children)
+{
+	Relation	childrel = table_open(childoid, NoLock);
+
+	AddRelationNewConstraints(childrel,
+			  NIL,
+			  list_make1(copyObject(cdef)),
+			  allow_merge,
+			  is_local,
+			  is_internal,
+			  queryString);
+	/* these constraints are not in the return list -- good? */
+
+	table_close(childrel, NoLock);
+}
+
 continue;
+			}
 
 			/*
 			 * If a constraint name is specified, check that it isn't already
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 604280d322..a11483b1b8 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -709,36 +709,50 @@ extractNotNullColumn(HeapTuple constrTup)
  * AdjustNotNullInheritance1
  *		Adjust inheritance count for a single not-null constraint
  *
- * Adjust inheritance count, and possibly islocal status, for the not-null
- * constraint row of the given column, if it exists, and return true.
- * If no not-null constraint is found for the column, return false.
+ * If no not-null constraint is found for the column, return 0
+ * If the constraint does exist and it's inheritable, adjust its
+ * inheritance count (and possibly islocal status) and return 1.
+ * If the constraint exists but is marked NO INHERIT, adjust it as above
+ * and reset connoinherit to false, and return -1.  Caller is
+ * responsible for adding the same constraint to the children, if any.
  */
-bool
+int
 AdjustNotNullInheritance1(Oid relid, AttrNumber attnum, int count,
 		  bool is_no_inherit)
 {
 	HeapTuple	tup;
 
+	Assert(count >= 0);
+
 	tup = findNotNullConstraintAttnum(relid, attnum);
 	if (HeapTupleIsValid(tup))
 	{
 		Relation	pg_constraint;
 		Form_pg_constraint conform;
+		int			retval = 1;
 
 		pg_constraint = table_open(ConstraintRelationId, RowExclusiveLock);
 		conform = (Form_pg_constraint) GETSTRUCT(tup);
 
 		/*
-		 * Don't let the NO INHERIT status change (but don't complain
-		 * unnecessarily.)  In the future it might be useful to let an
-		 * inheritable constraint replace a non-inheritable one, but we'd need
-		 * to recurse to children to get it added there.
+		 * If the constraint already exists in this relation but 

Re: pg_combinebackup fails on file named INCREMENTAL.*

2024-04-16 Thread Robert Haas
On Tue, Apr 16, 2024 at 12:06 PM Stefan Fercot
 wrote:
> Sure, I can see your point here and how people could be tempted to throw
> away that backup_manifest if they don't know how important it is to keep it.
> Probably in this case we'd need the list to be inside the tar, just like
> backup_label and tablespace_map then.

Yeah, I think anywhere inside the tar is better than anywhere outside
the tar, by a mile. I'm happy to leave the specific question of where
inside the tar as something TBD at time of implementation by fiat of
the person doing the work.

But that said ...

> Do you mean 1 stub-list per pgdata + 1 per tablespaces?
>
> I don't really see how it would be faster to recursively go through each
> sub-directory of the pgdata and tablespaces to gather all the pieces
> together compared to reading 1 main file.
> But I guess, choosing one option or the other, we will only find out how well
> it works once people use it in the field and possibly give some feedback.

The reason why I was suggesting one stub-list per directory is that we
recurse over the directory tree. We reach each directory in turn,
process it, and then move on to the next one. What I imagine that we
want to do is - first iterate over all of the files actually present
in a directory. Then, iterate over the list of stubs for that
directory and do whatever we would have done if there had been a stub
file present for each of those. So, we don't really want a list of
every stub in the whole backup, or even every stub in the whole
tablespace. What we want is to be able to easily get a list of stubs
for a single directory. Which is very easily done if each directory
contains its own stub-list file.
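The per-directory flow described above, including the duplicate sanity check, can be sketched like this (all names are hypothetical; real reconstruction would do actual file copying):

```python
# While walking one directory: first handle the real files present,
# then consult that directory's own stub-list, and verify the two
# sets don't overlap.

def process_directory(real_files, stub_list):
    duplicates = set(real_files) & set(stub_list)
    if duplicates:
        raise ValueError("stubs also present as real files: %r"
                         % sorted(duplicates))
    for f in sorted(real_files):
        pass  # reconstruct/copy the real file here
    for stub in sorted(stub_list):
        pass  # act as if a stub file existed for this name

process_directory({"16384", "16385"}, {"16386"})   # fine
try:
    process_directory({"16384"}, {"16384"})        # overlap -> error
except ValueError:
    pass
```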

If we instead have a centralized stub-list for the whole tablespace,
or the whole backup, it's still quite possible to make it work. We
just read that centralized stub list and we build an in-memory data
structure that is indexed by containing directory, like a hash table
where the key is the directory name and the value is a list of
filenames within that directory. But, a slight disadvantage of this
model is that you have to keep that whole data structure in memory for
the whole time you're reconstructing, and you have to pass around a
pointer to it everywhere so that the code that handles individual
directories can access it. I'm sure this isn't the end of the world.
It's probably unlikely that someone has so many stub files that the
memory used for such a data structure is painfully high, and even if
they did, it's unlikely that they are spread out across multiple
databases and/or tablespaces in such a way that only needing the data
for one directory at a time would save you. But, it's not impossible
that such a scenario could exist.
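The centralized-list alternative amounts to building that in-memory index up front, roughly like this sketch (paths are illustrative):

```python
# Read one whole-backup stub list and index it by containing directory,
# so per-directory code can still look up just its own stubs.  The whole
# mapping then has to stay in memory for the full reconstruction.
import os
from collections import defaultdict

def index_by_directory(stub_paths):
    by_dir = defaultdict(list)
    for path in stub_paths:
        by_dir[os.path.dirname(path)].append(os.path.basename(path))
    return by_dir

stubs = ["base/5/16384", "base/5/16385", "global/6100"]
index = index_by_directory(stubs)
```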

Somebody might say - well, don't go directory by directory. Just
handle all of the stubs at the end. But I don't think that really
fixes anything. I want to be able to verify that none of the stubs
listed in the stub-list are also present in the backup as real files,
for sanity checking purposes. It's quite easy to see how to do that in
the design I proposed above: keep a list of the files for each
directory as you read it, and then when you read the stub-list for
that directory, check those lists against each other for duplicates.
Doing this on the level of a whole tablespace or the whole backup is
clearly also possible, but once again it potentially uses more memory,
and there's no functional gain.

Plus, this kind of approach would also make the reconstruction process
"jump around" more. It might pull a bunch of mostly-unchanged files
from the full backup while handling the non-stub files, and then come
back to that directory a second time, much later, when it's processing
the stub-list. Perhaps that would lead to a less-optimal I/O pattern,
or perhaps it would make it harder for the user to understand how much
progress reconstruction had made. Or perhaps it would make no
difference at all; I don't know. Maybe there's even some advantage in
a two-pass approach like this. I don't see one. But it might prove
otherwise on closer examination.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Stack overflow issue

2024-04-16 Thread Alexander Korotkov
On Tue, Apr 16, 2024 at 6:35 PM Andres Freund  wrote:
> On 2024-04-16 15:45:42 +0300, Alexander Korotkov wrote:
> > On Tue, Apr 16, 2024 at 1:48 AM Andres Freund  wrote:
> > > On 2024-03-06 14:17:23 +0200, Alexander Korotkov wrote:
> > > > 0001 Turn tail recursion into iteration in CommitTransactionCommand()
> > > > I did minor revision of comments and code blocks order to improve the
> > > > readability.
> > >
> > > After sending
> > > https://www.postgresql.org/message-id/20240414223305.m3i5eju6zylabvln%40awork3.anarazel.de
> > > I looked some more at important areas where changes didn't have code
> > > coverage. One thing I noticed was that the "non-internal" part of
> > > AbortCurrentTransaction() is uncovered:
> > > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/transam/xact.c.gcov.html#L3403
> > >
> > > Which made me try to understand fefd9a3fed2.  I'm a bit confused about why
> > > some parts are handled in 
> > > CommitCurrentTransaction()/AbortCurrentTransaction()
> > > and others are in the *Internal functions.
> > >
> > > I understand that fefd9a3fed2 needed to remove the recursion in
> > > CommitTransactionCommand()/AbortCurrentTransaction(). But I don't 
> > > understand
> > > why that means having some code in the non-internal and some in the
> > > internal functions?  Wouldn't it be easier to just have all the state 
> > > handling
> > > code in the Internal() function and just break after the
> > > CleanupSubTransaction() calls?
> >
> > I'm not sure I correctly get what you mean.  Do you think the attached
> > patch matches the direction you're pointing?  The patch itself is not
> > final, it requires cleanup and comments revision, just to check the
> > direction.
>
> Something like that, yea. The split does seem less confusing that way to me,
> but also not 100% certain.

Thank you for your feedback.  I'm going to go ahead and polish this patch.

--
Regards,
Alexander Korotkov




soliciting patches to review

2024-04-16 Thread Robert Haas
Hi,

At 2024.pgconf.dev, Andres and I will be hosting a patch review
workshop.[1] Part of the workshop will be a presentation, and part of
it will be a practicum. That is, we're going to actually ask attendees
to review some patches during the workshop. We'll also comment on
those reviews, and the patches themselves, with our own thoughts.
While we could just pick some things from the CommitFest, I believe we
both felt a little uncomfortable with the idea of potentially turning
a spotlight on someone's patch where they might not have been
expecting it. So, instead, I'd like to invite you to email me, and/or
Andres, if you have a patch that isn't committed yet and which you
think would be a good candidate for review during this workshop. If
your patch is selected to be reviewed during the workshop, then you
will very likely get some reviews for your patch posted on
pgsql-hackers. But, there are no guarantees about how positive or
negative those reviews will be, so you do need to be prepared to take
the bad with the good.

Note that this is really an exercise in helping more people in the
community to get better at reviewing patches. So, if Andres and I
think that what your patch really needs is an opinion from Tom Lane
specifically, or even an opinion from Andres Freund or Robert Haas
specifically, we probably won't choose to include it in the workshop.
But there are lots of patches that just need attention from someone,
at least for starters, and perhaps this workshop can help some of
those patches to make progress, in addition to (hopefully) being
educational for the attendees.

Key points:

1. If you have a patch you think would be a good candidate for this
event, please email me and/or Andres.

2. Please only volunteer a patch that you wrote, not one that somebody
else wrote.

3. Please don't suggest a patch that's already committed.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

[1] 
https://www.pgevents.ca/events/pgconfdev2024/schedule/session/40-patch-review-workshop-registration-required/




Re: pg_combinebackup fails on file named INCREMENTAL.*

2024-04-16 Thread Stefan Fercot
On Tuesday, April 16th, 2024 at 3:22 PM, Robert Haas wrote:
> What I fear is that this will turn into another situation like we had
> with pg_xlog, where people saw "log" in the name and just blew it
> away. Matter of fact, I recently encountered one of my few recent
> examples of someone doing that thing since the pg_wal renaming
> happened. Some users don't take much convincing to remove anything
> that looks inessential. And what I'm particularly worried about with
> this feature is tar-format backups. If you have a directory format
> backup and you do an "ls", you're going to see a whole bunch of files
> in there of which backup_manifest will be one. How you treat that file
> is just going to depend on what you know about its purpose. But if you
> have a tar-format backup, possibly compressed, the backup_manifest
> file stands out a lot more. You may have something like this:
> 
> backup_manifest root.tar.gz 16384.tar.gz

Sure, I can see your point here and how people could be tempted to throw away
that backup_manifest if they don't know how important it is to keep it.
Probably in this case we'd need the list to be inside the tar, just like
backup_label and tablespace_map then.

> The kicker for me is that I can't see any reason to do any of this
> stuff. Including the information that we need to elide incremental
> stubs in some other way, say with one stub-list per directory, will be
> easier to implement and probably perform better. Like, I'm not saying
> we can't find a way to jam this into the manifest. But I'm fairly sure
> it's just making life difficult for ourselves.
> 
> I may ultimately lose this argument, as I did the one about whether
> the backup_manifest should be JSON or some bespoke format. And that's
> fine. I respect your opinion, and David's. But I also reserve the
> right to feel differently, and I do.

Do you mean 1 stub-list per pgdata + 1 per tablespaces?

Sure, it is important to respect and value each other's feelings, I never said
otherwise.

I don't really see how it would be faster to recursively go through each
sub-directory of the pgdata and tablespaces to gather all the pieces together
compared to reading 1 main file.
But I guess, choosing one option or the other, we will only find out how well
it works once people use it in the field and possibly give some feedback.

As you mentioned in [1], we're not going to start rewriting the implementation
a week after feature freeze, nor probably start building big new things now
anyway.
So maybe let's start with documenting the possible gotchas/corner cases to make 
our support life easier in the future.

Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoaVxr_o3mrDBrBcXm3gowr9Qc4ABW-c73NR_201KkDavw%40mail.gmail.com




Re: Table AM Interface Enhancements

2024-04-16 Thread Andres Freund
Hi,

On 2024-04-16 08:31:24 -0400, Robert Haas wrote:
> On Tue, Apr 16, 2024 at 6:52 AM Alexander Korotkov  
> wrote:
> > Taking a closer look at acquire_sample_rows(), I think it would be
> > good if table AM implementation would care about block-level (or
> > whatever-level) sampling.  So that acquire_sample_rows() just fetches
> > tuples one-by-one from table AM implementation without any care about
> > blocks.  Possible table_beginscan_analyze() could take an argument of
> > target number of tuples, then those tuples are just fetches with
> > table_scan_analyze_next_tuple().  What do you think?
> 
> Andres is the expert here, but FWIW, that plan seems reasonable to me.
> One downside is that every block-based tableam is going to end up with
> a very similar implementation, which is kind of something I don't like
> about the tableam API in general: if you want to make something that
> is basically heap plus a little bit of special sauce, you have to copy
> a mountain of code. Right now we don't really care about that problem,
> because we don't have any other tableams in core, but if we ever do, I
> think we're going to find ourselves very unhappy with that aspect of
> things. But maybe now is not the time to start worrying. That problem
> isn't unique to analyze, and giving out-of-core tableams the
> flexibility to do what they want is better than not.

I think that can partially be addressed by having more "block oriented AM"
helpers in core, like we have for table_block_parallelscan*. Doesn't work for
everything, but should for something like analyze.

Greetings,

Andres Freund




Re: Table AM Interface Enhancements

2024-04-16 Thread Andres Freund
On 2024-04-16 13:33:53 +0300, Alexander Korotkov wrote:
> Reverted.

Thanks!




Re: Issue with the PRNG used by Postgres

2024-04-16 Thread Andres Freund
Hi,

On 2024-04-15 10:54:16 -0400, Robert Haas wrote:
> On Fri, Apr 12, 2024 at 3:33 PM Andres Freund  wrote:
> > Here's a patch implementing this approach. I confirmed that before we 
> > trigger
> > the stuck spinlock logic very quickly and after we don't. However, if most
> > sleeps are interrupted, it can delay the stuck spinlock detection a good
> > bit. But that seems much better than triggering it too quickly.
>
> +1 for doing something about this. I'm not sure if it goes far enough,
> but it definitely seems much better than doing nothing.

One thing I started to be worried about is whether a patch ought to prevent
the timeout used by perform_spin_delay() from increasing when
interrupted. Otherwise a few signals can trigger quite long waits.

But as I can't quite see a way to make this accurate in the backbranches, I
suspect something like what I posted is still a good first version.
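One hedged sketch of the concern (the constants and shape are illustrative, not PostgreSQL's actual perform_spin_delay() logic): only escalate the backoff when a sleep actually completed, so signal-interrupted sleeps don't inflate later waits.

```python
# Toy backoff: double the delay after an uninterrupted sleep, but leave
# it unchanged when the sleep was cut short by a signal.

MIN_DELAY_US = 1000
MAX_DELAY_US = 1000 * 1000

def next_delay(cur_delay_us, interrupted):
    if interrupted:
        return cur_delay_us                      # don't escalate
    return min(cur_delay_us * 2, MAX_DELAY_US)   # capped exponential backoff

d = MIN_DELAY_US
for interrupted in [False, True, True, False]:
    d = next_delay(d, interrupted)
# only the two uninterrupted sleeps doubled the delay: 1000 -> 2000 -> 4000
```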


> Given your findings, I'm honestly kind of surprised that I haven't seen
> problems of this type more frequently.

Same. I did a bunch of searches for the error, but found surprisingly
little.

I think in practice most spinlocks just aren't contended enough to reach
perform_spin_delay(). And we have improved some on that over time.

Greetings,

Andres Freund




Re: Stack overflow issue

2024-04-16 Thread Andres Freund
Hi,

On 2024-04-16 15:45:42 +0300, Alexander Korotkov wrote:
> On Tue, Apr 16, 2024 at 1:48 AM Andres Freund  wrote:
> > On 2024-03-06 14:17:23 +0200, Alexander Korotkov wrote:
> > > 0001 Turn tail recursion into iteration in CommitTransactionCommand()
> > > I did minor revision of comments and code blocks order to improve the
> > > readability.
> >
> > After sending
> > https://www.postgresql.org/message-id/20240414223305.m3i5eju6zylabvln%40awork3.anarazel.de
> > I looked some more at important areas where changes didn't have code
> > coverage. One thing I noticed was that the "non-internal" part of
> > AbortCurrentTransaction() is uncovered:
> > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/transam/xact.c.gcov.html#L3403
> >
> > Which made me try to understand fefd9a3fed2.  I'm a bit confused about why
> > some parts are handled in 
> > CommitCurrentTransaction()/AbortCurrentTransaction()
> > and others are in the *Internal functions.
> >
> > I understand that fefd9a3fed2 needed to remove the recursion in
> > CommitTransactionCommand()/AbortCurrentTransaction(). But I don't understand
> > why that means having some code in the non-internal and some in the
> > internal functions?  Wouldn't it be easier to just have all the state 
> > handling
> > code in the Internal() function and just break after the
> > CleanupSubTransaction() calls?
> 
> I'm not sure I correctly get what you mean.  Do you think the attached
> patch matches the direction you're pointing?  The patch itself is not
> final, it requires cleanup and comments revision, just to check the
> direction.

Something like that, yea. The split does seem less confusing that way to me,
but also not 100% certain.

Greetings,

Andres Freund




Re: Combine headerscheck and cpluspluscheck scripts

2024-04-16 Thread Anton Voloshin

Hello, hackers,

On 10/03/2024 12:03, Peter Eisentraut wrote:
> Committed, thanks.


This commit (7b8e2ae2f) has turned the cpluspluscheck script into a
--cplusplus option for headerscheck.  I propose updating
src/tools/pginclude/README accordingly; please see the attached patch.


--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru

From a50e58f117341e8a9df5f97fa05630f7b8f4bd86 Mon Sep 17 00:00:00 2001
From: Anton Voloshin 
Date: Tue, 16 Apr 2024 17:19:28 +0300
Subject: [PATCH] Update src/tools/pginclude/README to match recent changes to
 cpluspluscheck

7b8e2ae2f has turned cpluspluscheck from a separate script into
a --cplusplus option for headerscheck.  Update README accordingly.
---
 src/tools/pginclude/README | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/tools/pginclude/README b/src/tools/pginclude/README
index 712eca76fb3..65372057dad 100644
--- a/src/tools/pginclude/README
+++ b/src/tools/pginclude/README
@@ -1,10 +1,10 @@
 src/tools/pginclude/README
 
-NOTE: headerscheck and cpluspluscheck are in current use, and any
-problems they find should generally get fixed.  The other scripts
-in this directory have not been used in some time, and have issues.
-pgrminclude in particular has a history of creating more problems
-than it fixes.  Be very wary of applying their results blindly.
+NOTE: headerscheck and headerscheck --cplusplus are in current use, and any
+problems they find should generally get fixed.  The other scripts in this
+directory have not been used in some time, and have issues.  pgrminclude in
+particular has a history of creating more problems than it fixes.  Be very
+wary of applying their results blindly.
 
 
 pginclude
@@ -84,16 +84,16 @@ prerequisite, even if postgres_fe.h or c.h would be more appropriate.
 Also note that the contents of macros are not checked; this is intentional.
 
 
-cpluspluscheck
-==============
+headerscheck --cplusplus
+========================
 
-This script can be run to verify that all Postgres include files meet
-the project convention that they will compile as C++ code.  Although
-the project's coding language is C, some people write extensions in C++,
-so it's helpful for include files to be C++-clean.
+The headerscheck in --cplusplus mode can be run to verify that all Postgres
+include files meet the project convention that they will compile as C++
+code.  Although the project's coding language is C, some people write
+extensions in C++, so it's helpful for include files to be C++-clean.
 
 A small number of header files are exempted from this requirement,
-and are skipped by the cpluspluscheck script.
+and are skipped by the script in the --cplusplus mode.
 
 The easy way to run the script is to say "make -s cpluspluscheck" in
 the top-level build directory after completing a build.  You should
-- 
2.43.2



Re: Use streaming read API in ANALYZE

2024-04-16 Thread Nazir Bilal Yavuz
Hi,

On Wed, 3 Apr 2024 at 22:25, Nazir Bilal Yavuz  wrote:
>
> Hi,
>
> Thank you for looking into this!
>
> On Wed, 3 Apr 2024 at 20:17, Heikki Linnakangas  wrote:
> >
> > On 03/04/2024 13:31, Nazir Bilal Yavuz wrote:
> > > Streaming API has been committed but the committed version has a minor
> > > change, the read_stream_begin_relation function takes Relation instead
> > > of BufferManagerRelation now. So, here is a v5 which addresses this
> > > change.
> >
> > I'm getting a repeatable segfault / assertion failure with this:
> >
> > postgres=# CREATE TABLE tengiga (i int, filler text) with (fillfactor=10);
> > CREATE TABLE
> > postgres=# insert into tengiga select g, repeat('x', 900) from
> > generate_series(1, 140) g;
> > INSERT 0 140
> > postgres=# set default_statistics_target = 10; ANALYZE tengiga;
> > SET
> > ANALYZE
> > postgres=# set default_statistics_target = 100; ANALYZE tengiga;
> > SET
> > ANALYZE
> > postgres=# set default_statistics_target =1000; ANALYZE tengiga;
> > SET
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> >
> > TRAP: failed Assert("BufferIsValid(hscan->rs_cbuf)"), File:
> > "heapam_handler.c", Line: 1079, PID: 262232
> > postgres: heikki postgres [local]
> > ANALYZE(ExceptionalCondition+0xa8)[0x56488a0de9d8]
> > postgres: heikki postgres [local]
> > ANALYZE(heapam_scan_analyze_next_block+0x63)[0x5648899ece34]
> > postgres: heikki postgres [local] ANALYZE(+0x2d3f34)[0x564889b6af34]
> > postgres: heikki postgres [local] ANALYZE(+0x2d2a3a)[0x564889b69a3a]
> > postgres: heikki postgres [local] ANALYZE(analyze_rel+0x33e)[0x564889b68fa9]
> > postgres: heikki postgres [local] ANALYZE(vacuum+0x4b3)[0x564889c2dcc0]
> > postgres: heikki postgres [local] ANALYZE(ExecVacuum+0xd6f)[0x564889c2d7fe]
> > postgres: heikki postgres [local]
> > ANALYZE(standard_ProcessUtility+0x901)[0x564889f0b8b9]
> > postgres: heikki postgres [local]
> > ANALYZE(ProcessUtility+0x136)[0x564889f0afb1]
> > postgres: heikki postgres [local] ANALYZE(+0x6728c8)[0x564889f098c8]
> > postgres: heikki postgres [local] ANALYZE(+0x672b3b)[0x564889f09b3b]
> > postgres: heikki postgres [local] ANALYZE(PortalRun+0x320)[0x564889f09015]
> > postgres: heikki postgres [local] ANALYZE(+0x66b2c6)[0x564889f022c6]
> > postgres: heikki postgres [local]
> > ANALYZE(PostgresMain+0x80c)[0x564889f06fd7]
> > postgres: heikki postgres [local] ANALYZE(+0x667876)[0x564889efe876]
> > postgres: heikki postgres [local]
> > ANALYZE(postmaster_child_launch+0xe6)[0x564889e1f4b3]
> > postgres: heikki postgres [local] ANALYZE(+0x58e68e)[0x564889e2568e]
> > postgres: heikki postgres [local] ANALYZE(+0x58b7f0)[0x564889e227f0]
> > postgres: heikki postgres [local]
> > ANALYZE(PostmasterMain+0x152b)[0x564889e2214d]
> > postgres: heikki postgres [local] ANALYZE(+0xb4)[0x564889cdb4b4]
> > /lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7f7d83b6724a]
> > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f7d83b67305]
> > postgres: heikki postgres [local] ANALYZE(_start+0x21)[0x564889971a61]
> > 2024-04-03 20:15:49.157 EEST [262101] LOG:  server process (PID 262232)
> > was terminated by signal 6: Aborted
>
> I ran into the same error while working on Jakub's benchmarking results.
>
> Cause: I was using the nblocks variable to check how many blocks will
> be returned from the streaming API. But I realized that sometimes the
> number returned from BlockSampler_Init() is not equal to the number of
> blocks that BlockSampler_Next() will return, as the block sampling
> algorithm decides how many blocks to return on the fly by using some
> random seeds.

I wanted to re-check this problem and I realized that I was wrong. I
tried using nblocks again and this time there was no failure. I looked
at the block sampling logic and I am pretty sure that the
BlockSampler_Init() function correctly returns the number of blocks
that BlockSampler_Next() will return. It seems 158f581923 fixed this
issue as well.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: pg_combinebackup does not detect missing files

2024-04-16 Thread Robert Haas
On Wed, Apr 10, 2024 at 9:36 PM David Steele  wrote:
> I've been playing around with the incremental backup feature trying to
> get a sense of how it can be practically used. One of the first things I
> always try is to delete random files and see what happens.
>
> You can delete pretty much anything you want from the most recent
> incremental backup (not the manifest) and it will not be detected.

Sure, but you can also delete anything you want from the most recent
non-incremental backup and it will also not be detected. There's no
reason at all to hold incremental backup to a higher standard than we
do in general.

> Maybe the answer here is to update the docs to specify that
> pg_verifybackup should be run on all backup directories before
> pg_combinebackup is run. Right now that is not at all clear.

I don't want to make those kinds of prescriptive statements. If you
want to verify the backups that you use as input to pg_combinebackup,
you can use pg_verifybackup to do that, but it's not a requirement.
I'm not averse to having some kind of statement in the documentation
along the lines of "Note that pg_combinebackup does not attempt to
verify that the individual backups are intact; for that, use
pg_verifybackup." But I think it should be blindingly obvious to
everyone that you can't go whacking around the inputs to a program and
expect to get perfectly good output. I know it isn't blindingly
obvious to everyone, which is why I'm not averse to adding something
like what I just mentioned, and maybe it wouldn't be a bad idea to
document in a few other places that you shouldn't randomly remove
files from the data directory of your live cluster, either, because
people seem to keep doing it, but really, the expectation that you
can't just blow files away and expect good things to happen afterward
should hardly need to be stated.

I think it's very easy to go overboard with warnings of this type.
Weird stuff comes to me all the time because people call me when the
weird stuff happens, and I'm guessing that your experience is similar.
But my actual personal experience, as opposed to the cases reported to
me by others, practically never features files evaporating into the
ether. If I read a documentation page for PostgreSQL or any other
piece of software that made it sound like that was a normal
occurrence, I'd question the technical acumen of the authors. And if I
saw such warnings only for one particular feature of a piece of
software and not anywhere else, I'd wonder why the authors of the
software were trying so hard to scare me off the use of that
particular feature. I don't trust at all that incremental backup is
free of bugs -- but I don't trust that all the code anyone else has
written is free of bugs, either.

> Overall I think it is an issue that the combine is being driven from the
> most recent incremental directory rather than from the manifest.

I don't. I considered that design and rejected it for various reasons
that I still believe to be good. Even if I was wrong, we're not going
to start rewriting the implementation a week after feature freeze.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Typos in the code and README

2024-04-16 Thread Nazir Bilal Yavuz
Hi,

Thanks for working on this!

On Mon, 15 Apr 2024 at 15:26, Daniel Gustafsson  wrote:
>
> > On 14 Apr 2024, at 13:19, David Rowley  wrote:
> >
> > On Sat, 13 Apr 2024 at 09:17, Daniel Gustafsson  wrote:
> >>
> >>> On 12 Apr 2024, at 23:15, Heikki Linnakangas  wrote:
> >>> Here's a few more. I've accumulate these over the past couple of months, 
> >>> keeping them stashed in a branch, adding to it whenever I've spotted a 
> >>> minor typo while reading the code.
> >>
> >> Nice, let's lump all these together.
> >
> > Here are a few additional ones to add to that.
>
> Thanks.  Collecting all the ones submitted here, as well as a few submitted
> off-list by Alexander, the patch is now a 3-part patchset of cleanups:
>
> 0001 contains the typos and duplicate words fixups, 0002 fixes a parameter 
> with
> the wrong name in the prototype and 0003 removes a leftover prototype which 
> was
> accidentally left in a refactoring.

I realized two small typos: 'sgmr' -> 'smgr'. You may want to include
them in 0001.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: pg_combinebackup fails on file named INCREMENTAL.*

2024-04-16 Thread Robert Haas
On Mon, Apr 15, 2024 at 10:12 PM David Steele  wrote:
> Anyway, I think it should be fixed or documented as a caveat since it
> causes a hard failure on restore.

Alright, I'll look into this.

> I know Tomas added some optimizations that work best with --no-manifest
> but if we can eventually read compressed tars (which I expect to be the
> general case) then those optimizations are not very useful.

My belief is that those optimizations work fine with or without
manifests; you only start to lose the benefit in cases where you use
different checksum types for different backups that you then try to
combine. Which should hopefully be a rare case.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_combinebackup fails on file named INCREMENTAL.*

2024-04-16 Thread Robert Haas
On Tue, Apr 16, 2024 at 3:10 AM Stefan Fercot
 wrote:
> > > But ... I didn't really end up feeling very comfortable with it. Right
> > > now, the backup manifest is something we only use to verify the
> > > integrity of the backup. If we were to do this, it would become a
> > > critical part of the backup.
>
> Isn't it already the case? I mean, you need the manifest of the previous 
> backup to take an incremental one, right?
> And shouldn't we encourage to verify the backup sets before (at least) trying 
> to combine them?
> It's not because a file was only used for one specific purpose until now that
> we can't improve it later.
> Splitting the meaningful information across multiple places would be more 
> error-prone (for both devs and users) imo.

Well, right now, if you just take a full backup, and you throw away
the backup manifest because you don't care, you have a working full
backup. Furthermore, if you took any incremental backups based on that
full backup before discarding the manifest, you can still restore
them. Now, it is possible that nobody in the world cares about those
properties other than me; I have been known to (a) care about weird
stuff and (b) be a pedant. However, we've already shipped a couple of
releases where backup manifests were thoroughly non-critical: you
needed them to run pg_verifybackup, and for nothing else. I think it's
quite likely that there are users out there who are used to things
working in that way, and I'm not sure that those users will adjust
their expectations when a new feature comes out. I also feel that if I
were a user, I would think of something called a "manifest" as just a
table of contents for whatever the thing was. I still remember
downloading tar files from the Internet in the 1990s and there'd be a
file in the tarball sometimes called MANIFEST which was, you know, a
list of what was in the tarball. You didn't need that file for
anything functional; it was just so you could check if anything was
missing.

What I fear is that this will turn into another situation like we had
with pg_xlog, where people saw "log" in the name and just blew it
away. Matter of fact, I recently encountered one of my few recent
examples of someone doing that thing since the pg_wal renaming
happened. Some users don't take much convincing to remove anything
that looks inessential. And what I'm particularly worried about with
this feature is tar-format backups. If you have a directory format
backup and you do an "ls", you're going to see a whole bunch of files
in there of which backup_manifest will be one. How you treat that file
is just going to depend on what you know about its purpose. But if you
have a tar-format backup, possibly compressed, the backup_manifest
file stands out a lot more. You may have something like this:

backup_manifest root.tar.gz 16384.tar.gz

Well, at this point, it becomes much more likely that you're going to
think that there are special rules for the backup_manifest file.

The kicker for me is that I can't see any reason to do any of this
stuff. Including the information that we need to elide incremental
stubs in some other way, say with one stub-list per directory, will be
easier to implement and probably perform better. Like, I'm not saying
we can't find a way to jam this into the manifest. But I'm fairly sure
it's just making life difficult for ourselves.

I may ultimately lose this argument, as I did the one about whether
the backup_manifest should be JSON or some bespoke format. And that's
fine. I respect your opinion, and David's. But I also reserve the
right to feel differently, and I do. And I would also just gently
point out that my level of motivation to work on a particular feature
can depend quite a bit on whether I'm going to be forced to implement
it in a way that I disagree with.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS

2024-04-16 Thread Dean Rasheed
On Tue, 16 Apr 2024 at 11:35, Richard Guo  wrote:
>
> On Tue, Apr 16, 2024 at 5:48 PM Aleksander Alekseev 
>  wrote:
>>
>> Oversight of 0294df2f1f84 [1].
>>
>> [1]: 
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0294df2f1f84
>
> +1.  I think this change improves the code quality.  I searched for
> other arrays indexed by merge match kind, but found none.  So this patch
> seems thorough.
>

Yes, this makes sense, though I note that some other similar code uses
a #define rather than inserting an enum element at the end (e.g.,
NUM_ROWFILTER_PUBACTIONS).

I guess the argument against inserting an enum element at the end is
that a switch statement on the enum value might generate a compiler
warning if it didn't have a default clause.

Looking at how NUM_ROWFILTER_PUBACTIONS is defined as the last element
plus one, it might seem to be barely any better than just defining it
to be 3, since any new enum element would probably be added at the
end, requiring it to be updated in any case. But if the number of
elements were much larger, it would be much more obviously correct,
making it a good general pattern to follow. So in the interests of
code consistency, I think we should do the same here.

Regards,
Dean




Re: Stack overflow issue

2024-04-16 Thread Alexander Korotkov
On Tue, Apr 16, 2024 at 1:48 AM Andres Freund  wrote:
> On 2024-03-06 14:17:23 +0200, Alexander Korotkov wrote:
> > 0001 Turn tail recursion into iteration in CommitTransactionCommand()
> > I did minor revision of comments and code blocks order to improve the
> > readability.
>
> After sending
> https://www.postgresql.org/message-id/20240414223305.m3i5eju6zylabvln%40awork3.anarazel.de
> I looked some more at important areas where changes didn't have code
> coverage. One thing I noticed was that the "non-internal" part of
> AbortCurrentTransaction() is uncovered:
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/transam/xact.c.gcov.html#L3403
>
> Which made me try to understand fefd9a3fed2.  I'm a bit confused about why
> some parts are handled in CommitCurrentTransaction()/AbortCurrentTransaction()
> and others are in the *Internal functions.
>
> I understand that fefd9a3fed2 needed to remove the recursion in
> CommitTransactionCommand()/AbortCurrentTransaction(). But I don't understand
> why that means having some code in in the non-internal and some in the
> internal functions?  Wouldn't it be easier to just have all the state handling
> code in the Internal() function and just break after the
> CleanupSubTransaction() calls?

I'm not sure I correctly get what you mean.  Do you think the attached
patch matches the direction you're pointing?  The patch itself is not
final, it requires cleanup and comments revision, just to check the
direction.

--
Regards,
Alexander Korotkov


abort_commit_transaction.patch
Description: Binary data


Re: Table AM Interface Enhancements

2024-04-16 Thread Robert Haas
On Tue, Apr 16, 2024 at 6:52 AM Alexander Korotkov  wrote:
> Taking a closer look at acquire_sample_rows(), I think it would be
> good if table AM implementation would care about block-level (or
> whatever-level) sampling.  So that acquire_sample_rows() just fetches
> tuples one-by-one from table AM implementation without any care about
> blocks.  Possibly table_beginscan_analyze() could take an argument of
> target number of tuples, then those tuples are just fetched with
> table_scan_analyze_next_tuple().  What do you think?

Andres is the expert here, but FWIW, that plan seems reasonable to me.
One downside is that every block-based tableam is going to end up with
a very similar implementation, which is kind of something I don't like
about the tableam API in general: if you want to make something that
is basically heap plus a little bit of special sauce, you have to copy
a mountain of code. Right now we don't really care about that problem,
because we don't have any other tableams in core, but if we ever do, I
think we're going to find ourselves very unhappy with that aspect of
things. But maybe now is not the time to start worrying. That problem
isn't unique to analyze, and giving out-of-core tableams the
flexibility to do what they want is better than not.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Table AM Interface Enhancements

2024-04-16 Thread Pavel Borisov
On Tue, 16 Apr 2024 at 14:52, Alexander Korotkov 
wrote:

> On Mon, Apr 15, 2024 at 11:17 PM Andres Freund  wrote:
> > On 2024-04-15 16:02:00 -0400, Robert Haas wrote:
> > > Do you want that patch applied, not applied, or applied with some set
> of
> > > modifications?
> >
> > I think we should apply Alexander's proposed revert and then separately
> > discuss what we should do about 041b96802ef.
>
> Taking a closer look at acquire_sample_rows(), I think it would be
> good if table AM implementation would care about block-level (or
> whatever-level) sampling.  So that acquire_sample_rows() just fetches
> tuples one-by-one from table AM implementation without any care about
> blocks.  Possibly table_beginscan_analyze() could take an argument of
> target number of tuples, then those tuples are just fetched with
> table_scan_analyze_next_tuple().  What do you think?
>
Hi, Alexander!

I like the idea of splitting abstraction levels for:
1. acquirefuncs (FDW or physical table)
2. new specific row fetch functions (similar to the existing
_scan_analyze_next_tuple()) that could be AM-specific.

Then scan_analyze_next_block() or another iteration algorithm would be
contained inside the table AM implementation of _scan_analyze_next_tuple().

So, the init of the scan state would be inside the table AM implementation
of _beginscan_analyze(). Scan state (like BlockSamplerData or other state
that could be custom for the AM) could be transferred from
_beginscan_analyze() to _scan_analyze_next_tuple() by some opaque
AM-specific data structure. If so, we'll also need an AM-specific
table_endscan_analyze to clean it up.

Regards,
Pavel


Re: Disallow changing slot's failover option in transaction block

2024-04-16 Thread shveta malik
On Tue, Apr 16, 2024 at 1:45 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Hou,
>
> > Kuroda-San reported an issue off-list that:
> >
> > If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block
> > and rollback, only the subscription option change can be rolled back, while 
> > the
> > replication slot's failover change is preserved.
> >
> > This is because ALTER SUBSCRIPTION SET (failover) command internally
> > executes
> > the replication command ALTER_REPLICATION_SLOT to change the replication
> > slot's
> > failover property, but this replication command execution cannot be
> > rollback.
> >
> > To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set
> > (failover) inside a txn block, which is also consistent to the ALTER
> > SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small
> > patch to address this.
>
> Thanks for posting the patch, the fix is same as my expectation.
> Also, should we add the restriction to the doc? I feel [1] can be updated.

+1.

Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
because we call alter_replication_slot in CREATE SUB as well, for the
case when slot_name is provided and create_slot=false. But the tricky
part is that we call alter_replication_slot() when creating a
subscription for both failover=true and false. That means if we want
to fix it along the same lines as ALTER SUB, we have to disallow the
user from executing CREATE SUBSCRIPTION (slot_name = xx) in a txn
block, which seems to break some existing use cases (previously, a
user could execute such a command inside a txn block).

So, we need to think if there are better ways to fix it.  After
discussion with Hou-San offlist, here are some ideas:

1. do not alter the replication slot's failover option when CREATE
SUBSCRIPTION is run WITH failover=false. This means we alter the
replication slot only when failover is set to true. And thus we can
disallow CREATE SUB WITH (slot_name = xx, failover=true,
create_slot=false) inside a txn block.

This option allows the user to run CREATE SUB (create_slot=false) with
failover=false in a txn block as before. But on the downside, it makes
the behavior inconsistent for an otherwise simple option like failover,
i.e. with failover=true, CREATE SUB is not allowed in a txn block while
with failover=false it is allowed. This makes it slightly harder for
users to understand.

2. let's not disallow CREATE SUB in a txn block as earlier; just don't
perform the internal alter-failover during CREATE SUB for existing
slots (create_slot=false, slot_name=xx), i.e. when create_slot is
false, we will ignore the failover parameter of CREATE SUB and it is
the user's responsibility to set it appropriately using the ALTER SUB
command. For create_slot=true, the behaviour of CREATE SUB remains the
same as earlier.

This option does not add a new restriction for CREATE SUB with respect
to txn blocks. In the context of failover with create_slot=false, we
already have a similar (documented) restriction for ALTER SUB, i.e.
with 'ALTER SUBSCRIPTION SET(slot_name = new)', the user needs to
alter the new slot's failover themselves. CREATE SUB can be documented
in a similar way. This seems simpler to understand considering the
existing ALTER SUB behavior as well. Plus, this will make the CREATE
SUB code slightly simpler and thus more easily manageable.

3. add an alter_slot option for CREATE SUBSCRIPTION; we alter the
slot's failover only if alter_slot=true. And so we can disallow CREATE
SUB WITH (slot_name = xx, alter_slot=true) inside a txn block.

This seems a clean way, as everything will be as per the user's
consent based on the alter_slot parameter. But on the downside, it
requires introducing an additional parameter and also adds a new
restriction on running CREATE SUB in a txn block for a specific case.

4. Don't alter the replication slot in subscription DDLs. Instead, try
to alter the replication slot's failover in the apply worker. This
means we need to execute alter_replication_slot each time before
starting streaming in the apply worker.

This does not seem appealing, as it executes alter_replication_slot
every time the apply worker starts. But if others think it is a better
option, it can be analyzed further.


Currently, our preference is option 2, as that looks like a clean
solution and also aligns with the documented ALTER SUB behavior.
Thoughts?


Note that we cannot refer to the design of two_phase here, because
two_phase can be considered a streaming option, so it's fine to change
two_phase along with the START_REPLICATION command (two_phase is not
changed in subscription DDLs, but gets changed in the
START_REPLICATION command). But failover is closely related to the
replication slot itself.



Thanks
Shveta




Use read streams in CREATE DATABASE command when the strategy is wal_log

2024-04-16 Thread Nazir Bilal Yavuz
Hi,

I am working on using read streams in the CREATE DATABASE command when the
strategy is wal_log. The RelationCopyStorageUsingBuffer() function is used
in this context. This function reads source buffers and then copies them to
the destination buffers. I used read streams only when reading source
buffers, because the destination buffers are read with the
'RBM_ZERO_AND_LOCK' option, so streaming them is not important.

I created a ~6 GB table [1] and created a new database with the wal_log
strategy using the database that table was created in as a template [2]. My
benchmarking results are:

a. Timings:

patched:
12955.027 ms
12917.475 ms
13177.846 ms
12971.308 ms
13059.985 ms

master:
13156.375 ms
13054.071 ms
13151.607 ms
13152.633 ms
13160.538 ms

There is no meaningful difference in timings; the patched version is a
tiny bit better, but the gap is negligible. I actually expected the
patched version to be better
because there was no prefetching before, but the read stream API detects
sequential access and disables prefetching.

b. strace:

patched:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 68.02    3.749359           2   1285782           pwrite64
 18.54    1.021734          21     46730           preadv
  9.49    0.522889         826       633           fdatasync
  2.55    0.140339          59      2368           pwritev
  1.14    0.062583         409       153           fsync

master:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 59.71    3.831542           2   1288365           pwrite64
 29.84    1.914540           2    747936           pread64
  7.90    0.506843         837       605           fdatasync
  1.58    0.101575          54      1856           pwritev
  0.75    0.048431         400       121           fsync

There are fewer (~1/16) read system calls in the patched version.

c. perf:

patched:

-   97.83%  1.13%  postgres  postgres  [.] RelationCopyStorageUsingBuffer
   - 97.83% RelationCopyStorageUsingBuffer
      - 44.28% ReadBufferWithoutRelcache
         + 42.20% GetVictimBuffer
           0.81% ZeroBuffer
      + 31.86% log_newpage_buffer
      - 19.51% read_stream_next_buffer
         - 17.92% WaitReadBuffers
            + 17.61% mdreadv
         - 1.47% read_stream_start_pending_read
            + 1.46% StartReadBuffers

master:

-   97.68%  0.57%  postgres  postgres  [.] RelationCopyStorageUsingBuffer
   - RelationCopyStorageUsingBuffer
      - 65.48% ReadBufferWithoutRelcache
         + 41.16% GetVictimBuffer
         - 20.42% WaitReadBuffers
            + 19.90% mdreadv
         + 1.85% StartReadBuffer
           0.75% ZeroBuffer
      + 30.82% log_newpage_buffer

Patched version spends less CPU time in read calls and more CPU time in
other calls such as write.

There are three patch files attached. The first two are an optimization and
a way to create a read stream object using an SMgrRelation; these are
already proposed in the streaming I/O thread [3]. The third one is the
actual patch file.

Any kind of feedback would be appreciated.

[1] CREATE TABLE t as select repeat('a', 100) || i || repeat('b', 500) as
filler from generate_series(1, 900) as i;
[2] CREATE DATABASE test_1 STRATEGY 'wal_log' TEMPLATE test;
[3]
https://www.postgresql.org/message-id/CAN55FZ1yGvCzCW_aufu83VimdEYHbG_zuOY3J9JL-nBptyJyKA%40mail.gmail.com

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From edcfacec40a70a8747c5f18777b6c28b0fccba7a Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Sun, 7 Apr 2024 22:33:36 +0300
Subject: [PATCH v1 1/3] Refactor PinBufferForBlock() to remove if checks about
 persistence

There are if checks in the PinBufferForBlock() function to set the
persistence of the relation, and this function is called for each block
in the relation. Instead, set the persistence of the relation before
calling PinBufferForBlock().
---
 src/backend/storage/aio/read_stream.c |  2 +-
 src/backend/storage/buffer/bufmgr.c   | 31 +++
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index f54dacdd914..d155dde5ce3 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -549,7 +549,7 @@ read_stream_begin_relation(int flags,
 	{
 		stream->ios[i].op.rel = rel;
 		stream->ios[i].op.smgr = RelationGetSmgr(rel);
-		stream->ios[i].op.smgr_persistence = 0;
+		stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence;
 		stream->ios[i].op.forknum = forknum;
 		stream->ios[i].op.strategy = strategy;
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 901b7230fb9..17cb7127f99 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1067,24 +1067,10 @@ PinBufferForBlock(Relation rel,
 	BufferDesc 

Re: Add bump memory context type and use it for tuplesorts

2024-04-16 Thread Amul Sul
On Tue, Apr 16, 2024 at 3:44 PM David Rowley  wrote:

> On Tue, 16 Apr 2024 at 17:13, Amul Sul  wrote:
> > Attached is a small patch adding the missing BumpContext description to
> the
> > README.
>
> Thanks for noticing and working on the patch.
>
> There were a few things that were not quite accurate or are misleading:
>
> 1.
>
> > +These three memory contexts aim to free memory back to the operating
> system
>
> That's not true for bump. It's the worst of the 4.  Worse than aset.
> It only returns memory when the context is reset/deleted.
>
> 2.
>
> "These memory contexts were initially developed for ReorderBuffer, but
> may be useful elsewhere as long as the allocation patterns match."
>
> The above isn't true for bump.  It was written for tuplesort.  I think
> we can just remove that part now.  Slab and generation are both old
> enough not to care why they were conceived.
>
> Also since adding bump, I think the choice of which memory context to
> use is about 33% harder than it used to be when there were only 3
> context types.  I think this warrants giving more detail on what these
> 3 special-purpose memory allocators are good for.  I've added more
> details in the attached patch.  This includes more details about
> freeing malloc'd blocks
>
> I've tried to detail out enough of the specialities of the context
> type without going into extensive detail. My hope is that there will
> be enough detail for someone to choose the most suitable looking one
> and head over to the corresponding .c file to find out more.
>
> Is that about the right level of detail?
>

Yes, it looks much better now, thank you.

Regards,
Amul


Re: Add bump memory context type and use it for tuplesorts

2024-04-16 Thread David Rowley
On Mon, 8 Apr 2024 at 00:37, David Rowley  wrote:
> I've now pushed all 3 patches.   Thank you for all the reviews on
> these and for the extra MemoryContextMethodID bit, Matthias.

I realised earlier today when working on [1] that bump makes a pretty
brain-dead move when adding dedicated blocks to the blocks list.  The
problem is that I opted to not have a current block field in
BumpContext and just rely on the head pointer of the blocks list to be
the "current" block.  The head block is the block we look at to see if
we've any space left when new allocations come in.  The problem there
is when adding a dedicated block in BumpAllocLarge(), the code adds
this to the head of the blocks list so that when a new allocation
comes in that's normal-sized, the block at the top of the list is full
and we have to create a new block for the allocation.

The attached fixes this by pushing these large/dedicated blocks to the
*tail* of the blocks list.  This means the partially filled block
remains at the head and is available for any new allocation which will
fit.  This behaviour is evident from the regression test change that I
added earlier today when working on [1].  The 2nd and smaller
allocation in that test goes onto the keeper block rather than a new
block.

I plan to push this tomorrow.

David

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bea97cd02ebb347ab469b78673c2b33a72109669
From 39a60420a83e5a059de50869c0981be9732909ab Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Tue, 16 Apr 2024 22:26:00 +1200
Subject: [PATCH v1] Push dedicated BumpBlocks to the tail of the blocks list

BumpContext relies on using the head block from its 'blocks' field to
use as the current block to allocate new chunks to.  When we receive an
allocation request larger than allocChunkLimit, we place these chunks on
a new dedicated block and push this block onto the head of the 'blocks'
list.  This behavior caused the previous bump block to no longer be
available for new normal-sized (non-large) allocations and would result
in wasting space on blocks when a large request arrived before the
current block was full.

Here adjust the code to push these dedicated blocks onto the *tail* of
the blocks list so that the head block remains intact and available to
be used by normal allocation request sizes.

Discussion: 
https://postgr.es/m/caaphdvp9___r-ayjj0nz6gd3mecgwgz0_6zptwpwj+zqhtm...@mail.gmail.com
---
 src/backend/utils/mmgr/bump.c  | 8 ++--
 src/test/regress/expected/sysviews.out | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/mmgr/bump.c b/src/backend/utils/mmgr/bump.c
index 449bd29344..83798e2245 100644
--- a/src/backend/utils/mmgr/bump.c
+++ b/src/backend/utils/mmgr/bump.c
@@ -342,8 +342,12 @@ BumpAllocLarge(MemoryContext context, Size size, int flags)
randomize_mem((char *) MemoryChunkGetPointer(chunk), size);
 #endif
 
-   /* add the block to the list of allocated blocks */
-   dlist_push_head(&set->blocks, &block->node);
+   /*
+    * Add the block to the tail of allocated blocks list.  The current block
+    * is left at the head of the list as it may still have space for
+    * non-large allocations.
+    */
+   dlist_push_tail(&set->blocks, &block->node);
 
 #ifdef MEMORY_CONTEXT_CHECKING
/* Ensure any padding bytes are marked NOACCESS. */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 634dc8d8b8..2f3eb4e7f1 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -47,7 +47,7 @@ select name, parent, total_bytes > 0, total_nblocks, free_bytes > 0, free_chunks
 from pg_backend_memory_contexts where name = 'Caller tuples';
      name      |     parent     | ?column? | total_nblocks | ?column? | free_chunks 
 ---------------+----------------+----------+---------------+----------+-------------
- Caller tuples | TupleSort sort | t        |             3 | t        |           0
+ Caller tuples | TupleSort sort | t        |             2 | t        |           0
 (1 row)
 
 rollback;
-- 
2.40.1



Re: Table AM Interface Enhancements

2024-04-16 Thread Alexander Korotkov
On Mon, Apr 15, 2024 at 11:17 PM Andres Freund  wrote:
> On 2024-04-15 16:02:00 -0400, Robert Haas wrote:
> > Do you want that patch applied, not applied, or applied with some set of
> > modifications?
>
> I think we should apply Alexander's proposed revert and then separately
> discuss what we should do about 041b96802ef.

Taking a closer look at acquire_sample_rows(), I think it would be
good if the table AM implementation took care of block-level (or
whatever-level) sampling, so that acquire_sample_rows() just fetches
tuples one-by-one from the table AM implementation without any care
about blocks.  Possibly table_beginscan_analyze() could take the
target number of tuples as an argument, and those tuples would then
just be fetched with table_scan_analyze_next_tuple().  What do you
think?
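For reference, block-free tuple-level sampling of the kind sketched above
is usually done with reservoir sampling (Algorithm R), which visits each
item once and keeps a uniform sample of unknown-length input.  The
standalone sketch below is not PostgreSQL code and not a concrete API
proposal; the function name and integer "tuples" are invented:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Reservoir sampling (Algorithm R): keep a uniform sample of
 * 'targrows' items from a stream of unknown length, visiting each
 * item exactly once -- no block numbers involved.
 */
static int sample_stream(const int *tuples, int ntuples,
                         int *reservoir, int targrows, unsigned int seed)
{
    int nkept = 0;

    srand(seed);
    for (int i = 0; i < ntuples; i++)
    {
        if (nkept < targrows)
            reservoir[nkept++] = tuples[i];   /* fill the reservoir first */
        else
        {
            int j = rand() % (i + 1);         /* uniform in [0, i] */

            if (j < targrows)
                reservoir[j] = tuples[i];     /* replace a sampled item */
        }
    }
    return nkept;   /* number of sampled rows actually collected */
}
```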

--
Regards,
Alexander Korotkov




Re: [PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS

2024-04-16 Thread Richard Guo
On Tue, Apr 16, 2024 at 5:48 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Oversight of 0294df2f1f84 [1].
>
> [1]:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0294df2f1f84


+1.  I think this change improves the code quality.  I searched for
other arrays indexed by merge match kind, but found none.  So this patch
seems thorough.
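For readers unfamiliar with the idiom, the commit replaces a hard-coded 3
with a constant derived from the enum's last member, so array sizes track
the enum automatically.  A minimal standalone illustration (the enum values
mirror the commit, but the helper function is invented for testing):

```c
#include <assert.h>

/* Mirrors the idiom from 0294df2f1f84: derive the array size from the
 * enum instead of hard-coding a magic 3. */
typedef enum MergeMatchKind
{
    MERGE_WHEN_MATCHED,
    MERGE_WHEN_NOT_MATCHED_BY_SOURCE,
    MERGE_WHEN_NOT_MATCHED_BY_TARGET
} MergeMatchKind;

#define NUM_MERGE_MATCH_KINDS (MERGE_WHEN_NOT_MATCHED_BY_TARGET + 1)

/* An array indexed by match kind now grows with the enum. */
static int match_counts[NUM_MERGE_MATCH_KINDS];

static int num_kinds(void)
{
    return NUM_MERGE_MATCH_KINDS;
}
```

Adding a fourth match kind to the enum would resize `match_counts`
without touching any other line.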

Thanks
Richard


Re: Table AM Interface Enhancements

2024-04-16 Thread Alexander Korotkov
On Mon, Apr 15, 2024 at 11:11 PM Andres Freund  wrote:
> On 2024-04-12 01:04:03 +0300, Alexander Korotkov wrote:
> > 1) If we just apply my revert patch and leave c6fc50cb4028 and
> > 041b96802ef in the tree, then we get our table AM API narrowed.  As
> > you expressed the current API requires block numbers to be 1:1 with
> > the actual physical on-disk location [2].  Not a secret I think the
> > current API is quite restrictive.  And we're getting the ANALYZE
> > interface narrower than it was since 737a292b5de.  Frankly speaking, I
> > don't think this is acceptable.
>
> As others already pointed out, c6fc50cb4028 was committed quite a while
> ago. I'm fairly unhappy about c6fc50cb4028, fwiw, but didn't realize that
> until it was too late.

+1

> > In token of all of the above, is the in-tree state that bad? (if we
> > abstract the way 27bc1772fc and dd1f6b0c17 were committed).
>
> To me the 27bc1772fc doesn't make much sense on its own. You added calls
> directly to heapam internals to a file in src/backend/commands/, that just
> doesn't make sense.
>
> Leaving that aside, I think the interface isn't good on its own:
> table_relation_analyze() doesn't actually do anything, it just sets callbacks,
> that then later are called from analyze.c, which doesn't at all fit to the
> name of the callback/function.  I realize that this is kinda cribbed from the
> FDW code, but I don't think that is a particularly good excuse.
>
> I don't think dd1f6b0c17 improves the situation, at all. It sets global
> variables to redirect how an individual acquire_sample_rows invocation
> works:
> void
> block_level_table_analyze(Relation relation,
>   AcquireSampleRowsFunc *func,
>   BlockNumber *totalpages,
>   BufferAccessStrategy 
> bstrategy,
>   ScanAnalyzeNextBlockFunc 
> scan_analyze_next_block_cb,
>   ScanAnalyzeNextTupleFunc 
> scan_analyze_next_tuple_cb)
> {
> *func = acquire_sample_rows;
> *totalpages = RelationGetNumberOfBlocks(relation);
> vac_strategy = bstrategy;
> scan_analyze_next_block = scan_analyze_next_block_cb;
> scan_analyze_next_tuple = scan_analyze_next_tuple_cb;
> }
>
> Notably it does so within the ->relation_analyze tableam callback, which does
> *NOT* actually do anything other than returning a callback.  So if
> ->relation_analyze() for another relation is called, the acquire_sample_rows()
> for the earlier relation will do something different.  Note that this isn't a
> theoretical risk, acquire_inherited_sample_rows() actually collects the
> acquirefunc for all the inherited relations before calling acquirefunc.

You're right.  No sense trying to fix this.  Reverted.
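The clobbering Andres describes can be reproduced in miniature: callbacks
stashed in a global by one ->relation_analyze() call are overwritten by the
next one before the collected acquirefunc ever runs.  Everything below is an
illustrative standalone model with invented names, not PostgreSQL code:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of the dd1f6b0c17 design: ->relation_analyze()
 * stores per-relation callbacks in globals that a shared
 * acquire_sample_rows() reads later. */
typedef int (*NextTupleFn)(void);

static NextTupleFn scan_analyze_next_tuple;     /* the problematic global */

static int heap_next_tuple(void)     { return 1; }
static int other_am_next_tuple(void) { return 2; }

typedef struct { void (*analyze)(void); } Relation;

static void heap_relation_analyze(void)  { scan_analyze_next_tuple = heap_next_tuple; }
static void other_relation_analyze(void) { scan_analyze_next_tuple = other_am_next_tuple; }

/* acquire_inherited_sample_rows() first collects acquirefuncs for all
 * child relations, then calls them -- by then the global belongs to
 * whichever relation was analyzed last. */
static int sample_first_child(void)
{
    Relation heap_child  = { heap_relation_analyze };
    Relation other_child = { other_relation_analyze };

    heap_child.analyze();       /* sets the global for the heap child */
    other_child.analyze();      /* ...and immediately clobbers it */

    return scan_analyze_next_tuple();   /* meant for heap, runs other AM */
}
```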

--
Regards,
Alexander Korotkov




Re: Add bump memory context type and use it for tuplesorts

2024-04-16 Thread David Rowley
On Tue, 16 Apr 2024 at 17:13, Amul Sul  wrote:
> Attached is a small patch adding the missing BumpContext description to the
> README.

Thanks for noticing and working on the patch.

There were a few things that were not quite accurate or are misleading:

1.

> +These three memory contexts aim to free memory back to the operating system

That's not true for bump. It's the worst of the 4.  Worse than aset.
It only returns memory when the context is reset/deleted.

2.

"These memory contexts were initially developed for ReorderBuffer, but
may be useful elsewhere as long as the allocation patterns match."

The above isn't true for bump.  It was written for tuplesort.  I think
we can just remove that part now.  Slab and generation are both old
enough not to care why they were conceived.

Also since adding bump, I think the choice of which memory context to
use is about 33% harder than it used to be when there were only 3
context types.  I think this warrants giving more detail on what these
3 special-purpose memory allocators are good for.  I've added more
details in the attached patch.  This includes more details about
freeing malloc'd blocks.

I've tried to spell out enough of the specifics of each context
type without going into extensive detail.  My hope is that there will
be enough detail for someone to choose the most suitable-looking one
and head over to the corresponding .c file to find out more.

Is that about the right level of detail?

David
diff --git a/src/backend/utils/mmgr/README b/src/backend/utils/mmgr/README
index f484f7d6f5..695088bb66 100644
--- a/src/backend/utils/mmgr/README
+++ b/src/backend/utils/mmgr/README
@@ -471,25 +471,32 @@ thrashing.
 Alternative Memory Context Implementations
 --
 
-aset.c is our default general-purpose implementation, working fine
-in most situations. We also have two implementations optimized for
-special use cases, providing either better performance or lower memory
-usage compared to aset.c (or both).
-
-* slab.c (SlabContext) is designed for allocations of fixed-length
-  chunks, and does not allow allocations of chunks with different size.
-
-* generation.c (GenerationContext) is designed for cases when chunks
-  are allocated in groups with similar lifespan (generations), or
-  roughly in FIFO order.
-
-Both memory contexts aim to free memory back to the operating system
-(unlike aset.c, which keeps the freed chunks in a freelist, and only
-returns the memory when reset/deleted).
-
-These memory contexts were initially developed for ReorderBuffer, but
-may be useful elsewhere as long as the allocation patterns match.
-
+aset.c (AllocSetContext) is our default general-purpose allocator.  Three other
+allocator types also exist which are special-purpose:
+
+* slab.c (SlabContext) is designed for allocations of fixed-sized
+  chunks.  The fixed chunk size must be specified when creating the context.
+  New chunks are allocated to the fullest block, keeping used chunks densely
+  packed together to avoid memory fragmentation.  This also increases the
+  chances that pfree'ing a chunk will result in a block becoming empty of all
+  chunks and allow it to be free'd back to the operating system.
+
+* generation.c (GenerationContext) is best suited for cases when chunks are
+  allocated in groups with similar lifespan (generations), or roughly in FIFO
+  order.  No attempt is made to reuse space left by pfree'd chunks.  Blocks
+  are returned to the operating system when all chunks on them have been
+  pfree'd.
+
+* bump.c (BumpContext) is best suited for use cases that require densely
+  allocated chunks of memory that never need to be individually pfree'd or
+  repalloc'd.  These operations are unsupported due to BumpContext chunks
+  having no chunk header.  No chunk header means more densely packed chunks,
+  which is especially useful for workloads that perform lots of small
+  allocations.  Blocks are only free'd back to the operating system when the
+  context is reset or deleted.
+
+For further details, please read the header comment in the corresponding .c
+file.
 
 Memory Accounting
 -
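To make the README's "no chunk header" point concrete: in a bump allocator
an allocation is nothing more than a pointer advance, so there is no
per-chunk size stored anywhere to consult when freeing or resizing one
chunk, and only a whole-context reset can reclaim space.  A toy sketch,
not the bump.c implementation (sizes and alignment are invented):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bump allocator: a chunk is just the next 'size' bytes, with no
 * header in front of it.  Without a header there is nowhere to look up
 * a chunk's size, so freeing or resizing an individual chunk is
 * impossible; only resetting the whole block is. */
typedef struct
{
    uint8_t buf[256];
    size_t  used;
} BumpBlock;

static void *bump_alloc(BumpBlock *blk, size_t size)
{
    void *p;

    size = (size + 7) & ~(size_t) 7;        /* 8-byte alignment */
    if (blk->used + size > sizeof(blk->buf))
        return NULL;                        /* block exhausted */
    p = blk->buf + blk->used;
    blk->used += size;                      /* the entire bookkeeping */
    return p;
}

static void bump_reset(BumpBlock *blk)
{
    blk->used = 0;                          /* the only way to reclaim space */
}
```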


[PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS

2024-04-16 Thread Aleksander Alekseev
Oversight of 0294df2f1f84 [1].

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0294df2f1f84

-- 
Best regards,
Aleksander Alekseev


v1-0001-Replace-constant-3-with-NUM_MERGE_MATCH_KINDS.patch
Description: Binary data


Re: promotion related handling in pg_sync_replication_slots()

2024-04-16 Thread Bertrand Drouvot
Hi,

On Tue, Apr 16, 2024 at 02:06:45PM +0530, shveta malik wrote:
> On Tue, Apr 16, 2024 at 1:55 PM Zhijie Hou (Fujitsu)
>  wrote:
> > I personally feel adding the additional check for sync_replication_slots may
> > not improve the situation here. Because the GUC sync_replication_slots can
> > change at any point, the GUC could be false when performing this additional
> > check and set to true immediately after the check, so it would not simplify
> > the logic anyway.
> 
> +1.
> I feel doc and "cannot synchronize replication slots concurrently"
> check should suffice.
> 
> In the scenario which Hou-San pointed out, if after performing the
> GUC check in the SQL function this GUC is enabled immediately, and say
> the worker is started sooner than the function can get a chance to
> sync, then in that case as well the SQL function will ultimately get
> the error "cannot synchronize replication slots concurrently", even
> though the GUC is enabled.  Thus, I feel we should stick with the same
> error in all scenarios.

Okay, fine by me, let's forget about checking sync_replication_slots then.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add recovery to pg_control and remove backup_label

2024-04-16 Thread Stefan Fercot
Hi,

On Sunday, March 10th, 2024 at 4:47 AM, David Steele wrote:
> I've had a new idea which may revive this patch. The basic idea is to
> keep backup_label but also return a copy of pg_control from
> pg_stop_backup(). This copy of pg_control would be safe from tears and
> have a backupLabelRequired field set (as Andres suggested) so recovery
> cannot proceed without the backup label.
> 
> So, everything will continue to work as it does now. But, backup
> software can be enhanced to write the improved pg_control that is
> guaranteed not to be torn and has protection against a missing backup label.
> 
> Of course, pg_basebackup will write the new backupLabelRequired field
> into pg_control, but this way third party software can also gain
> advantages from the new field.

Bump on this idea.

Given the discussion in [1], even if it obviously makes sense to improve the
in-core backup capabilities, the more we go in that direction, the more we'll
rely on outside orchestration.
So IMHO it is also worth giving more leverage to such orchestration tools.
In that sense, I really like the idea of extending the backup functions.

More thoughts?

Thanks all,
Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

[1] 
https://www.postgresql.org/message-id/lwXoqQdOT9Nw1tJIx_h7WuqMKrB1YMePQY99RFTZ87H7V52mgUJaSlw2WRbcOgKNUurF1yJqX3nqtZi4hJhtd3e_XlmLsLvnEtGXY-fZPoA%3D%40protonmail.com




Re: post-freeze damage control

2024-04-16 Thread Stefan Fercot
Hi,

On Saturday, April 13th, 2024 at 12:18 PM, Tomas Vondra wrote:
> On 4/13/24 01:03, David Steele wrote:
> > On 4/12/24 22:12, Tomas Vondra wrote:
> > > On 4/11/24 23:48, David Steele wrote:
> > > > On 4/11/24 20:26, Tomas Vondra wrote:
> > > > 
> > > > > FWIW that discussion also mentions stuff that I think the feature
> > > > > should
> > > > > not do. In particular, I don't think the ambition was (or should be) 
> > > > > to
> > > > > make pg_basebackup into a stand-alone tool. I always saw pg_basebackup
> > > > > more as an interface to "backup steps" correctly rather than a 
> > > > > complete
> > > > > backup solution that'd manage backup registry, retention, etc.
> > > > 
> > > > Right -- this is exactly my issue. pg_basebackup was never easy to use
> > > > as a backup solution and this feature makes it significantly more
> > > > complicated. Complicated enough that it would be extremely difficult for
> > > > most users to utilize in a meaningful way.

pg_basebackup has its own use-cases IMHO.  It is very handy for simply taking
a copy of the running cluster, thanks to its ability to carry along the needed
WAL segments without the user even needing to think about
archive_command/pg_receivewal.  And for this kind of task, the new incremental
feature will not really be interesting and won't make things more complicated.

I totally agree that we don't have any complete backup solution in core
though.  And adding more tools to the picture (pg_basebackup, pg_combinebackup,
pg_receivewal, pg_verifybackup,...) will increase the need for on-top
orchestration.  But that's not new.  And for people who already have such
orchestration, the incremental feature will help.

> > > Perhaps, I agree we could/should try to do better job to do backups, no
> > > argument there. But I still don't quite see why introducing such
> > > infrastructure to "manage backups" should be up to the patch adding
> > > incremental backups. I see it as something to build on top of
> > > pg_basebackup/pg_combinebackup, not into those tools.
> > 
> > I'm not saying that managing backups needs to be part of pg_basebackup,
> > but I am saying without that it is not a complete backup solution.
> > Therefore introducing advanced features that the user then has to figure
> > out how to manage puts a large burden on them. Implementing
> > pg_combinebackup inefficiently out of the gate just makes their life
> > harder.
> 
> I agree with this in general, but I fail to see how it'd be the fault of
> this patch. It merely extends what pg_basebackup did before, so if it's
> not a complete solution now, it wasn't a complete solution before.

+1. We can see it as a step towards having a better backup solution in core
in the future, but we definitely shouldn't rule out the fact that lots of
people have already developed such orchestration (home-made or relying on
pgbarman, pgbackrest, wal-g,...).  IMHO, if we're trying to extend the in-core
features, we should also aim at giving more visibility and features to those
tools (like adding more fields to the backup functions,...).

> > > Sure, I'm not against making it clearer pg_combinebackup is not a
> > > complete backup solution, and documenting the existing restrictions.
> > 
> > Let's do that then. I think it would make sense to add caveats to the
> > pg_combinebackup docs including space requirements, being explicit about
> > the format required (e.g. plain), and also possible mitigation with COW
> > filesystems.
> 
> OK. I'll add this as an open item, for me and Robert.

Thanks for this! It's probably not up to the core docs to state all the steps
that would be needed to develop a complete backup solution, but documenting the
links between the tools and, above all, the caveats (the "don't use
INCREMENTAL.* filenames",...) will help users not be caught off guard.  And as
I mentioned in [1], IMO the earlier we can catch a potential issue (wrong
filename, missing file,...), the better for the user.

Thank you all for working on this.
Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

[1] 
https://www.postgresql.org/message-id/vJnnuiaye5rNnCPN8h3xN1Y3lyUDESIgEQnR-Urat9_ld_fozShSJbEk8JDM_3K6BVt5HXT-CatWpSfEZkYVeymlrxKO2_kfKmVZNWyCuJc%3D%40protonmail.com




Re: promotion related handling in pg_sync_replication_slots()

2024-04-16 Thread shveta malik
On Tue, Apr 16, 2024 at 1:55 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Tuesday, April 16, 2024 2:52 PM Bertrand Drouvot 
>  wrote:
>
>
> Hi,
>
> > On Tue, Apr 16, 2024 at 10:00:04AM +0530, shveta malik wrote:
> > > Please find v5 addressing above comments.
> >
> > Thanks!
> >
> > @@ -1634,9 +1677,14 @@ SyncReplicationSlots(WalReceiverConn *wrconn)  {
> > PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback,
> > PointerGetDatum(wrconn));
> > {
> > +   check_flags_and_set_sync_info(InvalidPid);
> > +
> >
> > Given the fact that if the sync worker is running it won't be possible to 
> > trigger a
> > manual sync with pg_sync_replication_slots(), what about also checking the
> > "sync_replication_slots" value at the start of SyncReplicationSlots() and 
> > emmit
> > an error if sync_replication_slots is set to on? (The message could 
> > explicitly
> > states that it's not possible to use the function if sync_replication_slots 
> > is set to
> > on).
>
> I personally feel adding the additional check for sync_replication_slots may
> not improve the situation here. Because the GUC sync_replication_slots can
> change at any point, the GUC could be false when performing this additional
> check and set to true immediately after the check, so it would not simplify
> the logic anyway.

+1.
I feel doc and "cannot synchronize replication slots concurrently"
check should suffice.

In the scenario which Hou-San pointed out, if after performing the
GUC check in the SQL function this GUC is enabled immediately, and say
the worker is started sooner than the function can get a chance to
sync, then in that case as well the SQL function will ultimately get
the error "cannot synchronize replication slots concurrently", even
though the GUC is enabled.  Thus, I feel we should stick with the same
error in all scenarios.


thanks
Shveta




Re: Typos in the code and README

2024-04-16 Thread Richard Guo
On Mon, Apr 15, 2024 at 8:26 PM Daniel Gustafsson  wrote:

> Thanks.  Collecting all the ones submitted here, as well as a few submitted
> off-list by Alexander, the patch is now a 3-part patchset of cleanups:
>
> 0001 contains the typos and duplicate words fixups, 0002 fixes a parameter
> with the wrong name in the prototype and 0003 removes a leftover prototype
> which was accidentally left in a refactoring.


BTW, it seems that 0001 needs a rebase over 9dfcac8e15.

Thanks
Richard


Re: Add bump memory context type and use it for tuplesorts

2024-04-16 Thread Daniel Gustafsson
> On 16 Apr 2024, at 07:12, Amul Sul  wrote:
> 
> Attached is a small patch adding the missing BumpContext description to the
> README.

Nice catch, we should add it to the README.

+  pfree'd or realloc'd.
I think it's best to avoid mixing APIs; "pfree'd or repalloc'd" keeps it to
functions in our API instead.

--
Daniel Gustafsson





RE: promotion related handling in pg_sync_replication_slots()

2024-04-16 Thread Zhijie Hou (Fujitsu)
On Tuesday, April 16, 2024 2:52 PM Bertrand Drouvot 
 wrote:


Hi,

> On Tue, Apr 16, 2024 at 10:00:04AM +0530, shveta malik wrote:
> > Please find v5 addressing above comments.
> 
> Thanks!
> 
> @@ -1634,9 +1677,14 @@ SyncReplicationSlots(WalReceiverConn *wrconn)  {
> PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback,
> PointerGetDatum(wrconn));
> {
> +   check_flags_and_set_sync_info(InvalidPid);
> +
> 
> Given the fact that if the sync worker is running it won't be possible to 
> trigger a
> manual sync with pg_sync_replication_slots(), what about also checking the
> "sync_replication_slots" value at the start of SyncReplicationSlots() and 
> emmit
> an error if sync_replication_slots is set to on? (The message could explicitly
> states that it's not possible to use the function if sync_replication_slots 
> is set to
> on).

I personally feel adding the additional check for sync_replication_slots may
not improve the situation here.  Because the GUC sync_replication_slots can
change at any point, the GUC could be false when performing this additional
check and set to true immediately after the check, so it would not simplify
the logic anyway.

Best Regards,
Hou zj




Re: promotion related handling in pg_sync_replication_slots()

2024-04-16 Thread shveta malik
On Tue, Apr 16, 2024 at 12:03 PM Bertrand Drouvot
 wrote:
>
> > I think for now let's restrict their usage in parallel and make the
> > promotion behavior consistent both for worker and API.
>
> Okay, let's do it that way. Is it worth to add a few words in the doc related 
> to
> pg_sync_replication_slots() though? (to mention it can not be used if the sync
> slot worker is running).

+1. Please find v6 with the suggested doc changes.


thanks
Shveta


v6-0001-Handle-stopSignaled-during-sync-function-call.patch
Description: Binary data


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-16 Thread Daniel Gustafsson


> On 16 Apr 2024, at 01:03, Michael Paquier  wrote:
> 
> On Mon, Apr 15, 2024 at 11:07:05AM +0200, Daniel Gustafsson wrote:
>> On 15 Apr 2024, at 07:04, Michael Paquier  wrote:
>>> On Fri, Apr 12, 2024 at 02:42:57PM +0200, Daniel Gustafsson wrote:
 Is the attached split in line with how you were thinking about it?
>>> 
>>> If I may, 0001 looks sensible here.  The bits from 0003 and 0004 could
>>> be applied before 0002, as well.
>> 
>> Agreed, once we are in post-freeze I think those three are mostly
>> ready to go.
> 
> Is there a point to wait for 0001, 0003 and 0004, though, and
> shouldn't these three be backpatched?  0001 is certainly OK as a
> doc-only change to be consistent across the board without waiting 5
> more years.  0003 and 0004 are conditional and depend on if
> SSL_R_VERSION_TOO_LOW and SSL_OP_NO_CLIENT_RENEGOTIATION are defined
> at compile-time.  0003 is much more important, and note that
> 01e6f1a842f4 has been backpatched all the way down.  0004 is nice,
> still not strongly mandatory.

I forgot (and didn't check) that we backpatched 01e6f1a842f4, with that in mind
I agree that we should backpatch 0003 as well to put LibreSSL on par as much as
we can.  0004 is a fix for the LibreSSL support, not adding anything new, so
pushing that to master now makes sense.  Unless objections are raised I'll push
0001, 0003 and 0004 shortly.  0002 and 0005 can hopefully be addressed in the
July commitfest.

>>> Rather than calling always RAND_poll(), this
>>> could use a static flag to call it only once when pg_strong_random is
>>> called for the first time.
>> 
>> Agreed, we can do that. Fixed.
> 
> +#if (OPENSSL_VERSION_NUMBER <= 0x1010L)
> + static bool rand_initialized = false;
> 
> This does not look right.  At the top of upstream's branch
> OpenSSL_1_1_0-stable, OPENSSL_VERSION_NUMBER is 0x101000d0L, so the
> initialization would be missed for any version in the 1.1.0 series
> except the GA one without this code being compiled, no?

Meh, I was clearly not caffeinated as that's a thinko with a typo attached to
it.  The check should be "< 0x10101000" to catch any version prior to 1.1.1.
Fixed.
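Since OPENSSL_VERSION_NUMBER packs the version as 0xMNNFFPPS (major, minor,
fix, patch, status), the corrected threshold is easy to sanity-check
numerically.  The helper name below is invented; only the constants come from
the discussion:

```c
#include <assert.h>

/* OPENSSL_VERSION_NUMBER layout is 0xMNNFFPPS: major, minor, fix,
 * patch letter, status nibble.  The RAND initialization workaround is
 * needed for anything older than 1.1.1, hence "< 0x10101000L" rather
 * than the earlier "<=" comparison against a 1.1.0-series number. */
static int needs_rand_init(long version)
{
    return version < 0x10101000L;
}
```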

--
Daniel Gustafsson



v9-0005-Remove-pg_strong_random-initialization.patch
Description: Binary data


v9-0004-Support-SSL_R_VERSION_TOO_LOW-on-LibreSSL.patch
Description: Binary data


v9-0003-Support-disallowing-SSL-renegotiation-in-LibreSSL.patch
Description: Binary data


v9-0002-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data


v9-0001-Doc-Use-past-tense-for-things-which-happened-in-t.patch
Description: Binary data


RE: Disallow changing slot's failover option in transaction block

2024-04-16 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

> Kuroda-San reported an issue off-list that:
> 
> If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block
> and rollback, only the subscription option change can be rolled back, while 
> the
> replication slot's failover change is preserved.
> 
> This is because ALTER SUBSCRIPTION SET (failover) command internally
> executes
> the replication command ALTER_REPLICATION_SLOT to change the replication
> slot's
> failover property, but this replication command execution cannot be
> rolled back.
> 
> To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set
> (failover) inside a txn block, which is also consistent to the ALTER
> SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small
> patch to address this.

Thanks for posting the patch, the fix is same as my expectation.
Also, should we add the restriction to the doc? I feel [1] can be updated.


[1]:https://www.postgresql.org/docs/devel/sql-altersubscription.html#SQL-ALTERSUBSCRIPTION-PARAMS-SET

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 





Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-16 Thread Ajin Cherian
On Tue, Apr 16, 2024 at 1:31 AM Давыдов Виталий 
wrote:

> Dear All,
> Just interested, has anyone tried to reproduce the problem with slow
> catchup of twophase transactions (pgbench should be used with a big number of
> clients)? I haven't seen any messages from anyone other than me that the
> problem takes place.
>
>
>

 Yes, I was able to reproduce the slow catchup of twophase transactions
with pgbench with 20 clients.

regards,
Ajin Cherian
Fujitsu Australia


Re: Disallow changing slot's failover option in transaction block

2024-04-16 Thread Bertrand Drouvot
Hi,

On Tue, Apr 16, 2024 at 06:32:11AM +, Zhijie Hou (Fujitsu) wrote:
> Hi,
> 
> Kuroda-San reported an issue off-list that:
> 
> If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block
> and rollback, only the subscription option change can be rolled back, while 
> the
> replication slot's failover change is preserved.

Nice catch, thanks!

> To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set
> (failover) inside a txn block, which is also consistent to the ALTER
> SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small
> patch to address this.

Agree.  The patch looks pretty straightforward to me.  Worth adding this
case to the doc? (where we already mention that "Commands ALTER SUBSCRIPTION ...
REFRESH PUBLICATION and ALTER SUBSCRIPTION ... {SET|ADD|DROP} PUBLICATION ...
with refresh option as true cannot be executed inside a transaction block")
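For context, the usual way such commands are fenced off is the
PreventInTransactionBlock() pattern: a command whose side effects (here,
ALTER_REPLICATION_SLOT on the publisher) cannot be rolled back refuses to
run inside a transaction block.  The sketch below is a standalone model of
that pattern with invented names, not the actual patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Standalone model of the PreventInTransactionBlock() pattern used by
 * ALTER SUBSCRIPTION ... REFRESH and DROP SUBSCRIPTION. */
static char last_error[128];

static int prevent_in_transaction_block(int in_txn_block, const char *stmt)
{
    if (in_txn_block)
    {
        snprintf(last_error, sizeof(last_error),
                 "%s cannot run inside a transaction block", stmt);
        return 0;               /* would ereport(ERROR) in PostgreSQL */
    }
    return 1;
}

static int alter_sub_set_failover(int in_txn_block)
{
    /* Reject the non-rollbackable option change up front. */
    return prevent_in_transaction_block(in_txn_block,
                                        "ALTER SUBSCRIPTION ... SET (failover)");
}
```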

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block

2024-04-16 Thread Quan Zongliang



On 2024/3/11 09:25, Quan Zongliang wrote:



On 2024/3/4 15:48, jian he wrote:


Maybe we can tolerate LOG, first output the query plan then statement.


It is more appropriate to let the extension solve its own problems.
Of course, this change is not easy to implement.
Due to the way XID is assigned, there seems to be no good solution at 
the moment.



According to the discussion with Jian He: use a GUC check hook to detect
whether the xid needs to be output.  If needed, the statement log can be
delayed until the xid is available.

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 76f48b13d2..217910c9de 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1068,11 +1068,14 @@ exec_simple_query(const char *query_string)
/* Log immediately if dictated by log_statement */
if (check_log_statement(parsetree_list))
{
-   ereport(LOG,
-   (errmsg("statement: %s", query_string),
-errhidestmt(true),
-errdetail_execute(parsetree_list)));
-   was_logged = true;
+   if (!Log_has_xid || TransactionIdIsValid(GetTopTransactionIdIfAny()))
+   {
+   ereport(LOG,
+   (errmsg("statement: %s", query_string),
+   errhidestmt(true),
+   errdetail_execute(parsetree_list)));
+   was_logged = true;
+   }
}
 
/*
@@ -1283,6 +1286,16 @@ exec_simple_query(const char *query_string)
 
PortalDrop(portal, false);
 
+   /* Log if dictated by log_statement and has not been logged. */
+   if (!was_logged && check_log_statement(parsetree_list))
+   {
+   ereport(LOG,
+   (errmsg("statement: %s", query_string),
+   errhidestmt(true),
+   errdetail_execute(parsetree_list)));
+   was_logged = true;
+   }
+
if (lnext(parsetree_list, parsetree_item) == NULL)
{
/*
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 605ff3b045..e7dd9f5027 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -79,6 +79,7 @@
 #include "storage/ipc.h"
 #include "storage/proc.h"
 #include "tcop/tcopprot.h"
+#include "utils/guc.h"
 #include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -114,6 +115,9 @@ char   *Log_destination_string = NULL;
 bool   syslog_sequence_numbers = true;
 bool   syslog_split_messages = true;
 
+/* Whether the transaction id will appear in the log messages. */
+bool   Log_has_xid = false;
+
 /* Processed form of backtrace_functions GUC */
 static char *backtrace_function_list;
 
@@ -286,6 +290,50 @@ message_level_is_interesting(int elevel)
 }
 
 
+/*
+ * check_log_line_prefix: GUC check_hook for Log_line_prefix
+ */
+bool check_log_line_prefix(char **newval, void **extra, GucSource source)
+{
+   const char *p;
+
+   if ((*newval) == NULL)
+   {
+   Log_has_xid = false;
+   return true;
+   }
+
+   Log_has_xid = false;
+
+   for (p = (*newval); *p != '\0'; p++)
+   {
+   if (*p != '%')
+   {
+   /* literal char, just skip */
+   continue;
+   }
+
+   /* must be a '%', so skip to the next char */
+   p++;
+   if (*p == '\0')
+   break;  /* format error - ignore it */
+   else if (*p == '%')
+   {
+   /* string contains %% */
+   continue;
+   }
+
+   /* process the option */
+   if (*p == 'x')
+   {
+   Log_has_xid = true;
+   break;
+   }
+   }
+
+   return true;
+}
+
 /*
  * in_error_recursion_trouble --- are we at risk of infinite error recursion?
  *
diff --git a/src/backend/utils/misc/guc_tables.c 
b/src/backend/utils/misc/guc_tables.c
index c68fdc008b..96948e5005 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -4103,7 +4103,7 @@ struct config_string ConfigureNamesString[] =
},
&Log_line_prefix,
"%m [%p] ",
-   NULL, NULL, NULL
+   check_log_line_prefix, NULL, NULL
},
 
{
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62..9536146ce3 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -502,6 +502,8 @@ extern PGDLLIMPORT char 

Re: pg_combinebackup fails on file named INCREMENTAL.*

2024-04-16 Thread Stefan Fercot
Hi,

> I think it would be reasonable to restrict what can be put in base/ and
> global/ but users generally feel free to create whatever they want in
> the root of PGDATA, despite being strongly encouraged not to.
> 
> Anyway, I think it should be fixed or documented as a caveat since it
> causes a hard failure on restore.

+1. IMHO, no matter how you'd further decide to reference the incremental 
stubs, the earlier we can mention the failure to the user, the better.
Tbh, I'm not very comfortable seeing pg_combinebackup fail when the user needs 
to restore (whatever the reason). I mean, failing to take a backup is less of a 
problem than failing to restore, because then the user still has time to 
investigate and fix the issue, rather than scrambling to restore a down 
production system...

> > But ... I didn't really end up feeling very comfortable with it. Right
> > now, the backup manifest is something we only use to verify the
> > integrity of the backup. If we were to do this, it would become a
> > critical part of the backup.

Isn't that already the case? I mean, you need the manifest of the previous backup 
to take an incremental one, right?
And shouldn't we encourage verifying the backup sets before (at least) trying 
to combine them?
The fact that a file has been used for only one specific purpose until now 
doesn't mean we can't improve it later.
Splitting the meaningful information across multiple places would be more 
error-prone (for both devs and users) imo.

> > I don't think I like the idea that
> > removing the backup_manifest should be allowed to, in effect, corrupt
> > the backup. But I think I dislike even more the idea that the data
> > that is used to verify the backup gets mushed together with the backup
> > data itself. Maybe in practice it's fine, but it doesn't seem very
> > conceptually clean.

Removing pretty much any file in the backup set would probably corrupt it. I 
mean, why would people remove the backup manifest when they already can't 
remove backup_label?
A doc stating "don't touch anything" is IMHO easier than "this part is critical, 
not that one".

> I don't think this is a problem. The manifest can do more than store
> verification info, IMO.

+1. Adding more info to the backup manifest can be handy for other use-cases 
too (e.g. avoiding storing empty files, or storing the checksum state of 
the cluster).

Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)




Re: Things I don't like about \du's "Attributes" column

2024-04-16 Thread Pavel Luzanov

Hi,

On 14.04.2024 05:02, Wen Yi wrote:

I think we can change the output like this:

postgres=# \du
 List of roles
 Role name | Login | Attributes  | Password | Valid until | Connection limit
-----------+-------+-------------+----------+-------------+------------------
 test      |       | Inherit     |          |             |
 test2     | Can   | Inherit     | Has      |             |
 wenyi     | Can   | Superuser  +|          |             |
           |       | Create DB  +|          |             |
           |       | Create role+|          |             |
           |       | Inherit    +|          |             |
           |       | Replication+|          |             |
           |       | Bypass RLS  |          |             |
(3 rows)

And I have submitted my patch, could you have a look?


Thanks for the patch.

As I understand, your patch is based on my previous version.
The main thing I'm wondering about my patch is whether we need to change pg_roles,
and it looks like we don't. So, in the next version of my patch,
the Password column will no longer be there.

As for the Login column and its values.
I'm not sure about using "Can" instead of "yes" to represent true.
In other psql commands, boolean values are always shown as yes/no.
NULL instead of false might be possible, but I'd rather check if this approach
has been used elsewhere. I prefer consistency everywhere.

--
Pavel Luzanov
Postgres Professional:https://postgrespro.com


Re: Why is parula failing?

2024-04-16 Thread Robins Tharakan
On Mon, 15 Apr 2024 at 16:02, Tom Lane  wrote:
> David Rowley  writes:
> > If GetNowFloat() somehow was returning a negative number then we could
> > end up with a large delay.  But if gettimeofday() was so badly broken
> > then wouldn't there be some evidence of this in the log timestamps on
> > failing runs?
>
> And indeed that too.  I'm finding the "compiler bug" theory
> palatable.  Robins mentioned having built the compiler from
> source, which theoretically should work, but maybe something
> went wrong?  Or it's missing some important bug fix?
>
> It might be interesting to back the animal's CFLAGS down
> to -O0 and see if things get more stable.

The last 25 consecutive runs have passed [1] after switching
REL_12_STABLE to -O0! So I am wondering whether that confirms that
the compiler version is to blame, and while we're still here,
is there anything else I could try?

If not, by Sunday, I am considering switching parula to gcc v12 (or even
v14 experimental - given that massasauga [2] has been pretty stable since
its upgrade a few days back).

Reference:
1.
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=parula&br=REL_12_STABLE
2.
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=massasauga&br=REL_12_STABLE
-
robins


Re: promotion related handling in pg_sync_replication_slots()

2024-04-16 Thread Bertrand Drouvot
Hi,

On Tue, Apr 16, 2024 at 10:00:04AM +0530, shveta malik wrote:
> Please find v5 addressing above comments.

Thanks!

@@ -1634,9 +1677,14 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 {
PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, 
PointerGetDatum(wrconn));
{
+   check_flags_and_set_sync_info(InvalidPid);
+

Given that a manual sync with pg_sync_replication_slots() won't be possible
while the sync worker is running, what about also checking the
"sync_replication_slots" value at the start of SyncReplicationSlots() and
emitting an error if sync_replication_slots is set to on? (The message could
explicitly state that it's not possible to use the function if
sync_replication_slots is set to on.)

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: promotion related handling in pg_sync_replication_slots()

2024-04-16 Thread Amit Kapila
On Tue, Apr 16, 2024 at 12:03 PM Bertrand Drouvot
 wrote:
>
> On Tue, Apr 16, 2024 at 08:21:04AM +0530, Amit Kapila wrote:
>
> > There is no clear use case for allowing them in parallel and I feel it
> > would add more confusion when it can work sometimes but not other
> > times. However, if we receive some report from the field where there
> > is a real demand for such a thing, it should be easy to achieve. For
> > example, I can imagine that we can have sync_state that has values
> > 'started', 'in_progress' , and 'finished'. This should allow us to
> > achieve what the current proposed patch is doing along with allowing
> > the API to work in parallel when the sync_state is not 'in_progress'.
> >
> > I think for now let's restrict their usage in parallel and make the
> > promotion behavior consistent both for worker and API.
>
> Okay, let's do it that way. Is it worth adding a few words in the doc related
> to pg_sync_replication_slots() though? (to mention it cannot be used if the
> sync slot worker is running).
>

Yes, this makes sense to me.

-- 
With Regards,
Amit Kapila.




RE: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-04-16 Thread Zhijie Hou (Fujitsu)
On Wednesday, March 13, 2024 11:49 AM vignesh C  wrote:
> 
> On Tue, 12 Mar 2024 at 09:34, Ajin Cherian  wrote:
> >
> >
> >
> > On Tue, Mar 12, 2024 at 2:59 PM vignesh C  wrote:
> >>
> >>
> >> Thanks, I have created the following Commitfest entry for this:
> >> https://commitfest.postgresql.org/47/4816/
> >>
> >> Regards,
> >> Vignesh
> >
> >
> > Thanks for the patch, I have verified that the fix works well by following 
> > the
> steps mentioned to reproduce the problem.
> > Reviewing the patch, it seems good and is well documented. Just one minor
> comment I had was probably to change the name of the variable
> table_states_valid to table_states_validity. The current name made sense when
> it was a bool, but now that it is a tri-state enum, it doesn't fit well.
> 
> Thanks for reviewing the patch, the attached v6 patch has the changes for the
> same.

FYI, I noticed that there is one open item on
https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items which is related to
the fix in this thread.

--
Intermittent failures in 040_standby_failover_slots_sync test
Possible solution in this thread: Race condition in FetchTableStates
--

AFAICS, the bug discussed here is not a new issue in
PG17, so I am thinking of moving the item to the "Older bugs affecting stable
branches" section if no objections.

Best Regards,
Hou zj



Re: promotion related handling in pg_sync_replication_slots()

2024-04-16 Thread Bertrand Drouvot
Hi,

On Tue, Apr 16, 2024 at 08:21:04AM +0530, Amit Kapila wrote:
> On Mon, Apr 15, 2024 at 7:47 PM Bertrand Drouvot
>  wrote:
> >
> > On Mon, Apr 15, 2024 at 06:29:49PM +0530, Amit Kapila wrote:
> > > On Mon, Apr 15, 2024 at 6:21 PM Bertrand Drouvot
> > >  wrote:
> > > >
> > > > On Mon, Apr 15, 2024 at 03:38:53PM +0530, shveta malik wrote:
> > > > > Now both worker and SQL
> > > > > function set 'syncing' when they start and reset it when they exit.
> > > >
> > > > It means that it's not possible anymore to trigger a manual sync if
> > > > sync_replication_slots is on. Indeed that would trigger:
> > > >
> > > > postgres=# select pg_sync_replication_slots();
> > > > ERROR:  cannot synchronize replication slots concurrently
> > > >
> > > > That looks like an issue to me, thoughts?
> > > >
> > >
> > > This is intentional as of now for the sake of keeping
> > > implementation/code simple. It is not difficult to allow them but I am
> > > not sure whether we want to add another set of conditions allowing
> > > them in parallel.
> >
> > I think that the ability to launch a manual sync before a switchover would 
> > be
> > missed. Except for this case I don't think that's an issue to prevent them 
> > to
> > run in parallel.
> >
> 
> I think if the slotsync worker is available, it can do that as well.

Right, but one has no control over when the sync is triggered. 

> There is no clear use case for allowing them in parallel and I feel it
> would add more confusion when it can work sometimes but not other
> times. However, if we receive some report from the field where there
> is a real demand for such a thing, it should be easy to achieve. For
> example, I can imagine that we can have sync_state that has values
> 'started', 'in_progress' , and 'finished'. This should allow us to
> achieve what the current proposed patch is doing along with allowing
> the API to work in parallel when the sync_state is not 'in_progress'.
> 
> I think for now let's restrict their usage in parallel and make the
> promotion behavior consistent both for worker and API.

Okay, let's do it that way. Is it worth adding a few words in the doc related to
pg_sync_replication_slots() though? (to mention it cannot be used if the sync
slot worker is running).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Disallow changing slot's failover option in transaction block

2024-04-16 Thread Zhijie Hou (Fujitsu)
Hi,

Kuroda-San reported an issue off-list that:

If a user executes the ALTER SUBSCRIPTION SET (failover) command inside a txn
block and then rolls it back, only the subscription option change is rolled
back, while the replication slot's failover change is preserved.

This is because the ALTER SUBSCRIPTION SET (failover) command internally
executes the replication command ALTER_REPLICATION_SLOT to change the
replication slot's failover property, and this replication command execution
cannot be rolled back.

To fix it, I think we can prevent users from executing ALTER SUBSCRIPTION SET
(failover) inside a txn block, which is also consistent with the ALTER
SUBSCRIPTION REFRESH and DROP SUBSCRIPTION commands. Attached is a small
patch to address this.

Best Regards,
Hou Zhijie



v1-0001-Disallow-alter-subscription-s-failover-option-ins.patch
Description:  v1-0001-Disallow-alter-subscription-s-failover-option-ins.patch


Re: Typo about the SetDatatabaseHasLoginEventTriggers?

2024-04-16 Thread Michael Paquier
On Tue, Apr 16, 2024 at 02:05:46PM +0800, Japin Li wrote:
> Upon reviewing the login event trigger, I noticed a potential typo about
> the SetDatabaseHasLoginEventTriggers function name.

Indeed, thanks!  Will fix and double-check the surroundings.
--
Michael


signature.asc
Description: PGP signature


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-16 Thread Amit Kapila
On Tue, Apr 16, 2024 at 7:48 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> > > FYI - We also considered the idea which walsender waits until all prepared
> > transactions
> > > are resolved before decoding and sending changes, but it did not work well
> > > - the restarted walsender sent only COMMIT PREPARED record for
> > transactions which
> > > have been prepared before disabling the subscription. This happened 
> > > because
> > > 1) if the two_phase option of slots is false, the confirmed_flush can be 
> > > ahead of
> > >PREPARE record, and
> > > 2) after the altering and restarting, start_decoding_at becomes same as
> > >confirmed_flush and records behind this won't be decoded.
> > >
> >
> > I don't understand the exact problem you are facing. IIUC, if the
> > commit is after start_decoding_at point and prepare was before it, we
> > expect to send the entire transaction followed by a commit record. The
> > restart_lsn should be before the start of such a transaction and we
> > should have recorded the changes in the reorder buffer.
>
> This behavior is right for the two_phase = false case. But if the parameter is
> altered between PREPARE and COMMIT PREPARED, there is a possibility that only
> COMMIT PREPARED is sent.
>

Can you please consider the idea I shared at [1] ("One naive
idea is that on the publisher ...") to solve this problem?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1K1fSkeK%3Dkc26G5cq87vQG4%3D1qs_b%2Bno4%2Bep654SeBy1w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Things I don't like about \du's "Attributes" column

2024-04-16 Thread Pavel Luzanov

On 16.04.2024 01:06, David G. Johnston wrote:


At this point I'm on board with retaining the \dr charter of simply being an 
easy way to access the detail exposed in pg_roles with some display formatting 
but without any attempt to convey how the system uses said information.  
Without changing pg_roles.  Our level of effort here, and degree of dependence 
on superuser, doesn't seem to be bothering people enough to push more radical 
changes here through and we have good improvements that are being held up in 
the hope of possible perfection.


I have similar thoughts. I decided to wait for the end of feature freeze
and propose a simpler version of the patch for v18, without changes in
pg_roles. I hope to send a new version soon. But about \dr: is it a typo,
and you mean \du & \dg? If we were choosing a name for the command now,
then \dr would be ideal:
\dr - display roles
\drg - display role grants
But the long history of \du & \dg prevents us from doing so, and creating
a third option would be excessive.


--
Pavel Luzanov
Postgres Professional:https://postgrespro.com


Typo about the SetDatatabaseHasLoginEventTriggers?

2024-04-16 Thread Japin Li


Hi,

Upon reviewing the login event trigger, I noticed a potential typo about
the SetDatabaseHasLoginEventTriggers function name.

diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 0d3214df9c..7a5ed6b985 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -111,7 +111,7 @@ static void validate_table_rewrite_tags(const char *filtervar, List *taglist);
 static void EventTriggerInvoke(List *fn_oid_list, EventTriggerData *trigdata);
 static const char *stringify_grant_objtype(ObjectType objtype);
 static const char *stringify_adefprivs_objtype(ObjectType objtype);
-static void SetDatatabaseHasLoginEventTriggers(void);
+static void SetDatabaseHasLoginEventTriggers(void);

 /*
  * Create an event trigger.
@@ -315,7 +315,7 @@ insert_event_trigger_tuple(const char *trigname, const char *eventname, Oid evtO
 * faster lookups in hot codepaths. Set the flag unless already True.
 */
if (strcmp(eventname, "login") == 0)
-   SetDatatabaseHasLoginEventTriggers();
+   SetDatabaseHasLoginEventTriggers();

/* Depend on owner. */
recordDependencyOnOwner(EventTriggerRelationId, trigoid, evtOwner);
@@ -383,7 +383,7 @@ filter_list_to_array(List *filterlist)
  * current database has on login event triggers.
  */
 void
-SetDatatabaseHasLoginEventTriggers(void)
+SetDatabaseHasLoginEventTriggers(void)
 {
/* Set dathasloginevt flag in pg_database */
Form_pg_database db;
@@ -453,7 +453,7 @@ AlterEventTrigger(AlterEventTrigStmt *stmt)
 */
if (namestrcmp(&stmt->evtevent, "login") == 0 &&
tgenabled != TRIGGER_DISABLED)
-   SetDatatabaseHasLoginEventTriggers();
+   SetDatabaseHasLoginEventTriggers();

InvokeObjectPostAlterHook(EventTriggerRelationId,
  trigoid, 0);
@@ -925,7 +925,7 @@ EventTriggerOnLogin(void)
/*
 * There is no active login event trigger, but our
 * pg_database.dathasloginevt is set. Try to unset this flag.  We use the
-* lock to prevent concurrent SetDatatabaseHasLoginEventTriggers(), but we
+* lock to prevent concurrent SetDatabaseHasLoginEventTriggers(), but we
 * don't want to hang the connection waiting on the lock.  Thus, we are
 * just trying to acquire the lock conditionally.
 */
--
Regards,
Japin Li




Re: POC: GROUP BY optimization

2024-04-16 Thread Andrei Lepikhov

On 4/12/24 06:44, Tom Lane wrote:

* The very first hunk causes get_eclass_for_sort_expr to have
side-effects on existing EquivalenceClass data structures.
What is the argument that that's not going to cause problems?
At the very least it introduces asymmetry, in that the first
caller wins the chance to change the EC's ec_sortref and later
callers won't be able to.

Let me resolve the issues bit by bit.
Addressing your first note, the 'asymmetry', I examined this part of the 
code. Before commit 8d83a5d, it could cause some artefacts, but since then, 
a kind of asymmetry has been introduced into the GROUP-BY clause list.
As you can see in the attached patch, GROUP-BY reordering works even 
when the asymmetry of 8d83a5d chooses different keys.


At the same time, I agree that implicitly setting anything in an 
exported function, which should just look up the EC, is a questionable 
coding style. To avoid possible usage issues (in extensions, for 
example), I propose setting it afterwards, explicitly requesting this 
action via an input parameter - see my attempt in the attachment.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index a619ff9177..bc08dfadaf 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -652,18 +652,7 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 
 			if (opcintype == cur_em->em_datatype &&
 equal(expr, cur_em->em_expr))
-			{
-/*
- * Match!
- *
- * Copy the sortref if it wasn't set yet.  That may happen if
- * the ec was constructed from a WHERE clause, i.e. it doesn't
- * have a target reference at all.
- */
-if (cur_ec->ec_sortref == 0 && sortref > 0)
-	cur_ec->ec_sortref = sortref;
-return cur_ec;
-			}
+				return cur_ec;	/* Match! */
 		}
 	}
 
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 1d61881a6b..5d9597adcd 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1355,7 +1355,8 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
	&sortclauses,
 	tlist,
 	false,
-	&sortable);
+	&sortable,
+	false);
 	/* It's caller error if not all clauses were sortable */
 	Assert(sortable);
 	return result;
@@ -1379,13 +1380,16 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
  * to remove any clauses that can be proven redundant via the eclass logic.
  * Even though we'll have to hash in that case, we might as well not hash
  * redundant columns.)
+ * *set_ec_sortref is true if the caller wants to set up the ec_sortref field
+ * in the pathkey Equivalence Class, if it has not been initialized beforehand.
  */
 List *
 make_pathkeys_for_sortclauses_extended(PlannerInfo *root,
 	   List **sortclauses,
 	   List *tlist,
 	   bool remove_redundant,
-	   bool *sortable)
+	   bool *sortable,
+	   bool set_ec_sortref)
 {
 	List	   *pathkeys = NIL;
 	ListCell   *l;
@@ -1409,6 +1413,13 @@ make_pathkeys_for_sortclauses_extended(PlannerInfo *root,
 		   sortcl->nulls_first,
 		   sortcl->tleSortGroupRef,
 		   true);
+		if (pathkey->pk_eclass->ec_sortref == 0 && set_ec_sortref)
+			/*
+			 * Copy the sortref if it wasn't set yet.  That may happen if
+			 * the ec was constructed from a WHERE clause, i.e. it doesn't
+			 * have a target reference at all.
+			 */
+			pathkey->pk_eclass->ec_sortref = sortcl->tleSortGroupRef;
 
 		/* Canonical form eliminates redundant ordering keys */
 		if (!pathkey_is_redundant(pathkey, pathkeys))
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5320da51a0..dee98c9a90 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3391,7 +3391,7 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 		/*
 		 * With a plain GROUP BY list, we can remove any grouping items that
 		 * are proven redundant by EquivalenceClass processing.  For example,
-		 * we can remove y given "WHERE x = y GROUP BY x, y".  These aren't
+		 * we can remove y given "WHERE x = y GROUP BY x, y".  These aren't
 		 * especially common cases, but they're nearly free to detect.  Note
 		 * that we remove redundant items from processed_groupClause but not
 		 * the original parse->groupClause.
@@ -3403,7 +3403,8 @@ standard_qp_callback(PlannerInfo *root, void *extra)
    &root->processed_groupClause,
    tlist,
    true,
-   &sortable);
+   &sortable,
+   true);
 		if (!sortable)
 		{
 			/* Can't sort; no point in considering aggregate ordering either */
@@ -3453,7 +3454,8 @@ standard_qp_callback(PlannerInfo *root, void *extra)
    &root->processed_distinctClause,
    tlist,
    true,
-   &sortable);
+   &sortable,
+