Re: logical decoding and replication of sequences, take 2

2023-03-17 Thread Amit Kapila
On Sat, Mar 18, 2023 at 3:13 AM Tomas Vondra
 wrote:
>
> On 3/17/23 18:55, Tomas Vondra wrote:
> >
> > ...
> >
> > This however made me realize the initial sync of sequences may not be
> > correct. I mean, the idea of tablesync is syncing the data in REPEATABLE
> > READ transaction, and then applying decoded changes. But sequences are
> > not transactional in this way - if you select from a sequence, you'll
> > always see the latest data, even in REPEATABLE READ.
> >
> > I wonder if this might result in losing some of the sequence increments,
> > and/or applying them in the wrong order (so that the sequence goes
> > backward for a while).
> >
>
> Yeah, I think my suspicion was warranted - it's pretty easy to make the
> sequence go backwards for a while by adding a sleep between the slot
> creation and the copy_sequence() call, and increment the sequence in
> between (enough to do some WAL logging).
>
> The copy_sequence() then reads the current on-disk state (because of the
> non-transactional nature w.r.t. REPEATABLE READ), applies it, and then
> we start processing the WAL added since the slot creation. But those are
> older, so stuff like this happens:
>
> 21:52:54.147 CET [35404] WARNING:  copy_sequence 1222 0 1
> 21:52:54.163 CET [35404] WARNING:  apply_handle_sequence 990 0 1
> 21:52:54.163 CET [35404] WARNING:  apply_handle_sequence 1023 0 1
> 21:52:54.163 CET [35404] WARNING:  apply_handle_sequence 1056 0 1
> 21:52:54.174 CET [35404] WARNING:  apply_handle_sequence 1089 0 1
> 21:52:54.174 CET [35404] WARNING:  apply_handle_sequence 1122 0 1
> 21:52:54.174 CET [35404] WARNING:  apply_handle_sequence 1155 0 1
> 21:52:54.174 CET [35404] WARNING:  apply_handle_sequence 1188 0 1
> 21:52:54.175 CET [35404] WARNING:  apply_handle_sequence 1221 0 1
> 21:52:54.898 CET [35402] WARNING:  apply_handle_sequence 1254 0 1
>
> Clearly, for sequences we can't quite rely on snapshots/slots, we need
> to get the LSN to decide what changes to apply/skip from somewhere else.
> I wonder if we can just ignore the queued changes in tablesync, but I
> guess not - there can be queued increments after reading the sequence
> state, and we need to apply those. But maybe we could use the page LSN
> from the relfilenode - that should be the LSN of the last WAL record.
>
> Or maybe we could simply add pg_current_wal_insert_lsn() into the SQL we
> use to read the sequence state ...
>

What if some ALTER SEQUENCE is performed before the copy starts, and
the transaction containing it is rolled back after the copy is
finished? Won't we have copied something which shouldn't have been
copied?

-- 
With Regards,
Amit Kapila.




Re: meson issue? ninja clean doesn't drop queryjumblefuncs.funcs.c

2023-03-17 Thread Pavel Stehule
On Sat, Mar 18, 2023 at 1:58 AM Michael Paquier
wrote:

> On Fri, Mar 17, 2023 at 09:11:53PM +0100, Pavel Stehule wrote:
> > and queryjumblefuncs.switch.c files.
>
> Let me see..  It looks like src/include/nodes/meson.build is just
> missing a refresh.  Will check and fix, thanks for the report!
>

thank you

Pavel


> --
> Michael
>


Fix misplaced shared_preload_libraries_in_progress check in few extensions

2023-03-17 Thread Bharath Rupireddy
Hi,

I'm attaching a patch to do $subject in the autoprewarm.c and worker_spi
extensions. The way it is right now doesn't hurt anyone, but why fail
after defining custom GUCs if we aren't loading the library via
shared_preload_libraries?

Thoughts?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Fix-misplaced-shared_preload_libraries_in_progres.patch
Description: Binary data


Re: Data is copied twice when specifying both child and parent table in publication

2023-03-17 Thread Amit Kapila
On Sat, Mar 18, 2023 at 5:06 AM Jacob Champion  wrote:
>
> On Thu, Mar 16, 2023 at 11:28 PM wangw.f...@fujitsu.com
>  wrote:
> > Attach the new patch set.
>
> Hi,
>
> I ran into this problem while hacking on [1], so thank you for tackling
> it! I have no strong opinions on the implementation itself; I just want
> to register a concern that the tests have not kept up with the
> implementation complexity.
>
> For example, the corner case mentioned in 0003, with multiple
> publications having conflicting pubviaroot settings, isn't tested as far
> as I can see. (I checked manually, and it appears to work as intended.)
> And the related pub_lower_level test currently only covers the case
> where multiple publications have pubviaroot=true, so the following test
> comment is now misleading:
>
> > # for tab4, we publish changes through the "middle" partitioned table
> > $node_publisher->safe_psql('postgres',
> >   "CREATE PUBLICATION pub_lower_level FOR TABLE tab4_1 WITH 
> > (publish_via_partition_root = true)"
> > );
>
> ...since the changes are now in fact published via the tab4 root after
> this patchset is applied.
>
> > I think the aim of joining it with pg_publication before is to exclude
> > non-existing publications. Otherwise, we would get an error because of the 
> > call
> > to function GetPublicationByName (with 'missing_ok = false') in function
> > pg_get_publication_tables.
>
> In the same vein, I don't think that case is covered anywhere.
>

We can have a test case to cover this scenario.

> There are a bunch of moving parts and hidden subtleties here, and I fell
> into a few traps when I was working on my patch, so it'd be nice to have
> additional coverage. I'm happy to contribute effort in that area if it's
> helpful.
>

I think it depends on what tests you have in mind. I suggest you
propose a patch covering tests for this area in a separate thread. We
can then evaluate those separately.

-- 
With Regards,
Amit Kapila.




Re: Add pg_walinspect function with block info columns

2023-03-17 Thread Bharath Rupireddy
On Sat, Mar 18, 2023 at 1:06 AM Peter Geoghegan  wrote:
>
> On Fri, Mar 17, 2023 at 12:20 AM Bharath Rupireddy
>  wrote:
> > +1 for pg_get_wal_block_info emitting per-record WAL info too along
> > with block info, attached v2 patch does that. IMO, usability wins the
> > race here.
>
> I think that this direction makes a lot of sense. Under this scheme,
> we still have pg_get_wal_records_info(), which is more or less an SQL
> interface to the information that pg_waldump presents by default.
> That's important when the record view of things is of primary
> importance. But now you also have a "block oriented" view of WAL
> presented by pg_get_wal_block_info(), which is useful when particular
> blocks are of primary interest. I think that I'll probably end up
> using both, while primarily using pg_get_wal_block_info() for more
> advanced analysis that focuses on what happened to particular blocks
> over time.

Hm.

> It makes sense to present pg_get_wal_block_info() immediately after
> pg_get_wal_records_info() in the documentation under this scheme,
> since they're closely related.

-1. I don't think we need that, and even if we did, it would be hard to
maintain that ordering in the future. Anyone who knows how to use these
functions will figure out how they're related anyway.

> (Checks pg_walinspect once more...)
>
> Actually, I now see that block_ref won't be NULL for those records
> that have no block references at all -- it just outputs an empty
> string.

Yes, that's unnecessary.

> But wouldn't it be better if it actually output NULL?

+1 done so in the attached 0001 patch.

> Better
> for its own sake, but also better because doing so enables describing
> the relationship between the two functions with reference to
> block_ref. It seems particularly helpful to me to be able to say that
> pg_get_wal_block_info() doesn't show anything for precisely those WAL
> records whose block_ref is NULL according to
> pg_get_wal_records_info().

Hm.

Attaching v3 patch set - 0001 optimizations around block references,
0002 enables pg_get_wal_block_info() to emit per-record info. Any
thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 153d1d8ae7f03471aa74b685d20eb110ba7eda98 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sat, 18 Mar 2023 04:01:46 +
Subject: [PATCH v3] Few optimizations around block references in pg_walinspect

---
 contrib/pg_walinspect/pg_walinspect.c | 29 +++
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 3b3215daf5..c3ed31db8f 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -199,9 +199,12 @@ GetWALRecordInfo(XLogReaderState *record, Datum *values,
 	initStringInfo(&rec_desc);
 	desc.rm_desc(&rec_desc, record);
 
-	/* Block references. */
-	initStringInfo(&rec_blk_ref);
-	XLogRecGetBlockRefInfo(record, false, true, &rec_blk_ref, &fpi_len);
+	/* Get block references, if any. */
+	if (XLogRecHasAnyBlockRefs(record))
+	{
+		initStringInfo(&rec_blk_ref);
+		XLogRecGetBlockRefInfo(record, false, true, &rec_blk_ref, &fpi_len);
+	}
 
 	main_data_len = XLogRecGetDataLen(record);
 
@@ -215,7 +218,12 @@ GetWALRecordInfo(XLogReaderState *record, Datum *values,
 	values[i++] = UInt32GetDatum(main_data_len);
 	values[i++] = UInt32GetDatum(fpi_len);
 	values[i++] = CStringGetTextDatum(rec_desc.data);
-	values[i++] = CStringGetTextDatum(rec_blk_ref.data);
+
+	/* Output block references, if any. */
+	if (XLogRecHasAnyBlockRefs(record))
+		values[i++] = CStringGetTextDatum(rec_blk_ref.data);
+	else
+		nulls[i++] = true;
 
 	Assert(i == ncols);
 }
@@ -377,6 +385,12 @@ pg_get_wal_block_info(PG_FUNCTION_ARGS)
 	while (ReadNextXLogRecord(xlogreader) &&
 		   xlogreader->EndRecPtr <= end_lsn)
 	{
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get block references, if any, otherwise continue. */
+		if (!XLogRecHasAnyBlockRefs(xlogreader))
+			continue;
+
 		/* Use the tmp context so we can clean up after each tuple is done */
 		old_cxt = MemoryContextSwitchTo(tmp_cxt);
 
@@ -385,8 +399,6 @@ pg_get_wal_block_info(PG_FUNCTION_ARGS)
 		/* clean up and switch back */
 		MemoryContextSwitchTo(old_cxt);
 		MemoryContextReset(tmp_cxt);
-
-		CHECK_FOR_INTERRUPTS();
 	}
 
 	MemoryContextDelete(tmp_cxt);
@@ -483,8 +495,6 @@ GetWALRecordsInfo(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
 #define PG_GET_WAL_RECORDS_INFO_COLS 11
 	XLogReaderState *xlogreader;
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
-	Datum		values[PG_GET_WAL_RECORDS_INFO_COLS] = {0};
-	bool		nulls[PG_GET_WAL_RECORDS_INFO_COLS] = {0};
 	MemoryContext old_cxt;
 	MemoryContext tmp_cxt;
 
@@ -501,6 +511,9 @@ GetWALRecordsInfo(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
 	while (ReadNextXLogRecord(xlogreader) &&
 		   xlogreader->EndRecPtr <= end_lsn)
 	{
+		Datum		values[PG_GET_WAL_RECORDS_INFO_COLS] = {0};
+		bool		

Re: BF mamba failure

2023-03-17 Thread Tom Lane
Amit Kapila  writes:
> Peter Smith has recently reported a BF failure [1]. AFAICS, the call
> stack of failure [2] is as follows:

Note the assertion report a few lines further up:

TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 560, PID: 25004

regards, tom lane




BF mamba failure

2023-03-17 Thread Amit Kapila
Hi,

Peter Smith has recently reported a BF failure [1]. AFAICS, the call
stack of failure [2] is as follows:

0x1e66644  at postgres
0x1d0143c  at postgres
0x1d02534  at postgres
0x1cfb120  at postgres
0x1cfd590  at postgres
0x1cfbfc0  at postgres
0x1ca7b08  at postgres
0x1ca7c74  at postgres
0x1ca7d2c  at postgres
0x1cdf060  at postgres
0x1c203f4  at postgres
0x1c2161c  at postgres
0x1edcf94  at postgres

I couldn't correlate it to the recent commits. Any thoughts?

[1] - 
https://www.postgresql.org/message-id/CAHut%2BPsHdWFjU43VEX%2BR-8de6dFQ-_JWrsqs%3DvWek1hULexP4Q%40mail.gmail.com
[2] -
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-03-17%2005%3A36%3A10

-- 
With Regards,
Amit Kapila.




Re: Amcheck verification of GiST and GIN

2023-03-17 Thread Andrey Borodin
Hi Peter,

Thanks for the feedback! I'll work on it during the weekend.

On Thu, Mar 16, 2023 at 6:23 PM Peter Geoghegan  wrote:
>
> existence of a "same" routine hints at some confusion about "equality
> versus equivalence" issues.

Hmm...yes, actually, GiST deals with floats routinely. And there might
be some sorts of NaNs and Infs that are equal, but not binary
equivalent.
I'll think more about it.

gist_get_adjusted() calls the "same" routine, which for type point
uses FPeq(double A, double B). And this might be a kind of corruption
out of the box, because it's an epsilon-comparison, ε = 1.0E-06.
GiST might miss newly inserted data, because the "adjusted" tuple is
considered "same" if the data is within 0.01 proximity of a previously
indexed point, yet outside the known MBRs.
I'll try to reproduce this tomorrow; so far no luck.
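
For illustration, the epsilon comparison is already visible at the SQL
level (just a sketch, assuming the point "same as" operator ~= goes
through FPeq like the rest of the geometric code):

-- EPSILON is 1.0E-06, so a difference of 5.0E-07 compares as "same"
SELECT point '(0,0)' ~= point '(0.0000005,0)';   -- true
SELECT point '(0,0)' ~= point '(0.000002,0)';    -- false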


Best regards, Andrey Borodin.




Re: Add pg_walinspect function with block info columns

2023-03-17 Thread Michael Paquier
On Fri, Mar 17, 2023 at 06:09:05PM -0700, Peter Geoghegan wrote:
>> This said, your point about having rec_blk_ref reported as an empty
>> string rather than NULL if there are no block references does not feel
>> natural to me, either..  Reporting NULL would be better.
> 
> You have it backwards. It outputs an empty string right now. I want to
> change that, so that it outputs NULLs instead.

My previous paragraph means exactly the same?  I have just read what I
wrote again.  Twice.  So, yes, I agree with this point.  Sorry if my
words meant the contrary to you.  :)
--
Michael


signature.asc
Description: PGP signature


Re: Add pg_walinspect function with block info columns

2023-03-17 Thread Peter Geoghegan
On Fri, Mar 17, 2023 at 5:51 PM Michael Paquier  wrote:
> Yes.  The CPU cost is one thing, but I am also worried about the
> I/O cost of a tuplestore spilling a large number of FPIs to disk,
> and some workloads generate WAL such that FPIs make up most of the
> contents stored in the WAL.  (wal_compression is very effective
> in such cases, for example.)
>
> It is true that it is possible to write SQL that does exactly that,
> materializing a large amount of data or eating so much CPU that it
> basically DoSes the backend.  Still, I'd rather keep a minimalistic
> design for each function, with block_info having only one field that
> tracks back to the record a block's information refers to, and I'd
> like to think anyone looking at WAL internals will be smart enough
> to write their SQL so as to avoid that on a production machine.  The
> current design allows that in this view, but that's just one way I
> see to represent these structures at the SQL level.

Not really. It has nothing to do with some abstract ideal about how
the data should be logically structured. It is about how the actual
underlying physical data structures work, and are accessed in
practice, during query execution. And it's about the constraints placed
on us by the laws of physics. Some ways of doing this are measurably,
provably much faster than other ways. It's very much not like we're
querying tables whose general structure is under our control, via
schema design, where the optimizer could reasonably be expected to
make better choices as the data distribution changes. So why treat it
like that?

Right now, you're still basically standing by a design that is
*fundamentally* less efficient for certain types of queries -- queries
that I am very interested in, that I'm sure many of us will be
interested in. It's not a matter of opinion. It is very much in
evidence from Bharath's analysis. If a similar analysis reached the
opposite conclusion, then you would be right and I would be wrong. It
really is that simple.

> This said, your point about having rec_blk_ref reported as an empty
> string rather than NULL if there are no block references does not feel
> natural to me, either..  Reporting NULL would be better.

You have it backwards. It outputs an empty string right now. I want to
change that, so that it outputs NULLs instead.

-- 
Peter Geoghegan




Re: pg_dump versus hash partitioning

2023-03-17 Thread Julien Rouhaud
On Fri, Mar 17, 2023 at 01:44:12PM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > On Thu, Mar 16, 2023 at 08:43:56AM -0400, Tom Lane wrote:
> >> I think the odds of that yielding a usable dump are nil, so I don't
> >> see why we should bother.
>
> > No objection from me.
>
> OK, pushed with the discussed changes.

Great news, thanks a lot!




Re: meson issue? ninja clean doesn't drop queryjumblefuncs.funcs.c

2023-03-17 Thread Michael Paquier
On Fri, Mar 17, 2023 at 09:11:53PM +0100, Pavel Stehule wrote:
> and queryjumblefuncs.switch.c files.

Let me see..  It looks like src/include/nodes/meson.build is just
missing a refresh.  Will check and fix, thanks for the report!
--
Michael


signature.asc
Description: PGP signature


Re: Add pg_walinspect function with block info columns

2023-03-17 Thread Michael Paquier
On Fri, Mar 17, 2023 at 04:36:58PM -0700, Peter Geoghegan wrote:
> I'm sure that they will do that much more than they would have
> otherwise. Since we'll have made pg_get_wal_block_info() so much more
> useful than pg_get_wal_records_info() for many important use cases.
> Why is that a bad thing? Are you concerned about the overhead of
> pulling in FPIs when pg_get_wal_block_info() is run, if Bharath's
> patch is committed? That could be a problem, I suppose -- but it would
> be good to get more data on that. Do you think that this will be much
> of an issue, Bharath?

Yes.  The CPU cost is one thing, but I am also worried about the
I/O cost of a tuplestore spilling a large number of FPIs to disk,
and some workloads generate WAL such that FPIs make up most of the
contents stored in the WAL.  (wal_compression is very effective
in such cases, for example.)

It is true that it is possible to write SQL that does exactly that,
materializing a large amount of data or eating so much CPU that it
basically DoSes the backend.  Still, I'd rather keep a minimalistic
design for each function, with block_info having only one field that
tracks back to the record a block's information refers to, and I'd
like to think anyone looking at WAL internals will be smart enough
to write their SQL so as to avoid that on a production machine.  The
current design allows that in this view, but that's just one way I
see to represent these structures at the SQL level.  Extending
block_info() with more record-level attributes allows that as well,
but it bloats the interface unnecessarily.  Designing software is
hard, and it looks like our points of view on that differ.  If you
wish to change the current interface of block_info, feel free to do
so.  It does not mean that it cannot be changed, just that I
recommend not doing it, and that's just one opinion.

This said, your point about having rec_blk_ref reported as an empty
string rather than NULL if there are no block references does not feel
natural to me, either..  Reporting NULL would be better.
--
Michael


signature.asc
Description: PGP signature


Re: gcc 13 warnings

2023-03-17 Thread Andres Freund
Hi,

On 2023-03-17 09:06:05 +0100, Peter Eisentraut wrote:
> AFAICT, the default for meson is buildtype=debug, which is -O0.  The -O3
> comes from meson.build setting buildtype=release.

Right - my point about -O3 was just that buildtype=release defaults to it.


> I think a good compromise would be buildtype=debugoptimized, which is -O2
> with debug symbols, which also sort of matches the default in the autoconf
> world.

Looks like that'd result in a slightly worse build with MSVC: afaict we
wouldn't end up with /OPT:REF, since it automatically gets disabled if
/DEBUG is specified. I guess we can live with that.

Greetings,

Andres Freund




Re: Add pg_walinspect function with block info columns

2023-03-17 Thread Peter Geoghegan
On Fri, Mar 17, 2023 at 4:11 PM Michael Paquier  wrote:
> FWIW, I am not sure that it is a good idea; we'd better not encourage
> too much use of block_info() across a large range of WAL, which is
> what this function will make users eager to do in this case, as it
> becomes possible to apply more filters to it directly.

I'm sure that they will do that much more than they would have
otherwise. Since we'll have made pg_get_wal_block_info() so much more
useful than pg_get_wal_records_info() for many important use cases.
Why is that a bad thing? Are you concerned about the overhead of
pulling in FPIs when pg_get_wal_block_info() is run, if Bharath's
patch is committed? That could be a problem, I suppose -- but it would
be good to get more data on that. Do you think that this will be much
of an issue, Bharath?

I have pushed pg_walinspect to its limits myself (which is how I found
that memory leak). Performance matters a great deal when you're doing
an analysis of how blocks change over time, on a system that has
written a realistically large amount of WAL over minutes or even
hours. Why shouldn't that be a priority for pg_walinspect? My concerns
have little to do with aesthetics, and everything to do with making
those kinds of queries feasible.

If the FPI thing is a problem then it seems to me that it should be
addressed directly. For example, perhaps it would make sense to add a
way to not incur the overhead of decompressing FPIs uselessly in cases
where they're of no interest to us (likely the majority of cases once
the patch is committed). It also might well make sense to rename
pg_get_wal_block_info() to something more general, to reflect its more
general purpose once it is expanded by Bharath's patch. As I said, it
will become a lot closer to pg_get_wal_records_info(). We should be
clear on that.

-- 
Peter Geoghegan




Re: Data is copied twice when specifying both child and parent table in publication

2023-03-17 Thread Jacob Champion
On Thu, Mar 16, 2023 at 11:28 PM wangw.f...@fujitsu.com
 wrote:
> Attach the new patch set.

Hi,

I ran into this problem while hacking on [1], so thank you for tackling
it! I have no strong opinions on the implementation itself; I just want
to register a concern that the tests have not kept up with the
implementation complexity.

For example, the corner case mentioned in 0003, with multiple
publications having conflicting pubviaroot settings, isn't tested as far
as I can see. (I checked manually, and it appears to work as intended.)
And the related pub_lower_level test currently only covers the case
where multiple publications have pubviaroot=true, so the following test
comment is now misleading:

> # for tab4, we publish changes through the "middle" partitioned table
> $node_publisher->safe_psql('postgres',
>   "CREATE PUBLICATION pub_lower_level FOR TABLE tab4_1 WITH 
> (publish_via_partition_root = true)"
> );

...since the changes are now in fact published via the tab4 root after
this patchset is applied.
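
To be concrete, the untested corner case mentioned above would look
roughly like this (a sketch with hypothetical object names, not taken
from the patch's test suite):

CREATE TABLE parent (a int) PARTITION BY RANGE (a);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (10);
CREATE PUBLICATION pub_root FOR TABLE parent
  WITH (publish_via_partition_root = true);
CREATE PUBLICATION pub_leaf FOR TABLE child
  WITH (publish_via_partition_root = false);
-- a subscription to both publications should copy each row exactly once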

> I think the aim of joining it with pg_publication before is to exclude
> non-existing publications. Otherwise, we would get an error because of the 
> call
> to function GetPublicationByName (with 'missing_ok = false') in function
> pg_get_publication_tables.

In the same vein, I don't think that case is covered anywhere.

There are a bunch of moving parts and hidden subtleties here, and I fell
into a few traps when I was working on my patch, so it'd be nice to have
additional coverage. I'm happy to contribute effort in that area if it's
helpful.

Thanks!
--Jacob

[1] https://postgr.es/m/dc57f088-039b-7a71-8f4c-082ef106246e%40timescale.com




Re: Should we remove vacuum_defer_cleanup_age?

2023-03-17 Thread Magnus Hagander
On Sat, Mar 18, 2023 at 12:09 AM Andres Freund  wrote:

> Hi,
>
> As evidenced by the bug fixed in be504a3e974, vacuum_defer_cleanup_age is
> not
> heavily used - the bug was trivial to hit as soon as
> vacuum_defer_cleanup_age
> is set to a non-toy value. It complicates thinking about visibility
> horizons
> substantially, as vacuum_defer_cleanup_age can make them go backward
> substantially. Obviously it's also severely undertested.
>
> I started writing a test for vacuum_defer_cleanup_age while working on the
> fix
> referenced above, but now I am wondering if said energy would be better
> spent
> removing vacuum_defer_cleanup_age altogether.
>
> vacuum_defer_cleanup_age was added as part of hot standby. Back then we did
> not yet have hot_standby_feedback. Now that that exists,
> vacuum_defer_cleanup_age doesn't seem like a good idea anymore. It's
> pessimistic, i.e. always retains rows, even if none of the standbys has
> an
> old enough snapshot.
>
> The only benefit of vacuum_defer_cleanup_age over hot_standby_feedback is
> that
> it provides a limit of some sort. But transactionids aren't producing dead
> rows in a uniform manner, so limiting via xid isn't particularly useful.
> And
> even if there are use cases, it seems those would be served better by
> introducing a cap on how much hot_standby_feedback can hold the horizon
> back.
>
> I don't think I have the cycles to push this through in the next weeks,
> but if
> we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a
> good idea to mark it as deprecated in 16?
>

+1. I haven't seen any (correct) use of this in many many years on my end
at least.

And yes, having a cap on hot_standby_feedback would also be great.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Remove last traces of SCM credential auth from libpq?

2023-03-17 Thread Michael Paquier
On Fri, Mar 17, 2023 at 09:30:32AM +0900, Michael Paquier wrote:
> KRB4 was switched in a159ad3 back in 2005, and KRB5 in 98de86e back in
> 2014 (deprecated in 8.3, so that's even older than creds).  So yes,
> that could be removed as well, I guess, falling back to the default
> error message.

This seems like something worth a thread of its own, will send a
patch.

>> Nah, I see no reason to wait.  We already dropped the higher-level
>> client support (psql/pg_dump) for these server versions in v15.
> 
> Okay.  I'll clean up this part today, then.

I got around to doing that with 98ae2c8.
--
Michael


signature.asc
Description: PGP signature


Re: Add pg_walinspect function with block info columns

2023-03-17 Thread Michael Paquier
On Fri, Mar 17, 2023 at 12:36:04PM -0700, Peter Geoghegan wrote:
> I think that this direction makes a lot of sense. Under this scheme,
> we still have pg_get_wal_records_info(), which is more or less an SQL
> interface to the information that pg_waldump presents by default.
> That's important when the record view of things is of primary
> importance. But now you also have a "block oriented" view of WAL
> presented by pg_get_wal_block_info(), which is useful when particular
> blocks are of primary interest. I think that I'll probably end up
> using both, while primarily using pg_get_wal_block_info() for more
> advanced analysis that focuses on what happened to particular blocks
> over time.

FWIW, I am not sure that it is a good idea; we'd better not encourage
too much use of block_info() across a large range of WAL, which is
what this function will make users eager to do in this case, as it
becomes possible to apply more filters to it directly.  This is a
community, so, anyway, if you feel strongly about doing this change,
feel free to.
--
Michael


signature.asc
Description: PGP signature


Should we remove vacuum_defer_cleanup_age?

2023-03-17 Thread Andres Freund
Hi,

As evidenced by the bug fixed in be504a3e974, vacuum_defer_cleanup_age is not
heavily used - the bug was trivial to hit as soon as vacuum_defer_cleanup_age
is set to a non-toy value. It complicates thinking about visibility horizons
substantially, as vacuum_defer_cleanup_age can make them go backward
substantially. Obviously it's also severely undertested.

I started writing a test for vacuum_defer_cleanup_age while working on the fix
referenced above, but now I am wondering if said energy would be better spent
removing vacuum_defer_cleanup_age altogether.

vacuum_defer_cleanup_age was added as part of hot standby. Back then we did
not yet have hot_standby_feedback. Now that that exists,
vacuum_defer_cleanup_age doesn't seem like a good idea anymore. It's
pessimistic, i.e. always retains rows, even if none of the standbys has an
old enough snapshot.

The only benefit of vacuum_defer_cleanup_age over hot_standby_feedback is that
it provides a limit of some sort. But transaction IDs aren't producing dead
rows in a uniform manner, so limiting via xid isn't particularly useful. And
even if there are use cases, it seems those would be served better by
introducing a cap on how much hot_standby_feedback can hold the horizon back.

I don't think I have the cycles to push this through in the next weeks, but if
we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a
good idea to mark it as deprecated in 16?

Greetings,

Andres Freund




Re: Making background psql nicer to use in tap tests

2023-03-17 Thread Andres Freund
Hi,

On 2023-03-17 12:25:14 -0400, Andrew Dunstan wrote:
> On 2023-03-17 Fr 10:08, Daniel Gustafsson wrote:
> > > If we are going to keep this as a separate package, then we should put 
> > > some code in the constructor to prevent it being called from elsewhere 
> > > than the Cluster package. e.g.
> > > 
> > >  # this constructor should only be called from 
> > > PostgreSQL::Test::Cluster
> > >  my ($package, $file, $line) = caller;
> > >  die "Forbidden caller of constructor: package: $package, file: 
> > > $file:$line"
> > >unless $package eq 'PostgreSQL::Test::Cluster';
> > I don't have strong feelings about where to place this, but Cluster.pm is
> > already quite long so I see a small upside to keeping it separate to not 
> > make
> > that worse.
> > 
> 
> Yeah, I can go along with that.

Cool - I'd prefer a separate file. I do find Cluster.pm somewhat unwieldy at
this point, and I suspect that we'll end up with additional helpers around
BackgroundPsql.

Greetings,

Andres Freund




Re: Add n_tup_newpage_upd to pg_stat table views

2023-03-17 Thread Peter Geoghegan
On Fri, Jan 27, 2023 at 3:23 PM Corey Huinker  wrote:
> This patch adds the n_tup_newpage_upd to all the table stat views.

I think that this is pretty close to being committable already.

I'll move on that early next week, barring any objections.

-- 
Peter Geoghegan




Re: Making background psql nicer to use in tap tests

2023-03-17 Thread Andrew Dunstan


On 2023-03-17 Fr 14:07, Dagfinn Ilmari Mannsåker wrote:
> Andrew Dunstan  writes:
>
>> On 2023-03-17 Fr 10:08, Daniel Gustafsson wrote:
>>>> Why is $restart_before_query a package/class level value instead of
>>>> an instance value? And why can we only ever set it to 1 but not back
>>>> again? Maybe we don't want to, but it looks odd.
>>> It was mostly a POC to show what I meant with the functionality.  I think there
>>> should be a way to turn it off (set it to zero) even though I doubt it will be
>>> used much.
>>
>> A common idiom is to have a composite getter/setter method for object
>> properties something like this
>>
>>    sub settingname
>>    {
>>       my ($self, $arg) = @_;
>>       $self->{settingname} = $arg if defined $arg;
>>       return $self->{settingname};
>>    }
>
> Or, if undef is a valid value:
>
>     sub settingname
>     {
>         my $self = shift;
>         $self->[settingname} = shift if @_;
>         return $self->{settingname};
>     }

Yes, I agree that's better (modulo the bracket typo)


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: File descriptors in exec'd subprocesses

2023-03-17 Thread Thomas Munro
I pushed the libpq changes.  I'll leave the pipe2 and accept4 changes
on ice for now, maybe for a later cycle (unlike the committed patches,
they don't currently fix a known problem, they just avoid some
syscalls that are already fairly rare).  For the libpq change, the
build farm seems happy so far.  I was a little worried that there
could be ways that #ifdef SOCK_CLOEXEC could be true for a build that
might encounter a too-old kernel and break, but it looks like you'd
have to go so far back into EOL'd releases that even our zombie build
farm animals have it.  Only macOS and AIX don't have it yet, and this
should be fine with Apple's availability guards, which leaves just
AIX.  (AFAIK headers and kernels are always in sync at the major
version level on AIX, but if not presumably there'd have to be some
similar guard system?)




Re: Add macros for ReorderBufferTXN toptxn

2023-03-17 Thread Peter Smith
The build-farm was OK for the last 18hrs after this push, except there
was one error on mamba [1] in test-decoding-check.

This patch did change the test_decoding.c file, so it seems an
unlikely coincidence, but OTOH the change was very small and I don't
see yet how it could have caused a problem here but nowhere else.

Although mamba has since passed again after that failure.

Any thoughts?

--
[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-03-17%2005%3A36%3A10

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: logical decoding and replication of sequences, take 2

2023-03-17 Thread Tomas Vondra
On 3/17/23 18:55, Tomas Vondra wrote:
> 
> ...
> 
> This however made me realize the initial sync of sequences may not be
> correct. I mean, the idea of tablesync is syncing the data in REPEATABLE
> READ transaction, and then applying decoded changes. But sequences are
> not transactional in this way - if you select from a sequence, you'll
> always see the latest data, even in REPEATABLE READ.
> 
> I wonder if this might result in losing some of the sequence increments,
> and/or applying them in the wrong order (so that the sequence goes
> backward for a while).
> 

Yeah, I think my suspicion was warranted - it's pretty easy to make the
sequence go backwards for a while by adding a sleep between the slot
creation and the copy_sequence() call, and increment the sequence in
between (enough to do some WAL logging).

The copy_sequence() then reads the current on-disk state (because of the
non-transactional nature w.r.t. REPEATABLE READ), applies it, and then
we start processing the WAL added since the slot creation. But those are
older, so stuff like this happens:

21:52:54.147 CET [35404] WARNING:  copy_sequence 1222 0 1
21:52:54.163 CET [35404] WARNING:  apply_handle_sequence 990 0 1
21:52:54.163 CET [35404] WARNING:  apply_handle_sequence 1023 0 1
21:52:54.163 CET [35404] WARNING:  apply_handle_sequence 1056 0 1
21:52:54.174 CET [35404] WARNING:  apply_handle_sequence 1089 0 1
21:52:54.174 CET [35404] WARNING:  apply_handle_sequence 1122 0 1
21:52:54.174 CET [35404] WARNING:  apply_handle_sequence 1155 0 1
21:52:54.174 CET [35404] WARNING:  apply_handle_sequence 1188 0 1
21:52:54.175 CET [35404] WARNING:  apply_handle_sequence 1221 0 1
21:52:54.898 CET [35402] WARNING:  apply_handle_sequence 1254 0 1

Clearly, for sequences we can't quite rely on snapshots/slots, we need
to get the LSN to decide what changes to apply/skip from somewhere else.
I wonder if we can just ignore the queued changes in tablesync, but I
guess not - there can be queued increments after reading the sequence
state, and we need to apply those. But maybe we could use the page LSN
from the relfilenode - that should be the LSN of the last WAL record.

Or maybe we could simply add pg_current_wal_insert_lsn() into the SQL we
use to read the sequence state ...
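
For instance, something like this (just a sketch, assuming a sequence
s1; the extra column would give tablesync a cutoff LSN for deciding
which decoded increments to skip):

SELECT last_value, log_cnt, is_called,
       pg_current_wal_insert_lsn() AS cutoff_lsn
FROM s1;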


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add SHELL_EXIT_CODE to psql

2023-03-17 Thread Corey Huinker
On Fri, Mar 10, 2023 at 3:49 PM Tom Lane  wrote:

> I'm okay with adopting bash's rule, but then it should actually match
> bash --- signal N is reported as 128+N, not -N.
>

128+N is implemented.

Nonzero pclose errors of any kind will now overwrite an existing exit_code.

I didn't write a formal test for the signals, but instead tested it
manually:

[310444:16:54:32 EDT] corey=# \! sleep 1000
-- in another window, i found the pid of the sleep command and did a kill
-9 on it
[310444:16:55:15 EDT] corey=# \echo :SHELL_EXIT_CODE
137


137 = 128+9, so that checks out.
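
The non-signal path can be eyeballed the same way (a sketch, assuming
the attached v9 patches are applied; `false` is the standard shell
utility that exits with status 1):

corey=# \! false
corey=# \echo :SHELL_EXIT_CODE :SHELL_ERROR
1 true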

The OS-specific regression test has been modified. On Windows systems it
attempts to execute "CON", and on other systems it attempts to execute "/".
These are marginally better than "nosuchcommand" in that they should always
exist on their host OS and could never be a legit executable. I have no
opinion whether the test should live on past the development phase, but if
it does not then the 0001 patch is not needed.
From a5dddedcc7bf654b4b65c14ff7b89b9d83bb5c27 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Fri, 17 Mar 2023 06:43:24 -0400
Subject: [PATCH v9 1/3] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 7b23cc80dc..333aaab139 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -594,6 +594,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.2

From fe578661eef7dbe33aec8f4eaa6dca80a1304c9b Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Fri, 17 Mar 2023 06:44:43 -0400
Subject: [PATCH v9 2/3] Add wait_result_to_exit_code().

---
 src/common/wait_error.c | 15 +++
 src/include/port.h  |  1 +
 2 files changed, 16 insertions(+)

diff --git a/src/common/wait_error.c b/src/common/wait_error.c
index 4a3c3c61af..8e1c0e3fc8 100644
--- a/src/common/wait_error.c
+++ b/src/common/wait_error.c
@@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found)
 		return true;
 	return false;
 }
+
+/*
+ * Return the POSIX exit code (0 to 255) that corresponds to the argument.
+ * The argument is a return code returned by wait(2) or waitpid(2), which
+ * also applies to pclose(3) and system(3).
+ */
+int
+wait_result_to_exit_code(int exit_status)
+{
+	if (WIFEXITED(exit_status))
+		return WEXITSTATUS(exit_status);
+	if (WIFSIGNALED(exit_status))
+		return 128 + WTERMSIG(exit_status);
+	return 0;
+}
diff --git a/src/include/port.h b/src/include/port.h
index e66193bed9..1b119a3c56 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src);
 extern char *wait_result_to_str(int exitstatus);
 extern bool wait_result_is_signal(int exit_status, int signum);
 extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found);
+extern int wait_result_to_exit_code(int exit_status);
 
 /*
  * Interfaces that we assume all Unix system have.  We retain individual macros
-- 
2.39.2

From 251863ddcbc4eecf4fa15e332724acbd3c652478 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Fri, 17 Mar 2023 16:46:57 -0400
Subject: [PATCH v9 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 ++
 src/bin/psql/command.c | 15 +
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 35 +-
 src/test/regress/expected/psql.out | 34 +
 src/test/regress/sql/psql.sql  | 30 +
 6 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 7b8ae9fac3..e6e307fd18 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4267,6 +4267,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell commands invoked via either \!
+ or `. See also SHELL_EXIT_CODE.
+
+   
+  
+
+  
+  

Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE

2023-03-17 Thread Dmitry Dolgov
> On Tue, Feb 21, 2023 at 01:02:35PM +0100, David Geier wrote:
> Hi,
>
> On 1/20/23 09:34, David Geier wrote:
> > EXPLAIN ANALYZE for parallel Bitmap Heap Scans currently only reports
> > the number of heap blocks processed by the leader. It's missing the
> > per-worker stats. The attached patch adds that functionality in the
> > spirit of e.g. Sort or Memoize. Here is a simple test case and the
> > EXPLAIN ANALYZE output with and without the patch:
>
> Attached is a rebased version of the patch. I would appreciate someone
> taking a look.
>
> As background: the change doesn't come out of thin air. We repeatedly took
> wrong conclusions in our query analysis because we assumed that the reported
> block counts include the workers.
>
> If no one objects I would also register the patch at the commit fest. The
> patch is passing cleanly on CI.

Thanks for the patch.

The idea sounds reasonable to me, but I have to admit the
snapshot_and_stats implementation looks awkward. Maybe it would be
better to have separate structure fields for the stats and the
snapshot, each set to point to the corresponding place in the shared
FAM, e.g. when the worker is getting initialized? shm_toc_allocate
mentions BUFFERALIGN to handle the possibility of some atomic
operations needing it, so I guess that would have to be the alignment
in this case as well.

Probably another option would be to allocate two separate pieces of
shared memory, which resolves questions like proper alignment, but
annoyingly will require an extra lookup and a new key.




meson issue? ninja clean doesn't drop queryjumblefuncs.funcs.c

2023-03-17 Thread Pavel Stehule
Hi

and queryjumblefuncs.switch.c files.

Regards

Pavel


Re: Add pg_walinspect function with block info columns

2023-03-17 Thread Peter Geoghegan
On Fri, Mar 17, 2023 at 12:20 AM Bharath Rupireddy
 wrote:
> +1 for pg_get_wal_block_info emitting per-record WAL info too along
> with block info, attached v2 patch does that. IMO, usability wins the
> race here.

I think that this direction makes a lot of sense. Under this scheme,
we still have pg_get_wal_records_info(), which is more or less an SQL
interface to the information that pg_waldump presents by default.
That's important when the record view of things is of primary
importance. But now you also have a "block oriented" view of WAL
presented by pg_get_wal_block_info(), which is useful when particular
blocks are of primary interest. I think that I'll probably end up
using both, while primarily using pg_get_wal_block_info() for more
advanced analysis that focuses on what happened to particular blocks
over time.

It makes sense to present pg_get_wal_block_info() immediately after
pg_get_wal_records_info() in the documentation under this scheme,
since they're closely related. It would make sense to explain the
relationship directly: pg_get_wal_block_info() doesn't have the
block_ref column because it breaks that same information out by block
instead, occasionally showing multiple rows for particular record
types (which is what its "extra" columns describe). And,
pg_get_wal_block_info() won't show anything for those records whose
block_ref column is null according to pg_get_wal_records_info(), such
as commit records.

(Checks pg_walinspect once more...)

Actually, I now see that block_ref won't be NULL for those records
that have no block references at all -- it just outputs an empty
string. But wouldn't it be better if it actually output NULL? Better
for its own sake, but also better because doing so enables describing
the relationship between the two functions with reference to
block_ref. It seems particularly helpful to me to be able to say that
pg_get_wal_block_info() doesn't show anything for precisely those WAL
records whose block_ref is NULL according to
pg_get_wal_records_info().
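
Concretely, the relationship could then be expressed in SQL (a sketch
with hypothetical LSN bounds, assuming the proposed NULL behavior):
the records that pg_get_wal_block_info() skips would be exactly those
returned by

SELECT start_lsn, resource_manager, record_type
FROM pg_get_wal_records_info('0/01000000', '0/02000000')
WHERE block_ref IS NULL;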

-- 
Peter Geoghegan




Re: improving user.c error messages

2023-03-17 Thread Nathan Bossart
On Fri, Mar 17, 2023 at 10:40:06AM +0100, Peter Eisentraut wrote:
> committed

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: ICU 54 and earlier are too dangerous

2023-03-17 Thread Jeff Davis
On Tue, 2023-03-14 at 08:48 -0700, Jeff Davis wrote:
> Actually, now that I think about it, we could just search all known
> locales using either ucol_getAvailable() or uloc_getAvailable(), and
> see if there's a match. Not very clean, but it should catch most
> problems. I'll look into whether there's a reasonable way to match or
> not.

I posted a patch to do this as 0006 in the series here:

https://www.postgresql.org/message-id/9afa6dbe0d31053ad265aeba488fde784fd5b7ab.ca...@j-davis.com

Regards,
Jeff Davis





Re: Making background psql nicer to use in tap tests

2023-03-17 Thread Dagfinn Ilmari Mannsåker
Andrew Dunstan  writes:

> On 2023-03-17 Fr 10:08, Daniel Gustafsson wrote:
>>> Why is $restart_before_query a package/class level value instead of
>>> an instance value? And why can we only ever set it to 1 but not back
>>> again? Maybe we don't want to, but it looks odd.
>> It was mostly a POC to show what I meant with the functionality.  I think 
>> there
>> should be a way to turn it off (set it to zero) even though I doubt it will 
>> be
>> used much.
>
>
> A common idiom is to have a composite getter/setter method for object
> properties something like this
>
>
>sub settingname
>{
>   my ($self, $arg) = @_;
>   $self->{settingname} = $arg if defined $arg;
>   return $self->{settingname};
>}

Or, if undef is a valid value:


sub settingname
{
my $self = shift;
$self->[settingname} = shift if @_;
return $self->{settingname};
}

- ilmari




Re: ICU locale validation / canonicalization

2023-03-17 Thread Jeff Davis
On Wed, 2023-03-15 at 15:18 -0700, Jeff Davis wrote:
> I left out the validation patch for now, and I'm evaluating a
> different
> approach that will attempt to match to the locales retrieved with
> uloc_countAvailable()/uloc_getAvailable().

I like this approach, attached new patch series with that included as
0006.

The first 3 patches are essentially bugfixes -- should they be
backported?


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 39cb2f32e087ff95da42c67a7e9b147ceca994af Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 14 Mar 2023 09:58:29 -0700
Subject: [PATCH v6 1/6] Support language tags in older ICU versions (53 and
 earlier).

By calling uloc_canonicalize() before parsing the attributes, the
existing locale attribute parsing logic works on language tags as
well.

Fix a small memory leak, too.

Discussion: http://postgr.es/m/60da0cecfb512a78b8666b31631a636215d8ce73.ca...@j-davis.com
---
 src/backend/commands/collationcmds.c  |  8 +++---
 src/backend/utils/adt/pg_locale.c | 26 ---
 .../regress/expected/collate.icu.utf8.out |  8 ++
 src/test/regress/sql/collate.icu.utf8.sql |  4 +++
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8949684afe..b8f2e7059f 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -950,7 +950,6 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			const char *name;
 			char	   *langtag;
 			char	   *icucomment;
-			const char *iculocstr;
 			Oid			collid;
 
 			if (i == -1)
@@ -959,20 +958,19 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 name = uloc_getAvailable(i);
 
 			langtag = get_icu_language_tag(name);
-			iculocstr = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name;
 
 			/*
 			 * Be paranoid about not allowing any non-ASCII strings into
 			 * pg_collation
 			 */
-			if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocstr))
+			if (!pg_is_ascii(langtag) || !pg_is_ascii(langtag))
 continue;
 
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 	 nspid, GetUserId(),
 	 COLLPROVIDER_ICU, true, -1,
-	 NULL, NULL, iculocstr, NULL,
-	 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
+	 NULL, NULL, langtag, NULL,
+	 get_collation_actual_version(COLLPROVIDER_ICU, langtag),
 	 true, true);
 			if (OidIsValid(collid))
 			{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 1d3d4d86d3..b9c7fbd511 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2643,9 +2643,28 @@ pg_attribute_unused()
 static void
 icu_set_collation_attributes(UCollator *collator, const char *loc)
 {
-	char	   *str = asc_tolower(loc, strlen(loc));
+	UErrorCode	status;
+	int32_t		len;
+	char	   *icu_locale_id;
+	char	   *lower_str;
+	char	   *str;
 
-	str = strchr(str, '@');
+	/* first, make sure the string is an ICU format locale ID */
+	status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, NULL, 0, &status);
+	icu_locale_id = palloc(len + 1);
+	status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, icu_locale_id, len + 1, &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("canonicalization failed for locale string \"%s\": %s",
+						loc, u_errorName(status))));
+
+	lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id));
+
+	pfree(icu_locale_id);
+
+	str = strchr(lower_str, '@');
 	if (!str)
 		return;
 	str++;
@@ -2660,7 +2679,6 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 			char	   *value;
 			UColAttribute uattr;
 			UColAttributeValue uvalue;
-			UErrorCode	status;
 
 			status = U_ZERO_ERROR;
 
@@ -2727,6 +2745,8 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 						loc, u_errorName(status))));
 		}
 	}
+
+	pfree(lower_str);
 }
 
 #endif			/* USE_ICU */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 9a3e12e42d..6225b575ce 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1304,6 +1304,14 @@ SELECT 'abc' <= 'ABC' COLLATE case_insensitive, 'abc' >= 'ABC' COLLATE case_inse
  t| t
 (1 row)
 
+-- test language tags
+CREATE COLLATION lt_insensitive (provider = icu, locale = 'en-u-ks-level1', deterministic = false);
+SELECT 'aBcD' COLLATE lt_insensitive = 'AbCd' COLLATE lt_insensitive;
+ ?column? 
+--
+ t
+(1 row)
+
 CREATE TABLE test1cs (x text COLLATE case_sensitive);
 CREATE TABLE test2cs (x text COLLATE case_sensitive);
 CREATE TABLE test3cs (x text COLLATE case_sensitive);
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 0790068f31..64cbfd0a5b 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -518,6 +518,10 @@ CREATE 

Re: logical decoding and replication of sequences, take 2

2023-03-17 Thread Tomas Vondra



On 3/17/23 06:53, John Naylor wrote:
> On Wed, Mar 15, 2023 at 7:51 PM Tomas Vondra
> wrote:
>>
>>
>>
>> On 3/14/23 08:30, John Naylor wrote:
>> > I tried a couple toy examples with various combinations of use styles.
>> >
>> > Three with "automatic" reading from sequences:
>> >
>> > create table test(i serial);
>> > create table test(i int GENERATED BY DEFAULT AS IDENTITY);
>> > create table test(i int default nextval('s1'));
>> >
>> > ...where s1 has some non-default parameters:
>> >
>> > CREATE SEQUENCE s1 START 100 MAXVALUE 100 INCREMENT BY -1;
>> >
>> > ...and then two with explicit use of s1, one inserting the 'nextval'
>> > into a table with no default, and one with no table at all, just
>> > selecting from the sequence.
>> >
>> > The last two seem to work similarly to the first three, so it seems like
>> > FOR ALL TABLES adds all sequences as well. Is that expected?
>>
>> Yeah, that's a bug - we shouldn't replicate the sequence changes, unless
>> the sequence is actually added to the publication. I tracked this down
>> to a thinko in get_rel_sync_entry() which failed to check the object
>> type when puballtables or puballsequences was set.
>>
>> Attached is a patch fixing this.
> 
> Okay, I can verify that with 0001-0006, sequences don't replicate unless
> specified. I do see an additional change that doesn't make sense: On the
> subscriber I no longer see a jump to the logged 32 increment, I see the
> very next value:
> 
> # alter system set wal_level='logical';
> # port  is subscriber
> 
> echo
> echo "PUB:"
> psql -c "drop table if exists test;"
> psql -c "drop publication if exists pub1;"
> 
> echo
> echo "SUB:"
> psql -p  -c "drop table if exists test;"
> psql -p  -c "drop subscription if exists sub1 ;"
> 
> echo
> echo "PUB:"
> psql -c "create table test(i int GENERATED BY DEFAULT AS IDENTITY);"
> psql -c "CREATE PUBLICATION pub1 FOR ALL TABLES;"
> psql -c "CREATE PUBLICATION pub2 FOR ALL SEQUENCES;"
> 
> echo
> echo "SUB:"
> psql -p  -c "create table test(i int GENERATED BY DEFAULT AS IDENTITY);"
> psql -p  -c "CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
> dbname=postgres application_name=sub1 port=5432' PUBLICATION pub1;"
> psql -p  -c "CREATE SUBSCRIPTION sub2 CONNECTION 'host=localhost
> dbname=postgres application_name=sub2 port=5432' PUBLICATION pub2;"
> 
> echo
> echo "PUB:"
> psql -c "insert into test default values;"
> psql -c "insert into test default values;"
> psql -c "select * from test;"
> psql -c "select * from test_i_seq;"
> 
> sleep 1
> 
> echo
> echo "SUB:"
> psql -p  -c "select * from test;"
> psql -p  -c "select * from test_i_seq;"
> 
> psql -p  -c "drop subscription sub1 ;"
> psql -p  -c "drop subscription sub2 ;"
> 
> psql -p  -c "insert into test default values;"
> psql -p  -c "select * from test;"
> psql -p  -c "select * from test_i_seq;"
> 
> The last two queries on the subscriber show:
> 
>  i
> ---
>  1
>  2
>  3
> (3 rows)
> 
>  last_value | log_cnt | is_called
> +-+---
>           3 |      30 | t
> (1 row)
> 
> ...whereas before with 0001-0003 I saw:
> 
>  i  
> 
>   1
>   2
>  34
> (3 rows)
> 
>  last_value | log_cnt | is_called
> +-+---
>          34 |      32 | t
> 

Oh, this is a silly thinko in how sequences are synced at the beginning
(or maybe a combination of two issues).

fetch_sequence_data() simply runs a select from the sequence

SELECT last_value, log_cnt, is_called

but that's wrong, because that's the *current* state of the sequence, at
the moment it's initially synced. To make this "correct" with respect
to the decoding, we'd need to deduce what was the last WAL record, so
something like

last_value += log_cnt + 1

That should produce 34 again (with the values above, 3 + 30 + 1 = 34).

FWIW the older patch has this issue too, I believe the difference is
merely due to a slightly different timing between the sync and decoding
the first insert. If you insert a sleep after the CREATE SUBSCRIPTION
commands, it should disappear.


This however made me realize the initial sync of sequences may not be
correct. I mean, the idea of tablesync is syncing the data in REPEATABLE
READ transaction, and then applying decoded changes. But sequences are
not transactional in this way - if you select from a sequence, you'll
always see the latest data, even in REPEATABLE READ.

I wonder if this might result in losing some of the sequence increments,
and/or applying them in the wrong order (so that the sequence goes
backward for a while).


>> > The documentation for CREATE PUBLICATION mentions sequence options,
>> > but doesn't really say how these options should be used.
>> Good point. The idea is that we handle tables and sequences the same
>> way, i.e. if you specify 'sequence' then we'll replicate increments for
>> sequences explicitly added to the publication.
>>
>> If this is not clear, the docs may need some improvements.
> 
> 

Re: pg_dump versus hash partitioning

2023-03-17 Thread Tom Lane
Julien Rouhaud  writes:
> On Thu, Mar 16, 2023 at 08:43:56AM -0400, Tom Lane wrote:
>> I think the odds of that yielding a usable dump are nil, so I don't
>> see why we should bother.

> No objection from me.

OK, pushed with the discussed changes.

regards, tom lane




Re: Add LZ4 compression in pg_dump

2023-03-17 Thread Tomas Vondra
On 3/17/23 16:43, gkokola...@pm.me wrote:
>>
>> ...
>>
>> I agree it's cleaner the way you did it.
>>
>> I was thinking that with each compression function handling error
>> internally, the callers would not need to do that. But I hadn't
>> realized there's logic to detect ENOSPC and so on, and we'd need to
>> duplicate that in every compression func.
>>
> 
> If you agree, I can prepare a patch to improve on the error handling
> aspect of the API as a separate thread, since here we are trying to
> focus on correctness.
> 

Yes, that makes sense. There are far too many patches in this thread
already ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Memory leak from ExecutorState context?

2023-03-17 Thread Tomas Vondra


On 3/17/23 09:18, Jehan-Guillaume de Rorthais wrote:
> Hi there,
> 
> On Fri, 10 Mar 2023 19:51:14 +0100
> Jehan-Guillaume de Rorthais  wrote:
> 
>>> So I guess the best thing would be to go through these threads, see what
>>> the status is, restart the discussion and propose what to do. If you do
>>> that, I'm happy to rebase the patches, and maybe see if I could improve
>>> them in some way.  
>>
>> [...]
>>
>>> I was hoping we'd solve this by the BNL, but if we didn't get that in 4
>>> years, maybe we shouldn't stall and get at least an imperfect stop-gap
>>> solution ...  
>>
>> Indeed. So, to sum-up:
>>
>> * Patch 1 could be rebased/applied/backpatched
> 
> Would it help if I rebase Patch 1 ("move BufFile stuff into separate 
> context")?
> 

Yeah, I think this is something we'd want to do. It doesn't change the
behavior, but it makes it easier to track the memory consumption etc.

>> * Patch 2 is worth considering to backpatch
> 

I'm not quite sure what exactly the numbered patches are, as some of the
threads had a number of different patch ideas, and I'm not sure which
one was/is the most promising.

IIRC there were two directions:

a) "balancing" i.e. increasing work_mem to minimize the total memory
consumption (best effort, but allowing exceeding work_mem)

b) limiting the number of BufFiles, and combining data from "future"
batches into a single file

I think the spilling is "nicer" in that it actually enforces work_mem
more strictly than (a), but we should probably spend a bit more time on
the exact spilling strategy. I only really tried two trivial ideas, but
maybe we can be smarter about allocating / sizing the files? Maybe
instead of slices of constant size we could/should make them larger,
similarly to what log-structured storage does.

For example we might double the number of batches a file represents, so
the first file would be for batch 1, the next one for batches 2+3, then
4+5+6+7, then 8-15, then 16-31, ...
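
To make that mapping concrete: under that scheme, batch b would land in
file floor(log2(b)). A quick sketch of the mapping (the formula is just
implied by the example above):

    SELECT b AS batch, floor(log(2, b))::int AS file
    FROM generate_series(1, 31) AS b;
    -- batch 1 -> file 0, batches 2-3 -> file 1, 4-7 -> file 2,
    -- 8-15 -> file 3, 16-31 -> file 4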

We should have some model calculating (i) amount of disk space we would
need for the spilled files, and (ii) the amount of I/O we do when
writing the data between temp files.

> Same question.
> 
>> * Patch 3 seemed withdrawn in favor of BNLJ
>> * Patch 4 is waiting for some more review and has some TODO
>> * discussion 5 worth few minutes to discuss before jumping on previous topics
> 
> These other patches need more discussion and hacking. They have a low
> priority compared to other discussions and the running commitfest. However,
> how can we avoid losing them in limbo again?
> 

I think focusing on the backpatchability is not a particularly good
approach. It's probably better to first check if there's any hope of
restarting the work on BNLJ, which seems like a "proper" solution for
the future.

If getting that soon (in PG17) is unlikely, let's revive the rebalance
and/or spilling patches. Imperfect, but better than nothing.

And then in the end we can talk about if/what can be backpatched.


FWIW I don't think there's a lot of rush, considering this is clearly a
matter for PG17. So the summer CF at the earliest; people are going to
be busy until then.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Making background psql nicer to use in tap tests

2023-03-17 Thread Andrew Dunstan


On 2023-03-17 Fr 10:08, Daniel Gustafsson wrote:



A common perl idiom is to start private routine names with an underscore. so 
I'd rename wait_connect to _wait_connect;

There are quite a few routines documented as internal in Cluster.pm which don't
start with an underscore.  Should we change them as well?  I'm happy to prepare
a separate patch to address that if we want that.



Possibly. There are two concerns. First, make sure that they really are 
private. Last time I looked I think I noticed at least one thing that 
was alleged to be private but was called from a TAP script. Second, 
unless we backpatch it there will be some drift between branches, which 
can make backpatching things a bit harder. But by all means prep a patch 
so we can see the scope of the issue.






Why is $restart_before_query a package/class level value instead of an instance 
value? And why can we only ever set it to 1 but not back again? Maybe we don't 
want to, but it looks odd.

It was mostly a POC to show what I meant with the functionality.  I think there
should be a way to turn it off (set it to zero) even though I doubt it will be
used much.



A common idiom is to have a composite getter/setter method for object 
properties something like this



   sub settingname
   {
       my ($self, $arg) = @_;
       $self->{settingname} = $arg if defined $arg;
       return $self->{settingname};
   }





If we are going to keep this as a separate package, then we should put some 
code in the constructor to prevent it being called from elsewhere than the 
Cluster package. e.g.

 # this constructor should only be called from PostgreSQL::Test::Cluster
 my ($package, $file, $line) = caller;

 die "Forbidden caller of constructor: package: $package, file: $file:$line"
   unless $package eq 'PostgreSQL::Test::Cluster';

I don't have strong feelings about where to place this, but Cluster.pm is
already quite long so I see a small upside to keeping it separate to not make
that worse.



Yeah, I can go along with that.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Exclusion constraints on partitioned tables

2023-03-17 Thread Paul Jungwirth

On 1/24/23 06:38, Ronan Dunklau wrote:

I've taken a look at the patch, and I'm not sure why you keep the restriction
on the Gist operator being of the RTEqualStrategyNumber strategy. I don't
think  we have any other place where we expect those strategy numbers to
match. For hash it's different, as the hash-equality is the only operator
strategy and as such there is no other way to look at it. Can't we just
enforce partition_operator == exclusion_operator without adding the
RTEqualStrategyNumber for the opfamily into the mix ?


Thank you for taking a look! I did some research on the history of the 
code here, and I think I understand Tom's concern about making sure the 
index uses the same equality operator as the partition. I was confused 
about his remarks about the opfamily, but I agree with you that if the 
operator is the same, we should be okay.


I added the code about RTEqualStrategyNumber because that's what we need 
to find an equals operator when the index is GiST (except if it's using 
an opclass from btree_gist; then it needs to be BTEqual again). But then 
I realized that for exclusion constraints we have already figured out 
the operator (in RelationGetExclusionInfo) and put it in 
indexInfo->ii_ExclusionOps. So we can just compare against that. This 
works whether your index uses btree_gist or not.
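
As a concrete illustration of what this allows with the patch applied (a
hypothetical table, not taken from the patch's regression tests; the gist
opclass for the integer column comes from btree_gist):

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE reservation (
        room    int,
        booked  daterange,
        EXCLUDE USING gist (room WITH =, booked WITH &&)
    ) PARTITION BY RANGE (room);

The partition key column (room) appears in the constraint and is compared
with equality, while the non-key column (booked) may use && or any other
operator.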


Here is an updated patch with that change (also rebased).

I also included a more specific error message. If we find a matching 
column in the index but with the wrong operator, we should say so, and 
not say there is no matching column.


Thanks,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From 928a31433a4b8cad25f74017ca96b843aa30e02d Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Wed, 23 Nov 2022 14:55:43 -0800
Subject: [PATCH v3] Allow some exclusion constraints on partitions

Previously we only allowed UNIQUE B-tree constraints on partitions
(and only if the constraint included all the partition keys). But we
could allow exclusion constraints with the same restriction. We also
require that those columns be compared for equality, not something like
&&.
---
 doc/src/sgml/ddl.sgml  | 12 ++--
 src/backend/commands/indexcmds.c   | 66 ---
 src/backend/parser/parse_utilcmd.c |  6 --
 src/test/regress/expected/alter_table.out  | 31 +++--
 src/test/regress/expected/create_table.out | 16 +++--
 src/test/regress/expected/indexing.out | 73 ++
 src/test/regress/sql/alter_table.sql   | 29 +++--
 src/test/regress/sql/create_table.sql  | 13 +++-
 src/test/regress/sql/indexing.sql  | 57 +++--
 9 files changed, 236 insertions(+), 67 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 91c036d1cb..9d3c423ffd 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4233,11 +4233,13 @@ ALTER INDEX measurement_city_id_logdate_key
 
  
   
-   There is no way to create an exclusion constraint spanning the
-   whole partitioned table.  It is only possible to put such a
-   constraint on each leaf partition individually.  Again, this
-   limitation stems from not being able to enforce cross-partition
-   restrictions.
+   Similarly an exclusion constraint must include all the
+   partition key columns. Furthermore the constraint must compare those
+   columns for equality (not e.g. &&).
+   Again, this limitation stems from not being able to enforce
+   cross-partition restrictions. The constraint may include additional
+   columns that aren't part of the partition key, and it may compare
+   those with any operators you like.
   
  
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e..52d2395daa 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -709,11 +709,6 @@ DefineIndex(Oid relationId,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("cannot create index on partitioned table \"%s\" concurrently",
 			RelationGetRelationName(rel;
-		if (stmt->excludeOpNames)
-			ereport(ERROR,
-	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("cannot create exclusion constraints on partitioned table \"%s\"",
-			RelationGetRelationName(rel;
 	}
 
 	/*
@@ -918,15 +913,16 @@ DefineIndex(Oid relationId,
 		index_check_primary_key(rel, indexInfo, is_alter_table, stmt);
 
 	/*
-	 * If this table is partitioned and we're creating a unique index or a
-	 * primary key, make sure that the partition key is a subset of the
-	 * index's columns.  Otherwise it would be possible to violate uniqueness
-	 * by putting values that ought to be unique in different partitions.
+	 * If this table is partitioned and we're creating a unique index,
+	 * primary key, or exclusion constraint, make sure that the partition key
+	 * is a subset of the index's columns.  Otherwise it would be possible 

Re: Add LZ4 compression in pg_dump

2023-03-17 Thread gkokolatos





--- Original Message ---
On Thursday, March 16th, 2023 at 10:20 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 3/16/23 18:04, gkokola...@pm.me wrote:
> 
> > --- Original Message ---
> > On Tuesday, March 14th, 2023 at 4:32 PM, Tomas Vondra 
> > tomas.von...@enterprisedb.com wrote:
> > 
> > > On 3/14/23 16:18, gkokola...@pm.me wrote:
> > > 
> > > > ...> Would you mind me trying to come with a patch to address your 
> > > > points?
> > > 
> > > That'd be great, thanks. Please keep it split into smaller patches - two
> > > might work, with one patch for "cosmetic" changes and the other tweaking
> > > the API error-handling stuff.
> > 
> > Please find attached a set for it. I will admit that the splitting in the
> > series might not be ideal and what you requested. It is split on what
> > seemed as a logical units. Please advice how a better split can look like.
> > 
> > 0001 is unifying types and return values on the API
> > 0002 is addressing the constant definitions
> > 0003 is your previous 0004 adding comments
> 
> 
> Thanks. I think the split seems reasonable - the goal was to not mix
> different changes, and from that POV it works.
> 
> I'm not sure I understand the Gzip_read/Gzip_write changes in 0001. I
> mean, gzread/gzwrite returns int, so how does renaming the size_t
> variable solve the issue of negative values for errors? I mean, this
> 
> - size_t ret;
> + size_t gzret;
> 
> - ret = gzread(gzfp, ptr, size);
> + gzret = gzread(gzfp, ptr, size);
> 
> means we still lost the information gzread() returned a negative value,
> no? We'll still probably trigger an error, but it's a bit weird.

You are obviously correct. My bad, I misread the return type of gzread().

Please find an amended version attached.

> Unless I'm missing something, if gzread() ever returns -1 or some other
> negative error value, we'll cast it to size_t, the while condition will
> evaluate to "true" and we'll happily chew on some random chunk of data.
> 
> So the confusion is (at least partially) a preexisting issue ...
> 
> For gzwrite() it seems to be fine, because that only returns 0 on error.
> OTOH it's defined to take 'int size' but then we happily pass size_t
> values to it.
> 
> As I wrote earlier, this apparently assumes we never need to deal with
> buffers larger than int, and I don't think we have the ambition to relax
> that (I'm not sure it's even needed / possible).

Agreed.


> I see the read/write functions are now defined as int, but we only ever
> return 0/1 from them, and then interpret that as bool. Why not to define
> it like that? I don't think we need to adhere to the custom that
> everything returns "int". This is an internal API. Or if we want to
> stick to int, I'd define meaningful "nice" constants for 0/1.

The return types are now booleans and the callers have been made aware.


> 0002 seems fine to me. I see you've ditched the idea of having two
> separate buffers, and replaced them with DEFAULT_IO_BUFFER_SIZE. Fine
> with me, although I wonder if this might have negative impact on
> performance or something (but I doubt that).
> 

I doubt that too. Thank you.

> 0003 seems fine too.

Thank you.


> > As far as the error handling is concerned, you had said upthread:
> > 
> > > I think the right approach is to handle all library errors and not just
> > > let them through. So Gzip_write() needs to check the return value, and
> > > either call pg_fatal() or translate it to an error defined by the API.
> > 
> > While working on it, I thought it would be clearer and more consistent
> > for the pg_fatal() to be called by the caller of the individual functions.
> > Each individual function can keep track of the specifics of the error
> > internally. Then the caller upon detecting that there was an error by
> > checking the return value, can call pg_fatal() with a uniform error
> > message and then add the specifics by calling the get_error_func().
> 
> 
> I agree it's cleaner the way you did it.
> 
> I was thinking that with each compression function handling error
> internally, the callers would not need to do that. But I haven't
> realized there's logic to detect ENOSPC and so on, and we'd need to
> duplicate that in every compression func.
> 

If you agree, I can prepare a patch to improve on the error handling
aspect of the API as a separate thread, since here we are trying to
focus on correctness.

Cheers,
//Georgios

> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

From eeac82b647dc4021e1dcf22d8cc59840fbde8847 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Fri, 17 Mar 2023 15:29:05 +
Subject: [PATCH v3 3/3] Improve compress_lz4 documentation.

Author: Tomas Vondra
---
 src/bin/pg_dump/compress_lz4.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 2f3e552f51..fc2f4e116d 100644
--- 

Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-17 Thread Önder Kalacı
Hi Shi Yu,

Thanks for the review, really appreciate it!


> I couldn't apply
> v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
> cleanly in v13 and v14. It looks the patch needs some changes in these
> versions.
>
>

> ```
> Checking patch src/backend/executor/execReplication.c...
> Hunk #1 succeeded at 243 (offset -46 lines).
> Hunk #2 succeeded at 263 (offset -46 lines).
> Checking patch src/test/subscription/t/100_bugs.pl...
> error: while searching for:
> $node_publisher->stop('fast');
> $node_subscriber->stop('fast');
>
> done_testing();
>
> error: patch failed: src/test/subscription/t/100_bugs.pl:373
> Applied patch src/backend/executor/execReplication.c cleanly.
> Applying patch src/test/subscription/t/100_bugs.pl with 1 reject...
> Rejected hunk #1.
> ```
>
>
Hmm, interesting, it behaves differently on macOS and Linux. Now attaching
new patches that should apply. Can you please try?


Besides, I tried v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch in v12.
> The
> test failed and here's some information.
>
> ```
> Can't locate object method "new" via package "PostgreSQL::Test::Cluster"
> (perhaps you forgot to load "PostgreSQL::Test::Cluster"?) at t/100_bugs.pl
> line 74.
> # Looks like your test exited with 2 just after 1.
> ```
>
> +my $node_publisher_d_cols =
> PostgreSQL::Test::Cluster->new('node_publisher_d_cols');
>
> It seems this usage is not supported in v12 and we should use
> get_new_node()
> like other test cases.
>
>
Thanks for sharing. Fixed


This time I was able to run all the tests with all the patches applied.

Again, the generated column fix also has some minor differences
per version. So, overall we have 6 patches with very minor
differences :)


Thanks,
Onder


v3-0001-Ignore-dropped-columns-REL_11.patch
Description: Binary data


v3-0001-Ignore-dropped-columns-REL_12.patch
Description: Binary data


v3-0001-Ignore-dropped-columns-REL_13-REL_14.patch
Description: Binary data


v3-0001-Ignore-generated-columns-HEAD-REL_15.patch
Description: Binary data


v3-0001-Ignore-generated-columns-REL_14-REL_13.patch
Description: Binary data


v3-0001-Ignore-generated-columns-REL_12.patch
Description: Binary data


Re: Avoid use deprecated Windows Memory API

2023-03-17 Thread Ranier Vilela
On Fri, Mar 17, 2023 at 10:58 AM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi,
>
> > Again, MSDN claims to use a new API.
>
> TWIMC the patch rotted a bit and now needs to be updated [1].
> I changed its status to "Waiting on Author" [2].
>
Rebased to latest Head.

best regards,
Ranier Vilela


v2-use-heapalloc-instead-deprecated-localalloc.patch
Description: Binary data


Re: Commitfest 2023-03 starting tomorrow!

2023-03-17 Thread Greg Stark
On Fri, 17 Mar 2023 at 10:39, Tom Lane  wrote:
>
> You've listed a lot of small features here that still have time to
> get some love --- it's not like we're hard up against the end of the CF.
> If they'd been in Waiting on Author state for awhile, I'd agree with
> booting them, but not when they're in Needs Review.

Oh, that's exactly my intent -- when I listed them two days ago it was
a list of Waiting on Author patches without updates since March 1. But
I didn't recheck them this morning yet.

If they've gotten comments in the last two days or had their status
updated then great. It's also possible there are threads that aren't
attached to the commitfest or are attached to a related patch that I
may not be aware of.

-- 
greg




Re: Commitfest 2023-03 starting tomorrow!

2023-03-17 Thread Justin Pryzby
On Wed, Mar 15, 2023 at 02:29:26PM -0400, Gregory Stark (as CFM) wrote:
> 1) Move it yourself to the next CF (or withdraw it)
> 2) Post to the list with any pending questions asking for specific
> feedback -- it's much more likely to get feedback than just a generic
> "here's a patch plz review"...
> 3) Mark it Ready for Committer and possibly post explaining the
> resolution to any earlier questions to make it easier for a committer
> to understand the state
> 
> If there's still no emails on these at some point I suppose it will
> make sense to move them out of the CF.

>  * Avoid hiding shared filesets in pg_ls_tmpdir (pg_ls_* functions for
> showing metadata ...)

My patch.  I don't care if it's in v1[3456], but I wish it would somehow
progress - idk what else is needed.  I wrote this after Thomas Munro +1
my thinking that it's absurd for pg_ls_tmpdir() to not show [shared]
filesets in tempdirs...

>  * CREATE INDEX CONCURRENTLY on partitioned table

My patch.  I think there's agreement that this patch is ready, except
that it's waiting on the bugfix for the progress reporting patch.  IDK
if there's interest in this, but it'd be a good candidate for v16.

>  * basebackup: support zstd long distance matching

My patch.  No discussion, but I'm hopeful and don't see why this
shouldn't be in v16.

>  * warn if GUC set to an invalid shared library

My patch.  I'm waiting for feedback on 0001, which has gotten no
response.  I moved it.

>  * ALTER TABLE and CLUSTER fail to use a BulkInsertState for toast tables

My patch.  There's been no recent discussion, so I guess I'll postpone
it for v17.

>  * Check consistency of GUC defaults between .sample.conf and 
> pg_settings.boot_val

IDK what's needed to progress this.  If left as-is, *this* patch will
fail whenever someone else forgets to add a new GUC to the sample
config.  Which is odd.
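
For context, boot_val is the compiled-in default that the patch compares
against postgresql.conf.sample; it's visible with e.g.:

    SELECT name, boot_val FROM pg_settings ORDER BY name LIMIT 5;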

-- 
Justin




RE: Support logical replication of DDLs

2023-03-17 Thread Takamichi Osumi (Fujitsu)
Hi

On Tuesday, March 14, 2023 1:17 PM Ajin Cherian  wrote:
> I found out that the option ONLY was not parsed in the "CREATE INDEX"
> command, for eg: CREATE UNIQUE INDEX ... ON ONLY table_name ...
> 
> I've fixed this in patch 0002.
FYI, cfbot reports a failure of v80 on Linux [1]. Could you please check?


[17:39:49.745] == running regression test queries ==
[17:39:49.745] test test_ddl_deparse ... ok   27 ms
[17:39:49.745] test create_extension ... ok   60 ms
[17:39:49.745] test create_schema    ... ok   28 ms
[17:39:49.745] test aggregate        ... ok   19 ms
[17:39:49.745] test create_table     ... FAILED  245 ms
[17:39:49.745] test constraints      ... FAILED  420 ms
[17:39:49.745] test alter_table      ... FAILED  448 ms
[17:39:49.745] == shutting down postmaster ==
[17:39:49.745] 
[17:39:49.745] ==
[17:39:49.745]  3 of 7 tests failed. 
[17:39:49.745] ==
[17:39:49.745] 
[17:39:49.745] The differences that caused some tests to fail can be viewed in the
[17:39:49.745] file "/tmp/cirrus-ci-build/src/test/modules/test_ddl_deparse_regress/regression.diffs".
[17:39:49.745] A copy of the test summary that you see above is saved in the file
[17:39:49.745] "/tmp/cirrus-ci-build/src/test/modules/test_ddl_deparse_regress/regression.out".


[1] - https://cirrus-ci.com/task/5420096810647552


Best Regards,
Takamichi Osumi





Re: Commitfest 2023-03 starting tomorrow!

2023-03-17 Thread Tom Lane
Greg Stark  writes:
>> These patches that are "Needs Review" and have received no comments at
>> all since before March 1st are these.

Just a couple of comments on ones that caught my eye:

>> * Simplify find_my_exec by using realpath(3)

The problem with this one is that Peter would like it to do something
other than what I think it should do.  Not sure how to resolve that.

>> * Fix assertion failure with next_phase_at in snapbuild.c

This one, and others that are bug fixes, probably deserve more slack.

>> * Periodic burst growth of the checkpoint_req counter on replica.

There is recent discussion of this one no?

>> * Fix ParamPathInfo for union-all AppendPath

I pushed this yesterday.

>> * Add OR REPLACE option for CREATE OPERATOR

I think this one should be flat-out rejected.

>> * Partial aggregates push down

You've listed a lot of small features here that still have time to
get some love --- it's not like we're hard up against the end of the CF.
If they'd been in Waiting on Author state for awhile, I'd agree with
booting them, but not when they're in Needs Review.

>> * Set arbitrary GUC options during initdb

I do indeed intend to push this one on my own authority at some point,
but I'm happy to leave it there for now in case anyone wants to take
another look.

>> * Check lateral references within PHVs for memoize cache keys

I think this one is a bug fix too.

>> * Data is copied twice when specifying both child and parent table in
>> publication

Isn't there active discussion of this one?

>> * Improve pg_bsd_indent's handling of multiline initialization expressions

This is going to get pushed, it's just waiting until the commitfest
settles.  I guess you can move it to the next one if you want, but
that won't accomplish much.

regards, tom lane




Re: Commitfest 2023-03 starting tomorrow!

2023-03-17 Thread Aleksander Alekseev
Hi Greg,

> > These patches that are "Needs Review" and have received no comments at
> > all since before March 1st are these. If your patch is amongst this
> > list I would suggest any of:
> >
> > 1) Move it yourself to the next CF (or withdraw it)
> > 2) Post to the list with any pending questions asking for specific
> > feedback -- it's much more likely to get feedback than just a generic
> > "here's a patch plz review"...
> > 3) Mark it Ready for Committer and possibly post explaining the
> > resolution to any earlier questions to make it easier for a committer
> > to understand the state

Sorry for the late reply. It was a busy week. I see several patches I
authored and/or reviewed in the list. I would like to comment on
those.

* Avoid use deprecated Windows Memory API

We can reject or mark as RwF this one due to controversy and the fact
that the patch doesn't currently apply. I poked the author today.

* Clarify the behavior of the system when approaching XID wraparound

This is a wanted [1][see the discussion] and a pretty straightforward
change. I think it should be targeting PG16.

* Compression dictionaries

This one doesn't target PG16. Moved to the next CF.

* Add pg_stat_session

This patch was in good shape last time I checked but other people had
certain questions. The author hasn't replied since Feb 16th. So it's
unlikely to end up in PG16 and I suggest moving it to the next CF,
unless anyone objects.

* ResourceOwner refactoring

IMO this one still has a chance to make it to PG16. Let's keep it in
the CF for now.

Additionally:

* Add 64-bit XIDs into PostgreSQL 16

Is not going to make it to PG16, moving to the next CF.

* Pluggable toaster

The discussion is happening in the "Compression dictionaries" thread
now, since we decided to join our efforts in this area, see the latest
messages. I suggest marking this thread as RwF, unless anyone objects.

[1]: 
https://www.postgresql.org/message-id/CAH2-Wz%3D3mmHST-t9aR5LNkivXC%2B18JD_XC0ht4y5LQBLzq%2Bpsg%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: gcc 13 warnings

2023-03-17 Thread Tom Lane
Peter Eisentraut  writes:
> On 16.03.23 19:11, Andres Freund wrote:
>> So I just elected to leave it at the default for meson.

> AFAICT, the default for meson is buildtype=debug, which is -O0.  The -O3 
> comes from meson.build setting buildtype=release.

> I think a good compromise would be buildtype=debugoptimized, which is 
> -O2 with debug symbols, which also sort of matches the default in the 
> autoconf world.

That sounds promising.

> At least during the transition phase I would prefer having the same 
> default optimization level in both build systems, mainly because of how 
> this affects warnings.

I'd prefer sticking to -O2 mainly because of the risk of new bugs.
The meson conversion is a big enough job without adding "harden
Postgres against -O3" to the list of tasks that must be accomplished.
We can take that on in due time, but let's keep it separate.

regards, tom lane




Re: Making background psql nicer to use in tap tests

2023-03-17 Thread Daniel Gustafsson
> On 17 Mar 2023, at 14:48, Andrew Dunstan  wrote:
> On 2023-03-17 Fr 05:48, Daniel Gustafsson wrote:
>>> On 15 Mar 2023, at 02:03, Andres Freund 
>>>  wrote:
>>> 
>>>> Returning a hash seems like a worse option since it will complicate
>>>> callsites which only want to know success/failure.
>>>>
>>> Yea. Perhaps it's worth having a separate function for this? ->query_rc() 
>>> or such?
>>> 
>> If we are returning a hash then I agree it should be a separate function.
>> Maybe Andrew has input on which is the most Perl way of doing this.
> 
> I think the perlish way is use the `wantarray` function. Perl knows if you're 
> expecting a scalar return value or a list (which includes a hash).
> 
>return wantarray ? $retval : (list or hash);

Aha, TIL. That seems like precisely what we want. 

> A common perl idiom is to start private routine names with an underscore. so 
> I'd rename wait_connect to _wait_connect;

There are quite a few routines documented as internal in Cluster.pm which don't
start with an underscore.  Should we change them as well?  I'm happy to prepare
a separate patch to address that if we want that.

> Why is $restart_before_query a package/class level value instead of an 
> instance value? And why can we only ever set it to 1 but not back again? 
> Maybe we don't want to, but it looks odd.

It was mostly a POC to show what I meant with the functionality.  I think there
should be a way to turn it off (set it to zero) even though I doubt it will be
used much.

> If we are going to keep this as a separate package, then we should put some 
> code in the constructor to prevent it being called from elsewhere than the 
> Cluster package. e.g.
> 
> # this constructor should only be called from PostgreSQL::Test::Cluster
> my ($package, $file, $line) = caller;
> 
> die "Forbidden caller of constructor: package: $package, file: 
> $file:$line"
>   unless $package eq 'PostgreSQL::Test::Cluster';

I don't have strong feelings about where to place this, but Cluster.pm is
already quite long so I see a small upside to keeping it separate to not make
that worse.

--
Daniel Gustafsson





Re: Avoid use deprecated Windows Memory API

2023-03-17 Thread Aleksander Alekseev
Hi,

> Again, MSDN claims to use a new API.

TWIMC the patch rotted a bit and now needs to be updated [1].
I changed its status to "Waiting on Author" [2].

[1]: http://cfbot.cputube.org/
[2]: https://commitfest.postgresql.org/42/3893/

-- 
Best regards,
Aleksander Alekseev




Re: Making background psql nicer to use in tap tests

2023-03-17 Thread Andrew Dunstan


On 2023-03-17 Fr 05:48, Daniel Gustafsson wrote:

On 15 Mar 2023, at 02:03, Andres Freund  wrote:

Returning a hash seems like a worse option since it will complicate callsites
which only want to know success/failure.

Yea. Perhaps it's worth having a separate function for this? ->query_rc() or 
such?

If we are returning a hash then I agree it should be a separate function.
Maybe Andrew has input on which is the most Perl way of doing this.



I think the perlish way is use the `wantarray` function. Perl knows if 
you're expecting a scalar return value or a list (which includes a hash).



   return wantarray ? $retval : (list or hash);


A few more issues:

A common perl idiom is to start private routine names with an 
underscore. so I'd rename wait_connect to _wait_connect;


Why is $restart_before_query a package/class level value instead of an 
instance value? And why can we only ever set it to 1 but not back again? 
Maybe we don't want to, but it looks odd.


If we are going to keep this as a separate package, then we should put 
some code in the constructor to prevent it being called from elsewhere 
than the Cluster package. e.g.


    # this constructor should only be called from PostgreSQL::Test::Cluster
    my ($package, $file, $line) = caller;

    die "Forbidden caller of constructor: package: $package, file: 
$file:$line"

      unless $package eq 'PostgreSQL::Test::Cluster';

This should refer to the full class name:

+=item $node->background_psql($dbname, %params) => BackgroundPsql instance


Still reviewing ...


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Commitfest 2023-03 starting tomorrow!

2023-03-17 Thread Greg Stark
On Wed, 15 Mar 2023 at 14:29, Gregory Stark (as CFM)
 wrote:
>
> These patches that are "Needs Review" and have received no comments at
> all since before March 1st are these. If your patch is amongst this
> list I would suggest any of:
>
> 1) Move it yourself to the next CF (or withdraw it)
> 2) Post to the list with any pending questions asking for specific
> feedback -- it's much more likely to get feedback than just a generic
> "here's a patch plz review"...
> 3) Mark it Ready for Committer and possibly post explaining the
> resolution to any earlier questions to make it easier for a committer
> to understand the state
>
> If there's still no emails on these at some point I suppose it will
> make sense to move them out of the CF.

I'm going to go ahead and do this today. Any of these patches that are
"Waiting on Author" and haven't received any emails or status changes
since March 1 I'm going to move out of the commitfest(*). If you
really think your patch in this list is important to get committed
then please respond to the thread explaining any plans or feedback
needed.

It would be nice to actually do Returned With Feedback where
appropriate but there are too many to go through them thoroughly. I'll
only be able to do a quick review of each thread checking for
important bug fixes or obviously rejected patches.

(*) I reserve the right to skip and leave some patches where
appropriate. In particular I'll skip patches that are actually from
committers on the theory that they could just commit them when they
feel like it anyways. Some patches may be intentionally waiting until
the end of the release cycle to avoid conflicts too.

>  * ALTER TABLE SET ACCESS METHOD on partitioned tables
>  * New hooks in the connection path
>  * Add log messages when replication slots become active and inactive
>  * Avoid use deprecated Windows Memory API
>  * Remove dead macro exec_subplan_get_plan
>  * Consider parallel for LATERAL subqueries having LIMIT/OFFSET
>  * pg_rewind WAL deletion pitfall
>  * Simplify find_my_exec by using realpath(3)
>  * Move backup-related code to xlogbackup.c/.h
>  * Avoid hiding shared filesets in pg_ls_tmpdir (pg_ls_* functions for
> showing metadata ...)
>  * Fix bogus error emitted by pg_recvlogical when interrupted
>  * warn if GUC set to an invalid shared library
>  * Check consistency of GUC defaults between .sample.conf and
> pg_settings.boot_val
>  * Code checks for App Devs, using new options for transaction behavior
>  * Lockless queue of waiters based on atomic operations for LWLock
>  * Fix assertion failure with next_phase_at in snapbuild.c
>  * Add SPLIT PARTITION/MERGE PARTITIONS commands
>  * Add sortsupport for range types and btree_gist
>  * asynchronous execution support for Custom Scan
>  * Periodic burst growth of the checkpoint_req counter on replica.
>  * CREATE INDEX CONCURRENTLY on partitioned table
>  * Fix ParamPathInfo for union-all AppendPath
>  * Add OR REPLACE option for CREATE OPERATOR
>  * ALTER TABLE and CLUSTER fail to use a BulkInsertState for toast tables
>  * Partial aggregates push down
>  * Non-replayable WAL records through overflows and >MaxAllocSize lengths
>  * Enable jitlink as an alternative jit linker of legacy Rtdyld and
> add riscv jitting support
>  * Test for function error in logrep worker
>  * basebackup: support zstd long distance matching
>  * pgbench - adding pl/pgsql versions of tests
>  * Function to log backtrace of postgres processes
>  * More scalable multixacts buffers and locking
>  * Remove nonmeaningful prefixes in PgStat_* fields
>  * COPY FROM enable FORCE_NULL/FORCE_NOT_NULL on all columns
>  * postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
>  * Add semi-join pushdown to postgres_fdw
>  * Skip replicating the tables specified in except table option
>  * Split index and table statistics into different types of stats
>  * Exclusion constraints on partitioned tables
>  * Post-special Page Storage TDE support
>  * Direct I/O (developer-only feature)
>  * Improve doc for autovacuum on partitioned tables
>  * Patch to implement missing join selectivity estimation for range types
>  * Clarify the behavior of the system when approaching XID wraparound
>  * Set arbitrary GUC options during initdb
>  * An attempt to avoid
> locally-committed-but-not-replicated-to-standby-transactions in
> synchronous replication
>  * Check lateral references within PHVs for memoize cache keys
>  * Add n_tup_newpage_upd to pg_stat table views
>  * monitoring usage count distribution
>  * Reduce wakeup on idle for bgwriter & walwriter for >5s
>  * Report the query string that caused a memory error under Valgrind
>  * New [relation] options engine
>  * Data is copied twice when specifying both child and parent table in
> publication
>  * possibility to take name, signature and oid of currently executed
> function in GET DIAGNOSTICS statement
>  * Named Operators
>  * nbtree performance improvements through specialization 

Re: HOT chain validation in verify_heapam()

2023-03-17 Thread Aleksander Alekseev
Hi,

> On Wed, Mar 8, 2023 at 7:30 PM Himanshu Upadhyaya 
>  wrote:
> Please find the v11 patch with all these changes.

The patch needed a rebase due to a4f23f9b. PFA v12.

-- 
Best regards,
Aleksander Alekseev


v12-0001-Implement-HOT-chain-validation-in-verify_heapam.patch
Description: Binary data


Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-17 Thread Jim Jones

After some more testing I realized that v5 was leaking the xmlDocPtr.

Now fixed in v6.
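
For anyone wanting to try it, the syntax added here looks like this (a
sketch based on the grammar changes in the attached patch, not taken from
its regression tests):

    SELECT xmlserialize(DOCUMENT '<root><!-- a comment --><a/></root>'::xml
                        AS text CANONICAL WITH NO COMMENTS);
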
From d04d8fdcbedbd5ed88469bd22e079467c26ab7a4 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Fri, 17 Mar 2023 10:23:48 +0100
Subject: [PATCH v6] Add CANONICAL output format to xmlserialize

This patch introduces the CANONICAL option to xmlserialize, which
serializes xml documents in their canonical form - as described in
the W3C Canonical XML Version 1.1 specification. This option can
be used with the additional parameter WITH [NO] COMMENTS to keep
or remove xml comments from the canonical xml output. This feature
is based on the function xmlC14NDocDumpMemory from the C14N module
of libxml2.

This patch also includes regression tests and documentation.
---
 doc/src/sgml/datatype.sgml|  41 +++-
 src/backend/executor/execExprInterp.c |   2 +-
 src/backend/parser/gram.y |  21 +-
 src/backend/parser/parse_expr.c   |   2 +-
 src/backend/utils/adt/xml.c   | 275 --
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |  12 +-
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   2 +-
 src/test/regress/expected/xml.out | 112 +++
 src/test/regress/expected/xml_1.out   | 108 ++
 src/test/regress/expected/xml_2.out   | 112 +++
 src/test/regress/sql/xml.sql  |  63 ++
 13 files changed, 639 insertions(+), 113 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 4df8bd1b64..90415265d4 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,7 +4460,7 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [ NO ] INDENT ] )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ { [ NO ] INDENT ] | CANONICAL [ WITH [NO] COMMENTS ]})
 
 type can be
 character, character varying, or
@@ -4477,6 +4477,45 @@ XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS 
 
+   
+The option CANONICAL converts a given
+XML document to its <ulink url="https://www.w3.org/TR/xml-c14n11/#Terminology">canonical form</ulink>
+based on the <ulink url="https://www.w3.org/TR/xml-c14n11/">W3C Canonical XML 1.1 Specification</ulink>.
+It is basically designed to provide applications the ability to compare xml documents or test if they
+have been changed. The optional parameter WITH [NO] COMMENTS removes or keeps XML comments
+from the given document.
+
+
+ 
+ Example:
+
+
+   

 When a character string value is cast to or from type
 xml without going through XMLPARSE or
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 9cb9625ce9..d3dfc9ecfc 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3840,7 +3840,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 *op->resvalue =
 	PointerGetDatum(xmltotext_with_options(DatumGetXmlP(value),
 		   xexpr->xmloption,
-		   xexpr->indent));
+		   xexpr->format));
 *op->resnull = false;
 			}
 			break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index efe88ccf9d..fd9315feb8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -613,12 +613,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	xml_root_version opt_xml_root_standalone
 %type 	xmlexists_argument
 %type 	document_or_content
-%type 	xml_indent_option xml_whitespace_option
+%type 	xml_whitespace_option
 %type 	xmltable_column_list xmltable_column_option_list
 %type 	xmltable_column_el
 %type 	xmltable_column_option_el
 %type 	xml_namespace_list
 %type 	xml_namespace_el
+%type  	opt_xml_serialize_format
 
 %type 	func_application func_expr_common_subexpr
 %type 	func_expr func_expr_windowless
@@ -676,7 +677,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
 	BOOLEAN_P BOTH BREADTH BY
 
-	CACHE CALL CALLED CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
+	CACHE CALL CALLED CANONICAL CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
 	CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
 	CLUSTER COALESCE COLLATE COLLATION COLUMN COLUMNS COMMENT COMMENTS COMMIT
 	COMMITTED COMPRESSION CONCURRENTLY CONFIGURATION CONFLICT
@@ -15532,14 +15533,14 @@ func_expr_common_subexpr:
 	$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
 	 list_make3($3, $5, $6), @1);
 }
-			| XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename xml_indent_option ')'
+			| XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename opt_xml_serialize_format ')'
 {
 	XmlSerialize *n = makeNode(XmlSerialize);
 
 	n->xmloption = $3;
 	n->expr = $4;
 	n->typeName = $6;
-	n->indent = $7;
+	n->format = $7;
 	n->location = @1;
 			

Re: Allow logical replication to copy tables in binary format

2023-03-17 Thread Melih Mutlu
Hi,

Sharing v17.

On Fri, 17 Mar 2023 at 03:02, Amit Kapila  wrote:

> I think to reduce the risk of breakage, let's change the check to
> >=v16. Also, accordingly, update the doc and commit message.
>

Done.

On Fri, 17 Mar 2023 at 04:58, Peter Smith  wrote:

> IMO the sentence "However, logical replication in binary format is
> more restrictive." should just be plain text.
>

Done.

On Fri, 17 Mar 2023 at 05:26, shiy.f...@fujitsu.com  wrote:

> It looks that you forgot to pass `offset` into wait_for_log().


Yes, I somehow didn't include those lines into the patch. Thanks for
noticing. Fixed them now.


Thanks,
-- 
Melih Mutlu
Microsoft


v17-0001-Allow-logical-replication-to-copy-table-in-binary.patch
Description: Binary data


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-17 Thread torikoshia

On 2023-03-07 18:09, Daniel Gustafsson wrote:

On 7 Mar 2023, at 09:35, Damir Belyalov  wrote:


I felt just logging "Error: %ld" would make people wonder the meaning of
the %ld. Logging something like "Error: %ld data type errors were
found" might be clearer.

Thanks. For more clarity, I changed the message to: "Errors were found: %".

I'm not convinced that this adds enough clarity to assist the user.  We also
shouldn't use "error" in a WARNING log since the user has explicitly asked to
skip rows on error, so it's not an error per se.
skip rows on error, so it's not an error per se.

+1


How about something like:

  ereport(WARNING,
          (errmsg("%ld rows were skipped due to data type incompatibility",
                  cstate->ignored_errors),
           errhint("Skipped rows can be inspected in the database log for reprocessing.")));

Since skipped rows cannot be inspected in the log when log_error_verbosity
is set to terse, it might be better without this errhint.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Data is copied twice when specifying both child and parent table in publication

2023-03-17 Thread Amit Kapila
On Fri, Mar 17, 2023 at 11:58 AM wangw.f...@fujitsu.com
 wrote:
>
> On Thu, Mar 16, 2023 at 20:25 PM Amit Kapila  wrote:
> >
>
> Thanks for your comments.
>
> > + if (server_version >= 16)
> > + {
> > + appendStringInfo(, "SELECT DISTINCT N.nspname, C.relname,\n"
> > + "  ( SELECT array_agg(a.attname ORDER BY a.attnum)\n"
> > + "FROM pg_attribute a\n"
> > + "WHERE a.attrelid = GPT.relid AND a.attnum > 0 AND\n"
> > + "  NOT a.attisdropped AND\n"
> > + "  (a.attnum = ANY(GPT.attrs) OR GPT.attrs IS NULL)\n"
> > + "  ) AS attnames\n"
> > + " FROM pg_class C\n"
> > + "   JOIN pg_namespace N ON N.oid = C.relnamespace\n"
> > + "   JOIN ( SELECT (pg_get_publication_tables(VARIADIC
> > array_agg(pubname::text))).*\n"
> > + "  FROM pg_publication\n"
> > + "  WHERE pubname IN ( %s )) as GPT\n"
> > + "   ON GPT.relid = C.oid\n",
> > + pub_names.data);
> >
> > The function pg_get_publication_tables()  has already handled dropped
> > columns, so we don't need it here in this query. Also, the part to
> > build attnames should be the same as it is in view
> > pg_publication_tables.
>
> Agree. Changed.
>
> > Can we directly try to pass the list of
> > pubnames to the function pg_get_publication_tables() instead of
> > joining it with pg_publication?
>
> Changed.
> I think the aim of joining it with pg_publication before is to exclude
> non-existing publications.
>

Okay, a comment for that would have made it clear.
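
For reference, the direct call then looks something like this (mirroring
the VARIADIC usage in the query above; 'pub1' and 'pub2' are just example
publication names):

    SELECT (pg_get_publication_tables(VARIADIC ARRAY['pub1', 'pub2'])).*;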

> Otherwise, we would get an error because of the call
> to function GetPublicationByName (with 'missing_ok = false') in function
> pg_get_publication_tables. So, I changed "missing_ok" to true. If anyone 
> doesn't
> like this change, I'll reconsider this in the next version.
>

I am not sure about changing missing_ok behavior. Did you check it for
any other similar usage in other functions?

+ foreach(lc, pub_elem_tables)
+ {
+ published_rel *table_info = (published_rel *) malloc(sizeof(published_rel));

Is there a reason to use malloc instead of palloc?

-- 
With Regards,
Amit Kapila.




Re: Making background psql nicer to use in tap tests

2023-03-17 Thread Daniel Gustafsson
> On 15 Mar 2023, at 02:03, Andres Freund  wrote:

>> Returning a hash seems like a worse option since it will complicate callsites
>> which only want to know success/failure.
> 
> Yea. Perhaps it's worth having a separate function for this? ->query_rc() or 
> such?

If we are returning a hash then I agree it should be a separate function.
Maybe Andrew has input on which is the most Perl way of doing this.

>>> - right now there's a bit too much logic in background_psql() /
>>> interactive_psql() for my taste
>> 
>> Not sure what you mean, I don't think they're especially heavy on logic?
> 
> -EMISSINGWORD on my part. A bit too much duplicated logic.

That makes more sense, and I can kind of agree.  I don't think it's too bad but
I agree there is room for improvement.

>> +A default timeout of $PostgreSQL::Test::Utils::timeout_default is set up,
>> +which can modified later.
>> 
>> This require a bit of knowledge about the internals which I think we should
>> hide in this new API.  How about providing a function for defining the 
>> timeout?
> 
> "definining" in the sense of accessing it? Or passing one in?

I meant passing one in.

>> Re timeouts: one thing I've done repeatedly is to use short timeouts and 
>> reset
>> them per query, and that turns pretty ugly fast.  I hacked up your patch to
>> provide $h->reset_timer_before_query() which then injects a {timeout}->start
>> before running each query without the caller having to do it.  Not sure if 
>> I'm
>> alone in doing that but if not I think it makes sense to add.
> 
> I don't quite understand the use case, but I don't mind it as a functionality.

I've used it a lot when I want to run n command which each should finish
quickly or not at all.  So one time budget per command rather than having a
longer timeout for a set of commands that comprise a test.  It can be done
already today by calling ->start but it doesn't exactly make the code cleaner.

As mentioned off-list I did some small POD additions when reviewing, so I've
attached them here in a v2 in the hopes that it might be helpful.  I've also
included the above POC for restarting the timeout per query to show what I
meant.

--
Daniel Gustafsson



v2-0001-Refactor-background-psql-TAP-functions.patch
Description: Binary data


Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-17 Thread Jim Jones
v5 attached is a rebase over the latest changes in xmlserialize (INDENT 
output).From 24f045ccf7ac000a509910cb32c54ce4c91e2c33 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Fri, 17 Mar 2023 10:23:48 +0100
Subject: [PATCH v5] Add CANONICAL output format to xmlserialize

This patch introduces the CANONICAL option to xmlserialize, which
serializes xml documents in their canonical form - as described in
the W3C Canonical XML Version 1.1 specification. This option can
be used with the additional parameter WITH [NO] COMMENTS to keep
or remove xml comments from the canonical xml output. This feature
is based on the function xmlC14NDocDumpMemory from the C14N module
of libxml2.

This patch also includes regression tests and documentation.
---
 doc/src/sgml/datatype.sgml|  41 +++-
 src/backend/executor/execExprInterp.c |   2 +-
 src/backend/parser/gram.y |  21 ++-
 src/backend/parser/parse_expr.c   |   2 +-
 src/backend/utils/adt/xml.c   | 258 --
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |  12 +-
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   2 +-
 src/test/regress/expected/xml.out | 112 +++
 src/test/regress/expected/xml_1.out   | 108 +++
 src/test/regress/expected/xml_2.out   | 112 +++
 src/test/regress/sql/xml.sql  |  63 +++
 13 files changed, 626 insertions(+), 109 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 4df8bd1b64..90415265d4 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,7 +4460,7 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [ NO ] INDENT ] )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ { [ NO ] INDENT ] | CANONICAL [ WITH [NO] COMMENTS ]})
 
 type can be
 character, character varying, or
@@ -4477,6 +4477,45 @@ XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS 
 
+   
+The option CANONICAL converts a given
+XML document to its <ulink url="https://www.w3.org/TR/xml-c14n11/#Terminology">canonical form</ulink>
+based on the <ulink url="https://www.w3.org/TR/xml-c14n11/">W3C Canonical XML 1.1 Specification</ulink>.
+It is basically designed to provide applications the ability to compare xml documents or test if they
+have been changed. The optional parameter WITH [NO] COMMENTS removes or keeps XML comments
+from the given document.
+
+
+ 
+ Example:
+
+
+   

 When a character string value is cast to or from type
 xml without going through XMLPARSE or
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 9cb9625ce9..d3dfc9ecfc 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3840,7 +3840,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 *op->resvalue =
 	PointerGetDatum(xmltotext_with_options(DatumGetXmlP(value),
 		   xexpr->xmloption,
-		   xexpr->indent));
+		   xexpr->format));
 *op->resnull = false;
 			}
 			break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index efe88ccf9d..fd9315feb8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -613,12 +613,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	xml_root_version opt_xml_root_standalone
 %type 	xmlexists_argument
 %type 	document_or_content
-%type 	xml_indent_option xml_whitespace_option
+%type 	xml_whitespace_option
 %type 	xmltable_column_list xmltable_column_option_list
 %type 	xmltable_column_el
 %type 	xmltable_column_option_el
 %type 	xml_namespace_list
 %type 	xml_namespace_el
+%type  	opt_xml_serialize_format
 
 %type 	func_application func_expr_common_subexpr
 %type 	func_expr func_expr_windowless
@@ -676,7 +677,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
 	BOOLEAN_P BOTH BREADTH BY
 
-	CACHE CALL CALLED CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
+	CACHE CALL CALLED CANONICAL CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
 	CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
 	CLUSTER COALESCE COLLATE COLLATION COLUMN COLUMNS COMMENT COMMENTS COMMIT
 	COMMITTED COMPRESSION CONCURRENTLY CONFIGURATION CONFLICT
@@ -15532,14 +15533,14 @@ func_expr_common_subexpr:
 	$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
 	 list_make3($3, $5, $6), @1);
 }
-			| XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename xml_indent_option ')'
+			| XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename opt_xml_serialize_format ')'
 {
 	XmlSerialize *n = makeNode(XmlSerialize);
 
 	n->xmloption = $3;
 	n->expr = $4;
 	n->typeName = $6;
-	n->indent = $7;
+	n->format = $7;
 	n->location = @1;
 	$$ 

Re: improving user.c error messages

2023-03-17 Thread Peter Eisentraut

On 17.03.23 00:47, Nathan Bossart wrote:

Here is a rebased patch in which I've addressed the latest feedback except
for the DropRole() part that is under discussion.


committed





Re: postgres_fdw: Useless if-test in GetConnection()

2023-03-17 Thread Etsuro Fujita
On Wed, Mar 15, 2023 at 7:18 PM Etsuro Fujita  wrote:
> This would be harmless, so I am planning to apply the patch to HEAD only.

I forgot to mention that this was added in v14.  Done that way.

Best regards,
Etsuro Fujita




Re: suppressing useless wakeups in logical/worker.c

2023-03-17 Thread Amit Kapila
On Fri, Mar 17, 2023 at 5:52 AM Nathan Bossart  wrote:
>
> I've attached a minimally-updated patch that doesn't yet address the bigger
> topics under discussion.
>
> On Thu, Mar 16, 2023 at 03:30:37PM +0530, Amit Kapila wrote:
> > On Wed, Feb 1, 2023 at 5:35 AM Nathan Bossart  
> > wrote:
> >> On Sat, Jan 28, 2023 at 10:26:25AM +0530, Amit Kapila wrote:
> >> > BTW, do we need to do something about wakeups in
> >> > wait_for_relation_state_change()?
> >>
> >> ... and wait_for_worker_state_change(), and copy_read_data().  From a quick
> >> glance, it looks like fixing these would be a more invasive change.
> >
> > What kind of logic do you have in mind to avoid waking up once per
> > second in those cases?
>
> I haven't looked into this too much yet.  I'd probably try out Tom's
> suggestions from upthread [0] next and see if those can be applied here,
> too.
>

For the clean exit of tablesync worker, we already wake up the apply
worker in finish_sync_worker(). You probably want to deal with error
cases or is there something else on your mind? BTW, for
wait_for_worker_state_change(), one possibility is to wake all the
sync workers when apply worker exits but not sure if that is a very
good idea.

Few minor comments:
=
1.
- if (wal_receiver_timeout > 0)
+ now = GetCurrentTimestamp();
+ if (now >= wakeup[LRW_WAKEUP_TERMINATE])
+ ereport(ERROR,
+ (errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("terminating logical replication worker due to timeout")));
+
+ /* Check to see if it's time for a ping. */
+ if (now >= wakeup[LRW_WAKEUP_PING])
  {
- TimestampTz now = GetCurrentTimestamp();

Previously, we used to call GetCurrentTimestamp() only when
wal_receiver_timeout > 0 but we ignore that part now. It may not
matter much but if possible let's avoid calling GetCurrentTimestamp()
at additional places.

2.
+ for (int i = 0; i < NUM_LRW_WAKEUPS; i++)
+ LogRepWorkerComputeNextWakeup(i, now);
+
+ /*
+ * LogRepWorkerComputeNextWakeup() will have cleared the tablesync
+ * worker start wakeup time, so we might not wake up to start a new
+ * worker at the appropriate time.  To deal with this, we set the
+ * wakeup time to right now so that
+ * process_syncing_tables_for_apply() recalculates it as soon as
+ * possible.
+ */
+ if (!am_tablesync_worker())
+ LogRepWorkerUpdateSyncStartWakeup(now);

Can't we avoid clearing syncstart time in the first place?

-- 
With Regards,
Amit Kapila.




Re: logical decoding and replication of sequences, take 2

2023-03-17 Thread vignesh C
On Thu, 16 Mar 2023 at 21:55, Tomas Vondra
 wrote:
>
> Hi!
>
> On 3/16/23 08:38, Masahiko Sawada wrote:
> > Hi,
> >
> > On Wed, Mar 15, 2023 at 9:52 PM Tomas Vondra
> >  wrote:
> >>
> >>
> >>
> >> On 3/14/23 08:30, John Naylor wrote:
> >>> I tried a couple toy examples with various combinations of use styles.
> >>>
> >>> Three with "automatic" reading from sequences:
> >>>
> >>> create table test(i serial);
> >>> create table test(i int GENERATED BY DEFAULT AS IDENTITY);
> >>> create table test(i int default nextval('s1'));
> >>>
> >>> ...where s1 has some non-default parameters:
> >>>
> >>> CREATE SEQUENCE s1 START 100 MAXVALUE 100 INCREMENT BY -1;
> >>>
> >>> ...and then two with explicit use of s1, one inserting the 'nextval'
> >>> into a table with no default, and one with no table at all, just
> >>> selecting from the sequence.
> >>>
> >>> The last two seem to work similarly to the first three, so it seems like
> >>> FOR ALL TABLES adds all sequences as well. Is that expected?
> >>
> >> Yeah, that's a bug - we shouldn't replicate the sequence changes, unless
> >> the sequence is actually added to the publication. I tracked this down
> >> to a thinko in get_rel_sync_entry() which failed to check the object
> >> type when puballtables or puballsequences was set.
> >>
> >> Attached is a patch fixing this.
> >>
> >>> The documentation for CREATE PUBLICATION mentions sequence options,
> >>> but doesn't really say how these options should be used.
> >> Good point. The idea is that we handle tables and sequences the same
> >> way, i.e. if you specify 'sequence' then we'll replicate increments for
> >> sequences explicitly added to the publication.
> >>
> >> If this is not clear, the docs may need some improvements.
> >>
> >
> > I'm late to this thread, but I have some questions and review comments.
> >
> > Regarding sequence logical replication, it seems that changes of
> > sequence created after CREATE SUBSCRIPTION are applied on the
> > subscriber even without REFRESH PUBLICATION command on the subscriber.
> > Which is a different behavior than tables. For example, I set both
> > publisher and subscriber as follows:
> >
> > 1. On publisher
> > create publication test_pub for all sequences;
> >
> > 2. On subscriber
> > create subscription test_sub connection 'dbname=postgres port=5551'
> > publication test_pub; -- port=5551 is the publisher
> >
> > 3. On publisher
> > create sequence s1;
> > select nextval('s1');
> >
> > I got the error "ERROR:  relation "public.s1" does not exist on the
> > subscriber". Probably we need to do should_apply_changes_for_rel()
> > check in apply_handle_sequence().
> >
>
> Yes, you're right - the sequence handling should have been calling the
> should_apply_changes_for_rel() etc.
>
> The attached 0005 patch should fix that - I still need to test it a bit
> more and maybe clean it up a bit, but hopefully it'll allow you to
> continue the review.
>
> I had to tweak the protocol a bit, so that this uses the same cache as
> tables. I wonder if maybe we should make it even more similar, by
> essentially treating sequences as tables with (last_value, log_cnt,
> called) columns.
>
> > If my understanding is correct, is there any case where the subscriber
> > needs to apply transactional sequence changes? The commit message of
> > 0001 patch says:
> >
> > * Changes for sequences created in the same top-level transaction are
> >   treated as transactional, i.e. just like any other change from that
> >   transaction, and discarded in case of a rollback.
> >
> > IIUC such sequences are not visible to the subscriber, so it cannot
> > subscribe to them until the commit.
> >
>
> The comment is slightly misleading, as it talks about creation of
> sequences, but it should be talking about relfilenodes. For example, if
> you create a sequence, add it to publication, and then in a later
> transaction you do
>
>ALTER SEQUENCE x RESTART
>
> or something else that creates a new relfilenode, then the subsequent
> increments are visible only in that transaction. But we still need to
> apply those on the subscriber, but only as part of the transaction,
> because it might roll back.
>
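[For illustration, the scenario described above boils down to something
like this, with a hypothetical sequence s that is in the publication:

    BEGIN;
    ALTER SEQUENCE s RESTART;  -- assigns a new relfilenode
    SELECT nextval('s');       -- increments are transactional from here on
    ROLLBACK;                  -- ... so the subscriber must discard them

On COMMIT, the subscriber instead applies the queued increments as part
of the transaction.]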
> > ---
> > I got an assertion failure. The reproducible steps are:
> >
>
> I do believe this was due to a thinko in apply_handle_sequence, which
> sometimes started transaction and didn't terminate it correctly. I've
> changed it to use the begin_replication_step() etc. and it seems to be
> working fine now.

Few comments:
1) One of the tests is failing for me; I had also seen the same failure
in CFBOT at [1]:
#   Failed test 'create sequence, advance it in rolled-back
transaction, but commit the create'
#   at t/030_sequences.pl line 152.
#  got: '1|0|f'
# expected: '132|0|t'
t/030_sequences.pl . 5/? ?
#   Failed test 'advance the new sequence in a transaction and roll it back'
#   at t/030_sequences.pl line 175.
#  got: '1|0|f'
# expected: '231|0|t'

#   

Re: [EXTERNAL] Support load balancing in libpq

2023-03-17 Thread Jelte Fennema
> The documentation lists the modes disabled and random, but I wonder if it's
> worth expanding the docs to mention that "disabled" is pretty much a round
> robin load balancing scheme?  It reads a bit odd to present load balancing
> without a mention of round robin balancing given how common it is.

I think you misunderstood what I meant in that section, so I rewrote
it to hopefully be clearer, because disabled really isn't the same as
round-robin.

> This removes all uses of conn->addrlist_family and that struct member can be
> removed.

done

> s/stanby/standby/
> s/Postgres/PostgreSQL/

done

> The documentation typically use a less personal form, I would suggest 
> something
> along the lines of:
>
> "If uniform load balancing is required then an external load balancing 
> tool
> must be used.  Non-uniform load balancing can also be used to skew the
> results, e.g.  by providing the.."

rewrote this to stop using "you" and expanded a bit on the topic

>   libpq_append_conn_error(conn, "unable to initiate random number generator");

done

> -#ifndef WIN32
> +/* MinGW has sys/time.h, but MSVC doesn't */
> +#ifndef _MSC_VER
>  #include <sys/time.h>
> This seems unrelated to the patch in question, and should be a separate 
> commit IMO.

It's not really unrelated. This only started to be needed because
libpq_prng_init calls gettimeofday(). That did not work on MinGW
systems. Before this patch, libpq never called gettimeofday(), so I
think it makes sense to leave it in the commit.

> +   LOAD_BALANCE_RANDOM,/* Read-write server */
> I assume this comment is a copy/paste and should have been reflecting random 
> order?

Yes, done

> +++ b/src/interfaces/libpq/t/003_loadbalance_host_list.pl
> Nitpick, but we should probably rename this load_balance to match the 
> parameter
> being tested.

Done

> A test
> which require root permission level manual system changes stand a very low
> chance of ever being executed, and as such will equate to dead code that may
> easily be broken or subtly broken.

While I definitely agree that it makes it hard to execute, I don't
think that means it will be executed nearly as few times as you
suggest. Maybe you missed it, but I modified the .cirrus.yml file to
configure the hosts file for both Linux and Windows runs. So, while I
agree it is unlikely to be executed manually by many people, it would
still be run on every commit fest entry (which should capture most
issues that I can imagine could occur).

> I am also not a fan of the random_seed parameter.

Fair enough. Removed

> A few ideas:
>
>   * Add basic tests for the load_balance_host connection param only accepting
> sane values
>
>   * Alter the connect_ok() tests in 003_loadbalance_host_list.pl to not 
> require
> random_seed but instead using randomization. Thinking out loud, how about
> something along these lines?
> - Passing a list of unreachable hostnames with a single good hostname
>   should result in a connection.
> - Passing a list of one good hostname should result in a connection
> - Passing a list on n good hostname (where all are equal) should result in
>   a connection

Implemented all these.

> - Passing a list of n unreachable hostnames should result in log entries
>   for n broken resolves in some order, and running that test n' times
>   shouldn't - statistically - result in the same order for a large enough 
> n'.

I didn't implement this one. Instead I went for another statistics-based
approach with working hosts (see the test for details).

>   * Remove random_seed and 004_loadbalance_dns.pl

I moved 004_load_balance_dns.pl to a separate commit (after making
similar random_seed removal related changes to it). As explained above
I think it's worth it to have it because it gets executed in CI. But
feel free to commit only the main patch, if you disagree.


v13-0005-Remove-unnecessary-check-from-Fisher-Yates-imple.patch
Description: Binary data


v13-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch
Description: Binary data


v13-0003-Support-load-balancing-in-libpq.patch
Description: Binary data


v13-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


v13-0004-Add-DNS-based-libpq-load-balancing-test.patch
Description: Binary data


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-17 Thread shiy.f...@fujitsu.com
On Friday, March 17, 2023 3:38 PM Önder Kalacı  wrote:
> 
> Hi Amit, all
>  
> You can first submit the fix for dropped columns with patches till
> v10. Once that is committed, then you can send the patches for
> generated columns.
> 
> Alright, attaching 2 patches for dropped columns, the names of the files 
> shows which 
> versions the patch can be applied to:
> v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
> v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch 
> 
> And, then on top of that, you can apply the patch for generated columns on 
> all applicable
> versions (HEAD, 15, 14, 13 and 12). It applies cleanly.  The name of the file
> is: v2-0001-Ignore-generated-columns.patch
> 
> 
> But unfortunately I couldn't test the patch with PG 12 and below. I'm getting
> some unrelated compile errors and Postgres CI is not available on these
> versions. I'll try to fix that, but I thought it would still be good to share
> the patches as you might already have the environment to run the tests.
> 

Thanks for updating the patch.

I couldn't apply v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
cleanly in v13 and v14. It looks like the patch needs some changes in these
versions.

```
Checking patch src/backend/executor/execReplication.c...
Hunk #1 succeeded at 243 (offset -46 lines).
Hunk #2 succeeded at 263 (offset -46 lines).
Checking patch src/test/subscription/t/100_bugs.pl...
error: while searching for:
$node_publisher->stop('fast');
$node_subscriber->stop('fast');

done_testing();

error: patch failed: src/test/subscription/t/100_bugs.pl:373
Applied patch src/backend/executor/execReplication.c cleanly.
Applying patch src/test/subscription/t/100_bugs.pl with 1 reject...
Rejected hunk #1.
```

Besides, I tried v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch in v12. The
test failed and here's some information.

```
Can't locate object method "new" via package "PostgreSQL::Test::Cluster" 
(perhaps you forgot to load "PostgreSQL::Test::Cluster"?) at t/100_bugs.pl line 
74.
# Looks like your test exited with 2 just after 1.
```

+my $node_publisher_d_cols = 
PostgreSQL::Test::Cluster->new('node_publisher_d_cols');

It seems this usage is not supported in v12 and we should use get_new_node()
like other test cases.

Regards,
Shi Yu


Re: Memory leak from ExecutorState context?

2023-03-17 Thread Jehan-Guillaume de Rorthais
Hi there,

On Fri, 10 Mar 2023 19:51:14 +0100
Jehan-Guillaume de Rorthais  wrote:

> > So I guess the best thing would be to go through these threads, see what
> > the status is, restart the discussion and propose what to do. If you do
> > that, I'm happy to rebase the patches, and maybe see if I could improve
> > them in some way.  
> 
> [...]
> 
> > I was hoping we'd solve this by the BNL, but if we didn't get that in 4
> > years, maybe we shouldn't stall and get at least an imperfect stop-gap
> > solution ...  
> 
> Indeed. So, to sum-up:
> 
> * Patch 1 could be rebased/applied/backpatched

Would it help if I rebased Patch 1 ("move BufFile stuff into separate context")?

> * Patch 2 is worth considering to backpatch

Same question.

> * Patch 3 seemed withdrawn in favor of BNLJ
> * Patch 4 is waiting for some more review and has some TODO
> * discussion 5 worth few minutes to discuss before jumping on previous topics

These other patches need more discussion and hacking. They have a low
priority compared to other discussions and the running commitfest. However,
how can we avoid losing them in limbo again?

Regards,




Re: gcc 13 warnings

2023-03-17 Thread Peter Eisentraut

On 16.03.23 19:11, Andres Freund wrote:

On 2023-03-16 13:54:29 -0400, Tom Lane wrote:

Andres Freund  writes:

On 2023-03-16 12:10:27 -0400, Tom Lane wrote:

It wouldn't be entirely surprising if meson is selecting some -W
switches that the configure script doesn't ... but I don't know
where to check or change that.



I think it's just that meson defaults to -O3 (fwiw, I see substantial gains of
that over -O2).  I see such warnings with autoconf as well if I make it use
-O3.


Oh, interesting.  Should we try to standardize the two build systems
on the same -O level, and if so which one?


I'm on the fence on this one (and posed it as a question before). O3 does
result in higher performance for me, but it also takes longer to build,
and increases the number of warnings.

So I just elected to leave it at the default for meson.


AFAICT, the default for meson is buildtype=debug, which is -O0.  The -O3 
comes from meson.build setting buildtype=release.


I think a good compromise would be buildtype=debugoptimized, which is 
-O2 with debug symbols, which also sort of matches the default in the 
autoconf world.


(https://mesonbuild.com/Builtin-options.html#details-for-buildtype)

At least during the transition phase I would prefer having the same 
default optimization level in both build systems, mainly because of how 
this affects warnings.






Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-03-17 Thread Masahiko Sawada
On Fri, Mar 17, 2023 at 4:03 PM John Naylor
 wrote:
>
> On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada  wrote:
> >
> > On Tue, Mar 14, 2023 at 8:27 PM John Naylor
> >  wrote:
> > >
> > > I wrote:
> > >
> > > > > > Since the block-level measurement is likely overestimating quite a 
> > > > > > bit, I propose to simply reverse the order of the actions here, 
> > > > > > effectively reporting progress for the *last page* and not the 
> > > > > > current one: First update progress with the current memory usage, 
> > > > > > then add tids for this page. If this allocated a new block, only a 
> > > > > > small bit of that will be written to. If this block pushes it over 
> > > > > > the limit, we will detect that up at the top of the loop. It's kind 
> > > > > > of like our earlier attempts at a "fudge factor", but simpler and 
> > > > > > less brittle. And, as far as OS pages we have actually written to, 
> > > > > > I think it'll effectively respect the memory limit, at least in the 
> > > > > > local mem case. And the numbers will make sense.
>
> > > I still like my idea at the top of the page -- at least for vacuum and 
> > > m_w_m. It's still not completely clear if it's right but I've got nothing 
> > > better. It also ignores the work_mem issue, but I've given up 
> > > anticipating all future cases at the moment.
>
> > IIUC you suggested measuring memory usage by tracking how much memory
> > chunks are allocated within a block. If your idea at the top of the
> > page follows this method, it still doesn't deal with the point Andres
> > mentioned.
>
> Right, but that idea was orthogonal to how we measure memory use, and in fact 
> mentions blocks specifically. The re-ordering was just to make sure that 
> progress reporting didn't show current-use > max-use.

Right. I still like your re-ordering idea. It's true that most of the
last block allocated before heap scanning stops is not actually used
yet. I'm guessing we can just check whether the context memory has gone
over the limit, but I'm concerned that might not work well on systems
where memory overcommit is disabled.

>
> However, the big question remains DSA, since a new segment can be as large as 
> the entire previous set of allocations. It seems it just wasn't designed for 
> things where memory growth is unpredictable.
>
> I'm starting to wonder if we need to give DSA a bit more info at the start. 
> Imagine a "soft" limit given to the DSA area when it is initialized. If the 
> total segment usage exceeds this, it stops doubling and instead new segments 
> get smaller. Modifying an example we used for the fudge-factor idea some time 
> ago:
>
> m_w_m = 1GB, so calculate the soft limit to be 512MB and pass it to the DSA 
> area.
>
> 2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> hit soft limit, so 
> "stairstep down" the new segment sizes:
>
> 766 + 2*(128) + 64 = 1086MB -> stop
>
> That's just an undeveloped idea, however, so likely v17 development, even 
> assuming it's not a bad idea (could be).

This is an interesting idea. But I'm concerned we don't have enough
time to get confident with adding this new concept to DSA.

>
> And sadly, unless we find some other, simpler answer soon for tracking and 
> limiting shared memory, the tid store is looking like v17 material.

Another problem we need to deal with is the minimum supported memory in
shared tidstore cases. Since the initial DSA segment size is 1MB, memory
usage of a shared tidstore will start at 1MB+. This is higher than the
minimum values of both work_mem and maintenance_work_mem, 64kB and 1MB
respectively. Increasing the minimum m_w_m to 2MB seems to be acceptable
in the community, but not for work_mem. One idea is to reject memory
limits of less than 2MB, so it simply won't work with small m_w_m
settings. While that might be an acceptable restriction at this stage
(where there is no use case for using tidstore with work_mem in core),
it would be a blocker for future adoptions such as unifying with
tidbitmap.c. Another idea is to let the process specify the initial
segment size at dsa_create() so that DSA can start with a smaller
segment, say 32kB. That way, a tidstore with a 32kB limit gets full once
it allocates the next 32kB DSA segment. A downside of this idea is that
it increases the number of segments behind DSA. Assuming it's relatively
rare to use such a low work_mem, that might be acceptable. FYI, the
total number of DSM segments available on the system is calculated by:

#define PG_DYNSHMEM_FIXED_SLOTS 64
#define PG_DYNSHMEM_SLOTS_PER_BACKEND   5

maxitems = PG_DYNSHMEM_FIXED_SLOTS
+ PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends;

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-17 Thread Önder Kalacı
Hi Amit, all


> You can first submit the fix for dropped columns with patches till
> v10. Once that is committed, then you can send the patches for
> generated columns.
>
>
Alright, attaching 2 patches for dropped columns; the names of the files
show which versions the patch can be applied to:
v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch

And then, on top of that, you can apply the patch for generated columns on
all applicable versions (HEAD, 15, 14, 13 and 12). It applies cleanly. The
name of the file is: v2-0001-Ignore-generated-columns.patch


But unfortunately I couldn't test the patch with PG 12 and below. I'm
getting some unrelated compile errors and Postgres CI is not available on
these versions. I'll try to fix that, but I thought it would still be good
to share the patches as you might already have the environment to run the
tests.


> Don't worry about v10 --- that's out of support and shouldn't
> get patched for this.


Given this information, I skipped the v10 patch.
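
[An untested sketch of the class of failure being fixed here, for anyone
who wants a quick reproducer (table and column names are hypothetical):

    -- on both publisher and subscriber:
    CREATE TABLE t (a int, b int, c int);
    ALTER TABLE t REPLICA IDENTITY FULL;
    ALTER TABLE t DROP COLUMN c;
    -- set up a publication/subscription for t, then on the publisher:
    UPDATE t SET a = a + 1;
    -- without the fix, the subscriber's row matching for REPLICA
    -- IDENTITY FULL can stumble over the dropped column
]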

Thanks,
Onder KALACI


v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch
Description: Binary data


v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
Description: Binary data


v2-0001-Ignore-generated-columns.patch
Description: Binary data


Re: Add pg_walinspect function with block info columns

2023-03-17 Thread Bharath Rupireddy
On Fri, Mar 17, 2023 at 7:33 AM Peter Geoghegan  wrote:
>
> > IIUC, the concern raised so far in this thread is not just on the
> > performance of JOIN queries to get both block info and record level
> > info, but on ease of using pg_walinspect functions. If
> > pg_get_wal_block_info emits the record level information too (which
> > turns out to be 50 LOC more), one doesn't have to be expert at writing
> > JOIN queries or such, but just can run the function, which actually
> > takes way less time (3sec) to scan the same 5mn WAL records [3].
>
> That's exactly my concern, yes. As you say, it's not just the
> performance aspect. Requiring users to write a needlessly ornamental
> query is actively misleading. It suggests that block_ref is distinct
> information from the blocks output by pg_get_wal_block_info().

+1 for pg_get_wal_block_info emitting per-record WAL info too along
with block info; the attached v2 patch does that. IMO, usability wins
the race here.
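
[To make the usability point concrete, here is a sketch of the join that
was needed before, versus the single call afterwards. The function names
are pg_walinspect's; the LSN bounds and the assumption that both
functions expose the record's start_lsn are mine:

    -- before: correlate block info with per-record info
    SELECT r.resource_manager, r.record_type, b.*
    FROM pg_get_wal_block_info('0/1000000', '0/2000000') b
    JOIN pg_get_wal_records_info('0/1000000', '0/2000000') r
        USING (start_lsn);

    -- after: one call returns the record-level columns too
    SELECT * FROM pg_get_wal_block_info('0/1000000', '0/2000000');
]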

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v2-0001-Emit-WAL-record-info-via-pg_get_wal_block_info.patch
Description: Binary data


Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-03-17 Thread John Naylor
On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada 
wrote:
>
> On Tue, Mar 14, 2023 at 8:27 PM John Naylor
>  wrote:
> >
> > I wrote:
> >
> > > > > Since the block-level measurement is likely overestimating quite
a bit, I propose to simply reverse the order of the actions here,
effectively reporting progress for the *last page* and not the current one:
First update progress with the current memory usage, then add tids for this
page. If this allocated a new block, only a small bit of that will be
written to. If this block pushes it over the limit, we will detect that up
at the top of the loop. It's kind of like our earlier attempts at a "fudge
factor", but simpler and less brittle. And, as far as OS pages we have
actually written to, I think it'll effectively respect the memory limit, at
least in the local mem case. And the numbers will make sense.

> > I still like my idea at the top of the page -- at least for vacuum and
m_w_m. It's still not completely clear if it's right but I've got nothing
better. It also ignores the work_mem issue, but I've given up anticipating
all future cases at the moment.

> IIUC you suggested measuring memory usage by tracking how much memory
> chunks are allocated within a block. If your idea at the top of the
> page follows this method, it still doesn't deal with the point Andres
> mentioned.

Right, but that idea was orthogonal to how we measure memory use, and in
fact mentions blocks specifically. The re-ordering was just to make sure
that progress reporting didn't show current-use > max-use.

However, the big question remains DSA, since a new segment can be as large
as the entire previous set of allocations. It seems it just wasn't designed
for things where memory growth is unpredictable.

I'm starting to wonder if we need to give DSA a bit more info at the start.
Imagine a "soft" limit given to the DSA area when it is initialized. If the
total segment usage exceeds this, it stops doubling and instead new
segments get smaller. Modifying an example we used for the fudge-factor
idea some time ago:

m_w_m = 1GB, so calculate the soft limit to be 512MB and pass it to the DSA
area.

2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> hit soft limit, so
"stairstep down" the new segment sizes:

766 + 2*(128) + 64 = 1086MB -> stop

That's just an undeveloped idea, however, so likely v17 development, even
assuming it's not a bad idea (could be).

And sadly, unless we find some other, simpler answer soon for tracking and
limiting shared memory, the tid store is looking like v17 material.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: slapd logs to syslog during tests

2023-03-17 Thread Andres Freund
Hi,

On 2023-03-16 22:43:17 -0700, Andres Freund wrote:
> So unless somebody has a better idea, I'm gonna replace 'logfile-only on' with
> 'loglevel 0' for now. I also am open to reverting and trying again tomorrow.

Did that now. I used the command-line option -s0 instead of loglevel 0, as
that prevents even the first message.

Regards,

Andres




RE: Data is copied twice when specifying both child and parent table in publication

2023-03-17 Thread wangw.f...@fujitsu.com
On Thu, Mar 16, 2023 at 20:25 PM Amit Kapila  wrote:
>

Thanks for your comments.

> + if (server_version >= 16)
> + {
> + appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n"
> + "  ( SELECT array_agg(a.attname ORDER BY a.attnum)\n"
> + "FROM pg_attribute a\n"
> + "WHERE a.attrelid = GPT.relid AND a.attnum > 0 AND\n"
> + "  NOT a.attisdropped AND\n"
> + "  (a.attnum = ANY(GPT.attrs) OR GPT.attrs IS NULL)\n"
> + "  ) AS attnames\n"
> + " FROM pg_class C\n"
> + "   JOIN pg_namespace N ON N.oid = C.relnamespace\n"
> + "   JOIN ( SELECT (pg_get_publication_tables(VARIADIC
> array_agg(pubname::text))).*\n"
> + "  FROM pg_publication\n"
> + "  WHERE pubname IN ( %s )) as GPT\n"
> + "   ON GPT.relid = C.oid\n",
> + pub_names.data);
> 
> The function pg_get_publication_tables()  has already handled dropped
> columns, so we don't need it here in this query. Also, the part to
> build attnames should be the same as it is in view
> pg_publication_tables.

Agree. Changed.

> Can we directly try to pass the list of
> pubnames to the function pg_get_publication_tables() instead of
> joining it with pg_publication?

Changed.
I think the aim of joining it with pg_publication before was to exclude
non-existent publications. Otherwise, we would get an error because of the
call to the function GetPublicationByName (with 'missing_ok = false') in the
function pg_get_publication_tables. So, I changed "missing_ok" to true. If
anyone doesn't like this change, I'll reconsider it in the next version.
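
[For reference, the direct call under discussion has roughly this shape
(publication names are placeholders):

    SELECT (pg_get_publication_tables(VARIADIC ARRAY['pub1', 'pub2'])).*;
]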

> Can we keep the changes in the else part (fix when publisher < 16) the
> same as HEAD and move the proposed change to a separate patch?
> Basically, for the HEAD patch, let's just try to fix this when
> publisher >=16. I am slightly worried that as this is a corner case
> bug and we didn't see any user complaints for this, so introducing a
> complex fix for back branches may not be required or at least we can
> discuss that separately.

Split the patch as suggested.

Attached is the new patch set.
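
[As I understand the bug being fixed, the problematic setup is roughly
the following; the names and partitioning scheme are mine:

    CREATE TABLE parent (a int) PARTITION BY RANGE (a);
    CREATE TABLE child PARTITION OF parent DEFAULT;
    CREATE PUBLICATION pub FOR TABLE parent, child
        WITH (publish_via_partition_root = true);
    -- before the fix, the initial sync on the subscriber copied the
    -- rows of child twice, once for each listed table
]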

Regards,
Wang Wei


HEAD-v17-0001-Fix-data-replicated-twice-when-specifying-publis.patch
Description:  HEAD-v17-0001-Fix-data-replicated-twice-when-specifying-publis.patch


HEAD-v17-0002-Fix-this-problem-for-back-branches.patch
Description: HEAD-v17-0002-Fix-this-problem-for-back-branches.patch


HEAD-v17-0003-Add-clarification-for-the-behaviour-of-the-publi.patch
Description:  HEAD-v17-0003-Add-clarification-for-the-behaviour-of-the-publi.patch


REL14_v17-0001-Fix-data-replicated-twice-when-specifying-publis_patch
Description:  REL14_v17-0001-Fix-data-replicated-twice-when-specifying-publis_patch


REL15_v17-0001-Fix-data-replicated-twice-when-specifying-publis_patch
Description:  REL15_v17-0001-Fix-data-replicated-twice-when-specifying-publis_patch


Re: In-placre persistance change of a relation

2023-03-17 Thread Kyotaro Horiguchi
At Fri, 03 Mar 2023 18:03:53 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> To be precise, they are three parts. The attached patch is the first part -
> the storage mark files, which are used to identify storage files that
> have not been committed and should be removed during the next
> startup. This feature resolves the issue of orphaned storage files
> that may result from a crash occurring during the execution of a
> transaction involving the creation of a new table.
> 
> I'll post all of the three parts shortly.

Mmm. It took longer than I said, but this is the patch set that
includes all three parts.

1. "Mark files" to prevent orphan storage files for in-transaction
  created relations after a crash.

2. In-place persistence change: For ALTER TABLE SET LOGGED/UNLOGGED
  with wal_level minimal, and ALTER TABLE SET UNLOGGED with other
  wal_levels, the commands don't require a file copy for the relation
  storage. ALTER TABLE SET LOGGED with non-minimal wal_level emits
  bulk FPIs instead of a bunch of individual INSERTs.

3. An extension to ALTER TABLE SET (UN)LOGGED that can handle all
  tables in a tablespace at once.
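
[Presumably the third patch enables something along these lines; this is
my guess at the syntax, by analogy with the existing ALTER TABLE ALL IN
TABLESPACE ... SET TABLESPACE form, and not verified against the patch:

    ALTER TABLE ALL IN TABLESPACE ts_example SET LOGGED;
]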


As a side note, I'll quickly go over the behavior of the mark files
introduced by the first patch, particularly what happens when deletion
fails.

(1) The mark file for the MAIN fork (".u") corresponds to all forks,
while the mark file for the INIT fork ("_init.u") corresponds to the
INIT fork alone.

(2) The mark file is created just before the corresponding storage
file is made. This is always logged in the WAL.

(3) The mark file is deleted after removing the corresponding storage
file during commit or rollback. This action is logged in the
WAL, too. If the deletion fails, an ERROR is output and the
transaction aborts.

(4) If a crash leaves a mark file behind, the server will try to delete
it after successfully removing the corresponding storage file during
the subsequent startup that runs a recovery. If deletion fails, the
server leaves the mark file alone, emitting a WARNING. (The same
behavior applies to non-mark files.)

(5) If the deletion of the mark file fails, the leftover mark file
prevents the creation of the corresponding storage file (causing
an ERROR). Thanks to that behavior, leftover mark files never
result in the removal of the wrong files.

(6) The mark file for an INIT fork is created only when ALTER TABLE
SET UNLOGGED is executed (not for CREATE UNLOGGED TABLE) to signal
the crash-cleanup code to remove the INIT fork. (Otherwise the
cleanup code removes the main fork instead. This is the main
objective of introducing the mark files.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From ba4b8140fe582ceec4ea810621e17d6a1fe9c408 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 2 Mar 2023 17:25:12 +0900
Subject: [PATCH v27 1/3] Storage mark files

In certain situations, specific operations followed by a crash-restart
can result in orphaned storage files.  These files cannot be removed
through standard methods.  To address this issue, this commit
implements 'mark files', which convey information about the storage
file. Specifically, the "UNCOMMITED" mark file is introduced to denote
files that have not been committed and should be removed during the
next startup.
---
 src/backend/access/rmgrdesc/smgrdesc.c|  37 +++
 src/backend/access/transam/README |  10 +
 src/backend/access/transam/xact.c |   7 +
 src/backend/access/transam/xlogrecovery.c |  18 ++
 src/backend/backup/basebackup.c   |   9 +-
 src/backend/catalog/storage.c | 270 ++-
 src/backend/storage/file/fd.c |   4 +-
 src/backend/storage/file/reinit.c | 313 +++---
 src/backend/storage/smgr/md.c |  95 ++-
 src/backend/storage/smgr/smgr.c   |  32 +++
 src/backend/storage/sync/sync.c   |  26 +-
 src/bin/pg_rewind/parsexlog.c |  16 ++
 src/common/relpath.c  |  47 ++--
 src/include/catalog/storage.h |   3 +
 src/include/catalog/storage_xlog.h|  35 ++-
 src/include/common/relpath.h  |   9 +-
 src/include/storage/fd.h  |   1 +
 src/include/storage/md.h  |   8 +-
 src/include/storage/reinit.h  |   8 +-
 src/include/storage/smgr.h|  17 ++
 src/test/recovery/t/013_crash_restart.pl  |  21 ++
 src/tools/pgindent/typedefs.list  |   6 +
 22 files changed, 848 insertions(+), 144 deletions(-)

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index bd841b96e8..f8187385c4 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,37 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
+	else if (info == XLOG_SMGR_UNLINK)
+	{
+		xl_smgr_unlink *xlrec = 

Re: Add macros for ReorderBufferTXN toptxn

2023-03-17 Thread Amit Kapila
On Thu, Mar 16, 2023 at 7:20 AM Peter Smith  wrote:
>
> PSA v4 which addresses both of your review comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.