Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2023-05-01 Thread Oliver Ford
On Sun, Apr 23, 2023 at 4:29 AM Tatsuo Ishii  wrote:

> > Vik Fearing  writes:
> >
> >> For me, this is perfectly okay.  Keep them at the lowest level of
> >> reservation as possible.
> >
> > Yeah, keep them unreserved if at all possible.  Any higher reservation
> > level risks breaking existing applications that might be using these
> > words as column or function names.
>
> Agreed.
> 
Attached is a new version of the code and tests to implement this. There's
now no modification to windowfuncs.c or the catalog;
it's only a bool added to FuncCall which, when set to true, ignores nulls. It
adds IGNORE/RESPECT at the unreserved, as-label level.

The implementation also aims for better performance than previous versions
by not disabling set_mark, and by using an array to
track previous non-null positions in SEEK_HEAD, or SEEK_CURRENT with a forward
offset (lead, but not lag). The mark is set once a row
is out of frame and no further rows can be in frame (to ensure it works with
an exclusion clause).

The attached test patch is mostly the same as in the previous patch
set, but it doesn't fail on row_number anymore as the main patch
only rejects aggregate functions. The test patch also adds a test for
EXCLUDE CURRENT ROW and for two contiguous null rows.

I've not yet tested custom window functions with the patch, but I'm happy
to add them to the test patch in v2 if we want to go this way
in implementing this feature.
From 81c48df9a08deb065379e8bccffb2f5592faa4d0 Mon Sep 17 00:00:00 2001
From: Oliver Ford 
Date: Wed, 19 Apr 2023 01:07:14 +0100
Subject: [PATCH] initial window ignore

---
 src/backend/executor/nodeWindowAgg.c | 263 ++-
 src/backend/optimizer/util/clauses.c |   1 +
 src/backend/parser/gram.y|  20 +-
 src/backend/parser/parse_func.c  |   9 +
 src/backend/utils/adt/ruleutils.c|   7 +-
 src/include/nodes/parsenodes.h   |   1 +
 src/include/nodes/primnodes.h|   2 +
 src/include/parser/kwlist.h  |   2 +
 8 files changed, 297 insertions(+), 8 deletions(-)

diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 4f0618f27a..fac0e05dee 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -69,6 +69,11 @@ typedef struct WindowObjectData
 	int			readptr;		/* tuplestore read pointer for this fn */
 	int64		markpos;		/* row that markptr is positioned on */
 	int64		seekpos;		/* row that readptr is positioned on */
+
+	bool		ignore_nulls;	/* ignore nulls */
+	int64		*win_nonnulls;	/* tracks non-nulls in ignore nulls mode */
+	int			nonnulls_size;	/* track size of the win_nonnulls array */
+	int			nonnulls_len;	/* track length of the win_nonnulls array */
 } WindowObjectData;
 
 /*
@@ -97,6 +102,7 @@ typedef struct WindowStatePerFuncData
 	bool		plain_agg;		/* is it just a plain aggregate function? */
 	int			aggno;			/* if so, index of its WindowStatePerAggData */
 
+	bool		ignore_nulls;	/* ignore nulls */
 	WindowObject winobj;		/* object used in window function API */
 }			WindowStatePerFuncData;
 
@@ -2560,14 +2566,14 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
 			elog(ERROR, "WindowFunc with winref %u assigned to WindowAgg with winref %u",
  wfunc->winref, node->winref);
 
-		/* Look for a previous duplicate window function */
+		/* Look for a previous duplicate window function, which needs the same ignore_nulls value */
 		for (i = 0; i <= wfuncno; i++)
 		{
 			if (equal(wfunc, perfunc[i].wfunc) &&
 !contain_volatile_functions((Node *) wfunc))
 break;
 		}
-		if (i <= wfuncno)
+		if (i <= wfuncno && wfunc->ignore_nulls == perfunc[i].ignore_nulls)
 		{
 			/* Found a match to an existing entry, so just mark it */
 			wfuncstate->wfuncno = i;
@@ -2620,6 +2626,13 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
 			winobj->argstates = wfuncstate->args;
 			winobj->localmem = NULL;
 			perfuncstate->winobj = winobj;
+			winobj->ignore_nulls = wfunc->ignore_nulls;
+			if (winobj->ignore_nulls)
+			{
+winobj->win_nonnulls = palloc_array(int64, 16);
+winobj->nonnulls_size = 16;
+winobj->nonnulls_len = 0;
+			}
 
 			/* It's a real window function, so set up to call it. */
 			fmgr_info_cxt(wfunc->winfnoid, &perfuncstate->flinfo,
@@ -3306,6 +3319,244 @@ WinRowsArePeers(WindowObject winobj, int64 pos1, int64 pos2)
 	return res;
 }
 
+static void increment_notnulls(WindowObject winobj, int64 pos)
+{
+	if (winobj->nonnulls_len == winobj->nonnulls_size)
+	{
+		winobj->nonnulls_size *= 2;
+		winobj->win_nonnulls =
+			repalloc_array(winobj->win_nonnulls,
+			int64,
+			winobj->nonnulls_size);
+	}
+	winobj->win_nonnulls[winobj->nonnulls_len] = pos;
+	winobj->nonnulls_len++;
+}
+
+static Datum ignorenulls_getfuncarginpartition(WindowObject winobj, int argno,
+		int relpos, int seektype, bool set_mark, bool *isnull, bool *isout) {
+	WindowAggState *winstate;
+	ExprConte

Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-01 Thread John Naylor
On Mon, May 1, 2023 at 2:30 AM Peter Geoghegan  wrote:
>
> On Sat, Apr 29, 2023 at 7:30 PM John Naylor
>  wrote:
> > How about
> >
> > -HINT:  To avoid a database shutdown, [...]
> > +HINT:  To prevent XID exhaustion, [...]
> >
> > ...and "MXID", respectively? We could explain in the docs that vacuum
and read-only queries still work "when XIDs have been exhausted", etc.
>
> I think that that particular wording works in this example -- we *are*
> avoiding XID exhaustion. But it still doesn't really address my
> concern -- at least not on its own. I think that we need a term for
> xidStopLimit mode (and perhaps multiStopLimit) itself. This is a
> discrete state/mode that is associated with a specific mechanism.

Well, since you have a placeholder "xidStopLimit mode" in your other patch,
I'll confine my response to there. Inventing "modes" seems like an awkward
thing to backpatch, not to mention moving the goalposts. My modest goal
here is quite limited: to stop lying to our users about "not accepting
commands", and change tragically awful advice into sensible advice.

Here's my new idea:

-HINT:  To avoid a database shutdown, [...]
+HINT:  To prevent XID generation failure, [...]

Actually, I like "allocation" better, but the v8 patch now has "generation"
simply because one MXID message already has "generate" and I did it that
way before thinking too hard. I'd be okay with either one as long as it's
consistent.

> > (I should probably also add in the commit message that the "shutdown"
in the message was carried over to MXIDs when they arrived also in 2005).

Done

> > > Separately, there is a need to update a couple of other places to use
> > > this new terminology. The documentation for vacuum_failsafe_age and
> > > vacuum_multixact_failsafe_age refer to "system-wide transaction ID
> > > wraparound failure", which seems less than ideal, even without your
> > > patch.
> >
> > Right, I'll have a look.

Looking now, I'm even less inclined to invent new terminology in back
branches.

> As you know, there is a more general problem with the use of the term
> "wraparound" in the docs, and in the system itself (in places like
> pg_stat_activity). Even the very basic terminology in this area is
> needlessly scary. Terms like "VACUUM (to prevent wraparound)" are
> uncomfortably close to "a race against time to avoid data corruption".
> The system isn't ever supposed to corrupt data, even if misconfigured
> (unless the misconfiguration is very low-level, such as "fsync=off").
> Users should be able to take that much for granted.

Granted. Whatever form your rewrite ends up in, it could make a lot of
sense to then backpatch a few localized corrections. I wouldn't even object
to including a few substitutions of s/wraparound failure/allocation
failure/  where appropriate. Let's see how that shakes out first.

> > I think the docs would do well to have ordered steps for recovering
from both XID and MXID exhaustion.
>
> I had planned to address this with my ongoing work on the "Routine
> Vacuuming" docs, but I think that you're right about the necessity of
> addressing it as part of this patch.

0003 is now a quick-and-dirty attempt at that, only in the docs. The MXID
part is mostly copy-pasted from the XID part, just to get something
together. I'd like to abbreviate that somehow.
--
John Naylor
EDB: http://www.enterprisedb.com
From 469d0e7123a16386e300a85a6bb08109e283b65c Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 29 Apr 2023 14:23:50 +0700
Subject: [PATCH v8 2/3] Stop telling users to run VACUUM in a single-user mode

Single-user mode is almost always the worst thing to reach for in a
VACUUM emergency:

* Restarting in single user mode requires a shutdown checkpoint
* The user interface is completely different, and awful
* The buffer cache is completely cold
* The checkpointer, background writer and WAL writer are not running
* Without checkpoints WAL segments can not be rotated and reused
* Replication is not running, so after VACUUM is done and database
  is started in normal mode, there is a huge backlog to replicate
* pg_stat_progress_vacuum is not available so there is no indication
  of when the command will complete
* VACUUM VERBOSE doesn't work - there is no output from single-user
  mode vacuum, with or without VERBOSE

If that weren't enough, it's also unsafe because the wraparound
limits are not enforced. It is by no means impossible to corrupt the
database by mistake, such as by a user running VACUUM FULL because it
sounds better.

As mentioned in commit X, the system is perfectly capable of
accepting commands when reaching xidStopLimit. Most VACUUM operations
will work normally, with one exception: A new XID is required when
truncating the relation if wal_level is above "minimal". As of v14
the failsafe mechanism disables truncation some time before reaching
xidStopLimit, so this is not an issue in practice.

By remaining in multi-user mode, users still have read-only access to
their data

Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-01 Thread Robert Haas
On Wed, Apr 26, 2023 at 1:58 PM Peter Geoghegan  wrote:
> Why do we call wraparound wraparound, anyway? The 32-bit XID space is
> circular! The whole point of the design is that unsigned integer
> wraparound is meaningless -- there isn't really a point in "the
> circle" that you should think of as the start point or end point.
> (We're probably stuck with the term "wraparound" for now, so I'm not
> proposing that it be changed here, purely on pragmatic grounds.)

To me, the fact that the XID space is circular is the whole point of
talking about wraparound. If the XID space were non-circular, it could
never try to reuse the XID values that have previously been used, and
this entire class of problems would go away. Because it is circular,
it's possible for the XID counter to arrive back at a place that it's
been before i.e. it can wrap around.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-01 Thread Peter Geoghegan
On Mon, May 1, 2023 at 8:03 AM Robert Haas  wrote:
> To me, the fact that the XID space is circular is the whole point of
> talking about wraparound.

The word wraparound is ambiguous. It's not the same thing as
xidStopLimit in my view. It's literal integer wraparound.

If you think of XIDs as having a native 64-bit representation, while
using a truncated 32-bit on-disk representation in tuple headers
(which is the view promoted by the doc patch), then XIDs cannot wrap
around. There is still no possibility of "the future becoming the
past" (assuming no use of single user mode), either, because even in
the worst case we have xidStopLimit to make sure that the database
doesn't become corrupt. Why talk about what's *not* happening in a
place of prominence?

We'll still talk about literal integer wraparound with the doc patch,
but it's part of a discussion of the on-disk format in a distant
chapter. It's just an implementation detail, which is of no practical
consequence. The main discussion need only say something succinct and
vague about the use of a truncated representation (lacking a separate
epoch) in tuple headers eventually forcing freezing.

> If the XID space were non-circular, it could
> never try to reuse the XID values that have previously been used, and
> this entire class of problems would go away. Because it is circular,
> it's possible for the XID counter to arrive back at a place that it's
> been before i.e. it can wrap around.

But integer wrap around isn't really aligned with anything important.
xidStopLimit will kick in when we're only halfway towards literal
integer wrap around. Users have practical concerns about avoiding
xidStopLimit -- what a world without xidStopLimit looks like just
doesn't matter. Just having some vague awareness of truncated XIDs
being insufficient at some point is all you really need, even if
you're an advanced user.

-- 
Peter Geoghegan




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-01 Thread Robert Haas
On Mon, May 1, 2023 at 12:01 PM Peter Geoghegan  wrote:
> > If the XID space were non-circular, it could
> > never try to reuse the XID values that have previously been used, and
> > this entire class of problems would go away. Because it is circular,
> > it's possible for the XID counter to arrive back at a place that it's
> > been before i.e. it can wrap around.
>
> But integer wrap around isn't really aligned with anything important.
> xidStopLimit will kick in when we're only halfway towards literal
> integer wrap around. Users have practical concerns about avoiding
> xidStopLimit -- what a world without xidStopLimit looks like just
> doesn't matter. Just having some vague awareness of truncated XIDs
> being insufficient at some point is all you really need, even if
> you're an advanced user.

I disagree. If you start the cluster in single-user mode, you can
actually wrap it around, unless something has changed that I don't
know about.

I'm not trying to debate the details of the patch, which I have not
read. I am saying that, while wraparound is perhaps not a perfect term
for what's happening, it is not, in my opinion, a bad term either. I
don't think it's accurate to imagine that this is a 64-bit counter
where we only store 32 bits on disk. We're trying to retcon that into
being true, but we'd have to work significantly harder to actually
make it true.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-05-01 Thread Robert Haas
On Fri, Apr 28, 2023 at 10:03 AM Eric Ridge  wrote:
> ZomboDB has 137 releases over the past 8 years.

Dang.

This may be one of those cases where the slow pace of change for
extensions shipped with PostgreSQL gives those of us who are
developers for PostgreSQL itself a sort of tunnel vision. The system
that we have in core works well for those extensions, which get a new
version at most once a year and typically much less often than that.
I'm not sure we'd be so sanguine about the status quo if we had 137
releases out there.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-01 Thread Peter Geoghegan
On Mon, May 1, 2023 at 9:08 AM Robert Haas  wrote:
> I disagree. If you start the cluster in single-user mode, you can
> actually wrap it around, unless something has changed that I don't
> know about.

This patch relies on John's other patch which strongly discourages the
use of single-user mode. Were it not for that, I might agree.

> I'm not trying to debate the details of the patch, which I have not
> read. I am saying that, while wraparound is perhaps not a perfect term
> for what's happening, it is not, in my opinion, a bad term either. I
> don't think it's accurate to imagine that this is a 64-bit counter
> where we only store 32 bits on disk. We're trying to retcon that into
> being true, but we'd have to work significantly harder to actually
> make it true.

The purpose of this documentation section is to give users practical
guidance, obviously. The main reason to frame it this way is because
it seems to make the material easier to understand.

-- 
Peter Geoghegan




Re: Logging parallel worker draught

2023-05-01 Thread Robert Haas
On Sat, Apr 22, 2023 at 7:06 AM Amit Kapila  wrote:
> I don't think introducing a GUC for this is a good idea. We can
> directly output this message in the server log either at LOG or DEBUG1
> level.

Why not? It seems like something some people might want to log and
others not. Running the whole server at DEBUG1 to get this information
doesn't seem like a suitable answer.

What I was wondering was whether we would be better off putting this
into the statistics collector, vs. doing it via logging. Both
approaches seem to have pros and cons.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: base backup vs. concurrent truncation

2023-05-01 Thread Aleksander Alekseev
Hi,

> I think that to reproduce the scenario, you want the truncate to happen in
> its own checkpoint cycle.

OK, let's try this again.

In order to effectively disable the checkpointer I added the following lines
to postgresql.conf:

```
checkpoint_timeout = 3600
max_wal_size = 100G
```

I'm also keeping an eye on `logfile` in order to make sure the system doesn't
do anything unexpected.

Then:

```
-- Just double-checking
show checkpoint_timeout;
 checkpoint_timeout
--------------------
 1h

show max_wal_size;
 max_wal_size
--------------
 100GB

create table truncateme(id integer, val varchar(1024));
alter table truncateme set (autovacuum_enabled = off);
select relfilenode from pg_class where relname = 'truncateme';

 relfilenode
-------------
       16385

-- takes ~30 seconds
insert into truncateme
  select id,
  (
select string_agg(chr((33+random()*(126-33)) :: integer), '')
from generate_series(1,1000)
  )
  from generate_series(1,2*1024*1024) as id;

delete from truncateme where id > 1024*1024;

select count(*) from truncateme;
  count
---------
 1048576

-- Making a checkpoint as pg_basebackup would do.
-- Also, making sure truncate will happen in its own checkpoint cycle.
checkpoint;
```

Again I see 3 segments:

```
$ ls -lah 16385*
-rw------- 1 eax eax 1.0G May  1 19:24 16385
-rw------- 1 eax eax 1.0G May  1 19:27 16385.1
-rw------- 1 eax eax 293M May  1 19:27 16385.2
-rw------- 1 eax eax 608K May  1 19:24 16385_fsm
```

Making a backup of .2 as if I'm pg_basebackup:

```
cp 16385.2 ~/temp/16385.2
```

Truncating the table:

```
vacuum truncateme;
```

... and killing postgres:

```
$ pkill -9 postgres

```

Now I see:

```
$ ls -lah 16385*
-rw------- 1 eax eax 1.0G May  1 19:30 16385
-rw------- 1 eax eax 147M May  1 19:31 16385.1
-rw------- 1 eax eax    0 May  1 19:31 16385.2
-rw------- 1 eax eax 312K May  1 19:31 16385_fsm
-rw------- 1 eax eax  40K May  1 19:31 16385_vm
$ cp ~/temp/16385.2 ./16385.2
```

Starting postgres:

```
LOG:  starting PostgreSQL 16devel on x86_64-linux, compiled by
gcc-11.3.0, 64-bit
LOG:  listening on IPv4 address "0.0.0.0", port 5432
LOG:  listening on IPv6 address "::", port 5432
LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
LOG:  database system was interrupted; last known up at 2023-05-01 19:27:22 MSK
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/8AAB36B0
LOG:  invalid record length at 0/CE9BDE60: expected at least 24, got 0
LOG:  redo done at 0/CE9BDE28 system usage: CPU: user: 6.51 s, system:
2.45 s, elapsed: 8.97 s
LOG:  checkpoint starting: end-of-recovery immediate wait
LOG:  checkpoint complete: wrote 10 buffers (0.0%); 0 WAL file(s)
added, 0 removed, 68 recycled; write=0.026 s, sync=1.207 s,
total=1.769 s; sync files=10, longest=1.188 s, average=0.121 s;
distance=1113129 kB, estimate=1113129 kB; lsn=0/CE9BDE60, redo
lsn=0/CE9BDE60
LOG:  database system is ready to accept connections
```

```
$ ls -lah 16385*
-rw------- 1 eax eax 1.0G May  1 19:33 16385
-rw------- 1 eax eax 147M May  1 19:33 16385.1
-rw------- 1 eax eax    0 May  1 19:33 16385.2
-rw------- 1 eax eax 312K May  1 19:33 16385_fsm
-rw------- 1 eax eax  40K May  1 19:33 16385_vm
```

```
select count(*) from truncateme;
  count
---------
 1048576
```

So I'm still unable to reproduce the described scenario, at least on PG16.

-- 
Best regards,
Aleksander Alekseev




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-01 Thread Peter Geoghegan
On Mon, May 1, 2023 at 9:16 AM Peter Geoghegan  wrote:
> On Mon, May 1, 2023 at 9:08 AM Robert Haas  wrote:
> > I disagree. If you start the cluster in single-user mode, you can
> > actually wrap it around, unless something has changed that I don't
> > know about.
>
> This patch relies on John's other patch which strongly discourages the
> use of single-user mode. Were it not for that, I might agree.

Also, it's not clear that the term "wraparound" even describes what
happens when you corrupt the database by violating the "no more than
~2.1 billion XIDs distance between any two unfrozen XIDs" invariant in
single-user mode. What specific thing will have wrapped around? It's
possible (and very likely) that every unfrozen XID in the database is
from the same 64-XID-wise epoch.

I don't think that we need to say very much about this scenario (and
nothing at all about the specifics in "Routine Vacuuming"), so maybe
it doesn't matter much. But I maintain that it makes most sense to
describe this scenario as a violation of the "no more than ~2.1
billion XIDs distance between any two unfrozen XIDs" invariant, while
leaving the term "wraparound" out of it completely. That terms has way
too much baggage.

-- 
Peter Geoghegan




Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-05-01 Thread Eric Ridge


> On May 1, 2023, at 12:12 PM, Robert Haas  wrote:
> 
> On Fri, Apr 28, 2023 at 10:03 AM Eric Ridge  wrote:
>> ZomboDB has 137 releases over the past 8 years.
> 
> Dang.
> 
> This may be one of those cases where the slow pace of change for
> extensions shipped with PostgreSQL gives those of us who are
> developers for PostgreSQL itself a sort of tunnel vision.

Could be I'm a terrible programmer too.  Could be Elasticsearch is a PITA to 
deal with.

FWIW, outside of major ZDB releases, most of those have little-to-zero schema 
changes.  But that doesn't negate the fact each release needs its own 
upgrade.sql script. I'm working on a point release right this moment and it 
won't have any schema changes but it'll have an upgrade.sql script.

Maybe related, pgrx (formerly pgx) has had 77 releases since June 2020, but 
that's mostly us just slowly trying to wrap Postgres internals in Rust, bug 
fixing, and such.  Getting support for a new major Postgres release usually 
isn't *that* hard -- a day or two of work, depending on how drastically y'all 
changed an internal API.  `List` and `FunctionCallInfo(Base)Data` come to mind 
as minor horrors for us, hahaha.  Rust at least makes it straightforward for us 
to have divergent implementations and have the decisions solved at compile time.

> The system
> that we have in core works well for those extensions, which get a new
> version at most once a year and typically much less often than that.
> I'm not sure we'd be so sanguine about the status quo if we had 137
> releases out there.

I don't lose sleep over it but it is a lot of bookkeeping.  The nice thing 
about Postgres' extension system is that we can do what we want without regard 
to the project's release schedule.  In general, y'all nailed the extension 
system back in the day.

I feel for the PostGIS folks as they've clearly got more, hmm, complete 
Postgres version support than ZDB does, and I'd definitely want to support 
anything that would make their lives easier.  But I also don't want to see 
something that pgrx users might want to adopt that might turn out to be bad for 
them.

I have a small list of things I'd love to see changed wrt extensions but I'm 
just the idea guy and don't have much in the way of patches to offer.  I also 
have some radical ideas about annotating the sources themselves, but I don't 
feel like lobbing a grenade today.

eric



Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-01 Thread Maciek Sakrejda
On Mon, May 1, 2023, 18:08 Robert Haas  wrote:

> I am saying that, while wraparound is perhaps not a perfect term
> for what's happening, it is not, in my opinion, a bad term either.


I don't want to put words into Peter's mouth, but I think that he's arguing
that the term "wraparound" suggests that there is something special about
the transition between xid 2^32 and xid 0 (or, well, 3). There isn't.
There's only something special about the transition, as your current xid
advances, between the xid that's half the xid space ahead of your current
xid and the xid that's half the xid space behind the current xid, if the
latter is not frozen. I don't think that's what most users think of when
they hear "wraparound".


Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-01 Thread Peter Geoghegan
On Mon, May 1, 2023 at 12:03 PM Maciek Sakrejda  wrote:
> I don't want to put words into Peter's mouth, but I think that he's arguing 
> that the term "wraparound" suggests that there is something special about the 
> transition between xid 2^32 and xid 0 (or, well, 3). There isn't.

Yes, that's exactly what I mean. There are two points that seem to be
very much in tension here:

1. The scenario where you corrupt the database in single user mode by
unsafely allocating XIDs (you need single user mode to bypass the
xidStopLimit protections) generally won't involve unsigned integer
wraparound (and if it does it's *entirely* incidental to the data
corruption).

2. Actual unsigned integer wraparound is 100% harmless and routine, by design.

So why do we use the term wraparound as a synonym of "the end of the
world"? I assume that it's just an artefact of how the system worked
before the invention of freezing. Back then, you had to do a dump and
restore when the system reached about 4 billion XIDs. Wraparound
really did mean "the end of the world" over 20 years ago.

This is related to my preference for explaining the issues with
reference to a 64-bit XID space. Today we compare 64-bit XIDs using
simple unsigned integer comparisons. That's the same way that 32-bit
XID comparisons worked before freezing was invented in 2001. So it
really does seem like the natural way to explain it.

-- 
Peter Geoghegan




Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-05-01 Thread Tom Lane
Eric Ridge  writes:
> FWIW, outside of major ZDB releases, most of those have little-to-zero schema 
> changes.  But that doesn't negate the fact each release needs its own 
> upgrade.sql script. I'm working on a point release right this moment and it 
> won't have any schema changes but it'll have an upgrade.sql script.

Hmm ... our model for the in-core extensions has always been that you
don't need a new upgrade script unless the SQL declarations change.

Admittedly, that means that the script version number isn't a real
helpful guide to which version of the .so you're dealing with.
We expect the .so's own minor version number to suffice for that,
but I realize that that might not be the most user-friendly answer.
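To illustrate that model with a hypothetical extension ("myext" and the
version numbers are made up):

```
myext.control         -- sets default_version = '1.2'
myext--1.0.sql        -- install script for CREATE EXTENSION ... VERSION '1.0'
myext--1.0--1.1.sql   -- upgrade script: needed only when SQL declarations change
myext--1.1--1.2.sql   -- ALTER EXTENSION ... UPDATE chains these as needed
```

A version with no SQL changes needs no new script; the .so can move ahead
independently of the script version.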

regards, tom lane




Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-05-01 Thread Eric Ridge


> On May 1, 2023, at 4:24 PM, Tom Lane  wrote:
> 
> Eric Ridge  writes:
>> FWIW, outside of major ZDB releases, most of those have little-to-zero 
>> schema changes.  But that doesn't negate the fact each release needs its own 
>> upgrade.sql script. I'm working on a point release right this moment and it 
>> won't have any schema changes but it'll have an upgrade.sql script.
> 
> Hmm ... our model for the in-core extensions has always been that you
> don't need a new upgrade script unless the SQL declarations change.

That's a great model when the schemas only change once every few years or never.

> Admittedly, that means that the script version number isn't a real
> helpful guide to which version of the .so you're dealing with.

It isn't.  ZDB, and I think (at least) PostGIS, have their own "version()" 
function.  Keeping everything the same version keeps me "sane" and eliminates a 
class of round-trip questions with users.

So when I say "little-to-zero schema changes" I mean that at least the 
version() function changes -- it's a `LANGUAGE sql` function for easy human 
inspection.

> We expect the .so's own minor version number to suffice for that,
> but I realize that that might not be the most user-friendly answer.

One of my desires is that the on-disk .so's filename be associated with the 
pg_extension entry and not Each. Individual. Function.  There's a few 
extensions that like to version the on-disk .so's filename which means a CREATE 
OR REPLACE for every function on every extension version bump.  That forces an 
upgrade script even if the schema didn't technically change and also creates 
the need for bespoke tooling around extension.sql and upgrade.sql scripts.

But I don't want to derail this thread.

eric



Order problem in GiST index

2023-05-01 Thread Kotroczó Roland
Hi!
We are developing a GiST index structure based on the M-tree, and we have a 
problem with it. We would like to retrieve the closest neighbors: in our SELECT 
we order the query by distance and use LIMIT. Sometimes it works well, but in 
some cases we don't get back the closest points. If we set the recheck flag to 
true, the query results are correct.
I examined the CUBE extension and realized its recheck flag is false. According 
to the comments, it's not necessary to set recheck to true in CUBE. What 
conditions must be met for the index to work correctly even with recheck=false? 
How does it work in CUBE?
Thanks for your replies.


RE: [PATCH] Support % wildcard in extension upgrade filenames

2023-05-01 Thread Regina Obe
> It isn't.  ZDB, and I think (at least) PostGIS, have their own "version()"
function.
> Keeping everything the same version keeps me "sane" and eliminates a class
> of round-trip questions with users.
> 
Yes we have several version numbers and yes we too like to keep the
extension version the same, cause it's the only thing that is universally
consistent across all PostgreSQL extensions.

Yes we have our own internal version functions. One for PostGIS (has the
true lib version file) and a script version and we have logic to make sure
they are aligned.  We even have versions for our dependencies (PROJ, GEOS,
GDAL) cause behavior of PostGIS changes based on versions of those.
This goes for each extension we package that has a lib file (postgis,
postgis_raster, postgis_topology, postgis_sfcgal)

In addition to that we also have a version for PostgreSQL (that the scripts
were installed on), to catch cases where an upgrade is needed to go from
3.3.0 to 3.3.0 (e.g. after a pg_upgrade).
Yes we need same version upgrades (particularly because of pg_upgrade).
Sandro and I were talking about this.  This is something we do in our
postgis_extensions_upgrade()  (basically forcing an upgrade to a version
that does nothing, so we can force an upgrade to 3.3.0 again) to make up for
this limitation in the extension model.

The reason for that is features get exposed based on version of PostgreSQL
you are running.
So in 3.3.0 we leveraged the new gist fast indexing build, which is only
enabled for users running PostgreSQL 15 and above.

What usually happens is someone has PostGIS 3.3.0, say on PG 14; they
pg_upgrade to PG 15 but are still running with the PG 14 scripts.
So they are not taking advantage of the new PG 15 features until they do a

SELECT postgis_extensions_upgrade();

So this is why we need the DO commands for scenarios like this.

> One of my desires is that the on-disk .so's filename be associated with
> the pg_extension entry and not Each. Individual. Function.  There's a few
> extensions that like to version the on-disk .so's filename which means a
> CREATE OR REPLACE for every function on every extension version bump.
> That forces an upgrade script even if the schema didn't technically change
> and also creates the need for bespoke tooling around extension.sql and
> upgrade.sql scripts.
> 
> But I don't want to derail this thread.
> 
> eric=

This is more or less the reason why we had to do CREATE OR REPLACE for all
our functions.
In the past we minor versioned our lib files so we had postgis-2.4,
postgis-2.5

At 3.0, after much in-fighting (a battle between the convenience of
developers vs. regular old users just wanting to use PostGIS, and my
frustration trying to hold people's hands through pg_upgrade), we settled
on the major version for the lib file, with the option for developers to
still keep the minor.

So default install will be postgis-3 for life of 3 series and become
postgis-4 when 4 comes along (hopefully not for another 10 years).
We decided against completely stripping the version, because with that
change we have a whole baggage of legacy functions that we needed to stub
as we removed them, so that pg_upgrade will work more or less seamlessly.
Come postgis-4, these stubs will be removed.

Many of our CI bots, however, do use the minor versionings 3.1, 3.2,
3.3, etc., because it's easier to test upgrades and do regressions.
Many PostGIS developers do the same.  So a replace of all functions is
still largely needed.  This is one of the reasons the whole chained upgrade
path never worked for us and why we settled on one script to handle
everything.






Re: Add PQsendSyncMessage() to libpq

2023-05-01 Thread Michael Paquier
On Sun, Apr 30, 2023 at 01:59:17AM +0100, Anton Kirilov wrote:
> I did a quick check using the TechEmpower Framework Benchmarks (
> https://www.techempower.com/benchmarks/ ) - they define 4 Web application
> tests that are database-bound. Everything was running on a single machine,
> and 3 of the tests had an improvement of 29.16%, 32.30%, and 41.78%
> respectively in the number of requests per second (Web application requests,
> not database queries), while the last test regressed by 0.66% (which I would
> say is practically no difference, given that there is always some
> measurement noise). I will try to get the changes from my patch tested in
> the project's continuous benchmarking environment, which has a proper set up
> with 3 servers (client, application server, and database) connected by a
> 10GbE link.

Well, these are nice numbers.  At ~1% I am ready to buy the noise
argument, but what would be the range of the usual noise when it comes
to multiple runs under the same conditions?

Let's make sure that the API is as intuitive as possible (Robert
commented about that a few days ago; I still need to follow up on
that).
--
Michael


signature.asc
Description: PGP signature


Re: Add PQsendSyncMessage() to libpq

2023-05-01 Thread Michael Paquier
On Sat, Apr 29, 2023 at 05:06:03PM +0100, Anton Kirilov wrote:
> In any case I am not particularly attached to any naming or the exact shape
> of the new API, as long as it achieves the same goal (reducing the number of
> system calls), but before I make any substantial changes to my patch, I
> would like to hear your thoughts on the matter.

Another thing that may matter is extensibility.  Would a boolean
argument really be the best design?  Could it be better to instead have
one API with a bits32 and some flags controlling its internals?
--
Michael


signature.asc
Description: PGP signature


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2023-05-01 Thread Tatsuo Ishii
> The attached test patch is mostly the same as in the previous patch
> set, but it doesn't fail on row_number anymore as the main patch
> only rejects aggregate functions. The test patch also adds a test for

> +SELECT sum(orbit) RESPECT NULLS OVER () FROM planets; -- succeeds

I think the standard does not allow RESPECT NULLS to be specified for
functions other than lead, lag, first_value, last_value and nth_value.
Unless we agree that PostgreSQL violates the standard in this regard,
you should not allow RESPECT NULLS for window functions except lead
etc., nor for aggregates.

See my patch.

> +/*
> + * Window function option clauses
> + */
> +opt_null_treatment:
> + RESPECT NULLS_P 
> { $$ = RESPECT_NULLS; }
> + | IGNORE_P NULLS_P  
> { $$ = IGNORE_NULLS; }
> + | /*EMPTY*/ 
> { $$ = NULL_TREATMENT_NOT_SET; }
> + ;

With this, you can check whether the null treatment clause is used in
each window function.

In my previous patch I did the check at parse/analysis time, but I
think it's better for the check to be done in each window function.
This way:

- there is no need to add a column to pg_proc.

- user-defined window functions can decide by themselves whether
  they accept the null treatment option.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Large files for relations

2023-05-01 Thread Thomas Munro
Big PostgreSQL databases use and regularly open/close huge numbers of
file descriptors and directory entries for various anachronistic
reasons, one of which is the 1GB RELSEG_SIZE thing.  The segment
management code is trickier than you might think and also still
harbours known bugs.

A nearby analysis of yet another obscure segment life cycle bug
reminded me of this patch set to switch to simple large files and
eventually drop all that.  I originally meant to develop the attached
sketch-quality code further and try proposing it in the 16 cycle,
while I was down the modernisation rabbit hole[1], but then I got side
tracked: at some point I believed that the 56 bit relfilenode thing
might be necessary for correctness, but then I found a set of rules
that seem to hold up without that.  I figured I might as well post
what I have early in the 17 cycle as a "concept" patch to see which
way the flames blow.

There are various boring details due to Windows, and then a load of
fairly obvious changes, and then a whole can of worms about how we'd
handle the transition for the world's fleet of existing databases.
I'll cut straight to that part.  Different choices on aggressiveness
could be made, but here are the straw-man answers I came up with so
far:

1.  All new relations would be in large format only.  No 16384.N
files, just 16384 that can grow to MaxBlockNumber * BLCKSZ.

2.  The existence of a file 16384.1 means that this smgr relation is
in legacy segmented format that came from pg_upgrade (note that we
don't unlink that file once it exists, even when truncating the fork,
until we eventually drop the relation).

3.  Forks that were pg_upgrade'd from earlier releases using hard
links or reflinks would implicitly be in large format if they only had
one segment, and otherwise they could stay in the traditional format
for a grace period of N major releases, after which we'd plan to drop
segment support.  pg_upgrade's [ref]link mode would therefore be the
only way to get a segmented relation, other than a developer-only
trick for testing/debugging.

4.  Every opportunity to convert a multi-segment fork to large format
would be taken: pg_upgrade in copy mode, basebackup, COPY DATABASE,
VACUUM FULL, TRUNCATE, etc.  You can see approximately working sketch
versions of all the cases I thought of so far in the attached.

5.  The main places that do file-level copying of relations would use
copy_file_range() to do the splicing, so that on file systems that are
smart enough (XFS, ZFS, BTRFS, ...) with qualifying source and
destination, the operation can be very fast, and other degrees of
optimisation are available to the kernel too even for file systems
without block sharing magic (pushing down block range copies to
hardware/network storage, etc).  The copy_file_range() stuff could
also be proposed independently (I vaguely recall it was discussed a
few times before), it's just that it really comes into its own when
you start splicing files together, as needed here, and it's also been
adopted by FreeBSD with the same interface as Linux and has an
efficient implementation in bleeding edge ZFS there.

Stepping back, the main ideas are: (1) for some users of large
databases, it would be painlessly done at upgrade time without even
really noticing, using modern file system facilities where possible
for speed; (2) for anyone who wants to defer that because of lack of
fast copy_file_range() and a desire to avoid prolonged downtime by
using links or reflinks, concatenation can be put off for the next N
releases, giving a total of 5 + N years of option to defer the work,
and in that case there are also many ways to proactively change to
large format before the time comes with varying degrees of granularity
and disruption.  For example, set up a new replica and fail over, or
VACUUM FULL tables one at a time, etc.

There are plenty of things left to do in this patch set: pg_rewind
doesn't understand optional segmentation yet, there are probably more
things like that, and I expect there are some ssize_t vs pgoff_t
confusions I missed that could bite a 32 bit system.  But you can see
the basics working on a typical system.

I am not aware of any modern/non-historic filesystem[2] that can't do
large files with ease.  Anyone know of anything to worry about on that
front?  I think the main collateral damage would be to odd old external
tools, like some ancient version of Windows tar I occasionally see
mentioned, that sort of thing, but that'd just be another case of
"well don't use that then", I guess?  What else might we need to think
about, outside PostgreSQL?

What other problems might occur inside PostgreSQL?  Clearly we'd need
to figure out a decent strategy to automate testing of all of the
relevant transitions.  We could test the splicing code paths with an
optional test suite that you might enable along with a small segment
size (as we're already testing on CI and probably BF after the last
round of segmentation bugs).  To test t

Re: Tab completion for CREATE SCHEMA AUTHORIZATION

2023-05-01 Thread Michael Paquier
On Sat, Apr 15, 2023 at 11:06:25AM +0900, Michael Paquier wrote:
> Thanks, I'll look at it.

+   else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION", MatchAny) ||
+Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", MatchAny))
+   COMPLETE_WITH("CREATE", "GRANT");
+   else if (Matches("CREATE", "SCHEMA", MatchAny))
+   COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");

I had this grammar under my eyes a few days ago for a different patch,
and there are many more object types that can be appended to a CREATE
SCHEMA, like triggers, sequences, tables or views, so this is
incomplete, isn't it?
--
Michael


signature.asc
Description: PGP signature


Re: Improve logging when using Huge Pages

2023-05-01 Thread Michael Paquier
On Thu, Apr 20, 2023 at 02:16:17PM -0700, Nathan Bossart wrote:
> AFAICT this would involve adding a bool to BackendParameters and using it
> in save_backend_variables() and restore_backend_variables(), which is an
> additional 3 lines of code.  That doesn't sound too bad to me, but perhaps
> I am missing something.

Appending more information to BackendParameters would be OK, still
this would require the extra SQL function to access it, which is
something that pg_settings is able to equally offer access to if
using a GUC.
--
Michael


signature.asc
Description: PGP signature


Re: Autogenerate some wait events code and documentation

2023-05-01 Thread Thomas Munro
> [patch]

This is not a review of the perl/make/meson glue/details, but I just
wanted to say thanks for working on this Bertrand & Michael, at a
quick glance that .txt file looks like it's going to be a lot more fun
to maintain!




Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-01 Thread Peter Geoghegan
On Mon, May 1, 2023 at 5:34 AM John Naylor  wrote:
> Well, since you have a placeholder "xidStopLimit mode" in your other patch, 
> I'll confine my response to there. Inventing "modes" seems like an awkward 
> thing to backpatch, not to mention moving the goalposts. My modest goal here 
> is quite limited: to stop lying to our users about "not accepting commands", 
> and change tragically awful advice into sensible advice.

I can't argue with that.

> Here's my new idea:
>
> -HINT:  To avoid a database shutdown, [...]
> +HINT:  To prevent XID generation failure, [...]
>
> Actually, I like "allocation" better, but the v8 patch now has "generation" 
> simply because one MXID message already has "generate" and I did it that way 
> before thinking too hard. I'd be okay with either one as long as it's 
> consistent.

WFM.

> Granted. Whatever form your rewrite ends up in, it could make a lot of sense 
> to then backpatch a few localized corrections. I wouldn't even object to 
> including a few substitutions of s/wraparound failure/allocation failure/  
> where appropriate. Let's see how that shakes out first.

Makes sense.

> > > I think the docs would do well to have ordered steps for recovering from 
> > > both XID and MXID exhaustion.
> >
> > I had planned to address this with my ongoing work on the "Routine
> > Vacuuming" docs, but I think that you're right about the necessity of
> > addressing it as part of this patch.
>
> 0003 is now a quick-and-dirty attempt at that, only in the docs. The MXID 
> part is mostly copy-pasted from the XID part, just to get something together. 
> I'd like to abbreviate that somehow.

Yeah, the need to abbreviate statements about MultiXact IDs by saying
that they work analogously to XIDs in some particular respect
is...another thing that makes this tricky.

I don't think that Multis are fundamentally different to XIDs. I
believe that the process through which VACUUM establishes its
OldestMXact cutoff can be pessimistic compared to OldestXmin, but I
don't think that it changes the guidance you'll need to give here.
VACUUM should always be able to advance relminmxid right up until
OldestMXact, if that's what the user insists on. For example, VACUUM
FREEZE sometimes allocates new Multis, just to be able to do that.

Obviously there are certain things that can hold back OldestMXact by a
wildly excessive amount. But I don't think that there is anything that
can hold back OldestMXact by a wildly excessive amount that won't more
or less do the same thing to OldestXmin.

-- 
Peter Geoghegan




Re: Support logical replication of DDLs

2023-05-01 Thread shveta malik
On Fri, Apr 28, 2023 at 5:11 PM Amit Kapila  wrote:
>
> On Tue, Apr 25, 2023 at 9:28 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
>
> I have a few high-level comments on the deparsing approach used in the
> patch. As per my understanding, we first build an ObjTree from the DDL
> command, then convert the ObjTree to Jsonb which is then converted to
> a JSON string.  Now, in the consecutive patch, via publication event
> triggers, we get the JSON string via the conversions mentioned, WAL
> log it, which then walsender will send to the subscriber, which will
> convert the JSON string back to the DDL command and execute it.
>
> Now, I think we can try to eliminate this entire ObjTree machinery and
> directly form the JSON blob during deparsing. We have previously also
> discussed this in an email chain at [1]. I think now the functionality
> of JSONB has also been improved and we should investigate whether it
> is feasible to directly use JSONB APIs to form the required blob.

+1.
I will investigate this and will share my findings.

thanks
Shveta




Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-01 Thread Peter Geoghegan
On Mon, May 1, 2023 at 7:55 PM Peter Geoghegan  wrote:
> Obviously there are certain things that can hold back OldestMXact by a
> wildly excessive amount. But I don't think that there is anything that
> can hold back OldestMXact by a wildly excessive amount that won't more
> or less do the same thing to OldestXmin.

Actually, it's probably possible for a transaction that only has a
virtual transaction ID to call MultiXactIdSetOldestVisible(), which
will then have the effect of holding back OldestMXact without also
holding back OldestXmin (in READ COMMITTED mode).

Will have to check to make sure, but that won't happen today.

--
Peter Geoghegan




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-01 Thread John Naylor
On Tue, May 2, 2023 at 12:09 AM Peter Geoghegan  wrote:
>
> On Mon, May 1, 2023 at 9:16 AM Peter Geoghegan  wrote:
> > On Mon, May 1, 2023 at 9:08 AM Robert Haas 
wrote:
> > > I disagree. If you start the cluster in single-user mode, you can
> > > actually wrap it around, unless something has changed that I don't
> > > know about.

+1 Pretending otherwise is dishonest.

> > This patch relies on John's other patch which strongly discourages the
> > use of single-user mode. Were it not for that, I might agree.

Oh that's rich. I'll note that 5% of your review was actually helpful
(actual correction), the other 95% was needless distraction trying to
enlist me in your holy crusade against the term "wraparound". It had the
opposite effect.

> Also, it's not clear that the term "wraparound" even describes what
> happens when you corrupt the database by violating the "no more than
> ~2.1 billion XIDs distance between any two unfrozen XIDs" invariant in
> single-user mode. What specific thing will have wrapped around?

In your first message you said "I'm hoping that I don't get too much push
back on this, because it's already very difficult work."

Here's some advice on how to avoid pushback:

1. Insist that all terms can only be interpreted in the most pig-headedly
literal sense possible.
2. Use that premise to pretend basic facts are a complete mystery.
3. Claim that others are holding you back, and then try to move the
goalposts in their work.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-01 Thread Peter Geoghegan
On Mon, May 1, 2023 at 8:04 PM John Naylor  wrote:
> Here's some advice on how to avoid pushback:
>
> 1. Insist that all terms can only be interpreted in the most pig-headedly 
> literal sense possible.
> 2. Use that premise to pretend basic facts are a complete mystery.

I can't imagine why you feel it necessary to communicate with me like
this. This is just vitriol, lacking any substance.

How we use words like wraparound is actually something of great
consequence to the Postgres project. We've needlessly scared users
with the way this information has been presented up until now -- that
much is clear. To have you talk to me like this when I'm working on
such a difficult, thankless task is a real slap in the face.

> 3. Claim that others are holding you back, and then try to move the goalposts 
> in their work.

When did I say that? When did I even suggest it?

-- 
Peter Geoghegan




Re: Perform streaming logical transactions by background workers and parallel apply

2023-05-01 Thread Amit Kapila
On Fri, Apr 28, 2023 at 11:48 AM Masahiko Sawada  wrote:
>
> While investigating this issue, I've reviewed the code around
> callbacks and worker termination etc and I found a problem.
>
> A parallel apply worker calls the before_shmem_exit callbacks in the
> following order:
>
> 1. ShutdownPostgres()
> 2. logicalrep_worker_onexit()
> 3. pa_shutdown()
>
> Since the worker is detached during logicalrep_worker_onexit(),
> MyLogicalRepWorker->leader_pid is invalid when we call
> pa_shutdown():
>
> static void
> pa_shutdown(int code, Datum arg)
> {
> Assert(MyLogicalRepWorker->leader_pid != InvalidPid);
> SendProcSignal(MyLogicalRepWorker->leader_pid,
>PROCSIG_PARALLEL_APPLY_MESSAGE,
>InvalidBackendId);
>
> Also, if the parallel apply worker fails shm_toc_lookup() during the
> initialization, it raises an error (because of noError = false) but
> ends up a SEGV as MyLogicalRepWorker is still NULL.
>
> I think that we should not use MyLogicalRepWorker->leader_pid in
> pa_shutdown() but instead store the leader's pid to a static variable
> before registering pa_shutdown() callback.
>

Why not simply move the registration of pa_shutdown() to someplace
after logicalrep_worker_attach()? BTW, it seems we don't have access
to MyLogicalRepWorker->leader_pid till we attach to the worker slot
via logicalrep_worker_attach(), so we anyway need to do what you are
suggesting after attaching to the worker slot.

-- 
With Regards,
Amit Kapila.




Re: Fix typos and inconsistencies for v16

2023-05-01 Thread Michael Paquier
On Fri, Apr 21, 2023 at 12:00:00PM +0300, Alexander Lakhin wrote:
> Please look at the following two bunches for v14+ and v13+ (split to ease
> back-patching if needed). Having processed them, I've reached the state that
> could be considered "clean" ([2], [3]); at least I don't see how to detect
> yet more errors of this class in dozens, so it's my last run for now (though I
> have several entities left, which I couldn't find replacements for).

This was hanging around, and I had some time, so I have looked at the
whole.  One of the only two user-visible change was in the docs for
pg_amcheck, so I have applied that first as of 6fd8ae6 and backpatched
it down to 14.

Now, for the remaining 59..

> 1. agg_init_trans_check -> agg_trans
> 2. agg_strict_trans_check -> agg_trans

/*
 * pergroup = &aggstate->all_pergroups
-* [op->d.agg_strict_trans_check.setoff]
-* [op->d.agg_init_trans_check.transno];
+* [op->d.agg_trans.setoff]
+* [op->d.agg_trans.transno];
 */
Honestly, while incorrect, I have no idea what this comment means ;)

> 4. CommitTSBuffer -> CommitTsBuffer // the inconsistency exists since 
> 5da14938f; maybe this change should be backpatched

Yes, we'd better backpatch that.  I agree that it seems more sensible
here to switch the compiled value rather than what the docs have been
using for years.  Perhaps somebody has a different opinion?

The others were OK and in line with the discussion of upthread, so
applied.
--
Michael


signature.asc
Description: PGP signature


Re: Large files for relations

2023-05-01 Thread Pavel Stehule
Hi

I like this patch - it can save some system resources - I am not sure
how much, because bigger tables usually use partitioning.

Important note - this feature breaks sharing files on the backup side - so
before disabling 1GB sized files, this issue should be solved.

Regards

Pavel


RE: Perform streaming logical transactions by background workers and parallel apply

2023-05-01 Thread Zhijie Hou (Fujitsu)
On Friday, April 28, 2023 2:18 PM Masahiko Sawada  wrote:
> 
> On Fri, Apr 28, 2023 at 11:51 AM Amit Kapila  wrote:
> >
> > On Wed, Apr 26, 2023 at 4:11 PM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > On Wednesday, April 26, 2023 5:00 PM Alexander Lakhin wrote:
> > > >
> > > > IIUC, that assert will fail in case of any error raised between
> > > > ApplyWorkerMain()->logicalrep_worker_attach()->before_shmem_exit() and
> > > > ApplyWorkerMain()->InitializeApplyWorker()->BackgroundWorkerInitializeConnectionByOid()->InitPostgres().
> > >
> > > Thanks for reporting the issue.
> > >
> > > I think the problem is that it tried to release locks in
> > > logicalrep_worker_onexit() before the initialization of the process is
> complete
> > > because this callback function was registered before the init phase. So I
> think we
> > > can add a conditional statement before releasing locks. Please find an
> attached
> > > patch.
> > >
> >
> > Alexander, does the proposed patch fix the problem you are facing?
> > Sawada-San, and others, do you see any better way to fix it than what
> > has been proposed?
> 
> I'm concerned that the idea of relying on IsNormalProcessingMode()
> might not be robust since if we change the meaning of
> IsNormalProcessingMode() some day it would silently break again. So I
> prefer using something like InitializingApplyWorker, or another idea
> would be to do cleanup work (e.g., fileset deletion and lock release)
> in a separate callback that is registered after connecting to the
> database.

Thanks for the review. I agree that it’s better to use a new variable here.
Attach the patch for the same.


> 
> FWIW, we might need to be careful about the timing when we call
> logicalrep_worker_detach() in the worker's termination process. Since
> we rely on IsLogicalParallelApplyWorker() for the parallel apply
> worker to send ERROR messages to the leader apply worker, if an ERROR
> happens after logicalrep_worker_detach(), we will end up with the
> assertion failure.
> 
> if (IsLogicalParallelApplyWorker())
> SendProcSignal(pq_mq_parallel_leader_pid,
>PROCSIG_PARALLEL_APPLY_MESSAGE,
>pq_mq_parallel_leader_backend_id);
> else
> {
> Assert(IsParallelWorker());
>
> It normally would be a should-not-happen case, though.

Yes, I think currently the PA worker sends the ERROR message before
exiting, so the callback functions are always fired after the above
code, which looks fine to me.

Best Regards,
Hou zj


v2-0001-Fix-assert-failure-in-logical-replication-apply-w.patch
Description:  v2-0001-Fix-assert-failure-in-logical-replication-apply-w.patch


Re: Perform streaming logical transactions by background workers and parallel apply

2023-05-01 Thread Amit Kapila
On Tue, May 2, 2023 at 9:06 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Friday, April 28, 2023 2:18 PM Masahiko Sawada  
> wrote:
> >
> > >
> > > Alexander, does the proposed patch fix the problem you are facing?
> > > Sawada-San, and others, do you see any better way to fix it than what
> > > has been proposed?
> >
> > I'm concerned that the idea of relying on IsNormalProcessingMode()
> > might not be robust since if we change the meaning of
> > IsNormalProcessingMode() some day it would silently break again. So I
> > prefer using something like InitializingApplyWorker, or another idea
> > would be to do cleanup work (e.g., fileset deletion and lock release)
> > in a separate callback that is registered after connecting to the
> > database.
>
> Thanks for the review. I agree that it’s better to use a new variable here.
> Attach the patch for the same.
>

+ *
+ * However, if the worker is being initialized, there is no need to release
+ * locks.
  */
- LockReleaseAll(DEFAULT_LOCKMETHOD, true);
+ if (!InitializingApplyWorker)
+ LockReleaseAll(DEFAULT_LOCKMETHOD, true);

Can we slightly reword this comment as: "The locks will be acquired
once the worker is initialized."?

-- 
With Regards,
Amit Kapila.




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-01 Thread Peter Geoghegan
On Mon, May 1, 2023 at 8:04 PM John Naylor  wrote:
> Oh that's rich. I'll note that 5% of your review was actually helpful (actual 
> correction), the other 95% was needless distraction trying to enlist me in 
> your holy crusade against the term "wraparound". It had the opposite effect.

I went back and checked. There were exactly two short paragraphs about
wraparound terminology on the thread associated with the patch you're
working on, towards the end of this one email:

https://postgr.es/m/CAH2-Wzm2fpPQ_=pxprvkniutybgtaufxrnw40klitxj9t3n...@mail.gmail.com

In what world does that amount to 95% of my review, or anything like it?

--
Peter Geoghegan




Re: ssl tests aren't concurrency safe due to get_free_port()

2023-05-01 Thread Peter Eisentraut

On 25.04.23 12:27, Peter Eisentraut wrote:
These patches have affected pgxs-using extensions that have their own 
TAP tests.


The portlock directory is created at

    my $build_dir = $ENV{top_builddir}
      || $PostgreSQL::Test::Utils::tmp_check;
    $portdir ||= "$build_dir/portlock";

but for a pgxs user, top_builddir points into the installation tree, 
specifically at $prefix/lib/pgxs/.


So when running "make installcheck" for an extension, we either won't 
have write access to that directory, or if we do, then it's still not 
good to write into the installation tree during a test suite.


A possible fix is

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 5dacc4d838..c493d1a60c 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -464,7 +464,7 @@ rm -rf '$(CURDIR)'/tmp_check && \
  $(MKDIR_P) '$(CURDIR)'/tmp_check && \
  cd $(srcdir) && \
     TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
-   PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \
+   PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)' \
     PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \
    $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)

  endef


Any thoughts on this?  I would like to get this into the upcoming minor 
releases.


Note that the piece of code shown here is only applicable to PGXS, so 
given that that is currently broken, this "can't make it worse".






Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-05-01 Thread Yurii Rashkovskii
On Mon, May 1, 2023 at 11:05 PM Eric Ridge  wrote:

>
> > We expect the .so's own minor version number to suffice for that,
> > but I realize that that might not be the most user-friendly answer.
>
> One of my desires is that the on-disk .so's filename be associated with
> the pg_extension entry and not Each. Individual. Function.  There's a few
> extensions that like to version the on-disk .so's filename which means a
> CREATE OR REPLACE for every function on every extension version bump.  That
> forces an upgrade script even if the schema didn't technically change and
> also creates the need for bespoke tooling around extension.sql and
> upgrade.sql scripts.
>

I understand the potential pain with this (though I suppose MODULE_PATHNAME
can somewhat alleviate it). However, I'd like to highlight that, beyond
the reference to a .so file in the control file, there's very little in
the way of actual dependency of the extension mechanism on shared
libraries, as that part relies on functions being able to use the C
language for their implementation.  Nothing I am aware of would stop
you from having multiple .so files in a given extension (for one reason
or another), and I think that's actually a great design, incidentally
or not. This does allow for a great deal of flexibility and an escape
hatch for when the straightforward case doesn't work [1]

If the extension subsystem wasn't replacing MODULE_PATHNAME, I don't think
there would be a reason for having `module_pathname` in the control file.
It doesn't preload the file or call anything from it. It's what `create
function` in the scripts will do. And that's actually great.

I am referencing this not because I don't think you know this but to try
and capture the higher-level picture here.

This doesn't mean we shouldn't improve the UX/DX of one of the common and
straightforward cases (having a single .so file with no other
complications) where possible.

[1]
https://yrashk.com/blog/2023/04/10/avoiding-postgres-extensions-limitations/
-- 
https://omnigr.es
Yurii.


Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-05-01 Thread Amit Kapila
On Fri, Apr 28, 2023 at 2:24 PM Drouvot, Bertrand
 wrote:
>
> > Can you
> > please explain the logic behind this test a bit more like how the WAL
> > file switch helps you to achieve the purpose?
> >
>
> The idea was to generate enough "wal switch" on the primary to ensure
> the WAL file has been removed.
>
> I gave another thought on it and I think we can skip the test that the WAL is
> not on the primary any more. That way, one "wal switch" seems to be enough
> to see it removed on the standby.
>
> It's done in V7.
>
> V7 is not doing "extra tests" than necessary and I think it's probably better 
> like this.
>
> I can see V7 failing on "Cirrus CI / macOS - Ventura - Meson" only (other 
> machines are not complaining).
>
> It does fail on "invalidated logical slots do not lead to retaining WAL", see 
> https://cirrus-ci.com/task/4518083541336064
>
> I'm not sure why it is failing, any idea?
>

I think the reason for the failure is that on standby, the test is not
able to remove the file corresponding to the invalid slot. You are
using pg_switch_wal() to generate a switch record and I think you need
one more WAL-generating statement after that to achieve your purpose,
which is that during checkpoint, the test removes the WAL file
corresponding to an invalid slot. Just doing a checkpoint on the primary may
not serve the need as that doesn't lead to any new insertion of WAL on
standby. Is your v6 failing in the same environment? If not, then it
is probably due to the reason that the test is doing insert after
pg_switch_wal() in that version. Why did you change the order of
insert in v7?

BTW, you can confirm the failure by changing the DEBUG2 message in
RemoveOldXlogFiles() to LOG. In the case where the test fails, it may
not remove the WAL file corresponding to an invalid slot whereas it
will remove the WAL file when the test succeeds.

-- 
With Regards,
Amit Kapila.