Re: remaining sql/json patches

2023-10-10 Thread Amit Langote
Hi Andres,

On Sat, Oct 7, 2023 at 6:49 AM Andres Freund  wrote:
> Hi,
>
> On 2023-09-29 13:57:46 +0900, Amit Langote wrote:
> > Thanks.  I will push the attached 0001 shortly.
>
> Sorry for not looking at this earlier.

Thanks for the review.  Replying here only to your comments on 0001.

> Have you done benchmarking to verify that 0001 does not cause performance
> regressions? I'd not be surprised if it did.

I found that it indeed did once I benchmarked with something that
would stress EEOP_IOCOERCE:

do $$
begin
for i in 1..2000 loop
i := i::text;
end loop; end; $$ language plpgsql;
DO

Times and perf report:

HEAD:

Time: 1815.824 ms (00:01.816)
Time: 1818.980 ms (00:01.819)
Time: 1695.555 ms (00:01.696)
Time: 1762.022 ms (00:01.762)

--97.49%--exec_stmts
  |
  --85.97%--exec_assign_expr
    |
    |--65.56%--exec_eval_expr
    |  |
    |  |--53.71%--ExecInterpExpr
    |  |  |
    |  |  |--14.14%--textin


Patched:

Time: 1872.469 ms (00:01.872)
Time: 1927.371 ms (00:01.927)
Time: 1910.126 ms (00:01.910)
Time: 1948.322 ms (00:01.948)

--97.70%--exec_stmts
  |
  --88.13%--exec_assign_expr
    |
    |--73.27%--exec_eval_expr
    |  |
    |  |--58.29%--ExecInterpExpr
    |  |  |
    |  |  |--25.69%--InputFunctionCallSafe
    |  |  |  |
    |  |  |  |--14.75%--textin

So, yes, putting InputFunctionCallSafe() in the common path may not
have been such a good idea.

> I'd split the soft-error path into
> a separate opcode. For JIT it can largely be implemented using the same code,
> eliding the check if it's the non-soft path. Or you can just put it into an
> out-of-line function.

Do you mean putting the execExprInterp.c code for the soft-error path
(with a new opcode) into an out-of-line function?  That definitely
makes the JIT version a tad simpler than if the error-checking is done
in-line.

So, the existing code for EEOP_IOCOERCE in both execExprInterp.c and
llvmjit_expr.c will remain unchanged.  Also, I can write the code for
the new opcode such that it doesn't use InputFunctionCallSafe() at
runtime, but rather passes the ErrorSaveContext directly by putting
that in the input function's FunctionCallInfo.context and checking
SOFT_ERROR_OCCURRED() directly.  That will have less overhead.
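
For illustration, a minimal sketch of what the interpreter case for such
an opcode could look like (the opcode name and the escontext field are
placeholders of mine, not from any posted patch):

    case EEOP_IOCOERCE_SAFE:    /* hypothetical opcode name */
    {
        FunctionCallInfo fcinfo_in = op->d.iocoerce.fcinfo_data_in;

        /* Let the input function report failures softly. */
        fcinfo_in->context = (Node *) op->d.iocoerce.escontext;  /* assumed field */
        *op->resvalue = FunctionCallInvoke(fcinfo_in);
        *op->resnull = fcinfo_in->isnull;

        /* On soft error, the result becomes NULL instead of an ereport(). */
        if (SOFT_ERROR_OCCURRED(op->d.iocoerce.escontext))
            *op->resnull = true;

        EEO_NEXT();
    }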

> I don't like adding more stuff to ExprState. This one seems particularly
> awkward, because it might be used by more than one level in an expression
> subtree, which means you really need to save/restore old values when
> recursing.

Hmm, I'd think that all levels will follow either soft or non-soft
error mode, so sharing the ErrorSaveContext passed via ExprState
doesn't look wrong to me.  IOW, there's only one value, not one for
every level, so there doesn't appear to be any need to have the
save/restore convention as we have for innermost_domainval et al.

I can see your point that adding another 8 bytes at the end of
ExprState might be undesirable.  Note though that ExprState.escontext
is only accessed during the ExecInitExpr phase, not during evaluation.

The alternative to not passing the ErrorSaveContext via ExprState is
to add a new parameter to ExecInitExprRec() and to functions that call
it.  The footprint would be much larger though.  Would you rather
prefer that?

> > @@ -1579,25 +1582,13 @@ ExecInitExprRec(Expr *node, ExprState *state,
> >
> >   /* lookup the result type's input function */
> >   scratch.d.iocoerce.finfo_in = palloc0(sizeof(FmgrInfo));
> > - scratch.d.iocoerce.fcinfo_data_in = palloc0(SizeForFunctionCallInfo(3));
> > -
> >   getTypeInputInfo(iocoerce->resulttype,
> > -  &iofunc, &typioparam);
> > +  &iofunc, &typioparam);
> >   fmgr_info(iofunc, scratch.d.iocoerce.finfo_in);
> >   fmgr_info_set_expr((Node *) node, scratch.d.iocoerce.finfo_in);
> > - InitFunctionCallInfoData(*scratch.d.iocoerce.fcinfo_data_in,
> > -   scratch.d.iocoerce.finfo_in,
> > -

Re: SQL:2011 application time

2023-10-10 Thread Paul Jungwirth

On 9/25/23 14:00, Peter Eisentraut wrote:

Looking through the tests in v16-0001:

+-- PK with no columns just WITHOUT OVERLAPS:
+CREATE TABLE temporal_rng (
+   valid_at tsrange,
+   CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OVERLAPS)
+);
+ERROR:  syntax error at or near "WITHOUT"
+LINE 3:  CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OV...
+  ^

I think this error is confusing.  The SQL standard requires at least one 
non-period column in a PK.  I don't know why that is or why we should 
implement it.  But if we want to implement it, maybe we should enforce 
that in parse analysis rather than directly in the parser, to be able to 
produce a more friendly error message.


Okay.

(I think the reason the standard requires one non-period column is to 
identify the "entity". If philosophically the row is an Aristotelian 
proposition about that thing, the period qualifies it as true just 
during some time span. So the scalar part is doing the work that a PK 
conventionally does, and the period part does something else. Perhaps a 
PK/UNIQUE constraint with no scalar part would still be useful, but not 
very often I think, and I'm not sure it makes sense to call it PK/UNIQUE.)



+-- PK with a range column/PERIOD that isn't there:
+CREATE TABLE temporal_rng (
+   id INTEGER,
+   CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT 
OVERLAPS)

+);
+ERROR:  range or PERIOD "valid_at" in WITHOUT OVERLAPS does not exist

I think here we should just produce a "column doesn't exist" error 
message, the same as if the "id" column was invalid.  We don't need to 
get into the details of what kind of column it should be.  That is done 
in the next test.


I'll change it. The reason for the different wording is that it might 
not be a column at all. It might be a PERIOD. So what about just "column 
or PERIOD doesn't exist"? (Your suggestion is fine too though.)



+ERROR:  column "valid_at" in WITHOUT OVERLAPS is not a range type

Also, in any case it would be nice to have a location pointer here (for 
both cases).


Agreed.


+-- PK with one column plus a range:
+CREATE TABLE temporal_rng (
+   -- Since we can't depend on having btree_gist here,
+   -- use an int4range instead of an int.
+   -- (The rangetypes regression test uses the same trick.)
+   id int4range,
+   valid_at tsrange,
+   CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT 
OVERLAPS)

+);

I'm confused why you are using int4range here (and in further tests) for 
the scalar (non-range) part of the primary key.  Wouldn't a plain int4 
serve here?


A plain int4 would be better, and it would match the normal use-case, 
but you must have btree_gist to create an index like that, and the 
regress tests can't assume we have that. Here is the part from 
sql/rangetypes.sql I'm referring to:


--
-- Btree_gist is not included by default, so to test exclusion
-- constraints with range types, use singleton int ranges for the "="
-- portion of the constraint.
--

create table test_range_excl(
  room int4range,
  speaker int4range,
  during tsrange,
  exclude using gist (room with =, during with &&),
  exclude using gist (speaker with =, during with &&)
);

+SELECT pg_get_indexdef(conindid, 0, true) FROM pg_constraint WHERE conname = 'temporal_rng_pk';
+                                pg_get_indexdef
+--------------------------------------------------------------------------------
+ CREATE UNIQUE INDEX temporal_rng_pk ON temporal_rng USING gist (id, valid_at)


Shouldn't this somehow show the operator classes for the columns?  We 
are using different operator classes for the id and valid_at columns, 
aren't we?


We only print the operator classes if they are not the default, so they 
don't appear here.
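
For instance (an illustration with made-up names, not from the patch):

create table t (txt text);
create index t_idx on t (txt text_pattern_ops);  -- non-default opclass
select pg_get_indexdef('t_idx'::regclass);
-- CREATE INDEX t_idx ON public.t USING btree (txt text_pattern_ops)
-- whereas the same index built with the default text_ops would omit
-- the opclass from the output.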


I do suspect something more is desirable though. For exclusion 
constraints we replace everything before the columns with just "EXCLUDE 
USING gist". I could embed WITHOUT OVERLAPS but it's not valid syntax in 
CREATE INDEX. Let me know if you have any ideas.



+-- PK with USING INDEX (not possible):
+CREATE TABLE temporal3 (
+   id int4range,
+   valid_at tsrange
+);
+CREATE INDEX idx_temporal3_uq ON temporal3 USING gist (id, valid_at);
+ALTER TABLE temporal3
+   ADD CONSTRAINT temporal3_pk
+   PRIMARY KEY USING INDEX idx_temporal3_uq;
+ERROR:  "idx_temporal3_uq" is not a unique index
+LINE 2:  ADD CONSTRAINT temporal3_pk
+ ^
+DETAIL:  Cannot create a primary key or unique constraint using such an index.


Could you also add a test where the index is unique and the whole thing 
does work?


No problem!

Apart from the tests, how about renaming the column 
pg_constraint.contemporal to something like conwithoutoverlaps?


Is that too verbose? I've got some code already changing it to 
conoverlaps but I'm probably happier with conwithoutoverlaps, assuming 
no one else minds it.


Yours,

--

Re: interval_ops shall stop using btequalimage (deduplication)

2023-10-10 Thread Peter Geoghegan
On Tue, Oct 10, 2023 at 8:51 PM Peter Geoghegan  wrote:
> I don't see any reason to delay committing your fix. The issue that
> you've highlighted is exactly the kind of issue that I anticipated
> might happen at some point. This seems straightforward.

BTW, we don't need to recommend the heapallindexed option in the
release notes. Calling bt_index_check() will reliably indicate that
corruption is present when called against existing interval_ops
indexes once your fix is in. Simply having an index whose metapage's
allequalimage field is spuriously set to true will be recognized as
corruption right away. Obviously, this will be no less true with an
existing interval_ops index that happens to be completely empty.
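
For example, something along these lines would be enough (assuming the
amcheck extension and the index "ti" from your reproducer upthread):

create extension if not exists amcheck;
select bt_index_check('ti');
-- raises a corruption error once the fix is in, because the metapage's
-- allequalimage field is spuriously set to true; no heapallindexed
-- pass is needed for that.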

-- 
Peter Geoghegan




Re: SQL:2011 application time

2023-10-10 Thread Paul Jungwirth

Hi Peter et al,

On 9/1/23 12:56, Paul Jungwirth wrote:

On 9/1/23 11:30, Peter Eisentraut wrote:
I think the WITHOUT OVERLAPS clause should be per-column, so that 
something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) 
would be possible.  Then the WITHOUT OVERLAPS clause would directly 
correspond to the choice between equality or overlaps operator per 
column.
I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any 
position) is a great recommendation that enables a lot of new 
functionality.


I've been working on implementing this, but I've come to think it is the 
wrong way to go.


If we support this in primary key and unique constraints, then we must 
also support it for foreign keys and UPDATE/DELETE FOR PORTION OF. But 
implementing that logic is pretty tricky. For example take a foreign key 
on (id, PERIOD valid_at, PERIOD asserted_at). We need to ensure the 
referenced two-dimensional time space `contains` the referencing 
two-dimensional space. You can visualize a rectangle in two-dimensional 
space for each referencing record (which we validate one at a time). The 
referenced records must be aggregated and so form a polygon (of all 
right angles). For example the referencing record may be (1, [0,2), 
[0,2)) with referenced records of (1, [0,2), [0,1)) and (1, [0,1), 
[1,2)). (I'm using int ranges since they're easier to read, but you could 
imagine these as dateranges like [2000-01-01,2002-01-01).) Now the 
range_agg of their valid_ats is [0,2) and of their asserted_ats is 
[0,2). But the referenced 2d space still doesn't contain the referencing 
space. It's got one corner missing. This is a well-known problem among 
game developers. We're lucky not to have arbitrary polygons, but it's 
still a tough issue.
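
A quick way to see the missing corner with plain SQL (a sketch using the
values above; ref and its column names are invented here):

select range_agg(valid_at) as valid_agg,
       range_agg(asserted_at) as asserted_agg
from (values ('[0,2)'::int4range, '[0,1)'::int4range),
             ('[0,1)'::int4range, '[1,2)'::int4range)) as ref(valid_at, asserted_at);
-- Both aggregates come out as {[0,2)}, yet the union of the two
-- rectangles misses the cell [1,2) x [1,2), so the referencing
-- rectangle ([0,2), [0,2)) is not actually contained.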


Besides `contains` we also need to compute `overlaps` and `intersects` 
to support these temporal features. Implementing that for 2d, 3d, etc 
looks very complicated, for something that is far outside the normal use 
case and also not part of the standard. It will cost a little 
performance for the normal 1d use case too.


I think a better approach (which I want to attempt as an add-on patch, 
not in this main series) is to support not just range types, but any 
type with the necessary operators. Then you could have an mdrange 
(multi-dimensional range) or potentially even an arbitrary n-dimensional 
polygon. (PostGIS has something like this, but its `contains` operator 
compares (non-concave) *bounding boxes*, so it would not work for the 
example above. Still the similarity between temporal and spatial data is 
striking. I'm going to see if I can get some input from PostGIS folks 
about how useful any of this is to them.) This approach would also let 
us use multiranges: not for multiple dimensions, but for non-contiguous 
time spans stored in a single row. This puts the complexity in the types 
themselves (which seems more appropriate) and is ultimately more 
flexible (supporting not just mdrange but also multirange, and other 
things too).


This approach also means that instead of storing a mask/list of which 
columns use WITHOUT OVERLAPS, I can just store one attnum. Again, this 
saves the common use-case from paying a performance penalty to support a 
much rarer one.


I've still got my multi-WITHOUT OVERLAPS work, but I'm going to switch 
gears to what I've described here. Please let me know if you disagree!


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: stopgap fix for signal handling during restore_command

2023-10-10 Thread Michael Paquier
On Tue, Oct 10, 2023 at 08:39:29PM -0700, Andres Freund wrote:
> We shouldn't call proc_exit() in a signal handler. We perhaps have a few
> remaining calls left, but we should (and I think in some cases are) working on
> removing those.

Hmm.  I don't recall anything remaining, even after a quick check.
FWIW, I was under the impression that Thomas' work done in
0da096d78e1e4 has cleaned up the last bits of that.
--
Michael




Re: interval_ops shall stop using btequalimage (deduplication)

2023-10-10 Thread Peter Geoghegan
On Tue, Oct 10, 2023 at 8:29 PM Noah Misch  wrote:
> My friend got an amcheck "lacks matching index tuple" failure, and they asked
> me about it.  I ran into the assertion failure while reproducing things.

Reminds me of the time that amcheck found a bug in the default btree
opclass for PostGIS's geography type.

The opclass wasn't really intended to be used for indexing. The issue
with the opclass (which violated transitive consistency) would
probably never have been detected were it not for the tendency of
PostGIS users to accidentally create useless B-Tree indexes on
geography columns. Users sometimes omitted "using gist", without
necessarily noticing that the index was practically useless.

> > Do we really need to change the catalog contents when backpatching?
>
> Not really.  I think we usually do.  On the other hand, unlike some past
> cases, there's no functional need for the catalog changes.  The catalog
> changes just get a bit of efficiency.  No strong preference here.

I'll defer to you on this question, then.

I don't see any reason to delay committing your fix. The issue that
you've highlighted is exactly the kind of issue that I anticipated
might happen at some point. This seems straightforward.

-- 
Peter Geoghegan




Re: Removing unneeded self joins

2023-10-10 Thread Andrei Lepikhov

On 11/10/2023 02:29, Alena Rybakina wrote:

I have reviewed your patch and I noticed a few things.


Thanks for your efforts,

I have looked at the latest version of the code, and I assume that the error 
lies in the replace_varno_walker function, especially in the place where 
we check whether the node is a Var and do not form any NullTest node.


It's not a bug, it's an optimization we discussed with Alexander above.

Secondly, I added some code in some places to catch erroneous cases, and 
added a condition so that we do not try to apply the self-join-removal 
transformation when the self-join list comes back empty after searching 
for it, or in general when there are no joins in the query. 
Besides, I added a query for testing and wrote about it above. I have 
attached my diff file.

Ok, I will look at this.

In addition, I found a comment for myself that was not clear to me. I 
would be glad if you could explain it to me.


You mentioned superior outer join in the comment, unfortunately, I 
didn't find anything about it in the PostgreSQL code, and this 
explanation remained unclear to me. Could you explain in more detail 
what you meant?
I meant here that only clauses pushed by reconsider_outer_join_clauses() 
can be found in the joininfo list, and they are not relevant, as you can 
understand.
Having written that, I realized that it was a false statement. ;) - 
joininfo can also contain results of previous SJE iterations, look:


CREATE TABLE test (oid int PRIMARY KEY);
CREATE UNIQUE INDEX ON test((oid*oid));
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c2.oid AND c1.oid*c2.oid=c3.oid*c3.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c3.oid AND c1.oid*c3.oid=c2.oid*c2.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c3.oid=c2.oid AND c3.oid*c2.oid=c1.oid*c1.oid;

Having executed this SQL code, you can see that in the last query the 
SJE feature didn't delete one of the JOINs, for the reason I wrote 
about above.

It's not a one-minute fix - I will try to propose a solution a bit later.

--
regards,
Andrey Lepikhov
Postgres Professional





Re: Problem, partition pruning for prepared statement with IS NULL clause.

2023-10-10 Thread tender wang
For a hash-partitioned table, if a partition key is constrained by an IS
NULL clause, the "if" condition in get_steps_using_prefix_recurse():
if (cur_keyno < step_lastkeyno - 1)
is not enough.
Take the crash case quoted below: explain select * from hp where a = 1 and b is
null and c = 1;
The prefix list just has the a = 1 clause.
I try to fix this in the attached patch.
David Rowley wrote on Wed, 11 Oct 2023 at 10:50:

> On Tue, 10 Oct 2023 at 21:31, Sergei Glukhov 
> wrote:
> > create table hp (a int, b text, c int, d int)
> >partition by hash (a part_test_int4_ops, b part_test_text_ops, c
> > part_test_int4_ops);
> > create table hp0 partition of hp for values with (modulus 4, remainder
> 0);
> > create table hp3 partition of hp for values with (modulus 4, remainder
> 3);
> > create table hp1 partition of hp for values with (modulus 4, remainder
> 1);
> > create table hp2 partition of hp for values with (modulus 4, remainder
> 2);
> >
> >
> > Another crash in the different place even with the fix:
> > explain select * from hp where a = 1 and b is null and c = 1;
>
> Ouch.  It looks like 13838740f tried to fix things in this area before
> and even added a regression test for it. Namely:
>
> -- Test that get_steps_using_prefix() handles non-NULL step_nullkeys
> explain (costs off) select * from hp_prefix_test where a = 1 and b is
> null and c = 1 and d = 1;
>
> I guess that one does not crash because the "d = 1" clause is in
> the "start" ListCell in get_steps_using_prefix_recurse(), whereas,
> with your case start is NULL which is an issue for cur_keyno =
> ((PartClauseInfo *) lfirst(start))->keyno;.
>
> It might have been better if PartClauseInfo could also describe IS
> NULL quals, but I feel if we do that now then it would require lots of
> careful surgery in partprune.c to account for that.  Probably the fix
> should be localised to get_steps_using_prefix_recurse() to have it do
> something like pass the keyno to try and work on rather than trying to
> get that from the "prefix" list. That way if there's no item in that
> list for that keyno, we can check in step_nullkeys for the keyno.
>
> I'll continue looking.
>
> David
>
>
>
From 1640c0d05269c3368fb41fcffc66e38630ff522d Mon Sep 17 00:00:00 2001
From: "tender.wang" 
Date: Wed, 11 Oct 2023 11:32:04 +0800
Subject: [PATCH] Fix null partition key pruning for hash partition table.

---
 src/backend/partitioning/partprune.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/partitioning/partprune.c 
b/src/backend/partitioning/partprune.c
index 7179b22a05..c2a388d454 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -2438,6 +2438,7 @@ get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
List   *result = NIL;
ListCell   *lc;
int cur_keyno;
+   int prefix_lastkeyno;
 
/* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
check_stack_depth();
@@ -2445,7 +2446,11 @@ get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
/* Check if we need to recurse. */
Assert(start != NULL);
cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
-   if (cur_keyno < step_lastkeyno - 1)
+   /* Note that for hash partitioning, if a partition key has an IS NULL
+    * clause, that partition key will not be present in the prefix list.
+    */
+   prefix_lastkeyno = ((PartClauseInfo *) llast(prefix))->keyno;
+   if (cur_keyno < step_lastkeyno - 1 && cur_keyno != prefix_lastkeyno)
{
PartClauseInfo *pc;
ListCell   *next_start;
-- 
2.25.1



Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-10 Thread Michael Paquier
On Wed, Oct 11, 2023 at 08:26:42AM +0900, Michael Paquier wrote:
> On Tue, Oct 10, 2023 at 09:12:49AM +0200, Drouvot, Bertrand wrote:
> > On 10/10/23 7:58 AM, Michael Paquier wrote:
> >> I was looking at v8 just before you sent this v9, and still got
> >> annoyed by the extra boolean argument added to InitPostgres().
> > 
> > Arf, I did not look at it as I had in mind to look at it once
> > this one is in.
> 
> No problem.  I'm OK to do it.

Applied 0001 for now.

> I am not sure that this is necessary in the code paths of
> BackgroundWorkerInitializeConnectionByOid() and
> BackgroundWorkerInitializeConnection() as datallowconn is handled a
> few lines down.

 /* flags for InitPostgres() */
 #define INIT_PG_LOAD_SESSION_LIBS  0x0001
 #define INIT_PG_OVERRIDE_ALLOW_CONNS   0x0002
+#define INIT_PG_BYPASS_ROLE_LOGIN  0x0004

In 0002, I am not sure that this is the best name for this new flag.
There is consistency with the bgworker part, for sure, but shouldn't
we name that OVERRIDE_ROLE_LOGIN instead in miscadmin.h?
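
For reference, a worker would request the new behavior roughly like this
(a sketch; the database and role names are invented):

	/* Connect while skipping the rolcanlogin check for the role. */
	BackgroundWorkerInitializeConnection("postgres", "nologin_role",
										 BGWORKER_BYPASS_ROLELOGINCHECK);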
--
Michael
From 550019a0ec84fcde8588bb9c2ae4ab08ad4d5fa5 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 10 Oct 2023 14:56:36 +0900
Subject: [PATCH v11] Allow background workers to bypass login check

This adds a new flag value (BGWORKER_BYPASS_ROLELOGINCHECK) to the flags
being used in BackgroundWorkerInitializeConnection() and
BackgroundWorkerInitializeConnectionByOid().  This flag allows the
background workers to bypass the login check done in
InitializeSessionUserId().
---
 src/include/miscadmin.h   |  4 ++-
 src/include/postmaster/bgworker.h |  6 ++--
 src/backend/postmaster/postmaster.c   |  6 
 src/backend/utils/init/miscinit.c |  4 +--
 src/backend/utils/init/postinit.c |  6 ++--
 .../modules/worker_spi/t/001_worker_spi.pl| 31 +++
 src/test/modules/worker_spi/worker_spi.c  |  2 ++
 doc/src/sgml/bgworker.sgml|  4 ++-
 8 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1cc3712c0f..e280b8192d 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -364,7 +364,8 @@ extern bool InSecurityRestrictedOperation(void);
 extern bool InNoForceRLSOperation(void);
 extern void GetUserIdAndContext(Oid *userid, bool *sec_def_context);
 extern void SetUserIdAndContext(Oid userid, bool sec_def_context);
-extern void InitializeSessionUserId(const char *rolename, Oid roleid);
+extern void InitializeSessionUserId(const char *rolename, Oid roleid,
+	bool bypass_login_check);
 extern void InitializeSessionUserIdStandalone(void);
 extern void SetSessionAuthorization(Oid userid, bool is_superuser);
 extern Oid	GetCurrentRoleId(void);
@@ -466,6 +467,7 @@ extern PGDLLIMPORT AuxProcType MyAuxProcType;
 /* flags for InitPostgres() */
 #define INIT_PG_LOAD_SESSION_LIBS		0x0001
 #define INIT_PG_OVERRIDE_ALLOW_CONNS	0x0002
+#define INIT_PG_BYPASS_ROLE_LOGIN		0x0004
 extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
 extern void InitializeMaxBackends(void);
 extern void InitPostgres(const char *in_dbname, Oid dboid,
diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h
index 7815507e3d..d7a5c1a946 100644
--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -150,9 +150,11 @@ extern void BackgroundWorkerInitializeConnectionByOid(Oid dboid, Oid useroid, ui
  * Flags to BackgroundWorkerInitializeConnection et al
  *
  *
- * Allow bypassing datallowconn restrictions when connecting to database
+ * Allow bypassing datallowconn restrictions and login check when connecting
+ * to database
  */
-#define BGWORKER_BYPASS_ALLOWCONN 1
+#define BGWORKER_BYPASS_ALLOWCONN 0x0001
+#define BGWORKER_BYPASS_ROLELOGINCHECK 0x0002
 
 
 /* Block/unblock signals in a background worker process */
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3d7fec995a..b1cabbd0d6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5567,6 +5567,9 @@ BackgroundWorkerInitializeConnection(const char *dbname, const char *username, u
 	/* ignore datallowconn? */
 	if (flags & BGWORKER_BYPASS_ALLOWCONN)
 		init_flags |= INIT_PG_OVERRIDE_ALLOW_CONNS;
+	/* bypass login check? */
+	if (flags & BGWORKER_BYPASS_ROLELOGINCHECK)
+		init_flags |= INIT_PG_BYPASS_ROLE_LOGIN;
 
 	/* XXX is this the right errcode? */
 	if (!(worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION))
@@ -5598,6 +5601,9 @@ BackgroundWorkerInitializeConnectionByOid(Oid dboid, Oid useroid, uint32 flags)
 	/* ignore datallowconn? */
 	if (flags & BGWORKER_BYPASS_ALLOWCONN)
 		init_flags |= INIT_PG_OVERRIDE_ALLOW_CONNS;
+	/* bypass login check? */
+	if (flags & BGWORKER_BYPASS_ROLELOGINCHECK)
+		init_flags |= INIT_PG_BYPASS_ROLE_LOGIN;
 
 	/* XXX is this the right errcode? */
 	if (!(worker->bgw_flags & 

Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only

2023-10-10 Thread Noah Misch
On Sun, Oct 08, 2023 at 10:00:03PM -0700, David G. Johnston wrote:
> On Sun, Oct 8, 2023 at 9:10 PM Noah Misch  wrote:
> > I didn't think of any phrasing that clearly explained things without the
> > reader consulting the code.  I considered these:

I'm leaning toward:

  "socket file descriptor out of range for select(): %d" [style guide violation]

A true style guide purist might bury it in a detail message:

  ERROR: socket file descriptor out of range: %d
  DETAIL: select() accepts file descriptors from 0 to 1023, inclusive, in this 
build.
  HINT: Try fewer concurrent database clients.

> >   "socket file descriptor out of range: %d" [what range?]
> >
> Quick drive-by...but it seems that < 0 is a distinctly different problem
> than exceeding FD_SETSIZE.  I'm unsure what would cause the former but the
> error for the latter seems like:
> 
> error: "Requested socket file descriptor %d exceeds connection limit of
> %d", fd, FD_SETSIZE-1
> hint: Reduce the requested number of concurrent connections
> 
> In short, the concept of range doesn't seem applicable here.  There is an
> error state at the max, and some invalid system state condition where the
> position within a set is somehow negative.  These should be separated -
> with the < 0 check happening first.

I view it as: the range of select()-able file descriptors is [0,FD_SETSIZE).
Negative is out of range.
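
In code form, the check under discussion amounts to something like this
(a sketch, not the exact committed wording):

	/* select(2) can only watch file descriptors in [0, FD_SETSIZE). */
	if (fd < 0 || fd >= FD_SETSIZE)
		pg_fatal("socket file descriptor out of range for select(): %d", fd);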

> And apparently this condition isn't applicable when dealing with jobs in
> connect_slot?

For both pgbench.c and parallel_slot.c, there are sufficient negative-FD
checks elsewhere in the file.  Ideally, either both files would have redundant
checks, or neither file would.  I didn't feel the need to mess with that part
of the status quo.

> Also, I see that for connections we immediately issue FD_SET
> so this check on the boundary of the file descriptor makes sense.  But the
> remaining code in connect_slot doesn't involve FD_SET so the test for the
> file descriptor falling within its maximum seems to be coming out of
> nowhere.  Likely this is all good, and the lack of symmetry is just due to
> the natural progressive development of the code, but it stands out to the
> uninitiated looking at this patch.

True.  The approach in parallel_slot.c is to check the FD number each time it
opens a socket, after which its loop with FD_SET() doesn't recheck.  That's a
bit more efficient than the pgbench.c way, because each file may call FD_SET()
many times per socket.  Again, I didn't mess with that part of the status quo.




Re: stopgap fix for signal handling during restore_command

2023-10-10 Thread Andres Freund
Hi,

On 2023-10-10 22:29:34 -0500, Nathan Bossart wrote:
> On Tue, Oct 10, 2023 at 09:54:18PM -0500, Nathan Bossart wrote:
> > On Tue, Oct 10, 2023 at 04:40:28PM -0700, Andres Freund wrote:
> >> I'd make these elog(PANIC), I think. The paths are not performance critical
> >> enough that a single branch hurts, so the overhead of the check is 
> >> irrelevant,
> >> and the consequences of calling ProcKill() twice for the same process are 
> >> very
> >> severe.
> > 
> > Right.  Should we write_stderr_signal_safe() and then abort() to keep these
> > paths async-signal-safe?
> 
> Hm.  I see that elog() is called elsewhere in proc_exit(), and it does not
> appear to be async-signal-safe.  Am I missing something?

We shouldn't call proc_exit() in a signal handler. We perhaps have a few
remaining calls left, but we should (and I think in some cases are) working on
removing those.

Greetings,

Andres Freund




Re: stopgap fix for signal handling during restore_command

2023-10-10 Thread Nathan Bossart
On Tue, Oct 10, 2023 at 09:54:18PM -0500, Nathan Bossart wrote:
> On Tue, Oct 10, 2023 at 04:40:28PM -0700, Andres Freund wrote:
>> I'd make these elog(PANIC), I think. The paths are not performance critical
>> enough that a single branch hurts, so the overhead of the check is 
>> irrelevant,
>> and the consequences of calling ProcKill() twice for the same process are 
>> very
>> severe.
> 
> Right.  Should we write_stderr_signal_safe() and then abort() to keep these
> paths async-signal-safe?

Hm.  I see that elog() is called elsewhere in proc_exit(), and it does not
appear to be async-signal-safe.  Am I missing something?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: interval_ops shall stop using btequalimage (deduplication)

2023-10-10 Thread Noah Misch
On Tue, Oct 10, 2023 at 08:12:36PM -0700, Peter Geoghegan wrote:
> On Tue, Oct 10, 2023 at 6:33 PM Noah Misch  wrote:
> > interval_ops, however, recognizes equal-but-distinguishable values:
> 
> > Fails with:
> >
> >   2498151 2023-10-10 05:06:46.177 GMT DEBUG:  building index "ti" on table 
> > "t" serially
> >   2498151 2023-10-10 05:06:46.178 GMT DEBUG:  index "ti" can safely use 
> > deduplication
> >   TRAP: failed Assert("!itup_key->allequalimage || keepnatts == 
> > _bt_keep_natts_fast(rel, lastleft, firstright)"), File: "nbtutils.c", Line: 
> > 2443, PID: 2498151
> 
> Nice catch.
> 
> Out of curiosity, how did you figure this out? Did it just occur to
> you that interval_ops had a behavior that made deduplication unsafe?
> Or did the problem come to your attention after running amcheck on a
> customer database? Or was it something else?

My friend got an amcheck "lacks matching index tuple" failure, and they asked
me about it.  I ran into the assertion failure while reproducing things.

> I'm a little surprised that it took this long to notice
> the interval_ops issue.

Agreed.  I don't usually store interval values in tables, and I'm not sure
I've ever indexed one.  Who knows.

> Do we really need to change the catalog contents when backpatching?

Not really.  I think we usually do.  On the other hand, unlike some past
cases, there's no functional need for the catalog changes.  The catalog
changes just get a bit of efficiency.  No strong preference here.




Re: Doc: Minor update for enable_partitionwise_aggregate

2023-10-10 Thread Andrew Atkinson
Thank you for the feedback and clarifications, Ashutosh. How about this?

"which allows grouping or aggregation on partitioned tables to be performed
separately for each partition."

Attached a v2 patch file with this change.

Here is a gist w/ a partitioned table and 2 partitions, comparing execution
plans after enabling the parameter, and reading the plan nodes up from the
bottom. With enable_partitionwise_aggregate = on, I can see the Partial
HashAggregate/"Group Key" on each of the two partitions (of the partitioned
table) where I don't see that when the setting is off (by default).
https://gist.github.com/andyatkinson/7af81fb8a5b9e677af6049e29ab2cb73
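
A self-contained variant of that comparison (a sketch; the table and
column names here are invented):

create table measurement (city_id int, reading int)
  partition by range (city_id);
create table measurement_p1 partition of measurement
  for values from (0) to (100);
create table measurement_p2 partition of measurement
  for values from (100) to (200);

set enable_partitionwise_aggregate = on;
explain (costs off)
  select reading, count(*) from measurement group by reading;
-- The plan can then show a Partial HashAggregate (with "Group Key:
-- reading") under each partition's scan, finalized above the Append.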

For the terms partitioned table vs. partitions, I used how they're
described here:
https://www.postgresql.org/docs/current/ddl-partitioning.html
- partitioned table (virtual table)
- partitions (of a partitioned table)

Thanks!
Andrew


On Tue, Oct 3, 2023 at 4:33 AM Ashutosh Bapat 
wrote:

> On Sun, Oct 1, 2023 at 7:38 AM Andy Atkinson 
> wrote:
> >
> > Hello. While reading the docs for the enable_partitionwise_aggregate
> parameter on the Query Planning page, I thought the description had a small
> mistake that could be improved.
> >
> > The current wording is: "which allows grouping or aggregation on a
> partitioned tables performed separately "
> >
> > Page: https://www.postgresql.org/docs/current/runtime-config-query.html
> >
> > I think possible better alternatives could be:
> >
> > (Option 1) a "partitioned table's partitions" (the possessive form of
> "it's"). The "enable_partition_pruning" parameter uses "the partitioned
> table's partitions" in this form. I think this option is good, but I had a
> slight preference for option 2.
> > (Option 2) Or to just cut out the first part and say "to be performed
> separately for each partition", which seemed simpler. So the sentence
> reads: "which allows grouping or aggregation to be performed separately for
> each partition"
>
> I would leave "on a partitioned table". Notice that I have removed "s"
> from tables.
>
> > (Option 3) dropping the "a" so it says "which allows grouping or
> aggregation on partitioned tables performed separately". I don't think this
> is as good though because the aggregation happens on the partitions, so it
> feels slightly off to me to say the "partitioned tables" instead of the
> partitions.
>
> It's technically incorrect as well. Aggregation is performed on a
> single relation always - a join or subquery or simple relation. A join
> may have multiple tables in it but the aggregation is performed on its
> result and not individual tables and hence not on partitions of
> individual tables.
>
> --
> Best Wishes,
> Ashutosh Bapat
>


query-planning-enable-partitionwise-aggregate-improved-wording_v2.patch
Description: Binary data


Re: interval_ops shall stop using btequalimage (deduplication)

2023-10-10 Thread Peter Geoghegan
On Tue, Oct 10, 2023 at 6:33 PM Noah Misch  wrote:
> interval_ops, however, recognizes equal-but-distinguishable values:

> Fails with:
>
>   2498151 2023-10-10 05:06:46.177 GMT DEBUG:  building index "ti" on table 
> "t" serially
>   2498151 2023-10-10 05:06:46.178 GMT DEBUG:  index "ti" can safely use 
> deduplication
>   TRAP: failed Assert("!itup_key->allequalimage || keepnatts == 
> _bt_keep_natts_fast(rel, lastleft, firstright)"), File: "nbtutils.c", Line: 
> 2443, PID: 2498151

Nice catch.

Out of curiosity, how did you figure this out? Did it just occur to
you that interval_ops had a behavior that made deduplication unsafe?
Or did the problem come to your attention after running amcheck on a
customer database? Or was it something else?

> I've also caught btree posting lists where one TID refers to a '1d' heap
> tuple, while another TID refers to a '24h' heap tuple.  amcheck complains.
> Index-only scans can return the '1d' bits where the actual tuple had the '24h'
> bits.  Are there other consequences to highlight in the release notes?

Nothing else comes to mind right now. I should think about posting
list splits some more tomorrow, though -- those are tricky.

> The back-branch patch is larger, to fix things without initdb.  Hence, I'm
> attaching patches for HEAD and for v16 (trivial to merge back from there).

> I glanced at the other opfamilies permitting deduplication, and they look 
> okay:

Due to the way that nbtsplitloc.c deals with duplicates, I'd expect
the same assertion failure with any index where a single leaf page is
filled with opclass-wise duplicates with more than one distinct
representation/output -- the details beyond that shouldn't matter. I
was happy with how easy it was to make this assertion fail (with a
known broken numeric_ops opclass) while testing/developing
deduplication. I'm a little surprised that it took this long to notice
the interval_ops issue.

Do we really need to change the catalog contents when backpatching?

--
Peter Geoghegan




Re: stopgap fix for signal handling during restore_command

2023-10-10 Thread Nathan Bossart
On Tue, Oct 10, 2023 at 04:40:28PM -0700, Andres Freund wrote:
> On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote:
>> diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
>> index 22b4278610..b9e2c3aafe 100644
>> --- a/src/backend/storage/lmgr/proc.c
>> +++ b/src/backend/storage/lmgr/proc.c
>> @@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
>>  dlist_head *procgloballist;
>>  
>>  Assert(MyProc != NULL);
>> +Assert(MyProc->pid == (int) getpid());  /* not safe if forked by system(), etc. */
>>  
>>  /* Make sure we're out of the sync rep lists */
>>  SyncRepCleanupAtProcExit();
>> @@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
>>  PGPROC *proc;
>>  
>>  Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
>> +Assert(MyProc->pid == (int) getpid());  /* not safe if forked by system(), etc. */
>>  
>>  auxproc = &AuxiliaryProcs[proctype];
>>  
> 
> I'd make these elog(PANIC), I think. The paths are not performance critical
> enough that a single branch hurts, so the overhead of the check is irrelevant,
> and the consequences of calling ProcKill() twice for the same process are very
> severe.

Right.  Should we write_stderr_signal_safe() and then abort() to keep these
paths async-signal-safe?

>> +/*
>> + * Write a message to STDERR using only async-signal-safe functions.  This can
>> + * be used to safely emit a message from a signal handler.
>> + *
>> + * TODO: It is likely possible to safely do a limited amount of string
>> + * interpolation (e.g., %s and %d), but that is not presently supported.
>> + */
>> +void
>> +write_stderr_signal_safe(const char *fmt)
> 
> As is, this isn't a format, so I'd probably just name it s or str :)

Yup.

>> -/*
>> - * Write errors to stderr (or by equal means when stderr is
>> - * not available). Used before ereport/elog can be used
>> - * safely (memory context, GUC load etc)
>> - */
>>  extern void write_stderr(const char *fmt,...) pg_attribute_printf(1, 2);
>> +extern void write_stderr_signal_safe(const char *fmt);
> 
> Not sure why you removed the comment?

I think it was because it's an exact copy of the comment above the function
in elog.c, and I didn't want to give the impression that it applied to the
signal-safe one, too.  I added it back along with a new comment for
write_stderr_signal_safe().

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 0a5d6a420352ec6601bd9967321d82b8a7808d67 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 23 Feb 2023 14:27:48 -0800
Subject: [PATCH v11 1/2] Move extra code out of the Pre/PostRestoreCommand()
 block.

If SIGTERM is received within this block, the startup process will
immediately proc_exit() in the signal handler, so it is inadvisable
to include any more code than is required in this section.  This
change moves the code recently added to this block (see 1b06d7b and
7fed801) to outside of the block.  This ensures that only system()
is called while proc_exit() might be called in the SIGTERM handler,
which is how this code worked from v8.4 to v14.
---
 src/backend/access/transam/xlogarchive.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index f3fb92c8f9..b998cd651c 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -159,20 +159,27 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			(errmsg_internal("executing restore command \"%s\"",
 			 xlogRestoreCmd)));
 
+	fflush(NULL);
+	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
+
 	/*
-	 * Check signals before restore command and reset afterwards.
+	 * PreRestoreCommand() informs the SIGTERM handler for the startup process
+	 * that it should proc_exit() right away.  This is done for the duration of
+	 * the system() call because there isn't a good way to break out while it
+	 * is executing.  Since we might call proc_exit() in a signal handler, it
+	 * is best to put any additional logic before or after the
+	 * PreRestoreCommand()/PostRestoreCommand() section.
 	 */
 	PreRestoreCommand();
 
 	/*
 	 * Copy xlog from archival storage to XLOGDIR
 	 */
-	fflush(NULL);
-	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
 	rc = system(xlogRestoreCmd);
-	pgstat_report_wait_end();
 
 	PostRestoreCommand();
+
+	pgstat_report_wait_end();
 	pfree(xlogRestoreCmd);
 
 	if (rc == 0)
-- 
2.25.1

>From ec9b038e8f4b1ecfece456e308b658ecdc8c8f9d Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 14 Feb 2023 09:44:53 -0800
Subject: [PATCH v11 2/2] Don't proc_exit() in startup's SIGTERM handler if
 forked by system().

Instead, emit a message to STDERR and _exit() in this case.  This
change also adds checks in proc_exit(), ProcKill(), and
AuxiliaryProcKill() to verify that these functions are not called
by a process forked by system(), etc.
---
 src/backend/postmaster/startup.c | 17 

Re: Fix typo in psql zh_CN.po

2023-10-10 Thread Tom Lane
Richard Guo  writes:
> On Wed, Oct 11, 2023 at 4:30 AM jinser  wrote:
>> I found a typo here while using psql. I think this should be a trivial
>> patch.
>> The typo is that there is an extra `l` before `列出所有事件触发器`.

> +1.

FYI, we have a slightly odd process around this: PG's translated
messages are managed by a different set of people and have a
different authoritative repo.  There's enough overlap between
those people and pgsql-hackers that this report will likely be
seen by somebody who can commit into the translations repo.
Ideally however, translation bugs should be reported to
the pgsql-translators mailing list.

regards, tom lane




Re: Problem, partition pruning for prepared statement with IS NULL clause.

2023-10-10 Thread David Rowley
On Tue, 10 Oct 2023 at 21:31, Sergei Glukhov  wrote:
> create table hp (a int, b text, c int, d int)
>partition by hash (a part_test_int4_ops, b part_test_text_ops, c
> part_test_int4_ops);
> create table hp0 partition of hp for values with (modulus 4, remainder 0);
> create table hp3 partition of hp for values with (modulus 4, remainder 3);
> create table hp1 partition of hp for values with (modulus 4, remainder 1);
> create table hp2 partition of hp for values with (modulus 4, remainder 2);
>
>
> Another crash in the different place even with the fix:
> explain select * from hp where a = 1 and b is null and c = 1;

Ouch.  It looks like 13838740f tried to fix things in this area before
and even added a regression test for it. Namely:

-- Test that get_steps_using_prefix() handles non-NULL step_nullkeys
explain (costs off) select * from hp_prefix_test where a = 1 and b is
null and c = 1 and d = 1;

I guess that one does not crash because the "d = 1" clause is in
the "start" ListCell in get_steps_using_prefix_recurse(), whereas,
with your case start is NULL which is an issue for cur_keyno =
((PartClauseInfo *) lfirst(start))->keyno;.

It might have been better if PartClauseInfo could also describe IS
NULL quals, but I feel if we do that now then it would require lots of
careful surgery in partprune.c to account for that.  Probably the fix
should be localised to get_steps_using_prefix_recurse() to have it do
something like pass the keyno to try and work on rather than trying to
get that from the "prefix" list. That way if there's no item in that
list for that keyno, we can check in step_nullkeys for the keyno.

I'll continue looking.

David




Re: Lowering the default wal_blocksize to 4K

2023-10-10 Thread Andres Freund
Hi,

On 2023-10-11 14:39:12 +1300, Thomas Munro wrote:
> On Wed, Oct 11, 2023 at 12:29 PM Andres Freund  wrote:
> > On 2023-10-10 21:30:44 +0200, Matthias van de Meent wrote:
> > > On Tue, 10 Oct 2023 at 06:14, Andres Freund  wrote:
> > > > I was thinking we should perhaps do the opposite, namely getting rid of 
> > > > short
> > > > page headers. The overhead in the "byte position" <-> LSN conversion 
> > > > due to
> > > > the differing space is worse than the gain. Or do something inbetween - 
> > > > having
> > > > the system ID in the header adds a useful crosscheck, but I'm far less
> > > > convinced that having segment and block size in there, as 32bit numbers 
> > > > no
> > > > less, is worthwhile. After all, if the system id matches, it's not 
> > > > likely that
> > > > the xlog block or segment size differ.
> > >
> > > Hmm. I don't think we should remove those checks, as I can see people
> > > that would want to change their XLog block size with e.g.
> > > pg_resetwal.
> >
> > I don't think that's something we need to address in every physical
> > segment. For one, there's no option to do so. But more importantly, if they
> > don't change the xlog block size, we'll just accept random WAL as well. If
> > somebody goes to the trouble of writing a custom tool, they can live with 
> > the
> > consequences of that potentially causing breakage. Particularly if the 
> > checks
> > wouldn't meaningfully prevent that anyway.
> 
> How about this idea: Put the system ID etc into the new record Robert
> is proposing for the redo point, and also into the checkpoint record,
> so that it's at both ends of the to-be-replayed range.

I think that's a very good idea.


> That just leaves the WAL segments in between.  If you find yourself writing
> a new record that would go in the first usable byte of a segment, insert a
> new special system ID (etc) record that will be checked during replay.

I don't see how we can do that without incurring a lot of overhead though. This
determination would need to happen in ReserveXLogInsertLocation(), while
holding the spinlock. Which is one of the most contended bits of code in
postgres.  The whole reason that we have this "byte pos" to LSN conversion
stuff is to make the spinlock-protected part of ReserveXLogInsertLocation() as
short as possible.


> For segments that start with XLP_FIRST_IS_CONTRECORD, don't worry about it:
> those already form part of a chain of verification (xlp_rem_len, xl_crc)
> that started on the preceding page, so it seems almost impossible to
> accidentally replay from a segment that came from another system.

But I think we might just be ok with logic similar to this, even for the
non-contrecord case. If recovery starts in one segment where we have verified
sysid, xlog block size etc and we encounter a WAL record starting on the first
"content byte" of a segment, we can still verify that the prev LSN is correct
etc.  Sure, if you try hard you could come up with a scenario where you could
mislead such a check, but we don't need to protect against intentional malice
here, just against accidents.

Greetings,

Andres Freund




Re: Retire has_multiple_baserels()

2023-10-10 Thread Richard Guo
On Wed, Oct 11, 2023 at 1:13 AM Tom Lane  wrote:

> I thought this test wasn't too complete, because has_multiple_baserels
> isn't reached at all in many cases thanks to the way the calling if()
> is coded.  I tried testing like this instead:
>
> diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> index eea49cca7b..3f6fc51fb4 100644
> --- a/src/backend/optimizer/path/allpaths.c
> +++ b/src/backend/optimizer/path/allpaths.c
> @@ -2649,6 +2649,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
>   */
>  remove_unused_subquery_outputs(subquery, rel, run_cond_attrs);
>
> +    Assert(has_multiple_baserels(root) == (bms_membership(root->all_baserels) == BMS_MULTIPLE));
> +
>  /*
>   * We can safely pass the outer tuple_fraction down to the subquery if the
>   * outer level has no joining, aggregation, or sorting to do.  Otherwise
>
> and came to the same conclusion: check-world finds no cases where
> the assertion fails.  So it LGTM too.  Pushed.


Thanks for pushing!

Thanks
Richard


Re: Fix typo in psql zh_CN.po

2023-10-10 Thread Richard Guo
On Wed, Oct 11, 2023 at 4:30 AM jinser  wrote:

> Hi,
> I found a typo here while using psql. I think this should be a trivial
> patch.
> The typo is that there is an extra `l` before `列出所有事件触发器`.


+1.

Thanks
Richard


Re: Lowering the default wal_blocksize to 4K

2023-10-10 Thread Thomas Munro
On Wed, Oct 11, 2023 at 12:29 PM Andres Freund  wrote:
> On 2023-10-10 21:30:44 +0200, Matthias van de Meent wrote:
> > On Tue, 10 Oct 2023 at 06:14, Andres Freund  wrote:
> > > I was thinking we should perhaps do the opposite, namely getting rid of 
> > > short
> > > page headers. The overhead in the "byte position" <-> LSN conversion due 
> > > to
> > > the differing space is worse than the gain. Or do something inbetween - 
> > > having
> > > the system ID in the header adds a useful crosscheck, but I'm far less
> > > convinced that having segment and block size in there, as 32bit numbers no
> > > less, is worthwhile. After all, if the system id matches, it's not likely 
> > > that
> > > the xlog block or segment size differ.
> >
> > Hmm. I don't think we should remove those checks, as I can see people
> > that would want to change their XLog block size with e.g.
> pg_resetwal.
>
> I don't think that's something we need to address in every physical
> segment. For one, there's no option to do so. But more importantly, if they
> don't change the xlog block size, we'll just accept random WAL as well. If
> somebody goes to the trouble of writing a custom tool, they can live with the
> consequences of that potentially causing breakage. Particularly if the checks
> wouldn't meaningfully prevent that anyway.

How about this idea: Put the system ID etc into the new record Robert
is proposing for the redo point, and also into the checkpoint record,
so that it's at both ends of the to-be-replayed range.  That just
leaves the WAL segments in between.  If you find yourself writing a
new record that would go in the first usable byte of a segment, insert
a new special system ID (etc) record that will be checked during
replay.  For segments that start with XLP_FIRST_IS_CONTRECORD, don't
worry about it: those already form part of a chain of verification
(xlp_rem_len, xl_crc) that started on the preceding page, so it seems
almost impossible to accidentally replay from a segment that came from
another system.




interval_ops shall stop using btequalimage (deduplication)

2023-10-10 Thread Noah Misch
The btequalimage() header comment says:

 * Generic "equalimage" support function.
 *
 * B-Tree operator classes whose equality function could safely be replaced by
 * datum_image_eq() in all cases can use this as their "equalimage" support
 * function.

interval_ops, however, recognizes equal-but-distinguishable values:

  create temp table t (c interval);
  insert into t values ('1d'::interval), ('24h');
  table t;
  select distinct c from t;
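
(For illustration: "table t" shows both spellings, '1 day' and
'24:00:00', while the DISTINCT collapses them to a single row, since
interval_ops considers them equal.)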

The CREATE INDEX of the following test:

  begin;
  create table t (c interval);
  insert into t select x from generate_series(1,500), (values ('1 year 1 month'::interval), ('1 year 30 days')) t(x);
  select distinct c from t;
  create index ti on t (c);
  rollback;

Fails with:

  2498151 2023-10-10 05:06:46.177 GMT DEBUG:  building index "ti" on table "t" serially
  2498151 2023-10-10 05:06:46.178 GMT DEBUG:  index "ti" can safely use deduplication
  TRAP: failed Assert("!itup_key->allequalimage || keepnatts == _bt_keep_natts_fast(rel, lastleft, firstright)"), File: "nbtutils.c", Line: 2443, PID: 2498151

I've also caught btree posting lists where one TID refers to a '1d' heap
tuple, while another TID refers to a '24h' heap tuple.  amcheck complains.
Index-only scans can return the '1d' bits where the actual tuple had the '24h'
bits.  Are there other consequences to highlight in the release notes?  The
back-branch patch is larger, to fix things without initdb.  Hence, I'm
attaching patches for HEAD and for v16 (trivial to merge back from there).  I
glanced at the other opfamilies permitting deduplication, and they look okay:

[local] test=*# select amproc, amproclefttype = amprocrighttype as l_eq_r, array_agg(array[opfname, amproclefttype::regtype::text]) from pg_amproc join pg_opfamily f on amprocfamily = f.oid where amprocnum = 4 and opfmethod = 403 group by 1,2;
─[ RECORD 1 ]─────────────────────────────────────────────────────────────────
amproc    │ btequalimage
l_eq_r    │ t
array_agg │ {{bit_ops,bit},{bool_ops,boolean},{bytea_ops,bytea},{char_ops,"\"char\""},{datetime_ops,date},{datetime_ops,"timestamp without time zone"},{datetime_ops,"timestamp with time zone"},{network_ops,inet},{integer_ops,smallint},{integer_ops,integer},{integer_ops,bigint},{interval_ops,interval},{macaddr_ops,macaddr},{oid_ops,oid},{oidvector_ops,oidvector},{time_ops,"time without time zone"},{timetz_ops,"time with time zone"},{varbit_ops,"bit varying"},{text_pattern_ops,text},{bpchar_pattern_ops,character},{money_ops,money},{tid_ops,tid},{uuid_ops,uuid},{pg_lsn_ops,pg_lsn},{macaddr8_ops,macaddr8},{enum_ops,anyenum},{xid8_ops,xid8}}
─[ RECORD 2 ]─────────────────────────────────────────────────────────────────
amproc    │ btvarstrequalimage
l_eq_r    │ t
array_agg │ {{bpchar_ops,character},{text_ops,text},{text_ops,name}}

Thanks,
nm
Author: Noah Misch 
Commit: Noah Misch 

Dissociate btequalimage() from interval_ops, ending its deduplication.

Under interval_ops, some equal values are distinguishable.  One such
pair is '24:00:00' and '1 day'.  With that being so, btequalimage()
breaches the documented contract for the "equalimage" btree support
function.  This can cause incorrect results from index-only scans.
Users should REINDEX any indexes having interval-type columns and for
which "pg_amcheck --heapallindexed" reports an error.  This fix makes
interval_ops simply omit the support function, like numeric_ops does.
Back-patch to v13, where btequalimage() first appeared.  In back
branches, for the benefit of old catalog content, btequalimage() code
will return false for type "interval".  Going forward, back-branch
initdb will include the catalog change.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/include/catalog/pg_amproc.dat 

Re: broken master regress tests

2023-10-10 Thread Andres Freund
Hi,

On 2023-10-10 17:08:25 -0700, Jeff Davis wrote:
> On Mon, 2023-09-11 at 19:23 -0700, Andres Freund wrote:
> > > So I think injecting --no-locale to the initdb line that creates
> > > the
> > > template is a better approach; something like the attached.
> > 
> > Makes sense, thanks for taking care of this.
> 
> After this, it seems "make check" no longer picks up the locale from
> the system environment by default.

Yea. I wonder if the better fix would have been to copy setenv("LC_MESSAGES", "C", 1);
to the initdb template creation. That afaict also fixes the issue, with a
smaller blast radius?

Greetings,

Andres Freund




Re: PGDOCS - add more links in the pub/sub reference pages

2023-10-10 Thread Peter Smith
On Tue, Oct 10, 2023 at 5:10 PM vignesh C  wrote:
>
> Few more instances in other logical replication related pages:
> 1) Another instance was in alter_subscription.sgml:
>   Fetch missing table information from publisher.  This will start
>   replication of tables that were added to the subscribed-to publications
>   since CREATE SUBSCRIPTION or
>   the last invocation of REFRESH PUBLICATION.

OK, modified in v4.

> 2) Few more instances were in logical-replication-subscription.html
> 2.a)Normally, the remote replication slot is created automatically when 
> the
> subscription is created using CREATE SUBSCRIPTION and 
> it
> is dropped automatically when the subscription is dropped using
> DROP SUBSCRIPTION.  In some situations, however, it can
> be useful or necessary to manipulate the subscription and the underlying
> replication slot separately.
> 2.b)  When dropping a subscription, the remote host is not reachable.  In
>that case, disassociate the slot from the subscription
>using ALTER SUBSCRIPTION before attempting to drop
>the subscription.

The above were in section "31.2.1 Replication Slot Management". OK,
modified in v4.

> 2.c)  If a subscription is affected by this problem, the only way to 
> resume
>  replication is to adjust one of the column lists on the publication
>  side so that they all match; and then either recreate the subscription,
>  or use ALTER SUBSCRIPTION ... DROP PUBLICATION to
>  remove one of the offending publications and add it again.

The above was in section "31.2.1 Replication Slot Management". OK,
modified in v4.

> 2.d) The
>transaction that produced the conflict can be skipped by using
>ALTER SUBSCRIPTION ... SKIP with the finish LSN
>(i.e., LSN 0/14C0378).
> 2.e)Before using this function, the subscription needs to be
> disabled temporarily
>either by ALTER SUBSCRIPTION ... DISABLE or,
>

The above was in the section "31.5 Conflicts". OK, modified in v4.

~~

Thanks for reporting those. PSA v4.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v4-0001-Add-more-pub-sub-links.patch
Description: Binary data


Re: PGDOCS - add more links in the pub/sub reference pages

2023-10-10 Thread Peter Smith
On Tue, Oct 10, 2023 at 11:33 PM Amit Kapila  wrote:
>
> On Tue, Oct 10, 2023 at 11:40 AM vignesh C  wrote:
> >
> > On Tue, 10 Oct 2023 at 08:47, Peter Smith  wrote:
> > > PSA v3.
> >
> > Few more instances in other logical replication related pages:
> > 1) Another instance was in alter_subscription.sgml:
> >   Fetch missing table information from publisher.  This will start
> >   replication of tables that were added to the subscribed-to 
> > publications
> >   since CREATE SUBSCRIPTION or
> >   the last invocation of REFRESH PUBLICATION.
> >
>
> Do we want each and every occurrence of the commands to have
> corresponding links? I am not against it if we think that is useful
> for users but asking as I am not aware of the general practice we
> follow in this regard. Does anyone else have any opinion on this
> matter?
>

The goal of the patch was to use a consistent approach for all the
pub/sub pages. Otherwise, there was a mixture and no apparent reason
why some commands had links while some did not.

The rules this patch is using are:
- only including inter-page links to other pub/sub commands
- if the same pub/sub linkend occurs multiple times in the same block
of text, then only give a link for the first one
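
For example, the mechanical shape of each change is just wrapping the
command name at its first mention (linkend spelling illustrative of the
existing ids):

    <command>CREATE SUBSCRIPTION</command>

becomes

    <link linkend="sql-createsubscription"><command>CREATE SUBSCRIPTION</command></link>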

~~

What links are "useful to users" is subjective, and the convenience
probably also varies depending on how much scrolling is needed to get
to the "See Also" part at the bottom. I felt a consistent linking
approach is better than having differences based on some arbitrary
judgement of usefulness.

AFAICT some other PG DOCS pages strive to do the same. For example,
the ALTER TABLE page [1] mentions the "CREATE TABLE" command 10 times
and 8 of those have links (the missing ones don't look any different
to me, so they seem like accidental omissions).

==
[1] https://www.postgresql.org/docs/devel/sql-altertable.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: broken master regress tests

2023-10-10 Thread Jeff Davis
On Mon, 2023-09-11 at 19:23 -0700, Andres Freund wrote:
> > So I think injecting --no-locale to the initdb line that creates
> > the
> > template is a better approach; something like the attached.
> 
> Makes sense, thanks for taking care of this.

After this, it seems "make check" no longer picks up the locale from
the system environment by default.

What is the new way to run the regression tests with an actual locale?
If it's no longer done by default, won't that dramatically reduce the
coverage of non-C locales?

Regards,
Jeff Davis





Re: stopgap fix for signal handling during restore_command

2023-10-10 Thread Andres Freund
On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote:
> Subject: [PATCH v10 1/2] Move extra code out of the Pre/PostRestoreCommand()
>  block.

LGTM


> From fb6957da01f11b75d1a1966f32b00e2e77c789a0 Mon Sep 17 00:00:00 2001
> From: Nathan Bossart 
> Date: Tue, 14 Feb 2023 09:44:53 -0800
> Subject: [PATCH v10 2/2] Don't proc_exit() in startup's SIGTERM handler if
>  forked by system().
> 
> Instead, emit a message to STDERR and _exit() in this case.  This
> change also adds assertions to proc_exit(), ProcKill(), and
> AuxiliaryProcKill() to verify that these functions are not called
> by a process forked by system(), etc.
> ---
>  src/backend/postmaster/startup.c | 17 -
>  src/backend/storage/ipc/ipc.c|  3 +++
>  src/backend/storage/lmgr/proc.c  |  2 ++
>  src/backend/utils/error/elog.c   | 28 
>  src/include/utils/elog.h |  6 +-
>  5 files changed, 50 insertions(+), 6 deletions(-)



> diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
> index 22b4278610..b9e2c3aafe 100644
> --- a/src/backend/storage/lmgr/proc.c
> +++ b/src/backend/storage/lmgr/proc.c
> @@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
>   dlist_head *procgloballist;
>  
>   Assert(MyProc != NULL);
> + Assert(MyProc->pid == (int) getpid());  /* not safe if forked by 
> system(), etc. */
>  
>   /* Make sure we're out of the sync rep lists */
>   SyncRepCleanupAtProcExit();
> @@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
>   PGPROC *proc;
>  
>   Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
> + Assert(MyProc->pid == (int) getpid());  /* not safe if forked by 
> system(), etc. */
>  
>   auxproc = [proctype];
>  

I'd make these elog(PANIC), I think. The paths are not performance critical
enough that a single branch hurts, so the overhead of the check is irrelevant,
and the consequences of calling ProcKill() twice for the same process are very
severe.
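
That is, roughly (sketch only, message wording illustrative):

    if (MyProc->pid != (int) getpid())
        elog(PANIC, "ProcKill() called in a process forked by system()");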


> +/*
> + * Write a message to STDERR using only async-signal-safe functions.  This 
> can
> + * be used to safely emit a message from a signal handler.
> + *
> + * TODO: It is likely possible to safely do a limited amount of string
> + * interpolation (e.g., %s and %d), but that is not presently supported.
> + */
> +void
> +write_stderr_signal_safe(const char *fmt)

As is, this isn't a format, so I'd probably just name it s or str :)


> -/*
> - * Write errors to stderr (or by equal means when stderr is
> - * not available). Used before ereport/elog can be used
> - * safely (memory context, GUC load etc)
> - */
>  extern void write_stderr(const char *fmt,...) pg_attribute_printf(1, 2);
> +extern void write_stderr_signal_safe(const char *fmt);

Not sure why you removed the comment?

Greetings,

Andres Freund




Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific

2023-10-10 Thread Michael Paquier
On Tue, Oct 10, 2023 at 10:51:10AM -0400, Evan Jones wrote:
> Here is a quick demonstration of this issue, showing that the quoting
> behavior is different between these two. Mac OS X with the "default" locale
> includes quotes because ą includes  0x85 in its UTF-8 encoding:

Ugh.  rowtypes.c has reminded me as well of gistfuncs.c in pageinspect
where included columns are printed in a ROW-like fashion.  And it also
uses isspace() when we check if double quotes are needed or not.  So
the use of quotes would equally depend on what macOS thinks is
a correct space in this case.
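
The underlying hazard is easy to demonstrate standalone (sketch; the non-C
result is platform- and locale-dependent -- what follows is the behavior
this thread reports on macOS):

    #include <ctype.h>
    #include <locale.h>
    #include <stdio.h>

    int
    main(void)
    {
        int     c = 0x85;   /* second byte of UTF-8 "ą" (0xC4 0x85) */

        setlocale(LC_CTYPE, "C");
        printf("C locale:     isspace(0x85) = %d\n", isspace(c));

        setlocale(LC_CTYPE, "en_US.UTF-8");
        printf("UTF-8 locale: isspace(0x85) = %d\n", isspace(c));

        return 0;
    }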
--
Michael


signature.asc
Description: PGP signature


Re: Lowering the default wal_blocksize to 4K

2023-10-10 Thread Andres Freund
Hi,

On 2023-10-10 21:30:44 +0200, Matthias van de Meent wrote:
> On Tue, 10 Oct 2023 at 06:14, Andres Freund  wrote:
> > On 2023-10-09 23:16:30 -0400, Tom Lane wrote:
> >> Andres Freund  writes:
> >>> There's an alternative approach we could take, which is to write in 4KB
> >>> increments, while keeping 8KB pages. With the current format that's not
> >>> obviously a bad idea. But given there aren't really advantages in 8KB WAL
> >>> pages, it seems we should just go for 4KB?
> >>
> >> Seems like that's doubling the overhead of WAL page headers.  Do we need
> >> to try to skinny those down?
> >
> > I think the overhead is small, and we are wasting so much space in other
> > places, that I am not worried about the proportional increase page header
> > space usage at this point, particularly compared to saving in overall write
> > rate and increase in TPS. There's other areas we can save much more space, 
> > if
> > we want to focus on that.
> >
> > I was thinking we should perhaps do the opposite, namely getting rid of 
> > short
> > page headers. The overhead in the "byte position" <-> LSN conversion due to
> > the differing space is worse than the gain. Or do something inbetween - 
> > having
> > the system ID in the header adds a useful crosscheck, but I'm far less
> > convinced that having segment and block size in there, as 32bit numbers no
> > less, is worthwhile. After all, if the system id matches, it's not likely 
> > that
> > the xlog block or segment size differ.
> 
> Hmm. I don't think we should remove those checks, as I can see people
> that would want to change their XLog block size with e.g.
> pg_reset_wal.

I don't think that's something we need to address in every physical
segment. For one, there's no option to do so. But more importantly, if they
don't change the xlog block size, we'll just accept random WAL as well. If
somebody goes to the trouble of writing a custom tool, they can live with the
consequences of that potentially causing breakage. Particularly if the checks
wouldn't meaningfully prevent that anyway.


> After that we'll only have the system ID left from the extended
> header, which we could store across 2 pages in the (current) alignment
> losses of xlp_rem_len - even pages the upper half, odd pages the
> lower half of the ID. This should allow for enough integrity checks
> without further increasing the size of XLogPageHeader in most
> installations.

I doubt that that's a good idea - what if there's just a single page in a
segment? And there aren't earlier segments? That's not a rare case, IME.

Greetings,

Andres Freund




Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-10 Thread Michael Paquier
On Tue, Oct 10, 2023 at 09:12:49AM +0200, Drouvot, Bertrand wrote:
> On 10/10/23 7:58 AM, Michael Paquier wrote:
>> I was looking at v8 just before you sent this v9, and still got
>> annoyed by the extra boolean argument added to InitPostgres().
> 
> Arf, I did not look at it as I had in mind to look at it once
> this one is in.

No problem.  I'm OK to do it.

>> So
>> please let me propose to bite the bullet and refactor that, as of the
>> 0001 attached that means less diff footprints in all the callers of
>> InitPostgres() (I am not wedded to the flag names).
> 
> Thanks for having looked at it!
> 
> +   bits32  init_flags = 0; /* never honor 
> session_preload_libraries */
> 
> Also a few words about datallowconn in the comment? (as the flag deals with 
> both).

I am not sure that this is necessary in the code paths of
BackgroundWorkerInitializeConnectionByOid() and
BackgroundWorkerInitializeConnection() as datallowconn is handled a
few lines down.
--
Michael


signature.asc
Description: PGP signature


Re: FDW pushdown of non-collated functions

2023-10-10 Thread Jean-Christophe Arnu
Hi Ashutosh,

Le ven. 6 oct. 2023 à 14:16, Ashutosh Bapat 
a écrit :

> Hi Jean-Christophe,
>
> On Fri, Sep 8, 2023 at 11:30 PM Jean-Christophe Arnu 
> wrote:
> >
> > Maybe we could add another condition to the first if statement in order
> to allow a “no-collation” function to be pushed down even if they have
> “collatable” parameters. I’m not sure about the possible regressions of
> behaviour of this change, but it
> seems to work fine with date_trunc() and date_part() (which suffers
> the same problem).
>
> That may not work since the output of the function may be dependent
> upon the collation on the inputs.
>
> There were similar discussions earlier. E.g.
>
> https://www.postgresql.org/message-id/flat/CACowWR1ARWyRepRxGfijMcsw%2BH84Dj8x2o9N3kvz%3Dz1p%2B6b45Q%40mail.gmail.com
> .
>
> Reading Tom's first reply there you may work around this by declaring
> the collation explicitly.
>

Thanks for your reply. I did not catch  these messages in the archive.
Thanks for spotting them.


> Briefly reading Tom's reply, the problem seems to be trusting whether
> the default collation locally and on the foreign server respectively
> is same or not. May be a simple fix is to declare a foreign server
> level option declaring that the default collation on the foreign
> server is same as the local server may be a way to move forward. But
> given that the problem remains unsolved for 7 years at least, may be
> such a simple fix is not enough.
>

I studied the postgres_fdw source code a bit, and the problem is not as easy
to solve: one could set a per-server option telling that the default remote
collation is aligned with the local one, but nothing guarantees that the
parameter's collation is known on the remote side.
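
(For anyone experimenting along: whether a given qual is shipped is visible
with EXPLAIN VERBOSE -- table and column names hypothetical:

    EXPLAIN (VERBOSE, COSTS OFF)
    SELECT * FROM remote_tbl
    WHERE date_trunc('day', ts) = timestamp '2023-10-10';

A pushed-down qual appears inside the Foreign Scan's "Remote SQL" line; a
non-pushed one shows up locally as a "Filter" on the scan.)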


>
> Another solution would be to attach another attribute to a function
> indicating whether the output of that function depends upon the input
> collations or not. Doing that just for FDW may not be acceptable
> though.
>

Yes, definitely. I thought

Anyway, you're right, after 7 years, this is a really difficult problem to
solve and there's no straightforward solution (to my eyes).
Thanks again for your kind explanations
Regards

-- 
Jean-Christophe Arnu


Re: Transaction timeout

2023-10-10 Thread Nikolay Samokhvalov
On Wed, Sep 6, 2023 at 1:16 AM Fujii Masao  wrote:
> With the v4 patch, I found that timeout errors no longer occur during the 
> idle in
> transaction phase. Instead, they occur when the next statement is executed. 
> Is this
> the intended behavior? I thought some users might want to use the transaction 
> timeout
> feature to prevent prolonged transactions and promptly release resources 
> (e.g., locks)
> in case of a timeout, similar to idle_in_transaction_session_timeout.

I agree – it seems reasonable to interrupt transaction immediately
when the timeout occurs. This was the idea – to determine the maximum
possible time for all transactions that is allowed on a server, to
avoid too long-lasting locking and not progressing xmin horizon.
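
For illustration, the intended usage as I understand the patch (the GUC is
from the patch and not in any released version):

    SET transaction_timeout = '5s';
    BEGIN;
    -- the whole transaction, including any idle-in-transaction time,
    -- must now finish within 5 seconds or be canceled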

That being said, I also have doubts about this wording in the docs:

+Setting transaction_timeout in
+postgresql.conf is not recommended
because it would
+affect all sessions.

It was inherited from statement_timeout, where I also find this
wording too one-sided. There are certain situations where we do want
global setting to be set – actually, any large OLTP case (to be on
lower risk side; those users who need longer timeout, can set it when
needed, but by default we do need very restrictive timeouts, usually <
1 minute, like we do in HTTP or application servers). I propose this:
> Setting transaction_timeout in postgresql.conf should be done with caution 
> because it affects all sessions.

Looking at the v4 of the patch, a couple of more comments that might
be helpful for v5 (which is planned, as I understand):

1)  it might be beneficial to add tests for more complex scenarios,
e.g., subtransactions

2) In the error message:

+ errmsg("canceling statement due to %s%s%s%s%s", lock_reason, comma1,
+ stmt_reason, comma2, tx_reason)));

– it seems we can have excessive commas here

3) Perhaps, we should say that we cancel the transaction, not
statement (especially in the case when it is happening in the
idle-in-transaction state).

Thanks for working on this feature!




Re: The danger of deleting backup_label

2023-10-10 Thread David Steele

On 9/28/23 22:30, Michael Paquier wrote:

On Thu, Sep 28, 2023 at 05:14:22PM -0400, David Steele wrote:


Recovery worked perfectly as long as backup_label was present and failed
hard when it was not:

LOG:  invalid primary checkpoint record
PANIC:  could not locate a valid checkpoint record

It's not a very good message, but at least the foot gun has been removed. We
could use this as a special value to give a better message, and maybe use
something a bit more unique like 0xFADEFADE (or whatever) as the
value.


Why not just InvalidXLogRecPtr?


That fails because there is a check to make sure the checkpoint is valid 
when pg_control is loaded. Another possibility is to use a special LSN 
like we use for unlogged tables. Anything >= 24 and < WAL segment size 
will work fine.
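
In other words, something like a sentinel constant (name invented here,
value per the 0xFADEFADE suggestion above):

    /* Special redo value: pg_control is valid, but recovery must not
     * proceed without a backup_label. */
    #define REDO_REQUIRES_BACKUP_LABEL ((XLogRecPtr) 0xFADEFADE)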



This is all easy enough for pg_basebackup to do, but will certainly be
non-trivial for most backup software to implement. In [2] we have discussed
perhaps returning pg_control from pg_backup_stop() for the backup software
to save, or it could become part of the backup_label (encoded as hex or
base64, presumably). I prefer the latter as this means less work for the
backup software (except for the need to exclude pg_control from the backup).

I don't have a patch for this yet because I did not test this idea using
pg_basebackup, but I'll be happy to work up a patch if there is interest.


If the contents of the control file are tweaked before sending it
through a BASE_BACKUP, it would cover more than just pg_basebackup.
Switching the way the control file is sent with new contents in
sendFileWithContent() rather than sendFile() would be one way, for
instance..


Good point, and that makes this even more compelling. If we include 
pg_control into backup_label then there is no need to modify pg_control 
(as above) -- we can just exclude it from the backup entirely. That will 
certainly require some rejigging in recovery but seems worth it for 
backup solutions that can't easily modify pg_control. The C-based 
solutions can do this pretty easily but it is a pretty high bar for 
anyone else.


Regards,
-David




Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

2023-10-10 Thread Tom Lane
Tommy Pavlicek  writes:
> I did notice one further potential bug. When creating an operator and
> adding a commutator, PostgreSQL only links the commutator back to the
> operator if the commutator has no commutator of its own, but the
> create operation succeeds regardless of whether this linkage happens.

> In other words, if A and B are a pair of commutators and one creates
> another operator, C, with A as its commutator, then C will link to A,
> but A will still link to B (and B to A). It's not clear to me if this
> in itself is a problem, but unless I've misunderstood something
> operator C must be the same as B so it implies an error by the user
> and there could also be issues if A was deleted since C would continue
> to refer to the deleted A.

Yeah, it'd make sense to tighten that up.  Per the discussion so far,
we should not allow an operator's commutator/negator links to change
once set, so overwriting the existing link with a different value
would be wrong.  But allowing creation of the new operator to proceed
with a different outcome than expected isn't good either.  I think
we should start throwing an error for that.
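
To make the scenario concrete (operator names invented for illustration):

    -- A and B become each other's commutators:
    CREATE OPERATOR <<< (function = int4lt, leftarg = int4, rightarg = int4);
    CREATE OPERATOR >>> (function = int4gt, leftarg = int4, rightarg = int4,
                         commutator = <<<);
    -- C records A as its commutator, but A keeps pointing at B;
    -- today this CREATE succeeds silently:
    CREATE OPERATOR >>>> (function = int4gt, leftarg = int4, rightarg = int4,
                          commutator = <<<);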

regards, tom lane




Fix typo in psql zh_CN.po

2023-10-10 Thread jinser
Hi,
I found a typo here while using psql. I think this should be a trivial patch.
The typo is that there is an extra `l` before `列出所有事件触发器` ("list all event triggers").

-- 
regards,
Jinser Kafak.


0001-Fix-typo-in-psql-zh_CN.po.patch
Description: Binary data


Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

2023-10-10 Thread Tommy Pavlicek
On Thu, Sep 28, 2023 at 9:18 PM Tom Lane  wrote:
>
> Tommy Pavlicek  writes:
> > I've attached a new version of the ALTER OPERATOR patch that allows
> > no-ops. It should be ready to review now.
>
> I got around to looking through this finally (sorry about the delay).
> I'm mostly on board with the functionality, with the exception that
> I don't see why we should allow ALTER OPERATOR to cause new shell
> operators to be created.  The argument for allowing that in CREATE
> OPERATOR was mainly to allow a linked pair of operators to be created
> without a lot of complexity (specifically, being careful to specify
> the commutator or negator linkage only in the second CREATE, which
> is a rule that requires another exception for a self-commutator).
> However, if you're using ALTER OPERATOR then you might as well create
> both operators first and then link them with an ALTER command.
> In fact, I don't really see a use-case where the operators wouldn't
> both exist; isn't this feature mainly to allow retrospective
> correction of omitted linkages?  So I think allowing ALTER to create a
> second operator is more likely to allow mistakes to sneak by than to
> do anything useful --- and they will be mistakes you can't correct
> except by starting over.  I'd even question whether we want to let
> ALTER establish a linkage to an existing shell operator, rather than
> insisting you turn it into a valid operator via CREATE first.
>
> If we implement it with that restriction then I don't think the
> refactorization done in 0001 is correct, or at least not ideal.
>
> (In any case, it seems like a bad idea that the command reference
> pages make no mention of this stuff about shell operators.  It's
> explained in 38.15. Operator Optimization Information, but it'd
> be worth at least alluding to that section here.  Or maybe we
> should move that info to CREATE OPERATOR?)
>
> More generally, you muttered something about 0001 fixing some
> existing bugs, but if so I can't see those trees for the forest of
> refactorization.  I'd suggest splitting any bug fixes apart from
> the pure-refactorization step.
>
> regards, tom lane

Thanks Tom.

The rationale behind the shell operator and that part of section 38.15
of the documentation had escaped me, but what you're saying makes
complete sense. Based on your comments, I've made some changes:

1. I've isolated the bug fixes (fixing the error message and
disallowing self negation when filling in a shell operator) into
0001-bug-fixes-v3.patch.
2. I've scaled back most of the refactoring as I agree it no longer makes sense.
3. I updated the logic to prevent the creation of or linking to shell operators.
4. I made further updates to the documentation including referencing
38.15 directly in the CREATE and ALTER pages (It's easy to miss if
only 38.14 is referenced) and moved the commentary about creating
commutators and negators into the CREATE section since, with the ALTER
changes, it now seems specific to CREATE. I didn't move the rest of
38.15 as I think this applies to both CREATE and ALTER.

I did notice one further potential bug. When creating an operator and
adding a commutator, PostgreSQL only links the commutator back to the
operator if the commutator has no commutator of its own, but the
create operation succeeds regardless of whether this linkage happens.

In other words, if A and B are a pair of commutators and one creates
another operator, C, with A as its commutator, then C will link to A,
but A will still link to B (and B to A). It's not clear to me if this
in itself is a problem, but unless I've misunderstood something
operator C must be the same as B so it implies an error by the user
and there could also be issues if A was deleted since C would continue
to refer to the deleted A.

The same applies for negators and alter operations.

Do you know if this behaviour is intentional or if I've missed
something because it seems undesirable to me. If it is a bug, then I
think I can see how to fix it, but wanted to ask before making any
changes.

On Mon, Sep 25, 2023 at 11:52 AM jian he  wrote:
>
> /*
>  * AlterOperator
>  * routine implementing ALTER OPERATOR  SET (option = ...).
>  *
>  * Currently, only RESTRICT and JOIN estimator functions can be changed.
>  */
> ObjectAddress
> AlterOperator(AlterOperatorStmt *stmt)
>
> The above comment needs to change, other than that, it passed the
> test, works as expected.

Thanks, added a comment.

> Can only be set when the operator does support a hash/merge join. Once
> set to true, it cannot be reset to false.

Yes, I updated the wording. Is it clearer now?


0001-bug-fixes-v3.patch
Description: Binary data


0003-alter_op-v3.patch
Description: Binary data


0002-refactor-v3.patch
Description: Binary data


Re: Is this a problem in GenericXLogFinish()?

2023-10-10 Thread Jeff Davis
I committed and backported 0001 (the GenericXLogFinish() fix but not
the Assert).

Strangely I didn't see the -committers email come through yet. If
anyone notices anything wrong, please let me know before the final v11
release.

On Thu, 2023-09-28 at 12:05 -0700, Jeff Davis wrote:
> Also, I ran into some problems with GIN that might require a bit more
> refactoring in ginPlaceToPage(). Perhaps we could consider
> REGBUF_NO_CHANGE a general bypass of the Assert(BufferIsDirty()), and
> use it temporarily until we have a chance to analyze/refactor each of
> the callers that need it.

For the Assert, I have attached a new patch for v17.

I changed the name of the flag to REGBUF_CLEAN_OK, because for some of
the callsites it was not clear to me whether the page is always
unchanged or sometimes unchanged. In other words, REGBUF_CLEAN_OK is a
way to bypass the Assert for callsites where we either know that we
actually want to register an unchanged page, or for callsites where we
don't know if the page is changed or not.

If we commit this, ideally someone would look into the places where
REGBUF_CLEAN_OK is used and make sure that it's not a bug, and perhaps
refactor so that it would benefit from the Assert. But refactoring
those places is outside of the scope of this patch.

It sounds like we have no intention to change the hash index code, so
are we OK if this flag just lasts forever? Do you still think it offers
a useful check?

Regards,
Jeff Davis

From 859c8f4284de1030ab070630fa06af3c171264d9 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Thu, 28 Sep 2023 11:19:06 -0700
Subject: [PATCH v3] Assert that buffers are marked dirty before
 XLogRegisterBuffer().

A page must be marked dirty before writing WAL, as documented in
transam/README.

Enforce this rule in XLogRegisterBuffer(), though it's not actually a
bug if (a) the page really is clean; or (b) the buffer is marked dirty
after XLogRegisterBuffer() and before XLogInsert().

Update log_newpage_range(), where it's trivial to change the order so
that we can check in XLogRegisterBuffer(). Other callers are more
complex, and updating them is outside the scope of this patch (and
perhaps not desirable), so introduce a flag REGBUF_CLEAN_OK to bypass
the Assert.

Note that this commit may have missed some callsites which can, at
least in some cases, register a clean buffer. If such a callsite is
found, it can be updated to use REGBUF_CLEAN_OK assuming it doesn't
represent an actual bug.

Discussion: https://postgr.es/m/c84114f8-c7f1-5b57-f85a-3adc31e1a...@iki.fi
---
 src/backend/access/gin/ginbtree.c   |  3 ++-
 src/backend/access/gin/ginfast.c|  2 +-
 src/backend/access/hash/hash.c  | 10 ++---
 src/backend/access/hash/hashovfl.c  |  9 +---
 src/backend/access/transam/xloginsert.c |  3 ++-
 src/backend/storage/buffer/bufmgr.c | 30 +
 src/include/access/xloginsert.h |  2 ++
 src/include/storage/bufmgr.h|  1 +
 8 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 06e97a7fbb..c98176af77 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -389,7 +389,8 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 		if (RelationNeedsWAL(btree->index) && !btree->isBuild)
 		{
 			XLogBeginInsert();
-			XLogRegisterBuffer(0, stack->buffer, REGBUF_STANDARD);
+			XLogRegisterBuffer(0, stack->buffer,
+			   REGBUF_STANDARD | REGBUF_CLEAN_OK);
 			if (BufferIsValid(childbuf))
 XLogRegisterBuffer(1, childbuf, REGBUF_STANDARD);
 		}
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index eb6c554831..e46e11df07 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -399,7 +399,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
 		Assert((ptr - collectordata) <= collector->sumsize);
 		if (needWal)
 		{
-			XLogRegisterBuffer(1, buffer, REGBUF_STANDARD);
+			XLogRegisterBuffer(1, buffer, REGBUF_STANDARD | REGBUF_CLEAN_OK);
 			XLogRegisterBufData(1, collectordata, collector->sumsize);
 		}
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index fc5d97f606..7a62100c89 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -824,11 +824,15 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
 XLogRegisterData((char *) , SizeOfHashDelete);
 
 /*
- * bucket buffer needs to be registered to ensure that we can
- * acquire a cleanup lock on it during replay.
+ * bucket buffer was not changed, but still needs to be
+ * registered to ensure that we can acquire a cleanup lock on
+ * it during replay.
  */
 if (!xlrec.is_primary_bucket_page)
-	XLogRegisterBuffer(0, bucket_buf, REGBUF_STANDARD | REGBUF_NO_IMAGE);
+{
+	uint8 flags = REGBUF_STANDARD | 

Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-10-10 Thread Tom Lane
David Rowley  writes:
> I've attached a slightly more worked on patch that makes maxlen == 0
> mean read-only.  Unsure if a macro is worthwhile there or not.

A few thoughts:

* initStringInfoFromStringWithLen() is kind of a mouthful.
How about "initStringInfoWithBuf", or something like that?

* logicalrep_read_tuple is doing something different from these
other callers: it's creating a *fully valid* StringInfo that
could be enlarged via repalloc.  (Whether anything downstream
depends on that, I dunno.)  Is it worth having two new init
functions, one that has that spec and initializes maxlen
appropriately, and the other that sets maxlen to 0?

* I think that this bit in the new enlargeStringInfo code path
is wrong:

+   newlen = pg_nextpower2_32(str->len) * 2;
+   while (needed > newlen)
+   newlen = 2 * newlen;

In the admittedly-unlikely case that str->len is more than half a GB
to start with, pg_nextpower2_32() will round up to 1GB and then the *2
overflows.  I think you should make this just

+   newlen = pg_nextpower2_32(str->len);
+   while (needed > newlen)
+   newlen = 2 * newlen;

It's fairly likely that this path will never be taken at all,
so trying to shave a cycle or two seems unnecessary.

> One thought I had about this is that the memory context behaviour
> might catch someone out at some point.  Right now if you do
> initStringInfo() the memory context of the "data" field will be
> CurrentMemoryContext, but if someone does
> initStringInfoFromStringWithLen() and then changes to some other
> memory context before doing an appendStringInfo on that string, then
> we'll allocate "data" in whatever that memory context is. Maybe that's
> ok if we document it.  Fixing it would mean adding a MemoryContext
> field to StringInfoData which would be set to CurrentMemoryContext
> during initStringInfo() and initStringInfoFromStringWithLen().

I think documenting it is sufficient.  I don't really foresee use-cases
where the string would get enlarged, anyway.

On the whole, I wonder about the value of allowing such a StringInfo to be
enlarged at all.  If we are defining the case as being a "read only"
buffer, under what circumstances would it be useful to enlarge it?
I'm tempted to suggest that we should just Assert(maxlen > 0) in
enlargeStringInfo, and anywhere else in stringinfo.c that modifies
the buffer.  That also removes the concern about which context the
enlargement would happen in.
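
For example, usage of such an initializer might look like this (function
name per the suggestion above; the API is still under discussion):

    StringInfoData buf;

    /* Wrap an existing buffer read-only; maxlen == 0 marks it as such. */
    initStringInfoWithBuf(&buf, data, len);
    /* ... read via pq_getmsgint(&buf, 4) etc.; appending would Assert ... */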

I'm not really happy with what you did documentation-wise in
stringinfo.h.  I suggest more like

 * StringInfoData holds information about an extensible string.
 *  datais the current buffer for the string (allocated with palloc).
 *  len is the current string length.  There is guaranteed to be
 *  a terminating '\0' at data[len], although this is not very
 *  useful when the string holds binary data rather than text.
 *  maxlen  is the allocated size in bytes of 'data', i.e. the maximum
 *  string size (including the terminating '\0' char) that we can
 *  currently store in 'data' without having to reallocate
-*  more space.  We must always have maxlen > len, except
+*  in the read-only case described below.
 *  cursor  is initialized to zero by makeStringInfo or initStringInfo,
 *  but is not otherwise touched by the stringinfo.c routines.
 *  Some routines use it to scan through a StringInfo.
+*
+* As a special case, a StringInfoData can be initialized with a read-only
+* string buffer.  In this case "data" does not necessarily point at a
+* palloc'd chunk, and management of the buffer storage is the caller's
+* responsibility.  maxlen is set to zero to indicate that this is the case.

Also, the following comment block asserting that there are "two ways"
to initialize a StringInfo needs work, and I guess so does the above-
cited comment about the cursor field.

regards, tom lane




Re: Tab completion for ATTACH PARTITION

2023-10-10 Thread David Zhang

On 2023-09-13 12:19 a.m., Alvaro Herrera wrote:

On 2023-Sep-13, bt23nguyent wrote:


Hi,

Currently, psql's tab completion does not work properly for
ATTACH PARTITION. When the <TAB> key is typed after "ALTER TABLE <name>
ATTACH PARTITION ", all possible table names should be displayed; however,
foreign table names are not displayed. So I created a patch that addresses
this issue by ensuring that psql displays not only normal table names but
also foreign table names in this case.

Sounds reasonable, but I think if we're going to have a specific query
for this case, we should make it a lot more precise.  For example, any
relation that's already a partition cannot be attached; as can't any
relation that is involved in legacy inheritance as either parent or
child.


I applied the patch and performed the tests below. I think it would be 
better if "attach partition" could filter out those partitions that have 
already been attached, just as "detach partition" is able to filter 
out the partitions that have already been detached.


Here are my test steps and results:


### create a main PG cluster on port 5432 and run below commands:

CREATE EXTENSION postgres_fdw;
CREATE SERVER s1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname  
'postgres', host '127.0.0.1', port '5433');

CREATE USER MAPPING for david SERVER s1 OPTIONS(user 'david');
CREATE TABLE t (a INT, b TEXT) PARTITION BY RANGE (a);
CREATE TABLE t_local PARTITION OF t FOR VALUES FROM (1) TO (10);
CREATE FOREIGN TABLE t_s1 PARTITION OF t FOR VALUES FROM (11) TO (20) 
SERVER s1 OPTIONS(schema_name 'public', table_name 't');
CREATE FOREIGN TABLE t_s1 SERVER s1 OPTIONS(schema_name 'public', 
table_name 't');



### create a foreign PG cluster on port 5433 and run below command:
CREATE TABLE t (a INT, b TEXT);


### "detach partition" can filter out already detached partition, in 
this case, "t_local".


postgres=# alter table t detach partition
information_schema.  public.  t_local t_s1
postgres=# alter table t detach partition t_s1 ;
ALTER TABLE
postgres=# alter table t detach partition
information_schema.  public. t_local


## before patch, "attach partition" can't display foreign table;
postgres=# alter table t attach partition
information_schema.  public.  t t_local


### after patch, "attach partition" does display the foreign table 
(patch works).

postgres=# alter table t attach partition
information_schema.  public.  t t_local  t_s1

In both cases, the already attached partition "t_local" shows up. If it 
could be filtered out, I believe that would be a better user experience.



Best regards,

David









Re: On login trigger: take three

2023-10-10 Thread Alexander Korotkov
Hi, Robert!

Thank you for your feedback.

On Tue, Oct 10, 2023 at 5:51 PM Robert Haas  wrote:
>
> On Mon, Oct 9, 2023 at 10:11 AM Alexander Korotkov  
> wrote:
> >  * Hold lock during setting of pg_database.dathasloginevt flag (v32
> > version actually didn't prevent race condition).
>
> So ... how does getting this flag set actually work? And how does
> clearing it work?

I hope I explained that in [1].

> In the case of row-level security, you have to explicitly enable the
> flag on the table level using DDL provided for that purpose. In the
> case of relhas{rules,triggers,subclass} the flag is set automatically
> as a side-effect of some other operation. I tend to consider that the
> latter design is somewhat messy. It's potentially even messier here,
> because at least when you add a rule or a trigger to a table you're
> expecting to take a lock on the table anyway. I don't think you
> necessarily expect creating a login trigger to take a lock on the
> database. That's a bit odd and could have visible side effects. And if
> you don't, then what happens is that if you create two login triggers
> in overlapping transactions, then (1) if there were no login triggers
> previously, one of the transactions fails with an internal-looking
> error message about a concurrent tuple update and (2) if there were
> login triggers previously, then it works fine.

Yep, in v43 it worked that way.  One transaction had to wait for
another to finish updating the pg_database tuple, and then failed.  This
is obviously ridiculous.  Since overlapping setters of the flag have to
wait anyway, in v44 I changed the lock mode for them to
AccessExclusiveLock.  Now the waiting transaction sees the updated
tuple and doesn't fail.

> That's also a bit weird
> and surprising. Now the counter-argument could be that adding new DDL
> to enable login triggers for a database is too much cognitive burden
> and it's better to have the kind of weird and surprising behavior that
> I just discussed. I don't know that I would buy that argument, but it
> could be made ... and my real point here is that I don't even see
> these trade-offs being discussed. Apologies if they were discussed
> earlier and I just missed that; I confess to not having read every
> email message on this topic, and some of the ones I did read I read a
> long time ago.

I have read the thread quite carefully.  I don't think manual setting
of the flag was discussed.  I do think it would be an extra burden for
users, and I would prefer an automatic flag as long as it works
transparently and reliably.

> > This version should be good and has no overhead.  Any thoughts?
> > Daniel, could you please re-run the performance tests?
>
> Is "no overhead" an overly bold claim here?

Yes, sure.  I meant no extra lookup.  Hopefully that means no
measurable overhead when there are no enabled login triggers.

Links
1. 
https://www.postgresql.org/message-id/CAPpHfdtvvozi%3Dttp8kvJQwuLrP5Q0D_5c4Pw1U67MRXcROB9yA%40mail.gmail.com

--
Regards,
Alexander Korotkov




Re: On login trigger: take three

2023-10-10 Thread Alexander Korotkov
Hi!

Thank you for the review.

On Tue, Oct 10, 2023 at 7:37 PM Andres Freund  wrote:
> On 2023-10-10 08:18:46 +0300, Alexander Korotkov wrote:
> > @@ -968,7 +969,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
> >
> >   if (!get_db_info(dbtemplate, ShareLock,
> >_dboid, _owner, 
> > _encoding,
> > -  _istemplate, _allowconn,
> > +  _istemplate, _allowconn, 
> > _hasloginevt,
> >_frozenxid, _minmxid, 
> > _deftablespace,
> >_collate, _ctype, 
> > _iculocale, _icurules, _locprovider,
> >_collversion))
>
> This isn't your fault, but this imo has become unreadable. Think we ought to
> move the information about a database to a struct.

Should I do this in a separate patch?

> > @@ -296,6 +306,13 @@ insert_event_trigger_tuple(const char *trigname, const 
> > char *eventname, Oid evtO
> >   CatalogTupleInsert(tgrel, tuple);
> >   heap_freetuple(tuple);
> >
> > + /*
> > +  * Login event triggers have an additional flag in pg_database to 
> > allow
> > +  * faster lookups in hot codepaths. Set the flag unless already True.
> > +  */
> > + if (strcmp(eventname, "login") == 0)
> > + SetDatatabaseHasLoginEventTriggers();
>
> It's not really faster lookups, it's no lookups, right?

Yes, no extra lookups. Fixed.

> >   /* Depend on owner. */
> >   recordDependencyOnOwner(EventTriggerRelationId, trigoid, evtOwner);
> >
> > @@ -357,6 +374,39 @@ filter_list_to_array(List *filterlist)
> >   return PointerGetDatum(construct_array_builtin(data, l, TEXTOID));
> >  }
> >
> > +/*
> > + * Set pg_database.dathasloginevt flag for current database indicating that
> > + * current database has on login triggers.
> > + */
> > +void
> > +SetDatatabaseHasLoginEventTriggers(void)
> > +{
> > + /* Set dathasloginevt flag in pg_database */
> > + Form_pg_database db;
> > + Relationpg_db = table_open(DatabaseRelationId, 
> > RowExclusiveLock);
> > + HeapTuple   tuple;
> > +
> > + /*
> > +  * Use shared lock to prevent a conflit with EventTriggerOnLogin() 
> > trying
> > +  * to reset pg_database.dathasloginevt flag.  Note that we use
> > +  * AccessShareLock allowing setters concurently.
> > +  */
> > + LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, 
> > AccessShareLock);
>
> That seems like a very odd approach - how does this avoid concurrency issues
> with one backend setting and another unsetting the flag?  And outside of that,
> won't this just lead to concurrently updated tuples?

When a backend unsets the flag, it acquires the lock first.  If the lock is
acquired successfully, then no other backends hold it.  If a concurrent
backend has already inserted an event trigger, then we will detect it by
rechecking the event list under the lock.  If a concurrent backend
inserted/enabled an event trigger and is waiting for the lock, then
it will set the flag once we release the lock.

> > + tuple = SearchSysCacheCopy1(DATABASEOID, 
> > ObjectIdGetDatum(MyDatabaseId));
> > + if (!HeapTupleIsValid(tuple))
> > + elog(ERROR, "cache lookup failed for database %u", 
> > MyDatabaseId);
> > + db = (Form_pg_database) GETSTRUCT(tuple);
> > + if (!db->dathasloginevt)
> > + {
> > + db->dathasloginevt = true;
> > + CatalogTupleUpdate(pg_db, >t_self, tuple);
> > + CommandCounterIncrement();
> > + }
> > + table_close(pg_db, RowExclusiveLock);
> > + heap_freetuple(tuple);
> > +}
> > +
> >  /*
> >   * ALTER EVENT TRIGGER foo ENABLE|DISABLE|ENABLE ALWAYS|REPLICA
> >   */
> > @@ -391,6 +441,10 @@ AlterEventTrigger(AlterEventTrigStmt *stmt)
> >
> >   CatalogTupleUpdate(tgrel, >t_self, tup);
> >
> > + if (namestrcmp(>evtevent, "login") == 0 &&
> > + tgenabled != TRIGGER_DISABLED)
> > + SetDatatabaseHasLoginEventTriggers();
> > +
> >   InvokeObjectPostAlterHook(EventTriggerRelationId,
> > trigoid, 0);
> >
> > @@ -557,7 +611,7 @@ filter_event_trigger(CommandTag tag, 
> > EventTriggerCacheItem *item)
> >  static List *
> >  EventTriggerCommonSetup(Node *parsetree,
> >   EventTriggerEvent event, 
> > const char *eventstr,
> > - EventTriggerData *trigdata)
> > + EventTriggerData *trigdata, 
> > bool unfiltered)
> >  {
> >   CommandTag  tag;
> >   List   *cachelist;
> > @@ -582,10 +636,15 @@ EventTriggerCommonSetup(Node *parsetree,
> >   {
> >   CommandTag  dbgtag;
> >
> > - dbgtag = CreateCommandTag(parsetree);
> > + if (event == EVT_Login)
> > + dbgtag = 

Re: Lowering the default wal_blocksize to 4K

2023-10-10 Thread Matthias van de Meent
On Tue, 10 Oct 2023 at 06:14, Andres Freund  wrote:
>
> Hi,
>
> On 2023-10-09 23:16:30 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> There's an alternative approach we could take, which is to write in 4KB
>>> increments, while keeping 8KB pages. With the current format that's not
>>> obviously a bad idea. But given there aren't really advantages in 8KB WAL
>>> pages, it seems we should just go for 4KB?
>>
>> Seems like that's doubling the overhead of WAL page headers.  Do we need
>> to try to skinny those down?
>
> I think the overhead is small, and we are wasting so much space in other
> places, that I am not worried about the proportional increase page header
> space usage at this point, particularly compared to saving in overall write
> rate and increase in TPS. There's other areas we can save much more space, if
> we want to focus on that.
>
> I was thinking we should perhaps do the opposite, namely getting rid of short
> page headers. The overhead in the "byte position" <-> LSN conversion due to
> the differing space is worse than the gain. Or do something inbetween - having
> the system ID in the header adds a useful crosscheck, but I'm far less
> convinced that having segment and block size in there, as 32bit numbers no
> less, is worthwhile. After all, if the system id matches, it's not likely that
> the xlog block or segment size differ.

Hmm. I don't think we should remove those checks, as I can see people
that would want to change their XLog block size with e.g.
pg_reset_wal.
But I think we can relatively easily move segsize/blocksize checks to
a different place in the normal page header, which would reduce the
number of bytes we'd have to store elsewhere.

We could move segsize/blocksize into the xlp_info flags: 12 of the 16
bits are currently unused. We could use 4 of these bits for segsize
(indicating 2^N MB; currently accepted values are N=0..10 for 1 MB ...
1024 MB) and 4 (or 3) for blcksz (as we currently support 1..64 kB
blocks, i.e. 2^{0..6} kB). This would remove the need for 2 of the 3
fields in the large xlog block header.
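
A sketch of that packing (bit positions purely illustrative; the low 4
bits stay reserved for the existing XLP_* flags, and "info" stands for
the page's xlp_info with the surrounding context elided):

    #define XLP_SEGSIZE_SHIFT   4
    #define XLP_SEGSIZE_MASK    (0xF << XLP_SEGSIZE_SHIFT)  /* 2^N MB, N=0..10 */
    #define XLP_BLCKSZ_SHIFT    8
    #define XLP_BLCKSZ_MASK     (0x7 << XLP_BLCKSZ_SHIFT)   /* 2^N kB, N=0..6 */

    /* at page init: */
    info |= pg_ceil_log2_32(wal_segment_size / (1024 * 1024)) << XLP_SEGSIZE_SHIFT;
    info |= pg_ceil_log2_32(XLOG_BLCKSZ / 1024) << XLP_BLCKSZ_SHIFT;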

After that we'll only have the system ID left from the extended
header, which we could store across 2 pages in the (current) alignment
losses of xlp_rem_len - even pages the upper half, odd pages the
lower half of the ID. This should allow for enough integrity checks
without further increasing the size of XLogPageHeader in most
installations.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Removing unneeded self joins

2023-10-10 Thread Alena Rybakina

Hi!

I have reviewed your patch and I noticed a few things.

First of all, I think I found a bug in your latest patch version, and 
this query shows it:


EXPLAIN (COSTS OFF)
SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
  SELECT e1.oid FROM pg_extension e1, pg_extension e2
  WHERE e1.oid=e2.oid) AS e
  ON c.oid=e.oid;

In the current version we get such a query plan:

            QUERY PLAN
---------------------------------------
 Hash Full Join
   Hash Cond: (c.oid = e2.oid)
   ->  Seq Scan on pg_class c
   ->  Hash
         ->  Seq Scan on pg_extension e2
(5 rows)

But I think it should be:

            QUERY PLAN
---------------------------------------------
 Hash Full Join
   Hash Cond: (c.oid = e2.oid)
   ->  Seq Scan on pg_class c
   ->  Hash
         ->  Seq Scan on pg_extension e2
               Filter: (oid IS NOT NULL)
(6 rows)

I have looked at the latest version of the code, and I assume that the
error lies in the replace_varno_walker function, specifically in the place
where we check whether the node is a Var but end up not forming any
NullTest node.


if (OidIsValid(reloid) && get_attnotnull(reloid, attnum))  /* this condition works */
{
    rinfo->clause = (Expr *) makeBoolConst(true, false);
}
else
{
    NullTest   *ntest = makeNode(NullTest);

    ntest->arg = leftOp;
    ntest->nulltesttype = IS_NOT_NULL;
    ntest->argisrow = false;
    ntest->location = -1;
    rinfo->clause = (Expr *) ntest;
}


Secondly, I added some code in a few places to catch erroneous cases, and 
added a condition so that we do not try to apply the self-join-removal 
transformation when the search turns up no self-joins, or when the query 
contains no joins at all. Besides that, I added a query for testing and 
wrote about it above. I have attached my diff file.



In addition, I found a comment that was not clear to me. I would be glad 
if you could explain it.


You mentioned a "superior outer join" in the comment; unfortunately, I 
didn't find anything about that term in the PostgreSQL code, so the 
explanation remained unclear to me. Could you explain in more detail 
what you meant?


/*
 * At this stage joininfo lists of inner and outer can contain
 * only clauses, required for *a superior outer join* that can't
 * influence on this optimization. So, we can avoid to call the
 * build_joinrel_restrictlist() routine.
*/
 restrictlist = generate_join_implied_equalities(root, joinrelids,
  inner->relids,
  outer, NULL);

--

Regards,
Alena Rybakina
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 3e10083905c..5ba5ca693f1 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1704,6 +1704,8 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
 	List			   *binfo_candidates = NIL;
 	ReplaceVarnoContext	ctx = {.from = toRemove->relid, .to = toKeep->relid};
 
+	Assert(toKeep->relid != -1);
+
 	/*
 	 * Replace index of removing table with the keeping one. The technique of
 	 * removing/distributing restrictinfo is used here to attach just appeared
@@ -2007,6 +2009,8 @@ match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
 /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
 continue;
 
+			Assert(is_opclause(orinfo->clause));
+
 			oclause = bms_is_empty(orinfo->left_relids) ?
 	get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
 			c2 = (bms_is_empty(orinfo->left_relids) ?
@@ -2150,6 +2154,18 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
 			split_selfjoin_quals(root, restrictlist, ,
  , inner->relid, outer->relid);
 
+			if (list_length(selfjoinquals) == 0)
+ 			{
+ /*
+  * XXX:
+  * we would detect self-join without quals like 'x==x' if we had
+  * an foreign key constraint on some of other quals and this join
+  * haven't any columns from the outer in the target list.
+  * But it is still complex task.
+  */
+ continue;
+ 			}
+
 			/*
 			 * To enable SJE for the only degenerate case without any self join
 			 * clauses at all, add baserestrictinfo to this list.
@@ -2332,7 +2348,7 @@ remove_useless_self_joins(PlannerInfo *root, List *joinlist)
 	Relids	ToRemove = NULL;
 	int		relid = -1;
 
-	if (!enable_self_join_removal)
+	if ((list_length(joinlist) <=1 && !IsA(linitial(joinlist), List)) || !enable_self_join_removal)
 		return joinlist;
 
 	/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 10b23944feb..800410d6b18 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,13 +5807,11 @@ explain (costs off)
 select p.* from
   (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
   where p.k = 1 and p.k = 2;
-   QUERY PLAN

Re: document the need to analyze partitioned tables

2023-10-10 Thread Bruce Momjian
On Fri, Oct  6, 2023 at 06:49:05PM +0200, Laurenz Albe wrote:
> On Fri, 2023-10-06 at 12:20 -0400, Bruce Momjian wrote:
> > Good points, updated patch attached.
> 
> That patch is good to go, as far as I am concerned.

Patch applied back to PG 11, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: New WAL record to detect the checkpoint redo location

2023-10-10 Thread Robert Haas
On Mon, Oct 9, 2023 at 4:47 PM Andres Freund  wrote:
> As noted in my email from a few minutes ago, I agree that optimizing this
> shouldn't be a requirement for merging the patch.

Here's a new patch set. I think I've incorporated the performance
fixes that you've suggested so far into this version. I also adjusted
a couple of other things:

- After further study of a point previously raised by Amit, I adjusted
CreateCheckPoint slightly to call WALInsertLockAcquireExclusive
significantly later than before. I think that there's no real reason
to do it so early and that the current coding is probably just a
historical leftover, but it would be good to have some review here.

- I added a cross-check that when starting redo from a checkpoint
whose redo pointer points to an earlier LSN than the checkpoint
itself, the record we read from that LSN must be an XLOG_CHECKPOINT_REDO
record.

- I combined what were previously 0002 and 0003 into a single patch,
since that's how this would get committed.

- I fixed up some comments.

- I updated commit messages.

Hopefully this is getting close to good enough.

> I did verify that they continue to be a bottleneck even after (incorrectly
> obviously), removing the spinlock. It's also not too surprising, the latency
> of 64bit divs is just high, particularly on intel from a few years ago (my
> cascade lake workstation) and IIRC there's just a single execution port for it
> too, so multiple instructions can't be fully parallelized.

The chipset on my laptop is even older. Coffee Lake, I think.

I'm not really sure that there's a whole lot we can reasonably do
about the divs unless you like the fastpath idea that I proposed
earlier, or unless you want to write a patch to either get rid of
short page headers or make long and short page headers the same number
of bytes. I have to admit I'm surprised by how visible the division
overhead is in this code path -- but I'm also somewhat inclined to
view that less as evidence that division is something we should be
desperate to eliminate and more as evidence that this code path is
quite fast already. In light of your findings, it doesn't seem
completely impossible to me that the speed of integer division in this
code path could be part of what limits performance for some users, but
I'm also not sure it's all that likely or all that serious, because
we're deliberating creating test cases that insert unreasonable
amounts of WAL without doing any actual work. In the real world,
there's going to be a lot more other code running along with this code
- probably at least the executor and some heap AM code - and I bet not
all of that is as well-optimized as this is already. And it's also
quite likely for many users that the real limits on the speed of the
workload will be related to I/O or lock contention rather than CPU
cost in any form. I'm not saying it's not worth worrying about it. I'm
just saying that we should make sure the amount of worrying we do is
calibrated to the true importance of the issue.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v8-0001-Unify-two-isLogSwitch-tests-in-XLogInsertRecord.patch
Description: Binary data


v8-0002-During-online-checkpoints-insert-XLOG_CHECKPOINT_.patch
Description: Binary data


Re: Suggestion. Optional local ORDER BY clause for DISTINCT ON

2023-10-10 Thread Tom Lane
Stefan Stefanov  writes:
>  Gents, I have a suggestion for DISTINCT ON clause syntax.
>DISTINCT ON (expression(s) [ORDER BY expression(s)]) 
> Determines the precedence within each DISTINCT ON group (i.e. the ‘first’ row 
> to be picked)

> Motivation
> • Using the query-wide ORDER BY clause to determine which record to pick 
> mixes two unrelated concerns, ‘first’ row selection and result-set ordering. 
> This may be confusing;
> • The DISTINCT ON expression(s) must match the leftmost ORDER BY 
> expression(s). This may cause inconvenience and require nesting as a 
> sub-query to order the result-set.

Since you can get the desired behavior with a sub-select, I'm
not especially excited about extending DISTINCT ON.  If it weren't
such a nonstandard kluge, I might feel differently; but it's not
an area that I think we ought to put more effort into.

regards, tom lane




Re: Check each of base restriction clauses for constant-FALSE-or-NULL

2023-10-10 Thread Tom Lane
I wrote:
> Also, if you compare that test case to the one immediately following
> it, it's downright weird that we are presently smarter about
> optimizing the more complicated case.  (I've not dug into exactly
> why that is; maybe worth running it to ground?)

The reason seems to be that joinrels.c's restriction_is_constant_false
knows that it has to check all members of the restrictinfo list, not
just one; and we get to that because some of the originally generated
EC clauses are join clauses in the second case.

So this logic in relation_excluded_by_constraints is just wrong ---
premature optimization on my part, looks like.

regards, tom lane




Re: Check each of base restriction clauses for constant-FALSE-or-NULL

2023-10-10 Thread Tom Lane
Richard Guo  writes:
> On Tue, Oct 10, 2023 at 5:10 PM David Rowley  wrote:
>> After making the change, I saw the same regression test change as you
>> did, but didn't really feel like it was worth tackling separately from
>> the patch that we were working on.

> I was thinking that this change may be worthwhile by itself even without
> the 'reduce-NullTest' patch, because it can benefit some cases, such as
> where EC generates constant-FALSE on the fly.  So maybe it's worth a
> separate patch?  I'm not quite sure.

I think it's worth pushing separately, since it has a positive impact
on existing cases, as shown by the regression test plan change.
Also, if you compare that test case to the one immediately following
it, it's downright weird that we are presently smarter about
optimizing the more complicated case.  (I've not dug into exactly
why that is; maybe worth running it to ground?)

regards, tom lane




Suggestion. Optional local ORDER BY clause for DISTINCT ON

2023-10-10 Thread Stefan Stefanov
 Gents, I have a suggestion for DISTINCT ON clause syntax.

    DISTINCT ON (expression(s) [ORDER BY expression(s)])

Determines the precedence within each DISTINCT ON group (i.e. the ‘first’
row to be picked).
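
For example (illustrative schema; the second query uses the proposed,
currently non-existent syntax):

    -- Today: the outer ORDER BY must lead with the DISTINCT ON expression.
    SELECT DISTINCT ON (customer_id) *
    FROM orders
    ORDER BY customer_id, created_at DESC;

    -- Proposed: pick the row inside DISTINCT ON, order the result freely.
    SELECT DISTINCT ON (customer_id ORDER BY created_at DESC) *
    FROM orders
    ORDER BY created_at;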

Motivation
• Using the query-wide ORDER BY clause to determine which record to pick mixes 
two unrelated concerns, ‘first’ row selection and result-set ordering. This may 
be confusing;
• The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). 
This may cause inconvenience and require nesting as a sub-query to order the 
result-set.

Pros
• Backward compatibility. If the local ORDER BY clause is missing then the 
current rules shall apply;
• Familiar and consistent syntax and semantics, the same as in *_agg functions;
• Clear distinction of first row selection and result-set ordering;
• Good readability;
• The DISTINCT ON expression(s) do not have to match the leftmost ORDER BY 
expression(s).

Cons
 • Possible extra verbosity  
  Best regards,  Stefan


Re: Retire has_multiple_baserels()

2023-10-10 Thread Tom Lane
Aleksander Alekseev  writes:
>> The function has_multiple_baserels() is used in set_subquery_pathlist()
>> to check and see if there are more than 1 base rel, by looping through
>> simple_rel_array[].  I think one simpler way to do that is to leverage
>> root->all_baserels by
>>  bms_membership(root->all_baserels) == BMS_MULTIPLE

> I used the following patch to double check that nothing was missed:
> ...
> It wasn't. The patch LGTM.

I thought this test wasn't too complete, because has_multiple_baserels
isn't reached at all in many cases thanks to the way the calling if()
is coded.  I tried testing like this instead:

diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index eea49cca7b..3f6fc51fb4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2649,6 +2649,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
  */
 remove_unused_subquery_outputs(subquery, rel, run_cond_attrs);
 
+Assert(has_multiple_baserels(root) ==
+   (bms_membership(root->all_baserels) == BMS_MULTIPLE));
+
 /*
  * We can safely pass the outer tuple_fraction down to the subquery if the
  * outer level has no joining, aggregation, or sorting to do. Otherwise

and came to the same conclusion: check-world finds no cases where
the assertion fails.  So it LGTM too.  Pushed.

regards, tom lane




Re: On login trigger: take three

2023-10-10 Thread Andres Freund
Hi,

On 2023-10-10 08:18:46 +0300, Alexander Korotkov wrote:
> @@ -968,7 +969,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
>  
>   if (!get_db_info(dbtemplate, ShareLock,
>&src_dboid, &src_owner, &src_encoding,
> -  &src_istemplate, &src_allowconn,
> +  &src_istemplate, &src_allowconn, &src_hasloginevt,
>&src_frozenxid, &src_minmxid, &src_deftablespace,
>&src_collate, &src_ctype, &src_iculocale, &src_icurules, &src_locprovider,
>&src_collversion))

This isn't your fault, but this imo has become unreadable. Think we ought to
move the information about a database to a struct.


> @@ -296,6 +306,13 @@ insert_event_trigger_tuple(const char *trigname, const 
> char *eventname, Oid evtO
>   CatalogTupleInsert(tgrel, tuple);
>   heap_freetuple(tuple);
>  
> + /*
> +  * Login event triggers have an additional flag in pg_database to allow
> +  * faster lookups in hot codepaths. Set the flag unless already True.
> +  */
> + if (strcmp(eventname, "login") == 0)
> + SetDatatabaseHasLoginEventTriggers();

It's not really faster lookups, it's no lookups, right?


>   /* Depend on owner. */
>   recordDependencyOnOwner(EventTriggerRelationId, trigoid, evtOwner);
>  
> @@ -357,6 +374,39 @@ filter_list_to_array(List *filterlist)
>   return PointerGetDatum(construct_array_builtin(data, l, TEXTOID));
>  }
>  
> +/*
> + * Set pg_database.dathasloginevt flag for current database indicating that
> + * current database has on login triggers.
> + */
> +void
> +SetDatatabaseHasLoginEventTriggers(void)
> +{
> + /* Set dathasloginevt flag in pg_database */
> + Form_pg_database db;
> + Relationpg_db = table_open(DatabaseRelationId, 
> RowExclusiveLock);
> + HeapTuple   tuple;
> +
> + /*
> +  * Use shared lock to prevent a conflit with EventTriggerOnLogin() 
> trying
> +  * to reset pg_database.dathasloginevt flag.  Note that we use
> +  * AccessShareLock allowing setters concurently.
> +  */
> + LockSharedObject(DatabaseRelationId, MyDatabaseId, 0, AccessShareLock);

That seems like a very odd approach - how does this avoid concurrency issues
with one backend setting and another unsetting the flag?  And outside of that,
won't this just lead to concurrently updated tuples?


> + tuple = SearchSysCacheCopy1(DATABASEOID, 
> ObjectIdGetDatum(MyDatabaseId));
> + if (!HeapTupleIsValid(tuple))
> + elog(ERROR, "cache lookup failed for database %u", 
> MyDatabaseId);
> + db = (Form_pg_database) GETSTRUCT(tuple);
> + if (!db->dathasloginevt)
> + {
> + db->dathasloginevt = true;
> + CatalogTupleUpdate(pg_db, &tuple->t_self, tuple);
> + CommandCounterIncrement();
> + }
> + table_close(pg_db, RowExclusiveLock);
> + heap_freetuple(tuple);
> +}
> +
>  /*
>   * ALTER EVENT TRIGGER foo ENABLE|DISABLE|ENABLE ALWAYS|REPLICA
>   */
> @@ -391,6 +441,10 @@ AlterEventTrigger(AlterEventTrigStmt *stmt)
>  
>   CatalogTupleUpdate(tgrel, &tup->t_self, tup);
>  
> + if (namestrcmp(&evtForm->evtevent, "login") == 0 &&
> + tgenabled != TRIGGER_DISABLED)
> + SetDatatabaseHasLoginEventTriggers();
> +
>   InvokeObjectPostAlterHook(EventTriggerRelationId,
> trigoid, 0);
>  
> @@ -557,7 +611,7 @@ filter_event_trigger(CommandTag tag, 
> EventTriggerCacheItem *item)
>  static List *
>  EventTriggerCommonSetup(Node *parsetree,
>   EventTriggerEvent event, const 
> char *eventstr,
> - EventTriggerData *trigdata)
> + EventTriggerData *trigdata, 
> bool unfiltered)
>  {
>   CommandTag  tag;
>   List   *cachelist;
> @@ -582,10 +636,15 @@ EventTriggerCommonSetup(Node *parsetree,
>   {
>   CommandTag  dbgtag;
>  
> - dbgtag = CreateCommandTag(parsetree);
> + if (event == EVT_Login)
> + dbgtag = CMDTAG_LOGIN;
> + else
> + dbgtag = CreateCommandTag(parsetree);
> +
>   if (event == EVT_DDLCommandStart ||
>   event == EVT_DDLCommandEnd ||
> - event == EVT_SQLDrop)
> + event == EVT_SQLDrop ||
> + event == EVT_Login)
>   {
>   if (!command_tag_event_trigger_ok(dbgtag))
>   elog(ERROR, "unexpected command tag \"%s\"", 
> GetCommandTagName(dbgtag));
> @@ -604,7 +663,10 @@ EventTriggerCommonSetup(Node *parsetree,
>   return NIL;
>  
>   /* Get the command tag. */
> - tag = CreateCommandTag(parsetree);
> + if (event == EVT_Login)
> +

Re: Fwd: Advice about preloaded libraries

2023-10-10 Thread Alvaro Herrera
On 2023-Oct-10, Esteban Zimanyi wrote:

> As can be seen above, it is not REALLY mandatory to have
> shared_preload_libraries = 'postgis-3' but then the user is responsible for
> issuing a query to load PostGIS (select st_point(1,1); above) and then she
> is able to execute MobilityDB queries.

Calling a function that exists in some library will cause the library to
be loaded.  Alternatively, you can cause the library to be loaded
automatically at some point of the start sequence, by
shared_preload_libraries or the other configuration options.  Or you can
use the LOAD statement.

If by whichever mechanism postgis has been loaded into your session,
then calling a function in MobilityDB will work fine, because the
postgis library will have been loaded.  It doesn't matter exactly how
was postgis loaded.

The advantage of using shared_preload_libraries is performance of
connection establishment: the library is loaded by the postmaster, so
each new backend inherits it already loaded and doesn't have to load it
itself.
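
Concretely, any of the following gets the library loaded, sketched here with
the postgis-3 name from the report upthread:

LOAD 'postgis-3';       -- explicit, per session
SELECT st_point(1, 1);  -- implicit: calling any function in the library
ALTER SYSTEM SET shared_preload_libraries = 'postgis-3';  -- postmaster-wide, needs restart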

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)




Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific

2023-10-10 Thread Evan Jones
Thanks for bringing this up! I just looked at the uses of isspace() in that
file. It looks like it is the usual thing: it is allowing leading or
trailing whitespace when parsing values, or for this "needs quoting" logic
on output. The fix would be the same: this *should* be
using scanner_isspace. This has the same disadvantage: it would change
Postgres's results for some inputs that contain these non-ASCII "space"
characters.


Here is a quick demonstration of this issue, showing that the quoting
behavior is different between these two. On Mac OS X with the "default" locale,
the output includes quotes because the UTF-8 encoding of ą contains the byte 0x85:

postgres=# SELECT ROW('keyą');
   row
----------
 ("keyą")
(1 row)

On Mac OS X with the LANG=C environment variable set, it does not include
quotes:

postgres=# SELECT ROW('keyą');
  row
--------
 (keyą)
(1 row)


On Mon, Oct 9, 2023 at 11:18 PM Thomas Munro  wrote:

> FTR I ran into a benign case of the phenomenon in this thread when
> dealing with row types.  In rowtypes.c, we double-quote stuff
> containing spaces, but we detect them by passing individual bytes of
> UTF-8 sequences to isspace().  Like macOS, Windows thinks that 0xa0 is
> a space when you do that, so for example the Korean character '점'
> (code point C810, UTF-8 sequence EC A0 90) gets quotes on Windows but
> not on Linux.  That confused a migration/diff tool while comparing
> Windows and Linux database servers using that representation.  Not a
> big deal, I guess no one ever promised that the format was stable
> across platforms, and I don't immediately see a way for anything more
> serious to go wrong (though I may lack imagination).  It does seem a
> bit weird to be using locale-aware tokenising for a machine-readable
> format, and then making sure its behaviour is undefined by feeding it
> chopped up bytes.
>


Re: On login trigger: take three

2023-10-10 Thread Robert Haas
On Mon, Oct 9, 2023 at 10:11 AM Alexander Korotkov  wrote:
>  * Hold lock during setting of pg_database.dathasloginevt flag (v32
> version actually didn't prevent race condition).

So ... how does getting this flag set actually work? And how does
clearing it work?

In the case of row-level security, you have to explicitly enable the
flag on the table level using DDL provided for that purpose. In the
case of relhas{rules,triggers,subclass} the flag is set automatically
as a side-effect of some other operation. I tend to consider that the
latter design is somewhat messy. It's potentially even messier here,
because at least when you add a rule or a trigger to a table you're
expecting to take a lock on the table anyway. I don't think you
necessarily expect creating a login trigger to take a lock on the
database. That's a bit odd and could have visible side effects. And if
you don't, then what happens is that if you create two login triggers
in overlapping transactions, then (1) if there were no login triggers
previously, one of the transactions fails with an internal-looking
error message about a concurrent tuple update and (2) if there were
login triggers previously, then it works fine. That's also a bit weird
and surprising. Now the counter-argument could be that adding new DDL
to enable login triggers for a database is too much cognitive burden
and it's better to have the kind of weird and surprising behavior that
I just discussed. I don't know that I would buy that argument, but it
could be made ... and my real point here is that I don't even see
these trade-offs being discussed. Apologies if they were discussed
earlier and I just missed that; I confess to not having read every
email message on this topic, and some of the ones I did read I read a
long time ago.
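
For concreteness, the race being described is two sessions each running
something like the following (event name per this patch; the function name is
invented) in overlapping transactions:

BEGIN;
CREATE EVENT TRIGGER my_login_trigger ON login
    EXECUTE FUNCTION my_login_handler();
COMMIT;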

> This version should be good and has no overhead.  Any thoughts?
> Daniel, could you please re-run the performance tests?

Is "no overhead" an overly bold claim here?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Request for comment on setting binary format output per session

2023-10-10 Thread Robert Haas
On Tue, Oct 10, 2023 at 10:30 AM Dave Cramer  wrote:
> Correct me if I am wrong, but the client has to request this. So I'm not sure 
> how we would be surprised ?

Consider an application, a connection pooler, and a stored procedure
or function on the server. If this is controlled by a GUC, any of them
could set it at any point in the session. That could lead to the
application and/or the connection pooler being out of step with the
server behavior.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Request for comment on setting binary format output per session

2023-10-10 Thread Robert Haas
On Mon, Oct 9, 2023 at 5:02 PM Jelte Fennema  wrote:
> Honestly I think the main difference is the need to introduce this
> explicit protocol message. If we do, I think it might be best to have
> this be a way of setting a GUC at the Protocol level, and expand the
> GucContext enum to have a way to disallow setting it from SQL (e.g.
> PGC_PROTOCOL), while still allowing PgBouncer (or other poolers) to
> change the GUC as part of the connection handoff, in a way that's
> similar to what's being done for client_encoding now. We might even
> want to make client_encoding PGC_PROTOCOL too (eventually).

That's an idea worth considering, IMHO. I'm not saying it's the best
or only idea, but it seems to have some real advantages.

The pooler case is actually a really important one here. If the client
is connected directly to the server, the difference between whether
something is controlled via the protocol or via SQL is just whether it
could be set inside some function. I think that's a thing to be
concerned about, but when you add the pooler to the equation then you
have the additional question of whether a certain value should be
controlled by the end-client or by the pooler. A really obvious
example of where you might want the latter behavior is
session_authorization. You'd like the pooler to be able to set that in
such a way that the end-client can't tinker with it by any means.
Right now we don't have a way to do that, but maybe someday we will.
This issue is perhaps a bit less critical, but it still feels bad if
the end-client can effectively pull the rug out from under the
pooler's wire protocol expectations. I'm not exactly sure what the
right policy is here concretely, so I'm not ready to argue for exactly
what we should do just yet, but I do want to argue that we should be
thinking carefully about these issues.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Request for comment on setting binary format output per session

2023-10-10 Thread Dave Cramer
On Tue, 10 Oct 2023 at 10:25, Robert Haas  wrote:

> On Mon, Oct 9, 2023 at 4:25 PM Jeff Davis  wrote:
> > Another thing to consider is that using a GUC for binary formats is a
> > protocol change in a way that client_encoding is not. The existing
> > documentation for the protocol already specifies when binary formats
> > will be used, and a GUC would change that behavior. We absolutely would
> > need to update the documentation, and clients (like psql) really should
> > be updated.
>
> I think the idea of using a new parameterFormat value is a good one.
> Let 0 and 1 continue to mean what they mean, and let clients opt in to
> the new mechanism if they're aware of it.
>

Correct me if I am wrong, but the client has to request this. So I'm not
sure how we would be surprised ?

Dave


Re: Request for comment on setting binary format output per session

2023-10-10 Thread Robert Haas
On Mon, Oct 9, 2023 at 4:25 PM Jeff Davis  wrote:
> Another thing to consider is that using a GUC for binary formats is a
> protocol change in a way that client_encoding is not. The existing
> documentation for the protocol already specifies when binary formats
> will be used, and a GUC would change that behavior. We absolutely would
> need to update the documentation, and clients (like psql) really should
> be updated.

I think the idea of using a new parameterFormat value is a good one.
Let 0 and 1 continue to mean what they mean, and let clients opt in to
the new mechanism if they're aware of it.

I think it's a pretty bad idea to dump new protocol behavior on
clients who have in no way indicated that they know about it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Comparing two double values method

2023-10-10 Thread Tom Lane
Heikki Linnakangas  writes:
> On 10/10/2023 13:31, Bowen Shi wrote:
>> I noticed that in the `check_GUC_init` function, there is a direct
>> comparison using the != operator for two double values, which seems
>> problematic.

> No, the compile-time initial values should match exactly.

Right.  The point of this test is to catch cases where you wrote,
say,

double my_guc = 1.1;

but the boot_val for it in guc_tables.c is 1.2.  There is no
reason to allow any divergence in the spellings of the two C
literals, so as long as they're compiled by the same compiler
there's no reason to expect that the compiled values wouldn't
be bit-equal.

The point of the exclusions for zero is to allow you to just
write

double my_guc;

without expressing an opinion about the initial value.
(Yes, this does mean that "double my_guc = 0.0;" could
be misleading.  It's just a heuristic though.)

regards, tom lane




Re: CHECK Constraint Deferrable

2023-10-10 Thread Robert Haas
On Mon, Oct 9, 2023 at 5:07 PM David G. Johnston
 wrote:
>> 2. I don't think it's a good idea for the same patch to try to solve
>> two problems unless they are so closely related that solving one
>> without solving the other is not sensible.
>
> A NOT NULL constraint apparently is just a special case of a check constraint 
> which seems closely related enough to match your definition.

Yes, that might be true. I suppose I'd like to hear from the patch
author(s) about that. I'm somewhat coming around to your idea that
maybe both should be covered together, but I'm not the one writing the
patch.

> But I guess you are right, I was trying to say no to this patch, and yes to 
> the not null deferral idea, without being so explicit and final about it.

But this, I dislike, for reasons which I'm sure you can appreciate. As
you say, people are free to choose their own development priorities. I
don't need this feature for anything either, personally, but my need
or lack of it for some particular feature doesn't define the objective
usefulness thereof. And to be honest, if I were trying to step back
from my personal needs, I'd say this seems likely to be more useful
than 75% of what's in the CommitFest. Your judgement can be different
and that's fine too, but I think the argument for calling this useless
is weak, especially given that several people have already mentioned
ways that they would like to use it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pre-proposal: unicode normalized text

2023-10-10 Thread Robert Haas
On Tue, Oct 10, 2023 at 2:44 AM Peter Eisentraut  wrote:
> Can you restate what this is supposed to be for?  This thread appears to
> have morphed from "let's normalize everything" to "let's check for
> unassigned code points", but I'm not sure what we are aiming for now.

Jeff can say what he wants it for, but one obvious application would
be to have the ability to add a CHECK constraint that forbids
inserting unassigned code points into your database, which would be
useful if you're worried about forward-compatibility with collation
definitions that might be extended to cover those code points in the
future. Another application would be to find data already in your
database that has this potential problem.
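
As a sketch, assuming the thread ends up exposing a function along the lines
of unicode_assigned(text) returns boolean (name hypothetical), those two
applications might look like:

-- forbid unassigned code points in new data
ALTER TABLE docs ADD CONSTRAINT docs_body_assigned
    CHECK (unicode_assigned(body));
-- audit existing data for the same problem
SELECT id FROM docs WHERE NOT unicode_assigned(body);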

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: RFC: Logging plan of the running query

2023-10-10 Thread torikoshia

On 2023-10-04 03:00, James Coleman wrote:

and I think
what we need to do is explicitly disallow running this code any time
we are inside of lock acquisition code.


Updated patch to check if any locks have already been acquired by 
examining MyProc->heldLocks.


I'm not sure this change can "disallow running this code `any time` we
are inside of lock acquisition code", but as far as the select1.trace you
shared is concerned, I believe it can prevent running the explain code,
since MyProc->heldLocks must have been set in LockAcquireExtended() before
WaitOnLock():


```
 /*
  * Set bitmask of locks this process already holds on this object.
  */
 MyProc->heldLocks = proclock->holdMask;

..(snip)..

 WaitOnLock(locallock, owner);
```

On 2023-10-07 00:58, Andres Freund wrote:

How so? We shouldn't commonly acquire relevant locks while executing a 
query?
With a few exceptions, they should instead be acquired t the start of 
query
processing. We do acquire a lot of lwlocks, obviously, but we don't 
process

interrupts during the acquisition / holding of lwlocks.

And presumably the interrupt would just be processed the next time 
interrupt

processing is happening?


Thanks for your comments!

I tested the v30 patch with
v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, which makes
CFI() call ProcessLogQueryPlanInterrupt() internally, and confirmed that
very few plan-logging requests failed with the v30 patch.


This is a result in line with your prediction.


```
 $ rg -c'ignored request for logging query plan due to lock confilcts' 
postmaster.log

 8
```
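
For reference, the function the patch adds is invoked with the pid of the
target backend (signature as in the documentation hunk below):

```
SELECT pg_log_query_plan(12345);
```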

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group CorporationFrom aaaca1523ed5342c9f77d79963e0d256146381d2 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Tue, 10 Oct 2023 22:07:28 +0900
Subject: [PATCH v30] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages and which can
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar,
Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro
Horiguchi, Robert Treat, Alena Rybakina

Co-authored-by: James Coleman 
---
 doc/src/sgml/func.sgml   |  49 +
 src/backend/access/transam/xact.c|  13 ++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 177 ++-
 src/backend/executor/execMain.c  |  14 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   4 +
 src/backend/utils/error/elog.c   |   2 +
 src/backend/utils/init/globals.c |   2 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   4 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/lock.h   |   2 -
 src/include/storage/procsignal.h |   1 +
 src/include/tcop/pquery.h|   2 +-
 src/include/utils/elog.h |   1 +
 src/test/regress/expected/misc_functions.out |  54 --
 src/test/regress/sql/misc_functions.sql  |  41 +++--
 19 files changed, 360 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f1ad64c3d6..b7c7fa9169 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26308,6 +26308,25 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
+backend with specified process ID.
+It will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+
+  
+
   

 
@@ -26528,6 +26547,36 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-10 Thread Amit Kapila
On Tue, Oct 10, 2023 at 4:51 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> >
> > Isn't it sufficient to add a test for silent mode in
> > begin/stream_start/begin_prepare kind of APIs and set
> > ctx->did_process? In all other APIs, we can assert that did_process
> > shouldn't be set and we never reach there when decoding mode is
> > silent.
> >
> >
> > + /* Check whether the meaningful change was found */
> > + found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
> > + ctx->did_process);
> >
> > Are you talking about this check in the patch? If so, can you please
> > explain when does the first check help?
>
> I changed things around here, so let me describe it once again.
>
> A flag (output_skipped) is set when the transaction is decoded till the end in
> silent mode. It is done in DecodeTXNNeedSkip() because the function is the 
> common
> path for both committed/aborted transactions. Also, DecodeTXNNeedSkip() 
> returns
> true when the decoding context is in the silent mode. Therefore, any 
> cb_wrapper
> functions would not be called anymore. DecodingContextHasdecodedItems() just
> returns output_skipped.
>
> This approach needs to read WALs till end of transactions before returning the
> upgrading function, but codes look simpler than the previous version.
>

 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
Oid txn_dbid, RepOriginId origin_id)
 {
- return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
- (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
- ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+ bool need_skip;
+
+ need_skip = (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+ ctx->decoding_mode != DECODING_MODE_NORMAL ||
+ FilterByOrigin(ctx, origin_id));
+
+ /* Set a flag if we are in the slient mode */
+ if (ctx->decoding_mode == DECODING_MODE_SILENT)
+ ctx->output_skipped = true;
+
+ return need_skip;

I think you need to set the new flag only when we are not skipping the
transaction or in other words when we decide to process the
transaction. Otherwise, how will you distinguish the case where the
xact is already decoded and sent to client?

--
With Regards,
Amit Kapila




Re: PGDOCS - add more links in the pub/sub reference pages

2023-10-10 Thread Amit Kapila
On Tue, Oct 10, 2023 at 11:40 AM vignesh C  wrote:
>
> On Tue, 10 Oct 2023 at 08:47, Peter Smith  wrote:
> > PSA v3.
>
> Few more instances in other logical replication related pages:
> 1) Another instance was in alter_subscription.sgml:
>   Fetch missing table information from publisher.  This will start
>   replication of tables that were added to the subscribed-to publications
>   since CREATE SUBSCRIPTION or
>   the last invocation of REFRESH PUBLICATION.
>

Do we want each and every occurrence of the commands to have
corresponding links? I am not against it if we think that is useful
for users but asking as I am not aware of the general practice we
follow in this regard. Does anyone else have any opinion on this
matter?

-- 
With Regards,
Amit Kapila.




Re: Request for comment on setting binary format output per session

2023-10-10 Thread Dave Cramer
On Mon, 9 Oct 2023 at 17:11, Jelte Fennema  wrote:

> On Mon, 9 Oct 2023 at 21:08, Dave Cramer  wrote:
> > So if we use . would it be possible to have something like
>  which represents a set of well known types?
> > My goal here is to reduce the overhead of naming all the types the
> client  wants in binary. The list of well known types is pretty long.
> > Additionally we could have a shorthand for removing a well known type.
>
> You're only setting this once in the lifetime of the connection right,
>

Correct

> i.e. right at the start (although pgbouncer could set it once per
> transaction in the worst case). It seems like it shouldn't really
> matter much to optimize the size of the "SET format_binary=..."
> command, I'd expect it to be at most 1 kilobyte. I'm not super opposed
> to having a shorthand for some of the most commonly wanted built-in
> types, but then we'd need to decide on what those are, which would add
> even more discussion/bikeshedding to this thread. I'm not sure the win
> in size is worth that effort.
>
It's worth the effort if we use schema.typename, if we use oids then I'm
not that invested in this approach.

Dave


Re: Comparing two double values method

2023-10-10 Thread Bowen Shi
You're right, I made a mistake.

Thanks for your explanation.




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-10 Thread Hayato Kuroda (Fujitsu)
Dear Bharath,

Thanks for giving comments and apologize for late reply.
New version is available in [1].

> +1 for this approach. It looks neat.
> 
> I think we also need to add TAP tests to generate decodable WAL
> records (RUNNING_XACT, CHECKPOINT_ONLINE, XLOG_FPI_FOR_HINT,
> XLOG_SWITCH, XLOG_PARAMETER_CHANGE, XLOG_HEAP2_PRUNE) during
> pg_upgrade as described here
> https://www.postgresql.org/message-id/TYAPR01MB58660273EACEFC5BF256
> B133F50DA%40TYAPR01MB5866.jpnprd01.prod.outlook.com.
> Basically, these were the exceptional WAL records that may be
> generated by pg_upgrade, so having tests for them is good.

Hmm, I'm not sure that is really a good idea. If we add such a test, we may
have to add further tests in the future whenever new WAL record types can be
generated during upgrade. Currently we do not have an if-statement for each
WAL type, so it would not improve coverage, I think. Another concern is that
I'm not sure how we could simply and reliably generate XLOG_HEAP2_PRUNE.

Based on above, I did not add the test case for now.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866068CB6591C8AE1F9690BF5CDA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-10 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thanks for reviewing! You can available new version in [1].

> 
> Few comments:
> 1)  Should we add binary upgrade check "CHECK_IS_BINARY_UPGRADE" for
> this funcion too:
> +binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
> +{
> +   Namename = PG_GETARG_NAME(0);
> +   Nameplugin = PG_GETARG_NAME(1);
> +
> +   /* Temporary slots is never handled in this function */
> +   booltwo_phase = PG_GETARG_BOOL(2);

Yeah, it is needed. I had left it out for testing purposes, but it should
have been there. Added.

> 2) Generally we are specifying the slot name in this case, is slot
> name null check required:
> +Datum
> +binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
> +{
> +   Nameslot_name;
> +   XLogRecPtr  end_of_wal;
> +   LogicalDecodingContext *ctx = NULL;
> +   boolhas_record;
> +
> +   CHECK_IS_BINARY_UPGRADE;
> +
> +   /* Quick exit if the input is NULL */
> +   if (PG_ARGISNULL(0))
> +   PG_RETURN_BOOL(false);


A NULL check was added; I felt that we should raise an ERROR in this case.

> 3) Since this is similar to pg_create_logical_replication_slot, can we
> add a comment saying any change in pg_create_logical_replication_slot
> would also need the same check to be added in
> binary_upgrade_create_logical_replication_slot:
> +/*
> + * SQL function for creating a new logical replication slot.
> + *
> + * This function is almost same as pg_create_logical_replication_slot(), but
> + * this can specify the restart_lsn.
> + */
> +Datum
> +binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
> +{
> +   Namename = PG_GETARG_NAME(0);
> +   Nameplugin = PG_GETARG_NAME(1);
> +
> +   /* Temporary slots is never handled in this function */

Added.

> 4) Any conclusion on this try catch comment, do you want to add which
> setting you want to revert in catch, if try/catch is not required we
> can remove this comment:
> +   ReplicationSlotAcquire(NameStr(*slot_name), true);
> +
> +   /* XXX: Is PG_TRY/CATCH needed around here? */
> +
> +   /*
> +* We use silent mode here to decode all changes without
> outputting them,
> +* allowing us to detect all the records that could be sent 
> downstream.
> +*/

After considering it more, it's OK to raise an ERROR because the caller can
detect it. Also, there aren't any settings that need to be reverted. The
comment is removed.

> 5) I felt these 2 comments can be combined as both are trying to say
> the same thing:
> + * This is a special purpose function to ensure that there are no WAL records
> + * pending to be decoded after the given LSN.
> + *
> + * It is used to ensure that there is no pending WAL to be consumed for
> + * the logical slots.

Later part was removed.

> 6) I feel this memset is not required as we are initializing at the
> beginning of function, if you want to keep the memset, the
> initialization can be removed:
> +   values[2] = CStringGetTextDatum(xlogfilename);
> +
> +   memset(nulls, 0, sizeof(nulls));
> +
> +   tuple = heap_form_tuple(tupdesc, values, nulls);

The initialization was removed to follow pg_create_logical_replication_slot.

> 7) looks like a typo, "mu" should be "must":
> +   /*
> +* Also, we mu execute pg_log_standby_snapshot() when logical
> replication
> +* slots are migrated. Because RUNNING_XACTS record is
> required to create
> +* a consistent snapshot.
> +*/
> +   if (count_old_cluster_logical_slots())
> +   create_consistent_snapshot();

Fixed.

> 8) consitent should be consistent:
> +/*
> + * Log the details of the current snapshot to the WAL, allowing the snapshot
> + * state to be reconstructed for logical decoding on the upgraded slots.
> + */
> +static void
> +create_consistent_snapshot(void)
> +{
> +   DbInfo *old_db = &old_cluster.dbarr.dbs[0];
> +   PGconn *conn;
> +
> +   prep_status("Creating a consitent snapshot on new cluster");

Fixed.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866068CB6591C8AE1F9690BF5CDA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-10 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thank you for reviewing! PSA new version.

> > Internally, I added a new decoding mode - DECODING_MODE_SILENT - and
> used it.
> > If the decoding context is in the mode, the output plugin is not loaded, but
> > any WALs are decoded without skipping.
> >
> 
> I think it may be okay not to load the output plugin as we are not
> going to process any record in this case but is that the only reason
> or you have something else in mind as well?

My main concern was with skipping the output plugin options. Even for the
pgoutput plugin, some options like protocol_version, publications, etc. are
required while loading the plugin, and we cannot predict the requirements of
external plugins. Based on that, I thought output plugins should not be
loaded during this decoding.

> > Also, a new flag "did_process" is also
> > added. This flag is set if wrappers for output plugin callbacks are called 
> > during
> > the silent mode.
>
> Isn't it sufficient to add a test for silent mode in
> begin/stream_start/begin_prepare kind of APIs and set
> ctx->did_process? In all other APIs, we can assert that did_process
> shouldn't be set and we never reach there when decoding mode is
> silent.
>
> 
> + /* Check whether the meaningful change was found */
> + found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
> + ctx->did_process);
> 
> Are you talking about this check in the patch? If so, can you please
> explain when does the first check help?

I changed things around here, so let me describe it once again.

A flag (output_skipped) is set when the transaction is decoded till the end in
silent mode. It is done in DecodeTXNNeedSkip() because the function is the 
common
path for both committed/aborted transactions. Also, DecodeTXNNeedSkip() returns
true when the decoding context is in the silent mode. Therefore, any cb_wrapper
functions would not be called anymore. DecodingContextHasdecodedItems() just
returns output_skipped.

This approach needs to read WALs till end of transactions before returning the
upgrading function, but codes look simpler than the previous version.

> >
> > Based on that, I added another binary function
> binary_upgrade_create_logical_replication_slot().
> > This function is similar to pg_create_logical_replication_slot(), but the
> > restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> > filename is returned and it is passed to pg_resetwal command.
> >
> 
> I am not sure if it is a good idea that a
> binary_upgrade_create_logical_replication_slot() API does the logfile
> name calculation.
> 
> > One consideration is that pg_log_standby_snapshot() must be executed before
> > slots consuming changes. New cluster does not have RUNNING_XACTS records
> so that
> > decoding context on new cluster cannot be create a consistent snapshot 
> > as-is.
> > This may lead to discard changes during the upcoming consuming event. To
> > prevent it the function is called after the final pg_resetwal.
> >
> > How do you think?
> >
> 
> + /*
> + * Also, we mu execute pg_log_standby_snapshot() when logical replication
> + * slots are migrated. Because RUNNING_XACTS record is required to create
> + * a consistent snapshot.
> + */
> + if (count_old_cluster_logical_slots())
> + create_consistent_snapshot();
> 
> We shouldn't do this separately. Instead
> binary_upgrade_create_logical_replication_slot() should ensure that
> corresponding WAL is reserved similar to what we do in
> ReplicationSlotReserveWal() and then similarly invoke
> LogStandbySnapshot() to ensure that we have enough information to
> start.

I did not handle these parts because they needed more analysis. Let's discuss
in later versions.

> 
> Few minor comments:
> ==
> 1. The commit message and other comments like atop
> get_old_cluster_logical_slot_infos() needs to be adjusted as per
> recent changes.

I revisited comments and updated.

> 2.
> @@ -1268,7 +1346,11 @@ stream_start_cb_wrapper(ReorderBuffer *cache,
> ReorderBufferTXN *txn,
>   LogicalErrorCallbackState state;
>   ErrorContextCallback errcallback;
> 
> - Assert(!ctx->fast_forward);
> + /*
> + * In silent mode all the two-phase callbacks are not set so that the
> + * wrapper should not be called.
> + */
> + Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
> 
> This and other similar comments doesn't seems to be consistent as the
> function name and comments are not matching.

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


v47-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v47-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


Re: Retire has_multiple_baserels()

2023-10-10 Thread Richard Guo
On Tue, Oct 10, 2023 at 5:43 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> I used the following patch to double check that nothing was missed:
>
> ```
> --- a/src/backend/optimizer/path/allpaths.c
> +++ b/src/backend/optimizer/path/allpaths.c
> @@ -2207,8 +2207,13 @@ has_multiple_baserels(PlannerInfo *root)
> /* ignore RTEs that are "other rels" */
> if (brel->reloptkind == RELOPT_BASEREL)
> if (++num_base_rels > 1)
> +   {
> +
> Assert(bms_membership(root->all_baserels) == BMS_MULTIPLE);
> return true;
> +   }
> }
> +
> +   Assert(bms_membership(root->all_baserels) != BMS_MULTIPLE);
> return false;
>  }
> ```
>
> It wasn't. The patch LGTM.


Thanks for the verification.

Thanks
Richard


Re: Lowering the default wal_blocksize to 4K

2023-10-10 Thread Matthias van de Meent
On Tue, 10 Oct 2023 at 01:08, Andres Freund  wrote:
>
> Hi,
>
> I've mentioned this to a few people before, but forgot to start an actual
> thread. So here we go:
>
> I think we should lower the default wal_blocksize / XLOG_BLCKSZ to 4096, from
> the current 8192.

Seems like a good idea.

> It's IMO quite interesting that even at the higher client counts, the number
> of bytes written don't reach parity.
>
> It's fun to see how the total number of writes *decreases* at higher
> concurrency, because it becomes more likely that pages are filled completely.

With higher client counts and short transactions I think it is not too
unexpected to see commit_delay+commit_siblings configured. Did you
measure the impact of this change on such configurations?

> One thing I noticed is that our auto-configuration of wal_buffers leads to
> different wal_buffers settings for different XLOG_BLCKSZ, which doesn't seem
> great.

Hmm.

> Performing the same COPY workload (1024 files, split across N clients) for
> both settings shows no performance difference, but a very slight increase in
> total bytes written (about 0.25%, which is roughly what I'd expect).
>
> Personally I'd say the slight increase in WAL volume is more than outweighed
> by the increase in throughput and decrease in bytes written.

Agreed.

> There's an alternative approach we could take, which is to write in 4KB
> increments, while keeping 8KB pages. With the current format that's not
> obviously a bad idea. But given there aren't really advantages in 8KB WAL
> pages, it seems we should just go for 4KB?

It is not just the disk overhead of blocks, but we also maintain some
other data (currently in the form of XLogRecPtrs) in memory for each
WAL buffer, the overhead of which will also increase when we increase
the number of XLog pages per MB of WAL that we cache.
Additionally, highly concurrent workloads with transactions that write
a high multiple of XLOG_BLCKSZ bytes to WAL may start to see increased
overhead due to the .25% additional WAL getting written and a doubling
of the number of XLog pages being touched (both initialization and the
smaller memcpy for records that would now cross an extra page
boundary).

However, for all of these issues I doubt that they actually matter
much in the grand scheme of things, so I definitely wouldn't mind
moving to 4KiB XLog pages.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Comparing two double values method

2023-10-10 Thread Heikki Linnakangas

On 10/10/2023 13:31, Bowen Shi wrote:

Dears,

I noticed that in the `check_GUC_init` function, there is a direct
comparison using the != operator for two double values, which seems
problematic.

I wrote this patch to fix this.


No, the compile-time initial values should match exactly.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Comparing two double values method

2023-10-10 Thread Matthias van de Meent
On Tue, 10 Oct 2023 at 12:33, Bowen Shi  wrote:
>
> Dears,
>
> I noticed that in the `check_GUC_init` function, there is a direct
> comparison using the != operator for two double values, which seems
> problematic.

I don't think I understand the problem. The code checks that the
dynamic initialization values are equal to the current value of the
GUC, or 0. Why would a "margin for error" of 1e-6 be of any use?
Why was the margin of 1e-6 chosen instead of one based on the exponent
of the GUC's current value (if any)?

In my view, this would break the code, not fix it, as it would
decrease the cases where we detect broken GUC registrations.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: pg_resetwal: Corrections around -c option

2023-10-10 Thread Alvaro Herrera
On 2023-Oct-10, Peter Eisentraut wrote:

> On 09.10.23 17:48, Alvaro Herrera wrote:
> > Hmm, not sure about this.  [...]
> 
> Would those issues also apply to the other SLRU-based guides on this man
> page?  Are they all a bit wrong?

I didn't verify, but I think it's likely that they do and they are.

> > Not sure how to write this concisely, though.  Should we really try?
> 
> Maybe not.  But the documentation currently suggests you can try (probably
> somewhat copy-and-pasted).

I bet this has not been thoroughly verified.

Perhaps we should have (somewhere outside the option list) a separate
paragraph that explains how to determine the safest maximum value to
use, and list only the multiplier together with each option.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Check each of base restriction clauses for constant-FALSE-or-NULL

2023-10-10 Thread Richard Guo
On Tue, Oct 10, 2023 at 5:10 PM David Rowley  wrote:

> On Sat, 7 Oct 2023 at 22:44, Richard Guo  wrote:
> >
> > In relation_excluded_by_constraints() when we're trying to figure out
> > whether the relation need not be scanned, one of the checks we do is to
> > detect constant-FALSE-or-NULL restriction clauses.  Currently we perform
> > this check only when there is exactly one baserestrictinfo entry, and
> > the comment explains this as below.
> >
> >  * Regardless of the setting of constraint_exclusion, detect
> >  * constant-FALSE-or-NULL restriction clauses.  Because const-folding
> will
> >  * reduce "anything AND FALSE" to just "FALSE", any such case should
> >  * result in exactly one baserestrictinfo entry.
>
> Coincidentally (?), I saw the same thing just a few weeks ago while
> working on [1].  I made the exact same adjustment to the code in
> relation_excluded_by_constraints() as you have.


Haha, I noticed the need of this change while writing v5 patch [1] for
that same thread.  That patch generates a new constant-FALSE
RestrictInfo for an IS NULL qual that can be reduced to FALSE, and this
makes the comment in relation_excluded_by_constraints() about 'any such
case should result in exactly one baserestrictinfo entry' not true any
more.  Without this change in relation_excluded_by_constraints(), a
query like below would not be able to be marked as dummy.

select * from t where a is null and 'otherquals';

And then the regression test diff after applying this change reminds me
that equivclass.c may also generate new constant-FALSE RestrictInfos on
the fly, so it seems to me that this change may benefit some queries
even without the 'reduce-NullTest' patch.
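
For instance (hypothetical table t with integer column a), two conflicting
equalities make equivclass.c emit a constant-FALSE clause alongside the other
derived quals, so baserestrictinfo holds more than one entry and the old
single-entry check misses it:

EXPLAIN SELECT * FROM t WHERE a = 1 AND a = 2;
--  Result  (cost=0.00..0.00 rows=0 width=...)
--    One-Time Filter: false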


> I wasn't really expecting the baserestrictinfo list to be excessively
> long, and if it ever was, I think looking at things like selectivity
> estimations would by far drown out looping over the entire list in
> relation_excluded_by_constraints() rather than just looking at the
> first item in the list.


Agreed.


> After making the change, I saw the same regression test change as you
> did, but didn't really feel like it was worth tackling separately from
> the patch that we were working on.


I was thinking that this change may be worthwhile by itself even without
the 'reduce-NullTest' patch, because it can benefit some cases, such as
where EC generates constant-FALSE on the fly.  So maybe it's worth a
separate patch?  I'm not quite sure.

[1]
https://www.postgresql.org/message-id/CAMbWs4-eNVNTNc94eF%2BO_UwHYKv43vyMurhcdqMV%3DHt5fehcOg%40mail.gmail.com

Thanks
Richard


Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-10-10 Thread vignesh C
On Mon, 9 Oct 2023 at 16:20, David Rowley  wrote:
>
> On Mon, 9 Oct 2023 at 21:17, David Rowley  wrote:
> > Here are some more thoughts on how we could improve this:
> >
> > 1. Adjust the definition of StringInfoData.maxlen to define that -1
> > means the StringInfoData's buffer is externally managed.
> > 2. Adjust enlargeStringInfo() to add a check for maxlen = -1 and have
> > it palloc, say, pg_next_pow2(str->len * 2) bytes and memcpy the
> > existing (externally managed string) into the newly palloc'd buffer.
> > 3. Add a new function along the lines of what I originally proposed to
> > allow init of a StringInfoData using an existing allocated string
> > which sets maxlen = -1.
> > 4. Update all the existing places, including the ones I just committed
> > (plus the ones you committed in ba1e066e4) to make use of the function
> > added in #3.
>
> I just spent the past few hours playing around with the attached WIP
> patch to try to clean up the various places where we manually build
> StringInfoDatas around the tree.
>
> While working on this, I added an Assert in the new
> initStringInfoFromStringWithLen function to ensure that data[len] ==
> '\0' per the "There is guaranteed to be a terminating '\0' at
> data[len]" comment in stringinfo.h.  It looks like we have some
> existing breakers of this rule.
>
> If you apply the attached patch to 608fd198de~1 and ignore the
> rejected hunks from the deserial functions, you'll see an Assert
> failure during 023_twophase_stream.pl
>
> 023_twophase_stream_subscriber.log indicates:
> TRAP: failed Assert("data[len] == '\0'"), File:
> "../../../../src/include/lib/stringinfo.h", Line: 97, PID: 1073141
> postgres: subscriber: logical replication parallel apply worker for
> subscription 16396 (ExceptionalCondition+0x70)[0x56160451e9d0]
> postgres: subscriber: logical replication parallel apply worker for
> subscription 16396 (ParallelApplyWorkerMain+0x53c)[0x5616043618cc]
> postgres: subscriber: logical replication parallel apply worker for
> subscription 16396 (StartBackgroundWorker+0x20b)[0x56160434452b]
>
> So it seems like we have some existing issues with
> LogicalParallelApplyLoop(). The code there does not properly NUL
> terminate the StringInfoData.data field. There are some examples in
> exec_bind_message() of how that could be fixed. I've CC'd Amit to let
> him know about this.

Thanks for reporting this issue, I was able to reproduce this issue
with the steps provided. I will analyze further and start a new thread
to provide the details of the same.

Regards,
Vignesh




Comparing two double values method

2023-10-10 Thread Bowen Shi
Dears,

I noticed that in the `check_GUC_init` function, there is a direct
comparison using the != operator for two double values, which seems
problematic.

I wrote this patch to fix this.

--
Regard,
Bowen Shi


v1-0001-Fix-double-value-compare-problem.patch
Description: Binary data


Re: UUID v7

2023-10-10 Thread Brad Peabody
> > Well, as far as I know, RFC discourages extracting timestamps from UUIDs. 
> > But we still can have such functions...maybe as an extension?
> Do you know of any reason for that?

I guess some of the detail may have been edited out over time with all of the 
changes, but it’s basically this: 
https://github.com/ietf-wg-uuidrev/rfc4122bis/blob/main/draft-ietf-uuidrev-rfc4122bis.md#opacity-opacity.
 The rationale is that when you introspect a UUID you essentially add 
interoperability concerns.  E.g. if we say that applications can rely on being 
able to parse the timestamp from the UUID then it means that other 
implementations must provide guarantees about what that timestamp is.  And 
since the point of a UUID is to provide a unique value, not to transmit 
additional metadata, the decision was made early on that it’s more realistic 
and representative of the reality of the situation to say that applications 
should generate values, try not to parse them if they don’t have to, but if 
they do it’s only going to be as accurate as the original data put into it.  So 
systems with no NTP enabled, or that fuzz part of the time so as not to leak 
the exact moment in time something was done, etc - those are things that are 
going to happen and so buyer beware when parsing.

If the question is whether or not a function should exist to parse a timestamp 
from a UUID, I would say sure, go ahead; just mention that the timestamp is only 
as accurate as the input, and the spec doesn't guarantee anything if your UUID 
came from another source. I imagine a common case would be UUIDs generated 
within the same database, where someone wants to extract the timestamp, which 
would be as reliable as the clock on the database machine - that seems like a 
perfectly good case where supporting timestamp extraction has practical value.
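
As a rough sketch of what extraction could look like in SQL, assuming the v7
layout where the first 48 bits carry Unix-epoch milliseconds (the UUID value
below is invented):

SELECT to_timestamp(
         ('x' || encode(substring(uuid_send(u) FROM 1 FOR 6), 'hex'))::bit(48)::bigint
         / 1000.0) AS uuid_time
FROM (VALUES ('018b1d14-9d88-7f5b-8a3c-123456789abc'::uuid)) AS v(u);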


> On Oct 9, 2023, at 11:11 AM, Jelte Fennema  wrote:
> 
> On Mon, 9 Oct 2023 at 18:46, Nick Babadzhanian  wrote:
>> 
>> On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin  wrote:
>>> Well, as far as I know, RFC discourages extracting timestamps from UUIDs. 
>>> But we still can have such functions...maybe as an extension?
>> 
>> Do you know of any reason for that?
> 
> No reasons are given but the RFC states this:
> 
>> UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT 
>> examine the bits in a UUID to whatever extent is possible. However, where 
>> necessary, inspectors should refer to Section 4 for more information on 
>> determining UUID version and variant.
> 
>>> However, so far I haven't figured out how to implement optional arguments 
>>> for catalog functions. I'd appreciate any pointers here.
>> 
>> I'd argue that the time argument shouldn't be optional. Asking the
>> user to supply time would force them to think whether they want to go
>> with `now()` or `clock_timestamp()` or something else.
> 
> I think using `now()` is quite prone to sequence rollover. With the
> current patch inserting more than 2^18~=0.26M rows into a table with
> `gen_uuid_v7()` as the default in a single transaction would already
> cause sequence rollover. I think using a monotonic clock source is the
> only reasonable thing to do. From the RFC:
> 
>> Implementations SHOULD use the current timestamp from a reliable source to 
>> provide values that are time-ordered and continually increasing. Care SHOULD 
>> be taken to ensure that timestamp changes from the environment or operating 
>> system are handled in a way that is consistent with implementation 
>> requirements. For example, if it is possible for the system clock to move 
>> backward due to either manual adjustment or corrections from a time 
>> synchronization protocol, implementations must decide how to handle such 
>> cases. (See Altering, Fuzzing, or Smearing bullet below.)



Re: [PoC] run SQL over ciphertext

2023-10-10 Thread Giampaolo Capelli
Hello,
I think this is a very interesting topic, especially for European companies
where data sovereignty in the cloud has become critical.

If I understand correctly, the idea is to split users into 'client users'
who can see data unencrypted, and 'server users', who are administrators
unable to decrypt data.

A few questions:
- how are secrets managed? Do you use a sort of vault to keep encryption
keys? Is there a master key to encrypt session keys?
- what about performance? Is it possible to use indexes on encrypted
columns?


Hi all,
>
> We have developed an extension, allowing PostgreSQL to run queries over
> encrypted data. This functionality is achieved via user-defined functions
> that extend encrypted data types and support commonly used expression
> operations. Our tests validated its effectiveness with TPC-C and TPC-H
> benchmarks. You may find the code here: https://github.com/SJTU-IPADS/HEDB
> .
>
> This PoC is a reimplementation fork while collaborating with a cloud
> database company; the aim is to enable their DBAs to manage databases
> without the risk of data leaks, *meeting the requirements of laws such as
> GDPR.*
>
> I am wondering if anyone thinks this is a nice feature. If so, I am
> curious about the steps to further it mature and potentially have it
> incorporated as a part of PostgreSQL contrib.
>
> Best regards,
> Mingyu Li
>


--
best regards
Giampaolo Capelli


Re: Advice about preloaded libraries

2023-10-10 Thread Aleksander Alekseev
Hi,

> MobilityDB
> https://github.com/MobilityDB/MobilityDB
> is a PostgreSQL extension that depends on PosGIS.
>
> Bradford Boyle who has been working on packaging MobilityDB
> https://www.postgresql.org/message-id/capqrbe716d3gpd0jdbafab72elajrppg1luzvobelnbgl3r...@mail.gmail.com
> highlighted the issue of which of the GUC shared_preload_libraries vs 
> local_preload_libraries vs session_preload_libraries should be used to load 
> the postgis-3 library.
>
> Our understanding of the information in the manual
> https://www.postgresql.org/docs/16/runtime-config-client.html#GUC-SESSION-PRELOAD-LIBRARIES
> does not give us a clear-cut answer for this question. We are looking for 
> advice on which of the three options mentioned above should be used.
>
> MobilityDB requires loading PostGIS before any MobilityDB query can be 
> issued. For example, commenting out the following line
> #shared_preload_libraries = 'postgis-3'
> in the postgresql.conf shows the following
>
> $ psql test
> psql (15.3)
> Type "help" for help.
>
> test=# select tgeompoint 'Point(1 1)@2000-01-01';
> 2023-10-03 16:41:25.980 CEST [8683] ERROR:  could not load library 
> "/usr/local/pgsql/15/lib/libMobilityDB-1.1.so": 
> /usr/local/pgsql/15/lib/libMobilityDB-1.1.so: undefined symbol: ST_Intersects 
> at character 19
> 2023-10-03 16:41:25.980 CEST [8683] STATEMENT:  select tgeompoint 'Point(1 
> 1)@2000-01-01';
> ERROR:  could not load library 
> "/usr/local/pgsql/15/lib/libMobilityDB-1.1.so": 
> /usr/local/pgsql/15/lib/libMobilityDB-1.1.so: undefined symbol: ST_Intersects
> LINE 1: select tgeompoint 'Point(1 1)@2000-01-01';
>   ^
> test=# select st_point(1,1);
>   st_point
> 
>  010100F03FF03F
> (1 row)
>
> test=# select tgeompoint 'Point(1 1)@2000-01-01';
> tgeompoint
> ---
>  010100F03FF03F@2000-01-01 00:00:00+01
> (1 row)
>
> test=#
> 
>
> As can be seen above, it is not REALLY mandatory to have 
> shared_preload_libraries = 'postgis-3' but then the user is responsible for 
> issuing a query to load PostGIS (select st_point(1,1); above) and then she is 
> able to execute MobilityDB queries.
>
> Thanks for your advice.

I read the email several times but I'm still not sure I understand
what in particular you are asking.

From what I can tell, the goal is to package a third-party project,
MobilityDB in this case. This project should have clear instructions
on how to install it. If not, consider reporting the issue to the
developers. If they can't provide good documentation maybe the project
is not mature enough to invest your time into it yet.

If you find the observed behavior of PostgreSQL confusing, that's
understandable, but this behavior is the expected one. Typically
PostgreSQL loads an extension into the backend when needed. This is what
happens when a user calls `select st_point(1,1);`. You can load an
extension into the postmaster by using shared_preload_libraries. This is
typically used when an extension needs to acquire locks and shared
memory when the DBMS starts. To my knowledge PostGIS doesn't use any of
this, although I'm not an expert in PostGIS.
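
If restarting the server for shared_preload_libraries is undesirable,
one option worth testing (a sketch; I haven't verified it against
MobilityDB) is to preload the library per database:

ALTER DATABASE test SET session_preload_libraries = 'postgis-3';

New sessions connecting to that database then load postgis-3
automatically, without editing postgresql.conf.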

All PostgreSQL GUCs are well documented. You can find more details here [1].

Hopefully I answered the right question. If not, please be a bit more specific.

[1]: https://www.postgresql.org/docs/current/runtime-config-client.html

-- 
Best regards,
Aleksander Alekseev




Re: Retire has_multiple_baserels()

2023-10-10 Thread Aleksander Alekseev
Hi,

> The function has_multiple_baserels() is used in set_subquery_pathlist()
> to check whether there is more than one base rel, by looping through
> simple_rel_array[].  I think one simpler way to do that is to leverage
> root->all_baserels by
>
> bms_membership(root->all_baserels) == BMS_MULTIPLE
>
> all_baserels is computed in deconstruct_jointree (v16) or in
> make_one_rel (v15 and earlier), both of which run before we generate access
> paths for subquery RTEs, and it contains all base rels (but not "other" rels).
> So it should be a suitable replacement.  I doubt that there would be any
> measurable performance gains.  So please consider it cosmetic.
>
> I've attached a patch to do that.  Any thoughts?

I used the following patch to double check that nothing was missed:

```
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2207,8 +2207,13 @@ has_multiple_baserels(PlannerInfo *root)
 		/* ignore RTEs that are "other rels" */
 		if (brel->reloptkind == RELOPT_BASEREL)
 			if (++num_base_rels > 1)
+			{
+				Assert(bms_membership(root->all_baserels) == BMS_MULTIPLE);
 				return true;
+			}
 	}
+
+	Assert(bms_membership(root->all_baserels) != BMS_MULTIPLE);
 	return false;
 }
```

Indeed, nothing was missed. The patch LGTM.

-- 
Best regards,
Aleksander Alekseev




Re: Add const to values and nulls arguments

2023-10-10 Thread Aleksander Alekseev
Hi,

> >> The 0002 patch, which I'm not proposing to commit at this time, makes
> >> similar changes but in a way that breaks the table and index AM APIs.
> >> So I'm just including that here in case anyone wonders, why didn't you
> >> touch those.  And also maybe if we ever change that API incompatibly we
> >> could throw this one in then.
> >
> > 0001 looks good to me and passes the tests in several environments,
> > not surprisingly. Let's merge it.
>
> done

Great. FWIW changing the index AM API in this particular aspect
doesn't strike me as such a terrible idea.

-- 
Best regards,
Aleksander Alekseev




Re: Check each of base restriction clauses for constant-FALSE-or-NULL

2023-10-10 Thread David Rowley
On Sat, 7 Oct 2023 at 22:44, Richard Guo  wrote:
>
> In relation_excluded_by_constraints() when we're trying to figure out
> whether the relation need not be scanned, one of the checks we do is to
> detect constant-FALSE-or-NULL restriction clauses.  Currently we perform
> this check only when there is exactly one baserestrictinfo entry, and
> the comment explains this as below.
>
>  * Regardless of the setting of constraint_exclusion, detect
>  * constant-FALSE-or-NULL restriction clauses.  Because const-folding will
>  * reduce "anything AND FALSE" to just "FALSE", any such case should
>  * result in exactly one baserestrictinfo entry.

Coincidentally (?), I saw the same thing just a few weeks ago while
working on [1].  I made the exact same adjustment to the code in
relation_excluded_by_constraints() as you have.

I wasn't really expecting the baserestrictinfo list to be excessively
long, and even if it ever were, I think the cost of things like
selectivity estimation would by far drown out the cost of looping over
the entire list in relation_excluded_by_constraints() rather than just
looking at the first item in the list.

After making the change, I saw the same regression test change as you
did, but didn't really feel like it was worth tackling separately from
the patch that we were working on.
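
For anyone following along without the initial email at hand, here is a
minimal sketch of the kind of clause list that escapes the single-entry
check (a hypothetical example, since const-folding cannot reduce
"x AND NULL"):

create table t (a int);
-- baserestrictinfo gets two entries, "a > 1" and a constant NULL, so a
-- check that only inspects single-entry lists misses the NULL clause
explain (costs off) select * from t where a > 1 and null;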

David

[1] https://postgr.es/m/CAApHDvpkfS1hY3P4DWbOw6WCgRrja=ydloez+5g+e2z19up...@mail.gmail.com




Fwd: Advice about preloaded libraries

2023-10-10 Thread Esteban Zimanyi
MobilityDB
https://github.com/MobilityDB/MobilityDB
is a PostgreSQL extension that depends on PostGIS.

Bradford Boyle who has been working on packaging MobilityDB
https://www.postgresql.org/message-id/capqrbe716d3gpd0jdbafab72elajrppg1luzvobelnbgl3r...@mail.gmail.com
highlighted the issue of which of the GUCs shared_preload_libraries vs
local_preload_libraries vs session_preload_libraries should be used to load
the postgis-3 library.

Our understanding of the information in the manual
https://www.postgresql.org/docs/16/runtime-config-client.html#GUC-SESSION-PRELOAD-LIBRARIES
does not give us a clear-cut answer for this question. We are looking for
advice on which of the three options mentioned above should be used.

MobilityDB requires loading PostGIS before any MobilityDB query can be
issued. For example, commenting out the following line
#shared_preload_libraries = 'postgis-3'
in the postgresql.conf shows the following

$ psql test
psql (15.3)
Type "help" for help.

test=# select tgeompoint 'Point(1 1)@2000-01-01';
2023-10-03 16:41:25.980 CEST [8683] ERROR:  could not load library
"/usr/local/pgsql/15/lib/libMobilityDB-1.1.so": /usr/local/pgsql/15/lib/
libMobilityDB-1.1.so: undefined symbol: ST_Intersects at character 19
2023-10-03 16:41:25.980 CEST [8683] STATEMENT:  select tgeompoint 'Point(1
1)@2000-01-01';
ERROR:  could not load library "/usr/local/pgsql/15/lib/libMobilityDB-1.1.so":
/usr/local/pgsql/15/lib/libMobilityDB-1.1.so: undefined symbol:
ST_Intersects
LINE 1: select tgeompoint 'Point(1 1)@2000-01-01';
  ^
test=# select st_point(1,1);
  st_point

 010100F03FF03F
(1 row)

test=# select tgeompoint 'Point(1 1)@2000-01-01';
tgeompoint
---
 010100F03FF03F@2000-01-01 00:00:00+01
(1 row)

test=#


As can be seen above, it is not REALLY mandatory to have
shared_preload_libraries = 'postgis-3' but then the user is responsible for
issuing a query to load PostGIS (select st_point(1,1); above) and then she
is able to execute MobilityDB queries.

Thanks for your advice.


Re: Use virtual tuple slot for Unique node

2023-10-10 Thread David Rowley
On Wed, 27 Sept 2023 at 20:01, David Rowley  wrote:
>
> On Sat, 23 Sept 2023 at 03:15, Heikki Linnakangas  wrote:
> > So not a win in this case. Could you peek at the outer slot type, and
> > use the same kind of slot for the Unique's result? Or some more
> > complicated logic, like use a virtual slot if all the values are
> > pass-by-val? I'd also like to keep this simple, though...
> >
> > Would this kind of optimization make sense elsewhere?
>
> There are a few usages of ExecGetResultSlotOps(), e.g. ExecInitHashJoin().
>
> If I adjust the patch to:
>
> -   ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsMinimalTuple);
> +   ExecInitResultTupleSlotTL(&uniquestate->ps,
> +                             ExecGetResultSlotOps(outerPlanState(uniquestate),
> +                                                  NULL));

Just to keep this from going cold, here's that in patch form for
anyone who wants to test.

I spent a bit more time running some more benchmarks and I don't see
any workload where it slows things down.  I'd be happy if someone else
had a go at finding a regression.

David
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 01f951197c..be585e284b 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -115,6 +115,7 @@ UniqueState *
 ExecInitUnique(Unique *node, EState *estate, int eflags)
 {
 	UniqueState *uniquestate;
+	const TupleTableSlotOps *ops;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -137,11 +138,14 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
 	 */
 	outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
 
+	/* initialize result slot and type */
+	ops = ExecGetResultSlotOps(outerPlanState(uniquestate), NULL);
+	ExecInitResultTupleSlotTL(&uniquestate->ps, ops);
+
 	/*
-	 * Initialize result slot and type. Unique nodes do no projections, so
-	 * initialize projection info for this node appropriately.
+	 * Unique nodes do no projections, so initialize projection info for this
+	 * node appropriately.
 	 */
-	ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsMinimalTuple);
 	uniquestate->ps.ps_ProjInfo = NULL;
 
 	/*
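
For anyone who wants to have a go at finding a regression, a workload
along these lines exercises the Sort + Unique path (a sketch; the table
and settings are illustrative, not taken from my benchmarks):

create table uniqbench as
  select g % 1000 as a, g % 100 as b from generate_series(1, 1000000) g;
vacuum analyze uniqbench;
set enable_hashagg = off;  -- force a Unique node above a Sort
explain analyze select distinct a, b from uniqbench;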


Re: Check each of base restriction clauses for constant-FALSE-or-NULL

2023-10-10 Thread Richard Guo
On Tue, Oct 10, 2023 at 1:45 PM Ashutosh Bapat wrote:

> On Tue, Oct 10, 2023 at 11:09 AM Richard Guo 
> wrote:
> > Hm, I don't think so.  get_gating_quals is called in createplan.c, where
> > we've selected the best path, while the optimization with my code
> > happens much earlier, when we set size estimates for a base relation.
> > Neither of these two is a duplicate of the other.  I think the theory
> > here is that it's always a win to mark a rel as dummy if possible as
> > early as we can.
>
> Right. Do you have an example where this could be demonstrated?


Hmm, do you think the two examples in the initial email of this thread
can serve the purpose, by observing how we avoid building access paths
for the dummy rel with this optimization?

Thanks
Richard


Re: Problem, partition pruning for prepared statement with IS NULL clause.

2023-10-10 Thread Sergei Glukhov

Hi David,


On 10/9/23 03:26, David Rowley wrote:
> On Sat, 7 Oct 2023 at 03:11, Sergei Glukhov  wrote:
>> I noticed that the combination of a prepared statement with a generic plan
>> and an 'IS NULL' clause could cause partition pruning to crash.
>> Test case:
>> --
>> set plan_cache_mode to force_generic_plan;
>> prepare stmt AS select * from hp where a is null and b = $1;
>> explain execute stmt('xxx');
>
> Thanks for the detailed report and proposed patch.
>
> I think your proposed fix isn't quite correct.  I think the problem
> lies in InitPartitionPruneContext() where we assume that the list
> positions of step->exprs are in sync with the keyno.  If you look at
> perform_pruning_base_step() the code there makes a special effort to
> skip over any keyno when a bit is set in opstep->nullkeys.
>
> It seems that your patch is adjusting the keyno that's given to the
> PruneCxtStateIdx() and it looks like (for your test case) it'll end up
> passing keyno==0 when it should be passing keyno==1.  keyno is the
> index of the partition key, so you can't pass 0 when it's for key
> index 1.
>
> I wonder if it's worth expanding the tests further to cover more of
> the pruning cases to cover run-time pruning too.


Thanks for the explanation. For some reason I thought that the
'exprstates' array doesn't contain elements related to the 'IS NULL'
clause. Now I see that they are there, just empty and untouched.

I verified the patch and it fixes the problem.

Regarding the test case: besides the current test case and a test for
dynamic partition pruning, say,

select a, (select b from hp where a is null and b = a.b) AS b from hp a
where a = 1 and b = 'xxx';

I would like to suggest slightly refactoring the 'Test Partition pruning
for HASH partitioning' test from 'partition_prune.sql' and adding one
more key field. The reason is that a two-element key is not enough for
thorough testing, since it tests mostly corner cases. Let me know if
it's worth doing.

Example:
--
create table hp (a int, b text, c int, d int)
  partition by hash (a part_test_int4_ops, b part_test_text_ops, c 
part_test_int4_ops);

create table hp0 partition of hp for values with (modulus 4, remainder 0);
create table hp3 partition of hp for values with (modulus 4, remainder 3);
create table hp1 partition of hp for values with (modulus 4, remainder 1);
create table hp2 partition of hp for values with (modulus 4, remainder 2);

insert into hp values (null, null, null, 0);
insert into hp values (1, null, 1, 1);
insert into hp values (1, 'xxx', 1, 2);
insert into hp values (null, 'xxx', null, 3);
insert into hp values (2, 'xxx', 2, 4);
insert into hp values (1, 'abcde', 1, 5);
--

Another crash, in a different place, even with the fix:
--
explain select * from hp where a = 1 and b is null and c = 1;
--


Regards,
Gluh





Retire has_multiple_baserels()

2023-10-10 Thread Richard Guo
The function has_multiple_baserels() is used in set_subquery_pathlist()
to check whether there is more than one base rel, by looping through
simple_rel_array[].  I think one simpler way to do that is to leverage
root->all_baserels by

bms_membership(root->all_baserels) == BMS_MULTIPLE

all_baserels is computed in deconstruct_jointree (v16) or in
make_one_rel (v15 and earlier), both of which run before we generate access
paths for subquery RTEs, and it contains all base rels (but not "other" rels).
So it should be a suitable replacement.  I doubt that there would be any
measurable performance gains.  So please consider it cosmetic.

I've attached a patch to do that.  Any thoughts?

Thanks
Richard


v1-0001-Retire-has_multiple_baserels.patch
Description: Binary data


Re: Clean up some pg_dump tests

2023-10-10 Thread Peter Eisentraut

On 09.10.23 11:20, Alvaro Herrera wrote:
> I tried this out.  I agree it's a good change.  BTW, this made me
> realize that "unlike" is not a good name: maybe it should be called
> "except".

right

> I would add quotes to the words "like" and "unlike" there.  Otherwise,
> these sentences are hard to parse.  Also, some commentary on what this
> is about seems warranted: maybe "Check that this test properly defines
> which dumps the output should match on." or similar.

Done.

I also moved the code a bit earlier, before the checks for supported
compression libraries etc., so it runs even if those cause a skip.

> I didn't like using diag(), because automated runs will not alert to any
> problems.  Now maybe that's not critical, but I fear that people would
> not notice problems if they are just noise in the output.  Let's make
> them test errors.  fail() seems good enough: with the lines I quote
> above and omitting the test corrections, I get this, which seems good
> enough:

After researching this a bit more, I think "die" is the convention for
problems in the test definitions themselves.  (Otherwise, you're writing
a test about the tests, which would be a bit weird.)  The result is
approximately the same.
From ec4380986519f966303c14ea49223f0cf7729220 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 10 Oct 2023 09:54:43 +0200
Subject: [PATCH v2] Clean up some pg_dump tests

1) Remove useless entries from "unlike" lists.  Runs that are not
   listed in "like" don't need to be excluded in "unlike".

2) Ensure there is always a "like" list, even if it is empty.  This
   makes the test more self-documenting.

3) Use predefined lists such as %full_runs where appropriate, instead
   of listing all runs separately.

Also add code that checks 1 and 2 automatically and dies with an error
for violations.

Discussion: https://www.postgresql.org/message-id/flat/1f8cb371-e84e-434e-0367-6b716fb16...@eisentraut.org
---
 src/bin/pg_dump/t/002_pg_dump.pl | 78 +---
 1 file changed, 21 insertions(+), 57 deletions(-)

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 326c9a7639..eb3ec534b4 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -818,7 +818,7 @@
regexp => qr/^\QALTER COLLATION public.test0 OWNER TO \E.+;/m,
collation => 1,
like => { %full_runs, section_pre_data => 1, },
-   unlike => { %dump_test_schema_runs, no_owner => 1, },
+   unlike => { no_owner => 1, },
},
 
'ALTER FOREIGN DATA WRAPPER dummy OWNER TO' => {
@@ -977,7 +977,7 @@
create_sql =>
  'ALTER SCHEMA public OWNER TO "regress_quoted  \"" role";',
regexp => qr/^(GRANT|REVOKE)/m,
-   unlike => { defaults_public_owner => 1 },
+   like => {},
},
 
'ALTER SEQUENCE test_table_col1_seq' => {
@@ -1285,9 +1285,7 @@
 	  { %full_runs, %dump_test_schema_runs, section_pre_data => 1, },
unlike => {
exclude_dump_test_schema => 1,
-   only_dump_test_table => 1,
no_owner => 1,
-   role => 1,
only_dump_measurement => 1,
},
},
@@ -1351,7 +1349,6 @@
binary_upgrade => 1,
no_large_objects => 1,
schema_only => 1,
-   section_pre_data => 1,
},
},
 
@@ -3210,7 +3207,6 @@
binary_upgrade => 1,
exclude_dump_test_schema => 1,
schema_only => 1,
-   only_dump_measurement => 1,
},
},
 
@@ -3457,7 +3453,6 @@
'Disabled trigger on partition is not created' => {
 		regexp => qr/CREATE TRIGGER test_trigger.*ON dump_test_second_schema/,
like => {},
-   unlike => { %full_runs, %dump_test_schema_runs },
},
 
# Triggers on partitions should not be dropped individually
@@ -3834,35 +3829,12 @@
 		\QCREATE INDEX measurement_city_id_logdate_idx ON ONLY dump_test.measurement USING\E
/xm,
like => {
-   binary_upgrade => 1,
-   clean => 1,
-   clean_if_exists => 1,
-   compression => 1,
-   createdb => 1,
-   defaults => 1,
-   exclude_test_table => 1,
-   exclude_test_table_data => 1,
-   no_toast_compression => 1,
-   no_large_objects => 1,
-   no_privs => 1,
-   no_owner => 1,
-   no_table_access_method => 1,
-   only_dump_test_schema => 1,
- 

Re: Problem, partition pruning for prepared statement with IS NULL clause.

2023-10-10 Thread tender wang
The comment /* not needed for Consts */ might be better placed close to
if (!IsA(expr, Const)).
Others look good to me.

David Rowley wrote on Monday, Oct 9, 2023 at 07:28:

> On Sat, 7 Oct 2023 at 03:11, Sergei Glukhov 
> wrote:
> > I noticed that the combination of a prepared statement with a generic plan
> > and an 'IS NULL' clause could cause partition pruning to crash.
>
> > Test case:
> > --
> > set plan_cache_mode to force_generic_plan;
> > prepare stmt AS select * from hp where a is null and b = $1;
> > explain execute stmt('xxx');
>
> Thanks for the detailed report and proposed patch.
>
> I think your proposed fix isn't quite correct.  I think the problem
> lies in InitPartitionPruneContext() where we assume that the list
> positions of step->exprs are in sync with the keyno.  If you look at
> perform_pruning_base_step() the code there makes a special effort to
> skip over any keyno when a bit is set in opstep->nullkeys.
>
> It seems that your patch is adjusting the keyno that's given to the
> PruneCxtStateIdx() and it looks like (for your test case) it'll end up
> passing keyno==0 when it should be passing keyno==1.  keyno is the
> index of the partition key, so you can't pass 0 when it's for key
> index 1.
>
> I wonder if it's worth expanding the tests further to cover more of
> the pruning cases to cover run-time pruning too.
>

I think it's worth doing that.

> David


Re: Add annotation syntax to pg_hba.conf entries

2023-10-10 Thread Jim Jones
Hi Robert, Hi Tom,

Thanks for the feedback!

On 05.10.23 00:55, Tom Lane wrote:
> Robert Haas  writes:
>> You're probably not going to like this answer very much, but this
>> doesn't seem particularly worthwhile to me.
> Yeah, I was unconvinced about the number of use-cases too.
> As you say, some support from other potential users could convince
> me otherwise, but right now the evidence seems thin.
Most likely I am one of the very few using comments to sort of
semantically annotate pg_hba entries :)
>> The argument for this
>> feature is not that this information needs to exist, but that it needs
>> to be queryable from within PostgreSQL.
> Not only that, but that it needs to be accessible via the
> pg_hba_file_rules view.  Superusers could already see the
> pg_hba file's contents via pg_read_file().
That's my current strategy. I will keep doing that :)
> Again, that's not an argument that this is a bad idea.
> But it's an answer that would likely satisfy some fraction
> of whatever potential users are out there, which makes the
> question of how many use-cases really exist even more
> pressing.
>
>   regards, tom lane

I'll withdraw the CF entry, since the feature didn't seem to resonate
with other users.

Thanks again for the feedback.

Best, Jim




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-10 Thread Amit Kapila
On Sat, Oct 7, 2023 at 3:46 AM Amit Kapila  wrote:
>
> On Fri, Oct 6, 2023 at 6:30 PM Hayato Kuroda (Fujitsu)
> >
> > Based on that, I added another binary function 
> > binary_upgrade_create_logical_replication_slot().
> > This function is similar to pg_create_logical_replication_slot(), but the
> > restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> > filename is returned and it is passed to pg_resetwal command.
> >
>
> I am not sure if it is a good idea that a
> binary_upgrade_create_logical_replication_slot() API does the logfile
> name calculation.
>

The other problem is that pg_resetwal removes all pre-existing WAL
files which in this case could lead to the removal of the WAL file
corresponding to restart_lsn. This is because at least the shutdown
checkpoint record will be written after the creation of slots which
could be in the new file used for restart_lsn. Then when we invoke
pg_resetwal, it can remove that file.

One idea to deal with this could be to do the reset WAL stuff
(FindEndOfXLOG(), KillExistingXLOG(), KillExistingArchiveStatus(),
WriteEmptyXLOG()) in a separate function (say in pg_upgrade) and then
create slots. If we do this, then we additionally need an option in
pg_resetwal which skips resetting the WAL as that would have been done
before creating the slots.

-- 
With Regards,
Amit Kapila.




Re: pg_resetwal: Corrections around -c option

2023-10-10 Thread Peter Eisentraut

On 09.10.23 17:48, Alvaro Herrera wrote:

On 09.10.23 17:48, Alvaro Herrera wrote:
> Hmm, not sure about this.  SLRU_PAGES_PER_SEGMENT is 32, and
> COMMIT_TS_XACTS_PER_PAGE is 819, so this formula gives decimal 26208 =
> 0x6660.  But more generally, I'm not sure about the algorithm.  Really,
> the safe value also depends on how large the latest file actually is;
> e.g., if your numerically greatest file is only 32kB long (four pages)
> then you can't specify values that refer to Xids in pages 5 and beyond,
> because those pages will not have been zeroed into existence yet, so
> you'll get an error:
>    ERROR:  could not access status of transaction 55692
>    DETAIL:  Could not read from file "pg_commit_ts/0002" at offset 32768: read too few bytes.
> I think a useful value can be had by multiplying 26208 by the latest
> *complete* file number, then if there is an incomplete last file, add
> 819 multiplied by the number of pages in it.
>
> Also, "numerically greatest" is the wrong thing in case you've recently
> wrapped XIDs around but the oldest files (before the wraparound) are
> still present.  You really want the "logically latest" files.  (I think
> that'll coincide with the files having the latest modification times.)
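
Following that formula through on the example (a worked sketch, assuming
files 0000 and 0001 are complete and 0002 has the four pages):

  highest addressable xid = 2 * 26208 + 4 * 819 - 1 = 55691
  first missing xid       = 2 * 26208 + 4 * 819     = 55692

which is exactly the xid in the ERROR above.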


Would those issues also apply to the other SLRU-based guides on this man 
page?  Are they all a bit wrong?



> Not sure how to write this concisely, though.  Should we really try?


Maybe not.  But the documentation currently suggests you can try 
(probably somewhat copy-and-pasted).



>> Second, the present pg_resetwal code hardcodes the minimum value as 2, which
>> is FrozenTransactionId, but it's not clear why that is allowed.  Maybe we
>> should change that to FirstNormalTransactionId, which matches other
>> xid-related options in pg_resetwal.


> Yes, that's clearly a mistake I made at the last minute: in [1] I posted
> this patch *without* the test for 2, and when I pushed the patch two
> days later, I had introduced that without any further explanation.
>
> BTW if you `git show 666e8db` (which is the SHA1 identifier for
> pg_resetxlog.c that appears in the patch I posted back then) you'll see
> that the existing code did not have any similar protection for valid XID
> values.  The tests to FirstNormalTransactionId for -u and -x were
> introduced by commit 74cf7d46a91d, seven years later -- that commit both
> introduced -u as a new feature, and hardened the tests for -x, which was
> previously only testing for zero.


I have committed this.





Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-10 Thread Drouvot, Bertrand



On 10/10/23 9:21 AM, Michael Paquier wrote:
> On Tue, Oct 10, 2023 at 09:18:04AM +0200, Drouvot, Bertrand wrote:
>> Please forget about it ;-)
>
> That's called an ENOCOFFEE :D

Exactly, good one! ;-)

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2023-10-10 Thread Peter Smith
On Mon, Oct 9, 2023 at 9:34 PM shveta malik  wrote:
>
> On Wed, Oct 4, 2023 at 8:53 AM Peter Smith  wrote:
> >
> > Here are some review comments for v20-0002.
> >
>
> Thanks Peter for the feedback. Comments from 31 till end are addressed
> in v22. First 30 comments will be addressed in the next version.
>

Thanks for addressing my previous comments.

I checked those and went through other changes in v22-0002 to give a
few more review comments below.

I understand there are some design changes coming soon regarding the
use of GUCs so maybe a few of these comments will become redundant.

==
doc/src/sgml/config.sgml

1.
   A password needs to be provided too, if the sender demands password
   authentication.  It can be provided in the
   primary_conninfo string, or in a separate
-  ~/.pgpass file on the standby server (use
-  replication as the database name).
-  Do not specify a database name in the
-  primary_conninfo string.
+  ~/.pgpass file on the standby server.
+ 
+ 
+  Specify a database name in primary_conninfo string
+  to allow synchronization of slots from the primary to standby. This
+  dbname will only be used for slots synchronization purpose and will
+  be irrelevant for streaming.
  

1a.
"Specify a database name in...". Shouldn't that say "Specify dbname in..."?

~

1b.
BEFORE
This dbname will only be used for slots synchronization purpose and
will be irrelevant for streaming.

SUGGESTION
This will only be used for slot synchronization. It is ignored for streaming.

==
doc/src/sgml/system-views.sgml

2. pg_replication_slots

+ 
+  
+   synced_slot bool
+  
+  
+   True if this logical slot is created on physical standby as part of
+   slot-synchronization from primary server. Always false for
physical slots.
+  
+ 

/on physical standby/on the physical standby/

/from primary server/from the primary server/

==
src/backend/replication/logical/launcher.c

3. LaunchSlotSyncWorkers

+ /*
+ * If we failed to launch this slotsync worker, return and try
+ * launching rest of the workers in next sync cycle. But change
+ * launcher's wait time to minimum of wal_retrieve_retry_interval and
+ * default wait time to try next sync-cycle sooner.
+ */

3a.
Use consistent terms -- choose "sync cycle" or "sync-cycle"

~

3b.
Is it correct to just say "rest of the workers"; won't it also try to
relaunch this same failed worker again?

~~~

4. LauncherMain

+ /*
+ * Stop the slot-sync workers if any of the related GUCs changed.
+ * These will be relaunched using the new values during next
+ * sync-cycle. Also revalidate the new configurations and
+ * reconnect.
+ */
+ if (SlotSyncConfigsChanged())
+ {
+ slotsync_workers_stop();
+
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
+ if (RecoveryInProgress())
+ wrconn = slotsync_remote_connect();
+ }

Isn't it overkill to disconnect/reconnect every time any of those GUCs
changes? Or is it enough to do that only when PrimaryConnInfo differs
from PrimaryConnInfoPreReload?
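
If the latter, a sketch of what that could look like (reusing names from
the hunk above; untested):

if (SlotSyncConfigsChanged())
{
	slotsync_workers_stop();

	/* Only drop the connection if the connection string itself changed. */
	if (strcmp(PrimaryConnInfoPreReload, PrimaryConnInfo) != 0)
	{
		if (wrconn)
			walrcv_disconnect(wrconn);

		if (RecoveryInProgress())
			wrconn = slotsync_remote_connect();
	}
}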

==
src/backend/replication/logical/logical.c

5. CreateDecodingContext

+ /*
+ * Do not allow consumption of a "synchronized" slot until the standby
+ * gets promoted.
+ */
+ if (RecoveryInProgress() && slot->data.synced)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot use replication slot \"%s\" for logical decoding",
+ NameStr(slot->data.name)),
+ errdetail("This slot is being synced from primary."),
+ errhint("Specify another replication slot.")));
+

/from primary/from the primary/

==
src/backend/replication/logical/slotsync.c

6. use_slot_in_query

+ /*
+ * Return TRUE if either slot is not yet created on standby or if it
+ * belongs to one of the dbs passed in dbids.
+ */
+ if (!slot_found || relevant_db)
+ return true;
+
+ return false;

Same as single line:

return (!slot_found || relevant_db);

~~~

7. synchronize_one_slot

+ /*
+ * If the local restart_lsn and/or local catalog_xmin is ahead of
+ * those on the remote then we cannot create the local slot in sync
+ * with primary because that would mean moving local slot backwards
+ * and we might not have WALs retained for old lsns. In this case we
+ * will wait for primary's restart_lsn and catalog_xmin to catch up
+ * with the local one before attempting the sync.
+ */

/moving local slot/moving the local slot/

/with primary/with the primary/

/wait for primary's/wait for the primary's/

~~~

8. ProcessSlotSyncInterrupts

+ if (ConfigReloadPending)
+ {
+ ConfigReloadPending = false;
+
+ /* Save the PrimaryConnInfo before reloading */
+ *conninfo_prev = pstrdup(PrimaryConnInfo);

If the configuration keeps changing then there might be a slow leak
here because I didn't notice anywhere where this strdup'ed string is
getting freed. Is that something worth worrying about?
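
If it is, a small sketch of a fix (untested against the actual patch):

	/* Save the PrimaryConnInfo before reloading, freeing the old copy. */
	if (*conninfo_prev)
		pfree(*conninfo_prev);
	*conninfo_prev = pstrdup(PrimaryConnInfo);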

==
src/backend/replication/slot.c

9. ReplicationSlotDrop

+ /*
+ * Do 

Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-10 Thread Michael Paquier
On Tue, Oct 10, 2023 at 09:18:04AM +0200, Drouvot, Bertrand wrote:
> Please forget about it ;-)

That's called an ENOCOFFEE :D
--
Michael


signature.asc
Description: PGP signature


Re: False "pg_serial": apparent wraparound” in logs

2023-10-10 Thread Michael Paquier
On Thu, Oct 05, 2023 at 11:28:02PM +, Imseih (AWS), Sami wrote:
> I spent some time studying this and it appears to be a good approach.
>
> Passing the cutoff page as headPage (the SLRU-not-needed code path) instead
> of the tailPage to SimpleLruTruncate is already being done when the tailXid
> is not a valid XID.  I added an additional condition to make sure that the
> tailPage precedes the headPage as well.
> 
> Attached is v2 of the patch.

Thanks for the updated patch.  I have begun looking at what you have
here.
--
Michael


signature.asc
Description: PGP signature


Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-10 Thread Drouvot, Bertrand




On 10/10/23 9:12 AM, Drouvot, Bertrand wrote:
> Hi,
> Any reason why INIT_PG_BYPASS_ROLE_LOGIN is not 0x0003?

Please forget about it ;-)

Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-10 Thread Drouvot, Bertrand

Hi,

On 10/10/23 7:58 AM, Michael Paquier wrote:
> On Tue, Oct 10, 2023 at 06:57:05AM +0200, Drouvot, Bertrand wrote:
>> Please find attached v9 (v8 rebase due to f483b2090).
>
> I was looking at v8 just before you sent this v9, and still got
> annoyed by the extra boolean argument added to InitPostgres().

Arf, I did not look at it as I had in mind to look at it once
this one is in.

> So please let me propose to bite the bullet and refactor that, as of
> the 0001 attached; that means less diff footprint in all the callers
> of InitPostgres() (I am not wedded to the flag names).

Thanks for having looked at it!

+   bits32  init_flags = 0; /* never honor session_preload_libraries */

Also a few words about datallowconn in the comment? (as the flag deals
with both).

> It looks like 0002 had the same issues as f483b209: the worker that
> could not be started because of the login restriction could be
> detected as stopped by worker_spi_launch(), causing the script to fail
> hard.
>
> 0002 is basically your v9, able to work with the refactoring from
> 0001.

Thanks!

 #define INIT_PG_OVERRIDE_ALLOW_CONNS   0x0002
+#define INIT_PG_BYPASS_ROLE_LOGIN  0x0004

Any reason why INIT_PG_BYPASS_ROLE_LOGIN is not 0x0003?

Except that it does look good to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



