date:20241114

On Fri, Nov 15, 2024 at 2:07 PM Amit Kapila  wrote:
>
> On Fri, Nov 15, 2024 at 6:10 AM Peter Smith  wrote:
> >
> > 3. A different approach?
> >
> > TBH, is introducing a whole new error message even a good idea?
> >
> > Now there are going to be two separate error messages where previously
> > there was only one. So if the table has multiple problems at the same
> > time then still only one of them can "win". i.e. you have to either
> > report the "generated columns" problem 1st or the "missing columns"
> > problem 1st -- either way that might not be a good user experience
> > because they might be unaware of multiple problems until they try the
> > CREATE SUBSCRIPTION a 2nd time and then it fails a 2nd time with the
> > other kind of error! That could be annoying.
> >
>
> I don't know why the user needs to perform CREATE SUBSCRIPTION
> multiple times to see this. IIUC, this error will happen in the apply
> worker and after fixing the first, the user should see the second. I
> think this can happen in other ways in apply worker as well.

Yeah, I was thinking more of the scenario where the CREATE
SUBSCRIPTION gave the immediate error, so the user panics and does
DROP SUBSCRIPTION to give them all the time they need while they fix
the problem. Then they won't see the second problem until they
recreate the subscription.

But if they just are happy to leave the original CREATE SUBSCRIPTION
failing continuously while they fix the first problem then I think you
are correct --- the error should just fall through further to show the
next problem.

>
> > A better solution may be just to *combine* everything, so the user
> > only has to deal with one error. IIUC that's what is already happening
> > in master code, so this patch doesn't need to do anything except make
> > a quite trivial change to the wording of the existing error message.
> >
> > For example:
> > BEFORE
> > errmsg_plural("logical replication target relation \"%s.%s\" is
> > missing replicated column: %s",
> >   "logical replication target relation \"%s.%s\" is
> > missing replicated columns: %s",
> > SUGGESTION
> > errmsg_plural("logical replication target relation \"%s.%s\" has
> > missing or generated replicated column: %s",
> >   "logical replication target relation \"%s.%s\" has
> > missing or generated replicated columns: %s",
> >
>
> With this, we can combine two different ERRORs into one but it won't
> be evident if the column name referred in the message is generated or
> missing. I see your point but combining two different errors into one
> is also confusing. We can try to add more checks to make this
> distinction clear but it doesn't seem worth the effort and complexity.
> Also, it is not clear whether combining different ERRORs is a good
> idea in the first place.
>

I don't know if it needs to be spelled out explicitly in the message
which is which because the user will surely know their own subscriber
table definition, so it will be quite obvious to them if a named
column is missing or generated.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: Support LIKE with nondeterministic collations

On Tue, Nov 12, 2024 at 3:45 PM Peter Eisentraut  wrote:
>
> On 11.11.24 14:25, Heikki Linnakangas wrote:
> > Sadly the algorithm is O(n^2) with non-deterministic collations.Is there
> > any way this could be optimized? We make no claims on how expensive any
> > functions or operators are, so I suppose a slow implementation is
> > nevertheless better than throwing an error.
>
> Yeah, maybe someone comes up with new ideas in the future.
>

/*
* Now build a substring of the text and try to match it against
* the subpattern.  t is the start of the text, t1 is one past the
* last byte.  We start with a zero-length string.
*/
t1 = t
t1len = tlen;
for (;;)
{
int cmp;
CHECK_FOR_INTERRUPTS();
cmp = pg_strncoll(subpat, subpatlen, t, (t1 - t), locale);

select '.foo.' LIKE '_oo' COLLATE ign_punct;
pg_strncoll's iteration of the first 4 argument values.
oo  2   foo. 0
oo  2   foo. 1
oo  2   foo. 2
oo  2   foo. 3
oo  2   foo. 4

seems there is a shortcut/optimization.
if subpat don't have wildcard(percent sign, underscore)
then we can have less pg_strncoll calls?

minimum case to trigger error within GenericMatchText
since no related tests.
create table t1(a text collate case_insensitive, b text collate "C");
insert into t1 values ('a','a');
select a like b from t1;

at 9.7.1. LIKE  section, we still don't know what "wildcard" is.
we mentioned it at 9.7.2.
maybe we can add a sentence at the end of:

 If pattern does not contain percent
 signs or underscores, then the pattern only represents the string
 itself; in that case LIKE acts like the
 equals operator.  An underscore (_) in
 pattern stands for (matches) any single
 character; a percent sign (%) matches any sequence
 of zero or more characters.

saying underscore and percent sign are wildcards in LIKE.
other than that, I can understand the doc.

Re: proposal: schema variables

2024-11-14 Thread Pavel Stehule

čt 14. 11. 2024 v 8:41 odesílatel Pavel Stehule 
napsal:

>
>
> st 13. 11. 2024 v 17:35 odesílatel Dmitry Dolgov <9erthali...@gmail.com>
> napsal:
>
>> > On Sun, Nov 10, 2024 at 06:51:40PM GMT, Pavel Stehule wrote:
>> > ne 10. 11. 2024 v 17:19 odesílatel Pavel Stehule <
>> pavel.steh...@gmail.com>
>> > napsal:
>> > I thought a lot of time about better solutions for identifier collisions
>> > and I really don't think so there is some consistent user friendly
>> syntax.
>> > Personally I think there is an easy already implemented solution -
>> > convention - just use a dedicated schema for variables and this schema
>> > should not be in the search path. Or use secondary convention - like
>> using
>> > prefix "__" for session variables. Common convention is using "_" for
>> > PLpgSQL variables. I searched how this issue is solved in other
>> databases,
>> > or in standard, and I found nothing special. The Oracle and SQL/PSM has
>> a
>> > concept of visibility - the variables are not visible outside packages
>> or
>> > modules, but Postgres has nothing similar. It can be emulated by a
>> > dedicated schema without inserting a search path, but it is less strong.
>> >
>> > I think we can introduce an alternative syntax, that will not be user
>> > friendly or readable friendly, but it can be without collisions - or can
>> > decrease possible risks.
>> >
>> > It is nothing new - SQL does it with old, "new" syntax of inner joins,
>> or
>> > in Postgres we can
>> >
>> > where salary < 4
>> >
>> > or
>> >
>> > where pg_catalog.int4lt(salary, 4);
>> >
>> >
>> > or some like we use for operators OPERATOR(*schema*.*operatorname*)
>> >
>> > So introducing VARIABLE(schema.variablename) syntax as an alternative
>> > syntax for accessing variables I really like. I strongly prefer to use
>> this
>> > as only alternative (secondary) syntax, because I don't think it is
>> > friendly syntax or writing friendly, but it is safe, and I can imagine
>> > tools that can replace generic syntax to this special, or that detects
>> > generic syntax and shows some warning. Then users can choose what they
>> > prefer. Two syntaxes - generic and special can be good enough for all -
>> and
>> > this can be perfectly consistent with current Postgres.
>>
>> As far as I recall, last time this topic was discussed in hackers, two
>> options were proposed: the one with VARIABLE(name), what you mention
>> here; and another one with adding variables to the FROM clause. The
>> VARIABLE(...) syntax didn't get much negative feedback, so I guess why
>> not -- if you find it fitting, it would be interesting to see the
>> implementation.
>>
>> I'm afraid it should not be just an alternative syntax, but the only one
>> allowed, because otherwise I don't see how scenarious like "drop a
>> column with the same name" could be avoided. As in the previous thread:
>>
>> -- we've got a variable b at the same time
>> SELECT a, b FROM table1;
>>
>> Then dropping the column b, but everything still works beause the
>> variable b got silently picked up. But if it would be required to say
>> VARIABLE(b), then all fine.
>>
>
> In this scenario you will get  a warning related to variable shadowing
> (before you drop a column).
>
> I think this issue can be partially similar to creating two equally named
> tables in different schemas (both schemas are in search path). When you
> drop one table, the query will work, but the result is different. It is the
> same issue. The SQL has no concept of shadowing and on the base line it is
> not necessary. But when you integrate SQL with some procedural code then
> you should solve this issue (or accept). This issue is real, and it is in
> every procedural enhancement of SQL that I know with the same syntax.  On
> the other hand I doubt this is a real issue. The changes of system
> catalogue are tested before production - so probably you will read a
> warning about a shadowed variable, and probably you will get different
> results, because variable b has the same value for all rows, and probably
> will have different value than column b. I can imagine the necessity of
> disabling this warning on production systems. Shadowing by self is not an
> issue, probably, but it is a signal of code quality problems.
>
> But this scenario is real, and then it is a question if the warning about
> shadowed variables should be only optional and if it can be disabled. Maybe
> not. Generally the shadowing is a strange concept - it is safeguard against
> serious issues, but it should not be used generally and everywhere the
> developer should rename the conflict identifiers.
>

There can be another example against usage of the FROM clause for
variables. Because it solves just one special case, but others are not
covered.

Theoretically, variables can have the same names as tables. The table
overshadows the variable, so it can work. But when somebody drops the
variable, then the query still can work. So requirement of usage variable
in

Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

2024-11-14 Thread vignesh C

On Thu, 14 Nov 2024 at 15:51, Shlok Kyal  wrote:
>
> Thanks for providing the comments.
>
> On Thu, 14 Nov 2024 at 12:22, vignesh C  wrote:
> >
> > On Wed, 13 Nov 2024 at 11:15, Shlok Kyal  wrote:
> > >
> > > Thanks for providing the comments.
> > >
> > > On Tue, 12 Nov 2024 at 12:52, Zhijie Hou (Fujitsu)
> > >  wrote:
> > > >
> > > > On Friday, November 8, 2024 7:06 PM Shlok Kyal 
> > > >  wrote:
> > > > >
> > > > > Hi Amit,
> > > > >
> > > > > On Thu, 7 Nov 2024 at 11:37, Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal 
> > > > > > 
> > > > > wrote:
> > > > > > >
> > > > > > > To avoid the issue, we can disallow UPDATE/DELETE on table with
> > > > > > > unpublished generated column as REPLICA IDENTITY. I have attached 
> > > > > > > a
> > > > > > > patch for the same.
> > > > > > >
> > > > > >
> > > > > > +CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol; UPDATE
> > > > > > +testpub_gencol SET a = 100 WHERE a = 1;
> > > > > > +ERROR:  cannot update table "testpub_gencol"
> > > > > > +DETAIL:  Column list used by the publication does not cover the
> > > > > > replica identity.
> > > > > >
> > > > > > This is not a correct ERROR message as the publication doesn't have
> > > > > > any column list associated with it. You have added the code to 
> > > > > > detect
> > > > > > this in the column list code path which I think is not required. 
> > > > > > BTW,
> > > > > > you also need to consider the latest commit 7054186c4e for this. I
> > > > > > guess you need to keep another flag in PublicationDesc to detect 
> > > > > > this
> > > > > > and then give an appropriate ERROR.
> > > > >
> > > > > I have addressed the comments and provided an updated patch. Also, I 
> > > > > am
> > > > > currently working to fix this issue in back branches.
> > > >
> > > > Thanks for the patch. I am reviewing it and have some initial comments:
> > > >
> > > >
> > > > 1.
> > > > +   char attgenerated = get_attgenerated(relid, 
> > > > attnum);
> > > > +
> > > >
> > > > I think it's unnecessary to initialize attgenerated here because the 
> > > > value will
> > > > be overwritten if pubviaroot is true anyway. Also, the 
> > > > get_attgenerated()
> > > > is not cheap.
> > > >
> > > Fixed
> > >
> > > > 2.
> > > >
> > > > I think the patch missed to check the case when table is marked REPLICA
> > > > IDENTITY FULL, and generated column is not published:
> > > >
> > > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) 
> > > > STORED NOT NULL);
> > > > ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
> > > > CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
> > > > UPDATE testpub_gencol SET a = 2;
> > > >
> > > > I expected the UPDATE to fail in above case, but it can still pass 
> > > > after applying the patch.
> > > >
> > > Fixed
> > >
> > > > 3.
> > > >
> > > > +* If the publication is FOR ALL TABLES we can skip the 
> > > > validation.
> > > > +*/
> > > >
> > > > This comment seems not clear to me, could you elaborate a bit more on 
> > > > this ?
> > > >
> > > I missed to handle the case FOR ALL TABLES. Have removed the comment.
> > >
> > > > 4.
> > > >
> > > > Also, I think the patch does not handle the FOR ALL TABLE case 
> > > > correctly:
> > > >
> > > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) 
> > > > STORED NOT NULL);
> > > > CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
> > > > ALTER TABLE testpub_gencol REPLICA IDENTITY USING index 
> > > > testpub_gencol_idx;
> > > > CREATE PUBLICATION pub_gencol FOR ALL TABLEs;
> > > > UPDATE testpub_gencol SET a = 2;
> > > >
> > > > I expected the UPDATE to fail in above case as well.
> > > >
> > > Fixed
> > >
> > > > 5.
> > > >
> > > > +   else if (cmd == CMD_UPDATE && 
> > > > !pubdesc.replident_has_valid_gen_cols)
> > > > +   ereport(ERROR,
> > > > +   
> > > > (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > > > +errmsg("cannot update table \"%s\"",
> > > > +   
> > > > RelationGetRelationName(rel)),
> > > > +errdetail("REPLICA IDENTITY consists 
> > > > of an unpublished generated column.")));
> > > >
> > > > I think it would be better to use lower case "replica identity" to 
> > > > consistent
> > > > with other existing messages.
> > > >
> > > Fixed
> > >
> > > I have attached the updated patch here.
> >
> > Few comments:
> > 1) In the first check relation->rd_rel->relispartition also is checked
> > whereas in the below it is not checked, shouldn't the same check be
> > there below to avoid few of the function calls which are not required:
> > +   if (pubviaroot && relation->rd_rel->relispartition)
> > +   {
> > +   publish_as_relid =
> > GetTopMostAncestorInPublication(pubid, ancestors, NULL);
> > +
> > +   if (!OidI

Re: Improve the error message for logical replication of regular column to generated column.

On Fri, Nov 15, 2024 at 9:06 AM Peter Smith  wrote:
>
> On Fri, Nov 15, 2024 at 2:07 PM Amit Kapila  wrote:
> >
> > > A better solution may be just to *combine* everything, so the user
> > > only has to deal with one error. IIUC that's what is already happening
> > > in master code, so this patch doesn't need to do anything except make
> > > a quite trivial change to the wording of the existing error message.
> > >
> > > For example:
> > > BEFORE
> > > errmsg_plural("logical replication target relation \"%s.%s\" is
> > > missing replicated column: %s",
> > >   "logical replication target relation \"%s.%s\" is
> > > missing replicated columns: %s",
> > > SUGGESTION
> > > errmsg_plural("logical replication target relation \"%s.%s\" has
> > > missing or generated replicated column: %s",
> > >   "logical replication target relation \"%s.%s\" has
> > > missing or generated replicated columns: %s",
> > >
> >
> > With this, we can combine two different ERRORs into one but it won't
> > be evident if the column name referred in the message is generated or
> > missing. I see your point but combining two different errors into one
> > is also confusing. We can try to add more checks to make this
> > distinction clear but it doesn't seem worth the effort and complexity.
> > Also, it is not clear whether combining different ERRORs is a good
> > idea in the first place.
> >
>
> I don't know if it needs to be spelled out explicitly in the message
> which is which because the user will surely know their own subscriber
> table definition, so it will be quite obvious to them if a named
> column is missing or generated.
>

The separate messages in this case would be clearer and better.

-- 
With Regards,
Amit Kapila.

Re: define pg_structiszero(addr, s, r)

Hi,

On Fri, Nov 15, 2024 at 09:30:25AM +0900, Michael Paquier wrote:
> On Thu, Nov 14, 2024 at 12:33:20PM +, Bertrand Drouvot wrote:
> Anyway, as you say, the
> portability of v12 is OK even for sizeof(size_t) == 4 because we don't
> rely on any hardcoded values, and this patch does what it should in
> this case (double-checked myself manually for the three cases with
> -m32).

Yeah, thanks for the testing!

> > What would be unsafe on 32-bit would be to read up to 32 bytes while len < 
> > 32
> > and that can not happen.
> > 
> > As mentioned up-thread the comments are wrong on 32-bit, indeed they must 
> > be read
> > as:
> > 
> > Case 1: len < 4 bytes
> > Case 2: len in the 4-31 bytes range
> > Case 3: len >= 32 bytes
> 
> This part could be indeed better than what's proposed in v12, so I
> would recommend to use sizeof(size_t) a bit more consistently rather
> than have the reader guess that.

Makes sense even if that looks "more difficult" to read.

> Some comments feel duplicated, as well, like the "no risk" mentions,
> which are clear enough based on the description and the limitations of
> the previous cases.  I'd like to suggest a few tweaks, making the
> comments more flexible.  See 0003 that applies on top of your latest
> patch set, reattaching v12 again.

Thanks! Applied on v13 attached, except for things like:

"
-   /* Compare bytes until the pointer "p" is aligned */
+   /* Compare bytes until the pointer "p" is aligned. */
"

which is adding a "." at the end of single line comments (as the few already
part of this file don't do so).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 2de219babe0b9010a309152a2b629edce17278b0 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 8 Nov 2024 14:19:59 +0900
Subject: [PATCH v13 1/2] Optimize pg_memory_is_all_zeros()

pg_memory_is_all_zeros() is currently doing byte per byte comparison and so
could lead to performance regression or penalties when multi bytes comparison
could be done instead.

Let's provide an optimized version that divides the checks into multiple cases
for safety reason and multiple phases for efficiency:

Case 1: len < 8 bytes, then byte per byte comparison.

Case 2: len in the 8-63 bytes range:
  - Phase 1: Byte by byte comparison, until the pointer is aligned.
  - Phase 2: size_t comparisons, with aligned pointers, up to the last
 location possible.
  - Phase 3: Byte by byte comparison, until the end location.

Case 3: len >= 64 bytes, same as case 2 except that an additional phase is
is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t)
comparisons using bitwise OR, to encourage compilers to use SIMD
instructions if available, up to the last aligned location possible.

Case 1 and Case 2 are mandatory to ensure that we won't read beyond the
memory area.

Code mainly suggested by David Rowley.
---
 src/include/utils/memutils.h | 119 ++-
 1 file changed, 116 insertions(+), 3 deletions(-)
 100.0% src/include/utils/

diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 3590c8bad9..c437232162 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -190,19 +190,132 @@ extern MemoryContext BumpContextCreate(MemoryContext parent,
 #define SLAB_LARGE_BLOCK_SIZE		(8 * 1024 * 1024)
 
 /*
+ * pg_memory_is_all_zeros
+ *
  * Test if a memory region starting at "ptr" and of size "len" is full of
  * zeroes.
+ *
+ * The test is divided into multiple cases for safety reason and multiple phases
+ * for efficiency.
+ *
+ * Case 1: len < sizeof(size_t) bytes, then byte per byte comparison.
+ * Case 2: len < (sizeof(size_t) * 8 - 1) bytes:
+ *   - Phase 1: Byte by byte comparison, until the pointer is aligned.
+ *   - Phase 2: size_t comparisons, with aligned pointers, up to the last
+ *  location possible.
+ *   - Phase 3: Byte by byte comparison, until the end location.
+ * Case 3: len >= (sizeof(size_t) * 8) bytes, same as case 2 except that an
+ * additional phase is placed between Phase 1 and Phase 2, with
+ * (8 * sizeof(size_t)) comparisons using bitwise OR, to encourage
+ * compilers to use SIMD instructions if available, up to the last
+ * aligned location possible.
+ *
+ * Case 1 and Case 2 are mandatory to ensure that we won't read beyond the
+ * memory area.  This is portable for 32-bit and 64-bit architectures.
+ *
+ * Caller must ensure that "ptr" is not NULL.
  */
 static inline bool
 pg_memory_is_all_zeros(const void *ptr, size_t len)
 {
-	const char *p = (const char *) ptr;
+	const unsigned char *p = (const unsigned char *) ptr;
+	const unsigned char *end = &p[len];
+	const unsigned char *aligned_end = (const unsigned char *)
+		((uintptr_t) end & (~(sizeof(size_t) - 1)));
+
+	if (len < sizeof(size_t))
+	{
+		while (p < end)
+		{

Re: Potential ABI breakage in upcoming minor releases

2024-11-14 Thread Pavan Deolasee

On Fri, Nov 15, 2024 at 1:00 AM Tom Lane  wrote:

> Noah Misch  writes:
> > It's not immediately to clear to me why this would crash in a non-asserts
> > build.  palloc issues a 512-byte chunk for sizeof(ResultRelInfo)==368 on
> v16,
> > so I expect no actual writing past the end of the chunk.
>
> I'm confused too.  The allocation should be big enough.  The other
> hazard would be failing to initialize the field, but if the extension
> uses InitResultRelInfo then that's taken care of.
>

I should have mentioned in my original post that our limited PGD tests are
passing too. But I wasn't sure if the problem may hit us in the field,
given the subtleness of the memory corruption. But it's quite comforting to
read Noah's analysis about why this could be a non-issue for non-assert
builds.

Thanks,
Pavan

Re: Support for NO INHERIT to INHERIT state change with named NOT NULL constraints

2024-11-14 Thread Robert Haas

On Thu, Nov 14, 2024 at 12:02 AM Suraj Kharage <
suraj.khar...@enterprisedb.com> wrote:

> Alvaro stated that allowing a not null constraint state to be modified
> from INHERIT to NO INHERIT is going to be quite problematic because of the
> number of weird cases to avoid, so for now that support is not added.
>

What's the reasoning behind that restriction? What are the weird cases?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Improve the error message for logical replication of regular column to generated column.

On Fri, Nov 15, 2024 at 6:10 AM Peter Smith  wrote:
>
> 3. A different approach?
>
> TBH, is introducing a whole new error message even a good idea?
>
> Now there are going to be two separate error messages where previously
> there was only one. So if the table has multiple problems at the same
> time then still only one of them can "win". i.e. you have to either
> report the "generated columns" problem 1st or the "missing columns"
> problem 1st -- either way that might not be a good user experience
> because they might be unaware of multiple problems until they try the
> CREATE SUBSCRIPTION a 2nd time and then it fails a 2nd time with the
> other kind of error! That could be annoying.
>

I don't know why the user needs to perform CREATE SUBSCRIPTION
multiple times to see this. IIUC, this error will happen in the apply
worker and after fixing the first, the user should see the second. I
think this can happen in other ways in apply worker as well.

> A better solution may be just to *combine* everything, so the user
> only has to deal with one error. IIUC that's what is already happening
> in master code, so this patch doesn't need to do anything except make
> a quite trivial change to the wording of the existing error message.
>
> For example:
> BEFORE
> errmsg_plural("logical replication target relation \"%s.%s\" is
> missing replicated column: %s",
>   "logical replication target relation \"%s.%s\" is
> missing replicated columns: %s",
> SUGGESTION
> errmsg_plural("logical replication target relation \"%s.%s\" has
> missing or generated replicated column: %s",
>   "logical replication target relation \"%s.%s\" has
> missing or generated replicated columns: %s",
>

With this, we can combine two different ERRORs into one but it won't
be evident if the column name referred in the message is generated or
missing. I see your point but combining two different errors into one
is also confusing. We can try to add more checks to make this
distinction clear but it doesn't seem worth the effort and complexity.
Also, it is not clear whether combining different ERRORs is a good
idea in the first place.

-- 
With Regards,
Amit Kapila.

Re: Skip collecting decoded changes of already-aborted transactions

Hi Sawada-Sn,

Here are some review comments for patch v8-0001.

==
contrib/test_decoding/sql/stats.sql

1.
+-- The INSERT changes are large enough to be spilled but not, because the
+-- transaction is aborted. The logical decoding skips collecting further
+-- changes too. The transaction is prepared to make sure the decoding processes
+-- the aborted transaction.

/to be spilled but not/to be spilled but will not be/

==
.../replication/logical/reorderbuffer.c

ReorderBufferTruncateTXN:

2.
 /*
  * Discard changes from a transaction (and subtransactions), either after
- * streaming or decoding them at PREPARE. Keep the remaining info -
- * transactions, tuplecids, invalidations and snapshots.
+ * streaming, decoding them at PREPARE, or detecting the transaction abort.
+ * Keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots.
  *
  * We additionally remove tuplecids after decoding the transaction at prepare
  * time as we only need to perform invalidation at rollback or commit prepared.
  *
+ * The given transaction is marked as streamed if appropriate and the caller
+ * asked it by passing 'mark_txn_streaming' being true.
+ *
  * 'txn_prepared' indicates that we have decoded the transaction at prepare
  * time.
  */
 static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared,
+ bool mark_txn_streaming)

I think the function comment should describe the parameters in the
same order that they appear in the function signature.

~~~

3.
+ else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) ||
(txn->nentries_mem != 0)))
+ {
...
+ txn->txn_flags |= RBTXN_IS_STREAMED;
+ }

I guess it doesn't matter much, but for the sake of readability,
should the condition also be checking !rbtxn_is_streamed(txn) to avoid
overwriting the RBTXN_IS_STREAMED bit when it was set already?

~~~

ReorderBufferTruncateTXNIfAborted:

4.
+ /*
+ * The transaction aborted. We discard the changes we've collected so far,
+ * and free all resources allocated for toast reconstruction. The full
+ * cleanup will happen as part of decoding ABORT record of this
+ * transaction.
+ *
+ * Since we don't check the transaction status while replaying the
+ * transaction, we don't need to reset toast reconstruction data here.
+ */
+ ReorderBufferTruncateTXN(rb, txn, false, false);

4a.
The first part of the comment says "... and free all resources
allocated for toast reconstruction", but the second part says "we
don't need to reset toast reconstruction data here". Is that a
contradiction?

~

4b.
Shouldn't this call still be passing rbtxn_prepared(txn) as the 2nd
last param, like it used to?

==
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4

On Wed, Nov 13, 2024 at 5:23 PM Tomas Vondra  wrote:
>
> On 11/13/24 11:59, Amit Kapila wrote:
> > On Tue, Nov 12, 2024 at 12:43 PM Ashutosh Bapat
> >  wrote:
> >>
> >> On Tue, Nov 12, 2024 at 12:02 PM Masahiko Sawada  
> >> wrote:
> >>>
> >>> On Mon, Nov 11, 2024 at 2:08 PM Tomas Vondra  wrote:
> 
> 
>  But neither of those fixes prevents backwards move for confirmed_flush
>  LSN, as enforced by asserts in the 0005 patch. I don't know if this
>  assert is incorrect or now. It seems natural that once we get a
>  confirmation for some LSN, we can't move before that position, but I'm
>  not sure about that. Maybe it's too strict.
> >>>
> >>> Hmm, I'm concerned that it might be another problem. I think there are
> >>> some cases where a subscriber sends a flush position older than slot's
> >>> confirmed_flush as a feedback message.
> >>>
> >
> > Right, it can happen for cases where subscribers doesn't have to do
> > anything (for example DDLs) like I have explained in one of my emails
> > [1]
> >
>
> Thanks. I admit not being entirely familiar with all the details, but
> doesn't that email explain more "Why it currently happens?" rather than
> "Is this what should be happening?"
>

Right.

> Sure, the subscriber needs to confirm changes for which nothing needs to
> be done, like DDL. But isn't there a better way to do that, rather than
> allowing confirmed_lsn to go backwards?
>

Firstly, I agree that we should try to find ways to avoid going
confirmed_lsn going backward. We can try to explore solutions both in
the publisher and subscriber-side.

> >>> But it seems to be dangerous if
> >>> we always accept it as a new confirmed_flush value. It could happen
> >>> that confirm_flush could be set to a LSN older than restart_lsn.
> >>>
> >
> > Possible, though I haven't tried to reproduce such a case. But, will
> > it create any issues? I don't know if there is any benefit in allowing
> > to move confirmed_flush LSN backward. AFAIR, we don't allow such
> > backward values to persist. They will temporarily be in memory. I
> > think as a separate patch we should prevent it from moving backward.
> >
> >>
> >> If confirmed_flush LSN moves backwards, it means the transactions
> >> which were thought to be replicated earlier are no longer considered
> >> to be replicated. This means that the restart_lsn of the slot needs to
> >> be at least far back as the oldest of starting points of those
> >> transactions. Thus restart_lsn of slot has to be pushed further back.
> >>
> >
> > I don't see a reason to move restart_lsn backward. Why do you think so?
> >
>
> I think what Ashutosh is saying that if confirmed_flush is allowed to
> move backwards, that may result in start_lsn moving backwards too. And
> we need to be able to decode all transactions committed since start_lsn,
> so if start_lsn moves backwards, maybe restart_lsn needs to move
> backwards too. I have no idea if confirmed_flush/start_lsn can move
> backwards enough to require restart_lsn to move, though.
>
> Anyway, these discussions are a good illustration why I think allowing
> these LSNs to move backwards is a problem. It either causes bugs (like
> with breaking replication slots) and/or it makes the reasoning about
> correct behavior much harder.
>

Right, I think one needs to come up with some reproducible scenarios
where these cause any kind of problem or inefficiency. Then, we can
discuss the solutions accordingly. I mean to say that someone has to
put effort into making a bit more solid case for changing this code
because it may not be a good idea to change something just based on
some theory unless it is just adding some assertions.

-- 
With Regards,
Amit Kapila.

Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4

On Thu, Nov 14, 2024 at 7:08 AM Masahiko Sawada  wrote:
>
> Sure. I've attached the updated patch. I just added the commit message.
>

@@ -1815,6 +1818,8 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr
current_lsn, XLogRecPtr restart
  confirmed_flush = slot->data.confirmed_flush;
  SpinLockRelease(&slot->mutex);

+ spin_released = true;
+
  elog(DEBUG1, "failed to increase restart lsn: proposed %X/%X, after
%X/%X, current candidate %X/%X, current after %X/%X, flushed up to
%X/%X",
  LSN_FORMAT_ARGS(restart_lsn),
  LSN_FORMAT_ARGS(current_lsn),
@@ -1823,6 +1828,9 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr
current_lsn, XLogRecPtr restart
  LSN_FORMAT_ARGS(confirmed_flush));
  }

+ if (!spin_released)
+ SpinLockRelease(&slot->mutex);

This coding pattern looks odd to me. We can consider releasing
spinlock in the other two if/else if checks. I understand it is a
matter of individual preference, so, if you and or others prefer the
current way, that is also fine with me. Other than this, the patch
looks good to me.

-- 
With Regards,
Amit Kapila.

Re: Potential ABI breakage in upcoming minor releases

2024-11-14 Thread Peter Geoghegan

On Thu, Nov 14, 2024 at 11:29 AM Alvaro Herrera  wrote:
> ISTM that we have spare bytes where we could place that boolean without
> breaking ABI.  In x86_64 there's a couple of 4-byte holes, but those
> won't be there on x86, so not great candidates.  Fortunately there are
> 3-byte and 7-byte holes also, which we can use safely.  We can move the
> new boolean to those location.

Wasn't this part of the official guidelines? I've been doing this all
along (e.g., in commit 3fa81b62e0).

> The holes are different in each branch unfortunately.

Yeah, that'd make it a bit more complicated, but still doable. It
would be necessary to place the same field in seemingly random
locations on each backbranch.

FWIW recent versions of clangd will show me information about field
padding in struct annotations. I don't even have to run pahole.

--
Peter Geoghegan

Re: Potential ABI breakage in upcoming minor releases

Alvaro Herrera  writes:
> On 2024-Nov-14, Christoph Berg wrote:
>> It didn't actually crash, true.

> But timescale did crash:

So what is timescale doing differently?

regards, tom lane

RE: FW: Building Postgres 17.0 with meson

2024-11-14 Thread Mark Hill

On Wed, 13 Nov 2024 at 10:53AM(EST), Nazir Bilal Yavuz  
wrote:

> I think that the problem is that you are setting '.../include/openssl'
> as an include_dir not '.../include'. Could you please try:

> set 
> openssl_include_dir=D:\Jenkins\workspace\workspace\Postgres-9.4\OpenSSL\OpenSSL-Install\OpenSSL-3.1.6-wx6\include

Hi Nazir,

Thank you, that worked!   I had actually tried that with openssl, icu, and zlib 
all enabled but when either icu or zlib failed it didn't get
as far as openssl so I didn't know that it would work until I tried with just 
openssl by itself.  I have the setup working now but ninja
fails when I do the setup with openssl, zlib, and icu.   If I setup just 
openssl and zlib, the build works.

With openssl, zlib, and icu enabled in the setup, the ninja build fails trying 
to link src/backend/postgres.exe.  There are 40 unresolved
external symbol errors (see below.)  I checked a few of the symbols and they 
appear in the Postgres source without the "_72" text on
the end.   Is it getting "72" from the version of icu4c I'm using, 72.1?   
Anyone know how to prevent these errors?

Thanks, Mark


[1206/2170] Linking target src/backend/postgres.exe
FAILED: src/backend/postgres.exe
"link" @src/backend/postgres.exe.rsp
   Creating library src\backend\postgres.lib
commands_collationcmds.c.obj : error LNK2019: unresolved external symbol 
uloc_getDisplayName_72 referenced in function get_icu_locale_comment
commands_collationcmds.c.obj : error LNK2019: unresolved external symbol 
uloc_getAvailable_72 referenced in function pg_import_system_collations
utils_adt_pg_locale.c.obj : error LNK2001: unresolved external symbol 
uloc_getAvailable_72
commands_collationcmds.c.obj : error LNK2019: unresolved external symbol 
uloc_countAvailable_72 referenced in function pg_import_system_collations
utils_adt_pg_locale.c.obj : error LNK2001: unresolved external symbol 
uloc_countAvailable_72
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_islower_72 
referenced in function pg_wc_islower
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_isupper_72 
referenced in function pg_wc_isupper
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_isdigit_72 
referenced in function pg_wc_isdigit
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_isalpha_72 
referenced in function pg_wc_isalpha
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_isalnum_72 
referenced in function pg_wc_isalnum
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_ispunct_72 
referenced in function pg_wc_ispunct
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_isgraph_72 
referenced in function pg_wc_isgraph
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_isspace_72 
referenced in function pg_wc_isspace
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_isprint_72 
referenced in function pg_wc_isprint
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_tolower_72 
referenced in function casecmp
regex_regcomp.c.obj : error LNK2019: unresolved external symbol u_toupper_72 
referenced in function pg_wc_toupper
utils_adt_formatting.c.obj : error LNK2019: unresolved external symbol 
u_errorName_72 referenced in function icu_convert_case
utils_adt_pg_locale.c.obj : error LNK2001: unresolved external symbol 
u_errorName_72
utils_adt_formatting.c.obj : error LNK2019: unresolved external symbol 
u_strToUpper_72 referenced in function str_toupper
utils_adt_formatting.c.obj : error LNK2019: unresolved external symbol 
u_strToLower_72 referenced in function str_tolower
utils_adt_formatting.c.obj : error LNK2019: unresolved external symbol 
u_strToTitle_72 referenced in function u_strToTitle_default_BI
utils_adt_pg_locale.c.obj : error LNK2019: unresolved external symbol 
u_versionToString_72 referenced in function get_collation_actual_version
utils_adt_pg_locale.c.obj : error LNK2019: unresolved external symbol 
uiter_setString_72 referenced in function pg_strnxfrm_prefix_icu
utils_adt_pg_locale.c.obj : error LNK2019: unresolved external symbol 
uiter_setUTF8_72 referenced in function pg_strnxfrm_prefix_icu
utils_adt_pg_locale.c.obj : error LNK2019: unresolved external symbol 
uloc_getLanguage_72 referenced in function icu_validate_locale
utils_adt_pg_locale.c.obj : error LNK2019: unresolved external symbol 
uloc_canonicalize_72 referenced in function icu_set_collation_attributes
utils_adt_pg_locale.c.obj : error LNK2019: unresolved external symbol 
uloc_toLanguageTag_72 referenced in function icu_language_tag
utils_adt_pg_locale.c.obj : error LNK2019: unresolved external symbol 
ucol_open_72 referenced in function pg_ucol_open
utils_adt_pg_locale.c.obj : error LNK2019: unresolved external symbol 
ucol_openRules_72 referenced in function make_icu_collator
utils_adt_pg_locale.c.obj : error LNK2019: unresolved external symbol 
ucol_close_72 referenced in function get_collation_actual_version
utils_adt_pg_

Re: Potential ABI breakage in upcoming minor releases

2024-11-14 Thread Pavan Deolasee

On Thu, Nov 14, 2024 at 9:43 PM Peter Eisentraut 
wrote:

> On 14.11.24 15:35, Noah Misch wrote:
> > The postgr.es/c/e54a42a standard would have us stop here.  But I'm open
> to
> > treating the standard as mistaken and changing things.
>
> That text explicitly calls out that adding struct members at the end of
> a struct is considered okay.

I think that assumption is ok when the struct is allocated by core and
passed to the extension. But if the struct is allocated (static or dynamic)
and passed to the core, we have to be extra careful.

> But thinking about it now, even adding
> fields to the end of a node struct that extensions allocate using
> makeNode() is an ABI break that is liable to cause all affected
> extensions to break in a crashing way.
>

Memory corruption issues can be quite subtle and hard to diagnose. So if
wrapping a new release is an option, I would vote for it. If we do that,
doing it for every impacted version would be more prudent. But I don't know
what are the policies around making a new release.

Thanks,
Pavan

Re: Potential ABI breakage in upcoming minor releases

Hi,

> The allocation should be big enough.  The other
> hazard would be failing to initialize the field, but if the extension
> uses InitResultRelInfo then that's taken care of.

> So what is timescale doing differently?

I see 3 usages of makeNode(ResultRelInfo) in Timescale:

- src/nodes/chunk_dispatch/chunk_insert_state.c
- src/copy.c
- src/ts_catalog/catalog.c

 In the first case it's followed by InitResultRelInfo(). In the second
- by ExecInitResultRelation() in its turn calls InitResultRelInfo().

The third case is the following:

```
extern TSDLLEXPORT ResultRelInfo *
ts_catalog_open_indexes(Relation heapRel)
{
ResultRelInfo *resultRelInfo;

resultRelInfo = makeNode(ResultRelInfo);
resultRelInfo->ri_RangeTableIndex = 0; /* dummy */
resultRelInfo->ri_RelationDesc = heapRel;
resultRelInfo->ri_TrigDesc = NULL; /* we don't fire triggers */

ExecOpenIndices(resultRelInfo, false);

return resultRelInfo;
}
```

Where it's used from there is hard to track but it looks like this is
the reason why the new field ri_needLockTagTuple is not initialized.
I'll pass this piece of information to my colleagues.

-- 
Best regards,
Aleksander Alekseev

Re: UUID v7

2024-11-14 Thread Masahiko Sawada

On Mon, Nov 11, 2024 at 12:20 PM Masahiko Sawada  wrote:
>
> On Sat, Nov 9, 2024 at 9:07 AM Sergey Prokhorenko
>  wrote:
> >
> > On Saturday 9 November 2024 at 01:00:15 am GMT+3, Masahiko Sawada 
> >  wrote:
> >
> > > the microsecond part is working also as a counter in a sense. IT seems 
> > > fine to me but I'm slightly concerned that there is no guidance of such 
> > > implementation in RFC 9562.
> >
> > In fact, there is guidance of similar implementation in RFC 9562:
> > https://datatracker.ietf.org/doc/html/rfc9562#name-monotonicity-and-counters
> > "Counter Rollover Handling:"
> > "Alternatively, implementations MAY increment the timestamp ahead of the 
> > actual time and reinitialize the counter."
> >
>
> Indeed, thank you.
>
> > But in the near future, this may not be enough for the highest-performance 
> > systems.
>
> Yeah, I'm concerned about this. That time might gradually come. That
> being said, as long as rand_a part works also as a counter, it's fine.
> Also, 12 bits does not differ much as Andrey Borodin mentioned. I
> think in the first version it's better to start with a simple
> implementation rather than over-engineering it.
>
> Regarding the implementation, the v30 patch uses only microseconds
> precision time even on platforms where nanoseconds precision is
> available such as Linux. I think it's better to store the value of
> (sub-milliseconds * 4096) into 12-bits of rand_a space instead of
> directly storing microseconds into 10 bits space.

IIUC v29 patch implements UUIDv7 generation in this way. So I've
reviewed v29 patch and here are some review comments:

---
 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-* http://tools.ietf.org/html/rfc4122#section-4.4
+* http://tools.ietf.org/html/rfc9562#section-4.4
 */

The new RFC doesn't have section 4.4.

---
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.

I'm concerned that "version and variant 0b10 bits" is not very clear
to readers. I think we can just mention "... except version and
variant bits".

---
+
+#ifndef WIN32
+#include 
+
+static uint64 get_real_time_ns()
+{
+   struct timespec tmp;
+
+   clock_gettime(CLOCK_REALTIME, &tmp);
+   return tmp.tv_sec * 10L + tmp.tv_nsec;
+}
+#else /* WIN32 */
+
+#include "c.h"
+#include 
+#include 
+
+/* FILETIME of Jan 1 1970 00:00:00, the PostgreSQL epoch */
+static const unsigned __int64 epoch = UINT64CONST(1164447360);
+
+/*
+ * FILETIME represents the number of 100-nanosecond intervals since
+ * January 1, 1601 (UTC).
+ */
+#define FILETIME_UNITS_TO_NS UINT64CONST(100)
+
+
+/*
+ * timezone information is stored outside the kernel so tzp isn't used anymore.
+ *
+ * Note: this function is not for Win32 high precision timing purposes. See
+ * elapsed_time().
+ */
+static uint64
+get_real_time_ns()
+{
+   FILETIMEfile_time;
+   ULARGE_INTEGER ularge;
+
+   GetSystemTimePreciseAsFileTime(&file_time);
+   ularge.LowPart = file_time.dwLowDateTime;
+   ularge.HighPart = file_time.dwHighDateTime;
+
+   return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
+#endif

I think that it's better to implement these functions in instr_time.h
or another file.

---
+/* minimum amount of ns that guarantees step of increased_clock_precision */
+#define SUB_MILLISECOND_STEP (100/4096 + 1)

I think we can rewrite it to:

#define NS_PER_MS INT64CONST(100)
#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)

Which improves the readability.

Also, I think "#define NS_PER_US INT64CONST(1000)" can also be used in
many places.

---
+   /* set version field, top four bits are 0, 1, 1, 1 */
+   uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+   /* set variant field, top two bits are 1, 0 */
+   uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;

I think we can make an inline function to set both variant and version
so we can use it for generating UUIDv4 and UUIDv7.

--
+   tms = uuid->data[5];
+   tms += ((uint64) uuid->data[4]) << 8;
+   tms += ((uint64) uuid->data[3]) << 16;
+   tms += ((uint64) uuid->data[2]) << 24;
+   tms += ((uint64) uuid->data[1]) << 32;
+   tms += ((uint64) uuid->data[0]) << 40;

How about rewriting these to the following for consistency with UUIDv1 codes?

tms = uuid->data[5]
+ ((uint64) uuid->data[4] << 8)
+ ((uint64) uuid->data[3] << 16)
+ ((uint64) uuid->data[2] << 24)
+ ((uint64) uuid->data[1] << 32)
+ ((uint64) uuid->data[0] << 40);

---
Thinking about the function structures more, I think we can refactor
generate_uuidv7(), uuidv7() and uuidv7_interval():

- create a function, get_clock_timestamp_ns(), that provides a
nanosecond-precision timestamp
- the returned timestamp is guaranteed to be greater than the
previous returned value.
- this function can be inlined.
- create a function, g

Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-11-14 Thread torikoshia


On 2024-11-13 22:02, Yugo NAGATA wrote:

On Tue, 12 Nov 2024 17:38:25 +0900
torikoshia  wrote:


On 2024-11-12 14:17, Yugo Nagata wrote:
> On Tue, 12 Nov 2024 14:03:50 +0900
> Yugo Nagata  wrote:
>
>> On Tue, 12 Nov 2024 01:27:53 +0500
>> Kirill Reshke  wrote:
>>
>> > On Mon, 11 Nov 2024 at 16:11, torikoshia  
wrote:
>> > >
>> > > On 2024-11-09 21:55, Kirill Reshke wrote:
>> > >
>> > > Thanks for working on this!
>> >
>> > Thanks for reviewing the v7 patch series!
>> >
>> > > > On Thu, 7 Nov 2024 at 23:00, Fujii Masao 
>> > > > wrote:
>> > > >>
>> > > >>
>> > > >>
>> > > >> On 2024/10/26 6:03, Kirill Reshke wrote:
>> > > >> > when the REJECT LIMIT is set to some non-zero number and the number 
of
>> > > >> > row NULL replacements exceeds the limit, is it OK to fail. Because
>> > > >> > there WAS errors, and we should not tolerate more than $limit 
errors .
>> > > >> > I do find this behavior to be consistent.
>> > > >>
>> > > >> +1
>> > > >>
>> > > >>
>> > > >> > But what if we don't set a REJECT LIMIT, it is sane to do all
>> > > >> > replacements, as if REJECT LIMIT is inf.
>> > > >>
>> > > >> +1
>> > > >
>> > > > After thinking for a while, I'm now more opposed to this approach. I
>> > > > think we should count rows with erroneous data as errors only if
>> > > > null substitution for these rows failed, not the total number of rows
>> > > > which were modified.
>> > > > Then, to respect the REJECT LIMIT option, we compare this number with
>> > > > the limit. This is actually simpler approach IMHO. What do You think?
>> > >
>> > > IMHO I prefer the previous interpretation.
>> > > I'm not sure this is what people expect, but I assume that REJECT_LIMIT
>> > > is used to specify how many malformed rows are acceptable in the
>> > > "original" data source.
>>
>> I also prefer the previous version.
>>
>> > I do like the first version of interpretation, but I have a struggle
>> > with it. According to this interpretation, we will fail COPY command
>> > if the number
>> > of malformed data rows exceeds the limit, not the number of rejected
>> > rows (some percentage of malformed rows are accepted with nulnot-null 
constraintl
>> > substitution)

I feel your concern is valid.
Currently 'reject' can occur only when converting a column's input 
value

to its data type, but if we introduce set_to_null option 'reject' also
occurs when inserting null, i.e. not null constraint.


I can suppose "reject" means failure of COPY command here, that is, a 
reject
of executing the command, not an error of data input. If so, we can 
interpret
REJECT_LIMIT as "the number of malformed rows allowed before the COPY 
command
is REJECTed" (not the number of rejected rows). In this case, I think 
we don't

have to rename the option name.

Of course, if there is more proper name that makes it easy for users to
understand the behaviour of the option, renaming should be nice.


>> The documentation says that REJECT_LIMIT "Specifies the maximum number
>> of errors",
>> and there are no wording "reject" in the description, so I wonder it
>> is unclear
>> what means in "REJECT" in REJECT_LIMIT. It may be proper to use
>> ERROR_LIMIT
>> since it is supposed to be used with ON_ERROR.
>>
>> Alternatively, if we emphasize that errors are handled other than
>> terminating
>> the command,perhaps MALFORMED_LIMIT as proposed above or
>> TOLERANCE_LIMIT may be
>> good, for example.
>
> I might misunderstand the meaning of the name. If REJECT_LIMIT means "a
> limit on
> the number of rows with any malformed value allowed before the COPY
> command is
> rejected", we would not have to rename it.

The meaning of REJECT_LIMIT is what you described, and I think Kirill
worries about cases when malformed rows are accepted(=not REJECTed) 
with

null substitution. REJECT_LIMIT counts this case as REJECTed.


I am a bit confused.


Me too:)
Let me explain my understanding.

I believe there are now two candidates that count as REJECT_LIMIT 
number:


  (1) error converting a column's input value into its data type(soft 
error)

  (2) NULL substitution failure(this comes from the proposed patch)

And I understood Kirill's idea to be the following:

  1st idea: REJECT_LIMIT counts (1)
  2nd idea: REJECT_LIMIT counts (2)

And I've agreed with the 1st one.


You mean "REJECT" is raising a soft error of data
input here instead of terminating COPY?


Yes.


--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Re: Potential ABI breakage in upcoming minor releases

Aleksander Alekseev  writes:
> The third case is the following:

> ```
> extern TSDLLEXPORT ResultRelInfo *
> ts_catalog_open_indexes(Relation heapRel)
> {
> ResultRelInfo *resultRelInfo;

> resultRelInfo = makeNode(ResultRelInfo);
> resultRelInfo->ri_RangeTableIndex = 0; /* dummy */
> resultRelInfo->ri_RelationDesc = heapRel;
> resultRelInfo->ri_TrigDesc = NULL; /* we don't fire triggers */

> ExecOpenIndices(resultRelInfo, false);

> return resultRelInfo;
> }
> ```

I'd call that a bug in timescale.  It has no business assuming that
it can skip InitResultRelInfo.

regards, tom lane

Re: Potential ABI breakage in upcoming minor releases

Hi Alvaro,

> There are three makeNode(ResultRelInfo), but they don't look anything
> out of the ordinary:
> [...]
> src/ts_catalog/catalog.c-extern TSDLLEXPORT ResultRelInfo *
> src/ts_catalog/catalog.c-ts_catalog_open_indexes(Relation heapRel)
> src/ts_catalog/catalog.c-{
> src/ts_catalog/catalog.c-   ResultRelInfo *resultRelInfo;
> src/ts_catalog/catalog.c-
> src/ts_catalog/catalog.c:   resultRelInfo = makeNode(ResultRelInfo);
> src/ts_catalog/catalog.c-   resultRelInfo->ri_RangeTableIndex = 0; /* dummy */
> src/ts_catalog/catalog.c-   resultRelInfo->ri_RelationDesc = heapRel;
> src/ts_catalog/catalog.c-   resultRelInfo->ri_TrigDesc = NULL; /* we don't 
> fire triggers */
> src/ts_catalog/catalog.c-
> src/ts_catalog/catalog.c-   ExecOpenIndices(resultRelInfo, false);

ExecOpenIndices() will not fill the ri_needLockTagTuple field for the
caller and it will contain garbage.

Since you assumed this is the correct usage, maybe it should?

-- 
Best regards,
Aleksander Alekseev

Re: BF mamba failure

On Thu, Nov 14, 2024 at 10:07:41AM +, Bertrand Drouvot wrote:
> === 1
> 
> Maybe use "generation" instead of generation in the comments (where it's not 
> done,
> some comments do it already).

I've tweaked things to be more consistency, and applied that down to 15.

> === 2
> 
> We could think about adding a test. That should be doable with replication 
> slots
> or custom pgstats kinds and probably injection points. But I'm not sure that's
> worth it, as the code in the patch looks "simple" enough. Thoughts?

The tricky part is that this path involves a backend shutdown.  We
should be able to do something with a wait before the dshash_find() in
pgstat_release_entry_ref() with a BackgroundPsql object that gets
stopped, but I'm not sure if we can guarantee a lookup of
pg_stat_activity at this stage.  Let me see..
--
Michael


signature.asc
Description: PGP signature

Re: define pg_structiszero(addr, s, r)

On Fri, Nov 15, 2024 at 06:46:52AM +, Bertrand Drouvot wrote:
> Makes sense even if that looks "more difficult" to read.

I don't think it's that "bad".

> which is adding a "." at the end of single line comments (as the few already
> part of this file don't do so).

I did so after looking at the surroundings.  One way or the other is
fine by me as long as we're consistent.

I still need to have a second look.  For now, let's wait a bit, in
case there are any comments.
--
Michael


signature.asc
Description: PGP signature

Re: SQL:2011 application time

2024-11-14 Thread Matthias van de Meent

On Wed, 13 Nov 2024, 11:11 Peter Eisentraut,  wrote:
> This conditional is really hard to understand:
>
> +   /*
> +* The AM must support uniqueness, and the index must in fact be
> unique.
> +* If we have a WITHOUT OVERLAPS constraint (identified by
> uniqueness +
> +* exclusion), we can use that too.
> +*/
> +   if ((!indexRel->rd_indam->amcanunique ||
> +   !indexRel->rd_index->indisunique) &&
> +   !(indexRel->rd_index->indisunique &&
> indexRel->rd_index->indisexclusion))
>
> Why can we have a indisunique index when the AM is not amcanunique?  Are
> we using the fields wrong?

I called this issue out earlier this year: amcanunique implies
btree-style uniqueness, and allows CREATE UNIQUE INDEX. However, that
IndexAmRoutine field seems to be ignored for indexes that are created
to back temporal unique constraints, which thus get
indrel->indisunique = true. This causes interesting issues when you
look at the index catalog and errors: there are indexes with
indisunique using gist, but CREATE UNIQUE INDEX USING gist (...)
throws the nice "access method "gist" does not support unique indexes"
error.

It'd be nice if there was a better internal API to describe what types
of uniqueness each index supports, so CREATE UNIQUE INDEX could work
with gist for WITHOUT OVERLAPS, and these WITHOUT OVERLAPS unique
indexes could be attached to primary keys without taking O(>=
tablesize) of effort.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Re: Potential ABI breakage in upcoming minor releases

On Thu, Nov 14, 2024 at 10:26:58AM -0800, Noah Misch wrote:
> On Thu, Nov 14, 2024 at 05:29:18PM +0100, Christoph Berg wrote:
> > Re: Noah Misch
> > > Based on a grep of PGXN code, here are some or all of the modules that 
> > > react
> > > to sizeof(ResultRelInfo):
> > > 
> > > $ grepx -r 'lloc.*ResultRelInfo' | tee /tmp/1 | sed 's/-[^:]*/:/'|sort -u
> > > apacheage::resultRelInfo = palloc(sizeof(ResultRelInfo));
> > 
> > Confirmed, crashing: AGE for 14..17 (12..13 seem fine)
> 
> Can you share more about the crash, perhaps the following?
> 
> - stack trace
> - any server messages just before the crash
> - which architecture (32-bit x86, 64-bit ARM, etc.)
> - asserts enabled, or not?
> 
> It's not immediately to clear to me why this would crash in a non-asserts
> build.  palloc issues a 512-byte chunk for sizeof(ResultRelInfo)==368 on v16,
> so I expect no actual writing past the end of the chunk.  I don't see
> apacheage allocating a ResultRelInfo other than via one palloc per
> ResultRelInfo (no arrays of them, no stack-allocated ResultRelInfo).  I'll
> also work on installing apacheage to get those answers locally.

On x86_64, I ran these with and without asserts:
  install PostgreSQL 16.4
  install https://github.com/apache/age/tree/master
  make -C age installcheck
  install PostgreSQL 16.5
  make -C age installcheck

The non-asserts build passed.  The asserts build failed with "+WARNING:
problem in alloc set ExecutorState: detected write past chunk" throughout the
diffs, but there were no crashes.  (Note that AGE "make installcheck" creates
a temporary installation, unlike PostgreSQL "make installcheck".)  What might
differ in how you tested?

Re: Improve the error message for logical replication of regular column to generated column.

Hi Shubham.

==
Commit message.

1.
FYI, to clarify my previous review comment [1] #1, I think a more
correct commit message might be:

SUGGESTION
Currently, if logical replication attempts to target a subscriber-side
table column that is either missing or generated, it produces the
following identical error message:
ERROR: logical replication target relation \"%s.%s\" is missing
replicated columns: %s

While the error itself is valid, the message wording can be misleading
for generated columns. This patch introduces a distinct error message
specifically for the generated column scenario.

==
src/backend/replication/logical/relation.c

2.
I noticed another problem when testing the new error message. There
are too many quotes for the column names. e.g.
2024-11-15 09:59:54.966 AEDT [32701] ERROR: replicating to a target
relation's generated column ""b"" for "public.t1" is not supported

This is because the patch code is quoting the individual faulty
columns and then you are re-quoting the whole list of faulty column
again in the err message. Please see the existing code in
'logicalrep_report_missing_attrs' for how this should look -- e.g. the
column list %s substitution marker in the message is NOT quoted.

"... is missing replicated column: %s"

BUT...

3. A different approach?

TBH, is introducing a whole new error message even a good idea?

Now there are going to be two separate error messages where previously
there was only one. So if the table has multiple problems at the same
time then still only one of them can "win". i.e. you have to either
report the "generated columns" problem 1st or the "missing columns"
problem 1st -- either way that might not be a good user experience
because they might be unaware of multiple problems until they try the
CREATE SUBSCRIPTION a 2nd time and then it fails a 2nd time with the
other kind of error! That could be annoying.

A better solution may be just to *combine* everything, so the user
only has to deal with one error. IIUC that's what is already happening
in master code, so this patch doesn't need to do anything except make
a quite trivial change to the wording of the existing error message.

For example:
BEFORE
errmsg_plural("logical replication target relation \"%s.%s\" is
missing replicated column: %s",
"logical replication target relation \"%s.%s\" is
missing replicated columns: %s",
SUGGESTION
errmsg_plural("logical replication target relation \"%s.%s\" has
missing or generated replicated column: %s",
"logical replication target relation \"%s.%s\" has
missing or generated replicated columns: %s",

Thoughts?

==
[1]
https://www.postgresql.org/message-id/CAHut%2BPt_vyFDGMbLXa94o4ffn4jNmFc8s6jkhmW-%3DBRTZM-HtQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: define pg_structiszero(addr, s, r)

On Thu, Nov 14, 2024 at 12:33:20PM +, Bertrand Drouvot wrote:
> Case 2 should be read as "in the 4-31" bytes range on 32-bit system as all 
> comparisons are done in size_t.

I'd suggest to use a -m32 in your gcc switches, when it comes to
tests, but you already know that..  Anyway, as you say, the
portability of v12 is OK even for sizeof(size_t) == 4 because we don't
rely on any hardcoded values, and this patch does what it should in
this case (double-checked myself manually for the three cases with
-m32).

> What would be unsafe on 32-bit would be to read up to 32 bytes while len < 32
> and that can not happen.
> 
> As mentioned up-thread the comments are wrong on 32-bit, indeed they must be 
> read
> as:
> 
> Case 1: len < 4 bytes
> Case 2: len in the 4-31 bytes range
> Case 3: len >= 32 bytes

This part could be indeed better than what's proposed in v12, so I
would recommend to use sizeof(size_t) a bit more consistently rather
than have the reader guess that.  Note that some parts of v12-0001 use
sizeof(size_t) in the comments, which makes things inconsistent.

Some comments feel duplicated, as well, like the "no risk" mentions,
which are clear enough based on the description and the limitations of
the previous cases.  I'd like to suggest a few tweaks, making the
comments more flexible.  See 0003 that applies on top of your latest
patch set, reattaching v12 again.
--
Michael
From 59bbfd7ece0deb785f73a94cd53d330d86b58875 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 8 Nov 2024 14:19:59 +0900
Subject: [PATCH v13 1/3] Optimize pg_memory_is_all_zeros()

pg_memory_is_all_zeros() is currently doing byte per byte comparison and so
could lead to performance regression or penalties when multi bytes comparison
could be done instead.

Let's provide an optimized version that divides the checks into multiple cases
for safety reason and multiple phases for efficiency:

Case 1: len < 8 bytes, then byte per byte comparison.

Case 2: len in the 8-63 bytes range:
  - Phase 1: Byte by byte comparison, until the pointer is aligned.
  - Phase 2: size_t comparisons, with aligned pointers, up to the last
 location possible.
  - Phase 3: Byte by byte comparison, until the end location.

Case 3: len >= 64 bytes, same as case 2 except that an additional phase is
is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t)
comparisons using bitwise OR, to encourage compilers to use SIMD
instructions if available, up to the last aligned location possible.

Case 1 and Case 2 are mandatory to ensure that we won't read beyond the
memory area.

Code mainly suggested by David Rowley.
---
 src/include/utils/memutils.h | 123 ++-
 1 file changed, 120 insertions(+), 3 deletions(-)

diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 3590c8bad9..0808e91b77 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -190,19 +190,136 @@ extern MemoryContext BumpContextCreate(MemoryContext parent,
 #define SLAB_LARGE_BLOCK_SIZE		(8 * 1024 * 1024)
 
 /*
+ * pg_memory_is_all_zeros
+ *
  * Test if a memory region starting at "ptr" and of size "len" is full of
  * zeroes.
+ *
+ * The test is divided into multiple cases for safety reason and multiple phases
+ * for efficiency.
+ *
+ * Case 1: len < 8 bytes, then byte per byte comparison.
+ * Case 2: len in the 8-63 bytes range:
+ *   - Phase 1: Byte by byte comparison, until the pointer is aligned.
+ *   - Phase 2: size_t comparisons, with aligned pointers, up to the last
+ *  location possible.
+ *   - Phase 3: Byte by byte comparison, until the end location.
+ * Case 3: len >= 64 bytes, same as case 2 except that an additional phase is
+ * is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t)
+ * comparisons using bitwise OR, to encourage compilers to use SIMD
+ * instructions if available, up to the last aligned location possible.
+ *
+ * Case 1 and Case 2 are mandatory to ensure that we won't read beyond the
+ * memory area.
+ *
+ * Caller must ensure that "ptr" is not NULL.
  */
 static inline bool
 pg_memory_is_all_zeros(const void *ptr, size_t len)
 {
-	const char *p = (const char *) ptr;
+	const unsigned char *p = (const unsigned char *) ptr;
+	const unsigned char *end = &p[len];
+	const unsigned char *aligned_end = (const unsigned char *)
+		((uintptr_t) end & (~(sizeof(size_t) - 1)));
 
-	for (size_t i = 0; i < len; i++)
+	/* len < 8 bytes case */
+	if (len < sizeof(size_t))
 	{
-		if (p[i] != 0)
+		while (p < end)
+		{
+			if (*p++ != 0)
+return false;
+		}
+		return true;
+	}
+
+	/* len in the 8-63 bytes range case */
+	if (len < sizeof(size_t) * 8)
+	{
+		/* Compare bytes until the pointer "p" is aligned */
+		while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0)
+		{
+			if (p == end)
+return true;
+			if (*p++ != 0)
+return false;
+		}
+
+		/*
+		 * Compare remainin

Re: SQL:2011 application time

2024-11-14 Thread Paul Jungwirth

On 11/14/24 10:14, Matthias van de Meent wrote:

I called this issue out earlier this year: amcanunique implies
btree-style uniqueness, and allows CREATE UNIQUE INDEX. However, that
IndexAmRoutine field seems to be ignored for indexes that are created
to back temporal unique constraints, which thus get
indrel->indisunique = true. This causes interesting issues when you
look at the index catalog and errors: there are indexes with
indisunique using gist, but CREATE UNIQUE INDEX USING gist (...)
throws the nice "access method "gist" does not support unique indexes"
error.

It'd be nice if there was a better internal API to describe what types
of uniqueness each index supports, so CREATE UNIQUE INDEX could work
with gist for WITHOUT OVERLAPS, and these WITHOUT OVERLAPS unique
indexes could be attached to primary keys without taking O(>=
tablesize) of effort.

I think the issue is that a specific GiST opclass may support uniqueness (if it defines the support
proc to communicate its equality stratnum), but the AM as a whole doesn't support uniqueness. So
amcanunique is false. We could make the stratnum support proc required, but that seems like it would
break too many extensions. Or maybe we could change canunique to an opclass-level property?

Probably we could support `CREATE UNIQUE INDEX USING gist (...)` *if* the opclasses involved have
stratnum support funcs. I'm happy to write/assist a patch for that. It would use exclusion
constraints behind the scenes, as we're doing with temporal PKs/UNIQUEs. But that still wouldn't
give you CONCURRENTLY (which is the main motivation), because we don't support creating exclusion
constraints concurrently yet. The work in 2014 on REINDEX CONCURRENTLY originally tried to support
exclusion constraints, but it didn't make it into the final version. I'm not sure what needs to be
done there. But one thing that seems tricky is that the *index* doesn't know about the exclusion
rules; it's all in pg_constraint.

Yours,

--
Paul ~{:-)
p...@illuminatedcomputing.com

Re: Potential ABI breakage in upcoming minor releases

On 2024-Nov-14, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2024-Nov-14, Christoph Berg wrote:
> >> It didn't actually crash, true.
> 
> > But timescale did crash:
> 
> So what is timescale doing differently?

No idea.  Maybe a stacktrace from a core would tell us more; but the
jenkins task doesn't seem to have server logs or anything else to gives
us more clues.

There are three makeNode(ResultRelInfo), but they don't look anything
out of the ordinary:

C perhan: timescaledb 0$ git grep -C5 makeNode.ResultRelInfo
src/copy.c-  *
src/copy.c-  * WARNING. The dummy rangetable index is decremented by 1 
(unchecked)
src/copy.c-  * inside `ExecConstraints` so unless you want to have a overflow, 
keep it
src/copy.c-  * above zero. See `rt_fetch` in parsetree.h.
src/copy.c-  */
src/copy.c: resultRelInfo = makeNode(ResultRelInfo);
src/copy.c-
src/copy.c-#if PG16_LT
src/copy.c- ExecInitRangeTable(estate, pstate->p_rtable);
src/copy.c-#else
src/copy.c- Assert(pstate->p_rteperminfos != NULL);
--
src/nodes/chunk_dispatch/chunk_insert_state.c-create_chunk_result_relation_info(const
 ChunkDispatch *dispatch, Relation rel)
src/nodes/chunk_dispatch/chunk_insert_state.c-{
src/nodes/chunk_dispatch/chunk_insert_state.c-  ResultRelInfo *rri;
src/nodes/chunk_dispatch/chunk_insert_state.c-  ResultRelInfo *rri_orig = 
dispatch->hypertable_result_rel_info;
src/nodes/chunk_dispatch/chunk_insert_state.c-  Index hyper_rti = 
rri_orig->ri_RangeTableIndex;
src/nodes/chunk_dispatch/chunk_insert_state.c:  rri = makeNode(ResultRelInfo);
src/nodes/chunk_dispatch/chunk_insert_state.c-
src/nodes/chunk_dispatch/chunk_insert_state.c-  InitResultRelInfo(rri, rel, 
hyper_rti, NULL, dispatch->estate->es_instrument);
src/nodes/chunk_dispatch/chunk_insert_state.c-
src/nodes/chunk_dispatch/chunk_insert_state.c-  /* Copy options from the main 
table's (hypertable's) result relation info */
src/nodes/chunk_dispatch/chunk_insert_state.c-  rri->ri_WithCheckOptions = 
rri_orig->ri_WithCheckOptions;
--
src/ts_catalog/catalog.c-extern TSDLLEXPORT ResultRelInfo *
src/ts_catalog/catalog.c-ts_catalog_open_indexes(Relation heapRel)
src/ts_catalog/catalog.c-{
src/ts_catalog/catalog.c-   ResultRelInfo *resultRelInfo;
src/ts_catalog/catalog.c-
src/ts_catalog/catalog.c:   resultRelInfo = makeNode(ResultRelInfo);
src/ts_catalog/catalog.c-   resultRelInfo->ri_RangeTableIndex = 0; /* dummy */
src/ts_catalog/catalog.c-   resultRelInfo->ri_RelationDesc = heapRel;
src/ts_catalog/catalog.c-   resultRelInfo->ri_TrigDesc = NULL; /* we don't fire 
triggers */
src/ts_catalog/catalog.c-
src/ts_catalog/catalog.c-   ExecOpenIndices(resultRelInfo, false);


I didn't catch any other allocations that would depend on the size of
ResultRelInfo in obvious ways.



Oh, hmm, there's this bit which uses struct assignment into a stack
element:

tsl/src/compression/compression.c-1559- if 
(decompressor->indexstate->ri_NumIndices > 0)
tsl/src/compression/compression.c-1560- {
tsl/src/compression/compression.c:1561: ResultRelInfo indexstate_copy = 
*decompressor->indexstate;
tsl/src/compression/compression.c-1562- Relation single_index_relation;

I find this rather suspect.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La espina, desde que nace, ya pincha" (Proverbio africano)

Re: gamma() and lgamma() functions

2024-11-14 Thread Dean Rasheed

On Thu, 14 Nov 2024 at 16:28, Dmitry Koval  wrote:
>
> I think having gamma() and lgamma() functions in PostgreSQL would be
> useful for some users, this is asked [1].
>

Thanks for looking.

>  >SELECT gamma(float8 '1e-320');
> ERROR:  value out of range: overflow
>
>  >SELECT gamma(float8 '0');
>gamma
> --
>   Infinity
> (1 row)
>

That's correct since gamma(1e-320) is roughly 1e320, which overflows
the double precision type, but it's not actually infinite, whereas
gamma(0) is infinite.

> Perhaps it would be logical if the behavior in these cases was the same
> (either ERROR or 'Infinity')?
>

In general, I think that having gamma() throw overflow errors is
useful for spotting real problems in user code. In my experience, it's
uncommon to pass negative or even close-to-zero inputs to gamma(), but
it is common to fail to appreciate just how quickly the result grows
for positive inputs (it overflows for inputs just over 171.6), so it
seems better to be made aware of such problems (which might be solved
by using lgamma() instead).

So I think throwing overflow errors is the most useful thing to do
from a practical point of view, and if we do that for too-large
inputs, it makes sense to do the same for too-close-to-zero inputs.

OTOH, gamma(+/-0) = +/-Infinity is right from a mathematical point of
view, and consistent with the POSIX spec, and it's also consistent
with the functions cot() and cotd().

Regards,
Dean

Re: Potential ABI breakage in upcoming minor releases

On Thu, Nov 14, 2024 at 05:29:18PM +0100, Christoph Berg wrote:
> Re: Noah Misch
> > Based on a grep of PGXN code, here are some or all of the modules that react
> > to sizeof(ResultRelInfo):
> > 
> > $ grepx -r 'lloc.*ResultRelInfo' | tee /tmp/1 | sed 's/-[^:]*/:/'|sort -u
> > apacheage::resultRelInfo = palloc(sizeof(ResultRelInfo));
> 
> Confirmed, crashing: AGE for 14..17 (12..13 seem fine)

Can you share more about the crash, perhaps the following?

- stack trace
- any server messages just before the crash
- which architecture (32-bit x86, 64-bit ARM, etc.)
- asserts enabled, or not?

It's not immediately to clear to me why this would crash in a non-asserts
build.  palloc issues a 512-byte chunk for sizeof(ResultRelInfo)==368 on v16,
so I expect no actual writing past the end of the chunk.  I don't see
apacheage allocating a ResultRelInfo other than via one palloc per
ResultRelInfo (no arrays of them, no stack-allocated ResultRelInfo).  I'll
also work on installing apacheage to get those answers locally.

Re: Potential ABI breakage in upcoming minor releases

Noah Misch  writes:
> It's not immediately to clear to me why this would crash in a non-asserts
> build.  palloc issues a 512-byte chunk for sizeof(ResultRelInfo)==368 on v16,
> so I expect no actual writing past the end of the chunk.

I'm confused too.  The allocation should be big enough.  The other
hazard would be failing to initialize the field, but if the extension
uses InitResultRelInfo then that's taken care of.

regards, tom lane

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

2024-11-14 Thread Michail Nikolaev

Hello, everyone.

Rebased on master.

>
From 9c85b499793e9a4bcbb55cc5d78cfeacab368b58 Mon Sep 17 00:00:00 2001
From: nkey 
Date: Thu, 14 Nov 2024 22:36:26 +0100
Subject: [PATCH v4 2/4] Modify the infer_arbiter_indexes function to consider 
 both indisvalid and indisready indexes. Ensure that at least one indisvalid 
 index is still required.

The change ensures that all concurrent transactions utilize the same set of indexes as arbiters. This uniformity is required to avoid conditions that could lead to "duplicate key value violates unique constraint" errors during UPSERT operations.

The patch resolves the issues in the following specs:
* reindex_concurrently_upsert
* index_concurrently_upsert
* index_concurrently_upsert_predicate

Despite the patch, the following specs are still affected:
* reindex_concurrently_upsert_partitioned
* reindex_concurrently_upsert_on_constraint
---
 src/backend/optimizer/util/plancat.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 37b0ca2e43..c835813290 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -719,6 +719,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 
 	/* Results */
 	List	   *results = NIL;
+	bool	   foundValid = false;
 
 	/*
 	 * Quickly return NIL for ON CONFLICT DO NOTHING without an inference
@@ -812,7 +813,13 @@ infer_arbiter_indexes(PlannerInfo *root)
 		idxRel = index_open(indexoid, rte->rellockmode);
 		idxForm = idxRel->rd_index;
 
-		if (!idxForm->indisvalid)
+		/*
+		 * We need to consider both indisvalid and indisready indexes because
+		 * them may become indisvalid before execution phase. It is required
+		 * to keep set of indexes used as arbiter to be the same for all
+		 * concurrent transactions.
+		 */
+		if (!idxForm->indisready)
 			goto next;
 
 		/*
@@ -834,10 +841,9 @@ infer_arbiter_indexes(PlannerInfo *root)
 		 errmsg("ON CONFLICT DO UPDATE not supported with exclusion constraints")));
 
 			results = lappend_oid(results, idxForm->indexrelid);
-			list_free(indexList);
+			foundValid |= idxForm->indisvalid;
 			index_close(idxRel, NoLock);
-			table_close(relation, NoLock);
-			return results;
+			break;
 		}
 		else if (indexOidFromConstraint != InvalidOid)
 		{
@@ -938,6 +944,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 			goto next;
 
 		results = lappend_oid(results, idxForm->indexrelid);
+		foundValid |= idxForm->indisvalid;
 next:
 		index_close(idxRel, NoLock);
 	}
@@ -945,7 +952,8 @@ next:
 	list_free(indexList);
 	table_close(relation, NoLock);
 
-	if (results == NIL)
+	/* It is required to have at least one indisvalid index during the planning. */
+	if (results == NIL || !foundValid)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
  errmsg("there is no unique or exclusion constraint matching the ON CONFLICT specification")));
-- 
2.43.0

From 087a6ebaf196487668afef1387ce843de54d5a2d Mon Sep 17 00:00:00 2001
From: nkey 
Date: Thu, 14 Nov 2024 22:41:00 +0100
Subject: [PATCH v4 4/4] Modify the ExecInitPartitionInfo function to consider 
 partitioned indexes that are potentially processed by REINDEX CONCURRENTLY as
  arbiters as well.

This is necessary to ensure that all concurrent transactions use the same set of arbiter indexes.

The patch resolves the issues in the following specs:
* reindex_concurrently_upsert_partitioned
---
 src/backend/executor/execPartition.c | 119 ---
 1 file changed, 107 insertions(+), 12 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..a41d5f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -483,6 +483,48 @@ ExecFindPartition(ModifyTableState *mtstate,
 	return rri;
 }
 
+/*
+ * IsIndexCompatibleAsArbiter
+ * 		Checks if the indexes are identical in terms of being used
+ * 		as arbiters for the INSERT ON CONFLICT operation by comparing
+ * 		them to the provided arbiter index.
+ *
+ * Returns the true if indexes are compatible.
+ */
+static bool
+IsIndexCompatibleAsArbiter(Relation	arbiterIndexRelation,
+		   IndexInfo  *arbiterIndexInfo,
+		   Relation	indexRelation,
+		   IndexInfo  *indexInfo)
+{
+	int i;
+
+	if (arbiterIndexInfo->ii_Unique != indexInfo->ii_Unique)
+		return false;
+	/* it is not supported for cases of exclusion constraints. */
+	if (arbiterIndexInfo->ii_ExclusionOps != NULL || indexInfo->ii_ExclusionOps != NULL)
+		return false;
+	if (arbiterIndexRelation->rd_index->indnkeyatts != indexRelation->rd_index->indnkeyatts)
+		return false;
+
+	for (i = 0; i < indexRelation->rd_index->indnkeyatts; i++)
+	{
+		int			arbiterAttoNo = arbiterIndexRelation->rd_index->indkey.values[i];
+		int			attoNo = indexRelation->rd_index->indkey.values[i];
+		if (arbiterAttoNo != attoNo)
+			return false;
+	}
+
+	if (list_difference(RelationGetIndexExpre

Re: 039_end_of_wal: error in "xl_tot_len zero" test

2024-11-14 Thread Thomas Munro

On Fri, Nov 15, 2024 at 4:54 AM Christoph Berg  wrote:
> postgresql 13.17, Debian bullseye, amd64:
>
> t/039_end_of_wal.pl .. Dubious, test returned 2 (wstat 512, 
> 0x200)

This seems to be the interesting bit:

 build/src/test/recovery/tmp_check/log/regress_log_039_end_of_wal

No such file or directory at
/home/myon/projects/postgresql/debian/13/build/../src/test/perl/TestLib.pm
line 655.

I assume that must be coming from:

my @scan_result = scan_server_header('access/xlog_internal.h',

... which reaches:

open my $header_h, '<', "$stdout/$header_path" or die "$!"; <-- line 655

Not sure yet what is different in this environment or why you're
suddenly noticing on 13.17.  The logic has been there since 13.13 (ie
it was backpatched then).

Re: Potential ABI breakage in upcoming minor releases

On Thu, Nov 14, 2024 at 09:33:32PM +0100, Alvaro Herrera wrote:
> On 2024-Nov-14, Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > But timescale did crash:
> > 
> > So what is timescale doing differently?

> src/ts_catalog/catalog.c-extern TSDLLEXPORT ResultRelInfo *
> src/ts_catalog/catalog.c-ts_catalog_open_indexes(Relation heapRel)
> src/ts_catalog/catalog.c-{
> src/ts_catalog/catalog.c-   ResultRelInfo *resultRelInfo;
> src/ts_catalog/catalog.c-
> src/ts_catalog/catalog.c:   resultRelInfo = makeNode(ResultRelInfo);
> src/ts_catalog/catalog.c-   resultRelInfo->ri_RangeTableIndex = 0; /* dummy */
> src/ts_catalog/catalog.c-   resultRelInfo->ri_RelationDesc = heapRel;
> src/ts_catalog/catalog.c-   resultRelInfo->ri_TrigDesc = NULL; /* we don't 
> fire triggers */
> src/ts_catalog/catalog.c-
> src/ts_catalog/catalog.c-   ExecOpenIndices(resultRelInfo, false);

This is a copy of PostgreSQL's CatalogOpenIndexes().  Assuming timescaledb
uses it like PostgreSQL uses CatalogOpenIndexes(), it's fine.  Specifically,
CatalogOpenIndexes() is fine since PostgreSQL does not pass the ResultRelInfo
to ModifyTable.

> Oh, hmm, there's this bit which uses struct assignment into a stack
> element:

Yes, stack allocation of a ResultRelInfo sounds relevant to the crash.  It
qualifies as a "very unusual code pattern" in the postgr.es/c/e54a42a sense.

I'm hearing the only confirmed impact on non-assert builds is the need to
recompile timescaledb.  (It's unknown whether recompiling will suffice for
timescaledb.  For assert builds, six PGXN extensions need recompilation.)  I
don't see us issuing another set of back branch releases for the purpose of
making a v16.4-built timescaledb avoid a rebuild.  The target audience would
be someone who can get a new PostgreSQL build but can't get a new timescaledb
build.  That feels like a small audience.  What's missing in that analysis?

Going forward, for struct field additions, I plan to make my own patches more
ABI-breakage-averse than the postgr.es/c/e54a42a standard.

Re: Potential ABI breakage in upcoming minor releases

2024-11-14 Thread Christoph Berg

Re: Noah Misch
> The non-asserts build passed.  The asserts build failed with "+WARNING:
> problem in alloc set ExecutorState: detected write past chunk" throughout the
> diffs, but there were no crashes.  (Note that AGE "make installcheck" creates
> a temporary installation, unlike PostgreSQL "make installcheck".)  What might
> differ in how you tested?

The builds for sid (Debian unstable) are cassert-enabled and there the
tests threw a lot of these errors.

It didn't actually crash, true.

15:25:47  SELECT * FROM cypher('cypher_index', $$ MATCH(n) DETACH DELETE n $$) 
AS (a agtype);
15:25:47 +WARNING:  problem in alloc set ExecutorState: detected write past 
chunk end in block 0x55993d8ad710, chunk 0x55993d8ae320
15:25:47 +WARNING:  problem in alloc set ExecutorState: detected write past 
chunk end in block 0x55993d8ad710, chunk 0x55993d8aeab0
15:25:47 +WARNING:  problem in alloc set ExecutorState: detected write past 
chunk end in block 0x55993d88b730, chunk 0x55993d88d348
15:25:47 +WARNING:  problem in alloc set ExecutorState: detected write past 
chunk end in block 0x55993d8ad710, chunk 0x55993d8ae320
15:25:47 +WARNING:  problem in alloc set ExecutorState: detected write past 
chunk end in block 0x55993d8ad710, chunk 0x55993d8aeab0
15:25:47 +WARNING:  problem in alloc set ExecutorState: detected write past 
chunk end in block 0x55993d88b730, chunk 0x55993d88d348
15:25:47   a
15:25:47  ---
15:25:47  (0 rows)

https://pgdgbuild.dus.dg-i.net/job/postgresql-16-age-autopkgtest/17/architecture=amd64,distribution=sid/consoleFull

I did not test other distribution releases because the test run
stopped after this touchstone test.

Christoph

Re: Potential ABI breakage in upcoming minor releases

On 2024-Nov-14, Christoph Berg wrote:

> It didn't actually crash, true.

But timescale did crash:

https://pgdgbuild.dus.dg-i.net/job/timescaledb-autopkgtest/9/architecture=amd64,distribution=sid/console

11:59:51 @@ -34,6 +40,7 @@
11:59:51  CREATE INDEX ON _hyper_1_3_chunk(temp2);
11:59:51  -- INSERT only will not trigger vacuum on indexes for PG13.3+
11:59:51  UPDATE vacuum_test SET time = time + '1s'::interval, temp1 = 
random(), temp2 = random();
11:59:51 --- we should see two parallel workers for each chunk
11:59:51 -VACUUM (PARALLEL 3) vacuum_test;
11:59:51 -DROP TABLE vacuum_test;
11:59:51 +server closed the connection unexpectedly
11:59:51 +  This probably means the server terminated abnormally
11:59:51 +  before or while processing the request.
11:59:51 +connection to server was lost

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)

Re: Potential ABI breakage in upcoming minor releases

Christoph Berg  writes:
> Re: Noah Misch
>> The non-asserts build passed.  The asserts build failed with "+WARNING:
>> problem in alloc set ExecutorState: detected write past chunk" throughout the
>> diffs, but there were no crashes.  (Note that AGE "make installcheck" creates
>> a temporary installation, unlike PostgreSQL "make installcheck".)  What might
>> differ in how you tested?

> The builds for sid (Debian unstable) are cassert-enabled and there the
> tests threw a lot of these errors.
> It didn't actually crash, true.

Ah.  So perhaps this is a non-issue for production (non-assert)
builds?  That would change the urgency materially, IMO.

regards, tom lane

Re: Skip collecting decoded changes of already-aborted transactions

2024-11-14 Thread Masahiko Sawada

On Wed, Nov 13, 2024 at 8:23 PM Peter Smith  wrote:
>
> Hi Sawda-San,
>
> Here are some more review comments for the latest (accidentally called
> v6 again?) v6-0001 patch.

Thank you for reviewing the patch! Indeed, the previous version should
have been v7.

>
> ==
> contrib/test_decoding/sql/stats.sql
>
> 1.
> +-- Execute a transaction that is prepared and aborted. We detect that the
> +-- transaction is aborted before spilling changes, and then skip collecting
> +-- further changes.
>
> You had replied (referring to the above comment):
> I think we already mentioned the transaction is going to be spilled
> but actually not.
>
> ~
>
> Yes, spilling was already mentioned in the current comment but I felt
> it assumes the reader is expected to know details of why it was going
> to be spilled in the first place.

TBH we expect the reader, typically patch authors and reviewers, to
know it.ats1', NULL, NULL, 'skip-empty-xacts', '1');

> In other words, I thought the comment could include a bit more
> explanatory background info:
> (Also, it's not really "we detect" the abort -- it's the new postgres
> code of this patch that detects it.)
>
> SUGGESTION:
> Execute a transaction that is prepared but then aborted. The INSERT
> data exceeds the 'logical_decoding_work_mem limit' limit which
> normally would result in the transaction being spilled to disk, but
> now when Postgres detects the abort it skips the spilling and also
> skips collecting further changes.

But I'm concerned this explanation might be too detailed, and feel odd
to put this comment for the new added tests even though we're doing
similar tests in the same file. For instance, we have:

-- spilling the xact
BEGIN;
INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM
generate_series(1, 5000) g(i);
COMMIT;
SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_st

How about rewording it to the following? I think it's better to
explain why we use a prepared transaction here:

+-- The INSERT changes are large enough to be spilled but not, because the
+-- transaction is aborted. The logical decoding skips collecting further
+-- changes too. The transaction is prepared to make sure the decoding processes
+-- the aborted transaction.

>
> ~~~
>
> 2.
> +-- Check if the transaction is not spilled as it's already aborted.
> +SELECT count(*) FROM
> pg_logical_slot_get_changes('regression_slot_stats4_twophase', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +SELECT pg_stat_force_next_flush();
> +SELECT slot_name, spill_txns, spill_count FROM
> pg_stat_replication_slots WHERE slot_name =
> 'regression_slot_stats4_twophase';
> +
>
> /Check if the transaction is not spilled/Verify that the transaction
> was not spilled/

How about "Verify that the decoding doesn't spill already-aborted
transaction's changes."?

>
> ==
> .../replication/logical/reorderbuffer.c
>
> ReorderBufferResetTXN:
>
> 3.
>   /* Discard the changes that we just streamed */
> - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
> + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true);
>
> Looking at the calling code for ReorderBufferResetTXN it seems this
> function can called for streaming OR prepared. So is it OK here to be
> passing hardwired 'true' as the txn_streaming parameter, or should
> that be passing rbtxn_is_streamed(txn)?

I think it should pass 'true' because otherwise the transaction won't
be marked as streamed.

After more thoughts, I think the name of txn_streaming is the source
of confusion. The flag is actually used to decide whether or not the
given transaction can be marked as streamed, but should not indicate
whether the transaction is being streamed because this function can be
called while streaming. So I renamed it to 'mark_txn_streaming' and
updated the comment.

>
> ~~~
>
> ReorderBufferLargestStreamableTopTXN:
>
> 4.
>   if ((largest == NULL || txn->total_size > largest_size) &&
>   (txn->total_size > 0) && !(rbtxn_has_partial_change(txn)) &&
> - rbtxn_has_streamable_change(txn))
> + rbtxn_has_streamable_change(txn) && !(rbtxn_is_aborted(txn)))
>   {
>   largest = txn;
>   largest_size = txn->total_size;
>
> I felt that this increasingly complicated code would be a lot easier
> to understand if you just separate the conditions into: (a) the ones
> that filter out transaction you don't care about; (b) the ones that
> check for the largest size. For example,
>
> SUGGESTION:
> dlist_foreach(...)
> {
>   ...
>
>   /* Don't consider these kinds of transactions for eviction. */
>   if (rbtxn_has_partial_change(txn) ||
> !rbtxn_has_streamable_change(txn) || rbtxn_is_aborted(txn))
> continue;
>
>   /* Find the largest of the eviction candidates. */
>   if ((largest == NULL || txn->total_size > largest_size) &&
> (txn->total_size > 0))
>   {
> largest = txn;
> largest_size = txn->total_size;
>   }
> }

I like this idea.

>
> ~~~
>
> ReorderBufferCheckMemoryLimit:
>
> 5.
> + /* skip the transaction if al

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-11-14 Thread Masahiko Sawada

On Wed, Nov 13, 2024 at 11:19 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In <20241105.174328.1705956947135248653@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Tue, 05 Nov 2024 17:43:28 +0900 (JST),
>   Sutou Kouhei  wrote:
>
> >> I've further investigated the performance regression, and found out it
> >> might be relevant that the compiler doesn't inline the
> >> CopyFromTextLikeOneRow() function. It might be worth testing with
> >> pg_attribute_always_inline instead of 'inline' as below:
> >
> > Wow! Good catch!
> >
> > I've rebased on the current master and updated the v20 and
> > v21 patch sets with "pg_attribute_always_inline" not
> > "inline".
> >
> > The v22 patch set is for the v20 patch set.
> > (TO/FROM changes are in one commit.)
> >
> > The v23 patch set is for the v21 patch set.
> > (TO/FROM changes are separated for easy to merge only FROM
> > or TO part.)
>
> I've run benchmark on my machine that has "Intel(R) Core(TM)
> i7-3770 CPU @ 3.40GHz".
>
> Summary:
> * "pg_attribute_always_inline" is effective for the "COPY
>   FROM" part
> * "pg_attribute_always_inline" may not be needed for the
>   "COPY TO" part
>
>
> v20-result.pdf: This is the same result PDF attached in
> https://www.postgresql.org/message-id/20241008.173918.995935870630354246.kou%40clear-code.com
> . This is the base line for "pg_attribute_always_inline"
> change.
>
> v22-result.pdf: This is a new result PDF for the v22 patch
> set.
>
> COPY FROM:
>
> 0001: The v22 patch set is slower than HEAD. This just
> introduces "routine" abstraction. It increases overhead. So
> this is expected.
>
> 0002-0005: The v22 patch set is faster than HEAD for all
> cases. The v20 patch set is slower than HEAD for smaller
> data. This shows that "pg_attribute_always_inline" for the
> "COPY FROM" part is effective on my machine too.
>
>
> COPY TO:
>
> 0001: The v22 patch set is slower than HEAD. This is
> as expected for the same reason as COPY FROM.
>
> 0002-0004: The v22 patch set is slower than HEAD. (The v20
> patch set is faster than HEAD.) This is not expected.
>
> 0005: The v22 patch set is faster than HEAD. This is
> expected. But 0005 just exports some functions. It doesn't
> change existing logic. So it's strange...
>
> This shows "pg_attribute_always_inline" is needless for the
> "COPY TO" part.
>
>
> I also tried the v24 patch set:
> * The "COPY FROM" part is same as the v22 patch set
>   ("pg_attribute_always_inline" is used.)
> * The "COPY TO" part is same as the v20 patch set
>   ("pg_attribute_always_inline" is NOT used.)
>
>
> (I think that the v24 patch set isn't useful for others. So
> I don't share it here. If you're interested in it, I'll
> share it here.)
>
> v24-result.pdf:
>
> COPY FROM: The same trend as the v22 patch set result. It's
> expected because the "COPY FROM" part is the same as the v22
> patch set.
>
> COPY TO: The v24 patch set is faster than the v22 patch set
> but the v24 patch set isn't same trend as the v20 patch
> set. This is not expected because the "COPY TO" part is the
> same as the v20 patch set.
>
>
> Anyway, the 0005 "COPY TO" parts are always faster than
> HEAD. So we can use either "pg_attribute_always_inline" or
> "inline".
>
>
> Summary:
> * "pg_attribute_always_inline" is effective for the "COPY
>   FROM" part
> * "pg_attribute_always_inline" may not be needed for the
>   "COPY TO" part
>
>
> Can we proceed this proposal with these results? Or should
> we wait for more benchmark results?

Thank you for sharing the benchmark test results! I think these
results are good for us to proceed. I'll closely look at COPY TO
results and review v22 patch sets.

Regards,


-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Potential ABI breakage in upcoming minor releases

2024-11-14 Thread Gregory Smith

On Thu, Nov 14, 2024 at 5:41 PM Noah Misch  wrote:

> I'm hearing the only confirmed impact on non-assert builds is the need to
> recompile timescaledb.  (It's unknown whether recompiling will suffice for
> timescaledb.  For assert builds, six PGXN extensions need recompilation.)
>

That matches what our build and test teams are seeing.

We dug into the two lines of impacted Citus code, they are just touching
columnar metadata.  We dragged Marco into a late night session to double
check that with the Citus columnar regression tests and look for red flags
in the code.  In an assert Citus built against 16.4 running against
PostgreSQL 16.5, he hit the assert warnings, but the tests pass and there's
no signs or suspicion of a functional impact:

CREATE TABLE columnar_table_1 (a int) USING columnar;
INSERT INTO columnar_table_1 VALUES (1);
+WARNING:  problem in alloc set Stripe Write Memory Context: detected write
past chunk end in block 0x563ee43a4f10, chunk 0x563ee43a6240
+WARNING:  problem in alloc set Stripe Write Memory Context: detected write
past chunk end in block 0x563ee4369bb0, chunk 0x563ee436acb0
+WARNING:  problem in alloc set Stripe Write Memory Context: detected write
past chunk end in block 0x563ee4369bb0, chunk 0x563ee436b3c8

Thanks to everyone who's jumped in to investigate here.  With the PL/Perl
CVE at an 8.8, sorting out how to get that fix to everyone and navigate the
breakage is very important.

--
Greg Smith, Crunchy Data
Director of Open Source Strategy

RE: Improve the error message for logical replication of regular column to generated column.

2024-11-14 Thread Hayato Kuroda (Fujitsu)

Dear Shubham,

Thanks for creating a patch! I checked yours and I have comments.

01.
```
+   StringInfoData gencolsattsbuf;
+   int generatedatts = 0;
+
+   initStringInfo(&gencolsattsbuf);
```

gencolsattsbuf is initialized at the beginning but won't be free'd.

But I prefer the Peter's suggestion - you can combine the reporting stuff to
logicalrep_report_missing_attrs and rename the function. This is clearer than
directly adding declarations and ereport() in logicalrep_rel_open().

02.

```
+   /*
+* Check if the subscription table 
generated column has
+* same name as a non-generated column 
in the
+* corresponding publication table.
+*/
```

I don't think this comment is correct. The error can be reported even when
both publisher and subscriber has the generated column, right?
Also, I feel comments can be located atop "if".

03.
I feel if you combine the reporting stuff with logicalrep_report_missing_attrs, 
some
of changes are not needed anymore. You can just add comment in 
logicalrep_rel_open
and modify the message in logicalrep_report_missing_attrs.


[1]: 
https://www.postgresql.org/message-id/CAHut%2BPumbPEqk6v2XVjT7vKWKzQNBjMHXByWJ5%3DFmjEfk1v_pQ%40mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Re: altering a column's collation leaves an invalid foreign key

On 14.11.24 03:21, jian he wrote:

On Wed, Nov 13, 2024 at 8:56 PM jian he  wrote:

https://stackoverflow.com/questions/14921668/difference-between-restrict-and-no-action
mentioned about the difference between "no action" and "restrict".

RI_FKey_restrict_upd comments also says:

  * The SQL standard intends that this referential action occur exactly when
  * the update is performed, rather than after.  This appears to be
  * the only difference between "NO ACTION" and "RESTRICT".  In Postgres
  * we still implement this as an AFTER trigger, but it's non-deferrable.

DROP TABLE IF EXISTS fktable, pktable;
CREATE TABLE pktable (x text COLLATE case_insensitive PRIMARY KEY);
CREATE TABLE fktable (x text collate case_insensitive REFERENCES
pktable on update restrict on delete restrict);
INSERT INTO pktable VALUES ('A'), ('Å');
INSERT INTO fktable VALUES ('a');
update pktable set x = 'a' where x = 'A';

my previous email was too verbose.
my point is this: given PK, FK both are case_insensitive. and
PK side values are all being referenced.

case like `update pktable set x = 'a' where x = 'A';`
Do we consider PK side value changed?
it seems not mentioned anywhere.

I think the PostgreSQL documentation is selling the difference between 
NO ACTION and RESTRICT a bit short.  The SQL standard says this:

> ON UPDATE RESTRICT: any change to a referenced column in the
> referenced table is prohibited if there is a matching row.

and

> If a non-null value of a referenced column RC in the referenced table
> is updated to a value that is distinct from the current value of RC,
> then, [...] If UR specifies RESTRICT and there exists some matching
> row, then an exception con- dition is raised [...]

So this is exactly what is happening here.

You can also reproduce this with things that are not strings with 
collations.  You just need to find a type that has values that are 
"equal" but "distinct", which is not common, but it exists, for example 
0.0 and -0.0 in floats.  Example:

create table parent (val float8 primary key);
insert into parent values ('0.0');

create table child (id int, val float8 references parent (val));

insert into child values (1, '0.0');
insert into child values (2, '-0.0');

update parent set val = '-0.0';  -- ok with NO ACTION

but

create table child (id int, val float8 references parent (val) on 
update restrict);

insert into child values (1, '0.0');
insert into child values (2, '-0.0');

update parent set val = '-0.0';  -- error with RESTRICT

So this is a meaningful difference.

There is also a bug here in that the update in the case of NO ACTION 
doesn't actually run, because it thinks the values are the same and the 
update can be skipped.

I think there is room for improvement here, in the documentation, the 
tests, and maybe in the code.  And while these are thematically related 
to this thread, they are actually separate issues.

I propose that I go ahead with committing the v7 patch (with your typo 
fixes) and then we continue discussing these other issues afterwards in 
a separate thread.

Given this overall picture, I'm also less and less inclined to do any 
backpatching bug fixing here.  The status of this feature combination in 
the backbranches is just "don't push it too hard".  Maybe someone has 
some feasible mitigation ideas, but I haven't seen any yet.

Re: 2024-11-14 release announcement draft

On Tue, Nov 12, 2024 at 12:28 PM Jonathan S. Katz  wrote:
>
> Hi,
>
> Please review the draft of the release announcement for the 2024-11-14
> release. Please provide any feedback by 2024-11-14 12:00 UTC.
>

someone reminded me.
i am wondering do we need include the following two items:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=075acdd93388c080c0fb0aca5723144ad7a56dac
https://git.postgresql.org/cgit/postgresql.git/commit/?id=90fe6251c816547b9cb4e1895dd8e6620bae94e1

Re: BF mamba failure

On Thu, Oct 17, 2024 at 01:24:50PM +0900, Michael Paquier wrote:
>> Yeah, FWIW, we would be going from 32 bytes:
>>/* total size (bytes):   32 */
>> 
>> to 40 bytes (with the patch applied):
>>/* total size (bytes):   40 */
>> 
>> Due to the padding, that's a 8 bytes increase while we're adding "only" 4 
>> bytes.
> 
> I have spent some time digging into all the original pgstat threads 
> dealing with the shmem implementation or dshash, and I'm not really
> seeing anybody stressing on the size of the individual hash entries.
> This stuff was already wasting space with padding originally, so
> perhaps we're stressing too much here?  Would anybody like to comment?

I've been going through this patch again, and the failures that could
be seen because of such failures, like standbys failing in an
unpredictible way is not cool, so I am planning to apply the attached
down to 15 now that the release has colled down.  At the end I am not
really stressing about this addition in the structure for the sake of
making the stats entries safe to handle.

I did not like much the name "agecount" though, used to cross-check
how many times an entry is reused in shared memory and in the local
copy we keep in a process, so I've renamed it to "generation".
 --
Michael
From abd8b567f3cd97ca4e0282ce7657f64807f8c008 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 14 Nov 2024 16:53:58 +0900
Subject: [PATCH v2] Introduce "generation" counter for pgstats entries

This fixes a set of race conditions with cumulative statistics where a
shared stats entry could be dropped while it should still be valid
when it gets reused.  This can happen for the following case, causing
the same hash key to be used:
- replication slots that compute internally an index number.
- other object types, with a wraparound involved.
- As of PostgreSQL 18, custom pgstats kinds.

The "generation" is tracked in the shared entries, initialized at 0 when
an entry is created and incremented when the same entry is reused, to
avoid concurrent turbulences on drop because of other backends still
holding a reference to it.
---
 src/include/utils/pgstat_internal.h   | 14 
 src/backend/utils/activity/pgstat_shmem.c | 40 ---
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 61b2e1f96b..45f5901d45 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -94,6 +94,14 @@ typedef struct PgStatShared_HashEntry
 	 */
 	pg_atomic_uint32 refcount;
 
+	/*
+	 * Counter tracking the number of times the entry has been reused.
+	 *
+	 * Set to 0 when the entry is created, and incremented by one each time
+	 * the shared entry is reinitialized with pgstat_reinit_entry().
+	 */
+	pg_atomic_uint32 generation;
+
 	/*
 	 * Pointer to shared stats. The stats entry always starts with
 	 * PgStatShared_Common, embedded in a larger struct containing the
@@ -133,6 +141,12 @@ typedef struct PgStat_EntryRef
 	 */
 	PgStatShared_Common *shared_stats;
 
+	/*
+	 * Copy of PgStatShared_HashEntry->generation, keeping locally track of
+	 * the shared stats entry "generation" retrieved (number of times reused).
+	 */
+	uint32		generation;
+
 	/*
 	 * Pending statistics data that will need to be flushed to shared memory
 	 * stats eventually. Each stats kind utilizing pending data defines what
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index c1b7ff76b1..c8ee0bd0b0 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -304,6 +304,11 @@ pgstat_init_entry(PgStat_Kind kind,
 	 * further if a longer lived reference is needed.
 	 */
 	pg_atomic_init_u32(&shhashent->refcount, 1);
+
+	/*
+	 * Initialize generation to 0, as freshly initialized.
+	 */
+	pg_atomic_init_u32(&shhashent->generation, 0);
 	shhashent->dropped = false;
 
 	chunk = dsa_allocate0(pgStatLocal.dsa, pgstat_get_kind_info(kind)->shared_size);
@@ -327,6 +332,12 @@ pgstat_reinit_entry(PgStat_Kind kind, PgStatShared_HashEntry *shhashent)
 
 	/* mark as not dropped anymore */
 	pg_atomic_fetch_add_u32(&shhashent->refcount, 1);
+
+	/*
+	 * Increment generation, to let any backend with local references know
+	 * that what they point to is outdated.
+	 */
+	pg_atomic_fetch_add_u32(&shhashent->generation, 1);
 	shhashent->dropped = false;
 
 	/* reinitialize content */
@@ -367,6 +378,7 @@ pgstat_acquire_entry_ref(PgStat_EntryRef *entry_ref,
 
 	entry_ref->shared_stats = shheader;
 	entry_ref->shared_entry = shhashent;
+	entry_ref->generation = pg_atomic_read_u32(&shhashent->generation);
 }
 
 /*
@@ -532,7 +544,8 @@ pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, uint64 objid, bool create,
 			 * case are replication slot stats, where a new slot can be
 			 * created with the same index just after dropping. But oi

Re: doc fail about ALTER TABLE ATTACH re. NO INHERIT

2024-11-14 Thread Amit Langote

On Wed, Nov 13, 2024 at 10:49 PM Alvaro Herrera  wrote:
> On 2024-Nov-13, Amit Langote wrote:
>
> > I rebased my patch over 14e87ffa5c54 and noticed that NOT NULL
> > constraints can also (or not) be marked NO INHERIT.  I think we should
> > allow NO INHERIT NOT NULL constraints on leaf partitions just like
> > CHECK constraints, so changed AddRelationNotNullConstraints() that way
> > and added some tests.
>
> OK, looks good.

Thanks.  I'll hold off pushing until we have some clarity on whether
the behavior should be identical for CHECK and NOT NULL constraints.

For now, I have removed the changes in my patch pertaining to NOT NULL
constraints.

> > However, I noticed that there is no clean way to do that for the following 
> > case:
> >
> > ALTER TABLE leaf_partition ADD CONSTRAINT col_nn_ni NOT NULL col NO INHERIT;
>
> Sorry, I didn't understand what's the initial state.  Does the
> constraint already exist here or not?

Sorry, here's the full example.  Note I'd changed both
AddRelationNotNullConstraints() and AdjustNotNullInheritance() to not
throw an error *if* the table is a leaf partition when the NO INHERIT
of an existing constraint doesn't match that of the new constraint.

create table p (a int not null) partition by list (a);
create table p1 partition of p for values in (1);
alter table p1 add constraint a_nn_ni not null a no inherit;

\d+ p1
   Table "public.p1"
 Column |  Type   | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
+-+---+--+-+-+-+--+-
 a  | integer |   | not null | | plain   |
|  |
Partition of: p FOR VALUES IN (1)
Partition constraint: ((a IS NOT NULL) AND (a = 1))
Not-null constraints:
"p_a_not_null" NOT NULL "a" (local, inherited)
Access method: heap

-- noop
alter table p1 add constraint a_nn_ni not null a no inherit;
alter table p1 add constraint a_nn_ni not null a no inherit;

\d+ p1
   Table "public.p1"
 Column |  Type   | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
+-+---+--+-+-+-+--+-
 a  | integer |   | not null | | plain   |
|  |
Partition of: p FOR VALUES IN (1)
Partition constraint: ((a IS NOT NULL) AND (a = 1))
Not-null constraints:
"p_a_not_null" NOT NULL "a" (local, inherited)
Access method: heap

The state of the existing inherited constraint is not valid and the
a_nn_ni is never created.

So, it seems that allowing NO INHERIT NOT NULL constraints of leaf
partitions to be merged with the inherited one is not as
straightforward as it is for CHECK constraints.

> > If I change the new function AdjustNotNullInheritance() added by your
> > commit to not throw an error for leaf partitions, the above constraint
> > (col_nn_ni) is not stored at all, because the function returns true in
> > that case, which means the caller does nothing with the constraint.  I
> > am not sure what the right thing to do would be.  If we return false,
> > the caller will store the constraint the first time around, but
> > repeating the command will again do nothing, not even prevent the
> > addition of a duplicate constraint.
>
> Do you mean if we return false, it allows two not-null constraints for
> the same column?  That would absolutely be a bug.

I don't think there is any such bug at the moment, because if
AdjustNotNullInheritance() finds an existing constraint, it always
returns true provided the new constraint does not cause the error due
to its NO INHERIT property not matching the existing one's.

> I think:
> * if a leaf partition already has an inherited not-null constraint
>   from its parent and we want to add another one, we should:
>   - if the one being added is NO INHERIT, then throw an error, because
> we cannot merge them

Hmm, we'll be doing something else for CHECK constraints if we apply my patch:

create table p (a int not null, constraint check_a check (a > 0))
partition by list (a);
create table p1 partition of p (constraint check_a check (a > 0) no
inherit) for values in (1);
NOTICE:  merging constraint "check_a" with inherited definition

\d p1
 Table "public.p1"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   | not null |
Partition of: p FOR VALUES IN (1)
Check constraints:
"check_a" CHECK (a > 0) NO INHERIT

I thought we were fine with allowing merging of such child
constraints, because leaf partitions can't have children to pass them
onto, so the NO INHERIT clause is essentially pointless.

>   - if the one being added is not NO INHERIT, then both constraints
> would have the same state and so we silently do nothing.

Maybe we should emit some kind of N

Re: On non-Windows, hard depend on uselocale(3)

2024-11-14 Thread Heikki Linnakangas


On 14/11/2024 09:48, Thomas Munro wrote:

On Thu, Aug 29, 2024 at 6:50 AM Peter Eisentraut  wrote:

The error handling with pg_ensure_c_locale(), that's the sort of thing
I'm afraid will be hard to test or even know how it will behave.  And it
creates this weird coupling between pgtypeslib and ecpglib that you
mentioned earlier.  And if there are other users of PG_C_LOCALE in the
future, there will be similar questions about the proper initialization
and error handling sequence.


Ack.  The coupling does become at least less weird (currently it must
be capable of giving the wrong answers reliably if called directly I
think, no?) and weaker, but I acknowledge that it's still non-ideal
that out-of-memory would be handled nicely only if you used ecpg
first, and subtle misbehaviour would ensure otherwise, and future
users face the same sort of problem unless they design in a reasonable
initialisation place that could check pg_ensure_c_locale() nicely.
Classic retro-fitting problem, though.


Hmm, so should we add calls to pg_ensure_c_locale() in pgtypeslib too, 
before each call to strtod_l()?


Looking at the callers of strtod() in ecpg, all but one of them actually 
readily convert the returned value to integer, with some multiplication 
or division with constants. It would be nice to replace those with a 
function that would avoid going through double in the first place. That 
still leaves one caller in numericvar_to_double() which really wants a 
double, though


Overall, the patch looks good to me.

--
Heikki Linnakangas
Neon (https://neon.tech)

Re: BF mamba failure

Hi,

On Thu, Nov 14, 2024 at 04:55:23PM +0900, Michael Paquier wrote:
> I've been going through this patch again, and the failures that could
> be seen because of such failures, like standbys failing in an
> unpredictible way is not cool, so I am planning to apply the attached
> down to 15 now that the release has colled down.  At the end I am not
> really stressing about this addition in the structure for the sake of
> making the stats entries safe to handle.

I don't think the addition is a concern too.

> I did not like much the name "agecount" though, used to cross-check
> how many times an entry is reused in shared memory and in the local
> copy we keep in a process, so I've renamed it to "generation".

"generation" sounds more appropriate to me too.

The approach to avoid the error sounds reasonable to me.

Just 2 comments about the patch:

=== 1

Maybe use "generation" instead of generation in the comments (where it's not 
done,
some comments do it already).

=== 2

We could think about adding a test. That should be doable with replication slots
or custom pgstats kinds and probably injection points. But I'm not sure that's
worth it, as the code in the patch looks "simple" enough. Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: define pg_structiszero(addr, s, r)

Hi,

On Thu, Nov 14, 2024 at 09:27:06AM +0900, Michael Paquier wrote:
> Makes sense to me to just do that, with a first < 8 loop, and a second
> for the 8~63 range. 

Thanks for looking at it! 

> There is also a "cant'" in the last size_t check.  Simple typo.

Please find attached v12, with more comments and comments changes to explain
the multiple cases (for safety) and phases (for efficiency).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 833f4c223717ce71c5b59d27ce54383b4fc237ce Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 8 Nov 2024 14:19:59 +0900
Subject: [PATCH v12 1/2] Optimize pg_memory_is_all_zeros()

pg_memory_is_all_zeros() is currently doing byte per byte comparison and so
could lead to performance regression or penalties when multi bytes comparison
could be done instead.

Let's provide an optimized version that divides the checks into multiple cases
for safety reason and multiple phases for efficiency:

Case 1: len < 8 bytes, then byte per byte comparison.

Case 2: len in the 8-63 bytes range:
  - Phase 1: Byte by byte comparison, until the pointer is aligned.
  - Phase 2: size_t comparisons, with aligned pointers, up to the last
 location possible.
  - Phase 3: Byte by byte comparison, until the end location.

Case 3: len >= 64 bytes, same as case 2 except that an additional phase is
is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t)
comparisons using bitwise OR, to encourage compilers to use SIMD
instructions if available, up to the last aligned location possible.

Case 1 and Case 2 are mandatory to ensure that we won't read beyond the
memory area.

Code mainly suggested by David Rowley.
---
 src/include/utils/memutils.h | 123 ++-
 1 file changed, 120 insertions(+), 3 deletions(-)
 100.0% src/include/utils/

diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 3590c8bad9..0808e91b77 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -190,19 +190,136 @@ extern MemoryContext BumpContextCreate(MemoryContext parent,
 #define SLAB_LARGE_BLOCK_SIZE		(8 * 1024 * 1024)
 
 /*
+ * pg_memory_is_all_zeros
+ *
  * Test if a memory region starting at "ptr" and of size "len" is full of
  * zeroes.
+ *
+ * The test is divided into multiple cases for safety reason and multiple phases
+ * for efficiency.
+ *
+ * Case 1: len < 8 bytes, then byte per byte comparison.
+ * Case 2: len in the 8-63 bytes range:
+ *   - Phase 1: Byte by byte comparison, until the pointer is aligned.
+ *   - Phase 2: size_t comparisons, with aligned pointers, up to the last
+ *  location possible.
+ *   - Phase 3: Byte by byte comparison, until the end location.
+ * Case 3: len >= 64 bytes, same as case 2 except that an additional phase is
+ * is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t)
+ * comparisons using bitwise OR, to encourage compilers to use SIMD
+ * instructions if available, up to the last aligned location possible.
+ *
+ * Case 1 and Case 2 are mandatory to ensure that we won't read beyond the
+ * memory area.
+ *
+ * Caller must ensure that "ptr" is not NULL.
  */
 static inline bool
 pg_memory_is_all_zeros(const void *ptr, size_t len)
 {
-	const char *p = (const char *) ptr;
+	const unsigned char *p = (const unsigned char *) ptr;
+	const unsigned char *end = &p[len];
+	const unsigned char *aligned_end = (const unsigned char *)
+		((uintptr_t) end & (~(sizeof(size_t) - 1)));
+
+	/* len < 8 bytes case */
+	if (len < sizeof(size_t))
+	{
+		while (p < end)
+		{
+			if (*p++ != 0)
+return false;
+		}
+		return true;
+	}
+
+	/* len in the 8-63 bytes range case */
+	if (len < sizeof(size_t) * 8)
+	{
+		/* Compare bytes until the pointer "p" is aligned */
+		while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0)
+		{
+			if (p == end)
+return true;
+			if (*p++ != 0)
+return false;
+		}
+
+		/*
+		 * Compare remaining size_t-aligned chunks.
+		 *
+		 * There is no risk to read beyond the memory area, as aligned_end
+		 * can't be > end as we are in the len 8-63 bytes range case (so len
+		 * >= 8).
+		 */
+		for (; p < aligned_end; p += sizeof(size_t))
+		{
+			if (*(size_t *) p != 0)
+return false;
+		}
+
+		/* Compare remaining bytes until the end */
+		while (p < end)
+		{
+			if (*p++ != 0)
+return false;
+		}
+		return true;
+	}
+
+	/* len >= 64 bytes case */
 
-	for (size_t i = 0; i < len; i++)
+	/* Compare bytes until the pointer "p" is aligned */
+	while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0)
 	{
-		if (p[i] != 0)
+		if (p == end)
+			return true;
+
+		if (*p++ != 0)
+			return false;
+	}
+
+	/*
+	 * Compare 8 * sizeof(size_t) chunks at once.
+	 *
+	 * For performance reasons, we manually unroll this loop and purposefully
+	 * use bitwise-ORs to combine each comparison.  This

Re: define pg_structiszero(addr, s, r)

2024-11-14 Thread Ranier Vilela

Em qui., 14 de nov. de 2024 às 07:09, Bertrand Drouvot <
bertranddrouvot...@gmail.com> escreveu:

> Hi,
>
> On Thu, Nov 14, 2024 at 09:27:06AM +0900, Michael Paquier wrote:
> > Makes sense to me to just do that, with a first < 8 loop, and a second
> > for the 8~63 range.
>
> Thanks for looking at it!
>
> > There is also a "cant'" in the last size_t check.  Simple typo.
>
> Please find attached v12, with more comments and comments changes to
> explain
> the multiple cases (for safety) and phases (for efficiency).
>
Is it worth mentioning that pg_memory_is_all_zeros does not work correctly
on 32-bit systems?

(63 < (size_t) * 8) /* 63 - 32*/

Or do we adjust magic constants according to 32/64 bit?

best regards,
Ranier Vilela

Re: altering a column's collation leaves an invalid foreign key

On Thu, Nov 14, 2024 at 4:04 PM Peter Eisentraut  wrote:

> I propose that I go ahead with committing the v7 patch (with your typo
> fixes) and then we continue discussing these other issues afterwards in
> a separate thread.
>
looks good to me.

but in create_table.sgml

  In addition, when the data in the referenced columns is changed,
  certain actions are performed on the data in this table's
  columns.

I actually want to add a sentence like:

  If the references columns collation is indeterministic, we use
bitwise comparison to
  check if the referenced columns data is being changed.

searching the internet found out:
— ON UPDATE RESTRICT: any change to a referenced column in the
referenced table is prohibited if there
is a matching row.
— ON UPDATE NO ACTION (the default): there is no referential update
action; the referential constraint
only specifies a constraint check.
NOTE 53 — Even if constraint checking is not deferred, ON UPDATE
RESTRICT is a stricter condition than ON UPDATE NO
ACTION. ON UPDATE RESTRICT prohibits an update to a particular row if
there are any matching rows; ON UPDATE NO
ACTION does not perform its constraint check until the entire set of
rows to be updated has been processed.

looking at NOTE 53, overall i think i get it.

Re: define pg_structiszero(addr, s, r)

Hi,

On Thu, Nov 14, 2024 at 08:22:23AM -0300, Ranier Vilela wrote:
> Em qui., 14 de nov. de 2024 às 07:09, Bertrand Drouvot <
> bertranddrouvot...@gmail.com> escreveu:
> 
> > Hi,
> >
> > On Thu, Nov 14, 2024 at 09:27:06AM +0900, Michael Paquier wrote:
> > > Makes sense to me to just do that, with a first < 8 loop, and a second
> > > for the 8~63 range.
> >
> > Thanks for looking at it!
> >
> > > There is also a "cant'" in the last size_t check.  Simple typo.
> >
> > Please find attached v12, with more comments and comments changes to
> > explain
> > the multiple cases (for safety) and phases (for efficiency).
> >
> Is it worth mentioning that pg_memory_is_all_zeros does not work correctly
> on 32-bit systems?
> 
> (63 < (size_t) * 8) /* 63 - 32*/

I think that the code is fully portable on 32-bit systems as it's using size_t
in all the places.

I agree that the comments are "64-bit" focused though, but I don't think that's
an issue (as I think it's already the case in multiple places in the core code).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Add html-serve target to autotools and meson

2024-11-14 Thread Ashutosh Bapat

On Tue, Nov 12, 2024 at 4:00 PM Ashutosh Bapat
 wrote:
>
> On Mon, Nov 11, 2024 at 10:41 PM Jacob Champion
>  wrote:
> >
> > On Mon, Nov 11, 2024 at 7:50 AM Andres Freund  wrote:
> > > If you just want that we could print out a file:// url or even open it? 
> > > Rather
> > > than an http server?
> >
> > +1. The coverage-html recipe is a nice example of this; just
> > control-click the file: link and away you go.
>
> WFM.

I noticed that the documentation served by the postgresql.org site
allows searching in the documentation, whereas that functionality is
not available if we open HTML pages from docs directory. Would
html-server provide that functionality?

-- 
Best Wishes,
Ashutosh Bapat

Re: define pg_structiszero(addr, s, r)

2024-11-14 Thread Ranier Vilela

Em qui., 14 de nov. de 2024 às 08:58, Bertrand Drouvot <
bertranddrouvot...@gmail.com> escreveu:

> Hi,
>
> On Thu, Nov 14, 2024 at 08:22:23AM -0300, Ranier Vilela wrote:
> > Em qui., 14 de nov. de 2024 ąs 07:09, Bertrand Drouvot <
> > bertranddrouvot...@gmail.com> escreveu:
> >
> > > Hi,
> > >
> > > On Thu, Nov 14, 2024 at 09:27:06AM +0900, Michael Paquier wrote:
> > > > Makes sense to me to just do that, with a first < 8 loop, and a
> second
> > > > for the 8~63 range.
> > >
> > > Thanks for looking at it!
> > >
> > > > There is also a "cant'" in the last size_t check.  Simple typo.
> > >
> > > Please find attached v12, with more comments and comments changes to
> > > explain
> > > the multiple cases (for safety) and phases (for efficiency).
> > >
> > Is it worth mentioning that pg_memory_is_all_zeros does not work
> correctly
> > on 32-bit systems?
> >
> > (63 < (size_t) * 8) /* 63 - 32*/
>
> I think that the code is fully portable on 32-bit systems as it's using
> size_t
> in all the places.
>
Maybe I'm doing something wrong.
But I'm testing in 32-bit, with the size set to 63, with v12 and I'm seeing
the SIMD loop execute.
Because the test
if (len < sizeof(size_t) * 8) // 8-63 bytes
failed.

I expected that with size 63, it would be solved by case 2, or am I wrong?

best regards,
Ranier Vilela

PS. Windows 11 64 bits
msvc 32 bits 2022

Re: commitfest.postgresql.org Specify thread msgid does not work for pgsql-bugs(at)lists(dot)postgresql(dot)org

2024-11-14 Thread Jelte Fennema-Nio

On Mon, 21 Oct 2024 at 11:49, Alvaro Herrera  wrote:
> I guess it would be helpful if the commitfest app can read a URL to a
> mailing list post and reverse-engineer the message-id from it ... maybe
> you'd have to propose a patch to this:
> https://git.postgresql.org/gitweb/?p=pgcommitfest2.git;a=blob;f=pgcommitfest/commitfest/views.py;h=2c21cd3d623d86a6fc8a0f639ab127ece520eacf;hb=refs/heads/master#l387

I agree that would be helpful. Feel free to send a patch to me over
email, or create a PR on this repo:
https://github.com/JelteF/commitfest

Re: Add html-serve target to autotools and meson

2024-11-14 Thread Daniel Gustafsson

> On 14 Nov 2024, at 13:11, Ashutosh Bapat  wrote:

>>> On Mon, Nov 11, 2024 at 7:50 AM Andres Freund  wrote:
 If you just want that we could print out a file:// url or even open it? 
 Rather
 than an http server?

+1

> I noticed that the documentation served by the postgresql.org site
> allows searching in the documentation, whereas that functionality is
> not available if we open HTML pages from docs directory. Would
> html-server provide that functionality?

No, doc searching on the site is implemented with FTS in the pgweb repository
and would not be present in the proposal here.

--
Daniel Gustafsson

Re: pg_rewind WAL segments deletion pitfall

2024-11-14 Thread Alexander Kukushkin

Hi Alvaro,

The commit message looks good to me, except maybe using a "master" word,
which I would suggest to replace with "primary".

And a small nitpick:
+/*
+ * Initialize a hash table to store WAL file names that must be kept.
+ */
+void
+keepwal_init(void)
+{
+   /*
+* This hash table is empty in the vast majority of cases, so set an
+* initial size of 0.
+*/
+   keepwal = keepwal_create(0, NULL);
+}

I don't think that the hash table will be empty. It needs to hold all WAL
filenames starting from the last checkpoint and up to divergent point.
On loaded clusters it could be hundreds and thousands of files.

On Tue, 12 Nov 2024 at 20:18, Alvaro Herrera 
wrote:

> Hello
>
> After reading the whole thread a couple of times to make sure I
> understood the problem correctly, I think the approach in the v10 patch
> is a reasonable one.  I agree that it's better for maintainability to
> keep a separate hash table.  I made some cosmetic adjustments -- didn't
> find any fault in the code.  I also verified that the test is hitting
> the problematic case.
>
> So here's v11, which I'll try to get out tomorrow.
>
> I also assessed back-patching this.  There's some conflicts, but nothing
> serious, back to 14.  Unfortunately, 13 lacks requisite infrastructure,
> so I think it'll have to stay as it is.  (12 is dead.)
>
> Can you please verify that my explanation in the commit message is
> sound?
>
> Thanks
>
> --
> Álvaro Herrera   48°01'N 7°57'E  —
> https://www.EnterpriseDB.com/
> "Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)
>


Regards,
--
Alexander Kukushkin

Re: Potential ABI breakage in upcoming minor releases

On Thu, Nov 14, 2024 at 04:18:02PM +0530, Pavan Deolasee wrote:
> Commit 51ff46de29f67d73549b2858f57e77ada8513369 (backported all the way
> back to v12) added a new member to `ResultRelInfo` struct. This can
> potentially cause ABI breakage for the extensions that allocate the struct
> and pass it down to the PG code. The previously built extensions may
> allocate a shorter struct, while the new PG code would expect a larger
> struct, thus overwriting some memory unintentionally.
> 
> A better approach may have been what Tom did in
> 8cd190e13a22dab12e86f7f1b59de6b9b128c784, but I understand it might be too
> late to change this since the releases are already tagged. Nevertheless, I
> thought of bringing it up if others have different views.

The release is now announced.  We could issue new releases (likely next
Thursday) if there's a problem that warrants it.

Based on a grep of PGXN code, here are some or all of the modules that react
to sizeof(ResultRelInfo):

$ grepx -r 'lloc.*ResultRelInfo' | tee /tmp/1 | sed 's/-[^:]*/:/'|sort -u
apacheage::resultRelInfo = palloc(sizeof(ResultRelInfo));
pg_pathman::ResultRelInfo  *resultRelInfo = (ResultRelInfo *) 
palloc(sizeof(ResultRelInfo));
$ grepx -r 'Node.*ResultRelInfo' | tee /tmp/2 | sed 's/-[^:]*/:/'|sort -u
apacheage::cypher_node->resultRelInfo = makeNode(ResultRelInfo);
apacheage::cypher_node->resultRelInfo = makeNode(ResultRelInfo);
citus:: resultRelInfo = makeNode(ResultRelInfo);
citus:: ResultRelInfo *resultRelInfo = makeNode(ResultRelInfo);
pg_bulkload::   checker->resultRelInfo = makeNode(ResultRelInfo);
pg_bulkload::   self->relinfo = makeNode(ResultRelInfo);
pg_pathman::child_result_rel_info = makeNode(ResultRelInfo);
pg_pathman::parent_result_rel = makeNode(ResultRelInfo);
pg_pathman::parent_rri = makeNode(ResultRelInfo);
pg_pathman::part_result_rel_info = makeNode(ResultRelInfo);
vops::  resultRelInfo = makeNode(ResultRelInfo);

I don't know whether we should make a new release, amend the release
announcement to call for extension rebuilds, or just stop here.
https://wiki.postgresql.org/wiki/Committing_checklist#Maintaining_ABI_compatibility_while_backpatching
mentions the problem, but neither it nor the new standard at
postgr.es/c/e54a42a say how reticent we'll be to add to the end of a struct on
which extensions do sizeof.

The postgr.es/c/e54a42a standard would have us stop here.  But I'm open to
treating the standard as mistaken and changing things.

Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

2024-11-14 Thread Shlok Kyal

Thanks for providing the comments.

On Thu, 14 Nov 2024 at 12:22, vignesh C  wrote:
>
> On Wed, 13 Nov 2024 at 11:15, Shlok Kyal  wrote:
> >
> > Thanks for providing the comments.
> >
> > On Tue, 12 Nov 2024 at 12:52, Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > On Friday, November 8, 2024 7:06 PM Shlok Kyal  
> > > wrote:
> > > >
> > > > Hi Amit,
> > > >
> > > > On Thu, 7 Nov 2024 at 11:37, Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal 
> > > > wrote:
> > > > > >
> > > > > > To avoid the issue, we can disallow UPDATE/DELETE on table with
> > > > > > unpublished generated column as REPLICA IDENTITY. I have attached a
> > > > > > patch for the same.
> > > > > >
> > > > >
> > > > > +CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol; UPDATE
> > > > > +testpub_gencol SET a = 100 WHERE a = 1;
> > > > > +ERROR:  cannot update table "testpub_gencol"
> > > > > +DETAIL:  Column list used by the publication does not cover the
> > > > > replica identity.
> > > > >
> > > > > This is not a correct ERROR message as the publication doesn't have
> > > > > any column list associated with it. You have added the code to detect
> > > > > this in the column list code path which I think is not required. BTW,
> > > > > you also need to consider the latest commit 7054186c4e for this. I
> > > > > guess you need to keep another flag in PublicationDesc to detect this
> > > > > and then give an appropriate ERROR.
> > > >
> > > > I have addressed the comments and provided an updated patch. Also, I am
> > > > currently working to fix this issue in back branches.
> > >
> > > Thanks for the patch. I am reviewing it and have some initial comments:
> > >
> > >
> > > 1.
> > > +   char attgenerated = get_attgenerated(relid, 
> > > attnum);
> > > +
> > >
> > > I think it's unnecessary to initialize attgenerated here because the 
> > > value will
> > > be overwritten if pubviaroot is true anyway. Also, the get_attgenerated()
> > > is not cheap.
> > >
> > Fixed
> >
> > > 2.
> > >
> > > I think the patch missed to check the case when table is marked REPLICA
> > > IDENTITY FULL, and generated column is not published:
> > >
> > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) 
> > > STORED NOT NULL);
> > > ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
> > > CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
> > > UPDATE testpub_gencol SET a = 2;
> > >
> > > I expected the UPDATE to fail in above case, but it can still pass after 
> > > applying the patch.
> > >
> > Fixed
> >
> > > 3.
> > >
> > > +* If the publication is FOR ALL TABLES we can skip the 
> > > validation.
> > > +*/
> > >
> > > This comment seems not clear to me, could you elaborate a bit more on 
> > > this ?
> > >
> > I missed to handle the case FOR ALL TABLES. Have removed the comment.
> >
> > > 4.
> > >
> > > Also, I think the patch does not handle the FOR ALL TABLE case correctly:
> > >
> > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) 
> > > STORED NOT NULL);
> > > CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
> > > ALTER TABLE testpub_gencol REPLICA IDENTITY USING index 
> > > testpub_gencol_idx;
> > > CREATE PUBLICATION pub_gencol FOR ALL TABLEs;
> > > UPDATE testpub_gencol SET a = 2;
> > >
> > > I expected the UPDATE to fail in above case as well.
> > >
> > Fixed
> >
> > > 5.
> > >
> > > +   else if (cmd == CMD_UPDATE && 
> > > !pubdesc.replident_has_valid_gen_cols)
> > > +   ereport(ERROR,
> > > +   
> > > (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > > +errmsg("cannot update table \"%s\"",
> > > +   
> > > RelationGetRelationName(rel)),
> > > +errdetail("REPLICA IDENTITY consists of 
> > > an unpublished generated column.")));
> > >
> > > I think it would be better to use lower case "replica identity" to 
> > > consistent
> > > with other existing messages.
> > >
> > Fixed
> >
> > I have attached the updated patch here.
>
> Few comments:
> 1) In the first check relation->rd_rel->relispartition also is checked
> whereas in the below it is not checked, shouldn't the same check be
> there below to avoid few of the function calls which are not required:
> +   if (pubviaroot && relation->rd_rel->relispartition)
> +   {
> +   publish_as_relid =
> GetTopMostAncestorInPublication(pubid, ancestors, NULL);
> +
> +   if (!OidIsValid(publish_as_relid))
> +   publish_as_relid = relid;
> +   }
> +
>
> +   if (pubviaroot)
> +   {
> +   /* attribute name in the child table */
> +   char   *colname =
> get_attname(relid, attnum, false);
> +
> +   /*
> +*

Re: Converting SetOp to read its two inputs separately

2024-11-14 Thread Richard Guo

On Thu, Nov 14, 2024 at 11:00 AM Tom Lane  wrote:
> Aside from that minor TODO, the main thing that's left undone in this
> patch series is to persuade the thing to exploit presorted input
> paths.  Right now it fails to do so, as can be seen in some of the
> regression test cases, eg
>
> regression=# set enable_hashagg = 0;
> SET
> regression=# explain (costs off) select unique1 from tenk1 except select 
> unique2 from tenk1 where unique2 != 10;
> QUERY PLAN
> --
>  SetOp Except
>->  Sort
>  Sort Key: tenk1.unique1
>  ->  Index Only Scan using tenk1_unique1 on tenk1
>->  Sort
>  Sort Key: tenk1_1.unique2
>  ->  Index Only Scan using tenk1_unique2 on tenk1 tenk1_1
>Filter: (unique2 <> 10)
> (8 rows)
>
> Obviously the sorts are unnecessary, but it's not seeing that.
> I suppose this requires integrating generate_nonunion_paths with
> the logic from commit 66c0185a3.  I tried to make sense of that,
> but failed --- either I don't understand it, or there are a
> number of things wrong with it.  I'd welcome some help with
> getting that done.

I think we may need to do the following to make this work:

1. We need to teach set_operation_ordered_results_useful() that sorted
input paths are also useful for INTERSECT/EXCEPT, so that we can have
setop_pathkeys set for the subqueries.

2. In generate_nonunion_paths(), we need to provide a valid
"interesting_pathkeys" when calling build_setop_child_paths().

Thanks
Richard

Re: Virtual generated columns

On Tue, Nov 12, 2024 at 9:47 PM Peter Eisentraut  wrote:
>
> On 10.11.24 04:16, Amit Kapila wrote:
> > The possible idea to replicate virtual generated columns is to compute
> > the corresponding expression before sending the data to the client. If
> > we can allow it in the row filter than why not to publish it as well.
>
> Row filters have pretty strong restrictions for what kind of operations
> they can contain.  Applying those restrictions to virtual generated
> columns would probably not make that feature very useful.  (You want to
> use virtual columns for expressions that are too cumbersome to write out
> by hand every time.)
>

>From this paragraph, it sounds like you are saying we can't support
virtual columns in row filters. But the patch already works (not
checked all possible cases). For example,

postgres=# CREATE TABLE gtest1 (a int PRIMARY KEY, b int GENERATED
ALWAYS AS (a * 2) VIRTUAL);
CREATE TABLE
postgres=# create publication pub2 for table gtest1 WHERE (b > 5);
CREATE PUBLICATION

After this, Insert also adheres to this row filter. I haven't tested
it in any further detail but its basic usage in row filters works.

> Moreover, we would have to implement some elaborate cross-checks if a
> table gets added to a publication.  How would that work?  "Can't add
> table x to publication because it contains a virtual generated column
> with a non-simple expression"?  With row filters, this is less of a
> problem, because the row filter a property of the publication.
>

Because virtual generated columns work in row filters, so I thought it
could follow the rules for column lists as well. If the virtual column
doesn't adhere to the rules of the row filter then it shouldn't even
work there. My response was based on the theory that the expression
for virtual columns could be computed during logical decoding. So,
let's first clarify that before discussing this point further.

-- 
With Regards,
Amit Kapila.

Re: Converting SetOp to read its two inputs separately

2024-11-14 Thread Richard Guo

On Thu, Nov 14, 2024 at 6:28 PM Richard Guo  wrote:
> 1. We need to teach set_operation_ordered_results_useful() that sorted
> input paths are also useful for INTERSECT/EXCEPT, so that we can have
> setop_pathkeys set for the subqueries.

BTW, I noticed a typo in the comment for function
set_operation_ordered_results_useful().

  * set_operation_ordered_results_useful
  * Return true if the given SetOperationStmt can be executed by utilizing
  * paths that provide sorted input according to the setop's targetlist.
- * Returns false when sorted paths are not any more useful then unsorted
+ * Returns false when sorted paths are not any more useful than unsorted
  * ones.

Thanks
Richard

Re: Some dead code in get_param_path_clause_serials()

2024-11-14 Thread Andrei Lepikhov


On 11/14/24 08:20, Richard Guo wrote:

On Wed, Nov 13, 2024 at 9:59 PM Andrei Lepikhov  wrote:

Even So, it would be better to insert changes, induced by new
feature by one commit.


I'm not sure I understand what you mean by this sentence.
Yeah, it's my bad English. nothing special, I just wanted to say that 
this code can be added later.





By the way, Do you know if anyone gave a speech on the current state of
internal plan parameterisation and its possible development directions?
It would be helpful to have such an explanation.


Not that I'm aware of, but I found the "Parameterized Paths" section
in optimizer/README to be very helpful.
Although parameterisation itself quite clearly described, the concept 
and code, related to re-parameterisation (in presence of 
inheritance/partitioning) seems complicated.


--
regards, Andrei Lepikhov

Re: Speed up TransactionIdIsCurrentTransactionId() with lots of subxacts

Hi,

On Thu, Nov 14, 2024 at 12:16:52AM +0200, Heikki Linnakangas wrote:
> On 13/11/2024 18:08, Bertrand Drouvot wrote:
> > === 1
> > 
> > +  parentOffset--;
> > +  parents[parentOffset]->totalXids = totalXids;
> > 
> > what about "parents[--parentOffset]->totalXids = totalXids;" instead?
> > 
> > That would remove the ability for someone to insert code between the 
> > decrement
> > and the array access.
> 
> IMHO it's good as it is. There's no particular harm if someone inserts code
> there, is there?

I think it all depends what the inserted code expects as parentOffset value.
I thought it could be better to just use a single line but this is probably just
a matter of taste afterall.

> I wonder if we should add a "child" pointer to TransactionStateData though.
> That would make it unnecessary to build the temporary array here, and it
> could be used to avoid the recursion in ShowTransactionStateRec().

Yeah, that's a good idea and that would also provide a more natural 
representation
of transactions relationships IMHO. I don't see cons doing so, that would not
add extra complexity, just the need to maintain both parent and child pointers
which should be simple enough. I think that would make the code simpler 
actually.

> > === 2
> > 
> > +  newsize = 32;
> > +  CurrentTransactionIds = MemoryContextAlloc(TopTransactionContext,
> > + 32 * sizeof(TransactionId));
> > 
> > what about defining a macro for "32" (as it is used in multiple places in
> > xact.c)?
> 
> The only other place is in AtStart_Memory(), but there it's used for a
> different thing. I think a constant is fine here. It'd probably be good to
> change "32 * sizeof(TransactionId)" to "newsize * sizeof(TransactionId)"
> though.

Yeah, +1 for using "newsize * sizeof(TransactionId)": that was the main idea
behind the define proposal (change the constant value at only one place if
needed).

> > 4 ===
> > 
> >   int
> >   xactGetCommittedChildren(TransactionId **ptr)
> >   {
> > -   TransactionState s = CurrentTransactionState;
> > +   return GetCommittedChildren(CurrentTransactionState, ptr);
> > 
> > Worth to move this part of the comment on top of xactGetCommittedChildren(),
> > 
> > "
> > If there are no subxacts, *ptr is set to NULL.
> > "
> > 
> > on top of GetCommittedChildren() instead?
> 
> I don't see why. The interface is described in xactGetCommittedChildren()
> now, and GetCommittedChildren() just notes that it's an internal version of
> xactGetCommittedChildren(). It seems clear to me that you should look at
> xactGetCommittedChildren() for more information.

Yeah, OTOH GetCommittedChildren() is the main reason of 
"If there are no subxacts, *ptr is set to NULL": so maybe instead of moving the
comment, just duplicate it. That's probably a Nit though. 

> > 5 ===
> > 
> >   static void
> >   AtSubAbort_childXids(void)
> >   {
> > -   TransactionState s = CurrentTransactionState;
> > -
> > -   /*
> > -* We keep the child-XID arrays in TopTransactionContext (see
> > -* AtSubCommit_childXids).  This means we'd better free the array
> > -* explicitly at abort to avoid leakage.
> > -*/
> > -   if (s->childXids != NULL)
> > -   pfree(s->childXids);
> > -   s->childXids = NULL;
> > -   s->nChildXids = 0;
> > -   s->maxChildXids = 0;
> > +   /* nothing to do */
> > 
> > Yeah that works but then some CurrentTransactionIds[] elements do not 
> > reflect
> > the "reality" anymore (until the top level transaction finishes, or until a
> > new subtransaction is created).
> > 
> > Could'nt that be an issue from a code maintability point of view?
> 
> Hmm, I don't know. The subtransaction is in the process of being aborted.
> Whether you consider its XID to still belong to the subtransaction until
> it's fully aborted is a matter of taste. If you consider that they still
> belong to the subtransaction until the subtransaction's TransactionState is
> free'd, it makes sense to leave them alone here.

Agree.

> Before this patch, it made sense to reset all those fields when the
> childXids array was free'd, but with this patch, the elements in
> CurrentTransactionIds[] are still valid until the TransactionState is
> free'd.

Yes, there is nothing wrong with the patch, it works.

> What would be the alternative? I don't think it makes sense to go out of our
> way to clear the elements in CurrentTransactionIds[]. We could set them to
> InvalidXid to make it clear they're not valid anymore,

Yeah, that's what I had in mind. It's probably not worth the extra code but
maybe it's worth a comment?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Improve the error message for logical replication of regular column to generated column.

Hi Shubham,

+1 for the patch idea.

Improving this error message for subscriber-side generated columns
will help to remove some confusion.

Here are my review comments for patch v1-0001.

==
Commit message.

1.
The error message was misleading, as it failed to clarify that the replication
of regular column on the publisher to the corresponding generated column on
the subscriber is not supported.

This patch improves the error handling and reporting mechanism to make it clear
that the replication of regular column on the subscriber is not supported,
resolving the misleading "missing column" error.

~

It makes no difference whether the publishing table column is regular
or generated, so you should not be implying that this has anything to
do with the replication of just regular columns. AFAIK, the *only*
thing that matters is that you cannot replicate into a subscriber-side
generated column or a subscriber-side missing column.

The current master reports replication into either a generated or a
missing column as the same "missing replication column" error. IIUC,
the errors were "correct", although clearly, for the generated column
case the error was quite misleading.

So, this patch is really *only* to improve the error wording when
attempting to replicate into a subscriber-side generated column.
That's what the commit message should be conveying.

==
src/backend/replication/logical/relation.c

logicalrep_rel_open:

2.
  Bitmapset  *missingatts;
+ StringInfoData gencolsattsbuf;
+ int generatedatts = 0;
+
+ initStringInfo(&gencolsattsbuf);

The existing "missing columns" error is implemented by building a BMS
and then passing it to the function 'logicalrep_report_missing_attrs'
to report the error.

IMO the generated column error is essentially the same, so should be
implemented with almost identical logic -- i.e. you should build a
'generatedattrs' BMS of generated cols with matching names and (if
that BMS is not empty) then pass that to a new function
'logicalrep_report_generated_attrs' (a sibling function to the
existing one).

~~~

3.
+ /*
+ * Check if the subscription table generated column has
+ * same name as a non-generated column in the
+ * corresponding publication table.
+ */

This (misplaced) comment talks about checking if the names are the
same. But I don't see any name-checking logic here (???). Where is it?

~~~

4.
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg_plural("replicating to a target relation's generated column
\"%s\" for \"%s.%s\" is not supported",
+"replicating to a target relation's generated column \"%s\" for
\"%s.%s\" is not supported",
+generatedatts, gencolsattsbuf.data, remoterel->nspname,
remoterel->relname)));

There are no plural differences here. This smells like a cut/paste
mistake from logicalrep_report_generated_attrs'.

IMO this error should close match the existing "missing replication
columns" error, and use the errmsg_plural correctly. In other words,
it should look something more like this:

ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  errmsg_plural("cannot replicate to target relation \"%s.%s\"
generated column: %s",
"cannot replicate to target relation \"%s.%s\"
generated columns: %s",
...

==
src/test/subscription/t/011_generated.pl

5.
+# =
+# Exercise logical replication of a regular column to a subscriber side
+# generated column.
+#
+# A "normal --> generated" replication fails, reporting an error that the
+# replication of a generated column on subscriber side is not supported.
+# =
+
+# --
+# Test Case: normal --> generated
+# Publisher table has regular columns 'c2' and 'c3'.
+# Subscriber table has generated columns 'c2' and 'c3'.
+# --
+

As I have said in previous internal reviews, this test (and the
comments) can be much more sophisticated. AFAICT by cleverly arranging
different publication table column types and different subscriber-side
table column ordering I think you should be able to test multiple
things at once.

Such as
- regular -> generated is detected
- generated -> generated is detected
- that the error only reports the generated column problems where the
column names are matching, not others



6.
Also, as previously mentioned in internal reviews, this patch should
include a 2nd test case to do pretty much the same testing but
expecting to get a "missing replication column".

The reasons to include this 2nd test are:
a) The missing column was never tested properly before.
b) This current patch has overlapping logic so you need to be assured
that adding this new error doesn't break the existing one.
c) Only one of these errors wins. Adding both tests will define the
expected order if both error

Re: Virtual generated columns

On Wed, Nov 13, 2024 at 11:30 AM jian he  wrote:
>
> These 3 functions will call StoreRelNotNull to store the not-null constraint.
> StoreConstraints
> AddRelationNotNullConstraints
> AddRelationNewConstraints
>
> we can disallow not-null on virtual generated columns via these 3 functions.
> I guess we don't want to add more complexity to AddRelationNotNullConstraints.
> we can do it in StoreRelNotNull.

inspired by not-null and check check_modified_virtual_generated again.

in plpgsql_exec_trigger, we can:
/*
 * In BEFORE trigger, stored generated columns are not computed yet,
 * so make them null in the NEW row.  (Only needed in UPDATE branch;
 * in the INSERT case, they are already null, but in UPDATE, the field
 * still contains the old value.)  Alternatively, we could construct a
 * whole new row structure without the generated columns, but this way
 * seems more efficient and potentially less confusing.
 */
if (tupdesc->constr && tupdesc->constr->has_generated_stored &&
TRIGGER_FIRED_BEFORE(trigdata->tg_event))
{
for (int i = 0; i < tupdesc->natts; i++)
{
if (TupleDescAttr(tupdesc, i)->attgenerated ==
ATTRIBUTE_GENERATED_STORED ||
TupleDescAttr(tupdesc, i)->attgenerated ==
ATTRIBUTE_GENERATED_VIRTUAL)
expanded_record_set_field_internal(rec_new->erh,
   i + 1,
   (Datum) 0,
   true,/* isnull */
   false, false);
}
}
then we don't need check_modified_virtual_generated at all.

this will align with the stored generated column behave for
BEFORE UPDATE/INSERT FOR EACH ROW trigger. that is
you are free to assign the virtual generated column any value,
but at the plpgsql_exec_trigger, we will rewrite it to null.

also i understand correctly.
later if we want to implement virtual generated column with not-null then
check_modified_virtual_generated needs to be removed?

Re: Forbid to DROP temp tables of other sessions

2024-11-14 Thread Daniil Davydov

On Wed, Oct 30, 2024 at 7:32 PM Rafia Sabih  wrote:

> Good catch. I agree with this being an unwarranted behaviour.
> A minor comment from my end is the wording of the error message.
> Based on the Postgresql error message style huide, something like this could 
> be better,
> "could not access temporary relations of other sessions".
> --
> Regards,
> Rafia Sabih
> CYBERTEC PostgreSQL International GmbH
>
Thanks for your comment. I attach a patch with a fixed error message.
Also you can find it in commit fest
(https://commitfest.postgresql.org/51/5379/)
--
Best Regards,
Daniil Davydov
From 2c2e1b2fd75d38d43fbec5023691b93dceb18dbb Mon Sep 17 00:00:00 2001
From: Daniil Davidov 
Date: Sun, 27 Oct 2024 16:15:50 +0700
Subject: [PATCH] Add fixes so now we cannot access to temporary tables of
 other sessions

---
 src/backend/catalog/namespace.c |  52 --
 src/backend/nodes/makefuncs.c   |   6 +-
 src/backend/parser/gram.y   |  11 ++-
 src/test/isolation/expected/temp-tables.out |  92 +
 src/test/isolation/isolation_schedule   |   1 +
 src/test/isolation/specs/temp-tables.spec   | 103 
 6 files changed, 231 insertions(+), 34 deletions(-)
 create mode 100644 src/test/isolation/expected/temp-tables.out
 create mode 100644 src/test/isolation/specs/temp-tables.spec

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 30807f9190..1a8e8c8844 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -499,28 +499,19 @@ RangeVarGetRelidExtended(const RangeVar *relation, LOCKMODE lockmode,
 		 */
 		if (relation->relpersistence == RELPERSISTENCE_TEMP)
 		{
-			if (!OidIsValid(myTempNamespace))
-relId = InvalidOid; /* this probably can't happen? */
-			else
+			if (relation->schemaname)
 			{
-if (relation->schemaname)
-{
-	Oid			namespaceId;
+Oid			namespaceId;
 
-	namespaceId = LookupExplicitNamespace(relation->schemaname, missing_ok);
-
-	/*
-	 * For missing_ok, allow a non-existent schema name to
-	 * return InvalidOid.
-	 */
-	if (namespaceId != myTempNamespace)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
- errmsg("temporary tables cannot specify a schema name")));
-}
-
-relId = get_relname_relid(relation->relname, myTempNamespace);
+namespaceId = LookupExplicitNamespace(relation->schemaname,
+	  missing_ok);
+if (namespaceId != myTempNamespace)
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("could not access temporary relations of other sessions")));
 			}
+
+			relId = get_relname_relid(relation->relname, myTempNamespace);
 		}
 		else if (relation->schemaname)
 		{
@@ -3387,17 +3378,14 @@ LookupExplicitNamespace(const char *nspname, bool missing_ok)
 	Oid			namespaceId;
 	AclResult	aclresult;
 
-	/* check for pg_temp alias */
 	if (strcmp(nspname, "pg_temp") == 0)
 	{
-		if (OidIsValid(myTempNamespace))
-			return myTempNamespace;
-
 		/*
-		 * Since this is used only for looking up existing objects, there is
-		 * no point in trying to initialize the temp namespace here; and doing
-		 * so might create problems for some callers --- just fall through.
+		 * If namespace specified in 'pg_temp' format
+		 * (without owner proc number specification), we assume that this is
+		 * our temporary namespace
 		 */
+		return myTempNamespace;
 	}
 
 	namespaceId = get_namespace_oid(nspname, missing_ok);
@@ -3553,21 +3541,19 @@ get_namespace_oid(const char *nspname, bool missing_ok)
 RangeVar *
 makeRangeVarFromNameList(const List *names)
 {
-	RangeVar   *rel = makeRangeVar(NULL, NULL, -1);
+	RangeVar   *rel;
 
 	switch (list_length(names))
 	{
 		case 1:
-			rel->relname = strVal(linitial(names));
+			rel = makeRangeVar(NULL, strVal(linitial(names)), -1);
 			break;
 		case 2:
-			rel->schemaname = strVal(linitial(names));
-			rel->relname = strVal(lsecond(names));
+			rel = makeRangeVar(strVal(linitial(names)), strVal(lsecond(names)), -1);
 			break;
 		case 3:
+			rel = makeRangeVar(strVal(lsecond(names)), strVal(lthird(names)), -1);
 			rel->catalogname = strVal(linitial(names));
-			rel->schemaname = strVal(lsecond(names));
-			rel->relname = strVal(lthird(names));
 			break;
 		default:
 			ereport(ERROR,
@@ -3774,6 +3760,8 @@ GetTempNamespaceProcNumber(Oid namespaceId)
 		return INVALID_PROC_NUMBER; /* no such namespace? */
 	if (strncmp(nspname, "pg_temp_", 8) == 0)
 		result = atoi(nspname + 8);
+	else if (strncmp(nspname, "pg_temp", 7) == 0)
+		result = MyProcNumber;
 	else if (strncmp(nspname, "pg_toast_temp_", 14) == 0)
 		result = atoi(nspname + 14);
 	else
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 61ac172a85..e60044c9d5 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -429,10 +429,14 @@ makeRangeVar(char *schemaname, char *relname, int location)
 	r

Re: Remove useless casts to (void *)


On 29.10.24 15:20, Tom Lane wrote:

Peter Eisentraut  writes:

There are a bunch of (void *) casts in the code that don't make sense to
me.  I think some of these were once necessary because char * was used
in place of void * for some function arguments.  And some of these were
probably just copied around without further thought.  I went through and
cleaned up most of these.  I didn't find any redeeming value in these.
They are just liable to hide actual problems such as incompatible types.
   But maybe there are other opinions.


I don't recall details, but I'm fairly sure some of these prevented
compiler warnings on some (old?) compilers.  Hard to be sure if said
compilers are all gone.

Looking at the sheer size of the patch, I'm kind of -0.1, just
because I'm afraid it's going to create back-patching gotchas.
I don't really find that it's improving readability, though
clearly that's a matter of opinion.


I did a bit of archeological research on these.  None of these casts 
were ever necessary, and in many cases even the original patch that 
introduced an API used the coding style inconsistently.  So I'm very 
confident that there are no significant backward compatibility or 
backpatching gotchas here.


I'm more concerned that many of these just keep getting copied around 
indiscriminately, and this is liable to hide actual type mismatches or 
silently discard qualifiers.  So I'm arguing in favor of a more 
restrictive style in this matter.

Potential ABI breakage in upcoming minor releases

2024-11-14 Thread Pavan Deolasee

Hello,

Commit 51ff46de29f67d73549b2858f57e77ada8513369 (backported all the way
back to v12) added a new member to `ResultRelInfo` struct. This can
potentially cause ABI breakage for the extensions that allocate the struct
and pass it down to the PG code. The previously built extensions may
allocate a shorter struct, while the new PG code would expect a larger
struct, thus overwriting some memory unintentionally.

A better approach may have been what Tom did in
8cd190e13a22dab12e86f7f1b59de6b9b128c784, but I understand it might be too
late to change this since the releases are already tagged. Nevertheless, I
thought of bringing it up if others have different views.

Thanks,
Pavan

Re: Potential ABI breakage in upcoming minor releases

2024-11-14 Thread Ants Aasma

On Thu, 14 Nov 2024 at 16:35, Noah Misch  wrote:

> Based on a grep of PGXN code, here are some or all of the modules that
> react
> to sizeof(ResultRelInfo):
>

To add to this list, Christoph Berg confirmed that timescaledb test suite
crashes. [1]

Regards,
Ants Aasma

[1]
https://pgdgbuild.dus.dg-i.net/job/timescaledb-autopkgtest/9/architecture=amd64,distribution=sid/console

Object identifier types in logical replication binary mode

2024-11-14 Thread Emre Hasegeli

I encountered a problem with logical replication.

The object identifier types, regclass, regproc, regtype, etc. are
transferred as an oid in the binary mode.  However, the subscriber
expects them as text which makes sense because it cannot do anything
with the oids.  I am getting "invalid byte sequence for encoding
"UTF8": 0x00" when I try this.

I think the publisher should not transfer these in binary just like
the data types from extensions.  Any opinions?

Re: Potential ABI breakage in upcoming minor releases


On 14.11.24 15:35, Noah Misch wrote:

The postgr.es/c/e54a42a standard would have us stop here.  But I'm open to
treating the standard as mistaken and changing things.


That text explicitly calls out that adding struct members at the end of 
a struct is considered okay.  But thinking about it now, even adding 
fields to the end of a node struct that extensions allocate using 
makeNode() is an ABI break that is liable to cause all affected 
extensions to break in a crashing way.

Re: gamma() and lgamma() functions

2024-11-14 Thread Dmitry Koval


Hi!

I think having gamma() and lgamma() functions in PostgreSQL would be 
useful for some users, this is asked [1].


I have a question regarding the current implementation of gamma() 
function. Code block:


+   if (errno == ERANGE && arg1 != 0)
+   {
+   if (result != 0.0)
+   float_overflow_error();
+   else
+   float_underflow_error();
+   }
+   else if (isinf(result) && arg1 != 0 && !isinf(arg1))
+   float_overflow_error();
+   else if (result == 0.0)
+   float_underflow_error();

Why in some cases (if arg1 is close to 0, but not 0) an error 
(float_overflow_error) will be returned, but in the case of "arg1 = 0" 
the value 'Infinity' will be returned?

For example:

>SELECT gamma(float8 '1e-320');
ERROR:  value out of range: overflow

>SELECT gamma(float8 '0');
  gamma
--
 Infinity
(1 row)

Perhaps it would be logical if the behavior in these cases was the same 
(either ERROR or 'Infinity')?


Links:
[1] 
https://stackoverflow.com/questions/58884066/how-can-i-run-the-equivalent-of-excels-gammaln-function-in-postgres


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

Re: Potential ABI breakage in upcoming minor releases

2024-11-14 Thread Christoph Berg

Re: Noah Misch
> Based on a grep of PGXN code, here are some or all of the modules that react
> to sizeof(ResultRelInfo):
> 
> $ grepx -r 'lloc.*ResultRelInfo' | tee /tmp/1 | sed 's/-[^:]*/:/'|sort -u
> apacheage::resultRelInfo = palloc(sizeof(ResultRelInfo));

Confirmed, crashing: AGE for 14..17 (12..13 seem fine)

> citus:: resultRelInfo = makeNode(ResultRelInfo);
> citus:: ResultRelInfo *resultRelInfo = makeNode(ResultRelInfo);
> pg_bulkload::   checker->resultRelInfo = makeNode(ResultRelInfo);
> pg_bulkload::   self->relinfo = makeNode(ResultRelInfo);
> pg_pathman::child_result_rel_info = makeNode(ResultRelInfo);
> pg_pathman::parent_result_rel = makeNode(ResultRelInfo);
> pg_pathman::parent_rri = makeNode(ResultRelInfo);
> pg_pathman::part_result_rel_info = makeNode(ResultRelInfo);
> vops::  resultRelInfo = makeNode(ResultRelInfo);

(These are not on apt.pg.o)

I've also tested other packages where ResultRelInfo appears in the
source, but they all passed the tests (most have decent test but a few
have not):

Bad:
postgresql-16-age
timescaledb

Good:
hypopg
libpg-query
pglast
pglogical
pgpool2
pgsql-ogr-fdw
pg-squeeze
postgresql-mysql-fdw (impossible to test sanely because mysql/mariadb
  take turns at being the default, and in some environments just don't
  start)

> I don't know whether we should make a new release, amend the release
> announcement to call for extension rebuilds, or just stop here.
> https://wiki.postgresql.org/wiki/Committing_checklist#Maintaining_ABI_compatibility_while_backpatching
> mentions the problem, but neither it nor the new standard at
> postgr.es/c/e54a42a say how reticent we'll be to add to the end of a struct on
> which extensions do sizeof.

I'd say the ship has sailed, a new release would now break things the
other way round.

Christoph

Re: Potential ABI breakage in upcoming minor releases

Timescale at least has many users, I don't know about the others.  But
they won't be happy if we let things be.  IMO we should rewrap ...
especially 12, since there won't be a next one.

ISTM that we have spare bytes where we could place that boolean without
breaking ABI.  In x86_64 there's a couple of 4-byte holes, but those
won't be there on x86, so not great candidates.  Fortunately there are
3-byte and 7-byte holes also, which we can use safely.  We can move the
new boolean to those location.

The holes are different in each branch unfortunately.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"

Re: Potential ABI breakage in upcoming minor releases

Hi,

> To add to this list, Christoph Berg confirmed that timescaledb test suite 
> crashes. [1]

Yes changing ResultRelInfo most definetely breaks TimescaleDB. The
extension uses makeNode(ResultRelInfo) and this can't end-up well:

```
static inline Node *
newNode(size_t size, NodeTag tag)
{
Node   *result;

Assert(size >= sizeof(Node));/* need the tag, at least */
result = (Node *) palloc0(size);
result->type = tag;

return result;
}

#define makeNode(_type_)((_type_ *) newNode(sizeof(_type_),T_##_type_))
```

-- 
Best regards,
Aleksander Alekseev

Re: 039_end_of_wal: error in "xl_tot_len zero" test

2024-11-14 Thread Christoph Berg

Re: Thomas Munro
> Pushed.  Thanks!

Well...

postgresql 13.17, Debian bullseye, amd64:

t/039_end_of_wal.pl .. Dubious, test returned 2 (wstat 512, 
0x200)
No subtests run

Test Summary Report
---
t/039_end_of_wal.pl(Wstat: 512 Tests: 0 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
Files=26, Tests=257, 102 wallclock secs ( 0.11 usr  0.04 sys + 14.35 cusr 12.78 
csys = 27.28 CPU)
Result: FAIL


Christoph
t/026_overwrite_contrecord.pl  
1..3
ok 1 - 00010001 differs from 00010002
ok 2 - standby replays past overwritten contrecord
ok 3 - found log line in standby
ok
t/031_recovery_conflict.pl ... 
1..0 # SKIP disabled until after minor releases, due to instability
skipped: disabled until after minor releases, due to instability
t/033_replay_tsp_drops.pl  
ok 1 - standby node started
ok 2 - invalid directory creation is detected
1..2
ok
t/037_invalid_database.pl  
ok 1 - can't connect to invalid database - error code
ok 2 - can't connect to invalid database - error message
ok 3 - can't ALTER invalid database
ok 4 - can't use invalid database as template
ok 5 - invalid databases are ignored by vac_truncate_clog
ok 6 - can DROP invalid database
ok 7 - can't drop already dropped database
# issuing query via background psql: SELECT pg_backend_pid()
# issuing query via background psql: 
#   CREATE DATABASE regression_invalid_interrupt;
#   BEGIN;
#   LOCK pg_tablespace;
#   PREPARE TRANSACTION 'lock_tblspc';
ok 8 - blocked DROP DATABASE completion
# issuing query via background psql: 
#   DO $$
#   BEGIN
#   WHILE NOT EXISTS(SELECT * FROM pg_locks WHERE NOT granted AND 
relation = 'pg_tablespace'::regclass AND mode = 'AccessShareLock') LOOP
#   PERFORM pg_sleep(.1);
#   END LOOP;
#   END$$;
#   SELECT pg_cancel_backend(2496772);
ok 9 - canceling DROP DATABASE
ok 10 - cancel processed
ok 11 - can't connect to invalid_interrupt database
# issuing query via background psql: ROLLBACK PREPARED 'lock_tblspc'
ok 12 - unblock DROP DATABASE
# issuing query via background psql: DROP DATABASE regression_invalid_interrupt
ok 13 - DROP DATABASE invalid_interrupt
1..13
ok
t/039_end_of_wal.pl .. Dubious, test returned 2 (wstat 512, 
0x200)
No subtests run 

Test Summary Report
---
t/039_end_of_wal.pl(Wstat: 512 Tests: 0 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
Files=26, Tests=257, 102 wallclock secs ( 0.11 usr  0.04 sys + 14.35 cusr 12.78 
csys = 27.28 CPU)
Result: FAIL
make[2]: *** [Makefile:23: check] Error 1
make[2]: Leaving directory 
'/home/myon/projects/postgresql/debian/13/build/src/test/recovery'
make[1]: *** [Makefile:48: check-recovery-recurse] Error 2
make[1]: Leaving directory 
'/home/myon/projects/postgresql/debian/13/build/src/test'
make: *** [GNUmakefile:71: check-world-src/test-recurse] Error 2
make: Leaving directory '/home/myon/projects/postgresql/debian/13/build'
 
build/src/test/recovery/tmp_check/log/regress_log_031_recovery_conflict 
1..0 # SKIP disabled until after minor releases, due to instability
 
build/src/test/recovery/tmp_check/log/regress_log_026_overwrite_contrecord 

1..3
# Checking port 19609
# Found port 19609
Name: primary
Data directory: 
/home/myon/projects/postgresql/debian/13/build/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary_data/pgdata
Backup directory: 
/home/myon/projects/postgresql/debian/13/build/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary_data/backup
Archive directory: 
/home/myon/projects/postgresql/debian/13/build/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary_data/archives
Connection string: port=19609 host=/home/myon/tmp/yfaXDJawbQ
Log file: 
/home/myon/projects/postgresql/debian/13/build/src/test/recovery/tmp_check/log/026_overwrite_contrecord_primary.log
# Running: initdb -D 
/home/myon/projects/postgresql/debian/13/build/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary_data/pgdata
 -A trust -N
The files belonging to this database system will be owned by user "myon".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory 
/home/myon/projects/postgresql/debian/13/build/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary_data/pgdata
 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... sysv
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap

Re: Potential ABI breakage in upcoming minor releases

On Thu, Nov 14, 2024 at 05:13:35PM +0100, Peter Eisentraut wrote:
> On 14.11.24 15:35, Noah Misch wrote:
> > The postgr.es/c/e54a42a standard would have us stop here.  But I'm open to
> > treating the standard as mistaken and changing things.
> 
> That text explicitly calls out that adding struct members at the end of a
> struct is considered okay.  But thinking about it now, even adding fields to
> the end of a node struct that extensions allocate using makeNode() is an ABI
> break

Right.  makeNode(), palloc(sizeof), and stack allocation have that problem.
Allocation wrappers like CreateExecutorState() avoid the problem.  More
generally, structs allocated in non-extension code are fine.

Re: Potential ABI breakage in upcoming minor releases

Hi,

> Right.  makeNode(), palloc(sizeof), and stack allocation have that problem.
> Allocation wrappers like CreateExecutorState() avoid the problem.  More
> generally, structs allocated in non-extension code are fine.

Perhaps we should consider adding an API like makeResultRelInfo() and
others, one per node. It's going to be boilerplate for sure, but
considering the fact that we already generate some code I don't really
see drawbacks.

-- 
Best regards,
Aleksander Alekseev

Re: BitmapOr node not used in plan for ANY/IN but is for sequence of ORs ...

2024-11-14 Thread Jim Vanns

Thanks Tomas, that's useful to know.

Cheers

Jim

On Wed, 13 Nov 2024 at 13:13, Tomas Vondra  wrote:

> On 11/13/24 13:08, Jim Vanns wrote:
> > (sent to general users mailing list yesterday - but perhaps this is a
> > more suitable audience?)
> >
> > In PG16.4, we have a table of key/pair data (around 30M rows) where
> > there are about 7 distinct keys and each has a conditional or partial
> > index on them (the distribution is different for each key/value pair
> > combination).  I've found that when we have a query that uses an OR then
> > those partial indexes are used but not if the query is written to use
> > ANY/IN, which is more convenient from a programmer POV (especially any
> > with 3rd party query generators etc.). Naturally, the result sets
> > returned by the queries are identical due to the filter semantics of any
> > of the 3 solution variants.
> >
> > Here's a shareable, MRP;
> >
> > https://dbfiddle.uk/OKs_7HWv 
> >
> > Is there any trick I can do to get the planner to make use of the
> > conditional/partial index? Or is this simply an unoptimised code path
> > yet to be exploited!?
> >
>
> I believe this is "simply" not implemented, so there's no way to
> convince the planner to use these partial indexes.
>
> The proximate cause is that the planner does not treat ANY()/IN() as
> equivalent to an OR clause, and does not even consider building the
> "bitmap OR" path for those queries. That's what happens at the very
> beginning of generate_bitmap_or_paths().
>
> Perhaps we could "expand" the ANY/IN clauses into an OR clause, so that
> restriction_is_or_clause() returns "true". But I haven't tried and I'm
> sure there'd be more stuff to fix to make this work.
>
>
> regards
>
> --
> Tomas Vondra
>
>

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London

Re: [PATCH] Refactor SLRU to always use long file names

Hi Michael,

> The scans may be quite long as well, actually, which could be a
> bottleneck.  Did you measure the runtime with a maximized (still
> realistic) pool of files for these SLRUs in the upgrade time?  For
> upgrades, data would be the neck.

Good question.

In theory SLRUs are not supposed to grow large and their size is a
small fraction of the rest of the database. As an example CLOG (
pg_xact/ ) stores 2 bits per transaction. Since every SLRU has a
dedicated directory and we scan just it, non-SLRU files don't affect
the scan time.

To make sure I asked several people to check how many SLRUs they have
in the prod environment. The typical response looked like this:

```
$PGDATA/pg_xact: 191 segments
$PGDATA/pg_commit_ts: 3
$PGDATA/pg_multixact/offsets: 148
$PGDATA/pg_multixact/members: 400
$PGDATA/pg_subtrans: 4
$PGDATA/pg_serial: 3
```

This is a 800 Gb database. Interestingly larger databases (4.2Tb) may
have much less SLRU segments (220 in total, most of them are pg_xact).

And here is the *worst* case that was reported to me:

```
$PGDATA/pg_xact: 171 segments
$PGDATA/pg_commit_ts: 3
$PGDATA/pg_multixact/offsets: 4864
$PGDATA/pg_multixact/members: 40996
$PGDATA/pg_subtrans: 5
$PGDATA/pg_serial: 3
```

I was told this is a "1Tb+" database. For this user pg_upgrade will
rename 45 000 files. I wrote a little script to check how much time it
will take:

```
#!/usr/bin/env perl

use strict;

my $from = "test_0001.tmp";
my $to = "test_0002.tmp";

system("touch $from");

for my $i (1..45000) {
rename($from, $to);
($from, $to) = ($to, $from);
}
```

On my laptop I get 0.5 seconds. Note that I don't do scanning, only
renaming, assuming that the recent should take most of the time. I
think this should be multiplied by 10 to take into account the role of
the filesystem cache and other factors.

All in all in the absolutely worst case scenario this shouldn't take
more than 5 seconds, in reality it will probably be orders of
magnitude less.

> Note that this also depends on the system endianness, see 039_end_of_wal.pl.

Sure, I think I took it into account when using pack("L!"). My
understanding is that "L" takes care of the endiness since I see
special flags to force little- or big-endiness independently from the
platform [1]. This of course should be tested in practice on different
machines. Using an exclamation mark in "L!" was a mistake since
cat_ver is not an int, but rather an uint32.

> You don't really need the lookup part, actually?

For lookup we already have the pg_controldata tool, that's not a problem.

> Control file manipulation may be useful as a routine in Cluster.pm,
> based on an offset in the file and a format to pack as argument?
> [...]
> It's one of these things I could see myself reuse to force a state in
> the cluster and make a test cheaper, for example.

> You would just need the part where
> the control file is rewritten, which should be OK as long as the
> cluster is freshly initdb'd meaning that there should be nothing that
> interacts with the new value set.

Agree. Still I don't see a good way of figuring out
sizeof(ControlFileData) from Perl. The structure has int's in it (e.g.
wal_level, MaxConnections, etc) thus the size is platform-dependent.
The CRC should be placed at the end of the structure. If we want to
manipulate MaxConnections etc their offsets are going to be
platform-dependent as well. And my understanding is that the alignment
is platform/compiler dependent too.

I guess we are going to need either a `pg_writecontoldata` tool or
`pg_controldata -w` flag. I wonder which option you find more
attractive, or maybe you have better ideas?

[1]: https://perldoc.perl.org/functions/pack

-- 
Best regards,
Aleksander Alekseev

Re: Reordering DISTINCT keys to match input path's pathkeys

2024-11-14 Thread Andrei Lepikhov


On 11/14/24 08:09, Richard Guo wrote:

On Wed, Nov 13, 2024 at 7:36 PM Andrei Lepikhov  wrote:

On 11/13/24 13:49, Richard Guo wrote:
In thread [1], I try to add one more strategy that minimises the number
of comparison operator calls. It seems that it would work the same way
with the DISTINCT statement. Do you think it make sense in general and
can be a possible direction of improvement for the current patch?


I haven’t had a chance to follow that thread.  From a quick look at
that patch, it seems to improve the general costing logic for sorting.
If that’s the case, I think it would be beneficial in the areas where
we use cost_sort(), including in this patch.
Yes, the core of the discussion is cost calculation. It is a base for 
the final patch that adds a grouping strategy, according to which it may 
be profitable to put the column with max distinct value at the first 
position to make sorting less expensive. I think it makes sense to do 
the same in this DISTINCT feature, too (as a further improvement, of 
course).



disable features during severe slowdowns or bugs. It might make sense to
group them into a single 'Clause Reordering' parameter.

code.  If these GUCs are mainly for debugging, I think it's better to
keep them separate so that we can debug each optimization individually.

Ok.
See minor changes I propose in the attachment.

--
regards, Andrei Lepikhovdiff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 99063d0506..9006a14abc 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -2287,7 +2287,6 @@ static int
 pathkeys_useful_for_distinct(PlannerInfo *root, List *pathkeys)
 {
 	int			n_common_pathkeys;
-	ListCell   *lc;
 
 	/*
 	 * distinct_pathkeys may have become empty if all of the pathkeys were
@@ -2298,10 +2297,8 @@ pathkeys_useful_for_distinct(PlannerInfo *root, List *pathkeys)
 
 	/* walk the pathkeys and search for matching DISTINCT key */
 	n_common_pathkeys = 0;
-	foreach(lc, pathkeys)
+	foreach_node(PathKey, pathkey, pathkeys)
 	{
-		PathKey*pathkey = (PathKey *) lfirst(lc);
-
 		/* no matching DISTINCT key, we're done */
 		if (!list_member_ptr(root->distinct_pathkeys, pathkey))
 			break;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2870ef58f0..c22e41e83b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5158,7 +5158,6 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 			Path	   *input_path = (Path *) lfirst(lc);
 			Path	   *sorted_path;
 			List	   *useful_pathkeys_list = NIL;
-			ListCell   *lc2;
 
 			useful_pathkeys_list =
 get_useful_pathkeys_for_distinct(root,
@@ -5166,10 +5165,8 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
  input_path->pathkeys);
 			Assert(list_length(useful_pathkeys_list) > 0);
 
-			foreach(lc2, useful_pathkeys_list)
+			foreach_node(List, useful_pathkeys, useful_pathkeys_list)
 			{
-List	   *useful_pathkeys = (List *) lfirst(lc2);
-
 sorted_path = make_ordered_path(root,
 distinct_rel,
 input_path,
@@ -5276,11 +5273,9 @@ get_useful_pathkeys_for_distinct(PlannerInfo *root, List *needed_pathkeys,
 {
 	List	   *useful_pathkeys_list = NIL;
 	List	   *useful_pathkeys = NIL;
-	ListCell   *lc;
 
 	/* always include the given 'needed_pathkeys' */
-	useful_pathkeys_list = lappend(useful_pathkeys_list,
-   needed_pathkeys);
+	useful_pathkeys_list = lappend(useful_pathkeys_list, needed_pathkeys);
 
 	if (!enable_distinct_reordering)
 		return useful_pathkeys_list;
@@ -5294,10 +5289,8 @@ get_useful_pathkeys_for_distinct(PlannerInfo *root, List *needed_pathkeys,
 	 * list matches initial distinctClause pathkeys; otherwise, it won't have
 	 * the desired behavior.
 	 */
-	foreach(lc, path_pathkeys)
+	foreach_node(PathKey, pathkey, path_pathkeys)
 	{
-		PathKey*pathkey = (PathKey *) lfirst(lc);
-
 		/*
 		 * The PathKey nodes are canonical, so they can be checked for
 		 * equality by simple pointer comparison.
@@ -5324,19 +5317,16 @@ get_useful_pathkeys_for_distinct(PlannerInfo *root, List *needed_pathkeys,
 		return useful_pathkeys_list;
 
 	/* Append the remaining PathKey nodes in needed_pathkeys */
-	useful_pathkeys = list_concat_unique_ptr(useful_pathkeys,
-			 needed_pathkeys);
+	useful_pathkeys = list_concat_unique_ptr(useful_pathkeys, needed_pathkeys);
 
 	/*
 	 * If the resulting pathkey list is the same as the 'needed_pathkeys',
 	 * just drop it.
 	 */
-	if (compare_pathkeys(needed_pathkeys,
-		 useful_pathkeys) == PATHKEYS_EQUAL)
+	if (compare_pathkeys(needed_pathkeys, useful_pathkeys) == PATHKEYS_EQUAL)
 		return useful_pathkeys_list;
 
-	useful_pathkeys_list = lappend(useful_pathkeys_list,
-   useful_pathkeys);
+	useful_pathkeys_list = lappend(useful_pathkeys_list, useful_pathkeys);
 
 	return useful_pathkeys_list;
 }

Re: [PATCH] New predefined role pg_manage_extensions

2024-11-14 Thread Jelte Fennema-Nio

On Thu, 31 Oct 2024 at 22:47, Michael Banck  wrote:
> Even though there has not been a lot of discussion on this, here is a
> rebased patch.  I have also added it to the upcoming commitfest.

After considering this again, I actually think this is a good change.
Basically all cloud providers already provide something similar. It
would be great if we could have a standardized way of doing this.

Regarding the actual patch: I think it looks good, but it needs some tests.

Re: SQL:2023 JSON simplified accessor support


On 07.11.24 22:57, Alexandra Wang wrote:

The v5 patch includes the following updates:

- Fixed the aforementioned issue and added more tests covering composite
types with domains, nested domains, and arrays of domains over
JSON/JSONB.

- Refactored the logic for parsing JSON/JSONB object fields by moving it
from ParseFuncOrColumn() to transformIndirection() for improved
readability. The ParseFuncOrColumn() function is already handling both
single-argument function calls and composite types, and it has other
callers besides transformIndirection().


This patch implements array subscripting support for the json type, but 
it does it in a non-standard way, using 
ParseJsonSimplifiedAccessorArrayElement().  This would be better done by 
providing a typsubscript function for the json type.  This is what jsonb 
already has, which is why your patch doesn't need to provide the array 
support for jsonb.  I suggest you implement the typsubscript support for 
the json type (make it a separate patch but you can keep it in this 
thread IMO) and remove the custom code from this patch.


A few comments on the tests:  The tests look good to me.  Good coverage 
of weirdly nested types.  Results look correct.


+drop table if exists test_json_dot;

This can be omitted, since we know that the table doesn't exist yet.

This code could be written in the more conventional insert ... values 
syntax:


+insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json;
+insert into test_json_dot select 1, '{"a": 2, "b": {"c": 42}}'::json;
+insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"}, 
"d":[11, 12]}'::json;
+insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"}, 
"d":[{"x": [11, 12]}, {"y": [21, 22]}]}'::json;


Then the ::json casts can also go away.

Also, using a different value for "id" for each row would be more
useful, so that the subsequent tests could then be written like

select id, (test_jsonb_dot.test_jsonb).b from test_jsonb_dot;

so we can see which result corresponds to which input row.  Also make
id the primary key in this table.

Also, let's keep the json and the jsonb variants aligned.  There are
some small differences, like the test_json_dot table having 4 rows but
the test_jsonb_dot having 3 rows.  And the array and wildcard tests in
the opposite order.  Not a big deal, but keeping these the same helps
eyeballing the test files.

Maybe add a comment somewhere in this file that you are running the
json_query equivalents to cross-check the semantics of the dot syntax.

Some documentation should be written.  This looks like this right place 
to start:


https://www.postgresql.org/docs/devel/sql-expressions.html#FIELD-SELECTION

and them maybe some cross-linking between there and the sections on JSON 
types and operators.

Re: Interrupts vs signals

2024-11-14 Thread Jelte Fennema-Nio

On Sun, 10 Nov 2024 at 22:30, Heikki Linnakangas  wrote:
> I thought of the Wiki so that it could updated more casually by
> extension authors.

Makes sense, I agree it would be nice not to have to rely on
committers to merge these changes. Especially since many extension
authors are not committers. So +1 for storing it on the wiki.

Re: define pg_structiszero(addr, s, r)

Hi,

On Thu, Nov 14, 2024 at 09:13:19AM -0300, Ranier Vilela wrote:
> Em qui., 14 de nov. de 2024 às 08:58, Bertrand Drouvot <
> Maybe I'm doing something wrong.
> But I'm testing in 32-bit, with the size set to 63, with v12 and I'm seeing
> the SIMD loop execute.

Yeah, that's expected and safe as each iteration reads 32 bytes on 32-bit.

> if (len < sizeof(size_t) * 8) // 8-63 bytes
> failed.
> 
> I expected that with size 63, it would be solved by case 2, or am I wrong?

Case 2 should be read as "in the 4-31" bytes range on 32-bit system as all 
comparisons are done in size_t.

What would be unsafe on 32-bit would be to read up to 32 bytes while len < 32
and that can not happen.

As mentioned up-thread the comments are wrong on 32-bit, indeed they must be 
read
as:

Case 1: len < 4 bytes
Case 2: len in the 4-31 bytes range
Case 3: len >= 32 bytes

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Difference in dump from original and restored database due to NOT NULL constraints on children

Hello Ashutosh,

On 2024-Nov-14, Ashutosh Bapat wrote:

> Because of the test I am developing under discussion at [1], I noticed
> that the DDLs dumped for inheritance children with NOT NULL
> constraints defer between original database and database restored from
> a dump of original database. I investigated those differences, but can
> not decide whether they are expected and whether they are intentional
> or not. Look something related to
> 14e87ffa5c543b5f30ead7413084c25f7735039f.

It does and it is, thanks.

So there are two differences:

1. 

CREATE TABLE generated_stored_tests.gtest1 (
a integer NOT NULL,
b integer GENERATED ALWAYS AS ((a * 2)) STORED
);

 CREATE TABLE generated_stored_tests.gtestxx_4 (
-a integer,
-b integer NOT NULL
+a integer NOT NULL,
+b integer
 )
 INHERITS (generated_stored_tests.gtest1);

In this case, I think it's wrong to add NOT NULL to gtestxx_4.a because
the constraint was not local originally and it should be acquired by the
inheritance.  I'm not sure about gtestxx_4.b ... the GENERATED
constraint should induce a not-null if I recall correctly, so an
explicit not-null marking in the child might not be necessary. But I'll
have a look.

2. 

CREATE TABLE public.notnull_tbl4 (
a integer NOT NULL
);

 CREATE TABLE public.notnull_tbl4_cld2 (
 )
 INHERITS (public.notnull_tbl4);
-ALTER TABLE ONLY public.notnull_tbl4_cld2 ALTER COLUMN a SET NOT NULL;
+ALTER TABLE ONLY public.notnull_tbl4_cld2 ADD CONSTRAINT 
notnull_tbl4_a_not_null NOT NULL a;

 CREATE TABLE public.notnull_tbl4_cld3 (
 )
 INHERITS (public.notnull_tbl4);
-ALTER TABLE ONLY public.notnull_tbl4_cld3 ADD CONSTRAINT a_nn NOT NULL a;
+ALTER TABLE ONLY public.notnull_tbl4_cld3 ADD CONSTRAINT 
notnull_tbl4_a_not_null NOT NULL a;

For notnull_tbl4_cld2 we try to set the column not-null, but it's
already not-null due to inheritance ... so why are we doing that?
Weird.  These lines should just go away in both dumps.

In notnull_tbl4_cld3's case we have the same, but we also want to set a
constraint name, but the constraint already exists, so that doesn't
work, and that's the reason why the next dump uses the standard name.
Maybe we could dump the name change as ALTER TABLE RENAME CONSTRAINT in
that case instead, _if_ we can obtain the original name (should be
doable, because we can see it's a nonstandard name.)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Re: Enhancing Memory Context Statistics Reporting

On 2024-Nov-14, Michael Paquier wrote:

> Already mentioned previously at [1] and echoing with some surrounding
> arguments, but I'd suggest to keep it simple and just remove entirely
> the part of the patch where the stats information gets spilled into
> disk.  With more than 6000-ish context information available with a
> hard limit in place, there should be plenty enough to know what's
> going on anyway.

Functionally-wise I don't necessarily agree with _removing_ the spill
code, considering that production systems with thousands of tables would
easily reach that number of contexts (each index gets its own index info
context, each regexp gets its own memcxt); and I don't think silently
omitting a fraction of people's memory situation (or erroring out if the
case is hit) is going to make us any friends.

That said, it worries me that we choose a shared memory size so large
that it becomes impractical to hit the spill-to-disk code in regression
testing.  Maybe we can choose a much smaller limit size when
USE_ASSERT_CHECKING is enabled, and use a test that hits that number?
That way, we know the code is being hit and tested, without imposing a
huge memory consumption on test machines.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)

Re: 2024-11-14 release announcement draft

2024-11-14 Thread Jonathan S. Katz


On 11/14/24 2:55 AM, jian he wrote:

On Tue, Nov 12, 2024 at 12:28 PM Jonathan S. Katz  wrote:


Hi,

Please review the draft of the release announcement for the 2024-11-14
release. Please provide any feedback by 2024-11-14 12:00 UTC.



someone reminded me.
i am wondering do we need include the following two items:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=075acdd93388c080c0fb0aca5723144ad7a56dac
https://git.postgresql.org/cgit/postgresql.git/commit/?id=90fe6251c816547b9cb4e1895dd8e6620bae94e1


Yes, I agree. I've expanded the "Several query planner fixes" point to 
call out this one specifically. Thanks!


Jonathan


OpenPGP_signature.asc
Description: OpenPGP digital signature

Re: 2024-11-14 release announcement draft

2024-11-14 Thread Jonathan S. Katz


On 11/11/24 11:54 PM, Thomas Munro wrote:

On Tue, Nov 12, 2024 at 5:28 PM Jonathan S. Katz  wrote:

Please review the draft of the release announcement for the 2024-11-14
release. Please provide any feedback by 2024-11-14 12:00 UTC.


I think "* Fixes random crashes in JIT compilation on aarch64 systems"
might be worth highlighting.  We've had a handful of separate reports
about that and people have been turning off JIT to work around it.


Thanks - I wasn't sure if that was user visible. I've included it in the 
final copy!


Jonathan


OpenPGP_signature.asc
Description: OpenPGP digital signature

Re: Commit Timestamp and LSN Inversion issue

2024-11-14 Thread Jan Wieck


On 11/13/24 03:56, Amit Kapila wrote:


the key point Andres
is raising is that we won't be able to convert the operation in
ReserveXLogInsertLocation() to use atomics after such a solution. Now,
even, if the change is only in the *commit* code path, we may or may
not be able to maintain two code paths for reserving LSN, one using
atomics and the other (for commit record) using spinlock.


Which is only partially true. There is nothing that would prevent us 
from using atomic operations inside of a spinlock and only reserving 
xlog space for commit records needs a spinlock because of the extra 
logic that cannot be combined into a single atomic operation. The idea 
as I understand Andres is that changing ReserveXLogInsertLocation() to 
use atomics gets rid not only of the spinlock, but also the LW-locks 
that protect it from a spinlock storm.


Keeping the current LW-lock plus spinlock architecture only for commit 
records but changing the actual reserve to atomic operations would 
affect small transactions more than large ones. Making all of this 
depend on "wal_level = logical" removes the argument that the two 
solutions are mutually exclusive. It does however make the code less 
maintenance friendly.



Best Regards, Jan

Re: per backend I/O statistics

Hi,

On Thu, Nov 14, 2024 at 03:31:51PM +0900, Michael Paquier wrote:
> On Fri, Nov 08, 2024 at 02:09:30PM +, Bertrand Drouvot wrote:
> > 1. one assertion in pgstat_drop_entry_internal() is not necessary true 
> > anymore 
> > with this new stat kind. So, adding an extra bool as parameter to take care 
> > of it.
> 
> Why?  I have to admit that the addition of this argument at this level
> specific to a new stats kind feels really weird and it looks like a
> layer violation, while this happens only when resetting the whole
> pgstats state because we have loaded a portion of the stats but the
> file loaded was in such a state that we don't have a consistent
> picture.

Yes. I agree that looks weird and I don't like it that much. But the assertion
is not true anymore. If you:

- change the arguments to false in the pgstat_drop_entry_internal() call in 
pgstat_drop_all_entries()
- start the engine
- kill -9 postgres
- restart the engine

You'll see the assert failing due to the startup process. I don't think it is
doing something wrong though, just populating its backend stats. Do you have
another view on it?

> > 2. a new struct "PgStat_Backend" is created and does contain the backend 
> > type.
> > The backend type is used for filtering purpose when the stats are displayed.
> 
> Is that necessary?  We can guess that from the PID with a join in
> pg_stat_activity.  pg_stat_get_backend_io() from 0004 returns a set of
> tuples for a specific PID, as well..

How would that work for someone doing "select * from 
pg_stat_get_backend_io()"?

Do you mean copy/paste part of the code from pg_stat_get_activity() into
pg_stat_get_backend_io() to get the backend type? That sounds like an idea,
I'll have a look at it.

> + /* save the bktype */
> + if (kind == PGSTAT_KIND_PER_BACKEND)
> + bktype = ((PgStatShared_Backend *) header)->stats.bktype;
> [...]
> + /* restore the bktype */
> + if (kind == PGSTAT_KIND_PER_BACKEND)
> + ((PgStatShared_Backend *) header)->stats.bktype = bktype;
> 
> Including the backend type results in these blips in
> shared_stat_reset_contents() which should not have anything related 
> to stats kinds and should remain neutral, as well.

Yeah, we can simply get rid of it if we remove the backend type in 
PgStat_Backend.

> pgstat_prep_per_backend_pending() is used only in pgstat_io.c, so I
> guess that it should be static in this file rather than declared in
> pgstat.h?

Good catch, thanks!

> +typedef struct PgStat_PendingIO
> 
> Perhaps this part should use a separate structure named
> "BackendPendingIO"?  The definition of the structure has to be in
> pgstat.h as this is the pending_size of the new stats kind.  It looks
> like it would be cleaner to keep PgStat_PendingIO local to
> pgstat_io.c, and define PgStat_PendingIO based on
> PgStat_BackendPendingIO?

I see what you meean, what about simply "PgStat_BackendPending" in pgstat.h?

> + /*
> +  * Do serialize or not this kind of stats.
> +  */
> + boolto_serialize:1;
> 
> Not sure that "serialize" is the best term that applies here.  For
> pgstats entries, serialization refers to the matter of writing their
> entries with a "serialized" name because they have an undefined number
> when stored locally after a reload.  I'd suggest to split this concept
> into its own patch, rename the flag as "write_to_file" (or what you
> think is suited), and also apply the flag in the fixed-numbered loop
> done in pgstat_write_statsfile() before going through the dshash.

Makes sense to create its own patch, will have a look.

> +  
> +   
> +
> + pg_stat_reset_single_backend_io_counters
> +
> +pg_stat_reset_single_backend_io_counters ( 
> int4 )
> +void
> 
> This should document that the input argument is a PID.

Yeap, will add.

> 
> Is pg_my_stat_io() the best name ever? 

Not 100% sure, but there is already "pg_my_temp_schema" so I thought that
pg_my_stat_io would not be that bad. Open to suggestions though.

> I'd suggest to just merge 0004 with 0001.

Sure.

> Structurally, it may be cleaner to keep all the callbacks and the
> backend-I/O specific logic into a separate file, perhaps
> pgstat_io_backend.c or pgstat_backend_io?

Yeah, I was wondering the same. What about pgstat_backend.c (that would contain
only one "section" dedicated to IO currently)?

> >  0002
> > 
> > Merge both IO stats flush callback. There is no need to keep both callbacks.
> > 
> > Merging both allows to save O(N^3) while looping on IOOBJECT_NUM_TYPES,
> > IOCONTEXT_NUM_TYPES and IOCONTEXT_NUM_TYPES.
> 
> Not sure to be a fan of that, TBH, still I get the performance
> argument of the matter.  Each stats kind has its own flush path, and
> this assumes that we'll never going to diverge in terms of the
> information maintained for each backend and the existing pg_stat_io.
> Perhaps there could be a divergence at some point?

Yeah perhaps, so in case of diver

Re: Refactoring postmaster's code to cleanup after child exit

2024-11-14 Thread Heikki Linnakangas


On 09/10/2024 23:40, Heikki Linnakangas wrote:

I pushed the first three patches, with the new test and one of the small
refactoring patches. Thanks for all the comments so far! Here is a new
version of the remaining patches.

Lots of little cleanups and changes here and there since the last
versions, but the notable bigger changes are:

- There is now a BackendTypeMask datatype, so that if you try to mix up
bitmasks and plain BackendType values, the compiler will complain.

- pmchild.c has been rewritten per feedback, so that the "pools" of
PMChild structs are more explicit. The size of each pool is only stated
once, whereas before the same logic was duplicated in
MaxLivePostmasterChildren() which calculates the number of slots and in
InitPostmasterChildSlots() which allocates them.

- In PostmasterStateMachine(), I combined the code to handle
PM_STOP_BACKENDS and PM_WAIT_BACKENDS. They are essentially the same
state, except that PM_STOP_BACKENDS first sends the signal to all the
child processes that it will then wait for. They both needed to build
the same bitmask of processes to signal or wait for; this eliminates the
duplication.


Made a few more changes since last patch version:

- Fixed initialization in pmchild.c in single-user and bootstrapping mode
- inlined assign_backendlist_entry() into its only caller; it wasn't 
doing much anymore

- cleaned up some leftovers in canAcceptConnections()
- Renamed some functions for clarity, fixed some leftover comments that 
still talked about Backend structs and BackendList


With those changes, committed. Thanks for the review!

--
Heikki Linnakangas
Neon (https://neon.tech)

Re: On non-Windows, hard depend on uselocale(3)