Re: Implement waiting for wal lsn replay: reloaded

2025-09-13 Thread Alexander Korotkov
Hi, Xuneng!

On Wed, Aug 27, 2025 at 6:54 PM Xuneng Zhou  wrote:
> I did a rebase for the patch to v8 and incorporated a few changes:
>
> 1) Updated documentation, added new tests, and applied minor code
> adjustments based on prior review comments.
> 2) Tweaked the initialization of waitReplayLSNState so that
> non-backend processes can call wait for replay.
>
> Started a new thread [1] and attached a patch addressing the polling
> issue in the function
> read_local_xlog_page_guts built on the infra of patch v8.
>
> [1] 
> https://www.postgresql.org/message-id/cabptf7vr99gz5gm_zybynd9mmnovw3pukbevivohkrvjw-d...@mail.gmail.com
>
> Feedbacks welcome.

Thank you for reviewing and revising this patch.

I see you've integrated most of the points expressed in [1].  I went
through them and integrated the rest myself, except for this one:

> 11) The synopsis might read more clearly as:
> - WAIT FOR LSN '' [ TIMEOUT  ] [ 
> NO_THROW ]

I didn't find examples of us doing similar things elsewhere in the
docs.  This is why I decided to leave this place as it currently is.

Also, I found some mess in typedefs.list.  I've reverted the changes
to typedefs.list and re-indented the sources.

I'd like to ask your opinion on the way this feature is implemented in
terms of grammar: generic parsing is implemented in gram.y, and the
rest is done in wait.c.  I think this approach should minimize the
additional keywords and states in the parsing code.  It comes at the
price of more complex code in wait.c, but I think that is a fair price.

Links.
1. 
https://www.postgresql.org/message-id/CABPTF7VsoGDMBq34MpLrMSZyxNZvVbgH6-zxtJOg5AwOoYURbw%40mail.gmail.com

--
Regards,
Alexander Korotkov
Supabase


v9-0001-Implement-WAIT-FOR-command.patch
Description: Binary data


Re: Fix missing EvalPlanQual recheck for TID scans

2025-09-13 Thread Sophie Alpert
On Tue, Sep  9, 2025 at 10:18 PM, Chao Li  wrote:
> No, that’s not true. If you extend David’s procedure and use 3 sessions to 
> reproduce the problem, s1 update (0,1), s2 update (0,2) and s3 update (0,1) 
> or (0,2), and trace the backend process of s3, you will see every time when 
> TidRecheck() is called, “node” is brand new, so it has to call 
> TidListEval(node) every time.

After more investigation I agree you are correct here; in a single query, 
ReScan is called once for each subsequent tuple being rechecked. (ExecUpdate 
calls EvalPlanQualBegin which always sets rcplanstate->chgParam.)

On Wed, Sep 10, 2025 at 10:30 PM, Chao Li  wrote:
> But, why can't we make TidRecheck() simply return FALSE?
> 
> Looks like the only case where TidRecheck() is called is when a concurrent 
> transaction updated the row, and in that case, ctid must have been changed.

Although returning FALSE is clever, it is only correct if several other 
unstated preconditions for the function hold, which I am loath to rely on.

And indeed, like I mentioned in my previous message, my isolation test 
`permutation tid1 tidsucceed2 c1 c2 read` from eval-plan-qual.spec in my patch 
will fail if Recheck were to return false in this case. Though somewhat 
contrived, you can imagine this happening with multiple sessions driven by the 
same application:

setup: two rows exist with ctid=(0,1) and (0,2)
S1: BEGIN;
S2: BEGIN;
S1: UPDATE WHERE ctid=(0,1) RETURNING ctid;
-- returns (0,3), which the application uses in the next query from another session:
S2: UPDATE WHERE ctid=(0,1) OR ctid=(0,3); -- statement snapshot sees (0,1); recheck will see (0,3) and should pass
S1: COMMIT;
S2: COMMIT;

I would not defend this as good code from an application developer but the 
behavior is observable. So I understand it would be best to match the 
enable_tidscan = off behavior, which my existing strategy more verifiably does. 
Of course if the team disagrees with me then I will defer to everyone's better 
judgement.

I hope it is rare that many tuples need to be rechecked in a single query, but 
if this is common, then I would suggest that it would be better to rework the 
generic EPQ machinery so that EPQ nodes do not need to be reset for every 
row, rather than depending on subtle implicit guarantees in the TidRecheck code.

Sophie




Re: shmem_startup_hook called twice on Windows

2025-09-13 Thread Nathan Bossart
I quickly put together a patch for the stuff we've discussed in this
thread.  WDYT?

-- 
nathan
>From 11289e1e69fc7c6fdbe5a73483efc2daf1197ec9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 8 Sep 2025 16:08:00 -0500
Subject: [PATCH v1 1/1] Fix shmem_startup_hook documentation.

---
 doc/src/sgml/xfunc.sgml| 10 ++
 src/test/modules/test_slru/test_slru.c | 21 +++--
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index da21ef56891..80d862ff9c2 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -3668,11 +3668,13 @@ LWLockRelease(AddinShmemInitLock);
 
   shmem_startup_hook provides a convenient place for the
   initialization code, but it is not strictly required that all such code
-  be placed in this hook.  Each backend will execute the registered
-  shmem_startup_hook shortly after it attaches to shared
-  memory.  Note that add-ins should still acquire
+  be placed in this hook.  In some builds, each backend will execute the
+  registered shmem_startup_hook shortly after it
+  attaches to shared memory, so add-ins should still acquire
   AddinShmemInitLock within this hook, as shown in the
-  example above.
+  example above.  In other builds, the postmaster process will execute the
+  shmem_startup_hook once, and each backend will
+  automatically inherit the pointers to shared memory.
  
 
  
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 8c03674..f41422cca7d 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -219,8 +219,8 @@ test_slru_shmem_startup(void)
 */
const bool  long_segment_names = true;
const char  slru_dir_name[] = "pg_test_slru";
-   int test_tranche_id;
-   int test_buffer_tranche_id;
+   int test_tranche_id = -1;
+   int test_buffer_tranche_id = -1;
 
if (prev_shmem_startup_hook)
prev_shmem_startup_hook();
@@ -231,10 +231,19 @@ test_slru_shmem_startup(void)
 */
(void) MakePGDirectory(slru_dir_name);
 
-   /* initialize the SLRU facility */
-   test_tranche_id = LWLockNewTrancheId("test_slru_tranche");
-
-   test_buffer_tranche_id = LWLockNewTrancheId("test_buffer_tranche");
+   /*
+    * Initialize the SLRU facility.
+    *
+    * In EXEC_BACKEND builds, the shmem_startup_hook is called in the
+    * postmaster and in each backend, but we only need to generate the LWLock
+    * tranches once.  Note that these IDs are not used by SimpleLruInit() in
+    * the IsUnderPostmaster (i.e., backend) case.
+    */
+   if (!IsUnderPostmaster)
+   {
+       test_tranche_id = LWLockNewTrancheId("test_slru_tranche");
+       test_buffer_tranche_id = LWLockNewTrancheId("test_buffer_tranche");
+   }
 
TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
SimpleLruInit(TestSlruCtl, "TestSLRU",
-- 
2.39.5 (Apple Git-154)



Re: Fix missing EvalPlanQual recheck for TID scans

2025-09-13 Thread Sophie Alpert
On Sat, Sep 13, 2025 at  3:12 PM, Sophie Alpert  wrote:
> And indeed, like I mentioned in my previous message, my isolation test 
> `permutation tid1 tidsucceed2 c1 c2 read` from eval-plan-qual.spec in 
> my patch will fail if Recheck were to return false in this case. Though 
> somewhat contrived, you can imagine this happening with multiple 
> sessions driven by the same application:

Another case where returning FALSE does not give the correct behavior is when 
two relations are involved, only one of which is modified:

S1: BEGIN;
S2: BEGIN;
S1: UPDATE accounts SET balance = balance + 100 WHERE ctid = '(0,1)' RETURNING accountid, balance;
S2: SELECT * FROM accounts JOIN accounts_ext USING (accountid) WHERE accounts_ext.ctid = '(0,1)' FOR UPDATE OF accounts;
S1: COMMIT;
S2: COMMIT;

In my patch the S2 query correctly returns one row, whereas with your proposed 
change it incorrectly returns none.

accountid|balance|balance2|balance|other|newcol|newcol2
-+---++---+-+--+---
checking |700|1400|600|other|42|   

Sophie




Re: Use WALReadFromBuffers in more places

2025-09-13 Thread Bharath Rupireddy
Hi,

On Tue, Oct 15, 2024 at 1:22 AM Jingtang Zhang  wrote:
>
> I've been back to this patch for a while recently. I witness that if a WAL
> writer works fast, the already flushed WAL buffers will be zeroed out and
> re-initialized for future use by AdvanceXLInsertBuffer in
> XLogBackgroundFlush, so that WALReadFromBuffers will miss even though the
> space of WAL buffer is enough. It is much more unfriendly for logical
> walsenders than physical walsenders, because logical ones consume WAL
> slower than physical ones due to the extra decoding phase. Seems that the aim
> of AdvanceXLInsertBuffer in WAL writer contradicts with our reading from
> WAL buffer. Any thoughts?

Thanks for looking at this. Yes, the WAL writers can zero out flushed
buffers before WALReadFromBuffers gets to them. However,
WALReadFromBuffers was intentionally designed as an opportunistic
optimization - it's a "try this first, quickly" approach before
falling back to reading from WAL files. The no-locks design ensures it
never gets in the way of backends generating WAL, which is critical
for overall system performance.

I rebased and attached the v3 patch. I discarded the test extension
patch that demonstrated WALReadFromBuffers' behavior (i.e., waiting
for WAL to be fully copied to WAL buffers with
WaitXLogInsertionsToFinish), as I believe the comment at the top of
WALReadFromBuffers is sufficient documentation. I can reintroduce the
test extension if there's interest.

Thoughts?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v3-0001-Use-WALReadFromBuffers-in-more-places.patch
Description: Binary data


Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-09-13 Thread Peter Geoghegan
On Wed, Sep 10, 2025 at 3:41 PM Natalya Aksman  wrote:
> Fantastic, the patch is working, it fixes our issue!

I pushed this patch just now.

Thanks
-- 
Peter Geoghegan




Re: PostgreSQL 18 GA press release draft

2025-09-13 Thread Jonathan S. Katz

On 9/12/25 2:18 PM, Masahiko Sawada wrote:



I found a typo;

s/gen_rand_uuid()/gen_random_uuid()/


Good catch - thanks!

Jonathan


OpenPGP_signature.asc
Description: OpenPGP digital signature


remove unnecessary include in src/backend/commands/policy.c

2025-09-13 Thread jian he
hi.

in src/backend/commands/policy.c, i found that
#include "access/htup.h"
#include "access/htup_details.h"
#include "catalog/catalog.h"
#include "nodes/pg_list.h"
#include "parser/parse_node.h"
#include "utils/array.h"

are not necessary includes, so I removed them.

We can also remove
#include "access/relation.h"
and replace relation_open with table_open, since we already do the relkind
check in RangeVarCallbackForPolicy.
From 5ba1586d0844ec40674338ad920b12a61dd646ed Mon Sep 17 00:00:00 2001
From: jian he 
Date: Sun, 14 Sep 2025 14:35:33 +0800
Subject: [PATCH v1 1/1] remove unnecessary include in policy.c

discussion: https://postgr.es/m/
---
 src/backend/commands/policy.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 83056960fe4..ce3eb10ae86 100644
--- a/src/backend/commands/policy.c
+++ b/src/backend/commands/policy.c
@@ -13,12 +13,8 @@
 #include "postgres.h"
 
 #include "access/genam.h"
-#include "access/htup.h"
-#include "access/htup_details.h"
-#include "access/relation.h"
 #include "access/table.h"
 #include "access/xact.h"
-#include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
 #include "catalog/namespace.h"
@@ -28,15 +24,12 @@
 #include "catalog/pg_type.h"
 #include "commands/policy.h"
 #include "miscadmin.h"
-#include "nodes/pg_list.h"
 #include "parser/parse_clause.h"
 #include "parser/parse_collate.h"
-#include "parser/parse_node.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteManip.h"
 #include "rewrite/rowsecurity.h"
 #include "utils/acl.h"
-#include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
@@ -630,7 +623,7 @@ CreatePolicy(CreatePolicyStmt *stmt)
 		stmt);
 
 	/* Open target_table to build quals. No additional lock is necessary. */
-	target_table = relation_open(table_id, NoLock);
+	target_table = table_open(table_id, NoLock);
 
 	/* Add for the regular security quals */
 	nsitem = addRangeTableEntryForRelation(qual_pstate, target_table,
@@ -752,7 +745,7 @@ CreatePolicy(CreatePolicyStmt *stmt)
 	free_parsestate(qual_pstate);
 	free_parsestate(with_check_pstate);
 	systable_endscan(sscan);
-	relation_close(target_table, NoLock);
+	table_close(target_table, NoLock);
 	table_close(pg_policy_rel, RowExclusiveLock);
 
 	return myself;
@@ -805,7 +798,7 @@ AlterPolicy(AlterPolicyStmt *stmt)
 		RangeVarCallbackForPolicy,
 		stmt);
 
-	target_table = relation_open(table_id, NoLock);
+	target_table = table_open(table_id, NoLock);
 
 	/* Parse the using policy clause */
 	if (stmt->qual)
@@ -1082,7 +1075,7 @@ AlterPolicy(AlterPolicyStmt *stmt)
 
 	/* Clean up. */
 	systable_endscan(sscan);
-	relation_close(target_table, NoLock);
+	table_close(target_table, NoLock);
 	table_close(pg_policy_rel, RowExclusiveLock);
 
 	return myself;
@@ -1110,7 +1103,7 @@ rename_policy(RenameStmt *stmt)
 		RangeVarCallbackForPolicy,
 		stmt);
 
-	target_table = relation_open(table_id, NoLock);
+	target_table = table_open(table_id, NoLock);
 
 	pg_policy_rel = table_open(PolicyRelationId, RowExclusiveLock);
 
@@ -1189,7 +1182,7 @@ rename_policy(RenameStmt *stmt)
 	/* Clean up. */
 	systable_endscan(sscan);
 	table_close(pg_policy_rel, RowExclusiveLock);
-	relation_close(target_table, NoLock);
+	table_close(target_table, NoLock);
 
 	return address;
 }
-- 
2.34.1



Re: meson's in-tree libpq header search order vs -Dextra_include_dirs

2025-09-13 Thread Thomas Munro
Added to commitfest: https://commitfest.postgresql.org/patch/6056/

I'm having to carry this patch in all my development branches for now
or I can't build on one of my regular dev machines, so I'm quite keen
to get this into the tree soon and would back-patch to 16.

I gather no one else is affected yet, but I assume you can see the
problem on a Mac by installing PostgreSQL with Homebrew/MacPorts, or
on CI if you do this to .cirrus.tasks.yml:

   setup_additional_packages_script: |
-#pkg install -y ...
+pkg install -y postgresql17-client




Re: How can end users know the cause of LR slot sync delays?

2025-09-13 Thread Shlok Kyal
Hi Amit, Ashutosh,

On Fri, 12 Sept 2025 at 17:28, Ashutosh Sharma  wrote:
>
> Hi Amit,
>
> On Fri, Sep 12, 2025 at 4:24 PM Amit Kapila  wrote:
> >
> > On Fri, Sep 12, 2025 at 1:07 PM Ashutosh Sharma  
> > wrote:
> > >
> > > On Mon, Sep 8, 2025 at 9:52 AM Ashutosh Sharma  
> > > wrote:
> > > >
> > > > I think we can do that, since sync_skip_reason appears to be a
> > > > descriptive metadata rather than statistical data like
> > > > slot_sync_skip_count and last_slot_sync_skip. However, it's also true
> > > > that all three pieces of data are transient by nature - they will just
> > > > be present in the runtime.
> > > >
> > >
> > > After spending some more time on this, I found that maintaining
> > > sync_skip_reason in pg_replication_slots would make the code changes a
> > > bit messy and harder to maintain.
> > >
> >
> > What exactly is your worry? It seems more logical to have
> > sync_skip_reason in pg_replication_slots and other two in
> > pg_stat_replication_slots as the latter is purely a stats view and the
> > sync_skip_count/last_sync_skip suits there better.
> >
>
> The code changes for adding the skip reason to pg_replication_slots
> feel a bit hacky compared to the approach for incorporating all three
> pieces of information in pg_stat_replication_slots. I thought many
> might prefer simplicity over hackiness, which is why having everything
> in pg_stat_replication_slots could be more acceptable. That said, we
> could maybe prepare a POC patch with this approach as well, compare
> the two, and then decide which path to take.
>

Here are my thoughts on this:

I believe that if we decide to keep skip_reason in
pg_stat_replication_slots, it should mean "reason for the last slot sync
skip", as that makes more sense for a statistics view.
And if we decide to keep skip_reason in pg_replication_slots, it would
be more appropriate to reflect the latest slot state (it should display
skip_reason only if the current slot sync cycle was skipped).
This is my observation based on the behaviour of the current columns in
these views. Thoughts?

I have also attached POC patches for both approaches as per discussion above.
v2_approach1 : It adds all columns 'slot_sync_skip_reason',
'slot_sync_skip_count' and 'last_slot_sync_skip' to
pg_stat_replication_slots
v2_approach2 : It adds column 'slot_sync_skip_reason' to
pg_replication_slots and columns  'slot_sync_skip_count' and
'last_slot_sync_skip' to pg_stat_replication_slots

Thanks,
Shlok Kyal


v2_approach1-0001-Add-stats-related-to-slot-sync-skip.patch
Description: Binary data


v2_approach2-0001-Add-stats-related-to-slot-sync-skip.patch
Description: Binary data


Re: pg_dump does not dump domain not-null constraint's comments

2025-09-13 Thread Noah Misch
On Tue, Jul 15, 2025 at 03:40:38PM +0200, Álvaro Herrera wrote:
> On 2025-Jul-15, jian he wrote:
> > we should let:
> > dumpConstraint handles dumping separate "NOT VALID" domain constraints along
> > with their comments.
> > dumpDomain: handles dumping "inlined" valid (not separate) domain 
> > constraints
> > together with their comments.

This became commit 0858f0f.

> @@ -18487,8 +18493,25 @@ dumpConstraint(Archive *fout, const ConstraintInfo *coninfo)
>  								  .description = "CHECK CONSTRAINT",
>  								  .section = SECTION_POST_DATA,
>  								  .createStmt = q->data,
>  								  .dropStmt = delq->data));
> +
> +		if (coninfo->dobj.dump & DUMP_COMPONENT_COMMENT)
> +		{
> +			char	   *qtypname;
> +
> +			PQExpBuffer conprefix = createPQExpBuffer();
> +			qtypname = pg_strdup(fmtId(tyinfo->dobj.name));
> +
> +			appendPQExpBuffer(conprefix, "CONSTRAINT %s ON DOMAIN",
> +							  fmtId(coninfo->dobj.name));
> +
> +			dumpComment(fout, conprefix->data, qtypname,
> +						tyinfo->dobj.namespace->dobj.name,
> +						tyinfo->rolname,
> +						coninfo->dobj.catId, 0, tyinfo->dobj.dumpId);

The last argument gives the dump object on which the comment has a dependency.
Since this is the case of a separately-dumped constraint, the comment needs to
depend on that constraint (coninfo), not on the domain (tyinfo):

-						coninfo->dobj.catId, 0, tyinfo->dobj.dumpId);
+						coninfo->dobj.catId, 0, coninfo->dobj.dumpId);

I didn't encounter a failure from this, but sufficient restore parallelism
might reach a failure.  A failure would look like a "does not exist" error in
the COMMENT command, due to the constraint not yet existing.
dumpTableConstraintComment() is an older case that optimally handles the last
dumpComment() arg.

In the absence of objections, I'll make it so.




Re: RFC: extensible planner state

2025-09-13 Thread Melanie Plageman
On Mon, Aug 25, 2025 at 3:47 PM Robert Haas  wrote:
>
> 0001 is the core "private state" patch for PlannerGlobal, PlannerInfo,
> and RelOptInfo. It is unchanged since v2, and contains only the fix
> for memory context handling since v1. However, I've now tested it, and
> I think it's OK to commit, barring further review comments.

A few nits on 0001

> From 1aa43c063edb325548fa3db30b9991bf0831f6f5 Mon Sep 17 00:00:00 2001
> From: Robert Haas 
> Date: Mon, 18 Aug 2025 16:11:10 -0400
> Subject: [PATCH v3 1/5] Allow private state in certain planner data
> + * extendplan.c
> + *  Extend core planner objects with additional private state
> + *
> + * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994-5, Regents of the University of California
> + *
> + * The interfaces defined in this file make it possible for loadable
> + * modules to their own private state inside of key planner data

You're missing a word above -- like "modules to store their own"

> + * uses set_join_pathlist_hook can arrange to compute a key intermediate
> + * result once per joinrel rather than on every call.
> + *
> + * IDENTIFICATION
> + *  src/backend/commands/extendplan.c

This path does not reflect where you put the file

> + *
> +int
> +GetPlannerExtensionId(const char *extension_name)
> +{
<--snip-->
> +
> +/* If there's an array but it's currently full, expand it. */
> +if (PlannerExtensionNamesAssigned >= PlannerExtensionNamesAllocated)
> +{
> +        int     i = pg_nextpower2_32(PlannerExtensionNamesAssigned + 1);

Storing a uint32 in a signed int that could be 32-bit stuck out to me.

> +
> +PlannerExtensionNameArray = (const char **)
> +repalloc(PlannerExtensionNameArray, i * sizeof(char *));
> +PlannerExtensionNamesAllocated = i;
> +}
> +
> +/* Assign and return new ID. */
> +    PlannerExtensionNameArray[PlannerExtensionNamesAssigned] = extension_name;

Since you don't copy the extension name, it might be worth mentioning
that the caller should provide a literal or at least something that
will be around later.

> diff --git a/src/include/optimizer/extendplan.h 
> b/src/include/optimizer/extendplan.h
> new file mode 100644
> +extern void SetPlannerInfoExtensionState(PlannerInfo *root, int extension_id,
> + void *opaque);
> +extern void SetRelOptInfoExtensionState(RelOptInfo *root, int extension_id,
> +void *opaque);

You used a different variable name here than in the implementation for
the RelOptInfo parameter.

> 0002 removes join_search_private, as before. Whether it makes sense to
> go ahead with this is debatable. Needs review, and needs an opinion on
> whether this should be considered a PoC only (and discarded) or
> something that should go forward to commit.

Is there a downside to going forward with it?

As for the code itself, I thought assumeReplanning was a bit vague
since it seems like whether or not replanning is allowed could come up
outside of join order search -- but perhaps that's okay.

> For another example of how these patches could be used, see
> http://postgr.es/m/CA+TgmoZ=6jji9tgyzcm33vads46hfkyz6aju_salt6gfs-i...@mail.gmail.com
> and in particular 0001 and 0002. This patch set's planner_setup_hook
> call would go right after those patches compute default_ssa_mask and
> default_jsa_mask, allowing the hook to override those values.

So, are you saying that you would rewrite the patches in that set to
use the infrastructure in this set -- e.g. remove that set's
PlannerGlobal.default_jsa_mask and instead put it in
PlannerGlobal.extension_state? Or am I misunderstanding?

- Melanie




Re: Eager aggregation, take 3

2025-09-13 Thread Richard Guo
On Sat, Sep 13, 2025 at 3:48 AM Robert Haas  wrote:
> On Fri, Sep 12, 2025 at 5:34 AM Richard Guo  wrote:
> > I really like this idea.  Currently, aggtransspace represents an
> > estimate of the transition state size provided by the aggregate
> > definition.  If it's set to zero, a default estimate based on the
> > state data type is used.  Negative values currently have no defined
> > meaning.  I think it makes perfect sense to reuse this field so that
> > a negative value indicates that the transition state data can grow
> > unboundedly in size.
> >
> > Attached 0002 implements this idea.  It requires fewer code changes
> than I expected.  This is mainly because our current code uses
> > aggtransspace in such a way that if it's a positive value, that value
> > is used as it's provided by the aggregate definition; otherwise, some
> > heuristics are applied to estimate the size.  For the aggregates that
> > accumulate input rows (e.g., array_agg, string_agg), I don't currently
> > have a better heuristic for estimating their size, so I've chosen to
> > keep the current logic.  This won't regress anything in estimating
> > transition state data size.

> This might be OK, but it's not what I was suggesting: I was suggesting
> trying to do a calculation like space_used = -aggtransspace *
> rowcount, not just using a <0 value as a sentinel.

I've considered your suggestion, but I'm not sure I'll adopt it in the
end.  Here's why:

1) At the point where we check whether any aggregates might pose a
risk of excessive memory usage during partial aggregation, row count
information is not yet available.  You could argue that we could
reorganize the logic to perform this check after we've had the row
count, but that seems quite tricky.  If I understand correctly, the
"rowcount" in this context actually means the number of rows within
one partial group.  That would require us to first decide on the
grouping expressions for the partial aggregation, then compute the
group row counts, then estimate space usage, and only then decide
whether memory usage is excessive and fall back.  This would come
quite late in planning and adds nontrivial overhead, compared to the
current approach which checks at the very beginning.

2) Even if we were able to estimate space usage based on the number of
rows per partial group and determined that memory usage seems
acceptable, we still couldn't guarantee that the transition state data
won't grow excessively after further joins.  Joins can multiply
partial aggregates, potentially causing a blowup in memory usage even
if the initial estimate seemed safe.

3) I don't think "-aggtransspace * rowcount" reflects the true memory
footprint for aggregates that accumulate input rows.  For example,
what if we have an aggregate like string_agg(somecolumn, 'a very long
delimiter')?

4) AFAICS, the main downside of the current approach compared to yours
is that it avoids pushing down aggregates like string_agg() that
accumulate input rows, whereas your suggestion might allow pushing
them down in some cases where we *think* it wouldn't blow up memory.
You might argue that the current implementation is over-conservative.
But I prefer to start safe.

That said, I appreciate you proposing the idea of reusing
aggtransspace, although I ended up using it in a different way than
you suggested.

- Richard




Mark ItemPointer arguments as const throughout

2025-09-13 Thread Chao Li
Hi Hackers,

This is a follow-up to 991295f. I searched over src/ and marked all
ItemPointer arguments as const wherever possible.

I made a clean build with no warnings, and "make check" also passes. I
will create a patch entry in the CF to see if CI passes.

Best regards,
Chao Li (Evan)
-
HighGo Software Co., Ltd.
https://www.highgo.com/


v1-0001-Mark-ItemPointer-arguments-as-const-thoughoutly.patch
Description: Binary data


Re: Eager aggregation, take 3

2025-09-13 Thread Robert Haas
On Fri, Sep 12, 2025 at 5:34 AM Richard Guo  wrote:
> I really like this idea.  Currently, aggtransspace represents an
> estimate of the transition state size provided by the aggregate
> definition.  If it's set to zero, a default estimate based on the
> state data type is used.  Negative values currently have no defined
> meaning.  I think it makes perfect sense to reuse this field so that
> a negative value indicates that the transition state data can grow
> unboundedly in size.
>
> Attached 0002 implements this idea.  It requires fewer code changes
> than I expected.  This is mainly because that our current code uses
> aggtransspace in such a way that if it's a positive value, that value
> is used as it's provided by the aggregate definition; otherwise, some
> heuristics are applied to estimate the size.  For the aggregates that
> accumulate input rows (e.g., array_agg, string_agg), I don't currently
> have a better heuristic for estimating their size, so I've chosen to
> keep the current logic.  This won't regress anything in estimating
> transition state data size.

This might be OK, but it's not what I was suggesting: I was suggesting
trying to do a calculation like space_used = -aggtransspace *
rowcount, not just using a <0 value as a sentinel.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: How can end users know the cause of LR slot sync delays?

2025-09-13 Thread Ashutosh Sharma
Hi Amit,

On Fri, Sep 12, 2025 at 4:24 PM Amit Kapila  wrote:
>
> On Fri, Sep 12, 2025 at 1:07 PM Ashutosh Sharma  wrote:
> >
> > On Mon, Sep 8, 2025 at 9:52 AM Ashutosh Sharma  
> > wrote:
> > >
> > > I think we can do that, since sync_skip_reason appears to be a
> > > descriptive metadata rather than statistical data like
> > > slot_sync_skip_count and last_slot_sync_skip. However, it's also true
> > > that all three pieces of data are transient by nature - they will just
> > > be present in the runtime.
> > >
> >
> > After spending some more time on this, I found that maintaining
> > sync_skip_reason in pg_replication_slots would make the code changes a
> > bit messy and harder to maintain.
> >
>
> What exactly is your worry? It seems more logical to have
> sync_skip_reason in pg_replication_slots and other two in
> pg_stat_replication_slots as the latter is purely a stats view and the
> sync_skip_count/last_sync_skip suits there better.
>

The code changes for adding the skip reason to pg_replication_slots
feel a bit hacky compared to the approach for incorporating all three
pieces of information in pg_stat_replication_slots. I thought many
might prefer simplicity over hackiness, which is why having everything
in pg_stat_replication_slots could be more acceptable. That said, we
could maybe prepare a POC patch with this approach as well, compare
the two, and then decide which path to take.

--
With Regards,
Ashutosh Sharma.




Re: Error with DEFAULT VALUE in temp table

2025-09-13 Thread Sergey Shinderuk

On 13.09.2025 00:19, Tom Lane wrote:

Fixed (and tested) in the attached.


Great! Thank you.


> So that "else" action was unreachable, and the code failed
> to set "istemp" true for its own temp schema.

As for dropping my own temp schema, it's still a bit inconsistent (as it 
was before):


postgres=# select pg_my_temp_schema()::regnamespace;
 pg_my_temp_schema
---
 pg_temp_0
(1 row)
postgres=# drop schema pg_temp_0;
DROP SCHEMA

postgres=# select * from dropped_objects where object_type = 'schema' and is_temporary \gx

-[ RECORD 1 ]---+--
n   | 7
classid | 2615
objid   | 16398
objsubid| 0
original| t
normal  | f
is_temporary| t
object_type | schema
schema_name |
object_name | pg_temp_0
object_identity | pg_temp
address_names   | {pg_temp}
address_args| {}

object_identity is pg_temp, but object_name is pg_temp_0. But maybe 
that's okay. Anyway, I don't think that dropping my own temp schema 
makes sense.



Also I noticed that schema_name for temp functions doesn't match
object_identity (pg_temp vs pg_temp_1):


postgres=# create function pg_temp.bar(int) returns int as 'select $1' language sql;

CREATE FUNCTION
postgres=# drop function pg_temp.bar(int);
DROP FUNCTION

postgres=# select * from dropped_objects where object_type = 'function' and is_temporary \gx

-[ RECORD 1 ]---+---
n   | 8
classid | 1255
objid   | 16412
objsubid| 0
original| t
normal  | f
is_temporary| t
object_type | function
schema_name | pg_temp
object_name |
object_identity | pg_temp_1.bar(integer)
address_names   | {pg_temp,bar}
address_args| {integer}

There should be a call to get_namespace_name_or_temp somewhere, I guess.

If you say this should be fixed, I can come up with a patch later. But 
maybe it's trivial.



Thanks again!

--
Sergey Shinderuk    https://postgrespro.com/




Re: plan shape work

2025-09-13 Thread Richard Guo
On Sat, Sep 13, 2025 at 12:08 AM Tom Lane  wrote:
> After thinking about this for awhile, I believe that Richard and I
> each had half of the right solution ;-).  Let me propose some new
> terminology in hopes of clarifying matters:
>
> * A join plan node "starts" an outer join if it performs the
> null-extension step corresponding to that OJ (specifically,
> if it is the first join doing null-extension over the minimum
> RHS of that OJ).
>
> * A join plan node "completes" an outer join if its output
> nulls all the values that that OJ should null when done
> according to syntactic order.

This new notion makes a lot of sense to me.  I feel that it could help
us optimize some existing logic, or at least make certain parts of it
easier to understand.  It might be worth adding to the README, maybe
under the section "Relation Identification and Qual Clause Placement",
where we explain the idea behind pushed-down joins.

- Richard




Re: plan shape work

2025-09-13 Thread Richard Guo
On Sat, Sep 13, 2025 at 2:32 AM Robert Haas  wrote:
> We sort of got started down this path because, reviewing v4-0003,
> Richard commented that I might be able to sanity-check
> something-or-other about RTE_JOIN RTIs instead of just focusing on
> baserels. From there, this sub-thread has turned into a discussion of
> exactly what that sanity check should be.

Yeah, when commenting on v4-0003 about including outer join relids in
the assertion, I had a feeling that it might take the discussion off
topic -- sorry that it did.  That's why I suggested moving this part
of discussion to a separate thread.

I still think that cross-checking outer join relids isn't a
requirement for committing your patches, so I'm totally fine if you
end up with a version that doesn't assert outer join relids.

- Richard




Re: POC: make mxidoff 64 bits

2025-09-13 Thread Alexander Korotkov
Hello Maxim!

On Thu, Sep 11, 2025 at 11:58 AM Maxim Orlov  wrote:
>
> Once again, @ 8191e0c16a

Thank you for your work on this subject.  Multixact members can really
grow much faster than multixact offsets, and avoiding wraparound just
here might make sense.  At the same time, making multixact offsets
64-bit is local and doesn't require changing the tuple xmin/xmax
interpretation.

I went through the patchset.  The shape does not look bad, but I have
a concern about the size of the multixact offsets.  As I understand it,
this patchset doubles the size of multixact offsets: each offset grows
from 32 bits to 64 bits.  This seems quite a price for avoiding the
multixact members' wraparound.

We can try to squeeze multixact offsets, given that they form an
ascending sequence, each increased by one multixact's size.  But how
many members can a multixact contain at maximum?  Looking at
MultiXactIdExpand(), I get
that we're keeping locks from in-progress transactions, and committed
non-lock transactions (I guess the latter could be only one).  The
number of transactions running by backends should fit MAX_BACKENDS
(2^18 - 1), and the number of prepared transactions should also fit
MAX_BACKENDS. So, I guess we can cap the total number of one multixact
members to 2^24.

Therefore, we can change each group of 8 32-bit multixact offsets
(32 bytes) to one 64-bit offset plus 7 24-bit offset increments
(29 bytes).  The actual multixact offsets can be calculated on the
fly; the overhead shouldn't be significant.  What do you think?

--
Regards,
Alexander Korotkov
Supabase




Re: [PATCH] Add tests for Bitmapset

2025-09-13 Thread Greg Burd

> On Sep 11, 2025, at 9:36 PM, Michael Paquier  wrote:
> 
> On Thu, Sep 11, 2025 at 06:56:07AM -0400, Greg Burd wrote:
>> Just for reference I started this not to increase coverage, which is a good
>> goal just not the one I had.  I was reviewing the API and considering some
>> changes based on other work I've done.  Now that I see how deeply baked in
>> this code is I think that's unlikely.  Maybe something else distinct for
>> bitmaps over 64-bit space at some point will be useful.  I wrote this code
>> just to capture the API in test form.
> 
> How much does this measure in terms of numbers produced by
> coverage-html (see [1])?  The paths taken don't always matter as it
> can also be important to check combinations of code paths that are
> taken by other tests when checking after edge cases, but that would
> give an idea of gain vs extra runtime.  Not objecting to your patch,
> just being curious as I am not seeing any numbers posted on this
> thread.
> 
> [1]: https://www.postgresql.org/docs/devel/regress-coverage.html
> --
> Michael


Sawada-san, Michael,

Thank you both for the push to measure.  Before the patch as it stands
now, the coverage for src/backend/nodes/bitmapset.c is 63.5%, and after
it is 66.5%.  Not an amazing difference, but something.  I guess I
expected this to be higher given the degree to which this datatype is
used.

I'll review the gaps in coverage and update the tests.  I'll look for a
way to add meaningful randomization.

-greg

Re: Parallel Apply

2025-09-13 Thread Abhi Mehta
Hi Amit,


Really interesting proposal! I've been thinking through some of the
implementation challenges:


*On the memory side:* That hash table tracking RelationId and
ReplicaIdentity could get pretty hefty under load. Maybe bloom filters
could help with the initial screening? Also wondering about size caps
with some kind of LRU cleanup when things get tight.


*Worker bottleneck:* This is the tricky part - hundreds of active
transactions but only a handful of workers. Seems like we'll hit
serialization anyway when workers are maxed out. What about spawning
workers dynamically (within limits) or having some smart queuing for
when we're worker-starved?



*Alternative approach (if it can be considered):* Rather than full
parallelization, break transaction processing into overlapping stages:


• *Stage 1:* Parse WAL records

• *Stage 2:* Analyze dependencies

• *Stage 3:* Execute changes

• *Stage 4:* Commit and track progress


This creates a pipeline where Transaction A executes changes while
Transaction B analyzes dependencies and Transaction C parses data - all
happening simultaneously in different stages.


The out-of-order commit option you mentioned makes sense for apps handling
integrity themselves.


*Question:* What's the fallback behavior when dependency detection fails?



Thanks,

Abhishek Mehta

On Sat, Sep 13, 2025 at 5:08 PM Amit Kapila  wrote:

> Hi,
>
> Background and Motivation
> -
> In high-throughput systems, where hundreds of sessions generate data
> on the publisher, the subscriber's apply process often becomes a
> bottleneck due to the single apply worker model. While users can
> mitigate this by creating multiple publication-subscription pairs,
> this approach has scalability and usability limitations.
>
> Currently, PostgreSQL supports parallel apply only for large streaming
> transactions (streaming=parallel). This proposal aims to extend
> parallelism to non-streaming transactions, thereby improving
> replication performance in workloads dominated by smaller, frequent
> transactions.
>
> Design Overview
> 
> To safely parallelize non-streaming transactions, we must ensure that
> transaction dependencies are respected to avoid failures and
> deadlocks. Consider the following scenarios to understand it better:
> (a) Transaction failures: Say, if we insert a row in the first
> transaction and update it in the second transaction on the publisher,
> then allowing the subscriber to apply both in parallel can lead to
> failure in the update; (b) Deadlocks - allowing transactions that
> update the same set of rows in a table in the opposite order in
> parallel can lead to deadlocks.
>
> The core idea is that the leader apply worker ensures the following:
> a. Identifies dependencies between transactions. b. Coordinates
> parallel workers to apply independent transactions concurrently. c.
> Ensures correct ordering for dependent transactions.
>
> Dependency Detection
> 
> 1. Basic Dependency Tracking: Maintain a hash table keyed by
> (RelationId, ReplicaIdentity) with the value as the transaction XID.
> Before dispatching a change to a parallel worker, the leader checks
> for existing entries: (a) If no match: add the entry and proceed; (b)
> If match: instruct the worker to wait until the dependent transaction
> completes.
>
> 2. Unique Keys
> In addition to RI, track unique keys to detect conflicts. Example:
> CREATE TABLE tab1(a INT PRIMARY KEY, b INT UNIQUE);
> Transactions on publisher:
> Txn1: INSERT (1,1)
> Txn2: INSERT (2,2)
> Txn3: DELETE (2,2)
> Txn4: UPDATE (1,1) → (1,2)
>
> If Txn4 is applied before Txn2 and Txn3, it will fail due to a unique
> constraint violation. To prevent this, track both RI and unique keys
> in the hash table. Compare keys of both old and new tuples to detect
> dependencies. Then old_tuple's RI needs to be compared, and new
> tuple's, both unique key and RI (new tuple's RI is required to detect
> some prior insertion with the same key) needs to be compared with
> existing hash table entries to identify transaction dependency.
>
> 3. Foreign Keys
> Consider FK constraints between tables. Example:
>
> TABLE owner(user_id INT PRIMARY KEY);
> TABLE car(car_name TEXT, user_id INT REFERENCES owner);
>
> Transactions:
> Txn1: INSERT INTO owner(1)
> Txn2: INSERT INTO car('bz', 1)
>
> Applying Txn2 before Txn1 will fail. To avoid this, check if FK values
> in new tuples match any RI or unique key in the hash table. If
> matched, treat the transaction as dependent.
>
> 4. Triggers and Constraints
> For the initial version, exclude tables with user-defined triggers or
> constraints from parallel apply due to complexity in dependency
> detection. We may need some parallel-apply-safe marking to allow this.
>
> Replication Progress Tracking
> -
> Parallel apply introduces out-of-order commit application,
> complicating replication progres

[PATCH 2/2] Benchmark code for postgres checksums

2025-09-13 Thread tenistarkim
From: Andrew kim 

---
 contrib/meson.build   |  1 +
 contrib/pg_checksum_bench/meson.build | 23 +
 .../pg_checksum_bench--1.0.sql|  8 +
 contrib/pg_checksum_bench/pg_checksum_bench.c | 34 +++
 .../pg_checksum_bench.control |  4 +++
 .../sql/pg_checksum_bench.sql | 17 ++
 6 files changed, 87 insertions(+)
 create mode 100644 contrib/pg_checksum_bench/meson.build
 create mode 100644 contrib/pg_checksum_bench/pg_checksum_bench--1.0.sql
 create mode 100644 contrib/pg_checksum_bench/pg_checksum_bench.c
 create mode 100644 contrib/pg_checksum_bench/pg_checksum_bench.control
 create mode 100644 contrib/pg_checksum_bench/sql/pg_checksum_bench.sql

diff --git a/contrib/meson.build b/contrib/meson.build
index ed30ee7d639..fe5149aadff 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -12,6 +12,7 @@ contrib_doc_args = {
   'install_dir': contrib_doc_dir,
 }
 
+subdir('pg_checksum_bench')
 subdir('amcheck')
 subdir('auth_delay')
 subdir('auto_explain')
diff --git a/contrib/pg_checksum_bench/meson.build b/contrib/pg_checksum_bench/meson.build
new file mode 100644
index 000..32ccd9efa0f
--- /dev/null
+++ b/contrib/pg_checksum_bench/meson.build
@@ -0,0 +1,23 @@
+# Copyright (c) 2022-2025, PostgreSQL Global Development Group
+
+pg_checksum_bench_sources = files(
+  'pg_checksum_bench.c',
+)
+
+if host_system == 'windows'
+  pg_checksum_bench_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+'--NAME', 'pg_checksum_bench',
+'--FILEDESC', 'pg_checksum_bench',])
+endif
+
+pg_checksum_bench = shared_module('pg_checksum_bench',
+  pg_checksum_bench_sources,
+  kwargs: contrib_mod_args,
+)
+contrib_targets += pg_checksum_bench
+
+install_data(
+  'pg_checksum_bench--1.0.sql',
+  'pg_checksum_bench.control',
+  kwargs: contrib_data_args,
+)
diff --git a/contrib/pg_checksum_bench/pg_checksum_bench--1.0.sql b/contrib/pg_checksum_bench/pg_checksum_bench--1.0.sql
new file mode 100644
index 000..5f13cbe3c5e
--- /dev/null
+++ b/contrib/pg_checksum_bench/pg_checksum_bench--1.0.sql
@@ -0,0 +1,8 @@
+/* contrib/pg_checksum_bench/pg_checksum_bench--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+-- \echo Use "CREATE EXTENSION pg_checksum_bench" to load this file. \quit
+
+CREATE FUNCTION drive_pg_checksum(page_count int)
+   RETURNS pg_catalog.void
+   AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/contrib/pg_checksum_bench/pg_checksum_bench.c b/contrib/pg_checksum_bench/pg_checksum_bench.c
new file mode 100644
index 000..f40f335ff59
--- /dev/null
+++ b/contrib/pg_checksum_bench/pg_checksum_bench.c
@@ -0,0 +1,34 @@
+#include "postgres.h"
+#include "fmgr.h"
+#include "storage/checksum_impl.h"
+
+#include 
+#include 
+
+PG_MODULE_MAGIC;
+
+#define REPEATS 100
+
+PG_FUNCTION_INFO_V1(drive_pg_checksum);
+Datum
+drive_pg_checksum(PG_FUNCTION_ARGS)
+{
+   int page_count = PG_GETARG_INT32(0);
+
+   PGChecksummablePage * pages = palloc(page_count * sizeof(PGChecksummablePage));
+   srand(0);
+   for (size_t i = 0; i < page_count * sizeof(PGChecksummablePage); i++){
+   char * byte_ptr = (char *) pages;
+   byte_ptr[i] = rand() % 256;
+   }
+
+   for (int i = 0; i < REPEATS; i++){
+   const PGChecksummablePage * test_page = pages + (i % page_count);
+   volatile uint32 result = pg_checksum_block(test_page);
+   (void) result;
+   }
+
+   pfree((void *) pages);
+
+   PG_RETURN_VOID();
+}
diff --git a/contrib/pg_checksum_bench/pg_checksum_bench.control b/contrib/pg_checksum_bench/pg_checksum_bench.control
new file mode 100644
index 000..4a4e2c9363c
--- /dev/null
+++ b/contrib/pg_checksum_bench/pg_checksum_bench.control
@@ -0,0 +1,4 @@
+comment = 'pg_checksum benchmark'
+default_version = '1.0'
+module_pathname = '$libdir/pg_checksum_bench'
+relocatable = true
diff --git a/contrib/pg_checksum_bench/sql/pg_checksum_bench.sql b/contrib/pg_checksum_bench/sql/pg_checksum_bench.sql
new file mode 100644
index 000..4b347699953
--- /dev/null
+++ b/contrib/pg_checksum_bench/sql/pg_checksum_bench.sql
@@ -0,0 +1,17 @@
+CREATE EXTENSION pg_checksum_bench;
+
+SELECT drive_pg_checksum(-1);
+
+\timing on
+
+SELECT drive_pg_checksum(1);
+SELECT drive_pg_checksum(2);
+SELECT drive_pg_checksum(4);
+SELECT drive_pg_checksum(8);
+SELECT drive_pg_checksum(16);
+SELECT drive_pg_checksum(32);
+SELECT drive_pg_checksum(64);
+SELECT drive_pg_checksum(128);
+SELECT drive_pg_checksum(256);
+SELECT drive_pg_checksum(512);
+SELECT drive_pg_checksum(1024);
-- 
2.43.0





Re: pg_dump does not dump domain not-null constraint's comments

2025-09-13 Thread Álvaro Herrera
On 2025-Sep-12, Noah Misch wrote:

> The last argument gives the dump object on which the comment has a dependency.
> Since this is the case of a separately-dumped constraint, the comment needs to
> depend on that constraint (coninfo), not on the domain (tyinfo):
> 
> - coninfo->dobj.catId, 0, 
> tyinfo->dobj.dumpId);
> + coninfo->dobj.catId, 0, 
> coninfo->dobj.dumpId);
> 
> I didn't encounter a failure from this, but sufficient restore parallelism
> might reach a failure.  A failure would look like a "does not exist" error in
> the COMMENT command, due to the constraint not yet existing.
> dumpTableConstraintComment() is an older case that optimally handles the last
> dumpComment() arg.

Sounds sane.

> In the absence of objections, I'll make it so.

Please do, thanks.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: plan shape work

2025-09-13 Thread Tom Lane
Robert Haas  writes:
> But, I also can't commit either v4-0003 or v5-0003 or any variant
> thereof until we agree on what to do about 0001, and you're the
> holdout there.

Yeah, I owe you a review, hope to get to it over the weekend.

regards, tom lane




Re: PostgreSQL 18 GA press release draft

2025-09-13 Thread Jeff Davis
On Fri, 2025-09-12 at 13:21 -0500, Nico Williams wrote:
> On Fri, Sep 12, 2025 at 10:11:59AM -0700, Jeff Davis wrote:
> > The name PG_UNICODE_FAST is meant to convey that it provides full
> > unicode semantics for case mapping and pattern matching, while also
> > being fast because it uses memcmp for comparisons. While the name
> > PG_C_UTF8 is meant to convey that it's closer to what users of the
> > libc
> > "C.UTF-8" locale might expect.
> 
> How does one do form-insensitive comparisons?

If you mean case insensitive matching, you can do:

   CASEFOLD(a) = CASEFOLD(b)

in any locale or provider, but it's best when using PG_UNICODE_FAST or
ICU, because it handles some nuances better. For instance:

   CASEFOLD('ß') = CASEFOLD('SS') AND
   CASEFOLD('ß') = CASEFOLD('ss') AND
   CASEFOLD('ß') = CASEFOLD('ẞ')

are all true in PG_UNICODE_FAST and "en-US-x-icu", but not in libc
collations.

ICU also has case-insensitive collations, which offer a bit more
flexibility:

https://www.postgresql.org/docs/current/collation.html#COLLATION-NONDETERMINISTIC

Regards,
Jeff Davis





Re: Making type Datum be 8 bytes everywhere

2025-09-13 Thread Tom Lane
Tomas Vondra  writes:
> On 9/10/25 22:35, Tom Lane wrote:
>> This is silently assuming that sizeof(SortItem) is a multiple of
>> alignof(Datum), which on a 32-bit-pointer platform is not true
>> any longer.  We ought to MAXALIGN the two occurrences of
>> data->numrows * sizeof(SortItem).

> You're right, I misunderstood which of the accesses is triggering the
> report. I added the two MAXALIGNs and can confirm that makes it go away
> on the rpi5. It's interesting it didn't happen on the i386 machine at
> all, but I don't have time to look at why right now.

I think it's just that i386 doesn't have hardware-level alignment
restrictions, or at least not ones for more than 4 bytes.

>> Do you want to fix it, or shall I?

> Feel free to do so. If not, I'll do that on Monday.

I can deal with it today, will go do so.

regards, tom lane




Re: ICU warnings during make installcheck and text_extensions test

2025-09-13 Thread Alexander Korotkov
On Tue, Jul 29, 2025 at 12:45 PM Oleg Tselebrovskiy
 wrote:
> Thanks for your response!
>
> Your patch works with REL_17 & master, but not with REL_16, since there
> is no builtin provider
>
> So if we're going that route, for PostgreSQL 16 and older we could just
> use libc provider instead of builtin for the same effect (see attached)
>
> I've run installcheck-world for both 'builtin' and 'libc', both seem to
> work fine

Great.  I'm not sure if we want this to be backpatched.  For now, I've
pushed it to master.

> Dunno what about tests like collate.icu.utf8.sql that require icu
> databases to run. Will those tests be run if we force non-icu locale
> providers in pg_regress? We probaly want them to be run sometimes,
> right? Except this, LGTM

collate.icu.utf8.sql has a protection at the top against running under
an inappropriate locale.  It's OK to run it with any locale.  If the
locale doesn't match, the test stops at the beginning and the
alternative output collate.icu.utf8_1.out is matched.

--
Regards,
Alexander Korotkov
Supabase




Re: Orphan page in _bt_split

2025-09-13 Thread Peter Geoghegan
On Wed, Sep 3, 2025 at 2:25 AM Konstantin Knizhnik  wrote:
> Do you suggest to replace `PageGetTempPage` with using local buffers?
> Or may be change `PageGetTempPage*` to accept parameter with provided
> local buffer?

I think that _bt_split could easily be switched over to using 2 local
PGAlignedBlock variables, removing its use of PageGetTempPage. I don't
think that it is necessary to consider other PageGetTempPage callers.

-- 
Peter Geoghegan