Re: Support getrandom() for pg_strong_random() source
On Tue, Oct 7, 2025 at 1:40 AM Daniel Gustafsson wrote:
> None come to mind, and it seems increasingly likely that designing an API
> which we confidently can document how to use is too complicated. Maybe it's
> better to re-focus on just the use case in the UUID code and leave the API at
> pg_prng() and pg_strong_random()? (We already have UUID code in contrib which
> in some cases uses arc4random after all.)

Maybe that would be best. If a second use case does show up in the future, we'd
be better positioned to pull it out into an API (and we'd know how to document
it!).

Sawada-san: I feel like I've driven your thread off the rails (off a cliff?).
Maybe I can try to summarize some of the ideas that seem to be emerging,
without signing you up for them:

- We should let superusers switch UUID generation to a faster CSPRNG, like
getrandom().
- The UUID switch should be done at a layer above pg_strong_random(). Maybe
just in the UUID code for now, so that we don't choose a bad abstraction.
- It seems important for pg_strong_random() to default to a known-FIPS-compliant
solution whenever available.
- Packagers should be able to select a pg_strong_random alternative at compile
time, since kernel-level randomness can potentially bring security improvements
in addition to speed, not everyone cares about FIPS anyway, and it's
conceivable that some distributions might provide FIPS-compliant kernel
randomness.
- It'd be nice for superusers (only) to be able to switch pg_strong_random at
runtime, too, for the reasons Michael mentioned upthread.
- We should document the RFC's warning about UUIDs not being considered
cryptographic secrets.

There is still a question of whether unprivileged users should be able to
switch the UUID implementation. That seems to hinge, unfortunately, on whether
or not we think someone could be relying on it for security and/or FIPS.

Does that all seem reasonable, and are there pieces I should peel off into
other threads?

Thanks,
--Jacob
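For anyone skimming: the kernel interface in question is getrandom(2). A
minimal sketch of the kind of wrapper being discussed (illustrative only; the
helper name and error handling are hypothetical, not the proposed patch):

#include <errno.h>
#include <stdbool.h>
#include <sys/random.h>			/* Linux, glibc 2.25+ */

/* Hypothetical helper: fill buf from the kernel CSPRNG, retrying on EINTR. */
static bool
fill_with_getrandom(void *buf, size_t len)
{
	char	   *p = buf;

	while (len > 0)
	{
		ssize_t		n = getrandom(p, len, 0);

		if (n < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted by a signal handler */
			return false;		/* e.g. ENOSYS on pre-3.17 kernels */
		}
		p += n;
		len -= (size_t) n;
	}
	return true;
}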
pgstattuple: Use streaming read API in pgstatindex functions
Hi hackers,

While reading the code related to streaming reads and their current use cases,
I noticed that pgstatindex could potentially benefit from adopting the
streaming read API. The required change is relatively simple, similar to what
has already been implemented in the pg_prewarm and pg_visibility extensions. I
also ran some performance tests on an experimental patch to validate the
improvement.

Summary

Cold cache performance (the typical use case for diagnostic tools):
- Medium indexes (~21MB): 1.21x - 1.79x faster (20-44% speedup)
- Large indexes (~214MB): 1.50x - 1.90x faster (30-47% speedup)
- XLarge indexes (~1351MB): 1.4x - 1.9x faster (29-47% speedup)

Hardware: AX162-R from Hetzner

Test matrix:
- Index types: Primary key, timestamp, float, composite (3 columns)
- Sizes: Medium (1M rows, ~21MB), Large (10M rows, ~214MB), XLarge (50M rows, ~1351MB)
- Layouts: Unfragmented (sequential) and Fragmented (random insert order)
- Cache states: Cold (dropped OS cache) and Warm (pg_prewarm)

XLarge fragmented example:

==> Creating secondary indexes on test_xlarge
Created 3 secondary indexes: created_at, score, composite
Created test_xlarge_pkey: 1351 MB
Fragmentation stats (random insert order):

 leaf_frag_pct | avg_density_pct | leaf_pages |  size
---------------+-----------------+------------+---------
          49.9 |            71.5 |     172272 | 1351 MB
(1 row)

Configuration:
- shared_buffers = 16GB
- effective_io_concurrency = 500
- io_combine_limit = 16
- autovacuum = off
- checkpoint_timeout = 1h
- bgwriter_delay = 1ms (minimize background writes)
- jit = off
- max_parallel_workers_per_gather = 0

Unfragmented Indexes (Cold Cache)

Index Type         Size     Baseline   Patched    Speedup
Primary Key        Medium   31.5 ms    19.6 ms    1.58×
Primary Key        Large    184.0 ms   119.0 ms   1.54×
Timestamp          Medium   13.4 ms    10.5 ms    1.28×
Timestamp          Large    85.0 ms    45.6 ms    1.86×
Float (score)      Medium   13.7 ms    11.4 ms    1.21×
Float (score)      Large    84.0 ms    45.0 ms    1.86×
Composite (3 col)  Medium   26.7 ms    17.2 ms    1.56×
Composite (3 col)  Large    89.8 ms    51.2 ms    1.75×

⸻

Fragmented Indexes (Cold Cache)

To address concerns about filesystem fragmentation, I tested indexes built
with random inserts (ORDER BY random()) to trigger page splits and create
fragmented indexes:

Index Type         Size     Baseline    Patched    Speedup
Primary Key        Medium   41.9 ms     23.5 ms    1.79×
Primary Key        Large    236.0 ms    148.0 ms   1.58×
Primary Key        XLarge   953.4 ms    663.1 ms   1.43×
Timestamp          Medium   32.1 ms     18.8 ms    1.70×
Timestamp          Large    188.0 ms    117.0 ms   1.59×
Timestamp          XLarge   493.0 ms    518.6 ms   0.95×
Float (score)      Medium   14.0 ms     10.9 ms    1.28×
Float (score)      Large    85.8 ms     45.2 ms    1.89×
Float (score)      XLarge   263.2 ms    176.5 ms   1.49×
Composite (3 col)  Medium   42.3 ms     24.1 ms    1.75×
Composite (3 col)  Large    245.0 ms    162.0 ms   1.51×
Composite (3 col)  XLarge   1052.5 ms   716.5 ms   1.46×

Summary: Fragmentation generally does not hurt streaming reads; most fragmented
cases still see 1.4×-1.9× gains. One outlier (XLarge Timestamp) shows a slight
regression (0.95×).
⸻

Warm Cache Results

When indexes are fully cached in shared_buffers:

Unfragmented: occasional small regressions for small-to-medium indexes
(single-digit ms variance, barely noticeable); small gains for large indexes.

Fragmented: occasional small regressions for small-to-medium indexes
(single-digit ms variance, barely noticeable); small gains for large indexes.

Best,
Xuneng

From 153ab2799803dc402789d0aa825456ea12f2d3d9 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Sun, 12 Oct 2025 21:27:22 +0800
Subject: [PATCH] pgstattuple: Use streaming read API in pgstatindex functions

Replace synchronous ReadBufferExtended() loops with the streaming read API in
pgstatindex_impl() and pgstathashindex().
---
 contrib/pgstattuple/pgstatindex.c | 205 ++++++++++++++++++++------------
 1 file changed, 121 insertions(+), 84 deletions(-)

diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 40823d54fca..a708cc417b0 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -37,6 +37,7 @@
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
+#include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/varlena.h"
 
@@ -273,60 +274,77 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 	indexStat.fragments = 0;
 
 	/*
-	 * Scan all blocks except the metapage
+	 * Scan all blocks except the metapage using streaming reads
	 */
 	nblocks = RelationGetNumberOfBlocks(rel);
 
-	for (blkno = 1; blkno < nblocks; blkno++)
 	{
-		Buffer		buffer;
-		Page		page;
-		BTPageOpaque opaque;
-
-		CHECK_FOR_INTERRUPTS();
-
-		/* Read and lock buffer */
-		buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
-		LockBuffer(buffer, BUFFER_LOCK_SHARE);
-
-		page = BufferGetPage(b
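The converted loop follows the same shape as the existing read_stream users; a
rough sketch, assuming the full patch matches the pg_prewarm/pg_visibility
pattern (not the patch's exact code):

	BlockRangeReadStreamPrivate p;
	ReadStream *stream;
	Buffer		buffer;

	/* stream blocks 1 .. nblocks-1, skipping the metapage (block 0) */
	p.current_blocknum = 1;
	p.last_exclusive = nblocks;
	stream = read_stream_begin_relation(READ_STREAM_FULL,
										bstrategy,
										rel,
										MAIN_FORKNUM,
										block_range_read_stream_cb,
										&p,
										0);

	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
	{
		Page		page;

		CHECK_FOR_INTERRUPTS();
		LockBuffer(buffer, BUFFER_LOCK_SHARE);
		page = BufferGetPage(buffer);
		/* ... examine the btree page and update indexStat as before ... */
		UnlockReleaseBuffer(buffer);
	}
	read_stream_end(stream);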
Re: Logical Replication of sequences
Hi Vignesh,
Here are some minor review comments for patches 0001 and 0002.
Patch 0001
AlterSubscription:
1.1.
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled
subscriptions")));
+ errmsg("ALTER SUBSCRIPTION ... REFRESH PUBLICATION is not allowed
for disabled subscriptions")));
Maybe this could use a parameter substitution like:
errmsg("%s is not allowed for disabled subscriptions", "ALTER
SUBSCRIPTION ... REFRESH PUBLICATION");
That way (in preparation for the next patch), there will be only 1
message requiring translation.
Patch 0002
Commit message:
2.1
"This command update the sequence entries present in the..."
/update/updates/
==
AlterSubscription:
2.2
+ if (!sub->enabled)
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("ALTER SUBSCRIPTION ... REFRESH SEQUENCES is not allowed for
disabled subscriptions"));
Can use the same message with parameter substitution as mentioned above (#1.1)
==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Finalizing read stream users' flag choices
Hi,

Could you please help clarify the READ_STREAM_SEQUENTIAL usage policy? There
are several places that use the callback 'block_range_read_stream_cb', which
means all of them have the same sequential access pattern. But only one of them
(amcheck's verify_heapam) uses READ_STREAM_SEQUENTIAL. Why is that?

Is it correct to say that 'block_range_read_stream_cb' produces a simple enough
pattern that we don't need READ_STREAM_SEQUENTIAL with it? On the other hand,
we know in advance that we want to read the data strictly sequentially, so why
not let the OS detect the pattern, if it does a better job, as the comment
around READ_STREAM_SEQUENTIAL says? So the amcheck usage looks valid too...

Best regards,
Arseniy Mukhin
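For reference, the flag is just a hint bit chosen when the stream is created; a
sketch of the two variants (the exact flag combinations used in the tree may
differ):

	/* default: let read_stream's own look-ahead heuristics decide */
	stream = read_stream_begin_relation(READ_STREAM_DEFAULT,
										bstrategy, rel, MAIN_FORKNUM,
										block_range_read_stream_cb,
										&p, 0);

	/* with the hint that the callback will return sequential blocks */
	stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
										bstrategy, rel, MAIN_FORKNUM,
										block_range_read_stream_cb,
										&p, 0);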
Core dumps from recovery/017_shm
While looking for something else, I noticed that we occasionally see
assertion failures like this:
TRAP: failed Assert("latch->maybe_sleeping == false"), File:
"latch.c", Line: 378, PID: 28023
Here's one in the build farm:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2025-08-05%2005:52:51
And here are some recent cases on CI, which again fail somewhere else,
but that might be expected as these are cfbot branches from patches on
the mailing list:
     task_id      |           task_name
------------------+--------------------------------
6347210574528512 | Linux - Debian Bookworm - Meson
6420333948829696 | FreeBSD - Meson
5616450825617408 | FreeBSD - Meson
4515661445070848 | Linux - Debian Bookworm - Meson
4945927242252288 | Linux - Debian Bookworm - Meson
5133563223343104 | Linux - Debian Bookworm - Meson
You can drop those task IDs into these URLs:
https://cirrus-ci.com/task/$TASK_ID
https://api.cirrus-ci.com/v1/artifact/task/$TASK_ID/testrun/build/testrun/recovery/017_shm/log/017_shm_gnat.log
My current theory is that backends are exiting when the test kills the
postmaster, but a backend that is concurrently starting up takes over
its latch, and then its first ResetLatch(MyLatch) fails that assertion
because maybe_sleeping was never cleared. So I suppose it should be
cleared in ... DisownLatch()?
That sails close to the topic in these threads:
https://www.postgresql.org/message-id/flat/B3C69B86-7F82-4111-B97F-0005497BB745%40yandex-team.ru
https://www.postgresql.org/message-id/flat/ca+hukgkp0ktpummcpa97+wfjtm+uyzq9ex8umdh8zxklwo0...@mail.gmail.com
If we didn't use proc_exit(), we wouldn't recycle the latch, so the
problem would go away with the new emergency cleanup solution I'm
working on (which incidentally also gets rid of the other source of
core dump spam that clogs up BF and CI systems: archive scripts and
other subprocesses of backends). More about that soon on that last
thread, but...
That would still leave versions 15-18 with these rare assertion
failures, since they have commit c8f3bc24. So I think the thing to do
is change DisownLatch() to clear maybe_sleeping just where it also
clears owner_pid, and backpatch that. Another idea would be to do it
in WaitEventSetWaitBlock() before exiting, but that'd be duplicated in
several places.
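Concretely, the DisownLatch() version would look something like this (a sketch
based on its current shape, which just asserts ownership and clears owner_pid):

void
DisownLatch(Latch *latch)
{
	Assert(latch->owner_pid == MyProcPid);

	latch->owner_pid = 0;

	/*
	 * Proposed addition: don't let a stale maybe_sleeping flag leak to the
	 * next process that takes over this latch after we exit.
	 */
	latch->maybe_sleeping = false;
}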
Re: finish TODOs in to_json_is_immutable, to_jsonb_is_immutable also add tests on it
Hi.

Rebased, with regression test changes.
From 99f3cb1fefa579aee759232e9495f7b4180628d7 Mon Sep 17 00:00:00 2001
From: jian he
Date: Mon, 13 Oct 2025 09:42:40 +0800
Subject: [PATCH v3 1/1] improve function to_json_is_immutable and
to_jsonb_is_immutable
This makes to_json_is_immutable and to_jsonb_is_immutable recurse into
composite-type attributes and array element types. Also add extensive
regression tests for it.
discussion: https://postgr.es/m/CACJufxFz%3DOsXQdsMJ-cqoqspD9aJrwntsQP-U2A-UaV_M%2B-S9g%40mail.gmail.com
---
 src/backend/optimizer/util/clauses.c  |  10 ++-
 src/backend/utils/adt/json.c          |  77 ++++++++++++++++----
 src/backend/utils/adt/jsonb.c         |  77 ++++++++++++++++----
 src/include/utils/json.h              |   2 +-
 src/include/utils/jsonb.h             |   2 +-
 src/test/regress/expected/sqljson.out | 104 ++++++++++++++++++++++
 src/test/regress/sql/sqljson.sql      |  68 ++++++++++++++
 7 files changed, 309 insertions(+), 31 deletions(-)
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 81d768ff2a2..a7a2c2683b4 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -409,10 +409,14 @@ contain_mutable_functions_walker(Node *node, void *context)
 		foreach(lc, ctor->args)
 		{
 			Oid			typid = exprType(lfirst(lc));
+			bool		contain_mutable = false;
 
-			if (is_jsonb ?
-				!to_jsonb_is_immutable(typid) :
-				!to_json_is_immutable(typid))
+			if (is_jsonb)
+				to_jsonb_is_immutable(typid, &contain_mutable);
+			else
+				to_json_is_immutable(typid, &contain_mutable);
+
+			if (contain_mutable)
 				return true;
 		}
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 14f5cb498fc..b8a030b8995 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -14,6 +14,7 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/relation.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
#include "common/hashfn.h"
@@ -29,6 +30,7 @@
#include "utils/json.h"
#include "utils/jsonfuncs.h"
#include "utils/lsyscache.h"
+#include "utils/rel.h"
#include "utils/typcache.h"
@@ -693,15 +695,65 @@ row_to_json_pretty(PG_FUNCTION_ARGS)
/*
* Is the given type immutable when coming out of a JSON context?
*
- * At present, datetimes are all considered mutable, because they
- * depend on timezone. XXX we should also drill down into objects
- * and arrays, but do not.
+ * At present, datetimes are all considered mutable, because they depend on
+ * timezone.
+ *
+ * The caller must set *contain_mutable to false first. This function will
+ * recurse through the type's element types to check for mutability. If any
+ * element type is mutable, it will set contain_mutable to true.
*/
-bool
-to_json_is_immutable(Oid typoid)
+void
+to_json_is_immutable(Oid typoid, bool *contain_mutable)
{
+ char att_typtype = get_typtype(typoid);
JsonTypeCategory tcategory;
Oid outfuncoid;
+ Oid att_typelem;
+
+ /* since this function recurses, it could be driven to stack overflow */
+ check_stack_depth();
+
+ Assert(contain_mutable != NULL);
+
+ if (*contain_mutable == true)
+ return;
+
+ if (att_typtype == TYPTYPE_DOMAIN)
+ to_json_is_immutable(getBaseType(typoid), contain_mutable);
+ else if (att_typtype == TYPTYPE_COMPOSITE)
+ {
+ /*
+ * For a composite type, recurse into its attributes.
+ */
+ Relation relation;
+ TupleDesc tupdesc;
+ int i;
+
+ relation = relation_open(get_typ_typrelid(typoid), AccessShareLock);
+
+ tupdesc = RelationGetDescr(relation);
+
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+
+ if (attr->attisdropped)
+				continue;
+
+ to_json_is_immutable(attr->atttypid, contain_mutable);
+ }
+
+ relation_close(relation, AccessShareLock);
+ }
+ else if (att_typtype == TYPTYPE_RANGE)
+ {
+ to_json_is_immutable(get_range_subtype(typoid), contain_mutable);
+ }
+	else if (OidIsValid((att_typelem = get_element_type(typoid))))
+ {
+ /* recurse into array element type */
+ to_json_is_immutable(att_typelem, contain_mutable);
+ }
json_categorize_type(typoid, false, &tcategory, &outfuncoid);
@@ -711,26 +763,25 @@ to_json_is_immutable(Oid typoid)
case JSONTYPE_JSON:
case JSONTYPE_JSONB:
case JSONTYPE_NULL:
- return true;
+ break;
case JSONTYPE_DATE:
case JSONTYPE_TIMESTAMP:
case JSONTYPE_TIMESTAMPTZ:
- return false;
+ *contain_mutable = true;
+ break;
case JSONTYPE_ARRAY:
- return false; /* TODO recurse into elements */
-
case JSONTYPE_COMPOSITE:
- return false; /* TODO recurse into fields */
+ break;
case JSONTYPE_NUMERIC:
case JSONTYPE_CAST:
case JSONTYPE_OTHER:
- return func_volatile(outfuncoid) == PROVOLATILE_IMMUTABLE;
+ if (func_volatile(outfuncoid) != PROVOLATILE_IMMUTABLE)
+				*contain_mutable = true;
+ break;
}
-
-	return false;			/* not reached */
}
/*
diff --git a/s
Re: Improve docs for n_distinct_inherited
Hi David,

I think your revision is good and accurate.

> On Oct 13, 2025, at 07:42, David Rowley wrote:
>
> Just picking this one up again. I forgot to come back to this after
> PGConf.dev.
>
> I came up with:
>
> Ordinarily n_distinct is used.
> n_distinct_inherited exists to allow the distinct
> estimate to be overwritten for the statistics gathered for inheritance
> parent tables and for partitioned tables.

This clarifies that n_distinct_inherited applies to both inheritance parents
and partitioned tables; that's accurate and better than the original wording.

> I also fixed what I thought was some misleading text about ANALYZE
> using this value to calculate things. That's not true. It's the query
> planner that uses this value. ANALYZE just stores whatever this is set
> to into pg_statistic. I also adjusted the text that was talking about
> "the size of the table", which, as I mentioned earlier isn't correct.
> It's all related to the estimated number of rows in the table, per
> "ntuples = vardata->rel->tuples;" in get_variable_numdistinct().

When I read the diff, I thought "table size" is usually used in PG docs, so
maybe "estimated table size". But as you explicitly explained why you made the
change, I am fine with your change.

> Also fixed a typo; "twice on the average" shouldn't contain "the".

Correct grammar fix.

> I wonder if ", since the multiplication by the number of rows in the
> table is not performed until query planning time" should be deleted
> since I modified the text earlier to talk about "the query planner".

Yeah, with your rewrite, that clause now feels a little redundant. I think it
can be removed entirely.

The other thing that doesn't belong to your change, but since you are touching
this area: "When set to a negative value, which must be greater than or equal
to -1". When I first read the doc, I was confused, because no earlier sentence
indicated that "n_distinct" is of float type, so I thought "greater than" was a
typo. Reading on, the later example (0.5) resolved my confusion. To avoid the
same confusion for other readers, maybe change it to "when set to a negative
value between -1 and 0 (inclusive of -1)" or "when set to a negative value,
which must be in the range -1 <= value < 0". But as I said, this doesn't belong
to your change. If you don't want to enhance this place, or you don't consider
it a problem, I am absolutely fine.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Re: Invalid pointer access in logical decoding after error
On Thu, 9 Oct 2025 at 23:33, Masahiko Sawada wrote:
>
> On Thu, Oct 9, 2025 at 10:07 AM Masahiko Sawada wrote:
> >
> > On Thu, Oct 9, 2025 at 8:23 AM Euler Taveira wrote:
> > >
> > > On Thu, Oct 9, 2025, at 10:40 AM, vignesh C wrote:
> > > > On Thu, 9 Oct 2025 at 00:16, Masahiko Sawada wrote:
> > > >>
> > > >> One thing we might want to consider is for v14 and v13. They don't
> > > >> have this bug since the entry_ctx was introduced in v15, but it's
> > > >> still true for them that RelationSyncCache is not cleaned up in error
> > > >> cases if pgoutput is used via SQL API. For example, if the
> > > >> RelationSyncCache hash table gets corrupted for some reason, logical
> > > >> decoding could repeat an error until logical decoding completes
> > > >> successfully and its shutdown callback is called. Also, it might be a
> > > >> good idea in general to ensure cleaning up the hash table after use.
> > > >
> > > > Agreed, let's backpatch to PG13. Should we also add a test case in the
> > > > master branch, given that this issue has been around for a while?
> > >
> > > I'm wondering if it is a good idea because the bug doesn't manifest in
> > > v13 and v14. At least v13 has its final minor version in less than a
> > > month and then goes EOL. I would have caution when applying fixes to the
> > > latest minor version of a stable branch; there won't be a chance to fix
> > > the fix in the next minor release. Furthermore, in these back branches,
> > > the patch doesn't fix a known issue. I wouldn't bother with these back
> > > branches. For v14, if, in a couple of months, we have some reports that
> > > justify the backpatch, fine.
> >
> > Agreed. I'm hesitant about patching v13 and v14. We've never got such a
> > bug report yet, and the next minor version of v13 would be the final
> > release. I'll add some comments in the commit message.
>
> And now pushed (from master to v15).

Thanks for committing the changes. I monitored the build farm, and the runs
completed successfully. I've also closed the CF entry at [1].

[1] - https://commitfest.postgresql.org/patch/5903/

Regards,
Vignesh
Re: Expanding HOT updates for expression and partial indexes
On Wed, 2025-10-08 at 15:48 -0500, Nathan Bossart wrote:
> > The theory being that my new code using the old/new tts to form and
> > test the index tuples resulting from executing expressions was using
> > the ResultRelInfo struct created during plan execution, not the
> > information found on the page, and so was safe without the lock.
>
> An open question (at least from me) is whether this is safe. I'm not
> familiar enough with this area of code yet to confidently determine
> that.

The optimization requires that the expression evaluates to the same thing on
the old and new tuples. That determination doesn't have anything to do with a
lock on the buffer, so long as the old tuple isn't pruned away or something.
And clearly it won't be pruned, because we're in the process of updating it, so
we have a snapshot that can see it.

There might be subtleties in other parts of the proposal, but the above
determination can be made safely without a buffer lock.

> > I added a reloption "expression_checks" to disable this new code path.
> > Good idea or bad precedent?
>
> If there are cases where the added overhead outweighs the benefits (which
> seems like it must be true some of the time), then I think we must have a
> way to opt-out (or maybe even opt-in). In fact, I'd advise adding a GUC to
> complement the reloption so that users can configure it at higher levels.

I'll push back against this. For now I'm fine with developer options to make
testing easier, but we should find a way to make this work well without tuning.

Regards,
Jeff Davis
commit_baseurl is broken in postgres.sgml
Hi Hackers,

I just noticed $SUBJECT. In doc/src/sgml/postgres.sgml, line 12:

```
<!ENTITY commit_baseurl "https://postgr.es/c/">
```

https://postgr.es/c/ resolves to https://git.postgresql.org/pg/commitdiff/, but
that URL now returns 404. It looks like Bruce fixed the URL 13 months ago, but
it's broken again. I am not sure whether
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff is the correct
URL, and in any case I don't know how to generate a short URL.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Re: Is there public API to fetch errcode?
Sergey Fukanchik writes:
> Does libpq have an API to extract this code?

PQresultErrorField(res, PG_DIAG_SQLSTATE) should do, no?

			regards, tom lane
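A minimal usage sketch of that call (connection setup omitted; the query is
just an easy way to force SQLSTATE 22012, division_by_zero):

#include <stdio.h>
#include <libpq-fe.h>

static void
print_sqlstate(PGconn *conn)
{
	PGresult   *res = PQexec(conn, "SELECT 1/0");

	if (PQresultStatus(res) == PGRES_FATAL_ERROR)
	{
		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

		/* expect "22012" here */
		fprintf(stderr, "SQLSTATE: %s\n", sqlstate ? sqlstate : "(none)");
	}
	PQclear(res);
}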
Re: Is there public API to fetch errcode?
sqlstate is saved into conn->last_sqlstate along with the error message in
pqGetErrorNotice3:

	if (id == PG_DIAG_SQLSTATE)
		strlcpy(conn->last_sqlstate, workBuf.data,
				sizeof(conn->last_sqlstate));

Is this usable? Can it be extracted somehow?

---
Sergey
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options
Hi Tom,

> While I'd paid basically zero attention to this patch (the claim
> in the commit message that I reviewed it is a flight of fancy),
> I've been forced to look through it as a consequence of the mop-up
> that's been happening to silence compiler warnings.

Sorry for taking up your time fixing the compiler warnings. I hadn't noticed
your commit 71540dcdcb2 until today. Next time I will try to fix the warnings
found by the buildfarm.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
