Re: Back-patch of: avoid multiple hard links to same WAL file after a crash

2025-04-05 Thread Michael Paquier
On Sat, Apr 05, 2025 at 07:14:27PM +0900, Michael Paquier wrote:
> On Wed, Apr 02, 2025 at 05:29:00PM -0700, Noah Misch wrote:
>> Your back-patches are correct.  Thanks.
> 
> Thanks for double-checking.  I'll move on with what I have after a
> second look as it's been a few weeks since I've looked at all these
> conflicts.  I am also planning to add a few notes in the commits to
> mention this thread.

And applied this series on REL_14_STABLE and REL_13_STABLE, editing
all the commits to mention what they were linked to previously.
--
Michael




Re: rename pg_log_standby_snapshot

2025-04-05 Thread Michael Paquier
On Sat, Apr 05, 2025 at 10:32:40PM -0400, Andres Freund wrote:
> I think this would all be a nice argument to have when introducing a new
> function. But I don't think it's a wart sufficiently big to justify breaking
> compatibility.

Yeah, I would side as well with the compatibility argument on this
one.
--
Michael




Re: Removing unneeded self joins

2025-04-05 Thread Tender Wang
Hi,

I noticed that the postgresql.conf.sample file doesn't contain the
enable_self_join_elimination GUC.
If adding it is wanted, please see the attached patch.
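
For anyone who wants to double-check their own build (assuming it includes the
self-join elimination commit), a query along these lines shows that the GUC
itself is registered regardless of whether it appears in the sample file:

SELECT name, setting, boot_val
FROM pg_settings
WHERE name = 'enable_self_join_elimination';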

--
Thanks, Tender Wang
From f27c99aebbd07d4008173492c7913749b149b540 Mon Sep 17 00:00:00 2001
From: Tender Wang 
Date: Sun, 6 Apr 2025 11:54:49 +0800
Subject: [PATCH] Put enable_self_join_elimination into postgresql.conf.sample

---
 src/backend/utils/misc/postgresql.conf.sample | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ff56a1f0732..bcd4e67f43e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -427,6 +427,7 @@
 #enable_tidscan = on
 #enable_group_by_reordering = on
 #enable_distinct_reordering = on
+#enable_self_join_elimination = on
 
 # - Planner Cost Constants -
 
-- 
2.34.1



Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-04-05 Thread Álvaro Herrera
Hi

I've been giving this some final polish, which gave me time to think it through,
and I think Peter is right. We should not be adding the new column; instead,
RelationBuildTupleDesc should use its existing scan of pg_constraint to
determine the validity status of constraints. We may in addition need a further
value for attnullability that means unknown, "a constraint exists but we don't
know yet whether it's valid or invalid", for the times when we have scanned
pg_attribute but not yet pg_constraint. I think it's not too large a patch on
top of what we have.

I'm going on a little excursion now but should be back to work on this in a 
couple of hours.



Re: Parallel heap vacuum

2025-04-05 Thread Masahiko Sawada
On Sat, Apr 5, 2025 at 1:32 PM Andres Freund  wrote:
>
> Hi,
>
> On 2025-04-04 14:34:53 -0700, Masahiko Sawada wrote:
> > On Fri, Apr 4, 2025 at 11:05 AM Melanie Plageman
> >  wrote:
> > >
> > > On Tue, Apr 1, 2025 at 5:30 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > >
> > > > I've attached the new version patch. There are no major changes; I
> > > > fixed some typos, improved the comment, and removed duplicated codes.
> > > > Also, I've updated the commit messages.
> > >
> > > I haven't looked closely at this version but I did notice that you do
> > > not document that parallel vacuum disables eager scanning. Imagine you
> > > are a user who has set the eager freeze related table storage option
> > > (vacuum_max_eager_freeze_failure_rate) and you schedule a regular
> > > parallel vacuum. Now that table storage option does nothing.
> >
> > Good point. That restriction should be mentioned in the documentation.
> > I'll update the patch.
>
> I don't think we commonly accept that a new feature B regresses a pre-existing
> feature A, particularly not if feature B is enabled by default. Why would that
> be OK here?

The eager freeze scan is the pre-existing feature, but it's pretty new
code that was pushed just a couple of months ago. I didn't want to make
the newly introduced code even more complex in a single major release,
especially in the vacuum area. While I agree that disabling
eager freeze scans during parallel heap vacuum is not very
user-friendly, there are still many cases where parallel heap vacuum
helps even without the eager freeze scan. FYI, the parallel heap scan
can be disabled by setting min_parallel_table_scan_size. So I thought
we could improve this part incrementally.
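
For example, a user who wants eager scanning back could effectively opt out of
the parallel heap scan with something along these lines (a rough sketch; the
table name and threshold are just placeholders):

-- Raise the threshold so the heap no longer qualifies for a parallel scan;
-- with the current patch that should keep eager scanning enabled, while
-- parallel index vacuuming is unaffected.
SET min_parallel_table_scan_size = '100GB';
VACUUM (PARALLEL 4, VERBOSE) some_table;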

>
>
> The justification in the code:
> +* One might think that it would make sense to use the eager scanning 
> even
> +* during parallel lazy vacuum, but parallel vacuum is available only 
> in
> +* VACUUM command and would not be something that happens frequently,
> +* which seems not fit to the purpose of the eager scanning. Also, it
> +* would require making the code complex. So it would make sense to
> +* disable it for now.
>
> feels not at all convincing to me. There e.g. are lots of places that run
> nightly vacuums. I don't think it's ok to just disable eager scanning in such
> a case, as it would mean that the "freeze cliff" would end up being *higher*
> because of the nightly vacuums than if just plain autovacuum would have been
> used.

That's a fair argument.

> I think it was already a mistake to allow the existing vacuum parallelism to
> be introduced without integrating it with autovacuum. I don't think we should
> go further down that road.

Okay, I think we can consider how to proceed with this patch, including
the above point, during the v19 development cycle.


Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Reduce "Var IS [NOT] NULL" quals during constant folding

2025-04-05 Thread Richard Guo
Currently, we have an optimization that reduces an IS [NOT] NULL qual
on a NOT NULL column to constant true or constant false, provided we
can prove that the input expression of the NullTest is not nullable by
any outer joins.  This deduction happens pretty late in the planner,
during the distribution of quals to relations in query_planner().  As
mentioned in [1], doing it at such a late stage has some drawbacks.

Ideally, this deduction should happen during constant folding.
However, we don't have the per-relation information about which
columns are defined as NOT NULL available at that point.  That
information is collected from catalogs when building RelOptInfos for
base or other relations.

I'm wondering whether we can collect that information while building
the RangeTblEntry for a base or other relation, so that it's available
before constant folding.  This could also enable other optimizations,
such as checking if a NOT IN subquery's output columns and its
left-hand expressions are all certainly not NULL, in which case we can
convert it to an anti-join.

Attached is a draft patch to reduce NullTest on a NOT NULL column in
eval_const_expressions.
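
As a concrete illustration of the reduction being discussed (table and column
names here are invented for the example, not taken from the patch):

CREATE TABLE t (a int NOT NULL, b int);
-- With the draft patch, these NullTest quals on the NOT NULL column are
-- folded to constant false/true already in eval_const_expressions, rather
-- than later during qual distribution in query_planner():
EXPLAIN (COSTS OFF) SELECT * FROM t WHERE a IS NULL;      -- reduces to constant false
EXPLAIN (COSTS OFF) SELECT * FROM t WHERE a IS NOT NULL;  -- reduces to constant true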

Initially, I planned to get rid of restriction_is_always_true and
restriction_is_always_false altogether, since we now perform the
reduction of "Var IS [NOT] NULL" quals in eval_const_expressions.
However, removing them would prevent us from reducing some IS [NOT]
NULL quals that we were previously able to reduce, because (a) the
self-join elimination may introduce new IS NOT NULL quals after the
constant folding, and (b) if some outer joins are converted to inner
joins, previously irreducible NullTest quals may become reducible.

So I think maybe we'd better keep restriction_is_always_true and
restriction_is_always_false as-is.

Any thoughts?

[1] https://postgr.es/m/2323997.1740623...@sss.pgh.pa.us

Thanks
Richard


v1-0001-Reduce-Var-IS-NOT-NULL-quals-during-constant-folding.patch
Description: Binary data


Re: Draft for basic NUMA observability

2025-04-05 Thread Jakub Wartak
On Wed, Apr 2, 2025 at 5:27 PM Bertrand Drouvot
 wrote:
>
> Hi Jakub,

Hi Bertrand,

> > OK, but I still fail to grasp why pgindent doesn't fix this stuff on
> > its own... I believe the original indent would fix this on its own?
>
> My comment was not about indentation but about the fact that I think the
> casting is not at the right place. I think it's the result of the
> multiplication that we want to be cast (the cast operator has higher
> precedence than the multiplication operator).

Oh! I had missed that, but v21 got a rewrite (still not polished) just
to show it can be done without floating-point math, as Tomas requested.

[..]
> Ok, but then does it make sense to see some num_size < shmem_size?
>
> postgres=# select c.name, c.size as num_size, s.size as shmem_size
>  from (select n.name as name, sum(n.size) as size from 
> pg_shmem_numa_allocations n group by n.name) c, pg_shmem_allocations s
>  where c.name = s.name and s.size > c.size;
>  name  | num_size  | shmem_size
> ---+---+
>  XLOG Ctl  |   4194304 |4208200
>  Buffer Blocks | 134217728 |  134221824
>  AioHandleIOV  |   2097152 |2850816

This was a real bug, fixed in v21: ent->allocated_size was not
properly page-aligned. Thanks for the attention to detail.

-J.




Re: Parallel heap vacuum

2025-04-05 Thread Melanie Plageman
On Sun, Mar 23, 2025 at 4:46 AM Masahiko Sawada  wrote:
>
> If we use ParallelBlockTableScanDesc with streaming read like the
> patch did, we would also need to somehow rewind the number of blocks
> allocated to workers. The problem I had with such usage was that a
> parallel vacuum worker allocated a new chunk of blocks when doing
> look-ahead reading and therefore advanced
> ParallelBlockTableScanDescData.phs_nallocated. In this case, even if
> we unpin the remaining buffers in the queue by a new functionality and
> a parallel worker resumes the phase 1 from the last processed block,
> we would lose some blocks in already allocated chunks unless we rewind
> ParallelBlockTableScanDescData and ParallelBlockTableScanWorkerData
> data. However, since a worker might have already allocated multiple
> chunks it would not be easy to rewind these scan state data.

Ah, I didn't realize rewinding the state would be difficult. It seems
like the easiest way to make sure those blocks are done is to add them
back to the counter somehow. And I don't suppose there is some way to
save these not-yet-done block assignments somewhere and give them to
the workers that restart phase I to process on the second pass?

> Another idea is that parallel workers don't exit phase 1 until it
> consumes all pinned buffers in the queue, even if the memory usage of
> TidStore exceeds the limit. It would need to add new functionality to
> the read stream to disable the look-ahead reading. Since we could use
> much memory while processing these buffers, exceeding the memory
> limit, we can trigger this mode when the memory usage of TidStore
> reaches 70% of the limit or so. On the other hand, it means that we
> would not use the streaming read for the blocks in this mode, which is
> not efficient.

That might work. And/or maybe you could start decreasing the size of
block assignment chunks when the memory usage of TidStore reaches a
certain level. I don't know how much that would help or how fiddly it
would be.

> So we would need to
> invent a way to stop and resume the read stream in the middle during
> parallel scan.

As for needing to add new read stream functionality, we actually
probably don't have to. If you use read_stream_end() ->
read_stream_reset(), it resets the distance to 0, so then
read_stream_next_buffer() should just end up unpinning the buffers and
freeing the per buffer data. I think the easiest way to implement this
is to think about it as ending a read stream and starting a new one
next time you start phase I and not as pausing and resuming the read
stream. And anyway, maybe it's better not to keep a bunch of pinned
buffers and allocated memory hanging around while doing what could be
very long index scans.

- Melanie




Re: BTScanOpaqueData size slows down tests

2025-04-05 Thread Peter Geoghegan
On Wed, Apr 2, 2025 at 12:10 PM Tom Lane  wrote:
> I could get behind the idea of just having enough space in
> BTScanOpaqueData for about ten items, and dynamically allocating
> a MaxTIDsPerBTreePage-sized array only if we overrun that.
> And not allocate any space for mark/restore unless a mark is
> done.

That's exactly what I had in mind.

-- 
Peter Geoghegan




Re: Make COPY format extendable: Extract COPY TO format implementations

2025-04-05 Thread Masahiko Sawada
On Sat, Mar 29, 2025 at 1:57 AM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 28 Mar 2025 22:37:03 -0700,
>   Masahiko Sawada  wrote:
>
> >> I've added the following tests:
> >>
> >> * Wrong input type handler without namespace
> >> * Wrong input type handler with namespace
> >> * Wrong return type handler without namespace
> >> * Wrong return type handler with namespace
> >> * Wrong return value (Copy*Routine isn't returned) handler without 
> >> namespace
> >> * Wrong return value (Copy*Routine isn't returned) handler with namespace
> >> * Nonexistent handler
> >> * Invalid qualified name
> >> * Valid handler without namespace and without search_path
> >> * Valid handler without namespace and with search_path
> >> * Valid handler with namespace
> >
> > Probably we can merge these newly added four files into one .sql file?
>
> I know that src/test/regress/sql/ uses this style (one .sql
> file includes many test patterns in one large category). I
> understand that the style is preferable in
> src/test/regress/sql/ because src/test/regress/sql/ has
> tests for many categories.
>
> But do we need to follow the style in
> src/test/modules/*/sql/ too? If we use the style in
> src/test/modules/*/sql/, we need to have only one .sql in
> src/test/modules/*/sql/ because src/test/modules/*/ are for
> each category.
>
> And the current .sql per sub-category style is easy to debug
> (at least for me). For example, if we try qualified name
> cases on debugger, we can use "\i sql/schema.sql" instead of
> extracting target statements from .sql that includes many
> unrelated statements. (Or we can use "\i sql/all.sql" and
> many GDB "continue"s.)
>
> BTW, it seems that src/test/modules/test_ddl_deparse/sql/
> uses .sql per sub-category style. Should we use one .sql
> file for src/test/modules/test_copy_format/sql/? If it's
> required for merging this patch set, I'll do it.

I'm not sure that the regression test queries are categorized in the
same way as in test_ddl_deparse. While the former has separate .sql
files for different types of inputs (valid inputs, invalid inputs,
etc.), which seems finer-grained, the latter has .sql files for each
DDL command.

Most of the queries under test_copy_format/sql verify the input
patterns of the FORMAT option. I think the regression tests
included in that directory should probably focus on testing the new
callback APIs, and some regression tests for FORMAT option handling can
be moved into the normal regression test suite (e.g., into copy.sql or a
new file like copy_format.sql). IIUC, testing for invalid input
patterns can be done even without creating artificial wrong handler
functions.
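
For instance, assuming the patch's handler-lookup behavior, FORMAT values like
the following should fail without any custom handler function existing at all
(the names below are deliberately bogus, and the exact error wording is not
claimed here):

COPY pg_class TO stdout WITH (FORMAT 'no_such_format');  -- nonexistent handler
COPY pg_class TO stdout WITH (FORMAT 'a.b.c.d');         -- invalid qualified name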

>
> > Also we need to add some comments to describe what these queries test.
> > For example, it's not clear to me at a glance what queries in
> > no-schema.sql are going to test as there is no comment there.
>
> Hmm. You refer no_schema.sql in 0002, right?
>
> 
> CREATE EXTENSION test_copy_format;
> CREATE TABLE public.test (a smallint, b integer, c bigint);
> INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
> COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
> \.
> COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
> DROP TABLE public.test;
> DROP EXTENSION test_copy_format;
> 
>
> In general, COPY FORMAT tests focus on "COPY FROM WITH
> (FORMAT ...)" and "COPY TO WITH (FORMAT ...)". And the file
> name "no_schema" shows that it doesn't use qualified
> name. Based on this, I feel that the above content is very
> straightforward without any comment.
>
> What should we add as comments? For example, do we need the
> following comments?
>
> 
> -- This extension includes custom COPY handler: test_copy_format
> CREATE EXTENSION test_copy_format;
> -- Test data
> CREATE TABLE public.test (a smallint, b integer, c bigint);
> INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
> -- Use custom COPY handler, test_copy_format, without
> -- schema for FROM.
> COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
> \.
> -- Use custom COPY handler, test_copy_format, without
> -- schema for TO.
> COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
> -- Cleanup
> DROP TABLE public.test;
> DROP EXTENSION test_copy_format;
> 

I'd like to see in the comment what the tests expect. Taking the
following queries as an example,

COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
\.
COPY public.test TO stdout WITH (FORMAT 'test_copy_format');

it would help readers understand the test case better if we have a
comment like for example:

-- Specify the custom format name without schema. We test if both
-- COPY TO and COPY FROM can find the correct handler function
-- in public schema.

>
> > One problem in the following chunk I can see is:
> >
> > +   qualified_format = stringToQualifiedNameList(format, NULL);
> > +   

Re: Draft for basic NUMA observability

2025-04-05 Thread Jakub Wartak
On Mon, Mar 17, 2025 at 5:11 PM Bertrand Drouvot
 wrote:

> Thanks for v13!

Rebased, with fixes included, in the attached v14 (it passes CI too):

> Looking at 0003:
>
> === 1
>
> +  NUMA mappings for shared memory allocations
>
> s/NUMA mappings/NUMA node mappings/ maybe?

Done.

> === 2
>
> +  
> +   The pg_shmem_numa_allocations view shows NUMA 
> nodes
> +   assigned allocations made from the server's main shared memory segment.
>
> What about?
>
> "
> shows how shared memory allocations in the server's main shared memory segment
> are distributed across NUMA nodes" ?

Done.

> === 3
>
> +   numa_zone_id int4
>
> s/numa_zone_id/zone_id? to be consistent with pg_buffercache_numa introduced 
> in
> 0002.
>
> BTW, I wonder if "node_id" would be better (to match the descriptions...).
> If so, would also need to be done in 0002.

Somewhat duplicate, please see answer for #9

> === 4
>
> +  ID of NUMA node
>
> NUMA node ID? (to be consistent with 0002).
>
> === 5
>
> +static bool firstUseInBackend = true;
>
> Let's use firstNumaTouch to be consistent with 0002.

Done.

> === 6
>
> + elog(NOTICE, "libnuma initialization failed or NUMA is not supported on 
> this platform, some NUMA data might be unavailable.");;
>
> There is 2 ";" + I think that we should used the same wording as in
> pg_buffercache_numa_pages().
>
> === 7
>
> What about using ERROR instead? (like in pg_buffercache_numa_pages())

Both are synced now.

> === 8
>
> +   /*
> +* This is for gathering some NUMA statistics. We might be using 
> various
> +* DB block sizes (4kB, 8kB , .. 32kB) that end up being allocated in
> +* various different OS memory pages sizes, so first we need to 
> understand
> +* the OS memory page size before calling move_pages()
> +*/
> +   os_page_size = pg_numa_get_pagesize();
>
> Maybe use the same comment as the one in pg_buffercache_numa_pages() before 
> calling
> pg_numa_get_pagesize()?

Done; improved the style of the comment there and synced the
pg_buffercache one with the shmem.c one.

> === 9
>
> +   max_zones = pg_numa_get_max_node();
>
> I think we are mixing "zone" and "node". I think we should standardize on one
> and use it everywhere (code and doc for both 0002 and 0003). I'm tempted to
> vote for node, but zone is fine too if you prefer.

Given that numa(7) does not use the "zone" keyword at all, and both
/proc/zoneinfo and /proc/pagetypeinfo show that NUMA nodes are split
into zones, we can conclude that a "zone" is simply a subdivision of a
NUMA node's memory (an internal kernel thing). Examples are ZONE_DMA,
ZONE_NORMAL, ZONE_HIGHMEM. We are fetching just the node ID (without
internal information about zones), therefore we should stay away from
using "zone" within the patch at all, as we are just fetching NUMA
node info. My bad, it's a terminology error on my side from the start:
I probably saw "zone" in some command output back when we had that
workshop, started using it, and somehow it propagated through the
patchset up to this day... I've adjusted it all and settled on
"numa_node_id" as the column name.

> === 10
>
> +   /*
> +* Preallocate memory all at once without going into details which 
> shared
> +* memory segment is the biggest (technically min s_b can be as low as
> +* 16xBLCKSZ)
> +*/
>
> What about?
>
> "
> Allocate memory for page pointers and status based on total shared memory 
> size.
> This simplified approach allocates enough space for all pages in shared memory
> rather than calculating the exact requirements for each segment.
> " instead?

Done.

> === 11
>
> +   int shm_total_page_count,
> +   shm_ent_page_count,
>
> I think those 2 should be uint64.

Right...

> === 12
>
> +   /*
> +* XXX: We are ignoring in NUMA version reporting of the following 
> regions
> +* (compare to pg_get_shmem_allocations() case): 1. output shared 
> memory
> +* allocated but not counted via the shmem index 2. output as-of-yet
> +* unused shared memory
> +*/
>
> why XXX?
>
> what about?
>
> "
> We are ignoring the following memory regions (as compared to
> pg_get_shmem_allocations())

Fixed; it was apparently a leftover from when I was still wondering
whether we should report it.

> === 13
>
> +   page_ptrs = palloc(sizeof(void *) * shm_total_page_count);
> +   memset(page_ptrs, 0, sizeof(void *) * shm_total_page_count);
>
> maybe we could use palloc0() here?

Of course!++

> === 14
>
> and I realize that we could probably use it in 0002 for os_page_ptrs.

Of course!++

> === 15
>
> I think there is still some multi-lines comments that are missing a period. I
> probably also missed some in 0002 during the previous review. I think that's
> worth another check.

Please do such a check; I've tried pgindent on all .c files, but I'm
apparently blind to such issues. BTW, if the patch has anything left
that causes pgindent to fix, th

Re: Making sslrootcert=system work on Windows psql

2025-04-05 Thread Daniel Gustafsson
> On 3 Apr 2025, at 16:26, Daniel Gustafsson  wrote:
>> On 3 Apr 2025, at 14:41, George MacKerron  wrote:

>> Your diff certainly fixes (1b), so it’s definitely an improvement.
> 
> Thanks, unless Jacob objects I propose to apply that backpatched down to when
> sslrootcert=system went in.

Hearing no objections I've committed this with a backpatch down to v16.

--
Daniel Gustafsson





Re: Test to dump and restore objects left behind by regression

2025-04-05 Thread Ashutosh Bapat
On Thu, Mar 20, 2025 at 10:09 PM Alvaro Herrera  wrote:
>
> On 2025-Mar-20, vignesh C wrote:
>
> > Will it help the execution time if we use --jobs in case of pg_dump
> > and pg_restore wherever supported:
>
> As I said in another thread, I think we should enable this test to run
> without requiring any PG_TEST_EXTRA, because otherwise the only way to
> know about problems is to commit a patch and wait for buildfarm to run
> it.  Furthermore, I think running all 4 dump format modes is a waste of
> time; there isn't any extra coverage by running this test in additional
> formats.
>
> Putting those two thoughts together with yours about running with -j,
> I propose that what we should do is make this test use -Fc with no
> compression (to avoid wasting CPU on that) and use a lowish -j value for
> both pg_dump and pg_restore, probably 2, or 3 at most.  (Not more,
> because this is likely to run in parallel with other tests anyway.)

-Fc and -j are not allowed together; -j is only allowed for the directory format.

$ pg_dump -Fc -j2
pg_dump: error: parallel backup only supported by the directory format

Using just the directory format, on my laptop with a dev build (because
that's what most developers will use when running the tests):

$ meson test -C $BuildDir pg_upgrade/002_pg_upgrade | grep 002_pg_upgrade

without dump/restore test
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
33.51s   19 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
34.22s   19 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
34.64s   19 subtests passed

without -j, extra ~9 seconds
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
43.33s   28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
43.25s   28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
43.10s   28 subtests passed

with -j2, extra 7.5 seconds
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
42.77s   28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.67s   28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.88s   28 subtests passed

with -j3, extra 7 seconds
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
40.77s   28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.05s   28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.28s   28 subtests passed

Between -j2 and -j3 there's not much difference, so we could use -j2.
But it still takes 7.5 extra seconds, which is almost 20% extra time. Do
you think that will be acceptable? I saw Andres mention somewhere
that he runs this test quite frequently. Please note that I would very
much like this test to be run by default, but Tom Lane has expressed a
concern about adding even that much time [1] to run the test and
mentioned that he would like the test to be opt-in.

When I started writing the test a year ago, people raised
concerns about how useful the test would be. Within a year it has
found 4 bugs. I have a similar feeling about the formats - it's doubtful
now but will prove useful soon, especially with the work happening on
dump formats in nearby threads. If we run the test by default, we
could run the directory format with -j by default and leave other
formats as opt-in, OR just forget those formats for now. But if we are
going to make it opt-in, testing all formats gives the extra coverage.

About the format coverage, the consensus so far is that Daniel and I are
for including all formats when running the test as opt-in. Alvaro and
Vignesh are for just one format. We need a tie-breaker, or someone
amongst us needs to change their vote :D.

--
Best Wishes,
Ashutosh Bapat




Re: Statistics Import and Export

2025-04-05 Thread Corey Huinker
>
> > Also, why do we need the clause "WHERE s.tablename = ANY($2)"? Isn't
> > that already implied by "JOIN unnest($1, $2) ... s.tablename =
> > u.tablename"?
>
> Good question.  Corey, do you recall why this was needed?
>

In my patch, that SQL statement came with the comment:

+ /*
+ * The results must be in the order of relations supplied in the
+ * parameters to ensure that they are in sync with a walk of the TOC.
+ *
+ * The redundant (and incomplete) filter clause on s.tablename = ANY(...)
+ * is a way to lead the query into using the index
+ * pg_class_relname_nsp_index which in turn allows the planner to avoid an
+ * expensive full scan of pg_stats.
+ *
+ * We may need to adjust this query for versions that are not so easily
+ * led.
+ */
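
To make the shape of that trick concrete, the pattern under discussion is
roughly the following (a simplified sketch, not the actual pg_dump query; the
array literals stand in for the real parameters):

SELECT s.*
FROM pg_stats s
JOIN unnest(ARRAY['public'], ARRAY['some_table']) AS u(schemaname, tablename)
  ON s.schemaname = u.schemaname AND s.tablename = u.tablename
WHERE s.tablename = ANY (ARRAY['some_table']);  -- redundant, but nudges the planner
                                                -- toward pg_class_relname_nsp_index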


Re: RFC: Additional Directory for Extensions

2025-04-05 Thread Tom Lane
Matheus Alcantara  writes:
> (Not sure if we should also improve the message to make the test failure less
> opaque?)

Yeah, I was wondering how to do that.  The earlier tests in that
script show the whole row from pg_available_extensions, not just a
bool ... but that doesn't help if the problem is we don't find a row.
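
(That is, something along the lines of the query below rather than a boolean
probe; the extension name here is just a placeholder.)

SELECT name, default_version, installed_version, comment
FROM pg_available_extensions
WHERE name = 'test_extension';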

regards, tom lane




Re: split func.sgml to separated individual sgml files

2025-04-05 Thread David G. Johnston
On Wed, Nov 13, 2024 at 1:11 PM Corey Huinker 
wrote:

> The following is step-by-step logic.
>>
>>
> The end result (one file per section) seems good to me.
>
> I suspect that reviewer burden may be the biggest barrier to going
> forward. Perhaps breaking up the changes so that each new sect1 file gets
> its own commit, allowing the reviewer to more easily (if not
> programmatically) verify that the text that moved out of func.sgml moved
> into func-sect-foo.sgml.
>
> Granted, the committer will likely squash all of those commits down into
> one big one, but by then the hard work of reviewing is done.
>
>
Validation is pretty trivial.  I just built the before and after HTML files
and confirmed they are exactly the same size.

I suppose we might have lost some comments or something that wouldn't end
up visible in the HTML (seems unlikely), but this is basically one-and-done,
so long as you don't let other commits that touch this area happen while
you extract and build HEAD and then compare it to the patched build
results.  The git diff will let us know the script didn't affect any source
files it wasn't supposed to.

In short, ready to commit (see last paragraph below however), but the
committer will need to run the python script at the time of commit on the
then-current tree.

In my recent patch touching filelist.sgml I would be placing this new
%allfiles_func; line pairing at the top just beneath %allfiles; which is
the first child element.  But the choice made here makes sense should this
go in first.

There is little downside, though, to renaming the existing %allfiles; to
%allfiles_ref; It's a local-only name.

David J.


Re: Logging which local address was connected to in log_line_prefix

2025-04-05 Thread Tom Lane
Jim Jones  writes:
> In that case, it LGTM.

I looked at 0002 briefly.  I don't have any particular objection to
the proposed feature, but I'm quite concerned about the potential
performance implications of doing a new pg_getnameinfo_all() call
for every line of log output.  I think that needs to be cached
somehow.  It's a little tricky since we may pass through this function
one or more times before the socket info has been filled in, but it
seems do-able.

"Local address" doesn't convey a lot to my mind, and Jim's confusion
about what "[local]" means seems tied to that.  Can we think of a
different explanation for the docs?  Also, in %h we use "[local]" to
signify a Unix socket, but this patch prints that for both the
Unix-socket case and the case of no client connection at all.
I think "[none]" or some such would be a better idea than "[local]"
for background processes.

The patch pays no attention to the "padding" feature that
all the other switch cases honor.

Also, the coding style is randomly unlike project style in various
details, notably paren placement and the use of "0 == foo" to mean
"!foo".

regards, tom lane




Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

2025-04-05 Thread Andrei Lepikhov

On 3/23/25 22:16, David Rowley wrote:

On Fri, 21 Mar 2025 at 22:02, Andrei Lepikhov  wrote:
Can you explain why "Estimated Capacity" and "Estimated Distinct
Lookup Keys" don't answer that?  If there are more distinct lookup
keys than there is capacity to store them, then some will be evicted.
I wouldn't say these parameters don't answer it; I'm trying to debate how
usable they are. To be more practical, let me demonstrate with an example:


EXPLAIN (COSTS OFF) SELECT * FROM t1,t2 WHERE t1.x=t2.x;

 Nested Loop  (cost=0.44..7312.65 rows=211330 width=33)
   ->  Seq Scan on t1  (cost=0.00..492.00 rows=3 width=22)
   ->  Memoize  (cost=0.44..3.82 rows=7 width=11)
 Cache Key: t1.x
 Cache Mode: logical
 Estimated Capacity: 1001  Estimated Distinct Lookup Keys: 1001
 ->  Index Scan using t2_x_idx2 on t2  (cost=0.43..3.81 rows=7)
   Index Cond: (x = t1.x)

At first, I went looking for documentation because it was unclear what
both new parameters specifically meant. Okay, there was no documentation,
but the code is trivial, and after a short investigation I realised the meaning.
The first fact I see from this EXPLAIN is that Postgres estimates it has
enough memory to fit all the entries. Okay, but what does that give me? I
may just increase work_mem and provide the query with more memory if
needed. My main concern is how frequently this cache is planned to be
used. With some mental effort, I found the line "rows=3."
Calculating a bit more, I may suppose it means that we have a 95% chance
to reuse the cache. Okay, I got it.
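
(For the record, the mental arithmetic I'm doing there, assuming Memoize's
usual costing model, is roughly:

    expected hit ratio ~= (outer rows - estimated distinct lookup keys) / outer rows
                        = 1 - 1001 / (row estimate of the Seq Scan on t1)

i.e. the reader has to pull the outer-side row estimate from a different node
just to arrive at a percentage.)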

Now, see:
1. I needed to discover the meaning of the new parameters because they
were different from the earlier "hit" and "miss".
2. I needed to find the corresponding JOIN for the keys of this node. Imagine
a typical 200-row EXPLAIN with 2-3 Memoization keys from different tables.
3. I needed to do the calculations.

In contrast, the hit ratio, if written there instead, is already familiar by
analogy, already provides the necessary cache-efficacy data with no need
to look outside the node, and can easily be compared with the actual
value. Am I wrong?


Both approaches provide the data, but which one is more usable?
I think we could ask more people, for example Nikolay Samokhvalov, who,
as I heard, works a lot with EXPLAIN output.




Once again, I'm not necessarily objecting to hit and evict ratios
being shown, I just want to know they're actually useful enough to
show and don't just bloat the EXPLAIN output needlessly. So far your
arguments aren't convincing me that they are.

I'm -1 for this redundancy.

--
regards, Andrei Lepikhov




Re: Reducing the log spam

2025-04-05 Thread Tom Lane
Fujii Masao  writes:
> Filtering log messages by SQLSTATE might be useful for some users,
> but I'm unsure if it should belong in the core. There are also other
> potential filtering needs, such as by application name, client host,
> database, or roles. Adding only SQLSTATE filtering may not be good idea,
> while supporting all possible cases in the core wouldn't be practical either.

> Given that, I think implementing this as an extension using emit_log_hook
> would be a better approach. Anyway, I'd like to hear more opinions from
> other hackers on this.

I took a brief look and I concur with Fujii-san's conclusion: this'd
be a fine extension a/k/a contrib module, but it seems a bit too
opinionated about what sort of filtering is needed to be appropriate
within elog.c itself.

Also, just as a minor coding thought, I'd suggest using -1 not 0
to terminate the array of encoded SQLSTATEs, rather than assuming
that nobody would write 0.  Compare commit 58fdca220.

regards, tom lane




Re: Using read stream in autoprewarm

2025-04-05 Thread Melanie Plageman
On Mon, Mar 31, 2025 at 2:58 PM Nazir Bilal Yavuz  wrote:
>
> Do you think that I should continue to
> attach both approaches?

No, for now let's try and get this approach to a good place and then
see which one we like.

I think there might be another problem with the code. We only set
cur_database in the loop in autoprewarm_database_main() when it is 0:

if (cur_database != blk->database)
{
if (cur_database == 0)
cur_database = blk->database;

I know that the read stream will return InvalidBlockNumber when we
move onto the next database, but I don't see how we will end up
actually stopping prewarming in autoprewarm_database_main() when we
move on to the next database.

Another thing:
I don't know if it is a correctness issue but in
autoprewarm_database_main(), in this case
if (!rel && cur_filenumber != blk->filenumber)
{
you have removed the Assert(rel == NULL) -- I worry that we will end
up with a rel from a previous iteration after failing to open the next
rel. I think we want this assert.

And a last thing I noticed is that the initial values for cur_database,
cur_filenumber, and nblocks_in_fork in autoprewarm_database_main() are all
initialized to different kinds of initial values for different reasons. I'm
wondering if there is a way to make them consistent.

cur_database = block_info[pos].database;
cur_filenumber = InvalidOid;
nblocks_in_fork = InvalidBlockNumber;

cur_database is set to be the same as the first database in the array
so that we won't hit
if (cur_database != blk->database)
on the first block.

However, we make cur_filenumber InvalidOid because for the first block
we want to hit code that forces us to open a new relation
if (!rel && cur_filenumber != blk->filenumber)

And nblocks_in_fork to InvalidBlockNumber so that 1) we don't have to
get the number before starting the loop and 2) so we would move past
BlockInfoRecords with invalid filenumber and invalid blocknumber

if (cur_filenumber == blk->filenumber &&
blk->blocknum >= nblocks_in_fork)

So, I'm just wondering whether it would be correct to initialize
cur_database to InvalidOid and to check for that before skipping a
block, or if that doesn't work when the first block's database is
InvalidOid.

- Melanie




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-04-05 Thread Euler Taveira
On Mon, Mar 17, 2025, at 9:34 AM, Shubham Khanna wrote:
> I have incorporated the "--remove/-r" parameter in the attached patch,
> as it seems more intuitive and straightforward for users.
> The attached patch contains the latest changes.

There was a lot of discussion around the single vs. multiple options since my
last review [1], so I'm answering some of the questions here.

Due to the nature of removing multiple objects, I would say that a general
option that takes a value and can be specified multiple times is the way to go. I
saw that the discussion led to this. Regarding the name, my preference is
--drop since we already have other binaries with similar options (pg_receivewal
and pg_recvlogical have --drop-slot).

The commit message needs some adjustments. There is redundant information (1st
and last paragraphs).

+   publications. This option is useful when
+   transitioning from streaming replication to logical replication as
+   existing objects may no longer be needed. Before using this option,

Use "physical replication". That's what we use in the current
pg_createsubscriber documentation, and it is also the terminology used in
the glossary (see Replication).

booltwo_phase;  /* enable-two-phase option */
+   uint32  remove_objects; /* flag to remove objects on subscriber */

uint32 -> bits32.

-static void setup_subscriber(struct LogicalRepInfo *dbinfo,
-const char *consistent_lsn);
+static void setup_subscriber(const char *consistent_lsn);

This is unrelated to this patch.

- * replication setup.
+ * replication setup. If 'remove' parameter is specified, existing publications
+ * on the subscriber will be dropped before creating new subscriptions.
  */
static void
-setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn)
+setup_subscriber(const char *consistent_lsn)

There is no parameter 'remove' in this function. I understand that you want to
provide information about the remove option, but this is not the right place to
add it. If for some reason you later need to change or remove that option, it is
easy to leave this comment behind (stale), because it is not near the lines of
code being changed.

+   struct LogicalRepInfo *dbinfo = &dbinfos.dbinfo[i];
 
/* Connect to subscriber. */
-   conn = connect_database(dbinfo[i].subconninfo, true);
+   conn = connect_database(dbinfo->subconninfo, true);

This new dbinfo pointer is just a way to increase your patch size without
improving readability.

-   drop_publication(conn, &dbinfo[i]);
+   if (dbinfos.remove_objects & OBJECT_PUBLICATIONS)
+   drop_all_publications(conn, dbinfo);
+   else
+   drop_publication(conn, dbinfo, dbinfo->pubname);

At first glance, I didn't like this change. You pass dbinfo->pubname as the 3rd
parameter but the 2nd parameter is dbinfo. After reading
drop_all_publications(), I realized that's the reason for this change. Is there a
better way to do it?

+static void
+drop_all_publications(PGconn *conn, struct LogicalRepInfo *dbinfo)
+{
+   PGresult   *res;
+

I would add an Assert(conn != NULL) here to follow the same pattern as the
other functions.

+   res = PQexec(conn, "SELECT pubname FROM pg_catalog.pg_publication;");
+   if (PQresultStatus(res) != PGRES_TUPLES_OK)
+   {
+   pg_log_error("could not obtain publication information: %s",
+PQresultErrorMessage(res));
+   PQclear(res);
+   return;
+   }

Shouldn't it disconnect_database() here? See the pattern in other functions
that error out while querying the catalog.

+   SimpleStringListCell *cell;
+
+   for (cell = opt.remove_objects.head; cell; cell = cell->next)
+   {

You could use SimpleStringListCell in the for loop without a separate 
declaration.

+   pg_log_error("wrong remove object is specified");
+   pg_log_error_hint("Only \"publications\" can be removed.");
+   exit(1);

The main message sounds strange. Did you mean 'wrong object is specified'?

+ok($node_s->safe_psql($db1, "SELECT COUNT(*) = 2 FROM pg_publication"),
+   'two publications created before --remove is run');
+

two pre-existing publications on subscriber ?

+is($node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;"),
+   '0', 'all publications dropped after --remove is run');
+

all publications on subscriber have been removed ?

+   primary_conninfo = '$pconnstr'
+   max_logical_replication_workers = 5

Do you need to set m_l_r_w here?

+# --verbose is used twice to show more information.

This comment is superfluous. Remove it.

+# Confirm publications still remain after running 'pg_createsubscriber'
+is($node_u->safe_psql($db3, "SELECT COUNT(*) FROM pg_publication;"),
+   '2', 'all publications remain after running pg_createsubscriber');

I would remove the semicolon here because none of the SQL commands in this test
file includes it.

Files=1, Tests=43, 14 wallclock secs ( 0.

Re: Regression test postgres_fdw might fail due to autovacuum

2025-04-05 Thread Andres Freund
Hi,

On 2025-03-24 02:26:35 +0900, Fujii Masao wrote:
> > With autovacuum = off, all of these fluctuations go away.
> 
> So you are suggesting disabling autovacuum during the postgres_fdw regression 
> test?

I don't think that's a good fix in this case.


> Just my idea to stabilize the test with "RETURNING *" is to use WITH, like 
> this:
> 
> WITH tmp AS (UPDATE ... RETURNING *) SELECT * FROM tmp ORDER BY ...

+1
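
Spelled out against one of the postgres_fdw test tables, that could look
something like the sketch below (column names are only illustrative, not
lifted from the actual regression script):

WITH tmp AS (
    UPDATE ft2 SET c2 = c2 + 1 WHERE c1 > 100 RETURNING *
)
SELECT * FROM tmp ORDER BY c1;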


On 2025-03-23 16:00:00 +0200, Alexander Lakhin wrote:
> Interestingly enough, with "log_autovacuum_min_duration = 0" added to
> TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related
> to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table
> indirectly.
> 
> With autovacuum = off, all of these fluctuations go away.

Hm. Is it possible that autovacuum started but didn't finish?


FWIW, I observed some oddities around autovacuum=0 recently, but didn't have
time to track the source down yet. E.g. running pg_trgm's tests against a
server with autovacuum=0 results in test failures due to plan changes, despite
no AVs getting launched.

Greetings,

Andres




Re: pg_stat_statements: improve loading and saving routines for the dump file

2025-04-05 Thread m . litsarev

Hi!

Thank you for the detailed explanations.

Respectfully,
Mikhail Litsarev,
Postgres Professional: https://postgrespro.com




Re: Reducing the log spam

2025-04-05 Thread Fujii Masao




On 2025/03/17 17:12, Jim Jones wrote:


On 15.03.25 07:12, Laurenz Albe wrote:

... and here is v7, improved in the spirit of
https://postgr.es/m/E132D362-A669-4606-AFE1-B45C9DFCC141%40yesql.se



I just revisited this patch and v7 passes all tests from [1,2].
LGTM.


Since this patch is marked as ready for committer, I've started looking at it.

Filtering log messages by SQLSTATE might be useful for some users,
but I'm unsure if it should belong in the core. There are also other
potential filtering needs, such as by application name, client host,
database, or roles. Adding only SQLSTATE filtering may not be good idea,
while supporting all possible cases in the core wouldn't be practical either.

Given that, I think implementing this as an extension using emit_log_hook
would be a better approach. Anyway, I'd like to hear more opinions from
other hackers on this.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: Index AM API cleanup

2025-04-05 Thread Peter Eisentraut
I have committed these four patches (squashed into three).  I made the 
error handling change in 0003 that you requested, and also the error 
handling change in 0002 discussed in an adjacent message.



On 12.03.25 16:52, Mark Dilger wrote:



On Wed, Mar 12, 2025 at 7:00 AM Peter Eisentraut wrote:


And another small patch set extracted from the bigger one, to keep
things moving along:

0001: Add get_opfamily_method() in lsyscache.c, from your patch set.


Right, this came from v21-0006-*, with a slight code comment change and 
one variable renaming.  It is ready for commit.


0002: Add get_opfamily_member_for_cmptype().  This was called
get_opmethod_member() in your patch set, but I think that name wasn't
quite right.  I also removed the opmethod argument, which was rarely
used and is somewhat redundant.


This is also taken from v21-0006-*.  The reason I had an opmethod 
argument was that some of the callers of this function already know the 
opmethod, and without the argument, this function has to look up the 
opmethod from the syscache again.  Whether that makes a measurable 
performance difference is an open question.
Your version is ready for commit.  If we want to reintroduce the 
opmethod argument for performance reasons, we can always circle back to 
that in a later commit.


0003 and 0004 are enabling non-btree unique indexes for partition keys


You refactored v21-0011-* into v21.2-0003-*, in which an error gets 
raised about a missing operator in a slightly different part of the 
logic.  I am concerned that the new positioning of the check-and-error 
might allow the flow of execution to reach the Assert(idx_eqop) 
statement in situations where the user has defined an incomplete 
opfamily or opclass.  Such a condition should raise an error about the 
missing operator rather than asserting.


In particular, looking at the control logic:

                        if (stmt->unique && !stmt->iswithoutoverlaps)
                         {
                                
                         }
                         else if (exclusion)
                                ;

                         Assert(idx_eqop);

I cannot prove to myself that the assertion cannot trigger, because the 
upstream logic before we reach this point *might* be filtering out all 
cases where this could be a problem, but it is too complicated to 
prove.  Even if it is impossible now, this is a pretty brittle piece of 
code after applying the patch.


Any chance you'd like to keep the patch closer to how I had it in 
v21-0011-* ?


and materialized views.  These were v21-0011 and v21-0012, except that
I'm combining the switch from strategies to compare types (which was in
v21-0006 or so) with the removal of the btree requirements.


v21.2-0004-* is ready for commit.

—
Mark Dilger
EnterpriseDB:http://www.enterprisedb.com 
The Enterprise PostgreSQL Company






Re: BTScanOpaqueData size slows down tests

2025-04-05 Thread Peter Geoghegan
On Wed, Apr 2, 2025 at 12:08 PM Andres Freund  wrote:
> > It isn't at all rare for the scan to have to return about 1350 TIDs
> > from a page, though. Any low cardinality index will tend to have
> > almost that many TIDs to return on any page that only stores
> > duplicates. And scan will necessarily have to return all of the TIDs
> > from such a page, if it has to return any.
>
> I'm not sure what you're arguing for/against here? Obviously we need to handle
> that case.  I doubt that the overhead of once-per-scan allocation of a
> MaxTIDsPerBTreePage * sizeof(BTScanPosItem) array once per scan matters when
> that many tuples are returned.

I'm not arguing for or against anything. I'm just giving context.

-- 
Peter Geoghegan




Re: BTScanOpaqueData size slows down tests

2025-04-05 Thread Tom Lane
Andres Freund  writes:
> I'm a bit confused by the "MUST BE LAST" comment:
>   BTScanPosItem items[MaxTIDsPerBTreePage];   /* MUST BE LAST */

Yeah, me too.  It looks like it might've been intended to allow
the items[] array to be only partially allocated.  But as Peter
says, we don't do that.  Too long ago to be sure :=(

regards, tom lane




Re: Remove an unnecessary check on semijoin_target_ok() on postgres_fdw.c

2025-04-05 Thread Alexander Korotkov
On Fri, Nov 29, 2024 at 3:39 AM Tender Wang  wrote:
> Alexander Pyhalov  于2024年11月29日周五 00:02写道:
>>
>> Tender Wang писал(а) 2024-10-09 10:26:
>> > Hi,
>> >When I debug FDW join pushdown codes, I found below codes in
>> > semijoin_target_ok():
>> > if (bms_is_member(var->varno, innerrel->relids) &&
>> >
>> > !bms_is_member(var->varno, outerrel->relids))
>> >
>> > As far as I know, if a var belongs to the innerrel of joinrel, it's
>> > not possible that it
>> > may belong to the outerrel. So if the bms_is_member(var->varno,
>> > innerrel->relids)
>> > returns TRUE, then !bms_is_member(var->varno, outerrel->relids) must
>> > be TRUE.
>> > If bms_is_member(var->varno, innerrel->relids) returns FALSE,
>> > !bms_is_member(var->varno, outerrel->relids) will not execute due to
>> > short circuit.
>> >
>> > So I think we can remove the "!bms_is_member(var->varno,
>> > outerrel->relids)" from if.
>> > Any thoughts?
>>
>> Hi.
>> Seems good to me.
>> --
>> Best regards,
>> Alexander Pyhalov,
>> Postgres Professional
>
>
> Thanks for looking at that.

Pushed.  But I've decided to keep the redundant check as an assertion.

--
Regards,
Alexander Korotkov
Supabase




Re: long-standing data loss bug in initial sync of logical replication

2025-04-05 Thread Amit Kapila
On Mon, Mar 17, 2025 at 4:56 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > Regarding the PG13, it cannot be
> > applied
> > as-is thus some adjustments are needed. I will share it in upcoming posts.
>
> Here is a patch set for PG13. Unlike for PG14-17, the patch could not be
> created as-is, because...
>
> 1. WAL record for invalidation messages (XLOG_XACT_INVALIDATIONS) does not 
> exist.
> 2. Thus the ReorderBufferChange for the invalidation does not exist.
>Our patch tries to distribute it but cannot be done as-is.
> 3. Codes assumed that invalidation messages can be added only once.
> 4. The timing when invalidation messages are consumed is limited:
>   a. COMMAND_ID change is poped,
>   b. start of decoding a transaction, or
>   c. end of decoding a transaction.
>
> Above means that invalidations cannot be executed while being decoded.
> I created two patch sets to resolve the data loss issue. 0001 has less code
> changes but could resolve a part of issue, 0002 has huge changes but provides 
> a
> complete solution.
>
> 0001 - mostly same as patches for other versions. 
> ReorderBufferAddInvalidations()
>was adjusted to allow being called several times. As I said above,
>0001 cannot execute inval messages while decoding the transacitons.
> 0002 - introduces new ReorderBufferChange type to indicate inval messages.
>It would be handled like PG14+.
>
> Here is an example. Assuming that the table foo exists on both nodes, a
> publication "pub" which publishes all tables, and a subscription "sub" which
> subscribes "pub". What if the workload is executed?
>
> ```
> S1  S2
> BEGIN;
> INSERT INTO foo VALUES (1)
> ALTER PUBLICATION pub RENAME TO pub_renamed;
> INSERT INTO foo VALUES (2)
> COMMIT;
> LR -> ?
> ```
>
> With 0001, tuples (1) and (2) would be replicated to the subscriber.
> An error "publication "pub" does not exist" would raise when new changes are 
> done
> later.
>
> 0001+0002 works more aggressively; the error would raise when S1 transaction 
> is decoded.
> The behavior is same as for patched PG14-PG17.
>

I understand that with 0001 the fix is partial, in the sense that,
because invalidations are processed at transaction end, the
changes of concurrent DDL will only be reflected in the next
transaction. Now, on one hand it is prudent not to add a new type of
ReorderBufferChange in a back branch (v13), but the change is not that
invasive, so we could go with it as well. My preference would be to go
with just 0001 for v13 to minimize the risk of adding new bugs or
breaking something unintentionally.

Sawada-San, and others involved here, do you have any suggestions on
this matter?

-- 
With Regards,
Amit Kapila.




Re: Add Pipelining support in psql

2025-04-05 Thread Daniel Verite
Anthonin Bonnefoy wrote:

> 0002: Allows ';' to send a query using extended protocol when within a
> pipeline by using PQsendQueryParams

It's a nice improvement!

> with 0 parameters. It is not
> possible to send parameters with extended protocol this way and
> everything will be propagated through the query string, similar to a
> simple query.

It's actually possible to use parameters

\startpipeline 
\bind 'foo'
select $1;
\endpipeline

 ?column? 
--
 foo
(1 row)

I suspect there's a misunderstanding that \bind can only be placed
after the query, because it's always written like that in the regression
tests, and in the documentation.
But that's just a notational preference.


Best regards,
-- 
Daniel Vérité 
https://postgresql.verite.pro/




Re: Proposal - Allow extensions to set a Plan Identifier

2025-04-05 Thread Andrei Lepikhov

On 3/25/25 00:47, Sami Imseih wrote:

I know it was mentioned above by both Michael and Andrei that
AppendJumble should not be exposed. I am not sure I agree with
that. I think it should be exposed, along with
JUMBLE_FIELD, JUMBLE_FIELD_SINGLE and JUMBLE_STRING
and _jumbleList.
It would be helpful if you could provide an example to support the 
reasoning behind exposing this stuff. I assume you want to influence the 
logic of jumbling, correct? If that’s the case, it might be beneficial 
to add a callback to the JumbleState that is triggered during each node 
jumbling attempt. This could facilitate moving clocations code and data 
into pg_stat_statements and give developers the opportunity to influence 
the jumbling process, adding extra meaning to their hashes.
One reason this might be necessary is that generating the same hash for 
both a Param node and a Const node could lead to a more generalized 
hash, allowing us to group more queries under the same value.


--
regards, Andrei Lepikhov




Re: Using read stream in autoprewarm

2025-04-05 Thread Melanie Plageman
On Mon, Mar 31, 2025 at 12:27 PM Nazir Bilal Yavuz  wrote:
>
> I worked on an alternative approach, I refactored code a bit. It does
> not traverse the list two times and I think the code is more suitable
> to use read streams now. I simply get how many blocks are processed by
> read streams and move the list forward by this number, so the actual
> loop skips these blocks. This approach is attached with 'alternative'
> prefix.

I am leaning toward the refactored approach because I don't think we
want to go through the array twice and I think it is hard to get it
right with incrementing p.pos in both places and being sure we
correctly close the relation etc.

Looking at your alternative approach, I don't see how the innermost
while loop in autoprewarm_database_main() is correct

/* Check whether blocknum is valid and within fork file size. */
while (cur_filenumber == blk->filenumber &&
   blk->blocknum >= nblocks_in_fork)
{
/* Move to next forknum. */
pos++;
continue;
}

Won't this just infinitely loop?

- Melanie




Re: in BeginCopyTo make materialized view using COPY TO instead of COPY (query).

2025-04-05 Thread David G. Johnston
On Sat, Mar 29, 2025 at 11:56 AM Andrew Dunstan  wrote:

> I don't believe that the premise supports the conclusion.
>
>
> Regardless, I do support this patch and probably any similar ones proposed
in the future.  Do you have an opinion on that?

David J.


RE: Fix slot synchronization with two_phase decoding enabled

2025-04-05 Thread Zhijie Hou (Fujitsu)
On Thu, Mar 27, 2025 at 2:29 PM Amit Kapila wrote:

> 
> On Tue, Mar 25, 2025 at 12:1 PM Amit Kapila 
> wrote:
> >
> > On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > Hi,
> > >
> > > When testing the slot synchronization with logical replication slots that
> > > enabled two_phase decoding, I found that transactions prepared before
> two-phase
> > > decoding is enabled may fail to replicate to the subscriber after being
> > > committed on a promoted standby following a failover.
> > >
> > > To reproduce this issue, please follow these steps (also detailed in the
> > > attached TAP test, v1-0001):
> > >
> > > 1. sub: create a subscription with (two_phase = false)
> > > 2. primary (pub): prepare a txn A.
> > > 3. sub: alter subscription set (two_phase = true) and wait for the logical
> slot to
> > >be synced to standby.
> > > 4. primary (pub): stop primary, promote the standby and let the subscriber
> use
> > >the promoted standby as publisher.
> > > 5. promoted standby (pub): COMMIT PREPARED A;
> > > 6. sub: the apply worker will report the following ERROR because it didn't
> > >receive the PREPARE.
> > >ERROR:  prepared transaction with identifier "pg_gid_16387_752"
> does not exist
> > >
> > > I think the root cause of this issue is that the two_phase_at field of the
> > > slot, which indicates the LSN from which two-phase decoding is enabled
> (used to
> > > prevent duplicate data transmission for prepared transactions), is not
> > > synchronized to the standby server.
> > >
> > > In step 3, transaction A is not immediately replicated because it occurred
> > > before enabling two-phase decoding. Thus, the prepared transaction
> should only
> > > be replicated after decoding the final COMMIT PREPARED, as referenced
> in
> > > ReorderBufferFinishPrepared(). However, due to the invalid two_phase_at
> on the
> > > standby, the prepared transaction fails to send at that time.
> > >
> > > This problem arises after the support for altering the two-phase option
> > > (1462aad).
> > >
> 
> I suspect that this can happen in PG17 as well, but I need to think
> more about it to make a reproducible test case.

After further analysis, I was able to reproduce the same issue [1] in
PG 17.

However, since the proposed fix requires catalog changes and the issue is not a
security risk significant enough to justify changing the catalog in back
branches, we cannot back-patch the same solution. Following off-list
discussions with Amit and Kuroda-san, we are considering disallowing enabling
failover and two-phase decoding together for a replication slot, as suggested
in attachment 0002.
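For illustration, assuming the restriction is enforced at slot creation, a call
that enables both options, e.g.

SELECT pg_create_logical_replication_slot('myslot', 'pgoutput', false, true, true);

(temporary = false, twophase = true, failover = true), would then be rejected.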

Another idea considered is to prevent the slot that enables two-phase decoding
from being synced to standby. IOW, this means displaying the failover field as
false in the view, if there is any possibility that transactions prepared
before the two_phase_at position exist (e.g., if restart_lsn is less than
two_phase_at). However, implementing this change would require additional
explanations to users for this new behavior, which seems tricky.

> 
> In the meantime, I have a few minor comments on the proposed patches:
> 1.
> ##
>  # Promote the standby1 to primary. Confirm that:
>  # a) the slot 'lsub1_slot' and 'snap_test_slot' are retained on the new 
> primary
>  # b) logical replication for regress_mysub1 is resumed successfully
> after failover
>  # c) changes can be consumed from the synced slot 'snap_test_slot'
>  ##
> -$standby1->start;
>  $primary->wait_for_replay_catchup($standby1);
> 
>  # Capture the time before the standby is promoted
> @@ -885,6 +940,15 @@ $standby1->wait_for_catchup('regress_mysub1');
>  is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
>   "20", 'data replicated from the new primary');
> 
> +# Commit the prepared transaction
> +$standby1->safe_psql('postgres',
> + "COMMIT PREPARED 'test_twophase_slotsync';");
> +$standby1->wait_for_catchup('regress_mysub1');
> +
> +# Confirm that the prepared transaction is replicated to the subscriber
> +is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
> + "21", 'prepared data replicated from the new primary');
> 
> The commentary above this test should include information about
> verifying the replication of previously prepared transactions after
> promotion. Also, it would be better if confirm the commit prepared
> before confirming the new Insert is replicated after promotion.
> 
> 2.
> @@ -249,6 +250,7 @@ update_local_synced_slot(RemoteSlot *remote_slot,
> Oid remote_dbid,
>   SpinLockAcquire(&slot->mutex);
>   slot->data.restart_lsn = remote_slot->restart_lsn;
>   slot->data.confirmed_flush = remote_slot->confirmed_lsn;
> + slot->data.two_phase_at = remote_slot->two_phase_at;
> 
> Why do we need to update the two_phase_at here when the patch does it
> later in this function when local and

Re: [PATCH] Add sortsupport for range types and btree_gist

2025-04-05 Thread Heikki Linnakangas

On 11/03/2025 20:28, Bernd Helmle wrote:

Please find a new rebased version of this patch.


Hmm, if we implement sortsupport function for GiST, we can register it 
for B-tree opfamily as well. The range comparison function has quite 
high overhead thanks to detoasting, but I'm nevertheless seeing a tiny 
speedup. I tested that with:


setup:

create table intranges (r int4range);
insert into intranges select int4range(g, g+10) from generate_series(1, 
100) g;

vacuum freeze intranges;

test script 'rangesort.sql':

set work_mem='100 MB';
explain analyze select r from intranges order by r;

test:

pgbench -n -f rangesort.sql -P1 postgres -t100

On my laptop, that reports avg latency of 152 ms on master, and 149 ms 
latency with the patch. Nothing to write home about, but we might as 
well take it if we have the sortsupport function for gist anyway.


So I added it for the btree opfamily too, and moved the function to
rangetypes.c since it's not just for gist anymore. I committed that
part, and will start looking more closely at the remaining btree_gist
parts now.


--
Heikki Linnakangas
Neon (https://neon.tech)




Re: RFC: Logging plan of the running query

2025-04-05 Thread Sadeq Dousti
Hi,

Sergey and I reviewed this patch and have a few comments, listed below.

> BTW the patch adds about 400 lines to explain.c and it may be better
> to split the file as well as 9173e8b6046, but I leave it as it is for
> now.

1. As rightfully described by the OP above, explain.c has grown too big. In
addition, explain.h is added to quite a number of other files.

2. We wanted to entertain the possibility of a GUC variable to enable the
feature. That is, having it disabled by default, and the DBA can
selectively enable it on the fly. The reasoning is that it can affect some
workloads in an unforeseen way, so this choice can make the next Postgres
release safer. In future releases, we can make the default "enabled",
assuming enough real-world scenarios have proven the feature safe (i.e.,
not affecting the DB performance noticeably).

3. In the email thread for this patch, we saw some attempts to measure the
performance overhead of the feature. We suggest a more rigorous study,
including memory overhead in addition to time overhead. We came to the
conclusion that analytical workloads - with complicated query plans - and a
high query rate are the best to attempt such benchmarks.

4. In PGConf EU 2024, there was a talk by Rafael Castro titled "Debugging
active queries" [1], and he also submitted a recent patch titled "Proposal:
Progressive explain" [2]. We see a lot of similarities in the idea behind
that patch and this one, and would like to suggest joining forces for an
ideal outcome.

[1] https://www.youtube.com/watch?v=6ahTb-7C05c
[2] https://commitfest.postgresql.org/patch/5473/

Best Regards,
Sadeq Dousti & Sergey Dudoladov


Re: Using read stream in autoprewarm

2025-04-05 Thread Melanie Plageman
On Sun, Mar 30, 2025 at 10:01 AM Nazir Bilal Yavuz  wrote:
>
> > Some review feedback on your v4: I don't think we need the
> > rs_have_free_buffer per_buffer_data. We can just check
> > have_free_buffers() in both the callback and main loop in
> > autoprewarm_database_main(). I also think you want a comment about why
> > first_block is needed. And I think you need to guard the
> > read_stream_end() after the loop -- what if we never made a read
> > stream because we errored out for all the block's relations or
> > something?
>
> All of these are addressed. One extra thing I noticed is we were not
> checking if blocknum < number_of_block_in_relation at the first_block
> case in the stream callback, this is fixed now.

I'm wondering why you need to check if have_free_buffer() in the else
branch after getting the buffer from the read stream API. Can't you
just put it back in the for loop condition? Seems like it would have
the same effect.

-   for (; pos < apw_state->prewarm_stop_idx; pos++)
+   for (; pos < apw_state->prewarm_stop_idx && have_free_buffer(); pos++)
{
BlockInfoRecord *blk = &block_info[pos];
Buffer  buf;
@@ -640,9 +640,6 @@ autoprewarm_database_main(Datum main_arg)
apw_state->prewarmed_blocks++;
ReleaseBuffer(buf);
}
-   /* There are no free buffers left in shared buffers, break the loop. */
-   else if (!have_free_buffer())
-   break;

Looking at the code some more, I feel stuck on my original point about
incrementing the position in two places.
AFAICT, you are going to end up going through the array twice with this design.
Because you don't set the pos variable in autoprewarm_database_main()
from the p->pos variable which the read stream callback is
incrementing, if the read stream callback increments p->pos it a few
positions yielding those blocks to the read stream machinery to read,
you are then going to iterate over those positions in the array again
in the autoprewarm_database_main() loop.

I think you can get around this by setting pos from p->pos in
autoprewarm_database_main() after read_stream_next_buffer(). Or by
using p->pos in the loop in autoprewarm_database_main() (which is
basically what Andres suggested). I'm not sure, though, if this has
any problems. Like not closing a relation in the right place.

- Melanie




RE: pg_recvlogical requires -d but not described on the documentation

2025-04-05 Thread Hayato Kuroda (Fujitsu)
Dear Fujii-san,

> I've updated the commit messages for both patches and also revised
> the code comments in the 0002 patch. The updated patches are attached.
> 
> Unless there are any objections, I'm thinking to commit them.

Thanks for updating the patch. LGTM.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Prune partitions by ScalarArrayOpExpr with an array parameter (partkey = ANY($1))

2025-04-05 Thread Andrei Lepikhov

On 3/17/25 14:28, Andrei Lepikhov wrote:

Hi,

As I see it, initial pruning doesn't work in the case when a
ScalarArrayOpExpr contains a parameter as the RHS of the expression,
like the following:


partkey = ANY($1)

As colleagues say, it is quite typical to use stored procedures, pass an 
array of IDs as a parameter, and use it in a SELECT clause.
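A minimal sketch of that pattern, using the part table defined further below
(the prepared-statement name is illustrative only):

PREPARE fetch_parts(int[]) AS
  SELECT * FROM part WHERE a = ANY($1);
EXECUTE fetch_parts(ARRAY[5, 150]);

Ideally, initial (executor-startup) pruning would restrict such a query to the
partitions matching the array elements.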


So, here I propose a patch that extends pruning machinery. It is nothing 
innovative or complicated, but I'm not sure it is fully operational so 
far: it may need some discussion, review and polishing.


I intended to add it to the next commitfest if this feature makes sense.


For the record, here I want to resolve a bit more general issue, than 
just pruning on a "key op ANY($1)" expression. For example, a hashed 
InitPlan also may be used for this purpose:


CREATE TABLE part (a int)
PARTITION BY RANGE (a);

CREATE TABLE part1 PARTITION OF part FOR VALUES FROM (0) to (101);
CREATE TABLE part2 PARTITION OF part FOR VALUES FROM (101) to (200);
INSERT INTO part (a) SELECT value%200 FROM generate_series(1,1000) AS value;
CREATE TABLE other (x int, y int);
INSERT INTO other (x,y) VALUES (1,1);

EXPLAIN (VERBOSE, ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF)
SELECT * FROM part
WHERE a = ANY((SELECT array_agg(x) AS x
FROM other
WHERE y between 0 and 100)::integer[]);

I think subqueries are quite a common pattern, and pruning in this case
makes sense.


--
regards, Andrei Lepikhov




Re: Snapshot related assert failure on skink

2025-04-05 Thread Tomas Vondra



On 3/24/25 16:25, Heikki Linnakangas wrote:
> On 24/03/2025 16:56, Tomas Vondra wrote:
>>
>>
>> On 3/23/25 17:43, Heikki Linnakangas wrote:
>>> On 21/03/2025 17:16, Andres Freund wrote:
 Am I right in understanding that the only scenario (when in
 STANDBY_SNAPSHOT_READY), where ExpireOldKnownAssignedTransactionIds()
 would
 "legally" remove a transaction, rather than the commit / abort records
 doing
 so, is if the primary crash-restarted while transactions were ongoing?

 Those transactions won't have a commit/abort records and thus won't
 trigger
 ExpireTreeKnownAssignedTransactionIds(), which otherwise would have
 updated
 ->xactCompletionCount?
>>>
>>> Correct.
>>>
 When writing the snapshot caching patch, I tried to make sure that all
 the
 places that maintain ->latestCompletedXid also update
 ->xactCompletionCount. Afaict that's still the case. Which, I think,
 means
 that we're also missing calls to MaintainLatestCompletedXidRecovery()?
>>>
>>> Yep, I was just looking at that too.
>>>
 If latestCompletedXid is incorrect visibility determinations end up
 wrong...
>>>
>>> I think it happens to work, because the transactions are effectively
>>> aborted. latestCompletedXid is used to initialize xmax in
>>> GetSnapshotData. If, for example, latestCompletedXid is incorrectly set
>>> to 1000 even though XID 1001 already aborted, a snapshot with xmax=1000
>>> still correctly considers XID 1001 as "not visible". As soon as a
>>> transaction commits, latestCompletedXid is fixed.
>>>
>>> AFAICS we could skip updating latestCompletedXid on aborts altogether
>>> and rename it to latestCommittedXid. But it's hardly worth optimizing
>>> for aborts.
>>>
>>> For the same reason, I believe the assertion failure we're discussing
>>> here is also harmless. Even though the reused snapshot has a smaller
>>> xmin than expected, all those transactions aborted and are thus not
>>> visible anyway.
>>>
>>> In any case, let's be tidy and fix both latestCompletedXid and
>>> xactCompletionCount.
>>>
>>
>> Thanks for looking into this and pushing the fix.
>>
>> Would it make sense to add a comment documenting this reasoning about
>> not handling aborts? Otherwise someone will get to rediscover this in
>> the future ...
> 
> Well, we're currently _not_ relying on the reasoning about aborts not
> changing visibility. So it's not like someone will forget the reasoning
> and have a bug as a result. I guess someone could rediscover that
> reasoning and think that they can skip advancing those counters on
> aborts as an optimization, re-introducing the assertion failure. But I
> believe that was not why we had this bug, it was just a simple omission.
> 
> So the comment would look like this:
> 
> "In principle, we would not need to advance xactCompletionCount and
> latestCompletedXid because aborting transactions don't change
> visibility. But that could make xmin within a transaction go backwards,
> if a snapshot with an older xmin but same xactCompletionCount is reused,
> triggering the assertion in GetSnapshotDataReuse()."
> 
> If we add that, I guess it should go into ProcArrayEndTransaction()
> which currently just notes that it's used for commits and aborts
> interchangeably. Do you think it's worth it? Want to propose a patch?
> 

I think the wording you suggested above is reasonable - I certainly
don't have a better one. I don't know what's the likelihood of someone
breaking this, but I think it makes sense to mention this because the
assert is non-trivial to hit.

regards

-- 
Tomas Vondra





Re: pgsql: pg_upgrade: Preserve default char signedness value from old clus

2025-04-05 Thread Robert Haas
On Tue, Mar 18, 2025 at 12:36 AM Masahiko Sawada  wrote:
> > Cool. The commit message refers to 003_char_signedness, but the test
> > name is 005, not 003.
>
> Thank you for reviewing the patch. I've pushed the patch after fixing it.

Thanks for taking care of it (and so quickly!).

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Proposal: Progressive explain

2025-04-05 Thread Rafael Thofehrn Castro
> I haven't looked into the code yet, but when I ran below commands during
> make installcheck, there was an error and an assertion failure

Thanks for the report. I actually made a nasty mistake in the last
patch after code refactoring, which is to not properly check that
a QueryDesc is already being tracked. So every subquery call
in the same query is allocating DSAs and those segments are not
being properly cleared. So the patch is broken and probably explains
your crashes.

Just made a 1 line fix and make installcheck looks clean now. Will
do more tests before sending another version.


Re: in BeginCopyTo make materialized view using COPY TO instead of COPY (query).

2025-04-05 Thread jian he
On Wed, Apr 2, 2025 at 11:20 PM Fujii Masao  wrote:
> >
> >  if (!RelationIsPopulated(rel))
> >  ereport(ERROR,
> >  errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >  errmsg("cannot copy from unpopulated
> > materialized view \"%s\"",
> >  RelationGetRelationName(rel)),
> >  errhint("Use the REFRESH MATERIALIZED VIEW
> > command to populate the materialized view first."));
>
> I think it's better to use the same hint message as the one output by
> "COPY (SELECT * FROM ) TO",
> specifically: "Use the REFRESH MATERIALIZED VIEW command," for consistency.
>
ok.

>
> >> The copy.sgml documentation should clarify that COPY TO can
> >> be used with a materialized view only if it is populated.
> >>
> > "COPY TO can be used only with plain tables, not views, and does not
> > copy rows from child tables or child partitions"
> > i changed it to
> > "COPY TO can be used with plain tables and materialized views, not
> > regular views, and does not copy rows from child tables or child
> > partitions"
>
> It would be clearer to specify that "COPY TO" applies to *populated*
> materialized views rather than just "materialized views"?
>
>
> > Another alternative wording I came up with:
> > "COPY TO can only be used with plain tables and materialized views,
> > not regular views. It also does not copy rows from child tables or
> > child partitions."
>
> If we split the first description into two as you suggested,
> I'm tempted to propose the following improvements to enhance
> the overall descriptions:
>
> -
> "COPY TO" can be used with plain tables and populated materialized views. For 
> example, "COPY table TO" copies the same rows as "SELECT * FROM ONLY table." 
> However, it doesn't directly support other relation types, such as 
> partitioned tables, inheritance child tables, or views. To copy all rows from 
> these relations, use "COPY (SELECT * FROM table) TO."
> -
>

your wording makes sense to me.
I tried to ensure that the changed part in copy.sgml stays within an
80-character line width, while also keeping each markup tag pair on the
same line.
So after the change it becomes:

   
COPY TO can be used with plain
tables and populated materialized views.
For example,
COPY table
TO
copies the same rows as
SELECT * FROM ONLY table.
However it doesn't directly support other relation types,
such as partitioned tables, inheritance child tables, or views.



> The tests seem to have been placed under the category "COPY FROM ... DEFAULT",
> which feels a bit misaligned. How about adding them to the end of copy.sql 
> instead?
>
ok.
From ba91ac0d9403f304d6c72683ca2c471e2c3c3ecc Mon Sep 17 00:00:00 2001
From: jian he 
Date: Thu, 3 Apr 2025 16:45:48 +0800
Subject: [PATCH v3 1/1] COPY materialized_view TO

generally `COPY table TO` is faster than `COPY (query)`.
since populated materialized view have physical storage, so
this can use table_beginscan, table_endscan to scan a table.

context: https://postgr.es/m/8967.1353167...@sss.pgh.pa.us
context: https://www.postgresql.org/message-id/flat/20121116162558.90150%40gmx.com
discussion: https://postgr.es/m/CACJufxHVxnyRYy67hiPePNCPwVBMzhTQ6FaL9_Te5On9udG=y...@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5533/
---
 doc/src/sgml/ref/copy.sgml | 14 --
 src/backend/commands/copyto.c  | 13 -
 src/test/regress/expected/copy.out | 11 +++
 src/test/regress/sql/copy.sql  |  8 
 4 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..c855e98f757 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -520,12 +520,14 @@ COPY count
   Notes
 

-COPY TO can be used only with plain
-tables, not views, and does not copy rows from child tables
-or child partitions.  For example, COPY table TO copies
-the same rows as SELECT * FROM ONLY table.
+COPY TO can be used with plain
+tables and populated materialized views.
+For example,
+COPY table TO
+copies the same rows as
+SELECT * FROM ONLY table.
+However it doesn't directly support other relation types,
+such as partitioned tables, inheritance child tables, or views.
 The syntax COPY (SELECT * FROM table) TO ... can be used to
 dump all of the rows in an inheritance hierarchy, partitioned table,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..9233cbcecb4 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -653,11 +653,14 @@ BeginCopyTo(ParseState *pstate,
 			RelationGetRelationName(rel)),
 	 errhint("Try the COPY (SELECT ...) TO variant.")));
 		else if (rel->rd_rel->relkind == RELKIND_MATVIEW)
-			ereport(ERROR,
-	(errcode(ERRCODE

Re: making EXPLAIN extensible

2025-04-05 Thread Robert Haas
On Thu, Mar 20, 2025 at 3:04 AM Andrei Lepikhov  wrote:
> I'm sorry, I was confused; previously, the difficulties faced by
> extension developers were always attributed to him (refer to the
> discussion on the selectivity hook). However, now you're introducing a
> hook for a trivial operation that could simply be resolved at the end of
> execution within a per-node hook with tiny inconsistency in output. I'm
> pleased to see a change in the narrative.

It sounds like we're sufficiently in agreement so I've committed the
patch. I agree with you that this could be done within the per-node
hook, but that would be awkward, and this is better. I take note of
your frustration with the difficulty of getting hooks added and to
some extent I share it; on the flip side, there are plenty of
poorly-considered hook proposals, too.

I've rebased my pg_overexplain patch and attach that here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v8-0001-pg_overexplain-Additional-EXPLAIN-options-for-deb.patch
Description: Binary data


Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-04-05 Thread Amit Kapila
On Thu, Mar 20, 2025 at 9:45 AM Shubham Khanna
 wrote:
>

The patch had a bug in dry-run mode such that it was not emitting the
drop-related command for publications created by the tool with the new
option. I fixed that and pushed the patch. Thanks!

-- 
With Regards,
Amit Kapila.




Re: SQLFunctionCache and generic plans

2025-04-05 Thread Alexander Pyhalov

Tom Lane писал(а) 2025-04-02 21:09:

I wrote:

Anyway, I feel pretty good about this patch now and am quite content
to stop here for PG 18.


Since feature freeze is fast approaching, I did a tiny bit more
cosmetic work on this patchset and then pushed it.  (There's still
plenty of time for adjustments if you have further comments.)

Thanks for working on this!  This is something I've wanted to see
done ever since we invented plancache.

regards, tom lane


Hi. I've looked through the latest patch series, but didn't have much 
time to examine them thoroughly.
Haven't found any issues so far. One cosmetic thing I've noticed - I
would set func->pcontext to NULL in sql_delete_callback() to avoid a
dangling pointer if we error out early (but it seems only to be a matter
of taste).

--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Using read_stream in index vacuum

2025-04-05 Thread Melanie Plageman
On Sun, Mar 23, 2025 at 1:02 PM Andrey Borodin  wrote:
>
> There's a BF failure with just these changes [0]. But IMO it's unrelated.
> There are 2 failed tests:
> 1. 'activeslot slot invalidation is logged with vacuum on pg_authid' is very 
> similar to what is discussed here [1]
> 2. '+ERROR: tuple concurrently deleted' in injection_points/isolation seems 
> to be discussed here [2]

Yep, I saw the recovery/035_standby_logical_decoding skink failure and
concluded it was caused by a pre-existing issue that is already being
discussed. I hadn't yet investigated the injection point isolation
test failure. So, it is good to know it seems like it is a known
issue. Thanks for being on the lookout.

- Melanie




Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2025-04-05 Thread Ashutosh Bapat
On Wed, Apr 2, 2025 at 1:00 PM Amit Langote  wrote:

>
> I'm feeling good about this version, but let me know if you have any
> further thoughts / comments.

Thanks for incorporating the changes and fixing initial hash table size.

+ #define EC_DERIVES_HASH_THRESHOLD 32

Given that the constant is being used only at a single place, we don't
need a macro. But I am not against the macro.

PFA patch set with some minor edits in 0003. Also I have edited commit
message of 0001 and 0002.

In the commit message of 0002,
1. mentioning that the lookup happens only for join clause generation
is not accurate, since we look up EM = constant clauses as well, which
are not join clauses.
2. In the second paragraph em1 and em2 are mentioned without
mentioning what they are. I have rephrased it so as to avoid
mentioning the names of structure members.

--
Best Wishes,
Ashutosh Bapat
From 208f231a57431bfa126dc6c41af302fd5d761132 Mon Sep 17 00:00:00 2001
From: Amit Langote 
Date: Mon, 24 Mar 2025 21:17:46 +0900
Subject: [PATCH 1/3] Add assertion to verify derived clause has constant RHS

find_derived_clause_for_ec_member() searches for a previously-derived
clause that equates a non-constant EquivalenceMember to a constant.
It is only called for EquivalenceClasses with ec_has_const set, and
with a non-constant member the EquivalenceMember to search for.

The matched clause is expected to have the non-constant member on the
left-hand side and the constant EquivalenceMember on the right.

Assert that the RHS is indeed a constant, to catch violations of this
structure and enforce assumptions made by
generate_base_implied_equalities_const().

Author: Ashutosh Bapat 
Discussion: https://postgr.es/m/caexhw5scmxyfrqofe6odmbiw2rnvbemeeca-p4w_cyueiku...@mail.gmail.com
---
 src/backend/optimizer/path/equivclass.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 0f9ecf5ee8b..493a95d26cc 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2664,7 +2664,10 @@ find_derived_clause_for_ec_member(EquivalenceClass *ec,
 		 * members on the left side of derived clauses.
 		 */
 		if (rinfo->left_em == em)
+		{
+			Assert(rinfo->right_em->em_is_const);
 			return rinfo;
+		}
 	}
 	return NULL;
 }

base-commit: 43dca8a11624d02dde2b4bd348d77b7045c0dfbc
-- 
2.34.1

From e484154330d2a6ed90ad542f4d9ffa463f1ab9be Mon Sep 17 00:00:00 2001
From: Amit Langote 
Date: Wed, 2 Apr 2025 15:31:30 +0900
Subject: [PATCH 2/3] Make derived clause lookup in EquivalenceClass more
 efficient

Derived clauses are stored in ec_derives, a List of RestrictInfos.
These clauses are looked up later using the left and right
EquivalenceMembers and the clause's parent EC.

This linear search becomes expensive in queries with many joins or
partitions, where ec_derives may contain thousands of entries. In
particular, create_join_clause() can spend significant time scanning
this list.

To improve performance, introduce a hash table (ec_derives_hash) that
is built when the list reaches 32 entries -- the same threshold used
for join_rel_hash. The original list is retained alongside the hash
table to support EC merging and serialization
(_outEquivalenceClass()).

Each clause is stored in the hash table using a canonicalized key: the
EquivalenceMember with the lower memory address is placed in the key
before the one with the higher memory address. This avoids storing or
searching for both permutations of the same clause. For clauses
involving a constant EM, the key stores NULL followed by the
non-constant EM.

The hash table is initialized using list_length(ec_derives_list) as
the size hint. simplehash internally adjusts this to the next power of
two after dividing by the fillfactor, so this typically results in at
least 64 buckets near the threshold -- avoiding immediate resizing
while adapting to the actual number of entries.

The lookup logic for derived clauses is now centralized in
ec_search_derived_clause_for_ems(), which consults the hash table when
available and falls back to the list otherwise.

The new ec_clear_derived_clauses() always frees ec_derives_list, even
though some of the original code paths that cleared the old
ec_derives field did not. This ensures consistent cleanup and avoids
leaking memory when the list grows large.

An assertion originally placed in find_derived_clause_for_ec_member()
is moved into ec_search_derived_clause_for_ems() so that it is
enforced consistently, regardless of whether the hash table or list is
used for lookup.

This design incorporates suggestions by David Rowley, who proposed
both the key canonicalization and guidance on initial sizing to
balance memory usage and CPU efficiency.

Author: Ashutosh Bapat 
Reviewed-by: Amit Langote 
Reviewed-by: David Rowley 
Tested-by: Dmitry Dolgov <9erthali...@gmail.com>
Tested-by: Alvaro Herrera 
Tested-by: Amit Langote 
Tested-by: David Rowley 
Discussion: https:/

Re: psql \dh: List High-Level (Root) Tables and Indexes

2025-04-05 Thread Christoph Berg
Re: Sadeq Dousti
> > I think this is the wrong way round.
> > It should be \dtN instead of \dNt.
> 
> Hi Christoph,
> The order does not matter, the user can use \dNt or \dtN, as they do
> exactly the same thing. Letters coming after \d can be freely permuted. If
> you mean a change to the documentation or tests, I can apply whatever order
> makes more sense, as I don't have any particular preference.

Oh ok, that's perfect then.

HELP0("  \\dn[Sx+] [PATTERN] list schemas\n");
+   HELP0("  \\dN[Sx+] [PATTERN] list tables and indexes (no partitions)\n");
HELP0("  \\do[Sx+] [OPPTRN [TYPEPTRN [TYPEPTRN]]]\n"

Is this really a new "top-level" \d command and not a flag for the
existing \d commands? I would think that the help texts should place
the N next to S which is a similar "show different set of things"
modifier:

\d[NSx+]list tables, views, and sequences
\di[NSx+] [PATTERN] list indexes
\dt[NSx+] [PATTERN] list tables

For documentation and tests, the N should be at the end like S is at
the end in \dt[S+].

Christoph




Re: Current master hangs under the debugger after Parallel Seq Scan (Linux, MacOS)

2025-04-05 Thread Vladlen Popolitov

Andres Freund wrote on 2025-03-27 01:22:

Hi,


Isn't that to be expected? If I understand correctly, the way your gdb is
configured is that it intercepts SIGUSR1 signals *without* passing it on to
the application (i.e. postgres).  We rely on the signal to be delivered.
Which it isn't. Thus a hang.

At least my gdb doesn't intercept SIGUSR1 by default. It's a newer gdb though,
so that could have been different in the past (although I don't remember a
different behaviour).

(gdb) handle SIGUSR1
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        No      Yes             User defined signal 1

If I change the configuration to not pass it, but print it, I can reproduce a
hang:
handle SIGUSR1 print nopass

What does your gdb show for "handle SIGUSR1"? If it isn't what I reported, is
it possible that you set that in your .gdbinit or such?
it possible that you set that in your .gdbinit or such?




2. Original commit with reproducible behaviour.
I tracked this behaviour down to commit
> commit 7202d72787d3b93b692feae62ee963238580c877
> Date:   Fri Feb 21 08:03:33 2025 +0100
> backend launchers void * arguments for binary data
> Change backend launcher functions to take void * for binary data
> instead of char *.  This removes the need for numerous casts.
> Discussion: 
https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org


I also find it very hard to believe that this commit introduced this problem -
it doesn't sound like a postgres issue to me.  I can reproduce it in PG 16,
after doing "handle SIGUSR1 print nopass".

Greetings,

Andres Freund


Dear colleagues,

I changed the debugging extension in VS Code (from C_Cpp to CodeLLDB) and
the hang disappeared. Thank you for pointing out that this relates more to
the debugger than to Postgres.

Both extensions process signals internally and do not expose the signal
handlers in settings.

Thank you for your support and time.

--
Best regards,

Vladlen Popolitov.




Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-04-05 Thread Robert Haas
On Wed, Apr 2, 2025 at 5:17 AM Alvaro Herrera  wrote:
> I don't quite love this behavior, but since there have been no
> complaints, I suppose it's okay and we should just do the same for
> not-nulls.

I don't understand the issue. It seems like the pg_dump output shown
here would recreate the catalog state.

> FWIW the part that I think you're not right on, is that constraints on
> partitioned tables never have local definitions.  Even if you start with
> a constraint defined locally in the partition, the ATTACH operation will
> change its conislocal flag to false.  So you can never "drop" it from
> the partition.  For regular inheritance, we don't flip the conislocal
> flag to false, but you're still prevented from "dropping" the constraint
> from the child while the inheritance relationship exists (i.e. you can
> never set conislocal=false in such a case).

Hmm. I think this is different from attislocal/attinhcount.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Draft for basic NUMA observability

2025-04-05 Thread Bertrand Drouvot
Hi,

On Sat, Apr 05, 2025 at 03:23:38PM +0200, Tomas Vondra wrote:
> Something like that. But I think it should be "align the size of ...",
> we're not aligning the start.
> 
> >> - There's a comment at the end which talks about "ignored segments".
> >> IMHO that type of information should be in the function comment,
> >> but I'm
> >> also not quite sure I understand what "output shared memory" is ...
> > 
> > I think that comes from the comments that are already in
> > pg_get_shmem_allocations().
> > 
> > I think that those are located here and worded that way to ease to 
> > understand 
> > what is not in with pg_get_shmem_allocations_numa() if one look at both
> > functions. That said, I'm +1 to put this kind of comments in the function 
> > comment.
> > 
> 
> OK. But I'm still not sure what "output shared memory" is about. Can you
> explain what shmem segments are not included?

Looking at pg_get_shmem_allocations() and the pg_shmem_allocations view
documentation, I would say the wording is linked to "anonymous allocations" and
"unused memory" (i.e the ones reported with  or NULL as name in the
pg_shmem_allocations view). Using output in this comment sounds confusing while
it makes sense in pg_get_shmem_allocations() because it really reports those.

I think that we could just mention in the function comment that
pg_get_shmem_allocations_numa() does not handle anonymous allocations and unused
memory.
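
For reference, the entries in question are the ones visible in the existing
view as:

SELECT name, size FROM pg_shmem_allocations
WHERE name IS NULL OR name = '<anonymous>';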

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: pgsql: Add support for OAUTHBEARER SASL mechanism

2025-04-05 Thread Jacob Champion
On Mon, Mar 31, 2025 at 4:09 PM Jacob Champion
 wrote:
> I don't have hurd-amd64 to test, but I'm working on a patch that will
> build and pass tests if I manually munge pg_config.h. We were skipping
> the useless tests via a $windows_os check; I think I should use
> check_pg_config() instead.

Proposed fix attached.

Thanks,
--Jacob


0001-oauth-Fix-build-on-platforms-without-epoll-kqueue.patch
Description: Binary data


Re: Add -k/--link option to pg_combinebackup

2025-04-05 Thread Tom Lane
Robert Haas  writes:
> On Tue, Mar 18, 2025 at 10:31 AM Andrew Dunstan  wrote:
>> Just a question if everyone wants to run this. Koel takes about 10s to run 
>> the indent test.

> Well, IMHO, that's pretty cheap insurance against having to push a
> second commit to fix indentation. But I guess one problem is not
> everyone may have pgindent installed locally.

10s added to every check-world run doesn't sound "cheap" to me.
However, both pg_bsd_indent and the typedefs list are in-tree
nowadays, so I don't see any reason that this couldn't be a
totally self-contained check.

> But at least I should get it set up here. I tried setting
> PG_TEST_EXTRA=code-indent locally and running 'meson test' and I
> didn't get a failure -- and 'git grep code-indent' returned nothing.
> Any tips?

Andrew was suggesting a test we could write, not one we
already have.

regards, tom lane




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-04-05 Thread Shubham Khanna
On Tue, Mar 18, 2025 at 12:07 PM David G. Johnston
 wrote:
>
> On Monday, March 17, 2025, Shubham Khanna  wrote:
>>
>>
>> I have added validation for "all" to address both issues at once.
>>
>
> Usage needs to be changed to refer to object types and we should try and 
> specify which are valid there too.
>

Fixed.

> The sgml docs need to mention “all” as a valid value and what it means 
> (namely, if new object types are added and “all” is specified those new types 
> will be removed as well).  Suggest not using it in scripts.
>

As suggested by Amit in [1], I have removed the provision for "all" as
a valid option for --remove. This keeps the implementation focused on
the current supported object type (publications).

> There are still more places where “object type” is needed instead of just 
> “object”.  I’ll pin-point or fix tomorrow if you don’t beat me to them.
>

I identified and fixed a few instances where "object type" was needed
instead of just "object." Please let me know if I missed any other
occurrences.

> It would be good if we could get this to play nicely with —dry-run; maybe 
> connecting to the source for the queries instead of the target.  That would 
> help alleviate my issue with the current auto-drop behavior.
>

In the v18 patch attached at [2], I have removed the second test that
verified publications are not removed when --remove is not specified.
Now, the test suite only verifies that pg_createsubscriber with
--remove correctly removes publications from the subscriber node. This
reduces redundancy while still validating the core functionality of
the --remove option.
IIUC, for testing with --dry-run, we can directly check the relevant
stdout logs (e.g., "dropping publication 'test_pub1' ...") to verify
the call without actually dropping the publications.
However, IMO, using --dry-run alone would miss code coverage for the
actual drop publication execution part.
Please let me know if I misunderstood your point. I will update this
accordingly once I have more clarity.

The attached patch contains the suggested changes,

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JHD0fmyMkTe_y84gJ--WWPLVFo6kJMNxvFdcDs7nXjbQ%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAHv8RjK8q%2BmWPWPh9K7CeH2tKF31vGn%2BoPV3qY7pdPCmtbjzkg%40mail.gmail.com

Thanks and regards,
Shubham Khanna.


v19-0001-Support-for-dropping-all-publications-in-pg_crea.patch
Description: Binary data


Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-05 Thread Andres Freund
Hi,

On 2025-03-20 17:08:54 -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Thu, Mar 20, 2025 at 01:33:26PM -0700, Jacob Champion wrote:
> >> So one question for the collective is -- putting Curl itself aside --
> >> is having a basic-but-usable OAuth flow, out of the box, worth the
> >> costs of a generic HTTP client?
> 
> > One observation is that security scanning tools are going to see the
> > curl dependency and look at any CVEs related to them and ask us, whether
> > they are using OAUTH or not.
> 
> Yes.  Also, none of this has addressed my complaint about the extent
> of the build and install dependencies.  Yes, simply not selecting
> --with-libcurl removes the problem ... but most packagers are under
> very heavy pressure to enable all features of a package.

How about we provide the current libpq.so without linking to curl and also a
libpq-oauth.so that has curl support? If we do it right libpq-oauth.so would
itself link to libpq.so, making libpq-oauth.so a fairly small library.

That way packagers can split libpq-oauth.so into a separate package, while
still just building once.

That'd be a bit of work on the buildsystem side, but it seems doable.


> From what's been said here, only a small minority of users are likely
> to have any interest in this feature.  So my answer to "is it worth
> the cost" is no, and would be no even if I had a lower estimate of
> the costs.

I think this is likely going to be rather widely used, way more widely than
e.g. kerberos or ldap support in libpq. My understanding is that there's a
fair bit of pressure in lots of companies to centralize authentication towards
centralized systems, even for server applications.


> I don't have any problem with making a solution available to those
> users who want it --- but I really do NOT want this to be part of
> stock libpq nor done as part of the core Postgres build.  I do not
> think that the costs of that have been fully accounted for, especially
> not the fact that almost all of those costs fall on people other than
> us.

I am on board with not having it as part of stock libpq, but I don't see what
we gain by not building it as part of postgres (if the dependencies are
available, of course).

Greetings,

Andres Freund




PRI?64 vs Visual Studio (2022)

2025-04-05 Thread Kyotaro Horiguchi
Hello,

If you're already aware of this and have taken it into account, please
feel free to ignore this.

As described in the recent commit a0ed19e0a9e, many %ll? format
specifiers are being replaced with PRI?64-based ones (e.g. "%" PRId64).

I hadn’t paid much attention to this before, but I happened to check
how this behaves on Windows, and it seems that with VS2022, PRId64
expands to "%lld". As a result, I suspect the gettext message catalog
won't match these messages correctly.

I haven't been able to build with -Dnls=enabled myself, but I did
check the strings embedded in a binary compiled with VS2022, and they
indeed use %lld.

Just wanted to share this in case it’s helpful.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


rename pg_log_standby_snapshot

2025-04-05 Thread Sami Imseih
Hi,

While looking at [1] which introduces a new function called pg_log_query_plan to
write an explain plan to the log file, I noticed that we currently
have overloaded
the meaning of the "pg_log_" prefix.

Currently there is pg_log_backend_memory_contexts, which logs memory
contexts to the log file, but there is also pg_log_standby_snapshot, which
takes a snapshot of running transactions and writes it to WAL, so it has
nothing to do with writing to the log file.

                                      List of functions
   Schema   |              Name              | Result data type | Argument data types | Type
------------+--------------------------------+------------------+---------------------+------
 pg_catalog | pg_log_backend_memory_contexts | boolean          | integer             | func
 pg_catalog | pg_log_standby_snapshot        | pg_lsn           |                     | func
(3 rows)

Should the pg_log_ prefix strictly refer to functions that write to
logs? If so, should we rename
pg_log_standby_snapshot to something else, such as
pg_standby_snapshot_xip_to_wal?
This name is long, but it is still shorter than other system function
names and clearly describes
what the function does.

Additionally, this name is similar to pg_snapshot_xip, which returns
in-progress transactions.

Also, pg_snapshot_ is a good prefix for future functions that operate
on standbys only.

Another minor thing: the documentation for pg_log_standby_snapshot
function currently says,
"Take a snapshot of running transactions and write it to WAL."
However, we commonly refer to these transactions as "in-progress transactions."
So, maybe the documentation should say, "Take a snapshot of
in-progress transactions and write it to WAL."


[1] 
https://www.postgresql.org/message-id/flat/0e0e7ca08dff077a625c27a5e0c2ef0a%40oss.nttdata.com#6539312988e4695d2d416688a3ab34fa

--
Sami Imseih
Amazon Web Services (AWS)




Re: [PATCH] Automatic client certificate selection support for libpq v1

2025-04-05 Thread Robin Haberkorn
Hello everybody!

I was investigating the item "Allow automatic selection of SSL client
certificates from a certificate store" from the Todo list [1]. If I
understand Seth Robertson in his initial mail correctly, he wanted libpq to
select client certificates automatically based on the host while storing
several client certificates and keys in single crt and key files.

Client certs are currently loaded with SSL_CTX_use_certificate_chain_file()
in src/interfaces/libpq/fe-secure-openssl.c.
While it is theoretically possible to implement host-specific logic using
SSL_CTX_set_client_cert_cb(), I don't think you could store all
the possible certificates in a single file as you might actually already
need several certificates for a single host in the form of a certificate
chain. As was pointed out in the thread back then, you would have to
implement something like a certificate wallet/store from scratch.
At the most, Seth's initial patch could be improved by looking
up per-host client certificate/key files based on the host-reported
server name (SSL_get_servername()), which should be more reliable when
working with host aliases.

But then on the other hand, there are sslcert/sslkey connection parameters
since 8.4, which Seth was apparently aware of. As far as I understand,
he just wanted this patch for 8.3 as well and he didn't want to update
all of his existing programs. Considering that his initial mail was
written 16 years ago, I don't think this is a valid argument anymore.
It should be possible to adapt programs easily, e.g. by accepting
"postgresql://" URIs instead of domains and manually choosing appropriate
certificate/key filenames.
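For example, per-host selection can already be expressed through the existing
connection parameters; a sketch, with host and file paths as placeholders:

postgresql://db.example.com/mydb?sslmode=verify-full&sslcert=/etc/ssl/db.example.com.crt&sslkey=/etc/ssl/db.example.com.key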

In my opinion there is little that can and should be done in Postgres
at this point. Or does anybody think that a server-name-based certificate
file selection feature should still be implemented?
If yes, I would be happy to take care of it.
If not, I would suggest removing this item from the agenda/wiki.

Best regards,
Robin Haberkorn

[1]: http://archives.postgresql.org/pgsql-hackers/2009-05/msg00406.php

-- 
Robin Haberkorn
Senior Software Engineer
Tel.: +49 157 85228715
E-Mail: haberk...@b1-systems.de

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537




Re: Using read stream in autoprewarm

2025-04-05 Thread Melanie Plageman
On Thu, Apr 3, 2025 at 4:22 PM Daniel Gustafsson  wrote:
>
> >> I had a quick look at this. Looks good overall
>
> Same here, this seemed like a good piece to bite into with my limited AIO
> knowledge to learn more, and reading it over it seems like a good change.

Thanks for taking a look!

> A few small comments:
>
> +* `pos` is the read stream callback's index into block_info. Because the
> I'm not a fan of markdown in code comments (also in a few more places).

Removed them. I got the idea of doing this to distinguish variable
names in comments from English words. But I can see how it is kind of
distracting -- since it is not common in the codebase.

> +   /* Time to try and open our new found relation */
> s/new found/newfound/

Fixed

> +   while (p->pos < apw_state->prewarm_stop_idx)
> +   {
> +   BlockInfoRecord blk = p->block_info[p->pos];
> +
> +   CHECK_FOR_INTERRUPTS();
> Isn't checking inside this loop increasing the frequency of checks compared to
> the current version?

It's unclear. The current version does seem to execute the main while
loop (including the CFI) once per block -- even for blocks that it
doesn't end up reading for whatever reason. Things get murkier with
the read stream code. But I put it in the callback to keep the general
idea of doing a CFI once per block. In attached v14, I moved the CFI
to the top of the callback, outside of the loop, to make that
intention more clear.

> +   Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
> Is there a non programmer-error case where this can happen?  The Assert right
> after a loop around the same function seems to imply there is a race or toctou
> case which if so could use a comment.

Yep. Good call. At some point one read stream user had this assert
because its invocation of read_stream_buffer() was interleaved with
other stuff, so it wasn't obvious that the stream would be exhausted
when it was time to end it. And the assert helped defend that
invariant against future innovation :) I think I've copy-pasta'd this
assert around for no good reason to other read stream users. I've
removed it in v14 and I should probably do a follow-on commit to
master to remove it from the other places it obviously doesn't belong
and is a confusing distraction for future readers.

- Melanie
From 11f706609998aebbeac06e282a6e1d25d3f38fdd Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 3 Apr 2025 12:47:19 -0400
Subject: [PATCH v14 1/4] Fix autoprewarm neglect of tablespaces

While prewarming blocks from a dump file, autoprewarm_database_main()
mistakenly ignored tablespace when detecting the beginning of the next
relation to prewarm. Because RelFileNumbers are only unique within a
tablespace, autoprewarm could miss prewarming blocks from a
relation with the same RelFileNumber in a different tablespace.

Though this situation is likely rare in practice, it's best to make the
code correct. Do so by explicitly checking for the RelFileNumber when
detecting a new relation.

Reported-by: Heikki Linnakangas 
Discussion: https://postgr.es/m/97c36982-603b-494a-95f4-aaf2a12ac27e%40iki.fi
---
 contrib/pg_prewarm/autoprewarm.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 73485a2323c..760b1548eff 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -472,10 +472,15 @@ autoprewarm_database_main(Datum main_arg)
 
 		/*
 		 * As soon as we encounter a block of a new relation, close the old
-		 * relation. Note that rel will be NULL if try_relation_open failed
-		 * previously; in that case, there is nothing to close.
+		 * relation. RelFileNumbers are only guaranteed to be unique within a
+		 * tablespace, so check that too.
+		 *
+		 * Note that rel will be NULL if try_relation_open failed previously;
+		 * in that case, there is nothing to close.
 		 */
-		if (old_blk != NULL && old_blk->filenumber != blk->filenumber &&
+		if (old_blk != NULL &&
+			(old_blk->tablespace != blk->tablespace ||
+			 old_blk->filenumber != blk->filenumber) &&
 			rel != NULL)
 		{
 			relation_close(rel, AccessShareLock);
@@ -487,7 +492,9 @@ autoprewarm_database_main(Datum main_arg)
 		 * Try to open each new relation, but only once, when we first
 		 * encounter it. If it's been dropped, skip the associated blocks.
 		 */
-		if (old_blk == NULL || old_blk->filenumber != blk->filenumber)
+		if (old_blk == NULL ||
+			old_blk->tablespace != blk->tablespace ||
+			old_blk->filenumber != blk->filenumber)
 		{
 			Oid			reloid;
 
@@ -508,6 +515,7 @@ autoprewarm_database_main(Datum main_arg)
 
 		/* Once per fork, check for fork existence and size. */
 		if (old_blk == NULL ||
+			old_blk->tablespace != blk->tablespace ||
 			old_blk->filenumber != blk->filenumber ||
 			old_blk->forknum != blk->forknum)
 		{
-- 
2.34.1

From 915fe5b0db6ea8599bc21f95980a352ae404463d Mon Sep 17 00:0

Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-04-05 Thread Shubham Khanna
On Thu, Mar 20, 2025 at 4:53 PM Ashutosh Bapat
 wrote:
>
> On Thu, Mar 20, 2025 at 10:25 AM Shubham Khanna
>  wrote:
>
> >
> > The attached patches contain the suggested changes.
> >
>
> I have started reviewing the patches again. Here are some review comments
>
> 
> +
> + -a
> + --all
> + 
> +  
> +   For all source server non-template databases create subscriptions for
> +   databases with the same names on the target server.
>
> In this construct "all" goes with "source server" but it should go
> with "non-template database". How about, "For every non-template
> database on the source server, create one subscription on the target
> server in the database with the same name."?
>

Fixed.

> +   Subscription names, publication names, and replication slot names are
> +   automatically generated. This option cannot be used together with
> +   --database, --publication,
> +   --replication-slot or 
> --subscription.
>
> ... used along with ...
>
> also comma before or .
>
> We should also mention that generated names of replication slots,
> publications and subscriptions are used when using this option.
>
> +  
> + 
> +
> +
>  
>   -d  class="parameter">dbname
>   --database= class="parameter">dbname

Fixed.

> @@ -94,9 +130,10 @@ PostgreSQL documentation
>
> The name of the database in which to create a subscription.  Multiple
> databases can be selected by writing multiple -d
> -   switches. If -d option is not provided, the database
> -   name will be obtained from -P option. If the database
> -   name is not specified in either the -d option or
> +   switches. This option cannot be used together with
> --all.
> +   If -d option is not provided, the database name will 
> be
> +   obtained from -P option. If the database name is not
> +   specified in either the -d option or
> -P option, an error will be reported.
>
> We have to mention -a as well here. Something like "If the database
> name is not specified in either the -d option or
> -P option, and -a is not provided,
> an error will be reported."
>

Fixed.

> +# run pg_createsubscriber with '--database' and '--all' without '--dry-run'
> +# and verify the failure
> +command_fails_like(
> + [
> + 'pg_createsubscriber',
> + '--verbose',
> + '--pgdata' => $node_s->data_dir,
> + '--publisher-server' => $node_p->connstr($db1),
> + '--socketdir' => $node_s->host,
> + '--subscriber-port' => $node_s->port,
> + '--database' => $db1,
> + '--all',
> + ],
> + qr/--database cannot be used with --all/,
> + 'fail if --database is used with --all');
>
>
> Why does only this test not use --dry-run, but all other such tests
> use --dry-run? Either they should all use it or not use it.
>
> +
> +# run pg_createsubscriber with '--publication' and '--all' and verify
> +# the failure
> +command_fails_like(
> + [
> + 'pg_createsubscriber',
> + '--verbose',
> + '--dry-run',
> + '--pgdata' => $node_s->data_dir,
> + '--publisher-server' => $node_p->connstr($db1),
> + '--socketdir' => $node_s->host,
> + '--subscriber-port' => $node_s->port,
> + '--all',
> + '--publication' => 'pub1',
> + ],
> + qr/--publication cannot be used with --all/,
> + 'fail if --publication is used with --all');
> +
> +# run pg_createsubscriber with '--replication-slot' and '--all' and
> +# verify the failure
> +command_fails_like(
> + [
> + 'pg_createsubscriber',
> + '--verbose',
> + '--dry-run',
> + '--pgdata' => $node_s->data_dir,
> + '--publisher-server' => $node_p->connstr($db1),
> + '--socketdir' => $node_s->host,
> + '--subscriber-port' => $node_s->port,
> + '--replication-slot' => 'replslot1',
> + '--all',
> + ],
> + qr/--replication-slot cannot be used with --all/,
> + 'fail if --replication-slot is used with --all');
> +
> +# run pg_createsubscriber with '--subscription' and '--all' and
> +# verify the failure
> +command_fails_like(
> + [
> + 'pg_createsubscriber',
> + '--verbose',
> + '--dry-run',
> + '--pgdata' => $node_s->data_dir,
> + '--publisher-server' => $node_p->connstr($db1),
> + '--socketdir' => $node_s->host,
> + '--subscriber-port' => $node_s->port,
> + '--all',
> + '--subscription' => 'sub1',
> + ],
> + qr/--subscription cannot be used with --all/,
> + 'fail if --subscription is used with --all');
> +
>
> Just to cover all combinations we need to test both scenarios. where
> --all comes before an incompatible option and vice versa.
>
>  # Run pg_createsubscriber on node S.  --verbose is used twice
>  # to show more information.
>  # In passing, also test the --enable-two-phase option
> @@ -459,10 +526,93 @@ my $sysid_s = $node_s->safe_psql('postgres',
>   'SELECT system_identifier FROM pg_control_system()');
>  ok($sysid_p != $sysid_s, 'system identifier was changed');
>
> +# On node P create test tables
> +$node_p->safe_psql('postgres', 'CREATE TABLE tbl1 (a text)');
> +$node_p->safe_psql($db1, 'CREATE TABLE tbl2 (a text)');
> +$node_p->safe_psql($db1, 

Re: AIO v2.5

2025-04-05 Thread Andres Freund
Hi,

On 2025-03-19 13:20:17 -0400, Melanie Plageman wrote:
> On Tue, Mar 18, 2025 at 4:12 PM Andres Freund  wrote:
> >
> > Attached is v2.10,
> 
> I noticed a few comments could be improved in  0011: bufmgr: Use AIO
> in StartReadBuffers()
> [...]

Yep.


> Above and in AsyncReadBuffers()
> 
>  * To support retries after short reads, the first operation->nblocks_done is
>  * buffers are skipped.
> 
> can't quite understand this

Heh, yea, it's easy to misunderstand. "short read" in the sense of a partial
read, i.e. a preadv() that only read some of the blocks, not all. I'm
replacing the "short" with partial.

(also removed the superfluous "is")



>  * A secondary benefit is that this would allows us to measure the time in
>  * pgaio_io_acquire() without causing undue timer overhead in the common,
>  * non-blocking, case.  However, currently the pgstats infrastructure
>  * doesn't really allow that, as it a) asserts that an operation can't
>  * have time without operations b) doesn't have an API to report
>  * "accumulated" time.
>  */
> 
> allows->allow
> 
> What would the time spent in pgaio_io_acquire() be reported as?

I'd report it as additional time for the IO we're trying to start, as that
wait would otherwise not happen.


> And what is "accumulated" time here? It seems like you just add the time to
> the running total and that is already accumulated.

Afaict there currently is no way to report a time delta to
pgstat. pgstat_count_io_op_time() computes the time since
pgstat_prepare_io_time(). Due to the assertions that time cannot be reported
for an operation with a zero count, we can't just do two
  pgstat_prepare_io_time(); ...; pgstat_count_io_op_time();
twice, with the first one passing cnt=0.

Greetings,

Andres Freund




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2025-04-05 Thread vignesh C
On Tue, 18 Mar 2025 at 09:26, jian he  wrote:
>
> changed based on this.
>
> also minor documentation tweaks.

Few comments:
1) I felt this is wrong:
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 9a4d993e2bc..7980513a9bd 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3280,7 +3280,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
  "HEADER", "QUOTE", "ESCAPE",
"FORCE_QUOTE",
  "FORCE_NOT_NULL",
"FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL",
"LOG_VERBOSITY");

as the following fails:
postgres=# copy t_on_error_null from stdin WITH ( set_to_null );
ERROR:  option "set_to_null" not recognized
LINE 1: copy t_on_error_null from stdin WITH ( set_to_null );

2) Can you limit this to 80 chars if possible to improve the readability:
+  stop means fail the command,
+  ignore means discard the input row and
continue with the next one, and
+  set_to_null means replace columns containing
invalid input values with
+  NULL and move to the next field.

3) similarly here too:
+  For ignore option,
+  a NOTICE message containing the ignored row count is
+  emitted at the end of the COPY FROM if at
least one row was discarded.
+  For set_to_null option,
+  a NOTICE message indicating the number of
rows where invalid input values were replaced with
+  null is emitted at the end of the COPY FROM
if at least one row was replaced.

4) Could you mention a brief one line in the commit message as to why
"on_error null" cannot be used:
Extent "on_error action", introduce new option:  on_error set_to_null.
Current grammar makes us unable to use "on_error null", so we choose
"on_error set_to_null".

Regards,
Vignesh




Re: Question -- why is there no errhint_internal function?

2025-04-05 Thread Peter Smith
On Thu, Apr 3, 2025 at 10:26 AM Andres Freund  wrote:
>
> Hi,
>
> On 2025-04-03 09:58:30 +1100, Peter Smith wrote:
> > I saw that a new errhint_internal() function was recently committed
> > [1]. I had also posted above asking about this same missing function a
> > month ago [2].
> >
> > But, your patch only added the new function -- it does not make any
> > use of it for existing code that was using the errhint("%s", str)
> > method.
> >
> > I wondered, given your commit message "but that's not exactly pretty
> > and makes it harder to avoid memory leaks", if you think it is
> > worthwhile to revisit those existing "%s" usages and modify them to
> > use the new errhint_internal? Tom above [3] seemed not keen to modify
> > those without performance reasons, although at that time
> > errhint_internal didn't even exist.
>
> I'd not go around and just categorically convert all users of errhint("%s",
> str), that's probably not worth the noise. And plenty of them won't
> benefit. E.g. the first one I just looked at is dblink_res_error(), where I
> don't think using it would bring meaningful benefit.  I suspect a bunch of
> other places are going to be similar.
>

Yes, I found 19 examples, but they are all similar to that.

I think they could all be changed

FROM
errhint("%s", str)

TO one of
errhint_internal("%s", str)
errhint_internal(str)

...but due to noise/benefit trade-off, I won't bother.

Thanks for your feedback.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Proposal - Allow extensions to set a Plan Identifier

2025-04-05 Thread Michael Paquier
On Thu, Mar 20, 2025 at 09:46:55PM -0400, Sami Imseih wrote:
> * For the same reasons as the query identifiers (see above),
> 
> but, I went ahead and commented it similar to how we document
> pgstat_report_query_id and pgstat_get_my_query_id routines.
> attached is v2-0001

Looks mostly OK from here..  Except for two things.

@@ -2016,6 +2017,17 @@ exec_bind_message(StringInfo input_message)
  */
 cplan = GetCachedPlan(psrc, params, NULL, NULL);
 
+foreach(lc, cplan->stmt_list)
+{
+PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+
+if (plan->planId != UINT64CONST(0))
+{
+pgstat_report_plan_id(plan->planId, false);
+break;
+}
+}

In exec_bind_message(), the comment at the top of PortalDefineQuery()
says not to put any code that could issue an error between this call
and the GetCachedPlan() call.  pgstat_report_plan_id() is OK, but I'd
rather play it safe and set the ID once the portal is defined, based
on portal->stmts instead.  That's the same as your patch, but slightly
safer in the long term, especially if pgstat_report_plan_id() is ever
twisted in such a way that it introduces a path where it ERRORs.
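
For illustration, a minimal sketch of that alternative placement (it mirrors
the foreach() in the quoted patch, just driven off portal->stmts once
PortalDefineQuery() has run; names come from the patch and ListCell *lc is
assumed declared as there, so treat this as a sketch rather than the final
hunk):

    foreach(lc, portal->stmts)
    {
        PlannedStmt *plan = lfirst_node(PlannedStmt, lc);

        /* publish the first non-zero plan ID that a plugin has set */
        if (plan->planId != UINT64CONST(0))
        {
            pgstat_report_plan_id(plan->planId, false);
            break;
        }
    }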

planner() is the sole place in the core code where the planner hook
can be called.  Shouldn't we have at least a call to
pgstat_report_plan_id() after planning a query?  At least that should
be the behavior I'd expect, where a module pushes a planId to a
PlannedStmt, then core publishes it to the backend entry in non-force
mode.

bind and execute message paths are OK as they stand, where we set a
plan ID once their portal is defined from its planned statements.

With some adjustments to some comments and the surroundings of the
code, I get the attached.  What do you think?
--
Michael
From a7233c1068ecc29026914fcfd4287dbc62ec65d8 Mon Sep 17 00:00:00 2001
From: "Sami Imseih (AWS)"
 
Date: Wed, 19 Feb 2025 15:56:21 +
Subject: [PATCH v3] Allow plugins to set a 64-bit plan identifier

There are extensions that wish to compute a plan identifier
to distinguish different plans that a particular query may use.
Currently, there is no way for such extensions to make this
plan identifier available to the core for reporting and other
potentially interesting use cases such as plan management. This
patch allows extensions to do this by providing a mechanism to
track this identifier in the backend_status as well as in
PlannedStmt. The reset of the plan identifier is controlled by
core and follows the same code path as the query identifier added
in 4f0b0966c8.

Discussion: https://www.postgresql.org/message-id/flat/CAA5RZ0vyWd4r35uUBUmhngv8XqeiJUkJDDKkLf5LCoWxv-t_pw%40mail.gmail.com
---
 src/include/nodes/plannodes.h   |  3 ++
 src/include/utils/backend_status.h  |  5 ++
 src/backend/executor/execParallel.c |  1 +
 src/backend/optimizer/plan/planner.c|  4 ++
 src/backend/tcop/postgres.c | 24 +
 src/backend/utils/activity/backend_status.c | 58 +
 6 files changed, 95 insertions(+)

diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f78bffd90cf8..658d76225e47 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -55,6 +55,9 @@ typedef struct PlannedStmt
 	/* query identifier (copied from Query) */
 	uint64		queryId;
 
+	/* plan identifier (can be set by plugins) */
+	uint64		planId;
+
 	/* is it insert|update|delete|merge RETURNING? */
 	bool		hasReturning;
 
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 1c9b4fe14d06..430ccd7d78e4 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -171,6 +171,9 @@ typedef struct PgBackendStatus
 
 	/* query identifier, optionally computed using post_parse_analyze_hook */
 	uint64		st_query_id;
+
+	/* plan identifier, optionally computed using planner_hook */
+	uint64		st_plan_id;
 } PgBackendStatus;
 
 
@@ -319,6 +322,7 @@ extern void pgstat_clear_backend_activity_snapshot(void);
 /* Activity reporting functions */
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
 extern void pgstat_report_query_id(uint64 query_id, bool force);
+extern void pgstat_report_plan_id(uint64 plan_id, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
@@ -326,6 +330,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 	   int buflen);
 extern uint64 pgstat_get_my_query_id(void);
+extern uint64 pgstat_get_my_plan_id(void);
 extern BackendType pgstat_get_backend_type_by_proc_number(ProcNumber procNumber);
 
 
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index e9337a97d174..39c990ae638d 100644
--- a/s

RE: Some codes refer slot()->{'slot_name'} but it is not defined

2025-04-05 Thread Hayato Kuroda (Fujitsu)
Dear Fujii-san,

> I think this fix should be backpatched to all supported versions.
> Since the issue you found and the one I found seem different,
> I'd prefer committing them separately.

I have no objections.

> If that works for you,
> here are the commit log messages I'm considering.

LGTM, thanks.

Best regards,
Hayato Kuroda
FUJITSU LIMITED 



Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-04-05 Thread Peter Geoghegan
On Tue, Apr 1, 2025 at 3:08 PM Alena Rybakina  wrote:
> I think it would be useful to show information that we used an index scan but 
> at the same time we skipped the "region" column and I assume we should output 
> how many distinct values the "region" column had.
>
> For example it will look like this "Skip Scan on region (4 distinct values)":
>
> What do you think?

I don't see much value in that. We can sometimes have data skew that
makes the number of distinct values far from representative of how
many index searches were required. We can have 3 distinct prefix
column values within 90% of all leaf pages, while the remaining 10%
all have unique values. Skip scan will work quite well here (at least
compared to a traditional full index scan), but the number of distinct
values makes it look really bad.

> I see that this work is really voluminous and yes, I agree with you that 
> optimization for skipping index scanning
> in terms of displaying information about changing quals, if any, can even be 
> done using Oracle as an example.
> For example, if you introduce a new range ANY() due 
> to skipping the first column
> in the index key, it will be useful for users to know. Or if you define a new 
> range that the skip array will consider
> during the index scan. Well, and other things related to this, but not 
> specifically with your patch, for example
> in the case of conflicting conditions or defining boundaries in the case of 
> index scanning, like, when a>1 and a>10
> we need to scan only a>10.
>
> This optimization was also introduced by you earlier even before the patch on 
> skip optimization, but I think it also lacks
> some output in the explain.

I would also like this kind of stuff to appear in EXPLAIN. It's partly
hard because so much stuff happens outside of planning.

> * The availability of opclass skipsupport, which makes skip arrays
> generate their element values by addition/subtraction from the current
> array element, rather than using NEXT/PRIOR sentinel keys.
>
> The sentinel keys act as probes that get the next real (non-sentinel)
> value that we need to look up next. Whereas skip support can often
> successfully guess that (for example) the next value in the index
> after 268 is 269, saving a primitive scan each time (this might not
> happen at all, or it might work only some of the time, or it might
> work all of the time).
>
> To be honest, I missed your point here. If possible, could you explain it in 
> more detail?

So, many types don't (and probably can't) offer skip support. Skip
scan still works there. The most common example of this is "text".

We're still using skip arrays when skipping using (say) a text column.
Conceptually, it's still "WHERE a = ANY()",
even with these continuous data types. It is useful to "pretend that
we're using a discrete data type", so that everything can work in the
usual way (remember, I hate special cases). We need to invent another
way to "increment" a text datum, that works in the same way (but
doesn't really require understanding the semantics of text, or
whatever the data type may be).

See my explanation about this here:

https://postgr.es/m/cah2-wznkyhq_w7heu87z80ehyzepqewbguazebcxzhvoxwc...@mail.gmail.com

See the part of my email that begins with "I think that you're
probably still a bit confused because the terminology in this area is
a little confusing. There are two ways of explaining the situation
with types like text and numeric (types that lack skip support)"

> I agree with you, this is an improvement. "Index Searches: N" shows the 
> number of individual index searches, but
> it is still not clear enough. Here you can additionally determine what 
> happened based on the information about
> the number of scanned pages, but with large amounts of data this is difficult.

The benefit of using skip scan comes from all of the pages that we
*aren't* reading, that we'd usually have to read. Hard to show that.

> By the way, are you planning to commit additional output to the explain about 
> skipped pages? I think, together with
> the previous ones, the information about index scanning would be sufficient 
> for analysis.

Not for  Postgres 18.

> Although I am still learning to understand correctly this information in the 
> explain.
> By the way, I have long wanted to ask, maybe you can advise something else to 
> read on this topic?
> Maybe not in this thread, so as not to overload this.

Let's talk about it off list.

> I think it is worth adding "skip scan" information, without it it is 
> difficult in my opinion to evaluate
> whether this index is effective in comparison with another, looking only at 
> the information on
> scanned blocks or Index search or I missed something?

I think that it's particularly worth adding something to EXPLAIN
ANALYZE that makes it obvious that the index in question might not be
what the user thinks that it is. It might be an index that does some
skipping, but is far from optimal. They migh

Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions

2025-04-05 Thread Aidar Imamov

Joseph Koshakow 2025-03-21 01:25:

Hi I am working with Aidar to give a review and I am also a beginner
reviewer.


From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 20 Dec 2024 14:06:47 +0300
Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing

  */
 bool
-EvictUnpinnedBuffer(Buffer buf)
+EvictUnpinnedBuffer(Buffer buf, bool *flushed)


I think you should update the comment above this function to include
details on the new `flushed` argument.


diff --git a/doc/src/sgml/pgbuffercache.sgml b/doc/src/sgml/pgbuffercache.sgml
index 802a5112d77..83950ca5cce 100644
--- a/doc/src/sgml/pgbuffercache.sgml
+++ b/doc/src/sgml/pgbuffercache.sgml
@@ -31,8 +31,10 @@
   This module provides the pg_buffercache_pages()
   function (wrapped in the pg_buffercache view),
   the pg_buffercache_summary() function, the
-  pg_buffercache_usage_counts() function and
-  the pg_buffercache_evict() function.
+  pg_buffercache_usage_counts() function, the
+  pg_buffercache_evict(), the
+  pg_buffercache_evict_relation(), and the
+  pg_buffercache_evict_all() function.


All the other functions have indexterm tags above this, should we add
indexterms for the new functions?


+ 
+  The pg_buffercache_evict_relation Function
+  
+   The pg_buffercache_evict_relation() function is very similar
+   to pg_buffercache_evict() function.  The difference is that
+   pg_buffercache_evict_relation() takes a relation
+   identifier instead of buffer identifier.  Then, it tries to evict all
+   buffers in that relation.  The function is intended for developer testing only.
+  
+ 
+
+ 
+  The pg_buffercache_evict_all Function
+  
+   The pg_buffercache_evict_all() function is very similar
+   to pg_buffercache_evict() function.  The difference is,
+   the pg_buffercache_evict_all() does not take argument;
+   instead it tries to evict all buffers in the buffer pool.  The function is
+   intended for developer testing only.
+  
+ 


The other difference is that these new functions have an additional
output argument to indicate if the buffer was flushed, which we probably
want to mention here.

Also I think it's more grammatically correct to say "the function" or
just "". A couple of times you say " function" or "the ".

Also I think the third to last sentence should end with "... does not
take **an** argument" or "... does not take argument**s**".


+CREATE FUNCTION pg_buffercache_evict_relation(
+IN regclass,
+IN fork text default 'main',
+OUT buffers_evicted int4,
+OUT buffers_flushed int4)
+AS 'MODULE_PATHNAME', 'pg_buffercache_evict_relation'
+LANGUAGE C PARALLEL SAFE VOLATILE;
+
+CREATE FUNCTION pg_buffercache_evict_all(
+OUT buffers_evicted int4,
+OUT buffers_flushed int4)
+AS 'MODULE_PATHNAME', 'pg_buffercache_evict_all'
+LANGUAGE C PARALLEL SAFE VOLATILE;


Does it make sense to also update pg_buffercache_evict to also return
a bool if the buffer was flushed? Or is that too much of a breaking
change?


+ /* Lock the header and check if it's valid. */
+ buf_state = LockBufHdr(desc);
+ if ((buf_state & BM_VALID) == 0)
+ {
+ UnlockBufHdr(desc, buf_state);
+ return false;
+ }


EvictUnpinnedBuffer first checks if the buffer is locked before
attempting to lock it. Do we need a similar check here?

   /* Lock the header if it is not already locked. */
   if (!buf_state)
buf_state = LockBufHdr(desc);


I don't think *flushed is necessarily accurate this way, as it might detect
that it doesn't need to flush the data (via StartBufferIO() returning false).


You are right. It seems there is no way to get that information
without editing the FlushBuffer(), right? And I think editing
FlushBuffer() is not worth it. So, I can either remove it or explain
in the docs that #buffers_flushed may not be completely accurate.


There's already a lot of caveats on EvictUnpinnedBuffer that it might
have unexpected results when run concurrently, so I think it's fine to
add one more.

Thanks,
Joe Koshakow


I agree with most of what Joseph said. However, I would like to add some
comments.

At the moment, the "flushed" flag essentially indicates whether the buffer
was dirty at the time of eviction; it does not guarantee that the buffer has
actually been written to disk. Therefore, it would be better to rename the
buffers_flushed column in the output of the pg_buffercache_evict_all() and
pg_buffercache_evict_relation() functions to dirty_buffers, maybe? This would
allow us to avoid the confusion that arises from the fact that not all dirty
buffers could have actually been written to disk. In addition, this would
remove the "flushed" parameter from the EvictUnpinnedBuffer() function,
because if we explicitly call LockBufHdr() outside of EvictUnpinnedBuffer(),
we can already know in advance whether the buffer is dirty or not.
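
A minimal sketch of what that could look like at the call site (LockBufHdr(),
UnlockBufHdr(), BM_DIRTY and EvictUnpinnedBuffer() are the existing bufmgr
symbols; the surrounding declarations, loop and the dirty_buffers counter are
assumptions for illustration only):

    buf_state = LockBufHdr(desc);
    was_dirty = (buf_state & BM_DIRTY) != 0;
    UnlockBufHdr(desc, buf_state);

    if (EvictUnpinnedBuffer(buf))
    {
        buffers_evicted++;
        if (was_dirty)
            dirty_buffers++;    /* the proposed column, instead of buffers_flushed */
    }

As with the existing function, this is only best-effort under concurrency:
the buffer's state can change between the check and the eviction.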

The same applies to the suggestion to retrieve "flushed" count from the
pg_buffercache_evict() call. We cannot say t

Re: md.c vs elog.c vs smgrreleaseall() in barrier

2025-04-05 Thread Noah Misch
On Mon, Mar 17, 2025 at 07:52:02PM -0400, Andres Freund wrote:
> Here's a proposed patch for this. It turns out that the bug might already be
> reachable, even without defining FDDEBUG. There's a debug ereport() in
> register_dirty_segment() - but it's hard to reach in practice.
> 
> I don't really know whether that means we ought to backpatch or not - which
> makes me want to be conservative and not backpatch.

Non-backpatch sounds fine.

> Subject: [PATCH v2 1/2] smgr: Hold interrupts in most smgr functions

> It seems better to put the HOLD_INTERRUPTS()/RESUME_INTERRUPTS() in smgr.c,
> instead of trying to push them down to md.c where possible: For one, every
> smgr implementation would be vulnerable, for another, a good bit of smgr.c
> code itself is affected too.

I'm fine with putting it in smgr.c.  Academically, I don't see every smgr
implementation being vulnerable for most smgr entry points.  For example, the
upthread backtrace happens because mdclose() undoes the fd-opening work of
mdnblocks().  Another smgr implementation could make its smgr_close callback a
no-op.  smgrrelease() doesn't destroy anything important at the smgr level; it
is mdclose() that destroys state that md.c still needs.

> @@ -362,12 +397,16 @@ smgrreleaseall(void)
>   if (SMgrRelationHash == NULL)
>   return;
>  
> + HOLD_INTERRUPTS();

Likely not important, but it's not clear to me why smgrdestroyall() and
smgrreleaseall() get HOLD_INTERRUPTS(), as opposed to relying on the holds in
smgrdestroy() and smgrrelease().  In contrast, smgrreleaserellocator() does
rely on smgrrelease() for the hold.

> +
>   hash_seq_init(&status, SMgrRelationHash);
>  
>   while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL)
>   {
>   smgrrelease(reln);
>   }
> +
> + RESUME_INTERRUPTS();
>  }
>  
>  /*

> @@ -434,6 +481,8 @@ smgrdosyncall(SMgrRelation *rels, int nrels)
>   if (nrels == 0)
>   return;
>  
> + HOLD_INTERRUPTS();
> +
>   FlushRelationsAllBuffers(rels, nrels);

FlushRelationsAllBuffers() isn't part of smgr or md.c, so it's unlikely to
become sensitive to smgrrelease().  It may do a ton of I/O.  Hence, I'd
HOLD_INTERRUPTS() after FlushRelationsAllBuffers(), not before.
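
In other words, a sketch of the suggested ordering for smgrdosyncall()
(illustrative only, not the committed hunk):

    if (nrels == 0)
        return;

    /* may do a ton of I/O; leave it interruptible */
    FlushRelationsAllBuffers(rels, nrels);

    HOLD_INTERRUPTS();
    /* ... the per-relation smgr_immedsync() loop ... */
    RESUME_INTERRUPTS();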

>  
>   /*
> @@ -449,6 +498,8 @@ smgrdosyncall(SMgrRelation *rels, int nrels)
>   smgrsw[which].smgr_immedsync(rels[i], forknum);

Long-term, someone might change this to hold interrupts once per immedsync
with a CFI between immedsync calls.  That would be safe.  It's not this
change's job, though.  I'm mentioning it for the archives.

>   }
>   }
> +
> + RESUME_INTERRUPTS();
>  }
>  
>  /*
> @@ -471,6 +522,8 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool 
> isRedo)
>   if (nrels == 0)
>   return;
>  
> + HOLD_INTERRUPTS();

I would move this below DropRelationsAllBuffers(), for reasons like
FlushRelationsAllBuffers() above.

> Subject: [PATCH v2 2/2] smgr: Make SMgrRelation initialization safer against
>  errors
> 
> In case the smgr_open callback failed, the ->pincount field would not be
> initialized and the relation would not be put onto the unpinned_relns list.
> 
> This buglet was introduced in 21d9c3ee4ef7. As that commit is only in HEAD, no
> need to backpatch.
> 
> Discussion: 
> https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl
> ---
>  src/backend/storage/smgr/smgr.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
> index 53a09fe4aaa..24971304b85 100644
> --- a/src/backend/storage/smgr/smgr.c
> +++ b/src/backend/storage/smgr/smgr.c
> @@ -255,12 +255,12 @@ smgropen(RelFileLocator rlocator, ProcNumber backend)
>   reln->smgr_cached_nblocks[i] = InvalidBlockNumber;
>   reln->smgr_which = 0;   /* we only have md.c at present */
>  
> - /* implementation-specific initialization */
> - smgrsw[reln->smgr_which].smgr_open(reln);
> -
>   /* it is not pinned yet */
>   reln->pincount = 0;
>   dlist_push_tail(&unpinned_relns, &reln->node);
> +
> + /* implementation-specific initialization */
> + smgrsw[reln->smgr_which].smgr_open(reln);
>   }

I struggle to speculate about the merits of this, because mdopen() can't fail.
If mdopen() started to do things that could fail, mdnblocks() would be
reasonable in assuming those things are done.  Hence, the long-term direction
should be more like destroying the new smgr entry in the event of error.

I would not make this change.  I'd maybe add a comment that smgr_open
callbacks currently aren't allowed to fail, since smgropen() isn't ready to
clean up smgr-level state from a failed open.  How do you see it?
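
For instance, something along these lines in smgropen() (the wording and
placement are a sketch only):

    /*
     * Implementation-specific initialization.
     *
     * NB: smgr_open callbacks are currently not allowed to fail, because
     * smgropen() makes no attempt to clean up the new hash entry, pincount,
     * or unpinned_relns linkage if the callback were to throw an error.
     */
    smgrsw[reln->smgr_which].smgr_open(reln);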




Re: Allow default \watch interval in psql to be configured

2025-04-05 Thread Greg Sabino Mullane
On Thu, Mar 20, 2025 at 4:45 PM Daniel Gustafsson  wrote:

> Having a watch interval of zero is IMHO somewhat nonsensical, but since it
> was done intentionally in 6f9ee74d45 (which I had missed) I agree that the
> default should support it as well. Fixed.
>

Yeah, I forgot about that too.  The new patch looks good except for one
tiny little thing: the error should say "must be at least 0.00" or similar,
instead of "must be greater than 0.00" now that we allow 0.00

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: [PATCH] Automatic client certificate selection support for libpq v1

2025-04-05 Thread Seth Robertson



Yes, at first glance the service file looks like it should work and is
a much more elegant and generic method than my proposed hack.  I can't
trivially tell if the ssl configuration aspect of it was available in
8.3/8.4, but that isn't overly relevant since it is certainly
available now.

Thanks!
-Seth Robertson

From: Jacob Champion
Date: Mon, 31 Mar 2025 09:52:49 -0700
To: Seth Robertson,
To: Robin Haberkorn
Subject: Re: [PATCH] Automatic client certificate selection support for libpq v1

On Mon, Mar 31, 2025 at 9:01 AM Seth Robertson
 wrote:

Third, the only real use case where this feature would be critical is
a client which needs to have connections to two different PostgreSQL
servers at the same time.  Those applications are likely fairly rare
and doing custom programming to support different filenames would
likely be warranted.


Can this be handled well enough with a service file?


Given the lack of "me too" or "+1" posts over the past 16 years, I
suspect there may be features with higher user benefit.  I would not
cry if it gets removed.


Yeah, at least not without a solid use case. (If anyone does feel
motivated to pick it up, be aware of the server-side SNI work [1].
It'd be nice if the two halves were complementary -- or at minimum,
not clashing with each other.)

Thanks!
--Jacob

[1] https://postgr.es/m/1C81CD0D-407E-44F9-833A-DD0331C202E5%40yesql.se




Re: Add partial :-variable expansion to psql \copy

2025-04-05 Thread Tom Lane
Fabien COELHO  writes:
> I've been biten by psql's \copy lack of variable expansion, in a 
> limited-access docker-inside-VM context where COPY is not a viable option and 
> hardwired names are not desirable. The attached patch allows \copy to use 
> variable's values in place of table and file names:

Hm ... I'm on board with the general idea of the feature, but I find
this implementation quite horrid.  I would rather see us adjust the
lexing rules in psqlscanslash.l so that variable expansion happens
there when collecting \copy arguments.  This would eliminate at
least part of the distinction between OT_WHOLE_LINE and OT_NORMAL
modes, and we'd have to have some discussion about how far to go
there.  Or maybe just change exec_command_copy to use OT_NORMAL
not OT_WHOLE_LINE?  If we modify the behavior of OT_WHOLE_LINE
then the ability to expand variables would start to apply in the
other commands using that, notably \!.  I think that's at least
potentially good, but perhaps the blast radius of such a change
is too large.

Anyway, my feeling about it is that \copy parsing is a huge hack
right now, and I'd rather see it become less of a hack, that is
more like other psql commands, instead of getting even hackier.

regards, tom lane




Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-04-05 Thread Shubham Khanna
On Thu, Mar 20, 2025 at 5:54 PM Amit Kapila  wrote:
>
> On Thu, Mar 20, 2025 at 4:53 PM Ashutosh Bapat
>  wrote:
> >
> > On Thu, Mar 20, 2025 at 10:25 AM Shubham Khanna
> >  wrote:
> >
> >
> > +# run pg_createsubscriber with '--database' and '--all' without '--dry-run'
> > +# and verify the failure
> > +command_fails_like(
> > + [
> > + 'pg_createsubscriber',
> > + '--verbose',
> > + '--pgdata' => $node_s->data_dir,
> > + '--publisher-server' => $node_p->connstr($db1),
> > + '--socketdir' => $node_s->host,
> > + '--subscriber-port' => $node_s->port,
> > + '--database' => $db1,
> > + '--all',
> > + ],
> > + qr/--database cannot be used with --all/,
> > + 'fail if --database is used with --all');
> >
> >
> > Why does only this test not use --dry-run, but all other such tests
> > use --dry-run? Either they should all use it or not use it.
> >
>
> I think this patch is adding a lot of extra tests which I don't think
> are required. We should have one positive test in --dry-run mode
> similar to ('run pg_createsubscriber without --databases') and
> probably verify in the LOG that it has expanded commands for all
> databases. Also, for negative tests, one or two tests are sufficient
> instead of testing all possible combinations. I don't expect this new
> option to add a noticeable testing time.
>

As per your suggestions, I have updated the test cases in the v17-0001
patch. The primary tests now focus on a positive test in --dry-run
mode, similar to the one for running pg_createsubscriber without
--databases, along with verification in the log to ensure that the
commands for all databases are expanded. Also, I have limited the
negative tests to just one or two representative cases, avoiding
unnecessary combinations.
I have moved the additional test cases to the v17-0002 patch to ensure
that the testing time for the new option remains negligible.

> *  
> +  
> +   pg_createsubscriber
> +   option
> +   
> +
> + -a
> + --all
> +
> +
> + -D 
> + --pgdata
> +
> +datadir
> +
> + -P
> + --publisher-server
> +
> +connstr
>
> Most of this is unrelated to this patch. I suggest making a top-up
> patch, we can commit it separately.
>

I have created a separate v17-0003 patch that includes the synopsis
for the '--all' option.

The attached patch at [1] contains the suggested changes.

[1] - 
https://www.postgresql.org/message-id/CAHv8RjLj0KxVHbxaPZHzudGS1igzDMccFE8LDh4LHNJR_2Aqug%40mail.gmail.com

Thanks and regards,
Shubham Khanna.




Re: Draft for basic NUMA observability

2025-04-05 Thread Jakub Wartak
On Thu, Apr 3, 2025 at 10:23 AM Bertrand Drouvot
 wrote:
>
> Hi,

Hi Bertrand,

> On Thu, Apr 03, 2025 at 09:01:43AM +0200, Jakub Wartak wrote:
[..]

> === v21-0002
> While pg_buffercache_build_tuple() is not added (pg_buffercache_save_tuple()
> is).

Fixed

> About v21-0002:
>
> === 1
>
> I can see that the pg_buffercache_init_entries() helper comments are added in
> v21-0003 but I think that it would be better to add them in v21-0002
> (where the helper is actually created).

Moved

> About v21-0003:
>
> === 2
>
> > I hear you, attached v21 / 0003 is free of float/double arithmetics
> > and uses non-float point values.
>
> +   if (buffers_per_page > 1)
> +   os_page_query_count = NBuffers;
> +   else
> +   os_page_query_count = NBuffers * pages_per_buffer;
>
> yeah, that's more elegant. I think that it properly handles the relationships
> between buffer and page sizes without relying on float arithmetic.

Cool

> === 3
>
> +   if (buffers_per_page > 1)
> +   {
>
> As buffers_per_page does not change, I think I'd put this check outside of the
> for (i = 0; i < NBuffers; i++) loop, something like:
>
> "
> if (buffers_per_page > 1) {
> /* BLCKSZ < PAGESIZE: one page hosts many Buffers */
> for (i = 0; i < NBuffers; i++) {
> .
> .
> .
> .
> } else {
> /* BLCKSZ >= PAGESIZE: Buffer occupies more than one OS page */
> for (i = 0; i < NBuffers; i++) {
> .
> .
> .
> "

Done.
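
Concretely, the restructuring has roughly this shape (loop bodies elided;
variable names follow the quoted pseudo-code):

    if (buffers_per_page > 1)
    {
        /* BLCKSZ < PAGESIZE: one OS page hosts several buffers */
        for (i = 0; i < NBuffers; i++)
        {
            /* ... record the single OS page backing buffer i ... */
        }
    }
    else
    {
        /* BLCKSZ >= PAGESIZE: each buffer spans pages_per_buffer OS pages */
        for (i = 0; i < NBuffers; i++)
        {
            /* ... record each of the pages_per_buffer pages of buffer i ... */
        }
    }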

> === 4
>
> > That _numa_prepare_ptrs() is unused and will need to be removed,
> > but we can still move some code there if necessary.
>
> Yeah I think that it can be simply removed then.

Removed.

> === 5
>
> > @Bertrand: do you have anything against pg_shm_allocations_numa
> > instead of pg_shm_numa_allocations? I don't mind changing it...
>
> I think that pg_shm_allocations_numa() is better (given the examples you just
> shared).

OK, let's go with this name then (in v22).

> === 6
>
> > I find all of those non-user friendly and I'm afraid I won't be able
> > to pull that alone in time...
>
> Maybe we could add a few words in the doc to mention the "multiple nodes per
> buffer" case? And try to improve it for say 19?

Right, we could also document it as a limitation. I would be happy to leave
it as-is since it must be a rare condition, but Tomas stated he's not.

> Also maybe we should just focus till v21-0003 (and discard v21-0004 for 18).

Do you mean discard pg_buffercache_numa (0002+0003) and instead go
with pg_shm_allocations_numa (0004) ?

BTW: I've noticed that Tomas responded with his v22 to this after I've
solved all of the above in mine v22, so I'll drop v23 soon and then
let's continue there.


-J.




Re: Speed up ICU case conversion by using ucasemap_utf8To*()

2025-04-05 Thread vignesh C
On Sun, 30 Mar 2025 at 00:20, Andres Freund  wrote:
>
> On 2025-03-17 12:16:11 +0530, vignesh C wrote:
> > On Fri, 20 Dec 2024 at 10:50, Andreas Karlsson  wrote:
> > >
> > > Hi,
> > >
> > > Jeff pointed out to me that the case conversion functions in ICU have
> > > UTF-8 specific versions which means we can call those directly if the
> > > database encoding is UTF-8 and skip having to convert to and from UChar.
> > >
> > > Since most people today run their databases in UTF-8 I think this
> > > optimization is worth it and when measuring on short to medium length
> > > strings I got a 15-20% speed up. It is still slower than glibc in my
> > > benchmarks but the gap is smaller now.
> > >
> > > SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
> > > "sv-SE-x-icu") FROM generate_series(1, 100) i);
> > >
> > > master:  ~540 ms
> > > Patched: ~460 ms
> > > glibc:   ~410 ms
> > >
> > > I have also attached a clean up patch for the non-UTF-8 code paths. I
> > > thought about doing the same for the new UTF-8 code paths but it turned
> > > out to be a bit messy due to different function signatures for
> > > ucasemap_utf8ToUpper() and ucasemap_utf8ToLower() vs 
> > > ucasemap_utf8ToTitle().
> >
> > I noticed that Jeff's comments from [1] have not yet been addressed, I
> > have changed the commitfest entry status to "Waiting on Author",
> > please address them and update it to "Needs Review".
> > [1] - 
> > https://www.postgresql.org/message-id/72c7c2b5848da44caddfe0f20f6c7ebc7c0c6e60.ca...@j-davis.com
>
> It's also worth noting that this patch hasn't been building for quite a while
> (at least not since 2025-01-29):
>
> https://cirrus-ci.com/task/5621435164524544?logs=build#L1228
> [17:17:51.214] ld: error: undefined symbol: icu_convert_case
> [17:17:51.214] >>> referenced by pg_locale_icu.c:484 
> (../src/backend/utils/adt/pg_locale_icu.c:484)
> [17:17:51.214] >>>   
> src/backend/postgres_lib.a.p/utils_adt_pg_locale_icu.c.o:(strfold_icu)
> [17:17:51.214] cc: error: linker command failed with exit code 1 (use -v to 
> see invocation)
>
> I think we can mark this as returned-with-feedback for now?

Thanks, the commitfest entry is marked as returned with feedback.
@Andreas Karlsson Feel free to add a new commitfest entry when you
have addressed the feedback.

Regards,
Vignesh




Re: making EXPLAIN extensible

2025-04-05 Thread Robert Haas
On Mon, Mar 17, 2025 at 11:54 PM Sami Imseih  wrote:
> +1
>
> I did not think of adding a new hook, because there must be a really good
> reason to add a new hook. I think it's justified for this case. It's better 
> than
> my approach since the extension author can just put all their checks in one
> place rather than having to register a bunch of handlers.

Do you want to propose a patch?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-04-05 Thread Michael Paquier
On Tue, Mar 18, 2025 at 05:24:42PM +1300, David Rowley wrote:
> I had a look at this and it seems the main difference will be that
> this patch will protect against the case that a given node is non-null
> but has a custom jumble function which selects to not jumble anything
> in some cases. Since Ivan's patch only adds jumbling for non-null,
> that could lead to the same bug if a custom jumble function decided to
> not jumble anything at all.

Yeah.

> It's certainly hard to see much slowdown between master and v4, but it
> is visible if I sort the results and put them in a line graph and
> scale the vertical axis a bit. A 0.7% slowdown. See attached.

Thanks for the numbers.  I'd like to say that this is within noise,
but that's clearly not the case if it's repeatable, as you are pointing
out.  Ugh.

> I do agree that your v4 fixes the issue at hand and I'm not going to
> stand in your way if you want to proceed with it. However, from my
> point of view, it seems that we could achieve the same goal with much
> less overhead by just insisting that custom jumble functions always
> jumble something. We could provide a jumble function to do that if the
> custom function conditionally opts to jumble nothing. That would save
> having to add this additional jumbling of the node count.

If we make the whole thing cheaper by adding extra entropy only for NULLs
in nodes and strings, I'd rather have an insurance policy for the
custom functions.  Something like this:
- Forbid a size of 0 in AppendJumble().
- When dealing with a non-NULL case in _jumbleNode(), save the initial
jumble_len and the jumble contents when starting, then complain if the
jumble_len matches with the initial length at the end and if the
contents are the same in an assert.  A check on the length may be
enough, but we'd depend on JUMBLE_SIZE and nodes can get pretty big. 
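
Roughly like this for the length-based part (a sketch only; the names follow
queryjumblefuncs.c, and the saved-contents comparison would still be needed
on top because the jumble buffer gets folded once it reaches JUMBLE_SIZE):

#ifdef USE_ASSERT_CHECKING
    {
        Size        prev_jumble_len = jstate->jumble_len;

        /* ... existing jumbling of the non-NULL node ... */

        /* every non-NULL node must have contributed some entropy */
        Assert(jstate->jumble_len != prev_jumble_len);
    }
#endif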

What do you think?

> Also, am I right in thinking that there's no way for an extension to
> add a custom jumble function? If so, then we have full control over
> any custom jumble function. That would make it easier to ensure these
> always jumble something.

Currently, no, that would not be possible through this code path.  The
only way would be to fork the code as we rely entirely on the code
generated from the in-core headers.
--
Michael


signature.asc
Description: PGP signature


Re: Add mention in docs about locking all partitions for generic plans

2025-04-05 Thread David Rowley
On Mon, 24 Mar 2025 at 22:19, Tender Wang  wrote:
>> Maybe I was wrong about writing nothing in master's docs. It might
>> still be important to detail this. I don't know the best way to phrase
>> that, but maybe something along the lines of: "The query planner
>> obtains locks for all partitions which are part of the plan.  However,
>> when the executor uses a cached plan, locks are only obtained on the
>> partitions which remain after partition pruning done during the
>> initialization phase of execution, i.e., the ones shown in the EXPLAIN
>> output and not the ones referred to by the “Subplans Removed”
>> property.".
>>
>> Any opinions?
>
> The above sentence looks good to me.

Thanks for looking.

I've attached the master and the <= v17 in patch form. Also, the
compiled HTML for ease of review.

I'll push these in the next few days unless anyone else wants to voice
their opinions.

David


v17-0001-Doc-add-information-about-partition-locking.patch
Description: Binary data


v18-0001-Doc-add-information-about-partition-locking.patch
Description: Binary data


Re: Add Postgres module info

2025-04-05 Thread Yurii Rashkovskii
On Wed, Mar 26, 2025 at 8:17 PM Tom Lane  wrote:

> Yurii Rashkovskii  writes:
> >> Can you propose a specific change to clean it up?  I wanted to write
> >> just "PG_MODULE_MAGIC_DATA()", but I'm not sure that's valid C either.
>
> > I was thinking about passing `.name = NULL, .version = NULL` instead of
> > `0`—do you have any reservations about this?
>
> If we're going that way, I'd minimize it to just ".name = NULL".
>

Would something like this work?


v1-0001-PG_MODULE_MAGIC_DATA-0-mixes-designated-and-non-desi.patch
Description: Binary data


Re: Separate GUC for replication origins

2025-04-05 Thread Masahiko Sawada
On Thu, Mar 20, 2025 at 8:38 PM Amit Kapila  wrote:
>
> On Thu, Mar 20, 2025 at 10:37 PM Masahiko Sawada  
> wrote:
> >
> > On Wed, Mar 19, 2025 at 8:15 PM Amit Kapila  wrote:
> > >
> > > On Wed, Mar 19, 2025 at 10:43 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Mon, Mar 17, 2025 at 6:05 PM Euler Taveira  wrote:
> > > > >
> > > > > On Mon, Mar 17, 2025, at 8:44 PM, Masahiko Sawada wrote:
> > > > >
> > > > > I would suggest putting the new max_active_replication_origins after
> > > > > max_parallel_apply_workers_per_subscription as both
> > > > > max_sync_workers_per_subscription and
> > > > > max_parallel_apply_workers_per_subscription are related to
> > > > > max_logical_replication_workers.
> > > > >
> > > > >
> > > > > Good point. Looking at the documentation, the old 
> > > > > max_replication_slots
> > > > > parameter was the first one in that section so I decided to use the 
> > > > > same order
> > > > > for the postgresql.conf.sample.
> > > >
> > > > Thank you for updating the patch!
> > > >
> > >
> > > *
> > >
> > > Logical replication requires several configuration options to be set. 
> > > Most
> > > -   options are relevant only on one side of the replication. However,
> > > -   max_replication_slots is used on both the 
> > > publisher and
> > > -   the subscriber, but it has a different meaning for each.
> > > +   options are relevant only on one side of the replication.
> > >
> > >
> > > In this para, after removing the content about max_replication_slots,
> > > the other line: "Most options are relevant only on one side of the
> > > replication." doesn't make sense because there is no other option that
> > > applies to both sides and if there is one then we should mention about
> > > that.
> >
> > Good point. How about the following change?
> >
> >
> > -   Logical replication requires several configuration options to be set. 
> > Most
> > +   Logical replication requires several configuration options to be set. 
> > These
> > options are relevant only on one side of the replication.
> >
> >
>
> LGTM.

Thank you for reviewing it. I've committed the patch.


Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: AIO v2.5

2025-04-05 Thread Noah Misch
On Thu, Mar 20, 2025 at 02:54:14PM -0400, Andres Freund wrote:
> On 2025-03-19 18:11:18 -0700, Noah Misch wrote:
> > On Wed, Mar 19, 2025 at 06:17:37PM -0400, Andres Freund wrote:
> > > On 2025-03-19 14:25:30 -0700, Noah Misch wrote:
> > > > I see this relies on md_readv_complete having converted "result" to 
> > > > blocks.
> > > > Was there some win from doing that as opposed to doing the division 
> > > > here?
> > > > Division here ("blocks_read = prior_result.result / BLCKSZ") would feel 
> > > > easier
> > > > to follow, to me.
> > > 
> > > It seemed like that would be wrong layering - what if we had an smgr that
> > > could store data in a compressed format? The raw read would be of a 
> > > smaller
> > > size. The smgr API deals in BlockNumbers, only the md.c layer should know
> > > about bytes.
> > 
> > I hadn't thought of that.  That's a good reason.
> 
> I thought that was better documented, but alas, it wasn't. How about updating
> the documentation of smgrstartreadv to the following:
> 
> /*
>  * smgrstartreadv() -- asynchronous version of smgrreadv()
>  *
>  * This starts an asynchronous readv IO using the IO handle `ioh`. Other than
>  * `ioh` all parameters are the same as smgrreadv().
>  *
>  * Completion callbacks above smgr will be passed the result as the number of
>  * successfully read blocks if the read [partially] succeeds. This maintains
>  * the abstraction that smgr operates on the level of blocks, rather than
>  * bytes.
>  */

That's good.  Possibly add "(Buffers for blocks not successfully read might
bear unspecified modifications, up to the full nblocks.)"

In a bit of over-thinking this, I wondered if shared_buffer_readv_complete
would be better named shared_buffer_smgrreadv_complete, to emphasize the
smgrreadv semantics.  PGAIO_HCB_SHARED_BUFFER_READV likewise.  But I tend to
think not.  smgrreadv() has no "result" concept, so the symmetry is limited.

> I briefly had a bug in test_aio's injection point that led to *increasing*
> the number of bytes successfully read. That triggered an assertion failure in
> bufmgr.c, but not closer to the problem.  Is it worth adding an assert against
> that to md_readv_complete? Can't quite decide.

I'd lean yes, if in doubt.




Re: AIO v2.5

2025-04-05 Thread Andres Freund
Hi,

I've pushed fixes for 1) and 2) and am working on 3).


On 2025-04-01 17:13:24 -0700, Noah Misch wrote:
> On Tue, Apr 01, 2025 at 06:25:28PM -0400, Andres Freund wrote:
> > On 2025-04-01 17:47:51 -0400, Andres Freund wrote:
> > > 3) Some subtests fail if RELCACHE_FORCE_RELEASE and 
> > > CATCACHE_FORCE_RELEASE are defined:
> > > 
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2025-04-01%2019%3A23%3A07
> > > 
> > > # +++ tap check in src/test/modules/test_aio +++
> > > 
> > > #   Failed test 'worker: batch_start() leak & cleanup in implicit xact: 
> > > expected stderr'
> > > #   at t/001_aio.pl line 318.
> > > #   'psql::4: ERROR:  starting batch while batch 
> > > already in progress'
> > > # doesn't match '(?^:open AIO batch at end)'
> > > 
> > > 
> > > The problem is basically that the test intentionally forgets to exit 
> > > batchmode
> > > - normally that would trigger an error at the end of the transaction, 
> > > which
> > > the test verifies.  However, with RELCACHE_FORCE_RELEASE and
> > > CATCACHE_FORCE_RELEASE defined, we get other code entering batchmode and
> > > erroring out because batchmode isn't allowed to be entered recursively.
> 
> > > I don't really have a good idea how to deal with that yet.
> > 
> > Hm. Making the query something like
> > 
> > SELECT * FROM (VALUES (NULL), (batch_start()));
> > 
> > avoids the wrong output, because the type lookup happens for the first row
> > already. But that's pretty magical and probably fragile.
> 
> Hmm.  Some options:
> 
> a. VALUES() trick above.  For test code, it's hard to argue with something
>that seems to solve it in practice.

I think I'll go for a slightly nicer version of that, namely
  SELECT WHERE batch_start() IS NULL
I think that ends up the least verbose of the ideas we've been discussing.


> c. Move RELCACHE_FORCE_RELEASE and CATCACHE_FORCE_RELEASE to be
>GUC-controlled, like how CLOBBER_CACHE_ALWAYS changed into the
>debug_discard_caches GUC.  Then disable them for relevant parts of
>test_aio.  This feels best long-term, but it's bigger.  I also wanted this
>in syscache-update-pruned.spec[1].

Yea, that'd probably be a good thing medium-term.

Greetings,

Andres Freund




Re: Replace IN VALUES with ANY in WHERE clauses during optimization

2025-04-05 Thread Alena Rybakina

On 03.04.2025 18:26, Alexander Korotkov wrote:

On Thu, Apr 3, 2025 at 5:18 PM Alena Rybakina  wrote:

Okay, I agree with you.

Good.  I've reflected this limitation in comments and the commit
message.

Thank you, it looks fine)

Also, I've adjusted the regression tests by removing excessive
ones and adding more important cases.  I'm going to push this if there are no
objections.

I agree with your changes.

--
Regards,
Alena Rybakina
Postgres Professional





Re: Test to dump and restore objects left behind by regression

2025-04-05 Thread Ashutosh Bapat
On Mon, Mar 24, 2025 at 5:44 PM Alvaro Herrera  wrote:
>
> On 2025-Mar-24, Ashutosh Bapat wrote:
>
> > One concern I have with directory format is that the dumped database is not
> > readable. This might make investigating a bug identified by the test a
> > bit more complex.
>
> Oh, it's readable all right.  You just need to use `pg_restore -f-` to
> read it.  No big deal.
>
>
> So I ran this a few times:
> /usr/bin/time make -j8 -Otarget -C /pgsql/build/master check-world -s 
> PROVE_FLAGS="-c -j6" > /dev/null
>
> commenting out the call to test_regression_dump_restore() to test how
> much additional runtime does the new test incur.
>
> With test:
>
> 136.95user 116.56system 1:13.23elapsed 346%CPU (0avgtext+0avgdata 
> 250704maxresident)k
> 4928inputs+55333008outputs (114major+14784937minor)pagefaults 0swaps
>
> 138.11user 117.43system 1:15.54elapsed 338%CPU (0avgtext+0avgdata 
> 278592maxresident)k
> 48inputs+55333464outputs (80major+14794494minor)pagefaults 0swaps
>
> 137.05user 113.13system 1:08.19elapsed 366%CPU (0avgtext+0avgdata 
> 279272maxresident)k
> 48inputs+55330064outputs (83major+14758028minor)pagefaults 0swaps
>
> without the new test:
>
> 135.46user 114.55system 1:14.69elapsed 334%CPU (0avgtext+0avgdata 
> 145372maxresident)k
> 32inputs+55155256outputs (105major+14737549minor)pagefaults 0swaps
>
> 135.48user 114.57system 1:09.60elapsed 359%CPU (0avgtext+0avgdata 
> 148224maxresident)k
> 16inputs+55155432outputs (95major+14749502minor)pagefaults 0swaps
>
> 133.76user 113.26system 1:14.92elapsed 329%CPU (0avgtext+0avgdata 
> 148064maxresident)k
> 48inputs+55154952outputs (84major+14749531minor)pagefaults 0swaps
>
> 134.06user 113.83system 1:16.09elapsed 325%CPU (0avgtext+0avgdata 
> 145940maxresident)k
> 32inputs+55155032outputs (83major+14738602minor)pagefaults 0swaps
>
> The increase in duration here is less than a second.
>
>
> My conclusion with these numbers is that it's not worth hiding this test
> in PG_TEST_EXTRA.

Thanks for the conclusion.

On Mon, Mar 24, 2025 at 3:29 PM Daniel Gustafsson  wrote:
>
> > On 24 Mar 2025, at 10:54, Ashutosh Bapat  
> > wrote:
>
> > 0003 - same as 0002 in the previous patch set. It excludes statistics
> > from comparison, otherwise the test will fail because of bug reported
> > at [1]. Ideally we shouldn't commit this patch so as to test
> > statistics dump and restore, but in case we need the test to pass till
> > the bug is fixed, we should merge this patch to 0001 before
> > committing.
>
> If the reported bug isn't fixed before feature freeze I think we should commit
> this regardless as it has clearly shown value by finding bugs (though perhaps
> under PG_TEST_EXTRA or in some way disconnected till the bug is fixed, to limit the
> blast-radius in the buildfarm).

Combining Alvaro's and Daniel's recommendations, I think we should
squash all three of my patches when committing the test if the
bug is not fixed by then. Otherwise we should squash the first two patches
and commit them. I'm attaching the patches again for reference.

> If we really wanted to save some total test runtime,
> it might be better to write a regress schedule file for
> 027_stream_regress.pl which only takes the tests that emit useful WAL,
> rather than all tests.

That's out of scope for this patch, but it seems like an idea worth exploring.

-- 
Best Wishes,
Ashutosh Bapat
From f26f88364a196dc9589ca451cb54f5e514e3422e Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Mon, 24 Mar 2025 11:21:12 +0530
Subject: [PATCH 2/3] Use only one format and make the test run default

According to Alvaro (and I agree with him), the test should be run by
default. Otherwise we get to know about a bug only after a buildfarm
animal where it's enabled reports a failure. Further, testing only one
format may suffice, since all the formats have shown the same bugs till
now.

If we use --directory format we can use -j which reduces the time taken
by dump/restore test by about 12%.

This patch removes PG_TEST_EXTRA option as well as runs the test only in
directory format with parallelism enabled.

Note for committer: If we decide to accept this change, it should be
merged with the previous commit.
---
 doc/src/sgml/regress.sgml  | 12 
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 76 +-
 2 files changed, 25 insertions(+), 63 deletions(-)

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 237b974b3ab..0e5e8e8f309 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,18 +357,6 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
   
  
 
-
-
- regress_dump_test
- 
-  
-   When enabled, src/bin/pg_upgrade/t/002_pg_upgrade.pl
-   tests dump and restore of regression database left behind by the
-   regression run. Not enabled by default because it is time and resource
-   consuming.
-  
- 
-

 
Tests for features that are not supported by the cu

Re: Using read stream in autoprewarm

2025-04-05 Thread Melanie Plageman
On Tue, Apr 1, 2025 at 7:21 AM Nazir Bilal Yavuz  wrote:
>
> On Tue, 1 Apr 2025 at 05:14, Melanie Plageman  
> wrote:
> >
> +1 for using the functions. I think it is hard to follow / maintain
> this with the do-while loops and goto statements.

I'll take a look at your downthread proposal in a bit.

But the attached patch is a new version of what I proposed with the
functions. It's still not totally correct, but I wanted to see what
you thought.

> > But the explicit looping for skipping the bad blocks and the nested
> > loops for rel and fork -- I think these are less error prone.
>
> One question in my mind: the outermost loop stops when the database
> changes, but we do not check whether it changed from database oid = 0.
> Handling this might require some structural changes.

I don't understand why each database has global objects at the
beginning. If there are global objects, they are global to all
databases, so surely the sort function would have put them all at the
beginning? One problem is we need a database connection to prewarm
these, but if the global objects are all at the beginning, then maybe
we can handle those with a special case and not force ourselves to
check for them when trying to load blocks from every database.

- Melanie
From f1c98580825acc63538372e9f67f6ed2bf68b540 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 31 Mar 2025 22:02:25 -0400
Subject: [PATCH] pgsr autoprewarm

---
 contrib/pg_prewarm/autoprewarm.c | 244 ---
 1 file changed, 160 insertions(+), 84 deletions(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 73485a2323c..fd3fb228715 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -41,6 +41,7 @@
 #include "storage/latch.h"
 #include "storage/lwlock.h"
 #include "storage/procsignal.h"
+#include "storage/read_stream.h"
 #include "storage/smgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
@@ -421,6 +422,96 @@ apw_load_buffers(void)
 (errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
 		apw_state->prewarmed_blocks, num_elements)));
 }
+struct apw_read_stream_private
+{
+	BlockInfoRecord *block_info;
+	int			pos;
+	Oid			database;
+	RelFileNumber filenumber;
+	ForkNumber	forknum;
+	BlockNumber nblocks;
+};
+
+static BlockNumber
+awp_read_stream_next_block(ReadStream *stream,
+		   void *callback_private_data,
+		   void *per_buffer_data)
+{
+	struct apw_read_stream_private *p = callback_private_data;
+
+	for (int i; (i = p->pos++) < apw_state->prewarm_stop_idx;)
+	{
+		BlockInfoRecord cur_blk = p->block_info[i];
+
+		CHECK_FOR_INTERRUPTS();
+
+		if (!have_free_buffer())
+		{
+			p->pos = apw_state->prewarm_stop_idx;
+			return InvalidBlockNumber;
+		}
+
+		if (cur_blk.database != p->database)
+			return InvalidBlockNumber;
+
+		if (cur_blk.filenumber != p->filenumber)
+			return InvalidBlockNumber;
+
+		if (cur_blk.forknum != p->forknum)
+			return InvalidBlockNumber;
+
+		if (cur_blk.blocknum >= p->nblocks)
+			continue;
+
+		return cur_blk.blocknum;
+	}
+
+	return InvalidBlockNumber;
+}
+
+static int
+next_fork_idx(BlockInfoRecord *block_info, int start, int stop,
+			  Oid database, RelFileNumber filenumber, ForkNumber forknum)
+{
+	Assert(block_info[start].forknum == forknum);
+
+	for (int i = start; i < stop; i++)
+	{
+		BlockInfoRecord *blk = &block_info[i];
+
+		if (blk->database != database)
+			return i;
+
+		if (blk->filenumber != filenumber)
+			return i;
+
+		if (blk->forknum != forknum)
+			return i;
+	}
+
+	return stop;
+}
+
+static int
+next_rel_idx(BlockInfoRecord *block_info, int start, int stop,
+			 Oid database, RelFileNumber filenumber)
+{
+	Assert(block_info[start].filenumber == filenumber);
+
+	for (int i = start; i < stop; i++)
+	{
+		BlockInfoRecord *blk = &block_info[i];
+
+		if (blk->database != database)
+			return i;
+
+		if (blk->filenumber != filenumber)
+			return i;
+	}
+
+	return stop;
+}
+
 
 /*
  * Prewarm all blocks for one database (and possibly also global objects, if
@@ -429,12 +520,11 @@ apw_load_buffers(void)
 void
 autoprewarm_database_main(Datum main_arg)
 {
-	int			pos;
 	BlockInfoRecord *block_info;
-	Relation	rel = NULL;
-	BlockNumber nblocks = 0;
-	BlockInfoRecord *old_blk = NULL;
+	int			i;
+	BlockInfoRecord *blk;
 	dsm_segment *seg;
+	Oid			database;
 
 	/* Establish signal handlers; once that's done, unblock signals. */
 	pqsignal(SIGTERM, die);
@@ -449,108 +539,94 @@ autoprewarm_database_main(Datum main_arg)
  errmsg("could not map dynamic shared memory segment")));
 	BackgroundWorkerInitializeConnectionByOid(apw_state->database, InvalidOid, 0);
 	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
-	pos = apw_state->prewarm_start_idx;
 
-	/*
-	 * Loop until we run out of blocks to prewarm or until we run out of free
-	 * buffers.
-	 */
-	while (pos < apw_state->prewarm_stop_idx && have_free_buffer())
-	{
-		BlockInfoRecord *blk = &block_info[po

Re: Thread-safe nl_langinfo() and localeconv()

2025-04-05 Thread Peter Eisentraut

On 28.03.25 09:13, Peter Eisentraut wrote:

About patch 0003:

I had previously pointed out that the canonicalization might have 
been intentional, and that we have canonicalization of ICU locale 
names. But we don't have to keep the setlocale()-based locale 
checking implementation just for that, I think.  (If this was meant 
to be a real feature offered by libc, there should be a 
get_locale_name(locale_t) function.)


Anyway, this patch fails tests on CI on NetBSD, so it will need some 
further investigation.


It turns out the problem was that nl_langinfo_l() returns a codeset name 
of "646" for the C locale, which we didn't have in our mapping table. If 
we add that, then everything passes there as well.  (But the use of 
nl_langinfo_l() wasn't added by this patch, it just exposes it to the 
tests.  So this is apparently a pre-existing problem.)


Further analysis:

(But I have not tested any of this.)

It appears that the new implementation of check_locale() provided by 
this patch, using newlocale() instead of setlocale(), works differently 
on NetBSD.  Specifically, it apparently does not catch garbage locale 
names.  Instead, it just assumes they are C locale.  Then, in 
pg_get_encoding_from_locale(), we have special cases mapping "C" and 
"POSIX" to SQL_ASCII.  But as discussed, with this patch, we no longer 
do canonicalization of the passed-in locale name, so if you pass a 
garbage locale name, it will not match "C" or "POSIX".  Then, you fall 
through to the mapping table, and that's where we get the error about 
the missing "646" entry.  That's why this was not a problem before, even 
though we've used nl_langinfo_l() for a few months and nl_langinfo() 
before that.


I'm not sure what to do with this.  If setlocale() and newlocale() 
indeed behave differently in what set of locale names they accept, then 
technically we ought to test both of them, since we do use both of them 
later on.  Or maybe we push on with the effort to get rid of setlocale() 
calls and then just worry about testing newlocale() (as this patch 
does).  But right now, if newlocale() is more permissive, then we could 
accept locale names that will later fail setlocale() calls, which might 
be a problem.
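For what it's worth, the difference is easy to probe outside the tree.  A
small standalone program (POSIX-2008 locale APIs only; the garbage locale
name is of course made up) shows what each interface does with an invalid
name and what codeset name the C locale reports on a given platform:

#include <locale.h>
#include <langinfo.h>
#include <stdio.h>

int
main(void)
{
    /* Does newlocale() reject a garbage name, or silently accept it? */
    locale_t    loc = newlocale(LC_ALL_MASK, "surely_not_a_locale", (locale_t) 0);

    if (loc == (locale_t) 0)
        printf("newlocale() rejected the name\n");
    else
    {
        printf("newlocale() accepted it; CODESET = %s\n",
               nl_langinfo_l(CODESET, loc));
        freelocale(loc);
    }

    /* Compare with setlocale(), which is also still used elsewhere. */
    if (setlocale(LC_CTYPE, "surely_not_a_locale") == NULL)
        printf("setlocale() rejected the name\n");
    else
        printf("setlocale() accepted it\n");

    /* And what codeset name does the C locale report here? */
    loc = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    printf("C locale CODESET = %s\n", nl_langinfo_l(CODESET, loc));
    freelocale(loc);

    return 0;
}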







Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-05 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Mar 20, 2025 at 2:38 AM Daniel Gustafsson  wrote:
>> On 19 Mar 2025, at 05:57, Tom Lane  wrote:
>>> BTW, I was pretty seriously disheartened just now to realize that
>>> this feature was implemented by making libpq depend on libcurl.

> How feasible/fragile/weird would it be to dlopen() it on demand?

FWIW, that would not really move the needle one bit so far as
my worries are concerned.  What I'm unhappy about is the very
sizable expansion of our build dependency footprint as well
as the sizable expansion of the 'package requires' footprint.
The fact that the new dependencies are mostly indirect doesn't
soften that blow at all.

To address that (without finding some less kitchen-sink-y OAuth
implementation to depend on), we'd need to shove the whole thing
into a separately-built, separately-installable package.

What I expect is likely to happen is that packagers will try to do
that themselves to avoid the dependency bloat.  AFAICT our current
setup will make that quite painful for them, and in any case I
don't believe it's work we should make them do.  If they fail to
do that, the burden of the extra dependencies will fall on end
users.  Either way, it's not going to make us look good.

regards, tom lane




Re: vacuum_truncate configuration parameter and isset_offset

2025-04-05 Thread Nikolay Shaplov
In a message of Monday, 24 March 2025, 22:11:19 MSK, Robert Haas wrote:

> I think that the answer here might be that Nikolay doesn't like this
> because it interacts badly with his "New [relation] options engine,"
There is no problem with adding isset_offset to the "New [relation] options
engine"; that is not a big deal. But it will break all the internal logic of
reloptions, in both the current version and my patch.

> And if I'm wrong in thinking that, then I would like a detailed
> technical argument explaining exactly why.

You can't have both an isset_offset flag and a default_value; it is either one
or the other. Having both would make the whole design inconsistent. We already
have default_value, even for booleans.
If you want custom behaviour when an option is not set, you set an unreachable
default value. This is the way it is done for _all_ options that have custom
behavior for the unset case, even those that are technically boolean. Why
should vacuum_truncate be done differently? Either do it the way it has been
done before, or propose a total redesign for all options that implement custom
unset behaviour via unreachable defaults. That would be consistent.
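For readers following the thread, a minimal self-contained sketch of that
convention; the struct, field, and function names are invented for
illustration, and the real code plumbs this through the reloptions parser:

/*
 * Sketch of the "unreachable default" convention: no separate "is set"
 * flag; the default is a value no user can legitimately set (-1 here),
 * and the consumer falls back to the GUC when it sees that sentinel.
 */
typedef struct HypotheticalRelOptions
{
    int         example_threshold;  /* -1 means "not set on this relation" */
} HypotheticalRelOptions;

static int
effective_threshold(const HypotheticalRelOptions *opts, int guc_threshold)
{
    if (opts->example_threshold >= 0)
        return opts->example_threshold; /* explicitly set via reloptions */
    return guc_threshold;               /* unset: fall back to the GUC */
}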

-- 
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su


signature.asc
Description: This is a digitally signed message part.


Re: [RFC] Lock-free XLog Reservation from WAL

2025-04-05 Thread Yura Sokolov
Good day, Andres.

18.03.2025 23:40, Andres Freund wrote:
> Reliably fails tests on windows, due to what looks to be a null pointer 
> dereference.
> 
> E.g. https://cirrus-ci.com/task/6178371937239040
> 
> That's likely related to EXEC_BACKEND.
> 
> The new status of this patch is: Waiting on Author

Thank you very much for pointing that out!
Yes, I had missed copying from XLogCtl, as is done for WALInsertLocks.
Fixed.

-- 
regards
Yura Sokolov aka funny-falcon
From 8cc3f9c7d629ce7fedd15f224df7566fd723cb06 Mon Sep 17 00:00:00 2001
From: Yura Sokolov 
Date: Sun, 19 Jan 2025 17:40:28 +0300
Subject: [PATCH v5] Lock-free XLog Reservation using lock-free hash-table

Removed PrevBytePos to eliminate lock contention, allowing atomic updates
to CurrBytePos. Use lock-free hash-table based on 4-way Cuckoo Hashing
to store link to PrevBytePos.
---
 src/backend/access/transam/xlog.c | 585 +++---
 src/tools/pgindent/typedefs.list  |   2 +
 2 files changed, 532 insertions(+), 55 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4b6c694a3f7..82d6dc0732c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -68,6 +68,8 @@
 #include "catalog/pg_database.h"
 #include "common/controldata_utils.h"
 #include "common/file_utils.h"
+#include "common/hashfn.h"
+#include "common/pg_prng.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -384,6 +386,94 @@ typedef union WALInsertLockPadded
 	char		pad[PG_CACHE_LINE_SIZE];
 } WALInsertLockPadded;
 
+/* #define WAL_LINK_64 0 */
+#ifndef WAL_LINK_64
+#ifdef PG_HAVE_ATOMIC_U64_SIMULATION
+#define WAL_LINK_64 0
+#else
+#define WAL_LINK_64 1
+#endif
+#endif
+
+/*
+ * It links current position with previous one.
+ * - CurrPosId is (CurrBytePos ^ (CurrBytePos>>32))
+ *   Since CurrBytePos grows monotonically and it is aligned to MAXALIGN,
+ *   CurrPosId correctly identifies CurrBytePos for at least 4*2^32 = 32GB of
+ *   WAL logs.
+ * - CurrPosHigh is (CurrBytePos>>32), it is stored for strong uniqueness check.
+ * - PrevSize is difference between CurrBytePos and PrevBytePos
+ */
+typedef struct
+{
+#if WAL_LINK_64
+	uint64		CurrPos;
+	uint64		PrevPos;
+#define WAL_PREV_EMPTY (~((uint64)0))
+#define WALLinkEmpty(l) ((l).PrevPos == WAL_PREV_EMPTY)
+#define WALLinkSamePos(a, b) ((a).CurrPos == (b).CurrPos)
+#define WALLinkCopyPrev(a, b) do {(a).PrevPos = (b).PrevPos;} while(0)
+#else
+	uint32		CurrPosId;
+	uint32		CurrPosHigh;
+	uint32		PrevSize;
+#define WALLinkEmpty(l) ((l).PrevSize == 0)
+#define WALLinkSamePos(a, b) ((a).CurrPosId == (b).CurrPosId && (a).CurrPosHigh == (b).CurrPosHigh)
+#define WALLinkCopyPrev(a, b) do {(a).PrevSize = (b).PrevSize;} while(0)
+#endif
+} WALPrevPosLinkVal;
+
+/*
+ * This is an element of lock-free hash-table.
+ * In 32 bit mode PrevSize's lowest bit is used as a lock, relying on fact it is MAXALIGN-ed.
+ * In 64 bit mode lock protocol is more complex.
+ */
+typedef struct
+{
+#if WAL_LINK_64
+	pg_atomic_uint64 CurrPos;
+	pg_atomic_uint64 PrevPos;
+#else
+	pg_atomic_uint32 CurrPosId;
+	uint32		CurrPosHigh;
+	pg_atomic_uint32 PrevSize;
+	uint32		pad;			/* to align to 16 bytes */
+#endif
+} WALPrevPosLink;
+
+StaticAssertDecl(sizeof(WALPrevPosLink) == 16, "WALPrevPosLink should be 16 bytes");
+
+#define PREV_LINKS_HASH_CAPA (NUM_XLOGINSERT_LOCKS * 2)
+StaticAssertDecl(!(PREV_LINKS_HASH_CAPA & (PREV_LINKS_HASH_CAPA - 1)),
+ "PREV_LINKS_HASH_CAPA should be power of two");
+StaticAssertDecl(PREV_LINKS_HASH_CAPA < UINT16_MAX,
+ "PREV_LINKS_HASH_CAPA is too large");
+
+/*---
+ * PREV_LINKS_HASH_STRATEGY - the way slots are chosen in hash table
+ *   1 - 4 positions h1,h1+1,h2,h2+2 - it guarantees at least 3 distinct points,
+ * but may spread at 4 cache lines.
+ *   2 - 4 positions h,h^1,h^2,h^3 - 4 points in single cache line.
+ *   3 - 8 positions h1,h1^1,h1^2,h1^4,h2,h2^1,h2^2,h2^3 - 8 distinct points in
+ * in two cache lines.
+ */
+#ifndef PREV_LINKS_HASH_STRATEGY
+#define PREV_LINKS_HASH_STRATEGY 3
+#endif
+
+#if PREV_LINKS_HASH_STRATEGY <= 2
+#define PREV_LINKS_LOOKUPS 4
+#else
+#define PREV_LINKS_LOOKUPS 8
+#endif
+
+struct WALPrevLinksLookups
+{
+	uint16		pos[PREV_LINKS_LOOKUPS];
+};
+
+#define SWAP_ONCE_IN 128
+
 /*
  * Session status of running backup, used for sanity checks in SQL-callable
  * functions to start and stop backups.
@@ -395,17 +485,18 @@ static SessionBackupState sessionBackupState = SESSION_BACKUP_NONE;
  */
 typedef struct XLogCtlInsert
 {
-	slock_t		insertpos_lck;	/* protects CurrBytePos and PrevBytePos */
-
 	/*
 	 * CurrBytePos is the end of reserved WAL. The next record will be
-	 * inserted at that position. PrevBytePos is the start position of the
-	 * previously inserted (or rather, reserved) record - it is copied to the
-	 * prev-link of the next record. These are stored as "usable byte
-	 * positions" rather than XLogRecPtrs (see XLogBytePosToRecPtr()).
+	 * inse

Detach partition with constraint test

2025-04-05 Thread Ashutosh Bapat
Hi,
I tested the "not enforced" constraint feature extensively today
especially the cases of partitioned table. Everything seems to be
working fine.

While doing that, I found that foreign_key.sql does not have a test to
make sure that a partition continues to have the constraints in the
same state after detaching. Here's a 0001 patch adding those tests in
the same block as the not enforced constraints. Probably we could add
similar test in other blocks as well, but I haven't done that yet.
Checking if something like this would be acceptable.

0002 fixes a comment in the same block.

-- 
Best Wishes,
Ashutosh Bapat
From c0305c8572c9c416da45b9a14146be049cf91102 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Thu, 3 Apr 2025 17:29:49 +0530
Subject: [PATCH 1/2] Test detaching a partition with foreign key constraint

A partition of a partitioned table with a foreign key constraint should continue
to have the constraint in the same state even after detaching it from the
partitioned table. Add a test for this case.

Author: Ashutosh Bapat
---
 src/test/regress/expected/foreign_key.out | 27 ++-
 src/test/regress/sql/foreign_key.sql  | 12 +-
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 53810a0fde9..15782da9c97 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -1810,9 +1810,34 @@ ORDER BY tgrelid, tgtype;
  fk_partitioned_fk_a_b_fkey | fk_partitioned_fk_3_1 | RI_ConstraintTrigger_c_N | 17
 (14 rows)
 
+-- Detaching a partition should leave the constraint behind in a good state
+ALTER TABLE fk_partitioned_fk DETACH PARTITION fk_partitioned_fk_1;
+\d fk_partitioned_fk_1
+Table "public.fk_partitioned_fk_1"
+ Column |  Type   | Collation | Nullable | Default 
++-+---+--+-
+ a  | integer |   |  | 
+ b  | integer |   |  | 
+Foreign-key constraints:
+"fk_partitioned_fk_a_b_fkey" FOREIGN KEY (a, b) REFERENCES fk_notpartitioned_pk(a, b)
+
+SELECT conname, tgrelid::regclass as tgrel, regexp_replace(tgname, '[0-9]+', 'N') as tgname, tgtype
+FROM pg_trigger t JOIN pg_constraint c ON (t.tgconstraint = c.oid)
+WHERE tgrelid = 'fk_partitioned_fk_1'::regclass
+ORDER BY tgrelid, tgtype;
+  conname   |tgrel|  tgname  | tgtype 
++-+--+
+ fk_partitioned_fk_a_b_fkey | fk_partitioned_fk_1 | RI_ConstraintTrigger_c_N |  5
+ fk_partitioned_fk_a_b_fkey | fk_partitioned_fk_1 | RI_ConstraintTrigger_c_N | 17
+(2 rows)
+
+INSERT INTO fk_partitioned_fk_1 (a,b) VALUES (600, 601); --fails
+ERROR:  insert or update on table "fk_partitioned_fk_1" violates foreign key constraint "fk_partitioned_fk_a_b_fkey"
+DETAIL:  Key (a, b)=(600, 601) is not present in table "fk_notpartitioned_pk".
+INSERT INTO fk_partitioned_fk_1 (a,b) VALUES (500, 501);
 ALTER TABLE fk_partitioned_fk DROP CONSTRAINT fk_partitioned_fk_a_b_fkey;
 -- done.
-DROP TABLE fk_notpartitioned_pk, fk_partitioned_fk;
+DROP TABLE fk_notpartitioned_pk, fk_partitioned_fk, fk_partitioned_fk_1;
 -- Altering a type referenced by a foreign key needs to drop/recreate the FK.
 -- Ensure that works.
 CREATE TABLE fk_notpartitioned_pk (a INT, PRIMARY KEY(a), CHECK (a > 0));
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 77c0c615630..e68c19d6c4b 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -1320,9 +1320,19 @@ WHERE tgrelid IN (SELECT relid FROM pg_partition_tree('fk_partitioned_fk'::regcl
   UNION ALL SELECT 'fk_notpartitioned_pk'::regclass)
 ORDER BY tgrelid, tgtype;
 
+-- Detaching a partition should leave the constraint behind in a good state
+ALTER TABLE fk_partitioned_fk DETACH PARTITION fk_partitioned_fk_1;
+\d fk_partitioned_fk_1
+SELECT conname, tgrelid::regclass as tgrel, regexp_replace(tgname, '[0-9]+', 'N') as tgname, tgtype
+FROM pg_trigger t JOIN pg_constraint c ON (t.tgconstraint = c.oid)
+WHERE tgrelid = 'fk_partitioned_fk_1'::regclass
+ORDER BY tgrelid, tgtype;
+INSERT INTO fk_partitioned_fk_1 (a,b) VALUES (600, 601); --fails
+INSERT INTO fk_partitioned_fk_1 (a,b) VALUES (500, 501);
+
 ALTER TABLE fk_partitioned_fk DROP CONSTRAINT fk_partitioned_fk_a_b_fkey;
 -- done.
-DROP TABLE fk_notpartitioned_pk, fk_partitioned_fk;
+DROP TABLE fk_notpartitioned_pk, fk_partitioned_fk, fk_partitioned_fk_1;
 
 -- Altering a type referenced by a foreign key needs to drop/recreate the FK.
 -- Ensure that works.

base-commit: b82e7eddb023ade0739002b3ef05939ea6937e57
-- 
2.34.1

From 2b74fd3ba2e4d14484b81b53efc06f14cdfe640e Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Thu, 3 Apr 2025 17:33:13 +0530
Subject: [PATCH 2/2] Fix an inaccurate comment

The comment before SQ

Re: Statistics Import and Export

2025-04-05 Thread Corey Huinker
>
> * Changed to use LookupExplicitNamespace()
>

Seems good.


> * Added test for temp tables
>

+1


> * Doc fixes


So this patch swings the pendulum a bit back towards accepting some things
as errors. That's understandable, as we're never going to have a situation
where we can guarantee that the restore functions never generate an error,
so the best we can do is to draw the error-versus-warning line at a place
that:

* doesn't mess up flawed restores that we would otherwise expect to
complete at least partially
* is easy for us to understand
* is easy for us to explain
* we can live with for the next couple of decades

I don't know where that line should be drawn, so if people are happy with
Jeff's demarcation, then let's roll with it.


Re: RFC: Logging plan of the running query

2025-04-05 Thread Sami Imseih
> > 2/
> > It should be noted that the plan will not print to the log until
> > the plan begins executing the next plan node? depending on the
> > operation, that could take some time ( i.e.long seq scan of a table,
> > etc.)
> > Does this behavior need to be called out in docs?
>
> Seems reasonable, but long seq scan of a table would not cause the case
> since ExecProcNode() is called at least the number of the rows in a
> table, as far as I remember.
> Of course there can be the case where long time can elapse before
> executing ExecProcNode(), so I agree with adding the doc about this.

Correct. Seq Scan is a bad example since ExecProcNode is called for
every tuple, as you mention. MultiExecProcNode, functions doing
time-consuming computations, pg_sleep(), etc. can all delay the
signal from being acted on.

I also realized that the extended query protocol may have its own caveats.
A query running under the extended protocol will alternate between
"active" and "idle in transaction" as it transitions through parse, bind,
and execute. If a user calls pg_log_query_plan while the state is
"idle in transaction", it will result in a "backend with PID ... is not
running a query" log message. This is more likely if the query repeatedly
reissues an "execute" message (e.g., JDBC fetchSize). Of course, if the
user executes pg_log_query_plan again, it will (eventually) log the plan,
but it may be very confusing to see the "is not running a query" message
in the logs. The chances of this behavior are low, but not zero, so it's
probably worth calling out in the documentation.
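An illustration of the caveat, assuming (as in the proposed patch) that the
function takes the target backend PID; the PID below is hypothetical and the
message wording follows the one quoted above:

-- Ask backend 12345 (hypothetical PID) to log its current plan.
SELECT pg_log_query_plan(12345);
-- If that backend happens to be "idle in transaction" between
-- extended-protocol messages, the server log gets
--   backend with PID 12345 is not running a query
-- instead of the plan; re-issuing the call while an execute step is
-- running will (eventually) log the plan.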

> 3/
> + * By default, only superusers are allowed to signal to log the plan because
> + * allowing any users to issue this request at an unbounded rate would

> Only superuser allowed to do this is very restrictive. Many shops do
> not, for good
> reasons, want DBAs or monitoring tools to connect as superuser. Why not allow
> this functionality with "pg_monitor" ?

I just realized that my comment above is unwarranted. A superuser can
simply GRANT EXECUTE ON FUNCTION to pg_monitor, or to whatever
monitoring role they choose. You can ignore this.
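For the archives, the grant would be something like the following, assuming
the function's only argument is the target PID:

GRANT EXECUTE ON FUNCTION pg_log_query_plan(integer) TO pg_monitor;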

--
Sami Imseih
Amazon Web Services (AWS)




Re: [18] CREATE SUBSCRIPTION ... SERVER

2025-04-05 Thread Shlok Kyal
On Sat, 1 Mar 2025 at 04:35, Jeff Davis  wrote:
>
> On Mon, 2024-12-16 at 20:05 -0800, Jeff Davis wrote:
> > On Wed, 2024-10-30 at 08:08 -0700, Jeff Davis wrote:
> >
>
> Rebased v14.
>
> The approach has changed multiple times. It started off with more in-
> core code, but in response to review feedback, has become more
> decoupled from core and more coupled to postgres_fdw.
>
> But the patch has been about the same (just rebases) since March of
> last year, and hasn't gotten feedback since. I still think it's a nice
> feature, but I'd like some feedback on the externals of the feature.
>
> As a note, this will require a version bump for postgres_fdw for the
> new connection method.
>
Hi Jeff,

I reviewed the patch and I have a comment:

If the version is >= 18, the query will have 'suboriginremotelsn',
'subenabled', and 'subfailover' twice.

  if (fout->remoteVersion >= 17)
  appendPQExpBufferStr(query,
- " s.subfailover\n");
+ " s.subfailover,\n");
  else
  appendPQExpBuffer(query,
-   " false AS subfailover\n");
+   " false AS subfailover,\n");
+
+ if (dopt->binary_upgrade && fout->remoteVersion >= 18)
+ appendPQExpBufferStr(query, " fs.srvname AS subservername,\n"
+ " o.remote_lsn AS suboriginremotelsn,\n"
+ " s.subenabled,\n"
+ " s.subfailover\n");
+ else
+ appendPQExpBufferStr(query, " NULL AS subservername,\n"
+ " NULL AS suboriginremotelsn,\n"
+ " false AS subenabled,\n"
+ " false AS subfailover\n");

The query formed is something like:
"SELECT s.tableoid, s.oid, s.subname,\n s.subowner,\n s.subconninfo,
s.subslotname, s.subsynccommit,\n s.subpublications,\n s.subbinary,\n
s.substream,\n s.subtwophasestate,\n s.subdisableonerr,\n
s.subpasswordrequired,\n s.subrunasowner,\n s.suborigin,\n NULL AS
suboriginremotelsn,\n false AS subenabled,\n s.subfailover,\n NULL AS
subservername,\n NULL AS suboriginremotelsn,\n false AS subenabled,\n
false AS subfailover\nFROM pg_subscription s\nWHERE s.subdbid =
(SELECT oid FROM pg_database\n.."

Is this expected?

Thanks and Regards,
Shlok Kyal




Re: Show WAL write and fsync stats in pg_stat_io

2025-04-05 Thread Ranier Vilela
Hi.

On Mon, Feb 3, 2025 at 01:07, Michael Paquier  wrote:

> On Fri, Jan 31, 2025 at 11:29:31AM +0300, Nazir Bilal Yavuz wrote:
> > On Wed, 29 Jan 2025 at 18:16, Bertrand Drouvot
> >  wrote:
> >> I think that's the main reason why ff99918c625 added this new GUC
> (looking at
> >> the commit message). I'd feel more comfortable if we keep it.
> >
> > As Michael suggested, I will run a couple of benchmarks to see the
> > actual effect of this change. Then let's see if this affects anything.
>
> I've looked at bit at all that today, and something like the attached
> is what seems like the best streamlined version to me for the main
> feature.  I am also planning to run some short benchmarks with
> track_io_timing=on on HEAD and with the patch, then see the
> difference, without any relationship to track_wal_io_timing.
>
> The comment additions in pgstat_count_io_op_time() were worth a patch
> of their own.  This part has been applied as b998fedab74c, after a few
> tweaks of my own.
>
Sorry, I couldn't find the email in this thread linked to the commit
a051e71.

I think it left an oversight. A copy-and-paste thinko?

A trivial patch is attached.

best regards,
Ranier Vilela


fix-possible-copy-and-paste-thinko.patch
Description: Binary data


Re: [PoC] Reducing planning time when tables have many partitions

2025-04-05 Thread Yuya Watari
Hello Tom,

Thank you for your detailed review, and apologies for my late response.

On Tue, Mar 25, 2025 at 2:49 AM Tom Lane  wrote:
>
> One thing I don't love is putting the children into RelOptInfos.
> That seems like an unrelated data structure.  Have you thought
> about instead having, in each EC that needs it, an array indexed
> by RTI of per-relation child-member lists?  I think this might
> net out as less storage because there typically aren't that many
> ECs in a query.  But the main thing is to not have so many
> interconnections between ECs and RelOptInfos.

Thank you for your suggestion. Storing EquivalenceMembers in
RelOptInfos indeed complicates the data structures involved. In the
next version, I will explore alternative approaches, including the one
you have suggested.

> Another thing I really don't like is the back-link from EMs to ECs:
>
> +   EquivalenceClass *em_ec;/* EquivalenceClass which has this 
> member */
>
> That makes the data structure circular, which will cause pprint to
> recurse infinitely.  (The fact that you hadn't noticed that makes
> me wonder how you debugged any of these data structure changes.)
> We could prevent the recursion with suitable annotation on this field,
> but I'd really rather not have the field in the first place.  Circular
> pointers are dangerous and best avoided.  Also, it's bloating a node
> type that you are concerned about supporting a lot of.  Another point
> is that I don't see any code to take care of updating these links
> during an EC merge.

I apologize for missing this critical point. It is clear that avoiding
circular dependencies would be preferable, so I will reconsider this
aspect of the design.

> * setup_eclass_member_iterator_with_children is a carpal-tunnel-inducing
> name.  Could we drop the "_with_children" part?  It doesn't seem to
> add much, since there's no variant for "without children".

Thank you for this suggestion. I will remove "_with_children" in the
next version.

> * The root parameter should be first; IMO there should be no
> exceptions to that within the planner.  Perhaps putting the target
> iterator parameter last would make it read more nicely.  Or you could
> rely on struct assignment:
>
> it = setup_eclass_member_iterator(root, ec, relids);

I agree with your point. I will adjust the parameter order in the next
version to match your suggestion.

> * Why did you define the iterator as possibly returning irrelevant
> members?  Doesn't that mean that every caller has to double-check?
> Wouldn't it make for less code and fewer bugs for the iterator to
> have that responsibility?  If there is a good reason to do it like
> that, the comments should explain why.

This design was chosen for performance reasons. If the iterator always
filtered out irrelevant members, it would need to repeatedly check
each element against "bms_is_subset". However, some callers require
stricter conditions, such as "bms_equals", resulting in redundant
checks. Therefore, the iterator intentionally returns some false
positives, leaving it to callers to perform additional checks for the
exact conditions they require. As you pointed out, I failed to clearly
document this, and I will fix this oversight in the next version.
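To make that contract explicit, here is a standalone analog (invented names,
with plain bitmasks standing in for Bitmapsets): the iterator applies only a
cheap subset-style filter and may yield false positives, and each caller
re-checks the exact condition it needs.

typedef struct Member
{
    unsigned    relids;         /* stand-in for em_relids */
} Member;

typedef struct MemberIterator
{
    const Member *members;
    int         nmembers;
    int         pos;
    unsigned    relids;         /* relids the caller asked about */
} MemberIterator;

static MemberIterator
setup_member_iterator(const Member *members, int nmembers, unsigned relids)
{
    MemberIterator it = {members, nmembers, 0, relids};

    return it;
}

static const Member *
member_iterator_next(MemberIterator *it)
{
    while (it->pos < it->nmembers)
    {
        const Member *m = &it->members[it->pos++];

        /* cheap "is a subset of the requested relids" filter; false
         * positives are fine for callers with stricter conditions */
        if ((m->relids & ~it->relids) == 0)
            return m;
    }
    return NULL;
}

/* A caller that needs exact equality re-checks it itself. */
static int
count_members_for(const Member *members, int n, unsigned relids)
{
    MemberIterator it = setup_member_iterator(members, n, relids);
    const Member *m;
    int         count = 0;

    while ((m = member_iterator_next(&it)) != NULL)
    {
        if (m->relids == relids)    /* stricter than the iterator's filter */
            count++;
    }
    return count;
}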

> I don't really like the concept of 0004 at all.  Putting *all*
> the EC-related RelOptInfos into a root-stored list seems to be
> doubling down very hard on the assumption that no performance-critical
> operations will ever need to search that whole list.  Is there a good
> reason to do it like that, rather than say using the bitmap-index
> concept separately within each EC?  That might also alleviate the
> problem you're having with the bitmapsets getting too big.

Thank you for this suggestion. The patch series indeed has issues with
memory consumption. Your suggestion to manage bitmap indexes
separately within each EC seems worth exploring, and I will
investigate this approach further.

> Given that we've only got a week left, I see little hope of getting
> any of this into v18.

I agree that addressing these issues within the remaining time is
challenging. The design clearly needs reconsideration. Therefore, I
will postpone these changes and submit a fully revised version for
v19. Would this approach be acceptable to you?

-- 
Best regards,
Yuya Watari




Re: Modern SHA2- based password hashes for pgcrypto

2025-04-05 Thread Alvaro Herrera
Hello,

I have pushed this now, hoping it won't explode.

Thanks!

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No es bueno caminar con un hombre muerto"




Re: New criteria for autovacuum

2025-04-05 Thread Konstantin Knizhnik



On 03/04/2025 6:29 pm, Aleksander Alekseev wrote:

I have mixed feelings about addressing this. Although this behavior is
somewhat counterintuitive, if the user has a read-only lookup table
he/she can always execute VACUUM manually. In order to relieve him of
this unbearable burden we are going to need to introduce some overhead
that will affect all the users (not to mention people maintaining the
code). This would be convenient for sure but I'm not entirely sure if
it's worth it.


The overhead seems to be minimal (an update of one statistics field when an 
index-only scan has to fetch from the heap), and information about the 
visibility checks performed by index-only scans seems useful in any case, 
even if it is not used by autovacuum.


So I am not sure how frequent this scenario is (where an index-only scan has 
to perform extra heap checks, or is simply not chosen because of 
visibility-map examination), but if it can be prevented without much effort 
or overhead, then why not implement it?






Re: RFC: Additional Directory for Extensions

2025-04-05 Thread Matheus Alcantara
On Wed, Mar 19, 2025 at 3:56 PM Tom Lane  wrote:
>
> Peter Eisentraut  writes:
> > Committed that, thanks.
>
> Buildfarm member snakefly doesn't like this too much.  Since no other
> animals have failed, I guess it must be about local conditions on
> that machine, but the report is pretty opaque:
>
> # +++ tap check in src/test/modules/test_extensions +++
>
> #   Failed test '$system extension is installed correctly on 
> pg_available_extensions'
> #   at t/001_extension_control_path.pl line 69.
> #  got: 'f'
> # expected: 't'
>
> #   Failed test '$system extension is installed correctly on 
> pg_available_extensions with empty extension_control_path'
> #   at t/001_extension_control_path.pl line 76.
> #  got: 'f'
> # expected: 't'
> # Looks like you failed 2 tests of 5.
> [06:43:53] t/001_extension_control_path.pl ..
> Dubious, test returned 2 (wstat 512, 0x200)
> Failed 2/5 subtests
>
> Looking at the test, it presupposes that "amcheck" must be an
> available extension.  I do not see anything that guarantees
> that that's so, though.  It'd fail if contrib hasn't been
> installed.  Is there a reason to use "amcheck" rather than
> something more certainly available, like "plpgsql"?

There is no specific reason to use "amcheck" instead of "plpgsql". A patch
with this change is attached; sorry about that.

(I'm not sure whether we should also improve the message to make the test
failure less opaque?)

-- 
Matheus Alcantara


v1-0001-Fix-extension-control-path-tests.patch
Description: Binary data


Re: vacuum_truncate configuration parameter and isset_offset

2025-04-05 Thread Nikolay Shaplov
In a message of Wednesday, 26 March 2025, 17:42:23 MSK, Robert Haas wrote:

> 1. We're talking about a minor deviation resulting in a very small
> amount of additional code. It's entirely unclear to me why anyone
> thinks this is a big deal either way, even if one accepts that the
> patch is "wrong", which I don't.
This code has been an important part of my life for 8 years. There are not 
many commits, but I do code rebases all the time, and I port the features 
that have been added over all these years. I do care about it; it is a big 
deal for me.

-- 
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su


signature.asc
Description: This is a digitally signed message part.


Re: Separate GUC for replication origins

2025-04-05 Thread Amit Kapila
On Wed, Mar 19, 2025 at 10:43 AM Masahiko Sawada  wrote:
>
> On Mon, Mar 17, 2025 at 6:05 PM Euler Taveira  wrote:
> >
> > On Mon, Mar 17, 2025, at 8:44 PM, Masahiko Sawada wrote:
> >
> > I would suggest putting the new max_active_replication_origins after
> > max_parallel_apply_workers_per_subscription as both
> > max_sync_workers_per_subscription and
> > max_parallel_apply_workers_per_subscription are related to
> > max_logical_replication_workers.
> >
> >
> > Good point. Looking at the documentation, the old max_replication_slots
> > parameter was the first one in that section so I decided to use the same 
> > order
> > for the postgresql.conf.sample.
>
> Thank you for updating the patch!
>

*
   
Logical replication requires several configuration options to be set. Most
-   options are relevant only on one side of the replication. However,
-   max_replication_slots is used on both the publisher and
-   the subscriber, but it has a different meaning for each.
+   options are relevant only on one side of the replication.
   

In this para, after removing the content about max_replication_slots,
the remaining line "Most options are relevant only on one side of the
replication." doesn't make sense, because there is no other option that
applies to both sides; and if there is one, then we should mention it.

> The patch looks good to me.
>

Other than the above, the patch looks good to me as well.

-- 
With Regards,
Amit Kapila.



