Impact of checkpointer during pg_upgrade
During pg_upgrade, we start the server for the old cluster, which allows the checkpointer to remove WAL files. It has been noticed that we generate certain types of WAL records (e.g., XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT) for the old cluster even during pg_upgrade, so these additional WAL records could let the checkpointer decide that certain WAL segments can be removed (e.g., if the WAL size crosses max_slot_wal_keep_size) and invalidate the slots.

Currently, I can't see any problem with this, but for future work where we want to migrate logical slots during an upgrade [1], we need to decide what to do in such cases. The initial idea we had was that if the old cluster has some invalid slots, we won't allow an upgrade unless the user removes such slots or uses some option like --exclude-slots. But it is quite possible that slots get invalidated during pg_upgrade itself, with no user activity involved. Even though the possibility of that is small, I think it is worth considering what the behavior should be.

The other possibilities, apart from not allowing an upgrade in such a case, could be: (a) before starting the old cluster, fetch the slots directly from disk using some tool like [2] and make the decisions based on that state; (b) during the upgrade, don't allow WAL to be removed if it can invalidate slots; (c) copy/migrate the invalid slots as well, but for that we need to expose an API to invalidate the slots; (d) somehow distinguish the slots that are invalidated during an upgrade and simply copy such slots, because we ensure anyway that all the WAL required by a slot is sent before shutdown.

Thoughts?

[1] - https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/flat/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
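For reference, the invalidation scenario described above -- the WAL distance from a slot's restart_lsn exceeding the configured keep size -- amounts to a simple distance check. The sketch below is a hedged illustration only: the function and parameter names are hypothetical and this is not the server's actual KeepLogSeg()/InvalidateObsoleteReplicationSlots() logic.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper: returns true if the WAL a slot still needs has
 * fallen further behind the current insert position than the configured
 * keep size, i.e. the checkpointer would be allowed to remove segments
 * the slot depends on and mark the slot invalid.
 */
static bool
slot_would_be_invalidated(XLogRecPtr insert_lsn,
                          XLogRecPtr slot_restart_lsn,
                          uint64_t max_slot_wal_keep_bytes)
{
    if (insert_lsn <= slot_restart_lsn)
        return false;           /* slot is fully caught up */
    return (insert_lsn - slot_restart_lsn) > max_slot_wal_keep_bytes;
}
```

Even the small number of records generated during a binary upgrade (XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT, and so on) advances insert_lsn, so a slot that was healthy at shutdown can cross this threshold mid-upgrade with no user activity at all.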
Re: Extract numeric filed in JSONB more effectively
I think the last patch failed. I am not 100% sure. https://cirrus-ci.com/task/5464366154252288 says "Created 21 hours ago", so I assume it is the latest patch. The diff is in the Artifacts section; you can go to testrun/build/testrun/regress/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/jsonb.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/jsonb.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/jsonb.out	2023-09-01 03:34:43.585036700 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/jsonb.out	2023-09-01 03:39:05.800452844 +
@@ -528,7 +528,7 @@
 (3 rows)
 SELECT (test_json -> 'field1')::int4 FROM test_jsonb WHERE json_type = 'object';
-ERROR:  cannot cast jsonb string to type integer
+ERROR:  unknown jsonb type: 1125096840
 SELECT (test_json -> 'field1')::bool FROM test_jsonb WHERE json_type = 'object';
 ERROR:  cannot cast jsonb string to type boolean
 \pset null ''
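The replacement error message in the diff -- "unknown jsonb type:" followed by a huge, garbage-looking number -- is the classic symptom of reading the type tag of a JsonbValue that was never filled in (or was stale). A minimal sketch of the kind of type dispatch that produces it; the enum and function here are illustrative stand-ins, not the server's actual jsonb code:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative subset of jsonb value kinds; not the server's real enum. */
typedef enum
{
    jbvNull,
    jbvString,
    jbvNumeric,
    jbvBool
} JsonbValueType;

/*
 * Dispatch on the type tag.  If the value was never populated (or was
 * freed and overwritten), the tag is whatever bytes happened to be there,
 * so the default branch fires with a nonsense number like 1125096840.
 */
static const char *
jsonb_type_name(int type)
{
    switch (type)
    {
        case jbvNull:    return "null";
        case jbvString:  return "string";
        case jbvNumeric: return "numeric";
        case jbvBool:    return "boolean";
        default:
            fprintf(stderr, "unknown jsonb type: %d\n", type);
            return NULL;
    }
}
```

One plausible reading of the regression failure, then, is that the patch's new cast path hands an unpopulated value to such a switch, rather than a bug in the switch itself.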
Re: Eager page freeze criteria clarification
On Fri, Sep 1, 2023 at 12:34 PM Robert Haas wrote:
> If the table is being vacuumed frequently, then the last-vacuum-LSN
> will be newer, which means we'll freeze *more* aggressively.

Sort of like how, if you set autovacuum_vacuum_scale_factor low enough, with standard pgbench and heap fillfactor tuned, autovacuum will never think that the table doesn't need to be vacuumed. It will continually see enough dead heap-only tuples to get another autovacuum each time, though there won't be any LP_DEAD items at any point -- regardless of when VACUUM actually runs.

When I reported this a couple of years ago, I noticed that autovacuum would spin whenever I set autovacuum_vacuum_scale_factor to 0.02, but autovacuum would *never* run (outside of antiwraparound autovacuums) when it was set just a little higher (perhaps 0.03 or 0.04). So there was some inflection point at which its behavior totally changed.

> And I'm not sure why we want to do that. If the table is being
> vacuumed a lot, it's probably also being modified a lot, which
> suggests that we ought to be more cautious about freezing, rather
> than the reverse.

Why wouldn't it be both things at the same time, for the same table?

Why not also avoid setting pages all-visible? The WAL records aren't too much smaller than most freeze records these days -- 64 bytes on most systems. I realize that the rules for FPIs are a bit different when page-level checksums aren't enabled, but fundamentally it's the same situation. No?

-- 
Peter Geoghegan
Re: Why doesn't Vacuum FULL update the VM
On Fri, Sep 1, 2023 at 12:34 PM Melanie Plageman wrote:
> I don't see why the visibility map shouldn't be updated so that all of
> the pages show all visible and all frozen for this relation after the
> vacuum full.

There was a similar issue with COPY FREEZE. It was fixed relatively recently -- see commit 7db0cd21.

-- 
Peter Geoghegan
Re: SQL:2011 application time
On 9/1/23 21:56, Paul Jungwirth wrote: On 9/1/23 03:50, Vik Fearing wrote: On 9/1/23 11:30, Peter Eisentraut wrote: 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT OVERLAPS clause attach to the last column, or to the whole column list? In the SQL standard, you can only have one period and it has to be listed last, so this question does not arise. But here we are building a more general facility to then build the SQL facility on top of. So I think it doesn't make sense that the range column must be last or that there can only be one. Also, your implementation requires at least one non-overlaps column, which also seems like a confusing restriction. I think the WITHOUT OVERLAPS clause should be per-column, so that something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would be possible. Then the WITHOUT OVERLAPS clause would directly correspond to the choice between equality or overlaps operator per column. An alternative interpretation would be that WITHOUT OVERLAPS applies to the whole column list, and we would take it to mean, for any range column, use the overlaps operator, for any non-range column, use the equals operator. But I think this would be confusing and would prevent the case of using the equality operator for some ranges and the overlaps operator for some other ranges in the same key. I prefer the first option. That is: WITHOUT OVERLAPS applies only to the column or expression it is attached to, and need not be last in line. I agree. The second option seems confusing and is more restrictive. I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any position) is a great recommendation that enables a lot of new functionality. Several books[1,2] about temporal databases describe a multi-dimensional temporal space (even beyond application time vs. system time), and the standard is pretty disappointing here. It's not a weird idea. But I just want to be explicit that this isn't something the standard describes. 
(I think everyone in the conversation so far understands that.) So far I've tried to be pretty scrupulous about following SQL:2011, although personally I'd rather see Postgres support this functionality. And it's not like it goes *against* what the standard says. But if there are any objections, I'd love to hear them before putting in the work. :-) I have no problem with a first version doing exactly what the standard says and expanding it later. If we allow multiple+anywhere WITHOUT OVERLAPS in PRIMARY KEY & UNIQUE constraints, then surely we also allow multiple+anywhere PERIOD in FOREIGN KEY constraints too. (I guess the standard switched keywords because a FK is more like "MUST OVERLAPS". :-) Seems reasonable. Also if you have multiple application-time dimensions we probably need to allow multiple FOR PORTION OF clauses. I think the syntax would be: UPDATE t FOR PORTION OF valid_at FROM ... TO ... FOR PORTION OF asserted_at FROM ... TO ... [...] SET foo = bar Does that sound okay? That sounds really cool. [1] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL. 2nd edition, 2014. [2] Tom Johnston. Bitemporal Data: Theory and Practice. 2014. Thanks! I have ordered these books. -- Vik Fearing
Re: Why doesn't Vacuum FULL update the VM
On 9/1/23 21:34, Melanie Plageman wrote:
> Hi,
>
> I noticed that VACUUM FULL actually does freeze the tuples in the
> rewritten table (heap_freeze_tuple()) but then it doesn't mark them all
> visible or all frozen in the visibility map. I don't understand why. It
> seems like it would save us future work.

I have often wondered this as well, but obviously I haven't done anything about it.

> I don't see why the visibility map shouldn't be updated so that all of
> the pages show all visible and all frozen for this relation after the
> vacuum full.

It cannot just blindly mark everything all visible and all frozen, because it will copy over dead tuples that concurrent transactions are still allowed to see.

-- 
Vik Fearing
Re: sandboxing untrusted code
On Fri, 2023-09-01 at 09:12 -0400, Robert Haas wrote:
> Close but not quite. As you say, #2 does exercise privileges. Also,
> even if no privileges are exercised, you could still refer to
> CURRENT_ROLE, and I think you could also call a function like
> has_table_privilege. Your identity hasn't changed, but you're
> restricted from exercising some of your privileges. Really, you still
> have them, but they're just not available to you in that situation.

Which privileges are available in a sandboxed environment, exactly? Is it kind of like masking away all privileges except EXECUTE, or are other privileges available, like SELECT?

And the distinction that you are drawing between having the privileges but them (mostly) not being available, versus not having the privileges at all, is fairly subtle. Some examples showing why that distinction is important would be helpful.

> > Although your proposal sounds like a good security backstop, it feels
> > like it's missing the point that there are different _kinds_ of
> > functions. We already have the IMMUTABLE marker and we already have
> > runtime checks to make sure that immutable functions can't CREATE
> > TABLE; why not build on that mechanism or create new markers?
...
> Here, however, we can't trust the owners
> of functions to label those functions accurately.

Of course, but observe:

=# CREATE FUNCTION f(i INT) RETURNS INT IMMUTABLE LANGUAGE plpgsql AS $$
  BEGIN
    CREATE TABLE x(t TEXT);
    RETURN 42 + i;
  END;
$$;
=# SELECT f(2);
ERROR:  CREATE TABLE is not allowed in a non-volatile function
CONTEXT:  SQL statement "CREATE TABLE x(t TEXT)"
PL/pgSQL function f(integer) line 3 at SQL statement

The function f() is called at the top level, not as part of any index expression or other special context. But it fails to CREATE TABLE simply because that's not an allowed thing for an IMMUTABLE function to do. That tells me right away that my function isn't going to work, and I can rewrite it rather than waiting for some other user to say that it failed when run in a sandbox.

> It won't do for
> Alice to create a function and then apply the NICE_AND_SAFE marker to
> it.

You can if you always execute NICE_AND_SAFE functions in a sandbox. The difference is that it's always executed in a sandbox, rather than sometimes, so it will fail consistently.

> Now, in the case of a C function, things are a bit different. We can't
> inspect the generated machine code and know what the function does,
> because of that pesky halting problem. We could handle that either
> through function labeling, since only superusers can create C
> functions, or by putting checks directly in the C code. I was somewhat
> inclined toward the latter approach, but I'm not completely sure yet
> what makes sense. Thinking about your comments here made me realize
> that there are other procedural languages to worry about, too, like
> PL/python or PL/perl or PL/sh. Whatever we do for the C functions will
> have to be extended to those cases somehow as well. If we label
> functions, then we'll have to allow superusers only to label functions
> in these languages as well and make the default label "this is
> unsafe." If we put checks in the C code then I guess any given PL
> needs to certify that it knows about sandboxing or have all of its
> functions treated as unsafe. I think doing this at the C level would
> be better, strictly speaking, because it's more granular. Imagine a
> function that only conditionally does some prohibited action - it can
> be allowed to work in the cases where it does not attempt the
> prohibited operation, and blocked when it does. Labeling is
> all-or-nothing.

Here I'm getting a little lost in what you mean by "prohibited operation". Most languages mostly use SPI, and whatever sandboxing checks you do should work there, too. Are you talking about completely separate side effects like writing files or opening sockets?

Regards,
Jeff Davis
Re: Inefficiency in parallel pg_restore with many tables
On Fri, Sep 01, 2023 at 04:00:44PM -0400, Robert Haas wrote:
> In hindsight, I think that making binaryheap depend on Datum was a bad
> idea. I think that was my idea, and I think it wasn't very smart.
> Considering that people have coded to that decision up until now, it
> might not be too easy to change at this point. But in principle I
> guess you'd want to be able to make a heap out of any C data type,
> rather than just Datum, or just Datum in the backend and just void *
> in the frontend.

Yeah, something similar to simplehash for binary heaps could be nice. That being said, I don't know if there's a strong reason to specialize the implementation for a given C data type in most cases. I suspect many callers are just fine with dealing with pointers (e.g., I wouldn't store an entire TocEntry in the array), and smaller types like integers are already stored directly in the array thanks to the use of Datum. However, it _would_ allow us to abandon this frontend/backend void */Datum kludge, which is something.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
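A pointer-based interface along the lines discussed here could look roughly like the following: a minimal, type-agnostic binary min-heap over void * with a caller-supplied comparator. The names and layout are invented for illustration; this is not the binaryheap API or the proposed patch.

```c
#include <stdlib.h>

/* Caller-supplied comparator, strcmp-style: <0, 0, >0. */
typedef int (*heap_cmp) (const void *a, const void *b);

typedef struct
{
    void      **nodes;
    int         size;
    int         capacity;
    heap_cmp    cmp;
} vheap;

static vheap *
vheap_create(int capacity, heap_cmp cmp)
{
    vheap  *h = malloc(sizeof(vheap));

    h->nodes = malloc(sizeof(void *) * capacity);
    h->size = 0;
    h->capacity = capacity;
    h->cmp = cmp;
    return h;
}

/* Append at the end, then sift up until the heap property holds. */
static void
vheap_add(vheap *h, void *v)
{
    int         i = h->size++;

    h->nodes[i] = v;
    while (i > 0)
    {
        int         parent = (i - 1) / 2;

        if (h->cmp(h->nodes[i], h->nodes[parent]) >= 0)
            break;
        void       *tmp = h->nodes[i];

        h->nodes[i] = h->nodes[parent];
        h->nodes[parent] = tmp;
        i = parent;
    }
}

/* Pop the minimum, move the last element to the root, sift down. */
static void *
vheap_remove_first(vheap *h)
{
    void       *top = h->nodes[0];

    h->nodes[0] = h->nodes[--h->size];
    for (int i = 0;;)
    {
        int         l = 2 * i + 1;
        int         r = l + 1;
        int         best = i;

        if (l < h->size && h->cmp(h->nodes[l], h->nodes[best]) < 0)
            best = l;
        if (r < h->size && h->cmp(h->nodes[r], h->nodes[best]) < 0)
            best = r;
        if (best == i)
            break;
        void       *tmp = h->nodes[i];

        h->nodes[i] = h->nodes[best];
        h->nodes[best] = tmp;
        i = best;
    }
    return top;
}

/* Example comparator for int elements stored by pointer. */
static int
int_cmp(const void *a, const void *b)
{
    return *(const int *) a - *(const int *) b;
}
```

A caller such as pg_restore would push TocEntry pointers and pop the highest-priority one; integers small enough to fit in a pointer could still be stored directly, which is essentially what the Datum-based version does today.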
Re: Transaction timeout
On 12.01.23 20:46, Andrey Borodin wrote:
> On Sun, Dec 18, 2022 at 12:53:31PM -0800, Andrey Borodin wrote:
>> I've rewritten this part to correctly report all timeouts that did
>> happen. However, there's now a tricky comma-formatting code which was
>> tested only manually. I suspect this will make translation difficult.
>
> I use special functions for this, like _():
>
> char *lock_reason = lock_timeout_occurred ? _("lock timeout") : "";
>
> and then
>
> ereport(ERROR,
>         (errcode(err_code),
>          errmsg("canceling statement due to %s%s%s%s%s",
>                 lock_reason, comma1, stmt_reason, comma2, tx_reason)));
>
> I hope it will be translatable...

No, you can't do that. You have to write out all the strings separately.
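What "write out all the strings separately" means in practice: each complete message is its own translatable string, so translators never see a sentence spliced together from fragments and commas. A hedged sketch -- the single-timeout messages mirror PostgreSQL's existing wording, but the combined messages and the helper function are invented for illustration, and _() here is just a stand-in for gettext():

```c
#include <string.h>

/* Stand-in for gettext(); in the server this marks strings for translation. */
#define _(x) (x)

/*
 * Hypothetical helper: one full, separately-translatable message per
 * combination of timeouts, instead of assembling "lock timeout", commas,
 * and friends into a single format string at runtime.
 */
static const char *
timeout_message(int lock_timeout, int stmt_timeout, int tx_timeout)
{
    if (lock_timeout && stmt_timeout)
        return _("canceling statement due to lock and statement timeout");
    if (lock_timeout && tx_timeout)
        return _("canceling statement due to lock and transaction timeout");
    if (lock_timeout)
        return _("canceling statement due to lock timeout");
    if (stmt_timeout)
        return _("canceling statement due to statement timeout");
    if (tx_timeout)
        return _("canceling statement due to transaction timeout");
    return _("canceling statement due to user request");
}
```

The cost is some repetition in the source, but every message is a complete phrase that can be translated naturally in languages where word order or conjunctions differ from English.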
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
On Fri, Sep 1, 2023 at 6:13 AM Alexander Lakhin wrote:
> (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the
> issue for us.)

Maybe it'd be worth trying something stronger, like pg_memory_barrier(). A compiler barrier doesn't prevent the CPU from reordering loads and stores as it goes, and ARM64 has weak memory ordering.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
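The distinction drawn here can be sketched with C11 atomics: a compiler barrier only stops the compiler from reordering the two stores, while a full fence (the analogue of pg_memory_barrier()) also stops a weakly ordered CPU such as ARM64 from publishing them out of order. A simplified sketch of the pattern in question, not the actual latch/pgstat code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-ins for the two flags involved in the reported lockup. */
static volatile bool waiting = false;
static volatile int wait_event_info = 0;

static void
start_wait(int info)
{
    waiting = true;

    /*
     * pg_compiler_barrier() is roughly asm volatile("" ::: "memory"): it
     * constrains only the compiler, so on ARM64 the CPU could still make
     * the wait_event_info store visible before the waiting store.  A full
     * fence, like pg_memory_barrier(), orders both stores for other CPUs
     * as well.
     */
    atomic_thread_fence(memory_order_seq_cst);

    wait_event_info = info;
}
```

A concurrent observer that sees the second store is then guaranteed to also see the first, which is the property the compiler-only barrier cannot provide on its own.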
Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)
On Fri, Sep 1, 2023 at 11:47 AM Ranier Vilela wrote:
> If a null locale is reached in these paths,
> elog will dereference a null pointer.

True. That's sloppy coding. I don't know enough about this code to be sure whether the error messages that you propose are for the best.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Fri, Sep 1, 2023 at 4:04 PM Jeff Davis wrote:
> On Thu, 2023-08-31 at 17:17 -0400, Joe Conway wrote:
> > Maybe move postgres_fdw to be a first class built in feature instead
> > of an extension?
>
> That could make sense, but we still have to solve the problem of how to
> present a built-in FDW.
>
> FDWs don't have a schema, so it can't be inside pg_catalog. So we'd
> need some special logic somewhere to make pg_dump and psql \dew work as
> expected, and I'm not quite sure what to do there.

I'm worried that an approach based on postgres_fdw would have security problems. I think that we don't want postgres_fdw installed in every PostgreSQL cluster for security reasons. And I think that the set of people who should be permitted to manage connection strings for logical replication subscriptions could be different from the set of people who are entitled to use postgres_fdw.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: Inefficiency in parallel pg_restore with many tables
On Tue, Jul 25, 2023 at 2:53 PM Nathan Bossart wrote:
> On Mon, Jul 24, 2023 at 12:00:15PM -0700, Nathan Bossart wrote:
> > Here is a sketch of this approach. It required fewer #ifdefs than I was
> > expecting. At the moment, this one seems like the winner to me.
>
> Here is a polished patch set for this approach. I've also added a 0004
> that replaces the open-coded heap in pg_dump_sort.c with a binaryheap.
> IMHO these patches are in decent shape.

[ drive-by comment that hopefully doesn't cause too much pain ]

In hindsight, I think that making binaryheap depend on Datum was a bad idea. I think that was my idea, and I think it wasn't very smart. Considering that people have coded to that decision up until now, it might not be too easy to change at this point. But in principle I guess you'd want to be able to make a heap out of any C data type, rather than just Datum, or just Datum in the backend and just void * in the frontend.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: SQL:2011 application time
On 9/1/23 03:50, Vik Fearing wrote: On 9/1/23 11:30, Peter Eisentraut wrote: 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT OVERLAPS clause attach to the last column, or to the whole column list? In the SQL standard, you can only have one period and it has to be listed last, so this question does not arise. But here we are building a more general facility to then build the SQL facility on top of. So I think it doesn't make sense that the range column must be last or that there can only be one. Also, your implementation requires at least one non-overlaps column, which also seems like a confusing restriction. I think the WITHOUT OVERLAPS clause should be per-column, so that something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would be possible. Then the WITHOUT OVERLAPS clause would directly correspond to the choice between equality or overlaps operator per column. An alternative interpretation would be that WITHOUT OVERLAPS applies to the whole column list, and we would take it to mean, for any range column, use the overlaps operator, for any non-range column, use the equals operator. But I think this would be confusing and would prevent the case of using the equality operator for some ranges and the overlaps operator for some other ranges in the same key. I prefer the first option. That is: WITHOUT OVERLAPS applies only to the column or expression it is attached to, and need not be last in line. I agree. The second option seems confusing and is more restrictive. I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any position) is a great recommendation that enables a lot of new functionality. Several books[1,2] about temporal databases describe a multi-dimensional temporal space (even beyond application time vs. system time), and the standard is pretty disappointing here. It's not a weird idea. But I just want to be explicit that this isn't something the standard describes. 
(I think everyone in the conversation so far understands that.) So far I've tried to be pretty scrupulous about following SQL:2011, although personally I'd rather see Postgres support this functionality. And it's not like it goes *against* what the standard says. But if there are any objections, I'd love to hear them before putting in the work. :-) If we allow multiple+anywhere WITHOUT OVERLAPS in PRIMARY KEY & UNIQUE constraints, then surely we also allow multiple+anywhere PERIOD in FOREIGN KEY constraints too. (I guess the standard switched keywords because a FK is more like "MUST OVERLAPS". :-) Also if you have multiple application-time dimensions we probably need to allow multiple FOR PORTION OF clauses. I think the syntax would be: UPDATE t FOR PORTION OF valid_at FROM ... TO ... FOR PORTION OF asserted_at FROM ... TO ... [...] SET foo = bar Does that sound okay? I don't quite understand this part: >> Also, your implementation >> requires at least one non-overlaps column, which also seems like a >> confusing restriction. That's just a regular non-temporal constraint. Right? If I'm missing something let me know. [1] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL. 2nd edition, 2014. [2] Tom Johnston. Bitemporal Data: Theory and Practice. 2014. -- Paul ~{:-) p...@illuminatedcomputing.com
Re: Replace known_assigned_xids_lck by memory barrier
On Fri, Sep 1, 2023 at 3:41 PM Nathan Bossart wrote:
> On Wed, Aug 16, 2023 at 01:07:15PM -0700, Nathan Bossart wrote:
> > Ah, that explains it. v5 of the patch is attached.
>
> Barring additional feedback, I plan to commit this patch in the current
> commitfest.

I'm not an expert on this code but I looked at this patch briefly and it seems OK to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: Add 'worker_type' to pg_stat_subscription
On Wed, Aug 16, 2023 at 07:14:18PM +1000, Peter Smith wrote:
> Earlier this year I proposed a small change for the pg_stat_subscription view:
>
> --
> ...it would be very useful to have an additional "kind" attribute for
> this view. This will save the user from needing to do mental
> gymnastics every time just to recognise what kind of process they are
> looking at.
> --
>
> At that time Amit replied [1] that this could be posted as a separate
> enhancement thread.
>
> Now that the LogicalRepWorkerType has been recently pushed [2]
> (something with changes in the same area of the code) it seemed the
> right time to resurrect my pg_stat_subscription proposal.

This sounds generally reasonable to me.

+       worker_type text
+
+       Type of the subscription worker process.  Possible values are:
+
+       a: apply worker
+       p: parallel apply worker
+       t: tablesync worker

Is there any reason not to spell out the names? I think that would match the other system views better (e.g., backend_type in pg_stat_activity). Also, instead of "tablesync worker", I'd suggest using "synchronization worker" to match the name used elsewhere in this table.

I see that the table refers to "leader apply workers". Would those show up as parallel apply workers in the view? Can we add another worker type for those?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Why doesn't Vacuum FULL update the VM
Hi,

I noticed that VACUUM FULL actually does freeze the tuples in the rewritten table (heap_freeze_tuple()) but then it doesn't mark them all visible or all frozen in the visibility map. I don't understand why. It seems like it would save us future work.

Here is an example:

create extension pg_visibility;
drop table if exists foo;
create table foo(a int) with (autovacuum_enabled=false);
insert into foo select i%3 from generate_series(1,300)i;
update foo set a = 5 where a = 2;
select * from pg_visibility_map_summary('foo');
vacuum (verbose) foo;
select * from pg_visibility_map_summary('foo');
vacuum (full, verbose) foo;
select * from pg_visibility_map_summary('foo');

I don't see why the visibility map shouldn't be updated so that all of the pages show all visible and all frozen for this relation after the vacuum full.

- Melanie
Re: Eager page freeze criteria clarification
On Tue, Aug 29, 2023 at 10:22 AM Robert Haas wrote:
> Let's assume for a moment that the rate at which the insert LSN is
> advancing is roughly constant over time, so that it serves as a good
> proxy for wall-clock time. Consider four tables A, B, C, and D that
> are, respectively, vacuumed once per minute, once per hour, once per
> day, and once per week. With a 33% threshold, pages in table A will be
> frozen if they haven't been modified in 20 seconds, pages in table B
> will be frozen if they haven't been modified in 20 minutes, pages in
> table C will be frozen if they haven't been modified in 8 hours, and
> pages in table D will be frozen if they haven't been modified in 2
> days, 8 hours. My intuition is that this feels awfully aggressive for
> A and awfully passive for D.
>
> [ discussion of freeze-on-evict ]

Another way of thinking about this is: instead of replacing this heuristic with a complicated freeze-on-evict system, maybe we just need to adjust the heuristic. Maybe using an LSN is a good idea, but maybe the LSN of the last table vacuum isn't the right one to be using, or maybe it shouldn't be the only one that we use.

For instance, what about using the redo LSN of the last checkpoint, or the checkpoint before the last checkpoint, or something like that? Somebody might find that unprincipled, but it seems to me that the checkpoint cycle has a lot to do with whether or not opportunistic freezing makes sense. If a page is likely to be modified again before a checkpoint forces it to be written, then freezing it is likely a waste. If it's likely to be written out of shared_buffers before it's modified again, then freezing it now is a pretty good bet. A given page could be evicted from shared_buffers, and thus written, sooner than the next checkpoint, but if it's dirty now, it definitely won't be written any later than the next checkpoint. By looking at the LSN of a page that I'm about to modify just before I modify it, I can make some kind of a guess as to whether this is a page that is being modified more or less than once per checkpoint cycle, and adjust freezing behavior accordingly.

One could also use a hybrid of the two values, e.g. normally use the insert LSN of the last VACUUM, but if that's newer than the redo LSN of the last checkpoint, then use the latter instead, to avoid doing too much freezing of pages that may have been quiescent for only a few tens of seconds. I don't know if that's better or worse.

As I think about it, I realize that I don't really know why Andres suggested a last-vacuum-LSN-based heuristic in the first place. Earlier, I wrote that "One thing I really like about it is that if the table is being vacuumed frequently, then we freeze less aggressively, and if the table is being vacuumed infrequently, then we freeze more aggressively." But actually, I don't think it does that. If the table is being vacuumed frequently, then the last-vacuum-LSN will be newer, which means we'll freeze *more* aggressively. And I'm not sure why we want to do that. If the table is being vacuumed a lot, it's probably also being modified a lot, which suggests that we ought to be more cautious about freezing, rather than the reverse.

Just spitballing here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
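The hybrid described here -- the last vacuum's insert LSN as the reference point, clamped to the last checkpoint's redo LSN when the vacuum is very recent, with the thread's 33% threshold -- can be sketched as follows. All names and the exact arithmetic are illustrative, not server code:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Hedged sketch: decide whether to opportunistically freeze a page based
 * on how old its LSN is relative to a reference interval ending at the
 * current insert position.
 */
static bool
should_freeze_page(XLogRecPtr page_lsn,
                   XLogRecPtr insert_lsn,
                   XLogRecPtr last_vacuum_lsn,
                   XLogRecPtr checkpoint_redo_lsn)
{
    XLogRecPtr  reference = last_vacuum_lsn;

    /*
     * Clamp: if the last vacuum is newer than the last checkpoint's redo
     * pointer, use the redo pointer instead, so very frequent vacuums
     * don't shrink the interval (and thus sharpen freezing) arbitrarily.
     */
    if (reference > checkpoint_redo_lsn)
        reference = checkpoint_redo_lsn;

    if (insert_lsn <= reference)
        return false;

    /*
     * Freeze pages not modified within the most recent third of the
     * reference interval (the 33% threshold from the thread).
     */
    XLogRecPtr  cutoff = insert_lsn - (insert_lsn - reference) / 3;

    return page_lsn < cutoff;
}
```

With the clamp in place, a table vacuumed every few seconds no longer gets an arbitrarily recent reference point, so freezing stops becoming more aggressive as vacuums get more frequent -- which is exactly the behavior being questioned.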
Speaker Bureau
Greetings,

If you are on the speaker list, can you send an email to ugc...@postgresql.us indicating whether you are available to travel for meetups? This serves the obvious purpose, but also provides us with your email address.

Thanks,

Dave Cramer
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Hello Tomas,

01.09.2023 16:00, Tomas Vondra wrote:
> Hmmm, I'm not very good at reading the binary code, but here's what
> objdump produced for WaitEventSetWait. Maybe someone will see what the
> issue is.

At first glance, I can't see anything suspicious in the disassembly. IIUC, waiting = true is presented there as:

  805c38: b902ad18  str  w24, [x8, #684]  // pgstat_report_wait_start(): proc->wait_event_info = wait_event_info;
                                          // end of pgstat_report_wait_start(wait_event_info);
  805c3c: b0ffdb09  adrp x9, 0x366000
  805c40: b0ffdb0a  adrp x10, 0x366000
  805c44: feeb      adrp x11, 0x9e4000
  805c48: 52800028  mov  w8, #1           // true
  805c4c: 52800319  mov  w25, #24
  805c50: 5280073a  mov  w26, #57
  805c54: fd446128  ldr  d8, [x9, #2240]
  805c58: 9d7b      adrp x27, 0x9b1000
  805c5c: fd415949  ldr  d9, [x10, #688]
  805c60: f9071d68  str  x8, [x11, #3640] // waiting = true (x8 = w8)

So there are two simple movs and two load operations performed in parallel, but I don't think it's similar to what we had in that case.

I thought about just adding the barrier to the code, but then how would we know that this was the issue and that the barrier fixed it? It happens so rarely that we can't draw any conclusions from a couple of test runs. I could probably construct a reproducer for the lockup if I had access to such a machine for a day or two.

Best regards,
Alexander
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Thu, 2023-08-31 at 17:17 -0400, Joe Conway wrote:
> Maybe move postgres_fdw to be a first class built in feature instead
> of an extension?

That could make sense, but we still have to solve the problem of how to present a built-in FDW.

FDWs don't have a schema, so it can't be inside pg_catalog. So we'd need some special logic somewhere to make pg_dump and psql \dew work as expected, and I'm not quite sure what to do there.

Regards,
Jeff Davis
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Fri, 2023-09-01 at 12:28 +0530, Ashutosh Bapat wrote:
> Thinking larger, how about we allow any FDW to be used here.

That's a possibility, but I think it means the subscription would need to constantly re-check the parameters rather than relying on the FDW's validator. Otherwise it might be the wrong kind of FDW, and the user might be able to circumvent the password_required protection. It might not even be a postgres-related FDW at all, which would be a bit strange.

If it's constantly re-checking the parameters, that raises the possibility that some ALTER SERVER or ALTER USER MAPPING succeeds, but subscriptions to that foreign server then start failing, which would not be ideal. But I could be fine with that.

> But I think there's some value in bringing
> together these two subsystems which deal with foreign data logically
> (as in logical vs physical view of data).

I still don't understand how a core dependency on an extension would work.

Regards,
Jeff Davis
Re: Replace known_assigned_xids_lck by memory barrier
On Wed, Aug 16, 2023 at 01:07:15PM -0700, Nathan Bossart wrote:
> Ah, that explains it. v5 of the patch is attached.

Barring additional feedback, I plan to commit this patch in the current commitfest.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Fri, Sep 01, 2023 at 01:19:13PM -0500, Justin Pryzby wrote:
> What about (per git grep no-sync doc) pg_receivewal?

I don't think it's applicable there, either. IIUC that option specifies whether to sync the data as it is streamed over.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Fri, Sep 01, 2023 at 11:08:51AM -0700, Nathan Bossart wrote:
> > This should probably give a distinct error when syncfs is not supported
> > than when it is truly unrecognized.
>
> Later versions of the patch should have this.

Oops, right.

> > The patch should handle pg_dumpall, too.
>
> It looks like pg_dumpall only ever fsyncs a single file, so I don't think
> it is really needed there.

What about (per git grep no-sync doc) pg_receivewal?

-- 
Justin
Re: should frontend tools use syncfs() ?
Thanks for taking a look.

On Fri, Sep 01, 2023 at 12:58:10PM -0500, Justin Pryzby wrote:
>> +	if (!user_opts.sync_method)
>> +		user_opts.sync_method = pg_strdup("fsync");
>
> why pstrdup?

I believe I was just following the precedent set by some of the other options.

>> +parse_sync_method(const char *optarg, SyncMethod *sync_method)
>> +{
>> +	if (strcmp(optarg, "fsync") == 0)
>> +		*sync_method = SYNC_METHOD_FSYNC;
>> +#ifdef HAVE_SYNCFS
>> +	else if (strcmp(optarg, "syncfs") == 0)
>> +		*sync_method = SYNC_METHOD_SYNCFS;
>> +#endif
>> +	else
>> +	{
>> +		pg_log_error("unrecognized sync method: %s", optarg);
>> +		return false;
>> +	}
>
> This should probably give a distinct error when syncfs is not supported
> than when it is truly unrecognized.

Later versions of the patch should have this.

> The patch should handle pg_dumpall, too.

It looks like pg_dumpall only ever fsyncs a single file, so I don't think it is really needed there.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
> + if (!user_opts.sync_method) > + user_opts.sync_method = pg_strdup("fsync"); why pstrdup? > +parse_sync_method(const char *optarg, SyncMethod *sync_method) > +{ > + if (strcmp(optarg, "fsync") == 0) > + *sync_method = SYNC_METHOD_FSYNC; > +#ifdef HAVE_SYNCFS > + else if (strcmp(optarg, "syncfs") == 0) > + *sync_method = SYNC_METHOD_SYNCFS; > +#endif > + else > + { > + pg_log_error("unrecognized sync method: %s", optarg); > + return false; > + } This should probably give a distinct error when syncfs is not supported than when it's truly unrecognized. The patch should handle pg_dumpall, too. Note that /bin/sync doesn't try to de-duplicate, it does just what you tell it. $ strace -e openat,syncfs,fsync sync / / / -f ... openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3 syncfs(3) = 0 openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3 syncfs(3) = 0 openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3 syncfs(3) = 0 +++ exited with 0 +++
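The distinct-error behavior suggested in the review could look roughly like this — a standalone C sketch, not the committed code (logging goes through fprintf as a stand-in for pg_log_error, and the exact message wording is an assumption):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Sync methods, following the names in the quoted patch. */
typedef enum
{
	SYNC_METHOD_FSYNC,
	SYNC_METHOD_SYNCFS
} SyncMethod;

/*
 * Parse a sync method name.  "syncfs" on a build without HAVE_SYNCFS gets
 * its own error message, distinct from a truly unrecognized method.
 */
static bool
parse_sync_method(const char *arg, SyncMethod *sync_method)
{
	if (strcmp(arg, "fsync") == 0)
		*sync_method = SYNC_METHOD_FSYNC;
	else if (strcmp(arg, "syncfs") == 0)
	{
#ifdef HAVE_SYNCFS
		*sync_method = SYNC_METHOD_SYNCFS;
#else
		/* known method, but not supported on this platform */
		fprintf(stderr, "sync method \"%s\" is not supported on this platform\n", arg);
		return false;
#endif
	}
	else
	{
		fprintf(stderr, "unrecognized sync method: %s\n", arg);
		return false;
	}
	return true;
}
```

With this shape, a user typing "syncfs" on an unsupported platform learns the method exists but is unavailable, instead of being told it is unknown.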
Re: cataloguing NOT NULL constraints
On 2023-Aug-31, Alvaro Herrera wrote: > Hmm, that's some weird code I left there all right. Can you please try > this patch? (Not final; I'll review it more completely later, > particularly to add this test case.) The change in MergeAttributesIntoExisting turned out to be close but not quite there, so I pushed another version of the fix. In case you're wondering, the change in MergeConstraintsIntoExisting is a related but different case, for which I added the other test case you see there. I also noticed, while looking at this, that there's another problem when a child has a NO INHERIT not-null constraint and the parent has a primary key in the same column. It should refuse, or take over by marking it no longer NO INHERIT. But it just accepts silently and all appears to be good. The problems appear when you add a child to that child. I'll look into this later; it's not exactly the same code. At least it's not a crasher. -- Álvaro Herrera — Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: Adding a pg_get_owned_sequence function?
On Fri, Sep 01, 2023 at 01:31:43PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> I wonder if it'd be possible to just remove pg_get_serial_sequence(). > > A quick search at http://codesearch.debian.net/ finds uses of it > in packages like gdal, qgis, rails, ... We could maybe get rid of > it after a suitable deprecation period, but I think we can't just > summarily remove it. Given that, I'd still vote for marking it deprecated, but I don't feel strongly about actually removing it anytime soon (or anytime at all, really). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: GenBKI emits useless open;close for catalogs without rows
Alvaro Herrera writes: > On 2023-Sep-01, Matthias van de Meent wrote: >> A potential addition to the patch would be to stop manually closing >> relations: initdb and check-world succeed without manual 'close' >> operations because the 'open' command auto-closes the previous open >> relation (in boot_openrel). Testing also suggests that the last opened >> relation apparently doesn't need closing - check-world succeeds >> without issues (incl. with TAP enabled). That is therefore implemented >> in attached patch 2 - it removes the 'close' syntax in its entirety. > Hmm, what happens with the last relation in the bootstrap process? Is > closerel() called via some other path for that one? Taking a quick census of existing closerel() callers: there is cleanup() in bootstrap.c, but it's called uncomfortably late and outside any transaction, so I misdoubt that it works properly if asked to actually shoulder any responsibility. (A little code reshuffling could fix that.) There are also a couple of low-level elog warnings in CREATE that would likely get triggered, though I suppose we could just remove those elogs. I guess my reaction to this patch is "why bother?". It seems unlikely to yield any measurable benefit, though of course that guess could be wrong. regards, tom lane
Re: Inefficiency in parallel pg_restore with many tables
On Fri, Sep 01, 2023 at 01:41:41PM -0400, Tom Lane wrote: > I've not actually looked at any of these patchsets after the first one. > I have added myself as a reviewer and will hopefully get to it within > a week or so. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: GenBKI emits useless open;close for catalogs without rows
On Fri, 1 Sept 2023 at 19:43, Alvaro Herrera wrote: > > On 2023-Sep-01, Matthias van de Meent wrote: > > > A potential addition to the patch would be to stop manually closing > > relations: initdb and check-world succeed without manual 'close' > > operations because the 'open' command auto-closes the previous open > > relation (in boot_openrel). Testing also suggests that the last opened > > relation apparently doesn't need closing - check-world succeeds > > without issues (incl. with TAP enabled). That is therefore implemented > > in attached patch 2 - it removes the 'close' syntax in its entirety. > > Hmm, what happens with the last relation in the bootstrap process? Is > closerel() called via some other path for that one? There is a final cleanup() call that closes the last open boot_reldesc relation (if any) at the end of BootstrapModeMain, after boot_yyparse has completed and its changes have been committed. - Matthias
Re: GenBKI emits useless open;close for catalogs without rows
On 2023-Sep-01, Matthias van de Meent wrote: > A potential addition to the patch would be to stop manually closing > relations: initdb and check-world succeed without manual 'close' > operations because the 'open' command auto-closes the previous open > relation (in boot_openrel). Testing also suggests that the last opened > relation apparently doesn't need closing - check-world succeeds > without issues (incl. with TAP enabled). That is therefore implemented > in attached patch 2 - it removes the 'close' syntax in its entirety. Hmm, what happens with the last relation in the bootstrap process? Is closerel() called via some other path for that one? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Essentially, you're proposing Kevlar shoes as a solution for the problem that you want to walk around carrying a loaded gun aimed at your foot. (Tom Lane)
Re: Inefficiency in parallel pg_restore with many tables
Nathan Bossart writes: > I'm hoping to commit these patches at some point in the current commitfest. > I don't sense anything tremendously controversial, and they provide a > pretty nice speedup in some cases. Are there any remaining concerns? I've not actually looked at any of these patchsets after the first one. I have added myself as a reviewer and will hopefully get to it within a week or so. regards, tom lane
Re: Adding a pg_get_owned_sequence function?
Nathan Bossart writes: > I wonder if it'd be possible to just remove pg_get_serial_sequence(). A quick search at http://codesearch.debian.net/ finds uses of it in packages like gdal, qgis, rails, ... We could maybe get rid of it after a suitable deprecation period, but I think we can't just summarily remove it. regards, tom lane
GenBKI emits useless open;close for catalogs without rows
Hi, Whilst looking at PostgreSQL's bootstrapping process, I noticed that postgres.bki contains quite a few occurrences of the pattern "open $catname; close $catname". I suppose this pattern isn't too expensive, but according to my limited research a combined open+close cycle doesn't do anything meaningful, so it does waste some CPU cycles in the process. The attached patch 1 removes the occurrences of those combined open/close statements in postgres.bki. Locally it passes check-world, so I assume that opening and closing a table is indeed not required for initializing a data-less catalog during bootstrapping. A potential addition to the patch would be to stop manually closing relations: initdb and check-world succeed without manual 'close' operations because the 'open' command auto-closes the previous open relation (in boot_openrel). Testing also suggests that the last opened relation apparently doesn't need closing - check-world succeeds without issues (incl. with TAP enabled). That is therefore implemented in attached patch 2 - it removes the 'close' syntax in its entirety. Kind regards, Matthias van de Meent Neon (https://neon.tech) 0002-Remove-the-bki-close-command.patch Description: Binary data 0001-Stop-emitting-open-nodata-close-patterns-in-genbki.p.patch Description: Binary data
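The auto-close behavior that patch 2 relies on can be illustrated with a toy model — the names mirror bootstrap.c, but the structure here is an invented sketch for illustration, not the actual bootstrap code:

```c
#include <stdio.h>
#include <string.h>

static char boot_reldesc[64];	/* currently open relation, "" if none */
static char closed_log[256];	/* record of close operations, for clarity */

/* Close the currently open relation, if any. */
static void
closerel(void)
{
	if (boot_reldesc[0] == '\0')
		return;
	strcat(closed_log, boot_reldesc);
	strcat(closed_log, ";");
	boot_reldesc[0] = '\0';
}

/*
 * 'open' auto-closes whatever was open before, which is why an explicit
 * 'close' immediately before an 'open' is redundant in postgres.bki.
 */
static void
boot_openrel(const char *name)
{
	closerel();
	snprintf(boot_reldesc, sizeof(boot_reldesc), "%s", name);
}
```

Under this model, a script of back-to-back opens closes each previous relation implicitly; only the very last open relation still needs a final cleanup, which is the question raised downthread.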
Re: Inefficiency in parallel pg_restore with many tables
On Tue, Jul 25, 2023 at 11:53:36AM -0700, Nathan Bossart wrote: > Here is a polished patch set for this approach. I've also added a 0004 > that replaces the open-coded heap in pg_dump_sort.c with a binaryheap. > IMHO these patches are in decent shape. I'm hoping to commit these patches at some point in the current commitfest. I don't sense anything tremendously controversial, and they provide a pretty nice speedup in some cases. Are there any remaining concerns? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
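For readers unfamiliar with the binaryheap structure that the 0004 patch switches pg_dump_sort.c to, a minimal array-backed min-heap works like this — a simplified sketch with fixed capacity and int keys (PostgreSQL's src/common/binaryheap.c is generic over a comparator and grows dynamically):

```c
#define HEAP_CAP 64

typedef struct
{
	int		items[HEAP_CAP];
	int		size;
} MinHeap;

static void
heap_swap(MinHeap *h, int a, int b)
{
	int		tmp = h->items[a];

	h->items[a] = h->items[b];
	h->items[b] = tmp;
}

/* Add a value, then bubble it up until its parent is no larger. */
static void
heap_push(MinHeap *h, int v)
{
	int		i = h->size++;

	h->items[i] = v;
	while (i > 0 && h->items[(i - 1) / 2] > h->items[i])
	{
		heap_swap(h, (i - 1) / 2, i);
		i = (i - 1) / 2;
	}
}

/* Remove and return the smallest value; sift the last element down. */
static int
heap_pop(MinHeap *h)
{
	int		top = h->items[0];
	int		i = 0;

	h->items[0] = h->items[--h->size];
	for (;;)
	{
		int		l = 2 * i + 1,
				r = 2 * i + 2,
				s = i;

		if (l < h->size && h->items[l] < h->items[s])
			s = l;
		if (r < h->size && h->items[r] < h->items[s])
			s = r;
		if (s == i)
			break;
		heap_swap(h, i, s);
		i = s;
	}
	return top;
}
```

Push and pop are both O(log n), which is what makes a heap attractive for repeatedly picking the next-best ready item during parallel restore scheduling.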
Re: Adding a pg_get_owned_sequence function?
On Fri, Jun 09, 2023 at 08:19:44PM +0100, Dagfinn Ilmari Mannsåker wrote: > I've always been annoyed by the fact that pg_get_serial_sequence takes > the table and returns the sequence as strings rather than regclass. And > since identity columns were added, the name is misleading as well (which > is even acknowledged in the docs, together with a suggestion for a > better name). > > So, instead of making excuses in the documentation, I thought why not > add a new function which addresses all of these issues, and document the > old one as a backward-compatibilty wrapper? This sounds generally reasonable to me. That note has been there since 2006 (2b2a507). I didn't find any further discussion about this on the lists. > +A backwards-compatibility wrapper > +for pg_get_owned_sequence, which > +uses text for the table and sequence names instead of > +regclass. The first parameter is a table name with > optional I wonder if it'd be possible to just remove pg_get_serial_sequence(). Assuming 2b2a507 removed the last use of it in pg_dump, any dump files created on versions >= v8.2 shouldn't use it. But I suppose it wouldn't be too much trouble to keep it around for anyone who happens to need it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Is there a complete doc to describe pg's transaction implementation in detail?
Casual Meson fixups
The first two patches in the series are re-proposals that had previously been approved[0] by Andres, but fell through the cracks. The only patch that _could_ be controversial is probably the last one, but from my understanding it would match up with the autotools build. One thing that I did notice while testing this patch is that Muon doesn't build postgres without coercing the build a bit. I had to disable nls and plpython. The nls issue could be fixed with a bump to Meson 0.59, which introduces import(required:). nls isn't supported in Muon unfortunately at the moment. The plpython issue is that it doesn't understand pymod.find_installation(required:), which is a bug in Muon. Muon development has slowed quite a bit this year. Postgres is probably the largest project which tries its best to support Muon. It seems like if we want to keep supporting Muon, we should get a buildfarm machine to use it instead of Meson to catch regressions. OR we should contemplate removing support for it. Alternatively someone (me?) could step up and provide some patches to Muon to make the postgres experience better. But I wonder if any Postgres user even uses Muon to build it. -- Tristan Partin Neon (https://neon.tech) From 462ea170b1410d7916a25ffa6d27dc45284db7e3 Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Wed, 17 May 2023 10:36:52 -0500 Subject: [PATCH v1 2/7] Add Meson override for libpq Meson has the ability to do transparent overrides when projects are used as subprojects. For instance, say I am building a Postgres extension. 
I can define Postgres to be a subproject of my extension given the following wrap file: [wrap-git] url = https://git.postgresql.org/git/postgresql.git revision = master depth = 1 [provide] dependency_names = libpq Then in my extension (root project), I can have the following line snippet: libpq = dependency('libpq') This will tell Meson to transparently compile libpq prior to it compiling my extension (because I depend on libpq) if libpq isn't found on the host system. --- src/interfaces/libpq/meson.build | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/interfaces/libpq/meson.build b/src/interfaces/libpq/meson.build index 80e6a15adf..6d18970e81 100644 --- a/src/interfaces/libpq/meson.build +++ b/src/interfaces/libpq/meson.build @@ -84,6 +84,8 @@ libpq = declare_dependency( include_directories: [include_directories('.')] ) +meson.override_dependency('libpq', libpq) + pkgconfig.generate( name: 'libpq', description: 'PostgreSQL libpq library', -- Tristan Partin Neon (https://neon.tech) From 5455426c9944ff8c8694db46929eaa37e03d907f Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Fri, 1 Sep 2023 11:07:40 -0500 Subject: [PATCH v1 7/7] Disable building contrib targets by default This matches the autotools build. --- contrib/meson.build | 4 +++- contrib/oid2name/meson.build | 2 +- contrib/vacuumlo/meson.build | 2 +- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/contrib/meson.build b/contrib/meson.build index 618b1c50c8..06c579def5 100644 --- a/contrib/meson.build +++ b/contrib/meson.build @@ -1,6 +1,8 @@ # Copyright (c) 2022-2023, PostgreSQL Global Development Group -contrib_mod_args = pg_mod_args +# These will get built during test runs or if the contrib target is ran. 
+contrib_bin_args = default_bin_args + { 'build_by_default': false } +contrib_mod_args = pg_mod_args + { 'build_by_default': false } contrib_data_dir = dir_data_extension contrib_data_args = { diff --git a/contrib/oid2name/meson.build b/contrib/oid2name/meson.build index 171d2d226b..904c94a0e1 100644 --- a/contrib/oid2name/meson.build +++ b/contrib/oid2name/meson.build @@ -13,7 +13,7 @@ endif oid2name = executable('oid2name', oid2name_sources, dependencies: [frontend_code, libpq], - kwargs: default_bin_args, + kwargs: contrib_bin_args, ) contrib_targets += oid2name diff --git a/contrib/vacuumlo/meson.build b/contrib/vacuumlo/meson.build index 9fa7380590..a059dda65a 100644 --- a/contrib/vacuumlo/meson.build +++ b/contrib/vacuumlo/meson.build @@ -13,7 +13,7 @@ endif vacuumlo = executable('vacuumlo', vacuumlo_sources, dependencies: [frontend_code, libpq], - kwargs: default_bin_args, + kwargs: contrib_bin_args, ) contrib_targets += vacuumlo -- Tristan Partin Neon (https://neon.tech) From 6c7b3bde703014ccec91f3f6a8a7af21b36d0a1f Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Wed, 17 May 2023 09:40:02 -0500 Subject: [PATCH v1 1/7] Make finding pkg-config(python3) more robust It is a possibility that the installation can't be found. --- meson.build | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/meson.build b/meson.build index 5422885b0a..bd9cdb797e 100644 --- a/meson.build +++ b/meson.build @@ -1056,15 +1056,17 @@ endif ### pyopt = get_option('plpython') +python3_dep = not_found_dep if not pyopt.disabled()
Re: PATCH: Add REINDEX tag to event triggers
On Fri, Sep 1, 2023 at 6:40 PM Jim Jones wrote: > > On 01.09.23 11:23, jian he wrote: > > because the change made in here: > > https://git.postgresql.org/cgit/postgresql.git/diff/src/backend/commands/indexcmds.c?id=11af63fb48d278b86aa948a5b57f136ef03c2bb7 > > > > I manually slight edited the patch content: > > from > > static List *ChooseIndexColumnNames(List *indexElems); > > static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params, > >bool isTopLevel); > > to > > static List *ChooseIndexColumnNames(const List *indexElems); > > static void ReindexIndex(const RangeVar *indexRelation, const > > ReindexParams *params, > > bool isTopLevel); > > > > can you try the attached one. > > The patch applies cleanly. However, now the compiler complains about a > qualifier mismatch > > indexcmds.c:2707:33: warning: assignment discards ‘const’ qualifier from > pointer target type [-Wdiscarded-qualifiers] > 2707 | c > > > Perhaps 'const ReindexStmt *currentReindexStatement'? > > Tests also work just fine. I also tested it against unique and partial > indexes, and it worked as expected. > > Jim > Thanks for pointing this out! also due to commit https://git.postgresql.org/cgit/postgresql.git/commit/?id=11af63fb48d278b86aa948a5b57f136ef03c2bb7 ExecReindex function input arguments also changed. so we need to rebase this patch. change to currentReindexStatement = unconstify(ReindexStmt*, stmt); seems to work for me. I tested it, no warning. But I am not 100% sure. anyway I refactored the patch, making it git applyable also change from "currentReindexStatement = stmt;" to "currentReindexStatement = unconstify(ReindexStmt*, stmt);" due to ExecReindex function changes. From a3b6775362e07d53b367bcf303a3535d24f42cce Mon Sep 17 00:00:00 2001 From: pgaddict Date: Fri, 1 Sep 2023 23:51:46 +0800 Subject: [PATCH v3] Add REINDEX tag to event triggers This turns on the event tag REINDEX. 
This will now return a record for each index that was modified during as part of start/end ddl commands. For concurrent reindex, this will return the OID for the new index. This includes concurrently reindexing tables and databases. It will only report the new index records and not the ones destroyed by the concurrent reindex process. Author: Garrett Thornburg --- doc/src/sgml/event-trigger.sgml | 8 ++ src/backend/commands/indexcmds.c| 83 + src/backend/tcop/utility.c | 12 +- src/include/tcop/cmdtaglist.h | 2 +- src/test/regress/expected/event_trigger.out | 126 src/test/regress/sql/event_trigger.sql | 99 +++ 6 files changed, 327 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml index 3b6a5361..aa1b359f 100644 --- a/doc/src/sgml/event-trigger.sgml +++ b/doc/src/sgml/event-trigger.sgml @@ -1013,6 +1013,14 @@ - + +REINDEX +X +X +- +- + + REVOKE X diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index ab8b81b3..483b0665 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -95,6 +95,8 @@ static char *ChooseIndexNameAddition(const List *colnames); static List *ChooseIndexColumnNames(const List *indexElems); static void ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params, bool isTopLevel); +static void reindex_event_trigger_collect(Oid oid); +static void reindex_event_trigger_collect_relation(Oid oid); static void RangeVarCallbackForReindexIndex(const RangeVar *relation, Oid relId, Oid oldRelId, void *arg); static Oid ReindexTable(const RangeVar *relation, const ReindexParams *params, @@ -111,6 +113,8 @@ static bool ReindexRelationConcurrently(Oid relationOid, static void update_relispartition(Oid relationId, bool newval); static inline void set_indexsafe_procflags(void); +static ReindexStmt *currentReindexStatement = NULL; + /* * callback argument type for RangeVarCallbackForReindexIndex() */ @@ -2696,6 +2700,12 @@ 
ExecReindex(ParseState *pstate, const ReindexStmt *stmt, bool isTopLevel) bool verbose = false; char *tablespacename = NULL; + /* + * Make current stmt available for event triggers without directly passing + * the context to every subsequent call. + */ + currentReindexStatement = unconstify(ReindexStmt*, stmt); + /* Parse option list */ foreach(lc, stmt->params) { @@ -2776,6 +2786,12 @@ ExecReindex(ParseState *pstate, const ReindexStmt *stmt, bool isTopLevel) (int) stmt->kind); break; } + + /* + * Clear the reindex stmt global reference now that triggers should have + * completed + */ + currentReindexStatement = NULL; } /* @@ -2827,6 +2843,8 @@ ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params, bool is newparams.options |=
Add const qualifiers
Hackers, I noticed that there was a mismatch between the const qualifiers for excludeDirContents in src/backend/backup/basebackup.c and src/bin/pg_rewind/file_map.c and that led me to use ^static const.*\*.*= to do a quick search for similar cases. I think at the least we should make excludeDirContents match, but the rest of the changes seem like a good idea as well. Regards, -David diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c index 5686497983..fc9cfbeda4 100644 --- a/contrib/fuzzystrmatch/fuzzystrmatch.c +++ b/contrib/fuzzystrmatch/fuzzystrmatch.c @@ -55,7 +55,7 @@ static void _soundex(const char *instr, char *outstr); #define SOUNDEX_LEN 4 /* ABCDEFGHIJKLMNOPQRSTUVWXYZ */ -static const char *soundex_table = "01230120022455012623010202"; +static const char *const soundex_table = "01230120022455012623010202"; static char soundex_code(char letter) diff --git a/contrib/pgcrypto/pgp-armor.c b/contrib/pgcrypto/pgp-armor.c index 9128756647..bfc90af063 100644 --- a/contrib/pgcrypto/pgp-armor.c +++ b/contrib/pgcrypto/pgp-armor.c @@ -178,8 +178,8 @@ pg_base64_dec_len(unsigned srclen) * PGP armor */ -static const char *armor_header = "-----BEGIN PGP MESSAGE-----\n"; -static const char *armor_footer = "\n-----END PGP MESSAGE-----\n"; +static const char *const armor_header = "-----BEGIN PGP MESSAGE-----\n"; +static const char *const armor_footer = "\n-----END PGP MESSAGE-----\n"; /* CRC24 implementation from rfc2440 */ #define CRC24_INIT 0x00b704ceL diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index b42711f574..d03c961678 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -228,7 +228,7 @@ static const FormData_pg_attribute a6 = { .attislocal = true, }; -static const FormData_pg_attribute *SysAtt[] = {&a1, &a2, &a3, &a4, &a5, &a6}; +static const FormData_pg_attribute *const SysAtt[] = {&a1, &a2, &a3, &a4, &a5, &a6}; /* * This function returns a Form_pg_attribute pointer for a system attribute.
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 97b0ef22ac..4fae852666 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -316,9 +316,9 @@ typedef void (*rsv_callback) (Node *node, deparse_context *context, * -- */ static SPIPlanPtr plan_getrulebyoid = NULL; -static const char *query_getrulebyoid = "SELECT * FROM pg_catalog.pg_rewrite WHERE oid = $1"; +static const char *const query_getrulebyoid = "SELECT * FROM pg_catalog.pg_rewrite WHERE oid = $1"; static SPIPlanPtr plan_getviewrule = NULL; -static const char *query_getviewrule = "SELECT * FROM pg_catalog.pg_rewrite WHERE ev_class = $1 AND rulename = $2"; +static const char *const query_getviewrule = "SELECT * FROM pg_catalog.pg_rewrite WHERE ev_class = $1 AND rulename = $2"; /* GUC parameters */ bool quote_all_identifiers = false; diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 84e7ad4d90..c25c697a06 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -112,7 +112,7 @@ typedef struct #error XLOG_BLCKSZ must be between 1KB and 1MB #endif -static const char *memory_units_hint = gettext_noop("Valid units for this parameter are \"B\", \"kB\", \"MB\", \"GB\", and \"TB\"."); +static const char *const memory_units_hint = gettext_noop("Valid units for this parameter are \"B\", \"kB\", \"MB\", \"GB\", and \"TB\"."); static const unit_conversion memory_unit_conversion_table[] = { @@ -149,7 +149,7 @@ static const unit_conversion memory_unit_conversion_table[] = {""}/* end of table marker */ }; -static const char *time_units_hint = gettext_noop("Valid units for this parameter are \"us\", \"ms\", \"s\", \"min\", \"h\", and \"d\"."); +static const char *const time_units_hint = gettext_noop("Valid units for this parameter are \"us\", \"ms\", \"s\", \"min\", \"h\", and \"d\"."); static const unit_conversion time_unit_conversion_table[] = { diff --git a/src/bin/initdb/initdb.c 
b/src/bin/initdb/initdb.c index 905b979947..79548ba06e 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -217,8 +217,8 @@ static bool authwarning = false; * but here it is more convenient to pass it as an environment variable * (no quoting to worry about). */ -static const char *boot_options = "-F -c log_checkpoints=false"; -static const char *backend_options = "--single -F -O -j -c search_path=pg_catalog -c exit_on_error=true -c log_checkpoints=false"; +static const char *const boot_options = "-F -c log_checkpoints=false"; +static const char *const backend_options = "--single -F -O -j -c search_path=pg_catalog -c exit_on_error=true -c log_checkpoints=false"; /* Additional switches to pass to backend (either boot or standalone) */ static char *extra_options = ""; diff
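The distinction the patch draws — pointer-to-const data versus const-pointer-to-const data — in a standalone sketch (not code from the patch; the soundex string is just a convenient constant from the diff above):

```c
#include <string.h>

/* Pointer to const data: the pointee is read-only, but the pointer
 * variable itself can still be reassigned. */
static const char *mutable_ptr = "01230120022455012623010202";

/* Const pointer to const data: neither the pointer nor the data can
 * change, so the whole object can live in read-only storage. */
static const char *const fixed_ptr = "01230120022455012623010202";

const char *
pointer_forms_demo(void)
{
	/* mutable_ptr = "other";  -- would compile: the pointer is writable */
	/* fixed_ptr = "other";    -- would NOT compile: the pointer is const */
	return strcmp(mutable_ptr, fixed_ptr) == 0 ? "match" : "differ";
}
```

Adding the second const is what lets the compiler place these tables in a read-only section and reject accidental reassignment, which is the point of the patch.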
Re: Initdb-time block size specification
On Thu, Aug 31, 2023 at 2:32 PM David Christensen wrote: > Here's a patch atop the series which converts to 16-bit uints and > passes regressions, but I don't consider well-vetted at this point. For what it's worth, my gut reaction to this patch series is similar to that of Andres: I think it will be a disaster. If the disaster is not evident to us, that's far more likely to mean that we've failed to test the right things than it is to mean that there is no disaster. I don't see that there is a lot of upside, either. I don't think we have a lot of evidence that changing the block size is really going to help performance. In fact, my guess is that there are large amounts of code that are heavily optimized, without the authors even realizing it, for 8kB blocks, because that's what we've always had. If we had much larger or smaller blocks, the structure of heap pages or of the various index AMs used for blocks might no longer be optimal, or might be less optimal than they are for an 8kB block size. If you use really large blocks, your blocks may need more internal structure than we have today in order to avoid CPU inefficiencies. I suspect there's been so little testing of non-default block sizes that I wouldn't even count on the code to not be outright buggy. If we could find a safe way to get rid of full page writes, I would certainly agree that that was worth considering. I'm not sure that anything in this thread adds up to that being a reasonable way to go, but the savings would be massive. I feel like the proposal here is a bit like deciding to change the speed limit on all American highways from 65 mph or whatever it is to 130 mph or 32.5 mph and see which way works out best. The whole infrastructure has basically been designed around the current rules. The rate of curvature of the roads is appropriate for the speed that you're currently allowed to drive on them. The vehicles are optimized for long-term operation at about that speed. 
The people who drive the vehicles are accustomed to driving at that speed, and the people who maintain them are accustomed to the problems that happen when you drive them at that speed. Just changing the speed limit doesn't change all that other stuff, and changing all that other stuff is a truly massive undertaking. Maybe this example somewhat overstates the difficulties here, but I do think the difficulties are considerable. The fact that we have 8kB block sizes has affected the thinking of hundreds of developers over decades in making thousands or tens of thousands or hundreds of thousands of decisions about algorithm selection and page format and all kinds of stuff. Even if some other page size seems to work better in a certain context, it's pretty hard to believe that it has much chance of being better overall, even without the added overhead of run-time configuration. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Buildfarm failures on urocryon
Hi, On Fri, Sep 01, 2023 at 11:27:47AM +0530, vignesh C wrote: > Hi, > > Recently urocryon has been failing with the following errors at [1]: > checking for icu-uc icu-i18n... no > configure: error: ICU library not found > If you have ICU already installed, see config.log for details on the > failure. It is possible the compiler isn't looking in the proper directory. > Use --without-icu to disable ICU support. > > configure:8341: checking whether to build with ICU support > configure:8371: result: yes > configure:8378: checking for icu-uc icu-i18n > configure:8440: result: no > configure:8442: error: ICU library not found > If you have ICU already installed, see config.log for details on the > failure. It is possible the compiler isn't looking in the proper directory. > Use --without-icu to disable ICU support. > > Urocryon has been failing for the last 17 days. > > I think ICU libraries need to be installed in urocryon to fix this issue. Oops, that's when I upgraded the build farm client (from v14 to v17). I think it's fixed now... Regards, Mark
Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)
Hi, If a null locale is reached in these paths, elog will dereference a null pointer. best regards, Ranier Vilela 0001-Avoid-a-possible-dereference-a-null-pointer.patch Description: Binary data
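A minimal sketch of the kind of NULL guard being proposed — the type and function here are invented stand-ins for illustration; the real fix is in the attached patch:

```c
#include <stddef.h>

/* Stand-in for the locale handle used in pg_locale.c. */
typedef struct pg_locale_struct
{
	const char *name;
} *pg_locale_t;

/*
 * Produce a name for an error message.  Checking for NULL before touching
 * locale->name avoids dereferencing a null pointer in the error path.
 */
static const char *
locale_name_for_error(pg_locale_t locale)
{
	if (locale == NULL)
		return "(default)";		/* don't dereference a null handle */
	return locale->name;
}
```

The general point is that an error-reporting path must be at least as defensive as the code it reports on, since it runs precisely when assumptions have already failed.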
Re: trying again to get incremental backup
Hey, thanks for the reply. On Thu, Aug 31, 2023 at 6:50 PM David Steele wrote: > pg_subtrans, at least, can be ignored since it is excluded from the > backup and not required for recovery. I agree... > Welcome to the club! Thanks for the welcome, but being a member feels *terrible*. :-) > I do not. My conclusion back then was that validating a physical > comparison would be nearly impossible without changes to Postgres to > make the primary and standby match via replication. Which, by the way, I > still think would be a great idea. In principle, at least. Replay is > already a major bottleneck and anything that makes it slower will likely > not be very popular. Fair point. But maybe the bigger issue is the work involved. I don't think zeroing the hole in all cases would likely be that expensive, but finding everything that can cause the standby to diverge from the primary and fixing all of it sounds like an unpleasant amount of effort. Still, it's good to know that I'm not missing something obvious. > No objections to 0001/0002. Cool. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Should we use MemSet or {0} for struct initialization?
On Fri, 1 Sept 2023 at 15:25, John Naylor wrote: > On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema wrote: > > The C standard says: > > > When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. > > > > So if you set any of the fields after a MemSet, the values of the > > padding bytes that were set to 0 are now unspecified. It seems much > > safer to actually spell out the padding fields of a hash key. > > No, the standard is telling you why you need to memset if consistency of padding bytes matters. Maybe I'm misunderstanding the sentence from the C standard I quoted. But under my interpretation it means that even an assignment to a field of a struct causes the padding bytes to take unspecified (but not undefined) values, because of the "including in a member object" part of the sentence. It's of course possible that all compilers relevant to Postgres never actually change padding when assigning to a field.
Re: Should we use MemSet or {0} for struct initialization?
On 2023-09-01 09:25, John Naylor wrote: On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema wrote: The C standard says: > When a value is stored in an object of structure or union type, > including in a member object, the bytes of the object representation that > correspond to any padding bytes take unspecified values. So if you set any of the fields after a MemSet, the values of the padding bytes that were set to 0 are now unspecified. It seems much safer to actually spell out the padding fields of a hash key. No, the standard is telling you why you need to memset if consistency of padding bytes matters. Um, I'm in no way a language lawyer for recent C specs, but the language Jelte Fennema quoted is also present in the rather old 9899 TC2 draft I still have around from years back, and in particular it does say that upon assignment, padding bytes ▶take◀ unspecified values, not merely that they retain whatever unspecified values they may have had before. There is a footnote attached (in 9899 TC2) that says "Thus, for example, structure assignment need not copy any padding bits." If that footnote example were normative, it would be reassuring, because you could assume that padding bits not copied are unchanged and remember what you originally memset() them to. So that would be nice. But everything about the form and phrasing of the footnote conveys that it isn't normative. And the normative text does appear to be saying that those padding bytes ▶take◀ unspecified values upon assignment to the object, even if you may have memset() them before. Or at least to be saying that's what could happen, in some implementation on some architecture, and it would be standard-conformant if it did. Perhaps there is language elsewhere in the standard that pins it down to the way you've interpreted it? If you know where that language is, could you point to it? Regards, -Chap
Re: Should we use MemSet or {0} for struct initialization?
On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema wrote: > The C standard says: > > When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. > > So if you set any of the fields after a MemSet, the values of the > padding bytes that were set to 0 are now unspecified. It seems much > safer to actually spell out the padding fields of a hash key. No, the standard is telling you why you need to memset if consistency of padding bytes matters. -- John Naylor EDB: http://www.enterprisedb.com
Re: Fix a typo in decode.c
On Fri, Sep 1, 2023 at 5:09 PM Zhijie Hou (Fujitsu) wrote: > > When reading the code, I noticed a typo in the description of WAL record. > > /* > - * Decode XLOG_HEAP2_MULTI_INSERT_insert record into multiple tuplebufs. > + * Decode XLOG_HEAP2_MULTI_INSERT record into multiple tuplebufs. > * > > And attach a small patch to fix it. > LGTM. -- With Regards, Amit Kapila.
Re: sandboxing untrusted code
On Thu, Aug 31, 2023 at 8:57 PM Jeff Davis wrote: > > As a refresher, the scenario I'm talking about is any one in which > > one > > user, who I'll call Bob, does something that results in executing > > code > > provided by another user, who I'll call Alice. The most obvious way > > that this can happen is if Bob performs some operation that targets a > > table owned by Alice. That operation might be DML, like an INSERT or > > UPDATE; or it might be some other kind of maintenance command that > > can > > cause code execution, like REINDEX, which can evaluate index > > expressions. > > REINDEX executes index expressions as the table owner. (You are correct > that INSERT executes index expressions as the inserting user.) I was speaking here of who provided the code, rather than whose credentials were used to execute it. The index expressions are provided by the table owner no matter who evaluates them in a particular case. > > 1. Compute stuff. There's no restriction on the permissible amount of > > compute; if you call untrusted code, nothing prevents it from running > > forever. > > 2. Call other code. This may be done by a function call or a command > > such as CALL or DO, all subject to the usual permissions checks but > > no > > further restrictions. > > 3. Access the current session state, without modifying it. For > > example, executing SHOW or current_setting() is fine. > > 4. Transiently modify the current session state in ways that are > > necessarily reversed before returning to the caller. For example, an > > EXCEPTION block or a configuration change driven by proconfig is > > fine. > > 5. Produce messages at any log level. This includes any kind of > > ERROR. > > Nothing in that list really exercises privileges (except #2?). If those > are the allowed set of things a sandboxed function can do, is a > sandboxed function equivalent to a function running with no privileges > at all? Close but not quite. As you say, #2 does exercise privileges. 
Also, even if no privileges are exercised, you could still refer to CURRENT_ROLE, and I think you could also call a function like has_table_privilege. Your identity hasn't changed, but you're restricted from exercising some of your privileges. Really, you still have them, but they're just not available to you in that situation. > Please explain #2 in a bit more detail. Whose EXECUTE privileges would > be used (I assume it depends on SECURITY DEFINER/INVOKER)? Would the > called code also be sandboxed? Nothing in this proposed system has any impact on whose privileges are used in any particular context, so any privilege checks conducted pursuant to #2 are performed as the same user who would perform them today. Whether the called code would be sandboxed depends on how the rules I articulated in the previous email would apply to it. Since those rules depend on the user IDs, if the called code is owned by the same user as the calling code and is SECURITY INVOKER, then those rules apply in the same way and the same level of sandboxing will apply. But if the called function is owned by a different user or is SECURITY DEFINER, then the rules might apply differently to the called code than the calling code. It's possible this isn't quite good enough and that some adjustments to the rules are necessary; I'm not sure. > Let me qualify that: if the function is written by Alice, and the code > is able to really exercise the privileges of the caller (Bob), then it > seems really hard to make it safe for the caller. > > If the function is sandboxed such that it's not really using Bob's > privileges (it's just nominally running as Bob) that's a much more > tractable problem. Agreed. 
> One complaint (not an objection, because I don't think we have > the luxury of objecting to viable proposals when it comes to improving > our security model): > > Although your proposal sounds like a good security backstop, it feels > like it's missing the point that there are different _kinds_ of > functions. We already have the IMMUTABLE marker and we already have > runtime checks to make sure that immutable functions can't CREATE > TABLE; why not build on that mechanism or create new markers? I haven't ruled that out completely, but there's some subtlety here that doesn't exist in those other cases. If the owner of a function marks it wrongly in terms of volatility or parallel safety, then they might make queries run more slowly than they should, or they might make queries return wrong answers, or error out, or even end up with messed-up indexes. But none of that threatens the stability of the system in any very deep way, or the security of the system. It's no different than putting a CHECK (false) constraint on a table, or something like that: it might make the system not work, and if that happens, then you can fix it. Here, however, we can't trust the owners of functions to label those functions accurately. It won't do for Alice to create a function and then apply the
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Amit, Thank you for reviewing! A new patch is available in [1]. > + /* > + * Note: This must be done after doing the pg_resetwal command because > + * pg_resetwal would remove required WALs. > + */ > + if (count_old_cluster_logical_slots()) > + { > + start_postmaster(_cluster, true); > + create_logical_replication_slots(); > + stop_postmaster(false); > + } > > Can we combine this code with the code in the function > issue_warnings_and_set_wal_level()? That will avoid starting/stopping > the server for creating slots. Yeah, I can. But create_logical_replication_slots() must be done before doing "initdb --sync-only", so it is placed before that. The combined function is named setup_new_cluster(). [1]: https://www.postgresql.org/message-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Peter, Thanks for reviewing! A new patch is available in [1]. > > == > src/bin/pg_upgrade/check.c > > 1. check_old_cluster_for_valid_slots > > + /* Quick exit if the cluster does not have logical slots. */ > + if (count_old_cluster_logical_slots() == 0) > + return; > > /Quick exit/Quick return/ > > I know they are kind of the same, but the reason I previously > suggested this change was to keep it consistent with the similar > comment that is already in > check_new_cluster_logical_replication_slots(). Fixed. > 2. check_old_cluster_for_valid_slots > > + /* > + * Do additional checks to ensure that confirmed_flush LSN of all the slots > + * is the same as the latest checkpoint location. > + * > + * Note: This can be satisfied only when the old cluster has been shut > + * down, so we skip this live checks. > + */ > + if (!live_check) > > missing word > > /skip this live checks./skip this for live checks./ Fixed. > src/bin/pg_upgrade/controldata.c > > 3. > + /* > + * Read the latest checkpoint location if the cluster is PG17 > + * or later. This is used for upgrading logical replication > + * slots. Currently, we need it only for the old cluster but > + * didn't add additional check for the similicity. > + */ > + if (GET_MAJOR_VERSION(cluster->major_version) >= 1700) > > /similicity/simplicity/ > > SUGGESTION > Currently, we need it only for the old cluster but for simplicity > chose not to have additional checks. Fixed. > src/bin/pg_upgrade/info.c > > 4. get_old_cluster_logical_slot_infos_per_db > > + /* > + * The temporary slots are expressly ignored while checking because such > + * slots cannot exist after the upgrade. During the upgrade, clusters are > + * started and stopped several times so that temporary slots will be > + * removed. 
> + */ > + res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase " > + "FROM pg_catalog.pg_replication_slots " > + "WHERE slot_type = 'logical' AND " > + "wal_status <> 'lost' AND " > + "database = current_database() AND " > + "temporary IS FALSE;"); > > IIUC, the removal of temp slots is just a side-effect of the > start/stop; not the *reason* for the start/stop. So, the last sentence > needs some modification > > BEFORE > During the upgrade, clusters are started and stopped several times so > that temporary slots will be removed. > > SUGGESTION > During the upgrade, clusters are started and stopped several times > causing any temporary slots to be removed. > Fixed. [1]: https://www.postgresql.org/message-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Dilip, Thank you for reviewing! > > 1. > + conn = connectToServer(_cluster, "template1"); > + > + prep_status("Checking for logical replication slots"); > + > + res = executeQueryOrDie(conn, "SELECT slot_name " > + "FROM pg_catalog.pg_replication_slots " > + "WHERE slot_type = 'logical' AND " > + "temporary IS FALSE;"); > > > I think we should add some comment saying this query will only fetch > logical slots because the database name will always be NULL in the > physical slots. Otherwise looking at the query it is very confusing > how it is avoiding the physical slots. Hmm, the query you pointed out does not check the database of the slot... We are fetching only logical slots by the condition "slot_type = 'logical'", so I think it is too trivial to describe in the comment. Just to confirm - pg_replication_slots can see all the slots even if the database is not the current one.
```
tmp=# SELECT slot_name, slot_type, database FROM pg_replication_slots where database != current_database();
 slot_name | slot_type | database
-----------+-----------+----------
 test      | logical   | postgres
(1 row)
```
If I misunderstood something, please tell me... > 2. > +void > +get_old_cluster_logical_slot_infos(void) > +{ > + int dbnum; > + > + /* Logical slots can be migrated since PG17. */ > + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600) > + return; > + > + pg_log(PG_VERBOSE, "\nsource databases:"); > > I think we need to change some headings like "slot info source > databases:" Or add an extra message saying printing slot information. > > Before this patch, we were printing all the relation information so > message ordering was quite clear e.g. 
> > source databases: > Database: "template1" > relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: "" > relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, > reltblspace: "" > Database: "postgres" > relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: "" > relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, > reltblspace: "" > > But after this patch slot information is also getting printed in a > similar fashion so it's very confusing now. Refer > get_db_and_rel_infos() for how it is fetching all the relation > information first and then printing them. > > > > > 3. One more problem is that the slot information and the execute query > messages are intermingled so it becomes more confusing, see the below > example of the latest messaging. I think ideally we should execute > these queries first > and then print all slot information together instead of intermingling > the messages. > > source databases: > executing: SELECT pg_catalog.set_config('search_path', '', false); > executing: SELECT slot_name, plugin, two_phase FROM > pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND > database = current_database() AND temporary IS FALSE; > Database: "template1" > executing: SELECT pg_catalog.set_config('search_path', '', false); > executing: SELECT slot_name, plugin, two_phase FROM > pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND > database = current_database() AND temporary IS FALSE; > Database: "postgres" > slotname: "isolation_slot1", plugin: "pgoutput", two_phase: 0 > > 4. Looking at the above two comments I feel that now the order should be like > - Fetch all the db infos > get_db_infos() > - loop >get_rel_infos() >get_old_cluster_logical_slot_infos() > > -- and now print relation and slot information per database > print_db_infos() Fixed like that. It seems that we go back to old style... 
Now the debug prints are like below:
```
source databases:
Database: "template1"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, reltblspace: ""
Database: "postgres"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, reltblspace: ""
Logical replication slots within the database:
slotname: "old1", plugin: "test_decoding", two_phase: 0
slotname: "old2", plugin: "test_decoding", two_phase: 0
slotname: "old3", plugin: "test_decoding", two_phase: 0
```
Best Regards, Hayato Kuroda FUJITSU LIMITED v30-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch Description: v30-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch v30-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch Description: v30-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
On 9/1/23 10:00, Alexander Lakhin wrote: > Hello Thomas, > > 31.08.2023 14:15, Thomas Munro wrote: > >> We have a signal that is pending and not blocked, so I don't >> immediately know why poll() hasn't returned control. > > When I worked at the Postgres Pro company, we observed a similar lockup > under rather specific conditions (we used the Elbrus CPU and the specific > Elbrus > compiler (lcc), based on EDG). > I managed to reproduce that lockup and Anton Voloshin investigated it. > The issue was caused by a compiler optimization in WaitEventSetWait(): > waiting = true; > ... > while (returned_events == 0) > { > ... > if (set->latch && set->latch->is_set) > { > ... > break; > } > > In that case, the compiler decided that it may place the read > "set->latch->is_set" before the write "waiting = true". > (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the > issue for us.) > I can't provide more details for now, but maybe you could look at the > binary > code generated on the target platform to confirm or reject my guess. > Hmmm, I'm not very good at reading binary code, but here's what objdump produced for WaitEventSetWait. Maybe someone will see what the issue is. I thought about just adding the barrier to the code, but then how would we know that this was the issue and the barrier fixed it? It happens so rarely that we can't draw any conclusions from a couple of test runs. 
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company00805ba0 : 805ba0: d102c3ff sub sp, sp, #176 805ba4: 6d0423e9 stp d9, d8, [sp, #64] 805ba8: a9057bfd stp x29, x30, [sp, #80] 805bac: a9066ffc stp x28, x27, [sp, #96] 805bb0: a90767fa stp x26, x25, [sp, #112] 805bb4: a9085ff8 stp x24, x23, [sp, #128] 805bb8: a90957f6 stp x22, x21, [sp, #144] 805bbc: a90a4ff4 stp x20, x19, [sp, #160] 805bc0: 910143fd add x29, sp, #80 805bc4: 717f cmp w3, #0 805bc8: f90007e2 str x2, [sp, #8] 805bcc: 5400240d b.le0x80604c 805bd0: 2a0403f8 mov w24, w4 805bd4: 2a0303f5 mov w21, w3 805bd8: aa0103f3 mov x19, x1 805bdc: aa0003f4 mov x20, x0 805be0: b7f801e1 tbnzx1, #63, 0x805c1c 805be4: 910083e1 add x1, sp, #32 805be8: 52800080 mov w0, #4 805bec: 940623f5 bl 0x98ebc0 805bf0: d35ffe68 lsr x8, x19, #31 805bf4: aa1303f7 mov x23, x19 805bf8: b4000148 cbz x8, 0x805c20 805bfc: 90ffd480 adrpx0, 0x295000 805c00: 91175c00 add x0, x0, #1495 805c04: f0ffd5c1 adrpx1, 0x2c 805c08: 9111f021 add x1, x1, #1148 805c0c: 90ffd822 adrpx2, 0x309000 805c10: 91046842 add x2, x2, #282 805c14: 52807563 mov w3, #939 805c18: 9404de3e bl 0x93d510 805c1c: 92800017 mov x23, #-1 805c20: 9de8 adrpx8, 0x9c1000 805c24: 9f09 adrpx9, 0x9e5000 805c28: 3979e108 ldrbw8, [x8, #3704] 805c2c: 3488 cbz w8, 0x805c3c 805c30: f940f128 ldr x8, [x9, #480] 805c34: b448 cbz x8, 0x805c3c 805c38: b902ad18 str w24, [x8, #684] 805c3c: b0ffdb09 adrpx9, 0x366000 805c40: b0ffdb0a adrpx10, 0x366000 805c44: feeb adrpx11, 0x9e4000 805c48: 52800028 mov w8, #1 805c4c: 52800319 mov w25, #24 805c50: 5280073a mov w26, #57 805c54: fd446128 ldr d8, [x9, #2240] 805c58: 9d7b adrpx27, 0x9b1000 805c5c: fd415949 ldr d9, [x10, #688] 805c60: f9071d68 str x8, [x11, #3640] 805c64: f90003f3 str x19, [sp] 805c68: 1410 b 0x805ca8 805c6c: 9e620100 scvtf d0, x8 805c70: d2d09008 mov x8, #145685290680320 805c74: f2e825c8 movkx8, #16686, lsl #48 805c78: 9e670101 fmovd1, x8 805c7c: d2c80008 mov x8, #70368744177664 805c80: 
f2e811e8 movkx8, #16527, lsl #48 805c84: 1e611800 fdivd0, d0, d1 805c88: 9e620121 scvtf d1, x9 805c8c: 9e670102 fmovd2, x8 805c90: 1f420020 fmadd d0, d1, d2, d0 805c94: 9e780008 fcvtzs x8, d0 805c98: cb080277 sub x23, x19, x8 805c9c: f10006ff cmp x23, #1 805ca0: 540012ab b.lt0x805ef4 805ca4: 35001478 cbnzw24, 0x805f30 805ca8: f9400a88 ldr x8, [x20, #16] 805cac: b468 cbz x8, 0x805cb8 805cb0: f9400108 ldr x8, [x8] 805cb4: b5001248 cbnzx8, 0x805efc 805cb8: f9401280 ldr x0, [x20, #32] 805cbc: 2a1703e2 mov w2, w23 805cc0: b9400281 ldr w1, [x20] 805cc4: 940626eb bl 0x98f870 805cc8: 37f80cc0 tbnzw0, #31, 0x805e60 805ccc: 34001140 cbz w0,
Re: Move bki file pre-processing from initdb to bootstrap
On 01.09.23 14:37, Tom Lane wrote: Krishnakumar R writes: This patch moves the pre-processing for tokens in the bki file from initdb to bootstrap. With these changes the bki file will only be opened once in bootstrap and parsing will be done by the bootstrap parser. You haven't provided any iota of evidence why this would be an improvement. I had played with similar ideas in the past, because it would shave some time off initdb, which would accumulate noticeably over a full test run. But now with the initdb caching mechanism, I wonder whether this is still needed.
Re: Commitfest 2023-09 starts soon
Okay, here we go, starting with: Status summary: Needs review: 227. Waiting on Author: 37. Ready for Committer: 30. Committed: 40. Rejected: 1. Returned with Feedback: 1. Withdrawn: 1. Total: 337. (which is less than CF 2023-07) I have also already applied one round of the waiting-on-author-pruning described below (not included in the above figures). On 31.08.23 10:36, Peter Eisentraut wrote: Commitfest 2023-09 (https://commitfest.postgresql.org/44/) starts in less than 28 hours. If you have any patches you would like considered, be sure to add them in good time. All patch authors, and especially experienced hackers, are requested to make sure the patch status is up to date. If the patch is still being worked on, it might not need to be in "Needs review". Conversely, if you are hoping for a review but the status is "Waiting on author", then it will likely be ignored by reviewers. There are a number of patches carried over from the PG16 development cycle that have been in "Waiting on author" for several months. I will aggressively prune those after the start of this commitfest if there hasn't been any author activity by then.
Re: Unlogged relations and WAL-logging
Is the patch 0003-Remove-unnecessary-smgrimmedsync-when-creating-unlog.patch still relevant, or can this commitfest entry be closed? On 23.08.23 16:40, Heikki Linnakangas wrote: 5. In heapam_relation_set_new_filenode(), we do this: /* * If required, set up an init fork for an unlogged table so that it can * be correctly reinitialized on restart. An immediate sync is required * even if the page has been logged, because the write did not go through * shared_buffers and therefore a concurrent checkpoint may have moved the * redo pointer past our xlog record. Recovery may as well remove it * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE * record. Therefore, logging is necessary even if wal_level=minimal. */ if (persistence == RELPERSISTENCE_UNLOGGED) { Assert(rel->rd_rel->relkind == RELKIND_RELATION || rel->rd_rel->relkind == RELKIND_MATVIEW || rel->rd_rel->relkind == RELKIND_TOASTVALUE); smgrcreate(srel, INIT_FORKNUM, false); log_smgrcreate(newrnode, INIT_FORKNUM); smgrimmedsync(srel, INIT_FORKNUM); } The comment doesn't make much sense: we have neither written nor WAL-logged any page here, with or without the buffer cache. It made more sense before commit fa0f466d53. Well, it seems to me (and perhaps I am just confused) that complaining that there's no page written here might be a technicality. The point is that there's no synchronization between the work we're doing here -- which is creating a fork, not writing a page -- and any concurrent checkpoint. So we both need to log it, and also sync it immediately. I see. I pushed the fix from the other thread that makes smgrcreate() call register_dirty_segment (commit 4b4798e13). I believe that makes this smgrimmedsync() unnecessary. If a concurrent checkpoint happens with a redo pointer greater than this WAL record, it must've received the fsync request created by smgrcreate(). That depends on the fact that we write the WAL record *after* smgrcreate(). Subtle.. 
Hmm, we have a similar smgrimmedsync() call after index build, because we have written pages directly with smgrextend(skipFsync=true). If no checkpoints have occurred during the index build, we could call register_dirty_segment() instead of smgrimmedsync(). That would avoid the fsync() latency when creating an index on an empty or small table. This is all very subtle to get right though. That's why I'd like to invent a new bulk-creation facility that would handle this stuff, and make the callers less error-prone. Having a more generic and less error-prone bulk-creation mechanism is still on my long TODO list..
Re: Should we use MemSet or {0} for struct initialization?
On Thu, 31 Aug 2023 at 13:35, Junwang Zhao wrote: > > > If the struct has padding or aligned, {0} only guarantee the struct > > > members initialized to 0, while memset sets the alignment/padding > > > to 0 as well, but since we will not access the alignment/padding, so > > > they give the same effect. > > > > See above -- if it's used as a hash key, for example, you must clear > > everything. > > Yeah, if memcmp was used as the key comparison function, there is a problem. The C standard says: > When a value is stored in an object of structure or union type, including in > a member object, the bytes of the object representation that correspond to > any padding bytes take unspecified values. So if you set any of the fields after a MemSet, the values of the padding bytes that were set to 0 are now unspecified. It seems much safer to actually spell out the padding fields of a hash key.
Re: Move bki file pre-processing from initdb to bootstrap
Krishnakumar R writes: > This patch moves the pre-processing for tokens in the bki file from > initdb to bootstrap. With these changes the bki file will only be > opened once in bootstrap and parsing will be done by the bootstrap > parser. You haven't provided any iota of evidence why this would be an improvement. regards, tom lane
Re: Assert failure in ATPrepAddPrimaryKey
On 2023-Sep-01, Richard Guo wrote: > I ran into an Assert failure in ATPrepAddPrimaryKey() with the query > below: > > CREATE TABLE t0(c0 boolean); > CREATE TABLE t1() INHERITS(t0); > > # ALTER TABLE t0 ADD CONSTRAINT m EXCLUDE ((1) WITH =); > server closed the connection unexpectedly Ugh, right, I failed to make the new function do nothing for this case; this had no coverage. Fix attached, with some additional test cases based on yours. Thanks for reporting. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "XML!" Exclaimed C++. "What are you doing here? You're not a programming language." "Tell that to the people who use me," said XML. https://burningbird.net/the-parable-of-the-languages/ >From 6df9aa36f250cd3b131efc91e0991fd231597523 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 1 Sep 2023 13:41:09 +0200 Subject: [PATCH] Only process inheritance for primary keys, not other constraint types --- src/backend/commands/tablecmds.c | 12 +--- src/test/regress/expected/inherit.out | 25 - src/test/regress/sql/inherit.sql | 15 ++- 3 files changed, 47 insertions(+), 5 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index d097da3c78..0339774672 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -8906,19 +8906,25 @@ ATPrepAddPrimaryKey(List **wqueue, Relation rel, AlterTableCmd *cmd, { List *children; List *newconstrs = NIL; ListCell *lc; - IndexStmt *stmt; + IndexStmt *indexstmt; + + /* No work if not creating a primary key */ + if (!IsA(cmd->def, IndexStmt)) + return; + indexstmt = castNode(IndexStmt, cmd->def); + if (!indexstmt->primary) + return; /* No work if no legacy inheritance children are present */ if (rel->rd_rel->relkind != RELKIND_RELATION || !rel->rd_rel->relhassubclass) return; children = find_inheritance_children(RelationGetRelid(rel), lockmode); - stmt = castNode(IndexStmt, cmd->def); - foreach(lc, stmt->indexParams) + foreach(lc, 
indexstmt->indexParams) { IndexElem *elem = lfirst_node(IndexElem, lc); Constraint *nnconstr; diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out index dae61b9a0b..59583e1e41 100644 --- a/src/test/regress/expected/inherit.out +++ b/src/test/regress/expected/inherit.out @@ -2308,9 +2308,32 @@ create table inh_child (a int primary key); alter table inh_child inherit inh_parent; -- nope ERROR: column "a" in child table must be marked NOT NULL alter table inh_child alter a set not null; alter table inh_child inherit inh_parent; -- now it works -drop table inh_parent, inh_child; +-- don't interfere with other types of constraints +alter table inh_parent add constraint inh_parent_excl exclude ((1) with =); +alter table inh_parent add constraint inh_parent_uq unique (a); +alter table inh_parent add constraint inh_parent_fk foreign key (a) references inh_parent (a); +create table inh_child2 () inherits (inh_parent); +create table inh_child3 (like inh_parent); +alter table inh_child3 inherit inh_parent; +select conrelid::regclass, conname, contype, coninhcount, conislocal + from pg_constraint + where conrelid::regclass::text in ('inh_parent', 'inh_child', 'inh_child2', 'inh_child3') + order by 2, 1; + conrelid |conname| contype | coninhcount | conislocal ++---+-+-+ + inh_child2 | inh_child2_a_not_null | n | 1 | f + inh_child3 | inh_child3_a_not_null | n | 1 | t + inh_child | inh_child_a_not_null | n | 1 | t + inh_child | inh_child_pkey| p | 0 | t + inh_parent | inh_parent_excl | x | 0 | t + inh_parent | inh_parent_fk | f | 0 | t + inh_parent | inh_parent_pkey | p | 0 | t + inh_parent | inh_parent_uq | u | 0 | t +(8 rows) + +drop table inh_parent, inh_child, inh_child2, inh_child3; -- -- test multi inheritance tree -- create table inh_parent(f1 int not null); diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql index 9ceaec1d78..abe8602682 100644 --- a/src/test/regress/sql/inherit.sql +++ 
b/src/test/regress/sql/inherit.sql @@ -845,9 +845,22 @@ create table inh_parent (a int primary key); create table inh_child (a int primary key); alter table inh_child inherit inh_parent; -- nope alter table inh_child alter a set not null; alter table inh_child inherit inh_parent; -- now it works -drop table inh_parent, inh_child; + +-- don't interfere with other types of constraints +alter table inh_parent add constraint inh_parent_excl exclude ((1) with =); +alter table inh_parent add constraint inh_parent_uq unique (a); +alter table inh_parent add constraint inh_parent_fk foreign key (a) references inh_parent (a); +create table
Re: generate syscache info automatically
On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut wrote: > Attached is an updated patch set. > > One win here is that the ObjectProperty lookup now uses a hash function > instead of a linear search. So hopefully the performance is better (to > be confirmed) and it could now be used for things like [0]. + # XXX This one neither, but if I add it to @skip, PerfectHash will fail. (???) + #FIXME: AttributeRelationId I took a quick look at this, and attached is the least invasive way to get it working for now, which is to bump the table size slightly. The comment says doing this isn't reliable, but it happens to work in this case. Playing around with the functions is hit-or-miss, and when that fails, somehow the larger table saves the day. -- John Naylor EDB: http://www.enterprisedb.com diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index 199091305c..3640ee154b 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -223,6 +223,7 @@ foreach my $header (@ARGV) # XXX These catalogs were not covered by the previous hand-maintained table. my @skip = qw( + AttributeRelationId AttrDefaultRelationId EnumRelationId IndexRelationId LargeObjectRelationId ParameterAclRelationId PublicationNamespaceRelationId PublicationRelRelationId @@ -235,8 +236,6 @@ foreach my $header (@ARGV) SecLabelRelationId SharedSecLabelRelationId PartitionedRelationId RangeRelationId SequenceRelationId SubscriptionRelRelationId); - # XXX This one neither, but if I add it to @skip, PerfectHash will fail. (???) - #FIXME: AttributeRelationId # XXX hardcoded ObjectType mapping -- where to put this? my %objtypes = ( diff --git a/src/tools/PerfectHash.pm b/src/tools/PerfectHash.pm index e54905a3ef..f343661859 100644 --- a/src/tools/PerfectHash.pm +++ b/src/tools/PerfectHash.pm @@ -200,7 +200,8 @@ sub _construct_hash_table # can be rejected due to touching unused hashtable entries. 
In practice, # neither effect seems strong enough to justify using a larger table.) my $nedges = scalar @keys; # number of edges - my $nverts = 2 * $nedges + 1;# number of vertices + # WIP for object properties + my $nverts = 2 * $nedges + 3;# number of vertices # However, it would be very bad if $nverts were exactly equal to either # $hash_mult1 or $hash_mult2: effectively, that hash function would be
Re: Show inline comments from pg_hba lines in the pg_hba_file_rules view
On 01.09.23 12:44, Michael Paquier wrote: I am not sure what you have in mind, but IMO any solution would live better as long as it is: - not linked to hba.c, handled in a separate code path. - linked to all configuration files where comments are supported, if need be. If I understood you correctly, you mean an independent feature that, e.g., gets raw lines and parses the inline #comments. Doing so we could indeed avoid the trouble of messing around with the hba.c logic, and it would be accessible to other config files. Very interesting thought! It sounds like a much more elegant solution. Perhaps others have more opinions. -- Michael If I hear no objections, I'll try to sketch it as you suggested. Thanks again for the feedback. Jim
Re: SQL:2011 application time
On 9/1/23 11:30, Peter Eisentraut wrote: 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT OVERLAPS clause attach to the last column, or to the whole column list? In the SQL standard, you can only have one period and it has to be listed last, so this question does not arise. But here we are building a more general facility to then build the SQL facility on top of. So I think it doesn't make sense that the range column must be last or that there can only be one. Also, your implementation requires at least one non-overlaps column, which also seems like a confusing restriction. I think the WITHOUT OVERLAPS clause should be per-column, so that something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would be possible. Then the WITHOUT OVERLAPS clause would directly correspond to the choice between equality or overlaps operator per column. An alternative interpretation would be that WITHOUT OVERLAPS applies to the whole column list, and we would take it to mean, for any range column, use the overlaps operator, for any non-range column, use the equals operator. But I think this would be confusing and would prevent the case of using the equality operator for some ranges and the overlaps operator for some other ranges in the same key. I prefer the first option. That is: WITHOUT OVERLAPS applies only to the column or expression it is attached to, and need not be last in line. -- Vik Fearing
Re: Show inline comments from pg_hba lines in the pg_hba_file_rules view
On Fri, Sep 01, 2023 at 11:32:35AM +0200, Jim Jones wrote: > Would you be in favor of parsing #comments instead? Given that # is > currently already being parsed (ignored), it shouldn't add too much > complexity to the code. I am not sure what you have in mind, but IMO any solution would live better as long as it is: - not linked to hba.c, handled in a separate code path. - linked to all configuration files where comments are supported, if need be. Perhaps others have more opinions. -- Michael
Re: PATCH: Add REINDEX tag to event triggers
On 01.09.23 11:23, jian he wrote: because of the change made here: https://git.postgresql.org/cgit/postgresql.git/diff/src/backend/commands/indexcmds.c?id=11af63fb48d278b86aa948a5b57f136ef03c2bb7 I manually slightly edited the patch content: from static List *ChooseIndexColumnNames(List *indexElems); static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params, bool isTopLevel); to static List *ChooseIndexColumnNames(const List *indexElems); static void ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params, bool isTopLevel); Can you try the attached one? The patch applies cleanly. However, now the compiler complains about a qualifier mismatch: indexcmds.c:2707:33: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers] 2707 | currentReindexStatement = stmt; Perhaps 'const ReindexStmt *currentReindexStatement'? Tests also work just fine. I also tested it against unique and partial indexes, and it worked as expected. Jim
Fix a typo in decode.c
Hi, When reading the code, I noticed a typo in the description of a WAL record:

/*
- * Decode XLOG_HEAP2_MULTI_INSERT_insert record into multiple tuplebufs.
+ * Decode XLOG_HEAP2_MULTI_INSERT record into multiple tuplebufs.
 *

Attached is a small patch to fix it. Best Regards, Hou Zhijie 0001-fix-typo-in-decode.c.patch
Re: Show inline comments from pg_hba lines in the pg_hba_file_rules view
Hi Michael On 01.09.23 03:18, Michael Paquier wrote: hba.c is complex enough these days (inclusion logic, tokenization of the items) that I am not in favor of touching its code paths for anything like that. This is not something that can apply only to pg_hba.conf, but to all configuration files. It is indeed possible to extrapolate it to any configuration file, but my point was rather to visualize comments purposefully left by the DBA regarding user access (pg_hba and pg_ident). And this touches in adding support for a second type of comment format. This is one of these areas where we may want a smarter version of pg_read_file that returns a SRF for (line_number, line_contents) of a file read? Note that it is possible to add comments at the end of a HBA entry already, like: local all all trust # My comment, and this is a correct HBA entry. I also considered parsing the inline #comments - actually it was my first idea - but I thought it would leave no option to make an inline comment without populating pg_hba_file_rules. But I guess in this case one could always write the comment in the line above :) Would you be in favor of parsing #comments instead? Given that # is currently already being parsed (ignored), it shouldn't add too much complexity to the code. Thanks for the feedback. Jim
Re: SQL:2011 application time
On 31.08.23 23:26, Paul Jungwirth wrote: I've tried to clean up the first four patches to get them ready for committing, since they could get committed before the PERIOD patch. I think there is a little more cleanup needed but they should be ready for a review. Looking at the patch 0001 "Add temporal PRIMARY KEY and UNIQUE constraints": Generally, this looks like a good direction. The patch looks comprehensive, with documentation and tests, and appears to cover all the required pieces (client programs, ruleutils, etc.). I have two conceptual questions that should be clarified before we go much further: 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT OVERLAPS clause attach to the last column, or to the whole column list? In the SQL standard, you can only have one period and it has to be listed last, so this question does not arise. But here we are building a more general facility to then build the SQL facility on top of. So I think it doesn't make sense that the range column must be last or that there can only be one. Also, your implementation requires at least one non-overlaps column, which also seems like a confusing restriction. I think the WITHOUT OVERLAPS clause should be per-column, so that something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would be possible. Then the WITHOUT OVERLAPS clause would directly correspond to the choice between equality or overlaps operator per column. An alternative interpretation would be that WITHOUT OVERLAPS applies to the whole column list, and we would take it to mean, for any range column, use the overlaps operator, for any non-range column, use the equals operator. But I think this would be confusing and would prevent the case of using the equality operator for some ranges and the overlaps operator for some other ranges in the same key. 2) The logic hinges on get_index_attr_temporal_operator(), to pick the equality and overlaps operator for each column. 
For btree indexes, the strategy numbers are fixed, so this is straightforward. But for gist indexes, the strategy numbers are more like recommendations. Are we comfortable with how this works? I mean, we could say, if you want to be able to take advantage of the WITHOUT OVERLAPS syntax, you have to use these numbers, otherwise you're on your own. It looks like the gist strategy numbers are already hardcoded in a number of places, so maybe that's all okay, but I feel we should be more explicit about this somewhere, maybe in the documentation, or at least in code comments. Besides that, some stylistic comments: * There is a lot of talk about "temporal" in this patch, but this functionality is more general than temporal. I would prefer to change this to more neutral terms like "overlaps". * The field ii_Temporal in IndexInfo doesn't seem necessary and could be handled via local variables. See [0] for a similar discussion: [0]: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c...@eisentraut.org * In gram.y, change withoutOverlapsClause -> without_overlaps_clause for consistency with the surrounding code. * No-op assignments like n->without_overlaps = NULL; can be omitted. (Or you should put them everywhere. But only in some places seems inconsistent and confusing.)
Re: PATCH: Add REINDEX tag to event triggers
On Wed, Aug 30, 2023 at 8:38 PM Jim Jones wrote: > > Greetings > > > > I was unable to apply it in 7ef5f5f > > $ git apply -v v2-0001-Add-REINDEX-tag-to-event-triggers.patch > Checking patch doc/src/sgml/event-trigger.sgml... > Checking patch src/backend/commands/indexcmds.c... > error: while searching for: > static List *ChooseIndexColumnNames(List *indexElems); > static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params, > bool isTopLevel); > static void RangeVarCallbackForReindexIndex(const RangeVar *relation, > > Oid relId, Oid oldRelId, void *arg); > static Oid ReindexTable(RangeVar *relation, ReindexParams *params, > > error: patch failed: src/backend/commands/indexcmds.c:94 > error: src/backend/commands/indexcmds.c: patch does not apply > Checking patch src/backend/tcop/utility.c... > Checking patch src/include/tcop/cmdtaglist.h... > Checking patch src/test/regress/expected/event_trigger.out... > Hunk #1 succeeded at 556 (offset 2 lines). > Checking patch src/test/regress/sql/event_trigger.sql... > > Am I missing something? > > Jim That is because of the change made here: https://git.postgresql.org/cgit/postgresql.git/diff/src/backend/commands/indexcmds.c?id=11af63fb48d278b86aa948a5b57f136ef03c2bb7 I manually slightly edited the patch content: from static List *ChooseIndexColumnNames(List *indexElems); static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params, bool isTopLevel); to static List *ChooseIndexColumnNames(const List *indexElems); static void ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params, bool isTopLevel); Can you try the attached one?
diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml
index 3b6a5361..aa1b359f 100644
--- a/doc/src/sgml/event-trigger.sgml
+++ b/doc/src/sgml/event-trigger.sgml
@@ -1013,6 +1013,14 @@
     -
+
+    REINDEX
+    X
+    X
+    -
+    -
+
+
     REVOKE
     X
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ab8b81b3..b78f1d87 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -95,6 +95,8 @@ static char *ChooseIndexNameAddition(const List *colnames);
 static List *ChooseIndexColumnNames(const List *indexElems);
 static void ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params,
 						 bool isTopLevel);
+static void reindex_event_trigger_collect(Oid oid);
+static void reindex_event_trigger_collect_relation(Oid oid);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
 											Oid relId, Oid oldRelId, void *arg);
 static Oid	ReindexTable(const RangeVar *relation, const ReindexParams *params,
@@ -111,6 +113,8 @@ static bool ReindexRelationConcurrently(Oid relationOid,
 static void update_relispartition(Oid relationId, bool newval);
 static inline void set_indexsafe_procflags(void);
 
+static ReindexStmt *currentReindexStatement = NULL;
+
 /*
  * callback argument type for RangeVarCallbackForReindexIndex()
  */
@@ -2696,6 +2700,12 @@ ExecReindex(ParseState *pstate, const ReindexStmt *stmt, bool isTopLevel)
 	bool		verbose = false;
 	char	   *tablespacename = NULL;
 
+	/*
+	 * Make current stmt available for event triggers without directly passing
+	 * the context to every subsequent call.
+	 */
+	currentReindexStatement = stmt;
+
 	/* Parse option list */
 	foreach(lc, stmt->params)
 	{
@@ -2776,6 +2786,12 @@
 					 (int) stmt->kind);
 			break;
 	}
+
+	/*
+	 * Clear the reindex stmt global reference now that triggers should have
+	 * completed
+	 */
+	currentReindexStatement = NULL;
 }
 
 /*
@@ -2827,6 +2843,8 @@ ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params, bool is
 		newparams.options |= REINDEXOPT_REPORT_PROGRESS;
 		reindex_index(indOid, false, persistence, &newparams);
+		/* Add the index to event trigger */
+		reindex_event_trigger_collect(indOid);
 	}
 }
@@ -2950,6 +2968,9 @@ ReindexTable(const RangeVar *relation, const ReindexParams *params, bool isTopLe
 			ereport(NOTICE,
 					(errmsg("table \"%s\" has no indexes to reindex",
 							relation->relname)));
+
+		/* Create even for the indexes being modified */
+		reindex_event_trigger_collect_relation(heapOid);
 	}
 
 	return heapOid;
@@ -3366,6 +3387,8 @@ ReindexMultipleInternal(const List *relids, const ReindexParams *params)
 			newparams.options |= REINDEXOPT_REPORT_PROGRESS | REINDEXOPT_MISSING_OK;
 			reindex_index(relid, false, relpersistence, &newparams);
+			/* Add the index to event trigger */
+			reindex_event_trigger_collect(relid);
 			PopActiveSnapshot();
 			/* reindex_index() does the verbose output */
 		}
@@ -3387,6 +3410,9 @@ ReindexMultipleInternal(const List *relids, const ReindexParams
Re: persist logical slots to disk during shutdown checkpoint
On Fri, Sep 1, 2023 at 1:11 PM Ashutosh Bapat wrote: > > On Thu, Aug 31, 2023 at 7:28 PM Amit Kapila wrote: > > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat > > wrote: > > > > > > > + qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), > > > > reading WAL from ([A-F0-9]+\/[A-F0-9]+)./ > > > > > > I don't think the LSN reported in this message is guaranteed to be the > > > confirmed_flush LSN of the slot. It's usually confirmed_flush but not > > > always. It's the LSN that snapshot builder computes based on factors > > > including confirmed_flush. There's a chance that this test will fail > > > sometimes because of this behaviour. > > > > > > > I think I am missing something here because as per my understanding, > > the LOG referred by the test is generated in CreateDecodingContext() > > before which we shouldn't be changing the slot's confirmed_flush LSN. > > The LOG [1] refers to the slot's persistent value for confirmed_flush, > > so how it could be different from what the test is expecting. > > > > [1] > > errdetail("Streaming transactions committing after %X/%X, reading WAL > > from %X/%X.", > >LSN_FORMAT_ARGS(slot->data.confirmed_flush), > > I was afraid that we may move confirmed_flush while creating the > snapshot builder when creating the decoding context. But I don't see > any code doing that. So maybe we are safe. > We are safe in that respect. As far as I understand there is no reason to be worried. > But if the log message > changes, this test would fail - depending upon the log message looks a > bit fragile, esp. when we have a way to access the data directly > reliably. > This message has been there from the very beginning (b89e1510) and I can't foresee a reason to change such a message. But even if it changes, we can always adjust the test output or the test accordingly, if required. I think it is a matter of preference as to which way we write the test, so let's not argue too much on this. 
I find the current way slightly more reliable, but we can change it if we see any problem. -- With Regards, Amit Kapila.
Re: Eliminate redundant tuple visibility check in vacuum
Hi, On 9/1/23 03:25, Peter Geoghegan wrote: On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman wrote: Any inserting transaction which aborts after heap_page_prune()'s visibility check will now be of no concern to lazy_scan_prune(). Since we don't do the visibility check again, we won't find the tuple HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to the array of tuples for vacuum to reap. This does mean that we won't reap and remove tuples whose inserting transactions abort right after heap_page_prune()'s visibility check. But, this doesn't seem like an issue. It's definitely not an issue. The loop is essentially a hacky way of getting a consistent picture of which tuples should be treated as HEAPTUPLE_DEAD, and which tuples need to be left behind (consistent at the level of each page and each HOT chain, at least). Making that explicit seems strictly better. They may not be removed until the next vacuum, but ISTM it is actually worse to pay the cost of re-pruning the whole page just to clean up that one tuple. Maybe if that renders the page all visible and we can mark it as such in the visibility map -- but that seems like a relatively narrow use case. The chances of actually hitting the retry are microscopic anyway. It has nothing to do with making sure that dead tuples from aborted tuples get removed for its own sake, or anything. Rather, the retry is all about making sure that all TIDs that get removed from indexes can only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD tuples with storage would very occasionally be left behind, which made life difficult in a bunch of other places -- for no good reason. That makes sense and seems like a much better compromise. Thanks for the explanations. Please update the comment to document the corner case and how we handle it. -- David Geier (ServiceNow)
Re: Synchronizing slots from primary to standby
Hi Shveta. Here are some comments for patch v14-0002 The patch is large, so my code review is a WIP... more later next week... == GENERAL 1. Patch size The patch is 2700 lines. Is it possible to break this up into smaller self-contained parts to make the reviews more manageable? ~~~ 2. PG Docs I guess there are missing PG Docs for this patch. E.g. there are new GUCs added but I see no documentation yet for them. ~ 3. Terms There are variations of what to call the sync worker - "Slot sync worker" or - "slot-sync worker" or - "slot synchronization worker" or - "slot-synchronization worker" - and others These are all in the comments and messages etc. Better to search/replace to make a consistent term everywhere. FWIW, I preferred just to call it "slot-sync worker". ~ 4. typedefs I think multiple new typedefs are added by this patch. IIUC, those should be included in the file typedefs.list so that pgindent will work properly. 5. max_slot_sync_workers GUC There is already a 'max_sync_workers_per_subscription', but that one is for "tablesync" workers. IMO it is potentially confusing now that both these GUCs have 'sync_workers' in the name. I think it would be less ambiguous to change your new GUC to 'max_slotsync_workers'. == Commit Message 6. Overview? I felt the information in this commit message is describing details of what changes are in this patch but there is no synopsis about the *purpose* of this patch as a whole. E.g. what is it for? It seemed like there should be some introductory paragraph up-front before describing all the specifics. ~~~ 7. For slots to be synchronised, another GUC is added: synchronize_slot_names: This is a runtime modifiable GUC. ~ If this is added by this patch then how come there is some SGML describing the same GUC in patch 14-0001? What is the relationship? ~~~ 8. 
Let us say slots mentioned in 'synchronize_slot_names' on primary belongs to 10 DBs and say the new GUC is set at default value of 2, then each worker will manage 5 dbs and will keep on synching the slots for them. ~ /the new GUC is set at default value of 2/'max_slot_sync_workers' is 2/ ~~~ 9. If a new DB is found by replication launcher, it will assign this new db to the worker handling the minimum number of dbs currently (or first worker in case of equal count) ~ Hmm. Isn't this only describing cases where max_slot_workers was exceeded? Otherwise, you should just launch a brand new sync-worker, right? ~~~ 10. Each worker slot will have its own dbids list. ~ It seems confusing to say "worker slot" when already talking about workers and slots. Can you reword that more like "Each slot-sync worker will have its own dbids list"? == src/backend/postmaster/bgworker.c == .../libpqwalreceiver/libpqwalreceiver.c 11. libpqrcv_list_db_for_logical_slots +/* + * List DB for logical slots + * + * It gets the list of unique DBIDs for logical slots mentioned in slot_names + * from primary. + */ +static List * +libpqrcv_list_db_for_logical_slots(WalReceiverConn *conn, Comment needs some minor tweaking. ~~~ 12. + if (strcmp(slot_names, "") != 0 && strcmp(slot_names, "*") != 0) + { + char*rawname; + List*namelist; + ListCell *lc; + + appendStringInfoChar(, ' '); + rawname = pstrdup(slot_names); + SplitIdentifierString(rawname, ',', ); + foreach (lc, namelist) + { + if (lc != list_head(namelist)) + appendStringInfoChar(, ','); + appendStringInfo(, "%s", + quote_identifier(lfirst(lc))); + } + } /rawname/rawnames/ ~~~ 13. + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + PQclear(res); + ereport(ERROR, + (errmsg("could not receive list of slots the primary server: %s", + pchomp(PQerrorMessage(conn->streamConn); + } /the primary server/from the primary server/ ~~~ 14. 
+ if (PQnfields(res) < 1) + { + int nfields = PQnfields(res); + + PQclear(res); + ereport(ERROR, + (errmsg("invalid response from primary server"), + errdetail("Could not get list of slots: got %d fields, " + "expected %d or more fields.", +nfields, 1))); + } This code seems over-complicated. If it is < 1 then it can only be zero, right? So then what is the point of calculating and displaying the 'nfields' which can only be 0? ~~~ 15. + ntuples = PQntuples(res); + for (int i = 0; i < ntuples; i++) + { + + slot_data = palloc0(sizeof(WalRecvReplicationSlotDbData)); + if (!PQgetisnull(res, i, 0)) + slot_data->database = atooid(PQgetvalue(res, i, 0)); + + slot_data->last_sync_time = 0; + slotlist = lappend(slotlist, slot_data); + } 15a. Unnecessary blank line in for-block. ~~~ 15b. Unnecessary assignment to 'last_sync_time' because the whole structure was palloc0 just 2 lines above. == src/backend/replication/logical/Makefile == src/backend/replication/logical/launcher.c 16. +/* + * Initial and incremental allocation size for dbids array for each + * SlotSyncWorker in dynamic shared memory i.e. we start with this size + * and once it is exhausted, dbids is rellocated with size
PATCH: document for regression test forgets libpq test
Hi, I found a small mistake in the documentation, in Section 33.1.3 (Additional Test Suites). > The additional tests that can be invoked this way include: The list doesn't include interface/libpq/test. I attached a patch. Thank you. Best Regards Ryo Matsumura Fujitsu Limited regress_doc_fix.diff
Move bki file pre-processing from initdb to bootstrap
Hi All, This patch moves the pre-processing for tokens in the bki file from initdb to bootstrap. With these changes the bki file will only be opened once, in bootstrap, and parsing will be done by the bootstrap parser. The flow of bki file processing will be as follows: - In initdb, gather the values used to replace the tokens in the bki file. - Pass these values into postgres bootstrap startup as key-value pairs, using the '-i' option. - In bootstrap, open the bki file (the bki file name was received as a parameter). - During the parsing of the bki file, replace the tokens received as parameters with their values. Related discussion can be found here: https://www.postgresql.org/message-id/20220216021219.ygzrtb3hd5bn7olz%40alap3.anarazel.de Note: Currently the patch breaks on Windows due to placement of extra quotes when passing parameters (thanks to Thomas Munro for helping me find that). Will follow up with v2 fixing the Windows issues with parameter passing, along with format fixes. Please review and provide feedback. -- Thanks and Regards, Krishnakumar (KK). [Microsoft] v1-0001-Move-the-pre-processing-for-tokens-in-bki-file-fr.patch
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Hello Thomas, 31.08.2023 14:15, Thomas Munro wrote: We have a signal that is pending and not blocked, so I don't immediately know why poll() hasn't returned control. When I worked at the Postgres Pro company, we observed a similar lockup under rather specific conditions (we used an Elbrus CPU and the specific Elbrus compiler (lcc) based on edg). I managed to reproduce that lockup and Anton Voloshin investigated it. The issue was caused by a compiler optimization in WaitEventSetWait():

waiting = true;
...
while (returned_events == 0)
{
    ...
    if (set->latch && set->latch->is_set)
    {
        ...
        break;
    }

In that case, the compiler decided that it may place the read of "set->latch->is_set" before the write "waiting = true". (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the issue for us.) I can't provide more details for now, but maybe you could look at the binary code generated on the target platform to confirm or reject my guess. Best regards, Alexander
Re: [PoC] pg_upgrade: allow to upgrade publisher node
Here are some review comments for v29-0002 == src/bin/pg_upgrade/check.c 1. check_old_cluster_for_valid_slots + /* Quick exit if the cluster does not have logical slots. */ + if (count_old_cluster_logical_slots() == 0) + return; /Quick exit/Quick return/ I know they are kind of the same, but the reason I previously suggested this change was to keep it consistent with the similar comment that is already in check_new_cluster_logical_replication_slots(). ~~~ 2. check_old_cluster_for_valid_slots + /* + * Do additional checks to ensure that confirmed_flush LSN of all the slots + * is the same as the latest checkpoint location. + * + * Note: This can be satisfied only when the old cluster has been shut + * down, so we skip this live checks. + */ + if (!live_check) missing word /skip this live checks./skip this for live checks./ == src/bin/pg_upgrade/controldata.c 3. + /* + * Read the latest checkpoint location if the cluster is PG17 + * or later. This is used for upgrading logical replication + * slots. Currently, we need it only for the old cluster but + * didn't add additional check for the similicity. + */ + if (GET_MAJOR_VERSION(cluster->major_version) >= 1700) /similicity/simplicity/ SUGGESTION Currently, we need it only for the old cluster but for simplicity chose not to have additional checks. == src/bin/pg_upgrade/info.c 4. get_old_cluster_logical_slot_infos_per_db + /* + * The temporary slots are expressly ignored while checking because such + * slots cannot exist after the upgrade. During the upgrade, clusters are + * started and stopped several times so that temporary slots will be + * removed. + */ + res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase " + "FROM pg_catalog.pg_replication_slots " + "WHERE slot_type = 'logical' AND " + "wal_status <> 'lost' AND " + "database = current_database() AND " + "temporary IS FALSE;"); IIUC, the removal of temp slots is just a side-effect of the start/stop; not the *reason* for the start/stop. 
So, the last sentence needs some modification BEFORE During the upgrade, clusters are started and stopped several times so that temporary slots will be removed. SUGGESTION During the upgrade, clusters are started and stopped several times causing any temporary slots to be removed. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Incremental View Maintenance, take 2
hi based on v29. based on https://stackoverflow.com/a/4014981/1560347: I added a new function append_update_set_clause, and deleted functions: {append_set_clause_for_count, append_set_clause_for_sum, append_set_clause_for_avg, append_set_clause_for_minmax} I guess this way is more extensible/generic than yours. I replaced the following code with the generic function append_update_set_clause:

+ /* For views with aggregates, we need to build SET clause for updating aggregate
+  * values. */
+ if (query->hasAggs && IsA(tle->expr, Aggref))
+ {
+     Aggref *aggref = (Aggref *) tle->expr;
+     const char *aggname = get_func_name(aggref->aggfnoid);
+
+     /*
+      * We can use function names here because it is already checked if these
+      * can be used in IMMV by its OID at the definition time.
+      */
+
+     /* count */
+     if (!strcmp(aggname, "count"))
+         append_set_clause_for_count(resname, aggs_set_old, aggs_set_new, aggs_list_buf);
+
+     /* sum */
+     else if (!strcmp(aggname, "sum"))
+         append_set_clause_for_sum(resname, aggs_set_old, aggs_set_new, aggs_list_buf);
+
+     /* avg */
+     else if (!strcmp(aggname, "avg"))
+         append_set_clause_for_avg(resname, aggs_set_old, aggs_set_new, aggs_list_buf,
+                                   format_type_be(aggref->aggtype));
+
+     else
+         elog(ERROR, "unsupported aggregate function: %s", aggname);
+ }

--<<< Attached is my refactor. There are some whitespace errors in the patches; you need to use git apply --reject --whitespace=fix basedon_v29_matview_c_refactor_update_set_clause.patch. Also, your patch cannot be applied with git apply; I finally figured out how to bulk-apply using GNU patch, from https://serverfault.com/questions/102324/apply-multiple-patch-files. Previously I just did it manually one by one. I think if you use GNU patch with { for i in $PATCHES/v29*.patch; do patch -p1 < $i; done }, it will generate an .orig file for every modified file? -< src/backend/commands/matview.c 2268: /* For tuple deletion */ maybe "/* For tuple deletion and update */" is more accurate? 
-< currently at here: src/test/regress/sql/incremental_matview.sql 98: -- support SUM(), COUNT() and AVG() aggregate functions 99: BEGIN; 100: CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i, SUM(j), COUNT(i), AVG(j) FROM mv_base_a GROUP BY i; 101: SELECT * FROM mv_ivm_agg ORDER BY 1,2,3,4; 102: INSERT INTO mv_base_a VALUES(2,100); src/backend/commands/matview.c 2858: if (SPI_exec(querybuf.data, 0) != SPI_OK_INSERT) 2859: elog(ERROR, "SPI_exec failed: %s", querybuf.data); then I debug, print out querybuf.data: WITH updt AS (UPDATE public.mv_ivm_agg AS mv SET __ivm_count__ = mv.__ivm_count__ OPERATOR(pg_catalog.+) diff.__ivm_count__ , sum = (CASE WHEN mv.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.sum IS NULL THEN diff.sum WHEN diff.sum IS NULL THEN mv.sum ELSE (mv.sum OPERATOR(pg_catalog.+) diff.sum) END), __ivm_count_sum__ = (mv.__ivm_count_sum__ OPERATOR(pg_catalog.+) diff.__ivm_count_sum__), count = (mv.count OPERATOR(pg_catalog.+) diff.count), avg = (CASE WHEN mv.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.__ivm_sum_avg__ IS NULL THEN diff.__ivm_sum_avg__ WHEN diff.__ivm_sum_avg__ IS NULL THEN mv.__ivm_sum_avg__ ELSE (mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+) diff.__ivm_sum_avg__)::numeric END) OPERATOR(pg_catalog./) (mv.__ivm_count_avg__ OPERATOR(pg_catalog.+) diff.__ivm_count_avg__), __ivm_sum_avg__ = (CASE WHEN mv.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.__ivm_sum_avg__ IS NULL THEN diff.__ivm_sum_avg__ WHEN diff.__ivm_sum_avg__ IS NULL THEN mv.__ivm_sum_avg__ ELSE (mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+) diff.__ivm_sum_avg__) END), __ivm_count_avg__ = (mv.__ivm_count_avg__ OPERATOR(pg_catalog.+) diff.__ivm_count_avg__) FROM new_delta AS diff WHERE (mv.i OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i IS NULL)) RETURNING mv.i) 
INSERT INTO public.mv_ivm_agg (i, sum, count, avg, __ivm_count_sum__, __ivm_count_avg__, __ivm_sum_avg__, __ivm_count__) SELECT i, sum, count, avg, __ivm_count_sum__, __ivm_count_avg__, __ivm_sum_avg__, __ivm_count__ FROM new_delta AS diff WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE (mv.i OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i IS NULL))); At this final SPI_exec, we have a update statement with related columns { __ivm_count_sum__, sum, __ivm_count__, count, avg, __ivm_sum_avg__, __ivm_count_avg__}. At this time, my mind stops working, querybuf.data is way too big, but I still feel like there is some logic associated with these columns, maybe we can use it as an assertion to prove that this query (querybuf.len = 1834) is indeed correct. Since the apply delta query is quite complex, I feel like adding some "if debug then print out the final querybuf.data end if" would be a good idea. we add hidden columns somewhere, also to avoid corner cases,
Re: persist logical slots to disk during shutdown checkpoint
On Thu, Aug 31, 2023 at 7:28 PM Amit Kapila wrote: > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat > wrote: > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila wrote: > > > > > > I > > > > > > think we should shut down subscriber, restart publisher and then > > > > > > make this > > > > > > check based on the contents of the replication slot instead of > > > > > > server log. > > > > > > Shutting down subscriber will ensure that the subscriber won't send > > > > > > any new > > > > > > confirmed flush location to the publisher after restart. > > > > > > > > > > > > > > > > But if we shutdown the subscriber before the publisher there is no > > > > > guarantee that the publisher has sent all outstanding logs up to the > > > > > shutdown checkpoint record (i.e., the latest record). Such a guarantee > > > > > can only be there if we do a clean shutdown of the publisher before > > > > > the subscriber. > > > > > > > > So the sequence is shutdown publisher node, shutdown subscriber node, > > > > start publisher node and carry out the checks. > > > > > > > > > > This can probably work but I still prefer the current approach as that > > > will be closer to the ideal values on the disk instead of comparison > > > with a later in-memory value of confirmed_flush LSN. Ideally, if we > > > would have a tool like pg_replslotdata which can read the on-disk > > > state of slots that would be better but missing that, the current one > > > sounds like the next best possibility. Do you see any problem with the > > > current approach of test? > > > > > + qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), > > > reading WAL from ([A-F0-9]+\/[A-F0-9]+)./ > > > > I don't think the LSN reported in this message is guaranteed to be the > > confirmed_flush LSN of the slot. It's usually confirmed_flush but not > > always. It's the LSN that snapshot builder computes based on factors > > including confirmed_flush. 
There's a chance that this test will fail
> > sometimes because of this behaviour.
>
> I think I am missing something here because as per my understanding,
> the LOG referred by the test is generated in CreateDecodingContext(),
> before which we shouldn't be changing the slot's confirmed_flush LSN.
> The LOG [1] refers to the slot's persistent value for confirmed_flush,
> so how could it be different from what the test is expecting?
>
> [1]
> errdetail("Streaming transactions committing after %X/%X, reading WAL
> from %X/%X.",
> LSN_FORMAT_ARGS(slot->data.confirmed_flush),

I was afraid that we may move confirmed_flush while creating the snapshot
builder as part of creating the decoding context. But I don't see any code
doing that, so maybe we are safe. But if the log message changes, this test
would fail - depending upon the log message seems a bit fragile, esp. when we
have a way to access the data directly and reliably.

--
Best Wishes,
Ashutosh Bapat
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Fri, Sep 1, 2023 at 10:16 AM Hayato Kuroda (Fujitsu) wrote:
>
> +	/*
> +	 * Note: This must be done after doing the pg_resetwal command because
> +	 * pg_resetwal would remove required WALs.
> +	 */
> +	if (count_old_cluster_logical_slots())
> +	{
> +		start_postmaster(&old_cluster, true);
> +		create_logical_replication_slots();
> +		stop_postmaster(false);
> +	}

Can we combine this code with the code in the function
issue_warnings_and_set_wal_level()? That will avoid starting/stopping
the server for creating slots.

--
With Regards,
Amit Kapila.
Assert failure in ATPrepAddPrimaryKey
I ran into an Assert failure in ATPrepAddPrimaryKey() with the query below:

CREATE TABLE t0(c0 boolean);
CREATE TABLE t1() INHERITS(t0);

# ALTER TABLE t0 ADD CONSTRAINT m EXCLUDE ((1) WITH =);
server closed the connection unexpectedly

The related codes are

    foreach(lc, stmt->indexParams)
    {
        IndexElem  *elem = lfirst_node(IndexElem, lc);
        Constraint *nnconstr;

        Assert(elem->expr == NULL);

It seems to be introduced by b0e96f3119.

Thanks
Richard
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Fri, Sep 1, 2023 at 2:47 AM Joe Conway wrote:
>
> On 8/31/23 12:52, Jeff Davis wrote:
> > On Thu, 2023-08-31 at 10:59 +0530, Ashutosh Bapat wrote:
> >> The server's FDW has to be postgres_fdw. So we have to handle the
> >> awkward dependency between core and postgres_fdw (an extension).
> >
> > That sounds more than just "awkward". I can't think of any precedent
> > for that and it seems to violate the idea of an "extension" entirely.
> >
> > Can you explain more concretely how we might resolve that?
>
> Maybe move postgres_fdw to be a first-class built-in feature instead of
> an extension?

Yes, that's one way. Thinking larger, how about we allow any FDW to be used
here? We might as well allow extensions to start logical receivers which
accept changes from non-PostgreSQL databases, so we don't have to make an
exception for postgres_fdw. But I think there's some value in bringing
together these two subsystems, which deal with foreign data logically (as in
the logical vs. physical view of data).

--
Best Wishes,
Ashutosh Bapat
Re: Improve heapgetpage() performance, overhead from serializable
This thread [1] is also about improving the heapgetpage function, and looks
similar to this one.

[1] https://www.postgresql.org/message-id/a9f40066-3d25-a240-4229-ec2fbe94e7a5%40yeah.net

Muhammad Malik wrote on Fri, Sep 1, 2023 at 04:04:
> Hi,
>
> Is there a plan to merge this patch in PG16?
>
> Thanks,
> Muhammad
>
> --
> *From:* Andres Freund
> *Sent:* Saturday, July 15, 2023 6:56 PM
> *To:* pgsql-hack...@postgresql.org
> *Cc:* Thomas Munro
> *Subject:* Improve heapgetpage() performance, overhead from serializable
>
> Hi,
>
> Several loops which are important for query performance, like heapgetpage()'s
> loop over all tuples, have to call functions like
> HeapCheckForSerializableConflictOut() and PredicateLockTID() in every
> iteration.
>
> When serializable is not in use, all those functions do is return. But
> being situated in a different translation unit, the compiler can't inline
> (without LTO at least) the check whether serializability is needed. It's not
> just the function call overhead that's noticeable, it's also that registers
> have to be spilled to the stack / reloaded from memory etc.
>
> On a freshly loaded pgbench scale 100, with turbo mode disabled, postgres
> pinned to one core. Parallel workers disabled to reduce noise. All times are
> the average of 15 executions with pgbench, in a newly started, but prewarmed
> postgres.
> > SELECT * FROM pgbench_accounts OFFSET 1000;
> HEAD:
> 397.977
>
> removing the HeapCheckForSerializableConflictOut() from heapgetpage()
> (incorrect!), to establish the baseline of what serializable costs:
> 336.695
>
> pulling out CheckForSerializableConflictOutNeeded() from
> HeapCheckForSerializableConflictOut() in heapgetpage(), and avoiding calling
> HeapCheckForSerializableConflictOut() in the loop:
> 339.742
>
> moving the loop into a static inline function, marked as pg_always_inline,
> called with static arguments for always_visible, check_serializable:
> 326.546
>
> marking the always_visible, !check_serializable case likely():
> 322.249
>
> removing TestForOldSnapshot() calls, which we pretty much already decided on:
> 312.987
>
> FWIW, there's more we can do; with some hacky changes I got the time down to
> 273.261, but the tradeoffs start to be a bit more complicated. And
> 397->320ms for something as core as this is imo worth considering on its own.
>
> Now, this just affects the sequential scan case. heap_hot_search_buffer()
> shares many of the same pathologies. I find it a bit harder to improve,
> because the compiler's code generation seems to switch between good / bad
> with changes that seem unrelated...
>
> I wonder why we haven't used PageIsAllVisible() in heap_hot_search_buffer()
> so far?
>
> Greetings,
>
> Andres Freund