Re: Add basic tests for the low-level backup method.

2024-03-14 Thread David Steele

On 3/15/24 18:32, Michael Paquier wrote:

> On Fri, Mar 15, 2024 at 06:23:15PM +1300, David Steele wrote:
>> Well, this is what we recommend in the docs, i.e. using bin mode to save
>> backup_label, so it seems OK to me.
>
> Indeed, I didn't notice that this is actually documented, so what I
> did took the right angle.  French flair, perhaps..


This seems like a reasonable explanation to me.

-David




Re: Add basic tests for the low-level backup method.

2024-03-14 Thread Michael Paquier
On Fri, Mar 15, 2024 at 06:23:15PM +1300, David Steele wrote:
> Well, this is what we recommend in the docs, i.e. using bin mode to save
> backup_label, so it seems OK to me.

Indeed, I didn't notice that this is actually documented, so what I
did took the right angle.  French flair, perhaps..
--
Michael




Re: Add basic tests for the low-level backup method.

2024-03-14 Thread Michael Paquier
On Fri, Mar 15, 2024 at 08:38:47AM +0900, Michael Paquier wrote:
> That's why these tests are not that easy, they can be racy.  I've run
> the test 5~10 times in the CI this time to gain more confidence, and
> saw zero failures with the stability fixes in place including Windows.
> I've applied it now, as I can still monitor the buildfarm for a few
> more days.  Let's see what happens, but that should be better.

So, it looks like the buildfarm is clear.  sidewinder has reported a
green state, and the recent runs of the CFbot across all the patches
are looking stable as well on all platforms.  There are still a few
buildfarm members on Windows that will take some more time before
running.
--
Michael




Re: Add basic tests for the low-level backup method.

2024-03-14 Thread David Steele

On 3/15/24 12:38, Michael Paquier wrote:

> On Fri, Mar 15, 2024 at 09:40:38AM +1300, David Steele wrote:
>> Is the missing test in meson the reason we did not see test failures for
>> Windows in CI?
>
> The test has to be listed in src/test/recovery/meson.build or the CI
> would ignore it.


Right -- I will keep this in mind for the future.


> The second LOG is something that can be acted on.  I've added some
> debugging to the parsing of the backup_label file in the backend, and
> noticed that the first fscanf() for START WAL LOCATION is failing
> because the last %c is detected as \r rather than \n.  Tweaking the
> contents stored from pg_backup_stop() with a sed won't help, because
> the issue is that we write the CRLFs with append_to_file, and the
> startup process cannot cope with that.  The simplest method I can
> think of is to use binmode, as of the attached.


Yeah, that makes sense.
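
For anyone following along, here is a tiny standalone illustration (a
hedged sketch, not the backend's actual parsing code) of why a CRLF line
ending makes the trailing %c check fail:

#include <stdio.h>

int
main(void)
{
	/* backup_label first line, written with a CRLF line ending */
	const char *line =
		"START WAL LOCATION: 0/2000028 (file 000000010000000000000002)\r\n";
	unsigned int hi, lo, tli;
	char		fname[17];
	char		ch = 0;
	int			n;

	/* simplified version of the startup process's format string */
	n = sscanf(line, "START WAL LOCATION: %X/%X (file %08X%16s)%c",
			   &hi, &lo, &tli, fname, &ch);

	if (n == 5 && ch == '\n')
		printf("parsed ok\n");
	else
		printf("parse failed: got %d fields, last char is 0x%02x\n",
			   n, (unsigned char) ch);
	return 0;
}

With \r\n the %c picks up the carriage return, so the check for '\n'
fails, which matches the LOG described above; writing the file in binmode
avoids the extra \r on Windows.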


> I am wondering if there is a better trick here that would not require
> changes in the backend to make the backup_label parsing more flexible,
> though.


Well, this is what we recommend in the docs, i.e. using bin mode to save 
backup_label, so it seems OK to me.



> I am attaching an updated patch with all that fixed, which is stable
> in the CI and any tests I've run.  Do you have any comments about


These changes look good to me. Sure wish we had an easier way to test
commits in the build farm.


> That's why these tests are not that easy; they can be racy.  I've run
> the test 5~10 times in the CI this time to gain more confidence, and
> saw zero failures with the stability fixes in place including Windows.
> I've applied it now, as I can still monitor the buildfarm for a few
> more days.  Let's see what happens, but that should be better.


At least sidewinder is happy now -- and the build farm in general as far 
as I can see.


Thank you for your help on this!
-David




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-14 Thread Bharath Rupireddy
On Wed, Mar 13, 2024 at 9:38 AM Amit Kapila  wrote:
>
> BTW, doesn't the XID-based parameter 'max_slot_xid_age' have some similarity
> with 'max_slot_wal_keep_size'? I think it will impact the rows we
> remove based on xid horizons. Don't we need to consider it while
> vacuum computes the xid horizons in ComputeXidHorizons(), similar to
> what we do for WAL w.r.t. 'max_slot_wal_keep_size'?

I'm having a hard time understanding why we'd need something up there
in ComputeXidHorizons(). Can you elaborate a bit, please?

What's proposed with max_slot_xid_age is that during checkpoint we
look at the slot's xmin and catalog_xmin, and the current system txn id.
Then, if the XID age of (xmin, catalog_xmin) relative to current_xid
crosses max_slot_xid_age, we invalidate the slot (a small code sketch of
this check follows the illustration below).  Let me illustrate how all
this works:

1. Set up a primary and a standby with hot_standby_feedback set to on on
the standby. For instance, check my scripts at [1].

2. Stop the standby to make the slot inactive on the primary. Check
the slot is holding xmin of 738.
./pg_ctl -D sbdata -l logfilesbdata stop

postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]---+-
slot_name   | sb_repl_slot
plugin  |
slot_type   | physical
datoid  |
database|
temporary   | f
active  | f
active_pid  |
xmin| 738
catalog_xmin|
restart_lsn | 0/300
confirmed_flush_lsn |
wal_status  | reserved
safe_wal_size   |
two_phase   | f
conflict_reason |
failover| f
synced  | f

3. Start consuming XIDs on the primary with, for instance, the following
script:
./psql -d postgres -p 5432
DROP TABLE tab_int;
CREATE TABLE tab_int (a int);

do $$
begin
  for i in 1..268435 loop
-- use an exception block so that each iteration eats an XID
begin
  insert into tab_int values (i);
exception
  when division_by_zero then null;
end;
  end loop;
end$$;

4. Make some dead rows in the table.
update tab_int set a = a+1;
delete from tab_int where a%4=0;

postgres=# SELECT n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del FROM
pg_stat_user_tables WHERE relname = 'tab_int';
-[ RECORD 1 ]--
n_dead_tup | 335544
n_tup_ins  | 268435
n_tup_upd  | 268435
n_tup_del  | 67109

5. Try vacuuming to delete the dead rows, observe 'tuples: 0 removed,
536870 remain, 335544 are dead but not yet removable'. The dead rows
can't be removed because the inactive slot is holding an xmin, see
'removable cutoff: 738, which was 268441 XIDs old when operation
ended'.

postgres=# vacuum verbose tab_int;
INFO:  vacuuming "postgres.public.tab_int"
INFO:  finished vacuuming "postgres.public.tab_int": index scans: 0
pages: 0 removed, 2376 remain, 2376 scanned (100.00% of total)
tuples: 0 removed, 536870 remain, 335544 are dead but not yet removable
removable cutoff: 738, which was 268441 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 4759 hits, 0 misses, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes
system usage: CPU: user: 0.07 s, system: 0.00 s, elapsed: 0.07 s
VACUUM

6. Now, repeat the above steps but with max_slot_xid_age = 20 set on
the primary.

7. Do a checkpoint to invalidate the slot.
postgres=# checkpoint;
CHECKPOINT
postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]---+-
slot_name   | sb_repl_slot
plugin  |
slot_type   | physical
datoid  |
database|
temporary   | f
active  | f
active_pid  |
xmin| 738
catalog_xmin|
restart_lsn | 0/300
confirmed_flush_lsn |
wal_status  | lost
safe_wal_size   |
two_phase   | f
conflicting |
failover| f
synced  | f
invalidation_reason | xid_aged

8. And, then vacuum the table, observe 'tuples: 335544 removed, 201326
remain, 0 are dead but not yet removable'.

postgres=# vacuum verbose tab_int;
INFO:  vacuuming "postgres.public.tab_int"
INFO:  finished vacuuming "postgres.public.tab_int": index scans: 0
pages: 0 removed, 2376 remain, 2376 scanned (100.00% of total)
tuples: 335544 removed, 201326 remain, 0 are dead but not yet removable
removable cutoff: 269179, which was 0 XIDs old when operation ended
new relfrozenxid: 269179, which is 268441 XIDs ahead of previous value
frozen: 1189 pages from table (50.04% of total) had 201326 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 193.100 MB/s
buffer usage: 4760 hits, 0 misses, 2381 dirtied
WAL usage: 5942 records, 2378 full page images, 8343275 bytes
system usage: CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.09 s
VACUUM
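
To make the checkpoint-time check above concrete, here is a minimal
hedged sketch of the age comparison; the function name and its arguments
are illustrative assumptions based on the description, not the patch's
actual code:

#include "postgres.h"
#include "access/transam.h"

static bool
slot_xid_age_exceeded(TransactionId slot_xmin,
					  TransactionId slot_catalog_xmin,
					  TransactionId next_xid,
					  int max_slot_xid_age)
{
	if (max_slot_xid_age <= 0)
		return false;			/* feature disabled */

	/* age of an xmin: how many XIDs were consumed since it was set */
	if (TransactionIdIsNormal(slot_xmin) &&
		(int32) (next_xid - slot_xmin) > max_slot_xid_age)
		return true;

	if (TransactionIdIsNormal(slot_catalog_xmin) &&
		(int32) (next_xid - slot_catalog_xmin) > max_slot_xid_age)
		return true;

	return false;
}

If this returns true for a slot during checkpoint, the slot gets marked
invalidated, after which vacuum no longer holds back its xmin, as the
example above shows.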

[1]
cd 

Re: POC, WIP: OR-clause support for indexes

2024-03-14 Thread Andrei Lepikhov

On 14/3/2024 17:39, Alexander Korotkov wrote:

> Thank you, Andrei.  Looks like a very undesirable side effect.  Do you
> have any idea why it happens?  Partition pruning should work correctly
> for both transformed and non-transformed quals, why does
> transformation hurt it?

Now we have the v23-0001-* patch with all issues resolved. The last one,
which caused pruning to happen at the execution stage, was about the
necessity of evaluating the SAOP expression right after the
transformation. In the previous version the core executed it on the
transformed expressions.


> As you can see this case is not related to partial indexes.  Just no
> index is selective for the whole query.  However, splitting the scan by
> the OR qual lets us use a combination of two selective indexes.

Thanks for the case. I will try to resolve it.

--
regards,
Andrei Lepikhov
Postgres Professional
From 156c00c820a38e5e1856f07363af87b3109b5d77 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
during the preliminary stage of optimization, when we are still working with
the expression tree.
Here C is a constant expression, 'expr' is a non-constant expression, and 'op'
is an operator which returns a boolean result and has a commuter (for the case
of reversed order of the constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes this can lead to a less optimal plan, but we think it is better to
have an array of elements instead of a long list of OR clauses. There is room
for further optimization by decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|   8 +-
 doc/src/sgml/config.sgml  |  17 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/optimizer/prep/prepqual.c | 374 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   2 +
 src/test/regress/expected/create_index.out| 156 +++-
 src/test/regress/expected/join.out|  62 ++-
 src/test/regress/expected/partition_prune.out | 215 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 19 files changed, 929 insertions(+), 58 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 58a603ac56..a965b43cc6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8838,18 +8838,18 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition is a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
- QUERY PLAN
 
-
+  QUERY PLAN   
   
+--
  Update on public.utrtest
Output: utrtest_1.a, utrtest_1.b
Foreign Update on public.remp utrtest_1
Update on public.locp utrtest_2
->  Append
  ->  Foreign Update on public.remp utrtest_1
-   Remote SQL: UPDATE public.loct SET a = 1 WHERE (((a = 1) OR (a = 2))) RETURNING a, b
+   Remote SQL: UPDATE public.loct SET a = 1 WHERE ((a = ANY ('{1,2}'::integer[]))) RETURNING a, b
  ->  Seq Scan on public.locp utrtest_2
Output: 1, utrtest_2.tableoid, utrtest_2.ctid, NULL::record
-   Filter: ((utrtest_2.a = 1) OR (utrtest_2.a = 2))
+   Filter: (utrtest_2.a = ANY ('{1,2}'::integer[]))
 (10 rows)
 
 -- The new values are concatenated with ' triggered !'
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 65a6e6c408..2de6ae301a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5472,6 +5472,23 @@ ANY num_sync ( 
+  enable_or_transformation (boolean)
+   
+enable_or_transformation configuration 
parameter
+   
+  
+  
+   
+Enables or disables the query 

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-14 Thread shveta malik
On Thu, Mar 14, 2024 at 7:58 PM Bharath Rupireddy
 wrote:
>
> On Thu, Mar 14, 2024 at 12:24 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 13, 2024 at 9:24 PM Bharath Rupireddy
> > >
> > > Yes, there will be some sort of duplicity if we emit conflict_reason
> > > as a text field. However, I still think the better way is to turn
> > > conflict_reason text to conflict boolean and set it to true only on
> > > rows_removed and wal_level_insufficient invalidations. When conflict
> > > boolean is true, one (including all the tests that we've added
> > > recently) can look for invalidation_reason text field for the reason.
> > > This sounds reasonable to me as opposed to we just mentioning in the
> > > docs that "if invalidation_reason is rows_removed or
> > > wal_level_insufficient it's the reason for conflict with recovery".

+1 on maintaining both conflicting and invalidation_reason

> > Fair point. I think we can go either way. Bertrand, Nathan, and
> > others, do you have an opinion on this matter?
>
> While we wait to hear from others on this, I'm attaching the v9 patch
> set implementing the above idea (check 0001 patch). Please have a
> look. I'll come back to the other review comments soon.

Thanks for the patch. JFYI, patch09 does not apply to HEAD, some
recent commit caused the conflict.

Some trivial comments on patch001 (yet to review other patches)

1)
info.c:

- "%s as caught_up, conflict_reason IS NOT NULL as invalid "
+ "%s as caught_up, invalidation_reason IS NOT NULL as invalid "

Can we revert to 'conflicting as invalid', since it is a query for
logical slots only?

2)
040_standby_failover_slots_sync.pl:

- q{SELECT conflict_reason IS NULL AND synced AND NOT temporary FROM
pg_replication_slots WHERE slot_name = 'lsub1_slot';}
+ q{SELECT invalidation_reason IS NULL AND synced AND NOT temporary
FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';}

Here too, can we have 'NOT conflicting' instead of
'invalidation_reason IS NULL', as it is a logical slot test?

thanks
Shveta




Re: Add publisher and subscriber to glossary documentation.

2024-03-14 Thread Euler Taveira
On Fri, Mar 15, 2024, at 1:14 AM, Amit Kapila wrote:
> I think node should mean instance for both physical and logical
> replication; otherwise, it would be confusing. We need both usages, as
> a particular publication/subscription is defined at the database level
> but the server on which we define those is referred to as a
> node/instance.

If you are creating a subscription that connects to the same instance
(replication between 2 databases in the same cluster), your definition is not
correct and Alvaro's definition is accurate. The node definition is closely
linked to the connection string. While physical replication does not specify
a database (meaning "any database", i.e. referring to an instance), logical
replication requires a database.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Skip collecting decoded changes of already-aborted transactions

2024-03-14 Thread Ajin Cherian
On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada 
wrote:

>
> I resumed working on this item. I've attached the new version patch.
>
> I rebased the patch to the current HEAD and updated comments and
> commit messages. The patch is straightforward and I'm somewhat
> satisfied with it, but I'm thinking of adding some tests for it.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com


I just had a look at the patch; it no longer applies because of the
removal of a header in a recent commit. Overall the patch looks fine, and I
didn't find any issues. Some cosmetic comments:
in ReorderBufferCheckTXNAbort()
+ /* Quick return if we've already knew the transaction status */
+ if (txn->aborted)
+ return true;

knew/know

/*
+ * If logical_replication_mode is "immediate", we don't check the
+ * transaction status so the caller always process this transaction.
+ */
+ if (debug_logical_replication_streaming ==
DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)
+ return false;

/process/processes

regards,
Ajin Cherian
Fujitsu Australia


Re: Add publisher and subscriber to glossary documentation.

2024-03-14 Thread Amit Kapila
On Thu, Mar 14, 2024 at 7:51 PM Alvaro Herrera  wrote:
>
> On 2024-Mar-14, Shlok Kyal wrote:
>
> > Andrew Atkinson wrote:
> >
> > > Anyway, hopefully these examples show “node” and “database” are
> > > mixed and perhaps others agree using one consistently might help the
> > > goals of the docs.
> >
> > For me the existing content looks good, I felt let's keep it as it is
> > unless others feel differently.
>
> Actually it's these small terminology glitches that give me pause.  If
> we're going to have terms that are interchangeable (in this case "node"
> and "database"), then they should be always interchangeable, not just in
> some unspecified cases.  Maybe the idea of using "node" (which sounds
> like something that's instance-wide) is wrong for logical replication,
> which is necessarily something that happens database-locally.
>
> Then again, maybe defining "node" as something that exists at a
> database-local level when used in the context of logical replication is
> sufficient.  In that case, it would be better to avoid defining it as a
> synonym of "instance".  Then the terms are not always interchangeable,
> but it's clear when they are and when they aren't.
>
> "Node: in replication, each of the endpoints to which or
> from which data is replicated.  In the context of physical replication,
> each node is an instance.  In the context of logical replication, each
> node is a database".
>

I think node should mean instance for both physical and logical
replication; otherwise, it would be confusing. We need both usages, as
a particular publication/subscription is defined at the database level
but the server on which we define those is referred to as a
node/instance.

One of the usages pointed out by Andrew: "The subscriber database..."
[1] is unclear but I feel we can use node there as well instead of
database.

[1] - 
https://www.postgresql.org/docs/current/logical-replication-subscription.html

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-14 Thread Euler Taveira
On Wed, Mar 13, 2024, at 10:09 AM, Shlok Kyal wrote:
> Added a top-up patch v28-0005 to fix this issue.
> I am not changing the version as v28-0001 to v28-0004 is the same as above.

Thanks for your review!

I'm posting a new patch (v29) that merges the previous patches (v28-0002 and
v28-0003). I applied the fix provided by Hayato [1]. It was an oversight during
a rebase. I also included the patch proposed by Shlok [2] that stops the target
server on error if it is running.

Tomas suggested in [3] that maybe the PID should be replaced with something
else that has more entropy. Instead of the PID, it now uses a random number
for the replication slot and subscription names. There is also a concern about
converting multiple standbys, which would otherwise end up with the same
publication name; the same random number is added to the publication name so
that it doesn't fail because the publication already exists. Documentation was
changed based on Tomas's feedback.

The user name was always included in the subscriber connection string. Let's
have libpq choose it instead. While at it, a new routine (get_sub_conninfo)
now contains the code to build the subscriber connection string.

As I said in [4], there wasn't a way to specify a different configuration file.
If your cluster has a postgresql.conf outside PGDATA, when pg_createsubscriber
starts the server it will fail. The new --config-file option lets you specify
the postgresql.conf location, and the server then starts just fine.

I also made some changes in the start_standby_server routine. I replaced the
strcat and snprintf calls with appendPQExpBuffer, which is now used to build
the pg_ctl command.
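
To show the shape of that change, here is a hedged sketch of building the
pg_ctl start command with a PQExpBuffer; the variable names (pg_ctl_path,
datadir, config_file) and the surrounding function are illustrative
assumptions, not pg_createsubscriber's actual code:

#include "postgres_fe.h"
#include "common/fe_memutils.h"
#include "pqexpbuffer.h"

static char *
build_start_command(const char *pg_ctl_path, const char *datadir,
					const char *config_file)
{
	PQExpBuffer cmd = createPQExpBuffer();
	char	   *result;

	appendPQExpBuffer(cmd, "\"%s\" start -D \"%s\" -s", pg_ctl_path, datadir);
	if (config_file != NULL)
		appendPQExpBuffer(cmd, " -o \"-c config_file=%s\"", config_file);

	result = pg_strdup(cmd->data);
	destroyPQExpBuffer(cmd);
	return result;
}

Compared to a fixed-size buffer with strcat/snprintf, the buffer grows as
needed and the formatting stays in one place.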


[1] 
https://www.postgresql.org/message-id/TYCPR01MB12077FD21BB186C5A685C0BF3F52A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[2] 
https://www.postgresql.org/message-id/CANhcyEW6-dH28gLbFc5XpDTJ6JPizU%2Bt5g-aKUWJBf5W_Zriqw%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/6423dfeb-a729-45d3-b71e-7bf1b3adb0c9%40enterprisedb.com
[4] 
https://www.postgresql.org/message-id/d898faad-f6d7-4b0d-b816-b9dcdf490685%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


v29-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip


Re: Inconsistent printf placeholders

2024-03-14 Thread David Rowley
On Fri, 15 Mar 2024 at 15:27, Kyotaro Horiguchi  wrote:
> I have considered only the two messages. Actually, buffile.c and md.c
> are already like that. The attached aligns the messages in
> pg_combinebackup.c and reconstruct.c with the precedents.

This looks like a worthy cause to make translator work easier.

I don't want to widen the goalposts or anything, but just wondering if
you'd searched for any others that could get similar treatment?

I only just had a quick look at the following.

$ cat src/backend/po/fr.po | grep -E "^msgid\s" | sed -E 's/%[a-zA-Z]+/\%/g' | sort | uniq -d -c
 31 msgid ""
  2 msgid "could not accept SSL connection: %"
  2 msgid "could not initialize LDAP: %"
  2 msgid "could not look up local user ID %: %"
  2 msgid "could not open file \"%\": %"
  2 msgid "could not read file \"%\": read % of %"
  2 msgid "could not read from log segment %, offset %: %"
  2 msgid "could not read from log segment %, offset %: read % of %"
  2 msgid "index % out of valid range, 0..%"
  2 msgid "invalid value for parameter \"%\": %"
  2 msgid "%%% is outside the valid range for parameter \"%\" (% .. %)"
  2 msgid "must be owner of large object %"
  2 msgid "oversize GSSAPI packet sent by the client (% > %)"
  2 msgid "permission denied for large object %"
  2 msgid "string is too long for tsvector (% bytes, max % bytes)"
  2 msgid "timestamp out of range: \"%\""
  2 msgid "Valid values are between \"%\" and \"%\"."

I've not looked at any of the above to determine how hard it would be
to make the formats consistent.  The 3rd last one seems similar enough
that it might be worth doing together with this?

David




Re: Possibility to disable `ALTER SYSTEM`

2024-03-14 Thread Bruce Momjian
On Thu, Mar 14, 2024 at 07:43:15PM -0400, Robert Haas wrote:
> On Thu, Mar 14, 2024 at 5:15 PM Maciek Sakrejda  wrote:
> > It's not a security feature: it's a usability feature.
> >
> > It's a usability feature because, when Postgres configuration is
> > managed by an outside mechanism (e.g., as in a Kubernetes
> > environment), ALTER SYSTEM currently allows a superuser to make
> > changes that appear to work, but may be discarded at some point in the
> > future when that outside mechanism updates the config. They may also
> > be represented incorrectly in a management dashboard if that dashboard
> > is based on the values in the outside configuration mechanism, rather
> > than values directly from Postgres.
> >
> > In this case, the end user with access to Postgres superuser
> > privileges presumably also has access to the outside configuration
> > mechanism. The goal is not to prevent them from changing settings, but
> > to offer guard rails that prevent them from changing settings in a way
> > that will be unstable (revertible by a future update) or confusing
> > (not showing up in a management UI).
> >
> > There are challenges here in making sure this is _not_ seen as a
> > security feature. But I do think the feature itself is sensible and
> > worthwhile.
> 
> This is what I would have said if I'd tried to offer an explanation,
> except you said it better than I would have done.

I do think the docs need to clearly say this is not a security feature.
In fact, I wonder if the ALTER SYSTEM error message should explain the
GUC that is causing the failure.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Reports on obsolete Postgres versions

2024-03-14 Thread Bruce Momjian
On Thu, Mar 14, 2024 at 10:46:28PM -0400, Bruce Momjian wrote:
> > In the end, while I certainly don't mind improving the web page, I
> > think that a lot of what we're seeing here probably has to do with the
> > growing popularity and success of PostgreSQL. If you have more people
> > using your software, you're also going to have more people using
> > out-of-date versions of your software.
> 
> Yeah, probably, and we recently end-of-life'ed PG 11.

In a way, the issue is that we had more users during the PG 10/11 period
than before, and those people aren't upgrading as quickly.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-03-14 Thread Masahiko Sawada
On Thu, Mar 14, 2024 at 9:03 PM Masahiko Sawada  wrote:
>
> On Thu, Mar 14, 2024 at 6:55 PM John Naylor  wrote:
> >
> > On Thu, Mar 14, 2024 at 12:06 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Mar 14, 2024 at 1:29 PM John Naylor  
> > > wrote:
> > > > Okay, here's an another idea: Change test_lookup_tids() to be more
> > > > general and put the validation down into C as well. First we save the
> > > > blocks from do_set_block_offsets() into a table, then with all those
> > > > blocks lookup a sufficiently-large range of possible offsets and save
> > > > found values in another array. So the static items structure would
> > > > have 3 arrays: inserts, successful lookups, and iteration (currently
> > > > the iteration output is private to check_set_block_offsets(). Then
> > > > sort as needed and check they are all the same.
> > >
> > > That's a promising idea. We can use the same mechanism for randomized
> > > tests too. If you're going to work on this, I'll do other tests on my
> > > environment in the meantime.
> >
> > Some progress on this in v72 -- I tried first without using SQL to
> > save the blocks, just using the unique blocks from the verification
> > array. It seems to work fine.
>
> Thanks!
>
> >
> > - Since there are now three arrays we should reduce max bytes to
> > something smaller.
>
> Agreed.
>
> > - Further on that, I'm not sure if the "is full" test is telling us
> > much. It seems we could make max bytes a static variable and set it to
> > the size of the empty store. I'm guessing it wouldn't take much to add
> > enough tids so that the contexts need to allocate some blocks, and
> > then it would appear full and we can test that. I've made it so all
> > arrays repalloc when needed, just in case.
>
> How about using work_mem as max_bytes instead of having it as a static
> variable? In test_tidstore.sql we set work_mem before creating the
> tidstore. It would make the tidstore more controllable by SQL queries.
>
> > - Why are we switching to TopMemoryContext? It's not explained -- the
> > comment only tells what the code is doing (which is obvious), but not
> > why.
>
> This is because the tidstore needs to live across the transaction
> boundary. We can use TopMemoryContext or CacheMemoryContext.
>
> > - I'm not sure it's useful to keep test_lookup_tids() around. Since we
> > now have a separate lookup test, the only thing it can tell us is that
> > lookups fail on an empty store. I arranged it so that
> > check_set_block_offsets() works on an empty store. Although that's
> > even more trivial, it's just reusing what we already need.
>
> Agreed.
>

I have two questions on tidstore.c:

+/*
+ * Set the given TIDs on the blkno to TidStore.
+ *
+ * NB: the offset numbers in offsets must be sorted in ascending order.
+ */

Do we need some assertions to check that the given offset numbers are
sorted as expected?
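
Something along these lines might do; this is a hedged sketch, and the
parameter names (offsets, num_offsets) are assumptions based on the
comment quoted above rather than the actual patch:

#ifdef USE_ASSERT_CHECKING
	/* verify the caller passed the offset numbers in ascending order */
	for (int i = 1; i < num_offsets; i++)
		Assert(offsets[i - 1] < offsets[i]);
#endif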

---
+   if (TidStoreIsShared(ts))
+   found = shared_rt_set(ts->tree.shared, blkno, page);
+   else
+   found = local_rt_set(ts->tree.local, blkno, page);
+
+   Assert(!found);

Given TidStoreSetBlockOffsets() is designed to always set (i.e.
overwrite) the value, I think we should not expect that found is
always false.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Reports on obsolete Postgres versions

2024-03-14 Thread Bruce Momjian
On Thu, Mar 14, 2024 at 10:15:18AM -0400, Robert Haas wrote:
> I think that whatever we say here should focus on what we try to do or
> guarantee, not on what actions we think users ought to take, never
> mind must take. We can say that we try to avoid making any changes
> upon which an application might be relying -- but there surely is some
> weasel-wording there, because we have made such changes before in the
> name of security, and sometimes to fix bugs, and we will likely to do
> so again in the future. But it's not for us to decide how much testing
> is warranted. It's the user's system, not ours.

Yes, good point, let's tell them what our goals are and they can decide
what testing they need.

> In the end, while I certainly don't mind improving the web page, I
> think that a lot of what we're seeing here probably has to do with the
> growing popularity and success of PostgreSQL. If you have more people
> using your software, you're also going to have more people using
> out-of-date versions of your software.

Yeah, probably, and we recently end-of-life'ed PG 11.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Typos in reorderbuffer.c.

2024-03-14 Thread Kyotaro Horiguchi
At Thu, 14 Mar 2024 11:23:38 +0530, Amit Kapila  wrote 
in 
> On Thu, Mar 14, 2024 at 9:58 AM Kyotaro Horiguchi
>  wrote:
> >
> > While examining reorderbuffer.c, I found several typos. I'm not sure
> > if fixing them is worthwhile, but I've attached a fix just in case.
> >
> 
> LGTM. I'll push this in some time.

Thanks!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Inconsistent printf placeholders

2024-03-14 Thread Kyotaro Horiguchi
Thank you for the suggestions.

At Thu, 14 Mar 2024 13:45:41 +0100, Daniel Gustafsson  wrote 
in 
> I've only skimmed this so far but +1 on keeping the messages the same where
> possible to reduce translation work.  Adding a comment on the message where 
> the
> casting is done to indicate that it is for translation might reduce the risk 
> of
> it "getting fixed" down the line.

Added a comment "/* cast xxx to avoid extra translatable messages */".

At Thu, 14 Mar 2024 14:02:46 +0100, Peter Eisentraut  
wrote in 
> If you want to make them uniform, then I suggest the error messages

Yeah. Having the same messages with only the placeholders changed is
not very pleasing during translation. If possible, I would like to
align them.

> should both be "%zd of %zu bytes", which are the actual types read()
> deals with.

I have considered only the two messages. Actually, buffile.c and md.c
are already like that. The attached aligns the messages in
pg_combinebackup.c and reconstruct.c with the precedents.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
index 6f0814d9ac..feb4d5dcf4 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -1301,8 +1301,9 @@ slurp_file(int fd, char *filename, StringInfo buf, int maxlen)
 		if (rb < 0)
 			pg_fatal("could not read file \"%s\": %m", filename);
 		else
-			pg_fatal("could not read file \"%s\": read only %zd of %lld bytes",
-	 filename, rb, (long long int) st.st_size);
+			/* cast st_size to avoid extra translatable messages */
+			pg_fatal("could not read file \"%s\": read only %zd of %zu bytes",
+	 filename, rb, (size_t) st.st_size);
 	}
 
 	/* Adjust buffer length for new data and restore trailing-\0 invariant */
diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c
index 41f06bb26b..a4badb90e2 100644
--- a/src/bin/pg_combinebackup/reconstruct.c
+++ b/src/bin/pg_combinebackup/reconstruct.c
@@ -504,15 +504,16 @@ make_rfile(char *filename, bool missing_ok)
 static void
 read_bytes(rfile *rf, void *buffer, unsigned length)
 {
-	int			rb = read(rf->fd, buffer, length);
+	ssize_t		rb = read(rf->fd, buffer, length);
 
 	if (rb != length)
 	{
 		if (rb < 0)
 			pg_fatal("could not read file \"%s\": %m", rf->filename);
 		else
-			pg_fatal("could not read file \"%s\": read only %d of %u bytes",
-	 rf->filename, rb, length);
+			/* cast length to avoid extra translatable messages */
+			pg_fatal("could not read file \"%s\": read only %zd of %zu bytes",
+	 rf->filename, rb, (size_t) length);
 	}
 }
 


Re: Add bump memory context type and use it for tuplesorts

2024-03-14 Thread David Rowley
On Tue, 12 Mar 2024 at 23:57, Tomas Vondra
 wrote:
> Attached is an updated version of the mempool patch, modifying all the
> memory contexts (not just AllocSet), including the bump context. And
> then also PDF with results from the two machines, comparing results
> without and with the mempool. There's very little impact on small reset
> values (128kB, 1MB), but pretty massive improvements on the 8MB test
> (where it's a 2x improvement).

I think it would be good to have something like this.  I've done some
experiments before with something like this [1]. However, mine was
much more trivial.

One thing my version did was get rid of the *context* freelist stuff
in aset.  I wondered if we'd need that anymore as, if I understand
correctly, it's just there to stop malloc/free thrashing, which is
what the patch aims to do anyway. Aside from that, it's now a little
weird that aset.c has that but generation.c and slab.c do not.

One thing I found was that in btbeginscan(), we have "so =
(BTScanOpaque) palloc(sizeof(BTScanOpaqueData));", which on this
machine is 27344 bytes and results in a call to AllocSetAllocLarge()
and therefore a malloc().  Ideally, there'd be no malloc() calls in a
standard pgbench run, at least once the rel and cat caches have been
warmed up.

I think there are a few things in your patch that could be improved,
here's a quick review.

1. MemoryPoolEntryIndex() could follow the lead of
AllocSetFreeIndex(), which is quite well-tuned and has no looping. I
think you can get rid of MemoryPoolEntrySize() and just have
MemoryPoolEntryIndex() round up to the next power of 2.

2. The following could use "result = Max(MEMPOOL_MIN_BLOCK,
pg_nextpower2_size_t(size));" (see the sketch after this list):

+ * should be very low, though (less than MEMPOOL_SIZES, i.e. 14).
+ */
+ result = MEMPOOL_MIN_BLOCK;
+ while (size > result)
+ result *= 2;

3. "MemoryPoolFree": I wonder if this is a good name for such a
function.  Really you want to return it to the pool. "Free" sounds
like you're going to free() it.  I went for "Fetch" and "Release"
which I thought was less confusing.

4. MemoryPoolRealloc(), could this just do nothing if the old and new
indexes are the same?

5. It might be good to put a likely() around this:

+ /* only do this once every MEMPOOL_REBALANCE_DISTANCE allocations */
+ if (pool->num_requests < MEMPOOL_REBALANCE_DISTANCE)
+ return;

Otherwise, if that function is inlined then you'll bloat the functions
that inline it for not much good reason.  Another approach would be to
have a static inline function which checks and calls a noinline
function that does the work so that the rebalance stuff is never
inlined.
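
Regarding points 1 and 2, here is a hedged sketch of the loop-free
computation; MEMPOOL_MIN_BLOCK comes from the patch under review, and the
rest is an illustrative assumption rather than proposed code:

#include "postgres.h"
#include "port/pg_bitutils.h"

/*
 * Round a request size up to a power-of-2 bucket and derive the bucket
 * index without looping, in the spirit of AllocSetFreeIndex().
 */
static inline int
MemoryPoolEntryIndex(Size size)
{
	Size		rounded = Max(MEMPOOL_MIN_BLOCK, pg_nextpower2_size_t(size));

	/* bucket 0 holds MEMPOOL_MIN_BLOCK-sized chunks, bucket 1 twice that */
	return pg_leftmost_one_pos64((uint64) rounded) -
		pg_leftmost_one_pos64((uint64) MEMPOOL_MIN_BLOCK);
}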

Overall, I wonder if the rebalance stuff might make performance
testing quite tricky.  I see:

+/*
+ * How often to rebalance the memory pool buckets (number of allocations).
+ * This is a tradeoff between the pool being adaptive and more overhead.
+ */
+#define MEMPOOL_REBALANCE_DISTANCE 25000

Will TPS take a sudden jump after 25k transactions doing the same
thing?  I'm not saying this shouldn't happen, but... benchmarking is
pretty hard already. I wonder if there's something more fine-grained
that can be done which makes the pool adapt faster but not all at
once.  (I've not studied your algorithm for the rebalance.)

David

[1] https://github.com/david-rowley/postgres/tree/malloccache




Re: cleanup patches for incremental backup

2024-03-14 Thread Nathan Bossart
On Thu, Mar 14, 2024 at 04:00:10PM -0500, Nathan Bossart wrote:
> Separately, I suppose it's probably time to revert the temporary debugging
> code adding by commit 5ddf997.  I can craft a patch for that, too.

As promised...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 3923e30acb1d4e21ce311b25edbcd8b96b4223d2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 14 Mar 2024 20:36:48 -0500
Subject: [PATCH v1 1/2] Revert "Temporary patch to help debug pg_walsummary
 test failures."

This reverts commit 5ddf9973477729cf161b4ad0a1efd52f4fea9c88.
---
 src/backend/backup/walsummary.c   |  7 ---
 src/bin/pg_walsummary/t/002_blocks.pl | 14 --
 2 files changed, 21 deletions(-)

diff --git a/src/backend/backup/walsummary.c b/src/backend/backup/walsummary.c
index 4d047e1c02..322ae3c3ad 100644
--- a/src/backend/backup/walsummary.c
+++ b/src/backend/backup/walsummary.c
@@ -252,15 +252,8 @@ RemoveWalSummaryIfOlderThan(WalSummaryFile *ws, time_t cutoff_time)
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not stat file \"%s\": %m", path)));
-	/* XXX temporarily changed to debug buildfarm failures */
-#if 0
 	ereport(DEBUG2,
 			(errmsg_internal("removing file \"%s\"", path)));
-#else
-	ereport(LOG,
-			(errmsg_internal("removing file \"%s\" cutoff_time=%llu", path,
-			 (unsigned long long) cutoff_time)));
-#endif
 }
 
 /*
diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl
index d4bae3d564..52d3bd8840 100644
--- a/src/bin/pg_walsummary/t/002_blocks.pl
+++ b/src/bin/pg_walsummary/t/002_blocks.pl
@@ -51,7 +51,6 @@ my $summarized_lsn = $node1->safe_psql('postgres',

From 38dda89ee48736489d37e18dea186e90358468b0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 14 Mar 2024 20:46:02 -0500
Subject: [PATCH v1 2/2] Fix possible overflow in MaybeRemoveOldWalSummaries().

---
 src/backend/postmaster/walsummarizer.c | 4 ++--
 src/backend/utils/misc/guc_tables.c| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ec2874c18c..c820d1f9ed 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -139,7 +139,7 @@ static XLogRecPtr redo_pointer_at_last_summary_removal = InvalidXLogRecPtr;
  * GUC parameters
  */
 bool		summarize_wal = false;
-int			wal_summary_keep_time = 10 * 24 * 60;
+int			wal_summary_keep_time = 10 * HOURS_PER_DAY * MINS_PER_HOUR;
 
 static void WalSummarizerShutdown(int code, Datum arg);
 static XLogRecPtr GetLatestLSN(TimeLineID *tli);
@@ -1474,7 +1474,7 @@ MaybeRemoveOldWalSummaries(void)
 	 * Files should only be removed if the last modification time precedes the
 	 * cutoff time we compute here.
 	 */
-	cutoff_time = time(NULL) - 60 * wal_summary_keep_time;
+	cutoff_time = time(NULL) - wal_summary_keep_time * SECS_PER_MINUTE;
 
 	/* Get all the summaries that currently exist. */
 	wslist = GetWalSummaries(0, InvalidXLogRecPtr, InvalidXLogRecPtr);
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 57d9de4dd9..1e71e7db4a 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3293,9 +3293,9 @@ struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_MIN,
 		},
 		_summary_keep_time,
-		10 * 24 * 60,			/* 10 days */
+		10 * HOURS_PER_DAY * MINS_PER_HOUR, /* 10 days */
 		0,
-		INT_MAX,
+		INT_MAX / SECS_PER_MINUTE,
 		NULL, NULL, NULL
 	},
 
-- 
2.25.1



Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

2024-03-14 Thread James Coleman
On Thu, Mar 14, 2024 at 10:28 AM Robert Haas  wrote:
>
> On Wed, Oct 4, 2023 at 9:12 PM James Coleman  wrote:
> > All right, attached is a v3 which attempts to fix the wrong
> > information with an economy of words. I may at some point submit a
> > separate patch that adds a broader pruning section, but this at least
> > brings the docs inline with reality insofar as they address it.
>
> I don't think this is as good as what I proposed back on October 2nd.
> IMHO, that version does a good job making the text accurate and clear,
> and is directly responsive to your original complaint, namely, that
> the root of the HOT chain can't be removed. But this version seems to
> contain a number of incidental changes that are unrelated to that
> point, e.g. "old versions" -> "old, no longer visible versions", "can
> be completely removed" -> "may be pruned", and the removal of the
> sentence "In summary, heap-only tuple updates can only be created - if
> columns used by indexes are not updated" which AFAICT is both
> completely correct as-is and unrelated to the original complaint.
>
> Maybe I shouldn't be, but I'm slightly frustrated here. I thought I
> had proposed an alternative which you found acceptable, but then you
> proposed several more versions that did other things instead, and I
> never really understood why we couldn't just adopt the version that
> you seemed to think was OK. If there's a problem with that, say what
> it is. If there's not, let's do that and move on.

I think there's simply a misunderstanding here. I read your proposal
as "here's an idea to consider as you work on the patch" (as happens
on many other threads), and so I attempted to incorporate your primary
points of feedback into my next version of the patch.

Obviously I have reasons for the other changes I made: for example,
"no longer visible" improves the correctness, since being an old
version isn't sufficient. I removed the "In summary" sentence because
it simply doesn't follow from the prose before it. That sentence
simply restates information already appearing earlier in almost as
simple a form, so it's redundant. But more importantly it's just not
actually a summary of the text before it, so removing it improves the
documentation.

I can explain my reasoning further if desired, but I fear it would
simply frustrate you further, so I'll stop here.

If the goal here is the most minimal patch possible, then please
commit what you proposed. I am interested in improving the document
further, but I don't know how to do that easily if the requirement is
effectively "must only change one specific detail at a time". So, that
leaves me feeling a bit frustrated also.

Regards,
James Coleman




Re: Support json_errdetail in FRONTEND builds

2024-03-14 Thread Tom Lane
Michael Paquier  writes:
> Hmm.  I am not sure how much protection this would offer, TBH.  One
> thing that I find annoying with common/stringinfo.c as it is currently
> is that we have two exit() calls in the enlarge path, and it does not
> seem wise to me to spread that even more.

> My last argument sounds like a nit for HEAD knowing that this does not
> impact libpq that has its own pqexpbuffer.c to avoid issues with
> palloc, elog and exit, but that could be a problem if OAuth relies
> more on these code paths in libpq.

I hope nobody is expecting that such code will get accepted.  We have
a policy (and an enforcement mechanism) that libpq.so must not call
exit().  OAuth code in libpq will need to cope with using pqexpbuffer.

regards, tom lane




Re: broken JIT support on Fedora 40

2024-03-14 Thread Thomas Munro
For me it seems that the LLVMRunPasses() call, new in

commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
Author: Thomas Munro 
Date:   Wed Oct 18 22:15:54 2023 +1300

jit: Changes for LLVM 17.

is reaching code that segfaults inside libLLVM, specifically in
llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
llvm::AAResults*, bool, llvm::Function*).  First obvious question
would be: is that NULL argument still acceptable?  Perhaps it wants
our LLVMTargetMachineRef there:

   err = LLVMRunPasses(module, passes, NULL, options);

But then when we see what it does with that argument, it arrives at a
place that apparently accepts nullptr.

https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56
https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124

Hrmph.  Might need an assertion build to learn more.  I'll try to look
again next week or so.




Re: Support json_errdetail in FRONTEND builds

2024-03-14 Thread Michael Paquier
On Thu, Mar 14, 2024 at 10:56:46AM +0100, Daniel Gustafsson wrote:
> + /* don't allow destroys of read-only StringInfos */
> + Assert(str->maxlen != 0);
> Considering that stringinfo.c doesn't own the memory here I think it's warranted
> to turn this assert into an elog() to avoid the risk of use-after-free bugs.

Hmm.  I am not sure how much protection this would offer, TBH.  One
thing that I find annoying with common/stringinfo.c as it is currently
is that we have two exit() calls in the enlarge path, and it does not
seem wise to me to spread that even more.

My last argument sounds like a nit for HEAD knowing that this does not
impact libpq that has its own pqexpbuffer.c to avoid issues with
palloc, elog and exit, but that could be a problem if OAuth relies
more on these code paths in libpq.
--
Michael




Re: broken JIT support on Fedora 40

2024-03-14 Thread Thomas Munro
On Fri, Mar 15, 2024 at 12:27 PM Daniel Gustafsson  wrote:
> > On 14 Mar 2024, at 20:15, Pavel Stehule  wrote:
>
> > build is ok, but regress tests fails with crash (it works without 
> > -with-llvm)
>
> Can you post some details around this crash?  It doesn't seem to be a
> combination we have covered in the buildfarm.

Yeah, 18.1 (note they switched to 1-based minor numbers, there was no
18.0) just came out a week or so ago.  Despite testing their 18 branch
just before their "RC1" tag, as recently as

commit d282e88e50521a457fa1b36e55f43bac02a3167f
Author: Thomas Munro 
Date:   Thu Jan 25 10:37:35 2024 +1300

Track LLVM 18 changes.

at which point everything worked, it seems that something changed
before they released.  I haven't looked into why yet but it's crashing
on my FreeBSD box too.




Re: Possibility to disable `ALTER SYSTEM`

2024-03-14 Thread Robert Haas
On Thu, Mar 14, 2024 at 5:15 PM Maciek Sakrejda  wrote:
> It's not a security feature: it's a usability feature.
>
> It's a usability feature because, when Postgres configuration is
> managed by an outside mechanism (e.g., as in a Kubernetes
> environment), ALTER SYSTEM currently allows a superuser to make
> changes that appear to work, but may be discarded at some point in the
> future when that outside mechanism updates the config. They may also
> be represented incorrectly in a management dashboard if that dashboard
> is based on the values in the outside configuration mechanism, rather
> than values directly from Postgres.
>
> In this case, the end user with access to Postgres superuser
> privileges presumably also has access to the outside configuration
> mechanism. The goal is not to prevent them from changing settings, but
> to offer guard rails that prevent them from changing settings in a way
> that will be unstable (revertible by a future update) or confusing
> (not showing up in a management UI).
>
> There are challenges here in making sure this is _not_ seen as a
> security feature. But I do think the feature itself is sensible and
> worthwhile.

This is what I would have said if I'd tried to offer an explanation,
except you said it better than I would have done.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Fix the synopsis of pg_md5_hash

2024-03-14 Thread Tatsuro Yamada
Hi, Daniel and Michael,

On Thu, Mar 14, 2024 at 09:32:55AM +0100, Daniel Gustafsson wrote:
> > On 14 Mar 2024, at 07:02, Tatsuro Yamada  wrote:
> >> So, I created a patch to fix them.
> >
> > Thanks, applied.
>
> Oops.  Thanks.
> --
> Michael
>

Thank you guys!

Regards,
Tatsuro Yamada
NTT Open Source Software Center


Re: Add basic tests for the low-level backup method.

2024-03-14 Thread Michael Paquier
On Fri, Mar 15, 2024 at 09:40:38AM +1300, David Steele wrote:
> Is the missing test in meson the reason we did not see test failures for
> Windows in CI?

The test has to be listed in src/test/recovery/meson.build or the CI
would ignore it.

>> The second LOG is something that can be acted on.  I've added some
>> debugging to the parsing of the backup_label file in the backend, and
>> noticed that the first fscanf() for START WAL LOCATION is failing
>> because the last %c is detected as \r rather than \n.  Tweaking the
>> contents stored from pg_backup_stop() with a sed won't help, because
>> the issue is that we write the CRLFs with append_to_file, and the
>> startup process cannot cope with that.  The simplest method I can
>> think of is to use binmode, as of the attached.
> 
> Yeah, that makes sense.

I am wondering if there is a better trick here that would not require
changes in the backend to make the backup_label parsing more flexible,
though.

>> I am attaching an updated patch with all that fixed, which is stable
>> in the CI and any tests I've run.  Do you have any comments about
> 
> These changes look good to me. Sure wish we had an easier way to test
> commits in the build farm.

That's why these tests are not that easy; they can be racy.  I've run
the test 5~10 times in the CI this time to gain more confidence, and
saw zero failures with the stability fixes in place including Windows.
I've applied it now, as I can still monitor the buildfarm for a few
more days.  Let's see what happens, but that should be better.
--
Michael




Re: broken JIT support on Fedora 40

2024-03-14 Thread Daniel Gustafsson
> On 14 Mar 2024, at 20:15, Pavel Stehule  wrote:

> build is ok, but regress tests fails with crash (it works without -with-llvm)

Can you post some details around this crash?  It doesn't seem to be a
combination we have covered in the buildfarm.

--
Daniel Gustafsson





Re: small_cleanups around login event triggers

2024-03-14 Thread Daniel Gustafsson
> On 14 Mar 2024, at 14:21, Robert Treat  wrote:
> On Thu, Mar 14, 2024 at 8:21 AM Daniel Gustafsson  wrote:

>> - canceling connection in psql wouldn't cancel
>> + canceling a connection in psql will not cancel
>> Nitpickery (perhaps motivated by english not being my first language), but
>> since psql only deals with one connection I would expect this to read "the
>> connection".
>> 
> 
> My interpretation of this is that "a connection" is more correct
> because it could be your connection or someone else's connection (ie,
> you are canceling one of many possible connections). Definitely
> nitpickery either way.

Fair point.

>> - * Returns true iff the lock was acquired.
>> + * Returns true if the lock was acquired.
>> Using "iff" here is being consistent with the rest of the file (and 
>> technically
>> correct):

> Ah, yeah, I was pretty focused on the event trigger stuff and didn't
> notice it being used elsewhere; thought it was a typo, but I guess
> it's meant as shorthand for "if and only if". I wonder how many people
> are familiar with that.

I would like to think it's fairly widely understood among programmers, but I
might be dating myself in saying so.

I went ahead and applied this with the fixes mentioned here with one more tiny
change to the last hunk of the patch to make it say "login event trigger"
rather than just "login trigger". Thanks for the submission!

--
Daniel Gustafsson





Re: Weird test mixup

2024-03-14 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Mar 14, 2024 at 06:19:38PM -0400, Tom Lane wrote:
>> I wonder if it'd be wise to adjust the injection point stuff so that
>> it's active in only the specific database the injection point was
>> activated in.

> It can be made optional by extending InjectionPointAttach() to
> specify a database OID or a database name.  Note that
> 041_checkpoint_at_promote.pl wants an injection point to run in the
> checkpointer, where we don't have a database requirement.

> Or we could just disable runningcheck because of the concurrency
> requirement in this test.  The test would still be able to run, just
> less times.

No, actually we *must* mark all these tests NO_INSTALLCHECK if we
stick with the current definition of injection points.  The point
of installcheck mode is that the tests are supposed to be safe to
run in a live installation.  Side-effects occurring in other
databases are completely not OK.

I can see that some tests would want to be able to inject code
cluster-wide, but I bet that's going to be a small minority.
I suggest that we invent a notion of "global" vs "local"
injection points, where a "local" one only fires in the DB it
was defined in.  Then only tests that require a global injection
point need to be NO_INSTALLCHECK.

regards, tom lane




Re: Weird test mixup

2024-03-14 Thread Michael Paquier
On Fri, Mar 15, 2024 at 07:53:57AM +0900, Michael Paquier wrote:
> It can be made optional by extending InjectionPointAttach() to
> specify a database OID or a database name.  Note that
> 041_checkpoint_at_promote.pl wants an injection point to run in the
> checkpointer, where we don't have a database requirement.

Slight correction here.  It is also possible to not touch
InjectionPointAttach() at all: just tweak the callbacks to do that as
long as the database that should be used is tracked in shmem with its
point name, say with new fields in InjectionPointSharedState.  That
keeps the backend APIs in a cleaner state.
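
For illustration, a hedged sketch of what a database-aware callback could
look like; the inj_state pointer, its dboid field, and the callback itself
are hypothetical, not existing code:

static void
injection_wait_local(const char *name)
{
	/*
	 * If the test registered a database for this point, only fire when
	 * the backend running the callback is connected to that database
	 * (MyDatabaseId); otherwise return without doing anything.
	 */
	if (OidIsValid(inj_state->dboid) && inj_state->dboid != MyDatabaseId)
		return;

	/* ... existing wait logic ... */
}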
--
Michael




Re: Fix the synopsis of pg_md5_hash

2024-03-14 Thread Michael Paquier
On Thu, Mar 14, 2024 at 09:32:55AM +0100, Daniel Gustafsson wrote:
> On 14 Mar 2024, at 07:02, Tatsuro Yamada  wrote:
>> So, I created a patch to fix them.
> 
> Thanks, applied.

Oops.  Thanks.
--
Michael




Re: type cache cleanup improvements

2024-03-14 Thread Michael Paquier
On Thu, Mar 14, 2024 at 04:27:43PM +0300, Teodor Sigaev wrote:
>> So I would like to suggest the attached patch for this first piece.
>> What do you think?
>
> I don't have any objections

Okay, I've applied this piece for now.  Not sure I'll have much room
to look at the rest.
--
Michael




Re: pg16: XX000: could not find pathkey item to sort

2024-03-14 Thread David Rowley
On Thu, 14 Mar 2024 at 12:00, David Rowley  wrote:
> I've attached a patch which fixes the problem for me.

I've pushed the patch to fix gather_grouping_paths().  The issue with
the RelOptInfo having the incorrect PathTarget->exprs after the
partial phase of partition-wise aggregate remains.

David




Re: Weird test mixup

2024-03-14 Thread Michael Paquier
On Thu, Mar 14, 2024 at 06:19:38PM -0400, Tom Lane wrote:
> Do they?  It'd be fairly easy to explain this if these things were
> being run in "installcheck" style.  I'm not sure about CI, but from
> memory, the buildfarm does use installcheck for some things.
> 
> I wonder if it'd be wise to adjust the injection point stuff so that
> it's active in only the specific database the injection point was
> activated in.

It can be made optional by extending InjectionPointAttach() to
specify a database OID or a database name.  Note that
041_checkpoint_at_promote.pl wants an injection point to run in the
checkpointer, where we don't have a database requirement.

Or we could just disable runningcheck because of the concurrency
requirement in this test.  The test would still be able to run, just
less times.
--
Michael




Re: Weird test mixup

2024-03-14 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Mar 15, 2024 at 11:19 AM Tom Lane  wrote:
>> Do they?  It'd be fairly easy to explain this if these things were
>> being run in "installcheck" style.  I'm not sure about CI, but from
>> memory, the buildfarm does use installcheck for some things.

> Right, as mentioned here:
> https://www.postgresql.org/message-id/flat/CA%2BhUKGJYhcG_o2nwSK6r01eOZJwNWUJUbX%3D%3DAVnW84f-%2B8yamQ%40mail.gmail.com
> That's the "running" test, which is like the old installcheck.

Hmm.  Seems like maybe we need to institute a rule that anything
using injection points has to be marked NO_INSTALLCHECK.  That's
kind of a big hammer though.

regards, tom lane




Re: Weird test mixup

2024-03-14 Thread Tom Lane
I wrote:
> Heikki Linnakangas  writes:
>> Somehow the 'gin-leave-leaf-split-incomplete' injection point was active 
>> in the 'intarray' test. That makes no sense. That injection point is 
>> only used by the test in src/test/modules/gin/. Perhaps that ran at the 
>> same time as the intarray test? But they run in separate instances, with 
>> different data directories.

> Do they?  It'd be fairly easy to explain this if these things were
> being run in "installcheck" style.  I'm not sure about CI, but from
> memory, the buildfarm does use installcheck for some things.

Hmm, Munro's comment yesterday[1] says that current CI does use
installcheck mode in some cases.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/CA+hUKGJYhcG_o2nwSK6r01eOZJwNWUJUbX==avnw84f-+8y...@mail.gmail.com




Re: Weird test mixup

2024-03-14 Thread Thomas Munro
On Fri, Mar 15, 2024 at 11:19 AM Tom Lane  wrote:
> Heikki Linnakangas  writes:
> > Somehow the 'gin-leave-leaf-split-incomplete' injection point was active
> > in the 'intarray' test. That makes no sense. That injection point is
> > only used by the test in src/test/modules/gin/. Perhaps that ran at the
> > same time as the intarray test? But they run in separate instances, with
> > different data directories.
>
> Do they?  It'd be fairly easy to explain this if these things were
> being run in "installcheck" style.  I'm not sure about CI, but from
> memory, the buildfarm does use installcheck for some things.

Right, as mentioned here:

https://www.postgresql.org/message-id/flat/CA%2BhUKGJYhcG_o2nwSK6r01eOZJwNWUJUbX%3D%3DAVnW84f-%2B8yamQ%40mail.gmail.com

That's the "running" test, which is like the old installcheck.




Re: Proposal to add page headers to SLRU pages

2024-03-14 Thread Jeff Davis
On Mon, 2024-03-11 at 10:01 +, Li, Yong wrote:
> - The clog LSN group has been brought back.
>   Now the page LSN on each clog page is used for honoring the write-
> ahead rule
>   and it is always the highest LSN of all the LSN groups on the page.
>   The LSN groups are used by TransactionIdGetStatus() as before.

I like where this is going.

Álvaro, do you still see a problem with this approach?

> - New comments have been added to pg_upgrade to mention the SLRU
>   page header change as the reason for upgrading clog files.

That seems reasonable, but were any alternatives discussed? Do we have
consensus that this is the right thing to do?

And if we use this approach, is there extra validation or testing that
can be done?

Regards,
Jeff Davis





Re: Weird test mixup

2024-03-14 Thread Tom Lane
Heikki Linnakangas  writes:
> Somehow the 'gin-leave-leaf-split-incomplete' injection point was active 
> in the 'intarray' test. That makes no sense. That injection point is 
> only used by the test in src/test/modules/gin/. Perhaps that ran at the 
> same time as the intarray test? But they run in separate instances, with 
> different data directories.

Do they?  It'd be fairly easy to explain this if these things were
being run in "installcheck" style.  I'm not sure about CI, but from
memory, the buildfarm does use installcheck for some things.

I wonder if it'd be wise to adjust the injection point stuff so that
it's active in only the specific database the injection point was
activated in.

regards, tom lane




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-03-14 Thread Melanie Plageman
On Thu, Mar 14, 2024 at 5:26 PM Tomas Vondra
 wrote:
>
> On 3/14/24 19:16, Melanie Plageman wrote:
> > On Thu, Mar 14, 2024 at 03:32:04PM +0200, Heikki Linnakangas wrote:
> >> ...
> >>
> >> Ok, committed that for now. Thanks for looking!
> >
> > Attached v6 is rebased over your new commit. It also has the "fix" in
> > 0010 which moves BitmapAdjustPrefetchIterator() back above
> > table_scan_bitmap_next_block(). I've also updated the Streaming Read API
> > commit (0013) to Thomas' v7 version from [1]. This has the update that
> > we theorize should address some of the regressions in the bitmapheapscan
> > streaming read user in 0014.
> >
>
> Should I rerun the benchmarks with these new patches, to see if it
> really helps with the regressions?

That would be awesome!

I will soon send out a summary of what we investigated off-list about
0010 (though we didn't end up concluding anything). My "fix" (leaving
BitmapAdjustPrefetchIterator() above table_scan_bitmap_next_block())
eliminates the regression in 0010 on the one example that I repro'd
upthread, but it would be good to know if it eliminates the
regressions across some other tests.

I think it would be worthwhile to run the subset of tests which seemed
to fare the worst on 0010 against the patches 0001-0010-- cyclic
uncached on your xeon machine with 4 parallel workers, IIRC -- even
the 1 million scale would do the trick, I think.

And then separately run the subset of tests which seemed to do the
worst on 0014. There were several groups of issues across the
different tests, but I think that the uniform pages data test would be
relevant to use. It showed the regressions with eic 0.

As for the other regressions showing with 0014, I think we would want
to see at least one with fully-in-shared-buffers and one with fully
uncached. Some of the fixes were around pinning fewer buffers when the
blocks were already in shared buffers.

- Melanie




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-03-14 Thread Tomas Vondra
On 3/14/24 19:16, Melanie Plageman wrote:
> On Thu, Mar 14, 2024 at 03:32:04PM +0200, Heikki Linnakangas wrote:
>> ...
>>
>> Ok, committed that for now. Thanks for looking!
> 
> Attached v6 is rebased over your new commit. It also has the "fix" in
> 0010 which moves BitmapAdjustPrefetchIterator() back above
> table_scan_bitmap_next_block(). I've also updated the Streaming Read API
> commit (0013) to Thomas' v7 version from [1]. This has the update that
> we theorize should address some of the regressions in the bitmapheapscan
> streaming read user in 0014.
> 

Should I rerun the benchmarks with these new patches, to see if it
really helps with the regressions?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Weird test mixup

2024-03-14 Thread Heikki Linnakangas
I got a weird test failure while testing my forking refactor patches on 
Cirrus CI 
(https://cirrus-ci.com/task/5880724448870400?logs=test_running#L121):



[16:52:39.753] Summary of Failures:
[16:52:39.753] 
[16:52:39.753] 66/73 postgresql:intarray-running / intarray-running/regress   ERROR 6.27s   exit status 1
[16:52:39.753] 
[16:52:39.753] Ok: 72  
[16:52:39.753] Expected Fail:  0   
[16:52:39.753] Fail:   1   
[16:52:39.753] Unexpected Pass:0   
[16:52:39.753] Skipped:0   
[16:52:39.753] Timeout:0   
[16:52:39.753] 
[16:52:39.753] Full log written to /tmp/cirrus-ci-build/build/meson-logs/testlog-running.txt


And:


diff -U3 /tmp/cirrus-ci-build/contrib/intarray/expected/_int.out 
/tmp/cirrus-ci-build/build/testrun/intarray-running/regress/results/_int.out
--- /tmp/cirrus-ci-build/contrib/intarray/expected/_int.out 2024-03-14 
16:48:48.690367000 +
+++ 
/tmp/cirrus-ci-build/build/testrun/intarray-running/regress/results/_int.out
2024-03-14 16:52:05.759444000 +
@@ -804,6 +804,7 @@
 
 DROP INDEX text_idx;

 CREATE INDEX text_idx on test__int using gin ( a gin__int_ops );
+ERROR:  error triggered for injection point gin-leave-leaf-split-incomplete
 SELECT count(*) from test__int WHERE a && '{23,50}';
  count 
 ---

@@ -877,6 +878,7 @@
 (1 row)
 
 DROP INDEX text_idx;

+ERROR:  index "text_idx" does not exist
 -- Repeat the same queries with an extended data set. The data set is the
 -- same that we used before, except that each element in the array is
 -- repeated three times, offset by 1000 and 2000. For example, {1, 5}


Somehow the 'gin-leave-leaf-split-incomplete' injection point was active 
in the 'intarray' test. That makes no sense. That injection point is 
only used by the test in src/test/modules/gin/. Perhaps that ran at the 
same time as the intarray test? But they run in separate instances, with 
different data directories. And the 'gin' test passed.


I'm completely stumped. Anyone have a theory?

--
Heikki Linnakangas
Neon (https://neon.tech)




Re: Possibility to disable `ALTER SYSTEM`

2024-03-14 Thread Maciek Sakrejda
On Thu, Mar 14, 2024 at 1:38 PM Robert Haas  wrote:
> On Thu, Mar 14, 2024 at 4:08 PM Tom Lane  wrote:
> > The patch-of-record contains no such wording.
>
> I plan to fix that, if nobody else beats me to it.
>
> > And if this isn't a
> > security feature, then what is it?  If you have to say to your
> > (super) users "please don't mess with the system configuration",
> > you might as well just trust them not to do it the easy way as not
> > to do it the hard way.  If they're untrustworthy, why have they
> > got superuser?
>
> I mean, I feel like this question has been asked and answered before,
> multiple times, on this thread. If you sincerely don't understand the
> use case, I can try again to explain it. But somehow I feel like it's
> more that you just don't like the idea, which is fair, but it seems
> like a considerable number of people feel otherwise.

I know I'm jumping into a long thread here, but I've been following it
out of interest. I'm sympathetic to the use case, since I used to work
at a Postgres cloud provider, and while our system intentionally did
not give our end users superuser privileges, I can imagine other
managed environments where that's not an issue. I'd like to give
answering this question again a shot, because I think this has been a
persistent misunderstanding in this thread, and I don't think it's
been made all that clear.

It's not a security feature: it's a usability feature.

It's a usability feature because, when Postgres configuration is
managed by an outside mechanism (e.g., as in a Kubernetes
environment), ALTER SYSTEM currently allows a superuser to make
changes that appear to work, but may be discarded at some point in the
future when that outside mechanism updates the config. They may also
be represented incorrectly in a management dashboard if that dashboard
is based on the values in the outside configuration mechanism, rather
than values directly from Postgres.

In this case, the end user with access to Postgres superuser
privileges presumably also has access to the outside configuration
mechanism. The goal is not to prevent them from changing settings, but
to offer guard rails that prevent them from changing settings in a way
that will be unstable (revertible by a future update) or confusing
(not showing up in a management UI).

There are challenges here in making sure this is _not_ seen as a
security feature. But I do think the feature itself is sensible and
worthwhile.

Thanks,
Maciek




Re: JIT compilation per plan node

2024-03-14 Thread Tomas Vondra



On 3/14/24 20:14, Robert Haas wrote:
> On Tue, Feb 20, 2024 at 5:31 AM Tomas Vondra
>  wrote:
>> I certainly agree that the current JIT costing is quite crude, and we've
>> all seen cases where the decision turns out to not be great. And I think
>> the plan to make the decisions at the node level makes sense, so +1 to
>> that in general.
> 
> Seems reasonable to me also.
> 
>> And I think you're right that looking just at the node total cost may
>> not be sufficient - that we may need a better cost model, considering
>> how many times an expression is executed and so on. But I think we
>> should try to do this in smaller steps, meaningful on their own,
>> otherwise we won't move at all. The two threads linked by Melih are ~4y
>> old and *nothing* changed since then, AFAIK.
>>
>> I think it's reasonable to start by moving the decision to the node
>> level - it's where the JIT happens, anyway. It may not be perfect, but
>> it seems like a clear improvement. And if we then choose to improve the
>> "JIT cost model" to address some of the issues you pointed out, surely
>> that would need to happen at the node level too ...
> 
> I'm not sure I understand whether you (Tomas) think that this patch is
> a good idea or a bad idea as it stands. I read the first of these two
> paragraphs to suggest that the patch hasn't really evolved much in the
> last few years, perhaps suggesting that if it wasn't good enough to
> commit back then, it still isn't now. But the second of these two
> paragraphs seems more supportive.
> 

To clarify, I think the patch is a step in the right direction, and a
meaningful improvement. It may not be the perfect solution we imagine
(but who knows how far we are from that), but AFAIK moving these
decisions to the node level is something the ideal solution would need
to do too.

The reference to the 4y old patches was meant to support this patch as
an improvement - perhaps incomplete, but still an improvement. We keep
imagining "perfect solutions" and then end up doing nothing.

I recognize there's a risk we may never get to have the ideal solution
(e.g. because it requires information we don't possess). But I still
think moving the decision to the node level would allow us to do better
decisions compared to just doing it for the query as a whole.

> From my own point of view, I definitely agree with David's statement
> that what we really want to know is how many times each expression
> will be evaluated. If we had that information, or just an estimate, I
> think we could make much better decisions in this area. But we don't
> have that infrastructure now, and it doesn't seem easy to create, so
> it seems to me that what we have to decide now is whether applying a
> cost threshold on a per-plan-node basis will produce better or worse
> results than what making one decision for the whole plan. David's
> provided an example of where it does indeed work better back in
> https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com
> - but could there be enough cases where the opposite happens to make
> us think that the patch is overall a bad idea?
> 

Right, this risk of regression is always there, and I'm sure it'd be
possible to construct such cases. But considering how crude the current
costing is, I'd be surprised if this ends up being a net negative.

Also, is the number of executions really the thing we're missing? Surely
we know the number of rows the node is dealing with, so we could use
this (yes, I realize there are issues, but we deal with that when
costing quals too). Isn't it a much bigger issue that we have pretty much
no cost model for the actual JIT (compilation/optimization) depending on
how many expressions it deals with?

> I personally find that a bit unlikely, although not impossible. I see
> a couple of ways that using the per-node cost can distort things -- it
> seems like it will tend to heavily feature JIT for "interior" plan
> nodes because the cost of a plan node includes it's children -- and as
> was mentioned previously, it doesn't really care whether the node cost
> is high because of expression evaluation or something else. But
> neither of those things seem like they'd be bad enough to make this a
> bad way forward over all. For the patch to lose, it seems like we'd
> need a case where the overall plan cost would have been high enough to
> trigger JIT pre-patch, but most of the benefit would have come from
> relatively low-cost nodes that don't get JITted post-patch. The
> easiest way for that to happen is if the planner's estimates are off,
> but that's not really an argument against this patch as much as it is
> an argument that query planning is hard in general.
> 
> A slightly subtler way the patch could lose is if the new threshold is
> harder to adjust than the old one. For example, imagine that you have
> a query that does a Cartesian join. That makes the cost of the input
> nodes rather small compared 

Re: cleanup patches for incremental backup

2024-03-14 Thread Nathan Bossart
On Wed, Jan 24, 2024 at 12:05:15PM -0600, Nathan Bossart wrote:
> There might be an overflow risk in the cutoff time calculation, but I doubt
> that's the root cause of these failures:
> 
>   /*
>* Files should only be removed if the last modification time precedes 
> the
>* cutoff time we compute here.
>*/
>   cutoff_time = time(NULL) - 60 * wal_summary_keep_time;

I've attached a short patch for fixing this overflow risk.  Specifically,
it limits wal_summary_keep_time to INT_MAX / SECS_PER_MINUTE, just like
log_rotation_age.
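
As a standalone illustration of the arithmetic (made-up values, not backend
code): once the GUC can reach INT_MAX minutes, the 32-bit multiplication in
the old cutoff computation overflows before the subtraction ever happens.

#include <limits.h>
#include <stdio.h>

int
main(void)
{
	int			keep_time = INT_MAX;		/* minutes, the old GUC maximum */
	long long	needed = 60LL * keep_time;	/* what the cutoff really needs */

	printf("60 * keep_time = %lld, but int tops out at %d\n", needed, INT_MAX);
	return 0;
}
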

I considered checking for overflow when we subtract the keep-time from the
result of time(2), but AFAICT there's only a problem if time_t is unsigned,
which Wikipedia leads me to believe is unusual [0], so I figured we might
be able to just wait this one out until 2038.

> Otherwise, I think we'll probably need to add some additional logging to
> figure out what is happening...

Separately, I suppose it's probably time to revert the temporary debugging
code added by commit 5ddf997.  I can craft a patch for that, too.

[0] https://en.wikipedia.org/wiki/Unix_time#Representing_the_number

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ec2874c18c..c820d1f9ed 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -139,7 +139,7 @@ static XLogRecPtr redo_pointer_at_last_summary_removal = InvalidXLogRecPtr;
  * GUC parameters
  */
 bool		summarize_wal = false;
-int			wal_summary_keep_time = 10 * 24 * 60;
+int			wal_summary_keep_time = 10 * HOURS_PER_DAY * MINS_PER_HOUR;
 
 static void WalSummarizerShutdown(int code, Datum arg);
 static XLogRecPtr GetLatestLSN(TimeLineID *tli);
@@ -1474,7 +1474,7 @@ MaybeRemoveOldWalSummaries(void)
 	 * Files should only be removed if the last modification time precedes the
 	 * cutoff time we compute here.
 	 */
-	cutoff_time = time(NULL) - 60 * wal_summary_keep_time;
+	cutoff_time = time(NULL) - wal_summary_keep_time * SECS_PER_MINUTE;
 
 	/* Get all the summaries that currently exist. */
 	wslist = GetWalSummaries(0, InvalidXLogRecPtr, InvalidXLogRecPtr);
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 57d9de4dd9..1e71e7db4a 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3293,9 +3293,9 @@ struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_MIN,
 		},
		&wal_summary_keep_time,
-		10 * 24 * 60,			/* 10 days */
+		10 * HOURS_PER_DAY * MINS_PER_HOUR, /* 10 days */
 		0,
-		INT_MAX,
+		INT_MAX / SECS_PER_MINUTE,
 		NULL, NULL, NULL
 	},
 


Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE

2024-03-14 Thread Heikki Linnakangas

On 14/03/2024 22:00, Melanie Plageman wrote:

On Thu, Mar 14, 2024 at 05:30:30PM +0200, Heikki Linnakangas wrote:

typedef struct SharedBitmapHeapInstrumentation
{
int num_workers;
BitmapHeapScanInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
} SharedBitmapHeapInstrumentation;

typedef struct BitmapHeapScanState
{
ScanState   ss; /* its first field is 
NodeTag */
...
SharedBitmapHeapInstrumentation sinstrument;
} BitmapHeapScanState;

that compiles, at least with my compiler, but I find it weird to have a
variable-length inner struct embedded in an outer struct like that.


In the attached patch, BitmapHeapScanState->sinstrument is a pointer,
though. Or are you proposing the above as an alternative that you
decided not to go with?


Right, the above is what I contemplated at first but decided it was a 
bad idea.
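
A boiled-down illustration of why (type and field names are made up): a struct
whose last member is a flexible array member is not allowed to be embedded in
another struct by C11 6.7.2.1, and compilers that do accept it treat it as an
extension, with sizeof() ignoring the flexible part entirely.

typedef struct WorkerStats
{
	long		exact_pages;
} WorkerStats;

typedef struct SharedStats
{
	int			num_workers;
	WorkerStats	sinstrument[];	/* flexible array member */
} SharedStats;

typedef struct ScanStateLike
{
	int			other_state;
	SharedStats	sinstrument;	/* embedded FAM struct: non-standard, and
								 * sizeof(ScanStateLike) does not include
								 * any sinstrument entries */
} ScanStateLike;
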


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: BitmapHeapScan streaming read user and prelim refactoring

2024-03-14 Thread Thomas Munro
On Fri, Mar 15, 2024 at 3:18 AM Tomas Vondra
 wrote:
> So, IIUC this means (1) the patched code is more aggressive wrt
> prefetching (because we prefetch more data overall, because master would
> prefetch N pages and patched prefetches N ranges, each of which may be
> multiple pages. And (2) it's not easy to quantify how much more
> aggressive it is, because it depends on how we happen to coalesce the
> pages into ranges.
>
> Do I understand this correctly?

Yes.

Parallelism must prevent coalescing here though.  Any parallel aware
executor node that allocates block numbers to workers without trying
to preserve ranges will.  That not only hides the opportunity to
coalesce reads, it also makes (globally) sequential scans look random
(ie locally they are more random), so that our logic to avoid issuing
advice for sequential scan won't work, and we'll inject extra useless
or harmful (?) fadvise calls.  I don't know what to do about that yet,
but it seems like a subject for future research.  Should we recognise
sequential scans with a window (like Linux does), instead of strictly
next-block detection (like some other OSes do)?  Maybe a shared
streaming read that all workers pull blocks from, so it can see what's
going on?  I think the latter would be strictly more like what the ad
hoc BHS prefetching code in master is doing, but I don't know if it'd
be over-engineering, or hard to do for some reason.

Another aspect of per-backend streaming reads in one parallel query
that don't know about each other is that they will all have their own
effective_io_concurrency limit.  That is a version of a problem that
comes up again and again in parallel query, to be solved by the grand
unified resource control system of the future.




Re: Recent 027_streaming_regress.pl hangs

2024-03-14 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin  wrote:
>> Could it be that the timeout (360 sec?) is just not enough for the test
>> under the current (changed due to switch to meson) conditions?

> But you're right that under meson the test takes a lot longer, I guess
> due to increased concurrency:

What it seems to be is highly variable.  Looking at calliphoridae's
last half dozen successful runs, I see reported times for
027_stream_regress anywhere from 183 to 704 seconds.  I wonder what
else is happening on that machine.  Also, this is probably not
helping anything:

   'extra_config' => {
  ...
  'fsync = on'

I would suggest turning that off and raising wait_timeout a good
deal, and then we'll see if calliphoridae gets any more stable.

regards, tom lane




Re: Built-in CTYPE provider

2024-03-14 Thread Jeff Davis
On Thu, 2024-03-14 at 15:38 +0100, Peter Eisentraut wrote:
> On 14.03.24 09:08, Jeff Davis wrote:
> > 0001 (the C.UTF-8 locale) is also close...
> 
> I have tested this against the libc locale C.utf8 that was available
> on 
> the OS, and the behavior is consistent.

That was the goal, in spirit.

But to clarify: it's not guaranteed that the built-in C.UTF-8 is always
the same as the libc UTF-8, because different implementations do
different things. For instance, I saw significant differences on MacOS.

> I wonder if we should version the builtin locales too.  We might make
> a 
> mistake and want to change something sometime?

I'm fine with that, see v25-0004 in the reply to your other mail.

The version only tracks sort order, and all of the builtin locales sort
based on memcmp(). But it's possible there are bugs in the
optimizations around memcmp() (e.g. abbreviated keys, or some future
optimization).

> Tiny comments:
> 
> * src/bin/scripts/t/020_createdb.pl
> 
> The two added tests should have different names that tells them apart
> (like the new initdb tests).
> 
> * src/include/catalog/pg_collation.dat

Done in v25-0002 (in reply to your other mail).

Regards,
Jeff Davis






Re: Add basic tests for the low-level backup method.

2024-03-14 Thread David Steele

On 3/14/24 20:00, Michael Paquier wrote:

On Thu, Mar 14, 2024 at 09:12:52AM +1300, David Steele wrote:

I think you are right that the start message is better since it can only
appear once when the backup_label is found. The completed message could in
theory appear after a restart, though the backup_label must have been found
at some point.


So, I've given a try to this patch with 99b4a63bef94, to note that
sidewinder failed because of a timing issue on Windows: the recovery
of the node without backup_label, expected to fail, would try to
backup the last segment it has replayed, because, as it has no
backup_label, it behaves like the primary.  It would try to use the
same archive location as the primary, leading to a conflict failure on
Windows.  This one was easy to fix, by overwriting postgresql.conf on
the node to not do archiving.


Hmmm, I wonder why this did not show up in the Windows tests on CI?


Following that, I've noticed a second race condition: we don't wait
for the segment after pg_switch_wal() to be archived.  This one can be
easily avoided with a poll on pg_stat_archiver.


Ugh, yeah, good change.


After that, also because I've initially managed to, cough, forget an
update of meson.build to list the new test, I've noticed a third
failure on Windows for the case of the node that has a backup_label.
Here is one of the failures:
https://cirrus-ci.com/task/5245341683941376


Is the missing test in meson the reason we did not see test failures for 
Windows in CI?



regress_log_042_low_level_backup and
042_low_level_backup_replica_success.log have all the information
needed, that can be summarized like that:
The system cannot find the file specified.
2024-03-14 06:02:37.670 GMT [560][startup] FATAL:  invalid data in file 
"backup_label"

The first message is something new to me, that seems to point to a
corruption failure of the file system.  Why don't we see something
similar in other tests, then?  Leaving that aside..

The second LOG is something that can be acted on.  I've added some
debugging to the parsing of the backup_label file in the backend, and
noticed that the first fscanf() for START WAL LOCATION is failing
because the last %c is detected as \r rather than \n.  Tweaking the
contents stored from pg_backup_stop() with a sed won't help, because
the issue is that we write the CRLFs with append_to_file, and the
startup process cannot cope with that.  The simplest method I can
think of is to use binmode, as of the attached.


Yeah, that makes sense.
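
A standalone illustration of that failure mode (the format string is
simplified, not the one read_backup_label() actually uses): with CRLF endings
the character consumed by the trailing %c is '\r', so the expected '\n' never
shows up.

#include <stdio.h>

int
main(void)
{
	const char *crlf =
		"START WAL LOCATION: 0/2000028 (file 000000010000000000000002)\r\n";
	unsigned int hi,
				lo;
	char		fname[64];
	char		ch;

	if (sscanf(crlf, "START WAL LOCATION: %X/%X (file %63[^)])%c",
			   &hi, &lo, fname, &ch) == 4)
		printf("trailing character is %s\n", ch == '\n' ? "\\n" : "\\r");
	return 0;
}
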


It is the first time that we'd take the contents received from a
BackgroundPsql and write them to a file parsed by the backend, so
perhaps we should try to do that in a more general way, but I'm not
sure how, tbh, and the case of this test is special while adding
handling for \r when reading the backup_label got discussed in the
past but we were OK with what we are doing now on HEAD.


I think it makes sense to leave the parsing code as is and make the 
change in the test. If we add more tests to this module we'll probably 
need a function to avoid repeating code.



On top of all that, note that I have removed remove_tree as I am not
sure if this would be OK in all the buildfarm animals, added a quit()
for BackgroundPsql, moved queries to use less BackgroundPsql, as well
as a few other things like avoiding the hardcoded segment names.
meson.build is.. Cough.. Updated now.


OK.


I am attaching an updated patch with all that fixed, which is stable
in the CI and any tests I've run.  Do you have any comments about


These changes look good to me. Sure wish we had an easier way to test
commits in the build farm.


Regards,
-David




Re: Possibility to disable `ALTER SYSTEM`

2024-03-14 Thread Robert Haas
On Thu, Mar 14, 2024 at 4:08 PM Tom Lane  wrote:
> The patch-of-record contains no such wording.

I plan to fix that, if nobody else beats me to it.

> And if this isn't a
> security feature, then what is it?  If you have to say to your
> (super) users "please don't mess with the system configuration",
> you might as well just trust them not to do it the easy way as not
> to do it the hard way.  If they're untrustworthy, why have they
> got superuser?

I mean, I feel like this question has been asked and answered before,
multiple times, on this thread. If you sincerely don't understand the
use case, I can try again to explain it. But somehow I feel like it's
more that you just don't like the idea, which is fair, but it seems
like a considerable number of people feel otherwise.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add the ability to limit the amount of memory that can be allocated to backends.

2024-03-14 Thread Anton A. Melnikov

On 13.03.2024 10:41, Anton A. Melnikov wrote:


Here is a version updated for the current master.



During the patch update I mistakenly added double counting of deallocated
blocks.
That's why the tests in the patch tester failed.
Fixed it and squashed fix 0002 with 0001.
Here is fixed version.

With the best wishes!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From 7a0925a38acfb9a945087318a5d91fae4680db0e Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Tue, 26 Dec 2023 17:54:40 +0100
Subject: [PATCH 1/2] Add tracking of backend memory allocated

Add tracking of backend memory allocated in total and by allocation
type (aset, dsm, generation, slab) by process.

allocated_bytes tracks the current bytes of memory allocated to the
backend process. aset_allocated_bytes, dsm_allocated_bytes,
generation_allocated_bytes and slab_allocated_bytes track the
allocation by type for the backend process. They are updated for the
process as memory is malloc'd/freed.  Memory allocated to items on
the freelist is included.  Dynamic shared memory allocations are
included only in the value displayed for the backend that created
them, they are not included in the value for backends that are
attached to them to avoid double counting. DSM allocations that are
not destroyed by the creating process prior to its exit are
considered long lived and are tracked in a global counter
global_dsm_allocated_bytes. We limit the floor of allocation
counters to zero. Created views pg_stat_global_memory_allocation and
pg_stat_memory_allocation for access to these trackers.
---
 doc/src/sgml/monitoring.sgml| 246 
 src/backend/catalog/system_views.sql|  34 +++
 src/backend/storage/ipc/dsm.c   |  11 +-
 src/backend/storage/ipc/dsm_impl.c  |  78 +++
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/backend_status.c | 114 +
 src/backend/utils/adt/pgstatfuncs.c |  84 +++
 src/backend/utils/init/miscinit.c   |   3 +
 src/backend/utils/mmgr/aset.c   |  17 ++
 src/backend/utils/mmgr/generation.c |  13 ++
 src/backend/utils/mmgr/slab.c   |  22 ++
 src/include/catalog/pg_proc.dat |  17 ++
 src/include/storage/proc.h  |   2 +
 src/include/utils/backend_status.h  | 144 +++-
 src/test/regress/expected/rules.out |  27 +++
 src/test/regress/expected/stats.out |  36 +++
 src/test/regress/sql/stats.sql  |  20 ++
 17 files changed, 867 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8736eac284..41d788be45 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4599,6 +4599,252 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
  
 
+ 
+  pg_stat_memory_allocation
+
+  
+   pg_stat_memory_allocation
+  
+
+  
+   The pg_stat_memory_allocation view will have one
+   row per server process, showing information related to the current memory
+   allocation of that process in total and by allocator type. Due to the
+   dynamic nature of memory allocations the allocated bytes values may not be
+   exact but should be sufficient for the intended purposes. Dynamic shared
+   memory allocations are included only in the value displayed for the backend
+   that created them, they are not included in the value for backends that are
+   attached to them to avoid double counting.  Use
+   pg_size_pretty described in
+to make these values more easily
+   readable.
+  
+
+  
+   pg_stat_memory_allocation View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   datid oid
+  
+  
+   OID of the database this backend is connected to
+  
+ 
+
+ 
+  
+   pid integer
+  
+  
+   Process ID of this backend
+  
+ 
+
+ 
+  
+   allocated_bytes bigint
+  
+ 
+  Memory currently allocated to this backend in bytes. This is the balance
+  of bytes allocated and freed by this backend. Dynamic shared memory
+  allocations are included only in the value displayed for the backend that
+  created them, they are not included in the value for backends that are
+  attached to them to avoid double counting.
+ 
+ 
+
+ 
+  
+   aset_allocated_bytes bigint
+  
+  
+   Memory currently allocated to this backend in bytes via the allocation
+   set allocator.
+  
+ 
+
+ 
+  
+   dsm_allocated_bytes bigint
+  
+  
+   Memory currently allocated to this backend in bytes via the dynamic
+   shared memory allocator. Upon process exit, dsm allocations that have
+   not been freed are considered long lived and added to
+   global_dsm_allocated_bytes 

Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-03-14 Thread Robert Haas
On Thu, Mar 14, 2024 at 1:54 PM Jelte Fennema-Nio  wrote:
> In my view there can be, **by definition,** only two general types of
> protocol changes:
> 1. A "simple protocol change": This is one that requires agreement by
> both the client and server, that they understand the new message types
> involved in this change. e.g. the ParameterSet message proposal (this
> message type is either supported or it's not)
> 2. A "parameterized protocol change": This requires the same as 1, but
> should also allow some parameterization from the client, e.g. for the
> compression proposal, the client should specify what compression
> algorithm the server should use to compress data when sending it to
> the client.

You seem to be supposing here that all protocol changes will consist
of adding new message types. While I think that will be a common
pattern, I do not think it will be a universal one. I do agree,
however, that every possible variation of the protocol is either
Boolean-valued (i.e. are we doing X or not?) or something more
complicated (i.e. how exactly are we doing X?).

> Client and Server can agree that a "simple protocol change" is
> supported by both advertising a minimum minor protocol version. And
> for a "parameterized protocol change" the client can send a _pq_
> parameter in the startup packet.
>
> So, new _pq_ parameters should only ever be introduced for
> parameterized protocol changes. They are not meant to advertise
> support, they are meant to configure protocol features. For a
> non-configurable protocol feature, we'd simply bump the protocol
> version. And since _pq_ parameters thus always control some setting at
> the session level, we can simply store it as a GUC, because that's how
> we store all our parameters at a session level.

This is definitely not how I was thinking about it. I was imagining
that we wanted to reserve bumping the protocol version for more
significant changes, and that we'd use _pq_ parameters for relatively
minor new functionality whether Boolean-valued or otherwise.

> I think given the automatic downgrade supported by the
> NegotiateProtocolVersion, there's no real down-side to requesting the
> newest version by default. The only downside I can see is when
> connecting to other applications (i.e. non PostgreSQL servers) that
> implement the postgres protocol, but don't implement
> NegotiateProtocolVersion. But for that I added the
> max_protocol_version connection option in 0006 (of my latest
> patchset).

You might be right. This is a minor point that's not worth arguing
about too much.

> > I'm really unhappy with 0002-0004.
>
> Just to be clear, (afaict) your argument below seems to only really be
> about 0004, not about 0002 or 0003. Was there anything about 0002 &
0003 that you were unhappy with? 0002 & 0003 are not dependent on 0004
> imho. Because even when not making _pq_ parameters map to GUCs, we'd
> still need to change libpq to not instantly close the connection
> whenever a _pq_ parameter (or new protocol version) is not supported
> by the server (which is what 0002 & 0003 do).

I completely agree that we need to avoid slamming the connection shut.
What I don't agree with is taking the list of protocol extensions that
the server knows about and putting it into an array of strings that
the user can see. I don't want the user to know or care so much about
what's happening at the wire protocol level. The user is entitled to
know whether PQsetProtocolParameter() will work or not, and the user
is entitled to know whether it has a chance of working next time if it
didn't work this time, and when it fails, the user is entitled to a
good error message explaining the reason for the failure. But the user
is not entitled to know what negotiation took place over the wire to
figure that out. They shouldn't need to know that the _pq_ namespace
exists, and they shouldn't need to know whether we negotiated the
availability or unavailability of PQsetProtocolParameter() using [a]
the protocol minor version number, [b] the protocol major version
number, [c] a protocol option called parameter_set, or [d] a protocol
option called bananas_foster. Those are all things that might change
in the future.

Just as a for instance, I had a thought that we might accumulate a few
new message types controlled by protocol options (ParameterSet,
AlwaysSendTypeInBinaryFormat, whatever else) while keeping the
protocol version as 3.0, and then eventually bump the protocol version
to 3.1 where all of that would be mandatory functionality. So the
protocol parameters wouldn't be specified any more when using 3.1, but
they would be specified when talking to older 3.0 servers. That
difference shouldn't be visible to the user. The user should only know
the result of the negotiation.

> Okay, our interpretation is very different here. From my perspective
> introducing a non-GUC namespace is NOT AT ALL the benefit of the _pq_
> prefix. The main benefit (imho) is that it allows sending an
> 

Re: Recent 027_streaming_regress.pl hangs

2024-03-14 Thread Thomas Munro
On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin  wrote:
> Could it be that the timeout (360 sec?) is just not enough for the test
> under the current (changed due to switch to meson) conditions?

Hmm, well it looks like he switched over to meson around 42 days ago
2024-02-01, looking at "calliphoridae" (skink has the extra
complication of valgrind, let's look at a more 'normal' animal
instead).  The first failure that looks like that on calliphoridae is
19 days ago 2024-02-23, and after that it's happening every 3 days,
sometimes in clusters.

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=calliphoridae&br=HEAD

But you're right that under meson the test takes a lot longer, I guess
due to increased concurrency:

287/287 postgresql:recovery / recovery/027_stream_regress
   OK  684.50s   6 subtests passed

With make we don't have an individual time per script, but for for all
of the recovery tests we had for example:

t/027_stream_regress.pl ... ok
All tests successful.
Files=39, Tests=542, 65 wallclock secs ( 0.26 usr  0.06 sys + 20.16
cusr 31.65 csys = 52.13 CPU)




Re: Possibility to disable `ALTER SYSTEM`

2024-03-14 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 14, 2024 at 3:13 PM Tom Lane  wrote:
>> With the possible exception of #1, every one of these is easily
>> defeatable by an uncooperative superuser.  I'm not excited about
>> adding a "security" feature with such obvious holes in it.

> We're going to document that it's not a security feature along the
> lines of what Magnus suggested in
> http://postgr.es/m/CABUevEx9m=CV8=wpxvw+rtvvs858kdj6yprkexv7n+f6mk0...@mail.gmail.com

The patch-of-record contains no such wording.  And if this isn't a
security feature, then what is it?  If you have to say to your
(super) users "please don't mess with the system configuration",
you might as well just trust them not to do it the easy way as not
to do it the hard way.  If they're untrustworthy, why have they
got superuser?

What I think this is is a loaded foot-gun painted in kid-friendly
colors.  People will use it and then file CVEs about how it did
not turn out to be as secure as they imagined (probably without
reading the documentation).

regards, tom lane




Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE

2024-03-14 Thread Melanie Plageman
On Thu, Mar 14, 2024 at 05:30:30PM +0200, Heikki Linnakangas wrote:
> On 18/02/2024 00:31, Tomas Vondra wrote:
> > Do you plan to work continue working on this patch? I did take a look,
> > and on the whole it looks reasonable - it modifies the right places etc.
> 
> +1
> 
> > I think there are two things that may need an improvement:
> > 
> > 1) Storing variable-length data in ParallelBitmapHeapState
> > 
> > I agree with Robert the snapshot_and_stats name is not great. I see
> > Dmitry mentioned phs_snapshot_off as used by ParallelTableScanDescData -
> > the reasons are somewhat different (phs_snapshot_off exists because we
> > don't know which exact struct will be allocated), while here we simply
> > need to allocate two variable-length pieces of memory. But it seems like
> > it would work nicely for this. That is, we could try adding an offset
> > for each of those pieces of memory:
> > 
> >   - snapshot_off
> >   - stats_off
> > 
> > I don't like the GetSharedSnapshotData name very much, it seems very
> > close to GetSnapshotData - quite confusing, I think.
> > 
> > Dmitry also suggested we might add a separate piece of shared memory. I
> > don't quite see how that would work for ParallelBitmapHeapState, but I
> > doubt it'd be simpler than having two offsets. I don't think the extra
> > complexity (paid by everyone) would be worth it just to make EXPLAIN
> > ANALYZE work.
> 
> I just removed phs_snapshot_data in commit 84c18acaf6. I thought that would
> make this moot, but now that I rebased this, there are still some aesthetic
> questions on how best to represent this.
> 
> In all the other node types that use shared instrumentation like this, the
> pattern is as follows: (using Memoize here as an example, but it's similar
> for Sort, IncrementalSort, Agg and Hash)
> 
> /* 
>  * Shared memory container for per-worker memoize information
>  * 
>  */
> typedef struct SharedMemoizeInfo
> {
>   int num_workers;
>   MemoizeInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
> } SharedMemoizeInfo;
> 
> /* this struct is backend-private */
> typedef struct MemoizeState
> {
>   ScanState   ss; /* its first field is 
> NodeTag */
>   ...
>   MemoizeInstrumentation stats;   /* execution statistics */
>   SharedMemoizeInfo *shared_info; /* statistics for parallel workers */
> } MemoizeState;
> 
> While the scan is running, the node updates its private data in
> MemoizeState->stats. At the end of a parallel scan, the worker process
> copies the MemoizeState->stats to MemoizeState->shared_info->stats, which
> lives in shared memory. The leader process copies
> MemoizeState->shared_info->stats to its own backend-private copy, which it
> then stores in its MemoizeState->shared_info, replacing the pointer to the
> shared memory with a pointer to the private copy. That happens in
> ExecMemoizeRetrieveInstrumentation().
> 
> This is a little different for parallel bitmap heap scans, because a bitmap
> heap scan keeps some other data in shared memory too, not just
> instrumentation data. Also, the naming is inconsistent: the equivalent of
> SharedMemoizeInfo is actually called ParallelBitmapHeapState. I think we
> should rename it to SharedBitmapHeapInfo, to make it clear that it lives in
> shared memory, but I digress.
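
A minimal sketch of what that leader-side retrieval step could look like for
bitmap heap scans, with hypothetical field names that simply mirror the
Memoize pattern quoted above:

void
ExecBitmapHeapRetrieveInstrumentation(BitmapHeapScanState *node)
{
	SharedBitmapHeapInstrumentation *shared = node->shared_info;
	Size		size;

	if (shared == NULL)
		return;

	/* copy the stats out of shared memory before it goes away */
	size = offsetof(SharedBitmapHeapInstrumentation, sinstrument)
		+ shared->num_workers * sizeof(BitmapHeapScanInstrumentation);
	node->shared_info = palloc(size);
	memcpy(node->shared_info, shared, size);
}
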

FWIW, if we merge a BHS streaming read user like the one I propose in
[1] (not as a pre-condition to this but just as something to make you
more comfortable with these names), the ParallelBitmapHeapState will
basically only contain the shared iterator and the coordination state
for accessing it and could be named as such.

Then if you really wanted to be consistent with Memoize, you could name
the instrumentation SharedBitmapHeapInfo. But, personally I prefer the
name you gave it: SharedBitmapHeapInstrumentation. I think that would
have been a better name for SharedMemoizeInfo since num_workers is
really just used as the length of the array of instrumentation info.

> We could now put the new stats at the end of ParallelBitmapHeapState as a
> varlen field. But I'm not sure that's a good idea. In
> ExecBitmapHeapRetrieveInstrumentation(), would we make a backend-private
> copy of the whole ParallelBitmapHeapState struct, even though the other
> fields don't make sense after the shared memory is released? Sounds
> confusing. Or we could introduce a separate struct for the stats, and copy
> just that:
> 
> typedef struct SharedBitmapHeapInstrumentation
> {
>   int num_workers;
>   BitmapHeapScanInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
> } SharedBitmapHeapInstrumentation;
> 
> typedef struct BitmapHeapScanState
> {
>   ScanState   ss; /* its first field is 
> NodeTag */
>   ...
>   SharedBitmapHeapInstrumentation sinstrument;
> } BitmapHeapScanState;
> 
> that compiles, at least with my compiler, but I find it 

Re: JIT compilation per plan node

2024-03-14 Thread David Rowley
Thanks for chipping in here.

On Fri, 15 Mar 2024 at 08:14, Robert Haas  wrote:
> A slightly subtler way the patch could lose is if the new threshold is
> harder to adjust than the old one. For example, imagine that you have
> a query that does a Cartesian join. That makes the cost of the input
> nodes rather small compared to the cost of the join node, and it also
> means that JITting the inner join child in particular is probably
> rather important. But if you set join_above_cost low enough to JIT
> that node post-patch, then maybe you'll also JIT a bunch of things
> that aren't on the inner side of a nested loop and which might
> therefore not really need JIT. Unless I'm missing something, this is a
> fairly realistic example of where this patch's approach to costing
> could turn out to be painful ... but it's not like the current system
> is pain-free either.

I think this case would be covered as the cost of the inner side of
the join would be multiplied by the estimated outer-side rows.
Effectively, making this part work is the bulk of the patch as we
currently don't know the estimated number of loops of a node during
create plan.
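
As a made-up worked example of that: an inner child whose own total cost is
only 500, but which the planner expects to be rescanned 10,000 times by the
outer side, would be weighed as 500 * 10,000 = 5,000,000 against the per-node
JIT threshold, while a node of cost 500 that runs only once stays far below it.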

David




RE: Popcount optimization using AVX512

2024-03-14 Thread Amonson, Paul D
> -Original Message-
> From: Nathan Bossart 
> Sent: Monday, March 11, 2024 6:35 PM
> To: Amonson, Paul D 

> Thanks.  There's no need to wait to post the AVX portion.  I recommend using
> "git format-patch" to construct the patch set for the lists.

After exploring the git format-patch command, I think I understand what you need. 
Attached.
 
> > What exactly do you suggest here? I am happy to always call either
> > pg_popcount32() or pg_popcount64() with the understanding that it may
> > not be optimal, but I do need to know which to use.
> 
> I'm recommending that we don't change any of the code in the pg_popcount()
> function (which is renamed to pg_popcount_slow() in your v6 patch).  If
> pointers are 8 or more bytes, we'll try to process the buffer in 64-bit 
> chunks.
> Else, we'll try to process it in 32-bit chunks.  Any remaining bytes will be
> processed one-by-one.

Ok, we are on the same page now. :)  It is already fixed that way in the 
refactor patch #1.
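
A standalone sketch of that chunking scheme (not the actual pg_popcount() /
pg_popcount_slow() code; __builtin_popcountll merely stands in for whatever
per-word primitive gets picked at runtime):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint64_t
popcount_buf(const char *buf, size_t bytes)
{
	uint64_t	cnt = 0;

	/* process eight bytes at a time while we can */
	while (bytes >= sizeof(uint64_t))
	{
		uint64_t	word;

		memcpy(&word, buf, sizeof(word));	/* avoid unaligned loads */
		cnt += (uint64_t) __builtin_popcountll(word);
		buf += sizeof(word);
		bytes -= sizeof(word);
	}
	/* then finish the tail one byte at a time */
	while (bytes-- > 0)
		cnt += (uint64_t) __builtin_popcountll((unsigned char) *buf++);
	return cnt;
}

int
main(void)
{
	const char	data[] = "\xff\x0f\x01";	/* 8 + 4 + 1 = 13 set bits */

	printf("%llu\n", (unsigned long long) popcount_buf(data, 3));
	return 0;
}
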

As for new performance numbers: I just ran a full suite like I did earlier in 
the process. My latest results, on the equivalent of a pgbench scale factor 10 DB 
with the target column having varying widths and appropriate random data, 
show a 1.2% improvement with a 2.2% margin of error at a 98% confidence level. 
Still seeing improvement and no regressions.

As stated in the previous separate thread, I updated the code, removing the extra 
"extern" keywords.

Thanks,
Paul



v8-0001-Refactor-POPCNT-code-refactored-for-future-accelerat.patch
Description: v8-0001-Refactor-POPCNT-code-refactored-for-future-accelerat.patch


v8-0002-Feat-Add-AVX-512-POPCNT-support-initial-checkin.patch
Description: v8-0002-Feat-Add-AVX-512-POPCNT-support-initial-checkin.patch


Re: WIP Incremental JSON Parser

2024-03-14 Thread Jacob Champion
I've been poking at the partial token logic. The json_errdetail() bug
mentioned upthread (e.g. for an invalid input `[12zz]` and small chunk
size) seems to be due to the disconnection between the "main" lex
instance and the dummy_lex that's created from it. The dummy_lex
contains all the information about the failed token, which is
discarded upon an error return:

> partial_result = json_lex(_lex);
> if (partial_result != JSON_SUCCESS)
> return partial_result;

In these situations, there's an additional logical error:
lex->token_start is pointing to a spot in the string after
lex->token_terminator, which breaks an invariant that will mess up
later pointer math. Nothing appears to be setting lex->token_start to
point into the partial token buffer until _after_ the partial token is
successfully lexed, which doesn't seem right -- in addition to the
pointer math problems, if a previous chunk was freed (or on a stale
stack frame), lex->token_start will still be pointing off into space.
Similarly, wherever we set token_terminator, we need to know that
token_start is pointing into the same buffer.

Determining the end of a token is now done in two separate places
between the partial- and full-lexer code paths, which is giving me a
little heartburn. I'm concerned that those could drift apart, and if
the two disagree on where to end a token, we could lose data into the
partial token buffer in a way that would be really hard to debug. Is
there a way to combine them?

Thanks,
--Jacob




Re: Possibility to disable `ALTER SYSTEM`

2024-03-14 Thread Robert Haas
On Thu, Mar 14, 2024 at 3:13 PM Tom Lane  wrote:
> With the possible exception of #1, every one of these is easily
> defeatable by an uncooperative superuser.  I'm not excited about
> adding a "security" feature with such obvious holes in it.
> We reverted MAINTAIN last year for much less obvious holes;
> how is it that we're going to look the other way on this one?

We're going to document that it's not a security feature along the
lines of what Magnus suggested in
http://postgr.es/m/CABUevEx9m=CV8=wpxvw+rtvvs858kdj6yprkexv7n+f6mk0...@mail.gmail.com

And then maybe someday we'll do this:

> Really we'd need to do something about removing superusers'
> access to the filesystem in order to build something with
> fewer holes.  I'm not against inventing such a feature,
> but it'd take a fair amount of work and likely would end
> in a noticeably less usable system (no plpython for example).

Yep. It would be useful if you replied to the portion of
http://postgr.es/m/ca+tgmoasugkz27x0xzh4edmq_b6jbrt6csuxf+phdgj-esk...@mail.gmail.com
where I enumerate the methods that I know about for the superuser to
get filesystem access. I don't think it's going to be practical to
block all of those methods in a single commit, and I'm not entirely
convinced that we can ever close all the holes without compromising
the superuser's ability to do necessary system administration tasks,
but maybe it's possible, and documenting the list of such methods
would make it a lot easier for users to understand the risks and
hackers to pick problems to try to tackle.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: broken JIT support on Fedora 40

2024-03-14 Thread Pavel Stehule
Hi

On Thu, Mar 14, 2024 at 7:20 PM Robert Haas wrote:

> On Wed, Mar 6, 2024 at 1:54 AM Pavel Stehule 
> wrote:
> > after today update, the build with --with-llvm produces broken code, and
> make check fails with crash
> >
> > Upgrade    hwdata-0.380-1.fc40.noarch                  @updates-testing
> > Upgraded   hwdata-0.379-1.fc40.noarch                  @@System
> > Upgrade    llvm-18.1.0~rc4-1.fc40.x86_64               @updates-testing
> > Upgraded   llvm-17.0.6-6.fc40.x86_64                   @@System
> > Upgrade    llvm-devel-18.1.0~rc4-1.fc40.x86_64         @updates-testing
> > Upgraded   llvm-devel-17.0.6-6.fc40.x86_64             @@System
> > Upgrade    llvm-googletest-18.1.0~rc4-1.fc40.x86_64    @updates-testing
> > Upgraded   llvm-googletest-17.0.6-6.fc40.x86_64        @@System
> > Upgrade    llvm-libs-18.1.0~rc4-1.fc40.i686            @updates-testing
> > Upgraded   llvm-libs-17.0.6-6.fc40.i686                @@System
> > Upgrade    llvm-libs-18.1.0~rc4-1.fc40.x86_64          @updates-testing
> > Upgraded   llvm-libs-17.0.6-6.fc40.x86_64              @@System
> > Upgrade    llvm-static-18.1.0~rc4-1.fc40.x86_64        @updates-testing
> > Upgraded   llvm-static-17.0.6-6.fc40.x86_64            @@System
> > Upgrade    llvm-test-18.1.0~rc4-1.fc40.x86_64          @updates-testing
> > Install    llvm17-libs-17.0.6-7.fc40.i686              @updates-testing
> > Install    llvm17-libs-17.0.6-7.fc40.x86_64            @updates-testing
>
> I don't know anything about LLVM, but somehow I'm guessing that people
> who do will need more detail than this in order to be able to fix the
> problem. I see that support for LLVM 18 was added in commit
> d282e88e50521a457fa1b36e55f43bac02a3167f on January 18th; perhaps you
> were building from an older commit?
>

I repeated the build and check today, on fresh master and with today's Fedora
updates, with the same result:

the build is OK, but the regression tests fail with a crash (it works without
--with-llvm).

Regards

Pavel






>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: JIT compilation per plan node

2024-03-14 Thread Robert Haas
On Tue, Feb 20, 2024 at 5:31 AM Tomas Vondra
 wrote:
> I certainly agree that the current JIT costing is quite crude, and we've
> all seen cases where the decision turns out to not be great. And I think
> the plan to make the decisions at the node level makes sense, so +1 to
> that in general.

Seems reasonable to me also.

> And I think you're right that looking just at the node total cost may
> not be sufficient - that we may need a better cost model, considering
> how many times an expression is executed and so on. But I think we
> should try to do this in smaller steps, meaningful on their own,
> otherwise we won't move at all. The two threads linked by Melih are ~4y
> old and *nothing* changed since then, AFAIK.
>
> I think it's reasonable to start by moving the decision to the node
> level - it's where the JIT happens, anyway. It may not be perfect, but
> it seems like a clear improvement. And if we then choose to improve the
> "JIT cost model" to address some of the issues you pointed out, surely
> that would need to happen at the node level too ...

I'm not sure I understand whether you (Tomas) think that this patch is
a good idea or a bad idea as it stands. I read the first of these two
paragraphs to suggest that the patch hasn't really evolved much in the
last few years, perhaps suggesting that if it wasn't good enough to
commit back then, it still isn't now. But the second of these two
paragraphs seems more supportive.

From my own point of view, I definitely agree with David's statement
that what we really want to know is how many times each expression
will be evaluated. If we had that information, or just an estimate, I
think we could make much better decisions in this area. But we don't
have that infrastructure now, and it doesn't seem easy to create, so
it seems to me that what we have to decide now is whether applying a
cost threshold on a per-plan-node basis will produce better or worse
results than what making one decision for the whole plan. David's
provided an example of where it does indeed work better back in
https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com
- but could there be enough cases where the opposite happens to make
us think that the patch is overall a bad idea?

I personally find that a bit unlikely, although not impossible. I see
a couple of ways that using the per-node cost can distort things -- it
seems like it will tend to heavily feature JIT for "interior" plan
nodes because the cost of a plan node includes it's children -- and as
was mentioned previously, it doesn't really care whether the node cost
is high because of expression evaluation or something else. But
neither of those things seem like they'd be bad enough to make this a
bad way forward over all. For the patch to lose, it seems like we'd
need a case where the overall plan cost would have been high enough to
trigger JIT pre-patch, but most of the benefit would have come from
relatively low-cost nodes that don't get JITted post-patch. The
easiest way for that to happen is if the planner's estimates are off,
but that's not really an argument against this patch as much as it is
an argument that query planning is hard in general.
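
As a made-up illustration of the "interior node" point (the table, function,
and numbers below are invented, not taken from the patch or thread):

    EXPLAIN SELECT count(*) FROM t WHERE slow_func(x) > 0;
    --  Aggregate  (cost=270433.34..270433.35 rows=1 width=8)
    --    ->  Seq Scan on t  (cost=0.00..262100.00 rows=3333333 width=0)
    --          Filter: (slow_func(x) > 0)

Nearly all of the expression-evaluation work happens in the Seq Scan's
filter, but the Aggregate's total cost is just as high because it includes
its child, so a per-node threshold would mark both nodes for JIT.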

A slightly subtler way the patch could lose is if the new threshold is
harder to adjust than the old one. For example, imagine that you have
a query that does a Cartesian join. That makes the cost of the input
nodes rather small compared to the cost of the join node, and it also
means that JITting the inner join child in particular is probably
rather important. But if you set join_above_cost low enough to JIT
that node post-patch, then maybe you'll also JIT a bunch of things
that aren't on the inner side of a nested loop and which might
therefore not really need JIT. Unless I'm missing something, this is a
fairly realistic example of where this patch's approach to costing
could turn out to be painful ... but it's not like the current system
is pain-free either.

I don't really know what's best here, but I'm mildly inclined to
believe that the patch might be a change for the better. I have not
reviewed the implementation and have no comment on whether it's good
or bad from that point of view.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Possibility to disable `ALTER SYSTEM`

2024-03-14 Thread Tom Lane
Robert Haas  writes:
> As far as I can see from reading the thread, most people agree that
> it's reasonable to have some way to disable ALTER SYSTEM, but there
> are at least six competing ideas about how to do that:

> 1. command-line option
> 2. GUC
> 3. event trigger
> 4. extension
> 5. sentinel file
> 6. remove permissions on postgresql.auto.conf

With the possible exception of #1, every one of these is easily
defeatable by an uncooperative superuser.  I'm not excited about
adding a "security" feature with such obvious holes in it.
We reverted MAINTAIN last year for much less obvious holes;
how is it that we're going to look the other way on this one?

#2 with the GUC_DISALLOW_IN_AUTO_FILE flag can be made secure
(I think) by putting the main postgresql.conf file outside the
data directory and then making it not owned by or writable by the
postgres user.  But I doubt that's a common configuration, and
I'm sure we will get complaints from people who failed to set it
up that way.  The proposed patch certainly doesn't bother to
document the hazard.

Really we'd need to do something about removing superusers'
access to the filesystem in order to build something with
fewer holes.  I'm not against inventing such a feature,
but it'd take a fair amount of work and likely would end
in a noticeably less usable system (no plpython for example).

regards, tom lane




Re: Possibility to disable `ALTER SYSTEM`

2024-03-14 Thread Jelte Fennema-Nio
On Thu, 14 Mar 2024 at 17:37, Robert Haas  wrote:
> or in the
> alternative (2) but with the GUC being PGC_SIGHUP and
> GUC_DISALLOW_IN_AUTO_FILE. I believe there would be adequate consensus
> to proceed with either of those approaches. Anybody feel like coding
> it up?

Here is a slightly modified version of Gabrielle's original patch, which
already implemented the GUC approach. The changes I made are adding
PGC_SIGHUP and GUC_DISALLOW_IN_AUTO_FILE as well as adding some more
docs.


v2-0001-Add-enable_alter_system-GUC.patch
Description: Binary data
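
A rough sketch of the intended behavior, assuming the GUC keeps the name
enable_alter_system from the patch file name and has been set to off in
postgresql.conf (the error text below is illustrative only, not taken from
the patch):

    ALTER SYSTEM SET work_mem = '64MB';
    -- ERROR:  ALTER SYSTEM is currently disabled
    SET work_mem = '64MB';   -- a plain session-level SET is unaffected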


Re: broken JIT support on Fedora 40

2024-03-14 Thread Robert Haas
On Wed, Mar 6, 2024 at 1:54 AM Pavel Stehule  wrote:
> after today update, the build with --with-llvm produces broken code, and make 
> check fails with crash
>
> Upgradehwdata-0.380-1.fc40.noarch   
> @updates-testing
> Upgraded   hwdata-0.379-1.fc40.noarch   @@System
> Upgradellvm-18.1.0~rc4-1.fc40.x86_64
> @updates-testing
> Upgraded   llvm-17.0.6-6.fc40.x86_64@@System
> Upgradellvm-devel-18.1.0~rc4-1.fc40.x86_64  
> @updates-testing
> Upgraded   llvm-devel-17.0.6-6.fc40.x86_64  @@System
> Upgradellvm-googletest-18.1.0~rc4-1.fc40.x86_64 
> @updates-testing
> Upgraded   llvm-googletest-17.0.6-6.fc40.x86_64 @@System
> Upgradellvm-libs-18.1.0~rc4-1.fc40.i686 
> @updates-testing
> Upgraded   llvm-libs-17.0.6-6.fc40.i686 @@System
> Upgradellvm-libs-18.1.0~rc4-1.fc40.x86_64   
> @updates-testing
> Upgraded   llvm-libs-17.0.6-6.fc40.x86_64   @@System
> Upgradellvm-static-18.1.0~rc4-1.fc40.x86_64 
> @updates-testing
> Upgraded   llvm-static-17.0.6-6.fc40.x86_64 @@System
> Upgradellvm-test-18.1.0~rc4-1.fc40.x86_64   
> @updates-testing
> Instalovat llvm17-libs-17.0.6-7.fc40.i686   
> @updates-testing
> Instalovat llvm17-libs-17.0.6-7.fc40.x86_64 
> @updates-testing

I don't know anything about LLVM, but somehow I'm guessing that people
who do will need more detail than this in order to be able to fix the
problem. I see that support for LLVM 18 was added in commit
d282e88e50521a457fa1b36e55f43bac02a3167f on January 18th; perhaps you
were building from an older commit?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pre-proposal: unicode normalized text

2024-03-14 Thread Jeff Davis
On Thu, 2024-02-29 at 17:02 -0800, Jeff Davis wrote:
> Attached is an implementation of a per-database option STRICT_UNICODE
> which enforces the use of assigned code points only.

The CF app doesn't seem to point at the latest patch:

https://www.postgresql.org/message-id/a0e85aca6e03042881924c4b31a840a915a9d349.ca...@j-davis.com

which is perhaps why nobody has looked at it yet.

But in any case, I'm OK if this gets bumped to 18. I still think it's a
good feature, but some of the value will come later in v18 anyway, when
I plan to propose support for case folding. Case folding is a version
of lowercasing with compatibility guarantees when you only use assigned
code points.

Regards,
Jeff Davis





Re: UUID v7

2024-03-14 Thread Andrey M. Borodin



> On 14 Mar 2024, at 20:10, Peter Eisentraut  wrote:
> 
> I think the behavior of uuid_extract_var(iant) is wrong.  The code
> takes just two bits to return, but the draft document is quite clear
> that the variant is 4 bits (see Table 1).
 Well, it was correct only for implemented variant. I've made version that 
 implements full table 1 from section 4.1.
>>> I think we are still interpreting this differently.  I think 
>>> uuid_extract_variant should just return whatever is in those four bits. 
>>> Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", 
>>> which I don't think is correct.  It should return 0 through 15.
>> We will return "do not care" bits. These bits can confuse someone. E.g. for 
>> variant 0b10 we can return 8, 9, 10 and 11 randomly. Is it OK? BTW for some 
>> reason the document lists numbers 1-15, but you are correct that the range is 0-15.
> 
> I agree it's confusing.  Before I studied the RFC 4122bis project, I didn't 
> even know about variant vs. version.  I think overall people will find this 
> more confusing than useful.  If you just want to know, "is this UUID of the 
> kind specified in RFC 4122", you can query it with uuid_extract_version(x) IS 
> NOT NULL.  So maybe we don't need the _extract_variant function?

I think it's the best possible solution. The variant has no value besides 
detecting if a version can be extracted.


Best regards, Andrey Borodin.



Re: linux cachestat in file Readv and Prefetch

2024-03-14 Thread Robert Haas
On Sat, Feb 17, 2024 at 6:10 PM Tomas Vondra
 wrote:
> I may be missing some important bit behind this idea, but this does not
> seem like a great idea to me. The comment added to FilePrefetch says this:
>
>   /*
>* last time we visit this file (somewhere), nr_recently_evicted pages
>* of the range were just removed from vm cache, it's a sign a memory
>* pressure. so do not prefetch further.
>* it is hard to guess if it is always the right choice in absence of
>* more information like:
>*  - prefetching distance expected overall
>*  - access pattern/backend maybe
>*/
>
> Firstly, is this even a good way to detect memory pressure? It's clearly
> limited to a single 1GB segment, so what's the chance we'll even see the
> "pressure" on a big database with many files?
>
> If we close/reopen the file (which on large databases we tend to do very
> often) how does that affect the data reported for the file descriptor?
>
> I'm not sure I even agree with the idea that we should stop prefetching
> when there is memory pressure. IMHO it's perfectly fine to keep
> prefetching stuff even if it triggers eviction of unnecessary pages from
> page cache. That's kinda why the eviction exists.

I agree with all of these criticisms. I think it's the job of
pg_prewarm to do what the user requests, not to second-guess whether
the user requested the right thing. One of the things that frustrates
people about the ring-buffer system is that it's hard to get all of
your data cached in shared_buffers by just reading it, e.g. SELECT *
FROM my_table. If pg_prewarm also isn't guaranteed to actually read
your data, and may decide that your data didn't need to be read after
all, then what exactly is a user supposed to do if they're absolutely
sure that they know better than PostgreSQL itself and want to
guarantee that their data actually does get read?
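
For reference, a minimal example of the kind of explicit prewarming being
discussed (the table name is hypothetical):

    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('my_table', 'buffer');    -- read the table into shared_buffers
    SELECT pg_prewarm('my_table', 'prefetch');  -- only issue async hints to the OS cache

The point above is that a user issuing the first call expects every block to
be read, not skipped because the extension decided memory pressure made it
unnecessary.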

So I think a feature like this would at the very least need to be
optional, but it's unclear to me why we'd want it at all, and I feel
like Cedric's email doesn't really answer that question. I suppose
that if you could detect useless prefetching and skip it, you'd save a
bit of work, but under what circumstances does anyone use pg_prewarm
so aggressively as to make that a problem, and why wouldn't the
solution be for the user to just calm down a little bit? There
shouldn't be any particular reason why the user can't know both the
size of shared_buffers and the approximate size of the OS cache;
indeed, they can probably know the latter much more reliably than
PostgreSQL itself can. So it should be fairly easy to avoid just
prefetching more data than will fit, and then you don't have to worry
about any of this. And you'll probably get a better result, too,
because, along the same lines as Tomas's remarks above, I doubt that
this would be an accurate method anyway.

> Well ... I'd argue at least some basic evaluation of performance is a
> rather important / expected part of a proposal for a patch that aims to
> improve a performance-focused feature. It's impossible to have any sort
> of discussion about such patch without that.

Right.

I'm going to mark this patch as Rejected in the CommitFest application
for now. If in subsequent discussion that comes to seem like the wrong
result, then we can revise accordingly, but right now it looks
extremely unlikely to me that this is something that we'd want.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Recent 027_streaming_regress.pl hangs

2024-03-14 Thread Alexander Lakhin

Hello Thomas and Michael,

14.03.2024 06:16, Thomas Munro wrote:


Yeah, I was wondering if its checkpoint delaying logic might have
got the checkpointer jammed or something like that, but I don't
currently see how.  Yeah, the replay of bulk newpages could be
relevant, but it's not exactly new technology.  One thing I wondered
about is whether the Perl "wait for catchup" thing, which generates
large volumes of useless log, could be somehow changed to actually
show the progress after some time.  Something like "I'm still waiting
for this replica to reach LSN X, but it has so far only reported LSN
Y, and here's a dump of the WAL around there"?


I have perhaps reproduced the issue here (at least I'm seeing something
similar), and I am going to investigate it in the coming days, but what
I'm confused by now is the duration of poll_query_until:
For the failure you referenced:
[15:55:54.740](418.725s) # poll_query_until timed out executing this query:

And a couple of others:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2024-03-08%2000%3A34%3A06
[00:45:57.747](376.159s) # poll_query_until timed out executing this query:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2024-03-04%2016%3A32%3A17
[16:45:24.870](407.970s) # poll_query_until timed out executing this query:

Could it be that the timeout (360 sec?) is just not enough for the test
under the current (changed due to switch to meson) conditions?

Best regards,
Alexander




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-03-14 Thread Jelte Fennema-Nio
On Thu, 14 Mar 2024 at 16:45, Robert Haas  wrote:
> I feel bad arguing about the patches that you think are a slam-dunk,
> but I find myself disagreeing with the design choices.

No worries, thanks a lot for responding. I'm happy to discuss this
design further. I didn't necessarily mean these patches were a
slam-dunk. I mainly meant that these first few patches were not
specific to any protocol change, but are changes we should agree on
before any change to the protocol is possible at all. Based on your
response, we currently disagree on a bunch of core things.

I'll first try to summarize my view on (future) protocol changes and
why I think the current core design in this patchset is the correct
path forward, and then go into some details inline in your response
below:

In my view there can be, **by definition,** only two general types of
protocol changes:
1. A "simple protocol change": This is one that requires agreement by
both the client and server, that they understand the new message types
involved in this change. e.g. the ParameterSet message proposal (this
message type is either supported or it's not)
2. A "parameterized protocol change": This requires the same as 1, but
should also allow some parameterization from the client, e.g. for the
compression proposal, the client should specify what compression
algorithm the server should use to compress data when sending it to
the client.

Client and Server can agree that a "simple protocol change" is
supported by both advertising a minimum minor protocol version. And
for a "parameterized protocol change" the client can send a _pq_
parameter in the startup packet.

So, new _pq_ parameters should only ever be introduced for
parameterized protocol changes. They are not meant to advertise
support, they are meant to configure protocol features. For a
non-configurable protocol feature, we'd simply bump the protocol
version. And since _pq_ parameters thus always control some setting at
the session level, we can simply store it as a GUC, because that's how
we store all our parameters at a session level.

> Regarding 0001, I considered making this change as part of
> ae65f6066dc3d19a55f4fdcd3b30003c5ad8dbed but decided against it,
> because it seemed like it was making the assumption that we always
> wanted to initiate new connections with the latest protocol version
> that we know how to accept, and I wasn't sure that would always be
> true.

I think given the automatic downgrade supported by the
NegotiateProtocolVersion, there's no real down-side to requesting the
newest version by default. The only downside I can see is when
connecting to other applications (i.e. non PostgreSQL servers) that
implement the postgres protocol, but don't implement
NegotiateProtocolVersion. But for that I added the
max_protocol_version connection option in 0006 (of my latest
patchset).

> I'm really unhappy with 0002-0004.

Just to be clear, (afaict) your argument below seems to only really be
about 0004, not about 0002 or 0003. Was there anything about 0002 &
0003 that you were unhappy with? 0002 & 0003 are not dependent on 0004
imho. Because even when not making _pq_ parameters map to GUCs, we'd
still need to change libpq to not instantly close the connection
whenever a _pq_ parameter (or new protocol version) is not supported
by the server (which is what 0002 & 0003 do).

> That left no room for any other
> type of protocol modification, so that commit carved out an exception,
> where when we see something that starts with _pq_., it's assumed to be
> setting some other kind of thing, not a GUC.

Okay, our interpretation is very different here. From my perspective
introducing a non-GUC namespace is NOT AT ALL the benefit of the _pq_
prefix. The main benefit (imho) is that it allows sending an
"optional" parameter (i.e. GUC) in the startup packet. So, one where, if
the server doesn't recognize it, the connection attempt still succeeds.
If you specify a normal GUC in the connection parameters and the
server doesn't know about it, the server will close the connection.
So, to be able to send a GUC that depends on the protocol and/or
server version in an optional way, you'd need to wait for an extra
network roundtrip until the server tells you what protocol and/or
server version they are.

> But 0004 throws that out
> the window and decides, nope, those are just GUCs, too. Even if we
> don't have a specific reason today why we'd ever want such a thing, it
> seems short-sighted to give up on the possibility that in the future
> we will.

Since I believe a _pq_ parameter should only be used to control
settings at the session level, I don't think it would be short-sighted
to give up on the possibility of storing them as anything other than GUCs.
Because in the many years that we've had GUCs, we've been able to
store all session settings using that infrastructure. BUT PLEASE NOTE:
I don't think we are giving up on the thing you're describing (see
explanation in final part of this email)

> With this 

Re: Psql meta-command conninfo+

2024-03-14 Thread Nathan Bossart
Hi Maiquel,

On Thu, Feb 29, 2024 at 10:02:21PM +, Maiquel Grassi wrote:
> Sorry for the delay. I will make the adjustments as requested soon.

We have only a few weeks left before feature-freeze for v17.  Do you think
you will be able to send an updated patch soon?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add system identifier to backup manifest

2024-03-14 Thread Robert Haas
On Thu, Mar 14, 2024 at 11:05 AM Amul Sul  wrote:
> Thank you, Robert.

Thanks for the patch!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Possibility to disable `ALTER SYSTEM`

2024-03-14 Thread Robert Haas
On Tue, Feb 13, 2024 at 2:05 AM Joel Jacobson  wrote:
> > Wouldn't having system wide EVTs be a generic solution which could be the
> > infrastructure for this requested change as well as others in the same area?
>
> +1
>
> I like the wider vision of providing the necessary infrastructure to provide 
> a solution for the general case.

We don't seem to be making much progress here.

As far as I can see from reading the thread, most people agree that
it's reasonable to have some way to disable ALTER SYSTEM, but there
are at least six competing ideas about how to do that:

1. command-line option
2. GUC
3. event trigger
4. extension
5. sentinel file
6. remove permissions on postgresql.auto.conf

As I see it, (5) or (6) are most convenient for the system
administrator, since they let that person make changes without needing
to log into the database or, really, worry very much about the
database's usual configuration mechanisms at all, and (5) seems like
less work to implement than (6), because (6) probably breaks a bunch
of client tools in weird ways that might not be easy for us to
discover during patch review. (1) doesn't allow changing things at
runtime, and might require the system administrator to fiddle with the
startup scripts, which seems like it could be inconvenient. (2) and
(3) seem like they put the superuser in a position to easily reverse a
policy about what the superuser ought to do, but in the case of (2),
that can be mitigated if the GUC can only be set in postgresql.conf
and not elsewhere. (4) has no real advantages except for allowing core
to maintain the fiction that we don't support this while actually
supporting it; I think we should reject that approach outright.

So what I'd like to see is a patch that implements (5), or in the
alternative (2) but with the GUC being PGC_SIGHUP and
GUC_DISALLOW_IN_AUTO_FILE. I believe there would be adequate consensus
to proceed with either of those approaches. Anybody feel like coding
it up?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: DOCS: add helpful partitioning links

2024-03-14 Thread Ashutosh Bapat
Hi Robert,

On Thu, Mar 7, 2024 at 10:49 PM Robert Treat  wrote:

> This patch adds a link to the "attach partition" command section
> (similar to the detach partition link above it) as well as a link to
> "create table like" as both commands contain additional information
> that users should review beyond what is laid out in this section.
> There's also a couple of wordsmithing changes in nearby areas to improve
> readability.
>

Thanks.

The patch gives an error when building html:

ddl.sgml:4300: element link: validity error : No declaration for attribute
linked of element link

(the failing markup is the <link> around CREATE TABLE ... LIKE).

On the question of "outside" versus "outside of", see
https://english.stackexchange.com/questions/9700/outside-or-outside-of. But
I think the meaning of the sentence will be more clear if we rephrase it as
in the attached patch.

- convenient, as not only will the existing partitions become indexed,
but
- also any partitions that are created in the future will.  One
limitation is
+ convenient as not only will the existing partitions become indexed,
but
+ any partitions created in the future will as well.  One limitation is

I am finding the current construct hard to read. The comma is misplaced as
you have pointed out. The pair of commas break the "not only" ... "but
also" construct. I have tried to simplify the sentence in the attached.
Please review.

- the partitioned table; such an index is marked invalid, and the
partitions
- do not get the index applied automatically.  The indexes on
partitions can
- be created individually using CONCURRENTLY, and
then
+ the partitioned table; such an index is marked invalid and the
partitions
+ do not get the index applied automatically.  The partition indexes can

"indexes on partition" is clearer than "partition index". Fixed in the
attached patch.

Please review.

-- 
Best Wishes,
Ashutosh Bapat
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 8616a8e9cc..043717136c 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4283,18 +4283,20 @@ CREATE TABLE measurement_y2008m02 PARTITION OF measurement
 TABLESPACE fasttablespace;
 
 
- As an alternative, it is sometimes more convenient to create the
- new table outside the partition structure, and attach it as a
- partition later. This allows new data to be loaded, checked, and
- transformed prior to it appearing in the partitioned table.
- Moreover, the ATTACH PARTITION operation requires
- only SHARE UPDATE EXCLUSIVE lock on the
- partitioned table, as opposed to the ACCESS
- EXCLUSIVE lock that is required by CREATE TABLE
- ... PARTITION OF, so it is more friendly to concurrent
- operations on the partitioned table.
- The CREATE TABLE ... LIKE option is helpful
- to avoid tediously repeating the parent table's definition:
+ As an alternative, it is sometimes more convenient to create the partition
+ as a standalone new table, and attach it to the partitioned table later.
+ This allows new data to be loaded, checked, and transformed prior to it
+ appearing in the partitioned table. Moreover, the ATTACH
+ PARTITION operation requires only SHARE UPDATE
+ EXCLUSIVE lock on the partitioned table, as opposed to the
+ ACCESS EXCLUSIVE lock that is required by
+ CREATE TABLE ... PARTITION OF, so it is more friendly
+ to concurrent operations on the partitioned table; see ALTER TABLE ... ATTACH
+ PARTITION for additional details. The CREATE TABLE ...
+ LIKE command can be helpful to avoid tediously repeating
+ the parent table's definition, for example:
 
 
 CREATE TABLE measurement_y2008m02
@@ -4345,16 +4347,14 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
 
  As explained above, it is possible to create indexes on partitioned tables
- so that they are applied automatically to the entire hierarchy.
- This is very
- convenient, as not only will the existing partitions become indexed, but
- also any partitions that are created in the future will.  One limitation is
+ so that they are applied automatically to the entire hierarchy. The operation of creating an index on a partitioned table also creates corresponding indexes on its partitions. Any new partition created in future also inherits the indexes on the partitioned table.
+ One limitation is
  that it's not possible to use the CONCURRENTLY
  qualifier when creating such a partitioned index.  To avoid long lock
  times, it is possible to use CREATE INDEX ON ONLY
- the partitioned table; such an index is marked invalid, and the partitions
+ the partitioned table; such an index is marked invalid and the partitions
  do not get the index applied automatically.  The indexes on partitions can
- be created individually using CONCURRENTLY, and then
+ then be created individually using CONCURRENTLY and
  attached to the index on the parent using
  ALTER INDEX .. ATTACH 

Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value

2024-03-14 Thread Nathan Bossart
On Wed, Mar 13, 2024 at 01:09:18PM +0700, Yugo NAGATA wrote:
> On Tue, 12 Mar 2024 22:07:17 -0500
> Nathan Bossart  wrote:
>> I did some light editing to prepare this for commit.
> 
> Thank you. I confirmed the test you improved and I am fine with that.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Reports on obsolete Postgres versions

2024-03-14 Thread Peter Eisentraut

On 13.03.24 18:12, Bruce Momjian wrote:

On Wed, Mar 13, 2024 at 09:21:27AM -0700, Jeremy Schneider wrote:

It's not just roadmaps and release pages where we mix up these terms
either, it's even in user-facing SQL and libpq routines: both
PQserverVersion and current_setting('server_version_num') return the
patch release version in the numeric patch field, rather than the
numeric minor field (which is always 0).

In my view, the best thing would be to move toward consistently using
the word "patch" and moving away from the word "minor" for the
PostgreSQL quarterly maintenance updates.


I think "minor" is a better term since it contrasts with "major".  We
don't actually supply patches to upgrade minor versions.


There are potentially different adjectives that could apply to "version" 
and "release".


The version numbers can be called major and minor, because that just 
describes their ordering and significance.


But I do agree that "minor release" isn't quite as clear, because one 
could also interpret that as "a release, but a bit smaller this time". 
(Also might not translate well, since "minor" and "small" could 
translate to the same thing.)


One could instead, for example, describe those as "maintenance releases":

"Are you on the latest maintenance release?  Why not?  Are you not 
maintaining your database?"


That carries much more urgency than the same with "minor".

A maintenance release corresponds to an increase in the minor version 
number.  It's still tied together, but has different terminology.


The last news item reads:

"The PostgreSQL Global Development Group has released an update to all 
supported versions of PostgreSQL"


which has no urgency, but consider

"The PostgreSQL Global Development Group has published maintenance 
releases to all supported versions of PostgreSQL"






Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-03-14 Thread Robert Haas
On Fri, Mar 8, 2024 at 6:47 AM Jelte Fennema-Nio  wrote:
> 1. 0001-0005 are needed for any protocol change, and hopefully
> shouldn't require much discussion

I feel bad arguing about the patches that you think are a slam-dunk,
but I find myself disagreeing with the design choices.

Regarding 0001, I considered making this change as part of
ae65f6066dc3d19a55f4fdcd3b30003c5ad8dbed but decided against it,
because it seemed like it was making the assumption that we always
wanted to initiate new connections with the latest protocol version
that we know how to accept, and I wasn't sure that would always be
true. I don't think it would be catastrophic if this got committed or
anything -- it could always be changed later if the need arose -- but
I wanted to mention that I had a reason for not doing it, and I'm
still not particularly convinced that we should do it.

I'm really unhappy with 0002-0004. Before
ae65f6066dc3d19a55f4fdcd3b30003c5ad8dbed, any parameter included in
the startup message that wasn't in a short, hard-coded list was
treated as a request to set a GUC. That left no room for any other
type of protocol modification, so that commit carved out an exception,
where when we see something that starts with _pq_., it's assumed to be
setting some other kind of thing, not a GUC. But 0004 throws that out
the window and decides, nope, those are just GUCs, too. Even if we
don't have a specific reason today why we'd ever want such a thing, it
seems short-sighted to give up on the possibility that in the future
we will. Because if we implement what this patch wants to do in this
way, basically consuming the entire namespace that
ae65f6066dc3d19a55f4fdcd3b30003c5ad8dbed created in one shot, and then
later we want the sort of thing that I'm postulating, we'll have to
manufacture another new namespace for that need.

And it seems to me that there are other ways we could do this. For
example, suppose we introduced just one new protocol parameter; for
the sake of argument, I'll call it _pq_.protocol_set. If the client
sends this parameter and the server accepts it, then the client knows
that the server supports a new protocol message ProtocolSetParameter,
which is the only way to set GUCs in the new PROTOCOL_EXTENSION
category. New libpq functions with names like, I don't know,
PQsetProtocolParameter(), are added to send such messages, and they
return an error if there are network problems or whatever, or if the
server didn't accept the _pq_.protocol_set option at startup time.

With this kind of design, you're just consuming one element of the
_pq_ namespace, and the next person who wants to do something can
consume one more element, and we'll be able to go on for a very long
time without running out of room. This is really how I intended this
mechanism to be used, and the only real downside I see as compared to
what you've done is that you can't set the protocol GUCs in the
startup packet, but must set them afterward via separate messages. If
that's a problem, then the proposal I just outlined needs modification
... but no matter what we do exactly, I don't want the very first
protocol extension we ever add to eat up all of _pq_. I intended that
to support decades worth of extensibility, not just one patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE

2024-03-14 Thread Heikki Linnakangas

On 18/02/2024 00:31, Tomas Vondra wrote:

Do you plan to continue working on this patch? I did take a look,
and on the whole it looks reasonable - it modifies the right places etc.


+1


I think there are two things that may need an improvement:

1) Storing variable-length data in ParallelBitmapHeapState

I agree with Robert the snapshot_and_stats name is not great. I see
Dmitry mentioned phs_snapshot_off as used by ParallelTableScanDescData -
the reasons are somewhat different (phs_snapshot_off exists because we
don't know which exact struct will be allocated), while here we simply
need to allocate two variable-length pieces of memory. But it seems like
it would work nicely for this. That is, we could try adding an offset
for each of those pieces of memory:

  - snapshot_off
  - stats_off

I don't like the GetSharedSnapshotData name very much, it seems very
close to GetSnapshotData - quite confusing, I think.

Dmitry also suggested we might add a separate piece of shared memory. I
don't quite see how that would work for ParallelBitmapHeapState, but I
doubt it'd be simpler than having two offsets. I don't think the extra
complexity (paid by everyone) would be worth it just to make EXPLAIN
ANALYZE work.


I just removed phs_snapshot_data in commit 84c18acaf6. I thought that 
would make this moot, but now that I rebased this, there are still some 
aesthetic questions on how best to represent this.


In all the other node types that use shared instrumentation like this, 
the pattern is as follows: (using Memoize here as an example, but it's 
similar for Sort, IncrementalSort, Agg and Hash)


/* 
 *   Shared memory container for per-worker memoize information
 * 
 */
typedef struct SharedMemoizeInfo
{
int num_workers;
MemoizeInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
} SharedMemoizeInfo;

/* this struct is backend-private */
typedef struct MemoizeState
{
ScanState   ss; /* its first field is 
NodeTag */
...
MemoizeInstrumentation stats;   /* execution statistics */
SharedMemoizeInfo *shared_info; /* statistics for parallel workers */
} MemoizeState;

While the scan is running, the node updates its private data in 
MemoizeState->stats. At the end of a parallel scan, the worker process 
copies the MemoizeState->stats to MemoizeState->shared_info->stats, 
which lives in shared memory. The leader process copies 
MemoizeState->shared_info->stats to its own backend-private copy, which 
it then stores in its MemoizeState->shared_info, replacing the pointer 
to the shared memory with a pointer to the private copy. That happens in 
ExecMemoizeRetrieveInstrumentation().


This is a little different for parallel bitmap heap scans, because a 
bitmap heap scan keeps some other data in shared memory too, not just 
instrumentation data. Also, the naming is inconsistent: the equivalent 
of SharedMemoizeInfo is actually called ParallelBitmapHeapState. I think 
we should rename it to SharedBitmapHeapInfo, to make it clear that it 
lives in shared memory, but I digress.


We could now put the new stats at the end of ParallelBitmapHeapState as 
a varlen field. But I'm not sure that's a good idea. In 
ExecBitmapHeapRetrieveInstrumentation(), would we make a backend-private 
copy of the whole ParallelBitmapHeapState struct, even though the other 
fields don't make sense after the shared memory is released? Sounds 
confusing. Or we could introduce a separate struct for the stats, and 
copy just that:


typedef struct SharedBitmapHeapInstrumentation
{
int num_workers;
BitmapHeapScanInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
} SharedBitmapHeapInstrumentation;

typedef struct BitmapHeapScanState
{
ScanState   ss; /* its first field is 
NodeTag */
...
SharedBitmapHeapInstrumentation sinstrument;
} BitmapHeapScanState;

that compiles, at least with my compiler, but I find it weird to have a 
variable-length inner struct embedded in an outer struct like that.


Long story short, I think it's still better to store 
ParallelBitmapHeapInstrumentationInfo separately in the DSM chunk, not 
as part of ParallelBitmapHeapState. Attached patch does that, rebased 
over current master.



I didn't address any of the other things that you, Tomas, pointed out, 
but I think they're valid concerns.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 04f11a37ee04b282c51e9dae68ead7c9c3d5fb3d Mon Sep 17 00:00:00 2001
From: David Geier 
Date: Tue, 8 Nov 2022 19:40:31 +0100
Subject: [PATCH v3 1/1] Parallel Bitmap Heap Scan reports per-worker stats

Similarly to other nodes (e.g. hash join, sort, memoize),
Bitmap Heap Scan now reports per-worker stats in the EXPLAIN
ANALYZE output. Previously only the heap blocks stats for the
leader were reported which was incomplete in parallel scans.

Discussion: 

Re: UUID v7

2024-03-14 Thread Peter Eisentraut

On 14.03.24 12:25, Andrey M. Borodin wrote:

I think the behavior of uuid_extract_var(iant) is wrong.  The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that 
implements full table 1 from section 4.1.

I think we are still interpreting this differently.  I think uuid_extract_variant should 
just return whatever is in those four bits. Your function comment says "Can return 
only 0, 0b10, 0b110 and 0b111.", which I don't think is correct.  It should 
return 0 through 15.

We will return "do not care" bits. These bits can confuse someone. E.g. for 
variant 0b10 we can return 8, 9, 10 and 11 randomly. Is it OK? BTW for some reason 
the document lists numbers 1-15, but you are correct that the range is 0-15.


I agree it's confusing.  Before I studied the RFC 4122bis project, I 
didn't even know about variant vs. version.  I think overall people will 
find this more confusing than useful.  If you just want to know, "is 
this UUID of the kind specified in RFC 4122", you can query it with 
uuid_extract_version(x) IS NOT NULL.  So maybe we don't need the 
_extract_variant function?
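
For example, with the functions proposed in this patch set (a sketch only,
since the functions are not committed yet, using the semantics described
above):

    SELECT uuid_extract_version(gen_random_uuid()) IS NOT NULL;
    -- true: a freshly generated version-4 UUID is an RFC 4122 UUID
    SELECT uuid_extract_version('00000000-0000-0000-0000-000000000000'::uuid)
           IS NOT NULL;
    -- false: the nil UUID is not of the RFC 4122 variant, so no version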






Re: Add system identifier to backup manifest

2024-03-14 Thread Amul Sul
On Thu, Mar 14, 2024 at 12:48 AM Robert Haas  wrote:

> On Fri, Mar 8, 2024 at 12:14 AM Amul Sul  wrote:
> > Thank you for the improvement.
> >
> > The caller of verify_control_file() has the full path of the control
> file that
> > can pass it and avoid recomputing. With this change, it doesn't really
> need
> > verifier_context argument -- only the manifest's system identifier is
> enough
> > along with the control file path.  Did the same in the attached delta
> patch
> > for v11-0002 patch, please have a look, thanks.
>
> Those seem like sensible changes. I incorporated them and committed. I
> also:
>
> * ran pgindent, which changed a bit of your formatting
> * changed some BAIL_OUT calls to die; I think we should hardly ever be
> using BAIL_OUT, as that terminates the entire TAP test run, not just
> the current file
>

Thank you, Robert.

Regards,
Amul


Re: psql: fix variable existence tab completion

2024-03-14 Thread Alexander Korotkov
On Sun, Mar 3, 2024 at 5:37 PM Erik Wienhold  wrote:
> On 2024-03-03 03:00 +0100, Steve Chavez wrote:
> > psql has the :{?name} syntax for testing a psql variable existence.
> >
> > But currently doing \echo :{?VERB doesn't trigger tab completion.
> >
> > This patch fixes it. I've also included a TAP test.
>
> Thanks.  The code looks good, all tests pass, and the tab completion
> works as expected when testing manually.

A nice improvement.  I've checked why we have '{' in WORD_BREAKS at all,
and whether we're going to break anything by removing it.  It seems that
'{' has been there from the very beginning and comes from the default
value of rl_basic_word_break_characters [1].  It also seems that :{?name}
is the only usage of the '{' sign in psql.  So, removing it from
WORD_BREAKS shouldn't break anything.
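
For reference, a small psql session illustrating the syntax in question:

    \set myvar 42
    \echo :{?myvar}      -- expands to TRUE
    \echo :{?undefined}  -- expands to FALSE

With the patch, typing \echo :{?my<TAB> completes the variable name, which
previously did not work because '{' was treated as a word break.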

I'm going to push this patch if no objections.

Links.
1. 
https://tiswww.case.edu/php/chet/readline/readline.html#index-rl_005fbasic_005fword_005fbreak_005fcharacters

--
Regards,
Alexander Korotkov




Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

2024-03-14 Thread vignesh C
On Thu, 14 Mar 2024 at 15:49, Amit Kapila  wrote:
>
> On Thu, Mar 14, 2024 at 1:45 PM Masahiko Sawada  wrote:
> >
> > On Thu, Mar 14, 2024 at 2:27 PM Amit Kapila  wrote:
> > >
> > > On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > This fact makes me think that the slotsync worker might be able to
> > > > accept the primary_conninfo value even if there is no dbname in the
> > > > value. That is, if there is no dbname in the primary_conninfo, it uses
> > > > the username in accordance with the specs of the connection string.
> > > > Currently, the slotsync worker connects to the local database first
> > > > and then establishes the connection to the primary server. But if we
> > > > can reverse the two steps, it can get the dbname that has actually
> > > > been used to establish the remote connection and use it for the local
> > > > connection too. That way, the primary_conninfo generated by
> > > > pg_basebackup could work even without the patch. For example, if the
> > > > OS user executing pg_basebackup is 'postgres', the slotsync worker
> > > > would connect to the postgres database. Given the 'postgres' database
> > > > is created by default and 'postgres' OS user is used in common, I
> > > > guess it could cover many cases in practice actually.
> > > >
> > >
> > > I think this is worth investigating but I suspect that in most cases
> > > users will end up using a replication connection without specifying
> > > the user name and we may not be able to give a meaningful error
> > > message when slotsync worker won't be able to connect. The same will
> > > be true even when a dbname the same as the username is used.
> >
> > What do you mean by not being able to give a meaningful error message?
> >
> > If the slotsync worker uses the user name as the dbname, and such a
> > database doesn't exist, the error message the user will get is
> > "database "test_user" does not exist". ISTM the same is true when the
> > user specifies the wrong database in the primary_conninfo.
> >
>
> Right, the exact error message as mentioned by Shveta will be:
> ERROR:  could not connect to the primary server: connection to server
> at "127.0.0.1", port 5433 failed: FATAL: database "bckp_user" does not
> exist
>
> Now, without this idea, the ERROR message will be:
>  ERROR:  slot synchronization requires dbname to be specified in
> primary_conninfo
>
> I am not sure how much this matters but the second message sounds more useful.
>
> > >
> > > > Having said that, even with (or without) the above change, we might
> > > > want to change the pg_basebackup so that it writes the dbname to the
> > > > primary_conninfo if -R option is specified. Since the database where
> > > > the slotsync worker connects cannot be dropped while the slotsync
> > > > worker is running, the user might want to change the database to
> > > > connect, and it would be useful if they can do that using
> > > > pg_basebackup instead of modifying the configuration file manually.
> > > >
> > > > While the current approach makes sense to me, I'm a bit concerned that
> > > > we might end up having the pg_basebackup search the actual database
> > > > name (e.g. 'dbname=template1') from the .pgpass file instead of
> > > > 'dbname=replication'. As far as I tested on my environment, suppose
> > > > that I execute:
> > > >
> > > > pg_basebackup -D tmp -d "dbname=testdb" -R
> > > >
> > > > The pg_basebackup established a replication connection but looked for
> > > > the password of the 'testdb' database. This could be another
> > > > inconvenience for the existing users who want to use the slot
> > > > synchronization.
> > > >
> > >
> > > This is true because it is internally using logical replication
> > > connection (as we will set set replication=database).
> >
> > Did you mean the pg_basebackup is using a logical replication
> > connection in this case? As far as I tested, even if we specify dbname
> > to the -d option of pg_basebackup, it uses a physical replication
> > connection. For example, it can take a backup even if I specify a
> > non-existing database name.
> >
>
> You are right. I misunderstood some part of the code in GetConnection.
> However, I think my point is still valid that if the user has provided
> dbname in the connection string it means that she wants that database
> entry to be looked up, not the "replication" entry.
>
> >
> > >
> > > > But it's still just an idea and I might be missing something. And
> > > > given we're getting closer to the feature freeze, it would be a PG18
> > > > item.
> > > >
> > >
> > > +1. At this stage, it is important to discuss whether we should allow
> > > pg_baseback to write dbname (either a specified one or a default one)
> > > along with other parameters in primary_conninfo?
> > >
> >
> > True. While I basically agree that pg_basebackup writes dbname in
> > primary_conninfo, I'm concerned that writing "dbname=replication"
> > could be problematic. Quoting the case 3) Vignesh 

Re: Make attstattarget nullable

2024-03-14 Thread Tomas Vondra



On 3/14/24 11:13, Peter Eisentraut wrote:
> On 12.03.24 14:32, Tomas Vondra wrote:
>> On 3/12/24 13:47, Peter Eisentraut wrote:
>>> On 06.03.24 22:34, Tomas Vondra wrote:
 0001
 

 1) I think this bit in ALTER STATISTICS docs is wrong:

 -  <replaceable class="parameter">new_target</replaceable>
 +  SET STATISTICS { <replaceable class="parameter">integer</replaceable> | DEFAULT }

 because it means we now have list entries for name, ..., new_name,
 new_schema, and then suddenly "SET STATISTICS { integer | DEFAULT }".
 That's a bit weird.
>>>
>>> Ok, how would you change it?  List out the full clauses of the other
>>> variants under Parameters as well?
>>
>> I'd go with a parameter, essentially exactly as it used to be, except
>> for adding the DEFAULT option. So the list would define new_target, and
>> mention DEFAULT as a special value.
> 
> Ok, done that way (I think).
> 

Seems OK to me.
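
For concreteness, the syntax under discussion looks like this (the statistics
object name is made up):

    ALTER STATISTICS my_stats SET STATISTICS 500;      -- existing form
    ALTER STATISTICS my_stats SET STATISTICS DEFAULT;  -- the new DEFAULT variant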

 2) The newtarget handling in AlterStatistics seems rather confusing.
 Why
 does it get set to -1 just to ignore the value later? For a while I was
 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field
 to -1. Maybe ditching the first if block and directly checking
 stmt->stxstattarget before setting repl_val/repl_null would be better?
>>>
>>> But we also need to continue accepting -1 for default on input.  The
>>> current code achieves that, the proposed variant would not.
>>
>> OK, I did not realize that. But then maybe this should be explained in a
>> comment before the new "if" block, because people won't realize why it
>> needs to be this way.
> 
> In the new version, I tried to write this more explicitly, and updated
> tablecmds.c to match.

WFM. It still seems a bit hard to read, but I don't know how to do it
better. I guess it's how it has to be to deal with multiple default
values in a backwards-compatible way. Good thing is it's localized in
two places.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Built-in CTYPE provider

2024-03-14 Thread Peter Eisentraut

On 14.03.24 09:08, Jeff Davis wrote:

0001 (the C.UTF-8 locale) is also close. Considering that most of the
infrastructure is already in place, that's not a large patch. You many
have some comments about the way I'm canonicalizing and validating in
initdb -- that could be cleaner, but it feels like I should refactor
the surrounding code separately first.


I have tested this against the libc locale C.utf8 that was available on 
the OS, and the behavior is consistent.
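
As a sketch of what that testing looks like (syntax per the patch series and
subject to change before commit):

    CREATE DATABASE test_builtin
      TEMPLATE template0
      LOCALE_PROVIDER builtin
      BUILTIN_LOCALE 'C.UTF-8';
    -- then compare sorting/ctype behavior against an equivalent libc C.utf8 database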


I wonder if we should version the builtin locales too.  We might make a 
mistake and want to change something sometime?


Tiny comments:

* src/bin/scripts/t/020_createdb.pl

The two added tests should have different names that tell them apart
(like the new initdb tests).

* src/include/catalog/pg_collation.dat

Maybe use 'and' instead of '&' in the description.


0002 (inlining utf8 functions) is also ready.


Seems ok.


For 0003 and beyond, I'd like some validation that it's what you had in
mind.


I'll look into those later.





Re: abi-compliance-checker

2024-03-14 Thread Robert Haas
On Mon, Mar 4, 2024 at 7:50 AM Peter Eisentraut  wrote:
> Looking at this again, if we don't want the .xml files in the tree, then
> we don't really need this patch series.

Based on this, I've updated the status of this patch in the CommitFest
application to Withdrawn. If that's not correct, please feel free to
adjust.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Should we remove -Wdeclaration-after-statement?

2024-03-14 Thread Robert Haas
On Wed, Feb 7, 2024 at 7:55 PM Noah Misch  wrote:
> > So my suggestion is for people to respond with -1, -0.5, +-0, +0.5, or
> > +1 to indicate support against/for the change.
>
> I'm +1 for the change, for these reasons:
>
> - Fewer back-patch merge conflicts.  The decls section of long functions is a
>   classic conflict point.
> - A mid-func decl demonstrates that its var is unused in the first half of the
>   func.
> - We write Perl in the mixed decls style, without problems.
>
> For me personally, the "inconsistency" concern is negligible.  We allowed "for
> (int i = 0", and that inconsistency has been invisible to me.

This thread was interesting as an opinion poll, but it seems clear
that the consensus is still against the proposed change, so I've
marked the CommitFest entry rejected.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Catalog domain not-null constraints

2024-03-14 Thread Peter Eisentraut

On 14.03.24 15:03, Alvaro Herrera wrote:

On 2024-Mar-14, Peter Eisentraut wrote:


Perhaps it would make sense if we change the ALTER TABLE command to be like

 ALTER TABLE t1 ADD IF NOT EXISTS NOT NULL c1

Then the behavior is like one would expect.

For ALTER TABLE, we would reject this command if IF NOT EXISTS is not
specified.  (Since this is mainly for pg_dump, it doesn't really matter for
usability.)  For ALTER DOMAIN, we could accept both variants.


I don't understand why you want to change this behavior, though.


Because in the abstract, the behavior of

ALTER TABLE t1 ADD <constraint>

should be to add a constraint.

In the current implementation, the behavior is different for different 
constraint types.






Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-14 Thread Bharath Rupireddy
On Thu, Mar 14, 2024 at 12:24 PM Amit Kapila  wrote:
>
> On Wed, Mar 13, 2024 at 9:24 PM Bharath Rupireddy
> >
> > Yes, there will be some sort of duplicity if we emit conflict_reason
> > as a text field. However, I still think the better way is to turn
> > conflict_reason text to conflict boolean and set it to true only on
> > rows_removed and wal_level_insufficient invalidations. When conflict
> > boolean is true, one (including all the tests that we've added
> > recently) can look for invalidation_reason text field for the reason.
> > This sounds reasonable to me as opposed to we just mentioning in the
> > docs that "if invalidation_reason is rows_removed or
> > wal_level_insufficient it's the reason for conflict with recovery".
> >
> Fair point. I think we can go either way. Bertrand, Nathan, and
> others, do you have an opinion on this matter?

While we wait to hear from others on this, I'm attaching the v9 patch
set implementing the above idea (check 0001 patch). Please have a
look. I'll come back to the other review comments soon.
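
With the 0001 patch applied, the idea is that a query like the following
(a sketch against the proposed catalog changes) shows the reason for any
invalidated slot, physical or logical, while "conflicting" stays specific
to recovery conflicts:

    SELECT slot_name, conflicting, invalidation_reason
    FROM pg_replication_slots
    WHERE invalidation_reason IS NOT NULL;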

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 18855c08cd8bcbaf41aba10048f0ea23a246e546 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 14 Mar 2024 12:48:52 +
Subject: [PATCH v9 1/4] Track invalidation_reason in pg_replication_slots

Up until now, reason for replication slot invalidation is not
tracked in pg_replication_slots. A recent commit 007693f2a added
conflict_reason to show the reasons for slot invalidation, but
only for logical slots.

This commit adds a new column to show invalidation reasons for
both physical and logical slots. And, this commit also turns
conflict_reason text column to conflicting boolean column
(effectively reverting commit 007693f2a). One now can look at the
new invalidation_reason column for logical slots conflict with
recovery.
---
 doc/src/sgml/ref/pgupgrade.sgml   |  4 +-
 doc/src/sgml/system-views.sgml| 63 +++
 src/backend/catalog/system_views.sql  |  5 +-
 src/backend/replication/logical/slotsync.c|  2 +-
 src/backend/replication/slot.c|  8 +--
 src/backend/replication/slotfuncs.c   | 25 +---
 src/bin/pg_upgrade/info.c |  4 +-
 src/include/catalog/pg_proc.dat   |  6 +-
 src/include/replication/slot.h|  2 +-
 .../t/035_standby_logical_decoding.pl | 35 ++-
 .../t/040_standby_failover_slots_sync.pl  |  4 +-
 src/test/regress/expected/rules.out   |  7 ++-
 12 files changed, 95 insertions(+), 70 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 58c6c2df8b..8de52bf752 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -453,8 +453,8 @@ make prefix=/usr/local/pgsql.new install
   
All slots on the old cluster must be usable, i.e., there are no slots
whose
-   pg_replication_slots.conflict_reason
-   is not NULL.
+   pg_replication_slots.conflicting
+   is not true.
   
  
  
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index be90edd0e2..f3fb5ba1b0 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2525,34 +2525,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
   
-   conflict_reason text
+   conflicting bool
   
   
-   The reason for the logical slot's conflict with recovery. It is always
-   NULL for physical slots, as well as for logical slots which are not
-   invalidated. The non-NULL values indicate that the slot is marked
-   as invalidated. Possible values are:
-   
-
- 
-  wal_removed means that the required WAL has been
-  removed.
- 
-
-
- 
-  rows_removed means that the required rows have
-  been removed.
- 
-
-
- 
-  wal_level_insufficient means that the
-  primary doesn't have a  sufficient to
-  perform logical decoding.
- 
-
-   
+   True if this logical slot conflicted with recovery (and so is now
+   invalidated). When this column is true, check
+   invalidation_reason column for the conflict
+   reason.
   
  
 
@@ -2581,6 +2560,38 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
  
 
+ 
+  
+   invalidation_reason text
+  
+  
+   The reason for the slot's invalidation. NULL if the
+   slot is currently actively being used. The non-NULL values indicate that
+   the slot is marked as invalidated. Possible values are:
+   
+
+ 
+  wal_removed means that the required WAL has been
+  removed.
+ 
+
+
+ 
+  

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

2024-03-14 Thread Robert Haas
On Wed, Oct 4, 2023 at 9:12 PM James Coleman  wrote:
> All right, attached is a v3 which attempts to fix the wrong
> information with an economy of words. I may at some point submit a
> separate patch that adds a broader pruning section, but this at least
> brings the docs inline with reality insofar as they address it.

I don't think this is as good as what I proposed back on October 2nd.
IMHO, that version does a good job making the text accurate and clear,
and is directly responsive to your original complaint, namely, that
the root of the HOT chain can't be removed. But this version seems to
contain a number of incidental changes that are unrelated to that
point, e.g. "old versions" -> "old, no longer visible versions", "can
be completely removed" -> "may be pruned", and the removal of the
sentence "In summary, heap-only tuple updates can only be created - if
columns used by indexes are not updated" which AFAICT is both
completely correct as-is and unrelated to the original complaint.

Maybe I shouldn't be, but I'm slightly frustrated here. I thought I
had proposed an alternative which you found acceptable, but then you
proposed several more versions that did other things instead, and I
never really understood why we couldn't just adopt the version that
you seemed to think was OK. If there's a problem with that, say what
it is. If there's not, let's do that and move on.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add publisher and subscriber to glossary documentation.

2024-03-14 Thread Alvaro Herrera
On 2024-Mar-14, Shlok Kyal wrote:

> Andrew Atkinson wrote:
>
> > Anyway, hopefully these examples show “node” and “database” are
> > mixed and perhaps others agree using one consistently might help the
> > goals of the docs.
> 
> For me the existing content looks good, I felt let's keep it as it is
> unless others feel differently.

Actually it's these small terminology glitches that give me pause.  If
we're going to have terms that are interchangeable (in this case "node"
and "database"), then they should be always interchangeable, not just in
some unspecified cases.  Maybe the idea of using "node" (which sounds
like something that's instance-wide) is wrong for logical replication,
which is necessarily something that happens database-locally.

Then again, maybe defining "node" as something that exists at a
database-local level when used in the context of logical replication is
sufficient.  In that case, it would be better to avoid defining it as a
synonym of "instance".  Then the terms are not always interchangeable,
but it's clear when they are and when they aren't.

"Node: in replication, each of the endpoints to which or
from which data is replicated.  In the context of physical replication,
each node is an instance.  In the context of logical replication, each
node is a database".

Does that make sense?

I'd also look at altering "Primary" and "Standby" so that it's clearer
that they're about physical replication, and don't mention "database"
anymore, since that's the wrong level.  Maybe turn them into "Primary
(node)" and "Standby (node)" instead.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-03-14 Thread Tomas Vondra



On 3/13/24 23:38, Thomas Munro wrote:
> On Sun, Mar 3, 2024 at 11:41 AM Tomas Vondra
>  wrote:
>> On 3/2/24 23:28, Melanie Plageman wrote:
>>> On Sat, Mar 2, 2024 at 10:05 AM Tomas Vondra
>>>  wrote:
>>>> With the current "master" code, eic=1 means we'll issue a prefetch for B
>>>> and then read+process A. And then issue prefetch for C and read+process
>>>> B, and so on. It's always one page ahead.
>>>
>>> Yes, that is what I mean for eic = 1
> 
> I spent quite a few days thinking about the meaning of eic=0 and eic=1
> for streaming_read.c v7[1], to make it agree with the above and with
> master.  Here's why I was confused:
> 
> Both eic=0 and eic=1 are expected to generate at most 1 physical I/O
> at a time, or I/O queue depth 1 if you want to put it that way.  But
> this isn't just about concurrency of I/O, it's also about computation.
> Duh.
> 
> eic=0 means that the I/O is not concurrent with executor computation.
> So, to annotate an excerpt from [1]'s random.txt, we have:
> 
> effective_io_concurrency = 0, range size = 1
> unpatched  patched
> ==
> pread(43,...,8192,0x58000) = 8192  pread(82,...,8192,0x58000) = 8192
>  *** executor now has page at 0x58000 to work on ***
> pread(43,...,8192,0xb) = 8192  pread(82,...,8192,0xb) = 8192
>  *** executor now has page at 0xb to work on ***
> 
> eic=1 means that a single I/O is started and then control is returned
> to the executor code to do useful work concurrently with the
> background read that we assume is happening:
> 
> effective_io_concurrency = 1, range size = 1
> unpatched  patched
> ==
> pread(43,...,8192,0x58000) = 8192  pread(82,...,8192,0x58000) = 8192
> posix_fadvise(43,0xb,0x2000,...)   posix_fadvise(82,0xb,0x2000,...)
>  *** executor now has page at 0x58000 to work on ***
> pread(43,...,8192,0xb) = 8192  pread(82,...,8192,0xb) = 8192
> posix_fadvise(43,0x108000,0x2000,...)  posix_fadvise(82,0x108000,0x2000,...)
>  *** executor now has page at 0xb to work on ***
> pread(43,...,8192,0x108000) = 8192 pread(82,...,8192,0x108000) = 8192
> posix_fadvise(43,0x16,0x2000,...)  posix_fadvise(82,0x16,0x2000,...)
> 
> In other words, 'concurrency' doesn't mean 'number of I/Os running
> concurrently with each other', it means 'number of I/Os running
> concurrently with computation', and when you put it that way, 0 and 1
> are different.
> 

Interesting. For some reason I thought with eic=1 we'd issue the fadvise
for page #2 before pread of page #1, so that there'd be 2 IO requests in
flight at the same time for a bit of time ... it'd give the fadvise more
time to actually get the data into page cache.
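
As a side note, for anyone wanting to reproduce traces like the ones above,
a rough recipe (hypothetical table t with an index on a; strace the backend
while the query runs, once per setting) would be:

    SET effective_io_concurrency = 1;   -- rerun with 0 to compare the traces
    SET enable_seqscan = off;           -- nudge the planner toward a bitmap heap scan
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM t WHERE a BETWEEN 1000 AND 2000;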

> Note that the first read is a bit special: by the time the consumer is
> ready to pull a buffer out of the stream when we don't have a buffer
> ready yet, it is too late to issue useful advice, so we don't bother.
> FWIW I think even in the AIO future we would have a synchronous read
> in that specific place, at least when using io_method=worker, because
> it would be stupid to ask another process to read a block for us that
> we want right now and then wait for it wake us up when it's done.
> 
> Note that even when we aren't issuing any advice because eic=0 or
> because we detected sequential access and we believe the kernel can do
> a better job than us, we still 'look ahead' (= call the callback to
> see which block numbers are coming down the pipe), but only as far as
> we need to coalesce neighbouring blocks.  (I deliberately avoid using
> the word "prefetch" except in very general discussions because it
> means different things to different layers of the code, hence talk of
> "look ahead" and "advice".)  That's how we get this change:
> 
> effective_io_concurrency = 0, range size = 4
> unpatched  patched
> ==
> pread(43,...,8192,0x58000) = 8192  pread(82,...,8192,0x58000) = 8192
> pread(43,...,8192,0x5a000) = 8192  preadv(82,...,2,0x5a000) = 16384
> pread(43,...,8192,0x5c000) = 8192  pread(82,...,8192,0x5e000) = 8192
> pread(43,...,8192,0x5e000) = 8192  preadv(82,...,4,0xb) = 32768
> pread(43,...,8192,0xb) = 8192  preadv(82,...,4,0x108000) = 32768
> pread(43,...,8192,0xb2000) = 8192  preadv(82,...,4,0x16) = 32768
> 
> And then once we introduce eic > 0 to the picture with neighbouring
> blocks that can be coalesced, "patched" starts to diverge even more
> from "unpatched" because it tracks the number of wide I/Os in
> progress, not the number of single blocks.
> 

So, IIUC this means (1) the patched code is more aggressive wrt
prefetching (because we prefetch more data overall, because master would

Re: Reports on obsolete Postgres versions

2024-03-14 Thread Robert Haas
On Wed, Mar 13, 2024 at 3:05 PM Laurenz Albe  wrote:
> I think we are pretty conservative with backpatching changes to the
> optimizer that could destabilize existing plans.

We have gotten better about that, which is good.

> I feel quite strongly that we should not use overly conservative language
> there.  If people feel that they have to test their applications for new
> minor releases, the only effect will be that they won't install minor 
> releases.
> Installing a minor release should be as routine as the operating system
> patches that many companies apply automatically during weekend maintenance
> windows.  They can also introduce bugs, and everybody knows and accepts that.

I don't agree with this. If we tell people that they don't need to
test their applications after an upgrade, I do not think that the
result will be that they skip the testing and everything works
perfectly. I think that the result will be that we lose all
credibility. Nobody who has been working on computers for longer than
a week is going to believe that a software upgrade can't break
anything, and someone whose entire business depends on things not
breaking is going to want to validate that they haven't. And they're
not wrong to think that way, either.

I think that whatever we say here should focus on what we try to do or
guarantee, not on what actions we think users ought to take, never
mind must take. We can say that we try to avoid making any changes
upon which an application might be relying -- but there surely is some
weasel-wording there, because we have made such changes before in the
name of security, and sometimes to fix bugs, and we will likely do
so again in the future. But it's not for us to decide how much testing
is warranted. It's the user's system, not ours.

In the end, while I certainly don't mind improving the web page, I
think that a lot of what we're seeing here probably has to do with the
growing popularity and success of PostgreSQL. If you have more people
using your software, you're also going to have more people using
out-of-date versions of your software.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: MERGE ... RETURNING

2024-03-14 Thread Alvaro Herrera
On 2024-Mar-13, Dean Rasheed wrote:

> On Wed, 13 Mar 2024 at 06:44, jian he  wrote:
> >
> > 
> > [ WITH with_query [, ...] ]
> > MERGE INTO [ ONLY ] target_table_name
> > here the "WITH" part should have "[ RECURSIVE ]"
> 
> Actually, no. MERGE doesn't support WITH RECURSIVE.
> 
> It's not entirely clear to me why though. I did a quick test, removing
> that restriction in the parse analysis code, and it seemed to work
> fine. Alvaro, do you remember why that restriction is there?

There's no real reason for it, other than I didn't want to have to think
it through; I did suspect that it might Just Work, but I felt I would
have had to come up with more nontrivial test cases than I wanted to
write at the time.
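
For concreteness, the sort of statement the parse-analysis restriction
currently rejects -- and which such a quick test would exercise -- looks
roughly like this, with hypothetical tables:

    WITH RECURSIVE src (id) AS (
        VALUES (1)
      UNION ALL
        SELECT id + 1 FROM src WHERE id < 5
    )
    MERGE INTO tgt AS t
    USING src AS s ON t.id = s.id
    WHEN MATCHED THEN
        UPDATE SET val = 'seen'
    WHEN NOT MATCHED THEN
        INSERT (id, val) VALUES (s.id, 'new');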

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"People get annoyed when you try to debug them."  (Larry Wall)




Re: Catalog domain not-null constraints

2024-03-14 Thread Alvaro Herrera
On 2024-Mar-14, Peter Eisentraut wrote:

> Perhaps it would make sense if we change the ALTER TABLE command to be like
> 
> ALTER TABLE t1 ADD IF NOT EXISTS NOT NULL c1
> 
> Then the behavior is like one would expect.
> 
> For ALTER TABLE, we would reject this command if IF NOT EXISTS is not
> specified.  (Since this is mainly for pg_dump, it doesn't really matter for
> usability.)  For ALTER DOMAIN, we could accept both variants.

I don't understand why you want to change this behavior, though.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"




Re: REVOKE FROM warning on grantor

2024-03-14 Thread David G. Johnston
On Thursday, March 14, 2024, Étienne BERSAC 
wrote:

>
> However, I'd prefer if Postgres fails properly. Because the GRANT is
> actually not revoked. This prevent ldap2pg to report an issue in
> handling privileges on such roles.
>
> What do you think of make this warning an error ?
>
>
The choice of warning is made because after the command ends the grant in
question does not exist.  The revoke was a no-op and the final state is as
the user intended.  Historically doing this didn’t give any message at all,
which was confusing, so we added a warning: the semantics of not failing
were preserved but there was some indication that something was amiss.  I
don’t have a compelling argument to change the long-standing behavior.
Client code can and probably should look for and show errors reported by the
backend.  It is indeed possible to treat this warning more severely than
the server chooses to.
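
A minimal illustration, with hypothetical role names (the exact warning
wording may differ):

    -- "alice" was never granted membership in "readers" by this grantor,
    -- so there is nothing for the command to remove:
    REVOKE readers FROM alice;
    -- The server reports a WARNING rather than an ERROR: the statement is
    -- a no-op and the final state already matches what was requested.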

David J.


Re: BitmapHeapScan streaming read user and prelim refactoring

2024-03-14 Thread Heikki Linnakangas

On 14/03/2024 12:55, Dilip Kumar wrote:

> On Thu, Mar 14, 2024 at 4:07 PM Heikki Linnakangas  wrote:
>> _SPI_execute_plan() has code to deal with the possibility that the
>> active snapshot is not set. That seems fishy; do we really support SPI
>> without any snapshot? I'm inclined to turn that into an error. I ran the
>> regression tests with an "Assert(ActiveSnapshotSet())" there, and
>> everything worked.
>
> IMHO, we can call SPI_Connect() and SPI_Execute() from any C
> extension, so I don't think there we can guarantee that the snapshot
> must be set, do we?

I suppose, although the things you could do without a snapshot would be
pretty limited. The query couldn't access any tables. Could it even look
up functions in the parser? Not sure.

> Maybe for now we can just handle this specific case to remove the
> snapshot serializing for the BitmapHeapScan as you are doing in the
> patch.  After looking into the code your theory seems correct that we
> are just copying the ActiveSnapshot while building the query
> descriptor and from there we are copying into the Estate so logically
> there should not be any reason for these two to be different.

Ok, committed that for now. Thanks for looking!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: type cache cleanup improvements

2024-03-14 Thread Teodor Sigaev

> One thing that first pops out to me is that we can do the refactor of
> hash_initial_lookup() as an independent piece, without the extra paths
> introduced.  But rather than returning the bucket hash and have the
> bucket number as an in/out argument of hash_initial_lookup(), there is
> an argument for reversing them: hash_search_with_hash_value() does not
> care about the bucket number.

Ok, no problem

>> 02-hash_seq_init_with_hash_value.v5.patch - introduces a
>> hash_seq_init_with_hash_value() method. hash_initial_lookup() is marked as
>> inline, but I suppose, modern compilers are smart enough to inline it
>> automatically.
>
> Likely so, though that does not hurt to show the intention to the
> reader.

Agree

> So I would like to suggest the attached patch for this first piece.
> What do you think?

I have no objections

> It may also be an idea to use `git format-patch` when generating a
> series of patches.  That makes for easier reviews.

Thanks, will try

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: small_cleanups around login event triggers

2024-03-14 Thread Robert Treat
On Thu, Mar 14, 2024 at 8:21 AM Daniel Gustafsson  wrote:
>
> > On 14 Mar 2024, at 02:47, Robert Treat  wrote:
>
> > I was taking a look at the login event triggers work (nice work btw)
>
> Thanks for reviewing committed code, that's something which doesn't happen
> often enough and is much appreciated.
>
> > and saw a couple of minor items that I thought would be worth cleaning
> > up. This is mostly just clarifying the exiting docs and code comments.
>
> + either in a connection string or configuration file. Alternativly, you 
> can
> This should be "Alternatively" I think.
>

Yes.

> - canceling connection in psql wouldn't cancel
> + canceling a connection in psql will not 
> cancel
> Nitpickery (perhaps motivated by english not being my first language), but
> since psql only deals with one connection I would expect this to read "the
> connection".
>

My interpretation of this is that "a connection" is more correct
because it could be your connection or someone else's connection (ie,
you are canceling one of many possible connections). Definitely
nitpickery either way.

> - * Returns true iff the lock was acquired.
> + * Returns true if the lock was acquired.
> Using "iff" here is being consistent with the rest of the file (and 
> technically
> correct):
>
> $ grep -c "if the lock was" src/backend/storage/lmgr/lmgr.c
> 1
> $ grep -c "iff the lock was" src/backend/storage/lmgr/lmgr.c
> 5
>

Ah, yeah, I was pretty focused on the event trigger stuff and didn't
notice it being used elsewhere; thought it was a typo, but I guess
it's meant as shorthand for "if and only if"; I wonder how many people
are familiar with that.


Robert Treat
https://xzilla.net



