date:20250202

On Mon, Feb 3, 2025 at 4:50 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Sawada-san,
>
> > I think it's a good idea to support it at least on HEAD. I've attached
> > a patch for that.
>
> +1. I've confirmed that pg_rewind and -R can't output dbname for now,
> and your patch allows to do it.
>
> Few comments for your patch.
>
> 1.
>
> pg_basebackup.sgml has below description. I feel this can be ported to
> pg_rewind.sgml as well.
>
> ```
> The dbname will be recorded only if the dbname was
> specified explicitly in the connection string or 
> environment variable.
> ```
>
> 2.
> I'm not sure whether recovery_gen.h/c is a correct place for the exported 
> function
> GetDbnameFromConnectionOptions(). The function itself does not handle
> postgresql.cuto.conf file.
> I seeked other header files and felt that connect_utils.h might be.
>
> ```
> /*-
>  *
>  * Facilities for frontend code to connect to and disconnect from databases.
> ```
>
> Another idea is to change the third API to accept the whole connection 
> string, and
> it extracts dbname from it. In this approach we can make 
> GetDbnameFromConnectionOptions()
> to static function - which does not feel strange for me.
>
> Best regards,
> Hayato Kuroda
> FUJITSU LIMITED
>

Hi Sawada-san, h

Here are a few more minor comments in addition to what Kuroda-San
already posted.

==
typo in patch name /primary_conninfo/primary_connifo/

==
Commit message

no details.
bad link.

==
src/fe_utils/recovery_gen.c

1.
static char *
FindDbnameInConnParams(PQconninfoOption *conn_opts)

There is a missing forward declaration of this function. Better to add
it for consistency because the other static function has one.

~~~

2.
+static char *
+FindDbnameInConnParams(PQconninfoOption *conn_opts)
+{
+ PQconninfoOption *conn_opt;
+
+ for (conn_opt = conn_opts; conn_opt->keyword != NULL; conn_opt++)
+ {
+ if (strcmp(conn_opt->keyword, "dbname") == 0 &&
+ conn_opt->val != NULL && conn_opt->val[0] != '\0')
+ return pg_strdup(conn_opt->val);
+ }
+ return NULL;
+}

2a.
I know you copied this, but it seems a bit strange that this function
is named "Params" when everything about it including its parameter and
comments and the caller talks about "_opts" or "Options"

~

2b.
I know you copied this, but normally 'conn_opt' might have been
written as a for-loop variable.

~~~

3.
+/*
+ * GetDbnameFromConnectionOptions
+ *
+ * This is a special purpose function to retrieve the dbname from either the
+ * 'connstr' specified by the caller or from the environment variables.
+ *
+ * Returns NULL, if dbname is not specified by the user in the above
+ * mentioned connection options.
+ */

What does "in the above mentioned connection options" mean? In the
original function comment where this was copied from there was an
extra sentence ("We follow ... from various connection options.") so
this had more context, but now that the other sentence is removed
maybe "above mentioned connection options" looks like it also needs
rewording.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: POC, WIP: OR-clause support for indexes

2025-02-02 Thread Andrei Lepikhov


On 2/3/25 00:57, Alexander Korotkov wrote:

On Thu, Jan 30, 2025 at 3:23 PM Pavel Borisov  wrote:

On Tue, 28 Jan 2025 at 12:42, Andrei Lepikhov  wrote:


On 1/28/25 11:36, Andrei Lepikhov wrote:

On 1/27/25 16:50, Alexander Korotkov wrote:
qsort(matches, n, sizeof(OrArgIndexMatch), or_arg_index_match_cmp);

To fit an index, the order of elements in the target array of the
`ScalarArrayOpExpr` may change compared to the initial list of OR
expressions. If there are indexes that cover the same set of columns but
in reverse order, this could potentially alter the position of a
Subplan. However, I believe this is a rare case; it is supported by the
initial OR path and should be acceptable.

I beg your pardon - I forgot that we've restricted the feature's scope
and can't combine OR clauses into ScalarArrayOpExpr if the args list
contains references to different columns.
So, my note can't be applied here.

--
regards, Andrei Lepikhov


I've looked at the patch v46-0001
Looks good to me.

There is a test that demonstrates the behavior change. Maybe some more
cases like are also worth adding to a test.

+SELECT * FROM bitmap_split_or t1, bitmap_split_or t2 WHERE t1.a=t2.c
OR (t1.a=t2.b OR t1.a=1);
+   QUERY PLAN
+
+ Nested Loop
+   ->  Seq Scan on bitmap_split_or t2
+   ->  Index Scan using t_a_b_idx on bitmap_split_or t1
+ Index Cond: (a = ANY (ARRAY[t2.c, t2.b, 1]))
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM bitmap_split_or t1, bitmap_split_or t2 WHERE t1.c=t2.b OR t1.a=1;
+  QUERY PLAN
+--
+ Nested Loop
+   Join Filter: ((t1.c = t2.b) OR (t1.a = 1))
+   ->  Seq Scan on bitmap_split_or t1
+   ->  Materialize
+ ->  Seq Scan on bitmap_split_or t2
+(5 rows)
+
+EXPLAIN (COSTS OFF)


Added more tests to join.sql

I have made final pass through the changes. All looks good.
Only one thing looks strange for me - multiple '42's in the output of 
the test. May be reduce output by an aggregate in the target list of the 
query?


--
regards, Andrei Lepikhov

Re: pgbench with partitioned tables

2025-02-02 Thread Álvaro Herrera

On 2025-Feb-03, Sergey Tatarintsev wrote:

> Thanks for the note. I changed the query in the patch (v2 patch attached)
> 
> Btw, an additional benefit from the patch is that we can use foreign tables
> (for example, to test postgres_fdw optimizations)

Good thought, and maybe it would be better if the new function was
"get_table_relkind" which just returns the char, and this check

> + /* Use COPY with FREEZE on v14 and later for all regular tables */
> + if ((PQserverVersion(con) >= 14) && is_regular_table(con, table))
>   copy_statement_fmt = "copy %s from stdin with (freeze 
> on)";

can be "&& get_table_relkind(con, table) == RELKIND_RELATION"

which I think is more natural.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Re: why there is not VACUUM FULL CONCURRENTLY?

2025-02-02 Thread Alvaro Herrera



> From bf2ec8c5d753de340140839f1b061044ec4c1149 Mon Sep 17 00:00:00 2001
> From: Antonin Houska 
> Date: Mon, 13 Jan 2025 14:29:54 +0100
> Subject: [PATCH 4/8] Add CONCURRENTLY option to both VACUUM FULL and CLUSTER
>  commands.

> @@ -950,8 +1412,46 @@ copy_table_data(Relation NewHeap, Relation OldHeap, 
> Relation OldIndex, bool verb

> + if (concurrent)
> + {
> + PgBackendProgress   progress;
> +
> + /*
> +  * Command progress reporting gets terminated at 
> subtransaction
> +  * end. Save the status so it can be eventually 
> restored.
> +  */
> + memcpy(&progress, &MyBEEntry->st_progress,
> +sizeof(PgBackendProgress));
> +
> + /* Release the locks by aborting the subtransaction. */
> + RollbackAndReleaseCurrentSubTransaction();
> +
> + /* Restore the progress reporting status. */
> + pgstat_progress_restore_state(&progress);
> +
> + CurrentResourceOwner = oldowner;
> + }

I was looking at 0002 to see if it'd make sense to commit it ahead of a
fuller review of the rest, and I find that the reason for that patch is
this hunk you have here in copy_table_data -- you want to avoid a
subtransaction abort (which you use to release planner lock) clobbering
the status.  I think this a bad idea.  It might be better to handle this
in a different way, for instance

1) maybe have a flag that says "do not reset progress status during
subtransaction abort"; REPACK would set that flag, so it'd be able to
continue its business without having to memcpy the current status (which
seems like quite a hack) or restoring it afterwards.

2) maybe subtransaction abort is not the best way to release the
planning locks anyway.  I think it might be better to have a
ResourceOwner that owns those locks, and we do ResourceOwnerRelease()
which would release them.  I think this would be a novel usage of
ResourceOwner so it needs more research.  But if this works, then we
don't need the subtransaction at all, and therefore we don't need
backend progress restore at all either.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Re: Proposal to CREATE FOREIGN TABLE LIKE

2025-02-02 Thread Álvaro Herrera

On 2025-Feb-01, Zhang Mingli wrote:

> For example, we use kafka_fdw to produce and consume data from a Kafka
> server.  In our scenario, we sometimes need to write records from a
> local table into Kafka. Here’s a brief outline of our process:
> 
> 1. We already have a wide table, local_wide_table in Postgres.
> 2. We need to create a foreign table, foreign_table, with the same
> definition as local_wide_table.
> 3. Insert records into foreign_table by selecting
> from local_wide_table with the some quals.
> 
> In step 2, we currently have to manually create the foreign table
> using CREATE FOREIGN TABLE and copy the column definitions one by one.

Eh yeah, I guess for this use case it makes sense to allow a LIKE clause
on CREATE FOREIGN TABLE.  Were you going to submit a patch?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)

Re: pgbench with partitioned tables

2025-02-02 Thread Álvaro Herrera

On 2025-Jan-31, Melanie Plageman wrote:

> Maybe instead of just not using COPY FREEZE on a table if it is
> partitioned, we could add new data generation init_steps. Perhaps one
> that is client-side data generation (g) but with no freezing? I'm not
> really sure what the letter would be (f? making it f, g, and G?).

I think it makes sense to do what you suggest, but on the other hand,
the original code that Sergey is patching looks like a hack in the sense
that it hardcodes which tables to use FREEZE with.  There's no point to
doing things that way actually, so accepting Sergey's patch to replace
that with a relkind check feels sensible to me.

I think the query should be
SELECT relkind FROM pg_catalog.pg_class WHERE oid='%s'::pg_catalog.regclass
if only for consistency with pgbench's other query on catalogs.

Your proposal to add different init_steps might be reasonable, at least
if we allowed partitionedness of tables to vary in other ways (eg. if we
made pgbench_history partitioned), but I don't think it conflicts with
Sergey's patch in spirit.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Here's a general engineering tip: if the non-fun part is too complex for you
to figure out, that might indicate the fun part is too ambitious." (John Naylor)
https://postgr.es/m/CAFBsxsG4OWHBbSDM%3DsSeXrQGOtkPiOEOuME4yD7Ce41NtaAD9g%40mail.gmail.com

Re: Getting rid of CaseTestExpr (well, partially)

I wrote:
> Therefore, my proposal here is to leave the parser's usage of
> CaseTestExpr alone, and replace CaseTestExprs with Params during
> eval_const_expressions, just before any relevant function inlining
> could happen.

I thought of a nasty defect in this idea: CASE expressions that would
have been seen as equal() may not be equal after the transformation.
(The upper nodes are fine because we can make their *param fields be
equal_ignore, but the lower PARAM_EXPRs won't match.)  There's at
least one important behavior this probably breaks, which is matching
of index expressions containing a CASE or ArrayCoerce to query quals.
That might be a rare use-case, but it'd annoy anyone doing it, and
it'd be pretty hard to explain why it's not a bug.

I spent some time wondering if we could somehow number the Params
"bottom up" to fix this, so that (for example) a CASE would always
use paramid 1 unless it contains one other CASE, which would cause it
to use paramid 2, etc.  I think eval_const_expressions could be made
to do that, but it's not clear how to preserve the property during
function inlining.  But the main thing I don't like is that this would
make it much less obvious that Params with overlapping lifespans have
distinct IDs, which takes away a large part of the attraction of the
whole design.

Pending some better idea, I'm withdrawing this patch.

regards, tom lane

Re: POC, WIP: OR-clause support for indexes

On Tue, Jan 28, 2025 at 10:42 AM Andrei Lepikhov  wrote:
> On 1/28/25 11:36, Andrei Lepikhov wrote:
> > On 1/27/25 16:50, Alexander Korotkov wrote:
> > qsort(matches, n, sizeof(OrArgIndexMatch), or_arg_index_match_cmp);
> >
> > To fit an index, the order of elements in the target array of the
> > `ScalarArrayOpExpr` may change compared to the initial list of OR
> > expressions. If there are indexes that cover the same set of columns but
> > in reverse order, this could potentially alter the position of a
> > Subplan. However, I believe this is a rare case; it is supported by the
> > initial OR path and should be acceptable.
> I beg your pardon - I forgot that we've restricted the feature's scope
> and can't combine OR clauses into ScalarArrayOpExpr if the args list
> contains references to different columns.
> So, my note can't be applied here.

OK, thank you!

--
Regards,
Alexander Korotkov
Supabase

Re: POC, WIP: OR-clause support for indexes

On Thu, Jan 30, 2025 at 3:23 PM Pavel Borisov  wrote:
> On Tue, 28 Jan 2025 at 12:42, Andrei Lepikhov  wrote:
> >
> > On 1/28/25 11:36, Andrei Lepikhov wrote:
> > > On 1/27/25 16:50, Alexander Korotkov wrote:
> > > qsort(matches, n, sizeof(OrArgIndexMatch), or_arg_index_match_cmp);
> > >
> > > To fit an index, the order of elements in the target array of the
> > > `ScalarArrayOpExpr` may change compared to the initial list of OR
> > > expressions. If there are indexes that cover the same set of columns but
> > > in reverse order, this could potentially alter the position of a
> > > Subplan. However, I believe this is a rare case; it is supported by the
> > > initial OR path and should be acceptable.
> > I beg your pardon - I forgot that we've restricted the feature's scope
> > and can't combine OR clauses into ScalarArrayOpExpr if the args list
> > contains references to different columns.
> > So, my note can't be applied here.
> >
> > --
> > regards, Andrei Lepikhov
>
> I've looked at the patch v46-0001
> Looks good to me.
>
> There is a test that demonstrates the behavior change. Maybe some more
> cases like are also worth adding to a test.
>
> +SELECT * FROM bitmap_split_or t1, bitmap_split_or t2 WHERE t1.a=t2.c
> OR (t1.a=t2.b OR t1.a=1);
> +   QUERY PLAN
> +
> + Nested Loop
> +   ->  Seq Scan on bitmap_split_or t2
> +   ->  Index Scan using t_a_b_idx on bitmap_split_or t1
> + Index Cond: (a = ANY (ARRAY[t2.c, t2.b, 1]))
> +(4 rows)
> +
> +EXPLAIN (COSTS OFF)
> +SELECT * FROM bitmap_split_or t1, bitmap_split_or t2 WHERE t1.c=t2.b OR 
> t1.a=1;
> +  QUERY PLAN
> +--
> + Nested Loop
> +   Join Filter: ((t1.c = t2.b) OR (t1.a = 1))
> +   ->  Seq Scan on bitmap_split_or t1
> +   ->  Materialize
> + ->  Seq Scan on bitmap_split_or t2
> +(5 rows)
> +
> +EXPLAIN (COSTS OFF)

Added more tests to join.sql

> Comment
> > *Also, add any potentially usable join OR clauses to *joinorclauses
> may reflect the change in v46-0001 lappend -> list_append_unique_ptr
> that differs in the processing of equal clauses in the list.

Comments in this function are revised.  I also added detailed
explanation of this change to the commit message.

> Semantics mentioned in the commit message:
> > 2. Make match_join_clauses_to_index() pass OR-clauses to
> >match_clause_to_index().
> could also be added as comments in the section just before
> match_join_clauses_to_index()

Right, this is addressed too.

> Since d4378c0005e6 comment for match_clause_to_indexcol() I think
> needs change. This could be as a separate commit, not regarding
> current patch v46-0001.
> > * NOTE:  returns NULL if clause is an OR or AND clause; it is the
> > * responsibility of higher-level routines to co

Good catch.  This is added as a separate patch.

> I think the patch can be pushed with possible additions to regression
> test and comments.

OK, thank you!

--
Regards,
Alexander Korotkov
Supabase


v47-0001-Revise-the-header-comment-for-match_clause_to_in.patch
Description: Binary data


v47-0002-Allow-usage-of-match_orclause_to_indexcol-for-jo.patch
Description: Binary data

Re: POC, WIP: OR-clause support for indexes

On Fri, Jan 31, 2025 at 4:31 PM Alena Rybakina
 wrote:
> I started reviewing at the patch and saw some output "ERROR" in the output of 
> the test and is it okay here?
>
> SELECT * FROM tenk1 t1
> WHERE t1.thousand = 42 OR t1.thousand = (SELECT t2.tenthous FROM tenk1 t2 
> WHERE t2.thousand = t1.tenthous);
> ERROR: more than one row returned by a subquery used as an expression

The output is correct for this query.  But the query is very
unfortunate for the regression test.  I've revised query in the v47
revision [1].

Links.
1. 
https://www.postgresql.org/message-id/CAPpHfdsBZmNt9qUoJBqsQFiVDX1%3DyCKpuVAt1YnR7JCpP%3Dk8%2BA%40mail.gmail.com

--
Regards,
Alexander Korotkov
Supabase

Re: Allow default \watch interval in psql to be configured

2025-02-02 Thread Daniel Gustafsson

> On 19 Nov 2024, at 11:20, Masahiro Ikeda  wrote:

> I've tested it and reviewed the patch. I'd like to provide some feedback.

Thank you very much for your review and I do apologize for the late response.

> I tested with v3 patch and found the following compile error.
> It seems that math.h needs to be included in variables.c.
> 
> variables.c: In function 'ParseVariableDouble':
> variables.c:220:54: error: 'HUGE_VAL' undeclared (first use in this function)
>  220 |  (dblval == 0.0 || dblval >= HUGE_VAL || 
> dblval <= -HUGE_VAL))
>  |  ^~~~

Odd, I don't see that, but I've nonetheless fixed it by including math.h

> Although the error handling logic is being discussed now, I think it would be 
> better,
> at least, to align the logic and messages of exec_command_watch() and 
> ParseVariableDouble().

I see your point, especially since ParseVariableBuffer is only used for this at
the moment.  It is however intended as generic functionality and would require
to diverge from the other ParseVariableXXX to support custom error messages.
I'm not sure it's worth the added complexity for something which is likely to
be a quite niche feature.

> ParseVariableDouble() doesn't have a comment yet, though you may be planning 
> to add one later.

Fixed.

> I believe the default value is 2 after the WATCH_INTERVAL is specified 
> because \unset
> WATCH_INTERVAL sets it to '2'. So, why not update the following sentence 
> accordingly?
> 
> -of rows. Wait the specified number of seconds (default 2) between 
> executions.
> -For backwards compatibility,
> +of rows. Wait the specified number of seconds (default 2, which can 
> be
> +changed with the variable
> +between executions.  For backwards compatibility,
> 
> For example,
> 
>> Wait WATCH_INTERVAL seconds unless the interval option is 
>> specified.
>> If the interval option is specified, wait the specified number of seconds 
>> instead.
> 
> +This variable sets the default interval which 
> \watch
> +waits between executing the query.  Specifying an interval in the
> +command overrides this variable.
> 
>> This variable sets the interval in seconds that \watch 
>> waits
>> between executions. The default value is 2.0.

I don't think this improves the documentation.  The patch only changes the
defult interval which is used if the interval is omitted from the command, the
above sections discuss how interval is handled when specified.  I did some
minor tweaks to the docs to try and clarify things.

> I think it's better to replace queries with executions because the \watch
> documentation says so.
> 
> + HELP0("  WATCH_INTERVAL\n"
> +  "number of seconds \\watch waits between queries\n");

Fair point, fixed.

The attached v4 fixes the above and also adds tests.

--
Daniel Gustafsson



v4-0001-Make-default-watch-interval-configurable.patch
Description: Binary data

Increased work_mem for "logical replication tablesync worker" only?

2025-02-02 Thread Dmitry Koterov

Hi.

Trying to monitor perf during the initial tablesync phase (COPY) right
after CREATE SUBSCRIPTION. I noticed that the size
of 17/main/base/pgsql_tmp on the destination node grows (tens of gigabytes)
as the COPY command (running internally on the publisher) progresses. Then
in the end (when its "EXPLAIN SELECT 1 FROM tbl" on the destination shows
the approximate number of rows equals to the number of rows on the source
node) it hangs for several minutes, and then 17/main/base/pgsql_tmp
empties, and the subscription progresses.

It seems like if I increase work_mem to several GB, then the growth
of 17/main/base/pgsql_tmp becomes less significant.

Questions:

1. Are there some diagnostics commands that would allow me to figure out
what is in those tmp files? Why does the subscriber create those tmp files
and not just write directly to the data files and WAL? (The table has 2
bytea columns, i.e. it's TOASTed for sure.)
2. Is there a way to set work_mem only for "logical replication tablesync
worker"? I don't want to have it that high for all connections, but for
logical replication tablesync worker - it's fine to have it set to a huge
value (I have max_sync_workers_per_subscription=1, so there is not more
than 1 of such processes in the system).
3. Is this work_mem consideration relevant at all? Maybe it's a red herring?

Thanks!

[PATCH] Fix incorrect range in pg_regress comment

2025-02-02 Thread Ilia Evdokimov


Hi hackers,

I noticed that a comment in pg_regress incorrectly states that 
alternative output files can be named filename{_i}.out with 0 < i <= 9. 
However, the actual valid range is 0 <= i <= 9. This patch corrects the 
comment.


The fix is trivial but ensures that the documentation in the code 
accurately reflects the behavior of pg_regress.


Attached is a small patch correcting this.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From d2ef1b2ff2f42bc38eb2d2d87201a2822d53d8b5 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Sun, 2 Feb 2025 21:35:10 +0300
Subject: [PATCH v1] Fix incorrect range in pg_regress comment

The comment in pg_regress incorrectly stated that alternative
output files could be named test{_i}.out with 0 < i <= 9.
However, the valid range is actually 0 <= i <= 9.
---
 src/test/regress/pg_regress.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 61a234ae21..5d85dcc62f 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1335,7 +1335,7 @@ make_directory(const char *dir)
 }
 
 /*
- * In: filename.ext, Return: filename_i.ext, where 0 < i <= 9
+ * In: filename.ext, Return: filename_i.ext, where 0 <= i <= 9
  */
 static char *
 get_alternative_expectfile(const char *expectfile, int i)
-- 
2.34.1

Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)

2025-02-02 Thread jian he

hi.
the following reviews based on
v10-0001-Introduce-the-ability-to-set-index-visibility-us.patch.

in src/test/regress/sql/create_index.sql
seems there are no sql tests for "create index ... invisible"?


   
VISIBLE

 
  Make the specified index visible. The index will be used for queries.
 

   
here it should be
"Make the specified index visible. The index can be used for query planning"
?


Do we need to add GUC use_invisible_index to postgresql.conf.sample?


CREATE TABLE t(id INT PRIMARY KEY, data TEXT,num INT, vector INT[],
range INT4RANGE);
ALTER INDEX t_pkey INVISIBLE;
alter table t alter column id set data type bigint;
\d t

after ALTER TABLE SET DATA TYPE, the "visible" status should not change?
but here it changed.
you may check ATPostAlterTypeParse to make the "visible" status not change.



@@ -3449,6 +3451,7 @@ typedef struct IndexStmt
 boolif_not_exists;/* just do nothing if index already
exists? */
 boolreset_default_tblspc;/* reset default_tablespace prior to
  * executing */
+  boolisvisible;/* true if VISIBLE (default), false
if INVISIBLE */
 } IndexStmt;
the indentation level is not right?

+opt_index_visibility:
+VISIBLE_P  { $$ = true; }
+| INVISIBLE_P   { $$ = false; }
+| /*EMPTY*/ { $$ = true; }
+;
+
the indentation level seems also not right?

+createFlags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+if (indexForm->indisvisible)
+createFlags |= INDEX_CREATE_VISIBLE;
the indentation level seems also not right?


INVISIBLE, VISIBLE is not special words, in gram.y, you don't need
"VISIBLE_P", "INVISIBLE_P", you can just use "INVISIBLE", "VISIBLE"
?


\d t3
  Table "public.t3"
 Column |   Type| Collation | Nullable | Default
+---+---+--+-
 id | integer   |   | not null |
 data   | text  |   |  |
 num| integer   |   |  |
 vector | integer[] |   |  |
 range  | int4range |   |  |
 a  | box   |   |  |
Indexes:
"t3_pkey" PRIMARY KEY, btree (id) INVISIBLE
"grect2ind" gist (a) INVISIBLE
"t3_1" gist (a) INVISIBLE
"t3_2" gin (vector) WITH (fastupdate='on',
gin_pending_list_limit='128') INVISIBLE
"t3_4" spgist (data) INVISIBLE
"t3_6" hash (id) INVISIBLE


pg_dump will dump as
--
-- Name: t3 t3_pkey; Type: CONSTRAINT; Schema: public; Owner: jian
--
ALTER TABLE ONLY public.t3
ADD CONSTRAINT t3_pkey PRIMARY KEY (id);

after dump, restore index (primary key: t3_pkey) INVISIBLE will not be restored.
We need extra work for restoring the INVISIBLE flag for the primary key index.


I am not sure if we need to change index_concurrently_swap or not.
but many other pg_index columns changed.

Re: [PATCH] Fix incorrect range in pg_regress comment

Ilia Evdokimov  writes:
> I noticed that a comment in pg_regress incorrectly states that 
> alternative output files can be named filename{_i}.out with 0 < i <= 9. 
> However, the actual valid range is 0 <= i <= 9. This patch corrects the 
> comment.

Hmm, our convention is definitely that the numbers start with 1,
so I do not want to make this change.  Maybe we should change
the code instead.

regards, tom lane

Re: Pgoutput not capturing the generated columns

On Wed, Jan 29, 2025 at 2:48 PM Amit Kapila  wrote:
>
> On Wed, Jan 29, 2025 at 6:03 AM Peter Smith  wrote:
> >
> > On Tue, Jan 28, 2025 at 7:59 PM Amit Kapila  wrote:
> > >
> > > On Thu, Jan 23, 2025 at 9:58 AM vignesh C  wrote:
> > > >
> > >
> > > I have pushed the remaining part of this patch. Now, we can review the
> > > proposed documentation part.
> > >
> > > I feel we don't need the Examples sub-section for this part of the
> > > docs. The behavior is quite clear from the "Behavior Summary"
> > > sub-section table.
> >
> > It is good to hear that the "Behavior Summary" matrix is clear, but it
> > is never the purpose of examples to show behaviour that is not already
> > clearly documented. The examples are simply to demonstrate some real
> > usage. Personally, I find it far easier to understand this (or any
> > other) feature by working through a few examples in conjunction with
> > the behaviour matrix, instead of just having the matrix and nothing
> > else.
> >
>
> I am not against giving examples in the docs to make the topic easy to
> understand but in this particular case, I am not sure if additional
> examples are useful. You already gave one example in the beginning:
> "For example, note below that subscriber table generated column value
> comes from the subscriber column's calculation." the remaining text is
> clear enough to understand the feature.
>
> If you still want to make a case for additional examples, divide this
> patch into two parts. The first part without examples could be part of
> this thread and I can commit that. Then you can start a separate
> thread just for the examples and then we can see what others think and
> make a decision based on that.
>

I have created a new thread [1] to propose adding the "Examples" subsection.

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPt_7GV8eHSW4XQsC6rF13TWrz-SrGeeiV71%3DSE14DC4Jg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

DOCS - Generated Column Replication Examples

Hi,

A recent commit [1] added a new section "29.6. Generated Column
Replication" to the documentation [2]. But, no "Examples" were
included.

I've created this new thread to make a case for adding an "Examples"
subsection to that page.

(the proposed examples patch is attached)

1. Reasons AGAINST Adding Examples to this section (and why I disagree):

1a. The necessary information to describe the feature can already be
found in the text.

Yes, the documentation describes the feature, as it should. However,
examples serve a different purpose: they reinforce understanding by
demonstrating how the feature works in practice. They don’t introduce
new details; they clarify and illustrate existing ones.

1b. I can understand the feature without needing examples.

PostgreSQL serves a diverse user base with different levels of
expertise and learning styles. While experienced users might grasp the
feature quickly, others may struggle with dense explanatory text and
behaviour matrices. Why knowingly make it harder for them? If examples
can help some users, and others can simply skip them, there's no
downside to including them.

1c. Too many examples will swamp the page.

Agreed—too many examples would be excessive. That’s why the proposed
patch includes only a couple of well-chosen cases, rather than
covering every possible combination. This keeps the documentation
focused and useful without overwhelming the reader.

~~~

2. Reasons FOR Adding Examples to this section:

2a. To emphasise some special and/or subtle rules.

Some critical details—like how publication column lists override the
publish_generated_columns setting—are easy to overlook in a dense set
of behaviour rules. An example makes this clear by demonstrating the
effect directly, ensuring users don't miss it.

2b. To make it easy for users to try it themselves.

These examples are self-contained including the necessary
publication/subscription and tables. A user can simply cut/paste these
to PSQL and begin experimenting with the feature immediately, instead
of piecing together a working setup from scattered explanations. This
reduces friction and encourages hands-on learning.

2c. It may become inevitable later anyway.

With the eventual introduction of VIRTUAL generated columns,
replication behaviour will become even more complex, requiring more
rules and expanded behaviour matrices. Including examples now lays the
groundwork for future enhancements

~~~

Thoughts?

==
[1]
https://github.com/postgres/postgres/commit/6252b1eaf82bb8db361341d1c8651e43b29be816#diff-e593b875ae1b49848e9954e7a3ea26fb34de7454b338a0029aec72b0fb25bb0a
[2] https://www.postgresql.org/docs/devel/logical-replication-gencols.html

Kind Regards,
Peter Smith.
Fujitsu Australia.

v1-0001-DOCS-Generated-Column-Replication-Examples.patch
Description: Binary data

Re: pgbench with partitioned tables

2025-02-02 Thread Sami Imseih

I was looking at the comments [1] for why COPY FREEZE is
not allowed on a parent table, and it was mainly due
to having to open up partitions to check if they are able
to take the optimization (table created or truncated in the
current transaction ). Obviously as the number of
partitions grow, it will take more to perform these
checks. My suspicious is that in the cases in which partitions
are used, the benefits of COPY FREEZE could outweigh
the overhead of these additional checks.

So while we could try to solve the COPY FREEZE issue
specifically for pgbench, I wonder if we should try to do better
and see if the limitation on a parent partition can be removed.

I can provide a patch and some benchmark numbers unless
there is something bigger I am missing about the reason this
limitation exists?


[1] 
https://github.com/postgres/postgres/blob/master/src/backend/commands/copyfrom.c#L727-L735

Regards,

Sami

Re: Using Expanded Objects other than Arrays from plpgsql

2025-02-02 Thread Michel Pelletier

On Sun, Feb 2, 2025 at 1:57 PM Tom Lane  wrote:

> I wrote:
> > Hmm, it seemed to still apply for me.  But anyway, I needed to make
> > the other changes, so here's v4.
>
> PFA v5.  The new 0001 patch refactors the free_xxx infrastructure
> to create plpgsql_statement_tree_walker(), and then in what's now
> 0003 we can use that instead of writing a lot of duplicate code.
>

Thanks Tom!  These patches apply for me and all my tests and benchmarks are
still good.

-Michel

Re: Increased work_mem for "logical replication tablesync worker" only?

On Sun, Feb 2, 2025 at 5:13 PM Dmitry Koterov  wrote:
>
> Trying to monitor perf during the initial tablesync phase (COPY) right after 
> CREATE SUBSCRIPTION. I noticed that the size of 17/main/base/pgsql_tmp on the 
> destination node grows (tens of gigabytes) as the COPY command (running 
> internally on the publisher) progresses. Then in the end (when its "EXPLAIN 
> SELECT 1 FROM tbl" on the destination shows the approximate number of rows 
> equals to the number of rows on the source node) it hangs for several 
> minutes, and then 17/main/base/pgsql_tmp empties, and the subscription 
> progresses.
>
> It seems like if I increase work_mem to several GB, then the growth of 
> 17/main/base/pgsql_tmp becomes less significant.
>
> Questions:
>
> 1. Are there some diagnostics commands that would allow me to figure out what 
> is in those tmp files? Why does the subscriber create those tmp files and not 
> just write directly to the data files and WAL? (The table has 2 bytea 
> columns, i.e. it's TOASTed for sure.)
>

We do write spill files (ending with '.spill') if the changes are
large. Can you please share the name of tmp files to avoid any
assumptions?

-- 
With Regards,
Amit Kapila.

Re: jsonlog missing from logging_collector description

On Fri, Jan 31, 2025 at 09:32:42PM -0500, Tom Lane wrote:
> Sure, fine by me.

This one has been done as d61b9662b09e.
--
Michael


signature.asc
Description: PGP signature

Re: pg_rewind with --write-recovery-conf option doesn't write dbname to primary_conninfo value.

On Thu, Jan 30, 2025 at 3:17 AM Masahiko Sawada  wrote:
>
> I found that pg_rewind with --write-recovery-conf option doesn't write
> the dbname to an auto-generated primary_conninfo value. Therefore,
> after a failover, the old primary cannot start if it's rewound by
> pg_rewind with --write-recovery-conf option if sync_replication_slots
> is on. Is it an oversight in commit a145f424d5?
>

IIRC, we tried to do minimal in the first version as we didn't have
all the use cases. However, it makes sense to support this in HEAD as
proposed by you.

-- 
With Regards,
Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

On Fri, Jan 31, 2025 at 8:02 PM Amit Kapila  wrote:
>
> On Fri, Jan 31, 2025 at 10:40 AM Peter Smith  wrote:
> >
> > ==
> > src/backend/replication/slot.c
> >
> > ReportSlotInvalidation:
> >
> > 1.
> > +
> > + case RS_INVAL_IDLE_TIMEOUT:
> > + Assert(inactive_since > 0);
> > + /* translator: second %s is a GUC variable name */
> > + appendStringInfo(&err_detail,
> > + _("The slot has remained idle since %s, which is longer than the
> > configured \"%s\" duration."),
> > + timestamptz_to_str(inactive_since),
> > + "idle_replication_slot_timeout");
> > + break;
> > +
> >
> > errdetail:
> >
> > I guess it is no fault of this patch because I see you've only copied
> > nearby code, but AFAICT this function is still having an each-way bet
> > by using a mixture of _() macro which is for strings intended be
> > translated, but then only using them in errdetail_internal() which is
> > for strings that are NOT intended to be translated. Isn't it
> > contradictory? Why don't we use errdetail() here?
> >
>
> Your question is valid and I don't have an answer. I encourage you to
> start a new thread to clarify this.
>

I think this was a false alarm.

After studying this more deeply, I've changed my mind and now think
the code is OK as-is.

AFAICT errdetail_internal is used when not wanting to translate the
*fmt* string passed to it (see EVALUATE_MESSAGE in elog.c). Now, here
the format string is just "%s" so it's fine to not translate that.
Meanwhile, the string value being substituted to the "%s" was already
translated because of the _(x) macro aka gettext(x).

I found other examples similar to this -- see the
error_view_not_updatable() function in rewriteHandler.c which does:
ereport(ERROR,
...
 detail ? errdetail_internal("%s", _(detail)) : 0,
...

==
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PATCH] Fix incorrect range in pg_regress comment

Michael Paquier  writes:
> Right, let's adjust the comment to reflect what the docs say, as your
> patch does.  I presume that Tom will do that..

If I complain about it I gotta fix it, huh?  Okay, done.

regards, tom lane

Re: Proposal to CREATE FOREIGN TABLE LIKE

2025-02-02 Thread Mingli Zhang

Zhang Mingli
www.hashdata.xyz
On Feb 2, 2025 at 21:24 +0800, Álvaro Herrera ,
wrote:


Eh yeah, I guess for this use case it makes sense to allow a LIKE clause
on CREATE FOREIGN TABLE. Were you going to submit a patch?

Hi,

Yes, I would like to provide a patch.

Glad to see we have come to an agreement on this.

Re: Using Expanded Objects other than Arrays from plpgsql

I wrote:
> Hmm, it seemed to still apply for me.  But anyway, I needed to make
> the other changes, so here's v4.

I decided to see what would happen if we tried to avoid the code
duplication in pl_funcs.c by making some "walker" infrastructure
akin to expression_tree_walker.  While that doesn't seem useful
for the dump_xxx functions, it works very nicely for the free_xxx
functions and now for the mark_xxx ones as well.  pl_funcs.c
nets out about 400 lines shorter than in the v4 patch.  The
code coverage score for the file is still awful :-(, but that's
because we're not testing the dump_xxx functions at all.

PFA v5.  The new 0001 patch refactors the free_xxx infrastructure
to create plpgsql_statement_tree_walker(), and then in what's now
0003 we can use that instead of writing a lot of duplicate code.

regards, tom lane

From bec67b472a0f9b237c5ed1feffd01ee4428b0688 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Sun, 2 Feb 2025 16:05:01 -0500
Subject: [PATCH v5 1/5] Refactor pl_funcs.c to provide a usage-independent
 tree walker.

We haven't done this up to now because there was only one use-case,
namely plpgsql_free_function_memory's search for expressions to clean
up.  However an upcoming patch has another need for walking plpgsql
functions' statement trees, so let's create sharable tree-walker
infrastructure in the same style as expression_tree_walker().

This patch actually makes the code shorter, although that's
mainly down to having used a more compact coding style.  (I didn't
write a separate subroutine for each statement type, and I made
use of some newer notations like foreach_ptr.)

Discussion: https://postgr.es/m/CACxu=vjakfnsyxoosnw1wegsao5u_v1xybacfvj14wgjv_p...@mail.gmail.com
---
 src/pl/plpgsql/src/pl_funcs.c | 582 ++
 1 file changed, 244 insertions(+), 338 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c
index 8c827fe5cc..88e25b54bc 100644
--- a/src/pl/plpgsql/src/pl_funcs.c
+++ b/src/pl/plpgsql/src/pl_funcs.c
@@ -334,387 +334,291 @@ plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind)
 
 
 /**
- * Release memory when a PL/pgSQL function is no longer needed
+ * Support for recursing through a PL/pgSQL statement tree
  *
- * The code for recursing through the function tree is really only
- * needed to locate PLpgSQL_expr nodes, which may contain references
- * to saved SPI Plans that must be freed.  The function tree itself,
- * along with subsidiary data, is freed in one swoop by freeing the
- * function's permanent memory context.
+ * The point of this code is to encapsulate knowledge of where the
+ * sub-statements and expressions are in a statement tree, avoiding
+ * duplication of code.  The caller supplies two callbacks, one to
+ * be invoked on statements and one to be invoked on expressions.
+ * (The recursion should be started by invoking the statement callback
+ * on function->action.)  The statement callback should do any
+ * statement-type-specific action it needs, then recurse by calling
+ * plpgsql_statement_tree_walker().  The expression callback can be a
+ * no-op if no per-expression behavior is needed.
  **/
-static void free_stmt(PLpgSQL_stmt *stmt);
-static void free_block(PLpgSQL_stmt_block *block);
-static void free_assign(PLpgSQL_stmt_assign *stmt);
-static void free_if(PLpgSQL_stmt_if *stmt);
-static void free_case(PLpgSQL_stmt_case *stmt);
-static void free_loop(PLpgSQL_stmt_loop *stmt);
-static void free_while(PLpgSQL_stmt_while *stmt);
-static void free_fori(PLpgSQL_stmt_fori *stmt);
-static void free_fors(PLpgSQL_stmt_fors *stmt);
-static void free_forc(PLpgSQL_stmt_forc *stmt);
-static void free_foreach_a(PLpgSQL_stmt_foreach_a *stmt);
-static void free_exit(PLpgSQL_stmt_exit *stmt);
-static void free_return(PLpgSQL_stmt_return *stmt);
-static void free_return_next(PLpgSQL_stmt_return_next *stmt);
-static void free_return_query(PLpgSQL_stmt_return_query *stmt);
-static void free_raise(PLpgSQL_stmt_raise *stmt);
-static void free_assert(PLpgSQL_stmt_assert *stmt);
-static void free_execsql(PLpgSQL_stmt_execsql *stmt);
-static void free_dynexecute(PLpgSQL_stmt_dynexecute *stmt);
-static void free_dynfors(PLpgSQL_stmt_dynfors *stmt);
-static void free_getdiag(PLpgSQL_stmt_getdiag *stmt);
-static void free_open(PLpgSQL_stmt_open *stmt);
-static void free_fetch(PLpgSQL_stmt_fetch *stmt);
-static void free_close(PLpgSQL_stmt_close *stmt);
-static void free_perform(PLpgSQL_stmt_perform *stmt);
-static void free_call(PLpgSQL_stmt_call *stmt);
-static void free_commit(PLpgSQL_stmt_commit *stmt);
-static void free_rollback(PLpgSQL_stmt_rollback *stmt);
-static void free_expr(PLpgSQL_expr *expr);
+typedef void (*plpgsql_stmt_walker_callback) (PLpgSQL_stmt *stmt,
+			  void *context);
+typedef void (*plpgsql_expr_walker_callback) (PLpgSQL

Re: [PATCH] Fix incorrect range in pg_regress comment

2025-02-02 Thread Jelte Fennema-Nio

On Sun, 2 Feb 2025 at 22:26, Tom Lane  wrote:
> Hmm, our convention is definitely that the numbers start with 1,
> so I do not want to make this change.  Maybe we should change
> the code instead.

That would require any extensions that use the _0.out suffix to update
all those files to use _1.out as the suffix. One such extension is
Citus[1]. That seems like unnecessary extension churn with little
benefit. So my vote would be to update the comment (as is done in
patch v1).

[1]: 
https://github.com/citusdata/citus/blob/main/src/test/regress/expected/columnar_lz4_0.out

Re: [PATCH] Fix incorrect range in pg_regress comment

Jelte Fennema-Nio  writes:
> On Sun, 2 Feb 2025 at 22:26, Tom Lane  wrote:
>> Hmm, our convention is definitely that the numbers start with 1,
>> so I do not want to make this change.  Maybe we should change
>> the code instead.

> That would require any extensions that use the _0.out suffix to update
> all those files to use _1.out as the suffix. One such extension is
> Citus[1].

Oh.  I see we did document it as 0-9 in [1], so I guess we're
stuck with that now.  Objection withdrawn.

regards, tom lane

[1] https://www.postgresql.org/docs/devel/regress-variant.html

Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)

2025-02-02 Thread jian he

hi.

modules/test_ddl_deparse/test_ddl_deparse.so.p/test_ddl_deparse.c.o -c
../../Desktop/pg_src/src7/postgres/src/test/modules/test_ddl_deparse/test_ddl_deparse.c
../../Desktop/pg_src/src7/postgres/src/test/modules/test_ddl_deparse/test_ddl_deparse.c:
In function ‘get_altertable_subcmdinfo’:
../../Desktop/pg_src/src7/postgres/src/test/modules/test_ddl_deparse/test_ddl_deparse.c:111:17:
error: enumeration value ‘AT_SetIndexVisible’ not handled in switch
[-Werror=switch]
  111 | switch (subcmd->subtype)
  | ^~
../../Desktop/pg_src/src7/postgres/src/test/modules/test_ddl_deparse/test_ddl_deparse.c:111:17:
error: enumeration value ‘AT_SetIndexInvisible’ not handled in switch
[-Werror=switch]
cc1: all warnings being treated as errors


so we need to change test_ddl_deparse.c.
The attached patch fixes the indentation and test_ddl_deparse.c issue.
Maybe we can add some tests on src/test/modules/test_ddl_deparse.


v10-0001-minor-indentation-fix.no-cfbot
Description: Binary data

Re: Issues with 2PC at recovery: CLOG lookups and GlobalTransactionData

On Fri, Jan 31, 2025 at 04:54:17PM +0300, Vitaly Davydov wrote:
> I'm looking at the v13 patch. I see, there is the only file for v13:
> v2-0002-Fix-issues-with-2PC-file-handling-at-recovery-13.txt

Yes.  The fixes in 13~16 are simpler.  There are so many conflicts
across all the branches that I'm planning to maintain branches for
each stable versions, depending on the discussion going on.  The
refactoring of 17~ and HEAD could be done first, I suppose. 

> #1. In RecoverPreparedTransactions we iterave over all in-memory two phase
> states and check every xid state in CLOG unconditionally. Image, we have
> a very old transaction that is close to the oldestXid. Will CLOG state be
> available for such transaction? I'm not sure about it.

Yes.  RecoverPreparedTransactions() where the cluster is officially at
the end of recovery, and we've checked that we are consistent.

> #2. In restoreTwoPhaseData we load all the twostate files that are in
> the valid xid range (from oldestXid to nextFullXid in terms of logical
> comparision of xids with epoch). The question - should we load files
> which xids greater than ControlFile->checkPointCopy.nextXid
> (xids after last checkpoint). In general, all twophase files should belong
> to xids before the last checkpoint. I guess, we should probably ignore
> files which xid is equal or greater than the xid of the last checkpoint - 
> twostate data should be in the WAL. If not, I guess, we may see error messages
> like show below when doing xact_redo -> PrepareRedoAdd:

We are already using the nextFullXid from ShmemVariableCache, which is
something that StartupXLOG() fetches from the checkpoint record we
define as the starting point of recovery, which provides a protection
good enough.  Grabbing this information from checkPointCopy.nextXid
would be actually worse when recovering from a backup_label, no?  The
copy of the checkpoint record in the control file would be easily
newer than the checkpoint record retrieved from the LSN in the
backup_label, leading to less files considered as in the future.  I'd
rather trust what we have in WAL a maximum for this data.

Note that there is a minor release next week on the roadmap, and for
clarity I am not planning to rush this stuff in the tree this week:
https://www.postgresql.org/developer/roadmap/

Let's give it more time to find a solution we're all OK with.  I'd
love to hear from Noah here as well, as he got involved in the
discussion of the other thread.
--
Michael

signature.asc
Description: PGP signature

Re: Vacuum statistics

On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina
 wrote:
> I noticed that the cfbot is bad, the reason seems to be related to the lack 
> of a parameter in src/backend/utils/misc/postgresql.conf.sample. I added it, 
> it should help.

The patch doesn't apply cleanly.  Please rebase.

I see you introduced new GUC variable pgstat_track_vacuum_statistics,
which should address the increased size of statistics.  However, I
don't see how it could affect the size of PgStat_StatTabEntry struct.
It seems that when pgstat_track_vacuum_statistics == 0, extended
vacuum statistics is not collected but the size of hash table entries
is the same.  Also, should pgstat_track_vacuum_statistics also affect
per database statistics?

The name of 0001 is "... on heap relations".  Should we say "on table
relations", because new machinery should work with alternative table
AMs as well.

There are deletions of empty lines in
src/include/utils/pgstat_internal.h and src/include/pgstat.h.  Please,
remote them as it's not purpose of this patchset.

--
Regards,
Alexander Korotkov
Supabase

Re: Non-text mode for pg_dumpall

2025-02-02 Thread Mahendra Singh Thalor

Thanks Jian for review, testing and delta patches.

On Wed, 29 Jan 2025 at 15:09, jian he  wrote:
>
> hi.
>
> we need to escape the semicolon within the single quotes or double quotes.
> I think my patch in [1] is correct.
>
> we can have "ERROR:  role "z" already exists
> but
> error message like
> pg_restore: error: could not execute query: "ERROR:  unterminated
> quoted string at or near "';
> should not be accepted in execute_global_sql_commands, ReadOneStatement,
PQexec
>
> attached is the all the corner test case i come up with against
> ReadOneStatement.
> your v13 will generate errors like "ERROR:  unterminated quoted string
> at or near ..."',
> which is not good, i think.
>
> [1]
https://www.postgresql.org/message-id/CACJufxEQUcjBocKJQ0Amf3AfiS9wFB7zYSHrj1qqD_oWeaJoGQ%40mail.gmail.com

Yes, you are right. We can't read line by line. We should read char by char
and we need some extra handling for double quote names.
I have merged your delta patch into this and now I am doing some more
testing for corner cases of this type of names.
*Ex*: add some comments in names etc or multiple semicolons or other
special characters in name.

On Fri, 31 Jan 2025 at 09:23, jian he  wrote:
>
> hi.
>
> -extern void RestoreArchive(Archive *AHX);
> +extern void RestoreArchive(Archive *AHX, bool append_data);
> Can we spare some words to explain the purpose of append_data.

Fixed. I added some comments on the top of the RestoreArchive function.

>
> in get_dbname_oid_list_from_mfile
> pg_log_info("map.dat file is not present in dump of
> pg_dumpall, so nothing to restore.");
> maybe we can change it to
> pg_log_info("databases restoring is skipped as map.dat file is
> not present in \"%s\"", dumpdirpath);

Fixed.

> we can aslo add Assert(dumpdirpath != NULL)

No, we don't need it as we are already checking inputfileSpec!= NULL.

>
> pg_log_info("found dbname as : \"%s\" and db_oid:%u in map.dat file
> while restoring", dbname, db_oid);
> also need to change. maybe
> pg_log_info("found database \"%s\" (OID: %u) in map.dat file while
> restoring.", dbname, db_oid);

Fixed.

>
> I also did some minor refactoring, please check attached.

Thanks. I merged it.

>
>
> doc/src/sgml/ref/pg_restore.sgml
>  
>   pg_restore
>
>   
>restore a PostgreSQL database from an
>archive file created by pg_dump
>   
>  
> need to change, since now we can restore multiple databases.

Agreed. I added some comments.

>
> doc/src/sgml/ref/pg_dumpall.sgml
>  
>   pg_dumpall
>   extract a PostgreSQL database
> cluster into a script file
>  
> also need change.
On Sat, 1 Feb 2025 at 21:36, Srinath Reddy  wrote:
>
> Hi,
> i think we have to change the pg_dumpall "--help" message similar to
pg_dump's specifying that now pg_dumpall dumps cluster into to other
non-text formats.
> Need similar "--help" message change in pg_restore to specify that now
pg_restore supports restoring whole cluster from archive created from
pg_dumpall.

As Jian suggested, we need to change docs so I did the same changes into
doc and --help also.

On Fri, 31 Jan 2025 at 14:22, jian he  wrote:
>
> hi.
> more small issues.
>
> + count_db++; /* Increment db couter. */
> + dboidprecell = dboid_cell;
> + }
> +
> typo, "couter" should be "counter".

Fixed.

>
> +
> +/*
> + * get_dbname_oid_list_from_mfile
> + *
> + * Open map.dat file and read line by line and then prepare a list of
database
> + * names and correspoding db_oid.
> + *
> typo, "correspoding" should be "corresponding".

Fixed.

>
>
> execute_global_sql_commands comments didn't mention ``IF (outfile) ``
> branch related code.
> We can add some comments saying that
> ""IF opts->filename is not specified, then copy the content of
> global.dat to opts->filename""".

We already have some comments on the top of the execute_global_sql_commands
function.

>
> or split it into two functions.

Done. I added a new function for outfile.

>
>
> + while((fgets(line, MAXPGPATH, pfile)) != NULL)
> + {
> + Oid db_oid;
> + char db_oid_str[MAXPGPATH + 1];
> + chardbname[MAXPGPATH + 1];
> +
> + /* Extract dboid. */
> + sscanf(line, "%u" , &db_oid);
> + sscanf(line, "%s" , db_oid_str);
> +
> + /* Now copy dbname. */
> + strcpy(dbname, line + strlen(db_oid_str) + 1);
> +
> + /* Remove \n from dbanme. */
> + dbname[strlen(dbname) - 1] = '\0';
> +
> + pg_log_info("found dbname as : \"%s\" and db_oid:%u in map.dat file
> while restoring", dbname, db_oid);
> +
> + /* Report error if file has any corrupted data. */
> + if (!OidIsValid(db_oid) || strlen(dbname) == 0)
> + pg_fatal("invalid entry in map.dat file at line : %d", count + 1);
> +
> + /*
> + * XXX : before adding dbname into list, we can verify that this db
> + * needs to skipped for restore or not but as of now, we are making
> + * a list of all the databases.
> + */
> + simple_db_oid_list_append(dbname_oid_list, db_oid, dbname);
> + count++;
> + }
>
>
> db_oid first should be set to 0, dbname first character first should be
set to 0
> (char

meson's in-tree libpq header search order vs -Dextra_include_dirs

2025-02-02 Thread Thomas Munro

When I use configure/make and --with-includes=/usr/local/include, I
see compile lines like this:

... -I../../src/interfaces/libpq -I../../src/include  -I/usr/local/include ...

That's important, because if I happen to have libpq headers installed
on the system I don't want to be confused by them.  Yet meson's
-Dextra_include_dirs=/usr/local/include compiles like this:

... -I../src/include -I/usr/local/include -Isrc/interfaces/libpq ...

... which gives me errors due to the v16 headers I happen to have installed:

../src/fe_utils/connect_utils.c:164:3: error: unknown type name
'PGcancelConn'; did you mean 'PGcancel'?
  164 | PGcancelConn *cancelConn = PQcancelCreate(conn);

I guess this affects Macs, BSDs and a few lesser used Linux distros
that put optional packages outside the base system and default search
paths.  Perhaps you can see this on everything-goes-into-base-system
distros if you redundantly say -Dextra_include_dirs=/usr/include; I
didn't check if that is true, it wouldn't be if meson is opinionated
enough to remove it.

Reordering obvious mentions of libpq, as in the attached, gets past
most of them but I couldn't immediately figure out how to reorder
src/test/isolation/meson.build and gave up and uninstalled the
interfering libpq package for now.  Dumping these observations here as
this is as far as I got with this impediment today.


wip-reorder-include.diff
Description: Binary data

Re: Proposal to CREATE FOREIGN TABLE LIKE

On Mon, Feb 03, 2025 at 06:22:13AM +0800, Mingli Zhang wrote:
> Yes, I would like to provide a patch.
> 
> Glad to see we have come to an agreement on this.

Just adding my +1 here.  FWIW.
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] Fix incorrect range in pg_regress comment

On Sun, Feb 02, 2025 at 05:01:33PM -0500, Tom Lane wrote:
> Oh.  I see we did document it as 0-9 in [1], so I guess we're
> stuck with that now.  Objection withdrawn.

Oh.  I didn't know that 0 was accepted.  We learn new things every
day.

Right, let's adjust the comment to reflect what the docs say, as your
patch does.  I presume that Tom will do that..
--
Michael


signature.asc
Description: PGP signature

Re: Introduce XID age and inactive timeout based replication slot invalidation

Hi Nisha,

Some review comments for v66-0001.

==
src/backend/replication/slot.c

ReportSlotInvalidation:

1.
  StringInfoData err_detail;
+ StringInfoData err_hint;
  bool hint = false;

  initStringInfo(&err_detail);
+ initStringInfo(&err_hint);


I don't think you still need the 'hint' boolean anymore.

Instead of:
hint ? errhint("%s", err_hint.data) : 0);

You could just do something like:
err_hint.len ? errhint("%s", err_hint.data) : 0);

~~~

2.
+ appendStringInfo(&err_hint, "You might need to increase \"%s\".",
+ "max_slot_wal_keep_size");
  break;
2a.
In this case, shouldn't you really be using macro _("You might need to
increase \"%s\".") so that the common format string would be got using
gettext()?

~

2b.
Should you include a /* translator */ comment here? Other places where
GUC name is substituted do this.

~~~

3.
+ appendStringInfo(&err_hint, "You might need to increase \"%s\".",
+ "idle_replication_slot_timeout");
+ break;

3a.
Ditto above. IMO this common format string should be got using macro.
e.g.: _("You might need to increase \"%s\".")

~

3b.
Should you include a /* translator */ comment here? Other places where
GUC name is substituted do this.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: Show WAL write and fsync stats in pg_stat_io

On Fri, Jan 31, 2025 at 11:29:31AM +0300, Nazir Bilal Yavuz wrote:
> On Wed, 29 Jan 2025 at 18:16, Bertrand Drouvot
>  wrote:
>> I think that's the main reason why ff99918c625 added this new GUC (looking at
>> the commit message). I'd feel more comfortable if we keep it.
> 
> As Michael suggested, I will run a couple of benchmarks to see the
> actual effect of this change. Then let's see if this affects anything.

I've looked at bit at all that today, and something like the attached
is what seems like the best streamlined version to me for the main
feature.  I am also planning to run some short benchmarks with
track_io_timing=on on HEAD and with the patch, then see the
difference, without any relationship to track_wal_io_timing.

The comment additions in pgstat_count_io_op_time() were worth a patch
of their own.  This part has been applied as b998fedab74c, after a few
tweaks of my own.
--
Michael
From f3bff6e4a646cb90fd5ba0e178e282d189eda39e Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Mon, 27 Jan 2025 14:03:54 +0300
Subject: [PATCH v13] Add WAL I/O stats to both pg_stat_io view and per backend
 I/O statistics

This commit adds WAL I/O stats to both pg_stat_io view and per backend
I/O statistics (pg_stat_get_backend_io()).

This commit introduces a three new I/O concepts:

- IOObject IOOBJECT_WAL is used for tracking all WAL I/Os.
- IOOBJECT_WAL / IOCONTEXT_NORMAL is used for tracking I/O operations
  done on already created wal segments.
- IOOBJECT_WAL / IOCONTEXT_INIT is used for tracking I/O operations done
  while creating the WAL segments.

XXX: Bump catalog version.
XXX: Bump pgstats file version.
---
 src/include/pgstat.h|  4 +-
 src/backend/access/transam/xlog.c   | 39 +++---
 src/backend/access/transam/xlogreader.c | 10 +++
 src/backend/access/transam/xlogrecovery.c   | 12 
 src/backend/utils/activity/pgstat_backend.c |  9 +--
 src/backend/utils/activity/pgstat_io.c  | 79 +++--
 src/test/regress/expected/stats.out | 53 ++
 src/test/regress/sql/stats.sql  | 25 +++
 doc/src/sgml/monitoring.sgml| 19 -
 9 files changed, 212 insertions(+), 38 deletions(-)

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 81ec0161c09..a61b488e8d8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -274,14 +274,16 @@ typedef enum IOObject
 {
 	IOOBJECT_RELATION,
 	IOOBJECT_TEMP_RELATION,
+	IOOBJECT_WAL,
 } IOObject;
 
-#define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1)
+#define IOOBJECT_NUM_TYPES (IOOBJECT_WAL + 1)
 
 typedef enum IOContext
 {
 	IOCONTEXT_BULKREAD,
 	IOCONTEXT_BULKWRITE,
+	IOCONTEXT_INIT,
 	IOCONTEXT_NORMAL,
 	IOCONTEXT_VACUUM,
 } IOContext;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 90ade4e7d39..eb19cf5690c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2435,16 +2435,19 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 			{
 errno = 0;
 
-/* Measure I/O timing to write WAL data */
-if (track_wal_io_timing)
-	INSTR_TIME_SET_CURRENT(start);
-else
-	INSTR_TIME_SET_ZERO(start);
+/*
+ * Measure I/O timing to write WAL data, for pg_stat_wal
+ * and/or pg_stat_io.
+ */
+start = pgstat_prepare_io_time(track_wal_io_timing || track_io_timing);
 
 pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 written = pg_pwrite(openLogFile, from, nleft, startoffset);
 pgstat_report_wait_end();
 
+pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
+		IOOP_WRITE, start, 1, written);
+
 /*
  * Increment the I/O timing and the number of times WAL data
  * were written out to disk.
@@ -3216,6 +3219,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	int			fd;
 	int			save_errno;
 	int			open_flags = O_RDWR | O_CREAT | O_EXCL | PG_BINARY;
+	instr_time	io_start;
 
 	Assert(logtli != 0);
 
@@ -3259,6 +3263,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 (errcode_for_file_access(),
  errmsg("could not create file \"%s\": %m", tmppath)));
 
+	/* Measure I/O timing when initializing segment */
+	io_start = pgstat_prepare_io_time(track_io_timing);
+
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
 	save_errno = 0;
 	if (wal_init_zero)
@@ -3294,6 +3301,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	}
 	pgstat_report_wait_end();
 
+	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, IOOP_WRITE,
+			io_start, 1, wal_segment_size);
+
 	if (save_errno)
 	{
 		/*
@@ -3310,6 +3320,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
  errmsg("could not write to file \"%s\": %m", tmppath)));
 	}
 
+	/* Measure I/O timing when flushing segment */
+	io_start = pgstat_prepare_io_time(track_io_timing);
+
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
@@ -3322,6 +3335,9 @@ XLogF

Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary

On Fri, Jan 24, 2025 at 4:05 AM Masahiko Sawada  wrote:
>
> When a standby replays a XLOG_PARAMETER_CHANGE record that lowers
> wal_level from logical, we invalidate all logical slots only when the
> standby is in hot standby mode:
>
> if (InRecovery && InHotStandby &&
> xlrec.wal_level < WAL_LEVEL_LOGICAL &&
> wal_level >= WAL_LEVEL_LOGICAL)
> InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
>0, InvalidOid,
>InvalidTransactionId);
>
> However, it's possible that this record is replayed when not in hot
> standby mode and the slot is used after the promotion. In this case,
> the following Assert in xlog_decode() fails:
>
> /*
>  * This can occur only on a standby, as a primary would
>  * not allow to restart after changing wal_level < logical
>  * if there is pre-existing logical slot.
>  */

Shouldn't we do similar to what this comment indicates on standby? We
can disallow to start the server as standby, if the hot_standby is off
and there is a pre-existing logical slot.

> Assert(RecoveryInProgress());
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("logical decoding on standby requires \"wal_level\" >=
> \"logical\" on the primary")));
>
> Here is brief steps to reproduce this issue:
>
> 1. setup the primary and the standby servers (with hot_standby=on).
> 2. create a logical slot on the standby.
> 3. on standby, set hot_standby=off and restart.
> 4. on primary, lower wal_level to replica and restart.
> 5. promote the standby and execute the logical decoding.
>
> I've attached a small patch to fix this issue. With the patch, we
> invalidate all logical slots even when not in hot standby, that is,
> the patch just removes InHotStandby from the condition. Thoughts?
>

This can fix the scenario you described but I am not sure it is a good
idea. We can consider the fix on these lines if you don't like the
above suggestion.

-- 
With Regards,
Amit Kapila.

Re: NOT ENFORCED constraint feature

2025-02-02 Thread Amul Sul

On Sat, Feb 1, 2025 at 8:31 PM jian he  wrote:
>
> [...]
> So the code should only call AlterConstrTriggerDeferrability,
> not call ATExecAlterConstrEnforceability?

Right. Thank you for the report. We need to know whether the
enforceability and/or deferability has actually been set or not before
catalog update.

Have you started working on the ALTER ... CONSTRAINT for the check
constraint? I am thinking to start that. To fix this bug, we have two
options: we could either throw an error as we don’t currently support
altering enforceability and deferability together (which I’m not a fan
of), or we could refactor the ALTER ... CONSTRAINT code to include
more information which would allow us to perform the appropriate
update action during the execution stage, and it would also help with
altering check constraints.

Regards,
Amul

Re: NOT ENFORCED constraint feature

2025-02-02 Thread Amul Sul

On Fri, Jan 31, 2025 at 7:10 PM Alvaro Herrera  wrote:
>
> On 2025-Jan-31, Ashutosh Bapat wrote:
>
> > But if the constraint is NOT VALID and later marked as NOT ENFORCED,
> > what is expected behaviour while changing it to ENFORCED?
>
> I think what you want is a different mode that would be ENFORCED NOT
> VALID, which would be an extension of the standard, because the standard
> does not support the concept of NOT VALID.  So while I think what you
> want is nice, I'm not sure that this patch necessarily must implement
> it.
>

Here is my understanding behind this feature implementation -- I am
not claiming to be 100% correct, I am confident I am not entirely
wrong either. Let me explain with an example: imagine a user adds a
VALID constraint to a table that already has data, and the user is
completely sure that all the data complies with the constraint. Even
in this case, the system still runs a validation check. This is
expected behavior because the system can't just take the user's word
for it -- it needs to explicitly confirm that the data is valid
through validation.

Now, with a NOT ENFORCED constraint, it's almost like the constraint
doesn't exist, because no checks are being performed and there is no
visible effect for the user, even though the constraint is technically
still there. So when the constraint is switched to ENFORCED, we should
be careful not to automatically mark it as validated (regardless of
its previous validate status) unless the data is actually checked
against the constraint -- treat as adding a new VALID constraint. Even
if the user is absolutely sure the data complies, we should still run
the validation to ensure reliability.

In response to Ashutosh’s point about the VALID/NOT ENFORCED scenario:
if a constraint is initially VALID, then marked as NOT ENFORCED, and
later switched back to ENFORCED -- IMO, it shouldn't automatically be
considered VALID. I had similar thoughts when working on a patch
before v5, but after further discussion, I now agree that it makes
more sense for the system to keep it as NOT VALID until the data has
been validated against the constraint. This is especially important
when a user adds the constraint, is confident the data complies, and
then needs to enforce it. Validating the data ensures the integrity
and consistency of the constraint.

In short, even if the user is 100% confident, skipping validation is
not an option. We need to make sure the constraint’s VALID status is
accurate and reliable before marking it as validated.

Regards,
Amul

Re: Introduce XID age and inactive timeout based replication slot invalidation

On Mon, Feb 3, 2025 at 9:04 AM Peter Smith  wrote:
>
> On Fri, Jan 31, 2025 at 8:02 PM Amit Kapila  wrote:
> >
> > On Fri, Jan 31, 2025 at 10:40 AM Peter Smith  wrote:
> > >
> > > ==
> > > src/backend/replication/slot.c
> > >
> > > ReportSlotInvalidation:
> > >
> > > 1.
> > > +
> > > + case RS_INVAL_IDLE_TIMEOUT:
> > > + Assert(inactive_since > 0);
> > > + /* translator: second %s is a GUC variable name */
> > > + appendStringInfo(&err_detail,
> > > + _("The slot has remained idle since %s, which is longer than the
> > > configured \"%s\" duration."),
> > > + timestamptz_to_str(inactive_since),
> > > + "idle_replication_slot_timeout");
> > > + break;
> > > +
> > >
> > > errdetail:
> > >
> > > I guess it is no fault of this patch because I see you've only copied
> > > nearby code, but AFAICT this function is still having an each-way bet
> > > by using a mixture of _() macro which is for strings intended be
> > > translated, but then only using them in errdetail_internal() which is
> > > for strings that are NOT intended to be translated. Isn't it
> > > contradictory? Why don't we use errdetail() here?
> > >
> >
> > Your question is valid and I don't have an answer. I encourage you to
> > start a new thread to clarify this.
> >
>
> I think this was a false alarm.
>
> After studying this more deeply, I've changed my mind and now think
> the code is OK as-is.
>
> AFAICT errdetail_internal is used when not wanting to translate the
> *fmt* string passed to it (see EVALUATE_MESSAGE in elog.c). Now, here
> the format string is just "%s" so it's fine to not translate that.
> Meanwhile, the string value being substituted to the "%s" was already
> translated because of the _(x) macro aka gettext(x).
>

I didn't get your point about " the "%s" was already translated
because of ...". If we don't want to translate the message then why
add '_(' to it in the first place?

-- 
With Regards,
Amit Kapila.

Re: why there is not VACUUM FULL CONCURRENTLY?

2025-02-02 Thread Antonin Houska

Alvaro Herrera  wrote:

> 
> 
> > From bf2ec8c5d753de340140839f1b061044ec4c1149 Mon Sep 17 00:00:00 2001
> > From: Antonin Houska 
> > Date: Mon, 13 Jan 2025 14:29:54 +0100
> > Subject: [PATCH 4/8] Add CONCURRENTLY option to both VACUUM FULL and CLUSTER
> >  commands.
> 
> > @@ -950,8 +1412,46 @@ copy_table_data(Relation NewHeap, Relation OldHeap, 
> > Relation OldIndex, bool verb
> 
> > +   if (concurrent)
> > +   {
> > +   PgBackendProgress   progress;
> > +
> > +   /*
> > +* Command progress reporting gets terminated at 
> > subtransaction
> > +* end. Save the status so it can be eventually 
> > restored.
> > +*/
> > +   memcpy(&progress, &MyBEEntry->st_progress,
> > +  sizeof(PgBackendProgress));
> > +
> > +   /* Release the locks by aborting the subtransaction. */
> > +   RollbackAndReleaseCurrentSubTransaction();
> > +
> > +   /* Restore the progress reporting status. */
> > +   pgstat_progress_restore_state(&progress);
> > +
> > +   CurrentResourceOwner = oldowner;
> > +   }
> 
> I was looking at 0002 to see if it'd make sense to commit it ahead of a
> fuller review of the rest, and I find that the reason for that patch is
> this hunk you have here in copy_table_data -- you want to avoid a
> subtransaction abort (which you use to release planner lock) clobbering
> the status.  I think this a bad idea.  It might be better to handle this
> in a different way, for instance
> 
> 1) maybe have a flag that says "do not reset progress status during
> subtransaction abort"; REPACK would set that flag, so it'd be able to
> continue its business without having to memcpy the current status (which
> seems like quite a hack) or restoring it afterwards.
> 
> 2) maybe subtransaction abort is not the best way to release the
> planning locks anyway.  I think it might be better to have a
> ResourceOwner that owns those locks, and we do ResourceOwnerRelease()
> which would release them.  I think this would be a novel usage of
> ResourceOwner so it needs more research.  But if this works, then we
> don't need the subtransaction at all, and therefore we don't need
> backend progress restore at all either.

If this needs change, I prefer 2) because it's less invasive: 1) still affects
the progress monitoring code. I'll look at it.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

Re: UUID v7

2025-02-02 Thread Sergey Prokhorenko


Dearcolleagues,

I wouldlike to present for discussion my attached new draft documentation on 
UUIDfunctions (Section 9.14. UUID Functions), which replaces the 
previouslyproposed draft at 
https://www.postgresql.org/docs/devel/functions-uuid.html.I have preserved and 
significantly supplemented the text that was there.

I have thefollowing goals:

1. Statethat from now on, the function uuidv7(), rather than autoincrement, is 
thedefault choice for generating primary keys

2. Describethe advantages of uuidv7() over autoincrement and uuidv4()

3. Refutethe often-cited imaginary disadvantages of UUIDv7 compared to 
autoincrement,such as:

   - Lower performance (see the refutation inthe article "UUID Benchmark War" 
https://ardentperf.com/2024/02/03/uuid-benchmark-war/)

   - Disclosure of date and time of recordcreation in the table (in reality, 
the timestamp offset parameter distorts thisinformation)

4. Confirm thefault tolerance of the uuidv7() function in all possible critical 
situations,namely:

   - System clock failure

   - Receiving an invalid value of the offsetargument, which would result in a 
timestamp overflow or a negative timestamp

 

Regards,

SergeyProkhorenko

sergeyprokhore...@yahoo.com.au





UUID Functions.docx
Description: MS-Word 2007 document


UUID Functions.sgml
Description: Binary data

Optimize scram_SaltedPassword performance

2025-02-02 Thread Zixuan Fu

Hi Hackers,  

While profiling a program with `perf`, I noticed that `scram_SaltedPassword`
consumed more CPU time than expected. After some investigation, I found
that the function performs many HMAC iterations (4096 rounds for
SCRAM-SHA-256), and each iteration reinitializes the HMAC context, causing
excessive overhead.

OpenSSL has an optimization for this case: when the key remains the
same, the HMAC context can be reused with a lightweight state reset by
passing NULL as the key. To take advantage of this, I introduced
`pg_hmac_reuse()`, which replaces the key with NULL when OpenSSL is used.

With this change, the performance improves by approximately **4x** (reducing
execution time from 4ms to 1ms). The built-in PostgreSQL HMAC implementation
does not support context reuse, and modifying it would require substantial
changes. Therefore, `pg_hmac_reuse()` simply calls `pg_hmac_init()` in that
case, maintaining the existing logic.


Regards,

Zixuan
>From e13d9e9c674d1cf4f671ca81b950ae010fd958a4 Mon Sep 17 00:00:00 2001
From: Zi-Xuan Fu 
Date: Mon, 3 Feb 2025 13:33:43 +0800
Subject: [PATCH v1] Optimize "scram_SaltedPassword" with OpenSSL HMAC context reuse  

`scram_SaltedPassword()` requires a large number of HMAC iterations, with 
SCRAM-SHA-256 defaulting to 4096 rounds. Previously, each iteration 
reinitialized the HMAC context, incurring significant overhead. However, 
OpenSSL optimizes this case by allowing context reuse when the key remains 
unchanged—requiring only a lightweight state reset by passing NULL as the key.  

To leverage this, I introduced `pg_hmac_reuse()`, which sets the key to 
NULL when OpenSSL is used. This optimization improves performance by 
approximately 4x (reducing execution time from 4ms to 1ms).  

The native PostgreSQL HMAC implementation does not support context reuse, and 
modifying it would require substantial changes. Therefore, `pg_hmac_reuse()` 
simply calls `pg_hmac_init()` in that case, maintaining the existing logic.  

---
 src/common/hmac.c | 11 +++
 src/common/hmac_openssl.c | 12 
 src/common/scram-common.c |  2 +-
 src/include/common/hmac.h |  1 +
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/common/hmac.c b/src/common/hmac.c
index 9b85910672..6de95c44ee 100644
--- a/src/common/hmac.c
+++ b/src/common/hmac.c
@@ -214,6 +214,17 @@ pg_hmac_init(pg_hmac_ctx *ctx, const uint8 *key, size_t len)
 	return 0;
 }
 
+/*
+ * pg_hmac_reuse
+ *
+ * Reuse a HMAC context with the same key.  Returns 0 on success, -1 on failure.
+ */
+int
+pg_hmac_reuse(pg_hmac_ctx *ctx, const uint8 *key, size_t len)
+{
+	return pg_hmac_init(ctx, key, len);
+}
+
 /*
  * pg_hmac_update
  *
diff --git a/src/common/hmac_openssl.c b/src/common/hmac_openssl.c
index 31cd188904..9e7a92884d 100644
--- a/src/common/hmac_openssl.c
+++ b/src/common/hmac_openssl.c
@@ -208,6 +208,18 @@ pg_hmac_init(pg_hmac_ctx *ctx, const uint8 *key, size_t len)
 	return 0;
 }
 
+/*
+ * pg_hmac_reuse
+ *
+ * Reuse a HMAC context with the same key.  Returns 0 on success, -1 on failure.
+ */
+int
+pg_hmac_reuse(pg_hmac_ctx *ctx, const uint8 *key, size_t len)
+{
+	/* OpenSSL skips unnecessary reinitialization if the key is NULL */
+	return pg_hmac_init(ctx, NULL, len);
+}
+
 /*
  * pg_hmac_update
  *
diff --git a/src/common/scram-common.c b/src/common/scram-common.c
index 400900c51c..5717daff28 100644
--- a/src/common/scram-common.c
+++ b/src/common/scram-common.c
@@ -84,7 +84,7 @@ scram_SaltedPassword(const char *password,
 		CHECK_FOR_INTERRUPTS();
 #endif
 
-		if (pg_hmac_init(hmac_ctx, (uint8 *) password, password_len) < 0 ||
+		if (pg_hmac_reuse(hmac_ctx, (uint8 *) password, password_len) < 0 ||
 			pg_hmac_update(hmac_ctx, (uint8 *) Ui_prev, key_length) < 0 ||
 			pg_hmac_final(hmac_ctx, Ui, key_length) < 0)
 		{
diff --git a/src/include/common/hmac.h b/src/include/common/hmac.h
index c4d069e49a..e9a0366d6e 100644
--- a/src/include/common/hmac.h
+++ b/src/include/common/hmac.h
@@ -22,6 +22,7 @@ typedef struct pg_hmac_ctx pg_hmac_ctx;
 
 extern pg_hmac_ctx *pg_hmac_create(pg_cryptohash_type type);
 extern int	pg_hmac_init(pg_hmac_ctx *ctx, const uint8 *key, size_t len);
+extern int	pg_hmac_reuse(pg_hmac_ctx *ctx, const uint8 *key, size_t len);
 extern int	pg_hmac_update(pg_hmac_ctx *ctx, const uint8 *data, size_t len);
 extern int	pg_hmac_final(pg_hmac_ctx *ctx, uint8 *dest, size_t len);
 extern void pg_hmac_free(pg_hmac_ctx *ctx);
-- 
2.34.1

Re: SQL/JSON json_table plan clause

2025-02-02 Thread Vladlen Popolitov


Nikita Malakhov писал(а) 2025-02-03 02:43:

Hi hackers!

Patch has been rebased onto the current master.

--

Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

Hi!

Tested on Mac with 'make check', all tests passed OK.

--
Best regards,

Vladlen Popolitov.

Re: pgbench with partitioned tables

2025-02-02 Thread Sergey Tatarintsev



02.02.2025 20:45, Álvaro Herrera пишет:

On 2025-Jan-31, Melanie Plageman wrote:


Maybe instead of just not using COPY FREEZE on a table if it is
partitioned, we could add new data generation init_steps. Perhaps one
that is client-side data generation (g) but with no freezing? I'm not
really sure what the letter would be (f? making it f, g, and G?).

I think it makes sense to do what you suggest, but on the other hand,
the original code that Sergey is patching looks like a hack in the sense
that it hardcodes which tables to use FREEZE with.  There's no point to
doing things that way actually, so accepting Sergey's patch to replace
that with a relkind check feels sensible to me.

I think the query should be
SELECT relkind FROM pg_catalog.pg_class WHERE oid='%s'::pg_catalog.regclass
if only for consistency with pgbench's other query on catalogs.


Your proposal to add different init_steps might be reasonable, at least
if we allowed partitionedness of tables to vary in other ways (eg. if we
made pgbench_history partitioned), but I don't think it conflicts with
Sergey's patch in spirit.


Thanks for the note. I changed the query in the patch (v2 patch attached)

Btw, an additional benefit from the patch is that we can use foreign tables
(for example, to test postgres_fdw optimizations)


--
With best regards,
Sergey Tatarintsev,
PostgresPro
From 9aaf2b222d6eb0c218c8a417280a8fdc0a42c296 Mon Sep 17 00:00:00 2001
From: Sergey Tatarintsev 
Date: Thu, 30 Jan 2025 14:52:29 +0700
Subject: [PATCH-v1] Fix pgbench client-side data generation for partitioned tables

Client-side data generation cannot perform COPY WITH (FREEZE = ON) on partitioned
tables except pgbench_accounts.
Patch disables FREEZE for any non-regular tables

---
 src/bin/pgbench/pgbench.c | 34 +-
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index b5cf3c6ed01..cac5312c4b1 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -847,6 +847,28 @@ static const PsqlScanCallbacks pgbench_callbacks = {
 	NULL,		/* don't need get_variable functionality */
 };
 
+static bool
+is_regular_table(PGconn *con, const char *table)
+{
+	PGresult   *res;
+	char		*sql = psprintf("SELECT relkind FROM pg_catalog.pg_class WHERE oid='%s'::pg_catalog.regclass", table);
+	bool		is_regular_table = true;
+
+	res = PQexec(con, sql);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("query failed: %s", PQerrorMessage(con));
+		pg_log_error_detail("Query was: %s", sql);
+		exit(1);
+	}
+	if (!PQgetisnull(res, 0, 0))
+		is_regular_table = strncmp(PQgetvalue(res, 0, 0), "r", 1) == 0;
+
+	PQclear(res);
+	pfree(sql);
+	return is_regular_table;
+}
+
 static inline pg_time_usec_t
 pg_time_now(void)
 {
@@ -4958,16 +4980,10 @@ initPopulateTable(PGconn *con, const char *table, int64 base,
 
 	initPQExpBuffer(&sql);
 
-	/*
-	 * Use COPY with FREEZE on v14 and later for all the tables except
-	 * pgbench_accounts when it is partitioned.
-	 */
-	if (PQserverVersion(con) >= 14)
-	{
-		if (strcmp(table, "pgbench_accounts") != 0 ||
-			partitions == 0)
+	/* Use COPY with FREEZE on v14 and later for all regular tables */
+	if ((PQserverVersion(con) >= 14) && is_regular_table(con, table))
 			copy_statement_fmt = "copy %s from stdin with (freeze on)";
-	}
+
 
 	n = pg_snprintf(copy_statement, sizeof(copy_statement), copy_statement_fmt, table);
 	if (n >= sizeof(copy_statement))
-- 
2.43.0

Re: Introduce XID age and inactive timeout based replication slot invalidation

On Mon, Feb 3, 2025 at 4:04 PM Amit Kapila  wrote:
>
> On Mon, Feb 3, 2025 at 9:04 AM Peter Smith  wrote:
> >
> > On Fri, Jan 31, 2025 at 8:02 PM Amit Kapila  wrote:
> > >
> > > On Fri, Jan 31, 2025 at 10:40 AM Peter Smith  
> > > wrote:
> > > >
> > > > ==
> > > > src/backend/replication/slot.c
> > > >
> > > > ReportSlotInvalidation:
> > > >
> > > > 1.
> > > > +
> > > > + case RS_INVAL_IDLE_TIMEOUT:
> > > > + Assert(inactive_since > 0);
> > > > + /* translator: second %s is a GUC variable name */
> > > > + appendStringInfo(&err_detail,
> > > > + _("The slot has remained idle since %s, which is longer than the
> > > > configured \"%s\" duration."),
> > > > + timestamptz_to_str(inactive_since),
> > > > + "idle_replication_slot_timeout");
> > > > + break;
> > > > +
> > > >
> > > > errdetail:
> > > >
> > > > I guess it is no fault of this patch because I see you've only copied
> > > > nearby code, but AFAICT this function is still having an each-way bet
> > > > by using a mixture of _() macro which is for strings intended be
> > > > translated, but then only using them in errdetail_internal() which is
> > > > for strings that are NOT intended to be translated. Isn't it
> > > > contradictory? Why don't we use errdetail() here?
> > > >
> > >
> > > Your question is valid and I don't have an answer. I encourage you to
> > > start a new thread to clarify this.
> > >
> >
> > I think this was a false alarm.
> >
> > After studying this more deeply, I've changed my mind and now think
> > the code is OK as-is.
> >
> > AFAICT errdetail_internal is used when not wanting to translate the
> > *fmt* string passed to it (see EVALUATE_MESSAGE in elog.c). Now, here
> > the format string is just "%s" so it's fine to not translate that.
> > Meanwhile, the string value being substituted to the "%s" was already
> > translated because of the _(x) macro aka gettext(x).
> >
>
> I didn't get your point about " the "%s" was already translated
> because of ...". If we don't want to translate the message then why
> add '_(' to it in the first place?
>

I think this is same point where I was fooling myself yesterday.  In
fact we do want to translate the message seen by the user.

errdetail_internal really means don't translate the ***format
string***. In our case "%s" is not the message at all -- it is just
the a *format string* so translating "%s" is kind of meaningless.

e.g. Normally

errdetail("translate me") <-- This would translate the fmt string but
here the fmt is also the message; i.e. it will do gettext("translate
me") internally.

errdetail_internal("translate me") <-- This won't translate anything;
you will have the raw fmt string "translate me"

~~

But since ReportSlotInvalidation is building the message on the fly
there is no single report so it is a bit different

errdetail("%s", "translate me") <-- this would just use gettext("%s")
which is kind of useless. And the "translate me" is just a raw string
and won't be translated.

errdetail_internal("%s", "translate me") <-- this won't translate
anything; the fmt string and the "translate me" are just raw strings

errdetail_internal("%s", _("translate me"))  <-- This won't translate
the fmt string, but to translate %s is useless anyway. OTOH, the _()
macro means it will do gettext("translate me") so the "translate me"
string will get translated before it is substituted. This is
effectively what the ReportSlotInvalidation code is doing.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

RE: pg_rewind with --write-recovery-conf option doesn't write dbname to primary_conninfo value.

2025-02-02 Thread Hayato Kuroda (Fujitsu)

Dear Sawada-san,

> I think it's a good idea to support it at least on HEAD. I've attached
> a patch for that.

+1. I've confirmed that pg_rewind and -R can't output dbname for now,
and your patch allows to do it.

Few comments for your patch.

1.

pg_basebackup.sgml has below description. I feel this can be ported to
pg_rewind.sgml as well.

```
The dbname will be recorded only if the dbname was
specified explicitly in the connection string or 
environment variable.
```

2.
I'm not sure whether recovery_gen.h/c is a correct place for the exported 
function
GetDbnameFromConnectionOptions(). The function itself does not handle
postgresql.cuto.conf file.
I seeked other header files and felt that connect_utils.h might be.

```
/*-
 *
 * Facilities for frontend code to connect to and disconnect from databases.
```

Another idea is to change the third API to accept the whole connection string, 
and
it extracts dbname from it. In this approach we can make 
GetDbnameFromConnectionOptions()
to static function - which does not feel strange for me.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-02 Thread Vladlen Popolitov


Sutou Kouhei писал(а) 2025-02-01 17:12:

Hi,



Hi
I would like to inform about the security breach in your design of COPY 
TO/FROM.


You use FORMAT option to add new formats, filling it with routine name
in shared library. As result any caller can call any routine in 
PostgreSQL kernel.

I think, it will start competition, who can find most dangerous routine
to call just from COPY FROM command.

 Standard PostgreSQL realisation for new methods to use USING keyword. 
Every
new method could have own options (FORMAT is option of internal 'copy 
from/to'

methods), it assumes some SetOptions interface, that defines
an options structure according to the new method requirements.

 I agree with the general direction of the extensibility, but it should 
be secure

and consistent.

--
Best regards,

Vladlen Popolitov.

Re: Introduce XID age and inactive timeout based replication slot invalidation

On Mon, Feb 3, 2025 at 6:16 AM Peter Smith  wrote:
>
>
> 2.
> + appendStringInfo(&err_hint, "You might need to increase \"%s\".",
> + "max_slot_wal_keep_size");
>   break;
> 2a.
> In this case, shouldn't you really be using macro _("You might need to
> increase \"%s\".") so that the common format string would be got using
> gettext()?
>
> ~
>
>
> ~~~
>
> 3.
> + appendStringInfo(&err_hint, "You might need to increase \"%s\".",
> + "idle_replication_slot_timeout");
> + break;
>
> 3a.
> Ditto above. IMO this common format string should be got using macro.
> e.g.: _("You might need to increase \"%s\".")
>
> ~

Instead, we can directly use '_(' in errhint as we are doing in one
other similar place "errhint("%s", _(view_updatable_error;". I
think we didn't use it for errdetail because, in one of the cases, it
needs to use ngettext

-- 
With Regards,
Amit Kapila.

Re: Optimize scram_SaltedPassword performance

2025-02-02 Thread Yura Sokolov

03.02.2025 10:11, Zixuan Fu пишет:
> Hi Hackers,  
> 
> While profiling a program with `perf`, I noticed that `scram_SaltedPassword`
> consumed more CPU time than expected. After some investigation, I found
> that the function performs many HMAC iterations (4096 rounds for
> SCRAM-SHA-256), and each iteration reinitializes the HMAC context, causing
> excessive overhead.
> 
> OpenSSL has an optimization for this case: when the key remains the
> same, the HMAC context can be reused with a lightweight state reset by
> passing NULL as the key. To take advantage of this, I introduced
> `pg_hmac_reuse()`, which replaces the key with NULL when OpenSSL is used.

Good catch.

Since pg_hmac_reuse is not `static`, I'd add some checks that key is
exactly same. At least there should be

   Assert(key == prev_key && len == prev_len && hash_bytes(key, len) ==
prev_hash);

Where `prev_key`, `prev_len` and `prev_hash` are static variables, filled
in `pg_hmac_init`.

I don't know, should it be `Assert`, or check that leads to `elog(ERROR)`.

`hash_bytes` is fast enough to not cause measurable slow down in production.

On the other hand, use cases are trivial enough to occasional misuses to be
caught using just `Assert`.

> With this change, the performance improves by approximately **4x** (reducing
> execution time from 4ms to 1ms). The built-in PostgreSQL HMAC implementation
> does not support context reuse, and modifying it would require substantial
> changes. Therefore, `pg_hmac_reuse()` simply calls `pg_hmac_init()` in that
> case, maintaining the existing logic.

---

regards,
Yura

Re: NOT ENFORCED constraint feature

2025-02-02 Thread Alvaro Herrera

On 2025-Feb-03, Ashutosh Bapat wrote:

> VALID, NOT ENFORCED changed to VALID, ENFORCED - data validation
> required, constraint is enforced

There's no such thing as a VALID NOT ENFORCED constraint.  It just
cannot exist.

> NOT VALID, NOT ENFORCED changed to NOT_VALID, ENFORCED - no data
> validation required, constraint is enforced on the new tuples/changes

This may make sense, but it needs special nonstandard syntax.  If you
start with a NOT VALID NOT ENFORCED constraint (which is the only way to
have a NOT ENFORCED constraint) and apply ALTER TABLE ALTER CONSTRAINT
ENFORCE, you will end up with a VALID ENFORCED constraint, therefore
validation must be run.

If you wanted to add a nonstandard command
ALTER TABLE ALTER CONSTRAINT ENFORCE NO VALIDATE
then maybe the transition you suggest could be made.
It should be a separate patch from regular ALTER CONSTRAINT ENFORCE
though, just in case some problems with it emerge later and we're forced
to revert it, we can still keep the standard command.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"

Re: UUID v7

2025-02-02 Thread Andrey Borodin



> On 31 Jan 2025, at 23:49, Masahiko Sawada  wrote:
> 
> Thank you for the patch! I agree with the basic direction of this fix.
> Here are some review comments:
> 
> ---
> -static inline int64 get_real_time_ns_ascending();
> +static inline uint64 get_real_time_ns_ascending();
> 
> IIUC we don't need to replace int64 with uint64 if we have two
> separate parameters for generate_uuidv7(). It seems to be conventional
> to use a signed int for timestamps.

OK, done.

> 
> ---
> Need to update the function comment of generate_uuidv7() as we changed
> the function arguments.

Done.

> 
> ---
> -   ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> SECS_PER_DAY * USECS_PER_SEC)
> -   * NS_PER_US + ns % NS_PER_US;
> +   us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> SECS_PER_DAY * USECS_PER_SEC);
> 
>/* Generate an UUIDv7 */
> -   uuid = generate_uuidv7(ns);
> +   uuid = generate_uuidv7(us / 1000, (us % 1000) * 1000 + ns % 
> NS_PER_US);
> 
> I think we can have an inline function or a marco (or use TMODULO()?)
> to split nanoseconds into milliseconds and sub-milliseconds so that
> uuidv7() and uuidv7_interval() can pass them to generate_uuidv7().

I doubt that such macro will make core more readable. I've replaced 1000 with 
macros.

> 
> The comments in uuidv7_interval() also need to be updated accordingly.

Done.

> 
> ---
> I think we need to consider how we can handle the timestamp shifting.
> UUIDv7 contains 48 bits Unix timestamp at milliseconds precision,
> which can represent timestamps approximately between 2493 BC and 6432
> AC. If users specify an interval to shift the timestamp beyond the
> range, 48-bits timestamp would be wrapped around and they would not be
> able to get an expected result. Do we need to raise an error in that
> case?
> 
> ---
> Another problem I found in uuid_extract_timestamp() is that it cannot
> correctly extract a timestamp before 1970/1/1 stored in a UUIDv7
> value:
> 
> postgres(1:1795331)=# select year, uuid_extract_timestamp(uuidv7((year
> || 'year ago')::interval)) from generate_series(54, 56) year;
> year |   uuid_extract_timestamp
> --+-
>   54 | 1971-01-31 10:46:25.111-08
>   55 | 1970-01-31 10:46:25.111-08
>   56 | 10888-09-01 17:18:15.768-07
> (3 rows)
> 
> The problem is that we correctly store a negative timestamp value in a
> UUIDv7 value but uuid_extract_timestamp() unconditionally treats it as
> a positive timestamp value. I think this is a separate bug we need to
> fix.


RFC says unix_ts_ms is unsigned. So, luckily, no BC dates. I bet Pharaohs could 
not measure nanoseconds.
I think it's totally fine to wrap UUID values around year 10598 without an 
error.

I was thinking about incorporating test like this.

>> With this patch we can generate correct UUIDs in a very distant future.
>> postgres=# select x, 
>>  
>>   uuid_extract_timestamp(uuidv7((x::text || ' 
>> year'::text)::interval)),
>> (x::text || ' year'::text)::interval
>> from generate_series(1,9000,1000) x;
>>  x   |   uuid_extract_timestamp|  interval
>> --+-+
>>1 | 2026-01-31 12:00:53.084+05  | 1 year
>> 1001 | 3026-01-31 12:00:53.084+05  | 1001 years
>> 2001 | 4026-01-31 12:00:53.084+05  | 2001 years
>> 3001 | 5026-01-31 12:00:53.084+05  | 3001 years
>> 4001 | 6026-01-31 12:00:53.084+05  | 4001 years
>> 5001 | 7026-01-31 12:00:53.085+05  | 5001 years
>> 6001 | 8026-01-31 12:00:53.085+05  | 6001 years
>> 7001 | 9026-01-31 12:00:53.085+05  | 7001 years
>> 8001 | 10026-01-31 12:00:53.085+05 | 8001 years
>> (9 rows)


or maybe something simple like

with u as (select uuidv7() id) select uuid_extract_timestamp(uuidv7('-09-09 
12:34:56.789+05' - uuid_extract_timestamp(u.id))) from u;

But it would still be flaky, second call to uuidv7() can overflow a millisecond.

Thanks!


Best regards, Andrey Borodin.


v2-0001-UUDv7-fix-offset-computations-in-dates-after-2262.patch
Description: Binary data

Doc fix of aggressive vacuum threshold for multixact members storage

2025-02-02 Thread Alex Friedman

Hi,

This patch suggests a correction to the doc page dealing with multixact
vacuuming, which, starting with PG 14, says that the multixact members
storage threshold for aggressive vacuum is 2 GB. However, I believe the
threshold is actually about 10 GB.

MultiXactMemberFreezeThreshold() defines the threshold as 2^32 (0x)
/ 2 or 2^31 multixact members. However, as discussed in multixact.c,
multixact members are stored in groups of 4, each group taking up 20 bytes,
meaning 5 bytes per member. (This is not quite exact as 12 bytes per 8 KB
page are wasted, but I believe it is close enough for the docs.)

This makes the threshold in bytes be 2^31 multixact members * 5 bytes per
member = 10 GiB. It was also confirmed by observing a live system (with an
admittedly unfortunate workload pattern). Also, the maximum storage size
for multixact members is 20 GiB (2^32 * 5), and it should be useful to call
this out in the doc as well.

For reference, the original commit which introduced the current wording is
c552e17, and the discussion was here:
https://www.postgresql.org/message-id/flat/162395467510.686.11947486273299446208%40wrigleys.postgresql.org

The attached patch is against master, but it should probably be backpatched
all the way through 14.

Best regards,

Alex Friedman


v1-0001-Doc-fix-of-aggressive-vacuum-threshold-for-multix.patch
Description: Binary data

Re: Cross-type index comparison support in contrib/btree_gin