Re: Make query cancellation keys longer

2024-03-09 Thread Jelte Fennema-Nio
On Fri, 8 Mar 2024 at 23:20, Heikki Linnakangas  wrote:
> Added some documentation. There's more work to be done there, but at
> least the message type descriptions are now up-to-date.

Thanks, that's very helpful.

> The nice thing about relying on the message length was that we could
> just redefine the CancelRequest message to have a variable length
> secret, and old CancelRequest with 4-byte secret was compatible with the
> new definition too. But it doesn't matter much, so I added an explicit
> length field.
>
> FWIW I don't think that restriction makes sense. Any code that parses
> the messages needs to have the message length available, and I don't
> think it helps with sanity checking that much. I think the documentation
> is a little anachronistic. The real reason that all the message types
> include enough information to find the message end is that in protocol
> version 2, there was no message length field. The only exception that
> doesn't have that property is CopyData, and it's no coincidence that it
> was added in protocol version 3.

Hmm, looking at the current code, I do agree that not introducing a
new message would simplify both client and server implementation. Now
clients need to do different things depending on whether the server
supports 3.1 or 3.0 (sending CancelRequestExtended instead of
CancelRequest and having to parse BackendKeyData differently). And I
also agree that the extra length field doesn't add much in the way of
sanity checking (for the CancelRequest and BackendKeyData message
types at least). So, sorry for the back and forth on this, but I now
agree with you that we should not add the length field. I think one
reason I didn't see the benefit before was because the initial patch
0002 was still introducing a CancelRequestExtended message type. If we
can get rid of this message type by not adding a length, then I think
that's worth losing the sanity checking.
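
To make that concrete, here is a rough sketch (my illustration only, not
actual libpq code; the names are invented) of how a client can recover a
variable-length cancel secret from BackendKeyData using nothing but the
message length it already has:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define MAX_CANCEL_SECRET_LEN 256	/* assumed upper bound, for illustration */

typedef struct
{
	uint32_t	backend_pid;
	uint8_t		secret[MAX_CANCEL_SECRET_LEN];
	int			secret_len;
} CancelKey;

/*
 * 'payload' points just past the 'K' type byte and the length word;
 * 'payload_len' is the remaining number of bytes in the message.
 */
static bool
parse_backend_key_data(const uint8_t *payload, int payload_len, CancelKey *key)
{
	uint32_t	pid_net;

	if (payload_len < 4 || payload_len - 4 > MAX_CANCEL_SECRET_LEN)
		return false;
	memcpy(&pid_net, payload, 4);
	key->backend_pid = ntohl(pid_net);	/* wire fields are big-endian */
	key->secret_len = payload_len - 4;
	memcpy(key->secret, payload + 4, (size_t) key->secret_len);
	return true;
}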

> Unfortunately 1234,5679 already in use by NEGOTIATE_SSL_CODE. That's why
> I went the other direction. If we want to add this to "the end", it
> needs to be 1234,5681. But I wanted to keep the cancel requests together.

Fair enough, I didn't realise that. This whole point is moot anyway if
we're not introducing CancelRequestExtended.

> We could teach
> libpq to disconnect and reconnect with minor protocol version 3.0, if
> connecting with 3.1 fails, but IMHO it's better to not complicate this
> and accept the break in backwards-compatibility.

Yeah, implementing automatic reconnection seems a bit overkill to me
too. But it might be nice to add a connection option that causes libpq
to use protocol 3.0. Having to install an old libpq to connect to an
old server seems quite annoying. Especially since I think that many
other types of servers that implement the postgres protocol have not
implemented the minor version negotiation.

I at least know PgBouncer[1] and pgcat[2] have not, but probably other
server implementations like CockroachDB and Google Spanner have this
problem too.

I'll try to add such a fallback connection option to my patchset soon.

[1]: https://github.com/pgbouncer/pgbouncer/pull/1007
[2]: https://github.com/postgresml/pgcat/issues/706




Re: Reducing the log spam

2024-03-09 Thread Laurenz Albe
On Thu, 2024-03-07 at 08:30 +0100, Laurenz Albe wrote:
> On Wed, 2024-03-06 at 17:33 -0500, Isaac Morland wrote:
> > I have two questions about this:
> > 
> > First, can it be done per role? If I have a particular application which is
> > constantly throwing some particular error, I might want to suppress it, but
> > not suppress the same error occasionally coming from another application.
> > I see ALTER DATABASE name SET configuration_parameter … as being useful 
> > here,
> > but often multiple applications share a database.
> > 
> > Second, where can this setting be adjusted? Can any session turn off logging
> > of arbitrary sets of sqlstates resulting from its queries? It feels to me
> > like that might allow security problems to be hidden. Specifically, the 
> > first
> > thing an SQL injection might do would be to turn off logging of important
> > error states, then proceed to try various nefarious things.
> 
> I was envisioning the parameter to be like other logging parameters, for
> example "log_statement":  only superusers can set the parameter or GRANT
> that privilege to others.  Also, a superuser could use ALTER ROLE to set
> the parameter for all sessions by that role.
> 
> > It seems to me the above questions interact; an answer to the first might be
> > "ALTER ROLE role_specification SET configuration_parameter", but I think 
> > that
> > would allow roles to change their own settings, contrary to the concern
> > raised by the second question.
> 
> If a superuser sets "log_statement" on a role, that role cannot undo or change
> the setting.  That's just how I plan to implement the new parameter.

Here is a patch that implements this.

I went with "log_suppress_errcodes", since the term "error code" is used
throughout our documentation.

The initial value is 23505,3D000,3F000,42601,42704,42883,42P01,57P03

Yours,
Laurenz Albe
From 26465dafe4bcd2aea3889360987d2a863b66 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Sat, 9 Mar 2024 13:59:55 +0100
Subject: [PATCH v1] Add parameter log_suppress_errcodes

The parameter contains a list of SQLSTATEs for which
FATAL and ERROR messages are not logged.  This is to
suppress messages that are of little interest to the
database administrator, but tend to clutter the log.

Author: Laurenz Albe
Discussion: https://postgr.es/m/408f399e7de1416c47bab7e260327ed5ad92838c.camel%40cybertec.at
---
 doc/src/sgml/config.sgml  |  34 +
 src/backend/utils/error/elog.c| 118 ++
 src/backend/utils/misc/guc_tables.c   |  11 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/pg_config_manual.h|  10 ++
 src/include/utils/guc.h   |   1 +
 src/include/utils/guc_hooks.h |   2 +
 7 files changed, 180 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 43b1a132a2..f9487de06f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6850,6 +6850,40 @@ local0.*/var/log/postgresql
   
  
 
+ 
+  log_suppress_errcodes (string)
+  
+   log_suppress_errcodes configuration parameter
+  
+  
+  
+   
+Causes ERROR and FATAL messages
+with certain error codes to be excluded from the log.
+The value is a comma-separated list of five-character error codes as
+listed in .  An error code that
+represents a class of errors (ends with three zeros) suppresses logging
+of all error codes within that class.  For example, the entry
+08000 (connection_exception)
+would suppress an error with code 08P01
+(protocol_violation).  The default setting is
+23505,3D000,3F000,42601,42704,42883,42P01,57P03.
+Only superusers and users with the appropriate SET
+privilege can change this setting.
+   
+
+   
+This setting is useful to exclude error messages from the log that are
+frequent but irrelevant.  That makes it easier to spot relevant
+messages in the log and keeps log files from growing too big.  For
+example, if you have a monitoring system that regularly establishes a
+TCP connection to the server without sending a correct startup packet,
+you could suppress the protocol violation errors by adding error code
+08P01 to the list.
+   
+  
+ 
+
  
   log_min_duration_statement (integer)
   
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index c9719f358b..4b38adfd4d 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -112,12 +112,16 @@ int			Log_error_verbosity = PGERROR_DEFAULT;
 char	   *Log_line_prefix = NULL; /* format for extra log line info */
 int			Log_destination = LOG_DESTINATION_STDERR;
 char	   *Log_destination_string = NULL;
+char	   *log_suppress_errcodes = NULL;
 bool		syslog_sequence_numbers = true;
 bool		syslog_spl

Re: Improving EXPLAIN's display of SubPlan nodes

2024-03-09 Thread Dean Rasheed
On Fri, 16 Feb 2024 at 19:39, Tom Lane  wrote:
>
> > So now I'm thinking that we do have enough detail in the present
> > proposal, and we just need to think about whether there's some
> > nicer way to present it than the particular spelling I used here.
>

One thing that concerns me about making even greater use of "$n" is
the potential for confusion with generic plan parameters. Maybe it's
always possible to work out which is which from context, but still it
looks messy:

drop table if exists foo;
create table foo(id int, x int, y int);

explain (verbose, costs off, generic_plan)
select row($3,$4) = (select x,y from foo where id=y) and
   row($1,$2) = (select min(x+y),max(x+y) from generate_series(1,3) x)
from generate_series(1,3) y;

  QUERY PLAN
---
 Function Scan on pg_catalog.generate_series y
   Output: (($3 = $0) AND ($4 = $1) AND (ROWCOMPARE (($1 = $3) AND ($2
= $4)) FROM SubPlan 2 (returns $3,$4)))
   Function Call: generate_series(1, 3)
   InitPlan 1 (returns $0,$1)
 ->  Seq Scan on public.foo
   Output: foo.x, foo.y
   Filter: (foo.id = foo.y)
   SubPlan 2 (returns $3,$4)
 ->  Aggregate
   Output: min((x.x + y.y)), max((x.x + y.y))
   ->  Function Scan on pg_catalog.generate_series x
 Output: x.x
 Function Call: generate_series(1, 3)

Another odd thing about that is the inconsistency between how the
SubPlan and InitPlan expressions are displayed. I think "ROWCOMPARE"
is really just an internal detail that could be omitted without losing
anything. But the "FROM SubPlan ..." is useful to work out where it's
coming from. Should it also output "FROM InitPlan ..."? I think that
would risk making it harder to read.

Another possibility is to put the SubPlan and InitPlan names inline,
rather than outputting "FROM SubPlan ...". I had a go at hacking that
up and this was the result:

  QUERY PLAN
---
 Function Scan on pg_catalog.generate_series y
   Output: (($3 = (InitPlan 1).$0) AND ($4 = (InitPlan 1).$1) AND
((($1 = (SubPlan 2).$3) AND ($2 = (SubPlan 2).$4
   Function Call: generate_series(1, 3)
   InitPlan 1 (returns $0,$1)
 ->  Seq Scan on public.foo
   Output: foo.x, foo.y
   Filter: (foo.id = foo.y)
   SubPlan 2 (returns $3,$4)
 ->  Aggregate
   Output: min((x.x + y.y)), max((x.x + y.y))
   ->  Function Scan on pg_catalog.generate_series x
 Output: x.x
 Function Call: generate_series(1, 3)

It's a little more verbose in this case, but in a lot of other cases
it ended up being more compact.

The code is a bit messy, but I think the regression test output
(attached) is clearer and easier to interpret. SubPlans and InitPlans
are displayed consistently, and it's easier to distinguish
SubPlan/InitPlan outputs from external parameters.

There are a few more regression test changes, corresponding to cases
where InitPlans are referenced, such as:

  Seq Scan on document
-   Filter: ((dlevel <= $0) AND f_leak(dtitle))
+   Filter: ((dlevel <= (InitPlan 1).$0) AND f_leak(dtitle))
InitPlan 1 (returns $0)
  ->  Index Scan using uaccount_pkey on uaccount
Index Cond: (pguser = CURRENT_USER)

but I think that's useful extra clarification.

Regards,
Dean
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
new file mode 100644
index c355e8f..f0ff936
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3021,7 +3021,7 @@ select exists(select 1 from pg_enum), su
 QUERY PLAN
 --
  Foreign Scan
-   Output: $0, (sum(ft1.c1))
+   Output: (InitPlan 1).$0, (sum(ft1.c1))
Relations: Aggregate on (public.ft1)
Remote SQL: SELECT sum("C 1") FROM "S 1"."T 1"
InitPlan 1 (returns $0)
@@ -3039,7 +3039,7 @@ select exists(select 1 from pg_enum), su
 QUERY PLAN 
 ---
  GroupAggregate
-   Output: $0, sum(ft1.c1)
+   Output: (InitPlan 1).$0, sum(ft1.c1)
InitPlan 1 (returns $0)
  ->  Seq Scan on pg_catalog.pg_enum
->  Foreign Scan on public.ft1
@@ -3264,14 +3264,14 @@ select sum(c1) filter (where (c1 / c1) *
 
 explain (verbose, costs off)
 select sum(c2) filter (where c2 in (select c2 from ft1 where c2 < 5)) from ft1;
-QUERY PLAN 

+ QUERY PLAN  
+-

Re: Support "Right Semi Join" plan shapes

2024-03-09 Thread Alena Rybakina


On 06.03.2024 05:23, wenhui qiu wrote:



  Hi Alena Rybakina
  For merge join
  + /*
  + * For now we do not support RIGHT_SEMI join in mergejoin.
  + */
  + if (jointype == JOIN_RIGHT_SEMI)
  + {
  + *mergejoin_allowed = false;
  + return NIL;
  + }
  +
  Thanks



Yes, I see it, thank you. Sorry for the noise.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: postgres_fdw test timeouts

2024-03-09 Thread Nathan Bossart
On Sat, Mar 09, 2024 at 10:00:00AM +0300, Alexander Lakhin wrote:
> I have re-run the tests and found out that the issue was fixed by
> d3c5f37dd. It changed the body of the loop "while (PQisBusy(conn))",
> formerly contained in pgfdw_get_result() as follows:
>             /* Data available in socket? */
>             if (wc & WL_SOCKET_READABLE)
>             {
>                 if (!PQconsumeInput(conn))
>                     pgfdw_report_error(ERROR, NULL, conn, false, query);
>             }
> ->
>     /* Consume whatever data is available from the socket */
>     if (PQconsumeInput(conn) == 0)
>     {
>         /* trouble; expect PQgetResult() to return NULL */
>         break;
>     }
> 
> That is, the unconditional "if PQconsumeInput() ..." eliminates the test
> timeout.

Thanks for confirming!  I'm assuming this just masks the underlying
issue...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.

2024-03-09 Thread Magnus Hagander
On Fri, Dec 22, 2023 at 11:04 PM Alexander Korotkov
 wrote:
>
> Hi, Anton!
>
> On Mon, Dec 4, 2023 at 3:50 AM Anton A. Melnikov  
> wrote:
>>
>> Thanks for remarks!
>>
>> On 28.11.2023 21:34, Alexander Korotkov wrote:
>> > After examining the second patch
>> > ("v2-0001-Add-restartpoint-stats.patch"), it appears that adding
>> > additional statistics as outlined in the patch is the most suitable
>> > approach to address the concerns raised. This solution provides more
>> > visibility into the system's behavior without altering its core
>> > mechanics.
>>
>> Agreed. I left only this variant of the patch and reworked it due to commit 
>> 96f05261.
>> So the new counters are in the pg_stat_checkpointer view now.
>> Please see the v3-0001-add-restartpoints-stats.patch attached.
>>
>>
>> > However, it's essential that this additional functionality
>> > is accompanied by comprehensive documentation to ensure clear
>> > understanding and ease of use by the PostgreSQL community.
>> >
>> > Please consider expanding the documentation to include detailed
>> > explanations of the new statistics and their implications in various
>> > scenarios.
>>
>> In the separate v3-0002-doc-for-restartpoints-stats.patch I added the 
>> definitions
>> of the new counters into the "28.2.15. pg_stat_checkpointer" section
>> and an explanation of them with examples into the "30.5. WAL Configuration" one.
>>
>> Would be glad for any comments and concerns.
>
>
> I made some grammar corrections to the docs and have written the commit 
> message.
>
> I think this patch now looks good.  I'm going to push this if there are no 
> objections.

Per the docs, the sync_time, write_time and buffers_written only apply
to checkpoints, not restartpoints. Is this correct? AFAICT from a
quick look at the code they include both checkpoints and restartpoints
in which case I think the docs should be clarified to indicate that?
(Or if I'm wrong, and it doesn't include them, then shouldn't we have
separate counters for them?)

//Magnus




Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value

2024-03-09 Thread Nathan Bossart
On Sat, Mar 09, 2024 at 11:57:18AM +0900, Yugo NAGATA wrote:
> On Fri, 8 Mar 2024 16:17:58 -0600
> Nathan Bossart  wrote:
>> Is this guaranteed to be TOASTed for all possible page sizes?
> 
> Should we use block_size?
> 
>  SHOW block_size \gset
>  INSERT INTO test_chunk_id(v1,v2)
>   VALUES (repeat('x', 1), repeat('x', (:block_size / 4)));
> 
> I think this will work in various page sizes. 

WFM

> +SHOW block_size; \gset
> + block_size 
> +
> + 8192
> +(1 row)

I think we need to remove the ';' so that the output of the query is not
saved in the ".out" file.  With that change, this test passes when Postgres
is built with --with-blocksize=32.  However, many other unrelated tests
begin failing, so I guess this fix isn't tremendously important.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Fix for timestamp lag issue from emit_log_hook when GUC log_line_prefix has '%m'

2024-03-09 Thread Kambam Vinay
Hi,

We observed a slight timestamp lag in a few log lines produced through our
emit_log_hook implementation when the log_line_prefix GUC contains '%m'.

Upon debugging, we found that the saved_timeval_set variable is set to
'true' in get_formatted_log_time() but is not reset to 'false' until the
next call to send_message_to_server_log(). Because of this, saved_timeval_set
is still true while emit_log_hook() runs, so the hook (our implementation
calls log_line_prefix()) prefixes the log line with the stale timestamp
'saved_timeval' that was saved for the previous log line.

The attached patch sets saved_timeval_set to false before executing
emit_log_hook().
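
For context, this is roughly the kind of change intended (a paraphrased
sketch, not the literal patch), at the point where the hook is invoked:

	/*
	 * Reset the cached timestamp before running emit_log_hook, so that a
	 * hook going through the log_line_prefix machinery ('%m') gets a fresh
	 * timestamp instead of the one saved for the previous log line.
	 */
	saved_timeval_set = false;
	if (edata->output_to_server && emit_log_hook)
		(*emit_log_hook) (edata);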

Thanks,
Vinay


0001-set-saved_timeval_set-to-false-before-executing-emit.patch
Description: Binary data


Re: Improving contrib/tablefunc's error reporting

2024-03-09 Thread Joe Conway

On 3/5/24 17:04, Tom Lane wrote:

After reading the thread at [1], I could not escape the feeling
that contrib/tablefunc's error reporting is very confusing.
Looking into the source code, I soon found that it is also
very inconsistent, with similar error reports being phrased
quite differently.  The terminology for column names doesn't
match the SGML docs either.  And there are some places that are
so confused about whether they are complaining about the calling
query or the called query that the output is flat-out backwards.
So at that point my nascent OCD wouldn't let me rest without
fixing it.  Here's a quick patch series to do that.

For review purposes, I split this into two patches.  0001 simply
adds some more test cases to reach currently-unexercised error
reports.  Then 0002 makes my proposed code changes and shows
how the existing error messages change.

I'm not necessarily wedded to the phrasings I used here,
in case anyone has better ideas.


The changes all look good to me and indeed more consistent with the docs.

Do you want me to push these? If so, development tip  only, or backpatch?


BTW, while I didn't touch it here, it seems fairly bogus that
connectby() checks both type OID and typmod for its output
columns while crosstab() only checks type OID.  I think
crosstab() is in the wrong and needs to be checking typmod.
That might be fit material for a separate patch though.


I can take a look at this. Presumably this would not be for backpatching.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Improving contrib/tablefunc's error reporting

2024-03-09 Thread Tom Lane
Joe Conway  writes:
> On 3/5/24 17:04, Tom Lane wrote:
>> After reading the thread at [1], I could not escape the feeling
>> that contrib/tablefunc's error reporting is very confusing.

> The changes all look good to me and indeed more consistent with the docs.
> Do you want me to push these? If so, development tip  only, or backpatch?

I can push that.  I was just thinking HEAD, we aren't big on changing
error reporting in back branches.

>> BTW, while I didn't touch it here, it seems fairly bogus that
>> connectby() checks both type OID and typmod for its output
>> columns while crosstab() only checks type OID.  I think
>> crosstab() is in the wrong and needs to be checking typmod.
>> That might be fit material for a separate patch though.

> I can take a look at this. Presumably this would not be for backpatching.

I'm not sure whether that could produce results bad enough to be
called a bug or not.  But I too would lean towards not back-patching,
in view of the lack of field complaints.

regards, tom lane




Re: Improving contrib/tablefunc's error reporting

2024-03-09 Thread Joe Conway

On 3/9/24 13:07, Tom Lane wrote:

Joe Conway  writes:

On 3/5/24 17:04, Tom Lane wrote:

After reading the thread at [1], I could not escape the feeling
that contrib/tablefunc's error reporting is very confusing.



The changes all look good to me and indeed more consistent with the docs.
Do you want me to push these? If so, development tip  only, or backpatch?


I can push that.  I was just thinking HEAD, we aren't big on changing
error reporting in back branches.


BTW, while I didn't touch it here, it seems fairly bogus that
connectby() checks both type OID and typmod for its output
columns while crosstab() only checks type OID.  I think
crosstab() is in the wrong and needs to be checking typmod.
That might be fit material for a separate patch though.



I can take a look at this. Presumably this would not be for backpatching.


I'm not sure whether that could produce results bad enough to be
called a bug or not.  But I too would lean towards not back-patching,
in view of the lack of field complaints.



Is something like the attached what you had in mind? (applies on top of 
your two patches)


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Make tablefunc crosstab() check typmod too

tablefunc connectby() checks both type OID and typmod for its output
columns while crosstab() only checks type OID. Fix that by makeing
the crosstab() check look more like the connectby() check.

--- tablefunc.c.v0002	2024-03-09 14:38:29.285393890 -0500
+++ tablefunc.c	2024-03-09 14:43:47.021399855 -0500
@@ -1527,10 +1527,10 @@
 compatCrosstabTupleDescs(TupleDesc ret_tupdesc, TupleDesc sql_tupdesc)
 {
 	int			i;
-	Form_pg_attribute ret_attr;
 	Oid			ret_atttypid;
-	Form_pg_attribute sql_attr;
 	Oid			sql_atttypid;
+	int32		ret_atttypmod;
+	int32		sql_atttypmod;
 
 	if (ret_tupdesc->natts < 2)
 		ereport(ERROR,
@@ -1539,34 +1539,39 @@
  errdetail("Return row must have at least two columns.")));
 	Assert(sql_tupdesc->natts == 3);	/* already checked by caller */
 
-	/* check the rowid types match */
+	/* check the row_name types match */
 	ret_atttypid = TupleDescAttr(ret_tupdesc, 0)->atttypid;
 	sql_atttypid = TupleDescAttr(sql_tupdesc, 0)->atttypid;
-	if (ret_atttypid != sql_atttypid)
+	ret_atttypmod = TupleDescAttr(ret_tupdesc, 0)->atttypmod;
+	sql_atttypmod = TupleDescAttr(sql_tupdesc, 0)->atttypmod;
+	if (ret_atttypid != sql_atttypid ||
+		(ret_atttypmod >= 0 && ret_atttypmod != sql_atttypmod))
 		ereport(ERROR,
 (errcode(ERRCODE_DATATYPE_MISMATCH),
  errmsg("invalid crosstab return type"),
  errdetail("Source row_name datatype %s does not match return row_name datatype %s.",
-		   format_type_be(sql_atttypid),
-		   format_type_be(ret_atttypid))));
+		   format_type_with_typemod(sql_atttypid, sql_atttypmod),
+		   format_type_with_typemod(ret_atttypid, ret_atttypmod))));
 
 	/*
-	 * - attribute [1] of the sql tuple is the category; no need to check it -
-	 * attribute [2] of the sql tuple should match attributes [1] to [natts]
+	 * attribute [1] of the sql tuple is the category; no need to check it
+	 * attribute [2] of the sql tuple should match attributes [1] to [natts - 1]
 	 * of the return tuple
 	 */
-	sql_attr = TupleDescAttr(sql_tupdesc, 2);
+	sql_atttypid = TupleDescAttr(sql_tupdesc, 2)->atttypid;
+	sql_atttypmod = TupleDescAttr(sql_tupdesc, 2)->atttypmod;
 	for (i = 1; i < ret_tupdesc->natts; i++)
 	{
-		ret_attr = TupleDescAttr(ret_tupdesc, i);
-
-		if (ret_attr->atttypid != sql_attr->atttypid)
+		ret_atttypid = TupleDescAttr(ret_tupdesc, i)->atttypid;
+		ret_atttypmod = TupleDescAttr(ret_tupdesc, i)->atttypmod;
+		if (ret_atttypid != sql_atttypid ||
+		(ret_atttypmod >= 0 && ret_atttypmod != sql_atttypmod))
 			ereport(ERROR,
 	(errcode(ERRCODE_DATATYPE_MISMATCH),
 	 errmsg("invalid crosstab return type"),
 	 errdetail("Source value datatype %s does not match return value datatype %s in column %d.",
-			   format_type_be(sql_attr->atttypid),
-			   format_type_be(ret_attr->atttypid),
+			   format_type_with_typemod(sql_atttypid, sql_atttypmod),
+			   format_type_with_typemod(ret_atttypid, ret_atttypmod),
 			   i + 1)));
 	}
 


Re: Improving contrib/tablefunc's error reporting

2024-03-09 Thread Tom Lane
Joe Conway  writes:
> On 3/9/24 13:07, Tom Lane wrote:
>>> BTW, while I didn't touch it here, it seems fairly bogus that
>>> connectby() checks both type OID and typmod for its output
>>> columns while crosstab() only checks type OID.  I think
>>> crosstab() is in the wrong and needs to be checking typmod.
>>> That might be fit material for a separate patch though.

> Is something like the attached what you had in mind? (applies on top of 
> your two patches)

Yeah, exactly.

As far as the comment change goes:

-   * - attribute [1] of the sql tuple is the category; no need to check it -
-   * attribute [2] of the sql tuple should match attributes [1] to [natts]
+   * attribute [1] of the sql tuple is the category; no need to check it
+   * attribute [2] of the sql tuple should match attributes [1] to [natts - 1]
* of the return tuple

I suspect that this block looked better when originally committed but
didn't survive contact with pgindent.  You should check whether your
version will; if not, some dashes on the /* line will help.

regards, tom lane




Re: Combine Prune and Freeze records emitted by vacuum

2024-03-09 Thread Melanie Plageman
Thanks so much for the review!

On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas  wrote:
>
> On 25/01/2024 00:49, Melanie Plageman wrote:
>
> > The attached patch set is broken up into many separate commits for
> > ease of review. Each patch does a single thing which can be explained
> > plainly in the commit message. Every commit passes tests and works on
> > its own.
>
> About this very first change:
>
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > @@ -1526,8 +1526,7 @@ lazy_scan_prune(LVRelState *vacrel,
> >* that everyone sees it as committed?
> >*/
> >   xmin = HeapTupleHeaderGetXmin(htup);
> > - if (!TransactionIdPrecedes(xmin,
> > -   
> >  vacrel->cutoffs.OldestXmin))
> > + if 
> > (!GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
> >   {
> >   all_visible = false;
> >   break;
>
> Does GlobalVisTestIsRemovableXid() handle FrozenTransactionId correctly?

Okay, so I thought a lot about this, and I don't understand how
GlobalVisTestIsRemovableXid() would not handle FrozenTransactionId
correctly.

vacrel->cutoffs.OldestXmin is computed initially from
GetOldestNonRemovableTransactionId() which uses ComputeXidHorizons().
GlobalVisState is updated by ComputeXidHorizons() (when it is
updated).

I do see that the comment above GlobalVisTestIsRemovableXid() says

 * It is crucial that this only gets called for xids from a source that
 * protects against xid wraparounds (e.g. from a table and thus protected by
 * relfrozenxid).

and then in

 * Convert 32 bit argument to FullTransactionId. We can do so safely
 * because we know the xid has to, at the very least, be between
 * [oldestXid, nextXid), i.e. within 2 billion of xid.

I'm not sure what oldestXid is here.
It's true that I don't see GlobalVisTestIsRemovableXid() being called
anywhere else with an xmin as an input. I think that hints that it is
not safe with FrozenTransactionId. But I don't see what could go
wrong.

Maybe it has something to do with converting it to a FullTransactionId?

   FullTransactionIdFromU64(U64FromFullTransactionId(rel) + (int32) (xid - rel_xid));

Sorry, I couldn't quite figure it out :(

> I read through all the patches in order, and aside from the above they
> all look correct to me. Some comments on the set as whole:
>
> I don't think we need XLOG_HEAP2_FREEZE_PAGE as a separate record type
> anymore. By removing that, you also get rid of the freeze-only codepath
> near the end of heap_page_prune(), and the
> heap_freeze_execute_prepared() function.

That makes sense to me.

> The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
> XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
> the case that there's no pruning, just freezing. The record format
> (xl_heap_prune) looks pretty complex as it is, I think it could be made
> both more compact and more clear with some refactoring.

I'm happy to change up xl_heap_prune format. In its current form,
according to pahole, it has no holes and just 3 bytes of padding at
the end.

One way we could make it smaller is by moving the isCatalogRel member
to directly after snapshotConflictHorizon and then adding a flags
field and defining flags to indicate whether or not other members
exist at all. We could set bits for HAS_FREEZE_PLANS, HAS_REDIRECTED,
HAS_UNUSED, HAS_DEAD. Then I would remove the non-optional uint16
nredirected, nunused, nplans, ndead and just put the number of
redirected/unused/etc at the beginning of the arrays containing the
offsets. Then I could write various macros for accessing them. That
would make it smaller, but it definitely wouldn't make it less complex
(IMO).
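
Very roughly, I am imagining a layout along these lines (names and details
are illustrative only, not a concrete proposal):

/* flag bits indicating which optional parts follow the fixed header */
#define XLHP_HAS_REDIRECTIONS	(1 << 0)
#define XLHP_HAS_DEAD_ITEMS		(1 << 1)
#define XLHP_HAS_NOW_UNUSED		(1 << 2)
#define XLHP_HAS_FREEZE_PLANS	(1 << 3)

typedef struct xl_heap_prune
{
	TransactionId snapshotConflictHorizon;
	bool		isCatalogRel;	/* moved next to the xid to avoid a hole */
	uint8		flags;

	/*
	 * For each bit set in 'flags', the record data that follows holds a
	 * uint16 count and then that many offsets (or freeze plans), accessed
	 * through macros rather than fixed struct members.
	 */
} xl_heap_prune;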

> FreezeMultiXactId still takes a separate 'cutoffs' arg, but it could use
> pagefrz->cutoffs now.

Yep, I forgot to add a commit to do this. Thanks!

> HeapPageFreeze has two "trackers", for the "freeze" and "no freeze"
> cases. heap_page_prune() needs to track both, until it decides whether
> to freeze or not. But it doesn't make much sense that the caller
> (lazy_scan_prune()) has to initialize both, and has to choose which of
> the values to use after the call depending on whether heap_page_prune()
> froze or not. The two trackers should be just heap_page_prune()'s
> internal business.
>
> HeapPageFreeze is a bit confusing in general, as it's both an input and
> an output to heap_page_prune(). Not sure what exactly to do there, but I
> feel that we should make heap_page_prune()'s interface more clear.
> Perhaps move the output fields to PruneResult.

Great point. Perhaps I just add NewRelfrozenXid and NewRelminMxid t

Re: Improving contrib/tablefunc's error reporting

2024-03-09 Thread Joe Conway

On 3/9/24 15:39, Tom Lane wrote:

Joe Conway  writes:

On 3/9/24 13:07, Tom Lane wrote:

BTW, while I didn't touch it here, it seems fairly bogus that
connectby() checks both type OID and typmod for its output
columns while crosstab() only checks type OID.  I think
crosstab() is in the wrong and needs to be checking typmod.
That might be fit material for a separate patch though.


Is something like the attached what you had in mind? (applies on top of 
your two patches)


Yeah, exactly.

As far as the comment change goes:

-   * - attribute [1] of the sql tuple is the category; no need to check it -
-   * attribute [2] of the sql tuple should match attributes [1] to [natts]
+   * attribute [1] of the sql tuple is the category; no need to check it
+   * attribute [2] of the sql tuple should match attributes [1] to [natts - 1]
 * of the return tuple

I suspect that this block looked better when originally committed but
didn't survive contact with pgindent.  You should check whether your
version will; if not, some dashes on the /* line will help.


Thanks for the review and heads up. I fiddled with it a bit to make it 
pgindent clean. I saw you committed your patches, so I just pushed mine.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: remaining sql/json patches

2024-03-09 Thread Andy Fan


jian he  writes:

> On Tue, Mar 5, 2024 at 12:38 PM Andy Fan  wrote:
>>
>>
>> In the commit message of 0001, we have:
>>
>> """
>> Both JSON_VALUE() and JSON_QUERY() functions have options for
>> handling EMPTY and ERROR conditions, which can be used to specify
>> the behavior when no values are matched and when an error occurs
>> during evaluation, respectively.
>>
>> All of these functions only operate on jsonb values. The workaround
>> for now is to cast the argument to jsonb.
>> """
>>
>> which makes it unclear to me why we introduce the JSON_VALUE() function. Is it
>> for handling EMPTY or ERROR conditions? I think the existing cast
>> workaround has a similar capability?
>>
>
> I guess because it's in the standard.
> but I don't see individual sql standard Identifier, JSON_VALUE in
> sql_features.txt
> I do see JSON_QUERY.
> mysql also have JSON_VALUE, [1]
>
> EMPTY, ERROR: there is a standard Identifier: T825: SQL/JSON: ON EMPTY
> and ON ERROR clauses
>
> [1] 
> https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html#function_json-value

Thank you for this information!

-- 
Best Regards
Andy Fan





Re: Extract numeric filed in JSONB more effectively

2024-03-09 Thread Andy Fan


>> But I have a different question about this patch set.  This has some
>> overlap with the JSON_VALUE function that is being discussed at
>> [0][1]. For example, if I apply the patch
>> v39-0001-Add-SQL-JSON-query-functions.patch from that thread, I can run
>>
>> select count(*) from tb where json_value(a, '$.a' returning numeric) = 2;
>>
>> and I get a noticeable performance boost over
>>
>> select count(*) from tb where cast (a->'a' as numeric) = 2;
>
> Here is my test and profile about the above 2 queries.
>
..
> As we can see the patch here has the best performance (this result looks
> be different from yours?).
>
> After I check the code, I am sure both patches *don't* have the problem
> in master where it get a jsonbvalue first and convert it to jsonb and
> then cast to numeric.
>
> Then I perf the result, and find the below stuff:
>
..

> JSONB_VALUE has a much longer way to get getKeyJsonValueFromContainer,
> then I think JSON_VALUE probably is designed for some more complex path 
> which need to pay extra effort which bring the above performance
> difference. 


Hello Peter,

Thanks for highlighting the JSON_VALUE patch! Here is the situation as I
see it now.

My patch is designed to *not* introduce any new user-facing functions,
but to let some existing functions run faster. The JSON_VALUE patch is
designed to follow the SQL standard more closely, so it introduces one new
function with more flexibility in ERROR handling [1].

Both patches are helpful for the subject here, but my patch has better
performance per my testing. I don't think I did anything better here; it's
just that the JSON_VALUE function is designed for a more generic purpose
and has to pay some extra cost. Even if we have some chance to improve
JSON_VALUE, I don't think that should block the patch here (I'd like to
learn more about this; it may take some time!).

So I think my patch here can go ahead again, what do you think?

[1]
https://www.postgresql.org/message-id/CACJufxGtetrn34Hwnb9D2if5D_HOPAh235MtEZ1meVYx-BiNtg%40mail.gmail.com
 

-- 
Best Regards
Andy Fan





Vectored I/O in bulk_write.c

2024-03-09 Thread Thomas Munro
Hi,

I was trying to learn enough about the new bulk_write.c to figure out
what might be going wrong over at [1], and finished up doing this
exercise, which is experiment quality but passes basic tests.  It's a
bit like v1-0013 and v1-0014's experimental vectored checkpointing
from [2] (which themselves are not currently proposed, that too was in
the experiment category), but this usage is a lot simpler and might be
worth considering.  Presumably both things would eventually finish up
being done by (not yet proposed) streaming write, but could also be
done directly in this simple case.

This way, CREATE INDEX generates 128kB pwritev() calls instead of 8kB
pwrite() calls.  (There's a magic 16 in there, we'd probably need to
think harder about that.)  It'd be better if bulk_write.c's memory
management were improved: if buffers were mostly contiguous neighbours
instead of being separately palloc'd objects, you'd probably often get
128kB pwrite() instead of pwritev(), which might be marginally more
efficient.
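
To illustrate just the batching shape (a toy standalone sketch, not the
bulk_write.c code; the 16 is the same magic number mentioned above):

#include <sys/uio.h>
#include <unistd.h>

#define BLCKSZ		8192
#define MAX_BATCH	16			/* 16 x 8kB = 128kB per syscall */

/*
 * Write up to MAX_BATCH consecutive 8kB blocks with one pwritev() call
 * instead of one pwrite() per block.  Returns bytes written or -1.
 */
static ssize_t
write_block_batch(int fd, char *const blocks[], int nblocks, off_t offset)
{
	struct iovec iov[MAX_BATCH];
	int			n = nblocks < MAX_BATCH ? nblocks : MAX_BATCH;

	for (int i = 0; i < n; i++)
	{
		iov[i].iov_base = blocks[i];
		iov[i].iov_len = BLCKSZ;
	}
	return pwritev(fd, iov, n, offset);
}

The real code of course also has to deal with short writes and errors.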

This made me wonder why smgrwritev() and smgrextendv() shouldn't be
backed by the same implementation, since they are essentially the same
operation.  The differences are some assertions which might as well be
moved up to the smgr.c level as they must surely apply to any future
smgr implementation too, right?, and the segment file creation policy
which can be controlled with a new argument.  I tried that here.  An
alternative would be for md.c to have a workhorse function that both
mdextendv() and mdwritev() call, but I'm not sure if there's much
point in that.

While thinking about that I realised that an existing write-or-extend
assertion in master is wrong because it doesn't add nblocks.
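
Presumably the fix is something along these lines (my guess, under
CHECK_WRITE_VS_EXTEND in md.c; not taken from an actual patch):

-	Assert(blocknum < mdnblocks(reln, forknum));
+	Assert(blocknum + nblocks <= mdnblocks(reln, forknum));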

Hmm, it's a bit weird that we have nblocks as int or BlockNumber in
various places, which I think should probably be fixed.

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGK%2B5DOmLaBp3Z7C4S-Yv6yoROvr1UncjH2S1ZbPT8D%2BZg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com
From 4611cb121bbfa787ddbba4bc0e80ac6c732345d0 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 9 Mar 2024 16:04:21 +1300
Subject: [PATCH 1/2] Provide vectored variant of smgrextend().

Since mdwrite() and mdextend() were basically the same, merge them.
They had different assertions, but those would surely apply to any
implementation of smgr, so move them up to the smgr.c wrapper
level.  The other difference was the policy on segment creation, but
that can be captured by having smgrwritev() and smgrextendv() call a
single mdwritev() function with a new "extend" flag.
---
 src/backend/storage/smgr/md.c   | 91 -
 src/backend/storage/smgr/smgr.c | 28 ++
 src/include/storage/md.h|  3 +-
 src/include/storage/smgr.h  | 12 -
 4 files changed, 40 insertions(+), 94 deletions(-)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index bf0f3ca76d..80716f11e1 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -447,74 +447,6 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
 	pfree(path);
 }
 
-/*
- * mdextend() -- Add a block to the specified relation.
- *
- * The semantics are nearly the same as mdwrite(): write at the
- * specified position.  However, this is to be used for the case of
- * extending a relation (i.e., blocknum is at or beyond the current
- * EOF).  Note that we assume writing a block beyond current EOF
- * causes intervening file space to become filled with zeroes.
- */
-void
-mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		 const void *buffer, bool skipFsync)
-{
-	off_t		seekpos;
-	int			nbytes;
-	MdfdVec*v;
-
-	/* If this build supports direct I/O, the buffer must be I/O aligned. */
-	if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ)
-		Assert((uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer));
-
-	/* This assert is too expensive to have on normally ... */
-#ifdef CHECK_WRITE_VS_EXTEND
-	Assert(blocknum >= mdnblocks(reln, forknum));
-#endif
-
-	/*
-	 * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
-	 * more --- we mustn't create a block whose number actually is
-	 * InvalidBlockNumber.  (Note that this failure should be unreachable
-	 * because of upstream checks in bufmgr.c.)
-	 */
-	if (blocknum == InvalidBlockNumber)
-		ereport(ERROR,
-(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("cannot extend file \"%s\" beyond %u blocks",
-		relpath(reln->smgr_rlocator, forknum),
-		InvalidBlockNumber)));
-
-	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
-
-	seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
-
-	Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
-
-	if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_EXTEND)) != BLCKSZ)
-	{
-		if (nbytes 

Re: Vectored I/O in bulk_write.c

2024-03-09 Thread Thomas Munro
Slightly better version, adjusting a few obsoleted comments, adjusting
error message to distinguish write/extend, fixing a thinko in
smgr_cached_nblocks maintenance.
From c786f979b0c38364775e32b9403b79303507d37b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 9 Mar 2024 16:04:21 +1300
Subject: [PATCH v2 1/2] Provide vectored variant of smgrextend().

Since mdwrite() and mdextend() were basically the same, merge them.
They had different assertions, but those would surely apply to any
implementation of smgr, so move them up to the smgr.c wrapper
level.  The other difference was the policy on segment creation, but
that can be captured by having smgrwritev() and smgrextendv() call a
single mdwritev() function with a new "extend" flag.
---
 src/backend/storage/smgr/md.c   | 106 +---
 src/backend/storage/smgr/smgr.c |  32 ++
 src/include/storage/md.h|   3 +-
 src/include/storage/smgr.h  |  12 +++-
 4 files changed, 47 insertions(+), 106 deletions(-)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index bf0f3ca76d..9b12584122 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -447,79 +447,10 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
 	pfree(path);
 }
 
-/*
- * mdextend() -- Add a block to the specified relation.
- *
- * The semantics are nearly the same as mdwrite(): write at the
- * specified position.  However, this is to be used for the case of
- * extending a relation (i.e., blocknum is at or beyond the current
- * EOF).  Note that we assume writing a block beyond current EOF
- * causes intervening file space to become filled with zeroes.
- */
-void
-mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		 const void *buffer, bool skipFsync)
-{
-	off_t		seekpos;
-	int			nbytes;
-	MdfdVec*v;
-
-	/* If this build supports direct I/O, the buffer must be I/O aligned. */
-	if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ)
-		Assert((uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer));
-
-	/* This assert is too expensive to have on normally ... */
-#ifdef CHECK_WRITE_VS_EXTEND
-	Assert(blocknum >= mdnblocks(reln, forknum));
-#endif
-
-	/*
-	 * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
-	 * more --- we mustn't create a block whose number actually is
-	 * InvalidBlockNumber.  (Note that this failure should be unreachable
-	 * because of upstream checks in bufmgr.c.)
-	 */
-	if (blocknum == InvalidBlockNumber)
-		ereport(ERROR,
-(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("cannot extend file \"%s\" beyond %u blocks",
-		relpath(reln->smgr_rlocator, forknum),
-		InvalidBlockNumber)));
-
-	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
-
-	seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
-
-	Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
-
-	if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_EXTEND)) != BLCKSZ)
-	{
-		if (nbytes < 0)
-			ereport(ERROR,
-	(errcode_for_file_access(),
-	 errmsg("could not extend file \"%s\": %m",
-			FilePathName(v->mdfd_vfd)),
-	 errhint("Check free disk space.")));
-		/* short write: complain appropriately */
-		ereport(ERROR,
-(errcode(ERRCODE_DISK_FULL),
- errmsg("could not extend file \"%s\": wrote only %d of %d bytes at block %u",
-		FilePathName(v->mdfd_vfd),
-		nbytes, BLCKSZ, blocknum),
- errhint("Check free disk space.")));
-	}
-
-	if (!skipFsync && !SmgrIsTemp(reln))
-		register_dirty_segment(reln, forknum, v);
-
-	Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
-}
-
 /*
  * mdzeroextend() -- Add new zeroed out blocks to the specified relation.
  *
- * Similar to mdextend(), except the relation can be extended by multiple
- * blocks at once and the added blocks will be filled with zeroes.
+ * The added blocks will be filled with zeroes.
  */
 void
 mdzeroextend(SMgrRelation reln, ForkNumber forknum,
@@ -595,13 +526,7 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
 		{
 			int			ret;
 
-			/*
-			 * Even if we don't want to use fallocate, we can still extend a
-			 * bit more efficiently than writing each 8kB block individually.
-			 * pg_pwrite_zeros() (via FileZero()) uses pg_pwritev_with_retry()
-			 * to avoid multiple writes or needing a zeroed buffer for the
-			 * whole length of the extension.
-			 */
+			/* Fall back to writing out zeroes. */
 			ret = FileZero(v->mdfd_vfd,
 		   seekpos, (off_t) BLCKSZ * numblocks,
 		   WAIT_EVENT_DATA_FILE_EXTEND);
@@ -920,19 +845,14 @@ mdreadv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 /*
  * mdwritev() -- Write the supplied blocks at the appropriate location.
  *
- * This is to be used only for updating already-existing blocks of a
- * relation (ie, those before the current EOF).  To extend a relation,
- * use mdextend().
+ * Note tha

Re: Add recovery to pg_control and remove backup_label

2024-03-09 Thread David Steele

On 1/29/24 12:28, David Steele wrote:

On 1/28/24 19:11, Michael Paquier wrote:

On Fri, Jan 26, 2024 at 06:27:30PM +0530, vignesh C wrote:

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_3511.log


With the recent introduction of incremental backups that depend on
backup_label and the rather negative feedback received, I think that
it would be better to return this entry as RwF for now.  What do you
think?


I've been thinking it makes little sense to update the patch. It would 
be a lot of work with all the new changes for incremental backup and 
since Andres and Robert appear to be very against the idea, I doubt it 
would be worth the effort.


I've had a new idea which may revive this patch. The basic idea is to 
keep backup_label but also return a copy of pg_control from 
pg_stop_backup(). This copy of pg_control would be safe from tears and 
have a backupLabelRequired field set (as Andres suggested) so recovery 
cannot proceed without the backup label.


So, everything will continue to work as it does now. But, backup 
software can be enhanced to write the improved pg_control that is 
guaranteed not to be torn and has protection against a missing backup label.


Of course, pg_basebackup will write the new backupLabelRequired field 
into pg_control, but this way third party software can also gain 
advantages from the new field.


Thoughts?

Regards,
-David




Re: Add checkpoint/redo LSNs to recovery errors.

2024-03-09 Thread David Steele

On 2/29/24 16:42, Michael Paquier wrote:

On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote:

This patch adds checkpoint/redo LSNs to recovery error messages where they
may be useful for debugging.


I've scanned a bit xlogrecovery.c, and I have spotted a couple of that
could gain more information.

 ereport(PANIC,
 (errmsg("invalid redo record in shutdown checkpoint")));
[...]
 ereport(PANIC,
 (errmsg("invalid redo in checkpoint record")));
These two could mention CheckPointLoc and checkPoint.redo.

 ereport(PANIC,
 (errmsg("invalid next transaction ID")));
Perhaps some XID information could be added here?

 ereport(FATAL,
 (errmsg("WAL ends before consistent recovery point")));
[...]
 ereport(FATAL,
 (errmsg("WAL ends before end of online backup"),

These two are in xlog.c, and don't mention backupStartPoint for the
first one.  Perhaps there's a point in adding some information about
EndOfLog and LocalMinRecoveryPoint as well?


For now I'd like to just focus on these three messages that are related 
to a missing backup_label or a misconfiguration of restore_command when 
backup_label is present.
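
As a rough illustration of the shape of the change for one of them (the
wording and variable choices here are mine, not the patch text):

    ereport(FATAL,
            (errmsg("WAL ends before end of online backup"),
             errdetail("WAL ended at %X/%X, but the online backup started at %X/%X.",
                       LSN_FORMAT_ARGS(EndOfLog),
                       LSN_FORMAT_ARGS(backupStartPoint))));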


No doubt there are many other recovery log messages that could be 
improved, but I'd rather not bog down the patch.


Regards,
-David




Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

2024-03-09 Thread Thomas Munro
On Sat, Mar 9, 2024 at 9:48 AM Tomas Vondra
 wrote:
> I spent a bit of time investigating this today, but haven't made much
> progress due to (a) my unfamiliarity with the smgr code in general and
> the patch in particular, and (b) CLOBBER_CACHE_ALWAYS making it quite
> time consuming to iterate and experiment.
>
> However, the smallest schedule that still reproduces the issue is:
>
> ---
> test: test_setup
>
> test: create_aggregate create_function_sql create_cast constraints
> triggers select inherit typed_table vacuum drop_if_exists
> updatable_views roleattributes create_am hash_func errors infinite_recurse
> ---

Thanks, reproduced here (painfully slowly).  Looking...

Huh, also look at these extra problems later in the logs of the latest
trilobite and avocet runs, further down after the short read errors:

TRAP: failed Assert("j > attnum"), File: "heaptuple.c", Line: 640, PID: 15753
postgres: autovacuum worker regression(ExceptionalCondition+0x67)[0x9c5f37]
postgres: autovacuum worker regression[0x4b60c8]
postgres: autovacuum worker regression[0x5ff735]
postgres: autovacuum worker regression[0x5fe468]
postgres: autovacuum worker regression(analyze_rel+0x133)[0x5fd5e3]
postgres: autovacuum worker regression(vacuum+0x6b6)[0x683926]
postgres: autovacuum worker regression[0x7ce5e3]
postgres: autovacuum worker regression[0x7cc4f0]
postgres: autovacuum worker regression(StartAutoVacWorker+0x22)[0x7cc152]
postgres: autovacuum worker regression[0x7d57d1]
postgres: autovacuum worker regression[0x7d37bf]
postgres: autovacuum worker regression[0x6f5a4f]

Then crash recovery fails, in one case with:

2024-03-07 20:28:18.632 CET [15860:4] WARNING:  will not overwrite a used ItemId
2024-03-07 20:28:18.632 CET [15860:5] CONTEXT:  WAL redo at 0/FB07A48
for Heap/INSERT: off: 9, flags: 0x00; blkref #0: rel 1663/16384/2610,
blk 12
2024-03-07 20:28:18.632 CET [15860:6] PANIC:  failed to add tuple
2024-03-07 20:28:18.632 CET [15860:7] CONTEXT:  WAL redo at 0/FB07A48
for Heap/INSERT: off: 9, flags: 0x00; blkref #0: rel 1663/16384/2610,
blk 12

... and in another with:

2024-03-05 11:51:27.992 CET [25510:4] PANIC:  invalid lp
2024-03-05 11:51:27.992 CET [25510:5] CONTEXT:  WAL redo at 0/F87A8D0
for Heap/INPLACE: off: 29; blkref #0: rel 1663/16384/20581, blk 0




Re: Allow non-superuser to cancel superuser tasks.

2024-03-09 Thread Leung, Anthony
Hi, 

I'm new to reviewing postgres patches, but I took an interest in reviewing this 
patch as recommended by Nathan.

I have the following comments:

>   if (!superuser()) {
>   if (!OidIsValid(proc->roleId)) {
>   LocalPgBackendStatus *local_beentry;
>   local_beentry = 
> pgstat_get_local_beentry_by_backend_id(proc->backendId);
>
>   if (!(local_beentry && 
> local_beentry->backendStatus.st_backendType == B_AUTOVAC_WORKER && 
>   has_privs_of_role(GetUserId(), 
> ROLE_PG_SIGNAL_AUTOVACUUM)))
>   return SIGNAL_BACKEND_NOSUPERUSER;
>   } else {
>   if (superuser_arg(proc->roleId))
>   return SIGNAL_BACKEND_NOSUPERUSER;
>
>   /* Users can signal backends they have role membership 
> in. */
>   if (!has_privs_of_role(GetUserId(), proc->roleId) &&
>   !has_privs_of_role(GetUserId(), 
> ROLE_PG_SIGNAL_BACKEND))
>   return SIGNAL_BACKEND_NOPERMISSION;
>   }
>   }
>
1. I would suggest not using nested if blocks, since they make this harder to
read (see the flattened sketch below). Also, would it make sense to have a
utility function in backend_status.c to check whether a backend with a given
backend ID is of a certain backend_type? And should we check whether
proc->backendId is invalid?
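
Something along these lines is what I had in mind, where
pgstat_backend_has_type() is a hypothetical helper (this is only a sketch,
not tested):

	if (!superuser())
	{
		if (!OidIsValid(proc->roleId))
		{
			/*
			 * Backend running with no role: only allow signalling it if it
			 * is an autovacuum worker and the caller is a member of
			 * pg_signal_autovacuum.
			 */
			if (!pgstat_backend_has_type(proc->backendId, B_AUTOVAC_WORKER) ||
				!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM))
				return SIGNAL_BACKEND_NOSUPERUSER;
		}
		else if (superuser_arg(proc->roleId))
			return SIGNAL_BACKEND_NOSUPERUSER;
		else if (!has_privs_of_role(GetUserId(), proc->roleId) &&
				 !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
			return SIGNAL_BACKEND_NOPERMISSION;
	}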

>ALTER SYSTEM SET autovacuum_vacuum_cost_limit TO 1;
>ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO 100;
>ALTER SYSTEM SET autovacuum_naptime TO 1;
2. Could we set these parameters at the beginning of the test, before
$node->start, with $node->append_conf? That way we can avoid restarting the
node and doing the sleep later on.

> my $res_pid = $node_primary->safe_psql(
>.   'regress',
>   "SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum 
> worker' and datname = 'regress';"
> );
>
> my ($res_reg_psa_1, $stdout_reg_psa_1, $stderr_reg_psa_1)  = 
> $node_primary->psql('regress', qq[
 SET ROLE psa_reg_role_1;
>   SELECT pg_terminate_backend($res_pid);
>  ]);
>
> ok($res_reg_psa_1 != 0, "should fail for non pg_signal_autovacuum");
> like($stderr_reg_psa_1, qr/Only roles with the SUPERUSER attribute may 
> terminate processes of roles with the SUPERUSER attribute./, "matches");
>
> my ($res_reg_psa_2, $stdout_reg_psa_2, $stderr_reg_psa_2) = 
> $node_primary->psql('regress', qq[
>SET ROLE psa_reg_role_2;
>SELECT pg_terminate_backend($res_pid);
> ]");
3. Some nits on styling 

4. According to Postgres style, I believe opening braces should be on a new
line.



Re: Allow non-superuser to cancel superuser tasks.

2024-03-09 Thread Leung, Anthony
Another comment that I forgot to mention is that we should also make the 
documentation change in doc/src/sgml/user-manag.sgml for this new predefined 
role

Thanks.

-- 
Anthony Leung
Amazon Web Services: https://aws.amazon.com



Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

2024-03-09 Thread Thomas Munro
On Sun, Mar 10, 2024 at 5:02 PM Thomas Munro  wrote:
> Thanks, reproduced here (painfully slowly).  Looking...

I changed that ERROR to a PANIC and now I can see that
_bt_metaversion() is failing to read a meta page (block 0), and the
file is indeed of size 0 in my filesystem.  Which is not cool, for a
btree.  Looking at btbuildempty(), we have this sequence:

bulkstate = smgr_bulk_start_rel(index, INIT_FORKNUM);

/* Construct metapage. */
metabuf = smgr_bulk_get_buf(bulkstate);
_bt_initmetapage((Page) metabuf, P_NONE, 0, allequalimage);
smgr_bulk_write(bulkstate, BTREE_METAPAGE, metabuf, true);

smgr_bulk_finish(bulkstate);

Ooh.  One idea would be that the smgr lifetime stuff is b0rked,
introducing corruption.  Bulk write itself isn't pinning the smgr
relation, it's relying purely on the object not being invalidated,
which the theory of 21d9c3ee's commit message allowed for but ... here
it's destroyed (HASH_REMOVE'd) sooner under CLOBBER_CACHE_ALWAYS,
which I think we failed to grok.  If that's it, I'm surprised that
things don't implode more spectacularly.  Perhaps HASH_REMOVE should
clobber objects in debug builds, similar to pfree?
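
The clobbering could be as simple as something like this in the HASH_REMOVE
path (a sketch only; 'entry' and 'entry_size' stand in for whatever dynahash
actually calls them):

#ifdef USE_ASSERT_CHECKING
	/* poison the removed entry so that stale pointers fail fast */
	memset(entry, 0x7F, entry_size);
#endif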

For that hypothesis, the corruption might not be happening in the
above-quoted code itself, because it doesn't seem to have an
invalidation acceptance point (unless I'm missing it).  Some other
bulk write got mixed up?  Not sure yet.

I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).




Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

2024-03-09 Thread Thomas Munro
On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro  wrote:
> I won't be surprised if the answer is: if you're holding a reference,
> you have to get a pin (referring to bulk_write.c).

Ahhh, on second thoughts, I take that back, I think the original
theory still actually works just fine.  It's just that somewhere in
our refactoring of that commit, when we were vacillating between
different semantics for 'destroy' and 'release', I think we made a
mistake: in RelationCacheInvalidate() I think we should now call
smgrreleaseall(), not smgrdestroyall().  That satisfies the
requirements for sinval queue overflow: we close md.c segments (and
most importantly virtual file descriptors), so our lack of sinval
records can't hurt us, we'll reopen all files as required.  That's
what CLOBBER_CACHE_ALWAYS is effectively testing (and more).  But the
smgr pointer remains valid, and retains only its "identity", eg hash
table key, and that's also fine.  It won't be destroyed until after
the end of the transaction.  Which was the point, and it allows things
like bulk_write.c (and streaming_read.c) to hold an smgr reference.
Right?
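
In other words, something like this one-line change in relcache.c (a sketch,
not an actual tested patch):

	/* in RelationCacheInvalidate() */
-	smgrdestroyall();
+	smgrreleaseall();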