Re: Slow standby snapshot

2021-05-20 Thread Kirill Reshke
Sorry, I forgot to attach the patch to the letter.



On Thu, 20 May 2021 at 13:52, Кирилл Решке wrote:

> Hi,
> I recently ran into a problem in one of our production PostgreSQL clusters.
> I noticed lock contention on the ProcArray lock on a standby, which causes
> WAL replay lag to grow.
> To reproduce this, you can do the following:
>
> 1) set max_connections to a big number, like 10
> 2) begin a transaction on primary
> 3) start pgbench workload on primary and on standby
>
> After a while it will be possible to see KnownAssignedXidsGetAndSetXmin in
> perf top consuming about 75% of CPU.
>
> %%
>   PerfTop:1060 irqs/sec  kernel: 0.0%  exact:  0.0% [4000Hz cycles:u],
>  (target_pid: 273361)
>
> ---
>
> 73.92%  postgres   [.] KnownAssignedXidsGetAndSetXmin
>  1.40%  postgres   [.] base_yyparse
>  0.96%  postgres   [.] LWLockAttemptLock
>  0.84%  postgres   [.] hash_search_with_hash_value
>  0.84%  postgres   [.] AtEOXact_GUC
>  0.72%  postgres   [.] ResetAllOptions
>  0.70%  postgres   [.] AllocSetAlloc
>  0.60%  postgres   [.] _bt_compare
>  0.55%  postgres   [.] core_yylex
>  0.42%  libc-2.27.so   [.] __strlen_avx2
>  0.23%  postgres   [.] LWLockRelease
>  0.19%  postgres   [.] MemoryContextAllocZeroAligned
>  0.18%  postgres   [.] expression_tree_walker.part.3
>  0.18%  libc-2.27.so   [.] __memmove_avx_unaligned_erms
>  0.17%  postgres   [.] PostgresMain
>  0.17%  postgres   [.] palloc
>  0.17%  libc-2.27.so   [.] _int_malloc
>  0.17%  postgres   [.] set_config_option
>  0.17%  postgres   [.] ScanKeywordLookup
>  0.16%  postgres   [.] _bt_checkpage
>
> %%
>
>
> We have tried to fix this by using a BitMapSet instead of the boolean array
> KnownAssignedXidsValid, but this did not help much.
>
> Instead, using a doubly linked list helps a little more: we got +1000 tps
> on the pgbench workload with patched PostgreSQL. The general idea of this patch
> is that, instead of memorizing which elements in KnownAssignedXids are
> valid, we maintain a doubly linked list of the valid entries. The solution works
> in exactly the same way, except that taking a snapshot on the replica is
> now O(running transactions) instead of O(head - tail), which is significantly
> faster under some workloads. The patch reduces the CPU usage of
> KnownAssignedXidsGetAndSetXmin to ~48% instead of ~74%, but does not eliminate
> it from perf top.
>
> The problem is easier to reproduce on PG13, since PG14 has some snapshot
> optimizations.
>
> Thanks!
>
> Best regards, reshke
>
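To illustrate the idea, here is a rough sketch of the traversal the linked list
enables (hypothetical names and stand-in types, not the code from the attached
patch): the snapshot scan follows the nxt links, so it only touches valid
entries.

```c
#include <stdint.h>

typedef uint32_t TransactionId;     /* stand-in for PostgreSQL's TransactionId */

typedef struct ValidXidNode
{
    int     prv;                    /* index of previous valid entry, or -1 */
    int     nxt;                    /* index of next valid entry, or list end */
} ValidXidNode;

/*
 * Collect all valid xids by walking the linked list, so the cost is
 * O(number of valid entries) rather than O(head - tail).
 */
static int
CollectValidXids(const TransactionId *xids, const ValidXidNode *nodes,
                 int first_valid, int list_end, TransactionId *dst)
{
    int     n = 0;

    for (int i = first_valid; i != list_end; i = nodes[i].nxt)
        dst[n++] = xids[i];

    return n;
}
```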
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 5ff8cab394..165cf56ea9 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -255,9 +255,18 @@ static PGPROC *allProcs;
  * Bookkeeping for tracking emulated transactions in recovery
  */
 static TransactionId *KnownAssignedXids;
-static bool *KnownAssignedXidsValid;
+
+typedef struct {
+	int prv;
+	int nxt;
+} KnownAssignedXidValidNode;
+
+// Doubly linked list of valid xids
+static KnownAssignedXidValidNode *KnownAssignedXidsValidDLL;
 static TransactionId latestObservedXid = InvalidTransactionId;
 
+
+
 /*
  * If we're in STANDBY_SNAPSHOT_PENDING state, standbySnapshotPendingXmin is
  * the highest xid that might still be running that we don't have in
@@ -348,6 +357,9 @@ static void GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons);
 /*
  * Report shared-memory space needed by CreateSharedProcArray.
  */
+
+static KnownAssignedXidValidNode KAX_E_INVALID;
+
 Size
 ProcArrayShmemSize(void)
 {
@@ -380,13 +392,20 @@ ProcArrayShmemSize(void)
 		size = add_size(size,
 		mul_size(sizeof(TransactionId),
  TOTAL_MAX_CACHED_SUBXIDS));
-		size = add_size(size,
-		mul_size(sizeof(bool), TOTAL_MAX_CACHED_SUBXIDS));
+size = add_size(size,
+		mul_size(sizeof(KnownAssignedXidValidNode), TOTAL_MAX_CACHED_SUBXIDS));
 	}
 
+KAX_E_INVALID.prv = -1;
+KAX_E_INVALID.nxt = TOTAL_MAX_CACHED_SUBXIDS;
+
 	return size;
 }
 
+#define KAX_DLL_INDEX_VALID(i) (-1 < i && i < TOTAL_MAX_CACHED_SUBXIDS)
+#define KAX_DLL_ENTRY_INVALID(i) (-1 == KnownAssignedXidsValidDLL[i].prv && KnownAssignedXidsValidDLL[i].nxt == TOTAL_MAX_CACHED_SUBXIDS)
+
+
 /*
  * Initialize the shared PGPROC array during postmaster startup.
  */
@@ -431,10 +450,15 @@ CreateSharedProcArray(void)
 			mul_size(sizeof(TransactionId),
 	 TOTAL_MAX_CACHED_SUBXIDS),
 			&found);
-		KnownAssignedXidsValid = (bool *)
-			ShmemInitStruct("KnownAssignedXidsValid",
-			mul_size(sizeof(bool), TOTAL_MAX_CACHED_SUBXIDS),
+		
+KnownAssignedXidsValidDLL = (KnownAssignedXidValidNode*)
+			ShmemInitStruct("KnownAssignedXidsValidDLL",
+			mul_size(sizeof(KnownAssignedXidValidNode), TOTAL_MAX_CACHED_SUBXIDS),
 			&found);
+
+

Allow non-superuser to cancel superuser tasks.

2024-02-25 Thread Kirill Reshke
Hi hackers!

In our cloud we have a patch which allows a non-superuser role ('mdb_admin')
to do some superuser things.
In particular, the patch allows mdb_admin to cancel the autovacuum process and
some other processes (processes with application_name = 'MDB'); see the
attachment.
This is needed to allow non-superuser roles to run pg_repack and to cancel
pg_repack: we need to cancel a running autovacuum to run pg_repack (because of
locks), and sometimes we also need to cancel pg_repack itself.

I want to reduce the size of our internal patch and move this logic to an
extension or into core.
I have found similar threads [1] and [2], but, as far as I understand, they
do not solve this particular case.
I see two possible ways to implement this. The first one is to have a hook in
pg_signal_backend and define the hook in an extension which can do the check.
The second one is to have a predefined role, something like a
`pg_signal_autovacuum` role, which can signal a running autovacuum to cancel.
But I don't see how we can handle a specific `application_name` with this
solution.

[1]
https://www.postgresql.org/message-id/flat/F9408A5A-B20B-42D2-9E7F-49CD3D1547BC%40enterprisedb.com
[2]
https://www.postgresql.org/message-id/flat/20220722203735.GB3996698%40nathanxps13
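For the hook variant, something roughly like the following is what I have in
mind (a sketch with made-up names, not an existing PostgreSQL API): an
extension loaded via shared_preload_libraries would set the hook and decide,
e.g. based on application_name, whether the signal is allowed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook type; the name and signature are made up for illustration. */
typedef bool (*signal_backend_check_hook_type) (int pid,
                                                const char *application_name);

/* An extension would set this from its _PG_init(). */
signal_backend_check_hook_type signal_backend_check_hook = NULL;

/* Extra check that pg_signal_backend() could consult before refusing. */
static bool
extension_allows_signal(int pid, const char *application_name)
{
    return signal_backend_check_hook != NULL &&
           signal_backend_check_hook(pid, application_name);
}
```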


v1-0001-cloud-mdb_admin-patch-part-to-illustrate.patch
Description: Binary data


Re: Allow non-superuser to cancel superuser tasks.

2024-02-26 Thread Kirill Reshke
On Mon, 26 Feb 2024 at 20:10, Nathan Bossart 
wrote:

> On Mon, Feb 26, 2024 at 12:38:40PM +0500, Kirill Reshke wrote:
> > I see 2 possible ways to implement this. The first one is to have hool in
> > pg_signal_backend, and define a hook in extension which can do the thing.
> > The second one is to have a predefined role. Something like a
> > `pg_signal_autovacuum` role which can signal running autovac to cancel.
> But
> > I don't see how we can handle specific `application_name` with this
> > solution.
>
> pg_signal_autovacuum seems useful given commit 3a9b18b.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com


Thank you for your response.
Please find a patch attached.

The patch adds a pg_signal_autovacuum role with OID 6312. I grabbed the OID
from the unused_oids script output.
It also adds TAP tests for the functionality. I'm not sure where to place them,
so I put them in a separate directory under `src/test/`.
It seems that regression tests for this feature are not possible, am I right?
Also, I was thinking about pg_signal_autovacuum vs. pg_signal_backend.
Should pg_signal_autovacuum implicitly have the power of pg_signal_backend, or
should this role keep such a narrow scope?


v1-0001-Allow-non-superuser-to-cancel-superuser-tasks.patch
Description: Binary data


Re: Allow non-superuser to cancel superuser tasks.

2024-02-27 Thread Kirill Reshke
On Tue, 27 Feb 2024 at 01:22, Kirill Reshke  wrote:

>
>
> On Mon, 26 Feb 2024 at 20:10, Nathan Bossart 
> wrote:
>
>> On Mon, Feb 26, 2024 at 12:38:40PM +0500, Kirill Reshke wrote:
>> > I see 2 possible ways to implement this. The first one is to have hool
>> in
>> > pg_signal_backend, and define a hook in extension which can do the
>> thing.
>> > The second one is to have a predefined role. Something like a
>> > `pg_signal_autovacuum` role which can signal running autovac to cancel.
>> But
>> > I don't see how we can handle specific `application_name` with this
>> > solution.
>>
>> pg_signal_autovacuum seems useful given commit 3a9b18b.
>>
>> --
>> Nathan Bossart
>> Amazon Web Services: https://aws.amazon.com
>
>
> Thank you for your response.
> Please find a patch attached.
>
> In patch, pg_signal_autovacuum role with oid 6312 added. I grabbed oid
> from unused_oids script output.
> Also, tap tests for functionality added. I'm not sure where to place them,
> so I placed them in a separate directory in `src/test/`
> Seems that regression tests for this feature are not possible, am i right?
> Also, I was thinking of pg_signal_autovacuum vs pg_signal_backend.
> Should pg_signal_autovacuum have power of pg_signal_backend (implicity)?
> Or should this role have such little scope...
>
I have had a little further thought on this, which I will share:
Do we need to test the pg_cancel_backend vs. autovacuum case at all?
I think we do. Would it be better to split the work into two patches: the first
one with tests against the current logic, and the second
one with the changes/enhancements that allow a non-superuser to cancel a running
autovacuum (via a `pg_signal_autovacuum` role or some other mechanism)?


Fwd: Extensible storage manager API - SMGR hook Redux

2023-12-04 Thread Kirill Reshke
Sorry for double-posting, I accidentally replied to Matthias, not the
mailing list :(

-- Forwarded message -
From: Kirill Reshke 
Date: Mon, 4 Dec 2023 at 19:46
Subject: Re: Extensible storage manager API - SMGR hook Redux
To: Matthias van de Meent 


Hi!

On Fri, 30 Jun 2023 at 15:27, Matthias van de Meent <
boekewurm+postg...@gmail.com> wrote:

> Hi hackers,
>
> At Neon, we've been working on removing the file system dependency
> from PostgreSQL and replacing it with a distributed storage layer. For
> now, we've seen most success in this by replacing the implementation
> of the smgr API, but it did require some core modifications like those
> proposed early last year  by Anastasia [0].
>
> As mentioned in the previous thread, there are several reasons why you
> would want to use a non-default storage manager: storage-level
> compression, encryption, and disk limit quotas [0]; offloading of cold
> relation data was also mentioned [1].
>
> In the thread on Anastasia's patch, Yura Sokolov mentioned that
> instead of a hook-based smgr extension, a registration-based smgr
> would be preferred, with integration into namespaces. Please find
> attached an as of yet incomplete patch that starts to do that.
>
> The patch is yet incomplete (as it isn't derived from Anastasia's
> patch), but I would like comments on this regardless, as this is a
> fairly fundamental component of PostgreSQL that is being modified, and
> it is often better to get comments early in the development cycle. One
> significant issue that I've seen so far are that catcache is not
> guaranteed to be available in all backends that need to do smgr
> operations, and I've not yet found a good solution.
>
> Changes compared to HEAD:
> - smgrsw is now dynamically allocated and grows as new storage
> managers are loaded (during shared_preload_libraries)
> - CREATE TABLESPACE has new optional syntax USING smgrname (option [, ...])
> - tablespace storage is (planned) fully managed by smgr through some
> new smgr apis
>
> Changes compared to Anastasia's patch:
> - extensions do not get to hook and replace the api of the smgr code
> directly - they are hidden behind the smgr registry.
>
> Successes:
> - 0001 passes tests (make check-world)
> - 0002 builds without warnings (make)
>
> TODO:
> - fix dependency failures when catcache is unavailable
> - tablespace redo is currently broken with 0002
> - fix tests for 0002
> - ensure that pg_dump etc. works with the new tablespace storage manager
> options
>
> Looking forward to any comments, suggestions and reviews.
>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech/)
>
>
> [0]
> https://www.postgresql.org/message-id/CAP4vRV6JKXyFfEOf%3Dn%2Bv5RGsZywAQ3CTM8ESWvgq%2BS87Tmgx_g%40mail.gmail.com
> [1]
> https://www.postgresql.org/message-id/d365f19f-bc3e-4f96-a91e-8db130497...@yandex-team.ru


So, the 0002 patch uses the `get_tablespace` function, which searches the
catalog for a tablespace's SMGR id. I wonder how `smgr_redo` would work with it?
Is it possible to query the system catalog during crash recovery? As far as
I understand the answer is "no"; correct me if I'm wrong.
Furthermore, why do we only allow a tablespace to have its own SMGR
implementation? Can we have a per-relation SMGR? Maybe we can do it in a way
similar to custom RMGRs (meaning, write the SMGR OID into WAL and use it during
crash recovery, etc.)?
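To illustrate that last point, the WAL payload could carry the SMGR OID next to
the relfilenode, so redo would not need catalog access (a sketch with stand-in
types, made up purely for illustration):

```c
#include <stdint.h>

/* Stand-ins for PostgreSQL's Oid and RelFileNode, just for this sketch. */
typedef uint32_t Oid;

typedef struct RelFileNodeSketch
{
    Oid     spcNode;            /* tablespace */
    Oid     dbNode;             /* database */
    Oid     relNode;            /* relation */
} RelFileNodeSketch;

/*
 * Hypothetical WAL payload: the storage manager OID travels with the file
 * node, so crash recovery can pick the right SMGR without catalog lookups.
 */
typedef struct SmgrRelFileNodeSketch
{
    RelFileNodeSketch   node;
    Oid                 smgrid;
} SmgrRelFileNodeSketch;
```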


Re: Extensible storage manager API - SMGR hook Redux

2023-12-04 Thread Kirill Reshke
On Mon, 4 Dec 2023 at 22:21, Matthias van de Meent <
boekewurm+postg...@gmail.com> wrote:

> On Mon, 4 Dec 2023 at 17:51, Kirill Reshke  wrote:
> >
> > So, 0002 patch uses the `get_tablespace` function, which searches
> Catalog to tablespace SMGR id. I wonder how `smgr_redo` would work with it?
>
> That's a very good point I hadn't considered in detail yet. Quite
> clearly, the current code is wrong in assuming that the catalog is
> accessible, and it should probably be stored in a way similar to
> pg_filenode.map in a file managed outside the buffer pool.
>
Hmm, pg_filenode.map is a nice idea. So, simply maintain a TableSpaceOid ->
SMGR id mapping in a separate file and rewrite the whole file on any
change, right?
That looks reasonable to me, but it is clear that this solution can be really
slow in some patterns, e.g. if we create very many tablespaces (the way you
suggested it for the per-relation SMGR feature). Maybe we can store the data in
separate files somehow and only update one chunk per operation.

Anyway, if we use a `pg_filenode.map`-like solution, we need to reuse its
code infrastructure, right? For example, it seems that the code that calculates
checksums can be reused.
So we would need to refactor the code here and define something like a FileMap
API, maybe. Or is it not really worth it? We could just write similar code twice.
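To make the file layout idea more concrete, here is a rough sketch of what such
a map file could look like (made-up names; the real thing would presumably
mirror relmapper.c's RelMapFile):

```c
#include <stdint.h>

#define TSMAP_MAX_MAPPINGS 62           /* keeps the whole file under 512 bytes */

typedef struct TableSpaceSmgrMapping
{
    uint32_t    spcoid;                 /* tablespace OID */
    uint32_t    smgroid;                /* OID of the storage manager handling it */
} TableSpaceSmgrMapping;

typedef struct TableSpaceSmgrMapFile
{
    int32_t     magic;                  /* format identifier, bumped on changes */
    int32_t     num_mappings;           /* number of valid entries below */
    TableSpaceSmgrMapping mappings[TSMAP_MAX_MAPPINGS];
    uint32_t    crc;                    /* CRC-32C over all preceding bytes */
} TableSpaceSmgrMapFile;
```

The whole struct would be rewritten and fsynced atomically on every change,
which is exactly the pattern that becomes painful with very many tablespaces.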


Re: Add sub-transaction overflow status in pg_stat_activity

2023-02-08 Thread Kirill Reshke
On Tue, 20 Dec 2022 at 09:23, Dilip Kumar  wrote:
>
> On Tue, Dec 20, 2022 at 2:32 AM Robert Haas  wrote:
> >
> > On Mon, Dec 19, 2022 at 3:48 PM Ted Yu  wrote:
> > > It seems the comment for `backend_subxact_overflowed` missed a word.
> > >
> > > Please see the patch.
> >
> > Committed this fix, thanks.
>
> Thanks, Robert!
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>
>


Hi hackers!

Nice patch; it seems useful for cases like alerting that a subxid overflow
has happened in a database.
But I'm curious, what is the follow-up work on this? I think it may be much
more helpful to, for example, log the queries that cause subtransaction
overflow, or even kill the backend that causes it, with GUC variables
controlling the server behaviour.
For example, in Greenplum there is the gp_subtransaction_overflow extension
and a GUC for simply logging problematic queries [1]. Can we have something
similar in PostgreSQL on the server side?

[1] 
https://github.com/greenplum-db/gpdb/blob/6X_STABLE/gpcontrib/gp_subtransaction_overflow/gp_subtransaction_overflow.c#L42
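To sketch what I mean on the server side (hypothetical GUC and function names,
not a patch; roughly what the Greenplum extension does from a hook):

```c
#include "postgres.h"
#include "tcop/tcopprot.h"              /* debug_query_string */

/* Hypothetical GUC, made up for illustration. */
static bool log_subxact_overflow = false;

/*
 * Sketch: would be called from the place where a backend marks its subxid
 * cache as overflowed, logging the statement that caused it.
 */
static void
report_subxact_overflow(void)
{
    if (log_subxact_overflow && debug_query_string != NULL)
        ereport(LOG,
                (errmsg("subtransaction cache overflow"),
                 errdetail("Query: %s", debug_query_string)));
}
```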




pg_init_privs corruption.

2023-02-17 Thread Kirill Reshke
Hi hackers!

Recently we faced a problem with one of our production clusters. The problem
was with pg_upgrade;
the reason was an invalid pg_dump of the cluster schema. In the pg_dump SQL
there were strange records like

REVOKE SELECT,INSERT,DELETE,UPDATE ON TABLE *relation* FROM "144841";

but there is no role "144841".
We dug in, and it turned out that 144841 was the OID of a previously-dropped
role.

I have reproduced the issue using a simple test extension, yoext [1].

SQL script:

create role user1;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT select ON TABLES TO user1;
create extension yoext;
drop owned by user1;
select * from pg_init_privs  where privtype = 'e';
drop role user1;
select * from pg_init_privs  where privtype = 'e';

result of execution (executed on fresh master, from commit
17feb6a566b77bf62ca453dec215adcc71755c20):

psql (16devel)
Type "help" for help.

postgres=#
postgres=#
postgres=# create role user1;
CREATE ROLE
postgres=# ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT select ON TABLES
TO user1;
ALTER DEFAULT PRIVILEGES
postgres=# create extension yobaext ;
CREATE EXTENSION
postgres=# drop owned by user1;
DROP OWNED
postgres=# select * from pg_init_privs  where privtype = 'e';
 objoid | classoid | objsubid | privtype | initprivs
+--+--+--+---
  16387 | 1259 |0 | e|
{reshke=arwdDxtm/reshke,user1=r/reshke,=r/reshke}
(1 row)

postgres=# drop role user1;
DROP ROLE
postgres=# select * from pg_init_privs  where privtype = 'e';
 objoid | classoid | objsubid | privtype | initprivs
+--+--+--+---
  16387 | 1259 |0 | e|
{reshke=arwdDxtm/reshke,16384=r/reshke,=r/reshke}
(1 row)


As you can see, after the role is dropped there are invalid records in the
pg_init_privs system relation. pg_dump then generates SQL statements, some of
which are based on the contents of pg_init_privs, resulting in an invalid dump.

PFA a fix.

The idea of the fix is simply to drop records from pg_init_privs while dropping
the role.
Records with a grantor or grantee equal to the OID of the dropped role are
erased; after that, pg_dump works OK.

Implementation note: I failed to find a proper way to allocate an ACL array, so
I exposed the acl.c-internal function `allocacl` in a header. This needs to be
improved somehow.

[1] yoext https://github.com/reshke/yoext/


v1-0001-Fix-pg_init_prevs-corruption.patch
Description: Binary data


Annoying corruption in PostgreSQL.

2023-10-27 Thread Kirill Reshke
Hi hackers!

We run a large number of PostgreSQL clusters in our production. They differ
by version (we have PG 11-16), load, amount of data, schema, etc. From
time to time, PostgreSQL corruption happens. The logs say

ERROR,XX001,"missing chunk number 0 for toast value 18767319 in
pg_toast_2619",,"vacuum full ;"

The missing chunk number is almost always equal to zero, while the
other values vary. There is no known pattern that triggers this issue.
Moreover, rerunning the VACUUM statement against the relation from the
log message succeeds every time, so we just ignore these errors.
Maybe it is just some weird data race?

We don't know how to trigger this problem or why it occurs. I'm not asking
you to resolve this issue, but to help with debugging. What can we do to
deduce the cause of the failure? Maybe we can add more logging somewhere (we can
deploy a specially patched PostgreSQL version everywhere) to have more
information about the issue when it happens next time?


Fwd: Annoying corruption in PostgreSQL.

2023-10-27 Thread Kirill Reshke
Sorry, it seems that I replied only to Tomas, so I'm forwarding the message.
-- Forwarded message -
From: Kirill Reshke 
Date: Sat, 28 Oct 2023 at 02:06
Subject: Re: Annoying corruption in PostgreSQL.
To: Tomas Vondra 


Hi Tomas!

Thanks for the explanation!

1) 11 to 15. This week there were reproductions on 14.9 and 12.16. Two weeks
ago there were repros on 15.4 and 11.21. Unfortunately, there is no info about
repros a month old or more, but I found in our work chats that
there was a repro on PostgreSQL 13 in April, minor version unknown.
Overall, we have observed this issue for over a year on all PGDG-supported
versions.

2) Searching our bug tracker, I have found:

1. missing chunk number 0 for toast value 592966012 in pg_toast_563953150
(some user relation)
2. missing chunk number 0 for toast value 18019714 in pg_toast_17706963
(some user relation)
3. missing chunk number 0 for toast value 52677740 in pg_toast_247794

So, this is not always pg_catalog; these TOAST tables belonged to
user relations.

3) It is always about VACUUM FULL (FREEZE/VERBOSE/ANALYZE) / autovacuum.

We have physical backups and we can do PITR. But restoring a cluster to some
point in the past is a bit of a different task: we need our client's
approval for such operations, since we are a managed database cloud provider.
I will try to ask someone.

Best regards


On Fri, 27 Oct 2023 at 23:28, Tomas Vondra 
wrote:

>
>
> On 10/27/23 14:19, Kirill Reshke wrote:
> > Hi hackers!
> >
> > We run a large amount of PostgreSQL clusters in our production. They
> > differ by versions (we have 11-16 pg), load, amount of data, schema,
> > etc. From time to time, postgresql corruption happens. It says
> > ERROR,XX001,"missing chunk number 0 for toast value 18767319 in
> > pg_toast_2619",,"vacuum full ;"
> >
> > in logs. the missing chunk number  almost every is equal to zero, while
> > other values vary. There are no known patterns, which triggers this
> > issue. Moreover, if trying to rerun the VACUUM statement against
> > relations from a log message, it succeeds all the time.  So, we just
> > ignore these errors. Maybe it is just some wierd data race?
> >
> > We don't know how to trigger this problem, or why it occurs. I'm not
> > asking you to resolve this issue, but to help with debugging. What can
> > we do to deduct failure reasons? Maybe we can add more logging somewhere
> > (we can deploy a special patched PostgreSQL version everywhere), to have
> > more information about the issue, when it happens next time?
> >
>
> For starters, it'd be good to know something about the environment, and
> stuff that'd tell us if there's some possible pattern:
>
> 1) Which exact PG versions are you observing these errors on?
>
> 2) In the error example you shared it's pg_toast_2619, which is the
> TOAST table for pg_statistic (probably). Is it always this relation? Or
> what relations you noticed this for?
>
> 3) What kind of commands are triggering this? In the example it seems to
> be vacuum full. Did you see it for other commands too? People generally
> don't do VACUUM FULL very often, particularly not in environments with
> concurrent activity.
>
> Considering you don't know what's causing this, or what to look for, I
> think it might be interesting to use pg_waldump, and investigate what
> happened to the page containing the TOAST chunk and to the page
> referencing it. Do you have physical backups and ability to do PITR?
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: Commitfest manager for July 2024

2024-07-01 Thread Kirill Reshke
> Postgres 17
18




Re: Allow non-superuser to cancel superuser tasks.

2024-07-09 Thread Kirill Reshke
Hi

On Tue, 9 Jul 2024 at 23:13, Nathan Bossart  wrote:
>
> I've committed 0001.  It looks like 0002 failed CI testing [0], but I
> haven't investigated why.
>
> [0] https://cirrus-ci.com/task/5668467599212544
>
> --
> nathan

The problem is that the error message has changed.

# DETAIL:  Only roles with privileges of the
"pg_signal_autovacuum_worker" role may terminate autovacuum workers.'
# doesn't match '(?^:ERROR:  permission denied to terminate
process\nDETAIL:  Only roles with privileges of the
"pg_signal_autovacuum_worker" role may terminate autovacuum worker
processes.)'
# Looks like you failed 1 test of 2.

I changed the test to match the error message.


v7-0002-Add-tap-test-for-pg_signal_autovacuum-role.patch
Description: Binary data


Re: Allow non-superuser to cancel superuser tasks.

2024-07-09 Thread Kirill Reshke
> The script has two tests, and the CI is failing for the second test
> where we expect the signal to be processed:
> [12:48:23.370] #   Failed test 'autovacuum worker signaled with
> pg_signal_autovacuum_worker granted'
> [12:48:23.370] #   at t/006_signal_autovacuum.pl line 90.

> --
> Michael

That's very strange, because the test works fine on my virtual
machine. Also, it seems that it works in Cirrus [0], as there is this
line:

[autovacuum worker] FATAL:  terminating autovacuum process due to
administrator command

after `SET ROLE signal_autovacuum_worker_role;` and `SELECT
pg_terminate_backend` in the log file.

Somehow the `$node->log_contains` check does not catch that. Maybe
there is some issue with `$offset`? I will try to investigate.

[0] 
https://api.cirrus-ci.com/v1/artifact/task/5668467599212544/log/src/test/modules/test_misc/tmp_check/log/006_signal_autovacuum_node.log




Re: Allow non-superuser to cancel superuser tasks.

2024-07-10 Thread Kirill Reshke
Hi, thanks for digging into this. It turns out I completely missed one of
your emails this morning.

On Wed, 10 Jul 2024 at 10:15, Michael Paquier  wrote:
> And then the timestamp of the tests:
> [12:48:23.277](0.058s) not ok 2 - autovacuum worker signaled with
> pg_signal_autovacuum_worker granted
>
> We check for the contents of the logs 1ms before they are generated,
> hence failing the lookup check because the test is faster than the
> backend.
>
> Like what we are doing in 010_pg_basebackup.pl, we could do a
> poll_query_until() until the PID of the autovacuum worker is gone from
> pg_stat_activity before fetching the logs as ProcessInterrupts() stuff
> would be logged before the process exits, say:
> +# Wait for the autovacuum worker to exit before scanning the logs.
> +$node->poll_query_until('postgres',
> +   "SELECT count(*) = 0 FROM pg_stat_activity "
> +   . "WHERE pid = $av_pid AND backend_type = 'autovacuum worker';");
>
> That gives something like the attached.  Does that look correct to
> you?
> --
> Michael

+1.




Re: why there is not VACUUM FULL CONCURRENTLY?

2024-07-21 Thread Kirill Reshke
Hi!
I'm interested in having the VACUUM CONCURRENTLY feature in core, so I will
try to review the patch set and give useful feedback. For
now, just a few small thoughts...


> The first version is attached. The actual feature is in 0003. 0004 is probably
> not necessary now, but I haven't realized until I coded it.

The logical replication approach to vacuum is a really smart idea; I like
it. As far as I understand, pg_squeeze works well in real production
databases, which
gives us hope that a concurrent vacuum feature in core will be good
too... What is the size of the biggest relation successfully vacuumed
via pg_squeeze?
It looks like, in the case of a big relation or a high insertion load,
replication may lag and never catch up...

In general, however, the 3rd patch is really big and very hard to
comprehend. Please consider splitting it into smaller (and
reviewable) pieces.
Also, we obviously need more tests for this: both TAP tests and
regression tests, I suppose.

One more thing is about the pg_squeeze background workers. They act in an
autovacuum-like fashion, don't they? Maybe we can support this kind
of relation processing in core too?




Re: why there is not VACUUM FULL CONCURRENTLY?

2024-07-22 Thread Kirill Reshke
> Also, we obviously need more tests on this. Both tap-test and
> regression tests I suppose.

One simple test for this patch could be done this way:

1) Create a test relation (call it vac_conc_r1, for example) and fill it
with dead tuples (insert + update, or insert + delete).
2) Create an injection point preventing the concurrent vacuum from completing.
3) Run the concurrent vacuum (VACUUM FULL CONCURRENTLY) in a separate thread
or in some other async way.
4) Insert new data into vac_conc_r1.
5) Release the injection point and assert that the vacuum completed
successfully.
6) Check that all data is present in vac_conc_r1 (the data from step 1 and
from step 4).

This way we can catch some basic bugs if some paths of VACUUM
CONCURRENTLY are touched in the future.
The problem with this test is that I don't know how to do anything async
in the current TAP tests (needed in step 3). Also, a test with async
interaction
may be too flaky (producing false-negative failures) to support.
A sequential test for this feature would be much better, but I can't
think of one.

Also, should we create a CF entry for this thread already?




Re: Add new COPY option REJECT_LIMIT

2024-07-22 Thread Kirill Reshke
Hi! Nice feature.

A few comments:

> +  When a positive integer value is specified, COPY 
> limits
> +  the maximum tolerable number of errors while converting a column's 
> input
> +  value into its data type.

If nothing is specified, then the maximum tolerable number of errors
is one, right? Should we state this explicitly in the documentation?

> +COPY x from stdin with (on_error ignore, reject_limit 0);
How about a test where reject_limit is a string, but not
(case-insensitively) 'infinity'?

> + CopyRejectLimits reject_limits; /* thresholds of reject_limit */

Why are there multiple thresholds? Can we have only one?

Other places look good to me.




Re: why there is not VACUUM FULL CONCURRENTLY?

2024-07-25 Thread Kirill Reshke
Hi!

On Tue, 30 Jan 2024 at 15:31, Alvaro Herrera  wrote:

> FWIW a newer, more modern and more trustworthy alternative to pg_repack
> is pg_squeeze, which I discovered almost by random chance, and soon
> discovered I liked it much more.

Can you please clarify this a bit more? What is the exact reason for
pg_squeeze being more trustworthy than pg_repack?
Is there something about the logical replication approach that makes
it more bulletproof than the trigger-based repack approach?

Also, I was thinking about pg_repack vs. pg_squeeze being used for the
VACUUM FULL CONCURRENTLY feature, and I'm a bit suspicious about the
latter.
If I understand correctly, we essentially decode the whole WAL to
obtain info about one particular relation's changes. That may be a big
overhead, whereas the trigger approach does
not suffer from this. So there is a chance that VACUUM FULL
CONCURRENTLY will never keep up with the changes to the vacuumed relation.
Am I right?




Re: Remove duplicate table scan in logical apply worker and code refactoring

2024-07-25 Thread Kirill Reshke
Hi!

> When reviewing the code in logical/worker.c, I noticed that when applying a
> cross-partition update action, it scans the old partition twice.

Nice catch!


> -/*
> - * Workhorse for apply_handle_update()
> - * relinfo is for the relation we're actually updating in
> - * (could be a child partition of edata->targetRelInfo)
> - */
> -static void
> -apply_handle_update_internal(ApplyExecutionData *edata,
> - ResultRelInfo *relinfo,
> - TupleTableSlot *remoteslot,
> - LogicalRepTupleData *newtup,
> - Oid localindexoid)
> -{

What is the necessity of this change? Can we modify the function in place
instead of removing and rewriting it in the same file?
That would reduce the diff, making the patch a bit clearer.

> +/*
> + * If the tuple to be modified could not be found, a log message is emitted.
> + */
> +static void
> +report_tuple_not_found(CmdType cmd, Relation targetrel, bool is_partition)
> +{
> + Assert(cmd == CMD_UPDATE || cmd == CMD_DELETE);
> +
> + /* XXX should this be promoted to ereport(LOG) perhaps? */
> + elog(DEBUG1,
> + "logical replication did not find row to be %s in replication target 
> relation%s \"%s\"",
> + cmd == CMD_UPDATE ? "updated" : "deleted",
> + is_partition ? "'s partition" : "",
> + RelationGetRelationName(targetrel));
> +}

Encapsulating the reporting logic inside a function is OK, but the double
ternary
expression is a bit out of line. I do not see similar places within
PostgreSQL,
so it probably violates the code style.




Re: Incremental View Maintenance, take 2

2024-07-27 Thread Kirill Reshke
Hi!
Cloudberry DB (a Greenplum fork) [1] uses the IMMV feature for its AQUMV (auto
query use matview) feature, so I got interested in how it is implemented.

On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA  wrote:
>
> I updated the patch to bump up the version numbers in psql and pg_dump codes
> from 17 to 18.

Few suggestions:

1) The `Add-relisivm-column-to-pg_class-system-catalog` commit message
should be fixed; it says "isimmv" in the last line.
2) I don't get why `Add-Incremental-View-Maintenance-support.patch`
goes after 0005 & 0004. Shouldn't we first implement the feature on the server
side, and only then the client (psql & pg_dump) side?
3) Can we provide regression tests for each function separately? A test
for the main feature in the main patch, a test for DISTINCT support in
v34-0007-Add-DISTINCT-support-for-IVM.patch, etc.? This way the patchset
will be easier to review, and the patches can be committed separately.
4) v34-0006-Add-Incremental-View-Maintenance-support.patch no longer
applies due to 4b74ebf726d444ba820830cad986a1f92f724649. After
resolving the conflicts manually, it does not compile, because
4b74ebf726d444ba820830cad986a1f92f724649 also removes the
save_userid/save_sec_context fields from ExecCreateTableAs.

>   if (RelationIsIVM(matviewRel) && stmt->skipData)
Now this function accepts a skipData parameter.

5) For DISTINCT support the patch uses hidden __ivm* columns. Is this
design discussed anywhere? I wonder whether it is a necessity (the only
solution) or whether there are alternatives.
6)
What are the caveats of supporting some simple cases of aggregation
functions, as in this example?
```
regress=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_2 AS SELECT
sum(j) + sum(i) from mv_base_a;
ERROR:  expression containing an aggregate in it is not supported on
incrementally maintainable materialized view
```
I can see some difficulties with division, CREATE IMMV  AS SELECT
1/sum(i) from mv_base_a;  (the sum(i) == 0 case), but addition &
multiplication should be OK, shouldn't they?


Overall, the patchset looks mature; however, it is far from being
committable due to a lack of testing/feedback/discussion. There is only
one way to fix this... Test and discuss it!


[1] https://github.com/cloudberrydb/cloudberrydb




Fix smgrtruncate code comment.

2024-07-29 Thread Kirill Reshke
Today I was busy rebasing my Greenplum-related extendable SMGR
patches onto Cloudberry, and I ran into some conflicts.
While resolving them I noticed a code/comment inconsistency in smgr.c,
in the smgrtruncate function, which traces back to
c5315f4f44843c20ada876fdb0d0828795dfbdf5. In that commit, the
smgr_fsm_nblocks & smgr_vm_nblocks fields were removed, but the
comments were not updated accordingly.

So I suggest fixing this; PFA.


v1-0001-Fix-smgrtruncate-code-comment.patch
Description: Binary data


Re: Incremental View Maintenance, take 2

2024-07-29 Thread Kirill Reshke
On Sat, 27 Jul 2024 at 13:26, Kirill Reshke  wrote:
>
> Hi!
> Cloudberry DB (Greenplum fork) uses IMMV feature for AQUMV (auto query
> use matview) feature, so i got interested in how it is implemented.
>
> On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA  wrote:
> >
> > I updated the patch to bump up the version numbers in psql and pg_dump codes
> > from 17 to 18.
>
> Few suggestions:
>
> 1) `Add-relisivm-column-to-pg_class-system-catalog` commit message
> should be fixed, there is "isimmv" in the last line.
> 2) I dont get why `Add-Incremental-View-Maintenance-support.patch`
> goes after 0005 & 0004. Shoulndt we first implement feature server
> side, only when client (psql & pg_dump) side?
> 3) Can we provide regression tests for each function separately? Test
> for main feature in main patch, test for DISTINCT support in
> v34-0007-Add-DISTINCT-support-for-IVM.patch etc? This way the patchset
> will be easier to review, and can be committed separelety.
> 4) v34-0006-Add-Incremental-View-Maintenance-support.patch no longer
> applies due to 4b74ebf726d444ba820830cad986a1f92f724649. After
> resolving issues manually, it does not compile, because
> 4b74ebf726d444ba820830cad986a1f92f724649 also removes
> save_userid/save_sec_context fields from ExecCreateTableAs.
>
> >   if (RelationIsIVM(matviewRel) && stmt->skipData)
> Now this function accepts skipData param.
>
> 5) For DISTINCT support patch uses hidden __ivm* columns. Is this
> design discussed anywhere? I wonder if this is a necessity (only
> solution) or if there are alternatives.
> 6)
> What are the caveats of supporting some simple cases for aggregation
> funcs like in example?
> ```
> regress=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_2 AS SELECT
> sum(j) + sum(i) from mv_base_a;
> ERROR:  expression containing an aggregate in it is not supported on
> incrementally maintainable materialized view
> ```
> I can see some difficulties with division CREATE IMMV  AS SELECT
> 1/sum(i) from mv_base_a;  (sum(i) == 0 case), but adding &
> multiplication should be ok, aren't they?
>
>
> Overall, patchset looks mature, however it is far from being
> committable due to lack of testing/feedback/discussion. There is only
> one way to fix this... Test and discuss it!
>
>
> [1] https://github.com/cloudberrydb/cloudberrydb

Hi! A small update: I tried to run the regression tests and all
IMMV-related tests failed on my VM. Maybe I'm doing something wrong; I
will try to investigate.

Another suggestion: support the \d and \d+ commands in psql. With the v34
patchset applied, psql does not show anything IMMV-related in \d output.

```
reshke=# \d m1
   Materialized view "public.m1"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 i  | integer |   |  |
Distributed by: (i)


reshke=# \d+ m1
 Materialized view "public.m1"
 Column |  Type   | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
+-+---+--+-+-+-+--+-
 i  | integer |   |  | | plain   |
|  |
View definition:
 SELECT t1.i
   FROM t1;
Distributed by: (i)
Access method: heap

```

Output should be 'Incrementally materialized view "public.m1"' IMO.




Re: COPY FROM crash

2024-07-29 Thread Kirill Reshke
Hi!

On Tue, 30 Jul 2024 at 08:52, Zhang Mingli  wrote:
>
> Hi, all
>
> I got a crash when copy partition tables with mass data in Cloudberry 
> DB[0](based on Postgres14.4, Greenplum 7).
>
> I have a test on Postgres and it has the similar issue(different places but 
> same function).

Just to be clear, are you facing this on HEAD, or on REL_14_STABLE?


> However it’s a little hard to reproduce because it happened when inserting 
> next tuple after a previous copy multi insert buffer is flushed.
>
> To reproduce easily, change the Macros to:
>
> #define MAX_BUFFERED_TUPLES 1
> #define MAX_PARTITION_BUFFERS 0

Reproducing it that way makes it harder to be sure that the problem occurs
with the original settings. Are these values valid?




Re: Extensible storage manager API - smgr hooks

2022-06-17 Thread Kirill Reshke
Hello Yura and Anastasia.

I have tried to implement the per-relation SMGR approach, and ran into a
serious problem with redo.

To implement the per-relation SMGR feature, I tried to do something
similar to the custom table AM approach: we can define a custom SMGR
in an extension (which defines the smgr handler) and then use this SMGR in
the relation definition, like this:

```postgres=# create extension proxy_smgr ;
CREATE EXTENSION
postgres=# select * from pg_smgr ;
  oid  |  smgrname  |smgrhandler
---++
  4646 | md | smgr_md_handler
 16386 | proxy_smgr | proxy_smgr_handler
(2 rows)

postgres=# create table tt(i int) storage manager proxy_smgr_handler;
ERROR:  storage manager "proxy_smgr_handler" does not exist
postgres=# create table tt(i int) storage manager proxy_smgr;
INFO:  proxy open 1663 5 16391
INFO:  proxy create 16391
INFO:  proxy close, 16391
INFO:  proxy close, 16391
INFO:  proxy close, 16391
INFO:  proxy close, 16391
CREATE TABLE
postgres=# select * from tt;
INFO:  proxy open 1663 5 16391
INFO:  proxy nblocks 16391
INFO:  proxy nblocks 16391
 i
---
(0 rows)

postgres=# insert into tt values(1);
INFO:  proxy exists 16391
INFO:  proxy nblocks 16391
INFO:  proxy nblocks 16391
INFO:  proxcy extend 16391
INSERT 0 1
postgres=# select * from tt;
INFO:  proxy nblocks 16391
INFO:  proxy nblocks 16391
 i
---
 1
(1 row)
```

The extension's SQL file looks like this:

```
CREATE FUNCTION proxy_smgr_handler(internal)
RETURNS table_smgr_handler
AS 'MODULE_PATHNAME'
LANGUAGE C;

-- Storage manager
CREATE STORAGE MANAGER proxy_smgr HANDLER proxy_smgr_handler;
```

To do this I defined a catalog relation, pg_smgr, where I store the smgr
handlers, and I use this relation when we need to open other (non-catalog)
relations in the smgropen function. The patch almost passes the regression
tests (8 of 214 tests failed), but it fails on the first checkpoint or in
crash recovery. Also, I changed the WAL format, adding the SMGR OID to each
WAL record that carries a RelFileNode structure. Why do we need WAL changes?
Well, I tried to solve the following issue.

As I mentioned, there is a problem with redo, which is: we cannot do a
syscache search to get a relation's SMGR to apply WAL, because the syscache is
not initialized during redo (crash recovery). As I understand it, the syscache
is not initialized because the system catalogs are not consistent until crash
recovery is done.

So that's it. I decided to write to this thread to get feedback and to
understand how best to solve the problem with redo.

What do you think?

On Thu, Jun 16, 2022 at 1:38 PM Andres Freund  wrote:

> Hi,
>
> On 2021-06-30 05:36:11 +0300, Yura Sokolov wrote:
> > Anastasia Lubennikova писал 2021-06-30 00:49:
> > > Hi, hackers!
> > >
> > > Many recently discussed features can make use of an extensible storage
> > > manager API. Namely, storage level compression and encryption [1],
> > > [2], [3], disk quota feature [4], SLRU storage changes [5], and any
> > > other features that may want to substitute PostgreSQL storage layer
> > > with their implementation (i.e. lazy_restore [6]).
> > >
> > > Attached is a proposal to change smgr API to make it extensible.  The
> > > idea is to add a hook for plugins to get control in smgr and define
> > > custom storage managers. The patch replaces smgrsw[] array and smgr_sw
> > > selector with smgr() function that loads f_smgr implementation.
> > >
> > > As before it has only one implementation - smgr_md, which is wrapped
> > > into smgr_standard().
> > >
> > > To create custom implementation, a developer needs to implement smgr
> > > API functions
> > > static const struct f_smgr smgr_custom =
> > > {
> > > .smgr_init = custominit,
> > > ...
> > > }
> > >
> > > create a hook function
> > >
> > >const f_smgr * smgr_custom(BackendId backend, RelFileNode rnode)
> > >   {
> > >   //Here we can also add some logic and chose which smgr to use
> > > based on rnode and backend
> > >   return &smgr_custom;
> > >   }
> > >
> > > and finally set the hook:
> > > smgr_hook = smgr_custom;
> > >
> > > [1]
> > >
> https://www.postgresql.org/message-id/flat/11996861554042...@iva4-dd95b404a60b.qloud-c.yandex.net
> > > [2]
> > >
> https://www.postgresql.org/message-id/flat/272dd2d9.e52a.17235f2c050.Coremail.chjischj%40163.com
> > > [3] https://postgrespro.com/docs/enterprise/9.6/cfs
> > > [4]
> > >
> https://www.postgresql.org/message-id/flat/CAB0yre%3DRP_ho6Bq4cV23ELKxRcfhV2Yqrb1zHp0RfUPEWCnBRw%40mail.gmail.com
> > > [5]
> > >
> https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
> > > [6]
> > > https://wiki.postgresql.org/wiki/PGCon_2021_Fun_With_WAL#Lazy_Restore
> > >
> > > --
> > >
> > > Best regards,
> > > Lubennikova Anastasia
> >
> > Good day, Anastasia.
> >
> > I also think smgr should be extended with different implementations
> aside of
> > md.
> > But which way concrete implementation will be chosen for particular
> > relation?
> > I believ

Use fadvise in wal replay

2022-06-21 Thread Kirill Reshke
Hi hackers!

Recently we faced a problem with one of our production clusters. We use a
cascade replication setup in this cluster, that is: master, standby (r1),
and cascade standby (r2). From time to time, the replication lag on r1 would
grow, while on r2 it did not. Analysis showed that the r1 startup process
was spending a lot of time reading WAL from disk. Increasing
/sys/block/md2/queue/read_ahead_kb to 16384 (from 0) helps in this case.
Maybe we can add an fadvise call to PostgreSQL's startup process, so it would
not be necessary to change settings on the hypervisor?
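Something along these lines is what I mean (a minimal sketch, not the attached
patch itself): after the startup process reads from the current WAL segment,
hint the kernel that the rest of the segment will be needed soon.

```c
#include <fcntl.h>

/*
 * Sketch: ask the kernel to read ahead the remainder of the WAL segment
 * being replayed.  posix_fadvise() is only a hint, so errors are ignored.
 */
static void
wal_readahead_hint(int wal_fd, off_t current_offset, off_t segment_size)
{
#ifdef POSIX_FADV_WILLNEED
    (void) posix_fadvise(wal_fd, current_offset,
                         segment_size - current_offset,
                         POSIX_FADV_WILLNEED);
#endif
}
```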


v1-0001-Use-fadvise-to-prefect-wal-in-xlogrecovery.patch
Description: Binary data


Re: Incremental View Maintenance, take 2

2024-07-30 Thread Kirill Reshke
On Tue, 30 Jul 2024 at 03:32, Kirill Reshke  wrote:
>
> On Sat, 27 Jul 2024 at 13:26, Kirill Reshke  wrote:
> >
> > Hi!
> > Cloudberry DB (Greenplum fork) uses IMMV feature for AQUMV (auto query
> > use matview) feature, so i got interested in how it is implemented.
> >
> > On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA  wrote:
> > >
> > > I updated the patch to bump up the version numbers in psql and pg_dump 
> > > codes
> > > from 17 to 18.
> >
> > Few suggestions:
> >
> > 1) `Add-relisivm-column-to-pg_class-system-catalog` commit message
> > should be fixed, there is "isimmv" in the last line.
> > 2) I dont get why `Add-Incremental-View-Maintenance-support.patch`
> > goes after 0005 & 0004. Shoulndt we first implement feature server
> > side, only when client (psql & pg_dump) side?
> > 3) Can we provide regression tests for each function separately? Test
> > for main feature in main patch, test for DISTINCT support in
> > v34-0007-Add-DISTINCT-support-for-IVM.patch etc? This way the patchset
> > will be easier to review, and can be committed separelety.
> > 4) v34-0006-Add-Incremental-View-Maintenance-support.patch no longer
> > applies due to 4b74ebf726d444ba820830cad986a1f92f724649. After
> > resolving issues manually, it does not compile, because
> > 4b74ebf726d444ba820830cad986a1f92f724649 also removes
> > save_userid/save_sec_context fields from ExecCreateTableAs.
> >
> > >   if (RelationIsIVM(matviewRel) && stmt->skipData)
> > Now this function accepts skipData param.
> >
> > 5) For DISTINCT support patch uses hidden __ivm* columns. Is this
> > design discussed anywhere? I wonder if this is a necessity (only
> > solution) or if there are alternatives.
> > 6)
> > What are the caveats of supporting some simple cases for aggregation
> > funcs like in example?
> > ```
> > regress=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_2 AS SELECT
> > sum(j) + sum(i) from mv_base_a;
> > ERROR:  expression containing an aggregate in it is not supported on
> > incrementally maintainable materialized view
> > ```
> > I can see some difficulties with division CREATE IMMV  AS SELECT
> > 1/sum(i) from mv_base_a;  (sum(i) == 0 case), but adding &
> > multiplication should be ok, aren't they?
> >
> >
> > Overall, patchset looks mature, however it is far from being
> > committable due to lack of testing/feedback/discussion. There is only
> > one way to fix this... Test and discuss it!
> >
> >
> > [1] https://github.com/cloudberrydb/cloudberrydb
>
> Hi! Small update: I tried to run a regression test and all
> IMMV-related tests failed on my vm. Maybe I'm doing something wrong, I
> will try to investigate.
>
> Another suggestion: support for \d and \d+ commands in psql. With v34
> patchset applied, psql does not show anything IMMV-related in \d mode.
>
> ```
> reshke=# \d m1
>Materialized view "public.m1"
>  Column |  Type   | Collation | Nullable | Default
> +-+---+--+-
>  i  | integer |   |  |
> Distributed by: (i)
>
>
> reshke=# \d+ m1
>  Materialized view "public.m1"
>  Column |  Type   | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> +-+---+--+-+-+-+--+-
>  i  | integer |   |  | | plain   |
> |  |
> View definition:
>  SELECT t1.i
>FROM t1;
> Distributed by: (i)
> Access method: heap
>
> ```
>
> Output should be 'Incrementally materialized view "public.m1"' IMO.



And one more thing, noticed today while playing with the patchset:
I believe the non-terminal `incremental` should be `OptIncremental`.

I'm talking about this:
```
incremental: INCREMENTAL { $$ = true; }
| /*EMPTY*/ { $$ = false; }
;
```




Lack of possibility to specify CTAS TAM

2024-07-31 Thread Kirill Reshke
I have noticed $subj while working on other unrelated patches.
The question is, why is there no CREATE TABLE AS ... USING
(some_access_method)?
This feature looks straightforward, and the lack of it is a bit of an
inconsistency from my point of view.
Maybe there are some unobvious caveats to implementing it?
I have done a little research reading related threads [1][2], but, as far as I
can tell, they do not address $subj.
Neither can I find an open CF entry/thread implementing this (I'm
looking at http://cfbot.cputube.org/).

The same storage specification feature could actually be supported for
CTAE (like CTAS, but EXECUTE) and CREATE MATERIALIZED VIEW.

I can try to propose a POC patch implementing $subj if there are no objections
to having this functionality in core.


[1] 
https://www.postgresql.org/message-id/flat/20180703070645.wchpu5muyto5n647%40alap3.anarazel.de
[2] 
https://www.postgresql.org/message-id/flat/20160812231527.GA690404%40alvherre.pgsql




Re: Lack of possibility to specify CTAS TAM

2024-07-31 Thread Kirill Reshke
On Wed, 31 Jul 2024, 12:12 Andrey M. Borodin,  wrote:

>
> Currently we do not have so many TAMs
>
Currently we do not have that many TAMs in core, but outside core there are
actually quite a number of projects doing TAMs. OrioleDB is one example.

>


Re: Lack of possibility to specify CTAS TAM

2024-07-31 Thread Kirill Reshke
On Wed, 31 Jul 2024 at 12:15, David G. Johnston
 wrote:
>
> On Wednesday, July 31, 2024, Kirill Reshke  wrote:
>>
>> I have noticed $subj while working with other unrelated patches.
>> The question is, why there is no CREATE TABLE AS  USING
>> (some_access_method)?
>
>
> The syntax is documented…
My bad.
Everything is already supported in core, actually.




Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW

2024-08-01 Thread Kirill Reshke
On Thu, 1 Aug 2024 at 23:27, Jeff Davis  wrote:
> Relatedly, if we can EXPLAIN a CREATE MATERIALIZED VIEW, perhaps we
> should be able to EXPLAIN a REFRESH MATERIALIZED VIEW, too?
Sure

> Comments?
Seems like this is indeed inconsistent behaviour and should be fixed
in all PGDG-supported versions in the upcoming August release.
Do you have any suggestions on how to fix this?


> Regards,
> Jeff Davis




Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW

2024-08-01 Thread Kirill Reshke
On Thu, 1 Aug 2024 at 23:27, Jeff Davis  wrote:
>
> EXPLAIN ANALYZE CREATE MATERIALIZED VIEW doesn't go through
> ExecCreateTableAs(), but does use CreateIntoRelDestReceiver().

Both EXPLAIN ANALYZE and the regular query go through create_ctas_internal
(the WITH NO DATA case too). Maybe we can simply move the
SetUserIdAndSecContext call into this function?




Re: why there is not VACUUM FULL CONCURRENTLY?

2024-08-02 Thread Kirill Reshke
On Fri, 2 Aug 2024 at 11:09, Antonin Houska  wrote:
>
> Kirill Reshke  wrote:
> > However, in general, the 3rd patch is really big, very hard to
> > comprehend.  Please consider splitting this into smaller (and
> > reviewable) pieces.
>
> I'll try to move some preparation steps into separate diffs, but not sure if
> that will make the main diff much smaller. I prefer self-contained patches, as
> also explained in [3].

Thanks for sharing [3]; it is a useful link.

There is actually one more case where ACCESS EXCLUSIVE is held: during
a table rewrite (ALTER TABLE ... SET ACCESS METHOD, SET TABLESPACE, and
ALTER COLUMN TYPE are some examples).
This could be done CONCURRENTLY too, using the same logical replication
approach, or am I missing something?
I'm not saying we must do it immediately, and it should be a separate
thread, but we can do some preparation work here.

I can see that a bunch of functions which are currently placed in
cluster.c could be moved to something like
logical_rewrite_heap.c. The ConcurrentChange struct and the
apply_concurrent_insert function are examples of such.

So, if this is the case, the 0003 patch can be split in two:
the first one with general utility code for logical table rewrite, and
the second one with the actual VACUUM CONCURRENTLY feature.

What do you think?




Small refactoring around vacuum_open_relation

2024-08-02 Thread Kirill Reshke
I hate to be "that guy", but there are many places in the sources where we use
a `LOCKMODE lockmode;` variable, and exactly one where we use `LOCKMODE
lmode`: the vacuum_open_relation function.
Is it worth a patch?


v1-0001-Rename-vacuum_open_relation-argument-to-lockmode.patch
Description: Binary data


Re: Small refactoring around vacuum_open_relation

2024-08-02 Thread Kirill Reshke
Thanks for the review!

On Fri, 2 Aug 2024 at 14:31, Ashutosh Bapat
 wrote:
>
> On Fri, Aug 2, 2024 at 1:55 PM Kirill Reshke  wrote:
> >
> > I hate to be "that guy", but there are many places in sources where we use
> > LOCKMODE lockmode; variable and exactly one where we use LOCKMODE
> > lmode: it is vacuum_open_relation function.
>
> There are more instances of LOCKMODE lmode; I spotted one in plancat.c as 
> well.

Nice catch!

> Case1:
> toast_get_valid_index(Oid toastoid, LOCKMODE lock)
> toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
> GetLockmodeName(LOCKMETHODID lockmethodid, LOCKMODE mode)
> LOCKMODE mode = 0;
> Case 2:
> qualified variable names like
> LOCKMODE heapLockmode;
> LOCKMODE memlockmode;
> LOCKMODE table_lockmode;
> LOCKMODE parentLockmode;
> LOCKMODE cmd_lockmode = AccessExclusiveLock; /* default for compiler */
> LOCK_PRINT(const char *where, const LOCK *lock, LOCKMODE type)
>
> case3: some that have two LOCKMODE instances like
> DoLockModesConflict(LOCKMODE mode1, LOCKMODE mode2)

Nice catch!

> > Is it worth a patch?
>
> When I see a variable with name lockmode, I know it's of type
> LOCKMODE. So changing the Case1 may be worth it. It's not a whole lot
> of code churn as well. May be patch backbranches.
>
> Case2 we should leave as is since the variable name has lockmode in it.
+1

> Case3, worth changing to lockmode1 and lockmode2.
Agree
> --
> Best Wishes,
> Ashutosh Bapat

Attached is a v2 patch with your suggestions addressed.


v2-0001-Rename-LOCKMODE-type-vairables-to-lockmode.patch
Description: Binary data


Support multi-column renaming?

2024-08-06 Thread Kirill Reshke
I have noticed that ALTER TABLE supports multiple column actions
(ADD/DROP COLUMN, etc.), but does not support renaming multiple columns.
See [1].

Here is a small example of what I'm talking about:

```
db2=# create table tt();
CREATE TABLE

-- multiple column altering ok
db2=# alter table tt add column i int, add column j int;
ALTER TABLE

-- single column renaming ok
db2=# alter table tt rename column i to i2;
ALTER TABLE
-- multiple column renaming not allowed
db2=# alter table tt rename column i2 to i3, rename column j to j2;
ERROR:  syntax error at or near ","
LINE 1: alter table tt rename column i2 to i3, rename column j to j2...
 ^
db2=#
```

Looking closely at gram.y, the only reason for this is that RenameStmt
is defined less flexibly than alter_table_cmds (which is a list). All
the other core infrastructure is ready to allow $subj.

So, is it worth a patch?


[1] https://www.postgresql.org/docs/current/sql-altertable.html

-- 
Best regards,
Kirill Reshke




Re: Incremental View Maintenance, take 2

2024-08-06 Thread Kirill Reshke
On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA  wrote:
>
> On Tue, 2 Jul 2024 17:03:11 +0900
> Yugo NAGATA  wrote:
>
> > On Sun, 31 Mar 2024 22:59:31 +0900
> > Yugo NAGATA  wrote:
> > > >
> > > > Also, I added a comment on RelationIsIVM() macro persuggestion from 
> > > > jian he.
> > > > In addition, I fixed a failure reported from cfbot on FreeBSD build 
> > > > caused by;
> > > >
> > > >  WARNING:  outfuncs/readfuncs failed to produce an equal rewritten 
> > > > parse tree
> > > >
> > > > This warning was raised since I missed to modify outfuncs.c for a new 
> > > > field.
> > >
> > > I found cfbot on FreeBSD still reported a failure due to
> > > ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS because the regression test used
> > > wrong role names. Attached is a fixed version, v32.
> >
> > Attached is a rebased version, v33.
>
> I updated the patch to bump up the version numbers in psql and pg_dump codes
> from 17 to 18.
>
> Regards,
> Yugo Nagata
>
> >
> > Regards,
> > Yugo Nagata
> >
> >
> > --
> > Yugo NAGATA 
>
>
> --
> Yugo NAGATA 

A small update with a couple of things I found in recent days:

```
db2=# create incremental materialized view v2 as select * from v1;
ERROR:  VIEW or MATERIALIZED VIEW is not supported on incrementally
maintainable materialized view
```
The error message is not accurate: create view v2 as select * from v1; works fine.


```
db2=# create incremental materialized view vv2 as select i,j2, i / j2
from t1 join t2 on true;
db2=# insert into t2 values(1,0);
ERROR:  division by zero
```
It is very strange to receive a `division by zero` error while inserting into
a relation, isn't it? Can we add some hint/CONTEXT here?
Regular triggers do it:
```
db2=# insert into ttt values(10,0);
ERROR:  division by zero
CONTEXT:  PL/pgSQL function f1() line 3 at IF
```


-- 
Best regards,
Kirill Reshke




Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW

2024-08-06 Thread Kirill Reshke
On Tue, 6 Aug 2024 at 21:06, Yugo Nagata  wrote:
>
> On Thu, 1 Aug 2024 23:41:18 +0500
> Kirill Reshke  wrote:
>
> > On Thu, 1 Aug 2024 at 23:27, Jeff Davis  wrote:
> > > Relatedly, if we can EXPLAIN a CREATE MATERIALIZED VIEW, perhaps we
> > > should be able to EXPLAIN a REFRESH MATERIALIZED VIEW, too?
> > Sure
>
> REFRESH MATERIALIZED VIEW consists of not only the view query
> execution in refresh_matview_datafill but also refresh_by_heap_swap
> or refresh_by_match_merge. The former doesn't execute any query, so
> it would not a problem, but the latter executes additional queries
> including SELECT, some DDL, DELETE, and INSERT.
>
> If we would make EXPLAIN support REFRESH MATERIALIZED VIEW CONCURRENTLY
> command,  how should we handle these additional queries?
>
> Regards,
> Yugo Nagata
>
> --
> Yugo Nagata 

Hmm, is that a big issue? Maybe we can just add them in the proper places
of the output, the same way we do it with triggers? (See the sketch below.)
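For comparison, EXPLAIN ANALYZE already reports per-trigger work separately
at the end of its output, something like this (output abridged, names
illustrative):

```
=# explain analyze insert into ttt values (10, 1);
 ...
 Trigger f1_trigger on ttt: time=0.050 calls=1
```

The additional queries run by REFRESH ... CONCURRENTLY could perhaps be
reported in a similarly appended section.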

-- 
Best regards,
Kirill Reshke




Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW

2024-08-06 Thread Kirill Reshke
On Tue, 6 Aug 2024 at 22:13, Yugo Nagata  wrote:
>
> On Thu, 01 Aug 2024 13:34:51 -0700
> Jeff Davis  wrote:
>
> > On Fri, 2024-08-02 at 00:13 +0500, Kirill Reshke wrote:
> > > On Thu, 1 Aug 2024 at 23:27, Jeff Davis  wrote:
> > > >
> > > > EXPLAIN ANALYZE CREATE MATERIALIZED VIEW doesn't go through
> > > > ExecCreateTableAs(), but does use CreateIntoRelDestReceiver().
> > >
> > > EXPLAIN ANALYZE and regular query goes through create_ctas_internal
> > > (WITH NO DATA case too). Maybe we can simply move
> > > SetUserIdAndSecContext call in this function?
> >
> > We need to set up the SECURITY_RESTRICTED_OPERATION before planning, in
> > case the planner executes some functions.
> >
> > I believe we need to do some more refactoring to make this work. In
> > version 17, I already refactored so that CREATE MATERIALIZED VIEW and
> > REFRESH MATERIALIZED VIEW share the code. We can do something similar
> > to extend that to EXPLAIN ... CREATE MATERIALIZED VIEW.
>
> I think the most simple way to fix this is to set up the userid and the
> the SECURITY_RESTRICTED_OPERATION, and call RestrictSearchPath()
> before pg_plan_query in ExplainOneQuery(), as in the attached patch.
>
> However, this is similar to the old code of ExecCreateTableAs() before
> refactored. If we want to reuse REFRESH even in EXPLAIN as similar as the
> current CREATE MATERIALIZED code, I think we would have to refactor
> RefereshMatViewByOid to receive instrument_option and eflags as arguments
> and call it in ExplainOnePlan, for example.
>
> Do you think that is more preferred than setting up
> SECURITY_RESTRICTED_OPERATION in ExplainOneQuery?
>
> > As for the August release, the code freeze is on Saturday. Perhaps it
> > can be done by then, but is there a reason we should rush it? This
> > affects all supported versions, so we've lived with it for a while, and
> > I don't see a security problem here. I wouldn't expect it to be a
> > common use case, either.
>
> I agree that we don't have to rush it since it is not a security bug
> but just it could make a materialized view that cannot be refreshed.
>
> Regards,
> Yugo Nagata
>
> --
> Yugo Nagata 

> + /*
> + * For CREATE MATERIALIZED VIEW command, switch to the owner's userid, so
> + * that any functions are run as that user.  Also lock down 
> security-restricted
> + * operations and arrange to make GUC variable changes local to this command.
> + */
> + if (into && into->viewQuery)
> + {
> + GetUserIdAndSecContext(&save_userid, &save_sec_context);
> + SetUserIdAndSecContext(save_userid,
> +   save_sec_context | SECURITY_RESTRICTED_OPERATION);
> + save_nestlevel = NewGUCNestLevel();
> + RestrictSearchPath();
> + }
> +

In general, doing this ad-hoc coding for MVs inside the
standard_ExplainOneQuery function looks out of place to me;
standard_ExplainOneQuery is at a different level of abstraction.

-- 
Best regards,
Kirill Reshke




Re: Incremental View Maintenance, take 2

2024-08-08 Thread Kirill Reshke
I am really sorry for splitting my review comments into multiple
emails. I'll try to do a better, all-in-one review in the future.

On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA  wrote:
>
> On Tue, 2 Jul 2024 17:03:11 +0900
> Yugo NAGATA  wrote:
>
> > On Sun, 31 Mar 2024 22:59:31 +0900
> > Yugo NAGATA  wrote:
> > > >
> > > > Also, I added a comment on RelationIsIVM() macro persuggestion from 
> > > > jian he.
> > > > In addition, I fixed a failure reported from cfbot on FreeBSD build 
> > > > caused by;
> > > >
> > > >  WARNING:  outfuncs/readfuncs failed to produce an equal rewritten 
> > > > parse tree
> > > >
> > > > This warning was raised since I missed to modify outfuncs.c for a new 
> > > > field.
> > >
> > > I found cfbot on FreeBSD still reported a failure due to
> > > ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS because the regression test used
> > > wrong role names. Attached is a fixed version, v32.
> >
> > Attached is a rebased version, v33.
>
> I updated the patch to bump up the version numbers in psql and pg_dump codes
> from 17 to 18.
>
> Regards,
> Yugo Nagata
>
> >
> > Regards,
> > Yugo Nagata
> >
> >
> > --
> > Yugo NAGATA 
>
>
> --
> Yugo NAGATA 

1) The provided patches do not set the process title correctly:
```
reshke   2602433 18.7  0.1 203012 39760 ?Rs   20:41   1:58
postgres: reshke ivm [local] CREATE MATERIALIZED VIEW
```
2) We allow REFRESH on an IMMV. Why? An IMMV should always be up to date.
Well, I can see that this utility command may be useful in case of
corruption of some base relation or of the view itself, when there would be
a need to rebuild the whole thing from scratch.
But we already have VACUUM FULL for that, don't we?

3) Triggers created for an IMMV are not listed via \dS [tablename].

4) apply_old_delta_with_count executes non-trivial SQL statements for an
IMMV. It would be really helpful to see these in EXPLAIN ANALYZE.

5)
> + "DELETE FROM %s WHERE ctid IN ("
> + "SELECT tid FROM (SELECT pg_catalog.row_number() over (partition by %s) AS 
> \"__ivm_row_number__\","
> +  "mv.ctid AS tid,"
> +  "diff.\"__ivm_count__\""
> + "FROM %s AS mv, %s AS diff "
> + "WHERE %s) v "
> + "WHERE v.\"__ivm_row_number__\" OPERATOR(pg_catalog.<=) 
> v.\"__ivm_count__\")",
> + matviewname,
> + keysbuf.data,
> + matviewname, deltaname_old,
> + match_cond);

`SELECT pg_catalog.row_number()` is too generic for my taste. Maybe
pg_catalog.immv_row_number() / pg_catalog.get_immv_row_number()?

6)

> +static void
> +apply_new_delta(const char *matviewname, const char *deltaname_new,
> + StringInfo target_list)
> +{
> + StringInfoData querybuf;
>+
> + /* Search for matching tuples from the view and update or delete if found. 
> */

Is this comment correct? We only insert tuples here, don't we?

7)

During patch development, one should pick OIDs from the range 8000-9999:
> +# IVM
> +{ oid => '786', descr => 'ivm trigger (before)',
> +  proname => 'IVM_immediate_before', provolatile => 'v', prorettype => 
> 'trigger',
> +  proargtypes => '', prosrc => 'IVM_immediate_before' },
> +{ oid => '787', descr => 'ivm trigger (after)',
> +  proname => 'IVM_immediate_maintenance', provolatile => 'v', prorettype => 
> 'trigger',
> +  proargtypes => '', prosrc => 'IVM_immediate_maintenance' },
> +{ oid => '788', descr => 'ivm filetring ',
> +  proname => 'ivm_visible_in_prestate', provolatile => 's', prorettype => 
> 'bool',
> +  proargtypes => 'oid tid oid', prosrc => 'ivm_visible_in_prestate' },
> ]


--
Best regards,
Kirill Reshke




Re: Incremental View Maintenance, take 2

2024-08-08 Thread Kirill Reshke
On Wed, 31 May 2023 at 20:14, Yugo NAGATA  wrote:
>
> Hello hackers,
>
> Here's a rebased version of the patch-set adding Incremental View
> Maintenance support for PostgreSQL. That was discussed in [1].
>
> The patch-set consists of the following eleven patches.
>
> - 0001: Add a syntax to create Incrementally Maintainable Materialized Views
> - 0002: Add relisivm column to pg_class system catalog
> - 0003: Allow to prolong life span of transition tables until transaction end
> - 0004: Add Incremental View Maintenance support to pg_dum
> - 0005: Add Incremental View Maintenance support to psql
> - 0006: Add Incremental View Maintenance support
> - 0007: Add DISTINCT support for IVM
> - 0008: Add aggregates support in IVM
> - 0009: Add support for min/max aggregates for IVM
> - 0010: regression tests
> - 0011: documentation
>
> [1] 
> https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp
>
>
> Regards,
> Yugo Nagata
>
> --
> Yugo NAGATA 

Actually, this new MV delta-table calculation could be used to make
REFRESH MATERIALIZED VIEW faster even for non-IMMV relations. Specifically,
we could use the cost-based optimizer to decide which way is cheaper:
regular query execution, or the delta-table approach (when it is
applicable).

Is it worth another thread?

-- 
Best regards,
Kirill Reshke




Re: [Patch] remove duplicated smgrclose

2024-08-09 Thread Kirill Reshke
On Thu, 1 Aug 2024 at 17:32, Junwang Zhao  wrote:
>
> Hi Steven,
>
> On Wed, Jul 31, 2024 at 11:16 AM Steven Niu  wrote:
> >
> > Hello, hackers,
> >
> > I think there may be some duplicated codes.
> > Function smgrDoPendingDeletes() calls both smgrdounlinkall() and 
> > smgrclose().
> > But both functions would close SMgrRelation object, it's dupliacted 
> > behavior?
> >
> > So I make this patch. Could someone take a look at it?
> >
> > Thanks for your help,
> > Steven
> >
> > From Highgo.com
> >
> >
> You change LGTM, but the patch seems not to be applied to HEAD,
> I generate the attached v2 using `git format` with some commit message.
>
> --
> Regards
> Junwang Zhao

Hi all!
This change looks good to me. However, I have an objection to these
lines from v2:

>  /* Close the forks at smgr level */
> - for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> - smgrsw[which].smgr_close(rels[i], forknum);
> + smgrclose(rels[i]);

Why do we do this? This seems to be an unrelated change given the thread
$subj. It is a pure refactoring job, which deserves a separate patch.
There is similar coding in the smgrdestroy function:

```
for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
smgrsw[reln->smgr_which].smgr_close(reln, forknum);
```

So, I feel these two places should either be changed together or not be
altered at all. And it is definitely a separate change.

-- 
Best regards,
Kirill Reshke




Re: EphemeralNamedRelation and materialized view

2024-08-10 Thread Kirill Reshke
On Fri, 26 Jul 2024 at 12:07, Yugo Nagata  wrote:
>
> Hi,
>
> While looking into the commit b4da732fd64e936970f38c792f8b32c4bdf2bcd5,
> I noticed that we can create a materialized view using Ephemeral Named
> Relation in PostgreSQL 16 or earler.
>
>
> postgres=# create table tbl (i int);
> CREATE TABLE
>  ^
> postgres=# create or replace function f() returns trigger as $$ begin
>  create materialized view mv as select * from enr; return new; end; $$ 
> language plpgsql;
> CREATE FUNCTION
>
> postgres=# create trigger trig after insert on tbl referencing new table as 
> enr execute function f();
> CREATE TRIGGER
>
> postgres=# insert into tbl values (10);
>
> postgres=# \d
>  List of relations
>  Schema | Name |   Type| Owner
> +--+---+
>  public | mv   | materialized view | yugo-n
>  public | tbl  | table | yugo-n
> (2 rows)
>
>
> We cannot refresh or get the deinition of it, though.
>
> postgres=# refresh materialized view mv;
> ERROR:  executor could not find named tuplestore "enr"
>
> postgres=# \d+ mv
> ERROR:  unrecognized RTE kind: 7
>
> In PostgreSQL 17, materialized view using ENR cannot be created
> because queryEnv is not pass to RefreshMatViewByOid introduced by b4da732fd64.
> When we try to create it, the  error is raised.
>
>  ERROR: executor could not find named tuplestore "enr"
>
> Although it is hard to imagine users actually try to create materialized view
> using ENR, how about prohibiting it even in PG16 or earlier by passing NULL
> as queryEnv arg in CreateQueryDesc to avoid to create useless matviews 
> accidentally,
> as the attached patch?
>
>
> Regards,
> Yugo Nagata
>
> --
> Yugo Nagata 

Hi,
I think this is a clear bug fix and should be backported to PG v12-v16.
LGTM.

P.S. Should the https://commitfest.postgresql.org/49/5153/ entry be set to RFC?

-- 
Best regards,
Kirill Reshke




Re: Constant Splitting/Refactoring

2024-08-10 Thread Kirill Reshke
On Wed, 13 Mar 2024 at 20:42, David Christensen
 wrote:
>
> Here is a version 2 of this patch, rebased atop 97d85be365.
>
> As before, this is a cleanup/prerequisite patch series for the page
> features/reserved page size patches[1].  (Said patch series is going
> to be updated shortly.)
>
> This splits each of the 4 constants that care about page size into
> Cluster-specific and -Limit variants, the first intended to become a
> variable in time, and the second being the maximum value such a
> variable may take (largely used for static
> allocations).
>
> Since these patches define these symbols to have the same values they
> previously had, there are no changes in functionality.  These were
> largely mechanical changes, and as such we should perhaps consider
> making the same changes to back-branches to make it so context lines
> and the like
> would be the same, simplifying maintainer's efforts when applying code
> in back branches that touch similar areas.
>
> The script I have to make these changes is simple, and could be run
> against the back branches with only the comments surrounding Calc()
> pieces needing
> to be adjusted once.
>
> Thanks,
>
> David
>
> [1] https://commitfest.postgresql.org/47/3986/

Hi! Your patchset needs a rebase. Are you going to push it forward?

Also, it would be better to have more informative commit messages in the patches.
-- 
Best regards,
Kirill Reshke




Re: [PATCH] Add log_transaction setting

2024-08-10 Thread Kirill Reshke
On Thu, 4 Jul 2024 at 21:46, Sergey Solovev
 wrote:
>
> Hi.
>
> We encountered a problem with excessive logging when transaction is
> sampled.
> When it is getting sampled every statement is logged, event SELECT. This
> can
> lead to performance degradation and log polluting.
> I have added new setting to filter statements when transaction is
> sampled - log_transaction.
>
> Overview
> ===
> This parameter works very similar to log_statement, but it filters out
> statements only
> in sampled transaction. It has same values as log_statement, access
> rights (only superuser),
> setup in postgresql.conf or by superuser with SET.
> But by default it has value "all" in compliance with existing behaviour
> (to log every statement).
>
> Example usage
> ==
> Log every DDL, but only subset of other statements.
>
> postgresql.conf
>  > log_transaction = ddl
>  > log_statement = all
>  > log_transaction_sample_rate = 1
>  > log_statement_sample_rate = 0.3
>
> backend:
>  > BEGIN;
>  > CREATE TABLE t1(value text);
>  > INSERT INTO t1(value) VALUES ('hello'), ('world');
>  > SELECT * FROM t1;
>  > DROP TABLE t1;
>  > COMMIT;
>
> logfile:
>  > LOG:  duration: 6.465 ms  statement: create table t1(value text);
>  > LOG:  statement: insert into t1(value) values ('hello'), ('world');
>  > LOG:  duration: 6.457 ms  statement: drop table t1;
>
> As you can see CREATE and DROP were logged with duration (because it is
> DDL), but
> only INSERT was logged (without duration) - not SELECT.
>
> Testing
> ===
> All tests are passed - default configuration is compatible with existing
> behaviour.
> Honestly, I could not find any TAP/regress tests that would test logging
> - some
> tests only use logs to ensure test results.
> I did not find suitable place for such tests, so no tests provided
>
> Implementation details
> ===
> I have modified signature of check_log_duration function - this accepts
> List of
> statements that were executed. This is to decide whether we should log
> current statement if transaction is sampled.
> But this list can be empty, in that case we decide to log. NIL is passed
> only
> in fast path, PARSE and BIND functions - by default these are logged
> if transaction is sampled, so we are compliant with existing behaviour
> again.
> In first implementation version, there was boolean flag (instead of
> List), but
> it was replaced by List to defer determining (function call) whether it
> is worth logging.
>
> -
> Best regards, Sergey Solovev

Hi!

As I understand it, the need for this GUC variable comes from the fact
that, if the transaction is sampled for logging, all statements within this
transaction are logged, and that is not configurable now, correct?
Well, if that is the case, why should we add a new GUC? Maybe we should
honor `log_statement` in this case too, so the bug is rather that
log_statement is not honored during transaction sampling?


Also, tests are failing[1]


[1] https://cirrus-ci.com/task/5645711230894080

-- 
Best regards,
Kirill Reshke




Re: Vacuum statistics

2024-08-10 Thread Kirill Reshke
On Wed, 12 Jun 2024 at 11:38, Alena Rybakina  wrote:
>
> Hi!
>
> On 11.06.2024 16:09, Alena Rybakina wrote:
>
> On 08.06.2024 09:30, Alena Rybakina wrote:
>
> I am currently working on dividing this patch into three parts to simplify 
> the review process: one of them will contain code for collecting vacuum 
> statistics on tables, the second on indexes and the last on databases.
>
> I have divided the patch into three: the first patch contains code for the 
> functionality of collecting and storage for tables, the second one for 
> indexes and the last one for databases.
>
> --
> Regards,
> Alena Rybakina
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
Hi!
A few suggestions on this patch set:

1)
> +{ oid => '4701',
> +  descr => 'pg_stats_vacuum_tables return stats values',
> +  proname => 'pg_stats_vacuum_tables', provolatile => 's', prorettype => 
> 'record',proisstrict => 'f',
> +  proretset => 't',
> +  proargtypes => 'oid oid',
> +  proallargtypes =>

During development, OIDs should be picked from the range 8000-9999.
The same applies to pg_stats_vacuum_database & pg_stats_vacuum_indexes.

Also, why is the function naming scheme pg_stats_vacuum_*something*
rather than pg_stat_vacuum_*something*, like pg_stat_replication etc.?

2) In 0003:
> +  proargnames => 
> '{dboid,dboid,db_blks_read,db_blks_hit,total_blks_dirtied,total_blks_written,wal_records,wal_fpi,wal_bytes,blk_read_time,blk_write_time,delay_time,system_time,user_time,total_time,interrupts}',

The repeated dboid argument name is strange. Is it done this way to make
the pg_stats_vacuum function call more uniform? I don't see any other
place within PostgreSQL core with a similar approach, so I doubt it is
correct.

3) The 0001 patch's vacuum_tables_statistics test creates
statistic_vacuum_database1 but does not use it; 0003 does.
Also, I'm not sure these additional checks on the second database
add much value. Can you justify them, please?

Other places look more or less fine to me.
However, I may post some additional nit-picky comments on the
next patch version.

-- 
Best regards,
Kirill Reshke




Re: CSN snapshots in hot standby

2024-08-13 Thread Kirill Reshke
On Wed, 14 Aug 2024 at 01:13, Heikki Linnakangas  wrote:
>
> On 05/04/2024 13:49, Andrey M. Borodin wrote:
> >> On 5 Apr 2024, at 02:08, Kirill Reshke  wrote:
>
> Thanks for taking a look, Kirill!
>
> >> maybe we need some hooks here? Or maybe, we can take CSN here from 
> >> extension somehow.
> >
> > I really like the idea of CSN-provider-as-extension.
> > But it's very important to move on with CSN, at least on standby, to make 
> > CSN actually happen some day.
> > So, from my perspective, having LSN-as-CSN is already huge step forward.
>
> Yeah, I really don't want to expand the scope of this.
>
> Here's a new version. Rebased, and lots of comments updated.
>
> I added a tiny cache of the CSN lookups into SnapshotData, which can
> hold the values of 4 XIDs that are known to be visible to the snapshot,
> and 4 invisible XIDs. This is pretty arbitrary, but the idea is to have
> something very small to speed up the common cases that 1-2 XIDs are
> repeatedly looked up, without adding too much overhead.
>
>
> I did some performance testing of the visibility checks using these CSN
> snapshots. The tests run SELECTs with a SeqScan in a standby, over a
> table where all the rows have xmin/xmax values that are still
> in-progress in the primary.
>
> Three test scenarios:
>
> 1. large-xact: one large transaction inserted all the rows. All rows
> have the same XMIN, which is still in progress
>
> 2. many-subxacts: one large transaction inserted each row in a separate
> subtransaction. All rows have a different XMIN, but they're all
> subtransactions of the same top-level transaction. (This causes the
> subxids cache in the proc array to overflow)
>
> 3. few-subxacts: All rows are inserted, committed, and vacuum frozen.
> Then, using 10 in separate subtransactions, DELETE the rows, in an
> interleaved fashion. The XMAX values cycle like this "1, 2, 3, 4, 5, 6,
> 7, 8, 9, 10, 1, 2, 3, 4, 5, ...". The point of this is that these
> sub-XIDs fit in the subxids cache in the procarray, but the pattern
> defeats the simple 4-element cache that I added.
>
> The test script I used is attached. I repeated it a few times with
> master and the patches here, and picked the fastest runs for each. Just
> eyeballing the results, there's about ~10% variance in these numbers.
> Smaller is better.
>
> Master:
>
> large-xact: 4.57732510566711
> many-subxacts: 18.6958119869232
> few-subxacts: 16.467698097229
>
> Patched:
>
> large-xact: 10.230381775
> many-subxacts: 11.6501438617706
> few-subxacts: 19.8457028865814
>
> With cache:
>
> large-xact: 3.68792295455933
> many-subxacts: 13.3662350177765
> few-subxacts: 21.4426419734955
>
> The 'large-xacts' results show that the CSN lookups are slower than the
> binary search on the 'xids' array. Not a surprise. The 4-element cache
> fixes the regression, which is also not a surprise.
>
> The 'many-subxacts' results show that the CSN lookups are faster than
> the current method in master, when the subxids cache has overflowed.
> That makes sense: on master, we always perform a lookup in pg_subtrans,
> if the suxids cache has overflowed, which is more or less the same
> overhead as the CSN lookup. But we avoid the binary search on the xids
> array after that.
>
> The 'few-subxacts' shows a regression, when the 4-element cache is not
> effective. I think that's acceptable, the CSN approach has many
> benefits, and I don't think this is a very common scenario. But if
> necessary, it could perhaps be alleviated with more caching, or by
> trying to compensate by optimizing elsewhere.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)

Thanks for the update. I will try to find time for perf-testing this.
First, some random suggestions. Sorry for being too nit-picky.
1) in 0002
> +/*
> + * Number of shared CSNLog buffers.
> + */
> +static Size
> +CSNLogShmemBuffers(void)
> +{
> + return Min(32, Max(16, NBuffers / 512));
> +}

Should we make this a GUC?

2) In 0002 CSNLogShmemInit:

> + //SlruPagePrecedesUnitTests(CsnlogCtl, SUBTRANS_XACTS_PER_PAGE);

remove this?

3) In 0002 InitCSNLogPage:

> + SimpleLruZeroPage(CsnlogCtl, pageno);
We can use ZeroCSNLogPage here. That would justify the existence of this
function a little more.

4) In 0002:
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -27,7 +27,7 @@
>  * removed. This is achieved by using the replication slot mechanism.
>  *
>  * As the percentage of transactions modifying the catalog normally is fairly
> - * small in comparisons to ones only manipulating user data, we keep track of
> + * small in compari

Re: Incremental View Maintenance, take 2

2024-08-19 Thread Kirill Reshke
On Tue, 30 Jul 2024 at 10:24, Yugo NAGATA  wrote:
>
> Hi,
>
> On Tue, 30 Jul 2024 03:32:19 +0500
> Kirill Reshke  wrote:
>
> > On Sat, 27 Jul 2024 at 13:26, Kirill Reshke  wrote:
> > >
> > > Hi!
> > > Cloudberry DB (Greenplum fork) uses IMMV feature for AQUMV (auto query
> > > use matview) feature, so i got interested in how it is implemented.
>
> Thank you so much for a lot of comments!
> I will respond to the comments soon.
>
> > >
> > > On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA  wrote:
> > > >
> > > > I updated the patch to bump up the version numbers in psql and pg_dump 
> > > > codes
> > > > from 17 to 18.
> > >
> > > Few suggestions:
> > >
> > > 1) `Add-relisivm-column-to-pg_class-system-catalog` commit message
> > > should be fixed, there is "isimmv" in the last line.
> > > 2) I dont get why `Add-Incremental-View-Maintenance-support.patch`
> > > goes after 0005 & 0004. Shoulndt we first implement feature server
> > > side, only when client (psql & pg_dump) side?
> > > 3) Can we provide regression tests for each function separately? Test
> > > for main feature in main patch, test for DISTINCT support in
> > > v34-0007-Add-DISTINCT-support-for-IVM.patch etc? This way the patchset
> > > will be easier to review, and can be committed separelety.
> > > 4) v34-0006-Add-Incremental-View-Maintenance-support.patch no longer
> > > applies due to 4b74ebf726d444ba820830cad986a1f92f724649. After
> > > resolving issues manually, it does not compile, because
> > > 4b74ebf726d444ba820830cad986a1f92f724649 also removes
> > > save_userid/save_sec_context fields from ExecCreateTableAs.
> > >
> > > >   if (RelationIsIVM(matviewRel) && stmt->skipData)
> > > Now this function accepts skipData param.
> > >
> > > 5) For DISTINCT support patch uses hidden __ivm* columns. Is this
> > > design discussed anywhere? I wonder if this is a necessity (only
> > > solution) or if there are alternatives.
> > > 6)
> > > What are the caveats of supporting some simple cases for aggregation
> > > funcs like in example?
> > > ```
> > > regress=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_2 AS SELECT
> > > sum(j) + sum(i) from mv_base_a;
> > > ERROR:  expression containing an aggregate in it is not supported on
> > > incrementally maintainable materialized view
> > > ```
> > > I can see some difficulties with division CREATE IMMV  AS SELECT
> > > 1/sum(i) from mv_base_a;  (sum(i) == 0 case), but adding &
> > > multiplication should be ok, aren't they?
> > >
> > >
> > > Overall, patchset looks mature, however it is far from being
> > > committable due to lack of testing/feedback/discussion. There is only
> > > one way to fix this... Test and discuss it!
> > >
> > >
> > > [1] https://github.com/cloudberrydb/cloudberrydb
> >
> > Hi! Small update: I tried to run a regression test and all
> > IMMV-related tests failed on my vm. Maybe I'm doing something wrong, I
> > will try to investigate.
> >
> > Another suggestion: support for \d and \d+ commands in psql. With v34
> > patchset applied, psql does not show anything IMMV-related in \d mode.
> >
> > ```
> > reshke=# \d m1
> >Materialized view "public.m1"
> >  Column |  Type   | Collation | Nullable | Default
> > +-+---+--+-
> >  i  | integer |   |  |
> > Distributed by: (i)
> >
> >
> > reshke=# \d+ m1
> >  Materialized view "public.m1"
> >  Column |  Type   | Collation | Nullable | Default | Storage |
> > Compression | Stats target | Description
> > +-+---+--+-+-+-+--+-
> >  i  | integer |   |  | | plain   |
> > |  |
> > View definition:
> >  SELECT t1.i
> >FROM t1;
> > Distributed by: (i)
> > Access method: heap
> >
> > ```
> >
> > Output should be 'Incrementally materialized view "public.m1"' IMO.
>
>
> --
> Yugo NAGATA 


So, I spent another two weeks on this patch. I have read the whole
'Incremental View Maintenance' thread (from 2018), this thread, and some
related threads, and have studied some papers on this topic. I got a
bet

Re: Incremental View Maintenance, take 2

2024-08-19 Thread Kirill Reshke
On Tue, 20 Aug 2024 at 02:14, Kirill Reshke  wrote:

> == Other thoughts
>
> In OLAP databases (see [2]), IVM opens the door for 'view
> exploitation' feature. That is, use IVM (which is always up-to-date)
> for query execution. But current IVM implementation is not compatible
> with Cloudberry Append-optimized Table Access Method. The problem is
> the 'table_tuple_fetch_row_version' call, which is used by
> ivm_visible_in_prestate to check tuple visibility within a snapshot. I
> am trying to solve this somehow. My current idea is the following:
> multiple base table modification via single statement along with tuple
> deletion from base tables are features. We can error-out these cases
> (at M.V. creation time) all for some TAMs, and support only insert &
> truncate. However, I don't know how to check if TAM supports
> 'tuple_fetch_row_version' other than calling it and receiving
> ERROR[3].
>

I reread this and find it a little unclear. What I'm proposing here is
specifying, at creation time, the types of operations an IVM supports. So
one could run:

CREATE IVM immv1 WITH (support_deletion = true/false,
support_multiple_relation_change = true/false). Then, at query execution
time, we simply ERROR out if the query leads to a deletion from the IVM
and support_deletion is false.


-- 
Best regards,
Kirill Reshke




Re: Incremental View Maintenance, take 2

2024-08-20 Thread Kirill Reshke
On Tue, 20 Aug 2024 at 02:14, Kirill Reshke  wrote:
>
>
> == Major suggestions.
>
> 1) At first glance, working with this IVM/IMMV infrastructure feels
> really unintuitive about what servers actually do for query execution.
> I do think It will be much better for user experience to add more
> EXPLAIN about IVM work done inside IVM triggers. This way it is much
> clearer which part is working slow, so which index should be created,
> etc.
>
> 2) The kernel code for IVM lacks possibility to be extended for
> further IVM optimizations. The one example is foreign key optimization
> described here[1]. I'm not saying we should implement this within this
> patchset, but we surely should pave the way for this. I don't have any
> good suggestions for how to do this though.
>
> 3) I don't really think SQL design is good. CREATE [INCREMENTAL] M.V.
> is too ad-hoc. I would prefer CREATE M.V. with (maintain_incr=true).
> (reloption name is just an example).
> This way we can change regular M.V. to IVM and vice versa via ALTER
> M.V. SET *reloptions* - a type of syntax that is already present in
> PostgreSQL core.
>
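To illustrate point 3 above, a hypothetical syntax sketch (the reloption
name is just an example, as noted):

```
CREATE MATERIALIZED VIEW mv1 WITH (maintain_incr = true) AS SELECT i FROM t1;
ALTER MATERIALIZED VIEW mv1 SET (maintain_incr = false);  -- back to a regular MV
```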

One little follow-up here. Why do we do prestate visibility the way it is
currently done? Can we instead export the snapshot in the BEFORE trigger,
save it somewhere, and use it afterwards?

-- 
Best regards,
Kirill Reshke




Re: Index AM API cleanup

2024-08-21 Thread Kirill Reshke
On Thu, 22 Aug 2024 at 00:25, Mark Dilger  wrote:
>
> Hackers,
>
> The index access method API mostly encapsulates the functionality of in-core 
> index types, with some lingering oversights and layering violations.  There 
> has been an ongoing discussion for several release cycles concerning how the 
> API might be improved to allow interesting additional functionality.  That 
> discussion has frequently included patch proposals to support peculiar needs 
> of esoteric bespoke access methods, which have little interest for the rest 
> of the community.
>
> For your consideration, here is a patch series that takes a different 
> approach.  It addresses many of the limitations and layering violations, 
> along with introducing test infrastructure to validate the changes.  Nothing 
> in this series is intended to introduce new functionality to the API.  Any 
> such, "wouldn't it be great if..." type suggestions for the API are out of 
> scope for this work.  On the other hand, this patch set does not purport to 
> fix all such problems; it merely moves the project in that direction.
>
> For validation purposes, the first patch creates shallow copies of hash and 
> btree named "xash" and "xtree" and introduces some infrastructure to run the 
> src/test/regress and src/test/isolation tests against them without needing to 
> duplicate those tests.  Numerous failures like "unexpected non-btree AM" can 
> be observed in the test results.
>
> Also for validation purposes, the second patch creates a deep copy of btree 
> named "treeb" which uses modified copies of the btree implementation rather 
> than using the btree implementation by reference.  This allows for more 
> experimentation, but might be more code than the community wants.  Since this 
> is broken into its own patch, it can be excluded from what eventually gets 
> committed.  Even if we knew a priori that this "treeb" test would surely 
> never be committed, it still serves to help anybody reviewing the patch 
> series to experiment with those other changes without having to construct 
> such a test index AM individually.
>
> The next twenty patches are a mix of fixes of various layering violations, 
> such as not allowing non-core index AMs from use in replica identity full, or 
> for speculative insertion, or for foreign key constraints, or as part of 
> merge join; with updates to the "treeb" code as needed.  The changes to 
> "treeb" are broken out so that they can also easily be excluded from whatever 
> gets committed.
>
> The final commit changes the ordering of the strategy numbers in treeb.  The 
> name "treeb" is a rotation of "btree", and indeed the strategy numbers 
> 1,2,3,4,5 are rotated to 5,1,2,3,4.  The fact that treeb indexes work 
> properly after this change is meant to demonstrate that the core changes have 
> been sufficient to address the prior dependency on btree strategy number 
> ordering.  Again, this doesn't need to be committed; it might only serve to 
> help reviewers in determining if the functional changes are correct.
>
> Not to harp on this too heavily, but please note that running the core 
> regression and isolation tests against xash, xtree, and treeb are known not 
> to pass.  That's the point.  But by the end of the patch series, the failures 
> are limited to EXPLAIN output changes; the query results themselves are 
> intended to be consistent with the expected test output.  To avoid breaking 
> `make check-world`, these test modules are not added to the test schedule.  
> They are also, at least for now, only useable from make, not from meson.
>
> Internal development versions 1..16 not included.  Andrew, Peter, and Alex 
> have all provided reviews internally and are cc'd here.  Patch by me.  Here 
> is v17 for the community:
>
>
>
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here...

-- 
Best regards,
Kirill Reshke




Re: Vacuum statistics

2024-08-21 Thread Kirill Reshke
On Thu, 22 Aug 2024 at 07:48, jian he  wrote:
>
> On Wed, Aug 21, 2024 at 6:37 AM Alena Rybakina
>  wrote:
> >
> > We check it there: "tabentry->vacuum_ext.type != type". Or were you talking 
> > about something else?
> >
> > On 19.08.2024 12:32, jian he wrote:
> >
> > in pg_stats_vacuum
> > if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
> > {
> > Oidrelid = PG_GETARG_OID(1);
> >
> > /* Load table statistics for specified database. */
> > if (OidIsValid(relid))
> > {
> > tabentry = fetch_dbstat_tabentry(dbid, relid);
> > if (tabentry == NULL || tabentry->vacuum_ext.type != type)
> > /* Table don't exists or isn't an heap relation. */
> > PG_RETURN_NULL();
> >
> > tuplestore_put_for_relation(relid, rsinfo, tabentry);
> > }
> > else
> > {
> >}
> >
> >
> > So for functions pg_stat_vacuum_indexes and pg_stat_vacuum_tables,
> > it seems you didn't check "relid" 's relkind,
> > you may need to use get_rel_relkind.
> >
> > --
>
> hi.
> I mentioned some points at [1],
> Please check the attached patchset to address these issues.
>
> there are four occurrences of "CurrentDatabaseId", i am still confused
> with usage of CurrentDatabaseId.
>
> also please don't  top-post, otherwise the archive, like [2] is not
> easier to read for future readers.
> generally you quote first, then reply.
>
> [1] 
> https://postgr.es/m/CACJufxHb_YGCp=pvh6dzcpk9yml+sueffpearbx2lzxzvah...@mail.gmail.com
> [2] https://postgr.es/m/78394e29-a900-4af4-b5ce-d6eb2d263...@postgrespro.ru

Hi, your points are valid.
Regarding 0003, I also wanted to object to the database naming in the
regression test during my review, but for some reason didn't. Now, since
we need to change it anyway, I suggest we also change
regression_statistic_vacuum_db1 to something less generic, maybe
regression_statistic_vacuum_db_unaffected.



-- 
Best regards,
Kirill Reshke




Better documentation for RIR term?

2024-08-21 Thread Kirill Reshke
I have had a hard time understanding what RIR rules are while reading
some threads on pgsql-hackers. One example is 'Virtual generated
columns'. I tried grepping for 'RIR rules' when I first started reading
the kernel sources, and only today did I finally bump into this note in
rewriteHandler.c:

```
* NOTES
* Some of the terms used in this file are of historic nature: "retrieve"
* was the PostQUEL keyword for what today is SELECT. "RIR" stands for
* "Retrieve-Instead-Retrieve", that is an ON SELECT DO INSTEAD SELECT rule
* (which has to be unconditional and where only one rule can exist on each
* relation).
*

```

Maybe I'm just bad at searching and nobody before has had problems
understanding what RIR stands for. If not, should we enhance the
documentation in some way? If yes, what is the proper place?
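As a side note, the easiest way I found to actually see such a rule is to
look at pg_rewrite for a view (a view is implemented as exactly this kind of
unconditional ON SELECT DO INSTEAD SELECT rule):

```
CREATE VIEW myview AS SELECT 1 AS x;

SELECT rulename, ev_type, is_instead
FROM pg_rewrite
WHERE ev_class = 'myview'::regclass;
--  rulename | ev_type | is_instead
--  _RETURN  | 1       | t
```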

-- 
Best regards,
Kirill Reshke




Re: optimize hashjoin

2024-08-22 Thread Kirill Reshke
On Thu, 22 Aug 2024 at 17:08, bucoo  wrote:

>
> > 0) The patch does not apply anymore, thanks to David committing a patch
>
> > yesterday. Attached is a patch rebased on top of current master.
>
> That patch is based on PG17. I have now rewritten it based on the master
> branch and added some comments.
>
>
> > 1) Wouldn't it be easier (and just as efficient) to use slots with
>
> > TTSOpsMinimalTuple, i.e. backed by a minimal tuple?
>
> Use diffent kind of slot, the ExecEvalExpr function will report an error.
>
>
> > 2) I find the naming of the new functions a bit confusing. We now have
> > the "original" functions working with slots, and then also functions
> > with "Tuple" working with tuples. Seems asymmetric.
>
> In net patch function name renamed to ExecHashTableInsertSlot and
> ExecHashTableInsertTuple,
>
> also ExecParallelHashTableInsertSlotCurrentBatch and
> ExecParallelHashTableInsertTupleCurrentBatch.
>
>
> > 3) The code in ExecHashJoinOuterGetTuple is hard to understand, it'd
> > very much benefit from some comments. I'm a bit unsure if the likely()
> > and unlikely() hints really help.
>
> In new patch added some comments.
>
> "Likely" and "unlikely" might offer only marginal help on some CPUs and
> might not be beneficial at all on other platforms (I think).
>
>
> > 4) Is the hj_outerTupleBuffer buffer really needed / effective? I'd bet
> > just using palloc() will work just as well, thanks to the AllocSet
> > caching etc.
>
> The hj_outerTupleBuffer avoid reform(materialize) tuple in
> non-TTSOpsMinimalTuple scenarios,
>
> see ExecForceStoreMinimalTuple. I added some comments in new patch.
>
>
> > Can you provide more information about the benchmark you did? What
> > hardware, what scale, PostgreSQL configuration, which of the 22 queries
> > are improved, etc.
> >
> > I ran TPC-H with 1GB and 10GB scales on two machines, and I see pretty
> > much no difference compared to master. However, it occurred to me the
> > patch only ever helps if we increase the number of batches during
> > execution, in which case we need to move tuples to the right batch.
>
> Only parallel HashJoin speed up to ~2x(all data cached in memory),
>
> not full query, include non-parallel HashJoin.
>
> non-parallel HashJoin only when batchs large then one will speed up,
>
> because this patch only optimize for read batchs tuples to memory.
>
Hi,

The likely/unlikely usage can be justified via a benchmark. The parallel
HashJoin speedup can also be verified via a benchmark. Please provide
either the benchmark script or the benchmark results, and preferably both.


-- 
Best regards,
Kirill Reshke


Avoid logging enormously large messages

2024-08-23 Thread Kirill Reshke
For our production use, we want to achieve $subj. The problem is not
new [1], and neither are solutions to it [2]. So, what exactly is the
problem?
On one hand, we cannot resolve this issue with some primitive log-chunk
truncation mechanism inside core (the way it is done in Greenplum, for
example [3]), because that would result in invalid JSON being written to
the PostgreSQL log file. Also, we probably don't want to truncate HINT or
DETAIL (which should not be long); we only want to truncate the query, or
its parameter list in the extended-protocol case. So, any patch like that
is probably unacceptable.

On the other hand, it is also unclear how to truncate long lines
inside emit_log_hook. I came up with this [4] module. That approach is
bad, because we modify core-owned data and so break other emit_log_hook
hooks, which is a show-stopper for using this extension.
One more detail we should address is debug_query_string. We currently have
no API to tell the kernel to log debug_query_string only partially. One
can set the ErrorData hide_stmt field to true, but that way we erase the
query string from the logs entirely, instead of logging no more than $N
characters (or multibyte sequences) of it.

So, I want to hear suggestions from the community on how to achieve
$subj. The approach I have in mind right now is setting something like a
field_max_length limit inside emit_log_hook, and then honoring this new
setting in core, in the send_message_to_server_log function.

[1] 
https://www.postgresql.org/message-id/CAM-gcbQqriv%3Dnr%3DYFFmp5ytgW7HbiftLBANFB9C0GwHMGDC0LA%40mail.gmail.com
[2] 
https://stackoverflow.com/questions/45992209/how-to-truncate-log-statement-in-postgres-9-6
[3] 
https://github.com/greenplum-db/gpdb-archive/blob/283eea57100c690cb05b672b14eef7d0382e4e16/src/backend/utils/error/elog.c#L3668-L3671
[4] https://github.com/pg-sharding/pg_log_trunc/blob/master/pg_log_trunc.c#L43
-- 
Best regards,
Kirill Reshke




[BUG?] WAL file archive leads to crash during startup

2024-08-28 Thread Kirill Reshke
, especially the `redo is not required` part, which in fact simply skips
recovery when recovery is 100% needed.


Run with asserts compiled:
```
reshke@ygp-jammy:~/postgres$ /usr/local/pgsql/bin/postgres --single -D ./db/
2024-08-28 07:33:30.119 UTC [572905] DEBUG:  invoking
IpcMemoryCreate(size=149471232)
2024-08-28 07:33:30.119 UTC [572905] DEBUG:  mmap(150994944) with
MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2024-08-28 07:33:30.119 UTC [572905] DEBUG:  cleaning up dynamic
shared memory control segment with ID 1601540488
2024-08-28 07:33:30.151 UTC [572905] DEBUG:  dynamic shared memory
system will support 674 segments
2024-08-28 07:33:30.152 UTC [572905] DEBUG:  created dynamic shared
memory control segment 773415688 (26976 bytes)
2024-08-28 07:33:30.152 UTC [572905] DEBUG:  InitPostgres
2024-08-28 07:33:30.152 UTC [572905] DEBUG:  my backend ID is 1
2024-08-28 07:33:30.152 UTC [572905] LOG:  database system was
interrupted while in recovery at 2024-08-28 07:31:48 UTC
2024-08-28 07:33:30.152 UTC [572905] HINT:  This probably means that
some data is corrupted and you will have to use the last backup for
recovery.
2024-08-28 07:33:30.152 UTC [572905] DEBUG:  removing all temporary WAL segments
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  checkpoint record is at 0/11400268
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  redo record is at
0/10E49720; shutdown false
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  next transaction ID: 744;
next OID: 16392
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  next MultiXactId: 1; next
MultiXactOffset: 0
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  oldest unfrozen
transaction ID: 716, in database 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  oldest MultiXactId: 1, in
database 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  commit timestamp Xid
oldest/newest: 0/0
2024-08-28 07:33:30.170 UTC [572905] LOG:  database system was not
properly shut down; automatic recovery in progress
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  transaction ID wrap limit
is 2147484363, limited by database with OID 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  MultiXactId wrap limit is
2147483648, limited by database with OID 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  starting up replication slots
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  xmin required by slots:
data 0, catalog 0
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  starting up replication
origin progress state
2024-08-28 07:33:30.170 UTC [572905] DEBUG:  didn't need to unlink
permanent stats file "pg_stat/pgstat.stat" - didn't exist
2024-08-28 07:33:30.195 UTC [572905] DEBUG:  resetting unlogged
relations: cleanup 1 init 0
2024-08-28 07:33:30.195 UTC [572905] DEBUG:  could not open file
"pg_wal/00010010": No such file or directory
2024-08-28 07:33:30.195 UTC [572905] LOG:  redo is not required
TRAP: FailedAssertion("!XLogRecPtrIsInvalid(RecPtr)", File:
"xlogreader.c", Line: 235, PID: 572905)
/usr/local/pgsql/bin/postgres(ExceptionalCondition+0x99)[0x562a1a4311f9]
/usr/local/pgsql/bin/postgres(+0x229d41)[0x562a1a05dd41]
/usr/local/pgsql/bin/postgres(FinishWalRecovery+0x79)[0x562a1a062519]
/usr/local/pgsql/bin/postgres(StartupXLOG+0x324)[0x562a1a0578e4]
/usr/local/pgsql/bin/postgres(InitPostgres+0x72a)[0x562a1a44344a]
/usr/local/pgsql/bin/postgres(PostgresMain+0xb1)[0x562a1a300261]
/usr/local/pgsql/bin/postgres(PostgresSingleUserMain+0xf1)[0x562a1a3024d1]
/usr/local/pgsql/bin/postgres(main+0x4f1)[0x562a19f918c1]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f12dc110d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f12dc110e40]
/usr/local/pgsql/bin/postgres(_start+0x25)[0x562a19f91925]
Aborted (core dumped)
```

The problem tracks down to commit
70e81861fadd9112fa2d425c762e163910a4ee52.
We only observe this problem on PostgreSQL version 15 (15.6).

=== My understanding

So, the xlogrecovery internals wrongly assume that if no WAL record was
successfully fetched, then redo is not needed.

=== Proposed fix

The fix is as simple as attached. It WFM, but this is probably not the
correct way to fix the issue.



-- 
Best regards,
Kirill Reshke


v1-0001-Fix.patch
Description: Binary data


Re: CSN snapshots in hot standby

2024-04-04 Thread Kirill Reshke
Hi,

On Thu, 4 Apr 2024 at 22:21, Heikki Linnakangas  wrote:

> You cannot run queries on a Hot Standby server until the standby has
> seen a running-xacts record. Furthermore if the subxids cache had
> overflowed, you also need to wait for those transactions to finish. That
> is usually not a problem, because we write a running-xacts record after
> each checkpoint, and most systems don't use so many subtransactions that
> the cache would overflow. Still, you can run into it if you're unlucky,
> and it's annoying when you do.
>
> It occurred to me that we could replace the known-assigned-xids
> machinery with CSN snapshots. We've talked about CSN snapshots many
> times in the past, and I think it would make sense on the primary too,
> but for starters, we could use it just during Hot Standby.
>
> With CSN-based snapshots, you don't have the limitation with the
> fixed-size known-assigned-xids array, and overflowed sub-XIDs are not a
> problem either. You can always enter Hot Standby and start accepting
> queries as soon as the standby is in a physically consistent state.
>
> I dusted up and rebased the last CSN patch that I found on the mailing
> list [1], and modified it so that it's only used during recovery. That
> makes some things simpler and less scary. There are no changes to how
> transaction commit happens in the primary, the CSN log is only kept
> up-to-date in the standby, when commit/abort records are replayed. The
> CSN of each transaction is the LSN of its commit record.
>
> The CSN approach is much simpler than the existing known-assigned-XIDs
> machinery, as you can see from "git diff --stat" with this patch:
>
>   32 files changed, 773 insertions(+), 1711 deletions(-)
>
> With CSN snapshots, we don't need the known-assigned-XIDs machinery, and
> we can get rid of the xact-assignment records altogether. We no longer
> need the running-xacts records for Hot Standby either, but I wasn't able
> to remove that because it's still used by logical replication, in
> snapbuild.c. I have a feeling that that could somehow be simplified too,
> but didn't look into it.
>
> This is obviously v18 material, so I'll park this at the July commitfest
> for now. There are a bunch of little FIXMEs in the code, and needs
> performance testing, but overall I was surprised how easy this was.
>
> (We ran into this issue particularly hard with Neon, because with Neon
> you don't need to perform WAL replay at standby startup. However, when
> you don't perform WAL replay, you don't get to see the running-xact
> record after the checkpoint either. If the primary is idle, it doesn't
> generate new running-xact records, and the standby cannot start Hot
> Standby until the next time something happens in the primary. It's
> always a potential problem with overflowed sub-XIDs cache, but the lack
> of WAL replay made it happen even when there are no subtransactions
> involved.)
>
> [1]
> https://www.postgresql.org/message-id/2020081009525213277261%40highgo.ca
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)


Great. I really like the idea of getting rid of KnownAssignedXids instead
of optimizing it (if optimizations are even possible).

> + /*
> + * TODO: We must mark CSNLOG first
> + */
> + CSNLogSetCSN(xid, parsed->nsubxacts, parsed->subxacts, lsn);
> +

As far as I understand, we simply use the current WAL record's LSN as the
XID's CSN. OK.
This seems to work for standby snapshots, but this patch may also be really
useful for distributed PostgreSQL solutions that use a CSN for working
with a distributed database snapshot (across multiple shards). Those
solutions need to set the CSN to some other value (a timestamp from
TrueTime/Clock-SI or whatever).
So, maybe we need some hooks here? Or maybe we can take the CSN here from
an extension somehow. For example, we could define
some interface and extend it. Does this sound reasonable to you?

Also, I attached a patch which adds some more todos.


v1-0001-Point-comments-needed-to-be-edited.patch
Description: Binary data


Re: Add last_commit_lsn to pg_stat_database

2024-04-08 Thread Kirill Reshke
Hi James,

There are some review comments in the thread that need to be addressed.
It seems that we need to mark this entry "Waiting on Author" and move it to
the next CF [0].

Thanks

[0] https://commitfest.postgresql.org/47/4355/

On Sat, 10 Jun 2023 at 05:27, James Coleman  wrote:

> I've previously noted in "Add last commit LSN to
> pg_last_committed_xact()" [1] that it's not possible to monitor how
> many bytes of WAL behind a logical replication slot is (computing such
> is obviously trivial for physical slots) because the slot doesn't need
> to replicate beyond the last commit. In some cases it's possible for
> the current WAL location to be far beyond the last commit. A few
> examples:
>
> - An idle server with checkout_timeout at a lower value (so lots of
> empty WAL advance).
> - Very large transactions: particularly insidious because committing a
> 1 GB transaction after a small transaction may show almost zero time
> lag even though quite a bit of data needs to be processed and sent
> over the wire (so time to replay is significantly different from
> current lag).
> - A cluster with multiple databases complicates matters further,
> because while physical replication is cluster-wide, the LSNs that
> matter for logical replication are database specific.
>
> Since we don't expose the most recent commit's LSN there's no way to
> say "the WAL is currently 1250, the last commit happened at 1000, the
> slot has flushed up to 800, therefore there are at most 200 bytes
> replication needs to read through to catch up.
>
> In the aforementioned thread [1] I'd proposed a patch that added a SQL
> function pg_last_commit_lsn() to expose the most recent commit's LSN.
> Robert Haas didn't think the initial version's modifications to
> commit_ts made sense, and a subsequent approach adding the value to
> PGPROC didn't have strong objections, from what I can see, but it also
> didn't generate any enthusiasm.
>
> As I was thinking about how to improve things, I realized that this
> information (since it's for monitoring anyway) fits more naturally
> into the stats system. I'd originally thought of exposing it in
> pg_stat_wal, but that's per-cluster rather than per-database (indeed,
> this is a flaw I hadn't considered in the original patch), so I think
> pg_stat_database is the correct location.
>
> I've attached a patch to track the latest commit's LSN in pg_stat_database.
>
> Regards,
> James Coleman
>
> 1:
> https://www.postgresql.org/message-id/flat/caaaqye9qbiau+j8rbun_jkbre-3hekluhfvvsyfsxqg0vql...@mail.gmail.com
>


Re: Allow non-superuser to cancel superuser tasks.

2024-04-09 Thread Kirill Reshke
Hi, thanks for looking into this.

On Tue, 9 Apr 2024 at 08:53, Michael Paquier  wrote:

> On Mon, Apr 08, 2024 at 05:42:05PM +, Leung, Anthony wrote:
> > Are you suggesting that we check if the backend is B_AUTOVAC in
> > pg_cancel/ terminate_backend? That seems a bit unclean to me since
> > pg_cancel_backend & pg_cancel_backend does not access to the
> > procNumber to check the type of the backend.
> >
> > IMHO, we can keep SIGNAL_BACKEND_NOAUTOVACUUM but just improve the
> > errmsg / errdetail to not expose that the backend is an AV
> > worker. It'll also be helpful if you can suggest what errdetail we
> > should use here.
>
> The thing is that you cannot rely on a lookup of the backend type for
> the error information, or you open yourself to letting the caller of
> pg_cancel_backend or pg_terminate_backend know if a backend is
> controlled by a superuser or if a backend is an autovacuum worker.
>

Good catch, thanks. I think we need to update the error message so that it
does not leak the backend type.

> The choice of pg_signal_autovacuum is a bit inconsistent, as well,
> because autovacuum workers operate like regular backends.  This name
> can also be confused with the autovacuum launcher.

Ok. What would be a good choice? Is `pg_signal_autovacuum_worker` good
enough?
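For context, a sketch of how the proposed role might be used (assuming the
pg_signal_autovacuum_worker name, which is still under discussion):

```
GRANT pg_signal_autovacuum_worker TO monitoring_user;

-- monitoring_user could then cancel autovacuum workers only:
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';
```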


Re: [PoC/RFC] Multiple passwords, interval expirations

2024-04-10 Thread Kirill Reshke
Hi!

I'm interested in having this feature in PostgreSQL core. I will
try to provide valuable reviews/comments/suggestions and other help.

On Tue, 10 Oct 2023 at 16:17, Gurjeet Singh  wrote:
>
> > On Mon, Oct 9, 2023 at 2:31 AM Gurjeet Singh  wrote:
> > >
> > > Next steps:
> > > - Break the patch into a series of smaller patches.
> > > - Add TAP tests (test the ability to actually login with these passwords)
> > > - Add/update documentation
> > > - Add more regression tests
>
> Please see attached the v4 of the patchset that introduces the notion
> of named passwords slots, namely 'first' and 'second' passwords, and
> allows users to address each of these passwords separately for the
> purposes of adding, dropping, or assigning expiration times.
>
> Apart from the changes described by each patch's commit title, one
> significant change since v3 is that now (included in v4-0002...patch)
> it is not allowed for a role to have a mix of a types of passwords.
> When adding a password, the patch ensures that the password being
> added uses the same hashing algorithm (md5 or scram-sha-256) as the
> existing password, if any.  Having all passwords of the same type
> helps the server pick the corresponding authentication method during
> connection attempt.
>
> The v3 patch also had a few bugs that were exposed by cfbot's
> automatic run. All those bugs have now been fixed, and the latest run
> on the v4 branch [1] on my private Git repo shows a clean run [1].
>
> The list of patches, and their commit titles are as follows:
>
> > v4-0001-...patch Add new columns to pg_authid
> > v4-0002-...patch Update password verification infrastructure to handle two 
> > passwords
> > v4-0003-...patch Added SQL support for ALTER ROLE to manage two passwords
> > v4-0004-...patch Updated pg_dumpall to support exporting a role's second 
> > password
> > v4-0005-...patch Update system views pg_roles and pg_shadow
> > v4-0006-...patch Updated pg_authid catalog documentation
> > v4-0007-...patch Updated psql's describe-roles meta-command
> > v4-0008-...patch Added documentation for ALTER ROLE command
> > v4-0009-...patch Added TAP tests to prove that a role can use two passwords 
> > to login
> > v4-0010-...patch pgindent run
> > v4-0011-...patch Run pgperltidy on files changed by this patchset
>
> Running pgperltidy updated many perl files unrelated to this patch, so
> in the last patch I chose to include only the one perl file that is
> affected by this patchset.
>
> [1]: password_rollover_v4 (910f81be54)
> https://github.com/gurjeet/postgres/commits/password_rollover_v4
>
> [2]: https://cirrus-ci.com/build/4675613999497216
>
> Best regards,
> Gurjeet
> http://Gurje.et


The latest attachments do not apply to HEAD anymore, so I have rebased
them. While rebasing, I made a couple of minor changes:

1) A small correction to the `plain_crypt_verify` comment. IMO this
reads a little better and is more comprehensible, doesn't it?

> - * 'shadow_pass' is the user's correct password hash, as stored in
> - * pg_authid's rolpassword or rolsecondpassword.
> + * 'shadow_pass' is one of the user's correct password hashes, as stored in
> + * pg_authid's.
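For context, this is roughly the shape of the function that comment
describes (a paraphrased declaration only, not verbatim and not something
the patch changes): the caller decides which stored hash to pass in as
shadow_pass, which is why "one of the user's correct password hashes"
reads better than tying the comment to a single column.

/*
 * Paraphrased for context: whichever stored hash the caller selects
 * (rolpassword or rolsecondpassword) is handed in as shadow_pass.
 */
int
plain_crypt_verify(const char *role, const char *shadow_pass,
                   const char *client_pass,
                   const char **logdetail);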

2) in v4-0004:

> /* note: rolconfig is dumped later */
> -   if (server_version >= 90600)
> +   if (server_version >= 17)
>         printfPQExpBuffer(buf,
>                           "SELECT oid, rolname, rolsuper, rolinherit, "
>                           "rolcreaterole, rolcreatedb, "
> -                         "rolcanlogin, rolconnlimit, rolpassword, "
> -                         "rolvaliduntil, rolreplication, rolbypassrls, "
> +                         "rolcanlogin, rolconnlimit, "
> +                         "rolpassword, rolvaliduntil, "
> +                         "rolsecondpassword, rolsecondvaliduntil, "
> +                         "rolreplication, rolbypassrls, "
> +                         "pg_catalog.shobj_description(oid, '%s') as rolcomment, "
> +                         "rolname = current_user AS is_current_user "
> +                         "FROM %s "
> +                         "WHERE rolname !~ '^pg_' "
> +                         "ORDER BY 2", role_catalog, role_catalog);
> +   else if (server_version >= 90600)
> +       printfPQExpBuffer(buf,
> +                         "SELECT oid, rolname, rolsuper, rolinherit, "
> +                         "rolcreaterole, rolcreatedb, "
> +                         "rolcanlogin, rolconnlimit, "
> +                         "rolpassword, rolvaliduntil, "
> +

Re: psql: Greatly speed up "\d tablename" when not using regexes

2024-04-10 Thread Kirill Reshke
Hi


> Regex matching is obviously unnecessary when we're looking for an exact
> match. This checks for this (common) case and starts using plain
> equality in that case.

+1

> + appendPQExpBuffer(buf, "(%s OPERATOR(pg_catalog.=) ", namevar);
> + appendStringLiteralConn(buf, &namebuf.data[2], conn);
> + appendPQExpBuffer(buf, "\nOR %s OPERATOR(pg_catalog.=) ",
> +  altnamevar);
> + appendStringLiteralConn(buf, &namebuf.data[2], conn);
> + appendPQExpBufferStr(buf, ")\n");

Do we need to force the collation here, like in the other branches?
if (PQserverVersion(conn) >= 12)
   appendPQExpBufferStr(buf, " COLLATE pg_catalog.default");
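For illustration, this is roughly what I have in mind; a sketch only,
assuming the same buf/conn/namevar/altnamevar/namebuf variables as in the
quoted hunk, and mirroring the way the existing pattern branches append
the collation:

appendPQExpBuffer(buf, "(%s OPERATOR(pg_catalog.=) ", namevar);
appendStringLiteralConn(buf, &namebuf.data[2], conn);
/* sketch: force the default collation on the exact match too, for discussion */
if (PQserverVersion(conn) >= 12)
    appendPQExpBufferStr(buf, " COLLATE pg_catalog.default");
appendPQExpBuffer(buf, "\nOR %s OPERATOR(pg_catalog.=) ", altnamevar);
appendStringLiteralConn(buf, &namebuf.data[2], conn);
if (PQserverVersion(conn) >= 12)
    appendPQExpBufferStr(buf, " COLLATE pg_catalog.default");
appendPQExpBufferStr(buf, ")\n");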




Re: psql: Greatly speed up "\d tablename" when not using regexes

2024-04-10 Thread Kirill Reshke
On Wed, 10 Apr 2024, 23:37 Jelte Fennema-Nio,  wrote:

> On Wed, 10 Apr 2024 at 20:21, Kirill Reshke 
> wrote:
> > Do we need to force the collation here, like in the other branches?
> > if (PQserverVersion(conn) >= 12)
> >    appendPQExpBufferStr(buf, " COLLATE pg_catalog.default");
>
> According to the commit and codecomment that introduced the COLLATE,
> it was specifically added for correct regex matching (e.g. \w). So I
> don't think it's necessary, and I'm pretty sure adding it will cause
> the index scan not to be used anymore.
>

Ok, thanks for the clarification. If all of this is actually true and the
patch really does provide a speedup, maybe we should state this in the comments?

>


Re: Allow non-superuser to cancel superuser tasks.

2024-04-11 Thread Kirill Reshke
Posting an updated version of this patch with the comments above addressed.

1) pg_signal_autovacuum -> pg_signal_autovacuum_worker, as there seem
to be no objections to that.

2)
There were several comments on how to write the if statement:

> In core, do_autovacuum() is only called in a process without
> a role specified

> It's feeling more natural here to check that we have a superuser-owned
> backend first, and then do a lookup of the process type.

> I figured since there's no reason to rely on that behavior, we might as
> well do a bit of future-proofing in case autovacuum workers are ever not
> run as InvalidOid.

I have combined them into this:

if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
&& pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER)

This is both future-proofing and natural, I suppose. The downside is
that the condition (!OidIsValid(proc->roleId) ||
superuser_arg(proc->roleId)) is checked twice, but I think that is OK
for the sake of simplicity.
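
To make the intent a bit clearer, here is a rough sketch of how this
condition is meant to sit inside the signal-permission check. The
surrounding shape, the role OID macro name
(ROLE_PG_SIGNAL_AUTOVACUUM_WORKER), and the SIGNAL_BACKEND_* return value
are assumptions for illustration only, not the exact patch text:

/* sketch only; names assumed, not the actual patch hunk */
if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
    pgstat_get_backend_type(GetNumberFromPGProc(proc)) == B_AUTOVAC_WORKER)
{
    /*
     * The target is an autovacuum worker owned by a superuser (or by no
     * role at all); allow signaling it only for members of the new
     * predefined role.
     */
    if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM_WORKER))
        return SIGNAL_BACKEND_NOPERMISSION;
}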

3) The pg_signal_autovacuum_worker OID was changed to a randomly chosen one: 8916

4)

> An invalid BackendType is not false, but B_INVALID.
fixed, thanks

5)

>>>> There is pg_read_all_stats as well, so I don't see a big issue in
>>>> requiring to be a member of this role as well for the sake of what's
>>>> proposing here.
>>>
>>> Well, that tells you quite a bit more than just which PIDs correspond to
>>> autovacuum workers, but maybe that's good enough for now.
>>
>> That may be a good initial compromise, for now.

> Sounds good to me. I will update the documentation.

@Anthony, if you feel that a documentation update adds much value here,
please go ahead. Given that autovacuum worker PIDs are already visible in
pg_stat_progress_vacuum, I am not sure what to say about
pg_stat_autovac_worker in the docs, or whether it is worth it.

6)
> + INJECTION_POINT("autovacuum-start");
> Perhaps autovacuum-worker-start is more suited here

fixed, thanks

7)

> +# Copyright (c) 2022-2024, PostgreSQL Global Development Group
> [...]
> +# Copyright (c) 2024-2024, PostgreSQL Global Development Group

> These need to be cleaned up.

> +# Makefile for src/test/recovery
> +#
> +# src/test/recovery/Makefile

> This is incorrect, twice.

Cleaned up, thanks!

8)

> Not sure that there is a huge point in checking after a role that
> holds pg_signal_backend.
Ok. Removed.

Then:

> +like($psql_err, qr/ERROR: permission denied to terminate ...
> Checking only the ERRROR, and not the DETAIL should be sufficient
> here.


After removing the pg_signal_backend test case we have only one place
where errors check is done. So, I think we should keep DETAIL here to
ensure detail is correct (it differs from regular backend case).

9)
> +# Test signaling for pg_signal_autovacuum role.
> This needs a better documentation:

Updated. I hope the test documentation now makes it easier to understand.

10)

> +ok($terminate_with_pg_signal_av == 0, "Terminating autovacuum worker should succeed with pg_signal_autovacuum role");
> Is that enough for the validation?

Added:
ok($node->log_contains(qr/FATAL: terminating autovacuum process due to administrator command/, $offset),
   "Autovacuum terminates when role is granted with pg_signal_autovacuum_worker");

11) References to the `passcheck` extension removed; error messages rephrased.

12) injection_point_detach added.

13)
> + INSERT INTO tab_int SELECT * FROM generate_series(1, 100);
> A good chunk of the test would be spent on that, but you don't need
> that many tuples to trigger an autovacuum worker as the short naptime
> is able to do it. I would recommend to reduce that to a minimum.

+1
A single tuple works.

14)

v3 suffers from a segfault:
2024-04-11 11:28:31.116 UTC [147437] 001_signal_autovacuum.pl LOG:
statement: SELECT pg_terminate_backend(147427);
2024-04-11 11:28:31.116 UTC [147427] FATAL:  terminating autovacuum
process due to administrator command
2024-04-11 11:28:31.116 UTC [147410] LOG:  server process (PID 147427)
was terminated by signal 11: Segmentation fault
2024-04-11 11:28:31.116 UTC [147410] LOG:  terminating any other
active server processes
2024-04-11 11:28:31.117 UTC [147410] LOG:  shutting down because
restart_after_crash is off
2024-04-11 11:28:31.121 UTC [147410] LOG:  database system is shut down

The test doesn't fail, because pg_terminate_backend actually achieves
its goal: the autovacuum worker is killed. But while dying, the worker
also hits a segfault. That is because of the injection points:

(gdb) bt
#0  0x56361c3379ea in tas (lock=0x7fbcb9632224 ) at
../../../../src/include/storage/s_lock.h:228
#1  ConditionVariableCancelSleep () at condition_variable.c:238
#2  0x56361c337e4b in ConditionVariableBroadcast
(cv=0x7fbcb66f498c) at condition_variable.c:310
#3  0x56361c330a40 in CleanupProcSignalState (status=, arg=) at procsignal.c:240
#4  0x56361c328801 in shmem_exit (code=code@entry=1) at ipc.c:276
#5  0x56361c3288fc in proc_exit_prepare (code=code@entry=1) at ipc.c:198
#6  0x56361c3289bf in proc_exit (code=code@entry=1) at ipc.

TerminateOtherDBBackends code comments inconsistency.

2024-04-11 Thread Kirill Reshke
Hi hackers!

While working on [0], I noticed this comment in the
TerminateOtherDBBackends function:

/*
* Check whether we have the necessary rights to terminate other
* sessions. We don't terminate any session until we ensure that we
* have rights on all the sessions to be terminated. These checks are
* the same as we do in pg_terminate_backend.
*
* In this case we don't raise some warnings - like "PID %d is not a
* PostgreSQL server process", because for us already finished session
* is not a problem.
*/

This statement is no longer true after 3a9b18b:
"These checks are the same as we do in pg_terminate_backend."

But the code is still correct, I assume... or not? In fact, we are
killing autovacuum workers that are working on the given database
(proc->roleId == 0), which is OK in that case. Are there any other
cases where proc->roleId == 0 but we should not be able to kill such a
process?


[0] 
https://www.postgresql.org/message-id/flat/CALdSSPiOLADNe1ZS-1dnQXWVXgGAC6%3D3TBKABu9C3_97YGOoMg%40mail.gmail.com#4e869bc4b6ad88afe70e4484ffd256e2




Re: Allow non-superuser to cancel superuser tasks.

2024-04-11 Thread Kirill Reshke
On Thu, 11 Apr 2024 at 19:07, Nathan Bossart  wrote:
>
> On Thu, Apr 11, 2024 at 04:55:59PM +0500, Kirill Reshke wrote:
> > Posting updated version of this patch with comments above addressed.
>
> I look for a commitfest entry for this one, but couldn't find it.  Would
> you mind either creating one

Done: https://commitfest.postgresql.org/48/4922/




Re: Allow non-superuser to cancel superuser tasks.

2024-04-12 Thread Kirill Reshke
On Thu, 11 Apr 2024 at 16:55, Kirill Reshke  wrote:

> 7)
>
> > +# Copyright (c) 2022-2024, PostgreSQL Global Development Group
> > [...]
> > +# Copyright (c) 2024-2024, PostgreSQL Global Development Group
>
> > These need to be cleaned up.
>
> > +# Makefile for src/test/recovery
> > +#
> > +# src/test/recovery/Makefile
>
> > This is incorrect, twice.
>
> Cleaned up, thanks!

Oh, wait, I did this wrong.

Should I use

+# Copyright (c) 2024-2024, PostgreSQL Global Development Group

(Like in src/test/signals/meson.build &
src/test/signals/t/001_signal_autovacuum.pl)
or

+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
(Like in src/test/signals/Makefile)

at the beginning of each added file?