Re: Comments on Custom RMGRs

2024-03-29 Thread Danil Anisimow
On Fri, Mar 22, 2024 at 2:02 AM Jeff Davis  wrote:
> On Thu, 2024-03-21 at 19:47 +0700, Danil Anisimow wrote:
> > [pgss_001.v1.patch] adds a custom resource manager to the
> > pg_stat_statements extension.
>
> Did you consider moving the logic for loading the initial contents from
> disk from pgss_shmem_startup to .rmgr_startup?

I tried it, but .rmgr_startup is not called if the system was shut down
cleanly.

> My biggest concern is that it might not be quite right for a table AM
> that has complex state that needs action to be taken at a slightly
> different time, e.g. right after CheckPointBuffers().

> Then again, the rmgr is a low-level API, and any extension using it
> should be prepared to adapt to changes. If it works for pgss, then we
> know it works for at least one thing, and we can always improve it
> later. For instance, we might call the hook several times and pass it a
> "phase" argument.

In [rmgr_003.v3.patch] I added a phase argument to RmgrCheckpoint().
Currently it is only called in two places: before and after
CheckPointBuffers().

--
Regards,
Daniil Anisimov
Postgres Professional: http://postgrespro.com
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 3e2f1d4a23..5a1fbe8379 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -44,8 +44,8 @@
 
 
 /* must be kept in sync with RmgrData definition in xlog_internal.h */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
-	{ name, redo, desc, identify, startup, cleanup, mask, decode },
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,checkpoint) \
+	{ name, redo, desc, identify, startup, cleanup, mask, decode, checkpoint },
 
 RmgrData	RmgrTable[RM_MAX_ID + 1] = {
 #include "access/rmgrlist.h"
@@ -83,6 +83,25 @@ RmgrCleanup(void)
 	}
 }
 
+/*
+ * Checkpoint all resource managers.
+ *
+ * See CreateCheckPoint for details about flags.
+ * phase shows a position in which RmgrCheckpoint is called in CheckPointGuts.
+ */
+void
+RmgrCheckpoint(int flags, RmgrCheckpointPhase phase)
+{
+	for (int rmid = 0; rmid <= RM_MAX_ID; rmid++)
+	{
+		if (!RmgrIdExists(rmid))
+			continue;
+
+		if (RmgrTable[rmid].rm_checkpoint != NULL)
+			RmgrTable[rmid].rm_checkpoint(flags, phase);
+	}
+}
+
 /*
  * Emit ERROR when we encounter a record with an RmgrId we don't
  * recognize.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 20a5f86209..d7ecab6769 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7357,8 +7357,13 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointSUBTRANS();
 	CheckPointMultiXact();
 	CheckPointPredicate();
+
+	RmgrCheckpoint(flags, RMGR_CHECKPOINT_BEFORE_BUFFERS);
+
 	CheckPointBuffers(flags);
 
+	RmgrCheckpoint(flags, RMGR_CHECKPOINT_AFTER_BUFFERS);
+
 	/* Perform all queued up fsyncs */
 	TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START();
 	CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c
index 22f7351fdc..11ae1e7af4 100644
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -28,7 +28,7 @@
  * RmgrNames is an array of the built-in resource manager names, to make error
  * messages a bit nicer.
  */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,checkpoint) \
   name,
 
 static const char *const RmgrNames[RM_MAX_ID + 1] = {
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 6b8c17bb4c..2bb5ba8c9f 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -32,7 +32,7 @@
 #include "storage/standbydefs.h"
 #include "utils/relmapper.h"
 
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,checkpoint) \
 	{ name, desc, identify},
 
 static const RmgrDescData RmgrDescTable[RM_N_BUILTIN_IDS] = {
diff --git a/src/include/access/rmgr.h b/src/include/access/rmgr.h
index 3b6a497e1b..34ddc0210c 100644
--- a/src/include/access/rmgr.h
+++ b/src/include/access/rmgr.h
@@ -19,7 +19,7 @@ typedef uint8 RmgrId;
  * Note: RM_MAX_ID must fit in RmgrId; widening that type will affect the XLOG
  * file format.
  */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,checkpoint) \
 	symname,
 
 typedef enum RmgrIds
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 78e6b908c6..0b03cc69be 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -24,26 +24,26 @@
  * Changes to this list possibly need an XLOG_PAGE_MAGIC bump.

Re: Comments on Custom RMGRs

2024-03-21 Thread Danil Anisimow
On Fri, Mar 1, 2024 at 2:06 AM Jeff Davis  wrote:
> Added to March CF.
>
> I don't have an immediate use case in mind for this, so please drive
> that part of the discussion. I can't promise this for 17, but if the
> patch is simple enough and a quick consensus develops, then it's
> possible.

[pgss_001.v1.patch] adds a custom resource manager to the
pg_stat_statements extension. The proposed patch is not a complete solution
for pgss and may not work correctly with replication.

The 020_crash.pl test demonstrates server interruption by killing a
backend. Without rm_checkpoint hook, the server restores pgss stats only
after last CHECKPOINT. Data added to WAL before the checkpoint is not
restored.

The rm_checkpoint hook allows saving shared memory data to disk at each
checkpoint. However, for pg_stat_statements, it matters when the checkpoint
occurred. When the server shuts down, pgss deletes the temporary file of
query texts. In other cases, this is unacceptable.
To provide this capability, a flags parameter was added to the
rm_checkpoint hook. The changes are presented in [rmgr_003.v2.patch].

--
Regards,
Daniil Anisimov
Postgres Professional: http://postgrespro.com
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 3e2f1d4a23..ad0a1d5134 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -44,8 +44,8 @@
 
 
 /* must be kept in sync with RmgrData definition in xlog_internal.h */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
-	{ name, redo, desc, identify, startup, cleanup, mask, decode },
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,checkpoint) \
+	{ name, redo, desc, identify, startup, cleanup, mask, decode, checkpoint },
 
 RmgrData	RmgrTable[RM_MAX_ID + 1] = {
 #include "access/rmgrlist.h"
@@ -83,6 +83,22 @@ RmgrCleanup(void)
 	}
 }
 
+/*
+ * Checkpoint all resource managers.
+ */
+void
+RmgrCheckpoint(int flags)
+{
+	for (int rmid = 0; rmid <= RM_MAX_ID; rmid++)
+	{
+		if (!RmgrIdExists(rmid))
+			continue;
+
+		if (RmgrTable[rmid].rm_checkpoint != NULL)
+			RmgrTable[rmid].rm_checkpoint(flags);
+	}
+}
+
 /*
  * Emit ERROR when we encounter a record with an RmgrId we don't
  * recognize.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 20a5f86209..d21bf8ae24 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7357,6 +7357,9 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointSUBTRANS();
 	CheckPointMultiXact();
 	CheckPointPredicate();
+
+	RmgrCheckpoint(flags);
+
 	CheckPointBuffers(flags);
 
 	/* Perform all queued up fsyncs */
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c
index 22f7351fdc..11ae1e7af4 100644
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -28,7 +28,7 @@
  * RmgrNames is an array of the built-in resource manager names, to make error
  * messages a bit nicer.
  */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,checkpoint) \
   name,
 
 static const char *const RmgrNames[RM_MAX_ID + 1] = {
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 6b8c17bb4c..2bb5ba8c9f 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -32,7 +32,7 @@
 #include "storage/standbydefs.h"
 #include "utils/relmapper.h"
 
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,checkpoint) \
 	{ name, desc, identify},
 
 static const RmgrDescData RmgrDescTable[RM_N_BUILTIN_IDS] = {
diff --git a/src/include/access/rmgr.h b/src/include/access/rmgr.h
index 3b6a497e1b..34ddc0210c 100644
--- a/src/include/access/rmgr.h
+++ b/src/include/access/rmgr.h
@@ -19,7 +19,7 @@ typedef uint8 RmgrId;
  * Note: RM_MAX_ID must fit in RmgrId; widening that type will affect the XLOG
  * file format.
  */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,checkpoint) \
 	symname,
 
 typedef enum RmgrIds
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 78e6b908c6..0b03cc69be 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -24,26 +24,26 @@
  * Changes to this list possibly need an XLOG_PAGE_MAGIC bump.
  */
 
-/* symbol name, textual name, redo, desc, identify, startup, cleanup, mask, decode */
-PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL, NULL, xlog_decode)
-PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, xact_identify, NULL, NULL, NULL, xact_decode)
-PG_RMGR(RM_SMGR_ID, "Storage", smgr_redo, smgr_desc, smgr_identify, NULL, NULL, NULL, NULL)
-PG_RMGR(RM_CLOG_ID, "CLOG", 

Re: Comments on Custom RMGRs

2024-02-29 Thread Danil Anisimow
On Tue, Feb 27, 2024 at 2:56 AM Jeff Davis  wrote:
> Let's pick this discussion back up, then. Where should the hook go?
> Does it need to be broken into phases like resource owners? What
> guidance can we provide to extension authors to use it correctly?
>
> Simon's right that these things don't need to be 100% answered for
> every hook we add; but I agree with Andres and Robert that this could
> benefit from some more discussion about the details.
>
> The proposal calls the hook right after CheckPointPredicate() and
> before CheckPointBuffers(). Is that the right place for the use case
> you have in mind with pg_stat_statements?

Hello!

Answering your questions might take some time as I want to write a sample
patch for pg_stat_statements and make some tests.
What do you think about putting the patch to commitfest as it closing in a
few hours?

--
Regards,
Daniil Anisimov
Postgres Professional: http://postgrespro.com


Re: Comments on Custom RMGRs

2024-02-26 Thread Danil Anisimow
Hi,

The checkpoint hook looks very useful, especially for extensions that have
their own storage, like pg_stat_statements.
For example, we can keep work data in shared memory and save it only during
checkpoints.
When recovering, we need to read all the data from the disk and then repeat
the latest changes from the WAL.

On Mon, Feb 26, 2024 at 2:42 PM Simon Riggs 
wrote:
> On Fri, 13 May 2022 at 00:42, Andres Freund  wrote:
> >
> > On 2022-05-12 22:26:51 +0100, Simon Riggs wrote:
> > > On Thu, 12 May 2022 at 04:40, Andres Freund 
wrote:
> > > > I'm not happy with the idea of random code being executed in the
middle of
> > > > CheckPointGuts(), without any documentation of what is legal to do
at that
> > > > point.
> > >
> > > The "I'm not happy.." ship has already sailed with pluggable rmgrs.
> >
> > I don't agree. The ordering within a checkpoint is a lot more fragile
than
> > what you do in an individual redo routine.
>
> Example?
>
>
> > > Checkpoints exist for one purpose - as the starting place for
recovery.
> > >
> > > Why would we allow pluggable recovery without *also* allowing
> > > pluggable checkpoints?
> >
> > Because one can do a lot of stuff with just pluggable WAL records,
without
> > integrating into checkpoints?
> >
> > Note that I'm *not* against making checkpoint extensible - I just think
it
> > needs a good bit of design work around when the hook is called etc.
>
> When was any such work done previously for any other hook?? That isn't
needed.
>
> Checkpoints aren't complete until all data structures have
> checkpointed, so there are no problems from a partial checkpoint being
> written.
>
> As a result, the order of actions in CheckpointGuts() is mostly
> independent of each other. The SLRUs are all independent of each
> other, as is CheckPointBuffers().
>
> The use cases I'm trying to support aren't tricksy modifications of
> existing code, they are just entirely new data structures which are
> completely independent of other Postgres objects.
>
>
> > I definitely think it's too late in the cycle to add checkpoint
extensibility
> > now.
> >
> >
> > > > To actually be useful we'd likely need multiple calls to such an
rmgr
> > > > callback, with a parameter where in CheckPointGuts() we are. Right
now the
> > > > sequencing is explicit in CheckPointGuts(), but with the proposed
callback,
> > > > that'd not be the case anymore.
> > >
> > > It is useful without the extra complexity you mention.
> >
> > Shrug. The documentation work definitely is needed. Perhaps we can get
away
> > without multiple callbacks within a checkpoint, I think it'll become
more
> > apparent when writing information about the precise point in time the
> > checkpoint callback is called.
>
> You seem to be thinking in terms of modifying the existing actions in
> CheckpointGuts(). I don't care about that. Anybody that wishes to do
> that can work out the details of their actions.
>
> There is nothing to document, other than "don't do things that won't
> work". How can anyone enumerate all the things that wouldn't work??
>
> There is no list of caveats for any other hook. Why is it needed here?

There are easily reproducible issues where rm_checkpoint() throws an ERROR.
When it occurs at the end-of-recovery checkpoint, the server fails to start
with a message like this:
ERROR:  Test error
FATAL:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.

Even if we remove the broken extension from shared_preload_libraries, we
get the following message in the server log:
FATAL:  resource manager with ID 128 not registered
HINT:  Include the extension module that implements this resource manager
in shared_preload_libraries.

In both cases, with or without the extension in shared_preload_libraries,
the server cannot start.

This seems like a programmer's problem, but what should the user do after
receiving such messages?

Maybe it would be safer to use something like after_checkpoint_hook, which
would be called after the checkpoint is completed?
This is enough for some cases when we only need to save shared memory to
disk.

--
Regards,
Daniil Anisimov
Postgres Professional: http://postgrespro.com


Estimation rows of FULL JOIN

2023-12-20 Thread Danil Anisimow
Hi.

In some cases, the planner underestimates FULL JOIN.

Example:

postgres=# CREATE TABLE t AS SELECT x AS a, null AS b FROM
generate_series(1, 10) x;
postgres=# ANALYZE;
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2
ON t1.b = t2.b;
QUERY PLAN

---
 Hash Full Join  (cost=1.23..2.37 rows=10 width=72) (actual rows=20 loops=1)
   Hash Cond: (t1.b = t2.b)
   ->  Seq Scan on t t1  (cost=0.00..1.10 rows=10 width=36) (actual rows=10
loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on t t2  (cost=0.00..1.10 rows=10 width=36) (actual
rows=10 loops=1)
 Planning Time: 0.067 ms
 Execution Time: 0.052 ms
(8 rows)

Are these simple changes enough to improve this situation?

diff --git a/src/backend/optimizer/path/costsize.c
b/src/backend/optimizer/path/costsize.c
index ef475d95a18..9cd43b778f3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5259,6 +5259,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
break;
case JOIN_FULL:
nrows = outer_rows * inner_rows * fkselec * jselec;
+   if (2 * nrows < outer_rows + inner_rows)
+   nrows = outer_rows + inner_rows - nrows;
if (nrows < outer_rows)
nrows = outer_rows;
if (nrows < inner_rows)

There is no error in the above case:

postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2
ON t1.b = t2.b;
QUERY PLAN

---
 Hash Full Join  (cost=1.23..2.37 rows=20 width=72) (actual rows=20 loops=1)
   Hash Cond: (t1.b = t2.b)
   ->  Seq Scan on t t1  (cost=0.00..1.10 rows=10 width=36) (actual rows=10
loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on t t2  (cost=0.00..1.10 rows=10 width=36) (actual
rows=10 loops=1)
 Planning Time: 0.069 ms
 Execution Time: 0.065 ms
(8 rows)


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2022-12-09 Thread Danil Anisimow
Hi!

I have looked at your patch and have a few questions.

110: static bool SafeCopying(CopyFromState cstate, ExprContext *econtext,
111: TupleTableSlot *myslot);
---
636: bool
637: SafeCopying(CopyFromState cstate, ExprContext *econtext,
TupleTableSlot *myslot)

Why is there no static keyword in the definition of the SafeCopying()
function, but it is in the function declaration.

675: MemoryContext cxt =
MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
676:
677: valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values,
myslot->tts_isnull);
678: tuple_is_valid = valid_row;
679:
680: if (valid_row)
681: sfcstate->safeBufferBytes += cstate->line_buf.len;
682:
683: CurrentMemoryContext = cxt;

Why are you using a direct assignment to CurrentMemoryContext instead of
using the MemoryContextSwitchTo function in the SafeCopying() routine?

1160: /* Standard copying with option "safe copying" enabled by
IGNORE_ERRORS. */
1161: if (!SafeCopying(cstate, econtext, myslot))
1162: break;

I checked with GDB that the CurrentMemoryContext changes when SafeCopying
returns. And the target context may be different each time you do a COPY in
psql.

1879: cstate->sfcstate->safe_cxt = AllocSetContextCreate(oldcontext,
1880: "Safe_context",
1881: ALLOCSET_DEFAULT_SIZES);

When you initialize the cstate->sfcstate structure, you create a
cstate->sfcstate->safe_cxt memory context that inherits from oldcontext.
Was it intended to use cstate->copycontext as the parent context here?

On Wed, Nov 2, 2022 at 11:46 AM Damir Belyalov  wrote:

> Updated the patch:
> - Optimized and simplified logic of IGNORE_ERRORS
> - Changed variable names to more understandable ones
> - Added an analogue of MAX_BUFFERED_BYTES for safe buffer
>
>
> Regards,
> Damir Belyalov
> Postgres Professional
>

Regards,
Daniil Anisimov
Postgres Professional