date:20240226

On Mon, Feb 26, 2024 at 05:00:13PM +0800, Japin Li wrote:
> On Mon, 26 Feb 2024 at 16:41, jian he  wrote:
>> obvious typo errors.

These would cause compilation failures.  Saying that, this is a very
nice cleanup, so I've fixed these and applied the patch after checking
that the one-one replacements were correct.

About 0002, I can't help but notice pg_enc2icu_tbl and
pg_enc2gettext_tb.  There are exceptions like MULE_INTERNAL, but is it
possible to do better?
--
Michael

signature.asc
Description: PGP signature

Re: A failure in t/001_rep_changes.pl

2024-02-26 Thread Kyotaro Horiguchi

At Fri, 23 Feb 2024 15:50:21 +0530, vignesh C  wrote in 
> By any chance do you have the log files when this failure occurred, if
> so please share it.

In my understanding, within a single instance, no two proclists can
simultaneously share the same waitlink member of PGPROC.

On the other hand, a publisher uses two condition variables for slots
and WAL waiting, which work on the same PGPROC member cvWaitLink. I
suspect this issue arises from the configuration. However, although it
is unlikly related to this specific issue, a similar problem can arise
in instances that function both as logical publisher and physical
primary.

Regardless of this issue, I think we should provide separate waitlink
members for condition variables that can possibly be used
simultaneously.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Injection points: some tools to wait and wake

2024-02-26 Thread Andrey M. Borodin




> On 27 Feb 2024, at 04:29, Michael Paquier  wrote:
> 
> For
> example, the test just posted here does not rely on that:
> https://www.postgresql.org/message-id/zdyzya4yrnapw...@ip-10-97-1-34.eu-west-3.compute.internal

Instead, that test is scanning logs

+ # Note: $node_primary->wait_for_replay_catchup($node_standby) would be
+ # hanging here due to the injection point, so check the log instead.+
+ my $terminated = 0;
+ for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+ {
+ if ($node_standby->log_contains(
+ 'terminating process .* to release replication slot 
\"injection_activeslot\"', $logstart))
+ {
+ $terminated = 1;
+ last;
+ }
+ usleep(100_000);
+ }

But, AFAICS, the purpose is the same: wait until event happened.


Best regards, Andrey Borodin.

Re: Improve readability by using designated initializers when possible

On Mon, Feb 26, 2024 at 05:00:13PM +0800, Japin Li wrote:
> On Mon, 26 Feb 2024 at 16:41, jian he  wrote:
>> similarly, last entry, no need an extra comma?
>> also other places last array entry no need extra comma.
> 
> For last entry comma, see [1].
> 
> [1] 
> https://www.postgresql.org/message-id/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org

And also see commit 611806cd726f.  This makes the diffs more elegant
to the eye when adding new elements at the end of these arrays.
--
Michael


signature.asc
Description: PGP signature

Re: WIP Incremental JSON Parser

2024-02-26 Thread Andrew Dunstan




On 2024-02-26 Mo 19:20, Andrew Dunstan wrote:


On 2024-02-26 Mo 10:10, Jacob Champion wrote:

On Mon, Feb 26, 2024 at 7:08 AM Jacob Champion
 wrote:

As a brute force example of the latter, with the attached diff I get
test failures at chunk sizes 1, 2, 3, 4, 6, and 12.

But this time with the diff.



Ouch. I'll check it out. Thanks!





The good news is that the parser is doing fine - this issue was due to a 
thinko on my part in the test program that got triggered by the input 
file size being an exact multiple of the chunk size. I'll have a new 
patch set later this week.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Potential issue in ecpg-informix decimal converting functions

On Mon, Feb 26, 2024 at 12:28:51AM +0100, Daniel Gustafsson wrote:
> Yeah, I think this is for HEAD only, especially given the lack of complaints
> against backbranches.

Daniel, are you planning to look at that?  I haven't done any detailed
lookup, but would be happy to do so it that helps.
--
Michael


signature.asc
Description: PGP signature

Re: cleanup patches for dshash

On Mon, Feb 26, 2024 at 03:55:10PM -0600, Nathan Bossart wrote:
> Committed.

I noticed that I forgot to update a couple of comments.  While fixing
those, I discovered additional oversights that have been around since 2017.
I plan to commit this shortly. 

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 298b98552f7e5586a268de9aeaec533631a9f608 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 26 Feb 2024 22:33:41 -0600
Subject: [PATCH 1/1] Fix comments for the dshash_parameters struct.

I recently added a copy_function member to the dshash_parameters
struct, but I forgot to update a couple of comments that refer to
the function pointer members of this struct.  While fixing that, I
noticed that one comment refers to a tranche_name member that was
removed during development of dshash tables.  The same comment also
refers to non-arg variants of the function pointer members, but
those were removed shortly after dshash table support was first
committed.

Oversights in commits 8c0d7bafad, d7694fc148, and 42a1de3013.
---
 src/backend/lib/dshash.c |  2 +-
 src/include/lib/dshash.h | 14 ++
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index cc49b4ca51..ab30f29eee 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -202,7 +202,7 @@ static inline void copy_key(dshash_table *hash_table, void *dest,
  * Create a new hash table backed by the given dynamic shared area, with the
  * given parameters.  The returned object is allocated in backend-local memory
  * using the current MemoryContext.  'arg' will be passed through to the
- * compare and hash functions.
+ * compare, hash, and copy functions.
  */
 dshash_table *
 dshash_create(dsa_area *area, const dshash_parameters *params, void *arg)
diff --git a/src/include/lib/dshash.h b/src/include/lib/dshash.h
index 2ff1ba6c24..e26f3ef97f 100644
--- a/src/include/lib/dshash.h
+++ b/src/include/lib/dshash.h
@@ -43,15 +43,13 @@ typedef void (*dshash_copy_function) (void *dest, const void *src, size_t size,
 
 /*
  * The set of parameters needed to create or attach to a hash table.  The
- * members tranche_id and tranche_name do not need to be initialized when
- * attaching to an existing hash table.
+ * tranche_id does not need to be initialized when attaching to an existing
+ * hash table.
  *
- * Compare and hash functions must be supplied even when attaching, because we
- * can't safely share function pointers between backends in general.  Either
- * the arg variants or the non-arg variants should be supplied; the other
- * function pointers should be NULL.  If the arg variants are supplied then the
- * user data pointer supplied to the create and attach functions will be
- * passed to the hash and compare functions.
+ * Compare, hash, and copy functions must be supplied even when attaching,
+ * because we can't safely share function pointers between backends in general.
+ * The user data pointer supplied to the create and attach functions will be
+ * passed to these functions.
  */
 typedef struct dshash_parameters
 {
-- 
2.25.1

The const expression evaluation routine should always return a copy

2024-02-26 Thread Andrei Lepikhov


IMO, the routine eval_const_expressions_mutator contains some stale code:

case T_SubPlan:
case T_AlternativeSubPlan:
  /*
   * Return a SubPlan unchanged --- too late to do anything with it.
   *
   * XXX should we ereport() here instead?  Probably this routine
   * should never be invoked after SubPlan creation.
   */
   return node;

At least, this code could be achieved with estimate_expression_value(). 
So, we should fix the comment. But maybe we need to do a bit more. 
According to the mutator idea, only the Query node is returned 
unchanged. If the Subplan node is on top of the expression, the call 
above returns the same node, which may be unconventional.
I'm not totally sure about the impossibility of constantifying SubPlan: 
we already have InitPlans for uncorrelated subplans. Maybe something 
about parameters that can be estimated as constants at this level and, 
as a result, allow to return a Const instead of SubPlan?
But at least we can return a flat copy of the SubPplan node just for the 
convention — the same thing for the AlternativeSubPlan. See the patch in 
the attachment.


--
regards,
Andrei Lepikhov
Postgres ProfessionalFrom a227e7e72d917726085001027c350d2fda2ad3c4 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 27 Feb 2024 11:00:23 +0700
Subject: [PATCH] Force eval_const_expressions_mutator to return a copy of the
 node.

In some situations, when SubPlan or AlternativeSubPlan nodes are on the top of
the expression tree, this routine returns the same node, which could baffle
users who, remembering the mutator convention, wanted to free the evaluation
result immediately.

Author: a.rybakina.
---
 src/backend/optimizer/util/clauses.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/util/clauses.c 
b/src/backend/optimizer/util/clauses.c
index edc25d712e..f09c6c6420 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2907,15 +2907,28 @@ eval_const_expressions_mutator(Node *node,
}
 
case T_SubPlan:
+   {
+   SubPlan *subplan = makeNode(SubPlan);
+
+   memcpy(subplan, node, sizeof(SubPlan));
+
+   /*
+* Return a SubPlan unchanged. If the subplan had been 
uncorrelated
+* it already have been converted to an InitPlan.
+*/
+   return (Node *) subplan;
+   }
case T_AlternativeSubPlan:
+   {
+   AlternativeSubPlan *subplan = 
makeNode(AlternativeSubPlan);
+
+   memcpy(subplan, node, sizeof(AlternativeSubPlan));
 
/*
 * Return a SubPlan unchanged --- too late to do 
anything with it.
-*
-* XXX should we ereport() here instead?  Probably this 
routine
-* should never be invoked after SubPlan creation.
 */
-   return node;
+   return (Node *) subplan;
+   }
case T_RelabelType:
{
RelabelType *relabel = (RelabelType *) node;
-- 
2.43.2

Re: Streaming I/O, vectored I/O (WIP)

2024-02-26 Thread Robert Haas

On Tue, Feb 27, 2024 at 9:25 AM Thomas Munro  wrote:
> Here's the 2 step version.  The streaming_read.c API is unchanged, but
> the bugmfr.c API now has only the following extra functions:
>
>   bool StartReadBuffers(..., int *nblocks, ..., ReadBuffersOperation *op)
>   WaitReadBuffers(ReadBuffersOperation *op)

I wonder if there are semi-standard names that people use for this
kind of API. Somehow I like "start" and "finish" or "start" and
"complete" better than "start" and "wait". But I don't really know
what's best. If there's a usual practice, it'd be good to adhere to
it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Streaming I/O, vectored I/O (WIP)

2024-02-26 Thread Thomas Munro

On Wed, Feb 7, 2024 at 11:54 PM Nazir Bilal Yavuz  wrote:
> 0001-Provide-vectored-variant-of-ReadBuffer:
>
> - Do we need to pass the hit variable to ReadBuffer_common()? I think
> it can be just declared in the ReadBuffer_common() now.

Right, thanks!  Done, in the version I'll post shortly.

> 0003-Provide-API-for-streaming-reads-of-relations:
>
> - Do we need to re-think about getting a victim buffer logic?
> StrategyGetBuffer() function errors if it can not find any unpinned
> buffers, this can be more common in the async world since we pin
> buffers before completing the read (while looking ahead).

Hmm, well that is what the pin limit machinery is supposed to be the
solution to.  It has always been possible to see that error if
shared_buffers is too small, so that your backends can't pin what they
need to make progress because there are too many other backends, the
whole buffer pool is pinned and there is nothing available to
steal/evict.  Here, sure, we pin more stuff per backend, but not more
than a "fair share", that is, Buffers / max backends, so it's not any
worse, is it?  Well maybe it's marginally worse in some case, for
example if a query that uses many streams has one pinned buffer per
stream (which we always allow) where before we'd have acquired and
released pins in a slightly different sequence or whatever, but there
is always going to be a minimum shared_buffers that will work at all
for a given workload and we aren't changing it by much if at all here.
If you're anywhere near that limit, your performance must be so bad
that it'd only be a toy setting anyway.  Does that sound reasonable?

Note that this isn't the first to use multi-pin logic or that limit
mechanism: that was the new extension code that shipped in 16.  This
will do that more often, though.

> - If the returned block from the callback is an invalid block,
> pg_streaming_read_look_ahead() sets pgsr->finished = true. Could there
> be cases like the returned block being an invalid block but we should
> continue to read after this invalid block?

Yeah, I think there will be, and I think we should do it with some
kind of reset/restart function.  I don't think we need it for the
current users so I haven't included it yet (there is a maybe-related
discussion about reset for another reasons, I think Melanie has an
idea about that), but I think something like that will useful for
future stuff like streaming recovery, where you can run out of WAL to
read but more will come via the network soon.

> - max_pinned_buffers and pinned_buffers_trigger variables are set in
> the initialization part (in the
> pg_streaming_read_buffer_alloc_internal() function) then they do not
> change. In some cases there could be no acquirable buffers to pin
> while initializing the pgsr (LimitAdditionalPins() set
> max_pinned_buffers to 1) but while the read is continuing there could
> be chances to create larger reads (other consecutive reads are
> finished while this read is continuing). Do you think that trying to
> reset max_pinned_buffers and pinned_buffers_trigger to have higher
> values after the initialization to have larger reads make sense?

That sounds hard!  You're right that in the execution of a query there
might well be cases like that (inner and outer scan of a hash join
don't actually run at the same time, likewise for various other plan
shapes), and something that would magically and dynamically balance
resource usage might be ideal, but I don't know where to begin.
Concretely, as long as your buffer pool is measured in gigabytes and
your max backends is measured in hundreds, the per backend pin limit
should actually be fairly hard to hit anyway, as it would be in the
thousands.  So I don't think it is as important as other resource
usage balance problems that we also don't attempt (memory, CPU, I/O
bandwidth).

> +/* Is there a head range that we can't extend? */
> +head_range = >ranges[pgsr->head];
> +if (head_range->nblocks > 0 &&
> +(!need_complete ||
> + !head_range->need_complete ||
> + head_range->blocknum + head_range->nblocks != blocknum))
> +{
> +/* Yes, time to start building a new one. */
> +head_range = pg_streaming_read_new_range(pgsr);
>
> - I think if both need_complete and head_range->need_complete are
> false, we can extend the head range regardless of the consecutiveness
> of the blocks.

Yeah, I think we can experiment with ideas like that.  Not done yet
but I'm thinking about it -- more shortly.

> 0006-Allow-streaming-reads-to-ramp-up-in-size:
>
> - ramp_up_pin_limit variable is declared as an int but we do not check
> the overflow while doubling it. This could be a problem in longer
> reads.

But it can't get above very high, because eventually it exceeds
max_pinned_buffers, which is anchored to the ground by various small
limits.

Re: Fix incorrect PG_GETARG in pgcrypto

On Mon, Feb 26, 2024 at 02:47:27PM +0100, Tomas Vondra wrote:
> Should this be marked as committed, or is there some remaining part?

Thanks.  I've missed the existence of [1].  It is now marked as
committed. 

[1]: https://commitfest.postgresql.org/47/4822/
--
Michael


signature.asc
Description: PGP signature

Re: Printing backtrace of postgres processes

On Mon, Feb 26, 2024 at 04:05:05PM +0100, Christoph Berg wrote:
> I tried that now. Mind that I'm not a benchmarking expert, and there's
> been quite some jitter in the results, but I think there's a clear
> trend.
> 
> Even if we regard the 1873 as an outlier, I've seen many vanilla runs
> with 22xx tps, and not a single v28 run with 22xx tps. Other numbers I
> collected suggested a cost of at least 3% for the feature.

Thanks for the numbers.  Yes, that's annoying and I suspect could be
noticeable for a lot of users..
--
Michael


signature.asc
Description: PGP signature

Re: Better error messages for %TYPE and %ROWTYPE in plpgsql

2024-02-26 Thread David G. Johnston

On Mon, Feb 26, 2024 at 6:54 PM Andy Fan  wrote:

>
> "David G. Johnston"  writes:
>
> > On Mon, Feb 26, 2024 at 5:46 PM Andy Fan  wrote:
> >
> >  > Per recent discussion[1], plpgsql returns fairly unhelpful "syntax
> >  > error" messages when a %TYPE or %ROWTYPE construct references a
> >  > nonexistent object.  Here's a quick little finger exercise to try
> >  > to improve that.
> >
> >  Looks this modify the error message, I want to know how ould we treat
> >  error-message-compatible issue during minor / major upgrade.
> >
> > There is no bug here so no back-patch; and we are not yet past feature
> freeze for v17.
>
> Acutally I didn't asked about back-patch.


What else should I be understanding when you write the words "minor
upgrade"?


> So if the error message is changed, the above code may be broken.
>
>
A fair point to bring up, and is change-specific.  User-facing error
messages should be informative and where they are not changing them is
reasonable.  Runtime errors probably need more restraint since they are
more likely to be in a production monitoring alerting system but anything
that is reporting what amounts to a syntax error should be reasonable to
change and not expect people to be writing production code looking for
them.  This seems to fall firmly into the "badly written code"/syntax
category.

David J.

Re: Better error messages for %TYPE and %ROWTYPE in plpgsql

"David G. Johnston"  writes:

> On Mon, Feb 26, 2024 at 5:46 PM Andy Fan  wrote:
>
>  > Per recent discussion[1], plpgsql returns fairly unhelpful "syntax
>  > error" messages when a %TYPE or %ROWTYPE construct references a
>  > nonexistent object.  Here's a quick little finger exercise to try
>  > to improve that.
>
>  Looks this modify the error message, I want to know how ould we treat
>  error-message-compatible issue during minor / major upgrade.
>
> There is no bug here so no back-patch; and we are not yet past feature freeze 
> for v17.

Acutally I didn't asked about back-patch.  I meant error message is an
part of user interface, if we change a error message, the end
user may be impacted, at least in theory. for example, end-user has some
code like this:

String errMsg = ex.getErrorMsg();

if (errMsg.includes("a-target-string"))
{
// do sth.
}

So if the error message is changed, the above code may be broken.

I have little experience on this, so I want to know the policy we are
using, for the background which I said in previous reply.

-- 
Best Regards
Andy Fan

RE: Synchronizing slots from primary to standby

2024-02-26 Thread Zhijie Hou (Fujitsu)

On Monday, February 26, 2024 1:19 PM Amit Kapila  
wrote:
> 
> On Fri, Feb 23, 2024 at 4:45 PM Bertrand Drouvot
>  wrote:
> >
> > On Fri, Feb 23, 2024 at 09:46:00AM +, Zhijie Hou (Fujitsu) wrote:
> > >
> > > Besides, I'd like to clarify and discuss the behavior of 
> > > standby_slot_names
> once.
> > >
> > > As it stands in the patch, If the slots specified in
> > > standby_slot_names are dropped or invalidated, the logical walsender
> > > will issue a WARNING and continue to replicate the changes. Another
> > > option for this could be to have the walsender pause until the slot
> > > in standby_slot_names is re-created or becomes valid again. Does anyone
> else have an opinion on this matter ?
> >
> > Good point, I'd vote for: the only reasons not to wait are:
> >
> > - slots mentioned in standby_slot_names exist and valid and do catch
> > up or
> > - standby_slot_names is empty
> >
> > The reason is that setting standby_slot_names to a non empty value
> > means that one wants the walsender to wait until the standby catchup.
> > The way to remove this intentional behavior should be by changing the
> > standby_slot_names value (not the existence or the state of the slot(s) it
> points too).
> >
> 
> It seems we already do wait for the case when there is an inactive slot as 
> per the
> below code [1] in the patch. So, probably waiting in other cases is also okay 
> and
> also as this parameter is a SIGHUP parameter, users should be easily able to
> change its value if required. Do you think it is a good idea to mention this 
> in
> docs as well?
> 
> I think it is important to raise WARNING as the patch is doing in all the 
> cases
> where the slot is not being processed so that users can be notified and they 
> can
> take the required action.

Agreed. Here is the V99 patch which addressed the above.

This version also includes:
1. list_free the slot list when reloading the list due to GUC change.
2. Refactored the validate_standby_slots based on Shveta's suggestion.
3. Added errcode for the warnings as most of existing have errcodes.

Amit's latest comments[1] are pending, we will address that in next version.

[1] 
https://www.postgresql.org/message-id/CAA4eK1LJdmGATWG%3DxOD1CB9cogukk2cLNBGH8h-n-ZDJuwBdJg%40mail.gmail.com

Best Regards,
Hou zj



v99-0002-Document-the-steps-to-check-if-the-standby-is-re.patch
Description:  v99-0002-Document-the-steps-to-check-if-the-standby-is-re.patch


v99-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
Description:  v99-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch

Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2024-02-26 Thread Xing Guo

On Tue, Feb 27, 2024 at 6:38 AM Stephen Frost  wrote:
>
> Greetings,
>
> * Xing Guo (higuox...@gmail.com) wrote:
> > Looks that many hackers are happy with the original patch except that
> > the original patch needs a simple rebase, though it has been 3 years.
>
> I'm not completely against the idea of adding these hooks, but I have to
> say that it's unfortunate to imagine having to use an extension for
> something like quotas as it's really something that would ideally be in
> core.
>
> > Shall we push forward this patch so that it can benefit extensions not
> > only diskquota?
>
> Would be great to hear about other use-cases for these hooks, which
> would also help us be comfortable that these are the right hooks to
> introduce with the correct arguments.  What are the other extensions
> that you're referring to here..?

Sorry, we don't have another concrete extension that utilizes the
smgrcreate / smgrtruncate / smgrdounlinkall hooks right now. But we
have a transparent data encryption extension that utilizes the
smgrread and smgrwrite hooks to mutate the read buffer and write
buffer in place to perform file level encryption and decryption. I
think smgrwrite / smgrread / smgrcreate / smgrtruncate and
smgrdounlinkall hooks fall into the same catagory and these hooks are
different from the recent proposed storage manager API hooks[1].
Probably it is worth starting a new thread and discussing them.

[1]: 
https://www.postgresql.org/message-id/flat/CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK%3DPCNWEMrA%40mail.gmail.com

> Thanks,
>
> Stephen

Best Regards,
Xing

Re: Better error messages for %TYPE and %ROWTYPE in plpgsql

"David G. Johnston"  writes:
> On Mon, Feb 26, 2024 at 5:46 PM Andy Fan  wrote:
>> Looks this modify the error message,

Well, yeah, that's sort of the point.

>> I want to know how ould we treat
>> error-message-compatible issue during minor / major upgrade.

> There is no bug here so no back-patch; and we are not yet past feature
> freeze for v17.

Indeed, I did not intend this for back-patch.  However, I'm having
a hard time seeing the errors in question as something you'd have
automated handling for, so I don't grasp why you would think
there's a compatibility hazard.

regards, tom lane

Re: Better error messages for %TYPE and %ROWTYPE in plpgsql

2024-02-26 Thread David G. Johnston

On Mon, Feb 26, 2024 at 5:46 PM Andy Fan  wrote:

> > Per recent discussion[1], plpgsql returns fairly unhelpful "syntax
> > error" messages when a %TYPE or %ROWTYPE construct references a
> > nonexistent object.  Here's a quick little finger exercise to try
> > to improve that.
>
> Looks this modify the error message, I want to know how ould we treat
> error-message-compatible issue during minor / major upgrade.
>

There is no bug here so no back-patch; and we are not yet past feature
freeze for v17.

David J.

Re: Better error messages for %TYPE and %ROWTYPE in plpgsql




> Per recent discussion[1], plpgsql returns fairly unhelpful "syntax
> error" messages when a %TYPE or %ROWTYPE construct references a
> nonexistent object.  Here's a quick little finger exercise to try
> to improve that.

Looks this modify the error message, I want to know how ould we treat
error-message-compatible issue during minor / major upgrade. I'm not
sure if my question is too inconceivable, I ask this because one of my
patch [1] has blocked on this kind of issue [only] for 2 months and
actaully the error-message-compatible requirement was metioned by me at
the first and resolve it by adding a odd parameter. Then the odd
parameter blocked the whole process. 

[1] https://www.postgresql.org/message-id/87r0hmvuvr@163.com

-- 
Best Regards
Andy Fan

Re: WIP Incremental JSON Parser

2024-02-26 Thread Andrew Dunstan




On 2024-02-26 Mo 10:10, Jacob Champion wrote:

On Mon, Feb 26, 2024 at 7:08 AM Jacob Champion
 wrote:

As a brute force example of the latter, with the attached diff I get
test failures at chunk sizes 1, 2, 3, 4, 6, and 12.

But this time with the diff.



Ouch. I'll check it out. Thanks!


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Injection points: some tools to wait and wake

On Mon, Feb 26, 2024 at 02:10:49PM +0500, Andrey M. Borodin wrote:
> So that we could do something like
> 
> ok(node_standby->await_injection_point(“CreateRestartPoint”,”checkpointer"));

It would be more flexible with a full string to describe the test
rather than a process name in the second argument.

> IMO, this could make many tests cleaner.
> Or, perhaps, it’s a functionality for a future development?

This could just happen as separate refactoring, I guess.  But I'd wait
to see if more tests requiring scans to pg_stat_activity pop up.  For
example, the test just posted here does not rely on that:
https://www.postgresql.org/message-id/zdyzya4yrnapw...@ip-10-97-1-34.eu-west-3.compute.internal
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] updates to docs about HOT updates for BRIN

2024-02-26 Thread Stephen Frost

Greetings,

* Elizabeth Christensen (elizabeth.christen...@crunchydata.com) wrote:
> > On Feb 26, 2024, at 4:21 PM, Stephen Frost  wrote:
> > * Elizabeth Christensen (elizabeth.christen...@crunchydata.com) wrote:
> >> I have a small documentation patch to the HOT updates page
> >> to add references
> >> to summary (BRIN) indexes not blocking HOT updates
> >> ,
> >> committed 19d8e2308b.
> > 
> > Sounds good to me, though the attached patch didn't want to apply, and
> > it isn't immediately clear to me why, but perhaps you could try saving
> > the patch from a different editor and re-sending it?
> 
> Thanks Stephen, attempt #2 here. 

Here's an updated patch which tries to improve on the wording a bit by
having it be a bit more consistent.  Would certainly welcome feedback on
it though, of course.  This is a tricky bit of language to write and
a complex optimization to explain.

Thanks!

Stephen
From bb88237d1f807572a496ff2a267ab56bef61ac8e Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 26 Feb 2024 17:17:54 -0500
Subject: [PATCH] docs: Update HOT update docs for 19d8e2308b

Commit 19d8e2308b changed when the HOT update optimization is possible
but neglected to update the Heap-Only Tuples (HOT) documentation.  This
patch updates that documentation accordingly.

Author: Elizabeth Christensen
Reviewed-By: Stephen Frost
Backpatch-through: 16
Discussion: https://postgr.es/m/CABoUFXRjisr58Ct_3VsFEdQx+fJeQTWTdJnM7XAp=8mubto...@mail.gmail.com
---
 doc/src/sgml/storage.sgml | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3ea4e5526d..d6c53a8d25 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -1097,8 +1097,11 @@ data. Empty in ordinary tables.
   

 
- The update does not modify any columns referenced by the table's
- indexes, including expression and partial indexes.
+ The update only modifies columns which are not referenced by the
+ table's indexes or are only referenced by summarizing indexes (such
+ as BRIN) and does not update any columns
+ referenced by the table's non-summarizing indexes, including
+ expression and partial non-summarizing indexes.
  


@@ -1114,7 +1117,8 @@ data. Empty in ordinary tables.
   

 
- New index entries are not needed to represent updated rows.
+ New index entries are not needed to represent updated rows, however,
+ summary indexes may still need to be updated.
 


@@ -1130,11 +1134,10 @@ data. Empty in ordinary tables.
  
 
  
-  In summary, heap-only tuple updates can only be created
-  if columns used by indexes are not updated.  You can
-  increase the likelihood of sufficient page space for
-  HOT updates by decreasing a table's fillfactor.
+  In summary, heap-only tuple updates can only be created if columns used by
+  non-summarizing indexes are not updated. You can increase the likelihood of
+  sufficient page space for HOT updates by decreasing
+  a table's fillfactor.
   If you don't, HOT updates will still happen because
   new rows will naturally migrate to new pages and existing pages with
   sufficient free space for new row versions.  The system view 

signature.asc
Description: PGP signature

Re: Sequence Access Methods, round two

On Mon, Feb 26, 2024 at 09:38:06AM +0100, Matthias van de Meent wrote:
> On Mon, 26 Feb 2024 at 09:11, Michael Paquier  wrote:
>> Thanks, I've applied these two for now.  I'll reply to the rest
>> tomorrow or so.
> 
> Huh, that's surprising to me. I'd expected this to get at least a
> final set of patches before they'd get committed.

FWIW, these refactoring pieces just make sense taken independently,
IMHO.  I don't think that the rest of the patch set is going to make
it into v17, because there's no agreement about the layer we want,
which depend on the use cases we want to solve.  Perhaps 0001 or 0004
could be salvaged.  0005~ had no real design discussion, so it's good
for 18~ as far as I am concerned.  That's something that would be fit
for an unconference session at the next pgconf in Vancouver, in
combination with what we should do to support sequences across logical
replication setups.

> After a quick check
> 6e951bf seems fine, but I do have some nits on 449e798c:

Thanks.

>> +/* 
>> + *validate_relation_kind - check the relation's kind
>> + *
>> + *Make sure relkind is from an index
> 
> Shouldn't this be "... from a sequence"?

Right, will fix.

>> + * 
>> + */
>> +static inline void
>> +validate_relation_kind(Relation r)
> 
> Shouldn't this be a bit more descriptive than just
> "validate_relation_kind"? I notice this is no different from how this
> is handled in index.c and table.c, but I'm not a huge fan of shadowing
> names, even with static inlines functions.

Not sure that it matters much, TBH.  This is local to sequence.c.

>> -ERROR:  "serialtest1" is not a sequence
>> +ERROR:  cannot open relation "serialtest1"
>> +DETAIL:  This operation is not supported for tables.
> 
> We seem to lose some details here: We can most definitely open tables.
> We just can't open them while treating them as sequences, which is not
> mentioned in the error message.

I am not sure to agree with that.  The user already knows that he
should be dealing with a sequence based on the DDL used, and we gain
information about the relkind getting manipulated here.
--
Michael

signature.asc
Description: PGP signature

Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2024-02-26 Thread Thomas Munro

[Sorry to those who received this message twice -- the first time got
bounced by the list because of a defunct email address in the CC
list.]

Here is a rebase of Palak's v2 patch.  I didn't change anything except
for the required resource manager API change, a pgindent run, and
removal of a stray file, and there is still some feedback to be
addressed before we can get this in, but I wanted to fix the bitrot
and re-open this CF item because this is very useful work.  It's
essential for testing the prefetching-related stuff happening in
various other threads, where you want to be able to get the buffer
pool into various interesting states.
From 911776e07a767e8775f9e43948f4145f72d1de65 Mon Sep 17 00:00:00 2001
From: Palak 
Date: Fri, 30 Jun 2023 08:21:06 +
Subject: [PATCH v3] Invalidate Buffer By Bufnum

---
 contrib/pg_buffercache/Makefile   |  2 +-
 contrib/pg_buffercache/meson.build|  1 +
 .../pg_buffercache--1.4--1.5.sql  |  6 ++
 contrib/pg_buffercache/pg_buffercache.control |  2 +-
 contrib/pg_buffercache/pg_buffercache_pages.c | 35 +
 src/backend/storage/buffer/bufmgr.c   | 74 +++
 src/include/storage/bufmgr.h  |  2 +
 7 files changed, 120 insertions(+), 2 deletions(-)
 create mode 100644 contrib/pg_buffercache/pg_buffercache--1.4--1.5.sql

diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile
index d6b58d4da94..eae65ead9e5 100644
--- a/contrib/pg_buffercache/Makefile
+++ b/contrib/pg_buffercache/Makefile
@@ -8,7 +8,7 @@ OBJS = \
 EXTENSION = pg_buffercache
 DATA = pg_buffercache--1.2.sql pg_buffercache--1.2--1.3.sql \
 	pg_buffercache--1.1--1.2.sql pg_buffercache--1.0--1.1.sql \
-	pg_buffercache--1.3--1.4.sql
+	pg_buffercache--1.3--1.4.sql pg_buffercache--1.4--1.5.sql
 PGFILEDESC = "pg_buffercache - monitoring of shared buffer cache in real-time"
 
 REGRESS = pg_buffercache
diff --git a/contrib/pg_buffercache/meson.build b/contrib/pg_buffercache/meson.build
index c86e33cc958..1ca3452918b 100644
--- a/contrib/pg_buffercache/meson.build
+++ b/contrib/pg_buffercache/meson.build
@@ -22,6 +22,7 @@ install_data(
   'pg_buffercache--1.2--1.3.sql',
   'pg_buffercache--1.2.sql',
   'pg_buffercache--1.3--1.4.sql',
+  'pg_buffercache--1.4--1.5.sql',
   'pg_buffercache.control',
   kwargs: contrib_data_args,
 )
diff --git a/contrib/pg_buffercache/pg_buffercache--1.4--1.5.sql b/contrib/pg_buffercache/pg_buffercache--1.4--1.5.sql
new file mode 100644
index 000..c96848c77d8
--- /dev/null
+++ b/contrib/pg_buffercache/pg_buffercache--1.4--1.5.sql
@@ -0,0 +1,6 @@
+\echo Use "ALTER EXTENSION pg_buffercache UPDATE TO '1.5'" to load this file. \quit
+
+CREATE FUNCTION pg_buffercache_invalidate(IN int, IN bool default true)
+RETURNS bool
+AS 'MODULE_PATHNAME', 'pg_buffercache_invalidate'
+LANGUAGE C PARALLEL SAFE;
diff --git a/contrib/pg_buffercache/pg_buffercache.control b/contrib/pg_buffercache/pg_buffercache.control
index a82ae5f9bb5..5ee875f77dd 100644
--- a/contrib/pg_buffercache/pg_buffercache.control
+++ b/contrib/pg_buffercache/pg_buffercache.control
@@ -1,5 +1,5 @@
 # pg_buffercache extension
 comment = 'examine the shared buffer cache'
-default_version = '1.4'
+default_version = '1.5'
 module_pathname = '$libdir/pg_buffercache'
 relocatable = true
diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index 33167323653..485cd988dd4 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -64,6 +64,41 @@ PG_FUNCTION_INFO_V1(pg_buffercache_pages);
 PG_FUNCTION_INFO_V1(pg_buffercache_summary);
 PG_FUNCTION_INFO_V1(pg_buffercache_usage_counts);
 
+PG_FUNCTION_INFO_V1(pg_buffercache_invalidate);
+Datum
+pg_buffercache_invalidate(PG_FUNCTION_ARGS)
+{
+	Buffer		bufnum;
+	bool		result;
+	bool		force;
+
+	if (PG_ARGISNULL(0))
+	{
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("buffernum cannot be NULL")));
+	}
+
+	bufnum = PG_GETARG_INT32(0);
+	if (bufnum <= 0 || bufnum > NBuffers)
+	{
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("buffernum is not valid")));
+
+	}
+
+	/*
+	 * Check whether to force invalidate the dirty buffer. The default value
+	 * of force is true.
+	 */
+
+	force = PG_GETARG_BOOL(1);
+
+	result = TryInvalidateBuffer(bufnum, force);
+	PG_RETURN_BOOL(result);
+}
+
 Datum
 pg_buffercache_pages(PG_FUNCTION_ARGS)
 {
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index bdf89bbc4dc..1d0f185edbc 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -5705,3 +5705,77 @@ ResOwnerPrintBufferPin(Datum res)
 {
 	return DebugPrintBufferRefcount(DatumGetInt32(res));
 }
+
+/*
+ * Try Invalidating a buffer using bufnum.
+ * If the buffer is invalid, the function returns false.
+ * The function checks for dirty buffer and flushes the dirty buffer

Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2024-02-26 Thread Stephen Frost

Greetings,

* Xing Guo (higuox...@gmail.com) wrote:
> Looks that many hackers are happy with the original patch except that
> the original patch needs a simple rebase, though it has been 3 years.

I'm not completely against the idea of adding these hooks, but I have to
say that it's unfortunate to imagine having to use an extension for
something like quotas as it's really something that would ideally be in
core.

> Shall we push forward this patch so that it can benefit extensions not
> only diskquota?

Would be great to hear about other use-cases for these hooks, which
would also help us be comfortable that these are the right hooks to
introduce with the correct arguments.  What are the other extensions
that you're referring to here..?

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: [PATCH] updates to docs about HOT updates for BRIN

2024-02-26 Thread Elizabeth Christensen



> On Feb 26, 2024, at 4:21 PM, Stephen Frost  wrote:
> 
> Greetings,
> 
> * Elizabeth Christensen (elizabeth.christen...@crunchydata.com) wrote:
>> I have a small documentation patch to the HOT updates page
>> to add references
>> to summary (BRIN) indexes not blocking HOT updates
>> ,
>> committed 19d8e2308b.
> 
> Sounds good to me, though the attached patch didn't want to apply, and
> it isn't immediately clear to me why, but perhaps you could try saving
> the patch from a different editor and re-sending it?
> 
> Thanks,
> 
> Stephen

Thanks Stephen, attempt #2 here. 

Elizabeth



v2-0001-Adding-Summary-Index-Info-to-HOT-Update-Documentatio.patch
Description: Binary data

Re: [PATCH] updates to docs about HOT updates for BRIN

2024-02-26 Thread Stephen Frost

Greetings,

* Elizabeth Christensen (elizabeth.christen...@crunchydata.com) wrote:
> I have a small documentation patch to the HOT updates page
> to add references
> to summary (BRIN) indexes not blocking HOT updates
> ,
> committed 19d8e2308b.

Sounds good to me, though the attached patch didn't want to apply, and
it isn't immediately clear to me why, but perhaps you could try saving
the patch from a different editor and re-sending it?

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: cleanup patches for dshash

On Fri, Feb 23, 2024 at 03:52:16PM -0600, Nathan Bossart wrote:
> If there are no objections, I plan to commit these patches early next week.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: Add publisher and subscriber to glossary documentation.

2024-02-26 Thread Andrew Atkinson

If there's a movement towards "node" to refer to the database which has the
Subscription object, then perhaps the documentation for

31.2. Subscription, Chapter 31. Logical Replication

should be updated as well, since it uses both the "database" and "node"
terms on the same page, and to me referring to the same thing (I could be
missing a subtlety).

See:

"The subscriber database..."

"A subscriber node may..."

Also, the word "database" in this sentence: "A subscription defines the
connection to another database" to me works, but I think using "node" there
could be more consistent if it’s referring to the server instance running
the database that holds the PUBLICATION. The connection string information
example later on the page shows "host" and "dbname" configured in the
CONNECTION value for the SUBSCRIPTION. This sentence seems like the use of
"database" in casual style to mean the "server instance" (or "node").

Also, the "The node where a subscription is defined". That one actually
feels to me like "The database where a subscription is defined", but then
that contradicts what I just said, and "node" is fine here but I think
"node" should be on the preceding sentence too.

Anyway, hopefully these examples show “node” and “database” are mixed and
perhaps others agree using one consistently might help the goals of the
docs.

Thanks!

On Mon, Feb 26, 2024 at 1:08 AM Peter Smith  wrote:

> Hi, the patch v4 LGTM.
>
> ==
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
>
>
>

Re: Streaming read-ready sequential scan code

2024-02-26 Thread Melanie Plageman

On Mon, Feb 19, 2024 at 6:05 PM Melanie Plageman
 wrote:
>
> On Mon, Jan 29, 2024 at 4:17 PM Melanie Plageman
>  wrote:
> >
> > There is an outstanding question about where to allocate the
> > PgStreamingRead object for sequential scans
>
> I've written three alternative implementations of the actual streaming
> read user for sequential scan which handle the question of where to
> allocate the streaming read object and how to handle changing scan
> direction in different ways.
>
> Option A) 
> https://github.com/melanieplageman/postgres/tree/seqscan_pgsr_initscan_allocation
> - Allocates the streaming read object in initscan(). Since we do not
> know the scan direction at this time, if the scan ends up not being a
> forwards scan, the streaming read object must later be freed -- so
> this will sometimes allocate a streaming read object it never uses.
> - Only supports ForwardScanDirection and once the scan direction
> changes, streaming is never supported again -- even if we return to
> ForwardScanDirection
> - Must maintain a "fallback" codepath that does not use the streaming read API

Attached is a version of this patch which implements a "reset"
function for the streaming read API which should be cheaper than the
full pg_streaming_read_free() on rescan. This can easily be ported to
work on any of my proposed implementations (A/B/C). I implemented it
on A as an example.

- Melanie
From ee93bf41b41aefd6fe20c41be15d2fd5e5d75181 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 29 Jan 2024 11:50:01 -0500
Subject: [PATCH v2 2/5] Replace blocks with buffers in heapgettup control flow

Future commits will introduce the streaming read API and the sequential
scan streaming read API user. Streaming read API users implement a
callback which returns the next block to read. Sequential scans
previously looped through the blocks in the relation, synchronously
reading in a block and then processing it. An InvalidBlockNumber
returned by heapgettup_advance_block() meant that the relation was
exhausted and all blocks had been processed.

The streaming read API may exhaust the blocks in a relation (having read
all of them into buffers) before they have all been processed by the
sequential scan. As such, the sequential scan should continue processing
blocks until heapfetchbuf() returns InvalidBuffer.

Note that this commit does not implement the streaming read API user. It
simply restructures heapgettup() and heapgettup_pagemode() to use
buffers instead of blocks for control flow.

Not all sequential scans will support streaming reads. As such, this
code will remain for compatability even after sequential scans support
streaming reads.
---
 src/backend/access/heap/heapam.c | 79 ++--
 1 file changed, 35 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 449221da6ac..e0fe3d9c326 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -87,6 +87,9 @@ static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
  LockTupleMode mode, LockWaitPolicy wait_policy,
  bool *have_tuple_lock);
+static inline BlockNumber heapgettup_advance_block(HeapScanDesc scan,
+   BlockNumber block, ScanDirection dir);
+static inline BlockNumber heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir);
 static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 	  uint16 old_infomask2, TransactionId add_to_xmax,
 	  LockTupleMode mode, bool is_update,
@@ -463,14 +466,12 @@ heapbuildvis(TableScanDesc sscan)
 /*
  * heapfetchbuf - subroutine for heapgettup()
  *
- * This routine reads the specified block of the relation into a buffer and
- * returns with that pinned buffer saved in the scan descriptor.
+ * This routine reads the next block of the relation into a buffer and returns
+ * with that pinned buffer saved in the scan descriptor.
  */
 static inline void
-heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
-	Assert(block < scan->rs_nblocks);
-
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -485,10 +486,19 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-	   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
+	if (!scan->rs_inited)
+	{
+		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+
+	/* read block if valid */
+	if (BlockNumberIsValid(scan->rs_cblock))
+		scan->rs_cbuf =

Re: Detoasting optionally to make Explain-Analyze less misleading

2024-02-26 Thread stepan rutz


Hi Matthias, thanks for picking it up. I still believe this is valuable
to a lot of people out there. Thanks for dealing with my proposal.
Matthias, Tom, Tomas everyone.

Two (more or less) controversial remarks from side.

1. Actually serialization should be the default for "analyze" in
explain, as current analyze doesn't detoast and thus distorts the result
in extreme (but common) cases easily by many order of magnitude (see my
original video on that one). So current "explain analyze" only works for
some queries and since detoasting is really transparent, it is quite
something to leave detoasting out of explain analyze. This surprises
people all the time, since explain analyze suggests it "runs" the query
more realistically.

2. The bandwidth I computed in one of the previous versions of the patch
was certainly cluttering up the explain output and it is misleading yes,
but then again it performs a calculation people will now do in their
head. The "bandwidth" here is how much data your query gets out of
backend by means of the query and the deserialization. So of course if
you do id-lookups you get a single row and such querries do have a lower
data-retrieval bandwidth compared to bulk querries. However having some
measure of how fast data is delivered from the backend especially on
larger joins is still a good indicator of one aspect of a query.

Sorry for the remarks. Both are not really important, just restating my
points here. I understand the objections and reasons that speak against
both points and believe the current scope is just right.

/Stepan



On 26.02.24 20:30, Matthias van de Meent wrote:

Hi,

I've taken the liberty to update this patch, and register it in the
commitfest app to not lose track of progress [0].

The attached v8 patch measures scratch memory allocations (with MEMORY
option), total time spent in serialization (with TIMING on, measures
are inclusive of unseparated memcpy to the message buffer), and a
count of produced bytes plus the output format used (text or binary).
It's a light rework of the earlier 0007 patch, I've reused tests and
some infrastructure, while the implementation details and comments
have been updated significantly.

I think we can bikeshed on format and names, but overall I think the
patch is in a very decent shape.

Stepan, thank you for your earlier work, and feel free to check it out
or pick it up again if you want to; else I'll try to get this done.

Kind regards,

Matthias van de Meent

[0] https://commitfest.postgresql.org/47/4852/

Re: An improved README experience for PostgreSQL

2024-02-26 Thread Andrew Atkinson

Thanks for the feedback Nathan and Tom. Samay also suggested adding the
patch. I've added a .patch with the file for consideration.

On Mon, Feb 26, 2024 at 2:30 PM Tom Lane  wrote:

> Nathan Bossart  writes:
> > I think this would be nice.  If the Markdown version is reasonably
> readable
> > as plain-text, maybe we could avoid maintaining two READMEs files, too.
> > But overall, +1 to modernizing the README a bit.
>
> Per past track record, we change the top-level README only once every
> three years or so, so I doubt it'd be too painful to maintain two
> versions of it.
>
> Having said that, any proposal for this ought to be submitted as
> a patch, rather than expecting people to go digging around on
> some other repo.
>
> regards, tom lane
>


0001-Adds-modernized-README.md.patch
Description: Binary data

Re: An improved README experience for PostgreSQL

Nathan Bossart  writes:
> I think this would be nice.  If the Markdown version is reasonably readable
> as plain-text, maybe we could avoid maintaining two READMEs files, too.
> But overall, +1 to modernizing the README a bit.

Per past track record, we change the top-level README only once every
three years or so, so I doubt it'd be too painful to maintain two
versions of it.

Having said that, any proposal for this ought to be submitted as
a patch, rather than expecting people to go digging around on
some other repo.

regards, tom lane

[PATCH] updates to docs about HOT updates for BRIN

2024-02-26 Thread Elizabeth Christensen

Hello,

I have a small documentation patch to the HOT updates page
to add references
to summary (BRIN) indexes not blocking HOT updates
,
committed 19d8e2308b.

Thanks,
Elizabeth Christensen


0001-Adding-Summary-Index-Info-to-HOT-Update-Documentatio.patch
Description: Binary data

Re: Allow non-superuser to cancel superuser tasks.

2024-02-26 Thread Kirill Reshke

On Mon, 26 Feb 2024 at 20:10, Nathan Bossart 
wrote:

> On Mon, Feb 26, 2024 at 12:38:40PM +0500, Kirill Reshke wrote:
> > I see 2 possible ways to implement this. The first one is to have hool in
> > pg_signal_backend, and define a hook in extension which can do the thing.
> > The second one is to have a predefined role. Something like a
> > `pg_signal_autovacuum` role which can signal running autovac to cancel.
> But
> > I don't see how we can handle specific `application_name` with this
> > solution.
>
> pg_signal_autovacuum seems useful given commit 3a9b18b.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com

Thank you for your response.
Please find a patch attached.

In patch, pg_signal_autovacuum role with oid 6312 added. I grabbed oid from
unused_oids script output.
Also, tap tests for functionality added. I'm not sure where to place them,
so I placed them in a separate directory in `src/test/`
Seems that regression tests for this feature are not possible, am i right?
Also, I was thinking of pg_signal_autovacuum vs pg_signal_backend.
Should pg_signal_autovacuum have power of pg_signal_backend (implicity)? Or
should this role have such little scope...

v1-0001-Allow-non-superuser-to-cancel-superuser-tasks.patch
Description: Binary data

Re: libpq: PQfnumber overload for not null-terminated strings

Ivan Trofimov  writes:
>> If you need counted strings
>> for PQfnumber, wouldn't you need them for every single other
>> string-based API in libpq as well?

> No, not really.

> Thing is, out of all the functions listed in "34.3.2. Retrieving Query 
> Result Information" and "34.3.3. Retrieving Other Result Information" 
> PQfnumber is the only (well, technically also PQprint) one that takes a 
> string as an input.

I think that's a mighty myopic definition of which APIs would need
counted-string support if we were to make that a thing in libpq.
Just for starters, why are you only concerned with processing a
query result, and not with the functions needed to send the query?

> Right now as a library writer in a higher-level language I'm forced to 
> either
> * Sacrifice performance to ensure 'column_name' is null-terminated 
> (that's what some bindings in Rust do)

I'd go with that.  You would have a very hard time convincing me that
the per-query overhead that building a null-terminated string adds
is even measurable compared to the time needed to send, process, and
receive the query.

regards, tom lane

Re: Better error messages for %TYPE and %ROWTYPE in plpgsql

2024-02-26 Thread Pavel Stehule

po 26. 2. 2024 v 21:02 odesílatel Tom Lane  napsal:

> Per recent discussion[1], plpgsql returns fairly unhelpful "syntax
> error" messages when a %TYPE or %ROWTYPE construct references a
> nonexistent object.  Here's a quick little finger exercise to try
> to improve that.
>
> The basic point is that plpgsql_parse_wordtype and friends are
> designed to return NULL rather than failing (at least when it's
> easy to do so), but that leaves the caller without enough info
> to deliver a good error message.  There is only one caller,
> and it has no use at all for this behavior, so let's just
> change those functions to throw appropriate errors.  Amusingly,
> plpgsql_parse_wordrowtype was already behaving that way, and
> plpgsql_parse_cwordrowtype did so in more cases than not,
> so we didn't even have a consistent "return NULL" story.
>
> Along the way I got rid of plpgsql_parse_cwordtype's restriction
> on what relkinds can be referenced.  I don't really see the
> point of that --- as long as the relation has the desired
> column, the column's type is surely well-defined.
>

+1

Pavel


> regards, tom lane
>
> [1]
> https://www.postgresql.org/message-id/flat/88b574f4-cc08-46c5-826b-020849e5a356%40gelassene-pferde.biz
>
>

Better error messages for %TYPE and %ROWTYPE in plpgsql

Per recent discussion[1], plpgsql returns fairly unhelpful "syntax
error" messages when a %TYPE or %ROWTYPE construct references a
nonexistent object.  Here's a quick little finger exercise to try
to improve that.

The basic point is that plpgsql_parse_wordtype and friends are
designed to return NULL rather than failing (at least when it's
easy to do so), but that leaves the caller without enough info
to deliver a good error message.  There is only one caller,
and it has no use at all for this behavior, so let's just
change those functions to throw appropriate errors.  Amusingly,
plpgsql_parse_wordrowtype was already behaving that way, and
plpgsql_parse_cwordrowtype did so in more cases than not,
so we didn't even have a consistent "return NULL" story.

Along the way I got rid of plpgsql_parse_cwordtype's restriction
on what relkinds can be referenced.  I don't really see the
point of that --- as long as the relation has the desired
column, the column's type is surely well-defined.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/88b574f4-cc08-46c5-826b-020849e5a356%40gelassene-pferde.biz

diff --git a/src/pl/plpgsql/src/expected/plpgsql_misc.out b/src/pl/plpgsql/src/expected/plpgsql_misc.out
index faddba2511..a6511df08e 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_misc.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_misc.out
@@ -29,3 +29,39 @@ CREATE OR REPLACE PROCEDURE public.test2(IN x integer)
 BEGIN ATOMIC
  SELECT (x + 2);
 END
+-- Test %TYPE and %ROWTYPE error cases
+create table misc_table(f1 int);
+do $$ declare x foo%type; begin end $$;
+ERROR:  variable "foo" does not exist
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x notice%type; begin end $$;  -- covers unreserved-keyword case
+ERROR:  variable "notice" does not exist
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x foo.bar%type; begin end $$;
+ERROR:  relation "foo" does not exist
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x foo.bar.baz%type; begin end $$;
+ERROR:  schema "foo" does not exist
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x public.foo.bar%type; begin end $$;
+ERROR:  relation "public.foo" does not exist
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x public.misc_table.zed%type; begin end $$;
+ERROR:  column "zed" of relation "misc_table" does not exist
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x foo%rowtype; begin end $$;
+ERROR:  relation "foo" does not exist
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x notice%rowtype; begin end $$;  -- covers unreserved-keyword case
+ERROR:  relation "notice" does not exist
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x foo.bar%rowtype; begin end $$;
+ERROR:  schema "foo" does not exist
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x foo.bar.baz%rowtype; begin end $$;
+ERROR:  cross-database references are not implemented: "foo.bar.baz"
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x public.foo%rowtype; begin end $$;
+ERROR:  relation "public.foo" does not exist
+CONTEXT:  compilation of PL/pgSQL function "inline_code_block" near line 1
+do $$ declare x public.misc_table%rowtype; begin end $$;
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index f18e8e3c64..f1bce708d6 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -1599,7 +1599,7 @@ plpgsql_parse_tripword(char *word1, char *word2, char *word3,
  * plpgsql_parse_wordtype	The scanner found word%TYPE. word should be
  *a pre-existing variable name.
  *
- * Returns datatype struct, or NULL if no match found for word.
+ * Returns datatype struct.  Throws error if no match found for word.
  * --
  */
 PLpgSQL_type *
@@ -1623,15 +1623,15 @@ plpgsql_parse_wordtype(char *ident)
 			case PLPGSQL_NSTYPE_REC:
 return ((PLpgSQL_rec *) (plpgsql_Datums[nse->itemno]))->datatype;
 			default:
-return NULL;
+break;
 		}
 	}
 
-	/*
-	 * Nothing found - up to now it's a word without any special meaning for
-	 * us.
-	 */
-	return NULL;
+	/* No match, complain */
+	ereport(ERROR,
+			(errcode(ERRCODE_UNDEFINED_OBJECT),
+			 errmsg("variable \"%s\" does not exist", ident)));
+	return NULL;/* keep compiler quiet */
 }
 
 
@@ -1639,7 +1639,8 @@ plpgsql_parse_wordtype(char *ident)
  * plpgsql_parse_cwordtype		Same lookup for compositeword%TYPE
  *
  * Here, we allow either a block-qualified variable name, or a reference
- * to a column of some table.
+ * to a column of some table.  (If we must throw error, we assume that the
+ * latter case was intended.)
  * --
  */

Re: Comments on Custom RMGRs

2024-02-26 Thread Jeff Davis

On Mon, 2024-02-26 at 23:29 +0700, Danil Anisimow wrote:
> Hi,
> 
> The checkpoint hook looks very useful, especially for extensions that
> have their own storage, like pg_stat_statements.
> For example, we can keep work data in shared memory and save it only
> during checkpoints.
> When recovering, we need to read all the data from the disk and then
> repeat the latest changes from the WAL.

Let's pick this discussion back up, then. Where should the hook go?
Does it need to be broken into phases like resource owners? What
guidance can we provide to extension authors to use it correctly?

Simon's right that these things don't need to be 100% answered for
every hook we add; but I agree with Andres and Robert that this could
benefit from some more discussion about the details.

The proposal calls the hook right after CheckPointPredicate() and
before CheckPointBuffers(). Is that the right place for the use case
you have in mind with pg_stat_statements?

Regards,
Jeff Davis

Re: Detoasting optionally to make Explain-Analyze less misleading

2024-02-26 Thread Matthias van de Meent

Hi,

I've taken the liberty to update this patch, and register it in the
commitfest app to not lose track of progress [0].

The attached v8 patch measures scratch memory allocations (with MEMORY
option), total time spent in serialization (with TIMING on, measures
are inclusive of unseparated memcpy to the message buffer), and a
count of produced bytes plus the output format used (text or binary).
It's a light rework of the earlier 0007 patch, I've reused tests and
some infrastructure, while the implementation details and comments
have been updated significantly.

I think we can bikeshed on format and names, but overall I think the
patch is in a very decent shape.

Stepan, thank you for your earlier work, and feel free to check it out
or pick it up again if you want to; else I'll try to get this done.

Kind regards,

Matthias van de Meent

[0] https://commitfest.postgresql.org/47/4852/


v8-0001-Explain-Add-SERIALIZE-option.patch
Description: Binary data

Re: libpq: PQfnumber overload for not null-terminated strings

2024-02-26 Thread Ivan Trofimov


Thanks for the quick reply.


If you need counted strings
for PQfnumber, wouldn't you need them for every single other
string-based API in libpq as well?


No, not really.

Thing is, out of all the functions listed in "34.3.2. Retrieving Query 
Result Information" and "34.3.3. Retrieving Other Result Information" 
PQfnumber is the only (well, technically also PQprint) one that takes a 
string as an input.


As far as I know PQfnumber is the only portable way to implement "give 
me the the value at this row with this 'column_name'", which is an 
essential feature for any kind of client-side parsing.
Right now as a library writer in a higher-level language I'm forced to 
either
* Sacrifice performance to ensure 'column_name' is null-terminated 
(that's what some bindings in Rust do)
* Sacrifice interface quality by requiring a null-terminated string, 
which is not necessary idiomatic (that's what we do)
* Sacrifice usability by requiring a user to guarantee that the 
'string_view' provided is null-terminated (that's what libpqxx does, for 
example)


I don't think it's _that_ big of a deal, but could it be QoL improvement 
nice enough to be worth of a tiny addition into libpq interface?

Re: Refactor SASL exchange in preparation for OAuth Bearer

2024-02-26 Thread Jacob Champion

On Fri, Feb 23, 2024 at 2:30 AM Daniel Gustafsson  wrote:
>
> The attached two patches are smaller refactorings to the SASL exchange and 
> init
> codepaths which are required for the OAuthbearer work [0].  Regardless of the
> future of that patchset, these refactorings are nice cleanups and can be
> considered in isolation.  Another goal is of course to reduce scope of the
> OAuth patchset to make it easier to review.

Thanks for pulling these out! They look good overall, just a few notes below.

In 0001:

> + * SASL_FAILED: The exchance has failed and the connection should be

s/exchance/exchange/

> - if (final && !done)
> + if (final && !(status == SASL_FAILED || status == SASL_COMPLETE))

Since there's not yet a SASL_ASYNC, I wonder if this would be more
readable if it were changed to
if (final && status == SASL_CONTINUE)
to match the if condition shortly after it.

In 0002, at the beginning of pg_SASL_init, the `password` variable now
has an uninitialized code path. The OAuth patchset initializes it to
NULL:

> +++ b/src/interfaces/libpq/fe-auth.c
> @@ -425,7 +425,7 @@ pg_SASL_init(PGconn *conn, int payloadlen)
> int initialresponselen;
> const char *selected_mechanism;
> PQExpBufferData mechanism_buf;
> -   char   *password;
> +   char   *password = NULL;
> SASLStatus  status;
>
> initPQExpBuffer(_buf);

I'll base the next version of the OAuth patchset on top of these.

Thanks!
--Jacob

Re: An improved README experience for PostgreSQL

On Mon, Feb 26, 2024 at 11:31:19AM -0600, Andrew Atkinson wrote:
> Hello Hackers. We’re proposing an improved README for PostgreSQL that
> includes more helpful links for prospective PostgreSQL contributors and has
> a nicer presentation.
> 
> Although development does not take place on GitHub or GitLab for
> PostgreSQL, many developers might view the PostgreSQL source code using one
> of those mirrors (I do myself). Since both support Markdown files, a
> Markdown version of the README (as README.md) gets presentational benefits
> that I think are helpful.

I think this would be nice.  If the Markdown version is reasonably readable
as plain-text, maybe we could avoid maintaining two READMEs files, too.
But overall, +1 to modernizing the README a bit.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

RE: Popcount optimization using AVX512

2024-02-26 Thread Amonson, Paul D

Hello again,

This is now a blocking issue. I can find no reason for the failing behavior of 
the MSVC build. All other languages build fine in CI including the Mac. Since 
the master branch builds, I assume I changed something critical to linking, but 
I can't figure out what that would be. Can someone with Windows/MSVC experience 
help me?

* Code: https://github.com/paul-amonson/postgresql/tree/popcnt_patch
* CI build: https://cirrus-ci.com/task/4927666021728256

Thanks,
Paul

-Original Message-
From: Amonson, Paul D  
Sent: Wednesday, February 21, 2024 9:36 AM
To: Andres Freund 
Cc: Alvaro Herrera ; Shankaran, Akash 
; Nathan Bossart ; Noah 
Misch ; Tom Lane ; Matthias van de Meent 
; pgsql-hackers@lists.postgresql.org
Subject: RE: Popcount optimization using AVX512

Hi,

I am encountering a problem that I don't think I understand. I cannot get the 
MSVC build to link in CI. I added 2 files to the build, but the linker is 
complaining about the original pg_bitutils.c file is missing (specifically 
symbol 'pg_popcount'). To my knowledge my changes did not change linking for 
the offending file and I see the compiles for pg_bitutils.c in all 3 libs in 
the build. All other builds are compiling.

Any help on this issue would be greatly appreciated.

My fork is at https://github.com/paul-amonson/postgresql/tree/popcnt_patch and 
the CI build is at https://cirrus-ci.com/task/4927666021728256.

Thanks,
Paul

-Original Message-
From: Andres Freund 
Sent: Monday, February 12, 2024 12:37 PM
To: Amonson, Paul D 
Cc: Alvaro Herrera ; Shankaran, Akash 
; Nathan Bossart ; Noah 
Misch ; Tom Lane ; Matthias van de Meent 
; pgsql-hackers@lists.postgresql.org
Subject: Re: Popcount optimization using AVX512

Hi,

On 2024-02-12 20:14:06 +, Amonson, Paul D wrote:
> > > +# Check for header immintrin.h
> > ...
> > Do these all actually have to link?  Invoking the linker is slow.
> > I think you might be able to just use cc.has_header_symbol().
>
> I took this to mean the last of the 3 new blocks.

Yep.

> > Does this work with msvc?
>
> I think it will work but I have no way to validate it. I propose we remove 
> the AVX-512 popcount feature from MSVC builds. Sound ok?

CI [1], whould be able to test at least building. Including via cfbot, 
automatically run for each commitfest entry - you can see prior runs at [2].  
They run on Zen 3 epyc instances, so unfortunately runtime won't be tested.  If 
you look at [3], you can see that currently it doesn't seem to be considered 
supported at configure time:

...
[00:23:48.480] Checking if "__get_cpuid" : links: NO [00:23:48.480] Checking if 
"__cpuid" : links: YES ...
[00:23:48.492] Checking if "x86_64: popcntq instruction" compiles: NO ...

Unfortunately CI currently is configured to not upload the build logs if the 
build succeeds, so we don't have enough details to see why.

> > This will build all of pgport with the avx flags, which wouldn't be 
> > correct, I think? The compiler might inject automatic uses of avx512 in 
> > places, which would cause problems, no?
>
> This will take me some time to learn how to do this in meson. Any 
> pointers here would be helpful.

Should be fairly simple, add it to the replace_funcs_pos and add the relevant 
cflags to pgport_cflags, similar to how it's done for crc.

> > While you don't do the same for make, isn't even just using the avx512 for 
> > all of pg_bitutils.c broken for exactly that reson? That's why the existing 
> > code builds the files for various crc variants as their own file.
>
> I don't think its broken, nothing else in pg_bitutils.c will make use 
> of
> AVX-512

You can't really guarantee that compiler auto-vectorization won't decide to do 
so, no? I wouldn't call it likely, but it's also hard to be sure it won't 
happen at some point.

> If splitting still makes sense, I propose splitting into 3 files:  
> pg_bitutils.c (entry point +sw popcnt implementation), pg_popcnt_choose.c 
> (CPUID and xgetbv check) and pg_popcnt_x86_64_accel.c (64/512bit x86 
> implementations).
> I'm not an expert in meson, but splitting might add complexity to meson.build.
>
> Could you elaborate if there are other benefits to the split file approach?

It won't lead to SIGILLs ;)

Greetings,

Andres Freund

[1] https://github.com/postgres/postgres/blob/master/src/tools/ci/README
[2] 
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F47%2F4675
[3] https://cirrus-ci.com/task/5645112189911040

An improved README experience for PostgreSQL

2024-02-26 Thread Andrew Atkinson

Hello Hackers. We’re proposing an improved README for PostgreSQL that
includes more helpful links for prospective PostgreSQL contributors and has
a nicer presentation.

Although development does not take place on GitHub or GitLab for
PostgreSQL, many developers might view the PostgreSQL source code using one
of those mirrors (I do myself). Since both support Markdown files, a
Markdown version of the README (as README.md) gets presentational benefits
that I think are helpful.

For a head-to-head comparison of what that looks like, review the current
README and a proposed README.md version below:

Current version:

https://github.com/andyatkinson/postgres/blob/master/README

Markdown README.md version on GitHub:

https://github.com/andyatkinson/postgres/blob/e88138765750b6f7898089b4016641eee01bf616/README.md

Feedback Requested

Samay Sharma are both interested in the initial developer experience for
PostgreSQL. We had a chat about the role the README plays in that, while
it's a small role, we thought this might be a place to start.

We'd love some feedback.

Prospective contributors need to know about compilation, the mailing lists,
and how the commitfest events work. This information is scattered around on
wiki pages, but we're wondering if more could be brought into the README
and whether that would help?

If you do check out the new file, we'd love to know whether you think
there's useful additions, or there's content that's missing.

If there's any kind of feedback or consensus on this thread, I'm happy to
create and send a patch.

Thanks for taking a look!

Andrew Atkinson w/ reviews from Samay Sharma

Re: Comments on Custom RMGRs

2024-02-26 Thread Danil Anisimow

Hi,

The checkpoint hook looks very useful, especially for extensions that have
their own storage, like pg_stat_statements.
For example, we can keep work data in shared memory and save it only during
checkpoints.
When recovering, we need to read all the data from the disk and then repeat
the latest changes from the WAL.

On Mon, Feb 26, 2024 at 2:42 PM Simon Riggs 
wrote:
> On Fri, 13 May 2022 at 00:42, Andres Freund  wrote:
> >
> > On 2022-05-12 22:26:51 +0100, Simon Riggs wrote:
> > > On Thu, 12 May 2022 at 04:40, Andres Freund 
wrote:
> > > > I'm not happy with the idea of random code being executed in the
middle of
> > > > CheckPointGuts(), without any documentation of what is legal to do
at that
> > > > point.
> > >
> > > The "I'm not happy.." ship has already sailed with pluggable rmgrs.
> >
> > I don't agree. The ordering within a checkpoint is a lot more fragile
than
> > what you do in an individual redo routine.
>
> Example?
>
>
> > > Checkpoints exist for one purpose - as the starting place for
recovery.
> > >
> > > Why would we allow pluggable recovery without *also* allowing
> > > pluggable checkpoints?
> >
> > Because one can do a lot of stuff with just pluggable WAL records,
without
> > integrating into checkpoints?
> >
> > Note that I'm *not* against making checkpoint extensible - I just think
it
> > needs a good bit of design work around when the hook is called etc.
>
> When was any such work done previously for any other hook?? That isn't
needed.
>
> Checkpoints aren't complete until all data structures have
> checkpointed, so there are no problems from a partial checkpoint being
> written.
>
> As a result, the order of actions in CheckpointGuts() is mostly
> independent of each other. The SLRUs are all independent of each
> other, as is CheckPointBuffers().
>
> The use cases I'm trying to support aren't tricksy modifications of
> existing code, they are just entirely new data structures which are
> completely independent of other Postgres objects.
>
>
> > I definitely think it's too late in the cycle to add checkpoint
extensibility
> > now.
> >
> >
> > > > To actually be useful we'd likely need multiple calls to such an
rmgr
> > > > callback, with a parameter where in CheckPointGuts() we are. Right
now the
> > > > sequencing is explicit in CheckPointGuts(), but with the proposed
callback,
> > > > that'd not be the case anymore.
> > >
> > > It is useful without the extra complexity you mention.
> >
> > Shrug. The documentation work definitely is needed. Perhaps we can get
away
> > without multiple callbacks within a checkpoint, I think it'll become
more
> > apparent when writing information about the precise point in time the
> > checkpoint callback is called.
>
> You seem to be thinking in terms of modifying the existing actions in
> CheckpointGuts(). I don't care about that. Anybody that wishes to do
> that can work out the details of their actions.
>
> There is nothing to document, other than "don't do things that won't
> work". How can anyone enumerate all the things that wouldn't work??
>
> There is no list of caveats for any other hook. Why is it needed here?

There are easily reproducible issues where rm_checkpoint() throws an ERROR.
When it occurs at the end-of-recovery checkpoint, the server fails to start
with a message like this:
ERROR:  Test error
FATAL:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.

Even if we remove the broken extension from shared_preload_libraries, we
get the following message in the server log:
FATAL:  resource manager with ID 128 not registered
HINT:  Include the extension module that implements this resource manager
in shared_preload_libraries.

In both cases, with or without the extension in shared_preload_libraries,
the server cannot start.

This seems like a programmer's problem, but what should the user do after
receiving such messages?

Maybe it would be safer to use something like after_checkpoint_hook, which
would be called after the checkpoint is completed?
This is enough for some cases when we only need to save shared memory to
disk.

--
Regards,
Daniil Anisimov
Postgres Professional: http://postgrespro.com

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-26 Thread Alvaro Herrera

On 2024-Feb-23, Dilip Kumar wrote:

> 1.
> + * If no process is already in the list, we're the leader; our first step
> + * is to "close out the group" by resetting the list pointer from
> + * ProcGlobal->clogGroupFirst (this lets other processes set up other
> + * groups later); then we lock the SLRU bank corresponding to our group's
> + * page, do the SLRU updates, release the SLRU bank lock, and wake up the
> + * sleeping processes.
> 
> I think here we are saying that we "close out the group" before
> acquiring the SLRU lock but that's not true.  We keep the group open
> until we gets the lock so that we can get maximum members in while we
> are anyway waiting for the lock.

Absolutely right.  Reworded that.

> 2.
>  static void
>  TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
>   RepOriginId nodeid, int slotno)
>  {
> - Assert(TransactionIdIsNormal(xid));
> + if (!TransactionIdIsNormal(xid))
> + return;
> +
> + entryno = TransactionIdToCTsEntry(xid);
> 
> I do not understand why we need this change.

Ah yeah, I was bothered by the fact that if you pass Xid values earlier
than NormalXid to this function, we'd reply with some nonsensical values
instead of throwing an error.  But you're right that it doesn't belong
in this patch, so I removed that.

Here's a version with these fixes, where I also added some text to the
pg_stat_slru documentation:

+  
+   For each SLRU area that's part of the core server,
+   there is a configuration parameter that controls its size, with the suffix
+   _buffers appended.  For historical
+   reasons, the names are not exact matches, but Xact
+   corresponds to transaction_buffers and the rest should
+   be obvious.
+   
+  

I think I would like to suggest renaming the GUCs to have the _slru_ bit
in the middle:

+# - SLRU Buffers (change requires restart) -
+
+#commit_timestamp_slru_buffers = 0  # memory for pg_commit_ts (0 = 
auto)
+#multixact_offsets_slru_buffers = 16# memory for 
pg_multixact/offsets
+#multixact_members_slru_buffers = 32# memory for 
pg_multixact/members
+#notify_slru_buffers = 16   # memory for pg_notify
+#serializable_slru_buffers = 32 # memory for pg_serial
+#subtransaction_slru_buffers = 0# memory for pg_subtrans (0 = auto)
+#transaction_slru_buffers = 0   # memory for pg_xact (0 = auto)

and the pgstat_internal.h table:
 
static const char *const slru_names[] = {
"commit_timestamp",
"multixact_members",
"multixact_offsets",
"notify",
"serializable",
"subtransaction",
"transaction",
"other" /* has to be last */
};

This way they match perfectly.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"All rings of power are equal,
But some rings of power are more equal than others."
 (George Orwell's The Lord of the Rings)
>From 4dc139e70feb5e43bbe2689cfb044ef0957761b3 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 22 Feb 2024 18:42:56 +0100
Subject: [PATCH v20] Make SLRU buffer sizes configurable

Also, divide the slot array in banks, so that the LRU algorithm can be
made more scalable.

Also remove the centralized control lock for even better scalability.

Authors: Dilip Kumar, Andrey Borodin
---
 doc/src/sgml/config.sgml  | 139 +++
 doc/src/sgml/monitoring.sgml  |  14 +-
 src/backend/access/transam/clog.c | 236 
 src/backend/access/transam/commit_ts.c|  81 ++--
 src/backend/access/transam/multixact.c| 190 +++---
 src/backend/access/transam/slru.c | 357 +-
 src/backend/access/transam/subtrans.c | 103 -
 src/backend/commands/async.c  |  61 ++-
 src/backend/storage/lmgr/lwlock.c |   9 +-
 src/backend/storage/lmgr/lwlocknames.txt  |  14 +-
 src/backend/storage/lmgr/predicate.c  |  34 +-
 .../utils/activity/wait_event_names.txt   |  15 +-
 src/backend/utils/init/globals.c  |   9 +
 src/backend/utils/misc/guc_tables.c   |  78 
 src/backend/utils/misc/postgresql.conf.sample |   9 +
 src/include/access/clog.h |   1 -
 src/include/access/commit_ts.h|   1 -
 src/include/access/multixact.h|   4 -
 src/include/access/slru.h |  86 +++--
 src/include/access/subtrans.h |   3 -
 src/include/commands/async.h  |   5 -
 src/include/miscadmin.h   |   8 +
 src/include/storage/lwlock.h  |   7 +
 src/include/storage/predicate.h   |   4 -
 src/include/utils/guc_hooks.h |  11 +
 src/test/modules/test_slru/test_slru.c|  35 +-
 26 files changed, 1161 insertions(+), 353 deletions(-)

diff --git a/doc/src/sgml/config.sgml

Re: Shared detoast Datum proposal

Nikita Malakhov  writes:

> Hi,
>
> Tomas, we already have a working jsonb partial detoast prototype,
> and currently I'm porting it to the recent master.

This is really awesome! Acutally when I talked to MySQL guys, they said
MySQL already did this and I admit it can resolve some different issues
than the current patch. Just I have many implemetion questions in my
mind. 

> Andy, thank you! I'll check the last patch set out and reply in a day
> or two.

Thank you for your attention!

-- 
Best Regards
Andy Fan

Re: Shared detoast Datum proposal

> On 2/26/24 14:22, Andy Fan wrote:
>> 
>>...
>>
>>> Also, toasted values
>>> are not always being used immediately and as a whole, i.e. jsonb values are 
>>> fully
>>> detoasted (we're working on this right now) to extract the smallest value 
>>> from
>>> big json, and these values are not worth keeping in memory. For text values 
>>> too,
>>> we often do not need the whole value to be detoasted.
>> 
>> I'm not sure how often this is true, espeically you metioned text data
>> type. I can't image why people acquire a piece of text and how can we
>> design a toasting system to fulfill such case without detoast the whole
>> as the first step. But for the jsonb or array data type, I think it is
>> more often. However if we design toasting like that, I'm not sure if it
>> would slow down other user case. for example detoasting the whole piece
>> use case. I am justing feeling both way has its user case, kind of heap
>> store and columna store.  
>> 
>
> Any substr/starts_with call benefits from this optimization, and such
> calls don't seem that uncommon. I certainly can imagine people doing
> this fairly often.

This leads me to pay more attention to pg_detoast_datum_slice user case
which I did overlook and caused some regression in this case. Thanks!

As you said later:

> Is there a way to identify cases that are likely to benefit from this
> slicing, and disable the detoasting for them? We already disable it for
> other cases, so maybe we can do this here too?

I think the answer is yes, if our goal is to disable the whole toast for
some speical calls like substr/starts_with. In the current patch, we have:

/*
 * increase_level_for_pre_detoast
 *  Check if the given Expr could detoast a Var directly, if yes,
 * increase the level and return true. otherwise return false;
 */
static inline void
increase_level_for_pre_detoast(Node *node, intermediate_level_context *ctx)
{
/* The following nodes is impossible to detoast a Var directly. */
if (IsA(node, List) || IsA(node, TargetEntry) || IsA(node, NullTest))
{
ctx->level_added = false;
}
else if (IsA(node, FuncExpr) && castNode(FuncExpr, node)->funcid == 
F_PG_COLUMN_COMPRESSION)
{
/* let's not detoast first so that pg_column_compression works. 
*/
ctx->level_added = false;
}

while it works, but the blacklist looks not awesome.

> In any case, it's a long-standing behavior /
> optimization, and I don't think we can just dismiss it this quickly.

I agree. So I said both have their user case. and when I say the *both*, I
mean the other method is "partial detoast prototype", which Nikita has
told me before. while I am sure it has many user case, I'm also feeling
it is complex and have many questions in my mind, but I'd like to see
the design before asking them.

> Or maybe there's a way to do the detoasting lazily, and only keep the
> detoasted value when it's requesting the whole value. Or perhaps even
> better, remember what part we detoasted, and maybe it'll be enough for
> future requests?
>
> I'm not sure how difficult would this be with the proposed approach,
> which eagerly detoasts the value into tts_values. I think it'd be
> easier to implement with the TOAST cache I briefly described ... 

I can understand the benefits of the TOAST cache, but the following
issues looks not good to me IIUC: 

1). we can't put the result to tts_values[*] since without the planner
decision, we don't know if this will break small_tlist logic. But if we
put it into the cache only, which means a cache-lookup as a overhead is
unavoidable.

2). It is hard to decide which entry should be evicted without attaching
it to the TupleTableSlot's life-cycle. then we can't grantee the entry
we keep is the entry we will reuse soon?

-- 
Best Regards
Andy Fan

Re: Optimize planner memory consumption for huge arrays

Tomas Vondra  writes:
> On 2/25/24 17:29, Tom Lane wrote:
>> Yeah.  Also: once we had such an idea, it'd be very tempting to apply
>> it to other frequently-reset contexts like the executor's per-tuple
>> evaluation contexts.  I'm not quite prepared to argue that
>> MemoryContextReset should just act that way all the time ... but
>> it's sure interesting to think about.

> Do the context resets consume enough time to make this measurable?

I think they do.  We previously invented the "isReset" mechanism to
eliminate work in the case of exactly zero allocations since the
last reset, and that made a very measurable difference at the time,
even though you'd think the amount of work saved would be negligible.
This idea seems like it might be able to supersede that one and win
in a larger fraction of cases.

> +1 to disable this optimization in assert-enabled builds. I guess we'd
> invent a new constant to disable it, and tie it to USE_ASSERT_CHECKING
> (similar to CLOBBER_FREED_MEMORY, for example).

> Thinking about CLOBBER_FREED_MEMORY, could it be useful to still clobber
> the memory, even if we don't actually reset the context?

I think in any case where we were trying to support debugging, we'd
just disable the optimization, so that reset always resets.

regards, tom lane

Re: Running the fdw test from the terminal crashes into the core-dump

2024-02-26 Thread Alvaro Herrera

On 2024-Feb-25, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2024-Feb-22, Tom Lane wrote:
> >> While I've not done anything about that here, I wonder if we shouldn't
> >> just write "privilege on the target table" without any special markup.
> 
> > That would work for me.
> 
> OK.  Will you do the honors, or shall I?

I can push the whole later today.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Linux transformó mi computadora, de una `máquina para hacer cosas',
en un aparato realmente entretenido, sobre el cual cada día aprendo
algo nuevo" (Jaime Salinas)

Re: Allow non-superuser to cancel superuser tasks.

On Mon, Feb 26, 2024 at 12:38:40PM +0500, Kirill Reshke wrote:
> I see 2 possible ways to implement this. The first one is to have hool in
> pg_signal_backend, and define a hook in extension which can do the thing.
> The second one is to have a predefined role. Something like a
> `pg_signal_autovacuum` role which can signal running autovac to cancel. But
> I don't see how we can handle specific `application_name` with this
> solution.

pg_signal_autovacuum seems useful given commit 3a9b18b.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: WIP Incremental JSON Parser

2024-02-26 Thread Jacob Champion

On Mon, Feb 26, 2024 at 7:08 AM Jacob Champion
 wrote:
> As a brute force example of the latter, with the attached diff I get
> test failures at chunk sizes 1, 2, 3, 4, 6, and 12.

But this time with the diff.

--Jacob
diff --git 
a/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl 
b/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl
index 8eeb7f5b91..08c280037f 100644
--- a/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl
+++ b/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl
@@ -10,11 +10,13 @@ my $test_file = "$FindBin::RealBin/../tiny.json";
 
 my $exe = "test_json_parser_incremental";
 
-my ($stdout, $stderr) = run_command( [$exe, $test_file] );
-
-ok($stdout =~ /SUCCESS/, "test succeeds");
-ok(!$stderr, "no error output");
+for (my $size = 64; $size > 0; $size--)
+{
+   my ($stdout, $stderr) = run_command( [$exe, "-c", $size, $test_file] );
 
+   ok($stdout =~ /SUCCESS/, "chunk size $size: test succeeds");
+   ok(!$stderr, "chunk size $size: no error output");
+}
 
 done_testing();
 
diff --git a/src/test/modules/test_json_parser/test_json_parser_incremental.c 
b/src/test/modules/test_json_parser/test_json_parser_incremental.c
index dee5c6f7d1..a94cc8adb8 100644
--- a/src/test/modules/test_json_parser/test_json_parser_incremental.c
+++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c
@@ -12,6 +12,7 @@
  * the parser in very small chunks. In practice you would normally use
  * much larger chunks, but doing this makes it more likely that the
  * full range of incement handling, especially in the lexer, is exercised.
+ * If the "-c SIZE" option is provided, that chunk size is used instead.
  *
  * The argument specifies the file containing the JSON input.
  *
@@ -34,12 +35,19 @@ main(int argc, char **argv)
JsonLexContext lex;
StringInfoData json;
int n_read;
+   size_t  chunk_size = 60;
+
+   if (strcmp(argv[1], "-c") == 0)
+   {
+   sscanf(argv[2], "%zu", _size);
+   argv += 2;
+   }
 
makeJsonLexContextIncremental(, PG_UTF8, false);
initStringInfo();
 
json_file = fopen(argv[1], "r");
-   while ((n_read = fread(buff, 1, 60, json_file)) > 0)
+   while ((n_read = fread(buff, 1, chunk_size, json_file)) > 0)
{
appendBinaryStringInfo(, buff, n_read);
appendStringInfoString(, "1+23 trailing junk");

Re: WIP Incremental JSON Parser

2024-02-26 Thread Jacob Champion

On Thu, Feb 22, 2024 at 3:43 PM Andrew Dunstan  wrote:
> > Are there plans to fill out the test suite more? Since we should be
> > able to control all the initial conditions, it'd be good to get fairly
> > comprehensive coverage of the new code.
>
> Well, it's tested (as we know) by the backup manifest tests. During
> development, I tested by making the regular parser use the
> non-recursive  parser (see FORCE_JSON_PSTACK). That doesn't test the
> incremental piece of it, but it does check that the rest of it is doing
> the right thing. We could probably extend the incremental test by making
> it output a stream of tokens and making sure that was correct.

That would also cover all the semantic callbacks (currently,
OFIELD_END and AELEM_* are uncovered), so +1 from me.

Looking at lcov, it'd be good to
- test failure modes as well as the happy path, so we know we're
rejecting invalid syntax correctly
- test the prediction stack resizing code
- provide targeted coverage of the partial token support, since at the
moment we're at the mercy of the manifest format and the default chunk
size

As a brute force example of the latter, with the attached diff I get
test failures at chunk sizes 1, 2, 3, 4, 6, and 12.

> > As an aside, I find the behavior of need_escapes=false to be
> > completely counterintuitive. I know the previous code did this, but it
> > seems like the opposite of "provides unescaped strings" should be
> > "provides raw strings", not "all strings are now NULL".
>
> Yes, we could possibly call it "need_strings"  or something like that.

+1

--Jacob

Re: Printing backtrace of postgres processes

2024-02-26 Thread Christoph Berg

Re: Michael Paquier
> Something like this can be measured with a bunch of concurrent
> connections attempting connections and a very high rate, like pgbench
> with an empty script and -C, for local connections.

I tried that now. Mind that I'm not a benchmarking expert, and there's
been quite some jitter in the results, but I think there's a clear
trend.

Current head without and with the v28 patchset.
Command line:
pgbench -n -C -c 20 -j 20 -f empty.sql -T 30 --progress=2
empty.sql just contains a ";"
model name: 13th Gen Intel(R) Core(TM) i7-13700H

head:
tps = 2211.289863 (including reconnection times)
tps = 2113.907588 (including reconnection times)
tps = 2200.406877 (including reconnection times)
average: 2175

v28:
tps = 1873.472459 (including reconnection times)
tps = 2068.094383 (including reconnection times)
tps = 2196.890897 (including reconnection times)
average: 2046

2046 / 2175 = 0.941

Even if we regard the 1873 as an outlier, I've seen many vanilla runs
with 22xx tps, and not a single v28 run with 22xx tps. Other numbers I
collected suggested a cost of at least 3% for the feature.

Christoph

Re: Shared detoast Datum proposal

2024-02-26 Thread Nikita Malakhov

Hi,

Tomas, we already have a working jsonb partial detoast prototype,
and currently I'm porting it to the recent master. Due to the size
 of the changes and very invasive nature it takes a lot of effort,
but it is already done. I'm also trying to make the core patch
less invasive. Actually, it is a subject for a separate topic, as soon
as I make it work on the current master we'll propose it
to the community.

Andy, thank you! I'll check the last patch set out and reply in a day or
two.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

Re: Speeding up COPY TO for uuids and arrays

2024-02-26 Thread Ranier Vilela

Em seg., 26 de fev. de 2024 às 02:28, Michael Paquier 
escreveu:

> On Thu, Feb 22, 2024 at 04:42:37PM -0300, Ranier Vilela wrote:
> > Can you share exactly script used to create a table?
>
> Stressing the internals of array_out() for the area of the patch is
> not that difficult, as we want to quote each element that's returned
> in output.
>
> The trick is to have the following to stress the second quoting loop a
> maximum:
> - a high number of rows.
> - a high number of items in the arrays.
> - a *minimum* number of characters in each element of the array, with
> characters that require quoting.
>
> The best test case I can think of to demonstrate the patch would be
> something like that (adjust rows and elts as you see fit):
> -- Number of rows
> \set rows 6
> -- Number of elements
> \set elts 4
> create table tab as
>   with data as (
> select array_agg(a) as array
>   from (
> select '{'::text
>   from generate_series(1, :elts) as int(a)) as index(a))
>   select data.array from data, generate_series(1,:rows);
>
> Then I get:
>array
> ---
>  {"{","{","{","{"}
>  {"{","{","{","{"}
>  {"{","{","{","{"}
>  {"{","{","{","{"}
>  {"{","{","{","{"}
>  {"{","{","{","{"}
> (6 rows)
>
> With "\set rows 10" and "\set elts 1", giving 100MB of data
> with 100k rows with 10k elements each, I get for HEAD when data is in
> shared buffers:
> =# copy tab to '/dev/null';
> COPY 10
> Time: 48620.927 ms (00:48.621)
> And with v3:
> =# copy tab to '/dev/null';
> COPY 10
> Time: 47993.183 ms (00:47.993)
>
Thanks Michael, for the script.

It is easier to make comparisons, using the exact same script.

best regards,
Ranier Vilela

Re: Improve eviction algorithm in ReorderBuffer

2024-02-26 Thread Masahiko Sawada

On Mon, Feb 26, 2024 at 6:43 PM Tomas Vondra
 wrote:
>
>
>
> On 2/26/24 07:46, Masahiko Sawada wrote:
> > On Sat, Feb 24, 2024 at 1:29 AM Tomas Vondra
> >  wrote:
> >>...
> >>
> >> overall design
> >> --
> >>
> >> As for the design, I agree with the approach of using a binaryheap to
> >> track transactions by size. When going over the thread history,
> >> describing the initial approach with only keeping "large" transactions
> >> above some threshold (e.g. 10%), I was really concerned that'll either
> >> lead to abrupt changes in behavior (when transactions move just around
> >> the 10%), or won't help with many common cases (with most transactions
> >> being below the limit).
> >>
> >> I was going to suggest some sort of "binning" - keeping lists for
> >> transactions of similar size (e.g. <1kB, 1-2kB, 2-4kB, 4-8kB, ...) and
> >> evicting transactions from a list, i.e. based on approximate size. But
> >> if the indexed binary heap seems to be cheap enough, I think it's a
> >> better solution.
> >
> > I've also considered the binning idea. But it was not clear to me how
> > it works well in a case where all transactions belong to the
> > particular class. For example, if we need to free up 1MB memory, we
> > could end up evicting 2000 transactions consuming 50 bytes instead of
> > 100 transactions consuming 1000 bytes, resulting in that we end up
> > serializing more transactions. Also, I'm concerned about the cost of
> > maintaining the binning lists.
> >
>
> I don't think the list maintenance would be very costly - in particular,
> the lists would not need to be sorted by size. You're right in some
> extreme cases we might evict the smallest transactions in the list. I
> think on average we'd evict transactions with average size, which seems
> OK for this use case.
>
> Anyway, I don't think we need to be distracted with this. I mentioned it
> merely to show it was considered, but the heap seems to work well
> enough, and in the end is even simpler because the complexity is hidden
> outside reorderbuffer.
>
> >>
> >> The one thing I'm a bit concerned about is the threshold used to start
> >> using binary heap - these thresholds with binary decisions may easily
> >> lead to a "cliff" and robustness issues, i.e. abrupt change in behavior
> >> with significant runtime change (e.g. you add/remove one transaction and
> >> the code takes a much more expensive path). The value (1024) seems
> >> rather arbitrary, I wonder if there's something to justify that choice.
> >
> > True. 1024 seems small to me. In my environment, I started to see a
> > big difference from around 4 transactions. But it varies depending
> > on environments and workloads.
> >
> > I think that this performance problem we're addressing doesn't
> > normally happen as long as all transactions being decoded are
> > top-level transactions. Otherwise, we also need to improve
> > ReorderBufferLargestStreamableTopTXN(). Given this fact, I think
> > max_connections = 1024 is a possible value in some systems, and I've
> > observed such systems sometimes. On the other hand, I've observed >
> > 5000 in just a few cases, and having more than 5000 transactions in
> > ReorderBuffer seems unlikely to happen without subtransactions. I
> > think we can say it's an extreme case, the number is still an
> > arbitrary number though.
> >
> > Or probably we can compute the threshold based on max_connections,
> > e.g., max_connections * 10. That way, we can ensure that users won't
> > incur the max-heap maintenance costs as long as they don't use
> > subtransactions.
> >
>
> Tying this to max_connections seems like an interesting option. It'd
> make this adaptive to a system. I haven't thought about the exact value
> (m_c * 10), but it seems better than arbitrary hard-coded values.

I've updated the patch accordingly, using MaxConnections for now. I've
also updated some comments and commit messages and added typedef.list
changes.

>
> >>
> >> In any case, I agree it'd be good to have some dampening factor, to
> >> reduce the risk of trashing because of adding/removing a single
> >> transaction to the decoding.
> >>
> >>
> >> related stuff / GenerationContext
> >> -
> >>
> >> It's not the fault of this patch, but this reminds me I have some doubts
> >> about how the eviction interferes with using the GenerationContext for
> >> some of the data. I suspect we can easily get into a situation where we
> >> evict the largest transaction, but that doesn't actually reduce the
> >> memory usage at all, because the memory context blocks are shared with
> >> some other transactions and don't get 100% empty (so we can't release
> >> them). But it's actually worse, because GenerationContext does not even
> >> reuse this memory. So do we even gain anything by the eviction?
> >>
> >> When the earlier patch versions also considered age of the transaction,
> >> to try evicting the older ones first, I think that was interesting. I
> >>

Re: Shared detoast Datum proposal

On 2/26/24 14:22, Andy Fan wrote:
> 
>...
>
>> Also, toasted values
>> are not always being used immediately and as a whole, i.e. jsonb values are 
>> fully
>> detoasted (we're working on this right now) to extract the smallest value 
>> from
>> big json, and these values are not worth keeping in memory. For text values 
>> too,
>> we often do not need the whole value to be detoasted.
> 
> I'm not sure how often this is true, espeically you metioned text data
> type. I can't image why people acquire a piece of text and how can we
> design a toasting system to fulfill such case without detoast the whole
> as the first step. But for the jsonb or array data type, I think it is
> more often. However if we design toasting like that, I'm not sure if it
> would slow down other user case. for example detoasting the whole piece
> use case. I am justing feeling both way has its user case, kind of heap
> store and columna store.  
> 

Any substr/starts_with call benefits from this optimization, and such
calls don't seem that uncommon. I certainly can imagine people doing
this fairly often. In any case, it's a long-standing behavior /
optimization, and I don't think we can just dismiss it this quickly.

Is there a way to identify cases that are likely to benefit from this
slicing, and disable the detoasting for them? We already disable it for
other cases, so maybe we can do this here too?

Or maybe there's a way to do the detoasting lazily, and only keep the
detoasted value when it's requesting the whole value. Or perhaps even
better, remember what part we detoasted, and maybe it'll be enough for
future requests?

I'm not sure how difficult would this be with the proposed approach,
which eagerly detoasts the value into tts_values. I think it'd be easier
to implement with the TOAST cache I briefly described ...

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

Hi,

On Tue, Feb 20, 2024 at 04:03:53PM +, Bertrand Drouvot wrote:
> Hi,
> 
> On Tue, Feb 20, 2024 at 02:33:44PM +0900, Michael Paquier wrote:
> > On Tue, Feb 20, 2024 at 08:51:17AM +0900, Michael Paquier wrote:
> > > Prefixing these with "initial_" is fine, IMO.  That shows the
> > > intention that these come from the slot's data before doing the
> > > termination.  So I'm OK with what's been proposed in v3.
> > 
> > I was looking at that a second time, and just concluded that this is
> > OK, so I've applied that down to 16, wordsmithing a bit the comments.
> 
> Thanks!
> FWIW, I've started to write a POC regarding the test we mentioned up-thread.
> 
> The POC test is based on what has been submitted by Michael in [1]. The POC 
> test
> seems to work fine and it seems that nothing more is needed in [1] (at some 
> point
> I thought I would need the ability to wake up multiple "wait" injection points
> in sequence but that was not necessary).
> 
> I'll polish and propose my POC test once [1] is pushed (unless you're curious
> about it before).

Though [1] mentioned up-thread is not pushed yet, I'm Sharing the POC patch now
(see the attached).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 05a4eace2002ad5387b89eb5c133a984de3f8c2d Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Tue, 20 Feb 2024 08:11:47 +
Subject: [PATCH v1] Add regression test for 818fefd8fd

---
 src/backend/replication/slot.c|   8 ++
 .../t/035_standby_logical_decoding.pl | 103 +-
 2 files changed, 109 insertions(+), 2 deletions(-)
   7.3% src/backend/replication/
  92.6% src/test/recovery/t/

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 0f173f63a2..a7aeb9ca3e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -53,6 +53,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/injection_point.h"
 
 /*
  * Replication slot on-disk data structure.
@@ -1647,6 +1648,13 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 ReportSlotInvalidation(conflict, true, active_pid,
 	   slotname, restart_lsn,
 	   oldestLSN, snapshotConflictHorizon);
+/*
+ * This injection needs to be before the kill to ensure that
+ * the slot is still "active". It also has to be after
+ * ReportSlotInvalidation() to ensure the invalidation message
+ * is reported.
+ */
+INJECTION_POINT("TerminateProcessHoldingSlot");
 
 if (MyBackendType == B_STARTUP)
 	(void) SendProcSignal(active_pid,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 0710bccc17..3a2505bf5e 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -733,14 +733,113 @@ change_hot_standby_feedback_and_wait_for_xmins(1, 1);
 
 ##
 # Recovery conflict: Invalidate conflicting slots, including in-use slots
-# Scenario 6: incorrect wal_level on primary.
+# Scenario 6: Ensure race condition described in 818fefd8fd is fixed.
+##
+SKIP:
+{
+skip "Injection points not supported by this build", 1
+  if ($ENV{enable_injection_points} ne 'yes');
+
+	# Get the position to search from in the standby logfile
+	$logstart = -s $node_standby->logfile;
+
+	# Drop the slots, re-create them, change hot_standby_feedback,
+	# check xmin and catalog_xmin values, make slot active and reset stat.
+	reactive_slots_change_hfs_and_wait_for_xmins('pruning_', 'injection_', 0, 1);
+
+	# Create the injection_points extension
+	$node_primary->safe_psql('testdb', 'CREATE EXTENSION injection_points;');
+
+	# Wait until the extension has been created on the standby
+	$node_primary->wait_for_replay_catchup($node_standby);
+
+	# Attach the injection point
+	$node_standby->safe_psql('testdb',
+	   "SELECT injection_points_attach('TerminateProcessHoldingSlot', 'wait');");
+
+	# Trigger a conflict and insert an XLOG_RUNNING_XACTS before triggering
+	# the vacuum.
+	$node_primary->safe_psql('testdb', qq[CREATE TABLE injection_test(x integer);
+		  DROP TABLE injection_test;
+		  SELECT pg_log_standby_snapshot();]);
+
+	# Now launch the vacuum
+	wait_until_vacuum_can_remove('', 'CREATE TABLE injection_test2(x integer);', 'pg_class');
+
+	# Note: $node_primary->wait_for_replay_catchup($node_standby) would be
+	# hanging here due to the injection point, so check the log instead.
+
+	my $terminated = 0;
+	for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+	{
+		if ($node_standby->log_contains(
+			'terminating process .* to release replication slot \"injection_activeslot\"', $logstart))
+		{
+			$terminated = 1;
+			last;

Re: RFC: Logging plan of the running query

2024-02-26 Thread Julien Rouhaud

On Mon, Feb 26, 2024 at 01:56:44PM +0530, Robert Haas wrote:
> On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud  wrote:
> > Yeah, trying to find a generalized solution seems like worth investing some
> > time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code 
> > a
> > few years down the road.
>
> I just don't really see how to do it. I suspect that every task that
> wants to run at some CFIs but not others is going to have slightly
> different requirements, and we probably can't foresee what all of
> those requirements are.
>
> Said another way, if in the future we want to call
> DoSomethingOrOther() from the CFI handler, well then we need to know
> that we're not already in the middle of using any subsystem that
> DoSomethingOrOther() might also try to use ... and we also need to
> know that we're not in the middle of doing anything that's more
> critical than DoSomethingOrOther(). But both of these are likely to
> vary in each case.
>
> EXPLAIN might be one member of a general class of things that require
> catalog access (and thus might take locks or lwlocks, access the
> catalogs, trigger invalidations, etc.) but it's not clear how general
> that class really is. Also, I think if we try to do too many different
> kinds of things at CFIs, the whole thing is going to fall apart.
> You'll end up failing to foresee some interactions, or the stack will
> get too deep, or ... something.

I still fail to understand your point.  So you say we might want to check for
safe condition to run explain or DoSomethingOrOther or even DoMeh, with all
different requirements.  IIUC what you're saying is that we should have
CHECK_FOR_EXPLAIN(), CHECK_FOR_DOSOMETHINGOROTHER() and CHECK_FOR_MEH()?

And so in some code path A we could have

CHECK_FOR_INTERRUPTS();
CHECK_FOR_EXPLAIN();

In another

CHECK_FOR_INTERRUPTS();
CHECK_FOR_DOSOMETHINGOROTHER();

and in one happy path

CHECK_FOR_INTERRUPTS();
CHECK_FOR_EXPLAIN();
CHECK_FOR_DOSOMETHINGOROTHER();
CHECK_FOR_MEH();

Or that we should still have all of those but they shouldn't be anywhere close
to a CHECK_FOR_INTERRUPTS(), or something totally different?

In the first case, I'm not sure why having CHECK_FOR_INTERRUPTS(EXPLAIN|...),
with a combination of flags and with the flag handling being done after
ProcessIterrupts(), would be any different apart from having less boilerplate
lines.  Similarly for the 2nd case, but relying on a single more general
CHECK_FOR_CONDITION(EXPLAIN | ...) rather than N CHECK_FOR_XXX?

If you just want something totally different then sure.

Re: Shared detoast Datum proposal



Hi, 


> When we store and detoast large values, say, 1Gb - that's a very likely 
> scenario,
> we have such cases from prod systems - we would end up in using a lot of 
> shared
> memory to keep these values alive, only to discard them later.

First the feature can make sure if the original expression will not
detoast it, this feature will not detoast it as well. Based on this, if
there is a 1Gb datum, in the past, it took 1GB memory as well, but
it may be discarded quicker than the patched version. but patched
version will *not* keep it for too long time since once the
slot->tts_values[*] is invalidated, the memory will be released
immedately.

For example:

create table t(a text, b int);
insert into t select '1-gb-length-text' from generate_series(1, 100);

select * from t where a like '%gb%';

The patched version will take 1gb extra memory only.

Are you worried about this case or some other cases? 

> Also, toasted values
> are not always being used immediately and as a whole, i.e. jsonb values are 
> fully
> detoasted (we're working on this right now) to extract the smallest value from
> big json, and these values are not worth keeping in memory. For text values 
> too,
> we often do not need the whole value to be detoasted.

I'm not sure how often this is true, espeically you metioned text data
type. I can't image why people acquire a piece of text and how can we
design a toasting system to fulfill such case without detoast the whole
as the first step. But for the jsonb or array data type, I think it is
more often. However if we design toasting like that, I'm not sure if it
would slow down other user case. for example detoasting the whole piece
use case. I am justing feeling both way has its user case, kind of heap
store and columna store.  

FWIW, as I shown in the test case, this feature is not only helpful for
big datum, it is also helpful for small toast datum. 

-- 
Best Regards
Andy Fan

Re: Fix incorrect PG_GETARG in pgcrypto

On 2/16/24 02:35, shihao zhong wrote:
> On Tue, Feb 13, 2024 at 7:08 PM Michael Paquier  wrote:
>>
>> On Tue, Feb 13, 2024 at 05:36:36PM +0900, Michael Paquier wrote:
>>> You've indeed grabbed some historical inconsistencies here.  Please
>>> note that your patch has reversed diffs (for example, the SQL
>>> definition of pgp_sym_encrypt_bytea uses bytea,text,text as arguments
>>> and your resulting patch shows how HEAD does the job with
>>> bytea,bytea,bytea), but perhaps you have generated it with a command
>>> like `git diff -R`?
>>
>> The reversed part of the patch put aside aside, I've double-checked
>> your patch and the inconsistencies seem to be all addressed in this
>> area.
> Thanks for fixing and merging this patch, I appreciate it!
> 

Should this be marked as committed, or is there some remaining part?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-26 Thread Amit Kapila

On Fri, Feb 23, 2024 at 10:41 PM Давыдов Виталий
 wrote:
>
> Amit Kapila  wrote:
>
> I don't see we do anything specific for 2PC transactions to make them behave 
> differently than regular transactions with respect to synchronous_commit 
> setting. What makes you think so? Can you pin point the code you are 
> referring to?
>
> Yes, sure. The function RecordTransactionCommitPrepared is called on prepared 
> transaction commit (twophase.c). It calls XLogFlush unconditionally. The 
> function RecordTransactionCommit (for regular transactions, xact.c) calls 
> XLogFlush if synchronous_commit > OFF, otherwise it calls XLogSetAsyncXactLSN.
>
> There is some comment in RecordTransactionCommitPrepared (by Bruce Momjian) 
> that shows that async commit is not supported yet:
> /*
> * We don't currently try to sleep before flush here ... nor is there any
> * support for async commit of a prepared xact (the very idea is probably
> * a contradiction)
> */
> /* Flush XLOG to disk */
> XLogFlush(recptr);
>

It seems this comment is added in the commit 4a78cdeb where we added
async commit support. I think the reason is probably that when the WAL
record for prepared is already flushed then what will be the idea of
async commit here?

> Right, I think for this we need to implement parallel apply.
>
> Yes, parallel apply is a good point. But, I believe, it will not work if 
> asynchronous commit is not supported. You have only one receiver process 
> which should dispatch incoming messages to parallel workers. I guess, you 
> will never reach such rate of parallel execution on replica as on the master 
> with multiple backends.
>
>
> Can you be a bit more specific about what exactly you have in mind to achieve 
> the above solutions?
>
> My proposal is to implement async commit for 2PC transactions as it is for 
> regular transactions. It should significantly speedup the catchup process. 
> Then, think how to apply in parallel, which is much diffcult to do. The 
> current problem is to get 2PC state from the WAL on commit prepared. At this 
> moment, the WAL is not flushed yet, commit function waits until WAL with 2PC 
> state is to be flushed. I just tried to do it in my sandbox and found such a 
> problem. Inability to get 2PC state from unflushed WAL stops me right now. I 
> think about possible solutions.
>

At commit prepared, it seems we read prepare's WAL record, right? If
so, it is not clear to me do you see a problem with a flush of
commit_prepared or reading WAL for prepared or both of these.

-- 
With Regards,
Amit Kapila.

Re: Synchronizing slots from primary to standby

Hi,

On Mon, Feb 26, 2024 at 05:52:40PM +0530, shveta malik wrote:
> On Mon, Feb 26, 2024 at 5:18 PM Amit Kapila  wrote:
> >
> > > > > +   if (!ok)
> > > > > +   GUC_check_errdetail("List syntax is invalid.");
> > > > > +
> > > > > +   /*
> > > > > +* If there is a syntax error in the name or if the 
> > > > > replication slots'
> > > > > +* data is not initialized yet (i.e., we are in the startup 
> > > > > process), skip
> > > > > +* the slot verification.
> > > > > +*/
> > > > > +   if (!ok || !ReplicationSlotCtl)
> > > > > +   {
> > > > > +   pfree(rawname);
> > > > > +   list_free(elemlist);
> > > > > +   return ok;
> > > > > +   }
> > > > >
> > > > > we are testing the "ok" value twice, what about using if...else if... 
> > > > > instead and
> > > > > test it once? If so, it might be worth to put the:
> > > > >
> > > > > "
> > > > > +   pfree(rawname);
> > > > > +   list_free(elemlist);
> > > > > +   return ok;
> > > > > "
> > > > >
> > > > > in a "goto".
> > > >
> > > > There were comments to remove the 'goto' statement and avoid
> > > > duplicate free code, so I prefer the current style.
> > >
> > > The duplicate free code would come from the if...else if... rewrite but 
> > > then
> > > the "goto" would remove it, so I'm not sure to understand your point.
> > >
> >
> > I think Hou-San wants to say that there was previously a comment to
> > remove goto and now you are saying to introduce it. But, I think we
> > can avoid both code duplication and goto, if the first thing we check
> > in the function is ReplicationSlotCtl and return false if the same is
> > not set. Won't that be better?
> 
> I think we can not do that as we need to check atleast syntax before
> we return due to NULL ReplicationSlotCtl. We get NULL
> ReplicationSlotCtl during instance startup in
> check_standby_slot_names() as postmaster first loads GUC-table and
> then initializes shared-memory for replication slots. See calls of
> InitializeGUCOptions() and CreateSharedMemoryAndSemaphores() in
> PostmasterMain().  FWIW, I do not have any issue with current code as
> well, but if we have to change it, is [1] any better?
> 
> [1]:
> check_standby_slot_names()
> {
> 
> if (!ok)
> {
> GUC_check_errdetail("List syntax is invalid.");
> }
> else if (ReplicationSlotCtl)
> {
>foreach-loop for slot validation
> }
> 
> pfree(rawname);
> list_free(elemlist);
> return ok;
> }
> 

Yeah thanks, it does not test the "ok" value twice and get rid of the goto
while checking the syntax first: I'd vote for it.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Synchronizing slots from primary to standby

Hi,

On Mon, Feb 26, 2024 at 05:18:25PM +0530, Amit Kapila wrote:
> On Mon, Feb 26, 2024 at 12:59 PM Bertrand Drouvot
>  wrote:
> > > > 10 ===
> > > >
> > > > +   if (slot->data.invalidated != RS_INVAL_NONE)
> > > > +   {
> > > > +   /*
> > > > +* Specified physical slot have been 
> > > > invalidated,
> > > > so no point
> > > > +* in waiting for it.
> > > >
> > > > We discovered in [1], that if the wal_status is "unreserved" then the 
> > > > slot is still
> > > > serving the standby. I think we should handle this case differently, 
> > > > thoughts?
> > >
> > > I think the 'invalidated' slot can still be used is a separate bug.
> > > Because
> > > once the slot is invalidated, it can neither protect WALs or ROWs from 
> > > being
> > > removed even if the restart_lsn of the slot can be moved forward after 
> > > being invalidated.
> > >
> > > If the standby can move restart_lsn forward for invalidated slots, then
> > > it should also set the 'invalidated' flag back to NONE, otherwise the slot
> > > cannot serve its purpose anymore. I also reported similar bug before[1].
> >
> ...
> >
> > My point is that I think we should behave like it's not a bug and then 
> > adapt the
> > code accordingly here (until the bug gets fixed).
> >
> 
> oh, I think this doesn't sound like a good idea to me. We should fix
> that bug independently rather than adding code in new features to
> consider the bug as a valid behavior.

Agree, but it all depends if there is a consensus of the other thread being a
bug or not.

I also think it is but there is this part of the code in 
pg_get_replication_slots()
that makes me think ones could think it is not.

"
case WALAVAIL_REMOVED:

/*
 * If we read the restart_lsn long enough ago, maybe that file
 * has been removed by now.  However, the walsender could have
 * moved forward enough that it jumped to another file after
 * we looked.  If checkpointer signalled the process to
 * termination, then it's definitely lost; but if a process is
 * still alive, then "unreserved" seems more appropriate.
 *
"

Anyway, I also think it is a bug so agree to keep the check as it is currenlty (
and keep an eye on the other thread outcome too).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c

2024-02-26 Thread Nazir Bilal Yavuz

Hi,

On Fri, 23 Feb 2024 at 15:34, Daniel Gustafsson  wrote:
>
> > On 23 Feb 2024, at 13:09, Nazir Bilal Yavuz  wrote:
>
> > Does errmsg_internal() need to be used all the time when turning elogs
> > into ereports? errmsg_internal()'s comment says that "This should be
> > used for "can't happen" cases that are probably not worth spending
> > translation effort on.". Is it enough to check if the error message
> > has a translation, and then decide the use of errmsg_internal() or
> > errmsg()?
>
> If it's an elog then it won't have a translation as none are included in the
> translation set.  If the errmsg is generic enough to be translated anyways via
> another (un)related ereport call then we of course use that, but ideally not
> create new ones.

Thanks for the explanation.

All of your feedback is addressed in v2.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From fa45a69731da1488560b2749023e9b9573231c2b Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 23 Feb 2024 16:49:10 +0300
Subject: [PATCH v2 1/2] (xlog.c) Add missing error codes to PANIC/FATAL error
 reports

---
 src/backend/access/transam/xlog.c | 77 +--
 1 file changed, 52 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c1162d55bff..d295cef9606 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1354,7 +1354,9 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
 	}
 
 	if (CurrPos != EndPos)
-		elog(PANIC, "space reserved for WAL record does not match what was written");
+		ereport(PANIC,
+errcode(ERRCODE_DATA_CORRUPTED),
+errmsg_internal("space reserved for WAL record does not match what was written"));
 }
 
 /*
@@ -4040,7 +4042,8 @@ ValidateXLOGDirectoryStructure(void)
 	if (stat(XLOGDIR, _buf) != 0 ||
 		!S_ISDIR(stat_buf.st_mode))
 		ereport(FATAL,
-(errmsg("required WAL directory \"%s\" does not exist",
+(errcode_for_file_access(),
+ errmsg("required WAL directory \"%s\" does not exist",
 		XLOGDIR)));
 
 	/* Check for archive_status */
@@ -4050,7 +4053,8 @@ ValidateXLOGDirectoryStructure(void)
 		/* Check for weird cases where it exists but isn't a directory */
 		if (!S_ISDIR(stat_buf.st_mode))
 			ereport(FATAL,
-	(errmsg("required WAL directory \"%s\" does not exist",
+	(errcode_for_file_access(),
+	 errmsg("required WAL directory \"%s\" does not exist",
 			path)));
 	}
 	else
@@ -4059,7 +4063,8 @@ ValidateXLOGDirectoryStructure(void)
 (errmsg("creating missing WAL directory \"%s\"", path)));
 		if (MakePGDirectory(path) < 0)
 			ereport(FATAL,
-	(errmsg("could not create missing directory \"%s\": %m",
+	(errcode_for_file_access(),
+	 errmsg("could not create missing directory \"%s\": %m",
 			path)));
 	}
 
@@ -4296,7 +4301,8 @@ ReadControlFile(void)
 
 	if (ControlFile->pg_control_version != PG_CONTROL_VERSION && ControlFile->pg_control_version % 65536 == 0 && ControlFile->pg_control_version / 65536 != 0)
 		ereport(FATAL,
-(errmsg("database files are incompatible with server"),
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("database files are incompatible with server"),
  errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d (0x%08x),"
 		   " but the server was compiled with PG_CONTROL_VERSION %d (0x%08x).",
 		   ControlFile->pg_control_version, ControlFile->pg_control_version,
@@ -4305,7 +4311,8 @@ ReadControlFile(void)
 
 	if (ControlFile->pg_control_version != PG_CONTROL_VERSION)
 		ereport(FATAL,
-(errmsg("database files are incompatible with server"),
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("database files are incompatible with server"),
  errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d,"
 		   " but the server was compiled with PG_CONTROL_VERSION %d.",
 		   ControlFile->pg_control_version, PG_CONTROL_VERSION),
@@ -4320,7 +4327,8 @@ ReadControlFile(void)
 
 	if (!EQ_CRC32C(crc, ControlFile->crc))
 		ereport(FATAL,
-(errmsg("incorrect checksum in control file")));
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("incorrect checksum in control file")));
 
 	/*
 	 * Do compatibility checking immediately.  If the database isn't
@@ -4329,68 +4337,78 @@ ReadControlFile(void)
 	 */
 	if (ControlFile->catalog_version_no != CATALOG_VERSION_NO)
 		ereport(FATAL,
-(errmsg("database files are incompatible with server"),
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("database files are incompatible with server"),
  errdetail("The database cluster was initialized with CATALOG_VERSION_NO %d,"
 		   " but the server was compiled with CATALOG_VERSION_NO %d.",
 		   ControlFile->catalog_version_no, CATALOG_VERSION_NO),
  errhint("It looks like you need to initdb.")));
 	if (ControlFile->maxAlign != MAXIMUM_ALIGNOF)

Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-02-26 Thread Andrei Lepikhov

On 26/2/2024 18:34, Richard Guo wrote:

On Mon, Feb 26, 2024 at 3:54 PM Andrei Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:

On 26/2/2024 12:44, Tender Wang wrote:
 > Make sense. I found MemoizeState already has a MemoryContext, so
I used it.
 > I update the patch.
This approach is better for me. In the next version of this patch, I
included a test case. I am still unsure about the context chosen and
the
stability of the test case. Richard, you recently fixed some Memoize
issues, could you look at this problem and patch?

I looked at this issue a bit.  It seems to me what happens is that at
first the memory areas referenced by probeslot->tts_values[] are
allocated in the per tuple context (see prepare_probe_slot).  And then
in MemoizeHash_hash, after we've calculated the hashkey, we will reset
the per tuple context.  However, later in MemoizeHash_equal, we still
need to reference the values in probeslot->tts_values[], which have been
cleared.

Agree

Actually the context would always be reset in MemoizeHash_equal, for
both binary and logical mode.  So I kind of wonder if it's necessary to
reset the context in MemoizeHash_hash.
I can only provide one thought against this solution: what if we have a 
lot of unique hash values, maybe all of them? In that case, we still 
have a kind of 'leak' David fixed by the commit 0b053e78b5.
Also, I have a segfault report of one client. As I see, it was caused by 
too long text column in the table slot. As I see, key value, stored in 
the Memoize hash table, was corrupted, and the most plain reason is this 
bug. Should we add a test on this bug, and what do you think about the 
one proposed in v3?

--
regards,
Andrei Lepikhov
Postgres Professional

Re: Synchronizing slots from primary to standby

2024-02-26 Thread shveta malik

On Mon, Feb 26, 2024 at 5:18 PM Amit Kapila  wrote:
>
> > > > +   if (!ok)
> > > > +   GUC_check_errdetail("List syntax is invalid.");
> > > > +
> > > > +   /*
> > > > +* If there is a syntax error in the name or if the replication 
> > > > slots'
> > > > +* data is not initialized yet (i.e., we are in the startup 
> > > > process), skip
> > > > +* the slot verification.
> > > > +*/
> > > > +   if (!ok || !ReplicationSlotCtl)
> > > > +   {
> > > > +   pfree(rawname);
> > > > +   list_free(elemlist);
> > > > +   return ok;
> > > > +   }
> > > >
> > > > we are testing the "ok" value twice, what about using if...else if... 
> > > > instead and
> > > > test it once? If so, it might be worth to put the:
> > > >
> > > > "
> > > > +   pfree(rawname);
> > > > +   list_free(elemlist);
> > > > +   return ok;
> > > > "
> > > >
> > > > in a "goto".
> > >
> > > There were comments to remove the 'goto' statement and avoid
> > > duplicate free code, so I prefer the current style.
> >
> > The duplicate free code would come from the if...else if... rewrite but then
> > the "goto" would remove it, so I'm not sure to understand your point.
> >
>
> I think Hou-San wants to say that there was previously a comment to
> remove goto and now you are saying to introduce it. But, I think we
> can avoid both code duplication and goto, if the first thing we check
> in the function is ReplicationSlotCtl and return false if the same is
> not set. Won't that be better?

I think we can not do that as we need to check atleast syntax before
we return due to NULL ReplicationSlotCtl. We get NULL
ReplicationSlotCtl during instance startup in
check_standby_slot_names() as postmaster first loads GUC-table and
then initializes shared-memory for replication slots. See calls of
InitializeGUCOptions() and CreateSharedMemoryAndSemaphores() in
PostmasterMain().  FWIW, I do not have any issue with current code as
well, but if we have to change it, is [1] any better?

[1]:
check_standby_slot_names()
{

if (!ok)
{
GUC_check_errdetail("List syntax is invalid.");
}
else if (ReplicationSlotCtl)
{
   foreach-loop for slot validation
}

pfree(rawname);
list_free(elemlist);
return ok;
}

thanks
SHveta

Re: Add new error_action COPY ON_ERROR "log"

2024-02-26 Thread torikoshia


On 2024-02-20 17:22, torikoshia wrote:

On 2024-02-17 15:00, Bharath Rupireddy wrote:
On Fri, Feb 16, 2024 at 8:17 PM torikoshia 
 wrote:


I may be wrong since I seldom do data loading tasks, but I greed with
you.

I also a little concerned about the case where there are many 
malformed

data and it causes lots of messages, but the information is usually
valuable and if users don't need it, they can suppress it by changing
client_min_messages.

Currently both summary of failures and individual information is 
logged

in NOTICE level.
If we should assume that there are cases where only summary 
information

is required, it'd be useful to set lower log level, i.e. LOG to the
individual information.


How about we emit the summary at INFO level and individual information
at NOTICE level? With this, the summary is given a different priority
than the individual info. With SET client_min_messages = WARNING; one
can still get the summary but not the individual info. Also, to get
all of these into server log, one can SET log_min_messages = INFO; or
SET log_min_messages = NOTICE;.

Thoughts?


It looks good to me.


Here are comments on the v2 patch.

+   if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
+   {
+   ereport(NOTICE,

I think cstate->opts.on_error is not COPY_ON_ERROR_STOP here, since if 
it is COPY_ON_ERROR_STOP, InputFunctionCallSafe() should already have 
errored out.


Should it be something like "Assert(cstate->opts.on_error != 
COPY_ON_ERROR_STOP)"?



Should below manual also be updated?

A NOTICE message containing the ignored row count is emitted at the end 
of the COPY FROM if at least one row was discarded.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

RE: speed up a logical replica setup

2024-02-26 Thread Hayato Kuroda (Fujitsu)

Dear Euler,

>> The possible solution would be
>> 1) allow to run pg_createsubscriber if standby is initially stopped .
>> I observed that pg_logical_createsubscriber also uses this approach.
>> 2) read GUCs via SHOW command and restore them when server restarts
>>
>3. add a config-file option. That's similar to what pg_rewind does.

Sorry, which pg_rewind option did you mention? I cannot find.
IIUC, -l is an only option which can accept the path, but it is not related 
with us.

Also, I'm not sure the benefit to add as new options. Basically it should be 
less.
Is there benefits than read via SHOW? Even if I assume the pg_resetwal has such
an option, the reason is that the target postmaster for pg_resetwal must be 
stopped.

>I expect
>that Debian-based installations will have this issue.

I'm not familiar with the Debian-env, so can you explain the reason?

>It was not a good idea if you want to keep the postgresql.conf outside PGDATA.
>I mean you need extra steps that can be error prone (different settings between
>files).

Yeah, if we use my approach, users who specify such GUCs may not be happy.
So...based on above discussion, we should choose either of below items. Thought?

a)
enforce the standby must be *stopped*, and options like config_file can be 
specified via option.
b)
enforce the standby must be *running*,  options like config_file would be read 
via SHOW command.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/

Re: RFC: Logging plan of the running query

2024-02-26 Thread torikoshia

On 2024-02-24 00:23, Robert Haas wrote:
On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud  
wrote:

On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote:
> On Thu, Feb 22, 2024 at 6:25 AM James Coleman  wrote:
> > This is potentially a bit of a wild idea, but I wonder if having some
> > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in
> > "normal" as opposed to "critical" (using that word differently than
> > the existing critical sections) would be worth it.
>
> It's worth considering, but the definition of "normal" vs. "critical"
> might be hard to pin down. Or, we might end up with a definition that
> is specific to this particular case and not generalizable to others.

But it doesn't have to be all or nothing right?  I mean each call 
could say

what the situation is like in their context, like
CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | 
GUARANTEE_WHATEVER), and
slowly tag calls as needed, similarly to how we add already CFI based 
on users

report.

Absolutely. My gut feeling is that it's going to be simpler to pick a
small number of places that are safe and sufficient for this
particular feature and add an extra call there

Hmm, whether extending CHECK_FOR_INTERRUPTS() or adding extras call 
directly, currently I'm not sure where are the good 'places', which 
don't give performance impact.

As attached PoC patch, I experimentally added extra calls on 
ExecScanFetch() which would be less called than ExecProcNode()[1].
When running sequential scan on pgbench_accounts which is on the memory, 
there seems a performance degradation.

  - Executed "select * from pgbench_accounts" for 20 times
  - Compared the elapsed time between the patch applied and not applied 
on 874d817baa160ca7e68

  - there were no heap_blks_read during the query
  - pgbench_accounts has 300 rows

  patch NOT applied:
  - average: 335.88 ms
  - max: 367.313 ms
  - min: 309.609 ms

  patch applied:
  - average: 342.57 ms
  - max: 380.099 ms
  - min: 324.270 ms

It would be nice if there was a place accessed once every few seconds or 
so..

[1] 
https://www.postgresql.org/message-id/20240215185911.v4o6fo444md6a3w7%40awork3.anarazel.de

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group CorporationFrom dee1ce857c89bf646ff98f94c846746b96869ebb Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 26 Feb 2024 19:19:35 +0900
Subject: [PATCH v37] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages and which can
lead to denial of service.

On receipt of the request, at the next
CHECK_LOG_QUERY_PLAN_PENDING() the target backend logs its
plan at LOG_SERVER_ONLY level, so that these plans will appear
in the server log but not be sent to the client.

CHECK_LOG_QUERY_PLAN_PENDING() is called from ExecScanFetch()
and ProcessClientReadInterrupt(). The former is intended to
log plans during any kind of scans and the latter is intended
to log that the no query is executing during client read.

---
 contrib/auto_explain/Makefile |   2 +
 contrib/auto_explain/auto_explain.c   |  23 +--
 contrib/auto_explain/t/001_auto_explain.pl|  35 
 doc/src/sgml/func.sgml|  50 +
 src/backend/access/transam/xact.c |  19 ++
 src/backend/catalog/system_functions.sql  |   2 +
 src/backend/commands/explain.c| 176 +-
 src/backend/executor/execMain.c   |  19 ++
 src/backend/executor/execScan.c   |   2 +
 src/backend/storage/ipc/procsignal.c  |   4 +
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/init/globals.c  |   2 +
 src/include/catalog/pg_proc.dat   |   6 +
 src/include/commands/explain.h|  16 ++
 src/include/miscadmin.h   |   1 +
 src/include/storage/procsignal.h  |   1 +
 src/include/tcop/pquery.h |   2 +-
 src/include/utils/elog.h  |   1 +
 .../injection_points/injection_points.c   |  11 ++
 src/test/regress/expected/misc_functions.out  |  54 +-
 src/test/regress/sql/misc_functions.sql   |  41 +++-
 21 files changed, 430 insertions(+), 42 deletions(-)

diff --git a/contrib/auto_explain/Makefile b/contrib/auto_explain/Makefile
index efd127d3ca..64fe0e0573 100644
--- a/contrib/auto_explain/Makefile
+++ b/contrib/auto_explain/Makefile
@@ -8,6 +8,8 @@ PGFILEDESC = "auto_explain - logging facility for execution plans"

 TAP_TESTS = 1

Re: Synchronizing slots from primary to standby

2024-02-26 Thread Amit Kapila

On Mon, Feb 26, 2024 at 7:49 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Attach the V98 patch set which addressed above comments.
>

Few comments:
=
1.
 WalSndWaitForWal(XLogRecPtr loc)
 {
  int wakeEvents;
+ bool wait_for_standby = false;
+ uint32 wait_event;
+ List*standby_slots = NIL;
  static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;

+ if (MyReplicationSlot->data.failover && replication_active)
+ standby_slots = GetStandbySlotList(true);
+
  /*
- * Fast path to avoid acquiring the spinlock in case we already know we
- * have enough WAL available. This is particularly interesting if we're
- * far behind.
+ * Check if all the standby servers have confirmed receipt of WAL up to
+ * RecentFlushPtr even when we already know we have enough WAL available.
+ *
+ * Note that we cannot directly return without checking the status of
+ * standby servers because the standby_slot_names may have changed, which
+ * means there could be new standby slots in the list that have not yet
+ * caught up to the RecentFlushPtr.
  */
- if (RecentFlushPtr != InvalidXLogRecPtr &&
- loc <= RecentFlushPtr)
- return RecentFlushPtr;
+ if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr)
+ {
+ FilterStandbySlots(RecentFlushPtr, _slots);

I think even if the slot list is not changed, we will always process
each slot mentioned in standby_slot_names once. Can't we cache the
previous list of slots for we have already waited for? In that case,
we won't even need to copy the list via GetStandbySlotList() unless we
need to wait.

2.
 WalSndWaitForWal(XLogRecPtr loc)
 {
+ /*
+ * Update the standby slots that have not yet caught up to the flushed
+ * position. It is good to wait up to RecentFlushPtr and then let it
+ * send the changes to logical subscribers one by one which are
+ * already covered in RecentFlushPtr without needing to wait on every
+ * change for standby confirmation.
+ */
+ if (wait_for_standby)
+ FilterStandbySlots(RecentFlushPtr, _slots);
+
  /* Update our idea of the currently flushed position. */
- if (!RecoveryInProgress())
+ else if (!RecoveryInProgress())
  RecentFlushPtr = GetFlushRecPtr(NULL);
  else
  RecentFlushPtr = GetXLogReplayRecPtr(NULL);
...
/*
* If postmaster asked us to stop, don't wait anymore.
*
* It's important to do this check after the recomputation of
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
if (got_STOPPING)
break;

I think because 'wait_for_standby' may not be set in the first or
consecutive cycles we may send the WAL to the logical subscriber
before sending it to the physical subscriber during shutdown.

-- 
With Regards,
Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-02-26 Thread Amit Kapila

On Mon, Feb 26, 2024 at 12:59 PM Bertrand Drouvot
 wrote:
>
> On Mon, Feb 26, 2024 at 02:18:58AM +, Zhijie Hou (Fujitsu) wrote:
> > On Friday, February 23, 2024 6:12 PM Bertrand Drouvot 
> >  wrote:
> > > +   if (!ok)
> > > +   GUC_check_errdetail("List syntax is invalid.");
> > > +
> > > +   /*
> > > +* If there is a syntax error in the name or if the replication 
> > > slots'
> > > +* data is not initialized yet (i.e., we are in the startup 
> > > process), skip
> > > +* the slot verification.
> > > +*/
> > > +   if (!ok || !ReplicationSlotCtl)
> > > +   {
> > > +   pfree(rawname);
> > > +   list_free(elemlist);
> > > +   return ok;
> > > +   }
> > >
> > > we are testing the "ok" value twice, what about using if...else if... 
> > > instead and
> > > test it once? If so, it might be worth to put the:
> > >
> > > "
> > > +   pfree(rawname);
> > > +   list_free(elemlist);
> > > +   return ok;
> > > "
> > >
> > > in a "goto".
> >
> > There were comments to remove the 'goto' statement and avoid
> > duplicate free code, so I prefer the current style.
>
> The duplicate free code would come from the if...else if... rewrite but then
> the "goto" would remove it, so I'm not sure to understand your point.
>

I think Hou-San wants to say that there was previously a comment to
remove goto and now you are saying to introduce it. But, I think we
can avoid both code duplication and goto, if the first thing we check
in the function is ReplicationSlotCtl and return false if the same is
not set. Won't that be better?

>
> > >
> > > 10 ===
> > >
> > > +   if (slot->data.invalidated != RS_INVAL_NONE)
> > > +   {
> > > +   /*
> > > +* Specified physical slot have been 
> > > invalidated,
> > > so no point
> > > +* in waiting for it.
> > >
> > > We discovered in [1], that if the wal_status is "unreserved" then the 
> > > slot is still
> > > serving the standby. I think we should handle this case differently, 
> > > thoughts?
> >
> > I think the 'invalidated' slot can still be used is a separate bug.
> > Because
> > once the slot is invalidated, it can neither protect WALs or ROWs from being
> > removed even if the restart_lsn of the slot can be moved forward after 
> > being invalidated.
> >
> > If the standby can move restart_lsn forward for invalidated slots, then
> > it should also set the 'invalidated' flag back to NONE, otherwise the slot
> > cannot serve its purpose anymore. I also reported similar bug before[1].
>
...
>
> My point is that I think we should behave like it's not a bug and then adapt 
> the
> code accordingly here (until the bug gets fixed).
>

oh, I think this doesn't sound like a good idea to me. We should fix
that bug independently rather than adding code in new features to
consider the bug as a valid behavior. It will add the burden on us to
remember and remove the additional new check(s).

-- 
With Regards,
Amit Kapila.

Regardign RecentFlushPtr in WalSndWaitForWal()

2024-02-26 Thread shveta malik

Hi hackers,

I would like to understand why we have code [1] that retrieves
RecentFlushPtr in WalSndWaitForWal() outside of the loop. We utilize
RecentFlushPtr later within the loop, but prior to that, we already
have [2]. Wouldn't [2] alone be sufficient?

Just to check the impact, I ran 'make check-world' after removing [1],
and did not see any issue exposed by the test at-least.

Any thoughts?

[1]:
/* Get a more recent flush pointer. */
if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr(NULL);
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);

[2]:
/* Update our idea of the currently flushed position. */
else if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr(NULL);
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);

thanks
Shveta

Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2024-02-26 Thread Xing Guo

On Mon, Feb 26, 2024 at 7:27 PM Daniel Gustafsson  wrote:
>
> > On 1 Jul 2020, at 10:36, Daniel Gustafsson  wrote:
> >
> >> On 27 Mar 2020, at 11:22, Haozhou Wang  wrote:
> >
> >> We rebased this patch with the newest master.
> >
> > This patch now fails to build according to Travis:
> >
> > smgr.c: In function ‘smgrtruncate’:
> > smgr.c:578:47: error: passing argument 4 of ‘smgrtruncate_hook’ makes 
> > integer from pointer without a cast [-Werror=int-conversion]
> >   (*smgrtruncate_hook)(reln, forknum, nforks, nblocks);
> >   ^
> > smgr.c:578:47: note: expected ‘BlockNumber {aka unsigned int}’ but argument 
> > is of type ‘BlockNumber * {aka unsigned int *}’
> >
> >
> > The warning is also present in the Windows build:
> >
> > src/backend/storage/smgr/smgr.c(578): warning C4047: 'function' : 
> > 'BlockNumber' differs in levels of indirection from 'BlockNumber *' 
> > [C:\projects\postgresql\postgres.vcxproj]
> > src/backend/storage/smgr/smgr.c(578): warning C4024: 'smgrtruncate_hook' : 
> > different types for formal and actual parameter 4 
> > [C:\projects\postgresql\postgres.vcxproj]
> > 2 Warning(s)
> >
> > Marking the patch as Waiting for Author.
>
> As the thread has stalled and the above compilation issue hasn't been
> addressed, I'm marking this entry Returned with Feedback.  Feel free to open a
> new entry when there is a fixed patch.

Hi,

Looks that many hackers are happy with the original patch except that
the original patch needs a simple rebase, though it has been 3 years.
Shall we push forward this patch so that it can benefit extensions not
only diskquota?

The attachment is a rebased version of the original patch.

>
> cheers ./daniel
>
>
>
From 6479e1ccc92c53468dc879d33fa1a93f66ae4521 Mon Sep 17 00:00:00 2001
From: Hubert Zhang 
Date: Mon, 26 Feb 2024 19:24:43 +0800
Subject: [PATCH v8] Add smgr hooks to extend the logic of storage management

One example is that these hooks could be used by diskquota extension
to detect heap table change(create/extend/truncate/unlink).

Co-authored-by: Haozhou Wang 
Co-authored-by: Hubert Zhang 
Co-authored-by: Hao Wu 
---
 src/backend/storage/smgr/smgr.c | 21 +
 src/include/storage/smgr.h  | 16 
 2 files changed, 37 insertions(+)

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index f7f7fe30b6..7992da84ce 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -142,6 +142,15 @@ static dlist_head unpinned_relns;
 static void smgrshutdown(int code, Datum arg);
 static void smgrdestroy(SMgrRelation reln);
 
+/*
+ * Hook for plugins to extend smgr functions.
+ * for example, collect statistics from smgr functions
+ * via recording the active relfilenode information.
+ */
+smgrcreate_hook_type smgrcreate_hook = NULL;
+smgrextend_hook_type smgrextend_hook = NULL;
+smgrtruncate_hook_type smgrtruncate_hook = NULL;
+smgrdounlinkall_hook_type smgrdounlinkall_hook = NULL;
 
 /*
  * smgrinit(), smgrshutdown() -- Initialize or shut down storage
@@ -413,6 +422,9 @@ smgrexists(SMgrRelation reln, ForkNumber forknum)
 void
 smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 {
+	if (smgrcreate_hook)
+		(*smgrcreate_hook)(reln, forknum, isRedo);
+
 	smgrsw[reln->smgr_which].smgr_create(reln, forknum, isRedo);
 }
 
@@ -471,6 +483,9 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 	if (nrels == 0)
 		return;
 
+	if (smgrdounlinkall_hook)
+		(*smgrdounlinkall_hook)(rels, nrels, isRedo);
+
 	/*
 	 * Get rid of any remaining buffers for the relations.  bufmgr will just
 	 * drop them without bothering to write the contents.
@@ -538,6 +553,9 @@ void
 smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 		   const void *buffer, bool skipFsync)
 {
+	if (smgrextend_hook)
+		(*smgrextend_hook)(reln, forknum, blocknum, buffer, skipFsync);
+
 	smgrsw[reln->smgr_which].smgr_extend(reln, forknum, blocknum,
 		 buffer, skipFsync);
 
@@ -706,6 +724,9 @@ smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nb
 {
 	int			i;
 
+	if (smgrtruncate_hook)
+		(*smgrtruncate_hook)(reln, forknum, nforks, nblocks);
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 2b57addbdb..0ac1ed8f93 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -124,4 +124,20 @@ smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	smgrwritev(reln, forknum, blocknum, , 1, skipFsync);
 }
 
+/*
+ * Hook for plugins to extend smgr functions.
+ * for example, collect statistics from smgr functions
+ * via recording the active relfilenode information.
+ */
+typedef void (*smgrcreate_hook_type)(SMgrRelation reln, ForkNumber forknum, bool isRedo);
+extern PGDLLIMPORT smgrcreate_hook_type

Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-02-26 Thread Richard Guo

On Mon, Feb 26, 2024 at 3:54 PM Andrei Lepikhov 
wrote:

> On 26/2/2024 12:44, Tender Wang wrote:
> > Make sense. I found MemoizeState already has a MemoryContext, so I used
> it.
> > I update the patch.
> This approach is better for me. In the next version of this patch, I
> included a test case. I am still unsure about the context chosen and the
> stability of the test case. Richard, you recently fixed some Memoize
> issues, could you look at this problem and patch?

I looked at this issue a bit.  It seems to me what happens is that at
first the memory areas referenced by probeslot->tts_values[] are
allocated in the per tuple context (see prepare_probe_slot).  And then
in MemoizeHash_hash, after we've calculated the hashkey, we will reset
the per tuple context.  However, later in MemoizeHash_equal, we still
need to reference the values in probeslot->tts_values[], which have been
cleared.

Actually the context would always be reset in MemoizeHash_equal, for
both binary and logical mode.  So I kind of wonder if it's necessary to
reset the context in MemoizeHash_hash.

The ResetExprContext call in MemoizeHash_hash was introduced in
0b053e78b to fix a memory leak issue.

commit 0b053e78b5990cd01e7169ee5bd2bb8e4045deea
Author: David Rowley 
Date:   Thu Oct 5 20:30:47 2023 +1300

Fix memory leak in Memoize code

It seems to me that switching to the per-tuple memory context is
sufficient to fix the memory leak.  Calling ResetExprContext in
MemoizeHash_hash each time seems too aggressive.

I tried to remove the ResetExprContext call in MemoizeHash_hash and did
not see the memory leak with the repro query in [1].

diff --git a/src/backend/executor/nodeMemoize.c
b/src/backend/executor/nodeMemoize.c
index 18870f10e1..f2f025520d 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -207,7 +207,6 @@ MemoizeHash_hash(struct memoize_hash *tb, const
MemoizeKey *key)
}
}

-   ResetExprContext(econtext);
MemoryContextSwitchTo(oldcontext);
return murmurhash32(hashkey);
 }

Looping in David to have a look.

[1]
https://www.postgresql.org/message-id/flat/83281eed63c74e4f940317186372abfd%40cft.ru

Thanks
Richard

Re: Improve eviction algorithm in ReorderBuffer




On 2/26/24 07:46, Masahiko Sawada wrote:
> On Sat, Feb 24, 2024 at 1:29 AM Tomas Vondra
>  wrote:
>>...
>>
>> overall design
>> --
>>
>> As for the design, I agree with the approach of using a binaryheap to
>> track transactions by size. When going over the thread history,
>> describing the initial approach with only keeping "large" transactions
>> above some threshold (e.g. 10%), I was really concerned that'll either
>> lead to abrupt changes in behavior (when transactions move just around
>> the 10%), or won't help with many common cases (with most transactions
>> being below the limit).
>>
>> I was going to suggest some sort of "binning" - keeping lists for
>> transactions of similar size (e.g. <1kB, 1-2kB, 2-4kB, 4-8kB, ...) and
>> evicting transactions from a list, i.e. based on approximate size. But
>> if the indexed binary heap seems to be cheap enough, I think it's a
>> better solution.
> 
> I've also considered the binning idea. But it was not clear to me how
> it works well in a case where all transactions belong to the
> particular class. For example, if we need to free up 1MB memory, we
> could end up evicting 2000 transactions consuming 50 bytes instead of
> 100 transactions consuming 1000 bytes, resulting in that we end up
> serializing more transactions. Also, I'm concerned about the cost of
> maintaining the binning lists.
> 

I don't think the list maintenance would be very costly - in particular,
the lists would not need to be sorted by size. You're right in some
extreme cases we might evict the smallest transactions in the list. I
think on average we'd evict transactions with average size, which seems
OK for this use case.

Anyway, I don't think we need to be distracted with this. I mentioned it
merely to show it was considered, but the heap seems to work well
enough, and in the end is even simpler because the complexity is hidden
outside reorderbuffer.

>>
>> The one thing I'm a bit concerned about is the threshold used to start
>> using binary heap - these thresholds with binary decisions may easily
>> lead to a "cliff" and robustness issues, i.e. abrupt change in behavior
>> with significant runtime change (e.g. you add/remove one transaction and
>> the code takes a much more expensive path). The value (1024) seems
>> rather arbitrary, I wonder if there's something to justify that choice.
> 
> True. 1024 seems small to me. In my environment, I started to see a
> big difference from around 4 transactions. But it varies depending
> on environments and workloads.
> 
> I think that this performance problem we're addressing doesn't
> normally happen as long as all transactions being decoded are
> top-level transactions. Otherwise, we also need to improve
> ReorderBufferLargestStreamableTopTXN(). Given this fact, I think
> max_connections = 1024 is a possible value in some systems, and I've
> observed such systems sometimes. On the other hand, I've observed >
> 5000 in just a few cases, and having more than 5000 transactions in
> ReorderBuffer seems unlikely to happen without subtransactions. I
> think we can say it's an extreme case, the number is still an
> arbitrary number though.
> 
> Or probably we can compute the threshold based on max_connections,
> e.g., max_connections * 10. That way, we can ensure that users won't
> incur the max-heap maintenance costs as long as they don't use
> subtransactions.
> 

Tying this to max_connections seems like an interesting option. It'd
make this adaptive to a system. I haven't thought about the exact value
(m_c * 10), but it seems better than arbitrary hard-coded values.

>>
>> In any case, I agree it'd be good to have some dampening factor, to
>> reduce the risk of trashing because of adding/removing a single
>> transaction to the decoding.
>>
>>
>> related stuff / GenerationContext
>> -
>>
>> It's not the fault of this patch, but this reminds me I have some doubts
>> about how the eviction interferes with using the GenerationContext for
>> some of the data. I suspect we can easily get into a situation where we
>> evict the largest transaction, but that doesn't actually reduce the
>> memory usage at all, because the memory context blocks are shared with
>> some other transactions and don't get 100% empty (so we can't release
>> them). But it's actually worse, because GenerationContext does not even
>> reuse this memory. So do we even gain anything by the eviction?
>>
>> When the earlier patch versions also considered age of the transaction,
>> to try evicting the older ones first, I think that was interesting. I
>> think we may want to do something like this even with the binary heap.
> 
> Thank you for raising this issue. This is one of the highest priority
> items in my backlog. We've seen cases where the logical decoding uses
> much more memory than logical_decoding_work_mem value[1][2] (e.g. it
> used 4GB memory even though the logical_decoding_work_mem was 256kB).
> I think that the problem

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-02-26 Thread Josef Šimánek

po 26. 2. 2024 v 8:20 odesílatel Jelte Fennema-Nio  napsal:
>
> On Sun, 25 Feb 2024 at 23:34, Josef Šimánek  wrote:
> > Exposing parser to frontend tools makes no sense to me
>
> Not everyone seems to agree with that, it's actually already done by
> Lukas from pganalyze: https://github.com/pganalyze/libpg_query

I did a quick look. That's clearly amazing work, but it is not parser
being exposed to frontend (refactored). It is (per my understanding)
backend code isolated to minimum to be able to parse query. It is
still bound to individual backend version and to backend source code.
And it is close to my effort (I was about to start with a simpler
version not providing tokens, just the result), but instead of copying
files from backend into separate project and shave off to minimum,
provide the same tool with backend utils directly.

Re: Documentation: warn about two_phase when altering a subscription

Hi,

On Fri, Feb 23, 2024 at 11:50:17PM +, Tristen Raab wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:tested, passed
> 
> Hello,
> 
> I've reviewed your patch,

Thanks!

> it applies correctly and the documentation builds without any errors. As for 
> the content of the patch itself, I think so far it's good but would make two 
> modifications. I like how the patch was worded originally when referring to 
> the subscription, stating these parameters were 'in' the subscription rather 
> than 'by' it. So I'd propose changing
> 
> > parameters specified by the subscription. When creating the slot, ensure
> 
> to 
> 
> > parameters specified in the subscription. When creating the slot, ensure

As non native English speaker I don't have a strong opinion on that (though I
also preferred how it was worded in V1).

> 
> and secondly the section "ensure the slot properties failover and two_phase 
> match their counterpart parameters of the subscription" sounds a bit clunky. 
> So I'd also propose changing:
> 
> > the slot properties failover and 
> > two_phase
> > match their counterpart parameters of the subscription.
> 
> to
> 
> > the slot properties failover and 
> > two_phase
> > match their counterpart parameters in the subscription.

Same here, I don't have a strong opinion on that.

As the patch as it is now looks good to Amit (see [1]), I prefer to let him
decide which wording he pefers to push.

> I feel this makes the description flow a bit better when reading. But other 
> than that I think it's quite clear.

Thanks!

[1]: 
https://www.postgresql.org/message-id/CAA4eK1KdZMtJfo%3DK%3DXWsQQu8OHutT_dwJV2D3zs4h9g5zdNz2A%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Injection points: some tools to wait and wake

2024-02-26 Thread Andrey M. Borodin




> On 26 Feb 2024, at 08:57, Michael Paquier  wrote:
> 
> 

Would it be possible to have a helper function to check this:

+ok( $node_standby->poll_query_until(
+   'postgres',
+   qq[SELECT count(*) FROM pg_stat_activity
+   WHERE backend_type = 'checkpointer' AND wait_event = 
'CreateRestartPoint' ;],
+   '1'),
+   'checkpointer is waiting in restart point'
+) or die "Timed out while waiting for checkpointer to run restart point”;

So that we could do something like

ok(node_standby->await_injection_point(“CreateRestartPoint”,”checkpointer"));

IMO, this could make many tests cleaner.
Or, perhaps, it’s a functionality for a future development?

Thanks!


Best regards, Andrey Borodin.

Re: Optimize planner memory consumption for huge arrays




On 2/25/24 17:29, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 2/25/24 00:07, Tom Lane wrote:
>>> ...  I'm not sure if it'd be worth extending
>>> the mcxt.c API to provide something like "MemoryContextResetIfBig",
>>> with some internal rule that would be cheap to apply like "reset
>>> if we have any non-keeper blocks".
> 
>> I think MemoryContextResetIfBig is an interesting idea - I think a good
>> way to define "big" would be "has multiple blocks", because that's the
>> only case where we can actually reclaim some memory.
> 
> Yeah.  Also: once we had such an idea, it'd be very tempting to apply
> it to other frequently-reset contexts like the executor's per-tuple
> evaluation contexts.  I'm not quite prepared to argue that
> MemoryContextReset should just act that way all the time ... but
> it's sure interesting to think about.
> 

Do the context resets consume enough time to make this measurable? I may
be wrong, but I'd guess it's not measurable. In which case, what would
be the benefit?

> Another question is whether this wouldn't hurt debugging, in that
> dangling-pointer bugs would become harder to catch.  We'd certainly
> want to turn off the optimization in USE_VALGRIND builds, and maybe
> we just shouldn't do it at all if USE_ASSERT_CHECKING.
> 
>   regards, tom lane

+1 to disable this optimization in assert-enabled builds. I guess we'd
invent a new constant to disable it, and tie it to USE_ASSERT_CHECKING
(similar to CLOBBER_FREED_MEMORY, for example).

Thinking about CLOBBER_FREED_MEMORY, could it be useful to still clobber
the memory, even if we don't actually reset the context?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Improve readability by using designated initializers when possible

2024-02-26 Thread Japin Li



On Mon, 26 Feb 2024 at 16:41, jian he  wrote:
> Hi. minor issues.
>
> @@ -2063,12 +2009,12 @@ find_expr_references_walker(Node *node,
>   CoerceViaIO *iocoerce = (CoerceViaIO *) node;
>
>   /* since there is no exposed function, need to depend on type */
> - add_object_address(OCLASS_TYPE, iocoerce->resulttype, 0,
> + add_object_address(TypeRelationId iocoerce->resulttype, 0,
> context->addrs);
>
> @@ -2090,21 +2036,21 @@ find_expr_references_walker(Node *node,
>   ConvertRowtypeExpr *cvt = (ConvertRowtypeExpr *) node;
>
>   /* since there is no function dependency, need to depend on type */
> - add_object_address(OCLASS_TYPE, cvt->resulttype, 0,
> + add_object_address(TypeRelationId cvt->resulttype, 0,
> context->addrs);
>
> obvious typo errors.
>
> diff --git a/src/common/relpath.c b/src/common/relpath.c
> index b16fe19dea6..d9214f915c9 100644
> --- a/src/common/relpath.c
> +++ b/src/common/relpath.c
> @@ -31,10 +31,10 @@
>   * pg_relation_size().
>   */
>  const char *const forkNames[] = {
> - "main", /* MAIN_FORKNUM */
> - "fsm", /* FSM_FORKNUM */
> - "vm", /* VISIBILITYMAP_FORKNUM */
> - "init" /* INIT_FORKNUM */
> + [MAIN_FORKNUM] = "main",
> + [FSM_FORKNUM] = "fsm",
> + [VISIBILITYMAP_FORKNUM] = "vm",
> + [INIT_FORKNUM] = "init",
>  };
>
> `+ [INIT_FORKNUM] = "init", ` no need for an extra  comma?
>
> + [PG_SJIS] = {0, 0, pg_sjis_mblen, pg_sjis_dsplen,
> pg_sjis_verifychar, pg_sjis_verifystr, 2},
> + [PG_BIG5] = {0, 0, pg_big5_mblen, pg_big5_dsplen,
> pg_big5_verifychar, pg_big5_verifystr, 2},
> + [PG_GBK] = {0, 0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifychar,
> pg_gbk_verifystr, 2},
> + [PG_UHC] = {0, 0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifychar,
> pg_uhc_verifystr, 2},
> + [PG_GB18030] = {0, 0, pg_gb18030_mblen, pg_gb18030_dsplen,
> pg_gb18030_verifychar, pg_gb18030_verifystr, 4},
> + [PG_JOHAB] = {0, 0, pg_johab_mblen, pg_johab_dsplen,
> pg_johab_verifychar, pg_johab_verifystr, 3},
> + [PG_SHIFT_JIS_2004] = {0, 0, pg_sjis_mblen, pg_sjis_dsplen,
> pg_sjis_verifychar, pg_sjis_verifystr, 2},
>  };
> similarly, last entry, no need an extra comma?
> also other places last array entry no need extra comma.

For last entry comma, see [1].

[1] 
https://www.postgresql.org/message-id/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org

Re: Improve readability by using designated initializers when possible

2024-02-26 Thread jian he

Hi. minor issues.

@@ -2063,12 +2009,12 @@ find_expr_references_walker(Node *node,
  CoerceViaIO *iocoerce = (CoerceViaIO *) node;

  /* since there is no exposed function, need to depend on type */
- add_object_address(OCLASS_TYPE, iocoerce->resulttype, 0,
+ add_object_address(TypeRelationId iocoerce->resulttype, 0,
context->addrs);

@@ -2090,21 +2036,21 @@ find_expr_references_walker(Node *node,
  ConvertRowtypeExpr *cvt = (ConvertRowtypeExpr *) node;

  /* since there is no function dependency, need to depend on type */
- add_object_address(OCLASS_TYPE, cvt->resulttype, 0,
+ add_object_address(TypeRelationId cvt->resulttype, 0,
context->addrs);

obvious typo errors.

diff --git a/src/common/relpath.c b/src/common/relpath.c
index b16fe19dea6..d9214f915c9 100644
--- a/src/common/relpath.c
+++ b/src/common/relpath.c
@@ -31,10 +31,10 @@
  * pg_relation_size().
  */
 const char *const forkNames[] = {
- "main", /* MAIN_FORKNUM */
- "fsm", /* FSM_FORKNUM */
- "vm", /* VISIBILITYMAP_FORKNUM */
- "init" /* INIT_FORKNUM */
+ [MAIN_FORKNUM] = "main",
+ [FSM_FORKNUM] = "fsm",
+ [VISIBILITYMAP_FORKNUM] = "vm",
+ [INIT_FORKNUM] = "init",
 };

`+ [INIT_FORKNUM] = "init", ` no need for an extra  comma?

+ [PG_SJIS] = {0, 0, pg_sjis_mblen, pg_sjis_dsplen,
pg_sjis_verifychar, pg_sjis_verifystr, 2},
+ [PG_BIG5] = {0, 0, pg_big5_mblen, pg_big5_dsplen,
pg_big5_verifychar, pg_big5_verifystr, 2},
+ [PG_GBK] = {0, 0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifychar,
pg_gbk_verifystr, 2},
+ [PG_UHC] = {0, 0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifychar,
pg_uhc_verifystr, 2},
+ [PG_GB18030] = {0, 0, pg_gb18030_mblen, pg_gb18030_dsplen,
pg_gb18030_verifychar, pg_gb18030_verifystr, 4},
+ [PG_JOHAB] = {0, 0, pg_johab_mblen, pg_johab_dsplen,
pg_johab_verifychar, pg_johab_verifystr, 3},
+ [PG_SHIFT_JIS_2004] = {0, 0, pg_sjis_mblen, pg_sjis_dsplen,
pg_sjis_verifychar, pg_sjis_verifystr, 2},
 };
similarly, last entry, no need an extra comma?
also other places last array entry no need extra comma.

Re: Sequence Access Methods, round two

2024-02-26 Thread Matthias van de Meent

On Mon, 26 Feb 2024 at 09:11, Michael Paquier  wrote:
>
> On Thu, Feb 22, 2024 at 05:36:00PM +0100, Tomas Vondra wrote:
> > 0002, 0003
> > 
> > seems fine, cosmetic changes
>
> Thanks, I've applied these two for now.  I'll reply to the rest
> tomorrow or so.

Huh, that's surprising to me. I'd expected this to get at least a
final set of patches before they'd get committed. After a quick check
6e951bf seems fine, but I do have some nits on 449e798c:

> +/* 
> + *validate_relation_kind - check the relation's kind
> + *
> + *Make sure relkind is from an index

Shouldn't this be "... from a sequence"?

> + * 
> + */
> +static inline void
> +validate_relation_kind(Relation r)

Shouldn't this be a bit more descriptive than just
"validate_relation_kind"? I notice this is no different from how this
is handled in index.c and table.c, but I'm not a huge fan of shadowing
names, even with static inlines functions.

> -ERROR:  "serialtest1" is not a sequence
> +ERROR:  cannot open relation "serialtest1"
> +DETAIL:  This operation is not supported for tables.

We seem to lose some details here: We can most definitely open tables.
We just can't open them while treating them as sequences, which is not
mentioned in the error message.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Re: RFC: Logging plan of the running query

2024-02-26 Thread Robert Haas

On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud  wrote:
> Yeah, trying to find a generalized solution seems like worth investing some
> time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code a
> few years down the road.

I just don't really see how to do it. I suspect that every task that
wants to run at some CFIs but not others is going to have slightly
different requirements, and we probably can't foresee what all of
those requirements are.

Said another way, if in the future we want to call
DoSomethingOrOther() from the CFI handler, well then we need to know
that we're not already in the middle of using any subsystem that
DoSomethingOrOther() might also try to use ... and we also need to
know that we're not in the middle of doing anything that's more
critical than DoSomethingOrOther(). But both of these are likely to
vary in each case.

EXPLAIN might be one member of a general class of things that require
catalog access (and thus might take locks or lwlocks, access the
catalogs, trigger invalidations, etc.) but it's not clear how general
that class really is. Also, I think if we try to do too many different
kinds of things at CFIs, the whole thing is going to fall apart.
You'll end up failing to foresee some interactions, or the stack will
get too deep, or ... something.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Injection points: some tools to wait and wake

Hi,

On Mon, Feb 26, 2024 at 12:57:09PM +0900, Michael Paquier wrote:
> On Thu, Feb 22, 2024 at 08:00:24AM +, Bertrand Drouvot wrote:
> > +/* Maximum number of wait usable in injection points at once */
> > 
> > s/Maximum number of wait/Maximum number of waits/ ?
> 
> Thanks.  I've edited a few more places while scanning the whole.

Thanks!

> > 
> > 2 ===
> > 
> > +# Check the logs that the restart point has started on standby.  This is
> > +# optional, but let's be sure.
> > +my $log = slurp_file($node_standby->logfile, $logstart);
> > +my $checkpoint_start = 0;
> > +if ($log =~ m/restartpoint starting: immediate wait/)
> > +{
> > +   $checkpoint_start = 1;
> > +}
> > +is($checkpoint_start, 1, 'restartpoint has started');
> > 
> > what about?
> > 
> > ok( $node_standby->log_contains( "restartpoint starting: immediate wait", 
> > $logstart),
> > "restartpoint has started");
> 
> And I'm behind the commit that introduced it (392ea0c78fdb).

;-)

> It is
> possible to remove the dependency to slurp_file() entirely by
> switching the second location checking the logs for the checkpoint
> completion.

Yeah right.

> > Except for the above, v3 looks good to me.
> 
> Thanks.  I'm looking at applying that at the beginning of next week if
> there are no more comments, to get something by the feature freeze.
> We could be more flexible for all that as that's related to testing,
> but let's be in the clear.

Sounds reasonable to me.

> I've also renamed the test to 041_checkpoint_at_promote.pl, as now
> that the original is fixed, the checkpoint is not invalid.  That's
> cleaner this way.

Agree.

I'll try to submit the POC patch in [1] before beginning of next week now that
we're "just waiting" if there is more comments on this current thread.


[1]: 
https://www.postgresql.org/message-id/ZdTNafYSxwnKNIhj%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: RFC: Logging plan of the running query

2024-02-26 Thread Ashutosh Bapat

On Mon, Feb 26, 2024 at 1:25 PM Julien Rouhaud  wrote:
> >
> > >
> > > #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL
> > >
> > > #define CHECK_FOR_INTERRUPTS(...) \
> > > CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
> > > 
> > > CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
> > > 
> > > CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
> > > )
> > >
> > > We would have to duplicate the current CFI body, but it should never 
> > > really
> > > change, and we shouldn't end up with more than 2 overloads anyway so I 
> > > don't
> > > see it being much of a problem.
> >
> > Why do we need this complex macro?
>
> So that client code can use either CHECK_FOR_INTERRUPTS() or
> CHECK_FOR_INTERRUPTS(flag) rather that transforming every single
> CHECK_FOR_INTERRUPTS() to CHECK_FOR_INTERRUPTS(0), which was Robert's
> complaint about this approach.

It might be better to just create two marcos  (with names like
CHECK_FOR_INTERRUPTS() and CHECK_FOR_INTERRUPTS_SAFE()) to call
ProcessInterrupt() directly and modify ProcessInterrupts() to accept
the flag (and if required call CFIFlagHandler() additionally). Last
macro is hard to understand.

-- 
Best Wishes,
Ashutosh Bapat

Re: Sequence Access Methods, round two