Re: Minimal logical decoding on standbys

2023-04-02 Thread Amit Kapila
On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis  wrote:
>
> On Fri, 2023-03-31 at 02:50 -0700, Jeff Davis wrote:
> > But if the ConditionVariableEventSleep() API is added, then I think
> > we
> > should change the non-recovery case to use a CV as well for
> > consistency, and it would avoid the need for WalSndWakeup().
>
> It seems like what we ultimately want is for WalSndWakeup() to
> selectively wake up physical and/or logical walsenders depending on the
> caller. For instance:
>
>WalSndWakeup(bool physical, bool logical)
>
> The callers:
>
>   * On promotion, StartupXLog would call:
> - WalSndWakeup(true, true)
>   * XLogFlush/XLogBackgroundFlush/XLogWalRcvFlush would call:
> - WalSndWakeup(true, !RecoveryInProgress())
>   * ApplyWalRecord would call:
> - WalSndWakeup(switchedTLI, switchedTLI || RecoveryInProgress())
>
> There seem to be two approaches to making that work:
>
> 1. Use two ConditionVariables, and WalSndWakeup would broadcast to one
> or both depending on its arguments.
>
> 2. Have a "replicaiton_kind" variable in WalSnd (either set based on
> MyDatabaseId==InvalidOid, or set at START_REPLICATION time) to indicate
> whether it's a physical or logical walsender. WalSndWakeup would wake
> up the right walsenders based on its arguments.
>
> #2 seems simpler at least for now. Would that work?
>

Agreed, even Bertrand and I discussed the same approach a few
emails above. BTW, if we have this selective logic to wake
physical/logical walsenders, and on standbys we only wake logical
walsenders at the time of ApplyWalRecord(), then do we need the new
condition variable enhancement being discussed, and if so, why?
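
To make option #2 concrete, here is a minimal sketch (assuming a new
per-slot "kind" field set at START_REPLICATION time; names are
illustrative, not a committed API):

void
WalSndWakeup(bool physical, bool logical)
{
	for (int i = 0; i < max_wal_senders; i++)
	{
		Latch	   *latch = NULL;
		WalSnd	   *walsnd = &WalSndCtl->walsnds[i];
		bool		is_logical;

		SpinLockAcquire(&walsnd->mutex);
		is_logical = (walsnd->kind == REPLICATION_KIND_LOGICAL);
		if (walsnd->latch != NULL &&
			((physical && !is_logical) || (logical && is_logical)))
			latch = walsnd->latch;
		SpinLockRelease(&walsnd->mutex);

		/* SetLatch() is cheap if the latch is already set. */
		if (latch != NULL)
			SetLatch(latch);
	}
}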

-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2023-04-02 Thread Amit Kapila
On Mon, Apr 3, 2023 at 1:31 AM Jeff Davis  wrote:
>
> On Sun, 2023-04-02 at 10:11 +0200, Drouvot, Bertrand wrote:
> > I was thinking that, if a new LogicalWalSndWakeup() replaces
> > "ConditionVariableBroadcast(>replayedCV);"
> > in ApplyWalRecord() then, it could be possible that some walsender(s)
> > are requested to wake up while they are actually doing decoding (but
> > I might be wrong).
>
> I don't think that's a problem, right?
>

Agreed, I also don't see a problem because of the reason you mentioned
below that if the latch is already set, we won't do anything in
SetLatch.

> We are concerned about wakeups when they happen repeatedly when there's
> no work to do, or when the wakeup doesn't happen when it should (and we
> need to wait for a timeout).
>
> > >
> > > Currently, we wake up walsenders only after writing some WAL
> > > records
> > > at the time of flush, so won't it be better to wake up only after
> > > applying some WAL records rather than after applying each record?
> >
> > Yeah that would be better.
>
> Why? If the walsender is asleep, and there's work to be done, why not
> wake it up?
>

I think we can wake it up when there is work to be done, even if the
work unit is small. The reason I mentioned waking up the walsender
only after processing some records is to avoid the situation where it
has to wait again after decoding very few records. But probably the
logic in WalSndWaitForWal() will help us to exit before starting to
wait by checking the replay location.
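
For reference, here is a simplified sketch of that early-exit logic,
paraphrasing walsender.c (not verbatim):

static XLogRecPtr
WalSndWaitForWalSketch(XLogRecPtr loc)
{
	for (;;)
	{
		XLogRecPtr	RecentFlushPtr;
		TimeLineID	tli;
		long		sleeptime;

		/* On a standby this reflects the receive/replay position. */
		RecentFlushPtr = RecoveryInProgress() ?
			GetStandbyFlushRecPtr(&tli) : GetFlushRecPtr(&tli);

		/* Enough WAL is already available: return without waiting. */
		if (loc <= RecentFlushPtr)
			return RecentFlushPtr;

		sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());
		WalSndWait(WL_SOCKET_READABLE, sleeptime,
				   WAIT_EVENT_WAL_SENDER_WAIT_WAL);
	}
}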

-- 
With Regards,
Amit Kapila.




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-02 Thread David Rowley
On Sat, 1 Apr 2023 at 13:24, Melanie Plageman  wrote:
> Your diff LGTM.
>
> Earlier upthread in [1], Bharath had mentioned in a review comment about
> removing the global variables that he would have expected the analogous
> global in analyze.c to also be removed (vac_strategy [and analyze.c also
> has anl_context]).
>
> I looked into doing this, and this is what I found out (see full
> rationale in [2]):
>
> > it is a bit harder to remove it from analyze because acquire_func
> > doesn't take the buffer access strategy as a parameter and
> > acquire_sample_rows uses the vac_context global variable to pass to
> > table_scan_analyze_next_block().
>
> I don't know if this is worth mentioning in the commit removing the
> other globals? Maybe it will just make it more confusing...

I did look at that, but it seems a little tricky to make work unless
the AcquireSampleRowsFunc signature were changed. To me, it just does
not seem worth doing that to get rid of the two globals in analyze.c.

I pushed the patch with just the vacuum.c changes.

David




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-04-02 Thread Pavel Luzanov

Hello,

I found that the 'standalone backend' backend type is not documented
right now.

Adding something like the following (from the commit message) would be
helpful:

Both the bootstrap backend and single-user-mode backends will have
backend_type STANDALONE_BACKEND.


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: Support logical replication of DDLs

2023-04-02 Thread Amit Kapila
On Sun, Apr 2, 2023 at 3:25 PM Phil Florent  wrote:
>
> As an end-user, I am highly interested in the patch
> https://commitfest.postgresql.org/42/3595/ but I don't fully get its main
> goal in its first version.
> Is it only "for all tables" that will be implemented?
> If one needs a complete replication of a cluster, a hot standby will always
> be more efficient than a publication, right? I use both for different needs
> in the public hospitals I work for (hot standby for disaster recovery &
> logical replication for DSS).
> The main interest of a publication is to be able to filter things on the
> publisher and to add stuff on the replicated cluster.
> If you compare PostgreSQL with less advanced RDBMSs that don't really
> implement schemas (typically Oracle), the huge advantage of PostgreSQL is
> that many things (e.g. security) can be dynamically implemented via
> schemas. Developers just have to put a table in the "good" schema and
> that's all. Logical DML replication fully implements this logic since
> PostgreSQL 15. The only remaining problem is that a "for tables in schema"
> publication has to be owned by a superuser (because a normal user can have
> tables that don't belong to them in a schema they own?). If DDL replication
> only works with "FOR ALL TABLES" and not "FOR TABLES IN SCHEMA", it reduces
> its interest anyway.
>

I don't see any major issue with supporting it for both "FOR ALL
TABLES" and "FOR ALL TABLES IN SCHEMA". However, if we want to support
it with the "FOR TABLE .." variant then we will probably need to apply
some restrictions as we can only support 'alter' and 'drop'.
Additionally, there could be some additional problems to deal with
like say if the column list has been specified then we ideally
shouldn't send those columns even in DDL. For example, if one uses
Alter Table .. Rename Column and the new column name is not present in
the published column list then we shouldn't send it.

BTW, we have some proposals related to the specification of this
feature in emails [1][2][3]. Please see if you have any suggestions on
those.

Note: It seems you have copied this thread to pgsql-general. Is that
because you are not subscribed to pgsql-hackers? As this is a
development project, it is better to keep the discussion on pgsql-hackers.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2B%2BY7a2SQq55DXT6neghZgj3j%2BpQ74%3D8zfT3k8Tkdj0ZA%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1KZqvJsTt7OkS8AkxOKVvSpkQkPwsqzNmo10mFaVAKeZg%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/OS0PR01MB571646874A3E165D93999A9D94889%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2023-04-02 Thread Thomas Munro
On Sat, Jan 14, 2023 at 3:23 PM Thomas Munro  wrote:
> On Thu, Jan 5, 2023 at 2:14 PM Tom Lane  wrote:
> > The rcancelrequested API is something that I devised out of whole cloth
> > awhile ago.  It's not in Tcl's copy of the code, which AFAIK is the
> > only other project using this regex engine.  I do still have vague
> > hopes of someday seeing the engine as a standalone project, which is
> > why I'd prefer to keep a bright line between the engine and Postgres.
> > But there's no very strong reason to think that any hypothetical future
> > external users who need a cancel API would want this API as opposed to
> > one that requires exit() or longjmp() to get out of the engine.  So if
> > we're changing the way we use it, I think it's perfectly reasonable to
> > redesign that API to make it simpler and less of an impedance mismatch.
>
> Thanks for that background.  Alright then, here's a new iteration
> exploring this direction.  It gets rid of CANCEL_REQUESTED() ->
> REG_CANCEL and the associated error and callback function, and instead
> has just "INTERRUPT(re);" at those cancellation points, which is a
> macro that defaults to nothing (for Tcl's benefit).  Our regcustom.h
> defines it as CHECK_FOR_INTERRUPTS().  I dunno if it's worth passing
> the "re" argument...  I was imagining that someone who wants to free
> memory explicitly and then longjmp would probably need it?  (It might
> even be possible to expand to something that sets an error and
> returns, not investigated.)  Better name or design very welcome.

I think this experiment worked out pretty well.  I think it's a nice
side-effect that you can see what memory the regexp subsystem is
using, and that's likely to lead to more improvements.  (Why is it
limited to caching 32 entries?  Why is it a linear search, not a hash
table?  Why is LRU implemented with memmove() and not a list?  Could
we have a GUC regex_cache_memory, so someone who uses a lot of regexes
can opt into a large cache?)  On the other hand it also uses a bit
more RAM, like other code using the reparenting trick, which is a
topic for future research.

I vote for proceeding with this approach.  I wish we didn't have to
tackle either a regexp interface/management change (done here) or a
CFI() redesign (not done, but probably also a good idea for other
reasons) before getting this signal stuff straightened out, but here
we are.  This approach seems good to me.  Anyone have a different
take?
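
For anyone following along, a minimal sketch of the macro wiring
described above (simplified from the posted patch; names may differ in
the final version):

/* regguts.h (engine side): default to a no-op, so Tcl needs no change. */
#ifndef INTERRUPT
#define INTERRUPT(re)	((void) 0)
#endif

/* regcustom.h (PostgreSQL side), seen before regguts.h: */
#define INTERRUPT(re)	CHECK_FOR_INTERRUPTS()

/* At each cancellation point inside the engine: */
INTERRUPT(re);			/* in PG this may ereport() out of the engine */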




Fix a comment in basic_archive about NO_INSTALLCHECK

2023-04-02 Thread Bharath Rupireddy
Hi,

It looks like the comments in the makefile and meson file about not running
the basic_archive tests in NO_INSTALLCHECK mode are wrong. The comments say
the module needs to be loaded via shared_preload_libraries=basic_archive,
but it actually doesn't. The custom config file only needs archive-related
parameters and wal_level=replica. Here's a patch correcting that comment.

Thoughts?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From a3d9c1e6e1f080403b48232c07f136c173865106 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sat, 1 Apr 2023 08:31:52 +
Subject: [PATCH v1] Fix a comment in basic_archive about NO_INSTALLCHECK

---
 contrib/basic_archive/Makefile| 5 +++--
 contrib/basic_archive/meson.build | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
index 55d299d650..9216a3060b 100644
--- a/contrib/basic_archive/Makefile
+++ b/contrib/basic_archive/Makefile
@@ -5,8 +5,9 @@ PGFILEDESC = "basic_archive - basic archive module"
 
 REGRESS = basic_archive
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/basic_archive/basic_archive.conf
-# Disabled because these tests require "shared_preload_libraries=basic_archive",
-# which typical installcheck users do not have (e.g. buildfarm clients).
+# Disabled because these tests require wal_level = replica and archive related
+# parameters to be enabled, which typical installcheck users do not have
+# (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
 ifdef USE_PGXS
diff --git a/contrib/basic_archive/meson.build b/contrib/basic_archive/meson.build
index bc1380e6f6..0753b07bf5 100644
--- a/contrib/basic_archive/meson.build
+++ b/contrib/basic_archive/meson.build
@@ -27,8 +27,9 @@ tests += {
 'regress_args': [
   '--temp-config', files('basic_archive.conf'),
 ],
-# Disabled because these tests require "shared_preload_libraries=basic_archive",
-# which typical runningcheck users do not have (e.g. buildfarm clients).
+# Disabled because these tests require wal_level = replica and archive related
+# parameters to be enabled, which typical installcheck users do not have
+# (e.g. buildfarm clients).
 'runningcheck': false,
   },
 }
-- 
2.34.1



Re: running logical replication as the subscription owner

2023-04-02 Thread Noah Misch
On Mon, Mar 27, 2023 at 04:47:22PM -0400, Robert Haas wrote:
> So that's a grey area, at least IMHO. The patch could be revised in
> some way, and the permissions requirements downgraded. However, if we
> do that, I think we're going to have to document that although in
> theory table owners can protect themselves against subscription owners,
> in practice they probably won't be. That approach has some advantages,
> and I don't think it's insane. However, I am not convinced that it is
> the best idea, either, and I had the impression based on
> pgsql-security discussion that Andres and Noah thought this way was
> better. I might have misinterpreted their position, and they might
> have changed their minds, and they might have been wrong. But that's
> how we got here.

["this way" = requirement for SET ROLE]

On Wed, Mar 29, 2023 at 04:00:45PM -0400, Robert Haas wrote:
> > The dangerous cases seem to be something along the lines of a security-
> > invoker trigger function that builds and executes arbirary SQL based on
> > the row contents. And then the table owner would then still need to set
> > ENABLE ALWAYS TRIGGER.
> >
> > Do we really want to take that case on as our security responsibility?
> 
> That's something about which I would like to get more opinions.

The most-plausible-to-me attack involves an ENABLE ALWAYS trigger that logs
CURRENT_USER to an audit table.  The "SQL based on the row contents" scenario
feels remote.  Another remotely-possible attack involves a trigger that
internally queries some other table having RLS.  (Switching to the table owner
can change the rows visible to that query.)

If having INSERT/UPDATE privileges on the table were enough to make a
subscription that impersonates the table owner, then relatively-unprivileged
roles could make a subscription to bypass the aforementioned auditing.  Commit
c3afe8c has imposed weighty requirements beyond I/U privileges, namely holding
the pg_create_subscription role and database-level CREATE privilege.  Since
database-level CREATE is already powerful, it would be plausible to drop the
SET ROLE requirement and add this audit bypass to its powers.  The SET ROLE
requirement is nice for keeping the powers disentangled.  One drawback is
making people do GRANTs regardless of whether a relevant audit trigger exists.
Another drawback is the subscription role having more privileges than ideally
needed.  I do like keeping strong privileges orthogonal, so I lean toward
keeping the SET ROLE requirement.

On Thu, Mar 30, 2023 at 02:17:31PM -0400, Robert Haas wrote:
> --- a/doc/src/sgml/logical-replication.sgml
> +++ b/doc/src/sgml/logical-replication.sgml
> @@ -1774,6 +1774,23 @@ CONTEXT:  processing remote data for replication 
> origin "pg_16395" during "INSER
> SET ROLE to each role that owns a replicated table.
>
>  
> +  
> +   If the subscription has been configued with

Typo.

> Subject: [PATCH v3 1/2] Perform logical replication actions as the table
>  owner.

> Since this involves switching the active user frequently within
> a session that is authenticated as the subscription user, also
> impose SECURITY_RESTRICTED_OPEATION restrictions on logical

s/OPEATION/OPERATION/

> replication code. As an exception, if the table owner can SET
> ROLE to the subscription owner, these restrictions have no
> security value, so don't impose them in that case.
> 
> Subscription owners are now required to have the ability to
> SET ROLE to every role that owns a table that the subscription
> is replicating. If they don't, replication will fail. Superusers,
> who normally own subscriptions, satisfy this property by default.
> Non-superusers users who own subscriptions will needed to be
> granted the roles that own relevant tables.

s/will needed/will need/

(I did not read the patches in their entirety.)




Re: Should vacuum process config file reload more often

2023-04-02 Thread Masahiko Sawada
On Sat, Apr 1, 2023 at 4:09 AM Melanie Plageman
 wrote:
>
> On Fri, Mar 31, 2023 at 10:31 AM Melanie Plageman
>  wrote:
> >
> > On Thu, Mar 30, 2023 at 3:26 PM Daniel Gustafsson  wrote:
> > >
> > > > On 30 Mar 2023, at 04:57, Masahiko Sawada  wrote:
> > >
> > > > As another idea, why don't we use macros for that? For example,
> > > > suppose VacuumCostStatus is like:
> > > >
> > > > typedef enum VacuumCostStatus
> > > > {
> > > >VACUUM_COST_INACTIVE_LOCKED = 0,
> > > >VACUUM_COST_INACTIVE,
> > > >VACUUM_COST_ACTIVE,
> > > > } VacuumCostStatus;
> > > > VacuumCostStatus VacuumCost;
> > > >
> > > > non-vacuum code can use the following macros:
> > > >
> > > > #define VacuumCostActive() (VacuumCost == VACUUM_COST_ACTIVE)
> > > > #define VacuumCostInactive() (VacuumCost <= VACUUM_COST_INACTIVE) //
> > > > or we can use !VacuumCostActive() instead.
> > >
> > > I'm in favor of something along these lines.  A variable with a name
> > > that implies a boolean value (active/inactive) but actually contains a
> > > tri-value is easily misunderstood.  A VacuumCostState tri-value variable
> > > (or a better name) with a set of convenient macros for extracting the
> > > boolean active/inactive that most of the code needs to be concerned with
> > > would make for more readable code I think.
> >
> > The macros are very error-prone. I was just implementing this idea and
> > mistakenly tried to set the macro instead of the variable in multiple
> > places. Avoiding this involves another set of macros, and, in the end, I
> > think the complexity is much worse. Given the reviewers' uniform dislike
> > of VacuumCostInactive, I favor going back to two variables
> > (VacuumCostActive + VacuumFailsafeActive) and moving
> > LVRelState->failsafe_active to the global VacuumFailsafeActive.
> >
> > I will reimplement this in the next version.

Thank you for updating the patches. Here are comments for 0001, 0002,
and 0003 patches:

 0001:

@@ -391,7 +389,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 Assert(params->index_cleanup != VACOPTVALUE_UNSPECIFIED);
 Assert(params->truncate != VACOPTVALUE_UNSPECIFIED &&
params->truncate != VACOPTVALUE_AUTO);
-vacrel->failsafe_active = false;
+VacuumFailsafeActive = false;

If we go with the idea of using VacuumCostActive +
VacuumFailsafeActive, we need to make sure that both are cleared at
the end of the vacuum per table. Since the patch clears it only here,
it remains true even after vacuum() if we trigger the failsafe mode
for the last table in the table list.

In addition to that, to ensure it gets cleared in an error case too, I
think we need to clear it also in the PG_FINALLY() block in vacuum().
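
Something like this sketch, against vacuum()'s existing PG_TRY block
(variable names per the patches under discussion):

	PG_TRY();
	{
		/* ... vacuum/analyze each relation in the list ... */
	}
	PG_FINALLY();
	{
		in_vacuum = false;
		VacuumCostActive = false;
		VacuumFailsafeActive = false;	/* reset even on error */
	}
	PG_END_TRY();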

---
@@ -306,6 +306,7 @@ extern PGDLLIMPORT pg_atomic_uint32
*VacuumSharedCostBalance;
 extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers;
 extern PGDLLIMPORT int VacuumCostBalanceLocal;

+extern bool VacuumFailsafeActive;

Do we need PGDLLIMPORT for VacuumFailSafeActive?

0002:

@@ -2388,6 +2398,7 @@ vac_max_items_to_alloc_size(int max_items)
 return offsetof(VacDeadItems, items) +
sizeof(ItemPointerData) * max_items;
 }

+
 /*
  * vac_tid_reaped() -- is a particular tid deletable?
  *

Unnecessary new line. There are some other unnecessary new lines in this patch.

---
@@ -307,6 +309,8 @@ extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers;
 extern PGDLLIMPORT int VacuumCostBalanceLocal;

 extern bool VacuumFailsafeActive;
+extern int VacuumCostLimit;
+extern double VacuumCostDelay;

and

@@ -266,8 +266,6 @@ extern PGDLLIMPORT int max_parallel_maintenance_workers;
 extern PGDLLIMPORT int VacuumCostPageHit;
 extern PGDLLIMPORT int VacuumCostPageMiss;
 extern PGDLLIMPORT int VacuumCostPageDirty;
-extern PGDLLIMPORT int VacuumCostLimit;
-extern PGDLLIMPORT double VacuumCostDelay;

Do we need PGDLLIMPORT too?

---
@@ -1773,20 +1773,33 @@ FreeWorkerInfo(int code, Datum arg)
}
 }

+
 /*
- * Update the cost-based delay parameters, so that multiple workers consume
- * each a fraction of the total available I/O.
+ * Update vacuum cost-based delay-related parameters for autovacuum workers and
+ * backends executing VACUUM or ANALYZE using the value of relevant gucs and
+ * global state. This must be called during setup for vacuum and after every
+ * config reload to ensure up-to-date values.
  */
 void
-AutoVacuumUpdateDelay(void)
+VacuumUpdateCosts(void)
 {

Isn't it better to define VacuumUpdateCosts() in vacuum.c rather than
autovacuum.c as this is now a common code for both vacuum and
autovacuum?

0003:

@@ -501,9 +502,9 @@ vacuum(List *relations, VacuumParams *params,
 {
 ListCell   *cur;

-VacuumUpdateCosts();
 in_vacuum = true;
-VacuumCostActive = (VacuumCostDelay > 0);
+VacuumFailsafeActive = false;
+VacuumUpdateCosts();

Hmm, if we initialize VacuumFailsafeActive here, 

Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-04-02 Thread Yurii Rashkovskii
I want to chime in on the issue of lower-number releases that are released
after higher-number releases. The way I see this particular problem is that
we always put upgrade SQL files in release "packages," and they obviously
become static resources.

While I [intentionally] overlook some details here, what if (as a
convention, for projects where it matters) we shipped extensions with
non-upgrade SQL files only, and upgrades were available as separate
downloads? This way, we're not tying releases themselves to upgrade paths.
This also requires no changes to Postgres.

I know this may be a big delivery layout departure for well-established
projects; I also understand that this won't solve the problem of having to
have these files in the first place (though in many cases, they can be
automatically generated once, I suppose, if they are trivial).


On Tue, Jan 10, 2023 at 5:52 AM Tom Lane  wrote:

> I continue to think that this is a fundamentally bad idea.  It creates
> all sorts of uncertainties about what is a valid update path and what
> is not.  Restrictions like
>
> + Such wildcard update
> + scripts will only be used when no explicit path is found from
> + old to target version.
>
> are just band-aids to try to cover up the worst problems.
>
> Have you considered the idea of instead inventing a "\include" facility
> for extension scripts?  Then, if you want to use one-monster-script
> to handle different upgrade cases, you still need one script file for
> each supported upgrade step, but those can be one-liners including the
> common script file.  Plus, such a facility could be of use to people
> who want intermediate factorization solutions (that is, some sharing
> of code without buying all the way into one-monster-script).
>
> regards, tom lane
>
>
>

-- 
Y.


Re: Infinite Interval

2023-04-02 Thread Joseph Koshakow
On Sun, Apr 2, 2023 at 6:54 PM Tom Lane  wrote:
>
>Joseph Koshakow  writes:
>>> I've added an errcontext to all the errors of the form "X out of
>>> range".
>
>Please note the style guidelines [1]:
>
>errcontext(const char *msg, ...) is not normally called directly from
>an ereport message site; rather it is used in error_context_stack
>callback functions to provide information about the context in which
>an error occurred, such as the current location in a PL function.
>
>If we should have this at all, which I doubt, it's probably
>errdetail not errcontext.

I've attached a patch with all of the errcontext calls removed. None of
the existing out of range errors have an errdetail call so I think this
is more consistent. If we do want to add errdetail, then we should
probably do it in a later patch and add it to all out of range errors,
not just the ones related to infinity.

>> How do you feel about redefining interval_mi in terms of interval_um
>> and interval_pl? That one felt like an improvement to me even outside
>> of the context of this change.
>
>I did not think so.  For one thing, it introduces integer-overflow
>hazards that you would not have otherwise; ie, interval_um might have
>to throw an error for INT_MIN input, even though the end result of
>the calculation would have been in range.

Good point, I didn't think of that.

Thanks,
Joe Koshakow
From f6bf9c201a94a0b338dff520442ac5e8d2922c89 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 1 Apr 2023 10:22:24 -0400
Subject: [PATCH 1/3] Move integer helper function to int.h

---
 src/backend/utils/adt/datetime.c | 25 -
 src/include/common/int.h | 13 +
 2 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index be2e55bb29..64f28a85b0 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -51,7 +51,6 @@ static int	DecodeDate(char *str, int fmask, int *tmask, bool *is2digits,
 	   struct pg_tm *tm);
 static char *AppendSeconds(char *cp, int sec, fsec_t fsec,
 		   int precision, bool fillzeros);
-static bool int64_multiply_add(int64 val, int64 multiplier, int64 *sum);
 static bool AdjustFractMicroseconds(double frac, int64 scale,
 	struct pg_itm_in *itm_in);
 static bool AdjustFractDays(double frac, int scale,
@@ -515,22 +514,6 @@ AppendTimestampSeconds(char *cp, struct pg_tm *tm, fsec_t fsec)
 	return AppendSeconds(cp, tm->tm_sec, fsec, MAX_TIMESTAMP_PRECISION, true);
 }
 
-
-/*
- * Add val * multiplier to *sum.
- * Returns true if successful, false on overflow.
- */
-static bool
-int64_multiply_add(int64 val, int64 multiplier, int64 *sum)
-{
-	int64		product;
-
-	if (pg_mul_s64_overflow(val, multiplier, &product) ||
-		pg_add_s64_overflow(*sum, product, sum))
-		return false;
-	return true;
-}
-
 /*
  * Multiply frac by scale (to produce microseconds) and add to itm_in->tm_usec.
  * Returns true if successful, false if itm_in overflows.
@@ -621,7 +604,7 @@ AdjustMicroseconds(int64 val, double fval, int64 scale,
    struct pg_itm_in *itm_in)
 {
 	/* Handle the integer part */
-	if (!int64_multiply_add(val, scale, &itm_in->tm_usec))
+	if (pg_mul_add_s64_overflow(val, scale, &itm_in->tm_usec))
 		return false;
 	/* Handle the float part */
 	return AdjustFractMicroseconds(fval, scale, itm_in);
@@ -2701,9 +2684,9 @@ DecodeTimeForInterval(char *str, int fmask, int range,
 		return dterr;
 
 	itm_in->tm_usec = itm.tm_usec;
-	if (!int64_multiply_add(itm.tm_hour, USECS_PER_HOUR, &itm_in->tm_usec) ||
-		!int64_multiply_add(itm.tm_min, USECS_PER_MINUTE, &itm_in->tm_usec) ||
-		!int64_multiply_add(itm.tm_sec, USECS_PER_SEC, &itm_in->tm_usec))
+	if (pg_mul_add_s64_overflow(itm.tm_hour, USECS_PER_HOUR, &itm_in->tm_usec) ||
+		pg_mul_add_s64_overflow(itm.tm_min, USECS_PER_MINUTE, &itm_in->tm_usec) ||
+		pg_mul_add_s64_overflow(itm.tm_sec, USECS_PER_SEC, &itm_in->tm_usec))
 		return DTERR_FIELD_OVERFLOW;
 
 	return 0;
diff --git a/src/include/common/int.h b/src/include/common/int.h
index 450800894e..81726c65f7 100644
--- a/src/include/common/int.h
+++ b/src/include/common/int.h
@@ -254,6 +254,19 @@ pg_mul_s64_overflow(int64 a, int64 b, int64 *result)
 #endif
 }
 
+/*
+ * Add val * multiplier to *sum.
+ * Returns false if successful, true on overflow.
+ */
+static inline bool
+pg_mul_add_s64_overflow(int64 val, int64 multiplier, int64 *sum)
+{
+	int64		product;
+
+	return pg_mul_s64_overflow(val, multiplier, &product) ||
+		pg_add_s64_overflow(*sum, product, sum);
+}
+
 /*
  * Overflow routines for unsigned integers
  *
-- 
2.34.1

From 765aa1ebf9de5e5d48e1c588f7bde70074374979 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 1 Apr 2023 10:22:53 -0400
Subject: [PATCH 2/3] Check for overflow in 

Re: Pass heaprel to GlobalVisTestFor() in vacuumRedirectAndPlaceholder()

2023-04-02 Thread Andres Freund
Hi,

On 2023-04-02 15:52:14 -0700, Peter Geoghegan wrote:
> On Sun, Apr 2, 2023 at 3:30 PM Andres Freund  wrote:
> > > Actually, I suppose that isn't quite true, since you'd still need to
> > > find a way to pass the heap relation down to nbtree VACUUM. Say by
> > > adding it to IndexVacuumInfo.
> >
> > It has been added to that already, so it should really be as trivial as you
> > suggested earlier...
> 
> Oh yeah, I missed it because you put it at the end of the struct,
> rather than at the start, next to the existing Relation.

Well, Bertrand did. But I didn't change it, so you're not wrong...


> This page deletion issue matters a lot more after the Postgres 14
> optimization added by commit e5d8a99903, which came after your
> GlobalVisCheckRemovableFullXid() snapshot scalability work (well, a
> few months after, at least). I really don't like the idea of something
> like that being much less effective due to logical decoding.

Yea, it's definitely good to use the relation there.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-02 Thread Alvaro Herrera
> From 56a9559555918a99c202a0924f7b2ede9de4e75d Mon Sep 17 00:00:00 2001
> From: bdrouvotAWS 
> Date: Tue, 7 Feb 2023 08:59:47 +
> Subject: [PATCH v52 3/6] Allow logical decoding on standby.
> 
> Allow a logical slot to be created on standby. Restrict its usage
> or its creation if wal_level on primary is less than logical.
> During slot creation, its restart_lsn is set to the last replayed
> LSN. Effectively, a logical slot creation on standby waits for an
> xl_running_xact record to arrive from primary.

Hmm, not sure if it really applies here, but this sounds similar to
issues with track_commit_timestamps: namely, if the primary has it
enabled and you start a standby with it enabled, that's fine; but if the
primary is later shut down (but the standby isn't) and then the primary
restarted with a lesser value, then the standby would misbehave without
any obvious errors.  If that is a real problem, then perhaps you can
solve it by copying some of the logic from track_commit_timestamps,
which took a large number of iterations to get right.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)




Re: Minimal logical decoding on standbys

2023-04-02 Thread Jeff Davis
On Sun, 2023-04-02 at 15:29 -0700, Andres Freund wrote:
> I agree that the *wait* has to go through condition_variable.c, but
> it doesn't
> seem right that creation of the WES needs to go through
> condition_variable.c.

The kind of WES required by a CV is an implementation detail, so I was
concerned about making too many assumptions across different APIs.

But what I ended up with is arguably not better, so perhaps I should do
it your way and then just have some comments about what assumptions are
being made?

> The only thing that ConditionVariableEventSleep() seems to require is
> that the
> WES is waiting for MyLatch. You don't even need a separate WES for
> that, the
> already existing WES should suffice.

By "already existing" WES, I assume you mean FeBeWaitSet? Yes, that
mostly matches, but it uses WL_POSTMASTER_DEATH instead of
WL_EXIT_ON_PM_DEATH, so I'd need to handle PM death in
condition_variable.c. That's trivial to do, though.
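
For what it's worth, a sketch of what that handling would look like
after the wait (assuming the WES reports rather than exits on
postmaster death):

	WaitEvent	event;

	WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event_info);

	/* FeBeWaitSet uses WL_POSTMASTER_DEATH, so exit explicitly here. */
	if (event.events & WL_POSTMASTER_DEATH)
		proc_exit(1);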

Regards,
Jeff Davis





Re: Minimal logical decoding on standbys

2023-04-02 Thread Jeff Davis
On Fri, 2023-03-31 at 02:50 -0700, Jeff Davis wrote:
> But if the ConditionVariableEventSleep() API is added, then I think
> we
> should change the non-recovery case to use a CV as well for
> consistency, and it would avoid the need for WalSndWakeup().

It seems like what we ultimately want is for WalSndWakeup() to
selectively wake up physical and/or logical walsenders depending on the
caller. For instance:

   WalSndWakeup(bool physical, bool logical)

The callers:

  * On promotion, StartupXLog would call:
- WalSndWakeup(true, true)
  * XLogFlush/XLogBackgroundFlush/XLogWalRcvFlush would call:
- WalSndWakeup(true, !RecoveryInProgress())
  * ApplyWalRecord would call:
- WalSndWakeup(switchedTLI, switchedTLI || RecoveryInProgress())

There seem to be two approaches to making that work:

1. Use two ConditionVariables, and WalSndWakeup would broadcast to one
or both depending on its arguments.

2. Have a "replicaiton_kind" variable in WalSnd (either set based on
MyDatabaseId==InvalidOid, or set at START_REPLICATION time) to indicate
whether it's a physical or logical walsender. WalSndWakeup would wake
up the right walsenders based on its arguments.

#2 seems simpler at least for now. Would that work?

Regards,
Jeff Davis





Re: Infinite Interval

2023-04-02 Thread Tom Lane
Joseph Koshakow  writes:
>> I've added an errcontext to all the errors of the form "X out of
>> range".

Please note the style guidelines [1]:

errcontext(const char *msg, ...) is not normally called directly from
an ereport message site; rather it is used in error_context_stack
callback functions to provide information about the context in which
an error occurred, such as the current location in a PL function.

If we should have this at all, which I doubt, it's probably
errdetail not errcontext.

> How do you feel about redefining interval_mi in terms of interval_um
> and interval_pl? That one felt like an improvement to me even outside
> of the context of this change.

I did not think so.  For one thing, it introduces integer-overflow
hazards that you would not have otherwise; ie, interval_um might have
to throw an error for INT_MIN input, even though the end result of
the calculation would have been in range.
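
To illustrate the hazard (a sketch; the field used is incidental):

	/*
	 * Defining a - b as a + (-b) can overflow on the negation alone:
	 * -(INT64_MIN) is not representable, even when a - b itself is.
	 */
	int64		b = PG_INT64_MIN;
	int64		neg;

	if (pg_sub_s64_overflow(0, b, &neg))
		ereport(ERROR,
				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
				 errmsg("interval out of range")));	/* fires for INT64_MIN */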

regards, tom lane

[1] https://www.postgresql.org/docs/devel/error-message-reporting.html




Re: Pass heaprel to GlobalVisTestFor() in vacuumRedirectAndPlaceholder()

2023-04-02 Thread Peter Geoghegan
On Sun, Apr 2, 2023 at 3:30 PM Andres Freund  wrote:
> > Actually, I suppose that isn't quite true, since you'd still need to
> > find a way to pass the heap relation down to nbtree VACUUM. Say by
> > adding it to IndexVacuumInfo.
>
> It has been added to that already, so it should really be as trivial as you
> suggested earlier...

Oh yeah, I missed it because you put it at the end of the struct,
rather than at the start, next to the existing Relation.

This page deletion issue matters a lot more after the Postgres 14
optimization added by commit e5d8a99903, which came after your
GlobalVisCheckRemovableFullXid() snapshot scalability work (well, a
few months after, at least). I really don't like the idea of something
like that being much less effective due to logical decoding. Granted,
the optimization in commit e5d8a99903 was itself kind of a hack, which
should be replaced by a scheme that explicitly makes recycle safety
the responsibility of the FSM itself, not the responsibility of
VACUUM.

-- 
Peter Geoghegan




Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

2023-04-02 Thread Andres Freund
Hi,

On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
> > Here's a draft patch.
> 
> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
> polish.

I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
with this.

I'm still debating with myself whether this commit (or a prerequisite commit)
should move logic dealing with the buffer ordering into
GetVisibilityMapPins(), so we don't need two blocks like this:


	if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
		GetVisibilityMapPins(relation, buffer, otherBuffer,
							 targetBlock, otherBlock, vmbuffer,
							 vmbuffer_other);
	else
		GetVisibilityMapPins(relation, otherBuffer, buffer,
							 otherBlock, targetBlock, vmbuffer_other,
							 vmbuffer);
...

	if (otherBuffer != InvalidBuffer)
	{
		if (GetVisibilityMapPins(relation, otherBuffer, buffer,
								 otherBlock, targetBlock, vmbuffer_other,
								 vmbuffer))
			unlockedTargetBuffer = true;
	}
	else
	{
		if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
								 targetBlock, InvalidBlockNumber,
								 vmbuffer, InvalidBuffer))
			unlockedTargetBuffer = true;
	}
	}
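
One shape that refactoring could take is a thin wrapper that
establishes the lock ordering once (a sketch, not a committed
interface):

static bool
GetVisibilityMapPinsOrdered(Relation relation,
							Buffer buf1, Buffer buf2,
							BlockNumber block1, BlockNumber block2,
							Buffer *vmb1, Buffer *vmb2)
{
	/* Always pin in ascending block order to avoid deadlocks. */
	if (buf2 == InvalidBuffer || block1 <= block2)
		return GetVisibilityMapPins(relation, buf1, buf2,
									block1, block2, vmb1, vmb2);
	else
		return GetVisibilityMapPins(relation, buf2, buf1,
									block2, block1, vmb2, vmb1);
}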


Greetings,

Andres Freund




Re: Infinite Interval

2023-04-02 Thread Joseph Koshakow
>On Sun, Apr 2, 2023 at 5:36 PM Tom Lane  wrote:
>
>Joseph Koshakow  writes:
>> I've attached a patch with these changes that is meant to be applied
>> over the previous three patches. Let me know what you think.
>
>Does not really seem like an improvement to me --- I think it's
>adding more complexity than it removes.  The changes in CONTEXT
>messages are definitely not an improvement; you might as well
>not have the context messages at all as give misleading ones.
>(Those context messages are added by the previous patches, no?
>They do not really seem per project style, and I'm not sure
>that they are helpful.)

Yes they were added in the previous patch,
v17-0003-Add-infinite-interval-values.patch. I also had the following
note about them.

>I've added an errcontext to all the errors of the form "X out of
>range". My one concern is that some of the messages can be slightly
>confusing. For example date arithmetic is converted to timestamp
>arithmetic, so the errcontext talks about timestamps even though the
>actual operation used dates. For example,
>
>SELECT date 'infinity' + interval '-infinity';
>ERROR:  interval out of range
>CONTEXT:  while adding an interval and timestamp

I would be OK with removing all of the context messages or maybe only
keeping a select few, like the ones in interval_um.

How do you feel about redefining interval_mi in terms of interval_um
and interval_pl? That one felt like an improvement to me even outside
of the context of this change.

Thanks,
Joe Koshakow


Re: Pass heaprel to GlobalVisTestFor() in vacuumRedirectAndPlaceholder()

2023-04-02 Thread Andres Freund
Hi,

On 2023-04-02 10:22:18 -0700, Peter Geoghegan wrote:
> On Sun, Apr 2, 2023 at 10:18 AM Peter Geoghegan  wrote:
> > Making nbtree page deletion more efficient when logical decoding is in
> > use seems well worthwhile. There is an "XXX" comment about this issue,
> > similar to the SP-GiST one. It looks like you already have everything
> > you need to make this work from yesterday's commit 61b313e47e.

+1


> Actually, I suppose that isn't quite true, since you'd still need to
> find a way to pass the heap relation down to nbtree VACUUM. Say by
> adding it to IndexVacuumInfo.

It has been added to that already, so it should really be as trivial as you
suggested earlier...

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-02 Thread Andres Freund
Hi,

On 2023-04-02 15:15:44 -0700, Jeff Davis wrote:
> On Sun, 2023-04-02 at 14:35 -0700, Andres Freund wrote:
> > Why not offer a function to add a CV to a WES? It seems somehow odd
> > to require
> > going through condition_variable.c to create a WES.
> 
> I agree that it's a bit odd, but remember that after waiting on a CV's
> latch, it needs to re-insert itself into the CV's wait list.
> 
> A WaitEventSetWait() can't do that, unless we move the details of re-
> adding to the wait list into latch.c. I considered that, but latch.c
> already implements the APIs for WaitEventSet and Latch, so it felt
> complex to also make it responsible for ConditionVariable.

I agree that the *wait* has to go through condition_variable.c, but it doesn't
seem right that creation of the WES needs to go through condition_variable.c.

The only thing that ConditionVariableEventSleep() seems to require is that the
WES is waiting for MyLatch. You don't even need a separate WES for that, the
already existing WES should suffice.
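
In other words, something like this hypothetical helper would suffice
(AddWaitEventToSet() is the existing latch.c API; the function itself
is not an existing one):

void
ConditionVariablePrepareWaitSet(WaitEventSet *wes)
{
	/*
	 * CVs are signalled via SetLatch(MyLatch), so all the sleep path
	 * requires is that the set waits on MyLatch.
	 */
	AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
}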

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-02 Thread Jeff Davis
On Sun, 2023-04-02 at 14:35 -0700, Andres Freund wrote:
> Why not offer a function to add a CV to a WES? It seems somehow odd
> to require
> going through condition_variable.c to create a WES.

I agree that it's a bit odd, but remember that after waiting on a CV's
latch, it needs to re-insert itself into the CV's wait list.

A WaitEventSetWait() can't do that, unless we move the details of re-
adding to the wait list into latch.c. I considered that, but latch.c
already implements the APIs for WaitEventSet and Latch, so it felt
complex to also make it responsible for ConditionVariable.

I'm open to suggestion.

Regards,
Jeff Davis





Re: Making background psql nicer to use in tap tests

2023-04-02 Thread Andres Freund
Hi,

On 2023-04-02 22:24:16 +0200, Daniel Gustafsson wrote:
> And a v5 to fix a test failure in recovery tests.

Thanks for working on this!


There's this XXX that I added:

> @@ -57,11 +51,10 @@ sub test_streaming
>   COMMIT;
>   });
>  
> - $in .= q{
> - COMMIT;
> - \q
> - };
> - $h->finish;# errors make the next test fail, so ignore them here
> + $h->query_safe('COMMIT');
> + $h->quit;
> + # XXX: Not sure what this means
> +# errors make the next test fail, so ignore them here
>  
>   $node_publisher->wait_for_catchup($appname);

I still don't know what that comment is supposed to mean, unfortunately.

Greetings,

Andres Freund




Re: Infinite Interval

2023-04-02 Thread Tom Lane
Joseph Koshakow  writes:
> I've attached a patch with these changes that is meant to be applied
> over the previous three patches. Let me know what you think.

Does not really seem like an improvement to me --- I think it's
adding more complexity than it removes.  The changes in CONTEXT
messages are definitely not an improvement; you might as well
not have the context messages at all as give misleading ones.
(Those context messages are added by the previous patches, no?
They do not really seem per project style, and I'm not sure
that they are helpful.)

regards, tom lane




Re: Minimal logical decoding on standbys

2023-04-02 Thread Andres Freund
Hi,

On 2023-03-31 02:44:33 -0700, Jeff Davis wrote:
> From 2f05cab9012950ae9290fccbb6366d50fc01553e Mon Sep 17 00:00:00 2001
> From: Jeff Davis 
> Date: Wed, 1 Mar 2023 20:02:42 -0800
> Subject: [PATCH v2] Introduce ConditionVariableEventSleep().
> 
> The new API takes a WaitEventSet which can include socket events. The
> WaitEventSet must have been created by
> ConditionVariableWaitSetCreate(), another new function, so that it
> includes the wait events necessary for a condition variable.

Why not offer a function to add a CV to a WES? It seems somehow odd to require
going through condition_variable.c to create a WES.

Greetings,

Andres Freund




Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart
On Sun, Apr 02, 2023 at 04:37:38PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> It's been a little while since I dug into this, but I do see your point
>> that the wraparound risk could be higher in some cases.  For example, if
>> you have a billion temp files to clean up, the custodian could be stuck on
>> that task for a long time.  I will give this some further thought.  I'm all
>> ears if anyone has ideas about how to reduce this risk.
> 
> I wonder if a single long-lived custodian task is the right model at all.
> At least for RemovePgTempFiles, it'd make more sense to write it as a
> background worker that spawns, does its work, and then exits,
> independently of anything else.  Of course, then you need some mechanism
> for ensuring that a bgworker slot is available when needed, but that
> doesn't seem horridly difficult --- we could have a few "reserved
> bgworker" slots, perhaps.  An idle bgworker slot doesn't cost much.

This has crossed my mind.  Even if we use the custodian for several
different tasks, perhaps it could shut down while not in use.  For many
servers, the custodian process will be used sparingly, if at all.  And if
we introduce something like custodian_max_workers, perhaps we could dodge
the wraparound issue a bit by setting the default to the number of
supported tasks.  That being said, this approach adds some complexity.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Infinite Interval

2023-04-02 Thread Joseph Koshakow
> > This code is duplicated in timestamp_pl_interval(). We could create a
> > function to encode the infinity handling rules and then call it in these
> > two places. The argument types are different, Timestamp and TimestampTz
> > viz., which map to int64, so that shouldn't be a problem. But it will be
> > slightly unreadable. Or use macros, but then it will be difficult to
> > debug.
> >
> > What do you think?
>
> I was hoping that I could come up with a macro that we could re-use for
> all the similar logic. If that doesn't work then I'll try the helper
> functions. I'll update the patch in a follow-up email to give myself some
> time to think about this.

So I checked where are all the places that we do arithmetic between two
potentially infinite values, and it's at the top of the following
functions:

- timestamp_mi()
- timestamp_pl_interval()
- timestamptz_pl_interval_internal()
- interval_pl()
- interval_mi()
- timestamp_age()
- timestamptz_age()

I was able to get an extremely generic macro to work, but it was very
ugly and unmaintainable in my view. Instead I took the following steps
to clean this up:

- I rewrote interval_mi() to be implemented in terms of interval_um()
and interval_pl().
- I abstracted the infinite arithmetic from timestamp_mi(),
timestamp_age(), and timestamptz_age() into a helper function called
infinite_timestamp_mi_internal()
- I abstracted the infinite arithmetic from timestamp_pl_interval() and
timestamptz_pl_interval_internal() into a helper function called
infinite_timestamp_pl_interval_internal()

The helper functions return a bool to indicate if they set the result.
An alternative approach would be to check for finiteness in either of
the inputs, then call the helper function which would have a void
return type. I think this alternative approach would be slightly more
readable, but involve duplicate finiteness checks before and during the
helper function.
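
That alternative would look roughly like this (a sketch; same helper
name as in the attached patch, but with a void return type):

	if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
		infinite_timestamp_mi_internal(dt1, dt2, result);
	else
	{
		if (unlikely(pg_sub_s64_overflow(dt1, dt2, &result->time)))
			ereport(ERROR,
					(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
					 errmsg("interval out of range")));
		/* ... remaining finite-path work as before ... */
	}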

I've attached a patch with these changes that is meant to be applied
over the previous three patches. Let me know what you think.

With this patch I believe that I've addressed all open comments except
for the discussion around whether we should check just the months field
or all three fields for finiteness. Please let me know if I've missed
something.

Thanks,
Joe Koshakow
From e50d4ca6321c58d216d563f74502356d721c2b4b Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sun, 2 Apr 2023 17:15:01 -0400
Subject: [PATCH 4/4] Clean up infinity arithmetic

---
 src/backend/utils/adt/timestamp.c | 254 +++---
 src/test/regress/expected/interval.out|  16 +-
 src/test/regress/expected/timestamp.out   |   4 +-
 src/test/regress/expected/timestamptz.out |   4 +-
 4 files changed, 86 insertions(+), 192 deletions(-)

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 78133dfb17..50a47f3778 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -2788,16 +2788,15 @@ timestamp_larger(PG_FUNCTION_ARGS)
 	PG_RETURN_TIMESTAMP(result);
 }
 
-
-Datum
-timestamp_mi(PG_FUNCTION_ARGS)
+/* Helper function to perform subtraction between two potentially infinite
+ * timestamps.
+ *
+ * Returns true if either dt1 or dt2 was infinite and result was set,
+ * false otherwise.
+ */
+bool
+infinite_timestamp_mi_internal(Timestamp dt1, Timestamp dt2, Interval *result)
 {
-	Timestamp	dt1 = PG_GETARG_TIMESTAMP(0);
-	Timestamp	dt2 = PG_GETARG_TIMESTAMP(1);
-	Interval   *result;
-
-	result = (Interval *) palloc(sizeof(Interval));
-
 	/*
 	 * Subtracting two infinite timestamps with different signs results in an
 	 * infinite interval with the same sign as the left operand. Subtracting
@@ -2812,6 +2811,7 @@ timestamp_mi(PG_FUNCTION_ARGS)
 	 errcontext("while subtracting timestamps")));
 		else
 			INTERVAL_NOBEGIN(result);
+		return true;
 	}
 	else if (TIMESTAMP_IS_NOEND(dt1))
 	{
@@ -2822,11 +2822,34 @@ timestamp_mi(PG_FUNCTION_ARGS)
 	 errcontext("while subtracting timestamps")));
 		else
 			INTERVAL_NOEND(result);
+		return true;
 	}
 	else if (TIMESTAMP_IS_NOBEGIN(dt2))
+	{
 		INTERVAL_NOEND(result);
+		return true;
+	}
 	else if (TIMESTAMP_IS_NOEND(dt2))
+	{
 		INTERVAL_NOBEGIN(result);
+		return true;
+	}
+	else
+		return false;
+}
+
+Datum
+timestamp_mi(PG_FUNCTION_ARGS)
+{
+	Timestamp	dt1 = PG_GETARG_TIMESTAMP(0);
+	Timestamp	dt2 = PG_GETARG_TIMESTAMP(1);
+	Interval   *result;
+
+	result = (Interval *) palloc(sizeof(Interval));
+
+	if (infinite_timestamp_mi_internal(dt1, dt2, result))
+	{
+	}
 	else
 	{
 		if (unlikely(pg_sub_s64_overflow(dt1, dt2, &result->time)))
@@ -3060,23 +3083,15 @@ interval_justify_days(PG_FUNCTION_ARGS)
 	PG_RETURN_INTERVAL_P(result);
 }
 
-/* timestamp_pl_interval()
- * Add an interval to a timestamp data type.
- * Note that interval has provisions for qualitative year/month and day
- *	units, so try to do the right thing with them.
- * To add a month, increment the month, and use the same day of month.
- * Then, if the next month 

Re: Minimal logical decoding on standbys

2023-04-02 Thread Jeff Davis
On Fri, 2023-03-31 at 02:44 -0700, Jeff Davis wrote:
> Thank you, done. I think the nearby line was also wrong, returning
> true
> when there was no timeout. I combined the lines and got rid of the
> early return so it can check the list and timeout condition like
> normal. Attached.

On second (third?) thought, I think I was right the first time. It
passes the flag WL_EXIT_ON_PM_DEATH (included in the
ConditionVariableWaitSet), so a WL_POSTMASTER_DEATH event should not be
returned.

Also, I think the early return is correct. The current code in
ConditionVariableTimedSleep() still checks the wait list even if
WaitLatch() returns WL_TIMEOUT (it ignores the return), but I don't see
why it can't early return true. For a socket event in
ConditionVariableEventSleep() I think it should early return false.

Regards,
Jeff Davis





Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart
On Sun, Apr 02, 2023 at 04:23:05PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote:
>>> * Why does LookupCustodianFunctions think it needs to search the
>>> constant array?
> 
>> The order of the tasks in the array isn't guaranteed to match the order in
>> the CustodianTask enum.
> 
> Why not?  It's a constant array, we can surely manage to make its
> order match the enum.

Alright.  I'll change this.

>>> * The original proposal included moving RemovePgTempFiles into this
>>> mechanism, which I thought was probably the most useful bit of the
>>> whole thing.  I'm sad to see that gone, what became of it?
> 
>> I postponed that based on advice from upthread [1].  I was hoping to start
>> a dedicated thread for that immediately after the custodian infrastructure
>> was committed.  FWIW I agree that it's the most useful task of what's
>> proposed thus far.
> 
> Hmm, given Andres' objections there's little point in moving forward
> without that task.

Yeah.  I should probably tackle that one first and leave the logical tasks
for later, given there is some prerequisite work required.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Tom Lane
Nathan Bossart  writes:
> It's been a little while since I dug into this, but I do see your point
> that the wraparound risk could be higher in some cases.  For example, if
> you have a billion temp files to clean up, the custodian could be stuck on
> that task for a long time.  I will give this some further thought.  I'm all
> ears if anyone has ideas about how to reduce this risk.

I wonder if a single long-lived custodian task is the right model at all.
At least for RemovePgTempFiles, it'd make more sense to write it as a
background worker that spawns, does its work, and then exits,
independently of anything else.  Of course, then you need some mechanism
for ensuring that a bgworker slot is available when needed, but that
doesn't seem horridly difficult --- we could have a few "reserved
bgworker" slots, perhaps.  An idle bgworker slot doesn't cost much.

regards, tom lane




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-04-02 Thread Daniel Gustafsson
> On 31 Mar 2023, at 19:59, Jacob Champion  wrote:

>> +  # Let tests differentiate between vanilla OpenSSL and LibreSSL.
>> +  AC_CHECK_DECLS([LIBRESSL_VERSION_NUMBER], [], [], [#include 
>> ])
>> We have a check for SSL_CTX_set_cert_cb which is specifically done since it's
>> not present in Libressl.  Rather than spending more cycles in autoconf/meson,
>> couldn't we use HAVE_SSL_CTX_SET_CERT_CB for this test?  (Longer term, maybe 
>> we
>> should make the checks properly distinguish between OpenSSL and LibreSSL as
>> they are diverging, but thats not for this patch to tackle.)
> 
> I can make that change; note that it'll also skip some of the new tests
> with OpenSSL 1.0.1, where there's no SSL_CTX_set_cert_cb. If that's
> acceptable, it should be an easy switch.

I'm not sure I follow, AFAICT it's present all the way till 3.1 at least?  What
am I missing?

>> I can agree with the comment that this seems brittle. How about moving the 
>> installation of openssl to after the brew cleanup stage to avoid the need 
>> for this? While that may leave more in the cache, it seems more palatable. 
>> Something like this essentially:
>> 
>>  brew install 
>>  brew cleanup -s 
>>  # Comment about why OpenSSL is kept separate
>>  brew install openssl@3
> 
> That looks much better to me, but it didn't work when I tried it. One or
> more of the packages above it (and/or the previous cache?) has already
> installed OpenSSL as one of its dependencies, so the last `brew install`
> becomes a no-op. I tried an `install --force` as well, but that didn't
> seem to do anything differently. :/

Ugh, that's very unfortunate, I guess we're stuck with this then.  If we can't
make brew cleanup not remove it then any hack applied to make it stick around
will be equally brittle so we might as well mkdir it back.

>> +   if (SSL_CTX_set_default_verify_paths(SSL_context) != 1)
>> OpenSSL documents this as "A missing default location is still treated as a
>> success.", is that something we need to document or in any way deal with?
>> (Skimming the OpenSSL code I'm not sure it's actually correct in v3+, but I
>> might very well have missed something.)
> 
> I think it's still true in v3+, because that sounds exactly like the
> brew issue we're working around in Cirrus. I'm not sure if there's much
> for us to do in that case, short of reimplementing the OpenSSL defaults
> logic and checking it ourselves. (And even that would look different
> between OpenSSL and LibreSSL...)

Right, that's clearly not something we want to do.

> Is there something we could document that's more helpful than "make sure
> your installation isn't broken"?

I wonder if there is an openssl command line example for verifying defaults
that we can document and refer to?

--
Daniel Gustafsson





Re: Making background psql nicer to use in tap tests

2023-04-02 Thread Daniel Gustafsson
> On 31 Mar 2023, at 22:33, Daniel Gustafsson  wrote:
> 
> The attached v4 fixes some incorrect documentation (added by me in v3), and
> fixes that background_psql() didn't honor on_error_stop and extraparams passed
> by the user.  I've also added a commit which implements the \password test 
> from
> the SCRAM iteration count patchset as well as cleaned up a few IPC::Run
> includes from test scripts.

And a v5 to fix a test failure in recovery tests.

--
Daniel Gustafsson



v5-0001-Refactor-background-psql-TAP-functions.patch
Description: Binary data


v5-0002-Add-test-SCRAM-iteration-changes-with-psql-passwo.patch
Description: Binary data


Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Tom Lane
Nathan Bossart  writes:
> On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote:
>> * Why does LookupCustodianFunctions think it needs to search the
>> constant array?

> The order of the tasks in the array isn't guaranteed to match the order in
> the CustodianTask enum.

Why not?  It's a constant array, we can surely manage to make its
order match the enum.

>> * The original proposal included moving RemovePgTempFiles into this
>> mechanism, which I thought was probably the most useful bit of the
>> whole thing.  I'm sad to see that gone, what became of it?

> I postponed that based on advice from upthread [1].  I was hoping to start
> a dedicated thread for that immediately after the custodian infrastructure
> was committed.  FWIW I agree that it's the most useful task of what's
> proposed thus far.

Hmm, given Andres' objections there's little point in moving forward
without that task.

regards, tom lane




Re: [EXTERNAL] Re: [PATCH] Report the query string that caused a memory error under Valgrind

2023-04-02 Thread Tom Lane
Onur Tirtir  writes:
> Thank you for reviewing the patch and for your feedback. I believe the v2 
> patch should be able to handle other protocol messages too.

I like the concept here, but the reporting that the v2 patch provides
would be seriously horrid: it's trying to print stuff that isn't
necessarily text, and for bind and execute messages it's substantially
dumber than the existing debug_query_string infrastructure.  Another
thing that is not great is that if Postgres itself throws an error
later in the query, nothing will be reported since we don't reach the
bottom of the processing loop.

I suggest that we need something closer to the attached.  Some
bikeshedding is possible on the specific printouts, but I'm not
sure it's worth working harder than this.
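
For the archives, the printout would then look roughly like this (PID and
query text invented here):

  **12345** Valgrind detected 1 error(s) during execution of "SELECT broken(42);"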

regards, tom lane

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cab709b07b..a10ecbaf50 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -27,6 +27,10 @@
 #include <sys/socket.h>
 #include <sys/time.h>
 
+#ifdef USE_VALGRIND
+#include <valgrind/valgrind.h>
+#endif
+
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -191,6 +195,36 @@ static void enable_statement_timeout(void);
 static void disable_statement_timeout(void);
 
 
+/* ----------------------------------------------------------------
+ *		infrastructure for valgrind debugging
+ * ----------------------------------------------------------------
+ */
+#ifdef USE_VALGRIND
+/* This variable should be set at the top of the main loop. */
+static unsigned int old_valgrind_error_count;
+
+/*
+ * If Valgrind detected any errors since old_valgrind_error_count was updated,
+ * report the current query as the cause.  This should be called at the end
+ * of message processing.
+ */
+static void
+valgrind_report_error_query(const char *query)
+{
+	unsigned int valgrind_error_count = VALGRIND_COUNT_ERRORS;
+
+	if (unlikely(valgrind_error_count != old_valgrind_error_count) &&
+		query != NULL)
+		VALGRIND_PRINTF("Valgrind detected %u error(s) during execution of \"%s\"\n",
+		valgrind_error_count - old_valgrind_error_count,
+		query);
+}
+
+#else			/* !USE_VALGRIND */
+#define valgrind_report_error_query(query) ((void) 0)
+#endif			/* USE_VALGRIND */
+
+
 /* ----------------------------------------------------------------
  *		routines to obtain user input
  * ----------------------------------------------------------------
@@ -2041,6 +2075,8 @@ exec_bind_message(StringInfo input_message)
 	if (save_log_statement_stats)
 		ShowUsage("BIND MESSAGE STATISTICS");
 
+	valgrind_report_error_query(debug_query_string);
+
 	debug_query_string = NULL;
 }
 
@@ -2292,6 +2328,8 @@ exec_execute_message(const char *portal_name, long max_rows)
 	if (save_log_statement_stats)
 		ShowUsage("EXECUTE MESSAGE STATISTICS");
 
+	valgrind_report_error_query(debug_query_string);
+
 	debug_query_string = NULL;
 }
 
@@ -4287,6 +4325,12 @@ PostgresMain(const char *dbname, const char *username)
 		/* Report the error to the client and/or server log */
 		EmitErrorReport();
 
+		/*
+		 * If Valgrind noticed something during the erroneous query, print the
+		 * query string, assuming we have one.
+		 */
+		valgrind_report_error_query(debug_query_string);
+
 		/*
 		 * Make sure debug_query_string gets reset before we possibly clobber
 		 * the storage it points at.
@@ -4371,6 +4415,13 @@ PostgresMain(const char *dbname, const char *username)
 		 */
 		doing_extended_query_message = false;
 
+		/*
+		 * For valgrind reporting purposes, the "current query" begins here.
+		 */
+#ifdef USE_VALGRIND
+		old_valgrind_error_count = VALGRIND_COUNT_ERRORS;
+#endif
+
 		/*
 		 * Release storage left over from prior query cycle, and create a new
 		 * query input buffer in the cleared MessageContext.
@@ -4571,6 +4622,8 @@ PostgresMain(const char *dbname, const char *username)
 	else
 		exec_simple_query(query_string);
 
+	valgrind_report_error_query(query_string);
+
 	send_ready_for_query = true;
 }
 break;
@@ -4600,6 +4653,8 @@ PostgresMain(const char *dbname, const char *username)
 
 	exec_parse_message(query_string, stmt_name,
 	   paramTypes, numParams);
+
+	valgrind_report_error_query(query_string);
 }
 break;
 
@@ -4614,6 +4669,8 @@ PostgresMain(const char *dbname, const char *username)
  * the field extraction out-of-line
  */
exec_bind_message(&input_message);
+
+/* exec_bind_message does valgrind_report_error_query */
 break;
 
 			case 'E':			/* execute */
@@ -4631,6 +4688,8 @@ PostgresMain(const char *dbname, const char *username)
	pq_getmsgend(&input_message);
 
 	exec_execute_message(portal_name, max_rows);
+
+	/* exec_execute_message does valgrind_report_error_query */
 }
 break;
 
@@ -4664,6 +4723,8 @@ PostgresMain(const char *dbname, const char *username)
 /* commit the function-invocation transaction */
 finish_xact_command();
 
+

Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-02 Thread Melanie Plageman
On Sat, Apr 1, 2023 at 1:57 PM Justin Pryzby  wrote:
>
> On Sat, Apr 01, 2023 at 01:29:13PM -0400, Melanie Plageman wrote:
> > Hi,
> >
> > I was just doing some cleanup on the main patch in this set and realized
> > that it was missing a few things. One of which is forbidding the
> > BUFFER_USAGE_LIMIT with VACUUM FULL since VACUUM FULL does not use a
> > BAS_VACUUM strategy.
> >
> > VACUUM FULL technically uses a bulkread buffer access strategy for
> > reading the original relation if its number of blocks is > number of
> > shared buffers / 4 (see initscan()). The new rel writing is done using
> > smgrextend/write directly and doesn't go through shared buffers. I
> > think it is a stretch to try and use the size passed in to VACUUM by
> > BUFFER_USAGE_LIMIT for the bulkread strategy ring.
>
> When you say that it's a stretch, do you mean that it'd be a pain to add
> arguments to handful of functions to pass down the setting ?  Or that
> it's unclear if doing so would be the desirable/needed/intended/expected
> behavior ?

More that I don't think it makes sense. VACUUM FULL only uses a buffer
access strategy (BAS_BULKREAD) for reading the original relation in and
not for writing the new one. It has different concerns because its
behavior is totally different from regular vacuum. It is not modifying
the original buffers (AFAIK) and the amount of WAL it is generating is
different. Also, no matter what, the new relation won't be in shared
buffers because of VACUUM FULL using the smgr functions directly. So, I
think that allowing the two options together is confusing for the user
because it seems to imply we can give them some benefit that we cannot.

> I wonder if maybe strategy should be configurable in some more generic
> way, like a GUC.  At one point I had a patch to allow INSERT to use
> strategy buffers (not just INSERT SELECT).  And that's still pretty
> desirable.  Also COPY.  I've seen load spikes caused by pg_dumping
> tables which are just below 25% of shared_buffers.  Which is exacerbated
> because pg_dump deliberately orders tables by size, so those tables are
> dumped one after another, each causing eviction of ~20% of shared
> buffers.  And exacerbated some more because TOAST don't seem to use a
> ring buffer in that case.

Yes, it is probably worth exploring how configurable or dynamic Buffer
Access Strategies should be for other users (e.g. not just VACUUM).
However, since the ring sizes wouldn't be the same for all the different
operations, it is probably easier to start with a single kind of
operation and go from there.

> > I somehow feel like VACUUM (FULL, BUFFER_USAGE_LIMIT 'x') should error
> > out instead of silently not using the buffer usage limit, though.
> >
> > I am looking for others' opinions.
>
> Sorry, no opinion here :)
>
> One thing is that it's fine to take something that previously throw an
> error and change it to not throw an error anymore.  But it's undesirable
> to do the opposite.  For that reason, there's may be a tendency to add
> errors for cases like this.

So, I have made it error out when you specify BUFFER_USAGE_LIMIT with
VACUUM FULL or VACUUM ONLY_DATABASE_STATS. However, if you specify
buffer_usage_limit -1 with either of these options, it will not error
out. I don't love this, but I noticed that VACUUM (FULL, PARALLEL 0)
does not error out, while VACUUM (FULL, PARALLEL X) where X > 0 does.
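
In other words, the behavior with the patch applied would be roughly this (a
sketch of the proposed semantics; the option spelling is from this thread,
not committed syntax):

  VACUUM (FULL, BUFFER_USAGE_LIMIT '256 kB') tab;  -- ERROR
  VACUUM (FULL, BUFFER_USAGE_LIMIT -1) tab;        -- accepted, no effect
  VACUUM (FULL, PARALLEL 0) tab;                   -- existing precedent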

If I want to error out when BUFFER_USAGE_LIMIT specified at all but
still do so at the bottom of ExecVacuum() with the rest of the vacuum
option sanity checking, I will probably need to add a flag bit for
VacuumParams->options.

I was wondering why some "sanity checking" of vacuum options is done in
ExecVacuum() and some in vacuum() (it isn't just split by what is
applicable to autovacuum and what isn't).

I noticed that even in cases where we don't use the strategy object we
still made it, which I thought seemed like a bit of a waste and easy to
fix. I've added a commit which does not make the BufferAccessStrategy
object when VACUUM FULL or VACUUM ONLY_DATABASE_STATS are specified. I
noticed that we also don't use the strategy for VACUUM (PROCESS_MAIN
false, PROCESS_TOAST false), but it didn't seem worth handling this very
specific case, so I didn't.

v8 attached has the prohibitions specified above (including for
vacuumdb, as relevant) as well as some cleanup, added test cases, and
updated documentation.

0001 is essentially unmodified (i.e. I didn't do anything with the other
global variable David mentioned).

I still have a few open questions:
- what the initial value of ring_size for autovacuum should be (see the
  one remaining TODO in the code)
- should ANALYZE allow specifying BUFFER_USAGE_LIMIT since it uses the guc
  value when that is set?
- should INDEX_CLEANUP off cause VACUUM to use shared buffers and
  disable use of a strategy (like failsafe vacuum)?
- should we add anything to VACUUM VERBOSE output about the number of
  reuses of strategy buffers?

Re: Minimal logical decoding on standbys

2023-04-02 Thread Andres Freund
Hi,

Btw, most of the patches have some things that pgindent will change (and some
that my editor will highlight). It wouldn't hurt to run pgindent for the later
patches...

Pushed the WAL format change.


On 2023-04-02 10:27:45 +0200, Drouvot, Bertrand wrote:
> During WAL replay on standby, when slot conflict is identified,
> invalidate such slots. Also do the same thing if wal_level on the primary
> server is reduced to below logical and there are existing logical slots
> on standby. Introduce a new ProcSignalReason value for slot
> conflict recovery. Arrange for a new pg_stat_database_conflicts field:
> confl_active_logicalslot.
> 
> Add a new field "conflicting" in pg_replication_slots.
> 
> Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
> Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas,
> Fabrizio de Royes Mello, Bharath Rupireddy
> ---
>  doc/src/sgml/monitoring.sgml  |  11 +
>  doc/src/sgml/system-views.sgml|  10 +
>  src/backend/access/gist/gistxlog.c|   2 +
>  src/backend/access/hash/hash_xlog.c   |   1 +
>  src/backend/access/heap/heapam.c  |   3 +
>  src/backend/access/nbtree/nbtxlog.c   |   2 +
>  src/backend/access/spgist/spgxlog.c   |   1 +
>  src/backend/access/transam/xlog.c |  20 +-
>  src/backend/catalog/system_views.sql  |   6 +-
>  .../replication/logical/logicalfuncs.c|  13 +-
>  src/backend/replication/slot.c| 189 ++
>  src/backend/replication/slotfuncs.c   |  16 +-
>  src/backend/replication/walsender.c   |   7 +
>  src/backend/storage/ipc/procsignal.c  |   3 +
>  src/backend/storage/ipc/standby.c |  13 +-
>  src/backend/tcop/postgres.c   |  28 +++
>  src/backend/utils/activity/pgstat_database.c  |   4 +
>  src/backend/utils/adt/pgstatfuncs.c   |   3 +
>  src/include/catalog/pg_proc.dat   |  11 +-
>  src/include/pgstat.h  |   1 +
>  src/include/replication/slot.h|  14 +-
>  src/include/storage/procsignal.h  |   1 +
>  src/include/storage/standby.h |   2 +
>  src/test/regress/expected/rules.out   |   8 +-
>  24 files changed, 308 insertions(+), 61 deletions(-)
>5.3% doc/src/sgml/
>6.2% src/backend/access/transam/
>4.6% src/backend/replication/logical/
>   55.6% src/backend/replication/
>4.4% src/backend/storage/ipc/
>6.9% src/backend/tcop/
>5.3% src/backend/
>3.8% src/include/catalog/
>5.3% src/include/replication/

I think it might be worth trying to split this up a bit.


> 	restart_lsn = s->data.restart_lsn;
> -
> -	/*
> -	 * If the slot is already invalid or is fresh enough, we don't need to
> -	 * do anything.
> -	 */
> -	if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN)
> +	slot_xmin = s->data.xmin;
> +	slot_catalog_xmin = s->data.catalog_xmin;
> +
> +	/* the slot has been invalidated (logical decoding conflict case) */
> +	if ((xid && ((LogicalReplicationSlotIsInvalid(s)) ||
> +	/* or the xid is valid and this is a non conflicting slot */
> +	(TransactionIdIsValid(*xid) && !(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, *xid) ||
> +	/* or the slot has been invalidated (obsolete LSN case) */
> +	(!xid && (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN)))
> 	{

This still looks nearly unreadable. I suggest moving the comments outside of
the if (), removing the redundant parentheses, and using a function to detect
whether the slot has been invalidated.


> @@ -1329,16 +1345,45 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
> 		 */
> 		if (last_signaled_pid != active_pid)
> 		{
> +			bool		send_signal = false;
> +
> +			initStringInfo(_msg);
> +			initStringInfo(_detail);
> +
> +			appendStringInfo(_msg, "terminating process %d to release replication slot \"%s\"",
> +							 active_pid,
> +							 NameStr(slotname));

For this to be translatable you need to use _("message").


> +			if (xid)
> +			{
> +				appendStringInfo(_msg, " because it conflicts with recovery");
> +				send_signal = true;
> +
> +				if (TransactionIdIsValid(*xid))
> +

Re: Minimal logical decoding on standbys

2023-04-02 Thread Jeff Davis
On Sun, 2023-04-02 at 10:11 +0200, Drouvot, Bertrand wrote:
> I was thinking that, if a new LogicalWalSndWakeup() replaces
> "ConditionVariableBroadcast(>replayedCV);"
> in ApplyWalRecord() then, it could be possible that some walsender(s)
> are requested to wake up while they are actually doing decoding (but
> I might be wrong).

I don't think that's a problem, right?

We are concerned about wakeups when they happen repeatedly when there's
no work to do, or when the wakeup doesn't happen when it should (and we
need to wait for a timeout).

> > 
> > Currently, we wake up walsenders only after writing some WAL
> > records
> > at the time of flush, so won't it be better to wake up only after
> > applying some WAL records rather than after applying each record?
> 
> Yeah that would be better.

Why? If the walsender is asleep, and there's work to be done, why not
wake it up?

If it's already doing work, and the latch gets repeatedly set, that
doesn't look like a problem either. The comment on SetLatch() says:

  /*
   * Sets a latch and wakes up anyone waiting on it.
   *
   * This is cheap if the latch is already set, otherwise not so much.

Regards,
Jeff Davis









Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart
On Sun, Apr 02, 2023 at 11:42:26AM -0700, Andres Freund wrote:
> Just want to note that I've repeatedly objected to 0002 and 0003, i.e. moving
> serialized logical decoding snapshots and mapping files, to custodian, and
> still do. Without further work it increases wraparound risks (the filenames
> contain xids), and afaict nothing has been done to ameliorate that.

From your feedback earlier [0], I was under the (perhaps false) impression
that adding a note about this existing issue in the commit message was
sufficient, at least initially.  I did add such a note in 0003, but it's
missing from 0002 for some reason.  I suspect I left it out because the
serialized snapshot file names do not contain XIDs.  You cleared that up
earlier [1], so this is my bad.

It's been a little while since I dug into this, but I do see your point
that the wraparound risk could be higher in some cases.  For example, if
you have a billion temp files to clean up, the custodian could be stuck on
that task for a long time.  I will give this some further thought.  I'm all
ears if anyone has ideas about how to reduce this risk.

[0] https://postgr.es/m/20220702225456.zit5kjdtdfqmjujt%40alap3.anarazel.de
[1] https://postgr.es/m/20220217065938.x2esfdppzypegn5j%40alap3.anarazel.de

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: GUC for temporarily disabling event triggers

2023-04-02 Thread Justin Pryzby
On Mon, Mar 06, 2023 at 01:24:56PM +0100, Daniel Gustafsson wrote:
> > On 27 Jan 2023, at 15:00, Mikhail Gribkov  wrote:
> 
> > There is a complete framework for disabling various types of the event 
> > triggers separately, but, the list of valid GUC values only include 'all' 
> > and 'none'. Why not adding support for all the event trigger types 
> > separately? Everything is already there in the patch; the only thing needed 
> > is expanding couple of enums. It's cheap in terms of code size and even 
> > cheaper in terms of performance. And moreover - it would be a good example 
> > for anyone adding new trigger types.
> 
> I can't exactly recall my reasoning, but I do think you're right that if we're
> to have this GUC it should support the types of existing EVTs.  The updated v4
> implements that as well as a rebase on top of master and fixing a typo
> discovered upthread.
> 
+   gettext_noop("Disable event triggers for the duration of the session."),

Why does it say "for the duration of the session"?

It's possible to disable the ignoring again, even within the same session.
GUCs are typically "for the duration of the session" .. but they don't
say so (and don't need to).

+   elog(ERROR, "unsupport event trigger: %d", event);

typo: unsupported

+Allows to temporarily disable event triggers from executing in order

=> Allow temporarily disabling execution of event triggers ..

+to troubleshoot and repair faulty event triggers. The value matches
+the type of event trigger to be ignored:
+ddl_command_start, ddl_command_end,
+table_rewrite and sql_drop.
+Additionally, all event triggers can be disabled by setting it to
+all. Setting the value to none
+will ensure that all event triggers are enabled, this is the default

It doesn't technically "ensure that they're enabled", since they can be
disabled by ALTER.  Better to say that it "doesn't disable any event triggers".
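
FWIW, my reading of the intended usage is something like the following (the
GUC name is taken from the patch under review; nothing here is committed):

  -- temporarily skip a faulty trigger so the DDL it breaks can run again
  SET ignore_event_trigger = 'ddl_command_end';
  ALTER TABLE t ADD COLUMN c int;
  RESET ignore_event_trigger;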

-- 
Justin




Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart
On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote:
> I took a brief look through v20, and generally liked what I saw,
> but there are a few things troubling me:

Thanks for taking a look.

> * The comments for CustodianEnqueueTask claim that it won't enqueue an
> already-queued task, but I don't think I believe that, because it stops
> scanning as soon as it finds an empty slot.  That data structure seems
> quite oddly designed in any case.  Why isn't it simply an array of
> need-to-run-this-one booleans indexed by the CustodianTask enum?
> Fairness of dispatch could be ensured by the same state variable that
> CustodianGetNextTask already uses to track which array element to
> inspect next.  While that wouldn't guarantee that tasks A and B are
> dispatched in the same order they were requested in, I'm not sure why
> we should care.

That works.  Will update.
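
Presumably something like this (a sketch of that suggestion; the names mirror
the patch, the layout is invented here):

  typedef struct CustodianShmemStruct
  {
      /* one need-to-run flag per task, indexed by the CustodianTask enum */
      volatile sig_atomic_t task_needed[NUM_CUSTODIAN_TASKS];

      /* next element to inspect, for fairness of dispatch */
      int         next_task;
  } CustodianShmemStruct;

  static void
  CustodianEnqueueTask(CustodianTask task)
  {
      /* setting the flag twice is harmless, so no duplicate-checking scan */
      CustodianShmem->task_needed[task] = true;
  }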

> * I don't much like cust_lck, mainly because you didn't bother to
> document what it protects (in general, CustodianShmemStruct deserves
> more than zero commentary).  Do we need it at all?  If the task-needed
> flags were sig_atomic_t not bool, we probably don't need it for the
> basic job of tracking which tasks remain to be run.  I see that some
> of the tasks have possibly-non-atomically-assigned parameters to be
> transmitted, but restricting cust_lck to protect those seems like a
> better idea.

Will do.

> * Not quite convinced about handle_arg_func, mainly because the Datum
> API would be pretty inconvenient for any task with more than one arg.
> Why do we need that at all, rather than saying that callers should
> set up any required parameters separately before invoking
> RequestCustodian?

I had done it this way earlier, but added the Datum argument based on
feedback upthread [0].  It presently has only one proposed use, anyway, so
I think it would be fine to switch it back for now.

> * Why does LookupCustodianFunctions think it needs to search the
> constant array?

The order of the tasks in the array isn't guaranteed to match the order in
the CustodianTask enum.

> * The original proposal included moving RemovePgTempFiles into this
> mechanism, which I thought was probably the most useful bit of the
> whole thing.  I'm sad to see that gone, what became of it?

I postponed that based on advice from upthread [1].  I was hoping to start
a dedicated thread for that immediately after the custodian infrastructure
was committed.  FWIW I agree that it's the most useful task of what's
proposed thus far.

[0] https://postgr.es/m/20220703172732.wembjsb55xl63vuw%40awork3.anarazel.de
[1] 
https://postgr.es/m/CANbhV-EagKLoUH7tLEfg__VcLu37LY78F8gvLMzHrRZyZKm6sw%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: GUC for temporarily disabling event triggers

2023-04-02 Thread Daniel Gustafsson
> On 7 Mar 2023, at 16:02, Mikhail Gribkov  wrote:

> * The patch does what it intends to do;
> * The implementation way is clear;
> * All test are passed;
> * No additional problems catched - at least by my eyes;
> 
> I think it can be marked as Ready for Committer

This patch has been RFC for some time, and has been all green in the CFbot.  I
would like to go ahead with it this cycle since it gives a tool for admins to
avoid single-user mode - which is something we want to move away from.  Even
though login event triggers aren't going in (which is where this originated),
there are still lots of ways to break things with other ev triggers.

Any objections?

--
Daniel Gustafsson





Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Andres Freund
Hi,

On 2023-04-02 13:40:05 -0400, Tom Lane wrote:
> Nathan Bossart  writes:
> > another rebase for cfbot
> 
> I took a brief look through v20, and generally liked what I saw,
> but there are a few things troubling me:

Just want to note that I've repeatedly objected to 0002 and 0003, i.e. moving
serialized logical decoding snapshots and mapping files, to custodian, and
still do. Without further work it increases wraparound risks (the filenames
contain xids), and afaict nothing has been done to ameliorate that.

Without those, the current patch series does not have any tasks:

> * The original proposal included moving RemovePgTempFiles into this
> mechanism, which I thought was probably the most useful bit of the
> whole thing.  I'm sad to see that gone, what became of it?

Greetings,

Andres Freund




Re: regression coverage gaps for gist and hash indexes

2023-04-02 Thread Andres Freund
Hi,

On 2023-04-02 13:03:51 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-04-02 12:38:32 -0400, Tom Lane wrote:
> >> If they have to run serially then that means that their runtime
> >> contributes 1-for-1 to the total runtime of the core regression tests,
> >> which is not nice at all.
>
> > Agreed, it's not nice. At least reasonably quick (74ms and 54ms on one run
> > here)...
>
> Oh, that's less bad than I expected.  The discussion in the other thread
> was pointing in the direction of needing hundreds of ms to make indexes
> that are big enough to hit all the code paths.

Well, the tests here really just try to hit the killtuples path, not some of
the paths discussed in [1]. It needs just enough index entries to encounter a
page split (which then is avoided by pruning tuples).
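
For the archives, the rough shape of such a test is (a simplified sketch,
not the actual patch):

  CREATE TABLE gist_kill (p point);
  CREATE INDEX gist_kill_idx ON gist_kill USING gist (p);
  INSERT INTO gist_kill SELECT point(i, i) FROM generate_series(1, 400) i;
  DELETE FROM gist_kill;
  SET enable_seqscan = off;
  SET enable_bitmapscan = off;
  -- a plain index scan marks the dead entries LP_DEAD (the killtuples path);
  -- this only works if no concurrent snapshot holds the horizon back, hence
  -- the need to run serially
  SELECT count(*) FROM gist_kill WHERE p <@ box '((0,0),(400,400))';
  -- reinserting then prunes the killed entries instead of splitting the page
  INSERT INTO gist_kill SELECT point(i, i) FROM generate_series(1, 400) i;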

Looks like the test in [1] could be made a lot cheaper by changing
effective_cache_size for just that test:

	/*
	 * In 'auto' mode, check if the index has grown too large to fit in
	 * cache, and switch to buffering mode if it has.
	 *
	 * To avoid excessive calls to smgrnblocks(), only check this every
	 * BUFFERING_MODE_SWITCH_CHECK_STEP index tuples.
	 *
	 * In 'stats' state, switch as soon as we have seen enough tuples to
	 * have some idea of the average tuple size.
	 */
	if ((buildstate->buildMode == GIST_BUFFERING_AUTO &&
		 buildstate->indtuples % BUFFERING_MODE_SWITCH_CHECK_STEP == 0 &&
		 effective_cache_size < smgrnblocks(RelationGetSmgr(index),
											MAIN_FORKNUM)) ||
		(buildstate->buildMode == GIST_BUFFERING_STATS &&
		 buildstate->indtuples >= BUFFERING_MODE_TUPLE_SIZE_STATS_TARGET))
	{
		/*
		 * Index doesn't fit in effective cache anymore. Try to switch to
		 * buffering build mode.
		 */
		gistInitBuffering(buildstate);
	}



> >> Can we move them to some other portion of our test suite, preferably one
> >> that's not repeated four or more times in each buildfarm run?
>
> > Not trivially, at least. Right now 027_stream_regress.pl doesn't run other
> > tests, so we'd not cover the replay portion if we moved the tests to
> > contrib/btree_gist or such.
>
> Yeah, I was imagining teaching 027_stream_regress.pl to run additional
> scripts that aren't in the core tests.

Not opposed to that, but I'm not quite sure about what we'd use as
infrastructure. A second schedule?

I think the tests patches I am proposing here are quite valuable to run
without replication involved as well, the killtuples path isn't trivial, so
having it be covered by the normal regression tests makes sense to me.


> (I'm still quite unhappy that 027_stream_regress.pl uses the core tests at
> all, really, as they were never particularly designed to cover what it cares
> about.  The whole thing is extremely inefficient and it's no surprise that
> its coverage is scattershot.)

I don't think anybody would claim it's great as-is. But I still think that
having a meaningful coverage of replay is a heck of a lot better than not
having any, even if it's not a pretty or all that fast design. And the fact
that 027_stream_regress.pl runs with a small shared_buffers actually shook out
a few bugs...

I don't think we'd want to use a completely separate set of tests for
027_stream_regress.pl; typical tests will provide coverage on both the primary
and the standby, I think, and would just end up being duplicated between the
main regression test and something specific for 027_stream_regress.pl. But I
could imagine that it's worth maintaining a distinct version of
parallel_schedule that removes tests that aren't likely to provide benefits
for 027_stream_regress.pl.


Btw, from what I can tell, the main bottleneck for the main regression test
right now is the granular use of groups. Because the parallel groups have a
fixed size limit, there's a stall waiting for the slowest test at the end of
each group. If we instead limited group concurrency solely in pg_regress,
instead of the schedule, a quick experiment suggest we could get a good bit of
speedup. And remove some indecision paralysis about which group to add a new
test to, as well as removing the annoyance of having to count the number of
tests in a group manually.

Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/flat/16329-7a6aa9b6fa1118a1%40postgresql.org




Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Tom Lane
Nathan Bossart  writes:
> another rebase for cfbot

I took a brief look through v20, and generally liked what I saw,
but there are a few things troubling me:

* The comments for CustodianEnqueueTask claim that it won't enqueue an
already-queued task, but I don't think I believe that, because it stops
scanning as soon as it finds an empty slot.  That data structure seems
quite oddly designed in any case.  Why isn't it simply an array of
need-to-run-this-one booleans indexed by the CustodianTask enum?
Fairness of dispatch could be ensured by the same state variable that
CustodianGetNextTask already uses to track which array element to
inspect next.  While that wouldn't guarantee that tasks A and B are
dispatched in the same order they were requested in, I'm not sure why
we should care.

* I don't much like cust_lck, mainly because you didn't bother to
document what it protects (in general, CustodianShmemStruct deserves
more than zero commentary).  Do we need it at all?  If the task-needed
flags were sig_atomic_t not bool, we probably don't need it for the
basic job of tracking which tasks remain to be run.  I see that some
of the tasks have possibly-non-atomically-assigned parameters to be
transmitted, but restricting cust_lck to protect those seems like a
better idea.

* Not quite convinced about handle_arg_func, mainly because the Datum
API would be pretty inconvenient for any task with more than one arg.
Why do we need that at all, rather than saying that callers should
set up any required parameters separately before invoking
RequestCustodian?

* Why does LookupCustodianFunctions think it needs to search the
constant array?

* The original proposal included moving RemovePgTempFiles into this
mechanism, which I thought was probably the most useful bit of the
whole thing.  I'm sad to see that gone, what became of it?

regards, tom lane




Re: Add "host" to startup packet

2023-04-02 Thread Daniel Gustafsson
> On 2 Apr 2023, at 18:33, Tom Lane  wrote:
> 
> Greg Stark  writes:
>> My question is a bit different. How does this interact with TLS SNI.
>> Can you just use the SNI name given in the TLS handshake? Should the
>> server require them to match? Is there any value to having a separate
>> source for this info? Is something similar available in GSSAPI
>> authentication?
> 
> The idea that I was thinking about was to not hard-wire sending the host
> string exactly, but instead to invent another connection parameter along
> the line of "send_host = name-to-send".  This parallels the situation in
> HTTP where the "Host" header doesn't necessarily have to match the actual
> transport target.

Since we already have sslsni in libpq since v14, with SNI being well understood
and standardized, do we really want to invent our own parallel scheme?
Alternatively, the PROXY protocol patch by Magnus [0], which stalled a few CFs
ago, has similar functionality for the client to pass a hostname.

--
Daniel Gustafsson

[0] 
https://www.postgresql.org/message-id/flat/CABUevExJ0ifpUEiX4uOREy0s2kHBrBrb=pxlehhpmtr1vvr...@mail.gmail.com



Re: Pass heaprel to GlobalVisTestFor() in vacuumRedirectAndPlaceholder()

2023-04-02 Thread Peter Geoghegan
On Sun, Apr 2, 2023 at 10:18 AM Peter Geoghegan  wrote:
> Making nbtree page deletion more efficient when logical decoding is in
> use seems well worthwhile. There is an "XXX" comment about this issue,
> similar to the SP-GiST one. It looks like you already have everything
> you need to make this work from yesterday's commit 61b313e47e.

Actually, I suppose that isn't quite true, since you'd still need to
find a way to pass the heap relation down to nbtree VACUUM. Say by
adding it to IndexVacuumInfo.

That doesn't seem hard at all. The hard part was passing the heap rel
down to _bt_getbuf(), which you've already taken care of.
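
In diff form, roughly (IndexVacuumInfo lives in genam.h; the new field and
its placement are assumptions on my part):

  --- a/src/include/access/genam.h
  +++ b/src/include/access/genam.h
  @@ typedef struct IndexVacuumInfo
   {
       Relation    index;          /* the index being vacuumed */
  +    Relation    heaprel;        /* heap relation the index belongs to */
       bool        analyze_only;   /* ANALYZE (without any actual vacuum) */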

-- 
Peter Geoghegan




Re: Pass heaprel to GlobalVisTestFor() in vacuumRedirectAndPlaceholder()

2023-04-02 Thread Peter Geoghegan
On Sun, Apr 2, 2023 at 1:25 AM Drouvot, Bertrand
 wrote:
> now that the heap relation is passed down to vacuumRedirectAndPlaceholder()
> thanks to 61b313e47e, we can also pass it down to GlobalVisTestFor() too (to
> allow more pruning).

What about BTPageIsRecyclable() and _bt_pendingfsm_finalize()?

Making nbtree page deletion more efficient when logical decoding is in
use seems well worthwhile. There is an "XXX" comment about this issue,
similar to the SP-GiST one. It looks like you already have everything
you need to make this work from yesterday's commit 61b313e47e.

--
Peter Geoghegan




Re: regression coverage gaps for gist and hash indexes

2023-04-02 Thread Tom Lane
Andres Freund  writes:
> On 2023-04-02 12:38:32 -0400, Tom Lane wrote:
>> If they have to run serially then that means that their runtime
>> contributes 1-for-1 to the total runtime of the core regression tests,
>> which is not nice at all.

> Agreed, it's not nice. At least reasonably quick (74ms and 54ms on one run
> here)...

Oh, that's less bad than I expected.  The discussion in the other thread
was pointing in the direction of needing hundreds of ms to make indexes
that are big enough to hit all the code paths.

>> Can we move them to some other portion of our test suite, preferably one
>> that's not repeated four or more times in each buildfarm run?

> Not trivially, at least. Right now 027_stream_regress.pl doesn't run other
> tests, so we'd not cover the replay portion if we moved the tests to
> contrib/btree_gist or such.

Yeah, I was imagining teaching 027_stream_regress.pl to run additional
scripts that aren't in the core tests.  (I'm still quite unhappy that
027_stream_regress.pl uses the core tests at all, really, as they were
never particularly designed to cover what it cares about.  The whole
thing is extremely inefficient and it's no surprise that its coverage
is scattershot.)

regards, tom lane




Re: regression coverage gaps for gist and hash indexes

2023-04-02 Thread Andres Freund
Hi,

On 2023-04-02 12:38:32 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-04-01 06:02:47 +0200, Drouvot, Bertrand wrote:
> >> +1 to put them in gist.sql and hash_index.sql.
> 
> > Unfortunately it turns out that running them in a parallel group reliably
> > prevents cleanup of the dead rows, at least on my machine. Thereby 
> > preventing
> > any increase in coverage. As they need to run serially, I think it makes 
> > more
> > sense to keep the tests focussed and leave gist.sql and hash_index.sql to 
> > run
> > in parallel.
> 
> If they have to run serially then that means that their runtime
> contributes 1-for-1 to the total runtime of the core regression tests,
> which is not nice at all.

Agreed, it's not nice. At least reasonably quick (74ms and 54ms on one run
here)...


> Can we move them to some other portion of our test suite, preferably one
> that's not repeated four or more times in each buildfarm run?

Not trivially, at least. Right now 027_stream_regress.pl doesn't run other
tests, so we'd not cover the replay portion if we moved the tests to
contrib/btree_gist or such.

I wonder if we should have a test infrastructure function for waiting for the
visibility horizon to have passed a certain point. I think that might be
useful for making a few other tests robust...
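
In TAP terms, perhaps something like this (a sketch; the horizon query is the
hand-wavy part and would need more thought):

  # wait until the computed xmin horizon has advanced past $cutoff_xid
  $node->poll_query_until('postgres',
      "SELECT pg_snapshot_xmin(pg_current_snapshot()) >= '$cutoff_xid'::xid8");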

Greetings,

Andres Freund




Re: regression coverage gaps for gist and hash indexes

2023-04-02 Thread Tom Lane
Andres Freund  writes:
> On 2023-04-01 06:02:47 +0200, Drouvot, Bertrand wrote:
>> +1 to put them in gist.sql and hash_index.sql.

> Unfortunately it turns out that running them in a parallel group reliably
> prevents cleanup of the dead rows, at least on my machine. Thereby preventing
> any increase in coverage. As they need to run serially, I think it makes more
> sense to keep the tests focussed and leave gist.sql and hash_index.sql to run
> in parallel.

If they have to run serially then that means that their runtime
contributes 1-for-1 to the total runtime of the core regression tests,
which is not nice at all.  Can we move them to some other portion
of our test suite, preferably one that's not repeated four or more
times in each buildfarm run?

regards, tom lane




Re: Add "host" to startup packet

2023-04-02 Thread Tom Lane
Greg Stark  writes:
> My question is a bit different. How does this interact with TLS SNI.
> Can you just use the SNI name given in the TLS handshake? Should the
> server require them to match? Is there any value to having a separate
> source for this info? Is something similar available in GSSAPI
> authentication?

The idea that I was thinking about was to not hard-wire sending the host
string exactly, but instead to invent another connection parameter along
the line of "send_host = name-to-send".  This parallels the situation in
HTTP where the "Host" header doesn't necessarily have to match the actual
transport target.  I can think of a couple of benefits:

* Avoid breaking backward compatibility with old servers: if the user doesn't
add this option then nothing extra is sent.

* Separating the send_host name would simplify testing scenarios.
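
For example, a connection string might then look like this (send_host being
entirely hypothetical at this point):

  psql "host=10.0.0.5 dbname=production user=admin send_host=tenant-a.example.com"

The transport connects to 10.0.0.5, while a pooler or proxy routes on
tenant-a.example.com, much as an HTTP proxy routes on the Host header.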

Seems like it would depend a lot on your use-case whether you care about
the send_host name matching anything that's authenticated.  If you do,
there's a whole lot more infrastructure to build out around pg_hba.conf.
Right now that file is designed on the assumption that it describes
authentication rules for a single "host", but we'd need to generalize
it to describe rules for multiple host values.

regards, tom lane




Re: regression coverage gaps for gist and hash indexes

2023-04-02 Thread Andres Freund
Hi,

On 2023-04-01 06:02:47 +0200, Drouvot, Bertrand wrote:
> On 4/1/23 1:13 AM, Andres Freund wrote:
> > On 2023-03-31 17:00:00 +0300, Alexander Lakhin wrote:
> > > 31.03.2023 15:55, Tom Lane wrote:
> > > > See also the thread about bug #16329 [1].  Alexander promised to look
> > > > into improving the test coverage in this area, maybe he can keep an
> > > > eye on the WAL logic coverage too.
> > > 
> > > Yes, I'm going to analyze that area too. Maybe it'll take more time
> > > (a week or two) if I encounter some bugs there (for now I observe 
> > > anomalies
> > > with gist__int_ops), but I will definitely try to improve the gist 
> > > testing.
> > 
> > Because I needed it to verify the changes in the referenced patch, I wrote
> > tests exercising killtuples based pruning for gist and hash.
> > 
> 
> Thanks for the patch!
> 
> I did not look at the details but "just" checked that the coverage is now
> done.
> 
> And indeed, when running "make check" + "027_stream_regress.pl":
> 
> I can see it moving from (without the patch):
> 
> function gistXLogDelete called 0 returned 0% blocks executed 0%
> function gistRedoDeleteRecord called 0 returned 0% blocks executed 0%
> function gistprunepage called 0 returned 0% blocks executed 0%
> function _hash_vacuum_one_page called 0 returned 0% blocks executed 0%
> 
> to (with the patch):
> 
> function gistXLogDelete called 9 returned 100% blocks executed 100%
> function gistRedoDeleteRecord called 5 returned 100% blocks executed 100% 
> (thanks to 027_stream_regress.pl)
> function gistprunepage called 9 returned 100% blocks executed 79%
> function _hash_vacuum_one_page called 12 returned 100% blocks executed 94%
> 
> > For now I left the new tests in their own files. But possibly they should be
> > in gist.sql and hash_index.sql respectively?
> 
> +1 to put them in gist.sql and hash_index.sql.

Unfortunately it turns out that running them in a parallel group reliably
prevents cleanup of the dead rows, at least on my machine. Thereby preventing
any increase in coverage. As they need to run serially, I think it makes more
sense to keep the tests focussed and leave gist.sql and hash_index.sql to run
in parallel.

Greetings,

Andres Freund




Re: Add "host" to startup packet

2023-04-02 Thread Greg Stark
On Sun, 2 Apr 2023 at 11:38, Tom Lane  wrote:
>
> Even if all that infrastructure sprang into existence, is this really any
> more useful than basing your switching on the host's resolved IP address?
> I'm doubtful that there's enough win there to justify pushing this rock
> to the top of the mountain.

Hm. I think it's going to turn out to be useful. Experience shows that
depending on the IP address often paints people into corners. However,
I agree that we need to actually have a real use case in hand where
someone is going to actually do something with it.

My question is a bit different. How does this interact with TLS SNI.
Can you just use the SNI name given in the TLS handshake? Should the
server require them to match? Is there any value to having a separate
source for this info? Is something similar available in GSSAPI
authentication?

-- 
greg




Re: Add "host" to startup packet

2023-04-02 Thread Tom Lane
Lev Kokotov  writes:
> Patch attached below. TLDR, I'd like to add "host" to the startup packet.

I don't think this is of any use at all in isolation.  What is the server
going to do with it?  What's your plan for persuading clients other than
libpq to supply it?  How are poolers supposed to handle it?  What will
you do about old clients that don't supply it?  And most importantly,
how can a client know while connecting whether it's safe to include this,
realizing that existing servers will error out (they'll think it's a
GUC setting for "host")?

Even if all that infrastructure sprang into existence, is this really any
more useful than basing your switching on the host's resolved IP address?
I'm doubtful that there's enough win there to justify pushing this rock
to the top of the mountain.

regards, tom lane




Re: Add "host" to startup packet

2023-04-02 Thread Andrey Borodin
Hi Lev,

02.04.2023, 14:43, "Lev Kokotov":
> Patch attached below. TLDR, I'd like to add "host" to the startup packet.
>
> I'm trying to run multiple Postgres servers in a multi-tenant environment
> behind a pooler. Currently, the only way to differentiate Postgres databases
> is with the user/dbname combination which are very often included in the
> startup packet by the client. However, that's not sufficient if you have
> users that all want to have the user "admin" and the database "production" :)
>
> HTTP hosting providers solved this using the "Host" header, allowing the
> server to identify which website the client wants. In the case of Postgres,
> this is the DNS or IP address, depending on the client configuration.
>
> Upon receiving a startup packet with user, dbname, and host, the pooler
> (acting also as a proxy) can validate that the credentials exist for the
> host and that they are valid, and authorize or decline the connection.

I like the idea of giving the proxy information on the database tenant to
which the client wants to connect. However, the name "host" in the web is
chosen as part of the URL specification. I'm not sure it applies here. And it
is not clear from your description how the server should interpret this
information.

> I have never submitted a patch for Postgres before, so I'm not entirely
> sure how to test this change, although it seems pretty harmless. Any
> feedback and help are appreciated!

Well, at a minimum it should be clear from the patch what purpose the new
feature has, and some documentation and tests must be included. You can judge
from the recently committed libpq load balancing feature what it takes to add
a connection option [0]. But, obviously, it makes sense to discuss it before
going all the way to an implementation.

Best regards, Andrey Borodin.

[0] https://github.com/postgres/postgres/commit/7f5b1981

Add "host" to startup packet

2023-04-02 Thread Lev Kokotov
Hello,

Patch attached below. TLDR, I'd like to add "host" to the startup packet.

I'm trying to run multiple Postgres servers in a multi-tenant environment
behind a pooler. Currently, the only
way to differentiate Postgres databases is with the user/dbname combination
which are very often included in the startup packet by the client. However,
that's not sufficient if you have users that all want to have the user
"admin" and the database "production" :)

HTTP hosting providers solved this using the "Host" header, allowing the
server to identify which website the client wants. In the case of Postgres,
this is the DNS or IP address, depending on the client configuration.

Upon receiving a startup packet with user, dbname, and host, the pooler
(acting also as a proxy) can validate that the credentials exist for the
host and that they are valid, and authorize or decline the connection.
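
For the curious, the startup packet already carries a list of key/value
parameter pairs, so this simply adds one more (the values are invented):

  user=admin database=production host=tenant-a.example.com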

I have never submitted a patch for Postgres before, so I'm not entirely
sure how to test this change, although it seems pretty harmless. Any
feedback and help are appreciated!

Thank you!

Best,
Lev


host.patch
Description: Binary data


Pass heaprel to GlobalVisTestFor() in vacuumRedirectAndPlaceholder()

2023-04-02 Thread Drouvot, Bertrand

hi hackers,

now that the heap relation is passed down to vacuumRedirectAndPlaceholder()
thanks to 61b313e47e, we can also pass it down to GlobalVisTestFor() too (to
allow more pruning).

Please find attached a tiny patch to do so.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 3cff71e720..5a7e55441b 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -506,8 +506,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
xlrec.nToPlaceholder = 0;
xlrec.snapshotConflictHorizon = InvalidTransactionId;
 
-   /* XXX: providing heap relation would allow more pruning */
-   vistest = GlobalVisTestFor(NULL);
+   vistest = GlobalVisTestFor(heaprel);
 
START_CRIT_SECTION();
 


Re: Minimal logical decoding on standbys

2023-04-02 Thread Drouvot, Bertrand

Hi,

On 4/1/23 6:50 AM, Amit Kapila wrote:
> On Fri, Mar 31, 2023 at 7:14 PM Drouvot, Bertrand wrote:
>>
>> That sounds like a good idea. We could imagine creating a LogicalWalSndWakeup()
>> doing the Walsender(s) triage based on a new variable (as you suggest).
>>
>> But, it looks to me that we:
>>
>> - would need to go through the list of all the walsenders to do the triage
>> - could wake up some logical walsender(s) unnecessarily
>
> Why could it wake up unnecessarily?

I was thinking that, if a new LogicalWalSndWakeup() replaces
"ConditionVariableBroadcast(>replayedCV);" in ApplyWalRecord() then, it could
be possible that some walsender(s) are requested to wake up while they are
actually doing decoding (but I might be wrong).

>> This extra work would occur during each replay,
>> while with the CV, only the ones in the CV wait queue would be woken up.
>
> Currently, we wake up walsenders only after writing some WAL records
> at the time of flush, so won't it be better to wake up only after
> applying some WAL records rather than after applying each record?

Yeah, that would be better.

Do you have any idea about how (and where) we could define the "some WAL
records replayed"?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com