Re: doc: add missing "id" attributes to extension packaging page

2023-04-05 Thread Maciek Sakrejda
For what it's worth, I think having the anchors always visible when CSS
is disabled is a feature. The content is still perfectly readable,
and the core feature from this patch is available. Introducing
JavaScript to lose that functionality seems like a step backwards.

By the way, the latest patch attachment was not the full patch series,
which I think confused cfbot: [1]  (unless I'm misunderstanding the
state of the patch series).

And thanks for working on this. I've hunted in the page source for ids
to link to a number of times. I look forward to not doing that
anymore.

Thanks,
Maciek

[1]: https://commitfest.postgresql.org/42/4042/




Re: Comment typo in recent push

2023-04-05 Thread Peter Smith
Thanks for pushing.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-05 Thread Peter Geoghegan
On Wed, Apr 5, 2023 at 2:33 PM David Rowley  wrote:
> I read this twice yesterday and again this morning. It looks like
> you're taking an opportunity to complain/vent about
> vacuum_freeze_table_age and didn't really answer my query about why
> all the vacuum GUCs aren't defined in the one file.  I'd just picked
> vacuum_freeze_table_age as a random one from vacuum.c to raise the
> point about the inconsistency about the GUC locations.

I thought that the point was obvious, which is: the current situation
with the locations of these GUCs came about because the division
between autovacuum and VACUUM used to be a lot clearer, but that
changed without the locations of the GUCs changing with it. More
generally, the current structure has lots of problems. And so it seems
to me that you're probably not wrong to suspect that it just doesn't
make much sense to keep them in different files now.

-- 
Peter Geoghegan




Re: GUC for temporarily disabling event triggers

2023-04-05 Thread Michael Paquier
On Wed, Apr 05, 2023 at 10:43:23AM -0400, Tom Lane wrote:
> Robert Haas  writes:
>> Maybe we should back up and ask why we need more than "on" and "off".
>> If somebody is using this feature in any form more than very
>> occasionally, they should really go home and reconsider their database
>> schema.
> 
> +1 ... this seems perhaps overdesigned.

Yes.  If you begin with an "on"/"off" switch, it could always be
extended later if someone makes a case for it, with a grammar like one
I mentioned upthread, or even something else.  If there is no strong
case for more than a boolean for now, simpler is better.
--
Michael


signature.asc
Description: PGP signature


Re: [BUG] pg_stat_statements and extended query protocol

2023-04-05 Thread Michael Paquier
On Wed, Apr 05, 2023 at 05:39:13PM -0400, Tom Lane wrote:
> v5 seems OK to me except I think CreateExecutorState() should explicitly
> zero the new es_total_processed field, alongside zeroing es_processed.
> (I realize that the makeNode would have done it already, but our
> coding conventions generally run towards not relying on that.  This is
> mainly for greppability, so you can find where a field is initialized.)

Makes sense to me.  I'll look at that again today, potentially apply
the fix on HEAD.
--
Michael




Re: [BUG] pg_stat_statements and extended query protocol

2023-04-05 Thread Imseih (AWS), Sami
> Makes sense to me. I'll look at that again today, potentially apply
> the fix on HEAD.

Here is v6. That was my mistake not to zero out the es_total_processed.
I had it in the first version.

--
Regards,

Sami Imseih
Amazon Web Services (AWS)




v6-0001-Fix-row-tracking-in-pg_stat_statements.patch
Description: v6-0001-Fix-row-tracking-in-pg_stat_statements.patch


Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-05 Thread Melanie Plageman
On Wed, Apr 5, 2023 at 5:15 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-04-05 16:17:20 -0400, Melanie Plageman wrote:
> > On Wed, Apr 5, 2023 at 1:05 PM Andres Freund  wrote:
> > > Perhaps the best solution for autovac vs interactive vacuum issue would be to
> > > move the allocation of the bstrategy to ExecVacuum()?
> >
> > So, I started looking into allocating the bstrategy in ExecVacuum().
> >
> > While doing so, I was trying to understand if the "sanity checking" in
> > vacuum() could possibly apply to autovacuum, and I don't really see how.
> >
> > AFAICT, autovacuum does not ever set VACOPT_DISABLE_PAGE_SKIPPING or
> > VACOPT_FULL or VACOPT_ONLY_DATABASE_STATS.
> >
> > We could move those sanity checks up into ExecVacuum().
>
> Would make sense.
>
> ISTM that eventually most of what currently happens in vacuum() should be in
> ExecVacuum(). There's a lot of stuff that can't happen for autovacuum. So it
> just seems to make more sense to move those parts to ExecVacuum().

I've done that in the attached wip patch. It is perhaps too much of a
change, I dunno.

> > I also noticed that we make the vac_context in vacuum() which says it is
> > for "cross-transaction storage". We use it for the buffer access
> > strategy and for the newrels relation list created in vacuum(). Then we
> > delete it at the end of vacuum().
>
> > Autovacuum workers already make a similar kind of memory context called
> > AutovacMemCxt in do_autovacuum() which the comment says is for the list
> > of relations to vacuum/analyze across transactions.
>
> AutovacMemCxt seems to be a bit longer lived / cover more than the context
> created in vacuum(). It's where all the hash tables etc live that
> do_autovacuum() uses to determine what to vacuum.
>
> Note that do_autovacuum() also creates:
>
> /*
>  * create a memory context to act as fake PortalContext, so that the
>  * contexts created in the vacuum code are cleaned up for each table.
>  */
> PortalContext = AllocSetContextCreate(AutovacMemCxt,
>                                       "Autovacuum Portal",
>                                       ALLOCSET_DEFAULT_SIZES);
>
> which is then what vacuum() creates the "Vacuum" context in.

Yea, I realized that when writing the patch.

> > What if we made ExecVacuum() make its own memory context and both it and
> > do_autovacuum() pass that memory context (along with the buffer access
> > strategy they make) to vacuum(), which then uses the memory context in
> > the same way it does now?
>
> Maybe? It's not clear to me why it'd be a win.

Less that it is a win and more that we need access to that memory
context when allocating the buffer access strategy, so we would have had
to make it in ExecVacuum(). And if we have already made it, we would
need to pass it in to vacuum() for it to use.

> > It simplifies the buffer usage limit patchset and also seems a bit more
> > clear than what is there now?
>
> I don't really see what it'd make simpler? The context in vacuum() is used for
> just that vacuum - we couldn't just use AutovacMemCxt, as that'd live much
> longer (for all the tables a autovac worker processes).

Autovacuum already made the BufferAccessStrategy in the AutovacMemCxt,
so this is the same behavior. I simply made autovacuum_do_vac_analyze()
make the per table vacuum memory context and pass that to vacuum(). So
we have the same amount of memory context granularity as before.

Attached patchset has some kind of isolation test failure due to a hard
deadlock that I haven't figured out yet. I thought it was something with
the "in_vacuum" static variable and having VACUUM or ANALYZE called when
already in VACUUM or ANALYZE, but that variable is the same as in
master.

I've mostly shared it because I want to know if this approach is worth
pursuing or not.

Also, while working on it, I noticed a mistake in the code that was
committed in 4830f102: it doesn't account for the fact that we should
still make a Buffer Access Strategy in the case of VACUUM (FULL, ANALYZE).

Changing this:

if ((params->options & (VACOPT_ONLY_DATABASE_STATS | VACOPT_FULL)) == 0)

to this:

if ((params.options & VACOPT_ONLY_DATABASE_STATS) == 0 ||
    (params.options & VACOPT_FULL && (params.options & VACOPT_ANALYZE) == 0))

should fix it.

- Melanie
From 9cdadabe9205cdd57e51a9c1ff0601c745cc308e Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 5 Apr 2023 17:22:02 -0400
Subject: [PATCH v1 2/2] add BUFFER_USAGE_LIMIT and vacuum_buffer_usage_limit

---
 doc/src/sgml/config.sgml  | 30 ++
 doc/src/sgml/ref/analyze.sgml | 18 
 doc/src/sgml/ref/vacuum.sgml  | 22 +
 src/backend/commands/vacuum.c | 95 ++-
 src/backend/commands/vacuumparallel.c | 14 ++-
 src/backend/postmaster/autovacuum.c   | 13 ++-
 src/backe

Re: generic plans and "initial" pruning

2023-04-05 Thread Amit Langote
On Tue, Apr 4, 2023 at 10:29 PM Amit Langote 
wrote:
> On Tue, Apr 4, 2023 at 6:41 AM Tom Lane  wrote:
> > A few concrete thoughts:
> >
> > * I understand that your plan now is to acquire locks on all the
> > originally-named tables, then do permissions checks (which will
> > involve only those tables), then dynamically lock just inheritance and
> > partitioning child tables as we descend the plan tree.
>
> Actually, with the current implementation of the patch, *all* of the
> relations mentioned in the plan tree would get locked during the
> ExecInitNode() traversal of the plan tree (and of those in
> plannedstmt->subplans), not just the inheritance child tables.
> Locking of non-child tables done by the executor after this patch is
> duplicative with AcquirePlannerLocks(), so that's something to be
> improved.
>
> > That seems
> > more or less okay to me, but it could be reflected better in the
> > structure of the patch perhaps.
> >
> > * In particular I don't much like the "viewRelations" list, which
> > seems like a wart; those ought to be handled more nearly the same way
> > as other RTEs.  (One concrete reason why is that this scheme is going
> > to result in locking views in a different order than they were locked
> > during original parsing, which perhaps could contribute to deadlocks.)
> > Maybe we should store an integer list of which RTIs need to be locked
> > in the early phase?  Building that in the parser/rewriter would provide
> > a solid guide to the original locking order, so we'd be trivially sure
> > of duplicating that.  (It might be close enough to follow the RT list
> > order, which is basically what AcquireExecutorLocks does today, but
> > this'd be more certain to do the right thing.)  I'm less concerned
> > about lock order for child tables because those are just going to
> > follow the inheritance or partitioning structure.
>
> What you've described here sounds somewhat like what I had implemented
> in the patch versions till v31, though it used a bitmapset named
> minLockRelids that is initialized by setrefs.c.  Your idea of
> initializing a list before planning seems more appealing offhand than
> the code I had added in setrefs.c to populate that minLockRelids
> bitmapset, which would be bms_add_range(1, list_length(finalrtable)),
> followed by bms_del_members(set-of-child-rel-rtis).
>
> I'll give your idea a try.

After sleeping on this, I think we perhaps don't need to remember
originally-named relations if only for the purpose of locking them for
execution.  That's because, for a reused (cached) plan,
AcquirePlannerLocks() would have taken those locks anyway.

AcquirePlannerLocks() doesn't lock inheritance children because they would
be added to the range table by the planner, so they should be locked
separately for execution, if needed.  I thought taking the execution-time
locks only when inside ExecInit[Merge]Append would work, but then we have
cases where single-child Append/MergeAppend are stripped of the
Append/MergeAppend nodes by setrefs.c.  Maybe we need a place to remember
such child relations, that is, only in the cases where Append/MergeAppend
elision occurs, in something maybe esoteric-sounding like
PlannedStmt.elidedAppendChildRels or something?

Another set of child relations that are not covered by Append/MergeAppend
child nodes is non-leaf partitions.  I've proposed adding a List of
Bitmapset field to Append/MergeAppend named 'allpartrelids' as part of this
patchset (patch 0001) to track those for execution-time locking.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: on placeholder entries in view rule action query's range table

2023-04-05 Thread Amit Langote
On Thu, Apr 6, 2023 at 3:33 Tom Lane  wrote:

> Amit Langote  writes:
> > While thinking about query view locking in context of [1], I realized
> > that we have missed also fixing AcquirePlannerLocks() /
> > ScanQueryForLocks() to consider that an RTE_SUBQUERY rte may belong to
> > a view, which must be locked the same as RTE_RELATION entries.
>
> I think you're right about that, because AcquirePlannerLocks is supposed
> to reacquire whatever locks parsing+rewriting would have gotten.
> However, what's with this hunk?
>
> @@ -527,7 +527,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
> result->partPruneInfos = glob->partPruneInfos;
> result->rtable = glob->finalrtable;
> result->permInfos = glob->finalrteperminfos;
> -   result->viewRelations = glob->viewRelations;
> +   result->viewRelations = NIL;
> result->resultRelations = glob->resultRelations;
> result->appendRelations = glob->appendRelations;
> result->subplans = glob->subplans;


Oops, I was working in the wrong local branch.

Thanks for pushing.  I agree that there’s no live bug as such right now,
but still good to be consistent.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: Using each rel as both outer and inner for JOIN_ANTI

2023-04-05 Thread Thomas Munro
On Thu, Apr 6, 2023 at 9:11 AM Tom Lane  wrote:
> Richard Guo  writes:
> > Thanks for reminding.  Attached is the rebased patch, with no other
> > changes.  I think the patch is ready for commit.
>
> Pushed after a little further fooling with the comments.  I also had
> to rebase it over 11c2d6fdf (Parallel Hash Full Join).  I think I did
> that correctly, but it's not clear to me whether any of the existing
> test cases are now doing parallelized hashed right antijoins.  Might
> be worth a little more testing.

I don't see any (at least that are EXPLAINed).  Wondering if we should
add some of these into join_hash.sql, but probably not before I figure
out how to make that whole file run faster...

> I think that Alvaro's concern about incorrect cost estimates may be
> misplaced.  I couldn't find any obvious errors in the costing logic for
> this, given that we concluded that the early-exit runtime logic cannot
> apply.  Also, when I try simply executing Richard's original test query
> (in a non-JIT build), the runtimes I get line up quite well ... maybe
> too well? ... with the cost estimates:
>
>                     v15          HEAD w/patch   Ratio
>
> Cost estimate       173691.19    90875.33       0.52
> Actual (best of 3)  514.200 ms   268.978 ms     0.52
>
> I think the smaller differentials you guys were seeing were all about
> EXPLAIN ANALYZE overhead.

I tried the original example from the top of this thread and saw a
decent speedup from parallelism, but only if I set
min_parallel_table_scan_size=0, and otherwise it doesn't choose
Parallel Hash Right Anti Join.  Same if I embiggen bar significantly.
Haven't looked yet but I wonder if there is some left/right confusion
on parallel degree computation or something like that...




Re: recovery modules

2023-04-05 Thread Michael Paquier
On Fri, Feb 17, 2023 at 11:41:32AM -0800, Andres Freund wrote:
> I don't fully know; it's not entirely clear to me what the goals here were. I
> think you'd likely need to do a bit of infrastructure work to do this
> sanely. So far we just didn't have the need to handle files being released in
> a way like you want to do there.
> 
> I suspect a good direction would be to use resource owners. Add a separate set
> of functions that release files on resource owner release. Most of the
> infrastructure is there already, for temporary files
> (c.f. OpenTemporaryFile()).

Yes, perhaps.  I've had good experience with these when it comes to
avoiding leaks when releasing resources, particularly for resources
allocated by external libraries (cough, OpenSSL, cough).  And there
was some work to make these more scalable, for example.

At this stage of the CF, it seems pretty clear to me that this should
be pushed to v17, so moved to next CF.
--
Michael




Re: [BUG] pg_stat_statements and extended query protocol

2023-04-05 Thread Andres Freund
Hi,

On 2023-04-06 07:09:37 +0900, Michael Paquier wrote:
> I'll look at that again today, potentially apply the fix on HEAD.

Seems like a complicated enough facility to benefit from a test or two? Peter
Eisentraut added support for the extended query protocol to psql, so it
shouldn't be too hard...

commit 5b66de3433e
Author: Peter Eisentraut 
Date:   2022-11-15 13:50:27 +0100
 
psql: Add command to use extended query protocol

Greetings,

Andres Freund




Re: [BUG] pg_stat_statements and extended query protocol

2023-04-05 Thread Michael Paquier
On Wed, Apr 05, 2023 at 10:16:19PM +, Imseih (AWS), Sami wrote:
> Here is v6. That was my mistake not to zero out the es_total_processed.
> I had it in the first version.

The update of es_total_processed in standard_ExecutorRun() felt a bit
lonely, so I have added an extra comment, ran an indentation, and
applied the result.  Thanks Sami for the patch, and everyone else for
the feedback!
--
Michael




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-05 Thread Melanie Plageman
On Wed, Apr 5, 2023 at 6:55 PM Melanie Plageman
 wrote:
>
> On Wed, Apr 5, 2023 at 5:15 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2023-04-05 16:17:20 -0400, Melanie Plageman wrote:
> > > On Wed, Apr 5, 2023 at 1:05 PM Andres Freund  wrote:
> > > > Perhaps the best solution for autovac vs interactive vacuum issue would be to
> > > > move the allocation of the bstrategy to ExecVacuum()?
> > >
> > > So, I started looking into allocating the bstrategy in ExecVacuum().
> > >
> > > While doing so, I was trying to understand if the "sanity checking" in
> > > vacuum() could possibly apply to autovacuum, and I don't really see how.
> > >
> > > AFAICT, autovacuum does not ever set VACOPT_DISABLE_PAGE_SKIPPING or
> > > VACOPT_FULL or VACOPT_ONLY_DATABASE_STATS.
> > >
> > > We could move those sanity checks up into ExecVacuum().
> >
> > Would make sense.
> >
> > ISTM that eventually most of what currently happens in vacuum() should be in
> > ExecVacuum(). There's a lot of stuff that can't happen for autovacuum. So it
> > just seems to make more sense to move those parts to ExecVacuum().
>
> I've done that in the attached wip patch. It is perhaps too much of a
> change, I dunno.
>
> > > I also noticed that we make the vac_context in vacuum() which says it is
> > > for "cross-transaction storage". We use it for the buffer access
> > > strategy and for the newrels relation list created in vacuum(). Then we
> > > delete it at the end of vacuum().
> >
> > > Autovacuum workers already make a similar kind of memory context called
> > > AutovacMemCxt in do_autovacuum() which the comment says is for the list
> > > of relations to vacuum/analyze across transactions.
> >
> > AutovacMemCxt seems to be a bit longer lived / cover more than the context
> > created in vacuum(). It's where all the hash tables etc live that
> > do_autovacuum() uses to determine what to vacuum.
> >
> > Note that do_autovacuum() also creates:
> >
> > /*
> >  * create a memory context to act as fake PortalContext, so that the
> >  * contexts created in the vacuum code are cleaned up for each table.
> >  */
> > PortalContext = AllocSetContextCreate(AutovacMemCxt,
> >                                       "Autovacuum Portal",
> >                                       ALLOCSET_DEFAULT_SIZES);
> >
> > which is then what vacuum() creates the "Vacuum" context in.
>
> Yea, I realized that when writing the patch.
>
> > > What if we made ExecVacuum() make its own memory context and both it and
> > > do_autovacuum() pass that memory context (along with the buffer access
> > > strategy they make) to vacuum(), which then uses the memory context in
> > > the same way it does now?
> >
> > Maybe? It's not clear to me why it'd be a win.
>
> Less that it is a win and more that we need access to that memory
> context when allocating the buffer access strategy, so we would have had
> to make it in ExecVacuum(). And if we have already made it, we would
> need to pass it in to vacuum() for it to use.
>
> > > It simplifies the buffer usage limit patchset and also seems a bit more
> > > clear than what is there now?
> >
> > I don't really see what it'd make simpler? The context in vacuum() is used for
> > just that vacuum - we couldn't just use AutovacMemCxt, as that'd live much
> > longer (for all the tables a autovac worker processes).
>
> Autovacuum already made the BufferAccessStrategy in the AutovacMemCxt,
> so this is the same behavior. I simply made autovacuum_do_vac_analyze()
> make the per table vacuum memory context and pass that to vacuum(). So
> we have the same amount of memory context granularity as before.
>
> Attached patchset has some kind of isolation test failure due to a hard
> deadlock that I haven't figured out yet. I thought it was something with
> the "in_vacuum" static variable and having VACUUM or ANALYZE called when
> already in VACUUM or ANALYZE, but that variable is the same as in
> master.
>
> I've mostly shared it because I want to know if this approach is worth
> pursuing or not.

Figured out how to fix the issue, though I'm not sure I understand how
the issue can occur.
use_own_xacts seems like it will always be true for autovacuum when it
calls vacuum(), and ExecVacuum() only calls vacuum() once, so I thought
that I could make use_own_xacts a parameter to vacuum() and push its
calculation for VACUUM and ANALYZE up into ExecVacuum().
This caused a deadlock, so there must be some way for in_vacuum to be
false while vacuum() is called in a nested context.
Anyway, recalculating it every time in vacuum() fixes it.

Attached is a v12 of the whole vacuum_buffer_usage_limit patch set which
includes a commit to fix the bug in master and a commit to move relevant
code from vacuum() up into ExecVacuum().

The logic I suggested earlier for fixing the bug was...not right.
Attached fix should be right?

- Melanie

Re: [BUG] pg_stat_statements and extended query protocol

2023-04-05 Thread Michael Paquier
On Wed, Apr 05, 2023 at 05:39:35PM -0700, Andres Freund wrote:
> Seems like a complicated enough facility to benefit from a test or two? Peter
> Eisentraut added support for the extended query protocol to psql, so it
> shouldn't be too hard...

PQsendQueryGuts() does not yet split the bind/describe phase from the
execute phase, so we'd need a couple more libpq APIs to do that, with
more tracking of the state we're currently in when looping across
multiple execute fetches.  My guess is that it is possible to follow a
model similar to JDBC here.  I don't think that's necessarily
complicated, but it is not as straightforward as it looks.  \bind was
much more straightforward than that, as it can feed on a single call
of PQsendQueryParams() after saving a set of parameters.  An \exec
would not completely do that.

Attaching one of the scripts I've played with, in a very rusty java
with no classes or such, for future reference.  Just update CLASSPATH
to point to a copy of the JDBC driver, run it with a java command, and
then look at rows, query in pg_stat_statements.
--
Michael
import java.sql.*;
//import System.out.format;
//import org.postgresql.Driver;

public class Test {
    static final String DB_URL = "jdbc:postgresql://localhost/mydb";

    public static void main(String[] args) {
        Connection conn = null;
        // Open a connection
        try {
            conn = DriverManager.getConnection(DB_URL);
            conn.setAutoCommit(false);
            Integer increment = 0;

            // SELECT query
            PreparedStatement statement = conn.prepareStatement("select * from pg_class");
            statement.setFetchSize(100);
            ResultSet rs = statement.executeQuery();
            while (rs.next()) {
                increment++;
            }
            conn.commit();
            statement.close();
            System.out.format("SELECT increment " + increment + "\n");

            // INSERT RETURNING
            // CREATE TABLE aa (a int);
            increment = 0;
            PreparedStatement insertStmt = conn.prepareStatement("insert into aa select oid from pg_class returning a");
            insertStmt.setFetchSize(100);
            ResultSet insertRs = insertStmt.executeQuery();
            while (insertRs.next()) {
                increment++;
            }
            conn.commit();
            insertStmt.close();
            System.out.format("INSERT RETURNING increment " + increment + "\n");

            // WITH ... INSERT RETURNING
            // CREATE TABLE bb (a int);
            increment = 0;
            PreparedStatement withStmt = conn.prepareStatement("with insert_stmt AS (insert into bb select oid from pg_class returning a) SELECT r1.a, r2.a from insert_stmt as r1, aa as r2;");
            withStmt.setFetchSize(100);
            ResultSet withRs = withStmt.executeQuery();
            while (withRs.next()) {
                increment++;
            }
            conn.commit();
            withStmt.close();
            System.out.format("INSERT WITH increment " + increment + "\n");

            increment = 0;
            PreparedStatement withStmt2 = conn.prepareStatement("with select_stmt AS (select oid from pg_class) SELECT r1.oid, r2.a from select_stmt as r1, aa as r2;");
            withStmt2.setFetchSize(10);
            ResultSet withRs2 = withStmt2.executeQuery();
            while (withRs2.next()) {
                increment++;
            }
            conn.commit();
            withStmt2.close();
            System.out.format("SELECT WITH increment " + increment + "\n");
        } catch (SQLException e) {
            e.printStackTrace();
            try {
                conn.rollback();
            } catch (SQLException ee) {
                ee.printStackTrace();
            }
        }
    }
}




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-05 Thread David Rowley
On Thu, 6 Apr 2023 at 12:42, Melanie Plageman  wrote:
> Attached is a v12 of the whole vacuum_buffer_usage_limit patch set which
> includes a commit to fix the bug in master and a commit to move relevant
> code from vacuum() up into ExecVacuum().

I'm still playing catch up to the moving of the pre-checks from
vacuum() to ExecVacuum().  I'm now wondering...

Is it intended that VACUUM t1,t2; now share the same strategy?
Currently, in master, we'll allocate a new strategy for t2 after
vacuuming t1. Does this not mean we'll now leave fewer t1 pages in
shared_buffers because the reuse of the strategy will force them out
with t2 pages?  I understand there's nothing particularly invalid
about that, but it is a change in behaviour that the patch seems to be
making with very little consideration as to if it's better or worse.

David




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-05 Thread Melanie Plageman
On Wed, Apr 5, 2023 at 9:15 PM David Rowley  wrote:
>
> On Thu, 6 Apr 2023 at 12:42, Melanie Plageman  
> wrote:
> > Attached is a v12 of the whole vacuum_buffer_usage_limit patch set which
> > includes a commit to fix the bug in master and a commit to move relevant
> > code from vacuum() up into ExecVacuum().
>
> I'm still playing catch up to the moving of the pre-checks from
> vacuum() to ExecVacuum().  I'm now wondering...
>
> Is it intended that VACUUM t1,t2; now share the same strategy?
> Currently, in master, we'll allocate a new strategy for t2 after
> vacuuming t1. Does this not mean we'll now leave fewer t1 pages in
> shared_buffers because the reuse of the strategy will force them out
> with t2 pages?  I understand there's nothing particularly invalid
> about that, but it is a change in behaviour that the patch seems to be
> making with very little consideration as to if it's better or worse.

I'm pretty sure that in master we also reuse the strategy, since we make
it above this loop in vacuum() (and pass it in):

    /*
     * Loop to process each selected relation.
     */
    foreach(cur, relations)
    {
        VacuumRelation *vrel = lfirst_node(VacuumRelation, cur);

        if (params->options & VACOPT_VACUUM)
        {
            if (!vacuum_rel(vrel->oid, vrel->relation, params,
                            false, bstrategy))
                continue;
        }

On Wed, Apr 5, 2023 at 8:41 PM Melanie Plageman
 wrote:
>
> On Wed, Apr 5, 2023 at 6:55 PM Melanie Plageman
>  wrote:
> >
> > On Wed, Apr 5, 2023 at 5:15 PM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2023-04-05 16:17:20 -0400, Melanie Plageman wrote:
> > > > On Wed, Apr 5, 2023 at 1:05 PM Andres Freund  wrote:
> > > > > Perhaps the best solution for autovac vs interactive vacuum issue would be to
> > > > > move the allocation of the bstrategy to ExecVacuum()?
> > > >
> > > > So, I started looking into allocating the bstrategy in ExecVacuum().
> > > >
> > > > While doing so, I was trying to understand if the "sanity checking" in
> > > > vacuum() could possibly apply to autovacuum, and I don't really see how.
> > > >
> > > > AFAICT, autovacuum does not ever set VACOPT_DISABLE_PAGE_SKIPPING or
> > > > VACOPT_FULL or VACOPT_ONLY_DATABASE_STATS.
> > > >
> > > > We could move those sanity checks up into ExecVacuum().
> > >
> > > Would make sense.
> > >
> > > ISTM that eventually most of what currently happens in vacuum() should be in
> > > ExecVacuum(). There's a lot of stuff that can't happen for autovacuum. So it
> > > just seems to make more sense to move those parts to ExecVacuum().
> >
> > I've done that in the attached wip patch. It is perhaps too much of a
> > change, I dunno.
> >
> > > > I also noticed that we make the vac_context in vacuum() which says it is
> > > > for "cross-transaction storage". We use it for the buffer access
> > > > strategy and for the newrels relation list created in vacuum(). Then we
> > > > delete it at the end of vacuum().
> > >
> > > > Autovacuum workers already make a similar kind of memory context called
> > > > AutovacMemCxt in do_autovacuum() which the comment says is for the list
> > > > of relations to vacuum/analyze across transactions.
> > >
> > > AutovacMemCxt seems to be a bit longer lived / cover more than the context
> > > created in vacuum(). It's where all the hash tables etc live that
> > > do_autovacuum() uses to determine what to vacuum.
> > >
> > > Note that do_autovacuum() also creates:
> > >
> > > /*
> > >  * create a memory context to act as fake PortalContext, so that the
> > >  * contexts created in the vacuum code are cleaned up for each table.
> > >  */
> > > PortalContext = AllocSetContextCreate(AutovacMemCxt,
> > >                                       "Autovacuum Portal",
> > >                                       ALLOCSET_DEFAULT_SIZES);
> > >
> > > which is then what vacuum() creates the "Vacuum" context in.
> >
> > Yea, I realized that when writing the patch.
> >
> > > > What if we made ExecVacuum() make its own memory context and both it and
> > > > do_autovacuum() pass that memory context (along with the buffer access
> > > > strategy they make) to vacuum(), which then uses the memory context in
> > > > the same way it does now?
> > >
> > > Maybe? It's not clear to me why it'd be a win.
> >
> > Less that it is a win and more that we need access to that memory
> > context when allocating the buffer access strategy, so we would have had
> > to make it in ExecVacuum(). And if we have already made it, we would
> > need to pass it in to vacuum() for it to use.
> >
> > > > It simplifies the buffer usage limit patchset and also seems a bit more
> > > > clear than what is there now?
> > >
> > > I don't really see what it'd make simpler? The context in vacuum() is used for
> > > just that vacuum - we couldn't just use AutovacMemCxt, as that'd live much
> > > longer (for all the tables a autovac worker processes).
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-05 Thread David Rowley
On Thu, 6 Apr 2023 at 13:14, David Rowley  wrote:
> Is it intended that VACUUM t1,t2; now share the same strategy?
> Currently, in master, we'll allocate a new strategy for t2 after
> vacuuming t1. Does this not mean we'll now leave fewer t1 pages in
> shared_buffers because the reuse of the strategy will force them out
> with t2 pages?  I understand there's nothing particularly invalid
> about that, but it is a change in behaviour that the patch seems to be
> making with very little consideration as to if it's better or worse.

Actually, never mind that. I'm wrong. The same strategy is used for
both tables before and after this change.

I stumbled by thinking vacuum() was being called in a loop from
ExecVacuum() rather than ExecVacuum() just passing all of the relations
to vacuum() with the loop being done inside vacuum(), which it does.

David




Re: Kerberos delegation support in libpq and postgres_fdw

2023-04-05 Thread David Christensen
On Wed, Apr 5, 2023 at 3:30 PM Stephen Frost  wrote:

> Greetings,
>
> * David Christensen (david...@pgguru.net) wrote:
> > Did a code review pass here; here is some feedback.
>
> Thanks!
>
> > + /* If password was used to connect, make sure it was one provided
> */
> > + if (PQconnectionUsedPassword(conn) &&
> dblink_connstr_has_pw(connstr))
> > + return;
> >
> >   Do we need to consider whether these passwords are the same?  Is there
> a different vector where a different password could be acquired from a
> different source (PGPASSWORD, say) while both of these criteria are true?
> Seems like it probably doesn't matter that much considering we only checked
> Password alone in previous version of this code.
>
> Note that this patch isn't really changing how these checks are being
> done but more moving them around and allowing a GSSAPI-based approach
> with credential delegation to also be allowed.
>
> That said, as noted in the comments above dblink_connstr_check():
>
>  * For non-superusers, insist that the connstr specify a password, except
>  * if GSSAPI credentials have been proxied (and we check that they are used
>  * for the connection in dblink_security_check later).  This prevents a
>  * password or GSSAPI credentials from being picked up from .pgpass, a
>  * service file, the environment, etc.  We don't want the postgres user's
>  * passwords or Kerberos credentials to be accessible to non-superusers.
>
> The point of these checks is, indeed, to ensure that environmental
> values such as a .pgpass or variable don't end up getting picked up and
> used (or, if they do, we realize it post-connection and then throw away
> the connection).
>
> libpq does explicitly prefer to use the password passed in as part of
> the connection string and won't attempt to look up passwords in a
> .pgpass file or similar if a password has been included in the
> connection string.
>

The case I was thinking of was a (manufactured) one where we connected to
a backend with one password but the dblink or postgres_fdw includes an
explicit password to a different server.  But now I'm thinking that this
PQconnectionUsedPassword() is checking the outgoing connection for dblink
itself, not the connection of the backend that connected to the main
server, so I think this objection is moot, like you say.

> Looks like the pg_gssinfo struct hides the `proxy_creds` def behind:
> >
> > #if defined(ENABLE_GSS) | defined(ENABLE_SSPI)
> > typedef struct
> > {
> > gss_buffer_desc outbuf;   /* GSSAPI output token
> buffer */
> > #ifdef ENABLE_GSS
> > ...
> > bool  proxy_creds;/* GSSAPI Delegated/proxy
> credentials */
> > #endif
> > } pg_gssinfo;
> > #endif
>
> ... right, proxy_creds only exists (today anyway) if ENABLE_GSS is set.
>
> > Which means that the later check in `be_gssapi_get_proxy()` we have:


 [analysis snipped]

> Fairly confident the analysis here is wrong, further, the cfbot seems to
> agree that there isn't a compile failure here:
>
> https://cirrus-ci.com/task/6589717672624128
>
> [20:19:15.985] gss: NO
>
> (we always build with SSPI on Windows, per
> src/include/port/win32_port.h).
>

Cool; since we have coverage for that case seems like my concern was
unwarranted.

[snip]


> > 
> > 
> >  Only superusers may connect to foreign servers without password
> > -authentication, so always specify the password
> option
> > -for user mappings belonging to non-superusers.
> > +authentication or using gssapi proxied credentials, so specify the
> > +password option for user mappings belonging to
> > +non-superusers who are not able to proxy GSSAPI credentials.
> > 
> > 
> >
> > s/gssapi/GSSAPI/; this is kind of confusing, as this makes it sound like
> only superuser may use GSSAPI proxied credentials, which I disbelieve to be
> true.  Additionally, it sounds like you're wanting to explicitly maintain a
> denylist for users to not be allowed proxying; is that correct?
>
> Updated to GSSAPI and reworded in the updated patch (attached).
> Certainly open to suggestions on how to improve the documentation here.
> There is no 'denylist' for users when it comes to GSSAPI proxied
> credentials.  If there's a use-case for that then it could be added in
> the future.
>

Okay, I think your revisions here seem more clear, thanks.


>
> > ---
> >
> > libpq/auth.c:
> >
> >   if (proxy != NULL)
> >   {
> >   pg_store_proxy_credential(proxy);
> >   port->gss->proxy_creds = true;
> >   }
> >
> > Per GSS docs, seems like we should be comparing to GSS_C_NO_CREDENTIAL
> and validating that the gflags has the `deleg_flag` bit set before
> considering whether there are valid credentials; in practice this might be
> the same effect (haven't looked at what that symbol actually resolves to,
> but NULL would be sensible).

Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-04-05 Thread Andres Freund
Hi,

On 2023-04-04 17:39:45 -0700, Andres Freund wrote:
> I'm planning to push the patches up to the hio.c changes soon, unless somebody
> would like me to hold off.

Done that.


> After that I'm planning to wait for a buildfarm cycle, and push the changes
> necessary to use bulk extension in hio.c (the main win).

Working on that. Might end up being tomorrow.


> I might split the patch to use ExtendBufferedRelTo() into two, one for
> vm_extend() and fsm_extend(), and one for xlogutils.c. The latter is more
> complicated and has more of a complicated history (see [1]).

I've pushed the vm_extend() and fsm_extend() piece, and did split out the
xlogutils.c case.

Greetings,

Andres Freund




Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2023-04-05 Thread Michael Paquier
On Wed, Apr 05, 2023 at 04:35:37PM +0200, Matthias van de Meent wrote:
> I thought that the plan was to use int64 to skip checking for most
> overflows and to do a single check at the end in XLogRecordAssemble,
> so that the checking has minimal overhead in the performance-critical
> log record assembling path and reduced load on the branch predictor.

And that's the reason why your v11-0002 is better and simpler than the
v10-0001 I posted a few days ago.

+   if (regbuf->rdata_len + len > UINT16_MAX || len > UINT16_MAX)
+   ereport(ERROR,
+   (errmsg_internal("too much WAL data"),
+errdetail_internal("Registering more than max %u bytes total to block %u: current %uB, adding %uB",
+   UINT16_MAX, block_id, regbuf->rdata_len, len)));

I was wondering for a few minutes about the second part of this
check..  But you are worried about the case where len is too large
that it would overflow rdata_len if calling XLogRegisterBufData() more
than once on the same block, if len is between
(UINT32_MAX-UINT16_MAX,UINT32_MAX) on the second call.

The extra errdetail_internal() could be tweaked a bit more, but I'm
also OK with your proposal, overall.  One thing is "current %uB,
adding %uB" would be better using "bytes".

> One more issue that Andres was suggesting we'd fix was to allow XLog
> assembly separate from the actual XLog insertion:
> Currently you can't pre-assemble a record outside a critical section
> if the record must be inserted in a critical section, which makes e.g.
> commit records problematic due to the potentially oversized data
> resulting in ERRORs during record assembly. This would crash postgres
> because commit xlog insertion happens in a critical section. Having a
> pre-assembled record would greatly improve the ergonomics in that path
> and reduce the length of the critical path.
>
> I think it was something along the lines of the attached; 0001
> contains separated Commit/Abort record construction and insertion like
> Andres suggested,

I am honestly not sure whether we should complicate xloginsert.c this
way, but we could look at that for v17.

> 0002 does the size checks with updated error messages.

0002 can also be done before 0001, so I'd like to get that part
applied on HEAD before the feature freeze and close this thread.  If
there are any objections, please feel free..
--
Michael


signature.asc
Description: PGP signature


Re: cataloguing NOT NULL constraints

2023-04-05 Thread Andres Freund
Hi,

On 2023-04-06 01:33:56 +0200, Alvaro Herrera wrote:
> I'll go over this again tomorrow with fresh eyes, but I think it should
> be pretty close to ready.  (Need to amend docs to note the new NO
> INHERIT option for NOT NULL table constraints, and make sure pg_dump
> complies.)

Missed this thread somehow.  This is not a review - I just want to say that I
am very excited that we might finally catalogue NOT NULL constraints. This has
been a long time in the making...

Greetings,

Andres Freund




Re: Improve logging when using Huge Pages

2023-04-05 Thread Michael Paquier
On Tue, Mar 28, 2023 at 09:35:30AM +0900, Michael Paquier wrote:
> The patch needs to provide more details about the unknown state, IMO,
> to make it crystal-clear to the users what are the limitations of this
> GUC, especially regarding the fact that this is useful when "try"-ing
> to allocate huge pages, and that the value can only be consulted after
> the server has been started.

This is still unanswered?  Perhaps at this stage we'd better consider
that for v17 so as we have more time to agree on the user interface?
--
Michael




Re: cataloguing NOT NULL constraints

2023-04-05 Thread Michael Paquier
On Wed, Apr 05, 2023 at 06:54:54PM -0700, Andres Freund wrote:
> On 2023-04-06 01:33:56 +0200, Alvaro Herrera wrote:
>> I'll go over this again tomorrow with fresh eyes, but I think it should
>> be pretty close to ready.  (Need to amend docs to note the new NO
>> INHERIT option for NOT NULL table constraints, and make sure pg_dump
>> complies.)
> 
> Missed this thread somehow.  This is not a review - I just want to say that I
> am very excited that we might finally catalogue NOT NULL constraints. This has
> been a long time in the making...

+1!
--
Michael




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-05 Thread Melanie Plageman
Attached is v14 which adds back in tests for the BUFFER_USAGE_LIMIT
option. I haven't included a test for VACUUM (BUFFER_USAGE_LIMIT x,
PARALLEL x) for the reason I mentioned upthread -- even if we force it
to actually do the parallel vacuuming, we would be exercising the code
where parallel vacuum workers make their own buffer access strategy
rings without really adding a test that will fail usefully. If something
is wrong with the configurability of the buffer access strategy object,
I don't see how it will break differently in parallel vacuum workers vs
regular vacuum.

- Melanie
From 7d06a4deb75fe4dfe33e41f8261b43532ab29c3f Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Sun, 19 Mar 2023 18:00:28 -0400
Subject: [PATCH v14 3/3] Add buffer-usage-limit option to vacuumdb

---
 doc/src/sgml/ref/vacuumdb.sgml | 13 +
 src/bin/scripts/vacuumdb.c | 23 +++
 src/include/commands/vacuum.h  |  3 +++
 3 files changed, 39 insertions(+)

diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 74bac2d4ba..8280cf0fb0 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -278,6 +278,19 @@ PostgreSQL documentation
   
  
 
+ 
+  --buffer-usage-limit buffer_usage_limit
+  
+   
+Specifies the
+Buffer Access Strategy
+ring buffer size for a given invocation of vacuumdb.
+This size is used to calculate the number of shared buffers which will
+be reused as part of this strategy.  See .
+   
+  
+ 
+
  
   -n schema
   --schema=schema
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 39be265b5b..941eac7727 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -46,6 +46,7 @@ typedef struct vacuumingOptions
 	bool		process_main;
 	bool		process_toast;
 	bool		skip_database_stats;
+	char	   *buffer_usage_limit;
 } vacuumingOptions;
 
 /* object filter options */
@@ -123,6 +124,7 @@ main(int argc, char *argv[])
 		{"no-truncate", no_argument, NULL, 10},
 		{"no-process-toast", no_argument, NULL, 11},
 		{"no-process-main", no_argument, NULL, 12},
+		{"buffer-usage-limit", required_argument, NULL, 13},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -147,6 +149,7 @@ main(int argc, char *argv[])
 	/* initialize options */
 	memset(&vacopts, 0, sizeof(vacopts));
 	vacopts.parallel_workers = -1;
+	vacopts.buffer_usage_limit = NULL;
 	vacopts.no_index_cleanup = false;
 	vacopts.force_index_cleanup = false;
 	vacopts.do_truncate = true;
@@ -266,6 +269,9 @@ main(int argc, char *argv[])
 			case 12:
 vacopts.process_main = false;
 break;
+			case 13:
+vacopts.buffer_usage_limit = pg_strdup(optarg);
+break;
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -343,6 +349,11 @@ main(int argc, char *argv[])
 		pg_fatal("cannot use the \"%s\" option with the \"%s\" option",
  "no-index-cleanup", "force-index-cleanup");
 
+	/* buffer-usage-limit doesn't make sense with VACUUM FULL */
+	if (vacopts.buffer_usage_limit && vacopts.full)
+		pg_fatal("cannot use the \"%s\" option with the \"%s\" option",
+ "buffer-usage-limit", "full");
+
 	/* fill cparams except for dbname, which is set below */
 	cparams.pghost = host;
 	cparams.pgport = port;
@@ -550,6 +561,10 @@ vacuum_one_database(ConnParams *cparams,
 		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
  "--parallel", "13");
 
+	if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 16)
+		pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--buffer-usage-limit", "16");
+
 	/* skip_database_stats is used automatically if server supports it */
 	vacopts->skip_database_stats = (PQserverVersion(conn) >= 16);
 
@@ -1048,6 +1063,13 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
   vacopts->parallel_workers);
 sep = comma;
 			}
+			if (vacopts->buffer_usage_limit)
+			{
+Assert(serverVersion >= 16);
+appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+  vacopts->buffer_usage_limit);
+sep = comma;
+			}
 			if (sep != paren)
 appendPQExpBufferChar(sql, ')');
 		}
@@ -,6 +1133,7 @@ help(const char *progname)
 	printf(_("  --force-index-cleanup   always remove index entries that point to dead tuples\n"));
 	printf(_("  -j, --jobs=NUM  use this many concurrent connections to vacuum\n"));
 	printf(_("  --min-mxid-age=MXID_AGE minimum multixact ID age of tables to vacuum\n"));
+	printf(_("  --buffer-usage-limit=BUFSIZE size of ring buffer used for vacuum\n"));
 	printf(_("  --min-xid-age=XID_AGE   minimum transaction ID age of tables to vacuum\n"));
 	printf(_("  --no-index-cleanup  don't remove index entries that point to dead tuples\n"));
 	printf(_("  --no-pr

Re: failure in 019_replslot_limit

2023-04-05 Thread Kyotaro Horiguchi
At Wed, 5 Apr 2023 11:55:14 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2023-04-05 11:48:53 -0700, Andres Freund wrote:
> > Note that a checkpoint started at "17:50:23.787", but didn't finish before 
> > the
> > database was shut down.  As far as I can tell, this can not be caused by
> > checkpoint_timeout, because by the time we get to invalidating replication
> > slots, we already did CheckPointBuffers(), and that's the only thing that
> > delays based on checkpoint_timeout.
> > 
> > ISTM that this indicates that checkpointer got stuck after signalling
> > 344783.
> > 
> > Do you see any other explanation?
> 
> This all sounded vaguely familiar. After a bit of digging I found this:
> 
> https://postgr.es/m/20220223014855.4lsddr464i7mymk2%40alap3.anarazel.de
> 
> Which seems like it plausibly explains the failed test?

As I understand it, ConditionVariableSleep() can experience random
wake-ups, and ReplicationSlotControlLock doesn't prevent slot
release. So, I can imagine a situation where that blocking might
happen.  If the call to ConditionVariableSleep(&s->active_cv) wakes up
unexpectedly due to a latch set for reasons other than the CV
broadcast, and the target process releases the slot between fetching
active_pid in the loop and the following call to
ConditionVariablePrepareToSleep(), the CV broadcast triggered by the
slot release might be missed. If that's the case, we'll need to check
active_pid again after calling ConditionVariablePrepareToSleep().

Does this make sense?
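In sketch form, the re-check described above would look something like this (a simplified pseudocode-style sketch of the pattern, not the actual slot.c code):

```
/* Sketch only: re-check the condition after preparing to sleep. */
for (;;)
{
    pid_t   active_pid;

    SpinLockAcquire(&s->mutex);
    active_pid = s->active_pid;
    SpinLockRelease(&s->mutex);

    if (active_pid == 0)
        break;                  /* slot already released; done waiting */

    ConditionVariablePrepareToSleep(&s->active_cv);

    /*
     * Re-check after preparing: the holder may have released the slot
     * (and broadcast the CV) between the fetch above and the prepare,
     * in which case that broadcast was missed.
     */
    SpinLockAcquire(&s->mutex);
    active_pid = s->active_pid;
    SpinLockRelease(&s->mutex);
    if (active_pid == 0)
        break;

    ConditionVariableSleep(&s->active_cv, wait_event);
}
ConditionVariableCancelSleep();
```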

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Should vacuum process config file reload more often

2023-04-05 Thread Melanie Plageman
On Wed, Apr 5, 2023 at 3:43 PM Melanie Plageman
 wrote:
>
> On Wed, Apr 5, 2023 at 2:56 PM Robert Haas  wrote:
> >
> > + /*
> > + * Balance and update limit values for autovacuum workers. We must
> > + * always do this in case the autovacuum launcher or another
> > + * autovacuum worker has recalculated the number of workers across
> > + * which we must balance the limit. This is done by the launcher when
> > + * launching a new worker and by workers before vacuuming each table.
> > + */
> >
> > I don't quite understand what's going on here. A big reason that I'm
> > worried about this whole issue in the first place is that sometimes
> > there's a vacuum going on a giant table and you can't get it to go
> > fast. You want it to absorb new settings, and to do so quickly. I
> > realize that this is about the number of workers, not the actual cost
> > limit, so that makes what I'm about to say less important. But ... is
> > this often enough? Like, the time before we move onto the next table
> > could be super long. The time before a new worker is launched should
> > be ~autovacuum_naptime/autovacuum_max_workers or ~20s with default
> > settings, so that's not horrible, but I'm kind of struggling to
> > understand the rationale for this particular choice. Maybe it's fine.
>
> VacuumUpdateCosts() also calls AutoVacuumUpdateCostLimit(), so this will
> happen if a config reload is pending the next time vacuum_delay_point()
> is called (which is pretty often -- roughly once per block vacuumed but
> definitely more than once per table).
>
> Relevant code is at the top of vacuum_delay_point():
>
> if (ConfigReloadPending && IsAutoVacuumWorkerProcess())
> {
> ConfigReloadPending = false;
> ProcessConfigFile(PGC_SIGHUP);
> VacuumUpdateCosts();
> }
>

Gah, I think I misunderstood you. You are saying that only calling
AutoVacuumUpdateCostLimit() after napping while vacuuming a table may
not be enough. The frequency at which the number of workers changes will
likely be different. This is a good point.
It's kind of weird to call AutoVacuumUpdateCostLimit() only after napping...

Hmm. Well, I don't think we want to call AutoVacuumUpdateCostLimit() on
every call to vacuum_delay_point(), though, do we? It includes two
atomic operations. Maybe that pales in comparison to what we are doing
on each page we are vacuuming. I haven't properly thought about it.

Is there some other relevant condition we can use to determine whether
or not to call AutoVacuumUpdateCostLimit() on a given invocation of
vacuum_delay_point()? Maybe something with naptime/max workers?

I'm not sure if there is a more reliable place than vacuum_delay_point()
for us to do this. I poked around heap_vacuum_rel(), but I think we
would want this cost limit update to happen table AM-agnostically.

Thank you for bringing this up!

- Melanie




Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-05 Thread Michael Paquier
On Wed, Apr 05, 2023 at 02:31:54PM +, Imseih (AWS), Sami wrote:
>> That seems a valid argument. I was thinking that such an asynchronous
>> state update mechanism would be a good infrastructure for progress
>> reporting of parallel operations. It might be worth considering to use
>> it in more general usage but since the current implementation is
>> minimal can we extend it in the future when we need it for other use
>> cases?
> 
> I don't think we should delay this patch to design a more general
> infrastructure. I agree this can be handled by a future requirement.

Not so sure I agree with that.  As the patch stands, we have a rather
general-purpose new message type and facility (callback trigger
poke from workers to their leader) used for a not-so-general purpose,
while being documented under this not-so-general purpose, which is
progress reporting.  I agree that relying on pqmq.c to force the
leader to be more responsive to refreshes is sensible, because it is
cheap, but the interface seems kind of misplaced to me.  As one thing,
for example, it introduces a dependency on parallel.h to do progress
reporting without touching at backend_progress.h.  Is a callback
approach combined with a counter in shared memory the best thing there
could be?  Could it be worth thinking about a different design where
the value incremented and the parameters of
pgstat_progress_update_param() are passed through the 'P' message
instead?  It strikes me that gathering data in the leader from a poke
of the workers is something that could be useful in so much more areas
than just the parallel index operations done in a vacuum because we do
more and more things in parallel these days, so the API interface
ought to have some attention.

As some say, the introduction of a new message type in pqmq.c would be
basically a one-way door, because we'd have to maintain it in a stable
branch.  I would not take that lightly.  One idea of interface that
could be used is an extra set of APIs for workers to do progress
reporting, part of backend_progress.h, where we use a pqmq message
type in a centralized location, say something like a
pgstat_progress_parallel_incr_param().
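Purely for concreteness, the interface floated above might look something like the following hypothetical declarations (nothing here exists today; the names are only a sketch):

```c
/* Hypothetical sketch only -- none of these symbols exist yet. */

/* Worker side: ask the leader to increment one progress parameter. */
extern void pgstat_progress_parallel_incr_param(int index, int64 incr);

/*
 * Leader side: on receipt of the new pqmq message type, apply the
 * increment to its own progress array via the existing
 * pgstat_progress_incr_param(index, incr), keeping the message
 * handling centralized in backend_progress.c rather than in callers.
 */
```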

About the callback interface, we may also want to be more careful
about more things, like the handling of callback chains, or even
unregistrations of callbacks?  There could be much more uses to that
than just progress reporting, though this comes to a balance of what
the leader needs to do before the workers are able to poke at it on a
periodic basis to make the refresh of the aggregated progress
reporting data more verbose.  There is also an argument where we could
have each worker report their progress independently of the leader?
--
Michael




Re: Fix a comment in basic_archive about NO_INSTALLCHECK

2023-04-05 Thread Michael Paquier
On Mon, Apr 03, 2023 at 08:56:10AM +0530, Bharath Rupireddy wrote:
> It looks like comments in make file and meson file about not running
> basic_archive tests in NO_INSTALLCHECK mode are wrong. The comments say the
> module needs to be loaded via shared_preload_libraries=basic_archive, but
> it actually doesn't. The custom file needs archive related parameters and
> wal_level=replica. Here's a patch correcting that comment.

Wouldn't it be better to also set shared_preload_libraries in
basic_archive.conf?  It is true that the test works fine if setting
only archive_library, which would cause the library with its
_PG_init() to be loaded in the archiver process.  However the GUC 
basic_archive.archive_directory is missing from the backends.

Saying that, updating the comments about the dependency with
archive_library and the module's GUC is right.
--
Michael




Re: Amcheck verification of GiST and GIN

2023-04-05 Thread Alexander Lakhin

Hi Andrey,

27.03.2023 01:17, Andrey Borodin wrote:

I've ported the B-tree TOAST test to GiST, and, as expected, it fails.
Finds non-indexed tuple for a fresh valid index.


I've tried to use this feature with the latest patch set and discovered that
the modified pg_amcheck doesn't find any gist indexes when running without a
schema specification. For example:
CREATE TABLE tbl (id integer, p point);
INSERT INTO tbl VALUES (1, point(1, 1));
CREATE INDEX gist_tbl_idx ON tbl USING gist (p);
CREATE INDEX btree_tbl_idx ON tbl USING btree (id);

pg_amcheck -v -s public
prints:
pg_amcheck: checking index "regression.public.btree_tbl_idx"
pg_amcheck: checking heap table "regression.public.tbl"
pg_amcheck: checking index "regression.public.gist_tbl_idx"

but without "-s public" a message about checking of gist_tbl_idx is absent.

As I can see in the server.log, the queries, that generate relation lists in
these cases, differ in:
... AND ep.pattern_id IS NULL AND c.relam = 2 AND c.relkind IN ('r', 'S', 'm', 't') AND c.relnamespace != 99 ...

... AND ep.pattern_id IS NULL AND c.relam IN (2, 403, 783)AND c.relkind IN ('r', 'S', 'm', 't', 'i') AND ((c.relam = 2 AND c.relkind IN ('r', 'S', 'm', 't')) OR ((c.relam = 403 OR c.relam = 783) AND c.relkind = 'i')) ...


Best regards,
Alexander




RE: pg_upgrade and logical replication

2023-04-05 Thread Hayato Kuroda (Fujitsu)
Dear Julien,

> I'm attaching a v3 to fix a recent conflict with pg_dump due to a563c24c9574b7
> (Allow pg_dump to include/exclude child tables automatically).

Thank you for making the patch.
FYI - it could not be applied due to recent commits; SUBOPT_* and attributes
in SubscriptionInfo were added recently.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Using each rel as both outer and inner for JOIN_ANTI

2023-04-05 Thread Thomas Munro
On Thu, Apr 6, 2023 at 12:17 PM Thomas Munro  wrote:
> I tried the original example from the top of this thread and saw a
> decent speedup from parallelism, but only if I set
> min_parallel_table_scan_size=0, and otherwise it doesn't choose
> Parallel Hash Right Anti Join.  Same if I embiggen bar significantly.
> Haven't looked yet but I wonder if there is some left/right confusion
> on parallel degree computation or something like that...

Ahh, the problem is just that create_plain_partial_paths() doesn't
bother to create a partial path for foo at all, because it's so small,
so hash_inner_and_outer() can't even consider a parallel join (that
needs partial paths on both sides).  What we want here is a shared
hash table so we can have shared match flags, an entirely new concern,
but create_plain_partial_path() can't see any point in a partial scan
of such a small table.  It works if you're OK creating partial paths
for everything...

+#if 0
/* If any limit was set to zero, the user doesn't want a
parallel scan. */
if (parallel_workers <= 0)
return;
+#endif




Re: Using each rel as both outer and inner for JOIN_ANTI

2023-04-05 Thread Tom Lane
Thomas Munro  writes:
> ... It works if you're OK creating partial paths
> for everything...

Hmm.  The committed patch already causes us to investigate more
paths than before, which I was okay with because it only costs
more if there's an antijoin involved --- which it seems like
there's at least a 50% chance of winning on, depending on which
table is bigger.

This:

> +#if 0
> /* If any limit was set to zero, the user doesn't want a
> parallel scan. */
> if (parallel_workers <= 0)
> return;
> +#endif

seems like it adds a lot of new paths with a lot lower chance
of win, but maybe we could tighten the conditions to improve
the odds?

regards, tom lane




Re: Minimal logical decoding on standbys

2023-04-05 Thread Amit Kapila
On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand
 wrote:
>
> On 4/5/23 12:28 PM, Amit Kapila wrote:
> > On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
> >  wrote:
> >> Maybe we could change the doc with something among those lines instead?
> >>
> >> "
> >> Existing logical slots on standby also get invalidated if wal_level on 
> >> primary is reduced to
> >> less than 'logical'. This is done as soon as the standby detects such a 
> >> change in the WAL stream.
> >>
> >> It means, that for walsenders that are lagging (if any), some WAL records 
> >> up to the parameter change on the
> >> primary won't be decoded".
> >>
> >> I don't know whether this is what one would expect but that should be less 
> >> of a surprise if documented.
> >>
> >> What do you think?
> >>
> >
> > Yeah, I think it is better to document to avoid any surprises if
> > nobody else sees any problem with it.
>
> Ack.
>

This doesn't seem to be addressed in the latest version. And today, I
think I see one more point about this doc change:
+
+ A logical replication slot can also be created on a hot standby.
To prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot
between the primary
+ and the standby. Otherwise, hot_standby_feedback will work, but
only while the
+ connection is alive (for example a node restart would break it). Existing
+ logical slots on standby also get invalidated if wal_level on
primary is reduced to
+ less than 'logical'.

If hot_standby_feedback is not set then can logical decoding on
standby misbehave? If so, that is not very clear from this doc change
if that is acceptable. One scenario where I think it can misbehave is
if applying WAL records generated after changing wal_level from
'logical' to 'replica' physically removes catalog tuples that could be
referenced by the logical decoding on the standby. Now, as mentioned
in patch 0003's comment in decode.c that it is possible that some
slots may creep even after we invalidate the slots on parameter
change, so while decoding using that slot if some required catalog
tuple has been removed by physical replication then the decoding can
misbehave even before reaching XLOG_PARAMETER_CHANGE record.

-- 
With Regards,
Amit Kapila.




Re: Using each rel as both outer and inner for JOIN_ANTI

2023-04-05 Thread Richard Guo
On Thu, Apr 6, 2023 at 8:18 AM Thomas Munro  wrote:

> On Thu, Apr 6, 2023 at 9:11 AM Tom Lane  wrote:
> > Richard Guo  writes:
> > > Thanks for reminding.  Attached is the rebased patch, with no other
> > > changes.  I think the patch is ready for commit.
> >
> > Pushed after a little further fooling with the comments.  I also had
> > to rebase it over 11c2d6fdf (Parallel Hash Full Join).  I think I did
> > that correctly, but it's not clear to me whether any of the existing
> > test cases are now doing parallelized hashed right antijoins.  Might
> > be worth a little more testing.
>
> I don't see any (at least that are EXPLAINed).  Wondering if we should
> add some of these into join_hash.sql, but probably not before I figure
> out how to make that whole file run faster...


Thanks Tom for the rebase and pushing.  Agreed that we need to add more
testing to cover Parallel Hash Right Anti Join.  I tried one in
join_hash.sql as below

explain (costs off)
select count(*) from simple r right join bigger_than_it_looks s using (id)
where r.id is null;
 QUERY PLAN
-
 Aggregate
   ->  Gather
 Workers Planned: 2
 ->  Parallel Hash Right Anti Join
   Hash Cond: (r.id = s.id)
   ->  Parallel Seq Scan on simple r
   ->  Parallel Hash
 ->  Parallel Seq Scan on bigger_than_it_looks s
(8 rows)

But as Thomas said, maybe we need to wait until that file becomes
faster.

Thanks
Richard


Re: Should vacuum process config file reload more often

2023-04-05 Thread Masahiko Sawada
On Thu, Apr 6, 2023 at 12:29 AM Melanie Plageman
 wrote:
>
> Thanks all for the reviews.
>
> v16 attached. I put it together rather quickly, so there might be a few
> spurious whitespaces or similar. There is one rather annoying pgindent
> outlier that I have to figure out what to do about as well.
>
> The remaining functional TODOs that I know of are:
>
> - Resolve what to do about names of GUC and vacuum variables for cost
>   limit and cost delay (since it may affect extensions)
>
> - Figure out what to do about the logging message which accesses dboid
>   and tableoid (lock/no lock, where to put it, etc)
>
> - I see several places in docs which reference the balancing algorithm
>   for autovac workers. I did not read them in great detail, but we may
>   want to review them to see if any require updates.
>
> - Consider whether or not the initial two commits should just be
>   squashed with the third commit
>
> - Anything else reviewers are still unhappy with
>
>
> On Wed, Apr 5, 2023 at 1:56 AM Masahiko Sawada  wrote:
> >
> > On Wed, Apr 5, 2023 at 5:05 AM Melanie Plageman
> >  wrote:
> > >
> > > On Tue, Apr 4, 2023 at 4:27 AM Masahiko Sawada  
> > > wrote:
> > > > ---
> > > > -if (worker->wi_proc != NULL)
> > > > -elog(DEBUG2, "autovac_balance_cost(pid=%d
> > > > db=%u, rel=%u, dobalance=%s cost_limit=%d, cost_limit_base=%d,
> > > > cost_delay=%g)",
> > > > - worker->wi_proc->pid,
> > > > worker->wi_dboid, worker->wi_tableoid,
> > > > - worker->wi_dobalance ? "yes" : "no",
> > > > - worker->wi_cost_limit,
> > > > worker->wi_cost_limit_base,
> > > > - worker->wi_cost_delay);
> > > >
> > > > I think it's better to keep this kind of log in some form for
> > > > debugging. For example, we can show these values of autovacuum workers
> > > > in VacuumUpdateCosts().
> > >
> > > I added a message to do_autovacuum() after calling VacuumUpdateCosts()
> > > in the loop vacuuming each table. That means it will happen once per
> > > table. It's not ideal that I had to move the call to VacuumUpdateCosts()
> > > behind the shared lock in that loop so that we could access the pid and
> > > such in the logging message after updating the cost and delay, but it is
> > > probably okay. Though no one is going to be changing those at this
> > > point, it still seemed better to access them under the lock.
> > >
> > > This does mean we won't log anything when we do change the values of
> > > VacuumCostDelay and VacuumCostLimit while vacuuming a table. Is it worth
> > > adding some code to do that in VacuumUpdateCosts() (only when the value
> > > has changed not on every call to VacuumUpdateCosts())? Or perhaps we
> > > could add it in the config reload branch that is already in
> > > vacuum_delay_point()?
> >
> > Previously, we used to show the pid in the log since a worker/launcher
> > set other workers' delay costs. But now that the worker sets its delay
> > costs, we don't need to show the pid in the log. Also, I think it's
> > useful for debugging and investigating the system if we log it when
> > changing the values. The log I imagined to add was like:
> >
> > @@ -1801,6 +1801,13 @@ VacuumUpdateCosts(void)
> > VacuumCostDelay = vacuum_cost_delay;
> >
> > AutoVacuumUpdateLimit();
> > +
> > +   elog(DEBUG2, "autovacuum update costs (db=%u, rel=%u,
> > dobalance=%s, cost_limit=%d, cost_delay=%g active=%s failsafe=%s)",
> > +MyWorkerInfo->wi_dboid, MyWorkerInfo->wi_tableoid,
> > +pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_dobalance)
> > ? "no" : "yes",
> > +VacuumCostLimit, VacuumCostDelay,
> > +VacuumCostDelay > 0 ? "yes" : "no",
> > +VacuumFailsafeActive ? "yes" : "no");
> > }
> > else
> > {
>
> Makes sense. I've updated the log message to roughly what you suggested.
> I also realized I think it does make sense to call it in
> VacuumUpdateCosts() -- only for autovacuum workers of course. I've done
> this. I haven't taken the lock though and can't decide if I must since
> they access dboid and tableoid -- those are not going to change at this
> point, but I still don't know if I can access them lock-free...
> Perhaps there is a way to condition it on the log level?
>
> If I have to take a lock, then I don't know if we should put these in
> VacuumUpdateCosts()...

I think we don't need to acquire a lock there, as both values are
updated only by the workers reporting this message. Also, I agree with
where to put the log, but I think the log message should start with a
lowercase letter:

+elog(DEBUG2,
+ "Autovacuum VacuumUpdateCosts(db=%u, rel=%u,
dobalance=%s, cost_limit=%d, cost_delay=%g active=%s failsafe=%s)",
+ MyWorkerInfo->wi_dboid, MyWorkerInfo->wi_tableoid,
+
pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_d
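
For the "log only when the value has changed" idea mentioned earlier in
the thread, a minimal stand-alone sketch could look like the following.
The names here are hypothetical stand-ins, not the actual patch code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative change-detection helper: remember the last-reported cost
 * parameters and report a change only when one of them differs.  The
 * caller would emit the DEBUG2 message only when this returns true.
 */
static int		last_cost_limit = -1;
static double	last_cost_delay = -1.0;

static bool
cost_params_changed(int cost_limit, double cost_delay)
{
	if (cost_limit == last_cost_limit && cost_delay == last_cost_delay)
		return false;			/* nothing changed; skip the log line */

	last_cost_limit = cost_limit;
	last_cost_delay = cost_delay;
	return true;
}
```

This keeps the message out of the common path where VacuumUpdateCosts()
is called repeatedly with unchanged values.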

Re: Using each rel as both outer and inner for JOIN_ANTI

2023-04-05 Thread Richard Guo
On Thu, Apr 6, 2023 at 1:06 PM Tom Lane  wrote:

> This:
>
> > +#if 0
> > /* If any limit was set to zero, the user doesn't want a
> > parallel scan. */
> > if (parallel_workers <= 0)
> > return;
> > +#endif
>
> seems like it adds a lot of new paths with a lot lower chance
> of win, but maybe we could tighten the conditions to improve
> the odds?


It seems to win if the parallel scan becomes part of a hash join in the
final plan.  I wonder if we have a way to know that at this early stage.

BTW, a zero parallel_workers seems like it would break some later
assumptions, so we may need to give it a meaningful number if we want to
go this way.
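
For illustration, a clamp along these lines is what I have in mind (the
function name is hypothetical, not the actual planner code):

```c
#include <assert.h>

/*
 * Illustrative only -- not the actual planner code.  If the early
 * "parallel_workers <= 0" return is removed, later code that assumes a
 * positive worker count needs the value clamped to something meaningful.
 */
static int
clamp_parallel_workers(int parallel_workers, int max_parallel_workers)
{
	if (parallel_workers <= 0)
		parallel_workers = 1;	/* keep the parallel path alive for costing */
	if (parallel_workers > max_parallel_workers)
		parallel_workers = max_parallel_workers;
	return parallel_workers;
}
```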

Thanks
Richard


Re: Minimal logical decoding on standbys

2023-04-05 Thread Amit Kapila
On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand
 wrote:
>
> On 4/5/23 3:15 PM, Amit Kapila wrote:
> > On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand
> >  wrote:
> >>
> >> On 4/5/23 12:28 PM, Amit Kapila wrote:
> >>> On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand
> >>>  wrote:
> >>
> >>> minor nitpick:
> >>> +
> >>> + /* Intentional fall through to session cancel */
> >>> + /* FALLTHROUGH */
> >>>
> >>> Do we need to repeat fall through twice in different ways?
> >>>
> >>
> >> Do you mean, you'd prefer what was done in v52/0002?
> >>
> >
> > No, I was thinking that instead of two comments, we need one here.
> > But, now thinking about it, do we really need to fall through in this
> > case, if so why? Shouldn't this case be handled after
> > PROCSIG_RECOVERY_CONFLICT_DATABASE?
> >
>
> Indeed, thanks! Done in V61 posted up-thread.
>

After this, I think that for backends that have active slots, it would
simply cancel the current query. Will that be sufficient? We want the
backend process to exit and release the slot so that the startup
process can mark it invalid. For a walsender, an ERROR will lead to its
exit, so that is fine. If this understanding is correct, then when
'am_cascading_walsender' is false, we should set ProcDiePending apart
from the other parameters. Sorry, I haven't tested this, so I could be
wrong here. Also, it seems you have removed the checks related to
slots; is that because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only
used for logical slots? If so, do you think an Assert would make
sense?
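
To make that concrete, here is a minimal stand-alone sketch of the
behavior I have in mind; the enum, flags, and function are hypothetical
stand-ins, not the actual server code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for recovery-conflict reasons and signal flags */
typedef enum
{
	CONFLICT_DATABASE,
	CONFLICT_LOGICALSLOT,
	CONFLICT_LOCK
} ConflictReason;

static bool proc_die_pending;
static bool query_cancel_pending;

static void
handle_conflict(ConflictReason reason, bool am_cascading_walsender)
{
	switch (reason)
	{
		case CONFLICT_DATABASE:
			/* conflict with the whole database: the backend must die */
			proc_die_pending = true;
			break;

		case CONFLICT_LOGICALSLOT:
			/*
			 * A plain query cancel would not release the slot; the
			 * backend must exit so the startup process can invalidate
			 * it.  A cascading walsender already exits on ERROR.
			 */
			if (!am_cascading_walsender)
				proc_die_pending = true;
			query_cancel_pending = true;
			break;

		case CONFLICT_LOCK:
			query_cancel_pending = true;
			break;
	}
}
```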

Another comment on 0001.
 extern void CheckSlotRequirements(void);
 extern void CheckSlotPermissions(void);
+extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid,
TransactionId xid, char *reason);

This doesn't seem to be called from anywhere.

-- 
With Regards,
Amit Kapila.



