Re: Add non-blocking version of PQcancel

2022-03-30 Thread Justin Pryzby
Note that the patch is still variously failing in cirrus.
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/37/3511

You may already know that it's possible to trigger the cirrus ci tasks using a
github branch.  See src/tools/ci/README.




Re: In-placre persistance change of a relation

2022-03-30 Thread Justin Pryzby
On Thu, Mar 31, 2022 at 01:58:45PM +0900, Kyotaro Horiguchi wrote:
> Thanks! Version 20 is attached.

The patch failed on all CI tasks, and seems to have caused the macos task to
hang.

http://cfbot.cputube.org/kyotaro-horiguchi.html

Would you send a fixed patch, or remove this thread from the CFBOT?  Otherwise
cirrus will try to rerun it every day but take 1hr to time out, which is twice
as slow as the slowest OS.

I think this patch should be moved to the next CF and set to v16.

Thanks,
-- 
Justin




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-30 Thread Andres Freund
On 2022-03-31 01:00:14 -0400, Tom Lane wrote:
> How well does this patch work with pre-14 buildfarm clients?

Looks to me like it'll just run the test twice, once via TestUpgrade, once via
taptest. It's possible that there could be trouble somehow due to duplicated
log files or something?




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-30 Thread Tom Lane
I wrote:
> There's still about a third of the buildfarm running older
> client releases --- I count

>   2 REL_8
>   2 REL_10
>  13 REL_11
>   6 REL_12
>  16 REL_13.1
>  89 REL_14

Wait a minute ... actually, what's most relevant here is
the population running TAP tests, which seems to be

  2 REL_8
  4 REL_11
  1 REL_12
  7 REL_13.1
 53 REL_14

So there are still some people we'd have to nag if it doesn't
work pre-v14, but fewer than I thought --- specifically,
the owners of

butterflyfish
copperhead
eelpout
elver
halibut
kittiwake
mantid
marabou
massasauga
myna
snakefly
snapper
spurfowl
tadarida

regards, tom lane




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-31 00:08:00 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I assume SSPI doesn't work over unix sockets? Oh. Maybe it's not even that -
> > we only enable it when not using unix sockets:
> 
> Duh.  But can it work over unix sockets?

I wonder if we should go the other way and use unix sockets by default in the
tests. Even if CI windows could be made to use SSPI, it'd still be further
away from the environment most of us write tests in.

Afaics the only reason we use SSPI is to secure the tests, because they run
over tcp by default. But since we have unix socket support for windows now,
that shouldn't really be needed anymore.

The only animal that might not be new enough for it is hamerkop. I don't
really understand when windows features end up in which release.

Looking at 8f3ec75de40 it seems we just assume unix sockets are available, we
don't have a version / feature test (win32.h just defines
HAVE_STRUCT_SOCKADDR_UN).

Greetings,

Andres Freund




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-30 Thread Tom Lane
Michael Paquier  writes:
> So, any particular feelings about this patch?  This has been around
> for a couple of months/years now, so it could be a good time to do the
> switch now rather than wait an extra year, or even the beginning of
> the next release cycle.  And the buildfarm is already able to handle
> that in its code based on the last release, by skipping the upgrade
> check if it finds a pg_upgrade/t/ subdirectory.

There's still about a third of the buildfarm running older
client releases --- I count

  2 REL_8
  2 REL_10
 13 REL_11
  6 REL_12
 16 REL_13.1
 89 REL_14

How well does this patch work with pre-14 buildfarm clients?

regards, tom lane




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 21:29:16 -0700, Peter Geoghegan wrote:
> On Wed, Mar 30, 2022 at 9:20 PM Andres Freund  wrote:
> > But the debug elog reports that
> >
> > relfrozenxid updated 714 -> 717
> > relminmxid updated 1 -> 6
> >
> > The problem is that the crashing backend reads the relfrozenxid/relminmxid
> > from the shared relcache init file written by another backend:
> 
> We should have added logging of relfrozenxid and relminmxid a long time ago.

At least at DEBUG1 or such.
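
Even something as simple as the following, dropped into the branch of
vac_update_relstats() that overwrites the stored value, would do.  Just a
sketch - it assumes the usual parameter names in that function, and the exact
message wording doesn't matter:

    /* hypothetical: report old and new value before overwriting */
    elog(DEBUG1, "relfrozenxid of \"%s\" updated %u -> %u",
         RelationGetRelationName(relation),
         pgcform->relfrozenxid, frozenxid);
    pgcform->relfrozenxid = frozenxid;
    dirty = true;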


> > This is basically the inverse of a54e1f15 - we read a *newer* horizon. 
> > That's
> > normally fairly harmless - I think.
> 
> Is this one pretty old?

What do you mean by "this one"? The cause of the assert failure?

I'm not sure there's a proper bug on HEAD here. I think at worst it can delay
the horizon increasing a bunch, by falsely not using an aggressive vacuum when
we should have - might even be limited to a single autovacuum cycle.



> > Perhaps we should just fetch the horizons from the "local" catalog for 
> > shared
> > rels?
> 
> Not sure what you mean.

Basically, instead of relying on the relcache, which for shared relations is
vulnerable to seeing "too new" horizons due to the shared relcache init file,
explicitly load relfrozenxid / relminmxid from the catalog / syscache.

I.e. fetch the relevant pg_class row in heap_vacuum_rel() (using
SearchSysCache[Copy1](RELID)). And use that to set vacrel->relfrozenxid
etc. Whereas right now we only fetch the pg_class row in
vac_update_relstats(), but use the relcache before.
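
A rough sketch of what that could look like near the top of heap_vacuum_rel()
- untested, and the error handling is only illustrative:

    HeapTuple   classTup;
    Form_pg_class classForm;

    /* read the horizons from the catalog, not from the relcache entry */
    classTup = SearchSysCacheCopy1(RELOID,
                                   ObjectIdGetDatum(RelationGetRelid(rel)));
    if (!HeapTupleIsValid(classTup))
        elog(ERROR, "cache lookup failed for relation %u",
             RelationGetRelid(rel));
    classForm = (Form_pg_class) GETSTRUCT(classTup);

    vacrel->relfrozenxid = classForm->relfrozenxid;
    vacrel->relminmxid = classForm->relminmxid;

    heap_freetuple(classTup);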

Greetings,

Andres Freund




Re: In-placre persistance change of a relation

2022-03-30 Thread Kyotaro Horiguchi
Thanks! Version 20 is attached.


At Wed, 30 Mar 2022 08:44:02 -0500, Justin Pryzby  wrote 
in 
> On Tue, Mar 01, 2022 at 02:14:13PM +0900, Kyotaro Horiguchi wrote:
> > Rebased on a recent xlog refactoring.
> 
> It'll come as no surprise that this needs to be rebased again.
> At least a few typos I reported in January aren't fixed.
> Set to "waiting".

Oh, I'm sorry for overlooking it. It somehow didn't show up on my
mailer.

> I started looking at this and reviewed docs and comments again.
> 
> > +typedef struct PendingCleanup
> > +{
> > +   RelFileNode relnode;/* relation that may need to be deleted 
> > */
> > +   int op; /* operation 
> > mask */
> > +   boolbufpersistence; /* buffer persistence to set */
> > +   int unlink_forknum; /* forknum to unlink */
> 
> This can be of data type "ForkNumber"

Right. Fixed.

> > +* We are going to create an init fork. If server crashes before the
> > +* current transaction ends the init fork left alone corrupts data while
> > +* recovery.  The mark file works as the sentinel to identify that
> > +* situation.
> 
> s/while/during/

This was in v17, but disappeared in v18.

> > +* index-init fork needs further initialization. ambuildempty shoud do
> 
> should (I reported this before)
> 
> > +   if (inxact_created)
> > +   {
> > +   SMgrRelation srel = smgropen(rnode, InvalidBackendId);
> > +
> > +   /*
> > +* INIT forks never be loaded to shared buffer so no point in 
> > dropping
> 
> "are never loaded"

It was fixed in v18.

> > +   elog(DEBUG1, "perform in-place persistnce change");
> 
> persistence (I reported this before)

Sorry. Fixed.

> > +   /*
> > +* While wal_level >= replica, switching to LOGGED requires the
> > +* relation content to be WAL-logged to recover the table.
> > +* We don't emit this fhile wal_level = minimal.
> 
> while (or "if")

There are "While" and "fhile". I changed the latter to "if".

> > +* The relation is persistent and stays remain 
> > persistent. Don't
> > +* drop the buffers for this relation.
> 
> "stays remain" is redundant (I reported this before)

Thanks. I changed it to "stays persistent".

> > +   if (unlink(rm_path) < 0)
> > +   ereport(ERROR,
> > +   (errcode_for_file_access(),
> > +errmsg("could not remove file 
> > \"%s\": %m",
> > +   rm_path)));
> 
> The parens around errcode are unnecessary since last year.
> I suggest to avoid using them here and elsewhere.

It is just moved from elsewhere without editing, but of course I can
do that.  I didn't know about that change to ereport and could not find the
corresponding commit, but I found that Tom mentioned that change.

https://www.postgresql.org/message-id/flat/5063.1584641224%40sss.pgh.pa.us#63e611c30800133bbddb48de857668e8
> Now that we can rely on having varargs macros, I think we could
> stop requiring the extra level of parentheses, ie instead of
...
> ereport(ERROR,
> errcode(ERRCODE_DIVISION_BY_ZERO),
> errmsg("division by zero"));
> 
> (The old syntax had better still work, of course.  I'm not advocating
> running around and changing existing calls.)

I changed all ereport calls added by this patch to this style.
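
For example, the unlink error quoted above now ends up looking roughly like
this:

    if (unlink(rm_path) < 0)
        ereport(ERROR,
                errcode_for_file_access(),
                errmsg("could not remove file \"%s\": %m", rm_path));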

> > +* And we may have SMGR_MARK_UNCOMMITTED file.  Remove 
> > it if the
> > +* fork files has been successfully removed. It's ok if 
> > the file
> 
> file

Fixed.

> > + 
> > +  All tables in the current database in a tablespace can be changed by 
> > using
> 
> given tablespace

I did /database in a tablespace/database in the given tablespace/. Is
it right?

> > +  the ALL IN TABLESPACE form, which will lock all 
> > tables
> 
> which will first lock
>
> > +  to be changed first and then change each one.  This form also 
> > supports
> 
> remove "first" here

This is almost a dead copy of the description of SET TABLESPACE. This
change makes the two otherwise nearly identical descriptions vary slightly
in wording.  Anyway, I made the suggested change, but only for the part this
patch adds in this version.

> > +  OWNED BY, which will only change tables owned by 
> > the
> > +  roles specified.  If the NOWAIT option is 
> > specified
> 
> specified roles.
> is specified, (comma)

This is the same as above. I did that but it makes the description
differ from another almost-the-same description.

> > +  then the command will fail if it is unable to acquire all of the 
> > locks
> 
> if it is unable to immediately acquire
>
> > +  required immediately.  The information_schema
> 
> remove immediately

Ditto.

> > +  relations are not considered part of the 

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2022 at 9:29 PM Peter Geoghegan  wrote:
> > Perhaps we should just fetch the horizons from the "local" catalog for 
> > shared
> > rels?
>
> Not sure what you mean.

Wait, you mean use vacrel->relfrozenxid directly? Seems kind of ugly...

-- 
Peter Geoghegan




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2022 at 9:20 PM Andres Freund  wrote:
> But the debug elog reports that
>
> relfrozenxid updated 714 -> 717
> relminmxid updated 1 -> 6
>
> The problem is that the crashing backend reads the relfrozenxid/relminmxid
> from the shared relcache init file written by another backend:

We should have added logging of relfrozenxid and relminmxid a long time ago.

> This is basically the inverse of a54e1f15 - we read a *newer* horizon. That's
> normally fairly harmless - I think.

Is this one pretty old?

> Perhaps we should just fetch the horizons from the "local" catalog for shared
> rels?

Not sure what you mean.

-- 
Peter Geoghegan




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 21:11:48 -0700, Peter Geoghegan wrote:
> On Wed, Mar 30, 2022 at 9:04 PM Andres Freund  wrote:
> > (gdb) p vacrel->NewRelfrozenXid
> > $3 = 717
> > (gdb) p vacrel->relfrozenxid
> > $4 = 717
> > (gdb) p OldestXmin
> > $5 = 5112
> > (gdb) p aggressive
> > $6 = false
>
> Does this OldestXmin seem reasonable at this point in execution, based
> on context? Does it look too high? Something else?

Reasonable:
(gdb) p *ShmemVariableCache
$1 = {nextOid = 78969, oidCount = 2951, nextXid = {value = 21411}, oldestXid = 
714, xidVacLimit = 20714, xidWarnLimit = 2107484361,
  xidStopLimit = 2144484361, xidWrapLimit = 2147484361, oldestXidDB = 1, 
oldestCommitTsXid = 0, newestCommitTsXid = 0, latestCompletedXid = {value = 
21408},
  xactCompletionCount = 1635, oldestClogXid = 714}

I think the explanation I just sent explains the problem, without "in-memory"
confusion about what's running and what's not.

Greetings,

Andres Freund




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 21:04:07 -0700, Andres Freund wrote:
> On 2022-03-30 20:35:25 -0700, Peter Geoghegan wrote:
> > On Wed, Mar 30, 2022 at 8:28 PM Andres Freund  wrote:
> > > I triggered twice now, but it took a while longer the second time.
> >
> > Great.
> >
> > I wonder if you can get an RR recording...
>
> Started it, but looks like it's too slow.
>
> (gdb) p MyProcPid
> $1 = 2172500
>
> (gdb) p vacrel->NewRelfrozenXid
> $3 = 717
> (gdb) p vacrel->relfrozenxid
> $4 = 717
> (gdb) p OldestXmin
> $5 = 5112
> (gdb) p aggressive
> $6 = false

I added a bunch of debug elogs to see what sets *frozenxid_updated to true.

(gdb) p *vacrel
$1 = {rel = 0x7fe24f3e0148, indrels = 0x7fe255c17ef8, nindexes = 2, aggressive 
= false, skipwithvm = true, failsafe_active = false,
  consider_bypass_optimization = true, do_index_vacuuming = true, 
do_index_cleanup = true, do_rel_truncate = true, bstrategy = 0x7fe255bb0e28, 
pvs = 0x0,
  relfrozenxid = 717, relminmxid = 6, old_live_tuples = 42, OldestXmin = 20751, 
vistest = 0x7fe255058970 , FreezeLimit = 4244988047,
  MultiXactCutoff = 4289967302, NewRelfrozenXid = 717, NewRelminMxid = 6, 
skippedallvis = false, relnamespace = 0x7fe255c17bf8 "pg_catalog",
  relname = 0x7fe255c17cb8 "pg_database", indname = 0x0, blkno = 4294967295, 
offnum = 0, phase = VACUUM_ERRCB_PHASE_SCAN_HEAP, verbose = false,
  dead_items = 0x7fe255c131d0, rel_pages = 8, scanned_pages = 8, removed_pages 
= 0, lpdead_item_pages = 0, missed_dead_pages = 0, nonempty_pages = 8,
  new_rel_tuples = 124, new_live_tuples = 42, indstats = 0x7fe255c18320, 
num_index_scans = 0, tuples_deleted = 0, lpdead_items = 0, live_tuples = 42,
  recently_dead_tuples = 82, missed_dead_tuples = 0}

But the debug elog reports that

relfrozenxid updated 714 -> 717
relminmxid updated 1 -> 6

The problem is that the crashing backend reads the relfrozenxid/relminmxid
from the shared relcache init file written by another backend:

2022-03-30 21:10:47.626 PDT [2625038][autovacuum worker][6/433:0][] LOG:  
automatic vacuum of table 
"contrib_regression_postgres_fdw.pg_catalog.pg_database": index scans: 1
pages: 0 removed, 8 remain, 8 scanned (100.00% of total)
tuples: 4 removed, 114 remain, 72 are dead but not yet removable
removable cutoff: 20751, older by 596 xids when operation ended
new relfrozenxid: 717, which is 3 xids ahead of previous value
new relminmxid: 6, which is 5 mxids ahead of previous value
index scan needed: 3 pages from table (37.50% of total) had 8 dead item 
identifiers removed
index "pg_database_datname_index": pages: 2 in total, 0 newly deleted, 
0 currently deleted, 0 reusable
index "pg_database_oid_index": pages: 6 in total, 0 newly deleted, 2 
currently deleted, 2 reusable
I/O timings: read: 0.050 ms, write: 0.102 ms
avg read rate: 209.860 MB/s, avg write rate: 76.313 MB/s
buffer usage: 42 hits, 22 misses, 8 dirtied
WAL usage: 13 records, 5 full page images, 33950 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
...
2022-03-30 21:10:47.772 PDT [2625043][autovacuum worker][:0][] DEBUG:  
InitPostgres
2022-03-30 21:10:47.772 PDT [2625043][autovacuum worker][6/0:0][] DEBUG:  my 
backend ID is 6
2022-03-30 21:10:47.772 PDT [2625043][autovacuum worker][6/0:0][] LOG:  reading 
shared init file
2022-03-30 21:10:47.772 PDT [2625043][autovacuum worker][6/443:0][] DEBUG:  
StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, 
xid/sub>
2022-03-30 21:10:47.772 PDT [2625043][autovacuum worker][6/443:0][] LOG:  
reading non-shared init file

This is basically the inverse of a54e1f15 - we read a *newer* horizon. That's
normally fairly harmless - I think.

Perhaps we should just fetch the horizons from the "local" catalog for shared
rels?

Greetings,

Andres Freund




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-30 Thread Dilip Kumar
On Thu, Mar 31, 2022 at 5:07 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-03-29 11:55:05 -0400, Robert Haas wrote:
> > I committed v6 instead.
>
> I was just discussing the WAL prefetching patch with Thomas. A question in
> that discussion made me look at the coverage of REDO for CREATE DATABASE:
> https://coverage.postgresql.org/src/backend/commands/dbcommands.c.gcov.html
>
> Seems there's currently nothing hitting the REDO for
> XLOG_DBASE_CREATE_FILE_COPY (currently line 3019). I think it'd be good to
> keep coverage for that. How about adding a
>   CREATE DATABASE ... STRATEGY file_copy
> to 001_stream_rep.pl?
>
>
> Might be worth adding a test for ALTER DATABASE ... SET TABLESPACE at the same
> time, this patch did affect that path in some minor ways. And, somewhat
> shockingly, we don't have a single test for it.

I will add tests for both of these cases and send the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2022 at 9:04 PM Andres Freund  wrote:
> (gdb) p vacrel->NewRelfrozenXid
> $3 = 717
> (gdb) p vacrel->relfrozenxid
> $4 = 717
> (gdb) p OldestXmin
> $5 = 5112
> (gdb) p aggressive
> $6 = false

Does this OldestXmin seem reasonable at this point in execution, based
on context? Does it look too high? Something else?

-- 
Peter Geoghegan




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-30 Thread Michael Paquier
On Thu, Mar 03, 2022 at 02:03:38PM +0900, Michael Paquier wrote:
> Indeed.  I have reworked the whole, rather than just those three
> sentences.

So, any particular feelings about this patch?  This has been around
for a couple of months/years now, so it could be a good time to do the
switch now rather than wait an extra year, or even the beginning of
the next release cycle.  And the buildfarm is already able to handle
that in its code based on the last release, by skipping the upgrade
check if it finds a pg_upgrade/t/ subdirectory.
--
Michael




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Tom Lane
Andres Freund  writes:
> On 2022-03-30 22:07:48 -0400, Tom Lane wrote:
>> So ... none of the Windows buildfarm members actually like this
>> test script.

> On windows CI sets
> # Avoids port conflicts between concurrent tap test runs
> PG_TEST_USE_UNIX_SOCKETS: 1

Ok ...

> I assume SSPI doesn't work over unix sockets? Oh. Maybe it's not even that -
> we only enable it when not using unix sockets:

Duh.  But can it work over unix sockets?

regards, tom lane




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 20:35:25 -0700, Peter Geoghegan wrote:
> On Wed, Mar 30, 2022 at 8:28 PM Andres Freund  wrote:
> > I triggered twice now, but it took a while longer the second time.
>
> Great.
>
> I wonder if you can get an RR recording...

Started it, but looks like it's too slow.

(gdb) p MyProcPid
$1 = 2172500

(gdb) p vacrel->NewRelfrozenXid
$3 = 717
(gdb) p vacrel->relfrozenxid
$4 = 717
(gdb) p OldestXmin
$5 = 5112
(gdb) p aggressive
$6 = false

There was another autovacuum of pg_database 10s before:

2022-03-30 20:35:17.622 PDT [2165344][autovacuum worker][5/3:0][] LOG:  
automatic vacuum of table "postgres.pg_catalog.pg_database": index scans: 1
pages: 0 removed, 3 remain, 3 scanned (100.00% of total)
tuples: 61 removed, 4 remain, 1 are dead but not yet removable
removable cutoff: 1921, older by 3 xids when operation ended
new relfrozenxid: 717, which is 3 xids ahead of previous value
index scan needed: 3 pages from table (100.00% of total) had 599 dead 
item identifiers removed
index "pg_database_datname_index": pages: 2 in total, 0 newly deleted, 
0 currently deleted, 0 reusable
index "pg_database_oid_index": pages: 4 in total, 0 newly deleted, 0 
currently deleted, 0 reusable
I/O timings: read: 0.029 ms, write: 0.034 ms
avg read rate: 134.120 MB/s, avg write rate: 89.413 MB/s
buffer usage: 35 hits, 12 misses, 8 dirtied
WAL usage: 12 records, 5 full page images, 27218 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

The dying backend:
2022-03-30 20:35:27.668 PDT [2172500][autovacuum worker][7/0:0][] DEBUG:  
autovacuum: processing database "contrib_regression_hstore"
...
2022-03-30 20:35:27.690 PDT [2172500][autovacuum worker][7/674:0][] CONTEXT:  
while cleaning up index "pg_database_oid_index" of relation 
"pg_catalog.pg_database"


Greetings,

Andres Freund




Re: pg_tablespace_location() failure with allow_in_place_tablespaces

2022-03-30 Thread Michael Paquier
On Wed, Mar 30, 2022 at 08:23:25PM +1300, Thomas Munro wrote:
> That leads to the attached patches, the first of which I'd want to back-patch.

Makes sense.

-   if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
-   d->ret.d_type = DT_DIR;
/* For reparse points dwReserved0 field will contain the ReparseTag */
-   else if ((fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0 &&
-(fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT))
+   if ((fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0 &&
+   (fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT))
d->ret.d_type = DT_LNK;
+   else if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
+   d->ret.d_type = DT_DIR;

This should also work for plain files, so that looks fine to me.

> Unfortunately while testing this I realised there is something else
> wrong here: if you take a basebackup using tar format, in-place
> tablespaces are skipped (they should get their own OID.tar file, but
> they don't, also no error).  While it wasn't one of my original goals
> for in-place tablespaces to work in every way (and I'm certain some
> external tools would be confused by them), it seems we're pretty close
> so we should probably figure out that piece of the puzzle.  It may be
> obvious why but I didn't have time to dig into that today... perhaps
> instead of just skipping the readlink() we should be writing something
> different into the mapfile and then restoring as appropriate...

Yeah, I saw that in-place tablespaces were part of the main tarball in
base backups as we rely on the existence of a link to decide if the
contents of a path should be separated in a different tarball or not.
This does not strike me as a huge problem in itself, TBH, as the
improvement would be limited to making sure that the base backups could
be correctly restored with multiple tablespaces.  And you can get
pretty much the same amount of coverage to make sure that the backup
contents are correct without fully restoring them.
--
Michael




Re: Handle infinite recursion in logical replication setup

2022-03-30 Thread Amit Kapila
On Wed, Mar 30, 2022 at 7:40 PM Ashutosh Bapat
 wrote:
> >
> > The changes for the same are available int the v5 patch available at [1].
> > [1] - 
> > https://www.postgresql.org/message-id/CALDaNm3wCf0YcvVo%2BgHMGpupk9K6WKJxCyLUvhPC2GkPKRZUWA%40mail.gmail.com
> >
>
>  cb->truncate_cb = pg_decode_truncate;
>  cb->commit_cb = pg_decode_commit_txn;
>  cb->filter_by_origin_cb = pg_decode_filter;
> +cb->filter_remote_origin_cb = pg_decode_filter_remote_origin;
>
> Why do we need a new hook? Can we use filter_by_origin_cb?
>

I also think there is no need for a new hook in this case but I might
also be missing something that Vignesh has in his mind.

> Also it looks like
> implementation of pg_decode_filter and pg_decode_filter_remote_origin is same,
> unless my eyes are deceiving me.
>
> -  binary, streaming, and
> -  disable_on_error.
> +  binary, streaming,
> +  disable_on_error and
> +  publish_local_only.
>
> "publish_local_only" as a "subscription" option looks odd. Should it be
> "subscribe_local_only"?
>

I also think "subscribe_local_only" fits better here.

-- 
With Regards,
Amit Kapila.




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 20:28:44 -0700, Andres Freund wrote:
> I was able to trigger the crash.
> 
> cat ~/tmp/pgbench-createdb.sql
> CREATE DATABASE pgb_:client_id;
> DROP DATABASE pgb_:client_id;
> 
> pgbench -n -P1 -c 10 -j10 -T100 -f ~/tmp/pgbench-createdb.sql
> 
> while I was also running
> 
> for i in $(seq 1 100); do echo iteration $i; make -Otarget -C contrib/ -s 
> installcheck -j48 -s prove_installcheck=true USE_MODULE_DB=1 > /tmp/ci-$i.log 
> 2>&1; done
> 
> I triggered twice now, but it took a while longer the second time.

Forgot to say how postgres was started. Via my usual devenv script, which
results in:

+ /home/andres/build/postgres/dev-assert/vpath/src/backend/postgres -c 
hba_file=/home/andres/tmp/pgdev/pg_hba.conf -D /srv/dev/pgdev-dev/ -p 5440 -c 
shared_buffers=2GB -c wal_level=hot_standby -c max_wal_senders=10 -c 
track_io_timing=on -c restart_after_crash=false -c max_prepared_transactions=20 
-c log_checkpoints=on -c min_wal_size=48MB -c max_wal_size=150GB -c 
'cluster_name=dev assert' -c 
ssl_cert_file=/home/andres/tmp/pgdev/ssl-cert-snakeoil.pem -c 
ssl_key_file=/home/andres/tmp/pgdev/ssl-cert-snakeoil.key -c 
'log_line_prefix=%m [%p][%b][%v:%x][%a] ' -c shared_buffers=16MB -c 
log_min_messages=debug1 -c log_connections=on -c allow_in_place_tablespaces=1 
-c log_autovacuum_min_duration=0 -c log_lock_waits=true -c 
autovacuum_naptime=10s -c fsync=off

Greetings,

Andres Freund




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2022 at 8:28 PM Andres Freund  wrote:
> I triggered twice now, but it took a while longer the second time.

Great.

I wonder if you can get an RR recording...
-- 
Peter Geoghegan




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Andres Freund
Hi,

I was able to trigger the crash.

cat ~/tmp/pgbench-createdb.sql
CREATE DATABASE pgb_:client_id;
DROP DATABASE pgb_:client_id;

pgbench -n -P1 -c 10 -j10 -T100 -f ~/tmp/pgbench-createdb.sql

while I was also running

for i in $(seq 1 100); do echo iteration $i; make -Otarget -C contrib/ -s 
installcheck -j48 -s prove_installcheck=true USE_MODULE_DB=1 > /tmp/ci-$i.log 
2>&1; done

I triggered twice now, but it took a while longer the second time.

(gdb) bt full
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
set = {__val = {4194304, 0, 0, 0, 0, 0, 216172782113783808, 2, 
2377909399344644096, 18446497967838863616, 0, 0, 0, 0, 0, 0}}
pid = 
tid = 
ret = 
#1  0x7fe49a2db546 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, 
sa_mask = {__val = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}},
  sa_flags = 0, sa_restorer = 0x107e0}
sigs = {__val = {32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}}
#2  0x7fe49b9706f1 in ExceptionalCondition (conditionName=0x7fe49ba0618d 
"diff > 0", errorType=0x7fe49ba05bd1 "FailedAssertion",
fileName=0x7fe49ba05b90 
"/home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c", 
lineNumber=724)
at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
No locals.
#3  0x7fe49b2fc739 in heap_vacuum_rel (rel=0x7fe497a8d148, 
params=0x7fe49c130d7c, bstrategy=0x7fe49c130e10)
at /home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c:724
buf = {
  data = 0x7fe49c17e238 "automatic vacuum of table 
\"contrib_regression_dict_int.pg_catalog.pg_database\": index scans: 1\npages: 
0 removed, 3 remain, 3 scanned (100.00% of total)\ntuples: 49 removed, 53 
remain, 9 are dead but no"..., len = 279, maxlen = 1024, cursor = 0}
msgfmt = 0x7fe49ba06038 "automatic vacuum of table \"%s.%s.%s\": index 
scans: %d\n"
diff = 0
endtime = 702011687982080
vacrel = 0x7fe49c19b5b8
verbose = false
instrument = true
ru0 = {tv = {tv_sec = 1648696487, tv_usec = 975963}, ru = {ru_utime = 
{tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 3086}, {
--Type  for more, q to quit, c to continue without paging--c
  ru_maxrss = 10824, __ru_maxrss_word = 10824}, {ru_ixrss = 0, 
__ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, 
__ru_isrss_word = 0}, {ru_minflt = 449, __ru_minflt_word = 449}, {ru_majflt = 
0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, 
__ru_inblock_word = 0}, {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 
0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 
0, __ru_nsignals_word = 0}, {ru_nvcsw = 2, __ru_nvcsw_word = 2}, {ru_nivcsw = 
0, __ru_nivcsw_word = 0}}}
starttime = 702011687975964
walusage_start = {wal_records = 0, wal_fpi = 0, wal_bytes = 0}
walusage = {wal_records = 11, wal_fpi = 7, wal_bytes = 30847}
secs = 0
usecs = 6116
read_rate = 16.606033355134073
write_rate = 7.6643230869849575
aggressive = false
skipwithvm = true
frozenxid_updated = true
minmulti_updated = true
orig_rel_pages = 3
new_rel_pages = 3
new_rel_allvisible = 0
indnames = 0x7fe49c19bb28
errcallback = {previous = 0x0, callback = 0x7fe49b3012fd 
, arg = 0x7fe49c19b5b8}
startreadtime = 180
startwritetime = 0
OldestXmin = 67552
FreezeLimit = 4245034848
OldestMxact = 224
MultiXactCutoff = 4289967520
__func__ = "heap_vacuum_rel"
#4  0x7fe49b523d92 in table_relation_vacuum (rel=0x7fe497a8d148, 
params=0x7fe49c130d7c, bstrategy=0x7fe49c130e10) at 
/home/andres/src/postgresql/src/include/access/tableam.h:1680
No locals.
#5  0x7fe49b527032 in vacuum_rel (relid=1262, relation=0x7fe49c1ae360, 
params=0x7fe49c130d7c) at 
/home/andres/src/postgresql/src/backend/commands/vacuum.c:2065
lmode = 4
rel = 0x7fe497a8d148
lockrelid = {relId = 1262, dbId = 0}
toast_relid = 0
save_userid = 10
save_sec_context = 0
save_nestlevel = 2
__func__ = "vacuum_rel"
#6  0x7fe49b524c3b in vacuum (relations=0x7fe49c1b03a8, 
params=0x7fe49c130d7c, bstrategy=0x7fe49c130e10, isTopLevel=true) at 
/home/andres/src/postgresql/src/backend/commands/vacuum.c:482
vrel = 0x7fe49c1ae3b8
cur__state = {l = 0x7fe49c1b03a8, i = 0}
cur = 0x7fe49c1b03c0
_save_exception_stack = 0x7fff97e35a10
_save_context_stack = 0x0
_local_sigjmp_buf = {{__jmpbuf = {140735741652128, 6126579318940970843, 
9223372036854775747, 0, 0, 0, 6126579318957748059, 6139499258682879835}, 
__mask_was_saved = 0, __saved_mask = {__val = {32, 140619848279000, 8590910454, 
140619848278592, 32, 140619848278944, 7784, 

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2022 at 7:37 PM Peter Geoghegan  wrote:
> Yeah, a WARNING would be good here. I can write a new version of my
> patch series with a separation patch for that this evening. Actually,
> better make it a PANIC for now...

Attached is v14, which includes a new patch that PANICs like that in
vac_update_relstats() --- 0003.

This approach also covers manual VACUUMs, which isn't the case with
the failing assertion, which is in instrumentation code (actually
VACUUM VERBOSE might hit it).

I definitely think that something like this should be committed.
Silently ignoring system catalog corruption isn't okay.

-- 
Peter Geoghegan


v14-0003-PANIC-on-relfrozenxid-from-the-future.patch
Description: Binary data


v14-0001-Set-relfrozenxid-to-oldest-extant-XID-seen-by-VA.patch
Description: Binary data


v14-0002-Generalize-how-VACUUM-skips-all-frozen-pages.patch
Description: Binary data


v14-0004-vacuumlazy.c-Move-resource-allocation-to-heap_va.patch
Description: Binary data


Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2022 at 7:00 PM Andres Freund  wrote:
> Something vaguely like EXPECT_EQ_U32 in regress.c. Maybe
> AssertCmp(type, a, op, b),
>
> Then the assertion could have been something like
>AssertCmp(int32, diff, >, 0)

I'd definitely use them if they were there.

> Does the line number in the failed run actually correspond to the xid, rather
> than the mxid case? I didn't check.

Yes, I verified -- definitely relfrozenxid.

> You can do something similar locally on linux with
> make -Otarget -C contrib/ -j48 -s USE_MODULE_DB=1 installcheck 
> prove_installcheck=true
> (the prove_installcheck=true to prevent tap tests from running, we don't seem
> to have another way for that)
>
> I don't think windows uses USE_MODULE_DB=1, but it allows to cause a lot more
> load concurrently than running tests serially...

Can't get it to fail locally with that recipe.

> > Assert(vacrel->NewRelfrozenXid == OldestXmin ||
> >TransactionIdPrecedesOrEquals(vacrel->relfrozenxid,
> >  vacrel->NewRelfrozenXid));
>
> The comment in your patch says "is either older or newer than FreezeLimit" - I
> assume that's some rephrasing damage?

Both the comment and the assertion are correct. I see what you mean, though.

> Perhaps it's worth commiting improved assertions on master? If this is indeed
> a pre-existing bug, and we're just missing due to slightly less stringent
> asserts, we could rectify that separately.

I don't think there's much chance of the assertion actually hitting
without the rest of the patch series. The new relfrozenxid value is
always going to be OldestXmin - vacuum_min_freeze_age on HEAD, while
with the patch it's sometimes close to OldestXmin. Especially when you
have lots of dead tuples that you churn through constantly (like
pgbench_tellers, or like these system catalogs on the CI test
machine).

> Hm. This triggers some vague memories. There's some oddities around shared
> relations being vacuumed separately in all the databases and thus having
> separate horizons.

That's what I was thinking of, obviously.

> After "remembering" that, I looked in the cirrus log for the failed run, and
> the worker was processing a shared relation last:
>
> 2022-03-30 03:48:30.238 GMT [5984][autovacuum worker] LOG:  automatic analyze 
> of table "contrib_regression.pg_catalog.pg_authid"

I noticed the same thing myself. Should have said sooner.

> Perhaps this ought to be an elog() instead of an Assert()? Something has gone
> pear shaped if we get here... It's a bit annoying though, because it'd have to
> be a PANIC to be visible on the bf / CI :(.

Yeah, a WARNING would be good here. I can write a new version of my
patch series with a separation patch for that this evening. Actually,
better make it a PANIC for now...

-- 
Peter Geoghegan




Re: Add LZ4 compression in pg_dump

2022-03-30 Thread Michael Paquier
On Wed, Mar 30, 2022 at 03:32:55PM +, gkokola...@pm.me wrote:
> On Wednesday, March 30th, 2022 at 7:54 AM, Michael Paquier 
>  wrote:
>> While moving on with 0002, I have noticed the following in
>>
>> _StartBlob():
>> if (AH->compression != 0)
>> sfx = ".gz";
>> else
>> sfx = "";
>>
>> Shouldn't this bit also be simplified, adding a fatal() like the other
>> code paths, for safety?
> 
> Agreed. Fixed.

Okay.  0002 looks fine as-is, and I don't mind the extra fatal()
calls.  These could be asserts but that's not a big deal one way or
the other.  And the cleanup is now applied.

>> + my $compress_program = $ENV{GZIP_PROGRAM};
>>
>> It seems to me that it is enough to rely on {compress_cmd}, hence
>> there should be no need for $compress_program, no?
> 
> Maybe not. We don't want the tests to fail if the utility is not
> installed. That becomes even more evident as more methods are added.
> However I realized that the presence of the environment variable does
> not guarantee that the utility is actually installed. In the attached,
> the existence of the utility is based on the return value of system_log().

Hmm.  [.. thinks ..]  The thing that's itching me here is that you
align the concept of compression with gzip, but that's not going to be
true once more compression options are added to pg_dump, and that
would make $supports_compression and $compress_program_exists
incorrect.  Perhaps the right answer would be to rename all that with
a suffix like "_gzip" to make a difference?  Or would there be enough
control with a value of "compression_gzip" instead of "compression" in
test_key?

+my $compress_program_exists = (system_log("$ENV{GZIP_PROGRAM}", '-h',
+ '>', '/dev/null') == 0);
Do we need this command execution at all?  In all the other tests, we
rely on a simple "if (!defined $gzip || $gzip eq '');", so we could do
the same here.

A last thing is that we should perhaps make a clear difference between
the check that looks at if the code has been built with zlib and the
check for the presence of GZIP_PROGRAM, as it can be useful in some
environments to be able to run pg_dump built with zlib, even if the
GZIP_PROGRAM command does not exist (I don't think this can be the
case, but other tests are flexible).  As of now, the patch relies on
pg_dump enforcing uncompression if building under --without-zlib even
if --compress/-Z is used, but that also means that those compression
tests overlap with the existing tests in this case.  Wouldn't it be
more consistent to check after $supports_compression when executing
the dump command for test_key = "compression[_gzip]"?  This would mean
keeping GZIP_PROGRAM as sole check when executing the compression
command.
--
Michael




RE: Logical replication timeout problem

2022-03-30 Thread shiy.f...@fujitsu.com
On Wed, Mar 30, 2022 3:54 PM wangw.f...@fujitsu.com  
wrote:
> 
> Rebase the patch because the commit d5a9d86d in current HEAD.
> 

Thanks for your patch. I tried it and confirmed that there is no timeout
problem after applying it, and I could reproduce the problem on HEAD.

Regards,
Shi yu


Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 22:07:48 -0400, Tom Lane wrote:
> So ... none of the Windows buildfarm members actually like this
> test script.  They're all showing failures along the lines of
> 
> not ok 2 - fails if basebackup_to_shell.command is not set: matches
> 
> #   Failed test 'fails if basebackup_to_shell.command is not set: matches'
> #   at t/001_basic.pl line 38.
> #   'pg_basebackup: error: connection to server at 
> "127.0.0.1", port 55358 failed: FATAL:  SSPI authentication failed for user 
> "backupuser"
> # '
> # doesn't match '(?^:shell command for backup is not configured)'
> 
> Does the CI setup not account for this issue?

On windows CI sets
# Avoids port conflicts between concurrent tap test runs
PG_TEST_USE_UNIX_SOCKETS: 1

because I've otherwise seen a lot of spurious tap test failures - Cluster.pm
get_free_port() is racy, as it admits:
XXX A port available now may become unavailable by the time we start
the desired service.

The only alternative is to not use parallelism when running tap tests, but
that makes test runs even slower - and windows is already the bottleneck for
cfbot.

I assume SSPI doesn't work over unix sockets? Oh. Maybe it's not even that -
we only enable it when not using unix sockets:

# Internal method to set up trusted pg_hba.conf for replication.  Not
# documented because you shouldn't use it, it's called automatically if needed.
sub set_replication_conf
{
my ($self) = @_;
my $pgdata = $self->data_dir;

$self->host eq $test_pghost
  or croak "set_replication_conf only works with the default host";

open my $hba, '>>', "$pgdata/pg_hba.conf";
print $hba "\n# Allow replication (set up by 
PostgreSQL::Test::Cluster.pm)\n";
if ($PostgreSQL::Test::Utils::windows_os && 
!$PostgreSQL::Test::Utils::use_unix_sockets)
{
print $hba
  "host replication all $test_localhost/32 sspi include_realm=1 
map=regress\n";
}
close $hba;
return;
}

Greetings,

Andres Freund




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Tom Lane
So ... none of the Windows buildfarm members actually like this
test script.  They're all showing failures along the lines of

not ok 2 - fails if basebackup_to_shell.command is not set: matches

#   Failed test 'fails if basebackup_to_shell.command is not set: matches'
#   at t/001_basic.pl line 38.
#   'pg_basebackup: error: connection to server at "127.0.0.1", 
port 55358 failed: FATAL:  SSPI authentication failed for user "backupuser"
# '
# doesn't match '(?^:shell command for backup is not configured)'

Does the CI setup not account for this issue?

regards, tom lane




Re: Commitfest Update

2022-03-30 Thread Julien Rouhaud
On Wed, Mar 30, 2022 at 02:41:26PM -0400, Greg Stark wrote:
>
> Patches that are Waiting on Author and haven't had activity in months
> -- traditionally they were set to Returned with Feedback. It seems the
> feeling these days is to not lose state on them and just move them to
> the next CF. I'm not sure that's wise, it ends up just filling up the
> list with patches nobody's working on.

+1 for closing such patches as Returned with Feedback, for the same reasons
Robert and Peter already stated.

Note that I already closed such CF entries during the last commitfest, so
hopefully there shouldn't be too many.  Last time I used "both Waiting on
Author and no activity from the author, since at least the 15th of the month"
as the threshold to close such patches (although I closed them at the beginning
of the next month), as it seems to be the usual (and informal) rule.




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 17:50:42 -0700, Peter Geoghegan wrote:
> I tried several times to recreate this issue on CI. No luck with that,
> though -- can't get it to fail again after 4 attempts.

It's really annoying that we don't have Assert variants that show the compared
values; that might make it easier to interpret what's going on.

Something vaguely like EXPECT_EQ_U32 in regress.c. Maybe
AssertCmp(type, a, op, b),

Then the assertion could have been something like
   AssertCmp(int32, diff, >, 0)
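
A minimal sketch of such a macro, assuming integer-sized operands so the
values can simply be cast for printing (a real version would presumably route
through ExceptionalCondition() rather than elog()):

#ifdef USE_ASSERT_CHECKING
#define AssertCmp(type, a, op, b) \
    do { \
        type a_ = (a); \
        type b_ = (b); \
        if (!(a_ op b_)) \
            elog(PANIC, "assertion \"%s %s %s\" failed: %lld vs %lld", \
                 #a, #op, #b, (long long) a_, (long long) b_); \
    } while (0)
#else
#define AssertCmp(type, a, op, b) ((void) 0)
#endif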


Does the line number in the failed run actually correspond to the xid, rather
than the mxid case? I didn't check.


You could try to increase the likelihood of reproducing the failure by
duplicating the invocation that led to the crash a few times in the
.cirrus.yml file in your dev branch. That might allow hitting the problem more
quickly.

Maybe reduce autovacuum_naptime in src/tools/ci/pg_ci_base.conf?

Or locally - one thing that windows CI does differently from the other platforms
is that it runs isolation, contrib and a bunch of other tests using the same
cluster. Which of course increases the likelihood of autovacuum having stuff
to do, *particularly* on shared relations - normally there's probably not
enough changes for that.

You can do something similar locally on linux with
make -Otarget -C contrib/ -j48 -s USE_MODULE_DB=1 installcheck 
prove_installcheck=true
(the prove_installcheck=true to prevent tap tests from running, we don't seem
to have another way for that)

I don't think windows uses USE_MODULE_DB=1, but it allows causing a lot more
load concurrently than running tests serially...


> We know that this particular assertion did not fail during the same VACUUM:
> 
> Assert(vacrel->NewRelfrozenXid == OldestXmin ||
>TransactionIdPrecedesOrEquals(vacrel->relfrozenxid,
>  vacrel->NewRelfrozenXid));

The comment in your patch says "is either older or newer than FreezeLimit" - I
assume that's some rephrasing damage?



> So it's hard to see how this could be a bug in the patch -- the final
> new relfrozenxid is presumably equal to VACUUM's OldestXmin in the
> problem scenario seen on the CI Windows instance yesterday (that's why
> this earlier assertion didn't fail).

Perhaps it's worth committing improved assertions on master? If this is indeed
a pre-existing bug, and we're just missing it due to slightly less stringent
asserts, we could rectify that separately.


> The surprising part of the CI failure must have taken place just after
> this assertion, when VACUUM's call to vacuum_set_xid_limits() actually
> updates pg_class.relfrozenxid with vacrel->NewRelfrozenXid --
> presumably because the existing relfrozenxid appeared to be "in the
> future" when we examine it in pg_class again. We see evidence that
> this must have happened afterwards, when the closely related assertion
> (used only in instrumentation code) fails:

Hm. This triggers some vague memories. There's some oddities around shared
relations being vacuumed separately in all the databases and thus having
separate horizons.


After "remembering" that, I looked in the cirrus log for the failed run, and
the worker was processing shared a shared relation last:

2022-03-30 03:48:30.238 GMT [5984][autovacuum worker] LOG:  automatic analyze 
of table "contrib_regression.pg_catalog.pg_authid"

Obviously that's not a guarantee that the next table processed also is a
shared catalog, but ...

Oh, the relid is actually in the stack trace. 0x4ee = 1262 =
pg_database. Which makes sense, the test ends up with a high percentage of
dead rows in pg_database, due to all the different contrib tests
creating/dropping a database.



> From my patch:
> 
> > if (frozenxid_updated)
> > {
> > -   diff = (int32) (FreezeLimit - vacrel->relfrozenxid);
> > +   diff = (int32) (vacrel->NewRelfrozenXid - 
> > vacrel->relfrozenxid);
> > +   Assert(diff > 0);
> > appendStringInfo(,
> >  _("new relfrozenxid: %u, which is %d xids 
> > ahead of previous value\n"),
> > -FreezeLimit, diff);
> > +vacrel->NewRelfrozenXid, diff);
> > }

Perhaps this ought to be an elog() instead of an Assert()? Something has gone
pear shaped if we get here... It's a bit annoying though, because it'd have to
be a PANIC to be visible on the bf / CI :(.
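
Say, something along these lines in place of the Assert() in the hunk above
(only a sketch; the message wording and the WARNING-vs-PANIC choice are the
open part):

    diff = (int32) (vacrel->NewRelfrozenXid - vacrel->relfrozenxid);
    if (diff <= 0)
        elog(PANIC, "new relfrozenxid %u is not ahead of previous value %u",
             vacrel->NewRelfrozenXid, vacrel->relfrozenxid);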

Greetings,

Andres Freund




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2022 at 12:01 AM Peter Geoghegan  wrote:
> Perhaps something is amiss inside vac_update_relstats(), where the
> boolean flag that indicates that pg_class.relfrozenxid was advanced is
> set:
>
> if (frozenxid_updated)
> *frozenxid_updated = false;
> if (TransactionIdIsNormal(frozenxid) &&
> pgcform->relfrozenxid != frozenxid &&
> (TransactionIdPrecedes(pgcform->relfrozenxid, frozenxid) ||
>  TransactionIdPrecedes(ReadNextTransactionId(),
>pgcform->relfrozenxid)))
> {
> if (frozenxid_updated)
> *frozenxid_updated = true;
> pgcform->relfrozenxid = frozenxid;
> dirty = true;
> }
>
> Maybe the "existing relfrozenxid is in the future, silently update
> relfrozenxid" part of the condition (which involves
> ReadNextTransactionId()) somehow does the wrong thing here. But how?

I tried several times to recreate this issue on CI. No luck with that,
though -- can't get it to fail again after 4 attempts.

This was a VACUUM of pg_database, run from an autovacuum worker. I am
vaguely reminded of the two bugs fixed by Andres in commit a54e1f15.
Both were issues with the shared relcache init file affecting shared
and nailed catalog relations. Those bugs had symptoms like " ERROR:
found xmin ... from before relfrozenxid ..." for various system
catalogs.

We know that this particular assertion did not fail during the same VACUUM:

Assert(vacrel->NewRelfrozenXid == OldestXmin ||
   TransactionIdPrecedesOrEquals(vacrel->relfrozenxid,
 vacrel->NewRelfrozenXid));

So it's hard to see how this could be a bug in the patch -- the final
new relfrozenxid is presumably equal to VACUUM's OldestXmin in the
problem scenario seen on the CI Windows instance yesterday (that's why
this earlier assertion didn't fail).  The assertion I'm showing here
needs the "vacrel->NewRelfrozenXid == OldestXmin" part of the
condition to account for the fact that
OldestXmin/GetOldestNonRemovableTransactionId() is known to "go
backwards". Without that the regression tests will fail quite easily.

The surprising part of the CI failure must have taken place just after
this assertion, when VACUUM's call to vacuum_set_xid_limits() actually
updates pg_class.relfrozenxid with vacrel->NewRelfrozenXid --
presumably because the existing relfrozenxid appeared to be "in the
future" when we examine it in pg_class again. We see evidence that
this must have happened afterwards, when the closely related assertion
(used only in instrumentation code) fails:

From my patch:

> if (frozenxid_updated)
> {
> -   diff = (int32) (FreezeLimit - vacrel->relfrozenxid);
> +   diff = (int32) (vacrel->NewRelfrozenXid - 
> vacrel->relfrozenxid);
> +   Assert(diff > 0);
> appendStringInfo(,
>  _("new relfrozenxid: %u, which is %d xids 
> ahead of previous value\n"),
> -FreezeLimit, diff);
> +vacrel->NewRelfrozenXid, diff);
> }

Does anybody have any ideas about what might be going on here?

-- 
Peter Geoghegan




Re: Mingw task for Cirrus CI

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-22 19:00:42 +0300, Melih Mutlu wrote:
> Rebased it.
> I also removed the temp installation task and
> used NoDefaultCurrentDirectoryInExePath env variable instead.

Hm. But you're still using a separate build directory, from what I can see?
The NoDefaultCurrentDirectoryInExePath thing should only have an effect when
not using a separate build directory, no?

Does it work to not use the separate build dir? Without it we don't need the
"preparing build tree" step, and that's quite slow on mingw:
https://cirrus-ci.com/task/4713509253545984?logs=configure#L392

[00:23:44.371] preparing build tree... done
[00:24:25.429] configure: creating ./config.status


Chatting about this patch with Thomas I started to wonder about other reasons
for the slow speed of configure. I briefly experimented locally, and it looks
like using 'dash' as the shell makes configure run a good bit quicker.


> ---
>  .cirrus.yml | 79 +++--
>  1 file changed, 65 insertions(+), 14 deletions(-)
> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> index e5335fede7..1ed40347cf 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -23,7 +23,6 @@ env:
>CHECKFLAGS: -Otarget
>PROVE_FLAGS: --timer
>PGCTLTIMEOUT: 120 # avoids spurious failures during parallel tests
> -  TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
>PG_TEST_EXTRA: kerberos ldap ssl

This removes TEMP_CONFIG from all other tasks.  You added it back to the VS
windows task, but not the others? I assume that was accidental?


> +  env:
> +CCACHE_DIR: C:/msys64/ccache
> +BUILD_DIR: "%CIRRUS_WORKING_DIR%/build"

I think this should use TEMP_CONFIG too. Is the problem that you need to
change the path?


> +  ccache_cache:
> +folder: ${CCACHE_DIR}
> +
> +  mingw_info_script:
> +- C:\msys64\usr\bin\bash.exe -lc "where gcc"
> +- C:\msys64\usr\bin\bash.exe -lc "gcc --version"
> +- C:\msys64\usr\bin\bash.exe -lc "where perl"
> +- C:\msys64\usr\bin\bash.exe -lc "perl --version"
> +
> +  configure_script:
> +- C:\msys64\usr\bin\bash.exe -lc "mkdir %BUILD_DIR% &&
> +  cd %BUILD_DIR% &&
> +  %CIRRUS_WORKING_DIR%/configure

Could you try using dash to invoke configure here, and see whether it makes
configure faster?


> +--host=x86_64-w64-mingw32
> +--enable-cassert
> +--enable-tap-tests
> +--with-icu
> +--with-libxml
> +--with-libxslt
> +--with-lz4
> +--enable-debug
> +CC='ccache gcc'
> +CXX='ccache g++'"

I think this task should specify CFLAGS="-Og", CXXFLAGS="-Og" similar to other
tasks. We end up with -O2 otherwise, which makes the build measurably slower.



> +  tests_script:
> +  - set "NoDefaultCurrentDirectoryInExePath=0"

A comment about why NoDefaultCurrentDirectoryInExePath=0 is used would be
good.


Greetings,

Andres Freund




Re: Higher level questions around shared memory stats

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 16:35:50 -0700, Andres Freund wrote:
> On 2022-03-29 12:17:27 -0700, Andres Freund wrote:
> > Separate from the minutia in [1] I'd like to discuss a few questions of more
> > general interest. I'll post another question or two later.
>
> 4) What to do with the stats_temp_directory GUC / PG_STAT_TMP_DIR define /
>pg_stats_temp directory?
>
>With shared memory stats patch, the stats system itself doesn't need it
>anymore. But pg_stat_statements also uses PG_STAT_TMP_DIR to store
>pgss_query_texts.stat. That file can be fairly hot, so there's benefit in
>having something like stats_temp_directory.
>
>I'm inclined to just leave the guc / define / directory around, with a
>note saying that it's just used by extensions?

I had searched before on codesearch.debian.net whether there are external
extensions using it, without finding one (just a copy of pgstat.h). Now I
searched using https://cs.github.com/ ([1]) and found

https://github.com/powa-team/pg_sortstats
https://github.com/uptimejp/sql_firewall
https://github.com/legrandlegrand/pg_stat_sql_plans
https://github.com/ossc-db/pg_store_plans

Which seems to weigh in favor of at least keeping the directory and
define. None of them seem to use the guc, just PG_STAT_TMP_DIR.


We currently have code removing files both in pg_stat and the configured
pg_stats_temp directory (defaulting to pg_stat_tmp). All files matching
global.(stat|tmp), db_[0-9]+.(tmp|stat) are removed.

With the shared memory stats patch there's only a single file, so we don't
need that anymore. I guess some extension could rely on files being removed
somehow, but it's hard to believe, because it'd conflict with the stats
collector's files.

Greetings,

Andres Freund


[1] 
https://cs.github.com/?scopeName=All+repos==PG_STAT_TMP_DIR+NOT+path%3Afilemap.c+NOT+path%3Apgstat.h+NOT+path%3Abasebackup.c+NOT+path%3Apg_stat_statements.c+NOT+path%3Aguc.c




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Tom Lane
Nathan Bossart  writes:
> I think we should give this module a .gitignore file.  Patch attached.

Indeed, before somebody accidentally commits the cruft that
check-world is leaving around.  Pushed.

regards, tom lane




Re: ORDER BY pushdowns seem broken in postgres_fdw

2022-03-30 Thread Tom Lane
Ronan Dunklau  writes:
> [ v8-0001-Fix-orderby-handling-in-postgres_fdw.patch ]

I looked through this patch.  It's going in the right direction,
but I have a couple of nitpicks:

1. There are still some more places that aren't checking shippability
of the relevant opfamily.

2. The existing usage of find_em_expr_for_rel is fundamentally broken,
because that function will seize on the first EC member that is from the
given rel, whether it's shippable or not.  There might be another one
later that is shippable, so this is just the wrong API.  It's not like
this function gives us any useful isolation from the details of ECs,
because postgres_fdw is already looking into those elsewhere, notably
in find_em_expr_for_input_target --- which has the same order-sensitivity
bug.

I think that instead of doubling down on a wrong API, we should just
take that out and move the logic into postgres_fdw.c.  This also has
the advantage of producing a patch that's much safer to backpatch,
because it doesn't rely on the core backend getting updated before
postgres_fdw.so is.

So hacking on those two points, and doing some additional cleanup,
led me to the attached v9.  (In this patch, the removal of code
from equivclass.c is only meant to be applied to HEAD; we have to
leave the function in place in the back branches for API stability.)

If no objections, I think this is committable.

regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bf12eac028..8f4d8a5022 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
@@ -183,6 +184,8 @@ static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 			   Index ignore_rel, List **ignore_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
+static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first,
+deparse_expr_cxt *context);
 static void appendAggOrderBy(List *orderList, List *targetList,
 			 deparse_expr_cxt *context);
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
@@ -1038,6 +1041,33 @@ is_foreign_param(PlannerInfo *root,
 	return false;
 }
 
+/*
+ * Returns true if it's safe to push down the sort expression described by
+ * 'pathkey' to the foreign server.
+ */
+bool
+is_foreign_pathkey(PlannerInfo *root,
+   RelOptInfo *baserel,
+   PathKey *pathkey)
+{
+	EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+
+	/*
+	 * is_foreign_expr would detect volatile expressions as well, but checking
+	 * ec_has_volatile here saves some cycles.
+	 */
+	if (pathkey_ec->ec_has_volatile)
+		return false;
+
+	/* can't push down the sort if the pathkey's opfamily is not shippable */
+	if (!is_shippable(pathkey->pk_opfamily, OperatorFamilyRelationId, fpinfo))
+		return false;
+
+	/* can push if a suitable EC member exists */
+	return (find_em_for_rel(root, pathkey_ec, baserel) != NULL);
+}
+
 /*
  * Convert type OID + typmod info into a type name we can ship to the remote
  * server.  Someplace else had better have verified that this type name is
@@ -3445,44 +3475,59 @@ appendAggOrderBy(List *orderList, List *targetList, deparse_expr_cxt *context)
 	{
 		SortGroupClause *srt = (SortGroupClause *) lfirst(lc);
 		Node	   *sortexpr;
-		Oid			sortcoltype;
-		TypeCacheEntry *typentry;
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
 		first = false;
 
+		/* Deparse the sort expression proper. */
 		sortexpr = deparseSortGroupClause(srt->tleSortGroupRef, targetList,
 		  false, context);
-		sortcoltype = exprType(sortexpr);
-		/* See whether operator is default < or > for datatype */
-		typentry = lookup_type_cache(sortcoltype,
-	 TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
-		if (srt->sortop == typentry->lt_opr)
-			appendStringInfoString(buf, " ASC");
-		else if (srt->sortop == typentry->gt_opr)
-			appendStringInfoString(buf, " DESC");
-		else
-		{
-			HeapTuple	opertup;
-			Form_pg_operator operform;
-
-			appendStringInfoString(buf, " USING ");
-
-			/* Append operator name. */
-			opertup = SearchSysCache1(OPEROID, ObjectIdGetDatum(srt->sortop));
-			if (!HeapTupleIsValid(opertup))
-elog(ERROR, "cache lookup failed for operator %u", srt->sortop);
-			operform = (Form_pg_operator) GETSTRUCT(opertup);
-			deparseOperatorName(buf, operform);
-			ReleaseSysCache(opertup);
-		}
+		/* Add decoration as needed. */
+		appendOrderBySuffix(srt->sortop, exprType(sortexpr), srt->nulls_first,
+			context);
+	}
+}
 
-		if (srt->nulls_first)
-			appendStringInfoString(buf, " NULLS 

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-29 11:55:05 -0400, Robert Haas wrote:
> I committed v6 instead.

I was just discussing the WAL prefetching patch with Thomas. A question in
that discussion made me look at the coverage of REDO for CREATE DATABASE:
https://coverage.postgresql.org/src/backend/commands/dbcommands.c.gcov.html

Seems there's currently nothing hitting the REDO for
XLOG_DBASE_CREATE_FILE_COPY (currently line 3019). I think it'd be good to
keep coverage for that. How about adding a
  CREATE DATABASE ... STRATEGY file_copy
to 001_stream_rep.pl?


Might be worth adding a test for ALTER DATABASE ... SET TABLESPACE at the same
time; this patch did affect that path in some minor ways. And, somewhat
shockingly, we don't have a single test for it.
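
For illustration, the statements such tests would need to exercise are roughly
the following (database and tablespace names are placeholders, and a TAP test
would run them via safe_psql() after creating the tablespace first):

  -- exercises the XLOG_DBASE_CREATE_FILE_COPY redo path on the standby
  CREATE DATABASE regression_filecopy STRATEGY = file_copy;

  -- exercises ALTER DATABASE ... SET TABLESPACE, which also copies files
  ALTER DATABASE regression_filecopy SET TABLESPACE regress_dbtbs;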

- Andres




Re: Higher level questions around shared memory stats

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-29 12:17:27 -0700, Andres Freund wrote:
> Separate from the minutia in [1] I'd like to discuss a few questions of more
> general interest. I'll post another question or two later.

4) What to do with the stats_temp_directory GUC / PG_STAT_TMP_DIR define /
   pg_stats_temp directory?

   With shared memory stats patch, the stats system itself doesn't need it
   anymore. But pg_stat_statements also uses PG_STAT_TMP_DIR to store
   pgss_query_texts.stat. That file can be fairly hot, so there's benefit in
   having something like stats_temp_directory.

   I'm inclined to just leave the guc / define / directory around, with a
   note saying that it's just used by extensions?

Greetings,

Andres Freund




Re: POC: GROUP BY optimization

2022-03-30 Thread Tomas Vondra
Pushed, after going through the patch once more, running check-world
under valgrind, and updating the commit message.

Thanks to everyone who reviewed/tested this patch!

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Nathan Bossart
I think we should give this module a .gitignore file.  Patch attached.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 7466f18b7cb781f4db4919faf15b1b1d3cd7bc7a Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 30 Mar 2022 15:28:38 -0700
Subject: [PATCH 1/1] add .gitignore for basebackup_to_shell

---
 contrib/basebackup_to_shell/.gitignore | 4 
 1 file changed, 4 insertions(+)
 create mode 100644 contrib/basebackup_to_shell/.gitignore

diff --git a/contrib/basebackup_to_shell/.gitignore b/contrib/basebackup_to_shell/.gitignore
new file mode 100644
index 00..5dcb3ff972
--- /dev/null
+++ b/contrib/basebackup_to_shell/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
-- 
2.25.1



Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Robert Haas
On Wed, Mar 30, 2022 at 4:22 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I'm going to go ahead and commit this test script later on this
> > afternoon unless there are vigorous objections real soon now, and then
> > if somebody wants to improve it, great!
>
> I see you did that, but the CF entry is still open?

Fixed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Correct docs re: rewriting indexes when table rewrite is skipped

2022-03-30 Thread Robert Haas
On Wed, Mar 30, 2022 at 4:33 PM James Coleman  wrote:
> Hmm, having it match the way it works makes sense. Would you feel
> comfortable with an intermediate step (queueing up that as a larger
> change) changing the clause to something like "indexes will still have
> to be rebuilt unless the system can guarantee that the sort order is
> proven to be unchanged" (with appropriate wordsmithing to be a bit
> less verbose if possible)?

Yeah, that seems fine. It's arguable how much detail we should go into
here - but a statement of the form you propose is not misleading, and
that's what seems most important to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] vacuumdb --schema only

2022-03-30 Thread Nathan Bossart
I took a look at the v4 patch.

'git-apply' complains about whitespace errors:

0001-vacuumdb-schema-only-v4.patch:17: tab in indent.
   
0001-vacuumdb-schema-only-v4.patch:18: tab in indent.

0001-vacuumdb-schema-only-v4.patch:19: tab in indent.
 
0001-vacuumdb-schema-only-v4.patch:20: tab in indent.
  -n
0001-vacuumdb-schema-only-v4.patch:21: tab in indent.
  --schema
warning: squelched 13 whitespace errors
warning: 18 lines add whitespace errors.

+printf(_("  -N, --exclude-schema=PATTERNdo NOT vacuum tables in the 
specified schema(s)\n"));

I'm personally -1 for the --exclude-schema option.  I don't see any
existing "exclude" options in vacuumdb, and the uses for such an option
seem rather limited.  If we can point to specific use-cases for this
option, I might be willing to change my vote.

+   
+To clean all tables in the Foo and 
bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='"Foo"' --schema='bar' 
xyzzy
+

nitpicks: I think the phrasing should be "To only clean tables in the...".
Also, is there any reason to use a schema name with a capital letter as an
example?  IMO that just adds unnecessary complexity to the example.

+$node->issues_sql_like(
+[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+qr/VACUUM "Foo".*/,
+'vacuumdb --schema schema only');

IIUC there should only be one table in the schema.  Can we avoid matching
"*" and check for the exact command instead?

I think there should be a few more test cases.  For example, we should test
using -n and -N at the same time, and we should test what happens when
those options are used for missing schemas.

+/*
+ * When filtering on schema name, filter by table is not allowed.
+ * The schema name can already be set to a fqdn table name.
+ */
+if (tbl_count && (schemas.head != NULL))
+{
+pg_log_error("cannot vacuum all tables in schema(s) and specific 
table(s) at the same time");
+exit(1);
+}

I think there might be some useful refactoring we can do that would
simplify adding similar options in the future.  Specifically, can we have a
global variable that stores the type of vacuumdb command (e.g., all,
tables, or schemas)?  If so, perhaps the tables list could be renamed and
reused for schemas (and any other objects that need listing in the future).

+if (schemas != NULL && schemas->head != NULL)
+{
+appendPQExpBufferStr(_query,
+ " AND c.relnamespace");
+if (schema_is_exclude == TRI_YES)
+appendPQExpBufferStr(_query,
+" OPERATOR(pg_catalog.!=) ALL (ARRAY[");
+else if (schema_is_exclude == TRI_NO)
+appendPQExpBufferStr(_query,
+" OPERATOR(pg_catalog.=) ANY (ARRAY[");
+
+for (cell = schemas->head; cell != NULL; cell = cell->next)
+{
+appendStringLiteralConn(_query, cell->val, conn);
+
+if (cell->next != NULL)
+appendPQExpBufferStr(_query, ", ");
+}
+
+/* Finish formatting schema filter */
+appendPQExpBufferStr(_query, 
"]::pg_catalog.regnamespace[])\n");
+}

IMO we should use a CTE for specified schemas like we do for the specified
tables.  I wonder if we could even have a mostly-shared CTE code path for
all vacuumdb commands with a list of names.
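
To make that concrete, a minimal sketch of the generated catalog query with a
schema-list CTE might look like this (schema names are placeholders, and the
real query would of course keep vacuumdb's existing relkind and temp-schema
filters):

  WITH listed_schemas (nsp_oid) AS (
      VALUES ('foo'::pg_catalog.regnamespace),
             ('bar'::pg_catalog.regnamespace)
  )
  SELECT c.relname, ns.nspname
  FROM pg_catalog.pg_class c
       JOIN listed_schemas ls ON c.relnamespace OPERATOR(pg_catalog.=) ls.nsp_oid
       JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid
  WHERE c.relkind OPERATOR(pg_catalog.=) ANY (ARRAY['r', 'm'])
  ORDER BY c.relpages DESC;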

-/*
- * If no tables were listed, filter for the relevant relation types.  If
- * tables were given via --table, don't bother filtering by relation type.
- * Instead, let the server decide whether a given relation can be
- * processed in which case the user will know about it.
- */
-if (!tables_listed)
+else
 {
+/*
+ * If no tables were listed, filter for the relevant relation types.  
If
+ * tables were given via --table, don't bother filtering by relation 
type.
+ * Instead, let the server decide whether a given relation can be
+ * processed in which case the user will know about it.
+ */

nitpick: This change seems unnecessary.

I noticed upthread that there was some discussion around adding a way to
specify a schema in VACUUM and ANALYZE commands.  I think this patch is
useful even if such an option is eventually added, as we'll still want
vacuumdb to obtain the full list of tables to process so that it can
effectively parallelize.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Higher level questions around shared memory stats

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 14:42:23 -0400, Robert Haas wrote:
> On Tue, Mar 29, 2022 at 5:01 PM Andres Freund  wrote:
> > I think it's reasonably rare because in cases there'd be corruption, we'd
> > typically not even have written them out / throw them away explicitly - we
> > only read stats when starting without crash recovery.
> >
> > So the "expected" case of corruption afaicts solely is a OS crash just after
> > the shutdown checkpoint completed?
> 
> Can we prevent that case from occurring, so that there are no expected cases?

We likely can, at least for the causes of corruption I know of. We already
write the statsfile into a temporary filename and then rename into place. I
think all we'd need to do is to use durable_rename() to make sure it's durable
once renamed into place.

It's really unrelated to the shared memory stats patch though, so I'd prefer
not to tie it to that.


> > I can think of these different times:
> >
> > - Last time stats were removed due to starting up in crash recovery
> > - Last time stats were created from scratch, because no stats data file was
> >   present at startup
> > - Last time stats were thrown away due to corruption
> > - Last time a subset of stats were reset using one of the pg_reset* 
> > functions
> >
> > Makes sense?
> 
> Yes. Possibly that last could be broken in to two: when all stats were
> last reset, when some stats were last reset.

Believe it or not, we don't currently have a function to reset all stats. We
should definitely add that though, because the invocation to reset all stats
gets more ridiculous^Wcomplicated with each release.

I think the minimal invocation currently is something like:

-- reset all stats shared between databases
SELECT pg_stat_reset_shared('archiver');
SELECT pg_stat_reset_shared('bgwriter');
SELECT pg_stat_reset_shared('wal');
SELECT pg_stat_reset_replication_slot(NULL);
SELECT pg_stat_reset_slru(NULL);
SELECT pg_stat_reset_subscription_stats(NULL);

-- connect to each database and reset the stats in that database
SELECT pg_stat_reset();


I've protested against replication slot, slru, and subscription stats not being
resettable via pg_stat_reset_shared(), but nobody else seemed to care.


> > > Does redo update the stats?
> >
> > With "update" do you mean generate new stats? In the shared memory stats 
> > patch
> > it triggers stats to be dropped, on HEAD it just resets all stats at 
> > startup.
> >
> > Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do.
> 
> Well, I guess what I'm trying to figure out is what happens if we run
> in recovery for a long time -- say, a year -- and then get promoted.
> Do we have reasons to expect that the stats will be accurate enough to
> use at that point, or will they be way off?

What do you mean with 'accurate enough'?

With or without shared memory stats pg_stat_all_tables.{n_mod_since_analyze,
n_ins_since_vacuum, n_live_tup, n_dead_tup ...} will be zero. The replay
process doesn't update them.

In contrast to that, things like pg_stat_all_tables.{seq_scan, seq_tup_read,
idx_tup_fetch, ...} will be accurate, with one exception below.

pg_stat_bgwriter, pg_stat_wal, etc will always be accurate.


On HEAD, there may be a lot of dead stats for databases / tables /
functions that have been dropped since the start of the cluster. They will
eventually get removed, once autovacuum starts running in the respective
database (i.e. pgstat_vacuum_stat() gets run).

The exception noted above is that because pg_stat_all_tables contents are
never removed during recovery, it becomes a lot more plausible for oid
conflicts to occur. So the stats for two different tables might get added up
accidentally - but that'll just affect the non-zero columns, of course.
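
As a quick illustration (just an example query, not something from the patch),
running the following on a standby or a freshly promoted node shows the access
counters populated while the row-count style columns read zero:

  SELECT relname,
         seq_scan, seq_tup_read, idx_scan, idx_tup_fetch,  -- maintained during recovery
         n_live_tup, n_dead_tup, n_mod_since_analyze       -- not updated by replay
  FROM pg_stat_user_tables
  ORDER BY seq_scan + COALESCE(idx_scan, 0) DESC
  LIMIT 10;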


With the shared memory stats patch, stats for dropped objects (i.e. databases,
tables, ... ) are removed shortly after they have been dropped, so that
conflict risk doesn't exist anymore.


So I don't think increasing inaccuracy is a reason to throw away stats on
replica startup. Particularly because we already don't throw them away when
promoting the replica, just when having started it last.



> I don't have a great understanding of how this all works, but if
> running recovery for a long time is going to lead to a situation where
> the stats progressively diverge from reality, then preserving them
> doesn't seem as valuable as if they're going to be more or less
> accurate.

Minus the oid wraparound risk on HEAD, the only way they increasingly diverge
is that the '0' in a bunch of pg_stat_all_tables columns might get less and
less accurate. But that's not the type of divergence you're talking about, I
think.


Greetings,

Andres Freund




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-30 Thread David G. Johnston
On Wed, Mar 30, 2022 at 1:39 PM Andres Freund  wrote:

> Hi,
>
> On 2022-03-30 12:29:51 -0700, David G. Johnston wrote:
> > On Wednesday, March 30, 2022, Andres Freund  wrote:
> > > My current proposal is to just have two reset times. One for the
> contents
> > > of
> > > pg_stat_database (i.e. not affected by
> pg_stat_reset_single_*_counters()),
> > > and
> > > one for stats within the entire database.
>
> > What IS it affected by?  And does whatever affects it affect anything
> else?
>
> pg_stat_reset() resets the current database's stats. That includes the
> database's row in pg_stat_database and all table and function stats.
>
>
Right, so basically it updates both of the fields you are talking about.

The existing stats_reset field is also updated upon
calling pg_stat_reset_single_*_counters().

So when the two fields are different we know that at least one relation or
function statistic row is out of sync with the rest of the database; we
just don't know which one(s).  This is an improvement over the status quo,
where the single timestamp cannot be trusted to mean anything useful.  The
DBA can execute pg_stat_reset() to get the statistics back into a common
state.

As an added bonus we will always have a reference timestamp for when the
pg_stat_database database record was last reset (as well as any other
statistic record that can only be reset by using pg_stat_reset).

David J.


Re: CLUSTER on partitioned index

2022-03-30 Thread Alvaro Herrera
On 2022-Feb-23, Justin Pryzby wrote:

> I hope that Alvaro will comment on the simplified patches.  If multiple people
> think the patch isn't worth it, feel free to close it.  But I don't see how
> complexity could be the reason.

I gave your patch a look and it seems a reasonable thing to do.  Maybe
not terribly useful in most cases, but there may be some cases for which
it is.  I found some part of it a bit repetitive, so I moved things
around a bit.  What do you think about this?
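
For context, the sort of usage this enables (object names invented for the
example) is:

  -- with the patch, CLUSTER on a partitioned table recurses to each
  -- partition, using the matching partition of the given partitioned index
  CREATE TABLE meas (city_id int, logdate date) PARTITION BY RANGE (logdate);
  CREATE TABLE meas_y2021 PARTITION OF meas
      FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
  CREATE TABLE meas_y2022 PARTITION OF meas
      FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
  CREATE INDEX meas_logdate_idx ON meas (logdate);

  CLUSTER meas USING meas_logdate_idx;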

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 86f5fdc469..b3463ae5c4 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -196,6 +196,12 @@ CLUSTER [VERBOSE]
 in the pg_stat_progress_cluster view. See
  for details.
   
+
+   
+Clustering a partitioned table clusters each of its partitions using the
+partition of the specified partitioned index (which must be specified).
+   
+
  
 
  
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 02a7e94bf9..8417cbdb67 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
@@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 			bool verbose, bool *pSwapToastByContent,
 			TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+			   Oid indexOid);
+static void cluster_multiple_rels(List *rvs, ClusterParams *params);
 
 
 /*---
@@ -105,6 +110,10 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	ListCell   *lc;
 	ClusterParams params = {0};
 	bool		verbose = false;
+	Relation	rel = NULL;
+	Oid			indexOid = InvalidOid;
+	MemoryContext cluster_context;
+	List	   *rtcs;
 
 	/* Parse option list */
 	foreach(lc, stmt->params)
@@ -126,11 +135,13 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	if (stmt->relation != NULL)
 	{
 		/* This is the single-relation case. */
-		Oid			tableOid,
-	indexOid = InvalidOid;
-		Relation	rel;
+		Oid			tableOid;
 
-		/* Find, lock, and check permissions on the table */
+		/*
+		 * Find, lock, and check permissions on the table.  We obtain
+		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+		 * single-transaction case.
+		 */
 		tableOid = RangeVarGetRelidExtended(stmt->relation,
 			AccessExclusiveLock,
 			0,
@@ -146,14 +157,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -188,71 +191,100 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 stmt->indexname, stmt->relation->relname)));
 		}
 
-		/* close relation, keep lock till commit */
-		table_close(rel, NoLock);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* close relation, keep lock till commit */
+			table_close(rel, NoLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, );
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, );
+
+			return;
+		}
+	}
+
+	/*
+	 * By here, we know we are in a multi-table situation.  In order to avoid
+	 * holding locks for too long, we want to process each table in its own
+	 * transaction.  This forces us to disallow running inside a user
+	 * transaction block.
+	 */
+	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+	/* Also, we need a memory context to hold our list of relations */
+	cluster_context = AllocSetContextCreate(PortalContext,
+			"Cluster",
+			ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Either we're processing a partitioned table, or we were not given any
+	 * table name at all.  In either case, obtain a list of relations to
+	 * process.
+	 *
+	 * In 

Re: Higher level questions around shared memory stats

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 21:44:20 +0200, Peter Eisentraut wrote:
> On 29.03.22 23:01, Andres Freund wrote:
> > > I think what's actually most important here is the error reporting. We 
> > > need
> > > to make it clear, at least via log messages, that something bad has
> > > happened.
> > The message currently (on HEAD, but similarly on the patch) is:
> > ereport(pgStatRunningInCollector ? LOG : WARNING,
> >         (errmsg("corrupted statistics file \"%s\"",
> >                 statfile)));
> 
> Corrupted how?

We can't parse it. That can mean that it's truncated (we notice this because
we expect an 'E' as the last byte), or that bits flipped in the wrong place
(there are different bytes indicating different types of stats). Corruption
within individual stats isn't detected.

Note that this is very old code / behaviour, not meaningfully affected by
shared memory stats patch.


> How does it know?  Is there a checksum, was the file the wrong length, what
> happened?  I think this could use more detail.

I agree. But it's independent of the shared memory stats patch, so I don't
want to tie improving it to that already huge patch.

Greetings,

Andres Freund




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 12:29:51 -0700, David G. Johnston wrote:
> On Wednesday, March 30, 2022, Andres Freund  wrote:
> > My current proposal is to just have two reset times. One for the contents
> > of
> > pg_stat_database (i.e. not affected by pg_stat_reset_single_*_counters()),
> > and
> > one for stats within the entire database.

> What IS it affected by?  And does whatever affects it affect anything else?

pg_stat_reset() resets the current database's stats. That includes the
database's row in pg_stat_database and all table and function stats.

Greetings,

Andres Freund




Re: Correct docs re: rewriting indexes when table rewrite is skipped

2022-03-30 Thread James Coleman
On Wed, Mar 30, 2022 at 11:41 AM Robert Haas  wrote:
>
> On Wed, Mar 30, 2022 at 10:04 AM James Coleman  wrote:
> > Admittedly I hadn't thought of that case. But isn't it already covered
> > in the existing docs by the phrase "or an unconstrained domain over
> > the new type"? I don't love the word "or" there because there's a
> > sense in which the first clause "binary coercible to the new type" is
> > still accurate for your example unless you narrowly separate "domain"
> > and "type", but I think that narrow distinction is what's technically
> > there already.
> >
> > That being said, I could instead of removing the clause entirely
> > replace it with something like "indexes may still need to be rebuilt
> > when the new type is a constrained domain".
>
> You're talking about this as if the normal cases is that indexes don't
> need to rebuilt, but sometimes they do. However, if I recall
> correctly, the way the code is structured, it supposes that the
> indexes do need a rebuild, and then tries to prove that they actually
> don't. That disconnect makes me nervous, because it seems to me that
> it could easily lead to a situation where our documentation contains
> subtle errors. I think somebody should go through and study the
> algorithm as it exists in the code, and then write documentation to
> match it. And I think that when they do that, they should approach it
> from the same point of view that the code does i.e. "these are the
> conditions for skipping the index rebuild" rather than "these are the
> conditions for performing an index rebuild." By doing it that way, I
> think we minimize the likelihood of disconnects between code and
> documentation, and also make it easier to update in future if the
> algorithm gets changed.

Hmm, having it match the way it works makes sense. Would you feel
comfortable with an intermediate step (queueing up that as a larger
change) changing the clause to something like "indexes will still have
to be rebuilt unless the system can guarantee that the sort order is
proven to be unchanged" (with appropriate wordsmithing to be a bit
less verbose if possible)?
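
For what it's worth, a quick way to observe what the current code does on a
scratch database (object names here are arbitrary) is to watch the relfilenodes
across the type changes:

  CREATE DOMAIN d_plain AS text;
  CREATE DOMAIN d_checked AS text CHECK (length(VALUE) < 100);
  CREATE TABLE rewrite_test (v text);
  CREATE INDEX rewrite_test_v_idx ON rewrite_test (v);

  SELECT relname, relfilenode FROM pg_class WHERE relname LIKE 'rewrite_test%';

  -- binary-coercible, unconstrained domain: relfilenodes should stay the same
  ALTER TABLE rewrite_test ALTER COLUMN v TYPE d_plain;
  SELECT relname, relfilenode FROM pg_class WHERE relname LIKE 'rewrite_test%';

  -- constrained domain: if the relfilenodes change, the index was rebuilt
  ALTER TABLE rewrite_test ALTER COLUMN v TYPE d_checked;
  SELECT relname, relfilenode FROM pg_class WHERE relname LIKE 'rewrite_test%';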

Thanks,
James Coleman




Re: refactoring basebackup.c (zstd workers)

2022-03-30 Thread Justin Pryzby
On Wed, Mar 30, 2022 at 04:14:47PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> >> Maybe if I just put that last sentence into the comment it's clear enough?
> 
> > Done that way, since I thought it was better to fix the bug than wait
> > for more feedback on the wording. We can still adjust the wording, or
> > the coding, if it's not clear enough.
> 
> FWIW, I thought that explanation was fine, but I was deferring to
> Justin who was the one who thought things were unclear.

I still think it's unnecessarily confusing to nest "if" and "?:" conditionals
in one statement, instead of 2 or 3 separate "if"s, or "||"s.
But it's also not worth fussing over any more.




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Tom Lane
Robert Haas  writes:
> I'm going to go ahead and commit this test script later on this
> afternoon unless there are vigorous objections real soon now, and then
> if somebody wants to improve it, great!

I see you did that, but the CF entry is still open?

regards, tom lane




Re: refactoring basebackup.c (zstd workers)

2022-03-30 Thread Tom Lane
Robert Haas  writes:
>> Maybe if I just put that last sentence into the comment it's clear enough?

> Done that way, since I thought it was better to fix the bug than wait
> for more feedback on the wording. We can still adjust the wording, or
> the coding, if it's not clear enough.

FWIW, I thought that explanation was fine, but I was deferring to
Justin who was the one who thought things were unclear.

regards, tom lane




Re: refactoring basebackup.c (zstd workers)

2022-03-30 Thread Robert Haas
On Tue, Mar 29, 2022 at 8:51 AM Robert Haas  wrote:
> On Mon, Mar 28, 2022 at 8:11 PM Tom Lane  wrote:
> > I suspect Robert wrote it that way intentionally --- but if so,
> > I agree it could do with more than zero commentary.
>
> Well, the point is, we stop advancing kwend when we get to the end of
> the keyword, and *vend when we get to the end of the value. If there's
> a value, the end of the keyword can't have been the end of the string,
> but the end of the value might have been. If there's no value, the end
> of the keyword could be the end of the string.
>
> Maybe if I just put that last sentence into the comment it's clear enough?

Done that way, since I thought it was better to fix the bug than wait
for more feedback on the wording. We can still adjust the wording, or
the coding, if it's not clear enough.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Higher level questions around shared memory stats

2022-03-30 Thread Peter Eisentraut

On 29.03.22 23:01, Andres Freund wrote:

I think what's actually most important here is the error reporting. We need
to make it clear, at least via log messages, that something bad has
happened.

The message currently (on HEAD, but similarly on the patch) is:
ereport(pgStatRunningInCollector ? LOG : WARNING,
        (errmsg("corrupted statistics file \"%s\"",
                statfile)));


Corrupted how?  How does it know?  Is there a checksum, was the file the 
wrong length, what happened?  I think this could use more detail.






Re: Commitfest Update

2022-03-30 Thread Peter Eisentraut

On 30.03.22 20:41, Greg Stark wrote:

Patches that are Waiting on Author and haven't had activity in months
-- traditionally they were set to Returned with Feedback. It seems the
feeling these days is to not lose state on them and just move them to
the next CF. I'm not sure that's wise, it ends up just filling up the
list with patches nobody's working on.


If you set them to Returned with Feedback now, they can still be 
reawoken later by setting them to Needs Review and pulling them into the 
then-next commit fest.  That preserves all the state and context.





Re: Possible fails in pg_stat_statements test

2022-03-30 Thread Robert Haas
On Wed, Mar 30, 2022 at 2:20 AM Anton A. Melnikov  wrote:
> > Can the test failures be encountered without such an elaborate setup? If 
> > not,
> > then I don't really see why we need to do anything here?
>
> There was a real bug report from our test department. They do long time
> repetitive tests and sometimes met this failure.
> So i suppose there is a non-zero probability that such error can occur
> in the one-shot test as well.
> The sequence given in the first letter helps to catch this failure quickly.

I don't think that the idea of "extra" WAL records is very principled.
It's pretty vague what "extra" means, and your definition seems to be
basically "whatever would be needed to make this test case pass." I
think the problem is basically with the test cases's idea that # of
WAL records and # of table rows ought to be equal. I think that's just
false. In general, we'd also have to worry about index insertions,
which would provoke variable numbers of WAL records depending on
whether they cause a page split. And we'd have to worry about TOAST
table insertions, which could produce different numbers of records
depending on the size of the data, the configured block size and TOAST
threshold, and whether the TOAST table index incurs a page split. So
even if we added a mechanism like what you propose here, we would only
be fixing this particular test case, not creating infrastructure of
any general utility.

If it's true that this test case sometimes randomly fails, then we
ought to fix that somehow, maybe by just removing this particular
check from the test case, or changing it to >=, or something like
that. But I don't think adding a new counter is the right idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-30 Thread David G. Johnston
On Wednesday, March 30, 2022, Andres Freund  wrote:

>
> My current proposal is to just have two reset times. One for the contents
> of
> pg_stat_database (i.e. not affected by pg_stat_reset_single_*_counters()),
> and
> one for stats within the entire database.
>
>
What IS it affected by?  And does whatever affects it affect anything else?

David J.


Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 14:57:25 -0400, Robert Haas wrote:
> On Wed, Mar 23, 2022 at 8:55 PM Andres Freund  wrote:
> > This behaviour can be trivially (and is) implemented for the shared memory
> > stats patch. But every time I read over that part of the code it feels just
> > profoundly wrong to me.  Way worse than *not* resetting
> > pg_stat_database.stats_reset.
> >
> > Anybody that uses the time since the stats reset as part of a calculation of
> > transactions / sec, reads / sec or such will get completely bogus results
> > after a call to pg_stat_reset_single_table_counters().
> 
> Sure, but that's unavoidable anyway. If some stats have been reset and
> other stats have not, you can't calculate a meaningful average over
> time unless you have a specific reset time for each statistic.

Individual pg_stat_database columns can't be reset independently. Other views
summarizing large parts of the system, like pg_stat_bgwriter, pg_stat_wal etc.,
have a stats_reset column that is only reset if their counters are also
reset. So the only reason we can't do that for pg_stat_database is that we
don't know since when the pg_stat_database counters have been counting.


> To me, the current behavior feels more correct than what you propose.
> Imagine for example that you are looking for tables/indexes where the
> counters are 0 as a way of finding unused objects. If you know that no
> counters have been zeroed in a long time, you know that this is
> reliable. But under your proposal, there's no way to know this. All
> you know is that the entire system wasn't reset, and therefore some of
> the 0s that you are seeing might be for individual objects that were
> reset.

My current proposal is to just have two reset times. One for the contents of
pg_stat_database (i.e. not affected by pg_stat_reset_single_*_counters()), and
one for stats within the entire database.


> I think of this mechanism like as answering the question "When's the
> last time anybody tinkered with this thing by hand?". If it's recent,
> the tinkering has a good chance of being related to whatever problem
> I'm trying to solve. If it's not, it's probably unrelated.

When I look at a database with a problem, I'll often look at pg_stat_database
to get a first impression of the type of workload running. The fact that
stats_reset doesn't reflect the age of other pg_stat_database columns makes
that much harder.
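
For instance (an illustrative query, not part of the patch), the usual
back-of-the-envelope rate calculation only makes sense if stats_reset matches
the age of the counters next to it:

  SELECT datname,
         round(xact_commit  / GREATEST(extract(epoch FROM now() - stats_reset), 1)) AS commits_per_sec,
         round(tup_returned / GREATEST(extract(epoch FROM now() - stats_reset), 1)) AS tup_returned_per_sec,
         round(blks_read    / GREATEST(extract(epoch FROM now() - stats_reset), 1)) AS blks_read_per_sec
  FROM pg_stat_database
  WHERE datname = current_database();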

Greetings,

Andres Freund




Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))

2022-03-30 Thread Justin Pryzby
On Sat, Feb 26, 2022 at 03:07:20PM -0600, Justin Pryzby wrote:
> Rebased over ebf6c5249b7db525e59563fb149642665c88f747.
> It looks like that patch handles only query_id, and this patch also tries to
> handle a bunch of other stuff.
> 
> If it's helpful, feel free to kick this patch to a future CF.

Rebased over MERGE.
>From 058bf205cbac68baa6ff539893d934115d9927f9 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 22 Feb 2020 21:17:10 -0600
Subject: [PATCH 1/6] Add GUC: explain_regress

This changes the defaults for explain to: costs off, timing off, summary off.
It'd be reasonable to use this for new regression tests which are not intended
to be backpatched.
---
 contrib/postgres_fdw/postgres_fdw.c |  2 +-
 src/backend/commands/explain.c  | 13 +++--
 src/backend/utils/misc/guc.c| 13 +
 src/bin/psql/describe.c |  2 +-
 src/include/commands/explain.h  |  2 ++
 src/test/regress/expected/explain.out   |  3 +++
 src/test/regress/expected/inherit.out   |  2 +-
 src/test/regress/expected/stats_ext.out |  2 +-
 src/test/regress/pg_regress.c   |  3 ++-
 src/test/regress/sql/explain.sql|  4 
 src/test/regress/sql/inherit.sql|  2 +-
 src/test/regress/sql/stats_ext.sql  |  2 +-
 12 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 56654844e8f..54da40d6628 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3124,7 +3124,7 @@ estimate_path_cost_size(PlannerInfo *root,
 		 * values, so don't request params_list.
 		 */
 		initStringInfo();
-		appendStringInfoString(, "EXPLAIN ");
+		appendStringInfoString(, "EXPLAIN (COSTS)");
 		deparseSelectStmtForRel(, root, foreignrel, fdw_scan_tlist,
 remote_conds, pathkeys,
 fpextra ? fpextra->has_final_sort : false,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index cb13227db1f..de107ea5026 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -154,6 +154,7 @@ static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
+bool explain_regress = false; /* GUC */
 
 
 /*
@@ -172,6 +173,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	ListCell   *lc;
 	bool		timing_set = false;
 	bool		summary_set = false;
+	bool		costs_set = false;
 
 	/* Parse options list. */
 	foreach(lc, stmt->options)
@@ -183,7 +185,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 		else if (strcmp(opt->defname, "verbose") == 0)
 			es->verbose = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "costs") == 0)
+		{
+			/* Need to keep track if it was explicitly set to ON */
+			costs_set = true;
 			es->costs = defGetBoolean(opt);
+		}
 		else if (strcmp(opt->defname, "buffers") == 0)
 			es->buffers = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "wal") == 0)
@@ -227,13 +233,16 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	 parser_errposition(pstate, opt->location)));
 	}
 
+	/* if the costs option was not set explicitly, set default value */
+	es->costs = (costs_set) ? es->costs : es->costs && !explain_regress;
+
 	if (es->wal && !es->analyze)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("EXPLAIN option WAL requires ANALYZE")));
 
 	/* if the timing was not set explicitly, set default value */
-	es->timing = (timing_set) ? es->timing : es->analyze;
+	es->timing = (timing_set) ? es->timing : es->analyze && !explain_regress;
 
 	/* check that timing is used with EXPLAIN ANALYZE */
 	if (es->timing && !es->analyze)
@@ -242,7 +251,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
  errmsg("EXPLAIN option TIMING requires ANALYZE")));
 
 	/* if the summary was not set explicitly, set default value */
-	es->summary = (summary_set) ? es->summary : es->analyze;
+	es->summary = (summary_set) ? es->summary : es->analyze && !explain_regress;
 
 	query = castNode(Query, stmt->query);
 	if (IsQueryIdEnabled())
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eb3a03b9762..acd32fc229c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -47,6 +47,7 @@
 #include "catalog/pg_authid.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/explain.h"
 #include "commands/prepare.h"
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
@@ -1478,6 +1479,18 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+
+	{
+		{"explain_regress", PGC_USERSET, DEVELOPER_OPTIONS,
+			gettext_noop("Change defaults of EXPLAIN to avoid unstable output."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_EXPLAIN
+		},
+		_regress,
+		false,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 		

Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Tom Lane
Andres Freund  writes:
> On 2022-03-30 13:16:47 -0400, Robert Haas wrote:
>> On Wed, Mar 30, 2022 at 1:11 PM Andres Freund  wrote:
>>> My concern is about the quote in the middle of the path, not about quoting 
>>> at all... I.e. the ' should be after /tmp_install, not before.

>> Makes no difference. We know that the string /tmp_install contains no
>> shell metacharacters, so why does it need to be in quotes? I would've
>> probably written it the way it is here, rather than what you are
>> proposing.

> It looks ugly, and it can't be copy-pasted as easily. Seems I'm alone on this,
> so I'll leave it be...

FWIW, I agree with Andres that I'd probably have put the quote
at the end.  But Robert is right that it's functionally equivalent;
so I doubt it's worth changing.

regards, tom lane




Re: Patch to avoid orphaned dependencies

2022-03-30 Thread Nathan Bossart
On Wed, Mar 23, 2022 at 12:49:06PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> Bertand, do you think this has any chance of making it into v15?  If not,
>> are you alright with adjusting the commitfest entry to v16 and moving it to
>> the next commitfest?
> 
> I looked this over briefly, and IMO it should have no chance of being
> committed in anything like this form.

I marked the commitfest entry as waiting-on-author, set the target version
to v16, and moved it to the next commitfest.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 13:16:47 -0400, Robert Haas wrote:
> On Wed, Mar 30, 2022 at 1:11 PM Andres Freund  wrote:
> > On March 30, 2022 9:58:26 AM PDT, Tom Lane  wrote:
> > >Andres Freund  writes:
> > >> Random aside: Am I the only one bothered by a bunch of places in
> > >> Makefile.global.in quoting like
> > >>   $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install 
> > >> install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
> > >> and
> > >>   rm -rf '$(CURDIR)'/tmp_check &&
> > >> etc
> > >
> > >Don't we need that to handle, say, build paths with spaces in them?
> >
> > My concern is about the quote in the middle of the path, not about quoting 
> > at all... I.e. the ' should be after /tmp_install, not before.
> 
> Makes no difference. We know that the string /tmp_install contains no
> shell metacharacters, so why does it need to be in quotes? I would've
> probably written it the way it is here, rather than what you are
> proposing.

It looks ugly, and it can't be copy-pasted as easily. Seems I'm alone on this,
so I'll leave it be...

Greetings,

Andres Freund




Re: proposal: enhancing plpgsql debug API - returns text value of variable content

2022-03-30 Thread Tom Lane
Greg Stark  writes:
> It looks like this is -- like a lot of plpgsql patches -- having
> difficulty catching the attention of reviewers and committers.

I was hoping that someone with more familiarity with pldebugger
would comment on the suitableness of this patch for their desires.
But nobody's stepped up, so I took a look through this.  It looks
like there are several different things mashed into this patch:

1. Expose exec_eval_datum() to plugins.  OK; I see that pldebugger
has code that duplicates that functionality (and not terribly well).

2. Expose do_cast_value() to plugins.  Mostly OK, but shouldn't we
expose exec_cast_value() instead?  Otherwise it's on the caller
to make sure it doesn't ask for a no-op cast, which seems like a
bad idea; not least because the example usage in get_string_value()
fails to do so.

3. Store relevant PLpgSQL_nsitem chain link in each PLpgSQL_stmt.
This makes me itch, for a number of reasons:
* I was a bit astonished that it even works; I'd thought that the
nsitem data structure is transient data thrown away when we finish
compiling.  I see now that that's not so, but do we really want to
nail down that that can't ever be improved?
* This ties us forevermore to the present, very inefficient, nsitem
list data structure.  Sooner or later somebody is going to want to
improve that linear search, and what then?
* The space overhead seems nontrivial; many PLpgSQL_stmt nodes are
not very big.
* The code implications are way more subtle than you would think
from inspecting this totally-comment-free patch implementation.
In particular, the fact that the nsitem chain pointed to by a
plpgsql_block is the right thing depends heavily on exactly where
in the parse sequence we capture the value of plpgsql_ns_top().
That could be improved with a comment, perhaps.

I think that using the PLpgSQL_nsitem chains to look up variables
in a debugger is just the wrong thing.  The right thing is to
crawl up the statement tree, and when you see a PLpgSQL_stmt_block
or loop construct, examine the associated datums.  I'll concede
that crawling *up* the tree is hard, as we only store down-links.
Now a plugin could fix that by itself, by recursively traversing the
statement tree one time and recording parent relationships in its own
data structure (say, an array of parent-statement pointers indexed by
stmtid).  Or we could add parent links in the statement tree, though
I remain concerned about the space cost.  On the whole I prefer the
first way, because (a) we don't pay the overhead when it's not needed,
and (b) a plugin could use it even in existing release branches.

BTW, crawling up the statement tree would also be a far better answer
than what's shown in the patch for locating surrounding for-loops.

So my inclination is to accept the additional function pointers
(modulo pointing to exec_cast_value) but reject the nsitem additions.

Not sure what to do with test_dbgapi.  There's some value in exercising
the find_rendezvous_variable mechanism, but I'm dubious that that
justifies a whole test module.

regards, tom lane




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-30 Thread Robert Haas
On Wed, Mar 23, 2022 at 8:55 PM Andres Freund  wrote:
> This behaviour can be trivially (and is) implemented for the shared memory
> stats patch. But every time I read over that part of the code it feels just
> profoundly wrong to me.  Way worse than *not* resetting
> pg_stat_database.stats_reset.
>
> Anybody that uses the time since the stats reset as part of a calculation of
> transactions / sec, reads / sec or such will get completely bogus results
> after a call to pg_stat_reset_single_table_counters().

Sure, but that's unavoidable anyway. If some stats have been reset and
other stats have not, you can't calculate a meaningful average over
time unless you have a specific reset time for each statistic.

To me, the current behavior feels more correct than what you propose.
Imagine for example that you are looking for tables/indexes where the
counters are 0 as a way of finding unused objects. If you know that no
counters have been zeroed in a long time, you know that this is
reliable. But under your proposal, there's no way to know this. All
you know is that the entire system wasn't reset, and therefore some of
the 0s that you are seeing might be for individual objects that were
reset.

I think of this mechanism like as answering the question "When's the
last time anybody tinkered with this thing by hand?". If it's recent,
the tinkering has a good chance of being related to whatever problem
I'm trying to solve. If it's not, it's probably unrelated.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Avoiding smgrimmedsync() during nbtree index builds

2022-03-30 Thread Greg Stark
It looks like this patch received a review from Andres and hasn't been
updated since. I'm not sure, but from the review it looks to me like it's
not ready to commit and needs some cleanup, or at least a check on a few
things.

I guess it's not going to get bumped in the next few days so I'll move
it to the next CF.




Re: Commitfest Update

2022-03-30 Thread Robert Haas
On Wed, Mar 30, 2022 at 2:42 PM Greg Stark  wrote:
> Patches that are Waiting on Author and haven't had activity in months
> -- traditionally they were set to Returned with Feedback. It seems the
> feeling these days is to not lose state on them and just move them to
> the next CF. I'm not sure that's wise, it ends up just filling up the
> list with patches nobody's working on.

Yes, we should mark those Returned with Feedback or some other status
that causes them not to get carried forward. The CF is full of stuff
that isn't likely to get committed any time in the foreseeable future,
and that's really unhelpful.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: range_agg with multirange inputs

2022-03-30 Thread Peter Eisentraut

This patch has been committed.  I split it into a few pieces.

On 12.03.22 04:18, Paul Jungwirth wrote:

On 3/10/22 14:07, Chapman Flack wrote:

When I apply this patch, I get a func.sgml with two entries for
range_intersect_agg(anymultirange).


Arg, fixed.


In range_agg_transfn, you've changed the message in the "must be called
with a range or multirange"; that seems like another good candidate to
be an elog.


Agreed. Updated here.


I kept those messages as "range" or "multirange" separately, instead of 
"range or multirange".  This way, we don't have to update all the 
messages of this kind when a new function is added.  Since these are 
only internal messages anyway, I opted for higher maintainability.





Re: Higher level questions around shared memory stats

2022-03-30 Thread Robert Haas
On Tue, Mar 29, 2022 at 5:01 PM Andres Freund  wrote:
> I think it's reasonably rare because in cases there'd be corruption, we'd
> typically not even have written them out / throw them away explicitly - we
> only read stats when starting without crash recovery.
>
> So the "expected" case of corruption afaicts solely is a OS crash just after
> the shutdown checkpoint completed?

Can we prevent that case from occurring, so that there are no expected cases?

> > And maybe we should have, inside the stats system, something that
> > keeps track of when the stats file was last recreated from scratch because
> > of a corruption event, separately from when it was last intentionally reset.
>
> That would be easy to add. Don't think we have a good place to show the
> information right now - perhaps just new functions not part of any view?

I defer to you on where to put it.

> I can think of these different times:
>
> - Last time stats were removed due to starting up in crash recovery
> - Last time stats were created from scratch, because no stats data file was
>   present at startup
> - Last time stats were thrown away due to corruption
> - Last time a subset of stats were reset using one of the pg_reset* functions
>
> Makes sense?

Yes. Possibly that last could be broken in to two: when all stats were
last reset, when some stats were last reset.

> > Does redo update the stats?
>
> With "update" do you mean generate new stats? In the shared memory stats patch
> it triggers stats to be dropped, on HEAD it just resets all stats at startup.
>
> Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do.

Well, I guess what I'm trying to figure out is what happens if we run
in recovery for a long time -- say, a year -- and then get promoted.
Do we have reasons to expect that the stats will be accurate enough to
use at that point, or will they be way off?

I don't have a great understanding of how this all works, but if
running recovery for a long time is going to lead to a situation where
the stats progressively diverge from reality, then preserving them
doesn't seem as valuable as if they're going to be more or less
accurate.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Commitfest Update

2022-03-30 Thread Greg Stark
Since the previous update 30 additional patches have been committed
from the CF. This leaves us with 120 Needs Review and 20 Ready for
Committer. There's only a few days left until the end of the month.

* Add comment about startup process getting a free procState array slot always
* Consistent use of SSL/TLS in docs
* Allow COPY "text" to output a header and add header matching mode to COPY FROM
* Enable SSL library detection via PQsslAttribute
* Fully WAL logged CREATE DATABASE - No Checkpoints
* Skipping logical replication transactions on subscriber side
* Add system view pg_ident_file_mappings
* document the need to analyze partitioned tables
* Expose get_query_def()
* JSON path numeric literal syntax
* use has_privs_for_role for predefined roles
* Support for MERGE
* Column filtering in logical replication
* Corruption during WAL replay
* Doc patch for retryable xacts
* pg_statio_all_tables: several rows per table due to invalid TOAST index
* Parameter for planner estimates of recursive queries
* logical decoding and replication of sequences
* Add relation and block-level filtering to pg_waldump
* pgbench - allow retries on some errors
* pgcrypto: Remove internal padding implementation
* Preserving db/ts/relfilenode OIDs across pg_upgrade
* Add new reloption to views for enabling row level security
* Fix handling of outer GroupingFunc within subqueries
* Fix concurrent deadlock scenario with DROP INDEX on partitioned index
* Fix bogus dependency management for GENERATED expressions
* Fix firing of RI triggers during cross-partition updates of partitioned tables
* Add fix to table_to_xmlschema regex when timestamp has time zone
* ltree_gist indexes broken after pg_upgrade from 12
* ExecTypeSetColNames is fundamentally broken


I'm going to start moving any patches that are Waiting on Author to
the next CF if they made any progress recently.

Patches that are Waiting on Author and haven't had activity in months
-- traditionally they were set to Returned with Feedback. It seems the
feeling these days is to not lose state on them and just move them to
the next CF. I'm not sure that's wise, it ends up just filling up the
list with patches nobody's working on.

-- 
greg




Re: Mark all GUC variable as PGDLLIMPORT

2022-03-30 Thread Robert Haas
On Fri, Feb 18, 2022 at 7:02 PM Andres Freund  wrote:
> On 2022-02-15 08:06:58 -0800, Andres Freund wrote:
> > The more I think about it the more I'm convinced that if we want to do this,
> > we should do it for variables and functions.
>
> Btw, if we were to do this, we should just use -fvisibility=hidden everywhere
> and would see the same set of failures on unixoid systems as on windows. Of
> course only in in-core extensions, but it'd still be better than nothing.

Let's be less ambitious for this release, and just get the variables
marked with PGDLLIMPORT. We seem to have consensus to create parity
between Windows and non-Windows builds, which means precisely applying
PGDLLIMPORT to variables marked in header files, and nothing more. The
merits of -fvisibility=hidden or PGDLLIMPORT on functions are a
separate question that can be debated on its own merits, but I don't
want that larger discussion to bog down this effort. Here are updated
patches for that.

@RMT: Andres proposed upthread that we should plan to do this just
after feature freeze. Accordingly I propose to commit at least 0002
and perhaps 0001 if people want it just after feature freeze. I
therefore ask that the RMT either (a) regard this change as not being
a feature (and thus not subject to the freeze) or (b) give it a 1-day
extension. The reason for committing it just after freeze is to
minimize the number of conflicts that it creates for other patches.
The reason why that's probably an OK thing to do is that applying
PGDLLIMPORT markings is low-risk.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Dumb-script-to-apply-PGDLLIMPORT-markings.patch
Description: Binary data


v2-0002-Plaster-PGDLLIMPORT-declarations-on-header-files.patch
Description: Binary data


Re: Logical insert/update/delete WAL records for custom table AMs

2022-03-30 Thread Jeff Davis
On Thu, 2022-02-24 at 20:35 +, Simon Riggs wrote:
> The approach is perfectly fine, it just needs to be finished and
> rebased.

Attached a new version. The scope expanded, so this is likely to slip
v15 at this late time. For 15, I'll focus on my extensible rmgr work,
which can serve similar purposes.

The main purpose of this patch is to be able to emit logical events for
a table (insert/update/delete/truncate) without actually modifying a
table or relying on the heap at all. That allows table AMs to support
logical decoding/replication, and perhaps allows a few other
interesting use cases (maybe foreign tables?). There are really two
advantages of this approach over relying on a custom rmgr:

  1. Easier for extension authors
  2. Doesn't require an extension module to be loaded to start the
server

Those are very nice advantages, but they come at the price of
flexibility and performance. I think there's room for both, and we can
discuss the merits individually.

Changes:
  * Support logical messages for INSERT/UPDATE/DELETE/TRUNCATE
(before it only supported INSERT)
  * SQL functions pg_logical_emit_insert/update/delete/truncate
(callable by superuser)
  * Tests (using test_decoding)
  * Use replica identities properly
  * In general a lot of cleanup, but still not quite ready

TODO:
  * Should SQL functions be callable by the table owner? I would lean
toward superuser-only, because logical replication is used for
administrative purposes like upgrades, and I don't think we want table
owners to be able to create inconsistencies.
  * Support for multi-insert
  * Docs for SQL functions, and maybe docs in the section on Generic
WAL
  * Try to get away from reliance on heap tuples specifically

Regards,
Jeff Davis

From f7b85aea60b06eb7019befec38566b5e014bea12 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Sat, 30 Oct 2021 12:07:35 -0700
Subject: [PATCH] Logical wal.

---
 contrib/test_decoding/expected/messages.out   | 148 +++
 contrib/test_decoding/sql/messages.sql|  58 +++
 src/backend/access/heap/heapam.c  |   4 +-
 src/backend/access/rmgrdesc/Makefile  |   2 +-
 .../{logicalmsgdesc.c => logicaldesc.c}   |  45 +-
 src/backend/access/transam/rmgr.c |   2 +-
 src/backend/replication/logical/Makefile  |   2 +-
 src/backend/replication/logical/decode.c  | 275 ++--
 .../replication/logical/logical_xlog.c| 399 ++
 .../replication/logical/logicalfuncs.c| 165 +++-
 src/backend/replication/logical/message.c |  89 
 src/bin/pg_waldump/.gitignore |   2 +-
 src/bin/pg_waldump/rmgrdesc.c |   2 +-
 src/include/access/heapam.h   |   2 +
 src/include/access/heapam_xlog.h  |   3 +
 src/include/access/rmgrlist.h |   2 +-
 src/include/catalog/pg_proc.dat   |  17 +
 src/include/replication/decode.h  |   2 +-
 src/include/replication/logical_xlog.h| 124 ++
 src/include/replication/message.h |  41 --
 20 files changed, 1211 insertions(+), 173 deletions(-)
 rename src/backend/access/rmgrdesc/{logicalmsgdesc.c => logicaldesc.c} (59%)
 create mode 100644 src/backend/replication/logical/logical_xlog.c
 delete mode 100644 src/backend/replication/logical/message.c
 create mode 100644 src/include/replication/logical_xlog.h
 delete mode 100644 src/include/replication/message.h

diff --git a/contrib/test_decoding/expected/messages.out b/contrib/test_decoding/expected/messages.out
index c75d40190b6..aa284bc37c2 100644
--- a/contrib/test_decoding/expected/messages.out
+++ b/contrib/test_decoding/expected/messages.out
@@ -91,6 +91,154 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'for
 --
 (0 rows)
 
+-- no data in this table, but emit logical INSERT/UPDATE/DELETE for it
+CREATE TABLE dummy(i int, t text, n numeric, primary key(t));
+SELECT pg_logical_emit_insert('dummy', row(1, 'one', 0.1)::dummy) <> '0/0'::pg_lsn;
+ ?column? 
+--
+ t
+(1 row)
+
+SELECT pg_logical_emit_insert('dummy', row(2, 'two', 0.2)::dummy) <> '0/0'::pg_lsn;
+ ?column? 
+--
+ t
+(1 row)
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'force-binary', '0', 'skip-empty-xacts', '1', 'include-xids', '0');
+ data  
+---
+ BEGIN
+ table public.dummy: INSERT: i[integer]:1 t[text]:'one' n[numeric]:0.1
+ COMMIT
+ BEGIN
+ table public.dummy: INSERT: i[integer]:2 t[text]:'two' n[numeric]:0.2
+ COMMIT
+(6 rows)
+
+SELECT * FROM dummy;
+ i | t | n 
+---+---+---
+(0 rows)
+
+SELECT pg_logical_emit_delete('dummy', row(12, 'twelve', 0.12)::dummy) <> '0/0'::pg_lsn;
+ ?column? 
+--
+ t
+(1 row)
+
+SELECT pg_logical_emit_update('dummy', row(15, 'fifteen', 0.15)::dummy,
+   row(16, 'sixteen', 0.16)::dummy) <> 

Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Thomas Munro
On Thu, Mar 31, 2022 at 5:23 AM Andres Freund  wrote:
> On 2022-03-26 13:51:35 -0700, Andres Freund wrote:
> > Prototype patch attached.
>
> Because I forgot about cfbot when attaching the patch, cfbot actually ran with
> it under this thread's cf entry.  It does look like an improvement to me:
> https://cirrus-ci.com/task/6397292629458944?logs=test_world#L900
>
> We certainly can do better, but it's sufficiently better than what we have
> right now. So I'd like to commit it?

Nice, this will save a lot of time scrolling around trying to figure
out what broke.

> > Would we want to do this in all branches? I'd vote for yes, but ...
>
> Unless somebody speaks in favor of doing this across branches, I'd just go for
> HEAD.

I don't see any reason not to do it on all branches.  If anyone is
machine-processing the output and cares about format changes they will
be happy about the improvement.




Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2022-03-30 Thread Greg Stark
This patch is now failing in the sqljson regression test. It looks
like it's just the ordering of the elements in json_arrayagg() calls,
which may actually be a faulty test that needs more ORDER BY clauses
rather than any issue with the code. Nonetheless, it's something that
needs to be addressed before this patch could be applied.

Given that it's gotten some feedback from Ronan and this regression-test
failure, I'll move it to Waiting on Author, but we're near the end of
the CF and it'll probably be moved forward soon.

On Thu, 4 Nov 2021 at 04:00, Ronan Dunklau  wrote:
>
> Le jeudi 22 juillet 2021, 10:42:49 CET Ronan Dunklau a écrit :
> > Le jeudi 22 juillet 2021, 09:38:50 CEST David Rowley a écrit :
> > > On Thu, 22 Jul 2021 at 02:01, Ronan Dunklau 
> wrote:
> > > > I tested the 0001 patch against both HEAD and my proposed bugfix for
> > > > postgres_fdw.
> > > >
> > > > There is a problem that the ordered aggregate is not pushed down
> > > > anymore.
> > > > The underlying Sort node is correctly pushed down though.
> > > >
> > > > This comes from the fact that postgres_fdw grouping path never contains
> > > > any
> > > > pathkey. Since the cost is fuzzily the same between the pushed-down
> > > > aggregate and the locally performed one, the tie is broken against the
> > > > pathkeys.
> > >
> > > I think this might be more down to a lack of any penalty cost for
> > > fetching foreign tuples. Looking at create_foreignscan_path(), I don't
> > > see anything that adds anything to account for fetching the tuples
> > > from the foreign server. If there was something like that then there'd
> > > be more of a preference to perform the remote aggregation so that
> > > fewer rows must arrive from the remote server.
> > >
> > > I tested by adding: total_cost += cpu_tuple_cost * rows * 100; in
> > > create_foreignscan_path() and I got the plan with the remote
> > > aggregation. That's a fairly large penalty of 1.0 per row. Much bigger
> > > than parallel_tuple_cost's default value.
> > >
> > > I'm a bit undecided on how much this patch needs to get involved in
> > > adjusting foreign scan costs.  The problem is that we've given the
> > > executor a new path to consider and nobody has done any proper
> > > costings for the foreign scan so that it properly prefers paths that
> > > have to pull fewer foreign tuples.  This is a pretty similar problem
> > > to what parallel_tuple_cost aims to fix. Also similar to how we had to
> > > add APPEND_CPU_COST_MULTIPLIER to have partition-wise aggregates
> > > prefer grouping at the partition level rather than at the partitioned
> > > table level.
> > >
> > > > Ideally we would add the group pathkeys to the grouping path, but this
> > > > would add an additional ORDER BY expression matching the GROUP BY.
> > > > Moreover, some triaging of the pathkeys would be necessary since we now
> > > > mix the sort-in- aggref pathkeys with the group_pathkeys.
> > >
> > > I think you're talking about passing pathkeys into
> > > create_foreign_upper_path in add_foreign_grouping_paths. If so, I
> > > don't really see how it would be safe to add pathkeys to the foreign
> > > grouping path. What if the foreign server did a Hash Aggregate?  The
> > > rows might come back in any random order.
> >
> > Yes, I was suggesting to add a new path with the pathkeys factored in, which
> > if chosen over the non-ordered path would result in an additional ORDER BY
> > clause to prevent a HashAggregate. But that doesn't seem a good idea after
> > all.
> >
> > > I kinda think that to fix this properly would need a new foreign
> > > server option such as foreign_tuple_cost. I'd feel better about
> > > something like that with some of the people with a vested interest in
> > > the FDW code were watching more closely. So far we've not managed to
> > > entice any of them with the bug report yet, but it's maybe early days
> > > yet.
> >
> > We already have that in the form of fdw_tuple_cost as a server option if I'm
> > not mistaken ? It works as expected when the number of tuples is notably
> > reduced by the foreign group by.
> >
> > The problem arise when the cardinality of the groups is equal to the input's
> > cardinality. I think even in that case we should try to use a remote
> > aggregate since it's a computation that will not happen on the local
> > server. I also think we're more likely to have up to date statistics
> > remotely than the ones collected locally on the foreign tables, and the
> > estimated number of groups would be more accurate on the remote side than
> > the local one.
>
> I took some time to toy with this again.
>
> At first I thought that charging a discount in foreign grouping paths for
> Aggref targets (since they are computed remotely) would be a good idea,
> similar to what is done for the grouping keys.
>
> But in the end, it's probably not something we would like to do. Yes, the
> group planning will be more accurate on the remote side generally (better
> statistics than locally for 

Re: real/float example for testlibpq3

2022-03-30 Thread Greg Stark
On Mon, 28 Feb 2022 at 17:50, Tom Lane  wrote:
>
> Chapman Flack  writes:
> > In the current state of affairs, what's considered the ur-source of that
> > information?
>
> The source code for the type's send/receive functions :-(.  One could
> wish for something better, but no one has stepped up to produce such
> documentation.

Fwiw the client library I heard of attempting to have good binary mode
support was the Crystal language client
https://github.com/will/crystal-pg. I think he was aiming for full
coverage of the built-in data types. That might make a good reference
implementation to write up documentation from. He probably uncovered
some corner cases in development that one might not find from just
inspection of the server code.
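
In the meantime, for the real/float case this thread started with: the
binary wire format is simply the IEEE-754 bit pattern in network byte
order -- that's what float4send()/float8send() do today, not a documented
guarantee. A minimal client-side decode sketch for float8:

#include <stdint.h>
#include <string.h>

/*
 * Decode a float8 ("double precision") value fetched with binary result
 * format, i.e. val = PQgetvalue(res, row, col) with 8 bytes available.
 */
static double
decode_binary_float8(const char *val)
{
	uint64_t	bits = 0;
	double		d;

	/* the wire carries the IEEE-754 bits in network (big-endian) order */
	for (int i = 0; i < 8; i++)
		bits = (bits << 8) | (uint8_t) val[i];

	memcpy(&d, &bits, sizeof(d));	/* reinterpret the bits as a double */
	return d;
}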

-- 
greg




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Robert Haas
On Wed, Mar 30, 2022 at 1:11 PM Andres Freund  wrote:
> On March 30, 2022 9:58:26 AM PDT, Tom Lane  wrote:
> >Andres Freund  writes:
> >> Random aside: Am I the only one bothered by a bunch of places in
> >> Makefile.global.in quoting like
> >>   $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install 
> >> install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
> >> and
> >>   rm -rf '$(CURDIR)'/tmp_check &&
> >> etc
> >
> >Don't we need that to handle, say, build paths with spaces in them?
>
> My concern is about the quote in the middle of the path, not about quoting at 
> all... I.e. the ' should be after /tmp_install, not before.

Makes no difference. We know that the string /tmp_install contains no
shell metacharacters, so why does it need to be in quotes? I would've
probably written it the way it is here, rather than what you are
proposing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Andres Freund
Hi, 

On March 30, 2022 9:58:26 AM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> Random aside: Am I the only one bothered by a bunch of places in
>> Makefile.global.in quoting like
>>   $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install 
>> install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
>> and
>>   rm -rf '$(CURDIR)'/tmp_check &&
>> etc
>
>Don't we need that to handle, say, build paths with spaces in them?

My concern is about the quote in the middle of the path, not about quoting at 
all... I.e. the ' should be after /tmp_install, not before.


>Admittedly we're probably not completely clean for such paths,
>but that's not an excuse to break the places that do it right.
>
>(I occasionally think about setting up a BF animal configured
>like that, but haven't tried yet.)

That might be a fun exercise. Not so much for the build aspect, but to make 
sure our tools handle it.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Robert Haas
On Wed, Mar 30, 2022 at 12:54 PM Andres Freund  wrote:
> There are some commandline utilities (including copy) where backward slashes
> in arguments are necessary, to separate options from paths :/. Those are the
> extent of backslash use in Cluster.pm that I could see quickly.

I just copied this logic from that file:

$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
  $PostgreSQL::Test::Utils::windows_os
  ? qq{copy "$path\\\\%f" "%p"}
  : qq{cp "$path/%f" "%p"};

In the first version of the patch I neglected the first of those lines
and it broke, so the s{\\}{\\\\}g thing is definitely needed. It's
possible that / would be as good as \\ in the command text itself,
but it doesn't seem worth getting excited about. It'd be best if any
unnecessary garbage of this sort got cleaned up by someone who has a
test environment locally, rather than me testing by sending emails to
a mailing list, which Thomas then downloads into a sandbox and executes,
after which you send me links on the list to what broke and I try again.

> Fair enough. I found it a bit grating to read in the test, that's why I
> mentioned it...

I'm going to go ahead and commit this test script later on this
afternoon unless there are vigorous objections real soon now, and then
if somebody wants to improve it, great!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: multithreaded zstd backup compression for client and server

2022-03-30 Thread Dagfinn Ilmari Mannsåker
Robert Haas  writes:

> On Wed, Mar 30, 2022 at 8:00 AM Dagfinn Ilmari Mannsåker
>  wrote:
>> Robert Haas  writes:
>> > This patch contains a trivial adjustment to
>> > PostgreSQL::Test::Cluster::run_log to make it return a useful value
>> > instead of not. I think that should be pulled out and committed
>> > independently regardless of what happens to this patch overall, and
>> > possibly back-patched.
>>
>> run_log() is far from the only such method in PostgreSQL::Test::Cluster.
>> Here's a patch that gives the same treatment to all the methods that
>> just pass through to the corresponding PostgreSQL::Test::Utils function.
>>
>> Also attached is a fix a typo in the _get_env doc comment that I noticed
>> while auditing the return values.
>
> I suggest posting these patches on a new thread with a subject line
> that matches what the patches do, and adding it to the next
> CommitFest.

Will do.

> It seems like a reasonable thing to do on first glance, but I wouldn't
> want to commit it without going through and figuring out whether
> there's any risk of anything breaking, and it doesn't seem like
> there's a strong need to do it in v15 rather than v16.

Given that the methods don't currently have a useful return value (undef
or the empty list, depending on context), I don't expect anything to be
relying on it (and it passed check-world with --enable-tap-tests and all
the --with-foo flags I could easily get to work), but I can grep the
code as well to be extra sure.

- ilmari




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Tom Lane
Andres Freund  writes:
> On 2022-03-30 12:34:34 -0400, Tom Lane wrote:
>> One refinement that comes to mind as I look at the patch is to distinguish
>> between "check" and "installcheck".  Not sure that's worthwhile, but not
>> sure it isn't, either.

> As it's just about "free" to do so, I see no reason not to go for showing that
> difference.  How about:

> echo "+++ (tap|regress|isolation) [install-]check in $(subdir) +++" && \

WFM.

> I see no reason to distinguish the PGXS / non-PGXs tap installcheck cases?

Agreed.

> Random aside: Am I the only one bothered by a bunch of places in
> Makefile.global.in quoting like
>   $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install 
> install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
> and
>   rm -rf '$(CURDIR)'/tmp_check &&
> etc

Don't we need that to handle, say, build paths with spaces in them?
Admittedly we're probably not completely clean for such paths,
but that's not an excuse to break the places that do it right.

(I occasionally think about setting up a BF animal configured
like that, but haven't tried yet.)

regards, tom lane




Re: basebackup/lz4 crash

2022-03-30 Thread Robert Haas
On Wed, Mar 30, 2022 at 10:35 AM Justin Pryzby  wrote:
> It looks like maybe this code was copied from
> bbstreamer_lz4_compressor_finalize() which has an Assert rather than expanding
> the buffer like in bbstreamer_lz4_compressor_new()

Hmm, this isn't great. On the server side, we set up a chain of bbsink
buffers that can't be resized later. Each bbsink tells the next bbsink
how to make its buffer, but the successor bbsink has control of that
buffer and resizing it on-the-fly is not allowed. It looks like
bbstreamer_lz4_compressor_new() is mimicking that logic, but not well.
It sets the initial buffer size to
LZ4F_compressBound(streamer->base.bbs_buffer.maxlen, prefs), but
streamer->base.bbs_buffer.maxlen is not any kind of promise from the
caller about future chunk sizes. It's just whatever initStringInfo()
happens to do. My guess is that Dipesh thought that the buffer
wouldn't need to be resized because "we made it big enough already"
but that's not the case. The server knows how much data it is going to
read from disk at a time, but the client has to deal with whatever the
server sends.

I think your proposed change is OK, modulo some comments. But I think
maybe we ought to delete all the stuff related to compressed_bound
from bbstreamer_lz4_compressor_new() as well, because I don't see that
there's any point. And then I think we should also add logic similar
to what you've added here to bbstreamer_lz4_compressor_finalize(), so
that we're not making the assumption that the buffer will get enlarged
at some earlier point.
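
Concretely, the kind of guard I have in mind for both places is roughly the
following -- sketch only, with struct and field names following my reading
of the existing bbstreamer_lz4 code rather than a tested patch:

/*
 * Before appending "len" more input bytes, make sure the output StringInfo
 * can absorb the worst-case compressed size, instead of assuming the
 * constructor sized the buffer correctly up front.
 */
static void
bbstreamer_lz4_ensure_output_space(bbstreamer_lz4_frame *mystreamer, size_t len)
{
	StringInfo	buf = &mystreamer->base.bbs_buffer;
	size_t		required = LZ4F_compressBound(len, &mystreamer->prefs);

	if ((size_t) (buf->maxlen - buf->len) < required)
		enlargeStringInfo(buf, (int) required);		/* repallocs as needed */
}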

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 12:42:50 -0400, Robert Haas wrote:
> On Wed, Mar 30, 2022 at 12:30 PM Andres Freund  wrote:
> > # Reconfigure to restrict access and require a detail.
> > $shell_command =
> > $PostgreSQL::Test::Utils::windows_os
> ? qq{$gzip --fast > "$escaped_backup_path\\\\%d.%f.gz"}
> > : qq{$gzip --fast > "$escaped_backup_path/%d.%f.gz"};
> >
> > I don't think the branch is needed anymore, forward slashes should work for
> > output redirection.
> 
> We have similar things in src/test/perl/PostgreSQL/Test/Cluster.pm.

There are some commandline utilities (including copy) where backward slashes
in arguments are necessary, to separate options from paths :/. Those are the
extent of backslash use in Cluster.pm that I could see quickly.


> I'm not sure it's the place of this patch to introduce a mix of styles.

Fair enough. I found it a bit grating to read in the test, that's why I
mentioned it...

Greetings,

Andres Freund




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 12:34:34 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Unless somebody speaks in favor of doing this across branches, I'd just go 
> > for
> > HEAD.
> 
> +1 for HEAD only, especially if we think we might change it some more
> later.  It seems possible this might break somebody's tooling if we
> drop it into minor releases.

Yea. I certainly have written scripts that parse check-world output - they
didn't break, but...


> One refinement that comes to mind as I look at the patch is to distinguish
> between "check" and "installcheck".  Not sure that's worthwhile, but not
> sure it isn't, either.

As it's just about "free" to do so, I see no reason not to go for showing that
difference.  How about:

echo "+++ (tap|regress|isolation) [install-]check in $(subdir) +++" && \

I see no reason to distinguish the PGXS / non-PGXs tap installcheck cases?


Random aside: Am I the only one bothered by a bunch of places in
Makefile.global.in quoting like
  $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install 
install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
and
  rm -rf '$(CURDIR)'/tmp_check &&
etc
yielding commands like:
  make -C '.' 
DESTDIR='/home/andres/build/postgres/dev-assert/vpath'/tmp_install install 
>'/home/andres/build/postgres/dev-assert/vpath'/tmp_install/log/install.log 2>&1
and
  rm -rf 
'/home/andres/build/postgres/dev-assert/vpath/contrib/test_decoding'/tmp_check &

Greetings,

Andres Freund




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Robert Haas
On Wed, Mar 30, 2022 at 12:30 PM Andres Freund  wrote:
> # Reconfigure to restrict access and require a detail.
> $shell_command =
> $PostgreSQL::Test::Utils::windows_os
> ? qq{$gzip --fast > "$escaped_backup_path\\\\%d.%f.gz"}
> : qq{$gzip --fast > "$escaped_backup_path/%d.%f.gz"};
>
> I don't think the branch is needed anymore, forward slashes should work for
> output redirection.

We have similar things in src/test/perl/PostgreSQL/Test/Cluster.pm. Do
you think those can also be removed? I'm not sure it's the place of
this patch to introduce a mix of styles.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Tom Lane
Andres Freund  writes:
> Because I forgot about cfbot when attaching the patch, cfbot actually ran with
> it under this thread's cf entry.  It does look like an improvement to me:
> https://cirrus-ci.com/task/6397292629458944?logs=test_world#L900
> We certainly can do better, but it's sufficiently better than what we have
> right now. So I'd like to commit it?

No objection here.

>> Would we want to do this in all branches? I'd vote for yes, but ...

> Unless somebody speaks in favor of doing this across branches, I'd just go for
> HEAD.

+1 for HEAD only, especially if we think we might change it some more
later.  It seems possible this might break somebody's tooling if we
drop it into minor releases.

One refinement that comes to mind as I look at the patch is to distinguish
between "check" and "installcheck".  Not sure that's worthwhile, but not
sure it isn't, either.

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-30 09:28:58 +0530, Dilip Kumar wrote:
> On Wed, Mar 30, 2022 at 6:47 AM Andres Freund  wrote:
> >
> >
> > du -s /tmp/initdb/
> > WAL_LOG: 35112
> > FILE_COPY: 29288
> >
> > So it seems we should specify a strategy in initdb? It kind of makes sense -
> > we're not going to read anything from those databases. And because of the
> > ringbuffer of 256kB, we'll not even reduce IO meaningfully.
> 
> I think this makes sense, so you mean with initdb we will always use
> file_copy or we want to give a command line option for initdb ?

Don't see a need for a commandline option / a situation where using WAL_LOG
would be preferable for initdb.

Greetings,

Andres Freund




Re: make MaxBackends available in _PG_init

2022-03-30 Thread Nathan Bossart
On Tue, Mar 29, 2022 at 12:22:21PM -0400, Robert Haas wrote:
> It's not, though, because the original proposal was to change things
> around so that the value of MaxBackends would have been reliable in
> _PG_init(). If we'd done that, then extensions that are using it in
> _PG_init() would have gone from being buggy to being not-buggy. But
> since you advocated against that change, we instead committed
> something that caused them to go from being buggy to failing outright.
> That's pretty painful for people with such extensions. And IMHO, it's
> *much* more legitimate to want to size a data structure based on the
> value of MaxBackends than it is for extensions to override GUC values.
> If we can make the latter use case work in a sane way, OK, although I
> have my doubts about how sane it really is, but it can't be at the
> expense of telling extensions that have been (incorrectly) using
> MaxBackends in _PG_init() that we're throwing them under the bus.
> 
> IMHO, the proper thing to do if certain GUC values are required for an
> extension to work is to put that information in the documentation and
> error out at an appropriate point if the user does not follow the
> directions. Then this issue does not arise. But there's no reasonable
> workaround for being unable to size data structures based on
> MaxBackends.

FWIW I would be on board with reverting all the GetMaxBackends() stuff if
we made the value available in _PG_init() and stopped supporting GUC
overrides by extensions (e.g., ERROR-ing in SetConfigOption() when
process_shared_preload_libraries_in_progress is true).
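
For reference, the pattern at stake is basically the following hypothetical
extension sketch (the per-backend struct is made up, and the exact header
locations are from memory):

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

/* stand-in for whatever the extension tracks per backend */
typedef struct MyPerBackendSlot
{
	int			counter;
} MyPerBackendSlot;

void
_PG_init(void)
{
	if (!process_shared_preload_libraries_in_progress)
		return;

	/*
	 * Sizing this reservation needs the final MaxBackends value, which is
	 * the crux of the problem: with the committed change, GetMaxBackends()
	 * now fails outright when called this early.
	 */
	RequestAddinShmemSpace(mul_size(GetMaxBackends(),
									sizeof(MyPerBackendSlot)));
}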

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Andres Freund
On 2022-03-30 08:53:43 -0400, Robert Haas wrote:
> Here's a new series, adjusted to use 'gzip' instead of 'cat' and 'type'.

Seems to have done the trick: 
https://cirrus-ci.com/task/6474955838717952?logs=test_contrib_basebackup_to_shell#L1


# Reconfigure to restrict access and require a detail.
$shell_command =
$PostgreSQL::Test::Utils::windows_os
? qq{$gzip --fast > "$escaped_backup_path\\\\%d.%f.gz"}
: qq{$gzip --fast > "$escaped_backup_path/%d.%f.gz"};

I don't think the branch is needed anymore, forward slashes should work for
output redirection.

Greetings,

Andres Freund




Re: Identify missing publications from publisher while create/alter subscription.

2022-03-30 Thread vignesh C
On Wed, Mar 30, 2022 at 5:42 PM Bharath Rupireddy
 wrote:
>
> On Wed, Mar 30, 2022 at 5:37 PM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Mar 30, 2022 at 4:29 PM Amit Kapila  wrote:
> > >
> > > On Wed, Mar 30, 2022 at 12:22 PM vignesh C  wrote:
> > > >
> > > > I have made the changes for this, attached v17 patch has the changes
> > > > for the same.
> > > >
> > >
> > > The patch looks good to me. I have made minor edits in the comments
> > > and docs. See the attached and let me know what you think? I intend to
> > > commit this tomorrow unless there are more comments or suggestions.
> >
> > I have one minor comment:
> >
> > + "Create subscription throws warning for multiple non-existent 
> > publications");
> > +$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION mysub1;");
> > + "CREATE SUBSCRIPTION mysub1 CONNECTION '$publisher_connstr'
> > PUBLICATION mypub;"
> > + "ALTER SUBSCRIPTION mysub1 ADD PUBLICATION non_existent_pub"
> > + "ALTER SUBSCRIPTION mysub1 SET PUBLICATION non_existent_pub"
> >
> > Why should we drop the subscription mysub1 and create it for ALTER ..
> > ADD and ALTER .. SET tests? Can't we just do below which saves
> > unnecessary subscription creation, drop, wait_for_catchup and
> > poll_query_until?
> >
> > + "Create subscription throws warning for multiple non-existent 
> > publications");
> > + "ALTER SUBSCRIPTION mysub1 ADD PUBLICATION non_existent_pub2"
> > + "ALTER SUBSCRIPTION mysub1 SET PUBLICATION non_existent_pub3"
>
> Or I would even simplify the entire tests as follows:
>
> + "CREATE SUBSCRIPTION mysub1 CONNECTION '$publisher_connstr'
> PUBLICATION mypub, non_existent_pub1"
> + "Create subscription throws warning for non-existent publication");
> >> no drop of mysub1 >>
> + "CREATE SUBSCRIPTION mysub2 CONNECTION '$publisher_connstr'
> PUBLICATION non_existent_pub1, non_existent_pub2"
> + "Create subscription throws warning for multiple non-existent 
> publications");
> >> no drop of mysub2 >>
> + "ALTER SUBSCRIPTION mysub2 ADD PUBLICATION non_existent_pub3"
> + "Alter subscription add publication throws warning for non-existent
> publication");
> + "ALTER SUBSCRIPTION mysub2 SET PUBLICATION non_existent_pub4"
> + "Alter subscription set publication throws warning for non-existent
> publication");
>  $node_subscriber->stop;
>  $node_publisher->stop;

Your suggestion looks valid; I have modified it as suggested.
Additionally, I have removed the "create subscription with multiple
non-existent publications" test and changed "add publication with a single
non-existent publication" to "add publication with multiple non-existent
publications" to cover the multiple non-existent publications path.
The attached v19 patch has the changes for the same.

Regards,
Vignesh
From fc175350780ca046e977c9b60d1f79873d19ae7e Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Sat, 26 Mar 2022 19:43:27 +0530
Subject: [PATCH v19] Raise WARNING for missing publications.

When we create or alter a subscription to add publications raise a warning
for non-existent publications. We don't want to give an error here because
it is possible that users can later create the missing publications.

Author: Vignesh C
Reviewed-by: Amit Kapila, Bharath Rupireddy, Japin Li, Dilip Kumar, Euler Taveira, Ashutosh Sharma
Discussion: https://www.postgresql.org/message-id/flat/20220321235957.i4jtjn4wyjucex6b%40alap3.anarazel.de#b846fd4ef657cfaa8c9890f044e4
---
 doc/src/sgml/ref/alter_subscription.sgml  |   4 +-
 doc/src/sgml/ref/create_subscription.sgml |   7 ++
 src/backend/commands/subscriptioncmds.c   | 133 ++
 src/test/subscription/t/007_ddl.pl|  37 ++
 4 files changed, 160 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 3e46bbdb04..fe13ab9a2d 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -114,7 +114,9 @@ ALTER SUBSCRIPTION name RENAME TO <
   replaces the entire list of publications with a new list,
   ADD adds additional publications to the list of
   publications, and DROP removes the publications from
-  the list of publications.  See 
+  the list of publications.  We allow non-existent publications to be
+  specified in ADD and SET variants
+  so that users can add those later.  See 
   for more information.  By default, this command will also act like
   REFRESH PUBLICATION.
  
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index b701752fc9..ebf7db57c5 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -356,6 +356,13 @@ CREATE SUBSCRIPTION subscription_name
 
+  
+   We allow non-existent publications to be specified so that users can add
+   those later. This means
+   pg_subscription
+   can have non-existent publications.
+  
+
  
 
  
diff --git a/src/backend/commands/subscriptioncmds.c 

Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-30 Thread Andres Freund
Hi,

On 2022-03-26 13:51:35 -0700, Andres Freund wrote:
> Prototype patch attached.

Because I forgot about cfbot when attaching the patch, cfbot actually ran with
it under this thread's cf entry.  It does look like an improvement to me:
https://cirrus-ci.com/task/6397292629458944?logs=test_world#L900

We certainly can do better, but it's sufficiently better than what we have
right now. So I'd like to commit it?


> Would we want to do this in all branches? I'd vote for yes, but ...

Unless somebody speaks in favor of doing this across branches, I'd just go for
HEAD.

Greetings,

Andres Freund




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-03-30 Thread Nathan Bossart
On Tue, Mar 29, 2022 at 03:48:32PM -0700, Nathan Bossart wrote:
> On Thu, Mar 24, 2022 at 01:17:01PM +1300, Thomas Munro wrote:
>>  /* we're only handling directories here, skip if it's not ours */
>> -if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode))
>> +if (lstat(path, &statbuf) != 0)
>> +ereport(ERROR,
>> +(errcode_for_file_access(),
>> + errmsg("could not stat file \"%s\": %m", path)));
>> +else if (!S_ISDIR(statbuf.st_mode))
>>  return;
>> 
>> Why is this a good place to silently ignore non-directories?
>> StartupReorderBuffer() is already in charge of skipping random
>> detritus found in the directory, so would it be better to do "if
>> (get_dirent_type(...) != PGFILETYPE_DIR) continue" there, and then
>> drop the lstat() stanza from ReorderBufferCleanupSerializedTXNs()
>> completely?  Then perhaps its ReadDirExtended() should be using ERROR
>> instead of INFO, so that missing/non-dir/b0rked directories raise an
>> error.
> 
> My guess is that this was done because ReorderBufferCleanupSerializedTXNs()
> is also called from ReorderBufferAllocate() and ReorderBufferFree().
> However, it is odd that we just silently return if the slot path isn't a
> directory in those cases.  I think we could use get_dirent_type() in
> StartupReorderBuffer() as you suggested, and then we could let ReadDir()
> ERROR for non-directories for the other callers of
> ReorderBufferCleanupSerializedTXNs().  WDYT?
> 
>> I don't understand why it's reporting readdir() errors at INFO
>> but unlink() errors at ERROR, and as far as I can see the other paths
>> that reach this code shouldn't be sending in paths to non-directories
>> here unless something is seriously busted and that's ERROR-worthy.
> 
> I agree.  I'll switch it to ReadDir() in the next revision so that we ERROR
> instead of INFO.

Here is an updated patch set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 784419919fca8f157a1c7304b79567b3ba29f3bf Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 15 Feb 2022 09:40:53 -0800
Subject: [PATCH v11 1/2] make more use of get_dirent_type()

---
 src/backend/access/heap/rewriteheap.c |  4 +-
 .../replication/logical/reorderbuffer.c   | 12 +++---
 src/backend/replication/logical/snapbuild.c   |  5 +--
 src/backend/replication/slot.c|  4 +-
 src/backend/storage/file/copydir.c| 21 +++---
 src/backend/storage/file/fd.c | 20 +
 src/backend/utils/misc/guc-file.l | 42 +++
 src/timezone/pgtz.c   |  8 +---
 8 files changed, 48 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 2a53826736..d64d7aae2e 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -113,6 +113,7 @@
 #include "access/xact.h"
 #include "access/xloginsert.h"
 #include "catalog/catalog.h"
+#include "common/file_utils.h"
 #include "lib/ilist.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1213,7 +1214,6 @@ CheckPointLogicalRewriteHeap(void)
 	mappings_dir = AllocateDir("pg_logical/mappings");
 	while ((mapping_de = ReadDir(mappings_dir, "pg_logical/mappings")) != NULL)
 	{
-		struct stat statbuf;
 		Oid			dboid;
 		Oid			relid;
 		XLogRecPtr	lsn;
@@ -1227,7 +1227,7 @@ CheckPointLogicalRewriteHeap(void)
 			continue;
 
 		snprintf(path, sizeof(path), "pg_logical/mappings/%s", mapping_de->d_name);
-		if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode))
+		if (get_dirent_type(path, mapping_de, false, ERROR) != PGFILETYPE_REG)
 			continue;
 
 		/* Skip over files that cannot be ours. */
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c2d9be81fa..489a8300fa 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -126,6 +126,7 @@
 #include "access/xlog_internal.h"
 #include "catalog/catalog.h"
 #include "commands/sequence.h"
+#include "common/file_utils.h"
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -4806,15 +4807,10 @@ ReorderBufferCleanupSerializedTXNs(const char *slotname)
 {
 	DIR		   *spill_dir;
 	struct dirent *spill_de;
-	struct stat statbuf;
 	char		path[MAXPGPATH * 2 + 12];
 
 	sprintf(path, "pg_replslot/%s", slotname);
 
-	/* we're only handling directories here, skip if it's not ours */
-	if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode))
-		return;
-
 	spill_dir = AllocateDir(path);
 	while ((spill_de = ReadDirExtended(spill_dir, path, INFO)) != NULL)
 	{
@@ -4862,6 +4858,7 @@ StartupReorderBuffer(void)
 {
 	DIR		   *logical_dir;
 	struct dirent *logical_de;
+	char		path[MAXPGPATH * 2 + 12];
 
 	logical_dir = AllocateDir("pg_replslot");
 	while ((logical_de = ReadDir(logical_dir, "pg_replslot")) != NULL)
@@ -4874,6 +4871,11 @@ StartupReorderBuffer(void)
 		if 

Re: Frontend error logging style

2022-03-30 Thread Tom Lane
Greg Stark  writes:
> FYI this patch no longer applies.

No surprise :-(

> Given it's a Tom patch and due to its nature it'll bitrot rapidly
> anyways I'm don't see a point in updating the status though.

We now seem to have buy-in on the concept, so my plan is to let
this sit till the end of the commitfest, then rebase and push.
That should avoid unnecessary churn for other pending patches.

regards, tom lane




Re: [PATCH] Accept IP addresses in server certificate SANs

2022-03-30 Thread Jacob Champion
On Wed, 2022-03-30 at 13:37 +0200, Peter Eisentraut wrote:
> On 28.03.22 22:21, Jacob Champion wrote:
> > On Mon, 2022-03-28 at 11:17 +0200, Daniel Gustafsson wrote:
> > > Fixing up the switch_server_cert() calls and using default_ssl_connstr 
> > > makes
> > > the test pass for me.  The required fixes are in the supplied 0004 diff, 
> > > I kept
> > > them separate to allow the original author to incorporate them without 
> > > having
> > > to dig them out to see what changed (named to match the git format-patch 
> > > output
> > > since I think the CFBot just applies the patches in alphabetical order).
> > 
> > Thanks! Those changes look good to me; I've folded them into v11. This
> > is rebased on a newer HEAD so it should fix the apply failures that
> > Greg pointed out.
> 
> I'm not happy about how inet_net_pton.o is repurposed here.  That code
> is clearly meant to support backend data types with specifically.  Code like
> 
> + /*
> +  * pg_inet_net_pton() will accept CIDR masks, which we don't want to
> +  * match, so skip the comparison if the host string contains a slash.
> +  */
> 
> indicates that we are fighting against the API.

Horiguchi-san had the same concern upthread, I think. I replaced that
code in the next patch, but it was enough net-new stuff that I kept the
patches separate instead of merging them. I can change that if it's not
helpful for review.

> Also, if someone ever
> wants to change how those backend data types work, we then have to check
> a bunch of other code as well.
> 
> I think we should be using inet_ntop()/inet_pton() directly here.  We
> can throw substitute implementations into libpgport if necessary, that's
> not so difficult.

Is this request satisfied by the implementation of pg_inet_pton() in
patch 0003? If not, what needs to change?

Thanks,
--Jacob


Re: Returning multiple rows in materialized mode inside the extension

2022-03-30 Thread David G. Johnston
On Wed, Mar 30, 2022 at 9:01 AM Piotr Styczyński 
wrote:

> I don’t know if this mailing list is a good place to ask this question,
> but if it’s not, just correct me.
>
pgsql-general is probably better


> *The problem:*
>
> We currently have a one-to-many function (an operation that produces
> multiple rows per one one input row).
>
Now we would like to translate that functionality to a sensible
> many-to-many.
>
This seems like a big gap.

Input Situation Rows:
1
2
3
What is the expected output
1 A
1 B
1 C
2 A
2 B
2 C
3 A
3 B
3 C

I really don't know how you would change the internals to handle this - I'm
doubting it would even be possible.  If asked to accomplish this using just
standard PostgreSQL I would turn the inputs into an array

{1,2,3}

and pass that array into a set-returning function.  Now I have:

{1,2,3} A
{1,2,3} B
{1,2,3} C

as an output, and I can just unnest the array column to produce the final
result.

Something like (not tested):

SELECT unnest(arr_input.arr), func_call
FROM
(SELECT array_agg(inputvals) AS arr FROM tbl) AS arr_input,
LATERAL func_call(arr_input.arr)
;

David J.


Re: Add non-blocking version of PQcancel

2022-03-30 Thread Jelte Fennema
I attached a new version of this patch. Which does three main things:
1. Change the PQrequestCancel implementation to use the regular 
connection establishment code, to support all connection options
including encryption.
2. Add PQrequestCancelStart which is a thread-safe and non-blocking 
version of this new PQrequestCancel implementation.
3. Add PQconnectComplete, which completes a connection started by 
PQrequestCancelStart. This is useful if you want a thread-safe, but 
blocking cancel (without having a need for signal safety).

This change un-deprecates PQrequestCancel, since now there's actually an 
advantage to using it over PQcancel. It also includes user facing documentation
for all these functions. 

As an API design change from the previous version, PQrequestCancelStart now
returns a regular PGconn for the cancel connection.

@Tom Lane regarding this:
> Even in the non-blocking case, callers should only deal with the original 
> PGconn.

This would by definition result in non-threadsafe code (afaict). So I refrained 
from doing this.
The blocking version doesn't expose a PGconn at all, but the non-blocking one 
now returns a new PGconn.
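
To make the intended call pattern concrete, here is a rough usage sketch of
the non-blocking variant (exact signatures may still change, and it assumes
the returned cancel PGconn is driven with the existing PQconnectPoll() /
PQsocket() machinery):

#include <stdio.h>
#include "libpq-fe.h"

static void
cancel_nonblocking(PGconn *conn)
{
	PGconn	   *cancelConn = PQrequestCancelStart(conn);

	if (cancelConn == NULL)
		return;

	while (PQstatus(cancelConn) != CONNECTION_BAD)
	{
		PostgresPollingStatusType st = PQconnectPoll(cancelConn);

		if (st == PGRES_POLLING_OK || st == PGRES_POLLING_FAILED)
			break;

		/*
		 * Here the caller waits for PQsocket(cancelConn) to become readable
		 * or writable, with whatever timeout it likes -- the piece that
		 * postgres_fdw is missing today.
		 */
	}

	if (PQstatus(cancelConn) == CONNECTION_BAD)
		fprintf(stderr, "cancel failed: %s", PQerrorMessage(cancelConn));

	PQfinish(cancelConn);
}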

There's two more changes that I at least want to do before considering this 
patch mergable:
1. Go over all the functions that can be called with a PGconn, but should not 
be 
called with a cancellation PGconn and error out or exit early.
2. Copy over the SockAddr from the original connection and always connect to 
the same socket. I believe with the current code the cancellation could end 
up
at the wrong server if there are multiple hosts listed in the connection 
string.

And there's a third item that I would like to do as a bonus:
3. Actually use the non-blocking API for the postgres_fdw code to implement a 
timeout. Which would allow this comment can be removed:
/*
 * Issue cancel request.  Unfortunately, there's no good way to limit 
the
 * amount of time that we might block inside PQgetCancel().
 */
 
So a next version of this patch can be expected somewhere later this week.
But any feedback on the current version would be appreciated. Because
these 3 changes won't change the overall design much.

Jelte

0001-Add-documentation-for-libpq_pipeline-tests.patch
Description: 0001-Add-documentation-for-libpq_pipeline-tests.patch


0002-Add-non-blocking-version-of-PQcancel.patch
Description: 0002-Add-non-blocking-version-of-PQcancel.patch


Re: Printing backtrace of postgres processes

2022-03-30 Thread Justin Pryzby
On Wed, Mar 30, 2022 at 11:53:52AM -0400, Greg Stark wrote:
> Sadly the cfbot is showing a patch conflict again. It's just a trivial
> conflict in the regression test schedule so I'm not going to update
> the status but it would be good to rebase it so we continue to get
> cfbot testing.

Done.  No changes.
>From b984dacb4bf2794705a8da49fa68ac9d9a6f Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Tue, 25 Jan 2022 08:21:22 +0530
Subject: [PATCH] Add function to log the backtrace of the specified postgres
 process.

This commit adds pg_log_backtrace() function that requests to log
the backtrace of the specified backend or auxiliary process except
logger and statistic collector.

Only superusers are allowed to request to log the backtrace
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its backtrace at LOG_SERVER_ONLY level,
so that the backtrace will appear in the server log but not
be sent to the client.

Bump catalog version.

Authors: Vignesh C, Bharath Rupireddy, Justin Pryzby
Reviewers: Bharath Rupireddy, Justin Pryzby, Fujii Masao, Atsushi Torikoshi, Dilip Kumar, Robert Haas, Andres Freund, Tom lane, Craig Ringer
Discussion: https://www.postgresql.org/message-id/CALDaNm3ZzmFS-=r7oduzj7y7bgqv+n06kqyft6c3xzdoknk...@mail.gmail.com
---
 doc/src/sgml/func.sgml| 62 +++
 src/backend/catalog/system_functions.sql  |  2 +
 src/backend/postmaster/autovacuum.c   |  4 ++
 src/backend/postmaster/checkpointer.c |  4 ++
 src/backend/postmaster/interrupt.c|  4 ++
 src/backend/postmaster/pgarch.c   | 10 +++-
 src/backend/postmaster/startup.c  |  4 ++
 src/backend/postmaster/walwriter.c|  4 ++
 src/backend/storage/ipc/procarray.c   | 56 +
 src/backend/storage/ipc/procsignal.c  | 42 +
 src/backend/storage/ipc/signalfuncs.c | 73 ---
 src/backend/tcop/postgres.c   |  4 ++
 src/backend/utils/adt/mcxtfuncs.c | 40 +++--
 src/backend/utils/error/elog.c| 19 --
 src/backend/utils/init/globals.c  |  1 +
 src/include/catalog/pg_proc.dat   |  5 ++
 src/include/miscadmin.h   |  1 +
 src/include/storage/procarray.h   |  2 +
 src/include/storage/procsignal.h  |  3 +-
 src/include/utils/elog.h  |  2 +
 src/test/regress/expected/backtrace.out   | 49 +++
 src/test/regress/expected/backtrace_1.out | 55 +
 src/test/regress/parallel_schedule|  2 +-
 src/test/regress/sql/backtrace.sql| 33 ++
 24 files changed, 416 insertions(+), 65 deletions(-)
 create mode 100644 src/test/regress/expected/backtrace.out
 create mode 100644 src/test/regress/expected/backtrace_1.out
 create mode 100644 src/test/regress/sql/backtrace.sql

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 524c6b98547..4d187b3a6d8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25394,6 +25394,30 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_backtrace
+
+pg_log_backtrace ( pid integer )
+boolean
+   
+   
+Requests to log the backtrace of the backend with the
+specified process ID.  This function can send the request to
+backends and auxiliary processes except the logger and statistics
+collector.  The backtraces will be logged at LOG
+message level. They will appear in the server log based on the log
+configuration set (See  for
+more information), but will not be sent to the client regardless of
+. A backtrace identifies
+which function a process is currently executing and it may be useful
+for developers to diagnose stuck processes and other problems. This
+function is supported only if PostgreSQL was built with the ability to
+capture backtraces, otherwise it will emit a warning.
+   
+  
+
   

 
@@ -25614,6 +25638,44 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_backtrace can be used to log the backtrace of
+a backend process. For example:
+
+postgres=# select pg_log_backtrace(pg_backend_pid());
+ pg_log_backtrace
+--
+ t
+(1 row)
+
+The backtrace will be logged as specified by the logging configuration.
+For example:
+
+2021-01-27 11:33:50.247 IST [111735] LOG:  current backtrace:
+postgres: postgresdba postgres [local] SELECT(set_backtrace+0x38) [0xae06c5]
+postgres: postgresdba postgres [local] SELECT(ProcessInterrupts+0x788) [0x950c34]
+postgres: postgresdba postgres 

Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-30 Thread Jacob Champion
On Tue, 2022-03-29 at 16:53 -0700, Andres Freund wrote:
> > We'd also need to guess whether the GUC system's serialization of NULL
> > as an empty string is likely to cause problems for any future auth
> > methods.
> 
> You can't represent a NULL in a postgres 'text' datum, independent of
> parallelism. So the current definition of pg_session_authn_id() already
> precludes that (and set_authn_id() and ...).  Honestly, I can't see a reason
> why we should ever allow authn_id to contain a NULL byte.

I don't mean a NULL byte, just a NULL pointer. This part of the
implementation doesn't distinguish between it and an empty string:

> /* NULL becomes empty string, see estimate_variable_size() */
> do_serialize(destptr, maxbytes, "%s",
>  *conf->variable ? *conf->variable : "");

Whether that's a problem in the future entirely depends on whether
there's some authentication method that considers the empty string a
sane and meaningful identity. We might reasonably decide that the
answer is "no", but I like being able to make that decision as opposed
to delegating it to an existing generic framework.

(That last point may be my core concern about making it a GUC: I'd like
us to have full control of how and where this particular piece of
information gets modified.)

Thanks,
--Jacob


Returning multiple rows in materialized mode inside the extension

2022-03-30 Thread Piotr Styczyński
Hi,

I represent a small group of developers. We are working on an open-source
PostgreSQL extension to enable processing of genomic data inside Postgres.
We have extensive knowledge of molecular biology and data science, but
none of the Postgres internals.

I don’t know if this mailing list is a good place to ask this question, but
if it’s not, just correct me.

*The problem:*

We currently have a one-to-many function (an operation that produces
multiple rows per one input row). Now we would like to translate that
functionality to a sensible many-to-many. We need to know how we are
constrained by the internals of Postgres itself and what syntax we should
use.

Also, the operation we are implementing requires knowing the full set of
inputs before it can be computed.

*Current solution:*

There is ValuePerCall (1/0 returned rows) or Materialize mode (any number
of returned rows); however, the second one does not offer any invocation
counter (like ValuePerCall does). Hence, to provide any persistence between
subcalls, we introduced the following syntax:

SELECT * FROM table t, my_function(t.a, t.b, t.c, number_of_rows);

Where by FROM a, b we mean the cartesian product a times b, and my_function
returns an empty set for the first (number_of_rows - 1) invocations and the
full result set for the last one.

Sadly, this syntax requires us to enter the number of rows up front, which
is not very convenient.

Do you know how to handle this situation correctly? We looked for example
at the code of tablefunc but the syntax there requires a full SQL query as
an input, so that wasn’t useful.
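
For concreteness, the Materialize-mode skeleton our my_function builds on
looks roughly like the following (heavily simplified; error handling and our
real row generation are omitted):

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "miscadmin.h"
#include "utils/builtins.h"
#include "utils/tuplestore.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(my_function);

Datum
my_function(PG_FUNCTION_ARGS)
{
	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
	TupleDesc	tupdesc;
	Tuplestorestate *tupstore;
	MemoryContext oldcontext;

	/* the caller must allow materialize mode (e.g. function used in FROM) */
	if (rsinfo == NULL || !(rsinfo->allowedModes & SFRM_Materialize))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("set-valued function called in context that cannot accept a set")));

	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
		elog(ERROR, "return type must be a row type");

	oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
	tupstore = tuplestore_begin_heap(true, false, work_mem);
	rsinfo->returnMode = SFRM_Materialize;
	rsinfo->setResult = tupstore;
	rsinfo->setDesc = tupdesc;
	MemoryContextSwitchTo(oldcontext);

	/* all rows are produced here, in this single call */
	for (int i = 0; i < 3; i++)
	{
		Datum		values[2];
		bool		nulls[2] = {false, false};

		values[0] = Int32GetDatum(i);
		values[1] = CStringGetTextDatum("demo");
		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
	}

	return (Datum) 0;
}

On the SQL level this is exposed with CREATE FUNCTION ... RETURNS SETOF
record (or RETURNS TABLE), and, as noted above, everything has to be
produced in that one call -- there is no per-invocation counter as in
ValuePerCall mode.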


Re: Printing backtrace of postgres processes

2022-03-30 Thread Greg Stark
Sadly the cfbot is showing a patch conflict again. It's just a trivial
conflict in the regression test schedule so I'm not going to update
the status but it would be good to rebase it so we continue to get
cfbot testing.




Re: Frontend error logging style

2022-03-30 Thread Greg Stark
FYI this patch no longer applies.

Given it's a Tom patch and due to its nature it'll bitrot rapidly
anyways I'm don't see a point in updating the status though.




Re: Granting SET and ALTER SYSTE privileges for GUCs

2022-03-30 Thread David G. Johnston
On Wed, Mar 30, 2022 at 8:46 AM Tom Lane  wrote:

> I don't want to do that with
> a blunderbuss, but perhaps there's an argument to do it for specific
> cases (search_path comes to mind, though the performance cost could be
> significant, since I think setting that in function SET clauses is
> common).
>


I suspect it became considerably more so when we fixed the search_path CVE,
since we basically told people that doing so, despite the possible
performance hit, was the easiest solution to their immediate dump/restore
failures.  But ISTM that because that SET has a function invocation context
it could bypass any such check.  Though maybe the DO command exposes a flaw
in that idea.

David J.


Re: Adding CI to our tree

2022-03-30 Thread Andres Freund
Hi,

Now that zstd is used, enable it in CI. I plan to commit this shortly, unless
somebody sees a reason not to do so.

Greetings,

Andres Freund
>From 51cc830e2e82516966fe1c84cd134b1858a89bab Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 26 Mar 2022 12:36:04 -0700
Subject: [PATCH v1] ci: enable zstd where available.

Now that zstd is used, enable it in CI.

Discussion: https://postgr.es/m/20211001222752.wrz7erzh4cajv...@alap3.anarazel.de
---
 .cirrus.yml | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 171bd29cf03..7b883b462a8 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -101,6 +101,7 @@ task:
 --with-ssl=openssl \
 --with-tcl --with-tclconfig=/usr/local/lib/tcl8.6/ \
 --with-uuid=bsd \
+--with-zstd \
 \
 --with-includes=/usr/local/include \
 --with-libs=/usr/local/lib \
@@ -142,6 +143,7 @@ LINUX_CONFIGURE_FEATURES: _CONFIGURE_FEATURES >-
   --with-systemd
   --with-tcl --with-tclconfig=/usr/lib/tcl8.6/
   --with-uuid=ossp
+  --with-zstd
 
 
 task:
@@ -270,7 +272,8 @@ task:
   openldap \
   openssl \
   python \
-  tcl-tk
+  tcl-tk \
+  zstd
 
 brew cleanup -s # to reduce cache size
   upload_caches: homebrew
@@ -282,7 +285,7 @@ task:
 INCLUDES="${brewpath}/include:${INCLUDES}"
 LIBS="${brewpath}/lib:${LIBS}"
 
-for pkg in icu4c krb5 openldap openssl ; do
+for pkg in icu4c krb5 openldap openssl zstd ; do
   pkgpath="${brewpath}/opt/${pkg}"
   INCLUDES="${pkgpath}/include:${INCLUDES}"
   LIBS="${pkgpath}/lib:${LIBS}"
@@ -307,6 +310,7 @@ task:
   --with-ssl=openssl \
   --with-tcl --with-tclconfig=${brewpath}/opt/tcl-tk/lib/ \
   --with-uuid=e2fs \
+  --with-zstd \
   \
   --prefix=${HOME}/install \
   --with-includes="${INCLUDES}" \
-- 
2.35.1.677.gabf474a5dd



  1   2   >