Re: Add non-blocking version of PQcancel
Note that the patch is still variously failing in cirrus. https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/37/3511 You may already know that it's possible to trigger the cirrus ci tasks using a github branch. See src/tools/ci/README.
Re: In-placre persistance change of a relation
On Thu, Mar 31, 2022 at 01:58:45PM +0900, Kyotaro Horiguchi wrote: > Thanks! Version 20 is attached. The patch failed on all CI tasks, and seems to have caused the macOS task to hang. http://cfbot.cputube.org/kyotaro-horiguchi.html Would you send a fixed patch, or remove this thread from the CFBOT? Otherwise cirrus will try to rerun it every day, taking 1 hr to time out, which is twice as slow as the slowest OS. I think this patch should be moved to the next CF and set to v16. Thanks, -- Justin
Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
On 2022-03-31 01:00:14 -0400, Tom Lane wrote: > How well does this patch work with pre-14 buildfarm clients? Looks to me like it'll just run the test twice, once via TestUpgrade, once via taptest. It's possible that there could be trouble somehow due to duplicated log files or something?
Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
I wrote: > There's still about a third of the buildfarm running older > client releases --- I count > 2 REL_8 > 2 REL_10 > 13 REL_11 > 6 REL_12 > 16 REL_13.1 > 89 REL_14 Wait a minute ... actually, what's most relevant here is the population running TAP tests, which seems to be 2 REL_8, 4 REL_11, 1 REL_12, 7 REL_13.1, and 53 REL_14. So there are still some people we'd have to nag if it doesn't work pre-v14, but fewer than I thought --- specifically, the owners of butterflyfish, copperhead, eelpout, elver, halibut, kittiwake, mantid, marabou, massasauga, myna, snakefly, snapper, spurfowl, and tadarida. regards, tom lane
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Hi, On 2022-03-31 00:08:00 -0400, Tom Lane wrote: > Andres Freund writes: > > I assume SSPI doesn't work over unix sockets? Oh. Maybe it's not even that - > > we only enable it when not using unix sockets: > > Duh. But can it work over unix sockets? I wonder if we should go the other way and use unix sockets by default in the tests. Even if CI windows could be made to use SSPI, it'd still be further away from the environment most of us write tests in. Afaics the only reason we use SSPI is to secure the tests, because they run over tcp by default. But since we have unix socket support for windows now, that shouldn't really be needed anymore. The only animal that might not be new enough for it is hamerkop. I don't really understand when windows features end up in which release. Looking at 8f3ec75de40 it seems we just assume unix sockets are available, we don't have a version / feature test (win32.h just defines HAVE_STRUCT_SOCKADDR_UN). Greetings, Andres Freund
Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
Michael Paquier writes: > So, any particular feelings about this patch? This has been around > for a couple of months/years now, so it could be a good time to do the > switch now rather than wait an extra year, or even the beginning of > the next release cycle. And the buildfarm is already able to handle > that in its code based on the last release, by skipping the upgrade > check if it finds a pg_upgrade/t/ subdirectory. There's still about a third of the buildfarm running older client releases --- I count 2 REL_8, 2 REL_10, 13 REL_11, 6 REL_12, 16 REL_13.1, and 89 REL_14. How well does this patch work with pre-14 buildfarm clients? regards, tom lane
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi, On 2022-03-30 21:29:16 -0700, Peter Geoghegan wrote: > On Wed, Mar 30, 2022 at 9:20 PM Andres Freund wrote: > > But the debug elog reports that > > > > relfrozenxid updated 714 -> 717 > > relminmxid updated 1 -> 6 > > > > The problem is that the crashing backend reads the relfrozenxid/relminmxid > > from the shared relcache init file written by another backend: > > We should have added logging of relfrozenxid and relminmxid a long time ago. At least at DEBUG1 or such. > > This is basically the inverse of a54e1f15 - we read a *newer* horizon. > > That's > > normally fairly harmless - I think. > > Is this one pretty old? What do you mean by "this one"? The cause for the assert failure? I'm not sure there's a proper bug on HEAD here. I think at worst it can delay the horizon increasing a bunch, by falsely not using an aggressive vacuum when we should have - might even be limited to a single autovacuum cycle. > > Perhaps we should just fetch the horizons from the "local" catalog for > > shared > > rels? > > Not sure what you mean. Basically, instead of relying on the relcache, which for shared relations is vulnerable to seeing "too new" horizons due to the shared relcache init file, explicitly load relfrozenxid / relminmxid from the catalog / syscache. I.e. fetch the relevant pg_class row in heap_vacuum_rel() (using SearchSysCache[Copy1](RELID)). And use that to set vacrel->relfrozenxid etc. Whereas right now we only fetch the pg_class row in vac_update_relstats(), but use the relcache before. Greetings, Andres Freund
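Spelled out, that suggestion would amount to something like the following inside heap_vacuum_rel() (a rough, untested sketch against the backend's syscache API; it cannot run outside the server, and the exact placement and error handling here are guesses, not the actual proposal):

```
/* Sketch: read the horizons straight from pg_class via the syscache,
 * instead of trusting the relcache entry, which for a shared relation
 * may have come from a too-new shared relcache init file. */
HeapTuple	ctup;
Form_pg_class pgcform;

ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(RelationGetRelid(rel)));
if (!HeapTupleIsValid(ctup))
	elog(ERROR, "cache lookup failed for relation %u",
		 RelationGetRelid(rel));
pgcform = (Form_pg_class) GETSTRUCT(ctup);

/* Use the catalog values, not rel->rd_rel->relfrozenxid et al. */
vacrel->relfrozenxid = pgcform->relfrozenxid;
vacrel->relminmxid = pgcform->relminmxid;

heap_freetuple(ctup);
```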
Re: In-placre persistance change of a relation
Thanks! Version 20 is attached. At Wed, 30 Mar 2022 08:44:02 -0500, Justin Pryzby wrote in > On Tue, Mar 01, 2022 at 02:14:13PM +0900, Kyotaro Horiguchi wrote: > > Rebased on a recent xlog refactoring. > > It'll come as no surprise that this neds to be rebased again. > At least a few typos I reported in January aren't fixed. > Set to "waiting". Oh, I'm sorry for overlooking it. It somehow didn't show up on my mailer. > I started looking at this and reviewed docs and comments again. > > > +typedef struct PendingCleanup > > +{ > > + RelFileNode relnode;/* relation that may need to be deleted > > */ > > + int op; /* operation > > mask */ > > + boolbufpersistence; /* buffer persistence to set */ > > + int unlink_forknum; /* forknum to unlink */ > > This can be of data type "ForkNumber" Right. Fixed. > > +* We are going to create an init fork. If server crashes before the > > +* current transaction ends the init fork left alone corrupts data while > > +* recovery. The mark file works as the sentinel to identify that > > +* situation. > > s/while/during/ This was in v17, but disappeared in v18. > > +* index-init fork needs further initialization. ambuildempty shoud do > > should (I reported this before) > > > + if (inxact_created) > > + { > > + SMgrRelation srel = smgropen(rnode, InvalidBackendId); > > + > > + /* > > +* INIT forks never be loaded to shared buffer so no point in > > dropping > > "are never loaded" It was fixed in v18. > > + elog(DEBUG1, "perform in-place persistnce change"); > > persistence (I reported this before) Sorry. Fixed. > > + /* > > +* While wal_level >= replica, switching to LOGGED requires the > > +* relation content to be WAL-logged to recover the table. > > +* We don't emit this fhile wal_level = minimal. > > while (or "if") There are "While" and "fhile". I changed the latter to "if". > > +* The relation is persistent and stays remain > > persistent. Don't > > +* drop the buffers for this relation.
> > "stays remain" is redundant (I reported this before) Thanks. I changed it to "stays persistent". > > + if (unlink(rm_path) < 0) > > + ereport(ERROR, > > + (errcode_for_file_access(), > > +errmsg("could not remove file > > \"%s\": %m", > > + rm_path))); > > The parens around errcode are unnecessary since last year. > I suggest to avoid using them here and elsewhere. It is just moved from elsewhere without editing, but of course I can do that. I didn't know about that change of ereport and couldn't find the corresponding commit, but I found that Tom mentioned that change. https://www.postgresql.org/message-id/flat/5063.1584641224%40sss.pgh.pa.us#63e611c30800133bbddb48de857668e8 > Now that we can rely on having varargs macros, I think we could > stop requiring the extra level of parentheses, ie instead of ... > ereport(ERROR, > errcode(ERRCODE_DIVISION_BY_ZERO), > errmsg("division by zero")); > > (The old syntax had better still work, of course. I'm not advocating > running around and changing existing calls.) I changed all ereport calls added by this patch to this style. > > +* And we may have SMGR_MARK_UNCOMMITTED file. Remove > > it if the > > +* fork files has been successfully removed. It's ok if > > the file > > file Fixed. > > + > > + All tables in the current database in a tablespace can be changed by > > using > > given tablespace I did /database in a tablespace/database in the given tablespace/. Is it right? > > + the ALL IN TABLESPACE form, which will lock all > > tables > > which will first lock > > > + to be changed first and then change each one. This form also > > supports > > remove "first" here This is almost a dead copy of the description of SET TABLESPACE. This change makes the two almost-identical descriptions vary slightly in wording. Anyway I did that as suggested only for the part this patch adds in this version. > > + OWNED BY, which will only change tables owned by > > the > > + roles specified.
If the NOWAIT option is > > specified > > specified roles. > is specified, (comma) This is the same as above. I did that but it makes the description differ from another almost-the-same description. > > + then the command will fail if it is unable to acquire all of the > > locks > > if it is unable to immediately acquire > > > + required immediately. The information_schema > > remove immediately Ditto. > > + relations are not considered part of the
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Wed, Mar 30, 2022 at 9:29 PM Peter Geoghegan wrote: > > Perhaps we should just fetch the horizons from the "local" catalog for > > shared > > rels? > > Not sure what you mean. Wait, you mean use vacrel->relfrozenxid directly? Seems kind of ugly... -- Peter Geoghegan
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Wed, Mar 30, 2022 at 9:20 PM Andres Freund wrote: > But the debug elog reports that > > relfrozenxid updated 714 -> 717 > relminmxid updated 1 -> 6 > > The problem is that the crashing backend reads the relfrozenxid/relminmxid > from the shared relcache init file written by another backend: We should have added logging of relfrozenxid and relminmxid a long time ago. > This is basically the inverse of a54e1f15 - we read a *newer* horizon. That's > normally fairly harmless - I think. Is this one pretty old? > Perhaps we should just fetch the horizons from the "local" catalog for shared > rels? Not sure what you mean. -- Peter Geoghegan
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi, On 2022-03-30 21:11:48 -0700, Peter Geoghegan wrote: > On Wed, Mar 30, 2022 at 9:04 PM Andres Freund wrote: > > (gdb) p vacrel->NewRelfrozenXid > > $3 = 717 > > (gdb) p vacrel->relfrozenxid > > $4 = 717 > > (gdb) p OldestXmin > > $5 = 5112 > > (gdb) p aggressive > > $6 = false > > Does this OldestXmin seem reasonable at this point in execution, based > on context? Does it look too high? Something else? Reasonable: (gdb) p *ShmemVariableCache $1 = {nextOid = 78969, oidCount = 2951, nextXid = {value = 21411}, oldestXid = 714, xidVacLimit = 20714, xidWarnLimit = 2107484361, xidStopLimit = 2144484361, xidWrapLimit = 2147484361, oldestXidDB = 1, oldestCommitTsXid = 0, newestCommitTsXid = 0, latestCompletedXid = {value = 21408}, xactCompletionCount = 1635, oldestClogXid = 714} I think the explanation I just sent explains the problem, without "in-memory" confusion about what's running and what's not. Greetings, Andres Freund
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi, On 2022-03-30 21:04:07 -0700, Andres Freund wrote: > On 2022-03-30 20:35:25 -0700, Peter Geoghegan wrote: > > On Wed, Mar 30, 2022 at 8:28 PM Andres Freund wrote: > > > I triggered twice now, but it took a while longer the second time. > > > > Great. > > > > I wonder if you can get an RR recording... > > Started it, but looks like it's too slow. > > (gdb) p MyProcPid > $1 = 2172500 > > (gdb) p vacrel->NewRelfrozenXid > $3 = 717 > (gdb) p vacrel->relfrozenxid > $4 = 717 > (gdb) p OldestXmin > $5 = 5112 > (gdb) p aggressive > $6 = false I added a bunch of debug elogs to see what sets *frozenxid_updated to true. (gdb) p *vacrel $1 = {rel = 0x7fe24f3e0148, indrels = 0x7fe255c17ef8, nindexes = 2, aggressive = false, skipwithvm = true, failsafe_active = false, consider_bypass_optimization = true, do_index_vacuuming = true, do_index_cleanup = true, do_rel_truncate = true, bstrategy = 0x7fe255bb0e28, pvs = 0x0, relfrozenxid = 717, relminmxid = 6, old_live_tuples = 42, OldestXmin = 20751, vistest = 0x7fe255058970 , FreezeLimit = 4244988047, MultiXactCutoff = 4289967302, NewRelfrozenXid = 717, NewRelminMxid = 6, skippedallvis = false, relnamespace = 0x7fe255c17bf8 "pg_catalog", relname = 0x7fe255c17cb8 "pg_database", indname = 0x0, blkno = 4294967295, offnum = 0, phase = VACUUM_ERRCB_PHASE_SCAN_HEAP, verbose = false, dead_items = 0x7fe255c131d0, rel_pages = 8, scanned_pages = 8, removed_pages = 0, lpdead_item_pages = 0, missed_dead_pages = 0, nonempty_pages = 8, new_rel_tuples = 124, new_live_tuples = 42, indstats = 0x7fe255c18320, num_index_scans = 0, tuples_deleted = 0, lpdead_items = 0, live_tuples = 42, recently_dead_tuples = 82, missed_dead_tuples = 0} But the debug elog reports that relfrozenxid updated 714 -> 717 relminmxid updated 1 -> 6 The problem is that the crashing backend reads the relfrozenxid/relminmxid from the shared relcache init file written by another backend: 2022-03-30 21:10:47.626 PDT [2625038][autovacuum worker][6/433:0][] LOG: automatic vacuum 
of table "contrib_regression_postgres_fdw.pg_catalog.pg_database": index scans: 1 pages: 0 removed, 8 remain, 8 scanned (100.00% of total) tuples: 4 removed, 114 remain, 72 are dead but not yet removable removable cutoff: 20751, older by 596 xids when operation ended new relfrozenxid: 717, which is 3 xids ahead of previous value new relminmxid: 6, which is 5 mxids ahead of previous value index scan needed: 3 pages from table (37.50% of total) had 8 dead item identifiers removed index "pg_database_datname_index": pages: 2 in total, 0 newly deleted, 0 currently deleted, 0 reusable index "pg_database_oid_index": pages: 6 in total, 0 newly deleted, 2 currently deleted, 2 reusable I/O timings: read: 0.050 ms, write: 0.102 ms avg read rate: 209.860 MB/s, avg write rate: 76.313 MB/s buffer usage: 42 hits, 22 misses, 8 dirtied WAL usage: 13 records, 5 full page images, 33950 bytes system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s ... 2022-03-30 21:10:47.772 PDT [2625043][autovacuum worker][:0][] DEBUG: InitPostgres 2022-03-30 21:10:47.772 PDT [2625043][autovacuum worker][6/0:0][] DEBUG: my backend ID is 6 2022-03-30 21:10:47.772 PDT [2625043][autovacuum worker][6/0:0][] LOG: reading shared init file 2022-03-30 21:10:47.772 PDT [2625043][autovacuum worker][6/443:0][] DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/sub> 2022-03-30 21:10:47.772 PDT [2625043][autovacuum worker][6/443:0][] LOG: reading non-shared init file This is basically the inverse of a54e1f15 - we read a *newer* horizon. That's normally fairly harmless - I think. Perhaps we should just fetch the horizons from the "local" catalog for shared rels? Greetings, Andres Freund
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Thu, Mar 31, 2022 at 5:07 AM Andres Freund wrote: > > Hi, > > On 2022-03-29 11:55:05 -0400, Robert Haas wrote: > > I committed v6 instead. > > I was just discussing the WAL prefetching patch with Thomas. A question in > that discussion made me look at the coverage of REDO for CREATE DATABASE: > https://coverage.postgresql.org/src/backend/commands/dbcommands.c.gcov.html > > Seems there's currently nothing hitting the REDO for > XLOG_DBASE_CREATE_FILE_COPY (currently line 3019). I think it'd be good to > keep coverage for that. How about adding a > CREATE DATABASE ... STRATEGY file_copy > to 001_stream_rep.pl? > > > Might be worth adding a test for ALTER DATABASE ... SET TABLESPACE at the same > time, this patch did affect that path in some minor ways. And, somewhat > shockingly, we don't have a single test for it. I will add tests for both of these cases and send the patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Wed, Mar 30, 2022 at 9:04 PM Andres Freund wrote: > (gdb) p vacrel->NewRelfrozenXid > $3 = 717 > (gdb) p vacrel->relfrozenxid > $4 = 717 > (gdb) p OldestXmin > $5 = 5112 > (gdb) p aggressive > $6 = false Does this OldestXmin seem reasonable at this point in execution, based on context? Does it look too high? Something else? -- Peter Geoghegan
Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
On Thu, Mar 03, 2022 at 02:03:38PM +0900, Michael Paquier wrote: > Indeed. I have reworked the whole, rather than just those three > sentences. So, any particular feelings about this patch? This has been around for a couple of months/years now, so it could be a good time to do the switch now rather than wait an extra year, or even until the beginning of the next release cycle. And the buildfarm is already able to handle that in its code based on the last release, by skipping the upgrade check if it finds a pg_upgrade/t/ subdirectory. -- Michael
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Andres Freund writes: > On 2022-03-30 22:07:48 -0400, Tom Lane wrote: >> So ... none of the Windows buildfarm members actually like this >> test script. > On windows CI sets > # Avoids port conflicts between concurrent tap test runs > PG_TEST_USE_UNIX_SOCKETS: 1 Ok ... > I assume SSPI doesn't work over unix sockets? Oh. Maybe it's not even that - > we only enable it when not using unix sockets: Duh. But can it work over unix sockets? regards, tom lane
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi, On 2022-03-30 20:35:25 -0700, Peter Geoghegan wrote: > On Wed, Mar 30, 2022 at 8:28 PM Andres Freund wrote: > > I triggered twice now, but it took a while longer the second time. > > Great. > > I wonder if you can get an RR recording... Started it, but looks like it's too slow. (gdb) p MyProcPid $1 = 2172500 (gdb) p vacrel->NewRelfrozenXid $3 = 717 (gdb) p vacrel->relfrozenxid $4 = 717 (gdb) p OldestXmin $5 = 5112 (gdb) p aggressive $6 = false There was another autovacuum of pg_database 10s before: 2022-03-30 20:35:17.622 PDT [2165344][autovacuum worker][5/3:0][] LOG: automatic vacuum of table "postgres.pg_catalog.pg_database": index scans: 1 pages: 0 removed, 3 remain, 3 scanned (100.00% of total) tuples: 61 removed, 4 remain, 1 are dead but not yet removable removable cutoff: 1921, older by 3 xids when operation ended new relfrozenxid: 717, which is 3 xids ahead of previous value index scan needed: 3 pages from table (100.00% of total) had 599 dead item identifiers removed index "pg_database_datname_index": pages: 2 in total, 0 newly deleted, 0 currently deleted, 0 reusable index "pg_database_oid_index": pages: 4 in total, 0 newly deleted, 0 currently deleted, 0 reusable I/O timings: read: 0.029 ms, write: 0.034 ms avg read rate: 134.120 MB/s, avg write rate: 89.413 MB/s buffer usage: 35 hits, 12 misses, 8 dirtied WAL usage: 12 records, 5 full page images, 27218 bytes system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s The dying backend: 2022-03-30 20:35:27.668 PDT [2172500][autovacuum worker][7/0:0][] DEBUG: autovacuum: processing database "contrib_regression_hstore" ... 2022-03-30 20:35:27.690 PDT [2172500][autovacuum worker][7/674:0][] CONTEXT: while cleaning up index "pg_database_oid_index" of relation "pg_catalog.pg_database" Greetings, Andres Freund
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Wed, Mar 30, 2022 at 08:23:25PM +1300, Thomas Munro wrote: > That leads to the attached patches, the first of which I'd want to back-patch. Makes sense. - if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0) - d->ret.d_type = DT_DIR; /* For reparse points dwReserved0 field will contain the ReparseTag */ - else if ((fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0 && -(fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT)) + if ((fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0 && + (fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT)) d->ret.d_type = DT_LNK; + else if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0) + d->ret.d_type = DT_DIR; This should also work for plain files, so that looks fine to me. > Unfortunately while testing this I realised there is something else > wrong here: if you take a basebackup using tar format, in-place > tablespaces are skipped (they should get their own OID.tar file, but > they don't, also no error). While it wasn't one of my original goals > for in-place tablespaces to work in every way (and I'm certain some > external tools would be confused by them), it seems we're pretty close > so we should probably figure out that piece of the puzzle. It may be > obvious why but I didn't have time to dig into that today... perhaps > instead of just skipping the readlink() we should be writing something > different into the mapfile and then restoring as appropriate... Yeah, I saw that in-place tablespaces were part of the main tarball in base backups as we rely on the existence of a link to decide if the contents of a path should be separated in a different tarball or not. This does not strike me as a huge problem in itself, TBH, as the improvement would be limited to make sure that the base backups could be correctly restored with multiple tablespaces. And you can get pretty much the same amount of coverage to make sure that the backup contents are correct without fully restoring them. 
-- Michael
Re: Handle infinite recursion in logical replication setup
On Wed, Mar 30, 2022 at 7:40 PM Ashutosh Bapat wrote: > > > > The changes for the same are available in the v5 patch available at [1]. > > [1] - > > https://www.postgresql.org/message-id/CALDaNm3wCf0YcvVo%2BgHMGpupk9K6WKJxCyLUvhPC2GkPKRZUWA%40mail.gmail.com > > > > cb->truncate_cb = pg_decode_truncate; > cb->commit_cb = pg_decode_commit_txn; > cb->filter_by_origin_cb = pg_decode_filter; > +cb->filter_remote_origin_cb = pg_decode_filter_remote_origin; > > Why do we need a new hook? Can we use filter_by_origin_cb? > I also think there is no need for a new hook in this case but I might also be missing something that Vignesh has in his mind. > Also it looks like > implementation of pg_decode_filter and pg_decode_filter_remote_origin is the same, > unless my eyes are deceiving me. > > - binary, streaming, and > - disable_on_error. > + binary, streaming, > + disable_on_error and > + publish_local_only. > > "publish_local_only" as a "subscription" option looks odd. Should it be > "subscribe_local_only"? > I also think "subscribe_local_only" fits better here. -- With Regards, Amit Kapila.
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi, On 2022-03-30 20:28:44 -0700, Andres Freund wrote: > I was able to trigger the crash. > > cat ~/tmp/pgbench-createdb.sql > CREATE DATABASE pgb_:client_id; > DROP DATABASE pgb_:client_id; > > pgbench -n -P1 -c 10 -j10 -T100 -f ~/tmp/pgbench-createdb.sql > > while I was also running > > for i in $(seq 1 100); do echo iteration $i; make -Otarget -C contrib/ -s > installcheck -j48 -s prove_installcheck=true USE_MODULE_DB=1 > /tmp/ci-$i.log > 2>&1; done > > I triggered twice now, but it took a while longer the second time. Forgot to say how postgres was started. Via my usual devenv script, which results in: + /home/andres/build/postgres/dev-assert/vpath/src/backend/postgres -c hba_file=/home/andres/tmp/pgdev/pg_hba.conf -D /srv/dev/pgdev-dev/ -p 5440 -c shared_buffers=2GB -c wal_level=hot_standby -c max_wal_senders=10 -c track_io_timing=on -c restart_after_crash=false -c max_prepared_transactions=20 -c log_checkpoints=on -c min_wal_size=48MB -c max_wal_size=150GB -c 'cluster_name=dev assert' -c ssl_cert_file=/home/andres/tmp/pgdev/ssl-cert-snakeoil.pem -c ssl_key_file=/home/andres/tmp/pgdev/ssl-cert-snakeoil.key -c 'log_line_prefix=%m [%p][%b][%v:%x][%a] ' -c shared_buffers=16MB -c log_min_messages=debug1 -c log_connections=on -c allow_in_place_tablespaces=1 -c log_autovacuum_min_duration=0 -c log_lock_waits=true -c autovacuum_naptime=10s -c fsync=off Greetings, Andres Freund
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Wed, Mar 30, 2022 at 8:28 PM Andres Freund wrote: > I triggered twice now, but it took a while longer the second time. Great. I wonder if you can get an RR recording... -- Peter Geoghegan
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi, I was able to trigger the crash. cat ~/tmp/pgbench-createdb.sql CREATE DATABASE pgb_:client_id; DROP DATABASE pgb_:client_id; pgbench -n -P1 -c 10 -j10 -T100 -f ~/tmp/pgbench-createdb.sql while I was also running for i in $(seq 1 100); do echo iteration $i; make -Otarget -C contrib/ -s installcheck -j48 -s prove_installcheck=true USE_MODULE_DB=1 > /tmp/ci-$i.log 2>&1; done I triggered twice now, but it took a while longer the second time. (gdb) bt full #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49 set = {__val = {4194304, 0, 0, 0, 0, 0, 216172782113783808, 2, 2377909399344644096, 18446497967838863616, 0, 0, 0, 0, 0, 0}} pid = tid = ret = #1 0x7fe49a2db546 in __GI_abort () at abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}}, sa_flags = 0, sa_restorer = 0x107e0} sigs = {__val = {32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}} #2 0x7fe49b9706f1 in ExceptionalCondition (conditionName=0x7fe49ba0618d "diff > 0", errorType=0x7fe49ba05bd1 "FailedAssertion", fileName=0x7fe49ba05b90 "/home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c", lineNumber=724) at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69 No locals. 
#3 0x7fe49b2fc739 in heap_vacuum_rel (rel=0x7fe497a8d148, params=0x7fe49c130d7c, bstrategy=0x7fe49c130e10) at /home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c:724 buf = { data = 0x7fe49c17e238 "automatic vacuum of table \"contrib_regression_dict_int.pg_catalog.pg_database\": index scans: 1\npages: 0 removed, 3 remain, 3 scanned (100.00% of total)\ntuples: 49 removed, 53 remain, 9 are dead but no"..., len = 279, maxlen = 1024, cursor = 0} msgfmt = 0x7fe49ba06038 "automatic vacuum of table \"%s.%s.%s\": index scans: %d\n" diff = 0 endtime = 702011687982080 vacrel = 0x7fe49c19b5b8 verbose = false instrument = true ru0 = {tv = {tv_sec = 1648696487, tv_usec = 975963}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 3086}, { --Type for more, q to quit, c to continue without paging--c ru_maxrss = 10824, __ru_maxrss_word = 10824}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 449, __ru_minflt_word = 449}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 2, __ru_nvcsw_word = 2}, {ru_nivcsw = 0, __ru_nivcsw_word = 0}}} starttime = 702011687975964 walusage_start = {wal_records = 0, wal_fpi = 0, wal_bytes = 0} walusage = {wal_records = 11, wal_fpi = 7, wal_bytes = 30847} secs = 0 usecs = 6116 read_rate = 16.606033355134073 write_rate = 7.6643230869849575 aggressive = false skipwithvm = true frozenxid_updated = true minmulti_updated = true orig_rel_pages = 3 new_rel_pages = 3 new_rel_allvisible = 0 indnames = 0x7fe49c19bb28 errcallback = {previous = 0x0, callback = 0x7fe49b3012fd , arg = 0x7fe49c19b5b8} startreadtime = 180 startwritetime = 0 OldestXmin = 67552 FreezeLimit = 4245034848 OldestMxact = 224 MultiXactCutoff 
= 4289967520 __func__ = "heap_vacuum_rel" #4 0x7fe49b523d92 in table_relation_vacuum (rel=0x7fe497a8d148, params=0x7fe49c130d7c, bstrategy=0x7fe49c130e10) at /home/andres/src/postgresql/src/include/access/tableam.h:1680 No locals. #5 0x7fe49b527032 in vacuum_rel (relid=1262, relation=0x7fe49c1ae360, params=0x7fe49c130d7c) at /home/andres/src/postgresql/src/backend/commands/vacuum.c:2065 lmode = 4 rel = 0x7fe497a8d148 lockrelid = {relId = 1262, dbId = 0} toast_relid = 0 save_userid = 10 save_sec_context = 0 save_nestlevel = 2 __func__ = "vacuum_rel" #6 0x7fe49b524c3b in vacuum (relations=0x7fe49c1b03a8, params=0x7fe49c130d7c, bstrategy=0x7fe49c130e10, isTopLevel=true) at /home/andres/src/postgresql/src/backend/commands/vacuum.c:482 vrel = 0x7fe49c1ae3b8 cur__state = {l = 0x7fe49c1b03a8, i = 0} cur = 0x7fe49c1b03c0 _save_exception_stack = 0x7fff97e35a10 _save_context_stack = 0x0 _local_sigjmp_buf = {{__jmpbuf = {140735741652128, 6126579318940970843, 9223372036854775747, 0, 0, 0, 6126579318957748059, 6139499258682879835}, __mask_was_saved = 0, __saved_mask = {__val = {32, 140619848279000, 8590910454, 140619848278592, 32, 140619848278944, 7784,
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Wed, Mar 30, 2022 at 7:37 PM Peter Geoghegan wrote: > Yeah, a WARNING would be good here. I can write a new version of my > patch series with a separate patch for that this evening. Actually, > better make it a PANIC for now... Attached is v14, which includes a new patch that PANICs like that in vac_update_relstats() --- 0003. This approach also covers manual VACUUMs, which isn't the case with the failing assertion, which is in instrumentation code (actually VACUUM VERBOSE might hit it). I definitely think that something like this should be committed. Silently ignoring system catalog corruption isn't okay. -- Peter Geoghegan Attachments: v14-0001-Set-relfrozenxid-to-oldest-extant-XID-seen-by-VA.patch, v14-0002-Generalize-how-VACUUM-skips-all-frozen-pages.patch, v14-0003-PANIC-on-relfrozenxid-from-the-future.patch, v14-0004-vacuumlazy.c-Move-resource-allocation-to-heap_va.patch
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Wed, Mar 30, 2022 at 7:00 PM Andres Freund wrote:
> Something vaguely like EXPECT_EQ_U32 in regress.c. Maybe
> AssertCmp(type, a, op, b),
>
> Then the assertion could have been something like
> AssertCmp(int32, diff, >, 0)

I'd definitely use them if they were there.

> Does the line number in the failed run actually correspond to the xid, rather
> than the mxid case? I didn't check.

Yes, I verified -- definitely relfrozenxid.

> You can do something similar locally on linux with
> make -Otarget -C contrib/ -j48 -s USE_MODULE_DB=1 installcheck prove_installcheck=true
> (the prove_installcheck=true prevents tap tests from running; we don't seem
> to have another way to do that)
>
> I don't think windows uses USE_MODULE_DB=1, but it allows causing a lot more
> load concurrently than running tests serially...

Can't get it to fail locally with that recipe.

> > Assert(vacrel->NewRelfrozenXid == OldestXmin ||
> >        TransactionIdPrecedesOrEquals(vacrel->relfrozenxid,
> >                                      vacrel->NewRelfrozenXid));
>
> The comment in your patch says "is either older or newer than FreezeLimit" - I
> assume that's some rephrasing damage?

Both the comment and the assertion are correct. I see what you mean, though.

> Perhaps it's worth committing improved assertions on master? If this is indeed
> a pre-existing bug, and we're just missing it due to slightly less stringent
> asserts, we could rectify that separately.

I don't think there's much chance of the assertion actually hitting without the rest of the patch series. The new relfrozenxid value is always going to be OldestXmin - vacuum_min_freeze_age on HEAD, while with the patch it's sometimes close to OldestXmin. Especially when you have lots of dead tuples that you churn through constantly (like pgbench_tellers, or like these system catalogs on the CI test machine).

> Hm. This triggers some vague memories. There's some oddities around shared
> relations being vacuumed separately in all the databases and thus having
> separate horizons.

That's what I was thinking of, obviously.

> After "remembering" that, I looked in the cirrus log for the failed run, and
> the worker was processing a shared relation last:
>
> 2022-03-30 03:48:30.238 GMT [5984][autovacuum worker] LOG:  automatic analyze
> of table "contrib_regression.pg_catalog.pg_authid"

I noticed the same thing myself. Should have said sooner.

> Perhaps this ought to be an elog() instead of an Assert()? Something has gone
> pear shaped if we get here... It's a bit annoying though, because it'd have to
> be a PANIC to be visible on the bf / CI :(.

Yeah, a WARNING would be good here. I can write a new version of my patch series with a separate patch for that this evening. Actually, better make it a PANIC for now...

-- 
Peter Geoghegan
Re: Add LZ4 compression in pg_dump
On Wed, Mar 30, 2022 at 03:32:55PM +, gkokola...@pm.me wrote:
> On Wednesday, March 30th, 2022 at 7:54 AM, Michael Paquier wrote:
>> While moving on with 0002, I have noticed the following in
>> _StartBlob():
>>
>> if (AH->compression != 0)
>>     sfx = ".gz";
>> else
>>     sfx = "";
>>
>> Shouldn't this bit also be simplified, adding a fatal() like the other
>> code paths, for safety?
>
> Agreed. Fixed.

Okay. 0002 looks fine as-is, and I don't mind the extra fatal() calls. These could be asserts but that's not a big deal one way or the other. And the cleanup is now applied.

>> + my $compress_program = $ENV{GZIP_PROGRAM};
>>
>> It seems to me that it is enough to rely on {compress_cmd}, hence
>> there should be no need for $compress_program, no?
>
> Maybe not. We don't want the tests to fail if the utility is not
> installed. That becomes even more evident as more methods are added.
> However I realized that the presence of the environment variable does
> not guarantee that the utility is actually installed. In the attached,
> the existence of the utility is based on the return value of system_log().

Hmm. [.. thinks ..] The thing that's itching me here is that you align the concept of compression with gzip, but that's not going to be true once more compression options are added to pg_dump, and that would make $supports_compression and $compress_program_exists incorrect. Perhaps the right answer would be to rename all that with a suffix like "_gzip" to make a difference? Or would there be enough control with a value of "compression_gzip" instead of "compression" in test_key?

+my $compress_program_exists = (system_log("$ENV{GZIP_PROGRAM}", '-h',
+    '>', '/dev/null') == 0);

Do we need this command execution at all? In all the other tests, we rely on a simple "if (!defined $gzip || $gzip eq '');", so we could do the same here.

A last thing is that we should perhaps make a clear difference between the check that looks at whether the code has been built with zlib and the check for the presence of GZIP_PROGRAM, as it can be useful in some environments to be able to run pg_dump built with zlib even if the GZIP_PROGRAM command does not exist (I don't think this can be the case, but other tests are flexible). As of now, the patch relies on pg_dump forcing no compression when built with --without-zlib, even if --compress/-Z is used, but that also means that those compression tests overlap with the existing tests in this case. Wouldn't it be more consistent to check $supports_compression when executing the dump command for test_key = "compression[_gzip]"? This would mean keeping GZIP_PROGRAM as the sole check when executing the compression command.

-- 
Michael
RE: Logical replication timeout problem
On Wed, Mar 30, 2022 3:54 PM wangw.f...@fujitsu.com wrote:
>
> Rebased the patch because of commit d5a9d86d in current HEAD.
>

Thanks for your patch. I tried this patch and confirmed that there is no timeout problem after applying it, while I could still reproduce the problem on HEAD.

Regards,
Shi yu
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Hi,

On 2022-03-30 22:07:48 -0400, Tom Lane wrote:
> So ... none of the Windows buildfarm members actually like this
> test script. They're all showing failures along the lines of
>
> not ok 2 - fails if basebackup_to_shell.command is not set: matches
>
> #   Failed test 'fails if basebackup_to_shell.command is not set: matches'
> #   at t/001_basic.pl line 38.
> #   'pg_basebackup: error: connection to server at
> "127.0.0.1", port 55358 failed: FATAL: SSPI authentication failed for user
> "backupuser"
> #   '
> #   doesn't match '(?^:shell command for backup is not configured)'
>
> Does the CI setup not account for this issue?

On windows CI sets

  # Avoids port conflicts between concurrent tap test runs
  PG_TEST_USE_UNIX_SOCKETS: 1

because I've otherwise seen a lot of spurious tap test failures - Cluster.pm get_free_port() is racy, as it admits:

  XXX A port available now may become unavailable by the time we start
  the desired service.

The only alternative is to not use parallelism when running tap tests, but that makes test runs even slower - and windows is already the bottleneck for cfbot.

I assume SSPI doesn't work over unix sockets? Oh. Maybe it's not even that - we only enable it when not using unix sockets:

# Internal method to set up trusted pg_hba.conf for replication.  Not
# documented because you shouldn't use it, it's called automatically if needed.
sub set_replication_conf
{
	my ($self) = @_;
	my $pgdata = $self->data_dir;

	$self->host eq $test_pghost
	  or croak "set_replication_conf only works with the default host";

	open my $hba, '>>', "$pgdata/pg_hba.conf";
	print $hba
	  "\n# Allow replication (set up by PostgreSQL::Test::Cluster.pm)\n";
	if ($PostgreSQL::Test::Utils::windows_os
		&& !$PostgreSQL::Test::Utils::use_unix_sockets)
	{
		print $hba
		  "host replication all $test_localhost/32 sspi include_realm=1 map=regress\n";
	}
	close $hba;
	return;
}

Greetings,

Andres Freund
Re: pgsql: Add 'basebackup_to_shell' contrib module.
So ... none of the Windows buildfarm members actually like this test script. They're all showing failures along the lines of

not ok 2 - fails if basebackup_to_shell.command is not set: matches

#   Failed test 'fails if basebackup_to_shell.command is not set: matches'
#   at t/001_basic.pl line 38.
#   'pg_basebackup: error: connection to server at "127.0.0.1", port 55358 failed: FATAL: SSPI authentication failed for user "backupuser"
#   '
#   doesn't match '(?^:shell command for backup is not configured)'

Does the CI setup not account for this issue?

			regards, tom lane
Re: Commitfest Update
On Wed, Mar 30, 2022 at 02:41:26PM -0400, Greg Stark wrote:
> Patches that are Waiting on Author and haven't had activity in months
> -- traditionally they were set to Returned with Feedback. It seems the
> feeling these days is to not lose state on them and just move them to
> the next CF. I'm not sure that's wise, it ends up just filling up the
> list with patches nobody's working on.

+1 for closing such patches as Returned with Feedback, for the same reasons Robert and Peter already stated. Note that I already closed such CF entries during the last commitfest, so hopefully there shouldn't be too many. Last time I used "both Waiting on Author and no activity from the author since at least the 15th of the month" as the threshold to close such patches (although I closed them at the beginning of the next month), as that seems to be the usual (and informal) rule.
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi,

On 2022-03-30 17:50:42 -0700, Peter Geoghegan wrote:
> I tried several times to recreate this issue on CI. No luck with that,
> though -- can't get it to fail again after 4 attempts.

It's really annoying that we don't have Assert variants that show the compared values; that might make it easier to interpret what's going on. Something vaguely like EXPECT_EQ_U32 in regress.c. Maybe AssertCmp(type, a, op, b). Then the assertion could have been something like

    AssertCmp(int32, diff, >, 0)

Does the line number in the failed run actually correspond to the xid, rather than the mxid case? I didn't check.

You could try to increase the likelihood of reproducing the failure by duplicating the invocation that led to the crash a few times in the .cirrus.yml file in your dev branch. That might allow hitting the problem more quickly. Maybe reduce autovacuum_naptime in src/tools/ci/pg_ci_base.conf?

Or locally - one thing that windows CI does differently from the other platforms is that it runs isolation, contrib and a bunch of other tests using the same cluster. Which of course increases the likelihood of autovacuum having stuff to do, *particularly* on shared relations - normally there's probably not enough changes for that.

You can do something similar locally on linux with

    make -Otarget -C contrib/ -j48 -s USE_MODULE_DB=1 installcheck prove_installcheck=true

(the prove_installcheck=true prevents tap tests from running; we don't seem to have another way to do that)

I don't think windows uses USE_MODULE_DB=1, but it allows causing a lot more load concurrently than running tests serially...

> We know that this particular assertion did not fail during the same VACUUM:
>
> Assert(vacrel->NewRelfrozenXid == OldestXmin ||
>        TransactionIdPrecedesOrEquals(vacrel->relfrozenxid,
>                                      vacrel->NewRelfrozenXid));

The comment in your patch says "is either older or newer than FreezeLimit" - I assume that's some rephrasing damage?

> So it's hard to see how this could be a bug in the patch -- the final
> new relfrozenxid is presumably equal to VACUUM's OldestXmin in the
> problem scenario seen on the CI Windows instance yesterday (that's why
> this earlier assertion didn't fail).

Perhaps it's worth committing improved assertions on master? If this is indeed a pre-existing bug, and we're just missing it due to slightly less stringent asserts, we could rectify that separately.

> The surprising part of the CI failure must have taken place just after
> this assertion, when VACUUM's call to vacuum_set_xid_limits() actually
> updates pg_class.relfrozenxid with vacrel->NewRelfrozenXid --
> presumably because the existing relfrozenxid appeared to be "in the
> future" when we examine it in pg_class again. We see evidence that
> this must have happened afterwards, when the closely related assertion
> (used only in instrumentation code) fails:

Hm. This triggers some vague memories. There's some oddities around shared relations being vacuumed separately in all the databases and thus having separate horizons.

After "remembering" that, I looked in the cirrus log for the failed run, and the worker was processing a shared relation last:

2022-03-30 03:48:30.238 GMT [5984][autovacuum worker] LOG:  automatic analyze of table "contrib_regression.pg_catalog.pg_authid"

Obviously that's not a guarantee that the next table processed also is a shared catalog, but ...

Oh, the relid is actually in the stack trace. 0x4ee = 1262 = pg_database. Which makes sense, the test ends up with a high percentage of dead rows in pg_database, due to all the different contrib tests creating/dropping a database.

> From my patch:
>
> > if (frozenxid_updated)
> > {
> > -   diff = (int32) (FreezeLimit - vacrel->relfrozenxid);
> > +   diff = (int32) (vacrel->NewRelfrozenXid - vacrel->relfrozenxid);
> > +   Assert(diff > 0);
> >     appendStringInfo(,
> >                      _("new relfrozenxid: %u, which is %d xids ahead of previous value\n"),
> > -                    FreezeLimit, diff);
> > +                    vacrel->NewRelfrozenXid, diff);
> > }

Perhaps this ought to be an elog() instead of an Assert()? Something has gone pear shaped if we get here... It's a bit annoying though, because it'd have to be a PANIC to be visible on the bf / CI :(.

Greetings,

Andres Freund
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Wed, Mar 30, 2022 at 12:01 AM Peter Geoghegan wrote:
> Perhaps something is amiss inside vac_update_relstats(), where the
> boolean flag that indicates that pg_class.relfrozenxid was advanced is
> set:
>
> if (frozenxid_updated)
>     *frozenxid_updated = false;
> if (TransactionIdIsNormal(frozenxid) &&
>     pgcform->relfrozenxid != frozenxid &&
>     (TransactionIdPrecedes(pgcform->relfrozenxid, frozenxid) ||
>      TransactionIdPrecedes(ReadNextTransactionId(),
>                            pgcform->relfrozenxid)))
> {
>     if (frozenxid_updated)
>         *frozenxid_updated = true;
>     pgcform->relfrozenxid = frozenxid;
>     dirty = true;
> }
>
> Maybe the "existing relfrozenxid is in the future, silently update
> relfrozenxid" part of the condition (which involves
> ReadNextTransactionId()) somehow does the wrong thing here. But how?

I tried several times to recreate this issue on CI. No luck with that, though -- can't get it to fail again after 4 attempts. This was a VACUUM of pg_database, run from an autovacuum worker.

I am vaguely reminded of the two bugs fixed by Andres in commit a54e1f15. Both were issues with the shared relcache init file affecting shared and nailed catalog relations. Those bugs had symptoms like "ERROR: found xmin ... from before relfrozenxid ..." for various system catalogs.

We know that this particular assertion did not fail during the same VACUUM:

Assert(vacrel->NewRelfrozenXid == OldestXmin ||
       TransactionIdPrecedesOrEquals(vacrel->relfrozenxid,
                                     vacrel->NewRelfrozenXid));

So it's hard to see how this could be a bug in the patch -- the final new relfrozenxid is presumably equal to VACUUM's OldestXmin in the problem scenario seen on the CI Windows instance yesterday (that's why this earlier assertion didn't fail). The assertion I'm showing here needs the "vacrel->NewRelfrozenXid == OldestXmin" part of the condition to account for the fact that OldestXmin/GetOldestNonRemovableTransactionId() is known to "go backwards". Without that the regression tests will fail quite easily.

The surprising part of the CI failure must have taken place just after this assertion, when VACUUM's call to vacuum_set_xid_limits() actually updates pg_class.relfrozenxid with vacrel->NewRelfrozenXid -- presumably because the existing relfrozenxid appeared to be "in the future" when we examine it in pg_class again. We see evidence that this must have happened afterwards, when the closely related assertion (used only in instrumentation code) fails.

From my patch:

> if (frozenxid_updated)
> {
> -   diff = (int32) (FreezeLimit - vacrel->relfrozenxid);
> +   diff = (int32) (vacrel->NewRelfrozenXid - vacrel->relfrozenxid);
> +   Assert(diff > 0);
>     appendStringInfo(,
>                      _("new relfrozenxid: %u, which is %d xids ahead of previous value\n"),
> -                    FreezeLimit, diff);
> +                    vacrel->NewRelfrozenXid, diff);
> }

Does anybody have any ideas about what might be going on here?

-- 
Peter Geoghegan
Re: Mingw task for Cirrus CI
Hi,

On 2022-03-22 19:00:42 +0300, Melih Mutlu wrote:
> Rebased it.
> I also removed the temp installation task and
> used NoDefaultCurrentDirectoryInExePath env variable instead.

Hm. But you're still using a separate build directory, from what I can see? The NoDefaultCurrentDirectoryInExePath thing should only have an effect when not using a separate build directory, no?

Does it work to not use the separate build dir? Without it we don't need the "preparing build tree" step, and that's quite slow on mingw:

https://cirrus-ci.com/task/4713509253545984?logs=configure#L392
[00:23:44.371] preparing build tree... done
[00:24:25.429] configure: creating ./config.status

Chatting about this patch with Thomas, I started to wonder about other reasons for the slow speed of configure. I briefly experimented locally, and it looks like using 'dash' as the shell makes configure run a good bit quicker.

> ---
>  .cirrus.yml | 79 +++--
>  1 file changed, 65 insertions(+), 14 deletions(-)
>
> diff --git a/.cirrus.yml b/.cirrus.yml
> index e5335fede7..1ed40347cf 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -23,7 +23,6 @@ env:
>    CHECKFLAGS: -Otarget
>    PROVE_FLAGS: --timer
>    PGCTLTIMEOUT: 120 # avoids spurious failures during parallel tests
> -  TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
>    PG_TEST_EXTRA: kerberos ldap ssl

This removes TEMP_CONFIG from all other tasks. You added it back to the VS windows task, but not the others? I assume that was accidental?

> +  env:
> +    CCACHE_DIR: C:/msys64/ccache
> +    BUILD_DIR: "%CIRRUS_WORKING_DIR%/build"

I think this should use TEMP_CONFIG too. Is the problem that you need to change the path?

> +  ccache_cache:
> +    folder: ${CCACHE_DIR}
> +
> +  mingw_info_script:
> +    - C:\msys64\usr\bin\bash.exe -lc "where gcc"
> +    - C:\msys64\usr\bin\bash.exe -lc "gcc --version"
> +    - C:\msys64\usr\bin\bash.exe -lc "where perl"
> +    - C:\msys64\usr\bin\bash.exe -lc "perl --version"
>
> +  configure_script:
> +    - C:\msys64\usr\bin\bash.exe -lc "mkdir %BUILD_DIR% &&
> +      cd %BUILD_DIR% &&
> +      %CIRRUS_WORKING_DIR%/configure

Could you try using dash to invoke configure here, and see whether it makes configure faster?

> +        --host=x86_64-w64-mingw32
> +        --enable-cassert
> +        --enable-tap-tests
> +        --with-icu
> +        --with-libxml
> +        --with-libxslt
> +        --with-lz4
> +        --enable-debug
> +        CC='ccache gcc'
> +        CXX='ccache g++'"

I think this task should specify CFLAGS="-Og", CXXFLAGS="-Og" similar to other tasks. We end up with -O2 otherwise, which makes the build measurably slower.

> +  tests_script:
> +    - set "NoDefaultCurrentDirectoryInExePath=0"

A comment about why NoDefaultCurrentDirectoryInExePath=0 is used would be good.

Greetings,

Andres Freund
Re: Higher level questions around shared memory stats
Hi,

On 2022-03-30 16:35:50 -0700, Andres Freund wrote:
> On 2022-03-29 12:17:27 -0700, Andres Freund wrote:
> > Separate from the minutia in [1] I'd like to discuss a few questions of more
> > general interest. I'll post another question or two later.
>
> 4) What to do with the stats_temp_directory GUC / PG_STAT_TMP_DIR define /
>    pg_stats_temp directory?
>
>    With shared memory stats patch, the stats system itself doesn't need it
>    anymore. But pg_stat_statements also uses PG_STAT_TMP_DIR to store
>    pgss_query_texts.stat. That file can be fairly hot, so there's benefit in
>    having something like stats_temp_directory.
>
>    I'm inclined to just leave the guc / define / directory around, with a
>    note saying that it's just used by extensions?

I had searched before on codesearch.debian.net whether there are external extensions using it, without finding one (just a copy of pgstat.h). Now I searched using https://cs.github.com/ ([1]) and found

https://github.com/powa-team/pg_sortstats
https://github.com/uptimejp/sql_firewall
https://github.com/legrandlegrand/pg_stat_sql_plans
https://github.com/ossc-db/pg_store_plans

which seems to weigh in favor of at least keeping the directory and define. They all don't seem to use the guc, but just PG_STAT_TMP_DIR.

We currently have code removing files both in pg_stat and the configured pg_stats_temp directory (defaulting to pg_stat_tmp). All files matching global.(stat|tmp), db_[0-9]+.(tmp|stat) are removed. With the shared memory stats patch there's only a single file, so we don't need that anymore. I guess some extension could rely on files being removed somehow, but it's hard to believe, because it'd conflict with the stats collector's files.

Greetings,

Andres Freund

[1] https://cs.github.com/?scopeName=All+repos==PG_STAT_TMP_DIR+NOT+path%3Afilemap.c+NOT+path%3Apgstat.h+NOT+path%3Abasebackup.c+NOT+path%3Apg_stat_statements.c+NOT+path%3Aguc.c
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Nathan Bossart writes:
> I think we should give this module a .gitignore file. Patch attached.

Indeed, before somebody accidentally commits the cruft that check-world is leaving around. Pushed.

			regards, tom lane
Re: ORDER BY pushdowns seem broken in postgres_fdw
Ronan Dunklau writes:
> [ v8-0001-Fix-orderby-handling-in-postgres_fdw.patch ]

I looked through this patch. It's going in the right direction, but I have a couple of nitpicks:

1. There are still some more places that aren't checking shippability of the relevant opfamily.

2. The existing usage of find_em_expr_for_rel is fundamentally broken, because that function will seize on the first EC member that is from the given rel, whether it's shippable or not. There might be another one later that is shippable, so this is just the wrong API. It's not like this function gives us any useful isolation from the details of ECs, because postgres_fdw is already looking into those elsewhere, notably in find_em_expr_for_input_target --- which has the same order-sensitivity bug. I think that instead of doubling down on a wrong API, we should just take that out and move the logic into postgres_fdw.c. This also has the advantage of producing a patch that's much safer to backpatch, because it doesn't rely on the core backend getting updated before postgres_fdw.so is.

So hacking on those two points, and doing some additional cleanup, led me to the attached v9. (In this patch, the removal of code from equivclass.c is only meant to be applied to HEAD; we have to leave the function in place in the back branches for API stability.)

If no objections, I think this is committable.
			regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bf12eac028..8f4d8a5022 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
@@ -183,6 +184,8 @@ static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 			   Index ignore_rel, List **ignore_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
+static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first,
+								deparse_expr_cxt *context);
 static void appendAggOrderBy(List *orderList, List *targetList,
 							 deparse_expr_cxt *context);
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
@@ -1038,6 +1041,33 @@ is_foreign_param(PlannerInfo *root,
 	return false;
 }
 
+/*
+ * Returns true if it's safe to push down the sort expression described by
+ * 'pathkey' to the foreign server.
+ */
+bool
+is_foreign_pathkey(PlannerInfo *root,
+				   RelOptInfo *baserel,
+				   PathKey *pathkey)
+{
+	EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+
+	/*
+	 * is_foreign_expr would detect volatile expressions as well, but checking
+	 * ec_has_volatile here saves some cycles.
+	 */
+	if (pathkey_ec->ec_has_volatile)
+		return false;
+
+	/* can't push down the sort if the pathkey's opfamily is not shippable */
+	if (!is_shippable(pathkey->pk_opfamily, OperatorFamilyRelationId, fpinfo))
+		return false;
+
+	/* can push if a suitable EC member exists */
+	return (find_em_for_rel(root, pathkey_ec, baserel) != NULL);
+}
+
 /*
  * Convert type OID + typmod info into a type name we can ship to the remote
  * server.  Someplace else had better have verified that this type name is
@@ -3445,44 +3475,59 @@ appendAggOrderBy(List *orderList, List *targetList, deparse_expr_cxt *context)
 	{
 		SortGroupClause *srt = (SortGroupClause *) lfirst(lc);
 		Node	   *sortexpr;
-		Oid			sortcoltype;
-		TypeCacheEntry *typentry;
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
 		first = false;
 
+		/* Deparse the sort expression proper. */
 		sortexpr = deparseSortGroupClause(srt->tleSortGroupRef, targetList,
 										  false, context);
-		sortcoltype = exprType(sortexpr);
-		/* See whether operator is default < or > for datatype */
-		typentry = lookup_type_cache(sortcoltype,
-									 TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
-		if (srt->sortop == typentry->lt_opr)
-			appendStringInfoString(buf, " ASC");
-		else if (srt->sortop == typentry->gt_opr)
-			appendStringInfoString(buf, " DESC");
-		else
-		{
-			HeapTuple	opertup;
-			Form_pg_operator operform;
-
-			appendStringInfoString(buf, " USING ");
-
-			/* Append operator name. */
-			opertup = SearchSysCache1(OPEROID, ObjectIdGetDatum(srt->sortop));
-			if (!HeapTupleIsValid(opertup))
-				elog(ERROR, "cache lookup failed for operator %u", srt->sortop);
-			operform = (Form_pg_operator) GETSTRUCT(opertup);
-			deparseOperatorName(buf, operform);
-			ReleaseSysCache(opertup);
-		}
+		/* Add decoration as needed. */
+		appendOrderBySuffix(srt->sortop, exprType(sortexpr), srt->nulls_first,
+							context);
+	}
+}
 
-		if (srt->nulls_first)
-			appendStringInfoString(buf, " NULLS
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Hi,

On 2022-03-29 11:55:05 -0400, Robert Haas wrote:
> I committed v6 instead.

I was just discussing the WAL prefetching patch with Thomas. A question in that discussion made me look at the coverage of REDO for CREATE DATABASE:
https://coverage.postgresql.org/src/backend/commands/dbcommands.c.gcov.html

Seems there's currently nothing hitting the REDO for XLOG_DBASE_CREATE_FILE_COPY (currently line 3019). I think it'd be good to keep coverage for that. How about adding a CREATE DATABASE ... STRATEGY file_copy to 001_stream_rep.pl?

Might be worth adding a test for ALTER DATABASE ... SET TABLESPACE at the same time; this patch did affect that path in some minor ways. And, somewhat shockingly, we don't have a single test for it.

- Andres
Re: Higher level questions around shared memory stats
Hi,

On 2022-03-29 12:17:27 -0700, Andres Freund wrote:
> Separate from the minutia in [1] I'd like to discuss a few questions of more
> general interest. I'll post another question or two later.

4) What to do with the stats_temp_directory GUC / PG_STAT_TMP_DIR define / pg_stats_temp directory?

With shared memory stats patch, the stats system itself doesn't need it anymore. But pg_stat_statements also uses PG_STAT_TMP_DIR to store pgss_query_texts.stat. That file can be fairly hot, so there's benefit in having something like stats_temp_directory.

I'm inclined to just leave the guc / define / directory around, with a note saying that it's just used by extensions?

Greetings,

Andres Freund
Re: POC: GROUP BY optimization
Pushed, after going through the patch once more, running check-world under valgrind, and updating the commit message. Thanks to everyone who reviewed/tested this patch! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pgsql: Add 'basebackup_to_shell' contrib module.
I think we should give this module a .gitignore file. Patch attached.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

From 7466f18b7cb781f4db4919faf15b1b1d3cd7bc7a Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 30 Mar 2022 15:28:38 -0700
Subject: [PATCH 1/1] add .gitignore for basebackup_to_shell

---
 contrib/basebackup_to_shell/.gitignore | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 contrib/basebackup_to_shell/.gitignore

diff --git a/contrib/basebackup_to_shell/.gitignore b/contrib/basebackup_to_shell/.gitignore
new file mode 100644
index 00..5dcb3ff972
--- /dev/null
+++ b/contrib/basebackup_to_shell/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
-- 
2.25.1
Re: pgsql: Add 'basebackup_to_shell' contrib module.
On Wed, Mar 30, 2022 at 4:22 PM Tom Lane wrote:
> Robert Haas writes:
> > I'm going to go ahead and commit this test script later on this
> > afternoon unless there are vigorous objections real soon now, and then
> > if somebody wants to improve it, great!
>
> I see you did that, but the CF entry is still open?

Fixed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: Correct docs re: rewriting indexes when table rewrite is skipped
On Wed, Mar 30, 2022 at 4:33 PM James Coleman wrote:
> Hmm, having it match the way it works makes sense. Would you feel
> comfortable with an intermediate step (queueing up that as a larger
> change) changing the clause to something like "indexes will still have
> to be rebuilt unless the system can guarantee that the sort order is
> proven to be unchanged" (with appropriate wordsmithing to be a bit
> less verbose if possible)?

Yeah, that seems fine. It's arguable how much detail we should go into here - but a statement of the form you propose is not misleading, and that's what seems most important to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: [Proposal] vacuumdb --schema only
I took a look at the v4 patch.

'git-apply' complains about whitespace errors:

0001-vacuumdb-schema-only-v4.patch:17: tab in indent.
0001-vacuumdb-schema-only-v4.patch:18: tab in indent.
0001-vacuumdb-schema-only-v4.patch:19: tab in indent.
0001-vacuumdb-schema-only-v4.patch:20: tab in indent.
-n
0001-vacuumdb-schema-only-v4.patch:21: tab in indent.
--schema
warning: squelched 13 whitespace errors
warning: 18 lines add whitespace errors.

+		printf(_("  -N, --exclude-schema=PATTERN    do NOT vacuum tables in the specified schema(s)\n"));

I'm personally -1 for the --exclude-schema option. I don't see any existing "exclude" options in vacuumdb, and the uses for such an option seem rather limited. If we can point to specific use-cases for this option, I might be willing to change my vote.

+
+    To clean all tables in the Foo and bar schemas
+    only in a database named xyzzy:
+
+$ vacuumdb --schema='"Foo"' --schema='bar' xyzzy
+

nitpicks: I think the phrasing should be "To only clean tables in the...". Also, is there any reason to use a schema name with a capital letter as an example? IMO that just adds unnecessary complexity to the example.

+$node->issues_sql_like(
+    [ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+    qr/VACUUM "Foo".*/,
+    'vacuumdb --schema schema only');

IIUC there should only be one table in the schema. Can we avoid matching ".*" and check for the exact command instead?

I think there should be a few more test cases. For example, we should test using -n and -N at the same time, and we should test what happens when those options are used for missing schemas.

+    /*
+     * When filtering on schema name, filter by table is not allowed.
+     * The schema name can already be set to a fqdn table name.
+     */
+    if (tbl_count && (schemas.head != NULL))
+    {
+        pg_log_error("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+        exit(1);
+    }

I think there might be some useful refactoring we can do that would simplify adding similar options in the future. Specifically, can we have a global variable that stores the type of vacuumdb command (e.g., all, tables, or schemas)? If so, perhaps the tables list could be renamed and reused for schemas (and any other objects that need listing in the future).

+    if (schemas != NULL && schemas->head != NULL)
+    {
+        appendPQExpBufferStr(_query,
+                             " AND c.relnamespace");
+        if (schema_is_exclude == TRI_YES)
+            appendPQExpBufferStr(_query,
+                                 " OPERATOR(pg_catalog.!=) ALL (ARRAY[");
+        else if (schema_is_exclude == TRI_NO)
+            appendPQExpBufferStr(_query,
+                                 " OPERATOR(pg_catalog.=) ANY (ARRAY[");
+
+        for (cell = schemas->head; cell != NULL; cell = cell->next)
+        {
+            appendStringLiteralConn(_query, cell->val, conn);
+
+            if (cell->next != NULL)
+                appendPQExpBufferStr(_query, ", ");
+        }
+
+        /* Finish formatting schema filter */
+        appendPQExpBufferStr(_query, "]::pg_catalog.regnamespace[])\n");
+    }

IMO we should use a CTE for specified schemas like we do for the specified tables. I wonder if we could even have a mostly-shared CTE code path for all vacuumdb commands with a list of names.

-    /*
-     * If no tables were listed, filter for the relevant relation types. If
-     * tables were given via --table, don't bother filtering by relation type.
-     * Instead, let the server decide whether a given relation can be
-     * processed in which case the user will know about it.
-     */
-    if (!tables_listed)
+    else
     {
+        /*
+         * If no tables were listed, filter for the relevant relation types. If
+         * tables were given via --table, don't bother filtering by relation type.
+         * Instead, let the server decide whether a given relation can be
+         * processed in which case the user will know about it.
+         */

nitpick: This change seems unnecessary.

I noticed upthread that there was some discussion around adding a way to specify a schema in VACUUM and ANALYZE commands. I think this patch is useful even if such an option is eventually added, as we'll still want vacuumdb to obtain the full list of tables to process so that it can effectively parallelize.
-- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Higher level questions around shared memory stats
Hi, On 2022-03-30 14:42:23 -0400, Robert Haas wrote: > On Tue, Mar 29, 2022 at 5:01 PM Andres Freund wrote: > > I think it's reasonably rare because in cases there'd be corruption, we'd > > typically not even have written them out / throw them away explicitly - we > > only read stats when starting without crash recovery. > > > > So the "expected" case of corruption afaicts solely is a OS crash just after > > the shutdown checkpoint completed? > > Can we prevent that case from occurring, so that there are no expected cases? We likely can, at least for the causes of corruption I know of. We already write the statsfile into a temporary filename and then rename into place. I think all we'd need to do is to use durable_rename() to make sure it's durable once renamed into place. It's really unrelated to the shared memory stats patch though, so I'd prefer not to tie it to that. > > I can think of these different times: > > > > - Last time stats were removed due to starting up in crash recovery > > - Last time stats were created from scratch, because no stats data file was > > present at startup > > - Last time stats were thrown away due to corruption > > - Last time a subset of stats were reset using one of the pg_reset* > > functions > > > > Makes sense? > > Yes. Possibly that last could be broken in to two: when all stats were > last reset, when some stats were last reset. Believe it or not, we don't currently have a function to reset all stats. We should definitely add that though, because the invocation to reset all stats gets more ridiculous^Wcomplicated with each release. 
I think the minimal invocation currently is something like:

-- reset all stats shared between databases
SELECT pg_stat_reset_shared('archiver');
SELECT pg_stat_reset_shared('bgwriter');
SELECT pg_stat_reset_shared('wal');
SELECT pg_stat_reset_replication_slot(NULL);
SELECT pg_stat_reset_slru(NULL);
SELECT pg_stat_reset_subscription_stats(NULL);
-- connect to each database and reset the stats in that database
SELECT pg_stat_reset();

I've protested against replication slot, slru, subscription stats not being resettable via pg_stat_reset_shared(), but nobody else seemed to care.

> > > Does redo update the stats?
> >
> > With "update" do you mean generate new stats? In the shared memory stats patch it triggers stats to be dropped; on HEAD it just resets all stats at startup.
> >
> > Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do.
>
> Well, I guess what I'm trying to figure out is what happens if we run in recovery for a long time -- say, a year -- and then get promoted. Do we have reasons to expect that the stats will be accurate enough to use at that point, or will they be way off?

What do you mean by 'accurate enough'?

With or without shared memory stats, pg_stat_all_tables.{n_mod_since_analyze, n_ins_since_vacuum, n_live_tup, n_dead_tup, ...} will be zero. The replay process doesn't update them. In contrast, things like pg_stat_all_tables.{seq_scan, seq_tup_read, idx_tup_fetch, ...} will be accurate, with one exception below. pg_stat_bgwriter, pg_stat_wal, etc. will always be accurate.

On HEAD, there may be a lot of dead stats for databases / tables / functions that have been dropped since the start of the cluster. They will eventually get removed, once autovacuum starts running in the respective database (i.e. pgstat_vacuum_stat() gets run).

The exception noted above is that because pg_stat_all_tables contents are never removed during recovery, it becomes a lot more plausible for oid conflicts to occur.
So the stats for two different tables might get added up accidentally - but that'll just affect the non-zero columns, of course. With the shared memory stats patch, stats for dropped objects (i.e. databases, tables, ...) are removed shortly after they have been dropped, so that conflict risk doesn't exist anymore.

So I don't think increasing inaccuracy is a reason to throw away stats on replica startup. Particularly because we already don't throw them away when promoting the replica, only when it was last started.

> I don't have a great understanding of how this all works, but if running recovery for a long time is going to lead to a situation where the stats progressively diverge from reality, then preserving them doesn't seem as valuable as if they're going to be more or less accurate.

Minus the oid wraparound risk on HEAD, the only way they increasingly diverge is that the '0' in a bunch of pg_stat_all_tables columns might get less and less accurate. But that's not the type of divergence you're talking about, I think.

Greetings,

Andres Freund
Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset
On Wed, Mar 30, 2022 at 1:39 PM Andres Freund wrote:
> Hi,
>
> On 2022-03-30 12:29:51 -0700, David G. Johnston wrote:
> > On Wednesday, March 30, 2022, Andres Freund wrote:
> > > My current proposal is to just have two reset times. One for the contents of pg_stat_database (i.e. not affected by pg_stat_reset_single_*_counters()), and one for stats within the entire database.
>
> > What IS it affected by? And does whatever affects it affect anything else?
>
> pg_stat_reset() resets the current database's stats. That includes the database's row in pg_stat_database and all table and function stats.

Right, so basically it updates both of the fields you are talking about. The existing stats_reset field is also updated upon calling pg_stat_reset_single_*_counters().

So when the two fields are different, we know that at least one relation or function statistic row is out-of-sync with the rest of the database; we just don't know which one(s). This is an improvement over the status quo, where the single timestamp cannot be trusted to mean anything useful. The DBA can execute pg_stat_reset() to get the statistics back into a common state.

As an added bonus, we will always have a reference timestamp for when the pg_stat_database record was last reset (as well as for any other statistic record that can only be reset by using pg_stat_reset).

David J.
Re: CLUSTER on partitioned index
On 2022-Feb-23, Justin Pryzby wrote: > I hope that Alvaro will comment on the simplified patches. If multiple people > think the patch isn't worth it, feel free to close it. But I don't see how > complexity could be the reason. I gave your patch a look and it seems a reasonable thing to do. Maybe not terribly useful in most cases, but there may be some cases for which it is. I found some part of it a bit repetitive, so I moved things around a bit. What do think about this? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was amazing when I first started using it at 7.2, and I'm continually astounded by learning new features and techniques made available by the continuing work of the development team." Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml index 86f5fdc469..b3463ae5c4 100644 --- a/doc/src/sgml/ref/cluster.sgml +++ b/doc/src/sgml/ref/cluster.sgml @@ -196,6 +196,12 @@ CLUSTER [VERBOSE] in the pg_stat_progress_cluster view. See for details. + + +Clustering a partitioned table clusters each of its partitions using the +partition of the specified partitioned index (which must be specified). 
+ + diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 02a7e94bf9..8417cbdb67 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -32,7 +32,9 @@ #include "catalog/index.h" #include "catalog/namespace.h" #include "catalog/objectaccess.h" +#include "catalog/partition.h" #include "catalog/pg_am.h" +#include "catalog/pg_inherits.h" #include "catalog/toasting.h" #include "commands/cluster.h" #include "commands/defrem.h" @@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, bool *pSwapToastByContent, TransactionId *pFreezeXid, MultiXactId *pCutoffMulti); static List *get_tables_to_cluster(MemoryContext cluster_context); +static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context, + Oid indexOid); +static void cluster_multiple_rels(List *rvs, ClusterParams *params); /*--- @@ -105,6 +110,10 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) ListCell *lc; ClusterParams params = {0}; bool verbose = false; + Relation rel = NULL; + Oid indexOid = InvalidOid; + MemoryContext cluster_context; + List *rtcs; /* Parse option list */ foreach(lc, stmt->params) @@ -126,11 +135,13 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) if (stmt->relation != NULL) { /* This is the single-relation case. */ - Oid tableOid, - indexOid = InvalidOid; - Relation rel; + Oid tableOid; - /* Find, lock, and check permissions on the table */ + /* + * Find, lock, and check permissions on the table. We obtain + * AccessExclusiveLock right away to avoid lock-upgrade hazard in the + * single-transaction case. + */ tableOid = RangeVarGetRelidExtended(stmt->relation, AccessExclusiveLock, 0, @@ -146,14 +157,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot cluster temporary tables of other sessions"))); - /* - * Reject clustering a partitioned table. 
- */ - if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot cluster a partitioned table"))); - if (stmt->indexname == NULL) { ListCell *index; @@ -188,71 +191,100 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) stmt->indexname, stmt->relation->relname))); } - /* close relation, keep lock till commit */ - table_close(rel, NoLock); + if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + { + /* close relation, keep lock till commit */ + table_close(rel, NoLock); - /* Do the job. */ - cluster_rel(tableOid, indexOid, ); + /* Do the job. */ + cluster_rel(tableOid, indexOid, ); + + return; + } + } + + /* + * By here, we know we are in a multi-table situation. In order to avoid + * holding locks for too long, we want to process each table in its own + * transaction. This forces us to disallow running inside a user + * transaction block. + */ + PreventInTransactionBlock(isTopLevel, "CLUSTER"); + + /* Also, we need a memory context to hold our list of relations */ + cluster_context = AllocSetContextCreate(PortalContext, + "Cluster", + ALLOCSET_DEFAULT_SIZES); + + /* + * Either we're processing a partitioned table, or we were not given any + * table name at all. In either case, obtain a list of relations to + * process. + * + * In
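The behavior described in the documentation hunk above can be exercised along these lines (a sketch with invented names, assuming the patch is applied):

```sql
CREATE TABLE meas (ts timestamptz, val int) PARTITION BY RANGE (ts);
CREATE TABLE meas_2022 PARTITION OF meas
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
CREATE INDEX meas_ts_idx ON meas (ts);

-- Clusters each partition using its partition of meas_ts_idx; per the
-- doc hunk, the index must be specified for a partitioned table:
CLUSTER meas USING meas_ts_idx;
```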
Re: Higher level questions around shared memory stats
Hi,

On 2022-03-30 21:44:20 +0200, Peter Eisentraut wrote:
> On 29.03.22 23:01, Andres Freund wrote:
> > > I think what's actually most important here is the error reporting. We need to make it clear, at least via log messages, that something bad has happened.
> > The message currently (on HEAD, but similarly in the patch) is:
> > ereport(pgStatRunningInCollector ? LOG : WARNING,
> >         (errmsg("corrupted statistics file \"%s\"",
> >                 statfile)));
> Corrupted how?

We can't parse it. Which can mean that it's truncated (we notice this because we expect an 'E' as the last byte), or that bits flipped in the wrong place (there are different bytes indicating the different types of stats). Corruption within individual stats isn't detected.

Note that this is very old code / behaviour, not meaningfully affected by the shared memory stats patch.

> How does it know? Is there a checksum, was the file the wrong length, what happened? I think this could use more detail.

I agree. But it's independent of the shared memory stats patch, so I don't want to tie improving it to that already huge patch.

Greetings,

Andres Freund
Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset
Hi, On 2022-03-30 12:29:51 -0700, David G. Johnston wrote: > On Wednesday, March 30, 2022, Andres Freund wrote: > > My current proposal is to just have two reset times. One for the contents > > of > > pg_stat_database (i.e. not affected by pg_stat_reset_single_*_counters()), > > and > > one for stats within the entire database. > What IS it affected by? And does whatever affects it affect anything else? pg_stat_reset() resets the current database's stats. That includes the database's row in pg_stat_database and all table and function stats. Greetings, Andres Freund
Re: Correct docs re: rewriting indexes when table rewrite is skipped
On Wed, Mar 30, 2022 at 11:41 AM Robert Haas wrote: > > On Wed, Mar 30, 2022 at 10:04 AM James Coleman wrote: > > Admittedly I hadn't thought of that case. But isn't it already covered > > in the existing docs by the phrase "or an unconstrained domain over > > the new type"? I don't love the word "or" there because there's a > > sense in which the first clause "binary coercible to the new type" is > > still accurate for your example unless you narrowly separate "domain" > > and "type", but I think that narrow distinction is what's technically > > there already. > > > > That being said, I could instead of removing the clause entirely > > replace it with something like "indexes may still need to be rebuilt > > when the new type is a constrained domain". > > You're talking about this as if the normal cases is that indexes don't > need to rebuilt, but sometimes they do. However, if I recall > correctly, the way the code is structured, it supposes that the > indexes do need a rebuild, and then tries to prove that they actually > don't. That disconnect makes me nervous, because it seems to me that > it could easily lead to a situation where our documentation contains > subtle errors. I think somebody should go through and study the > algorithm as it exists in the code, and then write documentation to > match it. And I think that when they do that, they should approach it > from the same point of view that the code does i.e. "these are the > conditions for skipping the index rebuild" rather than "these are the > conditions for performing an index rebuild." By doing it that way, I > think we minimize the likelihood of disconnects between code and > documentation, and also make it easier to update in future if the > algorithm gets changed. Hmm, having it match the way it works makes sense. 
Would you feel comfortable with an intermediate step (queueing that up as a larger change): changing the clause to something like "indexes will still have to be rebuilt unless the system can prove that the sort order is unchanged" (with appropriate wordsmithing to be a bit less verbose if possible)?

Thanks,
James Coleman
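To make the cases under discussion concrete (a sketch based on the documented rules, not on a fresh reading of the code):

```sql
CREATE TABLE t (v varchar(20));
CREATE INDEX t_v_idx ON t (v);

-- varchar -> text is binary coercible: no table rewrite, and the
-- system can prove the index doesn't need rebuilding either
ALTER TABLE t ALTER COLUMN v TYPE text;

-- a constrained domain over text is not binary coercible, so the
-- table (and its indexes) must be rewritten
CREATE DOMAIN short_text AS text CHECK (length(VALUE) <= 100);
ALTER TABLE t ALTER COLUMN v TYPE short_text;
```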
Re: refactoring basebackup.c (zstd workers)
On Wed, Mar 30, 2022 at 04:14:47PM -0400, Tom Lane wrote: > Robert Haas writes: > >> Maybe if I just put that last sentence into the comment it's clear enough? > > > Done that way, since I thought it was better to fix the bug than wait > > for more feedback on the wording. We can still adjust the wording, or > > the coding, if it's not clear enough. > > FWIW, I thought that explanation was fine, but I was deferring to > Justin who was the one who thought things were unclear. I still think it's unnecessarily confusing to nest "if" and "?:" conditionals in one statement, instead of 2 or 3 separate "if"s, or "||"s. But it's also not worth fussing over any more.
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Robert Haas writes: > I'm going to go ahead and commit this test script later on this > afternoon unless there are vigorous objections real soon now, and then > if somebody wants to improve it, great! I see you did that, but the CF entry is still open? regards, tom lane
Re: refactoring basebackup.c (zstd workers)
Robert Haas writes: >> Maybe if I just put that last sentence into the comment it's clear enough? > Done that way, since I thought it was better to fix the bug than wait > for more feedback on the wording. We can still adjust the wording, or > the coding, if it's not clear enough. FWIW, I thought that explanation was fine, but I was deferring to Justin who was the one who thought things were unclear. regards, tom lane
Re: refactoring basebackup.c (zstd workers)
On Tue, Mar 29, 2022 at 8:51 AM Robert Haas wrote: > On Mon, Mar 28, 2022 at 8:11 PM Tom Lane wrote: > > I suspect Robert wrote it that way intentionally --- but if so, > > I agree it could do with more than zero commentary. > > Well, the point is, we stop advancing kwend when we get to the end of > the keyword, and *vend when we get to the end of the value. If there's > a value, the end of the keyword can't have been the end of the string, > but the end of the value might have been. If there's no value, the end > of the keyword could be the end of the string. > > Maybe if I just put that last sentence into the comment it's clear enough? Done that way, since I thought it was better to fix the bug than wait for more feedback on the wording. We can still adjust the wording, or the coding, if it's not clear enough. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Higher level questions around shared memory stats
On 29.03.22 23:01, Andres Freund wrote:
> > I think what's actually most important here is the error reporting. We need to make it clear, at least via log messages, that something bad has happened.
> The message currently (on HEAD, but similarly in the patch) is:
> ereport(pgStatRunningInCollector ? LOG : WARNING,
>         (errmsg("corrupted statistics file \"%s\"",
>                 statfile)));

Corrupted how? How does it know? Is there a checksum, was the file the wrong length, what happened? I think this could use more detail.
Re: Commitfest Update
On 30.03.22 20:41, Greg Stark wrote:
> Patches that are Waiting on Author and haven't had activity in months -- traditionally they were set to Returned with Feedback. It seems the feeling these days is to not lose state on them and just move them to the next CF. I'm not sure that's wise; it ends up just filling up the list with patches nobody's working on.

If you set them to Returned with Feedback now, they can still be reawoken later by setting them to Needs Review and pulling them into the then-next commitfest. That preserves all the state and context.
Re: Possible fails in pg_stat_statements test
On Wed, Mar 30, 2022 at 2:20 AM Anton A. Melnikov wrote:
> > Can the test failures be encountered without such an elaborate setup? If not, then I don't really see why we need to do anything here?
>
> There was a real bug report from our test department. They do long-running repetitive tests and sometimes met this failure. So I suppose there is a non-zero probability that such an error can occur in a one-shot test as well. The sequence given in the first email helps to catch this failure quickly.

I don't think that the idea of "extra" WAL records is very principled. It's pretty vague what "extra" means, and your definition seems to be basically "whatever would be needed to make this test case pass." I think the problem is basically with the test case's idea that # of WAL records and # of table rows ought to be equal. I think that's just false. In general, we'd also have to worry about index insertions, which would provoke variable numbers of WAL records depending on whether they cause a page split. And we'd have to worry about TOAST table insertions, which could produce different numbers of records depending on the size of the data, the configured block size and TOAST threshold, and whether the TOAST table index incurs a page split. So even if we added a mechanism like what you propose here, we would only be fixing this particular test case, not creating infrastructure of any general utility.

If it's true that this test case sometimes randomly fails, then we ought to fix that somehow, maybe by just removing this particular check from the test case, or changing it to >=, or something like that. But I don't think adding a new counter is the right idea.

--
Robert Haas
EDB: http://www.enterprisedb.com
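The suggested ">=" relaxation could look roughly like this (a sketch of the idea, not the actual regression test; table and query names are invented):

```sql
-- tolerate index page splits, TOAST traffic, etc. producing extra
-- WAL records beyond one per inserted row:
SELECT calls, rows, wal_records >= rows AS wal_records_ok
FROM pg_stat_statements
WHERE query LIKE 'INSERT INTO pgss_test%';
```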
Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset
On Wednesday, March 30, 2022, Andres Freund wrote: > > My current proposal is to just have two reset times. One for the contents > of > pg_stat_database (i.e. not affected by pg_stat_reset_single_*_counters()), > and > one for stats within the entire database. > > What IS it affected by? And does whatever affects it affect anything else? David J.
Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset
Hi, On 2022-03-30 14:57:25 -0400, Robert Haas wrote: > On Wed, Mar 23, 2022 at 8:55 PM Andres Freund wrote: > > This behaviour can be trivially (and is) implemented for the shared memory > > stats patch. But every time I read over that part of the code it feels just > > profoundly wrong to me. Way worse than *not* resetting > > pg_stat_database.stats_reset. > > > > Anybody that uses the time since the stats reset as part of a calculation of > > transactions / sec, reads / sec or such will get completely bogus results > > after a call to pg_stat_reset_single_table_counters(). > > Sure, but that's unavoidable anyway. If some stats have been reset and > other stats have not, you can't calculate a meaningful average over > time unless you have a specific reset time for each statistic. Individual pg_stat_database columns can't be reset independently. Other views summarizing large parts of the system, like pg_stat_bgwriter, pg_stat_wal etc have a stats_reset column that is only reset if their counters is also reset. So the only reason we can't do that for pg_stat_database is that we don't know since when pg_stat_database counters are counting. > To me, the current behavior feels more correct than what you propose. > Imagine for example that you are looking for tables/indexes where the > counters are 0 as a way of finding unused objects. If you know that no > counters have been zeroed in a long time, you know that this is > reliable. But under your proposal, there's no way to know this. All > you know is that the entire system wasn't reset, and therefore some of > the 0s that you are seeing might be for individual objects that were > reset. My current proposal is to just have two reset times. One for the contents of pg_stat_database (i.e. not affected by pg_stat_reset_single_*_counters()), and one for stats within the entire database. > I think of this mechanism like as answering the question "When's the > last time anybody tinkered with this thing by hand?". 
If it's recent, > the tinkering has a good chance of being related to whatever problem > I'm trying to solve. If it's not, it's probably unrelated. When I look at a database with a problem, I'll often look at pg_stat_database to get a first impression of the type of workload running. The fact that stats_reset doesn't reflect the age of other pg_stat_database columns makes that much harder. Greetings, Andres Freund
Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))
On Sat, Feb 26, 2022 at 03:07:20PM -0600, Justin Pryzby wrote: > Rebased over ebf6c5249b7db525e59563fb149642665c88f747. > It looks like that patch handles only query_id, and this patch also tries to > handle a bunch of other stuff. > > If it's helpful, feel free to kick this patch to a future CF. Rebased over MERGE. >From 058bf205cbac68baa6ff539893d934115d9927f9 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 22 Feb 2020 21:17:10 -0600 Subject: [PATCH 1/6] Add GUC: explain_regress This changes the defaults for explain to: costs off, timing off, summary off. It'd be reasonable to use this for new regression tests which are not intended to be backpatched. --- contrib/postgres_fdw/postgres_fdw.c | 2 +- src/backend/commands/explain.c | 13 +++-- src/backend/utils/misc/guc.c| 13 + src/bin/psql/describe.c | 2 +- src/include/commands/explain.h | 2 ++ src/test/regress/expected/explain.out | 3 +++ src/test/regress/expected/inherit.out | 2 +- src/test/regress/expected/stats_ext.out | 2 +- src/test/regress/pg_regress.c | 3 ++- src/test/regress/sql/explain.sql| 4 src/test/regress/sql/inherit.sql| 2 +- src/test/regress/sql/stats_ext.sql | 2 +- 12 files changed, 41 insertions(+), 9 deletions(-) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 56654844e8f..54da40d6628 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -3124,7 +3124,7 @@ estimate_path_cost_size(PlannerInfo *root, * values, so don't request params_list. */ initStringInfo(); - appendStringInfoString(, "EXPLAIN "); + appendStringInfoString(, "EXPLAIN (COSTS)"); deparseSelectStmtForRel(, root, foreignrel, fdw_scan_tlist, remote_conds, pathkeys, fpextra ? 
fpextra->has_final_sort : false, diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index cb13227db1f..de107ea5026 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -154,6 +154,7 @@ static void ExplainJSONLineEnding(ExplainState *es); static void ExplainYAMLLineStarting(ExplainState *es); static void escape_yaml(StringInfo buf, const char *str); +bool explain_regress = false; /* GUC */ /* @@ -172,6 +173,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, ListCell *lc; bool timing_set = false; bool summary_set = false; + bool costs_set = false; /* Parse options list. */ foreach(lc, stmt->options) @@ -183,7 +185,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, else if (strcmp(opt->defname, "verbose") == 0) es->verbose = defGetBoolean(opt); else if (strcmp(opt->defname, "costs") == 0) + { + /* Need to keep track if it was explicitly set to ON */ + costs_set = true; es->costs = defGetBoolean(opt); + } else if (strcmp(opt->defname, "buffers") == 0) es->buffers = defGetBoolean(opt); else if (strcmp(opt->defname, "wal") == 0) @@ -227,13 +233,16 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, parser_errposition(pstate, opt->location))); } + /* if the costs option was not set explicitly, set default value */ + es->costs = (costs_set) ? es->costs : es->costs && !explain_regress; + if (es->wal && !es->analyze) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("EXPLAIN option WAL requires ANALYZE"))); /* if the timing was not set explicitly, set default value */ - es->timing = (timing_set) ? es->timing : es->analyze; + es->timing = (timing_set) ? 
es->timing : es->analyze && !explain_regress; /* check that timing is used with EXPLAIN ANALYZE */ if (es->timing && !es->analyze) @@ -242,7 +251,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, errmsg("EXPLAIN option TIMING requires ANALYZE"))); /* if the summary was not set explicitly, set default value */ - es->summary = (summary_set) ? es->summary : es->analyze; + es->summary = (summary_set) ? es->summary : es->analyze && !explain_regress; query = castNode(Query, stmt->query); if (IsQueryIdEnabled()) diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index eb3a03b9762..acd32fc229c 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -47,6 +47,7 @@ #include "catalog/pg_authid.h" #include "catalog/storage.h" #include "commands/async.h" +#include "commands/explain.h" #include "commands/prepare.h" #include "commands/tablespace.h" #include "commands/trigger.h" @@ -1478,6 +1479,18 @@ static struct config_bool ConfigureNamesBool[] = true, NULL, NULL, NULL }, + + { + {"explain_regress", PGC_USERSET, DEVELOPER_OPTIONS, + gettext_noop("Change defaults of EXPLAIN to avoid unstable output."), + NULL, + GUC_NOT_IN_SAMPLE | GUC_EXPLAIN + }, + _regress, + false, + NULL, NULL, NULL + }, + { {"log_parser_stats", PGC_SUSET, STATS_MONITORING,
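With the patch applied, the intended usage is simply this (a sketch; the GUC exists only with this patch):

```sql
SET explain_regress = on;

-- COSTS now defaults to OFF, and with ANALYZE, TIMING and SUMMARY
-- default to OFF as well, keeping regression-test output stable:
EXPLAIN (ANALYZE) SELECT 1;
```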
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Andres Freund writes: > On 2022-03-30 13:16:47 -0400, Robert Haas wrote: >> On Wed, Mar 30, 2022 at 1:11 PM Andres Freund wrote: >>> My concern is about the quote in the middle of the path, not about quoting >>> at all... I.e. the ' should be after /tmp_install, not before. >> Makes no difference. We know that the string /tmp_install contains no >> shell metacharacters, so why does it need to be in quotes? I would've >> probably written it the way it is here, rather than what you are >> proposing. > It looks ugly, and it can't be copy-pasted as easily. Seems I'm alone on this, > so I'll leave it be... FWIW, I agree with Andres that I'd probably have put the quote at the end. But Robert is right that it's functionally equivalent; so I doubt it's worth changing. regards, tom lane
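The two quote placements are functionally identical, as a quick shell check shows (paths invented): the shell concatenates adjacent quoted and unquoted segments, so where the closing quote sits doesn't change the resulting string.

```shell
dir='/build dir with spaces'
a="$dir"/tmp_install      # quote closed before the literal suffix
b="$dir/tmp_install"      # quote closed after it
[ "$a" = "$b" ] && echo equivalent
```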
Re: Patch to avoid orphaned dependencies
On Wed, Mar 23, 2022 at 12:49:06PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> Bertand, do you think this has any chance of making it into v15? If not, >> are you alright with adjusting the commitfest entry to v16 and moving it to >> the next commitfest? > > I looked this over briefly, and IMO it should have no chance of being > committed in anything like this form. I marked the commitfest entry as waiting-on-author, set the target version to v16, and moved it to the next commitfest. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Hi, On 2022-03-30 13:16:47 -0400, Robert Haas wrote: > On Wed, Mar 30, 2022 at 1:11 PM Andres Freund wrote: > > On March 30, 2022 9:58:26 AM PDT, Tom Lane wrote: > > >Andres Freund writes: > > >> Random aside: Am I the only one bothered by a bunch of places in > > >> Makefile.global.in quoting like > > >> $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install > > >> install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 > > >> and > > >> rm -rf '$(CURDIR)'/tmp_check && > > >> etc > > > > > >Don't we need that to handle, say, build paths with spaces in them? > > > > My concern is about the quote in the middle of the path, not about quoting > > at all... I.e. the ' should be after /tmp_install, not before. > > Makes no difference. We know that the string /tmp_install contains no > shell metacharacters, so why does it need to be in quotes? I would've > probably written it the way it is here, rather than what you are > proposing. It looks ugly, and it can't be copy-pasted as easily. Seems I'm alone on this, so I'll leave it be... Greetings, Andres Freund
Re: proposal: enhancing plpgsql debug API - returns text value of variable content
Greg Stark writes: > It looks like this is -- like a lot of plpgsql patches -- having > difficulty catching the attention of reviewers and committers. I was hoping that someone with more familiarity with pldebugger would comment on the suitableness of this patch for their desires. But nobody's stepped up, so I took a look through this. It looks like there are several different things mashed into this patch: 1. Expose exec_eval_datum() to plugins. OK; I see that pldebugger has code that duplicates that functionality (and not terribly well). 2. Expose do_cast_value() to plugins. Mostly OK, but shouldn't we expose exec_cast_value() instead? Otherwise it's on the caller to make sure it doesn't ask for a no-op cast, which seems like a bad idea; not least because the example usage in get_string_value() fails to do so. 3. Store relevant PLpgSQL_nsitem chain link in each PLpgSQL_stmt. This makes me itch, for a number of reasons: * I was a bit astonished that it even works; I'd thought that the nsitem data structure is transient data thrown away when we finish compiling. I see now that that's not so, but do we really want to nail down that that can't ever be improved? * This ties us forevermore to the present, very inefficient, nsitem list data structure. Sooner or later somebody is going to want to improve that linear search, and what then? * The space overhead seems nontrivial; many PLpgSQL_stmt nodes are not very big. * The code implications are way more subtle than you would think from inspecting this totally-comment-free patch implementation. In particular, the fact that the nsitem chain pointed to by a plpgsql_block is the right thing depends heavily on exactly where in the parse sequence we capture the value of plpgsql_ns_top(). That could be improved with a comment, perhaps. I think that using the PLpgSQL_nsitem chains to look up variables in a debugger is just the wrong thing. 
The right thing is to crawl up the statement tree, and when you see a PLpgSQL_stmt_block or loop construct, examine the associated datums. I'll concede that crawling *up* the tree is hard, as we only store down-links. Now a plugin could fix that by itself, by recursively traversing the statement tree one time and recording parent relationships in its own data structure (say, an array of parent-statement pointers indexed by stmtid). Or we could add parent links in the statement tree, though I remain concerned about the space cost. On the whole I prefer the first way, because (a) we don't pay the overhead when it's not needed, and (b) a plugin could use it even in existing release branches. BTW, crawling up the statement tree would also be a far better answer than what's shown in the patch for locating surrounding for-loops. So my inclination is to accept the additional function pointers (modulo pointing to exec_cast_value) but reject the nsitem additions. Not sure what to do with test_dbgapi. There's some value in exercising the find_rendezvous_variable mechanism, but I'm dubious that that justifies a whole test module. regards, tom lane
Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset
On Wed, Mar 23, 2022 at 8:55 PM Andres Freund wrote:
> This behaviour can be trivially (and is) implemented for the shared memory
> stats patch. But every time I read over that part of the code it feels just
> profoundly wrong to me. Way worse than *not* resetting
> pg_stat_database.stats_reset.
>
> Anybody that uses the time since the stats reset as part of a calculation of
> transactions / sec, reads / sec or such will get completely bogus results
> after a call to pg_stat_reset_single_table_counters().

Sure, but that's unavoidable anyway. If some stats have been reset and other stats have not, you can't calculate a meaningful average over time unless you have a specific reset time for each statistic. To me, the current behavior feels more correct than what you propose. Imagine for example that you are looking for tables/indexes where the counters are 0 as a way of finding unused objects. If you know that no counters have been zeroed in a long time, you know that this is reliable. But under your proposal, there's no way to know this. All you know is that the entire system wasn't reset, and therefore some of the 0s that you are seeing might be for individual objects that were reset.

I think of this mechanism as answering the question "When's the last time anybody tinkered with this thing by hand?". If it's recent, the tinkering has a good chance of being related to whatever problem I'm trying to solve. If it's not, it's probably unrelated.

-- Robert Haas EDB: http://www.enterprisedb.com
Re: Avoiding smgrimmedsync() during nbtree index builds
It looks like this patch received a review from Andres and hasn't been updated since. I'm not sure, but the review reads to me like the patch is not ready to commit and needs some cleanup, or at least a check on a few things. I guess it's not going to get bumped in the next few days, so I'll move it to the next CF.
Re: Commitfest Update
On Wed, Mar 30, 2022 at 2:42 PM Greg Stark wrote: > Patches that are Waiting on Author and haven't had activity in months > -- traditionally they were set to Returned with Feedback. It seems the > feeling these days is to not lose state on them and just move them to > the next CF. I'm not sure that's wise, it ends up just filling up the > list with patches nobody's working on. Yes, we should mark those Returned with Feedback or some other status that causes them not to get carried forward. The CF is full of stuff that isn't likely to get committed any time in the foreseeable future, and that's really unhelpful. -- Robert Haas EDB: http://www.enterprisedb.com
Re: range_agg with multirange inputs
This patch has been committed. I split it into a few pieces. On 12.03.22 04:18, Paul Jungwirth wrote: On 3/10/22 14:07, Chapman Flack wrote: When I apply this patch, I get a func.sgml with two entries for range_intersect_agg(anymultirange). Arg, fixed. In range_agg_transfn, you've changed the message in the "must be called with a range or multirange"; that seems like another good candidate to be an elog. Agreed. Updated here. I kept those messages as "range" or "multirange" separately, instead of "range or multirange". This way, we don't have to update all the messages of this kind when a new function is added. Since these are only internal messages anyway, I opted for higher maintainability.
Re: Higher level questions around shared memory stats
On Tue, Mar 29, 2022 at 5:01 PM Andres Freund wrote:
> I think it's reasonably rare because in cases where there'd be corruption,
> we'd typically not even have written them out / throw them away explicitly -
> we only read stats when starting without crash recovery.
>
> So the "expected" case of corruption afaicts solely is an OS crash just after
> the shutdown checkpoint completed?

Can we prevent that case from occurring, so that there are no expected cases?

> > And maybe we should have, inside the stats system, something that
> > keeps track of when the stats file was last recreated from scratch because
> > of a corruption event, separately from when it was last intentionally reset.
>
> That would be easy to add. Don't think we have a good place to show the
> information right now - perhaps just new functions not part of any view?

I defer to you on where to put it.

> I can think of these different times:
>
> - Last time stats were removed due to starting up in crash recovery
> - Last time stats were created from scratch, because no stats data file was
>   present at startup
> - Last time stats were thrown away due to corruption
> - Last time a subset of stats were reset using one of the pg_reset* functions
>
> Makes sense?

Yes. Possibly that last could be broken into two: when all stats were last reset, when some stats were last reset.

> > Does redo update the stats?
>
> With "update" do you mean generate new stats? In the shared memory stats
> patch it triggers stats to be dropped, on HEAD it just resets all stats at
> startup.
>
> Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do.

Well, I guess what I'm trying to figure out is what happens if we run in recovery for a long time -- say, a year -- and then get promoted. Do we have reasons to expect that the stats will be accurate enough to use at that point, or will they be way off?
I don't have a great understanding of how this all works, but if running recovery for a long time is going to lead to a situation where the stats progressively diverge from reality, then preserving them doesn't seem as valuable as if they're going to be more or less accurate. -- Robert Haas EDB: http://www.enterprisedb.com
Commitfest Update
Since the previous update 30 additional patches have been committed from the CF. This leaves us with 120 Needs Review and 20 Ready for Committer. There's only a few days left until the end of the month.

* Add comment about startup process getting a free procState array slot always
* Consistent use of SSL/TLS in docs
* Allow COPY "text" to output a header and add header matching mode to COPY FROM
* Enable SSL library detection via PQsslAttribute
* Fully WAL logged CREATE DATABASE - No Checkpoints
* Skipping logical replication transactions on subscriber side
* Add system view pg_ident_file_mappings
* document the need to analyze partitioned tables
* Expose get_query_def()
* JSON path numeric literal syntax
* use has_privs_for_role for predefined roles
* Support for MERGE
* Column filtering in logical replication
* Corruption during WAL replay
* Doc patch for retryable xacts
* pg_statio_all_tables: several rows per table due to invalid TOAST index
* Parameter for planner estimates of recursive queries
* logical decoding and replication of sequences
* Add relation and block-level filtering to pg_waldump
* pgbench - allow retries on some errors
* pgcrypto: Remove internal padding implementation
* Preserving db/ts/relfilenode OIDs across pg_upgrade
* Add new reloption to views for enabling row level security
* Fix handling of outer GroupingFunc within subqueries
* Fix concurrent deadlock scenario with DROP INDEX on partitioned index
* Fix bogus dependency management for GENERATED expressions
* Fix firing of RI triggers during cross-partition updates of partitioned tables
* Add fix to table_to_xmlschema regex when timestamp has time zone
* ltree_gist indexes broken after pg_upgrade from 12
* ExecTypeSetColNames is fundamentally broken

I'm going to start moving any patches that are Waiting on Author to the next CF if they made any progress recently.
Patches that are Waiting on Author and haven't had activity in months -- traditionally they were set to Returned with Feedback. It seems the feeling these days is to not lose state on them and just move them to the next CF. I'm not sure that's wise, it ends up just filling up the list with patches nobody's working on. -- greg
Re: Mark all GUC variable as PGDLLIMPORT
On Fri, Feb 18, 2022 at 7:02 PM Andres Freund wrote: > On 2022-02-15 08:06:58 -0800, Andres Freund wrote: > > The more I think about it the more I'm convinced that if we want to do this, > > we should do it for variables and functions. > > Btw, if we were to do this, we should just use -fvisibility=hidden everywhere > and would see the same set of failures on unixoid systems as on windows. Of > course only in in-core extensions, but it'd still be better than nothing. Let's be less ambitious for this release, and just get the variables marked with PGDLLIMPORT. We seem to have consensus to create parity between Windows and non-Windows builds, which means precisely applying PGDLLIMPORT to variables marked in header files, and nothing more. The merits of -fvisibility=hidden or PGDLLIMPORT on functions are a separate question that can be debated on its own merits, but I don't want that larger discussion to bog down this effort. Here are updated patches for that. @RMT: Andres proposed upthread that we should plan to do this just after feature freeze. Accordingly I propose to commit at least 0002 and perhaps 0001 if people want it just after feature freeze. I therefore ask that the RMT either (a) regard this change as not being a feature (and thus not subject to the freeze) or (b) give it a 1-day extension. The reason for committing it just after freeze is to minimize the number of conflicts that it creates for other patches. The reason why that's probably an OK thing to do is that applying PGDLLIMPORT markings is low-risk. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com v2-0001-Dumb-script-to-apply-PGDLLIMPORT-markings.patch Description: Binary data v2-0002-Plaster-PGDLLIMPORT-declarations-on-header-files.patch Description: Binary data
Re: Logical insert/update/delete WAL records for custom table AMs
On Thu, 2022-02-24 at 20:35 +, Simon Riggs wrote:
> The approach is perfectly fine, it just needs to be finished and
> rebased.

Attached a new version. The scope expanded, so this is likely to slip v15 at this late time. For 15, I'll focus on my extensible rmgr work, which can serve similar purposes.

The main purpose of this patch is to be able to emit logical events for a table (insert/update/delete/truncate) without actually modifying a table or relying on the heap at all. That allows table AMs to support logical decoding/replication, and perhaps allows a few other interesting use cases (maybe foreign tables?).

There are really two advantages of this approach over relying on a custom rmgr:

1. Easier for extension authors
2. Doesn't require an extension module to be loaded to start the server

Those are very nice advantages, but they come at the price of flexibility and performance. I think there's room for both, and we can discuss the merits individually.

Changes:

* Support logical messages for INSERT/UPDATE/DELETE/TRUNCATE (before it only supported INSERT)
* SQL functions pg_logical_emit_insert/update/delete/truncate (callable by superuser)
* Tests (using test_decoding)
* Use replica identities properly
* In general a lot of cleanup, but still not quite ready

TODO:

* Should SQL functions be callable by the table owner? I would lean toward superuser-only, because logical replication is used for administrative purposes like upgrades, and I don't think we want table owners to be able to create inconsistencies.
* Support for multi-insert
* Docs for SQL functions, and maybe docs in the section on Generic WAL
* Try to get away from reliance on heap tuples specifically

Regards, Jeff Davis

From f7b85aea60b06eb7019befec38566b5e014bea12 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Sat, 30 Oct 2021 12:07:35 -0700
Subject: [PATCH] Logical wal.
---
 contrib/test_decoding/expected/messages.out   | 148 +++
 contrib/test_decoding/sql/messages.sql        |  58 +++
 src/backend/access/heap/heapam.c              |   4 +-
 src/backend/access/rmgrdesc/Makefile          |   2 +-
 .../{logicalmsgdesc.c => logicaldesc.c}       |  45 +-
 src/backend/access/transam/rmgr.c             |   2 +-
 src/backend/replication/logical/Makefile      |   2 +-
 src/backend/replication/logical/decode.c      | 275 ++--
 .../replication/logical/logical_xlog.c        | 399 ++
 .../replication/logical/logicalfuncs.c        | 165 +++-
 src/backend/replication/logical/message.c     |  89
 src/bin/pg_waldump/.gitignore                 |   2 +-
 src/bin/pg_waldump/rmgrdesc.c                 |   2 +-
 src/include/access/heapam.h                   |   2 +
 src/include/access/heapam_xlog.h              |   3 +
 src/include/access/rmgrlist.h                 |   2 +-
 src/include/catalog/pg_proc.dat               |  17 +
 src/include/replication/decode.h              |   2 +-
 src/include/replication/logical_xlog.h        | 124 ++
 src/include/replication/message.h             |  41 --
 20 files changed, 1211 insertions(+), 173 deletions(-)
 rename src/backend/access/rmgrdesc/{logicalmsgdesc.c => logicaldesc.c} (59%)
 create mode 100644 src/backend/replication/logical/logical_xlog.c
 delete mode 100644 src/backend/replication/logical/message.c
 create mode 100644 src/include/replication/logical_xlog.h
 delete mode 100644 src/include/replication/message.h

diff --git a/contrib/test_decoding/expected/messages.out b/contrib/test_decoding/expected/messages.out
index c75d40190b6..aa284bc37c2 100644
--- a/contrib/test_decoding/expected/messages.out
+++ b/contrib/test_decoding/expected/messages.out
@@ -91,6 +91,154 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'for
 --
 (0 rows)

+-- no data in this table, but emit logical INSERT/UPDATE/DELETE for it
+CREATE TABLE dummy(i int, t text, n numeric, primary key(t));
+SELECT pg_logical_emit_insert('dummy', row(1, 'one', 0.1)::dummy) <> '0/0'::pg_lsn;
+ ?column?
+--
+ t
+(1 row)
+
+SELECT pg_logical_emit_insert('dummy', row(2, 'two', 0.2)::dummy) <> '0/0'::pg_lsn;
+ ?column?
+--
+ t
+(1 row)
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'force-binary', '0', 'skip-empty-xacts', '1', 'include-xids', '0');
+ data
+---
+ BEGIN
+ table public.dummy: INSERT: i[integer]:1 t[text]:'one' n[numeric]:0.1
+ COMMIT
+ BEGIN
+ table public.dummy: INSERT: i[integer]:2 t[text]:'two' n[numeric]:0.2
+ COMMIT
+(6 rows)
+
+SELECT * FROM dummy;
+ i | t | n
+---+---+---
+(0 rows)
+
+SELECT pg_logical_emit_delete('dummy', row(12, 'twelve', 0.12)::dummy) <> '0/0'::pg_lsn;
+ ?column?
+--
+ t
+(1 row)
+
+SELECT pg_logical_emit_update('dummy', row(15, 'fifteen', 0.15)::dummy,
+ row(16, 'sixteen', 0.16)::dummy) <>
Re: pgsql: Add 'basebackup_to_shell' contrib module.
On Thu, Mar 31, 2022 at 5:23 AM Andres Freund wrote: > On 2022-03-26 13:51:35 -0700, Andres Freund wrote: > > Prototype patch attached. > > Because I forgot about cfbot when attaching the patch, cfbot actually ran with > it under this thread's cf entry. It does look like an improvement to me: > https://cirrus-ci.com/task/6397292629458944?logs=test_world#L900 > > We certainly can do better, but it's sufficiently better than what we have > right now. So I'd like to commit it? Nice, this will save a lot of time scrolling around trying to figure out what broke. > > Would we want to do this in all branches? I'd vote for yes, but ... > > Unless somebody speaks in favor of doing this across branches, I'd just go for > HEAD. I don't see any reason not to do it on all branches. If anyone is machine-processing the output and cares about format changes they will be happy about the improvement.
Re: Add proper planner support for ORDER BY / DISTINCT aggregates
This patch is now failing in the sqljson regression test. It looks like it's just the ordering of the elements in json_arrayagg() calls which may actually be a faulty test that needs more ORDER BY clauses rather than any issues with the code. Nonetheless it's something that needs to be addressed before this patch could be applied. Given it's gotten some feedback from Ronan and this regression test failure I'll move it to Waiting on Author but we're near the end of the CF and it'll probably be moved forward soon. On Thu, 4 Nov 2021 at 04:00, Ronan Dunklau wrote: > > Le jeudi 22 juillet 2021, 10:42:49 CET Ronan Dunklau a écrit : > > Le jeudi 22 juillet 2021, 09:38:50 CEST David Rowley a écrit : > > > On Thu, 22 Jul 2021 at 02:01, Ronan Dunklau > wrote: > > > > I tested the 0001 patch against both HEAD and my proposed bugfix for > > > > postgres_fdw. > > > > > > > > There is a problem that the ordered aggregate is not pushed down > > > > anymore. > > > > The underlying Sort node is correctly pushed down though. > > > > > > > > This comes from the fact that postgres_fdw grouping path never contains > > > > any > > > > pathkey. Since the cost is fuzzily the same between the pushed-down > > > > aggregate and the locally performed one, the tie is broken against the > > > > pathkeys. > > > > > > I think this might be more down to a lack of any penalty cost for > > > fetching foreign tuples. Looking at create_foreignscan_path(), I don't > > > see anything that adds anything to account for fetching the tuples > > > from the foreign server. If there was something like that then there'd > > > be more of a preference to perform the remote aggregation so that > > > fewer rows must arrive from the remote server. > > > > > > I tested by adding: total_cost += cpu_tuple_cost * rows * 100; in > > > create_foreignscan_path() and I got the plan with the remote > > > aggregation. That's a fairly large penalty of 1.0 per row. Much bigger > > > than parallel_tuple_cost's default value. 
> > > > > > I'm a bit undecided on how much this patch needs to get involved in > > > adjusting foreign scan costs. The problem is that we've given the > > > executor a new path to consider and nobody has done any proper > > > costings for the foreign scan so that it properly prefers paths that > > > have to pull fewer foreign tuples. This is a pretty similar problem > > > to what parallel_tuple_cost aims to fix. Also similar to how we had to > > > add APPEND_CPU_COST_MULTIPLIER to have partition-wise aggregates > > > prefer grouping at the partition level rather than at the partitioned > > > table level. > > > > > > > Ideally we would add the group pathkeys to the grouping path, but this > > > > would add an additional ORDER BY expression matching the GROUP BY. > > > > Moreover, some triaging of the pathkeys would be necessary since we now > > > > mix the sort-in- aggref pathkeys with the group_pathkeys. > > > > > > I think you're talking about passing pathkeys into > > > create_foreign_upper_path in add_foreign_grouping_paths. If so, I > > > don't really see how it would be safe to add pathkeys to the foreign > > > grouping path. What if the foreign server did a Hash Aggregate? The > > > rows might come back in any random order. > > > > Yes, I was suggesting to add a new path with the pathkeys factored in, which > > if chosen over the non-ordered path would result in an additional ORDER BY > > clause to prevent a HashAggregate. But that doesn't seem a good idea after > > all. > > > > > I kinda think that to fix this properly would need a new foreign > > > server option such as foreign_tuple_cost. I'd feel better about > > > something like that with some of the people with a vested interest in > > > the FDW code were watching more closely. So far we've not managed to > > > entice any of them with the bug report yet, but it's maybe early days > > > yet. > > > > We already have that in the form of fdw_tuple_cost as a server option if I'm > > not mistaken ? 
It works as expected when the number of tuples is notably > > reduced by the foreign group by. > > > > The problem arise when the cardinality of the groups is equal to the input's > > cardinality. I think even in that case we should try to use a remote > > aggregate since it's a computation that will not happen on the local > > server. I also think we're more likely to have up to date statistics > > remotely than the ones collected locally on the foreign tables, and the > > estimated number of groups would be more accurate on the remote side than > > the local one. > > I took some time to toy with this again. > > At first I thought that charging a discount in foreign grouping paths for > Aggref targets (since they are computed remotely) would be a good idea, > similar to what is done for the grouping keys. > > But in the end, it's probably not something we would like to do. Yes, the > group planning will be more accurate on the remote side generally (better > statistics than locally for
Re: real/float example for testlibpq3
On Mon, 28 Feb 2022 at 17:50, Tom Lane wrote: > > Chapman Flack writes: > > In the current state of affairs, what's considered the ur-source of that > > information? > > The source code for the type's send/receive functions :-(. One could > wish for something better, but no one has stepped up to produce such > documentation. Fwiw the client library I heard of attempting to have good binary mode support was the Crystal language client https://github.com/will/crystal-pg. I think he was aiming for full coverage of the built-in data types. That might make a good reference implementation to write up documentation from. He probably uncovered some corner cases in development that one might not find from just inspection of the server code. -- greg
Re: pgsql: Add 'basebackup_to_shell' contrib module.
On Wed, Mar 30, 2022 at 1:11 PM Andres Freund wrote: > On March 30, 2022 9:58:26 AM PDT, Tom Lane wrote: > >Andres Freund writes: > >> Random aside: Am I the only one bothered by a bunch of places in > >> Makefile.global.in quoting like > >> $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install > >> install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 > >> and > >> rm -rf '$(CURDIR)'/tmp_check && > >> etc > > > >Don't we need that to handle, say, build paths with spaces in them? > > My concern is about the quote in the middle of the path, not about quoting at > all... I.e. the ' should be after /tmp_install, not before. Makes no difference. We know that the string /tmp_install contains no shell metacharacters, so why does it need to be in quotes? I would've probably written it the way it is here, rather than what you are proposing. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Hi, On March 30, 2022 9:58:26 AM PDT, Tom Lane wrote: >Andres Freund writes: >> Random aside: Am I the only one bothered by a bunch of places in >> Makefile.global.in quoting like >> $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install >> install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 >> and >> rm -rf '$(CURDIR)'/tmp_check && >> etc > >Don't we need that to handle, say, build paths with spaces in them? My concern is about the quote in the middle of the path, not about quoting at all... I.e. the ' should be after /tmp_install, not before. >Admittedly we're probably not completely clean for such paths, >but that's not an excuse to break the places that do it right. > >(I occasionally think about setting up a BF animal configured >like that, but haven't tried yet.) That might be a fun exercise. Not so much for the build aspect, but to make sure our tools handle it. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: pgsql: Add 'basebackup_to_shell' contrib module.
On Wed, Mar 30, 2022 at 12:54 PM Andres Freund wrote:
> There are some commandline utilities (including copy) where backward slashes
> in arguments are necessary, to separate options from paths :/. Those are the
> extent of backslash use in Cluster.pm that I could see quickly.

I just copied this logic from that file:

$path =~ s{\\}{}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
  $PostgreSQL::Test::Utils::windows_os
  ? qq{copy "$path%f" "%p"}
  : qq{cp "$path/%f" "%p"};

In the first version of the patch I neglected the first of those lines and it broke, so the s{\\}{}g thing is definitely needed. It's possible that / would be as good as \ in the command text itself, but it doesn't seem worth getting excited about. It'd be best if any unnecessary garbage of this sort got cleaned up by someone who has a test environment locally, rather than me testing by sending emails to a mailing list, which Thomas then downloads into a sandbox and executes, after which you send me links on the mailing list to what broke, and I try again.

> Fair enough. I found it a bit grating to read in the test, that's why I
> mentioned it...

I'm going to go ahead and commit this test script later on this afternoon unless there are vigorous objections real soon now, and then if somebody wants to improve it, great!

-- Robert Haas EDB: http://www.enterprisedb.com
Re: multithreaded zstd backup compression for client and server
Robert Haas writes: > On Wed, Mar 30, 2022 at 8:00 AM Dagfinn Ilmari Mannsåker > wrote: >> Robert Haas writes: >> > This patch contains a trivial adjustment to >> > PostgreSQL::Test::Cluster::run_log to make it return a useful value >> > instead of not. I think that should be pulled out and committed >> > independently regardless of what happens to this patch overall, and >> > possibly back-patched. >> >> run_log() is far from the only such method in PostgreSQL::Test::Cluster. >> Here's a patch that gives the same treatment to all the methods that >> just pass through to the corresponding PostgreSQL::Test::Utils function. >> >> Also attached is a fix a typo in the _get_env doc comment that I noticed >> while auditing the return values. > > I suggest posting these patches on a new thread with a subject line > that matches what the patches do, and adding it to the next > CommitFest. Will do. > It seems like a reasonable thing to do on first glance, but I wouldn't > want to commit it without going through and figuring out whether > there's any risk of anything breaking, and it doesn't seem like > there's a strong need to do it in v15 rather than v16. Given that the methods don't currently have a useful return value (undef or the empty list, depending on context), I don't expect anything to be relying on it (and it passed check-world with --enable-tap-tests and all the --with-foo flags I could easily get to work), but I can grep the code as well to be extra sure. - ilmari
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Andres Freund writes: > On 2022-03-30 12:34:34 -0400, Tom Lane wrote: >> One refinement that comes to mind as I look at the patch is to distinguish >> between "check" and "installcheck". Not sure that's worthwhile, but not >> sure it isn't, either. > As it's just about "free" to do so, I see no reason not to go for showing that > difference. How about: > echo "+++ (tap|regress|isolation) [install-]check in $(subdir) +++" && \ WFM. > I see no reason to distinguish the PGXS / non-PGXs tap installcheck cases? Agreed. > Random aside: Am I the only one bothered by a bunch of places in > Makefile.global.in quoting like > $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install > install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 > and > rm -rf '$(CURDIR)'/tmp_check && > etc Don't we need that to handle, say, build paths with spaces in them? Admittedly we're probably not completely clean for such paths, but that's not an excuse to break the places that do it right. (I occasionally think about setting up a BF animal configured like that, but haven't tried yet.) regards, tom lane
Re: basebackup/lz4 crash
On Wed, Mar 30, 2022 at 10:35 AM Justin Pryzby wrote: > It looks like maybe this code was copied from > bbstreamer_lz4_compressor_finalize() which has an Assert rather than expanding > the buffer like in bbstreamer_lz4_compressor_new() Hmm, this isn't great. On the server side, we set up a chain of bbsink buffers that can't be resized later. Each bbsink tells the next bbsink how to make its buffer, but the successor bbsink has control of that buffer and resizing it on-the-fly is not allowed. It looks like bbstreamer_lz4_compressor_new() is mimicking that logic, but not well. It sets the initial buffer size to LZ4F_compressBound(streamer->base.bbs_buffer.maxlen, prefs), but streamer->base.bbs_buffer.maxlen is not any kind of promise from the caller about future chunk sizes. It's just whatever initStringInfo() happens to do. My guess is that Dipesh thought that the buffer wouldn't need to be resized because "we made it big enough already" but that's not the case. The server knows how much data it is going to read from disk at a time, but the client has to deal with whatever the server sends. I think your proposed change is OK, modulo some comments. But I think maybe we ought to delete all the stuff related to compressed_bound from bbstreamer_lz4_compressor_new() as well, because I don't see that there's any point. And then I think we should also add logic similar to what you've added here to bbstreamer_lz4_compressor_finalize(), so that we're not making the assumption that the buffer will get enlarged at some earlier point. Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Hi, On 2022-03-30 12:42:50 -0400, Robert Haas wrote: > On Wed, Mar 30, 2022 at 12:30 PM Andres Freund wrote: > > # Reconfigure to restrict access and require a detail. > > $shell_command = > > $PostgreSQL::Test::Utils::windows_os > > ? qq{$gzip --fast > "$escaped_backup_path%d.%f.gz"} > > : qq{$gzip --fast > "$escaped_backup_path/%d.%f.gz"}; > > > > I don't think the branch is needed anymore, forward slashes should work for > > output redirection. > > We have similar things in src/test/perl/PostgreSQL/Test/Cluster.pm. There are some commandline utilities (including copy) where backward slashes in arguments are necessary, to separate options from paths :/. Those are the extent of backslash use in Cluster.pm that I could see quickly. > I'm not sure it's the place of this patch to introduce a mix of styles. Fair enough. I found it a bit grating to read in the test, that's why I mentioned it... Greetings, Andres Freund
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Hi, On 2022-03-30 12:34:34 -0400, Tom Lane wrote: > Andres Freund writes: > > Unless somebody speaks in favor of doing this across branches, I'd just go > > for > > HEAD. > > +1 for HEAD only, especially if we think we might change it some more > later. It seems possible this might break somebody's tooling if we > drop it into minor releases. Yea. I certainly have written scripts that parse check-world output - they didn't break, but... > One refinement that comes to mind as I look at the patch is to distinguish > between "check" and "installcheck". Not sure that's worthwhile, but not > sure it isn't, either. As it's just about "free" to do so, I see no reason not to go for showing that difference. How about: echo "+++ (tap|regress|isolation) [install-]check in $(subdir) +++" && \ I see no reason to distinguish the PGXS / non-PGXs tap installcheck cases? Random aside: Am I the only one bothered by a bunch of places in Makefile.global.in quoting like $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 and rm -rf '$(CURDIR)'/tmp_check && etc yielding commands like: make -C '.' DESTDIR='/home/andres/build/postgres/dev-assert/vpath'/tmp_install install >'/home/andres/build/postgres/dev-assert/vpath'/tmp_install/log/install.log 2>&1 and rm -rf '/home/andres/build/postgres/dev-assert/vpath/contrib/test_decoding'/tmp_check & Greetings, Andres Freund
Re: pgsql: Add 'basebackup_to_shell' contrib module.
On Wed, Mar 30, 2022 at 12:30 PM Andres Freund wrote: > # Reconfigure to restrict access and require a detail. > $shell_command = > $PostgreSQL::Test::Utils::windows_os > ? qq{$gzip --fast > "$escaped_backup_path%d.%f.gz"} > : qq{$gzip --fast > "$escaped_backup_path/%d.%f.gz"}; > > I don't think the branch is needed anymore, forward slashes should work for > output redirection. We have similar things in src/test/perl/PostgreSQL/Test/Cluster.pm. Do you think those can also be removed? I'm not sure it's the place of this patch to introduce a mix of styles. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Andres Freund writes: > Because I forgot about cfbot when attaching the patch, cfbot actually ran with > it under this thread's cf entry. It does look like an improvement to me: > https://cirrus-ci.com/task/6397292629458944?logs=test_world#L900 > We certainly can do better, but it's sufficiently better than what we have > right now. So I'd like to commit it? No objection here. >> Would we want to do this in all branches? I'd vote for yes, but ... > Unless somebody speaks in favor of doing this across branches, I'd just go for > HEAD. +1 for HEAD only, especially if we think we might change it some more later. It seems possible this might break somebody's tooling if we drop it into minor releases. One refinement that comes to mind as I look at the patch is to distinguish between "check" and "installcheck". Not sure that's worthwhile, but not sure it isn't, either. regards, tom lane
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Hi, On 2022-03-30 09:28:58 +0530, Dilip Kumar wrote: > On Wed, Mar 30, 2022 at 6:47 AM Andres Freund wrote: > > > > > > du -s /tmp/initdb/ > > WAL_LOG: 35112 > > FILE_COPY: 29288 > > > > So it seems we should specify a strategy in initdb? It kind of makes sense - > > we're not going to read anything from those database. And because of the > > ringbuffer of 256kB, we'll not even reduce IO meaningfully. > > I think this makes sense, so you mean with initdb we will always use > file_copy or we want to give a command line option for initdb ? Don't see a need for a commandline option / a situation where using WAL_LOG would be preferable for initdb. Greetings, Andres Freund
Re: make MaxBackends available in _PG_init
On Tue, Mar 29, 2022 at 12:22:21PM -0400, Robert Haas wrote: > It's not, though, because the original proposal was to change things > around so that the value of MaxBackends would have been reliable in > _PG_init(). If we'd done that, then extensions that are using it in > _PG_init() would have gone from being buggy to being not-buggy. But > since you advocated against that change, we instead committed > something that caused them to go from being buggy to failing outright. > That's pretty painful for people with such extensions. And IMHO, it's > *much* more legitimate to want to size a data structure based on the > value of MaxBackends than it is for extensions to override GUC values. > If we can make the latter use case work in a sane way, OK, although I > have my doubts about how sane it really is, but it can't be at the > expense of telling extensions that have been (incorrectly) using > MaxBackends in _PG_init() that we're throwing them under the bus. > > IMHO, the proper thing to do if certain GUC values are required for an > extension to work is to put that information in the documentation and > error out at an appropriate point if the user does not follow the > directions. Then this issue does not arise. But there's no reasonable > workaround for being unable to size data structures based on > MaxBackends. FWIW I would be on board with reverting all the GetMaxBackends() stuff if we made the value available in _PG_init() and stopped supporting GUC overrides by extensions (e.g., ERROR-ing in SetConfigOption() when process_shared_preload_libraries_in_progress is true). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pgsql: Add 'basebackup_to_shell' contrib module.
On 2022-03-30 08:53:43 -0400, Robert Haas wrote: > Here's a new series, adjusted to use 'gzip' instead of 'cat' and 'type'. Seems to have done the trick: https://cirrus-ci.com/task/6474955838717952?logs=test_contrib_basebackup_to_shell#L1 # Reconfigure to restrict access and require a detail. $shell_command = $PostgreSQL::Test::Utils::windows_os ? qq{$gzip --fast > "$escaped_backup_path%d.%f.gz"} : qq{$gzip --fast > "$escaped_backup_path/%d.%f.gz"}; I don't think the branch is needed anymore, forward slashes should work for output redirection. Greetings, Andres Freund
Re: Identify missing publications from publisher while create/alter subscription.
On Wed, Mar 30, 2022 at 5:42 PM Bharath Rupireddy wrote: > > On Wed, Mar 30, 2022 at 5:37 PM Bharath Rupireddy > wrote: > > > > On Wed, Mar 30, 2022 at 4:29 PM Amit Kapila wrote: > > > > > > On Wed, Mar 30, 2022 at 12:22 PM vignesh C wrote: > > > > > > > > I have made the changes for this, attached v17 patch has the changes > > > > for the same. > > > > > > > > > > The patch looks good to me. I have made minor edits in the comments > > > and docs. See the attached and let me know what you think? I intend to > > > commit this tomorrow unless there are more comments or suggestions. > > > > I have one minor comment: > > > > + "Create subscription throws warning for multiple non-existent > > publications"); > > +$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION mysub1;"); > > + "CREATE SUBSCRIPTION mysub1 CONNECTION '$publisher_connstr' > > PUBLICATION mypub;" > > + "ALTER SUBSCRIPTION mysub1 ADD PUBLICATION non_existent_pub" > > + "ALTER SUBSCRIPTION mysub1 SET PUBLICATION non_existent_pub" > > > > Why should we drop the subscription mysub1 and create it for ALTER .. > > ADD and ALTER .. SET tests? Can't we just do below which saves > > unnecessary subscription creation, drop, wait_for_catchup and > > poll_query_until? 
> > > > + "Create subscription throws warning for multiple non-existent > > publications"); > > + "ALTER SUBSCRIPTION mysub1 ADD PUBLICATION non_existent_pub2" > > + "ALTER SUBSCRIPTION mysub1 SET PUBLICATION non_existent_pub3" > > Or I would even simplify the entire tests as follows: > > + "CREATE SUBSCRIPTION mysub1 CONNECTION '$publisher_connstr' > PUBLICATION mypub, non_existent_pub1" > + "Create subscription throws warning for non-existent publication"); > >> no drop of mysub1 >> > + "CREATE SUBSCRIPTION mysub2 CONNECTION '$publisher_connstr' > PUBLICATION non_existent_pub1, non_existent_pub2" > + "Create subscription throws warning for multiple non-existent > publications"); > >> no drop of mysub2 >> > + "ALTER SUBSCRIPTION mysub2 ADD PUBLICATION non_existent_pub3" > + "Alter subscription add publication throws warning for non-existent > publication"); > + "ALTER SUBSCRIPTION mysub2 SET PUBLICATION non_existent_pub4" > + "Alter subscription set publication throws warning for non-existent > publication"); > $node_subscriber->stop; > $node_publisher->stop; Your suggestion looks valid, I have modified it as suggested. Additionally I have removed the "create subscription with multiple non-existent publications" test and changed the "add publication" test from a single non-existent publication to multiple non-existent publications, to cover the multiple non-existent publications path. Attached v19 patch has the changes for the same. Regards, Vignesh From fc175350780ca046e977c9b60d1f79873d19ae7e Mon Sep 17 00:00:00 2001 From: Vigneshwaran C Date: Sat, 26 Mar 2022 19:43:27 +0530 Subject: [PATCH v19] Raise WARNING for missing publications. When we create or alter a subscription to add publications raise a warning for non-existent publications. We don't want to give an error here because it is possible that users can later create the missing publications. 
Author: Vignesh C Reviewed-by: Amit Kapila, Bharath Rupireddy, Japin Li, Dilip Kumar, Euler Taveira, Ashutosh Sharma Discussion: https://www.postgresql.org/message-id/flat/20220321235957.i4jtjn4wyjucex6b%40alap3.anarazel.de#b846fd4ef657cfaa8c9890f044e4 --- doc/src/sgml/ref/alter_subscription.sgml | 4 +- doc/src/sgml/ref/create_subscription.sgml | 7 ++ src/backend/commands/subscriptioncmds.c | 133 ++ src/test/subscription/t/007_ddl.pl| 37 ++ 4 files changed, 160 insertions(+), 21 deletions(-) diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index 3e46bbdb04..fe13ab9a2d 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -114,7 +114,9 @@ ALTER SUBSCRIPTION name RENAME TO < replaces the entire list of publications with a new list, ADD adds additional publications to the list of publications, and DROP removes the publications from - the list of publications. See + the list of publications. We allow non-existent publications to be + specified in ADD and SET variants + so that users can add those later. See for more information. By default, this command will also act like REFRESH PUBLICATION. diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index b701752fc9..ebf7db57c5 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -356,6 +356,13 @@ CREATE SUBSCRIPTION subscription_name + + We allow non-existent publications to be specified so that users can add + those later. This means + pg_subscription + can have non-existent publications. + + diff --git a/src/backend/commands/subscriptioncmds.c
Re: pgsql: Add 'basebackup_to_shell' contrib module.
Hi, On 2022-03-26 13:51:35 -0700, Andres Freund wrote: > Prototype patch attached. Because I forgot about cfbot when attaching the patch, cfbot actually ran with it under this thread's cf entry. It does look like an improvement to me: https://cirrus-ci.com/task/6397292629458944?logs=test_world#L900 We certainly can do better, but it's sufficiently better than what we have right now. So I'd like to commit it? > Would we want to do this in all branches? I'd vote for yes, but ... Unless somebody speaks in favor of doing this across branches, I'd just go for HEAD. Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Tue, Mar 29, 2022 at 03:48:32PM -0700, Nathan Bossart wrote: > On Thu, Mar 24, 2022 at 01:17:01PM +1300, Thomas Munro wrote: >> /* we're only handling directories here, skip if it's not ours */ >> -if (lstat(path, ) == 0 && !S_ISDIR(statbuf.st_mode)) >> +if (lstat(path, ) != 0) >> +ereport(ERROR, >> +(errcode_for_file_access(), >> + errmsg("could not stat file \"%s\": %m", path))); >> +else if (!S_ISDIR(statbuf.st_mode)) >> return; >> >> Why is this a good place to silently ignore non-directories? >> StartupReorderBuffer() is already in charge of skipping random >> detritus found in the directory, so would it be better to do "if >> (get_dirent_type(...) != PGFILETYPE_DIR) continue" there, and then >> drop the lstat() stanza from ReorderBufferCleanupSeralizedTXNs() >> completely? Then perhaps its ReadDirExtended() shoud be using ERROR >> instead of INFO, so that missing/non-dir/b0rked directories raise an >> error. > > My guess is that this was done because ReorderBufferCleanupSerializedTXNs() > is also called from ReorderBufferAllocate() and ReorderBufferFree(). > However, it is odd that we just silently return if the slot path isn't a > directory in those cases. I think we could use get_dirent_type() in > StartupReorderBuffer() as you suggested, and then we could let ReadDir() > ERROR for non-directories for the other callers of > ReorderBufferCleanupSerializedTXNs(). WDYT? > >> I don't understand why it's reporting readdir() errors at INFO >> but unlink() errors at ERROR, and as far as I can see the other paths >> that reach this code shouldn't be sending in paths to non-directories >> here unless something is seriously busted and that's ERROR-worthy. > > I agree. I'll switch it to ReadDir() in the next revision so that we ERROR > instead of INFO. Here is an updated patch set. 
-- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 784419919fca8f157a1c7304b79567b3ba29f3bf Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 15 Feb 2022 09:40:53 -0800 Subject: [PATCH v11 1/2] make more use of get_dirent_type() --- src/backend/access/heap/rewriteheap.c | 4 +- .../replication/logical/reorderbuffer.c | 12 +++--- src/backend/replication/logical/snapbuild.c | 5 +-- src/backend/replication/slot.c| 4 +- src/backend/storage/file/copydir.c| 21 +++--- src/backend/storage/file/fd.c | 20 + src/backend/utils/misc/guc-file.l | 42 +++ src/timezone/pgtz.c | 8 +--- 8 files changed, 48 insertions(+), 68 deletions(-) diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index 2a53826736..d64d7aae2e 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -113,6 +113,7 @@ #include "access/xact.h" #include "access/xloginsert.h" #include "catalog/catalog.h" +#include "common/file_utils.h" #include "lib/ilist.h" #include "miscadmin.h" #include "pgstat.h" @@ -1213,7 +1214,6 @@ CheckPointLogicalRewriteHeap(void) mappings_dir = AllocateDir("pg_logical/mappings"); while ((mapping_de = ReadDir(mappings_dir, "pg_logical/mappings")) != NULL) { - struct stat statbuf; Oid dboid; Oid relid; XLogRecPtr lsn; @@ -1227,7 +1227,7 @@ CheckPointLogicalRewriteHeap(void) continue; snprintf(path, sizeof(path), "pg_logical/mappings/%s", mapping_de->d_name); - if (lstat(path, ) == 0 && !S_ISREG(statbuf.st_mode)) + if (get_dirent_type(path, mapping_de, false, ERROR) != PGFILETYPE_REG) continue; /* Skip over files that cannot be ours. 
*/ diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index c2d9be81fa..489a8300fa 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -126,6 +126,7 @@ #include "access/xlog_internal.h" #include "catalog/catalog.h" #include "commands/sequence.h" +#include "common/file_utils.h" #include "lib/binaryheap.h" #include "miscadmin.h" #include "pgstat.h" @@ -4806,15 +4807,10 @@ ReorderBufferCleanupSerializedTXNs(const char *slotname) { DIR *spill_dir; struct dirent *spill_de; - struct stat statbuf; char path[MAXPGPATH * 2 + 12]; sprintf(path, "pg_replslot/%s", slotname); - /* we're only handling directories here, skip if it's not ours */ - if (lstat(path, ) == 0 && !S_ISDIR(statbuf.st_mode)) - return; - spill_dir = AllocateDir(path); while ((spill_de = ReadDirExtended(spill_dir, path, INFO)) != NULL) { @@ -4862,6 +4858,7 @@ StartupReorderBuffer(void) { DIR *logical_dir; struct dirent *logical_de; + char path[MAXPGPATH * 2 + 12]; logical_dir = AllocateDir("pg_replslot"); while ((logical_de = ReadDir(logical_dir, "pg_replslot")) != NULL) @@ -4874,6 +4871,11 @@ StartupReorderBuffer(void) if
Re: Frontend error logging style
Greg Stark writes: > FYI this patch no longer applies. No surprise :-( > Given it's a Tom patch and due to its nature it'll bitrot rapidly > anyways I'm don't see a point in updating the status though. We now seem to have buy-in on the concept, so my plan is to let this sit till the end of the commitfest, then rebase and push. That should avoid unnecessary churn for other pending patches. regards, tom lane
Re: [PATCH] Accept IP addresses in server certificate SANs
On Wed, 2022-03-30 at 13:37 +0200, Peter Eisentraut wrote: > On 28.03.22 22:21, Jacob Champion wrote: > > On Mon, 2022-03-28 at 11:17 +0200, Daniel Gustafsson wrote: > > > Fixing up the switch_server_cert() calls and using default_ssl_connstr > > > makes > > > the test pass for me. The required fixes are in the supplied 0004 diff, > > > I kept > > > them separate to allow the original author to incorporate them without > > > having > > > to dig them out to see what changed (named to match the git format-patch > > > output > > > since I think the CFBot just applies the patches in alphabetical order). > > > > Thanks! Those changes look good to me; I've folded them into v11. This > > is rebased on a newer HEAD so it should fix the apply failures that > > Greg pointed out. > > I'm not happy about how inet_net_pton.o is repurposed here. That code > is clearly meant to support backend data types with specifically. Code like > > + /* > + * pg_inet_net_pton() will accept CIDR masks, which we don't want to > + * match, so skip the comparison if the host string contains a slash. > + */ > > indicates that we are fighting against the API. Horiguchi-san had the same concern upthread, I think. I replaced that code in the next patch, but it was enough net-new stuff that I kept the patches separate instead of merging them. I can change that if it's not helpful for review. > Also, if someone ever > wants to change how those backend data types work, we then have to check > a bunch of other code as well. > > I think we should be using inet_ntop()/inet_pton() directly here. We > can throw substitute implementations into libpgport if necessary, that's > not so difficult. Is this request satisfied by the implementation of pg_inet_pton() in patch 0003? If not, what needs to change? Thanks, --Jacob
Re: Returning multiple rows in materialized mode inside the extension
On Wed, Mar 30, 2022 at 9:01 AM Piotr Styczyński wrote: > I don’t know if this mailing list is a good place to ask this question, > but if it’s not, just correct me. > pgsql-general is probably better > *The problem:* > > We currently have a one-to-many function (an operation that produces > multiple rows per one input row). > Now we would like to translate that functionality to a sensible > many-to-many. > This seems like a big gap. Input rows: 1, 2, 3. Expected output: (1,A), (1,B), (1,C), (2,A), (2,B), (2,C), (3,A), (3,B), (3,C). I really don't know how you would change the internals to handle this - I'm doubting it would even be possible. If asked to accomplish this using just standard PostgreSQL I would turn the inputs into an array {1,2,3} and pass that array into a set-returning function. Now I have: {1,2,3} A, {1,2,3} B, {1,2,3} C as an output, and I can just unnest the array column to produce the final result. Something like (not tested): SELECT unnest(arr_input.arr), func_call FROM (SELECT array_agg(inputvals) AS arr FROM tbl) AS arr_input, LATERAL func_call(arr_input.arr); David J.
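Spelling the suggested pattern out in full, a minimal self-contained sketch; the table, column, and `func_call` function are stand-ins for the real genomic computation, and the function body here just fakes a result set derived from the whole input:

```sql
-- Illustrative only: names and the function body are made up.
CREATE TABLE tbl (inputvals int);
INSERT INTO tbl VALUES (1), (2), (3);

-- Stand-in for the real many-to-many operation: it receives the
-- complete input set as an array and returns the full result set.
CREATE FUNCTION func_call(arr int[]) RETURNS SETOF text
LANGUAGE sql AS $$ SELECT unnest(ARRAY['A','B','C']) $$;

-- Aggregate all inputs into one array, pass it to the set-returning
-- function via LATERAL, then unnest the array to restore per-input rows.
SELECT unnest(arr_input.arr) AS input, fc AS result
FROM (SELECT array_agg(inputvals) AS arr FROM tbl) AS arr_input,
     LATERAL func_call(arr_input.arr) AS fc;
-- produces the cartesian product of inputs and results (9 rows here)
```

This way the function sees the entire input set at once, without the caller having to pass a row count.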
Re: Add non-blocking version of PQcancel
I attached a new version of this patch, which does three main things: 1. Change the PQrequestCancel implementation to use the regular connection establishment code, to support all connection options including encryption. 2. Add PQrequestCancelStart which is a thread-safe and non-blocking version of this new PQrequestCancel implementation. 3. Add PQconnectComplete, which completes a connection started by PQrequestCancelStart. This is useful if you want a thread-safe, but blocking cancel (without having a need for signal safety). This change un-deprecates PQrequestCancel, since now there's actually an advantage to using it over PQcancel. It also includes user-facing documentation for all these functions. As an API design change from the previous version, PQrequestCancelStart now returns a regular PGconn for the cancel connection. @Tom Lane regarding this: > Even in the non-blocking case, callers should only deal with the original > PGconn. This would by definition result in non-threadsafe code (afaict). So I refrained from doing this. The blocking version doesn't expose a PGconn at all, but the non-blocking one now returns a new PGconn. There are two more changes that I at least want to do before considering this patch mergeable: 1. Go over all the functions that can be called with a PGconn, but should not be called with a cancellation PGconn, and error out or exit early. 2. Copy over the SockAddr from the original connection and always connect to the same socket. I believe with the current code the cancellation could end up at the wrong server if there are multiple hosts listed in the connection string. And there's a third item that I would like to do as a bonus: 3. Actually use the non-blocking API for the postgres_fdw code to implement a timeout, which would allow this comment to be removed: /* * Issue cancel request. Unfortunately, there's no good way to limit the * amount of time that we might block inside PQgetCancel(). 
*/ So a next version of this patch can be expected somewhere later this week. But any feedback on the current version would be appreciated. Because these 3 changes won't change the overall design much. Jelte 0001-Add-documentation-for-libpq_pipeline-tests.patch Description: 0001-Add-documentation-for-libpq_pipeline-tests.patch 0002-Add-non-blocking-version-of-PQcancel.patch Description: 0002-Add-non-blocking-version-of-PQcancel.patch
Re: Printing backtrace of postgres processes
On Wed, Mar 30, 2022 at 11:53:52AM -0400, Greg Stark wrote: > Sadly the cfbot is showing a patch conflict again. It's just a trivial > conflict in the regression test schedule so I'm not going to update > the status but it would be good to rebase it so we continue to get > cfbot testing. Done. No changes. >From b984dacb4bf2794705a8da49fa68ac9d9a6f Mon Sep 17 00:00:00 2001 From: Vigneshwaran C Date: Tue, 25 Jan 2022 08:21:22 +0530 Subject: [PATCH] Add function to log the backtrace of the specified postgres process. This commit adds pg_log_backtrace() function that requests to log the backtrace of the specified backend or auxiliary process except logger and statistic collector. Only superusers are allowed to request to log the backtrace because allowing any users to issue this request at an unbounded rate would cause lots of log messages and which can lead to denial of service. On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its backtrace at LOG_SERVER_ONLY level, so that the backtrace will appear in the server log but not be sent to the client. Bump catalog version. 
Authors: Vignesh C, Bharath Rupireddy, Justin Pryzby Reviewers: Bharath Rupireddy, Justin Pryzby, Fujii Masao, Atsushi Torikoshi, Dilip Kumar, Robert Haas, Andres Freund, Tom lane, Craig Ringer Discussion: https://www.postgresql.org/message-id/CALDaNm3ZzmFS-=r7oduzj7y7bgqv+n06kqyft6c3xzdoknk...@mail.gmail.com --- doc/src/sgml/func.sgml| 62 +++ src/backend/catalog/system_functions.sql | 2 + src/backend/postmaster/autovacuum.c | 4 ++ src/backend/postmaster/checkpointer.c | 4 ++ src/backend/postmaster/interrupt.c| 4 ++ src/backend/postmaster/pgarch.c | 10 +++- src/backend/postmaster/startup.c | 4 ++ src/backend/postmaster/walwriter.c| 4 ++ src/backend/storage/ipc/procarray.c | 56 + src/backend/storage/ipc/procsignal.c | 42 + src/backend/storage/ipc/signalfuncs.c | 73 --- src/backend/tcop/postgres.c | 4 ++ src/backend/utils/adt/mcxtfuncs.c | 40 +++-- src/backend/utils/error/elog.c| 19 -- src/backend/utils/init/globals.c | 1 + src/include/catalog/pg_proc.dat | 5 ++ src/include/miscadmin.h | 1 + src/include/storage/procarray.h | 2 + src/include/storage/procsignal.h | 3 +- src/include/utils/elog.h | 2 + src/test/regress/expected/backtrace.out | 49 +++ src/test/regress/expected/backtrace_1.out | 55 + src/test/regress/parallel_schedule| 2 +- src/test/regress/sql/backtrace.sql| 33 ++ 24 files changed, 416 insertions(+), 65 deletions(-) create mode 100644 src/test/regress/expected/backtrace.out create mode 100644 src/test/regress/expected/backtrace_1.out create mode 100644 src/test/regress/sql/backtrace.sql diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 524c6b98547..4d187b3a6d8 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25394,6 +25394,30 @@ SELECT collation for ('foo' COLLATE "de_DE"); + + + + pg_log_backtrace + +pg_log_backtrace ( pid integer ) +boolean + + +Requests to log the backtrace of the backend with the +specified process ID. 
This function can send the request to +backends and auxiliary processes except the logger and statistics +collector. The backtraces will be logged at LOG +message level. They will appear in the server log based on the log +configuration set (See for +more information), but will not be sent to the client regardless of +. A backtrace identifies +which function a process is currently executing and it may be useful +for developers to diagnose stuck processes and other problems. This +function is supported only if PostgreSQL was built with the ability to +capture backtraces, otherwise it will emit a warning. + + + @@ -25614,6 +25638,44 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 because it may generate a large number of log messages. + +pg_log_backtrace can be used to log the backtrace of +a backend process. For example: + +postgres=# select pg_log_backtrace(pg_backend_pid()); + pg_log_backtrace +-- + t +(1 row) + +The backtrace will be logged as specified by the logging configuration. +For example: + +2021-01-27 11:33:50.247 IST [111735] LOG: current backtrace: +postgres: postgresdba postgres [local] SELECT(set_backtrace+0x38) [0xae06c5] +postgres: postgresdba postgres [local] SELECT(ProcessInterrupts+0x788) [0x950c34] +postgres: postgresdba postgres
Re: [PATCH] Expose port->authn_id to extensions and triggers
On Tue, 2022-03-29 at 16:53 -0700, Andres Freund wrote: > > We'd also need to guess whether the GUC system's serialization of NULL > > as an empty string is likely to cause problems for any future auth > > methods. > > You can't represent a NULL in a postgres 'text' datum, independent of > parallelism. So the current definition of pg_session_authn_id() already > precludes that (and set_authn_id() and ...). Honestly, I can't see a reason > why we should ever allow authn_id to contain a NULL byte. I don't mean a NULL byte, just a NULL pointer. This part of the implementation doesn't distinguish between it and an empty string: > /* NULL becomes empty string, see estimate_variable_size() */ > do_serialize(destptr, maxbytes, "%s", > *conf->variable ? *conf->variable : ""); Whether that's a problem in the future entirely depends on whether there's some authentication method that considers the empty string a sane and meaningful identity. We might reasonably decide that the answer is "no", but I like being able to make that decision as opposed to delegating it to an existing generic framework. (That last point may be my core concern about making it a GUC: I'd like us to have full control of how and where this particular piece of information gets modified.) Thanks, --Jacob
Returning multiple rows in materialized mode inside the extension
Hi, I represent a small group of developers. We are working on an open-source PostgreSQL extension to enable processing of genomic data inside Postgres. We have extensive knowledge of molecular biology and data science, but none of the Postgres internals. I don’t know if this mailing list is a good place to ask this question, but if it’s not, just correct me. *The problem:* We currently have a one-to-many function (an operation that produces multiple rows per one input row). Now we would like to translate that functionality to a sensible many-to-many. We need to know how we are constrained by the internals of Postgres itself and what syntax we should use. Also, the operation we are implementing requires knowing the full set of inputs before it can be computed. *Current solution:* There is ValuePerCall (1/0 returned rows) or Materialize mode (any number of returned rows), however the second one does not offer any invocation counter (like ValuePerCall does). Hence, to provide any persistence between subcalls, we introduced the following syntax: *SELECT _ FROM table t, my_function(t.a, t.b, t.c, number_of_rows);* Where by FROM a, b we mean the cartesian product of a and b. And my_function returns an empty set for the first (number_of_rows - 1) invocations and the full result set for the last one. Sadly this syntax requires us to supply the number of rows up front, which is not very convenient. Do you know how to handle this situation correctly? We looked for example at the code of tablefunc but the syntax there requires a full SQL query as an input, so that wasn’t useful.
Re: Printing backtrace of postgres processes
Sadly the cfbot is showing a patch conflict again. It's just a trivial conflict in the regression test schedule so I'm not going to update the status but it would be good to rebase it so we continue to get cfbot testing.
Re: Frontend error logging style
FYI this patch no longer applies. Given it's a Tom patch and due to its nature it'll bitrot rapidly anyway, I don't see a point in updating the status though.
Re: Granting SET and ALTER SYSTE privileges for GUCs
On Wed, Mar 30, 2022 at 8:46 AM Tom Lane wrote: > I don't want to do that with > a blunderbuss, but perhaps there's an argument to do it for specific > cases (search_path comes to mind, though the performance cost could be > significant, since I think setting that in function SET clauses is > common). > I suspect it became considerably more so when we fixed the search_path CVE, since we basically told people that doing so, despite the possible performance hit, was the easiest solution to their immediate dump/restore failures. But ISTM that because that SET has a function invocation context it could bypass any such check. Though maybe the DO command exposes a flaw in that idea. David J.
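For context, the per-function SET clause being discussed is the one recommended after the search_path CVE fix. A minimal sketch (the function itself is a made-up example):

```sql
-- The per-function setting under discussion: pin search_path for the
-- duration of each call.  The function body is purely illustrative.
CREATE FUNCTION get_constant() RETURNS numeric
LANGUAGE sql
SECURITY DEFINER
SET search_path = pg_catalog, pg_temp
AS $$ SELECT 42.0 $$;
```

The SET clause causes a GUC save/restore around every invocation, which is where the performance cost mentioned above comes from.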
Re: Adding CI to our tree
Hi, Now that zstd is used, enable it in CI. I plan to commit this shortly, unless somebody sees a reason not to do so. Greetings, Andres Freund >From 51cc830e2e82516966fe1c84cd134b1858a89bab Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Sat, 26 Mar 2022 12:36:04 -0700 Subject: [PATCH v1] ci: enable zstd where available. Now that zstd is used, enable it in CI. Discussion: https://postgr.es/m/20211001222752.wrz7erzh4cajv...@alap3.anarazel.de --- .cirrus.yml | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/.cirrus.yml b/.cirrus.yml index 171bd29cf03..7b883b462a8 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -101,6 +101,7 @@ task: --with-ssl=openssl \ --with-tcl --with-tclconfig=/usr/local/lib/tcl8.6/ \ --with-uuid=bsd \ +--with-zstd \ \ --with-includes=/usr/local/include \ --with-libs=/usr/local/lib \ @@ -142,6 +143,7 @@ LINUX_CONFIGURE_FEATURES: _CONFIGURE_FEATURES >- --with-systemd --with-tcl --with-tclconfig=/usr/lib/tcl8.6/ --with-uuid=ossp + --with-zstd task: @@ -270,7 +272,8 @@ task: openldap \ openssl \ python \ - tcl-tk + tcl-tk \ + zstd brew cleanup -s # to reduce cache size upload_caches: homebrew @@ -282,7 +285,7 @@ task: INCLUDES="${brewpath}/include:${INCLUDES}" LIBS="${brewpath}/lib:${LIBS}" -for pkg in icu4c krb5 openldap openssl ; do +for pkg in icu4c krb5 openldap openssl zstd ; do pkgpath="${brewpath}/opt/${pkg}" INCLUDES="${pkgpath}/include:${INCLUDES}" LIBS="${pkgpath}/lib:${LIBS}" @@ -307,6 +310,7 @@ task: --with-ssl=openssl \ --with-tcl --with-tclconfig=${brewpath}/opt/tcl-tk/lib/ \ --with-uuid=e2fs \ + --with-zstd \ \ --prefix=${HOME}/install \ --with-includes="${INCLUDES}" \ -- 2.35.1.677.gabf474a5dd