Re: logical replication restrictions
At Wed, 10 Aug 2022 17:33:00 -0300, "Euler Taveira" wrote in
> On Wed, Aug 10, 2022, at 9:39 AM, osumi.takami...@fujitsu.com wrote:
> > Minor review comments for v6.
> Thanks for your review. I'm attaching v7.

Using an interval is not standard for this kind of parameter, but it seems convenient. On the other hand, it's not great that the unit "month" introduces some subtle ambiguity. This patch translates a month to 30 days, but I'm not sure that's the right thing to do. Perhaps we shouldn't allow units larger than days.

apply_delay() blocks the message-receiving path, so even a not-so-long delay can cause a replication timeout to fire. I think we should process walsender pings even while delaying. Having to make the replication timeout longer than the apply delay is not great, I think.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
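The month ambiguity described above is easy to demonstrate with a toy conversion. This is a hypothetical helper mirroring the 30-days-per-month translation being questioned, not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of flattening an interval-style delay setting to
 * milliseconds.  Like the patch under discussion, it assumes a month is
 * exactly 30 days, which is where the ambiguity creeps in: "1 month"
 * becomes indistinguishable from "30 days", and silently under-delays
 * across a 31-day month.
 */
static int64_t
delay_to_ms(int months, int days, int64_t usecs)
{
	days += months * 30;		/* the questionable assumption */
	return usecs / 1000 + (int64_t) days * 24 * 60 * 60 * 1000;
}
```

If units above days were rejected, as suggested, the caller would simply refuse months != 0 and the ambiguity would disappear.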
Re: User functions for building SCRAM secrets
On Tue, Nov 08, 2022 at 04:57:09PM -0800, Jacob Champion wrote:
> But I guess that wouldn't really help with ALTER ROLE ... PASSWORD,
> because you can't parameterize it.

Hm... Yeah, and I'd like to think that this is never something we should allow, either, as that could easily be a footgun for users (?).
--
Michael
Re: Locks release order in LogStandbySnapshot
On Wed, 09 Nov 2022 at 11:21, Andres Freund wrote:
> I think it does. If we allow xid assignment before LogCurrentRunningXacts() is
> done, those new xids would not have been mentioned in the xl_running_xacts
> record, despite already running. Which I think would result in corrupted
> snapshots during hot standby and logical decoding.
>
> > Does it make any sense to release them in reversed acquisition order in
> > LogStandbySnapshot like ProcArrayRemove?
>
> I don't think it's worth optimizing for; this happens at a low frequency
> (whereas connection establishment can be very frequent). And due to the above,
> we can sometimes release ProcArrayLock earlier.

Thanks for the explanation! Got it.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
On Thu, Sep 29, 2022 at 3:53 PM Bruce Momjian wrote:
>
> So, what happens when an insufficient number of synchronous replicas
> reply?

It's a failover.

> Sessions hang because the synchronous behavior cannot be
> guaranteed. We then _allow_ query cancel so the user or administrator
> can get out of the hung sessions and perhaps modify
> synchronous_standby_names.

Administrators should not modify synchronous_standby_names. The administrator must shoot this node in the head instead.

> I have always felt this has to be done at the server level, meaning when
> a synchronous_standby_names replica is not responding after a certain
> timeout, the administrator must be notified by calling a shell command
> defined in a GUC and all sessions will ignore the replica.

Standbys are expelled from the wait list according to quorum rules. I'd propose not to invent more quorum rules involving shell scripts. The administrator has already expressed how many standbys can be offline by setting synchronous_standby_names; they actively asked for hanging queries in the case of insufficient standbys. We have reserved administrator connections for the case when all connection slots are used by hanging queries.

Best regards, Andrey Borodin.
Re: [patch] Have psql's \d+ indicate foreign partitions
On Tue, Nov 08, 2022 at 03:38:22PM +0900, Ian Lawrence Barwick wrote:
> CF entry updated accordingly.

Missed this part, thanks.
--
Michael
Re: Improve logging when using Huge Pages
On Wed, Nov 09, 2022 at 11:47:57AM +0900, Kyotaro Horiguchi wrote:
> Honestly, I can't come up with other users of the new log level.
> Another possible issue is that it might be a bit hard for people to
> connect that level to huge_pages=try, whereas I think we shouldn't
> put a description about the concrete impact range of that log level.
>
> I came up with an alternative idea: add a new huge_pages value,
> try_report or try_verbose, which tells PostgreSQL to *always* report
> the result of huge_pages = try.

Here is an extra idea for the bucket of ideas: switch the user-visible value of huge_pages to 'on' when we are at "try" but succeed in using huge pages, and to 'off' otherwise. The idea of Justin in [1] to use an internal runtime-computed GUC sounds sensible as well (say a boolean effective_huge_pages?).

[1]: https://www.postgresql.org/message-id/20221106130426.gg16...@telsasoft.com
--
Michael
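A minimal sketch of that reporting rule; the names, including the effective_huge_pages boolean floated above, are assumptions for illustration, not a committed API:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	HUGE_PAGES_OFF,
	HUGE_PAGES_ON,
	HUGE_PAGES_TRY
} HugePagesType;

/*
 * Sketch: the value a runtime-computed boolean GUC (something like the
 * "effective_huge_pages" floated above) could report.  With "off" we
 * never try; with "on" we either got huge pages or startup would have
 * failed; with "try" the outcome depends on what mmap() gave us.
 */
static bool
effective_huge_pages(HugePagesType requested, bool mmap_huge_pages_ok)
{
	switch (requested)
	{
		case HUGE_PAGES_OFF:
			return false;
		case HUGE_PAGES_ON:
			return true;	/* otherwise startup would have failed */
		case HUGE_PAGES_TRY:
			return mmap_huge_pages_ok;
	}
	return false;				/* unreachable, keeps compilers quiet */
}
```

The point of the "report the effective value" idea is precisely the HUGE_PAGES_TRY arm: today the user cannot tell which branch was taken without digging through logs.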
Re: Suppressing useless wakeups in walreceiver
On Wed, Nov 9, 2022 at 2:38 AM Nathan Bossart wrote:
>
> On Tue, Nov 08, 2022 at 09:45:40PM +1300, Thomas Munro wrote:
> > On Tue, Nov 8, 2022 at 9:20 PM Bharath Rupireddy wrote:
> >> Thanks. Do we need a similar wakeup approach for logical replication
> >> workers in worker.c? Or is it okay that the nap time is 1sec there?
> >
> > Yeah, I think so. At least for its idle/nap case (waiting for flush
> > is also a technically interesting case, but harder, and applies to
> > non-idle systems so the polling is a little less offensive).
>
> Bharath, are you planning to pick this up? If not, I can probably spend
> some time on it.

Please go ahead!

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: allow segment size to be set to < 1GiB
On Tue, Nov 08, 2022 at 06:28:08PM -0800, Andres Freund wrote:
> FWIW, on HEAD, all tests pass with -Dsegsize_blocks=6.

Wow. The relation page size influences some of the plans in the main regression test suite, but this is nice to hear. +1 from me for more flexibility with this option at compile-time.
--
Michael
Re: Allow file inclusion in pg_hba and pg_ident files
Hi,

On Wed, Nov 09, 2022 at 09:51:17AM +0900, Michael Paquier wrote:
> On Tue, Nov 08, 2022 at 10:04:16AM +0900, Michael Paquier wrote:
> > CF bot unhappy as I have messed up with rules.out. Rebased. I have
> > removed the restriction on MAXPGPATH in AbsoluteConfigLocation() in
> > 0001, while on it. The absolute paths built on GUC or ident
> > inclusions are the same.
>
> Rebased after 6bbd8b7g that is an equivalent of the previous 0001.

Thanks a lot!

> Julien, please note that this is waiting on author for now. What do
> you think about the now-named v18-0001 and the addition of an
> ErrorContextCallback to provide more information about the list of
> included files on error?

Yes, I'm unfortunately fully aware that it's waiting on me. I've been a bit busy this week with $work, but I will try to get back to it as soon as I can, hopefully this week!
Re: Reviving lost replication slots
On Mon, Nov 7, 2022 at 11:17 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi wrote:
> >
> > On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:
> >>
> >> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
> >> >
> >> > A replication slot can be lost when a subscriber is not able to catch
> >> > up with the load on the primary and the WAL to catch up exceeds
> >> > max_slot_wal_keep_size. When this happens, the target has to be reseeded
> >> > (pg_dump) from scratch, and this can take a long time. I am investigating
> >> > the options to revive a lost slot.
> >>
> >> Why in the first place does one have to set max_slot_wal_keep_size if
> >> they care for WAL more than that?
> >
> > Disk full is a typical case where we can't wait for the logical slots
> > to catch up before truncating the log.
>
> If max_slot_wal_keep_size is set appropriately and the replication
> lag is monitored properly, along with some automatic actions such as
> replacing/rebuilding the standbys or subscribers (which may not be
> easy and cheap though), the chances of hitting the "lost replication slot"
> problem become less, but never zero.

pg_dump and pg_restore can take several hours to days on a large database. Keeping the WAL in the pg_wal folder (faster, smaller and costly disks?) is not always an option.

> > IIUC, the ability to fetch WAL from the archive as a fallback mechanism
> > should automatically take care of all the lost slots. Do you see a need
> > to take care of a specific slot? If the idea is not to download the WAL
> > files into the pg_wal directory, they can be placed in a slot-specific
> > folder (data/pg_replslot//) until they are needed while decoding, and
> > then removed.
>
> Is the idea here the core copying back the WAL files from the archive?
> If yes, I think it is not something the core needs to do. This very
> well fits the job of an extension or an external module that revives
> the lost replication slots by copying WAL files from the archive location.

The current code throws an error that the slot is lost, because restart_lsn is set to an invalid LSN when the WAL is truncated by the checkpointer. In order to build an external service that can revive a lost slot, at a minimum we need the attached patch.

> Having said that, what's the best way to revive a lost replication
> slot today? Any automated way exists today? It seems like
> pg_replication_slot_advance() doesn't do anything for the
> invalidated/lost slots.

If the WAL is available in the pg_wal directory, the replication stream resumes normally when the client connects with the patch I posted.

> If it's a streaming replication slot, the standby will anyway jump to
> archive mode ignoring the replication slot, and the slot will never be
> usable again unless somebody creates a new replication slot and
> provides it to the standby for reuse.
> If it's a logical replication slot, the subscriber will start to
> diverge from the publisher and the slot will have to be revived
> manually, i.e. created again.

Physical slots can be revived with the standby downloading the WAL from the archive directly. This patch is helpful for logical slots.

> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
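For anyone prototyping such an external restore module, the WAL segment file to fetch for a given restart_lsn can be derived the same way the server names its segments. This is a simplified sketch assuming the default 16MB segment size and a known timeline, not the server's actual XLogFileName() macro:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_SEGMENT_SIZE (16 * 1024 * 1024)	/* default 16MB segments */

/*
 * Simplified sketch of mapping an LSN to the WAL file an archive-restore
 * helper would need to fetch.  It mirrors the server's 24-hex-digit
 * naming (timeline, high 32 bits of the segment number space, low
 * segment number), but ignores timeline history files and non-default
 * segment sizes.
 */
static void
wal_file_for_lsn(uint32_t timeline, uint64_t lsn, char *buf, size_t len)
{
	uint64_t	segno = lsn / WAL_SEGMENT_SIZE;
	uint64_t	segs_per_id = UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;

	snprintf(buf, len, "%08X%08X%08X",
			 (unsigned) timeline,
			 (unsigned) (segno / segs_per_id),
			 (unsigned) (segno % segs_per_id));
}
```

A helper like this would let the module translate the slot's restart_lsn into the first file to restore from the archive into a slot-specific staging directory.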
Re: Reviving lost replication slots
On Tue, Nov 8, 2022 at 1:36 AM Amit Kapila wrote:
> On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi wrote:
> >
> > On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:
> >>
> >> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
> >> >
> >> > A replication slot can be lost when a subscriber is not able to catch
> >> > up with the load on the primary and the WAL to catch up exceeds
> >> > max_slot_wal_keep_size. When this happens, the target has to be reseeded
> >> > (pg_dump) from scratch, and this can take a long time. I am investigating
> >> > the options to revive a lost slot.
> >>
> >> Why in the first place does one have to set max_slot_wal_keep_size if
> >> they care for WAL more than that?
> >
> > Disk full is a typical case where we can't wait for the logical slots
> > to catch up before truncating the log.
>
> Ideally, in such a case the subscriber should fall back to the
> physical standby of the publisher, but unfortunately, we don't yet have
> the functionality where subscribers can continue logical replication
> from a physical standby. Do you think such functionality would serve
> your purpose?

Don't think streaming from standby helps, as the disk layout is expected to remain the same on the physical standby and the primary.

> >> If you have a case where you want to handle this for some particular
> >> slot (where you are okay with the invalidation of other slots exceeding
> >> max_slot_wal_keep_size), then the other possibility could be to have a
> >> similar variable at the slot level, but I am not sure if that is a good
> >> idea because you haven't presented any such case.
> >
> > IIUC, the ability to fetch WAL from the archive as a fallback mechanism
> > should automatically take care of all the lost slots. Do you see a need
> > to take care of a specific slot?
>
> No, I was just trying to see if your use case can be addressed in some
> other way. BTW, won't copying the WAL back from the archive also lead
> to a disk full situation?

The idea is to download the WAL from the archive on demand as the slot requires it, and throw away each segment once processed.

> --
> With Regards,
> Amit Kapila.
Re: Locks release order in LogStandbySnapshot
Hi,

On 2022-11-09 11:03:04 +0800, Japin Li wrote:
> GetRunningTransactionData requires holding both ProcArrayLock and
> XidGenLock (in that order). Then LogStandbySnapshot releases those
> locks in that order. However, to reduce the frequency of having to
> wait for XidGenLock while holding ProcArrayLock, ProcArrayAdd releases
> them in reversed acquisition order.
>
> The comments of LogStandbySnapshot say:
>
> > GetRunningTransactionData() acquired ProcArrayLock, we must release it.
> > For Hot Standby this can be done before inserting the WAL record
> > because ProcArrayApplyRecoveryInfo() rechecks the commit status using
> > the clog. For logical decoding, though, the lock can't be released
> > early because the clog might be "in the future" from the POV of the
> > historic snapshot. This would allow for situations where we're waiting
> > for the end of a transaction listed in the xl_running_xacts record
> > which, according to the WAL, has committed before the xl_running_xacts
> > record. Fortunately this routine isn't executed frequently, and it's
> > only a shared lock.
>
> This comment is only for ProcArrayLock, not for XidGenLock. IIUC,
> LogCurrentRunningXacts doesn't need to hold XidGenLock, right?

I think it does. If we allow xid assignment before LogCurrentRunningXacts() is done, those new xids would not have been mentioned in the xl_running_xacts record, despite already running. Which I think would result in corrupted snapshots during hot standby and logical decoding.

> Does it make any sense to release them in reversed acquisition order in
> LogStandbySnapshot like ProcArrayRemove?

I don't think it's worth optimizing for; this happens at a low frequency (whereas connection establishment can be very frequent). And due to the above, we can sometimes release ProcArrayLock earlier.

Greetings, Andres Freund
Locks release order in LogStandbySnapshot
Hi, hackers

GetRunningTransactionData requires holding both ProcArrayLock and XidGenLock (in that order). Then LogStandbySnapshot releases those locks in that order. However, to reduce the frequency of having to wait for XidGenLock while holding ProcArrayLock, ProcArrayAdd releases them in reversed acquisition order.

The comments of LogStandbySnapshot say:

> GetRunningTransactionData() acquired ProcArrayLock, we must release it.
> For Hot Standby this can be done before inserting the WAL record
> because ProcArrayApplyRecoveryInfo() rechecks the commit status using
> the clog. For logical decoding, though, the lock can't be released
> early because the clog might be "in the future" from the POV of the
> historic snapshot. This would allow for situations where we're waiting
> for the end of a transaction listed in the xl_running_xacts record
> which, according to the WAL, has committed before the xl_running_xacts
> record. Fortunately this routine isn't executed frequently, and it's
> only a shared lock.

This comment is only for ProcArrayLock, not for XidGenLock. IIUC, LogCurrentRunningXacts doesn't need to hold XidGenLock, right? Does it make any sense to release them in reversed acquisition order in LogStandbySnapshot like ProcArrayRemove?

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 7db86f7885..28d1c152bf 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -1283,6 +1283,9 @@ LogStandbySnapshot(void)
 	 */
 	running = GetRunningTransactionData();
 
+	/* GetRunningTransactionData() acquired XidGenLock, we must release it */
+	LWLockRelease(XidGenLock);
+
 	/*
 	 * GetRunningTransactionData() acquired ProcArrayLock, we must release it.
 	 * For Hot Standby this can be done before inserting the WAL record
@@ -1304,9 +1307,6 @@ LogStandbySnapshot(void)
 	if (wal_level >= WAL_LEVEL_LOGICAL)
 		LWLockRelease(ProcArrayLock);
 
-	/* GetRunningTransactionData() acquired XidGenLock, we must release it */
-	LWLockRelease(XidGenLock);
-
 	return recptr;
 }
Re: Asynchronous and "direct" IO support for PostgreSQL.
Hi,

On 2022-10-12 14:45:26 +0900, Michael Paquier wrote:
> On Tue, Aug 31, 2021 at 10:56:59PM -0700, Andres Freund wrote:
> > I've attached the code for posterity, but the series is large enough that I
> > don't think it makes sense to do that all that often... The code is at
> > https://github.com/anarazel/postgres/tree/aio
>
> I don't know what's the exact status here, but as there has been no
> activity for the past five months, I have just marked the entry as RwF
> for now.

We're trying to get a number of smaller prerequisite patches merged this CF (aligned alloc, direct IO, dclist, bulk relation extension, ...). Once that's done I'm planning to send out a new version of the (large) remainder of the changes.

Greetings, Andres Freund
Re: Add index scan progress to pg_stat_progress_vacuum
Hi,

On 2022-11-04 13:27:34 +, Imseih (AWS), Sami wrote:
> diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
> index b4fa5f6bf8..3d5e4600dc 100644
> --- a/src/backend/access/gin/ginvacuum.c
> +++ b/src/backend/access/gin/ginvacuum.c
> @@ -633,6 +633,9 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
>  		UnlockReleaseBuffer(buffer);
>  		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
>  									RBM_NORMAL, info->strategy);
> +
> +		if (info->report_parallel_progress)
> +			info->parallel_progress_callback(info->parallel_progress_arg);
>  	}
> 
>  	/* right now we found leftmost page in entry's BTree */

I don't think any of these progress callbacks should be done while pinning a buffer and ...

> @@ -677,6 +680,9 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
>  		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
>  									RBM_NORMAL, info->strategy);
>  		LockBuffer(buffer, GIN_EXCLUSIVE);
> +
> +		if (info->report_parallel_progress)
> +			info->parallel_progress_callback(info->parallel_progress_arg);
>  	}
> 
>  	MemoryContextDelete(gvs.tmpCxt);

... definitely not while holding a buffer lock.

I also don't understand why info->parallel_progress_callback exists? It's only set to parallel_vacuum_progress_report(). Why make this stuff more expensive than it has to be already?

> +parallel_vacuum_progress_report(void *arg)
> +{
> +	ParallelVacuumState *pvs = (ParallelVacuumState *) arg;
> +
> +	if (IsParallelWorker())
> +		return;
> +
> +	pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
> +								 pg_atomic_read_u32(&(pvs->shared->idx_completed_progress)));
> +}

So each of the places that call this needs to make an additional external function call for each page, just to find that there's nothing to do, or to make yet another indirect function call. This should probably be a static inline function.

This is called, for every single page, just to read the number of indexes completed by workers? A number that barely ever changes? This seems all wrong.

Greetings, Andres Freund
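To make the review comment concrete, here is one hedged sketch (invented stand-in names, not the patch's actual code) of turning the per-page indirect callback into a static inline fast path that only does real work when the shared counter actually changed:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Invented stand-ins for the patch's shared state and report function. */
static atomic_uint idx_completed_progress;
static uint32_t last_reported = UINT32_MAX;
static int	reports_issued = 0;

static void
report_progress(uint32_t value)
{
	/* stands in for pgstat_progress_update_param() */
	reports_issued++;
	(void) value;
}

/*
 * Static inline fast path: per page we only pay an atomic read and a
 * comparison; the comparatively expensive report happens only when the
 * number of completed indexes - "a number that barely ever changes" -
 * has actually changed since the last report.
 */
static inline void
maybe_report_progress(void)
{
	uint32_t	now = atomic_load(&idx_completed_progress);

	if (now != last_reported)
	{
		last_reported = now;
		report_progress(now);
	}
}
```

Whether the check belongs per page at all (rather than, say, per index or on a timer) is a separate question the review also raises; the sketch only shows how to make each check cheap.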
RE: Perform streaming logical transactions by background workers and parallel apply
Hi all,

I have tested the patch set in two cases, so I want to share the results.

Case 1. Deadlock caused by the leader worker, a parallel worker, and a backend.
Case 2. Deadlock caused by a non-immutable trigger.

===

It has worked well in both cases. PSA reports of what I did. I can investigate more if anyone wants further checks.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

# Deadlock caused by leader worker, parallel worker, and backend.

## SCHEMA DEFINITIONS

[Publisher]
CREATE TABLE tbl1 (c int);
CREATE TABLE tbl2 (c1 int primary key, c2 int, c3 int);
CREATE PUBLICATION pub FOR ALL TABLES;

[Subscriber]
CREATE TABLE tbl1 (c int);
CREATE UNIQUE INDEX idx_tbl1 ON tbl1(c);
CREATE TABLE tbl2 (c1 int primary key, c2 int, c3 int);
CREATE SUBSCRIPTION sub CONNECTION 'port=$port_N1 user=postgres' PUBLICATION pub WITH(streaming = 'parallel', copy_data = 'false', two_phase = 'on');

### WORKLOADS

Tx1 on publisher
BEGIN;
INSERT INTO tbl1 SELECT i FROM generate_series(1, 5000);

Tx2 on subscriber
BEGIN;
INSERT INTO tbl2 VALUES (1, 2, 3);
INSERT INTO tbl1 VALUES (1);

Tx3 on publisher
BEGIN;
INSERT INTO tbl2 VALUES(1, 2, 3);
COMMIT;

### RESULTS

The following was copied from the log on the subscriber.

```
ERROR: deadlock detected
DETAIL: Process (LA) waits for ShareLock on transaction 743; blocked by process (Backend).
Process (Backend) waits for ShareLock on transaction 742; blocked by process (PA).
Process (PA) waits for AccessShareLock on object 16393 of class 6100 of database 0; blocked by process (LA).
Process (LA):
Process (Backend): INSERT INTO tbl1 VALUES (1);
Process (PA):
```

# Deadlock caused by non-immutable trigger

## SCHEMA DEFINITIONS

[Publisher]
CREATE TABLE tbl1 (c int);
CREATE PUBLICATION pub FOR ALL TABLES;

[Subscriber]
CREATE TABLE tbl1 (c int);
CREATE TABLE tbl1_log (c int PRIMARY KEY);
CREATE FUNCTION record_insert() RETURNS trigger AS $record_insert$
BEGIN
SET search_path TO 'public';
INSERT INTO tbl1_log VALUES (NEW.c);
RAISE LOG 'record_insert is fired';
RESET search_path;
RETURN NEW;
END;
$record_insert$ LANGUAGE plpgsql;
CREATE TRIGGER record_trigger AFTER INSERT ON tbl1 FOR EACH ROW EXECUTE FUNCTION record_insert();
ALTER TABLE tbl1 ENABLE ALWAYS TRIGGER record_trigger;
CREATE SUBSCRIPTION sub CONNECTION 'port=$port_N1 user=postgres' PUBLICATION pub WITH(streaming = 'parallel', copy_data = 'false', two_phase = 'on');

### WORKLOADS

Tx1 on publisher
BEGIN;
INSERT INTO tbl1 SELECT i FROM generate_series(1, 5000);

Tx2 on publisher
BEGIN;
INSERT INTO tbl1 VALUES (1);
COMMIT;

COMMIT;

### RESULTS

The following was copied from the log on the subscriber.

```
ERROR: deadlock detected
DETAIL: Process (LA) waits for ShareLock on transaction 735; blocked by process (PA).
Process (PA) waits for AccessShareLock on object 16400 of class 6100 of database 0; blocked by process (LA).
Process (LA):
Process (PA):
```
Re: Improve logging when using Huge Pages
At Tue, 8 Nov 2022 11:34:53 +1300, Thomas Munro wrote in
> On Tue, Nov 8, 2022 at 4:56 AM Tom Lane wrote:
> > Andres Freund writes:
> > > On 2022-11-06 14:04:29 +0700, John Naylor wrote:
> > Agreed --- changing "on" to be exactly like "try" isn't an improvement.
> > If you want "try" semantics, choose "try".
>
> Agreed, but how can we make the people who want a log message happy,
> without upsetting the people who don't want new log messages? Hence
> my suggestion of a new level. How about try_verbose?

Honestly, I can't come up with other users of the new log level. Another possible issue is that it might be a bit hard for people to connect that level to huge_pages=try, whereas I think we shouldn't put a description about the concrete impact range of that log level.

I came up with an alternative idea: add a new huge_pages value, try_report or try_verbose, which tells PostgreSQL to *always* report the result of huge_pages = try.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Add connection active, idle time to pg_stat_activity
On Tue, Nov 8, 2022 at 7:37 PM Andres Freund wrote:
> On 2022-11-08 19:25:27 -0700, David G. Johnston wrote:
> > Actually two, because I also suggest that not only is the duration recorded,
> > but a counter be incremented each time a given state becomes the currently
> > active state. Seems like having access to a divisor of some form may be
> > useful.
>
> What for?

Because 5 hours of idle-in-transaction time in a single block means something different than the same 5 hours accumulated across 300 mini-idles.

David J.
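The divisor argument is easy to see with the numbers from the example above (a toy computation; the column names in this thread are still hypothetical):

```c
#include <assert.h>

/*
 * Toy illustration of why a per-state entry counter is a useful divisor:
 * the same accumulated idle-in-transaction time tells very different
 * stories depending on how many times the state was entered.
 */
static double
avg_state_duration_secs(double total_secs, int times_entered)
{
	return total_secs / times_entered;
}
```

One 18000-second average points at a single stuck transaction; a 60-second average across 300 entries points at a chatty application pattern instead.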
Re: Collation version tracking for macOS
On Tue, Nov 8, 2022 at 1:22 AM Peter Eisentraut wrote:
> I made a Homebrew repository for ICU versions 50 through 72:
> https://github.com/petere/homebrew-icu

Nice!

> All of these packages build and pass their self-tests on my machine. So
> from that experience, I think maintaining a repository of ICU versions,
> and being able to install more than one for testing this feature, is
> feasible.

I wonder what the situation with CVEs is in older releases. I heard a rumour that upstream might only patch current + previous, leaving it up to distros to back-patch to whatever they need to support, but I haven't tried to track down cold hard evidence of this or think about what it means for this project...
Re: Add connection active, idle time to pg_stat_activity
On 2022-11-08 19:25:27 -0700, David G. Johnston wrote:
> Actually two, because I also suggest that not only is the duration recorded,
> but a counter be incremented each time a given state becomes the currently
> active state. Seems like having access to a divisor of some form may be
> useful.

What for?
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, failed

Hello,

I tested this patch on Linux and there is no problem. Also, I reviewed this patch and commented below.

@@ -439,6 +447,107 @@ XLogRecordHasFPW(XLogReaderState *record)
+	if (fork >= 0 && fork <= MAX_FORKNUM)
+	{
+		if (fork)
+			sprintf(forkname, "_%s", forkNames[fork]);
+		else
+			forkname[0] = 0;
+	}
+	else
+		pg_fatal("found invalid fork number: %u", fork);

Should we add a comment noting that the main fork is saved without the "_main" suffix, for code readability?

@@ -679,6 +788,9 @@ usage(void)
 			 "  (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --save-fpi=path    save full page images to given path as\n"
+			 "                         LSN.T.D.R.B_F\n"));
+	printf(_("  -X, --fixup-fpi=path   like --save-fpi but apply LSN fixups to saved page\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));

Since lower-case options are displayed at the top, should we switch the order of -x and -X?

@@ -972,6 +1093,25 @@ main(int argc, char **argv)
 		}
 	}
 
+	int			dir_status = pg_check_dir(config.save_fpw_path);
+
+	if (dir_status < 0)
+	{
+		pg_log_error("could not access output directory: %s", config.save_fpw_path);
+		goto bad_argument;
+	}

Should we output %s enclosed in \"?

Regards,
Sho Kato
Re: allow segment size to be set to < 1GiB
Hi,

On 2022-11-07 21:36:33 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-11-07 12:52:25 -0500, Tom Lane wrote:
> >> How about instead allowing the segment size to be set in pages?
>
> > In addition or instead of --with-segsize/-Dsegsize?
>
> In addition to. What I meant by "instead" was to replace
> your proposal of --with-segsize-mb.

Working on updating the patch. One semi-interesting bit is that <= 5 blocks per segment fails, because corrupt_page_checksum() doesn't know about segments and src/bin/pg_basebackup/t/010_pg_basebackup.pl does

# induce further corruption in 5 more blocks
$node->stop;
for my $i (1 .. 5)
{
	$node->corrupt_page_checksum($file_corrupt1, $i * $block_size);
}
$node->start;

I'd be content with not dealing with that, given the use case of the functionality; a buildfarm animal setting it to 10 seems to suffice. Alternatively we could add segment support to corrupt_page_checksum(). Opinions?

FWIW, on HEAD, all tests pass with -Dsegsize_blocks=6.

Greetings, Andres Freund
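The failing configuration is easier to see with the block-to-segment arithmetic spelled out (a toy model of segmented relation storage, not the server's md.c):

```c
#include <assert.h>
#include <stdint.h>

/*
 * With -Dsegsize_blocks=N, block B of a relation lives in segment file
 * B / N, at offset (B % N) * BLCKSZ within that file.  The basebackup
 * test above corrupts byte offsets 1*bs .. 5*bs within the *first*
 * segment file; with N <= 5, block 5's real home is the second segment
 * file, so poking at offset 5*bs in the first file no longer hits a
 * valid page - which is why corrupt_page_checksum() would need to
 * learn about segments for such configurations.
 */
static uint32_t
segment_for_block(uint32_t blkno, uint32_t segsize_blocks)
{
	return blkno / segsize_blocks;
}
```

With 6 blocks per segment all five corrupted offsets stay inside segment 0, matching the observation that -Dsegsize_blocks=6 passes.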
Re: Add connection active, idle time to pg_stat_activity
On Tue, Nov 8, 2022 at 6:56 PM Andres Freund wrote:
>
> Separately from that, I'm a bit worried about starting to add accumulative
> counters to pg_stat_activity. It's already gotten hard to use interactively
> due to the number of columns - and why stop with the columns you suggest?
> Why not show e.g. the total number of reads/writes, tuples inserted / deleted,
> etc. as well?
>
> I wonder if we shouldn't add a pg_stat_session or such for per-connection
> counters that show not the current state, but accumulated per-session
> state.

I would much rather go down this route than make the existing table wider.

pg_stat_activity_state_duration (this patch) [the table - for a given backend - would be empty if track_activities is off]
pg_stat_activity_bandwidth_usage (if someone feels like implementing the other items you mention)

I'm not really buying into the idea of having multiple states sum their times together. I would expect one column per state. Actually two, because I also suggest that not only is the duration recorded, but a counter be incremented each time a given state becomes the currently active state. Seems like having access to a divisor of some form may be useful. So 10 columns of data plus pid to join back to pg_stat_activity proper.

David J.
Re: Add proper planner support for ORDER BY / DISTINCT aggregates
On Tue, 8 Nov 2022 at 19:51, Richard Guo wrote: > For unsorted paths, the original logic here is to explicitly add a Sort > path only for the cheapest-total path. This patch changes that and may > add a Sort path for other paths besides the cheapest-total path. I > think this may introduce in some unnecessary path candidates. Yeah, you're right. The patch shouldn't change that. I've adjusted the attached patch so that part works more like it did before. v2 attached. Thanks David diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 4c6b1d1f55..fe8573d92c 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -1959,8 +1959,8 @@ cost_incremental_sort(Path *path, double input_tuples, int width, Cost comparison_cost, int sort_mem, double limit_tuples) { - Coststartup_cost = 0, - run_cost = 0, + Coststartup_cost, + run_cost, input_run_cost = input_total_cost - input_startup_cost; double group_tuples, input_groups; @@ -1969,10 +1969,9 @@ cost_incremental_sort(Path *path, group_input_run_cost; List *presortedExprs = NIL; ListCell *l; - int i = 0; boolunknown_varno = false; - Assert(presorted_keys != 0); + Assert(presorted_keys > 0 && presorted_keys < list_length(pathkeys)); /* * We want to be sure the cost of a sort is never estimated as zero, even @@ -2025,12 +2024,11 @@ cost_incremental_sort(Path *path, /* expression not containing any Vars with "varno 0" */ presortedExprs = lappend(presortedExprs, member->em_expr); - i++; - if (i >= presorted_keys) + if (foreach_current_index(l) + 1 >= presorted_keys) break; } - /* Estimate number of groups with equal presorted keys. */ + /* Estimate the number of groups with equal presorted keys. 
*/ if (!unknown_varno) input_groups = estimate_num_groups(root, presortedExprs, input_tuples, NULL, NULL); @@ -2039,22 +2037,19 @@ cost_incremental_sort(Path *path, group_input_run_cost = input_run_cost / input_groups; /* -* Estimate average cost of sorting of one group where presorted keys are -* equal. Incremental sort is sensitive to distribution of tuples to the -* groups, where we're relying on quite rough assumptions. Thus, we're -* pessimistic about incremental sort performance and increase its average -* group size by half. +* Estimate the average cost of sorting of one group where presorted keys +* are equal. */ cost_tuplesort(_startup_cost, _run_cost, - 1.5 * group_tuples, width, comparison_cost, sort_mem, + group_tuples, width, comparison_cost, sort_mem, limit_tuples); /* * Startup cost of incremental sort is the startup cost of its first group * plus the cost of its input. */ - startup_cost += group_startup_cost - + input_startup_cost + group_input_run_cost; + startup_cost = group_startup_cost + input_startup_cost + + group_input_run_cost; /* * After we started producing tuples from the first group, the cost of @@ -2062,17 +2057,20 @@ cost_incremental_sort(Path *path, * group, plus the total cost to process the remaining groups, plus the * remaining cost of input. */ - run_cost += group_run_cost - + (group_run_cost + group_startup_cost) * (input_groups - 1) - + group_input_run_cost * (input_groups - 1); + run_cost = group_run_cost + (group_run_cost + group_startup_cost) * + (input_groups - 1) + group_input_run_cost * (input_groups - 1); /* * Incremental sort adds some overhead by itself. Firstly, it has to * detect the sort groups. This is roughly equal to one extra copy and -* comparison per tuple. Secondly, it has to reset the tuplesort context -* for every group. +* comparison per tuple. 
*/ run_cost += (cpu_tuple_cost + comparison_cost) * input_tuples; + + /* +* Additionally, we charge double cpu_tuple_cost for each input group to +* account for the tuplesort_reset that's performed after each group. +*/ run_cost += 2.0 * cpu_tuple_cost *
Re: Add connection active, idle time to pg_stat_activity
Hi, On 2022-07-21 18:22:51 +0200, Sergey Dudoladov wrote: > From b5298301a3f5223bd78c519ddcddbd1bec9cf000 Mon Sep 17 00:00:00 2001 > From: Sergey Dudoladov > Date: Wed, 20 Apr 2022 23:47:37 +0200 > Subject: [PATCH] pg_stat_activity: add 'total_active_time' and > 'total_idle_in_transaction_time' > > catversion bump because of the change in the contents of pg_stat_activity > > Author: Sergey Dudoladov, based on the initial version by Rafia Sabih > > Reviewed-by: Aleksander Alekseev, Bertrand Drouvot, and Atsushi Torikoshi > > Discussion: > https://www.postgresql.org/message-id/flat/CA%2BFpmFcJF0vwi-SWW0wYO-c-FbhyawLq4tCpRDCJJ8Bq%3Dja-gA%40mail.gmail.com Isn't this patch breaking pg_stat_database? You removed pgstat_count_conn_active_time() etc and the declaration for pgStatActiveTime / pgStatTransactionIdleTime (but left the definition in pgstat_database.c), but didn't replace it with anything afaics. Separately from that, I'm a bit worried about starting to add accumulative counters to pg_stat_activity. It's already gotten hard to use interactively due to the number of columns - and why stop with the columns you suggest? Why not show e.g. the total number of reads/writes, tuples inserted / deleted, etc. as well? I wonder if we shouldn't add a pg_stat_session or such for per-connection counters that show not the current state, but accumulated per-session state. Greetings, Andres Freund
Re: Slow standby snapshot
On Wed, Nov 9, 2022 at 1:54 PM Andres Freund wrote: > On 2022-11-09 11:42:36 +1300, Thomas Munro wrote: > > On Sat, Sep 17, 2022 at 6:27 PM Simon Riggs > > wrote: > > > I've cleaned up the comments and used a #define for the constant. > > > > > > IMHO this is ready for commit. I will add it to the next CF. > > > > FYI This had many successful cfbot runs but today it blew up on > > Windows when the assertion TransactionIdPrecedesOrEquals(safeXid, > > snap->xmin) failed: > > > > https://api.cirrus-ci.com/v1/artifact/task/5311549010083840/crashlog/crashlog-postgres.exe_1c40_2022-11-08_00-20-28-110.txt > > I don't immediately see how that could be connected to this patch - afaict > that crash wasn't during recovery, and the modified functions should only be > active during hot standby. Indeed, sorry for the noise.
Re: Cygwin cleanup
On Thu, Oct 20, 2022 at 10:40:40PM -0500, Justin Pryzby wrote: > On Thu, Aug 04, 2022 at 04:16:06PM +1200, Thomas Munro wrote: > > On Thu, Aug 4, 2022 at 3:38 PM Justin Pryzby wrote: > > > [train wreck] > > > > Oh my, so I'm getting the impression we might actually be totally > > unstable on Cygwin. Which surprises me because ... wait a minute ... > > lorikeet isn't even running most of the tests. So... we don't really > > know the degree to which any of this works at all? > > Right. > > Maybe it's of limited interest, but .. > > This updates the patch to build and test with meson. > Which first requires patching some meson.builds. > I guess that's needed for some current BF members, too. > Unfortunately, ccache+PCH causes gcc to crash :( Resending with the 'only-if' line commented (doh). And some fixes to 001 as Andres pointed out on another thread. -- Justin >From 2741472080eceac5cb6d002c39eaf319d7f72b50 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Fri, 30 Sep 2022 13:39:43 -0500 Subject: [PATCH 1/3] meson: other fixes for cygwin XXX: what about HAVE_BUGGY_STRTOF ? 
See: 20221021034040.gt16...@telsasoft.com --- meson.build | 8 ++-- src/port/meson.build | 4 src/test/regress/meson.build | 2 ++ 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/meson.build b/meson.build index ce2f223a409..ed24370672a 100644 --- a/meson.build +++ b/meson.build @@ -211,6 +211,10 @@ if host_system == 'aix' elif host_system == 'cygwin' cppflags += '-D_GNU_SOURCE' + dlsuffix = '.dll' + mod_link_args_fmt = ['@0@'] + mod_link_with_name = 'lib@0@.exe.a' + mod_link_with_dir = 'libdir' elif host_system == 'darwin' dlsuffix = '.dylib' @@ -2301,8 +2305,8 @@ gnugetopt_dep = cc.find_library('gnugetopt', required: false) # (i.e., allow '-' as a flag character), so use our version on those platforms # - We want to use system's getopt_long() only if the system provides struct # option -always_replace_getopt = host_system in ['windows', 'openbsd', 'solaris'] -always_replace_getopt_long = host_system == 'windows' or not cdata.has('HAVE_STRUCT_OPTION') +always_replace_getopt = host_system in ['windows', 'cygwin', 'openbsd', 'solaris'] +always_replace_getopt_long = host_system in ['windows', 'cygwin'] or not cdata.has('HAVE_STRUCT_OPTION') # Required on BSDs execinfo_dep = cc.find_library('execinfo', required: false) diff --git a/src/port/meson.build b/src/port/meson.build index c696f1b..0ba83cc7930 100644 --- a/src/port/meson.build +++ b/src/port/meson.build @@ -40,6 +40,10 @@ if host_system == 'windows' 'win32setlocale.c', 'win32stat.c', ) +elif host_system == 'cygwin' + pgport_sources += files( +'dirmod.c', + ) endif if cc.get_id() == 'msvc' diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build index f1adcd9198c..72a23737fa7 100644 --- a/src/test/regress/meson.build +++ b/src/test/regress/meson.build @@ -12,6 +12,8 @@ regress_sources = pg_regress_c + files( host_tuple_cc = cc.get_id() if host_system == 'windows' and host_tuple_cc == 'gcc' host_tuple_cc = 'mingw' +elif host_system == 'cygwin' and host_tuple_cc == 'gcc' + host_tuple_cc 
= 'cygwin' endif host_tuple = '@0@-@1@-@2@'.format(host_cpu, host_system, host_tuple_cc) -- 2.25.1 >From 8f31be4d0bf036df890e32568dbc056c36fd57c5 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 25 Jul 2022 23:05:10 +1200 Subject: [PATCH 2/3] WIP CI support for Cygwin. ci-os-only: cygwin See also: d8e78714-dc77-4a64-783f-e863ba4d9...@2ndquadrant.com https://cirrus-ci.com/task/5145086722834432 XXX This should use a canned Docker image with all the right packages installed? But if the larger image is slower to start, then maybe not... --- .cirrus.yml | 76 +++ configure | 2 +- configure.ac | 2 +- src/test/perl/PostgreSQL/Test/Cluster.pm | 4 +- src/test/perl/PostgreSQL/Test/Utils.pm| 12 +++- src/test/recovery/t/020_archive_status.pl | 2 +- src/tools/ci/cores_backtrace.sh | 33 +- src/tools/ci/pg_ci_base.conf | 2 + 8 files changed, 122 insertions(+), 11 deletions(-) diff --git a/.cirrus.yml b/.cirrus.yml index 9f2282471a9..02b0f3b7045 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -464,6 +464,82 @@ task: type: text/plain +task: + name: Windows - Cygwin + only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*' + #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*' + timeout_in: 90m + + env: +CPUS: 4 +BUILD_JOBS: 4 +TEST_JOBS: 1 +CCACHE_DIR: /tmp/ccache +CCACHE_LOGFILE: ccache.log +CONFIGURE_FLAGS: --enable-cassert --enable-debug --with-ldap --with-ssl=openssl --with-libxml +# --enable-tap-tests +# --disable-dynamicbase +# --with-gssapi +CONFIGURE_CACHE: /tmp/ccache/configure.cache +PG_TEST_USE_UNIX_SOCKETS:
Re: User functions for building SCRAM secrets
On 11/8/22 12:26, Peter Eisentraut wrote: > On 04.11.22 21:39, Jacob Champion wrote: >> I don't think it's helpful for me to try to block progress on this >> patchset behind the other one. But is there a way for me to help this >> proposal skate in the same general direction? Could Peter's encryption >> framework expand to fit this case in the future? > > We already have support in libpq for doing this (PQencryptPasswordConn()). Sure, but you can't access that in SQL, right? The hand-wavy part is to combine that existing function with your transparent encryption proposal, as a special-case encryptor whose output could be bound to the query. But I guess that wouldn't really help with ALTER ROLE ... PASSWORD, because you can't parameterize it. Hm... --Jacob
Re: Slow standby snapshot
Hi, On 2022-11-09 11:42:36 +1300, Thomas Munro wrote: > On Sat, Sep 17, 2022 at 6:27 PM Simon Riggs > wrote: > > I've cleaned up the comments and used a #define for the constant. > > > > IMHO this is ready for commit. I will add it to the next CF. > > FYI This had many successful cfbot runs but today it blew up on > Windows when the assertion TransactionIdPrecedesOrEquals(safeXid, > snap->xmin) failed: > > https://api.cirrus-ci.com/v1/artifact/task/5311549010083840/crashlog/crashlog-postgres.exe_1c40_2022-11-08_00-20-28-110.txt I don't immediately see how that could be connected to this patch - afaict that crash wasn't during recovery, and the modified functions should only be active during hot standby. Greetings, Andres Freund
Re: security_context_t marked as deprecated in libselinux 3.1
On Fri, Nov 04, 2022 at 08:49:24AM +0900, Michael Paquier wrote: > In line of ad96696, seems like that it would make sense to do the same > here even if the bar is lower. sepgsql has not changed in years, so I > suspect few conflicts. Alvaro, if you want to take care of that, > that's fine by me. I could do it, but not before next week. I got to look at that, now that the minor releases have been tagged, and the change has no conflicts down to 9.3. 9.2 needed a slight tweak, though, but it seemed fine as well. (Tested the build on all branches.) So done all the way down, sticking with the new no-warning policy for all the buildable branches. -- Michael
Re: Allow file inclusion in pg_hba and pg_ident files
On Tue, Nov 08, 2022 at 10:04:16AM +0900, Michael Paquier wrote: > CF bot unhappy as I have messed up with rules.out. Rebased. I have > removed the restriction on MAXPGPATH in AbsoluteConfigLocation() in > 0001, while on it. The absolute paths built on GUC or ident > inclusions are the same. Rebased after 6bbd8b7g that is an equivalent of the previous 0001. Julien, please note that this is waiting on author for now. What do you think about the now-named v18-0001 and the addition of an ErrorContextCallback to provide more information about the list of included files on error? -- Michael From 50441282e6b298ee737e6e1a490d77baedfe5d69 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Mon, 7 Nov 2022 13:35:44 +0900 Subject: [PATCH v18 1/2] Invent open_auth_file() in hba.c, to refactor auth file opening This adds a check on the recursion depth when including auth files, something that has never been done when processing '@' files for database and user name lists in pg_hba.conf. --- src/include/libpq/hba.h | 4 +- src/backend/libpq/hba.c | 100 ++- src/backend/utils/adt/hbafuncs.c | 18 ++ 3 files changed, 79 insertions(+), 43 deletions(-) diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h index 7ad227d34a..a84a5f0961 100644 --- a/src/include/libpq/hba.h +++ b/src/include/libpq/hba.h @@ -177,7 +177,9 @@ extern int check_usermap(const char *usermap_name, extern HbaLine *parse_hba_line(TokenizedAuthLine *tok_line, int elevel); extern IdentLine *parse_ident_line(TokenizedAuthLine *tok_line, int elevel); extern bool pg_isblank(const char c); +extern FILE *open_auth_file(const char *filename, int elevel, int depth, + char **err_msg); extern MemoryContext tokenize_auth_file(const char *filename, FILE *file, - List **tok_lines, int elevel); + List **tok_lines, int elevel, int depth); #endif /* HBA_H */ diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c index a9f87ab5bf..d8c0b585e5 100644 --- a/src/backend/libpq/hba.c +++ b/src/backend/libpq/hba.c @@ 
-117,7 +117,8 @@ static const char *const UserAuthName[] = static List *tokenize_inc_file(List *tokens, const char *outer_filename, - const char *inc_filename, int elevel, char **err_msg); + const char *inc_filename, int elevel, + int depth, char **err_msg); static bool parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline, int elevel, char **err_msg); static int regcomp_auth_token(AuthToken *token, char *filename, int line_num, @@ -414,7 +415,7 @@ regexec_auth_token(const char *match, AuthToken *token, size_t nmatch, */ static List * next_field_expand(const char *filename, char **lineptr, - int elevel, char **err_msg) + int elevel, int depth, char **err_msg) { char buf[MAX_TOKEN]; bool trailing_comma; @@ -431,7 +432,7 @@ next_field_expand(const char *filename, char **lineptr, /* Is this referencing a file? */ if (!initial_quote && buf[0] == '@' && buf[1] != '\0') tokens = tokenize_inc_file(tokens, filename, buf + 1, - elevel, err_msg); + elevel, depth + 1, err_msg); else tokens = lappend(tokens, make_auth_token(buf, initial_quote)); } while (trailing_comma && (*err_msg == NULL)); @@ -459,6 +460,7 @@ tokenize_inc_file(List *tokens, const char *outer_filename, const char *inc_filename, int elevel, + int depth, char **err_msg) { char *inc_fullname; @@ -468,24 +470,18 @@ tokenize_inc_file(List *tokens, MemoryContext linecxt; inc_fullname = AbsoluteConfigLocation(inc_filename, outer_filename); + inc_file = open_auth_file(inc_fullname, elevel, depth, err_msg); - inc_file = AllocateFile(inc_fullname, "r"); if (inc_file == NULL) { - int save_errno = errno; - - ereport(elevel, -(errcode_for_file_access(), - errmsg("could not open secondary authentication file \"@%s\" as \"%s\": %m", - inc_filename, inc_fullname))); - *err_msg = psprintf("could not open secondary authentication file \"@%s\" as \"%s\": %s", - inc_filename, inc_fullname, strerror(save_errno)); + /* error already logged */ pfree(inc_fullname); return tokens; } /* There is possible recursion here if the 
file contains @ */ - linecxt = tokenize_auth_file(inc_fullname, inc_file, _lines, elevel); + linecxt = tokenize_auth_file(inc_fullname, inc_file, _lines, elevel, + depth); FreeFile(inc_file); pfree(inc_fullname); @@ -521,6 +517,59 @@ tokenize_inc_file(List *tokens, return tokens; } +/* + * open_auth_file + * Open the given file. + * + * filename: the absolute path to the target file + * elevel: message logging level + * depth: recursion level of the file opened. + * err_msg: details about the error. + * + * Return value is the opened file. On error, returns NULL with details + * about the error stored in "err_msg". + */ +FILE * +open_auth_file(const char *filename, int elevel,
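The recursion-depth guard this refactoring introduces for '@' file inclusion can be illustrated with a tiny standalone sketch (Python with hypothetical names — the real code is open_auth_file()/tokenize_auth_file() in hba.c, and the dict-based "filesystem" here is purely for demonstration):

```python
# Toy model of depth-limited '@file' inclusion in auth-file tokenization.
# MAX_DEPTH is an illustrative constant, not the value used by the patch.

MAX_DEPTH = 10

def tokenize(filename, files, depth=0):
    if depth > MAX_DEPTH:
        raise ValueError('could not open file "%s": maximum nesting depth '
                         'exceeded' % filename)
    tokens = []
    for tok in files[filename].split():
        if tok.startswith("@"):
            # Possible recursion here if the included file contains '@'.
            tokens.extend(tokenize(tok[1:], files, depth + 1))
        else:
            tokens.append(tok)
    return tokens
```

Without the depth check, a file that includes itself would recurse without bound.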
Re: EINTR in ftruncate()
On Wed, Aug 17, 2022 at 7:51 AM Thomas Munro wrote: > On Sat, Jul 16, 2022 at 5:18 PM Thomas Munro wrote: > > On Sat, Jul 16, 2022 at 1:28 AM Tom Lane wrote: > > > Thomas Munro writes: > > > > On Fri, Jul 15, 2022 at 9:34 AM Tom Lane wrote: > > > >> (Someday we oughta go ahead and make our Windows signal API look more > > > >> like POSIX, as I suggested back in 2015. I'm still not taking > > > >> point on that, though.) > > > > > > > For the sigprocmask() part, here's a patch that passes CI. Only the > > > > SIG_SETMASK case is actually exercised by our current code, though. > > > > > > Passes an eyeball check, but I can't actually test it. > > > > Thanks. Pushed. > > > > I'm not brave enough to try to write a replacement sigaction() yet, > > but it does appear that we could rip more ugliness and inconsistencies > > that way, eg sa_mask. > > Here's a draft patch that adds a minimal sigaction() implementation > for Windows. It doesn't implement stuff we're not using, for sample > sa_sigaction functions, but it has the sa_mask feature so we can > harmonize the stuff that I believe you were talking about. Pushed. As discussed before, a much better idea would probably be to use latch-based wakeup instead of putting postmaster logic in signal handlers, but in the meantime this gets rid of the extra Windows-specific noise.
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Enclosed is v6, which squashes your refactor and adds the additional recent suggestions. Thanks! v6-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch Description: Binary data
Re: Crash in BRIN minmax-multi indexes
On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote: > I'll get this fixed in a couple days. Considering the benign nature of > this issue (unnecessary assert) I'm not going to rush. Is this still an outstanding issue ? -- Justin
Re: psql: Add command to use extended query protocol
> > > Btw., this also allows doing things like > > SELECT $1, $2 > \bind '1' '2' \g > \bind '3' '4' \g > That's one of the things I was hoping for. Very cool. > > This isn't a prepared statement being reused, but it relies on the fact > that psql \g with an empty query buffer resends the previous query. > Still kind of neat. Yeah, if they wanted a prepared statement there's nothing stopping them. Review: Patch applies, tests pass. Code is quite straightforward. As for the docs, they're very clear and probably sufficient as-is, but I wonder if we should explicitly state that the bind-state and bind parameters do not "stay around" after the query is executed? Suggestions in bold: This command causes the extended query protocol (see ) to be used, unlike normal psql operation, which uses the simple query protocol. *Extended query protocol will be used even if no parameters are specified*, so this command can be useful to test the extended query protocol from psql. *This command affects only the next query executed; all subsequent queries will use the regular query protocol by default.* Tests seem comprehensive. I went looking for the TAP test that this would have replaced, but found none, and it seems the only test where we exercise PQsendQueryParams is libpq_pipeline.c, so these tests are a welcome addition. Aside from the possible doc change, it looks ready to go.
Re: Commit fest 2022-11
On Thu, Nov 03, 2022 at 07:43:03PM -0500, Justin Pryzby wrote: > On Thu, Nov 03, 2022 at 11:33:57AM +0900, Ian Lawrence Barwick wrote: > > 2022年11月2日(水) 19:10 Greg Stark : > > > On Tue, 1 Nov 2022 at 06:56, Michael Paquier wrote: > > > > > > > Two people showing up to help is really great, thanks! I'll be around > > > > as well this month, so I'll do my share of patches, as usual. > > > > > > Fwiw I can help as well -- starting next week. I can't do much this week > > > though. > > > > > > I would suggest starting with the cfbot to mark anything that isn't > > > applying cleanly and passing tests (and looking for more than design > > > feedback) as Waiting on Author and reminding the author that it's > > > commitfest time and a good time to bring the patch into a clean state. > > > > Sounds like a plan; I'll make a start on that today/tomorrow as I have > > some time. > > If I'm not wrong, Jacob used the CF app to bulk-mail people about > patches not applying and similar things. That seemed to work well, and > doesn't require sending mails to dozens of threads. If my script is not wrong, these patches add TAP tests, but don't update the requisite ./meson.build file. It seems like it'd be reasonable to set them all as WOA until that's done. $ for a in `git branch -a |sort |grep commitfest/40`; do : echo "$a..."; x=`git log -1 --compact-summary "$a"`; echo "$x" |grep '/t/.*pl.*new' >/dev/null || continue; echo "$x" |grep -Fw meson >/dev/null && continue; git log -1 --oneline "$a"; done ... [CF 40/3558] Allow file inclusion in pg_hba and pg_ident files ... [CF 40/3628] Teach pg_waldump to extract FPIs from the WAL stream ... [CF 40/3646] Skip replicating the tables specified in except table option ... [CF 40/3663] Switching XLog source from archive to streaming when primary available ... [CF 40/3670] pg_rewind: warn when checkpoint hasn't happened after promotion ... [CF 40/3729] Testing autovacuum wraparound ... 
[CF 40/3877] vacuumlo: add test to vacuumlo for test coverage ... [CF 40/3985] TDE key management patches -- Justin
Re: libpq support for NegotiateProtocolVersion
On 11/8/22 00:40, Peter Eisentraut wrote: > On 02.11.22 20:02, Jacob Champion wrote: >> This new code path doesn't go through the message length checks that are >> done for the 'R' and 'E' cases, and pqGetNegotiateProtocolVersion3() >> doesn't take the message length to know where to stop anyway, so a >> misbehaving server can chew up client resources. > > Fixed in new patch. pqGetNegotiateProtocolVersion3() is still ignoring the message length, though; it won't necessarily stop at the message boundary. > We could add negotiation in the future, but then we'd have to first have > a concrete case of something to negotiate about. For example, if we > added an optional performance feature into the protocol, then one could > negotiate by falling back to not using that. But for the kinds of > features I'm thinking about right now (column encryption), you can't > proceed if the feature is not supported. So I think this would need to > be considered case by case. I guess I'm wondering about the definition of "minor" version if the client treats an increment as incompatible by default. But that's a discussion for the future, and this patch is just improving the existing behavior, so I'll pipe down and watch. >> I think the documentation on NegotiateProtocolVersion (not introduced in >> this patch) is misleading/wrong; it says that the version number sent >> back is the "newest minor protocol version supported by the server for >> the major protocol version requested by the client" which doesn't seem >> to match the actual usage seen here. > > I don't follow. If libpq sends a protocol version of 3.1, then the > server responds by saying it supports only 3.0. What are you seeing? I see what you've described on my end, too. The sentence I quoted seemed to imply that the server should respond with only the minor version (the least significant 16 bits). I think it should probably just say "newest protocol version" in the docs. Thanks, --Jacob
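As an aside on the message-length point above, the general shape of boundary-respecting parsing is easy to sketch standalone (Python, with an illustrative wire layout — this is not the actual libpq code or the exact NegotiateProtocolVersion framing, just the principle of never reading past the declared length):

```python
# Sketch: parse a length-prefixed message and refuse to scan beyond the
# declared boundary, so a misbehaving server can't make the client walk
# off the end of the message.  Layout here (hypothetical): int32 length
# (self-inclusive), int32 version, int32 option count, NUL-terminated
# option names.
import struct

def parse_negotiate(buf):
    (length,) = struct.unpack_from("!i", buf, 0)
    end = length                       # message boundary
    version, count = struct.unpack_from("!ii", buf, 4)
    options, pos = [], 12
    for _ in range(count):
        nul = buf.index(b"\x00", pos, end)   # bounded search only
        options.append(buf[pos:nul].decode())
        pos = nul + 1
    if pos > end:
        raise ValueError("options ran past declared message length")
    return version, options
```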
Re: WIN32 pg_import_system_collations
On Mon, Nov 7, 2022 at 4:08 PM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > This looks pretty good to me. The refactoring of the non-Windows parts > makes sense. The Windows parts look reasonable on manual inspection, > but again, I don't have access to Windows here, so someone else should > also look it over. > > I was going to say that at least it is getting tested on the CI, but I have found out that meson changes version(). That is fixed in this version. > A small style issue: Change return (TRUE) to return TRUE. > > Fixed. > The code > > + if (strlen(localebuf) == 5 && localebuf[2] == '-') > > might be too specific. At least on some POSIX systems, I have seen > locales with a three-letter language name. Maybe you should look with > strchr() and not be too strict about the exact position. > > Ok, in this version the POSIX alias is created unconditionally. > For the test patch, why is a separate test for non-UTF8 needed on > Windows. Does the UTF8 one not work? > > Windows locales will retain their CP_ACP encoding unless you change the OS code page to UTF8, which is still experimental [1]. > + version() !~ 'Visual C\+\+' > > This probably won't work for MinGW. > > When I proposed this patch it wouldn't have worked because of the project's Windows minimum version requirement; now it should work in MinGW. It actually doesn't because most locales are failing with "skipping locale with unrecognized encoding", but checking what's wrong with pg_get_encoding_from_locale() in MinGW is subject for another thread. [1] https://stackoverflow.com/questions/56419639/what-does-beta-use-unicode-utf-8-for-worldwide-language-support-actually-do Regards, Juan José Santamaría Flecha v8-0001-WIN32-pg_import_system_collations.patch Description: Binary data v8-0002-Add-collate.windows.win1252-test.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Tue, Nov 8, 2022 at 4:45 PM Justin Pryzby wrote: > > On Mon, Nov 07, 2022 at 05:01:01PM -0600, David Christensen wrote: > > Hi Justin et al, > > > > Enclosed is v5 of this patch which now passes the CirrusCI checks for > > all supported OSes. I went ahead and reworked the test a bit so it's a > > little more amenable to the OS-agnostic approach for testing. > > Great, thanks. > > This includes the changes that I'd started a few months ago. > Plus adding the test which was missing for meson. Cool, will review, thanks. > +format: > LSN.TSOID.DBOID.RELNODE.BLKNO > > I'd prefer if the abbreviations were "reltablespace" and "datoid" Sure, no issues there. > Also, should the test case call pg_relation_filenode() rather than using > relfilenode directly ? Is it a problem that the test code assumes > pagesize=8192 ? Both good points. Is pagesize just exposed via `current_setting('block_size')` or is there a different approach? David
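For reference, both values are exposed at the SQL level by standard functions; a quick sketch of how the test might query them (table name illustrative):

```sql
-- Block size as configured on the server (typically '8192'):
SELECT current_setting('block_size');

-- The relation's current filenode; NULL for relations without storage:
SELECT pg_relation_filenode('some_table');
```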
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Nov 07, 2022 at 05:01:01PM -0600, David Christensen wrote: > Hi Justin et al, > > Enclosed is v5 of this patch which now passes the CirrusCI checks for > all supported OSes. I went ahead and reworked the test a bit so it's a > little more amenable to the OS-agnostic approach for testing. Great, thanks. This includes the changes that I'd started a few months ago. Plus adding the test which was missing for meson. +format: LSN.TSOID.DBOID.RELNODE.BLKNO I'd prefer if the abbreviations were "reltablespace" and "datoid" Also, should the test case call pg_relation_filenode() rather than using relfilenode directly ? Is it a problem that the test code assumes pagesize=8192 ? -- Justin >From d4cb23bc2f0edcc65b0bef75fd2e8d6e5aeda21f Mon Sep 17 00:00:00 2001 From: David Christensen Date: Wed, 20 Apr 2022 19:59:35 -0500 Subject: [PATCH 1/2] Teach pg_waldump to extract FPIs from the WAL stream Extracts full-page images from the WAL stream into a target directory, which must be empty or not exist. These images are subject to the same filtering rules as normal display in pg_waldump, which means that you can isolate the full page writes to a target relation, among other things. Files are saved with the filename: with formatting to make things somewhat sortable; for instance: -01C0.1663.1.6117.0 -01000150.1664.0.6115.0 -010001E0.1664.0.6114.0 -01000270.1663.1.6116.0 -01000300.1663.1.6113.0 -01000390.1663.1.6112.0 -01000420.1663.1.8903.0 -010004B0.1663.1.8902.0 -01000540.1663.1.6111.0 -010005D0.1663.1.6110.0 If the FPI comes from a fork other than the main fork, the fork name will be appended on the output file name; e.g.: -014A4758.1663.1.12864.0_vm It's noteworthy that the raw block images do not have the current LSN stored with them in the WAL stream (as would be true for on-heap versions of the blocks), nor would the checksum be updated in them (though WAL itself has checksums, so there is some protection there). 
As such there are two versions of this functionality, one which returns the raw page as it appears in the WAL (--save-fpi) and one which applies the updated pd_lsn and pd_checksum (--fixup-fpi). These images could be loaded/inspected via `pg_read_binary_file()` and used in the `pageinspect` suite of tools to perform detailed analysis on the pages in question, based on historical information, and may come in handy for forensics work. --- doc/src/sgml/ref/pg_waldump.sgml | 79 +++ src/bin/pg_waldump/pg_waldump.c | 151 +- src/bin/pg_waldump/t/002_save_fullpage.pl | 137 3 files changed, 366 insertions(+), 1 deletion(-) create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml index d559f091e5f..cc3ce918905 100644 --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -240,6 +240,85 @@ PostgreSQL documentation + + -W save_path + --save-fpi=save_path + -X save_path + --fixup-fpi=save_path + + +Save full page images seen in the WAL stream to the +save_path directory, which will be created +if it does not exist. The images saved will be subject to the same +filtering and limiting criteria as display records, but in this +mode pg_waldump will not output any other +information. + + +If invoked using +the -X/--fixup-fpi +option, this page image will include the pd_lsn of +the replayed record rather than the raw page image; as well, +the pd_checksum field will be updated if it had +previously existed. 
+ + +The page images will be saved with the file +format: LSN.TSOID.DBOID.RELNODE.BLKNOFORK + +The dot-separated components are (in order): + + + + + +Component +Description + + + + + +LSN +The LSN of the record with this block, formatted + as two 8-character hexadecimal numbers %08X-%08X + + + +TSOID +tablespace OID for the block + + + +DBOID +database OID for the block + + + +RELNODE +relnode id for the block + + + +BLKNO +the block number of this block + + + +FORK + + if coming from the main fork, will be empty, otherwise will be + one of _fsm, _vm, + or _init. + + + + + + + + + -x xid --xid=xid diff --git
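The dotted filename format documented above lends itself to simple scripting. As a quick illustration (a hedged sketch only, following the component table in the patch's documentation — not part of the patch itself):

```python
# Parse a pg_waldump --save-fpi output filename of the form
# LSN.TSOID.DBOID.RELNODE.BLKNO with an optional _fsm/_vm/_init fork
# suffix, per the documentation above.  Purely illustrative.

def parse_fpi_name(name):
    fork = "main"
    for suffix in ("_fsm", "_vm", "_init"):
        if name.endswith(suffix):
            fork = suffix[1:]
            name = name[: -len(suffix)]
            break
    lsn, tsoid, dboid, relnode, blkno = name.split(".")
    return {"lsn": lsn, "tablespace": int(tsoid), "database": int(dboid),
            "relfilenode": int(relnode), "block": int(blkno), "fork": fork}
```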
Re: Slow standby snapshot
On Sat, Sep 17, 2022 at 6:27 PM Simon Riggs wrote: > I've cleaned up the comments and used a #define for the constant. > > IMHO this is ready for commit. I will add it to the next CF. FYI This had many successful cfbot runs but today it blew up on Windows when the assertion TransactionIdPrecedesOrEquals(safeXid, snap->xmin) failed: https://api.cirrus-ci.com/v1/artifact/task/5311549010083840/crashlog/crashlog-postgres.exe_1c40_2022-11-08_00-20-28-110.txt
Re: Suppressing useless wakeups in walreceiver
On Tue, Nov 08, 2022 at 09:45:40PM +1300, Thomas Munro wrote: > On Tue, Nov 8, 2022 at 9:20 PM Bharath Rupireddy > wrote: >> Thanks. Do we need a similar wakeup approach for logical replication >> workers in worker.c? Or is it okay that the nap time is 1sec there? > > Yeah, I think so. At least for its idle/nap case (waiting for flush > is also a technically interesting case, but harder, and applies to > non-idle systems so the polling is a little less offensive). Bharath, are you planning to pick this up? If not, I can probably spend some time on it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Suppressing useless wakeups in walreceiver
On Tue, Nov 08, 2022 at 08:46:26PM +1300, Thomas Munro wrote: > On Tue, Nov 8, 2022 at 3:20 PM Thomas Munro wrote: >> This looks pretty good to me. Thanks for picking it up! I can live >> with the change to use a global variable; it seems we couldn't quite >> decide what the scope was for moving stuff into a struct (ie various >> pre-existing stuff representing the walreceiver's state), but I'm OK >> with considering that as a pure refactoring project for a separate >> thread. That array should be "static", though. > > And with that change and a pgindent, pushed. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Non-emergency patch for bug #17679
Andres Freund writes: > On 2022-11-08 11:28:08 -0500, Tom Lane wrote: >> Hence, the attached reverts everything 4ab5dae94 did to this function, >> and most of 0e758ae89 too, and instead makes IsBinaryUpgrade an >> additional reason to take the immediate-unlink path. > I wonder if it's worth aiming slightly higher. There's plenty duplicated code > between the first segment handling and the loop body. Perhaps the if at the > top just should decide whether to unlink the first segment or not, and we then > check that in the body of the loop for segno == 0? I don't care for that. I think the point here is precisely that we want behavior A for the first segment and behavior B for the remaining ones, and so I'd prefer to keep the code that does A and the code that does B distinct. It was a misguided attempt to share that code that got us into trouble here in the first place. Moreover, any future changes to either behavior will be that much harder if we combine the implementations. regards, tom lane
Re: Non-emergency patch for bug #17679
Hi, On 2022-11-08 11:28:08 -0500, Tom Lane wrote: > In the release team's discussion leading up to commit 0e758ae89, > Andres opined that what commit 4ab5dae94 had done to mdunlinkfork > was a mess, and I concur. It invented an entirely new code path > through that function, and required two different behaviors from the > segment-deletion loop. I think a very straight line can be drawn > between that extra complexity and the introduction of a nasty bug. > It's all unnecessary too, because AFAICS all we really need is to > apply the pre-existing behavior for temp tables and REDO mode > to binary-upgrade mode as well. I'm not sure I understand the current code. In the binary upgrade case we currently *do* truncate the file in the else of "Delete or truncate the first segment.", then again truncate it in the loop and then unlink it, right? > Hence, the attached reverts everything 4ab5dae94 did to this function, > and most of 0e758ae89 too, and instead makes IsBinaryUpgrade an > additional reason to take the immediate-unlink path. > > Barring objections, I'll push this after the release freeze lifts. I wonder if it's worth aiming slightly higher. There's plenty duplicated code between the first segment handling and the loop body. Perhaps the if at the top just should decide whether to unlink the first segment or not, and we then check that in the body of the loop for segno == 0? Greetings, Andres Freund
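The shape Andres is suggesting — decide once up front how segment zero should be treated, then branch on segno inside a single loop — might look roughly like this (Python pseudocode of the control flow only, not the actual md.c code; Tom's objection above is precisely that the two behaviors are better kept as separate code paths):

```python
# Control-flow sketch only: behavior A (unlink now) vs. behavior B
# (truncate, leaving unlink to the next checkpoint) for the first
# segment, with one shared loop over all segments.

def mdunlinkfork_sketch(segments, unlink_first_now):
    actions = []
    for segno, seg in enumerate(segments):
        if segno == 0 and not unlink_first_now:
            # Keep a zero-length first segment to guard against
            # relfilenumber reuse; it is unlinked later.
            actions.append(("truncate", seg))
        else:
            actions.append(("unlink", seg))
    return actions
```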
Re: User functions for building SCRAM secrets
On 04.11.22 21:39, Jacob Champion wrote:
> It seems to me that the use case here is extremely similar to the one
> being tackled by Peter E's client-side encryption [1]. People want to
> write SQL to perform a cryptographic operation using a secret, and then
> send the resulting ciphertext (or in this case, a one-way hash) to the
> server, but ideally the server should not actually have the secret.

It might be possible, but it's a bit of a reach. For instance, there are no keys and no decryption associated with this kind of operation.

> I don't think it's helpful for me to try to block progress on this
> patchset behind the other one. But is there a way for me to help this
> proposal skate in the same general direction? Could Peter's encryption
> framework expand to fit this case in the future?

We already have support in libpq for doing this (PQencryptPasswordConn()).
Re: [PATCHES] Post-special page storage TDE support
Looking into some CF bot failures which didn't show up locally. Will send a v3 when resolved.
Re: Introduce "log_connection_stages" setting.
Hi hackers, I've sketched an initial patch version; feedback is welcome. Regards, Sergey Dudoladov From be2e6b5c2d6fff1021f52f150b4d849dfbd26ec7 Mon Sep 17 00:00:00 2001 From: Sergey Dudoladov Date: Tue, 8 Nov 2022 18:56:26 +0100 Subject: [PATCH] Introduce 'log_connection_messages' This patch removes 'log_connections' and 'log_disconnections' in favor of 'log_connection_messages', thereby introducing incompatibility Author: Sergey Dudoladov Reviewed-by: Discussion: https://www.postgresql.org/message-id/flat/CAA8Fd-qCB96uwfgMKrzfNs32mqqysi53yZFNVaRNJ6xDthZEgA%40mail.gmail.com --- doc/src/sgml/config.sgml | 80 +++-- src/backend/commands/variable.c | 88 +++ src/backend/libpq/auth.c | 5 +- src/backend/postmaster/postmaster.c | 5 +- src/backend/tcop/postgres.c | 14 +-- src/backend/utils/init/postinit.c | 2 +- src/backend/utils/misc/guc_tables.c | 29 +++--- src/backend/utils/misc/postgresql.conf.sample | 5 +- src/include/postgres.h| 5 ++ src/include/postmaster/postmaster.h | 1 - src/include/utils/guc_hooks.h | 3 + src/test/authentication/t/001_password.pl | 2 +- src/test/kerberos/t/001_auth.pl | 2 +- src/test/ldap/t/001_auth.pl | 2 +- src/test/recovery/t/013_crash_restart.pl | 2 +- src/test/recovery/t/022_crash_temp_files.pl | 2 +- src/test/recovery/t/032_relfilenode_reuse.pl | 2 +- src/test/ssl/t/SSL/Server.pm | 2 +- src/tools/ci/pg_ci_base.conf | 3 +- 19 files changed, 182 insertions(+), 72 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 559eb898a9..ab9d118c12 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -140,7 +140,7 @@ An example of what this file might look like is: # This is a comment -log_connections = yes +log_connection_messages = all log_destination = 'syslog' search_path = '"$user", public' shared_buffers = 128MB @@ -335,7 +335,7 @@ UPDATE pg_settings SET setting = reset_val WHERE name = 'configuration_parameter passed to the postgres command via the -c command-line parameter. 
For example, -postgres -c log_connections=yes -c log_destination='syslog' +postgres -c log_connection_messages=all -c log_destination='syslog' Settings provided in this way override those set via postgresql.conf or ALTER SYSTEM, @@ -6978,23 +6978,65 @@ local0.*/var/log/postgresql - - log_connections (boolean) + + log_connection_messages (string) - log_connections configuration parameter + log_connection_messages configuration parameter -Causes each attempted connection to the server to be logged, -as well as successful completion of both client authentication (if -necessary) and authorization. +Causes stages of each attempted connection to the server to be logged. Example: authorized,disconnected. +The default is the empty string meaning nothing is logged. Only superusers and users with the appropriate SET privilege can change this parameter at session start, and it cannot be changed at all within a session. -The default is off. + + Connection messages + + + + + +Name +Description + + + + +received +Logs reception of a connection. At this point a connection has been received, but no further work has been done: +Postgres is about to start interacting with a client. + + + +authenticated +Logs the original identity that an authentication method employs to identify a user. In most cases, the identity string equals the PostgreSQL username, +but some third-party authentication methods may alter the original user identifier before the server stores it. Failed authentication is always logged regardless of the value of this setting. + + + +authorized +Logs successful completion of authorization. At this point the connection has been fully established. + + + +disconnected +Logs session termination. The log output provides information similar to authorized, +plus the duration of the session. + + + +all +A convenience alias. When present, must be the only value in the list. 
+ + + + + + Some client programs, like psql, attempt @@ -7006,26 +7048,6 @@ local0.*/var/log/postgresql - - log_disconnections (boolean) - - log_disconnections configuration parameter - - - - -Causes session terminations to
Re: Add connection active, idle time to pg_stat_activity
Hello hackers, Is there anything we can do to facilitate merging of this patch? It has been in the "ready-for-committer" state for 3 commitfests in a row now. We would appreciate it if the patch made it into version 16: the need to monitor idle-in-transaction connections is very real for us. Regards, Sergey Dudoladov
Out-of-date clang/llvm version lists in PGAC_LLVM_SUPPORT
I happened to notice that these lists of supported versions haven't been updated in a good long time: PGAC_PATH_PROGS(LLVM_CONFIG, llvm-config llvm-config-7 llvm-config-6.0 llvm-config-5.0 llvm-config-4.0 llvm-config-3.9) PGAC_PATH_PROGS(CLANG, clang clang-7 clang-6.0 clang-5.0 clang-4.0 clang-3.9) Given the lack of complaints, it seems likely that nobody is relying on these. Maybe we should just nuke them? If not, I suppose we better add 8 through 15. I may be missing it, but it doesn't look like meson.build has any equivalent lists. So that might be an argument for getting rid of the lists here? regards, tom lane
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Fri, Nov 4, 2022 at 8:25 AM Masahiko Sawada wrote: > For parallel heap pruning, multiple workers will insert key-value > pairs to the radix tree concurrently. The simplest solution would be a > single lock to protect writes but the performance will not be good. > Another solution would be that we can divide the tables into multiple > ranges so that keys derived from TIDs are not conflicted with each > other and have parallel workers process one or more ranges. That way, > parallel vacuum workers can build *sub-trees* and the leader process > can merge them. In use cases of lazy vacuum, since the write phase and > read phase are separated the readers don't need to worry about > concurrent updates. I think that the VM snapshot concept can eventually be used to implement parallel heap pruning. Since every page that will become a scanned_pages is known right from the start with VM snapshots, it will be relatively straightforward to partition these pages into distinct ranges with an equal number of pages, one per worker planned. The VM snapshot structure can also be used for I/O prefetching, which will be more important with parallel heap pruning (and with aio). Working off of an immutable structure that describes which pages to process right from the start is naturally easy to work with, in general. We can "reorder work" flexibly (i.e. process individual scanned_pages in any order that is convenient). Another example is "changing our mind" about advancing relfrozenxid when it turns out that we maybe should have decided to do that at the start of VACUUM [1]. Maybe the specific "changing our mind" idea will turn out to not be a very useful idea, but it is at least an interesting and thought provoking concept. [1] https://postgr.es/m/CAH2-WzkQ86yf==mgAF=cq0qelrwkx3htlw9qo+qx3zbwjjk...@mail.gmail.com -- Peter Geoghegan
Non-emergency patch for bug #17679
In the release team's discussion leading up to commit 0e758ae89, Andres opined that what commit 4ab5dae94 had done to mdunlinkfork was a mess, and I concur. It invented an entirely new code path through that function, and required two different behaviors from the segment-deletion loop. I think a very straight line can be drawn between that extra complexity and the introduction of a nasty bug. It's all unnecessary too, because AFAICS all we really need is to apply the pre-existing behavior for temp tables and REDO mode to binary-upgrade mode as well. Hence, the attached reverts everything 4ab5dae94 did to this function, and most of 0e758ae89 too, and instead makes IsBinaryUpgrade an additional reason to take the immediate-unlink path. Barring objections, I'll push this after the release freeze lifts. regards, tom lane diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index b1a2cb575d..0f642df3c1 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -263,6 +263,12 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo) * The fact that temp rels and regular rels have different file naming * patterns provides additional safety. * + * We also don't do it while performing a binary upgrade. There is no reuse + * hazard in that case, since after a crash or even a simple ERROR, the + * upgrade fails and the whole cluster must be recreated from scratch. + * Furthermore, it is important to remove the files from disk immediately, + * because we may be about to reuse the same relfilenumber. + * * All the above applies only to the relation's main fork; other forks can * just be removed immediately, since they are not needed to prevent the * relfilenumber from being recycled. 
Also, we do not carefully @@ -319,14 +325,15 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo) { char *path; int ret; - BlockNumber segno = 0; path = relpath(rlocator, forknum); /* - * Delete or truncate the first segment. + * Truncate and then unlink the first segment, or just register a request + * to unlink it later, as described in the comments for mdunlink(). */ - if (isRedo || forknum != MAIN_FORKNUM || RelFileLocatorBackendIsTemp(rlocator)) + if (isRedo || IsBinaryUpgrade || forknum != MAIN_FORKNUM || + RelFileLocatorBackendIsTemp(rlocator)) { if (!RelFileLocatorBackendIsTemp(rlocator)) { @@ -351,7 +358,6 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo) ereport(WARNING, (errcode_for_file_access(), errmsg("could not remove file \"%s\": %m", path))); - segno++; } } else @@ -359,42 +365,25 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo) /* Prevent other backends' fds from holding on to the disk space */ ret = do_truncate(path); - /* - * Except during a binary upgrade, register request to unlink first - * segment later, rather than now. - * - * If we're performing a binary upgrade, the dangers described in the - * header comments for mdunlink() do not exist, since after a crash or - * even a simple ERROR, the upgrade fails and the whole new cluster - * must be recreated from scratch. And, on the other hand, it is - * important to remove the files from disk immediately, because we may - * be about to reuse the same relfilenumber. - */ - if (!IsBinaryUpgrade) - { - register_unlink_segment(rlocator, forknum, 0 /* first seg */ ); - segno++; - } + /* Register request to unlink first segment later */ + register_unlink_segment(rlocator, forknum, 0 /* first seg */ ); } /* - * Delete any remaining segments (we might or might not have dealt with - * the first one above). + * Delete any additional segments. 
*/ if (ret >= 0) { char *segpath = (char *) palloc(strlen(path) + 12); + BlockNumber segno; /* * Note that because we loop until getting ENOENT, we will correctly * remove all inactive segments as well as active ones. */ - for (;; segno++) + for (segno = 1;; segno++) { - if (segno == 0) -strcpy(segpath, path); - else -sprintf(segpath, "%s.%u", path, segno); + sprintf(segpath, "%s.%u", path, segno); if (!RelFileLocatorBackendIsTemp(rlocator)) {
Re: Tables not getting vacuumed in postgres
> On Nov 8, 2022, at 5:21 AM, Karthik Jagadish (kjagadis) > wrote: > > I didn’t get your point dead tuples are visible to transaction means? > Vacuuming job is to remove dead tuples right? Please see https://www.2ndquadrant.com/en/blog/when-autovacuum-does-not-vacuum/ for more information about your question. Specifically, you might look at the third section down, "Long transactions", which starts with "So, if the table is vacuumed regularly, surely it can’t accumulate a lot of dead rows, right?" You might benefit from reading the entire article rather than skipping down to that section. I hope it helps — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Sat, Nov 5, 2022 at 6:23 PM John Naylor wrote: > > On Fri, Nov 4, 2022 at 10:25 PM Masahiko Sawada wrote: > > > > For parallel heap pruning, multiple workers will insert key-value > > pairs to the radix tree concurrently. The simplest solution would be a > > single lock to protect writes but the performance will not be good. > > Another solution would be that we can divide the tables into multiple > > ranges so that keys derived from TIDs are not conflicted with each > > other and have parallel workers process one or more ranges. That way, > > parallel vacuum workers can build *sub-trees* and the leader process > > can merge them. In use cases of lazy vacuum, since the write phase and > > read phase are separated the readers don't need to worry about > > concurrent updates. > > It's a good idea to use ranges for a different reason -- readahead. See > commit 56788d2156fc3, which aimed to improve readahead for sequential scans. > It might work to use that as a model: Each worker prunes a range of 64 pages, > keeping the dead tids in a local array. At the end of the range: lock the tid > store, enter the tids into the store, unlock, free the local array, and get > the next range from the leader. It's possible contention won't be too bad, > and I suspect using small local arrays as-we-go would be faster and use less > memory than merging multiple sub-trees at the end. Seems a promising idea. I think it might work well even in the current parallel vacuum (ie., single writer). I mean, I think we can have a single lwlock for shared cases in the first version. If the overhead of acquiring the lwlock per insertion of key-value is not negligible, we might want to try this idea. Apart from that, I'm going to incorporate the comments on 0004 patch and try a pointer tagging. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Tables not getting vacuumed in postgres
I didn't get your point; what does "dead tuples are visible to a transaction" mean? Vacuuming's job is to remove dead tuples, right? Regards, Karthik From: Amul Sul Date: Tuesday, 8 November 2022 at 6:39 PM To: Karthik Jagadish (kjagadis) Cc: pgsql-hack...@postgresql.org , Prasanna Satyanarayanan (prassaty) , Chandruganth Ayyavoo Selvam (chaayyav) , Jaganbabu M (jmunusam) Subject: Re: Tables not getting vacuumed in postgres On Tue, Nov 8, 2022 at 6:11 PM Karthik Jagadish (kjagadis) wrote: > > Hi, > > > > Thanks for the response. > > > > But what I understand that insert update and delete would still work and will > not interfere with vacuuming process. Yes we do perform a lot of updates on > that particular table which is not vacuuming. Does it mean that it waiting > for the lock to be released? > Well, yes, that won't interfere but the primary job of autovacuum is to remove the bloat, if the dead tuple(s) is visible to any transaction, then not going to remove that. > > > Regards, > > Karthik > > > > From: Amul Sul > Date: Tuesday, 8 November 2022 at 5:38 PM > To: Karthik Jagadish (kjagadis) > Cc: pgsql-hack...@postgresql.org , Prasanna > Satyanarayanan (prassaty) , Chandruganth Ayyavoo Selvam > (chaayyav) , Jaganbabu M (jmunusam) > Subject: Re: Tables not getting vacuumed in postgres > > On Tue, Nov 8, 2022 at 5:00 PM Karthik Jagadish (kjagadis) > wrote: > > > > Hi, > > > > We have a NMS application where we are using postgres as database, what we > > are noticing is that vacuuming is not happening for certain tables for 2-3 > > days and eventually the table bloats and disk space is running out. > > > > What could be the reason for auto vacuuming not happening for certain > > tables? > > > > Check if there is any long-running or prepared transaction. > > Regards, > Amul
Re: Check return value of pclose() correctly
On 01.11.22 21:30, Peter Eisentraut wrote: There are some places where the return value is apparently intentionally ignored, such as in error recovery paths, or psql ignoring a failure to launch the pager. (The intention can usually be inferred by the kind of error checking attached to the corresponding popen() call.) But there are a few places in psql that I'm suspicious about that I have marked, but need to think about further. Hmm. I would leave these out, I think. setQFout() relies on the result of openQueryOutputFile(). And this could make commands like \watch less reliable. I don't quite understand what you are saying here. My point is that, for example, setQFout() thinks it's important to check the result of popen() and write an error message, but it doesn't check the result of pclose() at all. I don't think that makes sense in practice. I have looked this over again. In these cases, if the piped-to command is faulty, you get a "broken pipe" error anyway, so the effect of not checking the pclose() result is negligible. So I have removed the "FIXME" markers without further action. There is also the question whether we want to check the exit status of a user-supplied command, such as in pgbench's \setshell. I have dialed back my patch there, since I don't know what the current practice in pgbench scripts is. If the command fails badly, pgbench will probably complain anyway about invalid output. More important is that something like pg_upgrade does check the exit status when it calls pg_controldata etc. That's what this patch accomplishes. From 86b2b61d30f848b84c69437c7106dea8fdf3738d Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 8 Nov 2022 14:03:12 +0100 Subject: [PATCH v2] Check return value of pclose() correctly Some callers didn't check the return value of pclose() or ClosePipeStream() correctly. Either they didn't check it at all or they treated it like the return of fclose(). 
The correct way is to first check whether the return value is -1, and then report errno, and then check the return value like a result from system(), for which we already have wait_result_to_str() to make it simpler. To make this more compact, expand wait_result_to_str() to also handle -1 explicitly. Discussion: https://www.postgresql.org/message-id/flat/8cd9fb02-bc26-65f1-a809-b1cb360ee...@enterprisedb.com --- src/backend/commands/collationcmds.c | 11 ++- src/bin/pg_ctl/pg_ctl.c | 3 +-- src/bin/pg_upgrade/controldata.c | 12 +--- src/bin/pg_upgrade/exec.c| 6 +- src/bin/pg_upgrade/option.c | 6 +- src/bin/pgbench/pgbench.c| 2 +- src/bin/psql/command.c | 18 ++ src/common/wait_error.c | 6 +- 8 files changed, 50 insertions(+), 14 deletions(-) diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c index 86fbc7fa019f..fbf45f05aa0f 100644 --- a/src/backend/commands/collationcmds.c +++ b/src/backend/commands/collationcmds.c @@ -639,6 +639,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS) int naliases, maxaliases, i; + int pclose_rc; /* expansible array of aliases */ maxaliases = 100; @@ -745,7 +746,15 @@ pg_import_system_collations(PG_FUNCTION_ARGS) } } - ClosePipeStream(locale_a_handle); + pclose_rc = ClosePipeStream(locale_a_handle); + if (pclose_rc != 0) + { + ereport(ERROR, + (errcode_for_file_access(), +errmsg("could not execute command \"%s\": %s", + "locale -a", + wait_result_to_str(pclose_rc; + } /* * Before processing the aliases, sort them by locale name. 
The point diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index a509fc9109e8..ceab603c47d9 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -2154,12 +2154,11 @@ adjust_data_dir(void) fflush(NULL); fd = popen(cmd, "r"); - if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL) + if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL || pclose(fd) != 0) { write_stderr(_("%s: could not determine the data directory using command \"%s\"\n"), progname, cmd); exit(1); } - pclose(fd); free(my_exec_path); /* strip trailing newline and carriage return */ diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c index
Re: Tables not getting vacuumed in postgres
On Tue, Nov 8, 2022 at 6:11 PM Karthik Jagadish (kjagadis) wrote: > > Hi, > > > > Thanks for the response. > > > > But what I understand that insert update and delete would still work and will > not interfere with vacuuming process. Yes we do perform a lot of updates on > that particular table which is not vacuuming. Does it mean that it waiting > for the lock to be released? > Well, yes, that won't interfere, but the primary job of autovacuum is to remove bloat; if a dead tuple is still visible to any transaction, autovacuum is not going to remove it. > > > Regards, > > Karthik > > > > From: Amul Sul > Date: Tuesday, 8 November 2022 at 5:38 PM > To: Karthik Jagadish (kjagadis) > Cc: pgsql-hack...@postgresql.org , Prasanna > Satyanarayanan (prassaty) , Chandruganth Ayyavoo Selvam > (chaayyav) , Jaganbabu M (jmunusam) > Subject: Re: Tables not getting vacuumed in postgres > > On Tue, Nov 8, 2022 at 5:00 PM Karthik Jagadish (kjagadis) > wrote: > > > > Hi, > > > > We have a NMS application where we are using postgres as database, what we > > are noticing is that vacuuming is not happening for certain tables for 2-3 > > days and eventually the table bloats and disk space is running out. > > > > What could be the reason for auto vacuuming not happening for certain > > tables? > > > > Check if there is any long-running or prepared transaction. > > Regards, > Amul
Re: Tables not getting vacuumed in postgres
Hi, Thanks for the response. But what I understand is that inserts, updates and deletes would still work and will not interfere with the vacuuming process. Yes, we do perform a lot of updates on that particular table, which is not being vacuumed. Does it mean that it is waiting for the lock to be released? Regards, Karthik From: Amul Sul Date: Tuesday, 8 November 2022 at 5:38 PM To: Karthik Jagadish (kjagadis) Cc: pgsql-hack...@postgresql.org , Prasanna Satyanarayanan (prassaty) , Chandruganth Ayyavoo Selvam (chaayyav) , Jaganbabu M (jmunusam) Subject: Re: Tables not getting vacuumed in postgres On Tue, Nov 8, 2022 at 5:00 PM Karthik Jagadish (kjagadis) wrote: > > Hi, > > We have a NMS application where we are using postgres as database, what we > are noticing is that vacuuming is not happening for certain tables for 2-3 > days and eventually the table bloats and disk space is running out. > > What could be the reason for auto vacuuming not happening for certain tables? > Check if there is any long-running or prepared transaction. Regards, Amul
Re: psql: Add command to use extended query protocol
On 08.11.22 13:02, Daniel Verite wrote: A pset variable to control the default seems reasonable as well. The implication would be that if you set that pset variable there is no way to have individual commands use simple query mode directly. +1 except that it would be a \set variable for consistency with the other execution-controlling variables. \pset variables control only the display. Is there a use case for a global setting? It seems to me that that would be just another thing that a super-careful psql script would have to reset to get a consistent starting state.
Re: psql: Add command to use extended query protocol
On 05.11.22 07:34, Corey Huinker wrote: The most compact idea I can think of is to have \bind and \endbind (or more terse equivalents \bp and \ebp) SELECT * FROM foo WHERE type_id = $1 AND cost > $2 \bind 'param1' 'param2' \endbind $2 \g filename.csv I like it. It makes my code even simpler, and it allows using all the different \g variants transparently. See attached patch. Maybe the end-bind param isn't needed at all, we just insist that bind params be single quoted strings or numbers, so the next slash command ends the bind list. Right, the end-bind isn't needed. Btw., this also allows doing things like SELECT $1, $2 \bind '1' '2' \g \bind '3' '4' \g This isn't a prepared statement being reused, but it relies on the fact that psql \g with an empty query buffer resends the previous query. Still kind of neat.From 076df09c701c5e60172e2dc80602726d3761e55c Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 8 Nov 2022 13:33:29 +0100 Subject: [PATCH v2] psql: Add command to use extended query protocol This adds a new psql command \bind that sets query parameters and causes the next query to be sent using the extended query protocol. Example: SELECT $1, $2 \bind 'foo' 'bar' \g This may be useful for psql scripting, but one of the main purposes is also to be able to test various aspects of the extended query protocol from psql and to write tests more easily. 
Discussion: https://www.postgresql.org/message-id/flat/e8dd1cd5-0e04-3598-0518-a605159fe...@enterprisedb.com --- doc/src/sgml/ref/psql-ref.sgml | 33 ++ src/bin/psql/command.c | 37 ++ src/bin/psql/common.c | 15 +++- src/bin/psql/help.c| 1 + src/bin/psql/settings.h| 3 +++ src/bin/psql/tab-complete.c| 1 + src/test/regress/expected/psql.out | 31 + src/test/regress/sql/psql.sql | 14 +++ 8 files changed, 134 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 9494f28063ad..df93a5ca9897 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -879,6 +879,39 @@ Meta-Commands + + \bind [ parameter ] ... + + + + Sets query parameters for the next query execution, with the + specified parameters passed for any parameter placeholders + ($1 etc.). + + + + Example: + +INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g + + + + + This also works for query-execution commands besides + \g, such as \gx and + \gset. + + + + This command causes the extended query protocol (see ) to be used, unlike normal + psql operation, which uses the simple + query protocol. So this command can be useful to test the extended + query protocol from psql. 
+ + + + \c or \connect [ -reuse-previous=on|off ] [ dbname [ username ] [ host ] [ port ] | conninfo ] diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index ab613dd49e0a..3b06169ba0dc 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -63,6 +63,7 @@ static backslashResult exec_command(const char *cmd, PQExpBuffer query_buf, PQExpBuffer previous_buf); static backslashResult exec_command_a(PsqlScanState scan_state, bool active_branch); +static backslashResult exec_command_bind(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_C(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_connect(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_cd(PsqlScanState scan_state, bool active_branch, @@ -308,6 +309,8 @@ exec_command(const char *cmd, if (strcmp(cmd, "a") == 0) status = exec_command_a(scan_state, active_branch); + else if (strcmp(cmd, "bind") == 0) + status = exec_command_bind(scan_state, active_branch); else if (strcmp(cmd, "C") == 0) status = exec_command_C(scan_state, active_branch); else if (strcmp(cmd, "c") == 0 || strcmp(cmd, "connect") == 0) @@ -453,6 +456,40 @@ exec_command_a(PsqlScanState scan_state, bool active_branch) return success ? PSQL_CMD_SKIP_LINE : PSQL_CMD_ERROR; } +/* + * \bind -- set query parameters + */ +static backslashResult +exec_command_bind(PsqlScanState scan_state, bool active_branch) +{ + backslashResult status = PSQL_CMD_SKIP_LINE; + + if (active_branch) + { + char *opt; + int nparams = 0; + int nalloc
Re: when the startup process doesn't (logging startup delays)
On Tue, Nov 8, 2022 at 4:35 PM Thomas Munro wrote: > > On Sat, Oct 30, 2021 at 7:44 AM Robert Haas wrote: > > Committed. > > Is it expected that an otherwise idle standby's recovery process > receives SIGALRM every N seconds, or should the timer be canceled at > that point, as there is no further progress to report? Nice catch. Yeah, that seems unnecessary; see the standby logs below. I think we need to call disable_timeout(STARTUP_PROGRESS_TIMEOUT, false); something like the attached? I think there'll be no issue with the patch, since StandbyMode gets reset only at the end of recovery (in FinishWalRecovery()) but it can very well be set during recovery (in ReadRecord()). Note that I also added an assertion in has_startup_progress_timeout_expired(), just in case. 2022-11-08 11:28:23.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received 2022-11-08 11:28:23.563 UTC [980909] LOG: startup_progress_timeout_handler called 2022-11-08 11:28:33.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received 2022-11-08 11:28:33.563 UTC [980909] LOG: startup_progress_timeout_handler called 2022-11-08 11:28:43.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received 2022-11-08 11:28:43.563 UTC [980909] LOG: startup_progress_timeout_handler called 2022-11-08 11:28:53.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received 2022-11-08 11:28:53.563 UTC [980909] LOG: startup_progress_timeout_handler called While at it, I noticed that we report redo progress for PITR, but we don't report when a standby enters archive recovery mode, say due to a failure in the connection to the primary or after the promote signal is found. Isn't it useful to report in this case as well, to know the recovery progress?
-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 40ac498a76f38903bbee37108812907a76bb1a78 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 8 Nov 2022 12:18:11 + Subject: [PATCH v1] Disable STARTUP_PROGRESS_TIMEOUT in standby mode --- src/backend/access/transam/xlogrecovery.c | 13 - src/backend/postmaster/startup.c | 2 ++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index cb07694aea..5326a98633 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -63,6 +63,7 @@ #include "utils/pg_lsn.h" #include "utils/ps_status.h" #include "utils/pg_rusage.h" +#include "utils/timeout.h" /* Unsupported old recovery command file names (relative to $PGDATA) */ #define RECOVERY_COMMAND_FILE "recovery.conf" @@ -1653,7 +1654,17 @@ PerformWalRecovery(void) */ do { - if (!StandbyMode) + /* + * To avoid server log bloat, we don't report recovery progress in + * a standby as it will always be in recovery unless promoted. We + * also disable the timeout as we don't need it anymore. + */ + if (StandbyMode) + { +if (get_timeout_active(STARTUP_PROGRESS_TIMEOUT)) + disable_timeout(STARTUP_PROGRESS_TIMEOUT, false); + } + else ereport_startup_progress("redo in progress, elapsed time: %ld.%02d s, current LSN: %X/%X", LSN_FORMAT_ARGS(xlogreader->ReadRecPtr)); diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c index f99186eab7..1456e3ad3a 100644 --- a/src/backend/postmaster/startup.c +++ b/src/backend/postmaster/startup.c @@ -347,6 +347,8 @@ has_startup_progress_timeout_expired(long *secs, int *usecs) int useconds; TimestampTz now; + Assert(get_timeout_active(STARTUP_PROGRESS_TIMEOUT) == true); + /* No timeout has occurred. */ if (!startup_progress_timer_expired) return false; -- 2.34.1
Re: Tables not getting vacuumed in postgres
On Tue, Nov 8, 2022 at 5:00 PM Karthik Jagadish (kjagadis) wrote:
>
> Hi,
>
> We have an NMS application where we are using postgres as the database. What we
> are noticing is that vacuuming is not happening for certain tables for 2-3
> days, and eventually the table bloats and disk space runs out.
>
> What could be the reason for autovacuuming not happening for certain tables?
>

Check if there is any long-running or prepared transaction.

Regards,
Amul
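For illustration, catalog queries along these lines can show whether such transactions are holding back vacuum (the one-hour threshold is only an example, not a recommendation):

```sql
-- Sessions whose current transaction has been open for more than an hour
SELECT pid, datname, usename, state, xact_start,
       now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '1 hour'
ORDER BY xact_start;

-- Prepared (two-phase) transactions left behind; these also pin the xmin horizon
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;
```

Either kind of transaction prevents VACUUM from removing dead tuples newer than its snapshot, which matches the bloat symptom described above.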
Re: psql: Add command to use extended query protocol
David G. Johnston wrote: > I would keep the \gp meta-command to force extended mode regardless > of whether the query itself requires it. +1 > A pset variable to control the default seems reasonable as well. > The implication would be that if you set that pset variable there is > no way to have individual commands use simple query mode directly. +1 except that it would be a \set variable for consistency with the other execution-controlling variables. \pset variables control only the display. BTW if we wanted to auto-detect that a query requires binding or the extended query protocol, we need to keep in mind that for instance "PREPARE stmt AS $1" must pass without binding, with both the simple and the extended query protocol. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
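For illustration, forcing extended mode for a single query with the meta-command from the patch under discussion (assuming the \bind spelling, which takes parameter values and arms the next \g to use the extended protocol) would look something like:

```
=> SELECT $1 AS one, $2 AS two \bind 'a' 'b' \g
 one | two
-----+-----
 a   | b
(1 row)
```

A \set variable, by contrast, would flip the default protocol for every subsequent query in the session.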
Re: Perform streaming logical transactions by background workers and parallel apply
On Mon, Nov 7, 2022 at 6:49 PM houzj.f...@fujitsu.com wrote:
>
> On Friday, November 4, 2022 7:45 PM Amit Kapila wrote:
> > 3.
> > apply_handle_stream_start(StringInfo s)
> > {
> > ...
> > + if (!first_segment)
> > + {
> > + /*
> > + * Unlock the shared object lock so that parallel apply worker
> > + * can continue to receive and apply changes.
> > + */
> > + parallel_apply_unlock(winfo->shared->stream_lock_id);
> > ...
> > }
> >
> > Can we have an assert before this unlock call that the lock must be
> > held? Similarly, if there are other places then we can have assert
> > there as well.
>
> It seems we don't have a standard API that can be used without a transaction.
> Maybe we can use the list ParallelApplyLockids to check that?
>

Yeah, that occurred to me as well, but I am not sure if it is a good idea to maintain this list just for an assertion. However, if it turns out that we need to maintain it for a different purpose, then we can probably use it for the assert as well.

Few other comments/questions:
=============================
1.
apply_handle_stream_start(StringInfo s)
{
...
+			case TRANS_PARALLEL_APPLY:
...
...
+				/*
+				 * Unlock the shared object lock so that the leader apply worker
+				 * can continue to send changes.
+				 */
+				parallel_apply_unlock(MyParallelShared->stream_lock_id, AccessShareLock);

As per the design in the email [1], this lock needs to be released by the leader worker during stream start, which means it should be released under the state TRANS_LEADER_SEND_TO_PARALLEL. From the comments as well, it is not clear to me why the leader is supposed to be blocked at this time. Is there a reason for doing this differently than what is proposed in the original design?

2. Similar to the above, it is not clear why the parallel worker needs to release the stream_lock_id lock at stream_commit and stream_prepare?

3.
Am I understanding correctly that you need to lock/unlock in apply_handle_stream_abort() for the parallel worker because, after a rollback to savepoint, there could be another set of stream or transaction-end commands for which you want to wait? If so, maybe an additional comment would serve the purpose.

4.
+ * The leader may have sent multiple streaming blocks in the queue
+ * when the child is processing a streaming block. So only try to
+ * lock if there is no message left in the queue.

Let's slightly reword this to: "By the time the child is processing the changes in the current streaming block, the leader may have sent multiple streaming blocks. So, try to lock only if there is no message left in the queue."

5.
+parallel_apply_unlock(uint16 lockid, LOCKMODE lockmode)
+{
+	if (!list_member_int(ParallelApplyLockids, lockid))
+		return;
+
+	UnlockSharedObjectForSession(SubscriptionRelationId, MySubscription->oid,
+								 lockid, am_leader_apply_worker() ?
+								 AccessExclusiveLock :
+								 AccessShareLock);

This function should use the lockmode argument passed rather than deciding based on am_leader_apply_worker(). I think this is anyway going to change if we start using a different locktag, as discussed in one of the above emails.

6.
+
 /*
  * Common spoolfile processing.
  */
-static void
-apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
+void
+apply_spooled_messages(FileSet *stream_fileset, TransactionId xid,

Seems like a spurious line addition.

--
With Regards,
Amit Kapila.
Re: [PoC] Reducing planning time when tables have many partitions
On 2/11/2022 15:27, Yuya Watari wrote:
> I noticed that the previous patch does not apply to the current HEAD.
> I attached the rebased version to this email.

Looking into the find_em_for_rel() changes, I see that you replaced

if (bms_is_subset(em->em_relids, rel->relids))

with an assertion. According to get_ecmember_indexes(), the em_relids field of the returned equivalence members can contain relids not mentioned in the relation. I don't understand why it works now. For example, we can sort by t1.x, but have an expression t1.x = t1.y * t2.z. Or have I missed something? If it is not a mistake, maybe add a comment explaining why the assertion doesn't fail here?

--
regards,
Andrey Lepikhov
Postgres Professional
Tables not getting vacuumed in postgres
Hi,

We have an NMS application where we are using postgres as the database. What we are noticing is that vacuuming is not happening for certain tables for 2-3 days, and eventually the table bloats and disk space runs out.

What could be the reason for autovacuuming not happening for certain tables? Autovacuum is enabled.

Regards,
Karthik
Re: when the startup process doesn't (logging startup delays)
On Sat, Oct 30, 2021 at 7:44 AM Robert Haas wrote: > Committed. Is it expected that an otherwise idle standby's recovery process receives SIGALRM every N seconds, or should the timer be canceled at that point, as there is no further progress to report?
Re: Use proc instead of MyProc in ProcArrayGroupClearXid()/TransactionGroupUpdateXidStatus()
On Tue, Nov 8, 2022 at 11:59 AM Amit Kapila wrote:
>
> On Mon, Nov 7, 2022 at 3:17 PM rajesh singarapu wrote:
> >
> > In both TransactionGroupUpdateXidStatus and ProcArrayGroupClearXid
> > the global MyProc is used. For consistency, replaced with a function-local
> > variable.
> >
>
> In ProcArrayGroupClearXid(), currently, we always pass MyProc as proc,
> so the change suggested by you will work, but I think if in the future
> someone calls it with a different proc, then the change suggested by
> you won't work.

Well, yes. Do you have any thoughts on such future usages of ProcArrayGroupClearXid()?

> The change in TransactionGroupUpdateXidStatus() looks
> good, but if we don't want to change ProcArrayGroupClearXid() then I am
> not sure if there is much value in making the change in
> TransactionGroupUpdateXidStatus().

AFAICS, there are many places in the code that use proc == MyProc (20 instances) or proc != MyProc (6 instances) sorts of checks. I think defining a macro, something like below, is better for readability. However, I'm concerned that we might have to use it in 26 places.

#define IsPGPROCMine(proc) (proc != NULL && proc == MyProc)

or just

#define IsPGPROCMine(proc) (proc == MyProc)

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
On Mon, Nov 7, 2022 at 11:12 PM Robert Haas wrote: > > On Mon, Oct 31, 2022 at 11:49 PM Amit Kapila wrote: > > I am fine with any of those. Would you like to commit or do you prefer > > me to take care of this? > > Sorry for not responding to this sooner. I think it's too late to do > anything about this for the current round of releases at this point, > but I am fine if you want to take care of it after that. > Okay, I'll take care of this either later this week after the release work is finished or early next week. -- With Regards, Amit Kapila.
Re: Reviving lost replication slots
On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi wrote:
>
> On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:
>>
>> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
>> >
>> > A replication slot can be lost when a subscriber is not able to catch up
>> > with the load on the primary and the WAL to catch up exceeds
>> > max_slot_wal_keep_size. When this happens, the target has to be reseeded
>> > (pg_dump) from scratch and this can take longer. I am investigating
>> > the options to revive a lost slot.
>> >
>>
>> Why in the first place does one have to set max_slot_wal_keep_size if they
>> care for WAL more than that?
>
> Disk full is a typical use case where we can't wait until the logical slots
> catch up before truncating the log.
>

Ideally, in such a case the subscriber should fall back to the physical standby of the publisher, but unfortunately, we don't yet have functionality where subscribers can continue logical replication from a physical standby. Do you think that if we had such functionality it would serve our purpose?

>> If you have a case where you want to
>> handle this case for some particular slot (where you are okay with the
>> invalidation of other slots exceeding max_slot_wal_keep_size) then the
>> other possibility could be to have a similar variable at the slot
>> level, but not sure if that is a good idea because you haven't
>> presented any such case.
>
> IIUC, the ability to fetch WAL from the archive as a fallback mechanism should
> automatically take care of all the lost slots. Do you see a need to take care
> of a specific slot?
>

No, I was just trying to see if your use case can be addressed in some other way. BTW, won't copying the WAL back from the archive again lead to a disk-full situation?

--
With Regards,
Amit Kapila.
Re: Tracking last scan time
On Tue, 8 Nov 2022 at 04:10, Michael Paquier wrote:
> On Mon, Nov 07, 2022 at 04:54:07PM +0900, Michael Paquier wrote:
> > FWIW, all the other areas of pgstatfuncs.c manipulate timestamptz
> > fields with a style like the attached. That's a nit, still for the
> > sake of consistency with the surroundings.
> >
> > Anyway, it seems to me that a regression test is in order before a
> > scan happens just after the relation creation, and the same problem
> > shows up with last_idx_scan.
>
> Hearing nothing, done this way as of d7744d5. Thanks for the report,
> Robert. And thanks for the patch, Dave.

Thank you!

--
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake
EDB: https://www.enterprisedb.com
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Hi:

> cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
> currently underway, this would be an excellent time to update the patch.

Thank you Ian & Andrey for taking care of this! I am planning to start a new thread for this topic in 2 weeks, and will post an updated patch at that time.

--
Best Regards
Andy Fan
Re: Suppressing useless wakeups in walreceiver
On Tue, Nov 8, 2022 at 9:20 PM Bharath Rupireddy wrote: > Thanks. Do we need a similar wakeup approach for logical replication > workers in worker.c? Or is it okay that the nap time is 1sec there? Yeah, I think so. At least for its idle/nap case (waiting for flush is also a technically interesting case, but harder, and applies to non-idle systems so the polling is a little less offensive).
Re: Add proper planner support for ORDER BY / DISTINCT aggregates
On Tuesday, November 8, 2022 at 02:31:12 CET, David Rowley wrote:
> 1. Adjusts add_paths_to_grouping_rel so that we don't add a Sort path
> when we can add an Incremental Sort path instead. This removes quite a
> few redundant lines of code.

This seems sensible.

> 2. Removes the * 1.5 fuzz-factor in cost_incremental_sort()
> 3. Does various other code tidy stuff in cost_incremental_sort().
> 4. Removes the test from incremental_sort.sql that was ensuring the
> inferior Sort -> Sort plan was being used instead of the superior Sort
> -> Incremental Sort plan.
>
> I'm not really that 100% confident in the removal of the * 1.5 thing.
> I wonder if there's some reason we're not considering that might cause
> a performance regression if we're to remove it.

I'm not sure about it either. It seems to me that we were afraid of regressions, and overcosting incremental sort like this just made us miss a new optimization without changing existing plans. With ordered aggregates, the balance is a bit trickier, and we are at risk of regressing either on aggregate plans or on the more common ordered ones.

--
Ronan Dunklau
Re: libpq support for NegotiateProtocolVersion
On 02.11.22 20:02, Jacob Champion wrote:
> A few notes:
>
> +	else if (beresp == 'v')
> +	{
> +		if (pqGetNegotiateProtocolVersion3(conn))
> +		{
> +			/* We'll come back when there is more data */
> +			return PGRES_POLLING_READING;
> +		}
> +		/* OK, we read the message; mark data consumed */
> +		conn->inStart = conn->inCursor;
> +		goto error_return;
> +	}
>
> This new code path doesn't go through the message length checks that are
> done for the 'R' and 'E' cases, and pqGetNegotiateProtocolVersion3()
> doesn't take the message length to know where to stop anyway, so a
> misbehaving server can chew up client resources.

Fixed in new patch.

> It looks like the server is expecting to be able to continue the
> conversation with a newer client after sending a
> NegotiateProtocolVersion. Is an actual negotiation planned for the
> future?

The protocol documentation says:

| The client may then choose either
| to continue with the connection using the specified protocol version
| or to abort the connection.

In this case, we are choosing to abort the connection. We could add negotiation in the future, but then we'd have to first have a concrete case of something to negotiate about. For example, if we added an optional performance feature into the protocol, then one could negotiate by falling back to not using that. But for the kinds of features I'm thinking about right now (column encryption), you can't proceed if the feature is not supported. So I think this would need to be considered case by case.

> I think the documentation on NegotiateProtocolVersion (not introduced in
> this patch) is misleading/wrong; it says that the version number sent
> back is the "newest minor protocol version supported by the server for
> the major protocol version requested by the client" which doesn't seem
> to match the actual usage seen here.

I don't follow. If libpq sends a protocol version of 3.1, then the server responds by saying it supports only 3.0. What are you seeing?
From 64dc983097553af9bed4293821bd45f1932e62c6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Tue, 8 Nov 2022 09:24:29 +0100
Subject: [PATCH v3] libpq: Handle NegotiateProtocolVersion message

Before, receiving a NegotiateProtocolVersion message would result in a
confusing error message like

expected authentication request from server, but received v

This adds proper handling of this protocol message and produces an
on-topic error message from it.

Discussion: https://www.postgresql.org/message-id/flat/f9c7862f-b864-8ef7-a861-c4638c83e209%40enterprisedb.com
---
 src/interfaces/libpq/fe-connect.c   | 23 ++++++++++++++-----
 src/interfaces/libpq/fe-protocol3.c | 50 ++++++++++++++++++++++++++++
 src/interfaces/libpq/libpq-int.h    |  1 +
 3 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 746e9b4f1efc..1e72eb92b073 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3246,10 +3246,11 @@ PQconnectPoll(PGconn *conn)

 				/*
 				 * Validate message type: we expect only an authentication
-				 * request or an error here.  Anything else probably means
-				 * it's not Postgres on the other end at all.
+				 * request, NegotiateProtocolVersion, or an error here.
+				 * Anything else probably means it's not Postgres on the other
+				 * end at all.
 				 */
-				if (!(beresp == 'R' || beresp == 'E'))
+				if (!(beresp == 'R' || beresp == 'v' || beresp == 'E'))
 				{
 					appendPQExpBuffer(&conn->errorMessage,
 									  libpq_gettext("expected authentication request from server, but received %c\n"),
@@ -3267,14 +3268,15 @@ PQconnectPoll(PGconn *conn)
 				/*
 				 * Try to validate message length before using it.
 				 * Authentication requests can't be very large, although GSS
-				 * auth requests may not be that small.  Errors can be a
+				 * auth requests may not be that small.  Same for
+				 * NegotiateProtocolVersion.  Errors can be a
 				 * little larger, but not
Re: Suppressing useless wakeups in walreceiver
On Tue, Nov 8, 2022 at 1:17 PM Thomas Munro wrote: > > And with that change and a pgindent, pushed. Thanks. Do we need a similar wakeup approach for logical replication workers in worker.c? Or is it okay that the nap time is 1sec there? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: PL/pgSQL cursors should get generated portal names by default
On Mon, Nov 7, 2022 at 11:10 AM Jan Wieck wrote:
> On 11/4/22 19:46, Tom Lane wrote:
> > Jan Wieck writes:
> >> I need to do some testing on this. I seem to recall that the naming was
> >> originally done because a reference cursor is basically a named cursor
> >> that can be handed around between functions and even the top SQL level
> >> of the application. For the latter to work the application needs to know
> >> the name of the portal.
> >
> > Right. With this patch, it'd be necessary to hand back the actual
> > portal name (by returning the refcursor value), or else manually
> > set the refcursor value before OPEN to preserve the previous behavior.
> > But as far as I saw, all our documentation examples show handing back
> > the portal name, so I'm hoping most people do it like that already.
>
> I was mostly concerned that we may unintentionally break underdocumented
> behavior that was originally implemented on purpose. As long as everyone
> is aware that this is breaking backwards compatibility in the way it
> does, that's fine.

I respect the concern, and applied some deeper thinking to it... Here is the logic I am applying to this compatibility issue and what may break. [FWIW, my motto is to be wrong out loud, as you learn faster.]

At first pass, I thought "Well, since this does not break a refcursor, which is the obvious use case for RETURNING/PASSING, we are fine!" But in trying to DEFEND this case, I have come up with an example of code (that makes some SENSE, but would break):

CREATE FUNCTION test() RETURNS refcursor
LANGUAGE plpgsql AS $$
DECLARE
    cur_this CURSOR FOR SELECT 1;
    ref_cur  refcursor;
BEGIN
    OPEN cur_this;
    ref_cur := 'cur_this';  -- Using the NAME of the cursor as the portal name
    -- Should do: ref_cur := cur_this;  -- Only works after OPEN
    RETURN ref_cur;
END;
$$;

As noted in the comments. If the code were:

    ref_cur := 'cur_this';  -- Now you can't just use ref_cur := cur_this;
    OPEN cur_this;
    RETURN ref_cur;

Then it would break now...
And even the CORRECT syntax would break, since the cursor was not opened, so "cur_this" is null. Now, I have NO IDEA if someone would actually do this. It is almost pathological. The use case would be a complex cursor with parameters, and they changed the code to return a refcursor! This was the ONLY use case I could think of that wasn't HACKY! HACKY use cases involve a child routine setting:

    local_ref_cursor := 'cur_this';

in order to access a cursor that was NOT passed to the child. FWIW, I tested this, and it works, and I can FETCH in the child routine, and it affects the parent's LOOP as it should... WOW. I would be HAPPY to break such horrible code; it has to be a security concern at some level.

Personally (and my 2 cents really shouldn't matter much), I think this should still be fixed. Because I believe this small use case is rare, it will break immediately, the fix is trivial (just initialize cur_this := 'cur_this' in this example), and the fix removes the orthogonal behavior Tom pointed out, which led me to reporting this.

I think I have exhausted the examples of how this impacts a VALID refcursor implementation. I believe any other such versions are variations of this!

And maybe we document that if a refcursor of a cursor is to be returned, the refcursor is ASSIGNED after the OPEN of the cursor, and it is done without the quotes, as:

    ref_cursor := cur_this;  -- assign the name after opening.

Thanks!
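Putting that recommendation together, a minimal sketch of the non-breaking pattern (reusing the names from the example above; the function name test_fixed is just illustrative):

```sql
CREATE FUNCTION test_fixed() RETURNS refcursor
LANGUAGE plpgsql AS $$
DECLARE
    cur_this CURSOR FOR SELECT 1;
    ref_cur  refcursor;
BEGIN
    OPEN cur_this;
    ref_cur := cur_this;  -- take the actual portal name, whatever it is
    RETURN ref_cur;
END;
$$;
```

Because the refcursor is assigned from the cursor variable after OPEN, the caller always receives the real portal name, regardless of whether the portal name was auto-generated or not.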