Re: psql tests hangs
Hi

When I remove this test, then all tests passed:

diff --git a/src/bin/psql/t/001_basic.pl b/src/bin/psql/t/001_basic.pl
index 596746de17..631a1a7335 100644
--- a/src/bin/psql/t/001_basic.pl
+++ b/src/bin/psql/t/001_basic.pl
@@ -353,11 +353,6 @@ psql_like(
 # Check \watch
 # Note: the interval value is parsed with locale-aware strtod()
-psql_like(
-	$node,
-	sprintf('SELECT 1 \watch c=3 i=%g', 0.01),
-	qr/1\n1\n1/,
-	'\watch with 3 iterations');
 
 # Check \watch errors
 psql_fails_like(

Can somebody repeat this testing on FC38?

Regards

Pavel
Re: enhancing plpgsql debug api - hooks on statements errors and function errors
On Tue, Apr 25, 2023 at 11:33 AM Pavel Stehule wrote:
> Hi
>
> On Tue, Apr 25, 2023 at 10:27 Pavel Stehule wrote:
>
>> Hi
>>
>> When I implemented the profiler and coverage check in plpgsql_check, I had
>> to write a lot of hard-to-maintain code related to correctly finishing some
>> operations (counter incrementing) usually executed by the stmt_end and
>> func_end hooks. It is based on the fmgr hook and its own statement call
>> stack. It would be nice if I could throw away this code and use some nice
>> built-in API.
>>
>> Can we enhance the dbg API with two hooks, stmt_end_err and func_end_err?
>>
>> These hooks can be called from exception handlers before re-raising.
>>
>> Or we can define new hooks like executor hooks - stmt_exec and func_exec.
>> In custom hooks the exception can be caught.
>>
>> What do you think about this proposal?

+1. I believe I bumped into a few of these use cases with plpgsql_check
(special handling for security definer and exception handling). My cursory
review of the patch file is that, despite the movement of the code, it feels
pretty straightforward. The 7% overhead appears in a "tight loop", so it's
probably really overstated. I will see if I can apply this and do a more
realistic test. [I have a procedure that takes ~2hrs to process a lot of
data; I would be curious to see the impact and report back]

> I did a quick and ugly benchmark on the worst case:
>
> CREATE OR REPLACE FUNCTION public.speedtest(i integer)
>  RETURNS void
>  LANGUAGE plpgsql
>  IMMUTABLE
> AS $function$
> declare c int = 0;
> begin
>   while c < i
>   loop
>     c := c + 1;
>   end loop;
>   raise notice '%', c;
> end;
> $function$
>
> and it is possible to write the code (see the ugly patch) without any
> performance impact if the hooks are not used. When hooks are active, then
> there is a 7% performance loss. It is not nice - but this is the worst
> case. The impact on real code should be significantly lower.
>
> Regards
>
> Pavel
Unlinking Parallel Hash Join inner batch files sooner
Hi,

One complaint about PHJ is that it can, in rare cases, use a surprising
amount of temporary disk space where non-parallel HJ would not. When it
decides that it needs to double the number of batches to try to fit each
inner batch into memory, and then again and again depending on your level of
bad luck, it leaves behind all the earlier generations of inner batch files
to be cleaned up at the end of the query. That's stupid. Here's a patch to
unlink them sooner, as a small improvement.

The reason I didn't do this earlier is that sharedtuplestore.c continues the
pre-existing tradition where each parallel process counts what it writes
against its own temp_file_limit. At the time I thought I'd need to have one
process unlink all the files, but if a process were to unlink files that it
didn't create, that accounting system would break. Without some new kind of
shared temp_file_limit mechanism that doesn't currently exist, per-process
counters could go negative, creating free money. In the attached patch, I
realised something that I'd missed before: there is a safe point for each
backend to unlink just the files that it created, and there is no way for a
process that created files not to reach that point.

Here's an example query that tries 8, 16 and then 32 batches on my machine,
because reltuples is clobbered with a bogus value. Pathological cases can
try many more rounds than that, but 3 is enough to demonstrate. Using truss
and shell tricks I spat out the list of create and unlink operations from
master and the attached draft/POC patch. See below.

set work_mem = '1MB';
CREATE TABLE t (i int);
INSERT INTO t SELECT generate_series(1, 100);
ANALYZE t;
UPDATE pg_class SET reltuples = reltuples / 4 WHERE relname = 't';
EXPLAIN ANALYZE SELECT COUNT(*) FROM t t1 JOIN t t2 USING (i);

This code is also exercised by the existing "bad" case in join_hash.sql.
This is the second of two experimental patches investigating increased
resource usage in PHJ compared to HJ based on user complaints, this one
being per-batch temp files, and the other [1] being per-batch buffer memory.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGKCnU9NjFfzO219V-YeyWr8mZe4JRrf%3Dx_uv6qsePBcOw%40mail.gmail.com

= master:

99861: create i3of8.p0.0 99861: create i6of8.p0.0 99861: create i4of8.p0.0 99861: create i5of8.p0.0 99861: create i7of8.p0.0 99861: create i2of8.p0.0 99861: create i1of8.p0.0 99863: create i2of8.p1.0 99862: create i7of8.p2.0 99863: create i5of8.p1.0 99862: create i1of8.p2.0 99863: create i6of8.p1.0 99862: create i5of8.p2.0 99863: create i1of8.p1.0 99862: create i3of8.p2.0 99863: create i7of8.p1.0 99862: create i4of8.p2.0 99863: create i3of8.p1.0 99862: create i2of8.p2.0 99863: create i4of8.p1.0 99862: create i6of8.p2.0 99863: create i8of16.p1.0 99861: create i8of16.p0.0 99862: create i8of16.p2.0 99863: create i9of16.p1.0 99862: create i1of16.p2.0 99863: create i1of16.p1.0 99862: create i9of16.p2.0 99861: create i9of16.p0.0 99861: create i1of16.p0.0 99862: create i10of16.p2.0 99863: create i2of16.p1.0 99862: create i2of16.p2.0 99863: create i10of16.p1.0 99861: create i2of16.p0.0 99861: create i10of16.p0.0 99862: create i11of16.p2.0 99863: create i3of16.p1.0 99862: create i3of16.p2.0 99861: create i3of16.p0.0 99863: create i11of16.p1.0 99861: create i11of16.p0.0 99863: create i4of16.p1.0 99863: create i12of16.p1.0 99862: create i12of16.p2.0 99862: create i4of16.p2.0 99861: create i12of16.p0.0 99861: create i4of16.p0.0 99863: create i13of16.p1.0 99863: create i5of16.p1.0 99862: create i5of16.p2.0 99862: create i13of16.p2.0 99861: create i5of16.p0.0 99861: create i13of16.p0.0 99862: create i6of16.p2.0 99863: create i6of16.p1.0 99861: create i14of16.p0.0 99862: create i14of16.p2.0 99863: create i14of16.p1.0 99861: create i6of16.p0.0 99863: create i15of16.p1.0 99861: create i7of16.p0.0 99863: create i7of16.p1.0 99862: create
i15of16.p2.0 99861: create i15of16.p0.0 99862: create i7of16.p2.0 99863: create i16of32.p1.0 99862: create i16of32.p2.0 99861: create i16of32.p0.0 99863: create i17of32.p1.0 99863: create i1of32.p1.0 99861: create i1of32.p0.0 99862: create i17of32.p2.0 99862: create i1of32.p2.0 99861: create i17of32.p0.0 99863: create i18of32.p1.0 99863: create i2of32.p1.0 99862: create i2of32.p2.0 99862: create i18of32.p2.0 99861: create i2of32.p0.0 99861: create i18of32.p0.0 99862: create i3of32.p2.0 99862: create i19of32.p2.0 99861: create i19of32.p0.0 99861: create i3of32.p0.0 99863: create i19of32.p1.0 99863: create i3of32.p1.0 99863: create i20of32.p1.0 99863: create i4of32.p1.0 99861: create i20of32.p0.0 99861: create i4of32.p0.0 99862: create i20of32.p2.0 99862: create i4of32.p2.0 99861: create i21of32.p0.0 99863: create i21of32.p1.0 99861: create i5of32.p0.0 99863: create i5of32.p1.0 99862: create i5of32.p2.0 99862: create i21of32.p2.0 99863: create i22of32.p1.0 99863: create i6of32.p1.0 99861: create i22of32.p0.0 99862: create i22of32.p2.0 99861: create i6of32.p0.0 99862:
Re: [PATCH] Use RelationClose rather than table_close in heap_create_with_catalog
Xiaoran Wang wrote on Sat, Mar 18, 2023 at 15:04:
> Hi hackers,
>
> In heap_create_with_catalog, the Relation new_rel_desc is created
> by RelationBuildLocalRelation, not table_open. So it's better to
> call RelationClose to release it.

Why is it better to call RelationClose? Is there a problem with using
table_close()?

> What's more, the comment for it seems useless, just delete it.
>
> Thanks!

regards,
tender wang
Re: Add LZ4 compression in pg_dump
On Tue, May 09, 2023 at 02:12:44PM +, gkokola...@pm.me wrote:
> Thank you both for looking. A small consolation is that now there are
> tests for this case.

+1, noticing that was pure luck ;)  Worth noting that the patch posted in
[1] has these tests, not the version posted in [2].

+create_sql => 'INSERT INTO dump_test.test_compression_method (col1) '
+  . 'SELECT string_agg(a::text, \'\') FROM generate_series(1,4096) a;',

Yep, good and cheap idea to check for longer chunks. That should be enough
to loop twice.

[1]: https://www.postgresql.org/message-id/SYTRcNgtAbzyn3y3IInh1x-UfNTKMNpnFvI3mr6SyqyVf3PkaDsMy_cpKKgsl3_HdLy2MFAH4zwjxDmFfiLO8rWtSiJWBtqT06OMjeNo4GA=@pm.me
[2]: https://www.postgresql.org/message-id/f735df01-0bb4-2fbc-1297-73a520cfc...@enterprisedb.com

> Moving on to the other open item for this, please find attached v2
> of the patch as requested.

Did you notice the comments of [3] about the second patch that aims to add
the null termination in the line from the LZ4 fgets() callback?

[3]: https://www.postgresql.org/message-id/zfhcyn4gm2eu6...@paquier.xyz
--
Michael
Re: fix stats_fetch_consistency value in postgresql.conf.sample
On Wed, Mar 29, 2023 at 11:03:59PM -0500, Justin Pryzby wrote:
> On Wed, Jul 13, 2022 at 04:49:00PM -0700, Andres Freund wrote:
> > On 2022-07-14 08:46:02 +0900, Michael Paquier wrote:
> > > On Wed, Jul 13, 2022 at 12:30:00PM -0500, Justin Pryzby wrote:
> > > > How did you make this list ? Was it by excluding things that failed
> > > > for you ?
> > > >
> > > > cfbot is currently failing due to io_concurrency on windows.
> > > > I think there are more GUCs which should be included here.
> > > >
> > > > http://cfbot.cputube.org/kyotaro-horiguchi.html
> > >
> > > FWIW, I am not really a fan of making this test depend on a hardcoded
> > > list of GUCs.
> >
> > I wonder if we should add flags indicating platform dependency etc. to
> > guc.c? That should allow to remove most of them?
>
> Michael commented on this, but on another thread, so I'm copying and
> pasting it here.
>
> On Thu, Mar 23, 2023 at 08:59:57PM -0500, Justin Pryzby wrote:
> > On Fri, Mar 24, 2023 at 10:24:43AM +0900, Michael Paquier wrote:
> > > >> * Check consistency of GUC defaults between .sample.conf and
> > > >> pg_settings.boot_val
> > > > - It looks like this was pretty active until last October and might
> > > > have been ready to apply at least partially? But no further work or
> > > > review has happened since.
> > >
> > > FWIW, I don't find much appealing the addition of two GUC flags for
> > > only the sole purpose of that,
> >
> > The flags seem independently interesting - adding them here follows
> > a suggestion Andres made in response to your complaint.
> > 20220713234900.z4rniuaerkq34...@awork3.anarazel.de
> >
> > > particularly as we get a stronger
> > > dependency between GUCs that can be switched dynamically at
> > > initialization and at compile-time.
> >
> > What do you mean by "stronger dependency between GUCs" ?
>
> I'm still not clear what that means ? Michael ?

This fixes an issue with the last version that failed with
log_autovacuum_min_duration in CirrusCI's pg_ci_base.conf.
And now includes both a Perl and an SQL-based version of the test - both of
which rely on the flags.

From 963b56636b9f7fd4a78e502c6acba07919518910 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Tue, 19 Jul 2022 08:31:56 -0500
Subject: [PATCH 1/2] pg_settings_get_flags(): add DEFAULT_COMPILE and
 DEFAULT_INITDB

.. for settings which vary by ./configure or platform or initdb

Note that this is independent from PGC_S_DYNAMIC_DEFAULT.
---
 doc/src/sgml/func.sgml              | 15 ++
 src/backend/utils/misc/guc_funcs.c  |  6 ++-
 src/backend/utils/misc/guc_tables.c | 76 ++---
 src/include/utils/guc.h             |  2 +
 4 files changed, 69 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e13fce6f6b1..17069d2249e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24945,6 +24945,21 @@ SELECT collation for ('foo' COLLATE "de_DE");
       FlagDescription
 
+      DEFAULT_COMPILE
+      Parameters with this flag have default values which vary by
+      platform, or compile-time options.
+
+      DEFAULT_INITDB
+      Parameters with this flag have default values which are
+      determined dynamically during initdb.
+
       EXPLAIN
       Parameters with this flag are included in

diff --git a/src/backend/utils/misc/guc_funcs.c b/src/backend/utils/misc/guc_funcs.c
index 52c361e9755..183943c1973 100644
--- a/src/backend/utils/misc/guc_funcs.c
+++ b/src/backend/utils/misc/guc_funcs.c
@@ -551,7 +551,7 @@ ShowAllGUCConfig(DestReceiver *dest)
 Datum
 pg_settings_get_flags(PG_FUNCTION_ARGS)
 {
-#define MAX_GUC_FLAGS	6
+#define MAX_GUC_FLAGS	8
 	char	   *varname = TextDatumGetCString(PG_GETARG_DATUM(0));
 	struct config_generic *record;
 	int			cnt = 0;
@@ -564,6 +564,10 @@ pg_settings_get_flags(PG_FUNCTION_ARGS)
 	if (record == NULL)
 		PG_RETURN_NULL();
 
+	if (record->flags & GUC_DEFAULT_COMPILE)
+		flags[cnt++] = CStringGetTextDatum("DEFAULT_COMPILE");
+	if (record->flags & GUC_DEFAULT_INITDB)
+		flags[cnt++] = CStringGetTextDatum("DEFAULT_INITDB");
 	if (record->flags & GUC_EXPLAIN)
 		flags[cnt++] = CStringGetTextDatum("EXPLAIN");
 	if (record->flags & GUC_NO_RESET)

diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 2f42cebaf62..94b0aa24a98 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1233,7 +1233,7 @@ struct config_bool ConfigureNamesBool[] =
 		{"debug_assertions", PGC_INTERNAL, PRESET_OPTIONS,
 			gettext_noop("Shows whether the running server has assertion checks enabled."),
 			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_DEFAULT_COMPILE
 		},
 		&assert_enabled,
 		DEFAULT_ASSERT_ENABLED,
@@ -1425,7 +1425,8 @@ struct config_bool ConfigureNamesBool[] =
 	{
 		{"update_process_title", PGC_SUSET, PROCESS_TITLE,
Re: Large files for relations
Greetings,

* Corey Huinker (corey.huin...@gmail.com) wrote:
> On Wed, May 3, 2023 at 1:37 AM Thomas Munro wrote:
> > On Wed, May 3, 2023 at 5:21 PM Thomas Munro wrote:
> > > rsync --link-dest

rsync isn't really a safe tool to use for PG backups by itself unless you're
using it with archiving and with start/stop backup and with checksums
enabled.

> > I wonder if rsync will grow a mode that can use copy_file_range() to
> > share blocks with a reference file (= previous backup). Something
> > like --copy-range-dest. That'd work for large-file relations
> > (assuming a file system that has block sharing, like XFS and ZFS).
> > You wouldn't get the "mtime is enough, I don't even need to read the
> > bytes" optimisation, which I assume makes all database hackers feel a
> > bit queasy anyway, but you'd get the space savings via the usual
> > rolling checksum or a cheaper version that only looks for strong
> > checksum matches at the same offset, or whatever other tricks rsync
> > might have up its sleeve.

There are also really good reasons to have multiple full backups and not
just a single full backup and then lots and lots of incrementals, which
basically boils down to "are you really sure that one copy of that one
really important file won't ever disappear from your backup repository..?"

That said, pgbackrest does now have block-level incremental backups (where
we define our own block size ...) and there are reasons we decided against
going down the LSN-based approach (not the least of which is that the LSN
isn't always updated...), but long story short, moving to larger-than-1G
files should be something that pgbackrest will be able to handle without as
much impact as there would have been previously in terms of incremental
backups.
There is a loss in the ability to use mtime to scan just the parts of the
relation that changed, and that's unfortunate, but I wouldn't see it as
really a game changer (and yes, there's certainly an argument for not
trusting mtime, though I don't think we've yet had a report where there was
an mtime issue that our mtime-validity checking didn't catch and force
pgbackrest into checksum-based revalidation automatically, which resulted in
an invalid backup... of course, not enough people test their backups...).

> I understand the need to reduce open file handles, despite the
> possibilities enabled by using large numbers of small files.

I'm also generally in favor of reducing the number of open file handles that
we have to deal with. Addressing the concerns raised nearby about weird
corner cases of non-1G-length ABCDEF.1 files existing while ABCDEF.2, and
more, files exist is certainly another good argument in favor of getting rid
of segments.

> I am curious whether a move like this to create a generational change in
> file format shouldn't be more ambitious, perhaps altering the block
> format to insert a block format version number, whether that be at every
> block, or every megabyte, or some other interval, and whether we store it
> in-file or in a separate file to accompany the first non-segmented.
> Having such versioning information would allow blocks of different formats
> to co-exist in the same table, which could be critical to future changes
> such as 64-bit XIDs, etc.

To the extent you're interested in this, there are patches posted which are
already trying to move us in a direction that would allow for different page
formats that add in space for other features such as 64-bit XIDs, better
checksums, and TDE tags to be supported.
https://commitfest.postgresql.org/43/3986/

Currently those patches are expecting it to be declared at initdb time, but
the way they're currently written that's more of a soft requirement, as you
can tell on a per-page basis what features are enabled for that page. Might
make sense to support it in that form first anyway though, before going down
the more ambitious route of allowing different pages to have different sets
of features enabled for them concurrently.

When it comes to 'a separate file', we do have forks already, and those
serve a very valuable but distinct use-case where you can get information
from the much smaller fork (be it the FSM or the VM or some future thing),
while something like 64-bit XIDs or a stronger checksum is something you'd
really need on every page. I have serious doubts about a proposal where we'd
store information needed on every page read in some far-away block that's
still in the same file, such as using something every 1MB, as that would
turn every block access into two.

Thanks,

Stephen
Re: walsender performance regression due to logical decoding on standby changes
Hi,

On 2023-05-09 13:38:24 -0700, Jeff Davis wrote:
> On Tue, 2023-05-09 at 12:02 -0700, Andres Freund wrote:
> > I don't think the approach of not having any sort of "registry" of
> > whether anybody is waiting for the replay position to be updated is
> > feasible. Iterating over all walsender slots is just too expensive -
>
> Would it work to use a shared counter for the waiters (or, two
> counters, one for physical and one for logical), and just early exit if
> the count is zero?

That doesn't really fix the problem - once you have a single walsender
connected, performance is bad again.

Greetings,

Andres Freund
Re: Feature: Add reloption support for table access method
Hi,

On 2023-05-05 16:44:39 +0800, 吴昊 wrote:
> When I wrote an extension to implement a new storage by table access
> method, I found some issues where the existing code has strong assumptions
> about heap tables. Here are 3 issues that I currently have:
>
> 1. The index access method has a callback to handle reloptions, but the
> table access method hasn't. We can't add storage-specific parameters by
> the `WITH` clause when creating a table.

Makes sense to add that.

> 2. Existing code strongly assumes that the data file of a table is
> structured as a series of physical files named by a hard-coded rule:
> relfilenode[.segno]. It may only fit heap-like tables. A new storage may
> have its own structure for how the data files are organized. The problem
> happens when dropping a table.

I agree that it's not great, but I don't think it's particularly easy to fix
(because things like transactional DDL require fairly tight integration).
Most of the time it should be possible to work around the limitations
though.

> 3. The existing code also assumes that the data file consists of a series
> of fixed-size blocks. It may not be desired by other storage. Are there
> any suggestions on this situation?

That's a requirement of using the buffer manager, but if you don't want to
rely on that, you can use a different pattern. There are some limitations
(format of TIDs, most prominently), but you should be able to deal with
that. I don't think it would make sense to support other block sizes in the
buffer manager.

> The rest of this mail is about the first issue. It looks reasonable to add
> a similar callback in struct TableAmRoutine, and parse reloptions by the
> callback. This patch is in the attachment file.

Why did you add relkind to the callbacks? The callbacks are specific to a
certain relkind, so I don't think that makes sense.
I don't think we really need a GetTableAmRoutineByAmId() that raises nice
errors etc. - as the AM has already been converted to an oid, we shouldn't
need to recheck?

> +bytea *
> +table_reloptions_am(Oid accessMethodId, Datum reloptions, char relkind, bool validate)
> {
> ...
>
> +		/* built-in table access method put here to fetch TAM fast */
> +		case HEAP_TABLE_AM_OID:
> +			tam = GetHeapamTableAmRoutine();
> +			break;
> 		default:
> -			/* other relkinds are not supported */
> -			return NULL;
> +			tam = GetTableAmRoutineByAmId(accessMethodId);
> +			break;

Why do we need this fastpath? This shouldn't be something called at a
meaningful frequency?

> 	}
> +	return table_reloptions(tam->amoptions, reloptions, relkind, validate);
> }

I'd just pass the tam, instead of an individual function.

> @@ -866,6 +866,11 @@ typedef struct TableAmRoutine
>  					   struct SampleScanState *scanstate,
>  					   TupleTableSlot *slot);
>
> +	/*
> +	 * This callback is used to parse reloptions for relation/matview/toast.
> +	 */
> +	bytea	   *(*amoptions) (Datum reloptions, char relkind, bool validate);
> +
>  } TableAmRoutine;

Did you mean table instead of relation in the comment?

> Another thing about reloptions is that the current API passes a parameter
> `validate` to tell the parse function whether to check and raise an error.
> It doesn't have enough context about where these reloptions are used:
> 1. CREATE TABLE ... WITH(...)
> 2. ALTER TABLE ... SET ...
> 3. ALTER TABLE ... RESET ...
> The reason why the context matters is that some reloptions are disallowed
> to change after creating the table, while some reloptions are allowed.

What kind of reloption are you thinking of here?

Greetings,

Andres Freund
Re: Large files for relations
On Wed, May 3, 2023 at 1:37 AM Thomas Munro wrote:
> On Wed, May 3, 2023 at 5:21 PM Thomas Munro wrote:
> > rsync --link-dest
>
> I wonder if rsync will grow a mode that can use copy_file_range() to
> share blocks with a reference file (= previous backup). Something
> like --copy-range-dest. That'd work for large-file relations
> (assuming a file system that has block sharing, like XFS and ZFS).
> You wouldn't get the "mtime is enough, I don't even need to read the
> bytes" optimisation, which I assume makes all database hackers feel a
> bit queasy anyway, but you'd get the space savings via the usual
> rolling checksum or a cheaper version that only looks for strong
> checksum matches at the same offset, or whatever other tricks rsync
> might have up its sleeve.

I understand the need to reduce open file handles, despite the possibilities
enabled by using large numbers of small files. Snowflake, for instance, sees
everything in 1MB chunks, which makes massively parallel sequential scans
(Snowflake's _only_ query plan) possible, though I don't know if they
accomplish that via separate files, or via segments within a large file.

I am curious whether a move like this to create a generational change in
file format shouldn't be more ambitious, perhaps altering the block format
to insert a block format version number, whether that be at every block, or
every megabyte, or some other interval, and whether we store it in-file or
in a separate file to accompany the first non-segmented one. Having such
versioning information would allow blocks of different formats to co-exist
in the same table, which could be critical to future changes such as 64-bit
XIDs, etc.
Re: walsender performance regression due to logical decoding on standby changes
On Tue, 2023-05-09 at 12:02 -0700, Andres Freund wrote:
> I don't think the approach of not having any sort of "registry" of
> whether anybody is waiting for the replay position to be updated is
> feasible. Iterating over all walsender slots is just too expensive -

Would it work to use a shared counter for the waiters (or, two counters, one
for physical and one for logical), and just early exit if the count is zero?

Regards,

Jeff Davis
walsender performance regression due to logical decoding on standby changes
Hi,

Unfortunately I have found the following commit to have caused a performance
regression:

commit e101dfac3a53c20bfbf1ca85d30a368c2954facf
Author: Andres Freund
Date:   2023-04-08 00:24:24 -0700

    For cascading replication, wake physical and logical walsenders separately

    Physical walsenders can't send data until it's been flushed; logical
    walsenders can't decode and send data until it's been applied. On the
    standby, the WAL is flushed first, which will only wake up physical
    walsenders; and then applied, which will only wake up logical
    walsenders.

    Previously, all walsenders were awakened when the WAL was flushed. That
    was fine for logical walsenders on the primary; but on the standby the
    flushed WAL would have been not applied yet, so logical walsenders were
    awakened too early.

    Per idea from Jeff Davis and Amit Kapila.

    Author: "Drouvot, Bertrand"
    Reviewed-By: Jeff Davis
    Reviewed-By: Robert Haas
    Reviewed-by: Amit Kapila
    Reviewed-by: Masahiko Sawada
    Discussion: https://postgr.es/m/caa4ek1+zo5lueisabx10c81lu-fwmko4m9wyg1cdkbw7hqh...@mail.gmail.com

The problem is that, on a standby, after the change - as needed for the
approach to work - the call to WalSndWakeup() in ApplyWalRecord() happens
for every record, instead of only happening when the timeline is changed (or
WAL is flushed or ...).

WalSndWakeup() iterates over all walsender slots, regardless of whether they
are in use. For each of the walsender slots it acquires a spinlock.

When replaying a lot of small-ish WAL records I found the startup process to
spend the majority of the time in WalSndWakeup(). I've not measured it very
precisely yet, but the overhead is significant (~35% slowdown), even with
the default max_wal_senders. If that's increased substantially, it obviously
gets worse.

The only saving grace is that this is not an issue on the primary.

I unfortunately spent less time on this commit of the
logical-decoding-on-standby series than on the others.
There were so many other senior contributors discussing it that I "relaxed"
a bit too much.

I don't think the approach of not having any sort of "registry" of whether
anybody is waiting for the replay position to be updated is feasible.
Iterating over all walsender slots is just too expensive - WalSndWakeup()
shows up even if I remove the spinlock (which we likely could, especially
when just checking if the walsender is connected).

My current guess is that (mis)using a condition variable is the best bet. I
think it should work to use ConditionVariablePrepareToSleep() before a
WalSndWait(), and then ConditionVariableCancelSleep(). I.e. to never use
ConditionVariableSleep(). The latch set from ConditionVariableBroadcast()
would still cause the necessary wakeup.

Greetings,

Andres Freund
Re: psql tests hangs
Hi

On Tue, May 9, 2023 at 13:53 Pavel Stehule wrote:

>
> On Tue, May 9, 2023 at 11:07 Pavel Stehule wrote:
>
>> On Tue, May 9, 2023 at 10:48 Daniel Gustafsson wrote:
>>
>>> > On 9 May 2023, at 08:52, Pavel Stehule wrote:
>>> >
>>> > Hi
>>> >
>>> > I try to run make check-world. Now I have problems with the tests of
>>> > psql.
>>> >
>>> > I had to cancel the tests.
>>> >
>>> > log:
>>> >
>>> > [08:46:49.828](0.038s) ok 63 - no ON_ERROR_STOP, --single-transaction and multiple -c switches
>>> > [08:46:49.860](0.033s) ok 64 - client-side error commits transaction, no ON_ERROR_STOP and multiple -c switches
>>> > [08:46:49.928](0.067s) ok 65 - \copy from with DEFAULT: exit code 0
>>> > [08:46:49.929](0.001s) ok 66 - \copy from with DEFAULT: no stderr
>>> > [08:46:49.930](0.001s) ok 67 - \copy from with DEFAULT: matches
>>> > death by signal at /home/pavel/src/postgresql.master/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 3042.
>>> > # Postmaster PID for node "main" is 157863
>>> > ### Stopping node "main" using mode immediate
>>> > # Running: pg_ctl -D /home/pavel/src/postgresql.master/src/bin/psql/tmp_check/t_001_basic_main_data/pgdata -m immediate stop
>>> > waiting for server to shut down done
>>> > server stopped
>>> > # No postmaster PID for node "main"
>>> > [08:47:30.361](40.431s) # Tests were run but no plan was declared and done_testing() was not seen.
>>> > [08:47:30.362](0.001s) # Looks like your test exited with 4 just after 67.
>>> > Warning: unable to close filehandle $orig_stderr properly: Broken pipe during global destruction.
>>>
>>> I'm unable to reproduce, and this clearly works in the buildfarm and CI.
>>> Did you run out of disk on the volume during the test or something
>>> similar? Anything interesting in the serverlogs from the tmp_check
>>> install?
>>>
>>
>> I have enough free space on disc.
>>
>> I don't see anything interesting in the log (it is another run):
>>
>> 2023-05-09 08:50:04.839 CEST [158930] 001_basic.pl LOG: statement: COPY copy_default FROM STDIN with (format 'csv', default 'placeholder');
>> 2023-05-09 08:50:04.841 CEST [158930] 001_basic.pl LOG: statement: SELECT * FROM copy_default
>> 2023-05-09 08:50:04.879 CEST [158932] 001_basic.pl LOG: statement: SELECT 1.
>> 2023-05-09 08:50:04.888 CEST [158932] 001_basic.pl LOG: statement: SELECT 1.
>> 2023-05-09 08:50:04.898 CEST [158932] 001_basic.pl LOG: statement: SELECT 1.
>> 2023-05-09 08:50:28.375 CEST [158862] LOG: received immediate shutdown request
>> 2023-05-09 08:50:28.385 CEST [158862] LOG: database system is shut down
>>
>> backtrace from perl:
>>
>> Program received signal SIGINT, Interrupt.
>> 0x7f387ecc1ade in select () from /lib64/libc.so.6
>> (gdb) bt
>> #0 0x7f387ecc1ade in select () from /lib64/libc.so.6
>> #1 0x7f387e97363b in Perl_pp_sselect () from /lib64/libperl.so.5.36
>> #2 0x7f387e917958 in Perl_runops_standard () from /lib64/libperl.so.5.36
>> #3 0x7f387e88259d in perl_run () from /lib64/libperl.so.5.36
>> #4 0x5588bceb234a in main ()
>>
>> Regards
>>

> I repeated another build with the same result.
>
> Tested the REL_15_STABLE branch without any problems.

There is some dependence on locales for commit
96c498d2f8ce5f0082c64793f94e2d0cfa7d7605 with my cs_CZ.utf8 locale:

echo "# +++ tap check in src/bin/psql +++" && rm -rf '/home/pavel/src/postgresql.master/src/bin/psql'/tmp_check && /usr/bin/mkdir -p '/home/pavel/src/postgresql.master/src/bin/psql'/tmp_check && cd .
&& TESTLOGDIR='/home/pavel/src/postgresql.master/src/bin/psql/tmp_check/log' TESTDATADIR='/home/pavel/src/postgresql.master/src/bin/psql/tmp_check' PATH="/home/pavel/src/postgresql.master/tmp_install/usr/local/pgsql/bin:/home/pavel/src/postgresql.master/src/bin/psql:$PATH" LD_LIBRARY_PATH="/home/pavel/src/postgresql.master/tmp_install/usr/local/pgsql/lib" PGPORT='65432' top_builddir='/home/pavel/src/postgresql.master/src/bin/psql/../../..' PG_REGRESS='/home/pavel/src/postgresql.master/src/bin/psql/../../../src/test/regress/pg_regress' /usr/bin/prove -I ../../../src/test/perl/ -I . t/*.pl

# +++ tap check in src/bin/psql +++
t/001_basic.pl ... 15/?
#   Failed test '\watch with 3 iterations: exit code 0'
#   at t/001_basic.pl line 354.
#          got: '3'
#     expected: '0'
#
#   Failed test '\watch with 3 iterations: no stderr'
#   at t/001_basic.pl line 354.
#          got: 'psql::1: error: \watch: incorrect interval value "0.01"'
#     expected: ''
#
#   Failed test '\watch with 3 iterations: matches'
#   at t/001_basic.pl line 354.
#                   ''
#     doesn't match '(?^:1\n1\n1)'
# Looks like you failed 3 tests of 80.
t/001_basic.pl ... Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/80 subtests
t/010_tab_completion.pl .. ok
t/020_cancel.pl .. ok

Test Summary Report
-------------------
t/001_basic.pl (Wstat: 768 (exited 3) Tests: 80 Failed: 3)
Failed
Re: benchmark results comparing versions 15.2 and 16
On Fri, May 5, 2023 at 10:01 PM MARK CALLAGHAN wrote:
> I have two more runs of the benchmark in progress so we will have 3
> results for each of the test cases to confirm that the small regressions
> are repeatable.

They get similar results. Then I tried Linux perf, but the hierarchical call stacks, to be used for flamegraphs, have too many "[unknown]" entries. I was using:

./configure --prefix=$pfx --enable-debug CFLAGS="-O3 -march=native -mtune=native -flto" LDFLAGS="-flto" > o.cf.$x 2> e.cf.$x

Adding -fno-omit-frame-pointer fixes the problem, so I am repeating the benchmark with that flag to confirm there are still regressions, and then I will get flamegraphs.

--
Mark Callaghan
mdcal...@gmail.com
Re: benchmark results comparing versions 15.2 and 16
Hi,

On 2023-05-08 12:11:17 -0700, Andres Freund wrote:
> Hi,
>
> On 2023-05-08 16:00:01 +0300, Alexander Lakhin wrote:
> > This difference is confirmed by multiple test runs. `git bisect` for this
> > regression pointed at f193883fc.
>
> I can reproduce a significant regression due to f193883fc of a workload just
> running
>   SELECT CURRENT_TIMESTAMP;
>
> A single session running it on my workstation via pgbench -Mprepared gets
> before:
>   tps = 89359.128359 (without initial connection time)
> after:
>   tps = 83843.585152 (without initial connection time)
>
> Obviously this is an extreme workload, but that nevertheless seems too large
> to just accept...

Added an open item for this.

Greetings,

Andres Freund
Re: psql \watch 2nd argument: iteration count
Peter Eisentraut writes: > On 13.03.23 02:17, Michael Paquier wrote: >> I am not sure that this will be the last option we'll ever add to >> \watch, so I'd rather have us choose a design more flexible than >> what's proposed here, in a way similar to \g or \gx. > On the other hand, we also have option syntax in \connect that is like > -foo. Would that be a better match here? We should maybe decide before > we diverge and propagate two different option syntaxes in backslash > commands. Reasonable point to raise, but I think \connect's -reuse-previous is in the minority. \connect itself can use option=value syntax in the conninfo string (in fact, I guess -reuse-previous was spelled that way in hopes of not being confusable with a conninfo option). We also have option=value in the \g and \gx commands. I don't see any other psql metacommands that use options spelled like -foo. In short, I'm satisfied with the current answer. There's still time to debate it though. regards, tom lane
Re: psql \watch 2nd argument: iteration count
On 13.03.23 02:17, Michael Paquier wrote:
> On Sun, Mar 12, 2023 at 01:05:39PM -0700, Andrey Borodin wrote:
>> In the review above Kyotaro-san suggested that message should contain
>> information on what it expects... So, maybe then
>> pg_log_error("\\watch interval must be non-negative number, but argument is '%s'", opt); ?
>> Or perhaps with articles?
>> pg_log_error("\\watch interval must be a non-negative number, but the argument is '%s'", opt);
>>
>> -	HELP0("  \\watch [SEC]           execute query every SEC seconds\n");
>> +	HELP0("  \\watch [SEC [N]]       execute query every SEC seconds N times\n");
>
> Is that really the interface we'd want to work with in the long-term?
> For one, this does not give the option to specify only an interval
> while relying on the default number of seconds. This may be fine, but
> it does not strike me as the best choice. How about doing something
> more extensible, for example:
>
> \watch [ (option=value [, option=value] .. ) ] [SEC]
>
> I am not sure that this will be the last option we'll ever add to
> \watch, so I'd rather have us choose a design more flexible than
> what's proposed here, in a way similar to \g or \gx.

On the other hand, we also have option syntax in \connect that is like
-foo. Would that be a better match here? We should maybe decide before
we diverge and propagate two different option syntaxes in backslash
commands.
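For reference, the two candidate interfaces discussed here side by side (a session sketch of the thread's proposals, not a statement of the final committed syntax; the option=value spelling matches the form exercised by the \watch regression test mentioned earlier, e.g. `\watch c=3 i=0.01`):

```psql
-- positional form, as in the HELP0 line quoted above: every 0.5s, 3 times
=> SELECT 1 \watch 0.5 3

-- option=value form: interval i and iteration count c, order-independent
=> SELECT 1 \watch i=0.5 c=3
```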
Re: Order changes in PG16 since ICU introduction
On Tue, 2023-05-09 at 10:25 +0200, Alvaro Herrera wrote:
> I agree with removing these in v16, since they are going to become
> more meaningless and confusing.

Agreed, but it would be nice to have an alternative that does the right thing. It's awkward for a user to read pg_database.datlocprovider, then depending on that, either look in datcollate or daticulocale. (It's awkward in the code, too.)

Maybe some built-in function that returns a tuple of the default provider, the locale, and the version? Or should we also output the ctype somehow (which affects the results of upper()/lower())?

Regards,
	Jeff Davis
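The awkwardness being described is visible in the query a user has to write today to find the locale that actually governs each database's default collation (a sketch against the catalog columns named in this thread):

```sql
-- Which locale governs each database, given its provider?
SELECT datname,
       datlocprovider,
       CASE datlocprovider
            WHEN 'i' THEN daticulocale   -- ICU provider
            ELSE datcollate              -- libc provider
       END AS effective_locale
FROM pg_database;
```

A built-in function of the kind proposed would fold this branching (and the version lookup) into a single call.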
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2023-05-07 05:05, Alena Rybakina wrote:

Thanks for your reviewing and comments!

> I noticed that you used the ignore_datatype_errors_specified variable
> in copy.c, but the GUC has the short name ignore_datatype_errors. Also
> you used the short variable name in the CopyFormatOptions structure.

You may already understand it, but these variable names are given in imitation of the FREEZE and BINARY cases:

--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
 								 * -1 if not specified */
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
+	bool		ignore_datatype_errors; /* ignore rows with datatype errors */

--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified = false;

> The name ignore_datatype_errors_specified seemed very long to me;
> maybe use a short version of it (ignore_datatype_errors) in copy.c
> too?

I think it would be sane to align the names with the FREEZE and BINARY options. I agree that the name is too long, and we once used the name 'ignore_errors'. However, the current implementation does not ignore all errors but just data type errors, so I renamed it. There may be a better name, but I haven't come up with one.

> Besides, I noticed that you used the ignored_errors variable in the
> CopyFromStateData structure; its name is strikingly similar to
> ignore_datatype_errors, but they have different meanings. Maybe it
> would be better to rename it as ignored_errors_counter?

As far as I can tell from a quick look at the PostgreSQL source code, there are few variable names with "_counter"; that suffix seems to be used for function names. Something like "ignored_errors_count" might be better.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
PostgreSQL 16 Beta 1 release date
Hi,

The release date for PostgreSQL 16 Beta 1 is scheduled for May 25, 2023. Please ensure you have any work targeted for Beta 1 committed by May 21, 2023 AoE [1].

Thank you for your efforts with resolving open items [2] as we work to stabilize PostgreSQL 16 for GA!

Thanks,

Jonathan

[1] https://en.wikipedia.org/wiki/Anywhere_on_Earth
[2] https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items
Re: Add LZ4 compression in pg_dump
--- Original Message ---
On Tuesday, May 9th, 2023 at 2:54 PM, Tomas Vondra wrote:

> On 5/9/23 00:10, Michael Paquier wrote:
>
> > On Mon, May 08, 2023 at 08:00:39PM +0200, Tomas Vondra wrote:
> >
> > > The LZ4Stream_write() forgot to move the pointer to the next chunk, so
> > > it was happily decompressing the initial chunk over and over. A bit
> > > embarrassing oversight :-(
> > >
> > > The custom format calls WriteDataToArchiveLZ4(), which was correct.
> > >
> > > The attached patch fixes this for me.
> >
> > Ouch. So this was corrupting the dumps and the compression when
> > trying to write more than two chunks at once, not the decompression
> > steps. That addresses the issue here as well, thanks!
>
> Yeah. Thanks for the report, should have been found during review.

Thank you both for looking. A small consolation is that now there are tests for this case.

Moving on to the other open item for this, please find attached v2 of the patch as requested.

Cheers,
//Georgios

From cb0a229be59dffe09cc0ceceececdbd06a559d3f Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos
Date: Mon, 8 May 2023 11:58:57 +
Subject: [PATCH v2] Null terminate the output buffer of LZ4Stream_gets

LZ4Stream_gets did not null terminate its output buffer. Its callers
expected the buffer to be null terminated so they passed it around to
functions such as sscanf with unintended consequences.

Reported-by: Alexander Lakhin
---
 src/bin/pg_dump/compress_lz4.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 423e1b7976..f97b7550d1 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -459,6 +459,10 @@ LZ4Stream_read_internal(LZ4State *state, void *ptr, int ptrsize, bool eol_flag)
 	if (!LZ4Stream_init(state, size, false /* decompressing */ ))
 		return -1;
 
+	/* No work needs to be done for a zero-sized output buffer */
+	if (size <= 0)
+		return 0;
+
 	/* Verify that there is enough space in the outbuf */
 	if (size > state->buflen)
 	{
@@ -636,7 +640,7 @@ LZ4Stream_gets(char *ptr, int size, CompressFileHandle *CFH)
 	LZ4State   *state = (LZ4State *) CFH->private_data;
 	int			ret;
 
-	ret = LZ4Stream_read_internal(state, ptr, size, true);
+	ret = LZ4Stream_read_internal(state, ptr, size - 1, true);
 	if (ret < 0 || (ret == 0 && !LZ4Stream_eof(CFH)))
 		pg_fatal("could not read from input file: %s", LZ4Stream_get_error(CFH));
 
@@ -644,6 +648,12 @@ LZ4Stream_gets(char *ptr, int size, CompressFileHandle *CFH)
 	if (ret == 0)
 		return NULL;
 
+	/*
+	 * Our caller expects the return string to be NULL terminated
+	 * and we know that ret is greater than zero.
+	 */
+	ptr[ret - 1] = '\0';
+
 	return ptr;
 }
-- 
2.34.1
Re: Proposal for Prototype Implementation to Enhance C/C++ Interoperability in PostgreSQL
I apologize for my previous hasty conclusion. I have conducted further testing on different platforms and would like to share my findings. > FreeBSD Based on my tests, it appears that FreeBSD follows the Itanium C++ ABI specification. The previous test failed because the C++ compiler was not used when linking libpgsjlj.so. > macOS, M1 My tests show that macOS M1 roughly follows the Itanium C++ ABI specification, with only slight differences, such as the parameters accepted by the _Unwind_Stop_Fn function. > macOS, x86 I don't have the resources to do the testing, but from a code perspective, it appears that macOS x86 follows the Itanium C++ ABI specification. > Windows It seems that Windows does not follow the Itanium C++ ABI specification at all. If we compile the program using the `/EHsc` option, longjmp will also trigger forced unwinding. However, unlike the Itanium C++ ABI, the forced unwinding triggered here cannot be captured by a C++ catch statement.
Re: Allow pg_archivecleanup to remove backup history files
Horiguchi-san, Michael-san,

Thanks for your comments and information!

Attached a patch with documentation and regression tests.

On 2023-04-26 06:39, Michael Paquier wrote:
> On Tue, Apr 25, 2023 at 05:29:48PM +0900, Kyotaro Horiguchi wrote:
>> I thought that we had decided not to do that, but I couldn't find any
>> discussion about it in the ML archive. Anyway, I think it is great
>> that we have that option.
>
> No objections from here to make that optional. It's been argued a
> couple of times that leaving the archive history files is good for
> debugging, but I can also get why one would want to clean up all the
> files older than a WAL retention policy. Even if these are just a few
> bytes, it can be noisy for the eye to see a large accumulation of
> history files mixed with the WAL segments.
> --
> Michael

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 7d44134e5de930ce04819285a5b7359e370708d4 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Tue, 9 May 2023 21:52:01 +0900
Subject: [PATCH v2] Allow pg_archivecleanup to remove backup history files

Add new option -b to remove files including backup history files older
than oldestkeptwalfile.
---
 doc/src/sgml/ref/pgarchivecleanup.sgml        | 11 +
 src/bin/pg_archivecleanup/pg_archivecleanup.c | 11 -
 .../t/010_pg_archivecleanup.pl                | 45 +-
 3 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pgarchivecleanup.sgml b/doc/src/sgml/ref/pgarchivecleanup.sgml
index 635e7c7685..6cb565156f 100644
--- a/doc/src/sgml/ref/pgarchivecleanup.sgml
+++ b/doc/src/sgml/ref/pgarchivecleanup.sgml
@@ -93,6 +93,17 @@ pg_archivecleanup: removing file "archive/00010037000E"
 
+
+     -b
+
+
+      Remove files including backup history files, whose suffix is .backup.
+      Note that when oldestkeptwalfile is a backup history file,
+      specified file is kept and only preceding WAL files and backup history files are removed.
+
+
+
+
 
      -d

diff --git a/src/bin/pg_archivecleanup/pg_archivecleanup.c b/src/bin/pg_archivecleanup/pg_archivecleanup.c
index 7726d05149..ae6ae76f08 100644
--- a/src/bin/pg_archivecleanup/pg_archivecleanup.c
+++ b/src/bin/pg_archivecleanup/pg_archivecleanup.c
@@ -23,6 +23,8 @@ const char *progname;
 
 /* Options and defaults */
 bool		dryrun = false;		/* are we performing a dry-run operation? */
+bool		removeBackupHistoryFile = false;	/* remove files including
+												 * backup history files */
 char	   *additional_ext = NULL;	/* Extension to remove from filenames */
 
 char	   *archiveLocation;	/* where to find the archive? */
@@ -118,7 +120,8 @@ CleanupPriorWALFiles(void)
 		 * file. Note that this means files are not removed in the order
 		 * they were originally written, in case this worries you.
 		 */
-		if ((IsXLogFileName(walfile) || IsPartialXLogFileName(walfile)) &&
+		if (((IsXLogFileName(walfile) || IsPartialXLogFileName(walfile)) ||
+			 (removeBackupHistoryFile && IsBackupHistoryFileName(walfile))) &&
 			strcmp(walfile + 8, exclusiveCleanupFileName + 8) < 0)
 		{
 			char		WALFilePath[MAXPGPATH * 2]; /* the file path
@@ -252,6 +255,7 @@ usage(void)
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... ARCHIVELOCATION OLDESTKEPTWALFILE\n"), progname);
 	printf(_("\nOptions:\n"));
+	printf(_("  -b             remove files including backup history files\n"));
 	printf(_("  -d             generate debug output (verbose mode)\n"));
 	printf(_("  -n             dry run, show the names of the files that would be removed\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
@@ -294,10 +298,13 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt(argc, argv, "dnx:")) != -1)
+	while ((c = getopt(argc, argv, "bdnx:")) != -1)
 	{
 		switch (c)
 		{
+			case 'b':			/* Remove backup history files too */
+				removeBackupHistoryFile = true;
+				break;
 			case 'd':			/* Debug mode */
 				pg_logging_increase_verbosity();
 				break;

diff --git a/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl b/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl
index 76321d1284..afd2ff6209 100644
--- a/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl
+++ b/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl
@@ -16,9 +16,14 @@ my @walfiles = (
 	'00010037000C.gz', '00010037000D',
 	'00010037000E', '00010037000F.partial',);
 
+my @walfiles_with_backuphistoryfile = (
+	'000200380007.0028.backup', '000200380008',
+	'000200380009',
+	'00020038000A', '00020038000B.007C9330.backup');
+
 sub create_files
 {
-	foreach my $fn (@walfiles, 'unrelated_file')
+	foreach my $fn (@_, 'unrelated_file')
 	{
 		open my $file, '>', "$tempdir/$fn";
 		print $file 'CONTENT';
@@ -27,7 +32,7 @@ sub create_files
 	return;
 }
 
-create_files();
Re: Add LZ4 compression in pg_dump
On 5/9/23 00:10, Michael Paquier wrote: > On Mon, May 08, 2023 at 08:00:39PM +0200, Tomas Vondra wrote: >> The LZ4Stream_write() forgot to move the pointer to the next chunk, so >> it was happily decompressing the initial chunk over and over. A bit >> embarrassing oversight :-( >> >> The custom format calls WriteDataToArchiveLZ4(), which was correct. >> >> The attached patch fixes this for me. > > Ouch. So this was corrupting the dumps and the compression when > trying to write more than two chunks at once, not the decompression > steps. That addresses the issue here as well, thanks! Yeah. Thanks for the report, should have been found during review. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] Allow Postgres to pick an unused port to listen
The documentation fails to build for me:

$ ninja docs
[1/2] Generating doc/src/sgml/postgres-full.xml with a custom command
FAILED: doc/src/sgml/postgres-full.xml
/usr/bin/python3 ../postgresql/doc/src/sgml/xmltools_dep_wrapper --targetname doc/src/sgml/postgres-full.xml --depfile doc/src/sgml/postgres-full.xml.d --tool /usr/bin/xmllint -- --nonet --noent --valid --path doc/src/sgml -o doc/src/sgml/postgres-full.xml ../postgresql/doc/src/sgml/postgres.sgml
../postgresql/doc/src/sgml/postgres.sgml:685: element para: validity error : Element entry is not declared in para list of possible children
ninja: build stopped: subcommand failed.

Removing the tag resolves the issue:

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cd07bad3b5..f71859f710 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -684,7 +684,7 @@ include_dir 'conf.d'
         The port can be set to 0 to make Postgres pick an unused port number.
-        The assigned port number can be then retrieved from postmaster.pid.
+        The assigned port number can be then retrieved from postmaster.pid.
Re: drop table in transaction
On Tue, May 9, 2023, at 7:42 AM, Fabrice Chapuis wrote: > Where in the code is written the mechanism used for isolation when drop table > is executed in a transaction RemoveRelations() in src/backend/commands/tablecmds.c If you are looking for a previous layer, check ExecDropStmt(). -- Euler Taveira EDB https://www.enterprisedb.com/
Re: psql tests hangs
On Tue, 9 May 2023 at 11:07, Pavel Stehule wrote:

> On Tue, 9 May 2023 at 10:48, Daniel Gustafsson wrote:
>
>> > On 9 May 2023, at 08:52, Pavel Stehule wrote:
>> >
>> > Hi
>> >
>> > I try to run make check-world. Now I have problems with tests of psql.
>> >
>> > I had to cancel the tests.
>> >
>> > log:
>> >
>> > [08:46:49.828](0.038s) ok 63 - no ON_ERROR_STOP, --single-transaction and multiple -c switches
>> > [08:46:49.860](0.033s) ok 64 - client-side error commits transaction, no ON_ERROR_STOP and multiple -c switches
>> > [08:46:49.928](0.067s) ok 65 - \copy from with DEFAULT: exit code 0
>> > [08:46:49.929](0.001s) ok 66 - \copy from with DEFAULT: no stderr
>> > [08:46:49.930](0.001s) ok 67 - \copy from with DEFAULT: matches
>> > death by signal at /home/pavel/src/postgresql.master/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 3042.
>> > # Postmaster PID for node "main" is 157863
>> > ### Stopping node "main" using mode immediate
>> > # Running: pg_ctl -D /home/pavel/src/postgresql.master/src/bin/psql/tmp_check/t_001_basic_main_data/pgdata -m immediate stop
>> > waiting for server to shut down done
>> > server stopped
>> > # No postmaster PID for node "main"
>> > [08:47:30.361](40.431s) # Tests were run but no plan was declared and done_testing() was not seen.
>> > [08:47:30.362](0.001s) # Looks like your test exited with 4 just after 67.
>> > Warning: unable to close filehandle $orig_stderr properly: Broken pipe during global destruction.
>>
>> I'm unable to reproduce, and this clearly works in the buildfarm and CI. Did
>> you run out of disk on the volume during the test or something similar?
>> Anything interesting in the serverlogs from the tmp_check install?
>
> I have enough free space on disc.
>
> I don't see anything interesting in the log (it is another run):
>
> 2023-05-09 08:50:04.839 CEST [158930] 001_basic.pl LOG: statement: COPY copy_default FROM STDIN with (format 'csv', default 'placeholder');
> 2023-05-09 08:50:04.841 CEST [158930] 001_basic.pl LOG: statement: SELECT * FROM copy_default
> 2023-05-09 08:50:04.879 CEST [158932] 001_basic.pl LOG: statement: SELECT 1.
> 2023-05-09 08:50:04.888 CEST [158932] 001_basic.pl LOG: statement: SELECT 1.
> 2023-05-09 08:50:04.898 CEST [158932] 001_basic.pl LOG: statement: SELECT 1.
> 2023-05-09 08:50:28.375 CEST [158862] LOG: received immediate shutdown request
> 2023-05-09 08:50:28.385 CEST [158862] LOG: database system is shut down
>
> backtrace from perl:
>
> Program received signal SIGINT, Interrupt.
> 0x7f387ecc1ade in select () from /lib64/libc.so.6
> (gdb) bt
> #0 0x7f387ecc1ade in select () from /lib64/libc.so.6
> #1 0x7f387e97363b in Perl_pp_sselect () from /lib64/libperl.so.5.36
> #2 0x7f387e917958 in Perl_runops_standard () from /lib64/libperl.so.5.36
> #3 0x7f387e88259d in perl_run () from /lib64/libperl.so.5.36
> #4 0x5588bceb234a in main ()
>
> Regards
>
> Pavel

I repeated another build with the same result.

Tested REL_15_STABLE branch without any problems.

Regards

Pavel
Re: Tables getting stuck at 's' state during logical replication
On Fri, May 5, 2023 at 7:27 PM Padmavathi G wrote: > > Some background on the setup on which I am trying to carry out the upgrade: > > We have a pod in a kubernetes cluster which contains the postgres 11 image. > We are following the logical replication process for upgrade > > Steps followed for logical replication: > > 1. Created a new pod in the same kubernetes cluster with the latest postgres > 15 image > 2. Created a publication (say publication 1) in the old pod including all > tables in a database > 3. Created a subscription (say subscription 1) in the new pod for the above > mentioned publication > 4. When monitoring the subscription via pg_subscription_rel in the > subscriber, I noticed that out of 45 tables 20 were in the 'r' state and 25 > were in 's' state and they remained in the same state for almost 2 days, > there was no improvement in the state. But the logs showed that the tables > which had 's' state also had "synchronization workers for > finished". > I think the problem happened in this step where some of the tables remained in 's' state. It is not clear why this could happen because apply worker should eventually update these relations state to 'r'. If this is reproducible, we can try two things to further investigate the issue: (a) Disable and Enable the subscription once and see if that helps; (b) try increasing the LOG level to DEBUG2 and see if we get any useful LOG message. -- With Regards, Amit Kapila.
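The two suggestions translate to the following statements on the subscriber (a sketch; the subscription name sub1 is a placeholder for whatever the subscription is actually called):

```sql
-- (a) bounce the subscription so the apply worker restarts and
--     re-evaluates the per-table sync states
ALTER SUBSCRIPTION sub1 DISABLE;
ALTER SUBSCRIPTION sub1 ENABLE;

-- (b) raise the log level to capture table-sync details
ALTER SYSTEM SET log_min_messages = 'debug2';
SELECT pg_reload_conf();

-- then re-check which tables are still 's' (synchronized) vs 'r' (ready)
SELECT srrelid::regclass AS tbl, srsubstate
FROM pg_subscription_rel;
```

Remember to lower log_min_messages again afterwards, since DEBUG2 is verbose.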
Re: Cleaning up array_in()
09.05.2023 06:06, Tom Lane wrote:
> Alexander Lakhin writes:
>> The only thing that confused me is the error message (it's not new, too):
>> select '{{1}}'::int[];
>> or even:
>> select '{{'::int[];
>> ERROR: number of array dimensions (7) exceeds the maximum allowed (6)
>
> Yeah, I didn't touch that, but it's pretty bogus because the first
> number will always be "7" even if you wrote more than 7 left braces,
> since the code errors out immediately upon finding that it's seen too
> many braces. The equivalent message in the PLs just says "number of
> array dimensions exceeds the maximum allowed (6)". I'm inclined to do
> likewise in array_in, but didn't touch it here.

I think that, strictly speaking, we have no array dimensions in the string '{{'; there are only characters (braces) during the string parsing. While in the PLs we definitely deal with real arrays, which have dimensions.

>> Beside that, I would like to note the following error text changes
>> (all of these are feasible, I think):
>
> I'll look into whether we can improve those, unless you had a patch in
> mind already?

Those messages looked more or less correct to me; I just wanted to note how they are changing (and haven't highlighted messages that are not), but if you see room for improvement here, please look into it (I have no good formulations yet).

Best regards,
Alexander
drop table in transaction
Where in the code is the isolation mechanism implemented that is used when DROP TABLE is executed in a transaction?

Thanks for your help,
Fabrice
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, May 9, 2023 at 7:50 AM Masahiko Sawada wrote: > > On Mon, May 8, 2023 at 8:09 PM Amit Kapila wrote: > > > > > > I think it is only possible for the leader apply can worker to try to > > receive the error message from an error queue after your 0002 patch. > > Because another place already detached from the queue before stopping > > the parallel apply workers. So, I combined both the patches and > > changed a few comments and a commit message. Let me know what you > > think of the attached. > > I have one comment on the detaching error queue part: > > + /* > +* Detach from the error_mq_handle for the parallel apply worker > before > +* stopping it. This prevents the leader apply worker from trying to > +* receive the message from the error queue that might already > be detached > +* by the parallel apply worker. > +*/ > + shm_mq_detach(winfo->error_mq_handle); > + winfo->error_mq_handle = NULL; > > In pa_detach_all_error_mq(), we try to detach error queues of all > workers in the pool. I think we should check if the queue is already > detached (i.e. is NULL) there. Otherwise, we will end up a SEGV if an > error happens after detaching the error queue and before removing the > worker from the pool. > Agreed, I have made this change, added the same check at one other place for the sake of consistency, and pushed the patch. -- With Regards, Amit Kapila.
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Peter,

Thank you for reviewing! PSA new version.

> General.
>
> 1. pg_dump option is documented to the user.
>
> I'm not sure about exposing the new pg_dump
> --logical-replication-slots-only option to the user.
>
> I thought this pg_dump option was intended only to be called
> *internally* by the pg_upgrade.
> But, this patch is also documenting the new option for the user (in
> case they want to call it independently?)
>
> Maybe exposing it is OK, but if you do that then I thought perhaps
> there should also be some additional pg_dump tests just for this
> option (i.e. tested independently of the pg_upgrade)

Right. I had written the documentation for the moment, but the option should not be documented if it is not exposed. Removed from the doc.

> Commit message
>
> 2.
> For pg_upgrade, when '--include-logical-replication-slots' is
> specified, it executes pg_dump with the new
> "--logical-replication-slots-only" option and restores from the dump.
> Apart from restoring schema, pg_resetwal must not be called after
> restoring replication slots. This is because the command discards WAL
> files and starts from a new segment, even if they are required by
> replication slots. This leads to an ERROR: "requested WAL segment XXX
> has already been removed". To avoid this, replication slots are
> restored at a different time than other objects, after running
> pg_resetwal.
>
> ~~
>
> The "Apart from" sentence maybe could do with some rewording. I
> noticed there is a code comment (below fragment) that says the same as
> this, but more clearly. Maybe it is better to use that code-comment
> wording in the commit message.
>
> + * XXX We cannot dump replication slots at the same time as the schema
> + * dump because we need to separate the timing of restoring
> + * replication slots and other objects. Replication slots, in
> + * particular, should not be restored before executing the pg_resetwal
> + * command because it will remove WALs that are required by the slots.

Changed.
> src/bin/pg_dump/pg_dump.c
>
> 3. main
>
> + if (dopt.logical_slots_only && !dopt.binary_upgrade)
> +     pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
> +
> + if (dopt.logical_slots_only && dopt.dataOnly)
> +     pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
> + if (dopt.logical_slots_only && dopt.schemaOnly)
> +     pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
>
> Consider if it might be simpler to combine together all those
> dopt.logical_slots_only checks.
>
> SUGGESTION
>
> if (dopt.logical_slots_only)
> {
>     if (!dopt.binary_upgrade)
>         pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
>
>     if (dopt.dataOnly)
>         pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
>     if (dopt.schemaOnly)
>         pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
> }

Right, fixed.

> 4. getLogicalReplicationSlots
>
> + /* Check whether we should dump or not */
> + if (fout->remoteVersion < 16 || !dopt->logical_slots_only)
> +     return;
>
> I'm not sure if this check is necessary. Given the way this function
> is called, is it possible for this check to fail? Maybe that quick
> exit would be better coded as an Assert?

I think the version check is needed because it is not done elsewhere yet. (Actually, I'm not sure the restriction is needed at all, but for now I will keep it.) About dopt->logical_slots_only, I agreed and removed that check.

> 5. dumpLogicalReplicationSlot
>
> +dumpLogicalReplicationSlot(Archive *fout,
> +                           const LogicalReplicationSlotInfo *slotinfo)
> +{
> +    DumpOptions *dopt = fout->dopt;
> +
> +    if (!dopt->logical_slots_only)
> +        return;
>
> (Similar to the previous comment.) Is it even possible to arrive here
> when dopt->logical_slots_only is false? Maybe that quick exit would be
> better coded as an Assert?

I think it is not possible, so I changed it to an Assert().

> 6.
> + PQExpBuffer query = createPQExpBuffer();
> + char       *slotname = pg_strdup(slotinfo->dobj.name);
>
> I wondered if it was really necessary to strdup/free this slotname.
> And if it is, then why don't you do this for the slotinfo->plugin
> field?

This was debris from my testing. Removed.

> src/bin/pg_upgrade/check.c
>
> 7. check_and_dump_old_cluster
>
>   /* Extract a list of databases and tables from the old cluster */
>   get_db_and_rel_infos(_cluster);
> + get_logical_slot_infos(_cluster);
>
> Is it correct to associate this new call with that existing comment
> about "databases and tables"?

Added a comment.

> 8. check_new_cluster
>
> @@ -188,6 +190,7 @@ void
>  check_new_cluster(void)
>  {
>   get_db_and_rel_infos(_cluster);
> + get_logical_slot_infos(_cluster);
>
>   check_new_cluster_is_empty();
>
> @@ -210,6 +213,9 @@ check_new_cluster(void)
>
Re: base backup vs. concurrent truncation
Hi Robert, > I admit I haven't done the legwork to nail down a test > case where everything comes together just right to show user-visible > breakage, but your success in finding one where it doesn't is no proof > of anything. Respectfully, what made you think this was my intention? Quite the opposite, personally I am inclined to think that the problem does exist. In order to fix it however we need a test that reliably reproduces it first. Otherwise there is no way to figure out whether the fix was correct or not. What the experiment showed is that the test scenario you initially described is probably the wrong one for reasons yet to be understood and we need to come up with a better one. -- Best regards, Aleksander Alekseev
Re: psql tests hangs
út 9. 5. 2023 v 10:48 odesílatel Daniel Gustafsson napsal: > > On 9 May 2023, at 08:52, Pavel Stehule wrote: > > > > Hi > > > > I try run make check-world. Now I have problems with tests of psql > > > > I had to cancel tests > > > > log: > > > > [08:46:49.828](0.038s) ok 63 - no ON_ERROR_STOP, --single-transaction > and multiple -c switches > > [08:46:49.860](0.033s) ok 64 - client-side error commits transaction, no > ON_ERROR_STOP and multiple -c switches > > [08:46:49.928](0.067s) ok 65 - \copy from with DEFAULT: exit code 0 > > [08:46:49.929](0.001s) ok 66 - \copy from with DEFAULT: no stderr > > [08:46:49.930](0.001s) ok 67 - \copy from with DEFAULT: matches > > death by signal at > /home/pavel/src/postgresql.master/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/Cluster.pm > line 3042. > > # Postmaster PID for node "main" is 157863 > > ### Stopping node "main" using mode immediate > > # Running: pg_ctl -D > /home/pavel/src/postgresql.master/src/bin/psql/tmp_check/t_001_basic_main_data/pgdata > -m immediate stop > > waiting for server to shut down done > > server stopped > > # No postmaster PID for node "main" > > [08:47:30.361](40.431s) # Tests were run but no plan was declared and > done_testing() was not seen. > > [08:47:30.362](0.001s) # Looks like your test exited with 4 just after > 67. > > Warning: unable to close filehandle $orig_stderr properly: Broken pipe > during global destruction. > > I'm unable to reproduce, and this clearly works in the buildfarm and CI. > Did > you run out of disk on the volume during the test or something similar? > Anything interesting in the serverlogs from the tmp_check install? 
I have enough free space on disk, and I don't see anything interesting in the
log (this is from another run):

2023-05-09 08:50:04.839 CEST [158930] 001_basic.pl LOG: statement: COPY copy_default FROM STDIN with (format 'csv', default 'placeholder');
2023-05-09 08:50:04.841 CEST [158930] 001_basic.pl LOG: statement: SELECT * FROM copy_default
2023-05-09 08:50:04.879 CEST [158932] 001_basic.pl LOG: statement: SELECT 1
2023-05-09 08:50:04.888 CEST [158932] 001_basic.pl LOG: statement: SELECT 1
2023-05-09 08:50:04.898 CEST [158932] 001_basic.pl LOG: statement: SELECT 1
2023-05-09 08:50:28.375 CEST [158862] LOG: received immediate shutdown request
2023-05-09 08:50:28.385 CEST [158862] LOG: database system is shut down

Backtrace from the hung perl process:

Program received signal SIGINT, Interrupt.
0x7f387ecc1ade in select () from /lib64/libc.so.6
(gdb) bt
#0 0x7f387ecc1ade in select () from /lib64/libc.so.6
#1 0x7f387e97363b in Perl_pp_sselect () from /lib64/libperl.so.5.36
#2 0x7f387e917958 in Perl_runops_standard () from /lib64/libperl.so.5.36
#3 0x7f387e88259d in perl_run () from /lib64/libperl.so.5.36
#4 0x5588bceb234a in main ()

Regards

Pavel

> --
> Daniel Gustafsson
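For context on the \watch angle of this thread: the comment above the removed test says the interval value is parsed with locale-aware strtod(). Whether or not that is what is going wrong on FC38, it is worth understanding the pitfall. The sketch below is an illustrative Python model of strtod()'s digit scanning, not psql's actual code; `strtod_like` and its `decimal_point` parameter are invented names for the illustration.

```python
def strtod_like(s, decimal_point="."):
    """Minimal model of C strtod(): parse an optional sign, digits, and at
    most one locale-specific decimal point; stop at the first character
    that fits neither. Under a comma-decimal locale, "0.01" therefore
    parses as 0 because scanning stops at the unexpected '.'."""
    i = 0
    sign = 1.0
    if i < len(s) and s[i] in "+-":
        sign = -1.0 if s[i] == "-" else 1.0
        i += 1
    int_digits = ""
    frac_digits = None  # becomes "" once the decimal point is seen
    while i < len(s):
        ch = s[i]
        if ch.isdigit():
            if frac_digits is None:
                int_digits += ch
            else:
                frac_digits += ch
        elif ch == decimal_point and frac_digits is None:
            frac_digits = ""
        else:
            break  # first non-matching character ends the parse
        i += 1
    value = float(int_digits or "0")
    if frac_digits:
        value += int(frac_digits) / (10 ** len(frac_digits))
    return sign * value

print(strtod_like("0.01", "."))  # 0.01 — C locale
print(strtod_like("0.01", ","))  # 0.0  — comma-decimal locale stops at '.'
print(strtod_like("0,01", ","))  # 0.01 — the locale-native spelling works
```

This is why the test formats the interval with sprintf('%g', 0.01): the formatted string has to round-trip through whatever decimal separator the test environment's locale uses.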
Re: psql tests hangs
> On 9 May 2023, at 08:52, Pavel Stehule wrote:
>
> Hi
>
> I try run make check-world. Now I have problems with tests of psql
>
> I had to cancel tests
>
> log:
>
> [08:46:49.828](0.038s) ok 63 - no ON_ERROR_STOP, --single-transaction and multiple -c switches
> [08:46:49.860](0.033s) ok 64 - client-side error commits transaction, no ON_ERROR_STOP and multiple -c switches
> [08:46:49.928](0.067s) ok 65 - \copy from with DEFAULT: exit code 0
> [08:46:49.929](0.001s) ok 66 - \copy from with DEFAULT: no stderr
> [08:46:49.930](0.001s) ok 67 - \copy from with DEFAULT: matches
> death by signal at /home/pavel/src/postgresql.master/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 3042.
> # Postmaster PID for node "main" is 157863
> ### Stopping node "main" using mode immediate
> # Running: pg_ctl -D /home/pavel/src/postgresql.master/src/bin/psql/tmp_check/t_001_basic_main_data/pgdata -m immediate stop
> waiting for server to shut down done
> server stopped
> # No postmaster PID for node "main"
> [08:47:30.361](40.431s) # Tests were run but no plan was declared and done_testing() was not seen.
> [08:47:30.362](0.001s) # Looks like your test exited with 4 just after 67.
> Warning: unable to close filehandle $orig_stderr properly: Broken pipe during global destruction.

I'm unable to reproduce, and this clearly works in the buildfarm and CI. Did
you run out of disk on the volume during the test or something similar?
Anything interesting in the serverlogs from the tmp_check install?

--
Daniel Gustafsson
Re: Order changes in PG16 since ICU introduction
On 2023-Apr-24, Peter Eisentraut wrote: > The GUC settings lc_collate and lc_ctype are from a time when those locale > settings were cluster-global. When we made those locale settings > per-database (PG 8.4), we kept them as read-only. As of PG 15, you can use > ICU as the per-database locale provider, so what is being attempted in the > above example is already meaningless before PG 16, since you need to look > into pg_database to find out what is really happening. > > I think we should just remove the GUC parameters lc_collate and lc_ctype. I agree with removing these in v16, since they are going to become more meaningless and confusing. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: [DOC] Update ALTER SUBSCRIPTION documentation
On Tue, May 9, 2023 at 3:40 PM Amit Kapila wrote: > > On Mon, May 8, 2023 at 1:51 PM Masahiko Sawada wrote: > > > > Apart from the documentation change, given that setting slot_name = > > NONE always requires for the subscription to be disabled beforehand, > > does it make sense to change ALTER SUBSCRIPTION SET so that we disable > > the subscription when setting slot_name = NONE? Setting slot_name to a > > valid slot name doesn't enable the subscription, though. > > > > I think this is worth considering. Offhand, I don't see any problem > with this idea but users may not like the automatic disabling of > subscriptions along with setting slot_name=NONE. It would be better to > discuss this in a separate thread to seek the opinion of others. > Agreed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Improve list manipulation in several places
On 2023-May-08, Ranier Vilela wrote:
> On Mon, May 8, 2023 at 14:26, Alvaro Herrera wrote:
>
> > The problem I see is that each of these new functions has a single
> > caller, and the only one that looks like it could have a performance
> > advantage is list_copy_move_nth_to_head() (which is the weirdest of the
> > lot). I'm inclined not to have any of these single-use functions unless
> > a performance case can be made for them.
>
> I think you missed list_nth_xid; it makes perfect sense to exist.

I saw that one; it's just syntactic sugar, just like list_nth_int and
list_nth_oid, except it has only one possible callsite instead of a dozen like
those others. I see no harm in that function, but no practical advantage to it
either. Xid lists are a very fringe feature, there being exactly one place in
the whole server that uses them.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
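For readers following along, here is a tiny model of the semantics implied by the two helper names under discussion. This is Python used purely for illustration; the real proposals are C functions over PostgreSQL's List type, and the behavior shown here is inferred from the names, not taken from the patch.

```python
# Illustrative models (not the C implementations) of the two helpers.

def list_nth_xid(lst, n):
    # Like list_nth_int/list_nth_oid: plain positional access, i.e.
    # pure syntactic sugar over "the nth cell of the list".
    return lst[n]

def list_copy_move_nth_to_head(lst, n):
    # Copy the list with element n moved to the front;
    # the original list is left untouched.
    return [lst[n]] + lst[:n] + lst[n + 1:]

xids = [100, 101, 102, 103]
print(list_nth_xid(xids, 2))                # 102
print(list_copy_move_nth_to_head(xids, 2))  # [102, 100, 101, 103]
print(xids)                                 # [100, 101, 102, 103] (unchanged)
```

The copy-with-move variant is the only one with a plausible performance story in C (a single allocation plus two memcpy-style copies instead of copy-then-delete-then-prepend), which is why it is singled out in the discussion above.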
Re: Add two missing tests in 035_standby_logical_decoding.pl
Hi,

On 5/9/23 8:02 AM, Amit Kapila wrote:
> On Mon, May 8, 2023 at 1:45 PM Drouvot, Bertrand wrote:
>
> Why not initialize the cascading standby node just before the standby
> promotion test: "Test standby promotion and logical decoding behavior after
> the standby gets promoted."? That way we will avoid any unknown side-effects
> of cascading standby and it will anyway look more logical to initialize it
> where the test needs it.

Yeah, that's even better. Moved the physical slot creation on the standby and
the cascading standby initialization to where they are "strictly" needed in the
attached V2. Also ensured that hot_standby_feedback is set to on on the
cascading standby, to be on the safe side of things.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 0292d14a62b72b38197d434d85ee2c6a7f2cec00 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Tue, 9 May 2023 06:43:59 +
Subject: [PATCH v2] fix retaining WAL test

---
 .../t/035_standby_logical_decoding.pl | 36 +--
 1 file changed, 18 insertions(+), 18 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ad1def2474..2b4a688330 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -276,20 +276,6 @@ $node_standby->append_conf('postgresql.conf',
 	max_replication_slots = 5]);
 $node_standby->start;
 $node_primary->wait_for_replay_catchup($node_standby);
-$node_standby->safe_psql('testdb', qq[SELECT * FROM pg_create_physical_replication_slot('$standby_physical_slotname');]);
-
-###
-# Initialize cascading standby node
-###
-$node_standby->backup($backup_name);
-$node_cascading_standby->init_from_backup(
-	$node_standby, $backup_name,
-	has_streaming => 1,
-	has_restoring => 1);
-$node_cascading_standby->append_conf('postgresql.conf',
-	qq[primary_slot_name = '$standby_physical_slotname']);
-$node_cascading_standby->start;
-$node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
 
 ###
 # Initialize subscriber node
@@ -503,9 +489,6 @@ check_slots_conflicting_status(1);
 # Verify that invalidated logical slots do not lead to retaining WAL.
 ##
 
-# Wait for the cascading standby to catchup before removing the WAL file(s)
-$node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
-
 # Get the restart_lsn from an invalidated slot
 my $restart_lsn = $node_standby->safe_psql('postgres',
 	"SELECT restart_lsn from pg_replication_slots WHERE slot_name = 'vacuum_full_activeslot' and conflicting is true;"
@@ -777,9 +760,26 @@ $node_standby->reload;
 $node_primary->psql('postgres', q[CREATE DATABASE testdb]);
 $node_primary->safe_psql('testdb',
 	qq[CREATE TABLE decoding_test(x integer, y text);]);
-# Wait for the standby to catchup before creating the slots
+# Wait for the standby to catchup before initializing the cascading standby
 $node_primary->wait_for_replay_catchup($node_standby);
 
+# Create a physical replication slot on the standby.
+# Keep this step after the "Verify that invalidated logical slots do not lead
+# to retaining WAL" test (as the physical slot on the standby could prevent the
+# WAL file removal).
+$node_standby->safe_psql('testdb', qq[SELECT * FROM pg_create_physical_replication_slot('$standby_physical_slotname');]);
+
+# Initialize cascading standby node
+$node_standby->backup($backup_name);
+$node_cascading_standby->init_from_backup(
+	$node_standby, $backup_name,
+	has_streaming => 1,
+	has_restoring => 1);
+$node_cascading_standby->append_conf('postgresql.conf',
+	qq[primary_slot_name = '$standby_physical_slotname'
+	hot_standby_feedback = on]);
+$node_cascading_standby->start;
+
 # create the logical slots
 create_logical_slots($node_standby, 'promotion_');
-- 
2.34.1
psql tests hangs
Hi,

I tried running make check-world, and now I'm having problems with the psql
tests; I had to cancel the test run.

Log:

[08:46:49.828](0.038s) ok 63 - no ON_ERROR_STOP, --single-transaction and multiple -c switches
[08:46:49.860](0.033s) ok 64 - client-side error commits transaction, no ON_ERROR_STOP and multiple -c switches
[08:46:49.928](0.067s) ok 65 - \copy from with DEFAULT: exit code 0
[08:46:49.929](0.001s) ok 66 - \copy from with DEFAULT: no stderr
[08:46:49.930](0.001s) ok 67 - \copy from with DEFAULT: matches
death by signal at /home/pavel/src/postgresql.master/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 3042.
# Postmaster PID for node "main" is 157863
### Stopping node "main" using mode immediate
# Running: pg_ctl -D /home/pavel/src/postgresql.master/src/bin/psql/tmp_check/t_001_basic_main_data/pgdata -m immediate stop
waiting for server to shut down done
server stopped
# No postmaster PID for node "main"
[08:47:30.361](40.431s) # Tests were run but no plan was declared and done_testing() was not seen.
[08:47:30.362](0.001s) # Looks like your test exited with 4 just after 67.
Warning: unable to close filehandle $orig_stderr properly: Broken pipe during global destruction.

I use Fedora 38.

Regards

Pavel
Re: [DOC] Update ALTER SUBSCRIPTION documentation
On Mon, May 8, 2023 at 1:51 PM Masahiko Sawada wrote: > > Apart from the documentation change, given that setting slot_name = > NONE always requires for the subscription to be disabled beforehand, > does it make sense to change ALTER SUBSCRIPTION SET so that we disable > the subscription when setting slot_name = NONE? Setting slot_name to a > valid slot name doesn't enable the subscription, though. > I think this is worth considering. Offhand, I don't see any problem with this idea but users may not like the automatic disabling of subscriptions along with setting slot_name=NONE. It would be better to discuss this in a separate thread to seek the opinion of others. -- With Regards, Amit Kapila.
Re: WAL Insertion Lock Improvements
On Tue, May 09, 2023 at 02:10:20PM +0900, Michael Paquier wrote:
> Should we split this patch into two parts, as they aim at tackling two
> different cases then? One for LWLockConflictsWithVar() and
> LWLockReleaseClearVar() which are the straight-forward pieces
> (using one pg_atomic_write_u64() in LWLockUpdateVar instead), then
> a second for LWLockUpdateVar()?

I have been studying that a bit more, and I'd like to take this suggestion
back. Apologies for the noise.
--
Michael
Re: Add two missing tests in 035_standby_logical_decoding.pl
On Mon, May 8, 2023 at 1:45 PM Drouvot, Bertrand wrote: > > On 5/8/23 4:42 AM, Amit Kapila wrote: > > On Sat, May 6, 2023 at 9:33 PM Drouvot, Bertrand > > wrote: > >> > >> On 5/6/23 3:28 PM, Amit Kapila wrote: > >>> On Sat, May 6, 2023 at 1:52 PM Drouvot, Bertrand > >>> wrote: > >> > >>> Next steps: > >>> = > >>> 1. The first thing is we should verify this theory by adding some LOG > >>> in KeepLogSeg() to see if the _logSegNo is reduced due to the value > >>> returned by XLogGetReplicationSlotMinimumLSN(). > >> > >> Yeah, will do that early next week. > > It's done with the following changes: > > https://github.com/bdrouvot/postgres/commit/79e1bd9ab429a22f876b9364eb8a0da2dacaaef7#diff-c1cb3ab2a19606390c1a7ed00ffe5a45531702ca5faf999d401c548f8951c65bL7454-R7514 > > With that in place, there is one failing test here: > https://cirrus-ci.com/task/5173216310722560 > > Where we can see: > > 2023-05-08 07:42:56.301 UTC [18038][checkpointer] LOCATION: > UpdateMinRecoveryPoint, xlog.c:2500 > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG: 0: > CreateRestartPoint: After XLByteToSeg RedoRecPtr is 0/498, _logSegNo is 4 > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION: > CreateRestartPoint, xlog.c:7271 > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG: 0: KeepLogSeg: > segno changed to 4 due to XLByteToSeg > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION: KeepLogSeg, > xlog.c:7473 > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG: 0: KeepLogSeg: > segno changed to 3 due to XLogGetReplicationSlotMinimumLSN() > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION: KeepLogSeg, > xlog.c:7483 > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG: 0: > CreateRestartPoint: After KeepLogSeg RedoRecPtr is 0/498, endptr is > 0/4000148, _logSegNo is 3 > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION: > CreateRestartPoint, xlog.c:7284 > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG: 0: BDT1 about to > call 
RemoveOldXlogFiles in CreateRestartPoint > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION: > CreateRestartPoint, xlog.c:7313 > 2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG: 0: attempting to > remove WAL segments older than log file 0002 > > So the suspicion about XLogGetReplicationSlotMinimumLSN() was correct > (_logSegNo moved from > 4 to 3 due to XLogGetReplicationSlotMinimumLSN()). > > >> What about postponing the physical slot creation on the standby and the > >> cascading standby node initialization after this test? > >> > > > > Yeah, that is also possible. But, I have a few questions regarding > > that: (a) There doesn't seem to be a physical slot on cascading > > standby, if I am missing something, can you please point me to the > > relevant part of the test? > > That's right. There is a physical slot only on the primary and on the standby. > > What I meant up-thread is to also postpone the cascading standby node > initialization > after this test (once the physical slot on the standby is created). > > Please find attached a proposal doing so. > > > (b) Which test is currently dependent on > > the physical slot on standby? > > Not a test but the cascading standby initialization with the > "primary_slot_name" parameter. > > Also, now I think that's better to have the physical slot on the standby + > hsf set to on on the > cascading standby (coming from the standby backup). > > Idea is to avoid any risk of logical slot invalidation on the cascading > standby in the > standby promotion test. > Why not initialize the cascading standby node just before the standby promotion test: "Test standby promotion and logical decoding behavior after the standby gets promoted."? That way we will avoid any unknown side-effects of cascading standby and it will anyway look more logical to initialize it where the test needs it. -- With Regards, Amit Kapila.
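The clamping behavior visible in the quoted log (_logSegNo dropping from 4 to 3 because of XLogGetReplicationSlotMinimumLSN()) can be sketched as follows. This is a simplified Python model of the retention rule described in the thread, not the actual KeepLogSeg() code, and the LSN values are illustrative numbers shaped like the log output rather than the real ones:

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default wal_segment_size (16 MB)

def seg_no(lsn):
    # Model of XLByteToSeg: which segment a WAL byte position falls into.
    return lsn // WAL_SEG_SIZE

def keep_log_seg(redo_ptr, slot_min_lsn):
    # Oldest segment that must be kept: start from the checkpoint's redo
    # segment, then lower it if some replication slot (here, the physical
    # slot created on the standby) still needs older WAL. Segments older
    # than the returned one are candidates for removal.
    keep = seg_no(redo_ptr)
    if slot_min_lsn is not None:
        keep = min(keep, seg_no(slot_min_lsn))
    return keep

# Redo pointer in segment 4, slot minimum still in segment 3: segment 3
# must be retained, so the "invalidated slots do not retain WAL" check
# that expects the old segment to be gone will fail.
print(keep_log_seg(4 * WAL_SEG_SIZE + 0x98, 3 * WAL_SEG_SIZE + 0x148))  # 3
print(keep_log_seg(4 * WAL_SEG_SIZE + 0x98, None))                      # 4
```

This is exactly why V2 postpones creating the standby's physical slot (and the cascading standby that needs it) until after the WAL-retention test: with no slot in play at that point, the second case applies and the old segment can be removed.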