Re: Time to drop plpython2?
On Thu, Feb 17, 2022 at 10:08:55AM -0800, Andres Freund wrote: > On 2022-02-16 23:14:46 -0800, Andres Freund wrote: >> I've pinged the owners of the animals failing so far: >> - myna, butterflyfish > > Fixed, as noted by Michael on this thread. "Fixed" is an incorrect word here; "temporarily bypassed" fits better :) Unfortunately, I had to remove --with-python on both animals for the time being, as I was not able to figure out why Python.h could not be found in those installations, and it was Friday afternoon. I'll try to investigate and re-enable that some time next week. -- Michael
Re: improve --with-lz4 install documentation
On Fri, Feb 18, 2022 at 04:49:48PM +0900, Michael Paquier wrote: > With everything that has been merged recently based on LZ4, it makes > sense to me to simplify things as you are suggesting. > > install-windows.sgml includes the same description, so we'd better do > the same thing and simplify the paragraph of LZ4 there as well, no? I have checked the docs for any other areas that could be changed, and did not notice anything else, so applied. Thanks, Jeevan. -- Michael
Re: pgsql: Add support for building with ZSTD.
Hi, On Fri, Feb 18, 2022 at 06:53:10PM +, Robert Haas wrote: > Add support for building with ZSTD. > > This commit doesn't actually add anything that uses ZSTD; that will be > done separately. It just puts the basic infrastructure into place. > > Jeevan Ladhe, Robert Haas, and Michael Paquier. Reviewed by Justin > Pryzby and Andres Freund. I completely forgot that the section of the docs dedicated to the TAP tests with MSVC also needs a refresh to mention the new variable ZSTD, as of the attached. Do you mind if I apply that? Thanks, -- Michael

diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index 98fa6962f6..91b1410e4a 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -557,6 +557,15 @@ $ENV{PROVE_TESTS}='t/020*.pl t/010*.pl'
       PATH.
+
+
+      ZSTD
+
+       Path to a zstd command. The default is
+       zstd, that would be the command found in
+       PATH.
+
+
Re: MultiXact\SLRU buffers configuration
> On 8 Apr 2021, at 17:22, Thomas Munro wrote: > > Thanks! I chickened out of committing a buffer replacement algorithm > patch written 11 hours before the feature freeze, but I also didn't > really want to commit the GUC patch without that. Ahh, if only we'd > latched onto the real problems here just a little sooner, but there is > always PostgreSQL 15, I heard it's going to be amazing. Moved to next > CF. Hi Thomas! The feature freeze is approaching again. I see that you are working on moving SLRUs to buffer pools, but it is not clear in which PG version that will land. And there is 100% consensus that the first patch is useful and helps to prevent big issues. Maybe let's commit the first step without lifting the default xact_buffers limit? Or the first patch as-is, with any simple technique that prevents a linear search of the SLRU buffers. Best regards, Andrey Borodin.
Re: logical decoding and replication of sequences
On Sat, Feb 19, 2022 at 7:48 AM Tomas Vondra wrote: > > Do we need to handle both FOR ALL TABLES and FOR ALL SEQUENCES at the > same time? That seems like a reasonable thing people might want. > +1. That seems reasonable to me as well. I think the same will apply to FOR ALL TABLES IN SCHEMA and FOR ALL SEQUENCES IN SCHEMA. -- With Regards, Amit Kapila.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Hi, On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote: > > The backend_pid contains a valid value only during > the CHECKPOINT command issued by the backend explicitly, otherwise the > value will be 0. We may have to add an additional field to > 'CheckpointerShmemStruct' to hold the backend pid. The backend > requesting the checkpoint will update its pid to this structure. > Kindly let me know if you still feel the backend_pid field is not > necessary. There are more scenarios where you can have a backend requesting a checkpoint and waiting for its completion, and there may be more than one backend concerned, so I don't think that storing only one / the first backend pid is ok. > > And also while looking at the patch I see there's the same problem that I > > mentioned in the previous thread, which is that the effective flags can be > > updated once the checkpoint started, and as-is the view won't reflect that. > > It > > also means that you can't simply display one of wal, time or force but a > > possible combination of the flags (including the one not handled in v1). > > If I understand the above comment properly, it has 2 points. First is > to display the combination of flags rather than just displaying wal, > time or force - The idea behind this is to just let the user know the > reason for checkpointing. That is, the checkpoint is started because > max_wal_size is reached or checkpoint_timeout expired or explicitly > issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE, > CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint > has to be performed. Hence I have not included those in the view. If > it is really required, I would like to modify the code to include > other flags and display the combination. I think all the information should be exposed. Only knowing why the current checkpoint has been triggered without any further information seems a bit useless. Think for instance for cases like [1]. > Second point is to reflect > the updated flags in the view. AFAIK, there is a possibility that the > flags get updated during the on-going checkpoint but the reason for > checkpoint (wal, time or force) will remain same for the current > checkpoint. There might be a change in how checkpoint has to be > performed if CHECKPOINT_IMMEDIATE flag is set. If we go with > displaying the combination of flags in the view, then probably we may > have to reflect this in the view. You can only "upgrade" a checkpoint, but not "downgrade" it. So if for instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is possible) you can easily know which one was the one that triggered the checkpoint and which one was added later. > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be > > > added for this purpose. > > > > Maybe num_process_waiting? > > I feel 'processes_wiating' aligns more with the naming conventions of > the fields of the existing progress views. There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have a strong opinion on it, just make sure to correct the typo. > > > Probably writing of buffers or syncing files may complete before > > > pg_is_in_recovery() returns false. But there are some cleanup > > > operations happen as part of the checkpoint. During this scenario, we > > > may get false value for pg_is_in_recovery(). Please refer following > > > piece of code which is present in CreateRestartpoint().
> > > > > > if (!RecoveryInProgress()) > > > replayTLI = XLogCtl->InsertTimeLineID; > > > > Then maybe we could store the timeline rather than the kind of checkpoint? > > You should still be able to compute the information while giving a bit more > > information for the same memory usage. > > Can you please describe more about how checkpoint/restartpoint can be > confirmed using the timeline id. If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a restartpoint if the checkpoint's timeline is different from the current timeline? [1] https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de
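For reference, decoding a combined flags value along the lines discussed above could look roughly like this (a sketch using the existing CHECKPOINT_* bits from xlog.h, not code from the patch):

static const char *
checkpoint_trigger(int flags)
{
	/* a "cause" bit, if set, identifies what started the checkpoint */
	if (flags & CHECKPOINT_CAUSE_XLOG)
		return "wal";
	if (flags & CHECKPOINT_CAUSE_TIME)
		return "time";
	/* otherwise it was requested explicitly (CHECKPOINT command etc.) */
	if (flags & CHECKPOINT_FORCE)
		return "force";
	return "unknown";
}

CHECKPOINT_IMMEDIATE, CHECKPOINT_WAIT and CHECKPOINT_FLUSH_ALL only describe how the checkpoint runs, so they could be reported as separate columns.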
Re: Fix formatting of Interval output
Joseph Koshakow writes: > When formatting the output of an Interval, we call abs() on the hours > field of the Interval. Calling abs(INT_MIN) returns back INT_MIN > causing the output to contain two '-' characters. The attached patch > fixes that issue by special casing INT_MIN hours. Good catch, but it looks to me like three out of the four formats in EncodeInterval have variants of this problem --- there are assumptions throughout that code that we can compute "-x" or "abs(x)" without fear. Not much point in fixing only one symptom. Also, I notice that there's an overflow hazard upstream of here, in interval2tm: regression=# select interval '214748364 hours' * 11; ERROR: interval out of range regression=# \errverbose ERROR: 22008: interval out of range LOCATION: interval2tm, timestamp.c:1982 There's no good excuse for not being able to print a value that we computed successfully. I wonder if the most reasonable fix would be to start using int64 instead of int arithmetic for the values that are potentially large. I doubt that we'd be taking much of a performance hit on modern hardware. regards, tom lane
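A quick standalone illustration of the failure mode (not the EncodeInterval code itself, just a sketch of the underlying C issue):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

int
main(void)
{
	int		hours = INT_MIN;

	/*
	 * abs(INT_MIN) overflows; in practice it wraps back to INT_MIN, which
	 * is what produces the doubled '-' in the formatted output.
	 */
	printf("abs: %d\n", abs(hours));

	/* widening to 64 bits before negating avoids the problem */
	int64_t	wide = hours;
	printf("abs64: %" PRId64 "\n", wide < 0 ? -wide : wide);

	return 0;
}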
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Sat, Feb 19, 2022 at 2:24 AM Nathan Bossart wrote: > > On Fri, Feb 18, 2022 at 10:48:10PM +0530, Ashutosh Sharma wrote: > > Some review comments on the latest version: > > Thanks for the feedback! Before I start spending more time on this one, I > should probably ask if this has any chance of making it into v15. I don't see any reason why it can't make it to v15. However, it is not super urgent as the users facing this problem have a choice. They can use non-exclusive mode. -- With Regards, Ashutosh Sharma.
Re: Make mesage at end-of-recovery less scary.
On Thu, Feb 17, 2022 at 1:20 PM Kyotaro Horiguchi wrote: > > At Tue, 15 Feb 2022 20:17:20 +0530, Ashutosh Sharma > wrote in > > OK. The v13 patch looks good. I have marked it as ready to commit. > > Thank you for working on all my review comments. > > Thanks! But the recent xlog.c refactoring crashes into this patch. > And I found a silly bug while rebasing. > > xlog.c:12463 / xlogrecovery.c:3168 > if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen, > .. > { > + Assert(!StandbyMode); > ... > + xlogreader->EndOfWAL = true; > > Yeah, I forgot about promotion there.. Yes, we exit WaitForWALToBecomeAvailable() even in standby mode provided the user has requested promotion. So checking for the !StandbyMode condition alone was not enough. So what I should have done is > setting EndOfWAL according to StandbyMode. > > + Assert(!StandbyMode || CheckForStandbyTrigger()); > ... > + /* promotion exit is not end-of-WAL */ > + xlogreader->EndOfWAL = !StandbyMode; > The changes look good. Thanks! -- With Regards, Ashutosh Sharma.
Re: Timeout control within tests
On Fri, Feb 18, 2022 at 10:26:52AM -0500, Tom Lane wrote: > Noah Misch writes: > > On Thu, Feb 17, 2022 at 09:48:25PM -0800, Andres Freund wrote: > >> Meson's test runner has the concept of a "timeout multiplier" for ways of > >> running tests. Meson's stuff is about entire tests (i.e. one tap test), so > >> doesn't apply here, but I wonder if we shouldn't do something similar? > > > Hmmm. It is good if the user can express an intent that continues to make > > sense if we change the default timeout. For the buildfarm use case, a > > multiplier is moderately better on that axis (PG_TEST_TIMEOUT_MULTIPLIER=100 > > beats PG_TEST_TIMEOUT_DEFAULT=18000). For the hacker use case, an absolute > > value is substantially better on that axis (PG_TEST_TIMEOUT_DEFAULT=3 beats > > PG_TEST_TIMEOUT_MULTIPLIER=.01). > > FWIW, I'm fairly sure that PGISOLATIONTIMEOUT=300 was selected after > finding that smaller values didn't work reliably in the buildfarm. > Now maybe 741d7f1 fixed that, but I wouldn't count on it. So while I > approve of the idea to remove PGISOLATIONTIMEOUT in favor of using this > centralized setting, I think that we might need to have a multiplier > there, or else we'll end up with PG_TEST_TIMEOUT_DEFAULT set to 300 > across the board. Perhaps the latter is fine, but a multiplier seems a > bit more flexible. The PGISOLATIONTIMEOUT replacement was 2*timeout_default, so isolation suites would get 2*180s=360s. (I don't want to lower any default timeouts, but I don't mind raising them.) In a sense, PG_TEST_TIMEOUT_DEFAULT is a multiplier with as many sites as possible multiplying it by 1. The patch has multiples at two code sites. > On the other hand, I also support your point that an absolute setting > is easier to think about / adjust for special uses. So maybe we should > just KISS and use a single absolute setting until we find a hard reason > why that doesn't work well.
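To illustrate the "single absolute setting" shape of the thing (a sketch only, not the committed implementation):

#include <stdlib.h>

static int
test_timeout_default(void)
{
	const char *env = getenv("PG_TEST_TIMEOUT_DEFAULT");

	if (env != NULL && atoi(env) > 0)
		return atoi(env);
	return 180;					/* base default, in seconds */
}

/*
 * A site that needs more headroom just multiplies, e.g. an isolation-style
 * suite could wait for 2 * test_timeout_default().
 */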
Re: logical decoding and replication of sequences
On 2/15/22 10:00, Peter Eisentraut wrote:
> On 13.02.22 14:10, Tomas Vondra wrote:
>> Here's the remaining part, rebased, with a small tweak in the TAP test
>> to eliminate the issue with not waiting for sequence increments. I've
>> kept the tweak in a separate patch, so that we can throw it away easily
>> if we happen to resolve the issue.
>
> This basically looks fine to me. You have identified a few XXX and
> FIXME spots that should be addressed.
>
> Here are a few more comments:
>
> * general
>
> Handling of owned sequences, as previously discussed. This would
> probably be a localized change somewhere in get_rel_sync_entry(), so it
> doesn't affect the overall structure of the patch.
>

So you're suggesting not to track owned sequences in pg_publication_rel explicitly, and handle them dynamically in the output plugin? So when calling get_rel_sync_entry on the sequence, we'd check if it's owned by a table that is replicated. We'd want a way to enable/disable this for each publication, but that makes the lookups more complicated. Also, we'd probably need the same logic elsewhere (e.g. in psql, when listing sequences in a publication). I'm not sure we want this complexity; maybe we should simply deal with this in ALTER PUBLICATION and track all sequences (owned or not) in pg_publication_rel.

> pg_dump support is missing.
>

Will fix.

> Some psql \dxxx support should probably be there. Check where existing
> publication-table relationships are displayed.
>

Yeah, I noticed this while adding regression tests. Currently, \dRp+ shows something like this:

              Publication testpub_fortbl
           Owner           | All tables | Inserts | Updates ...
 --------------------------+------------+---------+---------...
  regress_publication_user | f          | t       | t       ...
 Tables:
     "pub_test.testpub_nopk"
     "public.testpub_tbl1"

or

             Publication testpub5_forschema
           Owner           | All tables | Inserts | Updates | ...
 --------------------------+------------+---------+---------+-...
  regress_publication_user | f          | t       | t       | ...
 Tables from schemas:
     "CURRENT_SCHEMA"
     "public"

I wonder if we should copy the same block for sequences, so

              Publication testpub_fortbl
           Owner           | All tables | Inserts | Updates ...
 --------------------------+------------+---------+---------...
  regress_publication_user | f          | t       | t       ...
 Tables:
     "pub_test.testpub_nopk"
     "public.testpub_tbl1"
 Sequences:
     "pub_test.sequence_s1"
     "public.sequence_s2"

And same for "tables from schemas" etc.

> * src/backend/catalog/system_views.sql
>
> + LATERAL pg_get_publication_sequences(P.pubname) GPT,
>
> The GPT presumably stood for "get publication tables", so should be
> changed.
>
> (There might be a few more copy-and-paste occasions like this around. I
> have not written down all of them.)
>

Will fix.

> * src/backend/commands/publicationcmds.c
>
> This adds a new publication option called "sequence". I would have
> expected it to be named "sequences", but that's just my opinion.
>
> But in any case, the option is not documented at all.
>
> Some of the new functions added in this file are almost exact duplicates
> of the analogous functions for tables. For example,
> PublicationAddSequences()/PublicationDropSequences() are almost
> exactly the same as PublicationAddTables()/PublicationDropTables(). This
> could be handled with less duplication by just adding an ObjectType
> argument to the existing functions.
>

Yes, I noticed that too, and I'll review this later, after making sure the behavior is correct.

> * src/backend/commands/sequence.c
>
> Could use some refactoring of ResetSequence()/ResetSequence2(). Maybe
> call the latter ResetSequenceData() and have the former call it internally.
>

Will check.
> * src/backend/commands/subscriptioncmds.c
>
> Also lots of duplication between tables and sequences in this file.
>

Same as the case above.

> * src/backend/replication/logical/tablesync.c
>
> The comment says it needs sequence support, but there appears to be
> support for initial sequence syncing. What exactly is missing here?
>

I think that's just an obsolete comment.

> * src/test/subscription/t/028_sequences.pl
>
> Change to use done_testing() (see
> 549ec201d6132b7c7ee11ee90a4e02119259ba5b).

Will fix.

Do we need to handle both FOR ALL TABLES and FOR ALL SEQUENCES at the same time? That seems like a reasonable thing people might want.

The patch probably also needs to modify pg_publication_namespace to track whether the schema is FOR TABLES IN SCHEMA or FOR SEQUENCES.

regards

-- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Time to drop plpython2?
Hi everyone, On Fri, Feb 18, 2022 at 02:41:04PM -0800, Andres Freund wrote: > There's snapper ("pgbf [ a t ] twiska.com"), and there's Mark Wong's large > menagerie. Mark said yesterday that he's working on updating. I've made one pass. Hopefully I didn't make any mistakes. :) Regards, Mark
Re: pg_upgrade verbosity when redirecting output to log file
+	 * If outputting to a tty / or , append newline. pg_log_v() will put the
+	 * individual progress items onto the next line.
+	 */
+	if (log_opts.isatty || log_opts.verbose)

I guess the comment should say "or in verbose mode".

-- Justin
Re: pg_upgrade verbosity when redirecting output to log file
Hi, On 2022-02-16 17:09:34 +1300, Thomas Munro wrote: > On Tue, Jan 11, 2022 at 4:42 AM Bruce Momjian wrote: > > On Sun, Jan 9, 2022 at 10:39:58PM -0800, Andres Freund wrote: > > > On 2022-01-10 01:14:32 -0500, Tom Lane wrote: > > > > I think I'd vote for just nuking that output altogether. > > > > It seems of very dubious value. > > > > > > It seems worthwhile in some form - on large cluster in copy mode, the > > > "Copying > > > user relation files" step can take *quite* a while, and even link/clone > > > mode > > > aren't fast. But perhaps what'd be really needed is something counting up > > > actual progress in percentage of files and/or space... > > > > > > I think just coupling it to verbose mode makes the most sense, for now? > > > > All of this logging is from the stage where I was excited pg_upgrade > > worked, and I wanted to give clear output if it failed in some way --- > > printing the file names seems like an easy solution. I agree at this > > point that logging should be reduced, and if they want more logging, the > > verbose option is the right way to get it. > > +1 I got a bit stuck on how to best resolve this. I felt bad about removing all interactive progress, because a pg_upgrade can take a while after all. But it's also not easy to come up with some good, without a substantially bigger effort than I want to invest. After all, I just want to be able to read check-world output. Nearly half of which is pg_upgrade test output right now. The attached is my attempt at coming up with something halfway sane without rewriting pg_upgrade logging entirely. I think it mostly ends up with at least as sane output as the current code. I needed to add a separate prep_status_progress() function to make that work. Greetings, Andres Freund >From a04d989a64e6c6bc273b0c8c0d778d5f66f00158 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Fri, 18 Feb 2022 17:16:37 -0800 Subject: [PATCH v2] pg_upgrade: Don't print progress status when output is not a tty. Until this change pg_upgrade with output redirected to a file / pipe would end up printing all files in the cluster. This has made check-world output exceedingly verbose. Author: Andres Freund Discussion: https://postgr.es/m/CA+hUKGKjrV61ZVJ8OSag+3rKRmCZXPc03bDyWMqhXg3rdZ=f...@mail.gmail.com --- src/bin/pg_upgrade/dump.c| 2 +- src/bin/pg_upgrade/option.c | 2 + src/bin/pg_upgrade/pg_upgrade.c | 2 +- src/bin/pg_upgrade/pg_upgrade.h | 2 + src/bin/pg_upgrade/relfilenode.c | 6 +-- src/bin/pg_upgrade/util.c| 63 ++-- 6 files changed, 60 insertions(+), 17 deletions(-) diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c index b69b4f95695..29b9e44f782 100644 --- a/src/bin/pg_upgrade/dump.c +++ b/src/bin/pg_upgrade/dump.c @@ -29,7 +29,7 @@ generate_old_dump(void) GLOBALS_DUMP_FILE); check_ok(); - prep_status("Creating dump of database schemas\n"); + prep_status_progress("Creating dump of database schemas"); /* create per-db dump files */ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++) diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c index d2c82cc2bbb..e75be2c423e 100644 --- a/src/bin/pg_upgrade/option.c +++ b/src/bin/pg_upgrade/option.c @@ -207,6 +207,8 @@ parseCommandLine(int argc, char *argv[]) if (log_opts.verbose) pg_log(PG_REPORT, "Running in verbose mode\n"); + log_opts.isatty = isatty(fileno(stdout)); + /* Turn off read-only mode; add prefix to PGOPTIONS? 
*/ if (getenv("PGOPTIONS")) { diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c index f66bbd53079..ecb3e1f6474 100644 --- a/src/bin/pg_upgrade/pg_upgrade.c +++ b/src/bin/pg_upgrade/pg_upgrade.c @@ -381,7 +381,7 @@ create_new_objects(void) { int dbnum; - prep_status("Restoring database schemas in the new cluster\n"); + prep_status_progress("Restoring database schemas in the new cluster"); /* * We cannot process the template1 database concurrently with others, diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h index 0aca0a77aae..ca86c112924 100644 --- a/src/bin/pg_upgrade/pg_upgrade.h +++ b/src/bin/pg_upgrade/pg_upgrade.h @@ -274,6 +274,7 @@ typedef struct char *basedir; /* Base output directory */ char *dumpdir; /* Dumps */ char *logdir; /* Log files */ + bool isatty; /* is stdout a tty */ } LogOpts; @@ -427,6 +428,7 @@ void pg_log(eLogType type, const char *fmt,...) pg_attribute_printf(2, 3); void pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2) pg_attribute_noreturn(); void end_progress_output(void); void prep_status(const char *fmt,...) pg_attribute_printf(1, 2); +void prep_status_progress(const char *fmt,...) pg_attribute_printf(1, 2); void check_ok(void); unsigned int str2uint(const char *str); diff --git a/src/bin/pg_upgrade/relfilenode.c b/src/bin/pg_upgrade/relfilenode.c index 2f4deb34163..d23ac884bd1 100644 --- a/src/bin/pg_upgrade/r
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Fri, Feb 18, 2022 at 2:11 PM Andres Freund wrote: > I think it'd be good to add a few isolationtest cases for the > can't-get-cleanup-lock paths. I think it shouldn't be hard using cursors. The > slightly harder part is verifying that VACUUM did something reasonable, but > that still should be doable? We could even just extend existing, related tests, from vacuum-reltuples.spec. Another testing strategy occurs to me: we could stress-test the implementation by simulating an environment where the no-cleanup-lock path is hit an unusually large number of times, possibly a fixed percentage of the time (like 1%, 5%), say by making vacuumlazy.c's ConditionalLockBufferForCleanup() call return false randomly. Now that we have lazy_scan_noprune for the no-cleanup-lock path (which is as similar to the regular lazy_scan_prune path as possible), I wouldn't expect this ConditionalLockBufferForCleanup() testing gizmo to be too disruptive. -- Peter Geoghegan
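A rough sketch of what such a gizmo could look like -- the wrapper and the percentage knob are made up for illustration; only ConditionalLockBufferForCleanup() and the pg_prng calls are existing APIs:

#include "postgres.h"
#include "common/pg_prng.h"
#include "storage/bufmgr.h"

static int	cleanup_lock_failure_pct = 0;	/* e.g. 1 or 5 while stress testing */

static bool
ConditionalCleanupLockWithNoise(Buffer buffer)
{
	if (cleanup_lock_failure_pct > 0 &&
		pg_prng_double(&pg_global_prng_state) * 100.0 < cleanup_lock_failure_pct)
		return false;			/* pretend we lost the race for the cleanup lock */

	return ConditionalLockBufferForCleanup(buffer);
}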
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On 2/17/22 23:16, Robert Haas wrote: > On Thu, Feb 17, 2022 at 4:17 PM Tomas Vondra > wrote: >> IMHO the whole problem is we're unable to estimate the join clause as a >> conditional probability, i.e. >> >>P(A.x = B.x | (A.x < 42) & (B.x < 42)) >> >> so maybe instead of trying to generate additional RelOptInfo items we >> should think about improving that. The extra RelOptInfos don't really >> solve this, because even if you decide to join A|x<42 to B|x<42 it does >> nothing to improve the join clause estimate. > > I guess I hadn't considered that angle. I think the extra RelOptInfos > (or whatever) actually do solve a problem, because enforcing a > high-selectivity join qual against both sides is potentially quite > wasteful, and you need some way to decide whether to do it on one > side, the other, or both. But it's also true that I was wrong to > assume independence ... and if we could avoid assuming that, then the > join selectivity would work itself out without any of the machinery > that I just proposed. > True. We kinda already have this issue for the equality clauses, and having paths with the condition pushed down (or not) seems like a natural approach. >> It actually deals with a more general form of this case, because the >> clauses don't need to reference the same attribute - so for example this >> would work too, assuming there is extended stats object on the columns >> on each side: >> >> P(A.c = B.d | (A.e < 42) & (B.f < 42)) > > That'd be cool. > Yeah, but the patch implementing this still needs more work. >> Not sure. In my experience queries with both a join clause and other >> clauses referencing the same attribute are pretty rare. But I agree if >> we can do the expensive stuff only when actually needed, with no cost in >> the 99.999% other cases, I don't see why not. Of course, code complexity >> is a cost too. > > Right. I mean, we could have a planner GUC to control whether the > optimization is used even in cases where we see that it's possible. > But Tom keeps arguing that it is possible in many queries and would > benefit few queries, and I'm not seeing why that should be so. I think > it's likely to benefit many of the queries to which it applies. > Maybe. Although the example I linked some time ago shows a pretty dramatic improvement, due to picking merge join + index scan, and not realizing we'll have to skip a lot of data. But that's just one anecdotal example. Anyway, I think the best way to deal with these (perfectly legitimate) concerns is to show how expensive it is for queries not not having such join/restriction clauses, with the cost being close to 0. And then for queries with such clauses but not benefiting from the change (a bit like a worst case). regards [1] https://www.postgresql.org/message-id/CA%2B1Wm9U_sP9237f7OH7O%3D-UTab71DWOO4Qc-vnC78DfsJQBCwQ%40mail.gmail.com -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Fri, Feb 18, 2022 at 1:56 PM Robert Haas wrote: > + * Also maintains *NewRelfrozenxid and *NewRelminmxid, which are the current > + * target relfrozenxid and relminmxid for the relation. Assumption is that > > "maintains" is fuzzy. I think you should be saying something much more > explicit, and the thing you are saying should make it clear that these > arguments are input-output arguments: i.e. the caller must set them > correctly before calling this function, and they will be updated by > the function. Makes sense. > I don't think you have to spell all of that out in every > place where this comes up in the patch, but it needs to be clear from > what you do say. For example, I would be happier with a comment that > said something like "Every call to this function will either set > HEAP_XMIN_FROZEN in the xl_heap_freeze_tuple struct passed as an > argument, or else reduce *NewRelfrozenxid to the xmin of the tuple if > it is currently newer than that. Thus, after a series of calls to this > function, *NewRelfrozenxid represents a lower bound on unfrozen xmin > values in the tuples examined. Before calling this function, caller > should initialize *NewRelfrozenxid to ." We have to worry about XIDs from MultiXacts (and xmax values more generally). And we have to worry about the case where we start out with only xmin frozen (by an earlier VACUUM), and then have to freeze xmax too. I believe that we have to generally consider xmin and xmax independently. For example, we cannot ignore xmax, just because we looked at xmin, since in general xmin alone might have already been frozen. > + * Also maintains *NewRelfrozenxid and *NewRelminmxid, which are the current > + * target relfrozenxid and relminmxid for the relation. Assumption is that > + * caller will never freeze any of the XIDs from the tuple, even when we say > + * that they should. If caller opts to go with our recommendation to freeze, > + * then it must account for the fact that it shouldn't trust how we've set > + * NewRelfrozenxid/NewRelminmxid. (In practice aggressive VACUUMs always > take > + * our recommendation because they must, and non-aggressive VACUUMs always > opt > + * to not freeze, preferring to ratchet back NewRelfrozenxid instead). > > I don't understand this one. > > +* (Actually, we maintain NewRelminmxid differently here, because we > +* assume that XIDs that should be frozen according to cutoff_xid > won't > +* be, whereas heap_prepare_freeze_tuple makes the opposite > assumption.) > > This one either. The difference between the cleanup lock path (in lazy_scan_prune/heap_prepare_freeze_tuple) and the share lock path (in lazy_scan_noprune/heap_tuple_needs_freeze) is what is at issue in both of these confusing comment blocks, really. Note that cutoff_xid is the name that both heap_prepare_freeze_tuple and heap_tuple_needs_freeze have for FreezeLimit (maybe we should rename every occurence of cutoff_xid in heapam.c to FreezeLimit). At a high level, we aren't changing the fundamental definition of an aggressive VACUUM in any of the patches -- we still need to advance relfrozenxid up to FreezeLimit in an aggressive VACUUM, just like on HEAD, today (we may be able to advance it *past* FreezeLimit, but that's just a bonus). 
But in a non-aggressive VACUUM, where there is still no strict requirement to advance relfrozenxid (by any amount), the code added by 0001 can set relfrozenxid to any known safe value, which could either be from before FreezeLimit, or after FreezeLimit -- almost anything is possible (provided we respect the relfrozenxid invariant, and provided we see that we didn't skip any all-visible-not-all-frozen pages). Since we still need to "respect FreezeLimit" in an aggressive VACUUM, the aggressive case might need to wait for a full cleanup lock the hard way, having tried and failed to do it the easy way within lazy_scan_noprune (lazy_scan_noprune will still return false when any call to heap_tuple_needs_freeze for any tuple returns false) -- same as on HEAD, today. And so the difference at issue here is: FreezeLimit/cutoff_xid only needs to affect the new NewRelfrozenxid value we use for relfrozenxid in heap_prepare_freeze_tuple, which is involved in real freezing -- not in heap_tuple_needs_freeze, whose main purpose is still to help us avoid freezing where a cleanup lock isn't immediately available. While the purpose of FreezeLimit/cutoff_xid within heap_tuple_needs_freeze is to determine its bool return value, which will only be of interest to the aggressive case (which might have to get a cleanup lock and do it the hard way), not the non-aggressive case (where ratcheting back NewRelfrozenxid is generally possible, and generally leaves us with almost as good of a value). In other words: the calls to heap_tuple_needs_freeze made from lazy_scan_noprune are simply concerned with the page as it actually is, whereas the similar/corresponding calls to heap_prepare_freeze_tuple from lazy
Re: Race conditions in 019_replslot_limit.pl
Hi, On 2022-02-18 18:49:14 -0500, Tom Lane wrote: > Andres Freund writes: > > We'd also need to add pq_endmessage_noblock(), because the pq_endmessage() > > obviously tries to send (as in the backtrace upthread) if the output buffer > > is > > large enough, which it often will be in walsender. > > I don't see that as "obvious". If we're there, we do not have an > error situation. The problem is that due walsender using pq_putmessage_noblock(), the output buffer is often going to be too full for a plain pq_endmessage() to not send out data. Because walsender will have an output buffer > PQ_SEND_BUFFER_SIZE a lot of the time, errors will commonly be thrown with the output buffer full. That leads the pq_endmessage() in send_message_to_frontend() to block sending the error message and the preceding WAL data. Leading to backtraces like: > #0 0x7faf4a570ec6 in epoll_wait (epfd=4, events=0x7faf4c201458, > maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 > #1 0x7faf4b8ced5c in WaitEventSetWaitBlock (set=0x7faf4c2013e0, > cur_timeout=-1, occurred_events=0x7ffe47df2320, nevents=1) > at /home/andres/src/postgresql/src/backend/storage/ipc/latch.c:1465 > #2 0x7faf4b8cebe3 in WaitEventSetWait (set=0x7faf4c2013e0, timeout=-1, > occurred_events=0x7ffe47df2320, nevents=1, wait_event_info=100663297) > at /home/andres/src/postgresql/src/backend/storage/ipc/latch.c:1411 > #3 0x7faf4b70f48b in secure_write (port=0x7faf4c22da50, > ptr=0x7faf4c2f1210, len=21470) at > /home/andres/src/postgresql/src/backend/libpq/be-secure.c:298 > #4 0x7faf4b71aecb in internal_flush () at > /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1380 > #5 0x7faf4b71ada1 in internal_putbytes (s=0x7ffe47df23dc "E\177", len=1) > at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1326 > #6 0x7faf4b71b0cf in socket_putmessage (msgtype=69 'E', s=0x7faf4c201700 > "SFATAL", len=112) > at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1507 > #7 0x7faf4b71c151 in pq_endmessage (buf=0x7ffe47df2460) at > /home/andres/src/postgresql/src/backend/libpq/pqformat.c:301 > #8 0x7faf4babbb5e in send_message_to_frontend (edata=0x7faf4be2f1c0 > ) at > /home/andres/src/postgresql/src/backend/utils/error/elog.c:3253 > #9 0x7faf4bab8aa0 in EmitErrorReport () at > /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541 > #10 0x7faf4bab5ec0 in errfinish (filename=0x7faf4bc9770d "postgres.c", > lineno=3192, funcname=0x7faf4bc99170 <__func__.8> "ProcessInterrupts") > at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592 > #11 0x7faf4b907e73 in ProcessInterrupts () at > /home/andres/src/postgresql/src/backend/tcop/postgres.c:3192 > #12 0x7faf4b8920af in WalSndLoop (send_data=0x7faf4b8927f2 > ) at > /home/andres/src/postgresql/src/backend/replication/walsender.c:2404 > > I guess we could try to flush in a blocking manner sometime later in the > > shutdown sequence, after we've released resources? But I'm doubtful it's a > > good idea, we don't really want to block waiting to exit when e.g. the > > network > > connection is dead without the TCP stack knowing. > > I think you are trying to move in the direction of possibly exiting > without ever sending at all, which does NOT seem like an improvement. I was just talking about doing another blocking pq_flush(), in addition to the pq_flush_if_writable() earlier. That'd be trying harder to send out data, not less hard... Greetings, Andres Freund
Re: Mark all GUC variable as PGDLLIMPORT
Hi, On 2022-02-15 08:06:58 -0800, Andres Freund wrote: > The more I think about it the more I'm convinced that if we want to do this, > we should do it for variables and functions. Btw, if we were to do this, we should just use -fvisibility=hidden everywhere and would see the same set of failures on unixoid systems as on windows. Of course only in in-core extensions, but it'd still be better than nothing. I proposed using -fvisibility=hidden in extensions: https://postgr.es/m/20211101020311.av6hphdl6xbjb...@alap3.anarazel.de Mostly because using explicit symbol exports makes it a lot easier to build extension libraries on windows (with meson, but also everything built outside of postgres with msvc). But also because it makes function calls *inside* an extension have noticeably lower overhead. And it makes the set of symbols exported smaller. One problem I encountered was that it's actually kind of hard to set build flags only for extension libraries: https://postgr.es/m/20220111025328.iq5g6uck53j5qtin%40alap3.anarazel.de But if we added explicit exports to everything we export, we'd not need to restrict the use of -fvisibility=hidden to extension libraries anymore. We would get decent error messages on all platforms. Subsequently we could yield a bit of performance from critical paths by marking selected symbols as *not* exported. That'd make them a tad cheaper to call and allow the optimizer more leeway. Greetings, Andres Freund
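To make the idea concrete, this is roughly what "explicit exports plus -fvisibility=hidden" looks like; PG_EXPORT is a made-up name here for illustration, not an existing PostgreSQL macro:

#if defined(_WIN32)
#define PG_EXPORT __declspec(dllexport)
#else
#define PG_EXPORT __attribute__((visibility("default")))
#endif

PG_EXPORT void extension_entry_point(void);		/* visible outside the .so */

static void
helper(void)
{
	/* hidden: calls to this stay DSO-internal, no PLT indirection */
}

void
extension_entry_point(void)
{
	helper();
}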
Re: Race conditions in 019_replslot_limit.pl
Andres Freund writes: > On 2022-02-18 18:15:21 -0500, Tom Lane wrote: >> Perhaps it'd be sensible to do this only in debugging (ie Assert) >> builds? > That seems not great, because it pretty clearly can lead to hangs, which is > problematic in tests too. What about using pq_flush_if_writable()? In nearly > all situations that'd still push the failure to the client. That'd be okay by me. > We'd also need to add pq_endmessage_noblock(), because the pq_endmessage() > obviously tries to send (as in the backtrace upthread) if the output buffer is > large enough, which it often will be in walsender. I don't see that as "obvious". If we're there, we do not have an error situation. > I guess we could try to flush in a blocking manner sometime later in the > shutdown sequence, after we've released resources? But I'm doubtful it's a > good idea, we don't really want to block waiting to exit when e.g. the network > connection is dead without the TCP stack knowing. I think you are trying to move in the direction of possibly exiting without ever sending at all, which does NOT seem like an improvement. regards, tom lane
Re: Race conditions in 019_replslot_limit.pl
Hi, On 2022-02-18 18:15:21 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2022-02-17 21:55:21 -0800, Andres Freund wrote: > >> Isn't it pretty bonkers that we allow error processing to get stuck behind > >> network traffic, *before* we have have released resources (locks etc)? > > It's more or less intentional, per elog.c: > > /* > * This flush is normally not necessary, since postgres.c will flush out > * waiting data when control returns to the main loop. But it seems best > * to leave it here, so that the client has some clue what happened if the > * backend dies before getting back to the main loop ... error/notice > * messages should not be a performance-critical path anyway, so an extra > * flush won't hurt much ... > */ > pq_flush(); > > Perhaps it'd be sensible to do this only in debugging (ie Assert) > builds? That seems not great, because it pretty clearly can lead to hangs, which is problematic in tests too. What about using pq_flush_if_writable()? In nearly all situations that'd still push the failure to the client. We'd also need to add pq_endmessage_noblock(), because the pq_endmessage() obviously tries to send (as in the backtrace upthread) if the output buffer is large enough, which it often will be in walsender. I guess we could try to flush in a blocking manner sometime later in the shutdown sequence, after we've released resources? But I'm doubtful it's a good idea, we don't really want to block waiting to exit when e.g. the network connection is dead without the TCP stack knowing. Hm. There already is code trying to short-circuit sending errors to the client if a backend gets terminated. Introduced in commit 2ddb9149d14de9a2e7ac9ec6accf3ad442702b24 Author: Tom Lane Date: 2018-10-19 21:39:21 -0400 Server-side fix for delayed NOTIFY and SIGTERM processing. and predecessors. If ProcessClientWriteInterrupt() sees ProcDiePending, we'll stop trying to send stuff to the client if writes block. However, this doesn't work here, because we've already unset ProcDiePending: #7 0x7faf4b71c151 in pq_endmessage (buf=0x7ffe47df2460) at /home/andres/src/postgresql/src/backend/libpq/pqformat.c:301 #8 0x7faf4babbb5e in send_message_to_frontend (edata=0x7faf4be2f1c0 ) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3253 #9 0x7faf4bab8aa0 in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541 #10 0x7faf4bab5ec0 in errfinish (filename=0x7faf4bc9770d "postgres.c", lineno=3192, funcname=0x7faf4bc99170 <__func__.8> "ProcessInterrupts") at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592 #11 0x7faf4b907e73 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3192 #12 0x7faf4b8920af in WalSndLoop (send_data=0x7faf4b8927f2 ) at /home/andres/src/postgresql/src/backend/replication/walsender.c:2404 #13 0x7faf4b88f82e in StartReplication (cmd=0x7faf4c293fc0) at /home/andres/src/postgresql/src/backend/replication/walsender.c:834 Before ProcessInterrupts() FATALs due to a SIGTERM, it resets ProcDiePending. This seems not optimal. We can't just leave ProcDiePending set, otherwise we'll probably end up throwing more errors during the shutdown sequence. But it seems we need something similar to proc_exit_inprogress, except set earlier? And then take that into account in ProcessClientWriteInterrupt()? Greetings, Andres Freund
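The pq_endmessage_noblock() mentioned above could presumably just mirror pq_endmessage() while using the existing non-blocking send path -- a sketch, not committed code:

void
pq_endmessage_noblock(StringInfo buf)
{
	/* msgtype was saved in the cursor field by pq_beginmessage() */
	pq_putmessage_noblock(buf->cursor, buf->data, buf->len);
	pfree(buf->data);
	buf->data = NULL;
}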
Re: Time to drop plpython2?
Andres Freund writes: > On 2022-02-18 18:09:19 -0500, Tom Lane wrote: >> This one was discussed on the buildfarm-owners list last month. >> There's no 32-bit python3 on that box, and apparently no plans >> to install one --- it sounded like the box is due for retirement >> anyway. > Any chance that was a private response? I just looked in the buildfarm-members > list (I assume you meant that?), and didn't see anything: > https://www.postgresql.org/message-id/flat/3162195.1642093011%40sss.pgh.pa.us Oh ... hm, yeah, looking at my mail logs, that came in as private mail and there's no corresponding buildfarm-members traffic. I did not keep the message unfortunately, but the owner indicated that he wasn't planning to bother updating python. Which is fine, but maybe we should press him to remove --with-python instead of disabling the box altogether --- we don't have a lot of Solaris clones in the farm. regards, tom lane
Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)
On Fri, Feb 18, 2022 at 05:38:56PM +0800, Julien Rouhaud wrote: > On Fri, Feb 18, 2022 at 05:22:36PM +0900, Michael Paquier wrote: >> So, I have been looking at this problem, and I don't see a problem in >> doing something like the attached, where we add a "regress" mode to >> compute_query_id that is a synonym of "auto". Or, in short, we have >> the default of letting a module decide if a query ID can be computed >> or not, at the exception that we hide its result in EXPLAIN outputs. >> >> Julien, what do you think? > > I don't see any problem with that patch. Thanks for looking at it. An advantage of this version is that it is backpatchable as the option enum won't break any ABI compatibility, so that's a win-win. >> FWIW, about your question of upthread, I have noticed the behavior >> while testing, but I know of some internal customers that enable >> pg_stat_statements and like doing tests on the PostgreSQL instance >> deployed this way, so that would break. They are not on 14 yet as far >> as I know. > > Are they really testing EXPLAIN output, or just doing application-level tests? With the various internal customers, both. And installcheck implies EXPLAIN outputs. First, let's wait a couple of days and see if anyone has an extra opinion to offer on this topic. -- Michael
Re: Time to drop plpython2?
Hi, On 2022-02-18 18:09:19 -0500, Tom Lane wrote: > Andres Freund writes: > > There's one further failure, but the symptoms are quite different. I've also > > pinged its owner. I think it's a problem on the system, rather than our > > side, > > but less certain than with the other cases: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=haddock&dt=2022-02-17%2023%3A36%3A09 > > This one was discussed on the buildfarm-owners list last month. > There's no 32-bit python3 on that box, and apparently no plans > to install one --- it sounded like the box is due for retirement > anyway. Any chance that was a private response? I just looked in the buildfarm-members list (I assume you meant that?), and didn't see anything: https://www.postgresql.org/message-id/flat/3162195.1642093011%40sss.pgh.pa.us Greetings, Andres Freund
Re: Race conditions in 019_replslot_limit.pl
Andres Freund writes:
> On 2022-02-17 21:55:21 -0800, Andres Freund wrote:
> > Isn't it pretty bonkers that we allow error processing to get stuck behind
> > network traffic, *before* we have have released resources (locks etc)?

It's more or less intentional, per elog.c:

	/*
	 * This flush is normally not necessary, since postgres.c will flush out
	 * waiting data when control returns to the main loop. But it seems best
	 * to leave it here, so that the client has some clue what happened if the
	 * backend dies before getting back to the main loop ... error/notice
	 * messages should not be a performance-critical path anyway, so an extra
	 * flush won't hurt much ...
	 */
	pq_flush();

Perhaps it'd be sensible to do this only in debugging (ie Assert) builds?

			regards, tom lane
Re: Race conditions in 019_replslot_limit.pl
Hi, On 2022-02-18 14:42:48 -0800, Andres Freund wrote: > On 2022-02-17 21:55:21 -0800, Andres Freund wrote: > > Isn't it pretty bonkers that we allow error processing to get stuck behind > > network traffic, *before* we have have released resources (locks etc)? > > This is particularly likely to be a problem for walsenders, because they often > have a large output buffer filled, because walsender uses > pq_putmessage_noblock() to send WAL data. Which obviously can be large. > > In the stacktrace upthread you can see: > #3 0x7faf4b70f48b in secure_write (port=0x7faf4c22da50, > ptr=0x7faf4c2f1210, len=21470) at > /home/andres/src/postgresql/src/backend/libpq/be-secure.c:29 > > which certainly is more than in most other cases of error messages being > sent. And it obviously might not be the first to have gone out. > > > > I wonder if we should try to send, but do it in a nonblocking way. > > I think we should probably do so at least during FATAL error processing. But > also consider doing so for ERROR, because not releasing resources after > getting cancelled / terminated is pretty nasty imo. Is it possible that what we're seeing is a deadlock, with both walsender and the pg_basebackup child trying to send data, but neither receiving? But that'd require that somehow the basebackup child process didn't exit with its parent. And I don't really see how that'd happen. I'm running out of ideas for how to try to reproduce this. I think we might need some additional debugging information to get more information from the buildfarm. I'm thinking of adding log_min_messages=DEBUG2 to primary3, passing --verbose to pg_basebackup in $node_primary3->backup(...). It might also be worth adding DEBUG2 messages to ReplicationSlotShmemExit(), ReplicationSlotCleanup(), InvalidateObsoleteReplicationSlots(). Greetings, Andres Freund
Re: Time to drop plpython2?
Andres Freund writes: > There's one further failure, but the symptoms are quite different. I've also > pinged its owner. I think it's a problem on the system, rather than our side, > but less certain than with the other cases: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=haddock&dt=2022-02-17%2023%3A36%3A09 This one was discussed on the buildfarm-owners list last month. There's no 32-bit python3 on that box, and apparently no plans to install one --- it sounded like the box is due for retirement anyway. regards, tom lane
Re: Race conditions in 019_replslot_limit.pl
Hi, On 2022-02-17 21:55:21 -0800, Andres Freund wrote: > Isn't it pretty bonkers that we allow error processing to get stuck behind > network traffic, *before* we have have released resources (locks etc)? This is particularly likely to be a problem for walsenders, because they often have a large output buffer filled, because walsender uses pq_putmessage_noblock() to send WAL data. Which obviously can be large. In the stacktrace upthread you can see: #3 0x7faf4b70f48b in secure_write (port=0x7faf4c22da50, ptr=0x7faf4c2f1210, len=21470) at /home/andres/src/postgresql/src/backend/libpq/be-secure.c:29 which certainly is more than in most other cases of error messages being sent. And it obviously might not be the first to have gone out. > I wonder if we should try to send, but do it in a nonblocking way. I think we should probably do so at least during FATAL error processing. But also consider doing so for ERROR, because not releasing resources after getting cancelled / terminated is pretty nasty imo. Greetings, Andres Freund
Re: Time to drop plpython2?
Hi, Thanks to some more buildfarm animal updates things are looking better. I think there's now only three owners that haven't updated their animals successfully. One of which I hadn't yet pinged (chipmunk / Heikki), done now. There's snapper ("pgbf [ a t ] twiska.com"), and there's Mark Wong's large menagerie. Mark said yesterday that he's working on updating. There's one further failure, but the symptoms are quite different. I've also pinged its owner. I think it's a problem on the system, rather than our side, but less certain than with the other cases: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=haddock&dt=2022-02-17%2023%3A36%3A09

checking Python.h usability... no
checking Python.h presence... yes
configure: WARNING: Python.h: present but cannot be compiled
configure: WARNING: Python.h: check for missing prerequisite headers?
configure: WARNING: Python.h: see the Autoconf documentation
configure: WARNING: Python.h: section "Present But Cannot Be Compiled"
configure: WARNING: Python.h: proceeding with the compiler's result
configure: WARNING: ## -- ##
configure: WARNING: ## Report this to pgsql-b...@lists.postgresql.org ##
configure: WARNING: ## -- ##
checking for Python.h... no
configure: error: header file <Python.h> is required for Python

configure:19158: ccache gcc -c -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -m32 -I/usr/include/python3.9 -DWAIT_USE_POLL -D_POSIX_PTHREAD_SEMANTICS -I/usr/include/libxml2 conftest.c >&5
In file included from /usr/include/python3.9/Python.h:8,
                 from conftest.c:235:
/usr/include/python3.9/pyconfig.h:1443: warning: "SIZEOF_LONG" redefined
 1443 | #define SIZEOF_LONG 8
      |
conftest.c:183: note: this is the location of the previous definition
  183 | #define SIZEOF_LONG 4
      |
In file included from /usr/include/python3.9/Python.h:8,
                 from conftest.c:235:
/usr/include/python3.9/pyconfig.h:1467: warning: "SIZEOF_SIZE_T" redefined
 1467 | #define SIZEOF_SIZE_T 8
      |
conftest.c:182: note: this is the location of the previous definition
  182 | #define SIZEOF_SIZE_T 4
      |
In file included from /usr/include/python3.9/Python.h:8,
                 from conftest.c:235:
/usr/include/python3.9/pyconfig.h:1476: warning: "SIZEOF_VOID_P" redefined
 1476 | #define SIZEOF_VOID_P 8
      |
conftest.c:181: note: this is the location of the previous definition
  181 | #define SIZEOF_VOID_P 4
      |
In file included from /usr/include/python3.9/Python.h:63,
                 from conftest.c:235:
/usr/include/python3.9/pyport.h:736:2: error: #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)."
  736 | #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)."
      | ^

This is a 64bit host, targeting 32bit "CFLAGS' => '-m32'. However it linked successfully against python 2.

Greetings, Andres Freund
Re: killing perl2host
On 2/17/22 12:12, Andres Freund wrote: > Hi, > > On 2022-02-17 09:20:56 -0500, Andrew Dunstan wrote: >> I don't think we have or have ever had a buildfarm animal targeting >> msys. In general I think of msys as a build environment to create native >> binaries. But if we want to support targeting msys we should have an >> animal doing that. > It's pretty much cygwin. Wouldn't hurt to have a dedicated animal though, I > agree. We do have a dedicated path for it in configure.ac: > > case $host_os in > ... > cygwin*|msys*) template=cygwin ;; FYI I tested it while in wait mode for something else, and it fell over at the first hurdle: running bootstrap script ... 2022-02-18 22:25:45.119 UTC [34860] FATAL: could not create shared memory segment: Function not implemented 2022-02-18 22:25:45.119 UTC [34860] DETAIL: Failed system call was shmget(key=1407374884304065, size=56, 03600). child process exited with exit code 1 I'm not intending to put any more effort into supporting it. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Synchronizing slots from primary to standby
Hi, On 2022-02-11 15:28:19 +0100, Peter Eisentraut wrote: > On 05.02.22 20:59, Andres Freund wrote: > > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote: > > > From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001 > > > From: Peter Eisentraut > > > Date: Mon, 3 Jan 2022 14:43:36 +0100 > > > Subject: [PATCH v3] Synchronize logical replication slots from primary to > > > standby > > I've just skimmed the patch and the related threads. As far as I can tell this > > cannot be safely used without the conflict handling in [1], is that correct? > > This or similar questions have been asked a few times about this or similar > patches, but they always come with some doubt. I'm certain it's a problem - the only reason I couched it was that there could have been something clever in the patch preventing problems that I missed because I just skimmed it. > If we think so, it would be > useful perhaps if we could come up with test cases that would demonstrate > why that other patch/feature is necessary. (I'm not questioning it > personally, I'm just throwing out ideas here.) The patch as-is just breaks one of the fundamental guarantees necessary for logical decoding, that no row versions can be removed that are still required for logical decoding (signalled via catalog_xmin). So there needs to be an explicit mechanism upholding that guarantee, but there is not right now from what I can see. One piece of the referenced patchset is that it adds information about removed catalog rows to a few WAL records, and then verifies during replay that no record can be replayed that removes resources that are still needed. If such a conflict exists it's dealt with as a recovery conflict. That itself doesn't provide prevention against removal of required row versions, but it provides detection. The prevention against removal can then be done using a physical replication slot with hot standby feedback or some other mechanism (e.g. the slot syncing mechanism could maintain a "placeholder" slot on the primary for all sync targets or something like that). Even if that infrastructure existed / was merged, the slot sync stuff would still need some very careful logic to protect against problems due to concurrent WAL replay and "synchronized slot" creation. But that's doable. Greetings, Andres Freund
Re: Emit a warning if the extension's GUC is set incorrectly
Florin Irion writes: > Il giorno ven 18 feb 2022 alle ore 10:58 Tom Lane ha > scritto: >> Here's a delta patch (meant to be applied after reverting cab5b9ab2) >> that does things like that. It fixes the parallel-mode problem ... >> so do we want to tighten things up this much? > Thank you for taking care of the bug I introduced with 75d22069e, > I didn't notice this thread until now. Yeah, this thread kinda disappeared under the rug during the Christmas holidays, but we need to decide whether we want to push forward or leave things at the status quo. > For what it's worth, I agree with throwing an ERROR if the placeholder is > unrecognized. Initially, I didn't want to change too much the liberty of > setting any placeholder, but mainly to not go unnoticed. I also think that this is probably a good change to make. One situation where somebody would be unhappy is if they're using GUC variables as plain custom variables despite them being within the namespace of an extension they're using. But that seems to me to be (a) far-fetched and (b) mighty dangerous, since some new version of the extension might suddenly start reacting to those variables in ways you presumably didn't expect. Perhaps a more plausible use-case is "I need to work with both versions X and Y of extension E, and Y has setting e.foo while X doesn't --- but I can just set e.foo unconditionally since X will ignore it". With this patch, that could still work, but you'd have to be certain to apply the setting before loading E. I don't really see any other use-cases favoring the current behavior. On balance, eliminating possible mistakes seems like enough of a benefit to justify disallowing these cases. regards, tom lane
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi, On 2022-02-18 15:54:19 -0500, Robert Haas wrote: > > Are there any objections to this plan? > > I really like the idea of reducing the scope of what is being changed > here, and I agree that eagerly advancing relfrozenxid carries much > less risk than the other changes. Sounds good to me too! Greetings, Andres Freund
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi, On 2022-02-18 13:09:45 -0800, Peter Geoghegan wrote: > 0001 is tricky in the sense that there are a lot of fine details, and > if you get any one of them wrong the result might be a subtle bug. For > example, the heap_tuple_needs_freeze() code path is only used when we > cannot get a cleanup lock, which is rare -- and some of the branches > within the function are relatively rare themselves. The obvious > concern is: What if some detail of how we track the new relfrozenxid > value (and new relminmxid value) in this seldom-hit codepath is just > wrong, in whatever way we didn't think of? I think it'd be good to add a few isolationtest cases for the can't-get-cleanup-lock paths. I think it shouldn't be hard using cursors. The slightly harder part is verifying that VACUUM did something reasonable, but that still should be doable? Greetings, Andres Freund
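For what it's worth, the scenario such an isolation spec would need to encode can be sketched in plain SQL across two sessions (t is a hypothetical table):

    -- session 1: hold a pin on a heap page so VACUUM cannot get a cleanup lock
    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM t;
    FETCH 1 FROM c;          -- keeps the current heap page pinned

    -- session 2: VACUUM now has to take the no-cleanup-lock path
    -- (heap_tuple_needs_freeze) for the pinned page
    VACUUM (FREEZE, VERBOSE) t;

    -- session 1:
    COMMIT;

The harder part, as noted, is asserting something useful about the result, e.g. that relfrozenxid/relminmxid still end up at sane values afterwards.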
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Fri, Feb 18, 2022 at 4:10 PM Peter Geoghegan wrote: > It does not change any other behavior. It's totally mechanical. > > 0001 is tricky in the sense that there are a lot of fine details, and > if you get any one of them wrong the result might be a subtle bug. For > example, the heap_tuple_needs_freeze() code path is only used when we > cannot get a cleanup lock, which is rare -- and some of the branches > within the function are relatively rare themselves. The obvious > concern is: What if some detail of how we track the new relfrozenxid > value (and new relminmxid value) in this seldom-hit codepath is just > wrong, in whatever way we didn't think of? Right. I think we have no choice but to accept such risks if we want to make any progress here, and every patch carries them to some degree. I hope that someone else will review this patch in more depth than I have just now, but what I notice reading through it is that some of the comments seem pretty opaque. For instance: + * Also maintains *NewRelfrozenxid and *NewRelminmxid, which are the current + * target relfrozenxid and relminmxid for the relation. Assumption is that "maintains" is fuzzy. I think you should be saying something much more explicit, and the thing you are saying should make it clear that these arguments are input-output arguments: i.e. the caller must set them correctly before calling this function, and they will be updated by the function. I don't think you have to spell all of that out in every place where this comes up in the patch, but it needs to be clear from what you do say. For example, I would be happier with a comment that said something like "Every call to this function will either set HEAP_XMIN_FROZEN in the xl_heap_freeze_tuple struct passed as an argument, or else reduce *NewRelfrozenxid to the xmin of the tuple if it is currently newer than that. Thus, after a series of calls to this function, *NewRelfrozenxid represents a lower bound on unfrozen xmin values in the tuples examined. Before calling this function, caller should initialize *NewRelfrozenxid to ." +* Changing nothing, so might have to ratchet back NewRelminmxid, +* NewRelfrozenxid, or both together This comment I like. +* New multixact might have remaining XID older than +* NewRelfrozenxid This one's good, too. + * Also maintains *NewRelfrozenxid and *NewRelminmxid, which are the current + * target relfrozenxid and relminmxid for the relation. Assumption is that + * caller will never freeze any of the XIDs from the tuple, even when we say + * that they should. If caller opts to go with our recommendation to freeze, + * then it must account for the fact that it shouldn't trust how we've set + * NewRelfrozenxid/NewRelminmxid. (In practice aggressive VACUUMs always take + * our recommendation because they must, and non-aggressive VACUUMs always opt + * to not freeze, preferring to ratchet back NewRelfrozenxid instead). I don't understand this one. +* (Actually, we maintain NewRelminmxid differently here, because we +* assume that XIDs that should be frozen according to cutoff_xid won't +* be, whereas heap_prepare_freeze_tuple makes the opposite assumption.) This one either. I haven't really grokked exactly what is happening in heap_tuple_needs_freeze yet, and may not have time to study it further in the near future. Not saying it's wrong, although improving the comments above would likely help me out. > > If there's a way you can make the precise contents of 0002 and 0003 > > more clear, I would like that, too. 
> > The really big one is 0002 -- even 0003 (the FSM PageIsAllVisible() > thing) wasn't on the table before now. 0002 is the patch that changes > the basic criteria for freezing, making it block-based rather than > based on the FreezeLimit cutoff (barring edge cases that are important > for correctness, but shouldn't noticeably affect freezing overhead). > > The single biggest practical improvement from 0002 is that it > eliminates what I've called the freeze cliff, which is where many old > tuples (much older than FreezeLimit/vacuum_freeze_min_age) must be > frozen all at once, in a balloon payment during an eventual aggressive > VACUUM. Although it's easy to see that that could be useful, it is > harder to justify (much harder) than anything else. Because we're > freezing more eagerly overall, we're also bound to do more freezing > without benefit in certain cases. Although I think that this can be > justified as the cost of doing business, that's a hard argument to > make. You've used the term "freezing cliff" repeatedly in earlier emails, and this is the first time I've been able to understand what you meant. I'm glad I do, now. But can you describe the algorithm that 0002 uses to accomplish this improvement? Like "if it sees that the page meets criteria X, then it freezes all tuples on the page, else if it sees th
Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
Thanks a lot for updating the patch. I tried to apply the patches to the master branch; there were no warnings and the regression tests passed. Now we have many places (5) calling the same function with the constant number 3. Is this a good time to consider defining this number as a macro somewhere? Thank you, On 2022-02-17 8:46 a.m., Fujii Masao wrote: On 2022/02/11 21:59, Etsuro Fujita wrote: I tweaked comments/docs a little bit as well. Thanks for updating the patches! I reviewed 0001 patch. It looks good to me except the following minor things. If these are addressed, I think that the 001 patch can be marked as ready for committer. + * Also determine to commit (sub)transactions opened on the remote server + * in parallel at (sub)transaction end. Like the comment "Determine whether to keep the connection ...", "determine to commit" should be "determine whether to commit"? "remote server" should be "remote servers"? + curlevel = GetCurrentTransactionNestLevel(); + snprintf(sql, sizeof(sql), "RELEASE SAVEPOINT s%d", curlevel); Why does pgfdw_finish_pre_subcommit_cleanup() need to call GetCurrentTransactionNestLevel() and construct the "RELEASE SAVEPOINT" query string again? pgfdw_subxact_callback() already does them and probably we can make it pass either of them to pgfdw_finish_pre_subcommit_cleanup() as its argument. + This option controls whether postgres_fdw commits + remote (sub)transactions opened on a foreign server in a local + (sub)transaction in parallel when the local (sub)transaction commits. "a foreign server" should be "foreign servers"? "in a local (sub)transaction" part seems not to be necessary. Regards, -- David Software Engineer Highgo Software Inc. (Canada) www.highgo.ca
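For readers skimming the quoted review, a rough sketch of how the remote savepoint names ("SAVEPOINT s%d" / "RELEASE SAVEPOINT s%d") line up with local subtransaction nest levels; ft is a hypothetical foreign table:

    BEGIN;                       -- local nest level 1
    SAVEPOINT a;                 -- local nest level 2
    SELECT count(*) FROM ft;     -- first remote use at this level: postgres_fdw sends "SAVEPOINT s2"
    RELEASE SAVEPOINT a;         -- subcommit: postgres_fdw sends "RELEASE SAVEPOINT s2"
    COMMIT;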
Re: Synchronizing slots from primary to standby
On Fri, Feb 11, 2022 at 9:26 AM Peter Eisentraut wrote: > > The way I understand it: > > 1. This feature (probably) depends on the "Minimal logical decoding on > standbys" patch. The details there aren't totally clear (to me). That > patch had some activity lately but I don't see it in a state that it's > nearing readiness. > > 2. I think the way this (my) patch is currently written needs some > refactoring about how we launch and manage workers. Right now, it's all > mangled together with logical replication, since that is a convenient > way to launch and manage workers, but it really doesn't need to be tied > to logical replication, since it can also be used for other logical slots. > > 3. It's an open question how to configure this. My patch show a very > minimal configuration that allows you to keep all logical slots always > behind one physical slot, which addresses one particular use case. In > general, you might have things like, one set of logical slots should > stay behind one physical slot, another set behind another physical slot, > another set should not care, etc. This could turn into something like > the synchronous replication feature, where it ends up with its own > configuration language. > > Each of these are clearly significant jobs on their own. > Thanks for bringing this topic back up again. I haven't gotten a chance to do any testing on the patch yet, but here are my initial notes from reviewing it: First, reusing the logical replication launcher seems a bit gross. It's obviously a pragmatic choice, but I find it confusing and likely to become only moreso given the fact there's nothing about slot syncing that's inherently limited to logical slots. Plus the feature currently is about syncing slots on a physical replica. So I think that probably should change. Second, it seems to me that the worker-per-DB architecture means that this is unworkable on a cluster with a large number of DBs. The original thread said that was because "logical slots are per database, walrcv_exec needs db connection, etc". As to the walrcv_exec, we're (re)connecting to the primary for each synchronization anyway, so that doesn't seem like a significant reason. I don't understand why logical slots being per-database means we have to do it this way. Is there something about the background worker architecture (I'm revealing my own ignorance here I suppose) that requires this? Also it seems that we reconnect to the primary every time we want to synchronize slots. Maybe that's OK, but it struck me as a bit odd, so I wanted to ask about it. Third, wait_for_standby_confirmation() needs a function comment. Andres noted this earlier, but it seems like we're doing quite a bit of work in this function for each commit. Some of it is obviously duplicative like the parsing of standby_slot_names. The waiting introduced also doesn't seem like a good idea. Andres also commented on that earlier; I'd echo his comments here absent an explanation of why it's preferable/necessary to do it this way. > + if (flush_pos >= commit_lsn && wait_slots_remaining > 0) > + wait_slots_remaining --; I might be missing something re: project style, but the space before the "--" looks odd to my eyes. >* Call either PREPARE (for two-phase transactions) or COMMIT (for >* regular ones). >*/ > + > + wait_for_standby_confirmation(commit_lsn); > + > if (rbtxn_prepared(txn)) > rb->prepare(rb, txn, commit_lsn); > else It appears the addition of this call splits the comment from the code it goes with. 
> + * Wait for remote slot to pass localy reserved position. Typo ("localy" -> "locally"). This patch would be a significant improvement for us; I'm hoping we can see some activity on it. I'm also hoping to try to do some testing next week and see if I can poke any holes in the functionality (with the goal of verifying Andres's concerns about the safety without the minimal logical decoding on a replica patch). Thanks, James Coleman
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Fri, Feb 18, 2022 at 12:54 PM Robert Haas wrote: > I'd like to have a clearer idea of exactly what is in each of the > remaining patches before forming a final opinion. Great. > What's tricky about 0001? Does it change any other behavior, either as > a necessary component of advancing relfrozenxid more eagerly, or > otherwise? It does not change any other behavior. It's totally mechanical. 0001 is tricky in the sense that there are a lot of fine details, and if you get any one of them wrong the result might be a subtle bug. For example, the heap_tuple_needs_freeze() code path is only used when we cannot get a cleanup lock, which is rare -- and some of the branches within the function are relatively rare themselves. The obvious concern is: What if some detail of how we track the new relfrozenxid value (and new relminmxid value) in this seldom-hit codepath is just wrong, in whatever way we didn't think of? On the other hand, we must already be precise in almost the same way within heap_tuple_needs_freeze() today -- it's not all that different (we currently need to avoid leaving any XIDs < FreezeLimit behind, which isn't made that less complicated by the fact that it's a static XID cutoff). Plus, we have experience with bugs like this. There was hardening added to catch stuff like this back in 2017, following the "freeze the dead" bug. > If there's a way you can make the precise contents of 0002 and 0003 > more clear, I would like that, too. The really big one is 0002 -- even 0003 (the FSM PageIsAllVisible() thing) wasn't on the table before now. 0002 is the patch that changes the basic criteria for freezing, making it block-based rather than based on the FreezeLimit cutoff (barring edge cases that are important for correctness, but shouldn't noticeably affect freezing overhead). The single biggest practical improvement from 0002 is that it eliminates what I've called the freeze cliff, which is where many old tuples (much older than FreezeLimit/vacuum_freeze_min_age) must be frozen all at once, in a balloon payment during an eventual aggressive VACUUM. Although it's easy to see that that could be useful, it is harder to justify (much harder) than anything else. Because we're freezing more eagerly overall, we're also bound to do more freezing without benefit in certain cases. Although I think that this can be justified as the cost of doing business, that's a hard argument to make. In short, 0001 is mechanically tricky, but easy to understand at a high level. Whereas 0002 is mechanically simple, but tricky to understand at a high level (and therefore far trickier than 0001 overall). -- Peter Geoghegan
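As background for the "freeze cliff" description above, these are the settings that shape it; values shown are the stock defaults:

    SHOW vacuum_freeze_min_age;       -- 50000000: XID age beyond which VACUUM freezes the tuples it visits
    SHOW vacuum_freeze_table_age;     -- 150000000: table age at which VACUUM becomes aggressive
    SHOW autovacuum_freeze_max_age;   -- 200000000: table age that forces an anti-wraparound autovacuum

The cliff is the gap between the first setting and the latter two: tuples that escape freezing during ordinary VACUUMs (for instance on all-visible pages that get skipped) pile up until an aggressive VACUUM must process them all at once.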
Re: Time to drop plpython2?
On 2/18/22 15:53, Andres Freund wrote: the next run succeeded, with 'PYTHON' => 'python3' in build env. But presumably this just was because you installed the python3-devel package? Ok, I guess I got confused when it failed due to the missing devel package, because I removed the PYTHON => 'python3' from the build env and it is still getting successfully past the configure stage. Sorry for the noise. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: Time to drop plpython2?
On 2022-Feb-17, Andres Freund wrote: > Now also pinged: > - guaibasaurus Fixed now (apt install python3-dev), but I had initially added PYTHON=>python3 to the .conf, unsuccessfully because I failed to install the dev pkg. After the first success I removed that line. It should still work if we do test python3 first, but if it does fail, then I'll put that line back. Thanks -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "The problem with the future is that it keeps turning into the present" (Hobbes)
Re: Small TAP tests cleanup for Windows and unused modules
> On 16 Feb 2022, at 23:36, Andrew Dunstan wrote: > On 2/16/22 17:04, Andrew Dunstan wrote: >> On 2/16/22 16:36, Daniel Gustafsson wrote: >>> Seeing msys in TAP tests mentioned in a thread [1] tonight reminded me about >>> two related (well, one of them) patches I had sitting around, so rather than >>> forgetting again here are some small cleanups. >>> >>> 0001 attempts to streamline how we detect Windows in the TAP tests (after >>> that >>> there is a single msys check left that I'm not sure about, but [1] seems to >>> imply it could go); 0002 removes some unused module includes which either >>> were >>> used at some point in the past or likely came from copy/paste. >>> >> 0002 looks OK at first glance. >> >> 0001 is something we should investigate. It's really going in the wrong >> direction I suspect. We should be looking to narrow the scope of these >> platform-specific bits of processing, not expand them. > > More specifically, all the tests in question are now passing on jacana > and fairywren, and their $Config{osname} is NOT 'msys' (it's 'MSWin32'). +1 > So we should simply remove any line that ends "if $Config{osname} eq > 'msys';" since we don't have any such animals any more. Since msys is discussed in the other thread let's drop the 0001 from this thread and just go ahead with 0002. -- Daniel Gustafsson https://vmware.com/
Re: Trap errors from streaming child in pg_basebackup to exit early
> On 16 Feb 2022, at 08:27, Michael Paquier wrote: > > On Wed, Sep 29, 2021 at 01:18:40PM +0200, Daniel Gustafsson wrote: >> So there is one mention of a background WAL receiver already in there, but >> it's >> pretty inconsistent as to what we call it. For now I've changed the >> messaging >> in this patch to say "background process", leaving making this all consistent >> for a follow-up patch. >> >> The attached fixes the above, as well as the typo mentioned off-list and is >> rebased on top of todays HEAD. > > I have been looking a bit at this patch, and did some tests on Windows > to find out that this is able to catch the failure of the thread > streaming the WAL segments in pg_basebackup, avoiding a completion of > the base backup, while HEAD waits until the backup finishes. Testing > this scenario is actually simple by issuing pg_terminate_backend() on > the WAL sender that streams the WAL with START_REPLICATION, while > throttling the base backup. Great, thanks! > Could you add a test to automate this scenario? As far as I can see, > something like the following should be stable even for Windows: > 1) Run a pg_basebackup in the background with IPC::Run, using > --max-rate with a minimal value to slow down the base backup, for slow > machines. 013_crash_restart.pl does that as one example with $killme. > 2) Find out the WAL sender doing START_REPLICATION in the backend, and > issue pg_terminate_backend() on it. > 3) Use a variant of pump_until() on the pg_basebackup process and > check after one or more failure patterns. We should refactor this > part, actually. If this new test uses the same logic, that would make > three tests doing that with 022_crash_temp_files.pl and > 013_crash_restart.pl. The CI should be fine to provide any feedback > with the test in place, though I am fine to test things also in my > box. This is a good idea; I was going in a different direction earlier with a test, but this is cleaner. The attached 0001 refactors pump_until; 0002 fixes a trivial spelling error found while hacking; and 0003 is the previous patch complete with a test that passes on Cirrus CI. -- Daniel Gustafsson https://vmware.com/ v5-0001-Add-function-to-pump-IPC-process-until-string-mat.patch Description: Binary data v5-0002-Remove-duplicated-word-in-comment.patch Description: Binary data v5-0003-Quick-exit-on-log-stream-child-exit-in-pg_basebac.patch Description: Binary data
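For reference, step 2 of the outline quoted above can be done with a query along these lines (filtering on START_REPLICATION so the base backup's own WAL sender is left alone):

    SELECT pg_terminate_backend(pid)
      FROM pg_stat_activity
     WHERE backend_type = 'walsender'
       AND query LIKE 'START_REPLICATION%';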
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Fri, Feb 18, 2022 at 10:48:10PM +0530, Ashutosh Sharma wrote: > Some review comments on the latest version: Thanks for the feedback! Before I start spending more time on this one, I should probably ask if this has any chance of making it into v15. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Fri, Feb 18, 2022 at 3:41 PM Peter Geoghegan wrote: > Concerns about my general approach to this project (and even the > Postgres 14 VACUUM work) were expressed by Robert and Andres over on > the "Nonrandom scanned_pages distorts pg_class.reltuples set by > VACUUM" thread. Some of what was said honestly shocked me. It now > seems unwise to pursue this project on my original timeline. I even > thought about shelving it indefinitely (which is still on the table). > > I propose the following compromise: the least contentious patch alone > will be in scope for Postgres 15, while the other patches will not be. > I'm referring to the first patch from v8, which adds dynamic tracking > of the oldest extant XID in each heap table, in order to be able to > use it as our new relfrozenxid. I can't imagine that I'll have > difficulty convincing Andres of the merits of this idea, for one, > since it was his idea in the first place. It makes a lot of sense, > independent of any change to how and when we freeze. > > The first patch is tricky, but at least it won't require elaborate > performance validation. It doesn't change any of the basic performance > characteristics of VACUUM. It sometimes allows us to advance > relfrozenxid to a value beyond FreezeLimit (typically only possible in > an aggressive VACUUM), which is an intrinsic good. If it isn't > effective then the overhead seems very unlikely to be noticeable. It's > pretty much a strictly additive improvement. > > Are there any objections to this plan? I really like the idea of reducing the scope of what is being changed here, and I agree that eagerly advancing relfrozenxid carries much less risk than the other changes. I'd like to have a clearer idea of exactly what is in each of the remaining patches before forming a final opinion. What's tricky about 0001? Does it change any other behavior, either as a necessary component of advancing relfrozenxid more eagerly, or otherwise? If there's a way you can make the precise contents of 0002 and 0003 more clear, I would like that, too. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Time to drop plpython2?
Hi, On 2022-02-18 15:35:37 -0500, Joe Conway wrote: > Initially I just installed the python3 RPMs and when I tried running > manually it was still error'ing on configure due to finding python2. > Even after adding EXPORT PYTHON=python3 to my ~/.bash_profile I was seeing > the same. > > By adding PYTHON => 'python3' to build-farm.conf I saw that the error > changed to indicate missing python3-devel package. Once I installed that, > everything went green. Hm. It definitely did test python3, earlier today: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2022-02-18%2016%3A52%3A13 checking for python3... no checking for python... /usr/bin/python the next run then saw: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2022-02-18%2017%3A50%3A09 checking for PYTHON... python3 configure: using python 3.6.8 (default, Nov 16 2020, 16:55:22) checking for Python sysconfig module... yes checking Python configuration directory... /usr/lib64/python3.6/config-3.6m-x86_64-linux-gnu checking Python include directory... -I/usr/include/python3.6m but then failed because the python headers weren't available: checking for Python.h... no configure: error: header file is required for Python Note that this did *not* yet use PYTHON => 'python3' in build_env, but has it in the environment starting the buildfarm client. the next run succeeded, with 'PYTHON' => 'python3' in build env. But presumably this just was because you installed the python3-devel package? Greetings, Andres Freund
Re: O(n) tasks cause lengthy startups and checkpoints
> On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote: >>> > The improvements around deleting temporary files and serialized snapshots >>> > afaict don't require a dedicated process - they're only relevant during >>> > startup. We could use the approach of renaming the directory out of the >>> > way as >>> > done in this patchset but perform the cleanup in the startup process after >>> > we're up. BTW I know you don't like the dedicated process approach, but one improvement to that approach could be to shut down the custodian process when it has nothing to do. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Fri, Feb 11, 2022 at 8:30 PM Peter Geoghegan wrote: > Attached is v8. No real changes -- just a rebased version. Concerns about my general approach to this project (and even the Postgres 14 VACUUM work) were expressed by Robert and Andres over on the "Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM" thread. Some of what was said honestly shocked me. It now seems unwise to pursue this project on my original timeline. I even thought about shelving it indefinitely (which is still on the table). I propose the following compromise: the least contentious patch alone will be in scope for Postgres 15, while the other patches will not be. I'm referring to the first patch from v8, which adds dynamic tracking of the oldest extant XID in each heap table, in order to be able to use it as our new relfrozenxid. I can't imagine that I'll have difficulty convincing Andres of the merits of this idea, for one, since it was his idea in the first place. It makes a lot of sense, independent of any change to how and when we freeze. The first patch is tricky, but at least it won't require elaborate performance validation. It doesn't change any of the basic performance characteristics of VACUUM. It sometimes allows us to advance relfrozenxid to a value beyond FreezeLimit (typically only possible in an aggressive VACUUM), which is an intrinsic good. If it isn't effective then the overhead seems very unlikely to be noticeable. It's pretty much a strictly additive improvement. Are there any objections to this plan? -- Peter Geoghegan
Re: Time to drop plpython2?
On 2/18/22 15:25, Andres Freund wrote: On 2022-02-18 14:46:39 -0500, Joe Conway wrote: $ ll /usr/bin/python lrwxrwxrwx. 1 root root 7 Mar 13 2021 /usr/bin/python -> python2 8<--- Yea, that all looks fine. What's the problem if you don't specify the PYTHON=python3? We try python3, python in that order by default, so it should pick up the same python3 you specify explicitly? Initially I just installed the python3 RPMs and when I tried running manually it was still error'ing on configure due to finding python2. Even after adding EXPORT PYTHON=python3 to my ~/.bash_profile I was seeing the same. By adding PYTHON => 'python3' to build-farm.conf I saw that the error changed to indicate missing python3-devel package. Once I installed that, everything went green. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: Design of pg_stat_subscription_workers vs pgstats
Hi, On 2022-02-18 17:26:04 +0900, Masahiko Sawada wrote: > With this change, pg_stat_subscription_workers will be like: > > * subid > * subname > * subrelid > * error_count > * last_error_timestamp > This view will be extended by adding transaction statistics proposed > on another thread[1]. I do not agree with these bits. What's the point of these per-relation stats at this point? You're just duplicating the normal relation pg_stats here. I really think we just should drop pg_stat_subscription_workers. Even if we don't, we definitely should rename it, because it still isn't meaningfully about workers. This stuff is getting painful for me. I'm trying to clean up some stuff for shared memory stats, and this stuff doesn't fit in with the rest. I'll have to rework some core stuff in the shared memory stats patch to make it work with this. Just to then quite possibly deal with reverting that part. Given the degree we're still designing stuff at this point, I think the appropriate thing is to revert the patch, and then try from there. Greetings, Andres Freund
Re: Time to drop plpython2?
Hi, On 2022-02-18 14:46:39 -0500, Joe Conway wrote: > On 2/18/22 14:37, Andres Freund wrote: > > > That seems to have worked. > > > > > > But the question is, is that the correct/recommended method? > > > > If python3 is in PATH, then you shouldn't need that part. > > Not quite -- python3 is definitely in the PATH: > > 8<--- > $ which python3 > /usr/bin/python3 > 8<--- > > And I gather that merely installing python3 RPMs on a RHEL-esque 7.X system > does not replace the python symlink: > > 8<--- > $ yum whatprovides /usr/bin/python > > python-2.7.5-89.el7.x86_64 : An interpreted, interactive, object-oriented > programming language > Repo: base > Matched from: > Filename: /usr/bin/python > > $ ll /usr/bin/python > lrwxrwxrwx. 1 root root 7 Mar 13 2021 /usr/bin/python -> python2 > 8<--- Yea, that all looks fine. What's the problem if you don't specify the PYTHON=python3? We try python3, python in that order by default, so it should pick up the same python3 you specify explicitly? Or maybe I'm just misunderstanding what you're asking... Greetings, Andres Freund
Re: buildfarm warnings
On Thu, Feb 17, 2022 at 3:57 PM Robert Haas wrote: > True. That would be easy enough. I played around with this a bit, and of course it is easy enough to add --progress with or without --verbose to a few tests, but when I reverted 62cb7427d1e491faf8612a82c2e3711a8cd65422 it didn't make any tests fail. So then I tried sticking --progress --verbose into @pg_basebackup_defs so all the tests would run with it, and that did make some tests fail, but the same ones fail with and without the patch. So it doesn't seem like we would have caught this particular problem via this testing method no matter what we did in detail. If we just want to splatter a few --progress switches around for the heck of it, we could do something like the attached. But I don't know if this is best: it seems highly arguable what to do in detail (and also not worth arguing about, so if someone else feels motivated to do something different than this, or the same as this, fine by me). -- Robert Haas EDB: http://www.enterprisedb.com splatter-some-progress.patch Description: Binary data
Re: Design of pg_stat_subscription_workers vs pgstats
On Fri, Feb 18, 2022 at 1:26 AM Masahiko Sawada wrote: > > Here is the summary of the discussion, changes, and plan. > > 1. Move some error information such as the error message to a new > system catalog, pg_subscription_error. The pg_subscription_error table > would have the following columns: > > * sesubid : subscription Oid. > * serelid : relation Oid (NULL for apply worker). > * seerrlsn : commit-LSN or the error transaction. > * seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction. > * seerrmsg : error message > Not a fan of the "se" prefix but overall yes. We should include a timestamp. > The tuple is inserted or updated when an apply worker or a tablesync > worker raises an error. If the same error occurs in a row, the update > is skipped. I disagree with this - I would treat every new instance of an error to be important and insert on conflict (sesubid, serelid) the new entry, updating lsn/cmd/msg with the new values. The tuple is removed in the following cases: > > * the subscription is dropped. > * the table is dropped. * the table is removed from the subscription. > * the worker successfully committed a non-empty transaction. > Correct. This handles the "end of sync worker" just fine since its final action should be a successful commit of a non-empty transaction. > > With this change, pg_stat_subscription_workers will be like: > > * subid > * subname > * subrelid > * error_count > * last_error_timestamp > > This view will be extended by adding transaction statistics proposed > on another thread[1]. > I haven't reviewed that thread but in-so-far as this one goes I would just drop this altogether. The error count, if desired, can be added to pg_subscription_error, and the timestamp should be added there as noted above. If this is useful for the feature on the other thread it can be reconstituted from there. > 2. Fix a bug in pg_stat_subscription_workers. As pointed out by > Andres, there is a bug in pg_stat_subscription_workers; it doesn't > drop entries for already-dropped tables. We need to fix it. > Given the above, this becomes an N/A. > 3. Show commit-LSN of the error transaction in errcontext. Currently, > we show XID and commit timestamp in the errcontext. But given that we > use LSN in pg_subscriptoin_error, it's better to show commit-LSN as > well (or instead of error-XID). > Agreed, I think: what "errcontext" is this referring to? > > 5. Record skipped data to the system catalog, say > pg_subscription_conflict_history so that there is a chance to analyze > and recover. We can discuss the > details in a new thread. > Agreed - the "before skipping" consideration seems considerably more helpful; but wouldn't need to be persistent, it could just be a view. A permanent record probably would best be recorded in the logs - though if we get the pre-skip functionality the user can view directly and save the results wherever they wish or we can provide a function to spool the information to the log. I don't see persistent in-database storage being that desirable here. If we only do something after the transaction has been skipped it may be useful to add an option to the skipping command to auto-disable the subscription after the transaction has been skipped and before any subsequent transactions are applied. This will give the user a chance to process the post-skipped information before restoring sync and having the system begin changing underneath them again. 
> > 4 and 5 might be better introduced together but I think since the user > is able to check what changes will be skipped on the publisher side we > can do 5 for v16. How would one go about doing that (checking what changes will be skipped on the publisher side)? David J.
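Pulling the column list and the suggested timestamp together, the proposed catalog shape would look roughly like this (plain CREATE TABLE is used purely for illustration; a real system catalog is of course defined differently, and nothing here is committed):

    CREATE TABLE pg_subscription_error (
        sesubid    oid,           -- subscription OID
        serelid    oid,           -- relation OID (NULL for the apply worker)
        seerrlsn   pg_lsn,        -- commit LSN of the failing transaction
        seerrcmd   text,          -- command (INSERT, UPDATE, ...) of the failing change
        seerrmsg   text,          -- error message
        seerrtime  timestamptz    -- suggested addition: time of the most recent error
    );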
Re: Time to drop plpython2?
On 2/18/22 14:37, Andres Freund wrote: Hi, On 2022-02-18 14:19:49 -0500, Joe Conway wrote: On 2/17/22 13:08, Andres Freund wrote: > On 2022-02-16 23:14:46 -0800, Andres Freund wrote: > > > Done. Curious how red the BF will turn out to be. Let's hope it's not > > > too bad. > > - rhinoceros > > Joe replied that he is afk, looking into it tomorrow. I installed python3 packages (initially forgetting the devel package -- d'oh!) and changed build-farm.conf thusly: 8<--- *** *** 185,190 --- 185,193 build_env => { + # specify python 3 + PYTHON => 'python3', + # use a dedicated cache for the build farm. this should give us # very high hit rates and slightly faster cache searching. # 8<--- That seems to have worked. But the question is, is that the correct/recommended method? If python3 is in PATH, then you shouldn't need that part. Not quite -- python3 is definitely in the PATH: 8<--- $ which python3 /usr/bin/python3 8<--- And I gather that merely installing python3 RPMs on a RHEL-esque 7.X system does not replace the python symlink: 8<--- $ yum whatprovides /usr/bin/python python-2.7.5-89.el7.x86_64 : An interpreted, interactive, object-oriented programming language Repo: base Matched from: Filename: /usr/bin/python $ ll /usr/bin/python lrwxrwxrwx. 1 root root 7 Mar 13 2021 /usr/bin/python -> python2 8<--- Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: Time to drop plpython2?
Hi, On 2022-02-18 14:19:49 -0500, Joe Conway wrote: > On 2/17/22 13:08, Andres Freund wrote: > > On 2022-02-16 23:14:46 -0800, Andres Freund wrote: > > > > Done. Curious how red the BF will turn out to be. Let's hope it's not > > > > too bad. > > > > - rhinoceros > > > > Joe replied that he is afk, looking into it tomorrow. > > I installed python3 packages (initially forgetting the devel package -- > d'oh!) and changed build-farm.conf thusly: > > 8<--- > *** > *** 185,190 > --- 185,193 > > build_env => { > > + # specify python 3 > + PYTHON => 'python3', > + > # use a dedicated cache for the build farm. this should give > us > # very high hit rates and slightly faster cache searching. > # > 8<--- > > That seems to have worked. > > But the question is, is that the correct/recommended method? If python3 is in PATH, then you shouldn't need that part. Greetings, Andres Freund
Re: Time to drop plpython2?
On 2/17/22 13:08, Andres Freund wrote: On 2022-02-16 23:14:46 -0800, Andres Freund wrote: > Done. Curious how red the BF will turn out to be. Let's hope it's not > too bad. - rhinoceros Joe replied that he is afk, looking into it tomorrow. I installed python3 packages (initially forgetting the devel package -- d'oh!) and changed build-farm.conf thusly: 8<--- *** *** 185,190 --- 185,193 build_env => { + # specify python 3 + PYTHON => 'python3', + # use a dedicated cache for the build farm. this should give us # very high hit rates and slightly faster cache searching. # 8<--- That seems to have worked. But the question is, is that the correct/recommended method? Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: [PATCH] Add support to table_to_xmlschema regex when timestamp has time zone
On Fri, Feb 18, 2022, at 2:47 PM, Renan Soares Lopes wrote: > Hello, > > I added a patch to fix table_to_xmlschema, could you point me how to add a > unit test to that? You should edit src/test/regress/expected/xmlmap.out. In this case, you should also modify src/test/regress/expected/xmlmap_1.out, which is the expected output from this test when you build without libxml support. Run 'make check' to test your fix after building with/without libxml support. Regarding this fix, it looks good to me. FWIW, character class escape is defined here [1]. [1] https://www.w3.org/TR/xmlschema11-2/#cces -- Euler Taveira EDB https://www.enterprisedb.com/
Re: adding 'zstd' as a compression algorithm
On Fri, Feb 18, 2022 at 9:52 AM Robert Haas wrote: > So here's a revised patch for zstd support that uses > AC_CHECK_HEADER(), plus a second patch to change the occurrences above > to use AC_CHECK_HEADER() and remove all traces of the corresponding > preprocessor symbol. And I've now committed the first of these. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Fri, Feb 18, 2022 at 12:56 AM Andy Fan wrote: > What do you think about moving on this feature? The items known by me > are: 1). Make sure the estimation error can be fixed or discuss if my current > solution is workable. b). Just distribute some selectivity restrictinfo to > RelOptInfo. c). See how hard it is to treat the original / derived qual > equally. > d). Reduce the extra planner cost at much as possible. Any other important > items I missed? I think it's not realistic to do anything here for PostgreSQL 15. Considering that it's almost the end of February and feature freeze will probably be in perhaps 5-6 weeks, in order to get something committed at this point, you would need to have (1) sufficient consensus on the design, (2) a set of reasonably complete patches implementing that design at an acceptable level of quality, and (3) a committer interested in putting in the necessary time to help you get this over the finish line. As far as I can see, you have none of those things. Tom doesn't think we need this at all, and you and I and Tomas all have somewhat different ideas on what approach we ought to be taking, and the patches appear to be at a POC level at this point rather than something that's close to being ready to ship, and no committer has expressed interest in trying to get them into this release. It seems to me that the thing to do here is see if you can build consensus on an approach. Just saying that we ought to think the patches you've already got are good enough is not going to get you anywhere. I do understand that the political element of this problem is frustrating to you, as it is to many people. But consider the alternative: suppose the way things worked around here is that any committer could commit anything they liked without needing the approval of any other committer, or even over their objections. Well, it would be chaos. People would be constantly reverting or rewriting things that other people had done, and everybody would probably be pissed off at each other all the time, and the quality would go down the tubes and nobody would use PostgreSQL any more. I'm not saying the current system is perfect, not at all. It's frustrating as all get out at times. But the reason it's frustrating is because the PostgreSQL community is a community of human beings, and there's nothing more frustrating in the world than the stuff other human beings do. However, it's equally true that we get further working together than we would individually. I think Tom is wrong about the merits of doing something in this area, but I also think he's incredibly smart and thoughtful and one of the best technologists I've ever met, and probably just one of the absolute best technologists on Planet Earth. And I also have to consider, and this is really important, the possibility that Tom is right about this issue and I am wrong. So far Tom hasn't replied to what I wrote, but I hope he does. Maybe he'll admit that I have some valid points. Maybe he'll tell me why he thinks I'm wrong. Maybe I'll learn about some problem that I haven't considered from his response, and maybe that will lead to a refinement of the idea that will make it better. I don't know, but it's certainly happened in plenty of other cases. And that's how PostgreSQL gets to be this pretty amazing database that it is. So, yeah, building consensus is frustrating and it takes a long time and sometimes it feels like other people are obstructing you needlessly and sometimes that's probably true. 
But there's not a realistic alternative. Nobody here is smart enough to create software that is as good as what all of us create together. -- Robert Haas EDB: http://www.enterprisedb.com
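For readers landing mid-thread, a minimal example of the behaviour in the subject line; a and b are hypothetical tables joined on x:

    -- equality quals are already derived through equivalence classes:
    SELECT * FROM a JOIN b ON a.x = b.x WHERE a.x = 42;
    -- the planner also applies b.x = 42 when scanning b

    -- range quals are not derived today, which is what this thread proposes to change:
    SELECT * FROM a JOIN b ON a.x = b.x WHERE a.x BETWEEN 1 AND 10;
    -- no derived b.x BETWEEN 1 AND 10 is applied to b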
Re: USE_BARRIER_SMGRRELEASE on Linux?
On Wed, Feb 16, 2022 at 01:00:53PM -0800, Nathan Bossart wrote: > On Wed, Feb 16, 2022 at 11:27:31AM -0800, Andres Freund wrote: >> That doesn't strike me as great architecturally. E.g. in theory the same >> problem could exist in single user mode. I think it doesn't today, because >> RegisterSyncRequest() will effectively "absorb" it immediately, but that kind >> of feels like an implementation detail? > > Yeah, maybe that is a reason to add an absorb somewhere within > CreateCheckPoint() instead, like v1 [0] does. Then the extra absorb would > be sufficient for single-user mode if the requests were not absorbed > immediately. Here is a new patch. This is essentially the same as v1, but I've spruced up the comments and the commit message. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From adf7f7c9717897cc72a3c76d90ad0db082a7a432 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 15 Feb 2022 22:53:04 -0800 Subject: [PATCH v3 1/1] Fix race condition between DROP TABLESPACE and checkpointing. Commands like ALTER TABLE SET TABLESPACE may leave files for the next checkpoint to clean up. If such files are not removed by the time DROP TABLESPACE is called, we request a checkpoint so that they are deleted. However, there is presently a window before checkpoint start where new unlink requests won't be scheduled until the following checkpoint. This means that the checkpoint forced by DROP TABLESPACE might not remove the files we expect it to remove, and the following ERROR will be emitted: ERROR: tablespace "mytblspc" is not empty To fix, add a call to AbsorbSyncRequests() just before advancing the unlink cycle counter. This ensures that any unlink requests forwarded prior to checkpoint start (i.e., when ckpt_started is incremented) will be processed by the current checkpoint. Since AbsorbSyncRequests() performs memory allocations, it cannot be called within a critical section, so we also need to move SyncPreCheckpoint() to before CreateCheckPoint()'s critical section. This is an old bug, so back-patch to all supported versions. Author: Nathan Bossart Reported-by: Nathan Bossart Diagnosed-by: Thomas Munro, Nathan Bossart Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13 --- src/backend/access/transam/xlog.c | 15 --- src/backend/storage/sync/sync.c | 14 +- 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ce78ac413e..174aab5ae4 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6294,6 +6294,14 @@ CreateCheckPoint(int flags) MemSet(&CheckpointStats, 0, sizeof(CheckpointStats)); CheckpointStats.ckpt_start_t = GetCurrentTimestamp(); + /* + * Let smgr prepare for checkpoint; this has to happen outside the critical + * section and before we determine the REDO pointer. Note that smgr must + * not do anything that'd have to be undone if we decide no checkpoint is + * needed. + */ + SyncPreCheckpoint(); + /* * Use a critical section to force system panic if we have trouble. */ @@ -6307,13 +6315,6 @@ CreateCheckPoint(int flags) LWLockRelease(ControlFileLock); } - /* - * Let smgr prepare for checkpoint; this has to happen before we determine - * the REDO pointer. Note that smgr must not do anything that'd have to - * be undone if we decide no checkpoint is needed. 
- */ - SyncPreCheckpoint(); - /* Begin filling in the checkpoint WAL record */ MemSet(&checkPoint, 0, sizeof(checkPoint)); checkPoint.time = (pg_time_t) time(NULL); diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c index e161d57761..f16b25faa1 100644 --- a/src/backend/storage/sync/sync.c +++ b/src/backend/storage/sync/sync.c @@ -173,7 +173,9 @@ InitSync(void) * counter is incremented here. * * This must be called *before* the checkpoint REDO point is determined. - * That ensures that we won't delete files too soon. + * That ensures that we won't delete files too soon. Since this calls + * AbsorbSyncRequests(), which performs memory allocations, it cannot be + * called within a critical section. * * Note that we can't do anything here that depends on the assumption * that the checkpoint will be completed. @@ -181,6 +183,16 @@ InitSync(void) void SyncPreCheckpoint(void) { + /* + * Operations such as DROP TABLESPACE assume that the next checkpoint will + * process all recently forwarded unlink requests, but if they aren't + * absorbed prior to advancing the cycle counter, they won't be processed + * until a future checkpoint. The following absorb ensures that any unlink + * requests forwarded before the checkpoint began will be processed in the + * current checkpoint. + */ + AbsorbSyncRequests(); + /* * Any unlink requests arriving after this point will be assigned the next * cycle counter, and won't be unlinked until next chec
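The race the patch fixes can be sketched as follows; the tablespace name comes from the commit message and the location is hypothetical:

    CREATE TABLESPACE mytblspc LOCATION '/some/dir';
    CREATE TABLE t (a int) TABLESPACE mytblspc;
    ALTER TABLE t SET TABLESPACE pg_default;  -- may leave the old relation file for a checkpoint to unlink
    DROP TABLESPACE mytblspc;                 -- forces a checkpoint, but in the unpatched race window the
                                              -- unlink request can slip to the next checkpoint, giving:
    -- ERROR:  tablespace "mytblspc" is not empty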
[PATCH] Add support to table_to_xmlschema regex when timestamp has time zone
Hello,I added a patch to fix table_to_xmlschema, could you point me how to add a unit test to that?From 24768b689638fc35edcdb378735ae2505315a151 Mon Sep 17 00:00:00 2001 From: Renan Lopes Date: Fri, 18 Feb 2022 14:36:30 -0300 Subject: [PATCH] fix: table_to_xmlschema regex for timestamp with time zone --- src/backend/utils/adt/xml.c | 4 ++-- src/test/regress/sql/xmlmap.sql | 7 +++ 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 51773db..801ad8f 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -3659,7 +3659,7 @@ map_sql_type_to_xmlschema_type(Oid typeoid, int typmod) case TIMEOID: case TIMETZOID: { - const char *tz = (typeoid == TIMETZOID ? "(+|-)\\p{Nd}{2}:\\p{Nd}{2}" : ""); + const char *tz = (typeoid == TIMETZOID ? "(\\+|-)\\p{Nd}{2}:\\p{Nd}{2}" : ""); if (typmod == -1) appendStringInfo(&result, @@ -3682,7 +3682,7 @@ map_sql_type_to_xmlschema_type(Oid typeoid, int typmod) case TIMESTAMPOID: case TIMESTAMPTZOID: { - const char *tz = (typeoid == TIMESTAMPTZOID ? "(+|-)\\p{Nd}{2}:\\p{Nd}{2}" : ""); + const char *tz = (typeoid == TIMESTAMPTZOID ? "(\\+|-)\\p{Nd}{2}:\\p{Nd}{2}" : ""); if (typmod == -1) appendStringInfo(&result, diff --git a/src/test/regress/sql/xmlmap.sql b/src/test/regress/sql/xmlmap.sql index fde1b9e..13407ee 100644 --- a/src/test/regress/sql/xmlmap.sql +++ b/src/test/regress/sql/xmlmap.sql @@ -55,3 +55,10 @@ CREATE TABLE testxmlschema.test3 SELECT xmlforest(c1, c2, c3, c4) FROM testxmlschema.test3; SELECT table_to_xml('testxmlschema.test3', true, true, ''); + + +-- test datetime with time zone + +CREATE TABLE testxmlschema.test4 (a timestamp with time zone default current_timestamp); +SELECT table_to_xmlschema('testxmlschema.test4', true, true, ''); + -- 2.35.1
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Some review comments on the latest version: +* runningBackups is a counter indicating the number of backups currently in +* progress. forcePageWrites is set to true when either of these is +* non-zero. lastBackupStart is the latest checkpoint redo location used as +* a starting point for an online backup. */ - ExclusiveBackupState exclusiveBackupState; - int nonExclusiveBackups; What do you mean by "either of these is non-zero ''. Earlier we used to set forcePageWrites in case of both exclusive and non-exclusive backups, but we have just one type of backup now. == -* OK to update backup counters, forcePageWrites and session-level lock. +* OK to update backup counters and forcePageWrites. * We still update the status of session-level lock so I don't think we should update the above comment. See below code: if (XLogCtl->Insert.runningBackups == 0) { XLogCtl->Insert.forcePageWrites = false; } /* * Clean up session-level lock. * * You might think that WALInsertLockRelease() can be called before * cleaning up session-level lock because session-level lock doesn't need * to be protected with WAL insertion lock. But since * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be * cleaned up before it. */ sessionBackupState = SESSION_BACKUP_NONE; WALInsertLockRelease(); == @@ -8993,18 +8686,16 @@ do_pg_abort_backup(int code, Datum arg) boolemit_warning = DatumGetBool(arg); /* -* Quick exit if session is not keeping around a non-exclusive backup -* already started. +* Quick exit if session does not have a running backup. */ - if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE) + if (sessionBackupState != SESSION_BACKUP_RUNNING) return; WALInsertLockAcquireExclusive(); - Assert(XLogCtl->Insert.nonExclusiveBackups > 0); - XLogCtl->Insert.nonExclusiveBackups--; + Assert(XLogCtl->Insert.runningBackups > 0); + XLogCtl->Insert.runningBackups--; - if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE && - XLogCtl->Insert.nonExclusiveBackups == 0) + if (XLogCtl->Insert.runningBackups == 0) { XLogCtl->Insert.forcePageWrites = false; } I think we have a lot of common code in do_pg_abort_backup() and pg_do_stop_backup(). So why not have a common function that can be called from both these functions. == +# Now delete the bogus backup_label file since it will interfere with startup +unlink("$pgdata/backup_label") + or BAIL_OUT("unable to unlink $pgdata/backup_label"); + Why do we need this additional change? Earlier this was not required. -- With Regards, Ashutosh Sharma. On Thu, Feb 17, 2022 at 6:41 AM Nathan Bossart wrote: > > Here is a rebased patch. > > -- > Nathan Bossart > Amazon Web Services: https://aws.amazon.com
Re: Per-table storage parameters for TableAM/IndexAM extensions
On Thu, 17 Feb 2022 at 17:55, Sadhuprasad Patro wrote: > > > On Sat, Feb 12, 2022 at 2:35 AM Robert Haas wrote: > >> > >> > >> Imagine that I am using the "foo" tableam with "compression=lots" and > >> I want to switch to the "bar" AM which does not support that option. > >> If I remove the "compression=lots" option using a separate command, > >> the "foo" table AM may rewrite my whole table and decompress > >> everything. Then when I convert to the "bar" AM it's going to have to > >> be rewritten again. That's painful. I clearly need some way to switch > >> AMs without having to rewrite the table twice. > > > Agreed. Better to avoid multiple rewrites here. Thank you for figuring out > this. > > > > You'd need to be able to do multiple things with one command e.g. > > > ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET > > preferred_fruit = 'banana'; > > +1 > Silently dropping some options is not right and it may confuse users > too. So I would like to go > for the command you have suggested, where the user should be able to > SET & RESET multiple > options in a single command for an object. I prefer ADD/DROP to SET/RESET. The latter doesn't convey the meaning accurately to me. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: O(n) tasks cause lengthy startups and checkpoints
On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote: >> > The improvements around deleting temporary files and serialized snapshots >> > afaict don't require a dedicated process - they're only relevant during >> > startup. We could use the approach of renaming the directory out of the >> > way as >> > done in this patchset but perform the cleanup in the startup process after >> > we're up. >> >> Perhaps this is a good place to start. As I mentioned above, IME the >> temporary file cleanup is the most common problem, so I think even getting >> that one fixed would be a huge improvement. > > Cool. Hm. How should this work for standbys? I can think of the following options: 1. Do temporary file cleanup in the postmaster (as it does today). 2. Pause after allowing connections to clean up temporary files. 3. Do small amounts of temporary file cleanup whenever there is an opportunity during recovery. 4. Wait until recovery completes before cleaning up temporary files. I'm not too thrilled about any of these options. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Thu, Feb 17, 2022 at 6:39 PM Andres Freund wrote: > > The only other kind of hazard I can think of is: could it be unsafe to > > try to interpret the contents of a database to which no one else is > > connected at present due to any of the issues you mention? But what's > > the hazard exactly? > > I don't quite know. But I don't think it's impossible to run into trouble in > this area. E.g. xid horizons are computed in a database specific way. If the > routine reading pg_class did hot pruning, you could end up removing data > that's actually needed for a logical slot in the other database because the > backend local horizon state was computed for the "local" database? Yeah, but it doesn't -- and shouldn't. There's no HeapScanDesc here, so we can't accidentally wander into heap_page_prune_opt(). We should document the things we're thinking about here in the comments to prevent future mistakes, but I think for the moment we are OK. > Could there be problems because other backends wouldn't see the backend > accessing the other database as being connected to that database > (PGPROC->databaseId)? I think that if there's any hazard here, it must be related to snapshots, which brings us to your next point: > Or what if somebody optimized snapshots to disregard readonly transactions in > other databases? So there are two related questions here. One is whether the snapshot that we're using to access the template database can be unsafe, and the other is whether the read-only access that we're performing inside the template database could mess up somebody else's snapshot. Let's deal with the second point first: nobody else knows that we're reading from the template database, and nobody else is reading from the template database except possibly for someone who is doing exactly what we're doing. Therefore, I think this hazard can be ruled out. On the first point, a key point in my opinion is that there can be no in-flight transactions in the template database, because nobody is connected to it, and prepared transactions in a template database are verboten. It therefore can't matter if we include too few XIDs in our snapshot, or if our xmin is too new. The reverse case can matter, though: if the xmin of our snapshot were too old, or if we had extra XIDs in our snapshot, then we might think that a transaction is still in progress when it isn't. Therefore, I think the patch is wrong to use GetActiveSnapshot() and must use GetLatestSnapshot() *after* it's finished making sure that nobody is using the template database. I don't think there's a hazard beyond that, though. Let's consider the two ways in which things could go wrong: 1. Extra XIDs in the snapshot. Any current or future optimization of snapshots would presumably be trying to make them smaller by removing XIDs from the snapshot, not making them bigger by adding XIDs to the snapshot. I guess in theory you can imagine an optimization that tests for the presence of XIDs by some method other than scanning through an array, and which feels free to add XIDs to the snapshot if they "can't matter," but I think it's up to the author of that hypothetical future patch to make sure they don't break anything in so doing -- especially because it's entirely possible for our session to see XIDs used by a backend in some other database, because they could show up in shared catalogs. I think that's why, as far as I can tell, we only use the database ID when setting pruning thresholds, and not for snapshots. 2. xmin of snapshot too new. 
There are no in-progress transactions in the template database, so how can this even happen? If we set the xmin "in the future," then we could get confused about what's visible due to wraparound, but that seems crazy. I don't see how there can be a problem here. > > It can't be a problem if we've failed to process sinval messages for the > > target database, because we're not using any caches anyway. > > Could you end up with an outdated relmap entry? Probably not. Again, we're not relying on caching -- we read the file. > > We can't. It can't be unsafe to test visibility of XIDs for that database, > > because in an alternate universe some backend could have connected to that > > database and seen the same XIDs. > > That's a weak argument, because in that alternative universe a PGPROC entry > with the PGPROC->databaseId = template_databases_oid would exist. So what? As I also argue above, I don't think that affects snapshot generation, and if it did it wouldn't matter anyway, because it could only remove in-progress transactions from the snapshot, and there aren't any in the template database anyhow. Another way of looking at this is: we could just as well use SnapshotSelf or (if it still existed) SnapshotNow to test visibility. In a world where there are no transactions in flight, it's the same thing. In fact, maybe we should do it that way, just to make it clearer what's happening. I think these are really good questions you are
Re: Timeout control within tests
Noah Misch writes: > On Thu, Feb 17, 2022 at 09:48:25PM -0800, Andres Freund wrote: >> Meson's test runner has the concept of a "timeout multiplier" for ways of >> running tests. Meson's stuff is about entire tests (i.e. one tap test), so >> doesn't apply here, but I wonder if we shouldn't do something similar? > Hmmm. It is good if the user can express an intent that continues to make > sense if we change the default timeout. For the buildfarm use case, a > multiplier is moderately better on that axis (PG_TEST_TIMEOUT_MULTIPLIER=100 > beats PG_TEST_TIMEOUT_DEFAULT=18000). For the hacker use case, an absolute > value is substantially better on that axis (PG_TEST_TIMEOUT_DEFAULT=3 beats > PG_TEST_TIMEOUT_MULTIPLIER=.01). FWIW, I'm fairly sure that PGISOLATIONTIMEOUT=300 was selected after finding that smaller values didn't work reliably in the buildfarm. Now maybe 741d7f1 fixed that, but I wouldn't count on it. So while I approve of the idea to remove PGISOLATIONTIMEOUT in favor of using this centralized setting, I think that we might need to have a multiplier there, or else we'll end up with PG_TEST_TIMEOUT_DEFAULT set to 300 across the board. Perhaps the latter is fine, but a multiplier seems a bit more flexible. On the other hand, I also support your point that an absolute setting is easier to think about / adjust for special uses. So maybe we should just KISS and use a single absolute setting until we find a hard reason why that doesn't work well. regards, tom lane
Re: Synchronizing slots from primary to standby
On Fri, Feb 11, 2022 at 9:26 AM Peter Eisentraut wrote: > > On 10.02.22 22:47, Bruce Momjian wrote: > > I would love to see this feature in PG 15. Can someone explain its > > current status? Thanks. > > The way I understand it: > ... Hi Peter, I'm starting to review this patch, and last time I checked I noticed it didn't seem to apply cleanly to master anymore. Would you be able to send a rebased version? Thanks, James Coleman
RE: Failed transaction statistics to measure the logical replication progress
On Friday, February 18, 2022 3:34 PM Tang, Haiying/唐 海英 wrote: > On Wed, Jan 12, 2022 8:35 PM osumi.takami...@fujitsu.com > wrote: > > > > The attached v21 has a couple of other minor updates like a > > modification of error message text. > > > > > > Thanks for updating the patch. Here are some comments. Thank you for your review! > 1) I saw the following description about pg_stat_subscription_workers view in > existing doc: > >The pg_stat_subscription_workers view will > contain >one row per subscription worker on which errors have occurred, ... > > It only says "which errors have occurred", maybe we should also mention > transactions here, right? I wrote about these statistics in the next line, but as you pointed out, separating the description into two sentences wasn't a good idea. Fixed. > 2) > /* -- > + * pgstat_send_subworker_xact_stats() - > + * > + * Send a subworker's transaction stats to the collector. > + * The statistics are cleared upon return. > > Should "The statistics are cleared upon return" changed to "The statistics are > cleared upon sending"? Because if it doesn't reach PGSTAT_STAT_INTERVAL > and the transaction stats are not sent, the function will return without > clearing > out statistics. Now, the purpose of this function has become purely to send a message, and whenever it's called, the function clears the saved stats. So, I have skipped this comment for now. > 3) > + Assert(command == LOGICAL_REP_MSG_COMMIT || > +command == LOGICAL_REP_MSG_STREAM_COMMIT || > +command == LOGICAL_REP_MSG_COMMIT_PREPARED > || > +command == > LOGICAL_REP_MSG_ROLLBACK_PREPARED); > + > + switch (command) > + { > + case LOGICAL_REP_MSG_COMMIT: > + case LOGICAL_REP_MSG_STREAM_COMMIT: > + case LOGICAL_REP_MSG_COMMIT_PREPARED: > + MyLogicalRepWorker->commit_count++; > + break; > + case LOGICAL_REP_MSG_ROLLBACK_PREPARED: > + MyLogicalRepWorker->abort_count++; > + break; > + default: > + ereport(ERROR, > + errmsg("invalid logical message type > for transaction statistics of subscription")); > + break; > + } > > I'm not sure that do we need the assert, because it will report an error > later if > command is an invalid value. The Assert has been removed, along with the switch branches. Since there was advice that we should call this from apply_handle_commit_internal, and if we don't want to change that function's arguments, all we need to do is pass a boolean value that indicates whether the stats are commit_count or abort_count. Kindly have a look at the updated version. > 4) I noticed that the abort_count doesn't include aborted streaming > transactions. > Should we take this case into consideration? Hmm, we can add this to the column if there's no objection. I'm not sure, but someone might say those should be separate columns. The new patch v22 is shared in [2]. [2] - https://www.postgresql.org/message-id/TYCPR01MB83737C689F8F310C87C19C1EED379%40TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
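P.S. To make this concrete, once the v22 patch is applied the new counters could be read with a query along these lines (a rough sketch only; commit_count and abort_count come from this discussion, and the key columns subid/subname are assumptions based on the existing view definition):

    SELECT subid, subname, commit_count, abort_count
      FROM pg_stat_subscription_workers;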
Re: [PATCH] Add reloption for views to enable RLS
On Tue, 2022-02-15 at 16:32 +0100, walt...@technowledgy.de wrote: > > "check_permissions_as_owner" is ok with me, but a bit long. > > check_permissions_as_owner is exactly what happens. The additional "as" > shouldn't be a problem in length - but is much better to read. I > wouldn't associate that with CHECK OPTION either. +1 Here is a new version, with improved documentation and the option renamed to "check_permissions_owner". I just prefer the shorter form. Yours, Laurenz Albe From e31ea3de2838dcfdc8c364fc08e54e5d37f00882 Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Fri, 18 Feb 2022 15:53:06 +0100 Subject: [PATCH] Add new boolean reloption "check_permissions_owner" to views When this reloption is set to "false", all permissions on the underlying relations will be checked against the invoking user rather than the view owner. The latter remains the default behavior. Author: Christoph Heiss Reviewed-By: Laurenz Albe, Wolfgang Walther Discussion: https://postgr.es/m/b66dd6d6-ad3e-c6f2-8b90-47be773da240%40cybertec.at --- doc/src/sgml/ref/alter_view.sgml | 13 - doc/src/sgml/ref/create_view.sgml | 68 ++- src/backend/access/common/reloptions.c| 11 src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c| 1 + src/backend/nodes/outfuncs.c | 1 + src/backend/nodes/readfuncs.c | 1 + src/backend/optimizer/plan/subselect.c| 1 + src/backend/optimizer/prep/prepjointree.c | 1 + src/backend/rewrite/rewriteHandler.c | 19 +-- src/backend/utils/cache/relcache.c| 63 +++-- src/include/nodes/parsenodes.h| 1 + src/include/utils/rel.h | 11 src/test/regress/expected/create_view.out | 46 --- src/test/regress/expected/rowsecurity.out | 65 +- src/test/regress/sql/create_view.sql | 22 +++- src/test/regress/sql/rowsecurity.sql | 44 +++ 17 files changed, 309 insertions(+), 60 deletions(-) diff --git a/doc/src/sgml/ref/alter_view.sgml b/doc/src/sgml/ref/alter_view.sgml index 98c312c5bf..0ea764738a 100644 --- a/doc/src/sgml/ref/alter_view.sgml +++ b/doc/src/sgml/ref/alter_view.sgml @@ -156,11 +156,22 @@ ALTER VIEW [ IF EXISTS ] name RESET Changes the security-barrier property of the view. The value must - be Boolean value, such as true + be a Boolean value, such as true or false. + +check_permissions_owner (boolean) + + + Changes whether permission checks on the underlying relations are + performed as the view owner or as the calling user. Default is + true. The value must be a Boolean value, such as + true or false. + + + diff --git a/doc/src/sgml/ref/create_view.sgml b/doc/src/sgml/ref/create_view.sgml index bf03287592..cb253388e7 100644 --- a/doc/src/sgml/ref/create_view.sgml +++ b/doc/src/sgml/ref/create_view.sgml @@ -137,8 +137,6 @@ CREATE VIEW [ schema . ] view_namelocal or cascaded, and is equivalent to specifying WITH [ CASCADED | LOCAL ] CHECK OPTION (see below). - This option can be changed on existing views using ALTER VIEW. @@ -152,7 +150,22 @@ CREATE VIEW [ schema . ] view_name - + + +check_permissions_owner (boolean) + + + Set by default. If this option is set to true, + it will cause all access to underlying tables to be checked as + referenced by the view owner, otherwise as the invoking user. + + + + + + All of the above options can be changed on existing views using ALTER VIEW. + @@ -265,13 +278,35 @@ CREATE VIEW vista AS SELECT text 'Hello World' AS hello; -Access to tables referenced in the view is determined by permissions of -the view owner. In some cases, this can be used to provide secure but -restricted access to the underlying tables. 
However, not all views are -secure against tampering; see for -details. Functions called in the view are treated the same as if they had -been called directly from the query using the view. Therefore the user of +By default, access to relations referenced in the view is determined +by permissions of the view owner. This can be used to provide secure +but restricted access to the underlying tables. However, not all views +are secure against tampering; see +for details. + + + +Functions called in the view are treated the same as if they had been +called directly from the query using the view. Therefore, the user of a view must have permissions to call all functions used by the view. +This also means that functions are executed
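For readers following the thread, usage of the proposed reloption would look roughly like this (a sketch based on the patch above; the table and view names are made up for illustration):

    CREATE VIEW emp_names WITH (check_permissions_owner = false) AS
        SELECT id, name FROM employees;    -- underlying table is checked as the invoking user
    ALTER VIEW emp_names SET (check_permissions_owner = true);    -- back to the default (owner) behavior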
Re: adding 'zstd' as a compression algorithm
On Fri, Feb 18, 2022 at 9:24 AM Robert Haas wrote: > On Fri, Feb 18, 2022 at 9:02 AM Robert Haas wrote: > > Oh wait ... you want it the other way. Yeah, that seems harmless to > > change. I wonder how many others there are that could be changed > > similarly... > > I went through configure.ac looking for instances of > AC_CHECK_HEADERS() where the corresponding symbol was not used. I > found four: > > AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is > required for LZ4])]) > AC_CHECK_HEADERS(gssapi/gssapi.h, [], [AC_MSG_ERROR([gssapi.h header > file is required for GSSAPI])])]) > AC_CHECK_HEADERS(ldap.h, [], [AC_MSG_ERROR([header file is > required for LDAP])] > AC_CHECK_HEADER(winldap.h, [], ...stuff...) > > I guess we could clean all of those up similarly. So here's a revised patch for zstd support that uses AC_CHECK_HEADER(), plus a second patch to change the occurrences above to use AC_CHECK_HEADER() and remove all traces of the corresponding preprocessor symbol. -- Robert Haas EDB: http://www.enterprisedb.com v4-0001-Add-support-for-building-with-ZSTD.patch Description: Binary data v4-0002-Demote-AC_CHECK_HEADERS-calls-to-AC_CHECK_HEADER-.patch Description: Binary data
RE: Failed transaction statistics to measure the logical replication progress
On Friday, February 18, 2022 8:11 PM Amit Kapila wrote: > On Fri, Feb 18, 2022 at 2:04 PM osumi.takami...@fujitsu.com > wrote: > > > > On Thursday, February 17, 2022 6:45 PM Amit Kapila > wrote: > > > On Thu, Feb 17, 2022 at 3:13 PM Amit Kapila > > > > > > wrote: > > > > Can't we use pgstat_report_stat() here? Basically, you can update > > > > xact completetion counters during apply, and then from > > > > pgstat_report_stat(), you can invoke a logical replication worker > > > > stats-related function. > > > > > > > > > > If we can do this then we can save the logic this patch is trying to > > > introduce for PGSTAT_STAT_INTERVAL. > > Hi, I've encounter a couple of questions during my modification, following > your advice. > > > > In the pgstat_report_stat, we refer to the return value of > > GetCurrentTransactionStopTimestamp to compare the time different from > the last time. > > (In my previous patch, I used GetCurrentTimestamp) > > > > This time is updated in apply_handle_commit_internal's > CommitTransactionCommand for the apply worker. > > Then, if I update the subscription worker > > stats(commit_count/abort_count) immediately after this > > CommitTransactionCommand and immediately call pgstat_report_stat in the > apply_handle_commit_internal, the time difference becomes too small (falls > short of PGSTAT_STAT_INTERVAL). > > Also, the time of GetCurrentTransactionStopTimestamp is not updated > > even when I keep calling pgstat_report_stat repeatedly. > > Then, IIUC the next possible timing that message of commit_count or > > abort_count is sent to the stats collector would become the time when > > we execute another transaction by the apply worker and update the time > > for GetCurrentTransactionStopTimestamp > > and rerun pgstat_report_stat again. > > > > I think but same is true in the case of the transaction in the backend where > we > increment commit counter via AtEOXact_PgStat after updating the transaction > time. After that, we call pgstat_report_stat() at later point. How is this > case > different? > > > So, if we keep GetCurrentTransactionStopTimestamp without change, an > > update of stats depends on another new subsequent transaction if we > simply merge those. > > (this leads to users cannot see the latest stats information ?) > > > > I think this should be okay as these don't need to be accurate. > > > At least, I got a test failure because of this function for streaming > > commit case because it uses poll_query_until to wait for stats update. > > > > I feel it is not a good idea to wait for the accurate update of these > counters. Ah, then I had wrote tests based on totally wrong direction and made noises for it. Sorry for that. I don't see tests for existing xact_commit/rollback count, so I'll follow the same way. Attached a new patch that addresses three major improvements I've got so far as comments. 1. skip increment of stats counter by empty transaction, on the subscriber side (except for commit prepared) 2. utilize the existing pgstat_report_stat, instead of having a similar logic newly. 3. remove the wrong tests. Best Regards, Takamichi Osumi v22-0001-Extend-pg_stat_subscription_workers-to-include-g.patch Description: v22-0001-Extend-pg_stat_subscription_workers-to-include-g.patch
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Fabien, Thank you so much for your review. Sorry for the late reply. I had stopped working on it due to other jobs, but I have now come back to it. I attached the updated patch. I would appreciate it if you could review this again. On Mon, 19 Jul 2021 20:04:23 +0200 (CEST) Fabien COELHO wrote: > # About pgbench error handling v15 > > Patches apply cleanly. Compilation, global and local tests ok. > > - v15.1: refactoring is a definite improvement. > Good, even if it is not very useful (see below). Ok, we don't need to save variables in order to implement the retry feature in pgbench as you suggested. Well, should we completely separate these two patches, and should I fix v15.2 to not rely on v15.1? > While restructuring, maybe predefined variables could be make readonly > so that a script which would update them would fail, which would be a > good thing. Maybe this is probably material for an independent patch. Yes, it should be an independent patch. > - v15.2: see detailed comments below > > # Doc > > Doc build is ok. > > ISTM that "number of tries" line would be better placed between the > #threads and #transactions lines. What do you think? Agreed. Fixed. > Aggregate logging description: "{ failures | ... }" seems misleading > because it suggests we have one or the other, whereas it can also be > empty. I suggest: "{ | failures | ... }" to show the empty case. The description is correct, because either "failures" or "both of serialization_failures and deadlock_failures" should appear in aggregate logging. If "failures" were printed only when some transaction failed, each line in aggregate logging could have a different number of columns, which would make it difficult to parse the results. > I'd wonder whether the number of tries is set too high, > though, ISTM that an application should give up before 100? Indeed, max-tries=100 seems too high for a practical system. Also, I noticed that the sum of the latencies of each command (= 15.839 ms) is significantly larger than the latency average (= 10.870 ms) because the "per commands" results in the documentation were fixed. So, I retook the measurement on my machine for more accurate documentation. I used max-tries=10. > Minor editing: > > "there're" -> "there are". > > "the --time" -> "the --time option". Fixed. > "The latency for failed transactions and commands is not computed > separately." is unclear, > please use a positive sentence to tell what is true instead of what is not > and the reader > has to guess. Maybe: "The latency figures include failed transactions which > have reached > the maximum number of tries or the transaction latency limit.". I'm not the original author of this description, but I guess it means "The latency is measured only for successful transactions and commands, but not for failed transactions or commands.". > "The main report contains the number of failed transactions if it is > non-zero." ISTM that > this is a pain for scripts which would like to process these reports data, > because the data > may or may not be there. I'm sure to write such scripts, which explains my > concern:-) I agree with you. I fixed the behavior to always report the number of failed transactions, regardless of whether it is non-zero or not. > "If the total number of retried transactions is non-zero…" should it rather > be "not one", > because zero means unlimited retries? I guess that this means the actual number of retried transactions, not the max-tries, so "non-zero" was correct.
However, for the same reason above, I fixed the behavior to always report the retry statistics, regardless of the actual retry numbers. > > # FEATURES > Copying variables: ISTM that we should not need to save the variables > states… no clearing, no copying should be needed. The restarted > transaction simply overrides the existing variables which is what the > previous version was doing anyway. The scripts should write their own > variables before using them, and if they don't then it is the user > problem. This is important for performance, because it means that after a > client has executed all scripts once the variable array is stable and does > not incur significant maintenance costs. The only thing that needs saving > for retry is the speudo-random generator state. This suggest simplifying > or removing "RetryState". Yes. Saving the variable states is not necessary because we retry the whole script. It was necessary in the initial patch because that patch planned to retry a single transaction included in the script. I removed RetryState and copyVariables. > # CODE > In readCommandResponse: ISTM that PGRES_NONFATAL_ERROR is not needed and > could be dealt with the default case. We are only interested in > serialization/deadlocks which are fatal errors? We need PGRES_NONFATAL_ERROR to save st->estatus. It is used outside readCommandResponse to determine whether we should abort or not. > doRetry: for consistency, gi
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
> > Okay. I feel this can be added as additional field but it will not > > replace backend_pid field as this represents the pid of the backend > > which triggered the current checkpoint. > > I don't think that's true. Requesting a checkpoint means telling the > checkpointer that it should wake up and start a checkpoint (or restore point) > if it's not already doing so, so the pid will always be the checkpointer pid. > The only exception is a standalone backend, but in that case you won't be able > to query that view anyway. Yes. I agree that the checkpoint will always be performed by the checkpointer process. So the pid in the pg_stat_progress_checkpoint view will always correspond to the checkpointer pid only. Checkpoints get triggered in many scenarios. One of the cases is the CHECKPOINT command issued explicitly by the backend. In this scenario I would like to know the backend pid which triggered the checkpoint. Hence I would like to add a backend_pid field. So the pg_stat_progress_checkpoint view contains pid fields as well as backend_pid fields. The backend_pid contains a valid value only during the CHECKPOINT command issued by the backend explicitly, otherwise the value will be 0. We may have to add an additional field to 'CheckpointerShmemStruct' to hold the backend pid. The backend requesting the checkpoint will update its pid to this structure. Kindly let me know if you still feel the backend_pid field is not necessary. > And also while looking at the patch I see there's the same problem that I > mentioned in the previous thread, which is that the effective flags can be > updated once the checkpoint started, and as-is the view won't reflect that. > It > also means that you can't simply display one of wal, time or force but a > possible combination of the flags (including the one not handled in v1). If I understand the above comment properly, it has 2 points. First is to display the combination of flags rather than just displaying wal, time or force - The idea behind this is to just let the user know the reason for checkpointing. That is, the checkpoint is started because max_wal_size is reached or checkpoint_timeout expired or explicitly issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE, CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint has to be performed. Hence I have not included those in the view. If it is really required, I would like to modify the code to include other flags and display the combination. Second point is to reflect the updated flags in the view. AFAIK, there is a possibility that the flags get updated during the on-going checkpoint but the reason for checkpoint (wal, time or force) will remain same for the current checkpoint. There might be a change in how checkpoint has to be performed if CHECKPOINT_IMMEDIATE flag is set. If we go with displaying the combination of flags in the view, then probably we may have to reflect this in the view. > > Probably a new field named 'processes_wiating' or 'events_waiting' can be > > added for this purpose. > > Maybe num_process_waiting? I feel 'processes_wiating' aligns more with the naming conventions of the fields of the existing progres views. > > Probably writing of buffers or syncing files may complete before > > pg_is_in_recovery() returns false. But there are some cleanup > > operations happen as part of the checkpoint. During this scenario, we > > may get false value for pg_is_in_recovery(). Please refer following > > piece of code which is present in CreateRestartpoint(). 
> > > > if (!RecoveryInProgress()) > > replayTLI = XLogCtl->InsertTimeLineID; > > Then maybe we could store the timeline rather then then kind of checkpoint? > You should still be able to compute the information while giving a bit more > information for the same memory usage. Can you please describe more about how checkpoint/restartpoint can be confirmed using the timeline id. Thanks & Regards, Nitin Jadhav On Fri, Feb 18, 2022 at 1:13 PM Julien Rouhaud wrote: > > Hi, > > On Fri, Feb 18, 2022 at 12:20:26PM +0530, Nitin Jadhav wrote: > > > > > > If there's a checkpoint timed triggered and then someone calls > > > pg_start_backup() which then wait for the end of the current checkpoint > > > (possibly after changing the flags), I think the view should reflect that > > > in > > > some way. Maybe storing an array of (pid, flags) is too much, but at > > > least a > > > counter with the number of processes actively waiting for the end of the > > > checkpoint. > > > > Okay. I feel this can be added as additional field but it will not > > replace backend_pid field as this represents the pid of the backend > > which triggered the current checkpoint. > > I don't think that's true. Requesting a checkpoint means telling the > checkpointer that it should wake up and start a checkpoint (or restore point) > if it's not already doing so, so the pid will always be the checkpointer pid. > The only exception is a standal
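As an illustration, monitoring with the proposed view could look something like this (a sketch only; the column names below follow the discussion in this thread and are not final):

    SELECT pid, backend_pid, flags, processes_waiting
      FROM pg_stat_progress_checkpoint;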
Re: adding 'zstd' as a compression algorithm
On Fri, Feb 18, 2022 at 9:02 AM Robert Haas wrote: > Oh wait ... you want it the other way. Yeah, that seems harmless to > change. I wonder how many others there are that could be changed > similarly... I went through configure.ac looking for instances of AC_CHECK_HEADERS() where the corresponding symbol was not used. I found four: AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])]) AC_CHECK_HEADERS(gssapi/gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])]) AC_CHECK_HEADERS(ldap.h, [], [AC_MSG_ERROR([header file <ldap.h> is required for LDAP])] AC_CHECK_HEADER(winldap.h, [], ...stuff...) I guess we could clean all of those up similarly. -- Robert Haas EDB: http://www.enterprisedb.com
Re: killing perl2host
On 2/17/22 15:46, Andres Freund wrote: > On 2022-02-17 15:23:36 -0500, Andrew Dunstan wrote: >> Very well. I think the easiest way will be to stash $host_os in the >> environment and let the script pick it up similarly to what I suggested >> with MSYSTEM. > WFM. OK, here's a patch. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/config/check_modules.pl b/config/check_modules.pl index cc0a7ab0e7..470c3e9c14 100644 --- a/config/check_modules.pl +++ b/config/check_modules.pl @@ -6,6 +6,7 @@ # use strict; use warnings; +use Config; use IPC::Run 0.79; @@ -19,5 +20,9 @@ diag("IPC::Run::VERSION: $IPC::Run::VERSION"); diag("Test::More::VERSION: $Test::More::VERSION"); diag("Time::HiRes::VERSION: $Time::HiRes::VERSION"); +# Check that if prove is using msys perl it is for an msys target +ok(($ENV{__CONFIG_HOST_OS__} || "") eq 'msys', + "Msys perl used for correct target") + if $Config{osname} eq 'msys'; ok(1); done_testing(); diff --git a/configure b/configure index ba635a0062..5e3af5c35b 100755 --- a/configure +++ b/configure @@ -19453,6 +19453,7 @@ fi # installation than perl, eg on MSys, so we have to check using prove. { $as_echo "$as_me:${as_lineno-$LINENO}: checking for Perl modules required for TAP tests" >&5 $as_echo_n "checking for Perl modules required for TAP tests... " >&6; } + __CONFIG_HOST_OS__=$host_os; export __CONFIG_HOST_OS__ modulestderr=`"$PROVE" "$srcdir/config/check_modules.pl" 2>&1 >/dev/null` if test $? -eq 0; then # log the module version details, but don't show them interactively diff --git a/configure.ac b/configure.ac index 16167329fc..d00e92357e 100644 --- a/configure.ac +++ b/configure.ac @@ -2396,6 +2396,7 @@ if test "$enable_tap_tests" = yes; then # AX_PROG_PERL_MODULES here, but prove might be part of a different Perl # installation than perl, eg on MSys, so we have to check using prove. AC_MSG_CHECKING(for Perl modules required for TAP tests) + __CONFIG_HOST_OS__=$host_os; export __CONFIG_HOST_OS__ [modulestderr=`"$PROVE" "$srcdir/config/check_modules.pl" 2>&1 >/dev/null`] if test $? -eq 0; then # log the module version details, but don't show them interactively
Re: adding 'zstd' as a compression algorithm
On Fri, Feb 18, 2022 at 8:48 AM Robert Haas wrote: > On Thu, Feb 17, 2022 at 8:36 PM Andres Freund wrote: > > On 2022-02-17 13:34:08 +0900, Michael Paquier wrote: > > > %define needs to include HAVE_LIBZSTD, HAVE_ZSTD_H and USE_ZSTD, so > > > this version fails the sanity check between pg_config.h.in and the > > > MSVC scripts checking that all flags exist. > > > > Do we really need all three defines? How about using AC_CHECK_HEADER() > > instead > > of AC_CHECK_HEADERS()? That wouldn't define HAVE_ZSTD_H. Cases where we > > error > > out if a header isn't found make it a bit pointless to then still define > > HAVE_*_H. Plenty other cases in configure.ac just use AC_CHECK_HEADER. > > I have to admit to being somewhat confused by the apparent lack of > consistency in the way we do configure checks. The ZSTD check we added > here is just based on the LZ4 check just above it, which was the > result of my commit of Dilip's patch to add LZ4 TOAST compression. So > if we want to do something different we should change them both. But > that just begs the question of why the LZ4 support looks the way it > does, and to be honest I don't recall. The zlib and XSLT formulas just > above are much simpler, but for some reason what we're doing here > seems to be based on the more-complex formula we use for XML support > instead of either of those. > > But having said that, the proposed patch adds no new call to > AC_CHECK_HEADER(), and does add a new call to AC_CHECK_HEADERS(), so I > don't understand the specifics of your complaint here. Oh wait ... you want it the other way. Yeah, that seems harmless to change. I wonder how many others there are that could be changed similarly... -- Robert Haas EDB: http://www.enterprisedb.com
Re: adding 'zstd' as a compression algorithm
On Thu, Feb 17, 2022 at 8:36 PM Andres Freund wrote: > On 2022-02-17 13:34:08 +0900, Michael Paquier wrote: > > %define needs to include HAVE_LIBZSTD, HAVE_ZSTD_H and USE_ZSTD, so > > this version fails the sanity check between pg_config.h.in and the > > MSVC scripts checking that all flags exist. > > Do we really need all three defines? How about using AC_CHECK_HEADER() instead > of AC_CHECK_HEADERS()? That wouldn't define HAVE_ZSTD_H. Cases where we error > out if a header isn't found make it a bit pointless to then still define > HAVE_*_H. Plenty other cases in configure.ac just use AC_CHECK_HEADER. I have to admit to being somewhat confused by the apparent lack of consistency in the way we do configure checks. The ZSTD check we added here is just based on the LZ4 check just above it, which was the result of my commit of Dilip's patch to add LZ4 TOAST compression. So if we want to do something different we should change them both. But that just begs the question of why the LZ4 support looks the way it does, and to be honest I don't recall. The zlib and XSLT formulas just above are much simpler, but for some reason what we're doing here seems to be based on the more-complex formula we use for XML support instead of either of those. But having said that, the proposed patch adds no new call to AC_CHECK_HEADER(), and does add a new call to AC_CHECK_HEADERS(), so I don't understand the specifics of your complaint here. -- Robert Haas EDB: http://www.enterprisedb.com
Patch a potential memory leak in describeOneTableDetails()
Hi all, I find a potential memory leak in PostgresSQL 14.1, which is in the function describeOneTableDetails (./src/bin/psql/describe.c). The bug has been confirmed by an auditor of . Specifically, at line 1603, a memory chunk is allocated with pg_strdup and assigned to tableinfo.reloptions. If the branch takes true at line 1621, the execution path will eventually jump to error_return at line 3394. Unfortunately, the memory is not freed when the function describeOneTableDetail() returns. As a result, the memory is leaked. Similarly, the two memory chunks (tableinfo.reloftype and tableinfo.relam) allocated at line 1605, 1606 and 1612 may also be leaked for the same reason. 1355 bool 1356 describeTableDetails(const char *pattern, bool verbose, bool showSystem) 1357 { ... 1603 tableinfo.reloptions = pg_strdup(PQgetvalue(res, 0, 9)); 1604 tableinfo.tablespace = atooid(PQgetvalue(res, 0, 10)); 1605 tableinfo.reloftype = (strcmp(PQgetvalue(res, 0, 11), "") != 0) ? 1606 pg_strdup(PQgetvalue(res, 0, 11)) : NULL; ... 1610 if (pset.sversion >= 12) 1611 tableinfo.relam = PQgetisnull(res, 0, 14) ? 1612 (char *) NULL : pg_strdup(PQgetvalue(res, 0, 14)); 1613 else 1614 tableinfo.relam = NULL; ... 1621 if (tableinfo.relkind == RELKIND_SEQUENCE) 1622 { ... 1737 goto error_return; /* not an error, just return early */ 1738 } ... 3394 error_return: 3395 3396 /* clean up */ 3397 if (printTableInitialized) 3398 printTableCleanup(&cont); 3399 termPQExpBuffer(&buf); 3400 termPQExpBuffer(&title); 3401 termPQExpBuffer(&tmpbuf); 3402 3403 if (view_def) 3404 free(view_def); 3405 3406 if (res) 3407 PQclear(res); 3408 3409 return retval; 3410 } We believe we can fix the problems by adding corresponding release function before the function returns, such as. if (tableinfo.reloptions) pg_free (tableinfo.reloptions); if (tableinfo.reloftype) pg_free (tableinfo.reloftype); if (tableinfo.relam) pg_free (tableinfo. relam); I'm looking forward to your reply. Best, Wentao--- describe.c 2022-01-26 16:17:51.353135285 +0800 +++ describe_Patch.c 2022-02-18 15:57:51.099581978 +0800 @@ -3406,6 +3406,13 @@ if (res) PQclear(res); +if (tableinfo.reloptions) +pg_free (tableinfo.reloptions); +if (tableinfo.reloftype) +pg_free (tableinfo.reloftype); +if (tableinfo.relam) +pg_free (tableinfo. relam); + return retval; }
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Hi, here's a slightly updated version of the patch series. The 0001 part adds tracking of server_version_num, so that it's possible to enable other features depending on it. In this case it's used to decide whether TABLESAMPLE is supported. The 0002 part modifies the sampling. I realized we can do something similar even on pre-9.5 releases, by running "WHERE random() < $1". Not perfect, because it still has to read the whole table, but still better than also sending it over the network. There's a "sample" option for foreign server/table, which can be used to disable the sampling if needed. A simple measurement on a table with 10M rows, on localhost. old:6600ms random: 450ms tablesample: 40ms (system) tablesample: 200ms (bernoulli) Local analyze takes ~190ms, so that's quite close. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 6524f8c6f0db13c5dca0438bdda194ee5bedebbb Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Fri, 18 Feb 2022 01:32:25 +0100 Subject: [PATCH 1/2] postgres_fdw: track server version for connections To allow using features that only exist on new Postgres versions, we need some information about version of the remote node. We simply request server_version_num from the remote node. We only fetch it if version number is actually needed, and we cache it. --- contrib/postgres_fdw/connection.c | 44 + contrib/postgres_fdw/postgres_fdw.h | 2 ++ 2 files changed, 46 insertions(+) diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index f753c6e2324..3ea2948d3ec 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -279,6 +279,50 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state) return entry->conn; } +/* + * Determine remote server version (as an int value). + * + * The value is determined only once and then cached in PgFdwConnState. + */ +int +GetServerVersion(PGconn *conn, PgFdwConnState *state) +{ + PGresult *volatile res = NULL; + + /* + * If we already know server version for this connection, we're done. + */ + if (state->server_version_num != 0) + return state->server_version_num; + + /* PGresult must be released before leaving this function. */ + PG_TRY(); + { + char *line; + + /* + * Execute EXPLAIN remotely. + */ + res = pgfdw_exec_query(conn, "SHOW server_version_num", NULL); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + pgfdw_report_error(ERROR, res, conn, false, "SHOW server_version_num"); + + /* + * Extract the server version number. + */ + line = PQgetvalue(res, 0, 0); + state->server_version_num = strtol(line, NULL, 10); + } + PG_FINALLY(); + { + if (res) + PQclear(res); + } + PG_END_TRY(); + + return state->server_version_num; +} + /* * Reset all transient state fields in the cached connection entry and * establish new connection to the remote server. 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h index 8ae79e97e44..1687c62df2d 100644 --- a/contrib/postgres_fdw/postgres_fdw.h +++ b/contrib/postgres_fdw/postgres_fdw.h @@ -132,6 +132,7 @@ typedef struct PgFdwRelationInfo typedef struct PgFdwConnState { AsyncRequest *pendingAreq; /* pending async request */ + int server_version_num; /* remote server version */ } PgFdwConnState; /* in postgres_fdw.c */ @@ -143,6 +144,7 @@ extern void process_pending_request(AsyncRequest *areq); extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state); extern void ReleaseConnection(PGconn *conn); +extern int GetServerVersion(PGconn *conn, PgFdwConnState *state); extern unsigned int GetCursorNumber(PGconn *conn); extern unsigned int GetPrepStmtNumber(PGconn *conn); extern void do_sql_command(PGconn *conn, const char *sql); -- 2.34.1 From 546657dd0d3ab3bebdb1145785c62809b78e7644 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Fri, 18 Feb 2022 00:33:19 +0100 Subject: [PATCH 2/2] postgres_fdw: sample data on remote node for ANALYZE When performing ANALYZE on a foreign tables, we need to collect sample of rows. Until now, we simply read all data from the remote node and built the sample locally. That is very expensive, especially in terms of network traffic etc. But it's possible to move the sampling to the remote node, and use either TABLESAMPLE or simply random() to transfer just much smaller amount of data. So we run either SELECT * FROM table TABLESAMPLE SYSTEM(fraction) or SELECT * FROM table WHERE random() < fraction depending on the server version (TABLESAMPLE is supported since 9.5). To do that, we need to determine what fraction of the table to sample. We rely on reltuples (fetched from the remote node) to be sufficiently accurate and up to date, and calculate the fraction based on that. We increase the sample size a bit (in case the table shrunk), and we still do the reservoi
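Put differently, with the patch an ANALYZE of a foreign table would push one of the following queries to the remote server (a sketch with a made-up table name and a made-up 1.5% sampling fraction):

    SELECT * FROM remote_tab TABLESAMPLE SYSTEM (1.5);   -- remote server 9.5 or newer
    SELECT * FROM remote_tab WHERE random() < 0.015;     -- older remote servers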
Re: adding 'zstd' as a compression algorithm
On Thu, Feb 17, 2022 at 8:18 PM Michael Paquier wrote: > On Thu, Feb 17, 2022 at 09:40:13AM -0500, Robert Haas wrote: > > Ah, OK, cool. It seems like we have consensus on this being a good > > direction, so I plan to commit this later today unless objections or > > additional review comments turn up. > > So, will there be a part of the system where we'll make use of that in > 15? pg_basebackup for server-side and client-side compression, at > least, as far as I can guess? That's my plan, yes. I think the patches only need a small amount of work at this point, but I'd like to get this part committed first. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Merging statistics from children instead of re-sampling everything
On 2/14/22 20:16, Tomas Vondra wrote: On 2/14/22 11:22, Andrey V. Lepikhov wrote: On 2/11/22 20:12, Tomas Vondra wrote: On 2/11/22 05:29, Andrey V. Lepikhov wrote: On 2/11/22 03:37, Tomas Vondra wrote: That being said, this thread was not really about foreign partitions, but about re-analyzing inheritance trees in general. And sampling foreign partitions doesn't really solve that - we'll still do the sampling over and over. IMO, to solve the problem we should do two things: 1. Avoid repeatable partition scans in the case inheritance tree. 2. Avoid to re-analyze everything in the case of active changes in small subset of partitions. For (1) i can imagine a solution like multiplexing: on the stage of defining which relations to scan, group them and prepare parameters of scanning to make multiple samples in one shot. I'm not sure I understand what you mean by multiplexing. The term usually means "sending multiple signals at once" but I'm not sure how that applies to this issue. Can you elaborate? I suppose to make a set of samples in one scan: one sample for plane table, another - for a parent and so on, according to the inheritance tree. And cache these samples in memory. We can calculate all parameters of reservoir method to do it. I doubt keeping the samples just in memory is a good solution. Firstly, there's the question of memory consumption. Imagine a large partitioned table with 1-10k partitions. If we keep a "regular" sample (30k rows) per partition, that's 30M-300M rows. If each row needs 100B, that's 3-30GB of data. I'm talking about caching a sample only for the time it is needed in this ANALYZE operation. Imagine a partitioned table with 3 levels. On each partition you should create and keep three different samples (we can do it in one scan). The sample for a plain table we can use immediately and then destroy. For the sample for a partition on the second level of the hierarchy, we can save a copy to disk for future use (maybe a repeated analyze). The in-memory data used to form a reservoir has a limited size and can be destroyed immediately. At the third level we can use the same logic. So, at any one moment we only use as many samples as there are levels in the hierarchy. IMO, that isn't a large number. > the trouble is partitions may be detached, data may be deleted from > some partitions, etc. Because statistics don't have a strong relation to the data, we can use two strategies: In the case of an explicit 'ANALYZE ' we can recalculate all samples for all partitions, but in the autovacuum case or for implicit analysis we can use not-so-old versions of the samples, and samples of detached (but not destroyed) partitions, in the optimistic assumption that this doesn't change the statistics drastically. So IMHO the samples need to be serialized, in some way. Agreed Well, a separate catalog is one of the options. But I don't see how that deals with large samples, etc. I think we can design a fallback to the previous approach in the case of very large tuples, like the switch from HashJoin to NestedLoop when we estimate that we don't have enough memory. -- regards, Andrey Lepikhov Postgres Professional
Re: Emit a warning if the extension's GUC is set incorrectly
On Fri, Feb 18, 2022 at 10:58 AM Tom Lane wrote: > I wrote: > > As a stopgap to turn the farm green again, I am going to revert > > 75d22069e as well as my followup patches. If we don't want to > > give up on that idea altogether, we have to find some way to > > suppress the chatter from parallel workers. I wonder whether > > it would be appropriate to go further than we have, and actively > > delete placeholders that turn out to be within an extension's > > reserved namespace. The core issue here is that workers don't > > necessarily set GUCs and load extensions in the same order that > > their parent did, so if we leave any invalid placeholders behind > > after reserving an extension's prefix, we're risking issues > > during worker start. > Here's a delta patch (meant to be applied after reverting cab5b9ab2) > that does things like that. It fixes the parallel-mode problem ... > so do we want to tighten things up this much? > > regards, tom lane > Hello, Thank you for taking care of the bug I introduced with 75d22069e; I didn't notice this thread until now. For what it's worth, I agree with throwing an ERROR if the placeholder is unrecognized. Initially, I didn't want to restrict the liberty of setting any placeholder too much, but mainly to make sure such settings didn't go unnoticed. In my humble opinion, this should go further and disallow bogus placeholders, as we do for unrecognized postgres settings. I tested your delta patch with and without parallel mode, and, naturally, it behaves correctly. My 2 cents. +1 Cheers, Florin Irion
Re: Failed transaction statistics to measure the logical replication progress
On Fri, Feb 18, 2022 at 2:04 PM osumi.takami...@fujitsu.com wrote: > > On Thursday, February 17, 2022 6:45 PM Amit Kapila > wrote: > > On Thu, Feb 17, 2022 at 3:13 PM Amit Kapila > > wrote: > > > Can't we use pgstat_report_stat() here? Basically, you can update xact > > > completetion counters during apply, and then from > > > pgstat_report_stat(), you can invoke a logical replication worker > > > stats-related function. > > > > > > > If we can do this then we can save the logic this patch is trying to > > introduce for > > PGSTAT_STAT_INTERVAL. > Hi, I've encounter a couple of questions during my modification, following > your advice. > > In the pgstat_report_stat, we refer to the return value of > GetCurrentTransactionStopTimestamp to compare the time different from the > last time. > (In my previous patch, I used GetCurrentTimestamp) > > This time is updated in apply_handle_commit_internal's > CommitTransactionCommand for the apply worker. > Then, if I update the subscription worker stats(commit_count/abort_count) > immediately after > this CommitTransactionCommand and immediately call pgstat_report_stat in the > apply_handle_commit_internal, > the time difference becomes too small (falls short of PGSTAT_STAT_INTERVAL). > Also, the time of GetCurrentTransactionStopTimestamp is not updated > even when I keep calling pgstat_report_stat repeatedly. > Then, IIUC the next possible timing that message of commit_count or > abort_count > is sent to the stats collector would become the time when we execute another > transaction > by the apply worker and update the time for GetCurrentTransactionStopTimestamp > and rerun pgstat_report_stat again. > I think but same is true in the case of the transaction in the backend where we increment commit counter via AtEOXact_PgStat after updating the transaction time. After that, we call pgstat_report_stat() at later point. How is this case different? > So, if we keep GetCurrentTransactionStopTimestamp without change, > an update of stats depends on another new subsequent transaction if we simply > merge those. > (this leads to users cannot see the latest stats information ?) > I think this should be okay as these don't need to be accurate. > At least, I got a test failure because of this function for streaming commit > case > because it uses poll_query_until to wait for stats update. > I feel it is not a good idea to wait for the accurate update of these counters. -- With Regards, Amit Kapila.
Re: logical replication empty transactions
On Fri, Feb 18, 2022 at 3:06 PM osumi.takami...@fujitsu.com wrote: > > On Friday, February 18, 2022 6:18 PM Amit Kapila > wrote: > > On Tue, Feb 8, 2022 at 5:27 AM osumi.takami...@fujitsu.com > > wrote: > > > > > > On Friday, August 13, 2021 8:01 PM Ajin Cherian wrote: > > > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila > > > > wrote: > > > Changing the timing to send the keepalive to the decoding commit > > > timing didn't look impossible to me, although my suggestion can be > > > ad-hoc. > > > > > > After the initialization of sentPtr(by confirmed_flush lsn), sentPtr > > > is updated from logical_decoding_ctx->reader->EndRecPtr in > > XLogSendLogical. > > > In the XLogSendLogical, we update it after we execute > > LogicalDecodingProcessRecord. > > > This order leads to the current implementation to wait the next > > > iteration to send a keepalive in WalSndWaitForWal. > > > > > > But, I felt we can utilize end_lsn passed to ReorderBufferCommit for > > > updating sentPtr. The end_lsn is the lsn same as the > > > ctx->reader->EndRecPtr, which means advancing the timing to update the > > sentPtr for the commit case. > > > Then if the transaction is empty in synchronous mode, send the > > > keepalive in WalSndUpdateProgress directly, instead of having the > > > force_keepalive_syncrep flag and having it true. > > > > > > > You have a point in that we don't need to delay sending this message till > > next > > WalSndWaitForWal() but I don't see why we need to change anything about > > update of sentPtr. > Yeah, you're right. > Now I think we don't need the update of sentPtr to send a keepalive. > > I thought we can send a keepalive message > after its update in XLogSendLogical or any appropriate place for it after the > existing update. > Yeah, I think there could be multiple ways (a) We can send such a keep alive in WalSndUpdateProgress() itself by using ctx->write_location. For this, we need to modify WalSndKeepalive() to take sentPtr as input. (b) set some flag in WalSndUpdateProgress() and then do it somewhere in WalSndLoop probably in WalSndKeepaliveIfNecessary, or maybe there is another better way. -- With Regards, Amit Kapila.
JSON path decimal literal syntax
I noticed that the JSON path lexer does not support the decimal literal syntax forms .1 1. (that is, there are no digits before or after the decimal point). This is allowed by the relevant ECMAScript standard (https://262.ecma-international.org/5.1/#sec-7.8.3) and of course SQL allows it as well. Is there a reason for this? I didn't find any code comments or documentation about this. Attached are patches that would enable this. As you can see, a bunch of test cases are affected.From 9fc73f1fa4d83da85dc1626cf8b218ec8a11104c Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 18 Feb 2022 10:52:56 +0100 Subject: [PATCH v1 1/2] Test cases for JSON path decimal literals --- src/test/regress/expected/jsonpath.out | 18 ++ src/test/regress/sql/jsonpath.sql | 4 2 files changed, 22 insertions(+) diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out index e399fa9631..54eb548ad1 100644 --- a/src/test/regress/expected/jsonpath.out +++ b/src/test/regress/expected/jsonpath.out @@ -878,6 +878,24 @@ select '0.0010e+2'::jsonpath; 0.10 (1 row) +select '.001'::jsonpath; +ERROR: syntax error, unexpected '.' at or near "." of jsonpath input +LINE 1: select '.001'::jsonpath; + ^ +select '.001e1'::jsonpath; +ERROR: syntax error, unexpected '.' at or near "." of jsonpath input +LINE 1: select '.001e1'::jsonpath; + ^ +select '1.'::jsonpath; +ERROR: syntax error, unexpected end of file at end of jsonpath input +LINE 1: select '1.'::jsonpath; + ^ +select '1.e1'::jsonpath; + jsonpath +-- + 1."e1" +(1 row) + select '1e'::jsonpath; ERROR: invalid floating point number at or near "1e" of jsonpath input LINE 1: select '1e'::jsonpath; diff --git a/src/test/regress/sql/jsonpath.sql b/src/test/regress/sql/jsonpath.sql index 17ab775783..bf71b99fc5 100644 --- a/src/test/regress/sql/jsonpath.sql +++ b/src/test/regress/sql/jsonpath.sql @@ -163,6 +163,10 @@ select '0.0010e-1'::jsonpath; select '0.0010e+1'::jsonpath; select '0.0010e+2'::jsonpath; +select '.001'::jsonpath; +select '.001e1'::jsonpath; +select '1.'::jsonpath; +select '1.e1'::jsonpath; select '1e'::jsonpath; select '1.e'::jsonpath; select '1.2e'::jsonpath; -- 2.35.1 From 1881760218f15749122460eb3a71a0b648b5c86b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 18 Feb 2022 11:11:18 +0100 Subject: [PATCH v1 2/2] Fix JSON path decimal literal syntax Per ECMAScript standard (referenced by SQL standard), the syntax forms .1 1. should be allowed for decimal numeric literals, but the existing implementation rejected them. --- src/backend/utils/adt/jsonpath_scan.l | 12 +- src/test/regress/expected/jsonpath.out | 184 +++-- 2 files changed, 110 insertions(+), 86 deletions(-) diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l index 827a9e44cb..bf1cea8c03 100644 --- a/src/backend/utils/adt/jsonpath_scan.l +++ b/src/backend/utils/adt/jsonpath_scan.l @@ -82,8 +82,7 @@ other [^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\" \t\n\r\f] digit [0-9] integer(0|[1-9]{digit}*) -decimal{integer}\.{digit}+ -decimalfail{integer}\. 
+decimal({integer}\.{digit}*|\.{digit}+) real ({integer}|{decimal})[Ee][-+]?{digit}+ realfail1 ({integer}|{decimal})[Ee] realfail2 ({integer}|{decimal})[Ee][-+] @@ -242,15 +241,6 @@ hex_fail \\x{hex_dig}{0,1} return INT_P; } -{decimalfail} { - /* throw back the ., and treat as integer */ - yyless(yyleng - 1); - addstring(true, yytext, yyleng); - addchar(false, '\0'); - yylval->str = scanstring; - return INT_P; - } - ({realfail1}|{realfail2}) { yyerror(NULL, "invalid floating point number"); } \" { diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out index 54eb548ad1..058010172f 100644 --- a/src/test/regress/expected/jsonpath.out +++ b/src/test/regress/expected/jsonpath.out @@ -354,11 +354,9 @@ select 'null.type()'::jsonpath; (1 row) select '1.type()'::jsonpath; - jsonpath --- - 1.type() -(1 row) - +ERROR: syntax error, unexpected TYPE_P, expecting end of file
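For illustration, with the patch the previously rejected literal forms should now parse (a sketch of the intended behavior):

    SELECT '.1'::jsonpath;   -- digits only after the decimal point
    SELECT '1.'::jsonpath;   -- digits only before the decimal point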
Re: pgsql: pg_upgrade: Preserve relfilenodes and tablespace OIDs.
Re: Robert Haas > > This needs to be guarded with "if (binary_upgrade)". > > Right. Sorry about that, and sorry for not responding sooner also. Fix > pushed now. Thanks, the 14-15 upgrade test is passing again here. Christoph
Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)
Hi, On Fri, Feb 18, 2022 at 05:22:36PM +0900, Michael Paquier wrote: > > So, I have been looking at this problem, and I don't see a problem in > doing something like the attached, where we add a "regress" mode to > compute_query_id that is a synonym of "auto". Or, in short, we have > the default of letting a module decide if a query ID can be computed > or not, at the exception that we hide its result in EXPLAIN outputs. > > Julien, what do you think? I don't see any problem with that patch. > > FWIW, about your question of upthread, I have noticed the behavior > while testing, but I know of some internal customers that enable > pg_stat_statements and like doing tests on the PostgreSQL instance > deployed this way, so that would break. They are not on 14 yet as far > as I know. Are they really testing EXPLAIN output, or just doing application-level tests?
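For reference, usage of the proposed value would look roughly like this (a sketch; the value name comes from the attached patch and the behavior described above):

    SET compute_query_id = regress;   -- same as 'auto', but the identifier is hidden in EXPLAIN output
    EXPLAIN (VERBOSE) SELECT 1;       -- no "Query Identifier" line expected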
RE: logical replication empty transactions
On Friday, February 18, 2022 6:18 PM Amit Kapila wrote: > On Tue, Feb 8, 2022 at 5:27 AM osumi.takami...@fujitsu.com > wrote: > > > > On Friday, August 13, 2021 8:01 PM Ajin Cherian wrote: > > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila > > > wrote: > > Changing the timing to send the keepalive to the decoding commit > > timing didn't look impossible to me, although my suggestion can be > > ad-hoc. > > > > After the initialization of sentPtr(by confirmed_flush lsn), sentPtr > > is updated from logical_decoding_ctx->reader->EndRecPtr in > XLogSendLogical. > > In the XLogSendLogical, we update it after we execute > LogicalDecodingProcessRecord. > > This order leads to the current implementation to wait the next > > iteration to send a keepalive in WalSndWaitForWal. > > > > But, I felt we can utilize end_lsn passed to ReorderBufferCommit for > > updating sentPtr. The end_lsn is the lsn same as the > > ctx->reader->EndRecPtr, which means advancing the timing to update the > sentPtr for the commit case. > > Then if the transaction is empty in synchronous mode, send the > > keepalive in WalSndUpdateProgress directly, instead of having the > > force_keepalive_syncrep flag and having it true. > > > > You have a point in that we don't need to delay sending this message till next > WalSndWaitForWal() but I don't see why we need to change anything about > update of sentPtr. Yeah, you're right. Now I think we don't need the update of sentPtr to send a keepalive. I thought we can send a keepalive message after its update in XLogSendLogical or any appropriate place for it after the existing update. Best Regards, Takamichi Osumi
Re: logical replication empty transactions
On Tue, Feb 8, 2022 at 5:27 AM osumi.takami...@fujitsu.com wrote: > > On Friday, August 13, 2021 8:01 PM Ajin Cherian wrote: > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila > > wrote: > Changing the timing to send the keepalive to the decoding commit > timing didn't look impossible to me, although my suggestion > can be ad-hoc. > > After the initialization of sentPtr(by confirmed_flush lsn), > sentPtr is updated from logical_decoding_ctx->reader->EndRecPtr in > XLogSendLogical. > In the XLogSendLogical, we update it after we execute > LogicalDecodingProcessRecord. > This order leads to the current implementation to wait the next iteration > to send a keepalive in WalSndWaitForWal. > > But, I felt we can utilize end_lsn passed to ReorderBufferCommit for updating > sentPtr. The end_lsn is the lsn same as the ctx->reader->EndRecPtr, > which means advancing the timing to update the sentPtr for the commit case. > Then if the transaction is empty in synchronous mode, > send the keepalive in WalSndUpdateProgress directly, > instead of having the force_keepalive_syncrep flag and having it true. > You have a point in that we don't need to delay sending this message till next WalSndWaitForWal() but I don't see why we need to change anything about update of sentPtr. -- With Regards, Amit Kapila.
Re: logical replication empty transactions
On Thu, Feb 17, 2022 at 4:12 PM Amit Kapila wrote: > > On Mon, Jan 31, 2022 at 6:18 PM Ajin Cherian wrote: > > > > Few comments: > = > One more comment: @@ -1546,10 +1557,11 @@ WalSndWaitForWal(XLogRecPtr loc) * otherwise idle, this keepalive will trigger a reply. Processing the * reply will update these MyWalSnd locations. */ - if (MyWalSnd->flush < sentPtr && + if (force_keepalive_syncrep || + (MyWalSnd->flush < sentPtr && MyWalSnd->write < sentPtr && - !waiting_for_ping_response) - WalSndKeepalive(false); + !waiting_for_ping_response)) + WalSndKeepalive(false); Will this allow syncrep to proceed in case we are skipping the transaction? Won't we need to send a feedback message with 'requestReply' set to true in this case, as we release syncrep waiters while processing the standby reply message (see ProcessStandbyReplyMessage->SyncRepReleaseWaiters)? Without 'requestReply', the subscriber might not send any message and the syncrep won't proceed. Why did you decide to delay sending this message till WalSndWaitForWal()? It may not be called for each transaction. I feel we should try to devise a test case for this sync replication mechanism such that without this particular change the sync rep transaction waits momentarily, but with this change it doesn't wait. I am not entirely sure whether we can devise an automated test, as this is a timing-related issue, but I guess we can at least manually try to produce a case. -- With Regards, Amit Kapila.
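To make that concrete, here is a minimal sketch of sending the keepalive directly when an empty transaction is skipped under synchronous replication. The skipped_xact parameter, its placement in WalSndUpdateProgress(), and the synchronous-replication check are assumptions of this sketch rather than code from the posted patch, and WalSndKeepalive() is assumed in its single-argument form, WalSndKeepalive(bool requestReply):

    static void
    WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn,
                         TransactionId xid, bool skipped_xact)
    {
        /* ... existing lag-tracking code ... */

        /*
         * If the output plugin skipped an empty transaction and synchronous
         * replication is in use, ask the subscriber for an immediate reply.
         * Processing that reply updates MyWalSnd->write/flush, which lets
         * SyncRepReleaseWaiters() release the backend waiting on this
         * (empty) transaction's commit LSN.
         */
        if (skipped_xact && SyncRepRequested())
        {
            WalSndKeepalive(true);      /* requestReply = true */

            /* Try to push the message out to the client right away. */
            if (pq_flush_if_writable() != 0)
                WalSndShutdown();
        }
    }

A real implementation would likely also verify that synchronous standbys are actually defined before sending, but an occasional unnecessary keepalive is harmless.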
Re: [Proposal] Add foreign-server health checks infrastructure
On 2022/02/17 19:35, kuroda.hay...@fujitsu.com wrote: Dear Horiguchi-san, I think we just don't need to add the special timeout kind to the core. postgres_fdw can use USER_TIMEOUT and it would be sufficient to keep running the health check regardless of transaction state, and then fire a query cancel if a disconnection happens. As I said in the previous mail, a possible extra query cancel would be safe. Sounds reasonable to me. I finally figured out that you were referring to the user-defined timeout system. Firstly - before posting to hackers - I designed it like that, but I was afraid of the overhead of many FDWs registering timeouts and calling setitimer() many times. Is it too overcautious? Isn't it a very special case where many FDWs use their own user timeouts? Could you tell me the assumption you have in mind, especially how many FDWs would be running? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
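To make the user-timeout approach concrete, here is a minimal sketch of how a module such as postgres_fdw could drive a periodic health check with the existing timeout.c facility (RegisterTimeout with USER_TIMEOUT plus enable_timeout_after). The function names, the interval, and the cancellation strategy are hypothetical illustrations, not code from any posted patch:

    #include "postgres.h"

    #include "miscadmin.h"
    #include "storage/latch.h"
    #include "utils/timeout.h"

    #define HEALTH_CHECK_INTERVAL_MS    1000    /* hypothetical interval */

    static TimeoutId health_check_timeout_id;
    static volatile sig_atomic_t health_check_pending = false;

    /* Timeout handler: runs in signal context, so only set a flag and wake up. */
    static void
    health_check_timeout_handler(void)
    {
        health_check_pending = true;
        SetLatch(MyLatch);
    }

    /* Register and arm the timeout once per backend, e.g. at first FDW use. */
    static void
    setup_health_check_timeout(void)
    {
        health_check_timeout_id = RegisterTimeout(USER_TIMEOUT,
                                                  health_check_timeout_handler);
        enable_timeout_after(health_check_timeout_id, HEALTH_CHECK_INTERVAL_MS);
    }

    /* Call from a safe point in the main code path, never from the handler. */
    static void
    maybe_run_health_check(void)
    {
        if (!health_check_pending)
            return;
        health_check_pending = false;

        /*
         * Check the remote connections here (for example with PQstatus())
         * and request query cancellation if a foreign server has gone away.
         */

        /* Re-arm the timer so the check keeps running. */
        enable_timeout_after(health_check_timeout_id, HEALTH_CHECK_INTERVAL_MS);
    }

Regarding the setitimer() worry: timeout.c multiplexes every registered timeout onto a single interval timer by scheduling the nearest deadline, so several modules registering user timeouts do not each consume a system timer, although each enable/disable call may reschedule the alarm.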
RE: Failed transaction statistics to measure the logical replication progress
On Thursday, February 17, 2022 6:45 PM Amit Kapila wrote: > On Thu, Feb 17, 2022 at 3:13 PM Amit Kapila > wrote: > > > > On Tue, Jan 4, 2022 at 5:22 PM osumi.takami...@fujitsu.com > > wrote: > > > > > > On Friday, December 31, 2021 10:12 AM Tang, Haiying/唐 海英 > wrote: > > > > 4) > > > > +void > > > > +pgstat_send_subworker_xact_stats(LogicalRepWorker *repWorker, > > > > +bool > > > > +force) { > > > > + static TimestampTz last_report = 0; > > > > + PgStat_MsgSubWorkerXactEnd msg; > > > > + > > > > + if (!force) > > > > + { > > > > ... > > > > + if (!TimestampDifferenceExceeds(last_report, now, > > > > PGSTAT_STAT_INTERVAL)) > > > > + return; > > > > + last_report = now; > > > > + } > > > > + > > > > ... > > > > + if (repWorker->commit_count == 0 && repWorker->abort_count > > > > + == > > > > 0) > > > > + return; > > > > ... > > > > > > > > I think it's better to check commit_count and abort_count first, > > > > then check if reach PGSTAT_STAT_INTERVAL. > > > > Otherwise if commit_count and abort_count are 0, it is possible > > > > that the value of last_report has been updated but it didn't send > > > > stats in fact. In this case, last_report is not the real time that send > > > > last > message. > > > Yeah, agreed. This fix is right in terms of the variable name aspect. > > > > > > > Can't we use pgstat_report_stat() here? Basically, you can update xact > > completion counters during apply, and then from > > pgstat_report_stat(), you can invoke a logical replication worker > > stats-related function. > > > > If we can do this then we can save the logic this patch is trying to > introduce for > PGSTAT_STAT_INTERVAL. Hi, I've encountered a couple of questions during my modification, following your advice. In pgstat_report_stat, we use the return value of GetCurrentTransactionStopTimestamp to compute the time difference from the last report. (In my previous patch, I used GetCurrentTimestamp.) For the apply worker, this timestamp is updated in apply_handle_commit_internal's CommitTransactionCommand. So, if I update the subscription worker stats (commit_count/abort_count) immediately after this CommitTransactionCommand and immediately call pgstat_report_stat in apply_handle_commit_internal, the time difference is too small (it falls short of PGSTAT_STAT_INTERVAL). Also, the value returned by GetCurrentTransactionStopTimestamp does not advance even if I keep calling pgstat_report_stat repeatedly. Then, IIUC, the next time a commit_count or abort_count message can be sent to the stats collector is when the apply worker executes another transaction, which updates the GetCurrentTransactionStopTimestamp value and lets pgstat_report_stat run again. So, if we keep GetCurrentTransactionStopTimestamp as it is and simply merge those, a stats update depends on another, later transaction (which means users cannot see the latest stats information?). At least, I got a test failure for the streaming commit case because the test uses poll_query_until to wait for the stats update. On the other hand, replacing GetCurrentTransactionStopTimestamp with GetCurrentTimestamp for the apply worker seems to have another negative impact: if we do so, it becomes possible that we go into the code that scans TabStatusArray while some PgStat_TableStatus entries still have non-NULL trans pointers, because of the timing change.
I might be able to avoid this kind of assert failure if, in pgstat_report_stat, I call this patch's message-send function before the other existing functions that send various types of messages, and return early when the process is an apply worker. But I'm not convinced that this way of modifying it is OK. What do you think about these issues? Best Regards, Takamichi Osumi
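To illustrate the timing issue described above, here is a simplified sketch of the rate-limit check at the top of pgstat_report_stat() (paraphrased from the PG 14-era pgstat.c, not a verbatim copy):

    void
    pgstat_report_stat(bool disconnect)
    {
        static TimestampTz last_report = 0;
        TimestampTz now;

        /*
         * The "current" time used for the rate limit is the transaction stop
         * timestamp, which is set when a transaction commits or aborts and is
         * not advanced again until another transaction runs.  Calling this
         * function repeatedly from an otherwise idle apply worker therefore
         * keeps comparing the same value, and the PGSTAT_STAT_INTERVAL check
         * keeps failing until the worker completes another transaction.
         */
        now = GetCurrentTransactionStopTimestamp();
        if (!disconnect &&
            !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
            return;

        last_report = now;

        /*
         * ... table, function and (with the proposed change) subscription
         * worker transaction stats would be sent from here ...
         */
    }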
Re: Design of pg_stat_subscription_workers vs pgstats
On Wed, Feb 16, 2022 at 8:36 PM Masahiko Sawada wrote: > > On Wed, Feb 16, 2022 at 5:49 PM Amit Kapila wrote: > > > > On Tue, Feb 15, 2022 at 11:56 PM Andres Freund wrote: > > > > > > On 2022-02-03 13:33:08 +0900, Masahiko Sawada wrote: > > > > I see that important information such as error-XID that can be used > > > > for ALTER SUBSCRIPTION SKIP needs to be stored in a reliable way, and > > > > using system catalogs is a reasonable way for this purpose. But it's > > > > still unclear to me why all error information that is currently shown > > > > in pg_stat_subscription_workers view, including error-XID and the > > > > error message, relation OID, action, etc., need to be stored in the > > > > catalog. The information other than error-XID doesn't necessarily need > > > > to be reliable compared to error-XID. I think we can have > > > > error-XID/LSN in the pg_subscription catalog and have other error > > > > information in pg_stat_subscription_workers view. After the user > > > > checks the current status of logical replication by checking > > > > error-XID/LSN, they can check pg_stat_subscription_workers for > > > > details. > > > > > > The stats system isn't geared towards storing error messages and > > > such. Generally it is about storing counts of events etc, not about > > > information about a single event. Obviously there are a few cases where > > > that > > > boundary isn't that clear... > > > > > > > True, in the beginning, we discussed this information to be stored in > > system catalog [1] (See Also, I am thinking that instead of a > > stat view, do we need to consider having a system table .. ) but later > > discussion led us to store this as stats. > > > > > IOW, storing information like: > > > - subscription oid > > > - retryable error count > > > - hard error count > > > - #replicated inserts > > > - #replicated updates > > > - #replicated deletes > > > > > > is what pgstats is for. > > > > > > > Some of this and similar ((like error count, last_error_time)) is > > present in stats and something on the lines of the other information > > is proposed in [2] (xacts successfully replicated (committed), > > aborted, etc) to be stored along with it. > > > > > But not > > > - subscription oid > > > - error message > > > - xid of error > > > - ... > > > > > > > I think from the current set of things we are capturing, the other > > thing in this list will be error_command (insert/update/delete..) and > > or probably error_code. So, we can keep this information in a system > > table. > > Agreed. Also, we can have also commit-LSN or replace error-XID with error-LSN? > > > > > Based on this discussion, it appears to me what we want here is to > > store the error info in some persistent way (system table) and the > > counters (both error and success) of logical replication in stats. If > > we can't achieve this work (separation) in the next few weeks (before > > the feature freeze) then I'll revert the work related to stats. > > There was an idea to use shmem to store error info but it seems to be > better to use a system catalog to persist them. > > I'll summarize changes we discussed and make a plan of changes and > feature designs toward the feature freeze (and v16). I think that once > we get a consensus on them we can start implementation and move it > forward. > Here is the summary of the discussion, changes, and plan. 1. Move some error information such as the error message to a new system catalog, pg_subscription_error. 
The pg_subscription_error table would have the following columns: * sesubid : subscription Oid. * serelid : relation Oid (NULL for apply worker). * seerrlsn : commit-LSN of the error transaction. * seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction. * seerrmsg : error message The tuple is inserted or updated when an apply worker or a tablesync worker raises an error. If the same error occurs in a row, the update is skipped. The tuple is removed in the following cases: * the subscription is dropped. * the table is dropped. * the table is removed from the subscription. * the worker successfully committed a non-empty transaction. With this change, pg_stat_subscription_workers will be like: * subid * subname * subrelid * error_count * last_error_timestamp This view will be extended by adding transaction statistics proposed on another thread[1]. 2. Fix a bug in pg_stat_subscription_workers. As pointed out by Andres, there is a bug in pg_stat_subscription_workers; it doesn't drop entries for already-dropped tables. We need to fix it. 3. Show commit-LSN of the error transaction in errcontext. Currently, we show XID and commit timestamp in the errcontext. But given that we use LSN in pg_subscription_error, it's better to show commit-LSN as well (or instead of error-XID). 4. Skipping the particular conflicted transaction. There are two proposals: 4-1. Use the existing replication_origin_advance() SQL function. We don't need to add any additional syntax
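As an illustration of item 1, a genbki-style declaration for the proposed catalog could look roughly like the sketch below. The OID, the BKI annotations, the single-character representation of seerrcmd, and the use of an LSN column are assumptions made purely for the example; none of this is from a posted patch:

    #include "catalog/genbki.h"

    /* Hypothetical sketch only; column names follow the summary above. */
    CATALOG(pg_subscription_error,8888,SubscriptionErrorRelationId)
    {
        Oid         sesubid BKI_LOOKUP(pg_subscription);   /* owning subscription */
        Oid         serelid BKI_LOOKUP_OPT(pg_class);       /* relation, 0 for the apply worker */
        XLogRecPtr  seerrlsn;       /* commit LSN of the failed transaction */
        char        seerrcmd;       /* failed change: 'i', 'u', 'd', ... (illustrative) */

    #ifdef CATALOG_VARLEN           /* variable-length fields start here */
        text        seerrmsg BKI_FORCE_NOT_NULL;    /* error message */
    #endif
    } FormData_pg_subscription_error;

    typedef FormData_pg_subscription_error *Form_pg_subscription_error;

Inserting or updating the row on error, and removing it on the events listed above, could then use the ordinary CatalogTupleInsert()/CatalogTupleUpdate()/CatalogTupleDelete() machinery, necessarily in a transaction separate from the one that failed.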
Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)
On Tue, Feb 08, 2022 at 12:56:59PM +0900, Michael Paquier wrote: > Well, I can see that this is a second independent complain after a few > months. If you wish to keep this capability, wouldn't it be better to > add a "regress" mode to compute_query_id, where we would mask > automatically this information in the output of EXPLAIN but still run > the computation? So, I have been looking at this problem, and I don't see a problem in doing something like the attached, where we add a "regress" mode to compute_query_id that is a synonym of "auto". Or, in short, we have the default of letting a module decide if a query ID can be computed or not, at the exception that we hide its result in EXPLAIN outputs. Julien, what do you think? FWIW, about your question of upthread, I have noticed the behavior while testing, but I know of some internal customers that enable pg_stat_statements and like doing tests on the PostgreSQL instance deployed this way, so that would break. They are not on 14 yet as far as I know. -- Michael diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h index a4c277269e..c670662db2 100644 --- a/src/include/utils/queryjumble.h +++ b/src/include/utils/queryjumble.h @@ -57,7 +57,8 @@ enum ComputeQueryIdType { COMPUTE_QUERY_ID_OFF, COMPUTE_QUERY_ID_ON, - COMPUTE_QUERY_ID_AUTO + COMPUTE_QUERY_ID_AUTO, + COMPUTE_QUERY_ID_REGRESS }; /* GUC parameters */ diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index b970997c34..fae3926b42 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -604,7 +604,13 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, /* Create textual dump of plan tree */ ExplainPrintPlan(es, queryDesc); - if (es->verbose && plannedstmt->queryId != UINT64CONST(0)) + /* + * COMPUTE_QUERY_ID_REGRESS means COMPUTE_QUERY_ID_AUTO, but we don't + * show it up in any of the EXPLAIN plans to keep stable the results + * generated by any regression test suite. 
+ */ + if (es->verbose && plannedstmt->queryId != UINT64CONST(0) && + compute_query_id != COMPUTE_QUERY_ID_REGRESS) { /* * Output the queryid as an int64 rather than a uint64 so we match diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 01f373815e..7438df9109 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -412,6 +412,7 @@ static const struct config_enum_entry backslash_quote_options[] = { */ static const struct config_enum_entry compute_query_id_options[] = { {"auto", COMPUTE_QUERY_ID_AUTO, false}, + {"regress", COMPUTE_QUERY_ID_REGRESS, false}, {"on", COMPUTE_QUERY_ID_ON, false}, {"off", COMPUTE_QUERY_ID_OFF, false}, {"true", COMPUTE_QUERY_ID_ON, true}, diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index e6f71c7582..a4612d9a8a 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -1925,8 +1925,9 @@ create_database(const char *dbname) "ALTER DATABASE \"%s\" SET lc_numeric TO 'C';" "ALTER DATABASE \"%s\" SET lc_time TO 'C';" "ALTER DATABASE \"%s\" SET bytea_output TO 'hex';" + "ALTER DATABASE \"%s\" SET compute_query_id TO 'regress';" "ALTER DATABASE \"%s\" SET timezone_abbreviations TO 'Default';", - dbname, dbname, dbname, dbname, dbname, dbname); + dbname, dbname, dbname, dbname, dbname, dbname, dbname); psql_end_command(buf, "postgres"); /* diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index d99bf38e67..036e6da680 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7934,9 +7934,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; method is not acceptable. In this case, in-core computation must be always disabled. Valid values are off (always disabled), -on (always enabled) and auto, +on (always enabled), auto, which lets modules such as -automatically enable it. +automatically enable it, and regress which +has the same effect as auto, except that the +query identifier is hidden in the EXPLAIN output. The default is auto. signature.asc Description: PGP signature
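One detail worth noting: the patch needs no change to the query-ID enablement logic itself, because the inline check in utils/queryjumble.h special-cases only off and on and falls through to the module-controlled flag for everything else, so regress automatically inherits the auto behavior (pg_stat_statements opts in by calling EnableQueryId() from its _PG_init()). Roughly, the existing check is (paraphrased):

    static inline bool
    IsQueryIdEnabled(void)
    {
        if (compute_query_id == COMPUTE_QUERY_ID_OFF)
            return false;
        if (compute_query_id == COMPUTE_QUERY_ID_ON)
            return true;
        /* "auto" -- and, with this patch, "regress" -- defer to loaded modules */
        return query_id_enabled;
    }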
Re: logical replication empty transactions
On Wed, Feb 16, 2022 at 2:15 PM osumi.takami...@fujitsu.com wrote: > Another idea would be, to create an empty file under the > pg_replslot/slotname > with a prefix different from "xid" in the DecodePrepare before the shutdown > if the prepare was empty, and bypass the cleanup of the serialized txns > and check the existence after the restart. But, this is pretty ad-hoc and I > wasn't sure > whether addressing the corner case of the restart has strong enough > justification > to create this new file format. > Yes, this doesn't look very efficient. > Therefore, in my humble opinion, the idea of protocol change slightly wins, > since the impact of the protocol change would not be big. We introduced > the protocol version 3 in the devel version and the number of users should be > small. Yes, but we don't want to break backward compatibility for this small added optimization. Amit, I will work on your comments. regards, Ajin Cherian Fujitsu Australia