Re: pg_rewind with cascade standby doesn't work well
Thanks for your review! On Wed, Sep 27, 2023 at 8:33 Michael Paquier wrote: > On Tue, Sep 26, 2023 at 06:44:50PM +0300, Aleksander Alekseev wrote: > >> And also, I'm afraid that I'm not sure what kind of tests I have to make > >> to fix this behavior. Would you mind giving me some advice? > > > > Personally I would prefer not to increase the scope of work. Your TAP > > test added in 0001 seems to be adequate. > > Yeah, agreed. I'm OK with what you are proposing, basically (the > test could be made a bit cheaper actually). I guess you meant that it contains an unnecessary insert and wait. I fixed this, and also some incorrect comments caused by copy & paste. Please see the attached patch. > /* > - * For Hot Standby, we could treat this like a Shutdown > Checkpoint, > - * but this case is rarer and harder to test, so the benefit > doesn't > - * outweigh the potential extra cost of maintenance. > + * For Hot Standby, we could treat this like an end-of-recovery > checkpoint > */ > +RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT); > > I don't understand what you want to change here. Archive recovery and > crash recovery are two different things, still this code would be > triggered even if there is no standby.signal, aka the node is not a > standby. Why isn't this stuff conditional? You are absolutely right. It should only run in standby mode. Also, according to the documentation[1], a server can be in "Hot Standby" even if it is not in standby mode (i.e., when it is in archive recovery mode). So I fixed the comment above `RequestCheckpoint()`. [1]: https://www.postgresql.org/docs/current/hot-standby.html I hope you will review it again. Regards, Masaki Kuwamura v3-0001-pg_rewind-Fix-bug-using-cascade-standby-as-source.patch Description: Binary data
Re: Clarify where the severity level is defined
> > Committed, with some minor wordsmithing. Thanks! > Thanks for tweaking and pushing, Daniel-san! Masaki Kuwamura
Re: bug fix and documentation improvement about vacuumdb
> > I've applied this down to v16 now, thanks for the submission! > Thanks for pushing! Masaki Kuwamura
Clarify where the severity level is defined
Hi, Recently I read the documentation about ereport()[1]. I felt that there is little information about severity levels, so I think it would be helpful to clarify where they are defined (see the attached patch). Any thoughts? [1] https://www.postgresql.org/docs/devel/error-message-reporting.html diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml index 514090d5a6..738fd3fe0c 100644 --- a/doc/src/sgml/sources.sgml +++ b/doc/src/sgml/sources.sgml @@ -105,7 +105,8 @@ less -x4 There are two required elements for every message: a severity level -(ranging from DEBUG to PANIC) and a primary +(ranging from DEBUG to PANIC, defined at +the top of src/include/utils/elog.h) and a primary message text. In addition there are optional elements, the most common of which is an error identifier code that follows the SQL spec's SQLSTATE conventions.
Re: bug fix and documentation improvement about vacuumdb
LGTM too! >> a bit to make the diff smaller, I couldn't think from that perspective. Thanks for your update, Daniel-san. Masaki Kuwamura
Re: bug fix and documentation improvement about vacuumdb
> > No worries at all. If you look at the page now you will see all green > checkmarks indicating that the patch was tested in CI. So now we know that > your tests fail without the fix and work with the fix applied, so all is > well. > Thank you for your kind words! It seems to me that all concerns are resolved, so I'll change the patch status to Ready for Committer. If you have or find any flaws, please let me know. Best Regards, Masaki Kuwamura
Re: bug fix and documentation improvement about vacuumdb
> > I agree. Supporting pattern matching should, if anyone is interested in > trying, be done separately in its own thread, no need to move the goalposts > here. Sorry if I made it sound like so upthread. > I got it. > When sending an update, please include the previous patch as well with > your new > tests as a 0002 patch in a patchset. The CFBot can only apply and > build/test > patches when the entire patchset is attached to the email. The below > testresults indicate that the patch failed the new tests (which in a way is > good since without the fix the tests *should* fail), since the fix patch > was > not applied: > > http://cfbot.cputube.org/masaki-kuwamura.html > I'm sorry, I didn't know that. I attached both the test and fix patches to this mail. (The fix patch is clearly Nathan-san's, though.) If I'm still doing it wrong, please let me know. Masaki Kuwamura v1_vacuumdb_add_tests.patch Description: Binary data vacuumdb_fix.patch Description: Binary data
Re: bug fix and documentation improvement about vacuumdb
Thank you for all your reviews! >>> PATTERN should be changed to SCHEMA because -n and -N options don't support >>> pattern matching for schema names. The attached patch 0001 fixes this. >> >> True, there is no pattern matching performed. I wonder if it's worth lifting >> the pattern matching from pg_dump into common code such that tools like this >> can use it? > > I agree that this should be changed to SCHEMA. It might be tough to add > pattern matching with the current catalog query, and I don't know whether > there is demand for such a feature, but I wouldn't discourage someone from > trying. I think supporting pattern matching would be quite nice, but I suspect it would be not only tough but also a breaking change. So I guess this change should be committed either way. >>> Yeah, I think we can fix the JOIN as you suggest. I quickly put a patch >>> together to demonstrate. > > Looks good from a quick skim. I agree with these updates. Thank you! >> We should probably add some tests... > > Agreed. The attached patch includes new tests for this bug. Also, I fixed the current test for the -N option, which seems to be incorrect. >>> It seems to work fine. However, if we're aiming for consistent >>> spacing, the "IS NULL" (two spaces in between) might be a concern. >> >> I don't think that's a problem. I would rather have readable C code and two >> spaces in the generated SQL than contorting the C code to produce less >> whitespace in a query few will read in its generated form. > > I think we could pretty easily avoid the extra space and keep the C code > relatively readable. These sorts of things bug me, too (see 2af3336). Though I don't think it affects readability, I'm neutral about this. >> >> Third, for the description of the -N option, I wonder if "vacuum all tables except >> >> in the specified schema(s)" might be clearer. The current one says nothing about >> >> tables not in the specified schema.
>> > >> > Maybe, but the point of vacuumdb is to analyze a database so I'm not sure who >> > would expect anything else than vacuuming everything but the excluded schema >> > when specifying -N. What else could "vacuumdb -N foo" be interpreted to do >> > that can be confusing? >> >> I agree with Daniel on this one. > > +1. That makes sense. I retract my suggestion. v1_vacuumdb_add_tests.patch Description: Binary data
Re: pg_rewind with cascade standby doesn't work well
>> IMO a test is needed that makes sure no one is going to break this in >> the future. > > You definitely need more complex test scenarios for that. If you can > come up with new ways to make the TAP tests of pg_rewind mode modular > in handling more complicated node setups, that would be a nice > addition, for example. I'm sorry for the lack of tests. For now, I started off with a simple test that causes the problem I mentioned. The updated WIP patch 0001 includes the new test for pg_rewind. And also, I'm afraid that I'm not sure what kind of tests I have to make to fix this behavior. Would you mind giving me some advice? > This patch is at least incorrect in its handling of crash recovery, > because these two should *never* be set in this case as we want to > replay up to the end of WAL. For example, see xlog.c or the top of > xlogrecovery.c about the assumptions behind these variables: > /* crash recovery should always recover to the end of WAL */ > ControlFile->minRecoveryPoint = InvalidXLogRecPtr; > ControlFile->minRecoveryPointTLI = 0; > > If an end-of-recovery record is replayed during crash recovery, these > assumptions are plain broken. That makes sense! I really appreciate your knowledgeable review. > One thing that we could consider is to be more aggressive with > restartpoints when replaying this record for a standby, see a few > lines above the lines added by your patch, for example. And we could > potentially emulate a post-promotion restart point to get a refresh of > the control file as it should, with the correct code paths involved in > the updates of minRecoveryPoint when the checkpointer does the job. I'm not confident, but did you mean we could create a restartpoint (i.e., call `RequestCheckpoint()`) instead of my old patch? Patch 0001 also reflects my understanding. I also found a bug (maybe): if we call `CreateRestartPoint()` during crash recovery, an assertion fails at ComputeXidHorizons() in procarray.c.
It's inherently orthogonal to the problem I already reported, so you can reproduce this on HEAD with this procedure. 1. Start the primary and standby servers 2. Modify checkpoint_timeout to 1h on the standby 3. Insert 10^10 records and concurrently run CHECKPOINT every second on the primary 4. Do an immediate stop on both standby and primary at the end of the insert 5. Modify checkpoint_timeout to 30 on the standby 6. Remove standby.signal on the standby 7. Restart the standby (it will start crash recovery) 8. An assertion failure is raised shortly I think this is because `TruncateSUBTRANS()` in `CreateRestartPoint()` is called but `StartupSUBTRANS()` hasn't been called yet. In `StartupXLOG()`, we call `StartupSUBTRANS()` if `(ArchiveRecoveryRequested && EnableHotStandby)`. However, in `CreateRestartPoint()`, we call `TruncateSUBTRANS()` if `(EnableHotStandby)`. I guess this difference causes the bug: the latter can be called even during crash recovery, while the former isn't. The attached patch 0002 fixes it. I think we could discuss this bug in another thread if needed. Best regards! Masaki Kuwamura v2-0001-pg_rewind-Fix-bug-using-cascade-standby-as-source.patch Description: Binary data v1-0002-Fix-restartpoint-during-crash-recovery.patch Description: Binary data
bug fix and documentation improvement about vacuumdb
Hi there, I have 1 trivial fix, 1 bug fix, and 1 suggestion about vacuumdb. First, I noticed that the help message of `vacuumdb` is a bit incorrect. `vacuumdb -?` displays the following message ``` ... -n, --schema=PATTERN vacuum tables in the specified schema(s) only -N, --exclude-schema=PATTERN do not vacuum tables in the specified schema(s) ... ``` PATTERN should be changed to SCHEMA because the -n and -N options don't support pattern matching for schema names. The attached patch 0001 fixes this. Second, when we use multiple -N options, vacuumdb runs incorrectly as shown below. ``` $ psql =# CREATE SCHEMA s1; =# CREATE SCHEMA s2; =# CREATE SCHEMA s3; =# CREATE TABLE s1.t(i int); =# CREATE TABLE s2.t(i int); =# CREATE TABLE s3.t(i int); =# ALTER SYSTEM SET log_statement TO 'all'; =# SELECT pg_reload_conf(); =# \q $ vacuumdb -N s1 -N s2 ``` We expect that tables in schemas s1 and s2 should not be vacuumed, while the others should be. However, the logfile says this: ``` LOG: statement: VACUUM (SKIP_DATABASE_STATS) pg_catalog.pg_proc; LOG: statement: VACUUM (SKIP_DATABASE_STATS) pg_catalog.pg_proc; ... LOG: statement: VACUUM (SKIP_DATABASE_STATS) s2.t; LOG: statement: VACUUM (SKIP_DATABASE_STATS) s1.t; LOG: statement: VACUUM (ONLY_DATABASE_STATS); ``` Even though they are specified by -N, s1.t and s2.t are vacuumed, and the other tables are vacuumed twice. The attached patch 0002 fixes this. Third, for the description of the -N option, I wonder if "vacuum all tables except in the specified schema(s)" might be clearer. The current one says nothing about tables not in the specified schema. Thoughts? Masaki Kuwamura v1-0001-vacuumdb-Fix-help-message.patch Description: Binary data v1-0002-vacuumdb-Fix-bug-multiple-N-switches.patch Description: Binary data
Re: pg_rewind with cascade standby doesn't work well
> Consider a scenario like this, > > Server A: primary > Server B: replica of A > Server C: replica of B > > and somehow A goes down, so B gets promoted. > Server A: down > Server B: new primary > Server C: replica of B > > In this case, pg_rewind can be used to reconstruct the cascade; the source is C and the target is A. > However, we get the following error by running pg_rewind. > > ``` > pg_rewind: fetched file "global/pg_control", length 8192 > pg_rewind: source and target cluster are on the same timeline > pg_rewind: no rewind required > ``` To fix the above-mentioned behavior of pg_rewind, I suggest changing the cascade standby's (i.e., server C's) minRecoveryPointTLI when it receives the new timeline information from the new primary (i.e., server B). When server B is promoted, it creates an end-of-recovery record by calling CreateEndOfRecoveryRecord() (in xlog.c), and also updates B's minRecoveryPoint and minRecoveryPointTLI. ``` /* * Update the control file so that crash recovery can follow the timeline * changes to this point. */ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); ControlFile->minRecoveryPoint = recptr; ControlFile->minRecoveryPointTLI = xlrec.ThisTimeLineID; UpdateControlFile(); LWLockRelease(ControlFileLock); ``` Since C is a replica of B, the end-of-recovery record is replicated from B to C, and the record is replayed on C by xlog_redo(). The attached patch updates minRecoveryPoint and minRecoveryPointTLI at this point by mimicking CreateEndOfRecoveryRecord(). With this patch, you can run pg_rewind with a cascade standby immediately (without waiting for a checkpoint). Thoughts? Masaki Kuwamura v1-0001-pg_rewind-Fix-bug-using-cascade-standby-as-source.patch Description: Binary data
pg_rewind with cascade standby doesn't work well
Hi there, I tested pg_rewind behavior and found a suspicious one. Consider a scenario like this, Server A: primary Server B: replica of A Server C: replica of B and somehow A goes down, so B gets promoted. Server A: down Server B: new primary Server C: replica of B In this case, pg_rewind can be used to reconstruct the cascade; the source is C and the target is A. However, we get the following error by running pg_rewind. ``` pg_rewind: fetched file "global/pg_control", length 8192 pg_rewind: source and target cluster are on the same timeline pg_rewind: no rewind required ``` Though A's timeline is 1 and C's is ideally 2, it says they're on the same timeline. This is because pg_rewind currently uses minRecoveryPointTLI and the latest checkpoint's TimelineID to compare the TLI between source and target[1]. Neither C's minRecoveryPointTLI nor its latest checkpoint's TimelineID is modified until a checkpoint occurs (even though B's are). So if you run pg_rewind immediately, it won't work because C and A appear to be on the same timeline; we have to run CHECKPOINT on C before running pg_rewind. BTW, immediate pg_rewind with a cascade standby seems to have already been raised as a concern in another discussion[2], but was unfortunately missed. Anyway, I don't think this behavior is kind. To fix this, should we use another variable to compare the TLI? Or modify the cascade standby's minRecoveryPointTLI somehow? Masaki Kuwamura [1] https://www.postgresql.org/message-id/flat/9f568c97-87fe-a716-bd39-65299b8a60f4%40iki.fi [2] https://www.postgresql.org/message-id/flat/aeb5f31a-8de2-40a8-64af-ab659a309d6b%40iki.fi
Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply
Hi, Fujii-san > Regarding the WARNING message, another idea is to pass the return value > of PQgetCancel() directly to PQcancel() as follows. If NULL is passed, > PQcancel() will detect it and set the proper error message to errbuf. > Then the warning message "WARNING: could not send cancel request: > PQcancel() -- no cancel object supplied" is output. I agree with this approach. With it, the information behind the error (e.g., "out of memory") will disappear, I guess. I think we have to deal with that eventually. (I'm sorry, I don't have a good idea right now.) However, the original issue is the unnecessary waiting, and that should be fixed soon. So IMO it is better to fix the problem this way and discuss retaining the error information in another patch. I'm afraid I'm new to reviewing, so if I'm misunderstanding something, please let me know. Masaki Kuwamura