Re: pg_rewind with cascade standby doesn't work well

2023-10-06 Thread Kuwamura Masaki
Thanks for your review!

2023年9月27日(水) 8:33 Michael Paquier :

> On Tue, Sep 26, 2023 at 06:44:50PM +0300, Aleksander Alekseev wrote:
> >> And also, I'm afraid I'm not sure what kind of tests I need to add
> >> for this fix. Would you mind giving me some advice?
> >
> > Personally I would prefer not to increase the scope of work. Your TAP
> > test added in 0001 seems to be adequate.
>
> Yeah, agreed.  I'm OK with what you are proposing, basically (the
> test could be made a bit cheaper actually).


I guess you meant that it contains an unnecessary insert and wait.
I fixed this and some incorrect comments caused by copy & paste.
Please see the attached patch.


>
>  /*
> - * For Hot Standby, we could treat this like a Shutdown Checkpoint,
> - * but this case is rarer and harder to test, so the benefit doesn't
> - * outweigh the potential extra cost of maintenance.
> + * For Hot Standby, we could treat this like an end-of-recovery checkpoint
>   */
> +RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT);
>
> I don't understand what you want to change here.  Archive recovery and
> crash recovery are two different things, still this code would be
> triggered even if there is no standby.signal, aka the node is not a
> standby.  Why isn't this stuff conditional?


You are absolutely right. It should only run in standby mode.
Also, according to the documentation[1], a server can be a "Hot Standby" even
if it is not in standby mode (i.e., when it is in archive recovery mode).
So I fixed the comment above `RequestCheckpoint()` accordingly.

[1]: https://www.postgresql.org/docs/current/hot-standby.html
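
For clarity, here is a minimal sketch of the idea (not the patch itself); it
assumes the standby-mode flag from xlogrecovery.c is visible at this point in
xlog_redo():

```
	/*
	 * Sketch only: request an immediate restartpoint when a standby replays
	 * the end-of-recovery record, leaving crash recovery and plain archive
	 * recovery untouched so they keep replaying to the end of WAL.
	 */
	if (StandbyMode)
		RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT);
```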

I hope you will review it again.

Regards,

Masaki Kuwamura


v3-0001-pg_rewind-Fix-bug-using-cascade-standby-as-source.patch
Description: Binary data


Re: Clarify where the severity level is defined

2023-09-28 Thread Kuwamura Masaki
>
> Committed, with some minor wordsmithing. Thanks!
>

Thanks for tweaking and pushing, Daniel-san!

Masaki Kuwamura


Re: bug fix and documentation improvement about vacuumdb

2023-09-25 Thread Kuwamura Masaki
>
> I've applied this down to v16 now, thanks for the submission!
>

Thanks for pushing!

Masaki Kuwamura


Clarify where the severity level is defined

2023-09-25 Thread Kuwamura Masaki
Hi,

Recently I read the documentation about ereport()[1], and I felt that there is
little information about severity levels. I think it would be helpful to
clarify where the severity levels are defined (please see the attached patch).

Any thoughts?


[1] https://www.postgresql.org/docs/devel/error-message-reporting.html
diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
index 514090d5a6..738fd3fe0c 100644
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@@ -105,7 +105,8 @@ less -x4
 

 There are two required elements for every message: a severity level
-(ranging from DEBUG to PANIC) and a primary
+(ranging from DEBUG to PANIC, defined at
+the top of src/include/utils/elog.h) and a primary
 message text.  In addition there are optional elements, the most
 common of which is an error identifier code that follows the SQL spec's
 SQLSTATE conventions.
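
Not part of the patch, but for context: a typical call combining the two
required elements (a severity level and the primary message text) with an
optional SQLSTATE looks like this, and the severity macros used as the first
argument are exactly the ones defined near the top of
src/include/utils/elog.h:

```
ereport(ERROR,
        (errcode(ERRCODE_DIVISION_BY_ZERO),
         errmsg("division by zero")));
```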

Re: bug fix and documentation improvement about vacuumdb

2023-09-24 Thread Kuwamura Masaki
LGTM too!

>> a bit to make the diff smaller,
I hadn't thought of it from that perspective. Thanks for your update, Daniel-san.

Masaki Kuwamura


Re: bug fix and documentation improvement about vacuumdb

2023-09-22 Thread Kuwamura Masaki
>
> No worries at all.  If you look at the page now you will see all green
> checkmarks indicating that the patch was tested in CI.  So now we know that
> your tests fail without the fix and work with the fix applied, so all is
> well.
>

Thank you for your kind words!

It seems to me that all concerns have been resolved,
so I'll change the patch status to Ready for Committer.
If you find any remaining issues, please let me know.

Best Regards,

Masaki Kuwamura


Re: bug fix and documentation improvement about vacuumdb

2023-09-20 Thread Kuwamura Masaki
>
> I agree.  Supporting pattern matching should, if anyone is interested in
> trying, be done separately in its own thread, no need to move the goalposts
> here. Sorry if I made it sound like so upthread.
>
 I got it.


> When sending an update, please include the previous patch as well with your
> new tests as a 0002 patch in a patchset.  The CFBot can only apply and
> build/test patches when the entire patchset is attached to the email.  The
> below testresults indicate that the patch failed the new tests (which in a
> way is good since without the fix the tests *should* fail), since the fix
> patch was not applied:
>
> http://cfbot.cputube.org/masaki-kuwamura.html
>
I'm sorry, I didn't know that. I've attached both the test patch and the fix
patch to this mail.
(The fix patch is Nathan-san's, of course.)
If I'm still doing this the wrong way, please let me know.

Masaki Kuwamura


v1_vacuumdb_add_tests.patch
Description: Binary data


vacuumdb_fix.patch
Description: Binary data


Re: bug fix and documentation improvement about vacuumdb

2023-09-20 Thread Kuwamura Masaki
Thank you for all your reviews!

>>> PATTERN should be changed to SCHEMA because -n and -N options don't support
>>> pattern matching for schema names. The attached patch 0001 fixes this.
>>
>> True, there is no pattern matching performed.  I wonder if it's worth lifting
>> the pattern matching from pg_dump into common code such that tools like this
>> can use it?
>
> I agree that this should be changed to SCHEMA.  It might be tough to add
> pattern matching with the current catalog query, and I don't know whether
> there is demand for such a feature, but I wouldn't discourage someone from
> trying.

I think supporting pattern matching would be quite nice.
But I suspect it would be not only tough but also a breaking change.
So I think the PATTERN-to-SCHEMA change should be committed either way.

>>> Yeah, I think we can fix the JOIN as you suggest.  I quickly put a patch
>>> together to demonstrate.
>
> Looks good from a quick skim.

I agree with these updates. Thank you!

>> We should probably add some tests...
>
> Agreed.

The attached patch includes new tests for this bug.
Also, I fixed the existing test for the -N option, which seems to be incorrect.

>>> It seems to work fine. However, if we're aiming for consistent
>>> spacing, the "IS  NULL" (two spaces in between) might be a concern.
>>
>> I don't think that's a problem.  I would rather have readable C code and two
>> spaces in the generated SQL than contorting the C code to produce less
>> whitespace in a query few will read in its generated form.
>
> I think we could pretty easily avoid the extra space and keep the C code
> relatively readable.  These sorts of things bug me, too (see 2af3336).

Though I don't think it affects readability, I'm neutral about this.
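
To illustrate what avoiding the extra space could look like (a hypothetical
sketch only; the helper, buffer, flag, and alias names here are made up, not
vacuumdb.c's actual code):

```
#include "pqexpbuffer.h"

/*
 * Hypothetical helper: append the schema filter without producing a doubled
 * space in the generated SQL.  With -N (exclude) we keep relations that did
 * not match a listed schema ("IS NULL"); with -n (include) we keep the
 * matches ("IS NOT NULL").
 */
static void
append_schema_filter(PQExpBuffer catalog_query, bool exclude_schemas)
{
	appendPQExpBuffer(catalog_query,
					  " AND listed_objects.object_oid IS%s NULL\n",
					  exclude_schemas ? "" : " NOT");
}
```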

>> >> Third, for the description of the -N option, I wonder if "vacuum all
>> >> tables except in the specified schema(s)" might be clearer. The current
>> >> one says nothing about tables not in the specified schema.
>> >
>> > Maybe, but the point of vacuumdb is to analyze a database so I'm not sure
>> > who would expect anything else than vacuuming everything but the excluded
>> > schema when specifying -N.  What else could "vacuumdb -N foo" be
>> > interpreted to do that can be confusing?
>>
>> I agree with Daniel on this one.
>
> +1.

That makes sense. I retract my suggestion.


v1_vacuumdb_add_tests.patch
Description: Binary data


Re: pg_rewind with cascade standby doesn't work well

2023-09-19 Thread Kuwamura Masaki
>> IMO a test is needed that makes sure no one is going to break this in
>> the future.
>
> You definitely need more complex test scenarios for that.  If you can
> come up with new ways to make the TAP tests of pg_rewind more modular
> in handling more complicated node setups, that would be a nice
> addition, for example.

I'm sorry for the lack of tests. For now, I started off with a simple test
that causes the problem I mentioned. The updated WIP patch 0001 includes
the new test for pg_rewind.
Also, I'm afraid I'm not sure what kind of tests I need to add
for this fix. Would you mind giving me some advice?

> This patch is at least incorrect in its handling of crash recovery,
> because these two should *never* be set in this case as we want to
> replay up to the end of WAL.  For example, see xlog.c or the top of
> xlogrecovery.c about the assumptions behind these variables:
> /* crash recovery should always recover to the end of WAL */
> ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
> ControlFile->minRecoveryPointTLI = 0;
>
> If an end-of-recovery record is replayed during crash recovery, these
> assumptions are plain broken.

That makes sense! I really appreciate your knowledgeable review.

> One thing that we could consider is to be more aggressive with
> restartpoints when replaying this record for a standby, see a few
> lines above the lines added by your patch, for example.  And we could
> potentially emulate a post-promotion restart point to get a refresh of
> the control file as it should, with the correct code paths involved in
> the updates of minRecoveryPoint when the checkpointer does the job.

I'm not confident, but did you mean that we could perform a restartpoint
(i.e., call `RequestCheckpoint()`) instead of what my old patch did?
Patch 0001 also reflects my understanding.

I also found a bug (maybe). If we call `CreateRestartPoint()` during
crash recovery, an assertion fails in ComputeXidHorizons() in procarray.c.
It is orthogonal to the problem I already reported, so you can
reproduce it on HEAD with this procedure:

1. Start primary and standby server
2. Modify checkpoint_timeout to 1h on standby
3. Insert 10^10 records and concurrently run CHECKPOINT every second on
primary
4. Do an immediate stop on both standby and primary at the end of the insert
5. Modify checkpoint_timeout to 30 on standby
6. Remove standby.signal on standby
7. Restart standby (it will start crash-recovery)
8. Assertion failure is raised shortly

I think this is because `TruncateSUBTRANS()` in `CreateRestartPoint()` is
called but `StartupSUBTRANS()` hasn't been called yet. In `StartupXLOG()`, we
call `StartupSUBTRANS()` only if `(ArchiveRecoveryRequested && EnableHotStandby)`.
However, in `CreateRestartPoint()`, we call `TruncateSUBTRANS()` if
`(EnableHotStandby)`. I guess this difference causes the bug: the latter can be
reached even during crash recovery, while the former isn't.
The attached patch 0002 fixes it. We could discuss this bug in
another thread if needed.
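
A minimal sketch of the idea behind patch 0002 (simplified; the attached patch
is the real change): make the condition in `CreateRestartPoint()` mirror the
one in `StartupXLOG()`, so SUBTRANS is only truncated when it was actually
started up:

```
	/*
	 * Sketch only: skip SUBTRANS truncation during plain crash recovery,
	 * where StartupSUBTRANS() has not been called.
	 */
	if (ArchiveRecoveryRequested && EnableHotStandby)
		TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
```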

Best regards!

Masaki Kuwamura


v2-0001-pg_rewind-Fix-bug-using-cascade-standby-as-source.patch
Description: Binary data


v1-0002-Fix-restartpoint-during-crash-recovery.patch
Description: Binary data


bug fix and documentation improvement about vacuumdb

2023-09-14 Thread Kuwamura Masaki
Hi there,

I have 1 trivial fix, 1 bug fix, and 1 suggestion about vacuumdb.

First, I noticed that the help message of `vacuumdb` is a bit incorrect.

`vacuumdb -?` displays the following message
```
...
  -n, --schema=PATTERN            vacuum tables in the specified schema(s) only
  -N, --exclude-schema=PATTERN    do not vacuum tables in the specified schema(s)

...
```
PATTERN should be changed to SCHEMA because -n and -N options don't support
pattern matching for schema names. The attached patch 0001 fixes this.

Second, when we use multiple -N options, vacuumdb runs incorrectly as shown
below.
```
$ psql
=# CREATE SCHEMA s1;
=# CREATE SCHEMA s2;
=# CREATE SCHEMA s3;
=# CREATE TABLE s1.t(i int);
=# CREATE TABLE s2.t(i int);
=# CREATE TABLE s3.t(i int);
=# ALTER SYSTEM SET log_statement TO 'all';
=# SELECT pg_reload_conf();
=# \q
$ vacuumdb -N s1 -N s2
```
We expect that tables in schemas s1 and s2 should not be vacuumed, while the
others should be. However, the logfile says otherwise.
```
LOG:  statement: VACUUM (SKIP_DATABASE_STATS) pg_catalog.pg_proc;
LOG:  statement: VACUUM (SKIP_DATABASE_STATS) pg_catalog.pg_proc;

...

LOG:  statement: VACUUM (SKIP_DATABASE_STATS) s2.t;
LOG:  statement: VACUUM (SKIP_DATABASE_STATS) s1.t;
LOG:  statement: VACUUM (ONLY_DATABASE_STATS);
```
Even though they are excluded by -N, s1.t and s2.t are vacuumed, and the other
tables are vacuumed twice. The attached patch 0002 fixes this.

Third, for the description of the -N option, I wonder if "vacuum all tables
except in the specified schema(s)" might be clearer. The current one says
nothing about tables not in the specified schema.

Thoughts?

Masaki Kuwamura


v1-0001-vacuumdb-Fix-help-message.patch
Description: Binary data


v1-0002-vacuumdb-Fix-bug-multiple-N-switches.patch
Description: Binary data


Re: pg_rewind with cascade standby doesn't work well

2023-09-11 Thread Kuwamura Masaki
> Consider a scenario like this,
>
> Server A: primary
> Server B: replica of A
> Server C: replica of B
>
> and somehow A goes down, so B gets promoted.
> Server A: down
> Server B: new primary
> Server C: replica of B
>
> In this case, pg_rewind can be used to reconstruct the cascade; the
> source is C and the target is A.
> However, we get an error as below when running pg_rewind.
>
>  ```
>  pg_rewind: fetched file "global/pg_control", length 8192
>  pg_rewind: source and target cluster are on the same timeline
>  pg_rewind: no rewind required
>  ```

To fix the above-mentioned behavior of pg_rewind, I suggest changing the
cascade standby's (i.e., server C's) minRecoveryPointTLI when it receives
the new timeline information from the new primary (i.e., server B).

When server B is promoted, it creates an end-of-recovery record by calling
CreateEndOfRecoveryRecord() (in xlog.c), and also updates B's minRecoveryPoint
and minRecoveryPointTLI.
```
	/*
	 * Update the control file so that crash recovery can follow the timeline
	 * changes to this point.
	 */
	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
	ControlFile->minRecoveryPoint = recptr;
	ControlFile->minRecoveryPointTLI = xlrec.ThisTimeLineID;
	UpdateControlFile();
	LWLockRelease(ControlFileLock);
```
Since C is a replica of B, the end-of-recovery record is replicated from B
to C, and the record is replayed on C by xlog_redo().
The attached patch updates minRecoveryPoint and minRecoveryPointTLI at this
point by mimicking CreateEndOfRecoveryRecord().
With this patch, you can run pg_rewind with a cascade standby immediately
(without waiting for a checkpoint).
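
The core of the change looks roughly like this (a simplified sketch of the
attached patch, shown as the whole XLOG_END_OF_RECOVERY branch of xlog_redo()
for context, not verbatim):

```
	else if (info == XLOG_END_OF_RECOVERY)
	{
		xl_end_of_recovery xlrec;

		memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_end_of_recovery));

		/*
		 * Sketch: mimic CreateEndOfRecoveryRecord() on the cascade standby,
		 * so the control file's minimum recovery point follows the timeline
		 * switch without waiting for the next restartpoint.
		 */
		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
		if (ControlFile->minRecoveryPoint < record->EndRecPtr)
		{
			ControlFile->minRecoveryPoint = record->EndRecPtr;
			ControlFile->minRecoveryPointTLI = xlrec.ThisTimeLineID;
		}
		UpdateControlFile();
		LWLockRelease(ControlFileLock);
	}
```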

Thoughts?

Masaki Kuwamura


v1-0001-pg_rewind-Fix-bug-using-cascade-standby-as-source.patch
Description: Binary data


pg_rewind with cascade standby doesn't work well

2023-09-07 Thread Kuwamura Masaki
Hi there,

I tested pg_rewind's behavior and found something suspicious.

Consider a scenario like this,

Server A: primary
Server B: replica of A
Server C: replica of B

and somehow A goes down, so B gets promoted.

Server A: down
Server B: new primary
Server C: replica of B

In this case, pg_rewind can be used to reconstruct the cascade; the source
is C and the target is A.
However, we get an error as below when running pg_rewind.

```
pg_rewind: fetched file "global/pg_control", length 8192
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required
```
Though A's timeline is 1 and C's should ideally be 2, it says they're on the
same timeline.

This is because `pg_rewind` currently uses minRecoveryPointTLI and the latest
checkpoint's TimeLineID to compare the TLIs of the source and target[1].
Neither C's minRecoveryPointTLI nor its latest checkpoint's TimeLineID is
updated until a checkpoint occurs (even though B's are).
So, if you run pg_rewind immediately, it won't work because C and A appear to
be on the same timeline, and we have to run CHECKPOINT on C before running
pg_rewind.
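
For illustration only (a hypothetical condensation, not pg_rewind's actual
code structure; the helper name is made up), the comparison effectively comes
down to something like:

```
/*
 * Hypothetical illustration: pg_rewind derives each cluster's timeline from
 * its control file, considering both the latest checkpoint's TLI and
 * minRecoveryPointTLI.  Right after B's promotion, server C still has 1 in
 * both fields (and so does A), so the clusters look like they are on the
 * same timeline and no rewind is attempted.
 */
static TimeLineID
cluster_tli(const ControlFileData *cf)
{
	return Max(cf->checkPointCopy.ThisTimeLineID, cf->minRecoveryPointTLI);
}
```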

BTW, running pg_rewind immediately with a cascade standby seems to have
already come up in another discussion[2], but was unfortunately missed.

Anyway, I don't think this behavior is friendly.
To fix it, should we use another variable to compare the TLIs?
Or modify the cascade standby's minRecoveryPointTLI somehow?

Masaki Kuwamura

[1]
https://www.postgresql.org/message-id/flat/9f568c97-87fe-a716-bd39-65299b8a60f4%40iki.fi
[2]
https://www.postgresql.org/message-id/flat/aeb5f31a-8de2-40a8-64af-ab659a309d6b%40iki.fi


Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply

2023-07-27 Thread Kuwamura Masaki
Hi, Fujii-san

> Regarding the WARNING message, another idea is to pass the return value
> of PQgetCancel() directly to PQcancel() as follows. If NULL is passed,
> PQcancel() will detect it and set the proper error message to errbuf.
> Then the warning message "WARNING:  could not send cancel request:
> PQcancel() -- no cancel object supplied" is output.

I agree to go with this.

With this approach, the information behind the error (e.g., "out of
memory") will disappear, I guess.
I think we have to deal with that eventually (I'm sorry, I don't have a good
idea right now).
However, the original issue is the unnecessary wait, and that should be
fixed soon.
So IMO it is better to fix the problem this way and discuss retaining the
information in a separate patch.
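
For illustration, a minimal sketch of the quoted approach (hypothetical code,
not the actual postgres_fdw patch):

```
	/*
	 * Pass PQgetCancel()'s result straight to PQcancel().  If the cancel
	 * object is NULL, PQcancel() reports it through errbuf ("PQcancel() --
	 * no cancel object supplied"), so we still warn instead of silently
	 * waiting for a reply that will never come.
	 */
	PGcancel   *cancel = PQgetCancel(conn);
	char		errbuf[256];

	if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
		ereport(WARNING,
				(errcode(ERRCODE_CONNECTION_FAILURE),
				 errmsg("could not send cancel request: %s", errbuf)));
	PQfreeCancel(cancel);
```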

I'm afraid I'm new to reviewing.
If I'm misunderstanding something, please let me know.

Masaki Kuwamura