pg_ctl start may return 0 even if the postmaster has already been started on Windows

2023-09-07 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

While investigating the cfbot failure [1], I found a strange behavior of the
pg_ctl command. What do you think? Is this a bug to be fixed, or is it
working as specified?

# Problem

The "pg_ctl start" command returns 0 (succeeded) even if the cluster has
already been started. This occurs on Windows environment, and when the command
is executed just after postmaster starts.


# Analysis

The root cause is in wait_for_postmaster_start(). This function reads the
postmaster.pid file and checks whether the start command completed
successfully.

Check (1) requires that the postmaster was started after our pg_ctl
command, but allows 2 seconds of slop.

On Linux, check (2) is also executed to ensure that the forked process
modified the file, so this time window is not much of a problem.
But on Windows, (2) is skipped, *so the pg_ctl command may report success
if some postmaster was started within the last 2 seconds.*

```
if ((optlines = readfile(pid_file, &numlines)) != NULL &&
    numlines >= LOCK_FILE_LINE_PM_STATUS)
{
    /* File is complete enough for us, parse it */
    pid_t       pmpid;
    time_t      pmstart;

    /*
     * Make sanity checks.  If it's for the wrong PID, or the recorded
     * start time is before pg_ctl started, then either we are looking
     * at the wrong data directory, or this is a pre-existing pidfile
     * that hasn't (yet?) been overwritten by our child postmaster.
     * Allow 2 seconds slop for possible cross-process clock skew.
     */
    pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
    pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
    if (pmstart >= start_time - 2 &&    /* (1) */
#ifndef WIN32
        pmpid == pm_pid                 /* (2) */
#else
        /* Windows can only reject standalone-backend PIDs */
        pmpid > 0
#endif

```

# Appendix - how I found this

I found it while investigating the failure. In the test, "pg_upgrade --check"
is executed just after the old cluster has been started. I checked the output
file [2] and found that the banner says "Performing Consistency Checks",
which means that the parameter live_check was set to false (see
output_check_banner()). This parameter is set to true when the postmaster is
already running at that time and pg_ctl start fails. That's how I found it.
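
For reference, the banner logic looks roughly like this (my paraphrase of
output_check_banner() in src/bin/pg_upgrade/check.c; details may differ by
version):

```
static void
output_check_banner(bool live_check)
{
    if (user_opts.check && live_check)
        pg_log(PG_REPORT,
               "Performing Consistency Checks on Old Live Server\n"
               "------------------------------------------------\n");
    else
        pg_log(PG_REPORT,
               "Performing Consistency Checks\n"
               "-----------------------------\n");
}
```

So a plain "Performing Consistency Checks" banner means live_check was false,
even though the old cluster was actually running.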

[1]: https://cirrus-ci.com/task/4634769732927488
[2]: 
https://api.cirrus-ci.com/v1/artifact/task/4634769732927488/testrun/build/testrun/pg_upgrade/003_logical_replication_slots/data/t_003_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20230905T080645.548/log/pg_upgrade_internal.log

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Transaction timeout

2023-09-07 Thread bt23nguyent

On 2023-09-06 20:32, Andrey M. Borodin wrote:

Thanks for looking into this!

On 6 Sep 2023, at 13:16, Fujii Masao  wrote:

While testing the v4 patch, I noticed it doesn't handle the COMMIT AND CHAIN
case correctly. When COMMIT AND CHAIN is executed, I believe the transaction
timeout counter should reset and start from zero with the next transaction.
However, it appears that the current v4 patch doesn't reset the counter in
this scenario. Can you confirm this?

Yes, I was not aware of this feature. I'll test and fix this.

With the v4 patch, I found that timeout errors no longer occur during the
idle in transaction phase. Instead, they occur when the next statement is
executed. Is this the intended behavior?

AFAIR I tested that the "idle in transaction" behaviour was intact. I'll
check that again.


I thought some users might want to use the transaction timeout feature to
prevent prolonged transactions and promptly release resources (e.g., locks)
in case of a timeout, similar to idle_in_transaction_session_timeout.

Yes, this is exactly how I was expecting the feature to behave: free up
max_connections slots held by long-hanging transactions.

Thanks for your findings, I'll check and post a new version!


Best regards, Andrey Borodin.

Hi,

Thank you for implementing this nice feature!
I tested the v4 patch in interactive transaction mode with the following 3
cases:


1. Start a transaction with transaction_timeout=0 (i.e., timeout disabled),
and then change the timeout value to more than 0 during the transaction.


=# SET transaction_timeout TO 0;
=# BEGIN;                -- timeout is not enabled
=# SELECT pg_sleep(5);
=# SET transaction_timeout TO '1s';
=# SELECT pg_sleep(10);  -- timeout is enabled with 1s

In this case, the transaction timeout happens during pg_sleep(10).

2. Start a transaction with transaction_timeout>0 (i.e., timeout enabled),
and then change the timeout value to a different positive value during the
transaction.


=# SET transaction_timeout TO '1000s';
=# BEGIN;                -- timeout is enabled with 1000s
=# SELECT pg_sleep(5);
=# SET transaction_timeout TO '1s';
=# SELECT pg_sleep(10);  -- timeout is not restarted; still running with 1000s

In this case, the transaction timeout does NOT happen during pg_sleep(10).


3. Start a transaction with transaction_timeout>0 (i.e., timeout enabled),
and then change the timeout value to 0 during the transaction.


=# SET transaction_timeout TO '10s';
=# BEGIN;                -- timeout is enabled with 10s
=# SELECT pg_sleep(5);
=# SET transaction_timeout TO 0;
=# SELECT pg_sleep(10);  -- timeout is NOT disabled; still running with 10s

In this case, the transaction timeout happens during pg_sleep(10).

The first case, where transaction_timeout is disabled before the transaction
begins, is totally fine. However, in the second and third cases, where
transaction_timeout is enabled before the transaction begins, the timeout has
already been armed with a certain value, so it is not re-armed (or disabled)
with the new setting value.
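
For illustration, one conceivable fix direction is to re-arm the timer from
the GUC's assign hook whenever the value changes inside a transaction. The
following is a hypothetical sketch only; TRANSACTION_TIMEOUT, the hook name,
and the millisecond GUC are my assumptions about the patch, not its actual
code:

```
#include "postgres.h"
#include "access/xact.h"
#include "utils/timeout.h"

/* Hypothetical sketch, not from the v4 patch.  Assumes the patch registers
 * a TRANSACTION_TIMEOUT timeout ID and a transaction_timeout GUC in ms. */
void
assign_transaction_timeout(int newval, void *extra)
{
    if (IsTransactionState())
    {
        /* Drop any timer armed with the old value... */
        if (get_timeout_active(TRANSACTION_TIMEOUT))
            disable_timeout(TRANSACTION_TIMEOUT, false);

        /* ...and re-arm with the new value, if the timeout is still on. */
        if (newval > 0)
            enable_timeout_after(TRANSACTION_TIMEOUT, newval);
    }
}
```

With something like this, case 2 would restart the countdown at 1s and case 3
would disable the timer mid-transaction.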


Furthermore, let's say I want to set a transaction_timeout value for all
transactions in the postgresql.conf file so it affects all sessions. The same
behavior happens in all 3 cases; here is one example with the second case:


=# BEGIN; SHOW transaction_timeout; select pg_sleep(10); SHOW transaction_timeout; COMMIT;

BEGIN
 transaction_timeout
---------------------
 15s
(1 row)

2023-09-07 11:52:50.510 JST [23889] LOG:  received SIGHUP, reloading configuration files
2023-09-07 11:52:50.510 JST [23889] LOG:  parameter "transaction_timeout" changed to "5000"

 pg_sleep
----------

(1 row)

 transaction_timeout
---------------------
 5s
(1 row)

COMMIT

I am of the opinion that these behaviors might lead to confusion among 
users. Could you confirm if these are the intended behaviors?


Additionally, I think the short description should be "Sets the maximum 
allowed time to commit a transaction." or "Sets the maximum allowed time 
to wait before aborting a transaction." so that it could be more clear 
and consistent with other %_timeout descriptions.


Also, there is a small whitespace error here:
src/backend/tcop/postgres.c:3373: space before tab in indent.
+                       stmt_reason, comma2, tx_reason)));


On a side note, while testing the patch with pgbench, it came to my 
attention that in scenarios involving the execution of multiple 
concurrent transactions within a high contention environment and with 
relatively short timeout durations, there is a potential for cascading 
blocking. This phenomenon can lead to multiple transactions exceeding 
their designated timeouts, consequently resulting in a degradation of 
transaction processing performance. No?
Do you think this feature should be co-implemented with the existing 
concurrency control protocol to maintain the transaction performance 
(e.g. a transaction scheduling mechanism based on transaction timeout)?


Regards,

Re: [PoC] Reducing planning time when tables have many partitions

2023-09-07 Thread Ashutosh Bapat
Hi Yuya,

On Fri, Aug 25, 2023 at 1:09 PM Yuya Watari  wrote:
>
> 3. Future works
>
> 3.1. Redundant memory allocation of Lists
>
> When we need child EquivalenceMembers in a loop over ec_members, v20
> adds them to the list. However, since we cannot modify the ec_members,
> v20 always copies it. In most cases, there are only one or two child
> members, so this behavior is a waste of memory and time and not a good
> idea. I didn't address this problem in v20 because doing so could add
> much complexity to the code, but it is one of the major future works.
>
> I suspect that the degradation of Queries A and B is due to this
> problem. The difference between 'make installcheck' and Queries A and
> B is whether there are partitioned tables. Most of the tests in 'make
> installcheck' do not have partitions, so find_relids_top_parents()
> could immediately determine the given Relids are already top-level and
> keep degradation very small. However, since Queries A and B have
> partitions, too frequent allocations of Lists may have caused the
> regression. I hope we can reduce the degradation by avoiding these
> memory allocations. I will continue to investigate and fix this
> problem.
>
> 3.2. em_relids and pull_varnos
>
> I'm sorry that v20 did not address your 1st concern regarding
> em_relids and pull_varnos. I will try to look into this.
>
> 3.3. Indexes for RestrictInfos
>
> Indexes for RestrictInfos are still in RangeTblEntry in v20-0002. I
> will also investigate this issue.
>
> 3.4. Correctness
>
> v20 has passed all regression tests in my environment, but I'm not so
> sure if v20 is correct.
>
> 4. Conclusion
>
> I wrote v20 based on a new idea. It may have a lot of problems, but it
> has advantages. At least it solves your 3rd concern. Since we iterate
> Lists instead of Bitmapsets, we don't have to introduce an iterator
> mechanism. My experiment showed that the 'make installcheck'
> degradation was very small. For the 2nd concern, v20 no longer adds
> child EquivalenceMembers to ec_members. I'm sorry if this is not what
> you intended, but it effectively worked. Again, v20 is a new proof of
> concept. I hope the v20-based approach will be a good alternative
> solution if we can overcome several problems, including what I
> mentioned above.

It seems that you are still investigating and fixing issues, but the
CF entry is marked as "needs review". I think a better status is
"WoA". Do you agree with that?

-- 
Best Wishes,
Ashutosh Bapat




Re: psql help message contains excessive indentations

2023-09-07 Thread Kyotaro Horiguchi
At Thu, 7 Sep 2023 15:02:49 +0900, Yugo NAGATA  wrote in 
> On Thu, 07 Sep 2023 14:29:56 +0900 (JST)
> Kyotaro Horiguchi  wrote:
> >  \q                     quit psql
> >  \watch [[i=]SEC] [c=N] [m=MIN]
> > !>                          execute query every SEC seconds, up to N times
> > !>                          stop if less than MIN rows are returned
> > 
> > The last two lines definitely have some extra indentation.
> 
> Agreed.
> 
> > I've attached a patch that fixes this.
> 
> I wonder if it would be better to fix this in a similar way to other places
> where the description has multiple lines, like "\g [(OPTIONS)] [FILE]".

It looks correct to me:

>  \errverbose            show most recent error message at maximum verbosity
>  \g [(OPTIONS)] [FILE]  execute query (and send result to file or |pipe);
>                         \g with no arguments is equivalent to a semicolon
>  \gdesc                 describe result of query, without executing it
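
For context, these entries are emitted as string literals in
src/bin/psql/help.c. A rough sketch of the multi-line style under discussion,
assuming the HELP0 macro used there (alignment approximate):

```
HELP0("  \\g [(OPTIONS)] [FILE]  execute query (and send result to file or |pipe);\n"
      "                         \\g with no arguments is equivalent to a semicolon\n");
HELP0("  \\watch [[i=]SEC] [c=N] [m=MIN]\n"
      "                         execute query every SEC seconds, up to N times\n"
      "                         stop if less than MIN rows are returned\n");
```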

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pg_rewind with cascade standby doesn't work well

2023-09-07 Thread Kuwamura Masaki
Hi there,

I tested pg_rewind's behavior and found something suspicious.

Consider a scenario like this:

Server A: primary
Server B: replica of A
Server C: replica of B

and somehow A goes down, so B gets promoted.

Server A: down
Server B: new primary
Server C: replica of B

In this case, pg_rewind can be used to reconstruct the cascade; the source
is C and the target is A.
However, running pg_rewind produces the following error:

```
pg_rewind: fetched file "global/pg_control", length 8192
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required
```
Though A's timeline should be 1 and C's should be 2, it says they're on the
same timeline.

This is because `pg_rewind` currently uses minRecoveryPointTLI and the latest
checkpoint's TimelineID to compare the TLIs of the source and target [1].
Neither C's minRecoveryPointTLI nor its latest checkpoint's TimelineID is
updated until a checkpoint runs (even though B's are updated).
So if you run pg_rewind immediately, it won't work, because C and A appear to
be on the same timeline; you have to run CHECKPOINT on C before running
pg_rewind.
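
To make the comparison concrete, here is a simplified sketch (my own
paraphrase, not the actual pg_rewind source) of how the effective TLI of each
cluster is derived from pg_control; on a fresh cascade standby both inputs
still name the pre-promotion timeline until a checkpoint runs:

```
#include "postgres.h"
#include "catalog/pg_control.h"

/* Take the later of the last checkpoint's TLI and minRecoveryPointTLI. */
static TimeLineID
effective_tli(const ControlFileData *cf)
{
    TimeLineID  tli = cf->checkPointCopy.ThisTimeLineID;

    if (cf->minRecoveryPointTLI > tli)
        tli = cf->minRecoveryPointTLI;

    return tli;
}
```

With both fields unchanged on C, effective_tli() returns 1 for both A and C,
which is why pg_rewind decides no rewind is required.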

BTW, running pg_rewind immediately against a cascade standby already seems to
have been raised as a concern in another discussion [2], but it was
unfortunately missed.

Anyway, I don't think this behavior is friendly.
To fix it, should we use another variable to compare TLIs?
Or should we modify the cascade standby's minRecoveryPointTLI somehow?

Masaki Kuwamura

[1]
https://www.postgresql.org/message-id/flat/9f568c97-87fe-a716-bd39-65299b8a60f4%40iki.fi
[2]
https://www.postgresql.org/message-id/flat/aeb5f31a-8de2-40a8-64af-ab659a309d6b%40iki.fi


Re: pg_upgrade and logical replication

2023-09-07 Thread Michael Paquier
On Mon, Sep 04, 2023 at 02:12:58PM +0530, Amit Kapila wrote:
> Yeah, I agree that could be hacked quickly but note I haven't reviewed
> in detail if there are other design issues in this patch. Note that we
> thought first to support the upgrade of the publisher node, otherwise,
> immediately after upgrading the subscriber and publisher, the
> subscriptions won't work and start giving errors as they are dependent
> on slots in the publisher. One other point that needs some thought is
> that the LSN positions we are going to copy in the catalog may no
> longer be valid after the upgrade (of the publisher) because we reset
> WAL. Does that need some special consideration or are we okay with
> that in all cases?

In pg_upgrade, copy_xact_xlog_xid() puts the new node ahead of the old
cluster by 8 segments on TLI 1, so how would it be a problem if the
subscribers keep a remote confirmed LSN lower than that in their
catalogs?  (You've mentioned that to me offline, but I forgot the
details in the code.)

> As of now, things are quite safe as documented in
> pg_dump doc page that it will be the user's responsibility to set up
> replication after dump/restore. I think it would be really helpful if
> you could share your thoughts on the publisher-side matter as we are
> facing a few tricky questions to be answered. For example, see a new
> thread [1].

In my experience, users are quite used to upgrade standbys *first*,
even in simple scenarios like minor upgrades, because that's the only
way to do things safely.  For example, updating and/or upgrading
primaries before the standbys could be a problem if an update
introduces a slight change in the WAL record format that could be
generated by the primary but not be processed by a standby, and we've
done such tweaks in some records in the past for some bug fixes that
had to be backpatched to stable branches.

IMO, the upgrade of subscriber nodes and the upgrade of publisher
nodes need to be treated as two independent processing problems, dealt
with separately.

As you mentioned to me earlier offline, these two have, from what I
understand, one dependency: during a publisher upgrade we need to make
sure that there are no invalid slots when beginning to run pg_upgrade,
and that the confirmed LSN of all the slots used by the subscribers
matches the shutdown checkpoint's LSN, ensuring that the subscribers
would not lose any data because everything has already been consumed
by them when the publisher gets upgraded.
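
A minimal sketch of those two conditions, under assumed names and types
(LogicalSlotInfo and its fields are mine for illustration, not the patch's
code):

```
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;        /* as in access/xlogdefs.h */

typedef struct LogicalSlotInfo
{
    const char *slotname;
    bool        invalid;            /* e.g., invalidated after WAL removal */
    XLogRecPtr  confirmed_flush;    /* last LSN confirmed by the subscriber */
} LogicalSlotInfo;

/* Every logical slot must be valid, and its confirmed_flush must equal the
 * shutdown checkpoint's LSN, i.e. subscribers have consumed everything. */
static bool
slots_ready_for_upgrade(const LogicalSlotInfo *slots, int nslots,
                        XLogRecPtr shutdown_checkpoint_lsn)
{
    for (int i = 0; i < nslots; i++)
    {
        if (slots[i].invalid ||
            slots[i].confirmed_flush != shutdown_checkpoint_lsn)
            return false;
    }
    return true;
}
```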

> The point raised by Jonathan for not having an option for pg_upgrade
> is that it will be easier for users; otherwise, users would always need
> to enable this option. Consider a replication setup: wouldn't users want
> it to be upgraded by default? Asking them to do that via an option
> would be an inconvenience. So, that was the reason we wanted to have
> an --exclude option and by default allow slots to be upgraded. I think
> the same theory applies here.
> 
> [1] - 
> https://www.postgresql.org/message-id/CAA4eK1LV3%2B76CSOAk0h8Kv0AKb-OETsJHe6Sq6172-7DZXf0Qg%40mail.gmail.com

I saw this thread, and have some thoughts to share.  Will reply there.
--
Michael




Re: persist logical slots to disk during shutdown checkpoint

2023-09-07 Thread Amit Kapila
On Thu, Sep 7, 2023 at 10:13 AM vignesh C  wrote:
>
> On Wed, 6 Sept 2023 at 12:09, Dilip Kumar  wrote:
> >
> > Other than that the patch LGTM.
>
> pgindent reported that the new comments added need to be re-adjusted,
> here is an updated patch for the same.
>

Thanks, the patch looks good to me as well.

-- 
With Regards,
Amit Kapila.




Re: Improve heapgetpage() performance, overhead from serializable

2023-09-07 Thread John Naylor
On Fri, Sep 1, 2023 at 1:08 PM tender wang  wrote:
>
> This thread [1] is also about improving the heapgetpage() function, and
> looks similar to this thread.
>
> [1]
> https://www.postgresql.org/message-id/a9f40066-3d25-a240-4229-ec2fbe94e7a5%40yeah.net

Please don't top-post.

For the archives: That CF entry has been withdrawn, after the author looked
at this one and did some testing.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: RFC: Logging plan of the running query

2023-09-07 Thread torikoshia

On 2023-09-06 11:17, James Coleman wrote:


> I've never been able to reproduce it (haven't tested the new version,
> but v28 at least) on my M1 Mac; where I've reproduced it is on Debian
> (first buster and now bullseye).
>
> I'm attaching several stacktraces in the hope that they provide some
> clues. These all match the ps output I sent earlier, though note in
> that output there is both the regress instance and my test instance
> (pid 3213249) running (different ports, of course, and they are from
> the exact same compilation run). I've attached ps output for the
> postgres processes under the make check process to simplify cross
> referencing.
>
> A few interesting things:
> - There's definitely a lock on a relation that seems to be what's
> blocking the processes.
> - When I try to connect with psql the process forks but then hangs
> (see the ps output with task names stuck in "authentication"). I've
> also included a trace from one of these.

Thanks for sharing them!

Many processes are waiting to acquire the LW lock, including the process
trying to output the plan (select1.trace).

I suspect that this is due to a lock that was acquired prior to being
interrupted by ProcessLogQueryPlanInterrupt(), but I have not been able to
reproduce the same situation.



I don't have time immediately to dig into this, but perhaps loading up
the core dumps would allow us to see what query is running in each
backend process (if it hasn't already been discarded by that point)
and thereby determine what point in each test process led to the error
condition?


Thanks for the suggestion.
I am concerned that core dumps may not be readable on different 
operating systems or other environments. (Unfortunately, I do not have 
Debian on hand)


It seems that we can know what queries were running from the stack 
traces you shared.
As described above, I suspect a lock which was acquired prior to 
ProcessLogQueryPlanInterrupt() caused the issue.
I will try a little more to see if I can devise a way to create the same 
situation.



Alternatively we might be able to apply the same trick to the test
client instead...

BTW, for my own easy reference in this thread: relid 1259 is pg_class
if I'm not mistaken.


Yeah, I think it's strange that the lock on 1259 appears twice; that must
be avoided.


  #10 0x559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117
  ..
  #49 0x559d61b4989d in ProcessLogQueryPlanInterrupt () at explain.c:5158
  ..
  #53 0x559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117
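
To illustrate the shape of one possible mitigation (purely hypothetical; the
flags and names below are not from the patch): the handler could refuse to
take new heavyweight locks when it fires inside LockRelationOid(), deferring
the plan output to the next safe point instead.

```
#include <stdbool.h>

/* Hypothetical sketch, not from the patch: defer plan logging when the
 * interrupt fires while we are already acquiring a heavyweight lock,
 * since building the plan output locks pg_class (OID 1259) again. */
static bool in_lock_acquire;    /* would be set around LockRelationOid() */
static bool log_plan_pending;   /* causes a retry at the next safe point */

void
ProcessLogQueryPlanInterrupt(void)
{
    if (in_lock_acquire)
    {
        log_plan_pending = true;    /* avoid the nested LockRelationOid() */
        return;
    }
    log_plan_pending = false;
    /* safe to build and log the plan here */
}
```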


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: psql help message contains excessive indentations

2023-09-07 Thread Yugo NAGATA
On Thu, 07 Sep 2023 14:29:56 +0900 (JST)
Kyotaro Horiguchi  wrote:

> This topic is extracted from [1].
> 
> As mentioned there, in psql, running \? displays the following lines.
> 
>  >  \gdesc                 describe result of query, without executing it
>  >  \gexec                 execute query, then execute each value in its result
>  >  \gset [PREFIX]         execute query and store result in psql variables
>  >  \gx [(OPTIONS)] [FILE] as \g, but forces expanded output mode
>  >  \q                     quit psql
>  >  \watch [[i=]SEC] [c=N] [m=MIN]
> !>                             execute query every SEC seconds, up to N times
> !>                             stop if less than MIN rows are returned
> 
> The last two lines definitely have some extra indentation.

Agreed.

> I've attached a patch that fixes this.

I wonder if it would be better to fix this in a similar way to other places
where the description has multiple lines, like "\g [(OPTIONS)] [FILE]".

Regards,
Yugo Nagata

> 
> [1] 
> https://www.postgresql.org/message-id/20230830.103356.1909699532104167477.horikyota@gmail.com
> 
> regards.
> 
> -- 
> Kyotaro Horiguchi
> NTT Open Source Software Center


-- 
Yugo NAGATA 



