Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

2024-01-22 Thread Bertrand Drouvot
Hi,

On Mon, Jan 22, 2024 at 04:14:46PM +1100, Peter Smith wrote:
> 2024-01 Commitfest.
> 
> Hi, This patch has a CF status of "Needs Review" [1], but it seems
> there were CFbot test failures last time it was run [2]. Please have a
> look and post an updated version if necessary.

Thanks for the warning! I don't think the current code is causing any issues so
given the feedback I've had so far I think I'll withdraw the patch.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4698/
[2] https://cirrus-ci.com/task/5367036042280960

Kind Regards,
Peter Smith.




Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

2023-12-18 Thread Drouvot, Bertrand

Hi,

On 12/13/23 3:33 PM, Michael Paquier wrote:

On Tue, Dec 12, 2023 at 04:54:32PM -0300, Euler Taveira wrote:

Couldn't it give up before starting if you apply your patch? My main concern is
due to a slow system, the walrcv_connect() took to long in WalReceiverMain()
and the code above kills the walreceiver while in the process to start it.
Since you cannot control the hardcoded WALRCV_STARTUP_TIMEOUT value, you might
have issues during overload periods.


Sounds like a fair point to me, 


Thanks for looking at it! I'm not sure about it, see my comment in [1].


Another thing that I'm a bit surprised with is why it would be OK to
switch the status to STREAMING only we first_stream is set, discarding
the restart case.


Yeah, that looks like a miss on my side. Thanks for pointing out!

Please find attached v2 addressing this remark.

[1]: 
https://www.postgresql.org/message-id/c76c0a65-f754-4614-b616-1d48f9195745%40gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom 12da1b3c6295e2369b3a65c8f4ee40882def310f Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 11 Dec 2023 20:05:25 +
Subject: [PATCH v2] move walrcv->walRcvState assignment to WALRCV_STREAMING

walrcv->walRcvState = WALRCV_STREAMING was set to early. Move the assignement
to a more appropriate place.
---
 src/backend/replication/walreceiver.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
 100.0% src/backend/replication/

diff --git a/src/backend/replication/walreceiver.c 
b/src/backend/replication/walreceiver.c
index 26ded928a7..5f612d354e 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -242,7 +242,6 @@ WalReceiverMain(void)
}
/* Advertise our PID so that the startup process can kill us */
walrcv->pid = MyProcPid;
-   walrcv->walRcvState = WALRCV_STREAMING;
 
/* Fetch information required to start streaming */
walrcv->ready_to_display = false;
@@ -412,6 +411,9 @@ WalReceiverMain(void)
options.proto.physical.startpointTLI = startpointTLI;
if (walrcv_startstreaming(wrconn, ))
{
+   SpinLockAcquire(>mutex);
+   walrcv->walRcvState = WALRCV_STREAMING;
+   SpinLockRelease(>mutex);
if (first_stream)
ereport(LOG,
(errmsg("started streaming WAL 
from primary at %X/%X on timeline %u",
-- 
2.34.1



Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

2023-12-18 Thread Drouvot, Bertrand

Hi,

On 12/12/23 8:54 PM, Euler Taveira wrote:

On Tue, Dec 12, 2023, at 12:58 PM, Drouvot, Bertrand wrote:

Currently walrcv->walRcvState is set to WALRCV_STREAMING at the
beginning of WalReceiverMain().

But it seems that after this assignment things could be wrong before the
walreicever actually starts streaming (like not being able to connect
to the primary).

It looks to me that WALRCV_STREAMING should be set once walrcv_startstreaming()
returns true: this is the proposal of this patch.


Per the state name (streaming), it seems it should be set later as you
proposed, 


Thanks for looking at it!


however, I'm concerned about the previous state (WALRCV_STARTING). If
I'm reading the code correctly, WALRCV_STARTING is assigned at
RequestXLogStreaming():

     SetInstallXLogFileSegmentActive();
     RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
  PrimarySlotName,
  wal_receiver_create_temp_slot);
     flushedUpto = 0;
     }

     /*
  * Check if WAL receiver is active or wait to start up.
  */
     if (!WalRcvStreaming())
     {
     lastSourceFailed = true;
     break;
     }

After a few lines the function WalRcvStreaming() has:

     SpinLockRelease(>mutex);

     /*
  * If it has taken too long for walreceiver to start up, give up. Setting
  * the state to STOPPED ensures that if walreceiver later does start up
  * after all, it will see that it's not supposed to be running and die
  * without doing anything.
  */
     if (state == WALRCV_STARTING)
     {
     pg_time_t   now = (pg_time_t) time(NULL);

     if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
     {
     bool    stopped = false;

     SpinLockAcquire(>mutex);
     if (walrcv->walRcvState == WALRCV_STARTING)
     {
     state = walrcv->walRcvState = WALRCV_STOPPED;
     stopped = true;
     }
     SpinLockRelease(>mutex);

     if (stopped)
     ConditionVariableBroadcast(>walRcvStoppedCV);
     }
     }

Couldn't it give up before starting if you apply your patch? My main concern is
due to a slow system, the walrcv_connect() took to long in WalReceiverMain()
and the code above kills the walreceiver while in the process to start it.


Yeah, so it looks to me that the sequence of events is:

1) The startup process sets walrcv->walRcvState = WALRCV_STARTING (in 
RequestXLogStreaming())
2) The startup process sets the walrcv->startTime (in RequestXLogStreaming())
3) The startup process asks then the postmaster to starts the walreceiver
4) Then The startup process checks if WalRcvStreaming() is true

Note that 3) is not waiting for the walreceiver to actually start: it "just" 
sets
a flag and kill (SIGUSR1) the postmaster (in SendPostmasterSignal()).

It means that if the time between 1 and 4 is <= WALRCV_STARTUP_TIMEOUT (10 
seconds)
then WalRcvStreaming() returns true (even if the walreceiver is not streaming 
yet).

So it looks to me that even if the walreceiver does take time to start 
streaming,
as long as the time between 1 and 4 is <= 10 seconds we are fine.

And I think it's fine because WalRcvStreaming() does not actually "only" check 
that the
walreceiver is streaming but as its comment states:

"
/*
 * Is walreceiver running and streaming (or at least attempting to connect,
 * or starting up)?
 */
"

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

2023-12-13 Thread Michael Paquier
On Tue, Dec 12, 2023 at 04:54:32PM -0300, Euler Taveira wrote:
> Couldn't it give up before starting if you apply your patch? My main concern 
> is
> due to a slow system, the walrcv_connect() took to long in WalReceiverMain()
> and the code above kills the walreceiver while in the process to start it.
> Since you cannot control the hardcoded WALRCV_STARTUP_TIMEOUT value, you might
> have issues during overload periods.

Sounds like a fair point to me, this area is trickier than it looks.
Another thing that I'm a bit surprised with is why it would be OK to
switch the status to STREAMING only we first_stream is set, discarding
the restart case.
--
Michael


signature.asc
Description: PGP signature


Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

2023-12-12 Thread Euler Taveira
On Tue, Dec 12, 2023, at 12:58 PM, Drouvot, Bertrand wrote:
> Currently walrcv->walRcvState is set to WALRCV_STREAMING at the
> beginning of WalReceiverMain().
> 
> But it seems that after this assignment things could be wrong before the
> walreicever actually starts streaming (like not being able to connect
> to the primary).
> 
> It looks to me that WALRCV_STREAMING should be set once 
> walrcv_startstreaming()
> returns true: this is the proposal of this patch.

Per the state name (streaming), it seems it should be set later as you
proposed, however, I'm concerned about the previous state (WALRCV_STARTING). If
I'm reading the code correctly, WALRCV_STARTING is assigned at
RequestXLogStreaming():

SetInstallXLogFileSegmentActive();
RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
 PrimarySlotName,
 wal_receiver_create_temp_slot);
flushedUpto = 0; 
}

/*
 * Check if WAL receiver is active or wait to start up.
 */
if (!WalRcvStreaming())
{
lastSourceFailed = true;
break;
}

After a few lines the function WalRcvStreaming() has:

SpinLockRelease(>mutex);

/*  
 * If it has taken too long for walreceiver to start up, give up. Setting
 * the state to STOPPED ensures that if walreceiver later does start up
 * after all, it will see that it's not supposed to be running and die
 * without doing anything.
 */
if (state == WALRCV_STARTING)
{   
pg_time_t   now = (pg_time_t) time(NULL);

if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
{   
boolstopped = false;

SpinLockAcquire(>mutex);
if (walrcv->walRcvState == WALRCV_STARTING)
{   
state = walrcv->walRcvState = WALRCV_STOPPED;
stopped = true;
}
SpinLockRelease(>mutex);

if (stopped)
ConditionVariableBroadcast(>walRcvStoppedCV);
}
}   

Couldn't it give up before starting if you apply your patch? My main concern is
due to a slow system, the walrcv_connect() took to long in WalReceiverMain()
and the code above kills the walreceiver while in the process to start it.
Since you cannot control the hardcoded WALRCV_STARTUP_TIMEOUT value, you might
have issues during overload periods.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

2023-12-12 Thread Drouvot, Bertrand

Hi hackers,

Currently walrcv->walRcvState is set to WALRCV_STREAMING at the
beginning of WalReceiverMain().

But it seems that after this assignment things could be wrong before the
walreicever actually starts streaming (like not being able to connect
to the primary).

It looks to me that WALRCV_STREAMING should be set once walrcv_startstreaming()
returns true: this is the proposal of this patch.

I don't think the current assignment location is causing any issues, but I
think it's more appropriate to move it like in the attached.

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom 000bb21e54085abcf9df8cc75e4a86c21e24700e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 11 Dec 2023 20:05:25 +
Subject: [PATCH v1] move walrcv->walRcvState assignment to WALRCV_STREAMING

walrcv->walRcvState = WALRCV_STREAMING was set to early. Move the assignement
to a more appropriate place.
---
 src/backend/replication/walreceiver.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)
 100.0% src/backend/replication/

diff --git a/src/backend/replication/walreceiver.c 
b/src/backend/replication/walreceiver.c
index 26ded928a7..47f1d50144 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -242,7 +242,6 @@ WalReceiverMain(void)
}
/* Advertise our PID so that the startup process can kill us */
walrcv->pid = MyProcPid;
-   walrcv->walRcvState = WALRCV_STREAMING;
 
/* Fetch information required to start streaming */
walrcv->ready_to_display = false;
@@ -413,9 +412,14 @@ WalReceiverMain(void)
if (walrcv_startstreaming(wrconn, ))
{
if (first_stream)
+   {
+   SpinLockAcquire(>mutex);
+   walrcv->walRcvState = WALRCV_STREAMING;
+   SpinLockRelease(>mutex);
ereport(LOG,
(errmsg("started streaming WAL 
from primary at %X/%X on timeline %u",

LSN_FORMAT_ARGS(startpoint), startpointTLI)));
+   }
else
ereport(LOG,
(errmsg("restarted WAL 
streaming at %X/%X on timeline %u",
-- 
2.34.1